{"id":3524,"date":"2021-05-02T06:30:23","date_gmt":"2021-05-02T06:30:23","guid":{"rendered":"https:\/\/cloudxlab.com\/blog\/?p=3524"},"modified":"2021-06-24T12:57:15","modified_gmt":"2021-06-24T12:57:15","slug":"how-to-design-a-large-scale-system-to-process-emails-using-multiple-machines-zookeeper-use-case-study","status":"publish","type":"post","link":"https:\/\/cloudxlab.com\/blog\/how-to-design-a-large-scale-system-to-process-emails-using-multiple-machines-zookeeper-use-case-study\/","title":{"rendered":"How to design a large-scale system to process emails using multiple machines [Zookeeper Use Case Study]?"},"content":{"rendered":"\n<h2>Introduction<\/h2>\n\n\n\n<p>In this post, we discuss several ways to design a large-scale system and the pros and cons of each.<\/p>\n\n\n\n<p>To follow this post, you should be familiar with <a href=\"https:\/\/cloudxlab.com\/blog\/introduction-to-big-data-and-distributed-computing\/\" target=\"_blank\" rel=\"noreferrer noopener\">distributed computing<\/a>, <a href=\"https:\/\/cloudxlab.com\/blog\/race-condition-and-deadlock\/\" target=\"_blank\" rel=\"noreferrer noopener\">deadlocks and race conditions<\/a>, <a href=\"https:\/\/cloudxlab.com\/blog\/distributed-computing-with-locks\/\" target=\"_blank\" rel=\"noreferrer noopener\">locking in distributed systems<\/a>, and <a href=\"https:\/\/cloudxlab.com\/blog\/introduction-to-apache-zookeeper\/\" target=\"_blank\" rel=\"noreferrer noopener\">Zookeeper<\/a>. Let&#8217;s get started.<\/p>\n\n\n\n<h2>Scenario<\/h2>\n\n\n\n<p>Consider an email inbox whose emails need to be processed. For example, the processing could classify each email as spam or non-spam. 
Another example is indexing each email so that it can be searched.<\/p>\n\n\n\n<p>We have an email-processor program running on several machines that are physically separate from each other.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large is-resized\"><img src=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2021\/05\/zk-casestudy-1.png\" alt=\"\" class=\"wp-image-3521\" width=\"598\" height=\"448\" \/><figcaption>Email processor program running on distributed systems<\/figcaption><\/figure><\/div>\n\n\n\n<p>These machines need to coordinate so that:<\/p>\n\n\n\n<ul><li>No email is processed twice<\/li><li>No email is left unprocessed<\/li><\/ul>\n\n\n\n<!--more-->\n\n\n\n<h4>Solution 1:<\/h4>\n\n\n\n<p>Use flags: each machine marks an email as read when it picks it up, and machines only consider emails that are not yet marked as read.<\/p>\n\n\n\n<h4>Cons:<\/h4>\n\n\n\n<p>If processor 1 reads an email, marks it as read, and then dies before finishing, no other processor will ever pick up that email, because it is already marked as read. The email would be left unprocessed.<\/p>\n\n\n\n<h4>Solution 2:<\/h4>\n\n\n\n<p>Use a manager that tracks the workload and distributes the work to the workers.<\/p>\n\n\n\n<h4>Cons:<\/h4>\n\n\n\n<p>This manager could become a bottleneck: it has to coordinate a large number of machines, so it could get overloaded. 
Also, what if the manager dies?<\/p>\n\n\n\n<h4>Solution 3:<\/h4>\n\n\n\n<p>Use a central store that records who is doing what: the email ID, the time at which a processor picked it up, the processing status, and so on.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large is-resized\"><img src=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2021\/05\/Untitled-drawing-4.png\" alt=\"\" class=\"wp-image-3526\" width=\"548\" height=\"410\" \/><figcaption>Zookeeper playing crucial role to achieve coordination among distributed systems<\/figcaption><\/figure><\/div>\n\n\n\n<h4>Cons:<\/h4>\n\n\n\n<p>The central store can become a bottleneck. If the email-processor program runs on many machines, the central store will be in high demand, so it could get overloaded, and it may also die.<\/p>\n\n\n\n<h4>Solution 4:<\/h4>\n\n\n\n<p>Use a distributed system that provides locking, such as Zookeeper. You could also use a standard RDBMS with locking, but that would not be highly available.<\/p>\n\n\n\n<p><strong>Zookeeper:<\/strong><\/p>\n\n\n\n<ul><li>provides simple primitives like set\/get, so it is easy to program against<\/li><li>has a simple data model, similar to a directory tree<\/li><li>is resilient and highly available<\/li><\/ul>\n\n\n\n<p>To learn more about CloudxLab courses, <a href=\"https:\/\/cloudxlab.com\/home\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a> you go!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction As part of this blog we are going to discuss various ways of large scale system design and the pros-cons of each. To get a fair understanding of this post, you should know what is distributed computing, what is deadlock and race conditions, locking in distributed systems and Zookeeper etc. Let&#8217;s get started. 
Scenario &hellip; <a href=\"https:\/\/cloudxlab.com\/blog\/how-to-design-a-large-scale-system-to-process-emails-using-multiple-machines-zookeeper-use-case-study\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;How to design a large-scale system to process emails using multiple machines [Zookeeper Use Case Study]?&#8221;<\/span><\/a><\/p>\n","protected":false},"author":29,"featured_media":3526,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[24],"tags":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v16.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>How to design a large-scale system to process emails using multiple machines [Zookeeper Use Case Study]? | CloudxLab Blog<\/title>\n<meta name=\"description\" content=\"we are going to discuss various ways of large scale system design and the pros-cons of each.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/cloudxlab.com\/blog\/how-to-design-a-large-scale-system-to-process-emails-using-multiple-machines-zookeeper-use-case-study\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How to design a large-scale system to process emails using multiple machines [Zookeeper Use Case Study]? 
| CloudxLab Blog\" \/>\n<meta property=\"og:description\" content=\"we are going to discuss various ways of large scale system design and the pros-cons of each.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/cloudxlab.com\/blog\/how-to-design-a-large-scale-system-to-process-emails-using-multiple-machines-zookeeper-use-case-study\/\" \/>\n<meta property=\"og:site_name\" content=\"CloudxLab Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cloudxlab\" \/>\n<meta property=\"article:published_time\" content=\"2021-05-02T06:30:23+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2021-06-24T12:57:15+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2021\/05\/Untitled-drawing-4.png\" \/>\n\t<meta property=\"og:image:width\" content=\"960\" \/>\n\t<meta property=\"og:image:height\" content=\"720\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@CloudxLab\" \/>\n<meta name=\"twitter:site\" content=\"@CloudxLab\" \/>\n<meta name=\"twitter:label1\" content=\"Est. 
reading time\">\n\t<meta name=\"twitter:data1\" content=\"3 minutes\">\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebSite\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/#website\",\"url\":\"https:\/\/cloudxlab.com\/blog\/\",\"name\":\"CloudxLab Blog\",\"description\":\"Learn AI, Machine Learning, Deep Learning, Devops &amp; Big Data\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":\"https:\/\/cloudxlab.com\/blog\/?s={search_term_string}\",\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/how-to-design-a-large-scale-system-to-process-emails-using-multiple-machines-zookeeper-use-case-study\/#primaryimage\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/cloudxlab.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-drawing-4.png\",\"contentUrl\":\"https:\/\/cloudxlab.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-drawing-4.png\",\"width\":960,\"height\":720,\"caption\":\"Zookeeper playing crucial role to achieve coordination among distributed systems\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/how-to-design-a-large-scale-system-to-process-emails-using-multiple-machines-zookeeper-use-case-study\/#webpage\",\"url\":\"https:\/\/cloudxlab.com\/blog\/how-to-design-a-large-scale-system-to-process-emails-using-multiple-machines-zookeeper-use-case-study\/\",\"name\":\"How to design a large-scale system to process emails using multiple machines [Zookeeper Use Case Study]? 
| CloudxLab Blog\",\"isPartOf\":{\"@id\":\"https:\/\/cloudxlab.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/cloudxlab.com\/blog\/how-to-design-a-large-scale-system-to-process-emails-using-multiple-machines-zookeeper-use-case-study\/#primaryimage\"},\"datePublished\":\"2021-05-02T06:30:23+00:00\",\"dateModified\":\"2021-06-24T12:57:15+00:00\",\"author\":{\"@id\":\"https:\/\/cloudxlab.com\/blog\/#\/schema\/person\/d3d0a11dfd64a63deaaa52e09d52049e\"},\"description\":\"we are going to discuss various ways of large scale system design and the pros-cons of each.\",\"breadcrumb\":{\"@id\":\"https:\/\/cloudxlab.com\/blog\/how-to-design-a-large-scale-system-to-process-emails-using-multiple-machines-zookeeper-use-case-study\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/cloudxlab.com\/blog\/how-to-design-a-large-scale-system-to-process-emails-using-multiple-machines-zookeeper-use-case-study\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/how-to-design-a-large-scale-system-to-process-emails-using-multiple-machines-zookeeper-use-case-study\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"item\":{\"@type\":\"WebPage\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/\",\"url\":\"https:\/\/cloudxlab.com\/blog\/\",\"name\":\"Home\"}},{\"@type\":\"ListItem\",\"position\":2,\"item\":{\"@id\":\"https:\/\/cloudxlab.com\/blog\/how-to-design-a-large-scale-system-to-process-emails-using-multiple-machines-zookeeper-use-case-study\/#webpage\"}}]},{\"@type\":\"Person\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/#\/schema\/person\/d3d0a11dfd64a63deaaa52e09d52049e\",\"name\":\"Vagdevi 
K\",\"image\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/#personlogo\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/15df8c96b21e806c59505fe147d6fa92?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/15df8c96b21e806c59505fe147d6fa92?s=96&d=mm&r=g\",\"caption\":\"Vagdevi K\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","_links":{"self":[{"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/posts\/3524"}],"collection":[{"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/users\/29"}],"replies":[{"embeddable":true,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/comments?post=3524"}],"version-history":[{"count":4,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/posts\/3524\/revisions"}],"predecessor-version":[{"id":3603,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/posts\/3524\/revisions\/3603"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/media\/3526"}],"wp:attachment":[{"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/media?parent=3524"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/categories?post=3524"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/tags?post=3524"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}