You are probably reading this article on a machine with a dual- or quad-core processor, and perhaps with even more cores. Your computer is already a distributed system: multiple computing components (cores) communicate with each other via main memory and other channels, such as the physical buses (wires) between them. As you browse multiple web pages you are interacting with the largest distributed system ever created, the Internet. We recently celebrated IPv6 Day: IPv6 is a new way of addressing devices connected to the Internet, needed because the Internet's sheer scale has outgrown the address space of the previous standard, IPv4, all 4+ billion addresses of it. Every Internet company depends on distributed systems, and, by extension, the economies of the world are now tied to them.
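To put the "4+ billion" figure in perspective, here is a minimal sketch that counts both address spaces with Python's standard `ipaddress` module. The variable names are illustrative only; the sizes follow from IPv4 addresses being 32 bits wide and IPv6 addresses being 128 bits wide.

```python
import ipaddress

# Every possible IPv4 and IPv6 address, expressed as one all-encompassing network.
ipv4_space = ipaddress.ip_network("0.0.0.0/0")
ipv6_space = ipaddress.ip_network("::/0")

ipv4_count = ipv4_space.num_addresses  # 2**32, a little over 4 billion
ipv6_count = ipv6_space.num_addresses  # 2**128

# Both counts are powers of two, so the ratio is an exact power of two as well.
ratio_exponent = (ipv6_count // ipv4_count).bit_length() - 1

print(f"IPv4 addresses: {ipv4_count:,}")
print(f"IPv6 addresses: {ipv6_count:,}")
print(f"IPv6 has 2**{ratio_exponent} times as many addresses as IPv4")
```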
Companies such as Google, Facebook, and Amazon are all interested in building highly efficient, large-scale distributed systems to power their businesses. Over the past decade, Google has described the Google File System (GFS), a file system that spans thousands of computers to store more data than any single computer could hold, and MapReduce, a technology that has shaped almost every form of large-scale computing since its publication. MapReduce is distributed computing for the masses because it distills everything down to two functions, Map and Reduce; once they are specified, it handles all other aspects of coordinating thousands of computers on behalf of the programmer. Facebook has released open source projects such as Thrift for implementing communication between programs written in different programming languages. Amazon built the first, and largest, public cloud, EC2, by inventing new distributed systems designed to bring datacenter scale to the masses: with EC2 you can easily start 100 servers within minutes. Amazon has offered many other services to enhance its overall cloud, such as a storage substrate called S3 (think of it as a building block for a GFS) and CloudFront, a content distribution network (CDN) designed to distribute data around the world for low-latency, high-bandwidth access. Akamai also helps deliver the web's content with one of the largest CDNs in the world. Netflix has built its own distributed CDN, having outgrown the solutions provided by Akamai and Amazon.
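To make the two-function idea concrete, here is a minimal single-machine sketch of the MapReduce programming model, using word counting as the example. It is not Google's implementation: `run_mapreduce`, `map_words`, and `reduce_counts` are hypothetical names chosen for illustration, and the driver runs everything in one process, whereas a real system would spread the map and reduce calls across many machines and handle failures.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_words(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: List[int]) -> int:
    """Reduce: combine all values emitted for one key into a single result."""
    return sum(counts)


def run_mapreduce(
    inputs: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Toy driver: group mapper output by key, then reduce each group."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for item in inputs:
        for key, value in mapper(item):
            groups[key].append(value)  # the "shuffle" step, done in memory here
    return {key: reducer(key, values) for key, values in groups.items()}


documents = ["the quick brown fox", "the lazy dog", "the fox"]
word_counts = run_mapreduce(documents, map_words, reduce_counts)
print(word_counts["the"], word_counts["fox"], word_counts["dog"])  # 3 2 1
```

The programmer writes only `map_words` and `reduce_counts`. Everything inside `run_mapreduce` is the part a real framework takes over, which is where the coordination across thousands of computers would live.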
The Domain Name System (DNS) is a large distributed system everyone is familiar with, either directly or indirectly. You may have registered a DNS name in the past, providing you with your own customized domain name such as www.wolfgangrichter.com (not registered by the author). DNS is built on a multi-tier distributed architecture for load balancing and efficiency. DNS is one of the earliest examples of a distributed key-value store, sometimes called a dictionary: something that maps arbitrary input keys to arbitrary output values. In DNS, your input key is a human-readable domain name, and the output value your computer expects DNS to return is a numeric IP address, in IPv4 or IPv6 form, meant for machine consumption. The highest tier redirects to lower tiers, and so on, to reduce load and to make those responsible for domain names host the DNS servers that hold their own mappings. You can imagine how slow the Internet would become if all domain name mappings had to be stored on a single small set of computers. The IP address is used by your web browser or other network-enabled applications to contact a server representing the human-readable domain name you provided.
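The tiered key-value lookup described above can be sketched with a few nested dictionaries. This is a toy model, not the real DNS protocol: the server names, the three-tier layout, and the `resolve` function are all assumptions made for illustration, and the addresses come from ranges reserved for documentation.

```python
from typing import Dict, Optional

# Hypothetical toy data: each tier only knows who is responsible for the tier below.
ROOT: Dict[str, str] = {"com": "tld-com", "org": "tld-org"}
TLD_SERVERS: Dict[str, Dict[str, str]] = {
    "tld-com": {"example.com": "ns-example-com"},
    "tld-org": {"example.org": "ns-example-org"},
}
AUTHORITATIVE: Dict[str, Dict[str, str]] = {
    "ns-example-com": {"www.example.com": "192.0.2.10"},    # IPv4 value
    "ns-example-org": {"www.example.org": "2001:db8::10"},  # IPv6 value
}


def resolve(name: str) -> Optional[str]:
    """Walk from the highest tier down; return an IP address string or None."""
    labels = name.lower().rstrip(".").split(".")
    if len(labels) < 2:
        return None
    tld_server = ROOT.get(labels[-1])  # tier 1: who handles this top-level domain?
    if tld_server is None:
        return None
    zone = ".".join(labels[-2:])
    auth_server = TLD_SERVERS[tld_server].get(zone)  # tier 2: who owns this domain?
    if auth_server is None:
        return None
    return AUTHORITATIVE[auth_server].get(".".join(labels))  # tier 3: the mapping


print(resolve("www.example.com"))  # 192.0.2.10
print(resolve("www.example.org"))  # 2001:db8::10
print(resolve("www.example.net"))  # None
```

No single dictionary holds every mapping. The top tier stays small because it only points downward, which is the load-reducing property the paragraph describes.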
With world economies tied to distributed systems, it is no accident that the study of distributed systems is paramount to the future of computing, and research reflects this with efforts such as the Exascale project. The Exascale project explores what future distributed systems might look like beyond the largest scale imaginable today. Going forward, no problem will escape the challenges of distributed computing, which are often messy but ultimately satisfying to overcome. The future of computing depends upon our ability to develop, deploy, and maintain distributed systems.
As the readership and an author of this blog, we must agree on a definition of ‘distributed system.’ Of course, as a distributed systems researcher, my view is colored by a lens through which I see everything as a distributed system. You may not agree with me, and we encourage discourse, so please feel free to comment with your criticism. You might wonder, “Why don’t we just use the definition in Merriam-Webster’s [not in Merriam-Webster’s!]? Or Wikipedia?” Well, everyone in academia likes to make up their own definitions for things, and occasionally their own words. I have, of course, saved the best for last. I hope that the definition below crisply defines what a distributed system is in your mind, as I hope to dissect many of the most interesting developments in distributed systems research in future articles:
In Computer Science, a distributed system is any set of entities capable of computation which also have the capability of communicating via a set of mechanisms such that computation may be organized among them.
Examples of distributed systems:
- Car – multiple embedded microprocessors
- Single core computer with graphics card – two discrete computation entities communicating via shared buses
- Multicore computer – clearly a distributed system with multiple cores
- Networked computers – at a minimum they cooperate via network protocols; in the limit they could be architected together for high performance or scientific computing
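The definition and examples above can be illustrated with a very small sketch: two entities that compute (threads, standing in for cores or machines) and one communication mechanism (a queue, standing in for a bus or network) that lets the work be organized among them. The names `worker`, `halves`, and `results` are illustrative assumptions, and threads inside one Python process are only a stand-in for physically separate components.

```python
import queue
import threading
from typing import List


def worker(numbers: List[int], results: "queue.Queue[int]") -> None:
    """One computing entity: sum its share and send the result over the channel."""
    results.put(sum(numbers))


data = list(range(1, 101))
halves = [data[:50], data[50:]]           # organize the computation among entities
results: "queue.Queue[int]" = queue.Queue()  # the communication mechanism

threads = [threading.Thread(target=worker, args=(half, results)) for half in halves]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# Combine the partial sums received over the channel.
total = sum(results.get() for _ in halves)
print(total)  # 5050
```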