Exploring the Mysteries Behind Google App Engine (2): A Conjecture on Google's Overall Architecture (reprint)

Note: This is part of a guest blog post series. The contributor, Wu Zhu Hua, previously did cloud computing research at the IBM China Research Institute and now works on cloud computing technology.

This article summarizes and speculates about Google's overall architecture, based on existing public information and personal experience.

In software engineering there is a consensus that "requirements determine the architecture": infrastructure is developed in order to better support applications. So before describing the architecture, let us first look at the main products Google offers.

Products

We are all very familiar with Google and several of its major products, such as search and mail, but the services it provides go well beyond these and can be divided into six categories:

  • Search: web search, image search, and video search.
  • Advertising systems: AdWords and AdSense.
  • Productivity tools: Gmail, Google Apps, and so on.
  • Geographic products: Google Maps, Google Earth, Google Sky, and so on.
  • Video playback: YouTube.
  • PaaS platform: Google App Engine.

Design philosophy

According to available information, Google's design philosophy can be summed up in the following six points:

  • Scale, Scale, Scale: because most of Google's services face user bases in the millions or more, scalability is deeply embedded in Google's DNA. To help developers build distributed applications and services, Google has not only developed the MapReduce framework for large-scale data processing but also introduced Google App Engine, a PaaS platform for deploying distributed applications.
  • Fault tolerance: in any distributed system, even one built on expensive minicomputers or mainframes, software and hardware errors occur from time to time, and Google's distributed systems are built on cheap x86 servers. Even if each machine's nominal MTBF (mean time between failures) is high, a cluster contains so many machines that the chance of some error occurring is very high. Kai-fu Lee once gave an example: in a cluster of more than 20,000 x86 servers, about 110 machines go down or suffer other faults every day. Fault tolerance therefore cannot be ignored, a point that Google Fellow Jeffrey Dean has also made many times in his talks.
  • Low latency: latency is a very important factor in the user experience. Marissa Mayer, a Google vice president, once said: "If each search is delayed by more than half a second, use of the search service drops by 20%." As this example shows, low latency is critical to the user experience. To avoid the delays imposed by the speed of light and by complex network environments, Google has set up local data centers in many regions.
  • Inexpensive hardware and software: because the volume of data and requests Google processes every day is unprecedented, existing server and commercial software vendors would find it hard to tailor a distributed system for Google, and even if one could be designed and built, Google could not afford its price. Its millions of servers therefore mostly use cheap x86 systems and open-source Linux, and Google has developed its own distributed software stack, including MapReduce (mentioned in Part 1), BigTable, and GFS.
  • Move computation, not data: although Moore's Law has kept many resources, such as bandwidth, growing steadily, the cost of moving data is still far greater than the cost of moving computation. So when processing large-scale data, Google still prefers to move the computation rather than the data.
  • Service model: services are pervasive in Google's systems; for example, its core search engine relies on 700 to 1,000 internal services. This loosely coupled, service-based development model has advantages in testing, development, and scaling, because it suits small development teams and is easy to test.
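
To make the fault-tolerance figures above concrete, here is a rough back-of-the-envelope sketch in Python. It takes Kai-fu Lee's numbers (20,000 servers, about 110 failures per day) at face value and assumes, purely for illustration, that failures are independent and spread evenly across machines. The function names are hypothetical and not part of any Google tooling.

```python
def daily_failure_rate(servers: int, failures_per_day: float) -> float:
    """Fraction of machines expected to fail on a given day."""
    return failures_per_day / servers


def implied_mtbf_days(servers: int, failures_per_day: float) -> float:
    """Per-machine mean time between failures implied by the cluster-wide rate."""
    return servers / failures_per_day


def prob_at_least_one_failure(servers: int, per_machine_daily_rate: float) -> float:
    """Chance that at least one machine fails in a day (independence assumed)."""
    return 1.0 - (1.0 - per_machine_daily_rate) ** servers


# Figures quoted in the article; everything derived from them is an estimate.
SERVERS = 20_000
FAILURES_PER_DAY = 110

rate = daily_failure_rate(SERVERS, FAILURES_PER_DAY)
mtbf = implied_mtbf_days(SERVERS, FAILURES_PER_DAY)
p_any = prob_at_least_one_failure(SERVERS, rate)

print(f"per-machine daily failure rate: {rate:.4%}")
print(f"implied per-machine MTBF: {mtbf:.1f} days")
print(f"probability of at least one failure per day: {p_any:.6f}")
```

With these inputs the implied per-machine MTBF works out to roughly 182 days, and the chance of a failure-free day for the whole cluster is effectively zero, which is why fault tolerance has to be designed in from the start.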

Conjecture on the overall architecture

This part on the overall architecture first covers Google's three main workloads, then attempts to classify its data centers, and finally gives a brief summary.

Three workloads

Google's workload is in fact not limited to search; it can be divided into three main categories:

  • Local interaction: provides basic Google services, such as web search, to users in the local region, while handing content generation and management (for example, building the required search index) over to the content delivery system described next. Local interaction reduces latency for users and so improves the user experience, but its SLA requirements are demanding because it faces customers directly.
  • Content delivery: stores, generates, and manages content for most Google services, for example building the required search index and storing YouTube video and Gmail data. The content delivery system is mainly built on Google's own distributed software stack. This system prioritizes throughput and cost rather than SLA.
  • Business-critical: includes some of Google's enterprise-level systems, such as the customer management and human resources systems used in daily operations, and the revenue-generating advertising systems (AdWords and AdSense). SLA requirements for business-critical workloads are very high.

Two types of data center

According to 2008 data, Google has 37 data centers worldwide, including 19 in the United States, 12 in Europe, three in Asia (Beijing, Hong Kong, Tokyo), and another three in Russia and South America. Figure 1 shows the distribution of the data centers around the world:

pingdom_google_map_worldwide.jpg

Figure 1. Worldwide distribution of Google data centers in 2008

Judging from the last few quarterly reports and a talk Jeffrey Dean gave at the end of 2009, one can speculate that Google did not add many data centers worldwide in 2009. The total should still be slightly more than 36, though new data centers have most likely been added in Taiwan, Malaysia, and Lithuania.

Although Google has many data centers, they differ from one another and can be divided into two categories: giant data centers and medium-to-large data centers.

Giant data centers: the number of servers should exceed a hundred thousand. They are often located next to power plants to obtain cheaper energy, and they mainly serve Google's internal services, that is, content delivery. Their design focuses on cost and throughput, so they introduce a large amount of custom hardware and software to reduce PUE and improve processing capacity, but their SLA requirements are not particularly stringent: being available most of the time is enough. Below is a representative Google giant data center. It is located in The Dalles, on the Columbia River in northern Oregon, covers nearly 30 acres, and takes up most of the power output of a nearby 1.8 GW hydroelectric power station. Once fully operational, it will consume 103 megawatts of electricity, equivalent to the entire consumption of a small or medium-sized city.

google DC.jpg

Figure 2. Close-up view of Google's giant data center on the Columbia River in Oregon

Medium-to-large data centers: the number of servers ranges from roughly a thousand to ten thousand. They can be used for local interaction or business-critical workloads, and their design and construction place great emphasis on latency and high availability, so they are located as close to users as possible and use standard hardware and software, such as Dell servers and MySQL databases. A typical PUE is probably between 1.5 and 1.9. The Google China data center, originally housed in the "Century Internet" facility near Jiuxianqiao in Chaoyang District, Beijing, also belongs to this category. It uses hardware such as Dell workstations and Juniper firewalls; a corner of it is shown below.

2008421124418.jpg

Figure 3. A corner of the Google China data center (see [26])

The differences between the two are detailed in the following table:

                         Giant data centers           Medium-to-large data centers
Workload                 Content delivery             Local interaction / business-critical
Location                 Near power plants            Near users
Design features          High throughput, low cost    Low latency, high availability
Server customization     Extensive                    Limited
SLA                      Ordinary                     High
Number of servers        Over a hundred thousand      Over a thousand
Number of data centers   Fewer than ten               Dozens
PUE estimate             1.2                          1.5

Table 1. Comparison of giant and medium-to-large data centers
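
PUE (power usage effectiveness) is the ratio of a facility's total power draw to the power consumed by its IT equipment, so a lower value means less overhead for cooling and power distribution. The following Python sketch compares the two PUE estimates from Table 1. It assumes, for illustration only, that the 103 MW figure quoted for The Dalles is total facility power; the article does not say so, and the function names are hypothetical.

```python
def it_power_mw(total_facility_mw: float, pue: float) -> float:
    """Power left for servers and other IT gear, given PUE = total / IT."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return total_facility_mw / pue


def overhead_mw(total_facility_mw: float, pue: float) -> float:
    """Power spent on cooling, power distribution, and other non-IT uses."""
    return total_facility_mw - it_power_mw(total_facility_mw, pue)


# Assumption: 103 MW is the total facility draw (not stated in the article).
TOTAL_MW = 103.0

for label, pue in [("giant (PUE 1.2)", 1.2), ("medium-to-large (PUE 1.5)", 1.5)]:
    it = it_power_mw(TOTAL_MW, pue)
    extra = overhead_mw(TOTAL_MW, pue)
    print(f"{label}: IT load {it:.1f} MW, overhead {extra:.1f} MW")
```

Under that assumption, a PUE of 1.2 leaves about 86 MW for servers, against about 69 MW at a PUE of 1.5, which illustrates why the cost-focused giant data centers invest in custom hardware to push PUE down.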

Summary

Finally, a brief summary. First, when ordinary users access Google services, most requests are forwarded, based on the request's IP address or the ISP it belongs to, to the user's local data center; if the local data center cannot handle a request, it will most likely forward it to a remote content delivery center. Second, when an advertiser wants to access Google's advertising system, the request is forwarded directly to a dedicated business-critical data center for processing.

google architecture.PNG

Figure 4. Summary
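
The request flow summarized above can be sketched as a small routing function in Python. This is only an illustration of the conjecture in this article, not Google's actual routing logic: the data center names, the region keys, the default region, and the `needs_content_backend` flag are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical data center names, one local center per region.
LOCAL_DATA_CENTERS = {
    "us": "local-dc-us",
    "europe": "local-dc-europe",
    "asia": "local-dc-asia",
}
CONTENT_DELIVERY_DC = "giant-content-delivery-dc"
BUSINESS_CRITICAL_DC = "business-critical-dc"


@dataclass
class Request:
    client_region: str                    # assumed to be derived from the client's IP or ISP
    kind: str                             # "user" or "advertiser"
    needs_content_backend: bool = False   # True if the local center cannot serve it alone


def route(request: Request) -> List[str]:
    """Return the ordered list of data centers a request passes through."""
    if request.kind == "advertiser":
        # Advertising traffic goes straight to the business-critical center.
        return [BUSINESS_CRITICAL_DC]
    # Ordinary users are sent to their local center first (default region is an assumption).
    local = LOCAL_DATA_CENTERS.get(request.client_region, LOCAL_DATA_CENTERS["us"])
    hops = [local]
    if request.needs_content_backend:
        # The local center forwards what it cannot handle to a remote content delivery center.
        hops.append(CONTENT_DELIVERY_DC)
    return hops


print(route(Request("asia", "user")))
print(route(Request("europe", "user", needs_content_backend=True)))
print(route(Request("us", "advertiser")))
```

Keeping the advertiser path separate mirrors the article's point that business-critical workloads have stricter SLA requirements than ordinary content delivery.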

Because this article is a summary and conjecture based on existing public information and personal experience, it should not be taken as a description of how Google actually operates.

This concludes this installment; the next one will introduce Google App Engine and its main components.

--EOF--

Reproduced from: https://www.cnblogs.com/licheng/archive/2010/09/09/1821967.html

Origin blog.csdn.net/weixin_34067102/article/details/92626863