How Did Alibaba Withstand 10 Billion in 90 Seconds?

Overview

Taking Taobao as an example, this article walks through the evolution of a server architecture from one hundred concurrent users up to tens of millions, describing the technologies introduced at each stage, so that readers gain an overall picture of how architectures evolve. It closes with a summary of some principles of architectural design.

Basic Concepts

Before introducing the architecture, and to avoid losing readers unfamiliar with architectural design, some of the most basic concepts are introduced below.

What is a distributed system?

When the modules of a system are deployed on different servers, it can be called a distributed system; for example, Tomcat and the database deployed on different servers, or two Tomcats with the same function deployed on different servers.

What is high availability?

When some nodes of a system fail and other nodes can take over and continue to provide service, the system can be considered highly available.

What is a cluster?

A cluster is software for a specific domain deployed on multiple servers that provides a class of service as a whole. For example, the masters and slaves in Zookeeper are deployed on multiple servers and together form a whole that provides a centralized configuration service. In a typical cluster, clients can connect to any node to access the service, and when a node goes down, the other nodes can automatically take over and continue serving; in that case the cluster is highly available.

What is load balancing?

When requests arriving at the system are distributed evenly across multiple nodes in some way, so that each node handles a uniform share of the request load, the system can be considered load-balanced.
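As a minimal sketch of this idea (the node names are made up; production load balancers such as Nginx or LVS implement far more sophisticated policies), round-robin distribution can be expressed as:

```python
from itertools import cycle

class RoundRobinBalancer:
    """Distribute incoming requests evenly across a fixed set of nodes."""

    def __init__(self, nodes):
        self._nodes = cycle(nodes)

    def pick(self):
        # Each call returns the next node in turn, so over time every
        # node receives the same share of the requests.
        return next(self._nodes)

balancer = RoundRobinBalancer(["tomcat-1", "tomcat-2", "tomcat-3"])
picks = [balancer.pick() for _ in range(6)]
```

After six picks, each of the three nodes has received exactly two requests, which is what "uniform load" means in the definition above.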

What is a forward proxy and reverse proxy?

When the internal system accesses the external network, requests are forwarded out through a proxy server, so that to the external network the access appears to be initiated by the proxy server; in that case the proxy server implements a forward proxy. When an external request enters the system, a proxy server forwards the request to one of the system's servers, and the external requester interacts only with the proxy server; in that case the proxy server implements a reverse proxy. Briefly, a forward proxy is a proxy server accessing the external network on behalf of the internal system, while a reverse proxy forwards external requests entering the system on to the internal servers.

The Age of Innocence: Monolithic Architecture

 

 

 Taking Taobao as an example: initially, the number of features and users was small, so Tomcat and the database could be deployed on the same server. When the browser sends a request to www.taobao.com, it first passes through the DNS server (Domain Name System), which resolves the domain name to the actual IP address 10.102.4.1, and the browser then accesses the Tomcat behind that IP. Architecture bottleneck: as the number of users grows, Tomcat and the database compete for resources, and single-machine performance is no longer enough to support the business.

First Evolution: Deploy Tomcat and the Database Separately

 

 

 Tomcat and the database each get exclusive server resources, significantly improving the performance of both. Architecture bottleneck: as the number of users grows, concurrent reads and writes to the database become the bottleneck.

Second Evolution: Introduce a Local Cache and a Distributed Cache

 

 

Add a local cache on the Tomcat server or in the same JVM, and add a distributed cache externally, caching hot product information, hot HTML pages, and the like. The cache intercepts the vast majority of requests before they read or write the database, greatly reducing database pressure. The technologies involved include memcached as a local cache and Redis as a distributed cache, along with issues such as cache consistency, cache penetration/breakdown, cache avalanche, and expiry of hot data sets. Architecture bottleneck: the cache absorbs most of the requests, but as the number of users grows, concurrent pressure falls mainly on the single Tomcat, whose responses gradually slow down.
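A toy read-through sketch of the two-tier lookup described above (plain dicts stand in for the local cache, the distributed cache, and the database; the key name is made up):

```python
class TwoTierCache:
    """Read-through lookup: check the local cache first, then the
    distributed cache, and only on a double miss hit the database."""

    def __init__(self, db):
        self.local = {}    # stands in for an in-process (JVM-local) cache
        self.remote = {}   # stands in for memcached/Redis
        self.db = db       # stands in for the database
        self.db_reads = 0

    def get(self, key):
        if key in self.local:
            return self.local[key]
        if key in self.remote:
            value = self.remote[key]
            self.local[key] = value   # backfill the faster tier
            return value
        self.db_reads += 1            # only double misses reach the database
        value = self.db[key]
        self.remote[key] = value
        self.local[key] = value
        return value

cache = TwoTierCache(db={"item:1": "hot product page"})
first = cache.get("item:1")   # double miss: goes to the database once
second = cache.get("item:1")  # served from the local cache
```

After the first lookup populates both tiers, repeated reads of the hot key never touch the database again, which is exactly how the cache shields it from most traffic.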

Third Evolution: Introduce a Reverse Proxy for Load Balancing

 

 

 Deploy Tomcat on multiple servers and use reverse proxy software (Nginx) to distribute requests evenly to each Tomcat. Here we assume that one Tomcat supports at most 100 concurrent requests and Nginx supports at most 50,000; in theory, by having Nginx distribute requests across 500 Tomcats, the system can withstand 50,000 concurrent requests. The technologies involved include Nginx and HAProxy, both reverse-proxy software working at layer seven of the network stack and mainly supporting the HTTP protocol; they also raise issues such as session sharing and file upload/download. Architecture bottleneck: the reverse proxy greatly increases the concurrency the application layer can support, but the growth in concurrency also means more requests penetrate through to the database, and the single-machine database eventually becomes the bottleneck.

Fourth Evolution: Separate Database Reads and Writes

 

 

 Split the database into a write library and read libraries; there can be multiple read libraries, kept up to date with the write library through a synchronization mechanism. For scenarios that must query data that has just been written, an extra write to the cache lets the latest data be fetched from the cache. The technologies involved include Mycat, a database middleware through which read/write splitting, and the sharding by database and by table introduced below, can be organized; database clients access the underlying databases through it. This also raises issues of data synchronization and data consistency. Architecture bottleneck: the business keeps growing, the traffic gap between different businesses widens, and different businesses compete directly on the same database, degrading each other's performance.
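The routing decision such middleware makes can be sketched as follows (a toy: the server names are made up, and real middleware like Mycat parses SQL properly rather than sniffing keywords):

```python
import random

class ReadWriteRouter:
    """Route writes to the primary (write library) and spread reads
    across the replicas (read libraries)."""

    def __init__(self, primary, replicas, seed=None):
        self.primary = primary
        self.replicas = replicas
        self._rng = random.Random(seed)

    def route(self, sql):
        # Toy classification: SELECTs are reads, everything else writes.
        if sql.lstrip().upper().startswith("SELECT"):
            return self._rng.choice(self.replicas)
        return self.primary

router = ReadWriteRouter("db-primary", ["db-replica-1", "db-replica-2"])
write_target = router.route("INSERT INTO orders VALUES (1)")
read_target = router.route("SELECT * FROM orders")
```

All writes land on one node (so there is a single source of truth to replicate from), while read traffic, usually the larger share, fans out across as many replicas as needed.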

Fifth Evolution: Split the Database by Business

 

 Store the data of different businesses in different databases, reducing resource competition between businesses; for a business with heavy traffic, more servers can be deployed to support it. At the same time, tables that span businesses can no longer be joined directly for analysis, and such needs must be addressed by other means; this is not the focus of this article, and interested readers can look up solutions on their own. Architecture bottleneck: as the number of users grows, the single-machine write library gradually reaches a performance bottleneck.

Sixth Evolution: Split Large Tables into Small Tables

 

 For example, comment data can be hashed by product ID and routed to the corresponding table; for payment records, tables can be created by the hour, each hourly table split further into small tables, with the user ID or record number used to route the data. As long as the amount of data in the tables being operated on in real time is small enough, and requests can be distributed evenly enough across the small tables on multiple servers, the database can improve performance through horizontal scaling. The aforementioned Mycat also supports access control in the scenario where large tables are split into small tables. This approach significantly increases the difficulty of database operations and raises the bar for DBAs. With this structure, the database can already be called a distributed database, but it is only a logical whole; its different parts are implemented by different components: sharding management and request distribution by Mycat, SQL parsing by the standalone databases, read/write splitting possibly by gateways and message queues, aggregation of query results possibly by a database interface layer, and so on. This architecture is in fact one class of implementation of the MPP (massively parallel processing) architecture.
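The hash-routing step for comments can be sketched like this (a toy: the table-naming scheme, shard count, and product ID are all made up; real sharding middleware also handles rebalancing and cross-shard queries):

```python
from zlib import crc32

def comment_shard(product_id: str, shard_count: int = 16) -> str:
    """Pick the shard table for a comment by hashing the product ID.
    crc32 is used because it is deterministic across runs, unlike
    Python's built-in salted hash() for strings."""
    return f"comment_{crc32(product_id.encode()) % shard_count}"

# All comments on the same product route to the same small table,
# so point lookups for one product touch exactly one shard.
shard_a = comment_shard("item-10086")
shard_b = comment_shard("item-10086")
```

Because the hash of a given ID is stable, writes and reads for one product always agree on the shard, and the 16 small tables can be spread across as many servers as needed.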
There are already many MPP databases, both commercial and open source. Popular open-source ones include Greenplum, TiDB, PostgreSQL-XC, and HAWQ; commercial ones include GBase, Snowball DB, Huawei's LibrA, and others. Different MPP databases have different focuses: TiDB, for example, focuses more on distributed OLTP scenarios, while Greenplum focuses more on distributed OLAP. These MPP databases basically all provide SQL support comparable to PostgreSQL, Oracle, or MySQL: the ability to parse a query into a distributed execution plan, distribute it to run in parallel on each machine, and have the database itself aggregate the data and return it. They also provide capabilities such as permission management, sharding, transactions, and data replicas, and most can support clusters of 100 or more nodes, greatly reducing database operation costs and enabling horizontal scaling. Architecture bottleneck: both the database and Tomcat can now scale horizontally and support a great increase in concurrency; as the number of users grows, the single-machine Nginx eventually becomes the bottleneck.

Seventh Evolution: Use LVS or F5 to Load-Balance Multiple Nginx Instances

 

 

 Because the bottleneck is now in Nginx, it cannot be solved with two layers of Nginx balancing each other. The LVS and F5 in the figure are load-balancing solutions working at layer four of the network stack. LVS is software that runs in the operating system's kernel mode and forwards TCP requests, or requests of higher-level protocols, so the protocols it supports are richer and its performance is much higher than Nginx's; one can assume that a single LVS instance can support several hundred thousand concurrent request forwards. F5 is a hardware load balancer with capabilities similar to LVS and performance higher than LVS, but it is expensive. Because LVS is single-machine software, if the LVS server goes down the entire backend becomes inaccessible, so a standby node is needed. The keepalived software can simulate a virtual IP bound to multiple LVS servers: when the browser accesses the virtual IP, the router redirects it to the real LVS server, and when the primary LVS server goes down, keepalived automatically updates the router's routing table and redirects the virtual IP to another healthy LVS server, achieving high availability for the LVS layer. Note that in the figure above, the links from the Nginx layer to the Tomcat layer do not mean that every Nginx forwards requests to every Tomcat. In practice, several Nginx instances may connect to a subset of the Tomcats, with high availability among those Nginx instances achieved through keepalived, while other Nginx instances connect to other Tomcats; in this way the number of Tomcats that can be added on is doubled.
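The failover behavior keepalived provides can be sketched as follows (a toy model only: node names are made up, and real keepalived uses the VRRP protocol and gratuitous ARP rather than a lookup function):

```python
def resolve_virtual_ip(lvs_nodes, health):
    """Return the node that should currently own the virtual IP:
    the first healthy node in priority order. While the primary is
    healthy it owns the VIP; when it dies, the VIP moves to the
    next healthy node, so clients keep using the same address."""
    for node in lvs_nodes:
        if health.get(node, False):
            return node
    raise RuntimeError("no healthy LVS node to hold the virtual IP")

nodes = ["lvs-primary", "lvs-backup"]
owner_before = resolve_virtual_ip(nodes, {"lvs-primary": True, "lvs-backup": True})
owner_after = resolve_virtual_ip(nodes, {"lvs-primary": False, "lvs-backup": True})
```

The key property is that the client-facing address (the virtual IP) never changes; only the machine answering for it does, which is what makes the failover transparent.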
Architecture bottleneck: since LVS is also single-machine, as concurrency grows into the hundreds of thousands the LVS server eventually reaches its own bottleneck. By this point the number of users has reached the tens of millions or even hundreds of millions, and users in different regions sit at different distances from the server room, so their access latencies differ markedly.

Eighth Evolution: Load-Balance Across Machine Rooms via DNS Round-Robin

 

 A DNS server can be configured so that one domain name corresponds to multiple IP addresses, each a virtual IP in a different machine room. When a user accesses www.taobao.com, the DNS server uses a round-robin or other policy to select one IP for the user to access. This achieves load balancing across machine rooms; from here on the system can scale horizontally at the machine-room level, concurrency at the level of tens of millions to hundreds of millions can be handled by adding machine rooms, and concurrency at the system's request entry point is no longer a problem. Architecture bottleneck: as the business grows richer and data accumulates, needs such as retrieval and analysis become increasingly diverse, and the database alone can no longer satisfy them all.
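A toy model of the DNS round-robin policy (the first IP echoes the one used earlier in this article; the second is invented for illustration, and real DNS involves TTLs, caching resolvers, and geo-aware policies this sketch ignores):

```python
from itertools import cycle

class PollingDNS:
    """Toy DNS view: one domain maps to several virtual IPs, one per
    machine room, and successive resolutions rotate through them."""

    def __init__(self, records):
        self._cycles = {name: cycle(ips) for name, ips in records.items()}

    def resolve(self, name):
        # Each resolution hands out the next machine room's VIP.
        return next(self._cycles[name])

dns = PollingDNS({"www.taobao.com": ["10.102.4.1", "10.102.5.1"]})
answers = [dns.resolve("www.taobao.com") for _ in range(4)]
```

Successive users are steered to alternate machine rooms, so adding a room is as simple as appending its VIP to the record.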

Ninth Evolution: Introduce NoSQL Databases and Search Engines

 

 When the data in the database reaches a certain scale, the database is no longer suited to complex queries and often can only handle ordinary query scenarios. For statistical reporting over large volumes of data, results may never finish computing, and running complex queries slows down other queries. For scenarios such as full-text search and variable data structures, the database is inherently unsuitable. Hence the need to introduce the right solution for each particular scenario. For mass file storage, a distributed file system such as HDFS can be used; for key-value data, solutions such as HBase and Redis; for full-text search, search engines such as ElasticSearch; for multi-dimensional analysis, solutions such as Kylin or Druid. Of course, introducing more components also increases the system's complexity: data saved by different components must be synchronized, consistency must be considered, and more operations tooling is needed to manage these components. Architecture bottleneck: the additional components satisfy the richer needs and greatly expand the business dimensions, but as a result a single application contains too much business code, and iterating and upgrading the business becomes difficult.

Tenth Evolution: Split the Large Application into Small Applications

 

 Divide the application code by business line, so each application has clearer responsibilities and can be upgraded and iterated independently of the others. At this point some configuration may be shared between applications, which can be handled by a distributed configuration center such as Zookeeper. Architecture bottleneck: modules shared between different applications, when managed separately by each application, result in multiple copies of the same code, so that upgrading a common feature requires upgrading it in every application.

Eleventh Evolution: Extract Reused Functionality into Microservices

 

 Functions such as user management, orders, payment, and authentication exist in multiple applications; they can be extracted into separately managed services. Such a service is called a microservice. Applications access these common services through various means such as HTTP, TCP, or RPC requests, and each individual service can be managed by a separate team. In addition, service governance frameworks such as Dubbo and SpringCloud can provide rate limiting, circuit breaking, degradation, and other functions, improving the stability and availability of the services. Architecture bottleneck: different services expose different interfaces, and application code must adapt to multiple access methods to use them. Moreover, applications access services and services access each other, so the call chain becomes very complicated and the logic becomes chaotic.
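The circuit-breaking behavior such governance frameworks provide can be sketched like this (a simplified model with an invented threshold and fallback; real implementations in Dubbo or SpringCloud/Hystrix add half-open probing, time windows, and metrics):

```python
class CircuitBreaker:
    """Open the circuit after N consecutive failures; further calls
    fail fast with a fallback instead of hammering a sick service."""

    def __init__(self, threshold=3, fallback="degraded response"):
        self.threshold = threshold
        self.fallback = fallback
        self.failures = 0

    def call(self, fn):
        if self.failures >= self.threshold:   # circuit is open: fail fast
            return self.fallback
        try:
            result = fn()
            self.failures = 0                 # a success closes the circuit
            return result
        except Exception:
            self.failures += 1
            return self.fallback

breaker = CircuitBreaker(threshold=2)

def flaky():
    raise RuntimeError("service down")

results = [breaker.call(flaky) for _ in range(4)]
```

After two consecutive failures the breaker stops invoking the downstream service at all, which is the "fail fast, degrade gracefully" property that keeps one sick service from dragging down its callers.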

Twelfth Evolution: Introduce an Enterprise Service Bus (ESB) to Shield Differences Between Service Interfaces

 

 The ESB performs unified access-protocol conversion: applications access backend services uniformly through the ESB, and services call each other through the ESB as well, reducing the system's coupling. This architecture, in which a single application is split into multiple applications, common services are extracted and managed separately, and an enterprise service bus is used to decouple the services, is called SOA (service-oriented architecture). It is easily confused with the microservices architecture because the terms sound so similar. In my personal understanding, microservices architecture refers more to the idea of extracting a system's common services to be operated and managed separately, while SOA architecture refers to the idea of unifying the interfaces through which the split-out services are accessed; SOA architecture thus contains the idea of microservices. Architecture bottleneck: as the business develops, the number of applications and services keeps growing and their deployment becomes complicated; deploying multiple services on the same server also requires resolving conflicts between runtime environments. Moreover, for scenarios that need elastic scaling, such as big promotions, horizontally scaling a service requires preparing the runtime environment on new servers and deploying the service, so operations become very difficult.

Thirteenth Evolution: Introduce Container Technology for Runtime Isolation and Dynamic Service Management

 

 The most popular container technology at present is Docker, and the most popular container management service is Kubernetes (K8S). Applications/services can be packaged as Docker images, and K8S can dynamically distribute and deploy the images. A Docker image can be understood as a minimal operating system capable of running your application/service: it holds the application/service's code, with the runtime environment set up according to actual needs. After the whole "operating system" is packaged as an image, it can be distributed to whichever machines need to deploy the related service, and starting the Docker image directly brings the service up, making deployment and operations simpler. Before a big promotion, a set of machines in the existing cluster can be carved out to start Docker images, boosting service capacity, and after the promotion the images can be shut down without affecting other services on those machines (previously, running a service on a newly added machine required modifying the machine's system configuration to suit the service, which would break the runtime environments needed by the machine's other services). Architecture bottleneck: with container technology the problem of dynamically scaling services is solved, but the machines themselves still have to be managed by the company. Outside of big promotions, large amounts of machine resources must sit idle just to cope with them, so machine cost and operations cost are both high while resource utilization is low.

Fourteenth Evolution: Carry the System on a Cloud Platform

 

 The system can be deployed to a public cloud, using the public cloud's massive machine resources to solve the problem of dynamic hardware resources: during a big promotion, temporarily apply for more resources on the cloud platform and combine Docker and K8S to deploy services quickly, then release the resources after the promotion ends. This is true pay-per-use; resource utilization rises and operations costs drop significantly. A so-called cloud platform abstracts massive machine resources, through unified resource management, into a single resource whole. On the cloud platform, hardware resources (such as CPU, memory, and network) can be applied for dynamically and common operating systems are provided on top of them; common technology components (such as the Hadoop stack and MPP databases) are provided for use; and even fully developed applications are provided, so that users can meet their needs (such as an audio/video transcoding service, an email service, or a personal blog) without caring what technology the application uses internally.

A cloud platform involves the following concepts:

  1. IaaS: Infrastructure as a Service. Corresponds to the unification of machine resources into a whole mentioned above, with hardware-level resources applied for dynamically;

  2. PaaS: Platform as a Service. Corresponds to the provision of the common technology components mentioned above, easing system development and maintenance;

  3. SaaS: Software as a Service. Corresponds to the provision of fully developed applications or services, charged according to functionality or performance requirements.

At this point: everything described above, from the high-concurrency access problem up through the service architecture, has its own solution.

It should also be noted that the discussion above deliberately ignores some real-world issues, such as data synchronization across machine rooms and the implementation of distributed transactions; these problems will be discussed separately when the opportunity arises.

 

Origin www.cnblogs.com/Dominic-Ji/p/11883326.html