The Evolution of High-Concurrency Distributed Server Architecture

1. Overview

This article uses Taobao as an example to walk through the evolution of a server-side architecture from a hundred concurrent users up to tens of millions, introducing the relevant technologies encountered at each stage, so that readers gain an overall picture of architectural evolution. It closes with a summary of some principles of architectural design.

2. Basic Concepts

Before introducing the architectures, and to help readers who are unfamiliar with architectural design, a few of the most basic concepts are defined first:

  • Distributed
    A system whose modules are deployed on different servers is called a distributed system; for example, Tomcat and the database deployed on different servers, or two Tomcat instances serving the same function deployed on different servers.
  • High availability
    If, when some nodes in the system fail, other nodes can take over and continue to provide service, the system is considered highly available.
  • Cluster
    Software in a particular domain deployed on multiple servers and providing one class of service as a whole is called a cluster. For example, the Master and Slave of ZooKeeper are deployed on multiple servers and together provide a centralized configuration service. In a typical cluster, clients can connect to any node to obtain the service, and when one cluster node goes down, the other nodes can usually take over automatically; in that case the cluster is also highly available.
  • Load balancing
    If requests sent to the system are distributed evenly to multiple nodes by some mechanism, so that every node handles the request load uniformly, the system is load balanced.
  • Forward proxy and reverse proxy
    When the internal system accesses an external network, requests are forwarded through a single proxy server, and to the external network it looks as if the proxy server initiated the access; the proxy server then implements a forward proxy. When an external request enters the system, the proxy server forwards the request to one of the system's servers; the external requester interacts only with the proxy, which then implements a reverse proxy. In short, a forward proxy accesses the external network on behalf of the internal system, while a reverse proxy forwards external requests to the internal servers on the system's behalf.

3. Architecture Evolution

3.1 Stand-alone architecture

Take Taobao as an example. When the site first launches, the number of applications and users is small, so Tomcat and the database can be deployed on the same server. When the browser requests www.taobao.com, DNS (the Domain Name System) first translates the domain name into the actual IP address 10.102.4.1, and the browser then accesses the Tomcat behind that IP.

As the number of users grows, Tomcat and the database compete for resources, and single-machine performance is no longer enough to support the business.

3.2 First evolution: deploy Tomcat and the database separately

Tomcat and the database each get exclusive use of their server's resources, significantly improving the performance of both.

As the number of users grows, concurrent reads and writes on the database become the bottleneck.

3.3 Second evolution: introduce local and distributed caches

Add a local cache on the Tomcat server (or within the same JVM), and add a distributed cache externally, caching hot product information, popular HTML pages, and the like. Caching can intercept the vast majority of requests before they reach the database, greatly reducing database pressure. The technologies involved include memcached as the local cache and Redis as the distributed cache, along with issues such as cache consistency, cache penetration/breakdown, cache avalanche, and expiry of hot data sets.
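To make the read-through caching pattern concrete, here is a minimal Python sketch. It is an illustration only: a plain dict stands in for Redis/memcached, another dict stands in for the database, and the names (`get_product`, `DATABASE`) and TTL values are invented. Briefly caching the miss itself is one common defence against the cache-penetration problem mentioned above.

```python
import time

# Hypothetical product store: DATABASE stands in for the real database,
# and `cache` stands in for a local or distributed cache such as Redis.
DATABASE = {"sku-1001": {"name": "T-shirt", "price": 59}}
cache = {}  # key -> (value, expires_at)

CACHE_TTL = 60  # normal entries live 60 s
NULL_TTL = 5    # misses ("null values") are cached briefly to block penetration

def get_product(sku):
    entry = cache.get(sku)
    if entry and entry[1] > time.time():
        return entry[0]                      # cache hit: no database access
    value = DATABASE.get(sku)                # cache miss: fall through to the database
    ttl = CACHE_TTL if value is not None else NULL_TTL
    cache[sku] = (value, time.time() + ttl)  # caching None defends against penetration
    return value
```

With this in place, repeated lookups of a nonexistent SKU hit the cached `None` for a few seconds instead of hammering the database on every request.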

The cache absorbs most access requests, but as the number of users grows, concurrency pressure falls mainly on the single Tomcat, whose responses gradually slow down.

3.4 Third evolution: introduce a reverse proxy for load balancing


Deploy Tomcat on multiple servers and use reverse proxy software (Nginx) to distribute requests evenly across the Tomcat instances. Assume here that one Tomcat supports at most 100 concurrent requests and Nginx supports at most 50,000; then in theory, with Nginx distributing requests across 500 Tomcats, the system can withstand 50,000 concurrent requests. The technologies involved include Nginx and HAProxy, both reverse proxies working at layer 7 of the network stack and mainly supporting HTTP, along with issues such as session sharing and file upload/download.
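What the reverse proxy does for each request can be approximated in a few lines of Python, sketching Nginx's default round-robin upstream policy (the backend addresses below are invented for illustration):

```python
from itertools import cycle

# Hypothetical pool of Tomcat backends behind the reverse proxy.
backends = ["10.102.4.11:8080", "10.102.4.12:8080", "10.102.4.13:8080"]
rotation = cycle(backends)

def pick_backend():
    """Return the next backend in turn, as round-robin load balancing does."""
    return next(rotation)
```

Successive calls walk the pool in order and wrap around, so each backend receives an equal share of requests.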

The reverse proxy lets the application layer support far more concurrency, but growing concurrency also means more requests penetrate through to the database, and the single-machine database eventually becomes the bottleneck.

3.5 Fourth evolution: separate database reads and writes

Split the database into a write library and read libraries; there can be multiple read libraries, kept up to date with the write library through a synchronization mechanism. For scenarios that must query freshly written data, write an extra copy to the cache and read the latest data from the cache. The technologies involved include Mycat, a database middleware that organizes read/write separation and the sharding described later; clients access the underlying databases through it. Also involved are data synchronization and data consistency problems.
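The routing that middleware such as Mycat performs transparently can be sketched as follows; the connection labels `primary`, `replica-1`, and `replica-2` are invented for illustration:

```python
import itertools

# Hypothetical connection labels for the write library and the read libraries.
PRIMARY = "primary"                             # all writes go here
replicas = itertools.cycle(["replica-1", "replica-2"])

def route(sql):
    """Send SELECT statements to a read replica, everything else to the write library."""
    if sql.lstrip().lower().startswith("select"):
        return next(replicas)
    return PRIMARY
```

A real middleware must additionally handle transactions and replication lag (a SELECT inside a write transaction still goes to the primary), which this sketch ignores.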

As businesses multiply, the traffic of different businesses diverges widely, and different businesses compete directly for the same database, affecting each other's performance.

3.6 Fifth evolution: split the database by business

Store the data of different businesses in different databases, reducing resource competition between them; for a business with heavy traffic, more servers can be deployed to support it. This also means tables belonging to different businesses can no longer be joined directly for analysis, which must be handled by other means; that is not the focus of this article, and interested readers can look up solutions on their own.

As the number of users grows, the single-machine write library gradually reaches its performance bottleneck.

3.7 Sixth evolution: split large tables into small tables

For example, comment data can be hashed by product ID and routed to the corresponding table; payment records can be split into one table per hour, and each hourly table further split into small tables, routing records by user ID or record number. As long as the table receiving real-time operations holds little enough data, and requests can be distributed evenly across these small tables on multiple servers, the database gains performance through horizontal scaling. The aforementioned Mycat also supports access control when large tables are split into small ones.
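The hash routing described above can be sketched in a few lines. The table names `comment_0` through `comment_3` and the shard count are assumptions for illustration; MD5 is used here only because it is stable across processes, unlike Python's built-in `hash`:

```python
import hashlib

SHARDS = 4  # hypothetical number of comment tables: comment_0 .. comment_3

def comment_table(product_id):
    """Route a comment row to a table chosen by a stable hash of the product ID."""
    digest = hashlib.md5(str(product_id).encode()).hexdigest()
    return f"comment_{int(digest, 16) % SHARDS}"
```

Because the hash is deterministic, reads and writes for the same product always land on the same small table; rebalancing when the shard count changes is a separate, harder problem (often solved with consistent hashing).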

This approach significantly raises the difficulty of database operations and maintenance and places higher demands on DBAs. A database designed this way can already be called a distributed database, but it is only a logical whole: the different parts of the database are implemented separately by different components. Sharding management and request distribution are implemented by Mycat, SQL parsing by the individual databases, read/write separation perhaps by a gateway and message queues, aggregation of query results perhaps by a database interface layer, and so on. This architecture is in fact one realization of the MPP (massively parallel processing) class of architectures.

There are already many MPP databases, both commercial and open source. Popular open-source examples include Greenplum, TiDB, Postgres-XC, and HAWQ; commercial examples include GBase, SnowballDB, and Huawei's LibrA. Different MPP databases have different focuses: TiDB focuses more on distributed OLTP scenarios, while Greenplum focuses more on distributed OLAP. These MPP databases basically all provide SQL support comparable to PostgreSQL, Oracle, or MySQL: they can parse a query into a distributed execution plan, distribute it for parallel execution across machines, and finally aggregate the data and return it, all within the database itself. They also provide capabilities such as permission management, sharding, transactions, and data replicas, and most can support clusters of more than 100 nodes, dramatically reducing database operation and maintenance costs while letting the database scale horizontally.

With both the database and Tomcat able to scale horizontally, the supportable concurrency rises substantially, until, as the number of users grows, the single-machine Nginx eventually becomes the bottleneck.

3.8 Seventh evolution: use LVS or F5 to load balance multiple Nginx instances

Since the bottleneck is Nginx itself, multiple Nginx instances cannot be load balanced by yet another layer of Nginx. LVS and F5 in the figure are load-balancing solutions that work at layer 4 of the network stack. LVS is software, running in the operating-system kernel, that forwards TCP requests or traffic at higher protocol levels; it therefore supports a wider range of protocols than Nginx, and its performance is much higher. A single LVS machine can be assumed to support several hundred thousand concurrent request forwards. F5 is a hardware load balancer with capabilities similar to LVS, higher performance, and a high price. Because a single LVS deployment runs on one machine, if the LVS server goes down the entire backend becomes unreachable, so a standby node is needed. The keepalived software can provide a virtual IP bound to multiple LVS servers: when a browser accesses the virtual IP, the router forwards traffic to the real LVS server, and when the primary LVS server goes down, keepalived automatically updates the router's routing table and redirects the virtual IP to a healthy LVS server, achieving high availability for the LVS layer.

Note that in the figure above, the lines from the Nginx layer to the Tomcat layer do not mean that every Nginx forwards requests to every Tomcat. In practice, some Nginx instances (made highly available among themselves with keepalived) may be attached to one subset of Tomcats, and other Nginx instances to other Tomcats; this way the number of reachable Tomcats can be multiplied.

Since the LVS layer still runs on a single machine, as concurrency grows into the hundreds of thousands the LVS server eventually reaches its bottleneck. By this point the number of users reaches tens of millions or even hundreds of millions; users are located in different regions at different distances from the data center, so access latency differs markedly.

3.9 Eighth evolution: balance load across data centers with DNS polling

A DNS server can be configured so that one domain name corresponds to multiple IP addresses, each being the virtual IP of a different data center. When a user accesses www.taobao.com, the DNS server uses round-robin polling or another policy to select an IP for that user. This achieves load balancing across data centers; the system can thus scale horizontally at the data-center level, concurrency from tens of millions up to hundreds of millions can be handled by adding data centers, and the volume of concurrent requests at the system's entrance is no longer a problem.
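DNS round robin can be sketched as follows: real DNS servers rotate the order of the returned A records on each query, so different clients tend to connect to different data centers. The IPs below are documentation addresses invented for illustration, not real VIPs:

```python
# Hypothetical virtual IPs, one per data center.
ips = ["203.0.113.10", "198.51.100.10", "192.0.2.10"]
queries = 0

def resolve(domain):
    """Return all A records, rotated one position per query (DNS round robin)."""
    global queries
    k = queries % len(ips)
    queries += 1
    return ips[k:] + ips[:k]
```

Clients typically use the first address in the answer, so rotating the list spreads users across data centers; real deployments often use geo-aware policies instead of plain rotation.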

As data grows richer and the business develops, needs such as retrieval and analysis arise that the database alone cannot satisfy.

3.10 Ninth evolution: introduce NoSQL databases and search engines

When the data in the database reaches a certain scale, the database is no longer suited to complex queries and often satisfies only ordinary query scenarios. For statistical reporting, it may fail to produce results on large data volumes, and running complex queries slows down other queries. For scenarios such as full-text search or variable data structures, a relational database is naturally unsuitable. Appropriate solutions therefore need to be introduced for particular scenarios. For mass file storage, the distributed file system HDFS can be used; for key-value data, solutions such as HBase and Redis; for full-text search, search engines such as Elasticsearch; for multidimensional analysis, solutions such as Kylin or Druid.
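The data structure at the heart of a full-text search engine such as Elasticsearch is the inverted index, which maps each term to the set of documents containing it. A toy version (the documents are invented for illustration):

```python
from collections import defaultdict

# Sample documents; in a real engine these would be product descriptions, pages, etc.
docs = {
    1: "red cotton t-shirt",
    2: "blue denim jacket",
    3: "red leather jacket",
}

# Inverted index: term -> set of document ids ("posting list").
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def search(*terms):
    """Documents containing every query term: intersect the posting lists."""
    postings = [index.get(t, set()) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []
```

A relational database would have to scan every row with `LIKE '%red%'` for the same query; the inverted index answers it with a couple of set lookups, which is why search workloads are moved to a dedicated engine.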

Of course, introducing more components also increases system complexity: the data held by different components must be synchronized, consistency must be considered, and more tooling is needed to manage and operate these components.

The components introduced above satisfy the richer needs and greatly expand the business dimensions, but as a result a single application contains too much business code, and iterating and upgrading the business becomes difficult.

3.11 Tenth evolution: split the large application into small applications

Divide the application code along business lines, so that each application's responsibilities are clearer and applications can be upgraded and iterated independently of one another. Some configuration shared between applications may be involved, which can be handled with a distributed configuration center such as ZooKeeper.

Modules shared between different applications, if managed separately by each application, lead to multiple copies of the same code, so that upgrading a shared function forces every application's copy to be upgraded in step.

3.12 Eleventh evolution: extract reused functionality into microservices

Functions such as user management, orders, payments, and authentication exist in multiple applications; they can be extracted into separate services managed independently. Such services are microservices. Applications and services access the shared services through HTTP, TCP, RPC requests, and other means, and each individual service can be managed by its own team. In addition, service-governance frameworks such as Dubbo and Spring Cloud can provide rate limiting, circuit breaking, degradation, and other capabilities, improving the stability and availability of the services.
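Circuit breaking, one of the governance capabilities mentioned above, can be sketched minimally as follows. This is an illustration of the general technique, not the actual Dubbo or Spring Cloud implementation; the class name and thresholds are invented:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: after `threshold` consecutive failures, calls are
    rejected ("fail fast") for `cooldown` seconds, shielding callers from a sick service."""

    def __init__(self, threshold=3, cooldown=30):
        self.threshold, self.cooldown = threshold, cooldown
        self.failures = 0
        self.opened_at = None  # time the circuit opened, or None if closed

    def call(self, fn, *args):
        if self.opened_at and time.time() - self.opened_at < self.cooldown:
            raise RuntimeError("circuit open: failing fast")
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.time()  # open the circuit
            raise
        self.failures = 0      # any success closes the circuit again
        self.opened_at = None
        return result
```

While the circuit is open, callers get an immediate error (or a degraded fallback) instead of waiting on timeouts against a failing dependency, which keeps one sick service from dragging down its callers.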

Different services expose different interfaces, so application code must adapt to multiple access methods to use them. Moreover, applications access services and services access one another, so the call chain becomes very complicated and the logic becomes tangled.

3.13 Twelfth evolution: introduce an Enterprise Service Bus (ESB) to mask differences between service interfaces

The ESB performs unified conversion between access protocols: applications access backend services uniformly through the ESB, and services call one another through the ESB as well, reducing coupling in the system. This architecture, in which a single application is split into multiple applications, shared services are extracted and managed separately, and an enterprise message bus decouples the services, is the SOA (service-oriented architecture); it is easily confused with microservice architecture because the descriptions are so similar. In my personal understanding, microservice architecture refers more to the idea of extracting shared services from the system to operate and manage them independently, whereas SOA refers to the architectural idea of splitting out services and unifying their access interfaces; SOA thus contains the idea of microservices.

As the business develops, applications and services keep multiplying and deployment becomes complicated. Deploying multiple services on the same server also requires resolving conflicts between their runtime environments. Moreover, for needs such as elastic scaling during big promotions, when a service's capacity must be extended horizontally, runtime environments must be prepared on the new servers and the service deployed there; operations and maintenance become very difficult.

3.14 Thirteenth evolution: introduce containers to isolate runtime environments and manage services dynamically

The most popular container technology is Docker, and the most popular container-management platform is Kubernetes (K8S). Applications and services can be packaged as Docker images, and K8S dynamically distributes and deploys the images. A Docker image can be understood as a minimal operating system capable of running your application or service, containing the service's code with the runtime environment set up according to actual needs. After this whole "operating system" is packaged as an image, it can be distributed to whichever machines need to deploy the service; simply starting the Docker image brings the service up, making service deployment and operations much simpler.

Before a big promotion, a server cluster can be carved out of the existing machines to start Docker images, boosting the service's capacity; after the promotion, the images can be shut down without affecting other services on those machines (before section 3.14, running a service on a new machine required modifying the machine's system configuration to suit the service, which could corrupt the environment needed by other services on that machine).

After adopting container technology, the dynamic-scaling problem is solved, but the machines still have to be managed by the company itself. Outside big-promotion periods, large amounts of idle machine resources must still be kept on hand to cope with them, so machine costs and operations costs are both high while resource utilization is low.

3.15 Fourteenth evolution: host the system on a cloud platform

The system can be deployed on a public cloud, using the cloud's massive machine pool to solve the dynamic-resource problem: during the big-promotion window, temporarily apply for more resources on the cloud platform, combine Docker and K8S to deploy services quickly, and release the resources after the promotion. With true pay-as-you-go, resource utilization rises while operations costs drop significantly.

A so-called cloud platform abstracts massive machine resources, through unified resource management, into a single resource pool from which hardware resources (CPU, memory, network, and so on) can be requested dynamically on demand. On top of this it provides a common operating system and common technology components (such as the Hadoop stack or MPP databases) for users, and may even provide ready-made applications, so that users can satisfy their needs without caring what technology the application uses internally (audio/video transcoding services, email services, personal blogs, and the like). A cloud platform involves the following concepts:

  • IaaS: Infrastructure as a Service. Corresponds to the above: machine resources unified into a resource pool, with hardware-level resources requested dynamically;
  • PaaS: Platform as a Service. Corresponds to the above: common technology components provided to ease system development and maintenance;
  • SaaS: Software as a Service. Corresponds to the above: ready-made applications or services provided, charged according to functionality or performance requirements.
At this point, the problems from high-concurrency access through service architecture all have their own solutions. At the same time, readers should be aware that the discussion above has deliberately ignored practical issues such as cross-data-center data synchronization and the implementation of distributed transactions; these problems deserve a separate discussion another time.

4. Architecture Design Summary

  • Must architecture adjustments follow the evolution path described above?
    No. The evolution above improves one aspect at a time in isolation; in real scenarios, several problems may need solving at once, or the first bottleneck may appear in a different aspect, in which case the actual problem should be solved as it arises. For example, a government site may have modest concurrency but very rich business scenarios; high concurrency would then not be the priority, and solutions for the rich requirements would be.
  • For a system about to be implemented, to what extent should the architecture be designed?
    For a one-off system with clear performance requirements, designing the architecture to support those requirements is sufficient, though interfaces should be left for extending the architecture in case they are needed. For a continuously evolving system, such as an e-commerce platform, the architecture should be designed to satisfy the user scale and performance requirements of the next stage, and upgraded iteratively as the business grows to support higher concurrency and richer business.
  • What is the difference between server architecture and big-data architecture?
    So-called "big data" is really a collective name for solutions to scenarios such as massive-data collection, cleansing and transformation, data storage, data analysis, and data services, each scenario containing multiple candidate technologies: Flume, Sqoop, and Kettle for data collection; the distributed file systems HDFS and FastDFS and the NoSQL databases HBase and MongoDB for storage; the Spark stack and machine-learning algorithms for analysis; and so on. Overall, big-data architecture integrates various big-data components according to business needs, generally providing capabilities such as distributed storage, distributed computing, multidimensional analysis, data warehousing, and machine-learning algorithms. Server architecture refers more to the architecture at the application-organization level, whose underlying capabilities are often provided by the big-data architecture.
  • Are there any principles of architectural design?

    • N+1 design. No component of the system should be a single point of failure;
    • Rollback design. Ensure forward compatibility so that there is a way to roll back the version when an upgrade goes wrong;
    • Disable design. Provide configuration switches that control whether specific functions are available, so that a failing function can be taken offline quickly;
    • Monitoring design. Consider the means of monitoring already at the design stage;
    • Multi-active data-center design. If the system demands extremely high availability, consider deploying it active-active across multiple data centers, so that it remains available even when one data center loses power;
    • Use mature technology. Newly developed or open-source technology often hides many bugs, and a failure without commercial support can be a disaster;
    • Resource-isolation design. Prevent a single business from consuming all resources;
    • The architecture should scale horizontally. Only a system that can scale out can avoid bottlenecks;
    • Buy rather than build non-core functions. If a non-core function would consume large R&D resources, consider buying a mature product;
    • Use commodity hardware. Commodity hardware effectively reduces the cost of hardware failures;
    • Iterate rapidly. The system should develop small functional modules quickly and go live as soon as possible to detect problems early, greatly reducing delivery risk;
    • Stateless design. Service interfaces should be stateless: access to an interface must not depend on the state of a previous access.


Source: blog.csdn.net/qq_16681169/article/details/92379547