Architecture evolution


Overview

This article uses Taobao as an example to walk through how a server-side architecture evolves from handling hundreds of concurrent users to tens of millions. It also lists the related technologies encountered at each stage of the evolution, so that readers gain an overall picture of how architectures evolve, and it closes with some principles of architecture design.

Basic concepts

Before introducing the architecture, and to make sure every reader understands the terms used in architecture design, here are some of the most basic concepts:

• Distributed: when multiple modules of a system are deployed on different servers, it can be called a distributed system — for example, Tomcat and the database deployed on different servers, or two Tomcats with the same function deployed on different servers.
• High availability: when some nodes in the system fail, other nodes can take over and continue to provide services; such a system can be considered highly available.
• Cluster: a specific piece of software is deployed on multiple servers and provides one class of service as a whole; this whole is called a cluster. For example, the Master and Slave of ZooKeeper are deployed on separate servers and together provide a centralized configuration service. In a typical cluster, a client can connect to any node to obtain service, and when one node goes offline the other nodes automatically take over and continue serving — which is to say the cluster is highly available.
• Load balancing: when requests sent to the system are evenly distributed to multiple nodes by some mechanism, so that every node handles an even share of the request load, the system can be considered load balanced.
• Forward proxy and reverse proxy: when the system accesses the external network, it forwards requests through a proxy server; from the external network's point of view, the access is initiated by the proxy server — the proxy server implements a forward proxy. When an external request enters the system, a proxy server forwards it to some server inside the system; the outside world only ever interacts with the proxy server — the proxy server implements a reverse proxy.
To put it simply: a forward proxy stands in for the inside of the system when it accesses the external network, while a reverse proxy takes external requests aimed at the system and forwards them to internal servers.

Architecture evolution

Stand-alone architecture

In the initial stage, Tomcat and the database share a single server. However, as the number of users grows, Tomcat and the database compete for resources, and the performance of a single machine is no longer enough to support the business.

The first evolution: Tomcat is deployed separately from the database

Tomcat and the database each have a server's resources to themselves, significantly improving their respective performance.

However, as the number of users grows, concurrent reading and writing to the database becomes a bottleneck.

The second evolution: the introduction of local cache and distributed cache

Concurrent reading and writing of the database has become the bottleneck, and the database is under too much pressure. How to solve it? Caching.

Add a local cache on the Tomcat server (within the same JVM), and add a distributed cache externally, to cache hot product information, the HTML pages of hot products, and so on. Through caching, most requests can be intercepted before they reach the database, greatly reducing database pressure.

The technologies involved include: Memcached as a local cache and Redis as a distributed cache, along with problems such as cache consistency, cache penetration/breakdown, cache avalanche, and the concentrated expiry of hot data.
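The read path described above is usually implemented as a cache-aside pattern. Below is a minimal sketch, with plain dicts standing in for the local in-process cache, the distributed cache, and the database (the names and data are invented for illustration; a real system would use Memcached/Redis client libraries and handle TTLs and invalidation):

```python
# Cache-aside read path: local cache -> distributed cache -> database.
# Plain dicts stand in for the in-process cache, Redis, and the database.
local_cache = {}
distributed_cache = {}
database = {"product:1": {"name": "widget", "price": 99}}

def get_product(key):
    # 1. Try the in-process local cache first (fastest, per-server).
    if key in local_cache:
        return local_cache[key]
    # 2. Fall back to the shared distributed cache.
    if key in distributed_cache:
        value = distributed_cache[key]
        local_cache[key] = value   # back-fill the local cache for next time
        return value
    # 3. Cache miss: read the database and back-fill both caches.
    value = database.get(key)
    if value is not None:
        distributed_cache[key] = value
        local_cache[key] = value
    return value
```

After the first miss populates both caches, subsequent reads of the same key never touch the database — which is exactly how most requests get intercepted.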

The cache absorbs most access requests, but as the number of users grows, the concurrency pressure falls mainly on the single Tomcat, and responses gradually slow down.

The third evolution: the introduction of reverse proxy to achieve load balancing

Deploy Tomcat on multiple servers and use reverse-proxy software (Nginx) to distribute requests evenly across them. Assuming a single Tomcat supports up to 100 concurrent requests and Nginx supports up to 50,000, Nginx can in theory distribute requests to 500 Tomcats and withstand 50,000 concurrent requests.

The technologies involved include Nginx and HAProxy, both reverse-proxy software working at layer 7 of the network stack and mainly supporting the HTTP protocol; session sharing and file upload/download issues are also involved.
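The even distribution Nginx performs can be sketched as a simple round-robin scheduler. This is a toy model, not Nginx's actual implementation (Nginx also offers weighted, least-connections, and IP-hash strategies); the backend names are made up:

```python
from itertools import cycle

class RoundRobinBalancer:
    """Hand incoming requests to backend servers in turn, so each
    backend receives an even share of the load."""
    def __init__(self, backends):
        self._backends = cycle(backends)

    def pick(self):
        return next(self._backends)

lb = RoundRobinBalancer(["tomcat-1:8080", "tomcat-2:8080", "tomcat-3:8080"])
# Six requests: each of the three backends is picked exactly twice.
targets = [lb.pick() for _ in range(6)]
```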

The reverse proxy greatly increases the concurrency the application servers can support, but higher concurrency also means more requests penetrating through to the database, and the single-machine database eventually becomes the bottleneck.

The fourth evolution: separation of database read and write

The database is split into a read library and a write library. There can be multiple read libraries, and data written to the write library is synchronized to the read libraries through a synchronization mechanism. For scenarios that must query freshly written data, an extra copy can be written to the cache so the latest data is fetched from there.

The technologies involved include Mycat, a piece of database middleware that implements read-write splitting and sharding (sub-database, sub-table); clients access the underlying databases through it. Data synchronization and data-consistency issues are also involved.
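The routing decision a middleware like Mycat makes can be sketched as: writes go to the single write library, reads are spread across the read libraries. This is a deliberately naive model (real middleware actually parses SQL and must account for replication lag); the statement classification and server names are illustrative:

```python
import itertools

class ReadWriteRouter:
    """Route SQL statements: SELECTs to read replicas in round-robin,
    everything else (INSERT/UPDATE/DELETE/DDL) to the write library."""
    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def route(self, sql):
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._replicas)
        return self.primary

router = ReadWriteRouter("db-primary", ["db-replica-1", "db-replica-2"])
```

A query that must see just-written data would bypass this router and hit the cache (or the primary), as described above.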

Businesses gradually multiply, and traffic differs greatly between them. Different businesses compete directly for the same database and drag down each other's performance.

The fifth evolution: database is divided by business

Save the data of different businesses in different databases to reduce resource competition between them. For businesses with heavy traffic, more servers can be deployed in support.

At the same time, tables from different businesses can no longer be joined directly for analysis; this has to be solved by other means. That is not the focus of this article — interested readers can look up solutions themselves.

As the number of users grows, the single-machine write library will gradually reach its performance bottleneck.

The sixth evolution: split the big table into small tables

For example, review data can be hashed by product ID and routed to the corresponding table for storage; payment records can be split into one table per hour, with each hourly table further split into small tables, routing the data by user ID or record number. As long as the volume of data in the tables operated on in real time is small enough, and requests can be distributed evenly across the small tables on multiple servers, the database can improve performance through horizontal scaling. The Mycat mentioned earlier also supports access control when large tables are split into small tables.
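Both routing rules just described can be written down directly: hash the product ID for review data, and combine the hour with a user-ID split for payment records. The table-name formats and shard counts below are invented for illustration:

```python
import hashlib
from datetime import datetime

def review_table(product_id, shards=16):
    """Route a review row to one of `shards` tables by hashing the product ID."""
    digest = hashlib.md5(str(product_id).encode()).hexdigest()
    return f"review_{int(digest, 16) % shards}"

def payment_table(user_id, ts, sub_shards=4):
    """Payment records: one table per hour, each hourly table further
    split into small tables routed by user ID."""
    hour = ts.strftime("%Y%m%d%H")
    return f"payment_{hour}_{user_id % sub_shards}"

payment_table(42, datetime(2020, 11, 11, 10))  # -> "payment_2020111110_2"
```

The hash keeps the same product's reviews in one table (so single-product queries stay local) while spreading different products evenly across shards.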

This approach significantly increases the difficulty of database operation and maintenance and places higher demands on DBAs. When the database is designed in this way, it can already be called a distributed database — but it is only a logical database as a whole, and its different parts are implemented by different components. For example, the management of sharding and the distribution of requests are handled by Mycat, SQL parsing by the stand-alone databases, read-write separation perhaps by gateways and message queues, and the aggregation of query results perhaps by the database interface layer. This is in fact an MPP (massively parallel processing) architecture.

There are now many MPP databases, both open source and commercial. Popular open-source ones include Greenplum, TiDB, Postgres-XC, and HAWQ; commercial ones include GBase, Ruifan Technology's Snowball DB, Huawei's LibrA, and others. Different MPP databases have different focuses: TiDB, for example, focuses more on distributed OLTP scenarios, while Greenplum focuses more on distributed OLAP scenarios. These MPP databases basically provide SQL-standard support comparable to PostgreSQL, Oracle, and MySQL: a query is parsed into a distributed execution plan, distributed to the machines for parallel execution, and the results are finally aggregated and returned by the database itself. They also provide capabilities such as permission management, sharding, transactions, and data replicas, and most can support clusters of more than 100 nodes, greatly reducing the cost of database operation and maintenance and allowing the database to scale horizontally.

Both the database and Tomcat can be scaled horizontally, and the supportable concurrency is greatly improved. As the number of users grows, the single-machine Nginx will eventually become a bottleneck.

The seventh evolution: use LVS or F5 to load balance multiple Nginx instances

Since the bottleneck is Nginx itself, load balancing multiple Nginx instances cannot be achieved by stacking another layer of Nginx on top.

The LVS and F5 in the figure are load-balancing solutions that work at layer 4 of the network. LVS is software that runs in the operating system's kernel and forwards TCP requests and higher-level protocols, so it supports a richer set of protocols than Nginx and its performance is much higher; it can be assumed that a single LVS machine can forward several hundred thousand concurrent requests. F5 is load-balancing hardware with capabilities similar to LVS, performance higher than LVS, but it is expensive.

Since LVS is stand-alone software, if the server it runs on goes down, the entire back-end system becomes inaccessible, so a standby node is needed. The keepalived software can be used to provide a virtual IP and bind it to multiple LVS servers. When a browser accesses the virtual IP, the router redirects the request to the real LVS server; when the primary LVS server goes down, keepalived automatically updates the router's routing table and redirects the virtual IP to another healthy LVS server, achieving high availability for the LVS layer.

It should be noted that the drawing from the Nginx layer to the Tomcat layer in the figure above does not mean that every Nginx forwards requests to every Tomcat. In practice, several Nginx instances (made highly available among themselves via keepalived) may sit in front of one group of Tomcats, while other Nginx instances front another group — which multiplies the number of Tomcats that can be reached.

Since LVS is also a single machine, as concurrency grows into the hundreds of thousands, the LVS server will eventually hit its bottleneck. At this point the number of users reaches tens of millions or even hundreds of millions; users are distributed across different regions at different distances from the server room, so access latency differs markedly.

The eighth evolution: load balancing across machine rooms through DNS round robin

In the DNS server, multiple IP addresses can be configured for the same domain name, each pointing to the virtual IP of a different machine room; the DNS server returns these IPs in turn, spreading users across machine rooms.

With the growing richness of data and the development of the business, the requirements for retrieval and analysis become ever richer, and the database alone can no longer satisfy them.

The ninth evolution: the introduction of technologies such as NoSQL databases and search engines

When the data in the database reaches a certain scale, the database is no longer suitable for complex queries and can only serve ordinary query scenarios. For statistical reports, the query may never finish when the data volume is large, and running complex queries slows other queries down. For scenarios such as full-text retrieval and variable data structures, the database is inherently unsuitable. Suitable solutions must therefore be introduced for specific scenarios: massive file storage can be handled by the distributed file system HDFS, key-value data by HBase or Redis, full-text retrieval by search engines such as ElasticSearch, and multi-dimensional analysis by solutions such as Kylin or Druid.
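The core idea behind a full-text search engine such as ElasticSearch — the inverted index — can be sketched in a few lines. This is a toy model (real engines add tokenization, relevance scoring, and distributed shards); the documents are made up:

```python
from collections import defaultdict

class InvertedIndex:
    """Map each term to the set of documents containing it, so a query
    is a set intersection rather than a full scan — the reason full-text
    search is fast where SQL LIKE scans are not."""
    def __init__(self):
        self._index = defaultdict(set)  # term -> set of document ids

    def add(self, doc_id, text):
        for term in text.lower().split():
            self._index[term].add(doc_id)

    def search(self, query):
        # Return documents containing ALL query terms (AND semantics).
        terms = query.lower().split()
        if not terms:
            return set()
        result = self._index[terms[0]].copy()
        for term in terms[1:]:
            result &= self._index[term]
        return result

idx = InvertedIndex()
idx.add(1, "red winter coat")
idx.add(2, "red summer dress")
idx.search("red coat")  # -> {1}
```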

Of course, the introduction of more components will also increase the complexity of the system. The data saved by different components needs to be synchronized, consistency issues need to be considered, and more operation and maintenance methods are needed to manage these components.

Introducing more components satisfies the richer needs and greatly expands the business dimensions. As a result, however, a single application contains too much business code, and business upgrades and iteration become difficult.

Tenth evolution: big applications are split into small applications

Divide the application code by business sector, so that each single application has clearer responsibilities and can be upgraded and iterated independently. The applications may share some common configuration, which can be handled by the distributed configuration center ZooKeeper.

Different applications share common modules. If each application manages its own copy, the same code exists in multiple places, and upgrading a shared function forces every application's code to be upgraded.

The eleventh evolution: the reuse of functions is separated into microservices

If functions such as user management, orders, payment, and authentication exist in multiple applications, the code for these functions can be extracted into separate services to manage. Such services are the so-called microservices. Applications and services access the public services through HTTP, TCP, or RPC requests, and each individual service can be managed by a separate team. In addition, service governance, rate limiting, circuit breaking, and degradation can be implemented through frameworks such as Dubbo and Spring Cloud, improving the stability and availability of the services.
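The circuit-breaking function just mentioned, which Dubbo and Spring Cloud provide out of the box, works roughly like the following sketch (simplified: the threshold is arbitrary, and real breakers add a half-open state and time-based recovery):

```python
class CircuitBreaker:
    """Open the circuit after `threshold` consecutive failures, so that
    further calls fail fast instead of piling onto an unhealthy service."""
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0
        self.open = False

    def call(self, fn, *args):
        if self.open:
            # Fail fast: don't even attempt the downstream call.
            raise RuntimeError("circuit open: failing fast")
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.open = True
            raise
        self.failures = 0  # any success resets the failure count
        return result
```

Failing fast protects both sides: the caller stops waiting on timeouts, and the struggling service gets breathing room to recover.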

Different services expose different interface access methods, so application code must adapt to multiple access methods to use them. Moreover, applications call services and services call each other, so the call chain becomes very complicated and the logic chaotic.

The twelfth evolution: introducing an enterprise service bus (ESB) to shield differences in service-interface access

With the ESB performing unified access-protocol conversion, applications access back-end services uniformly through the ESB, and services also call each other through it, reducing the coupling of the system. This design — a single application split into multiple applications, public services extracted for separate management, and an enterprise service bus used to decouple services — is the so-called SOA (service-oriented architecture). It is easily confused with the microservice architecture because the two look very similar. In my personal view, microservice architecture refers to the idea of extracting public services from the system for separate operation, maintenance, and management, while SOA refers to an architectural idea that splits services and unifies service-interface access; the SOA architecture thus contains the idea of microservices.
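The protocol-unifying role of the ESB can be sketched as a registry that hides each service's native access method behind one uniform call interface. The service names and the "protocols" the adapters pretend to speak are invented for illustration:

```python
class ServiceBus:
    """Callers use one uniform invoke(); registered adapters hide whether
    the backend actually speaks HTTP, RPC, or anything else."""
    def __init__(self):
        self._adapters = {}

    def register(self, name, adapter):
        self._adapters[name] = adapter

    def invoke(self, name, payload):
        if name not in self._adapters:
            raise KeyError(f"no service registered as {name!r}")
        return self._adapters[name](payload)

bus = ServiceBus()
# Each adapter would wrap the service's real protocol; stubbed here.
bus.register("user", lambda p: {"user_id": p["id"], "via": "rpc"})
bus.register("order", lambda p: {"order_id": p["id"], "via": "http"})
```

Applications only depend on `invoke(name, payload)`; swapping a service's transport means changing its adapter, not every caller — which is exactly the coupling reduction described above.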

As the business develops, the number of applications and services keeps growing, and their deployment becomes more complex. Deploying multiple services on the same server requires solving conflicts between their operating environments; and when services must scale dynamically — for example, expanding horizontally for a big promotion — the operating environment must be prepared and the service deployed on every newly added machine. Operation and maintenance become very difficult.

The thirteenth evolution: the introduction of containerization technology to achieve operating environment isolation and dynamic service management

At present, the most popular containerization technology is Docker, and the most popular container management service is Kubernetes (K8S). Applications/services can be packaged as Docker images, and the images can be dynamically distributed and deployed through K8S. A Docker image can be understood as a minimal operating system that can run your application/service: it contains the service's running code, with the operating environment set up according to actual needs. After this whole "operating system" is packaged as an image, it can be distributed to whichever machines need the service, and the service can be started directly by starting the image, making deployment and operation/maintenance much easier.

Before a big promotion, servers in the existing cluster can be set aside to start Docker images and boost the service's capacity; after the promotion, the images can be shut down without affecting other services on the machines.

After containerization, dynamic service scaling is solved, but the machines themselves still have to be managed by the company. Outside of big promotions, a large amount of machine resources must sit idle just to cope with them; the cost of the machines and of operation and maintenance is extremely high, and resource utilization is low.

The fourteenth evolution: carrying the system on a cloud platform

The system can be deployed on a public cloud, using its massive machine resources to solve the problem of dynamic hardware resources. During the big-promotion period, temporarily apply for more resources on the cloud platform and combine Docker and K8S to deploy services quickly, then release the resources after the promotion ends. This is true pay-on-demand: resource utilization rises greatly and operation-and-maintenance costs drop sharply.
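The elasticity described here reduces, at its core, to a capacity rule: run as many instances as the current load requires, divided by what one instance can handle. The sketch below is a toy version (real platforms, such as K8S's autoscaler, use smoothed metrics and cool-down windows; the numbers are illustrative):

```python
import math

def required_instances(current_qps, qps_per_instance, minimum=2):
    """How many instances to run for the current load, never dropping
    below a small floor kept for availability."""
    needed = math.ceil(current_qps / qps_per_instance)
    return max(needed, minimum)

required_instances(18_000, 1_000)  # big-promotion traffic -> 18 instances
required_instances(500, 1_000)     # off-peak -> floor of 2 instances
```

Evaluating this rule periodically and requesting or releasing cloud machines to match is what turns idle over-provisioning into pay-on-demand.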

A so-called cloud platform abstracts a large pool of machine resources into a single whole through unified resource management. On top of it, hardware resources (such as CPU, memory, and network) can be requested dynamically on demand, common operating systems are provided, common technical components (such as the Hadoop stack or MPP databases) are offered for use, and even fully developed applications are provided — users can solve their needs (such as audio/video transcoding, mail service, or a personal blog) without caring what technology the application uses internally. The cloud platform involves the following concepts:

• IaaS: Infrastructure as a Service. Corresponds to the machine resources unified into a single whole, with hardware resources requested dynamically.
• PaaS: Platform as a Service. Corresponds to the provision of common technical components to ease system development and maintenance.
• SaaS: Software as a Service. Corresponds to the provision of fully developed applications or services, paid for according to functional or performance requirements.

At this point, the problems raised above — from high-concurrency access down to the service architecture and system implementation level — each have their own solutions. At the same time, note that the introduction above deliberately ignored practical issues such as cross-machine-room data synchronization and the implementation of distributed transactions; these will be discussed separately when the opportunity arises.

Architecture design summary

• Does the adjustment of the architecture have to follow the evolution path above? No. The order described above is a single line of improvement, one aspect at a time. In real scenarios, several problems may need to be solved at once, or the bottleneck may appear in a different aspect first; solve whatever the actual problems demand. For example, in government-type scenarios the concurrency may not be large but the business may be very rich, so high concurrency is not the key problem, and solutions for the rich requirements come first.
• How far should the architecture be designed for a system about to be implemented? For a system built once with clear performance targets, designing the architecture to support those targets is enough, but there should be interfaces for extending the architecture in case they are needed. For an evolving system, such as an e-commerce platform, design to the level that satisfies the next stage of user volume and performance targets, and iterate the architecture as the business grows, so as to support higher concurrency and richer business.
• What is the difference between server-side architecture and big-data architecture? "Big data" is really a general term for scenario solutions — massive data collection, cleaning and transformation, data storage, data analysis, data services — each with multiple optional technologies: data collection with Flume, Sqoop, or Kettle; data storage with the distributed file systems HDFS or FastDFS and the NoSQL databases HBase or MongoDB; data analysis with the Spark stack, machine-learning algorithms, and so on. In general, big-data architecture integrates various big-data components according to business needs, generally providing distributed storage, distributed computing, multi-dimensional analysis, data warehousing, machine-learning algorithms, and other capabilities. Server-side architecture refers more to the architecture at the application-organization level, and its underlying capabilities are often provided by a big-data architecture.
• Are there any principles for architecture design?
  • N+1 design: every component in the system should have no single point of failure.
  • Rollback design: ensure the system is forward compatible and there is a way to roll back the version when a system upgrade fails.
  • Disable design: provide configuration that controls whether specific functions are available, so a faulty function can be taken offline quickly.
  • Monitoring design: consider the means of monitoring at the design stage.
  • Multi-active data-center design: if the system requires extremely high availability, consider multiple data centers in multiple locations, so the system remains available even when a machine room loses power.
  • Use mature technology: newly developed or open-source technology often has hidden bugs; a problem without commercial support can be a disaster.
  • Resource-isolation design: a single business should not be able to occupy all resources.
  • The architecture should scale horizontally: only when the system can expand horizontally can bottlenecks be effectively avoided.
  • If it is not core, buy it: if a non-core function would consume a lot of R&D resources, consider buying a mature product.
  • Use commercial hardware: commercial hardware can effectively reduce the probability of hardware failure.
  • Fast iteration: the system should develop small functional modules quickly, go online for verification as soon as possible, and discover problems early to greatly reduce the risk of system delivery.
  • Stateless design: service interfaces should be made stateless; access to an interface should not depend on the state left by the previous access.


Origin blog.csdn.net/qq_37557563/article/details/103387812