Sharing experience with high-performance, high-concurrency network architecture at the tens-of-millions scale

Architecture and the nature of architecture

Take it seriously tactically, but despise it strategically. Let's use an example to get a feel for the orders of magnitude involved. Uber, which is very popular now, reportedly receives about one million orders per day on average. Assuming 10 hours of service time per day, the average QPS is only about 30. For a backend server whose single-machine average QPS can reach 800-1000, handling nothing but the reads and writes of this business volume would be very simple. So why can't we simply look down on it? First, consider its data storage: at one million orders per day, what does the data volume look like after a year? Second, each of those orders must be pushed to nearby drivers, and the drivers grab orders concurrently. The traffic of these downstream scenarios is often hundreds of times that of the orders themselves, easily exceeding the hundreds-of-millions level.

Today, I want to talk about the essence of architecture, and I hope everyone comes to understand the starting point of architectural design and the problems it is meant to solve.

Let me start with an explanation of "architecture" that I saw on Zhihu. What is architecture? Some people say architecture is not some lofty thing; it is really just a shelf on which businesses and algorithms are placed, much like the drying rack in everyday life. More abstractly, architecture is an abstraction of our repetitive business and a foresight of future business expansion; it draws on past experience and on your predictions about the whole industry.

What capabilities do we need to build an architecture? I think the most important ability of an architect is the ability to decompose strategically. Consider it from three angles:

First, you must have the ability to abstract. The most basic form of abstraction is de-duplication, which shows up in every aspect of an architecture: from defining a function, to defining a class, to providing a service, and likewise templates; behind all of these is the goal of improving reusability.
Second, the ability to classify. When building software you need to decouple objects and define their attributes and methods; when building distributed systems you need to split services into modules and define service interfaces and specifications.
Third, algorithms (performance), whose value lies in improving system performance. All performance improvements ultimately come down to four areas: CPU, memory, IO, and network.


This page of PPT gives some examples to gain a deeper understanding of the architectural concepts behind common technologies.

In the first example, in a distributed system we shard MySQL into sub-databases and sub-tables, so we need to read data from different databases and tables. The most intuitive abstraction is a template, because most of the SQL semantics are identical apart from routing to a particular database and table. If you do not use proxy middleware, templates are the most cost-effective approach.
Second, consider the CDN, which accelerates the network and improves performance in the speed dimension. Of the four aspects mentioned above (CPU, memory, IO, and network), a CDN is essentially intelligent network scheduling optimization plus multi-level cache optimization.
Third, consider servitization. As mentioned, every large website goes through a servitization transformation, which is really abstraction plus the separation of services. Fourth, consider the message queue. In essence it is still classification, but instead of two classes with clear boundaries, two subsystems with blurry boundaries are decoupled and made asynchronous through the queue.
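To make the sub-database/sub-table example above concrete, here is a minimal sketch of hash-based routing combined with a shared SQL template. All names and constants (`ShardRouter`, `DB_COUNT`, the table naming scheme) are illustrative assumptions, not Weibo's actual code.

```java
// Minimal sketch of sub-database/sub-table routing plus an SQL template.
// DB_COUNT and TABLE_COUNT are assumed deployment parameters.
public class ShardRouter {
    static final int DB_COUNT = 4;     // assumed number of databases
    static final int TABLE_COUNT = 8;  // assumed tables per database

    // Route a user id to a concrete database and table name.
    public static String route(long uid) {
        int db = (int) (uid % DB_COUNT);
        int table = (int) ((uid / DB_COUNT) % TABLE_COUNT);
        return "db_" + db + ".user_" + table;
    }

    // The SQL semantics are identical for every shard; only the
    // routed table name changes, so one template serves all shards.
    public static String sql(long uid) {
        String template = "SELECT * FROM %s WHERE uid = %d";
        return String.format(template, route(uid), uid);
    }
}
```

The point of the template is that business code never hard-codes a shard; it only supplies the routing key.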

What is the overall structure of Sina Weibo?

Next, let's take a look at the overall structure of Weibo. Any system of a certain magnitude ends up as a three-tier structure. The client side includes Web, Android, and iOS, so I won't discuss it here.
Then there is an interface layer, which has three main functions:

the first is security isolation: since front-end nodes interact with users directly, various malicious attacks need to be prevented;
the second is traffic control. As we all know, during the 2014 Spring Festival, WeChat red envelopes received more than 800 million requests per minute, while the number of requests that actually reached the backend was only on the order of 100,000 (the figures here may be inaccurate); the remaining traffic was blocked at the interface layer;
the third is that the requirements of the PC side and the mobile side differ, so the interface layer can split them. Behind the interface layer is the backend, and you can see there are three major blocks in the Weibo backend:
one is platform services,
the second is search, and
the third is big data.
The various backend services are essentially all processing data. The platform's business departments do data storage and reading; for search, it is data retrieval; for big data, it is data mining. In this respect, Weibo is actually very similar to Taobao.

Generally speaking, a first-generation architecture can support millions of users, and a second-generation architecture can support tens of millions. When the business scale reaches hundreds of millions, a third-generation architecture is required.

From the LAMP architecture to a service-oriented architecture, several things are very difficult. First, simple tinkering on top of the first generation cannot keep up with rapid user growth, yet the online business cannot be stopped; this is the problem we often describe as changing the engine mid-flight. A couple of days ago a friend told me that when he pushed servitization internally, he completed the servitization of one module and the other departments simply refused to adopt it. My suggestion is that servitization is first of all about sorting out the business, and you also need a good entry point: the business side should benefit from the architectural improvement as well, for example through better performance, lower maintenance cost, or a smoother upgrade process. It is advisable to start with atomic services, such as basic user services, basic messaging services, and basic push services. Second, make services stateless where possible (discussed in detail later), and shard the data once its volume grows large (also discussed later). The problem the third-generation architecture solves is different: user and service counts now grow steadily rather than exponentially as in the explosive period, so more consideration goes to the stability of the technical framework, improving overall system performance, reducing cost, and improving system monitoring and upgrades.

How does the system architecture of a large website evolve?



Let's look at the challenges through data. PV is at the billion level, QPS at the million level, and data volume at the hundred-billion level. For availability, the SLA requires four 9s; interface response time cannot exceed 150 milliseconds; and any online fault must be resolved within 5 minutes. What if it isn't fixed within 5 minutes? That will affect your year-end performance review. In 2015, Weibo's DAU exceeded 100 million. There are hundreds of microservices in our system, with regular releases twice a week plus unlimited emergency releases. The challenges are always the same: the amount of data keeps growing, users expect everything to be faster, and there is ever more business. Internet business is driven largely by product experience, and the most effective contribution technology makes to product experience is better performance: every reduction in page load time indirectly reduces the churn rate on that page.



Weibo's Technical Challenges and an Orthogonal Decomposition Analysis of the Architecture

Let's take a look at the third-generation architecture diagram and how we describe it with the orthogonal decomposition method. We view it along two dimensions, a horizontal axis and a vertical axis. The horizontal dimension splits the system into layers: from the interface layer, to the service layer, to the data storage layer. The vertical dimension splits concerns: business architecture, technical architecture, the monitoring platform, service governance, and so on. I believe that by the second generation, many architectures have already separated the business architecture from the technical architecture.

Looking closer: the interface layer has the feed, user relationship, and messaging interfaces. In the service layer, SOA distinguishes basic services, atomic services, and composite services; in Weibo we have only atomic services and composite services. Atomic services do not depend on any other services; composite services are built from several atomic services plus their own business logic. The resource layer is responsible for storing massive data (examples are described in detail later). The technical framework solves business-independent technical problems in massive high-concurrency scenarios and is built from a number of technical components: in the interface layer, Weibo uses the Jersey framework to handle parameter parsing, parameter validation, and serialization/deserialization; the resource layer mainly consists of cache- and DB-related components, such as cache components and object storage components. The monitoring platform and service governance provide pixel-level monitoring of system services and perform early diagnosis, early warning, and governance of the distributed system. This includes defining SLA rules, service monitoring, service call-chain monitoring, traffic monitoring, error and exception monitoring, online grayscale release, and online capacity expansion and shrinkage scheduling.



Let's talk about common design principles.

The first one is the three powerful tools of the system architecture:
one, the RPC service component (not covered here);
two, message middleware, whose role is to make the interaction between two modules asynchronous and to smooth uneven incoming request traffic into a uniform output flow, which makes message middleware a powerful tool for asynchronous decoupling and traffic peak clipping;
three, configuration management, a powerful tool for code-level grayscale release and for guaranteeing system degradation.
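The peak-clipping role of message middleware can be sketched as a buffer that absorbs bursty arrivals while a consumer drains at a fixed rate. This is a toy illustration of the idea, not any real middleware API; the class name and drain rate are made up.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy sketch of traffic peak clipping: bursty requests are buffered in a
// queue and processed at a steady rate per tick. Real middleware (Kafka,
// RabbitMQ, etc.) plays this role between two decoupled subsystems.
public class PeakClipper {
    private final Queue<String> buffer = new ArrayDeque<>();
    private final int drainRate; // requests processed per consumer tick

    public PeakClipper(int drainRate) { this.drainRate = drainRate; }

    // Producer side: bursts land here without overloading the consumer.
    public void enqueue(String request) { buffer.offer(request); }

    // Consumer side: one tick processes at most drainRate requests.
    public int drainOnce() {
        int processed = 0;
        while (processed < drainRate && !buffer.isEmpty()) {
            buffer.poll();
            processed++;
        }
        return processed;
    }

    public int backlog() { return buffer.size(); }
}
```

The backlog grows during the burst and shrinks afterward, which is exactly the "uneven in, uniform out" behavior described above.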
The second principle is statelessness; most importantly, the interface layer should be stateless. When we shop on an e-commerce website, much of the process is stateful: for example, which products I have browsed, which step of checkout I am on. So why do people say the interface layer should be stateless? Because we strip the state out of the interface layer and push it down to the data layer. When a user shops on an e-commerce site and selects a few items, once the interface layer is stateless, that state is stored either in the cache or in the database. The system is not truly stateless; rather, the stateful parts are extracted out of the interface layer and into the data layer.
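A minimal sketch of the statelessness principle, using the shopping-cart example: the handler keeps no per-user fields, and cart state lives in an external store (a `Map` standing in for a cache or database). The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a "stateless" interface layer: no session state is held on the
// handler itself, so any interface-layer node can serve any user's request.
// The Map stands in for the data layer (cache or database).
public class CartService {
    private final Map<String, List<String>> store = new HashMap<>();

    // State is fetched from the data layer by key, never held locally.
    public List<String> addItem(String userId, String item) {
        List<String> cart = store.computeIfAbsent(userId, k -> new ArrayList<>());
        cart.add(item);
        return cart;
    }

    public List<String> getCart(String userId) {
        return store.getOrDefault(userId, new ArrayList<>());
    }
}
```

Because nothing user-specific lives on the interface node, nodes can be added, removed, or fail over freely.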
Third, the data layer deserves more design effort than the service layer; this is a very important lesson. You can write the service layer in PHP today and rewrite it in Java tomorrow, but if your data structure design is unreasonable, future changes to it will cost you many times over. Migrating from the old data format to the new one will be unbearable, not only in workload but also in the time span the migration covers; some migrations take more than half a year.
Fourth, the mapping between the physical structure and the logical structure. In the previous diagram we saw the two dimensions divide into twelve regions, each representing a technical domain; this can be regarded as our logical structure. Meanwhile, whether in the backend or the application layer, development teams are generally organized as several vertical business groups plus one basic technical architecture group. This is a clean mapping from the physical organizational structure to the logical technical architecture, and it helps improve the efficiency of communication and collaboration.
Fifth, the access process for www.sanhao.com is not shown in our architecture diagram. When you enter www.sanhao.com in the browser, what happens before this request reaches the interface layer? First, the local DNS and the DNS service are consulted to find the IP address corresponding to the domain name; then an HTTP request is sent. The request first goes to the front-end VIP (the public-network service IP address), then through the load balancer (an Nginx server), and only then reaches your application's interface layer. A lot happens before the interface layer, so when a user reports a problem, you may not find it by checking interface-layer logs: the problem may have occurred before the request ever arrived there.
Sixth, where does the ultimate bottleneck of a distributed system lie? A while ago a netizen discussed this with me: his system had hit a bottleneck, but after checking CPU, memory, network, and storage, he found no problem. I told him to check again, because in the end, whether you run thousands or tens of thousands of servers, the system bottleneck will always fall on some specific machine (perhaps a leaf node, perhaps a core node), and on that machine it will fall on CPU, memory, storage, or network. It was eventually found that the problem was the network card bandwidth of a single server.

Weibo's multi-level, dual-data-center cache architecture

Next, let's take a look at Weibo's feed multi-level cache. When we do business, we rarely do business analysis, and sharing at technical conferences tends to focus on technical architecture, but in daily work more of your time should actually go into business optimization. This chart shows the access ratio of the first few pages of Weibo's feed: for example, the first three pages account for 97% of accesses. So when designing the cache, we store only the most recent M entries. The point here is that system design should be based on the user's actual scenarios, and the more detailed the analysis, the better. Take e-commerce as an example: during Double Eleven, e-commerce sites run nationwide promotions, and they too design for the scenario. Consider the shopping cart. I discussed this with the relevant developers: before Double Eleven, users visit the cart heavily, constantly adding products to it; on the day itself, they add little but browse the cart frequently. For this scenario, the focus is on designing and optimizing cart writes before the event, and cart reads once the event starts.
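The "store only the most recent M entries" design can be sketched as a bounded window over post IDs: new posts push in at the front, and anything beyond M falls out and is served from storage instead. M is a tuning parameter derived from the access-ratio analysis; the class name is illustrative.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of a bounded recent-window cache: since ~97% of feed reads hit the
// first few pages, only the newest M post IDs are cached; older entries
// fall back to backend storage.
public class RecentWindowCache {
    private final int maxEntries;                    // the "M" parameter
    private final Deque<Long> recentIds = new ArrayDeque<>();

    public RecentWindowCache(int maxEntries) { this.maxEntries = maxEntries; }

    // Newest post goes to the front; the tail is evicted past M entries.
    public void add(long postId) {
        recentIds.addFirst(postId);
        if (recentIds.size() > maxEntries) recentIds.removeLast();
    }

    public int size() { return recentIds.size(); }
    public long newest() { return recentIds.peekFirst(); }
}
```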




Which parts of Weibo are aggregated? On the far right is the feed, composed of the posts of everyone a user follows, which Weibo sorts in chronological order. As the business developed, besides time-ordered posts there are also non-time-ordered posts: advertising requirements insert ads, and "fan headlines" (promoted posts bought with money) and popular posts are inserted into the stream as well. Distribution control relates to recommendations: recommending some friends' posts, posts you may not have read yet, and other types of posts. For non-time-ordered and distribution-controlled posts, multiple programs actually read in parallel, and a final unified aggregation step merges everything. Let me share a small observation here. From an SNS perspective, there are three relatively good information flows in China:

Weibo is a media information flow based on weak ties;
Moments is an information flow based on strong ties;
the other good one is Toutiao, which builds its information flow not on relationships but on personalized recommendation based on interests and relevance.
The aggregation of information flows appears in many, many products. Beyond SNS, e-commerce also shows traces of it: after you search for a product, the list page's feed is basically composed of several parts: first, advertising; second, recommendations and popular products; third, keyword-related search results. An information flow is very simple at the beginning, but later you will find that controlling and distributing it is very complicated; this is the kind of work Weibo has been doing over the past one or two years.


We have just analyzed the problem from the business perspective; how do we solve high concurrency and high performance technically? Weibo's underlying storage is the MySQL database, among others. When query volume is large, there obviously must be a cache, which reuses reusable computation results. When I post a Weibo, I have many followers, and they all come to see what I posted, so Weibo is an ideal system for caching: its read-to-write ratio is basically several dozen to one. Weibo uses a double-layer cache. On top is L1; each L1 is a group of 4-6 machines. The left frame corresponds to one data center, the right frame to another. What is the role of the L1 cache in this system? First, it increases the QPS of the whole system; second, it increases the system's bandwidth through low-cost, flexible expansion. Imagine an extreme scenario: there is only one blog post, but its traffic grows without bound. The L2 cache is not really affected, because the content stored is small; it is only the number of accesses that is huge. In this scenario, you expand L1 to overcome QPS and bandwidth bottlenecks. The other scenario is where the L2 cache comes into play: say I have 10 million users accessing the posts of 1 million users. Now it is not only about throughput and access bandwidth; there are also a great many blog posts to cache, so you must consider cache capacity. The second-level cache is planned more around capacity, ensuring that only a small proportion of requests penetrate through to the backend database. From your user model, you can estimate the percentage of requests that will reach the DB.
Having evaluated this capacity, you can better assess how many databases the DB tier needs and how much access pressure it must bear. Also, look at the dual data centers, one on the left and one on the right: the two are each other's primary and backup, or each other's hot standby. If two users are in different regions and visit two different data centers, suppose a user comes in via IDC1: by the proximity principle, he first checks L1; on a miss, he goes to the Master; if it is not found in IDC1, he goes to IDC2 to look. Meanwhile, users accessing via IDC2 likewise get answers from its L1 and Master or fall back to IDC1. Both IDC1 and IDC2 hold the full user data and serve online simultaneously, but cache queries follow the principle of nearest access.
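The lookup order just described (L1 first, then the local IDC's L2/Master, then the peer IDC, filling L1 on a hit below it) can be sketched as follows. The `Map`s stand in for real cache clusters, and the names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the dual-data-center lookup path: L1 cache, then the local
// IDC's L2 (Master), then the peer IDC, with L1 populated on any hit
// found at a lower level.
public class TwoLevelLookup {
    public final Map<String, String> l1 = new HashMap<>();
    public final Map<String, String> l2Local = new HashMap<>();
    public final Map<String, String> l2Peer = new HashMap<>();

    public String get(String key) {
        String v = l1.get(key);
        if (v != null) return v;            // nearest-access hit
        v = l2Local.get(key);
        if (v == null) v = l2Peer.get(key); // cross-IDC fallback
        if (v != null) l1.put(key, v);      // warm L1 for the next read
        return v;                           // null -> penetrate to the DB
    }
}
```

Only the requests that return null from this chain ever reach MySQL, which is the penetration ratio the capacity planning above estimates.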


What other examples of multi-level caches are there? A CDN is a typical multi-level cache. CDNs deploy nodes in many regions across China; for example, a node deployed in Hangzhou will have more than one machine in its data center, and only a few of those servers go back to the origin, while the other nodes fetch from those few. So a CDN has at least two levels. Local cache plus distributed cache is another common strategy. There is one scenario where a distributed cache alone does not work: explosive peak traffic on a single hot resource. In that case we use local cache plus distributed cache: the local cache uses a small amount of memory on each application server to absorb the small set of extreme-peak requests, while long-tail traffic still goes to the distributed cache. Such a hybrid cache architecture reduces overall system cost by multiplexing the many application server nodes.
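The hybrid local-plus-distributed strategy can be sketched like this: a tiny in-process map absorbs hits on a handful of hot keys so extreme peaks never reach the distributed tier, while everything else goes remote. Thresholds and names here are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the hybrid cache: a small per-application-server local cache
// for hot keys, in front of a shared distributed cache for the long tail.
public class HybridCache {
    public final Map<String, String> localHot = new HashMap<>();    // on app server
    public final Map<String, String> distributed = new HashMap<>(); // remote tier
    public int remoteHits = 0; // counts traffic reaching the distributed tier

    public String get(String key) {
        String v = localHot.get(key);
        if (v != null) return v;       // peak traffic stops here
        remoteHits++;
        return distributed.get(key);   // long-tail traffic goes remote
    }

    // Promote a key (e.g. a viral post) into the local hot set.
    public void markHot(String key) {
        String v = distributed.get(key);
        if (v != null) localHot.put(key, v);
    }
}
```

Because every application server already exists, the local tier adds capacity at essentially zero extra hardware cost.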

Now let's look at the storage structure of the feed. Weibo's blog posts are mainly stored in MySQL. First the content table, which is relatively simple: one index per piece of content, with a new table created every day. Next the index tables, of which two levels are built. First, imagine the user scenario: when most users scroll Weibo, they see the posts of everyone they follow, sorted by time. Careful analysis shows that in this scenario the correlation with the user's own profile is very small, so the first-level index fetches, for each followed user, their recent Weibo IDs, which are then aggregated and sorted. When we hash (for sub-database and sub-table), we consider both hashing by UID and the time dimension. The business is highly time-correlated: today's hot news will not be hot tomorrow, so hot and cold data are clearly separated. This scenario calls for splitting tables along the time dimension: first, hot and cold data are separated (different storage schemes can be used for each to reduce cost), and second, it keeps individual tables from exploding. If Weibo split only by the user dimension, all of a user's data would live in one table, which would grow without bound and make queries slower and slower over time. The secondary index covers a relatively special scenario for us: when I want to quickly find the posts a user published in a certain period of time, the secondary index lets me locate them quickly.
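The two-dimensional split described above (hash by UID, partition by time) can be sketched as a table-routing function. The bucket count and table naming scheme are illustrative assumptions, not Weibo's actual schema.

```java
// Sketch of two-dimensional routing for the feed index tables: posts are
// hashed by uid and also partitioned by month, so hot (recent) and cold
// (old) data land in different tables and tables stay bounded in size.
public class FeedIndexRouter {
    static final int UID_BUCKETS = 16; // assumed number of hash buckets

    // e.g. a post by uid 12345 in 2015-03 -> "feed_idx_9_201503"
    public static String indexTable(long uid, int year, int month) {
        long bucket = uid % UID_BUCKETS;
        return String.format("feed_idx_%d_%d%02d", bucket, year, month);
    }
}
```

With this scheme, a whole month's cold tables can be moved to cheaper storage without touching the hot ones, and no single table grows without bound.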



Distributed service tracking system

Distributed tracing: when a system reaches the tens-of-millions scale it becomes more and more complex, and the problems to solve lean toward stability, performance, and monitoring. As soon as a user issues a request, your service may depend on RPC1 and RPC2, and you will find that RPC2 in turn depends on RPC3 and RPC4. A pain point of distributed services is that after a request arrives from the user, it is passed back and forth among different machines in the backend.



When a problem occurs, the logs are scattered across different machines and you don't know where the problem is. The services are isolated from each other and unrelated, so there is basically no means of troubleshooting; in other words, no way to find the problem.

The problem we want to solve is that the logs are isolated from each other, and we have to connect them. We use a request ID to establish the link, combined with the RPC framework and the service governance functionality. Suppose a request from the client carries ID 101; it still has ID 101 when it reaches service A, and when service A calls RPC1, the call is also tagged 101. So a unique request ID is needed, passed along recursively to every related node. Second, you cannot ask business systems to add this everywhere by hand; you need a framework to do the work, following the principle of minimal intrusion into the business system (with Java you can use AOP). The way to achieve near-zero intrusion is to instrument all the relevant middleware, from interface-layer components (HTTP client, HTTP server) to service-layer components (RPC client, RPC server) to data-access middleware, so that the business system needs only a small amount of configuration to get full-link monitoring. Why use logs? After servitization, each service may use a different development language; considering multi-language compatibility, the only effective approach is to define a standardized internal log format.
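The ID-propagation idea can be sketched in a few lines: the same request ID assigned at the entry point is carried into every downstream call and stamped onto each log line, so logs written on different machines can later be joined by that ID. The structure below is a toy illustration, not a real tracing framework.

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of request-ID propagation through a call chain
// (client -> service A -> RPC1 -> RPC2). Every log line starts with the
// request ID, which is what lets scattered logs be stitched together.
public class TraceDemo {
    static final List<String> logs = new ArrayList<>();

    static void log(String requestId, String service, String msg) {
        logs.add(requestId + " " + service + " " + msg);
    }

    static void rpc2(String requestId) { log(requestId, "RPC2", "done"); }

    static void rpc1(String requestId) {
        log(requestId, "RPC1", "calling RPC2");
        rpc2(requestId); // the id is passed along, never regenerated
    }

    static void handle(String requestId) {
        log(requestId, "serviceA", "received");
        rpc1(requestId);
    }
}
```

In a real system this propagation is done by the instrumented middleware (HTTP and RPC clients/servers), which is why the business code barely changes.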


Finally, how do we build GPS-navigation-style traffic monitoring? We just covered distributed service tracing, which can solve problems: if a single user reports a problem, it quickly locates the node where the problem occurred. But it does not solve how to discover problems in the first place. Road monitoring is an easy real-world analogy. Every vehicle has GPS positioning; if I want to see where Beijing is congested, what do I do? First, I must know where each car is and where it is heading: as long as each car carries an identifier, plus the information of each flow, I can see the position and direction of every traffic stream. Second, how do we monitor and alarm, that is, understand the road's condition and load and raise alarms in time? We need to define how wide and tall each street is and how many vehicles can pass per unit time: the road's capacity. With the road capacity and the real-time traffic flow, we can issue early warnings based on actual road conditions.

How do we build this for a distributed system? First, you have to define the SLA of each service node. The SLA can be defined in terms of the system's CPU usage, memory usage, disk usage, and QPS, which amounts to defining the system's capacity. Second, you must measure dynamic online traffic: the average, minimum, and maximum QPS of each service. With both traffic and capacity, you can comprehensively monitor the system and raise alarms.

What I just described is the theory; reality is definitely more complicated. Weibo runs many activities during the Spring Festival and must guarantee system stability. In theory you only need to define capacity and traffic, but in practice that is far from enough. Why? There are both technical and human factors: the traffic and capacity metrics defined by different developers are subjective and hard to standardize globally, so once real traffic arrives, the system bottleneck you pre-assessed is often wrong. In practice we took three main measures before the Spring Festival. First, the simplest: have a degradation plan. When traffic exceeds system capacity, there must be a clear priority order for which functions to cut first. Second, online full-link stress testing: amplify current traffic to five or even ten times the usual level (for example by taking half the servers offline, shrinking rather than expanding) and see where the system bottlenecks first. We had cases where the database was expected to bottleneck first, but measurement showed the front-end programs hit their limit first. Third, build an online Docker cluster in which all businesses share the spare resources; this largely avoids the waste of reserving resources per business for traffic growth that never materializes.

Summary



Next, a few words on how to keep learning and improving. Taking the Java language as an example: first, understand Java itself; then understand the JVM; then the operating system; then learn design patterns, which teach you how to abstract past experience for future reference; and also learn TCP/IP, distributed systems, and data structures and algorithms.

The last thing I want to say is that everything I said today may be wrong! Through continuous study, practice, and summarization, everyone forms their own set of architectural design principles and methods. Thank you.

Taken from: http://wely.iteye.com/blog/2336874
