An introduction to the frameworks behind Taobao's highly scalable, high-performance architecture

1. Stateless applications (Taobao session framework)

As the saying goes, a system's scalability depends on how it manages application state. Why? Imagine that we keep a lot of per-client state in the session. What happens when the server holding that state goes down? We usually address this with clustering. A cluster provides load balancing and, more importantly, failover. Application servers do this through session replication strategies: Tomcat broadcasts session state to every cluster node, and JBoss replicates it to paired nodes. State recovery in a cluster has a serious drawback, however: it limits scalability. Adding more machines does not give good horizontal scaling, because the cost of replicating sessions between nodes grows as the number of nodes increases. To make the application itself scalable, we therefore need to keep it stateless. Every node in the cluster is then identical, and the system can scale out cleanly.

That covers why statelessness matters. How do we achieve it? This is where a session framework comes in, and fortunately Taobao already has one. Taobao's session framework is built on client-side cookies. State is stored mainly in cookies, so application nodes hold no state themselves. As the number of users grows, the system scales horizontally simply by adding more application nodes. Storing state in client-side cookies has its own limits, though. A single cookie generally cannot exceed 4 KB, and many browsers allow a site to store at most 20 cookies. Taobao's cookie framework therefore uses "multi-value cookies", in which one composite key holds the values of several logical cookies. This keeps the number of cookies under 20. It also leaves more room for useful data, because every cookie carries roughly 50 bytes of metadata by default.
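Here is a rough illustration of the multi-value cookie idea. The sketch packs several logical values into one cookie under a single composite key and enforces the 4 KB limit mentioned above. It is not Taobao's actual cookie format. The URL-encoded `key=value&key=value` layout, the cookie name `_tb_state`, and both function names are assumptions made for this example.

```python
from urllib.parse import parse_qsl, urlencode

MAX_COOKIE_BYTES = 4096  # typical per-cookie size limit


def pack_multi_value(name: str, values: dict[str, str]) -> str:
    """Pack several logical values into a single 'name=payload' cookie."""
    # urlencode escapes '&', '=', ';', ',' and spaces, so the payload stays
    # unambiguous and free of characters that are illegal in cookie values.
    payload = urlencode(sorted(values.items()))
    cookie = f"{name}={payload}"
    size = len(cookie.encode("utf-8"))
    if size > MAX_COOKIE_BYTES:
        raise ValueError(f"composite cookie is {size} bytes, over the 4 KB limit")
    return cookie


def unpack_multi_value(cookie: str) -> tuple[str, dict[str, str]]:
    """Reverse pack_multi_value: split off the name and decode the logical values."""
    name, _, payload = cookie.partition("=")
    return name, dict(parse_qsl(payload, keep_blank_values=True))


# Usage: two logical values share one cookie (and one set of cookie metadata).
cookie = pack_multi_value("_tb_state", {"uid": "42", "cart": "3 items"})
print(cookie)  # _tb_state=cart=3+items&uid=42
```

All the logical values share one name and one set of cookie attributes (domain, path, expiry). The per-cookie metadata is therefore paid once, not once per value.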

Besides the cookie-based approach Taobao currently uses, statelessness can also be achieved through centralized session management. Multiple stateless application nodes connect to a session server. The session server keeps sessions in a cache and is backed by an underlying persistent data source, such as a database or file system.
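Below is a minimal sketch of such a session server. It assumes a plain dict stands in for the persistent store; a real deployment would use a database or file system, and the server would be a separate networked process. The class and method names are hypothetical. Writes go through to the store, so a restarted session server can rebuild its cache on demand. Application nodes keep nothing locally.

```python
from typing import Optional


class SessionServer:
    """Hypothetical centralized session server: an in-memory cache in front of a
    persistent store. A dict stands in for the database or file system."""

    def __init__(self, store: dict):
        self.cache: dict[str, dict] = {}
        self.store = store  # persistent backing store

    def save(self, session_id: str, data: dict) -> None:
        # Write-through: update cache and persistent store together.
        self.cache[session_id] = data
        self.store[session_id] = dict(data)

    def load(self, session_id: str) -> Optional[dict]:
        if session_id in self.cache:
            return self.cache[session_id]
        # Cache miss (e.g. after a restart): fall back to the persistent store.
        data = self.store.get(session_id)
        if data is not None:
            self.cache[session_id] = data
        return data


# Any stateless application node can serve the request, because the session
# lives on the session server rather than on the node.
store: dict = {}
server = SessionServer(store)
server.save("sid-123", {"user": "alice"})
restarted = SessionServer(store)  # cache starts empty, the store survives
print(restarted.load("sid-123"))  # {'user': 'alice'}
```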

2. Using caches effectively (Tair)

Anyone who builds Internet applications knows how important caching is. Caches show up at many layers: browser caches, reverse-proxy caches, page caches, page-fragment caches, and object caches.

Caches can be classified by their distance from the application into local caches and remote caches. A system usually uses one or the other. Using both together makes keeping the local and remote caches consistent considerably more complex and troublesome.

Most of the time, "cache" means a read cache, but there is another kind: the write cache. Some data has a low read-to-write ratio and modest durability requirements, such as product view counts or API call counts. It can be written to an in-memory cache first and flushed to the database later. This reduces database accesses and greatly relieves write pressure on the database.
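The write cache described above is essentially a write-behind buffer. The following sketch accumulates counter increments in memory and hands them to a database writer in batches. `flush_to_db` and the batch-size trigger are assumptions for illustration. A production version would typically also flush on a timer and at shutdown. Counts not yet flushed are lost if the process crashes, which is why this only suits data with low safety requirements.

```python
from collections import Counter
from typing import Callable


class WriteBehindCounter:
    """Accumulates increments in memory and flushes them to the DB in batches."""

    def __init__(self, flush_to_db: Callable[[dict[str, int]], None], batch_size: int = 1000):
        self.pending: Counter = Counter()
        self.flush_to_db = flush_to_db  # hypothetical writer, e.g. one batched UPDATE
        self.batch_size = batch_size
        self.ops_since_flush = 0

    def incr(self, key: str, n: int = 1) -> None:
        self.pending[key] += n  # memory only: no database round trip here
        self.ops_since_flush += 1
        if self.ops_since_flush >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Many increments of the same key collapse into a single write.
        if self.pending:
            self.flush_to_db(dict(self.pending))
        self.pending.clear()
        self.ops_since_flush = 0


writes = []
views = WriteBehindCounter(writes.append, batch_size=3)
views.incr("item:1")
views.incr("item:1")
views.incr("item:2")  # third operation triggers a flush
print(writes)  # [{'item:1': 2, 'item:2': 1}]
```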

Take the store business line as an example. When users browse a store, they see pages such as the store introduction, the store community area, the terms of service, the fitting room, and in-store search. These pages are not updated often, so they are good candidates for caching, which greatly reduces database load. Item detail pages are also updated relatively rarely, so they too are well suited to caching to reduce database load.
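For read-mostly pages like these, a common pattern is cache-aside. Read from the cache, fall back to the database on a miss, and repopulate the cache. The sketch below uses an in-process dict with a TTL purely as a stand-in for a remote cache such as Tair. It does not use Tair's client API, and `load_from_db` is a hypothetical page loader. The injectable `clock` exists only to make expiry easy to test.

```python
import time
from typing import Callable


class CacheAside:
    """Cache-aside read path with a time-to-live for rarely updated pages."""

    def __init__(self, load_from_db: Callable[[str], str], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.load_from_db = load_from_db
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries: dict[str, tuple[float, str]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> str:
        hit = self.entries.get(key)
        if hit is not None and hit[0] > self.clock():
            return hit[1]  # fresh cache hit: the DB is not touched
        value = self.load_from_db(key)  # miss or expired: go to the DB once
        self.entries[key] = (self.clock() + self.ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        # Call after the store owner edits a page so readers see the change.
        self.entries.pop(key, None)
```

The TTL bounds how stale a page can get, and explicit invalidation on edit keeps pages fresh without shortening the TTL for everyone.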

3. Application splitting (HSF)

Before explaining application splitting, let's review some of the problems a system runs into as it grows from small to large. These problems show how important splitting is for building a large-scale system.

When a system first launches, it has few users. All of its logic may live in one system, running in one process or one application. With few users and little traffic, putting everything in a single application is not a bad idea. But, as everyone knows, the good times don't last. As the number of users grows, so does the load on the system. At the same time, the system must gain new features to meet users' needs. As it grows more complex, it becomes harder and harder to maintain and extend, and its scalability and availability suffer. How do we solve these problems? The sensible answer is to split the system, which is a form of decoupling. We divide the original system into subsystems according to some criterion, such as business relevance, with each subsystem responsible for different functions.

After the split, each subsystem can be extended and maintained on its own, which improves the system's extensibility and maintainability. Horizontal scalability (scale out) also improves greatly, because we can scale out only the subsystems under heavy load without affecting the others. Before the split, every increase in load meant scaling the entire large system, which is comparatively expensive. Splitting also reduces coupling between subsystems. When one subsystem is temporarily unavailable, the system as a whole remains available, so overall availability improves greatly.

A large-scale Internet application must therefore be split, because only then do its scalability, maintainability, extensibility, and availability improve. Splitting also creates a problem, however: how do subsystems communicate, and by what means? Communication is either synchronous or asynchronous. This section covers synchronous communication; the "message system" topic below covers asynchronous communication. Because subsystems must communicate, a high-performance remote invocation framework becomes essential, and Taobao has its own: HSF.

Those are the benefits of splitting, but splitting inevitably brings new problems. Besides the communication problem just mentioned, the one most worth attention is the dependencies between systems. As the number of systems grows, their dependencies become complicated. This calls for more care in choosing splitting criteria. For example, we can consider making some interdependent systems vertical, so that each system's functions are as vertically self-contained as possible. Taobao is currently working on making its systems vertical. We must also watch for circular dependencies between systems. They call for great care, because no system in a cycle can be started first, and startup can fail in a chain.
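One way to catch the circular-dependency problem before it breaks startup is to compute a startup order with a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the system names are hypothetical. If the dependency graph contains a cycle, no valid order exists and `graphlib.CycleError` is raised. The problem is reported before any system is actually started.

```python
from graphlib import CycleError, TopologicalSorter


def startup_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return an order in which systems can start, dependencies first.

    Each key maps a system to the systems it calls at startup. Raises
    graphlib.CycleError if the dependencies form a cycle.
    """
    # TopologicalSorter treats each value as the predecessors of its key.
    return list(TopologicalSorter(depends_on).static_order())


print(startup_order({"trade": {"user", "item"}, "item": {"user"}, "user": set()}))
# ['user', 'item', 'trade']

try:
    startup_order({"trade": {"item"}, "item": {"trade"}})
except CycleError as err:
    print("circular dependency:", err.args[1])  # nodes forming the cycle
```

When a cycle is found, the second element of the exception's `args` lists the nodes that form it, which makes for a useful error message.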

Now that we understand why splitting matters, let's look at how Taobao split its own system as its business grew.

First, look at the following figure:



The figure above shows how Taobao's system evolved. The split we are discussing took place between V2.2 and V3.0. In V2.2, almost all of Taobao's logic lived in a single system (Denali), which made extending and modifying it very troublesome. Worse, the V2.2 architecture could not support Taobao's rapid future growth as business volume kept rising. We therefore decided to split the whole system. The resulting V3.0 architecture is shown below:



As the figure shows, V3.0 splits the system in two directions, horizontal and vertical, into business systems, core business systems, and basic services. Each system can then be maintained independently and scaled horizontally on its own. For example, the transaction system can be scaled and extended independently without affecting other systems.



In short, a large system must be split if it is to become maintainable, extensible, and scalable. Splitting inevitably raises two questions: how systems communicate, and how their dependencies are managed. For communication, Taobao developed its own high-performance service framework, HSF. It handles synchronous and asynchronous communication among all of Taobao's subsystems. At present HSF is used mainly for synchronous calls, and FutureTask-style asynchronous calls are less common. Taobao still does not manage dependencies between systems well enough, and this is a problem we will try to solve in the future.
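HSF's own API is not shown in this article. As a rough, hypothetical illustration of the FutureTask-style calling pattern mentioned above, the sketch uses Python's `concurrent.futures`. The call returns a future immediately, the caller keeps working, and it blocks only when it needs the result. `query_inventory` is an invented stand-in for a remote service call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def query_inventory(item_id: int) -> int:
    """Hypothetical stand-in for a remote call to another subsystem."""
    time.sleep(0.1)  # simulated network latency
    return 7


with ThreadPoolExecutor(max_workers=4) as pool:
    # Issue the call without blocking, much like wrapping an RPC in a FutureTask.
    future = pool.submit(query_inventory, 1001)
    local_work = sum(range(1000))  # the caller does other work meanwhile
    stock = future.result(timeout=1.0)  # block only when the result is needed
```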

4. Database splitting (TDDL)

In the "application splitting" topic, we said that a large Internet application needs to be split well, but we covered only application-level splitting. Another very important aspect is how to split storage. This topic therefore covers splitting the storage system, usually an RDBMS.

With the topic established, let's again review the problems an Internet application meets as it grows from small to large, and use them to show why splitting the RDBMS matters.

When the system first launches there are few users, so all data sits in one database. With little load, one database copes easily. Then the operations team works hard and promotes aggressively, and one day we find that the number of users has suddenly shot up. The database soon goes down. Our engineers investigate and find that the read load on the database is too high. At that point everyone knows it is time to separate reads from writes. We configure one server as the master and several servers as slaves. Read/write separation spreads the read load across the slaves, and the system returns to normal. But the good times don't last. One day the master can no longer keep up: its load is too high, and it could fall over at any moment. Now we need vertical partitioning (splitting into multiple databases). For example, we can store product information, user information, and transaction information in separate databases, and each of them, such as the product database, can still use master/slave replication. Once the data is split by function, write load is spread across different servers, and database load finally returns to normal. Can we relax now? No. This "no" is not just my opinion; it is a lesson our predecessors learned from experience.
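The read/write separation described above can be sketched as a small router. It sends writes to the master and spreads reads across the slaves round-robin. The host names and the `SELECT`-prefix rule are simplifying assumptions for illustration; a real data access layer inspects statements more carefully. Master-to-slave replication is usually asynchronous, so reads that must see a just-committed write can be pinned to the master with `force_master=True`.

```python
import itertools


class ReadWriteRouter:
    """Routes SQL to the master or to round-robin slaves (host names are hypothetical)."""

    def __init__(self, master: str, slaves: list[str]):
        self.master = master
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql: str, force_master: bool = False) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and not force_master and self._slaves is not None:
            return next(self._slaves)  # spread read load across slaves
        return self.master  # all writes (and pinned reads) go to the master


router = ReadWriteRouter("db-master", ["db-slave-1", "db-slave-2"])
print(router.route("SELECT * FROM item WHERE id = 1"))  # db-slave-1
print(router.route("UPDATE item SET price = 10 WHERE id = 1"))  # db-master
```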
As users keep growing, some tables become extremely large, such as the friend-relationship table and store configuration tables. Reading from or writing to these tables becomes very costly for the database, so we now need horizontal partitioning (splitting tables, also called sharding).
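Below is a minimal sketch of horizontal partitioning. It assumes the friend-relationship table is split by `user_id` into 32 logical table shards spread over 4 physical databases. The counts and the `friend_db_N` / `friend_NNNN` naming are hypothetical. Routing by modulo keeps all of one user's rows in the same shard, so per-user queries touch a single table.

```python
NUM_DBS = 4          # hypothetical number of physical databases
TABLES_PER_DB = 8    # hypothetical number of table shards per database
TOTAL_SHARDS = NUM_DBS * TABLES_PER_DB  # 32 logical shards


def locate(user_id: int) -> tuple[str, str]:
    """Map the sharding key to (database, table) for the split friend table."""
    slot = user_id % TOTAL_SHARDS      # which logical shard owns this user
    db_index = slot // TABLES_PER_DB   # consecutive shards share a database
    return f"friend_db_{db_index}", f"friend_{slot:04d}"


print(locate(59))  # ('friend_db_3', 'friend_0027')
```

Fixing the number of logical shards up front (32 here) and mapping them onto physical databases helps later growth. Adding databases then moves whole table shards instead of rehashing individual rows.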

I have said a lot above, but it all comes down to one fact: the database is the hardest layer of a system to scale out. A large Internet application inevitably moves through several stages. It starts with a single database server, moves to master/slave, then to vertical partitioning (multiple databases), and finally to horizontal partitioning (table splitting, sharding). Master/slave replication and vertical partitioning are relatively easy and have little impact on the application. Splitting tables, however, raises hard problems. Data can no longer be joined or queried across partitions, and load must be balanced across shards. At this point a general-purpose data access layer (DAL) framework is needed. It shields the application logic from the underlying storage, so that data access is transparent to the application.
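One thing such a DAL has to hide is a query, ordered by time, that must span several shards. The sketch below shows a simple scatter-gather. It runs the query on every shard and merges the already-sorted partial results with `heapq.merge`. The shard callables are hypothetical stand-ins for per-shard SQL. The approach assumes each shard returns its rows sorted by the same column, with the `LIMIT` pushed down to every shard.

```python
import heapq
from itertools import islice
from typing import Callable, Iterable


def scatter_gather(shards: list[Callable[[], Iterable[dict]]],
                   order_key: str, limit: int) -> list[dict]:
    """Run the same query on every shard and return the global top `limit` rows.

    Each shard must return rows sorted by order_key and at least `limit` rows
    when it has them (e.g. 'ORDER BY ts LIMIT n' executed on each shard).
    """
    per_shard = [shard() for shard in shards]  # scatter: one query per shard
    merged = heapq.merge(*per_shard, key=lambda row: row[order_key])  # gather
    return list(islice(merged, limit))


shard_a = lambda: [{"id": 1, "ts": 10}, {"id": 3, "ts": 30}]
shard_b = lambda: [{"id": 2, "ts": 20}, {"id": 4, "ts": 40}]
print([row["id"] for row in scatter_gather([shard_a, shard_b], "ts", 3)])  # [1, 2, 3]
```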

Taobao is currently moving from expensive high-end storage (minicomputers plus Oracle) to MySQL. After the move it will inevitably face vertical partitioning (multiple databases) and horizontal partitioning (sharding). Taobao has therefore developed its own TDDL framework, tailored to its business. TDDL mainly makes database and table splitting transparent to applications and handles data replication between heterogeneous databases.

5. Asynchronous communication (Notify)

In the discussion of the remote invocation framework (HSF), we said that a large system must be split to meet scalability and extensibility requirements. After splitting, communication between subsystems becomes the primary problem. That discussion covered synchronous communication in a large distributed system. This section covers asynchronous communication, which is where message middleware comes in. Asynchronous communication is closely tied to system scalability and to decoupling subsystems as much as possible.

One caveat about asynchronous communication: the decision to go asynchronous must follow from the characteristics of the business. Asynchronous communication typically suits loosely coupled interactions. For systems whose business logic is tightly related, synchronous communication remains the more reliable choice.

Now let's look at what asynchrony brings. First, suppose a system consists of two subsystems, A and B, that communicate synchronously. To improve the scalability of the whole system, A and B must be scaled together, which hampers scale-out. Second, synchronous calls hurt availability. Logically, if A calls B synchronously, A can be available only if B is available. By the contrapositive, if B is unavailable, A is unavailable too, which greatly reduces availability. Third, asynchronous communication between systems can greatly shorten response times. Each request returns sooner, which improves the user experience. In short, asynchrony improves scalability and availability and greatly shortens request response time, although the total time to process a request may not decrease.
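A small worked calculation makes the availability argument above concrete. It assumes, purely for illustration, that failures are independent and that each system is available 99.9% of the time; neither figure comes from Taobao. If A calls B synchronously, a request to A succeeds only when both are up:

$$
A_{\text{req}} = a_A \cdot a_B = 0.999 \times 0.999 = 0.998001 \approx 99.80\%
$$

With $n$ synchronous downstream dependencies, each at availability $a$, the loss compounds:

$$
A_{\text{req}} = a^{\,n+1}, \qquad 0.999^{11} \approx 0.9891 \quad (n = 10)
$$

Suppose A instead hands work to B through message middleware. A's request path then no longer needs B to be up at that moment. It depends roughly on $a_A$ and the middleware's own availability, and B catches up on queued messages once it recovers.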

Let's see how Taobao applies asynchrony in its business. The trading system interacts with many other business systems. If a transaction used synchronous calls throughout, it would succeed only if every system it depends on were available. With asynchronous communication, the trading system is decoupled from the other systems through the Notify message middleware. An unavailable downstream system then does not cause a transaction to fail, which improves the system's availability.
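The decoupling described above can be sketched with a tiny in-process broker. This is not Notify's API; `Broker`, `place_order`, and the `trade.created` topic are hypothetical. Each subscriber has its own queue, so the trading system's `publish` returns at once. A subscriber that is down keeps its messages queued until a later delivery succeeds, and the transaction does not fail.

```python
from collections import defaultdict, deque
from typing import Callable


class Broker:
    """Toy in-process message broker (a stand-in, not Notify's API).
    Each subscriber has its own queue, so a failing subscriber never blocks
    the publisher or the other subscribers."""

    def __init__(self) -> None:
        self.queues: dict[str, dict[str, deque]] = defaultdict(dict)
        self.handlers: dict[str, Callable[[dict], None]] = {}

    def subscribe(self, topic: str, name: str, handler: Callable[[dict], None]) -> None:
        self.queues[topic][name] = deque()
        self.handlers[name] = handler  # subscriber names are assumed unique

    def publish(self, topic: str, message: dict) -> None:
        # Enqueue for every subscriber and return immediately.
        for queue in self.queues[topic].values():
            queue.append(message)

    def deliver(self, topic: str) -> None:
        """Drain each subscriber's queue; on failure keep the message for a retry."""
        for name, queue in self.queues[topic].items():
            while queue:
                try:
                    self.handlers[name](queue[0])
                except Exception:
                    break  # subscriber unavailable: retry on the next delivery
                queue.popleft()


def place_order(broker: Broker, order_id: int) -> str:
    # The trade commits locally (omitted); downstream systems are told asynchronously.
    broker.publish("trade.created", {"order_id": order_id})
    return "ok"
```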

6. Unstructured data storage (TFS, NoSQL)

In a large Internet application, not all data is structured. Configuration files, a user's activity feed, and transaction snapshots are generally ill-suited to an RDBMS and fit a key-value model better. Another kind of data is very large in volume but has low real-time requirements, and it too calls for a different storage method. Then there are static files, such as product images and product descriptions. These items are relatively large, so putting them in an RDBMS would hurt read performance, including the read performance of other data. They should therefore be stored separately. Internet applications typically keep such data in a distributed file system, and Taobao has developed its own, TFS. TFS currently limits file size to 2 MB, so it suits data smaller than 2 MB.

As the Internet has developed, NoSQL has become increasingly popular in the industry since the second half of 2008. According to the CAP theorem, a system cannot satisfy consistency, availability, and partition tolerance all at once; at most two can be satisfied at the same time. Traditional relational databases use ACID transactions, which emphasize strong consistency at some cost to availability. Internet applications, however, often value availability somewhat more than consistency. In those cases we move away from ACID transactions toward the BASE strategy: Basically Available, Soft state, Eventually consistent. BASE improves availability by settling for eventual consistency. Many NoSQL products follow this approach, Facebook's Cassandra being a well-known example. These products and others such as Apache HBase and Google Bigtable suit unstructured data well, such as key-value data, and they scale out horizontally very well. Taobao is currently researching and adopting some mature NoSQL products.
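Eventual consistency can be illustrated with last-write-wins (LWW) replicas. LWW is a simple reconciliation rule, similar in spirit to the timestamp-based conflict resolution used by Cassandra. The sketch below is a toy that assumes the caller supplies timestamps, ties are broken by replica name, and replicas converge only when they exchange state via `sync_from`. Each replica stays writable on its own, which is the availability that BASE buys by giving up immediate consistency.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Replica:
    """Key-value replica reconciled with last-write-wins (LWW)."""
    name: str
    # key -> (timestamp, writer replica, value)
    data: dict[str, tuple[int, str, str]] = field(default_factory=dict)

    def put(self, key: str, value: str, ts: int) -> None:
        # Accept the write locally even if peers are unreachable.
        self._apply(key, (ts, self.name, value))

    def get(self, key: str) -> Optional[str]:
        entry = self.data.get(key)
        return entry[2] if entry else None

    def _apply(self, key: str, entry: tuple[int, str, str]) -> None:
        # Higher timestamp wins; the writer name breaks ties deterministically.
        if key not in self.data or entry[:2] > self.data[key][:2]:
            self.data[key] = entry

    def sync_from(self, other: "Replica") -> None:
        """Anti-entropy: pull every entry from a peer and keep the winners."""
        for key, entry in other.data.items():
            self._apply(key, entry)


a, b = Replica("a"), Replica("b")
a.put("cfg", "v1", ts=1)
b.put("cfg", "v2", ts=2)  # replicas temporarily disagree
a.sync_from(b)
b.sync_from(a)
print(a.get("cfg"), b.get("cfg"))  # v2 v2
```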

7. Monitoring and early-warning systems

In a large system, the only thing you can rely on is that some parts of it are unreliable.

A large distributed system inevitably involves many kinds of equipment, such as network switches, ordinary PCs, network cards, hard disks, and memory, all in large numbers. The probability that something fails rises accordingly. We therefore need to monitor the system's state continuously, and monitoring comes in different granularities. At a coarse granularity, we monitor the application system as a whole: current network traffic, memory utilization, I/O and CPU load, request pressure on services, service response times, and so on. At a finer granularity, we monitor individual features of the application: how many times a given URL is visited, the PV (page views) of each page, how much bandwidth a page consumes per day, how long pages take to render, how much bandwidth static resources such as images consume per day, and so on. A monitoring system is therefore indispensable.

Once a monitoring system is in place, it is even more important to pair it with an early-warning (alerting) system. The system can raise an alert automatically in several situations: when traffic to a page surges, when CPU or memory usage on a server suddenly climbs, or when large numbers of concurrent requests are being lost. Combining monitoring with alerting lets us respond quickly to problems and improves the system's stability and availability.
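A minimal sketch of such an alert rule follows: fire when the moving average of a metric over a small window crosses a threshold. The metric names, thresholds, and window size are hypothetical. Averaging over several samples avoids alerting on a single spike. The alert fires once per breach and re-arms after the metric recovers, so the on-call engineer is not flooded with duplicates.

```python
from collections import deque
from typing import Optional


class ThresholdAlert:
    """Alerts when the average of the last `window` samples exceeds `threshold`."""

    def __init__(self, metric: str, threshold: float, window: int = 5):
        self.metric = metric
        self.threshold = threshold
        self.samples: deque = deque(maxlen=window)  # oldest sample drops off automatically
        self.firing = False

    def observe(self, value: float) -> Optional[str]:
        """Record a sample; return a message only when the alert starts firing."""
        self.samples.append(value)
        if len(self.samples) < self.samples.maxlen:
            return None  # not enough data yet
        avg = sum(self.samples) / len(self.samples)
        if avg > self.threshold and not self.firing:
            self.firing = True
            return f"ALERT {self.metric}: avg {avg:.1f} > {self.threshold}"
        if avg <= self.threshold:
            self.firing = False  # recovered; the next breach alerts again
        return None


cpu = ThresholdAlert("cpu", threshold=80, window=3)
for sample in (90, 95, 85):
    message = cpu.observe(sample)
print(message)  # ALERT cpu: avg 90.0 > 80
```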

8. Unified configuration management

A large distributed application usually consists of many nodes. Without central management, adding a node may require changing the configuration of the other nodes, and removing one may require changes too. This hampers maintenance and management and also makes errors more likely. Moreover, many systems in a cluster share the same configuration. Without unified management, a copy must be maintained on every system, which makes managing and maintaining configuration very troublesome. Unified configuration management solves these problems. When a node is added or removed, the configuration management system notifies every node to update its configuration. All nodes stay consistent, conveniently and without errors.
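The push model described above can be sketched as follows. A central service holds one shared, versioned configuration and notifies every registered node on each change. `ConfigCenter`, `Node`, and the configuration keys are hypothetical; a real system would push over the network and handle nodes that miss an update. Version numbers let a node ignore stale or duplicate pushes, and a newly registered node receives the current configuration immediately.

```python
from typing import Callable


class ConfigCenter:
    """Hypothetical central configuration service that pushes changes to nodes."""

    def __init__(self) -> None:
        self.config: dict[str, object] = {}
        self.version = 0
        self.listeners: dict[str, Callable[[int, dict], None]] = {}

    def register(self, node_id: str, on_update: Callable[[int, dict], None]) -> None:
        self.listeners[node_id] = on_update
        on_update(self.version, dict(self.config))  # new node starts with current config

    def unregister(self, node_id: str) -> None:
        self.listeners.pop(node_id, None)

    def set(self, key: str, value: object) -> None:
        self.config[key] = value
        self.version += 1
        for notify in self.listeners.values():
            notify(self.version, dict(self.config))  # each node gets its own copy


class Node:
    def __init__(self, node_id: str, center: ConfigCenter):
        self.version, self.config = -1, {}
        center.register(node_id, self.apply)

    def apply(self, version: int, config: dict) -> None:
        if version > self.version:  # ignore stale or duplicate pushes
            self.version, self.config = version, config


center = ConfigCenter()
app1 = Node("app-1", center)
center.set("db.url", "mysql://db-a")
app2 = Node("app-2", center)  # joins later, still gets the current config
center.set("db.url", "mysql://db-b")
print(app1.config == app2.config)  # True
```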

 

http://www.iteye.com/news/31516
