[Spring Cloud 3] High-availability design and scalable design under distributed architecture

Chapter 1 High-availability design under distributed architecture

1. Avoid single points of failure

  1. Load balancing (failover, server selection, hardware load balancers, decentralized software load balancing)
  2. Hot Standby
  3. Multiple data centers (same-city disaster recovery, remote disaster recovery)

2. High availability of application

  1. Fault monitoring (system monitoring (CPU, memory), link monitoring, log monitoring) with automatic alerting
  2. Application fault-tolerance design: service degradation and rate limiting for self-protection
  3. Data layer (data sharding, read/write splitting)

3. Scalability design under distributed architecture

  1. Vertical scaling: improve hardware capabilities
  2. Horizontal scaling: add servers

4. Speeding up static content access with a CDN

CDN stands for Content Delivery Network.

The role of a CDN is to distribute content to nodes close to users, so that users can quickly obtain the content they need.

A CDN is essentially a network caching technology that places relatively stable resources closer to end users. On the one hand this saves bandwidth across the WAN; on the other, it speeds up user access and improves the user experience.

5. How to achieve high availability

1. Entry layer

The entry layer usually refers to Nginx- and Apache-level components responsible for the application's service entry (whether a web application or a mobile app). We usually point the service at a single IP; if the server behind that IP goes down, user access is certainly interrupted. Here, keepalived can be used to achieve high availability at the entry layer. For example, if machine A's IP is 1.2.3.4 and machine B's IP is 1.2.3.5, we apply for a third IP, 1.2.3.6 (called the heartbeat IP), which is normally bound to machine A. If A goes down, the IP is automatically rebound to machine B; if B then goes down, it is rebound back to machine A. With this scheme, we point DNS at the heartbeat IP to achieve high availability at the entry layer.
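As an illustration of the heartbeat-IP setup described above, here is a minimal keepalived configuration sketch for machine A (1.2.3.4), with the heartbeat IP 1.2.3.6 as the virtual IP. The interface name, router ID, and password are placeholders, not values from the article; machine B (1.2.3.5) would use `state BACKUP` and a lower `priority`:

```
vrrp_instance VI_1 {
    state MASTER            # BACKUP on machine B
    interface eth0          # placeholder: the NIC carrying the heartbeat
    virtual_router_id 51    # must match on both machines
    priority 100            # lower value (e.g. 90) on machine B
    advert_int 1            # heartbeat interval in seconds
    authentication {
        auth_type PASS
        auth_pass secret    # placeholder shared secret
    }
    virtual_ipaddress {
        1.2.3.6             # the heartbeat IP that fails over
    }
}
```

When machine A stops sending VRRP advertisements, machine B promotes itself and takes over 1.2.3.6, which is exactly the one-to-two-second switchover window discussed below.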

This scheme has a few drawbacks, however.

First, failover may interrupt service for one to two seconds, which is fine as long as strict millisecond-level requirements do not apply.

Second, one entry machine is wasted: of the two machines purchased, only one is in use at any time. For long-connection applications, failover also breaks connections, so the client must cooperate by reconnecting. In short, for ordinary businesses, this solution solves most of the problem.

It should be noted here that keepalived has some restrictions on its use.

  1. The two machines must be on the same network segment; machines on different segments cannot take over each other's IP.
  2. Intranet services can also use a heartbeat IP, but note one caveat: in the past, for security, intranet services were bound to intranet IPs to avoid exposure. To use keepalived, however, a service must listen on all IPs (if it listens only on the heartbeat IP, it cannot start on the machine that does not currently hold that IP). The simple solution is to use iptables to prevent intranet services from being accessed from the external network.

  3. Server utilization drops, since one machine sits idle. Mixed deployment can be considered to improve this.

A common mistake is to think that with two machines, two public IPs, and a domain name resolving to both IPs in DNS, high availability has been achieved. This is not high availability at all: if one machine goes down, roughly half of the users cannot access the service.

Besides keepalived, LVS can also solve the high availability problem at the entry layer. However, compared with keepalived, LVS is more complicated and has a higher barrier to entry.

2. Business layer

The business layer usually consists of logic code written in PHP, Java, Python, Go, and so on, and depends on back-end databases and caches. How can the business layer achieve high availability? The key is that the business layer should hold no state; state should be pushed down into the cache layer and the database. In practice, people often like to keep the following kinds of data in the business layer.

First, sessions, the data related to user login. A better practice is to store sessions in a database or in a relatively stable cache system.

Second, caches. If a database query is slow, one may want to keep its results in the process so that the next query does not hit the database. The problem with this approach is that with more than one business-layer server, this per-process data is hard to keep consistent, and stale data may be read from the cache.

A simple principle: the business layer should hold no state.

When the business layer holds no state and one business-layer server goes down, Nginx/Apache automatically sends all requests to another business-layer server. Since there is no state, the two servers are interchangeable, and users notice nothing. If the session were kept in the business layer, a user logged in on one machine would be logged out as soon as that process died.

Note: cookie sessions were popular for a while. The idea is to encrypt the session data, put it in the client's cookie, and send it back to the client, so the server can be completely stateless. But this approach has many pitfalls; only use it if you can avoid them.
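To make the pitfalls below concrete, here is a minimal sketch of the integrity half of a cookie session: signing the payload with HMAC-SHA256 so that tampering is detected. A real implementation would also encrypt the payload (the article's scheme) and add an expiry to limit replay; the class and method names here are illustrative, not from the article:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class CookieSession {
    private final byte[] key; // pitfall 1: if this leaks, anyone can forge identities

    public CookieSession(byte[] key) { this.key = key; }

    // Pack the payload as "base64(payload).base64(hmac)" for storage in a cookie.
    public String seal(String payload) {
        byte[] body = payload.getBytes(StandardCharsets.UTF_8);
        Base64.Encoder enc = Base64.getUrlEncoder().withoutPadding();
        return enc.encodeToString(body) + "." + enc.encodeToString(sign(body));
    }

    // Return the payload if the signature is valid, or null if missing or tampered.
    public String open(String cookie) {
        int dot = cookie.lastIndexOf('.');
        if (dot < 0) return null;
        try {
            byte[] body = Base64.getUrlDecoder().decode(cookie.substring(0, dot));
            byte[] sig = Base64.getUrlDecoder().decode(cookie.substring(dot + 1));
            // Constant-time comparison to avoid timing side channels.
            if (!MessageDigest.isEqual(sig, sign(body))) return null;
            return new String(body, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            return null; // malformed base64
        }
    }

    private byte[] sign(byte[] body) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(body);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that signing alone does not stop pitfall 2: a validly signed cookie can still be replayed until it expires, so the payload needs a timestamp or nonce as well.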

The first pitfall is keeping the encryption key secret. Once leaked, an attacker can forge anyone's identity.

The second pitfall is replay attacks: preventing an attacker from reusing a saved cookie, for example to retry a verification code indefinitely. There are other attack vectors as well.

If these two problems cannot be solved well, avoid cookie sessions; it is better to put the session in a cache than in a cookie.

3. Cache layer

A very simple architecture has no concept of caching. But once traffic grows, databases such as MySQL can no longer keep up. For example, when running MySQL on a SATA disk, once QPS reaches 200, 300, or even 500, MySQL's performance drops significantly. At that point a cache layer can be introduced to absorb most service requests and increase the overall capacity of the system.

A simple way to make the cache layer highly available is to shard it. For example, if the cache layer is a single machine, then when that machine goes down, all application-layer pressure falls on the database; if the database cannot handle it, the entire website (or application) crashes. If instead the cache layer is split across four machines, each holds only a quarter of the cache, and losing one machine pushes only a quarter of the total traffic onto the database. If the database can absorb that, the site runs steadily until the cache layer recovers. In practice a quarter is usually still too much, so the cache is split more finely to ensure the database survives a single cache node failure. At small and medium scale, the cache layer and business layer can be co-deployed to save machines.
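A minimal sketch of the sharding idea above, assuming simple hash-modulo placement: with four shards, killing one node turns only about a quarter of lookups into cache misses that fall through to the database. All class and method names are illustrative:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ShardedCache {
    private final List<Map<String, String>> nodes = new ArrayList<>();

    public ShardedCache(int shards) {
        for (int i = 0; i < shards; i++) nodes.add(new HashMap<>());
    }

    // Simple modulo placement: each key always lives on exactly one shard.
    private int shardOf(String key) {
        return Math.floorMod(key.hashCode(), nodes.size());
    }

    public void put(String key, String value) {
        Map<String, String> node = nodes.get(shardOf(key));
        if (node != null) node.put(key, value); // a dead node silently drops writes
    }

    // null means a cache miss: the caller falls through to the database.
    public String get(String key) {
        Map<String, String> node = nodes.get(shardOf(key));
        return node == null ? null : node.get(key);
    }

    public void kill(int shard) { nodes.set(shard, null); } // simulate a crashed node
}
```

With 4 shards, killing one turns roughly 25% of gets into database reads; splitting into 10 or 20 shards shrinks that blast radius accordingly, which is the "split more finely" advice above.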

4. Database layer

High availability at the database level is usually achieved in software. For example, MySQL offers master-slave (Master-Slave) and master-master (Master-Master) modes, and MongoDB has replica sets (ReplicaSet); these basically meet most needs.
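As a rough illustration of how an application uses master-slave replication, here is a hypothetical router that sends writes to the master and spreads reads round-robin across the slaves; the names and the SELECT-prefix heuristic are assumptions for the sketch. Real setups must also handle replication lag (a read immediately after a write may hit a stale slave), which this sketch ignores:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ReadWriteRouter {
    private final String master;
    private final List<String> slaves;
    private final AtomicInteger next = new AtomicInteger();

    public ReadWriteRouter(String master, List<String> slaves) {
        this.master = master;
        this.slaves = slaves;
    }

    // Writes (and anything not clearly a read) go to the master;
    // reads are spread round-robin across the slaves.
    public String route(String sql) {
        boolean isRead = sql.trim().toLowerCase().startsWith("select");
        if (!isRead || slaves.isEmpty()) return master;
        return slaves.get(Math.floorMod(next.getAndIncrement(), slaves.size()));
    }
}
```

This read/write splitting is also how the "separation of read and write" item from Chapter 1's data-layer list is typically realized.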

In short, to achieve high availability: the entry layer uses heartbeats, business-layer servers are stateless, the cache layer is sharded finely, and the database runs in master-slave mode. This model does not need many servers; all of these components can be co-deployed on two servers, which is enough for early-stage high availability. Users are completely unaware when either server goes down.

Chapter 2 Scalable Design under Distributed Architecture

1. Scalability (Scalable/Scalability)

Scalability is a design metric for the computing and processing capacity of a software system. High scalability represents flexibility: as the system expands and grows, its processing capacity can grow linearly with few software changes, or even just by adding hardware, achieving high throughput, low latency, and high performance.

Scalability is fundamentally different from pure performance tuning. Scalability is a comprehensive balance of high performance, low cost, and maintainability; it emphasizes smooth, linear performance growth, focuses on horizontal scaling, and achieves distributed computing on cheap servers. Ordinary performance optimization, by contrast, only tunes the performance of a single machine. Both involve a trade-off between throughput and latency according to the application's characteristics. Of course, horizontal scaling by partitioning brings with it the constraints of the CAP theorem.

Scalable software design is important but hard to master. The industry tries to save developers effort through cloud computing and high-concurrency languages, but no matter what technology is adopted, if the application is monolithic, for example heavily dependent on a database, then once the system reaches a certain access scale, load concentrates on one or two database servers, and partitioning and scaling out become difficult. As Gavin King, creator of the Hibernate framework, put it: the relational database is the least scalable component.

2. Performance and scalability

  • What is a performance problem? If the system is slow for a single user, that is a performance problem;

  • What is a scalability problem? If the system is fast for a single user but becomes slow under high traffic as the user base grows, that is a scalability problem.

3. Latency and throughput

Latency and throughput are a pair of metrics for measuring scalability; we want a low-latency, high-throughput architecture. Latency is the response time the user perceives, for example a web page opening within a few seconds: the shorter the response time, the lower the latency. Throughput indicates how many users can enjoy this low latency at the same time. If pages open slowly under high concurrency, the architecture's throughput needs to be improved.

The goal of scalability is to obtain maximum throughput with acceptable latency. The goal of reliability (availability) is to obtain consistency of data updates with acceptable latency.

4. How to achieve scalability

1. Entry layer

Scalability at the entry layer can be achieved by scaling machines horizontally and adding their IPs to DNS. Note, however, that although a domain name can resolve to dozens of IPs without problems, many browsers and clients use only the first few. Some DNS providers optimize for this (for example, randomizing the order of returned IPs), but the effect of this optimization is unstable.

The recommended approach is to use a small number of Nginx machines as the entry point, with business servers hidden on the intranet (most HTTP-style businesses work this way). Alternatively, send all IPs to the client and do some scheduling on the client side (especially for non-HTTP services, such as games and live streaming).

2. Business layer

How can the business layer be made scalable? As with high availability, the key is to keep it stateless; beyond that, simply add machines and continue deploying horizontally.

3. Cache layer

The cache layer is the trickiest to scale. The simplest, crudest way: take advantage of low traffic in the middle of the night, take the entire cache layer offline, bring up the new cache layer, and let the new caches warm up slowly. Of course, this requires that the database can withstand the request volume during the low-traffic period. What if it cannot? That depends on the type of cache, so let us first distinguish cache types.

  • Strongly consistent cache: cannot serve stale data from the cache (e.g., a user's account balance, or data that will be cached again downstream).
  • Weakly consistent cache: can tolerate stale data from the cache for a period of time (e.g., a Weibo repost count).
  • Immutable cache: the value for a given cache key never changes (e.g., a hash derived from a password with SHA1, or the result of some other complex formula).

So which cache types scale well? Weakly consistent and immutable caches are easy to scale: just use consistent hashing. The strongly consistent case is slightly more complicated and is discussed below. The reason for using consistent hashing rather than simple hashing is the cache miss rate during scaling: with simple hashing, expanding from 9 to 10 machines immediately invalidates about 90% of the cache, whereas with consistent hashing only about 10% is invalidated.
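The consistent-hashing claim above can be checked with a small sketch: a hash ring built from virtual nodes, where going from 9 to 10 cache machines remaps only roughly 10% of keys. The implementation details (MD5, 100 virtual nodes per server) are illustrative choices, not prescribed by the article:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.TreeMap;

public class ConsistentHash {
    private static final int VNODES = 100; // virtual nodes per server, for even spread
    private final TreeMap<Long, String> ring = new TreeMap<>();

    public void addNode(String node) {
        for (int i = 0; i < VNODES; i++) ring.put(hash(node + "#" + i), node);
    }

    // Walk clockwise from the key's position to the first virtual node on the ring.
    public String nodeFor(String key) {
        Long k = ring.ceilingKey(hash(key));
        return ring.get(k != null ? k : ring.firstKey());
    }

    // MD5-based hash for reasonably uniform ring positions.
    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xffL);
            return h;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Adding a tenth node only claims the ring segments its virtual nodes land on, so about 1/10 of keys move; with simple modulo hashing, changing the divisor from 9 to 10 would reshuffle almost every key.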

So what are the problems with strongly consistent caches? The first problem is that cache clients do not all receive the configuration update at the same moment, and within that window it is possible to read stale data. The second problem is that if nodes are removed after expansion, dirty data can be read. For example, key a lived on machine 1 before expansion and on machine 2 afterwards, where its value was updated; if the node is later removed and key a maps back to machine 1, the stale value on machine 1 is read.

Problem 2 is relatively simple to solve: either never shrink the node set, or make the interval between node adjustments longer than the data's validity period. Problem 1 can be solved with the following steps:

  1. Push both sets of hash configurations to every client, but keep using the old configuration;
  2. Switch each client to use the cache only when the two hash results agree, reading from the database otherwise but still writing to the cache;
  3. Notify each client to switch to the new configuration.
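Step 2 above, trusting the cache only when both hash configurations agree, can be sketched as follows. The node maps and store types are stand-ins for real cache and database clients, and all names are illustrative:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class MigratingCacheClient {
    private final Function<String, String> oldConfig; // key -> node name, old hash layout
    private final Function<String, String> newConfig; // key -> node name, new hash layout
    private final Map<String, Map<String, String>> cacheNodes; // node name -> its store
    private final Map<String, String> database;

    public MigratingCacheClient(Function<String, String> oldConfig,
                                Function<String, String> newConfig,
                                Map<String, Map<String, String>> cacheNodes,
                                Map<String, String> database) {
        this.oldConfig = oldConfig;
        this.newConfig = newConfig;
        this.cacheNodes = cacheNodes;
        this.database = database;
    }

    // Trust the cache only when both layouts place the key on the same node;
    // otherwise fall back to the database and warm the cache at the new location.
    public String get(String key) {
        String oldNode = oldConfig.apply(key);
        String newNode = newConfig.apply(key);
        if (oldNode.equals(newNode)) {
            String cached = cacheNodes.get(oldNode).get(key);
            if (cached != null) return cached;
        }
        String value = database.get(key);
        cacheNodes.get(newNode).put(key, value);
        return value;
    }
}
```

Once every key's new location has been warmed this way, step 3 switches clients to the new configuration alone, and the stale-read window of problem 1 never opens.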

Memcached was designed relatively early, so its scalability and high availability were not considered very thoroughly. Redis has made many improvements in this area; in particular, the @ngaut team built codis on top of Redis, which solves most cache-layer problems in one stroke and is worth a look.

4. Database

There are many methods, and plenty of documentation, for scaling the database layer, so they are not covered in detail here. The general approaches are: horizontal splitting, vertical splitting, and periodic rolling (archiving old data by time).

In short, the methods and techniques introduced above achieve high availability and scalability at the entry, business, cache, and database layers. Specifically: the entry layer uses heartbeats for high availability and parallel deployment for scaling; the business layer stays stateless; the cache layer is sharded finely for high availability, and consistent hashing makes it scalable; the database layer uses master-slave mode to solve high availability, and splitting and rolling to solve scalability.

Previous: [Spring Cloud 2] Software Architecture Design

Next: SpringCloud learning general outline


Origin blog.csdn.net/guorui_java/article/details/112102190