Designing a High-Performance Flash-Sale (Seckill) System, in Depth!

Foreword

Everyone is familiar with flash sales. Since around 2011, whether it is Double 11 shopping or grabbing train tickets on 12306, flash-sale scenarios have been everywhere. Simply put, a flash sale is a large number of requests competing at the same moment to buy the same item and complete the transaction. From an architectural perspective, a flash-sale system is essentially a "three-high" system: high performance, high consistency, and high availability. How to build and operate such a system under heavy traffic is the topic of this article.

Overall Thoughts

First, think about the problem from a high level. A flash sale must solve two core problems, concurrent reads and concurrent writes, which map onto three architectural requirements: performance, consistency, and availability. This article's thinking on flash-sale design advances through those three layers in order, summarized as follows:

  • High performance. A flash sale involves both heavy reads and heavy writes: how do we support high concurrency and sustain high IOPS? The core optimization ideas are similar: for reads, fetch less data and fetch it less often; for writes, split the data. This article expands on three aspects: static/dynamic separation, hot-spot optimization, and system-level optimization.
  • Consistency. The core concern of a flash sale is item inventory: a limited stock deducted by many concurrent requests must stay accurate, selling neither more nor less than there is. This article starts from the inventory-deduction schemes common in the industry and discusses how to design the core logic for consistency.
  • High availability. The real operating conditions of a large distributed system are complex: traffic surges, unstable dependencies, bottlenecks in the application itself, and hardware failures can all hit the running system hard. How do we keep the application running efficiently and stably under such conditions, and how do we prevent and face the unexpected? This article thinks through the architecture from a panoramic perspective down to how it lands in practice.

High Performance

1 Static and Dynamic Separation

You may have noticed that during a flash sale you barely need to refresh the entire page; only the countdown keeps ticking. That is usually because the system has been given a static transformation for high-volume flash sales, i.e., static/dynamic data separation. It takes three steps: 1. split the data; 2. cache the static part; 3. reassemble the data.

1.1 Data Split

The primary purpose of static/dynamic separation is to turn a dynamic page into a cacheable static page. So the first step is to strip out the dynamic data, mainly along two dimensions:

  1. User. User-specific information includes login state and user profile; these elements can be split out and fetched by separate dynamic requests. Broadly personalized content such as user preferences and regional preferences in recommendations can also be loaded asynchronously.
  2. Time. The flash-sale start time is managed and controlled centrally by the server and can be obtained through a dynamic request.
    Open the flash-sale page of any e-commerce platform and see for yourself which data on the page is static and which is dynamic.
1.2 Static Cache

After separating out the static data, the second step is to cache it sensibly, which raises two questions: 1. how to cache it; 2. where to cache it.

1.2.1 How to Cache

One characteristic of static transformation is caching the entire HTTP response directly rather than merely the static data. This way, a Web proxy server can look up the response body by the request URL and return it directly, without rebuilding the HTTP response, and without even parsing the HTTP request headers. Since the URL is the cache key, URL normalization is essential; for a product system this is natural, because the product ID uniquely identifies the URL, e.g. Taobao's https://item.taobao.com/item .... .

1.2.2 Where to Cache

Where should the static data be cached? There are three options: 1. the browser; 2. the CDN; 3. the server.

The browser is the natural first choice, but the user's browser is not under our control: unless the user actively refreshes, the system has no way to push updates to them (note that when we say "static" data, the subtext is "relatively unchanging", which implies "may change"), so users might keep seeing stale, incorrect information for a long time. A flash-sale system, however, may need cached data to be invalidated within seconds.

The server is not a good fit either: its job is dynamic computation and business logic, and it is poor at handling huge numbers of connections, each of which costs memory, while parsing HTTP in a Servlet container is slow and eats into the compute resources needed for the logic; besides, pushing static data down to the server keeps the request path long.

So static data is usually cached on the CDN, which is inherently good at serving large volumes of concurrent requests for static files, supports active invalidation, and sits as close to the user as possible, while also sidestepping the weaknesses at the Java application layer. That said, using a CDN raises several problems to solve:

  1. Invalidation. Any cache must be time-sensitive, especially in a flash-sale scenario. The system needs to guarantee that cached information on CDN nodes across the country can be invalidated within seconds, which places very high demands on the CDN's actual invalidation capability.
  2. Hit rate. A high hit rate is the core of cache performance; otherwise the cache loses its point. If the data is spread over CDN nodes all around the country, identical requests are less likely to hit the same node's cache, and the hit rate becomes a problem.

Therefore pushing all the data to every national CDN node is unrealistic: invalidation and hit rate would face even bigger challenges. A more feasible approach is to pick a subset of CDN nodes for the static transformation. Suitable nodes typically need to:

  1. be close to regions where traffic concentrates
  2. be relatively far from the main site
  3. have good network quality between the node and the traffic regions

Based on these factors, the CDN's second-level (L2) cache is usually appropriate: there are fewer L2 nodes, each with larger capacity, and traffic is relatively concentrated on them, which better mitigates both the invalidation and the hit-rate problems. This is currently the more ideal CDN scheme. The deployment is shown below:

1.3 Data Integration

After the static data is separated out, how to assemble the page becomes a new problem, mainly around how the dynamic data is handled. There are generally two options: ESI (Edge Side Includes) and CSI (Client Side Include).

  1. ESI: the Web proxy requests the dynamic data itself and inserts it into the static page, so the user receives a complete page. This demands more of the server side, but the user experience is better.
  2. CSI: the Web proxy returns only the static page, and the front end fires a separate asynchronous JS request for the dynamic data. This is friendlier to the server, but the user experience is slightly worse.
1.4 Summary

The performance gains from static/dynamic separation boil down to two points: first, less data — cut unnecessary requests and payloads; second, shorter paths — improve the efficiency of each single request. The concrete techniques are all elaborations of these two directions.

2 Hot-Spot Optimization

Hot spots divide into hot operations and hot data, discussed separately below.

2.1 Hot Operations

Refreshing the page at midnight, placing orders at midnight, and adding to cart at midnight are all hot operations. Hot operations are user behavior: we cannot change them, but we can impose protective limits, such as blocking a user who refreshes the page too frequently and showing a prompt.

2.2 Hot Data

Handling hot data takes three steps: first identify the hot spots, second isolate them, third optimize them.

2.2.1 Hot-Spot Identification

Hot data divides into static hot spots and dynamic hot spots:

  1. Static hot spots: hot data that can be predicted in advance. On the eve of a big promotion they can be identified from business signals, such as items featured in the campaign or items that sellers register ahead of time; they can also be predicted by technical means, e.g. big-data analysis of daily item page views, treating the top-N items as likely hot items.
  2. Dynamic hot spots: hot data that cannot be predicted in advance. Hot and cold data swap with actual business activity, especially with the rise of live-stream selling — a seller running a spur-of-the-moment ad can cause an item to be bought in huge volume within a short time. Because such items see little daily traffic, they may long since have been evicted from the cache, or sit as cold data only in the DB. The instantaneous flood of traffic then punches through the cache, requests hit the DB directly, and the DB comes under excessive pressure.

Therefore a flash-sale system needs the ability to discover hot data dynamically. A common approach:

  1. Collect hot-key information asynchronously at every stage of the transaction link, e.g. via URL access records or an Nginx agent gathering hot-spot logs (some middleware already has built-in hot-spot discovery), so potential hot spots are identified ahead of the data layer.
  2. Aggregate and analyze the collected data; once a key matches the hot-spot rule, push it through a subscription channel to the downstream systems, each of which decides how to handle the hot data — rate-limit it, cache it, and so on — to achieve hot-spot protection.

Two things to note:

  1. Hot-data collection is best done asynchronously: on one hand it avoids impacting the core transaction link, on the other a bypass collection path generalizes well.
  2. Discovery should ideally be real-time at second granularity — only then is "dynamic" discovery meaningful — which in turn places higher demands on the collection and analysis nodes.
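As a rough illustration of the dynamic-discovery idea above, here is a minimal in-process sketch: a sliding-window counter that flags a key as hot once its request count within the window crosses a threshold. Class and parameter names are hypothetical; a real system would feed these counts asynchronously from access logs rather than inline on the request path.

```python
import time
from collections import Counter, deque

class HotKeyDetector:
    """Sliding-window counter: flags keys whose request count within
    the window reaches a threshold. Illustrative sketch only."""
    def __init__(self, window_seconds=1.0, threshold=100):
        self.window = window_seconds
        self.threshold = threshold
        self.events = deque()          # (timestamp, key) in arrival order
        self.counts = Counter()

    def record(self, key, now=None):
        now = time.monotonic() if now is None else now
        self.events.append((now, key))
        self.counts[key] += 1
        # evict events that fell out of the window
        while self.events and now - self.events[0][0] > self.window:
            _, old = self.events.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]
        return self.counts[key] >= self.threshold   # True -> treat as hot
```

Downstream systems subscribing to such a detector would then cache or rate-limit the flagged keys, as described above.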
2.2.2 Hot-Spot Isolation

Once the hot data is identified, the first principle is to isolate it — do not let the 1% affect the other 99%. Isolation can be done at the following levels:

  1. Business isolation. A flash sale is a marketing activity that sellers sign up for separately; technically speaking, the system therefore knows the hot spots in advance and can pre-warm the cache.
  2. System isolation. This is runtime isolation: deploy the flash sale in a separate group isolated from the 99%, optionally on its own domain, so the entry layer routes its requests into different clusters.
  3. Data isolation. Flash-sale data is hot data, so a dedicated cache cluster or DB service group can be enabled for it, giving better horizontal or vertical scaling capability.

There are of course many other ways to isolate. For example, distinguish users by assigning them different cookies and have the entry layer route them to different service interfaces; or keep the domain the same but call different back-end interfaces; or tag the data at the data layer. All these measures share one purpose: to distinguish identified hot requests from ordinary requests.

2.2.3 Hot-Spot Optimization

With the hot data isolated, that 1% of requests can be optimized in a targeted way, essentially by two means:

  1. Caching: caching is the most effective way to handle hot spots. If static/dynamic separation has been done on the hot data, the static part can be cached long-term.
  2. Rate limiting: limiting is more of a protective mechanism. Note that each service should keep watching whether requests are triggering the limits and review them in time.
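For the rate-limiting side, a token bucket is one common mechanism: requests beyond the sustained refill rate are rejected rather than passed through. A minimal sketch with illustrative parameters, not tied to any particular middleware:

```python
import time

class TokenBucket:
    """Token-bucket limiter for protecting a hot item: `rate` tokens
    refill per second up to `capacity`; each allowed request spends one.
    Parameter values are illustrative."""
    def __init__(self, rate, capacity, now=None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Requests the bucket rejects would get a friendly "try again" response instead of reaching the back end.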
2.2.4 Summary

Hot-spot optimization differs from static/dynamic separation: rather than splitting data horizontally, it splits it vertically by the 80/20 principle and treats the head specially. Identifying and isolating hot spots is valuable not only in the flash-sale scenario but also as a reference for other high-performance distributed systems.

3 System Optimization

There are many ways to improve a software system's performance, such as better hardware or JVM tuning; here we focus on code-level optimization:

  1. Reduce serialization. Cutting down Java serialization can markedly improve performance. Serialization mostly happens at the RPC stage, so minimize RPC calls; one option is to co-deploy ("merge deploy") strongly related applications, reducing RPC calls between them (at the cost of bending microservice design conventions).
  2. Emit bytes directly. String I/O — whether disk I/O or network I/O — is relatively CPU-intensive, because characters must be converted to bytes via table lookups and encoding. So for common data such as static strings, pre-encode them to bytes and cache the bytes, writing them straight to the OutputStream to cut per-request conversion; also, do not implement a hot toString() with ReflectionToString — hand-code it and print only the basic and core fields of the DO.
  3. Trim exception stacks in logs. Whether the exception comes from an external system or the application itself, a stack trace gets printed; under heavy traffic, frequent full-stack output only worsens the system's load. The stack depth written to logs can be capped via log configuration.
  4. Strip frameworks. As an extreme optimization, some framework layers can be removed — for example, drop the traditional MVC framework and handle requests directly in the Servlet. This bypasses a lot of complex and rarely useful processing logic and saves milliseconds; of course, you need to reasonably assess how much you depend on the framework.
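Point 2 above — pre-encoding static strings to bytes — is not Java-specific. A minimal Python sketch of the same idea: encode the static page fragments once at startup, so the per-request hot path only pays the encoding cost for the dynamic part. The fragment contents are made up for illustration:

```python
# Pre-encode static fragments once; reuse the cached bytes per request.
STATIC_HEADER = "<html><head><title>sale</title></head><body>".encode("utf-8")
STATIC_FOOTER = "</body></html>".encode("utf-8")

def render(dynamic_part: str) -> bytes:
    # Only the dynamic part is encoded on each request; the static
    # bytes are concatenated as-is, skipping the char->byte conversion.
    return STATIC_HEADER + dynamic_part.encode("utf-8") + STATIC_FOOTER
```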

4 Summary

Performance optimization needs reference points, so the system should establish baselines: a performance baseline (when did performance suddenly drop?), a cost baseline (how many machines did last year's big promotion use?), a link baseline (what happens along the core flows?). By continuously watching system performance against these baselines, you keep raising code quality at the coding level, shut down unreasonable calls in time at the operational level, and keep optimizing and improving at the architecture level.

Consistency

In a flash-sale system, inventory data is critical: failing to sell out is a problem, and overselling is an even bigger one. Consistency in the flash-sale scenario is mainly about the accuracy of inventory deduction.

1 Ways to Deduct Inventory

Buying in an e-commerce scenario is generally a two-step process: place the order, then pay. "Submit order" is the ordering step; "pay order" is the payment step. On that basis, there are the following ways to deduct inventory:

  1. Deduct on order. Stock is deducted as soon as the buyer places the order. This is the simplest way to deduct inventory, and also the one with the most precise control.
  2. Deduct on payment. Stock is not deducted when the order is placed, but only when real payment arrives. But because stock is deducted only at payment, under high concurrency a buyer may place an order and then find at payment time that the item has already been bought by someone else.
  3. Withhold (pre-deduct) inventory. This way is more complex: after the buyer orders, the stock is held for a period of time (e.g. 15 minutes); past that window the hold is automatically released, and after release other buyers can purchase the stock.

As you can see, the deduction modes follow the multi-stage shopping flow, but whether you deduct at the order stage or the payment stage, each has problems, analyzed below.

2 Problems with Each Deduction Mode

2.1 Deduct on Order

Advantage: the best user experience. Deducting on order is the simplest and most precisely controlled scheme: the deduction is handled directly by a database transaction at order time, so it can never happen that a buyer places an order and then cannot pay because the stock is gone.

Disadvantage: stock may go unsold. Normally the probability that a buyer pays after ordering is high, so this is not a big problem. The exception is when a seller runs a promotion and competitors maliciously place orders for all the goods, zeroing out the inventory so it cannot sell normally — the malicious orderers never actually pay. That is the weakness of deduct-on-order.

2.2 Deduct on Payment

Advantage: the stock actually sells. Deduct-on-order can invite malicious orders that block the seller's real sales; deduct-on-payment, which requires real money, avoids that effectively.

Disadvantage: poor user experience. A buyer who places an order will not necessarily pay. Suppose there are 100 items: 200 people may all place orders successfully, because stock is not deducted at order time, so successful orders can far exceed real stock — which happens especially with popular items during a big promotion. Many buyers then order successfully but cannot pay, and the shopping experience is relatively poor.

2.3 Withhold Inventory

Advantage: alleviates the problems of both modes above. Withholding combines deduct-on-order and deduct-on-payment, linking the two operations: stock is withheld when the order is placed, and the hold is released if payment does not arrive in time.

Disadvantage: it does not completely solve those problems. Take malicious ordering: even with a payment window of 10 minutes, a malicious buyer can simply place the order again once the 10 minutes are up.

2.4 Summary

Inventory-deduction problems show up along two axes, user experience and business interest, fundamentally because shopping is a two-step or even multi-step process: whichever stage you deduct at, there is likely something for bad actors to exploit.

3 How Inventory Is Deducted in Practice

The industry's most common choice is withholding. Whether it is food delivery or e-commerce shopping, an order generally comes with a "payment window", past which the order is automatically released — the classic withholding scheme. But as noted above, withholding still has to deal with malicious orders to ensure the goods sell out; on the other hand, avoiding overselling is also a sore point.
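The withholding scheme can be sketched as follows: an order reserves stock for a payment window, and expired holds are released back to sellable stock. This is a single-process illustration with hypothetical names; a real system would persist the holds and expire them via delayed messages or scheduled scans.

```python
import time

class Inventory:
    """Sketch of 'withholding': reserve() holds stock for hold_seconds;
    unpaid holds are released back on expiry. Illustrative only."""
    def __init__(self, stock, hold_seconds=900):
        self.stock = stock
        self.hold_seconds = hold_seconds
        self.holds = {}            # order_id -> (qty, expire_at)

    def _expire(self, now):
        for oid in [o for o, (_, exp) in self.holds.items() if exp <= now]:
            qty, _ = self.holds.pop(oid)
            self.stock += qty      # release the hold back to sellable stock

    def reserve(self, order_id, qty, now=None):
        now = time.time() if now is None else now
        self._expire(now)
        if qty > self.stock:
            return False           # sold out (or fully held)
        self.stock -= qty
        self.holds[order_id] = (qty, now + self.hold_seconds)
        return True

    def pay(self, order_id, now=None):
        now = time.time() if now is None else now
        self._expire(now)          # an expired hold can no longer be paid
        return self.holds.pop(order_id, None) is not None
```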

  1. Selling out: malicious ordering is mostly stopped by a combination of risk-control and anti-cheating measures. For example, identify and flag buyers who frequently order without paying, and stop withholding stock for flagged buyers on order; or set a per-person purchase cap during big promotions, so one person can buy at most N units; or block payment for repeatedly unpaid orders, and so on.
  2. Avoiding overselling: there are really two cases. For ordinary merchandise, where the flash sale is just a promotion device, a small oversell can be solved by the merchant restocking. But when the flash sale itself is the marketing vehicle, stock must never go negative — for data consistency, the inventory field in the database must not drop below zero even under massive concurrent requests. There are several safeguards: first, check within the transaction that stock has not gone negative, rolling back if it has; second, declare the inventory column as an unsigned integer, so the SQL errors out the moment stock would go negative; third, guard the deduction with a CASE WHEN statement:

    UPDATE item SET inventory = CASE WHEN inventory >= xxx THEN inventory - xxx ELSE inventory END

Business measures ensure the goods sell out; technical measures ensure they are not oversold. Inventory has never been a purely technical problem, and the angles from which to attack it are many and varied.
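The database-side guard can be exercised end to end. The sketch below uses an in-memory SQLite table as a stand-in for MySQL and takes the conditional-UPDATE approach: the WHERE clause refuses to deduct below zero, and the affected-row count tells the caller whether the deduction happened. The table name and schema are illustrative.

```python
import sqlite3

# In-memory stand-in for the inventory table (schema is illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, inventory INTEGER)")
conn.execute("INSERT INTO item VALUES (1, 2)")   # item 1, stock 2

def deduct(conn, item_id, qty):
    # The WHERE clause guarantees stock never goes negative; rowcount
    # distinguishes "deducted" from "sold out".
    cur = conn.execute(
        "UPDATE item SET inventory = inventory - ? "
        "WHERE id = ? AND inventory >= ?",
        (qty, item_id, qty))
    return cur.rowcount == 1
```

On MySQL the same pattern works with an ordinary UPDATE plus the affected-rows check, alongside the unsigned-column and CASE WHEN variants mentioned above.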

4 Performance Optimization Under Consistency

Inventory data is critical, and it is hot data. For the system, the real impact of the hot spot is "high-concurrency read" plus "high-concurrency write", which is also the hardest technical problem at the core of the flash-sale scenario.

4.1 High-Concurrency Reads

The key phrase for high-concurrency reads in a flash sale is "tiered validation". On the read path, perform only the checks that do not hurt performance — is the user qualified for the sale, is the item in a normal state, did the user answer the question correctly, has the sale ended, is the request illegitimate, and so on — and skip the consistency checks that easily become bottlenecks; only on the write path, when stock is actually deducted, do the consistency check, letting the data layer guarantee final accuracy.

With tiered validation in place, the system can use a distributed cache, or even a LocalCache, to absorb the high-concurrency reads. That means tolerating dirty reads in certain scenarios; the only consequence is that a small number of requests with no real stock behind them are mistakenly allowed to proceed toward ordering, and true consistency is settled at the final write. This strikes a balance between high availability and consistency.

In essence, the core idea of tiered validation is: filter out as many invalid requests as possible at each level, and do the full, expensive processing only at the tip of the "funnel", thereby shortening the path on which the system bottleneck can be hit.
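A skeletal sketch of the funnel, with made-up field names: the cheap, possibly-stale checks run first and drop most invalid requests, and only survivors reach the write layer, where the authoritative stock check lives.

```python
def tiered_check(request, cache):
    """Return None if the request survives all cheap read-path checks,
    otherwise a rejection reason. All names are illustrative."""
    if not request.get("logged_in"):
        return "not logged in"                  # qualification check
    if request.get("answer") != cache.get("answer"):
        return "wrong answer"                   # answer check
    if cache.get("stock_hint", 0) <= 0:         # cached hint, may be stale
        return "likely sold out"                # dirty read is acceptable here
    return None  # pass to the write layer for the real consistency check
```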

4.2 High-Concurrency Writes

There are two optimization directions for high-concurrency writes: change the datastore, or optimize DB performance. Each is discussed below.

4.2.1 Changing the Datastore

Flash-sale inventory deduction differs from ordinary merchandise: the per-order data is small and transactions are short. So can the deduction be moved straight into the cache tier — that is, performed directly in a cache with persistence, such as Redis?

If the deduction logic is a single operation — say, with no linkage between complex SKU inventory and total inventory — then I think it is entirely feasible. But if the deduction logic is more complex, or a transaction is required, the deduction must be completed in the database.
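When the deduction does live in the cache tier, the check and the decrement must be a single atomic step, or concurrent requests will oversell. In Redis this is typically done with a Lua script (or DECR plus a negative-value check); the sketch below imitates that atomicity in-process with a lock, so it runs without a Redis server. Names are illustrative.

```python
import threading

class CachedStock:
    """In-process stand-in for a cache-side stock counter. In Redis,
    the same check-and-decrement would be one server-side Lua script,
    which is what gives it atomicity under concurrency."""
    def __init__(self, stock):
        self._stock = stock
        self._lock = threading.Lock()

    def try_deduct(self, qty=1):
        with self._lock:            # the lock plays the role of Lua's atomicity
            if self._stock >= qty:
                self._stock -= qty
                return True
            return False            # sold out: stock never goes negative
```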

4.2.2 DB Performance Optimization

When inventory lands in the database, the deduction ultimately hits a single row (MySQL), so large numbers of threads compete for the same InnoDB row lock. The higher the concurrency, the more threads wait: TPS drops, RT rises, and throughput can suffer badly. Note that the DB optimization discussion below assumes the data isolation described earlier is complete, so we can focus on the locking problem.

There are two ways to tackle the lock-contention problem:

  1. Queue at the application layer. Use a distributed lock in the cache cluster to serialize the cluster's concurrent operations on the same database row, while also capping the number of DB connections a single item can occupy, preventing hot items from hogging the connections.
  2. Queue at the data layer. Application-layer queuing costs performance; data-layer queuing is ideal. In the industry, Alibaba's database team developed an InnoDB patch that queues concurrent updates to a single row inside the DB layer, a custom optimization for the flash-sale scenario. Note that queuing differs from lock contention: if you know MySQL internals, you know InnoDB's deadlock detection and the switching between MySQL Server and InnoDB are themselves fairly expensive. The team has made many other optimizations as well, such as the COMMIT_ON_SUCCESS and ROLLBACK_ON_FAIL patches: with SQL hints added, a transaction commits or rolls back automatically based on TARGET_AFFECT_ROW of the last SQL in the transaction, without waiting for the network round trip, saving milliseconds. These patches are now included in Alibaba's open-source MySQL branch, AliSQL.
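Application-layer queuing (option 1) can be approximated in-process with a per-item semaphore that caps concurrent DB operations on one row; in a cluster, a distributed lock or a bounded per-item connection pool plays the same role. The cap of 4 is arbitrary, for illustration.

```python
import threading
from collections import defaultdict

# Per-item semaphores: at most 4 in-flight DB operations per item,
# so one hot SKU cannot exhaust the connection pool. Sketch only.
_sems = defaultdict(lambda: threading.Semaphore(4))

def with_item_queue(item_id, db_op):
    with _sems[item_id]:        # excess callers queue here, not at the DB
        return db_op()
```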
4.3 Summary

High-concurrency reads and writes are handled very differently. Read requests have plenty of room for optimization; write requests generally bottleneck at the storage layer, and the essence of write optimization is striking a balance guided by the CAP theorem.

5 Summary

Of course, inventory deduction has many more details: how withheld stock is returned on timeout, how to keep inventory and payment status consistent while a third-party payment is in flight, and so on — all big challenges.

High Availability

If you have ever stared at flash-sale traffic monitoring, you know it is not a curve winding upward into the sky but a near-vertical line, because the requests are concentrated at a single point in time. The sale starting at midnight produces an extremely tall peak, and resource consumption is almost instantaneous. So protecting the availability of a flash-sale system is essential.

1 Traffic Shaving (Peak Clipping)

For a given flash sale, the number of people who can actually get the product is fixed: whether 100 or 10,000 people participate, the outcome is the same — the number of effective requests is limited, and the higher the concurrency, the more invalid requests. Of course, as a marketing device the sale wants as many people as possible browsing the page beforehand; but once the real sale starts, more requests are not better. So the system can deliberately design rules that delay some requests, and even filter out invalid ones.

1.1 Answer Questions

Early flash sales were just a button click; later, answer questions were added. Why? Mainly to increase the complexity of purchasing, which serves two purposes:

  1. Prevent cheating. Early flash sales were plagued by malicious buyers and competitors using bots to sweep up stock, defeating the merchant's marketing purpose, so questions were added as a brake.
  2. Delay requests. The midnight burst lasts milliseconds; a question artificially stretches the single peak from under 1 s to under 10 s. That spread matters enormously to the server and greatly reduces peak concurrency. Moreover, since requests are handled in arrival order, later requests may arrive to find no stock left and never reach the ordering step at all, so the real writes that fall through to the data layer are very limited.

Note that besides verifying the answer's correctness, the submission time should also be validated: a submission under, say, 1 s is very unlikely to be a human, which further blocks machine answering.
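Both checks — correctness and submission timing — fit in a few lines. A sketch with illustrative names and thresholds:

```python
def validate_answer(submitted, expected, shown_at, submitted_at, min_seconds=1.0):
    """Reject both wrong answers and suspiciously fast submissions.
    The 1-second human-speed threshold is illustrative."""
    if submitted != expected:
        return False
    if submitted_at - shown_at < min_seconds:   # too fast for a human
        return False
    return True
```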

Answer questions are now very common; in essence they reduce the request rate at the entry layer so the system can better withstand the instantaneous peak.

1.2 Queuing

The most common shaving solution is a message queue: buffer the instantaneous flood by turning synchronous direct calls into asynchronous indirect pushes. Besides message queues, many similar queuing schemes exist, for example:

  1. Thread pools with locking and waiting
  2. "Flood storage" in local memory, holding requests until they can be drained
  3. Serializing requests to a local file, then reading them back in order
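The message-queue approach — and its bounded-backlog drawback — can be sketched with the standard-library queue standing in for a real broker: the front end enqueues and returns immediately, a worker drains the queue at its own pace, and a full queue sheds load instead of crushing the DB. Names and the queue bound are illustrative.

```python
import queue
import threading

orders = queue.Queue(maxsize=1000)   # bounded: backlog beyond this is shed

def submit_order(order):
    """Synchronous call turned into an async enqueue."""
    try:
        orders.put_nowait(order)
        return "accepted"            # responds before the DB write happens
    except queue.Full:
        return "rejected"            # shed load rather than crush downstream

def worker(process, stop):
    """Drains the queue; `process` is the real (slow) inventory write."""
    while not stop.is_set() or not orders.empty():
        try:
            order = orders.get(timeout=0.1)
        except queue.Empty:
            continue
        process(order)
        orders.task_done()
```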

The drawbacks of queuing are obvious, on two fronts:

  1. Request backlog. If the peak lasts long enough for the queue to hit its limit, the queue itself gets crushed; even though the downstream is protected, dropping requests — or holding them for ages — is not much better than failing outright.
  2. User experience. Asynchronous pushes lack the natural real-time ordering of synchronous calls; a request sent first may complete last, which hurts the shopping experience of ordering-sensitive users.

Queuing is, in essence, the business layer turning a one-step operation into two steps to act as a buffer; given these drawbacks, the final design is a compromise and balance between business ordering demands and the flash-sale scenario.

1.3 Filtering

The core of filtering is a layered structure: drop invalid requests at successive layers so that data reads and writes are triggered precisely. Common filter layers are:

  1. Read rate limiting: protect the read path by filtering out requests beyond the system's capacity
  2. Read cache: cache the read data, filtering out duplicate requests
  3. Write rate limiting: protect the write path by filtering out requests beyond the system's capacity
  4. Write validation: do the consistency check on writes, keeping only the final valid data

The core purpose of filtering is to cut the I/O caused by invalid requests and guarantee the I/O performance of the valid ones.

1.4 Summary

The system can shave the peak with answer questions at the entry layer, queuing at the business layer, and filtering at the data layer — essentially seeking a balance between business demands and architectural performance. Beyond that, new shaving tricks keep emerging, mostly cut along business lines: for instance, launching coupon grabs or lotteries at the same midnight moment as the big sale disperses part of the traffic to other systems, which also shaves the peak.

2 Plan B

When a system faces sustained peak traffic, it is hard to recover to normal operation through day-to-day tuning alone; nobody can forecast every case, and accidents cannot always be avoided. So in the special scenario of a flash sale, guaranteeing high availability requires designing a Plan B as a fallback.

Building high availability is really systems engineering that runs through the entire life cycle of the system.

Concretely, high availability touches the architecture phase, coding phase, testing phase, release phase, runtime phase, and failure handling. Taking each in turn:

  1. Architecture phase: design for scalability and fault tolerance, avoiding single points of failure. For example, multi-unit deployment, so that even the failure of an IDC or an entire city does not take the system down.
  2. Coding phase: ensure code robustness — for RPC calls, set sensible timeouts and exit mechanisms so another system's collapse cannot drag you down, and give unexpected error returns a default handling path.
  3. Testing phase: guarantee CI coverage and the fault tolerance checked by tools such as Sonar, layer a secondary quality check on top, and output regular overall-quality and trend reports.
  4. Release phase: deployment is where mistakes surface most easily, so have pre-release checklist templates, notification mechanisms for upstream and downstream, and a post-release rollback mechanism.
  5. Runtime phase: the system spends most of its life running, so real-time runtime monitoring matters most — discover problems promptly, alarm accurately, and provide detailed data for troubleshooting.
  6. Failure handling: the first goal is to stop the bleeding in time and keep the impact from spreading, then locate the cause, fix the problem, and finally restore service.

For day-to-day operations, the runtime phase deserves extra reinforcement, mainly by the following means:

  1. Prevention: establish a routine load-testing regime, with both single-service and full-link pressure tests, to map out the system's water levels.
  2. Control: prepare online degradation, circuit breaking, and rate limiting in advance. Note that limiting, degrading, and breaking all damage the business, so before acting, be sure to confirm the plan with the business side and with upstream and downstream. Take rate limiting: which business can be limited, under what conditions, for how long, and under what conditions to recover — all of it should be confirmed repeatedly with the business side.
  3. Monitoring: establish performance baselines and record performance trends; build an alarm system that warns promptly when problems are found.
  4. Recovery: when a failure occurs, be able to stop the bleeding in time, and have quick data-correction tools ready — they need not be perfect, but they must exist.

Every step in the system's life cycle can go wrong, and mistakes in some steps are very costly or even irreparable. So high availability is a systematic project that must be considered comprehensively across the whole life cycle; and accounting for business growth, it needs longer-term planning and systematic construction.

3 Summary

High availability is really about "stability" — something that normally seems unimportant, yet is terrifying once a problem hits. The hard part is landing it: when the business is developing well, stability work usually gets downgraded to make way for business work. Solving that requires organizational guarantees, for example having business owners carry stability KPIs, and building a virtual stability team in the department whose members come part-time from the core staff of each line and are graded by the stability owner, so that the stability tasks built by the team get assigned into concrete business systems.

Personal Summary

Designing a flash-sale system means building different architectures, from simple to complex, for different traffic levels — in essence, trade-offs and compromises at every step. You may have noticed that this article avoids specific technology selection, because selection is not what matters for the architecture; as an architect, you should keep reminding yourself what the main line is.

At the same time, this is my abstracted, distilled outline of flash-sale design for personal reference — and hopefully a handy one for you too!

Source: https://segmentfault.com/a/1190000020970562

Origin www.cnblogs.com/lonelyxmas/p/11906402.html