High concurrency solutions [reprint]

Disclaimer: This is an original article by the CSDN blogger "_ _ start all over again", shared under the CC 4.0 BY-SA license; reposts must include the original source link and this statement.
Original link: https://blog.csdn.net/sanyaoxu_2/article/details/78992113

1. Separate the application from static resources
In the beginning, the application and its static resources live on the same server. Once concurrency reaches a certain level, the static resources (images, videos, JS, CSS and other files) should be moved to dedicated servers. Because these files are stateless, the separation is relatively simple: copy them to the dedicated servers and serve them from there, usually under a separate domain name.
The browser then fetches those resources directly from the static-resource servers under the other domain, without going through the application server at all.
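As a toy illustration, here is a minimal sketch of such a dedicated static-file server, assuming Java 18+ (for the JDK's built-in SimpleFileServer). In practice this role is usually filled by Nginx, Apache, or a CDN; the port, directory and domain below are illustrative assumptions only.

```java
// Minimal sketch of a dedicated static-resource server (assumes Java 18+).
// The directory and port are illustrative placeholders.
import com.sun.net.httpserver.HttpServer;
import com.sun.net.httpserver.SimpleFileServer;
import com.sun.net.httpserver.SimpleFileServer.OutputLevel;

import java.net.InetSocketAddress;
import java.nio.file.Path;

public class StaticResourceServer {
    public static void main(String[] args) {
        // Serve everything under /var/www/static (images, videos, JS, CSS, ...) on port 8081.
        // Application pages then reference these files under a separate domain,
        // e.g. http://static.example.com/logo.png, so the app server never sees the requests.
        HttpServer server = SimpleFileServer.createFileServer(
                new InetSocketAddress(8081),
                Path.of("/var/www/static"),
                OutputLevel.INFO);
        server.start();
    }
}
```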

 

2. Page caching
Page caching means caching the pages the application generates so they do not have to be regenerated on every request, which saves a lot of CPU; keeping the cached pages in memory makes it faster still. If you are running Nginx you can use its built-in caching feature, or you can use a dedicated Squid server. By default cached pages expire after a configured cache time, but you can also invalidate the cache explicitly whenever the underlying data is modified.

Page caching is mainly applied to pages whose data rarely changes. Many pages, however, consist mostly of stable data plus a small part that changes very frequently. For example, an article page is a natural candidate for full static rendering, but if the page also shows "like" and "dislike" counts, those numbers change frequently and get in the way of making the page static. The solution is to generate the static page first and then read and update the frequently changing data with Ajax. That kills two birds with one stone: you keep the page cache and still display the high-frequency data in real time. A minimal sketch of such a page cache follows.
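Here is a minimal sketch of an in-memory page cache implemented as a servlet filter, assuming the Jakarta Servlet API and that pages are written via getWriter(). The class name, TTL and cache key are illustrative; in production you would more likely rely on Nginx or Squid caching as described above.

```java
// Minimal sketch of an in-memory page cache as a servlet filter (assumes Jakarta Servlet API).
import jakarta.servlet.*;
import jakarta.servlet.http.*;
import java.io.*;
import java.util.concurrent.ConcurrentHashMap;

public class PageCacheFilter extends HttpFilter {
    private record CachedPage(String body, long expiresAt) {}

    private static final long TTL_MILLIS = 60_000;           // cache pages for one minute
    private final ConcurrentHashMap<String, CachedPage> cache = new ConcurrentHashMap<>();

    @Override
    protected void doFilter(HttpServletRequest req, HttpServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        String key = req.getRequestURI();                     // cache key: the page URL
        CachedPage hit = cache.get(key);
        if (hit != null && hit.expiresAt() > System.currentTimeMillis()) {
            res.setContentType("text/html;charset=UTF-8");
            res.getWriter().write(hit.body());                // serve the cached HTML, no regeneration
            return;
        }
        // Miss: capture the generated page (assumes it is written via getWriter()),
        // cache it, then send it to the client.
        CharArrayWriter buffer = new CharArrayWriter();
        HttpServletResponseWrapper wrapper = new HttpServletResponseWrapper(res) {
            @Override public PrintWriter getWriter() { return new PrintWriter(buffer, true); }
        };
        chain.doFilter(req, wrapper);
        String body = buffer.toString();
        cache.put(key, new CachedPage(body, System.currentTimeMillis() + TTL_MILLIS));
        res.getWriter().write(body);
    }
}
```

The frequently changing parts (like/dislike counts, for instance) would then be fetched by Ajax from a small endpoint that bypasses this filter.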

As we all know, a pure static HTML page is the most efficient to serve and the cheapest in resources, so we should try to render the pages of our site as static pages wherever possible; the simplest approach really is the most effective one. For sites with a lot of content that is updated frequently, however, we cannot generate everything by hand, which is where information publishing systems (CMS) come in. The news channels of the portal sites we visit every day, and often their other channels as well, are managed and generated by such systems. At a minimum, a CMS turns incoming information into automatically generated static pages; it also typically offers channel management, permission management, automatic crawling and other features. For a large site, an efficient, manageable CMS is essential.

Besides portal and publishing sites, the same applies to community sites that demand high interactivity: making pages static is still a necessary means of improving performance. Rendering posts and articles to static pages in (near) real time, and re-rendering them whenever they are updated, is a heavily used strategy; Mop's hodgepodge board uses it, and so do NetEase's communities.

Likewise, HTML staticization is a caching strategy for data that the system queries frequently but updates rarely, such as a forum's public settings. Mainstream forum software lets you manage this information in the back end and store it in the database, and the front end reads much of it constantly even though it almost never changes. This kind of content can be regenerated as static HTML in the back end whenever it is updated, which avoids a large number of database requests. A small sketch of this "regenerate on update" approach follows.
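The sketch below illustrates the idea of regenerating a static HTML file whenever the underlying record changes. The renderArticleHtml caller, output directory and file naming are illustrative assumptions, not part of the original article.

```java
// Minimal sketch of "regenerate static HTML on update".
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

public class StaticPagePublisher {
    private final Path outputDir;

    public StaticPagePublisher(Path outputDir) {
        this.outputDir = outputDir;
    }

    /** Called after an article (or forum setting, etc.) is saved to the database. */
    public void publish(long articleId, String renderedHtml) throws IOException {
        Path target = outputDir.resolve(articleId + ".html");
        Path tmp = outputDir.resolve(articleId + ".html.tmp");
        // Write to a temporary file first and then move it into place, so readers
        // never see a half-written page; subsequent reads hit the file, not the database.
        Files.writeString(tmp, renderedHtml, StandardCharsets.UTF_8);
        Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```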

3. Clusters and distribution
A cluster is a group of servers that all provide the same functionality, so any of them can handle a given request; its main role is to spread the load.

Distribution means splitting different services across different servers, so that handling a single request may involve several servers; this shortens the processing time of each request. Clustering and distribution can also be used together.

Clusters come in two kinds: static-resource clusters and application clusters. Static-resource clusters are relatively simple. For application clusters, the core problem is Session synchronization.

Session synchronization can be handled in two ways. The first is to replicate the Session to the other servers whenever it changes. The second is to manage Sessions in a single shared store, so that every server in the cluster uses the same Session data. Tomcat uses the first approach by default, and a simple configuration is enough to enable it. For the second approach you can install a dedicated, efficient caching service such as Memcached to manage sessions centrally, and then have the application servers obtain the Session by overriding the request's getSession method so that it reads from that shared store. A minimal sketch of the shared-store approach follows.
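The sketch below shows the shared-store idea in its simplest form: every node reads and writes session data in one central cache keyed by the session id (carried in a cookie). The CacheClient interface is an illustrative stand-in for a Memcached/Redis client; the original article names no concrete API.

```java
// Minimal sketch of centrally managed sessions shared by all application servers.
import java.util.HashMap;
import java.util.Map;

public class CentralSessionManager {

    /** Stand-in for a Memcached/Redis client shared by every node in the cluster. */
    public interface CacheClient {
        Map<String, Object> get(String key);
        void set(String key, Map<String, Object> value, int ttlSeconds);
    }

    private static final int SESSION_TTL_SECONDS = 1800;     // 30-minute sessions
    private final CacheClient cache;

    public CentralSessionManager(CacheClient cache) {
        this.cache = cache;
    }

    /** Load the session for the id found in the request cookie; start empty if absent. */
    public Map<String, Object> getSession(String sessionId) {
        Map<String, Object> session = cache.get("session:" + sessionId);
        return session != null ? session : new HashMap<>();
    }

    /** Write the (possibly modified) session back so every node sees the update. */
    public void saveSession(String sessionId, Map<String, Object> session) {
        cache.set("session:" + sessionId, session, SESSION_TTL_SECONDS);
    }
}
```

In a real deployment, a request wrapper would override getSession to return a facade over such a manager, as the paragraph above describes.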

Another core problem for clusters is load balancing: deciding, once a request arrives, which specific server will handle it. This can be solved with dedicated hardware (e.g. an F5) or in software. A simple software round-robin sketch follows.
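The sketch below shows one of the simplest software strategies a balancer can use, round-robin selection. The backend addresses are illustrative placeholders; real balancers add health checks, weights and sticky sessions.

```java
// Minimal sketch of software load balancing via round-robin selection.
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class RoundRobinBalancer {
    private final List<String> backends;
    private final AtomicLong counter = new AtomicLong();

    public RoundRobinBalancer(List<String> backends) {
        this.backends = List.copyOf(backends);
    }

    /** Pick the next backend in rotation; thread-safe via the atomic counter. */
    public String nextBackend() {
        int index = Math.floorMod(counter.getAndIncrement(), backends.size());
        return backends.get(index);
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb = new RoundRobinBalancer(
                List.of("10.0.0.11:8080", "10.0.0.12:8080", "10.0.0.13:8080"));
        for (int i = 0; i < 6; i++) {
            System.out.println("request " + i + " -> " + lb.nextBackend());
        }
    }
}
```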

 

4. Reverse proxy
A reverse proxy means the server the client talks to does not actually provide the service itself; it fetches the resources from other servers and then returns the results to the user.


 

4.1 The difference between a reverse proxy server and a forward proxy server
A forward proxy server fetches the resource we want on our behalf and returns the result to us; we actively tell the proxy which resource to get. For example, if we want to visit Facebook but cannot reach it directly, we can ask the proxy server to access it for us and return the result.

A reverse proxy server is one we visit just as we would an ordinary server; behind the scenes it calls other servers for the resources and returns the results to us, and we never know.

A forward proxy is something we use actively and that serves us; it does not need a domain name of its own. A reverse proxy is used by the server side; we are unaware of it, it has its own domain name, and visiting it feels no different from visiting any ordinary website.

A reverse proxy server has three main functions:
1. It can act as the front-end server, integrating with the servers that actually handle the requests;
2. It can do load balancing;
3. It can forward requests, for example routing requests for different types of resources to different servers.
A minimal forwarding sketch follows.
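The sketch below illustrates the forwarding idea using only the JDK's built-in HttpServer and HttpClient: the front-end server receives the client's request and fetches the real content from a backend the client never sees. The ports and backend address are illustrative; real deployments would use Nginx or HAProxy instead.

```java
// Minimal sketch of a reverse proxy: relay each request to a hidden backend.
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class TinyReverseProxy {
    public static void main(String[] args) throws Exception {
        String backend = "http://10.0.0.11:8080";             // the real application server
        HttpClient client = HttpClient.newHttpClient();

        HttpServer front = HttpServer.create(new InetSocketAddress(8000), 0);
        front.createContext("/", exchange -> {
            try {
                // Forward the incoming path to the backend and relay the response body.
                HttpRequest upstream = HttpRequest.newBuilder(
                        URI.create(backend + exchange.getRequestURI())).GET().build();
                HttpResponse<byte[]> resp =
                        client.send(upstream, HttpResponse.BodyHandlers.ofByteArray());
                exchange.sendResponseHeaders(resp.statusCode(), resp.body().length);
                exchange.getResponseBody().write(resp.body());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                exchange.sendResponseHeaders(502, -1);         // bad gateway on failure
            } finally {
                exchange.close();
            }
        });
        front.start();
    }
}
```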

5. CDN
A CDN is essentially a special cluster of page cache servers. Compared with an ordinary page cache cluster, what makes it special is mainly where the servers are located and how requests are assigned to them. CDN nodes are distributed across the country, and when a user's request arrives it is assigned to the most suitable CDN node to fetch the data. For example, China Unicom users are assigned to nodes on Unicom's network, and Shanghai users are assigned to nodes in Shanghai.

Each CDN node is in fact a page cache server: if the requested resource is not in its cache it fetches it from the origin server, otherwise it returns the cached page directly.

Request assignment (load balancing) in a CDN is done at DNS resolution time by the CDN's dedicated DNS servers. The usual practice is to add a CNAME record at the site's own DNS provider pointing the domain to a specific CDN domain name; that CDN domain is then resolved by the CDN's own DNS servers to the address of an appropriate CDN node.

 

The second step, reaching the CDN's DNS server, works because the NS records of the CNAME's target domain point to the CDN's DNS servers. Each CDN node may itself be a cluster of several servers.

6. Underlying optimization
All of the architectures described above are built on top of the basic network infrastructure, and many of their components move data across the network. If network transmission can be made faster, the whole system improves with it.

 

7. Database clustering and database/table hashing

Large sites are complex applications, and these applications all use databases. Under heavy traffic the database bottleneck shows up quickly, and a single database soon cannot keep up with the application, so we need database clusters or database/table hashing.

For database clustering, many databases come with their own solutions; Oracle, Sybase and others offer very good ones, and the Master/Slave replication commonly provided with MySQL is a similar scheme. Whichever database you use, consult its corresponding clustering solution.

Because the database clusters mentioned above are constrained, in both architecture and scaling cost, by the type of database used, we also need to improve the system from the application side, and database/table hashing is the most commonly used and most effective way to do so. We separate the application by business or functional module, so that different modules use different databases or tables, and then apply a hashing strategy to split a given page or feature across smaller databases, for example hashing the user table by user ID. This lifts system performance at low cost and scales well. Sohu's forum uses such a structure: user and settings information lives in its own database, while posts are hashed by board and ID across databases and tables; in the end, a simple entry in a configuration file lets the system add low-cost databases at any time to supplement performance. A minimal shard-selection sketch follows.
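The sketch below shows the simplest form of table/database hashing: route each user to one of N shards by hashing the user ID. The JDBC URLs, table naming and shard count are illustrative assumptions; real systems must also deal with resharding, which plain modulo hashing does not handle.

```java
// Minimal sketch of database/table hashing by user ID.
import java.util.List;

public class UserShardRouter {
    private final List<String> shardJdbcUrls;

    public UserShardRouter(List<String> shardJdbcUrls) {
        this.shardJdbcUrls = List.copyOf(shardJdbcUrls);
    }

    /** Pick the database holding this user's data. */
    public String jdbcUrlFor(long userId) {
        int shard = Math.floorMod(userId, shardJdbcUrls.size());
        return shardJdbcUrls.get(shard);
    }

    /** Table name for this user, e.g. user_0, user_1, ... */
    public String tableFor(long userId) {
        return "user_" + Math.floorMod(userId, shardJdbcUrls.size());
    }

    public static void main(String[] args) {
        UserShardRouter router = new UserShardRouter(List.of(
                "jdbc:mysql://db0.internal/forum", "jdbc:mysql://db1.internal/forum"));
        System.out.println(router.jdbcUrlFor(12345) + " / " + router.tableFor(12345));
    }
}
```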

8. Summary
The whole evolution of a site's architecture revolves around two problems, big data and high concurrency, and the solutions fall into two types: caching, and using more resources. "More resources" mainly means more databases (and more storage in general), more networks and more CPUs. Multiple resources can be used in two ways: splitting one request across several cooperating resources (for example multiple databases, multiple CPUs, and distributed processing within a cluster), or letting each resource handle whole requests on its own (for example CDNs, static-resource isolation, and multiple networks). Once we understand the ideas driving this architectural evolution and capture its essence, we may well design better architectures ourselves.

 

A further brief summary:

First of all, I believe we must have a clear idea of what the problem actually is. If we only memorize other people's solutions, that is mere rote doctrine; we won't truly understand them, nor be able to judge them as a whole.

Massive data and high concurrency usually come bundled together, even though they are two entirely different things. Pure "massive data" means the amount of data in the database is huge; "high concurrency" means heavy traffic hitting both the database and the servers.

So the question is: given a huge amount of data in the database, what do we do about it? To solve a problem we must first know what the problem is. So what kinds of problems does massive data actually cause?

The problems caused by massive data come down to the basic create, read, update and delete operations; what else could there be? Surely not security problems (though, at the risk of contradicting myself, there really might be security issues):

1. Slow database reads

2. Slow inserts and updates, a problem that can only be solved by splitting databases and tables (sharding)

There are several ways to deal with slow database access. Since accessing the database is slow, can we avoid accessing it at all where the business logic allows?

1. Use caching

2. Use static pages

If we cannot avoid accessing the database, then we optimize the database itself:

3. Optimize the database (a huge topic in itself: configuration parameters, index optimization, SQL optimization, and so on)

4. Separate out the active data in the database

5. Separate reads and writes

6. Batch reads and delay modifications

7. Use a search engine to query the data held in the database

8. Use NoSQL and Hadoop technologies

9. Split operations apart

 

High concurrency solutions

In fact this issue has to be discussed together with massive data, as above. Under what circumstances does high concurrency arise? Usually when the regular traffic is already heavy, and heavy traffic usually means the corresponding amount of stored data keeps growing as well; the two feed each other. There are exceptions, of course, such as pure demand spikes like 12306, where at the moment of peak concurrency the data volume is not exactly massive. So how is it usually solved? Since the problem involves both the servers and the database, we optimize from both sides.

1. Increase the number of web servers, i.e. build a cluster with load balancing. If one server cannot do the job, use several; and if several are not enough, scale out to more machine rooms.

Before moving on to the second solution: is there really nothing we can optimize other than adding database servers? Of course there is:

1.1 Page caching

1.2 CDN

1.3 Reverse proxy

1.4 Separating the application from static resources (for example, putting downloadable files together on a dedicated server and giving that server plenty of bandwidth)

2. Increase the number of database servers, again as a cluster with load balancing.

 

 

Solutions for massive data

1. Use caching

Many of these techniques are complementary. Compared with the others, caching is used more to solve the problem of high concurrency: access to massive data is slow, slowness easily turns high concurrency into a serious problem, and the database is typically the bottleneck of web access. So wherever the business logic allows, we try to avoid hitting the database, and that is where caching comes in: keep the needed data in memory so it does not have to be read from the database on every request, which stops wasting performance and speeds up access. So what should we pay attention to when using a cache and when choosing cache-management software? A minimal read-through cache sketch follows.
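The sketch below shows the read-through caching pattern in its simplest form: check the in-memory cache first and fall back to the database only on a miss. The Database interface and the TTL are illustrative stand-ins; real systems would typically use Memcached or Redis rather than a local map.

```java
// Minimal sketch of a read-through cache in front of the database.
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class ReadThroughCache<K, V> {

    /** Stand-in for the real data source (e.g. a DAO backed by MySQL). */
    public interface Database<K, V> {
        Optional<V> findById(K id);
    }

    private record Entry<V>(V value, long expiresAt) {}

    private final ConcurrentHashMap<K, Entry<V>> cache = new ConcurrentHashMap<>();
    private final Database<K, V> db;
    private final long ttlMillis;

    public ReadThroughCache(Database<K, V> db, long ttlMillis) {
        this.db = db;
        this.ttlMillis = ttlMillis;
    }

    public Optional<V> get(K key) {
        Entry<V> hit = cache.get(key);
        if (hit != null && hit.expiresAt() > System.currentTimeMillis()) {
            return Optional.of(hit.value());                  // served from memory, no DB access
        }
        Optional<V> fromDb = db.findById(key);                // cache miss: go to the database
        fromDb.ifPresent(v ->
                cache.put(key, new Entry<>(v, System.currentTimeMillis() + ttlMillis)));
        return fromDb;
    }

    /** Call this after updates so stale data is not served past its change. */
    public void invalidate(K key) {
        cache.remove(key);
    }
}
```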

2. Static pages --- these hardly need explaining; what is there left to explain?

3. Database optimization

3.1 Table structure design

3.2 Choice of data types

3.3 SQL optimization

3.4 Index tuning

3.5 Configuration optimization

There is far too much to cover here; it deserves a separate article.

4. Separate out the active data in the database

Why separate it? Let me describe a problem from a real environment. There was a table with only a dozen or so fields and about 1.3 million rows, yet its data had grown to 5 GB, which is unreasonable in itself: so few rows taking up so much space means some of the fields were storing very large strings (article bodies, for example). Most queries against the table did not need those large fields at all, yet still had to pay for them, so everything was far slower than it should have been. In such a case we can split the table vertically and separate out the active data, which speeds up access dramatically.

5. Separate reads and writes; a minimal routing sketch follows.
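The sketch below shows the basic idea of read/write splitting: writes go to the master database, reads are spread across replicas. The JDBC URLs are illustrative assumptions; real setups often use middleware or framework-level routing, and must account for replication lag.

```java
// Minimal sketch of read/write splitting over JDBC.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

public class ReadWriteRouter {
    private final String masterUrl;
    private final List<String> replicaUrls;

    public ReadWriteRouter(String masterUrl, List<String> replicaUrls) {
        this.masterUrl = masterUrl;
        this.replicaUrls = List.copyOf(replicaUrls);
    }

    /** All INSERT/UPDATE/DELETE statements go to the master. */
    public Connection writeConnection() throws SQLException {
        return DriverManager.getConnection(masterUrl);
    }

    /** SELECT statements are spread across the replicas. */
    public Connection readConnection() throws SQLException {
        String url = replicaUrls.get(ThreadLocalRandom.current().nextInt(replicaUrls.size()));
        return DriverManager.getConnection(url);
    }
}
```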

 

 

Related links:
http://blog.csdn.net/u012373815/article/details/71435926

http://blog.csdn.net/u014723529/article/details/41892001


