Interview Question: High Concurrency Architecture and Common Methods of Handling High Concurrency

1. HTML static
As we all know, pure static HTML pages are the most efficient and the cheapest to serve, so we try to implement as many pages of a website as possible as static pages. This simplest method is actually the most effective one. However, for websites with a large amount of frequently updated content, we cannot produce every page by hand, which is why the familiar information publishing system (CMS) appeared. The news channels of the portal sites we often visit, and even their other channels, are managed and produced by such a system. A CMS can handle the simplest information entry and automatically generate static pages, and it can also provide functions such as channel management, permission management, and automatic crawling. For a large website, an efficient and manageable CMS is essential.

Besides portals and information publishing sites, community sites with high interactivity requirements also need to be as static as possible to improve performance. Making posts and articles static in real time, and regenerating the static pages when they are updated, is a widely used strategy. Mop's hodgepodge uses such a strategy, as does the NetEase community.

HTML staticization is also used as a caching strategy. For content that the system queries from the database frequently but that changes rarely, consider making it static. A forum's public settings are an example: in all mainstream forums, this information can currently be managed in the back end and is stored in the database. The front-end program reads much of it on many requests, but it is updated very rarely. You can regenerate this content as static files whenever it is updated in the back end, which avoids a large number of database access requests.
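
The idea of regenerating a static file on each back-end update can be sketched as follows. This is only a minimal illustration, not how any particular forum does it: the template, the setting names (title, notice), and the function name publish_static are all assumptions made for the example.

```python
import os
import tempfile
from html import escape
from string import Template

# Hypothetical page template; a real CMS would use its own templates.
PAGE = Template(
    "<html><head><title>$title</title></head>"
    "<body><h1>$title</h1><p>$notice</p></body></html>"
)


def publish_static(settings: dict, out_path: str) -> None:
    """Render rarely changing settings to a static HTML file.

    Meant to be called from the back end whenever the settings are saved,
    so front-end requests read the file instead of querying the database.
    """
    # Escape every value so stored text cannot inject markup into the page.
    html = PAGE.substitute({k: escape(str(v)) for k, v in settings.items()})

    out_dir = os.path.dirname(os.path.abspath(out_path))
    os.makedirs(out_dir, exist_ok=True)

    # Write to a temporary file first, then swap it in with os.replace,
    # so a reader never sees a half-written page.
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(html)
    os.replace(tmp_path, out_path)
```

The web server can then serve the generated file directly, and the database is only touched when an administrator saves a change.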

2. Separation of image servers
As we all know, for web servers, whether Apache, IIS, or another container, serving images consumes the most resources, so it is necessary to separate images from pages. This is a strategy that basically all large websites use: they have independent image servers, often many of them. Such an architecture reduces the pressure on the servers that handle page requests and ensures that the system will not crash because of image problems. The application servers and the image servers can also be configured and optimized differently. For example, when configuring Apache for images, you can support as few ContentTypes and load as few modules (LoadModule) as possible, which keeps resource consumption low and execution efficiency high.
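
On the page side, separating images mostly means generating image URLs that point at the image servers instead of the page server. The helper below is a rough sketch under assumed names: the hosts img1.example.com and img2.example.com and the function image_url are hypothetical and do not come from any site mentioned here.

```python
import zlib

# Hypothetical image server hostnames.
IMAGE_HOSTS = ["img1.example.com", "img2.example.com"]


def image_url(path: str) -> str:
    """Return an absolute URL on one of the image servers for a given path.

    The host is chosen with a stable hash of the path, so the same image
    always maps to the same host and browser caches stay effective.
    """
    normalized = "/" + path.lstrip("/")
    index = zlib.crc32(normalized.encode("utf-8")) % len(IMAGE_HOSTS)
    return "http://" + IMAGE_HOSTS[index] + normalized
```

Templates would call image_url("/photos/a.png") wherever they previously emitted a relative image path.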

3. Database clusters and database table hashing
Large websites have complex applications, and these applications must use databases. Under heavy traffic the database quickly becomes the bottleneck, and a single database soon cannot keep up with the application, so we need to use a database cluster or database table hashing.

For database clustering, many databases have their own solutions. Oracle, Sybase, and others have good ones, and the commonly used Master/Slave replication provided by MySQL is a similar solution. Whichever DB you use, follow its corresponding solution.

The database cluster mentioned above is limited by the type of DB used in terms of architecture, cost, and scalability, so we also need to consider improving the system architecture from the application side. Database and table hashing is the most commonly used and most effective solution. We separate the databases by business, application, or functional module, so that different modules correspond to different databases or tables, and then hash a given page or function across smaller databases or tables according to some strategy. For example, the user table can be split by hashing on the user ID. This improves system performance at low cost and scales well. Sohu's forum adopts such a structure: it separates the forum's users, settings, posts, and other information into different databases, then hashes the post and user databases and tables by forum section and ID. In the end, a simple change in the configuration file is enough to add a low-cost database to the system at any time to supplement performance.
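
Hashing the user table by user ID can be sketched like this. The layout (two databases with four tables each) and all names such as user_db_0 and users_2 are assumptions for illustration only, not the structure of Sohu's forum or any other real system.

```python
import zlib
from typing import Tuple

# Hypothetical shard layout: 2 databases, each holding 4 user tables.
DATABASES = ["user_db_0", "user_db_1"]
TABLES_PER_DB = 4


def locate_user(user_id: int) -> Tuple[str, str]:
    """Map a numeric user ID to the (database, table) that stores it."""
    slot = user_id % (len(DATABASES) * TABLES_PER_DB)
    database = DATABASES[slot // TABLES_PER_DB]
    table = "users_%d" % (slot % TABLES_PER_DB)
    return database, table


def locate_by_key(key: str) -> Tuple[str, str]:
    """Same mapping for string keys, using a stable hash (CRC32).

    Python's built-in hash() is randomized per process for strings,
    so it must not be used to decide where data is stored.
    """
    return locate_user(zlib.crc32(key.encode("utf-8")))
```

With this layout, locate_user(10) yields ("user_db_0", "users_2"). Note that with plain modulo hashing, changing the number of databases or tables remaps most IDs, so adding capacity usually also involves a data migration or a lookup scheme that this sketch leaves out.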

4. Cache
Anyone who works in technology has come across the term cache, and caches are used in many places. Caching is also very important in website architecture and website development. Here I only cover the two most basic kinds of cache. Advanced and distributed caching is described later.
For caching at the architecture level, those familiar with Apache know that it provides its own caching module. You can also put Squid, a separate caching proxy, in front of it. Both can effectively improve Apache's ability to respond to requests.
For caching in application development, the Memory Cache provided on Linux is a commonly used cache interface that can be used in web development. For example, when developing in Java you can call MemoryCache to cache and share some data, and some large communities use such a structure. In addition, most web development languages have their own cache modules and methods: PHP has Pear's Cache module, Java has even more, and although I am not very familiar with .NET, I believe it must have them too.
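
At the program level, the basic pattern is the same whichever cache library is used: look in the cache first, and fall back to the database only on a miss. Below is a minimal in-process sketch with expiry. The class name TTLCache and its methods are made up for this example and are not the API of any of the cache modules mentioned above.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """A tiny in-process cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        """Return the cached value, calling loader() only on a miss or expiry."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader()  # e.g. the expensive database query
        self._store[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key, for example right after the data is updated."""
        self._store.pop(key, None)
```

A cache like this lives inside one process, so it is not shared between web servers. Sharing cached data across machines needs an external cache service.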

5. Mirroring
Mirroring is a method large websites often use to improve performance and data security. Mirroring can even out the differences in access speed that users see because of different network providers and regions. For example, the gap between ChinaNet and EduNet has prompted many websites to build mirror sites inside the education network, with data updated on a schedule or in real time. I will not go deep into the details of mirroring technology here. There are many professional, ready-made solution architectures and products to choose from, as well as cheap software approaches, such as rsync on Linux.
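
Once mirrors exist, each user has to be sent to the one closest to their network. One simple way to picture this is a lookup from the client's IP address to a mirror host. The sketch below is only illustrative: the address ranges are documentation-only placeholder networks, and the hostnames and the function pick_mirror are invented for the example. Real deployments often make this decision in DNS instead.

```python
import ipaddress

# Hypothetical mapping from client networks to mirror hosts.
# 192.0.2.0/24 is a reserved documentation range, used here as a stand-in
# for "addresses inside the education network".
MIRRORS = [
    (ipaddress.ip_network("192.0.2.0/24"), "edu-mirror.example.com"),
]
DEFAULT_HOST = "www.example.com"


def pick_mirror(client_ip: str) -> str:
    """Return the mirror host for a client, or the main site by default."""
    address = ipaddress.ip_address(client_ip)
    for network, host in MIRRORS:
        if address in network:
            return host
    return DEFAULT_HOST
```

Keeping the mirror's data in sync (for example with a scheduled rsync run) is a separate job that this sketch does not cover.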

6. Load balancing
Load balancing is the ultimate solution for large websites facing high load and a large number of concurrent requests.
Load balancing technology has been developed for many years, and there are many professional service providers and products to choose from. I have personally worked with some solutions, and two of these architectures are described here for your reference.
Hardware Layer 4 Switching
Layer 4 switching uses the header information of Layer 3 and Layer 4 packets to identify service flows by application range, and assigns the service flow of each whole range to a suitable application server for processing. A Layer 4 switch acts like a virtual IP that points to the physical servers. The traffic it carries follows a variety of protocols, including HTTP, FTP, NFS, Telnet, and others, and distributing it across the physical servers requires complex load balancing algorithms. In the IP world, the service type is determined by the destination TCP or UDP port, and the application range in Layer 4 switching is determined jointly by the source and destination IP addresses and the TCP and UDP ports.
There are well-known hardware Layer 4 switching products to choose from, such as Alteon and F5. These products are expensive but worth the money, offering excellent performance and flexible management. Yahoo China used three or four Alteons for its nearly 2,000 servers.

Software Layer 4 Switching
Once the principle of hardware Layer 4 switches is understood, software Layer 4 switching based on the OSI model follows naturally. The principle is the same, but the performance is slightly worse. It can still handle a fair amount of load with ease, and some say the software approach is actually more flexible, with the processing capacity depending entirely on how well you know the configuration.
On Linux, the commonly used LVS (Linux Virtual Server) can handle software Layer 4 switching. It provides heartbeat-based real-time failover, which improves the robustness of the system, and its flexible virtual IP (VIP) configuration and management can serve the needs of multiple applications at once, which is essential for distributed systems.
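
To make the scheduling idea concrete, here is a small sketch of round-robin selection that skips servers a health check has marked as down. Round-robin is one of the scheduling algorithms LVS offers, but this code is not LVS and does not use its interfaces. The class RoundRobinBalancer and its methods are hypothetical, and the real work happens at the packet level rather than in application code.

```python
from typing import Dict, List


class RoundRobinBalancer:
    """Hand out real servers in turn, skipping those marked unhealthy."""

    def __init__(self, servers: List[str]) -> None:
        if not servers:
            raise ValueError("at least one real server is required")
        self._servers = list(servers)
        self._healthy: Dict[str, bool] = {s: True for s in self._servers}
        self._next = 0

    def mark(self, server: str, healthy: bool) -> None:
        """Record the result of a heartbeat or health check."""
        self._healthy[server] = healthy

    def pick(self) -> str:
        """Return the next healthy server, trying each one at most once."""
        for _ in range(len(self._servers)):
            server = self._servers[self._next]
            self._next = (self._next + 1) % len(self._servers)
            if self._healthy[server]:
                return server
        raise RuntimeError("no healthy real servers available")
```

When a heartbeat fails, calling mark(server, False) takes that server out of rotation, and marking it healthy again brings it back without touching the others.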

A typical load balancing strategy is to build a Squid cluster on top of software or hardware Layer 4 switching. Many large websites, including search engines, use this approach. The architecture is low-cost, high-performance, and highly scalable, and it is very easy to add or remove nodes at any time.
