High-performance back-end server architecture design

ref: https://www.cnblogs.com/lidabo/p/6627642.html
 
How do you design a high-performance, large-scale site? In the mobile Internet era, the real challenge for a team is not the client application itself or winning on user experience, but the back end: its carrying capacity, its security, and so on. If the back end cannot hold up, the foundation of the whole application is not solid.
       The simplest, crudest way to improve server performance is to add machines and upgrade their hardware. Although hardware keeps getting cheaper, blindly solving concurrency growth by adding more machines is very costly. Combining that with technical optimization is a more effective solution.
 

I. Web-side performance

       In recent years the number of users has not grown exponentially, yet the number of concurrent connections has. The main reasons: 1) web pages contain more and more elements and richer content; 2) mainstream browsers now preload resources.
       (1) Establish persistent (keep-alive) connections. This reduces the repeated creation and destruction of connections. However, keeping connections open occupies web-server resources; if a connection is not fully used, those resources are wasted.
       (2) Reduce web requests through caching. Controlled by the HTTP `Expires` or `Cache-Control: max-age` headers, static content is kept in the browser's local cache. However, this does not help a user's first visit, and it affects the freshness of some web resources.
       (3) Mitigate web requests with version validators. Conditional requests are controlled by the HTTP `Last-Modified` or `ETag` headers: the client revalidates with the server, and if the content has not changed, the server returns 304 Not Modified. However, the connection is still established and the request still takes place.
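The ETag revalidation flow above can be sketched on the server side. This is a minimal illustration; the handler shape and in-memory response tuple are invented for the example, not any specific framework's API:

```python
import hashlib

def compute_etag(body):
    # Strong ETag derived from a hash of the content.
    return '"%s"' % hashlib.md5(body).hexdigest()

def respond(body, if_none_match=None):
    """Return (status, headers, payload) for a conditional GET."""
    etag = compute_etag(body)
    if if_none_match == etag:
        # Content unchanged: the connection and request still happen,
        # but no response body is transferred.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body

body = b"<html>hello</html>"
status1, headers1, _ = respond(body)                   # first visit: full response
status2, _, payload = respond(body, headers1["ETag"])  # revalidation: 304, empty body
```

Note that the 304 path still costs a round trip; it only saves the transfer of the body.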
       (4) Combine page requests. 1) Merge displayed HTML content: inline CSS and JS directly into the HTML page instead of linking to them. 2) Merge Ajax requests for dynamic content: batch, say, 10 Ajax requests into one bulk query. 3) Merge small images: use CSS Sprites to combine many small images into one and display each via offsets. 4) Compress CSS, JS, and images to reduce their size.
       (5) Static pages. Most content on a page may not change for a very long time; a news article, for example, is rarely modified once published. In such cases, cache the HTML page generated by CGI as a static file on the web server's local disk.
 
II. App-side performance
       (1) Three-level image loading. Implement a three-level cache across memory, disk, and network: 1) when loading an image, first check whether it is in the memory cache; if not, check the disk cache; if not, load it from the network; 2) after a network load completes, add the image to both the memory and disk caches; 3) manage the caches with LRU eviction.
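The three-level lookup can be sketched as follows. This is a toy model: the `disk` dict and the injected `fetch` callable stand in for real disk I/O and an HTTP download, and only the memory level gets LRU eviction here:

```python
from collections import OrderedDict

class ImageCache:
    """Memory -> disk -> network lookup, with LRU eviction in memory."""
    def __init__(self, fetch, mem_capacity=2):
        self.fetch = fetch            # network loader, e.g. an HTTP GET
        self.mem = OrderedDict()      # level 1: memory cache (LRU order)
        self.disk = {}                # level 2: disk cache (simplified)
        self.mem_capacity = mem_capacity

    def get(self, url):
        if url in self.mem:                    # 1. memory hit
            self.mem.move_to_end(url)          # refresh LRU position
            return self.mem[url]
        if url in self.disk:                   # 2. disk hit
            data = self.disk[url]
        else:                                  # 3. load from network
            data = self.fetch(url)
            self.disk[url] = data              # populate disk cache
        self.mem[url] = data                   # populate memory cache
        if len(self.mem) > self.mem_capacity:  # LRU eviction
            self.mem.popitem(last=False)
        return data
```

A real implementation would also bound and evict the disk level and handle concurrent loads of the same URL.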
       (2) Preloading. Based on browsing history and page-safety judgments, preload data before the user has even tapped to request it.
       (3) Route scheduling. From each user's daily online and login behavior, build a routing table that says which data center the user should connect to, and push it to the client. When the user's network changes, the client then knows which connection will be fastest in that situation.
       (4) Upload acceleration. Deploy upload-acceleration nodes across the country (more than 70 in this case), so that each user uploads pictures through the nearest node. Multi-port, multi-connection uploads are also enabled to use as much of the network as possible, rather than leaving a single connection idle while it waits for packet retransmission.
 
III. Bandwidth
        Calculating bandwidth involves two indicators (average page size and daily PV); detailed numbers can be obtained from access-log statistics. Average traffic = (daily PV / (24 * 60 * 60)) * average page size. Peak traffic = average traffic * 5. The bandwidth to purchase should equal the peak traffic.
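As a worked example of the formula above (the 10 million PV/day and 100 KB page size are illustrative numbers, and the 5x peak factor is the rule of thumb from the text):

```python
def required_bandwidth(daily_pv, avg_page_kb, peak_factor=5.0):
    """Estimate bandwidth to purchase, in Mbit/s, from daily PV and average page size."""
    avg_bytes_per_sec = daily_pv / (24 * 60 * 60) * avg_page_kb * 1024
    peak_bytes_per_sec = avg_bytes_per_sec * peak_factor
    return peak_bytes_per_sec * 8 / 1_000_000   # bytes/s -> Mbit/s

# e.g. 10 million PV/day at 100 KB per page -> roughly 474 Mbit/s peak
mbps = required_bandwidth(10_000_000, 100)
```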
        On special days, however, peak traffic far exceeds this figure. On New Year's Eve 2015, WeChat's "shake for a red envelope" feature was triggered 11 billion times in total, peaking at 14 million times per second. For such event days, hundreds of gigabits of CDN bandwidth are prepared in advance.
 
IV. Back-end performance
       Large sites are not designed up front; they evolve gradually. The development of the Internet follows its own laws, and Internet history has repeatedly proven that "designing the site for large scale from the very beginning" does not work. During this evolution you must also identify where the current bottleneck is and know which point deserves the highest-priority optimization. The architecture therefore does not necessarily evolve in the order listed below; decide according to the situation.
       Starting as a small site, one server is enough. The evolution typically includes the following steps:
       (1) Separate the database server from the application server. Give the application server a better CPU and more memory; give the database server larger and faster disks.
       (2) Use caching. Since 80% of accesses concentrate on 20% of the data, caching that portion of the data brings performance up immediately. Empty results should also be cached; otherwise repeated misses for nonexistent data will keep adding pressure on the database.
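The "cache empty results too" point can be sketched with a cache-aside lookup. The in-memory dict and `db_lookup` callable are stand-ins for a real cache (e.g. Redis) and a real database query:

```python
_MISSING = object()   # sentinel so that "not in the database" is itself cacheable

class CacheAside:
    """Cache-aside lookup that also caches empty results, so repeated
    queries for nonexistent keys do not hammer the database."""
    def __init__(self, db_lookup):
        self.db_lookup = db_lookup   # e.g. a SQL query; returns None if absent
        self.cache = {}
        self.db_hits = 0             # for demonstration: count DB round trips

    def get(self, key):
        if key in self.cache:
            value = self.cache[key]
            return None if value is _MISSING else value
        self.db_hits += 1
        value = self.db_lookup(key)
        # Cache the miss as well (in production, usually with a short TTL).
        self.cache[key] = _MISSING if value is None else value
        return value
```

Without the sentinel, every lookup of a nonexistent key would fall through to the database, which is exactly the pressure the text warns about.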
       (3) NoSQL. NoSQL is widely used in systems with weak transactional requirements, such as microblogging. Examples: BigTable, MongoDB.
       (4) Server clusters. Things to consider: a load-balancing scheduler in front of the servers, such as Nginx; session management; and how features like file upload keep working, typically by managing files on a unified file server.
       (5) Separate database reads and writes, with publish/subscribe replication. Implement a data-access layer so that upper-layer code does not need to know that read/write separation exists. Things to consider: replication delay. MySQL synchronizes data through the binlog. Latency can be improved by splitting services horizontally and by multithreaded replication.
       (6) Split the database. Vertical splitting (by business) runs into: cross-business transactions and a growing number of application configuration items. Horizontal splitting (by data) runs into: SQL routing (you must know which database a given user lives on), primary-key strategies that must change (keys must be unique across shards), and query-performance issues such as pagination across shards.
       (7) CDN. A CDN publishes the site's content to the network "edge" closest to users, so users can fetch what they need nearby, relieving Internet congestion and improving response times. According to statistics, a CDN can serve 70% to 95% of a site's page content, reducing server pressure and improving the site's performance and scalability. Remote deployment generally follows the principle of a concentrated core with decentralized edge nodes. For example, CDN vendors such as ChinaNetCenter, Ruijiang, and ChinaCache synchronize site content to CDN nodes nationwide, and their customers' users access the nearest server.
       (8) Distributed services. As the business grows more complex, building one giant application to handle everything becomes impractical, and from a management perspective it is hard to maintain. Splitting the system by function also makes it possible to support huge traffic and data volumes with large numbers of cheap machines. The advantages of a microservice architecture are obvious: low coupling, flexible technology choices, more efficient releases, and fault isolation. The split also brings challenges: 1) after splitting, you need a high-performance, stable distributed communication framework that supports multiple communication styles and remote invocation; 2) splitting a large application takes a long time and requires controlling and untangling system dependencies; 3) operating such a huge distributed application well (dependency management, health management, bug tracking, tuning, monitoring and alerting, etc.).
       (9) Turn big systems into small ones. Break large, complex functions into much smaller pieces, reducing module coupling and cross-dependencies. The result is a set of highly autonomous small systems with high cohesion and low coupling, where modules do not depend excessively on one another. The benefit: no single module can take down all services, a single release cannot introduce system-wide risk, and true gray (canary) releases become possible.
       (10) Hardware load balancing. When a single Nginx soft-load server can no longer handle the enormous web traffic, you can use F5 hardware load balancing, or partition the application logically and spread traffic across multiple soft-load clusters.
 
V. Solving problems on the business side
       Some problems are solved more efficiently by business means than by technical means. 12306 selling tickets in time slots is a typical example.
       (1) Buffer requests at the front end. For example, putting the "shake for a red envelope" logic in the access layer collapses tens of millions of shake requests per second into red-envelope requests on the order of millions per second before they spread to the back-end logic and services, reducing the chance of an avalanche.
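One common way to implement this kind of access-layer funnel is a token-bucket limiter: admitted requests proceed to the back end, the rest get an immediate cheap reply at the edge. A minimal sketch (timestamps are passed in explicitly to keep it deterministic; a real limiter would use the clock):

```python
class TokenBucket:
    """Access-layer rate limiter: tokens refill at a fixed rate, each
    admitted request consumes one token, and the rest are shed."""
    def __init__(self, rate, capacity):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = 0.0

    def allow(self, now):
        # Refill according to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True             # forward to the red-envelope back end
        return False                # shed: answer cheaply at the edge
```

Shedding at the access layer is what keeps a 10x spike in shakes from becoming a 10x spike on the envelope services behind it.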
       (2) Split off asynchronous work at the back end. Skip the slowest operations, such as debiting, and process them asynchronously. For example: "Traffic is high right now; the red envelope you received will be credited to your balance later."
       (3) Fail fast. Bury the relevant directives and policies in the client release in advance: when the client receives abnormal data, it automatically lowers its request frequency. For example, after a failed request the user will certainly try again, but the client need not actually hit the back end a second time; it can return a response directly and keep the user calm. If these policies are not buried in the client ahead of time, it is too late to deal with the problem once it occurs.
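The client-side fast-reject policy can be sketched as a wrapper around the network call. This is an illustrative model, not any real client SDK: after a failure it swallows an exponentially growing number of retries locally instead of sending them:

```python
class BackoffClient:
    """After a failure, answer the next N attempts locally (N doubling
    per failure) so retries never reach the back end."""
    def __init__(self, send):
        self.send = send          # the real network call; raises on failure
        self.skip = 0             # attempts left to swallow locally
        self.penalty = 1          # next penalty, doubles per failure

    def request(self, payload):
        if self.skip > 0:
            self.skip -= 1
            return "busy, try later"      # no back-end traffic at all
        try:
            result = self.send(payload)
            self.penalty = 1              # success resets the backoff
            return result
        except Exception:
            self.skip = self.penalty
            self.penalty *= 2
            return "busy, try later"
```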
       (4) Preload traffic. Starting from the client, have it automatically download resource-hungry assets such as voice and pictures in advance, pre-positioning them well before the peak to smooth out the traffic.
       (5) Resource isolation. Prevent a problem in any one branch from affecting the entire service chain, so that even if some service fails, the whole service does not collapse.
       (6) Reduce picture quality according to the business scenario. 1) Serve pictures of different quality to different terminals. 2) Research new encoding formats that shrink pictures by another 30% at essentially equal quality. 3) Apply progressive transmission, so the user first sees a blurry image that then sharpens.
       (7) Rollback mechanisms complicate business logic, are error-prone, and may open loopholes. Instead, improve the simplicity and availability of services to reduce the error rate, and handle the very few remaining errors separately afterwards from the logs.
 
VI. Limiting the maximum number of connections
       (1) Full-link load testing: automatically assess the entire business chain in advance to prevent overload.
       (2) Graceful degradation requires dividing the available features into priority levels from the start (login > text messages > picture messages > friend-status display > typing indicators).
       (3) If inter-module call timeouts are set improperly, the degradation policy fails. Suppose A calls B with a 300 ms timeout and B calls C with a 500 ms timeout. B intends to degrade gracefully when C times out, but by then A has already timed out on B, so B's degradation is meaningless. Timeouts should shrink as you go down the call chain.
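The timeout-budget point above can be sketched as deadline propagation: each hop passes its remaining time budget downstream, and a hop whose budget is already exhausted degrades instead of calling further. The hop names and millisecond costs here are illustrative:

```python
def call_chain(budget_ms, costs_ms):
    """Walk a call chain with a shared time budget; degrade once the
    remaining budget cannot cover the next hop's cost."""
    results = []
    remaining = budget_ms
    for i, cost in enumerate(costs_ms):
        if remaining <= cost:
            # No point calling downstream: the caller above us would
            # time out before our answer could arrive.
            results.append("hop%d: degrade" % i)
            break
        results.append("hop%d: ok" % i)
        remaining -= cost            # propagate what is left downstream
    return results
```

This is why B giving C a 500 ms timeout under A's 300 ms budget is useless: C's budget must be derived from what A left over, not set independently.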
       (4) If the success rate is above 95%, allow retries; otherwise reject requests at the interface layer.
 
References:
1. The evolution of large-scale site technical architecture: http://news.cnblogs.com/n/518851/
2. How high-concurrency web services evolve: http://www.admin10000.com/document/6190.html
3. Web service architecture: http://www.cnblogs.com/jiekzou/p/4677994.html
4. How did WeChat's product managers and architects handle 1 billion red envelopes? http://www.woshipm.com/pmd/138987.html
5. Decrypting the network architecture behind Tencent's hundred-million-user products: http://news.idcquan.com/news/66660.shtml
6. Why the Chrome browser is so memory-hungry: http://www.admin10000.com/document/6318.html
7. Large-Scale Web Site Technology Architecture: Core Principles and Case Studies, by Li Zhihui
