Notes on large-website architecture

1. The evolution of large-website architecture (understanding the reason behind every step of the evolution is the key)

(1) A large site grows out of a small one: at first all of the site's resources, the application, the database, and the files, live on a single server.

(2) As the business develops, one server can no longer meet the demand, so the application and the data are separated and three servers are used: an application server, a file server, and a database server.

(3) As the site develops further, database access pressure becomes too great, so a cache is used to improve site performance (remember: the first step in improving site performance is to use a cache). Site caches come in two kinds: a local cache on the application server, and a distributed cache on dedicated remote cache servers.

(4) Using a cache effectively relieves the database access pressure, but at peak times the application server itself becomes the bottleneck of the whole site. Understand this: for a large site, do not try to solve the problem by replacing the machine with a more powerful server; no matter how powerful a single server is, it cannot keep up with the site's growing business needs. Instead, relieve the load by adding servers, and use a load-balancing scheduler to distribute the requests coming from users' browsers across the servers of the application cluster.
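
As a minimal sketch of this scheduling idea (the class name and the server list are hypothetical, chosen only for illustration), a round-robin scheduler simply hands each incoming request to the next application server in the cluster:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal round-robin scheduler: picks the next application server for each request.
public class RoundRobinBalancer {
    private final List<String> servers;          // e.g. ["10.0.0.1:8080", "10.0.0.2:8080"]
    private final AtomicInteger counter = new AtomicInteger(0);

    public RoundRobinBalancer(List<String> servers) {
        this.servers = servers;
    }

    // Returns the address of the server that should handle the next request.
    public String nextServer() {
        int index = (counter.getAndIncrement() & 0x7fffffff) % servers.size();
        return servers.get(index);
    }
}
```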

(5) Although the cache lets most data reads avoid the database, cache misses, cache expirations, and all writes still go to the database. After the site reaches a certain scale, database read and write pressure again becomes the site's bottleneck. At this point, read/write splitting can be used to relieve the database load: the application server writes data to the write (master) database and reads data from the read (slave) database. Most mainstream databases today provide master-slave hot standby; by configuring a master-slave relationship between two database servers, data updated on one server is synchronized to the other.
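
A minimal sketch of the read/write-splitting idea, assuming two already configured javax.sql.DataSource instances (the class and method names here are hypothetical; real projects usually rely on database middleware or framework support for this routing):

```java
import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.SQLException;

// Sketch of read/write splitting: writes go to the master, reads go to the slave.
public class ReadWriteRouter {
    private final DataSource master; // receives all INSERT/UPDATE/DELETE statements
    private final DataSource slave;  // receives SELECT queries, kept in sync by replication

    public ReadWriteRouter(DataSource master, DataSource slave) {
        this.master = master;
        this.slave = slave;
    }

    // Picks a connection based on whether the statement is a read or a write.
    public Connection connectionFor(String sql) throws SQLException {
        boolean isRead = sql.trim().toLowerCase().startsWith("select");
        return isRead ? slave.getConnection() : master.getConnection();
    }
}
```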

(6) As the site's business continues to develop and the user base keeps growing, the complex network environment in China means that access speed differs greatly for users in different regions. A reverse proxy and a CDN can be used: on the one hand they speed up user access, and on the other they reduce the load on the back-end servers, because the basic principle behind both the reverse proxy and the CDN is caching.

(7) After read/write splitting, the database has been split from a single server onto two, but it still cannot meet the needs of the site's traffic, so a distributed database can be used. The main means of splitting is by business, into separate databases: data for different businesses is deployed on different physical servers.

(8) To cope with increasingly complex business scenarios, large websites apply divide and conquer and split the whole site's business into different applications, each deployed independently. The applications can be related to one another through hyperlinks, or they can be decoupled and communicate through a message queue.

At this point in the development of a large site, most of the technical problems have basically been solved.

 

2. The key to a high-performance website: controlling the amount of concurrency. As long as this is done, many otherwise difficult data problems are no longer problems.

 


3. Do not try to solve every problem with technology; a business problem can also be solved by business means.

For example, when 12306 first opened, tickets went on sale at 0:00 and the site had to absorb tens of millions of accesses at once, which made it collapse outright and drew all kinds of professional and not-so-professional opinions and advice. But is this a problem that can only be solved with technology? For this requirement, 12306 not only improved its technical architecture, it also adjusted its business: instead of selling all tickets at 0:00, it introduced a queuing mechanism for ticket purchases and spread the on-the-hour ticket release across different time slots. With concurrency under control, the performance of the whole site improved.

 

4. An important goal and driving force of the development of computer software is to reduce coupling. The weaker the relationships between things, the less they affect each other, and the more independently they can be developed.

 

5. Asynchronous architecture is typically the producer-consumer pattern.
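
As a rough illustration of the producer-consumer pattern (a plain in-process sketch using java.util.concurrent, not any specific message-queue product):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Minimal producer-consumer sketch built on a bounded blocking queue:
// the producer publishes messages, the consumer drains them at its own pace.
public class ProducerConsumerDemo {
    private static final String STOP = "STOP"; // sentinel telling the consumer to finish

    public static void main(String[] args) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(100);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 10; i++) {
                    queue.put("message-" + i); // blocks if the queue is full
                }
                queue.put(STOP);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                String msg;
                while (!STOP.equals(msg = queue.take())) { // blocks until a message arrives
                    System.out.println("processed " + msg);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
    }
}
```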

 

6. Using an asynchronous message queue has several benefits:

(1) It improves the availability of the system.

(2) It speeds up the website's response to requests.

(3) It cuts the peaks of concurrent access.

 

7. Scalability means that the site can cope with ever-rising concurrent access pressure and growing data storage needs by continuously adding servers to a cluster.

 

8. Measures of an architecture's scalability:

(1) Whether the site can be built as a cluster of multiple servers.

(2) Whether it is easy to add new servers to the cluster.

(3) Whether a newly added server can provide the same service as the original servers, with no difference.

(4) Whether there is a limit on the total number of servers the cluster can hold.

 

9. Load, an important indicator of how busy a system is

System Load, also called Load, is the system load: the number of threads currently being executed by the CPU plus those waiting to be executed. It is an important indicator of how busy the system is. On a multi-core CPU, the ideal situation is that all CPUs are in use and no thread is waiting to be processed. A Load value lower than the number of CPUs means the CPUs are idle and resources are being wasted; a Load value higher than the number of CPUs means processes are queuing for CPU time and resources are insufficient.
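
On the JVM this indicator can be read directly; a small sketch comparing the 1-minute load average with the number of CPUs:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Reads the 1-minute system load average and compares it with the number of available CPUs.
public class LoadCheck {
    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        int cpus = os.getAvailableProcessors();
        double load = os.getSystemLoadAverage(); // -1.0 if the platform does not provide it

        System.out.printf("load average: %.2f, CPUs: %d%n", load, cpus);
        if (load >= 0 && load < cpus) {
            System.out.println("CPUs are partly idle");
        } else if (load > cpus) {
            System.out.println("threads are waiting for the CPU");
        }
    }
}
```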

 

10. Browser access optimization techniques

(1) Reduce HTTP requests: merge CSS, JS, and images so that the browser does not issue multiple HTTP requests to fetch the page's data.

(2) Use the browser cache to store static resources; Cache-Control, Expires, Pragma, and Last-Modified are the HTTP headers that control caching (see the sketch after this list).

(3) Enable compression, which effectively reduces the amount of data transmitted.

(4) Put CSS at the top of the page and JS at the bottom, because JS is executed as soon as it is downloaded and can block page rendering.

(5) Reduce Cookie transmission.
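
A minimal sketch of point (2), assuming a servlet filter mapped to static resources such as *.css, *.js, and images (the class name and the 7-day lifetime are hypothetical choices for illustration):

```java
import javax.servlet.*;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;

// Tells the browser to cache the static response for 7 days via Cache-Control and Expires.
public class StaticCacheFilter implements Filter {
    private static final long SEVEN_DAYS_SECONDS = 7L * 24 * 60 * 60;

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        HttpServletResponse resp = (HttpServletResponse) response;
        resp.setHeader("Cache-Control", "public, max-age=" + SEVEN_DAYS_SECONDS);
        resp.setDateHeader("Expires", System.currentTimeMillis() + SEVEN_DAYS_SECONDS * 1000);
        chain.doFilter(request, response);
    }

    @Override public void init(FilterConfig config) {}
    @Override public void destroy() {}
}
```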

 

11. A few details of using a cache

(1) Do not cache data that is frequently modified. In general, caching is only worthwhile when the read/write ratio is at least 2:1, that is, data written once is read at least twice. On Sina Weibo, for example, a popular post is written once but may be read millions of times, so caching it is clearly cost-effective.

(2) If there is no hot-spot access, that is, most accesses are not concentrated on a small portion of the data, caching makes little sense, because the cache has an eviction mechanism and most of the cached data would be evicted before it is ever accessed again.

(3) Tolerate data inconsistency for a certain period of time, unless the cache is notified immediately whenever the data is updated, which brings consistency overhead and transaction issues.

(4) Use a distributed cache cluster to improve cache availability.

(5) A freshly started cache contains no data, and while the cache is being rebuilt the system's performance and the database's load are poor, so depending on the project and the business, part of the data can be loaded at startup. This is cache warm-up (pre-heating).

(6) Cache invalid parameters as well and give them an expiration time, to prevent faulty business logic or malicious attacks from repeatedly calling an interface that queries the database: once a key cannot be found in the database, cache an empty value for that key, so that further visits with the same key within a period of time return the cached empty value without hitting the database (see the sketch below).
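
A minimal sketch of point (6), caching "not found" results so that repeated lookups of a non-existent key do not reach the database (the class is hypothetical and uses an in-memory map without expiration handling, standing in for a real cache such as Redis or Memcached):

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Caches both real values and "no such record" markers for a limited time.
public class UserCache {
    private static final Optional<String> NOT_FOUND = Optional.empty(); // cached empty value
    private final Map<Long, Optional<String>> cache = new ConcurrentHashMap<>();

    public Optional<String> findUser(long id) {
        Optional<String> cached = cache.get(id);
        if (cached != null) {
            return cached;                       // hit: either a user or a cached "not found"
        }
        String user = loadFromDatabase(id);      // miss: query the database once
        Optional<String> value = (user == null) ? NOT_FOUND : Optional.of(user);
        cache.put(id, value);                    // cache the result, including the empty one
        return value;
    }

    private String loadFromDatabase(long id) {
        return null; // placeholder for a real database query
    }
}
```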

 

12. The message queue is good at peak shaving (mentioned earlier), but note that the business process must be adapted accordingly.

With asynchronous processing, the transactions generated by a short burst of high concurrency are stored in the message queue, so the concurrent peak is flattened. However, note that because control returns to the user immediately after the data is written to the message queue, the subsequent business processing (data validation, database writes) may still fail. So after a message queue is used for asynchronous processing, the business process must be adapted to match: for example, after an order is submitted, the order data is written to the message queue, and the system cannot immediately tell the customer that the order succeeded. Only after the consumer of the message queue has really processed the order, perhaps even after the goods have left the warehouse, should the user be notified by email or SMS that the order succeeded, to avoid trade disputes.
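
A rough sketch of the adjusted order flow under these assumptions (class and method names are hypothetical, and the in-process queue stands in for a real message-queue product):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Submitting an order only enqueues it and tells the user "order received";
// the consumer processes it later and then notifies the user of the real result.
public class OrderService {
    private final BlockingQueue<String> orderQueue = new LinkedBlockingQueue<>();

    // Called by the web layer: returns immediately, without claiming the order succeeded.
    public String submitOrder(String orderId) throws InterruptedException {
        orderQueue.put(orderId);
        return "order received, result will be sent by email/SMS";
    }

    // Runs in a background consumer thread.
    public void consumeOrders() throws InterruptedException {
        while (true) {
            String orderId = orderQueue.take();
            boolean ok = processOrder(orderId); // validate, write database, ship goods...
            notifyUser(orderId, ok);            // email/SMS the real outcome
        }
    }

    private boolean processOrder(String orderId) { return true; } // placeholder
    private void notifyUser(String orderId, boolean ok) {}        // placeholder
}
```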

 

13. The basic principle of both the CDN and the reverse proxy is caching. The difference is that the CDN is deployed in the network provider's machine rooms, so a user requesting the site can get data from the provider's machine room nearest to them, while the reverse proxy is deployed in the site's central machine room: when a user request reaches the central machine room, the first server visited is the reverse proxy, and if it has cached the resource the user requested, it returns the resource to the user directly. The purpose of using a CDN and a reverse proxy is the same:

(1) Return data to the user as fast as possible and speed up access to the site.

(2) Reduce the load on the back-end servers.

 


14. Session management in an application server cluster

In a single-machine environment, the Session can be managed by the Web container deployed on the server. In a cluster environment with load balancing, the load balancer may distribute a request to any application server in the cluster, so ensuring that every request can still obtain the correct Session is much more complex than in the single-machine case. In a cluster environment, Session management can be done in the following ways:

1. Session replication

The servers in the cluster synchronize Session objects among themselves, so every server stores the Session information of all users. If any machine goes down, no Session data is lost, and when a server uses a Session it simply reads it from its own machine.

This solution is simple, and reading Session information from the local machine is fast, but it can only be used when the cluster is relatively small. When the cluster is large, Session replication among the servers requires a great deal of communication, occupying large amounts of server and network resources and overloading the system. And because every user's Session is backed up on every server, with a large number of users the servers may not even have enough memory to hold the Sessions.

The core application clusters of large sites have thousands of servers and tens of millions of concurrent users, so this scheme does not apply to them.

2. Session binding

Session binding can take advantage of the source-address-hash load-balancing algorithm: the load balancer always distributes requests coming from the same IP to the same server. Thus, during the entire session, all of the user's requests are processed on the same server, that is, the Session is bound to a specific server and is guaranteed to always be obtainable on that server. This method is also known as sticky sessions.

But the Session binding scheme does not meet the requirement for a highly available system: once a server goes down, the Sessions on that machine cease to exist, and when user requests are switched over to other machines the business cannot be completed because the Session is gone. So although most load balancers provide the source-address-hash algorithm, few sites use it for Session management.
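
A minimal sketch of the source-address-hash idea behind session binding (hypothetical class; a real load balancer implements this itself):

```java
import java.util.List;

// Source-address-hash scheduling: the same client IP always maps to the same server,
// so its Session stays on that machine (sticky session).
public class IpHashBalancer {
    private final List<String> servers;

    public IpHashBalancer(List<String> servers) {
        this.servers = servers;
    }

    public String serverFor(String clientIp) {
        int index = (clientIp.hashCode() & 0x7fffffff) % servers.size();
        return servers.get(index);
    }
}
```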

3. Recording the Session with Cookies

Early enterprise applications used a C/S architecture, and one early way of managing the Session was to record it on the client: with every request the client sends its Session to the server, and after processing the request the server sends the modified Session back to the client in the response. A website has no client program, but it can use the Cookies supported by the browser to record the Session.

Using Cookies to record the Session also has some disadvantages:

(1) It is constrained by the Cookie size limit, so the amount of information that can be recorded is limited.

(2) The Cookie is carried with every request and response, which affects performance.

(3) If the user disables Cookies, access will not work normally.

But because Cookies are simple to use and highly available, support linear scaling of the application servers, and most applications only need to record a small amount of Session information, many websites do in fact use Cookies to record the Session to a greater or lesser extent.

4. Session server

Deploy a separate Session server (or cluster) to manage Sessions centrally: whenever an application server reads or writes a Session, it accesses the Session server.

This solution in effect splits the state out of the application servers: the application servers become stateless and the Session servers are stateful, and each of the two kinds of server is then architected according to its own characteristics.

For the stateful Session server, a relatively simple approach is to build on packaged products such as a distributed cache or a database and extend them to meet the storage and access requirements of Sessions; if the business has higher requirements for Session management, such as integrating single sign-on (SSO) or user services with the Session service, then a dedicated Session service management platform needs to be developed.
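
A minimal sketch of the session-server idea (hypothetical class; an in-memory map stands in for the shared store, which in a real deployment would be a distributed cache or database reachable by all application servers):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Application servers stay stateless and read/write every Session through this shared store.
public class SharedSessionStore {
    private final Map<String, Map<String, Object>> sessions = new ConcurrentHashMap<>();

    // Any application server in the cluster can look up the same session by its id.
    public Map<String, Object> getSession(String sessionId) {
        return sessions.computeIfAbsent(sessionId, id -> new ConcurrentHashMap<>());
    }

    public void setAttribute(String sessionId, String name, Object value) {
        getSession(sessionId).put(name, value);
    }

    public Object getAttribute(String sessionId, String name) {
        return getSession(sessionId).get(name);
    }
}
```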


Source: www.cnblogs.com/wuliaojava/p/11719703.html