Reading Notes of "Core Principles and Case Analysis of Large-scale Website Technology Architecture"

First, the evolution of large-scale website architecture

1. Features of large websites

  • High concurrency and heavy traffic
  • High availability
  • Massive data
  • Widely distributed users and complex network conditions
  • Hostile security environment
  • Rapidly changing requirements and frequent releases
  • Incremental development

2. The development process of large-scale website architecture

  • Separate the file server, database server, and application server
  • Add a local cache on the application server (checked first), then add distributed cache servers
  • Use an application server cluster to improve the website's concurrency, with a load balancing server scheduling and distributing requests
  • Use a distributed file system and a distributed database system
  • Use CDN acceleration and reverse proxy servers. The basic principle of both is caching. A CDN is deployed in network providers' data centers, so when users request content they obtain data from the provider's data center nearest to them. A reverse proxy is deployed in the central data center: when a user's request arrives, the reverse proxy server is accessed first, and if the requested resource is cached there it is returned to the user directly. Both return data to users as early as possible, speeding up access and reducing pressure on back-end servers.
  • Use NoSQL and search servers
  • Split by business: divide the website into different applications according to product and business line; each application is deployed and maintained independently, and the applications form a complete system by interconnecting through message middleware or by accessing the same data storage system
  • Distributed services: extract common services, deploy them independently, and provide reusable shared business services to the other application servers (this can be implemented with Dubbo and a ZooKeeper cluster)
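The caching principle shared by the CDN and the reverse proxy above can be sketched as a simple cache-aside lookup. This is a minimal illustration, not a real proxy; the class and function names (`ReverseProxyCache`, `origin_fetch`) are made up for the sketch:

```python
class ReverseProxyCache:
    """Minimal sketch of a reverse proxy's static-resource cache (cache-aside)."""

    def __init__(self, origin_fetch):
        self.cache = {}                   # path -> cached response body
        self.origin_fetch = origin_fetch  # stands in for forwarding to the application server

    def get(self, path):
        if path in self.cache:            # cache hit: return directly, no back-end work
            return self.cache[path]
        body = self.origin_fetch(path)    # cache miss: forward to the back end
        self.cache[path] = body
        return body


calls = []  # record how often the "application server" is actually hit

def origin(path):
    calls.append(path)
    return f"<html>{path}</html>"

proxy = ReverseProxyCache(origin)
proxy.get("/index.html")   # first request goes to the origin
proxy.get("/index.html")   # repeat request is served from the proxy's cache
print(len(calls))          # → 1: the back end was hit only once
```

A CDN node works the same way, except it sits in the network provider's data center nearest to the user rather than in the central data center.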

3. Values and design pitfalls of website architecture

  • Architecture choices should respond flexibly to the website's needs
  • The main driving force behind architecture is the website's business development
  • Do not blindly follow and copy the architecture designs of large companies
  • Website technology exists for the business, not technology for technology's sake
  • Do not try to solve everything with technology. Technology is there to solve business problems, and the same business problem can often also be solved on the business side by adjusting the business model.

Second, the large-scale website architecture model

1. Layering

The system is divided into parts along the horizontal dimension, each part with a single responsibility, and a complete system is formed through the upper layers depending on and calling the lower layers. The seven-layer network protocol stack, computer hardware, operating systems, and application software all adopt the idea of layering. A website system is divided into an application layer, a service layer, and a data layer, with the three layers deployed on different servers.

  • Application layer: responsible for specific business and view display
  • Service layer: Provide service support for the application layer, such as user management services, shopping cart services
  • Data layer: Provide data storage access services, such as databases, caches, files, search engines, etc.
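The three-layer structure above can be sketched as a call chain in which each layer only depends on the layer below it. All class and method names here are hypothetical, chosen just to illustrate the dependency direction:

```python
class DataLayer:
    """Data layer: data storage and access (stands in for a database/cache)."""
    def __init__(self):
        self.store = {"user:1": "alice"}

    def get(self, key):
        return self.store.get(key)


class ServiceLayer:
    """Service layer: business services built on the data layer, e.g. user management."""
    def __init__(self, data):
        self.data = data

    def get_user(self, user_id):
        return self.data.get(f"user:{user_id}")


class ApplicationLayer:
    """Application layer: concrete business logic and view rendering."""
    def __init__(self, service):
        self.service = service

    def render_profile(self, user_id):
        name = self.service.get_user(user_id)
        return f"Profile page for {name}"


# Wiring top-down: application -> service -> data, never the reverse.
app = ApplicationLayer(ServiceLayer(DataLayer()))
print(app.render_profile(1))  # → Profile page for alice
```

Because each layer talks only to the one below it, the three layers can be deployed on different servers and scaled independently, as the text describes.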

2. Splitting

The system is also divided along the vertical dimension. For example, within the application layer, different businesses such as shopping, forums, search, and advertising can be split into separate applications, each owned by an independent team and deployed on different servers.

3. Distributed

The purpose of layering and splitting is to enable distributed deployment of the resulting modules on different servers. At the same time, distribution also brings many problems:

  • Communication between different servers is highly dependent on the network
  • Difficulty maintaining data consistency
  • Website dependencies are intricate and difficult to develop and maintain

    Common distributed schemes are as follows:

  • Distributed applications and services
  • Distributed static resources
  • Distributed data and storage
  • Distributed computing
  • Distributed file systems
  • Distributed locks
  • Distributed configuration, supporting real-time updates of a live website's configuration
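Of the schemes above, a distributed lock is the easiest to sketch. In practice this is built on a shared store such as Redis (`SET key value NX`) or ZooKeeper; here an in-memory `FakeStore` class stands in for that store, so this is only a single-process simulation of the pattern, not a real distributed lock:

```python
import threading
import time
import uuid

class FakeStore:
    """In-memory stand-in for a shared store such as Redis (assumption for this sketch)."""
    def __init__(self):
        self._data, self._mu = {}, threading.Lock()

    def set_nx(self, key, value):
        """Atomic 'set if not exists' -- the primitive a distributed lock is built on."""
        with self._mu:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def delete_if(self, key, value):
        """Release the lock only if we still own it (compare the token)."""
        with self._mu:
            if self._data.get(key) == value:
                del self._data[key]


def with_lock(store, key, fn, retries=100):
    token = str(uuid.uuid4())            # unique token so we never release someone else's lock
    for _ in range(retries):
        if store.set_nx(key, token):     # acquired
            try:
                return fn()
            finally:
                store.delete_if(key, token)
        time.sleep(0.001)                # back off and retry
    raise TimeoutError("could not acquire lock")


store = FakeStore()
counter = {"n": 0}

def bump():  # unsafe read-modify-write unless serialized by the lock
    n = counter["n"]
    time.sleep(0.0005)
    counter["n"] = n + 1

threads = [threading.Thread(target=with_lock, args=(store, "lock:counter", bump))
           for _ in range(10)]
for t in threads: t.start()
for t in threads: t.join()
print(counter["n"])  # → 10: the lock serialized the concurrent updates
```

A real implementation would also need a lock expiry (TTL) so a crashed holder cannot block everyone forever; that is omitted here for brevity.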

4. Cluster

Although distributed applications are already deployed separately, modules with concentrated user access still need independent server clusters: multiple servers deploy the same application to form a cluster and provide service externally through a load balancing device. A cluster supports linear scaling and fails over when a server goes down. Even low-traffic distributed applications and services should be deployed as a small cluster of at least two servers to improve the availability of the system.
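The scheduling role of the load balancing device can be sketched with the simplest policy, round-robin. The `LoadBalancer` class and the fake servers below are illustrative only; real load balancers (hardware or software such as Nginx/LVS) also handle health checks and failover:

```python
import itertools

class LoadBalancer:
    """Round-robin scheduling across a cluster of identical application servers."""
    def __init__(self, servers):
        self._ring = itertools.cycle(servers)  # endless rotation over the cluster

    def handle(self, request):
        server = next(self._ring)              # pick the next server in rotation
        return server(request)


# Two servers deploying the same application form a minimal cluster.
def make_server(name):
    return lambda req: f"{name} handled {req}"

lb = LoadBalancer([make_server("app-1"), make_server("app-2")])
print(lb.handle("GET /"))   # → app-1 handled GET /
print(lb.handle("GET /"))   # → app-2 handled GET /
```

Because every server runs the same application, any of them can serve any request, which is what makes both linear scaling (add a server to the ring) and failover (drop a dead server from the ring) possible.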

5. Cache

Caching is the first means of improving software performance. Large websites use cache designs in many ways:

  • CDN (Content Delivery Network). Static resources of the website (data that changes infrequently) are cached in the CDN and returned to users from the nearest node at the fastest speed. For example, video sites and portals cache hot content with large audiences in the CDN.
  • Reverse proxy. When a user's request arrives at the data center, the reverse proxy server is accessed first. The website's static resources are cached there, so the request can be returned directly to the user without being forwarded to an application server.
  • Local cache. Hot data is cached locally on the application server, so the application can read it directly from local memory without accessing the data layer.
  • Distributed cache. Data is stored in a dedicated distributed cache cluster, and application servers access the cached data over the network.

      At the same time, caching also brings its own problems, such as cache breakdown, cache avalanche, and hot-key concentration.
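The local cache described above can be sketched as an in-process dictionary with expiry. The `LocalCache` class and its TTL policy are an assumption for illustration; real local caches (e.g. Guava Cache, Caffeine) add eviction by size and other policies:

```python
import time

class LocalCache:
    """Minimal sketch of a local (in-process) cache with TTL expiry."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._data = {}                        # key -> (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None                        # miss: caller must go to the data layer
        value, expires_at = entry
        if time.monotonic() > expires_at:      # stale entry: evict and report a miss
            del self._data[key]
            return None
        return value

    def put(self, key, value):
        self._data[key] = (value, time.monotonic() + self.ttl)


cache = LocalCache(ttl_seconds=0.05)
cache.put("hot-item", "value")
print(cache.get("hot-item"))   # → value  (fresh hit, served from local memory)
time.sleep(0.06)
print(cache.get("hot-item"))   # → None   (expired; the data layer must be consulted again)
```

The TTL is what bounds staleness: a shorter TTL means fresher data but more trips to the data layer, which is exactly the trade-off the cache layers in the list above are balancing.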

6. Asynchronous

In a single server, asynchrony can be achieved through multi-thread shared memory queues. In a distributed system, multiple server clusters can achieve asynchrony through distributed message queues. Asynchronous architecture is a typical producer-consumer pattern. Using asynchronous message queues can bring the following benefits:

  • Improve system availability
  • Improve system response speed
  • Smooth out traffic spikes. The message queue can absorb a sudden surge of request data into the queue to await processing by consumers, without putting excessive pressure on the website.
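The single-server case described above, asynchrony through a thread-shared memory queue, can be sketched directly with the standard library. This is the producer-consumer pattern in miniature; in a distributed system the in-memory `queue.Queue` would be replaced by a message queue product such as RabbitMQ or Kafka:

```python
import queue
import threading

jobs = queue.Queue()   # shared memory queue between producer and consumer threads
processed = []

def consumer():
    """Background worker: drains the queue and does the slow work."""
    while True:
        job = jobs.get()
        if job is None:            # sentinel value: shut down
            break
        processed.append(f"done:{job}")
        jobs.task_done()

worker = threading.Thread(target=consumer)
worker.start()

# The "producers" (e.g. web request handlers) only enqueue and return immediately,
# which is what improves response speed and absorbs spikes.
for i in range(3):
    jobs.put(f"req-{i}")

jobs.join()                        # wait until the consumer has drained the queue
jobs.put(None)                     # signal shutdown
worker.join()
print(processed)  # → ['done:req-0', 'done:req-1', 'done:req-2']
```

Availability improves as well: if the consumer is briefly down, requests simply accumulate in the queue instead of failing outright.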

7. Redundancy

To improve the availability of the system and ensure that it keeps serving even when some servers are down, a certain amount of server redundancy and data backup is required. In addition to regular backups for cold backup, the database also needs master-slave separation with real-time synchronization for hot backup. To withstand force majeure such as tsunamis and earthquakes, disaster recovery data centers also need to be deployed around the world.

8. Automation

Release process automation: automated code management, automated testing, automated security testing, and automated deployment. During system operation: automated monitoring, automated alerting, automated failover, automated failure recovery, automated degradation, and automated resource allocation.

9. Application of Architecture Pattern in Sina Weibo

Sina Weibo system is divided into three layers:

  • The bottom layer is the basic service layer, providing basic services such as databases, caching, search, and storage.
  • The middle layer is the platform service and application service layer. The core objects of Weibo are posts, users, and relationships; these services are split into independent modules that form the business foundation of Weibo through dependent calls and shared basic services.
  • The top layer is the API and business layer, covering the website, mobile apps, and third-party applications, which are integrated into the Weibo system by calling the APIs and together form an ecosystem.

For publishing posts, Weibo initially used synchronous push: after a user published a post, it was inserted into the subscription lists of all of that user's followers in the database. Once the user base grew large, this caused a huge number of database writes that exceeded the load capacity and degraded system performance. Later an asynchronous push-pull hybrid was adopted: after a user publishes a post, it is written to a message queue and the call returns immediately; message queue consumers push the post to the subscription lists of followers who are currently online, while offline users pull posts from the people they follow when they next log in to build their subscription list.
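The push-pull hybrid above can be sketched in a few lines. Everything here is a toy stand-in (the `publish_queue`, `inboxes`, and `timelines` structures are assumptions for the sketch, with `queue.Queue` standing in for a real message queue):

```python
import queue

followers = {"author": ["online_fan", "offline_fan"]}  # who follows whom
online = {"online_fan"}                                # currently online users
inboxes = {"online_fan": []}                           # pushed subscription lists
timelines = {"author": []}                             # each user's own posts
publish_queue = queue.Queue()                          # stands in for the message queue

def publish(user, post):
    """Publishing only writes the author's timeline and enqueues; it returns immediately."""
    timelines[user].append(post)
    publish_queue.put((user, post))

def deliver_one():
    """Message queue consumer: push the post only to followers who are online."""
    user, post = publish_queue.get()
    for fan in followers[user]:
        if fan in online:
            inboxes[fan].append(post)

def login_pull(fan, following):
    """Offline users pull posts from the people they follow when they log in."""
    return [p for u in following for p in timelines[u]]

publish("author", "hello weibo")
deliver_one()
print(inboxes["online_fan"])                  # → ['hello weibo']  (pushed)
print(login_pull("offline_fan", ["author"]))  # → ['hello weibo']  (pulled at login)
```

The key property is that the expensive fan-out work is moved off the publish path: the database write burst of the old synchronous push becomes queued work that consumers drain at their own pace, and offline fan-out is avoided entirely.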

 
