A beginner's look at large-scale website architecture design

The architecture of a large website is generally very different from that of a small one, and the technical points to consider are different too.

01 Preface


Recently I have become interested in the architecture of large-scale websites. I read a book on the subject and wrote down my thoughts.

We know that the software behind Taobao, Weibo, 12306, and similar sites must be designed differently from the software most of us usually build, because it has to handle massive data storage, a huge number of user visits, and high concurrent traffic (many requests arriving at the same instant). If any one link is done poorly, it drags down the overall performance. This is the shortest-plank (bucket) effect: the weakest part sets the limit.

Events we have all seen, such as scheduled ticket sales, Double Eleven flash sales, and hot Weibo searches, can bring servers down and paralyze the network. Network congestion is part of the reason, but the more important question is whether the website's architecture can sustain high concurrency and high availability (7 * 24 hours). Let's take a look at how the architecture of a large website is built up step by step.

02 Features


On the surface, a large website has two obvious characteristics: a large number of visits and high concurrency. These are what users notice, and they are also the two problems users most urgently want solved. From a developer's point of view, however, there are many more factors to consider. In general, they are the following:

  • High concurrency

How many simultaneous visits does the website have to withstand? For example, concurrency during the Double Eleven shopping rush is said to reach the level of 100 million. Ordinary websites simply cannot bear that kind of pressure. Handling it is not just a matter of the number of servers; it also depends on factors such as the server-side design choices.

  • High availability

At first I didn't understand what high availability meant. Simply put, it means keeping the service running normally 7 * 24 hours. Because you cannot guarantee that nobody will browse your website in the middle of the night, the service has to be available at all times. By contrast, some small websites or systems simply restrict access around midnight (0:00) while they are being updated.

  • Mass data storage

A large website usually has a massive number of users, so you need to consider how to store user data and browsing records. Take WeChat, which we use every day: the Moments posts and chat messages sent each day add up to a huge volume of data, stored in Tencent's dedicated server clusters (many servers).

  • High security

Every day we rely on bank transactions, WeChat transfers, or Alipay transfers. If you think about it, a change in your balance is just a change in a number, which is a little worrying. Your Alipay balance, for example, is just a number sitting there; as I understand it, the underlying money may be put to use elsewhere in the meantime and is routed back when you withdraw or perform other operations. Every step of this process must be secure.

  • Frequent updates

The back end collects some user information in order to improve product features and the product experience, or users ask for a particular feature. Either way, new requirements appear and the product has to be updated. Each round of development delivers some functionality, and then development continues iteratively.

  • Progressive development

No matter how big a website is, it started small, just as every high-rise is built from bricks and tiles. Progressive development differs from traditional software design: there is no complete specification up front and no way to foresee the full feature set, so the system improves itself through continuous development. By operating the product continuously, it adapts to user needs and to the trends of the times.

03 Design Evolution


I don't know if you have heard the word "LAMP". It is an early website design that is only suitable for small websites and would not hold up for a large site today. Because the amount of data is small at the beginning, one server is enough to run the entire website: Linux as the operating system, Apache as the web server, MySQL as the database, and PHP as the development language.

As the business grows, the website is continuously improved and evolves into a technical solution with rules to follow. Every stage of this evolution is driven by the business. If your website does not have this kind of demand, why would programmers build these heavyweight designs? As the book says, the business makes the technology, just as the career makes the person.

Initial development stage

Business demands are low, so a system can be built with a simple configuration and free, open-source software.

Separation of application service and data service

As the business grows, the website's performance will inevitably decline. At that point the application service and the data service can be moved onto separate servers.

Use cache

According to the 80/20 rule, 80% of user visits go to 20% of the website's functions. So we only need to concentrate on what users request most: caching that small, frequently requested share lets us return the resources users need very quickly.
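To make the caching idea concrete, here is a minimal Python sketch of the cache-aside pattern. The names `SlowStore` and `CacheAside`, the 60-second TTL, and the key `item:1` are all made up for illustration. The book does not prescribe this code, and a real site would typically use a separate cache server rather than an in-process dict.

```python
import time


class SlowStore:
    """Hypothetical stand-in for a database; counts how often it is queried."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)


class CacheAside:
    """Check the cache first; on a miss, read the store and remember the result."""

    def __init__(self, store, ttl_seconds=60.0, clock=time.monotonic):
        self.store = store
        self.ttl = ttl_seconds      # assumed expiry time, not from the article
        self.clock = clock
        self._entries = {}          # key -> (value, expires_at)

    def get(self, key):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                      # cache hit
        value = self.store.get(key)              # cache miss: go to the database
        self._entries[key] = (value, now + self.ttl)
        return value

    def invalidate(self, key):
        """Drop a key, for example after the underlying row has been updated."""
        self._entries.pop(key, None)


store = SlowStore({"item:1": "hot product page"})
cache = CacheAside(store, ttl_seconds=60.0)
for _ in range(5):
    cache.get("item:1")
print(store.reads)  # 1: only the first request reached the database
```

The hot 20% stays in memory after its first request, so the database only sees the misses.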

Application server cluster

As your business volume and feature set keep growing, a single server may no longer be able to withstand the load. In that case we put multiple servers to work on the same business at the same time. It is like handing users' requests to several people instead of one: performance will definitely improve.
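One simple way to spread requests over such a cluster is round-robin dispatch. The sketch below is only an illustration: the class `RoundRobinCluster` and the server names `app-1` to `app-3` are hypothetical, and the article does not say which scheduling policy the book's cluster uses.

```python
import itertools


class RoundRobinCluster:
    """Hand each incoming request to the next server in a fixed rotation."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._cycle = itertools.cycle(list(servers))

    def dispatch(self, request):
        server = next(self._cycle)
        return server, request


cluster = RoundRobinCluster(["app-1", "app-2", "app-3"])
assigned = [cluster.dispatch(f"req-{i}")[0] for i in range(6)]
print(assigned)
# ['app-1', 'app-2', 'app-3', 'app-1', 'app-2', 'app-3']
```

Because any server may receive any request, this only works if the application servers hold no per-user state of their own.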

Database read and write separation

This matters not only at the application level; data operations are important too. Data is either read or written, and generally speaking users read far more than they write. So we separate database reads from writes: one database accepts the writes, another serves the reads, and the two are kept in sync (master-slave replication).
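Here is a rough sketch of what read/write separation looks like from the application's side. `Database`, `ReadWriteRouter`, and the explicit `sync()` step are assumptions made for the example; a real database replicates on its own, and the in-memory dicts only stand in for tables.

```python
import random


class Database:
    """Hypothetical in-memory stand-in for one database server."""

    def __init__(self, name):
        self.name = name
        self.rows = {}


class ReadWriteRouter:
    """Send writes to the master and reads to a replica."""

    def __init__(self, master, replicas, rng=None):
        self.master = master
        self.replicas = list(replicas)
        self.rng = rng or random.Random()

    def write(self, key, value):
        self.master.rows[key] = value

    def sync(self):
        # Stand-in for master-slave replication: copy the master's rows over.
        for replica in self.replicas:
            replica.rows = dict(self.master.rows)

    def read(self, key):
        replica = self.rng.choice(self.replicas)
        return replica.rows.get(key)


router = ReadWriteRouter(Database("master"), [Database("slave-1"), Database("slave-2")])
router.write("user:1", "Alice")
print(router.read("user:1"))  # None: the replicas have not caught up yet
router.sync()
print(router.read("user:1"))  # Alice
```

The first read returning nothing is deliberate: until replication catches up, a replica can serve stale data, which is the main thing to watch for with this design.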

Load balancing and CDN

Websites with a relatively large business volume, with users spread across the country or even the world, need a CDN. When users in the south access servers in the north, there is a delay in between; when users in the United States access servers in China, the delay is even greater. A CDN is a content delivery network: the server closest to the user returns the data directly, which is much faster.

Then there is load balancing. Content cached on the CDN expires, and when a request has to go to the data center it first reaches the load-balancing server (often a reverse proxy), which also has a cache. If that cache misses, the request is distributed to an application server under less pressure. In general, both the CDN and the load-balancing server rely on caching.
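The two behaviours described here, answering from the balancer's own cache and otherwise picking the server under the least pressure, can be sketched roughly as follows. Everything in the example is hypothetical: `LoadBalancer`, the `render` callback, and measuring "pressure" as a count of in-flight requests are my assumptions, not something the book specifies.

```python
class LoadBalancer:
    """Serve from a local cache when possible, else use the least-loaded server."""

    def __init__(self, servers):
        self.active = {name: 0 for name in servers}  # in-flight requests per server
        self.cache = {}                              # path -> response body

    def handle(self, path, render):
        if path in self.cache:
            return "cache", self.cache[path]         # hit: no app server involved
        # Miss: choose the server with the fewest in-flight requests.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        try:
            body = render(server, path)
        finally:
            self.active[server] -= 1
        self.cache[path] = body
        return server, body


def render(server, path):
    """Hypothetical application handler."""
    return f"{path} rendered by {server}"


lb = LoadBalancer(["app-1", "app-2"])
lb.active["app-1"] = 5                # pretend app-1 is already busy
print(lb.handle("/home", render))     # ('app-2', '/home rendered by app-2')
print(lb.handle("/home", render))     # ('cache', '/home rendered by app-2')
```

This sketch never expires cached entries; a real balancer would need an expiry policy, much like the CDN described above.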

Distributed file system and distributed database system

Distribution here means splitting up the original data store: the data of different businesses is stored on different servers, which reduces the pressure on each one. For example, you can store users' order data on database server A and users' account information on database server B.
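The order/user example can be sketched as a small routing table that decides which database server a record belongs to. `BusinessRouter`, the server names `db-A` and `db-B`, and the in-memory dicts standing in for real databases are all illustrative assumptions.

```python
class BusinessRouter:
    """Route each business domain's data to its own database server."""

    def __init__(self, mapping):
        self.mapping = dict(mapping)                       # business -> server name
        self.storage = {server: {} for server in set(self.mapping.values())}

    def server_for(self, business):
        try:
            return self.mapping[business]
        except KeyError:
            raise ValueError(f"no database configured for {business!r}") from None

    def save(self, business, key, value):
        self.storage[self.server_for(business)][(business, key)] = value

    def load(self, business, key):
        return self.storage[self.server_for(business)].get((business, key))


router = BusinessRouter({"orders": "db-A", "users": "db-B"})
router.save("orders", 1001, {"item": "book", "qty": 2})
router.save("users", 7, {"name": "alice"})

print(router.server_for("orders"))  # db-A
print(router.load("users", 7))      # {'name': 'alice'}
```

The trade-off is that a query combining orders with user information can no longer be a single database join; the application has to fetch from both servers and combine the results itself.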

NoSQL and search engines

A search engine is introduced to handle the website's search function.
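The article does not say how such a search engine works inside. A common building block is an inverted index, which maps each word to the documents containing it, so here is a toy version as a sketch. The class name `InvertedIndex`, the tokenizer, and the sample products are my own assumptions.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Map each token to the set of document ids that contain it."""

    def __init__(self):
        self._postings = defaultdict(set)

    @staticmethod
    def _tokens(text):
        # Very naive tokenizer: lowercase runs of ASCII letters and digits.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id, text):
        for token in self._tokens(text):
            self._postings[token].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every token in the query."""
        terms = self._tokens(query)
        if not terms:
            return set()
        result = None
        for term in terms:
            docs = self._postings.get(term, set())
            result = set(docs) if result is None else result & docs
        return result


index = InvertedIndex()
index.add(1, "red running shoes")
index.add(2, "blue running jacket")

print(index.search("running shoes"))  # {1}
print(index.search("boots"))          # set()
```

Looking words up in an index like this avoids scanning every row of the database for each search, which is why search is usually moved out of the relational database.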

Business split

Split the website into multiple applications, each deployed and maintained independently. For example, you can separate out one feature and expose it through an interface that the website embeds, while its logic runs on another server.

Distributed service

Extract the business logic that is shared across applications and deploy it independently as common services. Each application then calls these common services through the distributed service layer to complete its own business operations.
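As a rough picture of this structure, the sketch below has two applications sharing one user service through a registry. All names (`UserService`, `ServiceRegistry`, `OrderApp`, `ForumApp`) are hypothetical, and the calls are plain in-process method calls; in a real deployment they would go over the network through some RPC or HTTP framework, which I leave out here.

```python
class UserService:
    """Shared business logic, extracted once and deployed on its own."""

    def __init__(self):
        self._users = {1: "alice", 2: "bob"}

    def get_name(self, user_id):
        return self._users[user_id]


class ServiceRegistry:
    """Lets applications look up a common service by name."""

    def __init__(self):
        self._services = {}

    def register(self, name, service):
        self._services[name] = service

    def lookup(self, name):
        return self._services[name]


class OrderApp:
    def __init__(self, registry):
        self.registry = registry

    def describe_order(self, order_id, user_id):
        name = self.registry.lookup("user-service").get_name(user_id)
        return f"order {order_id} placed by {name}"


class ForumApp:
    def __init__(self, registry):
        self.registry = registry

    def byline(self, user_id):
        name = self.registry.lookup("user-service").get_name(user_id)
        return f"posted by {name}"


registry = ServiceRegistry()
registry.register("user-service", UserService())

print(OrderApp(registry).describe_order(1001, 1))  # order 1001 placed by alice
print(ForumApp(registry).byline(2))                # posted by bob
```

Neither application contains user logic of its own, so a change to the user service is made in one place and picked up by both.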

04 Summary

Look at today's Internet companies: only a few are at the BAT level (Baidu, Alibaba, Tencent). Most are small companies that are gradually growing their own businesses, and they cannot possibly have the time and energy to do research and development in every field. But the average company has its own area of expertise. As long as you develop your own business and serve your users well, that is enough; chasing a lot of bells and whistles is sometimes not beneficial.

"The most important thing for a small website is to create value by providing users with good service, win their approval, survive, and grow wildly." (Li Zhihui, "Technical Architecture of Large Websites")

Some companies want to overhaul their architecture simply because they have recently seen a lot of new technologies. This is technology for technology's sake. The starting point may be good, but it can bring bad results if it does not follow your own business direction. Also, do not blindly imitate the technical solutions of large companies; develop your own business and the technology that fits it.

In general, every company now has its own set of technical solutions. Whether to refactor them or change how your servers are distributed depends on whether you actually have the need. That said, Internet technology keeps maturing, and you can pay for resources from providers such as Alibaba Cloud and Tencent Cloud. The technology is stable and the quality is good; after all, these are big companies. How much high-quality resource you get depends on how much you pay. It is simple and worry-free (the hard problems have already been solved for you).

Finally, I recommend the book "Technical Architecture of Large Websites: Core Principles and Case Analysis". The author is technically strong and his analysis is distinctive. It is worth reading closely.

Reference article

  • Li Zhihui "Technical Architecture of Large Websites: Core Principles and Case Analysis"

Origin blog.csdn.net/weixin_42724176/article/details/105012750