Good architecture is not designed, but evolved

For many startups, it is hard to predict in the early stage what the website architecture will need to look like once traffic grows tenfold, a hundredfold, or a thousandfold. At the same time, if a system were designed from day one for traffic in the tens of millions of concurrent requests, few companies could afford the cost.

Therefore, the focus here is on how the architecture evolves. At each stage, identify the problems the website architecture faces at that stage and solve them; the whole architecture keeps evolving in the process.

When 58.com was founded, its traffic was very small, perhaps on the order of 100,000, which means only a few visits per second on average. The website at this time had a low request volume, a small amount of data, and a small amount of code. A few engineers could easily handle everything, so there was no "architecture" to speak of.
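To make "a few visits per second" concrete, here is a small back-of-the-envelope calculation. It assumes the 100,000 figure is page views per day, and the peak-to-average factor of 3 is a hypothetical rule of thumb, not a number from 58.com. The other two traffic levels are the 10 million and 1 billion marks discussed later in this article.

```python
# Rough traffic arithmetic: turn a daily request count into requests per second.
# Assumptions: traffic figures are per day; PEAK_FACTOR is a hypothetical
# peak-to-average ratio, not a measured 58.com value.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
PEAK_FACTOR = 3.0


def average_qps(daily_requests: int) -> float:
    """Average requests per second if traffic were spread evenly over a day."""
    return daily_requests / SECONDS_PER_DAY


def peak_qps(daily_requests: int, peak_factor: float = PEAK_FACTOR) -> float:
    """Estimated peak requests per second under the assumed peak factor."""
    return average_qps(daily_requests) * peak_factor


for daily in (100_000, 10_000_000, 1_000_000_000):
    print(f"{daily:>13,} per day -> avg {average_qps(daily):9.1f} QPS, "
          f"peak ~{peak_qps(daily):9.1f} QPS")
```

At 100,000 a day the average is only about 1.2 requests per second, which is why a single machine was enough at first.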

In fact, this is a situation many startups face in their early days. At the beginning, the architecture of 58.com could be summed up as "ALL IN ONE", as shown in the following figure:

Like a single-machine system, everything was deployed on one machine: the site, the database, files, and so on. The engineers' core daily work was CRUD. Data is sent from the front end, the business logic layer turns it into CRUD operations against the database, the database returns data, the data is assembled into pages, and the pages are finally returned to the browser. I believe many startup teams face a similar situation in the early stage: writing code every day, writing SQL, defining interface parameters, accessing data, and so on.
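The "ALL IN ONE" request flow described above can be sketched as a single process that parses parameters, runs hand-written SQL, and renders HTML itself. This is an illustration only: SQLite stands in for the database (58.com actually used SQL Server and C#), and the `posts` table and handler names are hypothetical.

```python
import html
import sqlite3

# "ALL IN ONE" sketch: request handling, business logic, SQL and page
# rendering all run in one process against one local database.
# The "posts" table and the handler names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")


def handle_publish(form: dict) -> str:
    """Turn front-end parameters directly into a hand-written INSERT."""
    cur = conn.execute("INSERT INTO posts (title) VALUES (?)", (form["title"],))
    conn.commit()
    return f"<p>Published post {cur.lastrowid}</p>"


def handle_list() -> str:
    """Read rows with a hand-written SELECT and assemble the HTML page."""
    rows = conn.execute("SELECT id, title FROM posts ORDER BY id").fetchall()
    items = "".join(f"<li>{pid}: {html.escape(title)}</li>" for pid, title in rows)
    return f"<ul>{items}</ul>"


print(handle_publish({"title": "Used bike for sale"}))
print(handle_list())
```

Every handler knows about SQL and HTML at once, which is fine for a handful of pages but is exactly the coupling the later stages have to undo.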

One point needs explaining here. As is well known, 58.com originally used Windows, IIS, SQL Server, and C#. Many startups today probably would not make that choice.

If we could start over, we would choose LAMP

Many founders may wonder what kind of architecture suits the initial stage. If 58.com could start all over again, then from today's point of view it would choose LAMP. Why? First, there is no compilation step, so features can be released quickly. Second, the stack covers everything from front end to back end, database access, and business logic processing. Most importantly, it is built from mature open source products and is completely free. With LAMP, two days is enough to build a forum. So, if your business is in its early days, try not to use Windows.

What was the main problem for 58.com at this stage? In fact, it was hiring people. Early on, engineers easily made mistakes when writing CRUD code by hand. DAO and ORM were introduced at that time, so that engineers no longer worked directly with CRUD statements but with objects, which they were good at. This greatly improved productivity and reduced the error rate.
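The idea behind DAO/ORM can be shown with a hand-rolled data access object: business code works with `Post` objects, and SQL lives in exactly one class. This is a hypothetical sketch using SQLite and a dataclass, not 58.com's code and not the API of any particular ORM library.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    """Plain object the business code works with instead of raw rows."""
    title: str
    id: Optional[int] = None


class PostDao:
    """Hypothetical DAO: the only place that contains SQL for the posts table."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS posts "
                     "(id INTEGER PRIMARY KEY, title TEXT NOT NULL)")

    def save(self, post: Post) -> Post:
        cur = self._conn.execute("INSERT INTO posts (title) VALUES (?)", (post.title,))
        self._conn.commit()
        post.id = cur.lastrowid  # map the generated key back onto the object
        return post

    def find_by_id(self, post_id: int) -> Optional[Post]:
        row = self._conn.execute(
            "SELECT id, title FROM posts WHERE id = ?", (post_id,)).fetchone()
        return Post(id=row[0], title=row[1]) if row else None


dao = PostDao(sqlite3.connect(":memory:"))
saved = dao.save(Post(title="Two-bedroom flat"))
print(dao.find_by_id(saved.id))  # Post(title='Two-bedroom flat', id=1)
```

A full ORM generates the SQL inside `save` and `find_by_id` from the class definition; the benefit described above is the same either way: engineers manipulate objects and the error-prone SQL is written once.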

Medium scale: traffic passes 100,000, and the database becomes the bottleneck

With the rapid growth of 58.com, the system quickly passed the 100,000 traffic mark. What were the main requirements then? The website had to be reachable, and of course the faster the better. The problem the system faced was that it easily crashed during traffic peaks: a large number of requests landed on the database, so the database became the new bottleneck, and the more people visited, the slower the site became. By this time the number of machines had grown from one to several, so moving to a distributed architecture came naturally, as shown in the following figure:

First, some very common techniques were used. One was separating dynamic from static content: dynamic pages are served through the web server, while static images are placed on separate servers. Another was read-write separation. For 58.com, as for most sites, reads far outnumber writes: the vast majority of users come to browse information, and only a few come to post. So how do you scale read requests across the whole site architecture? The usual answer is master-slave replication with read-write separation. Where there used to be a single database, several databases now serve requests. In this way read and write capacity was expanded, and the data access problem at medium scale was quickly solved.
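Read-write separation boils down to a routing decision in front of the databases. The sketch below is a minimal, assumed version: it inspects the leading SQL verb, sends writes to the master, and rotates reads round-robin over replicas. The database names are hypothetical placeholders, and real middleware usually also pins whole transactions and `SELECT ... FOR UPDATE` to the master, which this sketch ignores.

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Send writes to the master and spread reads over the replicas.

    Replicas are assumed to be kept in sync by master-slave replication;
    the database names used below are hypothetical placeholders.
    """

    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE"}

    def __init__(self, master: str, replicas: List[str]) -> None:
        if not replicas:
            raise ValueError("read-write separation needs at least one replica")
        self.master = master
        self._replicas = itertools.cycle(replicas)

    def route(self, sql: str) -> str:
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in self.WRITE_VERBS:
            return self.master          # every write lands on the master
        return next(self._replicas)     # reads rotate round-robin over replicas


router = ReadWriteRouter("master-db", ["replica-1", "replica-2"])
for statement in ("INSERT INTO posts (title) VALUES ('bike')",
                  "SELECT * FROM posts", "SELECT * FROM posts"):
    print(router.route(statement), "<-", statement)
```

Because most traffic is reads, adding replicas to the list scales the common case, while the single master keeps writes consistent.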

At this stage, the main problems of the system were "site coupling + read-write latency". How did 58.com decouple its sites, and how did it reduce the latency?

For 58.com, the typical business scenarios are the home page; a publishing page for posting information; list pages that aggregate information and titles; and a detail page opened by clicking a title. All of these were coupled in one program, or one site, so when one part had a problem, the coupling meant the whole site had problems along with it.

The second problem: database read requests and write requests are now served by different databases. Because there is a delay in replicating writes to the read databases, a read issued right after a write may return old data. If a user publishes a post and looks for it immediately, there is a good chance they will not find it. The likely consequence is that the same information gets published twice in a row, which is a big problem. The larger the request volume, the more prominent this problem becomes.
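The stale-read problem can be reproduced with a toy model of asynchronous replication. Everything here is a simplifying assumption: a logical clock counts "ticks", and the replica applies each write only `lag` ticks after it reached the master. The key and value are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class LaggingReplica:
    """Toy model of asynchronous master-slave replication.

    Writes land on the master immediately; the replica applies each write
    only after `lag` ticks of a logical clock (an assumed simplification).
    """

    def __init__(self, lag: int) -> None:
        self.lag = lag
        self.now = 0
        self.master: Dict[str, str] = {}
        self.replica: Dict[str, str] = {}
        self._pending: List[Tuple[int, str, str]] = []  # (apply_at, key, value)

    def write(self, key: str, value: str) -> None:
        self.master[key] = value
        self._pending.append((self.now + self.lag, key, value))

    def tick(self, n: int = 1) -> None:
        """Advance the clock and apply, in order, every write that is due."""
        self.now += n
        still_pending = []
        for apply_at, key, value in self._pending:
            if apply_at <= self.now:
                self.replica[key] = value
            else:
                still_pending.append((apply_at, key, value))
        self._pending = still_pending

    def read_from_replica(self, key: str) -> Optional[str]:
        return self.replica.get(key)


db = LaggingReplica(lag=2)
db.write("post:1", "Used bike for sale")
print(db.read_from_replica("post:1"))  # None: the user cannot see the new post yet
db.tick(2)
print(db.read_from_replica("post:1"))  # Used bike for sale
```

During the lag window a user who checks the list page sees nothing and may post again, which is the duplicate-publishing problem described above.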

To solve these problems, the first idea was to split up the core business of the original site, with engineers splitting it according to their own sites and business scenarios. Business splitting was the first optimization 58.com tried: vertically splitting the business into sites such as the home page and the publishing page. The database layer was split as well, breaking large data volumes into smaller ones. This immediately reduced the read-write latency. In particular, once the code was split into separate parts, site coupling eased too, and data loading became much faster.
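Vertical splitting can be pictured as a routing table: each business area gets its own site and its own database, so a problem in one no longer takes down the others. The path prefixes, site names, and database names below are hypothetical and only mirror the page types listed earlier (home, publish, list, detail).

```python
from typing import Dict

# Sketch of vertical splitting: each business area owns its own site and
# database. All names here are hypothetical illustrations.
VERTICAL_SPLIT: Dict[str, Dict[str, str]] = {
    "home":    {"site": "home-web",    "db": "home_db"},
    "publish": {"site": "publish-web", "db": "publish_db"},
    "list":    {"site": "list-web",    "db": "listing_db"},
    "detail":  {"site": "detail-web",  "db": "listing_db"},
}


def route_request(path: str) -> Dict[str, str]:
    """Map a URL path such as '/publish/new' to the site and database that own it."""
    section = path.strip("/").split("/", 1)[0] or "home"
    try:
        return VERTICAL_SPLIT[section]
    except KeyError:
        raise ValueError(f"no business line owns path {path!r}") from None


print(route_request("/publish/new"))  # {'site': 'publish-web', 'db': 'publish_db'}
```

Here the list and detail sites are assumed to share one listing database; whether two pages share a store or get separate ones is a per-business judgment call.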

Some other techniques were also used at that time. Separating dynamic and static resources was mentioned above; for static resources we used a CDN service, which provides caching and serves users from nearby nodes, so access speed improved significantly. We also adopted the MVC pattern: engineers good at the front end work on the view layer, engineers good at business logic work on the controllers, and engineers good at data work on the model. Efficiency gradually improved. Finally, load balancing was introduced.
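The MVC division of labor can be sketched in a few lines: the model owns data access, the view only renders, and the controller connects them. This is a generic illustration with hypothetical names; an in-memory dict stands in for the database.

```python
from dataclasses import dataclass
from html import escape
from typing import Dict, Optional


@dataclass
class Listing:
    """Model object (owned by the engineers who work on data)."""
    id: int
    title: str


class ListingRepository:
    """Model-layer data access; a dict stands in for the real database."""

    def __init__(self) -> None:
        self._rows: Dict[int, Listing] = {1: Listing(1, "Two-bedroom flat")}

    def get(self, listing_id: int) -> Optional[Listing]:
        return self._rows.get(listing_id)


def render_detail(listing: Listing) -> str:
    """View: presentation only (owned by the front-end engineers)."""
    return f"<h1>{escape(listing.title)}</h1>"


def render_not_found() -> str:
    return "<h1>Not found</h1>"


def detail_controller(repo: ListingRepository, listing_id: int) -> str:
    """Controller: coordinates model and view; contains no SQL and no markup."""
    listing = repo.get(listing_id)
    return render_detail(listing) if listing else render_not_found()


print(detail_controller(ListingRepository(), 1))  # <h1>Two-bedroom flat</h1>
```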

Large traffic: moving the entire Windows technology stack to Java

Traffic kept growing. Once it exceeded 10 million, the biggest problems facing 58.com were performance and cost. As mentioned earlier, 58.com's initial technology choice was Windows, and the performance of the whole website became very poor. Even business splitting and other optimizations could not solve this, so a very difficult decision was made at that time: a transformation that moved the entire Windows technology stack to Java, covering the operating system, the database, and other layers.

In fact, many large Internet companies, including Jingdong and Taobao, went through a similar transformation as their traffic grew from small to large. The technical requirements keep rising: no site is allowed to go down, and the availability requirements for the site keep getting higher.

At this time, the business of 58.com was also growing explosively. We hired many engineers and together wrote more and more sites, but found that efficiency was very low and that we kept doing repetitive work, such as parameter parsing. At the same time, businesses depended on each other: whether in the classification subsystem or the information subsystem, the used-car and real-estate businesses all had to access shared underlying data such as users and information, and coordinating this through frequent code-level dependencies was inefficient.

Problems followed: the number of sites grew, the amount of data grew, and the number of machines rose from a few to hundreds. How could the availability of the entire architecture be ensured? First, improvements and optimizations were made in the upper layers, vertical splitting was taken further, and a cache was introduced, as shown in the following figure:

To improve the architecture, a relatively independent service layer was built, and code for each business line is written in this service layer. User requests are handled by the service layer, and all upstream business lines call its services through an RPC framework, just like calling a local function. When user data is fetched, the cache is checked first: on a cache hit, the data is returned directly; on a miss, the database is queried, the result is written back into the cache, and then returned to the caller. In this way, all business logic is encapsulated in the service layer rather than scattered across upstream code. Only the service layer writes business logic code, and it can be managed and optimized centrally, which improves efficiency.
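The read path just described (check the cache, fall back to the database on a miss, write the result back) is the common cache-aside pattern. Below is a minimal sketch assuming a plain dict as the cache and a caller-supplied loader as the database; a real deployment would use a cache server such as memcached or Redis, and the `invalidate` step on writes is an assumption beyond what the text describes.

```python
from typing import Callable, Dict, Optional


class CacheAsideService:
    """Cache-aside read path: cache first, database on a miss, then fill the cache.

    A dict stands in for the real cache; `load_from_db` stands in for a
    database query. Both are assumptions for illustration.
    """

    def __init__(self, load_from_db: Callable[[str], Optional[str]]) -> None:
        self._cache: Dict[str, str] = {}
        self._load_from_db = load_from_db
        self.db_reads = 0  # counts how often we had to go to the database

    def get(self, key: str) -> Optional[str]:
        if key in self._cache:          # cache hit: return directly
            return self._cache[key]
        self.db_reads += 1              # cache miss: query the database
        value = self._load_from_db(key)
        if value is not None:
            self._cache[key] = value    # write back so the next read hits
        return value

    def invalidate(self, key: str) -> None:
        """Drop a cached entry after the underlying data changes."""
        self._cache.pop(key, None)


users_table = {"user:42": "alice"}   # stands in for the database
service = CacheAsideService(users_table.get)
service.get("user:42")   # miss: reads the database
service.get("user:42")   # hit: served from the cache
print(service.db_reads)  # 1
```

Because every business line goes through the same service, this caching logic is written once instead of in each upstream site.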

In addition, to ensure high availability of the site, reverse proxy technology was mainly used. For the user, what matters is using the services of 58.com; they do not care whether the homepage is served by one server or by ten. 58.com uses reverse proxies, DNS round-robin, and LVS to ensure the high availability of the access layer, as well as the high availability of the service layer, site layer, and data layer. Redundancy is also used to ensure high availability, and both site services and data services can be made available this way: if one site instance is unavailable, switch to another; if one database is not enough, add a few more. Of course, data redundancy also has side effects: whenever data is updated, every redundant copy must be updated as well.
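Redundancy at the access layer means several identical backends, with traffic rotated among them and failed ones skipped. The sketch below mimics that behavior in a few lines; it is not how LVS or any particular reverse proxy is implemented, and the backend names are hypothetical.

```python
from typing import List, Set


class RedundantPool:
    """Round-robin over identical backends, skipping any marked as down.

    Mimics the effect of a reverse proxy or load balancer in front of
    redundant servers; backend names are hypothetical.
    """

    def __init__(self, backends: List[str]) -> None:
        if not backends:
            raise ValueError("need at least one backend")
        self._backends = list(backends)
        self._healthy: Set[str] = set(backends)
        self._next = 0

    def mark_down(self, backend: str) -> None:
        self._healthy.discard(backend)

    def mark_up(self, backend: str) -> None:
        if backend in self._backends:
            self._healthy.add(backend)

    def pick(self) -> str:
        """Return the next healthy backend, trying each one at most once."""
        for _ in range(len(self._backends)):
            candidate = self._backends[self._next]
            self._next = (self._next + 1) % len(self._backends)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy backend available")


pool = RedundantPool(["web-1", "web-2", "web-3"])
pool.mark_down("web-2")                  # one server fails
print([pool.pick() for _ in range(4)])   # ['web-1', 'web-3', 'web-1', 'web-3']
```

The user never notices that web-2 is gone, which is the point made above: they use the service, not a specific server.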

58.com also built an image storage system; images were initially stored on the operating system's file system. As new sites and new services were added, the pressure grew steadily. As a result, 58.com built its own site framework and service framework, and both have since been open sourced: the site framework Argo (https://github.com/58code/Argo), for reducing site development costs, and the service framework Gaea (https://github.com/58code/Gaea), for reducing service development costs. Only some basic configuration needs to be changed to use them.

When the architecture becomes a "spider web", manual work can no longer keep up!

As the number of users and the volume of concurrent data grew further, 58.com also expanded into many new businesses, so the demands on product iteration speed became very high, and the overall architecture needed more and more automation.

To support business growth, the technical team decoupled the architecture further and introduced a configuration center. To access any service, a caller no longer hard-codes the service's address in its local configuration; instead, the configuration center tells the caller where the service is. If a service is scaled out, the configuration center automatically pushes a notification, and if a machine is going offline, the configuration center notifies the callers in reverse.
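A configuration center of this kind can be sketched as a registry plus publish/subscribe: services register their addresses, callers look them up instead of hard-coding them, and subscribers receive the new address set on every change. This toy version is an assumption for illustration; the class, method, and callback shapes are hypothetical and unrelated to 58.com's actual system.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set

Listener = Callable[[str, Set[str]], None]


class ConfigCenter:
    """Toy configuration center with push notifications.

    Services register addresses; callers look them up or subscribe and are
    pushed the current address set whenever a service scales out or a
    machine goes offline. All names are hypothetical.
    """

    def __init__(self) -> None:
        self._instances: Dict[str, Set[str]] = defaultdict(set)
        self._listeners: Dict[str, List[Listener]] = defaultdict(list)

    def subscribe(self, service: str, listener: Listener) -> None:
        self._listeners[service].append(listener)

    def lookup(self, service: str) -> Set[str]:
        return set(self._instances[service])  # a copy, so callers cannot mutate it

    def register(self, service: str, address: str) -> None:
        self._instances[service].add(address)   # scale-out
        self._notify(service)

    def deregister(self, service: str, address: str) -> None:
        self._instances[service].discard(address)  # machine going offline
        self._notify(service)

    def _notify(self, service: str) -> None:
        snapshot = self.lookup(service)
        for listener in self._listeners[service]:
            listener(service, snapshot)


center = ConfigCenter()
center.subscribe("user-service", lambda name, addrs: print(name, sorted(addrs)))
center.register("user-service", "10.0.0.1:9000")
center.register("user-service", "10.0.0.2:9000")
center.deregister("user-service", "10.0.0.1:9000")
```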

Flexible (elastic) services means automatically adding service instances when traffic increases. After further decoupling, there are vertical services, wireless services, integrated services, and so on, and these subsystems are linked to each other through the configuration center.
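An elastic service needs a rule for how many instances to run. The function below is one hypothetical rule, not 58.com's: size the pool so each instance runs at no more than a target fraction (`headroom`) of its capacity, bounded by a minimum for redundancy and a maximum for cost. The capacity and bound values are assumptions.

```python
import math


def desired_instances(current_qps: float, qps_per_instance: float,
                      min_instances: int = 2, max_instances: int = 100,
                      headroom: float = 0.7) -> int:
    """Hypothetical elastic-scaling rule.

    Run enough instances that each stays at or below `headroom` of its
    assumed capacity, clamped to [min_instances, max_instances].
    """
    if qps_per_instance <= 0 or not 0 < headroom <= 1:
        raise ValueError("capacity must be positive and headroom in (0, 1]")
    needed = math.ceil(current_qps / (qps_per_instance * headroom))
    return max(min_instances, min(max_instances, needed))


for qps in (100, 7_100, 1_000_000):
    print(qps, "QPS ->", desired_instances(qps, qps_per_instance=500), "instances")
```

The minimum keeps at least two copies alive for redundancy even at night, and the maximum caps cost if a traffic spike or a bug inflates the QPS reading.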

Another point concerns the database. When one problem becomes a pain point for the business lines, it gets solved in one place. In the initial stage, every business line had to access the database, cache, and user data, so that code was concentrated in the service layer. Now that data volumes keep growing, every business line would otherwise have to do its own data sharding. Since every part of 58.com faced this same pain point, we moved it into a centralized layer and solved it there.
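Centralizing sharding means business lines call one shared component instead of each implementing their own partitioning. The sketch below assumes the simplest scheme, a stable hash of the key modulo the number of shards; the shard names are hypothetical, and real systems also need a plan for resharding (for example consistent hashing), which is out of scope here.

```python
import hashlib
from typing import List


class ShardRouter:
    """Central sharding helper: a stable hash of the key picks one of N shards.

    SHA-256 is used instead of Python's built-in hash(), which is randomized
    per process and would send the same key to different shards after a
    restart. Shard names are hypothetical.
    """

    def __init__(self, shards: List[str]) -> None:
        if not shards:
            raise ValueError("need at least one shard")
        self._shards = list(shards)

    def shard_for(self, key: str) -> str:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._shards)
        return self._shards[index]


router = ShardRouter(["info_0", "info_1", "info_2", "info_3"])
print(router.shard_for("user:42"))  # always the same shard for this key
```

Every business line then asks `shard_for` where a record lives, so the sharding rule can be changed or optimized in one place.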

The last point is efficiency. At this stage there are so many problems that they can hardly be solved by hand. This requires automation, covering regression, testing, operations and maintenance, monitoring, and more.

It should be added that intelligence was also introduced at the product level: intelligent recommendation, which proactively recommends related topics; intelligent advertising, which uses smart strategies to get users to click more ads and so increases 58.com's revenue; and intelligent search, which adds search strategies to improve search relevance and also raises 58.com's PV. Of course, all of these intelligent products are driven by technology.

Future challenges

Now that 58.com's traffic has exceeded 1 billion, what challenges will the architecture face in the future? One is wireless and mobile. Another is changing requirements, which demand ever faster iteration. With 1 billion in traffic, an architecture built for 100 million will certainly not work. In the future, more parallel computing and real-time computing will be used; achieving real-time recommendation, for example, would be very effective, and that is also one of the challenges. Finally, 58.com currently has about 3,000 servers and will expand to 10,000, which is a challenge for operations and maintenance.

Summary

Finally, a short summary. A website runs into different problems at different stages, and the techniques used to solve them differ as well. When traffic is small, the main goal is to improve development efficiency, so techniques such as ORM and DAO should be introduced early. As traffic grows, the site's stability is continually improved through dynamic-static separation, read-write separation, master-slave replication, vertical splitting, CDN, and MVC. Facing larger traffic, availability is continually improved through vertical splitting, service orientation, reverse proxies, and development frameworks (site and service). Facing traffic in the hundreds of millions and beyond, new challenges are met through centralization, elastic services, a message bus, and automation (regression, testing, operations and maintenance, monitoring). Beyond that, the evolution continues.

Author: 58 Shen Jian
