"Alipay Architect" tells this: the evolution of large-scale website technology architecture

Recently I have been reading two books about large-scale website architecture: "Technical Architecture of Large-Scale Websites - Core Principles and Case Analysis" by Li Zhihui, and "Large-scale Website System and Java Middleware Practice" by Zeng Xianjie.

I hoped to learn from these books how large websites are architected and what problems come up along the way. After reading them, I boiled it down to two big questions:

1. Why does the technical architecture of the website evolve? In other words, why do websites get bigger?

2. What problems are encountered during the evolution? Or, put another way, what problems must be solved in order to evolve?

Why does the technical architecture of the website evolve?

In my view, two forces drive the evolution of a website's technical architecture:

1. Internal driving force: we want to improve our current business and develop new businesses

2. External driving force: the growth in the number of users and the diversification of user types

These two forces are not independent; more often they act in parallel. I think Taobao is the result of both forces acting together.

The reason for evolution is simple. But when should we evolve the technical architecture of the website, and how? To be honest, I have no experience with these questions, and in reality every company faces different problems at that point, so it is hard for me to generalize about the timing of evolution from my own experience.

But I can approach the problem from another angle: study the internal and external structure of the website and find where problems are likely to arise in those structures. Once you know or can foresee the problem points, you will know how to evolve. It is like a PC: once you understand its structure, you know when to add memory and when to add a hard disk.

So let's first look at the external structure of the website:

The external structure consists of the following parts:

U: the user group. How should our website evolve when the user base changes? The dimensions I currently know of for analyzing user groups are number, type, and geographic location (region).

N: the network environment. The network environment differs from region to region, which is why we need a CDN. How should our website evolve when we want users in every region to have a good experience?

S: security. How secure do we need to be? That depends on the current stage of the site and the nature of the site.

C: our website itself, which is the internal structure.

The internal structure of the website:

The internal structure consists of:

A: application services

D: data services

In short, these components give us a baseline for deciding whether and how a website should evolve.

So why don't we design the site to be "large" from the start? Li Zhihui wrote in his postscript: "Don't try to design a large website", "The reason is that the development and operation of the Internet has its own laws, and the short history of the Internet has repeatedly proved that such attempts do not work." He also said: "Large websites are not designed, but gradually evolved." On that last sentence, a reminder: "not designed" does not mean "designed at random".

Regarding the "design of large-scale websites", my personal view is that we now have the "cloud", where computing capacity can be bought. As long as our design fits the "cloud", could we design a large-scale website from the very beginning?

What problems will be encountered in the process of evolution

- Initial stage

Start with a small website. One server will suffice.

- Separation of data services and application services

As the number of users grows, so does the data, and one server is no longer enough. We separate the data service from the application service, giving the application servers better CPUs and more memory and the data servers better and larger hard disks.

- Use a cache

Because roughly 80% of business access is concentrated on 20% of the data, caching that hot portion improves performance right away. There are two kinds of cache: a local cache and a remote distributed cache. Which one to use, or whether to use both, I don't know yet.

Here is a question the books don't address: which data should be cached? There ought to be some guiding principles.
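To make the caching step concrete, here is a minimal sketch of a local cache placed in front of the data service. It is only an illustration: the class names `LRUCache` and `ProductReader`, the capacity of 2, and the plain dict standing in for the database are all assumptions, not something taken from either book.

```python
from collections import OrderedDict


class LRUCache:
    """A tiny local (in-process) cache that evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry


class ProductReader:
    """Read path that tries the cache first and falls back to the database."""

    def __init__(self, database, cache):
        self.database = database  # a plain dict stands in for the data service
        self.cache = cache
        self.db_reads = 0  # counts how often the database is actually hit

    def read(self, key):
        value = self.cache.get(key)
        if value is None:
            self.db_reads += 1
            value = self.database[key]
            self.cache.put(key, value)
        return value


database = {f"item-{i}": f"details of item {i}" for i in range(10)}
reader = ProductReader(database, LRUCache(capacity=2))

# Most requests hit a small hot set, so only the first read of each key
# reaches the database.
for key in ["item-1", "item-2", "item-1", "item-1", "item-2", "item-1"]:
    reader.read(key)
```

After the six reads, `reader.db_reads` is 2: the other four were served from the cache. A remote distributed cache would follow the same read path, with the `LRUCache` replaced by a client for the cache service.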

- Use server clusters

When a single application server reaches the limit of its processing power, it becomes the bottleneck. You can buy more powerful hardware, but there is always an upper limit. At that point we need a cluster of servers, which brings in a new component: a load-balancing server that schedules requests across them.
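As a rough illustration of what the load-balancing scheduler does, the sketch below hands requests to servers in round-robin order. Round-robin is just one possible policy chosen here for simplicity, and the class name and server names are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hands out application servers in turn, one per incoming request."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._cycle = itertools.cycle(servers)

    def pick(self):
        # Each call returns the next server, wrapping around at the end.
        return next(self._cycle)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
assigned = [balancer.pick() for _ in range(6)]
```

Here `assigned` cycles through the three servers twice. Note that this policy ignores which server handled a user's previous request, which is exactly why session management becomes an issue.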

However, server clusters raise one issue that has to be considered: session management. It can be handled in the following ways:

Session Sticky: Suppose we insist on using our own bowl and chopsticks at every meal. If we leave them at one restaurant, that works as long as we always eat at that same restaurant. In the same way, the load balancer always sends a given user's requests to the server that holds that user's session.

Problems with this approach:

1. If a server restarts, the sessions stored on it are lost

2. The load balancer becomes a stateful machine, which makes disaster recovery harder
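The sticky approach and its first problem can be sketched as follows. This is a simplified model under stated assumptions: `StickyBalancer` and the `session_data` dict of per-server in-memory sessions are hypothetical names used only for illustration.

```python
class StickyBalancer:
    """Remembers which server each session was first sent to."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.assignments = {}  # session id -> server: state the balancer must keep
        self._next = 0

    def route(self, session_id):
        server = self.assignments.get(session_id)
        if server is None:
            # First request of this session: pick a server and remember it.
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            self.assignments[session_id] = server
        return server


# Each application server keeps its sessions in its own memory.
session_data = {"app-1": {}, "app-2": {}}
balancer = StickyBalancer(session_data.keys())

server = balancer.route("alice")
session_data[server]["alice"] = {"cart": ["book"]}

# Later requests from the same session land on the same server.
same_server = balancer.route("alice")
other_server = balancer.route("bob")

# Problem 1 from the list above: a restart wipes that server's sessions.
session_data[server].clear()
cart_after_restart = session_data[balancer.route("alice")].get("alice")
```

After the simulated restart, `cart_after_restart` is `None`: the balancer still routes alice to the same server, but her session is gone. The `assignments` table is also the state that makes the balancer itself harder to replace during disaster recovery.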

Session Copy (session replication): this is like keeping a set of our own tableware in every restaurant. Sessions are copied between the application servers. It suits a small number of machines, not large clusters.

Problems with this scheme:

1. Replication consumes bandwidth between application servers

2. With a large number of users online, every server holds every session, which takes up too much memory

Cookie-based: this is like bringing your own tableware with you every time you eat. The session data travels in the cookie with each request.

Problems with this scheme:

1. Cookie length limit

2. Security

3. Extra bandwidth consumed outside the data center

4. Performance impact: the server processes more content on every request
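A minimal sketch of the cookie-based scheme is shown below, with the session serialized into a signed cookie value. The secret key, the 4096-byte limit (a commonly cited per-cookie size, assumed here), and the function names are all illustrative assumptions. Signing only detects tampering; it does not hide the contents from the user.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"demo-only-secret"  # hypothetical; a real key must stay private
MAX_COOKIE_BYTES = 4096  # assumed limit; problem 1 in the list above


def encode_session(data):
    """Serialize the session and append an HMAC so tampering is detectable."""
    raw = json.dumps(data, sort_keys=True).encode("utf-8")
    payload = base64.urlsafe_b64encode(raw)
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    cookie = payload.decode("ascii") + "." + signature
    if len(cookie) > MAX_COOKIE_BYTES:
        raise ValueError("session too large for a cookie")
    return cookie


def decode_session(cookie):
    """Verify the signature, then rebuild the session from the cookie value."""
    payload, _, signature = cookie.rpartition(".")
    expected = hmac.new(
        SECRET_KEY, payload.encode("ascii"), hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("cookie was tampered with")
    return json.loads(base64.urlsafe_b64decode(payload))


cookie = encode_session({"user_id": 42, "cart": ["book"]})
restored = decode_session(cookie)
```

Every request carries the whole `cookie` string, and the server verifies and decodes it each time, which is where the bandwidth and per-request processing costs in the list come from.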

Session server: sessions are stored on a dedicated server, which can itself be clustered. This approach suits a large number of sessions and a large number of web servers.

Things to consider with this scheme:

1. Ensure the availability of the session server

2. The application code has to be adjusted to use it. I don't know whether the application server can make this logic transparent to the application.
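The session-server scheme can be sketched like this. `SessionServer` is an in-memory stand-in for a real, separately deployed session service, and `WebServer.add_to_cart` is a hypothetical request handler; both are assumptions made for illustration.

```python
import copy


class SessionServer:
    """Stand-in for a dedicated session service shared by all web servers."""

    def __init__(self):
        self._sessions = {}

    def load(self, session_id):
        # Return a copy, as a real service would return data over the network.
        return copy.deepcopy(self._sessions.get(session_id, {}))

    def save(self, session_id, data):
        self._sessions[session_id] = copy.deepcopy(data)


class WebServer:
    """A web server that keeps no session state of its own."""

    def __init__(self, name, session_server):
        self.name = name
        self.session_server = session_server

    def add_to_cart(self, session_id, item):
        # The application code must load and save the session explicitly.
        session = self.session_server.load(session_id)
        session.setdefault("cart", []).append(item)
        self.session_server.save(session_id, session)
        return session["cart"]


store = SessionServer()
web_1 = WebServer("web-1", store)
web_2 = WebServer("web-2", store)

web_1.add_to_cart("alice", "book")
cart = web_2.add_to_cart("alice", "pen")  # a different server, same session
```

Because both web servers read and write the same store, `cart` contains both items even though the requests hit different machines. The explicit `load` and `save` calls are the application-side adjustment mentioned in point 2, and the single `store` object is the component whose availability point 1 worries about.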


- Database read-write separation

Even with a cache, some reads (cache misses and expired entries) and all writes still go to the database. Once the number of users reaches a certain level, the database becomes the bottleneck. Here we use the hot standby function provided by the database and direct read operations to the slave server. Note: read-write separation addresses high read pressure.

Because reads and writes are now separated, the application has to change accordingly. We implement a data access module so that the authors of upper-layer code do not need to know that read-write separation exists. One thing I would like to know: if I use an ORM, how do I implement read-write separation?

Database read-write separation runs into the following problems:

Data replication: consider replication delay, what the database supports, and support for replication conditions. Keep in mind that once the site is spread across multiple data centers, this becomes an even bigger problem.

Routing: the application must route each request to the right data source.
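The data access module described above could look roughly like the sketch below, which routes statements by type. It is a simplification under stated assumptions: `FakeConnection` only records statements rather than talking to a database, `ReadWriteRouter` is a hypothetical name, and a real module would need a more careful way to classify statements than checking for a leading `SELECT`.

```python
import itertools


class FakeConnection:
    """Records statements instead of talking to a real database."""

    def __init__(self, name):
        self.name = name
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)
        return self.name  # report which database handled the statement


class ReadWriteRouter:
    """Sends reads to replicas in turn and everything else to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def execute(self, sql):
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and self._replicas is not None:
            return next(self._replicas).execute(sql)
        return self.primary.execute(sql)


primary = FakeConnection("primary")
replicas = [FakeConnection("replica-1"), FakeConnection("replica-2")]
router = ReadWriteRouter(primary, replicas)

used = [
    router.execute(sql)
    for sql in [
        "INSERT INTO users VALUES (1, 'alice')",
        "SELECT * FROM users",
        "SELECT name FROM users WHERE id = 1",
        "UPDATE users SET name = 'bob' WHERE id = 1",
    ]
]
```

Upper-layer code only calls `router.execute`, so it never sees which database is used. The replication-delay problem shows up here too: the `SELECT` issued right after the `INSERT` goes to a replica that may not have received the new row yet.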

- Speed up website response with a reverse proxy and a CDN

A CDN addresses the problem of access speed in different regions, while a reverse proxy caches the resources users request inside the data center.

- Use a distributed file system

- Dedicated databases per business: vertical data splitting

This solves part of the data write problem, since different businesses write to different databases.

Problems encountered when splitting a database vertically:

1. Transactions that span businesses

2. More configuration items in the application

There are two approaches to the transaction problem:

1. Use distributed transactions

2. Give up transactions, or do not insist on strong transactional guarantees

- When the data volume or update volume of one business table hits the limit of a single database: horizontal data splitting

Split the data of a single table across two or more databases.

Problems encountered in horizontal data splitting:

1. SQL routing: you need to know which database a given User is in.

2. The primary key strategy has to change.

3. Query performance suffers, for example with pagination.
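The routing and pagination problems can be illustrated with a small in-memory model. Routing by `user_id % shard_count` is one simple assumed rule among several possible ones, lists of tuples stand in for the databases, and `ShardedUserTable` is a hypothetical name.

```python
import heapq


class ShardedUserTable:
    """One logical user table whose rows are spread over several shards."""

    def __init__(self, shard_count):
        self.shards = [[] for _ in range(shard_count)]

    def shard_for(self, user_id):
        # Routing rule: the user id alone decides which shard holds the row.
        return user_id % len(self.shards)

    def insert(self, user_id, name):
        self.shards[self.shard_for(user_id)].append((user_id, name))

    def find(self, user_id):
        # A lookup by key touches exactly one shard.
        for row in self.shards[self.shard_for(user_id)]:
            if row[0] == user_id:
                return row
        return None

    def page(self, offset, limit):
        """Return rows ordered by user id; every shard has to be consulted."""
        # Each shard must return its first offset + limit rows, because any of
        # them could belong to the requested page.
        per_shard = [sorted(shard)[: offset + limit] for shard in self.shards]
        merged = list(heapq.merge(*per_shard))
        return merged[offset : offset + limit]


table = ShardedUserTable(shard_count=2)
for user_id in [5, 2, 9, 4, 7, 1]:
    table.insert(user_id, f"user-{user_id}")

row = table.find(4)
second_page = table.page(offset=2, limit=2)
```

`find` stays cheap because it goes to one shard, while `page` has to pull `offset + limit` rows from every shard and merge them, which is why deep pagination gets slow. The sketch also takes the user id from the caller: a per-database auto-increment key would produce duplicates across shards, which is one reason the primary key strategy has to change.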

- Use a search engine: solves data query problems

- Use NoSQL in some scenarios to improve performance

- Develop a unified data access module: solves the data source problem for upper-layer application development

- Business split and application split

The website's business keeps growing more complex, and building one large application to handle all of it becomes impractical; it is also hard to manage. However, it is difficult to find a general model for dividing up the business, because this is partly a business-management problem and partly a technical one, and it depends on the specific situation of each company.

Judging from these two books, though, the final architecture ends up service-oriented, that is, SOA. How to implement SOA is another big topic and beyond the scope of this article.

I took a screenshot from Cheng Li's speech in 2008 to illustrate what the architecture after SOA looks like:

- Non-functional problems

– Security and monitoring issues

– Release issues: a new architecture means new releases

– Multiple data centers

– Neither of these books says anything about splitting across data centers. I have no experience with it, but I would guess that once the site is split across data centers, all of the problems above may have to be reconsidered.

– Changes in organizational structure

Changes in our technical structure will inevitably lead to changes in our organizational structure, and vice versa.

This may seem to be outside our remit, but I think technical staff should also take part in designing some of the organizational structure. For example, an organizational structure involves performance evaluation, and performance rules are sometimes much like the laws of a country. What happens when a country's laws are unsound? You know the answer.

We must also consider the cost to people of learning the new architecture.

I am currently reading books on this topic, but I don't yet have a systematic understanding.

Summary:

- On the order of evolution

In practice, the technical architecture does not necessarily evolve in the order listed in this article from start to finish; the decision should be made according to the specific situation.

- On evolution in traditional and modern "cloud" environments

It is a pity that only Li Zhihui talks about the cloud, and he only touches on it: "Now more and more people's websites are built from the very beginning on cloud computing services provided by large websites, and all the resources they need (computing, storage, and network) can be purchased on demand and scaled linearly, so you don't need to piece together various resources by yourself and combine various technical solutions to gradually improve your website architecture."

Because I haven't used the "cloud" for very long, I can't summarize how evolution differs between a cloud architecture and a traditional non-cloud architecture.

As for the evolution of traditional architecture, here is what I have concluded from my own summary and thinking:

When adjusting a website's architecture, consider two major dimensions: data services and application services. During the adjustment, identify which point is currently the bottleneck and which point has the highest priority for optimization. Most important of all, although we are technical people, we should also learn the business, so that when we look at a problem we can tell which parts are business problems and which are technical problems. Keep in mind that for some problems, technical means are not necessarily more effective than business means.
