The Evolution of Large Websites: Notes on Reading "Large-Scale Website Technical Architecture"

Author: Yao Maomao's Blog & Health

01 What are the characteristics of a large website or software system?

High concurrency and heavy traffic: WeChat has about 1 billion daily active users.
High availability: service runs 7 × 24, commonly described as "four nines" (99.99%).
Massive data: large volumes of data have to be stored and managed.
Widely distributed users across the country or even the globe, over complex networks.
A harsh security environment.
Frequently changing requirements that demand rapid iteration.

Finally, incremental development.
Every major site grew out of a small one.
A good site architecture evolves into complexity; it is not designed that way from the start.
When WeChat first set out, nobody could have imagined it would reach a billion daily active users, and it certainly did not start with a cluster of thousands of servers.
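
As a side calculation for the "four nines" figure above, the sketch below converts an availability percentage into a yearly downtime budget. It assumes a 365-day year; the function name is made up for illustration.

```python
# Convert an availability target into the downtime it allows per year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_minutes_per_year(availability: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability ratio."""
    return (1.0 - availability) * MINUTES_PER_YEAR


for label, ratio in [("two nines", 0.99), ("three nines", 0.999), ("four nines", 0.9999)]:
    print(f"{label}: {downtime_minutes_per_year(ratio):.2f} minutes/year")
```

Under that assumption, four nines leaves a budget of roughly 52.56 minutes of downtime per year.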

02 The first step of the evolution: separating the application from the data

What does our small site look like at the beginning?
Logically, one application and one database; physically, a single server that handles everything.
As the number of users grows, we need to separate the application from the database.

Do the application server and the database server need the same hardware configuration?
Of course not.
The application handles most of the business logic, so it needs better and more CPUs.
The database has to retrieve data from disk quickly and keep it in cache, so it needs faster disks and more memory.

Of course, the goal of every evolution is higher, faster, stronger. But sometimes we cannot have everything, and we have to choose.

03 The second step: cache optimization

Congratulations: after that round of optimization the site experience improved and users began to increase, but then trouble came again.
More users put more pressure on the database. What now?

In every model, and even in reality, the IT industry has one truth that is hard to refute: the 80/20 rule.
The same holds for site visits: 80% of business accesses always concentrate on 20% of the data.
When shopping on Taobao we rarely page past the first few pages, and we look for sellers with good credit and high sales volume; on Baidu we look at the first two or three pages, or even just the first few results (if they are not ads); on Weibo's hot search, the top ten gossip items are enough. Do you ever page much further back?

So if we cache that 20% of the data, can we reduce the access pressure on the database and improve the site's access performance?
YES.

So how do we cache?
There are two commonly used caching schemes: a local cache on the application server, and a separate distributed cache.

What are the pros and cons of each?
A local cache is faster, but it is limited by the application server's memory and competes with the application for that memory.
A separate distributed cache can be clustered. It is a little slower but still fast, since it only costs basic network I/O. Its drawback is one word: expensive, because dedicated cache servers have to be purchased.
So in practice we sometimes do not buy a separate cache server. Instead we share memory on an application or database server that has plenty of it, with a threshold set on how much the cache may use.
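
The local-cache scheme described above can be sketched as a small cache-aside lookup: read the cache first, fall back to the database on a miss, then store the result. This is a minimal illustration, not a production design; `LocalLRUCache`, `ProductService`, and the dictionary standing in for the database are all hypothetical names, and the capacity limit plays the role of the memory threshold.

```python
from collections import OrderedDict


class LocalLRUCache:
    """In-process cache that evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry


class ProductService:
    """Cache-aside reads: try the cache, fall back to the database on a miss."""

    def __init__(self, database, cache):
        self.database = database  # a plain dict standing in for a real database
        self.cache = cache
        self.db_reads = 0

    def get_product(self, product_id):
        cached = self.cache.get(product_id)
        if cached is not None:
            return cached
        self.db_reads += 1  # cache miss: go to the database
        value = self.database[product_id]
        self.cache.put(product_id, value)
        return value


database = {i: f"product-{i}" for i in range(100)}
service = ProductService(database, LocalLRUCache(capacity=20))

# 80 of 100 requests hit the 20 hot items (ids 0-19); the other 20 hit cold items.
requests = [i % 20 for i in range(80)] + list(range(20, 40))
for product_id in requests:
    service.get_product(product_id)

print(f"database reads: {service.db_reads} of {len(requests)} requests")
```

With this artificial 80/20 workload, only 40 of the 100 requests reach the stand-in database: 20 first-time loads of the hot items plus the 20 cold reads. A distributed cache would follow the same read path, with the `get` and `put` calls going over the network instead of to local memory.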

04 The third step: application clusters, database clusters, and read/write separation

Wow, with the cache in place, data access is so fast.
But users keep increasing and the application can no longer keep up. What now? A happy problem indeed.
And isn't a single database at risk of downtime?

Well, cluster it. Spend the money and the problem goes away.

Application cluster plus database cluster.

This is the most common software deployment architecture today.

A load balancer (nginx, F5, etc.) does the scheduling: user requests are distributed to any application server in the cluster, by round-robin polling or by client IP, which relieves the load on each server.
Taking Oracle as the database example, RAC can be installed on the production servers, and the application reaches the database through a VIP (virtual IP) or connects to the database cluster through JDBC.
For website development, however, MySQL is the more common choice. Taobao used Oracle in its early days but later moved to MySQL.
As for why?
In one word: expensive. In two words: very expensive. In three words: far too expensive.

Clustering has two benefits: 1. it relieves the load on the service; 2. it provides high availability: if one node goes down, the others keep serving while the failed one is restored.
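
The two distribution strategies mentioned above (polling, i.e. round robin, and routing by client IP) can be sketched as follows. This only illustrates the selection logic; it is not how nginx or F5 are implemented, and the class names and server names are made up for the example.

```python
import hashlib
from itertools import cycle


class RoundRobinBalancer:
    """Hand requests to the servers one after another, in a fixed rotation."""

    def __init__(self, servers):
        self._rotation = cycle(list(servers))

    def pick(self, client_ip=None):
        return next(self._rotation)


class IpHashBalancer:
    """Send every request from the same client IP to the same server."""

    def __init__(self, servers):
        self._servers = list(servers)

    def pick(self, client_ip):
        digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(self._servers)
        return self._servers[index]


servers = ["app-1", "app-2", "app-3"]  # hypothetical application servers

round_robin = RoundRobinBalancer(servers)
print([round_robin.pick() for _ in range(4)])

ip_hash = IpHashBalancer(servers)
print(ip_hash.pick("203.0.113.7") == ip_hash.pick("203.0.113.7"))
```

Round robin spreads the load evenly, while the IP-based variant keeps one client on one server, which helps when session state lives in that server's memory.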

For ordinary software, the evolution can stop here.

But websites differ in one respect: very often, reads far outnumber writes.
Likes and comments are far fewer than people who just browse the gossip, right?

Although the cache absorbs part of the read traffic, some reads (cache misses, expired entries) and all writes still go to the database.
So once the user base grows rapidly to a certain scale, the database becomes the bottleneck again.

Most mainstream databases now support master-slave hot standby: updates on the master are replicated to the slaves to keep them in sync.
At this point we can build a dedicated data access module that handles read/write separation, so that the separation is transparent to the application.
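
A data access module of this kind might look roughly like the sketch below: the application calls one `execute` method, and the router decides whether the statement goes to the master or to a slave. Everything here is an assumption for illustration: `ReadWriteRouter` and `FakeConnection` are hypothetical names, the connections are in-memory stand-ins, and treating any statement that starts with `SELECT` as a read is a deliberate simplification.

```python
from itertools import cycle


class FakeConnection:
    """Stand-in for a real database connection; it only records what it ran."""

    def __init__(self, name):
        self.name = name
        self.executed = []

    def execute(self, sql):
        self.executed.append(sql)
        return self.name  # report which node handled the statement


class ReadWriteRouter:
    """Route reads to the slaves in rotation and everything else to the master."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = cycle(list(slaves))

    def execute(self, sql):
        is_read = sql.lstrip().lower().startswith("select")
        target = next(self._slaves) if is_read else self.master
        return target.execute(sql)


master = FakeConnection("master")
slaves = [FakeConnection("slave-1"), FakeConnection("slave-2")]
router = ReadWriteRouter(master, slaves)

print(router.execute("SELECT * FROM posts WHERE id = 1"))
print(router.execute("INSERT INTO likes (post_id) VALUES (1)"))
```

One caveat worth keeping in mind: when replication is asynchronous, a read issued right after a write may land on a slave that has not caught up yet, so reads that must see the latest data are usually sent to the master.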

Sometimes we even spin the query module off into a separate subsystem.

05 The fourth step, which is not really an evolution: CDN and reverse proxy

Why use a CDN?
China Mobile, China Telecom, China Unicom ......, east, south, southwest, northwest ......: the network environment is complex, and access speed to the site differs from region to region.
CDN and reverse proxy are both ways to speed up access, and both rest on the same basic principle: caching.
The difference is that a CDN is deployed in the network provider's data centers, while a reverse proxy is deployed in the site's own data center.
The goal of both is to return data to the user as early as possible.

06 The fifth step, Three Kingdoms style: distribution, business splitting, and merging

A distributed database is the last resort, used only when a single table holds a very large amount of data.
Many websites and software systems never reach this step, and a distributed database brings more trouble and complexity.
The more common approach for websites is business splitting: different businesses are split into separate applications and separate databases, deployed on different physical servers.

This move is known as divide and conquer. In Romance of the Three Kingdoms it is put as "what has long been united must divide".

Take an e-commerce site as an example: it may be split into product lines such as home page, shops, orders, sellers, and buyers. Each product line can be split further into multiple applications, managed by different business teams.

The applications can be tied together through hyperlinks on the home page, data can be distributed through message queues, and, most commonly, the applications access the same data storage system. Together they still form one complete system.

This is what is called microservices.
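
To make the message-queue link between split applications concrete, here is a toy sketch in which an order application publishes events and a seller application consumes them. It uses Python's in-process `queue.Queue` purely as a stand-in for a real message broker; the function, class, and field names are invented for the example.

```python
import queue

# Stand-in for a message broker shared by the split applications.
order_events = queue.Queue()


def place_order(order_id, seller_id):
    """Order application: publish an event instead of calling the seller app directly."""
    event = {"type": "order_created", "order_id": order_id, "seller_id": seller_id}
    order_events.put(event)
    return event


class SellerApp:
    """Seller application: consume order events and keep its own view of the data."""

    def __init__(self):
        self.pending_by_seller = {}

    def drain(self, events):
        while True:
            try:
                event = events.get_nowait()
            except queue.Empty:
                break
            orders = self.pending_by_seller.setdefault(event["seller_id"], [])
            orders.append(event["order_id"])


place_order(1001, "seller-a")
place_order(1002, "seller-b")

seller_app = SellerApp()
seller_app.drain(order_events)
print(seller_app.pending_by_seller)
```

The point of the queue is that the order application does not need to know which applications consume its events, so each team can deploy and change its side independently.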

As the businesses are split smaller and smaller and grow more complex, some services turn out to be shareable, such as user management and product management. These shared services can be extracted and deployed independently.
In today's popular terminology, this is called the business middle platform.

On the technical side, every team has built its own wheels, yet the problems they solve have a lot in common: file and image processing, data storage, search systems.
Hence the technology middle platform.

On the data side, as the systems are split into more and more fragments, the data ends up in different databases and forms data silos. Wouldn't it be nice to connect them, build a data warehouse, and analyze user profiles? Coupon targeting and big-data price discrimination against regular customers are worth a look.
Technically, as the data volume grows, the demands on storage and retrieval technology rise as well, so we bring in non-relational technologies such as NoSQL and search engines.
In the end, the data middle platform emerges too.

As the saying goes, what has long been divided must unite: a new Three Kingdoms takes shape.


Origin www.cnblogs.com/yaomaomao/p/11882060.html