Summary of SQL Server solutions for large-scale database applications

With the spread of Internet applications, storing and accessing massive amounts of data has become a bottleneck in system design. For a large-scale Internet application, millions or even hundreds of millions of page views (PVs) every day put a very high load on the database and cause serious problems for the stability and scalability of the system.

I. Load balancing technology
A load balancing cluster is composed of a group of mutually independent computer systems that are connected through a conventional or dedicated network and linked by routers. The nodes cooperate with each other, share the load, and balance the pressure. To the client, the entire cluster looks like a single server with very high performance.

1. Implementation principle
To achieve database load balancing, there must first be a control layer that manages connections to the databases. It cuts off the direct connection between the programs and the databases: all programs access the middle layer, and the middle layer accesses the databases. This makes it possible to control access to any particular database and to apply a balancing strategy based on each database's current load, deciding which database to connect to each time.
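
The middle layer described above can be sketched as follows. This is a minimal illustration, not a production design: the class names, the node labels "db1" and "db2", and the use of the active connection count as the measure of "current load" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class DatabaseNode:
    name: str                    # hypothetical node label, e.g. "db1"
    active_connections: int = 0  # stand-in for the node's current load


class MiddleLayer:
    """Hands out database nodes so that programs never pick one themselves."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one database node is required")
        self.nodes = list(nodes)

    def acquire(self):
        # Balancing strategy (an assumption): choose the node with the
        # fewest active connections; ties go to the earliest node.
        node = min(self.nodes, key=lambda n: n.active_connections)
        node.active_connections += 1
        return node

    def release(self, node):
        # Called when the program has finished with its connection.
        node.active_connections -= 1


layer = MiddleLayer([DatabaseNode("db1"), DatabaseNode("db2")])
first = layer.acquire()    # both nodes idle, so the first one is chosen
second = layer.acquire()   # db1 is now busier, so db2 is chosen
layer.release(first)
third = layer.acquire()    # db1 is idle again and is chosen
```

Other strategies, such as weighting nodes by capacity, would fit behind the same acquire/release interface.
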
2. Multi-database data synchronization
For load balancing, the most important requirement is that the data on all servers is synchronized in real time. A cluster needs this because, if the data is not synchronized in real time, a user may read different data from one server than from another, which is not acceptable. Database synchronization must therefore be implemented, so that queries can be balanced across multiple copies of the data. A commonly used approach is the Moebius for SQL Server cluster. It places a core program, called the Moebius for SQL Server middleware, inside the database on each machine. Its main function is to monitor data changes in the database and synchronize the changed data to the other databases. The client gets a response only after the synchronization has completed. Synchronization runs concurrently, so synchronizing to multiple databases takes about as long as synchronizing to one. It also runs inside a transaction, which guarantees that the copies of the data are consistent at any time. Because the Moebius middleware is hosted inside the database, it knows not only which data changed but also which SQL statement caused the change, and it chooses a synchronization strategy according to the type of SQL statement so that the cost of synchronization is kept to a minimum.

If the number of changed rows is small and the data is not large, the data is synchronized directly.
If the number of changed rows is small but the data contains large types, such as text or binary data, the data is compressed before it is synchronized, which reduces network bandwidth usage and transmission time.
If there are many changed rows, the middleware takes the SQL statement that caused the change, parses it, analyzes its execution plan and execution cost, and chooses whether to synchronize the data or the SQL statement to the other databases. This is very useful when adjusting a table structure or changing data in batches.
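
The three cases above amount to a small decision procedure, which can be sketched as follows. This is not the Moebius implementation: the function name and both thresholds are hypothetical, and the execution-plan and cost analysis is reduced to a plain row-count check for the sake of a short example.

```python
import zlib

# Hypothetical thresholds; the source does not state the real ones.
MAX_ROWS_FOR_DATA_SYNC = 1000
MAX_BYTES_UNCOMPRESSED = 64 * 1024


def choose_sync_payload(rows, sql):
    """Return (strategy, payload) for one data change.

    rows: list of bytes objects, one serialized row per changed row
    sql:  the SQL statement that caused the change
    """
    if len(rows) > MAX_ROWS_FOR_DATA_SYNC:
        # Many rows: send the statement and let each database replay it.
        return "statement", sql.encode("utf-8")
    data = b"".join(rows)
    if len(data) > MAX_BYTES_UNCOMPRESSED:
        # Few rows but large values: compress before sending.
        return "compressed", zlib.compress(data)
    # Few rows and small values: send the data as it is.
    return "direct", data
```

For example, `choose_sync_payload([b"a", b"b"], "UPDATE t SET x = 1")` takes the direct path, while a single 100 KB row takes the compressed path.
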
3. Advantages and disadvantages
Advantages:
(1) Strong scalability: When the system needs higher database processing speed, it can be extended simply by adding database servers.
(2) Maintainability: When a node fails, the system automatically detects the failure and moves the failed node's workload to other nodes, so the database keeps running.
(3) Security: Because the data is synchronized across multiple servers, the data set is redundant, and the multiple copies keep the data safe. In addition, the database sits inside the intranet, which better protects it.
(4) Ease of use: The cluster is completely transparent to the application and exposes a single IP address.

Disadvantages:
(1) The load cannot be distributed according to the processing capacity of the Web server.
(2) A failure of the load balancer (the control end) paralyzes the entire database system.

 

II. Database read/write separation
1. Implementation principle: Read/write separation means directing database reads and writes to different database servers, which effectively reduces database pressure and I/O pressure. The master database handles write operations and the slave databases handle read operations. In many systems, reads are in fact the dominant operation. When the master database performs a write, the data must be synchronized to the slave databases so that they stay consistent with the master.
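
A minimal sketch of the routing decision follows. The class name, the server labels, and the rule "a statement that starts with SELECT is a read" are assumptions made for illustration; a real router would also need to handle transactions and reads that must see freshly written data.

```python
import itertools


class ReadWriteRouter:
    """Sends reads to slave databases and everything else to the master."""

    def __init__(self, master, slaves):
        self.master = master
        # Rotate through the slaves; fall back to the master if there are none.
        self._slaves = itertools.cycle(slaves or [master])

    def route(self, sql):
        # Simplification: only a statement whose first word is SELECT
        # counts as a read. Everything else goes to the master.
        words = sql.split()
        if words and words[0].upper() == "SELECT":
            return next(self._slaves)
        return self.master


router = ReadWriteRouter("master", ["slave1", "slave2"])
targets = [
    router.route("SELECT * FROM orders"),               # first slave
    router.route("INSERT INTO orders (id) VALUES (1)"),  # master
    router.route("select count(*) from orders"),         # second slave
]
```

Keeping the rule in one place means application code asks the router for a target rather than deciding between servers itself.
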

(Figure: eBay's read/write separation; eBay's read-to-write ratio is 260:1)

(Figure: Microsoft database distribution)

2. Implementation method: In MS SQL Server, read/write separation can be implemented with replication, which is configured through publications and subscriptions. Replication is a technology for copying a set of data from one data source to multiple data sources, and it is an effective way to publish a copy of the data to multiple storage sites. With replication, data distributed across different locations is synchronized automatically, which keeps it consistent. SQL Server has three types of replication: snapshot replication, transactional replication, and merge replication. The server that holds the source data is the Publisher, which is responsible for publishing the data. The Publisher copies all changes to the published data to the Distributor. The Distributor contains a distribution database that receives all changes to the data, stores them, and then forwards them to the Subscribers.

3. Advantages and disadvantages
Disadvantages:
(1) Poor real-time behavior: Data is not synchronized to the read-only servers in real time. After data is written to the master server, it can be queried only after the next synchronization.

(2) Poor synchronization efficiency with large data volumes: When a single table holds too much data, insert and update performance degrades badly because of index maintenance, disk I/O, and similar issues.

(3) Multiple databases must be connected at the same time: At least two databases have to be connected, and the choice between the read and write servers is made in program code, which easily leads to confusion.

Advantages:
(1) High read performance, reliability, and scalability: Because a read-only server performs no writes, problems such as disk I/O contention are greatly reduced and efficiency improves considerably. Read-only servers can be load balanced, and the master database can be published to multiple read-only servers to scale out read operations.

 

III. Database/table splitting (distributed)

The data stored in one database is distributed to multiple databases according to some condition, which gives distributed storage. Routing rules direct each access to a specific database, so that requests are served by N servers rather than a single one, which reduces the load on any single machine. Tip: SQL Server 2005 and later versions have good support for table partitioning.

Vertical split: splitting by functional module, for example into an order database, a product database, a user database, and so on. With this approach the table structures differ between the databases.

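Horizontal split: the data of a single table is divided into chunks that are stored in different databases. The table structure in these databases is identical.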

(Figure: vertical split)

(Figure: horizontal split)

 

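1. Implementation principle: Whether vertical splitting is appropriate depends mainly on the type of application. If the system divides cleanly into business systems such as an order system, a product management system, and a user management system, vertical splitting spreads the database pressure well. Systems whose business modules are not clearly separated and whose coupling (table joins) is high are not suited to it. Vertical splitting also cannot solve every pressure problem. For example, with an order table of 50 million rows, the load on the order database is still heavy. Suppose we insert a new row into this table: after the insert, the database has to update the indexes on the table, and the overhead of maintaining indexes over 50 million rows cannot be ignored. If we instead split this table into 100 tables, from table_001 to table_100, the 50 million rows average out to only 500,000 rows per sub-table. The time spent on index maintenance after inserting into a table of only 500,000 rows drops sharply, which greatly improves the runtime efficiency and the concurrency of the DB. This kind of split is a horizontal split.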
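
The arithmetic in this example, together with one possible way to pick a sub-table, can be sketched as follows. The table names table_001 through table_100 come from the example; routing by order_id modulo 100 is an assumption, since the text does not say which rule assigns a row to a sub-table.

```python
TOTAL_ROWS = 50_000_000
TABLE_COUNT = 100  # table_001 .. table_100

# Average number of rows per sub-table after the split: 500,000.
rows_per_table = TOTAL_ROWS // TABLE_COUNT


def order_table_for(order_id):
    # Assumption: rows are spread by order_id modulo the table count.
    # The remainder 0..99 is shifted to 1..100 to match the table names.
    return "table_%03d" % (order_id % TABLE_COUNT + 1)
```

With this rule, `order_table_for(12345)` gives `table_046`, and consecutive order ids land in consecutive tables.
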

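2. Implementation method: Vertical splitting is fairly simple to implement: access a different database depending on the table name. There are many rules for horizontal splitting. A few that others have used are summarized here.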

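(1) Sequential split: for example, split orders by year of the order date, with 2003 stored in db1, 2004 in db2, and so on. Splitting by primary key is also possible.

Advantage: data can be migrated partially.

Disadvantage: data is distributed unevenly. There might be 1 million orders for 2003 and 5 million for 2008.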

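(2) Hash modulo split: hash the user_id (or, if user_id is numeric, use its value directly), then take the result modulo a fixed number. For example, if the application needs to split one database into 4 databases, take the hash of user_id modulo 4, that is, user_id % 4. Each computation has four possible results: 1 maps to DB1, 2 maps to DB2, 3 maps to DB3, and 0 maps to DB4. This distributes the data very evenly across the 4 DBs.
Advantage: data is distributed evenly.
Disadvantages: data migration is troublesome, and data cannot be distributed according to machine capacity.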
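(3) Store the database configuration in an authentication database
Set up a separate DB that stores only the mapping from user_id to DB. Every database access must first query this DB to find out which DB to use, and only then can the intended query run.
Advantages: very flexible, with a one-to-one mapping.
Disadvantage: every query is preceded by an extra query, which costs some performance.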
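
The three horizontal splitting rules can be compared side by side in a short sketch. The function names are made up for illustration, the sample mapping rows are hypothetical, and a dictionary stands in for the separate mapping database. For string ids the sketch uses CRC32 as the hash, which is one possible choice and not something the text prescribes.

```python
import zlib
from datetime import date

# (1) Sequential split: one database per year, with 2003 in db1.
FIRST_YEAR = 2003


def db_by_year(order_date):
    if order_date.year < FIRST_YEAR:
        raise ValueError("no database is defined before %d" % FIRST_YEAR)
    return "db%d" % (order_date.year - FIRST_YEAR + 1)


# (2) Hash modulo split across 4 databases.
DB_COUNT = 4


def db_by_hash(user_id):
    if isinstance(user_id, int):
        key = user_id  # numeric ids are used directly
    else:
        # zlib.crc32 gives the same value on every run, unlike Python's
        # built-in hash() for strings.
        key = zlib.crc32(str(user_id).encode("utf-8"))
    remainder = key % DB_COUNT
    # Remainders 1, 2, 3 map to DB1, DB2, DB3; remainder 0 maps to DB4.
    return "DB%d" % (remainder or DB_COUNT)


# (3) Mapping lookup: a dict stands in for the separate mapping database.
USER_DB_MAP = {1001: "DB1", 1002: "DB3"}  # hypothetical sample rows


def db_by_lookup(user_id):
    # In a real system this is the extra query issued before every access.
    if user_id not in USER_DB_MAP:
        raise KeyError("no database configured for user_id %r" % (user_id,))
    return USER_DB_MAP[user_id]
```

The trade-offs listed for each rule show up directly: `db_by_year` needs no shared state but follows the uneven yearly volumes, `db_by_hash` spreads ids evenly but changing `DB_COUNT` remaps most ids, and `db_by_lookup` can place any user anywhere at the price of one lookup per access.
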

Source: http://blog.csdn.net/dinglang_2009/ and http://www.cnblogs.com/dinglang/ (please credit the source when reposting).


Origin blog.csdn.net/qq_16005627/article/details/77338329