Summary of Large-Scale Application Solutions for SQL Server Databases

With the widespread adoption of Internet applications, storing and accessing massive amounts of data has become a bottleneck in system design. A large-scale Internet application may serve millions or even hundreds of millions of page views (PV) every day, which places a very heavy load on the database and causes serious problems for the stability and scalability of the system.

1. Load balancing technology
A load balancing cluster consists of a group of independent computer systems connected through a conventional or dedicated network (for example, linked by routers). The nodes cooperate with each other, share the load, and balance the pressure, so the entire cluster can be regarded as a single server with very high performance.

1. Implementation principle
To implement database load balancing, there must first be a control layer that governs every connection to the database. This layer cuts off direct connections between the programs and the database: all programs access the middle layer, and the middle layer accesses the database. Because every access passes through it, the middle layer can decide which database each request goes to, applying a balancing strategy based on the current load of each database.
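The middle layer described above can be sketched as a small router. This is a minimal illustration under stated assumptions, not the API of Moebius or any real product: the names `DatabaseNode` and `LoadBalancer` are hypothetical, "load" is simply a count of active connections, and least-connections is only one of many possible balancing strategies.

```python
from dataclasses import dataclass


@dataclass
class DatabaseNode:
    name: str
    active_connections: int = 0  # current load as seen by the middle layer (assumed metric)


class LoadBalancer:
    """Hypothetical middle layer: programs ask it for a database instead of connecting directly."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def acquire(self) -> DatabaseNode:
        # Least-connections strategy: pick the node with the lowest current load.
        # min() returns the first node on ties, so equal nodes are used in list order.
        node = min(self.nodes, key=lambda n: n.active_connections)
        node.active_connections += 1
        return node

    def release(self, node: DatabaseNode) -> None:
        # Called when the program is done with the database.
        node.active_connections -= 1


lb = LoadBalancer([DatabaseNode("db1"), DatabaseNode("db2"), DatabaseNode("db3")])
first = lb.acquire()   # db1 (all idle, first in list)
second = lb.acquire()  # db2 (db1 now has one connection)
```

In a real deployment `acquire` would hand out a pooled connection rather than a node object, and the load figure might come from CPU or query statistics instead of a connection count.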
2. Multi-database data synchronization
For load balancing, the most important requirement is that the data on all servers is synchronized in real time. A cluster needs this: if the data were not synchronized in real time, a user could read one value from one server and a different value from another, which is not acceptable. Data synchronization between the databases must therefore be implemented; only then can a query be served by any of several servers, so that the load can be balanced. A commonly used product for this is Moebius for SQL Server cluster. It places a core program, called the Moebius for SQL Server middleware, inside the database on each machine. The middleware monitors data changes in its database and synchronizes the changed data to the other databases, and the client receives a response only after synchronization has finished. Synchronization runs concurrently, so synchronizing to several databases takes roughly the same time as synchronizing to one. It also runs inside a transaction, which keeps the data consistent at all times. Because the middleware resides inside the database, it knows not only which data changed but also which SQL statement caused the change, and it chooses a synchronization strategy according to the type of SQL statement so that the cost of synchronization is minimized.
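Moebius's internals are not documented here, so the following is only a generic sketch of the behaviour the paragraph describes: a change is staged on every replica concurrently, committed only if all replicas accept it, and the client is answered afterwards. `Replica` is a hypothetical in-memory stand-in for a database, and the prepare/commit split is an assumption modelled on two-phase commit.

```python
from concurrent.futures import ThreadPoolExecutor


class Replica:
    """Hypothetical in-memory stand-in for one database in the cluster."""

    def __init__(self, name, fail=False):
        self.name = name
        self.rows = {}
        self.pending = None
        self.fail = fail  # simulate a replica that rejects the change

    def prepare(self, change) -> bool:
        # Phase 1: stage the change (a real database would hold an open transaction).
        if self.fail:
            return False
        self.pending = change
        return True

    def commit(self) -> None:
        key, value = self.pending
        self.rows[key] = value
        self.pending = None

    def rollback(self) -> None:
        self.pending = None


def synchronize(replicas, change) -> bool:
    # Prepare on all replicas concurrently, so the elapsed time is close to the
    # slowest single replica rather than the sum of all replicas.
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        votes = list(pool.map(lambda r: r.prepare(change), replicas))
    if all(votes):
        for r in replicas:
            r.commit()
        return True  # only now would the client receive its response
    for r in replicas:
        r.rollback()  # all-or-nothing: no replica keeps a partial change
    return False
```

The all-or-nothing rule is what keeps every copy identical; the price is that one slow or failed replica delays or rejects the write for everyone.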

Depending on the change, the middleware uses one of three synchronization strategies:
(1) Few data items with small content: the changed data is synchronized directly.
(2) Few data items that contain large data types such as text or binary data: the data is compressed first and then synchronized, which reduces network bandwidth usage and transmission time.
(3) Many data items: the middleware takes the SQL statement that caused the change, parses it, analyzes its execution plan and execution cost, and then decides whether to synchronize the data itself or the SQL statement to the other databases. This is very useful when adjusting table structure or changing data in batches.
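The three strategies above can be expressed as a small decision function. This is a sketch under assumptions: the row-count threshold `MANY_ROWS` and the two cost estimates are hypothetical inputs (the source gives no numbers), and `zlib` stands in for whatever compression the real middleware uses.

```python
import zlib

# Hypothetical threshold: the source does not say where "few" ends and "many" begins.
MANY_ROWS = 1000


def choose_strategy(row_count: int, has_large_types: bool,
                    data_transfer_cost: float, sql_replay_cost: float) -> str:
    """Pick a strategy; both cost estimates are assumed to be in the same unit."""
    if row_count < MANY_ROWS:
        # Cases (1) and (2): few rows, so ship the data itself.
        return "compress_then_sync" if has_large_types else "sync_data"
    # Case (3): many rows, so ship whichever is cheaper, the rows or the SQL statement.
    return "sync_sql" if sql_replay_cost < data_transfer_cost else "sync_data"


def build_payload(raw: bytes, strategy: str) -> bytes:
    # Only case (2) compresses; text and binary columns often compress well.
    return zlib.compress(raw) if strategy == "compress_then_sync" else raw
```

Shipping the SQL statement pays off for something like `UPDATE orders SET status = 'closed' WHERE year = 2003`, where one short statement replaces millions of changed rows on the wire.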
3. Advantages and disadvantages
Advantages:
(1) Strong scalability: when the system needs more database processing capacity, it can be scaled simply by adding database servers.
(2) Maintainability: when a node fails, the system detects the failure automatically and transfers the failed node's workload to other nodes, so the database keeps running.
(3) Security: because the data is synchronized across multiple servers, the data set is stored redundantly, and the multiple copies protect it against loss. In addition, the databases sit on the internal network behind the middle layer, which better protects them.
(4) Ease of use: the cluster is completely transparent to the application, which sees only a single IP address.

Disadvantages:
(1) The load cannot be distributed according to the processing capacity of the Web server.
(2) If the load balancer (the control end) fails, the entire database system goes down.
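Advantage (2), automatic failover, and disadvantage (2), the balancer as a single point of failure, both show up in a small sketch. `FailoverRouter` is hypothetical: it assumes failures are reported to it (for example by a health check) and simply skips failed nodes in round-robin order.

```python
class FailoverRouter:
    """Hypothetical control end that routes around failed database nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = {n: True for n in self.nodes}
        self._next = 0

    def report_failure(self, node) -> None:
        # Assumed to be called by a health check when a node stops responding.
        self.healthy[node] = False

    def report_recovery(self, node) -> None:
        self.healthy[node] = True

    def route(self):
        # Round-robin over the nodes, skipping any marked unhealthy.
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next % len(self.nodes)]
            self._next += 1
            if self.healthy[node]:
                return node
        raise RuntimeError("no healthy database node")
```

Nothing in this class protects the router itself: if the process running it dies, every node becomes unreachable, which is exactly disadvantage (2).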

 

2. Read-write separation of the database
1. Implementation principle: read-write separation simply means sending read operations and write operations to different database servers, which effectively reduces the load on each database, including disk I/O pressure. The master database handles writes and the slave databases handle reads; in many systems, most operations are reads. When the master database performs a write, the data must be synchronized to the slave databases to keep the data complete and consistent.

(Figure: eBay's read-write separation; eBay's read-write ratio is 260:1)

(Figure: Microsoft database distribution)
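At the application level, read-write separation comes down to choosing a server per statement. The sketch below is a hedged illustration: `ReadWriteRouter` is hypothetical, server names are plain strings instead of connections, and treating every statement that starts with `SELECT` as a read is deliberately naive (for example, `SELECT ... INTO` creates a table and is really a write).

```python
import itertools


class ReadWriteRouter:
    """Hypothetical router: writes go to the master, reads rotate over the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        # With no slaves configured, reads fall back to the master.
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql: str):
        is_read = sql.lstrip().upper().startswith("SELECT")  # naive classification
        if is_read and self._slaves is not None:
            return next(self._slaves)
        return self.master
```

Because reads rotate across slaves, a read issued right after a write may hit a slave that has not yet received the change; that is the data-freshness problem listed among the disadvantages of this approach.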

2. Implementation method: in MS SQL Server, database replication can be configured by defining publications, which provides read-write separation. Replication is the technology of copying a set of data from one data source to multiple data sources; it is an efficient way to publish data to multiple storage sites. With replication, users can publish a piece of data to multiple servers, and the copies in the different locations are updated automatically, which keeps the data consistent. SQL Server offers three types of replication: snapshot replication, transactional replication, and merge replication. SQL Server handles replication mainly through publications and subscriptions. The server holding the source data is the Publisher, which publishes the data. The Publisher sends a copy of every change to the published data to the Distributor, which contains a distribution database that receives all the changes, stores them, and distributes them to the Subscribers.
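The Publisher, Distributor, and Subscriber roles can be illustrated with a toy model of the data flow in transactional replication. This is not SQL Server's API (in SQL Server, replication is configured through Management Studio or the replication stored procedures); all three classes are hypothetical in-memory stand-ins, and `distribute()` stands for the periodic delivery run.

```python
from collections import deque


class Subscriber:
    """Read-only copy that receives changes."""

    def __init__(self):
        self.rows = {}

    def apply(self, change) -> None:
        key, value = change
        self.rows[key] = value


class Distributor:
    """Holds queued changes, standing in for the distribution database."""

    def __init__(self):
        self.queue = deque()
        self.subscribers = []

    def store(self, change) -> None:
        self.queue.append(change)

    def distribute(self) -> None:
        # Deliver queued changes to every subscriber in commit order.
        while self.queue:
            change = self.queue.popleft()
            for sub in self.subscribers:
                sub.apply(change)


class Publisher:
    """Source server: applies writes locally and forwards each change."""

    def __init__(self, distributor: Distributor):
        self.rows = {}
        self.distributor = distributor

    def write(self, key, value) -> None:
        self.rows[key] = value
        self.distributor.store((key, value))
```

Between `write` and the next `distribute`, the subscribers still hold the old data, which is the replication delay that makes reads from the slave servers lag behind the master.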

3. Advantages and disadvantages
(1) Poor data freshness: data is not synchronized to the read-only servers in real time. After data is written to the master server, it cannot be queried on the read-only servers until the next synchronization.

(2) Poor synchronization efficiency with large data volumes: when a single table holds too much data, inserts and updates become very slow because of index maintenance and disk I/O.

(3) Connections to multiple (at least two) databases at the same time: the program must connect to at least two databases and decide in its own code which one handles each read or write, which is easy to get wrong.

(4) High read performance, reliability, and scalability: because the read-only servers perform no writes, disk I/O pressure drops sharply and efficiency improves greatly. The read-only servers can themselves be load balanced, and the master database can publish to multiple read-only servers, which makes read capacity scalable.

 

3. Database/data table splitting (distribution)

Based on specific conditions, the data stored in one database is distributed across multiple databases to achieve distributed storage, and routing rules direct each access to the right database. Accesses are then spread over N servers instead of a single database, which reduces the load on any one machine. Tip: starting with SQL Server 2005, table partitioning is supported natively.

Vertical splitting: splitting by functional module, for example into an order database, a product database, and a user database. The databases then have different table structures.
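With a vertical split, routing can be as simple as a lookup from table name to database. The mapping below is hypothetical; the table and database names are illustrative only.

```python
# Hypothetical table-to-database mapping for a vertical (functional) split.
TABLE_TO_DB = {
    "orders": "order_db",
    "order_items": "order_db",
    "products": "product_db",
    "users": "user_db",
}


def database_for_table(table: str) -> str:
    """Return the functional database that owns `table`."""
    try:
        return TABLE_TO_DB[table.lower()]
    except KeyError:
        raise ValueError(f"no database configured for table {table!r}") from None
```

A query joining `orders` and `users` now spans two databases and can no longer run as a single SQL join, which is why tightly coupled systems suit this kind of split poorly.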

Horizontal splitting: the rows of one table are stored in blocks across different databases, and the table structures in these databases are exactly the same.

(Figure: vertical split)

(Figure: horizontal split)

 

1. Implementation principle: whether to use vertical splitting depends mainly on whether the application suits it. If the system divides cleanly into, say, an order system, a product management system, and a user management system, vertical splitting is easy and spreads the database load well. Systems whose business modules are not clearly separated, or that are highly coupled through table joins, are not suited to this method. Vertical splitting also cannot solve every load problem. Suppose the order table has 50 million rows: the order database is still under heavy load. Every time a row is inserted into this table, the database must also maintain the table's indexes, and index maintenance on a 50-million-row table is not negligible. If instead we divide the table into 100 tables, table_001 to table_100, the 50 million rows are spread evenly and each sub-table holds only 500,000 rows. Maintaining the indexes after an insert into a 500,000-row table is much cheaper, which greatly improves the runtime efficiency of the database and increases its concurrency. This kind of splitting is horizontal splitting.
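The arithmetic in this example, and one way to decide which of the 100 sub-tables a row belongs to, can be sketched as follows. The paragraph does not say how rows are assigned to sub-tables, so taking `order_id` modulo 100 is an assumption, and `table_for` is a hypothetical helper.

```python
TOTAL_ROWS = 50_000_000  # the 50-million-row order table from the example
TABLE_COUNT = 100        # table_001 ... table_100


def table_for(order_id: int) -> str:
    """Assumed rule: order_id modulo 100, with remainder 0 going to table_100."""
    n = order_id % TABLE_COUNT or TABLE_COUNT
    return f"table_{n:03d}"


rows_per_table = TOTAL_ROWS // TABLE_COUNT  # 500,000 rows per sub-table
```

For example, `table_for(12345)` returns `"table_045"` and `table_for(100)` returns `"table_100"`; with sequential ids, every sub-table receives the same number of rows.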

2. Implementation method: vertical splitting is relatively simple to implement; the program accesses a different database depending on the table name. Horizontal splitting has many possible rules; some common ones are listed below.

(1) Sequential splitting: for example, orders can be divided by year, placing 2003 orders in db1, 2004 orders in db2, and so on. Data can also be split by primary-key ranges.

Advantages: data can be migrated in parts.

Disadvantages: data is distributed unevenly; there might be 1 million orders in 2003 and 5 million in 2008.

(2) Hash modulo splitting: hash the user_id (or use the user_id directly if it is numeric), then take it modulo a fixed number. For example, if the application needs to split one database into 4 databases, take the hash value of user_id modulo 4, that is, user_id % 4. Each operation then has four possible results: 1 maps to DB1, 2 maps to DB2, 3 maps to DB3, and 0 maps to DB4, so the data is distributed very evenly across the 4 DBs.
Advantages: data is distributed evenly.
Disadvantages: migrating data is troublesome, because changing the number of databases changes the modulo result for most rows; data cannot be apportioned according to machine performance.
(3) Storing the database configuration in an authentication database: set up a separate DB that stores the mapping between each user_id and its DB. Every access must first query this DB to find the target database, and only then can the actual query be executed.
Advantages: high flexibility; each user maps one-to-one to a database.
Disadvantages: every query requires one extra query beforehand, which costs some performance.
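The three horizontal-splitting rules above can be sketched side by side. All function and class names are hypothetical; the routing database of rule (3) is modelled as an in-memory dict, and `moved_fraction` is an illustrative helper that measures the migration cost mentioned under rule (2).

```python
from datetime import date

# (1) Sequential splitting by year: db1 holds 2003 orders, db2 holds 2004, and so on.
FIRST_YEAR = 2003


def db_by_year(order_date: date) -> str:
    index = order_date.year - FIRST_YEAR + 1
    if index < 1:
        raise ValueError("no database configured for orders before 2003")
    return f"db{index}"


# (2) Hash modulo: user_id % 4, remainders 1..3 -> DB1..DB3, remainder 0 -> DB4.
DB_COUNT = 4


def db_by_modulo(user_id: int, db_count: int = DB_COUNT) -> str:
    remainder = user_id % db_count
    return f"DB{remainder or db_count}"


def moved_fraction(old_count: int, new_count: int, sample: int = 100_000) -> float:
    """Share of user_ids 1..sample that land on a different DB after resizing."""
    moved = sum(
        db_by_modulo(uid, old_count) != db_by_modulo(uid, new_count)
        for uid in range(1, sample + 1)
    )
    return moved / sample


# (3) Lookup table: a separate DB maps each user_id to its database.
class RoutingDirectory:
    """Stand-in for the routing (authentication) database."""

    def __init__(self, mapping: dict[int, str]):
        self._mapping = mapping
        self.lookups = 0  # counts the extra query paid on every access

    def db_for_user(self, user_id: int) -> str:
        self.lookups += 1
        try:
            return self._mapping[user_id]
        except KeyError:
            raise ValueError(f"user {user_id} has no database assigned") from None

    def assign(self, user_id: int, db: str) -> None:
        # Flexibility: a user moves to another DB by updating one mapping entry.
        self._mapping[user_id] = db
```

With these definitions, `moved_fraction(4, 5)` evaluates to 0.8: growing from 4 to 5 databases changes the target of 80% of user_ids, which is why migration under rule (2) is troublesome, while rule (3) can move any single user by updating one entry.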

This article is from http://blog.csdn.net/dinglang_2009/ and http://www.cnblogs.com/dinglang/ (please credit the source when reposting).
