Database clusters, continued

A database cluster, by definition, uses two or more database servers to form a single virtual, logical database image that provides transparent data services to clients, just as a single database system would.

1. Synchronous replication

After a database client sends a data update request, the cluster waits until every node has applied the update, and only then returns the result to the client.

2. Asynchronous replication

After a database client sends a data update request, the node that accepts it (usually the primary database) immediately returns a result to the client. The updated data is replicated to the other nodes in the cluster one by one at the next transmission window, so consistency is weak.
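The difference between the two modes can be sketched with in-memory stand-ins for the servers. This is only an illustration under stated assumptions: `Node`, `Cluster`, `update_sync`, `update_async`, and `ship_pending` are hypothetical names, not part of any real database product.

```python
from collections import deque


class Node:
    """In-memory stand-in for one database server (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Cluster:
    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self.pending = deque()  # updates accepted but not yet shipped to replicas

    def update_sync(self, key, value):
        # Synchronous: every node applies the update before the client gets a reply.
        self.primary.apply(key, value)
        for replica in self.replicas:
            replica.apply(key, value)
        return "ok"

    def update_async(self, key, value):
        # Asynchronous: reply as soon as the primary has applied the update.
        self.primary.apply(key, value)
        self.pending.append((key, value))
        return "ok"

    def ship_pending(self):
        # Runs at the next transmission window; until then replicas may be stale.
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica.apply(key, value)


primary, replica = Node("primary"), Node("replica")
cluster = Cluster(primary, [replica])

cluster.update_sync("balance", 100)
sync_visible = replica.data.get("balance")   # replica already has the new value

cluster.update_async("balance", 250)
stale_value = replica.data.get("balance")    # replica still holds the old value
cluster.ship_pending()
fresh_value = replica.data.get("balance")    # replica has caught up
```

The window between `update_async` returning and `ship_pending` running is the weak-consistency period described above.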

3. Connection-based load balancing

This kind of load balancing is relatively simple to implement. When a client logs in, a load-balancing algorithm selects one database in the cluster for that login, and all subsequent requests from that client are sent to that database.
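As a rough sketch of this idea, the balancer below picks a node once per login and pins the session to it. Round-robin is only one possible algorithm, and `ConnectionBalancer`, the node names, and the client names are all hypothetical.

```python
import itertools


class ConnectionBalancer:
    """Chooses a node at login time; later requests follow the session."""

    def __init__(self, nodes):
        self._cycle = itertools.cycle(nodes)  # round-robin, assumed for illustration
        self._session_node = {}

    def login(self, client_id):
        node = next(self._cycle)
        self._session_node[client_id] = node
        return node

    def route(self, client_id, request):
        # The request content is never inspected: routing depends on the login only.
        return self._session_node[client_id]


balancer = ConnectionBalancer(["db1", "db2", "db3"])
logins = [balancer.login(c) for c in ("alice", "bob", "carol", "dave")]
alice_routes = {
    balancer.route("alice", sql)
    for sql in ("SELECT 1", "UPDATE t SET x = 1")
}
```

Because the choice is made only once, a client that issues many heavy requests keeps loading the same node for the life of its session.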

4. Request-based load balancing

This kind of load balancing is more complex to implement but more powerful. When a client logs in, the cluster gateway also logs in to every database node in the cluster. The gateway then analyzes each subsequent client request and sorts it into one of two categories: queries are dispatched by the load-balancing algorithm to a single node, while data update requests are executed by a master node and synchronized in real time to every node in the cluster.
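A minimal sketch of such a gateway follows. It assumes, purely for illustration, that any statement starting with `SELECT` is a query and everything else is an update; a real gateway would need a proper SQL parser. `ClusterGateway` and the node names are hypothetical.

```python
import itertools


class ClusterGateway:
    """Routes each request individually instead of once per connection."""

    def __init__(self, master, nodes):
        self.master = master
        self.nodes = nodes                      # every node, master included
        self._readers = itertools.cycle(nodes)  # round-robin for queries

    @staticmethod
    def is_query(sql):
        # Naive classification, assumed for this sketch only.
        return sql.lstrip().upper().startswith("SELECT")

    def execute(self, sql):
        """Return the list of nodes that run this statement."""
        if self.is_query(sql):
            return [next(self._readers)]
        # Updates run on the master first, then on every other node in real time.
        return [self.master] + [n for n in self.nodes if n != self.master]


gateway = ClusterGateway("db1", ["db1", "db2", "db3"])
first_query = gateway.execute("SELECT * FROM orders")
second_query = gateway.execute("  select count(*) from orders")
update_targets = gateway.execute("UPDATE orders SET paid = 1 WHERE id = 7")
```

Unlike the connection-based approach, two consecutive queries from the same client can land on different nodes.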

a. Differences from distributed database systems

  1. Some database clusters hold a single data set, some hold two or more similar data sets, and some hold two or more data sets that are kept consistent in real time. A distributed database system usually holds completely different data sets on its nodes.
  2. A homogeneous database cluster often requires every node to run the same operating system and database system version, sometimes down to the same patch level. A distributed database system can be heterogeneous, mixing different operating systems and different database systems.
  3. A database cluster is usually built on a high-speed local area network. A distributed database system can run on a high-speed local area network or on a remote network spanning departments and organizations in different locations.

II. Technical metrics of a database cluster

Because the database system is the core of any information system, users care about the following three points in addition to business logic:

1. System performance

Performance depends on hardware, software, the network, application architecture, code quality, and more. If the database cluster provides load balancing and automatic optimization, however, overall system performance benefits considerably.

2. Data reliability

This is the likelihood of losing data when any fault occurs in the system (in the operating system, the database engine, a hard disk, a disk array, or the storage network). Some systems are bound by their design principles to carry a theoretical risk of data loss, while others, thanks to redundant design, can in theory guarantee zero data loss. In disaster recovery terms this is similar to RPO (Recovery Point Objective), though not exactly the same.

3. Service availability

This is the likelihood that the whole system stops providing data and services to the outside when any fault occurs (in the operating system, the database engine, a hard disk, a disk array, or the storage network). It is closely tied to data reliability: if a system can lose data in theory, such a loss will sooner or later force the whole system to stop service. In disaster recovery terms this is similar to RTO (Recovery Time Objective), though again not exactly the same.

III. Classification of database clusters

In the market, "database cluster" is a general term with no authoritative definition, and vendors tend to launch cluster solutions with whatever characteristics suit them. Broadly, there are the following four kinds of cluster solution:

1. Clusters based on serial data replication

Serial replication was originally used for data transfer and backup, and is some distance from what people usually understand by "database cluster". However, rapid advances in computer hardware, software, and network communication have made it feasible to build a "database cluster" from this technology. Such clusters fall into two categories:

a. Serial asynchronous replication

These clusters replicate data serially and asynchronously, mainly by shipping database transaction logs or transferring disk blocks. SQL Server's built-in replication and mirroring, the AlwaysOn feature new in SQL Server 2012 (with a readable standby), and some third-party mirroring technologies all belong to this category. In essence they are data backup technologies and products. Transaction log shipping (Log Shipping) serves as an example. After the primary database completes a transaction, it generates a transaction log record; the records pass through a FIFO queue into the standby database, which applies them to obtain the backup data. This approach has the following defects:

a) The primary database processes transactions in parallel, but log replication is serial, and the standby database also applies the log serially. The FIFO queue can therefore overflow at any time. Once it overflows, the queue must be rebuilt, which means re-establishing the standby database. For the average customer this is not practical.

b) Because log replication is asynchronous, the standby database is not consistent with the primary in real time; there is a time lag. If the standby database is used for load balancing, the application logic has a loophole and the data may become inconsistent.

c) Because of the time lag between primary and standby data, data will in theory be lost if the primary database fails. The database then has to be restored manually at high labor cost, or the data simply cannot be recovered.

d) The performance impact on the primary host is generally between 15% and 25%, according to tests.
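Defect a) can be illustrated with a toy model of the log queue. The sketch assumes a bounded FIFO queue and a standby that applies one record for every three the primary produces; the class name, the capacity, and the rates are made up for the example and do not describe how any real product sizes its queue.

```python
from collections import deque


class LogShippingQueue:
    """Bounded FIFO queue between the primary's log and the standby (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.needs_rebuild = False

    def enqueue(self, log_record):
        if self.needs_rebuild:
            return False                 # shipping is broken until the standby is rebuilt
        if len(self.queue) >= self.capacity:
            self.queue.clear()           # overflow: the log chain is no longer complete
            self.needs_rebuild = True
            return False
        self.queue.append(log_record)
        return True

    def apply_next(self, standby):
        # The standby applies records strictly in order, one at a time.
        if self.queue:
            standby.append(self.queue.popleft())


standby = []
shipping = LogShippingQueue(capacity=3)
for txn in range(1, 8):
    shipping.enqueue(f"txn-{txn}")       # the primary commits one transaction per step
    if txn % 3 == 0:                     # the standby keeps up with only every third one
        shipping.apply_next(standby)
```

In this run the primary has committed seven transactions while the standby holds one, and the queue has overflowed, which corresponds to the point where the standby would have to be re-established.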

b. Serial synchronous replication

Such clusters usually require expensive dedicated hardware. The principle is as follows:

Using a dedicated high-speed network and software, every database request is replicated synchronously, and the result is returned to the client only after the request has executed correctly on both the primary and standby database servers. The characteristics of such a system are:

a) The primary and standby databases are forced into synchronous, serial processing, so performance is limited.

b) Any problem on the standby database forces the transaction to roll back, so the whole system is only about half as reliable as a stand-alone system.

c) Because of the above problems, this backup method is only suitable for short-distance fiber links (within five miles).

d) The dedicated system is expensive and also carries the significant deficiencies described above, so it is rarely used in the market.

2. Fault tolerance based on shared storage

Technically, fault tolerance is better suited to stateless applications, or to applications with little state to carry across a switchover, where the goal is high availability at the application level. It is not well suited to switchover at the database level.

In this structure, two servers usually share one disk array and one virtual IP, which clients use to reach the database, so that the pair presents a single logical database image. The purpose of this so-called database cluster is that when the primary host fails, the standby detects it through a heartbeat mechanism and takes over from the primary. On the market this solution is known as a "dual-machine cluster" or "hot standby", "dual machine" for short; Microsoft calls it "failover clustering". It has the following characteristics:

a. It simply carries the fault-tolerant, high-availability approach of stateless systems (typically web servers) over to database switchover.

b. The system holds only a single data image, stored on the shared disk array, so the disk array becomes a single point of failure for the system.

c. Because there is only a single data image, an ordinary backup or replication method is still needed to obtain a second copy and keep the data safe. All the drawbacks of replication and backup therefore apply to such systems as well.

d. There is no load balancing between the primary and standby hosts. Under normal conditions the standby sits idle, which wastes the user's investment.

e. When a failure triggers a switchover, the switch often takes a long time. More seriously, in-flight user transaction data may be lost, so that the system is forced out of service, the data must be repaired manually, or the data can never be recovered.
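The heartbeat-driven switchover described above can be sketched as follows. Timestamps are passed in explicitly so the example is deterministic; `HeartbeatMonitor`, the 3-second timeout, and the role names are assumptions made for illustration, not the behavior of any particular failover product.

```python
class HeartbeatMonitor:
    """Runs on the standby and watches heartbeats from the primary (hypothetical)."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_heartbeat = 0.0
        self.active = "primary"          # which host currently owns the virtual IP

    def record_heartbeat(self, now):
        self.last_heartbeat = now

    def check(self, now):
        silent_for = now - self.last_heartbeat
        if self.active == "primary" and silent_for > self.timeout_seconds:
            # The primary is presumed dead: the standby takes over the
            # virtual IP and mounts the shared disk array.
            self.active = "standby"
        return self.active


monitor = HeartbeatMonitor(timeout_seconds=3.0)
monitor.record_heartbeat(now=0.0)
monitor.record_heartbeat(now=1.0)
before_failure = monitor.check(now=2.0)   # heartbeat is 1 second old
after_failure = monitor.check(now=5.0)    # heartbeat is 4 seconds old
```

Note that the monitor only moves the service; any transaction that was in flight on the primary when it failed is not recovered by the switch, which is the risk described in item e.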

3. Systems represented by Oracle RAC

RAC stands for Real Application Clusters, that is, a true application-level cluster. The point to focus on is "application level". Oracle introduced RAC to relieve the growing performance pressure on database systems. Its basic structure is as follows:

Such systems are designed specifically to address database performance. They use a shared disk array and are structurally similar to the fault-tolerant clusters described above. The difference is that the database nodes do not rely on a simple heartbeat; instead, Oracle defines its own complex information exchange protocol to allocate client database requests dynamically. Its characteristics are:

a. It is an application-level cluster for the Oracle database management system (from the operating system's point of view a DBMS is an "application", hence "application-level cluster"), designed specifically to improve database performance.

b. The system holds only a single data image, stored on the shared disk array, so the disk array becomes a single point of failure for the whole system.

c. Configuration and management are complex.

d. Because there is only a single data image, an ordinary backup or replication method is still needed to obtain a second copy and keep the data safe. All the drawbacks of replication and backup therefore apply to such systems as well.

e. Because database systems are I/O intensive, disk I/O performance is the key area to improve in a RAC system.

f. Depending on the database application, some workloads perform better while others may perform worse.

 

1. Why use a database cluster

        (1) A database cluster makes read/write splitting possible, which improves the performance of the database system.

        As is well known, MySQL supports distributed deployment. The most powerful feature of MySQL Proxy is read/write splitting. The basic principle is to let the master database handle transactional queries while the slave databases handle SELECT queries. Database replication then synchronizes the changes made by transactional queries to the slaves in the cluster, keeping their data consistent with the master's. Of course, the master can also serve queries.

The point of read/write splitting is simply to relieve the load on the servers.

 

—————————————————————————————————————————————————————————

   Why does read/write splitting improve database performance? (Taken from the web)

        1. More physical servers mean more capacity to carry load.
        2. The master handles only writes and the slaves only reads, which greatly eases contention between X (exclusive) locks and S (shared) locks.
        3. The slaves can be configured with the MyISAM engine, which improves query performance and reduces system overhead.
        4. A slave synchronizing data from the master is different from writing directly to the master: the slave restores the data from the binlog that the master sends. The most important difference, however, is that the master sends the binlog to the slaves asynchronously, and the slaves also apply it asynchronously.
        5. Read/write splitting suits scenarios where reads far outnumber writes. With only one server and many SELECT statements, UPDATE and DELETE statements are blocked by the SELECTs accessing the same data and must wait for them to finish, so concurrency is poor. For applications with a similar proportion of reads and writes, two masters replicating to each other should be deployed instead.
        6. The slaves can be started with options that improve read performance, for example --skip-innodb, --skip-bdb, --low-priority-updates, and --delay-key-write=ALL. Whether to set them depends on the specific business needs; they will not necessarily apply.
        7. Reads are shared out. Suppose we have 1 master and 3 slaves and, ignoring the slave-only settings mentioned above, there are 10 writes and 150 reads per minute. The 1 master and 3 slaves then perform 40 writes in total, because every server applies each write, while the total number of reads is unchanged. On average each slave therefore handles 10 writes and 50 reads (the master serves no reads). The write load per server has not changed, but sharing out the reads greatly improves system performance. Sharing out the reads also indirectly improves write performance. Overall performance therefore improves; put bluntly, machines and bandwidth are traded for performance. The official MySQL documentation gives the relevant formula: see FAQ 6.9, "When and how much can MySQL replication improve system performance".
        8. Another major function of MySQL replication is to add redundancy and improve availability: when one database server goes down, service can be restored as quickly as possible by switching to another slave. Performance is therefore not the only consideration, and even 1 master with 1 slave is worthwhile.
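The arithmetic in item 7 can be checked with a few lines of Python. The function name and its parameters are hypothetical; the numbers are the ones used in the example (1 master, 3 slaves, 10 writes and 150 reads per minute, with the master serving no reads).

```python
def per_server_load(writes, reads, slaves, master_serves_reads=False):
    """Per-minute load in a one-master replication setup."""
    readers = slaves + (1 if master_serves_reads else 0)
    return {
        # Every server, master and slaves alike, must apply every write.
        "total_writes": writes * (1 + slaves),
        "writes_per_server": writes,
        # Reads are spread evenly over the servers that accept them.
        "reads_per_reader": reads / readers,
    }


load = per_server_load(writes=10, reads=150, slaves=3)
```

With these inputs `load` holds 40 total writes, 10 writes per server, and 50 reads per slave, matching the figures above. Adding slaves lowers only the read share; the write load on each server stays the same.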

——————————————————————————————————————————————————————————

2. What is the difference between a database cluster and a distributed database?

        In one sentence: a distributed system works in parallel, while a cluster works in series.

       1: "Distributed" means that different services are placed in different locations. "Cluster" means that several servers are grouped together to provide the same service. Every node in a distributed system can itself be a cluster, but a cluster is not necessarily distributed.

        For example, take Sina: when it has many visitors, it can set up a cluster, placing a dispatch server in front and several servers behind it that provide the same service. When a request arrives, the dispatch server checks which server is lightly loaded and hands the request to that one. Distributed, understood in the narrow sense, is similar to a cluster, but its organization is looser. A cluster, by contrast, is organized: when one server goes down, another can take its place.

        In a distributed system each node performs a different service, so when a node goes down, the service on it becomes inaccessible.

        2: Simply put, distribution improves efficiency by shortening the execution time of a single task, while clustering improves efficiency by increasing the number of tasks executed per unit time.

        Example: suppose a task consists of 10 sub-tasks, each taking 1 hour on its own, so the task takes 10 hours on a single server.

        A distributed solution provides 10 servers, each handling only one sub-task. Ignoring dependencies between sub-tasks, the task completes in just 1 hour. (A typical representative of this mode of operation is the Hadoop Map/Reduce distributed computing model.)

        A cluster solution also provides 10 servers, each able to handle the whole task independently. Suppose 10 tasks arrive at the same time: the 10 servers work simultaneously, and after 10 hours all 10 tasks complete together. Seen as a whole, that is still one task completed per hour!
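The example above can be worked through numerically. The sketch assumes, as stated in the example, that a task has 10 one-hour sub-tasks with no dependencies between them and that 10 servers are available; the variable names are made up for illustration.

```python
import math

SUBTASKS_PER_TASK = 10
HOURS_PER_SUBTASK = 1
SERVERS = 10

# One server runs all sub-tasks of a task back to back.
single_server_hours = SUBTASKS_PER_TASK * HOURS_PER_SUBTASK

# Distributed: the sub-tasks of ONE task are spread over the servers,
# so the latency of that single task shrinks.
distributed_hours = math.ceil(SUBTASKS_PER_TASK / SERVERS) * HOURS_PER_SUBTASK

# Cluster: each server runs a WHOLE task, so the latency of one task is
# unchanged, but many tasks are in progress at once.
tasks_arriving = 10
cluster_hours = math.ceil(tasks_arriving / SERVERS) * single_server_hours
cluster_tasks_per_hour = tasks_arriving / cluster_hours
```

Under these assumptions a single task finishes in 1 hour in the distributed setup, whereas the cluster finishes 10 tasks in 10 hours: the same average of one task per hour, reached by raising throughput rather than cutting the latency of any one task.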


Origin www.cnblogs.com/klb561/p/11369713.html