How should the CAP theorem be applied?

For architects and engineers who design distributed systems, mastering the CAP theorem is essential.

(Note: the focus of this article is not to discuss CAP theory in detail, but to talk about how CAP plays a guiding role in microservice development, explained through several examples of developing microservices, keeping as close to real development as possible.)

The CAP theorem, also known as Brewer's theorem, was proposed as a conjecture by Eric Brewer, a computer scientist at the University of California, and was later proved and became a recognized theorem in the field of distributed computing. However, Brewer did not give detailed definitions of the three properties, Consistency, Availability, and Partition tolerance, when he proposed CAP, so many different interpretations of CAP have since appeared on the Internet.

CAP theorem

There are two versions of the CAP theorem; in development we follow the second one:

In a distributed system (a set of nodes that are interconnected and share data), when read and write operations are involved, only two of the three properties consistency (Consistency), availability (Availability), and partition tolerance (Partition Tolerance) can be guaranteed; the remaining one must be sacrificed.


This version of the CAP theorem is about distributed systems with emphasis on two points: interconnected and sharing data. It actually patches some shortcomings of the first version's "pick two of three": distributed systems are not necessarily interconnected and sharing data. For example, a memcached cluster has nodes that neither connect to each other nor share data, so a memcached cluster is, strictly speaking, not an object discussed by the CAP theorem; a MySQL cluster, on the other hand, is interconnected and shares data through replication, so a MySQL cluster is an object the CAP theorem does discuss.

Consistency

Consistency means that no matter which node serves the request, a read operation performed after a write operation must return the value of that write.

Availability

Non-faulty nodes return a reasonable response within a reasonable time.

Partition tolerance

When a network partition occurs, the system can still continue to perform its duties.

In a distributed environment the network can never be 100% reliable and failures may occur, so partition tolerance is a property we must accept. If we chose CA and gave up P, then when a partition occurs, guaranteeing C requires the system to forbid writes, which conflicts with A; and if we guarantee A instead, data can be written normally in one partition but not in the faulty one, which conflicts with C. Therefore a distributed system can never, in theory, choose a CA architecture; it must choose CP or AP.

BASE theory for distributed transactions

BASE theory extends and supplements CAP; it is a complement to the AP option of CAP, describing how better eventual consistency (the C) can still be achieved even when AP is chosen.

BASE is an abbreviation of three phrases: Basically Available, Soft state, and Eventual consistency. Its core idea is that even when strong consistency cannot be achieved, an application can still reach eventual consistency by suitable means.

Practical examples of applying CAP in microservices


If CAP still seems abstract after all this talk, you can refer to Li Yunhua's book "Learn Architecture from Scratch", whose chapters 21 and 22 describe the theoretical details of CAP and the evolution of its versions in more depth.

The focus here is how the CAP theorem guides and is applied in our microservices, illustrated through a few common everyday examples.


Service registry: choose AP or CP?

The problem a service registry solves

Before discussing CAP, let us first be clear about the main problems a service registry solves: service registration and service discovery.

  • Service registration: a service instance registers its own information with the registry. This information includes the host IP and port of the service instance, the status the service exposes, and the access protocol.

  • Service discovery: a service instance asks the registry for the information of the service instances it depends on, obtains the registered instance information from the registry, and uses that information to request the services they provide (a minimal sketch of this model follows below).
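
A minimal sketch of the data involved, using a hypothetical ServiceInstance record and Registry interface; the names are illustrative and do not correspond to any specific registry's API:

```java
import java.util.List;

/** Illustrative model of what a registry stores and returns; not a real registry API. */
public class RegistryModel {

    /** The information a service instance registers about itself. */
    public record ServiceInstance(
            String serviceName,   // e.g. "order-service"
            String host,          // instance IP
            int port,             // instance port
            String protocol,      // e.g. "http", "dubbo"
            String status) {      // e.g. "UP"
    }

    /** The two responsibilities of a registry: registration and discovery. */
    public interface Registry {
        void register(ServiceInstance instance);              // service registration
        List<ServiceInstance> discover(String serviceName);   // service discovery
    }
}
```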


At present, the components commonly used as a registry roughly include: zookeeper for dubbo, eureka for Spring Cloud, consul, nameServer for RocketMQ, and nameNode for HDFS. The mainstream microservice frameworks today are dubbo and Spring Cloud, which use zookeeper and eureka respectively, so let us look at how a registry should be chosen based on the CAP theorem. (Spring Cloud can also use zookeeper, but since that is not the mainstream it is not discussed here.)

zookeeper chooses CP

zookeeper guarantees CP: a request to zookeeper at any time gets a consistent data result, and the system tolerates network partitions, but it cannot guarantee the availability of every request. Looking at the real situation, when zookeeper is used to obtain the service list, if the zk cluster is in the middle of a leader election, or if more than half of the machines in the cluster are unavailable, then no data can be obtained. So zookeeper cannot guarantee service availability.

eureka chooses AP

eureka guarantees AP. eureka was designed with availability first: all nodes are equal, and the failure of some nodes does not affect the work of the normal nodes; there is no leader-election process like zookeeper's. If registration with, or discovery through, one node fails, the client automatically switches to another node; as long as one eureka node is still alive, the registry service as a whole remains usable, but the information it returns may not be the latest.

Data consistency of zookeeper and eureka

One thing should be made clear: eureka was created from the start to be a registry, whereas zookeeper exists more as a distributed coordination service and was only given the role of a registry by dubbo because of its characteristics. zookeeper's duty is rather to keep data (configuration data, status data) consistent among all the services under its jurisdiction, so it is not hard to understand why zookeeper was designed as CP rather than AP. ZAB, zookeeper's core algorithm, exists precisely to solve the problem of keeping data synchronized and consistent among multiple nodes of a distributed system.

At a deeper level, zookeeper is built according to the CP principle, that is, it must keep the data of every node consistent. If a zookeeper node loses its network connection or the zookeeper cluster splits (for example, the subnets on either side of a switch cannot reach each other), zookeeper removes those nodes from its scope of management and the outside world can no longer access them, even though the nodes themselves are healthy and could serve requests normally; as a result, requests to those nodes are lost.

eureka has no such concern: its nodes are relatively independent and it does not need to consider data consistency. This comes from eureka being designed as a registry from birth; compared with zookeeper it drops leader election and transaction logs, which makes eureka easier to maintain and more robust in operation.


Let us look at what problems the data inconsistency of eureka actually causes for service registration. It amounts to no more than one node having one extra service instance registered, or another node missing one, which for a moment may cause some IPs to receive slightly more calls and other IPs slightly fewer. There may also be some dirty data that should have been deleted but has not been.


Summary: should a service registry choose AP or CP?

For service registration, even if the registration information stored for the same service differs across registry nodes, it does not cause catastrophic consequences. For service consumers, being able to consume the service is what matters most: even if the data obtained is not the latest, the consumer can simply retry after a failed attempt. That is better than pursuing data consistency, failing to get any instance information, and leaving the whole service unavailable.

So for service registration, availability of the data is more important than consistency: choose AP.

Distributed locks: choose AP or CP?

Here are three common ways to implement a distributed lock:

  • Distributed lock based on a database
  • Distributed lock based on redis
  • Distributed lock based on zookeeper

Distributed lock based on a database

Table structure


The table's UNIQUE KEY idx_lock (method_lock) is used as the unique constraint. To acquire the lock, an insert is performed: if the row is inserted successfully, the lock is acquired; if the database reports "Duplicate entry", the lock could not be acquired.
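
A minimal sketch of this idea in plain JDBC, under stated assumptions: a hypothetical table `lock` with a single column method_lock carrying the UNIQUE KEY idx_lock, and an illustrative JDBC URL:

```java
import java.sql.*;

public class DbLock {
    private final String url; // JDBC URL, e.g. "jdbc:mysql://host:3306/db" (illustrative)

    public DbLock(String url) { this.url = url; }

    /** Try to acquire the lock by inserting a row; the UNIQUE key on method_lock enforces mutual exclusion. */
    public boolean tryLock(String methodLock) {
        // assumed table: CREATE TABLE `lock` (method_lock VARCHAR(64), UNIQUE KEY idx_lock (method_lock))
        String sql = "INSERT INTO `lock` (method_lock) VALUES (?)";
        try (Connection conn = DriverManager.getConnection(url);
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, methodLock);
            ps.executeUpdate();
            return true;                                   // insert succeeded -> lock acquired
        } catch (SQLIntegrityConstraintViolationException dup) {
            return false;                                  // "Duplicate entry" -> someone else holds the lock
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }

    /** Release the lock by deleting the row. */
    public void unlock(String methodLock) {
        String sql = "DELETE FROM `lock` WHERE method_lock = ?";
        try (Connection conn = DriverManager.getConnection(url);
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, methodLock);
            ps.executeUpdate();
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }
}
```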


However, this approach relies on a single-master MySQL with no automatic master-slave failover, so it basically cannot provide partition tolerance (P); MySQL still has no fully mature solution for automatic master-slave switching. It can be said that this approach depends strongly on the availability of the database: writes go through a single point, and once the database goes down the lock becomes unusable. So this approach basically falls outside the scope of CAP.

Distributed lock based on redis

redis naturally processes commands in a single thread and serializes them, so serializing access is exactly what it is good at; implementing a distributed lock with it is straightforward.

Implementation:

SET key value NX EX expire_time
returns OK if the lock is acquired, nil if it fails
(plain SETNX key value returns 1 on success and 0 on failure, but the expiry then has to be set separately with EXPIRE, which is not atomic)
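
A minimal sketch of acquire and release using the Jedis client (API details may vary across Jedis versions; key and value names are illustrative). The release uses a small Lua script so that only the lock holder can delete the key:

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;
import java.util.Collections;
import java.util.UUID;

public class RedisLock {
    /** Try to acquire the lock: SET key token NX EX ttl; returns the token on success, null on failure. */
    public static String tryLock(Jedis jedis, String key, int ttlSeconds) {
        String token = UUID.randomUUID().toString();          // identifies this lock holder
        String reply = jedis.set(key, token, SetParams.setParams().nx().ex(ttlSeconds));
        return "OK".equals(reply) ? token : null;
    }

    /** Release the lock only if we still hold it (check-and-delete atomically via Lua). */
    public static boolean unlock(Jedis jedis, String key, String token) {
        String lua = "if redis.call('get', KEYS[1]) == ARGV[1] then "
                   + "return redis.call('del', KEYS[1]) else return 0 end";
        Object result = jedis.eval(lua, Collections.singletonList(key),
                                        Collections.singletonList(token));
        return Long.valueOf(1L).equals(result);
    }
}
```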

To solve the lack of master-slave failover, we can choose a redis cluster or the sentinel mode, which provide master-slave failover: when the master node fails, the sentinel nodes elect one of the slaves to become the new master.


In sentinel mode, failover is decided by the sentinel cluster that monitors redis: when the master is abnormal, replication is stopped and a slave is re-elected as the new master. During this re-election the sentinels do not care whether the slave's data is consistent with the master's.

So the redis replication mode belongs to AP. It guarantees availability: during replication the master has the data while a slave may not have it yet; at that moment, if the master goes down or a network jitter causes a switch to the slave, two business threads may end up acquiring the same lock at the same time.


The process is as follows:

  1. Business thread-1 requests the lock from the master node
  2. Business thread-1 acquires the lock
  3. Business thread-1 holds the lock and starts executing its business logic
  4. At this moment the lock just written to redis has not yet been synchronized from master to slave
  5. At this moment the redis master node goes down
  6. The redis slave node is promoted to master
  7. Business thread-2 requests the lock from the new master node
  8. Business thread-2 acquires the lock returned by the new master node
  9. Business thread-2 holds the lock and starts executing its business logic
  10. At this moment business thread-1 and business thread-2 are executing their business tasks simultaneously

The problem above is not really a defect of redis; it only shows that the AP model of redis by itself cannot satisfy our consistency requirement. Redis officially recommends the redlock algorithm to address this, but redlock needs at least three independent redis master instances, so the maintenance cost is relatively high; it is effectively yet another consistency algorithm built on top of several redis instances, it is more complicated, and it is also rarely used in industry.

Can you use redis as a distributed lock?

This is not really a question about redis itself; it depends on the business scenario. We first need to confirm whether our scenario suits AP or CP. For scenarios such as posting in a social feed, we do not need very strong transactional consistency, and the high-performance AP model redis offers is a very good fit; but for transaction-type scenarios that are very sensitive to data consistency, we may have to look for a more suitable CP model.

Distributed lock based on zookeeper

As analyzed just now, redis actually cannot guarantee data consistency, so let us see whether zookeeper fits the distributed lock we need. First of all, zookeeper is a CP model: when zookeeper grants us a lock, it guarantees that the lock exists on every node of the zk cluster.


(This is actually guaranteed by the zk leader committing write requests in two phases, and it is also a bottleneck point when the zk cluster grows large.)

How the zk lock works

Before talking about how zookeeper locks, let us look at a few properties of zookeeper nodes; the zookeeper distributed lock is built from these features.

Properties:

  • Sequential nodes

When sequential nodes are created under a parent directory such as /lock, they are created in strict order: lock000001, lock000002, lock000003, and so on; the sequence number strictly guarantees the order in which each node was generated, reflected in the node name.

  • Ephemeral nodes

An ephemeral node created by a client is automatically deleted by zookeeper when that client's session ends or times out.

  • Event listeners (watchers)

When reading data, we can set a watcher on a node; when the node changes (1. the node is created, 2. the node is deleted, 3. the node's data changes, 4. its child nodes change), zookeeper notifies the client.

Combining these properties, let us look at how zookeeper assembles them into a distributed lock.


  1. Business thread-1 and business thread-2 each apply to zookeeper to create an ordered ephemeral node under the /lock directory
  2. Business thread-1 happens to get /lock0001, the node with the smallest sequence number in the whole directory, so thread-1 obtains the lock
  3. Business thread-2 can only get /lock0002, which is not the smallest sequence node, so thread-2 fails to obtain the lock
  4. Business thread-1 keeps its connection to zookeeper and its heartbeat alive; the heartbeat maintains the lease of the lock held through lock0001
  5. When business thread-1 finishes its business, it releases the connection to zookeeper, which releases the lock

Implementing the zk distributed lock in code

The official zookeeper client does not implement a distributed lock directly; we have to write our own code that uses the zookeeper properties above, or use a client library that already provides the recipe.
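
A minimal sketch using the Apache Curator recipe InterProcessMutex, which implements exactly the pattern described above; the connection string and lock path are illustrative:

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;
import java.util.concurrent.TimeUnit;

public class ZkLockDemo {
    public static void main(String[] args) throws Exception {
        // connection string is illustrative
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "127.0.0.1:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // the mutex creates ordered ephemeral nodes under /lock/order and waits on the predecessor
        InterProcessMutex lock = new InterProcessMutex(client, "/lock/order");
        if (lock.acquire(3, TimeUnit.SECONDS)) {       // bounded wait for the lock
            try {
                // critical section: e.g. deduct stock, create the order
            } finally {
                lock.release();                        // deletes this client's ephemeral node
            }
        }
        client.close();
    }
}
```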


Summary: should a distributed lock use CP or AP?

First we must clearly understand the scenario in which the distributed lock is used: why we use it and what problem it helps us solve. Talk about the scenario first, then talk about the technology selection for the distributed lock.

Whether redis or zookeeper: the AP model of redis, for example, limits the scenarios it can be used in, yet of these options it has the best performance; the zookeeper distributed lock is much more reliable than redis, but its heavier mechanism makes its performance lower than redis, and zookeeper's performance drops even further as the cluster grows.

In short: understand the business scenario first, then do the technology selection.

Distributed transactions: how to let go of ACID and embrace CAP/BASE

When it comes to transactions, ACID is the traditional database design approach, which pursues a strong-consistency model. A relational database's ACID model of consistency plus high availability makes partitioning very hard, so ACID can no longer be supported as-is in microservices, and we have to return to CAP to look for a solution. From the discussion above, the CAP theorem only offers CP or AP: if we pursue data consistency and ignore availability, that certainly will not do in microservices; if we pursue availability and ignore consistency, then some important data (such as payment amounts) is bound to go wrong, which is also unacceptable. So we want both consistency and availability.


Neither can be fully achieved, but we can make a small compromise on consistency: instead of pursuing strong consistency we pursue eventual consistency, which is why BASE theory is introduced. For distributed transactions, the most important thing BASE provides is an eventual-consistency solution for the AP option of CAP: BASE sacrifices the emphasis on strong consistency in exchange for availability, allowing data to be inconsistent within a permitted window of time as long as it reaches consistency in the end.

Achieve eventual consistency

Weak consistency: the system does not guarantee that subsequent reads will return the updated value; the updated value is returned only after a number of conditions are met. The period from the start of the update until the system guarantees that any observer always sees the updated value is called the inconsistency window.

Eventual consistency: a special form of weak consistency. The storage system guarantees that if no new updates are made to an object, eventually all accesses will return the object's last updated value.

BASE model

The BASE model is the opposite of the traditional ACID model: unlike ACID, BASE sacrifices the emphasis on strong consistency to gain availability, allowing data to be inconsistent for a period of time as long as it agrees in the end. Its three components are:

  • Basically Available: basically available; partition failures are tolerated (for example, a database split into shards).
  • Soft state: soft state; the state may be out of sync for a while, and synchronization is asynchronous.
  • Eventually consistent: the data only has to be consistent in the end, not consistent at every moment.

Distributed Transaction

In a distributed system, to implement a distributed transaction there are essentially the following solutions. The approaches differ, but in fact they all follow BASE theory and are eventual-consistency models.

  • Two-phase commit (2PC)
  • Compensating transactions (TCC)
  • Local message table
  • MQ transactional message

Two-phase commit (2PC)

There are in fact XA transactions in databases, but nowadays they see little real use on the Internet; two-phase commit works on the XA principle.


The XA protocol is divided into two phases:

  1. The transaction manager asks each database involved in the transaction to pre-commit (precommit) the operation and report whether it can commit.
  2. The transaction coordinator asks each database to either commit its data or roll it back.

A word on why two-phase commit is rarely used in the Internet industry and has not been adapted to it: its biggest drawback is synchronous blocking. Once the resources are prepared, the resource managers keep them blocked until the commit completes and the resources are released. With the high concurrency and large data volumes of today's Internet, two-phase commit can no longer keep up with its development.

Moreover, although the two-phase commit protocol is designed for strong consistency of distributed data, data inconsistency is still possible. For example:

Suppose that in the second phase the coordinator sends out the Commit notification, but because of a network problem only part of the participants receive and execute the Commit, while the remaining participants, never having received the notification, stay blocked; at this point the data becomes inconsistent.

Compensating transactions (TCC)

TCC is a servitized variant of the two-phase model: each business service must implement three methods, try, confirm and cancel, which correspond to Lock, Commit and Rollback in a SQL transaction.


Compared with two-phase commit, TCC solves several problems:

Synchronous blocking: a timeout mechanism is introduced and compensation runs after a timeout; unlike two-phase commit, the whole resource is not locked but expressed as business logic, giving a finer granularity. And because the compensation mechanism can be controlled by the business activity manager, data consistency is still ensured.

1). Try stage

try is only a preliminary operation, a tentative confirmation; its main role is to complete the checks of all business rules and to reserve the business resources.

2). Confirm stage

confirm is the operation that continues once the try-stage checks have passed; it must be idempotent, and if confirm fails, the transaction coordinator keeps triggering it until it succeeds.

3). Cancel stage

cancel is executed to release the business resources that were reserved by a try stage but not confirmed; like confirm, it must be idempotent and may likewise be retried until it succeeds.
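
As a small illustration of these three stages, here is a minimal sketch of what a TCC participant interface for the inventory service might look like; the interface and method names are hypothetical, not any particular TCC framework's API:

```java
/**
 * A hypothetical TCC participant for the inventory service.
 * try reserves resources, confirm commits them, cancel releases the reservation.
 * confirm and cancel must both be idempotent, because the coordinator may retry them.
 */
public interface InventoryTccService {

    /** Try: check the business rules and reserve `count` units of stock for the given order. */
    boolean tryReserve(String orderId, long skuId, int count);

    /** Confirm: turn the reservation into a real stock deduction; must be idempotent. */
    boolean confirm(String orderId);

    /** Cancel: release the stock reserved in the try phase; must be idempotent. */
    boolean cancel(String orderId);
}
```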

Take placing an order as an example: generate the order and deduct the stock.


Now let us look at how TCC joins our place-order and deduct-stock process.


In the try phase, the inventory service reserves n units of stock for this order, and the order service generates an "unconfirmed" order, producing two reserved resources. In the confirm phase, the resources reserved in try are used; the TCC transaction mechanism holds that if the try phase reserved its resources normally, then confirm can complete the commit.


If either party fails its task in the try phase, the cancel interface comes into play, releasing the resources that were reserved in the try phase.

How TCC transactions are implemented is not the focus here; the focus is to discuss how CAP + BASE theory applies to distributed transactions. A reference implementation can be found at github.com/changmingxi...

Local message table

The local message table approach was first proposed by eBay; eBay's complete solution is at queue.acm.org/detail.cfm?...

The local message table is probably the implementation most used in industry. Its core idea is to split a distributed transaction into local transactions for processing.


For the local message table, the core is splitting a big transaction into small transactions. Let us again use the example above, placing an order and deducting stock, to explain:

  1. When we create an order, we also insert a record into a local message table; creating the order and writing to the local message table happen in the same local transaction (relying on the database transaction to guarantee consistency)
  2. A scheduled task keeps polling this local message table, scans for messages that have not been sent yet, and sends them to the inventory service; when the inventory service receives the message, it deducts the stock, writes to the transaction table on its own side, and updates the status of that transaction record
  3. The inventory server then notifies the order service, either directly or through another scheduled task, and the order service updates the status in its local message table

Note that the scan task will re-send messages whose earlier send did not succeed, so the receiving interface must be idempotent.
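
A minimal sketch of the order side under stated assumptions: a hypothetical local_message table in the same database as the order table, Spring's JdbcTemplate with @Transactional and @Scheduled for the local transaction and the polling job, and an illustrative StockClient for the remote call:

```java
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

import java.util.List;
import java.util.Map;

@Service
public class OrderService {

    private final JdbcTemplate jdbc;          // same datasource as the order tables
    private final StockClient stockClient;    // illustrative client for the inventory service

    public OrderService(JdbcTemplate jdbc, StockClient stockClient) {
        this.jdbc = jdbc;
        this.stockClient = stockClient;
    }

    /** Step 1: create the order and the local message in ONE local database transaction. */
    @Transactional
    public void placeOrder(String orderId, long skuId, int quantity) {
        jdbc.update("INSERT INTO t_order (order_id, sku_id, quantity, status) VALUES (?, ?, ?, 'CREATED')",
                orderId, skuId, quantity);
        jdbc.update("INSERT INTO local_message (order_id, sku_id, quantity, status) VALUES (?, ?, ?, 'NEW')",
                orderId, skuId, quantity);
    }

    /** Step 2: a scheduled task polls unsent messages and pushes them to the inventory service
     *  (requires @EnableScheduling in the application configuration). */
    @Scheduled(fixedDelay = 5000)
    public void deliverPendingMessages() {
        List<Map<String, Object>> pending =
                jdbc.queryForList("SELECT order_id, sku_id, quantity FROM local_message WHERE status = 'NEW'");
        for (Map<String, Object> msg : pending) {
            // the inventory service must be idempotent, because a message may be re-sent
            boolean ok = stockClient.deductStock((String) msg.get("order_id"),
                    ((Number) msg.get("sku_id")).longValue(),
                    ((Number) msg.get("quantity")).intValue());
            if (ok) {
                jdbc.update("UPDATE local_message SET status = 'SENT' WHERE order_id = ?", msg.get("order_id"));
            }
        }
    }

    /** Illustrative interface to the inventory service (e.g. a Feign client). */
    public interface StockClient {
        boolean deductStock(String orderId, long skuId, int quantity);
    }
}
```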

The local message table follows BASE theory and is an eventual-consistency model, suited to applications whose consistency requirements are not very demanding.

MQ transactional messages

RocketMQ officially announced support for distributed transactions in version 4.3; when choosing RocketMQ for distributed transactions, be sure to pick version 4.3 or later.

RocketMQ's implementation of distributed transactions is in fact an encapsulation of the local message table: the local message table is moved inside MQ.


A transactional message guarantees the transaction in the form of an asynchronous message, decoupling the two branches of the transaction asynchronously through MQ. The design of RocketMQ's transactional message likewise refers to two-phase commit theory, and the overall interaction proceeds in two phases between the producer and the broker.


The MQ transactional message is a layer of encapsulation over the local message table, moving that table inside MQ. It too is based on BASE theory and the eventual-consistency model, suitable for transactions whose requirement for strong consistency is not so high; and since the whole MQ transaction process is asynchronous, it is also very well suited to high-concurrency use.
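
A minimal sketch based on the RocketMQ 4.x client: send a half message, execute the local transaction (for example, create the order) in the listener, and let the broker call back to check the status when in doubt. The group, topic and name-server address are illustrative:

```java
import org.apache.rocketmq.client.producer.LocalTransactionState;
import org.apache.rocketmq.client.producer.TransactionListener;
import org.apache.rocketmq.client.producer.TransactionMQProducer;
import org.apache.rocketmq.common.message.Message;
import org.apache.rocketmq.common.message.MessageExt;

public class TxMessageDemo {
    public static void main(String[] args) throws Exception {
        TransactionMQProducer producer = new TransactionMQProducer("order_tx_group");
        producer.setNamesrvAddr("127.0.0.1:9876");                 // illustrative address
        producer.setTransactionListener(new TransactionListener() {
            @Override
            public LocalTransactionState executeLocalTransaction(Message msg, Object arg) {
                try {
                    // local transaction: e.g. insert the order row
                    return LocalTransactionState.COMMIT_MESSAGE;   // deliver to the consumer (inventory)
                } catch (Exception e) {
                    return LocalTransactionState.ROLLBACK_MESSAGE; // discard the half message
                }
            }
            @Override
            public LocalTransactionState checkLocalTransaction(MessageExt msg) {
                // broker calls back here when unsure; look up the local transaction and answer
                return LocalTransactionState.COMMIT_MESSAGE;
            }
        });
        producer.start();

        Message msg = new Message("order_topic", "order-1001 created".getBytes());
        producer.sendMessageInTransaction(msg, null);              // phase 1: half message
        producer.shutdown();
    }
}
```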

The CP and AP thinking behind RocketMQ's synchronous/asynchronous flush and synchronous/asynchronous replication choices

Although synchronous/asynchronous disk flushing and synchronous/asynchronous replication are not direct applications of CAP, these configuration choices likewise involve weighing availability against consistency.

Synchronous / asynchronous disk flush

RocketMQ messages can be persisted; the data is written to disk. To improve performance, RocketMQ keeps disk writes sequential as far as possible. When a producer writes a message to RocketMQ, there are two ways it gets written to disk:

  1. Asynchronous flush (ASYNC_FLUSH): the message is quickly written to the in-memory page cache and a write-success status is returned immediately; when the messages accumulated in memory reach a certain amount, a unified write to disk is triggered. This guarantees high throughput, but there is a risk that messages not yet saved to disk are lost.
  2. Synchronous flush (SYNC_FLUSH): the message is quickly written to the in-memory page cache, the flush thread is notified immediately to flush the disk, the caller waits for the flush to complete, and only then is the waiting thread woken up and the write-success status returned.


Synchronous replication / asynchronous replication

A broker group consists of a Master and a Slave, and messages need to be replicated from the Master to the Slave; this replication can be synchronous or asynchronous.

  1. Synchronous replication: the write-success status is returned to the client only after both the Master and the Slave have been written successfully.
  2. Asynchronous replication: the write-success status is returned to the client as soon as the Master has been written successfully.


The advantage of asynchronous replication is better response speed, but it comes at the expense of consistency; such protocols usually need an extra compensation mechanism. The advantage of synchronous replication is guaranteed consistency (usually through a two-phase commit protocol), but it is costly and availability suffers (see the CAP theorem), bringing more conflicts and deadlock problems. It is worth mentioning that the Lazy + Primary/Copy replication protocol is very practical in real production environments.


RocketMQ should be configured in light of the business scenario, with the flush mode and master-slave replication mode set reasonably. SYNC_FLUSH in particular triggers disk writes frequently and can noticeably reduce performance. Usually, Master and Slave should both be configured with the ASYNC_FLUSH flush mode, and the master-slave pair with SYNC_MASTER replication (the flushDiskType and brokerRole settings in the broker configuration); this way, even if one machine fails, data is still not lost.

Summary

When building microservices, you can never really escape the CAP theorem: networks are always unstable, hardware always ages, and software may always have bugs, so partition tolerance is an unavoidable proposition in microservices. You could say that as long as the system is distributed, as long as there is a cluster, you face the choice between AP and CP. And when you get greedy and want both consistency and availability, the only option is to compromise a little on consistency, that is, to introduce BASE theory and achieve eventual consistency where the business permits it.

Whether to choose AP or CP really comes down to understanding the business: scenarios involving money or inventory lean towards the CP model, while scenarios like community posting may prefer the AP model. Plainly put, it is a process of choosing and compromising based on one's understanding of the business.



Origin juejin.im/post/5d720e86f265da03cc08de74