Understanding the CAP Theorem in Distributed Data Storage

"Distributed" is a very popular term today, especially in distributed data processing. As data volumes explode, everyone is looking for more efficient, scalable, and highly available data storage systems. Yet essentially no product can escape the limits that the CAP principle places on distributed databases.

The CAP theorem concerns three properties:

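Consistency (C): data consistency, i.e. all nodes see the same data at the same time

Availability (A): high availability, i.e. every request to a non-failing node receives a response

Partition tolerance (P): tolerance to network partitions, i.e. the system keeps operating even when messages between nodes are lost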

No data storage system can provide all three properties at the same time; it can guarantee at most two of them, which gives three combinations: CA, CP, and AP. Every data storage solution available today can be classified as one of these three types.
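
To make the trade-off concrete, here is a minimal toy sketch in Python, assuming just two replicas with a network partition between them. The names Replica, make_pair, partition, and Unavailable are hypothetical and do not correspond to any real database. While partitioned, a replica configured as "CP" refuses requests rather than risk inconsistency, while an "AP" replica keeps answering from its local copy, which may be stale.

```python
# Toy model (not any real database): two replicas that may be split by a partition.
# During a partition, a replica must either refuse the request ("CP": stays
# consistent, gives up availability) or answer from local state ("AP": stays
# available, but the two replicas can diverge).
from typing import Optional


class Unavailable(Exception):
    """Raised when a CP replica refuses a request it cannot serve consistently."""


class Replica:
    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.data: dict = {}
        self.peer: Optional["Replica"] = None
        self.partitioned = False

    def write(self, key: str, value: str) -> None:
        if self.partitioned:
            if self.mode == "CP":
                raise Unavailable("cannot replicate write to peer")
            self.data[key] = value  # AP: accept locally; the peer is not updated
            return
        # No partition: apply the write to both replicas so they stay identical.
        self.data[key] = value
        self.peer.data[key] = value

    def read(self, key: str) -> Optional[str]:
        if self.partitioned and self.mode == "CP":
            raise Unavailable("cannot confirm the local value is current")
        return self.data.get(key)


def make_pair(mode: str) -> "tuple[Replica, Replica]":
    a, b = Replica(mode), Replica(mode)
    a.peer, b.peer = b, a
    return a, b


def partition(a: Replica, b: Replica) -> None:
    a.partitioned = b.partitioned = True


for mode in ("CP", "AP"):
    a, b = make_pair(mode)
    a.write("x", "1")      # replicated to both sides
    partition(a, b)
    try:
        a.write("x", "2")  # CP refuses here; AP accepts on side a only
        print(mode, "a reads", a.read("x"), "| b reads", b.read("x"))
    except Unavailable as err:
        print(mode, "unavailable:", err)
```

Running it reports the CP pair as unavailable, while the AP pair returns "2" on one side and "1" on the other. With only two replicas neither side holds a majority, so in this toy the CP replica refuses reads as well as writes; real CP systems usually keep serving on the majority side of a partition.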

CA systems provide data consistency and high availability but do not scale out. Traditional relational databases, such as a single-node Oracle or MySQL deployment, mostly take this approach.

CP systems provide data consistency and partition tolerance; examples include Oracle RAC and Sybase clusters. Oracle RAC does scale somewhat, but once the cluster grows beyond a certain number of nodes, performance (and with it availability) drops quickly, because data must be synchronized between nodes in real time and the resulting inter-node network overhead is very high.
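
The cost of real-time synchronization can be illustrated with a hypothetical sketch, assuming a simplified prepare/commit protocol in which the first node coordinates and every other node must acknowledge before a write is applied anywhere. This is not how Oracle RAC or Sybase work internally; Node, SyncCluster, and WriteRejected are made-up names. The sketch shows two things: a single unreachable node blocks every write (consistency is kept, availability is lost), and each successful write costs 3 * (n - 1) messages, so coordination overhead grows with cluster size.

```python
# Hypothetical sketch of synchronous replication with a simplified prepare/commit
# protocol. Node 0 coordinates; a write is applied only after all peers acknowledge.
from dataclasses import dataclass, field


class WriteRejected(Exception):
    """Raised when a peer cannot acknowledge, so the write is aborted everywhere."""


@dataclass
class Node:
    name: str
    reachable: bool = True
    data: dict = field(default_factory=dict)


@dataclass
class SyncCluster:
    nodes: list
    messages_sent: int = 0

    def write(self, key: str, value: str) -> None:
        coordinator, peers = self.nodes[0], self.nodes[1:]
        # Phase 1 (prepare): every peer must acknowledge before anything is applied.
        # Peers stage nothing in this toy, so an abort needs no extra messages.
        for peer in peers:
            self.messages_sent += 1  # prepare request
            if not peer.reachable:
                raise WriteRejected(f"{peer.name} unreachable; write aborted")
            self.messages_sent += 1  # acknowledgement
        # Phase 2 (commit): apply locally and tell every peer to apply.
        coordinator.data[key] = value
        for peer in peers:
            self.messages_sent += 1  # commit message
            peer.data[key] = value


# Message cost of one write grows linearly with cluster size: 3 * (n - 1).
for n in (2, 4, 8):
    cluster = SyncCluster([Node(f"n{i}") for i in range(n)])
    cluster.write("x", "1")
    print(n, "nodes:", cluster.messages_sent, "messages for one write")

# One unreachable node blocks all writes, but no replica diverges.
cluster = SyncCluster([Node(f"n{i}") for i in range(3)])
cluster.nodes[2].reachable = False
try:
    cluster.write("x", "1")
except WriteRejected as err:
    print(err)
```

A real system would also need timeouts and recovery for a coordinator crash; this sketch only counts messages to show why synchronous coordination becomes more expensive as nodes are added.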

AP systems excel in performance and scalability but sacrifice data consistency: data is not synchronized between nodes immediately, although the system still guarantees eventual consistency. Most of today's popular NoSQL databases are typical AP systems.
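
Eventual consistency can be sketched as follows, assuming a last-writer-wins (LWW) merge rule with logical timestamps. This is one common convergence strategy, not the mechanism of any particular NoSQL product, and LWWReplica, write, read, and merge_from are hypothetical names. Each replica accepts writes locally even when it cannot reach the others, so replicas can temporarily disagree; when they later exchange state, each keeps the value with the larger (timestamp, replica id) tag, and all replicas end up holding the same value.

```python
# Hypothetical sketch: eventual consistency via last-writer-wins (LWW) merging.
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Tag = Tuple[int, str]  # (logical timestamp, replica id); compared lexicographically


@dataclass
class LWWReplica:
    replica_id: str
    clock: int = 0
    store: Dict[str, Tuple[Tag, str]] = field(default_factory=dict)

    def write(self, key: str, value: str) -> None:
        # Always accepted locally: the replica stays available even when isolated.
        self.clock += 1
        self.store[key] = ((self.clock, self.replica_id), value)

    def read(self, key: str) -> Optional[str]:
        entry = self.store.get(key)
        return entry[1] if entry is not None else None

    def merge_from(self, other: "LWWReplica") -> None:
        # Anti-entropy: for each key keep whichever entry has the larger tag.
        for key, (tag, value) in other.store.items():
            mine = self.store.get(key)
            if mine is None or tag > mine[0]:
                self.store[key] = (tag, value)
        # Advance the clock so later local writes are ordered after merged ones.
        self.clock = max(self.clock, other.clock)


a, b = LWWReplica("a"), LWWReplica("b")
a.write("x", "from-a")  # concurrent writes while a and b are partitioned
b.write("x", "from-b")
print(a.read("x"), b.read("x"))  # replicas disagree
a.merge_from(b)  # partition heals; replicas exchange state
b.merge_from(a)
print(a.read("x"), b.read("x"))  # both now hold the same value
```

The price of this simple rule is that one of two concurrent writes is silently discarded (here "from-a" loses the tie-break on replica id); systems that cannot accept that loss use richer schemes such as version vectors or CRDTs.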

In short, architects should not try to design a database that satisfies all three CAP properties. Instead, they must choose which trade-off to make based on the requirements of the business scenario.

 

http://my.oschina.net/lilw/blog/169776
