Analyzing How ZooKeeper Processes Requests

ZooKeeper can store data, so we can think of it as a database; in fact, its underlying design is quite similar to that of a database.

Database fundamentals

We know that a database stores data, and that data can live in memory or on disk. ZooKeeper actually combines the two: the data stored in ZooKeeper is written to disk for durability, and also loaded into memory for fast access.

Anyone who has used ZooKeeper should know that it has two kinds of nodes: persistent nodes and ephemeral (temporary) nodes.

  • Persistent node: persisted on disk; unless it is explicitly deleted, it exists forever.
  • Ephemeral node: not persisted on disk, kept only in memory; when the session that created it expires, the node is deleted automatically.
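The session-bound lifetime of ephemeral nodes can be illustrated with a minimal sketch. This is plain Python, not ZooKeeper's actual implementation; the class and method names are invented for illustration:

```python
# Toy model: nodes live in a dict; an ephemeral node remembers the session
# that created it and disappears when that session expires.
class TinyZk:
    def __init__(self):
        self.nodes = {}  # path -> owning session id (None = persistent)

    def create(self, path, session_id=None, ephemeral=False):
        # an ephemeral node is bound to the session that created it
        self.nodes[path] = session_id if ephemeral else None

    def expire_session(self, session_id):
        # session expiry deletes every ephemeral node that session owns
        self.nodes = {p: s for p, s in self.nodes.items() if s != session_id}

zk = TinyZk()
zk.create("/app/config")                              # persistent node
zk.create("/app/lock", session_id=7, ephemeral=True)  # ephemeral node
zk.expire_session(7)
print(sorted(zk.nodes))   # ['/app/config'] - the ephemeral node is gone
```

The persistent node survives session expiry; only the ephemeral node owned by the expired session is removed.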

How the database processes requests

As a database, ZooKeeper must of course accept client requests to create, modify, delete, and query nodes.

ZooKeeper divides requests into two categories:

  • Transactional requests
  • Non-transactional requests

Transactional requests

ZooKeeper usually runs in cluster mode, which means the data on every node of the ZooKeeper cluster must stay consistent. But the cluster behaves differently from a MySQL cluster:

  • In a MySQL cluster, slaves replicate data from the master asynchronously, and the replication lag can be relatively long.
  • In a ZooKeeper cluster, when one node receives a write request, it must propagate the operation to the other nodes so that they execute the same write; this keeps the data on every node identical, i.e. consistent. This is what people mean when they say ZooKeeper guarantees CP in the CAP theorem.

How does the ZooKeeper cluster guarantee data consistency under the hood? It uses a two-phase commit plus a majority (quorum) mechanism. A later article will introduce the underlying implementation; if you are interested, you can follow my WeChat official account.

Transactional requests include update, create, and delete operations. Combined with the analysis above: because these operations change the data, the whole cluster must agree on them transactionally, which is why they are called transactional requests.
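The split described above can be sketched as a simple classifier. This is a toy illustration, not ZooKeeper's code; the operation names mirror the ZooKeeper client API (create, setData, delete for writes; getData, getChildren, exists for reads):

```python
# Writes change the data and must go through cluster-wide agreement;
# reads can be answered locally.
TRANSACTIONAL_OPS = {"create", "setData", "delete"}

def is_transactional(op: str) -> bool:
    return op in TRANSACTIONAL_OPS

print(is_transactional("setData"))   # True  - a write, needs the cluster
print(is_transactional("getData"))   # False - a read, served locally
```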

Non-transactional requests

Non-transactional requests are easy to understand by contrast: query operations such as getData and exists do not modify the data, so the cluster does not need transactional coordination for them; these operations are non-transactional requests.

Handling transactional requests is much more complicated for ZooKeeper than handling non-transactional ones.

How data is represented on disk

Suppose ZooKeeper currently holds one data node named /datanode whose content is 125, and the node is persistent, so its information must be stored in a file.

You might assume it is stored on disk in a form like the following, Method One:

| Node name | Node content |
| --------- | ------------ |
| /datanode | 125          |

But besides this representation, there is another one: snapshot + transaction log, shown as Method Two:

Current snapshot:

| Node name | Node content |
| --------- | ------------ |
| /datanode | 120          |

The current transaction log:

| Transaction ID | Operation | Node name | Content before | Content after |
| -------------- | --------- | --------- | -------------- | ------------- |
| 1000010        | update    | /datanode | 120            | 121           |
| 1000011        | update    | /datanode | 121            | 125           |

At first glance, Method Two is more complex than Method One and takes up more disk space. But as mentioned above, when a ZooKeeper node handles a transactional request it must synchronize the operation to the other nodes, so every transactional operation must be persisted first, in order to compensate if an error occurs while synchronizing to the other nodes. This is why the transaction log exists. The transaction log also supports rolling back data, which is essential in two-phase commit.
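The before/after pair recorded in each log entry is exactly what makes both replay (redo) and rollback (undo) possible. A sketch under invented field names, using the same entries as the table above:

```python
# Each entry records the value before and after the change, keyed by a
# monotonically increasing transaction id (zxid in ZooKeeper terminology).
log = [
    {"zxid": 1000010, "op": "update", "path": "/datanode", "before": 120, "after": 121},
    {"zxid": 1000011, "op": "update", "path": "/datanode", "before": 121, "after": 125},
]

def replay(snapshot, entries):
    # redo: start from the snapshot and apply "after" values in zxid order
    data = dict(snapshot)
    for e in entries:
        data[e["path"]] = e["after"]
    return data

def rollback(data, entry):
    # undo: restore the "before" value recorded in the entry
    undone = dict(data)
    undone[entry["path"]] = entry["before"]
    return undone

state = replay({"/datanode": 120}, log)
print(state["/datanode"])                       # 125
print(rollback(state, log[1])["/datanode"])     # 121
```

Starting from the snapshot value 120 and replaying both entries yields 125; undoing the last entry restores 121.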

So what is the snapshot for? The transaction log is indispensable, but as time goes on the log grows and grows, and it is clearly not sustainable to keep the entire history. So ZooKeeper periodically takes a snapshot and then deletes the logs from before it.
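The pruning idea can be sketched as follows. The field names are invented for illustration; the point is only that a snapshot records how far it covers, so older log entries become redundant:

```python
# A snapshot remembers the newest transaction id it already reflects;
# any log entry at or below that id can safely be discarded.
def take_snapshot(data, last_zxid):
    return {"data": dict(data), "last_zxid": last_zxid}

def prune_log(log, snap):
    # keep only entries newer than what the snapshot covers
    return [e for e in log if e["zxid"] > snap["last_zxid"]]

log = [
    {"zxid": 1000010, "path": "/datanode", "after": 121},
    {"zxid": 1000011, "path": "/datanode", "after": 125},
]
snap = take_snapshot({"/datanode": 121}, last_zxid=1000010)
print(len(prune_log(log, snap)))   # 1 - only the entry newer than the snapshot remains
```

Recovery then becomes: load the snapshot, replay the remaining log entries on top of it.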

If the data is stored on disk using Method Two, however, querying it is less convenient. As mentioned above, to speed up lookups ZooKeeper also keeps a copy of the data in memory. So how is that in-memory copy organized?

How data is represented in memory

The in-memory representation in ZooKeeper is actually very similar to Method One above, except that ZooKeeper's data has file-system-like directory semantics: in plain terms, every data node name in ZooKeeper must begin with "/", which gives ZooKeeper a tree-shaped data structure:

*(figure: the tree of ZooKeeper data nodes)*

This parent-child, multi-way tree is called DataTree in the ZooKeeper source code.
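A minimal sketch of such a structure, close in spirit to ZooKeeper's DataTree (which stores nodes in a hash map keyed by absolute path, plus parent-child links) but with all names invented here:

```python
# Toy DataTree: a flat dict keyed by absolute path, plus a parent -> children
# index so the tree can be walked like a directory hierarchy.
class DataTree:
    def __init__(self):
        self.nodes = {"/": b""}       # path -> node content
        self.children = {"/": set()}  # path -> child names

    def create(self, path, data=b""):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError("parent does not exist: " + parent)
        self.nodes[path] = data
        self.children.setdefault(path, set())
        self.children[parent].add(path.rsplit("/", 1)[1])

    def get(self, path):
        return self.nodes[path]

t = DataTree()
t.create("/datanode", b"125")
print(t.get("/datanode"))   # b'125'
```

Because lookups are a single dict access on the full path, reads do not have to walk the tree node by node.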

Request processing logic

See the figure below:

*(figure: request processing flow in a ZooKeeper cluster)*

Note that in the figure, matching ZooKeeper's real underlying implementation, zk1 is the Leader while zk2 and zk3 are Learners, chosen by leader election.

Non-transactional requests read directly from the DataTree; since the DataTree lives in memory, this is very fast.
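The "two-phase commit plus a majority" idea mentioned earlier can be reduced to a quorum check: the leader proposes a write, counts acknowledgements, and commits once more than half of the ensemble (leader included) has logged the proposal. Real ZooKeeper uses the Zab protocol for this; the sketch below only models the quorum arithmetic:

```python
# A proposal commits once a strict majority of the ensemble has acknowledged it.
def quorum_commit(ensemble_size: int, acks: int) -> bool:
    # acks counts servers (including the leader) that persisted the proposal
    return acks > ensemble_size // 2

print(quorum_commit(3, 2))   # True  - 2 of 3 is a majority
print(quorum_commit(5, 2))   # False - 2 of 5 is not
```

This is also why ZooKeeper ensembles have an odd number of servers: a 3-node and a 4-node cluster both tolerate only one failure.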

Summary

This article introduced the core concepts involved when ZooKeeper processes requests:

  1. Transactional requests
  2. Transaction log
  3. Snapshot
  4. DataTree
  5. Two-phase commit

The next article will explain in detail how ZooKeeper implements two-phase commit when processing a request.

Where there are pain points there is innovation; a technology always exists to solve some pain point. Please help share this article, and if you want to see more content as soon as it is published, follow my WeChat official account.


Origin: juejin.im/post/5d14765bf265da1bb47d766c