[Common problems and terminology of distributed systems]

 


1. Motivation for choosing a distributed system 

(1) Information exchange 

(2) Resource sharing 

(3) Improve reliability through replication (redundancy) 

(4) Improve performance through parallelization 

(5) Simplified design through specialization 

(6) The problem itself is inherently distributed

 

In short: many hands make light work, but too many people also bring their own disadvantages.

2. Inter-process communication in distributed systems: the basic principle and steps of RPC

1. The client procedure calls the client stub in the normal way

2. The client stub generates a message and then calls the local operating system

3. The client operating system sends the message to the remote operating system and blocks the client process

4. The remote operating system hands the message to the server stub

5. The server stub extracts the parameters and then calls the server

6. The server performs the required operation, and returns the result to the server stub after the operation is completed

7. The server stub packs the result into a message, and then calls the local operating system

8. The server OS sends the message back to the client OS

9. The client OS hands the message to the client stub

10. The client stub extracts the result from the message and returns it to the calling process
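
A minimal sketch of how these steps look in code, assuming a toy JSON-over-socket transport (the names `client_call`, `server_loop`, and the `add` procedure are illustrative, not a real RPC framework):

```python
import json
import socket
import threading
import time

def server_loop(host="127.0.0.1", port=9009):
    """Server stub: unpack the message, call the real procedure, pack and send the reply."""
    def add(a, b):                       # the "server" procedure
        return a + b
    procedures = {"add": add}

    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    conn, _ = srv.accept()
    request = json.loads(conn.recv(4096).decode())           # step 5: extract the parameters
    result = procedures[request["proc"]](*request["args"])   # step 6: perform the operation
    conn.sendall(json.dumps({"result": result}).encode())    # steps 7-8: pack and send the reply
    conn.close()
    srv.close()

def client_call(proc, *args, host="127.0.0.1", port=9009):
    """Client stub: pack the call into a message, send it, block until the reply arrives."""
    msg = json.dumps({"proc": proc, "args": list(args)})      # step 2: generate a message
    sock = socket.create_connection((host, port))
    sock.sendall(msg.encode())                                # step 3: send and block waiting
    reply = json.loads(sock.recv(4096).decode())              # steps 9-10: extract the result
    sock.close()
    return reply["result"]

threading.Thread(target=server_loop, daemon=True).start()
time.sleep(0.2)                                               # give the toy server time to start
print(client_call("add", 2, 3))   # looks like an ordinary local call; prints 5
```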

 

3. Concurrency control: In a cluster environment, key data is usually stored in a shared manner, for example on a shared disk. The nodes in the cluster are peers, and all of them have the same access rights to the data, so there must be some mechanism to control each node's access to the data. In Oracle RAC, the DLM (Distributed Lock Manager) mechanism is used to control concurrency among multiple instances.
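
A toy sketch of the idea behind a lock manager (this is not Oracle's DLM; the class and resource names are made up for illustration): every node must obtain the lock on a piece of shared data before touching it.

```python
import threading

class ToyLockManager:
    def __init__(self):
        self._locks = {}                 # resource name -> owning node
        self._guard = threading.Lock()

    def acquire(self, resource, node):
        """Grant the lock only if the resource is free; otherwise the caller must wait and retry."""
        with self._guard:
            if self._locks.get(resource) is None:
                self._locks[resource] = node
                return True
            return False

    def release(self, resource, node):
        with self._guard:
            if self._locks.get(resource) == node:
                del self._locks[resource]

dlm = ToyLockManager()
assert dlm.acquire("block#42", "node1")      # node1 may now modify block#42
assert not dlm.acquire("block#42", "node2")  # node2 must wait its turn
dlm.release("block#42", "node1")
assert dlm.acquire("block#42", "node2")      # now node2 gets the lock
```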

 

4. Split Brain

In a cluster, nodes need to learn each other's health through some mechanism (the heartbeat) so that all nodes can work in coordination. Suppose only the "heartbeat" fails while the nodes themselves are still up and running. Each node then believes the other nodes are down, that it is the "only survivor" in the cluster, and that it should take "control" of the whole cluster. Because storage in the cluster is shared, having several nodes each acting as the sole owner and writing to the shared storage independently would be a data disaster; this situation is called "split brain". The usual solution to this problem is a quorum algorithm.

Split-brain phenomenon: an inconsistent cluster state caused by the loss of normal communication between the nodes in the cluster.

If this happens, Oracle RAC terminates a node to restore the consistency of the cluster. The principle for deciding which instance to terminate after a split brain is a vote among the sub-clusters formed by the split.

The principle of the voting algorithm: every node in the cluster uses the heartbeat mechanism to announce its "health status" to the others, and each "announcement" received from a node counts as one vote.
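
A hedged sketch of this voting/quorum idea, assuming one vote per reachable node (node names and counts are illustrative):

```python
def surviving_subcluster(total_nodes, subclusters):
    """Each reachable node contributes one vote; only a sub-cluster with more than half the votes survives."""
    quorum = total_nodes // 2 + 1
    for members in subclusters:
        if len(members) >= quorum:
            return members          # this partition keeps control of the cluster
    return []                       # no majority: everything steps down to protect shared data

# A 5-node cluster splits into {n1,n2,n3} and {n4,n5}: the 3-node side wins the vote.
print(surviving_subcluster(5, [["n1", "n2", "n3"], ["n4", "n5"]]))   # ['n1', 'n2', 'n3']
print(surviving_subcluster(4, [["n1", "n2"], ["n3", "n4"]]))         # [] -- a tie has no quorum
```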

 

5. Amnesia

This problem occurs when the cluster configuration files are not stored centrally but each node keeps a local copy. While the cluster is running normally, users can change the cluster configuration on any node, and the change is automatically synchronized to the other nodes. But consider this scenario in a two-node cluster: node 1 is shut down for routine maintenance, some configuration is then modified on node 2, node 2 is shut down, and node 1 is started. Because the changes made on node 2 were never synchronized to node 1, node 1 comes up with the old configuration file and the changes are lost. This loss of configuration is called "amnesia".
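
A small simulation of the scenario just described, assuming each node keeps its configuration as a local in-memory copy (names and values are illustrative):

```python
# Each node keeps its own local copy of the cluster configuration.
config_node1 = {"vip": "10.0.0.1"}
config_node2 = {"vip": "10.0.0.1"}     # copies are in sync while the cluster runs normally

# Node 1 is shut down for maintenance; the configuration is changed on node 2,
# and the change can no longer be synchronized to node 1.
config_node2["vip"] = "10.0.0.99"

# Node 2 is then shut down and node 1 is started: it comes up with its stale copy,
# so the change made on node 2 is silently "forgotten".
print(config_node1["vip"])             # still 10.0.0.1 -- amnesia
```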

 

6. IO Fencing

This problem is an extension of the previous one: it must be guaranteed that an evicted node cannot operate on the shared data. Because that node may still be running, it is very likely to modify the shared data if no restrictions are imposed.

IO fencing can be implemented in hardware or in software.

For storage devices that support the SCSI Reserve/Release commands, the sg commands can be used to implement it. A healthy node uses the SCSI Reserve command to "lock" the storage device; when the faulty node finds that the storage device is locked, it knows it has been evicted from the cluster and shuts itself down, which is why this mechanism is also called suicide. Sun and Veritas use this mechanism.

STONITH (Shoot The Other Node In The Head) is another implementation method, which directly operates the power switch. When a node fails, if another node can detect it, it will send a command through the serial port to control the power switch of the faulty node, and the faulty node will be restarted by temporarily powering off and then powering on again. This method requires hardware support.

 

7. Cache avalanche

A cache avalanche is generally caused by the failure of a cache node, which lowers the cache hit rate because data is now missing from the cache. The classic memcache scenario: when a client request arrives, first check whether the data is in memcache; if it is, take it from memcache and return it to the client; if it is not, query the database and return the result. In this scenario, when a cache node fails, all the data it held is missing, so all those requests fall through to the database. In a short time the database server is overwhelmed and crashes; after the DB is restarted it is overwhelmed again shortly afterwards, although a bit more data is cached each time. The DB is restarted repeatedly while the cache is rebuilt, until the DB finally runs stably.

An avalanche can also be caused by periodic cache expiry: if the cache is invalidated every 8 hours, for example, there will be a request "peak" against the database every 8 hours.

In severe cases the DB may even crash. With the modulo distribution algorithm, for example, a single node going down remaps a flood of data requests, and an avalanche situation may follow.
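
A sketch of the classic read-through pattern described above, with plain dictionaries standing in for memcached and the database (a real setup would use an actual memcached client and database):

```python
db = {f"user:{i}": f"row-{i}" for i in range(5)}   # stands in for the database
cache = {}                                          # stands in for a memcached node
db_queries = 0

def get(key):
    """Check the cache first; only on a miss fall through to the database, then repopulate."""
    global db_queries
    if key in cache:
        return cache[key]          # cache hit: the database is never touched
    db_queries += 1                # cache miss: one more query lands on the database
    value = db[key]
    cache[key] = value
    return value

for k in db:                       # warm the cache: 5 misses, 5 DB queries
    get(k)

cache.clear()                      # a cache node fails (or every entry expires at once)
for k in db:                       # every request now falls through to the database again
    get(k)

print(db_queries)                  # 10 -- the second half arrived in one burst: the avalanche
```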

 

8. Cache bottomless pit phenomenon

The problem was raised by Facebook engineers. Around 2010, Facebook had reached roughly 3,000 memcached nodes, caching thousands of gigabytes of content. They found a problem: the frequency of memcached connections was hurting efficiency, so they added more memcached nodes. After adding them, they found that the problems caused by connection frequency still existed and had not improved. This is called the "bottomless pit" phenomenon.

The harm caused by the bottomless pit problem:

(1) A client batch operation involves multiple network operations, so the network cost of a batch operation keeps growing as the number of instances increases.

(2) The number of network connections on the server side also increases, which affects the performance of each instance.

In conclusion:

To sum it up in one sentence: more machines do not mean more performance. The so-called "bottomless pit" means that more input does not necessarily bring more output.

Yet distribution is unavoidable, because website traffic and data volume keep growing and a single instance simply cannot handle them, so how to fetch data in batches efficiently in distributed caching and storage is a real difficulty.
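
A sketch of why batch reads get worse as instances are added, assuming keys are spread across instances by hash (node counts and key names are illustrative):

```python
from collections import defaultdict

def nodes_touched_by_batch(keys, node_count):
    """Group a batch of keys by the node they hash to; each group costs one network operation."""
    groups = defaultdict(list)
    for key in keys:
        groups[hash(key) % node_count].append(key)
    return len(groups)

batch = [f"user:{i}" for i in range(100)]      # one client batch get of 100 keys
for n in (3, 10, 50, 100):
    print(n, "nodes ->", nodes_touched_by_batch(batch, n), "network operations per batch")
# The same batch spreads over more and more nodes as instances are added,
# so adding machines does not reduce the per-batch network cost.
```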



 

 

9. Distribution approaches

1) Modulo algorithm

Simply put, it means "distributing data according to the remainder modulo the number of servers": compute an integer hash of the key, divide it by the number of servers, and choose the server by the remainder. Computing the remainder is simple and the data is dispersed quite evenly, but the method has a shortcoming: when servers are added or removed, reorganizing the cache is expensive. After a server is added, the remainders change drastically, so a key no longer maps to the server it was stored on, which hurts the cache hit rate. Distributed systems often use the modulo algorithm to distribute data; it works well as long as the set of data nodes does not change, but when nodes are added or removed, the modulus changes and all data must be redistributed across the nodes according to the new modulus. If the amount of data is huge, that work is often impossible to complete.

An analogy: all the data is spread across three hypothetical nodes, like a bicycle wheel held up by three spokes of steel wire. If one of them breaks, there is no support over a 120-degree arc and the wheel is useless.
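
A sketch of the modulo distribution and of how many keys move when one node is added (key names and node counts are illustrative):

```python
import hashlib

def node_for(key, node_count):
    """Pick a node by taking an integer hash of the key modulo the number of servers."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % node_count

keys = [f"user:{i}" for i in range(10000)]
moved = sum(1 for k in keys if node_for(k, 4) != node_for(k, 5))   # grow from 4 to 5 nodes
print(f"{moved / len(keys):.0%} of keys change node")               # roughly 80% are remapped
```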

 

2) Paxos algorithm

The Paxos algorithm, with its steps and constraints, is in essence a distributed election algorithm. Its purpose is to let the receivers or executors of a stream of messages reach agreement, through election, on a single order of messages, and then execute them in that agreed order. From the simplest point of view, to make everyone execute the same sequence of instructions, this could be done serially: for example, put a FIFO queue in front of the distributed environment, let it receive all instructions, and have all service nodes execute in the queue's order. This of course solves the consistency problem, but it does not fit the distributed setting: what if the queue goes down or is overwhelmed? The genius of Paxos is that it lets every client send instructions to the servers independently without interfering with each other, and agreement is still reached by election, which preserves the distributed character and gives better fault tolerance.
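
A heavily simplified sketch of the acceptor side of single-decree Paxos, only to show the promise/accept rules; real Paxos also needs proposers, majority counting among acceptors, and persistent state:

```python
class Acceptor:
    def __init__(self):
        self.promised_n = -1        # highest proposal number promised so far
        self.accepted_n = -1        # proposal number of the value accepted, if any
        self.accepted_value = None

    def prepare(self, n):
        """Phase 1: promise to ignore proposals numbered below n, reporting any value already accepted."""
        if n > self.promised_n:
            self.promised_n = n
            return ("promise", self.accepted_n, self.accepted_value)
        return ("reject", None, None)

    def accept(self, n, value):
        """Phase 2: accept the value unless a higher-numbered promise has been made since."""
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted_n = n
            self.accepted_value = value
            return ("accepted", n, value)
        return ("reject", None, None)

a = Acceptor()
print(a.prepare(1))                 # ('promise', -1, None)
print(a.accept(1, "op: set x=3"))   # ('accepted', 1, 'op: set x=3')
print(a.prepare(0))                 # ('reject', None, None) -- an older proposal is ignored
```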

 

 

3) Consistent Hash Algorithm

The consistent hash algorithm is an optimization of the modulo algorithm; it solves the problems above through additional mapping rules.

Consistent hashing essentially changes the previous point-to-point mapping into a mapping of segments, so that when a data node changes, the impact on the other data nodes is as small as possible. The same idea appears in how operating systems handle storage: to make better use of storage space, the operating system distinguishes dimensions such as segments and pages and adds many mapping rules, with the aim of absorbing the cost of physical changes through flexible rules.

The consistent hash algorithm itself is fairly simple, but many improved versions exist depending on the actual situation; their purpose comes down to two points:

After the node changes, other nodes are affected as little as possible 

Data redistribution after node changes is as balanced as possible 

Implementing this algorithm is not technically difficult or labor-intensive; what needs to be done is to build the mapping relationship you designed, and no special framework or tools are required.

An analogy: many virtual nodes are mapped onto the three physical nodes, say 63 virtual nodes in total; if one physical node fails, 42 virtual nodes remain spread around the ring, and the wheel can still be used.
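
A minimal sketch of a consistent hash ring with virtual nodes, in the spirit of the 63-virtual-node analogy above (class and node names are illustrative):

```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=21):
        self.vnodes = vnodes
        self.ring = []                                 # sorted list of (hash position, physical node)
        for node in nodes:
            self.add_node(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.vnodes):                   # place several virtual points per physical node
            self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()

    def remove_node(self, node):
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def node_for(self, key):
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h,)) % len(self.ring)   # first point clockwise from h
        return self.ring[idx][1]

ring = ConsistentHashRing(["nodeA", "nodeB", "nodeC"])   # 3 x 21 = 63 virtual nodes
keys = [f"user:{i}" for i in range(10000)]
before = {k: ring.node_for(k) for k in keys}
ring.remove_node("nodeC")                                 # 42 virtual nodes remain on the ring
moved = sum(1 for k in keys if ring.node_for(k) != before[k])
print(f"{moved / len(keys):.0%} of keys moved")           # roughly a third, not nearly all as with modulo
```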



 

4) Token Ring

When the ring is initialized, process 0 gets the token. The token travels around the ring, passed from process k to process k+1 (modulo the ring size) in point-to-point messages. When a process receives the token from its neighbor, it checks whether it wants to enter a critical section. If it does, it enters the critical section, does what it has to do, and leaves; after exiting the critical section it passes the token on along the ring. Entering another critical section with the same token is not allowed. If a process receives the token but does not want to enter the critical section, it simply passes the token on down the ring.

Advantages: no starvation; in the worst case a process waits for every other process to enter and leave the critical section before its turn comes.

Disadvantages: if the token is lost it has to be regenerated, and detecting the loss of the token is difficult; the algorithm also runs into trouble when a process crashes, although recovery is easier than in other algorithms.

An analogy: the drop-the-handkerchief game played as a child, where N people form a circle and the handkerchief (the token) is carried around the ring by one person at a time, keeping the "wheel" turning.
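
A small simulation of token-ring mutual exclusion as described above (process counts are illustrative):

```python
def run_ring(num_procs, wants_cs, rounds=1):
    """wants_cs: set of process ids that want the critical section during this run."""
    token_holder = 0                       # process 0 gets the token when the ring is initialized
    log = []
    for _ in range(rounds * num_procs):
        if token_holder in wants_cs:
            log.append(f"P{token_holder} enters and leaves the critical section")
            wants_cs.discard(token_holder) # the same token may not be reused for another entry
        token_holder = (token_holder + 1) % num_procs   # pass the token to the next process
    return log

for line in run_ring(5, wants_cs={1, 3}):
    print(line)
# P1 and P3 each enter the critical section exactly once as the token passes them.
```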
