web1-comprehensive solution

Given a sorted array, use binary search to find an element:

The first time: find in n elements;

The second time: find in n/2 elements;

The third time: n/4;

…………

The kth time: find in n/2^(k-1) elements;

If the search succeeds on the kth comparison, then in the worst case only one element remains, so

n/2^(k-1) = 1    ->     k = log2(n) + 1    ->  i.e., the time complexity is O(log2 n)
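The halving described above is ordinary binary search; a minimal sketch in Java (the class name and array values are illustrative):

```java
// Iterative binary search over a sorted int array.
public class BinarySearchDemo {
    static int binarySearch(int[] a, int target) {
        int lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2; // avoids overflow of (lo + hi)
            if (a[mid] == target) return mid;
            if (a[mid] < target) lo = mid + 1; // target is in the right half
            else hi = mid - 1;                 // target is in the left half
        }
        return -1; // not found
    }

    public static void main(String[] args) {
        int[] a = {1, 3, 5, 7, 9, 11, 13, 15};
        System.out.println(binarySearch(a, 9));  // prints 4
        System.out.println(binarySearch(a, 2));  // prints -1
    }
}
```

Each iteration halves the remaining range, which is exactly where the k = log2(n) + 1 bound comes from.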

One: hash algorithm

The hash algorithm is an efficient addressing method: it locates the target node through a hash value.

Ordinary hash algorithm:

Take the hash of the key modulo the number of nodes (hash(key) % N) to calculate which node serves the request; this is used, for example, in database/table sharding and nginx ip_hash.

Problem: both scaling out and scaling in change N, so almost every key has to be recalculated and remapped, which has a large impact.
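A small illustration of how much plain modulo hashing remaps when the node count changes (the node counts and key range are arbitrary examples):

```java
// Counts how many keys land on a different node after the cluster
// grows from `before` nodes to `after` nodes under plain modulo hashing.
public class ModuloHashDemo {
    static int remapped(int before, int after, int total) {
        int moved = 0;
        for (int key = 0; key < total; key++) {
            if (key % before != key % after) moved++;
        }
        return moved;
    }

    public static void main(String[] args) {
        // going from 4 nodes to 5 nodes remaps most of the keys
        System.out.println(remapped(4, 5, 1000) + " of 1000 keys change nodes");
    }
}
```

Here 800 of the 1000 keys move, so a single node added or removed invalidates most placements; consistent hashing avoids this.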

Consistency hash:

The hash space runs from 0 to 2^32 - 1 and forms a closed ring (because the hash value is a 32-bit unsigned integer). Each server's hash value falls on the ring, the client IP is hashed onto the ring with the same algorithm, and the nearest server node clockwise is the node that serves the request.

This way, shrinking or expanding the cluster affects only a small range of requests. For example:

  • If node 2 goes down, only the arc between node 1 and node 2 is affected: requests that should have fallen on node 2 now fall on node 3.
  • If node 5 is added between nodes 2 and 3, only the arc between node 2 and node 5 is affected: requests that should have fallen on node 3 now fall on node 5.

Code:

import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHashNoVirtual {
    public static void main(String[] args) {
        // step1 Initialization: map the hash value of each server node IP onto the hash ring
        // define the server IPs
        String[] tomcatServers = new String[]
                {"123.111.0.0", "123.101.3.1", "111.20.35.2", "123.98.26.3"};
        SortedMap<Integer, String> hashServerMap = new TreeMap<>();
        for (String tomcatServer : tomcatServers) {
            // compute the hash value of each IP and store the hash -> IP mapping
            int serverHash = Math.abs(tomcatServer.hashCode());
            hashServerMap.put(serverHash, tomcatServer);
        }
        // step2 Compute the hash value for each client IP
        // define the client IPs
        String[] clients = new String[]
                {"10.78.12.3", "113.25.63.1", "126.12.3.8"};
        for (String client : clients) {
            int clientHash = Math.abs(client.hashCode());
            // step3 For each client, find the server that handles the request
            // (the closest node clockwise on the hash ring)
            // tailMap returns the entries whose key >= clientHash
            SortedMap<Integer, String> tailMap = hashServerMap.tailMap(clientHash);
            Integer firstKey;
            if (tailMap.isEmpty()) {
                // wrap around: take the first node clockwise on the ring
                firstKey = hashServerMap.firstKey();
            } else {
                firstKey = tailMap.firstKey();
            }
            System.out.println("===========>>>>Client: " + client
                    + " is routed to server: " + hashServerMap.get(firstKey));
        }
    }
}

If there are too few nodes, a lot of data is affected when a node fails. In addition, data skew can occur even in normal operation: with only node 1 and node 2 on the ring, most requests may fall on node 1.

To fix this, each real node can be given multiple virtual nodes; when a request falls on a virtual node, it is forwarded to the corresponding real node:

For example, requests that land on virtual nodes 2#2 and 2#3 are forwarded to real node 2.

Code:

import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHashWithVirtual {
    public static void main(String[] args) {
        // step1 Initialization: map each server node and its virtual nodes onto the hash ring
        String[] tomcatServers = new String[]
                {"123.111.0.0", "123.101.3.1", "111.20.35.2", "123.98.26.3"};
        SortedMap<Integer, String> hashServerMap = new TreeMap<>();
        // number of virtual nodes per real node
        int virtualCount = 3;
        for (String tomcatServer : tomcatServers) {
            int serverHash = Math.abs(tomcatServer.hashCode());
            hashServerMap.put(serverHash, tomcatServer);
            // process the virtual nodes, each mapping back to its real node
            for (int i = 0; i < virtualCount; i++) {
                int virtualHash = Math.abs((tomcatServer + "#" + i).hashCode());
                hashServerMap.put(virtualHash,
                        "----request mapped from the virtual node " + i + " : " + tomcatServer);
            }
        }
        // step2 Compute the hash value for each client IP
        String[] clients = new String[]
                {"10.78.12.3", "113.25.63.1", "126.12.3.8"};
        for (String client : clients) {
            int clientHash = Math.abs(client.hashCode());
            // step3 For each client, find the server that handles the request
            // (the closest node clockwise on the hash ring)
            SortedMap<Integer, String> tailMap = hashServerMap.tailMap(clientHash);
            Integer firstKey;
            if (tailMap.isEmpty()) {
                // wrap around: take the first node clockwise on the ring
                firstKey = hashServerMap.firstKey();
            } else {
                // if it is a virtual node, the mapping to the real node is printed
                firstKey = tailMap.firstKey();
            }
            System.out.println("===========>>>>Client: " + client
                    + " is routed to server: " + hashServerMap.get(firstKey));
        }
    }
}
                

Nginx can use the ngx_http_upstream_consistent_hash module to configure consistent hashing of requests:

consistent_hash $remote_addr : map according to the client IP
consistent_hash $request_uri : map according to the URI requested by the client
consistent_hash $args : map according to the parameters carried by the client
The ngx_http_upstream_consistent_hash module is a third-party module, which needs to be downloaded and built before use.
Download the nginx consistent hash load balancing module from GitHub: https://github.com/replay/ngx_http_consistent_hash

Upload the downloaded archive to the nginx server and extract it.

Assuming nginx was compiled and installed from source in advance, enter the nginx source directory used at that time and execute the following commands:
./configure --add-module=/root/ngx_http_consistent_hash-master
make
make install
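Once the module is built in, the directive goes inside an upstream block; a sketch of the wiring (the backend addresses and ports are placeholders, not from the original notes):

```
upstream backend {
    consistent_hash $request_uri;   # route by the requested URI
    server 10.0.0.11:8080;          # placeholder backend addresses
    server 10.0.0.12:8080;
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
    }
}
```

With this in place, requests for the same URI keep landing on the same backend, and adding or removing a `server` line only remaps a small share of URIs.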

Two: Clock synchronization problem

1. If all nodes can reach the Internet, use network time with the command: ntpdate -u ntp.api.bz

2. If the nodes cannot reach the Internet, use one node as the time server and have the other nodes synchronize from it: ntpdate <time server ip>
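For case 2, the node chosen as the time server typically runs ntpd with its local clock as a fallback reference; a sketch of the relevant /etc/ntp.conf lines (the 192.168.1.0/24 segment is an assumed example, adjust to your network):

```
# use the local clock as a fallback reference source, at a high stratum
server 127.127.1.0
fudge 127.127.1.0 stratum 10

# allow clients in the 192.168.1.0/24 segment to synchronize from this node
restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap
```

The other nodes then run `ntpdate <time server ip>` (for example, from a cron job) to stay in sync.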

Three: distributed id

  • UUID from the java.util package: random, meaningless strings; they cannot be sorted, and queries are slow when the data volume is large
  • Auto-increment ID from a dedicated database table: there is a single-point-of-failure risk, and under high concurrency writes and queries hit performance bottlenecks; even in master-slave mode, if the master crashes the slave may not be fully synchronized, which can produce duplicate IDs
  • redis INCR command: depends on redis; if redis crashes, data may be lost, resulting in duplicate IDs
  • Snowflake algorithm layout - 64 bits
    • 1. The first bit is 0, indicating a positive number
    • 2. 41-bit timestamp: 2^41 ms is about 69 years (69 years, 9 months, 6 days, 15 hours, 47 minutes and 35 seconds). The time part is computed as (current timestamp - initial timestamp, where the start time is set by yourself), shifted left by 22 bits (64 - 1 - 41 = 22), because the timestamp occupies bits 2-42 of the 64 bits
    • 3. 10-bit machine ID, meaning 2^10 = 1024 machines can be deployed. It is usually split into the first 5 bits for the data center ID (dataCenterId) and the last 5 bits for the machine ID (workerId). dataCenterId identifies the machine room; you can assign 1, 2, 3, etc. by hand, as long as the values differ. workerId can be derived from the IP (for example, its last segment), which keeps it unique within the same machine room.
      dataCenterId is shifted left by 17 bits (64 - 1 - 41 - 5 = 17) and workerId by 12 bits (64 - 1 - 41 - 5 - 5 = 12); the remaining 12 bits are the sequence number, which needs no shift. You can size dataCenterId and workerId to fit your situation: for example, with only one machine room, dataCenterId can occupy just 1 bit, leaving 9 bits for workerId; in that case dataCenterId is shifted left by 21 bits (64 - 1 - 41 - 1 = 21)
    • 4. 12-bit sequence number: the same machine in the same machine room can generate 2^12 = 4096 sequence numbers (0 to 4095) within one millisecond
    • 5. Note: to stay safe under concurrency, be sure to lock when generating IDs
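A minimal single-class sketch of the 1 + 41 + 5 + 5 + 12 layout described above (the custom epoch value and the spin-wait on sequence exhaustion are illustrative assumptions, not from the original notes):

```java
// Snowflake-style ID generator: 1 sign bit + 41-bit timestamp
// + 5-bit dataCenterId + 5-bit workerId + 12-bit sequence.
public class SnowflakeSketch {
    private static final long EPOCH = 1609459200000L; // assumed custom epoch: 2021-01-01

    private final long dataCenterId; // 0..31
    private final long workerId;     // 0..31
    private long sequence = 0L;
    private long lastTimestamp = -1L;

    public SnowflakeSketch(long dataCenterId, long workerId) {
        this.dataCenterId = dataCenterId;
        this.workerId = workerId;
    }

    // synchronized: the sequence and lastTimestamp must be protected under concurrency
    public synchronized long nextId() {
        long ts = System.currentTimeMillis();
        if (ts == lastTimestamp) {
            sequence = (sequence + 1) & 0xFFF; // 12-bit sequence wraps at 4096
            if (sequence == 0) {
                // exhausted this millisecond: spin until the next one
                while (ts <= lastTimestamp) ts = System.currentTimeMillis();
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = ts;
        return ((ts - EPOCH) << 22)   // timestamp shift: 64 - 1 - 41 = 22
             | (dataCenterId << 17)   // 64 - 1 - 41 - 5 = 17
             | (workerId << 12)       // 64 - 1 - 41 - 5 - 5 = 12
             | sequence;              // low 12 bits, no shift
    }

    public static void main(String[] args) {
        SnowflakeSketch gen = new SnowflakeSketch(1, 1);
        long a = gen.nextId();
        long b = gen.nextId();
        System.out.println(a < b); // IDs are strictly increasing
    }
}
```

The shift amounts match the bit accounting in the list above; production implementations also handle clock rollback, which this sketch omits.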

Origin blog.csdn.net/growing_duck/article/details/113807542