Hamming code for data verification

Preface

In the computer world, all data exists in binary form. The Internet lets computers communicate by transferring data to one another. But how can one computer be sure that the data it sends to another has not had bits flipped by network fluctuations along the way? To ensure that the data each computer receives is accurate, we need a verification mechanism that checks the correctness of the transmitted data.

 

Parity check

Suppose computer A needs to transmit the binary data 1010 to computer B. To ensure that B can tell whether the data was corrupted, A and B agree: "I will add one bit (the check bit) in front of the original data (currently 1010), chosen so that the number of 1s in the whole frame is even, and then send it to you. When you receive it, count the 1s in the whole frame. If the count is not even, a bit has flipped."

  • A takes the original data 1010, which already contains two 1s, so the check bit is 0. The full data becomes 01010 and is sent to B.
  • Suppose the last bit flips during transmission. B receives 01011, finds that the number of 1s is odd, and knows that the data is wrong.

When the check bit makes the number of 1s even, as A did above, the scheme is called even parity. If both parties instead agree that an odd number of 1s is correct, it is called odd parity. Both are forms of parity check. Their shortcomings are obvious. First, a parity check only catches an odd number of flipped bits, in practice a single flip. For example, if the rightmost two bits of 01010 flip in transit, giving 01001, the number of 1s is still even, so B cannot tell that the data is wrong. Second, a parity check can only tell whether the received data as a whole is correct; it cannot indicate which bit is in error.
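The even parity scheme and both of its limits can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the function names `add_even_parity` and `passes_even_parity` are made up for this example, and frames are represented as strings of '0' and '1'.

```python
def add_even_parity(bits: str) -> str:
    """Prepend a check bit so that the total number of 1s is even."""
    check = bits.count("1") % 2
    return str(check) + bits


def passes_even_parity(frame: str) -> bool:
    """Return True when the frame contains an even number of 1s."""
    return frame.count("1") % 2 == 0


print(add_even_parity("1010"))      # 01010
print(passes_even_parity("01011"))  # False: one flipped bit is detected
print(passes_even_parity("01001"))  # True: two flipped bits go unnoticed
```

The last line reproduces the failure case from the text: two flips leave the count of 1s even, so the check passes even though the data is wrong.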

 

Hamming code

Hamming code is a more advanced check method built on parity checking. It can not only judge whether the received data is wrong but also point out which bit is wrong. Its limitation is that it can only locate a single flipped bit. If several bits change at once, it can no longer identify them reliably. In practice, however, when data is corrupted it is most likely that only one bit has changed, so the Hamming code has its own application scenarios.

 

Verification principle

Suppose A has the data 1010 to send to B. To make error correction possible, A introduces three check bits a, b, and c and arranges them as follows:

                                  1  0  1  c  0  b  a

                                  7  6  5  4  3  2  1

Why three check bits are introduced, and why they are placed in this order, is explained later. The second row above gives the serial number (position) of each bit.

What is the XOR operation? The XOR of two bits is 0 when they are the same and 1 when they differ (for example, 1 XOR 1 = 0 and 1 XOR 0 = 1). ⊕ is the XOR operator.
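As a quick illustration, Python writes XOR as `^`. The helper `xor_all` below is a hypothetical name introduced only for this sketch; it shows that XOR-ing several bits gives 1 exactly when an odd number of them are 1, which is why XOR can stand in for an even parity check.

```python
from functools import reduce
from operator import xor

print(1 ^ 1, 1 ^ 0, 0 ^ 0)  # 0 1 0


def xor_all(bits):
    """XOR a sequence of bits together, starting from 0."""
    return reduce(xor, bits, 0)


print(xor_all([0, 1, 1]))  # 0: an even number of 1s
print(xor_all([1, 1, 1]))  # 1: an odd number of 1s
```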

 

 

                                          

  • First draw three overlapping circles and write the serial numbers of a, b, and c, which are 1, 2, and 4, in the yellow parts. Then add the numbers of each pair of circles and write the sum in the region the two share: 1 + 2 = 3 between circles 1 and 2, 1 + 4 = 5 between circles 1 and 4, and 2 + 4 = 6 between circles 2 and 4. The central region shared by all three circles is marked 7.
  • This divides 1 0 1 c 0 b a into three groups, one per circle. Positions 1, 3, 5, and 7 form the first group, where position 1 holds check bit a.
  • The check bits therefore cover the following positions: a - 3 5 7, b - 3 6 7, c - 5 6 7.
  • The value of each check bit is the XOR of the bits in its group. So a = (No. 3 XOR No. 5 XOR No. 7) = (0 XOR 1 XOR 1) = 0, and similarly b = 1 and c = 0. The entire data becomes 1010010.
  • A now sends 1010010 to B. Since A and B have agreed in advance on how the check bits are computed and where they are placed, B can parse the data directly.
  • B maps the data back onto 1 0 1 c 0 b a, where c b a equals 0 1 0, and draws the same circles as above according to the serial numbers.
  • The serial number of a is 1. B XORs a with the values at No. 3, No. 5, and No. 7 (equivalent to an even parity check) and checks whether the result is 0 to determine whether there is an error in group a.
  • The value of a was obtained by A at the sending end as the XOR of No. 3, No. 5, and No. 7. When B XORs the received a with No. 3, No. 5, and No. 7, the result should be 0 under normal circumstances. If it is not 0, one of the four bits a, 3, 5, 7 in group a has flipped, and the whole of group a is marked as in error. B then performs the same check on group b and group c. At this step B can tell that the data is wrong as soon as any group fails, but it cannot yet correct the error.
  • B's three groups, named after the serial numbers of their check bits, are: a - 3 5 7 (group 1), b - 3 6 7 (group 2), c - 5 6 7 (group 4). Suppose B finds that only group 1 (group a) is wrong. Since Hamming code checking assumes that only one bit has flipped, groups 2 and 4 must be normal, so positions 2, 3, 6, 7 of group 2 and positions 4, 5, 6, 7 of group 4 are all correct. The only remaining possibility is that position 1 of group 1 is wrong. By the same reasoning, if B finds groups 1 and 2 both wrong, group 4 must be normal, so positions 4, 5, 6, 7 are error-free. The intersection of group 1 and group 2 is positions 3 and 7, and 7 has been shown to be error-free, so the bit at position 3 must be wrong. This is the principle of Hamming code error correction.
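The elimination argument in the last step can be written out with plain set operations. The sketch below assumes, as the text does, that at most one bit has flipped; `GROUPS` and `locate_error` are illustrative names, and each group here includes its own check bit position.

```python
# Positions covered by each group, keyed by the group's number.
GROUPS = {
    1: {1, 3, 5, 7},  # group a
    2: {2, 3, 6, 7},  # group b
    4: {4, 5, 6, 7},  # group c
}


def locate_error(failed_groups):
    """Return the flipped position given the set of failing group numbers.

    Returns None when no group failed. Assumes a single-bit error.
    """
    if not failed_groups:
        return None
    suspects = set(range(1, 8))
    for number, members in GROUPS.items():
        if number in failed_groups:
            suspects &= members   # the error lies inside every failing group
        else:
            suspects -= members   # every bit of a passing group is correct
    (position,) = suspects        # exactly one candidate remains
    return position


print(locate_error({1}))     # 1
print(locate_error({1, 2}))  # 3
```

Both printed cases match the reasoning above: a failure in group 1 alone points at position 1, and failures in groups 1 and 2 point at position 3.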

From the whole process above, the core of error correction is to add 3 check bits and group the data according to them. Because the groups overlap, once one or more groups report an error at the same time, you can quickly infer which bit is wrong, and so correct it. How to decide how many check bits to add, and how to group the data by check bit, become the next key questions.

 

Verification in practice

User A needs to send the original data 1 0 1 0 to user B. How do we calculate the number of check bits from the number of original data bits?

Assume the original data has n bits and there are k check bits; the check bits can then represent 2^k different situations. The data passed to B has n + k bits. On the premise that at most one bit flips, there are n + k cases in which one of the n + k bits is wrong, plus one case in which the data is completely correct. 2^k must be able to express all of these situations, which gives the following inequality:

                                                                       2^k >= n+k+1
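The smallest k that satisfies this inequality can be found by simple search. The function name `check_bits_needed` is an assumption made for this sketch.

```python
def check_bits_needed(n: int) -> int:
    """Smallest k such that 2**k >= n + k + 1 for n data bits."""
    k = 0
    while 2 ** k < n + k + 1:
        k += 1
    return k


for n in (1, 4, 8, 11):
    print(n, check_bits_needed(n))
# 1 2
# 4 3
# 8 4
# 11 4
```

For n = 4 the search stops at k = 3, since 2^3 = 8 = 4 + 3 + 1.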

With n = 4 original data bits, the smallest k satisfying the inequality is k = 3. Call the three check bits k3, k2, and k1 and merge them into the original data. Where should they be placed? Notice a pattern in the binary forms of the powers of 2: 2^0 = 001, 2^1 = 010, 2^2 = 100, 2^3 = 1000. 2^0 has a 1 only in the first digit (counting from the right), 2^1 only in the second, and 2^2 only in the third. We want k1 to be responsible for every position whose serial number has a 1 in the first digit, k2 for every position whose serial number has a 1 in the second digit, and so on. So let the serial number of k1 be 2^0 = 1, that of k2 be 2^1 = 2, and that of k3 be 2^2 = 4. (In general, the serial number of kn is 2^(n-1).)

According to the above calculation, the data can be arranged as follows (and the serial number is marked):

                                                                 1 0 1 k3 0 k2 k1

                                                                 7  6  5   4   3   2    1

1. Now group the data as described above:

k1 is responsible for the positions whose serial number, written in binary, has a 1 in the first digit (counting from the right). For example, No. 3 (011), No. 5 (101), and No. 7 (111) all end in 1. k2 and k3 are handled the same way using the second and third digits. The grouping is as follows:

                        k1 - 3 5 7

                        k2 - 3 6 7

                        k3 - 5 6 7
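The grouping rule amounts to a bitwise AND between each serial number and the serial number of the check bit. A minimal sketch, with `group_for` as a hypothetical helper name:

```python
def group_for(check_position: int, total_bits: int) -> list[int]:
    """Data positions covered by the check bit at check_position.

    check_position must be a power of two (1, 2, 4, ...).
    """
    return [
        pos
        for pos in range(1, total_bits + 1)
        if pos & check_position and pos != check_position
    ]


for name, check_position in (("k1", 1), ("k2", 2), ("k3", 4)):
    print(name, group_for(check_position, 7))
# k1 [3, 5, 7]
# k2 [3, 6, 7]
# k3 [5, 6, 7]
```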

2. With the grouping complete, calculate the values of k1, k2, and k3:

                        k1 = No. 3 ⊕ No. 5 ⊕ No. 7 = 0 ⊕ 1 ⊕ 1 = 0

                        k2 = No. 3 ⊕ No. 6 ⊕ No. 7 = 0 ⊕ 0 ⊕ 1 = 1

                        k3 = No. 5 ⊕ No. 6 ⊕ No. 7 = 1 ⊕ 0 ⊕ 1 = 0

3. Then the final data is  1 0 1 0 0 1 0, and A sends this piece of data to B
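Steps 1 to 3 can be combined into one small encoder. This is a sketch under the conventions used in this article (serial numbers start at 1 on the right, check bits sit at the power-of-two positions, even parity via XOR); the name `hamming_encode` and the string representation are assumptions for illustration.

```python
def hamming_encode(data: str) -> str:
    """Encode data bits (written highest position first) as a Hamming codeword."""
    n = len(data)
    k = 0
    while 2 ** k < n + k + 1:      # smallest k with 2^k >= n + k + 1
        k += 1
    total = n + k

    # code[pos] is the bit at serial number pos; index 0 is unused.
    code = [0] * (total + 1)
    data_bits = (int(ch) for ch in reversed(data))
    for pos in range(1, total + 1):
        if pos & (pos - 1):        # not a power of two, so a data position
            code[pos] = next(data_bits)

    # Each check bit is the XOR of the data bits in its group.
    for i in range(k):
        check_pos = 1 << i
        for pos in range(1, total + 1):
            if pos & check_pos and pos != check_pos:
                code[check_pos] ^= code[pos]

    return "".join(str(code[pos]) for pos in range(total, 0, -1))


print(hamming_encode("1010"))  # 1010010
```

The output agrees with the hand calculation: k1 = 0, k2 = 1, k3 = 0 at positions 1, 2, and 4.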

4. B receives the data, computes k = 3 from the same formula 2^k >= n+k+1, and knows from the rules agreed with A where k1, k2, and k3 are located, so it reads off k1 = 0, k2 = 1, k3 = 0.

5. Now B starts to verify the data by performing the following three sets of operations:

                 k1 ⊕ No. 3 ⊕ No. 5 ⊕ No. 7 = 0 ⊕ 0 ⊕ 1 ⊕ 1 = 0

                 k2 ⊕ No. 3 ⊕ No. 6 ⊕ No. 7 = 1 ⊕ 0 ⊕ 0 ⊕ 1 = 0

                 k3 ⊕ No. 5 ⊕ No. 6 ⊕ No. 7 = 0 ⊕ 1 ⊕ 0 ⊕ 1 = 0

In the normal, error-free case, all three results are 0. If only the first group's result is 1, then the second and third groups are fine, so No. 3, No. 5, No. 6, No. 7, k2, and k3 are all correct, and it follows that k1 has flipped. If the first and second groups' results are both 1, then k3, No. 5, No. 6, and No. 7 are correct; the intersection of the first and second groups is No. 3 and No. 7, and since No. 7 is correct, No. 3 must have flipped.

There is also a simpler way to locate the error. If the first and second groups both give 1, write the three results from the third group down to the first as 011. Read as a binary number this equals 3, indicating that bit No. 3 is wrong. In this way both error detection and error correction are achieved.
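The shortcut of reading the three group results as a binary number can be sketched as follows. The function name `hamming_correct` is hypothetical, and the code assumes the layout used in this article and at most one flipped bit; with more flips, the position it reports is not trustworthy.

```python
def hamming_correct(received: str) -> tuple[str, int]:
    """Return (corrected codeword, error position); position 0 means no error."""
    total = len(received)
    # bits[pos] is the bit at serial number pos; index 0 is unused.
    bits = [0] + [int(ch) for ch in reversed(received)]

    syndrome = 0
    check_pos = 1
    while check_pos <= total:
        parity = 0
        for pos in range(1, total + 1):
            if pos & check_pos:    # includes the check bit itself
                parity ^= bits[pos]
        if parity:                 # this group failed its check
            syndrome += check_pos
        check_pos <<= 1

    if 1 <= syndrome <= total:
        bits[syndrome] ^= 1        # flip the offending bit back
    corrected = "".join(str(bits[pos]) for pos in range(total, 0, -1))
    return corrected, syndrome


print(hamming_correct("1010010"))  # ('1010010', 0)
print(hamming_correct("1010110"))  # ('1010010', 3)
```

In the second call, bit No. 3 was flipped in transit; groups 1 and 2 fail, the combined result is 011 = 3, and flipping that bit restores the original codeword.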

 

 


Origin blog.csdn.net/brokenkay/article/details/107591797