Checksum: concept, calculation principle, verification principle, worked example, and code (highly recommended)

Before you read: I like to keep careful notes. Every time I write a blog post I go over it repeatedly and try to keep improving its quality. This article is visible to followers only, because writing it took real effort. I hope we can learn from each other. Thank you!


Note: the main text of the article follows.

1. Background introduction

These notes record the checksum problems I ran into while doing socket programming on Windows.


2. Checksum

The TCP/IP stack uses two mechanisms to protect message integrity: the CRC and the checksum. The CRC covers the entire Ethernet frame, and its 32-bit check code occupies the last four bytes of the frame. The checksum is more widely used: it appears in layer-3 and layer-4 protocols such as IP, ICMP, TCP, and UDP.

IP, ICMP, TCP, and UDP all use the same checksum algorithm: the data is treated as a stream of 16-bit integers that are accumulated into a running sum.


3. Operation mechanism (textbook)

The versions circulating online copy one another, so skip them and go straight to Xie Xiren's "Computer Network" textbook, shown in the figures below.
[Figure 1: text description]
[Figure 2: flowchart description]


4. Calculation mechanism (summary) Highly recommended, strongly recommended! ! !

To compute the checksum of a packet when sending data, follow these steps:

1. Set the checksum field to 0;

2. Treat the data to be checked as a sequence of 16-bit numbers and add them up in order using binary one's complement arithmetic (many people online call this "one's complement summation", which is imprecise; see the notes below for the reason);

3. Invert the obtained sum and store it in the checksum field.

On the receiving side, verifying the checksum of the packet is simpler:

1. Treat the header as a sequence of 16-bit numbers and sum them with one's complement arithmetic (including the checksum field);

2. Invert the sum;

3. Check whether the result is 0. If it is 0, the checksum is correct. Otherwise the checksum is wrong and the protocol stack discards the packet.
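
The sending and receiving steps above can be sketched in C roughly as follows. This is only a minimal sketch, not code from the textbook: the function names and the `offset` parameter (the byte position of the checksum field) are made up for illustration, and words are read in network (big-endian) byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's complement sum of a buffer read as big-endian 16-bit words.
   An odd trailing byte is padded with a zero byte on its right. */
static uint16_t ones_complement_sum(const uint8_t *data, size_t len)
{
    assert(data != NULL);
    uint32_t sum = 0;
    size_t i;
    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((data[i] << 8) | data[i + 1]);
    if (i < len)
        sum += (uint32_t)(data[i] << 8);
    while (sum >> 16)                      /* add overflowed carries back in */
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Sender: zero the checksum field, sum, invert, store.
   `offset` is a hypothetical parameter naming where the field lives. */
static void fill_checksum(uint8_t *pkt, size_t len, size_t offset)
{
    assert(offset + 1 < len);
    pkt[offset] = 0;
    pkt[offset + 1] = 0;
    uint16_t c = (uint16_t)~ones_complement_sum(pkt, len);
    pkt[offset] = (uint8_t)(c >> 8);
    pkt[offset + 1] = (uint8_t)(c & 0xff);
}

/* Receiver: sum everything including the checksum field, invert,
   and accept the packet only when the result is 0. */
static int checksum_ok(const uint8_t *pkt, size_t len)
{
    return (uint16_t)~ones_complement_sum(pkt, len) == 0;
}
```

Calling `fill_checksum(pkt, len, 2)` on an ICMP message and then `checksum_ok(pkt, len)` should return nonzero; flipping any single bit afterwards should make it return 0.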

Notes:

  • 1. Understand one's complement arithmetic:

    Take addition in one's complement arithmetic as an example.

    Column by column, 0 + 0 = 0, 0 + 1 = 1, and 1 + 1 = 0 with a carry of 1 into the next column. If adding the highest bits produces a carry, that overflowed carry (which can be larger than 1 when many numbers are summed) is added back into the lowest bits of the result.

    In two's complement addition, by contrast, the overflow carry is simply discarded!

  • 2. "Sum with one's complement arithmetic, then invert" == "one's complement sum":

    For the second step of the calculation, many sources online, and even the textbook's explanation of "one's complement arithmetic", speak of "one's complement summation". This is easy to get confused by; my understanding is as follows.

    "Sum with one's complement arithmetic, then invert" takes the numbers as stored in the computer (in two's complement), adds them directly using one's complement addition, and then inverts the sum.

    "One's complement sum" inverts each stored number first and then sums, adding any carry out of the highest bit back into the lowest bit.

    Inverting first and then adding gives the same result as adding first and then inverting (apart from the two one's complement representations of zero, all 0s and all 1s, which the two orders can produce differently). So "sum with one's complement arithmetic, then invert" == "one's complement sum". Adding first and inverting once at the end usually runs faster, so implementations add first and invert last; the book describes that order too, because its description follows the code.

  • 3. Internal storage: sign-magnitude, one's complement, and two's complement:

    For how numbers are stored inside the computer, see "Sign-Magnitude, One's Complement, Two's Complement: Concepts, Internal Representation, Examples, Operations and Conversion Rules, Reasons for Use". It follows that the checksum steps given in the book really do take the stored (two's complement) values, add them with one's complement addition, and then invert the result.

  • 4. How the receiver verifies:

    Two versions circulate online: "all 0s means correct" and "all 1s means correct". Which one applies depends on the implementation, that is, on whether the final inversion is performed. Since the book says "all 0s means correct", that is the convention used here.

  • 5. Why does the receiver get all 0s when the packet is correct?

    First, note the following identity:

    A + ~A = all 1s, for example
    	01101001
    +	10010110
     --------------
        11111111
    

    For readability, assume the packet header is divided into four 16-bit words with values A, B, C, and D, and that the second word is the checksum field. Assume also that no carry overflows; readers can verify for themselves that the result still holds when one does.

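    Sender computes the checksum
    	Step 1: set the checksum field to 0 and sum in order.
    	sum = A + 0 + C + D
    	Step 2: invert.
    	check_sum = ~(A + C + D) = B
    Receiver verifies
    	Step 1: sum everything
    	A + B + C + D = (A + C + D) + [~(A + C + D)] = all 1s
    	Step 2: invert
    	~(all 1s) = 0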
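
The notes above can be checked with a few lines of C. This is a minimal sketch under my own naming (`oc_add`, `tc_add` and the other helpers are illustrative, not from the textbook). It shows the end-around carry of note 1, the order equivalence of note 2, and the `A + ~A` identity of note 5. One caveat worth hedging: the two orders in note 2 agree except when the plain sum is exactly all 1s, where they yield the two different representations of zero, so the checks below avoid that case.

```c
#include <assert.h>
#include <stdint.h>

/* 16-bit one's complement addition: a carry out of bit 15 is added
   back into the low end of the result. */
static uint16_t oc_add(uint16_t a, uint16_t b)
{
    uint32_t s = (uint32_t)a + b;
    s = (s & 0xffffu) + (s >> 16);
    return (uint16_t)s;
}

/* Two's complement (ordinary unsigned) addition drops the carry. */
static uint16_t tc_add(uint16_t a, uint16_t b)
{
    return (uint16_t)((uint32_t)a + b);
}

/* Add first, then invert. */
static uint16_t add_then_invert(uint16_t a, uint16_t b)
{
    return (uint16_t)~oc_add(a, b);
}

/* Invert first, then add. */
static uint16_t invert_then_add(uint16_t a, uint16_t b)
{
    return oc_add((uint16_t)~a, (uint16_t)~b);
}

/* Runs the checks; aborts if any of the notes does not hold. */
static void check_notes(void)
{
    /* Note 1: the carry wraps around instead of being discarded. */
    assert(oc_add(0xffff, 0x0001) == 0x0001);
    assert(tc_add(0xffff, 0x0001) == 0x0000);
    /* Note 2: both orders give the same value here. */
    assert(add_then_invert(0x6162, 0x6364) == invert_then_add(0x6162, 0x6364));
    /* Note 5: A + ~A is all 1s, so inverting gives 0. */
    assert(oc_add(0x1234, (uint16_t)~0x1234) == 0xffff);
    assert((uint16_t)~oc_add(0x1234, (uint16_t)~0x1234) == 0);
}
```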
    

5. Example of checksum calculation (highly recommended, strongly recommended!!!)

The ICMP packet captured with Wireshark is as follows:

08 00 4d 5a 00 01 00 01 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f 70 71 72 73 74 75 76 77 61 62 63 64 65 66 67 68 69

where the checksum is "4d 5a".

1. Set the checksum field to 0

08 00 00 00 00 01 00 01 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f 70 71 72 73 74 75 76 77 61 62 63 64 65 66 67 68 69

2. Group the bytes in pairs (16 bits each) and add the groups in order until the total is obtained. I write the values in hexadecimal rather than binary for readability, but keep in mind that the computer works in binary internally.

0800 + 0000 + 0001 + 0001 + 6162 + 6364 + 6566 + 6768 + 696a + 6b6c + 6d6e + 6f70 + 7172 + 7374 + 7576 + 7761 + 6263 + 6465 + 6667 + 6869 = 0x6b29f, which folds to 0xb2a5 (handle the overflow, i.e. 0xb29f + 0x6 = 0xb2a5)

3. Invert

~0xb2a5 & 0xffff --> 0x4d5a
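
The arithmetic of this example can be reproduced with a short C sketch. The helper names (`raw_sum16`, `fold16`, `check_example`) are my own and purely illustrative; the byte array is the captured packet with the checksum field already zeroed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The 40 captured ICMP bytes, checksum field (bytes 2 and 3) set to 0. */
static const uint8_t icmp_packet[40] = {
    0x08, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x01,
    0x61, 0x62, 0x63, 0x64, 0x65, 0x66, 0x67, 0x68,
    0x69, 0x6a, 0x6b, 0x6c, 0x6d, 0x6e, 0x6f, 0x70,
    0x71, 0x72, 0x73, 0x74, 0x75, 0x76, 0x77, 0x61,
    0x62, 0x63, 0x64, 0x65, 0x66, 0x67, 0x68, 0x69
};

/* Plain 32-bit sum of big-endian 16-bit words, carries not yet folded.
   This sketch only handles an even number of bytes. */
static uint32_t raw_sum16(const uint8_t *data, size_t len)
{
    assert(len % 2 == 0);
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i += 2)
        sum += (uint32_t)((data[i] << 8) | data[i + 1]);
    return sum;
}

/* Add the overflow (upper 16 bits) back into the lower 16 bits. */
static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Walks through steps 2 and 3 of the example. */
static void check_example(void)
{
    uint32_t raw = raw_sum16(icmp_packet, sizeof icmp_packet);
    assert(raw == 0x6b29f);                   /* step 2: raw total       */
    assert(fold16(raw) == 0xb2a5);            /* 0xb29f + 0x6            */
    assert((uint16_t)~fold16(raw) == 0x4d5a); /* step 3: the checksum    */
}
```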


6. Programming checksum (recommended, highly recommended!!!)

Here we take the C language as an example.

version 1

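// Compute the checksum, version 1 (based on Microsoft's Ping source code)
/*
Steps:
1. Split the message into groups of two bytes; if the total number of bytes is odd, append a zero byte at the end;
2. Add up all the two-byte groups;
3. Take the carries above bit 16 and add them back in until no carry remains;
4. Invert the checksum bitwise;

Takeaways:
1. Inverting first and then adding gives the same result as adding first and then inverting. The latter is more efficient, so implementations add first and invert at the end.
2. On handling overflow when summing: the carry can be added inside the loop (version 2) or after the whole loop has finished (version 1);
3. Carries are handled with the binary shift operators << and >>;
4. For one's complement arithmetic, see the computer networking textbook
*/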
#include <windows.h>   // USHORT, UCHAR

USHORT checksum(USHORT *buffer, int size) {
	unsigned long cksum = 0;
	while (size > 1) {
		cksum += *buffer++;
		size -= sizeof(USHORT);
	}
	if (size)               // the buffer holds an odd number of bytes
		cksum += *(UCHAR*)buffer;
	// All words are summed; now fold the carries: add the high 16 bits (overflow) to the low 16 bits (sum)
	cksum = (cksum >> 16) + (cksum & 0xffff);
	// The previous step may overflow again, so repeat it once. With overflow this adds the carry; without, it adds 0
	cksum += (cksum >> 16);
	return (USHORT)(~cksum);  // invert
}
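/*
Two points to note:
First, when accumulating 16-bit numbers, summing about 2^16 of them could overflow the 32-bit accumulator itself,
but fortunately nobody has yet proposed a protocol with messages that long, so 32-bit overflow is not a concern.

Second, adding the overflow to the sum may overflow by 1 again, so after the first high-16 plus low-16 operation, the operation has to be performed once more.
The second operation cannot overflow. (Consider the most extreme case: 16 bits of all 1s added to 16 bits of all 1s.)
*/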

version 2

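// Compute the checksum, version 2 (based on https://fasionchan.com/network/icmp/ping-c/)
/*
Takeaway:
The shift operations may look confusing; they simply shift left and right to scale the values so that they can be joined together, turning two 8-bit values into one 16-bit value
*/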
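#include <stdint.h>

uint16_t calculate_checksum(unsigned char* buffer, int bytes) {
    uint32_t checksum = 0;
    unsigned char* end = buffer + bytes;  // buffer points at the head, end points just past the tail
    // For an odd byte count, add the last byte and move end back by one
    if (bytes % 2 == 1) {
        end = buffer + bytes - 1;
        checksum += (*end) << 8;  // shift the last byte left by 8 bits, which pads a 0 byte after it, then add it
    }
    // Add word by word; the carry is handled inside the loop here, though it could also be handled outside
    while (buffer < end) {
        checksum += (buffer[0] << 8) + buffer[1];  // shift the first byte left by 8 bits and add the second, joining two 8-bit values into 16 bits
        // Add the carry (if any)
        uint32_t carry = checksum >> 16;  // shifting right by 16 bits isolates the carry so it can be added back
        if (carry != 0) {
            checksum = (checksum & 0xffff) + carry;
        }
        buffer += 2;
    }
    // Invert
    checksum = ~checksum;
    return checksum & 0xffff;
}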

Explanation:
If the binary shift operators are unfamiliar, see my earlier notes, "Understanding from the Code: Left Shift << and Right Shift >> in C".
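
As a quick reminder of what the shifts in both versions do, here is a small C sketch. The function names are hypothetical and exist only for this illustration: a left shift by 8 moves a byte into the high half of a 16-bit word, and a right shift by 16 isolates the carry of a 32-bit running sum.

```c
#include <assert.h>
#include <stdint.h>

/* Join two bytes into one 16-bit word: the high byte moves up 8 bits. */
static uint16_t join_bytes(uint8_t hi, uint8_t lo)
{
    return (uint16_t)((hi << 8) | lo);
}

/* Upper 16 bits of a 32-bit running sum: the accumulated carry. */
static uint16_t carry_part(uint32_t sum)
{
    return (uint16_t)(sum >> 16);
}

/* Lower 16 bits of a 32-bit running sum: the 16-bit sum itself. */
static uint16_t low_part(uint32_t sum)
{
    return (uint16_t)(sum & 0xffffu);
}

/* Uses values taken from the worked ICMP example. */
static void check_shifts(void)
{
    assert(join_bytes(0x4d, 0x5a) == 0x4d5a);
    assert(carry_part(0x6b29f) == 0x6);
    assert(low_part(0x6b29f) == 0xb29f);
}
```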


Reference links:
"In-depth LwIP (1): Checksum Mechanism"
"Small Vegetable Network: Developing the Ping Command in C Language"
"Implementing the Ping Command in C Language"
"How to Calculate the ICMP Checksum"

Writing this up was not easy; thank you for your support!


Origin blog.csdn.net/qq_40967086/article/details/128527141