How are floating-point numbers stored in C?

Hello everyone, I am Boom Jiabao, blogging about the C language. This post briefly covers how floating-point numbers are stored in C. (The previous post covered the storage of the integer family; please refer to the homepage for details.) C has several floating-point types: for example, float typically occupies 4 bytes and double typically occupies 8 bytes. This post is built around these two types.

According to the international standard IEEE 754 (IEEE is the Institute of Electrical and Electronics Engineers), any binary floating-point number V can be expressed in the form (-1)^S * M * 2^E. Here (-1)^S gives the sign: S=0 means V is positive and S=1 means V is negative (S is a single binary digit, so it can only be 0 or 1). M is the significand, greater than or equal to 1 and less than 2. 2^E is the power-of-two scale factor, where E is the exponent.

What does this mean? Suppose the decimal number we want to store is 5.0. Converted to binary it is 101.0, which in the form above is (-1)^0 * 1.01 * 2^2, so S=0, M=1.01, E=2. Likewise, decimal -5.0 is -101.0 in binary, so S=1, M=1.01, E=2. The blogger has to point out one thing to be clear about when studying floating-point storage: some numbers cannot be stored exactly. The reason is that the value is converted to binary for storage, and the digits after the binary point stand for 2^(-1), 2^(-2), and so on, that is, 0.5, 0.25, and so on. A decimal fraction that is not a finite sum of these powers, such as 0.1, can only be approximated.

IEEE 754 specifies that for a 32-bit floating-point number (float), the highest bit is the sign S, the next 8 bits are the exponent E, and the remaining 23 bits are the significand M. From the highest bit to the lowest, the layout is: S (1 bit), E (8 bits), M (23 bits).

For a 64-bit floating-point number (double), the highest bit is the sign S, the next 11 bits are the exponent E, and the remaining 52 bits are the significand M. From the highest bit to the lowest, the layout is: S (1 bit), E (11 bits), M (52 bits).

IEEE 754 has special rules for the significand M and the exponent E. As mentioned earlier, 1 <= M < 2, so M can always be written as 1.xxxxxx, where xxxxxx is the fractional part. Since the first digit is always 1, the standard does not store it: only the fractional part xxxxxx is saved, and the leading 1 is added back when the number is read. This saves one bit. Taking the 32-bit format as an example, only 23 bits are reserved for M, but because the leading 1 is implied, 24 significant bits can be represented.

The exponent E is more complicated. First, E is stored as an unsigned integer, so the stored value ranges over 0 to 255 for a 32-bit floating-point number and 0 to 2047 for a 64-bit one. But we know that the exponent in scientific notation can be negative. For example, decimal 0.5 is 0.1 in binary, which is 1.0 * 2^(-1), so E is -1. A negative value does not fit in an unsigned field. IEEE 754 therefore specifies that a fixed offset (the bias) is added to the real value of E before it is stored in memory. For the 8-bit E the bias is 127, and for the 11-bit E it is 1023. For example, 2^10 has E = 10, so when it is saved as a 32-bit floating-point number the exponent field holds 10 + 127 = 137, which is 10001001 in binary. This way the stored exponent is never negative.

 


Origin blog.csdn.net/m0_73321818/article/details/131058332