C language: how floating-point data is stored in the computer

Floating-point data includes float, double, and long double types.

Let's take a look at an example of floating-point number storage:
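The original screenshot is missing here. Judging from the discussion that follows, it showed a small program along these lines, which stores the integer 9 and then reads the same four bytes back as a float (and vice versa). This is a reconstruction of that example, not the original code:

```c
#include <stdio.h>

int main(void)
{
    int n = 9;
    float *pFloat = (float *)&n;   /* view the same 4 bytes as a float */

    printf("n as int:   %d\n", n);         /* 9          */
    printf("n as float: %f\n", *pFloat);   /* 0.000000   */

    *pFloat = 9.0f;                /* now store a float in those bytes */
    printf("n as int:   %d\n", n);         /* 1091567616 */
    printf("n as float: %f\n", *pFloat);   /* 9.000000   */
    return 0;
}
```

(Strictly speaking, reading through a cast pointer like this breaks C's aliasing rules; memcpy or a union is the well-defined way to reinterpret bits, but the pointer version is the classic form of this teaching example.)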
As the example shows, the same 32 bits produce completely different results depending on whether they are interpreted as an integer or as a floating-point number. To understand why, we first need to know how floating-point types are represented inside the computer.
According to the international standard IEEE (Institute of Electrical and Electronics Engineers) 754, any binary floating-point number V can be expressed in the following form:
V = (-1)^S * M * 2^E
(-1)^S is the sign bit: when S = 0, V is positive; when S = 1, V is negative;
M is the significand, greater than or equal to 1 and less than 2;
2^E is the exponent part.

For example, decimal 5.0 written in binary is 101.0, which is equivalent to 1.01 * 2^2. Matching the format above, we get S = 0, M = 1.01, E = 2.
Decimal -5.0 written in binary is -101.0, whose magnitude is 1.01 * 2^2, so S = 1, M = 1.01, E = 2.
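As a quick sanity check (my addition, not from the original article): binary 1.01 is 1 + 0/2 + 1/4 = 1.25 in decimal, and the standard library's ldexp scales a number by a power of two:

```c
#include <stdio.h>
#include <math.h>

int main(void)
{
    /* M = 1.01 (binary) = 1.25 (decimal), E = 2 */
    printf("%f\n", ldexp(1.25, 2));    /* 5.000000  */
    printf("%f\n", -ldexp(1.25, 2));   /* -5.000000 */
    return 0;
}
```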

IEEE 754 stipulates: for a 32-bit floating-point number, the highest bit is the sign bit S, the next 8 bits are the exponent E, and the remaining 23 bits are the significand M. For a 64-bit floating-point number, the highest bit is the sign bit S, the next 11 bits are the exponent E, and the remaining 52 bits are the significand M.
[Figure: bit layout of 32-bit and 64-bit floating-point numbers]
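To make the layout concrete, here is a small sketch (my illustration, not the author's code) that pulls the three fields out of a 32-bit float with shifts and masks; memcpy is used to reinterpret the bits portably:

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    float f = 5.0f;                    /* 101.0 in binary = 1.01 * 2^2    */
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);    /* the same 32 bits, as an integer */

    uint32_t s = bits >> 31;           /* 1 sign bit          */
    uint32_t e = (bits >> 23) & 0xFF;  /* 8 exponent bits     */
    uint32_t m = bits & 0x7FFFFF;      /* 23 significand bits */

    printf("S = %u, stored E = %u, stored M = 0x%06X\n",
           (unsigned)s, (unsigned)e, (unsigned)m);
    /* prints: S = 0, stored E = 129, stored M = 0x200000           */
    /* 129 = 2 + 127, and 0x200000 = binary 01 followed by 21 zeros */
    /* -- both encodings are explained below                        */
    return 0;
}
```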
IEEE 754 also has some special rules for the significand M and the exponent E. As mentioned above, 1 <= M < 2, so M can always be written in the form 1.xxxxxx, where xxxxxx is the fractional part. IEEE 754 stipulates that when the computer stores M, the leading 1 is taken as given and omitted: only the fractional part after the point is kept, and the 1 is added back when the number is read out. For example, when storing 1.01, only 01 is kept. The purpose is to gain one bit of precision for free: taking a 32-bit float as an example, only 23 bits are left for M, but with the implicit leading 1 we effectively get 24 significant bits.
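A sketch of how the implicit 1 is put back when a normal float is decoded (my illustration, assuming E is neither all 0s nor all 1s):

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <math.h>

int main(void)
{
    float f = 5.0f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);

    uint32_t s = bits >> 31;
    uint32_t e = (bits >> 23) & 0xFF;
    uint32_t m = bits & 0x7FFFFF;

    /* M = 1.xxxxxx: prepend the implicit 1 to the 23 stored fraction bits */
    double M = 1.0 + m / 8388608.0;                 /* 8388608 = 2^23 */
    double v = (s ? -1.0 : 1.0) * M * pow(2.0, (int)e - 127);
    printf("%f\n", v);                              /* 5.000000       */
    return 0;
}
```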

As for the exponent E, the situation is more complicated.
First, E is stored as an unsigned integer. This means that if E occupies 8 bits, its stored value ranges from 0 to 255, and if E occupies 11 bits, its stored value ranges from 0 to 2047. But we know that the exponent in scientific notation can be negative, so IEEE 754 stipulates that a bias must be added to the true exponent before it is stored: for an 8-bit E, add 127; for an 11-bit E, add 1023.
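For instance, 1.0 is 1.0 * 2^0, so its true exponent 0 is stored as 0 + 127 = 127 (a minimal check, my addition):

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    float f = 1.0f;                        /* 1.0 * 2^0     */
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    printf("%u\n", (unsigned)((bits >> 23) & 0xFF));   /* 127 = 0 + 127 */
    return 0;
}
```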

When E is read back out of memory, there are three cases:
(1) E is neither all 0s nor all 1s.
In this case, the floating-point number is interpreted by the ordinary rule: subtract 127 (or 1023) from the stored exponent E to get the true exponent, and prepend the implicit 1 to the significand M. For example: 0.5 in binary is 0.1. Since the integer part of M must be 1, the point is moved one place to the right, giving 1.0 * 2^(-1). The stored exponent is therefore -1 + 127 = 126, represented as 01111110; and since the significand 1.0 leaves nothing after the implicit 1 is dropped, the fractional part is 0, padded with 0s out to 23 bits.
Then its binary representation is: 0 01111110 00000000000000000000000
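This pattern can be verified directly (my check):

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    float f = 0.5f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);

    /* print the 32 bits grouped as sign | exponent | significand */
    for (int i = 31; i >= 0; i--) {
        putchar('0' + (int)((bits >> i) & 1));
        if (i == 31 || i == 23) putchar(' ');
    }
    putchar('\n');   /* prints: 0 01111110 00000000000000000000000 */
    return 0;
}
```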
(2) E is all 0s.
In this case, the true exponent of the floating-point number is 1 - 127 = -126 (or 1 - 1023 = -1022), and the significand M is no longer given an implicit leading 1; instead it is treated as the fraction 0.xxxxxx. This convention makes it possible to represent plus and minus 0, as well as very small (denormal) numbers close to 0.
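The smallest positive denormal makes this concrete: with E all 0s and only the lowest significand bit set, the value is 2^(-23) * 2^(-126) = 2^(-149) (my illustration):

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    uint32_t bits = 1;     /* sign 0, E all 0s, M = lowest bit only */
    float f;
    memcpy(&f, &bits, sizeof f);
    printf("%e\n", f);     /* about 1.401298e-45, i.e. 2^(-149)     */
    return 0;
}
```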

(3) E is all 1s.
When E is all 1s, if the significand bits are all 0, the value is plus or minus infinity (according to the sign bit S); if the significand bits are not all 0, the value is NaN (not a number).
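A quick demonstration (my addition): E all 1s with M all 0s gives infinity, and E all 1s with M nonzero gives NaN:

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    uint32_t inf_bits = 0x7F800000;   /* S=0, E all 1s, M all 0s */
    uint32_t nan_bits = 0x7F800001;   /* S=0, E all 1s, M != 0   */
    float f;

    memcpy(&f, &inf_bits, sizeof f);
    printf("%f\n", f);                /* inf */

    memcpy(&f, &nan_bits, sizeof f);
    printf("%f\n", f);                /* nan */
    return 0;
}
```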

Now let's explain the example at the beginning:
(1) Why does the integer 9, reinterpreted as a float, print as 0.000000? First, the binary representation of 9 stored in the computer is: 0000 0000 0000 0000 0000 0000 0000 1001. Read as a float, the sign bit is 0 (a positive number), the next 8 bits are the exponent E, and the last 23 bits are the significand. Here E is all 0s, so this is a denormal number extremely close to 0 (its value is 9 * 2^(-149)), which is why the example outputs 0.000000.
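printf's %f simply rounds this tiny denormal to six decimal places; printing with %e shows that it is not exactly zero (my check):

```c
#include <stdio.h>
#include <string.h>

int main(void)
{
    int n = 9;
    float f;
    memcpy(&f, &n, sizeof f);   /* reinterpret the int's bits as a float */
    printf("%f\n", f);          /* 0.000000                              */
    printf("%e\n", f);          /* about 1.261169e-44 = 9 * 2^(-149)     */
    return 0;
}
```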
(2) Why does the floating-point number 9.0, output as an integer, become such a large, seemingly random number?
First, the binary representation of the floating-point number 9.0 is 1001.0, i.e. 1.001 * 2^3, so S = 0, E = 3 + 127 = 130, M = 1.001;
its 32-bit pattern is therefore: 0 10000001 001 0000 0000 0000 0000 0000
and this 32-bit pattern, read as a decimal integer, is exactly 1091567616.
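And checking the other direction (my verification):

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    float f = 9.0f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    printf("%u\n", (unsigned)bits);       /* 1091567616 */
    printf("0x%08X\n", (unsigned)bits);   /* 0x41100000 */
    return 0;
}
```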

Origin blog.csdn.net/m0_46551861/article/details/106382206