C language - floating point storage

In the previous post I covered how integer types are stored, so this article covers how floating-point types are stored.


Contents

Floating point storage rules

Storage rules

Retrieval rules

Example

Epilogue


Floating point storage rules

According to the international standard IEEE 754 (IEEE is the Institute of Electrical and Electronics Engineers), any binary floating-point number V can be represented in the following form: (-1)^S * M * 2^E
(1) (-1)^S represents the sign. When S=0, V is a positive number; when S=1, V is a negative number.
(2) M represents the significand, which is greater than or equal to 1 and less than 2.
(3) 2^E represents the exponent part.
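
As a rough illustration of the formula, the sketch below evaluates (-1)^S * M * 2^E for given S, M and E. The function name compose is hypothetical and only used for this example. For instance, 5.5 is 101.1 in binary, that is 1.011 * 2^2, so S=0, M=1.375 (binary 1.011) and E=2.

```c
/* Evaluate (-1)^S * M * 2^E directly.
 * compose() is a hypothetical helper name used only for illustration. */
double compose(int s, double m, int e)
{
    double v = m;

    /* Multiply or divide by 2 once per unit of the exponent. */
    for (; e > 0; e--)
        v *= 2.0;
    for (; e < 0; e++)
        v /= 2.0;

    /* S=0 gives a positive value, S=1 a negative one. */
    return s ? -v : v;
}
```

Calling compose(0, 1.375, 2) gives 5.5, and compose(1, 1.0, -1) gives -0.5.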

Storage rules

IEEE 754 states:
For a 32-bit floating-point number, the most significant bit is the sign bit S, the next 8 bits are the exponent E, and the remaining 23 bits are the significand M.

For a 64-bit floating-point number, the most significant bit is the sign bit S, the next 11 bits are the exponent E, and the remaining 52 bits are the significand M.
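
The 32-bit layout can be sketched in C by copying a float into a 32-bit integer and masking out the three fields. This assumes that float is a 32-bit IEEE 754 value on the platform, which the C standard does not strictly guarantee. The names float_fields and split_float are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* The three fields of a 32-bit float, exactly as stored. */
typedef struct {
    uint32_t sign;      /* 1 bit   */
    uint32_t exponent;  /* 8 bits  */
    uint32_t fraction;  /* 23 bits */
} float_fields;

/* Assumes float is a 32-bit IEEE 754 value. */
float_fields split_float(float f)
{
    uint32_t bits;
    float_fields out;

    /* memcpy reinterprets the bytes without breaking aliasing rules. */
    memcpy(&bits, &f, sizeof bits);

    out.sign     = bits >> 31;            /* top bit          */
    out.exponent = (bits >> 23) & 0xFFu;  /* next 8 bits      */
    out.fraction = bits & 0x7FFFFFu;      /* lowest 23 bits   */
    return out;
}
```

For 5.5f this yields sign 0, exponent 129 and fraction 0x300000.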

IEEE 754 also has some special provisions for the significand M and the exponent E.
The range of M is 1≤M<2, that is, M can be written in the form 1.xxxxxx, where xxxxxx is the fractional part.
IEEE 754 stipulates that, because the first digit of M is always 1, it is not stored: only the xxxxxx part is saved. For example, when saving 1.01, only 01 is saved, and the leading 1 is put back when the value is read. This saves one significant bit. Taking a 32-bit floating-point number as an example, only 23 bits are reserved for M, but because the leading 1 is not stored, 24 significant bits can be represented.
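
Restoring the leading 1 can be sketched as follows for the 32-bit case: the 23 stored bits are treated as a fraction of 2^23 and added to 1. The function name significand_from_fraction is hypothetical.

```c
#include <stdint.h>

/* Rebuild M = 1.xxxxxx from the 23 stored fraction bits of a 32-bit float.
 * Hypothetical helper, valid only for normal numbers (leading 1 implied). */
double significand_from_fraction(uint32_t fraction)
{
    /* 8388608 is 2^23, the weight of the implied leading 1. */
    return 1.0 + (double)fraction / 8388608.0;
}
```

For the 1.01 example above, the stored bits are 01 followed by 21 zeros (0x200000), and the function returns 1.25, which is 1.01 in binary.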
The exponent E is more complicated.
First, E is stored as an unsigned integer (unsigned int).
This means that if E is 8 bits, its range is 0~255; if E is 11 bits, its range is 0~2047. However, the exponent in scientific notation can be negative, so IEEE 754 stipulates that a bias (an intermediate number) must be added to the real value of E when it is stored in memory. For an 8-bit E the bias is 127; for an 11-bit E it is 1023. For example, the E of 2^10 is 10, so when saved in a 32-bit floating-point number it must be stored as 10+127=137, that is, 10001001.
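
The bias arithmetic can be checked with a small sketch. It assumes float is a 32-bit IEEE 754 value; exponent_bias and stored_exponent are hypothetical helper names.

```c
#include <stdint.h>
#include <string.h>

/* Bias for an exponent field of the given width:
 * 127 for 8 bits, 1023 for 11 bits. */
int exponent_bias(int field_bits)
{
    return (1 << (field_bits - 1)) - 1;
}

/* Exponent field as actually stored in a float
 * (assumes float is a 32-bit IEEE 754 value). */
unsigned stored_exponent(float f)
{
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits);
    return (unsigned)((bits >> 23) & 0xFFu);
}
```

Since 1024.0f is 2^10, stored_exponent(1024.0f) returns 137, which matches 10 + exponent_bias(8).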

Retrieval rules

E is neither all 0s nor all 1s
In this case, the floating-point number is read back by the following rule: 127 (or 1023) is subtracted from the stored value of the exponent E to obtain the real exponent, and the leading 1 is put back in front of the significand M.
For example:
The binary form of 0.5 (1/2) is 0.1. Since the integer part must be 1, the binary point is moved one place to the right, giving 1.0*2^(-1). Its biased exponent is -1+127=126, which is stored as 01111110. The significand 1.0 with its integer part removed is 0, which is padded with zeros to 23 bits as 00000000000000000000000, so its binary representation is:
0 01111110 00000000000000000000000
(S) (E)    (M)
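
The 0.5 example can be reproduced with a short sketch that formats a float as "S E M" bit groups. It assumes float is a 32-bit IEEE 754 value, and the function name float_to_bitstring is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Write the bits of f as "S EEEEEEEE MMM...M" into out.
 * out needs room for 32 bits + 2 spaces + the terminating NUL = 35 chars.
 * Assumes float is a 32-bit IEEE 754 value. */
void float_to_bitstring(float f, char out[35])
{
    uint32_t bits;
    int pos = 0;

    memcpy(&bits, &f, sizeof bits);

    for (int i = 31; i >= 0; i--) {
        out[pos++] = ((bits >> i) & 1u) ? '1' : '0';
        /* Separate the sign bit (31) and the end of the exponent (bit 23). */
        if (i == 31 || i == 23)
            out[pos++] = ' ';
    }
    out[pos] = '\0';
}
```

Passing 0.5f fills the buffer with 0 01111110 00000000000000000000000, the same pattern derived by hand above.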
E is all 0s
In this case, the real exponent of the floating-point number is 1-127 (or 1-1023), and the leading 1 is no longer put back in front of the significand M; M is read as the fraction 0.xxxxxx instead. This makes it possible to represent ±0 and very small numbers close to 0.
E is all 1s
In this case, if the bits of the significand M are all 0, the value is ±infinity (the sign depends on the sign bit S).
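
The three cases can be put together in one sketch that decodes a raw 32-bit pattern by hand. The function name decode_float_bits is hypothetical. The branch for E all 1s with a nonzero M returns NaN, which is what IEEE 754 specifies for that pattern although the text above only discusses the all-zero M case.

```c
#include <math.h>    /* only for the INFINITY and NAN macros */
#include <stdint.h>

/* Decode a 32-bit IEEE 754 bit pattern by hand (hypothetical helper). */
double decode_float_bits(uint32_t bits)
{
    int sign = (int)(bits >> 31);
    int stored_e = (int)((bits >> 23) & 0xFFu);
    uint32_t fraction = bits & 0x7FFFFFu;
    double value;
    int e;

    if (stored_e == 0xFF) {
        /* E is all 1s: infinity if M is all 0, NaN otherwise. */
        if (fraction != 0)
            return NAN;
        return sign ? -INFINITY : INFINITY;
    }

    if (stored_e == 0) {
        /* E is all 0s: exponent is 1-127, no leading 1 (0.xxxxxx). */
        e = 1 - 127;
        value = fraction / 8388608.0;        /* 8388608 = 2^23 */
    } else {
        /* Normal case: subtract the bias and restore the leading 1. */
        e = stored_e - 127;
        value = 1.0 + fraction / 8388608.0;
    }

    /* Scale by 2^e without needing the math library. */
    for (; e > 0; e--)
        value *= 2.0;
    for (; e < 0; e++)
        value /= 2.0;

    return sign ? -value : value;
}
```

For example, decode_float_bits(0x3F000000u) returns 0.5 and decode_float_bits(0x7F800000u) returns positive infinity.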

Example

When a variable a holding 5.5 is viewed in the debugger's memory window, the first line of the window, at the address of a, shows the form in which 5.5 is stored in memory.
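
The original code for this example is not available here, so the following is only a sketch under the assumption that a was declared as float a = 5.5f and that float is a 32-bit IEEE 754 value. By the rules above, 5.5 is 101.1 in binary, that is 1.011 * 2^2, so S=0, the stored E is 2+127=129 (10000001) and M is stored as 011 followed by 20 zeros. Together that is 0 10000001 01100000000000000000000, or 0x40B00000. The function names float_bits and dump_float_bytes are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Return the raw bit pattern of f (assumes a 32-bit IEEE 754 float). */
uint32_t float_bits(float f)
{
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Print the bytes of f in increasing address order,
 * the way a debugger's memory window lists them. */
void dump_float_bytes(float f)
{
    unsigned char bytes[sizeof f];

    memcpy(bytes, &f, sizeof f);
    for (size_t i = 0; i < sizeof f; i++)
        printf("%02x ", bytes[i]);
    printf("\n");
}
```

float_bits(5.5f) returns 0x40B00000. On a little-endian machine such as x86, dump_float_bytes(5.5f) lists the bytes in the order 00 00 b0 40, because the least significant byte sits at the lowest address.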

 


Epilogue

This article is text-heavy, but it is understandable as long as you read and think it through carefully. Learning this material is also a way of improving yourself, so although it is not used very often, it is still worth understanding.

Origin blog.csdn.net/m0_64607843/article/details/123215151