In my previous blog post I introduced how integer types are stored in memory, so in this article let's look at how floating-point types are stored.
Floating point storage rules
According to the international standard IEEE 754 (IEEE is the Institute of Electrical and Electronics Engineers), any binary floating-point number V can be written in the form
V = (-1)^S × M × 2^E
(1) (-1)^S is the sign: when S = 0, V is positive; when S = 1, V is negative.
(2) M is the significand, which is greater than or equal to 1 and less than 2.
(3) 2^E is the exponent part, where E is the exponent.
Storage rules
IEEE 754 specifies that for a 32-bit floating-point number, the most significant bit is the sign bit S, the next 8 bits are the exponent E, and the remaining 23 bits are the significand M.
For a 64-bit floating-point number, the most significant bit is the sign bit S, the next 11 bits are the exponent E, and the remaining 52 bits are the significand M.
IEEE 754 also has some special rules for the significand M and the exponent E. Since 1 ≤ M < 2, M can always be written as 1.xxxxxx, where xxxxxx is the fractional part. IEEE 754 therefore stipulates that when M is stored in the computer, the leading 1 is omitted and only the fractional part xxxxxx is saved. For example, to store 1.01 only 01 is saved, and the leading 1 is added back when the number is read. This saves one bit: taking a 32-bit floating-point number as an example, only 23 bits are left for M, but because the leading 1 is implied, the significand effectively has 24 significant bits.
The exponent E is more complicated. First, E is stored as an unsigned integer (unsigned int). This means that if E has 8 bits, its range is 0~255; if E has 11 bits, its range is 0~2047. However, the exponent in scientific notation can be negative, so IEEE 754 stipulates that a fixed bias must be added to the true value of E before it is stored in memory. For an 8-bit E the bias is 127; for an 11-bit E the bias is 1023. For example, the exponent of 2^10 is 10, so when it is saved as a 32-bit floating-point number, E must be stored as 10 + 127 = 137, that is, 10001001.
Retrieval rules
E is neither all 0s nor all 1s
In this case the floating-point number is decoded with the normal rules: subtract 127 (or 1023) from the stored value of E to get the true exponent, then put the leading 1 back in front of the significand M. For example, 0.5 (that is, 1/2) is 0.1 in binary. Since the integer part of the significand must be 1, the binary point is shifted one place to the right, giving 1.0 × 2^(-1). The stored exponent is -1 + 127 = 126, which is 01111110. Dropping the integer part of the significand 1.0 leaves 0, which is padded with zeros to 23 bits: 00000000000000000000000. The binary representation is therefore:
0 01111110 00000000000000000000000 (sign S, exponent E, significand M)
E is all 0s
In this case the true exponent of the floating-point number is 1 - 127 (or 1 - 1023), and the leading 1 is no longer added to the significand M; instead M is restored as a fraction of the form 0.xxxxxx. This is done to represent ±0 and very small numbers close to 0.
E is all 1s
In this case, if the significand bits M are all 0, the value is ±infinity (the sign depends on the sign bit S); if M is not all 0, the value is NaN ("not a number").
example
When a variable a holding 5.5 is viewed in the compiler's memory window, the first line of the window starts at the address of a and shows how 5.5 is laid out in memory.
Epilogue
This article is fairly text-heavy, but if you read it carefully and think it through, it is all understandable. You may not use this knowledge often, but understanding it is still worthwhile and helps you improve.