[C language advanced] How floating-point numbers are stored in memory and read back (IEEE 754 standard)

  

 

We have used floating-point numbers before, but how are they stored in memory, and how are they read back out? Is it the same as integer storage? This article walks through the answers.

Contents

Common floating-point numbers:

Floating-point numbers and integers are stored differently

How floating-point numbers are stored

IEEE 754 states:

Note:

Reading the exponent E back from memory

1. E is neither all 0s nor all 1s

2. E is all 0s

3. E is all 1s

Common floating point numbers:

Examples: 3.14159 and 1E10 (scientific notation). The floating-point family includes the types float, double, and long double. The ranges these types can represent are defined in the header float.h, which can be found in the compiler's installation path.

 Floating-point numbers and integers are stored differently

First look at the following code:

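#include <stdio.h>

int main(void)
{
	int n = 9;
	/* View the bytes of n through a float pointer. Strictly speaking this
	   breaks C's aliasing rules; it is used here only as a demonstration. */
	float* pFloat = (float*)&n;
	printf("n = %d\n", n);
	printf("*pFloat = %f\n", *pFloat);

	*pFloat = 9.0f;	/* overwrite the same bytes with the float 9.0 */
	printf("n = %d\n", n);
	printf("*pFloat = %f\n", *pFloat);
	return 0;
}

Can you predict what this code prints?

Result (on a typical platform with a 32-bit int and an IEEE 754 float):

n = 9
*pFloat = 0.000000
n = 1091567616
*pFloat = 9.000000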

Analysis:

In this code, 9 is first stored as an integer. Read back as an integer it is still 9, but read as a floating-point number it prints 0.000000. Then 9.0 is stored as a floating-point number. Read back as an integer it is a completely different number, but read as a floating-point number it prints correctly. This shows that floating-point numbers and integers are stored in different ways.

How floating-point numbers are stored

According to the international standard IEEE (Institute of Electrical and Electronics Engineers) 754, any binary floating-point number V can be represented in the following form:

  • V = (-1)^S * M * 2^E

(-1)^S is the sign: when S=0, V is positive; when S=1, V is negative

M is the significand, greater than or equal to 1 and less than 2

2^E is the exponent part, where E is the exponent

Example:

Take the decimal number 5.5 and write it in binary scientific notation:

Converted to binary: 101.1

Written in scientific notation: 1.011 * 2^2

The number is then written in IEEE 754 form

The number is positive, so S = 0; M = 1.011 (binary); E = 2

Written in standard form it is (-1)^0 * 1.011 * 2^2

So to store such a floating-point number, only S, M, and E need to be stored. The bases -1 and 2 are fixed, so they are simply applied again when the value is reconstructed. How, then, are S, M, and E laid out in memory?

IEEE 754 states:

For a 32-bit floating-point number (float), the highest bit is the sign bit S, the next 8 bits are the exponent E, and the remaining 23 bits are the significand M:

S (1 bit) | E (8 bits) | M (23 bits)

 What if it is of type double?

For a 64-bit floating-point number (double), the highest bit is the sign bit S, the next 11 bits are the exponent E, and the remaining 52 bits are the significand M:

S (1 bit) | E (11 bits) | M (52 bits)

 Note:

Floating-point numbers are indeed stored this way, but there are two details to be aware of

  1. There are 8 (or 11) bits to store E, but E itself is not stored directly; a value derived from E is stored instead
  2. M is greater than or equal to 1 and less than 2, so M always has the form 1.xxxxx. The leading 1 therefore does not need to be stored; only the digits after the binary point do.

IEEE 754 stipulates that when M is stored, its leading digit is taken to be 1 by default, so it is dropped and only the xxxxxx part that follows is saved. For example, when saving 1.01, only 01 is stored, and the leading 1 is added back when the value is read. The purpose is to gain one extra significant bit. Taking a 32-bit floating-point number as an example, only 23 bits are available for M, but because the leading 1 is not stored, 24 significant bits can effectively be represented.

E is stored as an unsigned integer, which means that if E has 8 bits, its value range is 0~255; if E has 11 bits, its value range is 0~2047. However, the exponent in scientific notation can be negative, so IEEE 754 stipulates that a fixed offset (the bias) must be added to the real value of E before it is stored in memory. For an 8-bit E, the bias is 127; for an 11-bit E, the bias is 1023. For example, the E of 2^10 is 10, so when it is stored in a 32-bit floating-point number, it is stored as 10+127=137, which is 10001001 in binary.

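We can verify this with the earlier example, 5.5

Standard form of 5.5: (-1)^0 * 1.011 * 2^2

S -> 0

E -> 2+127 == 129 -> 10000001

M -> 011 -> 01100000000000000000000 (padded with 0s on the right to 23 bits)

So the bits in memory should be 0100 0000 1011 0000 0000 0000 0000 0000

Converted to hexadecimal: 40 B0 00 00

Looking at the variable in a debugger's memory window confirms this. On a little-endian machine such as x86, the bytes appear in reverse order: 00 00 B0 40.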

Reading the exponent E back from memory

When the exponent E is read back from memory, there are three cases

1. E is neither all 0s nor all 1s

This is the normal case. The real exponent is the stored value of E minus 127 (or 1023), and the significand is restored by putting a 1 and a binary point in front of M

2. E is all 0s

In this case the real exponent of the floating-point number is 1-127 (or 1-1023), and the leading 1 is no longer added to the significand M; it is read as a fraction of the form 0.xxxxxx. This is how ±0 and very small numbers close to 0 are represented

3. E is all 1s

If E is all 1s and M is all 0s, the value is positive or negative infinity, depending on the sign bit S. If M is not all 0s, the value is NaN (not a number)


Origin blog.csdn.net/weixin_46531416/article/details/120756273