C language - storage of floating point numbers in memory

Table of contents

1. Related introduction

2. How to store

(1) Step 1:

(2) Step 2: 

(3) Step 3:

3. Example:

4. Relevant rules when taking out:


1. Related introduction

1. In a floating-point literal, 1E10 means 1.0*(10^10).

2. The value range of the integer family can be viewed in the header file <limits.h>, and the value range of the floating-point family can be viewed in the header file <float.h>.

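3. Storing a floating point number relies on binary fractions, so we first need to understand them: the digits after the binary point have weights 2^-1, 2^-2, and so on. For example, decimal 5.5 is 101.1 in binary (4 + 1 + 0.5).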

2. How to store

(1) Step 1:

1. According to the international standard IEEE754, any binary floating point number V can be expressed as: (-1)^S * M * 2^E .

Explanation:

①: (-1)^S represents the sign. When S==0, (-1)^S==1 and V is a positive number; when S==1, (-1)^S==-1 and V is a negative number.

②: M represents the significant digits (the significand), and 1<=M<2.

③: 2^E represents the exponent part; E is the exponent.

This may seem abstract, so here is an example: decimal 5.5 is 101.1 in binary, which is 1.011 * 2^2. Therefore S==0, M==1.011 (binary) and E==2.

(2) Step 2: 

2. IEEE754 then stipulates:

①: For a 32-bit floating point number, the highest bit stores the sign S, the next 8 bits store the exponent E, and the remaining 23 bits store the significant digits M.

②: For a 64-bit floating point number, the highest bit stores the sign S, the next 11 bits store the exponent E, and the remaining 52 bits store the significant digits M.

(3) Step 3:

3. IEEE754 also has some special rules for M and E.

①: Because 1<=M<2, M can always be written as 1.xxxxx..., where xxxxx is the fractional part. When storing, the leading 1 is therefore discarded and only the fractional part xxxxx is saved; the 1 is added back when the number is read.

Purpose: This gains one bit of precision. Taking a 32-bit floating point number as an example, only 23 bits are stored for M, but together with the implicit leading 1 they represent 24 significant bits.

②: The exponent E is stored as an unsigned integer. If E is 8 bits, the stored value ranges from 0 to 255; if E is 11 bits, it ranges from 0 to 2047. But in scientific notation the real exponent may be negative, so IEEE754 stipulates that the stored value is the real exponent plus a fixed middle number (the bias), which keeps the stored value non-negative. The middle numbers are as follows:

For an 8-bit E, the middle number is 127

For an 11-bit E, the middle number is 1023

3. Example:

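After knowing the above rules, you can store a floating point number. Take storing 5.5 in 32 bits as an example. 5.5 is 101.1 in binary, which is 1.011 * 2^2, so S==0, M==1.011 and E==2. The stored exponent is 2+127==129, which is 10000001 in binary. The stored part of M is 011 followed by 20 zeros. The 32 bits are therefore 0 10000001 01100000000000000000000, which is 0x40B00000 in hexadecimal.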

4. Relevant rules when taking out:

When a value is read back, the stored E falls into one of three cases:

1. E is neither all 0 nor all 1 (the normal case): subtract 127 (1023) from the stored value to get the real exponent, and add the leading 1 back to M.

2. E is all 0: IEEE754 stipulates that the real exponent is 1-127 (1-1023), and M no longer adds the leading 1 but is restored as a fraction 0.xxxx... This makes it possible to represent plus and minus 0, as well as numbers very close to 0.

3. E is all 1: if the stored bits of M are all 0, the number is plus or minus infinity (the sign is given by S); otherwise the value is NaN (not a number).

That concludes this topic. I hope it is helpful to you.

Origin blog.csdn.net/hffh123/article/details/132229285