C language: how double and float are stored in memory

This article explains how the C data types double and float are stored in memory.

Introduction to double and float storage methods

In terms of layout and encoding rules, double and float work the same way. They differ in width: float is 32 bits and double is 64 bits. The extra bits give double both a wider exponent (larger range) and a longer mantissa (higher precision).

All data is stored in memory as a sequence of binary digits (0 or 1). Each 0 or 1 is called a bit, and on x86 CPUs a byte is 8 bits.

For example, if a 16-bit (2-byte) short int variable holds the value 1000, its binary representation is 00000011 11101000.

Intel x86 CPUs are little-endian: the least significant byte is stored at the lowest address, so the two bytes appear in reverse order, 11101000 00000011. That is how the integer 1000 is laid out in memory.

The specific layouts are as follows.
How float is stored in memory:
float occupies 4 bytes (32 bits) in memory: 32 bits = sign bit (1 bit) + exponent (8 bits) + mantissa (23 bits).


How double is stored in memory, compared with float (field widths in bits):

type     sign bit   exponent   mantissa   total
float    1          8          23         32
double   1          11         52         64

How is a value encoded?

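Mainstream C/C++ compilers implement float and double using the floating-point formats defined by the IEEE 754 standard (binary32 and binary64). Strictly speaking, the C standard does not require IEEE 754; an implementation that conforms to its IEEE annex defines the macro __STDC_IEC_559__.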

This format is a form of scientific notation in base 2. A floating-point number is represented by a sign, an exponent, and a mantissa: its value is the mantissa multiplied by 2 raised to the exponent, with the sign applied.
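For normalized numbers (exponent field neither all zeros nor all ones), this can be written as the formula below, where S is the sign bit, E the stored exponent field, M the stored mantissa bits, and the bias is 127 for float and 1023 for double. Zero, subnormal numbers, infinity and NaN use the reserved all-zeros and all-ones exponent patterns and do not follow this formula.

```latex
v = (-1)^{S} \times (1.M)_2 \times 2^{E - \mathrm{bias}},
\qquad
\mathrm{bias} =
\begin{cases}
127 & \text{for float} \\
1023 & \text{for double}
\end{cases}

% Example: 2.25 as a float, with S = 0, E = 128, M = 001
2.25 = (-1)^{0} \times (1.001)_2 \times 2^{128 - 127} = 1.125 \times 2
```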

Both single precision and double precision are divided into three parts in storage:

  1. Sign bit (Sign): 0 means positive, 1 means negative.
  2. Exponent (Exponent): stores the exponent of the scientific notation in biased (offset) form, so it is more precisely called the biased exponent. The bias, 127 for float and 1023 for double, is added so that negative exponents can be stored as unsigned numbers.
  3. Mantissa (Mantissa): the bits after the binary point of the normalized number 1.xxx. For normalized numbers the leading 1 is always present, so it is implicit and not stored.

First, the float type (using 2.25 as the example)

  • Step 1: The sign bit (1 bit)

The number is positive, so the sign bit is 0.

  • Step 2: The exponent (8 bits)

    • First: convert decimal 2.25 to binary. 2 is 10 and 0.25 is 0.01, so 2.25 is 10.01.

    • Second: normalize 10.01 in binary scientific notation by moving the binary point one place to the left, giving 1.001.

    • Third: since the point moved one place, the value is 1.001 × 2^1.

    • Fourth: add the bias to the exponent: 1 + 127 = 128, which is 1000 0000 in binary. Write this to the exponent field.

  • Step 3: The mantissa (23 bits)

    • Drop the implicit leading 1 of 1.001 and write the three bits after the binary point, 001, into the mantissa field, padding the remaining 20 bits with 0.

So the bit pattern of the single-precision value 2.25 (sign | exponent | mantissa) is:

0 | 1000 0000 | 001 0000 0000 0000 0000 0000

which is 0x40100000 in hexadecimal.


Now the double type (again using 2.25 as the example)

  • Step 1: The sign bit (1 bit)

The number is positive, so the sign bit is 0.

  • Step 2: The exponent (11 bits)

    • First: convert decimal 2.25 to binary, 10.01.

    • Second: normalize 10.01 to 1.001 by moving the binary point one place to the left.

    • Third: the value is therefore 1.001 × 2^1.

    • Fourth: add the bias to the exponent: 1 + 1023 = 1024, which is 100 0000 0000 in binary (11 bits). Write this to the exponent field.

  • Step 3: The mantissa (52 bits)

    • Drop the implicit leading 1 of 1.001 and write the three bits after the binary point, 001, into the mantissa field, padding the remaining 49 bits with 0.

So the bit pattern of the double-precision value 2.25 (sign | exponent | mantissa) is:
0 | 100 0000 0000 | 0010 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
which is 0x4002000000000000 in hexadecimal.


Origin blog.csdn.net/baidu_33256174/article/details/130716263