IEEE Standard for Storing Floating Point Numbers

  The Institute of Electrical and Electronics Engineers (IEEE) has defined several standards for storing floating point numbers. Here is a brief description of the two most widely used formats: single precision and double precision.

Before introducing single precision and double precision, we need to introduce the excess system.

Excess system

  The mantissa can be stored as an unsigned number. The exponent (that is, the power that says how many places the binary point is shifted left or right) is a signed number. Although it could be stored in two's complement notation, a different notation called the excess system (also known as offset notation) is used instead. In the excess system, both positive and negative integers are stored as unsigned numbers: a positive integer, called the offset, is added to every number, shifting them all uniformly to the non-negative side. The value of this offset is 2^(m-1) - 1, where m is the number of bits in the memory cell that stores the exponent.

Example:

  With a 4-bit storage unit we can represent 16 integers. Using one of them for 0, the other 15 are split so that the representable range is -7 to 8, as shown below. Adding 7 to each integer in the range shifts all of them uniformly to the right, so that they all become non-negative (0 to 15) without changing their relative positions. As shown, the new system is called Excess_7, or offset notation with an offset of 7.
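
  As a minimal sketch of the offset idea described above, the following Python snippet adds the offset 2^(m-1) - 1 before storing and subtracts it when reading back. The function names (excess_bias, to_excess, from_excess) are made up for this illustration and do not come from any library.

```python
def excess_bias(m: int) -> int:
    """Offset for an m-bit field: 2^(m-1) - 1."""
    return 2 ** (m - 1) - 1


def to_excess(value: int, m: int) -> int:
    """Shift a signed integer to the non-negative side by adding the offset."""
    stored = value + excess_bias(m)
    if not 0 <= stored < 2 ** m:
        raise ValueError(f"{value} does not fit in an {m}-bit offset field")
    return stored


def from_excess(stored: int, m: int) -> int:
    """Recover the signed integer by subtracting the offset."""
    return stored - excess_bias(m)


# 4-bit case from the text: the range -7..8 maps to 0..15
for v in (-7, 0, 8):
    print(v, format(to_excess(v, 4), "04b"))
```

  With m = 4 the offset is 7, and with m = 8 and m = 11 the same formula gives 127 and 1023, the offsets used by the two IEEE formats.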

Remainder 1.jpg

 OK, that is enough groundwork.

Residual code.jpg

  The single-precision format uses a total of 32 bits to store a real number in floating-point representation. The sign occupies 1 bit (0 is positive, 1 is negative), the exponent occupies 8 bits (using offset 127), and the mantissa uses 23 bits (unsigned). The standard is sometimes called Excess_127 because the offset is 127.

  The double-precision format uses a total of 64 bits to store a real number in floating-point representation. The sign occupies 1 bit (0 is positive, 1 is negative), the exponent occupies 11 bits (using offset 1023), and the mantissa uses 52 bits. The standard is sometimes called Excess_1023 because the offset is 1023.
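
  To see the two layouts on a real machine, here is a small sketch that uses Python's standard struct module to get the raw bits of a number and then cuts them into sign, exponent and mantissa fields. The helper names split_single and split_double are invented for this example; the field widths and masks follow the description above.

```python
import struct


def split_single(x: float):
    """Return (sign, exponent, mantissa) fields of the 32-bit format."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, stored with offset 127
    mantissa = bits & 0x7FFFFF        # 23 bits
    return sign, exponent, mantissa


def split_double(x: float):
    """Return (sign, exponent, mantissa) fields of the 64-bit format."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                     # 1 bit
    exponent = (bits >> 52) & 0x7FF       # 11 bits, stored with offset 1023
    mantissa = bits & ((1 << 52) - 1)     # 52 bits
    return sign, exponent, mantissa


print(split_single(1.0))   # exponent field holds 0 + 127
print(split_double(1.0))   # exponent field holds 0 + 1023
```

  Since 1.0 is 1.0 × 2^0, the stored exponent field is exactly the offset in both formats and the mantissa field is all zeros.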

IEEE.jpg

  Referring to the figure above, a real number can be stored in the IEEE standard floating point format using the following steps:

(1) Store the sign (0 or 1) in S;

(2) Convert the number to binary;

(3) Normalize it (to give the representation a fixed form, both scientific notation (for decimal) and floating-point notation (for binary) keep exactly one non-zero digit to the left of the point; this is called normalization);

(4) Find the values of E and M (E is the exponent of 2 plus the offset; M is the bits to the right of the binary point);

(5) Connect S, E, M.
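
  The five steps above can be sketched in Python as follows. This is only an illustration under some assumptions: the name encode_single is made up, the function handles only non-zero finite numbers in the normal range (no zero, infinity, NaN or denormalized values), and extra mantissa bits are simply cut off instead of being rounded as real hardware would do.

```python
import math


def encode_single(x: float) -> str:
    """Build the 32-bit pattern S + E + M for a normal, non-zero number."""
    if x == 0 or math.isinf(x) or math.isnan(x):
        raise ValueError("this sketch only handles non-zero finite numbers")

    # (1) sign
    s = 0 if x > 0 else 1

    # (2)+(3) binary conversion and normalization:
    # frexp gives abs(x) = frac * 2**exp with 0.5 <= frac < 1,
    # so abs(x) = (2 * frac) * 2**(exp - 1) with 1 <= 2 * frac < 2
    frac, exp = math.frexp(abs(x))
    significand = frac * 2
    power = exp - 1

    # (4) E = power + offset, M = the 23 bits right of the binary point
    e = power + 127
    if not 1 <= e <= 254:
        raise ValueError("exponent is outside the normal range")
    m = int((significand - 1) * 2 ** 23)  # truncates, does not round

    # (5) connect S, E, M
    return f"{s:01b}{e:08b}{m:023b}"


print(encode_single(-0.15625))
```

  Here -0.15625 is -(1.01)_2 × 2^-3, so S = 1, E = -3 + 127 = 124 and M starts with 01.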

Example:

  Write the Excess_127 (single precision) representation of the decimal number 5.75.

Solution:

(1) The sign is positive, so S=0;

(2) Convert decimal to binary: 5.75 = (101.11)_2;

(3) Normalization: 5.75 = (101.11)_2 = (1.0111)_2 × 2^2;

(4) E = 2 + 127 = 129 = (10000001)_2, M = (0111)_2;

(5) The number stored in the computer (M is padded with zeros on the right to 23 bits) is

01000000101110000000000000000000
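
  The result can be checked by going the other way. The sketch below reads S, E and M back out of the bit string and computes (-1)^S × (1 + M / 2^23) × 2^(E - 127), which is valid only for normal numbers like this one. The name decode_single is invented for the illustration; the last lines use the standard struct module as an independent check.

```python
import struct


def decode_single(bits: str) -> float:
    """Decode a 32-character bit string (normal numbers only)."""
    s = int(bits[0], 2)      # 1 sign bit
    e = int(bits[1:9], 2)    # 8 exponent bits, offset 127
    m = int(bits[9:], 2)     # 23 mantissa bits
    significand = 1 + m / 2 ** 23   # restore the leading 1
    return (-1) ** s * significand * 2.0 ** (e - 127)


bits = "01000000101110000000000000000000"
print(decode_single(bits))

# Independent check: let struct interpret the same 4 bytes as a float
(value,) = struct.unpack(">f", int(bits, 2).to_bytes(4, "big"))
print(value)
```

  Both lines print 5.75, which matches the number we started with.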

