Study Notes: floating-point representation and the difference between long and double

Floating point representation

IEEE 754 floating-point representation: N = (-1)^s x m x 2^e, where s is the sign bit, m is the mantissa (significand, with 1 <= m < 2 for normalized numbers) and e is the exponent.

Type    Sign bit         Exponent bits          Mantissa bits
float   bit 31 (1 bit)   bits 30-23 (8 bits)    bits 22-0 (23 bits)
double  bit 63 (1 bit)   bits 62-52 (11 bits)   bits 51-0 (52 bits)
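You can check the layout in the table directly. The minimal Python sketch below uses the standard `struct` module to reinterpret a value's bytes as an unsigned integer, then masks out the three fields. The helper names `float32_fields` and `float64_fields` are hypothetical and chosen only for illustration. Python's `float` is a C double, so packing with format `"<f"` rounds it to single precision first.

```python
import struct

def float32_fields(x):
    """Split x (rounded to single precision) into (sign, exponent, mantissa) bit fields."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))  # the 4 bytes as an unsigned int
    sign = bits >> 31                  # bit 31
    exponent = (bits >> 23) & 0xFF     # bits 30-23 (8 bits)
    mantissa = bits & 0x7FFFFF         # bits 22-0 (23 bits)
    return sign, exponent, mantissa

def float64_fields(x):
    """Split a double into (sign, exponent, mantissa) bit fields."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))  # the 8 bytes as an unsigned int
    sign = bits >> 63                  # bit 63
    exponent = (bits >> 52) & 0x7FF    # bits 62-52 (11 bits)
    mantissa = bits & ((1 << 52) - 1)  # bits 51-0 (52 bits)
    return sign, exponent, mantissa

print(float32_fields(-2.5))  # (1, 128, 2097152)
print(float64_fields(-2.5))  # (1, 1024, 1125899906842624)
```

Here -2.5 = -1.25 x 2^1. The sign bit is 1. The stored exponent is 1 plus the bias (127 for float, 1023 for double). The fraction 0.25 appears as 2^21 in the 23-bit field and as 2^50 in the 52-bit field.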

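A single-precision float (float) has 1 sign bit, 8 exponent bits and 23 mantissa bits. The 8-bit exponent is stored with a bias of 127. Stored values 1 to 254 encode exponents from -126 to 127. The stored values 0 and 255 are reserved for zero/subnormal numbers and for infinity/NaN.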
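The formula N = (-1)^s x m x 2^e can be applied by hand to a raw 32-bit pattern. The sketch below is illustrative and handles only normalized numbers: it subtracts the bias of 127 from the stored exponent and puts the implicit leading 1 back in front of the 23 stored fraction bits. The name `decode_float32` is hypothetical.

```python
import struct

def decode_float32(bits):
    """Rebuild a normalized float from its 32-bit pattern via N = (-1)^s x m x 2^e."""
    s = bits >> 31
    stored_exp = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if stored_exp in (0, 0xFF):
        # 0 = zero/subnormal, 255 = infinity/NaN; not handled in this sketch
        raise ValueError("not a normalized number")
    e = stored_exp - 127          # remove the bias: normal exponents are -126..127
    m = 1 + fraction / 2**23      # implicit leading 1 plus the 23 stored fraction bits
    return (-1) ** s * m * 2.0 ** e

bits = struct.unpack("<I", struct.pack("<f", 6.5))[0]
print(hex(bits), decode_float32(bits))  # 0x40d00000 6.5
```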

The precision of float and double is determined by the number of mantissa bits. In memory, a floating-point number is stored in binary scientific notation. For a normalized number the integer part is always an implicit "1". Because it is always the same, it carries no information and does not need to be stored.
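A quick way to see the implicit 1 at work: 23 stored bits plus the implicit leading 1 give a float 24 significant bits. Every integer up to 2^24 is therefore exact, but 2^24 + 1 is not. This sketch rounds a Python float (a double) to single precision with `struct`. The helper name `to_float32` is hypothetical.

```python
import struct

def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

print(to_float32(16777215.0))  # 16777215.0  (2**24 - 1: fits in 24 bits)
print(to_float32(16777216.0))  # 16777216.0  (2**24: exact)
print(to_float32(16777217.0))  # 16777216.0  (2**24 + 1 would need 25 bits, so it is rounded)
```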

  • float: 2^23 = 8388608 has seven decimal digits. A float can therefore hold at most 7 significant digits, but only 6 are guaranteed, so float precision is 6-7 significant digits;
  • double: 2^52 = 4503599627370496 has sixteen digits; likewise, double precision is 15-16 significant digits.
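The digit counts above can be observed directly. In this minimal sketch, the hypothetical helper `to_float32` rounds a value to single precision. The output shows the float error in 0.1 appearing around the 9th significant digit, and a 9-digit integer that float cannot hold. It also shows that double runs out of exact integers past 2^53. `sys.float_info.dig` is the platform's DBL_DIG.

```python
import struct
import sys

def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

f = to_float32(0.1)
print(f"{f:.9g}")                 # 0.100000001  (error visible at 9 significant digits)
print(f"{f:.7g}")                 # 0.1          (still looks exact at 7 digits)
print(to_float32(123456789.0))    # 123456792.0  (9-digit integer is not exact in float)
print(float(9007199254740993))    # 9007199254740992.0  (2**53 + 1 is not exact in double)
print(sys.float_info.dig, sys.float_info.mant_dig)  # 15 53
```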

The most negative exponent determines the smallest non-zero magnitude a floating-point number can represent. The largest exponent determines the largest magnitude. In other words, the exponent determines the range of floating-point numbers (convert the binary exponent to decimal to get the figures below).

  • float: approximately -2^128 to +2^128, i.e. about -3.40E+38 to +3.40E+38.
  • double: approximately -2^1024 to +2^1024, i.e. about -1.79E+308 to +1.79E+308.
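Both ends of the float range can be read off specific bit patterns. The largest finite float is 0x7F7FFFFF, and the smallest normalized float is 0x00800000. For double, Python reports its own limits in `sys.float_info`. The sketch below also shows that `struct` refuses to pack a value just beyond the float range. The helper name `float32_from_bits` is hypothetical.

```python
import struct
import sys

def float32_from_bits(bits):
    """Interpret a 32-bit pattern as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

FLT_MAX = float32_from_bits(0x7F7FFFFF)  # stored exponent 254, all 23 mantissa bits set
FLT_MIN = float32_from_bits(0x00800000)  # smallest normalized float, 2**-126
print(FLT_MAX)             # 3.4028234663852886e+38
print(FLT_MIN)             # 1.1754943508222875e-38
print(sys.float_info.max)  # 1.7976931348623157e+308
print(sys.float_info.min)  # 2.2250738585072014e-308

try:
    struct.pack("<f", 3.5e38)  # just beyond the float range
    overflowed = False
except OverflowError:
    overflowed = True
print(overflowed)          # True
```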

Take float as an example of the formula N = (-1)^s x m x 2^e. With s = 0, a 23-bit mantissa and an 8-bit exponent, the largest value is:

+1.11111111111111111111111 x 2^127. There are twenty-three 1s after the binary point: the mantissa lies between 1 and 2, so its leading bit is always 1 and only the fractional part is stored, here as 23 ones. This is approximately 2 x 2^127 = 2^128 ≈ 3.4 x 10^38. The negative case is symmetric.
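The same arithmetic can be done exactly with `fractions.Fraction`. Summing the 23 fractional ones gives m = 2 - 2^-23, so the largest float is slightly below the 2 x 2^127 approximation: it equals 2^128 - 2^104. This is a small verification sketch, not part of any library.

```python
from fractions import Fraction

# 1.111...1 in binary: the implicit 1 plus twenty-three fractional ones
m = 1 + sum(Fraction(1, 2**i) for i in range(1, 24))
print(m == 2 - Fraction(1, 2**23))  # True

largest = m * 2**127                # exact value of the largest finite float
print(largest == 2**128 - 2**104)   # True: just under the 2 x 2^127 = 2^128 estimate
print(float(largest))               # 3.4028234663852886e+38
```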

The difference between long and double

long is an integer type, and its size is defined differently by different languages and systems. It occupies 32 or 64 bits in memory and stores a plain 32-bit or 64-bit binary (two's complement) integer. A 64-bit long has a maximum value of 2^63 - 1.
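Python's own `int` is unbounded, so this sketch uses `struct` to look at fixed-width integers. Format `"q"` is always a 64-bit signed integer. Format `"@l"` is the platform's native C `long`, which is where the "32 or 64 bits depending on the system" point shows up. The constant names `LONG64_MAX` and `LONG64_MIN` are hypothetical.

```python
import struct

LONG64_MAX = 2**63 - 1
LONG64_MIN = -2**63

print(struct.calcsize("q") * 8)   # 64: 'q' is always a 64-bit signed integer
print(struct.calcsize("@l") * 8)  # native C long: typically 64 on 64-bit Linux/macOS, 32 on Windows

# The bit pattern 0x8000000000000000 read as a signed (two's complement) value
wrapped = struct.unpack("<q", struct.pack("<Q", 2**63))[0]
print(wrapped == LONG64_MIN)      # True

try:
    struct.pack("<q", LONG64_MAX + 1)  # one past the maximum does not fit in 64 bits
    out_of_range = False
except struct.error:
    out_of_range = True
print(out_of_range)               # True
```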

double is different: it uses the IEEE 754 floating-point representation N = (-1)^s x m x 2^e, so its range and precision are determined by the number of exponent and mantissa bits. The 32 or 64 bits of a floating-point type are divided into sign bit + exponent bits + mantissa bits.
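To make the contrast concrete, the sketch below reinterprets the same 8 bytes first as a double and then as a 64-bit signed integer (what C's `int64_t` or Java's `long` would hold). The helper names `double_bits` and `long_as_double` are hypothetical.

```python
import struct

def double_bits(x):
    """Return the 64-bit pattern of a double, read as a signed 64-bit integer."""
    return struct.unpack("<q", struct.pack("<d", x))[0]

def long_as_double(n):
    """Read the 64-bit pattern of a signed integer as a double."""
    return struct.unpack("<d", struct.pack("<q", n))[0]

print(hex(double_bits(1.0)))  # 0x3ff0000000000000: sign 0, exponent 0x3FF (bias 1023), mantissa 0
print(double_bits(1.0))       # 4607182418800017408: the same bits as an integer
print(long_as_double(1))      # 5e-324: integer 1 as a double is the smallest subnormal
```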

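Because the two representations are different, you cannot judge their ranges simply by comparing how many bits each occupies. A 64-bit double reaches about 1.79E+308, far beyond a 64-bit long's maximum of 2^63 - 1, yet it cannot represent every integer in that range exactly.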
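Here is a small check of that trade-off, with `LONG64_MAX` as a hypothetical name for 2^63 - 1. The double range dwarfs the long range. However, converting the largest long to double rounds it, because 63 significant bits do not fit into a 53-bit significand.

```python
import sys

LONG64_MAX = 2**63 - 1

print(sys.float_info.max > LONG64_MAX)      # True: double covers a far wider range
print(float(LONG64_MAX))                    # 9.223372036854776e+18
print(int(float(LONG64_MAX)) - LONG64_MAX)  # 1: the conversion rounded up to 2**63
```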

Addendum: understanding 32-bit and 64-bit systems

A 32-bit CPU processes 32 bits (4 bytes) of data at a time; a 64-bit CPU processes 64 bits (8 bytes) at a time.

What is the difference between x86 and x86-64? x86 generally refers to 32-bit Intel-architecture processors, while x86_64 (also written x86-64) refers to their 64-bit counterparts.
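From Python you can only approximate these questions. `struct.calcsize("P")` gives the pointer size of the running interpreter build, not of the CPU: a 32-bit Python on a 64-bit CPU reports 32. `platform.machine()` returns a platform-dependent machine string. This is a rough probe, not a reliable CPU query.

```python
import platform
import struct

pointer_bits = struct.calcsize("P") * 8  # 64 for a 64-bit Python build, 32 for a 32-bit build
machine = platform.machine()             # e.g. 'x86_64' on Linux, 'AMD64' on Windows
print(pointer_bits, machine)
```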


Source: www.cnblogs.com/small-world/p/11622714.html