Floating point representation (computer systems)

Fixed-point notation:

The symbol '.' is the binary point: digits to its left are weighted by non-negative powers of 2, and digits to its right are weighted by negative powers of 2.
A binary fraction of finite length can exactly represent only numbers that can be written as $x \times 2^{y}$; all other numbers (for example, decimal 0.1) can only be approximated. Increasing the length of the binary representation increases the precision.
Fixed-point notation also cannot efficiently represent very large numbers.
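The positional weights described above can be checked with a short sketch. The helper name `binary_fraction_value` is made up for this illustration; it uses Python's `fractions.Fraction` so that the result is exact.

```python
from fractions import Fraction

def binary_fraction_value(bits: str) -> Fraction:
    """Exact value of a binary string such as '101.11' (hypothetical helper)."""
    int_part, _, frac_part = bits.partition(".")
    value = Fraction(0)
    # Digits left of the binary point have weights 2^0, 2^1, 2^2, ...
    for i, b in enumerate(reversed(int_part)):
        value += int(b) * Fraction(2) ** i
    # Digits right of the binary point have weights 2^-1, 2^-2, ...
    for i, b in enumerate(frac_part, start=1):
        value += int(b) * Fraction(1, 2 ** i)
    return value

print(binary_fraction_value("101.11"))   # 23/4, i.e. 5.75
# Decimal 0.1 is not of the form x * 2^y, so the stored double is only close to 1/10.
print(Fraction(0.1) == Fraction(1, 10))  # False
```

`Fraction(0.1)` returns the exact value of the stored double, whose denominator is a power of 2, which is why it cannot equal 1/10.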

IEEE floating-point representation:


  • Sign: s determines whether the number is negative (s=1) or positive (s=0)
  • Mantissa (significand): M is a binary fraction
  • Exponent: E weights the value by a power of 2; the weight is 2 to the power of E

Bit division of floating point numbers:

  • A single sign bit s directly encodes the sign
  • The k-bit exponent field exp encodes the exponent E
  • The n-bit fraction field frac encodes the mantissa M (M is a binary fraction)

Single-precision floating-point format: float (s: 1 bit, exp: 8 bits, frac: 23 bits), 32 bits in total
Double-precision floating-point format: double (s: 1 bit, exp: 11 bits, frac: 52 bits), 64 bits in total
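As a minimal sketch of the single-precision layout, the three fields can be pulled out of the 32-bit pattern with shifts and masks. The function name `float_fields` is an assumption for illustration; `struct.pack(">f", x)` rounds the Python float (a double) to single precision first.

```python
import struct

def float_fields(x: float):
    """Split x, rounded to single precision, into (s, exp, frac) (hypothetical helper)."""
    # Reinterpret the 4 bytes of the float as a 32-bit unsigned integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = bits >> 31             # 1 sign bit
    exp = (bits >> 23) & 0xFF  # 8 exponent bits
    frac = bits & 0x7FFFFF     # 23 fraction bits
    return s, exp, frac

s, exp, frac = float_fields(-0.75)
print(s, format(exp, "08b"), format(frac, "023b"))
# 1 01111110 10000000000000000000000
```

Here -0.75 is -1.1 (binary) × 2^-1, so the exponent field holds -1 + 127 = 126 and the fraction field holds the bits after the leading 1.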
Depending on the value of exp, the encoded value falls into one of three cases:

  1. Normalized value:
    The bit pattern of exp is neither all 0s nor all 1s.
    Exponent field:
    interpreted as a signed integer in biased (offset) form; the exponent value is E = e - Bias (single precision: -126 to 127; double precision: -1022 to 1023).
    ① e: the unsigned number with bit representation $e_{k-1} \cdots e_1 e_0$
    ② Bias: the offset value $2^{k-1} - 1$ (127 for single precision, 1023 for double precision)
    Fraction field:
    the mantissa is defined as M = 1 + f, where $f = 0.f_{n-1} \cdots f_1 f_0$ (implied leading 1)

  2. Denormalized value:
    The bit pattern of exp is all 0s.
    Exponent field:
    the exponent value is E = 1 - Bias.
    Fraction field:
    the value of the mantissa is M = f (no implied leading 1).
    ① Denormalized numbers provide a way to represent 0, which normalized numbers cannot do because M ≥ 1: all bits 0 gives +0.0, and a sign bit of 1 with all other bits 0 gives -0.0
    ② They represent numbers very close to 0.0 and provide gradual underflow: the representable values are evenly spaced near 0.0

  3. Special value:
    When the exponent field is all 1s and the fraction field is all 0s, the value represents infinity: s=0 is positive infinity, s=1 is negative infinity.
    When the exponent field is all 1s and the fraction field is nonzero, the result is called "NaN" (not a number).
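The three cases above can be sketched as a small classifier for single precision. The function name `classify` is made up for this example, and the sample inputs are chosen to fit within the single-precision range.

```python
import math
import struct

def classify(x: float) -> str:
    """Name the encoding case of x as a single-precision value (hypothetical helper)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp == 0:
        return "denormalized"  # exp all 0s; includes +0.0 and -0.0
    if exp == 0xFF:
        # exp all 1s: infinity when frac is zero, otherwise NaN
        return "infinity" if frac == 0 else "NaN"
    return "normalized"        # exp neither all 0s nor all 1s

for v in (1.0, 0.0, 1e-40, math.inf, math.nan):
    print(v, classify(v))
```

The value 1e-40 is below the smallest normalized single-precision number (about 1.18e-38), so it lands in the denormalized case.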

IEEE representation example 1:

Assume an 8-bit floating-point format with k=4 exponent bits and n=3 fraction bits; the bias is $2^{4-1} - 1 = 7$.

$e = e_{k-1} \cdots e_1 e_0$ (unsigned number)
E:

  • Normalized: E = e - Bias
  • Denormalized: E = 1 - Bias

$f = 0.f_{n-1} \cdots f_1 f_0$ (binary fraction)

M:

  • Normalized: M = 1 + f
  • Denormalized: M = f

$V = (-1)^{s} \times M \times 2^{E}$
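The formulas for e, E, f, M and V can be applied to the 8-bit format mechanically. The following is only a sketch under the stated assumptions (k=4, n=3, bit order sign, exponent, fraction); `decode8` is a hypothetical name.

```python
import math

def decode8(byte: int, k: int = 4, n: int = 3) -> float:
    """Value of a (1 + k + n)-bit pattern laid out as sign | exp | frac (hypothetical helper)."""
    bias = 2 ** (k - 1) - 1            # 7 when k = 4
    s = (byte >> (k + n)) & 1
    e = (byte >> n) & ((1 << k) - 1)   # unsigned exponent field
    frac_bits = byte & ((1 << n) - 1)
    f = frac_bits / 2 ** n             # 0.f_{n-1}...f_0 as a fraction
    if e == (1 << k) - 1:              # special values: exp all 1s
        if frac_bits == 0:
            return -math.inf if s else math.inf
        return math.nan
    if e == 0:                         # denormalized
        E, M = 1 - bias, f
    else:                              # normalized
        E, M = e - bias, 1 + f
    return (-1) ** s * M * 2.0 ** E

print(decode8(0b0_0000_001))  # smallest positive denormalized: 1/512
print(decode8(0b0_0001_000))  # smallest positive normalized: 1/64
print(decode8(0b0_0111_000))  # 1.0
print(decode8(0b0_1110_111))  # largest finite value: 240.0
```

The step from the largest denormalized value (7/512) to the smallest normalized value (8/512) is the same size as the steps between denormalized values, which is the smooth transition that E = 1 - Bias is designed to give.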

IEEE representation example two:

Conversion methods between integers and floating point numbers:
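The original figure for this example is not available, so the following is only a sketch of the usual procedure, using the integer 12345 as an assumed sample value: write the integer in binary, normalize it to 1.xxx × 2^E, drop the leading 1, pad the remaining bits on the right to fill the 23-bit fraction field, and store E + Bias in the exponent field. The helper name `int_to_float_bits` is made up for illustration.

```python
import struct

def int_to_float_bits(i: int) -> int:
    """Bit pattern of the single-precision float nearest to integer i (hypothetical helper)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", float(i)))
    return bits

i = 12345                      # binary 11000000111001 = 1.1000000111001 x 2^13
bits = int_to_float_bits(i)
print(format(i, "032b"))       # integer bit pattern
print(format(bits, "032b"))    # float bit pattern
print(hex(bits))               # 0x4640e400

exp = (bits >> 23) & 0xFF
frac = bits & 0x7FFFFF
print(exp - 127)               # 13, the exponent E
# The top 13 fraction bits equal the integer's bits below its leading 1.
print((frac >> 10) == (i & 0x1FFF))  # True
```

For integers that need more than 24 significant bits, the conversion to single precision has to round, so the bit correspondence shown here holds only for values that fit exactly.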



Origin blog.csdn.net/qq_44044341/article/details/109248092