Data representation and operations - floating-point numbers

Preface:

  • Numbers in a computer are divided into fixed-point and floating-point. Compared with floating-point, fixed-point numbers are easier to understand: sign-magnitude, one's complement, two's complement, and biased representations. Floating-point numbers are much more complicated.
  • As for how complicated floating point is, I think the best illustration is that \(William\ M.\ Kahan\) received the Turing Award for his outstanding contributions to the development of the floating-point arithmetic standard. \(Kahan\) was the principal designer of the \(IEEE754\) floating-point standard.

A first look at floating point:

  • Suppose we want to represent a value such as the speed of light. How can we do it?
    • \(1:\) Write it out as an integer: \(300...000\,m/s\). Such a number is very long and wasteful to store in a computer.
    • \(2:\) Use scientific notation: \(3 \times 10^{8}\,m/s\). To save this number, we only need to record three pieces of information: the first is \(3\), the second is \(10\), and the third is \(8\).
  • Comparing the two methods:
    • Obviously the first method needs more storage space, while scientific notation records far fewer digits yet represents the same value.
    • A computer can only recognize the symbols \(0/1\), which are convenient and simple to implement in hardware. So we can drop the base \(10\): the computer's default base is \(2\), and then only two numbers need to be saved to represent such a large value.
    • For extreme numbers such as the mass of an electron or the diameter of the solar system, the advantage of scientific notation becomes even more obvious. It is far more convenient.
  • But we can also see that if the number to be expressed is not \(300...000\) but something like \(29935...13\), and we use the scientific notation \(2.9 \times 10^{8}\), then some precision is inevitably lost.
  • As I understand it, floating-point representation trades precision for range.
  • This also explains why, in the \(C\) language, \(double\) offers higher precision than \(float\): \(double\) has more bits, so it can represent more significant digits.
  • And it explains why floating-point numbers are called "floating" point: as the exponent changes, the position of the radix point also floats.
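The speed-of-light example can be checked directly. A minimal Python sketch (Python's built-in float is a 64-bit double; the standard `struct` module lets us round-trip it through a 32-bit float) shows the precision a shorter format gives up:

```python
import struct

def to_float32(x: float) -> float:
    """Round-trip a Python float (64-bit double) through a 32-bit IEEE 754 float."""
    return struct.unpack('f', struct.pack('f', x))[0]

c = 299_792_458.0          # speed of light in m/s; exact as a 64-bit double
print(to_float32(c))       # 299792448.0 -- a 24-bit significand can't hold every digit
print(to_float32(c) == c)  # False: the last digits were rounded away
```

The double keeps the value exactly, while the 32-bit float rounds it to the nearest representable value, 299792448.0.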

Floating-point representation:

  • Typically, a floating-point number can be expressed as \(N = R^{E} \times M\).

    • where \(R\) is the radix (base) of the exponent, typically \(2\), and it is the same as the radix of the mantissa.

    • \(E\) is the exponent (order code), and \(M\) is the mantissa.

    • The layout is as follows:

      Exponent sign | Exponent value | Mantissa sign | Mantissa value
      \(J_f\) | \(J_1J_2 \ldots J_m\) | \(S_f\) | \(S_1S_2 \ldots S_n\)
  • The exponent is an integer; together with its sign it determines the actual position of the radix point, and hence the range the floating-point number can represent;

  • The mantissa sign gives the sign of the number, and the mantissa value determines the precision of the floating-point number.
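As a quick illustration (my own toy sketch, not part of any standard), the formula above maps directly to code with radix \(R = 2\):

```python
# Toy illustration of N = R**E * M with radix R = 2.
def float_value(mantissa: float, exponent: int, radix: int = 2) -> float:
    """Value represented by exponent E and mantissa M: N = R**E * M."""
    return radix ** exponent * mantissa

print(float_value(0.5, 1))    # 2**1 * 0.1b   = 1.0
print(float_value(-0.75, 3))  # 2**3 * -0.11b = -6.0
```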

Normalized floating-point numbers:

  • Let's get straight to the point and say what normalization is.
  • Normalization requires that the most significant bit of the mantissa be a significant (non-zero) digit.

  • From the discussion above we can see that, to maximize precision, the mantissa should keep as many significant digits as possible.
  • For example, consider these two numbers (binary):
    • \(2^{10} \times 0.01\) and \(2^{01} \times 0.1\).
  • They are equal, but the second obviously saves one \(0\) in its mantissa, so by normalizing a floating-point number we let it carry higher precision.
  • So-called normalization adjusts the mantissa and the exponent through certain operations so that, for a non-zero float, the most significant bit of the mantissa is guaranteed to be a significant digit.
  • There are two methods:
    • Left rule: when a non-normalized operation result needs to be normalized, arithmetically shift the mantissa left one bit and subtract one from the exponent (in binary). (The left rule may need to be applied multiple times.)
    • Right rule: when the mantissa overflows during a floating-point operation, i.e. its double sign bits become \(01/10\), arithmetically shift the mantissa right one bit and add one to the exponent. The right rule is applied at most once.
  • The mantissa of a normalized floating-point number therefore satisfies \(\frac{1}{2} \leq |M| < 1\).
  • Analysis:
    • Suppose the mantissa is in sign-magnitude (original code) form:
      • Positive numbers:
        • The maximum is \(0.11\ldots1\), with true value \(1 - 2^{-n}\).
        • The minimum is \(0.10\ldots0\), with true value \(\frac{1}{2}\).
        • The absolute value range is \([\frac{1}{2},\ 1 - 2^{-n}]\).
      • Negative numbers:
        • The maximum is \(1.10\ldots0\), with true value \(-\frac{1}{2}\).
        • The minimum is \(1.11\ldots1\), with true value \(-(1 - 2^{-n})\).
        • The absolute value range is again \([\frac{1}{2},\ 1 - 2^{-n}]\).
    • Suppose the mantissa is in two's complement form:
      • Positive numbers:
        • The complement of a positive number is the same as its sign-magnitude form, so no separate analysis is needed.
      • Negative numbers:
        • The maximum is \(1.011\ldots1\) and the minimum is \(1.00\ldots0\).
        • The absolute value range is \([\frac{1}{2} + 2^{-n},\ 1]\).
        • Note that the maximum is not of the form \(1.10\ldots0\), because \(1.10\ldots0\) is not a normalized number in two's complement.
        • Here I checked some references and find two explanations convincing. The first: \(1.10000\) (i.e. \(-\frac{1}{2}\)) can be left-normalized, shifting the mantissa to \(1.00000\) and decreasing the exponent by one, so it need not count as normalized itself. The second: it simplifies machine design — with sign-magnitude, the hardware judges whether a number is normalized by whether the most significant mantissa bit is \(1\); with two's complement, it judges by whether the sign bit and the most significant mantissa bit differ.
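The left and right rules above can be sketched in a few lines of Python (my own illustration using a Python float as the mantissa, not a hardware model):

```python
def normalize(mantissa: float, exponent: int):
    """Return (M, E) with M * 2**E unchanged and 1/2 <= |M| < 1."""
    if mantissa == 0.0:
        return 0.0, 0                # zero has no normalized form
    while abs(mantissa) < 0.5:       # left rule: shift mantissa left, E - 1
        mantissa *= 2.0
        exponent -= 1
    while abs(mantissa) >= 1.0:      # right rule: shift mantissa right, E + 1
        mantissa /= 2.0
        exponent += 1
    return mantissa, exponent

print(normalize(0.25, 2))   # (0.5, 1): 2**10b * 0.01b becomes 2**01b * 0.1b
```

The printed result reproduces the example from the text: \(2^{10} \times 0.01\) normalizes to \(2^{01} \times 0.1\).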

IEEE754 standard:

  • According to the \(IEEE754\) standard, the format of a floating-point number is as follows:

  • Sign bit | Exponent (biased representation) | Mantissa (sign-magnitude, with the highest \(1\) hidden)
    \(m_s\) | \(E\) | \(M\)
  • To squeeze out more precision, the most significant \(1\) of the mantissa is hidden. For example, if the mantissa is \(1.011\), we store only \(011\).

  • \(float\) and \(double\) are floating-point types that conform to the \(IEEE754\) standard.

  • The exponent is stored in biased (excess) form. For a short float (\(float\)) the bias is \(127\); for a long float (\(double\)) the bias is \(1023\).

  • So the exponent can be read like this: first interpret \(E\) as an unsigned integer, then subtract \(127/1023\) to obtain the actual exponent it represents.

    • \((-1)^{S} \times 1.M \times 2^{E-127}\) (short float).
    • \((-1)^{S} \times 1.M \times 2^{E-1023}\) (long float).
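These fields can be inspected directly. A small Python sketch (using the standard `struct` module; it ignores the special cases of zero, subnormals, infinity, and NaN) pulls a 32-bit float apart and rebuilds its value from the formula above:

```python
import struct

def decode_float32(x: float):
    """Split a 32-bit float into sign S, biased exponent E, stored mantissa M,
    and rebuild its value as (-1)**S * (1.M) * 2**(E - 127)."""
    bits, = struct.unpack('>I', struct.pack('>f', x))
    s = bits >> 31                 # 1 sign bit
    e = (bits >> 23) & 0xFF        # 8 exponent bits, biased by 127
    m = bits & 0x7FFFFF            # 23 stored mantissa bits (leading 1 hidden)
    value = (-1) ** s * (1 + m / 2**23) * 2 ** (e - 127)
    return s, e, m, value

print(decode_float32(-6.5))  # (1, 129, 5242880, -6.5): -1.101b * 2**2, E = 2 + 127
```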

Floating-point addition and subtraction:

  • Floating-point addition and subtraction handle the exponent and the mantissa separately, in the following steps:

    • Exponent alignment
    • Mantissa addition/subtraction
    • Normalization
    • Rounding
    • Overflow check (by the exponent)
  • Let's analyze them one by one.

  • Exponent alignment:
    • The purpose of alignment is to make the exponents of the two operands equal. The principle is to align the smaller exponent to the larger one: the mantissa of the operand with the smaller exponent is shifted right, and its exponent incremented, until the two exponents are equal. Of course, since low-order bits are discarded during the right shift, precision is affected.
  • Mantissa addition/subtraction:
    • After alignment, this is ordinary fixed-point addition or subtraction.
  • Normalization:
    • Apply the left or right rule as described in the normalization section above.
  • Rounding:
    • During alignment and right normalization, low-order mantissa bits may be lost, causing error. Common rounding methods are:
      • \(1:\) "discard-\(0\), round-\(1\)": if the bit shifted out is \(0\), simply discard it; if it is \(1\), add \(1\) to the least significant bit of the mantissa. This may overflow the mantissa, requiring one more application of the right rule. (Imagine a mantissa of all \(1\)s: adding one carries all the way up and overflows.)
      • \(2:\) "force-\(1\)": regardless of whether the discarded bit was \(1\) or \(0\), set the least significant remaining bit to \(1\); this may make the mantissa slightly larger or slightly smaller than the true value.
  • Overflow check:
    • The last step is to check for overflow.
    • From the double-sign normalization discussion we already know that \(01/10\) in the mantissa's double sign bits does not mean true overflow, since it can be fixed by the right rule.
    • Floating-point overflow is judged by the exponent: when the exponent's double sign bits become \(01/10\), overflow has occurred.
      • \(10:\) the exponent is below the minimum representable exponent (underflow); the result is treated as machine zero.
      • \(01:\) the exponent exceeds the maximum representable exponent (overflow); the interrupt handler is entered.
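The precision cost of the alignment step is easy to observe with Python's 64-bit floats: to add a small number to a much larger one, the small number's mantissa must be shifted right past the end of the 53-bit significand.

```python
# Alignment in action: adding 1.0 to 1e16 shifts the mantissa of 1.0 right
# so far that its significant bit is discarded entirely.
big, small = 1e16, 1.0
print(big + small == big)                # True: `small` vanished in alignment
print(big + 2.0 == 10000000000000002.0)  # spacing between doubles here is 2.0
```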

Origin www.cnblogs.com/zxytxdy/p/11909332.html