How floating-point types are stored in memory

Posing the question:

Disclaimer: This is an original article by the blogger. If you reproduce it, please attach a link to the original source and this statement. 2019-10-03, 00:56:39.
By: drowning heart of the ups and downs, Blog Park (cnblogs)

  Why does subtracting two floating-point numbers sometimes produce a value we did not expect? For example, 3.1415927 - 3.1415926 = 0.0000002? (I made this example up on the spot and have not verified these particular values, but anyone who has done floating-point arithmetic has surely run into something similar.) This is a question of precision.

  1/3 can be written exactly as a fraction, but what if fractions are not allowed? How do we get as close to 1/3 as possible? As everyone knows, the more 3s we write after the decimal point in 0.333..., the closer the value gets, yet it never reaches 1/3 exactly.

  Just as the decimal system cannot represent 1/3 exactly, the binary system cannot represent 1/10 exactly. This is why floating-point subtraction "never comes out even" and loses precision.
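To make the opening question concrete, here is a minimal C++ sketch, assuming a platform whose float is the 32-bit IEEE 754 format. The function names are invented for this illustration. Each literal is rounded to the nearest representable float when it is stored, and on such a platform the two values should land on neighbouring floats, so the difference comes out as one spacing step (2^-22, roughly 0.00000024) rather than 0.0000001.

```cpp
#include <cassert>
#include <cmath>

// Subtract two float values that differ by 0.0000001 on paper.
// volatile keeps the compiler from folding the subtraction away.
float pi_digits_difference() {
    volatile float a = 3.1415927f;  // rounded to the nearest float when stored
    volatile float b = 3.1415926f;  // rounded as well
    return a - b;
}

// True when the float result is further than `tolerance` from the
// answer 0.0000001 that pencil-and-paper arithmetic gives.
bool differs_from_paper_answer(float tolerance) {
    assert(tolerance >= 0.0f);
    return std::fabs(pi_digits_difference() - 0.0000001f) > tolerance;
}
```

Calling `differs_from_paper_answer(1.0e-8f)` is expected to return true under the stated assumption, because the inputs were already rounded before the subtraction took place.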

Overview:

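  float and double are both stored according to the IEEE 754 specification. A float occupies 32 bits, numbered 31 (highest) down to 0: bit 31 is the sign, bits 30 to 23 (8 bits) are the exponent, and bits 22 to 0 (23 bits) are the mantissa.

  A double occupies 64 bits: bit 63 is the sign, bits 62 to 52 (11 bits) are the exponent, and bits 51 to 0 (52 bits) are the mantissa.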

Steps:

  Storing a float in memory takes the following steps:

  1. Convert the absolute value of the real number to binary.

  2. Move the binary point of that number left or right by n places until it sits immediately to the right of the first significant digit (compare scientific notation in decimal).

  3. Starting with the first digit to the right of the binary point, copy 23 digits into bits 22 down to 0.

  4. If the real number is positive, put 0 in bit 31; otherwise put 1.

  5. If n came from a left move, the exponent is positive: put 1 in bit 30. If n came from a right move, or n = 0, put 0 in bit 30.

  6. If n came from a left move, subtract 1 from n, convert the result to binary, pad with 0s on the left to seven digits, and put them in bits 29 to 23. If n came from a right move, or n = 0, convert n to binary, pad with 0s on the left to seven digits, invert every bit, and put the result in bits 29 to 23.

  Taken together, steps 5 and 6 fill bits 30 to 23 with the exponent plus 127 (the IEEE 754 bias), where a right move counts as a negative exponent.
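Steps 5 and 6 can look arbitrary, so here is a small sketch that follows them literally and sets the result beside the more common description of the same field, the exponent plus a bias of 127. It assumes n stays within the range of normal floats (-126 to 127), with a right move written as a negative n. Both function names are invented for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Exponent field (bits 30..23) built by following steps 5 and 6.
// n > 0 means the binary point moved left, n < 0 means it moved right.
std::uint32_t exponent_field_by_steps(int n) {
    assert(n >= -126 && n <= 127);  // normal floats only
    if (n > 0) {
        // Step 5: bit 30 is 1. Step 6: n - 1 fills the low seven bits.
        return 0x80u | static_cast<std::uint32_t>(n - 1);
    }
    // Step 5: bit 30 is 0. Step 6: |n| in seven bits, every bit inverted.
    return ~static_cast<std::uint32_t>(-n) & 0x7Fu;
}

// The usual description of the same field: exponent plus a bias of 127.
std::uint32_t exponent_field_biased(int n) {
    assert(n >= -126 && n <= 127);
    return static_cast<std::uint32_t>(n + 127);
}
```

For n = 3 both functions give 130 (binary 10000010), and for n = -2 both give 125 (binary 01111101), so the two descriptions agree on these inputs.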

First, we need to settle the following two questions:
  (1) How to convert a decimal integer to binary
  The method is simple: keep dividing by 2 and record the remainders. For example, to express 11 in binary:
        11 / 2 = 5   remainder 1
         5 / 2 = 2   remainder 1
         2 / 2 = 1   remainder 0
         1 / 2 = 0   remainder 1
        The quotient is 0, so we stop. Reading the remainders from bottom to top, 11 in binary is 1011.
  A note here: we stop as soon as the quotient reaches 0. But does repeatedly dividing an integer by 2 always reach 0? Could the algorithm loop forever for some integer? Absolutely not: an integer can always be represented exactly in binary. A decimal fraction, however, cannot always be.
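The repeated-division method can be sketched as follows. `to_binary` is a hypothetical helper name, not a standard function, and the sketch only handles non-negative integers.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Convert a non-negative integer to its binary digits by dividing by 2
// and collecting the remainders, as in the worked example for 11.
std::string to_binary(int value) {
    assert(value >= 0);
    if (value == 0) {
        return "0";
    }
    std::string digits;
    while (value != 0) {
        digits.push_back(static_cast<char>('0' + value % 2));  // remainder
        value /= 2;                                            // quotient
    }
    // The remainders come out lowest digit first, so read them in reverse.
    std::reverse(digits.begin(), digits.end());
    return digits;
}
```

The loop always ends because the quotient shrinks on every pass, which is the point made above about integers.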
  (2) How to convert a decimal fraction to binary
  The method is to keep multiplying by 2, taking the integer part each time, until no fractional part remains. For example, to express 0.9 in binary:
        0.9 × 2 = 1.8   integer part 1
        0.8 × 2 = 1.6   integer part 1   (0.8 is the fractional part of 1.8)
        0.6 × 2 = 1.2   integer part 1
        0.2 × 2 = 0.4   integer part 0
        0.4 × 2 = 0.8   integer part 0
        0.8 × 2 = 1.6   integer part 1
        0.6 × 2 = 1.2   integer part 1
        0.2 × 2 = 0.4   integer part 0
        0.4 × 2 = 0.8   integer part 0
          ......
  0.9 in binary is (reading from top to bottom): 0.11100110011001100......
  Note: the calculation above has entered a cycle, as you have probably noticed. Multiplying by 2 never eliminates the fractional part of 0.9, so the algorithm would run forever. Clearly, the binary representation of a decimal fraction sometimes cannot be exact, for the same reason that decimal can never finish writing out 1/3.
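Here is a minimal sketch of the multiply-by-2 method with a cap on the number of digits, since the loop would otherwise never end for values such as 0.9. `fraction_bits` and its `max_digits` parameter are names made up for this example. Note that the `double` passed in is itself only an approximation of 0.9, but its leading binary digits agree with the hand calculation.

```cpp
#include <cassert>
#include <string>

// Binary digits after the point for 0 <= fraction < 1, produced by
// repeated doubling. Stops early if the fractional part becomes zero.
std::string fraction_bits(double fraction, int max_digits) {
    assert(fraction >= 0.0 && fraction < 1.0);
    std::string digits;
    while (fraction != 0.0 && static_cast<int>(digits.size()) < max_digits) {
        fraction *= 2.0;
        if (fraction >= 1.0) {
            digits.push_back('1');  // integer part is 1
            fraction -= 1.0;        // keep only the fractional part
        } else {
            digits.push_back('0');  // integer part is 0
        }
    }
    return digits;
}
```

`fraction_bits(0.25, 13)` stops after two digits with "01", while `fraction_bits(0.9, 13)` uses all 13 digits and is still not finished.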
  

For example:

  Take 8.25 as an example and see how it is stored.

  (1) Convert the integer part, 8, to binary:

      8 / 2 = 4   remainder 0

      4 / 2 = 2   remainder 0

      2 / 2 = 1   remainder 0

      1 / 2 = 0   remainder 1

      Reading the remainders from bottom to top gives 1000.

   (2) Convert the fractional part, 0.25, to binary:

      0.25 × 2 = 0.5   integer part 0

      0.5 × 2 = 1.0   integer part 1

      Reading from top to bottom gives 0.01.

   (3) With both parts converted, 8.25 in binary is 1000.01. We can write it that way, but can the computer understand it? Everything in the computer world (digits, letters, characters, symbols) comes down to just two states, 0 and 1, and there is nowhere to store a "point", so this is not yet the final form we want.

   (4) 8.25 ---->>> 1000.01, which in binary scientific notation is 1.00001 × 2^3 (remember decimal scientific notation? The same idea carries over to another base). The exponent is 3.

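    (5) 8.25 is positive, so bit 31 is filled with 0. (Remember the earlier blog post on signed and unsigned numbers? 0x7FFFFFFF and 0x80000000 are the two limits of a signed number. For a signed number, a highest bit of 1 means negative and a highest bit of 0 means positive.)

    But does 0x80000000 always represent a negative number? No. In an unsigned number the highest bit can be 1 and the value is still positive. Which one applies depends on how the user defines the type; I will not go into that here.

    Sign bit: the number is positive, so we fill bit 31 with 0.

    Exponent: our n came from a left move, so the exponent is positive and bit 30 is 1. Then we subtract 1 from n to get 2, which is 10 in binary, and pad with 0s on the left to make up the seven bits from bit 29 down to bit 23, giving 0000010. The full exponent field (bits 30 to 23) is therefore 10000010.

     (6) How do we fill the mantissa, bits 22 to 0? 8.25 in binary is 1000.01, that is, 1.00001 × 2^3. We take the digits after the point in the scientific notation (00001) and drop them in starting at bit 22, then pad all the remaining bits with 0.

     (7) The final representation:

     Putting it together, we get 0100 0001 0000 0100 0000 0000 0000 0000. This is the binary form we were after; in hexadecimal it is 0x41040000. Let us check it with the compiler: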

// xiaoyu1.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"  // precompiled header from the Visual C++ project template

void Function()
{
    float a = 8.25f;  // expected bit pattern: 0x41040000
}

int main(int argc, char* argv[])
{
    Function();
    return 0;
}

 

   

6:    void Function()
7:    {
00401020   push        ebp
00401021   mov         ebp,esp
00401023   sub         esp,44h
00401026   push        ebx
00401027   push        esi
00401028   push        edi
00401029   lea         edi,[ebp-44h]
0040102C   mov         ecx,11h
00401031   mov         eax,0CCCCCCCCh
00401036   rep stos    dword ptr [edi]
8:        float a = 8.25f;
00401038   mov         dword ptr [ebp-4],41040000h
9:    }
0040103F   pop         edi
00401040   pop         esi
00401041   pop         ebx
00401042   mov         esp,ebp
00401044   pop         ebp
00401045   ret

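 The disassembly shows that the value stored for 8.25 in the local variable on the stack (at [ebp-4]) is 0x41040000, which matches our calculation.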
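For readers without a disassembler at hand, the stored pattern can also be read from inside the program. This is a minimal sketch, assuming float is 32 bits wide and follows IEEE 754; `float_bits` and `check_article_example` are names invented here. Copying the bytes with `std::memcpy` avoids the undefined behaviour of casting a `float*` to an integer pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Return the raw 32-bit pattern that a float occupies in memory.
std::uint32_t float_bits(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch expects a 32-bit float");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);  // copy the bytes unchanged
    return bits;
}

// Re-check the article's hand calculation; aborts if they disagree.
void check_article_example() {
    assert(float_bits(8.25f) == 0x41040000u);
}
```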

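Example 2

  Now that we can handle 8.25, what about -8.25? How is it represented?

  It is simple: first write the absolute value of -8.25 in binary. The representation is exactly the same as for 8.25, except that bit 31 is 1 because the number is negative. So the final representation of -8.25 is 0xC1040000. Disassembling the program confirms that this matches our calculation.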

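Example 3

  How do we represent a number that has only a fractional part? (This is a tricky point: unlike a number with an integer part, such a number usually has a negative exponent in scientific notation, and we have not yet looked at negative exponents.)

  Take 0.25 as the example.

  Converting 0.25 to binary:

      0.25 × 2 = 0.5   integer part 0

      0.5 × 2 = 1.0   integer part 1

   (1) 0.25 in binary is 0.01, which in scientific notation is 1.0 × 2^-2. 0.25 is positive, so bit 31 is 0. The exponent n came from a right move, so bit 30 is 0. Subtracting 1 from n (-2) gives -3, which as a 32-bit value is 0xFFFFFFFD; the low byte FD is 1111 1101. Take the lowest seven bits, 1111101, and put them in bits 29 to 23.

   (2) Mantissa: in 1.0 × 2^-2 everything after the point is 0, so bits 22 to 0 are all filled with 0. The final representation is 0011 1110 1000 0000 0000 0000 0000 0000, or 0x3E800000 in hexadecimal, which can be confirmed by disassembly in the same way.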
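The hand calculations in these examples can be replayed with a short sketch that assembles the three fields. It is an illustration under stated assumptions rather than a full converter: the caller supplies the sign, the exponent n (negative for a right move), and the binary digits after the point; the exponent field is written as n + 127, which gives the same bits as the bit-30 and seven-bit procedure used above. The name `pack_float_bits` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Assemble a 32-bit float pattern from sign, exponent n, and the
// binary digits that follow the point in 1.xxxxx × 2^n.
std::uint32_t pack_float_bits(bool negative, int n,
                              const std::string& fraction_digits) {
    assert(n >= -126 && n <= 127);         // normal floats only
    assert(fraction_digits.size() <= 23);  // the mantissa has 23 bits
    std::uint32_t bits = negative ? 0x80000000u : 0u;   // bit 31
    bits |= static_cast<std::uint32_t>(n + 127) << 23;  // bits 30..23
    int position = 22;                                  // bits 22..0
    for (char digit : fraction_digits) {
        assert(digit == '0' || digit == '1');
        if (digit == '1') {
            bits |= 1u << position;
        }
        --position;  // unused low bits stay 0
    }
    return bits;
}
```

With these inputs, `pack_float_bits(false, 3, "00001")` reproduces 0x41040000 for 8.25, and `pack_float_bits(false, -2, "")` reproduces 0x3E800000 for 0.25.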

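Example 4

  What about -0.25? How is it represented?

  It is 0xBE800000. Readers can try it themselves, though there is really no need because it is so simple: we have just worked out 0.25, so copy its binary representation and change bit 31 to 1, and that is the value of -0.25.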
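The observation that only bit 31 changes can be tried directly. The sketch below, which assumes a 32-bit IEEE 754 float and uses invented function names, flips the sign bit of the stored pattern and reads the result back as a float.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Negate a float by toggling bit 31 of its stored pattern.
float negate_by_sign_bit(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch expects a 32-bit float");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);   // float -> raw bits
    bits ^= 0x80000000u;                       // toggle the sign bit only
    std::memcpy(&value, &bits, sizeof value);  // raw bits -> float
    return value;
}

// The two pairs used in the examples; aborts if either check fails.
void check_sign_flip() {
    assert(negate_by_sign_bit(0.25f) == -0.25f);
    assert(negate_by_sign_bit(8.25f) == -8.25f);
}
```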

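float and double

  A float occupies 4 bytes in memory and a double occupies 8 bytes. If you have followed the process above, you will understand how a double is stored as well. A double has more mantissa bits (52 instead of 23), which means more digits of precision and a more accurate value. It is like the difference between 99.99% and 99.9999999999999999999999999999999999%. Think of buying gold: for ordinary trade 99.99% is entirely sufficient, since 99.99% can already be called 24K pure gold. But in a scientific experiment, is 99.99% pure enough? It is certainly less pure than 99.9999999999999999999999999999999999%, because the latter expresses a higher precision. In everyday programs the precision of float is usually enough, so declaring decimals as float is generally fine, and double is kept for the special cases that need it. Do you now understand how decimals are stored in the computer?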
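The size and precision difference can be queried from the standard library instead of memorised. This is a small sketch assuming IEEE 754 float and double; `FormatInfo`, `describe`, and `float_rounds_one_tenth` are names made up for the illustration. `std::numeric_limits<T>::digits` counts significant binary digits including the implicit leading 1, so it reports one more than the number of stored mantissa bits.

```cpp
#include <cassert>
#include <limits>

// A few facts about a floating-point type.
struct FormatInfo {
    int bytes;             // storage size
    int significand_bits;  // stored mantissa bits plus the implicit 1
    int decimal_digits;    // decimal digits that survive a round trip
};

template <typename T>
FormatInfo describe() {
    return {static_cast<int>(sizeof(T)),
            std::numeric_limits<T>::digits,
            std::numeric_limits<T>::digits10};
}

// 0.1 cannot be stored exactly, and float and double round it
// differently, so widening the float does not recover the double.
bool float_rounds_one_tenth() {
    assert(describe<float>().bytes < describe<double>().bytes);
    return static_cast<double>(0.1f) != 0.1;
}
```

Under the stated assumption, `describe<float>()` reports 4 bytes and 24 significant bits, and `describe<double>()` reports 8 bytes and 53 significant bits.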

Origin www.cnblogs.com/Reverse-xiaoyu/p/11618913.html