Representation range and precision of Java float and double types


Comparing floating-point numbers hides a trap: the underlying binary format cannot represent every decimal number exactly, so results can sometimes look inexplicable.

For example, in Java:

        0.99999999f==1f //true 

        0.9f==1f //false
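The two comparisons above can be checked by printing the raw bit patterns of the literals. This is a minimal sketch; the class name `LiteralRounding` and the helper `bits` are made up for illustration, while `Float.floatToIntBits` and `Math.nextDown` are standard library calls.

```java
class LiteralRounding {
    // Returns the raw IEEE 754 bit pattern of a float as 8 hex digits.
    static String bits(float value) {
        return String.format("%08x", Float.floatToIntBits(value));
    }

    public static void main(String[] args) {
        // Both literals round to the same float, so their bit patterns match.
        System.out.println(bits(0.99999999f)); // 3f800000
        System.out.println(bits(1f));          // 3f800000
        // 0.9f rounds to a different float.
        System.out.println(bits(0.9f));        // 3f666666
        // The largest float below 1 is 1 - 2^-24, which is further from
        // 0.99999999 than 1 itself is, so the literal rounds up to 1.
        System.out.println(Math.nextDown(1f)); // 0.99999994
    }
}
```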

To understand these results, we first need to understand how float and double are laid out in memory.

1. Range
The range of float and double is determined by the number of exponent bits.
float has 8 exponent bits and double has 11. The bits are laid out as follows:

float:

        1bit (sign bit) 8bits (exponent bit) 23bits (mantissa bit)

double:

        1bit (sign bit) 11bits (exponent bit) 52bits (mantissa bit)
The exponent is stored in biased (shift code) form rather than in two's complement: the stored field is the true exponent plus 127 for float and plus 1023 for double. The all-zeros and all-ones fields are reserved (for zero and subnormal numbers, and for infinity and NaN), so the exponent range of normal float values is -126 ~ +127, and that of double is -1022 ~ +1023.
The negative exponents determine the non-zero number with the smallest absolute value that a floating-point number can express, and the positive exponents determine the number with the largest absolute value, which in turn determines the range of floating-point values.
The largest float is (2 - 2^-23) * 2^127, just under 2^128, so the range of float is about -3.40E+38 ~ +3.40E+38; the largest double is (2 - 2^-52) * 2^1023, just under 2^1024, so the range of double is about -1.79E+308 ~ +1.79E+308.
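The limits can be read directly from the constants on `Float` and `Double`. The sketch below also rebuilds the largest finite value from the bit layout; `FloatRange` and `largestFinite` are hypothetical names used only for this illustration.

```java
class FloatRange {
    // Largest finite value for a format with the given maximum exponent and
    // number of stored mantissa bits: (2 - 2^-mantissaBits) * 2^maxExponent.
    static double largestFinite(int maxExponent, int mantissaBits) {
        double significand = 2.0 - Math.scalb(1.0, -mantissaBits);
        return Math.scalb(significand, maxExponent);
    }

    public static void main(String[] args) {
        // Exponent range of normal numbers.
        System.out.println(Float.MIN_EXPONENT + " ~ " + Float.MAX_EXPONENT);   // -126 ~ 127
        System.out.println(Double.MIN_EXPONENT + " ~ " + Double.MAX_EXPONENT); // -1022 ~ 1023
        // Largest finite values.
        System.out.println(Float.MAX_VALUE);  // 3.4028235E38
        System.out.println(Double.MAX_VALUE); // 1.7976931348623157E308
        // The layout-based formula reproduces both constants exactly.
        System.out.println(largestFinite(127, 23) == Float.MAX_VALUE);   // true
        System.out.println(largestFinite(1023, 52) == Double.MAX_VALUE); // true
        // Going past the largest value overflows to infinity.
        System.out.println(Float.MAX_VALUE * 2); // Infinity
    }
}
```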


2. Precision
The precision of float and double is determined by the number of mantissa bits. Floating-point numbers are stored in memory in binary scientific notation, and for normal numbers the integer part is always an implicit "1". Because it never changes, it is not stored, yet it still contributes one bit of precision.
float: 2^23 = 8388608, seven digits in total. Counting the implicit leading bit, the mantissa has 24 bits, so every integer up to 2 * 8388608 = 16777216 (eight digits) can be represented exactly. Not every 8-digit number fits, however. 24 bits correspond to about 7.2 decimal digits, so the precision of float is roughly 7 significant digits (6 digits always survive a round trip, and 9 are needed to identify a float uniquely);
double: 2^52 = 4503599627370496, sixteen digits in total. By the same reasoning, the 53-bit mantissa corresponds to about 15.9 decimal digits, so the precision of double is roughly 15 ~ 16 significant digits (17 are needed to identify a double uniquely).
This is why f1 == f2 is unreliable for deciding whether two numbers are equal: f1 and f2 may have been intended as two different numbers, but because of the limited precision they can round to the same floating-point value and compare as equal.

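We can use the following code to check. Once f1 reaches 16777216 (2^24), f1++ no longer changes the value, because 16777217 cannot be represented as a float: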

        float f1 = 16777215f;
        for (int i = 0; i < 10; i++) {
            System.out.println(f1);
            f1++;
        }
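The loop above stalls at 2^24 because neighbouring floats are 2 apart from that point on. The sketch below shows the same effect with `==`, and one common alternative to exact comparison: a tolerance measured in units in the last place (ULPs). The helper `nearlyEqual` and its `maxUlps` parameter are assumptions made for this example, not a standard API, and a suitable tolerance depends on the computation.

```java
class FloatCompare {
    // Hypothetical helper: true when a and b differ by at most maxUlps
    // units in the last place, measured at the larger magnitude.
    static boolean nearlyEqual(float a, float b, int maxUlps) {
        if (Float.isNaN(a) || Float.isNaN(b)) {
            return false;
        }
        float scale = Math.max(Math.abs(a), Math.abs(b));
        return Math.abs(a - b) <= maxUlps * Math.ulp(scale);
    }

    public static void main(String[] args) {
        // 16777217 is not representable, so the literal rounds to 16777216.
        System.out.println(16777216f == 16777217f);      // true
        System.out.println(16777216f + 1f == 16777216f); // true
        // double hits the same wall at 2^53.
        System.out.println(9007199254740992d + 1d == 9007199254740992d); // true

        // Accumulated rounding error: compare with a tolerance instead of ==.
        float sum = 0f;
        for (int i = 0; i < 10; i++) {
            sum += 0.1f;
        }
        System.out.println(nearlyEqual(sum, 1f, 4));    // true
        System.out.println(nearlyEqual(1f, 1.001f, 4)); // false
    }
}
```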

For numbers with a fractional part, precision errors are even easier to run into.

        float f = 2.2f;
        double d = (double) f;
        System.out.println(d); 
        f = 2.25f;
        d = (double) f;
        System.out.println(d); 

The output is:
        2.200000047683716
        2.25

Seeing such a simple number print like this is baffling at first.

In fact, the stored forms of these two values already contain the answer. First, look at how 2.25 is stored in single precision: converted to binary it is 10.01, which normalizes to 1.001 * 2^1.

So we can write out the memory layout of 2.25:
        The sign bit is 0
        The exponent is 1, which is 0000 0001 in plain binary; adding the bias 127 gives the stored shift code 1000 0000.
        The mantissa is 0010 0000 0000 0000 0000 000
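The three fields can be extracted from a float with `Float.floatToIntBits` and a few shifts and masks. This is a small sketch; the class `FloatFields` and its method names are invented for illustration. For 2.25f it prints a sign of 0, a biased exponent of 128 (1000 0000) and a mantissa that starts with 001.

```java
class FloatFields {
    static int sign(float v) {
        return Float.floatToIntBits(v) >>> 31;
    }

    // Stored exponent field: true exponent plus the bias 127.
    static int biasedExponent(float v) {
        return (Float.floatToIntBits(v) >>> 23) & 0xFF;
    }

    // The 23 stored mantissa bits, without the implicit leading 1.
    static int mantissa(float v) {
        return Float.floatToIntBits(v) & 0x7FFFFF;
    }

    // Left-pads a binary string with zeros to the given width.
    static String pad(int value, int width) {
        String s = Integer.toBinaryString(value);
        return "0".repeat(width - s.length()) + s;
    }

    static String layout(float v) {
        return sign(v) + " " + pad(biasedExponent(v), 8) + " " + pad(mantissa(v), 23);
    }

    public static void main(String[] args) {
        System.out.println(layout(2.25f)); // 0 10000000 00100000000000000000000
        System.out.println(layout(2.2f));  // 0 10000000 00011001100110011001101
        System.out.println(biasedExponent(2.25f) - 127); // 1
    }
}
```

`String.repeat` requires Java 11 or later; on older versions the padding can be done with a loop.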

In double precision, 2.25 is represented as: 0 100 0000 0000 0010 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000, so the value of 2.25 does not change when it is cast.

Now look at 2.2. To convert a decimal fraction to a binary fraction, repeatedly multiply the fractional part by 2 and take the integer part of each result as the next binary digit. 0.2 * 2 = 0.4, so the first binary digit is 0, the integer part of 0.4; 0.4 * 2 = 0.8, so the second digit is 0; 0.8 * 2 = 1.6, the third digit is 1; 0.6 * 2 = 1.2, the fourth digit is 1; 0.2 * 2 = 0.4, the fifth digit is 0; and so on. The product never comes out to exactly 1.0, so the binary fraction is the infinitely repeating pattern 00110011001100110011 ...

A single-precision value has only 24 bits of mantissa precision, so the pattern has to be rounded off, and the float 2.2 is stored as: 0 1000 0000 0001 1001 1001 1001 1001 101

However, converting this stored pattern back to decimal does not give exactly 2.2, because a decimal fraction such as 2.2 may have no exact binary representation. double has the same problem, so representing numbers in floating point inevitably introduces some error. Widening a float to a double is itself exact, but it exposes the error already stored in the float, because the double is printed with more digits. For example, the following code prints different results:

        float f = 2.2f;
        double d = (double) f;
        System.out.println(f); // 2.2
        System.out.println(d); // 2.200000047683716
For decimal values that can be represented exactly in binary, such as 2.25, this error does not exist, which explains the strange output above: 2.25 prints unchanged, while 2.2 does not.
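The multiply-by-2 conversion described earlier can be written out as a short routine, and `BigDecimal` can show the exact value a float really holds. This is an illustrative sketch: `BinaryFraction` and `fractionBits` are hypothetical names, and the fraction is passed as a string so that the doubling is done in exact decimal arithmetic.

```java
import java.math.BigDecimal;

class BinaryFraction {
    // Produces the first `digits` binary digits of a decimal fraction in
    // [0, 1) by repeated doubling: each doubling yields one binary digit.
    static String fractionBits(String decimalFraction, int digits) {
        BigDecimal x = new BigDecimal(decimalFraction);
        BigDecimal two = BigDecimal.valueOf(2);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < digits; i++) {
            x = x.multiply(two);
            if (x.compareTo(BigDecimal.ONE) >= 0) {
                out.append('1');
                x = x.subtract(BigDecimal.ONE);
            } else {
                out.append('0');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // 0.2 repeats forever in binary; 0.25 terminates after two digits.
        System.out.println(fractionBits("0.2", 12)); // 001100110011
        System.out.println(fractionBits("0.25", 4)); // 0100
        // Exact decimal values of the stored floats.
        System.out.println(new BigDecimal(2.2f));  // 2.2000000476837158203125
        System.out.println(new BigDecimal(2.25f)); // 2.25
    }
}
```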
 

Origin blog.csdn.net/u011250186/article/details/105678048