Java floating point numbers

 

floating point structure

  To understand the value range and precision of Java floating-point numbers, you first need to understand how a floating-point number is represented, that is, its structure. A structure is needed because the machine only knows 0 and 1: to represent a number with a fractional part, there must be an agreed rule that tells the machine where the decimal point is. A simple rule, for example, would be to take the four bytes of a float and let the first two bytes hold the integer part and the last two bytes hold the fractional part. Any such rule is a standard for encoding the number. Floating-point numbers in Java follow the IEEE 754 standard.
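As a rough illustration of the "two bytes for the integer part, two bytes for the fractional part" rule mentioned above, the sketch below packs a non-negative number into 32 bits in that way. The class FixedPoint16 and its methods are hypothetical names made up for this example. Java has no such built-in type, and this is not how float actually works.

```java
final class FixedPoint16 {
    /** One unit of the low 16 bits is worth 1/65536. */
    static final double SCALE = 65536.0;

    /** Encodes a value: for non-negative input the high 16 bits hold the integer part, the low 16 bits the fraction. */
    static int encode(double value) {
        return (int) Math.round(value * SCALE);
    }

    /** Decodes the 32-bit pattern back into a number. */
    static double decode(int bits) {
        return bits / SCALE;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(encode(3.25))); // 34000
        System.out.println(decode(encode(3.25)));              // 3.25
        System.out.println(decode(encode(0.000001)));          // 0.0
    }
}
```

With a fixed split like this, the range stops at about 32768 and nothing smaller than 1/65536 can be stored. A floating point trades that fixed split for an exponent that moves the point.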

IEEE 754

  This article does not cover the background of IEEE 754 and goes straight to the layout it defines. If you are interested in the details, you can look up the standard yourself.

float

Sign bit (S): 1 bit; exponent bits (E): 8 bits; mantissa (M): 23 bits


A float takes 4 bytes, that is, 32 bits, divided into three parts: sign bit, exponent bits, and mantissa bits.
(1). Sign bit (S): the highest bit (bit 31) is the sign bit and gives the sign of the whole floating-point number; 0 is positive, 1 is negative.
(2). Exponent bits (E): bits 23-30, 8 bits in total, with the base of the exponent fixed at 2 (value range of E: 0~255). This part contributes the factor $2^{E-127}$, so $E-127$ ranges over -127~128. The standard also stipulates that when the exponent bits are all 0 or all 1, the floating-point number is not in normal form (the mantissa is interpreted differently in that case), so the real exponent range of normal numbers is -126~127.
(3). Mantissa (M): bits 0-22, 23 bits in total, hold the fractional part of the significand, which has the form 1.M or 0.M. The leading digit is not stored; whether it is 1 or 0 is determined by the exponent bits and the mantissa. When the exponent bits are neither all 0 nor all 1, the leading digit is 1 and the number is in normal (normalized) form. When the exponent bits are all 0 and the mantissa is non-zero, the leading digit is 0 and the number is in denormal (denormalized) form, with the exponent fixed at -126. Every other bit pattern is a special value (zero, infinity, or NaN). For a normal number, the float value is $(-1)^S \times 2^{E-127} \times 1.M$; for a denormal number it is $(-1)^S \times 2^{-126} \times 0.M$. The specific forms are as follows:

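Table 1: bit patterns of float and the forms they encode

| Sign | Exponent bits (E) | E - 127 | Mantissa bits (M) | Leading bit of the significand | Form |
|------|-------------------|---------|-------------------|--------------------------------|------|
| 1 | 255 | 128 | non-0 | none | NaN |
| 1 | 255 | 128 | 0 | none | negative infinity |
| 1 | 1~254 | -126~127 | any | 1 | normal form (negative number) |
| 1 | 0 | -127 | non-0 | 0 | denormal form (negative number) |
| 1 | 0 | -127 | 0 | none | negative 0 |
| 0 | 0 | -127 | 0 | none | positive 0 |
| 0 | 0 | -127 | non-0 | 0 | denormal form (positive number) |
| 0 | 1~254 | -126~127 | any | 1 | normal form (positive number) |
| 0 | 255 | 128 | 0 | none | positive infinity |
| 0 | 255 | 128 | non-0 | none | NaN |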
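The rows of the table can be checked against real values by pulling the three fields out of a float. The following is a minimal sketch: Float.floatToIntBits is the standard library call, while the class FloatBits and its method names are made up for this example.

```java
final class FloatBits {
    /** Sign bit: 0 for positive, 1 for negative. */
    static int sign(float f) {
        return Float.floatToIntBits(f) >>> 31;
    }

    /** Raw 8-bit exponent field E (0..255), before subtracting 127. */
    static int exponentField(float f) {
        return (Float.floatToIntBits(f) >>> 23) & 0xFF;
    }

    /** Raw 23-bit mantissa field M. */
    static int mantissaField(float f) {
        return Float.floatToIntBits(f) & 0x7FFFFF;
    }

    /** Names the kind of row in the table that the value falls into. */
    static String form(float f) {
        int e = exponentField(f);
        int m = mantissaField(f);
        String signName = sign(f) == 0 ? "positive" : "negative";
        if (e == 255) {
            return m == 0 ? signName + " infinity" : "NaN";
        }
        if (e == 0) {
            return m == 0 ? signName + " zero" : signName + " denormal";
        }
        return signName + " normal";
    }

    public static void main(String[] args) {
        float[] samples = {1.0f, -0.15625f, 0.0f, -0.0f, Float.MIN_VALUE,
                           Float.POSITIVE_INFINITY, Float.NaN};
        for (float f : samples) {
            System.out.println(f + ": S=" + sign(f) + " E=" + exponentField(f)
                    + " M=0x" + Integer.toHexString(mantissaField(f)) + " -> " + form(f));
        }
    }
}
```

For example, -0.15625 is $-1.25 \times 2^{-3}$, so its fields are S = 1, E = 124, and M = 0x200000 (the binary fraction .01).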

double

Sign bit (S): 1 bit; exponent bits (E): 11 bits; mantissa (M): 52 bits


  double is laid out like float, only longer, so its range is larger while the rules stay the same. The value of a normal double is $(-1)^S \times 2^{E-1023} \times 1.M$.
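To see the double formula at work, the sketch below splits a double into S, E, and M and then rebuilds the value from them. It only handles normal numbers, and the class DoubleBits and its method names are assumptions made for this example; Double.doubleToLongBits and Math.scalb are standard library calls.

```java
final class DoubleBits {
    /** Sign bit: 0 for positive, 1 for negative. */
    static int sign(double d) {
        return (int) (Double.doubleToLongBits(d) >>> 63);
    }

    /** Raw 11-bit exponent field E (0..2047), before subtracting 1023. */
    static int exponentField(double d) {
        return (int) ((Double.doubleToLongBits(d) >>> 52) & 0x7FFL);
    }

    /** Raw 52-bit mantissa field M. */
    static long mantissaField(double d) {
        return Double.doubleToLongBits(d) & 0xFFFFFFFFFFFFFL;
    }

    /** Rebuilds a normal double from its fields: (-1)^S * 2^(E-1023) * 1.M */
    static double rebuild(int s, int e, long m) {
        double significand = 1.0 + m / (double) (1L << 52); // 1.M
        double magnitude = Math.scalb(significand, e - 1023); // multiply by 2^(E-1023)
        return s == 0 ? magnitude : -magnitude;
    }

    public static void main(String[] args) {
        double x = -0.1;
        int s = sign(x);
        int e = exponentField(x);
        long m = mantissaField(x);
        System.out.println("S=" + s + " E=" + e + " M=0x" + Long.toHexString(m));
        System.out.println(rebuild(s, e, m) == x); // true
    }
}
```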

Ranges

According to Table 1, the value range of float is:
negative infinity -- $-2^{128}$ ~ $-2^{-149}$ -- 0 -- $2^{-149}$ ~ $2^{128}$ -- positive infinity
1). A "--" above marks a gap in which no value exists. For example, there is no value between negative infinity and $-2^{128}$ (in fact $2^{128}$ itself cannot be reached either; the largest finite float, $(2-2^{-23}) \times 2^{127} \approx 3.4028235 \times 10^{38}$, only comes close to it). This does not mean that every value inside a "~" interval can be represented: floating-point numbers have limited precision, and no non-zero value with an absolute value smaller than $2^{-149}$ can be represented. Infinity in Java is written as:

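System.out.println(Float.POSITIVE_INFINITY); // positive infinity (also Double.POSITIVE_INFINITY); prints Infinity
System.out.println(Float.NEGATIVE_INFINITY); // negative infinity (also Double.NEGATIVE_INFINITY); prints -Infinity
float f1 = (float) Math.pow(2, 128); // any exponent >= 128 gives Infinity
// The (float) cast above is required; without it the compiler reports an error. See the previous section, "Java variable data types"
float f2 = (float) Math.pow(2, 127); // 1.7014118E38
System.out.println(Float.MAX_VALUE); // 3.4028235E38
// Other cases are left for the reader to try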

2). Where -149 comes from: one might expect -150 (exponent bits all 0 gives 0 - 127 = -127, the smallest non-zero mantissa is $2^{-23}$, and -127 - 23 = -150). The reason it is -149 is that exponent bits of all 0 or all 1 are special cases and are not evaluated with the normal formula. When the exponent bits are all 0, the number is denormal: IEEE 754 fixes the exponent at -126 and the significand has the form 0.M. (Likewise, the largest float values above only come close to $\pm 2^{128}$.) So the smallest magnitude is $(-1)^S \times 2^{-126} \times 2^{-23} = \pm 2^{-149}$.

float f3 = (float) Math.pow(2, -149); // 1.4E-45; with an exponent below -149 the result is 0.0
System.out.println(Float.MIN_VALUE);  // 1.4E-45
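The limits discussed above can also be built directly from their bit patterns and compared with the constants in java.lang.Float. This is a small sketch; the class FloatLimits and its method names are hypothetical, while Float.intBitsToFloat and Math.scalb are standard library calls.

```java
final class FloatLimits {
    /** Largest finite float: exponent field 254, mantissa all ones. */
    static float largestFinite() {
        return Float.intBitsToFloat(0x7F7FFFFF);
    }

    /** Smallest positive normal float: exponent field 1, mantissa 0. */
    static float smallestNormal() {
        return Float.intBitsToFloat(0x00800000);
    }

    /** Smallest positive denormal float: exponent field 0, mantissa 1. */
    static float smallestDenormal() {
        return Float.intBitsToFloat(0x00000001);
    }

    public static void main(String[] args) {
        System.out.println(largestFinite() == Float.MAX_VALUE);           // true
        System.out.println(smallestNormal() == Float.MIN_NORMAL);         // true
        System.out.println(smallestDenormal() == Float.MIN_VALUE);        // true
        System.out.println(smallestDenormal() == Math.scalb(1.0f, -149)); // true
        System.out.println(Float.MIN_VALUE / 2);                          // 0.0
        System.out.println(Float.MAX_VALUE * 2);                          // Infinity
    }
}
```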

The range of double is derived in the same way as that of float:
negative infinity -- $-2^{1024}$ ~ $-2^{-1074}$ -- 0 -- $2^{-1074}$ ~ $2^{1024}$ -- positive infinity
Here 1074 = |(-1022) - 52|: the smallest exponent is -1022 and the mantissa has 52 bits.
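The same arithmetic can be written out for double using the exponent constants that java.lang.Double provides. This is only a sketch; DoubleLimits and its method names are made up for this example.

```java
final class DoubleLimits {
    /** Number of stored mantissa bits in a double. */
    static final int MANTISSA_BITS = 52;

    /** Smallest positive double: 2^(-1022 - 52) = 2^-1074. */
    static double smallestDenormal() {
        return Math.scalb(1.0, Double.MIN_EXPONENT - MANTISSA_BITS);
    }

    /** Largest finite double: (2 - 2^-52) * 2^1023, just below 2^1024. */
    static double largestFinite() {
        return Math.scalb(2.0 - Math.scalb(1.0, -MANTISSA_BITS), Double.MAX_EXPONENT);
    }

    public static void main(String[] args) {
        System.out.println(Double.MIN_EXPONENT);                   // -1022
        System.out.println(smallestDenormal() == Double.MIN_VALUE); // true
        System.out.println(largestFinite() == Double.MAX_VALUE);    // true
    }
}
```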

  Note also that the table contains NaN (Not a Number), which represents the result of an undefined operation, for example:

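System.out.println(0.0 / 0.0); // prints NaN. Note that it must not be the integer expression 0 / 0, which throws ArithmeticException
// NaN signals an invalid calculation; see the table for the bit patterns that encode it
// Float.NaN and Double.NaN also denote NaN directly. Arithmetic with NaN generally yields NaN, with exceptions such as
System.out.println(Math.pow(Float.NaN, 0)); // 1.0
// In addition, NaN == NaN is false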
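Because NaN == NaN is false, testing for NaN needs some care. The sketch below shows the usual options; the class NaNChecks and the helper isNaNBySelfCompare are hypothetical names, while Float.isNaN and Float.equals are part of the standard library.

```java
final class NaNChecks {
    /** NaN is the only float value that is not equal to itself. */
    static boolean isNaNBySelfCompare(float f) {
        return f != f;
    }

    public static void main(String[] args) {
        float nan = 0.0f / 0.0f;
        System.out.println(nan == nan);                           // false
        System.out.println(Float.isNaN(nan));                     // true
        System.out.println(isNaNBySelfCompare(nan));              // true
        System.out.println(Float.valueOf(nan).equals(Float.NaN)); // true: equals compares bit patterns
        System.out.println(nan + 1.0f);                           // NaN

        int zero = 0;
        try {
            System.out.println(1 / zero);
        } catch (ArithmeticException e) {
            // Integer division by zero does not produce NaN or Infinity; it throws
            System.out.println("integer division by zero throws ArithmeticException");
        }
    }
}
```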

floating point precision

  Precision is determined by the mantissa. Why? Look at the formula for the value of a floating-point number: the exponent only scales the value by a power of 2. A negative exponent lets the number represent smaller magnitudes, between 0 and 1 (or between -1 and 0), but it adds no significant digits. So the precision depends mainly on the mantissa.

float

  The mantissa of float has 23 bits, so it ranges over 0~$2^{23}$, and $2^{23} = 8388608 \approx 10^{6.92}$. The precision of float is therefore 6~7 decimal digits: 6 digits are always accurate, 7 digits are usually correct, and the 8th is not guaranteed (which does not mean it is always wrong). Note that the 6~7 digits here are counted after the decimal point once the number is written in scientific notation. For example, 8317637.5 is 8.3176375E6, with seven digits after the decimal point. Counted as significant digits (starting from the first non-zero digit and including the integer part), that is 7~8 digits: 8317637.5 has 8 significant digits.
  

System.out.println((float) Math.pow(10, 6.92)); // note the (float) cast
// prints 8317637.5; float only keeps 7~8 significant digits and rounds the rest
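The 6.92 figure is just the number of mantissa bits multiplied by $\log_{10} 2$. A minimal sketch of that conversion follows; FloatDigits and decimalDigits are made-up names, and the 24-bit line assumes the implicit leading 1 of a normal number is counted as well.

```java
final class FloatDigits {
    /** Number of decimal digits that the given number of binary digits can cover. */
    static double decimalDigits(int bits) {
        return bits * Math.log10(2);
    }

    public static void main(String[] args) {
        System.out.println(decimalDigits(23)); // about 6.92, stored mantissa bits only
        System.out.println(decimalDigits(24)); // about 7.22, counting the implicit leading 1

        double d = Math.pow(10, 6.92);
        float f = (float) d;
        System.out.println(f);               // 8317637.5
        System.out.println((double) f == d); // false: the cast rounded away the low digits
    }
}
```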

  If this is unclear, think of it this way: the mantissa is 23 binary digits, 0101...0101, representing the fractional part of the significand. Its smallest non-zero value is 0000...0001 (22 zeros followed by a 1), that is, $2^{-23}$ = 1.1920929E-7. This is the smallest unit of a float whose significand is 1.M with exponent 0, roughly 0.0000001192. A change smaller than this, such as 0.00000001, cannot be expressed at that scale, and the computer is powerless. Larger values are reached by adding this smallest unit repeatedly until the result equals the target or comes close to it. Numbers that lie between two values one unit apart cannot be represented, so even the 7th decimal digit cannot always be exact, because the unit is not 0.0000001 but slightly larger.
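Java exposes this smallest unit as Math.ulp ("unit in the last place"). The sketch below checks the numbers quoted above for values near 1.0; the class FloatUlp and the helper isAbsorbed are hypothetical names introduced for this example.

```java
final class FloatUlp {
    /** True if adding delta to base leaves base unchanged after rounding. */
    static boolean isAbsorbed(float base, float delta) {
        float sum = base + delta;
        return sum == base;
    }

    public static void main(String[] args) {
        float unit = Math.ulp(1.0f);
        System.out.println(unit);                                  // 1.1920929E-7
        System.out.println(unit == Math.scalb(1.0f, -23));         // true
        System.out.println(1.0f + unit == Math.nextUp(1.0f));      // true
        System.out.println(isAbsorbed(1.0f, 1.0e-8f));             // true: too small to change 1.0
        System.out.println(1.0e-8f > 0.0f);                        // true: on its own, 1.0E-8 is not zero
    }
}
```

The last two lines show that the limit is relative: 1.0E-8 can be stored by itself because the exponent rescales the unit, but it disappears when added to a value as large as 1.0.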

double

  The calculation is the same as for float. The mantissa of double has 52 bits and $2^{-52}$ = 2.220446049250313E-16, so the smallest unit is in the 16th decimal place. Since it is not exactly 1.0E-16, the precision is 15~16 digits: 15 digits are guaranteed and the 16th is usually correct.
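The corresponding check for double is short. This sketch assumes nothing beyond Math.ulp and Math.scalb from the standard library; the class name DoubleUlp and the method unitAtOne are made up.

```java
final class DoubleUlp {
    /** Spacing between 1.0 and the next larger double. */
    static double unitAtOne() {
        return Math.ulp(1.0);
    }

    public static void main(String[] args) {
        System.out.println(unitAtOne());                         // 2.220446049250313E-16
        System.out.println(unitAtOne() == Math.scalb(1.0, -52)); // true
        System.out.println(0.1 + 0.2);                           // 0.30000000000000004
    }
}
```

The last line is the familiar symptom of this limit: the error only shows up in the 17th significant digit.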

 

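/*
         float is 4 bytes, i.e. 32 bits, the same size as int. Its normal range is
         about 1.2E-38 to 3.4E+38, and it provides about 7 significant digits.
         */

        float f = 0.12345678f;
        System.out.println(f);    // 0.12345678 [OK]

        f = 12345678f;
        System.out.println(f);    // 1.2345678E7 [OK], below 2^24 = 16777216 every integer is exact

        f = 33444444f;
        System.out.println(f);    // 3.3444444E7 [OK], between 2^24 and 2^25 every even integer is exact

        f = 1.1234567f;
        System.out.println(f);     // 1.1234567 [OK]

        f = 12.123456f;
        System.out.println(f);      // 12.123456 [OK]

        //---------------

        f = 92345678f;
        System.out.println(f);    // stored as 92345680 [precision lost], floats near 9.2E7 are 8 apart

        f = 123456789f;
        System.out.println(f);    // stored as 123456792 [precision lost], floats near 1.2E8 are 8 apart

        f = 1234567.1234567f;
        System.out.println(f);    // 1234567.1 [precision lost], more than 7 significant digits

        f = 1.123456789f;
        System.out.println(f);     // 1.1234568, 7 decimal places [precision lost], more than 7 significant digits

        // The larger the integer part of a float, the fewer bits are left for the fractional part, so precision drops and digits are more likely to be lost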
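The cases above follow one rule: the gap between neighbouring floats grows with the magnitude of the value. The sketch below measures that gap with Math.ulp and tests whether an integer survives conversion to float; FloatSpacing and isExactInFloat are hypothetical names used only for this example.

```java
final class FloatSpacing {
    /** True if the integer survives a round trip through float unchanged. */
    static boolean isExactInFloat(long n) {
        return (long) (float) n == n;
    }

    public static void main(String[] args) {
        long[] samples = {12345678L, 33444444L, 92345678L, 123456789L};
        for (long n : samples) {
            float f = (float) n;
            System.out.println(n + " -> " + (long) f
                    + ", spacing " + Math.ulp(f)
                    + ", exact: " + isExactInFloat(n));
        }
        System.out.println(Math.ulp(1234567.1f)); // 0.125
    }
}
```

Near 1234567 the floats are 0.125 apart, which is why 1234567.1234567f can keep only about one decimal place.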

 

