Floating-point numbers

1. Definition of floating point type

Floating point is a computer data type that represents real numbers, including values that can only be stored approximately. Its representation is based on scientific notation: a real number is expressed as the product of a mantissa (the significant digits) and a base raised to an exponent.

In scientific notation, a real number is written as ±mantissa × base^exponent. In computers, floating-point types usually follow the IEEE 754 standard, which defines the storage format and arithmetic rules for floating-point numbers. Under IEEE 754 the base is 2, and a real number is stored as a sign bit, exponent bits, and mantissa bits.

Floating point types usually have two precisions: single precision floating point type (float) and double precision floating point type (double).

Floating point is widely used in scientific computing, graphics processing, engineering simulation, and other fields because it can represent very large or very small values with good precision. However, because of how floating-point numbers are represented and operated on, rounding errors can arise and accumulate, which requires special attention when high-precision results are needed.

1.1. Single precision floating point type

A single-precision value usually occupies 32 bits (4 bytes): 1 sign bit (positive or negative), 8 exponent bits, and 23 mantissa bits. A single-precision floating-point number carries approximately 6 to 7 significant decimal digits.

1.2. Double precision floating point type

A double-precision value usually occupies 64 bits (8 bytes): 1 sign bit (positive or negative), 11 exponent bits, and 52 mantissa bits. A double-precision floating-point number carries approximately 15 to 16 significant decimal digits.

2. Floating point precision

The significant figures of a single-precision floating-point type are approximately 6 to 7 decimal digits, and those of a double-precision floating-point type are approximately 15 to 16 decimal digits. Under what circumstances do you get the higher count, and when only the lower? Here is a simple analysis; criticisms and corrections are welcome.

2.1. Single precision floating point type

The stored mantissa has 23 bits, and the implicit leading 1 raises the effective significand to 24 bits. 2 raised to the 24th power is 16777216, so the relative spacing between adjacent float32 values is at most 2^-24, which corresponds to log10(2^24) ≈ 7.2 decimal digits: 7 significant digits are generally preserved, and in the worst cases only 6.
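This limit can be seen concretely with integers, sketched in Go: because the effective significand holds 24 bits, every integer up to 2^24 = 16777216 is stored exactly, while 16777217 already cannot be represented.

```go
package main

import "fmt"

func main() {
	// With the implicit leading 1, float32 has a 24-bit significand,
	// so integers are exact only up to 2^24 = 16777216.
	var exact float32 = 16777215  // 2^24 - 1, exactly representable
	var limit float32 = 16777216  // 2^24, exactly representable
	var beyond float32 = 16777217 // 2^24 + 1, rounds back to 2^24

	fmt.Println(exact == limit)  // false: both values are exact and distinct
	fmt.Println(beyond == limit) // true: 16777217 is not representable
}
```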

2.2. Double precision floating point type

The stored mantissa has 52 bits, and the implicit leading 1 raises the effective significand to 53 bits. 2 raised to the 53rd power is 9007199254740992, so the relative spacing between adjacent float64 values is at most 2^-53, which corresponds to log10(2^53) ≈ 15.95 decimal digits: 15 significant digits are generally preserved, and often 16.
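The same integer demonstration for double precision, sketched in Go: the 53-bit significand keeps integers exact up to 2^53 = 9007199254740992, and adding 1 beyond that point is lost to rounding.

```go
package main

import "fmt"

func main() {
	// float64 has a 53-bit significand (52 stored + implicit 1),
	// so integers are exact only up to 2^53 = 9007199254740992.
	var x float64 = 9007199254740992 // 2^53

	fmt.Println(x+1 == x) // true: 2^53 + 1 rounds back to 2^53
	fmt.Println(x-1 == x) // false: 2^53 - 1 is exactly representable
}
```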

Origin blog.csdn.net/xhtchina/article/details/133418155