[Knowledge point essay sharing | Part 1] The unavoidable floating-point error

  Introduction:

When you first learned C as a freshman, your teacher probably told you to compare floating-point numbers by taking their difference: when the difference is very small, close to 0, the two numbers are considered equal. Did that ever leave you with a doubt? Why can't floating-point numbers be compared directly?

#include <stdio.h>

int main(void)
{
	double a = 0.1;
	double b = 0.2;

	/* Does 0.1 + 0.2, computed in doubles, equal the double 0.3? */
	if ((a + b) == 0.3)
	{
		printf("true\n");
	}
	else
	{
		printf("false\n");
	}
	return 0;
}

This code is simple: it checks whether 0.1 + 0.2, stored as doubles, equals 0.3.

Running it prints false: the sum does not compare equal. But everyday arithmetic tells us that 0.1 + 0.2 is exactly 0.3, so why does the comparison fail?
If we print the value of 0.1 + 0.2 with full precision, we find that it is not exactly 0.3: there is a long tail of extra digits.
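Here is a quick way to see this for yourself, shown in Python, whose float is the same IEEE-754 double the C code uses; 17 decimal places are enough to tell the two values apart:

print(f"{0.1 + 0.2:.17f}")   # 0.30000000000000004
print(f"{0.3:.17f}")         # 0.29999999999999999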
In fact, this is what we are going to introduce today:
floating point error

Floating point error:

To explore the origin of the error, we first have to introduce IEEE-754, the standard that specifies in detail how floating-point numbers are stored in binary inside a computer.

IEEE-754 is the international standard for binary floating-point arithmetic. It specifies how floating-point numbers are represented in computer systems and the rules for computing with them, with the goal of improving the interoperability and portability of floating-point code across different systems.

The IEEE-754 standard defines two basic floating-point formats: single-precision floating-point numbers, which occupy 32 bits, and double-precision floating-point numbers, which occupy 64 bits. Both formats are widely used, especially in scientific and engineering computing.

The IEEE-754 standard also stipulates how the sign bit, exponent bits, and mantissa bits are laid out, the four arithmetic operations (addition, subtraction, multiplication, and division), and how exceptions are handled. The exponent is stored in biased (offset) form, so it can encode positive, negative, and zero exponents, and the precision and significant figures of a floating-point number shift as the exponent changes.

In practice, the floating-point formats and calculation rules of the IEEE-754 standard are used by virtually every programming language and computer system (C, Java, Python, and so on), and they underpin high-performance and large-scale scientific computing.

The IEEE-754 standard specifies these two formats as follows:

  1. Single-precision floating-point number: 32 bits, with 1 sign bit, 8 exponent bits, and 23 mantissa bits. The highest bit is the sign: 0 means positive, 1 means negative. The 8 exponent bits are stored in biased form, so they can encode positive, negative, and zero exponents. The 23 mantissa bits hold the fractional part of the significand (for normalized numbers the leading 1 is implicit), and they determine the significant figures and precision.

  2. Double-precision floating-point number: 64 bits, with 1 sign bit, 11 exponent bits, and 52 mantissa bits. The highest bit is the sign: 0 means positive, 1 means negative. The 11 exponent bits are stored in biased form, and the 52 mantissa bits hold the fractional part of the significand, determining the significant figures and precision.

Both formats express real numbers over a wide range by combining an exponent with a mantissa. Because the exponent and the mantissa are stored separately, the represented range and the absolute precision adjust dynamically with the exponent, which is what makes floating-point arithmetic both flexible and accurate. The sketch below shows how to inspect these fields.
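As a minimal sketch (in Python, whose float is an IEEE-754 double; struct gives us the raw bytes), here is how to split a double into its sign, exponent, and mantissa fields:

import struct

def double_bits(x):
    # Reinterpret the 8 bytes of a double as a 64-bit unsigned integer
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    s = f"{bits:064b}"
    return f"sign={s[0]}  exponent={s[1:12]}  mantissa={s[12:]}"

print(double_bits(0.1))
# sign=0  exponent=01111111011  mantissa=1001100110011001100110011001100110011001100110011010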

Origin of the error:

Binary stores fractions by place value: each bit after the binary point stands for 1/2, 1/4, 1/8, and so on.

From this we can see why decimals such as 0.5 and 0.25 are stored exactly and conveniently, while a number like 0.1 has no exact binary form at all: we can only approach 0.1, and its binary expansion repeats forever: 0.00011001100110011001100110011...

But as the IEEE-754 introduction above showed, the number of bits we can store is finite: 23 mantissa bits in single precision, 52 in double precision. An infinitely repeating binary number like this is therefore cut off (rounded) when it is actually stored. That is exactly where the error comes from.
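We can see the truncation directly (again in Python; the stored double is a fraction whose denominator is 2 to the 55th power, so 55 decimal digits print it exactly):

# The double nearest to 0.1 is what actually gets stored
print(f"{0.1:.55f}")
# 0.1000000000000000055511151231257827021181583404541015625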

Although such an error is tiny, it can still cause real damage in some settings, such as banking. If a bank overpaid by a tiny amount on every transaction, the accumulated loss would be enormous.

So people have long explored how to avoid floating-point errors.

How to avoid floating-point errors:

Manual implementation:

1. Do not compare floating-point numbers directly
Comparing floating-point numbers with == can give wrong results because of floating-point error. Instead, compare within a small tolerance: if the difference is less than some small value eps, treat the two numbers as equal (see the sketch after this list).

2. Use integer operations as much as possible
In some cases you can scale floating-point values up by a power of ten, do the arithmetic in integers, and scale back down at the end (storing money as cents rather than dollars, for instance), which avoids floating-point error entirely.

3. Use high-precision arithmetic
If the precision requirements are very strict, use high-precision (arbitrary-precision) arithmetic: either a programming language with built-in support for it, or a third-party high-precision library.

4. Set an appropriate floating-point precision
Sometimes you can round results (to the nearest value, toward zero, and so on) to an appropriate precision to keep errors from propagating. For example, a round function can round to the nearest integer, or to a fixed number of digits after the decimal point.

5. Avoid accumulating error over many iterations
Repeated iterative calculation lets rounding error compound, so avoid it when you can: if a single closed-form calculation gives the exact result, prefer it over an iterative one.
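Here is a minimal sketch of the first two methods (in Python; the tolerance 1e-9 and the cents scaling are illustrative assumptions, not universal constants):

def nearly_equal(a, b, eps=1e-9):
    # Method 1: compare within a tolerance instead of using ==
    return abs(a - b) <= eps

print(nearly_equal(0.1 + 0.2, 0.3))  # True

# Method 2: scale to integers (money as cents), compute, scale back
cents = 10 + 20        # 0.10 + 0.20 dollars, held exactly as integers
print(cents == 30)     # True: integer arithmetic has no rounding error
print(cents / 100)     # 0.3, converting back only for display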

Finally, note that the right way to avoid floating-point error depends on the specific problem; different problems may call for different methods. In practice, weigh precision against efficiency and choose accordingly.

In addition, Python provides a dedicated data type for this: decimal, used to store decimal numbers at a chosen precision.

decimal is a Python standard-library module for high-precision decimal arithmetic. It lets developers do decimal calculations without the binary rounding loss described above.

The main features of the decimal module are as follows:

1. High precision
decimal works in base 10, so values like 0.1 are represented exactly, and its default context carries 28 significant digits (configurable). Very high precision can therefore be maintained even in calculations with many digits after the decimal point.

2. Reproducible
decimal arithmetic gives the results a human doing decimal arithmetic would expect, and those results are exactly reproducible: whatever the computer architecture or CPU, every machine produces the same answer.

3. Easy to use
Because it is a self-contained module, decimal is convenient to code with: it interoperates well with Python's built-in types and operators, and its context settings let developers control precision and rounding mode.

In fact, decimal has become an important tool in Python programming, especially for business and financial calculations such as currency conversion and tax computation. These scenarios demand high-precision, reproducible decimal arithmetic, and decimal is one of the best ways to get it.
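A minimal sketch of decimal in action (construct from strings, not floats, so the values start out exact):

from decimal import Decimal, getcontext

getcontext().prec = 28          # the default context: 28 significant digits

a = Decimal("0.1")              # an exact decimal, unlike the float literal 0.1
b = Decimal("0.2")

print(a + b)                    # 0.3
print(a + b == Decimal("0.3"))  # True: no binary rounding involved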

 

Going back to the comparison that failed at the beginning: since we cannot compare directly, what should we do?

The first method is to take the difference directly: when the difference between the two numbers is smaller than a tolerance, we consider the two floating-point numbers equal.

Automatic implementation:

The second method is to call the library functions of the major languages directly. Language designers noticed this problem long ago, and ready-made helpers exist for us to use.

Here is how several languages test floating-point numbers for approximate equality:

1. Python: math.isclose()
In Python, the math.isclose() function tests whether two floating-point numbers are close. It takes the two numbers a and b plus two keyword tolerances: rel_tol (relative tolerance, default 1e-9) and abs_tol (absolute tolerance, default 0.0). It returns True when abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol), and False otherwise.
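A few illustrative calls (note that the purely relative default makes comparisons against zero fail, which is exactly what abs_tol is for):

   import math

   print(math.isclose(0.1 + 0.2, 0.3))            # True with the default rel_tol=1e-9
   print(math.isclose(1e-12, 0.0))                # False: relative tolerance is no help near zero
   print(math.isclose(1e-12, 0.0, abs_tol=1e-9))  # True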

2. C++: std::abs()
C++ has no standard closeness test, but you can build one with std::abs(): check whether the difference between the two numbers falls within a tolerance scaled by their magnitude. For example:

   #include <cmath>  // std::abs overloads for floating-point types

   bool isEqual(double x, double y) {
       const double epsilon = 1e-9;
       // Relative tolerance: scale epsilon by |x| so larger values get more slack
       return std::abs(x - y) <= epsilon * std::abs(x);
   }

Note that a purely relative tolerance makes isEqual(0.0, y) true only when y is exactly zero; mix in an absolute tolerance if values near zero matter.

3. Java: Math.abs()
Java likewise has no built-in closeness test for doubles, but Math.abs() makes the same pattern easy: compare the difference against a tolerance scaled by the magnitude. For example:

   public static boolean isEqual(double x, double y) {
       final double epsilon = 1e-9;
       // Relative tolerance, as in the C++ version above
       return Math.abs(x - y) <= epsilon * Math.abs(x);
   }

4. MATLAB: eps()
In MATLAB, eps(x) returns the spacing between abs(x) and the next larger double, a natural magnitude-aware unit for tolerances. You can compare the absolute difference of two numbers against a tolerance built from it, or against a fixed epsilon as below. For example:

   function bool = isEqual(x, y)
       epsilon = 1e-9;                       % fixed relative tolerance
       bool = abs(x - y) <= epsilon * abs(x);
   end

In summary, tolerance-based comparisons like these effectively avoid the incorrect results that floating-point rounding error would otherwise cause.

 

 Summary:

Floating-point numbers carry floating-point error, but that is no reason to fear using them. As long as we keep in mind that every floating-point value may be slightly off, and compare accordingly, it is hard to go wrong.

If my content is helpful to you, please like, comment, and bookmark. Creation is not easy, and everyone's support is my motivation to keep going!

 
