Design flaws of IEEE floating point numbers

In biochemistry, "information" and "energy" are the two basic perspectives for studying matter. Because information and energy are abstractions, studying the real world from their perspective is very cheap. For example, computer science students need only a computer to run experiments (deep learning aside), unlike other science and engineering disciplines, which must spend a lot of money on equipment and materials for a single experiment; running fewer experiments greatly reduces learning efficiency. Compared with that, we computer people have an obvious advantage.

Two basic principles of encoding

A previous article on information theory, "The Limits of Serialization", already touched on the relevant principles in passing.

Today we will restate these two principles. Principle one is "no ambiguity" and principle two is "no redundancy".

Information theory requires a one-to-one correspondence between encoded values (serialized binary values) and actual meanings in order to compress information to the minimum. Two kinds of situation break this one-to-one correspondence:

  • Ambiguity: the same code carries multiple different meanings

  • Redundancy: multiple codes correspond to the same meaning

In one sentence: the code values (bit streams of 0s and 1s) and the actual meanings must correspond one to one, no more and no less, to achieve an optimal encoding.
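
To make the two principles concrete, here is a toy sketch of my own (hypothetical code tables, not from any standard) showing what ambiguity, redundancy, and an optimal bijective code look like:

```python
# Toy illustration of the two encoding principles (hypothetical 2-bit codes).

# Ambiguity: one code maps to multiple meanings -- a decoder cannot
# tell which meaning was intended, so the code is undecodable.
ambiguous = {
    "00": ["cat", "dog"],   # one code, two meanings
}

# Redundancy: multiple codes map to the same meaning -- still decodable,
# but part of the code space (and therefore bits) is wasted.
redundant = {
    "01": "cat",
    "10": "cat",            # a second code for the same meaning
    "11": "dog",
}

# An optimal code is a bijection: every code has exactly one meaning
# and every meaning has exactly one code.
optimal = {
    "0": "cat",
    "1": "dog",
}
```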

Embarrassing IEEE floating point numbers

Recently I helped my company develop a serialization format and spent a lot of time on the question of "how to store decimals". It seems Bill Gates and Jobs also struggled with this for a long time. Why is storing decimals so hard? Because the difference between decimals and integers is that an integer cares about the magnitude of its value, while a decimal cares about its "precision". The encoded size of an integer grows with the magnitude of the value, but the size of a decimal grows only with its precision; for both magnitude and precision the lower bound is 0 and the upper bound is infinite.

How do IEEE floating point numbers store decimals?

In the history of computing, storing natural numbers in binary was the most "natural" choice. Later, to store negative numbers, encodings such as sign-magnitude, two's complement, and zigzag appeared; then, to store floating-point numbers, the three mainstream IEEE formats appeared:

  • Half precision (half): 16 bit

  • Single precision (float): 32 bit

  • Double precision (double): 64 bit

  • ......

In fact there are ever larger formats, such as quadruple and octuple precision. Take the simplest one, half precision (binary16), as an example:

It has a sign bit (1 bit), exponent bits (5 bits), and fraction bits (10 bits), and the decoding formula is quite interesting:


(−1)^sign × 2^(exponent − 15) × 1.fraction₂


Let's study this formula.

Do you know why the significand starts with "1." rather than "0."? The implicit leading "1" in IEEE floating point is there to satisfy the "no redundancy" principle above: once a non-zero binary number is normalized, its significand always begins with a "1". Since that first "1" is guaranteed to be there, we can omit it when encoding and let the decoder add it back for us. This is a space-saving trick: IEEE half precision explicitly stores 10 significand bits, but those 10 bits are only the "face value"; the implicit "1" is prepended to them to obtain the actual significand.
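
To see the implicit "1." at work, here is a minimal Python sketch that decodes a normal binary16 pattern by hand using the formula above, and cross-checks it against Python's built-in half-precision unpacking (the helper name `decode_binary16` is mine):

```python
import struct

def decode_binary16(bits: int) -> float:
    """Decode a 16-bit pattern with the normal-number formula
    (-1)^sign * 2^(exponent-15) * 1.fraction_2.
    Subnormals, infinities, and NaN are ignored in this sketch."""
    sign     = (bits >> 15) & 0x1
    exponent = (bits >> 10) & 0x1F      # 5 exponent bits
    fraction = bits & 0x3FF             # 10 explicitly stored fraction bits
    assert 0 < exponent < 0x1F, "normal numbers only"
    significand = 1 + fraction / 1024   # the implicit leading "1." is added back
    return (-1) ** sign * 2 ** (exponent - 15) * significand

# 0 01111 1000000000 = +1 * 2^(15-15) * 1.5 = 1.5
bits = 0b0_01111_1000000000
print(decode_binary16(bits))                               # 1.5
# Cross-check with the platform's half-precision decoder ('e' struct format).
print(struct.unpack('<e', bits.to_bytes(2, 'little'))[0])  # 1.5
```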

Decimals with higher precision are more common and have higher status

There is another wise decision in IEEE floating point. In the formula, why is the "1." placed before the fraction instead of after it? If it were placed after, as "fraction.1", the exponent would not need the −15 bias. But the designers obviously did not do that. The habit of scientific notation alone does not explain why IEEE floating point abandoned that simpler encoding; the real reason lies in how humans have used decimals for thousands of years: decimals with higher relative precision are always more common.

For example, intuitively, which of the two decimals 12345.6 and 1.23456 is the "better" one? Off the top of my head, the latter, because the integer part of 12345.6 is far larger than its fractional part, which makes us question whether storing that 0.6 is worth anything. Wouldn't storing the integer 12346 be good enough? In 1.23456, on the other hand, the precision part carries far more weight than the integer part. Statistically speaking, storing 1.23456 is meaningful, while storing 12345.6 is "not significant".

This conclusion can be regarded as an axiom, so within the real numbers we have four "more common than" relations:

  • Integers > non-integers

  • Numbers with small absolute value > numbers with large absolute value

  • Positive numbers > negative numbers

  • High-precision decimals > low-precision decimals

Note: The greater than sign compares "commonness".

But in the end I gave up on using IEEE floating point to encode my decimals, because it has a fatal flaw: it is impure. For whatever purpose, IEEE floating point reserves NaN and ±infinity, constants that are almost meaningless in modern programming languages. NaN is not in one-to-one correspondence with its encoding: a whole pile of different binary16 bit patterns all represent NaN, which breaks the "no redundancy" principle of information theory. ±0 breaks the principle as well.

| Exponent | Significand = zero | Significand ≠ zero | Equation |
| --- | --- | --- | --- |
| 00000₂ | zero, −0 | subnormal numbers | (−1)^signbit × 2^−14 × 0.significantbits₂ |
| 00001₂, ..., 11110₂ | normalized value | normalized value | (−1)^signbit × 2^(exponent − 15) × 1.significantbits₂ |
| 11111₂ | ±infinity | NaN (quiet, signalling) | |

This table comes from Wikipedia; it lays bare the special cases that half-precision floating point is forced to set aside: NaN, ±infinity, and −0.
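
A quick way to see the redundancy is to feed several distinct bit patterns to a half-precision decoder. A minimal sketch (using Python's 'e' struct format; the helper name `half` is mine):

```python
import math
import struct

def half(bits: int) -> float:
    """Interpret a 16-bit pattern as an IEEE binary16 value."""
    return struct.unpack('<e', bits.to_bytes(2, 'little'))[0]

# Exponent = 11111 with a non-zero fraction: every such pattern is NaN.
# There are 2 * (2**10 - 1) = 2046 of them in binary16 alone.
nan_patterns = [0b0_11111_0000000001, 0b0_11111_1000000000, 0b1_11111_0000000001]
print([math.isnan(half(b)) for b in nan_patterns])   # [True, True, True]

# +0 and -0 are two different bit patterns for one value.
pos_zero = half(0b0_00000_0000000000)
neg_zero = half(0b1_00000_0000000000)
print(pos_zero == neg_zero)                          # True  (same value...)
print(math.copysign(1.0, neg_zero))                  # -1.0  (...different encoding)
```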

Reluctantly, I had to design a decimal encoding of my own. To be clear, floating point is only one possible encoding of decimals, and IEEE floating point is a variant of plain floating point, since it is also made compatible with integers. I hope to design a redundancy-free, "perfect" encoding for pure decimals. Three approaches come to mind at the moment:

  • Floating-point encoding: store [exponent, significand]

  • Fraction encoding: store [numerator, denominator]

  • Decimal point separation: store [integer part, fractional part]

The advantage of fraction encoding is that it can encode any rational number exactly; for example, decimal 0.1 can be stored as [1, 10]. However, fraction encoding also introduces redundancy: multiply the numerator and denominator by the same number and the value of the fraction stays unchanged, so fraction encoding was ruled out.
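
A small sketch of that redundancy, using Python's standard `fractions` module (my own illustration): several different [numerator, denominator] pairs all encode the same value 0.1.

```python
from fractions import Fraction

# Fraction encoding stores [numerator, denominator]. It represents 0.1 exactly,
# but many different pairs encode the very same value -- redundancy.
pairs = [(1, 10), (2, 20), (100, 1000)]
print([Fraction(n, d) for n, d in pairs])       # all normalize to Fraction(1, 10)
print(len({Fraction(n, d) for n, d in pairs}))  # 1 distinct value, 3 encodings
```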

After a long period of thinking, I finally came up with a decimal encoding based on decimal point separation, tentatively named the "precision inversion algorithm". The name sounds like a gimmick; the next chapter will analyze the theory behind it in detail.

Since last time I complained about UTF-8, I have now scolded IEEE floating point as well, which feels really good.

