IEEE 754 standard

1. What is IEEE


        IEEE, short for the Institute of Electrical and Electronics Engineers, is an international association of engineers in electronic technology and information science, and the world's largest non-profit professional technical society.


        IEEE is committed to research and development in electrical engineering, electronics, computer engineering, and related sciences. It has developed more than 1,300 industry standards in fields including aerospace, computers, telecommunications, biomedicine, power, and consumer electronics, and has become an international academic organization with great influence.


        IEEE publishes more than 70 journals and magazines, holds more than 300 professional conferences every year, and publishes nearly one-third of the world's literature in the fields of electrical and electronic engineering, computer and control technology. The standards defined by IEEE have a great influence on the industry, and the standards it develops are also widely adopted by the international community.

2. IEEE 754 standard

        IEEE 754 is a floating-point arithmetic standard formulated by the Institute of Electrical and Electronics Engineers (IEEE). Its full name is "IEEE Standard for Floating-Point Arithmetic". It was first released in 1985 and revised in 2008 and 2019; the related radix-independent standard IEEE 854 followed in 1987 and was merged into the 2008 revision.

        The IEEE 754 standard defines the representation of binary floating-point numbers, including positive numbers, negative numbers, zero, and special values (such as infinity and NaN, "Not a Number"). It mainly specifies:

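--> 1. Floating-point formats:

    These include single precision (32 bits), double precision (64 bits), and other extended-precision formats.
    
    Each floating-point number consists of three parts: the sign bit, the exponent field, and the mantissa (also called the fraction field).

--> 2. Range and precision:

    Depending on the format, a floating-point number can represent very large or very small magnitudes with a fixed number of significant digits.

--> 3. Representation of special values:

    For example, ±0, ±infinity, and NaN each have a specific encoding.

--> 4. Arithmetic rules:

    These specify the behavior of basic operations such as addition, subtraction, multiplication, division, rounding, and comparison.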

        Due to its versatility and efficiency, almost all computer systems and programming languages today adopt or are compatible with the IEEE 754 standard to handle floating-point operations.
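The special values and rounding rules listed above are easy to observe from a language that follows the standard. Below is a small sketch in Python, assuming (as is the case on essentially all current platforms) that Python's float is an IEEE 754 double-precision number. The variable names are only illustrative.

```python
import math

inf = float("inf")
nan = float("nan")

overflowed = 1e308 * 10          # too large for a double: becomes +infinity
nan_equals_itself = nan == nan   # NaN is unequal to everything, itself included
zeros_equal = 0.0 == -0.0        # +0 and -0 compare equal...
minus_zero_sign = math.copysign(1.0, -0.0)  # ...but -0 keeps its sign bit
sum_is_exact = 0.1 + 0.2 == 0.3  # each result is rounded, so this is not exact

print(overflowed, nan_equals_itself, zeros_equal, minus_zero_sign, sum_is_exact)
# inf False True -1.0 False
```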

3. Representation and calculation method of IEEE 754 standard in computers

        The IEEE 754 standard defines how floating-point numbers are represented and calculated in computers. The following takes a single-precision (32-bit) floating point number as an example to illustrate how to convert a decimal number into a binary form that complies with the IEEE 754 standard:

1. Single precision (32-bit) format

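--> 1. Sign bit

    The most significant bit (1 bit): 0 means positive, 1 means negative.

--> 2. Exponent

    The next 8 bits store the exponent in biased form: the field holds the actual exponent plus a fixed bias (127 for single precision), so the actual exponent is the stored value minus the bias.

--> 3. Mantissa (fraction)

    The last 23 bits store the fractional part. For normalized numbers the leading bit is always 1, so it is left out as a hidden bit, which gives an effective mantissa of 24 bits.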
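To make the three fields concrete, here is a minimal sketch that splits a single-precision value into its sign, exponent, and fraction fields using Python's standard struct module. The helper name decode_float32 is made up for this illustration, and the reconstruction formula only applies to normalized numbers.

```python
import struct

def decode_float32(x):
    """Return (sign, exponent_field, fraction_field) of x stored as a float32."""
    # Pack as a big-endian 32-bit float, then reread the same 4 bytes as an unsigned int.
    raw = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = raw >> 31                     # 1 bit
    exponent_field = (raw >> 23) & 0xFF  # 8 bits, biased by 127
    fraction_field = raw & 0x7FFFFF      # 23 bits, hidden leading 1 not stored
    return sign, exponent_field, fraction_field

sign, e, f = decode_float32(-6.5)        # -6.5 = -1.101 (binary) * 2^2
print(sign, e, f"{f:023b}")              # 1 129 10100000000000000000000

# Rebuild the value from the fields (normalized numbers only).
value = (-1) ** sign * (1 + f / 2**23) * 2.0 ** (e - 127)
print(value)                             # -6.5
```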

2. Conversion steps

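-->1. Handle the sign bit:

 • If the number is positive, set the sign bit to 0;

 • If the number is negative, set the sign bit to 1.

-->2. Convert to binary and normalize:

 • Write the decimal number in binary scientific notation, (-1)^s * m * 2^e, where s is the sign, m is the normalized mantissa (1 <= m < 2), and e is the exponent.

 • Normalizing means making sure the first digit of the mantissa is always 1 (this 1 is implicit and takes up no bits in storage).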

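-->3. Encode the exponent:

 • The stored exponent equals the actual exponent plus a bias constant: 127 for single precision (1023 for double precision).

 • Compute E = e + 127 (for single precision) and store the result in the 8-bit exponent field as an unsigned binary integer (not in two's complement).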

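-->4. Encode the mantissa:

 • Take the normalized mantissa with its leading 1 removed, and fill its binary digits into the mantissa field, rounding to 23 bits.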

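-->5. Handle special cases:

 • If the number is zero, the exponent field and the mantissa are both all 0s; the sign bit distinguishes +0 from -0.

 • If the number is infinity, the exponent field is all 1s and the mantissa is all 0s, with the sign bit distinguishing positive from negative infinity. If it is NaN, the exponent field is all 1s and the mantissa is nonzero; different mantissa patterns represent different kinds of NaN.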

-->6. Combine all the parts:

 • Concatenate the sign bit, the exponent field, and the mantissa field, in that order, to form the 32-bit binary number.
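The six steps can be sketched directly in code. The function below is only an illustration under stated assumptions, not a full implementation: the name encode_float32 is made up, it uses exact rational arithmetic from the standard fractions module, it rounds to nearest with ties to even, and it deliberately rejects values outside the normal range (subnormals and overflow are not handled).

```python
import math
from fractions import Fraction

BIAS = 127  # exponent bias for single precision

def encode_float32(value):
    """Encode a Python float as a 32-bit integer pattern, step by step."""
    # Step 1: sign bit (copysign also sees the sign of -0.0).
    sign = 1 if math.copysign(1.0, value) < 0 else 0
    # Step 5: special cases.
    if math.isnan(value):
        return (sign << 31) | (0xFF << 23) | 0x400000  # one possible NaN pattern
    if math.isinf(value):
        return (sign << 31) | (0xFF << 23)             # exponent all 1s, mantissa 0
    if value == 0:
        return sign << 31                              # exponent and mantissa all 0s
    # Step 2: normalize to m * 2^e with 1 <= m < 2, using exact fractions.
    m, e = Fraction(abs(value)), 0
    while m >= 2:
        m /= 2
        e += 1
    while m < 1:
        m *= 2
        e -= 1
    # Step 4: drop the hidden 1 and round the rest to 23 bits (ties to even).
    fraction = round((m - 1) * 2**23)
    if fraction == 2**23:       # rounding carried into the hidden bit
        fraction, e = 0, e + 1
    if not -126 <= e <= 127:
        raise ValueError("outside the normal single-precision range")
    # Steps 3 and 6: biased exponent as an unsigned field, then concatenate.
    return (sign << 31) | ((e + BIAS) << 23) | fraction

print(f"{encode_float32(0.15625):08x}")  # 3e200000
```

Comparing the result with struct.pack(">f", value) for a few inputs is a quick way to check the sketch against the platform's own conversion.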

3. Example

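Convert the decimal number 33.758 to an IEEE 754 single-precision (32-bit) floating-point number

-->1. Sign bit (Sign):

     • 33.758 is positive, so the sign bit S is 0.

-->2. Convert to binary:

     • First convert the integer part 33 to binary: 33 = 100001
     • Then convert the fractional part 0.758 to binary. This is done by repeatedly multiplying by 2 and recording the integer part, until the fraction becomes 0 or enough bits have been produced to fill the mantissa. For single precision the mantissa holds 23 bits (not counting the implicit leading 1).

    This gives the approximate binary fraction:
    0.758 ≈ 0.110000100000110001001001...

    The expansion does not terminate, so the mantissa has to be rounded to 23 bits so that it stays as close as possible to the original value.

-->3. Normalize:

    Move the binary point so that exactly one digit, the (implicit) leading 1, is to its left, and adjust the exponent accordingly. Here 33.758 ≈ 100001.110000100000110001..., so the point moves five places to the left:
    1.00001110000100000110001... * 2^5


-->4. Encode the exponent (E):

    The exponent after normalization is 5; adding the bias 127 gives the stored exponent E:
    E = 5 + 127 = 132

    Write 132 as an 8-bit unsigned binary integer (the biased exponent is unsigned, not two's complement):
    132 = 10000100 (binary)

-->5. Encode the mantissa (M):

    Drop the leading 1 of the normalized mantissa and fill the remaining digits into the 23-bit mantissa field:
    Mantissa = 1.00001110000100000110001 001... -> 00001110000100000110001 (leading 1 dropped, 23 bits kept)

    The first discarded bit is 0, so the value rounds down and the 23 kept bits are unchanged.

-->6. Combine into the final single-precision number:
    Put the sign bit, exponent, and mantissa together:
    Single-precision number: S Exponent Mantissa
    0 10000100 00001110000100000110001


    Concatenated, these form the 32-bit pattern 01000010000001110000100000110001 (0x42070831).

    Note: because 0.758 has no finite binary expansion, the stored value is only the nearest representable number, approximately 33.7579994. In general the mantissa must be rounded to 23 bits (round to nearest, ties to even, by default), and a rounding carry can change the exponent.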
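The hand conversion of 33.758 can be cross-checked in a few lines. The sketch below first expands 0.758 by repeated doubling (the helper name fraction_bits is made up for this illustration), then asks Python's standard struct module for the actual single-precision bit pattern. It assumes the platform's float-to-single conversion rounds to nearest, which is the usual default.

```python
import struct
from fractions import Fraction

def fraction_bits(frac, count):
    """First `count` binary digits of 0 <= frac < 1, by repeated doubling."""
    bits = []
    for _ in range(count):
        frac *= 2
        bit = int(frac)      # the integer part is the next binary digit
        bits.append(str(bit))
        frac -= bit
    return "".join(bits)

# Exact decimal 0.758 as a fraction, so no binary rounding sneaks in early.
frac_digits = fraction_bits(Fraction(758, 1000), 18)
print(frac_digits)           # 110000100000110001

# Let the platform encode 33.758 as a float32 and split the 32 bits.
raw = struct.unpack(">I", struct.pack(">f", 33.758))[0]
bits = f"{raw:032b}"
sign, exponent, mantissa = bits[0], bits[1:9], bits[9:]
print(sign, exponent, mantissa)
# 0 10000100 00001110000100000110001
```

The mantissa is the five low bits of 33 (00001) followed by the first 18 bits of the fractional expansion, which is why the exponent is 5 rather than 2.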

 

4. How the IEEE 754 standard came into being

        Before the emergence of the IEEE 754 standard, there was no unified floating-point number standard in the industry. Many computer manufacturers designed their own floating-point number rules and operation details. At the time, there was more emphasis on speed and simplicity of implementation than on numerical accuracy.

         Around 1976, Intel began planning a floating-point coprocessor (the 8087) for its 8086 microprocessor. It wisely realized that the electronic engineers and solid-state physicists who designed its chips might not be able to choose, through numerical analysis, the most reasonable binary format for floating-point numbers.

        So Intel invited Professor William Kahan of the University of California, Berkeley, one of the foremost numerical analysts, to design a floating-point format for the 8087 FPU. Kahan enlisted two experts to assist him, and so the KCS proposal was born (Kahan, Coonen, and Stone).

        They jointly completed the design of Intel's floating-point format, and they did it so well that the IEEE organization decided to adopt a solution very close to KCS as the IEEE's standard floating-point format.        

        Currently, almost all computers support this standard, which greatly improves the portability of scientific applications.

Origin blog.csdn.net/W_Fe5/article/details/135362059