Introduction to the ARM floating point unit (FPU): FPA, VFP, NEON

Article Directory
1.1 Introduction to ARM FPU
1.1.1 Single-precision floating-point numbers
1.1.2 Double-precision floating-point numbers
1.1.3 Special cases of exponent and mantissa
1.1.4 IEEE 754 standard
1.2 Compiler impact on floating-point numbers
1.2.1 VFP and FPA Relationship
1.2.1.1 VFP functional features
1.2.2 GCC and floating point operations
1.2.3 VFP context save and restore
1.2.4 Hard floating point and soft floating point
1.3 ARM NEON
1.1 Introduction to ARM FPU
ARM's FPU (Floating Point Unit) is an important part of an ARM processor, mainly responsible for performing floating-point operations. It supports the IEEE 754 floating-point format and can perform the basic operations on floating-point numbers (addition, subtraction, multiplication, division, etc.) as well as more complex ones such as square root and absolute value.

In early ARM processors, the floating point unit was an optional component. But in modern ARM processors, such as the Cortex series processors, the floating point unit is usually built-in, which is of great help in performing floating point operations.

In addition, ARM's floating-point unit also supports vector operations and can process multiple floating-point numbers at the same time, thus greatly improving computing efficiency. This is useful for performing tasks such as complex scientific calculations and graphics processing.

FPv5 is the fifth generation of ARM floating point hardware. It supports all standard single-precision floating-point operations, including addition, subtraction, multiplication, division, square roots, and all comparison operations. It also supports floating point to integer conversion, as well as floating point rounding operations.

1.1.1 Single-precision floating-point numbers
The ARM32 architecture supports the IEEE 754 standard floating-point number format. In this standard, single-precision (32-bit) and double-precision (64-bit) floating-point numbers are represented as follows:
Sign bit (Sign): 1 bit (bit[31]); 0 represents a positive number, 1 a negative number.
Exponent bits (Exponent): 8 bits (bits[30:23]); represent the exponent part of the floating-point number, biased by 127.
Mantissa bits (Fraction/Mantissa): 23 bits (bits[22:0]); represent the mantissa part of the floating-point number, which directly determines its precision.


Single precision (32-bit) floating point number:

The smallest positive number (non-zero, denormal) is about 1.4E-45.
The largest positive number is about 3.4E38.
The number of significant decimal digits is about 7.

Decimal precision mainly depends on the width of the mantissa. A float mantissa has 23 bits, so (excluding the all-zero mantissa case) the smallest relative step is 2^-23, which is approximately 1.19 × 10^-7; the fractional part of a float is therefore accurate to about 6 digits, or 7 significant digits counting the digit before the decimal point.

1.1.2 Double-precision floating-point numbers
Sign bit: 1 bit, bit[63]; 0 represents a positive number, 1 a negative number.
Exponent bits: 11 bits, bits[62:52]; represent the exponent part of the floating-point number, biased by 1023.
Mantissa bits: 52 bits, bits[51:0]; represent the mantissa part of the floating-point number, which directly determines its precision.


Double precision (64-bit) floating point number:

The smallest positive number (non-zero, denormal) is about 5.0E-324.
The largest positive number is about 1.8E308.
The number of significant decimal digits is about 15~16.

As with single precision, the double mantissa has 52 bits, so the smallest relative step is 2^-52, which is approximately 2.22 × 10^-16; a double is therefore accurate to about 15 digits after the decimal point, or roughly 16 significant digits.

1.1.3 Special cases of exponent and mantissa
In the ARM single-precision floating-point number format, if the exponent part is all 0 or all 1, then this will represent some special cases.

The exponent part is all 0:

When the mantissa is also all 0, this means plus or minus zero.
When the mantissa part is not 0, this represents a denormalized (subnormal) floating-point number: a number too small to be represented in normalized form.
The exponent part is all 1:

When the mantissa is all 0, this means plus or minus infinity.
When the mantissa part is not 0, this represents a not-a-number (NaN).
These special cases exist to handle situations where the result of an operation cannot be expressed as a regular floating-point number. For example, dividing a nonzero number by zero gives infinity, while 0 divided by 0, or infinity minus infinity, gives a NaN.

1.1.4 IEEE 754 standard
IEEE 754 is the industry standard for binary floating-point arithmetic. It defines the single- and double-precision formats described above, together with rounding modes, the special values (zeros, subnormals, infinities, NaN), and exception behavior; it is the format that ARM FPUs implement in hardware.

1.2 The impact of the compiler on floating point numbers
1.2.1 The relationship between VFP and FPA
ARM VFP (Vector Floating Point) and FPA (Floating Point Accelerator) are both hardware units in the ARM architecture for processing floating-point operations; both accelerate calculations such as addition, subtraction, multiplication, and division.

FPU is the more general term: an FPU is any hardware that performs floating-point operations, including the FPA and more modern units such as VFP (Vector Floating Point) and NEON.

The FPA is an older floating-point unit that can handle single- and double-precision floating-point numbers. It has eight floating-point registers (F0–F7).

VFP was introduced later and can handle single-precision, double-precision, and half-precision floating-point numbers, providing more functionality and higher efficiency. VFPv3-D32 has 32 64-bit registers (D0–D31); the lower half (D0–D15) can also be viewed as 32 single-precision registers (S0–S31).

Simply put, VFP is an upgraded version of FPA, providing more advanced floating point calculation support. Although the functions of the two overlap, they differ in some instructions and register organization due to historical reasons. In real applications, the choice of which one to use needs to be based on the specific processor and requirements.

1.2.1.1 Functional features of VFP
In addition to supporting the basic floating-point operations (addition, subtraction, multiplication, division, square root, comparison, and negation), the most distinctive feature of VFP is its vector mode: it can operate on up to 8 single-precision or 4 double-precision floating-point numbers at the same time. VFP is implemented through coprocessors CP10 and CP11, where CP10 handles single-precision operations and CP11 handles double-precision operations. All VFP instructions are therefore really coprocessor instructions: FADDS, for example, is actually a CDP instruction, and FLDS is an LDC instruction.

1.2.2 GCC and floating point operations
When using GCC to compile ARM programs, you can use the -mfloat-abi and -mfpu options to set the floating point ABI and FPU type.

The -mfloat-abi option has three possible values: soft, softfp, and hard.

soft: all floating-point operations are handled by a software library; no hardware FPU is used.
softfp: floating-point operations use the hardware FPU, but floating-point function arguments and return values are passed in integer registers.
hard: floating-point operations use the hardware FPU, and floating-point function arguments and return values are passed in FPU registers.
The -mfpu option is used to select a specific FPU type. For VFP, there are several possible values, such as vfp, vfpv3, vfpv4, etc., depending on which version of VFP your processor supports.

For example, if your processor supports VFPv3, you can compile your program using the following GCC options:

gcc -mfloat-abi=hard -mfpu=vfpv3
In this way, GCC will generate floating-point arithmetic code using VFPv3 hardware acceleration, and use FPU registers to pass floating-point function parameters and return values.

1.2.3 VFP context save and restore
In an operating system, the state of the VFP (Vector Floating Point unit) is part of a process's context. When the operating system performs a context switch, it must save and restore the VFP state: it saves the VFP register values of the outgoing process or thread and restores them when switching back.

This is very important for systems that support hardware floating point operations, as incorrectly saving and restoring the state of the VFP may result in incorrect results from floating point operations. Therefore, the operating system needs to ensure that the VFP scene is correctly handled during process switching to ensure the correct operation of the system.

1.2.4 Hard floating point and soft floating point
Hard floating point
The compiler compiles the code directly into instructions that the hardware floating-point coprocessor (the FPU) can execute. When these instructions are encountered, the ARM core dispatches them to the coprocessor for execution. FPUs usually have an additional set of registers for passing floating-point arguments and performing operations. Using a real hardware floating-point unit brings a significant performance improvement.

Soft floating point
The compiler converts floating-point operations into library function calls (that is, it simulates floating-point arithmetic with integer operations). No FPU instructions are emitted and no floating-point registers are used for argument passing; floating-point arguments are passed in ARM integer registers or on the stack. Current Linux systems use hard-float as the default compile option, but on a system without any floating-point unit, hard-float code raises illegal-instruction exceptions. Therefore, generic system images use soft floating point to stay compatible with processors that lack a VFP.

1.3 ARM NEON
ARM NEON is an advanced SIMD (Single Instruction, Multiple Data) extension in ARM processors, used to accelerate multimedia and signal-processing workloads. NEON supports 8-, 16-, 32-, and 64-bit integers and single-precision (32-bit) floating point, and can process up to 16 elements in parallel (sixteen 8-bit lanes in one 128-bit register).

In AArch32, the NEON register file consists of 32 64-bit D registers, which pair up into 16 128-bit Q registers (AArch64 extends this to 32 128-bit V registers); the registers are shared with VFP and can be split into smaller lanes as needed. One instruction processes all lanes of a register at once, greatly improving the speed and efficiency of multimedia processing.

ARM NEON is integrated into many ARM Cortex-A series processors, including Cortex-A8, A9, A15, etc. In addition, NEON is also widely used in multimedia applications such as video encoding and decoding, graphics processing, voice and audio processing.

You can use NEON to accelerate your own algorithms. In fact, the encryption algorithms in the kernel are mostly NEON accelerated, e.g. sha1, sha2...sha512, aes, chacha20, etc.

Copyright statement: this is an original article by CSDN blogger "CodingCos", licensed under the CC 4.0 BY-SA agreement. Please attach the original source link and this statement when reprinting.
Original link: https://blog.csdn.net/sinat_32960911/article/details/127773859
