Common means and methods of data compression

0. Introduction

We mentioned in "Classic Literature Reading - R-PCC (Point Cloud Compression Method Based on Distance Image)" that data can be compressed at the algorithm level. A simpler and more direct method, however, is to store the data in half-precision form.

1. half and float

Half is a data type that uses 16 bits to represent a floating-point number. It is specified in IEEE 754 (as binary16) and is widely used in deep learning systems. However, many mainstream CPUs do not natively support arithmetic on, or output of, half values, so the data has to be converted between half and float.

Figure 1 shows the standard 16-bit floating-point layout: 1 sign bit, 5 exponent bits and 10 mantissa bits. For normal values, the represented value is:
value = (-1)^sign × 2^(exponent - 15) × (1 + mantissa / 2^10)
Figure 2 shows the standard 32-bit floating-point layout: 1 sign bit, 8 exponent bits and 23 mantissa bits. For normal values, the represented value is:
value = (-1)^sign × 2^(exponent - 127) × (1 + mantissa / 2^23)
So when converting between half and float, besides shifting each field into place, you also have to account for the different exponent biases (15 and 127). For normal values, converting half to float takes the following main steps (zero and denormalized values need separate handling).

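  • Shift the sign bit left by 16 bits (from bit 15 to bit 31).
  • Add 112 (the difference between 127 and 15) to the exponent and move it from bits 10-14 to bits 23-30, a left shift of 13 bits relative to its original position.
  • Shift the mantissa left by 13 bits, so that it is left-justified in the 23-bit mantissa field of the float.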

2. half and float reference code

The following is the corresponding reference code:


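#include <string.h> // memcpy

typedef unsigned short ushort; // occupies 2 bytes
typedef unsigned int uint;     // occupies 4 bytes

// Reinterpret the bits of a float as an unsigned integer.
// memcpy avoids the undefined behavior of the pointer cast *(uint*)&x.
uint as_uint(const float x) {
    uint u;
    memcpy(&u, &x, sizeof u);
    return u;
}
// Reinterpret the bits of an unsigned integer as a float.
float as_float(const uint x) {
    float f;
    memcpy(&f, &x, sizeof f);
    return f;
}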
 
float half_to_float(const ushort x) {
    // IEEE-754 16-bit floating-point format (without infinity): 1-5-10, exp-15, +-131008.0, +-6.1035156E-5, +-5.9604645E-8, 3.311 digits
    const uint e = (x&0x7C00)>>10; // exponent
    const uint m = (x&0x03FF)<<13; // mantissa
    const uint v = as_uint((float)m)>>23; // evil log2 bit hack to count leading zeros in denormalized format
    const uint s = m ? 150-v : 0; // denormal shift; guarded because shifting by 32 bits or more is undefined when m == 0
    return as_float((uint)(x&0x8000)<<16 | (e!=0)*((e+112)<<23|m) | ((e==0)&(m!=0))*((v-37)<<23|((m<<s)&0x007FE000))); // sign : normalized : denormalized
}
ushort float_to_half(const float x) {
    // IEEE-754 16-bit floating-point format (without infinity): 1-5-10, exp-15, +-131008.0, +-6.1035156E-5, +-5.9604645E-8, 3.311 digits
    const uint b = as_uint(x)+0x00001000; // round to nearest: add half of the last kept mantissa bit before truncating (ties round up in magnitude, not to even)
    const uint e = (b&0x7F800000)>>23; // exponent
    const uint m = b&0x007FFFFF; // mantissa; in line below: 0x007FF000 = 0x00800000-0x00001000 = decimal indicator flag - initial rounding
    const uint s = ((e<113)&(e>101)) ? 125-e : 0; // denormal shift; guarded because shifting by 32 bits or more is undefined
    return (ushort)((b&0x80000000)>>16 | (e>112)*((((e-112)<<10)&0x7C00)|m>>13) | ((e<113)&(e>101))*((((0x007FF000+m)>>s)+1)>>1) | (e>143)*0x7FFF); // sign : normalized : denormalized : saturate
}


// In the demo below, yolov5_outputs[0].buf has type void *. A void * cannot be incremented or indexed, so it is first cast to ushort *.

    float *data0 = (float*)malloc(sizeof(float) * output_attrs[0].n_elems);
    float *data1 = (float*)malloc(sizeof(float) * output_attrs[1].n_elems);
    float *data2 = (float*)malloc(sizeof(float) * output_attrs[2].n_elems);
    ushort *temp0 = (ushort*)yolov5_outputs[0].buf;
    ushort *temp1 = (ushort*)yolov5_outputs[1].buf;
    ushort *temp2 = (ushort*)yolov5_outputs[2].buf;

    // Convert each half-precision output buffer to float, element by element.
    for(int i=0; i < output_attrs[0].n_elems;i++)
    {
        data0[i] = half_to_float(temp0[i]);
    }
    for(int i=0; i < output_attrs[1].n_elems;i++)
    {
        data1[i] = half_to_float(temp1[i]);
    }
    for(int i=0; i < output_attrs[2].n_elems;i++)
    {
        data2[i] = half_to_float(temp2[i]);
    }

3. float and uint16

This conversion is often used in serial port communication, because a serial port transfers data only as characters (char).
atof(): converts a string to a double-precision floating-point value.
atoi(): converts a string to an integer value.
Floating point to uint16 function

…For details, please refer to Gu Yueju

Origin blog.csdn.net/lovely_yoshino/article/details/128916637