Example explanation of Huffman decoding in JPEG

DHT Huffman table format

--------------------------------------------------------------------------------
Name                     Bytes  Value / Description
--------------------------------------------------------------------------------
Segment marker           1      FF
Segment type             1      C4
Segment length           2      19 + n (when the segment holds only one HT table)
  (the segment content follows)
HT information           1      bits 0-3: HT number
                                bit 4: HT type, 0 = DC table, 1 = AC table
                                bits 5-7: must be 0
HT bit table             16     number of codes of each length 1-16; their sum must be ≤ 256
HT value table           n      n = sum of the 16 numbers in the HT bit table
--------------------------------------------------------------------------------
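The format above can be read with a short loop. The following C sketch parses every table in one DHT segment. The struct HuffTable and the function parse_dht are hypothetical names chosen for illustration, and the sketch assumes the whole segment, starting at the FF C4 marker, is already in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One Huffman table from a DHT segment (hypothetical struct). */
typedef struct {
    int     type;         /* HT type: 0 = DC table, 1 = AC table (bit 4)       */
    int     number;       /* HT number (bits 0-3)                               */
    uint8_t counts[17];   /* counts[k] = number of codes of length k; [0] unused */
    uint8_t values[256];  /* HT value table, n entries                          */
    int     n;            /* n = counts[1] + ... + counts[16]                   */
} HuffTable;

/* Parse all tables of a DHT segment; `seg` starts at the FF C4 marker.
 * Returns the number of tables read, or -1 if the segment is malformed. */
static int parse_dht(const uint8_t *seg, size_t seg_size, HuffTable *out, int max_tables)
{
    assert(seg != NULL && out != NULL);
    if (seg_size < 4 || seg[0] != 0xFF || seg[1] != 0xC4)
        return -1;
    size_t length = ((size_t)seg[2] << 8) | seg[3];  /* includes its own 2 bytes */
    if (length < 2 || 2 + length > seg_size)
        return -1;
    size_t pos = 4, end = 2 + length;
    int count = 0;
    while (pos < end) {
        if (count == max_tables || end - pos < 17)
            return -1;
        HuffTable *t = &out[count];
        uint8_t info = seg[pos++];
        if (info & 0xE0)                  /* bits 5-7 must be 0 */
            return -1;
        t->type = (info >> 4) & 1;
        t->number = info & 0x0F;
        t->counts[0] = 0;
        t->n = 0;
        for (int k = 1; k <= 16; k++) {   /* HT bit table: 16 bytes */
            t->counts[k] = seg[pos++];
            t->n += t->counts[k];
        }
        if (t->n > 256 || end - pos < (size_t)t->n)
            return -1;
        for (int i = 0; i < t->n; i++)    /* HT value table: n bytes */
            t->values[i] = seg[pos++];
        count++;
    }
    return count;
}
```

For a segment holding only the Y-DC table (length 0x001F = 19 + 12), parse_dht returns 1 with n = 12; a segment with all four standard tables, like the 0x01A2-byte one below, should give 4.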

Read Huffman table

FF C4 01 A2 00 00 01 05 01 01 01 01 01 01 00 00 00 00 00 00 00 00 01 02 03 04 05 06 07 08 09 0A 0B

FF C4: DHT segment marker (Define Huffman Table)

01 A2: segment length, 0x01A2 = 418 bytes. The length counts its own two bytes but not the FF C4 marker. This segment holds all four standard tables, so the bytes shown above cover only the first one, which ends at 0B.

00: bit 4 = 0, so this is a DC table; the low 4 bits = 0, so the HT number is 0. In other words, this is DC table number 0.

00 01 05 01 01 01 01 01 01 00 00 00 00 00 00 00: the number of codewords of each length, from 1 bit to 16 bits. Their sum is the number of leaf nodes: 1+5+1+1+1+1+1+1 = 12.

00 01 02 03 04 05 06 07 08 09 0A 0B: the encoded values, one per leaf node, listed in order of increasing codeword length.
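Assuming the segment holds exactly the four standard tables listed later in this article (with 12, 162, 12 and 162 values), the length field works out as follows, each table contributing 1 information byte, 16 count bytes and its values:

$$
\mathtt{0x01A2} = 418 = 2 + (1+16+12) + (1+16+162) + (1+16+12) + (1+16+162)
$$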

Construct Huffman tree

After reading the Huffman table data, build the Huffman tree (in practice, a table of codewords). The rules are:

  (a) The first codeword is all zeros: if the shortest code is 1 bit long, the first code is 0; if 2 bits, 00; if 3 bits, 000; and so on.

  (b) From the second codeword on, if a code has the same length as the previous one, it is the previous code plus 1. If it is longer, take the previous code plus 1 and append 0s at the end until the new length is reached.
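Rules (a) and (b) amount to a counter that is incremented after each code and shifted left by one bit whenever the code length grows. A minimal C sketch, assuming the counts come as a 17-entry array with index 0 unused (the layout of the std_*_nrcodes arrays later in this article); build_codes is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

/* Generate the codeword and its length for every symbol, in the order of the
 * HT value table. counts[k] = number of codes that are k bits long (k = 1..16,
 * counts[0] unused). Returns the number of codes generated. */
static int build_codes(const uint8_t counts[17], uint16_t code[256], uint8_t size[256])
{
    uint32_t next = 0;   /* rule (a): the first code is all zeros */
    int n = 0;
    for (int len = 1; len <= 16; len++) {
        for (int i = 0; i < counts[len]; i++) {
            assert(n < 256);
            code[n] = (uint16_t)next;
            size[n] = (uint8_t)len;
            n++;
            next++;      /* rule (b): the next code is the previous code + 1 */
        }
        next <<= 1;      /* next length: append a 0 (the +1 is already applied) */
    }
    return n;
}
```

For the Y-DC counts {0, 0,1,5,1,1,1,1,1,1,0,0,0,0,0,0,0} this yields 0x0 (00), 0x2 (010), 0x3, 0x4, 0x5, 0x6 (110), 0xE (1110), and so on up to 0x1FE (111111110), the codewords worked out by hand in the next paragraphs.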

Take the counts above, 00 01 05 01 01 01 01 01 01 00 00 00 00 00 00 00, as an example:

The first byte 00 means there are no 1-bit codes.

The second byte 01 means there is one 2-bit code. Since there are no 1-bit codes, this first code is 00.

The third byte 05 means there are five 3-bit codes. The first one is 00+1 = 01 with one 0 appended, giving 010; the second is 010+1 = 011; the third is 011+1 = 100; the fourth is 100+1 = 101; the fifth is 101+1 = 110.

The fourth byte 01 means there is one 4-bit code: 110+1 = 111, with one 0 appended, giving 1110.

The fifth byte 01 means there is one 5-bit code: 1110+1 = 1111, with one 0 appended, giving 11110.

Continuing in the same way gives the following code table, Table 1:

Table 1: Y (luminance) DC

----------------------------------------------------------------------
Length (bits)  Number of codes  Codewords (binary)
----------------------------------------------------------------------
1              0                none
2              1                00
3              5                (00+1)<<1 ==> 010, 011, 100, 101, 110
4              1                (110+1)<<1 ==> 1110
5              1                (1110+1)<<1 ==> 11110
6              1                (11110+1)<<1 ==> 111110
7              1                (111110+1)<<1 ==> 1111110
8              1                (1111110+1)<<1 ==> 11111110
9              1                (11111110+1)<<1 ==> 111111110
10-16          0                none
----------------------------------------------------------------------

From this code table, each codeword can be paired with its DHT value, the additional bits actually stored after it, and the quantized DC value those bits represent (Table 2). Decoding the JPEG data means looking values up in this table:

Table 2: Y-DC decoding table

---------------------------------------------------------------------------------------------
 #  Len  Codeword    Code   DHT    Size  Additional bits            DC value
                     (hex)  value        (negative / positive)      (negative / positive)
---------------------------------------------------------------------------------------------
 1  2    00          0x0    0x00   0     none                       0
 2  3    010         0x2    0x01   1     0 / 1                      -1 / 1
 3  3    011         0x3    0x02   2     0x / 1x                    -3..-2 / 2..3
 4  3    100         0x4    0x03   3     0xx / 1xx                  -7..-4 / 4..7
 5  3    101         0x5    0x04   4     0xxx / 1xxx                -15..-8 / 8..15
 6  3    110         0x6    0x05   5     0xxxx / 1xxxx              -31..-16 / 16..31
 7  4    1110        0xE    0x06   6     0xxxxx / 1xxxxx            -63..-32 / 32..63
 8  5    11110       0x1E   0x07   7     0xxxxxx / 1xxxxxx          -127..-64 / 64..127
 9  6    111110      0x3E   0x08   8     0xxxxxxx / 1xxxxxxx        -255..-128 / 128..255
10  7    1111110     0x7E   0x09   9     0xxxxxxxx / 1xxxxxxxx      -511..-256 / 256..511
11  8    11111110    0xFE   0x0A   10    0xxxxxxxxx / 1xxxxxxxxx    -1023..-512 / 512..1023
12  9    111111110   0x1FE  0x0B   11    0xxxxxxxxxx / 1xxxxxxxxxx  -2047..-1024 / 1024..2047
---------------------------------------------------------------------------------------------

In the Additional bits column, x stands for any bit. In general, DHT value 0n means Size n: a negative value in the range -((1<<n)-1) ~ -(1<<(n-1)) is stored as n additional bits starting with 0, and a positive value in the range (1<<(n-1)) ~ ((1<<n)-1) as n bits starting with 1.

In a DHT value byte, the high 4 bits give the number of zeros that precede the value (always 0 in a DC table), and the low 4 bits give the length of the data bits that follow.
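The Size column and the two value ranges can also be computed instead of looked up. A C sketch, assuming values lie in -2047..2047 (Size 0 to 11, as in Table 2); dc_size, dc_bits and extend are hypothetical names, though the decoder-side step is similar in spirit to the EXTEND procedure of the JPEG standard:

```c
#include <assert.h>
#include <stdlib.h>

/* Size (category): number of bits needed for |v|; assumes -2047 <= v <= 2047. */
static int dc_size(int v)
{
    assert(v >= -2047 && v <= 2047);
    unsigned m = (unsigned)abs(v);
    int s = 0;
    while (m) {
        s++;
        m >>= 1;
    }
    return s;
}

/* Encoder side: the `size` additional bits actually saved in the stream.
 * Positive values are stored as-is; negative values as v + (2^size - 1),
 * which is |v| with all `size` bits inverted, so it starts with 0. */
static unsigned dc_bits(int v, int size)
{
    return v >= 0 ? (unsigned)v : (unsigned)(v + (1 << size) - 1);
}

/* Decoder side: recover the signed value from `size` additional bits. */
static int extend(unsigned bits, int size)
{
    if (size == 0)
        return 0;
    return (bits >> (size - 1)) ? (int)bits : (int)bits - ((1 << size) - 1);
}
```

For example, dc_size(-52) is 6, dc_bits(-52, 6) is 11 (binary 001011), and extend(11, 6) turns those bits back into -52.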

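Huffman coding of a DC value: DC value -> Size (bit width of the value) -> DHT value -> codeword {.Len; .value;}, followed by the Size additional bits.

Encoding: DCT coefficients -> quantization with the DQT table -> zigzag scan -> DPCM of the DC coefficient (for Y, and likewise for Cb and Cr) -> Huffman coding -> write.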
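The DC part of this pipeline can be sketched in C as below. It is only an illustration under several assumptions: the codewords are copied from Table 2 (Y-DC), bits are collected as a '0'/'1' string instead of being packed into bytes, the DPCM predictor starts at 0, and encode_y_dc and put_bits are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Y-DC codewords and their lengths, indexed by Size 0..11 (Table 2). */
static const unsigned ydc_code[12] = {0x0, 0x2, 0x3, 0x4, 0x5, 0x6,
                                      0xE, 0x1E, 0x3E, 0x7E, 0xFE, 0x1FE};
static const int ydc_len[12] = {2, 3, 3, 3, 3, 3, 4, 5, 6, 7, 8, 9};

/* Append the low `n` bits of `v`, most significant first, to a '0'/'1' string. */
static void put_bits(char *out, unsigned v, int n)
{
    size_t k = strlen(out);
    for (int i = n - 1; i >= 0; i--)
        out[k++] = ((v >> i) & 1u) ? '1' : '0';
    out[k] = '\0';
}

/* Encode the DC coefficients of consecutive Y blocks. `out` must be an empty,
 * zero-terminated buffer that is large enough for the result. */
static void encode_y_dc(const int *dc, int count, char *out)
{
    int prev = 0;                              /* assumed initial predictor */
    for (int i = 0; i < count; i++) {
        int diff = dc[i] - prev;               /* DPCM */
        prev = dc[i];
        unsigned m = (unsigned)abs(diff);
        int size = 0;                          /* Size = bit width of |diff| */
        while (m) {
            size++;
            m >>= 1;
        }
        assert(size <= 11);
        put_bits(out, ydc_code[size], ydc_len[size]);          /* codeword */
        unsigned bits = diff >= 0 ? (unsigned)diff
                                  : (unsigned)(diff + (1 << size) - 1);
        put_bits(out, bits, size);                             /* additional bits */
    }
}
```

Encoding the DC values -52 and -66 (differences -52 and -14) produces 11100010111010001, which are also the first 17 bits of the example stream E2 E8 A2 8A discussed at the end of this article.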

 

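The Huffman tables can be rebuilt from these code counts and values, taken either from the standard tables below (the typical tables of Annex K of the JPEG standard) or from a DHT segment read from the file:

/* BYTE is assumed to be an 8-bit unsigned type, as in the Windows headers: */
typedef unsigned char BYTE;

/* *_nrcodes[k] = number of codes that are k bits long (index 0 is unused);
   *_values = the symbol values in order of increasing code length. */
static BYTE std_dc_luminance_nrcodes[17]={0,0,1,5,1,1,1,1,1,1,0,0,0,0,0,0,0};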

static BYTE std_dc_luminance_values[12]={0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};

static BYTE std_dc_chrominance_nrcodes[17]={0,0,3,1,1,1,1,1,1,1,1,1,0,0,0,0,0};

static BYTE std_dc_chrominance_values[12]={0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};

static BYTE std_ac_luminance_nrcodes[17]={0,0,2,1,3,3,2,4,3,5,5,4,4,0,0,1,0x7d };

static BYTE std_ac_luminance_values[162]=

  { 0x01, 0x02, 0x03, 0x00, 0x04, 0x11, 0x05, 0x12, 0x21, 0x31, 0x41, 0x06, 0x13, 0x51, 0x61, 0x07, 0x22, 0x71, 0x14, 0x32, 0x81, 0x91, 0xa1, 0x08, 0x23, 0x42, 0xb1, 0xc1, 0x15, 0x52, 0xd1, 0xf0, 0x24, 0x33, 0x62, 0x72, 0x82, 0x09, 0x0a, 0x16, 0x17, 0x18, 0x19, 0x1a, 0x25, 0x26, 0x27, 0x28, 0x29, 0x2a, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39, 0x3a, 0x43, 0x44, 0x45, 0x46, 0x47, 0x48, 0x49, 0x4a, 0x53, 0x54, 0x55, 0x56, 0x57, 0x58, 0x59, 0x5a, 0x63, 0x64, 0x65, 0x66, 0x67, 0x68, 0x69, 0x6a, 0x73, 0x74, 0x75, 0x76, 0x77, 0x78, 0x79, 0x7a, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88, 0x89, 0x8a, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97, 0x98, 0x99, 0x9a, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7, 0xa8, 0xa9, 0xaa, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7, 0xb8, 0xb9, 0xba, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7, 0xc8, 0xc9, 0xca, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7, 0xd8, 0xd9, 0xda, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7, 0xe8, 0xe9, 0xea, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7, 0xf8, 0xf9, 0xfa };

static BYTE std_ac_chrominance_nrcodes[17]={0,0,2,1,2,4,4,3,4,7,5,4,4,0,1,2,0x77};

static BYTE std_ac_chrominance_values[162]=

{ 0x00, 0x01, 0x02, 0x03, 0x11, 0x04, 0x05, 0x21, 0x31, 0x06, 0x12, 0x41, 0x51, 0x07, 0x61, 0x71, 0x13, 0x22, 0x32, 0x81, 0x08, 0x14, 0x42, 0x91, 0xa1, 0xb1, 0xc1, 0x09, 0x23, 0x33, 0x52, 0xf0, 0x15, 0x62, 0x72, 0xd1, 0x0a, 0x16, 0x24, 0x34, 0xe1, 0x25, 0xf1, 0x17, 0x18, 0x19, 0x1a, 0x26, 0x27, 0x28, 0x29, 0x2a, 0x35, 0x36, 0x37, 0x38, 0x39, 0x3a, 0x43, 0x44, 0x45, 0x46, 0x47, 0x48, 0x49, 0x4a, 0x53, 0x54, 0x55, 0x56, 0x57, 0x58, 0x59, 0x5a, 0x63, 0x64, 0x65, 0x66, 0x67, 0x68, 0x69, 0x6a, 0x73, 0x74, 0x75, 0x76, 0x77, 0x78, 0x79, 0x7a, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88, 0x89, 0x8a, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97, 0x98, 0x99, 0x9a, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7, 0xa8, 0xa9, 0xaa, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7, 0xb8, 0xb9, 0xba, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7, 0xc8, 0xc9, 0xca, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7, 0xd8, 0xd9, 0xda, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7, 0xe8, 0xe9, 0xea, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7, 0xf8, 0xf9, 0xfa };

Applying the same construction to all four tables gives, for each code length, the number of codewords n and the first and last codeword of that length (hex):

Len | DHT-Y-DC             | DHT-Y-AC             | DHT-CbCr-DC          | DHT-CbCr-AC
    | n    first   last    | n    first   last    | n    first   last    | n    first   last
----+----------------------+----------------------+----------------------+----------------------
  1 | 0    -       -       | 0    -       -       | 0    -       -       | 0    -       -
  2 | 1    0x0     0x0     | 2    0x0     0x1     | 3    0x0     0x2     | 2    0x0     0x1
  3 | 5    0x2     0x6     | 1    0x4     0x4     | 1    0x6     0x6     | 1    0x4     0x4
  4 | 1    0xE     0xE     | 3    0xA     0xC     | 1    0xE     0xE     | 2    0xA     0xB
  5 | 1    0x1E    0x1E    | 3    0x1A    0x1C    | 1    0x1E    0x1E    | 4    0x18    0x1B
  6 | 1    0x3E    0x3E    | 2    0x3A    0x3B    | 1    0x3E    0x3E    | 4    0x38    0x3B
  7 | 1    0x7E    0x7E    | 4    0x78    0x7B    | 1    0x7E    0x7E    | 3    0x78    0x7A
  8 | 1    0xFE    0xFE    | 3    0xF8    0xFA    | 1    0xFE    0xFE    | 4    0xF6    0xF9
  9 | 1    0x1FE   0x1FE   | 5    0x1F6   0x1FA   | 1    0x1FE   0x1FE   | 7    0x1F4   0x1FA
 10 | 0    -       -       | 5    0x3F6   0x3FA   | 1    0x3FE   0x3FE   | 5    0x3F6   0x3FA
 11 | 0    -       -       | 4    0x7F6   0x7F9   | 1    0x7FE   0x7FE   | 4    0x7F6   0x7F9
 12 | 0    -       -       | 4    0xFF4   0xFF7   | 0    -       -       | 4    0xFF4   0xFF7
 13 | 0    -       -       | 0    -       -       | 0    -       -       | 0    -       -
 14 | 0    -       -       | 0    -       -       | 0    -       -       | 1    0x3FE0  0x3FE0
 15 | 0    -       -       | 1    0x7FC0  0x7FC0  | 0    -       -       | 2    0x7FC2  0x7FC3
 16 | 0    -       -       | 125  0xFF82  0xFFFE  | 0    -       -       | 119  0xFF88  0xFFFE
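The first/last codeword columns are what a decoder needs to recognize a code one bit at a time: a k-bit candidate is a valid code when it does not exceed the last codeword of length k. A C sketch computing these columns, modelled on the MINCODE/MAXCODE/VALPTR tables described in Annex F of the JPEG standard (ITU-T T.81); the struct DecodeTables and the function build_decode_tables are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-length decoder tables (hypothetical struct). */
typedef struct {
    int32_t mincode[17];  /* first codeword of length k                      */
    int32_t maxcode[17];  /* last codeword of length k, -1 if there is none  */
    int     valptr[17];   /* index in the HT value table of that first code  */
} DecodeTables;

/* counts[k] = number of codes of length k (k = 1..16, counts[0] unused). */
static void build_decode_tables(const uint8_t counts[17], DecodeTables *t)
{
    assert(counts != NULL && t != NULL);
    int32_t code = 0;     /* first codeword of the current length */
    int index = 0;        /* number of symbols with shorter codes  */
    for (int k = 1; k <= 16; k++) {
        if (counts[k] == 0) {
            t->mincode[k] = 0;
            t->valptr[k] = 0;
            t->maxcode[k] = -1;
        } else {
            t->mincode[k] = code;
            t->valptr[k] = index;
            code += counts[k];
            index += counts[k];
            t->maxcode[k] = code - 1;
        }
        code <<= 1;       /* next length: append a 0 */
    }
}
```

A decoder can then shift one bit at a time into a code value; at length k, if the code is at most maxcode[k], the symbol is values[valptr[k] + code - mincode[k]], otherwise it reads another bit.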

This gives, for each table, a Huffman_size table (the number of bits in each codeword) and a Huffman_code table (the codeword values), which is everything Huffman decoding needs:

1) Treat the data to be decoded as a binary bit stream.

2) Go through the Huffman_size and Huffman_code tables and find the entry whose length equals Huffman_size and whose bits, read from the current position in the stream, equal Huffman_code. Record the DHT value (the "ID") stored for that entry.

3) Divide this value by 16: the quotient is cnt, the number of zeros that precede the coefficient, and the remainder is Len, the number of data bits to read.

4) Starting right after the matched codeword, read Len bits from the stream; call them data.

5) Convert data into the decoded value de_data according to its highest bit: if it is 1, de_data is data itself; if it is 0, de_data is the negative of data with all its bits inverted. For example, data=100 gives de_data = 4, and data=010 (inverted: 101 = 5) gives de_data = -5.

6) Output cnt zeros followed by de_data. This completes the decoding of one value; the next one starts at the bit after data.
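Steps 2 to 6 can be sketched in C roughly as follows. The sketch keeps the linear search described above rather than a faster lookup, peeks bits straight from the byte array (most significant bit first) instead of keeping a 32-bit buffer, and does not handle byte stuffing (the 00 that follows an FF byte in the entropy-coded data) or special AC symbols such as EOB. BitReader, peek_bits and decode_one are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit reader over the entropy-coded bytes, most significant bit first. */
typedef struct {
    const uint8_t *data;
    size_t nbytes;
    size_t pos;           /* current bit position */
} BitReader;

/* Look at the next n bits without consuming them; returns 0 if too few remain. */
static int peek_bits(const BitReader *br, int n, unsigned *out)
{
    if (br->pos + (size_t)n > br->nbytes * 8)
        return 0;
    unsigned v = 0;
    for (int i = 0; i < n; i++) {
        size_t p = br->pos + (size_t)i;
        v = (v << 1) | ((br->data[p >> 3] >> (7 - (p & 7))) & 1u);
    }
    *out = v;
    return 1;
}

/* Decode one value with the linear search of steps 2-6. Writes cnt zeros and
 * then the value to `out`; returns how many ints were written, -1 on failure. */
static int decode_one(BitReader *br, const uint8_t *huff_size, const uint16_t *huff_code,
                      const uint8_t *huff_val, int ncodes, int *out)
{
    assert(br != NULL && out != NULL);
    for (int i = 0; i < ncodes; i++) {
        unsigned bits;
        /* step 2: same length and same bits as this codeword? */
        if (!peek_bits(br, huff_size[i], &bits) || bits != (unsigned)huff_code[i])
            continue;
        int cnt = huff_val[i] >> 4;       /* step 3: quotient of value / 16 */
        int len = huff_val[i] & 0x0F;     /*         remainder = Len        */
        unsigned data;
        if (!peek_bits(br, huff_size[i] + len, &data))
            return -1;
        br->pos += (size_t)(huff_size[i] + len);
        data &= (1u << len) - 1;          /* step 4: the Len bits after the code */
        int value = 0;                    /* step 5: leading 1 -> positive,      */
        if (len > 0)                      /*         leading 0 -> negative       */
            value = (data >> (len - 1)) ? (int)data : (int)data - (int)((1u << len) - 1);
        for (int z = 0; z < cnt; z++)     /* step 6: cnt zeros, then the value */
            *out++ = 0;
        *out = value;
        return cnt + 1;
    }
    return -1;                            /* no codeword matches the next bits */
}
```

With the Y-DC entries of Table 2 (huff_size {2,3,3,3,3,3,4,5,6,7,8,9}, huff_code {0x0,0x2,0x3,0x4,0x5,0x6,0xE,0x1E,0x3E,0x7E,0xFE,0x1FE}, huff_val {0,...,11}) and the bytes E2 E8 A2 8A, six successive calls give -52, -14, -1, 1, 0, 1 and leave one bit unread.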

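Example data explanation:

The first byte of each table after the length 01 A2 identifies it: 00 means Y-DC, tablenum=0; 10 means Y-AC, tablenum=1; 01 means CbCr-DC, tablenum=2; 11 means CbCr-AC, tablenum=3.

Each is followed, in order, by the table of code counts per bit length and the table of values (the code table).

From these, rebuild the Huffman table to obtain the size and code tables.

The DHT segment is followed by the SOS data; the actual data stream starts with E2 E8 A2 8A F9 93 F7.

During Huffman decoding, 32 bits are read at a time. Here the first 4 bytes are E2 E8 A2 8A, which in binary is:

1110 0010 1110 1000 1010 0010 1000 1010

The first coded value must be Y-DC, so match it against Table 2: find the bit segment whose length equals Huffman_size and whose content equals Huffman_code, and record its Huffman ID. The match here is 1110: the codeword is 4 bits long and its DHT value is 6, so the next 6 bits (001011) must be read as this value's data. The corresponding quantized DCT value is -52 [001011 is 11 in decimal and its leading bit is 0, so the value is 11 - ((1<<6) - 1) = -52]. This step used 4 + 6 = 10 bits, so the next value starts at bit offset 10. Continuing the same way gives the following data: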

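Binary data    Value   Note
1110 001011    -52     4-bit codeword, 6 data bits
101 0001       -14     3-bit codeword, 4 data bits
010 0          -1      3-bit codeword, 1 data bit
010 1          1       3-bit codeword, 1 data bit
00             0       2-bit codeword, no data bits
010 1          1       3-bit codeword, 1 data bit
0              ...     1 bit left over

The remaining bit 0 is combined with the following bytes to make up 32 bits again, and Huffman decoding continues, e.g.:

0 (F9) (93) (F7) ...

Whenever the remaining data is too short, read further bytes from the stream to fill the buffer back up to 32 bits before the next Huffman decode.

The Y-AC, CbCr-DC and CbCr-AC data values are obtained in the same way, each with its own table. (Decoding every segment above with the Y-DC table only illustrates the table lookup; in a baseline scan, the bits that follow a block's DC value are that block's AC coefficients and are decoded with the Y-AC table.)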


Origin blog.csdn.net/u010192735/article/details/120860826