DHT Huffman table format
--------------------------------------------------------------------------------
Name                    Bytes  Value  Description
--------------------------------------------------------------------------------
Segment identification  1      FF
Segment type            1      C4
Segment length          2             19 + n (when the segment holds only one HT table)
(The following is the segment content)
HT information          1             bits 0-3: HT number
                                      bit 4: HT type, 0 = DC table, 1 = AC table
                                      bits 5-7: must be 0
HT bit table            16            Number of codes of each length, 1 to 16 bits; the sum of these 16 numbers must be ≤ 256
HT value table          n             n = the sum of the 16 numbers in the HT bit table
--------------------------------------------------------------------------------
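As a rough illustration of the layout above, the following C sketch splits the HT information byte into its fields and computes the length of a segment that holds a single table. The names `HtInfo`, `parse_ht_info` and `dht_segment_length` are made up for this example and do not come from any particular library.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of the HT information byte (hypothetical helper type). */
typedef struct {
    int number;  /* bits 0-3: HT number */
    int is_ac;   /* bit 4: 0 = DC table, 1 = AC table */
    int valid;   /* 1 if bits 5-7 are all zero */
} HtInfo;

HtInfo parse_ht_info(uint8_t b)
{
    HtInfo info;
    info.number = b & 0x0F;
    info.is_ac  = (b >> 4) & 0x01;
    info.valid  = (b & 0xE0) == 0;
    return info;
}

/* Length of a segment with one table:
   2 length bytes + 1 HT information byte + 16 counts + n values. */
int dht_segment_length(int n)
{
    assert(n >= 0 && n <= 256);
    return 19 + n;
}
```

For a DC table with 12 values, `dht_segment_length(12)` gives 31; the byte 0x10 parses as AC table number 0.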
Read Huffman table
FF C4 01 A2 00 00 01 05 01 01 01 01 01 01 00 00 00 00 00 00 00 00 01 02 03 04 05 06 07 08 09 0A 0B
FF C4: DHT marker (Huffman table identification code)
01 A2: segment length, 0x01A2 = 418 bytes, counted from the length field itself to the end of the segment. This segment holds all four tables (2 + 29 + 179 + 29 + 179 = 418); only the first table, which ends at 0B, is listed above
00: HT information. Bit 4 = 0, so this is a DC table; bits 0-3 = 0, so the HT number is 0. In other words, DC table number 0
00 01 05 01 01 01 01 01 01 00 00 00 00 00 00 00: the number of codewords of each length, from 1 bit to 16 bits. Their sum is the number of leaf nodes: 1+5+1+1+1+1+1+1 = 12
00 01 02 03 04 05 06 07 08 09 0A 0B: the encoded content, that is, the value stored at each leaf node
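The byte-by-byte reading above could be expressed roughly as follows. This is only a sketch under simple assumptions: `DhtTable` and `parse_dht_table` are hypothetical names, and the function handles one table starting at its HT information byte (the marker and length bytes are assumed to have been consumed already).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One Huffman table as stored in a DHT segment (hypothetical type). */
typedef struct {
    uint8_t info;         /* HT information byte */
    uint8_t counts[16];   /* counts[i] = number of codes that are i+1 bits long */
    uint8_t values[256];  /* leaf values, in order of increasing code length */
    int     nvalues;      /* sum of counts[] */
} DhtTable;

/* Parses one table from p (len bytes available).
   Returns the number of bytes consumed, or 0 if the data is
   truncated or the counts add up to more than 256. */
size_t parse_dht_table(const uint8_t *p, size_t len, DhtTable *t)
{
    assert(p != NULL && t != NULL);
    if (len < 17)
        return 0;
    t->info = p[0];
    memcpy(t->counts, p + 1, 16);
    t->nvalues = 0;
    for (int i = 0; i < 16; i++)
        t->nvalues += t->counts[i];
    if (t->nvalues > 256 || len < 17 + (size_t)t->nvalues)
        return 0;
    memcpy(t->values, p + 17, (size_t)t->nvalues);
    return 17 + (size_t)t->nvalues;
}
```

Fed the 29 bytes that follow 01 A2 in the example, the function reports 12 values and 29 bytes consumed; the next table in the segment would start right after them.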
Construct Huffman tree
After reading the data from the Huffman table, you need to build the Huffman tree. The specific rules are as follows:
(a) The first code consists only of zeros: if its length is 1 bit, it is 0; if its length is 2 bits, it is 00; if its length is 3 bits, it is 000; and so on.
(b) From the second code onward: if a code has the same number of bits as the previous code, it is the previous code plus 1; if it has more bits than the previous code, add 1 to the previous code and then append zeros at the end until the required number of bits is reached.
Take the data above, 00 01 05 01 01 01 01 01 01 00 00 00 00 00 00 00, as an example:
The first byte, 00, means there is no 1-bit code;
The second byte, 01, means there is one 2-bit code; since there is no 1-bit code, this first code is 00;
The third byte, 05, means there are five 3-bit codes; the first 3-bit code is 00+1=01 with one "0" appended, giving 010; the second is 010+1=011; the third is 011+1=100; the fourth is 100+1=101; the fifth is 101+1=110;
The fourth byte, 01, means there is one 4-bit code; it is 110+1=111 with one "0" appended, giving 1110;
The fifth byte, 01, means there is one 5-bit code; it is 1110+1=1111 with one "0" appended, giving 11110;
Continuing in the same way gives the following Huffman tree (Table 1):
Y (luminance) DC
Number of bits | Number of codes | Description | Codeword (binary) |
1 | 0 | no 1-bit code | none |
2 | 1 | one 2-bit code | 00 |
3 | 5 | five 3-bit codes | (00+1)<<1 ==> 010 011 100 101 110 |
4 | 1 | one 4-bit code | (110+1)<<1 ==> 1110 |
5 | 1 | one 5-bit code | (1110+1)<<1 ==> 1111 0 |
6 | 1 | one 6-bit code | (11110+1)<<1 ==> 1111 10 |
7 | 1 | one 7-bit code | (111110+1)<<1 ==> 1111 110 |
8 | 1 | one 8-bit code | (1111110+1)<<1 ==> 1111 1110 |
9 | 1 | one 9-bit code | (11111110+1)<<1 ==> 1111 1111 0 |
10 | 0 | no 10-bit code | none |
11 | 0 | no 11-bit code | none |
12 | 0 | no 12-bit code | none |
13 | 0 | no 13-bit code | none |
14 | 0 | no 14-bit code | none |
15 | 0 | no 15-bit code | none |
16 | 0 | no 16-bit code | none |
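Rules (a) and (b) can be sketched in C as follows. The function name `build_huffman_codes` and the output arrays `sizes` and `codes` are assumptions made for this illustration; the loop keeps a running code that is incremented after each codeword and shifted left by one bit whenever the length grows, which is equivalent to "add 1, then append zeros".

```c
#include <assert.h>
#include <stdint.h>

/* Assigns a codeword to every leaf, following rules (a) and (b).
   counts[i] is the number of codes that are i+1 bits long.
   On return, sizes[k] and codes[k] describe the k-th leaf.
   Returns the number of leaves. */
int build_huffman_codes(const uint8_t counts[16],
                        uint8_t sizes[256], uint16_t codes[256])
{
    unsigned code = 0;  /* rule (a): the first code is all zeros */
    int k = 0;

    for (int len = 1; len <= 16; len++) {
        for (int i = 0; i < counts[len - 1]; i++) {
            assert(k < 256);
            sizes[k] = (uint8_t)len;
            codes[k] = (uint16_t)code;
            code++;   /* rule (b): same length, previous code plus 1 */
            k++;
        }
        code <<= 1;   /* rule (b): one more bit, append a 0 */
    }
    return k;
}
```

With the counts 00 01 05 01 01 01 01 01 01 00 ... from the example, this produces the twelve codewords of Table 1, from 00 (2 bits) up to 1111 1111 0 (9 bits).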
From the Huffman tree, build Table 2, which relates each codeword to its DHT weight, the additional bits actually stored, and the DCT-quantized DC value. JPEG data is decoded by looking up this table:
No. | Codeword length (bits) | Codeword (binary) | Codeword (hex) | DHT weight | Size (bit width of the value) | Additional bits (data actually stored): negative / positive | DC value (data after DCT quantization): negative / positive |
1 | 2 | 00 | 0x0 | 0x00 | 0 | none | 0 |
2 | 3 | 010 | 0x2 | 0x01 | 1 | 0 / 1 | -1 / 1 |
3 | 3 | 011 | 0x3 | 0x02 | 2 | 00,01 / 10,11 | -3,-2 / 2,3 |
4 | 3 | 100 | 0x4 | 0x03 | 3 | 000,001,010,011 / 100,101,110,111 | -7,-6,-5,-4 / 4,5,6,7 |
5 | 3 | 101 | 0x5 | 0x04 | 4 | 0000,…,0111 / 1000,…,1111 | -15,…,-8 / 8,…,15 |
6 | 3 | 110 | 0x6 | 0x05 | 5 | 0000 0,…,0111 1 / 1000 0,…,1111 1 | -31,…,-16 / 16,…,31 |
7 | 4 | 1110 | 0xE | 0x06 | 6 | 0000 00,…,0111 11 / 1000 00,…,1111 11 | -63,…,-32 / 32,…,63 |
8 | 5 | 1111 0 | 0x1E | 0x07 | 7 | 0000 000,…,0111 111 / 1000 000,…,1111 111 | -127,…,-64 / 64,…,127 |
9 | 6 | 1111 10 | 0x3E | 0x08 | 8 | 0000 0000,…,0111 1111 / 1000 0000,…,1111 1111 | -255,…,-128 / 128,…,255 |
10 | 7 | 1111 110 | 0x7E | 0x09 | 9 | 0000 0000 0,…,0111 1111 1 / 1000 0000 0,…,1111 1111 1 | -511,…,-256 / 256,…,511 |
11 | 8 | 1111 1110 | 0xFE | 0x0A | 10 | 0000 0000 00,…,0111 1111 11 / 1000 0000 00,…,1111 1111 11 | -1023,…,-512 / 512,…,1023 |
12 | 9 | 1111 1111 0 | 0x1FE | 0x0B | 11 | 0000 0000 000,…,0111 1111 111 / 1000 0000 000,…,1111 1111 111 | -2047,…,-1024 / 1024,…,2047 |
General rule: for a DHT weight 0x0n the size is n; negative values cover -((1<<n)-1) ~ -(1<<(n-1)) and positive values cover (1<<(n-1)) ~ ((1<<n)-1).
In the DHT weight, the high 4 bits give the number of preceding zeros (always 0 in a DC table), and the low 4 bits give the length of the data bits that follow.
Huffman: DC actual value -> Size [encoding length] -> weight value -> bitstring {.Len; .value;}
Encode: DC actual value -> DQT quantization -> ZigZag scan -> (Y: DPCM encoding, CbCr) -> Huffman -> write
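The mapping from a DC value to its size and additional bits could be computed as in the sketch below, instead of being looked up. The helper names `dc_size` and `dc_additional_bits` are hypothetical; the rule used for negative values (store v + (1 << size) - 1, so the leading bit is 0) is the usual JPEG convention and matches the bit patterns listed in the table.

```c
#include <assert.h>

/* Size (bit width) of a DC value: 0 for 0, otherwise the
   number of bits needed for its absolute value. */
int dc_size(int v)
{
    int mag = v < 0 ? -v : v;
    int size = 0;
    while (mag != 0) {
        size++;
        mag >>= 1;
    }
    return size;
}

/* Additional bits stored after the codeword.
   Positive values are stored as they are; negative values are
   stored as v + (1 << size) - 1, so their leading bit is 0. */
unsigned dc_additional_bits(int v)
{
    int size = dc_size(v);
    assert(size <= 11);  /* the table covers sizes 0 to 11 */
    if (v >= 0)
        return (unsigned)v;
    return (unsigned)(v + (1 << size) - 1);
}
```

For example, -5 has size 3 and additional bits 010, while 4 has size 3 and additional bits 100.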
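The Huffman tables can be rebuilt from the standard Huffman coefficients below, or the coefficients can be read from the DHT segment. Each nrcodes array has 17 entries: entry 0 is unused, and nrcodes[i] is the number of i-bit codes: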
static BYTE std_dc_luminance_nrcodes[17]={0,0,1,5,1,1,1,1,1,1,0,0,0,0,0,0,0};
static BYTE std_dc_luminance_values[12]={0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};
static BYTE std_dc_chrominance_nrcodes[17]={0,0,3,1,1,1,1,1,1,1,1,1,0,0,0,0,0};
static BYTE std_dc_chrominance_values[12]={0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};
static BYTE std_ac_luminance_nrcodes[17]={0,0,2,1,3,3,2,4,3,5,5,4,4,0,0,1,0x7d };
static BYTE std_ac_luminance_values[162]=
{ 0x01, 0x02, 0x03, 0x00, 0x04, 0x11, 0x05, 0x12, 0x21, 0x31, 0x41, 0x06, 0x13, 0x51, 0x61, 0x07, 0x22, 0x71, 0x14, 0x32, 0x81, 0x91, 0xa1, 0x08, 0x23, 0x42, 0xb1, 0xc1, 0x15, 0x52, 0xd1, 0xf0, 0x24, 0x33, 0x62, 0x72, 0x82, 0x09, 0x0a, 0x16, 0x17, 0x18, 0x19, 0x1a, 0x25, 0x26, 0x27, 0x28, 0x29, 0x2a, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39, 0x3a, 0x43, 0x44, 0x45, 0x46, 0x47, 0x48, 0x49, 0x4a, 0x53, 0x54, 0x55, 0x56, 0x57, 0x58, 0x59, 0x5a, 0x63, 0x64, 0x65, 0x66, 0x67, 0x68, 0x69, 0x6a, 0x73, 0x74, 0x75, 0x76, 0x77, 0x78, 0x79, 0x7a, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88, 0x89, 0x8a, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97, 0x98, 0x99, 0x9a, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7, 0xa8, 0xa9, 0xaa, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7, 0xb8, 0xb9, 0xba, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7, 0xc8, 0xc9, 0xca, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7, 0xd8, 0xd9, 0xda, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7, 0xe8, 0xe9, 0xea, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7, 0xf8, 0xf9, 0xfa };
static BYTE std_ac_chrominance_nrcodes[17]={0,0,2,1,2,4,4,3,4,7,5,4,4,0,1,2,0x77};
static BYTE std_ac_chrominance_values[162]=
{ 0x00, 0x01, 0x02, 0x03, 0x11, 0x04, 0x05, 0x21, 0x31, 0x06, 0x12, 0x41, 0x51, 0x07, 0x61, 0x71, 0x13, 0x22, 0x32, 0x81, 0x08, 0x14, 0x42, 0x91, 0xa1, 0xb1, 0xc1, 0x09, 0x23, 0x33, 0x52, 0xf0, 0x15, 0x62, 0x72, 0xd1, 0x0a, 0x16, 0x24, 0x34, 0xe1, 0x25, 0xf1, 0x17, 0x18, 0x19, 0x1a, 0x26, 0x27, 0x28, 0x29, 0x2a, 0x35, 0x36, 0x37, 0x38, 0x39, 0x3a, 0x43, 0x44, 0x45, 0x46, 0x47, 0x48, 0x49, 0x4a, 0x53, 0x54, 0x55, 0x56, 0x57, 0x58, 0x59, 0x5a, 0x63, 0x64, 0x65, 0x66, 0x67, 0x68, 0x69, 0x6a, 0x73, 0x74, 0x75, 0x76, 0x77, 0x78, 0x79, 0x7a, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88, 0x89, 0x8a, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97, 0x98, 0x99, 0x9a, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7, 0xa8, 0xa9, 0xaa, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7, 0xb8, 0xb9, 0xba, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7, 0xc8, 0xc9, 0xca, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7, 0xd8, 0xd9, 0xda, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7, 0xe8, 0xe9, 0xea, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7, 0xf8, 0xf9, 0xfa };
Applying the reconstruction method above to these arrays gives the following table. For each number of bits it lists the number of codewords, the first codeword and the last codeword (- means there is no codeword of that length):
Bits | Y-DC count | Y-DC first | Y-DC last | Y-AC count | Y-AC first | Y-AC last | CbCr-DC count | CbCr-DC first | CbCr-DC last | CbCr-AC count | CbCr-AC first | CbCr-AC last |
1 | 0 | - | - | 0 | - | - | 0 | - | - | 0 | - | - |
2 | 1 | 0x0 | 0x0 | 2 | 0x0 | 0x1 | 3 | 0x0 | 0x2 | 2 | 0x0 | 0x1 |
3 | 5 | 0x2 | 0x6 | 1 | 0x4 | 0x4 | 1 | 0x6 | 0x6 | 1 | 0x4 | 0x4 |
4 | 1 | 0xE | 0xE | 3 | 0xA | 0xC | 1 | 0xE | 0xE | 2 | 0xA | 0xB |
5 | 1 | 0x1E | 0x1E | 3 | 0x1A | 0x1C | 1 | 0x1E | 0x1E | 4 | 0x18 | 0x1B |
6 | 1 | 0x3E | 0x3E | 2 | 0x3A | 0x3B | 1 | 0x3E | 0x3E | 4 | 0x38 | 0x3B |
7 | 1 | 0x7E | 0x7E | 4 | 0x78 | 0x7B | 1 | 0x7E | 0x7E | 3 | 0x78 | 0x7A |
8 | 1 | 0xFE | 0xFE | 3 | 0xF8 | 0xFA | 1 | 0xFE | 0xFE | 4 | 0xF6 | 0xF9 |
9 | 1 | 0x1FE | 0x1FE | 5 | 0x1F6 | 0x1FA | 1 | 0x1FE | 0x1FE | 7 | 0x1F4 | 0x1FA |
10 | 0 | - | - | 5 | 0x3F6 | 0x3FA | 1 | 0x3FE | 0x3FE | 5 | 0x3F6 | 0x3FA |
11 | 0 | - | - | 4 | 0x7F6 | 0x7F9 | 1 | 0x7FE | 0x7FE | 4 | 0x7F6 | 0x7F9 |
12 | 0 | - | - | 4 | 0xFF4 | 0xFF7 | 0 | - | - | 4 | 0xFF4 | 0xFF7 |
13 | 0 | - | - | 0 | - | - | 0 | - | - | 0 | - | - |
14 | 0 | - | - | 0 | - | - | 0 | - | - | 1 | 0x3FE0 | 0x3FE0 |
15 | 0 | - | - | 1 | 0x7FC0 | 0x7FC0 | 0 | - | - | 2 | 0x7FC2 | 0x7FC3 |
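The first and last codeword of each length can be derived directly from an nrcodes array. The sketch below assumes the 17-entry layout of the arrays above (entry 0 unused); `code_ranges`, `first` and `last` are names chosen for this example only.

```c
#include <assert.h>
#include <stdint.h>

/* For each code length 1..16, computes the first and last codeword.
   nrcodes[len] is the number of len-bit codes; nrcodes[0] is unused.
   first[len] and last[len] are only meaningful when nrcodes[len] > 0. */
void code_ranges(const uint8_t nrcodes[17], unsigned first[17], unsigned last[17])
{
    unsigned code = 0;

    for (int len = 1; len <= 16; len++) {
        first[len] = code;
        code += nrcodes[len];
        /* all codes of this length must fit in len bits */
        assert(code <= (1u << len));
        last[len] = nrcodes[len] ? code - 1 : first[len];
        code <<= 1;  /* move on to the next length */
    }
}
```

Called with std_ac_luminance_nrcodes, it gives 0xA to 0xC for 4 bits and 0x7FC0 for 15 bits. It also yields the 16-bit codewords, 0xFF82 to 0xFFFE for that array.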
1) Treat the data to be decoded as a binary data stream;
2) Traverse the tables Huffman_size and Huffman_code and find, at the current position of the binary data stream, the bit segment whose length equals Huffman_size and whose content equals Huffman_code; record the ID of the matching table entry (that is, which entry of the table was found);
3) Divide this ID value by 16: the quotient is cnt (the number of zeros that precede the value), and the remainder is the read length Len;
4) Read the next Len bits from the binary data stream, starting immediately after the matched Huffman_code. Call the bits read data;
5) Convert data to the corresponding decoded value de_data according to its highest bit: if it is 1, de_data is the number itself; if it is 0, de_data is the negative of the bitwise-inverted number. For example, if data=100, de_data is 4; if data=010, de_data is -5;
6) Write the value de_data, preceded by cnt zeros. This completes the decoding of one value.
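Steps 3 and 5 are small enough to sketch directly in C. The function names `split_weight` and `extend_value` are hypothetical; the code assumes that the ID from step 2 is the one-byte DHT weight and that `data` holds the Len bits read in step 4.

```c
#include <assert.h>

/* Step 3: split the ID into the zero count (quotient by 16)
   and the data length (remainder). */
void split_weight(unsigned id, int *cnt, int *len)
{
    assert(id <= 0xFF);
    *cnt = (int)(id / 16);
    *len = (int)(id % 16);
}

/* Step 5: convert the len data bits into the decoded value.
   Highest bit 1: the value is data itself.
   Highest bit 0: invert the len bits and negate the result. */
int extend_value(unsigned data, int len)
{
    assert(len >= 0 && len <= 15);
    if (len == 0)
        return 0;
    if (data & (1u << (len - 1)))
        return (int)data;
    return -(int)(~data & ((1u << len) - 1));
}
```

With len = 3, the bits 100 decode to 4 and the bits 010 decode to -5, as in the example of step 5.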
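Example data explanation:
The HT information byte that follows the length 01 A2 identifies each table: 00 means Y-DC, tablenum=0; 10 means Y-AC, tablenum=1; 01 means Cb-DC, tablenum=2; 11 means Cb-AC, tablenum=3.
It is followed, in order, by the table of code counts per number of bits and then the value table (code table).
Rebuilding the Huffman table from these gives the size and code tables.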
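The DHT segment is followed by the SOS data; the actual data stream starts at E2 E8 A2 8A F9 93 F7.
During Huffman decoding, 32 bits are read at a time. Here the first 4 bytes are E2 E8 A2 8A, which in binary is:
1110 0010 1110 1000 1010 0010 1000 1010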
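The first value encoded is always Y-DC, so it can be matched against Table 2: look for the bit segment whose length equals Huffman_size and whose content equals Huffman_code, and record the Huffman ID. Here the match is 1110, a codeword of length 4 whose DHT weight is 6, so the next 6 bits (001011) are read as the data of this value. Their highest bit is 0, so the value is negative: 001011 is 11 in decimal, and 11 - ((1<<6) - 1) ==> -52 is the actual DCT-quantized value. This step used 4 + 6 = 10 bits in total, so the next value starts at an offset of 10 bits. The rows that follow keep matching against Table 2 in the same way to illustrate the procedure (in a real scan, the coefficients that follow the DC value of a block are matched against the Y-AC table instead):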
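1110 0010 11 101 0001 010 0 010 1 00 010 1 0
Binary data | 1110 0010 11 | 101 0001 | 010 0 | 010 1 | 00 | 010 1 | 0, followed by the next bytes (F9) (93) (F7), which are read to top the buffer up to 32 bits before Huffman decoding continues |
Actual value | -52 | -14 | -1 | 1 | 0 | 1 | ... |
Notes | 4-bit codeword, 6 data bits | 3-bit codeword, 4 data bits | 3-bit codeword, 1 data bit | 3-bit codeword, 1 data bit | 2-bit codeword, no data bits | 3-bit codeword, 1 data bit | ... |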
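When there are not enough bits left, read more bytes from the stream to top the buffer up to 32 bits and continue Huffman decoding.
The Y-AC, Cb-DC and Cb-AC values are obtained in the same way, using their own tables.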
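To make the walk-through reproducible, here is a minimal C sketch that decodes the bytes E2 E8 A2 8A with the Y-DC table only, as the example does. `BitReader`, `read_bits`, `decode_ydc_value` and `decode_ydc_stream` are hypothetical helpers written for this illustration; the sketch reads bit by bit rather than 32 bits at a time, does no marker handling, and applies the rule from step 5 to the data bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Y-DC table: code length, codeword and DHT weight of each entry. */
static const uint8_t  ydc_size[12]   = {2, 3, 3, 3, 3, 3, 4, 5, 6, 7, 8, 9};
static const uint16_t ydc_code[12]   = {0x0, 0x2, 0x3, 0x4, 0x5, 0x6,
                                        0xE, 0x1E, 0x3E, 0x7E, 0xFE, 0x1FE};
static const uint8_t  ydc_weight[12] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};

/* Minimal MSB-first bit reader. */
typedef struct {
    const uint8_t *data;
    size_t nbits;  /* total number of bits available */
    size_t pos;    /* index of the next bit to read */
} BitReader;

/* Reads n bits; returns -1 if the stream ends first. */
static long read_bits(BitReader *br, int n)
{
    long v = 0;
    if (br->pos + (size_t)n > br->nbits)
        return -1;
    for (int i = 0; i < n; i++, br->pos++)
        v = (v << 1) | ((br->data[br->pos / 8] >> (7 - br->pos % 8)) & 1);
    return v;
}

/* Decodes one value. Returns 1 on success, 0 if the data runs
   out or no codeword matches (the position is then restored). */
int decode_ydc_value(BitReader *br, int *value)
{
    unsigned code = 0;
    size_t start = br->pos;

    for (int len = 1; len <= 9; len++) {
        long bit = read_bits(br, 1);
        if (bit < 0)
            break;
        code = (code << 1) | (unsigned)bit;
        for (int k = 0; k < 12; k++) {
            if (ydc_size[k] == len && ydc_code[k] == code) {
                int n = ydc_weight[k] & 0x0F;  /* low 4 bits: data length */
                long data = read_bits(br, n);
                if (data < 0)
                    goto fail;
                if (n == 0)
                    *value = 0;
                else if (data >> (n - 1))      /* highest bit 1: positive */
                    *value = (int)data;
                else                           /* highest bit 0: negative */
                    *value = (int)data - ((1 << n) - 1);
                return 1;
            }
        }
    }
fail:
    br->pos = start;
    return 0;
}

/* Decodes up to max values from buf; returns how many were decoded. */
int decode_ydc_stream(const uint8_t *buf, size_t len, int *out, int max)
{
    BitReader br = { buf, len * 8, 0 };
    int n = 0;
    assert(buf != NULL && out != NULL);
    while (n < max && decode_ydc_value(&br, &out[n]))
        n++;
    return n;
}
```

Run on the four bytes E2 E8 A2 8A, the sketch decodes six values (-52, -14, -1, 1, 0, 1) and stops with one bit left over, which is where more bytes would have to be read.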