System Architecture-2-3 Optimization of Instruction Opcodes: Huffman Coding

2020 System Architecture Series Articles


After taking the exam for so long, I am still working on the topic, and I admire myself.

Instruction format

An instruction is composed of two parts: the opcode and the address code.

Instruction optimization

Optimizing the instruction format means representing the operation information and address information of an instruction in as few bits as possible, so that the average instruction length in a program is minimized.
Optimizing the opcode means representing the operation information of an instruction in as few bits as possible.

Organization scheme for fixed-length opcodes

Fixed number of bits: for example, the opcode is fixed at 6 bits.
Allocating a fixed number of bits in the most significant part of the instruction word for the opcode simplifies the hardware design and speeds up instruction decoding.
Examples: the IBM 360 and teaching computers.
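
As a minimal sketch of why a fixed-length opcode is easy to decode, the snippet below splits an instruction word with one shift and one mask. The 6-bit opcode width comes from the example above; the 32-bit word width and the name `split_instruction` are assumptions made only for illustration.

```python
WORD_BITS = 32    # ASSUMPTION: instruction word width, chosen for illustration
OPCODE_BITS = 6   # fixed opcode width from the example in the text


def split_instruction(word):
    """Split an instruction word into (opcode, remaining address bits)."""
    shift = WORD_BITS - OPCODE_BITS
    opcode = word >> shift                 # the most significant 6 bits
    address = word & ((1 << shift) - 1)    # everything below the opcode
    return opcode, address


# Example: opcode 0b101010 followed by address bits holding the value 5.
opcode, address = split_instruction((0b101010 << 26) | 5)
```

Because the opcode always sits in the same place, the decoder never has to inspect the bits before knowing where the opcode ends.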

Organization scheme for variable-length opcodes

Variable number of bits: the opcode may be 6, 7, or 8 bits long.
A fixed-length field in the most significant part of the instruction word holds the basic opcode. For instructions with few operand addresses, the opcode is extended into the unused operand address field, so the opcode becomes longer.
This method can express more instructions without increasing the length of the instruction word, but it makes decoding more difficult and requires more hardware support.

Optimization of instruction opcode part

How to optimize?
First of all, optimizing the opcode part means optimizing variable-length opcodes, because fixed-length opcodes leave no room for optimization.
The idea of the optimization is to find a way to reduce the number of bits used by the opcodes.

How to do it?
Based on how frequently each instruction is used, the most commonly used instructions should be encoded with the fewest bits. This is the idea behind Huffman compression.

The basic idea of Huffman compression is this: when events occur with unequal probabilities, represent (process) the most probable events with the fewest bits (the shortest time), and allow the least probable events to take more bits (a longer time). This shortens the average number of bits (average time) needed to represent (process) an event.
Optimizing the opcode representation mainly serves to shorten the instruction word, reduce the total number of bits in a program, and increase the operation and address information that an instruction word can carry. To optimize the opcode representation, we need to know the probability (frequency of use) of each instruction in programs, which is generally obtained by collecting statistics over a large number of existing typical programs.
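
The quantities involved can be written as formulas. These are the standard definitions, assumed here to be the ones the article relies on; the symbols p_i (frequency of instruction i), l_i (length of its opcode in bits), and n (number of instructions) are introduced only for illustration.

```latex
\bar{l} = \sum_{i=1}^{n} p_i \, l_i ,
\qquad
H = -\sum_{i=1}^{n} p_i \log_2 p_i ,
\qquad
R = 1 - \frac{H}{\bar{l}}
```

Here \bar{l} is the average opcode length, H is the information entropy (a lower bound on the average length), and R is the information redundancy of the encoding.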

Huffman coding

We already know the idea behind Huffman coding. Next, let's apply it to an example. The problem is as follows:
[Image: problem statement]
Solution approach:
1. Use the Huffman algorithm to construct a Huffman tree.
Constructing a Huffman tree is simple. Put all the nodes into a queue (typically a priority queue), then repeatedly remove the two nodes with the lowest frequency and replace them with a new node whose frequency is the sum of theirs; the new node becomes the parent of the two removed nodes. Repeat until only one node, the root of the tree, is left in the queue.
[Image: Huffman tree]
2. In the resulting Huffman tree, the sequence of bits along the path from the root to each instruction's leaf forms that instruction's Huffman code, as shown in the following table:
[Image: Huffman coding table]
3. Calculate the average code length from the coding table.
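
The tree construction in step 1 can be sketched with Python's `heapq` module. The problem's frequency table was an image that is not reproduced in the text, so the weights below (40, 30, 15, 5, 4, 3, 3 per 100 instructions) are an assumption, chosen to be consistent with the figures the article quotes (top frequencies 0.4, 0.3, 0.15; code lengths 1, 2, 3, 5; average 2.2). The function name `huffman_code_lengths` is hypothetical. The sketch tracks only code lengths rather than the bit patterns.

```python
import heapq
from itertools import count

# ASSUMPTION: occurrences per 100 instructions; the original table is not
# available, these values merely match the figures quoted in the article.
WEIGHTS = {"I1": 40, "I2": 30, "I3": 15, "I4": 5, "I5": 4, "I6": 3, "I7": 3}


def huffman_code_lengths(weights):
    """Return {symbol: Huffman code length} for a {symbol: weight} dict."""
    tie = count()  # tie-breaker so the heap never compares the symbol lists
    heap = [(w, next(tie), [sym]) for sym, w in weights.items()]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in weights}
    while len(heap) > 1:
        # Remove the two lowest-weight nodes and merge them under a parent.
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1  # every leaf below the new parent gets one bit deeper
        heapq.heappush(heap, (w1 + w2, next(tie), syms1 + syms2))
    return lengths


lengths = huffman_code_lengths(WEIGHTS)
average = sum(WEIGHTS[s] * lengths[s] for s in WEIGHTS) / sum(WEIGHTS.values())
```

Integer weights are used instead of fractions so that ties between equal weights are not disturbed by floating-point rounding.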
As shown in the figure below, the calculated average code length is 2.2 bits, which is very close to the H given in the problem. The information redundancy of this encoding is only 1.36%, much smaller than that of a 3-bit fixed-length code (28%). Full Huffman coding is therefore the optimal encoding in terms of average code length.
However, this encoding uses too many different code lengths. As the table above shows, the 7 instructions use 4 different code lengths (1, 2, 3, and 5), which is irregular and difficult to decode. For this reason, Huffman coding is combined with ordinary binary coding to give an extended encoding.
[Image: average code length calculation]
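
The numbers quoted above can be checked with a few lines of Python. This is a sketch under assumptions: the frequency values are not given in the text (the table was an image) and are chosen here to match the quoted results, and the per-instruction code lengths are the ones implied by the lengths 1, 2, 3, and 5 mentioned above. The helper names are hypothetical.

```python
from math import log2

# ASSUMPTION: frequencies chosen to be consistent with the article's figures.
FREQ = {"I1": 0.40, "I2": 0.30, "I3": 0.15, "I4": 0.05,
        "I5": 0.04, "I6": 0.03, "I7": 0.03}
# ASSUMPTION: Huffman code lengths per instruction (lengths 1, 2, 3, 5).
HUFFMAN_LEN = {"I1": 1, "I2": 2, "I3": 3, "I4": 5, "I5": 5, "I6": 5, "I7": 5}


def entropy(freq):
    """Information entropy H in bits."""
    return -sum(p * log2(p) for p in freq.values())


def average_length(freq, lengths):
    """Frequency-weighted average code length in bits."""
    return sum(freq[name] * lengths[name] for name in freq)


def redundancy(h, avg_len):
    """Information redundancy: the fraction of bits above the entropy bound."""
    return 1 - h / avg_len


h = round(entropy(FREQ), 2)  # rounded to two decimals before use
avg = average_length(FREQ, HUFFMAN_LEN)
print(f"H = {h}, average length = {avg:.2f}")
print(f"Huffman redundancy     = {redundancy(h, avg):.2%}")
print(f"3-bit fixed redundancy = {redundancy(h, 3):.2%}")
```

Under these assumptions, H rounds to 2.17, which reproduces the 1.36% redundancy for the Huffman code and roughly 28% for the 3-bit fixed-length code.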

Extended Opcode Encoding

Extended opcode encoding lies between fixed-length binary encoding and full Huffman encoding: the opcode length is not fixed, but only a limited number of code lengths are used. In the problem above, if only two code lengths may be used to represent the 7 instructions, which encoding scheme should we choose?

The codes should be as short as possible, so let's start with the shortest code length.

With a short code length of 1, there are only 2 codes (2 to the power of 1: 0 and 1). One is assigned to an instruction and the other is reserved as the extension flag. By how many bits should it be extended?
Extending by 1 bit gives 2 more codes: (2-1)+2=3<7, not enough;
extending by 2 bits gives 4 more codes: (2-1)+4=5<7, not enough;
extending by 3 bits gives 8 more codes: (2-1)+8=9>7, which satisfies the condition.
This gives the first scheme: 1 code of length 1 and 6 codes of length 4, with average code length = 1*0.4+4*(…)=2.8.
Note: when the codes are not enough, one could in fact keep extending, but that would violate the restriction to two code lengths.

With a short code length of 2, there are 4 codes (2 to the power of 2: 00, 01, 10, 11). 3 are assigned to instructions and 1 is reserved as the extension flag. By how many bits should it be extended?
Extending by 1 bit gives 2 more codes: (4-1)+2=5<7, not enough;
extending by 2 bits gives 4 more codes: (4-1)+4=7, which satisfies the condition.
This gives the second scheme: 3 codes of length 2 and 4 codes of length 4, with average code length = 2*(0.4+0.3+0.15)+4*(…)=2.3.

With a code length of 3, there are 8 codes (2 to the power of 3: 000, 001, 010, 011, 100, 101, 110, 111), which is enough to represent all 7 instructions with a single length. That does not meet the requirement of two code lengths, so we skip it. The same holds for any code length greater than 3.

To sum up, we have only two candidate schemes, and we just need to compare their average code lengths.
Conclusion, the optimal coding scheme: divide the instructions into two groups by frequency of use. The 3 high-frequency instructions I1, I2, and I3 get 2-bit opcodes, and the remaining 2-bit code serves as the extension flag; it is extended by 2 more bits to encode the remaining 4 lower-frequency instructions. The average code length is 2.3 bits. (The average code length of the extended opcode encoding is greater than that of the Huffman encoding, so there is more redundancy and some waste, but the encoding is regular and easier to implement in hardware.)
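
The enumeration in this section can be sketched as a short search. It follows the article's model: exactly one short code is reserved as the extension flag, and the flag is extended by the smallest number of bits that covers the remaining instructions. The frequency list is an assumption (the original table was an image), chosen to match the averages 2.8 and 2.3 derived above, and the function name `two_length_schemes` is hypothetical.

```python
# ASSUMPTION: frequencies consistent with the article's quoted results.
FREQS = [0.40, 0.30, 0.15, 0.05, 0.04, 0.03, 0.03]


def two_length_schemes(freqs):
    """List (short_len, long_len, n_short, average) for each two-length scheme."""
    freqs = sorted(freqs, reverse=True)  # most frequent get the short codes
    n = len(freqs)
    schemes = []
    short = 1
    while 2 ** short < n:           # once 2**short >= n, one length suffices
        n_short = 2 ** short - 1    # one short code is kept as the extension flag
        ext = 1
        while n_short + 2 ** ext < n:
            ext += 1                # extend until all instructions fit
        long = short + ext
        avg = short * sum(freqs[:n_short]) + long * sum(freqs[n_short:])
        schemes.append((short, long, n_short, avg))
        short += 1
    return schemes


schemes = two_length_schemes(FREQS)
best = min(schemes, key=lambda s: s[3])  # scheme with the shortest average
```

For 7 instructions the search yields exactly the two candidates discussed above, lengths (1, 4) and (2, 4), and picks the second.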

The specific coding scheme is as follows:
[Image: extended opcode coding table]
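
Since the coding table itself was an image, the following is only one possible assignment that fits the 2-bit plus 4-bit scheme, together with a small decoder. Which 2-bit pattern serves as the extension flag (`11` here), the exact codes, the names I4 to I7, and the function name `decode` are all assumptions made for illustration.

```python
# ASSUMPTION: one possible code table; "11" is used as the extension flag.
CODES = {
    "I1": "00", "I2": "01", "I3": "10",           # 2-bit codes, high frequency
    "I4": "1100", "I5": "1101", "I6": "1110", "I7": "1111",  # 4-bit codes
}


def decode(bits):
    """Decode a string of '0'/'1' characters into a list of instruction names."""
    table = {code: name for name, code in CODES.items()}
    names, i = [], 0
    while i < len(bits):
        # The first two bits decide the width: the flag means a 4-bit opcode.
        width = 4 if bits[i:i + 2] == "11" else 2
        names.append(table[bits[i:i + width]])
        i += width
    return names


decoded = decode("00" + "1101" + "10" + "1111")
```

The decoder only ever has to choose between two widths, which is the regularity that makes this scheme easier to implement in hardware than full Huffman coding.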

If you love programming, you are welcome to follow me. Please point out any mistakes so that we can improve together.


Origin blog.csdn.net/changhuzichangchang/article/details/119187648