Preface
Because the CPU can only recognize and execute machine instructions, assembly instructions need to be translated [converted, encoded] into machine instructions.
The official manual "Intel_8086_Family_Users_Manual" contains the format of machine instructions corresponding to all assembly instructions.
This document introduces how the translation of mov instructions is implemented.
General format of machine instructions
8086 machine instructions vary in length from 1 byte to 6 bytes. The format of most instructions is as follows:
The first 6 bits of a multi-byte instruction usually contain an opcode that identifies the basic instruction type: such as ADD, XOR, etc.
The next bit is called the D field and generally specifies the "direction" of the operation: 1 = the REG field identifies the destination operand, 0 = the REG field identifies the source operand.
The W field distinguishes between byte and word operations: 0 = byte operation, 1 = word operation.
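The layout of the first byte (6 opcode bits, then D, then W) can be sketched as a small packing function. This is only an illustration; the name firstByte is hypothetical and does not come from the project code. The opcode 100010 is the one the manual assigns to "MOV register/memory to/from register".

```go
package main

import "fmt"

// firstByte packs a 6-bit opcode together with the D and W bits
// into the first instruction byte: oooooo d w.
func firstByte(opcode6, d, w uint8) uint8 {
	return opcode6<<2 | (d&1)<<1 | (w & 1)
}

func main() {
	const movRegMem = 0b100010 // MOV register/memory to/from register

	// D=0 (REG is the source), W=1 (word operation).
	fmt.Printf("%#x\n", firstByte(movRegMem, 0, 1))
	// D=1 (REG is the destination), W=0 (byte operation).
	fmt.Printf("%#x\n", firstByte(movRegMem, 1, 0))
}
```

With D=0 and W=1 the byte is 0x89; with D=1 and W=0 it is 0x8a.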
In addition to D and W, other instructions may also have the following fields:
The 2nd byte of the instruction usually identifies the operands of the instruction. The MOD field indicates whether one of the two operands is in memory or whether both operands are registers [an 8086 instruction cannot have two memory operands at the same time]:
The REG field identifies the register used as the register operand:
In some instructions, primarily immediate-to-memory type instructions, REG is used to expand the opcode to identify the type of operation.
The encoding of the R/M (register/memory) field depends on how the MOD field is set. If MOD = 11 (register-to-register mode), then R/M identifies the second register operand. If MOD selects one of the memory modes, then R/M indicates how the effective address of the memory operand is calculated:
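As a rough sketch of how MOD, REG and R/M share the second byte (2 + 3 + 3 bits), the helpers below pack and unpack it. The names modRM and splitModRM are made up for this illustration and are not part of the project code.

```go
package main

import "fmt"

// modRM packs the second instruction byte: MOD (2 bits), REG (3 bits), R/M (3 bits).
func modRM(mod, reg, rm uint8) uint8 {
	return (mod&0b11)<<6 | (reg&0b111)<<3 | (rm & 0b111)
}

// splitModRM is the inverse of modRM.
func splitModRM(b uint8) (mod, reg, rm uint8) {
	return b >> 6, (b >> 3) & 0b111, b & 0b111
}

func main() {
	// mov ax,bx encoded with D=0: REG holds the source (bx = 011),
	// R/M holds the destination (ax = 000), MOD = 11 means register mode.
	b := modRM(0b11, 0b011, 0b000)
	fmt.Printf("%#x\n", b)
	fmt.Println(splitModRM(b))
}
```

Together with the first byte 0x89 (opcode 100010, D=0, W=1), this gives the two-byte encoding 89 D8 for mov ax,bx.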
Bytes 3 to 6 of the instruction are optional and usually contain the displacement of the memory operand and/or an immediate value. The MOD field indicates whether the displacement is 1 byte or 2 bytes [1 word] long; in a 2-byte displacement, the second byte is the high-order byte.
The immediate value, if present, follows the displacement; in a 2-byte immediate value, the second byte is also the high-order byte.
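The byte order of the displacement and the immediate value can be illustrated with a tiny helper. This is a sketch; appendWord is a hypothetical name, and the instruction bytes are assembled by hand from the format described above.

```go
package main

import "fmt"

// appendWord appends a 16-bit value low byte first, which is the order
// the 8086 uses for displacements and immediate values.
func appendWord(buf []byte, v uint16) []byte {
	return append(buf, byte(v), byte(v>>8))
}

func main() {
	// mov word ptr [bx+1234h], 5678h
	// 0xC7: 1100011w with w=1 (immediate to register/memory, word)
	// 0x87: MOD=10 (16-bit displacement), REG=000, R/M=111 ([bx])
	ins := []byte{0xC7, 0x87}
	ins = appendWord(ins, 0x1234) // displacement
	ins = appendWord(ins, 0x5678) // immediate value
	fmt.Printf("% x\n", ins)
}
```

The result is c7 87 34 12 78 56, which is also an example of the maximum instruction length of 6 bytes.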
The format of the mov machine instruction
The format of the mov machine instruction is as follows:
You can see that instructions are divided into seven categories:
- register/memory to/from register
- Immediate to register/memory
- Immediate to register [this encoding is shorter than the one above]
- memory to accumulator
- accumulator to memory
- register/memory to segment register
- segment register to register/memory
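For reference, the seven formats can be written down as a small table. The bit patterns are the ones listed for MOV in the 8086 manual; the Go type and variable names are only an illustration and do not appear in the project code.

```go
package main

import "fmt"

// movFormat describes one MOV encoding from the 8086 manual.
type movFormat struct {
	name    string
	pattern string // bit layout of the opcode byte and what follows it
	base    uint8  // opcode byte with all variable bits (d, w, reg) set to 0
}

var movFormats = []movFormat{
	{"register/memory to/from register", "100010dw mod reg r/m", 0x88},
	{"immediate to register/memory", "1100011w mod 000 r/m data [data]", 0xC6},
	{"immediate to register", "1011wreg data [data]", 0xB0},
	{"memory to accumulator", "1010000w addr-lo addr-hi", 0xA0},
	{"accumulator to memory", "1010001w addr-lo addr-hi", 0xA2},
	{"register/memory to segment register", "10001110 mod 0SR r/m", 0x8E},
	{"segment register to register/memory", "10001100 mod 0SR r/m", 0x8C},
}

func main() {
	for _, f := range movFormats {
		fmt.Printf("%-37s %-34s %#x\n", f.name, f.pattern, f.base)
	}
}
```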
The format of the mov assembly instruction
According to the mov machine instruction format, there are 11 types of mov assembly instructions, [the right side is the source operand, the left side is the destination operand]:
- Register to register [mov ax,bx]
- Memory to register [mov cx,[bp+si+1]]
- Register to memory [mov [bx+si],ax]
- Immediate to register [mov ax,123]
- Immediate to memory [mov word ptr [bx],123]
- Memory to accumulator [mov ax,[si+1]]
- Accumulator to memory [mov [si+1],al]
- Register to segment register [mov cs,ax]
- Memory to segment register [mov cs,[1]]
- Segment register to register [mov cx,cs]
- Segment register to memory [mov [di+1],ds]
Translation of mov instruction
Identify operand types
To determine which format a mov instruction in the assembly source uses, we first need to identify the types of its operands.
For example, in mov [bx],123 the source operand is an immediate value and the destination operand is a memory operand.
In mov cx,cs the source operand is a segment register and the destination operand is a [general-purpose] register.
So I defined the following operand types:
type OperandType uint8
const (
Immediate8Operand OperandType = iota
Immediate16Operand
ImmediateLabelOperand
ImmediateOffsetLabelOperand
Reg8Operand
Reg16Operand
SegRegOperand
Mem8Operand
Mem16Operand
MemUnknownSizeOperand
InvalidOperand
)
type Operand struct {
Type OperandType // operand type
Value interface{} // operand value
}
Immediate number
Immediate numbers are divided into 4 categories:
- 8-bit immediate number
- 16-bit immediate number
- Internal label
- Internal label with the offset operator
Look at the following sample program:
assume cs:code,ds:data,ss:stack ; associate cs, ds, ss with the code, data, and stack segments
data segment
dw 0123h, 0456h, 0789h, 0abch, 0defh, 0fedh, 0cbah, 0987h
data ends
stack segment
dw 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
stack ends
code segment
start: mov ax,stack
mov ss,ax
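mov sp,20h ; set the stack top ss:sp to stack:20h
mov ax, data ; load the segment address of the segment named "data" into ax
mov ds,ax ; ds points to the data segment
mov bx,0 ; ds:bx points to the first unit of the data segment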
mov cx,8
s0: push cs:[bx]
add bx,2
loop s0 ; the loop above pushes the 8 words in units 0~15 of the code segment onto the stack in order
mov bx,0
mov cx, 8
s1:pop cs:[bx]
add bx,2
loop s1 ; the loop above pops the 8 words back into units 0~15 of the code segment in order
mov ax,4c00h
int 21h
code ends
end start
In mov ax, data, data is an internal label. Its value is the segment address of the data segment once the program is loaded, and it is treated as an immediate number.
The same goes for mov ax,stack.
Look at the following sample program:
assume cs:code
code segment
mov ax,4c00h
int 21h
start: mov ax,0
s: nop
nop
mov di,offset s
mov si,offset s2
mov ax,cs:[si]
mov cs:[di],ax
s0: jmp short s
s1: mov ax,0
int 21h
mov ax,0
s2: jmp short s1
nop
code ends
end start
mov di,offset s puts the [code segment] offset of the internal label s into the di register. Here s is a label with the offset operator.
[The format mov di,s is not supported yet, where s is the starting address of some data defined in the data segment. In other words, s is a variable]
The same goes for mov si,offset s2.
Use the following structure to represent an immediate number:
type ImmediateOperand struct {
Value uint16 // value
Width uint8 // width of the immediate number, 8 or 16 bits
IsLabel bool // whether it is an internal label
IsLabelOffset bool // whether it is an internal label with offset
Label string // label name
}
Use the isImmediateOperand function to determine whether it is an 8-bit or 16-bit immediate number.
// 12345
// 0ffffh
// 123h
// 1000b
// -123h
//'a'
func isSizedImmediateOperand(s string, bitSize int) (bool, *ImmediateOperand) {
s = strings.TrimSpace(s)
if len(s) == 0 {
return false, nil
}
if bitSize == 8 {
if len(s) == 3 &&
s[0] == '\'' &&
s[2] == '\'' {
return true, &ImmediateOperand{
Value: uint16(s[1]),
Width: 8,
}
}
}
isNegative := false
if s[0] == '-' {
isNegative = true
}
base := 10
if strings.HasSuffix(s, "h") {
base = 16
s = s[:len(s)-1]
} else if strings.HasSuffix(s, "b") {
base = 2
s = s[:len(s)-1]
}
if isNegative {
v, err := strconv.ParseInt(s, base, bitSize)
if err != nil {
return false, nil
}
return true, &ImmediateOperand{
Value: uint16(v),
Width: uint8(bitSize),
}
}
v, err := strconv.ParseUint(s, base, bitSize)
if err != nil {
return false, nil
}
return true, &ImmediateOperand{
Value: uint16(v),
Width: uint8(bitSize),
}
}
func isImmediateOperand(s string) (bool, *ImmediateOperand) {
t, v := isSizedImmediateOperand(s, 8)
if t {
return t, v
}
t, v = isSizedImmediateOperand(s, 16)
if t {
return t, v
}
return false, nil
}
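One pitfall in the parser above: any string of hex digits followed by h is accepted, so the register names ah, bh, ch and dh also parse as the numbers 0ah to 0dh. getOperand avoids this by testing register names first, but isSimpleMemOperand calls isImmediateOperand directly on the fields inside the brackets. MASM-style assemblers require a numeric literal to begin with a decimal digit (0ah rather than ah). A guard along the following lines could be checked first; this is only a sketch, and startsLikeNumber is a hypothetical helper, not part of the project.

```go
package main

import "fmt"

// startsLikeNumber reports whether s can begin a numeric or character
// literal: a decimal digit, a minus sign, or an opening single quote.
func startsLikeNumber(s string) bool {
	if s == "" {
		return false
	}
	c := s[0]
	return c == '-' || c == '\'' || (c >= '0' && c <= '9')
}

func main() {
	for _, s := range []string{"12345", "0ffffh", "-123h", "'a'", "bh", "ah"} {
		fmt.Printf("%-8q %v\n", s, startsLikeNumber(s))
	}
}
```

The four literals are accepted and the two register names are rejected before any digit parsing happens.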
The logic that decides whether an operand is an internal label or an internal label with offset is given below, in getOperand.
Register operand
Registers are divided into 8-bit registers [al, cl, dl, bl, ah, ch, dh, bh], 16-bit registers [ax, cx, dx, bx, sp, bp, si, di], and segment registers [es, cs, ss, ds].
Represented by the following structure:
type RegOperand struct {
Name string // register name
Width uint8 // register width
IsSegReg bool // whether it is a segment register
}
The following function is implemented to determine the register type:
var reg8BitMap = map[string]uint8{
"al": 0, //Byte Multiply, Byte Divide, Byte I/O, Translate, Decimal Arithmetic
"cl": 1, //Variable Shift and Rotate
"dl": 2,
"bl": 3,
"ah": 4, //Byte Multiply, Byte Divide
"ch": 5,
"dh": 6,
"bh": 7,
}
var reg16BitMap = map[string]uint8{
"ax": 0, //Word Multiply, Word Divide, Word I/O
"cx": 1, //String Operations, Loops
"dx": 2, //Word Multiply, Word Divide, Indirect I/O
"bx": 3, //Translate
"sp": 4, //Stack Operations
"bp": 5,
"si": 6, //String Operations
"di": 7, //String Operations
}
var regSegMap = map[string]uint8{
"es": 0,
"cs": 1,
"ss": 2,
"ds": 3,
}
func isReg8Operand(s string) (bool, *RegOperand) {
if _, ok := reg8BitMap[s]; !ok {
return false, nil
}
return true, &RegOperand{
Name: s, Width: 8}
}
func isReg16Operand(s string) (bool, *RegOperand) {
if _, ok := reg16BitMap[s]; !ok {
return false, nil
}
return true, &RegOperand{
Name: s, Width: 16}
}
func isSegRegOperand(s string) (bool, *RegOperand) {
if _, ok := regSegMap[s]; !ok {
return false, nil
}
return true, &RegOperand{
Name: s, Width: 16, IsSegReg: true}
}
Note: The values corresponding to the various maps defined above are not determined arbitrarily, but are defined based on the value of the REG field in the machine instruction.
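To show why the map values matter, here is a sketch of the "immediate to register" format (1011 w reg, data, [data]), where the register number from the map is OR-ed straight into the opcode byte. The project's encodeMovImmediateToReg is not shown in this article, so movImmToReg and the two local maps below are stand-ins written for this illustration.

```go
package main

import "fmt"

// Local copies of the register numbers, matching the REG field values.
var reg8 = map[string]uint8{"al": 0, "cl": 1, "dl": 2, "bl": 3, "ah": 4, "ch": 5, "dh": 6, "bh": 7}
var reg16 = map[string]uint8{"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}

// movImmToReg encodes "mov reg, imm" as 1011 w reg followed by the
// immediate value, low byte first. It returns false for unknown registers.
func movImmToReg(reg string, imm uint16) ([]byte, bool) {
	if code, ok := reg8[reg]; ok {
		return []byte{0b10110000 | code, byte(imm)}, true // w=0
	}
	if code, ok := reg16[reg]; ok {
		return []byte{0b10111000 | code, byte(imm), byte(imm >> 8)}, true // w=1
	}
	return nil, false
}

func main() {
	b, _ := movImmToReg("ax", 123)
	fmt.Printf("mov ax,123 -> % x\n", b)
	b, _ = movImmToReg("cl", 8)
	fmt.Printf("mov cl,8   -> % x\n", b)
}
```

mov ax,123 becomes b8 7b 00 and mov cl,8 becomes b1 08.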
Memory operand
Memory operands are the most complex. They come in 16 forms, as follows:
Counting whether the displacement is 8 or 16 bits, there are 24 forms in total. Moreover, the displacement can also be placed outside the [], for example
[bx+di+123], 123[bx+di] and [bx+di]123 are the same operand. The assembler must support all three.
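One way to think about the three spellings is to move a displacement written outside the brackets inside them, so that only one form has to be analyzed afterwards. The sketch below does just that; normalizeMem is a hypothetical helper and not how the project code is structured.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeMem rewrites "123[bx+di]" and "[bx+di]123" as "[bx+di+123]".
// It returns false when the brackets are missing or empty, or when text
// appears on both sides of them.
func normalizeMem(s string) (string, bool) {
	s = strings.ReplaceAll(s, " ", "")
	l := strings.IndexByte(s, '[')
	r := strings.IndexByte(s, ']')
	if l < 0 || r < l {
		return "", false
	}
	if l > 0 && r < len(s)-1 {
		return "", false // displacement on both sides is not supported
	}
	inside := s[l+1 : r]
	outside := s[:l] + s[r+1:]
	if inside == "" {
		return "", false
	}
	if outside == "" {
		return "[" + inside + "]", true
	}
	return "[" + inside + "+" + outside + "]", true
}

func main() {
	for _, s := range []string{"[bx+di+123]", "123[bx+di]", "[bx + di]123"} {
		out, ok := normalizeMem(s)
		fmt.Println(out, ok)
	}
}
```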
Memory operands are represented by the following structure:
type MemOperand struct {
IsSingleIndex bool // whether one index register is used, e.g. [bx], [bx+123]
SingleIndexReg string // the index register, e.g. "bx"
IsDoubleIndex bool // whether two index registers are used, e.g. [bx+si], [bx+di]123
DoubleIndexFirstReg string // first index register, e.g. "bx", "bp"
DoubleIndexSecondReg string // second index register, e.g. "si", "di"
HasDisplacement bool // whether there is a displacement
DisplacementValue uint16 // displacement value
DisplacementWidth uint8 // displacement width
OperandWidth uint8 // operand width, e.g. 8 for byte ptr [bx]
HasSegmentPrefix bool // whether there is a segment prefix
SegmentPrefix string // segment prefix name
}
So late one night, I wrote the longest string processing function in the project code, isSimpleMemOperand, which decides whether a string is a memory operand that has no segment prefix and no word ptr or byte ptr modifier.
func isSimpleMemOperand(s string) (bool, *MemOperand) {
s = strings.TrimSpace(s)
if len(s) == 0 {
return false, nil
}
// a memory operand must contain [ ]
idxL := strings.IndexByte(s, '[')
idxR := strings.IndexByte(s, ']')
if idxL < 0 || idxR < 0 {
return false, nil
}
if idxL >= idxR {
return false, nil
}
// the displacement may be placed to the left of []
hasLeftImmediate := false
var LeftImmediate *ImmediateOperand
var t bool
if idxL != 0 {
if idxR != len(s)-1 {
return false, nil
}
t, LeftImmediate = isImmediateOperand(s[:idxL])
if !t {
return false, nil
}
hasLeftImmediate = true
}
// the displacement may be placed to the right of []
hasRightImmediate := false
var RightImmediate *ImmediateOperand
if idxR != len(s)-1 {
t, RightImmediate = isImmediateOperand(s[idxR+1:])
if !t {
return false, nil
}
hasRightImmediate = true
}
var Immediate *ImmediateOperand
if hasLeftImmediate {
Immediate = LeftImmediate
}
if hasRightImmediate {
Immediate = RightImmediate
}
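fieldFunc := func(r rune) bool {
return r == ' ' || r == '+'
}
// split out the index register names inside []
fields := strings.FieldsFunc(s[idxL+1:idxR], fieldFunc)
// an empty [] yields no fields; reject it here so fields[0] is never indexed out of range below
if len(fields) == 0 || len(fields) > 3 {
return false, nil
}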
if len(fields) == 1 {
// displacement outside the brackets
// [SI]d8
// [DI]d8
// [BP]d8
// [BX]d8
// [SI]d16
// [DI]d16
// [BP]d16
// [BX]d16
if Immediate != nil {
if fields[0] != "si" &&
fields[0] != "di" &&
fields[0] != "bx" &&
fields[0] != "bp" {
return false, nil
}
return true, &MemOperand{
IsSingleIndex: true,
SingleIndexReg: fields[0],
HasDisplacement: true,
DisplacementValue: Immediate.Value,
DisplacementWidth: Immediate.Width,
}
}
// no displacement outside the brackets
//[SI]
//[DI]
//[d16]
//[BX]
if fields[0] != "si" &&
fields[0] != "di" &&
fields[0] != "bx" {
t, Immediate = isImmediateOperand(fields[0])
if !t {
return false, nil
}
return true, &MemOperand{
HasDisplacement: true,
DisplacementValue: Immediate.Value,
DisplacementWidth: Immediate.Width,
}
}
return true, &MemOperand{
IsSingleIndex: true,
SingleIndexReg: fields[0],
}
}
if len(fields) == 2 {
// displacement outside the brackets
// [BX + SI]d8
// [BX + DI]d8
// [BP + SI]d8
// [BP + DI]d8
// [BX + SI]d16
// [BX + DI]d16
// [BP + SI]d16
// [BP + DI]d16
if Immediate != nil {
switch fields[0] {
case "bx", "bp":
if fields[1] != "si" &&
fields[1] != "di" {
return false, nil
}
case "si", "di":
if fields[1] != "bx" &&
fields[1] != "bp" {
return false, nil
}
default:
return false, nil
}
if fields[0] == "si" || fields[0] == "di" {
fields[0], fields[1] = fields[1], fields[0]
}
return true, &MemOperand{
IsDoubleIndex: true,
DoubleIndexFirstReg: fields[0],
DoubleIndexSecondReg: fields[1],
HasDisplacement: true,
DisplacementValue: Immediate.Value,
DisplacementWidth: Immediate.Width,
}
}
// no displacement outside the brackets
t, Immediate = isImmediateOperand(fields[0])
if t {
fields[0], fields[1] = fields[1], fields[0]
} else {
_, Immediate = isImmediateOperand(fields[1])
}
// [BX + SI]
// [BX + DI]
// [BP + SI]
// [BP + DI]
// [SI + d8]
// [DI + d8]
// [BP + d8]
// [BX + d8]
// [SI + d16]
// [DI + d16]
// [BP + d16]
// [BX + d16]
switch fields[0] {
case "bx", "bp":
if fields[1] != "si" &&
fields[1] != "di" &&
Immediate == nil {
return false, nil
}
case "si", "di":
if fields[1] != "bx" &&
fields[1] != "bp" &&
Immediate == nil {
return false, nil
}
default:
return false, nil
}
if Immediate != nil {
return true, &MemOperand{
IsSingleIndex: true,
SingleIndexReg: fields[0],
HasDisplacement: true,
DisplacementValue: Immediate.Value,
DisplacementWidth: Immediate.Width,
}
}
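// put the base register first, as the branch with an outside displacement does;
// otherwise [si+bx] would not match any pair in getMODAndRM
if fields[0] == "si" || fields[0] == "di" {
fields[0], fields[1] = fields[1], fields[0]
}
return true, &MemOperand{
IsDoubleIndex: true,
DoubleIndexFirstReg: fields[0],
DoubleIndexSecondReg: fields[1],
}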
}
if len(fields) == 3 {
if Immediate != nil {
return false, nil
}
t, Immediate = isImmediateOperand(fields[0])
if t {
fields[0], fields[2] = fields[2], fields[0]
} else {
t, Immediate = isImmediateOperand(fields[1])
if t {
fields[1], fields[2] = fields[2], fields[1]
} else {
t, Immediate = isImmediateOperand(fields[2])
if !t {
return false, nil
}
}
}
if fields[0] != "bx" &&
fields[0] != "bp" {
fields[0], fields[1] = fields[1], fields[0]
}
// [BX + SI + d8]
// [BX + DI + d8]
// [BP + SI + d8]
// [BP + DI + d8]
// [BX + SI + d16]
// [BX + DI + d16]
// [BP + SI + d16]
// [BP + DI + d16]
switch fields[0] {
case "bx", "bp":
if fields[1] != "si" &&
fields[1] != "di" {
return false, nil
}
case "si", "di":
if fields[1] != "bx" &&
fields[1] != "bp" {
return false, nil
}
default:
return false, nil
}
}
return true, &MemOperand{
IsDoubleIndex: true,
DoubleIndexFirstReg: fields[0],
DoubleIndexSecondReg: fields[1],
HasDisplacement: true,
DisplacementValue: Immediate.Value,
DisplacementWidth: Immediate.Width,
}
}
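The register-pair checks in isSimpleMemOperand repeat the same switch several times and depend on the base register being swapped into first place. As an alternative sketch (not the author's code; rmOfPair and canonicalPair are names invented here), the four legal pairs can live in one table that also yields the R/M value:

```go
package main

import "fmt"

// rmOfPair maps each legal "base+index" pair to its R/M field value.
var rmOfPair = map[string]uint8{
	"bx+si": 0b000,
	"bx+di": 0b001,
	"bp+si": 0b010,
	"bp+di": 0b011,
}

// canonicalPair accepts two register names in either order and returns
// them base-first together with the R/M value. ok is false for any
// combination that is not a legal base+index pair.
func canonicalPair(a, b string) (base, index string, rm uint8, ok bool) {
	if rm, ok = rmOfPair[a+"+"+b]; ok {
		return a, b, rm, true
	}
	if rm, ok = rmOfPair[b+"+"+a]; ok {
		return b, a, rm, true
	}
	return "", "", 0, false
}

func main() {
	fmt.Println(canonicalPair("si", "bx")) // written index-first in the source
	fmt.Println(canonicalPair("bp", "di"))
	fmt.Println(canonicalPair("bx", "bp")) // two base registers: not encodable
}
```

A lookup like this validates and normalizes in one step, at the cost of building a short string key.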
isMemOperand then handles the full syntax, including the segment prefix and the word ptr or byte ptr modifier:
func isMemOperand(s string) (bool, *MemOperand) {
s = strings.TrimSpace(s)
idxCol := strings.IndexByte(s, ':')
if idxCol == 0 {
return false, nil
}
var t bool
var memOperand *MemOperand
var operandWidth uint8
var segPrefix string
// word ptr ds:[0]
// word ptr ds:[bx+2]
// ds:[bx+si]
if idxCol > 0 {
// have segment override prefix
idxSpace := strings.LastIndexByte(s[:idxCol], ' ')
if idxSpace == 0 {
return false, nil
}
if idxSpace > 0 {
// have word ptr or byte ptr
fields := strings.Fields(s[:idxSpace])
// expect exactly "word ptr" or "byte ptr" before the segment prefix
if len(fields) != 2 || fields[1] != "ptr" {
return false, nil
}
if fields[0] == "word" {
operandWidth = 16
} else if fields[0] == "byte" {
operandWidth = 8
} else {
return false, nil
}
}
segPrefix = s[idxSpace+1 : idxCol]
if !isSegReg(segPrefix) {
return false, nil
}
t, memOperand = isSimpleMemOperand(s[idxCol+1:])
if !t {
return false, nil
}
fmt.Printf("has seg prefix :%s\n", segPrefix)
memOperand.OperandWidth = operandWidth
memOperand.HasSegmentPrefix = true
memOperand.SegmentPrefix = segPrefix
return true, memOperand
}
//no segment override prefix
// word ptr [bx+2]
//[bx+2]
fields := strings.Fields(s)
if len(fields) == 0 {
return false, nil
}
if fields[0] == "word" ||
fields[0] == "byte" {
if len(fields) < 2 || fields[1] != "ptr" {
return false, nil
}
idxP := strings.IndexByte(s, 'p')
if idxP <= 0 {
return false, nil
}
if fields[0] == "word" {
operandWidth = 16
} else {
operandWidth = 8
}
t, memOperand = isSimpleMemOperand(s[idxP+3:])
if !t {
return false, nil
}
memOperand.OperandWidth = operandWidth
return t, memOperand
}
return isSimpleMemOperand(s)
}
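The prefix handling can also be viewed as a separate step that peels off the size modifier and the segment override and leaves the bracket expression for isSimpleMemOperand. The following is a simplified sketch under that assumption; splitMemPrefix is a hypothetical name, and it only understands the lowercase spellings used in this article.

```go
package main

import (
	"fmt"
	"strings"
)

// splitMemPrefix splits "word ptr ds:[bx+2]" into width 16, segment "ds"
// and the remaining text "[bx+2]". width is 0 when no size is given and
// seg is empty when there is no segment override.
func splitMemPrefix(s string) (width uint8, seg, rest string, ok bool) {
	fields := strings.Fields(s)
	if len(fields) >= 2 && fields[1] == "ptr" {
		switch fields[0] {
		case "word":
			width = 16
		case "byte":
			width = 8
		default:
			return 0, "", "", false
		}
		// Drop everything up to and including "ptr".
		s = s[strings.Index(s, "ptr")+len("ptr"):]
	}
	if i := strings.IndexByte(s, ':'); i >= 0 {
		seg = strings.TrimSpace(s[:i])
		switch seg {
		case "es", "cs", "ss", "ds":
		default:
			return 0, "", "", false
		}
		s = s[i+1:]
	}
	rest = strings.TrimSpace(s)
	return width, seg, rest, rest != ""
}

func main() {
	fmt.Println(splitMemPrefix("word ptr ds:[bx+2]"))
	fmt.Println(splitMemPrefix("byte ptr [si]"))
	fmt.Println(splitMemPrefix("[bx+2]"))
}
```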
Parsing operands
After the above operand judgment is implemented, a getOperand function can be implemented to identify all types of operands:
func getOperand(operand string) Operand {
// is it a register operand?
if t, v := isReg8Operand(operand); t {
return Operand{
Reg8Operand, v}
}
if t, v := isReg16Operand(operand); t {
return Operand{
Reg16Operand, v}
}
if t, v := isSegRegOperand(operand); t {
return Operand{
SegRegOperand, v}
}
// is it an immediate number?
if t, v := isImmediateOperand(operand); t {
if v.Width == 8 {
return Operand{
Immediate8Operand, v}
} else {
return Operand{
Immediate16Operand, v}
}
}
// is it a memory operand?
if t, v := isMemOperand(operand); t {
if v.OperandWidth == 16 {
return Operand{
Mem16Operand, v}
} else if v.OperandWidth == 8 {
return Operand{
Mem8Operand, v}
} else {
return Operand{
MemUnknownSizeOperand, v}
}
}
// is it an internal label with offset?
fields := strings.Fields(operand)
if len(fields) == 2 && fields[0] == "offset" {
return Operand{
ImmediateOffsetLabelOperand,
&ImmediateOperand{
IsLabelOffset: true,
Label: fields[1],
},
}
}
// none of the operand types above, so it must be an internal label
if len(fields) == 1 {
return Operand{
ImmediateLabelOperand,
&ImmediateOperand{
IsLabel: true,
Label: fields[0],
},
}
}
return Operand{
InvalidOperand, nil}
}
Implementation of checkMov
checkMov checks whether the mov instruction is well-formed. If it is, it saves the operands in ctx, which is then passed to the encodeMov function:
func checkMov(stmt []string) (bool, context.Context) {
if len(stmt) != 3 {
log.Fatal("invalid \"mov\" syntax")
}
dstOperand := getOperand(stmt[1])
dstOperandType := dstOperand.Type
srcOperand := getOperand(stmt[2])
srcOperandType := srcOperand.Type
// semantic checks
if dstOperandType == InvalidOperand ||
srcOperandType == InvalidOperand {
log.Fatal("invalid mov 0: invalid operand type")
}
}
if dstOperandType <= ImmediateOffsetLabelOperand {
log.Fatal("invalid mov 1: the destination operand cannot be an immediate value")
} else if dstOperandType == SegRegOperand {
if srcOperandType <= ImmediateOffsetLabelOperand {
log.Fatal("invalid mov 2: an immediate value cannot be moved to a segment register")
}
if srcOperandType == SegRegOperand {
log.Fatal("invalid mov 3: a segment register cannot be moved to a segment register")
}
} else if dstOperandType >= Mem8Operand {
if srcOperandType >= Mem8Operand {
log.Fatal("invalid mov 4: memory cannot be moved to memory")
}
}
// check operand widths
if dstOperandType == Reg8Operand {
if srcOperandType == Immediate16Operand ||
srcOperandType == Reg16Operand ||
srcOperandType == SegRegOperand ||
srcOperandType == Mem16Operand {
log.Fatal("invalid mov 5: the destination register is 8 bits but the source operand is 16 bits")
}
} else if dstOperandType == Reg16Operand ||
dstOperandType == SegRegOperand {
if srcOperandType == Reg8Operand ||
srcOperandType == Mem8Operand {
log.Fatal("invalid mov 6: the destination register is 16 bits but the source operand is 8 bits")
}
} else if dstOperandType == Mem8Operand {
if srcOperandType == Immediate16Operand ||
srcOperandType == Reg16Operand ||
srcOperandType == SegRegOperand {
log.Fatal("invalid mov 7: the destination memory operand is 8 bits but the source operand is 16 bits")
}
} else if dstOperandType == Mem16Operand {
if srcOperandType == Reg8Operand {
log.Fatal("invalid mov 8: the destination memory operand is 16 bits but the source register is 8 bits")
}
} else if dstOperandType == MemUnknownSizeOperand {
if srcOperandType <= ImmediateOffsetLabelOperand {
log.Fatal("invalid mov 9: the destination memory operand has unknown width and the source is an immediate value; specify the memory operand width")
}
}
var k encodeCtxKey
ctx := context.Background()
k = encodeCtxKey("dst")
ctx = context.WithValue(ctx, k, dstOperand)
k = encodeCtxKey("src")
ctx = context.WithValue(ctx, k, srcOperand)
return true, ctx
}
Its main purpose is to check that the operands of the mov instruction are legal. For example:
- If the destination register is 8 bits, the source operand can only be 8 bits.
- The destination operand cannot be an immediate value.
- Immediate values cannot be moved to segment registers.
- etc.
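The width rules (checks 5 to 8 in checkMov) boil down to "known widths must match, except that an 8-bit immediate fits a 16-bit destination". A possible way to express that in one place is sketched below. widthOf and widthsCompatible are hypothetical helpers; the OperandType constants are copied from the definition earlier so that the example stands alone.

```go
package main

import "fmt"

type OperandType uint8

const (
	Immediate8Operand OperandType = iota
	Immediate16Operand
	ImmediateLabelOperand
	ImmediateOffsetLabelOperand
	Reg8Operand
	Reg16Operand
	SegRegOperand
	Mem8Operand
	Mem16Operand
	MemUnknownSizeOperand
	InvalidOperand
)

// widthOf returns 8 or 16 when the operand type fixes a width, and 0
// when it does not (labels, memory without byte ptr / word ptr).
func widthOf(t OperandType) uint8 {
	switch t {
	case Immediate8Operand, Reg8Operand, Mem8Operand:
		return 8
	case Immediate16Operand, Reg16Operand, SegRegOperand, Mem16Operand:
		return 16
	}
	return 0
}

// widthsCompatible reports whether src can be moved into dst as far as
// operand width is concerned.
func widthsCompatible(dst, src OperandType) bool {
	if src == Immediate8Operand {
		return true // a small immediate can be widened to 16 bits
	}
	d, s := widthOf(dst), widthOf(src)
	return d == 0 || s == 0 || d == s
}

func main() {
	fmt.Println(widthsCompatible(Reg8Operand, Reg16Operand))       // mov al,bx
	fmt.Println(widthsCompatible(Reg16Operand, Immediate8Operand)) // mov ax,1
	fmt.Println(widthsCompatible(Mem16Operand, Reg8Operand))       // mov word ptr [bx],al
}
```

This covers only the width rules; the other checks (immediate destination, segment-to-segment, memory-to-memory) would still be needed.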
Implementation of encodeMov
encodeMov translates the assembly instruction into a machine instruction:
func encodeMov(ctx context.Context) []byte {
var instruction []byte
dstOperand := ctx.Value(encodeCtxKey("dst")).(Operand)
dstOperandType := dstOperand.Type
srcOperand := ctx.Value(encodeCtxKey("src")).(Operand)
srcOperandType := srcOperand.Type
switch srcOperandType {
case Immediate8Operand, Immediate16Operand,
ImmediateLabelOperand, ImmediateOffsetLabelOperand:
switch dstOperandType {
case Reg8Operand, Reg16Operand:
src := srcOperand.Value.(*ImmediateOperand)
dst := dstOperand.Value.(*RegOperand)
fmt.Println("1 mov immediate to reg")
instruction = encodeMovImmediateToReg(src, dst)
case Mem8Operand, Mem16Operand, MemUnknownSizeOperand:
src := srcOperand.Value.(*ImmediateOperand)
dst := dstOperand.Value.(*MemOperand)
fmt.Println("2 mov immediate to memory")
instruction = encodeMovImmediateToMemory(src, dst)
}
case Reg8Operand, Reg16Operand:
switch dstOperandType {
case Reg8Operand, Reg16Operand:
src := srcOperand.Value.(*RegOperand)
dst := dstOperand.Value.(*RegOperand)
fmt.Println("3 mov reg to reg")
instruction = encodeMovRegToReg(src, dst)
case SegRegOperand:
src := srcOperand.Value.(*RegOperand)
dst := dstOperand.Value.(*RegOperand)
fmt.Println("4 mov reg to seg")
instruction = encodeMovRegToSeg(src, dst)
case Mem8Operand, Mem16Operand, MemUnknownSizeOperand:
src := srcOperand.Value.(*RegOperand)
dst := dstOperand.Value.(*MemOperand)
if src.Name == Accumulator16 || src.Name == Accumulator8 {
if !dst.IsSingleIndex &&
!dst.IsDoubleIndex &&
dst.HasDisplacement {
fmt.Println("5 mov accumulator to memory")
instruction = encodeMovAccumulatorToMemory(src, dst)
} else {
fmt.Println("5.1 mov reg to memory")
fmt.Println(src)
instruction = encodeMovRegToMemory(src, dst)
}
} else {
fmt.Println("6 mov reg to memory")
instruction = encodeMovRegToMemory(src, dst)
}
}
case SegRegOperand:
switch dstOperandType {
case Reg16Operand:
src := srcOperand.Value.(*RegOperand)
dst := dstOperand.Value.(*RegOperand)
fmt.Println("7 mov seg to reg")
instruction = encodeMovSegToReg(src, dst)
case Mem16Operand, MemUnknownSizeOperand:
src := srcOperand.Value.(*RegOperand)
dst := dstOperand.Value.(*MemOperand)
fmt.Println("8 mov seg to memory")
instruction = encodeMovSegToMemory(src, dst)
}
case Mem8Operand, Mem16Operand, MemUnknownSizeOperand:
switch dstOperandType {
case Reg8Operand, Reg16Operand:
src := srcOperand.Value.(*MemOperand)
dst := dstOperand.Value.(*RegOperand)
var ok bool
if dst.Name == Accumulator16 || dst.Name == Accumulator8 {
if !src.IsSingleIndex &&
!src.IsDoubleIndex &&
src.HasDisplacement {
fmt.Println("9 mov memory to accumulator")
instruction = encodeMovMemoryToAccumulator(src, dst)
ok = true
}
}
if !ok {
fmt.Println("10 mov memory to reg")
instruction = encodeMovMemoryToReg(src, dst)
}
case SegRegOperand:
src := srcOperand.Value.(*MemOperand)
dst := dstOperand.Value.(*RegOperand)
fmt.Println("11 mov memory to seg")
instruction = encodeMovMemoryToSeg(src, dst)
}
}
return instruction
}
It calls the corresponding translation function based on the type of the operand. For example, to move an immediate value to memory, it calls the encodeMovImmediateToMemory function:
func encodeMovImmediateToMemory(src *ImmediateOperand, dst *MemOperand) []byte {
/*1100011w, mod 000 rm, [disp-lo] [disp-hi] data [data]*/
var instruction []byte
var w uint8
if dst.OperandWidth == 8 {
w = 0
} else {
w = 1
}
if dst.HasSegmentPrefix {
instruction = append(instruction, encodeSegPrefix(dst.SegmentPrefix)...)
}
instruction = append(instruction, 0b11000110|w)
instruction = append(instruction, encodeMemoryOperand(dst)...)
// the immediate may be a label or an offset label
if src.IsLabel || src.IsLabelOffset {
putLabelEncodeInfo(src.Label, uint8(len(instruction)), dst.OperandWidth, src.IsLabelOffset)
}
instruction = append(instruction, byte(src.Value))
if w == 1 {
instruction = append(instruction, byte((src.Value>>8)&0xff))
}
return instruction
}
This function follows the "immediate to register/memory" format from the manual: 1100011w, mod 000 r/m, [disp-lo], [disp-hi], data, [data if w = 1].
The encodeMemoryOperand function is called to encode the memory operand:
func getMODAndRM(operand *MemOperand) (MOD uint8, RM uint8) {
if operand.IsDoubleIndex {
MOD = 0b00
if operand.DoubleIndexFirstReg == "bx" &&
operand.DoubleIndexSecondReg == "si" {
RM = 0b000
} else if operand.DoubleIndexFirstReg == "bx" &&
operand.DoubleIndexSecondReg == "di" {
RM = 0b001
} else if operand.DoubleIndexFirstReg == "bp" &&
operand.DoubleIndexSecondReg == "si" {
RM = 0b010
} else if operand.DoubleIndexFirstReg == "bp" &&
operand.DoubleIndexSecondReg == "di" {
RM = 0b011
}
if operand.HasDisplacement {
if operand.DisplacementWidth == 8 {
MOD = 0b01
} else {
MOD = 0b10
}
}
return
}
if operand.IsSingleIndex {
MOD = 0b00
if operand.SingleIndexReg == "si" {
RM = 0b100
} else if operand.SingleIndexReg == "di" {
RM = 0b101
} else if operand.SingleIndexReg == "bx" {
RM = 0b111
} else if operand.SingleIndexReg == "bp" {
RM = 0b110
}
if operand.HasDisplacement {
if operand.DisplacementWidth == 8 {
MOD = 0b01
} else {
MOD = 0b10
}
} else {
if RM == 0b110 {
log.Fatal("getMODAndRM error 1!")
}
}
return
}
if operand.HasDisplacement {
RM = 0b110
MOD = 0b00
} else {
log.Fatal("getMODAndRM error 2!")
}
return
}
func encodeMemoryOperand(operand *MemOperand) []byte {
/* mod 000 r/m, (DISP-LO), (DISP-HI) */
var instruction []byte
mod, rm := getMODAndRM(operand)
instruction = append(instruction, rm|uint8(mod<<6))
if operand.HasDisplacement {
instruction = append(instruction, byte(operand.DisplacementValue))
if operand.DisplacementWidth == 16 {
instruction = append(instruction, byte((operand.DisplacementValue>>8)&0xff))
} else {
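/* mov bx, [1]: a direct address (MOD=00, R/M=110) always carries a 2-byte DISP, so the high byte is padded with 0; otherwise a decoder could not know the DISP length */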
if mod == 0b00 && rm == 0b110 {
instruction = append(instruction, 0)
}
}
}
return instruction
}
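One detail worth checking when choosing between MOD=01 and MOD=10: the 8086 sign-extends a 1-byte displacement to 16 bits before adding it. As far as the code shown here goes, isImmediateOperand parses 200 as an unsigned 8-bit value, so [bx+200] would get MOD=01 with the byte C8, which the CPU reads as bx-56. A check like the one below could decide whether a displacement really fits in one byte; fitsDisp8 is a hypothetical helper and this is only a sketch.

```go
package main

import "fmt"

// fitsDisp8 reports whether a 16-bit displacement keeps its value when
// it is stored as a single byte and sign-extended back to 16 bits.
func fitsDisp8(disp uint16) bool {
	return uint16(int16(int8(disp))) == disp
}

func main() {
	for _, d := range []uint16{1, 127, 128, 200, 0xFF80, 0xFFFF} {
		fmt.Printf("%#06x fits in disp8: %v\n", d, fitsDisp8(d))
	}
}
```

Values from 0 to 127 and from 0xFF80 to 0xFFFF (that is, -128 to -1) fit; 128 to 255 need the 2-byte form.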
Key points
There are two more points worth noting:
- Memory operands may have a segment prefix, such as "es:[bx]". For a memory operand with a segment prefix, encodeMovImmediateToMemory calls encodeSegPrefix to encode the prefix.
func encodeSegPrefix(segPrefix string) []byte {
var instruction []byte
var b byte
switch segPrefix {
case "es":
b = 0b00100110
case "cs":
b = 0b00101110
case "ss":
b = 0b00110110
case "ds":
b = 0b00111110
default:
log.Fatalf("invalid seg prefix \"%s\"\n", segPrefix)
}
instruction = append(instruction, b)
return instruction
}
- Internal labels and labels with offset are handled.
// the immediate may be a label or an offset label
if src.IsLabel || src.IsLabelOffset {
putLabelEncodeInfo(src.Label, uint8(len(instruction)), dst.OperandWidth, src.IsLabelOffset)
}
The implementation of putLabelEncodeInfo is as follows:
var labelEncodeInfos []LabelEncodeInfo
type LabelEncodeInfo struct {
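Name string // label name
Offset uint32 // offset of the label in the program
Width uint8 // width of the label value
IsOffsetLabel bool // whether it is an offset label
IsJmpLable bool // whether it is a label in a jmp instruction
JmpInc uint16 // offset in the code segment of the instruction following the jmp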
}
func putLabelEncodeInfo(name string, offsetInInstruction uint8, width uint8, isOffsetLabel bool) {
labelEncodeInfos = append(labelEncodeInfos,
LabelEncodeInfo{
Name: name,
Offset: progOffset + uint32(offsetInInstruction),
Width: width,
IsOffsetLabel: isOffsetLabel,
})
}
It records the internal label information for later processing [see the follow-up article].
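How the recorded entries are consumed is left to the follow-up article. Purely as an assumption about what such a step might look like for offset labels, the sketch below overwrites the placeholder bytes once every label's offset is known. The fixup type is a trimmed-down, hypothetical counterpart of LabelEncodeInfo, and patchLabels is not a function from the project. Segment labels such as data or stack get their value only at load time and are not covered here.

```go
package main

import "fmt"

// fixup is a hypothetical, reduced version of LabelEncodeInfo.
type fixup struct {
	Name   string
	Offset uint32 // where the placeholder starts in the output bytes
	Width  uint8  // 8 or 16
}

// patchLabels writes each label's resolved value over its placeholder,
// low byte first.
func patchLabels(code []byte, fixups []fixup, values map[string]uint16) error {
	for _, f := range fixups {
		v, ok := values[f.Name]
		if !ok {
			return fmt.Errorf("undefined label %q", f.Name)
		}
		n := uint32(1)
		if f.Width == 16 {
			n = 2
		}
		if f.Offset+n > uint32(len(code)) {
			return fmt.Errorf("fixup for %q is out of range", f.Name)
		}
		code[f.Offset] = byte(v)
		if n == 2 {
			code[f.Offset+1] = byte(v >> 8)
		}
	}
	return nil
}

func main() {
	// mov di,offset s emitted as BF 00 00 with a 16-bit placeholder at offset 1.
	code := []byte{0xBF, 0x00, 0x00}
	err := patchLabels(code, []fixup{{"s", 1, 16}}, map[string]uint16{"s": 0x0008})
	fmt.Printf("% x %v\n", code, err)
}
```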