Implementing an 8086 assembler (2) - Translation of assembly instructions [mov instruction]

Preface

Because the CPU can only recognize and execute machine instructions, assembly instructions need to be translated [converted, encoded] into machine instructions.

The official manual "Intel_8086_Family_Users_Manual" contains the format of machine instructions corresponding to all assembly instructions.

This document introduces how the translation of mov instructions is implemented.

General format of machine instructions

8086 machine instructions vary in length from 1 to 6 bytes. Most instructions have the following format:

[Figure: general layout of an 8086 machine instruction]

The first 6 bits of a multi-byte instruction usually contain an opcode that identifies the basic instruction type, such as ADD or XOR.

The next bit is the D field, which generally specifies the "direction" of the operation: 1 = the REG field identifies the destination operand, 0 = the REG field identifies the source operand.

The W field distinguishes between byte and word operations: 0 = byte operation, 1 = word operation.

In addition to D and W, other instructions may also have the following fields:

[Figure: table of the other single-bit fields]

The second byte of the instruction usually identifies the instruction's operands. The MOD field indicates whether one of the two operands is in memory or whether both operands are registers [an instruction encoded this way can never have two memory operands]:

  MOD   Meaning
  00    Memory mode, no displacement (except R/M = 110, which takes a 16-bit direct address)
  01    Memory mode, 8-bit displacement follows
  10    Memory mode, 16-bit displacement follows
  11    Register mode, no displacement

The REG field identifies which register the operand uses [register operand]:

  REG   W = 0   W = 1
  000   AL      AX
  001   CL      CX
  010   DL      DX
  011   BL      BX
  100   AH      SP
  101   CH      BP
  110   DH      SI
  111   BH      DI

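In some instructions, primarily immediate-to-memory instructions, REG serves as an opcode extension that identifies the operation.

The encoding of the R/M (register/memory) field depends on how the MOD field is set. If MOD = 11 (register-to-register mode), R/M identifies the second register operand. If MOD selects a memory mode, R/M indicates how the effective address of the memory operand is calculated:

  R/M   MOD = 00         MOD = 01         MOD = 10
  000   [BX + SI]        [BX + SI + d8]   [BX + SI + d16]
  001   [BX + DI]        [BX + DI + d8]   [BX + DI + d16]
  010   [BP + SI]        [BP + SI + d8]   [BP + SI + d16]
  011   [BP + DI]        [BP + DI + d8]   [BP + DI + d16]
  100   [SI]             [SI + d8]        [SI + d16]
  101   [DI]             [DI + d8]        [DI + d16]
  110   [d16] (direct)   [BP + d8]        [BP + d16]
  111   [BX]             [BX + d8]        [BX + d16]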

Bytes 3 to 6 of the instruction are optional and usually contain the displacement of the memory operand and/or an immediate value. The MOD field indicates whether the displacement is 1 byte or 2 bytes [1 word] long. In a 2-byte displacement the low-order byte comes first and the second byte is the high-order byte.

The immediate value after the displacement is also optional. If it is a word, its second byte is likewise the high-order byte.
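The field layout described above can be sketched in a few lines of Go. This is only an illustration: the helper names opcodeByte and modRMByte are made up for this example and are not part of the project code. It packs the first two bytes of mov ax,bx, which can be encoded with either value of D.

```go
package main

import "fmt"

// opcodeByte packs a 6-bit opcode with the D and W bits: oooooodw.
// (Hypothetical helper, not from the project.)
func opcodeByte(opcode6, d, w uint8) byte {
	return opcode6<<2 | (d&1)<<1 | (w & 1)
}

// modRMByte packs MOD (2 bits), REG (3 bits) and R/M (3 bits).
func modRMByte(mod, reg, rm uint8) byte {
	return (mod&0b11)<<6 | (reg&0b111)<<3 | (rm & 0b111)
}

func main() {
	const movRegRM = 0b100010 // opcode of "register/memory to/from register"
	const ax, bx = 0b000, 0b011

	// D=1: REG names the destination (ax), R/M names the source (bx).
	fmt.Printf("% x\n", []byte{opcodeByte(movRegRM, 1, 1), modRMByte(0b11, ax, bx)}) // prints 8b c3
	// D=0: REG names the source (bx), R/M names the destination (ax).
	fmt.Printf("% x\n", []byte{opcodeByte(movRegRM, 0, 1), modRMByte(0b11, bx, ax)}) // prints 89 d8
}
```

Both byte sequences are valid encodings of the same instruction; an assembler simply has to pick one.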

The format of the mov machine instruction

The format of the mov machine instruction is as follows:

[Figure: table of the mov machine instruction formats from the manual]

You can see that instructions are divided into seven categories:

  1. register/memory to/from register
  2. immediate to register/memory
  3. immediate to register [this encoding is shorter than the previous one]
  4. memory to accumulator
  5. accumulator to memory
  6. register/memory to segment register
  7. segment register to register/memory
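To see why format 3 is shorter than format 2, here is a small sketch that encodes a 16-bit immediate move into a register both ways. The function names are invented for this example, and only the word (w = 1) case is handled.

```go
package main

import "fmt"

// immToRegLong uses format 2 (1100011w, mod 000 r/m, data) with MOD=11,
// so R/M names the destination register. Word operands only.
func immToRegLong(reg uint8, imm uint16) []byte {
	return []byte{0b11000111, 0b11000000 | reg, byte(imm), byte(imm >> 8)}
}

// immToRegShort uses format 3 (1011 w reg, data): the register number
// lives in the opcode byte, so there is no MOD/REG/R-M byte.
func immToRegShort(reg uint8, imm uint16) []byte {
	return []byte{0b10111000 | reg, byte(imm), byte(imm >> 8)}
}

func main() {
	const cx = 0b001
	fmt.Printf("% x\n", immToRegLong(cx, 0x1234))  // prints c7 c1 34 12
	fmt.Printf("% x\n", immToRegShort(cx, 0x1234)) // prints b9 34 12
}
```

Both sequences load 1234h into cx; format 3 saves one byte.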

The format of the mov assembly instruction

Based on the mov machine instruction formats, there are 11 kinds of mov assembly instruction [the right-hand operand is the source, the left-hand operand is the destination]:

  1. Register to register [mov ax,bx]
  2. Memory to register [mov cx,[bp+si+1]]
  3. Register to memory [mov [bx+si],ax]
  4. Immediate to register [mov ax,123]
  5. Immediate to memory [mov [bx],123]
  6. Memory to accumulator [mov ax,[si+1]]
  7. Accumulator to memory [mov [si+1],al]
  8. Register to segment register [mov cs,ax]
  9. Memory to segment register [mov cs,[1]]
  10. Segment register to register [mov cx,cs]
  11. Segment register to memory [mov [di+1],ds]

Translation of mov instruction

Identify operand types

To decide which format a mov instruction in the assembly source uses, we first need to identify the types of its operands.

For example, in mov [bx],123 the source operand is an immediate and the destination operand is a memory operand.

In mov cx,cs the source operand is a segment register and the destination operand is a [general] register.

So I defined the following operand types:

type OperandType uint8

const (
	Immediate8Operand OperandType = iota
	Immediate16Operand
	ImmediateLabelOperand
	ImmediateOffsetLabelOperand
	Reg8Operand
	Reg16Operand
	SegRegOperand
	Mem8Operand
	Mem16Operand
	MemUnknownSizeOperand
	InvalidOperand
)

type Operand struct {
	Type  OperandType // operand type
	Value interface{} // operand value (*ImmediateOperand, *RegOperand or *MemOperand)
}

Immediate operand

Immediates fall into 4 categories:

  • 8-bit immediate
  • 16-bit immediate
  • Internal label
  • Internal label with the offset modifier

Look at the following sample program:

assume cs:code,ds:data,ss:stack     ; associate cs, ds, ss with the code, data and stack segments
data segment
  dw 0123h, 0456h, 0789h, 0abch, 0defh, 0fedh, 0cbah, 0987h
data ends

stack segment
  dw 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
stack ends
code segment
  start: mov ax,stack
         mov ss,ax
         mov sp,20h         ; set the stack top ss:sp to stack:20h

         mov ax, data        ; load the segment address of the segment named "data" into ax
         mov ds,ax          ; ds points to the data segment

         mov bx,0           ; ds:bx points to the first cell of the data segment
         mov cx,8

    s0: push cs:[bx]
        add bx,2
        loop s0             ; the loop above pushes the 8 words in cells 0~15 of the code segment onto the stack in order

        mov bx,0
        mov cx, 8
    s1:pop cs:[bx]
        add bx,2
        loop s1             ; the loop above pops the 8 words back into cells 0~15 of the code segment in order

  mov ax,4c00h
  int 21h
code ends
end start

In mov ax,data, data is an internal label. Its value is the segment address of the data segment once the program is loaded, so it is treated as an immediate.

The same goes for mov ax,stack.

Look at the following sample program:

assume cs:code
code segment
      mov ax,4c00h
      int 21h
  start: mov ax,0
      s: nop
         nop
         
         mov di,offset s
         mov si,offset s2
         mov ax,cs:[si]
         mov cs:[di],ax

      s0: jmp short s

      s1: mov ax,0
          int 21h
          mov ax,0

      s2: jmp short s1
          nop
code ends
end start

mov di,offset s puts the offset of label s [within the code segment] into the di register. Here s is an internal label with the offset modifier.

[The form mov di,s, where s is the starting address of some data defined in the data segment, is not supported yet. In other words, s would be a variable.]

The same goes for mov si,offset s2.

Use the following structure to represent an immediate number:

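type ImmediateOperand struct {
	Value         uint16 // value
	Width         uint8  // width of the immediate: 8 or 16 bits
	IsLabel       bool   // whether it is an internal label
	IsLabelOffset bool   // whether it is an internal label with the offset modifier
	Label         string // name of the label
}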

The isImmediateOperand function determines whether a string is an 8-bit or a 16-bit immediate. It tries the 8-bit range first and falls back to 16 bits.

// Examples of accepted immediates:
// 12345
// 0ffffh
// 123h
// 1000b
// -123h
// 'a'
func isSizedImmediateOperand(s string, bitSize int) (bool, *ImmediateOperand) {
	s = strings.TrimSpace(s)
	if len(s) == 0 {
		return false, nil
	}

	// A character literal such as 'a' is an 8-bit immediate.
	if bitSize == 8 {
		if len(s) == 3 &&
			s[0] == '\'' &&
			s[2] == '\'' {
			return true, &ImmediateOperand{
				Value: uint16(s[1]),
				Width: 8,
			}
		}
	}

	isNegative := s[0] == '-'

	// A trailing h marks hexadecimal, a trailing b marks binary.
	base := 10
	if strings.HasSuffix(s, "h") {
		base = 16
		s = s[:len(s)-1]
	} else if strings.HasSuffix(s, "b") {
		base = 2
		s = s[:len(s)-1]
	}

	if isNegative {
		v, err := strconv.ParseInt(s, base, bitSize)
		if err != nil {
			return false, nil
		}
		return true, &ImmediateOperand{
			Value: uint16(v),
			Width: uint8(bitSize),
		}
	}

	v, err := strconv.ParseUint(s, base, bitSize)
	if err != nil {
		return false, nil
	}

	return true, &ImmediateOperand{
		Value: uint16(v),
		Width: uint8(bitSize),
	}
}

func isImmediateOperand(s string) (bool, *ImmediateOperand) {
	// Try the narrower width first.
	t, v := isSizedImmediateOperand(s, 8)
	if t {
		return t, v
	}

	t, v = isSizedImmediateOperand(s, 16)
	if t {
		return t, v
	}

	return false, nil
}
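To make the behaviour of the two functions above easier to try out, here is a condensed, self-contained sketch of the same idea. parseImmediate is a stand-in written for this example, not the project's function; it returns plain values instead of an *ImmediateOperand.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseImmediate is a condensed stand-in for isImmediateOperand: it returns
// the value, the narrowest width (8 or 16) that holds it, and ok.
func parseImmediate(s string) (value uint16, width uint8, ok bool) {
	s = strings.TrimSpace(s)
	if len(s) == 3 && s[0] == '\'' && s[2] == '\'' {
		return uint16(s[1]), 8, true // character literal such as 'a'
	}
	base := 10
	switch {
	case strings.HasSuffix(s, "h"):
		base, s = 16, s[:len(s)-1]
	case strings.HasSuffix(s, "b"):
		base, s = 2, s[:len(s)-1]
	}
	for _, bits := range []int{8, 16} {
		if strings.HasPrefix(s, "-") {
			// Negative values are range-checked as signed numbers.
			if v, err := strconv.ParseInt(s, base, bits); err == nil {
				return uint16(v), uint8(bits), true
			}
		} else if v, err := strconv.ParseUint(s, base, bits); err == nil {
			return uint16(v), uint8(bits), true
		}
	}
	return 0, 0, false
}

func main() {
	for _, s := range []string{"12345", "0ffffh", "1000b", "-123h", "'a'", "ax"} {
		v, w, ok := parseImmediate(s)
		fmt.Printf("%-8s value=%#04x width=%d ok=%v\n", s, v, w, ok)
	}
}
```

Note that -123h does not fit the signed 8-bit range, so it is reported as a 16-bit immediate with the two's-complement value fedd.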

The logic that recognizes internal labels, with or without the offset modifier, is shown later in getOperand.

Register operand

Registers are divided into 8-bit registers [al, cl, dl, bl, ah, ch, dh, bh], 16-bit registers [ax, cx, dx, bx, sp, bp, si, di] and segment registers [es, cs, ss, ds].

Represented by the following structure:

type RegOperand struct {
	Name     string // register name
	Width    uint8  // register width
	IsSegReg bool   // whether it is a segment register
}

The following function is implemented to determine the register type:

var reg8BitMap = map[string]uint8{
	"al": 0, // Byte Multiply, Byte Divide, Byte I/O, Translate, Decimal Arithmetic
	"cl": 1, // Variable Shift and Rotate
	"dl": 2,
	"bl": 3,
	"ah": 4, // Byte Multiply, Byte Divide
	"ch": 5,
	"dh": 6,
	"bh": 7,
}

var reg16BitMap = map[string]uint8{
	"ax": 0, // Word Multiply, Word Divide, Word I/O
	"cx": 1, // String Operations, Loops
	"dx": 2, // Word Multiply, Word Divide, Indirect I/O
	"bx": 3, // Translate
	"sp": 4, // Stack Operations
	"bp": 5,
	"si": 6, // String Operations
	"di": 7, // String Operations
}

var regSegMap = map[string]uint8{
	"es": 0,
	"cs": 1,
	"ss": 2,
	"ds": 3,
}

func isReg8Operand(s string) (bool, *RegOperand) {
	if _, ok := reg8BitMap[s]; !ok {
		return false, nil
	}
	return true, &RegOperand{Name: s, Width: 8}
}

func isReg16Operand(s string) (bool, *RegOperand) {
	if _, ok := reg16BitMap[s]; !ok {
		return false, nil
	}
	return true, &RegOperand{Name: s, Width: 16}
}

func isSegRegOperand(s string) (bool, *RegOperand) {
	if _, ok := regSegMap[s]; !ok {
		return false, nil
	}
	return true, &RegOperand{Name: s, Width: 16, IsSegReg: true}
}

Note: The values in the maps defined above are not arbitrary. They are the values of the REG field in the machine instruction.
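Because the map values are the REG field values, a register-to-register mov can be encoded by looking the names up and shifting the numbers into place. The sketch below is my own illustration of that idea, not the project's encodeMovRegToReg; the maps are repeated so the example runs on its own, and it always uses D = 0.

```go
package main

import "fmt"

var reg8 = map[string]uint8{"al": 0, "cl": 1, "dl": 2, "bl": 3, "ah": 4, "ch": 5, "dh": 6, "bh": 7}
var reg16 = map[string]uint8{"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}

// movRegToReg encodes "mov dst,src" for two general registers of equal
// width using 100010dw with D=0, so REG holds src and R/M holds dst.
func movRegToReg(dst, src string) ([]byte, error) {
	table, w := reg16, uint8(1)
	if _, ok := reg8[dst]; ok {
		table, w = reg8, 0
	}
	d, okD := table[dst]
	s, okS := table[src]
	if !okD || !okS {
		return nil, fmt.Errorf("mov %s,%s: unknown register or width mismatch", dst, src)
	}
	return []byte{0b10001000 | w, 0b11000000 | s<<3 | d}, nil
}

func main() {
	code, _ := movRegToReg("ax", "bx")
	fmt.Printf("% x\n", code) // prints 89 d8
	code, _ = movRegToReg("cl", "ah")
	fmt.Printf("% x\n", code) // prints 88 e1
	_, err := movRegToReg("ax", "bl")
	fmt.Println(err)
}
```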

Memory operand

Memory operands are the most complex. There are 16 forms in total:

[Figure: table of the memory operand forms]

If 8-bit and 16-bit displacements are counted separately, there are 24 forms. Moreover, the displacement can also be written outside the [], for example:

[bx+di+123], 123[bx+di] and [bx+di]123 all mean the same thing. The assembler must support all three.
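One way to think about the three spellings is to fold an outer displacement back into the brackets. The sketch below does exactly that. It is a simplification for illustration only (normalizeMem is a made-up name, and it does not validate register names or numbers); the project code instead keeps track of a left and a right immediate separately.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeMem rewrites 123[bx+di] and [bx+di]123 into the canonical
// spelling [bx+di+123]. It only rearranges text.
func normalizeMem(s string) (string, bool) {
	s = strings.ReplaceAll(s, " ", "")
	l, r := strings.IndexByte(s, '['), strings.IndexByte(s, ']')
	if l < 0 || r < l {
		return "", false
	}
	before, inside, after := s[:l], s[l+1:r], s[r+1:]
	if before != "" && after != "" {
		return "", false // a displacement on both sides is rejected
	}
	if outer := before + after; outer != "" {
		inside += "+" + outer
	}
	return "[" + inside + "]", true
}

func main() {
	for _, s := range []string{"[bx+di+123]", "123[bx+di]", "[bx + di]123"} {
		n, ok := normalizeMem(s)
		fmt.Println(n, ok) // prints [bx+di+123] true for each input
	}
}
```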

Memory operands are represented by the following structure:

type MemOperand struct {
	IsSingleIndex        bool   // whether one index register is used, e.g. [bx], [bx+123]
	SingleIndexReg       string // the index register, e.g. "bx"
	IsDoubleIndex        bool   // whether two index registers are used, e.g. [bx+si], [bx+di]123
	DoubleIndexFirstReg  string // first index register, e.g. "bx", "bp"
	DoubleIndexSecondReg string // second index register, e.g. "si", "di"
	HasDisplacement      bool   // whether there is a displacement
	DisplacementValue    uint16 // displacement value
	DisplacementWidth    uint8  // displacement width
	OperandWidth         uint8  // operand width; for byte ptr [bx] it is 8
	HasSegmentPrefix     bool   // whether there is a segment prefix
	SegmentPrefix        string // name of the segment prefix
}

So late one night I wrote the longest string-processing function in the project, isSimpleMemOperand. It determines whether a string is a memory operand that has no segment prefix and no word ptr or byte ptr modifier.

func isSimpleMemOperand(s string) (bool, *MemOperand) {
	s = strings.TrimSpace(s)
	if len(s) == 0 {
		return false, nil
	}

	// A memory operand must contain [ ].
	idxL := strings.IndexByte(s, '[')
	idxR := strings.IndexByte(s, ']')
	if idxL < 0 || idxR < 0 {
		return false, nil
	}

	if idxL >= idxR {
		return false, nil
	}

	// The displacement may be written to the left of [].
	hasLeftImmediate := false
	var LeftImmediate *ImmediateOperand
	var t bool
	if idxL != 0 {
		if idxR != len(s)-1 {
			return false, nil
		}

		t, LeftImmediate = isImmediateOperand(s[:idxL])
		if !t {
			return false, nil
		}

		hasLeftImmediate = true
	}

	// The displacement may be written to the right of [].
	hasRightImmediate := false
	var RightImmediate *ImmediateOperand
	if idxR != len(s)-1 {
		t, RightImmediate = isImmediateOperand(s[idxR+1:])
		if !t {
			return false, nil
		}

		hasRightImmediate = true
	}

	var Immediate *ImmediateOperand
	if hasLeftImmediate {
		Immediate = LeftImmediate
	}
	if hasRightImmediate {
		Immediate = RightImmediate
	}

	fieldFunc := func(r rune) bool {
		return r == ' ' || r == '+'
	}

	// Split out the index register names inside [].
	fields := strings.FieldsFunc(s[idxL+1:idxR], fieldFunc)
	// Empty brackets are rejected too; otherwise "[]" would fall through
	// to the final return and index an empty slice.
	if len(fields) == 0 || len(fields) > 3 {
		return false, nil
	}

	if len(fields) == 1 {
		// displacement outside the brackets
		// [SI]d8
		// [DI]d8
		// [BP]d8
		// [BX]d8

		// [SI]d16
		// [DI]d16
		// [BP]d16
		// [BX]d16

		if Immediate != nil {
			if fields[0] != "si" &&
				fields[0] != "di" &&
				fields[0] != "bx" &&
				fields[0] != "bp" {
				return false, nil
			}

			return true, &MemOperand{
				IsSingleIndex:     true,
				SingleIndexReg:    fields[0],
				HasDisplacement:   true,
				DisplacementValue: Immediate.Value,
				DisplacementWidth: Immediate.Width,
			}
		}

		// no displacement outside the brackets
		// [SI]
		// [DI]
		// [d16]
		// [BX]
		if fields[0] != "si" &&
			fields[0] != "di" &&
			fields[0] != "bx" {
			t, Immediate = isImmediateOperand(fields[0])
			if !t {
				return false, nil
			}
			return true, &MemOperand{
				HasDisplacement:   true,
				DisplacementValue: Immediate.Value,
				DisplacementWidth: Immediate.Width,
			}
		}

		return true, &MemOperand{
			IsSingleIndex:  true,
			SingleIndexReg: fields[0],
		}
	}

	if len(fields) == 2 {
		// displacement outside the brackets
		// [BX + SI]d8
		// [BX + DI]d8
		// [BP + SI]d8
		// [BP + DI]d8
		// [BX + SI]d16
		// [BX + DI]d16
		// [BP + SI]d16
		// [BP + DI]d16
		if Immediate != nil {
			switch fields[0] {
			case "bx", "bp":
				if fields[1] != "si" &&
					fields[1] != "di" {
					return false, nil
				}

			case "si", "di":
				if fields[1] != "bx" &&
					fields[1] != "bp" {
					return false, nil
				}

			default:
				return false, nil
			}

			// Put the base register (bx/bp) first.
			if fields[0] == "si" || fields[0] == "di" {
				fields[0], fields[1] = fields[1], fields[0]
			}

			return true, &MemOperand{
				IsDoubleIndex:        true,
				DoubleIndexFirstReg:  fields[0],
				DoubleIndexSecondReg: fields[1],
				HasDisplacement:      true,
				DisplacementValue:    Immediate.Value,
				DisplacementWidth:    Immediate.Width,
			}
		}

		// no displacement outside the brackets
		t, Immediate = isImmediateOperand(fields[0])
		if t {
			fields[0], fields[1] = fields[1], fields[0]
		} else {
			_, Immediate = isImmediateOperand(fields[1])
		}
		// [BX + SI]
		// [BX + DI]
		// [BP + SI]
		// [BP + DI]
		// [SI + d8]
		// [DI + d8]
		// [BP + d8]
		// [BX + d8]
		// [SI + d16]
		// [DI + d16]
		// [BP + d16]
		// [BX + d16]
		switch fields[0] {
		case "bx", "bp":
			if fields[1] != "si" &&
				fields[1] != "di" &&
				Immediate == nil {
				return false, nil
			}
		case "si", "di":
			if fields[1] != "bx" &&
				fields[1] != "bp" &&
				Immediate == nil {
				return false, nil
			}
		default:
			return false, nil
		}

		if Immediate != nil {
			return true, &MemOperand{
				IsSingleIndex:     true,
				SingleIndexReg:    fields[0],
				HasDisplacement:   true,
				DisplacementValue: Immediate.Value,
				DisplacementWidth: Immediate.Width,
			}
		}

		// Put the base register (bx/bp) first here as well; getMODAndRM
		// only recognizes the pairs in that order, so [di+bx] would
		// otherwise be encoded as [bx+si].
		if fields[0] == "si" || fields[0] == "di" {
			fields[0], fields[1] = fields[1], fields[0]
		}
		return true, &MemOperand{
			IsDoubleIndex:        true,
			DoubleIndexFirstReg:  fields[0],
			DoubleIndexSecondReg: fields[1],
		}
	}

	if len(fields) == 3 {
		if Immediate != nil {
			return false, nil
		}

		// Find the displacement and move it to fields[2].
		t, Immediate = isImmediateOperand(fields[0])
		if t {
			fields[0], fields[2] = fields[2], fields[0]
		} else {
			t, Immediate = isImmediateOperand(fields[1])
			if t {
				fields[1], fields[2] = fields[2], fields[1]
			} else {
				t, Immediate = isImmediateOperand(fields[2])
				if !t {
					return false, nil
				}
			}
		}

		// Put the base register (bx/bp) first.
		if fields[0] != "bx" &&
			fields[0] != "bp" {
			fields[0], fields[1] = fields[1], fields[0]
		}

		// [BX + SI + d8]
		// [BX + DI + d8]
		// [BP + SI + d8]
		// [BP + DI + d8]
		// [BX + SI + d16]
		// [BX + DI + d16]
		// [BP + SI + d16]
		// [BP + DI + d16]
		switch fields[0] {
		case "bx", "bp":
			if fields[1] != "si" &&
				fields[1] != "di" {
				return false, nil
			}

		case "si", "di":
			if fields[1] != "bx" &&
				fields[1] != "bp" {
				return false, nil
			}

		default:
			return false, nil
		}
	}

	return true, &MemOperand{
		IsDoubleIndex:        true,
		DoubleIndexFirstReg:  fields[0],
		DoubleIndexSecondReg: fields[1],
		HasDisplacement:      true,
		DisplacementValue:    Immediate.Value,
		DisplacementWidth:    Immediate.Width,
	}
}

Then isMemOperand is implemented to completely determine whether it is a memory operand:

func isMemOperand(s string) (bool, *MemOperand) {
	s = strings.TrimSpace(s)
	idxCol := strings.IndexByte(s, ':')
	if idxCol == 0 {
		return false, nil
	}

	var t bool
	var memOperand *MemOperand
	var operandWidth uint8
	var segPrefix string
	// word ptr ds:[0]
	// word ptr ds:[bx+2]
	// ds:[bx+si]
	if idxCol > 0 {
		// has a segment override prefix
		idxSpace := strings.LastIndexByte(s[:idxCol], ' ')
		if idxSpace == 0 {
			return false, nil
		}

		if idxSpace > 0 {
			// has word ptr or byte ptr
			fields := strings.Fields(s[:idxSpace])
			// The length check avoids an out-of-range index on input such as "x ds:[0]".
			if len(fields) != 2 || fields[1] != "ptr" {
				return false, nil
			}
			if fields[0] == "word" {
				operandWidth = 16
			} else if fields[0] == "byte" {
				operandWidth = 8
			} else {
				return false, nil
			}
		}

		segPrefix = s[idxSpace+1 : idxCol]
		if !isSegReg(segPrefix) {
			return false, nil
		}

		t, memOperand = isSimpleMemOperand(s[idxCol+1:])
		if !t {
			return false, nil
		}

		fmt.Printf("has seg prefix :%s\n", segPrefix)
		memOperand.OperandWidth = operandWidth
		memOperand.HasSegmentPrefix = true
		memOperand.SegmentPrefix = segPrefix
		return true, memOperand
	}

	// no segment override prefix
	// word ptr [bx+2]
	// [bx+2]
	fields := strings.Fields(s)
	if len(fields) == 0 {
		return false, nil
	}
	if fields[0] == "word" ||
		fields[0] == "byte" {
		if len(fields) < 2 || fields[1] != "ptr" {
			return false, nil
		}

		// "word" and "byte" contain no 'p', so this finds the 'p' of "ptr".
		idxP := strings.IndexByte(s, 'p')
		if idxP <= 0 {
			return false, nil
		}

		if fields[0] == "word" {
			operandWidth = 16
		} else {
			operandWidth = 8
		}

		t, memOperand = isSimpleMemOperand(s[idxP+3:])
		if !t {
			return false, nil
		}

		memOperand.OperandWidth = operandWidth
		return t, memOperand
	}

	return isSimpleMemOperand(s)
}

Parsing operands

With the checks above in place, a getOperand function can identify every type of operand:

func getOperand(operand string) Operand {
	// Is it a register operand?
	if t, v := isReg8Operand(operand); t {
		return Operand{Reg8Operand, v}
	}

	if t, v := isReg16Operand(operand); t {
		return Operand{Reg16Operand, v}
	}

	if t, v := isSegRegOperand(operand); t {
		return Operand{SegRegOperand, v}
	}

	// Is it an immediate?
	if t, v := isImmediateOperand(operand); t {
		if v.Width == 8 {
			return Operand{Immediate8Operand, v}
		} else {
			return Operand{Immediate16Operand, v}
		}
	}

	// Is it a memory operand?
	if t, v := isMemOperand(operand); t {
		if v.OperandWidth == 16 {
			return Operand{Mem16Operand, v}
		} else if v.OperandWidth == 8 {
			return Operand{Mem8Operand, v}
		} else {
			return Operand{MemUnknownSizeOperand, v}
		}
	}

	// Is it an internal label with the offset modifier?
	fields := strings.Fields(operand)
	if len(fields) == 2 && fields[0] == "offset" {
		return Operand{
			ImmediateOffsetLabelOperand,
			&ImmediateOperand{
				IsLabelOffset: true,
				Label:         fields[1],
			},
		}
	}

	// It is none of the types above, so it must be an internal label.
	if len(fields) == 1 {
		return Operand{
			ImmediateLabelOperand,
			&ImmediateOperand{
				IsLabel: true,
				Label:   fields[0],
			},
		}
	}

	return Operand{InvalidOperand, nil}
}

Implementation of checkMov

checkMov checks whether the mov instruction is well formed. If it is, the operands are saved in ctx and passed on to the encodeMov function:

func checkMov(stmt []string) (bool, context.Context) {
	if len(stmt) != 3 {
		log.Fatal("invalid \"mov\" syntax")
	}

	dstOperand := getOperand(stmt[1])
	dstOperandType := dstOperand.Type
	srcOperand := getOperand(stmt[2])
	srcOperandType := srcOperand.Type
	// semantic checks
	if dstOperandType == InvalidOperand ||
		srcOperandType == InvalidOperand {
		log.Fatal("invalid mov 0: invalid operand type")
	}

	if dstOperandType <= ImmediateOffsetLabelOperand {
		log.Fatal("invalid mov 1: destination operand cannot be an immediate")
	} else if dstOperandType == SegRegOperand {
		if srcOperandType <= ImmediateOffsetLabelOperand {
			log.Fatal("invalid mov 2: cannot move an immediate to a segment register")
		}

		if srcOperandType == SegRegOperand {
			log.Fatal("invalid mov 3: cannot move a segment register to a segment register")
		}

	} else if dstOperandType >= Mem8Operand {
		if srcOperandType >= Mem8Operand {
			log.Fatal("invalid mov 4: cannot move memory to memory")
		}
	}

	// check operand widths
	if dstOperandType == Reg8Operand {
		if srcOperandType == Immediate16Operand ||
			srcOperandType == Reg16Operand ||
			srcOperandType == SegRegOperand ||
			srcOperandType == Mem16Operand {
			log.Fatal("invalid mov 5: destination register is 8 bits but source operand is 16 bits")
		}
	} else if dstOperandType == Reg16Operand ||
		dstOperandType == SegRegOperand {
		if srcOperandType == Reg8Operand ||
			srcOperandType == Mem8Operand {
			log.Fatal("invalid mov 6: destination register is 16 bits but source operand is 8 bits")
		}
	} else if dstOperandType == Mem8Operand {
		if srcOperandType == Immediate16Operand ||
			srcOperandType == Reg16Operand ||
			srcOperandType == SegRegOperand {
			log.Fatal("invalid mov 7: destination memory operand is 8 bits but source operand is 16 bits")
		}
	} else if dstOperandType == Mem16Operand {
		if srcOperandType == Reg8Operand {
			log.Fatal("invalid mov 8: destination memory operand is 16 bits but source register is 8 bits")
		}
	} else if dstOperandType == MemUnknownSizeOperand {
		if srcOperandType <= ImmediateOffsetLabelOperand {
			log.Fatal("invalid mov 9: destination memory operand has unknown width and source is an immediate; specify byte ptr or word ptr")
		}
	}

	var k encodeCtxKey
	ctx := context.Background()
	k = encodeCtxKey("dst")
	ctx = context.WithValue(ctx, k, dstOperand)
	k = encodeCtxKey("src")
	ctx = context.WithValue(ctx, k, srcOperand)

	return true, ctx
}

Its main purpose is to check that the operands of the mov instruction are legal. For example:

  • If the destination register is 8 bits, the source operand must be 8 bits too.

  • The destination operand cannot be an immediate.

  • Immediates cannot be moved to segment registers.

  • etc.
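The rules above can also be written as one switch over a simplified operand description. The following is a sketch under my own assumptions: the operand type with a kind string and a width is invented for this example, and errors are returned instead of calling log.Fatal, which makes the rules easy to test in isolation.

```go
package main

import (
	"errors"
	"fmt"
)

// operand is a hypothetical, simplified view of an operand: its kind and
// its width in bits (0 when the operand does not fix a width by itself,
// e.g. [bx] without byte ptr/word ptr, or a label).
type operand struct {
	kind  string // "imm", "reg", "seg" or "mem"
	width uint8
}

// validateMov returns the first rule that "mov dst,src" breaks, or nil.
func validateMov(dst, src operand) error {
	switch {
	case dst.kind == "imm":
		return errors.New("destination cannot be an immediate")
	case dst.kind == "seg" && src.kind == "imm":
		return errors.New("cannot move an immediate to a segment register")
	case dst.kind == "seg" && src.kind == "seg":
		return errors.New("cannot move a segment register to a segment register")
	case dst.kind == "mem" && src.kind == "mem":
		return errors.New("cannot move memory to memory")
	case dst.kind == "mem" && dst.width == 0 && src.kind == "imm":
		return errors.New("memory width unknown: add byte ptr or word ptr")
	case dst.width == 8 && src.width == 16:
		return errors.New("16-bit source does not fit an 8-bit destination")
	case dst.width == 16 && src.width == 8 && src.kind != "imm":
		return errors.New("8-bit source cannot fill a 16-bit destination")
	}
	return nil
}

func main() {
	al, bx := operand{"reg", 8}, operand{"reg", 16}
	ds, imm8 := operand{"seg", 16}, operand{"imm", 8}
	fmt.Println(validateMov(bx, imm8)) // <nil>: an 8-bit immediate widens to 16 bits
	fmt.Println(validateMov(al, bx))
	fmt.Println(validateMov(ds, imm8))
}
```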

Implementation of encodeMov

encodeMov translates the assembly instruction into a machine instruction:

func encodeMov(ctx context.Context) []byte {
	var instruction []byte

	dstOperand := ctx.Value(encodeCtxKey("dst")).(Operand)
	dstOperandType := dstOperand.Type
	srcOperand := ctx.Value(encodeCtxKey("src")).(Operand)
	srcOperandType := srcOperand.Type

	switch srcOperandType {
	case Immediate8Operand, Immediate16Operand,
		ImmediateLabelOperand, ImmediateOffsetLabelOperand:
		switch dstOperandType {
		case Reg8Operand, Reg16Operand:
			src := srcOperand.Value.(*ImmediateOperand)
			dst := dstOperand.Value.(*RegOperand)
			fmt.Println("1 mov immediate to reg")
			instruction = encodeMovImmediateToReg(src, dst)
		case Mem8Operand, Mem16Operand, MemUnknownSizeOperand:
			src := srcOperand.Value.(*ImmediateOperand)
			dst := dstOperand.Value.(*MemOperand)
			fmt.Println("2 mov immediate to memory")
			instruction = encodeMovImmediateToMemory(src, dst)
		}
	case Reg8Operand, Reg16Operand:
		switch dstOperandType {
		case Reg8Operand, Reg16Operand:
			src := srcOperand.Value.(*RegOperand)
			dst := dstOperand.Value.(*RegOperand)
			fmt.Println("3 mov reg to reg")
			instruction = encodeMovRegToReg(src, dst)
		case SegRegOperand:
			src := srcOperand.Value.(*RegOperand)
			dst := dstOperand.Value.(*RegOperand)
			fmt.Println("4 mov reg to seg")
			instruction = encodeMovRegToSeg(src, dst)
		case Mem8Operand, Mem16Operand, MemUnknownSizeOperand:
			src := srcOperand.Value.(*RegOperand)
			dst := dstOperand.Value.(*MemOperand)

			if src.Name == Accumulator16 || src.Name == Accumulator8 {
				// The short accumulator form only applies to a direct address.
				if !dst.IsSingleIndex &&
					!dst.IsDoubleIndex &&
					dst.HasDisplacement {
					fmt.Println("5 mov accumulator to memory")
					instruction = encodeMovAccumulatorToMemory(src, dst)
				} else {
					fmt.Println("5.1 mov reg to memory")
					fmt.Println(src)
					instruction = encodeMovRegToMemory(src, dst)
				}
			} else {
				fmt.Println("6 mov reg to memory")
				instruction = encodeMovRegToMemory(src, dst)
			}
		}
	case SegRegOperand:
		switch dstOperandType {
		case Reg16Operand:
			src := srcOperand.Value.(*RegOperand)
			dst := dstOperand.Value.(*RegOperand)
			fmt.Println("7 mov seg to reg")
			instruction = encodeMovSegToReg(src, dst)
		case Mem16Operand, MemUnknownSizeOperand:
			src := srcOperand.Value.(*RegOperand)
			dst := dstOperand.Value.(*MemOperand)
			fmt.Println("8 mov seg to memory")
			instruction = encodeMovSegToMemory(src, dst)
		}
	case Mem8Operand, Mem16Operand, MemUnknownSizeOperand:
		switch dstOperandType {
		case Reg8Operand, Reg16Operand:
			src := srcOperand.Value.(*MemOperand)
			dst := dstOperand.Value.(*RegOperand)
			var ok bool
			if dst.Name == Accumulator16 || dst.Name == Accumulator8 {
				// The short accumulator form only applies to a direct address.
				if !src.IsSingleIndex &&
					!src.IsDoubleIndex &&
					src.HasDisplacement {
					fmt.Println("9 mov memory to accumulator")
					instruction = encodeMovMemoryToAccumulator(src, dst)
					ok = true
				}
			}
			if !ok {
				fmt.Println("10 mov memory to reg")
				instruction = encodeMovMemoryToReg(src, dst)
			}
		case SegRegOperand:
			src := srcOperand.Value.(*MemOperand)
			dst := dstOperand.Value.(*RegOperand)
			fmt.Println("11 mov memory to seg")
			instruction = encodeMovMemoryToSeg(src, dst)
		}
	}
	return instruction
}

It calls the corresponding translation function based on the type of the operand. For example, to move an immediate value to memory, it calls the encodeMovImmediateToMemory function:

func encodeMovImmediateToMemory(src *ImmediateOperand, dst *MemOperand) []byte {
	/*1100011w, mod 000 rm, [disp-lo] [disp-hi] data [data]*/
	var instruction []byte
	var w uint8
	if dst.OperandWidth == 8 {
		w = 0
	} else {
		w = 1
	}

	if dst.HasSegmentPrefix {
		instruction = append(instruction, encodeSegPrefix(dst.SegmentPrefix)...)
	}
	instruction = append(instruction, 0b11000110|w)
	instruction = append(instruction, encodeMemoryOperand(dst)...)
	// The immediate may be a label or an offset label.
	if src.IsLabel || src.IsLabelOffset {
		putLabelEncodeInfo(src.Label, uint8(len(instruction)), dst.OperandWidth, src.IsLabelOffset)
	}
	// The immediate is emitted low byte first.
	instruction = append(instruction, byte(src.Value))
	if w == 1 {
		instruction = append(instruction, byte((src.Value>>8)&0xff))
	}

	return instruction
}

This function encodes instructions into a specific format according to the manual:

[Figure: the manual's immediate-to-register/memory format: 1100011w, mod 000 r/m, (DISP-LO), (DISP-HI), data, data if w = 1]
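As a worked example of this format, the sketch below hand-encodes mov word ptr [bx+2],1234h and mov byte ptr [bx+2],12h. It is deliberately limited to the [bx+d8] addressing form, and the function name is made up for this example.

```go
package main

import "fmt"

// movImmToBXDisp8 encodes "mov byte/word ptr [bx+disp],imm" following
// 1100011w, mod 000 r/m, disp, data [, data]. Only [bx+d8] is handled.
func movImmToBXDisp8(word bool, disp int8, imm uint16) []byte {
	w := byte(0)
	if word {
		w = 1
	}
	out := []byte{
		0b11000110 | w,            // opcode with the W bit
		0b01<<6 | 0b000<<3 | 0b111, // MOD=01 (d8), REG=000, R/M=111 ([bx])
		byte(disp),
		byte(imm), // low byte of the immediate
	}
	if word {
		out = append(out, byte(imm>>8)) // high byte, only when w = 1
	}
	return out
}

func main() {
	fmt.Printf("% x\n", movImmToBXDisp8(true, 2, 0x1234)) // prints c7 47 02 34 12
	fmt.Printf("% x\n", movImmToBXDisp8(false, 2, 0x12))  // prints c6 47 02 12
}
```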

The encodeMemoryOperand function is called to encode the memory operand:

// disp8Fits reports whether the displacement can be encoded in one byte.
// The CPU sign-extends an 8-bit displacement, so an unsigned value in
// 0x80..0xff has to be emitted as 16 bits instead.
func disp8Fits(operand *MemOperand) bool {
	v := operand.DisplacementValue
	return operand.DisplacementWidth == 8 && (v <= 0x7f || v >= 0xff80)
}

func getMODAndRM(operand *MemOperand) (MOD uint8, RM uint8) {
	if operand.IsDoubleIndex {
		MOD = 0b00
		if operand.DoubleIndexFirstReg == "bx" &&
			operand.DoubleIndexSecondReg == "si" {
			RM = 0b000
		} else if operand.DoubleIndexFirstReg == "bx" &&
			operand.DoubleIndexSecondReg == "di" {
			RM = 0b001
		} else if operand.DoubleIndexFirstReg == "bp" &&
			operand.DoubleIndexSecondReg == "si" {
			RM = 0b010
		} else if operand.DoubleIndexFirstReg == "bp" &&
			operand.DoubleIndexSecondReg == "di" {
			RM = 0b011
		}
		if operand.HasDisplacement {
			if disp8Fits(operand) {
				MOD = 0b01
			} else {
				MOD = 0b10
			}
		}
		return
	}

	if operand.IsSingleIndex {
		MOD = 0b00
		if operand.SingleIndexReg == "si" {
			RM = 0b100
		} else if operand.SingleIndexReg == "di" {
			RM = 0b101
		} else if operand.SingleIndexReg == "bx" {
			RM = 0b111
		} else if operand.SingleIndexReg == "bp" {
			RM = 0b110
		}

		if operand.HasDisplacement {
			if disp8Fits(operand) {
				MOD = 0b01
			} else {
				MOD = 0b10
			}
		} else {
			// MOD=00 with R/M=110 means a direct address, so [bp]
			// without a displacement has no encoding of its own.
			if RM == 0b110 {
				log.Fatal("getMODAndRM error 1!")
			}
		}

		return
	}

	// Direct address: [d16]
	if operand.HasDisplacement {
		RM = 0b110
		MOD = 0b00
	} else {
		log.Fatal("getMODAndRM error 2!")
	}

	return
}

func encodeMemoryOperand(operand *MemOperand) []byte {
	/* mod 000 r/m, (DISP-LO), (DISP-HI) */
	var instruction []byte
	mod, rm := getMODAndRM(operand)
	instruction = append(instruction, rm|uint8(mod<<6))
	if operand.HasDisplacement {
		instruction = append(instruction, byte(operand.DisplacementValue))
		// MOD=10 and the direct-address form (MOD=00, R/M=110) always carry
		// a 2-byte displacement. For example, the [1] in mov bx,[1] is
		// emitted as 01 00; otherwise a decoder could not tell how long
		// DISP is.
		if mod == 0b10 || (mod == 0b00 && rm == 0b110) {
			instruction = append(instruction, byte((operand.DisplacementValue>>8)&0xff))
		}
	}
	return instruction
}

Key points

There are two more points worth noting:

  • Memory operands may have a segment prefix, such as "es:[bx]". For such an operand, encodeMovImmediateToMemory calls encodeSegPrefix to encode the segment prefix.

    func encodeSegPrefix(segPrefix string) []byte {
    	var instruction []byte
    	var b byte
    	switch segPrefix {
    	case "es":
    		b = 0b00100110
    	case "cs":
    		b = 0b00101110
    	case "ss":
    		b = 0b00110110
    	case "ds":
    		b = 0b00111110
    	default:
    		log.Fatalf("invalid seg prefix \"%s\"\n", segPrefix)
    	}
    	instruction = append(instruction, b)
    	return instruction
    }
    
  • Internal labels and labels with the offset modifier are handled.

    	// The immediate may be a label or an offset label.
    	if src.IsLabel || src.IsLabelOffset {
    		putLabelEncodeInfo(src.Label, uint8(len(instruction)), dst.OperandWidth, src.IsLabelOffset)
    	}
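For the first point, the effect of the segment prefix on the final bytes can be shown with a short sketch. It uses a lookup table instead of a switch, returns an error instead of calling log.Fatalf, and the name withSegPrefix is invented for this example. The prefix is a single byte placed in front of the otherwise unchanged instruction.

```go
package main

import "fmt"

// segPrefixByte holds the segment override prefixes (001 seg 110).
var segPrefixByte = map[string]byte{"es": 0x26, "cs": 0x2e, "ss": 0x36, "ds": 0x3e}

// withSegPrefix puts the override byte in front of an encoded instruction.
func withSegPrefix(seg string, body []byte) ([]byte, error) {
	p, ok := segPrefixByte[seg]
	if !ok {
		return nil, fmt.Errorf("invalid seg prefix %q", seg)
	}
	return append([]byte{p}, body...), nil
}

func main() {
	movBXAL := []byte{0x88, 0x07} // mov [bx],al: 10001000, MOD=00 REG=al R/M=[bx]
	code, err := withSegPrefix("es", movBXAL)
	fmt.Printf("% x %v\n", code, err) // prints 26 88 07 <nil>
}
```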
    

The implementation of putLabelEncodeInfo is as follows:

var labelEncodeInfos []LabelEncodeInfo

type LabelEncodeInfo struct {
	Name          string // label name
	Offset        uint32 // offset of the label reference within the program
	Width         uint8  // width of the label value
	IsOffsetLabel bool   // whether it is an offset label
	IsJmpLable    bool   // whether it is a label in a jmp instruction
	JmpInc        uint16 // offset in the code segment of the instruction after the jmp
}

func putLabelEncodeInfo(name string, offsetInInstruction uint8, width uint8, isOffsetLabel bool) {
	labelEncodeInfos = append(labelEncodeInfos,
		LabelEncodeInfo{
			Name:          name,
			Offset:        progOffset + uint32(offsetInInstruction),
			Width:         width,
			IsOffsetLabel: isOffsetLabel,
		})
}

It records the internal label information for later processing [see the next article].
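How the recorded information is consumed is left to the next article, so the following is only a guess at the general technique, usually called backpatching: emit a placeholder now, remember where it is, and overwrite it once the label values are known. The fixup type, the resolve function and the symbol table are all assumptions made for this sketch.

```go
package main

import "fmt"

// fixup is a hypothetical record of one placeholder to patch later.
type fixup struct {
	label  string
	offset int   // position of the placeholder in the output
	width  uint8 // 8 or 16
}

// resolve overwrites every placeholder with the value of its label,
// low byte first.
func resolve(code []byte, fixups []fixup, symbols map[string]uint16) error {
	for _, f := range fixups {
		v, ok := symbols[f.label]
		if !ok {
			return fmt.Errorf("undefined label %q", f.label)
		}
		if f.offset+int(f.width/8) > len(code) {
			return fmt.Errorf("fixup for %q is out of range", f.label)
		}
		code[f.offset] = byte(v)
		if f.width == 16 {
			code[f.offset+1] = byte(v >> 8)
		}
	}
	return nil
}

func main() {
	// mov di,offset s with a 16-bit placeholder: bf 00 00
	code := []byte{0xbf, 0x00, 0x00}
	fixups := []fixup{{label: "s", offset: 1, width: 16}}
	err := resolve(code, fixups, map[string]uint16{"s": 0x0008})
	fmt.Printf("% x %v\n", code, err) // prints bf 08 00 <nil>
}
```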

Origin blog.csdn.net/woay2008/article/details/126564802