ARM Bare-Metal Development Tutorial 5 for the STM32MP157 Linux Development Board: ARM Microprocessor Instruction System (Serialized)

Foreword:

The ARM Cortex-A7 bare-metal development documents and videos have so far been upgraded twice and are continuously updated to make the content richer and the explanations more detailed. The development platform used throughout is the Huaqing Yuanjian FS-MP1A development board (an STM32MP157 development board).

Besides Cortex-A7 bare-metal development, the FS-MP1A development board also has tutorial series covering Cortex-M4 development, FreeRTOS, Linux basics and application development, Linux system porting, Linux driver development, hardware design, AI machine vision, Qt application programming, and hands-on Qt projects. Upgraded documents and videos for the Linux system porting and Linux driver development chapters are also planned, so stay tuned!


ARM Microprocessor Instruction System

The ARM instruction set can be divided into branch (jump) instructions, data processing instructions, program status register transfer instructions, Load/Store instructions, coprocessor instructions and exception-generating instructions. According to the type of instruction, addressing modes are divided into the addressing modes of data processing instructions and the addressing modes of memory access instructions.

This chapter mainly introduces ARM assembly language. The main contents are as follows:

⚫ Addressing mode of ARM processor.

⚫ Instruction set of ARM processor

Composition of ARM instructions

<opcode> {<c>} {S} <Rd>,<Rn>,<shifter_operand>

Instruction analysis:

<opcode>: the instruction mnemonic, i.e. the operation to perform.

{<c>}: the condition code under which the instruction executes. When <c> is omitted, the instruction is executed unconditionally.

Table 44.7.2.1 Instruction Execution Conditions

{S}: determines whether the instruction updates the condition flags in the CPSR. If the destination register is the PC, the S suffix instead copies the SPSR of the current mode into the CPSR; this is used for exception return.

<Rd>: the destination register.

<Rn>: the register holding the first operand.

<shifter_operand>: the second operand, which can take 11 forms, as shown in the table.

Here are some examples of instructions in this format:

Example Code 45-1 Example

MOV R0,#2 ; R0 = 2
ADDS R0,R0,R1 ; R0 = R0 + R1, update the CPSR flags
MOV R2,R0 ; R2 = R0
MOV R1, R0, LSL #2 ; R1 = R0 << 2
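Table 44.7.2.1, referenced above, lists the condition codes. As a rough cross-check, the following C sketch models how each condition suffix is evaluated from the N, Z, C and V flags of the CPSR. It is a host-side illustration written for this tutorial, not code that runs on the board, and the names `Flags`, `Cond` and `cond_passed` are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical host-side model for this tutorial; not an ARM or CMSIS API. */

/* Condition flags from CPSR bits 31..28. */
typedef struct {
    bool n, z, c, v;
} Flags;

/* Condition field values 0b0000..0b1110, in encoding order. */
typedef enum {
    EQ, NE, CS, CC, MI, PL, VS, VC,
    HI, LS, GE, LT, GT, LE, AL
} Cond;

/* Returns true if an instruction with condition `cond` would execute. */
bool cond_passed(Cond cond, Flags f)
{
    switch (cond) {
    case EQ: return f.z;                  /* equal                        */
    case NE: return !f.z;                 /* not equal                    */
    case CS: return f.c;                  /* carry set, unsigned >= (HS)  */
    case CC: return !f.c;                 /* carry clear, unsigned < (LO) */
    case MI: return f.n;                  /* negative                     */
    case PL: return !f.n;                 /* positive or zero             */
    case VS: return f.v;                  /* overflow                     */
    case VC: return !f.v;                 /* no overflow                  */
    case HI: return f.c && !f.z;          /* unsigned >                   */
    case LS: return !f.c || f.z;          /* unsigned <=                  */
    case GE: return f.n == f.v;           /* signed >=                    */
    case LT: return f.n != f.v;           /* signed <                     */
    case GT: return !f.z && f.n == f.v;   /* signed >                     */
    case LE: return f.z || f.n != f.v;    /* signed <=                    */
    case AL:
    default: return true;                 /* always, i.e. <c> omitted     */
    }
}
```

For example, after a comparison of two equal values (Z = 1, C = 1), `cond_passed(EQ, ...)` and `cond_passed(GE, ...)` are true while `GT` is false.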

ARM processor addressing modes

The addressing modes of ARM instructions are divided into data processing instruction addressing modes and memory access instruction addressing modes.

Data processing instruction addressing mode

The addressing modes of data processing instructions can be divided into the following types.

⚫ Immediate addressing mode.

⚫ Register addressing mode.

⚫ Register shift addressing mode.

1. Immediate addressing mode

The immediate value in an instruction is obtained by rotating an 8-bit constant right by an even number of bits (0, 2, 4, ..., 26, 28, 30). Each instruction therefore contains an 8-bit constant X and a 4-bit rotate value Y, and the resulting immediate value = X rotated right by (2×Y), as shown in the figure.

Why does the immediate number need to be obtained through the above operations? Here we take the machine code of the ldr instruction as an example to analyze the reason.

In the figure above, imm12 is the bit field occupied by the immediate value. This field is only 12 bits wide, and 12 bits cannot possibly represent every 32-bit number. In practice, however, the instruction still has to supply a 32-bit operand from this 12-bit field. The only way to do this is to limit which values can be represented and use an encoding scheme that maps each 12-bit code to a 32-bit number.

As mentioned above, immediate value = X rotated right by (2×Y). This widens the range of values that can be represented, but 12 bits can still only encode a fixed number of values (at most 4096). Therefore ARM specifies that not every 32-bit constant is a legal immediate value; only values that can be constructed in this way are legal.

The ARM assembler generates the encoding of the immediate value according to the following rules:

When the immediate value is in the range of 0~0xFF, set X=immediate and Y=0.

In other cases, the assembler chooses the encoding that minimizes the value of Y.

Some valid immediate values and their encodings are listed below:

0xFF:X=0xFF,Y=0

0x104:X=0x41,Y=15

0xFF0:X=0xFF,Y=14

0xF000000F:X=0xFF,Y=2

The following are some invalid immediate values:

0x101, 0x102, 0xFF1, 0xFF04, 0xFF003, 0xFFFFFFFF, 0xF000001F
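The encoding rule can be checked with a small C model. The sketch below is written for this tutorial, and `ror32`, `rol32`, `encode_imm` and `decode_imm` are hypothetical helpers, not assembler APIs. It searches Y from 0 to 15 and returns the 12-bit field with Y in bits [11:8] and X in bits [7:0], or -1 if the value is not a legal immediate. Searching Y upward picks the smallest Y, which assumes the assembler rule stated above.

```c
#include <stdint.h>

/* Hypothetical helpers for this tutorial; not part of any assembler API. */
static uint32_t ror32(uint32_t v, unsigned r)
{
    r &= 31u;
    return (v >> r) | (v << ((32u - r) & 31u));
}

static uint32_t rol32(uint32_t v, unsigned r)
{
    r &= 31u;
    return (v << r) | (v >> ((32u - r) & 31u));
}

/* Returns (Y << 8) | X such that value == X ROR (2*Y), choosing the
 * smallest Y, or -1 if value cannot be encoded as an immediate. */
int32_t encode_imm(uint32_t value)
{
    for (uint32_t y = 0; y < 16; y++) {
        /* value == X ROR (2*y)  <=>  X == value ROL (2*y) */
        uint32_t x = rol32(value, 2u * y);
        if (x <= 0xFFu)
            return (int32_t)((y << 8) | x);
    }
    return -1;
}

/* Rebuilds the 32-bit immediate from a 12-bit field Y:X. */
uint32_t decode_imm(uint32_t imm12)
{
    return ror32(imm12 & 0xFFu, 2u * ((imm12 >> 8) & 0xFu));
}
```

For example, `encode_imm(0x104)` gives 0xF41 (Y = 15, X = 0x41), matching the list of valid values above, and `decode_imm(0xF41)` turns it back into 0x104. Every value in the invalid list returns -1.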

Here are some instructions that use immediate values:

Example Code 45-2 Immediate Data

MOV R0, #0 ; R0 = 0
ADD R3, R3, #1 ; R3 = R3 + 1
CMP R7, #1000 ; compare the value of R7 with 1000
BIC R9, R8, #0xFF00 ; clear bits 8 to 15 of R8 and store the result in R9

2. Register addressing mode

The value of the register can be directly used for data manipulation instructions. This addressing method is often used by various processors, and it is also an addressing method with high execution efficiency, such as:

Example Code 45-3 Register Addressing

MOV R2, R0 ; R2 = R0
ADD R4, R3, R2 ; R4 = R3 + R2
CMP R7, R8 ; compare the values of R7 and R8

3. Register shift addressing mode

This is similar to register addressing, except that the register operand is shifted before the operation. Before the register value is sent to the ALU, it can be preprocessed by the barrel shifter. The shift is carried out within the same instruction, so making good use of the barrel shifter can improve code efficiency.

LSL{S}<c> <Rd>, <Rm>, #<imm>
LSR{S}<c> <Rd>, <Rm>, #<imm>
ASR{S}<c> <Rd>, <Rm>, #<imm>
ROR{S}<c> <Rd>, <Rm>, #<imm>
RRX{S}<c> <Rd>, <Rm>

{S}: determines whether the instruction updates the condition flags in the CPSR.

{<c>}: the condition code under which the instruction executes. When <c> is omitted, the instruction is executed unconditionally.

<Rd>: the destination register.

<Rm>: the register holding the operand to be shifted.

<imm>: the shift amount. The range is 0-31 for LSL, 1-32 for LSR and ASR, and 1-31 for ROR.

Rotate Right with Extend (RRX): the operand is shifted right by one bit, the vacated bit 31 is filled with the current value of the C flag, and (with S) the bit shifted out of bit 0 becomes the new C flag.
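To make the shift rules concrete, here is a minimal C model of the immediate-shift forms and RRX, including the carry-out that an S-suffixed instruction would copy into the C flag. It is an illustrative sketch assuming 32-bit `uint32_t` operands; `ShiftResult` and the function names are hypothetical, and register-specified shift amounts are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side model of the barrel shifter for this tutorial. */
typedef struct {
    uint32_t value;  /* shifted operand                              */
    bool carry;      /* shifter carry-out (copied to C by e.g. MOVS) */
} ShiftResult;

/* LSL #n, n = 0..31. LSL #0 leaves the value and the carry unchanged. */
ShiftResult lsl_imm(uint32_t v, unsigned n, bool c_in)
{
    if (n == 0)
        return (ShiftResult){ v, c_in };
    return (ShiftResult){ v << n, ((v >> (32u - n)) & 1u) != 0 };
}

/* LSR #n, n = 1..32: zeros enter from the left. */
ShiftResult lsr_imm(uint32_t v, unsigned n)
{
    if (n == 32)
        return (ShiftResult){ 0u, (v >> 31) != 0 };
    return (ShiftResult){ v >> n, ((v >> (n - 1u)) & 1u) != 0 };
}

/* ASR #n, n = 1..32: copies of bit 31 enter from the left. */
ShiftResult asr_imm(uint32_t v, unsigned n)
{
    bool negative = (v >> 31) != 0;
    if (n == 32)
        return (ShiftResult){ negative ? 0xFFFFFFFFu : 0u, negative };
    /* ~(~v >> n) sign-fills without relying on signed right shift. */
    uint32_t shifted = negative ? ~(~v >> n) : (v >> n);
    return (ShiftResult){ shifted, ((v >> (n - 1u)) & 1u) != 0 };
}

/* ROR #n, n = 1..31: the carry-out is bit 31 of the result. */
ShiftResult ror_imm(uint32_t v, unsigned n)
{
    uint32_t r = (v >> n) | (v << (32u - n));
    return (ShiftResult){ r, (r >> 31) != 0 };
}

/* RRX: the old C flag enters bit 31, bit 0 becomes the new carry. */
ShiftResult rrx(uint32_t v, bool c_in)
{
    return (ShiftResult){ ((uint32_t)c_in << 31) | (v >> 1), (v & 1u) != 0 };
}
```

For instance, `asr_imm(0x80000000, 4).value` is 0xF8000000, while `lsr_imm` on the same input would give 0x08000000; the difference is exactly the sign fill.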

Here are some examples of shift operations used in instructions:

Example Code 45-4 Register Shift Addressing

ADD R2, R0, R1, LSR #5 ; R2 = R0 + (R1 >> 5), logical shift right
MOV R1, R0, LSL #2 ; R1 = R0 << 2
SUB R1, R2, R0, LSR #4 ; R1 = R2 - (R0 >> 4)
MOV R2, R4, ROR R0 ; R2 = R4 rotated right by the amount held in the bottom byte of R0

Memory access instruction addressing mode

The addressing modes of memory access instructions can be divided into the following types.

⚫ The addressing mode of the Load/Store instruction of word and unsigned byte.

⚫ Addressing mode of miscellaneous Load/Store instructions.

⚫ Addressing mode of batch Load/Store instructions.

⚫ Addressing mode for coprocessor Load/Store instructions.

1. The addressing mode of the Load/Store instruction of word and unsigned byte

The syntax format of Load/Store instructions for words and unsigned bytes is as follows:

LDR|STR{<cond>}{B}{T} <Rd>,<addressing_mode>

In the table above, "!" indicates that the base address register is updated (written back) after the data transfer is completed.
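The three common forms of this addressing mode differ only in which address is accessed and what is left in the base register. The C sketch below models that for an immediate offset; the enum `IndexMode` and the function `ldr_str_address` are hypothetical names used only for illustration, and register or scaled-register offsets are not covered.

```c
#include <stdint.h>

/* Hypothetical host-side model of LDR/STR address calculation with an
 * immediate offset; register and scaled-register offsets are not modeled. */
typedef enum {
    IDX_OFFSET, /* [Rn, #off]  : access Rn + off, Rn unchanged       */
    IDX_PRE,    /* [Rn, #off]! : access Rn + off, then Rn = Rn + off */
    IDX_POST    /* [Rn], #off  : access Rn, then Rn = Rn + off       */
} IndexMode;

typedef struct {
    uint32_t address;   /* address read or written by LDR/STR */
    uint32_t new_base;  /* value left in Rn afterwards        */
} AccessResult;

AccessResult ldr_str_address(uint32_t rn, int32_t off, IndexMode mode)
{
    uint32_t sum = rn + (uint32_t)off;  /* wraps modulo 2^32, like the hardware */
    switch (mode) {
    case IDX_PRE:
        return (AccessResult){ sum, sum };
    case IDX_POST:
        return (AccessResult){ rn, sum };
    case IDX_OFFSET:
    default:
        return (AccessResult){ sum, rn };
    }
}
```

For example, `LDR R0, [R1, #4]!` with R1 = 0x1000 reads address 0x1004 and leaves 0x1004 in R1, whereas `LDR R0, [R1], #4` reads 0x1000 and then sets R1 to 0x1004.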

2. The addressing mode of miscellaneous Load/Store instructions

The syntax format of instructions using this type of addressing mode is as follows:

LDR|STR{<cond>}H|SH|SB|D <Rd>, <addressing_mode>

Instructions using this type of addressing mode include (signed/unsigned) halfword Load/Store instructions, signed byte Load/Store instructions, and double-word Load/Store instructions.

Table 45.2.2.2 <addressing_mode> addressing mode

3. Stack operation addressing mode

The stack operation addressing mode is very similar to the batch Load/Store instruction addressing mode. For stack operations, however, writing data to memory and reading it back use different addressing modes, because the push and pop operations must move the stack pointer in opposite directions.

The syntax format of this type of instruction is as follows:

LDM|STM{<cond>}<amode> <Rn>{!}, <registers>{^}

The following discusses in detail how to use the appropriate addressing mode to implement data stack operations.

Stacks are classified by where the stack pointer points and by the direction in which the stack grows:

1) Full stack: the stack pointer points to the top element of the stack (the last used location).

2) Empty stack: the stack pointer points to the first available location (the first unused location).

3) Descending stack: the stack grows toward decreasing memory addresses.

4) Ascending stack: the stack grows toward increasing memory addresses.

Combining these properties, the stack addressing mode (<amode>) takes one of the following 4 types.

1) Full Descending (FD).

2) Empty Descending (ED).

3) Full Ascending (FA).

4) Empty Ascending (EA).

The table lists the correspondence between the stack addressing modes and the batch Load/Store instruction addressing modes.
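The following C sketch simulates a full descending stack, the convention normally used for SP in ARM procedure calls, on a small array standing in for memory. It is an illustration only: `mem`, `STACK_WORDS`, `push_fd` and `pop_fd` are hypothetical names, and the comment at the top lists the usual equivalences between the stack mnemonics and the batch mnemonics.

```c
#include <stdint.h>

/* Hypothetical host-side simulation of a full descending (FD) stack.
 * Usual equivalences between stack and batch mnemonics:
 *   STMFD = STMDB   LDMFD = LDMIA   (full descending, e.g. PUSH/POP)
 *   STMED = STMDA   LDMED = LDMIB   (empty descending)
 *   STMFA = STMIB   LDMFA = LDMDA   (full ascending)
 *   STMEA = STMIA   LDMEA = LDMDB   (empty ascending)
 */
#define STACK_WORDS 64u

static uint32_t mem[STACK_WORDS];   /* toy memory: index = byte address / 4 */

/* STMFD sp!, {regs}: SP is decremented first; regs[0] (the lowest-numbered
 * register) ends up at the lowest address. Returns the written-back SP. */
uint32_t push_fd(uint32_t sp, const uint32_t *regs, unsigned count)
{
    sp -= 4u * count;
    for (unsigned i = 0; i < count; i++)
        mem[(sp + 4u * i) / 4u] = regs[i];
    return sp;
}

/* LDMFD sp!, {regs}: read upward from SP, then SP is incremented. */
uint32_t pop_fd(uint32_t sp, uint32_t *regs, unsigned count)
{
    for (unsigned i = 0; i < count; i++)
        regs[i] = mem[(sp + 4u * i) / 4u];
    return sp + 4u * count;
}
```

Pushing three registers from an initial SP of `4 * STACK_WORDS` leaves SP 12 bytes lower, with the first register at the lowest address; popping the same count restores SP and the register values.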

Batch Load/Store instruction addressing mode

Batch Load/Store instructions load data from a block of contiguous memory locations into a set of general-purpose registers, or store a set of general-purpose registers into contiguous memory locations.

The addressing mode of a batch Load/Store instruction generates an address range in memory. Registers and memory words are paired by a fixed rule: the lowest-numbered register corresponds to the lowest address in the range, and the highest-numbered register to the highest address, regardless of the order in which the registers are written in the list.
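The address range for each of the four batch addressing modes (IA, IB, DA, DB) can be written as a few formulas. The C sketch below computes, for a base register value and a register count, the first and last addresses accessed and the value written back to Rn when "!" is used. `BlockMode`, `BlockRange` and `block_range` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical host-side model of LDM/STM address ranges. */
typedef enum { IA, IB, DA, DB } BlockMode;

typedef struct {
    uint32_t start;      /* address of the lowest-numbered register  */
    uint32_t end;        /* address of the highest-numbered register */
    uint32_t writeback;  /* new Rn value when "!" is specified       */
} BlockRange;

/* rn: base address; n: number of registers in the list (1..16). */
BlockRange block_range(uint32_t rn, unsigned n, BlockMode mode)
{
    uint32_t size = 4u * n;
    BlockRange r;
    switch (mode) {
    case IA: r.start = rn;             r.writeback = rn + size; break;
    case IB: r.start = rn + 4u;        r.writeback = rn + size; break;
    case DA: r.start = rn - size + 4u; r.writeback = rn - size; break;
    case DB:
    default: r.start = rn - size;      r.writeback = rn - size; break;
    }
    r.end = r.start + size - 4u;       /* registers fill consecutive words */
    return r;
}
```

With Rn = 0x1000 and three registers, IA covers 0x1000 to 0x1008 and writes back 0x100C, while DB covers 0x0FF4 to 0x0FFC and writes back 0x0FF4.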

The syntax format of this type of instruction is as follows:

LDM|STM{<cond>}<amode> <Rn>{!}, <registers>{^}

Coprocessor Load/Store addressing mode

The syntax format of the coprocessor Load/Store instruction is as follows:

MCR<c> <coproc>, <opc1>, <Rt>, <CRn>, <CRm>{, <opc2>}

MRC<c> <coproc>, <opc1>, <Rt>, <CRn>, <CRm>{, <opc2>}

<c>: The condition code for the instruction execution. When <c> is omitted, the instruction is executed unconditionally

<coproc>: coprocessor name, range p0-p15;

<opc1>: coprocessor-specific opcode, range 0-7;

<Rt>: the ARM register involved in the transfer. MCR writes the contents of Rt to the coprocessor (Rt is the source); MRC reads the coprocessor register into Rt (Rt is the destination);

<CRn>: the primary coprocessor register being accessed;

<CRm>: an additional source or destination register in the coprocessor. If no additional information is required, set it to C0; otherwise the result is unpredictable;

<opc2>: optional coprocessor-specific opcode, set to 0 when not needed.
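To see how these operands end up in the machine code, here is a C sketch that packs them into a 32-bit ARM-state instruction word. The field layout follows the A1 encoding of MCR/MRC in the ARM Architecture Reference Manual, assuming an unconditional (AL) instruction; `encode_mcr_mrc` is a hypothetical helper, and its output is worth checking against the manual or a disassembler before relying on it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: encode an unconditional ARM-state MCR or MRC.
 *   31..28 cond (1110 = AL) | 27..24 1110 | 23..21 opc1 | 20 L | 19..16 CRn
 *   15..12 Rt | 11..8 coproc | 7..5 opc2 | 4 = 1 | 3..0 CRm
 * L = 1 for MRC (coprocessor -> Rt), L = 0 for MCR (Rt -> coprocessor). */
uint32_t encode_mcr_mrc(bool is_mrc, unsigned coproc, unsigned opc1,
                        unsigned rt, unsigned crn, unsigned crm, unsigned opc2)
{
    return (0xEu << 28) | (0xEu << 24) |
           ((opc1 & 0x7u) << 21) |
           ((is_mrc ? 1u : 0u) << 20) |
           ((crn & 0xFu) << 16) |
           ((rt & 0xFu) << 12) |
           ((coproc & 0xFu) << 8) |
           ((opc2 & 0x7u) << 5) |
           (1u << 4) |
           (crm & 0xFu);
}
```

The argument order mirrors the assembly operand order, so `mrc p15, 0, r1, c1, c0, 0` corresponds to `encode_mcr_mrc(true, 15, 0, 1, 1, 0, 0)`, which yields 0xEE111F10.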

The meanings of CRn, opc1, CRm and opc2 depend on the coprocessor being accessed; different values select different registers. The register layout of the CP15 coprocessor is given below.

For example, the cache-enable code mentioned earlier:

Code Example 45-5 Enable ICache

/******Cache Test*******/
mrc p15, 0, r1, c1, c0, 0 // read SCTLR into r1
orr r1, r1, #(1 << 2) // set C bit (bit 2): global enable for data and unified caches
orr r1, r1, #(1 << 12) // set I bit (bit 12): enable the instruction cache
mcr p15, 0, r1, c1, c0, 0 // write r1 back to SCTLR
/******End Test******/

If you need to operate other coprocessors, you can refer to the official documents of "ARM® Architecture Reference Manual" or "Cortex-A7 MPCore Technical Reference Manual".

ARM processor instruction set

Data processing instructions

Data manipulation instructions refer to instructions that operate on data stored in registers. It mainly includes data transfer instructions, arithmetic instructions, logic instructions, comparison and test instructions and multiplication instructions.

If the S suffix is appended to a data processing instruction, the result of the instruction updates the condition flags in the CPSR. The data processing instructions are shown in the table.

1. MOV instruction

The MOV instruction is the simplest ARM instruction. It copies a value N into the destination register Rd, where N can be a register, an immediate value or a shifted register.

The MOV instruction is mostly used to set the initial value or transfer data between registers.

The MOV instruction transfers the data represented by shifter_operand to the destination register Rd and, if S is specified, updates the corresponding condition flags in the CPSR according to the result.

Syntax:

MOV{<c>}{S} <Rd>,<shifter_operand>

{<c>}: The condition code for the instruction execution. When <c> is omitted, the instruction is executed unconditionally.

{S}: Determines whether the operation of the instruction affects CPSR.

<Rd>: destination register. If R15 (the PC) is the destination register, the program counter is modified (and, with S, the CPSR is restored from the SPSR). This is used, for example, to return to the calling code when a subroutine finishes, by copying the link register into R15.

<shifter_operand>: the data to be transferred to Rd, which can be an immediate value, a register, or a value obtained through a shift operation. Example:

Example Code 45-6 mov example

mov r0, r0 ; r0 = r0, effectively a NOP
mov r0, r0, lsl #3 ; r0 = r0 * 8
mov pc, lr ; return to the caller (normal subroutine return; PC is R15)
movs pc, lr ; return to the caller and copy SPSR to CPSR (exception return)

The MOV instruction mainly completes the following functions.

⚫ Transfer data from one register to another.

⚫ Transfer a constant value into a register.

⚫ When PC (R15) is used as the destination register, the instruction performs a jump. For example, "MOV PC, LR" returns from a subroutine, so this kind of jump can implement subroutine return (and, together with setting LR, subroutine calls) in place of the "B" and "BL" instructions.

⚫ When PC is the destination register and the S bit is set, the instruction also copies the SPSR of the current processor mode into the CPSR while performing the jump. "MOVS PC, LR" can therefore be used to return from certain exceptions.

MVN instruction

MVN (Move Not) transfers the bitwise complement (one's complement) of the operand to the destination register.

The MVN instruction is mostly used to transfer a negative number to a register or to generate a bit mask.

The MVN instruction transfers the one's complement of the data represented by shifter_operand to the destination register Rd, and updates the corresponding condition flag in CPSR according to the operation result.

Syntax:

MVN{<c>}{S} <Rd>,<shifter_operand>

{<c>}: The condition code for the instruction execution. When <c> is omitted, the instruction is executed unconditionally.

{S}: Determines whether the operation of the instruction affects CPSR.

<Rd>: destination register.

<shifter_operand>: the data whose complement is transferred to Rd; it can be an immediate value, a register, or a value obtained through a shift operation. MVN performs a bitwise logical NOT, not an arithmetic negation: adding 1 to the result gives the two's-complement negative of the operand.

Example:

Example code 45-7 mvn example

mvn r0, #4 ; r0 = ~4 = -5
mvn r0, #0 ; r0 = ~0 = -1 (0xFFFFFFFF)

The MVN instruction mainly completes the following functions:

⚫ Transfer a negative number to a register.

⚫ Generate bit mask (Bit Mask).

⚫ Find the one's complement of a number.

AND instruction

The AND instruction performs a bitwise AND of the value represented by shifter_operand and the value in register Rn, saves the result in the destination register Rd and, if S is specified, updates the condition flags in the CPSR according to the result.

Syntax:

AND{<c>}{S} <Rd>,<Rn>,<shifter_operand>

{<c>}: The condition code for the instruction execution. When <c> is omitted, the instruction is executed unconditionally.

{S}: Determines whether the operation of the instruction affects CPSR.

<Rd>: destination register.

<Rn>: the first operand register.

<shifter_operand>: the data to be ANDed with the Rn register

Example:

Example code 45-8 and example

and r0, r0, #3 ; keep bits 0 and 1 of r0, clear the rest
and r2, r1, r3 ; r2 = r1 & r3
ands r0, r0, #0x01 ; r0 = r0 & 0x01 (extract the lowest bit), update the flags

ORR instruction

ORR (Logical OR) is a logical OR operation instruction, which performs a bitwise "logical OR" operation on the value of the second source operand shifter_operand and the value of the register Rn, and saves the result in Rd.

Syntax:

ORR{<c>}{S} <Rd>,<Rn>,<shifter_operand>

{<c>}: The condition code for the instruction execution. When <c> is omitted, the instruction is executed unconditionally.

{S}: Determines whether the operation of the instruction affects CPSR.

<Rd>: destination register.

<Rn>: the first operand.

<shifter_operand>: the number to be logically ORed with Rn

Example:

Example code 45-9 orr example

orr r0, r0, #3 ; set bits 0 and 1 in r0
orr r0, r0, #0x0f ; set the low 4 bits of r0 to 1
; use orr to move the high 8 bits of r2 into the low 8 bits of r3 (r3 is shifted left by 8 first)
mov r1, r2, lsr #24 ; r1 = r2 >> 24
orr r3, r1, r3, lsl #8 ; r3 = (r3 << 8) | r1
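The last two instructions of Example code 45-9 can be written in C as below, which makes it easy to check the shift amounts on a host. This is a sketch for illustration; `merge_high_byte` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical host-side equivalent of:
 *   mov r1, r2, lsr #24      ; r1 = bits 31..24 of r2, moved to bits 7..0
 *   orr r3, r1, r3, lsl #8   ; r3 = (r3 << 8) | r1
 * The old bits 31..24 of r3 are shifted out and lost. */
uint32_t merge_high_byte(uint32_t r2, uint32_t r3)
{
    uint32_t r1 = r2 >> 24;
    return r1 | (r3 << 8);
}
```

With r2 = 0xAB123456 and r3 = 0x00112233, the result is 0x112233AB.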

BIC Bit Clear Instruction

BIC (Bit Clear) performs a bitwise AND of the value in register Rn with the complement (bitwise NOT) of the second source operand shifter_operand, and saves the result to Rd.

Syntax:

BIC{<c>}{S} <Rd>,<Rn>,<shifter_operand>

{<c>}: The condition code for the instruction execution. When <c> is omitted, the instruction is executed unconditionally.

{S}: Determines whether the operation of the instruction affects CPSR.

<Rd>: destination register.

<Rn>: the first operand.

<shifter_operand>: the bit mask. Its complement is ANDed with Rn, so every 1 bit in the mask clears the corresponding bit of Rn.

Example:

Example code 45-10 bic example

bic r0, r0, #0xB ; clear bits 0, 1 and 3 of r0 (0xB = 0b1011), leave the rest unchanged
bic r1, r2, r3 ; r1 = r2 & ~r3
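BIC is equivalent to ANDing with the inverted operand. The C sketch below mirrors Example code 45-10; the function name `bic` is hypothetical. Note that a mask such as 0x1011, whose set bits span 13 bit positions, cannot be used as the immediate here, because it cannot be written as an 8-bit value rotated right by an even number of bits.

```c
#include <stdint.h>

/* Hypothetical host-side model: BIC Rd, Rn, op computes Rn AND NOT op.
 * Every 1 bit in op clears the corresponding bit of Rn. */
uint32_t bic(uint32_t rn, uint32_t op)
{
    return rn & ~op;
}
```

`bic(0xFF, 0xB)` gives 0xF4: bits 0, 1 and 3 are cleared and the others are kept.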

EOR instruction

The EOR (Exclusive OR) instruction performs a bitwise exclusive OR of the value in register Rn and the value of shifter_operand, stores the result in the destination register Rd and, if S is specified, updates the corresponding condition flags in the CPSR according to the result.

Syntax:

EOR{<c>}{S} <Rd>,<Rn>,<shifter_operand>

{<c>}: The condition code for the instruction execution. When <c> is omitted, the instruction is executed unconditionally.

{S}: Determines whether the operation of the instruction affects CPSR.

<Rd>: destination register.

<Rn>: the first operand register.

<shifter_operand>: the data to be XORed with the Rn register

Example:

Example code 45-11 eor example

eor r0, r0, #3 ; invert bits 0 and 1 of r0
eor r1, r1, #0x0f ; invert the low 4 bits of r1
eor r2, r1, r0 ; r2 = r1 ^ r0
eors r0, r5, #0x01 ; r0 = r5 ^ 0x01, update the flags

SUB instruction

The SUB (Subtract) instruction subtracts the value represented by shifter_operand from the value in register Rn, saves the result to the destination register Rd and, if S is specified, sets the corresponding flag bits in the CPSR according to the result.

Syntax:

SUB{<c>}{S} <Rd>,<Rn>,<shifter_operand>

{<c>}: The condition code for the instruction execution. When <c> is omitted, the instruction is executed unconditionally.

{S}: Determines whether the operation of the instruction affects CPSR.

<Rd>: destination register.

<Rn>: minuend.

<shifter_operand>: subtrahend

Example:

Example code 45-12 sub example

sub r0, r1, r2 ; r0 = r1 - r2
sub r0, r1, #256 ; r0 = r1 - 256
sub r0, r2, r3, lsl #1 ; r0 = r2 - (r3 << 1)

RSB instruction

The RSB (Reverse Subtract) instruction subtracts the value in register Rn from the value represented by shifter_operand, saves the result to the destination register Rd and, if S is specified, sets the corresponding flag bits in the CPSR according to the result.

Syntax:

RSB{<c>}{S} <Rd>,<Rn>,<shifter_operand>

{<c>}: The condition code for the instruction execution. When <c> is omitted, the instruction is executed unconditionally.

{S}: Determines whether the operation of the instruction affects CPSR.

<Rd>: destination register.

<Rn>: subtrahend.

<shifter_operand>: minuend

Example:

The following sequence negates a 64-bit value. The 64-bit number is held in R1 (high word) and R0 (low word), and its negation is placed in R3 (high word) and R2 (low word).

Example code 45-13 rsb example

rsbs r2, r0, #0 ; low word: r2 = 0 - r0, C = NOT borrow
rsc r3, r1, #0 ; high word: r3 = 0 - r1 - NOT C
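The following C sketch reproduces Example code 45-13 on a host to show where the carry comes from: after RSBS computes 0 - R0, the C flag is 1 only when no borrow occurred, that is, when R0 was 0. `neg64` is a hypothetical name, and the flag is modeled explicitly rather than read from a real CPSR.

```c
#include <stdint.h>

/* Hypothetical host-side model of Example code 45-13:
 *   rsbs r2, r0, #0   ; r2 = 0 - r0; C = 1 only if no borrow, i.e. r0 == 0
 *   rsc  r3, r1, #0   ; r3 = 0 - r1 - NOT(C)
 * (r1:r0) is the input; (*hi_out:*lo_out) receives its negation. */
void neg64(uint32_t r1, uint32_t r0, uint32_t *hi_out, uint32_t *lo_out)
{
    uint32_t c = (r0 == 0) ? 1u : 0u;   /* C flag after RSBS  */
    *lo_out = 0u - r0;                  /* r2                 */
    *hi_out = 0u - r1 - (1u - c);       /* r3: subtract NOT C */
}
```

Negating 1 (R1 = 0, R0 = 1) gives 0xFFFFFFFF in both halves, the 64-bit representation of -1.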

SBC instruction

The SBC (Subtract with Carry) instruction is used for subtraction on operands wider than 32 bits. It subtracts the value represented by shifter_operand from the value in register Rn, then subtracts the inverse of the C flag in the CPSR, NOT(C), saves the result in the destination register Rd and, if S is specified, sets the corresponding flag bits in the CPSR according to the result.

Syntax:

SBC{<c>}{S} <Rd>,<Rn>,<shifter_operand>

{<c>}: The condition code for the instruction execution. When <c> is omitted, the instruction is executed unconditionally.

{S}: Determines whether the operation of the instruction affects CPSR.

<Rd>: destination register.

<Rn>: minuend.

<shifter_operand>: subtrahend

Example:

The following program uses SBC to implement 64-bit subtraction, (R1, R0)−(R3, R2), and the result is stored in (R1, R0).

Example code 45-14 sbc example

subs r0, r0, r2 ; low words: r0 = r0 - r2, C = NOT borrow
sbcs r1, r1, r3 ; high words: r1 = r1 - r3 - NOT C
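The SUBS/SBCS pair in Example code 45-14 can be modeled in C by tracking the C flag by hand. On ARM, C after a subtraction means "no borrow", so SBC subtracts NOT(C). The function `sub64` below is a hypothetical host-side sketch that returns the 64-bit result packed into a `uint64_t`.

```c
#include <stdint.h>

/* Hypothetical host-side model of Example code 45-14:
 *   subs r0, r0, r2   ; low words; C = 1 if no borrow (r0 >= r2, unsigned)
 *   sbcs r1, r1, r3   ; high words; r1 = r1 - r3 - NOT(C)
 * Returns (r1:r0) - (r3:r2) packed as a 64-bit value. */
uint64_t sub64(uint32_t r1, uint32_t r0, uint32_t r3, uint32_t r2)
{
    uint32_t c = (r0 >= r2) ? 1u : 0u;   /* C flag after SUBS    */
    uint32_t lo = r0 - r2;               /* new r0               */
    uint32_t hi = r1 - r3 - (1u - c);    /* SBC subtracts NOT(C) */
    return ((uint64_t)hi << 32) | lo;
}
```

For example, 0x1_00000000 - 1 borrows from the high word: `sub64(1, 0, 0, 1)` returns 0xFFFFFFFF.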

RSC instruction

The RSC (Reverse Subtract with Carry) instruction subtracts the value in register Rn from the value represented by shifter_operand, then subtracts the inverse of the C flag in the CPSR, NOT(C), saves the result in the destination register Rd and, if S is specified, sets the corresponding flag bits in the CPSR according to the result.

Syntax:

RSC{<c>}{S} <Rd>,<Rn>,<shifter_operand>

{<c>}: The condition code for the instruction execution. When <c> is omitted, the instruction is executed unconditionally.

{S}: Determines whether the operation of the instruction affects CPSR.

<Rd>: destination register.

<Rn>: subtrahend.

<shifter_operand>: minuend

Example:

The program below uses the RSC instruction to find the negative of a 64-bit value.

Example code 45-15 rsc example

rsbs r2, r0, #0 ; low word: r2 = 0 - r0, C = NOT borrow
rsc r3, r1, #0 ; high word: r3 = 0 - r1 - NOT C

ADD instruction

The ADD instruction adds the value represented by shifter_operand to the value in register Rn, saves the result to the destination register Rd and, if S is specified, sets the corresponding flag bits in the CPSR according to the result.

Syntax:

ADD{<c>}{S} <Rd>,<Rn>,<shifter_operand>

{<c>}: The condition code for the instruction execution. When <c> is omitted, the instruction is executed unconditionally.

{S}: Determines whether the operation of the instruction affects CPSR.

<Rd>: destination register.

<Rn>: the first operand.

<shifter_operand>: the number to be added

Example:

Example code 45-16 add example

add r0, r1, r2 ; r0 = r1 + r2
add r0, r1, #256 ; r0 = r1 + 256
add r0, r2, r3, lsl #1 ; r0 = r2 + (r3 << 1)

ADC instruction

The ADC instruction adds the value represented by shifter_operand, the value in register Rn and the value of the C flag in the CPSR, saves the result in the destination register Rd and, if S is specified, sets the corresponding flag bits in the CPSR according to the result.

Syntax:

ADC{<c>}{S} <Rd>,<Rn>,<shifter_operand>

{<c>}: The condition code for the instruction execution. When <c> is omitted, the instruction is executed unconditionally.

{S}: Determines whether the operation of the instruction affects CPSR.

<Rd>: destination register.

<Rn>: the first operand.

<shifter_operand>: the number to be added

Example:

The ADC instruction will add the two operands and place the result in the destination register. It uses a carry flag so that additions larger than 32 bits can be done. The following example will add two 128-bit numbers.

128-bit result: registers R0, R1, R2, and R3.

First 128-bit number: registers R4, R5, R6 and R7.

Second 128-bit number: registers R8, R9, R10, and R11.

Example Code 45-17 adc example

adds r0, r4, r8 ; add the lowest words, set C
adcs r1, r5, r9 ; add the next words plus carry
adcs r2, r6, r10 ; add the third words plus carry
adcs r3, r7, r11 ; add the highest words plus carry
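The carry chain in Example Code 45-17 can be mirrored in C by passing the carry from one 32-bit addition to the next. In this sketch (the names `adc32` and `add128` are hypothetical), word 0 is the least significant word, corresponding to R0, R4 and R8.

```c
#include <stdint.h>

/* Hypothetical host-side model of ADC: returns a + b + carry_in and
 * stores the carry out of bit 31 (the new C flag) in *carry_out. */
static uint32_t adc32(uint32_t a, uint32_t b, uint32_t carry_in,
                      uint32_t *carry_out)
{
    uint64_t sum = (uint64_t)a + b + carry_in;
    *carry_out = (uint32_t)(sum >> 32);
    return (uint32_t)sum;
}

/* result = x + y for 128-bit numbers stored as 4 words, word 0 least
 * significant: the ADDS / ADCS / ADCS / ADCS chain of Example Code 45-17. */
void add128(const uint32_t x[4], const uint32_t y[4], uint32_t result[4])
{
    uint32_t carry = 0;                  /* ADDS: no carry-in */
    for (int i = 0; i < 4; i++)
        result[i] = adc32(x[i], y[i], carry, &carry);
}
```

Adding 1 to 0x0000_0000_FFFF_FFFF_FFFF_FFFF ripples a carry through two words and sets bit 0 of word 2.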

CMP instruction

The CMP (Compare) instruction subtracts the value of shifter_operand from the value of register Rn, and updates the corresponding condition flag in CPSR according to the result of the operation, so that the following instructions can judge whether to execute according to the corresponding condition flag.

Syntax:

CMP{<c>} <Rn>,<shifter_operand>

{<c>}: The condition code for the instruction execution. When <c> is omitted, the instruction is executed unconditionally.

<Rn>: the first operand.

<shifter_operand>: the value to compare with Rn.

Example:

The CMP instruction compares the contents of one register with the contents of another register or with an immediate value, and changes the status flags so that later instructions can execute conditionally. It performs a subtraction but, instead of storing the result, only updates the flag bits. The flags reflect the relationship between operand 1 and operand 2 (greater than, less than or equal). For example, if operand 1 is greater than operand 2 (signed comparison), subsequent instructions with the GT suffix will be executed.

Note that CMP does not need an explicit S suffix to change the status flags.

Example code 45-18 cmp example

cmp r1, #10 ; compare r1 with 10 (computes r1 - 10 and sets the flags only)
beq loop ; branch to the label loop if r1 == 10

As the example shows, the difference between CMP and SUBS is that CMP does not save the result. When comparing two values, CMP is usually followed by instructions that use the appropriate condition code.
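The flags CMP produces are those of the subtraction Rn - shifter_operand. The C sketch below models them; `Flags`, `cmp_flags` and `cond_gt` are hypothetical names. The V formula uses the usual rule that a subtraction overflows when the operands have different signs and the sign of the result differs from that of Rn.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side model of the flags produced by CMP Rn, op. */
typedef struct {
    bool n, z, c, v;
} Flags;

Flags cmp_flags(uint32_t rn, uint32_t op)
{
    uint32_t diff = rn - op;                      /* computed, then discarded   */
    Flags f;
    f.n = (diff >> 31) != 0;                      /* result negative            */
    f.z = diff == 0;                              /* operands equal             */
    f.c = rn >= op;                               /* no borrow: unsigned rn >= op */
    f.v = (((rn ^ op) & (rn ^ diff)) >> 31) != 0; /* signed overflow            */
    return f;
}

/* Condition GT (signed greater than), evaluated from the flags. */
bool cond_gt(Flags f)
{
    return !f.z && f.n == f.v;
}
```

The same bit pattern can compare differently as signed and unsigned: comparing 0xFFFFFFFF with 1 sets C (unsigned HS holds), yet GT is false because the value is -1 when read as signed.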

CMN instruction

The CMN (Compare Negative) instruction subtracts the negative value of shifter_operand from the value of the register Rn, and updates the corresponding condition flag bit in the CPSR according to the operation result, so that subsequent instructions can judge whether to execute according to the corresponding condition flag.

Syntax:

CMN{<c>} <Rn>,<shifter_operand>

{<c>}: The condition code for the instruction execution. When <c> is omitted, the instruction is executed unconditionally.

<Rn>: the first operand.

<shifter_operand>: the value whose negative is compared with Rn.

Example:

The CMN instruction adds the value in register Rn to the value represented by shifter_operand and sets the condition flags in the CPSR according to the result of the addition. Because the flags are set as for an addition, they can differ slightly from those produced by subtracting the negated operand (CMP Rn, -shifter_operand): the C flag differs when shifter_operand is 0, and the V flag differs when it is 0x80000000.

The following instruction sets the flags from R0 + 1; Z is set if R0 + 1 == 0, that is, if R0 holds -1 (0xFFFFFFFF).

Example code 45-19 cmn example

cmn r0, #1 ; flags from r0 + 1: Z = 1 if r0 == -1

TST instruction

The TST (Test) instruction tests bits in a register against a value, usually a bit mask. The condition flags are set according to the result of the bitwise AND of the two operands; the result itself is discarded.

TST{<c>} <Rn>,<shifter_operand>

{<c>}: The condition code for the instruction execution. When <c> is omitted, the instruction is executed unconditionally.

<Rn>: the first operand.

<shifter_operand>: the bit mask to test against Rn.

Example:

The following instructions test bit 2 of R7 and add R7 to one of two registers depending on the result.

Example code 45-20 tst example

tst r7, #0x4 ; test bit 2 of r7: Z = 1 if the bit is clear
addeq r6, r7 ; bit 2 clear (EQ): r6 = r6 + r7
addne r8, r7 ; bit 2 set (NE): r8 = r8 + r7

The TST instruction is similar to CMP in that it does not write a result to a destination register. Instead, it operates on the two given operands and reflects the result in the status flags. Use TST to check whether specific bits are set: operand 1 is the data word to test and operand 2 is a bit mask. After the test, the Z flag is set if the AND result is zero (none of the masked bits are set) and cleared otherwise. Like CMP, TST does not need an S suffix.
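Example code 45-20 can be modeled in C as follows, which makes the EQ/NE sense of the test explicit. The sketch only tracks the Z flag; `tst_z` and `tst_example` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side model of TST: Z is set when (rn & mask) == 0. */
bool tst_z(uint32_t rn, uint32_t mask)
{
    return (rn & mask) == 0;
}

/* C equivalent of Example code 45-20. */
void tst_example(uint32_t r7, uint32_t *r6, uint32_t *r8)
{
    if (tst_z(r7, 0x4u))   /* EQ: bit 2 of r7 is clear */
        *r6 += r7;
    else                   /* NE: bit 2 of r7 is set   */
        *r8 += r7;
}
```

With r7 = 0x4 the bit is set, Z is clear, and only the ADDNE path runs.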

TEQ instruction

The TEQ (Test Equivalence) instruction compares the value of a register with another value. The condition flags are set according to the bitwise exclusive OR of the two operands, so that subsequent instructions can decide whether to execute based on those flags.

Syntax:

TEQ{<c>} <Rn>,<shifter_operand>

{<c>}: The condition code for the instruction execution. When <c> is omitted, the instruction is executed unconditionally.

<Rn>: the first operand.

<shifter_operand>: the value to compare with Rn.

Example:

Example Code 45-21 teq Example

teq r0, r1 ; flags from r0 ^ r1: Z = 1 if r0 == r1
addeq r0, r0, #1 ; if r0 == r1 (EQ), r0 = r0 + 1

Multiplication instructions

ARM multiply instructions multiply two values. Multiplying two 32-bit binary numbers gives a 64-bit product. Some ARM multiply instructions (the long multiplies) store the full product in two separate registers, while others keep only the least significant 32 bits in one register. Multiply-accumulate variants add the product to a running sum, and both signed and unsigned operands are supported. The least significant 32 bits of the product are the same whether the operands are treated as signed or unsigned, so multiply instructions that keep only a 32-bit result do not need separate signed and unsigned versions.

The functions of various multiplication instructions are shown in the table.

illustrate:

1) "RdHi:RdLo" is a 64-bit concatenation of RdHi (most significant 32 bits) and RdLo (least significant 32 bits)

number, "[31:0]" selects only the least significant 32 bits of the result.

2) Simple assignments are represented by ":=".

3) Accumulation (adding the right to the left) is indicated by "+=".

4) The S bit in each multiply instruction (see the syntax of each instruction below) controls whether the condition flags are updated. When S is specified, the flags are set as follows:

a) N is set to bit 31 of Rd for the forms that produce a 32-bit result, and to bit 31 of RdHi for the forms that produce a 64-bit (long) result.

b) Z is set if Rd is zero for the 32-bit forms, and if both RdHi and RdLo are zero for the long forms.

c) C is set to a meaningless value. (In ARMv7, the architecture of the Cortex-A7, the C flag is left unchanged.)

d) V is left unchanged.
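The N and Z rules above can be written as a short C sketch. It models only those two flags, and the type and function names are made up for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool n; /* negative: copy of the result's top bit */
    bool z; /* zero: set when the whole result is zero */
} nz_flags;

/* Flags for MULS/MLAS (32-bit result in Rd). */
nz_flags flags_32(uint32_t rd)
{
    nz_flags f = { (rd >> 31) != 0, rd == 0 };
    return f;
}

/* Flags for UMULLS/UMLALS/SMULLS/SMLALS (64-bit result in RdHi:RdLo). */
nz_flags flags_64(uint32_t rdhi, uint32_t rdlo)
{
    nz_flags f = { (rdhi >> 31) != 0, rdhi == 0 && rdlo == 0 };
    return f;
}
```

Note that for the long forms a zero RdHi alone does not set Z; both halves must be zero.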

 MUL instruction

The MUL (Multiply) 32-bit multiplication instruction multiplies the values in Rm and Rs and stores the least significant 32 bits of the result in Rd.

The syntax format of the command: 

MUL{<c>}{S} <Rd>,<Rm>,<Rs>

{<c>}: The condition code for the instruction execution. When <c> is omitted, the instruction is executed unconditionally.

{S}: Determines whether the operation of the instruction affects CPSR.

<Rd>: destination register.

<Rm>: the first operand.

<Rs>: the second operand, multiplied by Rm.

Command example:

Example Code 45-22 mul Example

mul r1, r2, r3 ; r1 = r2 × r3

muls r0, r3, r7 ; r0 = r3 × r7, also updating the N and Z flags
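A C model of MUL makes the truncation visible. This is a sketch of the arithmetic only (the function name mul is hypothetical): it computes the product in 64 bits and keeps the low word, as Rd does.

```c
#include <stdint.h>

/* MUL Rd, Rm, Rs: Rd = (Rm * Rs)[31:0]; the high 32 bits are discarded. */
uint32_t mul(uint32_t rm, uint32_t rs)
{
    return (uint32_t)((uint64_t)rm * rs);
}
```

For example, 0x10000 × 0x10000 is 2^32, so MUL leaves 0 in Rd and the overflow is silently lost.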

MLA instruction

The MLA (Multiply Accumulate) 32-bit multiply-accumulate instruction multiplies the values in Rm and Rs, adds the product to the third operand Rn, and stores the least significant 32 bits of the result in Rd.

The syntax format of the command:

MLA{<c>}{S} <Rd>,<Rm>,<Rs>,<Rn>

{<c>}: The condition code for the instruction execution. When <c> is omitted, the instruction is executed unconditionally.

{S}: Determines whether the operation of the instruction affects CPSR.

<Rd>: destination register.

<Rm>: The first multiplier.

<Rs>: Second multiplier.

<Rn>: add to the product of Rm and Rs.

Command example:

The following instruction completes the operation of R1 = R2×R3+R0.

Example code 45-23 mla example

mla r1, r2, r3, r0 ; r1 = r2 × r3 + r0
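MLA adds the accumulator before truncating, which the following C sketch models (hypothetical function name; it assumes the operand order Rd = Rm × Rs + Rn from the syntax above):

```c
#include <stdint.h>

/* MLA Rd, Rm, Rs, Rn: Rd = (Rm * Rs + Rn)[31:0]. */
uint32_t mla(uint32_t rm, uint32_t rs, uint32_t rn)
{
    /* The exact sum fits in 64 bits: (2^32-1)^2 + (2^32-1) < 2^64. */
    return (uint32_t)((uint64_t)rm * rs + rn);
}
```

The sum wraps modulo 2^32, so 0xFFFFFFFF × 1 + 1 leaves 0 in Rd.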

UMULL instruction

UMULL (Unsigned Multiply Long) is a 64-bit unsigned multiply instruction. It multiplies the values in Rm and Rs as unsigned numbers, stores the low 32 bits of the result in RdLo, and stores the high 32 bits in RdHi.

The syntax format of the command:

UMULL{<c>}{S} <RdLo>,<RdHi>,<Rm>,<Rs>

{<c>}: The condition code for the instruction execution. When <c> is omitted, the instruction is executed unconditionally.

{S}: Determines whether the operation of the instruction affects CPSR.

<Rm>: The first multiplier.

<Rs>: second multiplier

<RdHi>: receives the high 32 bits of the product of Rm and Rs.

<RdLo>: receives the low 32 bits of the product of Rm and Rs.

Command example:

Example Code 45-24 umull example

umull r0, r1, r5, r8 ; (r1,r0) = r5 × r8 (unsigned)
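The following C sketch models UMULL by computing the exact 64-bit unsigned product and splitting it into the two destination registers (the pointer parameters stand in for RdLo and RdHi; the function name is hypothetical):

```c
#include <stdint.h>

/* UMULL RdLo, RdHi, Rm, Rs: (RdHi:RdLo) = Rm * Rs, operands unsigned. */
void umull(uint32_t *rdlo, uint32_t *rdhi, uint32_t rm, uint32_t rs)
{
    uint64_t product = (uint64_t)rm * rs; /* exact: always fits in 64 bits */
    *rdlo = (uint32_t)product;            /* low 32 bits */
    *rdhi = (uint32_t)(product >> 32);    /* high 32 bits */
}
```

For example, 0xFFFFFFFF × 0xFFFFFFFF gives RdHi = 0xFFFFFFFE and RdLo = 0x00000001.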

UMLAL instruction

UMLAL (Unsigned Multiply Accumulate Long) is a 64-bit unsigned multiply-accumulate instruction. It multiplies the values in Rm and Rs as unsigned numbers, adds the 64-bit product to the 64-bit value held in RdHi:RdLo, stores the low 32 bits of the result in RdLo, and stores the high 32 bits in RdHi.

The syntax format of the command:

UMLAL{<c>}{S} <RdLo>,<RdHi>,<Rm>,<Rs>

{<c>}: The condition code for the instruction execution. When <c> is omitted, the instruction is executed unconditionally.

{S}: Determines whether the operation of the instruction affects CPSR.

<Rm>: The first multiplier.

<Rs>: second multiplier

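<RdHi>: holds the high 32 bits of the accumulator before the instruction, and receives the high 32 bits of the result.

<RdLo>: holds the low 32 bits of the accumulator before the instruction, and receives the low 32 bits of the result.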

Command example:

Example code 45-25 umlal example

umlal r0, r1, r5, r8 ; (r1,r0) = r5 × r8 + (r1,r0)
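UMLAL treats RdHi:RdLo as one 64-bit accumulator, so a carry out of the low word propagates into the high word. A C sketch (hypothetical names; the pointers stand in for RdLo and RdHi, which are both read and written):

```c
#include <stdint.h>

/* UMLAL RdLo, RdHi, Rm, Rs: (RdHi:RdLo) += Rm * Rs, operands unsigned. */
void umlal(uint32_t *rdlo, uint32_t *rdhi, uint32_t rm, uint32_t rs)
{
    uint64_t acc = ((uint64_t)*rdhi << 32) | *rdlo; /* current RdHi:RdLo */
    acc += (uint64_t)rm * rs;                       /* wraps modulo 2^64 */
    *rdlo = (uint32_t)acc;
    *rdhi = (uint32_t)(acc >> 32);
}
```

With RdHi:RdLo = 0x00000000:FFFFFFFF, adding 1 × 1 gives 0x00000001:00000000.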

SMULL instruction

SMULL (Signed Multiply Long) is a 64-bit signed multiply instruction. It multiplies the values in Rm and Rs as signed (two's complement) numbers, stores the low 32 bits of the result in RdLo, and stores the high 32 bits in RdHi.

The syntax format of the command:

SMULL{<c>}{S} <RdLo>,<RdHi>,<Rm>,<Rs>

{<c>}: The condition code for the instruction execution. When <c> is omitted, the instruction is executed unconditionally.

{S}: Determines whether the operation of the instruction affects CPSR.

<Rm>: The first multiplier.

<Rs>: second multiplier

<RdHi>: receives the high 32 bits of the product of Rm and Rs.

<RdLo>: receives the low 32 bits of the product of Rm and Rs.

Command example:

Example Code 45-26 smull example

smull r2, r3, r7, r6 ; (r3,r2) = r7 × r6 (signed)
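For SMULL the operands are interpreted as two's complement values before multiplying. A C sketch of that (hypothetical function name; signed operands are passed as int32_t):

```c
#include <stdint.h>

/* SMULL RdLo, RdHi, Rm, Rs: (RdHi:RdLo) = Rm * Rs, operands signed. */
void smull(uint32_t *rdlo, uint32_t *rdhi, int32_t rm, int32_t rs)
{
    int64_t product = (int64_t)rm * rs; /* exact signed 64-bit product */
    uint64_t bits = (uint64_t)product;  /* its two's complement bit pattern */
    *rdlo = (uint32_t)bits;
    *rdhi = (uint32_t)(bits >> 32);
}
```

-2 × 3 produces RdHi = 0xFFFFFFFF and RdLo = 0xFFFFFFFA, the 64-bit two's complement form of -6.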

SMLAL instruction

SMLAL (Signed Multiply Accumulate Long) is a 64-bit signed multiply-accumulate instruction. It multiplies the values in Rm and Rs as signed numbers, adds the 64-bit product to RdHi:RdLo, stores the low 32 bits of the result in RdLo, and stores the high 32 bits in RdHi.

The syntax format of the command:

SMLAL{<c>}{S} <RdLo>,<RdHi>,<Rm>,<Rs>

{<c>}: The condition code for the instruction execution. When <c> is omitted, the instruction is executed unconditionally.

{S}: Determines whether the operation of the instruction affects CPSR.

<Rm>: The first multiplier.

<Rs>: second multiplier

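<RdHi>: holds the high 32 bits of the accumulator before the instruction, and receives the high 32 bits of the result.

<RdLo>: holds the low 32 bits of the accumulator before the instruction, and receives the low 32 bits of the result.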

Command example:

Example code 45-27 smlal example

smlal r2, r3, r7, r6 ; (r3,r2) = r7 × r6 + (r3,r2)
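SMLAL adds the signed product to the 64-bit accumulator. Doing the addition on the raw 64-bit bit pattern in C keeps the arithmetic well-defined and matches the modulo-2^64 behavior of the register pair (a sketch; the function name is hypothetical):

```c
#include <stdint.h>

/* SMLAL RdLo, RdHi, Rm, Rs: (RdHi:RdLo) += Rm * Rs, operands signed. */
void smlal(uint32_t *rdlo, uint32_t *rdhi, int32_t rm, int32_t rs)
{
    uint64_t acc = ((uint64_t)*rdhi << 32) | *rdlo; /* RdHi:RdLo as raw bits */
    acc += (uint64_t)((int64_t)rm * rs);            /* add signed product modulo 2^64 */
    *rdlo = (uint32_t)acc;
    *rdhi = (uint32_t)(acc >> 32);
}
```

Starting from -6 (0xFFFFFFFF:FFFFFFFA) and adding 2 × 3 leaves 0 in both registers.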

Load/Store instructions

Load/Store memory access instructions transfer data between ARM registers and memory. ARM provides three basic types of data transfer instructions:

⚫ Single register Load/Store instruction

These instructions are the most flexible way to move a single data item between an ARM register and memory. The data item can be a byte, a 16-bit halfword, or a 32-bit word.

⚫ Multi-register Load/Store memory access instructions

These instructions are less flexible than single-register transfer instructions, but they transfer large amounts of data more efficiently. They are used for procedure entry and exit, for saving and restoring working registers, and for copying blocks of data in memory.

⚫ Single register swap instruction

These instructions exchange a value in a register with a value in memory, effectively performing a load and a store in a single instruction. They are rarely used in user-level programs. Their main purpose is to implement semaphore operations in multiprocessor systems, ensuring that shared data structures are not accessed by more than one processor at the same time.

1. Single-register Load/Store instructions

LDR instruction

The LDR instruction is used to read a 32-bit word from memory into the destination register.

The syntax format of the command:

LDR{<c>} <Rd>,<addr_mode>

Command example:

Example Code 45-28 ldr example

ldr r1,[r0,#0x12] ; load the word at address r0+0x12 into r1 (r0 is unchanged)

ldr r1,[r0] ; load the word at address r0 into r1 (zero offset)

ldr r1,[r0,r2] ; load the word at address r0+r2 into r1 (r0 is unchanged)

ldr r1,[r0,r2,lsl #2] ; load the word at address r0+r2×4 into r1 (r0 and r2 are unchanged)

ldr pc,[pc,#0x18] ; load the word at address pc+0x18 into pc, i.e. jump to the address stored there (pc reads as this instruction's address + 8)

ldr rd,label ; load the word stored at label; label must be within ±4 KB of the current instruction

ldr rd,[rn],#0x04 ; post-indexed: load the word at address rn, then rn = rn + 0x04 (rn must not be r15)
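The indexing forms used in these examples differ only in which address is accessed and whether Rn is updated. The C sketch below models that address calculation. The enum and function names are hypothetical, and the pre-indexed form "[Rn, #off]!", which writes the address back before the access, is included for completeness even though no example above uses it. A register offset such as "[r0, r2, lsl #2]" corresponds to passing r2 << 2 as the offset.

```c
#include <stdint.h>

typedef enum {
    ADDR_OFFSET,       /* [Rn, #off]  : access Rn+off, Rn unchanged          */
    ADDR_PRE_INDEXED,  /* [Rn, #off]! : Rn = Rn+off, then access Rn          */
    ADDR_POST_INDEXED  /* [Rn], #off  : access Rn, then Rn = Rn+off          */
} addr_mode;

/* Returns the address LDR/STR would access and applies write-back to *rn. */
uint32_t effective_address(uint32_t *rn, int32_t offset, addr_mode mode)
{
    uint32_t updated = *rn + (uint32_t)offset; /* modulo 2^32, so negative offsets work */

    switch (mode) {
    case ADDR_OFFSET:
        return updated;
    case ADDR_PRE_INDEXED:
        *rn = updated;
        return updated;
    case ADDR_POST_INDEXED:
    default: {
        uint32_t addr = *rn; /* access uses the old base */
        *rn = updated;
        return addr;
    }
    }
}
```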

STR instruction

The STR instruction is used to write a 32-bit word data to the memory unit specified in the instruction.

The syntax format of the command:

STR{<c>} <Rd>,<addr_mode>

Command example:

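Sample code 45-29 str example: writing data to memory

ldr r0, =0xE0200000 ; r0 = base address (loaded from the literal pool)

ldr r1, =0x00002222 ; r1 = value to write

str r1, [r0, #0x20] ; store r1 to the word at address r0+0x20 (0xE0200020)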

LDRB instruction

The LDRB instruction loads 1 byte (8 bits) from the address given by addr_mode into the destination register Rd and zero-extends it: the upper 24 bits of Rd are cleared.

The syntax format of the command:

LDRB{<c>} <Rd>, <addr_mode>

STRB instruction

The STRB instruction stores the least significant byte (bits[7:0]) of Rd to the memory byte at the address given by addr_mode. The other bits of Rd are ignored, and Rd itself is not modified.

The syntax format of the command:

STRB{<c>} <Rd>,<addr_mode>

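LDRH instruction

The LDRH instruction loads a 16-bit halfword from memory into the destination register and zero-extends it: the upper 16 bits of Rd are cleared.

If the address is not halfword-aligned (that is, not even), the result is unpredictable on older ARM architectures; ARMv7 cores such as the Cortex-A7 can perform unaligned halfword accesses to normal memory when alignment checking is disabled.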

The syntax format of the command:

LDRH{<c>} <Rd>,<addr_mode>

STRH instruction

The STRH instruction stores the least significant halfword (bits[15:0]) of Rd to the memory halfword at the address given by addr_mode. The upper 16 bits of Rd are ignored.

The syntax format of the command:

STRH{<c>} <Rd>,<addr_mode>
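The zero-extension performed by LDRB/LDRH and the truncation performed by STRB/STRH can be modelled in C over a byte array. This sketch assumes little-endian byte order (the usual configuration), and the function names are hypothetical:

```c
#include <stdint.h>

/* mem[addr] is the byte at address addr (little-endian model). */

uint32_t ldrb(const uint8_t *mem, uint32_t addr)
{
    return mem[addr]; /* zero-extended: bits[31:8] of Rd are 0 */
}

uint32_t ldrh(const uint8_t *mem, uint32_t addr)
{
    /* low byte at addr, high byte at addr+1; bits[31:16] of Rd are 0 */
    return (uint32_t)mem[addr] | ((uint32_t)mem[addr + 1] << 8);
}

void strb(uint8_t *mem, uint32_t addr, uint32_t rd)
{
    mem[addr] = (uint8_t)(rd & 0xFFu); /* only bits[7:0] of Rd are stored */
}

void strh(uint8_t *mem, uint32_t addr, uint32_t rd)
{
    mem[addr]     = (uint8_t)(rd & 0xFFu);        /* bits[7:0]  */
    mem[addr + 1] = (uint8_t)((rd >> 8) & 0xFFu); /* bits[15:8] */
}
```

Loading the byte 0xF0 gives 0x000000F0, not 0xFFFFFFF0; sign-extending loads are separate instructions (LDRSB, LDRSH).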

Multi-register Load/Store memory access instruction

Multi-register Load/Store memory access instructions, also called block (bulk) load/store instructions, transfer data between a group of registers and a block of consecutive memory locations. LDM loads multiple registers and STM stores multiple registers. A single instruction can transfer any subset of the 16 registers, or all of them. These instructions are mainly used for saving and restoring context, copying data, and passing parameters. They are listed in the table.

Table 45.3.3.2 Multiple Register Operation Instructions

LDM instruction

The LDM instruction loads data from consecutive memory locations into the registers in the register list. If the PC is in the register list, the word loaded into it is treated as a target address: after the instruction completes, execution continues from that address, so the instruction also performs a jump.

The syntax format of the command:

LDM{<c>}<addressing_mode> <Rn>{!}, <registers>

STM instruction

The STM instruction writes the value of each register in the register list in the instruction to a continuous memory unit. It is mainly used for writing block data, operating data stack and saving related registers when entering a subroutine.

The syntax format of the command:

STM{<c>}<addressing_mode> <Rn>{!}, <registers>

Example of batch data transfer instruction

The general format of the LDM/STM block transfer instructions is:

LDM{c}<mode> Rn{!},reglist{^}

STM{c}<mode> Rn{!},reglist{^}

LDM/STM are mainly used for saving and restoring context, copying data, and passing parameters. There are 8 addressing modes: the first 4 are used for data block transfers, and the last 4 for stack operations.

(1) IA (Increment After): the address is incremented by 4 after each transfer.

(2) IB (Increment Before): the address is incremented by 4 before each transfer.

(3) DA (Decrement After): the address is decremented by 4 after each transfer.

(4) DB (Decrement Before): the address is decremented by 4 before each transfer.

(5) FD: full descending stack.

(6) ED: empty descending stack.

(7) FA: full ascending stack.

(8) EA: empty ascending stack.

Rn is the base register; it holds the starting address of the transfer and must not be R15. The "!" suffix writes the final address back to Rn. The register list reglist contains one or more registers or register ranges separated by commas, such as {R1, R2, R6-R9}; registers are transferred in ascending order, with the lowest-numbered register at the lowest address. The "^" suffix must not be used in User or System mode; it is only for privileged exception modes. If "^" is used with LDM and the PC is in the register list, the SPSR is also copied to the CPSR, which is how an exception handler returns. If "^" is used and the register list does not contain the PC, the User mode registers are transferred instead of the registers of the current mode.
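The four block-transfer modes can be summarized by the addresses they touch. The C sketch below (hypothetical names) computes, for n registers, the address used by each register and the value written back to Rn when "!" is present. The stack modes are aliases of these: STMFD/LDMFD are STMDB/LDMIA, STMED/LDMED are STMDA/LDMIB, STMFA/LDMFA are STMIB/LDMDA, and STMEA/LDMEA are STMIA/LDMDB.

```c
#include <stdint.h>

typedef enum { MODE_IA, MODE_IB, MODE_DA, MODE_DB } block_mode;

/* Fills addr[i] with the address used for the i-th register of the list
   (in ascending register order) and returns the write-back value of Rn. */
uint32_t block_addresses(block_mode mode, uint32_t rn, unsigned n, uint32_t addr[])
{
    uint32_t size = 4u * n;
    uint32_t start;     /* address of the lowest-numbered register */
    uint32_t writeback; /* value of Rn after the transfer if "!" is used */

    switch (mode) {
    case MODE_IA: start = rn;             writeback = rn + size; break;
    case MODE_IB: start = rn + 4u;        writeback = rn + size; break;
    case MODE_DA: start = rn - size + 4u; writeback = rn - size; break;
    case MODE_DB:
    default:      start = rn - size;      writeback = rn - size; break;
    }
    for (unsigned i = 0; i < n; i++)
        addr[i] = start + 4u * i; /* lowest register at the lowest address */
    return writeback;
}
```

With Rn = 0x100 and three registers, DB uses 0xF4, 0xF8 and 0xFC and writes back 0xF4, which is what STMFD SP!, {...} does when pushing.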

Sample Code 45-30 Bulk Data Transfer Instructions

LDMIA R0!,{R3-R9} ; load words from the address in R0 into R3-R9, and update R0

STMIA R1!,{R3-R9} ; store R3-R9 to the address in R1, and update R1

STMFD SP!,{R0-R7,LR} ; save context: push R0-R7 and LR onto the stack

LDMFD SP!,{R0-R7,PC}^ ; restore context and return from the exception (SPSR is copied to CPSR)

To copy data, first set up the source and destination pointers, then read and store with the block copy forms LDMIA/STMIA, LDMIB/STMIB, LDMDA/STMDA, or LDMDB/STMDB. For stack operations, first set up the stack pointer (usually SP), then use the stack forms STMFD/LDMFD, STMED/LDMED, STMFA/LDMFA, or STMEA/LDMEA. The table shows, for each mode, whether data is stored above or below the base address, whether the address is updated before or after each transfer, and whether it is incremented or decremented.

Copying data with LDM/STM.

Example Code 45-31 Data Replication

LDR R0,=SrcData ; set the source address

LDR R1,=DstData ; set the destination address

LDMIA R0,{R2-R9} ; load 8 words into R2-R9

STMIA R1,{R2-R9} ; store R2-R9 to the destination address

Use LDM/STM to save and restore registers (context protection), typically in subroutines or exception handlers.

Example Code 45-32 Saving and Restoring Registers

SENDBYTE:

STMFD SP!,{R0-R7,LR} ; push registers to save them

...

BL DELAY ; call the DELAY subroutine

...

LDMFD SP!,{R0-R7,PC} ; restore registers and return

single data exchange instruction

The swap instruction is a special case of the Load/Store instructions: it exchanges the contents of a memory location with the contents of a register. The swap is an atomic operation: the memory location is read and written in consecutive bus operations, and no other access to that location can occur in between. The swap instructions are shown in the table.

SWP word swap instruction

The SWP instruction exchanges a word in memory with the value of a register. Assuming the memory address is held in register <Rn>, the instruction loads the word at that address into the destination register Rd and, at the same time, writes the contents of another register <Rm> to that memory word.

When <Rd> and <Rm> are the same register, the instruction exchanges the contents of the register and the memory unit.

The syntax format of the command:

SWP{<c>} <Rd>,<Rm>,[<Rn>]

The SWPB instruction exchanges a memory byte with the low 8 bits of a register. Assuming the memory address is held in register <Rn>, the instruction loads the byte at that address into the destination register Rd, clearing the upper 24 bits of Rd, and writes the low 8 bits of another register <Rm> to that memory byte. When <Rd> and <Rm> are the same register, the instruction exchanges the low 8 bits of the register with the memory byte.

The syntax format of the command:

SWP{<c>}B <Rd>,<Rm>,[<Rn>]

Command example:

Example code 45-33 swp command example

SWP R1, R1, [R0] ; exchange the content of R1 with the word at the address in R0

SWPB R1,R2,[R0] ; load the byte at the address in R0 into R1 (upper 24 bits cleared) and write the low byte of R2 to that memory byte

The SWP instruction makes semaphore operations easy to implement.
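Conceptually, SWP is an atomic exchange. The C sketch below expresses it with C11 atomic_exchange and builds a hypothetical spin lock on top of it, the semaphore pattern mentioned above. Note that SWP/SWPB are deprecated from ARMv6 onward in favor of LDREX/STREX, and compilers targeting ARMv7 cores such as the Cortex-A7 typically implement atomic_exchange with LDREX/STREX rather than SWP.

```c
#include <stdatomic.h>
#include <stdint.h>

/* SWP Rd, Rm, [Rn]: returns the old word at *mem (the new Rd) and
   stores rm there, as one indivisible operation. */
uint32_t swp(_Atomic uint32_t *mem, uint32_t rm)
{
    return atomic_exchange(mem, rm);
}

/* Hypothetical spin lock: 0 = free, 1 = taken. */
void lock_acquire(_Atomic uint32_t *lock)
{
    while (swp(lock, 1u) != 0u) {
        /* another agent holds the lock: keep retrying */
    }
}

void lock_release(_Atomic uint32_t *lock)
{
    atomic_store(lock, 0u);
}
```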

jump instruction

The branch (B) and branch with link (BL) instructions, called jump and jump-with-link instructions in this tutorial, are the standard way of changing the order in which instructions are executed. ARM normally executes instructions in sequential word-address order and uses conditional execution to skip individual instructions where necessary. Whenever a program must execute out of order, a control flow instruction modifies the program counter. B and BL are the standard way to do this, although other methods exist for specific situations. Jump instructions change the execution flow of a program or call a subroutine, which is what makes subroutines, if-then-else structures, and loops possible. A change in execution flow forces the program counter (PC) to point to a new address.

Another way to jump is to write the target address directly into the PC register, which can reach any address in the 4 GB address space. Such a jump is also called a long jump. If an instruction such as "MOV LR, PC" is executed immediately before the long jump, the return address is saved in LR, which allows a subroutine call anywhere in the 4 GB address space.

1. Jump instruction B and jump-with-link instruction BL

The jump instruction B makes the program jump to the specified address and continue execution there. The jump-with-link instruction BL first copies the address of the next instruction into R14 (the link register, LR), then jumps to the specified address. Note that both instructions, and the instruction at the target address, must belong to the ARM instruction set. Both instructions can be executed conditionally, depending on the condition flags in the CPSR.

The syntax format of the command:

B{L}{<c>} <target_address>

The BL instruction is used to implement subroutine calls. Subroutine return can be realized by copying the value of LR register to PC register. The following 3 instructions can realize subroutine return.

1) BX R14 (if the architecture supports the BX instruction).

2) MOV PC,R14.

3) When the subroutine uses the push instruction at the entry point:

STMFD R13!,{<registers>,R14}

You can use the command:

LDMFD R13!,{<registers>,PC} 

Put the subroutine return address into PC.

The ARM assembler calculates signed_immed_24 in the instruction encoding by the following steps.

1) Take the value of the PC register as the base address of the jump. Because of the pipeline, this is the address of the jump instruction plus 8.

2) Subtract the above-mentioned jump base address from the jump target address to generate a byte offset. Since ARM instructions are word aligned, this byte offset is a multiple of 4.

3) If the byte offset lies outside the range -33,554,432 to +33,554,428 (about ±32 MB), it cannot be encoded directly, and different assemblers use different code generation strategies. Otherwise, signed_immed_24 in the instruction word is set to bits[25:2] of the byte offset.
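The three encoding steps above can be checked with a small C sketch (hypothetical function names) that computes signed_immed_24 for a B/BL at a given address and decodes it back. As a sanity check, "b ." (a branch to itself) has offset -8 and encodes as 0xFFFFFE, matching the familiar instruction word 0xEAFFFFFE.

```c
#include <stdbool.h>
#include <stdint.h>

/* Computes signed_immed_24 for a B/BL at insn_addr jumping to target.
   Returns false if the offset is not a multiple of 4 or is out of range. */
bool encode_branch(uint32_t insn_addr, uint32_t target, uint32_t *imm24)
{
    /* Step 1: the base is the PC value, i.e. the instruction address + 8. */
    int64_t offset = (int64_t)target - ((int64_t)insn_addr + 8);

    /* Steps 2 and 3: word-aligned and within the 26-bit signed range. */
    if (offset % 4 != 0 || offset < -33554432 || offset > 33554428)
        return false;

    *imm24 = (uint32_t)(offset / 4) & 0x00FFFFFFu; /* bits[25:2] of the offset */
    return true;
}

/* Inverse: sign-extend imm24, shift left by 2, add the PC (insn_addr + 8). */
uint32_t decode_branch(uint32_t insn_addr, uint32_t imm24)
{
    uint32_t offset = imm24 & 0x00FFFFFFu;
    if (offset & 0x00800000u)
        offset |= 0xFF000000u;               /* sign extension */
    return insn_addr + 8u + (offset << 2);   /* arithmetic modulo 2^32 */
}
```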

Program example:

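The program jumps to the label named label; the three instructions in between are skipped.

Example code 45-34 jumps to the label

b label ; jump forward to label

add r1,r2,#4 ; skipped

add r3,r2,#8 ; skipped

sub r3,r3,r1 ; skipped

label:

sub r1,r2,#8 ; execution continues here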

jump to absolute address

Example code 45-35 jumps to an absolute address 

b 0x1234 ; jump to address 0x1234 (must be within about ±32 MB of this instruction)

Jump to the subroutine func and save the return address (the address of the instruction after BL) in LR.

Sample Code 45-36 Subroutine Jump

bl func

Create an infinite loop with jump instructions.

Example code 45-37 infinite loop 

loop:

add r1,r2,#4

add r3,r2,#8

sub r3,r3,r1

b loop ; jump back to loop forever

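Run the loop body 10 times by using a conditional jump.

Example Code 45-38 Finite Loop

mov r0,#10 ; loop counter = 10

loop:

; ... loop body ...

subs r0,r0,#1 ; decrement the counter and update the flags

bne loop ; repeat until the counter reaches 0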

Example of a conditional subroutine call.

Code Example 45-39 Conditional Call 

cmp r0,#5 ; compare r0 with 5

bllt sub1 ; if r0 < 5, call sub1

blge sub2 ; otherwise call sub2 (assumes sub1 preserves the condition flags; otherwise sub2 may also be called)

Jump instruction BX with state switching

The jump instruction with state switching (BX) jumps to the address held in register Rm. Bit[0] of Rm is copied to the T bit in the CPSR, and bits[31:1] are moved into the PC (bit[0] of the PC is cleared). If bit[0] of Rm is 1, the T flag is set by the jump, so the code at the target address is interpreted as Thumb code; if bit[0] of Rm is 0, the T flag is cleared, so the target code is interpreted as ARM code.

The syntax format of the command:

BX{<c>} <Rm>

When Rm[1:0] = 0b10, the result is unpredictable, because instructions in ARM state are 4-byte aligned. The PC can be used as Rm, but this is not recommended: "BX PC" jumps to the second instruction after the current one (the PC reads as the current address plus 8) and stays in ARM state. Although such a jump works, it is better to use one of the following instructions instead.

MOV PC, PC or ADD PC, PC, #0
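The BX rule (bit[0] selects the state, the remaining bits form the PC) can be modelled in C as follows. This is a sketch with hypothetical names; it also flags the unpredictable Rm[1:0] = 0b10 case for an ARM-state target.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t pc;        /* new PC: Rm with bit[0] cleared */
    bool thumb;         /* new CPSR.T: bit[0] of Rm */
    bool unpredictable; /* ARM target with Rm[1:0] == 0b10 */
} bx_result;

bx_result bx(uint32_t rm)
{
    bx_result r;
    r.pc = rm & ~1u;
    r.thumb = (rm & 1u) != 0;
    r.unpredictable = (rm & 3u) == 2u;
    return r;
}
```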

Command example:

Branch to address in R0, if R0[0]=1, enter Thumb state.

Example code 45-40 bx instruction example 

bx r0 ; jump to the address in r0; enter Thumb state if r0[0] = 1

Jump instruction BLX with link and state switching

Branch with Link and Exchange (BLX) in the form BLX <label> makes the program jump from ARM state to Thumb code at the label. This form is always executed unconditionally, always switches the processor to Thumb state by setting the T bit in the CPSR, and writes the return address into the link register LR. (The register form BLX Rm instead takes the new state from bit[0] of Rm, as BX does.)

The syntax format of the command:

BLX <target_add>

Among them, <target_add> is the jump target address of the instruction. This address is calculated according to the following rules.

1) Sign-extend the 24-bit offset specified in the instruction to form a 32-bit immediate value.

2) Shift the result left two places.

3) Add bit H (bit[24] of the instruction) to bit[1] of the result.

4) Add the result to the program counter (PC), which reads as the address of the BLX instruction plus 8.

The offset is normally calculated by the ARM assembler. Shifting left by two bits forms a word offset, which is added to the PC; the PC then contains the address of the BLX instruction plus 8. Adding bit H to bit[1] of the result makes the target a halfword-aligned address, as required for Thumb instructions. This form of jump can only reach targets within about ±32 MB.
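The four steps of the BLX target calculation can be sketched in C (hypothetical function name; imm24 and h are the raw instruction fields):

```c
#include <stdint.h>

/* Target of "BLX <target_add>" at blx_addr; the processor enters Thumb state. */
uint32_t blx_target(uint32_t blx_addr, uint32_t imm24, uint32_t h)
{
    uint32_t offset = imm24 & 0x00FFFFFFu;
    if (offset & 0x00800000u)
        offset |= 0xFF000000u;      /* 1) sign-extend to 32 bits          */
    offset <<= 2;                   /* 2) shift left two places           */
    offset += (h & 1u) << 1;        /* 3) add H into bit[1]               */
    return blx_addr + 8u + offset;  /* 4) add to the PC (address + 8)     */
}
```

With imm24 = 1 and H = 1 at address 0x8000, the target is 0x8000 + 8 + 4 + 2 = 0x800E, a halfword-aligned Thumb address.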

Command example:

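Call the Thumb subroutine func from ARM state. func can return to ARM state with BX LR, because BLX writes an ARM-state return address (bit[0] = 0) into LR.

Example code 45-41 calling a Thumb subroutine from ARM state

blx func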

state operation instruction

The ARM instruction set provides two instructions that directly access the Program Status Registers (PSRs). MRS transfers the value of the CPSR or SPSR to a general-purpose register; MSR does the opposite, transferring the contents of a register (or an immediate) to the CPSR or SPSR.

Combined, these two instructions can be used to read/write to CPSR and SPSR. The program status register instructions are shown in the table.

Table 45.3.5.1 Status Operation Instructions

In the command syntax you see an item called fields, which can be a combination of control (C), extension (X), status (S), and flags (F).

1. MRS

The MRS instruction is used to transfer the contents of the program status register to the general register.

In the ARM processor, only the MRS instruction can read the status register CPSR or SPSR into the general register.

The syntax format of the command:

MRS{c} Rd, PSR

Among them, Rd is the target register, and Rd is not allowed to be the program counter (PC). PSR is CPSR or SPSR.

Command example:

Example Code 45-42 Reading CPSR and SPSR 

mrs r1, cpsr ; read the cpsr status register and save it in r1

mrs r2, spsr ; read the spsr status register and save it in r2

Reading the CPSR with MRS lets a program check the ALU status flags and whether IRQ/FIQ interrupts are enabled; in an exception handler, reading the SPSR reveals the processor state before the exception was taken. MRS used together with MSR implements read-modify-write operations on the CPSR or SPSR, for example to switch processor mode or to enable or disable IRQ/FIQ interrupts. In addition, when switching processes or allowing nested interrupts, MRS must be used to read and save the SPSR.

2. MSR

In ARM processors, only the MSR instruction can directly set the status register CPSR or SPSR.

The syntax format of the command:

MSR{c} PSR_fields, #immed_8r

MSR{c} PSR_fields, Rm

Here PSR is CPSR or SPSR, and <fields> selects which bits of the status register are written. The 32 bits of the status register are divided into four 8-bit fields: bits[31:24] are the condition flag field (f), bits[23:16] the status field (s), bits[15:8] the extension field (x), and bits[7:0] the control field (c). immed_8r is an immediate constant written to the selected fields; it must be expressible as an 8-bit value rotated right by an even number of bits. Rm is the source register whose value is written to the selected fields.
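The field masks and the immed_8r constraint can both be expressed in C. The sketch below (hypothetical names) computes the byte mask selected by a fields suffix, applies an MSR-style write that changes only those bytes, and checks whether a constant can be encoded as an 8-bit value rotated right by an even amount:

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte mask selected by a fields suffix such as "c", "fc" or "fsxc". */
uint32_t psr_field_mask(const char *fields)
{
    uint32_t mask = 0;
    for (; *fields != '\0'; fields++) {
        switch (*fields) {
        case 'f': mask |= 0xFF000000u; break; /* flags     bits[31:24] */
        case 's': mask |= 0x00FF0000u; break; /* status    bits[23:16] */
        case 'x': mask |= 0x0000FF00u; break; /* extension bits[15:8]  */
        case 'c': mask |= 0x000000FFu; break; /* control   bits[7:0]   */
        default:  break;                      /* ignore other characters */
        }
    }
    return mask;
}

/* MSR PSR_fields, value: only the selected fields change. */
uint32_t msr_write(uint32_t psr, const char *fields, uint32_t value)
{
    uint32_t mask = psr_field_mask(fields);
    return (psr & ~mask) | (value & mask);
}

/* True if v is an 8-bit value rotated right by an even amount. */
bool is_immed_8r(uint32_t v)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* rotating left by rot undoes a rotate right by rot */
        uint32_t r = rot == 0 ? v : (v << rot) | (v >> (32u - rot));
        if (r <= 0xFFu)
            return true;
    }
    return false;
}
```

For example, 0xFF000000 is encodable (0xFF rotated right by 8), but 0x101 is not, because its set bits span more than 8 bits.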

Command example:

Example Code 45-43 MSR Instruction Example

msr cpsr_c,#0xd3 ; cpsr[7:0] = 0xd3: switch to supervisor (SVC) mode with IRQ and FIQ disabled

msr cpsr_fsxc,r3 ; cpsr = r3 (all four fields)

Note: the control bits of the status register can only be modified in a privileged mode; in User mode, MSR can change only the condition flags.

A program cannot switch between ARM and Thumb state by writing the T bit of the CPSR with MSR; the BX instruction must be used instead. Because BX is a branch, it flushes the pipeline, which allows the processor state to change cleanly.

Applications of the program status register instructions

Enable IRQ interrupt.

Example Code 45-44 Enable IRQ Interrupt

enable_irq:

mrs r0,cpsr ; read the CPSR

bic r0,r0,#0x80 ; clear the I bit (bit 7) to enable IRQ

msr cpsr_c,r0 ; write back the control field

mov pc,lr ; return

Disable IRQ interrupt.

Example Code 45-45 disable IRQ interrupt

disable_irq:

mrs r0,cpsr ; read the CPSR

orr r0,r0,#0x80 ; set the I bit (bit 7) to disable IRQ

msr cpsr_c,r0 ; write back the control field

mov pc,lr ; return

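Set up the IRQ mode stack:

Example code 45-46 irq mode stack

msr cpsr_c,#0xd2 ; switch to IRQ mode (0x12) with IRQ and FIQ disabled

ldr sp,stacksvc ; load sp with the word stored at label stacksvc (defined elsewhere)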
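The constants 0x80, 0xd2 and 0xd3 used in these examples are easier to read once the control byte of the CPSR is decoded. A C sketch with hypothetical names (the mode numbers are the standard ARM encodings):

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool irq_disabled; /* I bit, bit 7 */
    bool fiq_disabled; /* F bit, bit 6 */
    bool thumb;        /* T bit, bit 5 */
    uint32_t mode;     /* M[4:0], bits 4..0 */
} cpsr_control;

enum { MODE_USR = 0x10, MODE_FIQ = 0x11, MODE_IRQ = 0x12, MODE_SVC = 0x13,
       MODE_ABT = 0x17, MODE_UND = 0x1B, MODE_SYS = 0x1F };

cpsr_control decode_cpsr_c(uint32_t cpsr)
{
    cpsr_control c;
    c.irq_disabled = (cpsr >> 7) & 1u;
    c.fiq_disabled = (cpsr >> 6) & 1u;
    c.thumb        = (cpsr >> 5) & 1u;
    c.mode         = cpsr & 0x1Fu;
    return c;
}

/* The bic/orr steps of enable_irq and disable_irq. */
uint32_t irq_enable(uint32_t cpsr)  { return cpsr & ~0x80u; }
uint32_t irq_disable(uint32_t cpsr) { return cpsr | 0x80u; }
```

0xd3 decodes to SVC mode (0x13) with I and F set, and 0xd2 to IRQ mode (0x12) with I and F set; clearing bit 7 of 0xd3 gives 0x53, which is what enable_irq does.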

coprocessor instructions

The ARM architecture allows the instruction set to be extended with coprocessors. The most commonly used one is the system control coprocessor, CP15, whose registers control on-chip functions such as the caches and the memory management unit. There are also floating-point coprocessors, and manufacturers can develop their own dedicated coprocessors as needed.

An ARM coprocessor has its own dedicated set of registers, whose state is controlled by instructions that mirror the instructions that control the state of the ARM core. Program control flow instructions are handled by the ARM processor, so coprocessor instructions deal only with data processing and data transfer. Following the RISC load/store principle, data processing and data transfer instructions are clearly separated and therefore have different instruction formats. The ARM processor supports up to 16 coprocessors; during program execution, each coprocessor ignores instructions meant for the ARM core and for other coprocessors. If the coprocessor hardware for an instruction is not present, an undefined instruction exception is generated, and the exception handler can emulate the hardware operation in software. For example, if a system does not contain a vector floating-point unit, a floating-point software package can be used to support vector floating-point arithmetic.

An ARM coprocessor can partially execute an instruction and then generate an exception. This allows run-time exceptions such as division by zero and overflow to be handled better. The partial execution is performed by the coprocessor and is transparent to the ARM core: when the ARM processor resumes, it restarts execution at the instruction that caused the exception. A coprocessor does not necessarily use every field of its instructions; how a specific coprocessor is defined and operated is entirely up to its manufacturer. Therefore, the register identifiers and operation mnemonics in ARM coprocessor instructions are implementation defined, and programmers can define the syntax of these instructions with macros.

ARM coprocessor instructions can be divided into the following 3 categories.

⚫ Coprocessor data operations. These are entirely internal to the coprocessor and change the state of the coprocessor registers. For example, in a floating-point addition, the floating-point coprocessor adds two of its registers and places the result in a third register. This category includes the CDP instruction.

⚫ Coprocessor data transfer instructions. These instructions load data from memory into coprocessor registers, or store data from coprocessor registers to memory. Because a coprocessor can support its own data types, the number of words transferred for each register depends on the coprocessor. The ARM processor generates the memory addresses, but the coprocessor controls how many words are transferred. This category includes the LDC and STC instructions.

⚫ Coprocessor register transfer instructions. In some cases, data must be transferred between the ARM processor and a coprocessor. For example, in a floating-point coprocessor, the FIX instruction takes a floating-point value from a coprocessor register, converts it to an integer, and transfers the integer to an ARM register. The results of floating-point comparisons are often needed to affect control flow, so they must be passed into the ARM CPSR. This category includes the MCR and MRC instructions.

All coprocessor instructions are listed in the table.

Table 45.3.6.1 Coprocessor Operation Instructions

The following briefly introduces the more commonly used MCR and MRC instructions.

ARM register to coprocessor register data transfer instruction MCR

syntax of the command

MCR<c> <coproc>, <opc1>, <Rt>, <CRn>, <CRm>{, <opc2>}

<c>: The condition code for the instruction execution. When <c> is omitted, the instruction is executed unconditionally.

<coproc>: coprocessor name, range p0-p15;

<opc1>: coprocessor-specific opcode, range 0-7;

<Rt>: source register, write the Rt register to the coprocessor;

<CRn>: the target register of the coprocessor;

<CRm>: an additional destination or source register in the coprocessor. If no additional information is required, set it to c0; otherwise the results are unpredictable;

<opc2>: optional coprocessor-specific opcode, set to 0 when not needed.

Coprocessor register to ARM register data transfer instruction MRC

syntax of the command

MRC<c> <coproc>, <opc1>, <Rt>, <CRn>, <CRm>{, <opc2>}

<c>: The condition code for the instruction execution. When <c> is omitted, the instruction is executed unconditionally.

<coproc>: coprocessor name, range p0-p15;

<opc1>: coprocessor-specific opcode, range 0-7;

<Rt>: destination register; the content of the coprocessor register is read into Rt;

<CRn>: the source register of the coprocessor;

<CRm>: an additional destination or source register in the coprocessor. If no additional information is required, set it to c0; otherwise the results are unpredictable;

<opc2>: optional coprocessor-specific opcode, set to 0 when not needed.
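The operand fields of MCR and MRC map directly onto bit fields of the instruction word: opc1 and opc2 occupy 3 bits each (values 0-7), while CRn, CRm, Rt and the coprocessor number occupy 4 bits each. A C sketch of the encoding (hypothetical function name; bit positions follow the ARM-state MCR/MRC encoding):

```c
#include <stdint.h>

/* cond | 1110 | opc1(3) | L | CRn | Rt | coproc | opc2(3) | 1 | CRm
   L = 0 for MCR (ARM -> coprocessor), 1 for MRC (coprocessor -> ARM). */
uint32_t encode_mcr_mrc(uint32_t cond, int is_mrc, uint32_t coproc, uint32_t opc1,
                        uint32_t rt, uint32_t crn, uint32_t crm, uint32_t opc2)
{
    return ((cond & 0xFu) << 28)
         | (0xEu << 24)
         | ((opc1 & 0x7u) << 21)
         | ((is_mrc ? 1u : 0u) << 20)
         | ((crn & 0xFu) << 16)
         | ((rt & 0xFu) << 12)
         | ((coproc & 0xFu) << 8)
         | ((opc2 & 0x7u) << 5)
         | (1u << 4)
         | (crm & 0xFu);
}
```

For mrc p15,0,r1,c1,c0,0, as used in the cache example that follows, this gives 0xEE111F10 (cond = 0xE, "always").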

command example

Example code 45-47 enable ICache

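/******Cache Test*******/

mrc p15,0,r1,c1,c0,0 // read the CP15 c1 control register (SCTLR) into r1

orr r1, r1, #(1 << 2) // set the C bit (bit 2) to enable the data/unified cache

orr r1, r1, #(1 << 12) // set the I bit (bit 12) to enable the instruction cache

mcr p15,0,r1,c1,c0,0 // write r1 back to the control register

/******End Test******/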

exception generation instruction

The ARM instruction set provides two instructions that generate exceptions, through which exceptions can be realized by software. The exception generation instructions for ARM are shown in the table.

The software interrupt instruction (Software Interrupt, SWI) generates a software interrupt. It switches the processor from User mode to Supervisor (SVC) mode, saves the CPSR into the SPSR of Supervisor mode, and transfers execution to the SWI vector. SWI can also be used in other modes; the processor still switches to Supervisor mode. (In the newer unified assembler syntax, SWI is written SVC.)

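Syntax of the instruction

SWI{<c>} <immed_24>

<c>: the condition code for instruction execution. When <c> is omitted, the instruction is executed unconditionally.

<immed_24>: a 24-bit immediate (0-0xFFFFFF). The processor itself ignores it; the SWI handler can read it from the instruction word to determine which service is requested.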

Instruction example

Example code 45-48 swi instruction example

swi 0           @ generate a software interrupt with number 0
swi 0x123456    @ generate a software interrupt with immediate 0x123456
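The SWI handler learns the interrupt number by reading the SWI instruction itself. The host-side C sketch below models the A32 encoding: condition in bits 31-28, 0b1111 in bits 27-24, and the 24-bit immediate in bits 23-0. It shows why the immediate is limited to 24 bits. The function names are hypothetical. A real handler would first fetch the instruction word from LR_svc - 4 in ARM state.

```c
#include <assert.h>
#include <stdint.h>

/* A32 SWI/SVC encoding: [31:28] cond, [27:24] 1111, [23:0] imm24. */
uint32_t encode_swi(uint32_t imm24)
{
    assert(imm24 <= 0xFFFFFFu);                 /* must fit in 24 bits */
    return (0xEu << 28) | (0xFu << 24) | imm24; /* cond = AL (always) */
}

/* What a handler does with the fetched instruction word: keep the low 24 bits. */
uint32_t swi_number(uint32_t instr)
{
    assert(((instr >> 24) & 0xFu) == 0xFu);     /* bits [27:24] identify SWI */
    return instr & 0x00FFFFFFu;
}
```

With these, encode_swi(0x123456) gives 0xEF123456, and swi_number() recovers 0x123456 from it.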

When SWI is used, parameters are usually passed in one of the following two ways.

In the first method, the interrupt number is encoded in the instruction's immediate field and the parameters are passed in registers. The program below generates software interrupt 12 and passes function number 34 in R0.

Example code 45-49 swi instruction example

mov r0, #34     @ set the function number to 34
swi 12          @ generate a software interrupt with number 12

In the second method, the immediate field is not used (it is set to 0). R0 passes the interrupt number and R1 passes the sub-function number of the interrupt.

Example code 45-50 swi instruction example

mov r0, #12     @ set software interrupt number 12
mov r1, #34     @ set the function number to 34
swi 0           @ immediate unused; the handler reads R0 and R1
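Both conventions deliver the same information to the handler; they differ only in where it is stored. The sketch below models this with a hypothetical swi_request structure and one decode function per method. It does not read real registers. One practical difference: with the second method, the handler does not need to fetch and decode the SWI instruction from memory, and the interrupt number can be computed at run time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical request decoded by an SWI handler. */
typedef struct {
    uint32_t number;   /* which software interrupt (service group) */
    uint32_t function; /* which function / sub-function within it */
} swi_request;

/* Method 1 (example 45-49): number in the instruction's imm24, function in R0. */
swi_request decode_method1(uint32_t imm24, uint32_t r0)
{
    swi_request req = { imm24, r0 };
    return req;
}

/* Method 2 (example 45-50): imm24 unused, number in R0, sub-function in R1. */
swi_request decode_method2(uint32_t r0, uint32_t r1)
{
    swi_request req = { r0, r1 };
    return req;
}
```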

ARM assembly experiment

Purpose

Understand how the program works;

Master the basic use of ARM assembly language;

Become familiar with using the Eclipse development tools to build and simulate assembly projects.

Experimental principle

Using the syntax and functions of the ARM assembly language described above, write an assembly program that performs simple data operations.

⚫ Components of an executable program

An executable ELF program under Linux usually consists of the following parts.

Figure 45-3 Components of an executable program

These parts are the code segment, the data segment, the uninitialized data (BSS) segment, the heap and the stack:

Code segment (.text): mapped read-only in memory. It holds the instructions the program executes.

Data segment (.data): holds initialized (non-zero) global variables and static local variables. Its starting address is determined by the linker script, and its size is allocated automatically at compile and link time.

Uninitialized data (.bss): holds global and static variables that are either uninitialized or initialized to 0. This area is cleared to zero when the program is loaded. On bare metal, where there is no loader, the startup code must clear it.

Heap: holds memory allocated dynamically at run time (malloc in C, new in C++).

Stack: holds the local variables of functions (except static ones), saved registers and, when needed, function parameters and return values.

In assembly language, an executable program generally contains at least a code segment, a data segment and a BSS segment.
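The sketch below shows where C objects typically end up. Each variable is annotated with the segment GCC normally places it in for an ELF target, though exact placement depends on the compiler and its options. The names are purely illustrative. It is meant to be built and run on a host such as Linux: the tutorial's bare-metal build uses -nostdlib, which provides no malloc.

```c
#include <assert.h>
#include <stdlib.h>

const int table_rodata[3] = {1, 2, 3}; /* .rodata: read-only constant data */
int data_value = 42;                   /* .data: initialized, non-zero global */
int bss_counter;                       /* .bss (COMMON with older GCC defaults) */
static int bss_zero = 0;               /* .bss: explicitly initialized to 0 */

int segments_demo(void)
{
    int local = 5;                     /* stack: automatic local variable */
    static int calls;                  /* .bss: static local, lives for the whole run */
    int *heap = malloc(sizeof *heap);  /* heap: dynamically allocated */
    assert(heap != NULL);

    *heap = local + table_rodata[2];   /* 5 + 3 = 8 */
    calls++;
    int result = *heap + data_value + bss_counter + bss_zero + calls;
    free(heap);                        /* heap memory must be released explicitly */
    return result;
}
```

On the first call, segments_demo() returns 8 + 42 + 0 + 0 + 1 = 51. This also shows that the BSS variables start at zero.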

⚫ Position-dependent and position-independent code:

Position-independent code runs correctly wherever it is placed in memory, because it refers to code and data relative to the current PC instead of through fixed absolute addresses. Position-dependent code runs correctly only at one specific address, and that address is the link address. For example, B and BL branch relative to the PC, whereas ldr r0, =label loads the absolute, link-time address of label.
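The difference can be modeled with a little address arithmetic. In the C sketch below (function names hypothetical), an absolute reference uses an address fixed at link time. A PC-relative reference, such as the target of B or BL, instead adds a link-time displacement to wherever the instruction actually is. The model ignores the ARM-state detail that reading the PC yields the current instruction address plus 8, since the assembler and linker account for that offset when computing the displacement.

```c
#include <assert.h>
#include <stdint.h>

/* Position-dependent: the label's address was fixed when the image was linked. */
uint32_t abs_target(uint32_t link_base, uint32_t label_off)
{
    return link_base + label_off;
}

/* Position-independent: the linker fixed only the distance from the instruction
 * to the label; it is added to where the instruction really runs.
 * Unsigned wrap-around also handles labels that precede the instruction. */
uint32_t pcrel_target(uint32_t run_base, uint32_t insn_off, uint32_t label_off)
{
    uint32_t pc = run_base + insn_off;            /* actual instruction address */
    uint32_t displacement = label_off - insn_off; /* computed at link time */
    return pc + displacement;
}
```

Suppose an image linked at 0xc2000040 runs at 0x10000000. pcrel_target(0x10000000, 0x8, 0x20) still yields the label's real location, 0x10000020. abs_target(0xc2000040, 0x20) keeps returning 0xc2000060, which no longer points at the label.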

⚫ Link address:

When a program is built, each source file is compiled into an object file, and the object files are then linked into the final executable. The link address tells the linker where each object file (and each section) is placed in the executable, that is, the address the program expects to run at. For example, suppose an executable a.out is built from a.o, b.o and c.o. The link addresses specified in the linker script determine which of them comes first, which is in the middle and which is at the end.

⚫ Run address:

The run address is the address at which the program actually resides in memory while it executes. For example, to execute an instruction the CPU fetches it from the address held in the PC; that address is the run address. Position-dependent code works only if its run address equals its link address.

⚫ Load address and storage address:

A program is usually stored in flash at first but runs from memory (RAM). Before it can run, its instructions must be copied from flash to memory at the run address. The address in flash where the program is stored is the load address.

Figure 45-4 Comparison between link address and running address
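When the load address differs from the run address, startup code has to copy the image before jumping to it. Because BSS is not stored in the image, that code must also clear the BSS area. The C sketch below illustrates both steps under stated assumptions. The function name relocate_and_clear and its parameters are hypothetical. Real startup code would usually take these addresses from symbols defined in the linker script, and the map.lds used in this experiment defines none.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy image_len bytes from the load address (e.g. flash) to the run address
 * (RAM), then zero the BSS area, which has no contents stored in the image. */
void relocate_and_clear(uint8_t *run_addr, const uint8_t *load_addr, size_t image_len,
                        uint8_t *bss_start, size_t bss_len)
{
    assert(run_addr != NULL && load_addr != NULL);
    for (size_t i = 0; i < image_len; i++)
        run_addr[i] = load_addr[i];   /* load address -> run address */
    for (size_t i = 0; i < bss_len; i++)
        bss_start[i] = 0;             /* BSS is zero-filled, never copied */
}
```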

Experimental content

Create a linker script

Example Code 45-51 Linker Script

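OUTPUT_FORMAT("elf32-littlearm", "elf32-littlearm", "elf32-littlearm")
OUTPUT_ARCH(arm)
ENTRY(_start)                  /* entry point recorded in the ELF header */
SECTIONS
{
    . = 0xc2000040;            /* link address: the program is linked to run here */
    . = ALIGN(4);
    .text :
    {
        start.o(.text)         /* start.o first, so _start sits at the link address */
        *(.text)               /* code from all other object files */
    }
    . = ALIGN(4);
    .rodata :
    { *(.rodata) }             /* read-only constants */
    . = ALIGN(4);
    .data :
    { *(.data) }               /* initialized global/static data */
    . = ALIGN(4);
    .bss :
    { *(.bss) }                /* zero-initialized data */
}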
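The location-counter arithmetic in this script can be reproduced by hand. The C sketch below mimics it for given section sizes. compute_layout and section_layout are hypothetical names. The model applies only the explicit ALIGN(4) statements, whereas a real linker also honours each input section's own alignment requirement.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to a multiple of align (a power of two), like ALIGN(align). */
uint32_t align_up(uint32_t addr, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

typedef struct { uint32_t text, rodata, data, bss, end; } section_layout;

/* Mimic the script: ". = 0xc2000040;", then .text, .rodata, .data and .bss,
 * each preceded by ". = ALIGN(4);". Sizes are in bytes. */
section_layout compute_layout(uint32_t text_sz, uint32_t rodata_sz,
                              uint32_t data_sz, uint32_t bss_sz)
{
    section_layout l;
    uint32_t dot = 0xc2000040u;        /* the location counter "." */

    l.text = dot = align_up(dot, 4);
    dot += text_sz;
    l.rodata = dot = align_up(dot, 4);
    dot += rodata_sz;
    l.data = dot = align_up(dot, 4);
    dot += data_sz;
    l.bss = dot = align_up(dot, 4);
    l.end = dot + bss_sz;
    return l;
}
```

For example, a .text of 0x26 bytes ends at 0xc2000066, so ALIGN(4) starts .rodata at 0xc2000068.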

Create a Makefile compilation script

Example Code 45-52 Makefile

SHELL=C:\Windows\System32\cmd.exe

CROSS_COMPILE = arm-none-eabi-
NAME = h_project

CPPFLAGS := -nostdlib -nostdinc -g
CFLAGS := -Wall -O2 -fno-builtin -g

LD = $(CROSS_COMPILE)ld
CC = $(CROSS_COMPILE)gcc
OBJCOPY = $(CROSS_COMPILE)objcopy
OBJDUMP = $(CROSS_COMPILE)objdump

export CC LD OBJCOPY OBJDUMP AR CPPFLAGS CFLAGS

objs := start.o

# Link with map.lds, then produce a raw binary and a disassembly listing
all: $(objs)
	$(LD) -T map.lds -o $(NAME).elf $^
	$(OBJCOPY) -O binary $(NAME).elf $(NAME).bin
	$(OBJDUMP) -D $(NAME).elf > $(NAME).dis

# Assembly sources (.S, run through the C preprocessor) and C sources
%.o : %.S
	$(CC) $(CPPFLAGS) $(CFLAGS) -o $@ $< -c

%.o : %.c
	$(CC) $(CPPFLAGS) $(CFLAGS) -o $@ $< -c

clean:
	rm -f *.o *.elf *.bin *.dis

.PHONY: all clean
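The .elf file produced by the link step still carries an ELF header, including the entry point set by ENTRY(_start) in the linker script. By contrast, objcopy -O binary keeps only the raw bytes of the loadable sections. The C sketch below illustrates this by reading e_entry from a 32-bit little-endian ARM ELF header held in memory. The function name is hypothetical, and only the header fields it checks are validated. On the host, arm-none-eabi-readelf -h h_project.elf prints the same field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EM_ARM 40 /* e_machine value for 32-bit ARM */

/* Read little-endian 16/32-bit fields from a byte buffer. */
static uint32_t rd16(const uint8_t *p) { return (uint32_t)p[0] | ((uint32_t)p[1] << 8); }
static uint32_t rd32(const uint8_t *p) { return rd16(p) | (rd16(p + 2) << 16); }

/* Return e_entry from an ELF32 little-endian ARM header, or 0 if the buffer is
 * not one. Layout: e_ident[16], e_type@16, e_machine@18, e_version@20, e_entry@24. */
uint32_t elf32_arm_entry(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 52)                                   /* sizeof(Elf32_Ehdr) */
        return 0;
    if (buf[0] != 0x7F || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return 0;
    if (buf[4] != 1 /* ELFCLASS32 */ || buf[5] != 1 /* ELFDATA2LSB */)
        return 0;
    if (rd16(buf + 18) != EM_ARM)
        return 0;
    return rd32(buf + 24);
}
```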

The assembly program is as follows:

Example Code 45-53 Assembly Example

.text
.global _start
_start:

    mov r0, #0x9
    nop
    mov r1, #0x7

    bl add_sub          @ call add_sub; the return address is saved in lr

stop:
    b stop              @ loop forever

add_sub:
    add r2, r0, r1      @ r2 = 0x9 + 0x7 = 0x10
    sub r3, r0, r1      @ r3 = 0x9 - 0x7 = 0x2

    mul r4, r0, r1      @ r4 = 0x9 * 0x7 = 0x3f
    mov pc, lr          @ return to the caller
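For comparison, the work done by add_sub can be written in C as below. This is only an illustrative model with hypothetical struct and function names. It shows the values the registers should hold after the call. Unsigned 32-bit arithmetic wraps around just as the ARM ALU does.

```c
#include <assert.h>
#include <stdint.h>

/* Register values after add_sub returns, modeled as struct fields. */
typedef struct { uint32_t r2, r3, r4; } add_sub_result;

/* C model of add_sub: r0 and r1 are the inputs set up by _start. */
add_sub_result add_sub(uint32_t r0, uint32_t r1)
{
    add_sub_result res;
    res.r2 = r0 + r1; /* add r2, r0, r1 */
    res.r3 = r0 - r1; /* sub r3, r0, r1 (wraps modulo 2^32 if r1 > r0) */
    res.r4 = r0 * r1; /* mul r4, r0, r1 (low 32 bits of the product) */
    return res;
}
```

add_sub(0x9, 0x7) gives r2 = 0x10, r3 = 0x2 and r4 = 0x3f, matching the comments in the assembly.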

Experimental procedure

1. Import project source code

Please refer to the section "Importing an Existing Project".

Experiment source code path on the CD: [Data CD\Huaqing Vision-FS-MP1A Development Data-2020-11-06\02-Program Source Code\03-ARM System Structure and Interface Technology\Cortex-A7\h_project]

2. Open the "Registers" view

Click Window -> Show View -> Registers.

3. Single-step simulation

After the configuration is complete, click " " to start the simulation; the Debug view will open.

Click " " to single-step through the program and observe the simulation results.

Experimental results

Click " " to view the changes in the Rn registers.

Step through the program and watch the values of R2, R3 and R4 change. After add_sub has executed, they should hold 0x10, 0x2 and 0x3f respectively.


Origin blog.csdn.net/u014170843/article/details/130194736