Architecture: Common behavioral characteristics of various ARM processors
Processor: Implements an architecture and can be integrated into different designs
Device: Contains the processor and additional components
ARM architecture includes:
programming model
instruction set
system configuration
exception handling
memory model
A processor can implement different memory management models:
VMSA (Virtual Memory System Architecture), based on MMU (Memory Management Unit)
PMSA (Protected Memory System Architecture), based on MPU (Memory Protection Unit)
ARMv7 introduces architecture profiles:
A (Application): a microprocessor architecture based on VMSA; high performance, with support for full-featured operating systems
R (Real-time): a microprocessor architecture based on PMSA; deterministic response times and low-latency interrupts
M (Microcontroller): low-latency interrupt processing; has a different exception handling model and uses PMSA
A processor implements a specific architecture version:
ARM926EJ-S implements ARMv5 with the T, E, and J extensions (ARMv5TEJ)
Cortex-A9 implements ARMv7-A with the Multiprocessing Extensions
Devices are usually SoCs that integrate an ARM processor and additional components.
Choices such as the cache size and whether to include a hardware floating-point unit are options left to the device implementer.
--------------------------ARMv5 architecture document summary---------------
7 CPU modes: User, Supervisor, System, Abort, IRQ (interrupt), FIQ (fast interrupt), Undefined
7 exceptions: reset, SWI, data abort, prefetch abort, FIQ, IRQ, undefined instruction
Reset and SWI enter Supervisor mode; a failed memory access (instruction fetch or data access) enters Abort mode; a fetched instruction that cannot be decoded enters Undefined mode.
Only User mode is unprivileged: its access to parts of the memory system and to coprocessors is restricted, and it switches to a privileged processor mode by raising an exception with the SWI instruction.
Privileged modes can read and write the whole CPSR, while unprivileged mode can only read the control field and read and write the condition flags.
The CPSR is saved to the SPSR automatically, and only when an exception occurs. Rewriting the CPSR with an instruction does not save it to the SPSR.
CPSR [NZCV....IFT mode]
I, F  IRQ and FIQ mask bits (1 = masked)
N     Negative: copy of bit 31 of the result
Z     Zero: the result is 0
C     Carry
V     Overflow
Registers:
31 general-purpose registers, of which only 16 (r0 to r15) are visible at any one time; r13 is the SP, r14 the LR, and r15 the PC
r0...r15                               CPSR             (User, System)
r0...r7   r8_fiq...r14_fiq   r15       CPSR, SPSR_fiq   (FIQ)
r0...r12  r13_irq  r14_irq   r15       CPSR, SPSR_irq   (IRQ)
r0...r12  r13_svc  r14_svc   r15       CPSR, SPSR_svc   (Supervisor)
r0...r12  r13_und  r14_und   r15       CPSR, SPSR_und   (Undefined)
r0...r12  r13_abt  r14_abt   r15       CPSR, SPSR_abt   (Abort)
The CP15 coprocessor controls the memory-system components: cache, write buffer, MMU/MPU
Conditional execution
In the flags column, an uppercase letter means the flag is set, a lowercase letter means it is clear, and "/" separates alternatives.
Suffix  Flags     Meaning
EQ      Z         equal
NE      z         not equal
CS/HS   C         carry set / unsigned greater than or equal
CC/LO   c         carry clear / unsigned less than
MI      N         negative (minus)
PL      n         non-negative (plus)
VS      V         overflow
VC      v         no overflow
HI      zC        unsigned greater than
LS      Z/c       unsigned less than or equal
GE      NV/nv     signed greater than or equal
LE      Z/Nv/nV   signed less than or equal
GT      NzV/nzv   signed greater than
LT      Nv/nV     signed less than
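The condition table can be checked with a small model. The following is a minimal Python sketch, not ARM code: the function names condition_passed and cmp_flags are made up for illustration, and AL (always) is added for completeness even though the table above does not list it.

```python
def condition_passed(cond, n, z, c, v):
    """Return True if ARM condition `cond` holds for the flags N, Z, C, V."""
    checks = {
        "EQ": z,
        "NE": not z,
        "CS": c, "HS": c,
        "CC": not c, "LO": not c,
        "MI": n,
        "PL": not n,
        "VS": v,
        "VC": not v,
        "HI": c and not z,
        "LS": (not c) or z,
        "GE": n == v,
        "LT": n != v,
        "GT": (not z) and n == v,
        "LE": z or n != v,
        "AL": True,  # "always"; not in the table above
    }
    return bool(checks[cond])


def cmp_flags(a, b):
    """Flags (N, Z, C, V) produced by CMP a, b, that is a - b on 32-bit values."""
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    result = (a - b) & 0xFFFFFFFF
    n = bool(result >> 31)
    z = result == 0
    c = a >= b                                  # C set means "no borrow"
    v = bool(((a ^ b) & (a ^ result)) >> 31)    # signed overflow
    return n, z, c, v
```

For example, comparing 0xFFFFFFFF with 1 passes HI (unsigned greater than) and also LT, because as a signed value 0xFFFFFFFF is -1.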
Pipeline
In ARM state, while an instruction is executing, the PC already points 8 bytes past that instruction's address.
When a branch instruction executes, the ARM core flushes the pipeline; branch prediction can reduce the cost by fetching from the branch target in advance.
When an interrupt is taken, the instruction currently executing completes and the other instructions in the pipeline are discarded.
A typical ARM instruction has two source registers, Rn and Rm, and one destination register; Rm can be shifted by the barrel shifter before it enters the ALU.
Syntax: ADD r3, r2, r1 ; r3 is the destination, r2 and r1 are the source registers
Instruction classes: data processing, branch, load-store, software interrupt, program status register
Data processing instructions (move, arithmetic, logical, comparison, multiply)
Adding the S suffix to a data processing instruction makes it update the flags (NZCV) in the CPSR; the comparison instructions need no S and always update the CPSR.
With the S suffix, MOV and the logical instructions affect only N, Z, and C (C comes from the barrel shifter), not V.
MOV syntax: <instruction>{<cond>}{S} Rd, N
MOV Rd, N ; Rd = N
MVN Rd, N ; Rd = ~N
N can be a register, an immediate value, or a register Rm preprocessed by the barrel shifter, such as: R0, LSL #2
The barrel shifter shifts the operand left or right by a specified number of bits before it enters the ALU, within the same instruction cycle.
MOV r7, r5, LSL #2 ; r7 = r5<<2
LSL  Logical shift left       x LSL y   x << y
LSR  Logical shift right      x LSR y   (unsigned)x >> y
ASR  Arithmetic shift right   x ASR y   (signed)x >> y ; the only shift that preserves the sign
ROR  Rotate right             x ROR y   ((unsigned)x >> y) | (x << (32 - y))
RRX  Rotate right extended    x RRX     (C << 31) | ((unsigned)x >> 1) ; always 1 bit, C is the carry flag in the CPSR
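The shift operations above can be modelled on 32-bit values. This is a minimal Python sketch that assumes shift amounts in the range 0 to 31 (the immediate forms); the function names are illustrative and the carry-out of the shifter is not modelled.

```python
MASK = 0xFFFFFFFF  # keep every result within 32 bits


def lsl(x, y):
    return (x << y) & MASK


def lsr(x, y):
    return (x & MASK) >> y


def asr(x, y):
    x &= MASK
    if x & 0x80000000:
        x -= 1 << 32           # reinterpret the bit pattern as a negative number
    return (x >> y) & MASK     # Python's >> on negative ints is arithmetic


def ror(x, y):
    x &= MASK
    y %= 32
    return ((x >> y) | (x << (32 - y))) & MASK


def rrx(x, carry):
    # Always a 1-bit rotate; the old C flag becomes the new bit 31.
    return ((carry & 1) << 31) | ((x & MASK) >> 1)
```

With these definitions, MOV r7, r5, LSL #2 corresponds to r7 = lsl(r5, 2).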
Arithmetic instructions
<instructions>{<cond>}{S} Rd, Rn, N
ADD 32-bit addition Rd = Rn + N
ADC 32-bit addition with carry Rd = Rn + N + carry
SUB 32-bit subtraction Rd = Rn - N
SBC 32-bit subtraction with carry Rd = Rn - N - !C
RSB Reverse subtraction Rd = N - Rn
RSC Reverse subtraction with carry Rd = N - Rn - !C
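The carry variants exist mainly to chain 32-bit operations into wider ones. The sketch below models ADDS, ADC, and SBC in Python and uses them for a 64-bit addition; the function names and the (low, high) argument order are choices made for this illustration only.

```python
MASK = 0xFFFFFFFF


def adds(rn, n):
    """ADDS: return (result, carry_out)."""
    total = (rn & MASK) + (n & MASK)
    return total & MASK, total >> 32


def adc(rn, n, carry):
    """ADC: Rd = Rn + N + carry."""
    return ((rn & MASK) + (n & MASK) + carry) & MASK


def sbc(rn, n, carry):
    """SBC: Rd = Rn - N - !C."""
    return (rn - n - (1 - carry)) & MASK


def add64(a_lo, a_hi, b_lo, b_hi):
    """64-bit add: ADDS on the low words, then ADC on the high words."""
    lo, carry = adds(a_lo, b_lo)
    hi = adc(a_hi, b_hi, carry)
    return lo, hi
```

Adding 1 to 0x00000000FFFFFFFF gives a low word of 0 and a high word of 1, because the carry out of the low-word addition is fed into the ADC.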
Logical instruction
<instruction>{<cond>}{S} Rd, Rn, N
AND Rd = Rn & N Logical AND
ORR Rd = Rn | N Logical OR
EOR Rd = Rn ^ N Logical XOR
BIC Rd = Rn & ~N logic bit clear
Comparison instructions (always update the CPSR; the result is discarded)
<instruction>{<cond>} Rn, N
CMP flags set from Rn - N ; compare
CMN flags set from Rn + N ; compare negative
TEQ flags set from Rn ^ N ; equality test
TST flags set from Rn & N ; bit test
Multiply instructions
MLA{<cond>}{S} Rd, Rm, Rs, Rn ; Rd = (Rm*Rs) + Rn, multiply and accumulate
MUL{<cond>}{S} Rd, Rm, Rs ; Rd = Rm*Rs, multiply
<instruction>{<cond>}{S} RdLo, RdHi, Rm, Rs
RdLo holds the low 32 bits and RdHi the high 32 bits of the 64-bit value; a leading S means signed, a leading U means unsigned, and the trailing L means long
SMLAL [RdHi,RdLo] = [RdHi,RdLo] + Rm*Rs
SMULL [RdHi,RdLo] = Rm*Rs
UMLAL [RdHi,RdLo] = [RdHi,RdLo] + Rm*Rs
UMULL [RdHi,RdLo] = Rm*Rs
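The difference between the signed and unsigned long multiplies only shows in the high word. A minimal Python sketch follows; the function names mirror the mnemonics but are otherwise illustrative, and each returns the pair (RdLo, RdHi).

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def umull(rm, rs):
    product = (rm & MASK32) * (rs & MASK32)
    return product & MASK32, product >> 32


def smull(rm, rs):
    product = (to_signed(rm) * to_signed(rs)) & MASK64
    return product & MASK32, product >> 32


def umlal(rd_lo, rd_hi, rm, rs):
    acc = ((rd_hi & MASK32) << 32) | (rd_lo & MASK32)
    total = (acc + (rm & MASK32) * (rs & MASK32)) & MASK64
    return total & MASK32, total >> 32
```

Multiplying 0xFFFFFFFF by 2 gives a high word of 1 with umull (4294967295 * 2) but 0xFFFFFFFF with smull (-1 * 2 = -2).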
Branch instructions
B{<cond>} label
BL{<cond>} label
BX{<cond>} Rm
BLX{<cond>} Rm | label
B    pc = label ; jump
BL   pc = label, lr = address of the first instruction after the BL ; jump with link
BX   pc = Rm & 0xFFFFFFFE, T = Rm & 1 ; jump and switch state
BLX  pc = label, T = 1 ; or pc = Rm & 0xFFFFFFFE, T = Rm & 1 ; in both forms lr = address of the first instruction after the BLX
label is encoded as a signed offset relative to the PC, limited to a range of about 32 MB in either direction; T corresponds to the Thumb bit in the CPSR.
Rm in BX/BLX holds an absolute address, and its lowest bit indicates whether to switch to Thumb state.
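The 32 MB limit follows from the encoding: an ARM-state B or BL stores a signed 24-bit word offset, and the PC it is relative to is already 8 bytes ahead. A minimal Python sketch, with function names chosen for this illustration:

```python
def branch_target(instr_addr, imm24):
    """Target of an ARM-state B/BL located at instr_addr with offset field imm24."""
    imm24 &= 0xFFFFFF
    if imm24 & 0x800000:           # sign-extend the 24-bit field
        imm24 -= 1 << 24
    # The offset counts words and is relative to the PC, which reads as
    # instr_addr + 8 because of the pipeline.
    return (instr_addr + 8 + (imm24 << 2)) & 0xFFFFFFFF


def bx(rm):
    """BX Rm: return (new_pc, thumb_bit)."""
    return rm & 0xFFFFFFFE, rm & 1
```

An offset field of 0xFFFFFE (that is, -2 words) branches back to the instruction itself, and the largest forward offset 0x7FFFFF reaches just under 32 MB past the PC.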
Load-store instructions
Single-register transfer
<LDR|STR>{<cond>}{B} Rd, addressing1
LDR{<cond>}SB|H|SH Rd, addressing2
STR{<cond>}H Rd, addressing2
LDR    Load a word into a register
STR    Store a word from a register
LDRB   Load a byte into a register
STRB   Store a byte from a register
LDRH   Load a halfword
STRH   Store a halfword
LDRSB  Load a signed byte
LDRSH  Load a signed halfword
LDR r0, [r1] ; load the data that r1 points to into r0
STR r0, [r1] ; store r0 to the location that r1 points to; r1 is called the base address register
Addressing modes of the single-register load-store instructions
LDR r0, [r1, #4]! ; pre-index with writeback: r1 = r1 + 4; r0 = *(r1)
LDR r0, [r1, #4]  ; pre-index (offset): r0 = *(r1 + 4); r1 unchanged
LDR r0, [r1], #4  ; post-index: r0 = *(r1); r1 = r1 + 4
Summary: the data loaded always comes from the address given inside []; a trailing ! or an offset outside the [] means the base address register is written back.
Variants:
LDR r0, [r1, r2]                ; load mem32[r1 + r2]
LDR r0, [r1, r2, LSR #0x4]!     ; load mem32[r1 + (r2 LSR 0x4)]; r1 = r1 + (r2 LSR 0x4)
LDR r0, [r1, -r2, LSR #0x4]     ; load mem32[r1 - (r2 LSR 0x4)]
LDR r0, [r1], r2, LSR #0x4      ; load mem32[r1]; r1 = r1 + (r2 LSR 0x4)
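The three immediate-offset forms differ only in which address is read and whether the base register changes. A minimal Python sketch, assuming memory is a dict keyed by word address and registers are a dict keyed by name (both are conventions invented for this illustration):

```python
def ldr_offset(mem, regs, rd, rn, offset):
    """LDR rd, [rn, #offset]   (base register unchanged)"""
    regs[rd] = mem[regs[rn] + offset]


def ldr_preindex(mem, regs, rd, rn, offset):
    """LDR rd, [rn, #offset]!  (update the base, then load from it)"""
    regs[rn] += offset
    regs[rd] = mem[regs[rn]]


def ldr_postindex(mem, regs, rd, rn, offset):
    """LDR rd, [rn], #offset   (load from the base, then update it)"""
    regs[rd] = mem[regs[rn]]
    regs[rn] += offset
```

Starting from r1 = 0x100 with the words 11 at 0x100 and 22 at 0x104, the offset and pre-index forms both load 22, while the post-index form loads 11; only the last two leave r1 at 0x104.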
Multi-register transfer
A multi-register transfer can increase interrupt latency;
<LDM|STM>{<cond>}<addressing mode> Rn{!}, <registers>{^}
LDM  {Rd}*N <- mem32[start address + 4*N] ; load multiple registers from memory, Rn optionally updated
STM  {Rd}*N -> mem32[start address + 4*N] ; store multiple registers to memory, Rn optionally updated
Mode  Description        Start address  End address  Rn!
IA    increment after    Rn             Rn+4*N-4     Rn+4*N
IB    increment before   Rn+4           Rn+4*N       Rn+4*N
DA    decrement after    Rn-4*N+4       Rn           Rn-4*N
DB    decrement before   Rn-4*N         Rn-4         Rn-4*N
N is the number of registers in the list; I means increment, D means decrement.
"After" means the data at the current address is transferred first and the address is then updated for the next transfer; "before" means the address is updated first.
LDMIA r0!, {r1-r3}
STMIB r0!, {r1-r3}
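The address ranges for the four modes can be computed directly from the base register and the register count. A minimal Python sketch; block_addresses and ldm are names made up here, memory is modelled as a dict keyed by word address, and register names are assumed to look like "r1".

```python
def block_addresses(mode, rn, count):
    """Return (start, end, rn_after_writeback) for an LDM/STM of `count` registers."""
    size = 4 * count
    if mode == "IA":
        return rn, rn + size - 4, rn + size
    if mode == "IB":
        return rn + 4, rn + size, rn + size
    if mode == "DA":
        return rn - size + 4, rn, rn - size
    if mode == "DB":
        return rn - size, rn - 4, rn - size
    raise ValueError(mode)


def ldm(mode, mem, regs, rn, reglist, writeback=False):
    """Model LDM<mode> rn{!}, {reglist}."""
    start, _end, new_rn = block_addresses(mode, regs[rn], len(reglist))
    # The lowest-numbered register always pairs with the lowest address.
    for i, reg in enumerate(sorted(reglist, key=lambda r: int(r[1:]))):
        regs[reg] = mem[start + 4 * i]
    if writeback:
        regs[rn] = new_rn
```

For LDMIA r0!, {r1-r3} with r0 = 0x100, the words at 0x100, 0x104, and 0x108 go to r1, r2, and r3, and r0 ends at 0x10C.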
Stack operations
ATPCS defines the stack as full descending; LDMFD and STMFD correspond to pop and push respectively
A (Ascending), D (Descending); F (Full): the location sp points to is already in use; E (Empty): the location sp points to is not yet in use
FA ----> Full Ascending stack
FD ----> Full Descending stack
EA ----> Empty Ascending stack
ED ----> Empty Descending stack
Swap instruction
SWP{B}{<cond>} Rd, Rm, [Rn] ; tmp = mem32[Rn]; mem32[Rn] = Rm; Rd = tmp
SWPB swaps a byte instead of a word.
The swap cannot be interrupted by any other instruction or bus access while it executes; the system "holds the bus" for its duration.
Swap instructions are used to implement semaphores and mutual exclusion in operating systems.
Software interrupt instruction
SWI is usually executed in User mode and is used for system calls;
SWI{<cond>} SWI_Number
1. lr_svc = address of the instruction after the SWI
2. spsr_svc = cpsr
3. pc = vectors + 0x8
4. cpsr mode = svc
5. cpsr I = 1 (mask IRQ interrupts)
Summary: the core sets the CPU mode and the interrupt mask bit, then jumps to the corresponding exception handler.
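The five entry steps can be traced on a toy register file. This is a minimal Python sketch under stated assumptions: the state is a plain dict invented for this illustration, the vector table is assumed to sit at address 0, and the SWI is assumed to execute in ARM state so the next instruction is 4 bytes on.

```python
MODE_SVC = 0b10011        # Supervisor mode number in CPSR[4:0]
MODE_MASK = 0x1F
I_BIT = 1 << 7            # IRQ mask bit
VECTOR_BASE = 0x00000000  # assumption: vectors at address 0


def take_swi(state, swi_addr):
    """Apply the SWI entry steps to `state` for an SWI located at swi_addr."""
    state["lr_svc"] = swi_addr + 4                    # 1. instruction after the SWI
    state["spsr_svc"] = state["cpsr"]                 # 2. save the CPSR
    state["pc"] = VECTOR_BASE + 0x08                  # 3. SWI vector
    cpsr = (state["cpsr"] & ~MODE_MASK) | MODE_SVC    # 4. switch to Supervisor mode
    state["cpsr"] = cpsr | I_BIT                      # 5. mask IRQ
    return state
```

Starting in User mode with cpsr = 0x60000010, an SWI at 0x8000 leaves lr_svc = 0x8004, spsr_svc = 0x60000010, pc = 0x8, and cpsr = 0x60000093.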
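Program status register instructions
MRS{<cond>} Rd, <cpsr|spsr> ; Rd = psr
MSR{<cond>} <cpsr|spsr>_<fields>, Rm ; psr[field] = Rm
MSR{<cond>} <cpsr|spsr>_<fields>, #immediate ; psr[field] = immediate
Example: enable IRQ interrupts by clearing the I bit
MRS r1, cpsr       ; r1 = cpsr
BIC r1, r1, #0x80  ; clear bit 7, the I (IRQ mask) bit
MSR cpsr_c, r1     ; write back only the control field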
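The field suffix on MSR selects which byte of the status register is written. A minimal Python sketch of that masking, using the field letters c, x, s, and f for bits 0-7, 8-15, 16-23, and 24-31; the function names msr and enable_irq are made up for this illustration.

```python
FIELD_MASKS = {"c": 0x000000FF, "x": 0x0000FF00, "s": 0x00FF0000, "f": 0xFF000000}
I_BIT = 0x80


def msr(psr, fields, value):
    """Write `value` into the selected fields of psr and return the new psr."""
    mask = 0
    for field in fields:
        mask |= FIELD_MASKS[field]
    return (psr & ~mask & 0xFFFFFFFF) | (value & mask)


def enable_irq(cpsr):
    r1 = cpsr                       # MRS r1, cpsr
    r1 &= ~I_BIT & 0xFFFFFFFF       # BIC r1, r1, #0x80
    return msr(cpsr, "c", r1)       # MSR cpsr_c, r1
```

Because only the control field is written, the condition flags in the top byte are left exactly as they were.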
Coprocessor instructions
Coprocessor instructions extend the instruction set. They can provide additional computing power, and they can control the memory subsystem, including the cache and memory management.
They include data processing, register transfer, and memory transfer instructions. The operations are specific to each coprocessor.
CDP{<cond>} cp, opcode1, Cd, Cn, Cm {, opcode2}
<MRC|MCR>{<cond>} cp, opcode1, Rd, Cn, Cm {, opcode2}
<LDC|STC>{<cond>} cp, Cd, addressing
CDP      coprocessor data processing: performs a data processing operation inside the coprocessor
MRC MCR  coprocessor register transfer: moves data out of (MRC) or into (MCR) coprocessor registers
LDC STC  coprocessor memory transfer: loads or stores a block of memory to or from the coprocessor
cp is the coprocessor number, in the range p0-p15.
CP15 is reserved for system control and is used for memory management, write buffer control, cache control, and identification registers.
Coprocessor 15 (CP15) instructions
CP15 is the system control coprocessor. It configures the processor core and has a dedicated set of registers that hold configuration information.
MRC p15, 0, r1, c1, c0, 0 ; read CP15 register c1 into r1
Constant loading
An ARM instruction is 32 bits wide and has only 12 bits in which to hold an immediate value (a 4-bit rotation plus an 8-bit value), so not every 32-bit constant can be encoded directly.
ARM provides two pseudo-instructions for loading a 32-bit constant into a register; the assembler chooses the actual instruction.
LDR Rd, =constant ; constant loading
ADR Rd, label ; address loading
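Whether a constant fits the 12-bit immediate can be tested by trying all 16 rotations: the encoded value is an 8-bit number rotated right by twice the 4-bit field. A minimal Python sketch; encode_immediate is a name invented here and returns the first encoding it finds.

```python
def rotate_left(x, n):
    n %= 32
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def encode_immediate(value):
    """Return (rot, imm8) with imm8 ROR (2 * rot) == value, or None if none exists."""
    value &= 0xFFFFFFFF
    for rot in range(16):
        # Undo the right rotation: rotating left by 2 * rot recovers imm8.
        imm8 = rotate_left(value, 2 * rot)
        if imm8 <= 0xFF:
            return rot, imm8
    return None
```

0xFF000000 encodes as 0xFF rotated right by 8, so a single MOV can load it, whereas 0x00FFFFFF has no such encoding as a direct value and is the kind of constant for which LDR Rd, =constant is useful.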
Count leading zeros instruction
CLZ counts the number of zero bits between the most significant bit and the first 1 bit.
LDR r1, =0x00FFFFFF
CLZ r0, r1 ; r0 = 8
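The same count can be reproduced in Python from the bit length of the value. This is a small sketch for checking examples by hand, with clz as an illustrative name:

```python
def clz(x):
    """Number of leading zero bits in the 32-bit value x."""
    x &= 0xFFFFFFFF
    return 32 - x.bit_length()
```

For 0x00FFFFFF the highest set bit is bit 23, so the bit length is 24 and the result is 8, matching the example above; an input of 0 gives 32.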
============================================
Exception and interrupt handling
An exception is any condition that requires the normal sequential execution of instructions to be suspended. An interrupt is a special kind of exception.
Each exception causes the processor to enter a specific mode. A specific mode can also be entered by changing the CPSR.
When an exception causes a mode change, the core automatically:
1. Saves the CPSR to the SPSR of the corresponding exception mode;
2. Saves the PC to the LR of the corresponding exception mode;
3. Sets the CPSR to the corresponding exception mode;
4. Sets the PC to the entry address of the corresponding exception handler.
Exceptions and corresponding modes
Exception              Mode
----------------------------------
Fast interrupt         FIQ
Interrupt              IRQ
SWI / reset            SVC
Prefetch / data abort  Abort
Undefined instruction  Undefined
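Vector table (a table of branch instructions that the core jumps into when an exception occurs)
Instructions typically used as vector table entries:
B <address>             ; branch relative to the PC
LDR pc, [pc, #offset]   ; load the handler address from memory near the vector table
LDR pc, [pc, #-0xFF0]   ; load the handler address from the address 0xFF0 bytes below the PC
MOV pc, #immediate      ; copy an immediate value into the PC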
Vector table and processor modes
Exception              Mode  Offset
-----------------------------------
Reset                  SVC   0x00
Undefined instruction  UND   0x04
SWI                    SVC   0x08
Prefetch abort         ABT   0x0C
Data abort             ABT   0x10
Reserved               ---   0x14
IRQ                    IRQ   0x18
FIQ                    FIQ   0x1C
Exception priority (1 is highest) and the I and F bits set on entry
Exception              Priority  I bit  F bit
---------------------------------------------
Reset                  1         1      1
Data abort             2         1      -
FIQ                    3         1      1
IRQ                    4         1      -
Prefetch abort         5         1      -
SWI                    6         1      -
Undefined instruction  6         1      -
When an exception occurs, LR is set to a specific value based on the current PC value.
On IRQ entry, LR = the address of the last executed instruction plus 8; the handler uses LR to compute its return address.
Useful addresses based on the LR register
Exception              Address  Usage
----------------------------------------------------------------
Reset                  -        -
Data abort             lr-8     points to the instruction that caused the data abort
FIQ                    lr-4     return address of the FIQ handler
IRQ                    lr-4     return address of the IRQ handler
Prefetch abort         lr-4     points to the instruction that caused the prefetch abort
SWI                    lr       points to the instruction after the SWI
Undefined instruction  lr       points to the instruction after the undefined instruction
Handler example:
handler
<handler code>
...
SUBS pc, r14, #4 ; pc = r14 - 4
Because SUBS carries the S suffix and the PC is the destination register, the CPSR is automatically restored from the SPSR.
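The LR adjustments per exception type can be kept in a small lookup table. A minimal Python sketch; the exception names used as keys and the function name useful_address are conventions invented for this illustration.

```python
# Bytes to subtract from LR to get the useful address for each exception.
LR_ADJUST = {
    "data_abort": 8,
    "fiq": 4,
    "irq": 4,
    "prefetch_abort": 4,
    "swi": 0,
    "undefined": 0,
}


def useful_address(exception, lr):
    """Address a handler derives from LR for the given exception type."""
    return (lr - LR_ADJUST[exception]) & 0xFFFFFFFF
```

If an IRQ is taken after the instruction at 0x8000 completes, LR is 0x8008 and SUBS pc, r14, #4 resumes at 0x8004, the next instruction.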
Assembly example: use the write system call to output "hello", then call exit:
.section .data
hello:
    .ascii "hello\n"
.section .text
.globl _start
_start:
    mov r0, #1       // fd 1 (stdout)
    ldr r1, =hello   // buffer address
    mov r2, #6       // length 6
    mov r7, #4       // syscall 4 (write)
    svc #0
    mov r0, #0       // exit status 0
    mov r7, #1       // syscall 1 (exit)
    svc #0
all:
	as -g main.s -o main.o
	#ld -dynamic-linker /lib/ld-linux-aarch64.so.1 -lc main.o -o a.out
	ld main.o -o a.out
	rm main.o