ARM architecture and assembly language summary

Architecture: Common behavioral characteristics of various ARM processors
Processor: Implements an architecture and can be integrated into different designs
Device: Contains the processor and additional components

ARM architecture includes:
    programming model
    instruction set
    system configuration
    exception handling
    memory model

A processor can implement different memory management models:
    VMSA (Virtual Memory System Architecture), based on MMU (Memory Management Unit)
    PMSA (Protected Memory System Architecture), based on MPU (Memory Protection Unit)
    
ARMv7 introduces architecture profiles:
    A (Application): a VMSA-based architecture for high-performance processors that supports full-featured operating systems
    R (Real-time): a PMSA-based architecture with deterministic timing and low-latency interrupts
    M (Microcontroller): low-latency interrupt processing, with a different exception handling model and PMSA
    
Processor
    Implements a particular architecture version, for example:
    ARM926EJ-S implements ARMv5 with the T, E and J extensions (ARMv5TEJ)
    Cortex-A9 implements ARMv7-A with the Multiprocessing Extensions
    
Devices are usually SoCs that integrate an ARM processor with additional components
    When a device is implemented, features such as the cache size and whether a hardware floating-point unit is included are implementation options.

--------------------------ARMv5 architecture document summary---------------
CPU modes (7)
    User, Supervisor (SVC), System, Abort, IRQ (interrupt), FIQ (fast interrupt), Undefined
    7 exceptions: reset, SWI, data abort, prefetch abort, FIQ, IRQ, undefined instruction
    Reset and SWI enter Supervisor mode; a failed memory access (instruction prefetch or data access) enters Abort mode; an instruction that cannot be decoded enters Undefined mode.
    Only User mode is unprivileged: access to some memory-system and coprocessor resources is restricted, and it can switch to a privileged mode only by raising an exception, for example with the SWI instruction.
    
    Privileged modes can read and write the whole CPSR; User mode can read the CPSR but can write only the condition flags.
    The CPSR is saved to the SPSR of the new mode automatically, but only when an exception is taken. Changing mode by writing the CPSR with an instruction (MSR) does not save it to an SPSR.
    
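CPSR [N Z C V ... I F T mode]
    I, F  IRQ and FIQ mask bits (1 = masked)
    T     Thumb state bit
    mode  processor mode, bits [4:0]
    N     Negative: copy of bit 31 of the result
    Z     Zero: the result is 0
    C     Carry: carry out of the ALU (for subtraction, NOT borrow)
    V     oVerflow: signed overflow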
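To make the flag definitions concrete, here is a minimal C model of how ADDS and SUBS/CMP set N, Z, C and V. It is a sketch for illustration only; the names Flags, flags_add and flags_sub are hypothetical and say nothing about how the hardware is built.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The four condition flags held in CPSR bits 31..28. */
typedef struct {
    bool n, z, c, v;
} Flags;

/* Flags set by ADDS Rd, Rn, Rm (result = a + b, modulo 2^32). */
static Flags flags_add(uint32_t a, uint32_t b) {
    uint32_t r = a + b;
    Flags f;
    f.n = (r >> 31) & 1;                      /* N: bit 31 of the result */
    f.z = (r == 0);                           /* Z: result is zero */
    f.c = (r < a);                            /* C: unsigned carry out of bit 31 */
    f.v = ((~(a ^ b) & (a ^ r)) >> 31) & 1;   /* V: same-sign operands, different-sign result */
    return f;
}

/* Flags set by SUBS/CMP (result = a - b). ARM's C means "no borrow". */
static Flags flags_sub(uint32_t a, uint32_t b) {
    uint32_t r = a - b;
    Flags f;
    f.n = (r >> 31) & 1;
    f.z = (r == 0);
    f.c = (a >= b);                           /* C = 1 when the subtraction did not borrow */
    f.v = (((a ^ b) & (a ^ r)) >> 31) & 1;    /* operands differ in sign, result sign differs from a */
    return f;
}
```

For example, 0x7FFFFFFF + 1 sets N and V (signed overflow) but not C, while 0xFFFFFFFF + 1 sets Z and C but not V.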
Registers
    31 general-purpose 32-bit registers, of which 16 (r0-r15) are visible at any time; r13 is the SP, r14 the LR and r15 the PC.

    r0...r15                       CPSR              (User, System)
    r0...r7   r8_fiq...r14_fiq     CPSR, SPSR_fiq    (FIQ)
    r0...r12  r13_irq r14_irq      CPSR, SPSR_irq    (IRQ)
    r0...r12  r13_svc r14_svc      CPSR, SPSR_svc    (Supervisor)
    r0...r12  r13_undef r14_undef  CPSR, SPSR_undef  (Undefined)
    r0...r12  r13_abt r14_abt      CPSR, SPSR_abt    (Abort)
    
    The CP15 coprocessor controls the memory-system components: caches, write buffer, and MMU or MPU.
Conditional execution (uppercase = flag set, lowercase = flag clear, / = or)
    EQ     Z         equal
    NE     z         not equal
    CS/HS  C         carry set / unsigned greater than or equal
    CC/LO  c         carry clear / unsigned less than
    MI     N         negative (minus)
    PL     n         positive or zero (plus)
    VS     V         overflow
    VC     v         no overflow
    HI     zC        unsigned greater than
    LS     Z/c       unsigned less than or equal

    GE     NV/nv     signed greater than or equal
    LE     Z/Nv/nV   signed less than or equal
    GT     NzV/nzv   signed greater than
    LT     Nv/nV     signed less than
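The table can be read as a set of boolean formulas over the flags. The C sketch below (cond_passed is a hypothetical name) encodes exactly those formulas and decides whether a conditional instruction would execute, given the flags left by an earlier CMP.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if an instruction with condition `cond` would execute,
   given the current N, Z, C, V flags (formulas from the table above). */
static bool cond_passed(const char *cond, bool n, bool z, bool c, bool v) {
    if (!strcmp(cond, "EQ")) return z;
    if (!strcmp(cond, "NE")) return !z;
    if (!strcmp(cond, "CS") || !strcmp(cond, "HS")) return c;
    if (!strcmp(cond, "CC") || !strcmp(cond, "LO")) return !c;
    if (!strcmp(cond, "MI")) return n;
    if (!strcmp(cond, "PL")) return !n;
    if (!strcmp(cond, "VS")) return v;
    if (!strcmp(cond, "VC")) return !v;
    if (!strcmp(cond, "HI")) return c && !z;
    if (!strcmp(cond, "LS")) return !c || z;
    if (!strcmp(cond, "GE")) return n == v;
    if (!strcmp(cond, "LT")) return n != v;
    if (!strcmp(cond, "GT")) return !z && n == v;
    if (!strcmp(cond, "LE")) return z || n != v;
    assert(!"unknown condition mnemonic");
    return false;
}
```

For instance, CMP of 3 against 5 leaves N=1, Z=0, C=0, V=0, so both LT (signed) and LO (unsigned) pass, while GE and HI do not.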

Pipeline
    In ARM state, while an instruction is executing, the PC already holds that instruction's address + 8 (two instructions ahead, a consequence of the fetch/decode/execute pipeline).
    When a branch is taken, the ARM core flushes the pipeline; branch prediction can fetch the branch target early to reduce this penalty.
    When an interrupt is taken, the instruction currently executing completes, and the other instructions in the pipeline are discarded.

A typical ARM data processing instruction has two source operands, Rn and Rm, and one destination register Rd; Rm can be preprocessed by the barrel shifter before it enters the ALU.
Syntax: ADD r3, r2, r1 ; r3 is the destination, r2 and r1 are the source registers

Instruction classes: data processing, branch, load-store, software interrupt, program status register, and coprocessor


Data processing instructions (move, arithmetic, logical, comparison, multiply)
    Adding the S suffix to a data processing instruction updates the condition flags (NZCV) in the CPSR; comparison instructions such as CMP need no S and always update the CPSR.
    With S, MOV and the logical instructions affect only N, Z and C (C comes from the barrel shifter's carry out); V is unchanged.


    MOV syntax: <instruction>{<cond>}{S} Rd, N
    MOV Rd, N ; Rd = N
    MVN Rd, N ; Rd = ~N

    N can be a register, an immediate value, or a register Rm preprocessed by the barrel shifter, such as r0, LSL #2.
    The barrel shifter shifts the operand left or right by a specified number of bits before it enters the ALU; this happens within the same instruction cycle.

    MOV r7, r5, LSL #2 ; r7 = r5<<2
    
    LSL  logical shift left      x LSL y   x << y
    LSR  logical shift right     x LSR y   (unsigned)x >> y
    ASR  arithmetic shift right  x ASR y   (signed)x >> y   (only the arithmetic right shift preserves the sign)
    ROR  rotate right            x ROR y   ((unsigned)x >> y) | (x << (32-y))
    RRX  rotate right extended   x RRX     (C << 31) | ((unsigned)x >> 1), where C is the carry flag in the CPSR; always rotates by exactly 1 bit
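The shift formulas translate directly into C. This is a minimal sketch assuming immediate shift amounts of 1 to 31; ARM's special encodings for a shift of 0 or 32 and register-specified shift amounts are not modelled, and the function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint32_t lsl(uint32_t x, unsigned y) { assert(y >= 1 && y <= 31); return x << y; }
static uint32_t lsr(uint32_t x, unsigned y) { assert(y >= 1 && y <= 31); return x >> y; }

/* Arithmetic shift right: copies of the sign bit fill the vacated top bits.
   Done on unsigned values because >> on a negative int is implementation-defined in C. */
static uint32_t asr(uint32_t x, unsigned y) {
    assert(y >= 1 && y <= 31);
    return (x >> 31) ? ~(~x >> y) : (x >> y);
}

/* Rotate right: bits shifted out at the bottom re-enter at the top. */
static uint32_t ror(uint32_t x, unsigned y) {
    assert(y >= 1 && y <= 31);
    return (x >> y) | (x << (32 - y));
}

/* Rotate right extended by one bit through the carry flag. */
static uint32_t rrx(uint32_t x, bool carry) {
    return ((uint32_t)carry << 31) | (x >> 1);
}
```

So MOV r7, r5, LSL #2 computes lsl(r5, 2), and ASR keeps negative values negative: asr(0x80000000, 4) is 0xF8000000.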

Arithmetic instructions
    <instructions>{<cond>}{S} Rd, Rn, N

    ADD 32-bit addition Rd = Rn + N
    ADC 32-bit addition with carry Rd = Rn + N + carry

    SUB 32-bit subtraction Rd = Rn - N
    SBC 32-bit subtraction with carry Rd = Rn - N - !C

    RSB Reverse subtraction Rd = N - Rn
    RSC Reverse subtraction with carry Rd = N - Rn - !C
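ADC and SBC exist mainly so that wider arithmetic can be chained through the carry flag. The sketch below, assuming a 64-bit value held as two 32-bit halves (the register assignment in the comment is only illustrative), shows what ADDS/ADC and SUBS/SBC compute.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit add from 32-bit halves, mirroring
       ADDS r0, r0, r2   ; low words, sets C on unsigned carry
       ADC  r1, r1, r3   ; high words + C                      */
static uint64_t add64(uint32_t a_lo, uint32_t a_hi, uint32_t b_lo, uint32_t b_hi) {
    uint32_t lo = a_lo + b_lo;
    uint32_t c = lo < a_lo;                 /* carry out of the low word */
    uint32_t hi = a_hi + b_hi + c;          /* ADC: Rd = Rn + N + C */
    return ((uint64_t)hi << 32) | lo;
}

/* 64-bit subtract: SUBS on the low words, then SBC (Rd = Rn - N - !C). */
static uint64_t sub64(uint32_t a_lo, uint32_t a_hi, uint32_t b_lo, uint32_t b_hi) {
    uint32_t lo = a_lo - b_lo;
    uint32_t c = a_lo >= b_lo;              /* C after SUBS: 1 means no borrow */
    uint32_t hi = a_hi - b_hi - (1u - c);   /* SBC subtracts NOT C */
    return ((uint64_t)hi << 32) | lo;
}
```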

Logical instruction
    <instruction>{<cond>}{S} Rd, Rn, N
    AND Rd = Rn & N Logical AND
    ORR Rd = Rn | N Logical OR
    EOR Rd = Rn ^ N Logical XOR
    BIC Rd = Rn & ~N bit clear

Comparison instructions (always update the CPSR; the result itself is discarded)
    <Instruction>{<cond>} Rn, N
    CMP Rn - N Compare
    CMN Rn + N Negative comparison
    TEQ Rn ^ N Equality test
    TST Rn & N bit test

Multiply instructions
    MLA{<cond>}{S} Rd, Rm, Rs, Rn ; Rd = (Rm * Rs) + Rn   multiply and accumulate
    MUL{<cond>}{S} Rd, Rm, Rs     ; Rd = Rm * Rs          multiply

    <instruction>{<cond>}{S} RdLo, RdHi, Rm, Rs
    RdLo holds the low 32 bits and RdHi the high 32 bits of the 64-bit result; the prefix S means signed, U means unsigned, and the trailing L means long (64-bit result).
    SMLAL  [RdHi, RdLo] = [RdHi, RdLo] + Rm * Rs
    SMULL  [RdHi, RdLo] = Rm * Rs

    UMLAL  [RdHi, RdLo] = [RdHi, RdLo] + Rm * Rs
    UMULL  [RdHi, RdLo] = Rm * Rs
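A C sketch of two of the long multiplies, UMULL and SMLAL, with the register pair passed as separate lo/hi words. The function names mirror the mnemonics but are hypothetical helpers; the accumulation is done in unsigned 64-bit arithmetic so that wraparound is well defined.

```c
#include <assert.h>
#include <stdint.h>

/* UMULL RdLo, RdHi, Rm, Rs: unsigned 32 x 32 -> 64-bit product. */
static void umull(uint32_t *lo, uint32_t *hi, uint32_t rm, uint32_t rs) {
    uint64_t p = (uint64_t)rm * rs;
    *lo = (uint32_t)p;
    *hi = (uint32_t)(p >> 32);
}

/* SMLAL RdLo, RdHi, Rm, Rs: [RdHi, RdLo] += signed(Rm) * signed(Rs). */
static void smlal(uint32_t *lo, uint32_t *hi, uint32_t rm, uint32_t rs) {
    uint64_t acc = ((uint64_t)*hi << 32) | *lo;
    int64_t prod = (int64_t)(int32_t)rm * (int32_t)rs;   /* fits in 64 bits */
    acc += (uint64_t)prod;                               /* two's complement add, mod 2^64 */
    *lo = (uint32_t)acc;
    *hi = (uint32_t)(acc >> 32);
}
```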

Branch instruction
    B{<cond>} label   
    BL{<cond>} label  
    BX{<cond>} Rm  
    BLX{<cond>} Rm | label 

    B    pc = label
    BL   pc = label, lr = address of the first instruction after the BL
    BX   pc = Rm & 0xFFFFFFFE, T = Rm & 1 (jump, possibly switching state)
    BLX  pc = label, T = 1, lr = address of the first instruction after the BLX
         pc = Rm & 0xFFFFFFFE, T = Rm & 1, lr = address of the first instruction after the BLX
    label is a signed offset relative to the PC, limited to a range of about +/-32 MB; T is the Thumb bit in the CPSR.
    Rm in BX/BLX holds an absolute address, and its lowest bit selects whether to switch to Thumb state.
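The 32 MB limit follows from the ARM-state B/BL encoding: the low 24 bits hold a signed offset in words, added to the PC, which reads as the branch address + 8. A minimal C sketch of decoding and encoding that field (both function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Branch target of an ARM-state B/BL located at insn_addr. */
static uint32_t branch_target(uint32_t insn_addr, uint32_t insn) {
    uint32_t imm24 = insn & 0x00FFFFFFu;
    uint32_t offset = (imm24 ^ 0x00800000u) - 0x00800000u;  /* sign-extend 24 -> 32 bits */
    return insn_addr + 8 + (offset << 2);                   /* words -> bytes, PC = addr + 8 */
}

/* imm24 field for a branch from insn_addr to target.
   Reach: 2^23 words in each direction, i.e. about +/-32 MB. */
static uint32_t branch_imm24(uint32_t insn_addr, uint32_t target) {
    int64_t diff = (int64_t)target - ((int64_t)insn_addr + 8);
    assert(diff % 4 == 0);                                  /* word aligned */
    assert(diff >= -(INT64_C(1) << 25) && diff < (INT64_C(1) << 25));
    return ((uint32_t)diff >> 2) & 0x00FFFFFFu;
}
```

For example, the infinite loop "B ." is encoded as 0xEAFFFFFE: its offset field is -2 words, which cancels the +8 of the PC.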

LOAD-STORE instruction
    
single register transfer
    <LDR|STR>{<cond>}{B} Rd, addressing1
    LDR{<cond>}SB|H|SH Rd, addressing2
    STR{<cond>}H Rd,addressing2

    LDR    load a word into a register
    STR    store a word from a register
    LDRB   load a byte into a register
    STRB   store a byte from a register
    LDRH   load a halfword
    STRH   store a halfword
    LDRSB  load a signed byte
    LDRSH  load a signed halfword

    LDR r0, [r1] ; load the word that r1 points to into r0
    STR r0, [r1] ; store r0 at the location r1 points to; r1 is called the base address register

    Addressing modes of the single-register load-store instructions
    LDR r0, [r1, #4]! ; pre-indexed with writeback: r1 = r1 + 4; r0 = *(r1)
    LDR r0, [r1, #4]  ; pre-indexed: r0 = *(r1 + 4)
    LDR r0, [r1], #4  ; post-indexed: r0 = *(r1); r1 = r1 + 4

    Summary: the address accessed is always the one formed inside []; a ! after the brackets, or an offset outside them, writes the updated address back to the base register.
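A small C model of the three addressing forms, assuming a hypothetical word array mem indexed by byte address / 4 (aligned word loads only). IndexMode and ldr are illustrative names, not an existing API.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { PRE_INDEX, PRE_INDEX_WRITEBACK, POST_INDEX } IndexMode;

/* Loads a word like LDR r0, ... and updates the base register *rn as ARM would. */
static uint32_t ldr(const uint32_t *mem, uint32_t *rn, int32_t offset, IndexMode mode) {
    uint32_t addr;
    switch (mode) {
    case PRE_INDEX:            /* [rn, #off]   : rn unchanged */
        addr = *rn + offset;
        break;
    case PRE_INDEX_WRITEBACK:  /* [rn, #off]!  : rn = rn + off, then load */
        addr = *rn + offset;
        *rn = addr;
        break;
    case POST_INDEX:           /* [rn], #off   : load from rn, then rn += off */
        addr = *rn;
        *rn = *rn + offset;
        break;
    default:
        assert(0);
        return 0;
    }
    assert(addr % 4 == 0);     /* only aligned word accesses are modelled */
    return mem[addr / 4];
}
```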

    Variants:
    LDR r0, [r1, r2]               ; load mem32[r1 + r2]
    LDR r0, [r1, r2, LSR #0x4]!    ; load mem32[r1 + (r2 LSR #0x4)], r1 = r1 + (r2 LSR #0x4)
    LDR r0, [r1, -r2, LSR #0x4]    ; load mem32[r1 - (r2 LSR #0x4)]
    LDR r0, [r1], r2, LSR #0x4     ; load mem32[r1], r1 = r1 + (r2 LSR #0x4)
    
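Multi-register transfer
    Transfers several registers with one instruction; because the instruction is not normally interrupted while it executes, long transfers can increase interrupt latency.
    <LDM|STM>{<cond>}<addressing mode> Rn{!}, <registers>{^}

    LDM  {Rd}*N <- mem32[start address + 4*N], optional Rn update; loads multiple registers from memory
    STM  {Rd}*N -> mem32[start address + 4*N], optional Rn update; stores multiple registers to memory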

----Addressing mode  Description       Start address  End address  Rn!----
    IA               increment after   Rn             Rn+4*N-4     Rn+4*N
    IB               increment before  Rn+4           Rn+4*N       Rn+4*N
    DA               decrement after   Rn-4*N+4       Rn           Rn-4*N
    DB               decrement before  Rn-4*N         Rn-4         Rn-4*N

    I = increment, D = decrement; A = after, B = before.
    Increment after: data is first transferred at the current address, and then the address is incremented for the next register.

    LDMIA r0!, {r1-r3}
    STMIB r0!, {r1-r3}
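The address table can be checked with a short C function that computes, for N registers, the start address, end address and written-back Rn for each mode. AmMode and ldm_stm_addresses are hypothetical names; registers are transferred lowest-numbered first, to the lowest address.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { AM_IA, AM_IB, AM_DA, AM_DB } AmMode;

/* Addresses used by LDM/STM<mode> Rn!, {n registers}, per the table above. */
static void ldm_stm_addresses(AmMode m, uint32_t rn, unsigned n,
                              uint32_t *start, uint32_t *end, uint32_t *rn_wb) {
    assert(n >= 1 && n <= 16);
    switch (m) {
    case AM_IA: *start = rn;           *end = rn + 4 * n - 4; *rn_wb = rn + 4 * n; break;
    case AM_IB: *start = rn + 4;       *end = rn + 4 * n;     *rn_wb = rn + 4 * n; break;
    case AM_DA: *start = rn - 4 * n + 4; *end = rn;           *rn_wb = rn - 4 * n; break;
    case AM_DB: *start = rn - 4 * n;   *end = rn - 4;         *rn_wb = rn - 4 * n; break;
    }
}
```

With r0 = 0x1000, LDMIA r0!, {r1-r3} reads 0x1000 to 0x1008 and leaves r0 = 0x100C; STMIB r0!, {r1-r3} writes 0x1004 to 0x100C and also leaves r0 = 0x100C.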

Stack operations
    ATPCS defines the stack as full descending; LDMFD and STMFD correspond to pop and push respectively (LDMFD is the same as LDMIA, STMFD the same as STMDB).
    A (Ascending) and D (Descending) give the growth direction; F (Full): sp points to the last used location; E (Empty): sp points to the next unused location.
    FA ----> Full Ascending
    FD ----> Full Descending
    EA ----> Empty Ascending
    ED ----> Empty Descending
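A full descending stack can be sketched in C over a hypothetical word array, with sp as a word index rather than a byte address (a simplification). push behaves like STMFD sp!, {...} and pop like LDMFD sp!, {...}; the Stack type and function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64

/* sp points at the last used slot (Full) and moves toward lower indices (Descending). */
typedef struct {
    uint32_t mem[STACK_WORDS];
    unsigned sp;                /* STACK_WORDS means the stack is empty */
} Stack;

/* Like STMFD sp!, {regs}: decrement first; lowest register goes to the lowest address. */
static void push(Stack *s, const uint32_t *regs, unsigned n) {
    assert(n <= s->sp);                    /* no overflow */
    s->sp -= n;
    for (unsigned i = 0; i < n; i++)
        s->mem[s->sp + i] = regs[i];
}

/* Like LDMFD sp!, {regs}: read upward from sp, then increment. */
static void pop(Stack *s, uint32_t *regs, unsigned n) {
    assert(s->sp + n <= STACK_WORDS);      /* no underflow */
    for (unsigned i = 0; i < n; i++)
        regs[i] = s->mem[s->sp + i];
    s->sp += n;
}
```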

Swap instruction
    SWP{B}{<cond>} Rd, Rm, [Rn] ; tmp = mem32[Rn]; mem32[Rn] = Rm; Rd = tmp
    SWPB swaps a byte instead of a word.
    The read and the write are atomic: while the instruction executes it cannot be interrupted and no other bus access can intervene, because the processor "holds the bus".
    Swap instructions are used to implement semaphores and mutual exclusion in operating systems.
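The semaphore use of SWP can be sketched in C11, with atomic_exchange from <stdatomic.h> standing in for SWP: both atomically store a new value and return the old one. This is an analogue, not what a compiler must emit; on ARMv6 and later SWP is deprecated and compilers generally use LDREX/STREX for such exchanges. The SpinLock type and its functions are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Same idea as this swap loop (r0 = lock address; registers illustrative):
       spin: MOV r1, #1
             SWP r2, r1, [r0]   ; r2 = old value, lock = 1, atomically
             CMP r2, #1
             BEQ spin           ; lock was already held: try again      */
typedef struct {
    atomic_int locked;          /* 0 = free, 1 = held */
} SpinLock;

static void spin_init(SpinLock *l) { atomic_init(&l->locked, 0); }

/* Swap in 1; the lock was acquired if the old value was 0. */
static bool spin_trylock(SpinLock *l) {
    return atomic_exchange(&l->locked, 1) == 0;
}

static void spin_lock(SpinLock *l) {
    while (!spin_trylock(l))
        ;                       /* busy-wait until the holder releases it */
}

static void spin_unlock(SpinLock *l) { atomic_store(&l->locked, 0); }
```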
    
Software interrupt instruction
    Usually executed in User mode to make system calls.
    SWI{<cond>} SWI_Number
        1. lr_svc = address of the instruction after the SWI
        2. spsr_svc = cpsr
        3. pc = vectors + 0x08
        4. cpsr mode = SVC
        5. cpsr I bit = 1 (mask IRQ interrupts)
    Summary: SWI sets the CPU mode and the interrupt mask bit, then jumps to the SWI exception handler.
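In ARM state the SWI instruction is encoded as cond (bits 31:28), 1111 (bits 27:24) and a 24-bit SWI_Number. Since lr_svc points to the instruction after the SWI, a handler can fetch the SWI word at lr_svc - 4 and mask the low 24 bits. A C sketch of that masking step (swi_number is a hypothetical helper; the memory read is left out). Note that the Linux EABI example at the end of these notes uses svc #0 and passes the system call number in r7 instead.

```c
#include <assert.h>
#include <stdint.h>

/* Extracts SWI_Number from an ARM-state SWI instruction word. */
static uint32_t swi_number(uint32_t insn) {
    assert((insn & 0x0F000000u) == 0x0F000000u);   /* bits 27:24 must be 1111 */
    return insn & 0x00FFFFFFu;
}
```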

Program Status Register Instructions
    MRS{<cond>} Rd, <cpsr|spsr> ; Rd = psr
    MSR{<cond>} <cpsr|spsr>_<fields>, Rm ; psr[field] = Rm
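    MSR{<cond>} <cpsr|spsr>_<fields>, #immediate ; psr[field] = immediate

    MRS r1, cpsr       ; r1 = cpsr
    BIC r1, r1, #0x80  ; clear bit 7, the I (IRQ mask) bit
    MSR cpsr_c, r1     ; write back the control field: IRQs are now enabled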
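The _<fields> suffix of MSR selects which bytes of the PSR are written: c = bits 7:0 (control), x = bits 15:8 (extension), s = bits 23:16 (status), f = bits 31:24 (flags). A C sketch of that masking, with msr as a hypothetical helper; the User-mode restriction to the flags field is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the PSR after MSR <psr>_<fields>, value: only the selected bytes change. */
static uint32_t msr(uint32_t psr, uint32_t value, const char *fields) {
    uint32_t mask = 0;
    for (const char *p = fields; *p != '\0'; p++) {
        switch (*p) {
        case 'c': mask |= 0x000000FFu; break;   /* mode, T, F, I */
        case 'x': mask |= 0x0000FF00u; break;
        case 's': mask |= 0x00FF0000u; break;
        case 'f': mask |= 0xFF000000u; break;   /* N, Z, C, V */
        default:  assert(!"unknown PSR field letter");
        }
    }
    return (psr & ~mask) | (value & mask);
}
```

Replaying the example above with cpsr = 0x600000D3 (SVC mode, I and F set): BIC gives 0x60000053, and MSR cpsr_c writes only the low byte, so the result is 0x60000053 with IRQs enabled and the flags untouched.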

Coprocessor instructions
    Used to extend the instruction set: they can provide additional computing power and can control the memory subsystem, including caches and memory management.
    They include data processing, register transfer and memory transfer instructions; their meaning is specific to each coprocessor.
    CDP{<cond>} cp, opcode1, Cd, Cn, Cm {, opcode2}
    <MRC|MCR>{<cond>} cp, opcode1, Rd, Cn, Cm {, opcode2}
    <LDC|STC>{<cond>} cp, Cd, addressing

    CDP      coprocessor data processing: performs a data processing operation inside the coprocessor
    MRC/MCR  coprocessor register transfer: moves data out of (MRC) or into (MCR) a coprocessor register
    LDC/STC  coprocessor memory transfer: loads or stores a block of memory data to or from the coprocessor

    cp represents the coprocessor number, ranging from p0-p15.
    CP15 is reserved for the system and used for memory management, write buffer control, Cache control and register identification.

Coprocessor 15 (CP15)
    CP15 is the system control coprocessor: it configures the processor core through a dedicated set of registers that hold configuration information.

    MRC p15, 0, r1, c1, c0, 0 ; read CP15 register c1 (the control register) into r1

Constant loading
    An ARM instruction is 32 bits wide and uses 12 bits to encode an immediate value: an 8-bit value plus a 4-bit rotation field (the 8-bit value is rotated right by twice the rotation field). Not every 32-bit constant fits in one instruction.
    ARM assemblers therefore provide two pseudo-instructions for loading 32-bit constants and addresses into registers, and the assembler chooses the actual instructions.

    LDR Rd, =constant ; constant loading
    ADR Rd, label     ; address loading
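Whether a constant fits the 8-bit-plus-rotation form can be checked by trying all 16 even rotations. The sketch below does that; the helper names are hypothetical. It also encodes an assumption about LDR Rd, =constant: assemblers such as GNU as substitute a single MOV or MVN when the value (or its bitwise complement) is encodable, and otherwise load it PC-relative from a literal pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate left by s bits, 0 <= s < 32. */
static uint32_t rol32(uint32_t x, unsigned s) {
    return s == 0 ? x : (x << s) | (x >> (32 - s));
}

/* Tries to write value as imm8 ROR (2 * rot); stores rot (0..15) and imm8 on success. */
static bool arm_imm_encode(uint32_t value, unsigned *rot, uint32_t *imm8) {
    for (unsigned r = 0; r < 16; r++) {
        uint32_t candidate = rol32(value, 2 * r);   /* undo a right rotation by 2r */
        if (candidate <= 0xFFu) {
            *rot = r;
            *imm8 = candidate;
            return true;
        }
    }
    return false;
}

/* True if LDR Rd, =value could become a single MOV (value) or MVN (~value). */
static bool fits_mov_or_mvn(uint32_t value) {
    unsigned rot;
    uint32_t imm8;
    return arm_imm_encode(value, &rot, &imm8) || arm_imm_encode(~value, &rot, &imm8);
}
```

For example, 0x00FFFFFF is not a valid immediate, but its complement 0xFF000000 is, so it can be loaded with one MVN; 0x12345678 needs a literal pool.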

Count leading zeros instruction
    CLZ counts the zero bits above the most significant 1 bit, scanning from bit 31 downward.
    LDR r1, =0x00FFFFFF
    CLZ r0, r1 ; r0 = 8
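A portable C model of CLZ (clz32 is a hypothetical name). The ARM instruction returns 32 for a zero input, so the model does the same.

```c
#include <assert.h>
#include <stdint.h>

/* Number of zero bits above the most significant 1 bit; 32 if x == 0. */
static unsigned clz32(uint32_t x) {
    if (x == 0)
        return 32;
    unsigned n = 0;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}
```

A common use is finding the highest set bit: 31 - clz32(x) for x != 0. GCC and Clang also provide __builtin_clz, which is undefined for 0.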

============================================
Exception and interrupt handling

    An exception is any condition that requires the processor to suspend normal program execution; an interrupt is one type of exception.
    Each exception causes the processor to enter a specific mode. A privileged program can also enter a mode by writing the CPSR.

    When an exception causes a mode change, the core automatically:
    1. saves the CPSR to the SPSR of the exception mode;
    2. saves the return address (derived from the PC) to the LR of the exception mode;
    3. sets the CPSR mode bits to the exception mode;
    4. sets the PC to the entry address of the corresponding exception handler.
    
    Exceptions and corresponding modes
    ----------------------
    Fast interrupt FIQ
    interrupt IRQ
    SWI/reset SVC
    prefetch/data abort abort
    undefined instruction undefined

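Vector table (a table of instructions that jump to the exception handlers); typical entries:
    B <address>             branch relative to the PC
    LDR pc, [pc, #offset]   load the handler address from a memory word relative to the PC
    LDR pc, [pc, #-0xFF0]   at the IRQ vector (0x18), loads the handler address from 0xFFFFF030, as used with a vectored interrupt controller
    MOV pc, #immediate      move an immediate address into the PC (only addresses that fit the immediate encoding)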

Vector tables and processor modes

    Exception Mode Offset
    -------------------------
    Reset SVC 0x0
    Undefined instruction UND 0x4    
    SWI SVC 0x08
    Prefetch abort ABT 0x0C
    Data abort ABT 0x10
    Unallocated --- 0x14
    IRQ IRQ 0x18
    FIQ FIQ 0x1C

Exception priority
    Exception              Priority  I bit  F bit
    ---------------------------------------------
    Reset                  1         1      1
    Data abort             2         1      -
    FIQ                    3         1      1
    IRQ                    4         1      -
    Prefetch abort         5         1      -
    SWI                    6         1      -
    Undefined instruction  6         1      -
    (1 = the mask bit is set on entry; - = the bit is left unchanged)

When an exception occurs, LR is set to a specific value based on the current PC.
    For example, on an IRQ, LR = address of the last executed instruction + 8; the handler derives its return address from LR.

    Useful addresses based on the LR register
    Exception              Address  Use
    --------------------------------------------------------------------------
    Reset                  -        lr is not defined on a reset
    Data abort             lr-8     points to the instruction that caused the data abort
    FIQ                    lr-4     return address from the FIQ handler
    IRQ                    lr-4     return address from the IRQ handler
    Prefetch abort         lr-4     points to the instruction that caused the prefetch abort
    SWI                    lr       points to the instruction after the SWI
    Undefined instruction  lr       points to the instruction after the undefined instruction

    handler
        <handler code>
        ...
        SUBS pc, r14, #4 ; pc = r14 - 4
    Because SUBS has the S suffix and the PC is the destination register, the SPSR is automatically copied back to the CPSR, restoring the interrupted mode.
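The LR table can be summarised as a function from exception type and banked lr to the address the handler returns to. This C sketch (the Exception enum and return_address are hypothetical names) pairs each case with the usual return instruction.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    EXC_RESET, EXC_DATA_ABORT, EXC_FIQ, EXC_IRQ,
    EXC_PREFETCH_ABORT, EXC_SWI, EXC_UNDEF
} Exception;

/* Return address derived from the banked lr, per the table above. */
static uint32_t return_address(Exception e, uint32_t lr) {
    switch (e) {
    case EXC_DATA_ABORT:
        return lr - 8;        /* SUBS pc, lr, #8: re-execute the aborted instruction */
    case EXC_FIQ:
    case EXC_IRQ:
        return lr - 4;        /* SUBS pc, lr, #4: resume the interrupted code */
    case EXC_PREFETCH_ABORT:
        return lr - 4;        /* SUBS pc, lr, #4: retry the aborted instruction */
    case EXC_SWI:
    case EXC_UNDEF:
        return lr;            /* MOVS pc, lr: continue after the instruction */
    case EXC_RESET:
        break;
    }
    assert(!"lr is not defined on reset");
    return 0;
}
```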

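Assembly example (32-bit ARM Linux, EABI): use the write system call to print "hello", then call exit. svc is the newer (UAL) name for SWI; Linux passes the system call number in r7.

.section .data

hello:
.ascii "hello\n"

.section .text
.globl _start

_start:

        mov r0, #1          @ fd 1: stdout
        ldr r1, =hello      @ buffer address
        mov r2, #6          @ length: 6 bytes
        mov r7, #4          @ syscall number 4: write
        svc #0

        mov r0, #0          @ exit status 0
        mov r7, #1          @ syscall number 1: exit
        svc #0

Makefile to assemble and link it on a 32-bit ARM Linux system or with a cross toolchain (recipe lines must start with a tab):
# Static link: the program does not use libc, so no dynamic linker is needed
all:
	as -g main.s -o main.o
	ld main.o -o a.out
	rm main.o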

Origin blog.csdn.net/konga/article/details/84503279