Heuristic Arm Architecture Interpretation (2023 New)

Chapter 1 Interpretation of Heuristic Arm Architecture

Chapter 2 CPU Microarchitecture

Chapter 3 System Microarchitecture

Chapter 4 Bus Microarchitecture

Chapter 5 Monitoring Microarchitecture

Chapter 6 Security Microarchitecture

Chapter 7 Virtualization Microarchitecture

Chapter 8 Armv9-A Architecture

Chapter 9 Armv8-M Architecture

Chapter 10 Armv8-R Architecture

Chapter 11 Interpretation of Cortex-A715

Chapter 12 Interpretation of Cortex-X3

Chapter 13 Interpretation of Neoverse

Chapter 14 Interpretation of Cortex-M85

Chapter 15 Interpretation of Cortex-R82

Table of contents

Foreword

1. Architecture overview

1.1 Von Neumann architecture

1.2 Harvard Architecture

1.3 Arm Architecture

2. Architecture map

2.1 Troika

2.2 Six generations of inheritance

2.3 Newest members

2.3.1 Big Brother Cortex-A710

2.3.2 Second Junior Brother Cortex-R82

2.3.3 Little junior sister Cortex-M85

2.4 Architecture clan

2.4.1 Graphics Processing Unit GPU

2.4.2 Neural Network Processor Unit NPU

3. Architecture Magic

3.1 Factions

3.1.1 Cortex-A Magic

3.1.2 Cortex-R magic

3.1.3 Cortex-M magic

3.2 Era

3.3 Micro magic

4. Architecture drill

4.1 Great Gathering

4.2 Formation

4.3 Strategizing and winning thousands of miles away

4.3.1 Upload and download - console output

4.3.2 FiberHome - LED Marquee

4.3.3 Password issuance-encryption

4.3.4 Forage first - start code

4.3.5 Ready to Go - Link Script

5. Summary

Reference


Foreword


  • Reminder: The full text is about 10,000 words, and the estimated reading time is 15 minutes;
  • Readers: Anyone interested in the Arm architecture;
  • Abstract: This article discusses the underlying logic of the Arm architecture and introduces its top-level design. Starting from the processor core architecture, centering on the system architecture, and taking the A-series and M-series architectures as typical examples, it describes the key system components in an easy-to-understand way. The Arm architecture discussed in this article does not include the GPU and NPU architectures;

Figure 1 Arm knowledge structure

  • Keywords: Arm architecture, microarchitecture, Cortex-A, Cortex-R, Cortex-M, Armv7, Armv8, Armv9, ISA, instruction set, AMBA bus, Debug, TrustZone, virtualization, EL2, S-EL2, EL1, S-EL1, operating system, RISC-V;
  • Related recommendation: If you are interested in concepts such as structure, architecture, and systems, see the article "Architecture and Systems";
  • Related recommendation: If you are interested in the company Arm, see the article on understanding the Arm company;

1. Architecture overview


1.1 Von Neumann architecture

  • The von Neumann structure, also known as the Princeton structure, is a memory structure that combines program instruction memory and data memory. Program instruction addresses and data addresses point to different physical locations in the same memory, so program instructions and data have the same width. For example, the program instructions and data of Intel's 8086 central processing unit are both 16 bits wide.
  • The mathematician von Neumann proposed three basic principles of computer design: the use of binary logic, stored-program execution, and a computer composed of five parts (arithmetic unit, controller, memory, input device, output device). This theory is known as the von Neumann architecture.

1.2 Harvard Architecture

  • The Harvard structure is a parallel architecture. Its main feature is that programs and data are stored in different storage spaces: the program memory and the data memory are two independent memories, each addressed and accessed independently. Corresponding to the two memories, the system has four buses: an address bus and a data bus for the program, and an address bus and a data bus for the data.
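
The difference between the two structures can be sketched with a toy machine. This is only an illustration: the opcodes, struct names, and memory size below are hypothetical and do not correspond to any real processor. The same interpreter is run once with a single shared memory (von Neumann) and once with separate instruction and data memories (Harvard).

```c
#include <stdint.h>

/* Hypothetical toy ISA, for illustration only. Each instruction occupies
 * two words: an opcode followed by one operand address. */
enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2, OP_STORE = 3 };

#define MEM_WORDS 32

/* Von Neumann: one memory and one address space for code and data. */
typedef struct {
    uint16_t mem[MEM_WORDS];
} VonNeumannMachine;

/* Harvard: separate instruction and data memories, addressed independently. */
typedef struct {
    uint16_t imem[MEM_WORDS];
    uint16_t dmem[MEM_WORDS];
} HarvardMachine;

/* Shared interpreter: instructions are fetched from `code`,
 * operands are read from and written to `data`. */
static uint16_t run(const uint16_t *code, uint16_t *data)
{
    uint16_t acc = 0;
    for (unsigned pc = 0; pc + 1 < MEM_WORDS; pc += 2) {
        uint16_t op = code[pc];
        uint16_t addr = code[pc + 1] % MEM_WORDS;
        switch (op) {
        case OP_LOAD:  acc = data[addr]; break;
        case OP_ADD:   acc = (uint16_t)(acc + data[addr]); break;
        case OP_STORE: data[addr] = acc; break;
        default:       return acc;   /* OP_HALT or unknown opcode */
        }
    }
    return acc;
}

/* Code and data share the same array, so a store can overwrite code. */
uint16_t run_von_neumann(VonNeumannMachine *m) { return run(m->mem, m->mem); }

/* Code and data live in different arrays, so a store never touches code. */
uint16_t run_harvard(HarvardMachine *m) { return run(m->imem, m->dmem); }
```

In the von Neumann variant the program must place its data at addresses the code does not occupy, while in the Harvard variant data address 0 and instruction address 0 are different locations.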

1.3 Arm Architecture

  • The Arm architecture refers to the architecture of Arm processors, including the central processing unit (CPU) microarchitecture, system microarchitecture, bus microarchitecture, monitoring microarchitecture, security microarchitecture, and virtualization microarchitecture.
    • The CPU microarchitecture is the implementation of the instruction set architecture (ISA), including A32/T32, A64, NEON, VFP, etc.;
    • The system microarchitecture includes the interrupt controller GIC, the system memory management unit SMMU, the power management interfaces PSCI and ACPI, etc.;
    • The bus microarchitecture refers to the AMBA microarchitecture, including AHB, APB, AXI, CHI, etc.;
    • The monitoring microarchitecture includes debug and trace;
    • The security microarchitecture includes TrustZone, Realm, CryptoCell, CryptoIsland, etc.;
    • The virtualization microarchitecture includes VMSA, LPAE, EL2, S-EL2, etc.
Table 1 Microarchitecture system
Microarchitecture                 Description
CPU microarchitecture             The central computing unit architecture that implements the instruction set architecture; the A, R, and M series architectures describe the CPU architecture
System microarchitecture          The system component architecture that exists so that the various components of the processor function properly
Bus microarchitecture             The bridge architecture that connects the various architectural subsystems
Monitoring microarchitecture      Debugs and traces each architecture component of the system
Security microarchitecture        A collection of architectures used to implement system security
Virtualization microarchitecture  A collection of architectures for virtualizing hardware resources

Figure 2 Arm processor top-level architecture

2. Architecture map


2.1 Troika

  • The Arm architecture is divided into three architecture families according to application scenario: Cortex-A, Cortex-M, and Cortex-R;
  • The Arm A-Profile architecture mainly includes the Cortex-A series processors for mobile and PC terminals, the high-performance Neoverse processors for cloud computing and machine learning, and the high-performance Cortex-X series processors developed in cooperation with customers. The latter two may go on to form independent series;
  • The Arm M-Profile architecture mainly includes Cortex-M0 of Armv6-M, Cortex-M3 and Cortex-M4 of Armv7-M, and Cortex-M23, Cortex-M33, Cortex-M35P, Cortex-M55, and Cortex-M85 of Armv8-M, which are used in general-purpose MCUs and the IoT field;
  • The Arm R-Profile architecture mainly includes Cortex-R4, Cortex-R5, Cortex-R7, and Cortex-R8 of Armv7-R, and Cortex-R52 and Cortex-R82 of Armv8-R, which are used in the field of real-time control.

The evolution of each architecture family is both independent and interrelated. At present, the A series has evolved to Armv9, while the M series and the R series have evolved to Armv8. The representative processors of each family are introduced in section 2.3.

2.2 Six generations of inheritance

  • The Arm architecture has gone through six versions, from Armv4 to Armv9;
  • At present, three versions are active in the market: Armv7, Armv8, and Armv9;
  • Each version introduces or deprecates functional features, such as TrustZone in Armv6, virtualization in Armv7, the vector extension SVE in Armv8, and the matrix extension SME in Armv9.

Figure 3 Architecture version

2.3 Newest members

2.3.1 Big Brother Cortex-A710

Cortex-A710 is the big core of the Armv9-A architecture and an enhanced version of Cortex-A78; its overall structure is basically the same as that of the previous generation. The new microarchitecture achieves better performance and lower power consumption. It supports the enhanced vector extension SVE2, the NEON architecture for Advanced SIMD and DSP, and an FPU floating-point architecture compatible with VFPv3 vector floating point.


Figure 4 Cortex-A710 Architecture Diagram

2.3.2 Second Junior Brother Cortex-R82

Cortex-R82 is the latest processor of the R series. It uses the Armv8 architecture and includes CoreSight, MDT, GIC, FPU, TCM, SCU, ACP, AXI-S, AXI-M, LLPP, LLRAM, and other microarchitecture components.


Figure 5 Cortex-R82 architecture diagram

2.3.3 Little junior sister Cortex-M85

Cortex-M85 implements the Armv8.1-M architecture and includes MPU, Helium, PMU, CP, FPU, TCM, AHB, DSP, ETM, PACBTI, APH, and other microarchitecture components.


Figure 6 Cortex-M85 Architecture Diagram

2.4 Architecture clan

The Arm architecture described in this article refers to the general-purpose processor architecture and does not include specialized processors. In addition to general-purpose processors, Arm also has graphics processors and neural network processors.

2.4.1 Graphics Processing Unit GPU


Figure 7 Mali GPU Roadmap

The GPU architecture is divided into two branches, the traditional Mali architecture and the latest Immortalis architecture:

  • Mali currently has four generations: Utgard, Midgard, Bifrost, and Valhall;
  • Immortalis is a newly launched architecture, represented by Immortalis-G715.

For more on GPU architecture, refer to Arm GPUs.

2.4.2 Neural Network Processor Unit NPU


Figure 8 Ethos NPU family

Ethos NPUs are a neural-network-based machine learning chip architecture launched by Arm, including U55, U65, and N78.

For more on NPUs, refer to Arm NPUs.

3. Architecture Magic


Architectural magic refers to architectural features, such as computing magic, security magic, and virtualization magic. This part introduces these features along three dimensions: faction, era, and micro magic.

3.1 Factions

3.1.1 Cortex-A Magic

Armor 8.0 (Armv8.0-A)

  • Advanced SIMD
    • aSIMD: enhanced fixed-length Advanced Single Instruction Multiple Data
    • In Armv8-A, it makes up the SIMD capability together with the variable-length SVE and SVE2
    • SVE is mainly used in HPC and, as SVE2, is standard in Armv9
  • Crypto Extension (CE)
    • AES acceleration: AESE, AESD
    • SHA acceleration: SHA1, SHA256
  • CRC
    • Hardware CRC32 acceleration
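
As an illustration of the CRC feature, here is a minimal sketch of a CRC-32 routine that uses the ACLE intrinsic `__crc32b` when the compiler reports hardware CRC support (`__ARM_FEATURE_CRC32`) and otherwise falls back to a bitwise software loop. The function names `crc32_step` and `crc32_bytes` are chosen for this example only.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>   /* ACLE intrinsics; __crc32b maps to the CRC32B instruction */
#endif

/* Advance the CRC-32 state by one input byte. */
static uint32_t crc32_step(uint32_t crc, uint8_t byte)
{
#if defined(__ARM_FEATURE_CRC32)
    return __crc32b(crc, byte);
#else
    /* Software fallback: reflected CRC-32 polynomial 0xEDB88320, bit by bit. */
    crc ^= byte;
    for (int bit = 0; bit < 8; bit++)
        crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    return crc;
#endif
}

/* CRC-32 of a buffer, with the conventional initial and final inversion. */
uint32_t crc32_bytes(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--)
        crc = crc32_step(crc, *p++);
    return crc ^ 0xFFFFFFFFu;
}
```

Both paths compute the same checksum; for the standard check string "123456789" the result is 0xCBF43926.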

Armor 8.1 (Armv8.1-A)

  • Atomic memory access instructions (AArch64)
    • The Large System Extensions (LSE), designed for large-scale systems, add single-instruction atomic memory operations such as CAS, SWP, and LDADD as an alternative to loops built from the exclusive load instruction LDXR and the exclusive store instruction STXR; PostgreSQL already supports them
  • Limited Ordering Regions (AArch64)
    • Memory access ordering designed for large systems, using load-acquire and store-release instructions restricted to a region (LDLAR, STLLR)
    • A finer-grained alternative to the DMB, DSB, and ISB barriers used on out-of-order implementations
  • Increased Virtual Machine Identifier (VMID) size, and Virtualization Host Extensions (AArch64)
    • Larger VM IDs (up to 16 bits) in virtualization
    • The Virtualization Host Extensions (VHE) let the host OS run directly at EL2
  • Privileged Access Never (PAN) (AArch32 and AArch64)
    • Kernel access to user-space memory can be restricted via PAN
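
From application code, atomics and acquire/release ordering are usually reached through C11 `<stdatomic.h>` rather than hand-written instructions. The sketch below uses hypothetical names (`Shared`, `count_hit`, `publish`, `try_consume`). It is an assumption about typical compiler behavior, not a guarantee: AArch64 compilers commonly lower the fetch-add to LDADD when LSE is enabled and to an LDXR/STXR loop otherwise, and lower the release store and acquire load to STLR and LDAR.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared state, for illustration only. */
typedef struct {
    atomic_uint_fast32_t hits;   /* event counter updated by many threads */
    atomic_bool ready;           /* set once `payload` has been written */
    uint32_t payload;
} Shared;

/* Atomically increment the counter and return the new value. */
uint32_t count_hit(Shared *s)
{
    return (uint32_t)atomic_fetch_add_explicit(&s->hits, 1,
                                               memory_order_relaxed) + 1;
}

/* Producer: write the payload, then publish it with a store-release. */
void publish(Shared *s, uint32_t value)
{
    s->payload = value;
    atomic_store_explicit(&s->ready, true, memory_order_release);
}

/* Consumer: a load-acquire that observes `ready` also observes `payload`. */
bool try_consume(Shared *s, uint32_t *out)
{
    if (!atomic_load_explicit(&s->ready, memory_order_acquire))
        return false;
    *out = s->payload;
    return true;
}
```

The release/acquire pair orders only what is needed for the hand-off, which is the same idea as the load-acquire and store-release instructions, in contrast to a full DMB barrier.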

Armor 8.2 (Armv8.2-A)

  • Support for 52-bit addresses (AArch64)
    • 52-bit physical and virtual addresses, typically for server applications
  • The ability for PEs to share Translation Lookaside Buffer (TLB) entries (AArch32 and AArch64)
    • Multiple PEs can share TLB entries, that is, cached page table entries
  • FP16 data processing instructions (AArch32 and AArch64)
    • Half precision is supported, in addition to single and double precision
  • Statistical profiling (AArch64)
    • Sampling-based profiling built into the pipeline; each sample records key information about the sampled instruction, such as its latency, cache access/hit/miss, branch misprediction, whether a read-write interlock occurred, and which level of the memory hierarchy supplied the data
  • Reliability Availability Serviceability (RAS) support becomes mandatory (AArch32 and AArch64)
    • Provides mechanisms to ensure reliability, availability, and serviceability; TF-A already supports the RAS framework
  • Crypto Extension (CE) additions
    • SHA2-512, SHA3
    • SM3, SM4

Armor 8.3 (Armv8.3-A)

  • Pointer authentication (AArch64)
    • Authenticates instruction pointers and data pointers
    • GCC's -msign-return-address option currently supports authenticating the return address held in LR
  • Nested virtualization (AArch64)
    • Allows a guest to run a hypervisor at EL1
    • Adds an access mechanism from EL1 to EL2
  • Advanced Single Instruction Multiple Data (SIMD) complex number support (AArch32 and AArch64)
    • aSIMD supports complex-number arithmetic
  • Improved JavaScript data type conversion support (AArch32 and AArch64)
    • A dedicated instruction (FJCVTZS in AArch64) converts a double-precision value to a 32-bit integer with JavaScript semantics
  • A change to the memory consistency model (AArch64)
    • Adds support for the weaker RCpc model alongside RCsc
  • ID mechanism support for larger system-visible caches (AArch32 and AArch64)
    • Expanded cache ID registers
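
To make the JavaScript conversion item concrete, the following is a portable C sketch of the semantics such an instruction has to provide: truncate toward zero, wrap modulo 2^32, and map NaN and infinities to 0 (the ECMAScript ToInt32 rule). The function name `js_to_int32` is hypothetical, and the code assumes `double` is an IEEE 754 binary64 value. Without hardware support, a JavaScript engine needs a sequence like this wherever a plain conversion would saturate instead of wrap.

```c
#include <stdint.h>
#include <string.h>

/* ECMAScript ToInt32 on an IEEE 754 binary64 value (assumed layout of double). */
int32_t js_to_int32(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);                 /* reinterpret without UB */

    int exp = (int)((bits >> 52) & 0x7FF) - 1023;   /* unbiased exponent */
    uint64_t mant = (bits & 0xFFFFFFFFFFFFFULL) | (1ULL << 52); /* implicit 1 */
    int negative = (int)(bits >> 63);

    if (exp < 0)
        return 0;   /* |d| < 1, including zero and subnormals */
    if (exp > 83)
        return 0;   /* NaN, infinity, or a multiple of 2^32 */

    /* Integer part of |d|; only the low 32 bits matter. */
    uint64_t mag = (exp <= 52) ? (mant >> (52 - exp)) : (mant << (exp - 52));
    uint32_t low = (uint32_t)mag;
    if (negative)
        low = (uint32_t)(0u - low);                 /* negate modulo 2^32 */

    /* Map [0, 2^32) onto int32_t without implementation-defined casts. */
    if (low <= 0x7FFFFFFFu)
        return (int32_t)low;
    return (int32_t)(low - 0x80000000u) + INT32_MIN;
}
```

For example, 4294967301.0 (2^32 + 5) converts to 5 and 2147483648.0 converts to -2147483648, whereas an ordinary saturating double-to-int conversion would clamp both.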

Armor 8.4 (Armv8.4-A)

  • Secure virtualization (AArch64)
    • S-EL2 support, can run a secure virtual machine in a secure environment
  • Nested virtualization enhancements (AArch64)
  • Small translation table support (AArch64)
  • Relaxed alignment restrictions (AArch32 and AArch64)
  • Memory Partitioning and Monitoring (MPAM) (AArch32 and AArch64)
  • Additional crypto support (AArch32 and AArch64)
  • Generic counter scaling (AArch32 and AArch64)
  • Instructions to accelerate SHA

Armor 8.5/9.0 (Armv8.5-A/Armv9.0-A)

  • Memory Tagging (AArch64)
  • Branch Target Identification (AArch64)
  • Random Number Generator instructions (AArch64)
  • Cache Clean to Point of Deep Persistence (AArch64)

Armor 8.6/9.1 (Armv8.6-A/Armv9.1-A)

  • General Matrix Multiply (GEMM) instructions (AArch64)
  • Fine grained traps for virtualization (AArch64)
  • High precision Generic Timer
  • Data Gathering Hint (AArch64)

Armor 8.7/9.2 (Armv8.7-A/Armv9.2-A)

  • Enhanced support for PCIe hot plug (AArch64)
  • Atomic 64-byte load and stores to accelerators (AArch64)
  • Wait For Interrupt (WFI) and Wait For Event (WFE) with timeout (AArch64)
  • Branch-Record recording (Armv9.2 only)

Armor 8.8/9.3 (Armv8.8-A/Armv9.3-A)

  • Non-maskable interrupts (AArch64)
  • Instructions to optimize memcpy() and memset() style operations (AArch64)
  • Enhancements to PAC (AArch64)
  • Hinted conditional branches

3.1.2 Cortex-R magic

Seventh Generation Magic (Armv7-R)

Eighth Generation Magic (Armv8-R)

3.1.3 Cortex-M magic

Eighth Generation Magic (Armv8.0-M)

Eighth Generation Magic v1 (Armv8.1-M)

  • MVE (M-Profile Vector Extension), Arm Helium
  • LOB (Low Overhead Branch), loop tail predication, BF (Branch Future)

  • Security

    • Execution permission
    • V8.2-M PAC (Pointer Authentication)
    • V8.2-M BTI (Branch Target Identification)
    • DIT (Data Independent Timing)
    • UDE (Unprivileged Debug Extension)

3.2 Era

Table 2 Magic Year
Year  A-Profile update
2022  A-PROFILE 2022
2021  A-PROFILE 2021
2020  A-PROFILE 2020
2019  -
2018  A-PROFILE 2018
2017  A-PROFILE 2017
2016  A-PROFILE 2016
2015  A-PROFILE 2015
2014  A-PROFILE 2014

3.3 Micro magic

Table 3 Skill table
Skill  Description
NEON   Integer vector operations
VFP    Floating-point vector operations
SVE    Variable-length vector extension operations

See the ID_AA64xxxx family of registers to identify architectural features of the current CPU implementation. 
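
As a sketch of how those registers are used, the code below decodes a few 4-bit fields of ID_AA64ISAR0_EL1 (AES, SHA1, SHA2, CRC32, Atomic). The struct and function names are hypothetical. Reading the register itself needs EL1, or an operating system that emulates the access for user space, so the decoder takes the raw value as a parameter and the read helper is only compiled on AArch64.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract the 4-bit ID field whose least significant bit is `lsb`. */
static inline unsigned id_field(uint64_t reg, unsigned lsb)
{
    return (unsigned)((reg >> lsb) & 0xFu);
}

/* Hypothetical summary of a few ISA features, for illustration only. */
typedef struct {
    bool aes;      /* AESE/AESD/AESMC/AESIMC */
    bool pmull;    /* PMULL/PMULL2 on 64-bit elements */
    bool sha1;
    bool sha256;
    bool crc32;
    bool atomics;  /* LSE atomic instructions (Armv8.1) */
} IsaFeatures;

/* Decode selected fields of an ID_AA64ISAR0_EL1 value. */
IsaFeatures decode_id_aa64isar0(uint64_t isar0)
{
    IsaFeatures f;
    unsigned aes = id_field(isar0, 4);          /* AES, bits [7:4] */
    f.aes     = aes >= 1;
    f.pmull   = aes >= 2;
    f.sha1    = id_field(isar0, 8)  >= 1;       /* SHA1, bits [11:8] */
    f.sha256  = id_field(isar0, 12) >= 1;       /* SHA2, bits [15:12] */
    f.crc32   = id_field(isar0, 16) >= 1;       /* CRC32, bits [19:16] */
    f.atomics = id_field(isar0, 20) >= 2;       /* Atomic, bits [23:20] */
    return f;
}

#if defined(__aarch64__) && defined(__GNUC__)
/* Read the register; traps unless run at EL1 or emulated by the OS. */
static inline uint64_t read_id_aa64isar0(void)
{
    uint64_t value;
    __asm__ volatile("mrs %0, ID_AA64ISAR0_EL1" : "=r"(value));
    return value;
}
#endif
```

Keeping the decoder separate from the register read makes it possible to check the field logic on any host with a recorded register value.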

4. Architecture drill


4.1 Great Gathering

With the passage of time and changes in demand, each product family has grown to include multiple members: the Cortex-A series has released 24 models from A5 to A715, the Cortex-M series has released 11 models from M0 to M85, and the Cortex-R series has released 11 models from R4 to R82. For the differences between the processors, see the linked table.

4.2 Formation

Different processors can handle different applications.

Wearables mainly use Cortex-A and Cortex-M. Storage uses Cortex-R and Cortex-M. ADAS uses Cortex-A for high-performance computing and Cortex-R for real-time and safety control. In the mobile consumer market all three are used: Cortex-A as the application processor, Cortex-R mainly as the baseband processor, and Cortex-M in sensor hub chips.


Figure 9 Application fields

4.3 Strategizing and winning thousands of miles away

4.3.1 Upload and download - console output

/* Hello world */
#include <stdio.h>

int main(void)
{
    printf("Hello World\n");
    return 0;
}

4.3.2 FiberHome - LED Marquee

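#include "led.h"

/* Configure the two LED pins (PA8 and PD2) as outputs and drive them high.
   The bit positions assume an STM32F1-style RCC/GPIO register layout. */
void LED_Init(void)
{
    RCC->APB2ENR |= 1 << 2;      /* enable the GPIOA clock */
    RCC->APB2ENR |= 1 << 5;      /* enable the GPIOD clock */

    GPIOA->CRH &= 0xFFFFFFF0;    /* clear the 4-bit configuration field of PA8 */
    GPIOA->CRH |= 0x00000003;    /* PA8: push-pull output */
    GPIOA->ODR |= 1 << 8;        /* PA8 output high */

    GPIOD->CRL &= 0xFFFFF0FF;    /* clear the 4-bit configuration field of PD2 */
    GPIOD->CRL |= 0x00000300;    /* PD2: push-pull output */
    GPIOD->ODR |= 1 << 2;        /* PD2 output high */
}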
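
The mask constants in LED_Init follow one pattern: each pin owns a 4-bit field in a 32-bit configuration register, and the code clears that field before writing the new mode. A small helper makes the pattern explicit. It is a sketch that assumes the 4-bits-per-pin layout implied by the masks above; the name `gpio_cfg_set` is hypothetical and not part of any vendor library.

```c
#include <stdint.h>

/* Return `reg` with the 4-bit configuration field of one pin replaced.
 * `pin_in_reg` is the pin's index within the register (0..7), so pin 8
 * of a port is index 0 of the high configuration register. */
uint32_t gpio_cfg_set(uint32_t reg, unsigned pin_in_reg, uint32_t mode)
{
    unsigned shift = (pin_in_reg & 7u) * 4u;   /* 4 bits per pin */
    reg &= ~(0xFu << shift);                   /* clear the old field */
    reg |= (mode & 0xFu) << shift;             /* write the new mode */
    return reg;
}
```

With this helper, the pair of statements on GPIOA->CRH corresponds to `gpio_cfg_set(value, 0, 0x3)` and the pair on GPIOD->CRL to `gpio_cfg_set(value, 2, 0x3)`, which avoids hand-computing masks such as 0xFFFFF0FF.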

4.3.3 Password issuance-encryption

/* Encrypt one 16-byte block with AES, using the expanded round keys in ctx->rk */
void mbedtls_aes_encrypt( mbedtls_aes_context *ctx,
                           const unsigned char input[16],
                           unsigned char output[16] )
 {
     int i;
     uint32_t *RK, X0, X1, X2, X3, Y0, Y1, Y2, Y3;
 
     RK = ctx->rk;
 
     GET_UINT32_LE( X0, input,  0 ); X0 ^= *RK++;
     GET_UINT32_LE( X1, input,  4 ); X1 ^= *RK++;
     GET_UINT32_LE( X2, input,  8 ); X2 ^= *RK++;
     GET_UINT32_LE( X3, input, 12 ); X3 ^= *RK++;
 
     for( i = ( ctx->nr >> 1 ) - 1; i > 0; i-- )
     {
         AES_FROUND( Y0, Y1, Y2, Y3, X0, X1, X2, X3 );
         AES_FROUND( X0, X1, X2, X3, Y0, Y1, Y2, Y3 );
     }
 
     AES_FROUND( Y0, Y1, Y2, Y3, X0, X1, X2, X3 );
 
     X0 = *RK++ ^ 
             ( (uint32_t) FSb[ ( Y0       ) & 0xFF ]       ) ^
             ( (uint32_t) FSb[ ( Y1 >>  8 ) & 0xFF ] <<  8 ) ^
             ( (uint32_t) FSb[ ( Y2 >> 16 ) & 0xFF ] << 16 ) ^
             ( (uint32_t) FSb[ ( Y3 >> 24 ) & 0xFF ] << 24 );
 
     X1 = *RK++ ^ 
             ( (uint32_t) FSb[ ( Y1       ) & 0xFF ]       ) ^
             ( (uint32_t) FSb[ ( Y2 >>  8 ) & 0xFF ] <<  8 ) ^
             ( (uint32_t) FSb[ ( Y3 >> 16 ) & 0xFF ] << 16 ) ^
             ( (uint32_t) FSb[ ( Y0 >> 24 ) & 0xFF ] << 24 );
 
     X2 = *RK++ ^ 
             ( (uint32_t) FSb[ ( Y2       ) & 0xFF ]       ) ^
             ( (uint32_t) FSb[ ( Y3 >>  8 ) & 0xFF ] <<  8 ) ^
             ( (uint32_t) FSb[ ( Y0 >> 16 ) & 0xFF ] << 16 ) ^
             ( (uint32_t) FSb[ ( Y1 >> 24 ) & 0xFF ] << 24 );
 
     X3 = *RK++ ^ 
             ( (uint32_t) FSb[ ( Y3       ) & 0xFF ]       ) ^
             ( (uint32_t) FSb[ ( Y0 >>  8 ) & 0xFF ] <<  8 ) ^
             ( (uint32_t) FSb[ ( Y1 >> 16 ) & 0xFF ] << 16 ) ^
             ( (uint32_t) FSb[ ( Y2 >> 24 ) & 0xFF ] << 24 );
 
     PUT_UINT32_LE( X0, output,  0 );
     PUT_UINT32_LE( X1, output,  4 );
     PUT_UINT32_LE( X2, output,  8 );
     PUT_UINT32_LE( X3, output, 12 );
 }

4.3.4 Forage first - start code

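.text
.global _start
_start:

@ Exception vector table

b reset          @ 0x00 reset
nop              @ 0x04 undefined instruction
b swi_handler    @ 0x08 software interrupt (SWI)
nop              @ 0x0C prefetch abort
nop              @ 0x10 data abort
nop              @ 0x14 reserved
b irq_handler    @ 0x18 IRQ
nop              @ 0x1C FIQ
reset:
@ SVC mode (the mode after reset): set its stack
ldr sp,=buf+512*3
@ Switch to IRQ mode and set its stack
mrs r0,cpsr
bic r0,r0,#0x1f
orr r0,r0,#0x12
msr cpsr,r0
ldr sp,=buf+512*2
@ Switch to User mode and set its stack
mrs r0,cpsr
bic r0,r0,#0x1f
orr r0,r0,#0x10
msr cpsr,r0
ldr sp,=buf+512

mov r0,#0x11
mov r1,#0x22
SWI	1
add r2,r0,r1
nop
nop

stop:
	nop
	nop
	nop
	B stop

@ Software interrupt handler
swi_handler:
	@ Push registers to save the context
	stmfd sp!,{r0-r12,lr}
	mov r0,#0x1f
	mov r1,#0x2f
	mov r2,#0x3f
	mov r3,#0x4f
	mov r4,#0x5f
	@ Pop registers to restore the context; ^ also restores the mode (spsr -> cpsr)
	@ lr -> pc
	ldmfd sp!,{r0-r12,pc}^
	@mov pc,lr

@ IRQ handler
irq_handler:
	@ On IRQ entry lr holds the return address + 4, so adjust it first
	sub lr,lr,#4
	@ Push registers to save the context
	stmfd sp!,{r0-r12,lr}

	@ Interrupt handling
	@ switch(irqnum)

	@ Restore the context and return (spsr -> cpsr)
	ldmfd sp!,{r0-r12,pc}^

.data
buf:
	.space 512*3

.end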

4.3.5 Ready to Go - Link Script


SECTIONS
{
    . = 0x80000;
    .text.boot : { *(.text.boot) }
    .text : { *(.text) }
    .rodata : { *(.rodata) }
    .data : { *(.data) }

    . = ALIGN(0x8);
    bss_begin = .;
    .bss : { *(.bss*) }
    bss_end = .;

    . = ALIGN(4096);
    init_pg_dir = .;
    . += 4096;
}
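
The symbols bss_begin and bss_end defined by the script are typically consumed by the startup code, which must zero the .bss section before any C code relies on zero-initialized globals. The sketch below shows that step; the function name `zero_region` is hypothetical, and the call shown in the comment assumes the symbol names from the script above.

```c
#include <stddef.h>

/* Zero every byte in the half-open range [begin, end). */
void zero_region(char *begin, char *end)
{
    for (char *p = begin; p < end; ++p)
        *p = 0;
}

/* In the boot path, with the symbols exported by the linker script:
 *
 *     extern char bss_begin[], bss_end[];
 *     zero_region(bss_begin, bss_end);
 *
 * The symbols are declared as arrays so that their names evaluate to the
 * addresses assigned by the linker rather than to stored values.
 */
```

Because the script aligns bss_begin to 8 bytes, a real implementation could clear the region in 8-byte words; the byte loop is kept here for clarity.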

5. Summary


From a process point of view, this article gives a heuristic introduction to the Arm architecture along four main lines: architecture overview, architecture development, architecture features, and architecture drills. From the perspective of a pyramid-shaped knowledge structure, it touches on computer architecture, Arm system architecture, microarchitecture, the processor programming model, and application programming. From the perspective of systems thinking, it is likely to remain superficial for readers, so the following chapters will describe the Arm architecture in a more systematic and in-depth manner.

Reference


Terminology

Turing machine

An abstract machine; a mental model;

Bus

The shared communication trunk that transmits information between the functional components of a computer;

TrustZone

A technology that implements trusted domains through isolation;

Hypervisor

A virtual machine monitor is software, firmware, or hardware used to create and execute virtual machines.

Sharing knowledge is a virtue. If you think this article is well written, please like, bookmark, and share.

Next Chapter CPU Microarchitecture

Origin blog.csdn.net/BillyThe/article/details/128653472