A detailed introduction to ARM architecture versions and processor families

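Contents

1 The development of ARM

2 ARM versions

3 ARM processor families

3.1 ARM7 family

3.2 ARM9 family

3.3 ARM11 family

3.4 Cortex-R family

3.5 Cortex-M family

3.6 Cortex-A family

4 ARM core timeline

5 Third-party ARM designs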


1 The development of ARM

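ARM is short for Advanced RISC Machine. It was originally called Acorn RISC Machine. ARM is a reduced instruction set computing (RISC) processor architecture that was originally 32-bit; 64-bit support arrived with ARMv8. There are also derivative products based on ARM designs, most notably Marvell's XScale architecture and Texas Instruments' OMAP series. The ARM family accounts for about 75% of all 32-bit embedded processors. Because of its low power consumption, it is widely used in mobile communications, portable devices and similar fields.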

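In 1983 Acorn Computers Ltd began developing the ARM processor. A team led by Roger Wilson and Steve Furber set out to design a new architecture resembling an advanced MOS Technology 6502. Acorn had built a large range of computers on that processor. The team produced the ARM1 sample version in 1985, and the ARM2 went into volume production the following year. The ARM2 had a 32-bit data bus and a 26-bit address space, giving a 64 MB addressable range. It also had 16 32-bit registers.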
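The 64 MB figure follows directly from the 26-bit address width. A byte-addressed space with n address bits spans 2^n bytes. The minimal Python sketch below shows that arithmetic, plus the 4 GB space opened up by the 32-bit addressing that ARMv3 introduced (see the detailed table in section 2). The helper name `address_range_bytes` is ours and is only for illustration.

```python
def address_range_bytes(address_bits: int) -> int:
    """Size in bytes of a byte-addressed space with the given address width."""
    return 1 << address_bits

MIB = 1 << 20
GIB = 1 << 30

# ARM2 (ARMv2): 26-bit addresses -> 64 MiB
print(address_range_bytes(26) // MIB, "MiB")  # 64 MiB
# ARMv3 and later: 32-bit addresses -> 4 GiB
print(address_range_bytes(32) // GIB, "GiB")  # 4 GiB
```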

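In the late 1980s Apple Computer began working with Acorn on a new version of the ARM core. In 1990 the design team was spun out into a new company, Advanced RISC Machines Ltd. The first ARM6 samples appeared in 1991. Apple then used the ARM6-based ARM610 as the basis of its Apple Newton PDA. In 1994 Acorn used the ARM610 as the CPU in its Risc PC computers.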

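ARM is a well-known company in the microprocessor industry. It has designed a large number of high-performance, low-cost, low-power RISC processors, but it only designs chips and does not manufacture them. Its business model is to license its intellectual property cores (IP cores) to many of the world's leading semiconductor, software and OEM companies, and to provide technical services.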

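ARM versions fall into two categories: architecture (core) versions and processor versions. The architecture version is the ARM instruction-set architecture, such as ARMv1, ARMv2, ARMv3, ARMv4, ARMv5, ARMv6, ARMv7 and ARMv8. The processor version is a concrete ARM processor design, such as ARM1, ARM9, ARM11, ARM Cortex-A (A7, A9, A15), ARM Cortex-M (M1, M3, M4) and ARM Cortex-R. The processor version is what people usually mean when they talk about an "ARM version".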

2 ARM versions

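The simplified table below summarizes ARM versions.

Architecture version    Processor versions
ARMv1                   ARM1
ARMv2                   ARM2, ARM3
ARMv3                   ARM6, ARM7
ARMv4                   StrongARM, ARM7TDMI, ARM9TDMI
ARMv5                   ARM7EJ, ARM9E, ARM10E, XScale
ARMv6                   ARM11, ARM Cortex-M (M0, M0+, M1)
ARMv7                   ARM Cortex-A, ARM Cortex-M (M3, M4, M7), ARM Cortex-R
ARMv8                   ARM Cortex-A3x, Cortex-A5x and Cortex-A7x series (e.g. A32, A35, A53, A57, A72, A73); Cortex-M23, Cortex-M33; Cortex-R52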
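Because one architecture version is implemented by many processors, the simplified table above is naturally a one-to-many mapping. The sketch below encodes a subset of it as a Python dict, then inverts the dict so that a processor name can be looked up to find its architecture. The names `ARCH_TO_CORES` and `architecture_of` are illustrative. The entries are specific core names taken from this article's tables, not from an official ARM data source.

```python
# Subset of the architecture -> processor table above (illustrative, not exhaustive).
# Specific core names are used because a family name such as "ARM7" spans
# more than one architecture (ARM710 is ARMv3, ARM7TDMI is ARMv4).
ARCH_TO_CORES = {
    "ARMv1": ["ARM1"],
    "ARMv2": ["ARM2", "ARM3"],
    "ARMv3": ["ARM6", "ARM710"],
    "ARMv4": ["StrongARM", "ARM7TDMI", "ARM9TDMI"],
    "ARMv5": ["ARM7EJ-S", "ARM926EJ-S", "XScale"],
    "ARMv6": ["ARM1136J(F)-S", "ARM1176JZ(F)-S", "Cortex-M0"],
    "ARMv7": ["Cortex-A9", "Cortex-M3", "Cortex-R4"],
    "ARMv8": ["Cortex-A53", "Cortex-A57", "Cortex-A72"],
}

# Invert the one-to-many mapping: each core implements exactly one architecture.
CORE_TO_ARCH = {core: arch for arch, cores in ARCH_TO_CORES.items() for core in cores}

def architecture_of(core: str) -> str:
    """Return the architecture version for a core name, or 'unknown'."""
    return CORE_TO_ARCH.get(core, "unknown")

print(architecture_of("ARM7TDMI"))   # ARMv4
print(architecture_of("Cortex-M3"))  # ARMv7
print(len(ARCH_TO_CORES["ARMv8"]))   # 3
```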

The detailed version table is shown below (source: https://en.wikipedia.org/wiki/List_of_ARM_microarchitectures).

ARM family ARM architecture ARM core Feature Cache (I / D), MMU Typical MIPS @ MHz Reference
ARM1 ARMv1 ARM1 First implementation None    
ARM2 ARMv2 ARM2 ARMv2 added the MUL (multiply) instruction None 4 MIPS @ 8 MHz, 0.33 DMIPS/MHz
ARMv2a ARM250 Integrated MEMC (MMU), graphics and I/O processor. ARMv2a added the SWP and SWPB (swap) instructions None, MEMC1a 7 MIPS @ 12 MHz  
ARM3 ARMv2a ARM3 First integrated memory cache 4 KB unified 12 MIPS @ 25 MHz, 0.50 DMIPS/MHz
ARM6 ARMv3 ARM60 ARMv3 first to support 32-bit memory address space (previously 26-bit). ARMv3M first added long multiply instructions (32x32=64). None 10 MIPS @ 12 MHz
ARM600 As ARM60, cache and coprocessor bus (for FPA10 floating-point unit) 4 KB unified 28 MIPS @ 33 MHz  
ARM610 As ARM60, cache, no coprocessor bus 4 KB unified 17 MIPS @ 20 MHz, 0.65 DMIPS/MHz [4]
ARM7 ARMv3 ARM700   8 KB unified 40 MHz  
ARM710 As ARM700, no coprocessor bus 8 KB unified 40 MHz [5]
ARM710a As ARM710 8 KB unified 40 MHz, 0.68 DMIPS/MHz
ARM7T ARMv4T ARM7TDMI(-S) 3-stage pipeline, Thumb, ARMv4 first to drop legacy ARM 26-bit addressing None 15 MIPS @ 16.8 MHz, 63 DMIPS @ 70 MHz
ARM710T As ARM7TDMI, cache 8 KB unified, MMU 36 MIPS @ 40 MHz  
ARM720T As ARM7TDMI, cache 8 KB unified, MMU with FCSE (Fast Context Switch Extension) 60 MIPS @ 59.8 MHz  
ARM740T As ARM7TDMI, cache MPU    
ARM7EJ ARMv5TEJ ARM7EJ-S 5-stage pipeline, Thumb, Jazelle DBX, enhanced DSP instructions None    
ARM8 ARMv4 ARM810 5-stage pipeline, static branch prediction, double-bandwidth memory 8 KB unified, MMU 84 MIPS @ 72 MHz, 1.16 DMIPS/MHz [6][7]
ARM9T ARMv4T ARM9TDMI 5-stage pipeline, Thumb None    
ARM920T As ARM9TDMI, cache 16 KB / 16 KB, MMU with FCSE (Fast Context Switch Extension) 200 MIPS @ 180 MHz [8]
ARM922T As ARM9TDMI, caches 8 KB / 8 KB, MMU    
ARM940T As ARM9TDMI, caches 4 KB / 4 KB, MPU    
ARM9E ARMv5TE ARM946E-S Thumb, enhanced DSP instructions, caches Variable, tightly coupled memories, MPU    
ARM966E-S Thumb, enhanced DSP instructions No cache, TCMs    
ARM968E-S As ARM966E-S No cache, TCMs    
ARMv5TEJ ARM926EJ-S Thumb, Jazelle DBX, enhanced DSP instructions Variable, TCMs, MMU 220 MIPS @ 200 MHz  
ARMv5TE ARM996HS Clockless processor, as ARM966E-S No caches, TCMs, MPU    
ARM10E ARMv5TE ARM1020E 6-stage pipeline, Thumb, enhanced DSP instructions, (VFP) 32 KB / 32 KB, MMU    
ARM1022E As ARM1020E 16 KB / 16 KB, MMU    
ARMv5TEJ ARM1026EJ-S Thumb, Jazelle DBX, enhanced DSP instructions, (VFP) Variable, MMU or MPU    
ARM11 ARMv6 ARM1136J(F)-S 8-stage pipeline, SIMD, Thumb, Jazelle DBX, (VFP), enhanced DSP instructions, unaligned memory access Variable, MMU 740 @ 532–665 MHz (i.MX31 SoC), 400–528 MHz [9]
ARMv6T2 ARM1156T2(F)-S 9-stage pipeline, SIMD, Thumb-2, (VFP), enhanced DSP instructions Variable, MPU   [10]
ARMv6Z ARM1176JZ(F)-S As ARM1136EJ(F)-S Variable, MMU + TrustZone 965 DMIPS @ 772 MHz, up to 2,600 DMIPS with four processors [11]
ARMv6K ARM11MPCore As ARM1136EJ(F)-S, 1–4 core SMP Variable, MMU    
SecurCore ARMv6-M SC000     0.9 DMIPS/MHz  
ARMv4T SC100        
ARMv7-M SC300     1.25 DMIPS/MHz  
Cortex-M ARMv6-M Cortex-M0[12] Microcontroller profile, most Thumb + some Thumb-2,[13] hardware multiply instruction (optional small), optional system timer, optional bit-banding memory Optional cache, no TCM, no MPU 0.84 DMIPS/MHz  
Cortex-M0+[14] Microcontroller profile, most Thumb + some Thumb-2,[13] hardware multiply instruction (optional small), optional system timer, optional bit-banding memory Optional cache, no TCM, optional MPU with 8 regions 0.93 DMIPS/MHz  
Cortex-M1[15] Microcontroller profile, most Thumb + some Thumb-2,[13] hardware multiply instruction (optional small), OS option adds SVC / banked stack pointer, optional system timer, no bit-banding memory Optional cache, 0–1024 KB I-TCM, 0–1024 KB D-TCM, no MPU 136 DMIPS @ 170 MHz,[16] (0.8 DMIPS/MHz FPGA-dependent)[17]  
ARMv7-M Cortex-M3[18] Microcontroller profile, Thumb / Thumb-2, hardware multiply and divide instructions, optional bit-banding memory Optional cache, no TCM, optional MPU with 8 regions 1.25 DMIPS/MHz  
ARMv7E-M Cortex-M4[19] Microcontroller profile, Thumb / Thumb-2 / DSP / optional VFPv4-SP single-precision FPU, hardware multiply and divide instructions, optional bit-banding memory Optional cache, no TCM, optional MPU with 8 regions 1.25 DMIPS/MHz (1.27 w/FPU)  
Cortex-M7[20] Microcontroller profile, Thumb / Thumb-2 / DSP / optional VFPv5 single and double precision FPU, hardware multiply and divide instructions 0−64 KB I-cache, 0−64 KB D-cache, 0–16 MB I-TCM, 0–16 MB D-TCM (all these w/optional ECC), optional MPU with 8 or 16 regions 2.14 DMIPS/MHz  
ARMv8-M Cortex-M23[21] Microcontroller profile, Thumb-1 (most), Thumb-2 (some), Divide, TrustZone Optional cache, no TCM, optional MPU with 16 regions 0.99 DMIPS/MHz  
Cortex-M33[22] Microcontroller profile, Thumb-1, Thumb-2, Saturated, DSP, Divide, FPU (SP), TrustZone, Co-processor Optional cache, no TCM, optional MPU with 16 regions 1.50 DMIPS/MHz  
Cortex-M35P[23] Microcontroller profile, Thumb-1, Thumb-2, Saturated, DSP, Divide, FPU (SP), TrustZone, Co-processor Built-in cache (with option 2–16 KB), I-cache, no TCM, optional MPU with 16 regions 1.50 DMIPS/MHz  
Cortex-R ARMv7-R Cortex-R4[24] Real-time profile, Thumb / Thumb-2 / DSP / optional VFPv3 FPU, hardware multiply and optional divide instructions, optional parity & ECC for internal buses / cache / TCM, 8-stage pipeline dual-core running lockstep with fault logic 0–64 KB / 0–64 KB, 0–2 of 0–8 MB TCM, opt. MPU with 8/12 regions 1.67 DMIPS/MHz[25]  
Cortex-R5[26] Real-time profile, Thumb / Thumb-2 / DSP / optional VFPv3 FPU and precision, hardware multiply and optional divide instructions, optional parity & ECC for internal buses / cache / TCM, 8-stage pipeline dual-core running lock-step with fault logic / optional as 2 independent cores, low-latency peripheral port (LLPP), accelerator coherency port (ACP)[27] 0–64 KB / 0–64 KB, 0–2 of 0–8 MB TCM, opt. MPU with 12/16 regions 1.67 DMIPS/MHz[25]  
Cortex-R7[28] Real-time profile, Thumb / Thumb-2 / DSP / optional VFPv3 FPU and precision, hardware multiply and optional divide instructions, optional parity & ECC for internal buses / cache / TCM, 11-stage pipeline dual-core running lock-step with fault logic / out-of-order execution / dynamic register renaming / optional as 2 independent cores, low-latency peripheral port (LLPP), ACP[27] 0–64 KB / 0–64 KB, ? of 0–128 KB TCM, opt. MPU with 16 regions 2.50 DMIPS/MHz[25]  
Cortex-R8[29] TBD TBD 2.50 DMIPS/MHz[25]  
ARMv8-R Cortex-R52[30] TBD TBD 2.16 DMIPS/MHz[31]  
Cortex-A (32-bit) ARMv7-A Cortex-A5[32] Application profile, ARM / Thumb / Thumb-2 / DSP / SIMD / Optional VFPv4-D16 FPU / Optional NEON / Jazelle RCT and DBX, 1–4 cores / optional MPCore, snoop control unit (SCU), generic interrupt controller (GIC), accelerator coherence port (ACP) 4−64 KB / 4−64 KB L1, MMU + TrustZone 1.57 DMIPS/MHz per core
Cortex-A7[33] Application profile, ARM / Thumb / Thumb-2 / DSP / VFPv4 FPU / NEON / Jazelle RCT and DBX / Hardware virtualization, in-order execution, superscalar, 1–4 SMP cores, MPCore, Large Physical Address Extensions (LPAE), snoop control unit (SCU), generic interrupt controller (GIC), architecture and feature set are identical to A15, 8–10 stage pipeline, low-power design[34] 8−64 KB / 8−64 KB L1, 0–1 MB L2, MMU + TrustZone 1.9 DMIPS/MHz per core  
Cortex-A8[35] Application profile, ARM / Thumb / Thumb-2 / VFPv3 FPU / NEON / Jazelle RCT and DAC, 13-stage superscalar pipeline 16–32 KB / 16–32 KB L1, 0–1 MB L2 opt. ECC, MMU + TrustZone Up to 2000 (2.0 DMIPS/MHz in speed from 600 MHz to greater than 1 GHz)  
Cortex-A9[36] Application profile, ARM / Thumb / Thumb-2 / DSP / Optional VFPv3 FPU / Optional NEON / Jazelle RCT and DBX, out-of-order speculative issue superscalar, 1–4 SMP cores, MPCore, snoop control unit (SCU), generic interrupt controller (GIC), accelerator coherence port (ACP) 16–64 KB / 16–64 KB L1, 0–8 MB L2 opt. parity, MMU + TrustZone 2.5 DMIPS/MHz per core, 10,000 DMIPS @ 2 GHz on Performance Optimized TSMC 40G (dual-core)  
Cortex-A12[37] Application profile, ARM / Thumb-2 / DSP / VFPv4 FPU / NEON / Hardware virtualization, out-of-order speculative issue superscalar, 1–4 SMP cores, Large Physical Address Extensions (LPAE), snoop control unit (SCU), generic interrupt controller (GIC), accelerator coherence port (ACP) 32−64 KB 3.0 DMIPS/MHz per core  
Cortex-A15[38] Application profile, ARM / Thumb / Thumb-2 / DSP / VFPv4 FPU / NEON / integer divide / fused MAC / Jazelle RCT / hardware virtualization, out-of-order speculative issue superscalar, 1–4 SMP cores, MPCore, Large Physical Address Extensions (LPAE), snoop control unit (SCU), generic interrupt controller (GIC), ACP, 15-24 stage pipeline[34] 32 KB w/parity / 32 KB w/ECC L1, 0–4 MB L2, L2 has ECC, MMU + TrustZone At least 3.5 DMIPS/MHz per core (up to 4.01 DMIPS/MHz depending on implementation)[39]  
Cortex-A17[40] Application profile, ARM / Thumb / Thumb-2 / DSP / VFPv4 FPU / NEON / integer divide / fused MAC / Jazelle RCT / hardware virtualization, out-of-order speculative issue superscalar, 1–4 SMP cores, MPCore, Large Physical Address Extensions (LPAE), snoop control unit (SCU), generic interrupt controller (GIC), ACP 32 KB L1, 256 KB–8 MB L2 w/optional ECC 2.8 DMIPS/MHz  
ARMv8-A Cortex-A32[41] Application profile, AArch32, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, dual issue, in-order pipeline 8–64 KB w/optional parity / 8−64 KB w/optional ECC L1 per core, 128 KB–1 MB L2 w/optional ECC shared    
Cortex-A (64-bit) ARMv8-A ARM Cortex-A34[42] Application profile, AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-width decode, in-order pipeline 8−64 KB w/parity / 8−64 KB w/ECC L1 per core, 128 KB–1 MB L2 shared, 40-bit physical addresses
Cortex-A35[43] Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-width decode, in-order pipeline 8−64 KB w/parity / 8−64 KB w/ECC L1 per core, 128 KB–1 MB L2 shared, 40-bit physical addresses 1.78 DMIPS/MHz  
Cortex-A53[44] Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-width decode, in-order pipeline 8−64 KB w/parity / 8−64 KB w/ECC L1 per core, 128 KB–2 MB L2 shared, 40-bit physical addresses 2.3 DMIPS/MHz  
Cortex-A57[45] Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 3-width decode superscalar, deeply out-of-order pipeline 48 KB w/DED parity / 32 KB w/ECC L1 per core; 512 KB–2 MB L2 shared w/ECC; 44-bit physical addresses 4.1–4.5 DMIPS/MHz[46][47]  
Cortex-A72[48] Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 3-width superscalar, deeply out-of-order pipeline 48 KB w/DED parity / 32 KB w/ECC L1 per core; 512 KB–2 MB L2 shared w/ECC; 44-bit physical addresses 4.7 DMIPS/MHz  
Cortex-A73[49] Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-width superscalar, deeply out-of-order pipeline 64 KB / 32−64 KB L1 per core, 256 KB–8 MB L2 shared w/ optional ECC, 44-bit physical addresses 4.8 DMIPS/MHz[50]  
ARMv8.2-A Cortex-A55[51] Application profile, AArch32 and AArch64, 1–8 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-width decode, in-order pipeline[52] 16−64 KB / 16−64 KB L1, 256 KB L2 per core, 4 MB L3 shared    
Arm Cortex-A65AE[53] Application profile, AArch64, 1–8 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-wide decode superscalar, 3-width issue, out-of-order pipeline, SMT 64 / 64 KB L1, 256 KB L2 per core, 4 MB L3 shared    
Cortex-A75[54] Application profile, AArch32 and AArch64, 1–8 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 3-width decode superscalar, deeply out-of-order pipeline[55] 64 / 64 KB L1, 512 KB L2 per core, 4 MB L3 shared    
Cortex-A76[56] Application profile, AArch32 (non-privileged level or EL0 only) and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 4-width decode superscalar, 8-way issue, 13 stage pipeline, deeply out-of-order pipeline[57] 64 / 64 KB L1, 256−512 KB L2 per core, 512 KB−4 MB L3 shared    
Cortex-A77[58] Application profile, AArch32 (non-privileged level or EL0 only) and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 4-width decode superscalar, 6-width instruction fetch, 12-way issue, 13 stage pipeline, deeply out-of-order pipeline[57] 1.5K L0 MOPs cache, 64 / 64 KB L1, 256−512 KB L2 per core, 512 KB−4 MB L3 shared    
Neoverse Neoverse N1[59] Application profile, AArch32 (non-privileged level or EL0 only) and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 4-width decode superscalar, 8-way dispatch/issue, 13 stage pipeline, deeply out-of-order pipeline[57] 64 / 64 KB L1, 512−1024 KB L2 per core, 2−128 MB L3 shared, 128 MB system level cache    
Neoverse E1 Application profile, AArch64, 1–8 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-wide decode superscalar, 3-width issue, 10 stage pipeline, out-of-order pipeline, SMT 32−64 KB / 32−64 KB L1, 256 KB L2 per core, 4 MB L3 shared    
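The Typical MIPS column mixes absolute figures with per-clock DMIPS/MHz ratings. To compare cores, multiply the per-MHz rating by the clock frequency in MHz, and for multicore figures also by the core count. The Python sketch below checks two rows of the table: the Cortex-A9 row (2.5 DMIPS/MHz per core, 10,000 DMIPS at 2 GHz dual-core) and the ARM1176JZ(F)-S row (965 DMIPS at 772 MHz). The helper name `total_dmips` is hypothetical. It assumes perfect linear scaling across cores, which real workloads rarely achieve.

```python
def total_dmips(dmips_per_mhz: float, clock_mhz: float, cores: int = 1) -> float:
    """Aggregate Dhrystone MIPS, assuming ideal linear scaling across cores."""
    return dmips_per_mhz * clock_mhz * cores

# Cortex-A9 row: 2.5 DMIPS/MHz per core, dual-core at 2 GHz (2000 MHz)
print(total_dmips(2.5, 2000, cores=2))  # 10000.0

# The other direction: ARM1176JZ(F)-S row, 965 DMIPS @ 772 MHz
print(round(965 / 772, 2))  # 1.25
```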

3 ARM processor families

3.1 ARM7 family

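This family targets simple 32-bit devices. ARM7 is one of the older families, and ARM7 processors are no longer recommended for new designs. It mainly includes the ARM7TDMI-S (ARMv4T architecture) and the ARM7EJ-S (ARMv5TEJ architecture).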
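In architecture names such as ARMv4T or ARMv5TEJ, the letters after the version number name optional extensions. The feature column of the detailed table in section 2 pairs T with Thumb, E with the enhanced DSP instructions and J with Jazelle DBX. The minimal Python sketch below splits such a name into a version and a list of extensions. It assumes that only those three single-letter suffixes occur. Multi-letter suffixes such as the T2 in ARMv6T2, and profile suffixes such as -M, are deliberately rejected. The function name `parse_arch` is hypothetical.

```python
import re

# Extension letters as described in the feature column of the detailed table.
EXTENSIONS = {
    "T": "Thumb",
    "E": "enhanced DSP instructions",
    "J": "Jazelle DBX",
}

def parse_arch(name: str) -> tuple[int, list[str]]:
    """Split e.g. 'ARMv5TEJ' into (5, ['Thumb', 'enhanced DSP instructions', 'Jazelle DBX'])."""
    match = re.fullmatch(r"ARMv(\d+)([TEJ]*)", name)
    if match is None:
        # Names such as 'ARMv6T2' or 'ARMv7-M' are outside this sketch's scope.
        raise ValueError(f"unsupported architecture name: {name!r}")
    version = int(match.group(1))
    features = [EXTENSIONS[letter] for letter in match.group(2)]
    return version, features

print(parse_arch("ARMv4T"))    # (4, ['Thumb'])
print(parse_arch("ARMv5TEJ"))  # (5, ['Thumb', 'enhanced DSP instructions', 'Jazelle DBX'])
```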

3.2 ARM9 family

This family mainly targets embedded real-time applications. It mainly includes the ARM926EJ-S, ARM946E-S and ARM968E-S.

3.3 ARM11 family

This family is mainly used in high-reliability and real-time embedded applications. It mainly includes the ARM11MPCore, ARM1176, ARM1156 and ARM1136.

3.4 Cortex-R family

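The R in Cortex-R stands for Real-time. The family targets real-time task processing. Its main application areas include automotive, cameras, industrial and medical equipment.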

The family mainly includes the Cortex-R4, Cortex-R5, Cortex-R7, Cortex-R8 and Cortex-R52.

3.5 Cortex-M family

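The M in Cortex-M stands for Microcontroller. The family targets the most energy-efficient embedded devices. Its main application areas include automotive, smart grids, medical, embedded systems, smart cards, smart devices, sensor fusion and wearables.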

The family mainly includes the Cortex-M0, Cortex-M0+, Cortex-M3, Cortex-M4, Cortex-M7, Cortex-M23, Cortex-M33 and Cortex-M35P.
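Firmware can tell which of these cores it is running on by reading the CPUID register in the System Control Block, which sits at address 0xE000ED00 on Cortex-M devices. The Python sketch below only decodes a 32-bit CPUID value; reading the register itself requires code running on the target. The bit-field layout is: implementer [31:24], variant [23:20], architecture [19:16], part number [15:4] and revision [3:0]. The part-number table is partial and should be checked against each core's Technical Reference Manual. The function name `decode_cpuid` and the example value are illustrative.

```python
# Partial map of Cortex-M part numbers (CPUID bits [15:4]); verify against the TRMs.
CORTEX_M_PARTS = {
    0xC20: "Cortex-M0",
    0xC60: "Cortex-M0+",
    0xC23: "Cortex-M3",
    0xC24: "Cortex-M4",
    0xC27: "Cortex-M7",
    0xD20: "Cortex-M23",
    0xD21: "Cortex-M33",
}

def decode_cpuid(cpuid: int) -> dict:
    """Split a Cortex-M CPUID register value into its bit fields."""
    implementer = (cpuid >> 24) & 0xFF   # 0x41 = 'A' (Arm)
    variant = (cpuid >> 20) & 0xF        # major revision: the N in rNpM
    architecture = (cpuid >> 16) & 0xF   # 0xC for ARMv6-M, 0xF for ARMv7-M
    part_no = (cpuid >> 4) & 0xFFF
    revision = cpuid & 0xF               # minor revision: the M in rNpM
    return {
        "implementer": implementer,
        "core": CORTEX_M_PARTS.get(part_no, f"unknown part 0x{part_no:03X}"),
        "architecture": architecture,
        "revision": f"r{variant}p{revision}",
    }

# Example value constructed for a Cortex-M4 r0p1 core
print(decode_cpuid(0x410FC241))
# {'implementer': 65, 'core': 'Cortex-M4', 'architecture': 15, 'revision': 'r0p1'}
```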

3.6 Cortex-A family

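The A in Cortex-A stands for Application. The family aims to deliver the highest performance within the best possible power budget. Its main application areas include automotive, industrial, medical, modems and storage. Cortex-A is also one of the most widely used processor families today.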

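The family mainly includes the Cortex-A5, Cortex-A7, Cortex-A8, Cortex-A9, Cortex-A15, Cortex-A17, Cortex-A32, Cortex-A35, Cortex-A53, Cortex-A57, Cortex-A72 and Cortex-A73. The Cortex-A5, Cortex-A7, Cortex-A8, Cortex-A9, Cortex-A15 and Cortex-A17 are based on the ARMv7-A architecture. The Cortex-A32, Cortex-A35, Cortex-A53, Cortex-A57, Cortex-A72 and Cortex-A73 are based on ARMv8-A. All of the ARMv8-A cores support 64-bit execution except the Cortex-A32, which is 32-bit only. The Cortex-A8 supports only single-core configurations.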
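On a Linux system you can find out which of these cores a machine has by reading `/proc/cpuinfo`. On ARM kernels, that file reports the part-number field of the Main ID Register on a `CPU part` line (for example `0xd03`). The minimal Python sketch below parses that text and maps the part numbers of some of the cores listed above. The table is partial, and the sample text is made up for illustration. Check the part numbers against ARM's Technical Reference Manuals before relying on them. The names `CORTEX_A_PARTS` and `cores_from_cpuinfo` are hypothetical.

```python
# Partial map of Main ID Register part numbers for Cortex-A cores named above.
CORTEX_A_PARTS = {
    0xC05: "Cortex-A5", 0xC07: "Cortex-A7", 0xC08: "Cortex-A8",
    0xC09: "Cortex-A9", 0xC0F: "Cortex-A15", 0xC0E: "Cortex-A17",
    0xD01: "Cortex-A32", 0xD04: "Cortex-A35", 0xD03: "Cortex-A53",
    0xD07: "Cortex-A57", 0xD08: "Cortex-A72", 0xD09: "Cortex-A73",
}

def cores_from_cpuinfo(text: str) -> list[str]:
    """Return a core name for every 'CPU part' line in /proc/cpuinfo-style text."""
    cores = []
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "CPU part":
            part = int(value.strip(), 16)  # int() accepts the '0x' prefix with base 16
            cores.append(CORTEX_A_PARTS.get(part, f"unknown part {value.strip()}"))
    return cores

# Made-up excerpt in the /proc/cpuinfo format: one LITTLE core and one big core
sample = """processor\t: 0
CPU part\t: 0xd03
processor\t: 4
CPU part\t: 0xd09
"""
print(cores_from_cpuinfo(sample))  # ['Cortex-A53', 'Cortex-A73']
```

On a real device you would pass the file contents instead, e.g. `cores_from_cpuinfo(open("/proc/cpuinfo").read())`. A big.LITTLE system then reports more than one core type.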

Ranked roughly from highest to lowest performance, the Cortex-A processors are: Cortex-A73, Cortex-A72, Cortex-A57, Cortex-A53, Cortex-A35, Cortex-A32, Cortex-A17, Cortex-A15, Cortex-A7, Cortex-A9, Cortex-A8, Cortex-A5.
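The ranking above is coarse because it mixes generations, clock speeds and core counts. Sorting only by the per-core DMIPS/MHz figures in the detailed table of section 2 gives a somewhat different order. For example, the Cortex-A15 then ranks ahead of the Cortex-A17, and the Cortex-A9 ranks ahead of the Cortex-A53 and Cortex-A7. Below is a small Python sketch of that sort. Where the table gives a range or a lower bound (Cortex-A57, Cortex-A15), the lower value is used. The Cortex-A32, which has no figure in that table, is left out.

```python
# Per-core DMIPS/MHz from the detailed table in section 2 (lower bound where a range is given).
DMIPS_PER_MHZ = {
    "Cortex-A73": 4.8, "Cortex-A72": 4.7, "Cortex-A57": 4.1,
    "Cortex-A53": 2.3, "Cortex-A35": 1.78, "Cortex-A17": 2.8,
    "Cortex-A15": 3.5, "Cortex-A9": 2.5, "Cortex-A8": 2.0,
    "Cortex-A7": 1.9, "Cortex-A5": 1.57,
}

# Highest per-clock rating first; clock speed and core count are ignored.
ranking = sorted(DMIPS_PER_MHZ, key=DMIPS_PER_MHZ.get, reverse=True)
print(" > ".join(ranking))
```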

Company Core Released Revision Decode Pipeline depth Out-of-order execution Branch prediction big.LITTLE role Exec. ports Fab (in nm) Simult. MT L0 cache L1 cache Instr + Data (in KiB) L2 cache L3 cache Core configurations DMIPS/MHz
ARM Holdings Cortex-A32 (32-bit) 2017 ARMv8.0-A (only 32-bit) 2-wide 8 No LITTLE ? 28 No No 8–64 + 8–64 0–1 MiB No 1-4+
Cortex-A34 (64-bit) 2019 ARMv8.0-A (only 64-bit) 2-wide 8 No LITTLE ? No No 8–64 + 8–64 0–1 MiB No 1-4+
Cortex-A35 2017 ARMv8.0-A 2-wide 8 No Yes LITTLE ? 28 / 16 / 14 / 10 No No 8–64 + 8–64 0 / 128 KiB–1 MiB No 1–4+ 1.78
Cortex-A53 2014 ARMv8.0-A 2-wide 8 No Conditional + indirect branch prediction big/LITTLE 2 28 / 20 / 16 / 14 / 10 No No 8–64 + 8–64 128 KiB–2 MiB No 1–4+ 2.24
Cortex-A55 2017 ARMv8.2-A 2-wide 8 No big/LITTLE 2 28 / 20 / 16 / 14 / 10 No No 16–64 + 16–64 0–256 KiB/core 0–4 MiB 1–8+ 2.65[8]
Cortex-A57 2013 ARMv8.0-A 3-wide 15 Yes (3-wide dispatch) Two-level big 8 28 / 20 / 16[10] / 14 No No 48 + 32 0.5–2 MiB No 1–4+ 4.6
Cortex-A65AE 2019 ARMv8.2-A ? ? Yes Two-level ? 2 ? SMT2 No 16-64 + 16-64 64-256 KiB 0-4 MB 1–8 ?
Cortex-A72 2015 ARMv8.0-A 3-wide 15 Yes (5-wide dispatch) Two-level big 8 28 / 16 No No 48 + 32 0.5–4 MiB No 1–4+ 4.72
Cortex-A73 2016 ARMv8.0-A 2-wide 11–12 Yes (4-wide dispatch) Two-level big 7 28 / 16 / 10 No No 64 + 32/64 1–8 MiB No 1–4+ ~6.35
Cortex-A75 2017 ARMv8.2-A 3-wide 11–13 Yes (6-wide dispatch) Two-level big 8? 28 / 16 / 10 No No 64 + 64 256–512 KiB/core 0–4 MiB 1–8+ ?
Cortex-A76 2018 ARMv8.2-A 4-wide 11–13 Yes (8-wide dispatch) Two-level big 8 10 / 7 No No 64 + 64 256–512 KiB/core 1–4 MiB 1–4 ?
Cortex-A77 2019 ARMv8.2-A 4-wide 11–13 Yes (10-wide dispatch) Two-level big 12 7 No 1.5K entries 64 + 64 256–512 KiB/core 1–4 MiB 1-4 ?
Apple Inc. Cyclone 2013 ARMv8.0-A 6-wide 16 Yes Yes No 9 28 No No 64 + 64 1 MiB 4 MiB 2 ?
Typhoon 2014 ARMv8.0‑A 6-wide 16 Yes Yes No 9 20 No No 64 + 64 1 MiB 4 MiB 2, 3 (A8X) ?
Twister 2015 ARMv8.0‑A 6-wide 16[20] Yes Yes No 9 16 / 14 No No 64 + 64 3 MiB 4 MiB / No (A9X) 2 ?
Hurricane 2016 ARMv8.1‑A 6-wide 16 Yes Yes "big" (in A10/A10X paired with "LITTLE" Zephyr cores) 9 16 (A10) / 10 (A10X) No No 64 + 64 3 MiB (A10) / 8 MiB (A10X) 4 MiB (A10) / No (A10X) 2x Hurricane + 2x Zephyr (A10) / 3x Hurricane + 3x Zephyr (A10X) ?
Zephyr 2016 ARMv8.1‑A 3-wide 12 Yes Yes LITTLE 5 16 (A10) / 10 (A10X) No No 32 + 32 1 MiB 4 MiB[22] (A10) / No (A10X) 2x Hurricane + 2x Zephyr (A10) / 3x Hurricane + 3x Zephyr (A10X) ?
Monsoon 2017 ARMv8.2‑A 7-wide 16 Yes Yes "big" (in Apple A11 paired with "LITTLE" Mistral cores) 13 10 No No 64 + 64 8 MiB No 2x Monsoon + 4x Mistral ?
Mistral 2017 ARMv8.2‑A 3-wide 12 Yes Yes LITTLE 5 10 No No 32 + 32 1 MiB No 2x Monsoon + 4x Mistral ?
Vortex 2018 ARMv8.3‑A 7-wide 16 Yes Yes "big" (in Apple A12/Apple A12X/Apple A12Z paired with "LITTLE" Tempest cores) 13 7 No No 128 + 128 8 MiB No 2x Vortex + 4x Tempest (A12) / 4x Vortex + 4x Tempest (A12X/A12Z) ?
Tempest 2018 ARMv8.3‑A 3-wide 12 Yes Yes LITTLE 5 7 No No 32 + 32 2 MiB No 2x Vortex + 4x Tempest (A12) / 4x Vortex + 4x Tempest (A12X/A12Z) ?
Lightning 2019 ARMv8.4‑A 7-wide 16 Yes Yes "big" (in Apple A13 paired with "LITTLE" Thunder cores) 13 7 No No 128 + 128 8 MiB No 2x Lightning + 4x Thunder ?
Thunder 2019 ARMv8.4‑A  3-wide 12 Yes Yes LITTLE 5 7 No No 32 + 48 4 MiB No 2x Lightning + 4x Thunder ?
Nvidia Denver 2014 ARMv8‑A 2-wide hardware decoder, up to 7-wide variable-length VLIW micro-ops 13 Not if the hardware decoder is in use; can be provided by dynamic software translation into VLIW. Direct + indirect branch prediction No 7 28 No No 128 + 64 2 MiB No 2 ?
Denver 2 2016 ARMv8‑A ? 13 Not if the hardware decoder is in use; can be provided by dynamic software translation into VLIW. Direct + indirect branch prediction "Super" (Nvidia's own implementation) ? 16 No No 128 + 64 2 MiB No 2 ?
Carmel 2018 ARMv8.2‑A ? Direct + indirect branch prediction ? 12 No No 128 + 64 2 MiB (4 MiB @ 8 cores) 2 (+ 8) ?
Cavium ThunderX   ARMv8-A 2-wide ? No Two-level   ? 28 No No 78 + 32 16 MiB No 8–16, 24–48 ?
ThunderX2 (ex. Broadcom Vulcan) May 2018 ARMv8.1-A 4-wide ("4 μops") ? Yes Multi-level ? ? 16 SMT4 No 32 + 32 (data 8-way) 256 KB per core 1 MB per core 16-32 ?
Applied Micro Helix ? ? ? ? ? ? ? ? 40 / 28 No No 32 + 32 (per core; write-through w/parity) 256 KiB shared per core pair (with ECC) 1 MiB/core 2, 4, 8 ?
X-Gene   ? 4-wide 15 Yes ? ? ? 40 No No 8 MiB 8 4.2
X-Gene 2   ? 4-wide 15 Yes ? ? ? 28 No No 8 MiB 8 4.2
X-Gene 3   ? ? ? ? ? ? ? 16 No No ? ? 32 MiB 32 ?
Qualcomm Kryo 2016 ARMv8-A ? ? Yes Two-level? "big" or "LITTLE" (Qualcomm's own similar implementation) ? 14 No No 32+24 0.5–1 MiB 2, 4 6.3
Kryo 2XX 2017 ARMv8-A 2-wide 11–12 Yes (7-wide dispatch) Two-level big 7 14 / 11 / 10 [51] No No 64 + 32/64? 512 KiB/Gold Core No 4 ?
(Silver cores) 2-wide 8 No Conditional + indirect branch prediction ? 2 No No 8–64? + 8–64? 256 KiB/Silver Core 4 ?
Kryo 3XX 2018 ARMv8.2-A 3-wide 11–13 Yes (8-wide dispatch) Two-level big 8 10[51] No No 64+64[51] 256 KiB/Gold Core 2 MiB 4 ?
(Silver cores) 2-wide 8 No Conditional + indirect branch prediction ? 28 No No 16–64? + 16–64? 128 KiB/Silver 4 ?
Kryo 4XX 2019 ARMv8.2-A 4-wide 11–13 Yes (8-wide dispatch) Yes big 8 11 / 8 / 7 No No 64 + 64 512 KiB/Gold Prime, 256 KiB/Gold 2 MiB 1+3 ?
(Silver cores) 2-wide 8 No Conditional + indirect branch prediction ? 2 No No 16–64? + 16–64? 128 KiB/Silver 4 ?
Falkor 11-8-2017 "ARMv8.1-A features"; AArch64 only (not 32-bit) 4-wide 10–15 Yes (8-wide dispatch) Yes ? 8 10 No 24 KiB 88[53] + 32 500KiB 1.25MiB 40-48 ?
Samsung M1/M2 2015 ARMv8-A 4-wide 13 Yes (9-wide dispatch) Two-level big 8 14 / 10 No No 64 + 32 2 MiB[59] No 4 ?
M3 2018 ARMv8.2-A 6-wide 15 Yes (12-wide dispatch) Two-level big 12 10 No No 64 + 64 512 KiB per core 4096 KB 4 ?
M4 2019 ARMv8.2-A 6-wide 15 Yes (12-wide dispatch) Two-level big 12 8 / 7 No No 64 + 64 512 KiB per core 4096 KB 2 ?
Fujitsu A64fx 2019 ARMv8.2-A 4/2-wide 7+ Yes (5-way?) Yes n/a 8+ 7 No No 64 + 64 8 MiB per 12+1 cores No 48+4 1.9 GHz+; 15 GF/W+.
HiSilicon TaiShan V110 2019 ARMv8.2-A 4-wide ? Yes Yes n/a 8 7 No No 64 + 64 512 KiB per core 1 MiB per core ? ?

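Many Chinese domestic CPUs, including Huawei's Kirin smartphone chips and other HiSilicon chips, are based on the ARMv8 architecture and use Cortex-A series cores. ARM has close to complete coverage of mobile and portable devices.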

4 ARM core timeline

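Cores by year of release (classic cores: ARM7, ARM8, ARM9, ARM10, ARM11; Cortex cores: microcontroller, real-time, application 32-bit, application 64-bit; Neoverse cores: application 64-bit):

1993  ARM7: ARM700
1994  ARM7: ARM710, ARM7DI, ARM7TDMI
1995  ARM7: ARM710a
1996  ARM8: ARM810
1997  ARM7: ARM710T, ARM720T, ARM740T
1998  ARM9: ARM9TDMI, ARM940T
1999  ARM9: ARM9E-S, ARM966E-S
2000  ARM9: ARM920T, ARM922T, ARM946E-S; ARM10: ARM1020T
2001  ARM7: ARM7TDMI-S, ARM7EJ-S; ARM9: ARM9EJ-S, ARM926EJ-S; ARM10: ARM1020E, ARM1022E
2002  ARM10: ARM1026EJ-S; ARM11: ARM1136J(F)-S
2003  ARM9: ARM968E-S; ARM11: ARM1156T2(F)-S, ARM1176JZ(F)-S
2004  Cortex microcontroller: Cortex-M3
2005  ARM11: ARM11MPCore; Cortex application (32-bit): Cortex-A8
2006  ARM9: ARM996HS
2007  Cortex microcontroller: Cortex-M1; Cortex application (32-bit): Cortex-A9
2008  (none)
2009  Cortex microcontroller: Cortex-M0; Cortex application (32-bit): Cortex-A5
2010  Cortex microcontroller: Cortex-M4(F); Cortex application (32-bit): Cortex-A15
2011  Cortex real-time: Cortex-R4, Cortex-R5, Cortex-R7; Cortex application (32-bit): Cortex-A7
2012  Cortex microcontroller: Cortex-M0+; Cortex application (64-bit): Cortex-A53, Cortex-A57
2013  Cortex application (32-bit): Cortex-A12
2014  Cortex microcontroller: Cortex-M7(F); Cortex application (32-bit): Cortex-A17
2015  Cortex application (64-bit): Cortex-A35, Cortex-A72
2016  Cortex microcontroller: Cortex-M23, Cortex-M33(F); Cortex real-time: Cortex-R8, Cortex-R52; Cortex application (32-bit): Cortex-A32; Cortex application (64-bit): Cortex-A73
2017  Cortex application (64-bit): Cortex-A55, Cortex-A75
2018  Cortex microcontroller: Cortex-M35P(F); Cortex application (64-bit): Cortex-A65AE, Cortex-A76, Cortex-A76AE
2019  Cortex application (64-bit): Cortex-A77; Neoverse: Neoverse E1, Neoverse N1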

5 Third-party ARM designs

Core Family Instruction set Microarchitecture Feature Cache (I / D), MMU Typical MIPS @ MHz
StrongARM (Digital) ARMv4 SA-110 5-stage pipeline 16 KB / 16 KB, MMU 100–233 MHz, 1.0 DMIPS/MHz
SA-1100 derivative of the SA-110 16 KB / 8 KB, MMU  
Faraday[60] (Faraday Technology) ARMv4 FA510 6-stage pipeline Up to 32 KB / 32 KB cache, MPU 1.26 DMIPS/MHz, 100–200 MHz
FA526 Up to 32 KB / 32 KB cache, MMU 1.26 MIPS/MHz, 166–300 MHz
FA626 8-stage pipeline 32 KB / 32 KB cache, MMU 1.35 DMIPS/MHz, 500 MHz
ARMv5TE FA606TE 5-stage pipeline No cache, no MMU 1.22 DMIPS/MHz, 200 MHz
FA626TE 8-stage pipeline 32 KB / 32 KB cache, MMU 1.43 MIPS/MHz, 800 MHz
FMP626TE 8-stage pipeline, SMP 1.43 MIPS/MHz, 500 MHz
FA726TE 13 stage pipeline, dual issue 2.4 DMIPS/MHz, 1000 MHz
XScale (Intel / Marvell) ARMv5TE XScale 7-stage pipeline, Thumb, enhanced DSP instructions 32 KB / 32 KB, MMU 133–400 MHz
Bulverde Wireless MMX, wireless SpeedStep added 32 KB / 32 KB, MMU 312–624 MHz
Monahans[61] Wireless MMX2 added 32 KB / 32 KB L1, optional L2 cache up to 512 KB, MMU Up to 1.25 GHz
Sheeva (Marvell) ARMv5 Feroceon 5–8 stage pipeline, single-issue 16 KB / 16 KB, MMU 600–2000 MHz
Jolteon 5–8 stage pipeline, dual-issue 32 KB / 32 KB, MMU
PJ1 (Mohawk) 5–8 stage pipeline, single-issue, Wireless MMX2 32 KB / 32 KB, MMU 1.46 DMIPS/MHz, 1.06 GHz
ARMv6 / ARMv7-A PJ4 6–9 stage pipeline, dual-issue, Wireless MMX2, SMP 32 KB / 32 KB, MMU 2.41 DMIPS/MHz, 1.6 GHz
Snapdragon (Qualcomm) ARMv7-A Scorpion[62] 1 or 2 cores. ARM / Thumb / Thumb-2 / DSP / SIMD / VFPv3 FPU / NEON (128-bit wide) 256 KB L2 per core 2.1 DMIPS/MHz per core
Krait[62] 1, 2, or 4 cores. ARM / Thumb / Thumb-2 / DSP / SIMD / VFPv4 FPU / NEON (128-bit wide) 4 KB / 4 KB L0, 16 KB / 16 KB L1, 512 KB L2 per core 3.3 DMIPS/MHz per core
ARMv8-A Kryo[63] 4 cores. ? Up to 2.2 GHz (6.3 DMIPS/MHz)
Ax (Apple) ARMv7-A Swift[64] 2 cores. ARM / Thumb / Thumb-2 / DSP / SIMD / VFPv4 FPU / NEON L1: 32 KB / 32 KB, L2: 1 MB 3.5 DMIPS/MHz per core
ARMv8-A Cyclone[65] 2 cores. ARM / Thumb / Thumb-2 / DSP / SIMD / VFPv4 FPU / NEON / TrustZone / AArch64 L1: 64 KB / 64 KB, L2: 1 MB, L3: 4 MB 1.3 or 1.4 GHz
ARMv8-A Typhoon[65][66] 2 or 3 cores. ARM / Thumb / Thumb-2 / DSP / SIMD / VFPv4 FPU / NEON / TrustZone / AArch64 L1: 64 KB / 64 KB, L2: 1 MB or 2 MB, L3: 4 MB 1.4 or 1.5 GHz
ARMv8-A Twister[67] 2 cores. ARM / Thumb / Thumb-2 / DSP / SIMD / VFPv4 FPU / NEON / TrustZone / AArch64 L1: 64 KB / 64 KB, L2: 2 MB, L3: 4 MB or 0 MB 1.85 or 2.26 GHz
ARMv8.1-A Hurricane[68] 2 or 3 cores. AArch64, 6-decode, 6-issue, 9-wide, superscalar, out-of-order L1: 64 KB / 64 KB, L2: 3 MB or 8 MB, L3: 4 MB or 0 MB 2.34 or 2.38 GHz
ARMv8.2-A Monsoon[69] 2 cores. AArch64, 7-decode, ?-issue, 11-wide, superscalar, out-of-order L1I: 128 KB, L1D: 64 KB, L2: 8 MB, L3: 4 MB 2.39 GHz
ARMv8.3-A Vortex[70] 2 or 4 cores. AArch64, 7-decode, ?-issue, 11-wide, superscalar, out-of-order L1: 128 KB / 128 KB, L2: 8 MB, L3: 8 MB 2.5 GHz
ARMv8.4-A Lightning[71] 2 cores. AArch64, 7-decode, ?-issue, 11-wide, superscalar, out-of-order L1: 128 KB / 128 KB, L2: 8 MB, L3: 16 MB 2.66 GHz
X-Gene (Applied Micro) ARMv8-A X-Gene 64-bit, quad issue, SMP, 64 cores[72] Cache, MMU, virtualization 3 GHz (4.2 DMIPS/MHz per core)
Denver (Nvidia) ARMv8-A Denver[73][74] 2 cores. AArch64, 7-wide superscalar, in-order, dynamic code optimization, 128 MB optimization cache, Denver1: 28nm, Denver2: 16nm 128 KB I-cache / 64 KB D-cache Up to 2.5 GHz
Carmel (Nvidia) ARMv8(t.b.d.) Carmel[75][76] 2 cores. AArch64, 10-wide superscalar, in-order, dynamic code optimization, ? MB optimization cache, functional safety, dual execution, parity & ECC ? KB I-cache / ? KB D-cache Up to ? GHz
ThunderX (Cavium) ARMv8-A ThunderX 64-bit, with two models with 8–16 or 24–48 cores (×2 w/two chips) ? Up to 2.2 GHz
K12 (AMD) ARMv8-A K12[77] ? ? ?
Exynos (Samsung) ARMv8-A M1/M2 ("Mongoose")[78] 4 cores. AArch64, 4-wide, quad-issue, superscalar, out-of-order 64 KB I-cache / 32 KB D-cache, L2: 16-way shared 2 MB 5.1 DMIPS/MHz (2.6 GHz)
ARMv8-A M3 ("Meerkat")[79] 4 cores, AArch64, 6-decode, 6-issue, 6-wide. superscalar, out-of-order 64 KB I-cache / 32 KB D-cache, L2: 8-way private 512 KB, L3: 16-way shared 4 MB ?
ARMv8.2-A M4 ("Cheetah") 2 cores, AArch64, 6-decode, 6-issue, 6-wide. superscalar, out-of-order 64 KB I-cache / 32 KB D-cache, L2: 8-way private 512 KB, L3: 16-way shared 4 MB ?

Reference: https://en.wikipedia.org/wiki/List_of_ARM_microarchitectures


Reposted from blog.csdn.net/qq_34160841/article/details/105611131