ARM Cortex-A Series Programmer's Guide for ARMv8-A -- Chapter 2: ARMv8-A Architecture and Processors

Since 1985, the ARM architecture has gone through the following stages:

ARMv4 and earlier	The earliest versions, which used only the 32-bit ARM instruction set.
ARMv4T	Added the 16-bit Thumb instruction set to the 32-bit ARM instruction set. This was the first widely licensed ARM architecture. Implemented by the ARM7TDMI® and ARM9TDMI® processors.
ARMv5TE	The ARMv5TE architecture added improvements for DSP-type operations, saturated arithmetic, and for ARM and Thumb interworking. Implemented by the ARM926EJ-S® processor.
ARMv6	ARMv6 made several enhancements, including support for unaligned memory accesses, significant changes to the memory architecture, and multi-processor support. Additionally, some support for SIMD operations operating on bytes or halfwords within the 32-bit registers was included. Implemented by the ARM1136JF-S® processor.
ARMv7-A	The ARMv7-A architecture makes the Thumb-2 extensions mandatory and adds the Advanced SIMD extensions (NEON). ARMv7-A provides all the features necessary to support a platform Operating System such as Linux. ARMv7-R provides predictable real-time high-performance. ARMv7-M is targeted at deeply-embedded microcontrollers.

ARMv8-A includes both a 32-bit and a 64-bit execution environment. It introduces access to 64-bit wide registers while preserving compatibility with ARMv7 software.

ARMv8-A introduces a series of changes that allow processors to achieve significantly higher performance:

Large physical address	This enables the processor to access more than 4GB of physical memory.
64-bit virtual addressing	This enables virtual memory beyond the 4GB limit.
Automatic event signaling	This enables power-efficient, high-performance spinlocks.
Larger register files	31 64-bit general-purpose registers increase performance and reduce stack use.
Efficient 64-bit immediate generation	There is less need for literal pools.
Large PC-relative addressing range	A +/-4GB addressing range for efficient data addressing within shared libraries and position-independent executables.
Additional 16KB and 64KB translation granules	This reduces Translation Lookaside Buffer (TLB) miss rates and depth of page walks.
New exception model	This reduces the complexity of OS and hypervisor software.
Efficient cache management	User space cache operations improve dynamic code generation efficiency. Fast Data cache clear using a Data Cache Zero instruction.
Hardware-accelerated cryptography	Provides 3× to 10× better software encryption performance. This is useful for small-granule decryption and encryption that is too small to offload to a hardware accelerator efficiently, for example HTTPS.
Load-Acquire, Store-Release instructions	Designed for C++11, C11, Java memory models. They improve performance of thread-safe code by eliminating explicit memory barrier instructions.
NEON double-precision floating-point advanced SIMD	This enables SIMD vectorization to be applied to a much wider set of algorithms, for example, scientific computing, High Performance Computing (HPC) and supercomputers.
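
Some of the figures in the table above follow from simple arithmetic, which the following minimal Python sketch illustrates. It is not taken from the guide. The function names are hypothetical, and the sketch assumes 8-byte translation table descriptors, a 48-bit virtual address for the page-walk example, and that the +/-4GB PC-relative range comes from the A64 ADRP instruction, which encodes a signed 21-bit count of 4KB pages.

```python
DESCRIPTOR_BYTES = 8  # assumption: each translation table descriptor is 64 bits wide


def addressable_bytes(address_bits):
    """Number of bytes reachable with an address of the given width."""
    return 1 << address_bits


def adrp_range_bytes():
    """(lowest, highest) byte offset reachable by a signed 21-bit count of 4KB pages."""
    page = 4096
    return (-(1 << 20) * page, ((1 << 20) - 1) * page)


def walk_levels(va_bits, granule_bytes):
    """Worst-case number of table levels needed to translate a va_bits-wide address."""
    offset_bits = granule_bytes.bit_length() - 1                      # log2(granule size)
    entries_bits = offset_bits - (DESCRIPTOR_BYTES.bit_length() - 1)  # log2(entries per table)
    remaining = va_bits - offset_bits                                 # bits the tables must resolve
    return -(-remaining // entries_bits)                              # integer ceiling division


if __name__ == "__main__":
    gib = 1 << 30
    print("32-bit address space:", addressable_bytes(32) // gib, "GiB")
    low, high = adrp_range_bytes()
    print("page-relative range:", low // gib, "GiB to just under", (high + 4096) // gib, "GiB")
    for granule_kb in (4, 16, 64):
        levels = walk_levels(48, granule_kb * 1024)
        print(f"{granule_kb}KB granule, 48-bit VA: {levels} levels")
```

Under these assumptions a 64KB granule needs three levels for a 48-bit address where a 4KB granule needs four, which is one way to read the "depth of page walks" claim. A 16KB granule still needs four levels at 48 bits and drops to three at 47 bits.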

Below are brief excerpts about several ARMv8-A processors (taken from the Cortex-A pages on Arm Developer):

The Cortex-A72 and Cortex-A53 can be paired in a big.LITTLE configuration. Likewise, the Cortex-A57 can be paired with the Cortex-A53 in a big.LITTLE configuration.

=========================================================================

Note: This article is my own original work and I hold the copyright. Reposting is welcome, but please credit the source.

=========================================================================
