ARM Cortex-A Series Programmer's Guide for ARMv8-A -- Chapter 2: ARMv8-A Architecture and Processors

Since 1985, the ARM architecture has gone through the following stages:

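ARMv4 and earlier
These early versions used only the 32-bit ARM instruction set.
ARMv4T
The ARMv4T architecture added the 16-bit Thumb instruction set to the
32-bit ARM instruction set. It was the first widely licensed ARM
architecture.
Implemented by the ARM7TDMI® and ARM9TDMI® processors.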
ARMv5TE
The ARMv5TE architecture added improvements for DSP-type operations,
saturated arithmetic, and for ARM and Thumb interworking.
Implemented by the ARM926EJ-S® processor.
ARMv6
ARMv6 made several enhancements, including support for unaligned memory
accesses, significant changes to the memory architecture and for multi-processor
support. Additionally, some support for SIMD operations operating on bytes or
halfwords within the 32-bit registers was included.
Implemented by the ARM1136JF-S® processor.
ARMv7-A
The ARMv7-A architecture makes the Thumb-2 extensions mandatory and adds
the Advanced SIMD extensions (NEON).
ARMv7 also introduced architecture profiles for different application areas:
ARMv7-A provides all the features necessary to support a platform
operating system such as Linux.
ARMv7-R provides predictable, real-time, high-performance behavior.
ARMv7-M is targeted at deeply embedded microcontrollers.

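ARMv8-A includes both a 32-bit and a 64-bit execution environment. It introduces access to 64-bit wide registers while preserving compatibility with existing ARMv7 software.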

ARMv8-A introduces a series of changes that allow processors to deliver significantly higher performance:

Large physical address
This enables the processor to access more than 4GB of physical memory.
64-bit virtual addressing
This enables virtual memory beyond the 4GB limit.
Automatic event signaling
This enables power-efficient, high-performance spinlocks.
Larger register files
Thirty-one 64-bit general-purpose registers increase performance and reduce stack use.
Efficient 64-bit immediate generation
There is less need for literal pools.
Large PC-relative addressing range
A +/-4GB addressing range for efficient data addressing within shared libraries
and position-independent executables.
Additional 16KB and 64KB translation granules
This reduces Translation Lookaside Buffer (TLB) miss rates and depth of page
walks.
New exception model
This reduces the complexity of OS and hypervisor software.
Efficient cache management
User space cache operations improve dynamic code generation efficiency. Fast
Data cache clear using a Data Cache Zero instruction.
Hardware-accelerated cryptography
Provides 3× to 10× better software encryption performance. This is useful for
small-granule decryption and encryption that is too small to offload to a
hardware accelerator efficiently, for example HTTPS.
Load-Acquire, Store-Release instructions
Designed for C++11, C11, Java memory models. They improve performance of
thread-safe code by eliminating explicit memory barrier instructions.
NEON double-precision floating-point advanced SIMD
This enables SIMD vectorization to be applied to a much wider set of algorithms,
for example, scientific computing, High Performance Computing (HPC) and
supercomputers.
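
The "Efficient 64-bit immediate generation" item above can be made concrete with a small sketch. AArch64 provides wide-move instructions (MOVZ, MOVK) that each carry a 16-bit immediate with an optional left shift of 0, 16, 32 or 48 bits, so any 64-bit constant can be built in at most four instructions without loading it from a literal pool. The helper name movz_movk_sequence and the default register x0 are hypothetical, chosen only for illustration, and the sketch deliberately ignores the shorter encodings (MOVN, bitmask immediates) that a real assembler or compiler may prefer.

```python
from typing import List


def movz_movk_sequence(value: int, reg: str = "x0") -> List[str]:
    """Return a MOVZ/MOVK instruction sequence that loads `value` into `reg`.

    Illustrative only: one instruction per non-zero 16-bit chunk.
    """
    if not 0 <= value < 2**64:
        raise ValueError("value must fit in 64 bits")

    # Split the constant into four 16-bit chunks, lowest chunk first.
    chunks = [((value >> shift) & 0xFFFF, shift) for shift in (0, 16, 32, 48)]
    nonzero = [(chunk, shift) for chunk, shift in chunks if chunk != 0]

    if not nonzero:
        # Zero needs a single MOVZ that clears the whole register.
        return [f"movz {reg}, #0x0"]

    insns = []
    for i, (chunk, shift) in enumerate(nonzero):
        # MOVZ writes one chunk and zeroes the rest of the register;
        # MOVK writes one chunk and keeps the other bits unchanged.
        op = "movz" if i == 0 else "movk"
        suffix = f", lsl #{shift}" if shift else ""
        insns.append(f"{op} {reg}, #{chunk:#x}{suffix}")
    return insns


if __name__ == "__main__":
    for insn in movz_movk_sequence(0x0000FFFF12345678):
        print(insn)
```

Under these assumptions a constant with only two non-zero 16-bit chunks costs two instructions, and the worst case is four, which is why literal pools are needed less often than on earlier architectures.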
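
The claim above that the additional 16KB and 64KB translation granules reduce the depth of page walks can be checked with a little arithmetic. The sketch below assumes 8-byte translation table descriptors and a table that fills exactly one granule, so each level resolves log2(granule) - 3 address bits. The function name walk_levels is hypothetical, and the model is simplified: it ignores details such as concatenated tables and block mappings that can end a walk early.

```python
DESCRIPTOR_BYTES = 8  # size of one AArch64 translation table entry


def walk_levels(granule_bytes: int, va_bits: int) -> int:
    """Number of table levels needed to translate a va_bits-wide address."""
    # Bits that select a byte inside one page (12 for 4KB, 14 for 16KB, 16 for 64KB).
    offset_bits = granule_bytes.bit_length() - 1
    # One table occupies one granule, so it holds granule / 8 entries.
    entries_per_table = granule_bytes // DESCRIPTOR_BYTES
    index_bits = entries_per_table.bit_length() - 1
    # Remaining address bits are resolved index_bits at a time (ceiling division).
    remaining = va_bits - offset_bits
    return -(-remaining // index_bits)


if __name__ == "__main__":
    for granule in (4 * 1024, 16 * 1024, 64 * 1024):
        levels = walk_levels(granule, 48)
        print(f"{granule // 1024}KB granule, 48-bit VA: {levels} levels")
```

With a 48-bit virtual address this model gives four levels for the 4KB and 16KB granules and three levels for the 64KB granule. A larger granule also means each TLB entry covers more memory, which is where the lower TLB miss rate comes from.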

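Brief excerpts on several ARMv8-A processors follow (taken from the Cortex-A pages on Arm Developer):

The Cortex-A72 can be paired with the Cortex-A53 in a big.LITTLE configuration. Likewise, the Cortex-A57 can be paired with the Cortex-A53 in a big.LITTLE configuration.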

=========================================================================

Note: This article is my original work and I hold the copyright. Reposting is welcome, but please credit the source.

=========================================================================

Reposted from blog.csdn.net/sjwangjinbao/article/details/121737184