Since 1985, the ARM architecture has gone through the following stages:
ARMv4 and earlier | The early versions used only the 32-bit ARM instruction set. |
ARMv4T | Added the 16-bit Thumb instruction set alongside the 32-bit ARM instruction set. This was the first widely licensed ARM architecture, implemented by the ARM7TDMI® and ARM9TDMI® processors. |
ARMv5TE | Added improvements for DSP-type operations, saturated arithmetic, and ARM and Thumb interworking. Example: the ARM926EJ-S® processor. |
ARMv6 | Made several enhancements, including support for unaligned memory accesses, significant changes to the memory architecture, and multi-processor support. It also added some support for SIMD operations on bytes or halfwords within the 32-bit registers. Example: the ARM1136JF-S® processor. |
ARMv7-A | Makes the Thumb-2 extensions mandatory and adds the Advanced SIMD extensions (NEON). ARMv7 comes in three profiles: ARMv7-A provides all the features necessary to support a platform operating system such as Linux; ARMv7-R provides predictable, real-time, high performance; ARMv7-M targets deeply embedded microcontrollers. |
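To make the "saturated arithmetic" in the ARMv5TE row concrete, here is a minimal Python sketch of what a saturating signed 32-bit add does compared with an ordinary wrapping add. It only illustrates the idea behind instructions such as QADD; the function names are made up for this example, and it does not model the sticky Q flag that the hardware sets on saturation.

```python
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def saturating_add32(a: int, b: int) -> int:
    """Add two signed 32-bit values, clamping to the 32-bit range."""
    total = a + b  # Python ints never overflow, so this is the exact sum
    return max(INT32_MIN, min(INT32_MAX, total))


def wrapping_add32(a: int, b: int) -> int:
    """Add two signed 32-bit values with two's-complement wraparound."""
    total = (a + b) & 0xFFFFFFFF  # keep only the low 32 bits
    return total - 2**32 if total >= 2**31 else total


if __name__ == "__main__":
    print(saturating_add32(INT32_MAX, 1))  # stays at the maximum
    print(wrapping_add32(INT32_MAX, 1))    # wraps around to the minimum
```

Clamping is what DSP-style code (audio, for instance) usually wants: an overflow turns into clipping instead of a sign flip.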
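ARMv8-A includes both a 32-bit and a 64-bit execution environment. It introduces access to 64-bit wide registers while preserving compatibility with ARMv7 software.
ARMv8-A introduces a series of changes that allow significantly higher-performance processors: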
Large physical address | Lets the processor access more than 4GB of physical memory. |
64-bit virtual addressing | Lets virtual memory go beyond the 4GB limit. |
Automatic event signaling | Enables power-efficient, high-performance spinlocks. |
Larger register files | Thirty-one 64-bit general-purpose registers improve performance and reduce stack use. |
Efficient 64-bit immediate generation | There is less need for literal pools. |
Large PC-relative addressing range | A ±4GB addressing range for efficient data addressing within shared libraries and position-independent executables. |
Additional 16KB and 64KB translation granules | Reduces Translation Lookaside Buffer (TLB) miss rates and the depth of page walks. |
New exception model | Reduces the complexity of OS and hypervisor software. |
Efficient cache management | User-space cache operations improve dynamic code generation efficiency. Fast data cache clear using a Data Cache Zero instruction. |
Hardware-accelerated cryptography | Provides 3× to 10× better software encryption performance. This is useful for small-granule decryption and encryption that is too small to offload efficiently to a hardware accelerator, for example HTTPS. |
Load-Acquire, Store-Release instructions | Designed for the C++11, C11, and Java memory models. They improve the performance of thread-safe code by eliminating explicit memory barrier instructions. |
NEON double-precision floating-point advanced SIMD | Enables SIMD vectorization to be applied to a much wider set of algorithms, for example in scientific computing, High Performance Computing (HPC), and supercomputers. |
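The claim that larger translation granules reduce the depth of page walks can be checked with a little arithmetic. The Python sketch below is a simplified model, not a description of any particular implementation: it assumes 8-byte table descriptors (so a table the size of one granule holds granule/8 entries) and ignores block mappings and concatenated tables. The function name `walk_levels` is invented for this example.

```python
def walk_levels(va_bits: int, granule_bytes: int) -> int:
    """Estimate the number of table levels needed to translate va_bits of address."""
    offset_bits = granule_bytes.bit_length() - 1  # log2 of the page size
    index_bits = offset_bits - 3                  # assumes 8-byte descriptors
    remaining = va_bits - offset_bits             # bits resolved by table lookups
    return -(-remaining // index_bits)            # ceiling division


if __name__ == "__main__":
    for granule_kb in (4, 16, 64):
        levels = walk_levels(48, granule_kb * 1024)
        print(f"{granule_kb}KB granule, 48-bit VA: {levels} levels")
```

Under these assumptions a 48-bit virtual address needs four levels with a 4KB granule but only three with a 64KB granule. A larger page also lets each TLB entry cover more memory, which is where the lower miss rate comes from.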
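Below are brief notes on a few ARMv8-A processors (excerpted from Cortex-A – Arm Developer):
Cortex-A72 and Cortex-A53 can be paired in a big.LITTLE (big-core/little-core) configuration. Likewise, Cortex-A57 can be paired with Cortex-A53 in a big.LITTLE configuration.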
=========================================================================
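Note: This article is my own original work and the copyright belongs to me personally. Reposting is welcome, but please credit the source.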
=========================================================================