[arm] Summary of assembly optimization for the 64-bit ARM architecture (AArch64)

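Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission. If reproduction is permitted, please credit the source https://blog.csdn.net/SoaringLee_fighting; otherwise the right to pursue legal liability is reserved. https://blog.csdn.net/SoaringLee_fighting/article/details/82530435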

Date: 2018.9.13


1. References

https://blog.csdn.net/SoaringLee_fighting/article/details/81906495
https://blog.csdn.net/u011514906/article/details/38142177
https://blog.csdn.net/listener51/article/details/82530464

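2. Preface

  This article summarizes NEON optimization for the 64-bit ARM architecture (the AArch64 execution state). It covers the basics of 64-bit ARM optimization, special usages, print debugging, notes on commonly used instructions, and sources of reference material. An earlier article, the summary of 32-bit ARM assembly optimization, gave a full summary of NEON optimization on 32-bit ARM and explained ARM assembly syntax; the discussion below mainly uses GNU ARM assembly syntax.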
  

3. Basics of 64-bit ARM optimization

https://blog.csdn.net/SoaringLee_fighting/article/details/81906495
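  That post covers the introductory basics of 64-bit ARM assembly optimization, mainly architecture analysis, registers, calling conventions, the instruction set, and print debugging of programs, and can serve as a starting point for 64-bit ARM assembly optimization.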
  

4. ARMv8/AArch64 NEON instruction format

  In the AArch64 execution state, the syntax of NEON instructions has changed. It can be described as follows:

  {<prefix>}<op>{<suffix>}  Vd.<T>, Vn.<T>, Vm.<T>  

Where:
<prefix> - prefix, such as S/U/F/P for signed/unsigned/floating-point/polynomial data types.
<op> - operation, such as ADD, AND, etc.
<suffix> - suffix:

P: "pairwise" operations, such as ADDP. (In LDP/STP the P instead denotes a load/store of a pair of registers.)
V: the new reduction (across-all-lanes) operations, such as ADDV, SMAXV, FMAXV.
2: the new widening/narrowing "second part" instructions, such as ADDHN2, SADDL2, SMULL2, which read or write the upper 64 bits of a 128-bit register.

<T> - data type (arrangement), 8B/16B/4H/8H/2S/4S/2D. B represents byte (8-bit), H half-word (16-bit), S word (32-bit), and D double-word (64-bit).
 For example:

UADDLP    V0.8H, V0.16B          // add adjacent pairs of unsigned bytes into 8 halfwords
FADD      V0.4S, V0.4S, V0.4S    // add four single-precision lanes
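
To make the lane arrangement in the UADDLP example concrete, here is a minimal portable C model of what that instruction computes. It is only a sketch for reasoning about the result, not NEON code; the function name uaddlp_8h_16b is made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable model of "UADDLP Vd.8H, Vn.16B" (hypothetical helper name).
 * Each output halfword is the sum of two adjacent unsigned bytes,
 * widened to 16 bits so the sum cannot overflow. */
void uaddlp_8h_16b(uint16_t out[8], const uint8_t in[16])
{
    for (size_t i = 0; i < 8; i++) {
        out[i] = (uint16_t)(in[2 * i] + in[2 * i + 1]);
    }
}
```

With all sixteen input bytes set to 255, every output halfword is 510, which would not fit in a byte; that is why the destination arrangement is 8H rather than 16B.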

5. ARM-related compiler options

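  When compiling for embedded devices (i.e. ARM boards), it is best to add -fsigned-char, because on ARM targets plain char defaults to unsigned char rather than signed char (as it is on x86). In addition, ARM assembly-optimized code should be built with -c, which compiles or assembles the source files without linking them.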
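
The char signedness issue can be checked directly. The following is a small C sketch, with helper names invented for this example, that reports how plain char behaves under the current compiler and flags and shows a way to read a byte as signed without depending on that default.

```c
#include <limits.h>

/* Returns 1 if plain char is signed with this compiler and these flags. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Interpret a raw byte as a signed 8-bit value without using plain char,
 * so the result is the same with or without -fsigned-char. */
int byte_as_signed(unsigned char raw)
{
    return raw >= 128 ? (int)raw - 256 : (int)raw;
}

/* What code sees when the byte 0xFF is stored in a plain char.
 * The conversion is implementation-defined when char is signed;
 * common compilers produce -1 in that case and 255 otherwise. */
int plain_char_ff_as_int(void)
{
    char c = (char)0xFF;
    return (int)c;
}
```

Code such as "if (c < 0)" on a plain char therefore behaves differently on a default ARM build and on x86, unless -fsigned-char is given or the code uses signed char explicitly.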
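  ARM-specific or hardware-specific options generally start with -m. Commonly used options for ARM platforms include the following (written without spaces around "="):

-mcpu=cortex-a7
-mabi=atpcs
-march=armv7
-mtune=cortex-a53
-mfpu=neon (or neon-vfpv4)
-mfloat-abi=soft (or softfp, hard)

Note that -mabi=atpcs, -mfpu and -mfloat-abi are 32-bit ARM options; GCC for AArch64 does not accept -mfpu or -mfloat-abi. For more details, see https://gcc.gnu.org/onlinedocs/gcc-5.2.0/gcc.pdf, section 3.17.1 (AArch64 Options) and section 3.17.4 (ARM Options).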

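6. Inspecting the NZCV condition flags

mrs  x15, nzcv    // copy the NZCV flags into x15 (N, Z, C, V are bits 31..28)
mov  w0,  w15     // pass the value as the first argument
bl   print        // call the print routine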
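
The value printed this way is a raw register image, so it still has to be decoded. A minimal C sketch of that decoding follows; the struct and function names are assumptions made for this example and are not part of the print routine used above.

```c
#include <stdint.h>

/* The four condition flags, each stored as 0 or 1. */
struct nzcv_flags {
    int n; /* negative */
    int z; /* zero */
    int c; /* carry */
    int v; /* overflow */
};

/* Decode a value read with "mrs x15, nzcv".
 * N, Z, C and V sit in bits 31, 30, 29 and 28 respectively. */
struct nzcv_flags decode_nzcv(uint64_t raw)
{
    struct nzcv_flags f;
    f.n = (int)((raw >> 31) & 1u);
    f.z = (int)((raw >> 30) & 1u);
    f.c = (int)((raw >> 29) & 1u);
    f.v = (int)((raw >> 28) & 1u);
    return f;
}
```

For instance, a printed value of 0x60000000 decodes to Z=1 and C=1, which is what a compare of two equal values leaves in the flags.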

7. Instructions specific to the A64 instruction set and their usage

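  1. shl and ushr

    shl  <V><d>, <V><n>, #<shift>
    ushr  <V><d>, <V><n>, #<shift>
    ushr  d2, d2,  #8

Usage notes: in this scalar form, both instructions operate only on 64-bit data, i.e. only on D registers (the vector forms, such as ushr v2.8h, v2.8h, #8, shift each lane separately).
ushr can shift right by at most 64 bits, and writing the scalar result affects the upper 64 bits of V2 (they are cleared to zero). If the upper 64 bits are still needed, save them before the shift; otherwise that data is lost.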
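
The side effect on the upper half is easy to overlook, so here is a portable C sketch of the scalar form. The struct vreg layout and the name ushr_d are assumptions made for this illustration; the code only models the behaviour described above.

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit V register viewed as two 64-bit halves. */
struct vreg {
    uint64_t lo; /* bits 63..0, visible as the D register */
    uint64_t hi; /* bits 127..64 */
};

/* Model of "ushr dD, dN, #shift" with 1 <= shift <= 64.
 * The low 64 bits are shifted right logically and the upper
 * 64 bits of the destination register are cleared. */
struct vreg ushr_d(struct vreg src, unsigned shift)
{
    struct vreg dst;
    assert(shift >= 1 && shift <= 64);
    /* A C shift by 64 is undefined, so handle it explicitly. */
    dst.lo = (shift == 64) ? 0 : (src.lo >> shift);
    dst.hi = 0;
    return dst;
}
```

Applying ushr_d with a shift of 8 to a register whose low half is 0x1122334455667788 gives a low half of 0x0011223344556677 and a zero upper half, whatever the upper half held before.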

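  2. INS
Used much like MOV: it copies one NEON element (lane) into another NEON element, or copies an ARM general-purpose register into a NEON element. The other elements of the destination register are left unchanged. (The opposite direction, from a NEON element to a general-purpose register, is handled by UMOV/SMOV.)

INS   <Vd>.<Ts>[index1], <Vn>.<Ts>[index2]
INS   <Vd>.<Ts>[index], Rn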
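
As a rough picture of the two INS forms with 32-bit lanes, the following C sketch treats a V register as an array of four words. The function names are hypothetical; the point is only that a single lane is written and the others keep their values.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "ins vD.s[di], vN.s[si]": copy one 32-bit lane of vn
 * into one lane of vd, leaving the other three lanes untouched. */
void ins_s_element(uint32_t vd[4], unsigned di, const uint32_t vn[4], unsigned si)
{
    assert(di < 4 && si < 4);
    vd[di] = vn[si];
}

/* Model of "ins vD.s[di], wN": write a 32-bit general-purpose
 * register value into one lane of vd. */
void ins_s_general(uint32_t vd[4], unsigned di, uint32_t wn)
{
    assert(di < 4);
    vd[di] = wn;
}
```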

  3. SUQADD and USQADD
Both have a scalar form and a vector form.

SUQADD <V><d>, <V><n>     // signed saturating accumulate of unsigned value
SUQADD <Vd>.<T>, <Vn>.<T>

USQADD <V><d>, <V><n>    // unsigned saturating accumulate of signed value
USQADD <Vd>.<T>, <Vn>.<T>
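
The mixed signedness is the part that tends to confuse, so here is a C sketch of a single 8-bit lane of each instruction. The helper names are made up for this example, and the destination is modelled as the accumulator argument.

```c
#include <stdint.h>

/* One 8-bit lane of SUQADD: a signed accumulator plus an unsigned
 * value, saturated to the signed range [-128, 127]. The sum can
 * never fall below -128, so only the upper bound needs clamping. */
int8_t suqadd8(int8_t acc, uint8_t val)
{
    int sum = (int)acc + (int)val;
    if (sum > INT8_MAX) {
        sum = INT8_MAX;
    }
    return (int8_t)sum;
}

/* One 8-bit lane of USQADD: an unsigned accumulator plus a signed
 * value, saturated to the unsigned range [0, 255]. */
uint8_t usqadd8(uint8_t acc, int8_t val)
{
    int sum = (int)acc + (int)val;
    if (sum < 0) {
        sum = 0;
    }
    if (sum > UINT8_MAX) {
        sum = UINT8_MAX;
    }
    return (uint8_t)sum;
}
```

For example, suqadd8(100, 200) saturates to 127, and usqadd8(10, -20) saturates to 0 instead of wrapping around.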

  4. RBIT and REV

 RBIT <Wd>, <Wn> // reverse the bit order
 REV <Wd>, <Wn>  // reverse the byte order
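
The difference between reversing bits and reversing bytes can be seen in a portable C model of the 32-bit forms. The names rbit32 and rev32 are chosen for this sketch only.

```c
#include <stdint.h>

/* Model of "RBIT Wd, Wn": reverse the order of all 32 bits. */
uint32_t rbit32(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u); /* move the lowest bit of x into r */
        x >>= 1;
    }
    return r;
}

/* Model of "REV Wd, Wn": reverse the order of the four bytes,
 * as used for endianness conversion. */
uint32_t rev32(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}
```

Here rbit32(0x00000001) yields 0x80000000, while rev32(0x11223344) yields 0x44332211.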

  5. ADDV, SADDLV, SMAXV, SMINV (vector reduce across lanes)

ADDV <V><d>, <Vn>.<T>    // integer sum of all elements to a scalar
SADDLV <V><d>, <Vn>.<T>  // signed integer sum of all elements to a long (double-width) scalar
SMAXV <V><d>, <Vn>.<T>   // signed integer maximum of all elements to a scalar
SMINV <V><d>, <Vn>.<T>   // signed integer minimum of all elements to a scalar
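
The practical difference between ADDV and SADDLV is the width of the result. The C sketch below models the four reductions for an 8B arrangement (eight byte lanes); the function names are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of "ADDV Bd, Vn.8B": the sum keeps the element width,
 * so it wraps modulo 256. */
uint8_t addv_8b(const uint8_t v[8])
{
    unsigned sum = 0;
    for (size_t i = 0; i < 8; i++) {
        sum += v[i];
    }
    return (uint8_t)sum;
}

/* Model of "SADDLV Hd, Vn.8B": the signed sum is widened to
 * 16 bits, so eight byte lanes can never overflow it. */
int16_t saddlv_8b(const int8_t v[8])
{
    int sum = 0;
    for (size_t i = 0; i < 8; i++) {
        sum += v[i];
    }
    return (int16_t)sum;
}

/* Model of "SMAXV Bd, Vn.8B": signed maximum across all lanes. */
int8_t smaxv_8b(const int8_t v[8])
{
    int8_t best = v[0];
    for (size_t i = 1; i < 8; i++) {
        if (v[i] > best) {
            best = v[i];
        }
    }
    return best;
}

/* Model of "SMINV Bd, Vn.8B": signed minimum across all lanes. */
int8_t sminv_8b(const int8_t v[8])
{
    int8_t best = v[0];
    for (size_t i = 1; i < 8; i++) {
        if (v[i] < best) {
            best = v[i];
        }
    }
    return best;
}
```

With three lanes holding 100 and the rest 0, addv_8b returns 44 (300 modulo 256), whereas the widened saddlv_8b keeps the full sum of 300. When the total may exceed the element range, the long variant is the safer choice.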

8. Reference documentation

THE END!
