Registers
The AVX floating-point architecture allows data to be stored in 16 YMM registers, named %ymm0 through %ymm15.
YMM register (bits 255–0) | XMM register (bits 127–0) | Usage |
---|---|---|
%ymm0 | %xmm0 | 1st FP argument, return value |
%ymm1 | %xmm1 | 2nd FP argument |
%ymm2 | %xmm2 | 3rd FP argument |
%ymm3 | %xmm3 | 4th FP argument |
%ymm4 | %xmm4 | 5th FP argument |
%ymm5 | %xmm5 | 6th FP argument |
%ymm6 | %xmm6 | 7th FP argument |
%ymm7 | %xmm7 | 8th FP argument |
%ymm8 | %xmm8 | Caller saved |
%ymm9 | %xmm9 | Caller saved |
%ymm10 | %xmm10 | Caller saved |
%ymm11 | %xmm11 | Caller saved |
%ymm12 | %xmm12 | Caller saved |
%ymm13 | %xmm13 | Caller saved |
%ymm14 | %xmm14 | Caller saved |
%ymm15 | %xmm15 | Caller saved |
Media registers. These registers are used to hold floating-point data. Each YMM register holds 32 bytes. The low-order 16 bytes can be accessed as an XMM register.
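As a rough illustration of the argument-passing roles in the table, the hypothetical C function below mixes integer and floating-point parameters. Assuming the x86-64 System V calling convention, floating-point arguments are numbered separately from integer ones, so `x` would be expected in %xmm0 and `y` in %xmm1. The function and parameter names are made up for this sketch.

```c
/* Hypothetical example (names are assumptions, not from the text).
 * Expected register assignment under the x86-64 System V convention:
 *   a -> %edi    (1st integer argument)
 *   x -> %xmm0   (1st FP argument)
 *   b -> %rsi    (2nd integer argument)
 *   y -> %xmm1   (2nd FP argument)
 * The double result is returned in %xmm0. */
double mix_args(int a, double x, long b, float y)
{
    return a + x + b + y;
}
```

Compiling with something like `gcc -O1 -mavx -S` and reading the generated assembly is one way to check which registers a given compiler actually uses.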
Floating-point move and conversion operations
Instruction | Source | Destination | Description |
---|---|---|---|
vmovss | M32 | X | Move single precision |
vmovss | X | M32 | Move single precision |
vmovsd | M64 | X | Move double precision |
vmovsd | X | M64 | Move double precision |
vmovaps | X | X | Move aligned, packed single precision |
vmovapd | X | X | Move aligned, packed double precision |
Floating-point move instructions. These operations transfer values between memory and registers, and between pairs of registers (X: XMM register (e.g., %xmm3); M32: 32-bit memory range; M64: 64-bit memory range).
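The move instructions can be seen in a small C function that reads one float from memory and writes another. This is a minimal sketch. The instruction sequence in the comment is what a compiler with AVX enabled might plausibly emit, not a guaranteed output.

```c
/* Stores v1 into *dst and returns the value previously read from *src.
 * A compiler targeting AVX might generate something like:
 *   vmovaps %xmm0, %xmm1     # copy v1 (register to register)
 *   vmovss  (%rdi), %xmm0    # read *src into the return register
 *   vmovss  %xmm1, (%rsi)    # write v1 to *dst
 * The exact sequence depends on the compiler and optimization level. */
float float_mov(float v1, float *src, float *dst)
{
    float v2 = *src;
    *dst = v1;
    return v2;
}
```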
Instruction | Source | Destination | Description |
---|---|---|---|
vcvttss2si | X/M32 | R32 | Convert single precision to integer, with truncation |
vcvttsd2si | X/M64 | R32 | Convert double precision to integer, with truncation |
vcvttss2siq | X/M32 | R64 | Convert single precision to quad-word integer, with truncation |
vcvttsd2siq | X/M64 | R64 | Convert double precision to quad-word integer, with truncation |
Two-operand floating-point conversion instructions. These operations convert floating-point values to integers (X: XMM register (e.g., %xmm3); R32: 32-bit general-purpose register (e.g., %eax); R64: 64-bit general-purpose register (e.g., %rax); M32: 32-bit memory range; M64: 64-bit memory range).
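The truncating behavior matches what C requires of a floating-point to integer cast: the fractional part is discarded, rounding toward zero. The helper names below are hypothetical, and the instructions named in the comments are the ones a compiler would plausibly choose on x86-64 with AVX, assuming `long` is 64 bits.

```c
/* Hypothetical helpers: a C cast from floating point to an integer
 * type truncates toward zero. */

/* float -> 32-bit int; plausibly compiled to vcvttss2si */
int float_to_int(float x)
{
    return (int) x;
}

/* double -> 64-bit long (assuming LP64); plausibly vcvttsd2siq */
long double_to_long(double x)
{
    return (long) x;
}
```

For example, `float_to_int(2.9f)` yields 2 and `float_to_int(-2.9f)` yields -2, since truncation drops the fraction rather than rounding to nearest.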
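Instruction | Source 1 | Source 2 | Destination | Description |
---|---|---|---|---|
vcvtsi2ss | M32/R32 | X | X | Convert integer to single precision |
vcvtsi2sd | M32/R32 | X | X | Convert integer to double precision |
vcvtsi2ssq | M64/R64 | X | X | Convert quad-word integer to single precision |
vcvtsi2sdq | M64/R64 | X | X | Convert quad-word integer to double precision |
Three-operand floating-point conversion instructions. These operations convert the data type of the first source to the data type of the destination. The second source value has no effect on the low-order bytes of the result (X: XMM register (e.g., %xmm3); R32: 32-bit general-purpose register; R64: 64-bit general-purpose register; M32: 32-bit memory range; M64: 64-bit memory range).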
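In C, these integer-to-floating-point conversions correspond to plain casts. The sketch below uses hypothetical helper names. The instruction in each comment is the likely choice on x86-64 with AVX, assuming `long` is 64 bits.

```c
/* Hypothetical helpers: integer to floating-point casts. */

/* 32-bit int -> float; plausibly compiled to vcvtsi2ss */
float int_to_float(int i)
{
    return (float) i;
}

/* 64-bit long (assuming LP64) -> double; plausibly vcvtsi2sdq */
double long_to_double(long l)
{
    return (double) l;
}
```

Note that the conversion can round: a float has only 24 bits of significand, so an int such as 16777217 (2^24 + 1) cannot be represented exactly and is rounded to 16777216.0f under the default rounding mode.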
GCC's conversions between single and double precision deserve a separate note (the details are not explained here):
Conversion from single to double precision
vunpcklps %xmm0, %xmm0, %xmm0 Replicate first vector element
vcvtps2pd %xmm0, %xmm0 Convert two vector elements to double
Conversion from double to single precision
vmovddup %xmm0, %xmm0 Replicate first vector element
vcvtpd2psx %xmm0, %xmm0 Convert two vector elements to single
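At the C level, both directions are ordinary casts. The sketch below uses hypothetical helper names. Whether a compiler emits the two-instruction sequences listed above or a single scalar conversion instruction is an assumption that depends on the compiler and its version.

```c
/* Hypothetical helpers: conversions between float and double. */

/* float -> double: every float value is exactly representable
 * as a double, so no rounding occurs. */
double float_to_double(float f)
{
    return (double) f;
}

/* double -> float: may round, since a float has fewer
 * significand bits than a double. */
float double_to_float(double d)
{
    return (float) d;
}
```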
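Arithmetic operations
Scalar AVX2 floating-point instructions. Each instruction has either one (S1) or two (S1, S2) source operands and one destination operand. The first source operand S1 can be either an XMM register or a memory location. The second source operand and the destination operand must both be XMM registers. Each operation has one instruction for single precision and one for double precision. The result is stored in the destination register.
Single | Double | Effect | Description |
---|---|---|---|
vaddss | vaddsd | D ← S2 + S1 | Floating-point add |
vsubss | vsubsd | D ← S2 - S1 | Floating-point subtract |
vmulss | vmulsd | D ← S2 × S1 | Floating-point multiply |
vdivss | vdivsd | D ← S2 / S1 | Floating-point divide |
vmaxss | vmaxsd | D ← max(S2, S1) | Floating-point maximum |
vminss | vminsd | D ← min(S2, S1) | Floating-point minimum |
sqrtss | sqrtsd | D ← √S1 | Floating-point square root |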
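A short C function with mixed operand types shows how these arithmetic instructions combine with the conversions described earlier. This is only a sketch. The instruction names in the comments indicate what a compiler with AVX enabled would plausibly use, not a verified listing.

```c
/* Computes a*x - b/i in double precision.
 * Plausible steps in the generated code:
 *   - convert x from float to double
 *   - vmulsd   for a * x
 *   - vcvtsi2sd to convert i to double
 *   - vdivsd   for b / i
 *   - vsubsd   for the final subtraction
 * The result is returned in %xmm0. */
double funct(double a, float x, double b, int i)
{
    return a * x - b / i;
}
```

For instance, `funct(2.0, 3.0f, 8.0, 4)` evaluates 2.0 * 3.0 - 8.0 / 4 and returns 4.0.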
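Bitwise operations
Single | Double | Effect | Description |
---|---|---|---|
vxorps | vxorpd | D ← S2 ^ S1 | Bitwise EXCLUSIVE-OR |
vandps | vandpd | D ← S2 & S1 | Bitwise AND |
Bitwise operations on packed data (these instructions perform Boolean operations on all 128 bits of an XMM register).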
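Compilers commonly use these bitwise instructions to manipulate the sign bit of a floating-point value. The sketch below reproduces the same bit-level effect in portable C on a single float, assuming the IEEE 754 single-precision format. The helper names are hypothetical, and the code works on 32 bits rather than a full 128-bit register.

```c
#include <stdint.h>
#include <string.h>

/* Clears the sign bit, the effect of ANDing with mask 0x7FFFFFFF
 * (what vandps would do to the low 32 bits). */
float abs_by_mask(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret float as bits */
    bits &= 0x7FFFFFFFu;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Flips the sign bit, the effect of XORing with mask 0x80000000
 * (what vxorps would do to the low 32 bits). */
float negate_by_mask(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits ^= 0x80000000u;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

XORing a register with itself is also a common way to produce the value 0.0, since all-zero bits represent positive zero.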
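Comparison operations
Instruction | Based on | Description |
---|---|---|
ucomiss S1, S2 | S2 - S1 | Compare single precision |
ucomisd S1, S2 | S2 - S1 | Compare double precision |
Argument S2 must be in an XMM register, while S1 can be in either an XMM register or memory.
The condition codes are set as follows:
Ordering S2:S1 | CF | ZF | PF (parity flag) |
---|---|---|---|
Unordered (NaN) | 1 | 1 | 1 |
S2 < S1 | 1 | 0 | 0 |
S2 = S1 | 0 | 1 | 0 |
S2 > S1 | 0 | 0 | 0 |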
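The unordered case matters whenever an operand may be NaN, because every ordered comparison with NaN is false. The C sketch below classifies a float into four ranges. The enum and function names are chosen for illustration, and a compiler would plausibly implement the comparisons with a ucomiss-style instruction followed by jumps that test CF, ZF, and PF.

```c
/* Illustrative classification of a float relative to zero. */
typedef enum { NEG, ZERO, POS, OTHER } range_t;

range_t find_range(float x)
{
    if (x < 0)
        return NEG;
    else if (x == 0)
        return ZERO;
    else if (x > 0)
        return POS;
    else
        return OTHER;   /* reached only when x is NaN (unordered) */
}
```

Because all three comparisons fail for NaN, the final branch corresponds to the unordered row of the table, where the parity flag is set.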