Introduction to Neon and common functions

What is Neon? Neon is the SIMD (single instruction, multiple data) extension of the Arm architecture: essentially a set of vector registers, plus instructions that operate on several data elements in them at once.

According to the official Arm register diagram, a 32-bit (ARMv7) Arm development board has 16 128-bit registers (Q0-Q15), which can also be viewed as 32 64-bit registers (D0-D31). On AArch64 there are 32 128-bit registers.
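The two register views can be seen from C as well. The following is a minimal sketch, not taken from the official material: `sum4` is a hypothetical helper that splits one 128-bit vector into its two 64-bit halves and adds them. The scalar fallback is my own addition so that the snippet also builds on a machine without Neon.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: returns p[0] + p[1] + p[2] + p[3]. */
float sum4(const float *p)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x4_t q  = vld1q_f32(p);      /* one 128-bit vector: {p0, p1, p2, p3} */
    float32x2_t lo = vget_low_f32(q);   /* lower 64-bit half: {p0, p1} */
    float32x2_t hi = vget_high_f32(q);  /* upper 64-bit half: {p2, p3} */
    float32x2_t s  = vadd_f32(lo, hi);  /* {p0 + p2, p1 + p3} */
    s = vpadd_f32(s, s);                /* pairwise add: both lanes hold the total */
    return vget_lane_f32(s, 0);
#else
    /* Scalar fallback (assumption: only needed when Neon is unavailable). */
    return (p[0] + p[2]) + (p[1] + p[3]);
#endif
}
```

Types ending in `x4_t` are 128-bit vectors and use the `q` intrinsics; types ending in `x2_t` are 64-bit vectors and use the intrinsics without `q`.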

Here are the commonly used Neon intrinsics (declared in arm_neon.h):

Load data:

float ptr[4] = {1, 2, 3, 4};
// vld1q_f32 takes a const float *, so pass the array itself rather than &ptr
float32x4_t loaded_data = vld1q_f32(ptr);

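Load a single element into one lane:

// Load the value stored at ptr into the last lane (lane 3) of the Neon vector p;
// the other lanes are copied from the vector passed as the second argument
float32x4_t p = vdupq_n_f32(0.0f);
p = vld1q_lane_f32(ptr, p, 3);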

Get a single element from a vector:

// Get the first element of a Neon vector; data_vector is a float32x4_t and 0 is the index of the first lane
float a = vgetq_lane_f32(data_vector, 0);
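Lane load and lane read can be combined in one small sketch. `replace_last` is a hypothetical helper, not something from the Neon API; it loads four floats, overwrites only the last lane with a value from memory, stores the result, and reads that lane back. The scalar branch is an assumption added so the code also compiles without Neon.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: copies base[0..2] to out, takes out[3] from *value,
 * and returns the new last element. */
float replace_last(const float *base, const float *value, float *out)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x4_t v = vld1q_f32(base);   /* load all four lanes */
    v = vld1q_lane_f32(value, v, 3);   /* overwrite lane 3 only, keep lanes 0..2 */
    vst1q_f32(out, v);                 /* write the vector to out[0..3] */
    return vgetq_lane_f32(v, 3);       /* read lane 3 back as a scalar */
#else
    /* Scalar fallback (assumption: only needed when Neon is unavailable). */
    for (int i = 0; i < 3; ++i) {
        out[i] = base[i];
    }
    out[3] = *value;
    return out[3];
#endif
}
```

The lane index must be a compile-time constant, which is why it is written as the literal 3 rather than passed in as a parameter.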

Broadcast a single value into a Neon vector, so that all four lanes are equal:

// Get a Neon vector whose lanes are all 0
float32x4_t a = vdupq_n_f32(0.0f);

Add two vectors element-wise:

// Add the corresponding elements of the two Neon vectors v0 and v1
float32x4_t added_result = vaddq_f32(v0, v1);

Take the square root of each element (vsqrtq_f32 is available on AArch64 only):

float32x4_t b = vsqrtq_f32(a);

Multiply two vectors element-wise:

// Multiply the corresponding elements of v0 and v1
float32x4_t product = vmulq_f32(v0, v1);

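Write data back:

// To write a Neon vector back to a buffer, call one of the store intrinsics: the four floats
// starting at the destination pointer (including the one it points to) take the values
// of the Neon vector. The pointer is the first argument and the vector is the second.
vst1q_f32(ptr, neon_v);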
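Put together, the intrinsics listed so far are enough for a complete load, compute, store loop. The following is a sketch under my own assumptions: `mul_add_bias` and its parameter names are hypothetical, and the scalar loop that handles the tail when the length is not a multiple of 4 also serves as the fallback when Neon is unavailable.

```c
#include <stddef.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out[i] = a[i] * b[i] + bias for i in [0, n). */
void mul_add_bias(const float *a, const float *b, float bias,
                  float *out, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x4_t vbias = vdupq_n_f32(bias);        /* bias in all four lanes */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va   = vld1q_f32(a + i);      /* load a[i..i+3] */
        float32x4_t vb   = vld1q_f32(b + i);      /* load b[i..i+3] */
        float32x4_t prod = vmulq_f32(va, vb);     /* element-wise product */
        vst1q_f32(out + i, vaddq_f32(prod, vbias)); /* add bias, store 4 floats */
    }
#endif
    /* Scalar loop: the remaining 0..3 elements, or everything without Neon. */
    for (; i < n; ++i) {
        out[i] = a[i] * b[i] + bias;
    }
}
```

For example, with a = {1, 2, 3, 4, 5}, every element of b equal to 2, and bias = 1, the first four outputs come from one vector iteration and the fifth from the scalar loop.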

Here is a question to think about. Suppose a Neon vector A contains NaN values; how do you find the positions of the NaNs? At first I thought of creating a Neon vector filled with NaN and comparing A against it. That idea is wrong, because NaN compares unequal to every value, including another NaN.

So I found a simple way: compare A with itself (for example with vceqq_f32). Lanes that compare equal are not NaN, and lanes that compare unequal are NaN.

The comparison result has all bits set in the non-NaN lanes and all bits clear in the NaN lanes. To get a mask in which the NaN lanes are all 1s, invert the result with vmvnq_u32.
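The self-comparison trick could be sketched as follows. `nan_mask4` is a hypothetical helper name, and the scalar branch is an assumption added so the snippet also builds without Neon; it is not necessarily how the original code was written.

```c
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: mask[i] = 0xFFFFFFFF if in[i] is NaN, otherwise 0. */
void nan_mask4(const float *in, uint32_t *mask)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x4_t v       = vld1q_f32(in);
    uint32x4_t  not_nan = vceqq_f32(v, v);   /* all ones where a lane equals itself */
    uint32x4_t  is_nan  = vmvnq_u32(not_nan); /* bitwise NOT: all ones where NaN */
    vst1q_u32(mask, is_nan);
#else
    /* Scalar fallback (assumption: only needed when Neon is unavailable). */
    for (int i = 0; i < 4; ++i) {
        mask[i] = (in[i] == in[i]) ? 0u : 0xFFFFFFFFu;
    }
#endif
}
```

This relies on IEEE 754 comparison semantics, so it should not be built with options such as -ffast-math that let the compiler assume NaN never occurs.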

That concludes this introduction to Neon programming. There are many more Neon intrinsics; see Arm's intrinsics documentation for the full list.

Origin blog.csdn.net/qq_31638535/article/details/131554527