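What is NEON? NEON is the SIMD (single instruction, multiple data) extension of the Arm architecture: essentially a set of wide registers, together with instructions that operate on several data elements held in those registers at once.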
Refer to the official diagram for details.
The official figure shows that on 32-bit Arm (ARMv7-A / AArch32), NEON provides 16 128-bit registers (Q0-Q15), which can also be viewed as 32 64-bit registers (D0-D31). On 64-bit Arm (AArch64) there are instead 32 128-bit registers (V0-V31).
Commonly used NEON intrinsics:
Load data:
float ptr[4] = {1, 2, 3, 4};
// Load the four floats starting at ptr into one NEON vector (pass ptr itself, not &ptr)
float32x4_t loaded_data = vld1q_f32(ptr);
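Load a single value into one lane:
// Load the value stored at ptr into the last lane (lane 3) of the existing NEON vector p;
// the other three lanes of p are kept unchanged, and the lane index must be a compile-time constant
p = vld1q_lane_f32(ptr, p, 3);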
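Lane loads are useful when the four values are not contiguous in memory. The sketch below uses a hypothetical helper, `gather4_strided`, to fill a vector one lane at a time from strided memory (for example, one column of a row-major matrix). The `#else` scalar branch is an assumption added only so the example also compiles on non-Arm hosts.

```c
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: copy src[0], src[stride], src[2*stride], src[3*stride]
 * into out[0..3]. On NEON targets each value goes into its own lane via
 * vld1q_lane_f32; other targets use a plain scalar loop as a fallback. */
void gather4_strided(const float *src, size_t stride, float out[4])
{
#if defined(__ARM_NEON)
    float32x4_t v = vdupq_n_f32(0.0f);          /* start from an all-zero vector */
    v = vld1q_lane_f32(src,              v, 0); /* lane index is a constant */
    v = vld1q_lane_f32(src + stride,     v, 1);
    v = vld1q_lane_f32(src + 2 * stride, v, 2);
    v = vld1q_lane_f32(src + 3 * stride, v, 3);
    vst1q_f32(out, v);                          /* write the 4 lanes back */
#else
    for (size_t i = 0; i < 4; ++i)
        out[i] = src[i * stride];
#endif
}
```

For a 4x3 row-major matrix `m`, `gather4_strided(m + 1, 3, col)` collects column 1 into `col`.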
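Read a single lane of a vector:
// Get the first element of the NEON vector data_vector (lane 0 is the first lane);
// use vgetq_lane_f32 for a float32x4_t (vgetq_lane_u32 is the uint32x4_t version)
float a = vgetq_lane_f32(data_vector, 0);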
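As a small illustration of lane reads, here is a hypothetical `sum4` helper that adds up the four lanes of a vector (a horizontal sum). Like the other sketches here, it assumes a scalar fallback when `__ARM_NEON` is not defined.

```c
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: sum the four floats at p. On NEON targets the data is
 * loaded into one vector and each lane is read back with vgetq_lane_f32;
 * other targets use a scalar fallback. */
float sum4(const float *p)
{
#if defined(__ARM_NEON)
    float32x4_t v = vld1q_f32(p);
    return vgetq_lane_f32(v, 0) + vgetq_lane_f32(v, 1)
         + vgetq_lane_f32(v, 2) + vgetq_lane_f32(v, 3);
#else
    return p[0] + p[1] + p[2] + p[3];
#endif
}
```

On AArch64, `vaddvq_f32` performs the same horizontal sum in a single intrinsic; reading the lanes one by one also works on 32-bit Arm.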
Broadcast a single value to all four lanes:
// Create a NEON vector whose four lanes are all 0
float32x4_t a = vdupq_n_f32(0.0f);
Add two vectors element-wise:
// Add the corresponding elements of the NEON vectors v_0 and v_1
float32x4_t added_result = vaddq_f32(v_0, v_1);
Take the square root of each element:
// vsqrtq_f32 is only available on AArch64
float32x4_t b = vsqrtq_f32(a);
Multiply corresponding elements:
// Multiply the corresponding elements of the NEON vectors v_0 and v_1
float32x4_t product = vmulq_f32(v_0, v_1);
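Combining broadcast, multiply, add and lane reads gives a dot product. The following is a minimal sketch with a hypothetical `dot_f32` function, assuming float arrays of any length: it keeps four partial sums in one vector, then adds the lanes together and finishes leftover elements with scalar code (which is also the whole computation on non-NEON builds).

```c
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dot product of a and b over n elements. The NEON path
 * accumulates four partial sums, so its rounding order can differ slightly
 * from a purely scalar loop for non-integer data. */
float dot_f32(const float *a, const float *b, size_t n)
{
    size_t i = 0;
    float sum = 0.0f;
#if defined(__ARM_NEON)
    float32x4_t acc = vdupq_n_f32(0.0f);            /* four partial sums, all 0 */
    for (; i + 4 <= n; i += 4) {
        float32x4_t prod = vmulq_f32(vld1q_f32(a + i), vld1q_f32(b + i));
        acc = vaddq_f32(acc, prod);                 /* element-wise accumulate */
    }
    sum = vgetq_lane_f32(acc, 0) + vgetq_lane_f32(acc, 1)
        + vgetq_lane_f32(acc, 2) + vgetq_lane_f32(acc, 3);
#endif
    for (; i < n; ++i)                              /* leftover elements / fallback */
        sum += a[i] * b[i];
    return sum;
}
```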
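Write data back to memory:
// To write a NEON vector back to memory, call a store function (vst1q_*): the four floats
// starting at ptr (ptr[0] to ptr[3]) are overwritten with the lanes of neon_v
vst1q_f32(ptr, neon_v);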
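Putting load, add and store together, here is a minimal sketch of an element-wise array add. The name `add_f32` is hypothetical, and the array length is assumed not to be a multiple of 4 in general: the loop handles four elements per iteration with NEON, and the scalar loop finishes any leftovers (it also serves as the fallback on non-NEON builds).

```c
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n). */
void add_f32(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);      /* load 4 floats from a */
        float32x4_t vb = vld1q_f32(b + i);      /* load 4 floats from b */
        vst1q_f32(out + i, vaddq_f32(va, vb));  /* add and store 4 results */
    }
#endif
    for (; i < n; ++i)                          /* scalar tail / fallback */
        out[i] = a[i] + b[i];
}
```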
Here is a question to think about. Suppose the NEON vector A contains some NaN values: how do you find which lanes hold NaN? My first idea was to create a NEON vector filled with NaN and compare A against it. That idea is wrong, because NaN compares unequal to every value, including another NaN.
So I found a simple way: compare A with itself (vceqq_f32(A, A)). Lanes that compare equal are not NaN, and lanes that compare unequal are NaN.
vceqq_f32 sets a lane to all 1s when the comparison is true, so this result has all 1s in the non-NaN lanes. To get a mask whose NaN lanes are all 1s, invert it with vmvnq_u32.
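The self-comparison trick can be sketched as a hypothetical helper, `nan_lanes4`, that returns a 4-bit mask with bit i set when lane i is NaN. The bit-packing step is an illustrative choice, and the scalar `#else` branch (using the same `x != x` test) is an assumption so the sketch compiles on non-Arm hosts; it also assumes the code is not built with fast-math options that let the compiler treat NaN comparisons as always equal.

```c
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: bit i of the result is 1 when in[i] is NaN. */
unsigned nan_lanes4(const float *in)
{
#if defined(__ARM_NEON)
    float32x4_t v = vld1q_f32(in);
    uint32x4_t not_nan = vceqq_f32(v, v);    /* all 1s where v == v, i.e. not NaN */
    uint32x4_t is_nan  = vmvnq_u32(not_nan); /* invert: all 1s in the NaN lanes */
    return (vgetq_lane_u32(is_nan, 0) & 1u)
         | ((vgetq_lane_u32(is_nan, 1) & 1u) << 1)
         | ((vgetq_lane_u32(is_nan, 2) & 1u) << 2)
         | ((vgetq_lane_u32(is_nan, 3) & 1u) << 3);
#else
    unsigned bits = 0;
    for (int i = 0; i < 4; ++i)
        if (in[i] != in[i])                  /* same self-comparison trick */
            bits |= 1u << i;
    return bits;
#endif
}
```

For input `{1, NaN, 3, NaN}` the result is `0xA` (bits 1 and 3 set).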
That concludes this introduction to NEON programming. There are many more NEON instructions; see Arm's intrinsics documentation for the full list.