When compiling a NEON program on an aarch64 ARM processor with the 32-bit arm-linux-gnueabihf GCC toolchain, the following errors occur:
/usr/lib/gcc/arm-linux-gnueabihf/6/include/arm_neon.h:5792:1: error: inlining failed in call to always_inline ‘vdupq_n_s32’: target specific option mismatch
vdupq_n_s32 (int32_t __a)
^~~~~~~~~~~
convolution_neon_int32.c:94:13: note: called from here
int32x4_t sum_t = vdupq_n_s32(0);
^~~~~
In file included from convolution_neon_int32.c:3:0:
/usr/lib/gcc/arm-linux-gnueabihf/6/include/arm_neon.h:583:1: error: inlining failed in call to always_inline ‘vaddq_s32’: target specific option mismatch
vaddq_s32 (int32x4_t __a, int32x4_t __b)
^~~~~~~~~
convolution_neon_int32.c:103:10: note: called from here
sum_t = vaddq_s32(sum_t, multiply_t);
~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from convolution_neon_int32.c:3:0:
/usr/lib/gcc/arm-linux-gnueabihf/6/include/arm_neon.h:1072:1: error: inlining failed in call to always_inline ‘vmulq_s32’: target specific option mismatch
vmulq_s32 (int32x4_t __a, int32x4_t __b)
^~~~~~~~~
convolution_neon_int32.c:102:15: note: called from here
multiply_t = vmulq_s32(matb_t, veca_t);
~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from convolution_neon_int32.c:3:0:
/usr/lib/gcc/arm-linux-gnueabihf/6/include/arm_neon.h:8997:1: error: inlining failed in call to always_inline ‘vld1q_s32’: target specific option mismatch
vld1q_s32 (const int32_t * __a)
^~~~~~~~~
convolution_neon_int32.c:101:11: note: called from here
veca_t = vld1q_s32(ptr_veca);
~~~~~~~^~~~~~~~~~~~~~~~~~~~~
In file included from convolution_neon_int32.c:3:0:
/usr/lib/gcc/arm-linux-gnueabihf/6/include/arm_neon.h:8997:1: error: inlining failed in call to always_inline ‘vld1q_s32’: target specific option mismatch
vld1q_s32 (const int32_t * __a)
^~~~~~~~~
convolution_neon_int32.c:99:11: note: called from here
matb_t = vld1q_s32(ptr_matB);
~~~~~~~^~~~~~~~~~~~~~~~~~~~~
In file included from convolution_neon_int32.c:3:0:
/usr/lib/gcc/arm-linux-gnueabihf/6/include/arm_neon.h:5299:1: error: inlining failed in call to always_inline ‘vget_lane_s32’: target specific option mismatch
vget_lane_s32 (int32x2_t __a, const int __b)
^~~~~~~~~~~~~
convolution_neon_int32.c:117:10: note: called from here
sum += vget_lane_s32(half_t, 1);
^~~~~~~~~~~~~~~~~~~~~~~~
In file included from convolution_neon_int32.c:3:0:
/usr/lib/gcc/arm-linux-gnueabihf/6/include/arm_neon.h:5299:1: error: inlining failed in call to always_inline ‘vget_lane_s32’: target specific option mismatch
vget_lane_s32 (int32x2_t __a, const int __b)
^~~~~~~~~~~~~
convolution_neon_int32.c:116:10: note: called from here
sum += vget_lane_s32(half_t, 0);
^~~~~~~~~~~~~~~~~~~~~~~~
In file included from convolution_neon_int32.c:3:0:
/usr/lib/gcc/arm-linux-gnueabihf/6/include/arm_neon.h:3061:1: error: inlining failed in call to always_inline ‘vpadd_s32’: target specific option mismatch
vpadd_s32 (int32x2_t __a, int32x2_t __b)
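"target specific option mismatch" means that the intrinsics in arm_neon.h are declared always_inline for a NEON-enabled target, but the file that calls them is being compiled without NEON enabled. The 32-bit arm-linux-gnueabihf toolchain typically does not turn NEON on by default. The solution is to add the following compile options: -march=armv8-a -marm -mfpu=neon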
arm-linux-gnueabihf-gcc -march=armv8-a -marm -mfpu=neon convolution_neon_int32.c -o convolution_neon_int32