SIMD (Single Instruction, Multiple Data), as the name suggests, means that a single instruction processes multiple data elements. SIMD instructions are similar in nature to a vector processor: the same operation is applied to a whole set of data (a "data vector") at the same time, which achieves spatial parallelism. SIMD is the key mechanism by which a CPU realizes DLP (Data-Level Parallelism). SSE, the older MMX, and AMD's 3DNow! are all SIMD instruction sets. SSE, for example, can substantially speed up floating-point code by operating on several floating-point values in parallel with one instruction.
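To make "one instruction, multiple data" concrete, here is a minimal sketch that adds two float arrays four elements at a time with SSE intrinsics. The function name add_arrays and the macro EXAMPLE_USE_SSE are hypothetical names chosen for this illustration, and the code assumes both inputs have the same length. On targets without SSE it simply falls back to the scalar loop.

```cpp
#include <cstddef>
#include <vector>

// EXAMPLE_USE_SSE is a hypothetical macro for this sketch: it is set only
// when the compiler reports SSE support, otherwise the scalar loop runs.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define EXAMPLE_USE_SSE 1
#endif

// Element-wise sum of two arrays (assumes a.size() == b.size()).
std::vector<float> add_arrays(const std::vector<float>& a,
                              const std::vector<float>& b) {
    std::vector<float> out(a.size());
    std::size_t i = 0;
#ifdef EXAMPLE_USE_SSE
    // One _mm_add_ps adds four packed floats with a single instruction.
    for (; i + 4 <= a.size(); i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]);  // unaligned load of 4 floats
        __m128 vb = _mm_loadu_ps(&b[i]);
        _mm_storeu_ps(&out[i], _mm_add_ps(va, vb));
    }
#endif
    // Scalar tail (or the whole array when SSE is not available).
    for (; i < a.size(); ++i) {
        out[i] = a[i] + b[i];
    }
    return out;
}
```

Most of the libraries listed below wrap this kind of intrinsic call behind portable C++ types, so that the same source can target SSE, AVX, NEON and so on.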
1. More popular
C++ image processing and machine learning library using SIMD: SSE, AVX, AVX-512 for x86/x64, VMX(Altivec) and VSX(Power7) for PowerPC, NEON for ARM.
https://github.com/ermig1979/Simd
http://ermig1979.github.io/Simd/
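To use Simd together with OpenCV, define SIMD_OPENCV_ENABLE before including the Simd header:
#include <opencv2/core/core.hpp>
#define SIMD_OPENCV_ENABLE  // must be defined before the Simd include
#include "Simd/SimdLib.hpp"
This enables conversions between the following types:
cv::Point, cv::Size <--> Simd::Point
cv::Rect <--> Simd::Rectangle
cv::Mat <--> Simd::View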
std::experimental::simd for GCC [ISO/IEC TS 19570:2018]
https://github.com/VcDevel/std-simd
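As a rough illustration of what std::experimental::simd code looks like, the sketch below computes a * x + y over two arrays. It assumes a GCC/libstdc++ version that ships <experimental/simd> (GCC 11 or later, compiled with -std=c++17). The function name saxpy is just a name chosen for this example, and both inputs are assumed to have the same length.

```cpp
#include <cstddef>
#include <experimental/simd>
#include <vector>

namespace stdx = std::experimental;

// Computes out[i] = a * x[i] + y[i] (assumes x.size() == y.size()).
std::vector<float> saxpy(float a,
                         const std::vector<float>& x,
                         const std::vector<float>& y) {
    // native_simd<float> picks the widest vector the target supports.
    using V = stdx::native_simd<float>;
    std::vector<float> out(x.size());
    std::size_t i = 0;
    for (; i + V::size() <= x.size(); i += V::size()) {
        V vx(&x[i], stdx::element_aligned);  // load V::size() floats
        V vy(&y[i], stdx::element_aligned);
        V r = V(a) * vx + vy;                // broadcast a, then multiply-add
        r.copy_to(&out[i], stdx::element_aligned);
    }
    // Scalar tail for the elements that do not fill a whole vector.
    for (; i < x.size(); ++i) {
        out[i] = a * x[i] + y[i];
    }
    return out;
}
```

The vector width is not written anywhere in the source: V::size() is decided at compile time from the target flags, which is the main attraction of this style over raw intrinsics.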
SIMD Vector Classes for C++
https://github.com/VcDevel/Vc
https://web-docs.gsi.de/~mkretz/Vc-master/
MIPP is a portable wrapper for SIMD instructions written in C++11. It supports NEON, SSE, AVX and AVX-512.
https://github.com/aff3ct/MIPP
C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, NEON, AVX512)
https://github.com/xtensor-stack/xsimd
https://xsimd.readthedocs.io/en/latest/index.html
pillow
https://github.com/python-pillow/Pillow
pillow-simd
https://github.com/uploadcare/pillow-simd
2. Less popular
Portable header-only C++ low level SIMD library
https://github.com/p12tic/libsimdpp
Agenium Scale vectorization library for CPUs and GPUs
https://github.com/agenium-scale/nsimd
UME::SIMD: a library for explicit SIMD vectorization.
https://github.com/edanor/umesimd
3. Wrappers and re-wrapped libraries
The Mandelbrot fractal: sequential and SIMD implementations.
https://gitlab.inria.fr/acassagn/mandelbrot
VecCore is a simple abstraction layer on top of other vectorization libraries.
https://gitlab.cern.ch/VecGeom/VecCore
4. Other applications
Implementations of SIMD instruction sets for systems which don't natively support them.
https://github.com/simd-everywhere/simde
SIMD Library for Evaluating Elementary Functions, vectorized libm and DFT
https://github.com/shibatch/sleef
C++ SIMD Noise Library
https://github.com/Auburn/FastNoiseSIMD
5. SIMD tutorial
---
Intel CPU instruction sets (Intel Intrinsics Guide)
https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html
"Modern Parallel Programming with C++ and Assembly Language: X86 SIMD Development Using AVX, AVX2, and AVX-512" (1st ed.)
"Parallel programming method and optimization practice"
"Parallel Algorithm Design and Performance Optimization"