Chapter 8.4: Vectorizing the reduction with intrinsic functions
- 8.4.1) The reduction_intrinsics.h file
- 8.4.2) The reduction_intrinsics.cpp file
- 8.4.3) The main_intrinsics.cpp file
- 8.4.4) The CMakeLists.txt file
- 8.4.5) The compilation
- 8.4.6) The performances
In this section, we will use:
- Inclusion of the immintrin.h header file
- Intrinsic function : _mm256_load_ps
- Intrinsic function : _mm256_store_ps
- Intrinsic function : _mm256_storeu_ps (variant of _mm256_store_ps that stores values to an unaligned address)
- Intrinsic function : _mm256_add_ps
- Intrinsic function : _mm256_broadcast_ss (to duplicate a float 8 times in a vector register)
- Enable specific optimisations with -O3 -march=native -mtune=native -mavx2
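To give an idea of how the intrinsics listed above fit together, here is a minimal sketch of a reduction (sum) over a table of floats. It is not the code of the following sections: the function name reduction_sketch and its parameter names are hypothetical and chosen for illustration only. The sketch assumes the input table is aligned on 32 bytes, which _mm256_load_ps requires, and it falls back to a plain scalar loop when the compiler is not targeting AVX.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#ifdef __AVX__
#include <immintrin.h>
#endif

// Hypothetical helper (not from the tutorial files): sums nbElement floats.
// Assumption: ptabValue is aligned on 32 bytes when the AVX path is compiled in.
float reduction_sketch(const float* ptabValue, std::size_t nbElement) {
    float res = 0.0f;
    std::size_t i = 0;
#ifdef __AVX__
    // _mm256_load_ps needs a 32-byte aligned address
    assert(reinterpret_cast<std::uintptr_t>(ptabValue) % 32 == 0);

    const std::size_t vecSize = 8;                  // 8 floats per 256-bit register
    const std::size_t nbVec = nbElement / vecSize;  // number of full registers

    // Vector accumulator: 0.0f duplicated 8 times
    const float zero = 0.0f;
    __m256 vecRes = _mm256_broadcast_ss(&zero);

    for (std::size_t v = 0; v < nbVec; ++v) {
        // Load 8 consecutive floats and add them lane by lane to the accumulator
        __m256 vecValue = _mm256_load_ps(ptabValue + v * vecSize);
        vecRes = _mm256_add_ps(vecRes, vecValue);
    }

    // Copy the 8 partial sums to a small table (unaligned store, so no
    // alignment constraint on tmp) and finish the reduction in scalar code
    float tmp[8];
    _mm256_storeu_ps(tmp, vecRes);
    for (std::size_t k = 0; k < vecSize; ++k) {
        res += tmp[k];
    }
    i = nbVec * vecSize;  // first element not covered by a full register
#endif
    // Remaining elements (all of them if AVX is not enabled)
    for (; i < nbElement; ++i) {
        res += ptabValue[i];
    }
    return res;
}
```

A caller could declare the table with alignas(32) (or use an aligned allocator) and then call reduction_sketch(tab, nbElement). Because the vector version adds the elements in a different order than a scalar loop, the floating-point result can differ slightly from the scalar one on general data.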