Chapter 8.4: Vectorizing a reduction with intrinsic functions



In this section, we will use:
  • The header file immintrin.h
  • Intrinsic function: _mm256_load_ps (to load 8 floats from 32-byte aligned memory into a vector register)
  • Intrinsic function: _mm256_store_ps (to store the 8 floats of a vector register to 32-byte aligned memory)
  • Intrinsic function: _mm256_storeu_ps (variant of _mm256_store_ps that stores the values to an unaligned address)
  • Intrinsic function: _mm256_add_ps (to add two vector registers element by element)
  • Intrinsic function: _mm256_broadcast_ss (to duplicate a float 8 times in a vector register)
  • The compiler options -O3 -march=native -mtune=native -mavx2 to enable the specific optimisations
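
To give an idea of how these pieces fit together, here is a minimal sketch of a sum reduction written with the intrinsics listed above. It is only an illustration under stated assumptions, not necessarily the implementation developed in this chapter: the function name reduction_sum and the macro SKETCH_AVX are hypothetical, the input pointer is assumed to be 32-byte aligned, and the target attribute is a GCC/Clang extension.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: GCC or Clang. The attribute enables AVX for this one function,
   so the sketch also builds if -mavx2 / -march=native is left off. */
#if defined(__GNUC__)
#define SKETCH_AVX __attribute__((target("avx")))
#else
#define SKETCH_AVX
#endif

/* Hypothetical helper: returns the sum of the n floats in data.
   Assumption: data is aligned on 32 bytes, as _mm256_load_ps requires. */
SKETCH_AVX
float reduction_sum(const float *data, size_t n)
{
    assert((uintptr_t)data % 32 == 0);

    /* Accumulator made of 8 copies of 0.0f. */
    const float zero = 0.0f;
    __m256 acc = _mm256_broadcast_ss(&zero);

    /* Vectorized part: add 8 floats per iteration. */
    const size_t nvec = n - (n % 8);
    for (size_t i = 0; i < nvec; i += 8) {
        __m256 v = _mm256_load_ps(data + i);
        acc = _mm256_add_ps(acc, v);
    }

    /* Write the 8 partial sums to a plain array (no alignment guarantee,
       hence the unaligned store) and finish with a scalar sum. */
    float partial[8];
    _mm256_storeu_ps(partial, acc);
    float sum = 0.0f;
    for (int k = 0; k < 8; ++k) {
        sum += partial[k];
    }

    /* Scalar tail for the last n % 8 elements. */
    for (size_t i = nvec; i < n; ++i) {
        sum += data[i];
    }
    return sum;
}
```

The accumulator holds 8 independent partial sums, so the values are not added in the same order as in a plain scalar loop. With floating-point numbers the result can therefore differ slightly from the scalar version.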