Part 10: Optimisation of dense matrix-matrix multiplication
- 10.1) What is a SGEMM?
- 10.2) The classical approach
- 10.3) Let's swap the loops over j and k
- 10.4) Vectorization
- 10.5) Intrinsics implementation
- 10.6) Intrinsics implementation with a pitch
- 10.7) How to create a sgemm Python module
You can find the associated presentation here.
For this example, let's create a directory 6-Sgemm inside the ExampleOptimisation directory:
$ mkdir 6-Sgemm
Do not forget to add the new subdirectory at the end of the ExampleOptimisation/CMakeLists.txt file:
cmake_minimum_required(VERSION 3.0)
project(HPC_ASTERICS)

# Python interpreter used by the examples (cached so it can be overridden)
set(PYTHON_EXECUTABLE "python3" CACHE STRING "Python program")

add_subdirectory(Performances)
include(runExample.cmake)
include(pythonCheck.cmake)

# Alignment used for vectorized data, passed to the compiler as a macro
set(VECTOR_ALIGNEMENT 32)
add_definitions(-DVECTOR_ALIGNEMENT=${VECTOR_ALIGNEMENT})

add_subdirectory(0-CMakeHelloWorld)

add_subdirectory(AstericsHPC)
include_directories(${CMAKE_CURRENT_SOURCE_DIR}/AstericsHPC)

add_subdirectory(1-HadamardProduct)
add_subdirectory(3-Saxpy)
add_subdirectory(4-Reduction)
add_subdirectory(5-Barycentre)
# New example added in this part
add_subdirectory(6-Sgemm)
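One practical detail: CMake stops at configure time if a directory passed to add_subdirectory does not contain a CMakeLists.txt. The sketch below is an assumption, not part of the original tutorial. It creates an empty placeholder so the project still configures until the real 6-Sgemm/CMakeLists.txt is written. It is meant to be run from the ExampleOptimisation directory.

```shell
# Run from the ExampleOptimisation directory (assumed current directory).
# Create the example directory; -p makes the command safe to repeat.
mkdir -p 6-Sgemm

# Hypothetical placeholder: an empty CMakeLists.txt is enough for
# add_subdirectory(6-Sgemm) to succeed; its real content comes later.
touch 6-Sgemm/CMakeLists.txt
```

The placeholder is only there to keep the build configurable; it should be replaced as soon as the targets of the example are defined.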