How to optimize computation with HPC
Pierre Aubert
- 1) Introduction to High Performance Computing
- 2) Basic use of CMake
- 3) Starting the project
- 4) Several useful CMake functions
- 5) Creation of a HPC/Timer library
- 6) Optimisation of Hadamard product
- 7) Optimisation of saxpy
- 8) Optimisation of a reduction
- 9) Application/exercise: Optimisation of a barycentre computation
- 10) Optimisation of Dense Matrix-Matrix multiplication
- 11) What about branching? (bonus)
Prerequisites
Compiler
We are going to use the GCC-7 compiler. If you are stuck with another compiler or with an older version of GCC, you can install it with Anaconda (in your tutorial environment):
conda install gcc7
Follow with Docker
This lecture can be followed with a Docker image:
docker pull gitlab-registry.in2p3.fr/cta-lapp/cours/hpc_asterics/hpc-asterics:latest
docker run -it -p 8888:8888 gitlab-registry.in2p3.fr/cta-lapp/cours/hpc_asterics/hpc-asterics:latest
Compilation tools
We will use CMake and Make to compile our programs.
Plotting tools
We will use Gnuplot to plot the results of our tests.
Architecture tools
We will use the program hwloc-ls to show the CPU architecture.
Optimisations
We are going to focus on the optimisation of single-precision floating-point computations. As a rule of thumb, you can divide all the speed-ups that follow by 2 to get the equivalent for double precision.
Double-precision computation sounds more precise than single-precision computation.
But if the computation itself is nonsense, doing it in double precision fails just the same.
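One plausible way to see where the factor of 2 comes from, and why double precision does not rescue an ill-posed computation, is the minimal C++ sketch below. It is an illustration rather than part of the tutorial's code. The function names (lanesPerRegister, addThenSubtract, printPrecisionSummary) are hypothetical, and the 256-bit register width is an assumption that matches AVX/AVX2 CPUs (it would be 128 for SSE and 512 for AVX-512). The sketch also assumes 8-bit bytes and IEEE 754 float and double, which holds on common x86-64 platforms.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>

// Number of values of type T that fit in one SIMD register.
// registerBits is an assumption about the target CPU:
// 128 for SSE, 256 for AVX/AVX2, 512 for AVX-512.
template <typename T>
std::size_t lanesPerRegister(std::size_t registerBits) {
    const std::size_t elementBits = 8 * sizeof(T);  // assumes 8-bit bytes
    assert(registerBits % elementBits == 0);
    return registerBits / elementBits;
}

// Computes (big + small) - big. The exact answer is `small`, but the
// addition rounds it away when `big` is too large for the mantissa of T.
template <typename T>
T addThenSubtract(T big, T small) {
    // volatile forces the sum to be rounded and stored in type T
    volatile T sum = big + small;
    return sum - big;
}

// Hypothetical helper: prints both effects for float and double.
void printPrecisionSummary() {
    std::cout << "floats per 256-bit register:  "
              << lanesPerRegister<float>(256) << "\n";
    std::cout << "doubles per 256-bit register: "
              << lanesPerRegister<double>(256) << "\n";
    std::cout << "(1e20 + 1) - 1e20 in float:  "
              << addThenSubtract<float>(1e20f, 1.0f) << "\n";
    std::cout << "(1e20 + 1) - 1e20 in double: "
              << addThenSubtract<double>(1e20, 1.0) << "\n";
}
```

Calling printPrecisionSummary() from a main function should report twice as many floats as doubles per register, which is consistent with the rule of thumb above for vectorised code. It should also show that (1e20 + 1) - 1e20 loses the 1 in both precisions, because 1e20 is beyond what either mantissa can resolve to the unit.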
The solution to the whole tutorial is available here. If you want to start with the basic library, you can download the minimal example here.