10.5.6 : Performance
Figure 35 : Left panel : total average time for the sgemm function. Right panel : average time to compute a single element.
The drawback of this method is that the number of columns of the matrices must be a multiple of the number of floats that fit in a vector register.
You can try to find a solution yourself so that the intrinsics version works for any matrix size.
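One possible starting point, given here only as a sketch and not as the intended solution, is to pad each row with zeros up to the next multiple of the register width. The names paddedNbCol, padMatrix and kVecFloats are hypothetical, and the width of 8 floats assumes a 256-bit AVX register (use 4 for SSE). The matrices are assumed to be stored row-major in a std::vector<float>.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: 8 floats per vector register (256-bit AVX). Use 4 for SSE.
constexpr std::size_t kVecFloats = 8;

// Round nbCol up to the next multiple of vecSize.
std::size_t paddedNbCol(std::size_t nbCol, std::size_t vecSize = kVecFloats) {
    return ((nbCol + vecSize - 1) / vecSize) * vecSize;
}

// Copy a row-major nbRow x nbCol matrix into a zero-filled buffer whose
// row length is a multiple of vecSize. The extra columns stay at zero.
std::vector<float> padMatrix(const std::vector<float>& mat,
                             std::size_t nbRow, std::size_t nbCol,
                             std::size_t vecSize = kVecFloats) {
    assert(mat.size() == nbRow * nbCol);
    const std::size_t pCol = paddedNbCol(nbCol, vecSize);
    std::vector<float> padded(nbRow * pCol, 0.0f);
    for (std::size_t i = 0; i < nbRow; ++i) {
        for (std::size_t j = 0; j < nbCol; ++j) {
            padded[i * pCol + j] = mat[i * nbCol + j];
        }
    }
    return padded;
}
```

The vectorized loop can then run over the padded row length, and the extra zero columns of the result are simply ignored when copying back. Note that std::vector does not guarantee the alignment required by aligned load instructions, so this sketch would need either unaligned loads or an aligned allocator. Another option is to keep the original layout and finish each row with a scalar loop over the remaining columns.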