3.4.4.2: Performance
make plot_all
Performance details
make plot_all
[ 7%] Built target perf_hadamard_gpupar_O3
[ 15%] Built target perf_hadamard_gpupar_O0
[ 23%] Built target perf_hadamard_gpupar_O4
[ 31%] Built target perf_hadamard_gpupar_O1
[ 39%] Built target perf_hadamard_gpupar_O2
Scanning dependencies of target plot_hadamard_gpuparBase
[ 42%] Run perf_hadamard_gpupar_O4 program
micro_benchmarkAutoNs : nbCallPerTest = 422
evaluate hadamard : nbElement = 1000, timePerElement = 30.8812 ns/el ± 0.23229, elapsedTime = 30881.2 ns ± 232.29
micro_benchmarkAutoNs : nbCallPerTest = 420
evaluate hadamard : nbElement = 2000, timePerElement = 15.5982 ns/el ± 0.1222, elapsedTime = 31196.4 ns ± 244.401
micro_benchmarkAutoNs : nbCallPerTest = 406
evaluate hadamard : nbElement = 3000, timePerElement = 10.6674 ns/el ± 0.0697921, elapsedTime = 32002.3 ns ± 209.376
micro_benchmarkAutoNs : nbCallPerTest = 411
evaluate hadamard : nbElement = 4000, timePerElement = 8.00122 ns/el ± 0.0592695, elapsedTime = 32004.9 ns ± 237.078
micro_benchmarkAutoNs : nbCallPerTest = 405
evaluate hadamard : nbElement = 5000, timePerElement = 6.48158 ns/el ± 0.0412615, elapsedTime = 32407.9 ns ± 206.307
micro_benchmarkAutoNs : nbCallPerTest = 391
evaluate hadamard : nbElement = 10000, timePerElement = 3.37552 ns/el ± 0.0244, elapsedTime = 33755.2 ns ± 244
micro_benchmarkAutoNs : nbCallPerTest = 289
evaluate hadamard : nbElement = 50000, timePerElement = 0.925408 ns/el ± 0.0181593, elapsedTime = 46270.4 ns ± 907.967
micro_benchmarkAutoNs : nbCallPerTest = 213
evaluate hadamard : nbElement = 100000, timePerElement = 0.667545 ns/el ± 0.0288547, elapsedTime = 66754.5 ns ± 2885.47
micro_benchmarkAutoNs : nbCallPerTest = 130
evaluate hadamard : nbElement = 200000, timePerElement = 0.510854 ns/el ± 0.0250128, elapsedTime = 102171 ns ± 5002.56
micro_benchmarkAutoNs : nbCallPerTest = 71
evaluate hadamard : nbElement = 500000, timePerElement = 0.358167 ns/el ± 0.00341734, elapsedTime = 179084 ns ± 1708.67
micro_benchmarkAutoNs : nbCallPerTest = 28
evaluate hadamard : nbElement = 1000000, timePerElement = 0.456917 ns/el ± 0.00416396, elapsedTime = 456917 ns ± 4163.96
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 5000000, timePerElement = 0.749144 ns/el ± 0.00796663, elapsedTime = 3.74572e+06 ns ± 39833.2
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 10000000, timePerElement = 0.786622 ns/el ± 0.00371752, elapsedTime = 7.86622e+06 ns ± 37175.2
[ 44%] Run perf_hadamard_gpupar_O0 program
micro_benchmarkAutoNs : nbCallPerTest = 97
evaluate hadamard : nbElement = 1000, timePerElement = 133.665 ns/el ± 1.47425, elapsedTime = 133665 ns ± 1474.25
micro_benchmarkAutoNs : nbCallPerTest = 85
evaluate hadamard : nbElement = 2000, timePerElement = 75.9769 ns/el ± 0.754512, elapsedTime = 151954 ns ± 1509.02
micro_benchmarkAutoNs : nbCallPerTest = 79
evaluate hadamard : nbElement = 3000, timePerElement = 54.496 ns/el ± 0.700398, elapsedTime = 163488 ns ± 2101.19
micro_benchmarkAutoNs : nbCallPerTest = 75
evaluate hadamard : nbElement = 4000, timePerElement = 43.3872 ns/el ± 0.504569, elapsedTime = 173549 ns ± 2018.28
micro_benchmarkAutoNs : nbCallPerTest = 71
evaluate hadamard : nbElement = 5000, timePerElement = 36.8775 ns/el ± 0.549347, elapsedTime = 184387 ns ± 2746.74
micro_benchmarkAutoNs : nbCallPerTest = 56
evaluate hadamard : nbElement = 10000, timePerElement = 23.0748 ns/el ± 0.283516, elapsedTime = 230748 ns ± 2835.16
micro_benchmarkAutoNs : nbCallPerTest = 25
evaluate hadamard : nbElement = 50000, timePerElement = 10.4635 ns/el ± 0.128517, elapsedTime = 523175 ns ± 6425.87
micro_benchmarkAutoNs : nbCallPerTest = 14
evaluate hadamard : nbElement = 100000, timePerElement = 8.73013 ns/el ± 0.147506, elapsedTime = 873013 ns ± 14750.6
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 200000, timePerElement = 7.71121 ns/el ± 0.0707096, elapsedTime = 1.54224e+06 ns ± 14141.9
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 500000, timePerElement = 7.14712 ns/el ± 0.104409, elapsedTime = 3.57356e+06 ns ± 52204.6
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 1000000, timePerElement = 7.00085 ns/el ± 0.087528, elapsedTime = 7.00085e+06 ns ± 87528
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 5000000, timePerElement = 6.78909 ns/el ± 0.0155163, elapsedTime = 3.39454e+07 ns ± 77581.3
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 10000000, timePerElement = 6.73782 ns/el ± 0.00961987, elapsedTime = 6.73782e+07 ns ± 96198.7
[ 47%] Run perf_hadamard_gpupar_O1 program
micro_benchmarkAutoNs : nbCallPerTest = 442
evaluate hadamard : nbElement = 1000, timePerElement = 29.3987 ns/el ± 0.34706, elapsedTime = 29398.7 ns ± 347.06
micro_benchmarkAutoNs : nbCallPerTest = 427
evaluate hadamard : nbElement = 2000, timePerElement = 15.3083 ns/el ± 0.142779, elapsedTime = 30616.6 ns ± 285.558
micro_benchmarkAutoNs : nbCallPerTest = 417
evaluate hadamard : nbElement = 3000, timePerElement = 10.578 ns/el ± 0.132579, elapsedTime = 31734 ns ± 397.737
micro_benchmarkAutoNs : nbCallPerTest = 405
evaluate hadamard : nbElement = 4000, timePerElement = 8.21038 ns/el ± 0.0820182, elapsedTime = 32841.5 ns ± 328.073
micro_benchmarkAutoNs : nbCallPerTest = 385
evaluate hadamard : nbElement = 5000, timePerElement = 6.74212 ns/el ± 0.072288, elapsedTime = 33710.6 ns ± 361.44
micro_benchmarkAutoNs : nbCallPerTest = 338
evaluate hadamard : nbElement = 10000, timePerElement = 3.83246 ns/el ± 0.032873, elapsedTime = 38324.6 ns ± 328.73
micro_benchmarkAutoNs : nbCallPerTest = 175
evaluate hadamard : nbElement = 50000, timePerElement = 1.47353 ns/el ± 0.0193611, elapsedTime = 73676.5 ns ± 968.053
micro_benchmarkAutoNs : nbCallPerTest = 117
evaluate hadamard : nbElement = 100000, timePerElement = 1.09855 ns/el ± 0.0132944, elapsedTime = 109855 ns ± 1329.44
micro_benchmarkAutoNs : nbCallPerTest = 73
evaluate hadamard : nbElement = 200000, timePerElement = 0.891388 ns/el ± 0.00807633, elapsedTime = 178278 ns ± 1615.27
micro_benchmarkAutoNs : nbCallPerTest = 34
evaluate hadamard : nbElement = 500000, timePerElement = 0.760071 ns/el ± 0.0071964, elapsedTime = 380035 ns ± 3598.2
micro_benchmarkAutoNs : nbCallPerTest = 17
evaluate hadamard : nbElement = 1000000, timePerElement = 0.737363 ns/el ± 0.00578992, elapsedTime = 737363 ns ± 5789.92
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 5000000, timePerElement = 0.751263 ns/el ± 0.0138082, elapsedTime = 3.75632e+06 ns ± 69040.9
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 10000000, timePerElement = 0.780348 ns/el ± 0.00667971, elapsedTime = 7.80348e+06 ns ± 66797.1
[ 50%] Run perf_hadamard_gpupar_O2 program
micro_benchmarkAutoNs : nbCallPerTest = 432
evaluate hadamard : nbElement = 1000, timePerElement = 29.8853 ns/el ± 0.333225, elapsedTime = 29885.3 ns ± 333.225
micro_benchmarkAutoNs : nbCallPerTest = 424
evaluate hadamard : nbElement = 2000, timePerElement = 15.3606 ns/el ± 0.192006, elapsedTime = 30721.3 ns ± 384.013
micro_benchmarkAutoNs : nbCallPerTest = 410
evaluate hadamard : nbElement = 3000, timePerElement = 10.5133 ns/el ± 0.112758, elapsedTime = 31539.8 ns ± 338.274
micro_benchmarkAutoNs : nbCallPerTest = 411
evaluate hadamard : nbElement = 4000, timePerElement = 8.04945 ns/el ± 0.0672967, elapsedTime = 32197.8 ns ± 269.187
micro_benchmarkAutoNs : nbCallPerTest = 402
evaluate hadamard : nbElement = 5000, timePerElement = 6.59136 ns/el ± 0.0695574, elapsedTime = 32956.8 ns ± 347.787
micro_benchmarkAutoNs : nbCallPerTest = 351
evaluate hadamard : nbElement = 10000, timePerElement = 3.66702 ns/el ± 0.039932, elapsedTime = 36670.2 ns ± 399.32
micro_benchmarkAutoNs : nbCallPerTest = 196
evaluate hadamard : nbElement = 50000, timePerElement = 1.30901 ns/el ± 0.0124705, elapsedTime = 65450.7 ns ± 623.524
micro_benchmarkAutoNs : nbCallPerTest = 138
evaluate hadamard : nbElement = 100000, timePerElement = 0.940901 ns/el ± 0.0136015, elapsedTime = 94090.1 ns ± 1360.15
micro_benchmarkAutoNs : nbCallPerTest = 88
evaluate hadamard : nbElement = 200000, timePerElement = 0.738183 ns/el ± 0.0136821, elapsedTime = 147637 ns ± 2736.41
micro_benchmarkAutoNs : nbCallPerTest = 42
evaluate hadamard : nbElement = 500000, timePerElement = 0.611794 ns/el ± 0.0071958, elapsedTime = 305897 ns ± 3597.9
micro_benchmarkAutoNs : nbCallPerTest = 20
evaluate hadamard : nbElement = 1000000, timePerElement = 0.619036 ns/el ± 0.00413746, elapsedTime = 619036 ns ± 4137.46
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 5000000, timePerElement = 0.752102 ns/el ± 0.0135541, elapsedTime = 3.76051e+06 ns ± 67770.6
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 10000000, timePerElement = 0.785966 ns/el ± 0.00482216, elapsedTime = 7.85966e+06 ns ± 48221.6
[ 52%] Run perf_hadamard_gpupar_O3 program
micro_benchmarkAutoNs : nbCallPerTest = 411
evaluate hadamard : nbElement = 1000, timePerElement = 31.321 ns/el ± 0.345605, elapsedTime = 31321 ns ± 345.605
micro_benchmarkAutoNs : nbCallPerTest = 414
evaluate hadamard : nbElement = 2000, timePerElement = 15.8577 ns/el ± 0.212741, elapsedTime = 31715.4 ns ± 425.481
micro_benchmarkAutoNs : nbCallPerTest = 399
evaluate hadamard : nbElement = 3000, timePerElement = 10.8115 ns/el ± 0.120496, elapsedTime = 32434.5 ns ± 361.489
micro_benchmarkAutoNs : nbCallPerTest = 409
evaluate hadamard : nbElement = 4000, timePerElement = 8.06679 ns/el ± 0.084405, elapsedTime = 32267.2 ns ± 337.62
micro_benchmarkAutoNs : nbCallPerTest = 399
evaluate hadamard : nbElement = 5000, timePerElement = 6.55645 ns/el ± 0.0724905, elapsedTime = 32782.2 ns ± 362.452
micro_benchmarkAutoNs : nbCallPerTest = 385
evaluate hadamard : nbElement = 10000, timePerElement = 3.39989 ns/el ± 0.0365379, elapsedTime = 33998.9 ns ± 365.379
micro_benchmarkAutoNs : nbCallPerTest = 282
evaluate hadamard : nbElement = 50000, timePerElement = 0.93019 ns/el ± 0.0107346, elapsedTime = 46509.5 ns ± 536.73
micro_benchmarkAutoNs : nbCallPerTest = 207
evaluate hadamard : nbElement = 100000, timePerElement = 0.618014 ns/el ± 0.0079121, elapsedTime = 61801.4 ns ± 791.21
micro_benchmarkAutoNs : nbCallPerTest = 142
evaluate hadamard : nbElement = 200000, timePerElement = 0.453412 ns/el ± 0.00666807, elapsedTime = 90682.3 ns ± 1333.61
micro_benchmarkAutoNs : nbCallPerTest = 72
evaluate hadamard : nbElement = 500000, timePerElement = 0.360069 ns/el ± 0.0069826, elapsedTime = 180034 ns ± 3491.3
micro_benchmarkAutoNs : nbCallPerTest = 27
evaluate hadamard : nbElement = 1000000, timePerElement = 0.46339 ns/el ± 0.00800761, elapsedTime = 463390 ns ± 8007.61
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 5000000, timePerElement = 0.755135 ns/el ± 0.0125613, elapsedTime = 3.77567e+06 ns ± 62806.6
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 10000000, timePerElement = 0.794803 ns/el ± 0.0065402, elapsedTime = 7.94803e+06 ns ± 65402
[ 55%] Call gnuplot hadamard_gpuparBase
[ 55%] Built target plot_hadamard_gpuparBase
[ 63%] Built target perf_hadamard_gpupar_vectorize_O3
Scanning dependencies of target run_perf_hadamard_gpupar_vectorize_O3
[ 65%] Run perf_hadamard_gpupar_vectorize_O3 program
micro_benchmarkAutoNs : nbCallPerTest = 405
evaluate hadamard : nbElement = 1000, timePerElement = 31.965 ns/el ± 0.348029, elapsedTime = 31965 ns ± 348.029
micro_benchmarkAutoNs : nbCallPerTest = 395
evaluate hadamard : nbElement = 2000, timePerElement = 16.6194 ns/el ± 0.242264, elapsedTime = 33238.7 ns ± 484.528
micro_benchmarkAutoNs : nbCallPerTest = 389
evaluate hadamard : nbElement = 3000, timePerElement = 11.3259 ns/el ± 0.144268, elapsedTime = 33977.6 ns ± 432.804
micro_benchmarkAutoNs : nbCallPerTest = 384
evaluate hadamard : nbElement = 4000, timePerElement = 8.43383 ns/el ± 0.091065, elapsedTime = 33735.3 ns ± 364.26
micro_benchmarkAutoNs : nbCallPerTest = 381
evaluate hadamard : nbElement = 5000, timePerElement = 6.89169 ns/el ± 0.0789938, elapsedTime = 34458.4 ns ± 394.969
micro_benchmarkAutoNs : nbCallPerTest = 363
evaluate hadamard : nbElement = 10000, timePerElement = 3.64085 ns/el ± 0.0440259, elapsedTime = 36408.5 ns ± 440.259
micro_benchmarkAutoNs : nbCallPerTest = 276
evaluate hadamard : nbElement = 50000, timePerElement = 0.936963 ns/el ± 0.0122165, elapsedTime = 46848.1 ns ± 610.824
micro_benchmarkAutoNs : nbCallPerTest = 212
evaluate hadamard : nbElement = 100000, timePerElement = 0.612648 ns/el ± 0.00834856, elapsedTime = 61264.8 ns ± 834.856
micro_benchmarkAutoNs : nbCallPerTest = 144
evaluate hadamard : nbElement = 200000, timePerElement = 0.4457 ns/el ± 0.00602624, elapsedTime = 89140.1 ns ± 1205.25
micro_benchmarkAutoNs : nbCallPerTest = 74
evaluate hadamard : nbElement = 500000, timePerElement = 0.348136 ns/el ± 0.00475829, elapsedTime = 174068 ns ± 2379.15
micro_benchmarkAutoNs : nbCallPerTest = 28
evaluate hadamard : nbElement = 1000000, timePerElement = 0.461096 ns/el ± 0.0113995, elapsedTime = 461096 ns ± 11399.5
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 5000000, timePerElement = 0.79824 ns/el ± 0.0127848, elapsedTime = 3.9912e+06 ns ± 63923.8
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 10000000, timePerElement = 0.840527 ns/el ± 0.00425711, elapsedTime = 8.40527e+06 ns ± 42571.1
[ 65%] Built target run_perf_hadamard_gpupar_vectorize_O3
Scanning dependencies of target run_perf_hadamard_gpupar_O0
[ 68%] Built target run_perf_hadamard_gpupar_O0
Scanning dependencies of target run_perf_hadamard_gpupar_O1
[ 71%] Built target run_perf_hadamard_gpupar_O1
Scanning dependencies of target run_perf_hadamard_gpupar_O3
[ 73%] Built target run_perf_hadamard_gpupar_O3
Scanning dependencies of target run_perf_hadamard_gpupar_O2
[ 76%] Built target run_perf_hadamard_gpupar_O2
[ 84%] Built target perf_hadamard_gpupar_vectorize_O4
Scanning dependencies of target run_perf_hadamard_gpupar_vectorize_O4
[ 86%] Run perf_hadamard_gpupar_vectorize_O4 program
micro_benchmarkAutoNs : nbCallPerTest = 409
evaluate hadamard : nbElement = 1000, timePerElement = 32.1256 ns/el ± 0.419265, elapsedTime = 32125.6 ns ± 419.265
micro_benchmarkAutoNs : nbCallPerTest = 394
evaluate hadamard : nbElement = 2000, timePerElement = 16.5534 ns/el ± 0.192669, elapsedTime = 33106.8 ns ± 385.338
micro_benchmarkAutoNs : nbCallPerTest = 378
evaluate hadamard : nbElement = 3000, timePerElement = 11.2921 ns/el ± 0.171324, elapsedTime = 33876.3 ns ± 513.972
micro_benchmarkAutoNs : nbCallPerTest = 392
evaluate hadamard : nbElement = 4000, timePerElement = 8.49155 ns/el ± 0.109366, elapsedTime = 33966.2 ns ± 437.465
micro_benchmarkAutoNs : nbCallPerTest = 381
evaluate hadamard : nbElement = 5000, timePerElement = 6.89543 ns/el ± 0.0778528, elapsedTime = 34477.2 ns ± 389.264
micro_benchmarkAutoNs : nbCallPerTest = 362
evaluate hadamard : nbElement = 10000, timePerElement = 3.62296 ns/el ± 0.0365081, elapsedTime = 36229.6 ns ± 365.081
micro_benchmarkAutoNs : nbCallPerTest = 282
evaluate hadamard : nbElement = 50000, timePerElement = 0.933921 ns/el ± 0.00855928, elapsedTime = 46696 ns ± 427.964
micro_benchmarkAutoNs : nbCallPerTest = 208
evaluate hadamard : nbElement = 100000, timePerElement = 0.614975 ns/el ± 0.0108843, elapsedTime = 61497.5 ns ± 1088.43
micro_benchmarkAutoNs : nbCallPerTest = 125
evaluate hadamard : nbElement = 200000, timePerElement = 0.448284 ns/el ± 0.00809339, elapsedTime = 89656.7 ns ± 1618.68
micro_benchmarkAutoNs : nbCallPerTest = 73
evaluate hadamard : nbElement = 500000, timePerElement = 0.355934 ns/el ± 0.00690746, elapsedTime = 177967 ns ± 3453.73
micro_benchmarkAutoNs : nbCallPerTest = 28
evaluate hadamard : nbElement = 1000000, timePerElement = 0.463625 ns/el ± 0.0100701, elapsedTime = 463625 ns ± 10070.1
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 5000000, timePerElement = 0.79523 ns/el ± 0.0130826, elapsedTime = 3.97615e+06 ns ± 65412.8
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 10000000, timePerElement = 0.842765 ns/el ± 0.0055305, elapsedTime = 8.42765e+06 ns ± 55305
[ 86%] Built target run_perf_hadamard_gpupar_vectorize_O4
Scanning dependencies of target run_perf_hadamard_gpupar_O4
[ 89%] Built target run_perf_hadamard_gpupar_O4
Scanning dependencies of target run_all
[ 89%] Built target run_all
Scanning dependencies of target plot_thread
[ 89%] Built target plot_thread
Scanning dependencies of target plot_hadamard_gpuparVectorize
[ 92%] Call gnuplot hadamard_gpuparVectorize
[100%] Built target plot_hadamard_gpuparVectorize
Scanning dependencies of target plot_all
[100%] Built target plot_all
Figure 7 shows the performance obtained on a CPU with 8 logical cores (4 hyperthreaded physical cores).
Figure 7: Performance of our Hadamard product with G++. Left: total time. Right: time per element.
Figure 8 shows the performance obtained on the same kind of CPU, 8 logical cores (4 hyperthreaded physical cores), with vectorization enabled.
Figure 8: Performance of our vectorized Hadamard product with G++. Left: total time. Right: time per element.
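The two panels of each figure are linked by a simple division: in the log, `timePerElement` is `elapsedTime` divided by `nbElement`. The sketch below only illustrates that relation with values copied from the log. The function name `timePerElementNs` is a hypothetical helper introduced here; it is not part of the benchmark library.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper (not from the benchmark library): time per element in ns/el,
// computed from the total elapsed time in ns and the number of elements.
double timePerElementNs(double elapsedTimeNs, std::size_t nbElement){
	return elapsedTimeNs / static_cast<double>(nbElement);
}

// Returns true when two values agree up to a small absolute tolerance.
bool isClose(double a, double b, double tolerance = 1e-6){
	return std::fabs(a - b) < tolerance;
}
```

For example, the first -O4 measurement (30881.2 ns for 1000 elements) gives 30.8812 ns/el, as reported in the log. Since the elapsed time stays close to 30 µs from 1000 to 10000 elements, the time per element falls roughly as the inverse of the size in that range.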
The perceptive reader will have noticed that we used the G++ compiler rather than NVC++; the point here was simply to show that everything works.
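We can at least say that the -O3 and -O4 optimization options are roughly equivalent: in the log above, their times per element differ by less than 2% at every size except 100000 and 200000 elements, where -O3 is about 7% and 11% faster.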
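To put a number on this comparison, here is a minimal sketch that computes the largest relative gap between the -O3 and -O4 times per element. The values are copied from the log above (non-vectorized runs); the names `timeO3`, `timeO4` and `maxRelativeGap` are hypothetical and only serve the illustration.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// timePerElement values (ns/el) copied from the log, for the 13 sizes
// from 1000 to 10000000 elements (non-vectorized runs).
constexpr std::array<double, 13> timeO3 = {31.321, 15.8577, 10.8115, 8.06679, 6.55645,
	3.39989, 0.93019, 0.618014, 0.453412, 0.360069, 0.46339, 0.755135, 0.794803};
constexpr std::array<double, 13> timeO4 = {30.8812, 15.5982, 10.6674, 8.00122, 6.48158,
	3.37552, 0.925408, 0.667545, 0.510854, 0.358167, 0.456917, 0.749144, 0.786622};

// Hypothetical helper: largest relative gap |a - b| / b over all sizes.
template <std::size_t N>
double maxRelativeGap(const std::array<double, N> & a, const std::array<double, N> & b){
	double maxGap = 0.0;
	for(std::size_t i = 0; i < N; ++i){
		double gap = std::fabs(a[i] - b[i]) / b[i];
		if(gap > maxGap){maxGap = gap;}
	}
	return maxGap;
}
```

Calling `maxRelativeGap(timeO3, timeO4)` on these values gives a gap a little above 11%, reached at 200000 elements, where the -O4 measurement also has its largest relative uncertainty.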