3.4.5.2: Performance
make plot_all
Performance details
make plot_all
[ 7%] Built target perf_hadamard_gpupar_O3
[ 15%] Built target perf_hadamard_gpupar_O0
[ 23%] Built target perf_hadamard_gpupar_O4
[ 31%] Built target perf_hadamard_gpupar_O1
[ 39%] Built target perf_hadamard_gpupar_O2
Scanning dependencies of target plot_hadamard_gpuparBase
[ 42%] Run perf_hadamard_gpupar_O4 program
micro_benchmarkAutoNs : nbCallPerTest = 399
evaluate hadamard : nbElement = 1000, timePerElement = 33.4132 ns/el ± 0.55279, elapsedTime = 33413.2 ns ± 552.79
micro_benchmarkAutoNs : nbCallPerTest = 381
evaluate hadamard : nbElement = 2000, timePerElement = 17.2905 ns/el ± 0.255431, elapsedTime = 34581 ns ± 510.862
micro_benchmarkAutoNs : nbCallPerTest = 372
evaluate hadamard : nbElement = 3000, timePerElement = 11.7511 ns/el ± 0.18808, elapsedTime = 35253.2 ns ± 564.24
micro_benchmarkAutoNs : nbCallPerTest = 373
evaluate hadamard : nbElement = 4000, timePerElement = 8.71583 ns/el ± 0.107584, elapsedTime = 34863.3 ns ± 430.334
micro_benchmarkAutoNs : nbCallPerTest = 369
evaluate hadamard : nbElement = 5000, timePerElement = 7.15768 ns/el ± 0.11584, elapsedTime = 35788.4 ns ± 579.202
micro_benchmarkAutoNs : nbCallPerTest = 351
evaluate hadamard : nbElement = 10000, timePerElement = 3.71963 ns/el ± 0.061178, elapsedTime = 37196.3 ns ± 611.78
micro_benchmarkAutoNs : nbCallPerTest = 285
evaluate hadamard : nbElement = 50000, timePerElement = 0.921453 ns/el ± 0.00992434, elapsedTime = 46072.6 ns ± 496.217
micro_benchmarkAutoNs : nbCallPerTest = 220
evaluate hadamard : nbElement = 100000, timePerElement = 0.594931 ns/el ± 0.00881871, elapsedTime = 59493.1 ns ± 881.871
micro_benchmarkAutoNs : nbCallPerTest = 150
evaluate hadamard : nbElement = 200000, timePerElement = 0.429775 ns/el ± 0.00591954, elapsedTime = 85955 ns ± 1183.91
micro_benchmarkAutoNs : nbCallPerTest = 77
evaluate hadamard : nbElement = 500000, timePerElement = 0.34063 ns/el ± 0.00723432, elapsedTime = 170315 ns ± 3617.16
micro_benchmarkAutoNs : nbCallPerTest = 28
evaluate hadamard : nbElement = 1000000, timePerElement = 0.461107 ns/el ± 0.0127116, elapsedTime = 461107 ns ± 12711.6
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 5000000, timePerElement = 0.778033 ns/el ± 0.0118725, elapsedTime = 3.89017e+06 ns ± 59362.4
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 10000000, timePerElement = 0.817238 ns/el ± 0.00523804, elapsedTime = 8.17238e+06 ns ± 52380.4
[ 44%] Run perf_hadamard_gpupar_O0 program
micro_benchmarkAutoNs : nbCallPerTest = 286
evaluate hadamard : nbElement = 1000, timePerElement = 45.8364 ns/el ± 0.657482, elapsedTime = 45836.4 ns ± 657.482
micro_benchmarkAutoNs : nbCallPerTest = 262
evaluate hadamard : nbElement = 2000, timePerElement = 24.8945 ns/el ± 0.362912, elapsedTime = 49788.9 ns ± 725.824
micro_benchmarkAutoNs : nbCallPerTest = 245
evaluate hadamard : nbElement = 3000, timePerElement = 17.6691 ns/el ± 0.22554, elapsedTime = 53007.4 ns ± 676.621
micro_benchmarkAutoNs : nbCallPerTest = 236
evaluate hadamard : nbElement = 4000, timePerElement = 13.9821 ns/el ± 0.207076, elapsedTime = 55928.2 ns ± 828.303
micro_benchmarkAutoNs : nbCallPerTest = 221
evaluate hadamard : nbElement = 5000, timePerElement = 11.7911 ns/el ± 0.179819, elapsedTime = 58955.4 ns ± 899.093
micro_benchmarkAutoNs : nbCallPerTest = 185
evaluate hadamard : nbElement = 10000, timePerElement = 7.04588 ns/el ± 0.092295, elapsedTime = 70458.8 ns ± 922.95
micro_benchmarkAutoNs : nbCallPerTest = 80
evaluate hadamard : nbElement = 50000, timePerElement = 3.23589 ns/el ± 0.0464096, elapsedTime = 161795 ns ± 2320.48
micro_benchmarkAutoNs : nbCallPerTest = 48
evaluate hadamard : nbElement = 100000, timePerElement = 2.65516 ns/el ± 0.0328916, elapsedTime = 265516 ns ± 3289.16
micro_benchmarkAutoNs : nbCallPerTest = 27
evaluate hadamard : nbElement = 200000, timePerElement = 2.3489 ns/el ± 0.0396336, elapsedTime = 469780 ns ± 7926.72
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 500000, timePerElement = 2.14414 ns/el ± 0.0205214, elapsedTime = 1.07207e+06 ns ± 10260.7
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 1000000, timePerElement = 2.10961 ns/el ± 0.0332288, elapsedTime = 2.10961e+06 ns ± 33228.8
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 5000000, timePerElement = 2.07867 ns/el ± 0.00696241, elapsedTime = 1.03933e+07 ns ± 34812.1
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 10000000, timePerElement = 2.06289 ns/el ± 0.0069469, elapsedTime = 2.06289e+07 ns ± 69469
[ 47%] Run perf_hadamard_gpupar_O1 program
micro_benchmarkAutoNs : nbCallPerTest = 431
evaluate hadamard : nbElement = 1000, timePerElement = 30.8522 ns/el ± 0.359313, elapsedTime = 30852.2 ns ± 359.313
micro_benchmarkAutoNs : nbCallPerTest = 418
evaluate hadamard : nbElement = 2000, timePerElement = 15.7892 ns/el ± 0.196957, elapsedTime = 31578.4 ns ± 393.913
micro_benchmarkAutoNs : nbCallPerTest = 400
evaluate hadamard : nbElement = 3000, timePerElement = 10.9012 ns/el ± 0.166795, elapsedTime = 32703.7 ns ± 500.385
micro_benchmarkAutoNs : nbCallPerTest = 396
evaluate hadamard : nbElement = 4000, timePerElement = 8.36214 ns/el ± 0.130573, elapsedTime = 33448.6 ns ± 522.29
micro_benchmarkAutoNs : nbCallPerTest = 384
evaluate hadamard : nbElement = 5000, timePerElement = 6.88175 ns/el ± 0.110712, elapsedTime = 34408.7 ns ± 553.56
micro_benchmarkAutoNs : nbCallPerTest = 304
evaluate hadamard : nbElement = 10000, timePerElement = 3.82988 ns/el ± 0.0610476, elapsedTime = 38298.8 ns ± 610.476
micro_benchmarkAutoNs : nbCallPerTest = 190
evaluate hadamard : nbElement = 50000, timePerElement = 1.37675 ns/el ± 0.0379184, elapsedTime = 68837.4 ns ± 1895.92
micro_benchmarkAutoNs : nbCallPerTest = 125
evaluate hadamard : nbElement = 100000, timePerElement = 0.961271 ns/el ± 0.0158151, elapsedTime = 96127.1 ns ± 1581.51
micro_benchmarkAutoNs : nbCallPerTest = 87
evaluate hadamard : nbElement = 200000, timePerElement = 0.747319 ns/el ± 0.0116808, elapsedTime = 149464 ns ± 2336.17
micro_benchmarkAutoNs : nbCallPerTest = 42
evaluate hadamard : nbElement = 500000, timePerElement = 0.613129 ns/el ± 0.00657316, elapsedTime = 306564 ns ± 3286.58
micro_benchmarkAutoNs : nbCallPerTest = 21
evaluate hadamard : nbElement = 1000000, timePerElement = 0.620131 ns/el ± 0.00786868, elapsedTime = 620131 ns ± 7868.68
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 5000000, timePerElement = 0.776533 ns/el ± 0.0140583, elapsedTime = 3.88267e+06 ns ± 70291.3
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 10000000, timePerElement = 0.811779 ns/el ± 0.00481845, elapsedTime = 8.11779e+06 ns ± 48184.5
[ 50%] Run perf_hadamard_gpupar_O2 program
micro_benchmarkAutoNs : nbCallPerTest = 393
evaluate hadamard : nbElement = 1000, timePerElement = 33.6331 ns/el ± 0.491203, elapsedTime = 33633.1 ns ± 491.203
micro_benchmarkAutoNs : nbCallPerTest = 377
evaluate hadamard : nbElement = 2000, timePerElement = 17.3284 ns/el ± 0.201045, elapsedTime = 34656.8 ns ± 402.09
micro_benchmarkAutoNs : nbCallPerTest = 369
evaluate hadamard : nbElement = 3000, timePerElement = 11.8028 ns/el ± 0.184893, elapsedTime = 35408.5 ns ± 554.678
micro_benchmarkAutoNs : nbCallPerTest = 367
evaluate hadamard : nbElement = 4000, timePerElement = 8.73433 ns/el ± 0.115124, elapsedTime = 34937.3 ns ± 460.496
micro_benchmarkAutoNs : nbCallPerTest = 370
evaluate hadamard : nbElement = 5000, timePerElement = 7.15674 ns/el ± 0.0900864, elapsedTime = 35783.7 ns ± 450.432
micro_benchmarkAutoNs : nbCallPerTest = 354
evaluate hadamard : nbElement = 10000, timePerElement = 3.69594 ns/el ± 0.0417309, elapsedTime = 36959.4 ns ± 417.309
micro_benchmarkAutoNs : nbCallPerTest = 283
evaluate hadamard : nbElement = 50000, timePerElement = 0.915761 ns/el ± 0.0119915, elapsedTime = 45788.1 ns ± 599.573
micro_benchmarkAutoNs : nbCallPerTest = 216
evaluate hadamard : nbElement = 100000, timePerElement = 0.593649 ns/el ± 0.00802338, elapsedTime = 59364.9 ns ± 802.338
micro_benchmarkAutoNs : nbCallPerTest = 151
evaluate hadamard : nbElement = 200000, timePerElement = 0.430794 ns/el ± 0.00523984, elapsedTime = 86158.9 ns ± 1047.97
micro_benchmarkAutoNs : nbCallPerTest = 76
evaluate hadamard : nbElement = 500000, timePerElement = 0.343299 ns/el ± 0.00610536, elapsedTime = 171649 ns ± 3052.68
micro_benchmarkAutoNs : nbCallPerTest = 28
evaluate hadamard : nbElement = 1000000, timePerElement = 0.458034 ns/el ± 0.00759681, elapsedTime = 458034 ns ± 7596.81
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 5000000, timePerElement = 0.776961 ns/el ± 0.011909, elapsedTime = 3.8848e+06 ns ± 59545
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 10000000, timePerElement = 0.823439 ns/el ± 0.00940426, elapsedTime = 8.23439e+06 ns ± 94042.6
[ 52%] Run perf_hadamard_gpupar_O3 program
micro_benchmarkAutoNs : nbCallPerTest = 392
evaluate hadamard : nbElement = 1000, timePerElement = 33.4706 ns/el ± 0.331422, elapsedTime = 33470.6 ns ± 331.422
micro_benchmarkAutoNs : nbCallPerTest = 372
evaluate hadamard : nbElement = 2000, timePerElement = 17.298 ns/el ± 0.217674, elapsedTime = 34596 ns ± 435.349
micro_benchmarkAutoNs : nbCallPerTest = 368
evaluate hadamard : nbElement = 3000, timePerElement = 11.7878 ns/el ± 0.181141, elapsedTime = 35363.3 ns ± 543.424
micro_benchmarkAutoNs : nbCallPerTest = 376
evaluate hadamard : nbElement = 4000, timePerElement = 8.78477 ns/el ± 0.136388, elapsedTime = 35139.1 ns ± 545.551
micro_benchmarkAutoNs : nbCallPerTest = 368
evaluate hadamard : nbElement = 5000, timePerElement = 7.14936 ns/el ± 0.0896933, elapsedTime = 35746.8 ns ± 448.466
micro_benchmarkAutoNs : nbCallPerTest = 351
evaluate hadamard : nbElement = 10000, timePerElement = 3.69523 ns/el ± 0.0632912, elapsedTime = 36952.3 ns ± 632.912
micro_benchmarkAutoNs : nbCallPerTest = 285
evaluate hadamard : nbElement = 50000, timePerElement = 0.917107 ns/el ± 0.00936547, elapsedTime = 45855.4 ns ± 468.273
micro_benchmarkAutoNs : nbCallPerTest = 219
evaluate hadamard : nbElement = 100000, timePerElement = 0.594568 ns/el ± 0.00711747, elapsedTime = 59456.8 ns ± 711.747
micro_benchmarkAutoNs : nbCallPerTest = 149
evaluate hadamard : nbElement = 200000, timePerElement = 0.429941 ns/el ± 0.00621204, elapsedTime = 85988.2 ns ± 1242.41
micro_benchmarkAutoNs : nbCallPerTest = 77
evaluate hadamard : nbElement = 500000, timePerElement = 0.336579 ns/el ± 0.00577335, elapsedTime = 168289 ns ± 2886.67
micro_benchmarkAutoNs : nbCallPerTest = 28
evaluate hadamard : nbElement = 1000000, timePerElement = 0.457861 ns/el ± 0.00813642, elapsedTime = 457861 ns ± 8136.42
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 5000000, timePerElement = 0.779107 ns/el ± 0.0134428, elapsedTime = 3.89554e+06 ns ± 67213.8
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 10000000, timePerElement = 0.816606 ns/el ± 0.00473954, elapsedTime = 8.16606e+06 ns ± 47395.4
[ 55%] Call gnuplot hadamard_gpuparBase
[ 55%] Built target plot_hadamard_gpuparBase
[ 63%] Built target perf_hadamard_gpupar_vectorize_O3
Scanning dependencies of target run_perf_hadamard_gpupar_vectorize_O3
[ 65%] Run perf_hadamard_gpupar_vectorize_O3 program
micro_benchmarkAutoNs : nbCallPerTest = 392
evaluate hadamard : nbElement = 1000, timePerElement = 33.4542 ns/el ± 0.391268, elapsedTime = 33454.2 ns ± 391.268
micro_benchmarkAutoNs : nbCallPerTest = 380
evaluate hadamard : nbElement = 2000, timePerElement = 17.3171 ns/el ± 0.225818, elapsedTime = 34634.2 ns ± 451.635
micro_benchmarkAutoNs : nbCallPerTest = 372
evaluate hadamard : nbElement = 3000, timePerElement = 11.7386 ns/el ± 0.137196, elapsedTime = 35215.7 ns ± 411.588
micro_benchmarkAutoNs : nbCallPerTest = 377
evaluate hadamard : nbElement = 4000, timePerElement = 8.74554 ns/el ± 0.128382, elapsedTime = 34982.1 ns ± 513.526
micro_benchmarkAutoNs : nbCallPerTest = 369
evaluate hadamard : nbElement = 5000, timePerElement = 7.13623 ns/el ± 0.0971973, elapsedTime = 35681.1 ns ± 485.987
micro_benchmarkAutoNs : nbCallPerTest = 350
evaluate hadamard : nbElement = 10000, timePerElement = 3.71016 ns/el ± 0.0438853, elapsedTime = 37101.6 ns ± 438.853
micro_benchmarkAutoNs : nbCallPerTest = 285
evaluate hadamard : nbElement = 50000, timePerElement = 0.916567 ns/el ± 0.00980597, elapsedTime = 45828.4 ns ± 490.299
micro_benchmarkAutoNs : nbCallPerTest = 220
evaluate hadamard : nbElement = 100000, timePerElement = 0.593687 ns/el ± 0.007022, elapsedTime = 59368.7 ns ± 702.2
micro_benchmarkAutoNs : nbCallPerTest = 150
evaluate hadamard : nbElement = 200000, timePerElement = 0.430899 ns/el ± 0.00625281, elapsedTime = 86179.8 ns ± 1250.56
micro_benchmarkAutoNs : nbCallPerTest = 78
evaluate hadamard : nbElement = 500000, timePerElement = 0.333645 ns/el ± 0.00555386, elapsedTime = 166822 ns ± 2776.93
micro_benchmarkAutoNs : nbCallPerTest = 28
evaluate hadamard : nbElement = 1000000, timePerElement = 0.461771 ns/el ± 0.0080523, elapsedTime = 461771 ns ± 8052.3
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 5000000, timePerElement = 0.778288 ns/el ± 0.0110582, elapsedTime = 3.89144e+06 ns ± 55290.8
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 10000000, timePerElement = 0.815769 ns/el ± 0.00490741, elapsedTime = 8.15769e+06 ns ± 49074.1
[ 65%] Built target run_perf_hadamard_gpupar_vectorize_O3
Scanning dependencies of target run_perf_hadamard_gpupar_O0
[ 68%] Built target run_perf_hadamard_gpupar_O0
Scanning dependencies of target run_perf_hadamard_gpupar_O1
[ 71%] Built target run_perf_hadamard_gpupar_O1
Scanning dependencies of target run_perf_hadamard_gpupar_O3
[ 73%] Built target run_perf_hadamard_gpupar_O3
Scanning dependencies of target run_perf_hadamard_gpupar_O2
[ 76%] Built target run_perf_hadamard_gpupar_O2
[ 84%] Built target perf_hadamard_gpupar_vectorize_O4
Scanning dependencies of target run_perf_hadamard_gpupar_vectorize_O4
[ 86%] Run perf_hadamard_gpupar_vectorize_O4 program
micro_benchmarkAutoNs : nbCallPerTest = 383
evaluate hadamard : nbElement = 1000, timePerElement = 33.5663 ns/el ± 0.36282, elapsedTime = 33566.3 ns ± 362.82
micro_benchmarkAutoNs : nbCallPerTest = 374
evaluate hadamard : nbElement = 2000, timePerElement = 17.3563 ns/el ± 0.214874, elapsedTime = 34712.7 ns ± 429.747
micro_benchmarkAutoNs : nbCallPerTest = 369
evaluate hadamard : nbElement = 3000, timePerElement = 11.8179 ns/el ± 0.175017, elapsedTime = 35453.7 ns ± 525.051
micro_benchmarkAutoNs : nbCallPerTest = 375
evaluate hadamard : nbElement = 4000, timePerElement = 8.76842 ns/el ± 0.111842, elapsedTime = 35073.7 ns ± 447.367
micro_benchmarkAutoNs : nbCallPerTest = 369
evaluate hadamard : nbElement = 5000, timePerElement = 7.17709 ns/el ± 0.0918638, elapsedTime = 35885.4 ns ± 459.319
micro_benchmarkAutoNs : nbCallPerTest = 353
evaluate hadamard : nbElement = 10000, timePerElement = 3.72602 ns/el ± 0.0513345, elapsedTime = 37260.2 ns ± 513.345
micro_benchmarkAutoNs : nbCallPerTest = 282
evaluate hadamard : nbElement = 50000, timePerElement = 0.928783 ns/el ± 0.0126429, elapsedTime = 46439.1 ns ± 632.143
micro_benchmarkAutoNs : nbCallPerTest = 216
evaluate hadamard : nbElement = 100000, timePerElement = 0.598947 ns/el ± 0.00882292, elapsedTime = 59894.7 ns ± 882.292
micro_benchmarkAutoNs : nbCallPerTest = 150
evaluate hadamard : nbElement = 200000, timePerElement = 0.431421 ns/el ± 0.00684615, elapsedTime = 86284.2 ns ± 1369.23
micro_benchmarkAutoNs : nbCallPerTest = 77
evaluate hadamard : nbElement = 500000, timePerElement = 0.336947 ns/el ± 0.00536625, elapsedTime = 168474 ns ± 2683.13
micro_benchmarkAutoNs : nbCallPerTest = 28
evaluate hadamard : nbElement = 1000000, timePerElement = 0.461204 ns/el ± 0.0119023, elapsedTime = 461204 ns ± 11902.3
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 5000000, timePerElement = 0.775994 ns/el ± 0.0128051, elapsedTime = 3.87997e+06 ns ± 64025.7
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 10000000, timePerElement = 0.819109 ns/el ± 0.00614954, elapsedTime = 8.19109e+06 ns ± 61495.4
[ 86%] Built target run_perf_hadamard_gpupar_vectorize_O4
Scanning dependencies of target run_perf_hadamard_gpupar_O4
[ 89%] Built target run_perf_hadamard_gpupar_O4
Scanning dependencies of target run_all
[ 89%] Built target run_all
Scanning dependencies of target plot_thread
[ 89%] Built target plot_thread
Scanning dependencies of target plot_hadamard_gpuparVectorize
[ 92%] Call gnuplot hadamard_gpuparVectorize
[100%] Built target plot_hadamard_gpuparVectorize
Scanning dependencies of target plot_all
[100%] Built target plot_all
Figure 9 shows the performance obtained on a CPU with 8 logical cores (4 physical cores with hyper-threading).
Figure 9: Performance of our Hadamard product with NVC++. Left: total time. Right: time per element.
Figure 10 shows the performance obtained on a CPU with 8 logical cores (4 physical cores with hyper-threading), with vectorization enabled.
Figure 10: Performance of our vectorized Hadamard product with NVC++. Left: total time. Right: time per element.
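The measurements in the log come from the `micro_benchmarkAutoNs` helper, whose implementation is not shown in this section. As a rough illustration of what such a measurement involves, here is a minimal sketch that assumes a plain `std::transform` kernel timed with `std::chrono::steady_clock`. The names `hadamardProduct` and `measureTimePerElementNs` are hypothetical, and the sequential kernel is only a stand-in for the parallel one compiled with NVC++.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical kernel: element-wise (Hadamard) product, res[i] = x[i] * y[i].
// Sequential stand-in for the parallel kernel benchmarked in this section.
// res and y must hold at least as many elements as x.
void hadamardProduct(std::vector<float> & res, const std::vector<float> & x, const std::vector<float> & y){
	std::transform(x.begin(), x.end(), y.begin(), res.begin(), std::multiplies<float>());
}

// Hypothetical timing helper: average time per element, in nanoseconds,
// over nbCall calls of the kernel on vectors of nbElement elements.
double measureTimePerElementNs(std::size_t nbElement, std::size_t nbCall){
	std::vector<float> x(nbElement, 2.0f), y(nbElement, 3.0f), res(nbElement, 0.0f);
	auto begin = std::chrono::steady_clock::now();
	for(std::size_t i = 0; i < nbCall; ++i){
		hadamardProduct(res, x, y);
	}
	auto end = std::chrono::steady_clock::now();
	// Total elapsed time for all the calls, expressed in nanoseconds
	double elapsedNs = std::chrono::duration<double, std::nano>(end - begin).count();
	return elapsedNs / (static_cast<double>(nbCall) * static_cast<double>(nbElement));
}
```

Calling `measureTimePerElementNs(1000, 399)` would mimic the first O4 line of the log in shape only. This sketch reports a single average and no uncertainty, whereas the tutorial's helper also prints a ± spread and chooses the number of calls itself.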