3.4.4.2 : Performance

make plot_all



Performance details
make plot_all
[  7%] Built target perf_hadamard_gpupar_O3
[ 15%] Built target perf_hadamard_gpupar_O0
[ 23%] Built target perf_hadamard_gpupar_O4
[ 31%] Built target perf_hadamard_gpupar_O1
[ 39%] Built target perf_hadamard_gpupar_O2
Scanning dependencies of target plot_hadamard_gpuparBase
[ 42%] Run perf_hadamard_gpupar_O4 program
micro_benchmarkAutoNs : nbCallPerTest = 422
evaluate hadamard : nbElement = 1000, timePerElement = 30.8812 ns/el ± 0.23229, elapsedTime = 30881.2 ns ± 232.29
micro_benchmarkAutoNs : nbCallPerTest = 420
evaluate hadamard : nbElement = 2000, timePerElement = 15.5982 ns/el ± 0.1222, elapsedTime = 31196.4 ns ± 244.401
micro_benchmarkAutoNs : nbCallPerTest = 406
evaluate hadamard : nbElement = 3000, timePerElement = 10.6674 ns/el ± 0.0697921, elapsedTime = 32002.3 ns ± 209.376
micro_benchmarkAutoNs : nbCallPerTest = 411
evaluate hadamard : nbElement = 4000, timePerElement = 8.00122 ns/el ± 0.0592695, elapsedTime = 32004.9 ns ± 237.078
micro_benchmarkAutoNs : nbCallPerTest = 405
evaluate hadamard : nbElement = 5000, timePerElement = 6.48158 ns/el ± 0.0412615, elapsedTime = 32407.9 ns ± 206.307
micro_benchmarkAutoNs : nbCallPerTest = 391
evaluate hadamard : nbElement = 10000, timePerElement = 3.37552 ns/el ± 0.0244, elapsedTime = 33755.2 ns ± 244
micro_benchmarkAutoNs : nbCallPerTest = 289
evaluate hadamard : nbElement = 50000, timePerElement = 0.925408 ns/el ± 0.0181593, elapsedTime = 46270.4 ns ± 907.967
micro_benchmarkAutoNs : nbCallPerTest = 213
evaluate hadamard : nbElement = 100000, timePerElement = 0.667545 ns/el ± 0.0288547, elapsedTime = 66754.5 ns ± 2885.47
micro_benchmarkAutoNs : nbCallPerTest = 130
evaluate hadamard : nbElement = 200000, timePerElement = 0.510854 ns/el ± 0.0250128, elapsedTime = 102171 ns ± 5002.56
micro_benchmarkAutoNs : nbCallPerTest = 71
evaluate hadamard : nbElement = 500000, timePerElement = 0.358167 ns/el ± 0.00341734, elapsedTime = 179084 ns ± 1708.67
micro_benchmarkAutoNs : nbCallPerTest = 28
evaluate hadamard : nbElement = 1000000, timePerElement = 0.456917 ns/el ± 0.00416396, elapsedTime = 456917 ns ± 4163.96
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 5000000, timePerElement = 0.749144 ns/el ± 0.00796663, elapsedTime = 3.74572e+06 ns ± 39833.2
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 10000000, timePerElement = 0.786622 ns/el ± 0.00371752, elapsedTime = 7.86622e+06 ns ± 37175.2
[ 44%] Run perf_hadamard_gpupar_O0 program
micro_benchmarkAutoNs : nbCallPerTest = 97
evaluate hadamard : nbElement = 1000, timePerElement = 133.665 ns/el ± 1.47425, elapsedTime = 133665 ns ± 1474.25
micro_benchmarkAutoNs : nbCallPerTest = 85
evaluate hadamard : nbElement = 2000, timePerElement = 75.9769 ns/el ± 0.754512, elapsedTime = 151954 ns ± 1509.02
micro_benchmarkAutoNs : nbCallPerTest = 79
evaluate hadamard : nbElement = 3000, timePerElement = 54.496 ns/el ± 0.700398, elapsedTime = 163488 ns ± 2101.19
micro_benchmarkAutoNs : nbCallPerTest = 75
evaluate hadamard : nbElement = 4000, timePerElement = 43.3872 ns/el ± 0.504569, elapsedTime = 173549 ns ± 2018.28
micro_benchmarkAutoNs : nbCallPerTest = 71
evaluate hadamard : nbElement = 5000, timePerElement = 36.8775 ns/el ± 0.549347, elapsedTime = 184387 ns ± 2746.74
micro_benchmarkAutoNs : nbCallPerTest = 56
evaluate hadamard : nbElement = 10000, timePerElement = 23.0748 ns/el ± 0.283516, elapsedTime = 230748 ns ± 2835.16
micro_benchmarkAutoNs : nbCallPerTest = 25
evaluate hadamard : nbElement = 50000, timePerElement = 10.4635 ns/el ± 0.128517, elapsedTime = 523175 ns ± 6425.87
micro_benchmarkAutoNs : nbCallPerTest = 14
evaluate hadamard : nbElement = 100000, timePerElement = 8.73013 ns/el ± 0.147506, elapsedTime = 873013 ns ± 14750.6
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 200000, timePerElement = 7.71121 ns/el ± 0.0707096, elapsedTime = 1.54224e+06 ns ± 14141.9
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 500000, timePerElement = 7.14712 ns/el ± 0.104409, elapsedTime = 3.57356e+06 ns ± 52204.6
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 1000000, timePerElement = 7.00085 ns/el ± 0.087528, elapsedTime = 7.00085e+06 ns ± 87528
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 5000000, timePerElement = 6.78909 ns/el ± 0.0155163, elapsedTime = 3.39454e+07 ns ± 77581.3
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 10000000, timePerElement = 6.73782 ns/el ± 0.00961987, elapsedTime = 6.73782e+07 ns ± 96198.7
[ 47%] Run perf_hadamard_gpupar_O1 program
micro_benchmarkAutoNs : nbCallPerTest = 442
evaluate hadamard : nbElement = 1000, timePerElement = 29.3987 ns/el ± 0.34706, elapsedTime = 29398.7 ns ± 347.06
micro_benchmarkAutoNs : nbCallPerTest = 427
evaluate hadamard : nbElement = 2000, timePerElement = 15.3083 ns/el ± 0.142779, elapsedTime = 30616.6 ns ± 285.558
micro_benchmarkAutoNs : nbCallPerTest = 417
evaluate hadamard : nbElement = 3000, timePerElement = 10.578 ns/el ± 0.132579, elapsedTime = 31734 ns ± 397.737
micro_benchmarkAutoNs : nbCallPerTest = 405
evaluate hadamard : nbElement = 4000, timePerElement = 8.21038 ns/el ± 0.0820182, elapsedTime = 32841.5 ns ± 328.073
micro_benchmarkAutoNs : nbCallPerTest = 385
evaluate hadamard : nbElement = 5000, timePerElement = 6.74212 ns/el ± 0.072288, elapsedTime = 33710.6 ns ± 361.44
micro_benchmarkAutoNs : nbCallPerTest = 338
evaluate hadamard : nbElement = 10000, timePerElement = 3.83246 ns/el ± 0.032873, elapsedTime = 38324.6 ns ± 328.73
micro_benchmarkAutoNs : nbCallPerTest = 175
evaluate hadamard : nbElement = 50000, timePerElement = 1.47353 ns/el ± 0.0193611, elapsedTime = 73676.5 ns ± 968.053
micro_benchmarkAutoNs : nbCallPerTest = 117
evaluate hadamard : nbElement = 100000, timePerElement = 1.09855 ns/el ± 0.0132944, elapsedTime = 109855 ns ± 1329.44
micro_benchmarkAutoNs : nbCallPerTest = 73
evaluate hadamard : nbElement = 200000, timePerElement = 0.891388 ns/el ± 0.00807633, elapsedTime = 178278 ns ± 1615.27
micro_benchmarkAutoNs : nbCallPerTest = 34
evaluate hadamard : nbElement = 500000, timePerElement = 0.760071 ns/el ± 0.0071964, elapsedTime = 380035 ns ± 3598.2
micro_benchmarkAutoNs : nbCallPerTest = 17
evaluate hadamard : nbElement = 1000000, timePerElement = 0.737363 ns/el ± 0.00578992, elapsedTime = 737363 ns ± 5789.92
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 5000000, timePerElement = 0.751263 ns/el ± 0.0138082, elapsedTime = 3.75632e+06 ns ± 69040.9
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 10000000, timePerElement = 0.780348 ns/el ± 0.00667971, elapsedTime = 7.80348e+06 ns ± 66797.1
[ 50%] Run perf_hadamard_gpupar_O2 program
micro_benchmarkAutoNs : nbCallPerTest = 432
evaluate hadamard : nbElement = 1000, timePerElement = 29.8853 ns/el ± 0.333225, elapsedTime = 29885.3 ns ± 333.225
micro_benchmarkAutoNs : nbCallPerTest = 424
evaluate hadamard : nbElement = 2000, timePerElement = 15.3606 ns/el ± 0.192006, elapsedTime = 30721.3 ns ± 384.013
micro_benchmarkAutoNs : nbCallPerTest = 410
evaluate hadamard : nbElement = 3000, timePerElement = 10.5133 ns/el ± 0.112758, elapsedTime = 31539.8 ns ± 338.274
micro_benchmarkAutoNs : nbCallPerTest = 411
evaluate hadamard : nbElement = 4000, timePerElement = 8.04945 ns/el ± 0.0672967, elapsedTime = 32197.8 ns ± 269.187
micro_benchmarkAutoNs : nbCallPerTest = 402
evaluate hadamard : nbElement = 5000, timePerElement = 6.59136 ns/el ± 0.0695574, elapsedTime = 32956.8 ns ± 347.787
micro_benchmarkAutoNs : nbCallPerTest = 351
evaluate hadamard : nbElement = 10000, timePerElement = 3.66702 ns/el ± 0.039932, elapsedTime = 36670.2 ns ± 399.32
micro_benchmarkAutoNs : nbCallPerTest = 196
evaluate hadamard : nbElement = 50000, timePerElement = 1.30901 ns/el ± 0.0124705, elapsedTime = 65450.7 ns ± 623.524
micro_benchmarkAutoNs : nbCallPerTest = 138
evaluate hadamard : nbElement = 100000, timePerElement = 0.940901 ns/el ± 0.0136015, elapsedTime = 94090.1 ns ± 1360.15
micro_benchmarkAutoNs : nbCallPerTest = 88
evaluate hadamard : nbElement = 200000, timePerElement = 0.738183 ns/el ± 0.0136821, elapsedTime = 147637 ns ± 2736.41
micro_benchmarkAutoNs : nbCallPerTest = 42
evaluate hadamard : nbElement = 500000, timePerElement = 0.611794 ns/el ± 0.0071958, elapsedTime = 305897 ns ± 3597.9
micro_benchmarkAutoNs : nbCallPerTest = 20
evaluate hadamard : nbElement = 1000000, timePerElement = 0.619036 ns/el ± 0.00413746, elapsedTime = 619036 ns ± 4137.46
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 5000000, timePerElement = 0.752102 ns/el ± 0.0135541, elapsedTime = 3.76051e+06 ns ± 67770.6
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 10000000, timePerElement = 0.785966 ns/el ± 0.00482216, elapsedTime = 7.85966e+06 ns ± 48221.6
[ 52%] Run perf_hadamard_gpupar_O3 program
micro_benchmarkAutoNs : nbCallPerTest = 411
evaluate hadamard : nbElement = 1000, timePerElement = 31.321 ns/el ± 0.345605, elapsedTime = 31321 ns ± 345.605
micro_benchmarkAutoNs : nbCallPerTest = 414
evaluate hadamard : nbElement = 2000, timePerElement = 15.8577 ns/el ± 0.212741, elapsedTime = 31715.4 ns ± 425.481
micro_benchmarkAutoNs : nbCallPerTest = 399
evaluate hadamard : nbElement = 3000, timePerElement = 10.8115 ns/el ± 0.120496, elapsedTime = 32434.5 ns ± 361.489
micro_benchmarkAutoNs : nbCallPerTest = 409
evaluate hadamard : nbElement = 4000, timePerElement = 8.06679 ns/el ± 0.084405, elapsedTime = 32267.2 ns ± 337.62
micro_benchmarkAutoNs : nbCallPerTest = 399
evaluate hadamard : nbElement = 5000, timePerElement = 6.55645 ns/el ± 0.0724905, elapsedTime = 32782.2 ns ± 362.452
micro_benchmarkAutoNs : nbCallPerTest = 385
evaluate hadamard : nbElement = 10000, timePerElement = 3.39989 ns/el ± 0.0365379, elapsedTime = 33998.9 ns ± 365.379
micro_benchmarkAutoNs : nbCallPerTest = 282
evaluate hadamard : nbElement = 50000, timePerElement = 0.93019 ns/el ± 0.0107346, elapsedTime = 46509.5 ns ± 536.73
micro_benchmarkAutoNs : nbCallPerTest = 207
evaluate hadamard : nbElement = 100000, timePerElement = 0.618014 ns/el ± 0.0079121, elapsedTime = 61801.4 ns ± 791.21
micro_benchmarkAutoNs : nbCallPerTest = 142
evaluate hadamard : nbElement = 200000, timePerElement = 0.453412 ns/el ± 0.00666807, elapsedTime = 90682.3 ns ± 1333.61
micro_benchmarkAutoNs : nbCallPerTest = 72
evaluate hadamard : nbElement = 500000, timePerElement = 0.360069 ns/el ± 0.0069826, elapsedTime = 180034 ns ± 3491.3
micro_benchmarkAutoNs : nbCallPerTest = 27
evaluate hadamard : nbElement = 1000000, timePerElement = 0.46339 ns/el ± 0.00800761, elapsedTime = 463390 ns ± 8007.61
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 5000000, timePerElement = 0.755135 ns/el ± 0.0125613, elapsedTime = 3.77567e+06 ns ± 62806.6
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 10000000, timePerElement = 0.794803 ns/el ± 0.0065402, elapsedTime = 7.94803e+06 ns ± 65402
[ 55%] Call gnuplot hadamard_gpuparBase
[ 55%] Built target plot_hadamard_gpuparBase
[ 63%] Built target perf_hadamard_gpupar_vectorize_O3
Scanning dependencies of target run_perf_hadamard_gpupar_vectorize_O3
[ 65%] Run perf_hadamard_gpupar_vectorize_O3 program
micro_benchmarkAutoNs : nbCallPerTest = 405
evaluate hadamard : nbElement = 1000, timePerElement = 31.965 ns/el ± 0.348029, elapsedTime = 31965 ns ± 348.029
micro_benchmarkAutoNs : nbCallPerTest = 395
evaluate hadamard : nbElement = 2000, timePerElement = 16.6194 ns/el ± 0.242264, elapsedTime = 33238.7 ns ± 484.528
micro_benchmarkAutoNs : nbCallPerTest = 389
evaluate hadamard : nbElement = 3000, timePerElement = 11.3259 ns/el ± 0.144268, elapsedTime = 33977.6 ns ± 432.804
micro_benchmarkAutoNs : nbCallPerTest = 384
evaluate hadamard : nbElement = 4000, timePerElement = 8.43383 ns/el ± 0.091065, elapsedTime = 33735.3 ns ± 364.26
micro_benchmarkAutoNs : nbCallPerTest = 381
evaluate hadamard : nbElement = 5000, timePerElement = 6.89169 ns/el ± 0.0789938, elapsedTime = 34458.4 ns ± 394.969
micro_benchmarkAutoNs : nbCallPerTest = 363
evaluate hadamard : nbElement = 10000, timePerElement = 3.64085 ns/el ± 0.0440259, elapsedTime = 36408.5 ns ± 440.259
micro_benchmarkAutoNs : nbCallPerTest = 276
evaluate hadamard : nbElement = 50000, timePerElement = 0.936963 ns/el ± 0.0122165, elapsedTime = 46848.1 ns ± 610.824
micro_benchmarkAutoNs : nbCallPerTest = 212
evaluate hadamard : nbElement = 100000, timePerElement = 0.612648 ns/el ± 0.00834856, elapsedTime = 61264.8 ns ± 834.856
micro_benchmarkAutoNs : nbCallPerTest = 144
evaluate hadamard : nbElement = 200000, timePerElement = 0.4457 ns/el ± 0.00602624, elapsedTime = 89140.1 ns ± 1205.25
micro_benchmarkAutoNs : nbCallPerTest = 74
evaluate hadamard : nbElement = 500000, timePerElement = 0.348136 ns/el ± 0.00475829, elapsedTime = 174068 ns ± 2379.15
micro_benchmarkAutoNs : nbCallPerTest = 28
evaluate hadamard : nbElement = 1000000, timePerElement = 0.461096 ns/el ± 0.0113995, elapsedTime = 461096 ns ± 11399.5
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 5000000, timePerElement = 0.79824 ns/el ± 0.0127848, elapsedTime = 3.9912e+06 ns ± 63923.8
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 10000000, timePerElement = 0.840527 ns/el ± 0.00425711, elapsedTime = 8.40527e+06 ns ± 42571.1
[ 65%] Built target run_perf_hadamard_gpupar_vectorize_O3
Scanning dependencies of target run_perf_hadamard_gpupar_O0
[ 68%] Built target run_perf_hadamard_gpupar_O0
Scanning dependencies of target run_perf_hadamard_gpupar_O1
[ 71%] Built target run_perf_hadamard_gpupar_O1
Scanning dependencies of target run_perf_hadamard_gpupar_O3
[ 73%] Built target run_perf_hadamard_gpupar_O3
Scanning dependencies of target run_perf_hadamard_gpupar_O2
[ 76%] Built target run_perf_hadamard_gpupar_O2
[ 84%] Built target perf_hadamard_gpupar_vectorize_O4
Scanning dependencies of target run_perf_hadamard_gpupar_vectorize_O4
[ 86%] Run perf_hadamard_gpupar_vectorize_O4 program
micro_benchmarkAutoNs : nbCallPerTest = 409
evaluate hadamard : nbElement = 1000, timePerElement = 32.1256 ns/el ± 0.419265, elapsedTime = 32125.6 ns ± 419.265
micro_benchmarkAutoNs : nbCallPerTest = 394
evaluate hadamard : nbElement = 2000, timePerElement = 16.5534 ns/el ± 0.192669, elapsedTime = 33106.8 ns ± 385.338
micro_benchmarkAutoNs : nbCallPerTest = 378
evaluate hadamard : nbElement = 3000, timePerElement = 11.2921 ns/el ± 0.171324, elapsedTime = 33876.3 ns ± 513.972
micro_benchmarkAutoNs : nbCallPerTest = 392
evaluate hadamard : nbElement = 4000, timePerElement = 8.49155 ns/el ± 0.109366, elapsedTime = 33966.2 ns ± 437.465
micro_benchmarkAutoNs : nbCallPerTest = 381
evaluate hadamard : nbElement = 5000, timePerElement = 6.89543 ns/el ± 0.0778528, elapsedTime = 34477.2 ns ± 389.264
micro_benchmarkAutoNs : nbCallPerTest = 362
evaluate hadamard : nbElement = 10000, timePerElement = 3.62296 ns/el ± 0.0365081, elapsedTime = 36229.6 ns ± 365.081
micro_benchmarkAutoNs : nbCallPerTest = 282
evaluate hadamard : nbElement = 50000, timePerElement = 0.933921 ns/el ± 0.00855928, elapsedTime = 46696 ns ± 427.964
micro_benchmarkAutoNs : nbCallPerTest = 208
evaluate hadamard : nbElement = 100000, timePerElement = 0.614975 ns/el ± 0.0108843, elapsedTime = 61497.5 ns ± 1088.43
micro_benchmarkAutoNs : nbCallPerTest = 125
evaluate hadamard : nbElement = 200000, timePerElement = 0.448284 ns/el ± 0.00809339, elapsedTime = 89656.7 ns ± 1618.68
micro_benchmarkAutoNs : nbCallPerTest = 73
evaluate hadamard : nbElement = 500000, timePerElement = 0.355934 ns/el ± 0.00690746, elapsedTime = 177967 ns ± 3453.73
micro_benchmarkAutoNs : nbCallPerTest = 28
evaluate hadamard : nbElement = 1000000, timePerElement = 0.463625 ns/el ± 0.0100701, elapsedTime = 463625 ns ± 10070.1
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 5000000, timePerElement = 0.79523 ns/el ± 0.0130826, elapsedTime = 3.97615e+06 ns ± 65412.8
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 10000000, timePerElement = 0.842765 ns/el ± 0.0055305, elapsedTime = 8.42765e+06 ns ± 55305
[ 86%] Built target run_perf_hadamard_gpupar_vectorize_O4
Scanning dependencies of target run_perf_hadamard_gpupar_O4
[ 89%] Built target run_perf_hadamard_gpupar_O4
Scanning dependencies of target run_all
[ 89%] Built target run_all
Scanning dependencies of target plot_thread
[ 89%] Built target plot_thread
Scanning dependencies of target plot_hadamard_gpuparVectorize
[ 92%] Call gnuplot hadamard_gpuparVectorize
[100%] Built target plot_hadamard_gpuparVectorize
Scanning dependencies of target plot_all
[100%] Built target plot_all
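
Each `evaluate hadamard` line in the log reports both a per-element time and a total elapsed time, and the two appear to be related by timePerElement = elapsedTime / nbElement. The following is a minimal sketch, not part of the benchmark tooling, that extracts those fields from one log line and checks the relation. The function name `parse_benchmark_line` and the dictionary keys are hypothetical, and the regular expression assumes the exact line format shown above.

```python
import re

# Pattern for the "evaluate hadamard" lines printed by the benchmark programs.
# Assumption: the format is exactly the one visible in the log above.
LINE_RE = re.compile(
    r"evaluate hadamard : nbElement = (?P<n>\d+), "
    r"timePerElement = (?P<tpe>[-+0-9.eE]+) ns/el ± (?P<tpe_err>[-+0-9.eE]+), "
    r"elapsedTime = (?P<total>[-+0-9.eE]+) ns ± (?P<total_err>[-+0-9.eE]+)"
)


def parse_benchmark_line(line):
    """Return the measurements of one log line as a dict, or None if it does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    return {
        "nb_element": int(match.group("n")),
        "time_per_element_ns": float(match.group("tpe")),
        "time_per_element_err_ns": float(match.group("tpe_err")),
        "elapsed_time_ns": float(match.group("total")),
        "elapsed_time_err_ns": float(match.group("total_err")),
    }


# One line copied from the -O4 run above.
sample = (
    "evaluate hadamard : nbElement = 5000000, timePerElement = 0.749144 ns/el "
    "± 0.00796663, elapsedTime = 3.74572e+06 ns ± 39833.2"
)
record = parse_benchmark_line(sample)

# Recompute the per-element time from the total time and the element count.
recomputed = record["elapsed_time_ns"] / record["nb_element"]
print(record["nb_element"], record["time_per_element_ns"], recomputed)
```

Lines that are not measurements, such as the `micro_benchmarkAutoNs` or `Built target` lines, simply yield `None`, so the same function could be applied to the whole log.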


Figure 7 shows the performance obtained on a CPU with 8 logical cores (4 hyper-threaded physical cores).


Figure 7 : Performance of our Hadamard product with G++. Left: total time. Right: time per element.



Figure 8 shows the performance obtained on the same kind of CPU, with 8 logical cores (4 hyper-threaded physical cores), with vectorization enabled.


Figure 8 : Performance of our vectorized Hadamard product with G++. Left: total time. Right: time per element.



The astute reader will have noticed that we used the G++ compiler rather than NVC++; the point here was simply to show that everything works.


At the very least, we can say that the -O3 and -O4 optimization levels are equivalent: their timings in the log above are nearly identical at most sizes.
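
To put a rough number on that comparison, one can compute the relative difference between the per-element times of the two builds. The sketch below uses four sizes copied from the non-vectorized -O3 and -O4 runs in the log; the helper name `relative_difference` is hypothetical, and the choice of sizes is only illustrative.

```python
# (nbElement, timePerElement at -O3, timePerElement at -O4), in ns/el,
# copied from the perf_hadamard_gpupar_O3 and perf_hadamard_gpupar_O4 runs above.
samples = [
    (1000, 31.321, 30.8812),
    (200000, 0.453412, 0.510854),
    (500000, 0.360069, 0.358167),
    (10000000, 0.794803, 0.786622),
]


def relative_difference(a, b):
    """Absolute difference between a and b, relative to the larger of the two."""
    return abs(a - b) / max(a, b)


# Map each size to the relative difference between the two builds.
differences = {n: relative_difference(t_o3, t_o4) for n, t_o3, t_o4 in samples}

for n, diff in differences.items():
    print(f"{n:>9} elements: {100 * diff:.2f} % difference")
```

For three of these sizes the gap is below 2 %. The 200,000-element point is the largest gap in the log, at a little over 10 %, and it is also where the -O4 measurement carries a noticeably larger uncertainty, so it is best read as measurement noise rather than a real difference between the two flags.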