3.4.5.2 : Performance

make plot_all



Performance details
make plot_all
[  7%] Built target perf_hadamard_gpupar_O3
[ 15%] Built target perf_hadamard_gpupar_O0
[ 23%] Built target perf_hadamard_gpupar_O4
[ 31%] Built target perf_hadamard_gpupar_O1
[ 39%] Built target perf_hadamard_gpupar_O2
Scanning dependencies of target plot_hadamard_gpuparBase
[ 42%] Run perf_hadamard_gpupar_O4 program
micro_benchmarkAutoNs : nbCallPerTest = 399
evaluate hadamard : nbElement = 1000, timePerElement = 33.4132 ns/el ± 0.55279, elapsedTime = 33413.2 ns ± 552.79
micro_benchmarkAutoNs : nbCallPerTest = 381
evaluate hadamard : nbElement = 2000, timePerElement = 17.2905 ns/el ± 0.255431, elapsedTime = 34581 ns ± 510.862
micro_benchmarkAutoNs : nbCallPerTest = 372
evaluate hadamard : nbElement = 3000, timePerElement = 11.7511 ns/el ± 0.18808, elapsedTime = 35253.2 ns ± 564.24
micro_benchmarkAutoNs : nbCallPerTest = 373
evaluate hadamard : nbElement = 4000, timePerElement = 8.71583 ns/el ± 0.107584, elapsedTime = 34863.3 ns ± 430.334
micro_benchmarkAutoNs : nbCallPerTest = 369
evaluate hadamard : nbElement = 5000, timePerElement = 7.15768 ns/el ± 0.11584, elapsedTime = 35788.4 ns ± 579.202
micro_benchmarkAutoNs : nbCallPerTest = 351
evaluate hadamard : nbElement = 10000, timePerElement = 3.71963 ns/el ± 0.061178, elapsedTime = 37196.3 ns ± 611.78
micro_benchmarkAutoNs : nbCallPerTest = 285
evaluate hadamard : nbElement = 50000, timePerElement = 0.921453 ns/el ± 0.00992434, elapsedTime = 46072.6 ns ± 496.217
micro_benchmarkAutoNs : nbCallPerTest = 220
evaluate hadamard : nbElement = 100000, timePerElement = 0.594931 ns/el ± 0.00881871, elapsedTime = 59493.1 ns ± 881.871
micro_benchmarkAutoNs : nbCallPerTest = 150
evaluate hadamard : nbElement = 200000, timePerElement = 0.429775 ns/el ± 0.00591954, elapsedTime = 85955 ns ± 1183.91
micro_benchmarkAutoNs : nbCallPerTest = 77
evaluate hadamard : nbElement = 500000, timePerElement = 0.34063 ns/el ± 0.00723432, elapsedTime = 170315 ns ± 3617.16
micro_benchmarkAutoNs : nbCallPerTest = 28
evaluate hadamard : nbElement = 1000000, timePerElement = 0.461107 ns/el ± 0.0127116, elapsedTime = 461107 ns ± 12711.6
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 5000000, timePerElement = 0.778033 ns/el ± 0.0118725, elapsedTime = 3.89017e+06 ns ± 59362.4
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 10000000, timePerElement = 0.817238 ns/el ± 0.00523804, elapsedTime = 8.17238e+06 ns ± 52380.4
[ 44%] Run perf_hadamard_gpupar_O0 program
micro_benchmarkAutoNs : nbCallPerTest = 286
evaluate hadamard : nbElement = 1000, timePerElement = 45.8364 ns/el ± 0.657482, elapsedTime = 45836.4 ns ± 657.482
micro_benchmarkAutoNs : nbCallPerTest = 262
evaluate hadamard : nbElement = 2000, timePerElement = 24.8945 ns/el ± 0.362912, elapsedTime = 49788.9 ns ± 725.824
micro_benchmarkAutoNs : nbCallPerTest = 245
evaluate hadamard : nbElement = 3000, timePerElement = 17.6691 ns/el ± 0.22554, elapsedTime = 53007.4 ns ± 676.621
micro_benchmarkAutoNs : nbCallPerTest = 236
evaluate hadamard : nbElement = 4000, timePerElement = 13.9821 ns/el ± 0.207076, elapsedTime = 55928.2 ns ± 828.303
micro_benchmarkAutoNs : nbCallPerTest = 221
evaluate hadamard : nbElement = 5000, timePerElement = 11.7911 ns/el ± 0.179819, elapsedTime = 58955.4 ns ± 899.093
micro_benchmarkAutoNs : nbCallPerTest = 185
evaluate hadamard : nbElement = 10000, timePerElement = 7.04588 ns/el ± 0.092295, elapsedTime = 70458.8 ns ± 922.95
micro_benchmarkAutoNs : nbCallPerTest = 80
evaluate hadamard : nbElement = 50000, timePerElement = 3.23589 ns/el ± 0.0464096, elapsedTime = 161795 ns ± 2320.48
micro_benchmarkAutoNs : nbCallPerTest = 48
evaluate hadamard : nbElement = 100000, timePerElement = 2.65516 ns/el ± 0.0328916, elapsedTime = 265516 ns ± 3289.16
micro_benchmarkAutoNs : nbCallPerTest = 27
evaluate hadamard : nbElement = 200000, timePerElement = 2.3489 ns/el ± 0.0396336, elapsedTime = 469780 ns ± 7926.72
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 500000, timePerElement = 2.14414 ns/el ± 0.0205214, elapsedTime = 1.07207e+06 ns ± 10260.7
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 1000000, timePerElement = 2.10961 ns/el ± 0.0332288, elapsedTime = 2.10961e+06 ns ± 33228.8
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 5000000, timePerElement = 2.07867 ns/el ± 0.00696241, elapsedTime = 1.03933e+07 ns ± 34812.1
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 10000000, timePerElement = 2.06289 ns/el ± 0.0069469, elapsedTime = 2.06289e+07 ns ± 69469
[ 47%] Run perf_hadamard_gpupar_O1 program
micro_benchmarkAutoNs : nbCallPerTest = 431
evaluate hadamard : nbElement = 1000, timePerElement = 30.8522 ns/el ± 0.359313, elapsedTime = 30852.2 ns ± 359.313
micro_benchmarkAutoNs : nbCallPerTest = 418
evaluate hadamard : nbElement = 2000, timePerElement = 15.7892 ns/el ± 0.196957, elapsedTime = 31578.4 ns ± 393.913
micro_benchmarkAutoNs : nbCallPerTest = 400
evaluate hadamard : nbElement = 3000, timePerElement = 10.9012 ns/el ± 0.166795, elapsedTime = 32703.7 ns ± 500.385
micro_benchmarkAutoNs : nbCallPerTest = 396
evaluate hadamard : nbElement = 4000, timePerElement = 8.36214 ns/el ± 0.130573, elapsedTime = 33448.6 ns ± 522.29
micro_benchmarkAutoNs : nbCallPerTest = 384
evaluate hadamard : nbElement = 5000, timePerElement = 6.88175 ns/el ± 0.110712, elapsedTime = 34408.7 ns ± 553.56
micro_benchmarkAutoNs : nbCallPerTest = 304
evaluate hadamard : nbElement = 10000, timePerElement = 3.82988 ns/el ± 0.0610476, elapsedTime = 38298.8 ns ± 610.476
micro_benchmarkAutoNs : nbCallPerTest = 190
evaluate hadamard : nbElement = 50000, timePerElement = 1.37675 ns/el ± 0.0379184, elapsedTime = 68837.4 ns ± 1895.92
micro_benchmarkAutoNs : nbCallPerTest = 125
evaluate hadamard : nbElement = 100000, timePerElement = 0.961271 ns/el ± 0.0158151, elapsedTime = 96127.1 ns ± 1581.51
micro_benchmarkAutoNs : nbCallPerTest = 87
evaluate hadamard : nbElement = 200000, timePerElement = 0.747319 ns/el ± 0.0116808, elapsedTime = 149464 ns ± 2336.17
micro_benchmarkAutoNs : nbCallPerTest = 42
evaluate hadamard : nbElement = 500000, timePerElement = 0.613129 ns/el ± 0.00657316, elapsedTime = 306564 ns ± 3286.58
micro_benchmarkAutoNs : nbCallPerTest = 21
evaluate hadamard : nbElement = 1000000, timePerElement = 0.620131 ns/el ± 0.00786868, elapsedTime = 620131 ns ± 7868.68
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 5000000, timePerElement = 0.776533 ns/el ± 0.0140583, elapsedTime = 3.88267e+06 ns ± 70291.3
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 10000000, timePerElement = 0.811779 ns/el ± 0.00481845, elapsedTime = 8.11779e+06 ns ± 48184.5
[ 50%] Run perf_hadamard_gpupar_O2 program
micro_benchmarkAutoNs : nbCallPerTest = 393
evaluate hadamard : nbElement = 1000, timePerElement = 33.6331 ns/el ± 0.491203, elapsedTime = 33633.1 ns ± 491.203
micro_benchmarkAutoNs : nbCallPerTest = 377
evaluate hadamard : nbElement = 2000, timePerElement = 17.3284 ns/el ± 0.201045, elapsedTime = 34656.8 ns ± 402.09
micro_benchmarkAutoNs : nbCallPerTest = 369
evaluate hadamard : nbElement = 3000, timePerElement = 11.8028 ns/el ± 0.184893, elapsedTime = 35408.5 ns ± 554.678
micro_benchmarkAutoNs : nbCallPerTest = 367
evaluate hadamard : nbElement = 4000, timePerElement = 8.73433 ns/el ± 0.115124, elapsedTime = 34937.3 ns ± 460.496
micro_benchmarkAutoNs : nbCallPerTest = 370
evaluate hadamard : nbElement = 5000, timePerElement = 7.15674 ns/el ± 0.0900864, elapsedTime = 35783.7 ns ± 450.432
micro_benchmarkAutoNs : nbCallPerTest = 354
evaluate hadamard : nbElement = 10000, timePerElement = 3.69594 ns/el ± 0.0417309, elapsedTime = 36959.4 ns ± 417.309
micro_benchmarkAutoNs : nbCallPerTest = 283
evaluate hadamard : nbElement = 50000, timePerElement = 0.915761 ns/el ± 0.0119915, elapsedTime = 45788.1 ns ± 599.573
micro_benchmarkAutoNs : nbCallPerTest = 216
evaluate hadamard : nbElement = 100000, timePerElement = 0.593649 ns/el ± 0.00802338, elapsedTime = 59364.9 ns ± 802.338
micro_benchmarkAutoNs : nbCallPerTest = 151
evaluate hadamard : nbElement = 200000, timePerElement = 0.430794 ns/el ± 0.00523984, elapsedTime = 86158.9 ns ± 1047.97
micro_benchmarkAutoNs : nbCallPerTest = 76
evaluate hadamard : nbElement = 500000, timePerElement = 0.343299 ns/el ± 0.00610536, elapsedTime = 171649 ns ± 3052.68
micro_benchmarkAutoNs : nbCallPerTest = 28
evaluate hadamard : nbElement = 1000000, timePerElement = 0.458034 ns/el ± 0.00759681, elapsedTime = 458034 ns ± 7596.81
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 5000000, timePerElement = 0.776961 ns/el ± 0.011909, elapsedTime = 3.8848e+06 ns ± 59545
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 10000000, timePerElement = 0.823439 ns/el ± 0.00940426, elapsedTime = 8.23439e+06 ns ± 94042.6
[ 52%] Run perf_hadamard_gpupar_O3 program
micro_benchmarkAutoNs : nbCallPerTest = 392
evaluate hadamard : nbElement = 1000, timePerElement = 33.4706 ns/el ± 0.331422, elapsedTime = 33470.6 ns ± 331.422
micro_benchmarkAutoNs : nbCallPerTest = 372
evaluate hadamard : nbElement = 2000, timePerElement = 17.298 ns/el ± 0.217674, elapsedTime = 34596 ns ± 435.349
micro_benchmarkAutoNs : nbCallPerTest = 368
evaluate hadamard : nbElement = 3000, timePerElement = 11.7878 ns/el ± 0.181141, elapsedTime = 35363.3 ns ± 543.424
micro_benchmarkAutoNs : nbCallPerTest = 376
evaluate hadamard : nbElement = 4000, timePerElement = 8.78477 ns/el ± 0.136388, elapsedTime = 35139.1 ns ± 545.551
micro_benchmarkAutoNs : nbCallPerTest = 368
evaluate hadamard : nbElement = 5000, timePerElement = 7.14936 ns/el ± 0.0896933, elapsedTime = 35746.8 ns ± 448.466
micro_benchmarkAutoNs : nbCallPerTest = 351
evaluate hadamard : nbElement = 10000, timePerElement = 3.69523 ns/el ± 0.0632912, elapsedTime = 36952.3 ns ± 632.912
micro_benchmarkAutoNs : nbCallPerTest = 285
evaluate hadamard : nbElement = 50000, timePerElement = 0.917107 ns/el ± 0.00936547, elapsedTime = 45855.4 ns ± 468.273
micro_benchmarkAutoNs : nbCallPerTest = 219
evaluate hadamard : nbElement = 100000, timePerElement = 0.594568 ns/el ± 0.00711747, elapsedTime = 59456.8 ns ± 711.747
micro_benchmarkAutoNs : nbCallPerTest = 149
evaluate hadamard : nbElement = 200000, timePerElement = 0.429941 ns/el ± 0.00621204, elapsedTime = 85988.2 ns ± 1242.41
micro_benchmarkAutoNs : nbCallPerTest = 77
evaluate hadamard : nbElement = 500000, timePerElement = 0.336579 ns/el ± 0.00577335, elapsedTime = 168289 ns ± 2886.67
micro_benchmarkAutoNs : nbCallPerTest = 28
evaluate hadamard : nbElement = 1000000, timePerElement = 0.457861 ns/el ± 0.00813642, elapsedTime = 457861 ns ± 8136.42
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 5000000, timePerElement = 0.779107 ns/el ± 0.0134428, elapsedTime = 3.89554e+06 ns ± 67213.8
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 10000000, timePerElement = 0.816606 ns/el ± 0.00473954, elapsedTime = 8.16606e+06 ns ± 47395.4
[ 55%] Call gnuplot hadamard_gpuparBase
[ 55%] Built target plot_hadamard_gpuparBase
[ 63%] Built target perf_hadamard_gpupar_vectorize_O3
Scanning dependencies of target run_perf_hadamard_gpupar_vectorize_O3
[ 65%] Run perf_hadamard_gpupar_vectorize_O3 program
micro_benchmarkAutoNs : nbCallPerTest = 392
evaluate hadamard : nbElement = 1000, timePerElement = 33.4542 ns/el ± 0.391268, elapsedTime = 33454.2 ns ± 391.268
micro_benchmarkAutoNs : nbCallPerTest = 380
evaluate hadamard : nbElement = 2000, timePerElement = 17.3171 ns/el ± 0.225818, elapsedTime = 34634.2 ns ± 451.635
micro_benchmarkAutoNs : nbCallPerTest = 372
evaluate hadamard : nbElement = 3000, timePerElement = 11.7386 ns/el ± 0.137196, elapsedTime = 35215.7 ns ± 411.588
micro_benchmarkAutoNs : nbCallPerTest = 377
evaluate hadamard : nbElement = 4000, timePerElement = 8.74554 ns/el ± 0.128382, elapsedTime = 34982.1 ns ± 513.526
micro_benchmarkAutoNs : nbCallPerTest = 369
evaluate hadamard : nbElement = 5000, timePerElement = 7.13623 ns/el ± 0.0971973, elapsedTime = 35681.1 ns ± 485.987
micro_benchmarkAutoNs : nbCallPerTest = 350
evaluate hadamard : nbElement = 10000, timePerElement = 3.71016 ns/el ± 0.0438853, elapsedTime = 37101.6 ns ± 438.853
micro_benchmarkAutoNs : nbCallPerTest = 285
evaluate hadamard : nbElement = 50000, timePerElement = 0.916567 ns/el ± 0.00980597, elapsedTime = 45828.4 ns ± 490.299
micro_benchmarkAutoNs : nbCallPerTest = 220
evaluate hadamard : nbElement = 100000, timePerElement = 0.593687 ns/el ± 0.007022, elapsedTime = 59368.7 ns ± 702.2
micro_benchmarkAutoNs : nbCallPerTest = 150
evaluate hadamard : nbElement = 200000, timePerElement = 0.430899 ns/el ± 0.00625281, elapsedTime = 86179.8 ns ± 1250.56
micro_benchmarkAutoNs : nbCallPerTest = 78
evaluate hadamard : nbElement = 500000, timePerElement = 0.333645 ns/el ± 0.00555386, elapsedTime = 166822 ns ± 2776.93
micro_benchmarkAutoNs : nbCallPerTest = 28
evaluate hadamard : nbElement = 1000000, timePerElement = 0.461771 ns/el ± 0.0080523, elapsedTime = 461771 ns ± 8052.3
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 5000000, timePerElement = 0.778288 ns/el ± 0.0110582, elapsedTime = 3.89144e+06 ns ± 55290.8
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 10000000, timePerElement = 0.815769 ns/el ± 0.00490741, elapsedTime = 8.15769e+06 ns ± 49074.1
[ 65%] Built target run_perf_hadamard_gpupar_vectorize_O3
Scanning dependencies of target run_perf_hadamard_gpupar_O0
[ 68%] Built target run_perf_hadamard_gpupar_O0
Scanning dependencies of target run_perf_hadamard_gpupar_O1
[ 71%] Built target run_perf_hadamard_gpupar_O1
Scanning dependencies of target run_perf_hadamard_gpupar_O3
[ 73%] Built target run_perf_hadamard_gpupar_O3
Scanning dependencies of target run_perf_hadamard_gpupar_O2
[ 76%] Built target run_perf_hadamard_gpupar_O2
[ 84%] Built target perf_hadamard_gpupar_vectorize_O4
Scanning dependencies of target run_perf_hadamard_gpupar_vectorize_O4
[ 86%] Run perf_hadamard_gpupar_vectorize_O4 program
micro_benchmarkAutoNs : nbCallPerTest = 383
evaluate hadamard : nbElement = 1000, timePerElement = 33.5663 ns/el ± 0.36282, elapsedTime = 33566.3 ns ± 362.82
micro_benchmarkAutoNs : nbCallPerTest = 374
evaluate hadamard : nbElement = 2000, timePerElement = 17.3563 ns/el ± 0.214874, elapsedTime = 34712.7 ns ± 429.747
micro_benchmarkAutoNs : nbCallPerTest = 369
evaluate hadamard : nbElement = 3000, timePerElement = 11.8179 ns/el ± 0.175017, elapsedTime = 35453.7 ns ± 525.051
micro_benchmarkAutoNs : nbCallPerTest = 375
evaluate hadamard : nbElement = 4000, timePerElement = 8.76842 ns/el ± 0.111842, elapsedTime = 35073.7 ns ± 447.367
micro_benchmarkAutoNs : nbCallPerTest = 369
evaluate hadamard : nbElement = 5000, timePerElement = 7.17709 ns/el ± 0.0918638, elapsedTime = 35885.4 ns ± 459.319
micro_benchmarkAutoNs : nbCallPerTest = 353
evaluate hadamard : nbElement = 10000, timePerElement = 3.72602 ns/el ± 0.0513345, elapsedTime = 37260.2 ns ± 513.345
micro_benchmarkAutoNs : nbCallPerTest = 282
evaluate hadamard : nbElement = 50000, timePerElement = 0.928783 ns/el ± 0.0126429, elapsedTime = 46439.1 ns ± 632.143
micro_benchmarkAutoNs : nbCallPerTest = 216
evaluate hadamard : nbElement = 100000, timePerElement = 0.598947 ns/el ± 0.00882292, elapsedTime = 59894.7 ns ± 882.292
micro_benchmarkAutoNs : nbCallPerTest = 150
evaluate hadamard : nbElement = 200000, timePerElement = 0.431421 ns/el ± 0.00684615, elapsedTime = 86284.2 ns ± 1369.23
micro_benchmarkAutoNs : nbCallPerTest = 77
evaluate hadamard : nbElement = 500000, timePerElement = 0.336947 ns/el ± 0.00536625, elapsedTime = 168474 ns ± 2683.13
micro_benchmarkAutoNs : nbCallPerTest = 28
evaluate hadamard : nbElement = 1000000, timePerElement = 0.461204 ns/el ± 0.0119023, elapsedTime = 461204 ns ± 11902.3
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 5000000, timePerElement = 0.775994 ns/el ± 0.0128051, elapsedTime = 3.87997e+06 ns ± 64025.7
micro_benchmarkAutoNs : nbCallPerTest = 10
evaluate hadamard : nbElement = 10000000, timePerElement = 0.819109 ns/el ± 0.00614954, elapsedTime = 8.19109e+06 ns ± 61495.4
[ 86%] Built target run_perf_hadamard_gpupar_vectorize_O4
Scanning dependencies of target run_perf_hadamard_gpupar_O4
[ 89%] Built target run_perf_hadamard_gpupar_O4
Scanning dependencies of target run_all
[ 89%] Built target run_all
Scanning dependencies of target plot_thread
[ 89%] Built target plot_thread
Scanning dependencies of target plot_hadamard_gpuparVectorize
[ 92%] Call gnuplot hadamard_gpuparVectorize
[100%] Built target plot_hadamard_gpuparVectorize
Scanning dependencies of target plot_all
[100%] Built target plot_all
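
Each "evaluate hadamard" line of the log reports three linked quantities: nbElement, timePerElement and elapsedTime, where elapsedTime should equal nbElement times timePerElement. The sketch below is one way to read such a line and check that relation. It is not part of the benchmark code: the names PerfSample, readValue, parsePerfLine and consistencyGap are hypothetical helpers introduced for illustration only.

```cpp
#include <cmath>
#include <cstddef>
#include <exception>
#include <optional>
#include <string>

// One "evaluate hadamard" line of the benchmark log, with times in nanoseconds.
struct PerfSample {
    double nbElement;       // number of elements processed per call
    double timePerElement;  // reported time per element (ns/el)
    double elapsedTime;     // reported time per call (ns)
};

// Returns the number that directly follows `key` in `line`, if any.
std::optional<double> readValue(const std::string& line, const std::string& key) {
    const std::size_t pos = line.find(key);
    if (pos == std::string::npos) { return std::nullopt; }
    try {
        // std::stod stops at the first character that is not part of the number,
        // so the trailing unit and the "±" part are ignored.
        return std::stod(line.substr(pos + key.size()));
    } catch (const std::exception&) {
        return std::nullopt;
    }
}

// Parses one log line; returns std::nullopt for lines that are not measurements
// (build progress lines, "micro_benchmarkAutoNs" lines, and so on).
std::optional<PerfSample> parsePerfLine(const std::string& line) {
    const auto nb = readValue(line, "nbElement = ");
    const auto perElement = readValue(line, "timePerElement = ");
    const auto total = readValue(line, "elapsedTime = ");
    if (!nb || !perElement || !total) { return std::nullopt; }
    return PerfSample{*nb, *perElement, *total};
}

// Relative gap between elapsedTime and nbElement * timePerElement.
double consistencyGap(const PerfSample& s) {
    return std::fabs(s.nbElement * s.timePerElement - s.elapsedTime) / s.elapsedTime;
}
```

Applied to the lines of the log above, consistencyGap should stay very small, since both times are printed with about six significant digits.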


Figure 9 shows the performance obtained on a CPU with 8 logical cores (4 physical cores with hyperthreading).


Figure 9: Performance of our Hadamard product with NVC++. Left: total time. Right: time per element.
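
To put numbers on these curves, one can compute the speed-up of each target relative to the O0 target from the timePerElement values printed in the log. The sketch below hard-codes the values reported for nbElement = 10000000. The names LevelTiming, kTimings and speedup are illustrative and do not come from the benchmark code.

```cpp
#include <array>
#include <string_view>

struct LevelTiming {
    std::string_view target;  // target suffix as it appears in the log
    double nsPerElement;      // timePerElement at nbElement = 10000000
};

// Values copied from the log (lines with nbElement = 10000000).
constexpr std::array<LevelTiming, 5> kTimings{{
    {"O0", 2.06289},
    {"O1", 0.811779},
    {"O2", 0.823439},
    {"O3", 0.816606},
    {"O4", 0.817238},
}};

// Speed-up of `level` relative to `reference`: above 1 means `level` is faster.
constexpr double speedup(const LevelTiming& reference, const LevelTiming& level) {
    return reference.nsPerElement / level.nsPerElement;
}
```

With these values, every optimised target comes out roughly 2.5 times faster than O0 at this size, and O1 to O4 differ from one another by less than 2%.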



Figure 10 shows the performance obtained on the same kind of CPU, with 8 logical cores (4 physical cores with hyperthreading), when vectorization is enabled.


Figure 10: Performance of our vectorized Hadamard product with NVC++. Left: total time. Right: time per element.
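
When comparing a vectorized target with its base counterpart, the "±" values printed in the log help decide whether a difference is meaningful. The sketch below treats two measurements as compatible when they differ by less than their combined uncertainty. This rests on an assumption the log does not confirm: that the "±" values behave like independent standard deviations. The names Measurement and compatible are hypothetical.

```cpp
#include <cmath>

struct Measurement {
    double value;        // timePerElement in ns/el
    double uncertainty;  // the "±" value printed next to it
};

// True when the two values differ by less than their combined uncertainty,
// ASSUMING the "±" values are independent, standard-deviation-like errors.
bool compatible(const Measurement& a, const Measurement& b) {
    const double combined = std::hypot(a.uncertainty, b.uncertainty);
    return std::fabs(a.value - b.value) < combined;
}
```

For nbElement = 10000000, the O3 target (0.816606 ± 0.00473954) and the vectorize_O3 target (0.815769 ± 0.00490741) are compatible under this criterion, whereas O0 (2.06289 ± 0.0069469) and O3 are not.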