3.4.7.1.7 : NVC++ Hadamard product on MUST
Let us start with a few small errors one can run into at the corner of the road (note: provided one is on a road that has corners, of course):
Since we compile in C++17, we cannot use an antique compiler (see here). And in this case, even G++ 7 is too old (note: admittedly it produces reasonably well-optimized binaries, but its C++17 support does not work). In my case, G++ 9 works very well.
Generally speaking, if you get errors of this kind:
"/opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/include-stdpar/thrust/mr/new.h", line 45: error: namespace "std" has no member "align_val_t"
      return ::operator new(bytes, std::align_val_t(alignment));
                                        ^
"/opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/include-stdpar/thrust/mr/new.h", line 67: error: namespace "std" has no member "align_val_t"
      ::operator delete(p, bytes, std::align_val_t(alignment));
                                       ^
it means that your C++ compiler does not support C++17.
That said, if you ask nvc++ 20.11-0 to compile in C++20, it does not manage that either. But the 2022 release is supposed to handle it.
Another trap is that the std::execution::par execution policy cannot run in parallel on GPUs whose Compute Capability is lower than 7.0 (it silently falls back to sequential execution).
Of course, the P6000 is only at 6.1. That is why we use std::execution::par_unseq:
"hadamard.cpp", line 23: error: Calls to function "std::transform(_EP &&, _FIt1, _FIt1, _FIt2, _FIt3, _BF) [with _EP=std::execution::parallel_policy &, _FIt1=const float *, _FIt2=const float *, _FIt3=float *, _BF=lambda [](float, float)->float]" with execution policy std::execution::par will run sequentially when compiled for a compute capability less than cc70; only std::execution::par_unseq can be run in parallel on such GPUs
      std::transform(std::execution::par, tabX, tabX + nbElement, tabY, tabRes,
      ^
condor_submit submit.condor
Submitting job(s).
1 job(s) submitted to cluster 9435.
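The submit.condor file itself is not shown here; for readers unfamiliar with HTCondor, a hypothetical minimal submit description of this shape would produce the hadamard_product_nvcpp.output and hadamard_product_nvcpp.error files read below (the executable name and log name are assumptions, not the course's actual file):

```
# Hypothetical minimal HTCondor submit description (the actual submit.condor is not shown)
executable   = run_hadamard.sh
output       = hadamard_product_nvcpp.output
error        = hadamard_product_nvcpp.error
log          = hadamard_product_nvcpp.log
request_gpus = 1
queue
```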
cat hadamard_product_nvcpp.output
Used machine is Linux lapp-wngpu005.in2p3.fr 3.10.0-1160.42.2.el7.x86_64 #1 SMP Tue Sep 7 14:49:57 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Remove existing directory build
-- The C compiler identification is GNU 4.8.5
-- The CXX compiler identification is PGI 21.9.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc - works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/bin/nvc++
-- Check for working CXX compiler: /opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/bin/nvc++ - works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Program HadamardProductNvcpp version 0.1.0
-- SELF_TESTS_MODE = yes
-- OptionParser not found
-- Find project OptionParser in /lapp_data/cta/paubert/TestCondor/COURS/HadamardProductNvcpp/tmp_project/OptionParser
-- Program OptionParser version 1.7.9
-- StringUtils not found
-- Find project StringUtils in /lapp_data/cta/paubert/TestCondor/COURS/HadamardProductNvcpp/tmp_project/OptionParser/tmp_project/StringUtils
-- Program StringUtils version 1.7.9
-- MicroBenchmark not found
-- Find project MicroBenchmark in /lapp_data/cta/paubert/TestCondor/COURS/HadamardProductNvcpp/tmp_project/MicroBenchmark
-- Program MicroBenchmark version 1.7.9
-- Activate mode to avoid performance test refreshing : NO_PERF_REFRESH = yes
-- EXTRA_DEPENDENCIES =
-- TensorAlloc not found
-- Find project TensorAlloc in /lapp_data/cta/paubert/TestCondor/COURS/HadamardProductNvcpp/tmp_project/TensorAlloc
-- Program TensorAlloc version 1.7.9
-- Automatic searching for architecture flags
-- Get LINUX extensions
-- Find SSSE3
-- Find SSE4
-- Find AVX
-- Find AVX2
-- Find AVX512F
-- tmp FLAG_VECTORIZED_COMPILATION = -mavx512f
-- global PHOENIX_FLAG_VECTORIZED_COMPILATION = -mavx512f
-- Automatic searching for architecture flags
-- Get LINUX extensions
-- Find SSSE3
-- Find SSE4
-- Find AVX
-- Find AVX2
-- Find AVX512F
-- CMAKE_VERSION = 3.17.5, MODE_NUNBER = '6'
-- CPU_MODEL_NAME = Intel(R) Xeon(R) Silver 4108 CPU @ 1.80GHz
-- CPU_SIBLINGS = 16
-- NB_CORE = 8
-- CACHE_L1_DATA = 0 B
-- CACHE_L1_INSTRUCTION = 0 B
-- CACHE_L2 = 0 B
-- CACHE_L3 = 0 B
-- ENDIANESS = LittleEndian
-- DataStream not found
-- Find project DataStream in /lapp_data/cta/paubert/TestCondor/COURS/HadamardProductNvcpp/tmp_project/TensorAlloc/tmp_project/DataStream
-- Program DataStream version 1.7.9
-- TensorAlloc PHOENIX_FLAG_VECTORIZED_COMPILATION = -mavx512f
-- enable GPU mode : GPU_MODE = yes
-- Use nvc++ compiler at /opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/bin/nvc++
-- Configuring done
-- Generating done
-- Build files have been written to: /lapp_data/cta/paubert/TestCondor/COURS/HadamardProductNvcpp/build
Scanning dependencies of target string_utils
[ 0%] Building CXX object tmp_project/OptionParser/tmp_project/StringUtils/src/CMakeFiles/string_utils.dir/PStream.cpp.o
[ 0%] Building CXX object tmp_project/OptionParser/tmp_project/StringUtils/src/CMakeFiles/string_utils.dir/PString.cpp.o
[ 1%] Building CXX object tmp_project/OptionParser/tmp_project/StringUtils/src/CMakeFiles/string_utils.dir/ProgressBarr.cpp.o
Scanning dependencies of target tensor_alloc
[ 2%] Building CXX object tmp_project/TensorAlloc/src/CMakeFiles/tensor_alloc.dir/PStride.cpp.o
[ 2%] Building CXX object tmp_project/TensorAlloc/src/CMakeFiles/tensor_alloc.dir/PTensor.cpp.o
[ 2%] Building CXX object tmp_project/TensorAlloc/src/CMakeFiles/tensor_alloc.dir/alignement_type.cpp.o
[ 2%] Building CXX object tmp_project/TensorAlloc/src/CMakeFiles/tensor_alloc.dir/pallocAlignedVector.cpp.o
[ 4%] Building CXX object tmp_project/TensorAlloc/src/CMakeFiles/tensor_alloc.dir/template_alloc.cpp.o
[ 4%] Linking CXX shared library libtensor_alloc.so
[ 4%] Built target tensor_alloc
Scanning dependencies of target micro_benchmark
[ 6%] Building CXX object tmp_project/MicroBenchmark/src/CMakeFiles/micro_benchmark.dir/micro_benchmark_common.cpp.o
[ 6%] Building CXX object tmp_project/MicroBenchmark/src/CMakeFiles/micro_benchmark.dir/micro_benchmark_ns.cpp.o
[ 6%] Building CXX object tmp_project/MicroBenchmark/src/CMakeFiles/micro_benchmark.dir/phoenix_timer.cpp.o
[ 6%] Building CXX object tmp_project/MicroBenchmark/src/CMakeFiles/micro_benchmark.dir/pin_thread_to_core.cpp.o
[ 9%] Linking CXX shared library libmicro_benchmark.so
[ 9%] Built target micro_benchmark
Scanning dependencies of target data_stream
[ 9%] Building CXX object tmp_project/TensorAlloc/tmp_project/DataStream/src/CMakeFiles/data_stream.dir/data_stream_file_simple_type.cpp.o
[ 9%] Building CXX object tmp_project/TensorAlloc/tmp_project/DataStream/src/CMakeFiles/data_stream.dir/data_stream.cpp.o
[ 9%] Building CXX object tmp_project/TensorAlloc/tmp_project/DataStream/src/CMakeFiles/data_stream.dir/data_stream_file.cpp.o
[ 11%] Building CXX object tmp_project/TensorAlloc/tmp_project/DataStream/src/CMakeFiles/data_stream.dir/data_stream_isSimpleType.cpp.o
[ 11%] Building CXX object tmp_project/TensorAlloc/tmp_project/DataStream/src/CMakeFiles/data_stream.dir/data_stream_message.cpp.o
[ 11%] Building CXX object tmp_project/TensorAlloc/tmp_project/DataStream/src/CMakeFiles/data_stream.dir/data_stream_size.cpp.o
[ 11%] Linking CXX shared library libdata_stream.so
[ 11%] Built target data_stream
Scanning dependencies of target perf_hadamard_gpu_stdpar_par_vectorize_O3
[ 11%] Building CXX object src/CMakeFiles/perf_hadamard_gpu_stdpar_par_vectorize_O3.dir/hadamard.cpp.o

[paubert@lappui7c HadamardProductNvcpp]$ cat hadamard_product_nvcpp.error
lscpu : option invalide -- 'B'
Try 'lscpu --help' for more information.
lscpu : option invalide -- 'B'
Try 'lscpu --help' for more information.
lscpu : option invalide -- 'B'
Try 'lscpu --help' for more information.
lscpu : option invalide -- 'B'
Try 'lscpu --help' for more information.
"/opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/include-stdpar/thrust/mr/new.h", line 45: error: namespace "std" has no member "align_val_t"
      return ::operator new(bytes, std::align_val_t(alignment));
                                        ^
"/opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/include-stdpar/thrust/mr/new.h", line 67: error: namespace "std" has no member "align_val_t"
      ::operator delete(p, bytes, std::align_val_t(alignment));
                                       ^
"hadamard.cpp", line 23: error: Calls to function "std::transform(_EP &&, _FIt1, _FIt1, _FIt2, _FIt3, _BF) [with _EP=std::execution::parallel_policy &, _FIt1=const float *, _FIt2=const float *, _FIt3=float *, _BF=lambda [](float, float)->float]" with execution policy std::execution::par will run sequentially when compiled for a compute capability less than cc70; only std::execution::par_unseq can be run in parallel on such GPUs
      std::transform(std::execution::par, tabX, tabX + nbElement, tabY, tabRes,
      ^
2 errors detected in the compilation of "hadamard.cpp".
make[3]: *** [src/CMakeFiles/perf_hadamard_gpu_stdpar_par_vectorize_O3.dir/hadamard.cpp.o] Erreur 2
make[2]: *** [src/CMakeFiles/perf_hadamard_gpu_stdpar_par_vectorize_O3.dir/all] Erreur 2
make[1]: *** [tmp_project/MicroBenchmark/CMakeFiles/run_all.dir/rule] Erreur 2
make: *** [run_all] Erreur 2
Well, at this point nvc++ is not being cooperative.
condor_submit submit.condor
Submitting job(s).
1 job(s) submitted to cluster 9438.