3.4.7.1.2 : An example with GPU

This is more or less the whole point of the course, so we have to get to it sooner or later. If you want up-to-date information, the documentation is here.

Table 1 lists the machines and the GPUs installed in them (at least at the time of writing).

Server number	NVIDIA cards per server	Profile
001 to 003	2 x Tesla K80	Default
004	1 x Tesla V100	Training
005	1 x Quadro P6000	Default
006	4 x Tesla T4	Inference
007 and 008	3 x Ampere A100	Training
009	1 x Ampere A100	None (restricted access to LISTIC laboratory users)

Table 1: GPU resources available on MUST

On MUST, the nvc++ compiler is installed at /opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/bin/nvc++ on the GPU machines.

Let's connect to a GPU machine with the script helloworldgpu.sh:

#!/bin/sh

echo "Hello world Condor gpu ! from $(uname -a)"
# nvc++ version
/opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/bin/nvc++ --version
# cmake version
cmake3 --version
# make version
make --version
# g++ version
echo "Some GCC 7 I presume"
/opt/rh/devtoolset-7/root/usr/bin/x86_64-redhat-linux-g++ --version
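
A slightly more defensive variant of this script can make a missing tool obvious in the output file instead of leaving a cryptic shell error in the error file. The sketch below is only one possible way to do it: the helper print_version and the failures counter are hypothetical names that do not appear in the original script, and the paths are simply the ones used above.

```shell
#!/bin/sh

# Hypothetical helper (not part of the original script):
# print a label and the tool's version if the tool can be found,
# otherwise report it as missing and return a non-zero status.
print_version() {
    label="$1"
    tool="$2"
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$label: found at $(command -v "$tool")"
        "$tool" --version
    else
        echo "$label: NOT FOUND ($tool)"
        return 1
    fi
}

echo "Hello world Condor gpu ! from $(uname -a)"

# Count the checks that failed (tool missing, or --version not accepted)
failures=0
print_version "nvc++" /opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/bin/nvc++ || failures=$((failures + 1))
print_version "cmake" cmake3 || failures=$((failures + 1))
print_version "make" make || failures=$((failures + 1))
print_version "g++" /opt/rh/devtoolset-7/root/usr/bin/x86_64-redhat-linux-g++ || failures=$((failures + 1))

echo "$failures check(s) failed"
```

Each check is independent, so one missing compiler does not hide the state of the other tools.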

And with the configuration helloworldgpu.condor:

# Name of the executable
executable=helloworldgpu.sh
# Use Condor's vanilla universe (a plain job, with no special runtime)
universe=vanilla
# Standard output file
output=helloworldgpu.output
# Standard error file
error=helloworldgpu.error
# Condor log file
log=helloworldgpu.log
# Pass the submitter's environment to the job
getenv = True

# We only want one GPU
request_gpus = 1
# To target a specific GPU server, replace 007 with 001 to 009 according to your needs
requirements = machine == "lapp-wngpu007.in2p3.fr"

# Queue a single job
queue
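
Since the only thing that changes from one GPU server to another is the server number, the submit file can also be generated rather than edited by hand. The following is a minimal sketch under stated assumptions: the function make_submit and the per-server file names (helloworldgpu_005.condor and so on) are hypothetical and only serve the illustration; the submit commands themselves are the ones used above.

```shell
#!/bin/sh

# Hypothetical helper: print a submit description targeting one GPU server.
# $1 is the three-digit server number from Table 1, e.g. 005.
make_submit() {
    server="$1"
    cat <<EOF
executable = helloworldgpu.sh
universe = vanilla
output = helloworldgpu_${server}.output
error = helloworldgpu_${server}.error
log = helloworldgpu_${server}.log
getenv = True
request_gpus = 1
requirements = machine == "lapp-wngpu${server}.in2p3.fr"
queue
EOF
}

# Write a submit file for server 005 (file name is an assumption)
make_submit 005 > helloworldgpu_005.condor
```

The generated file can then be submitted with condor_submit like the hand-written one.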

Let's submit our job:

condor_submit helloworldgpu.condor
Submitting job(s).
1 job(s) submitted to cluster 9409.
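
Because the job may sit in the queue for a long time, it can be convenient to let the shell wait for it. The sketch below is one possible approach, not the method used in this course: extract_cluster is a hypothetical helper that pulls the cluster number out of the condor_submit message shown above, and the Condor commands are only run when condor_submit is actually available on the machine.

```shell
#!/bin/sh

# Hypothetical helper: read condor_submit's output on stdin and
# print the cluster number found in "... submitted to cluster N."
extract_cluster() {
    sed -n 's/.*submitted to cluster \([0-9][0-9]*\)\..*/\1/p'
}

# Only talk to Condor when its tools are installed
if command -v condor_submit >/dev/null 2>&1; then
    cluster=$(condor_submit helloworldgpu.condor | extract_cluster)
    echo "Submitted cluster $cluster"
    # Show the current state of the job in the queue
    condor_q "$cluster"
    # Block until the job recorded in the log file has finished
    condor_wait helloworldgpu.log
    # Display what the job wrote on its standard output
    cat helloworldgpu.output
fi
```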

Note: Apart from the fact that the job takes roughly a thousand years to run, we do get a result in the end, even without requesting a GPU on lapp-wngpu007.in2p3.fr or lapp-wngpu008.in2p3.fr (the output below comes from lapp-wngpu005.in2p3.fr).

Hello world Condor gpu ! from Linux lapp-wngpu005.in2p3.fr 3.10.0-1160.42.2.el7.x86_64 #1 SMP Tue Sep 7 14:49:57 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Some nvc++ version
Some other nvc++ version


nvc++ 21.9-0 64-bit target on x86-64 Linux -tp skylake 
NVIDIA Compilers and Tools
Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
Some cmake version
cmake3 version 3.17.5


CMake suite maintained and supported by Kitware (kitware.com/cmake).
Some make version
GNU Make 3.82
Construit pour x86_64-redhat-linux-gnu
Copyright (C) 2010  Free Software Foundation, Inc.
Licence GPLv3+ : GNU GPL version 3 ou ultérieure <http://gnu.org/licenses/gpl.html>
Ceci est un logiciel libre : vous êtes autorisé à le modifier et à la redistribuer.
Il ne comporte AUCUNE GARANTIE, dans la mesure de ce que permet la loi.
Some GCC 7 I presume
x86_64-redhat-linux-g++ (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5)
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE

There we go: we now know for certain that NVIDIA's HPC_SDK, cmake and make are installed (on lousy OSes such as CentOS, cmake is called cmake3 rather than cmake; this is an observation, not a reproach).
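
Since the name of the cmake executable depends on the OS, a script that has to run on several machines can pick whichever name exists. This is a minimal sketch, assuming only that the tool is called either cmake3 or cmake; the helper first_available and the CMAKE variable are hypothetical names introduced for the illustration.

```shell
#!/bin/sh

# Hypothetical helper: print the first command of the argument list
# that exists on this machine, or return non-zero if none does.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Prefer cmake3 (the CentOS name), fall back to plain cmake
CMAKE=$(first_available cmake3 cmake) || CMAKE=""

if [ -n "$CMAKE" ]; then
    "$CMAKE" --version
else
    echo "No cmake found"
fi
```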

Note: As a general rule, it is better to start a project by checking the obvious; that way, there are no nasty surprises at the moment when we expect everything to work.