Part 5: Development and reproducibility (WIP)
dependencies, compilers, environments, tools, versioning, containers and pixi
OK, let's start from the very beginning. As a developer or a scientist, you want to work on your laptop, which is perfectly fine and understandable. It is easier and faster: you have the libraries you want, the installation you want, the tools you want. Sounds great.
Then you have something working on your laptop. You get some results, but the computations take longer and longer, and at some point this is no longer sustainable.
So you want to use a computing center. This is a good choice. But you will face some problems you did not expect.
- The cmake command does not exist
- OK, wait, no. It is just called cmake3 and not cmake because why not
- Ah, but it is quite old. OK, let's downgrade the build configuration. If you can. But let's pretend you are lucky and you do not need recent cmake features
- make is not the same version either, but this is OK too. And it is always nice to get lucky sometimes
- But wait, the compilation starts and then everything explodes for some reason. What happened?
- After several minutes, you finally realise that the G++ compiler used by cmake (sorry, cmake3) is very old. You find out, like an archaeologist, that you are using G++4. This is really far from your expectations, since you dreamed of using G++15 or at least G++11.
- G++4 has not been maintained since 2011 (yes, 14 years ago as I write these lines)
- You have to find a more recent compiler, which is technically not that difficult since almost anything is more recent than G++4, but you really need a compiler that is adequate for what you want to do.
- Let's pretend again that you are lucky: after a look at the documentation of the computing center and several mails to your favorite system administrator, you find out that some not-so-old compilers are installed in /opt/rh/devtoolset-7/root/usr/bin/x86_64-redhat-linux-g++
- Once again, you realise that you are lucky, for two reasons:
- First, you do not have to recompile a recent G++ on the computing center, which saves you days of work and preserves your mental health
- Second, you can use G++7 because your program does not use C++20 features; otherwise you would need G++11 or later.
- OK, you do not recognize your compilation script anymore because you went from cmake .. to cmake3 .. -DCMAKE_C_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/x86_64-redhat-linux-gcc -DCMAKE_CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/x86_64-redhat-linux-g++ but that's OK
- And I have to tell you that you are once again lucky, because you are using HDF5, which has hardly changed in about 15 years, and you do not use blosc with it, so your program compiles!!! This is a success
- Now you can submit all the jobs you want!!!
- Ah, wait, you have a crash
- OK, this is your fault, because you forgot to change the output file of your jobs, so they all try to write to the same HDF5 file. But again, you are lucky, because HDF5 prevents this behaviour and makes your jobs crash. If you were using an ugly text file as a result file, you would get random crashes and crazy results with no clue.
- Finally everything works, your jobs are running, and you can leave the laboratory and get some sleep (not much, because it is 4 am and you realise that next time it might be better to wait until the next day and fix everything with some rest and fresh ideas, but too late. At least, this is working).
- A few days later, you chat with your colleagues at the coffee break and some of them ask you: "Why did you submit your jobs the hard way? This could be so much simpler"
- What????!!!! How could this be possible?
- OK, let's have a talk...
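The crash in the story came from every job writing to the same HDF5 file. Here is a minimal Python sketch of one way to avoid it: build the output file name from the job identifier. It assumes the computing center uses Slurm, which sets the SLURM_JOB_ID and SLURM_ARRAY_TASK_ID environment variables; the story does not say which scheduler is involved. The helper name job_output_path and the "result" prefix are hypothetical, chosen only for illustration.

```python
import os
from pathlib import Path


def job_output_path(output_dir, prefix="result", env=None):
    """Return an output path that differs from one job to the next.

    Assumption: the scheduler is Slurm and exports SLURM_JOB_ID
    (and SLURM_ARRAY_TASK_ID inside a job array).
    Outside of a job, fall back to the process id.
    """
    env = os.environ if env is None else env
    job_id = env.get("SLURM_JOB_ID")
    task_id = env.get("SLURM_ARRAY_TASK_ID")

    if job_id is None:
        # Not running under the scheduler: use the process id instead.
        tag = f"pid{os.getpid()}"
    elif task_id is None:
        tag = f"job{job_id}"
    else:
        tag = f"job{job_id}_task{task_id}"

    return Path(output_dir) / f"{prefix}_{tag}.h5"
```

With this, two tasks of the same job array get different file names (for example result_job42_task1.h5 and result_job42_task2.h5) instead of fighting over a single file.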
People from data centers want stability, so they do not upgrade the OS of their computing center every 5 years because this is, of course, much more work than upgrading a laptop. (Note: at this point it seems that they have made a mistake by confusing stable with old, which are completely different concepts. But upgrading a computing center is very complex, so you have to understand them.)
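This is also why the configure step in the story had to be rewritten: on a machine with an old OS, you cannot assume the name of the cmake executable or the default compiler. Here is a minimal Python sketch, under stated assumptions, that looks for cmake or cmake3 and builds the configure command with explicit compilers. The helpers find_cmake and configure_command are hypothetical names; the compiler paths are the ones from the story and will differ on another computing center.

```python
import shutil


def find_cmake(candidates=("cmake", "cmake3")):
    """Return the path of the first cmake executable found in PATH."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    raise FileNotFoundError(f"none of {candidates} found in PATH")


def configure_command(cmake, source_dir="..", c_compiler=None, cxx_compiler=None):
    """Build the cmake configure command as a list of arguments."""
    cmd = [cmake, source_dir]
    # Only force a compiler when one is given, otherwise let cmake choose.
    if c_compiler:
        cmd.append(f"-DCMAKE_C_COMPILER={c_compiler}")
    if cxx_compiler:
        cmd.append(f"-DCMAKE_CXX_COMPILER={cxx_compiler}")
    return cmd


if __name__ == "__main__":
    # Paths taken from the story; adapt them to your computing center.
    toolset = "/opt/rh/devtoolset-7/root/usr/bin"
    print(configure_command(
        "cmake3",
        "..",
        c_compiler=f"{toolset}/x86_64-redhat-linux-gcc",
        cxx_compiler=f"{toolset}/x86_64-redhat-linux-g++",
    ))
```

The returned list can be handed to subprocess.run from the build directory. Keeping the paths as parameters means the same script still works on the laptop, where plain cmake and the default compiler are fine.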
Let's continue our story.
- Let's pretend you start a new project and, on the advice of your colleagues, you use conda environments.
- At first, it sounds great. You do not have to bother administrators to install or update programs or libraries. You can have nearly the same environment on your laptop and on your favorite computing center. Moreover, you get G++15 in a reasonable path. You create the environment variables CC and CXX (or FC for Fortran) and you can work nicely on your laptop.
- After some weeks you start to hate conda because it is quite broken and does not give you proper error messages. So you switch to micromamba, which is way better than conda. It is standalone and developed in C++, not Python (note: yes, Python is very slow as a matter of fact, even for snails).
- Everything goes well on your laptop, and you want to start your project on your favorite computing center
- And then, you are lucky again, because you switched to micromamba just before moving your project to the computing center. Otherwise the default conda environment would kill the file system: tens of thousands of small files are a killer for a distributed file system.
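To get a feeling for the small-files problem before copying an environment to a shared file system, you can simply count what is inside it. Here is a minimal Python sketch: it walks a directory and returns the total number of files and how many are below a size threshold. The function name count_files and the 4096-byte threshold are arbitrary choices made for illustration, not values from any tool.

```python
import os


def count_files(root, small_threshold=4096):
    """Return (total_files, small_files) for everything under root.

    A file counts as small when its size is below small_threshold bytes.
    """
    total = 0
    small = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat does not follow symbolic links.
                size = os.lstat(path).st_size
            except OSError:
                # The file vanished or is unreadable: skip it.
                continue
            total += 1
            if size < small_threshold:
                small += 1
    return total, small
```

Calling count_files on the directory of an environment (wherever yours lives) gives the two numbers to look at before putting it on a distributed file system.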
