Installation on JUWELS (JSC)
=================================================

`JUWELS `__ is a supercomputer at the `Juelich Supercomputing Centre `__ (JSC).

Installation and usage of FBPIC requires the following steps:

- Loading the cluster modules
- Installation of Anaconda
- Installation of FBPIC
- Allocation of resources and running simulations

Loading the cluster modules
---------------------------

To load the standard modules, the ``.bashrc`` should contain the following:

::

    module load GCC/8.3.0
    module load MVAPICH2/2.3.3-GDR

Installation of Anaconda
------------------------------------------------

In order to download and install `Anaconda `__, type:

::

    wget
    bash

Then install the dependencies of FBPIC:

::

    conda install numba scipy h5py mkl
    conda install cudatoolkit=10.0
    pip install cupy-cuda100

It is advised that the following packages are **NOT** installed directly with Anaconda: ``mpich`` and ``mpi4py``.

You can install mpi4py directly with pip and it will be built against the MPI library that is loaded on the cluster via the modules.

::

    pip install mpi4py --no-cache-dir

If you want to use GPUdirect, you instead need the development version:

::

    pip install git+ --no-cache-dir

You can check if the correct MPI is linked by opening a ``python`` shell and checking:

::

    from mpi4py import MPI
    MPI.Get_library_version()

Note that sometimes it is also required that you add the Anaconda folders to your ``PATH`` and ``PYTHONPATH`` environment variables after the ``module loads ...`` in your ``.bashrc``.
For example:

::

    export PATH="/p/home/jusers/USERNAME/juwels/anaconda3/bin":$PATH
    export PYTHONPATH="/p/home/jusers/USERNAME/juwels/anaconda3/lib/python3.7/site-packages":$PYTHONPATH

Installation of FBPIC
---------------------

Finally, clone FBPIC using ``git``, ``cd`` into the folder ``fbpic/`` and type

::

    python install

Running simulations
------------------------------------------

In the following, it is explained how to allocate and use **interactive** jobs on JUWELS. For normal jobs, similar commands can be used in a job script. More information can be found `here `__.

**Allocation of resources**

**CPU:** CPU nodes consist of 2x24 cores. Allocation of two nodes for 60 minutes:

``salloc --nodes=2 --time=00:60:00``

**CPU multithreading:** Best performance is expected if the FBPIC (and Numba/MKL) threading settings are set to 24 threads, while using one MPI process per socket. As a single JUWELS node has two sockets with 24 cores each, ideally 2 MPI processes should be used when running on a single node.

**GPU:** GPU nodes consist of 4 Nvidia V100 devices, i.e. 4 GPUs. Allocation of 8 GPUs (2 nodes) for 60 minutes:

``salloc --nodes=2 --partition=gpus --time=00:60:00 --gres=gpu:4``

**Starting an interactive run**

The following command starts an interactive run with 8 tasks (e.g. 8 GPUs). ``--pty`` activates continuous console output and ``--forward-x`` enables X-forwarding if the connection to JUWELS was established with ``ssh -Y``.

``srun --ntasks=8 --forward-x --pty python``

**Using GPUdirect**

To take advantage of direct communication between MPI and the GPUs, run:

``export FBPIC_ENABLE_GPUDIRECT=1``

Note that this will only work with the correct MPI implementation (i.e. the cluster modules above) and requires the development version of ``mpi4py``.
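
**Sketch: deriving an allocation from the node layout**

The allocation rules above (one MPI process per socket with 24 threads on CPU nodes, and whole nodes of 4 GPUs on the ``gpus`` partition) can be written down as a small helper. This is only a minimal sketch: the function names ``cpu_layout`` and ``gpu_layout`` are hypothetical and not part of FBPIC or any JSC tooling. It further assumes one MPI process per GPU, and that ``NUMBA_NUM_THREADS`` and ``MKL_NUM_THREADS`` (the standard Numba and MKL environment variables) are the threading settings meant above.

```python
# Illustrative sketch only: the function names below are hypothetical and
# are not part of FBPIC, Slurm, or any JSC tooling.

CORES_PER_SOCKET = 24  # from the text: CPU nodes consist of 2x24 cores
SOCKETS_PER_NODE = 2
GPUS_PER_NODE = 4      # from the text: GPU nodes have 4 V100 GPUs


def cpu_layout(nodes, time="00:60:00"):
    """Layout for a CPU run: one MPI process per socket, 24 threads each."""
    ntasks = nodes * SOCKETS_PER_NODE
    return {
        "nodes": nodes,
        "ntasks": ntasks,
        "threads_per_task": CORES_PER_SOCKET,
        "salloc": f"salloc --nodes={nodes} --time={time}",
        # Assumption: these standard Numba/MKL variables are the
        # threading settings referred to in the text.
        "env": {
            "NUMBA_NUM_THREADS": str(CORES_PER_SOCKET),
            "MKL_NUM_THREADS": str(CORES_PER_SOCKET),
        },
    }


def gpu_layout(gpus, time="00:60:00"):
    """Layout for a GPU run, assuming one MPI process per GPU."""
    # Nodes are requested whole (4 GPUs each), so round up.
    nodes = -(-gpus // GPUS_PER_NODE)  # ceiling division
    return {
        "nodes": nodes,
        "ntasks": gpus,
        "salloc": (f"salloc --nodes={nodes} --partition=gpus "
                   f"--time={time} --gres=gpu:{GPUS_PER_NODE}"),
    }


if __name__ == "__main__":
    print(cpu_layout(2)["salloc"])
    print(gpu_layout(8)["salloc"])
```

For 2 CPU nodes and for 8 GPUs, the generated ``salloc`` strings are the same as the two allocation commands shown above. The ``ntasks`` value is the number that would be passed to ``srun --ntasks``.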