How to run the code
Once installed (see Installation), FBPIC is available as a Python module on your system. A simulation is therefore set up by writing a Python script that imports and uses FBPIC's functionality.
You can download examples of FBPIC scripts below (which you can then modify to suit your needs):
The different FBPIC objects that are used in the above simulation scripts are defined in the section API reference.
Running the simulation
The simulation is then run by typing
python fbpic_script.py
where fbpic_script.py should be replaced by the name of your Python script (for example, boosted_frame_script.py for the above examples).
If an MPI implementation is available within the compute environment and the mpi4py package is installed, the computation can be scaled to multiple processes (e.g. 4) by running
mpirun -n 4 python fbpic_script.py
Note that, depending on the size of the simulation, running with multiple MPI processes is not necessarily faster. In addition, MPI simulations require using a finite order for the field solver. Please read the documentation on the parallelisation of FBPIC before using this feature.
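Before launching with mpirun, it can be useful to confirm that both prerequisites named above are actually visible from the current environment. The following is a minimal sketch using only the Python standard library; the helper name mpi_prerequisites is hypothetical and not part of FBPIC, and finding the package does not guarantee that it was built against a working MPI library.

```python
import importlib.util
import shutil


def mpi_prerequisites():
    """Report which MPI prerequisites are visible from this environment.

    Hypothetical helper for illustration; it is not part of FBPIC.
    """
    return {
        # True if the mpi4py package can be located (it is not imported here)
        "mpi4py": importlib.util.find_spec("mpi4py") is not None,
        # True if an mpirun launcher is found on the PATH
        "mpirun": shutil.which("mpirun") is not None,
    }
```

Calling print(mpi_prerequisites()) shows one boolean per prerequisite; if either is False, the mpirun command above is unlikely to work as written.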
When running on CPU, multi-threading is enabled by default, and the default number of threads is the number of (virtual) cores on your system. You can modify this with environment variables:
To modify the number of threads (e.g. set it to 8 threads):
export MKL_NUM_THREADS=8
export NUMBA_NUM_THREADS=8
python fbpic_script.py
To disable multi-threading altogether:
export FBPIC_DISABLE_THREADING=1
export MKL_NUM_THREADS=1
export NUMBA_NUM_THREADS=1
python fbpic_script.py
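The same settings can also be applied from a small Python launcher instead of the shell. The sketch below builds a modified copy of the environment and passes it to a child process; the helper names threading_env and run_script are hypothetical, and only the three variable names come from the commands above.

```python
import os
import subprocess
import sys


def threading_env(n_threads, disable_threading=False):
    """Return a copy of the current environment with thread settings applied.

    Hypothetical helper for illustration; it is not part of FBPIC.
    """
    env = dict(os.environ)
    if disable_threading:
        # Mirrors the "disable multi-threading altogether" commands
        env["FBPIC_DISABLE_THREADING"] = "1"
        n_threads = 1
    env["MKL_NUM_THREADS"] = str(n_threads)
    env["NUMBA_NUM_THREADS"] = str(n_threads)
    return env


def run_script(script, n_threads, disable_threading=False):
    """Run `python script` in a child process with the chosen settings."""
    env = threading_env(n_threads, disable_threading)
    return subprocess.run([sys.executable, script], env=env, check=True)
```

For instance, run_script("fbpic_script.py", n_threads=8) would correspond to the 8-thread commands above. The parent process's own environment is left unchanged.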
It can also happen that an alternative threading backend is selected by Numba
during installation. It is therefore sometimes required to set
OMP_NUM_THREADS in addition to (or instead of)
When running in a Jupyter notebook, environment variables can be set by executing the following command at the beginning of the notebook:
import os
os.environ['MKL_NUM_THREADS']='1'
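In a notebook it is easy to set these variables too late, after the libraries that read them have already been loaded. The sketch below sets both thread-count variables mentioned earlier and reports whether Numba was already imported in the session; the helper name set_thread_count is hypothetical, and the check relies on the assumption that such settings are read when the library is first imported.

```python
import os
import sys

# Variable names taken from the shell commands earlier in this section
THREAD_VARS = ("MKL_NUM_THREADS", "NUMBA_NUM_THREADS")


def set_thread_count(n_threads):
    """Set the thread-count variables for the current process.

    Returns True if numba was already imported, in which case the new
    value may not be picked up until the kernel is restarted.
    Hypothetical helper for illustration; it is not part of FBPIC.
    """
    for name in THREAD_VARS:
        os.environ[name] = str(n_threads)
    return "numba" in sys.modules


too_late = set_thread_count(1)
if too_late:
    print("numba is already imported; restart the kernel and run this cell first")
```

Placing this cell before any import of FBPIC keeps the ordering unambiguous.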
On systems with more than one CPU socket per node, multi-threading can become inefficient if the threads are distributed across sockets. It can be advantageous to use one MPI process per socket and to limit the number of threads to the number of physical cores of each socket. In addition, it can be necessary to explicitly bind all threads of an MPI process to the same socket.
On a machine with 2 sockets and 12 physical cores per socket, the following commands spawn 2 MPI processes each with 12 threads bound to a single socket:
Using the SLURM workload manager:
export MKL_NUM_THREADS=12
export NUMBA_NUM_THREADS=12
srun -n 2 --cpu_bind=socket python fbpic_script.py
Using mpirun:
export MKL_NUM_THREADS=12
export NUMBA_NUM_THREADS=12
mpirun -n 2 --bind-to socket python fbpic_script.py
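The rule of one MPI process per socket, with as many threads as physical cores per socket, can be written down as a small helper that derives the environment and the mpirun command from the machine layout. This is only a sketch: the function name socket_launch_plan is hypothetical, the layout must be supplied by the user (it is not detected), and the --bind-to socket flag is copied from the mpirun command above.

```python
import os


def socket_launch_plan(n_sockets, cores_per_socket, script="fbpic_script.py"):
    """Return (env, command) for one MPI process per CPU socket.

    Hypothetical helper for illustration; it is not part of FBPIC.
    """
    env = dict(os.environ)
    # One thread per physical core of the socket
    env["MKL_NUM_THREADS"] = str(cores_per_socket)
    env["NUMBA_NUM_THREADS"] = str(cores_per_socket)
    # One MPI process per socket, each bound to its own socket
    command = ["mpirun", "-n", str(n_sockets), "--bind-to", "socket",
               "python", script]
    return env, command


env, command = socket_launch_plan(n_sockets=2, cores_per_socket=12)
print(" ".join(command))
# prints: mpirun -n 2 --bind-to socket python fbpic_script.py
```

The returned pair can be passed to subprocess.run(command, env=env) on a machine where mpirun is available.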
When running on GPU with MPI domain decomposition, it is possible to enable the CUDA GPUDirect technology. GPUDirect enables direct communication of CUDA device arrays between GPUs over MPI without explicitly copying the data to CPU first, resulting in reduced latencies and increased bandwidth. As this feature requires a CUDA-aware MPI implementation that supports GPUDirect, it is disabled by default and should be used with care.
To activate this feature, the user needs to set the following environment variable:
Visualizing the simulation results
To visualize the results, first install openPMD-viewer, using either pip:
pip install openpmd-viewer
or conda:
conda install -c conda-forge openpmd-viewer
And then type
and follow the instructions in the notebook that pops up. (NB: the notebook only shows some of the capabilities of openPMD-viewer. To learn more, see the tutorial notebook in the GitHub repository of openPMD-viewer.)
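The results can also be read from a plain Python script rather than the notebook. The following is a minimal sketch using openPMD-viewer's OpenPMDTimeSeries class; the function name load_last_field is hypothetical, and the directory ./diags/hdf5 as well as the choice of field E and component z are assumptions that must be adapted to what the simulation actually wrote.

```python
def load_last_field(diag_dir="./diags/hdf5", field="E", coord="z"):
    """Load one field component from the last available iteration.

    Hypothetical helper for illustration; the default path and field
    names are assumptions, not values guaranteed by the simulation.
    """
    # Imported here so that this module loads even without openPMD-viewer
    from openpmd_viewer import OpenPMDTimeSeries

    ts = OpenPMDTimeSeries(diag_dir)
    # ts.iterations lists the iterations found in the directory
    last_iteration = ts.iterations[-1]
    data, info = ts.get_field(field=field, coord=coord,
                              iteration=last_iteration)
    return data, info
```

The returned data is a NumPy array, and info carries the metadata needed to label a plot, such as the axis coordinates.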
If you want to render your simulation results in 3D, see the section 3D visualization using PyVista.