Profiling the code

Profiling the code consists in finding which parts of the algorithm dominate the computational time, for your particular simulation setup.

Profiling the code executed on CPU

Getting the results in a simple text file

Dumping the profiling results in a text file allows you to quickly profile the execution of a simulation.

Run the code with cProfile :

python -m cProfile -s time > cpu.log

and then open the file cpu.log with a text editor.

Using a visual profiler

For a more detailed analysis, you can use a visual profiler (i.e. profilers with a graphical user interface).

Run the code with cProfile, using binary output:

python -m cProfile -o

and then open the file with snakeviz


Profiling the code executed on GPU

Getting the results in a simple text file

Run the code with nvprof :

nvprof --log-file gpu.log python

in order to profile the device-side (i.e. GPU-side) code alone, or

nvprof --log-file gpu.log python -m cProfile -s time > cpu.log

in order to simultaneously profile the device-side (i.e. GPU-side) and host-side (i.e. CPU-side) code. Then open the files gpu.log and/or cpu.log with a text editor.

Using a visual profiler

Run the code with nvprof, using binary output:

nvprof -o python

and then launch nvvp:


And click File > Open, in order to select the file


You do not need to run snakeviz or nvvp on the same machine on which the simulation was run. (In particular, nvvp does not need to have access to a GPU.) This means for example that, if your simulation was run on a remote cluster, you can simply transfer the files and/or to your local computer, and run snakeviz or nvvp locally.

You can install nvvp on your local computer by installing the cuda toolkit.


When profiling the code with nvprof, the profiling data can quickly become very large. Therefore we recommend to profile the code only on a small number of PIC iterations (<1000).

Profiling MPI simulations

One way to profile MPI simulations is to write one file per MPI rank. In this case, each file will contain only the profiling data of the corresponding MPI process.

Profiling the CPU code

One way to create one file per MPI rank is to modify your FBPIC script, in the following way:

  • Add the following lines at the beginning of the file:

    import cProfile, sys
    from mpi4py.MPI import COMM_WORLD as comm
  • Replace the line:

    sim.step( N_step )

    by the following set of lines:

    # First step: do not profile (includes just-in-time compilation)
    # Profile the next N_step
    pr = cProfile.Profile()
    sim.step( N_step )
    # Dump results:
    # - for binary dump
    pr.dump_stats('' %comm.rank)
    # - for text dump
    with open( 'cpu_%d.txt' %comm.rank, 'w') as output_file:
        sys.stdout = output_file
        pr.print_stats( sort='time' )
        sys.stdout = sys.__stdout__

Then run your FBPIC script with MPI as usual, e.g. with 4 MPI ranks:

mpirun -np 4 python

Profiling the GPU code

Use the following command:

mpirun -np 2 nvprof -o gpu_%q{<RANK>}.prof python

where <RANK> should be replaced by the following name depending on your MPI distribution:

  • For openmpi: OMPI_COMM_WORLD_RANK

  • For mpich: PMI_RANK

(If you are unsure which name to use, type mpirun -np 2 printenv | grep RANK.)

nvprof will then create one profile file per MPI rank. You can load these files on the same timeline within nvvp by clicking File > Import > Nvprof > Multiple processes > Browse. For more information, see this page.