NGC | Catalog




GROMACS is a popular molecular dynamics application used to simulate proteins and lipids.


KTH Royal Institute of Technology

Latest Tag



December 2, 2023

Compressed Size

405.44 MB

Multinode Support


Multi-Arch Support



GROMACS is a molecular dynamics application designed to simulate Newtonian equations of motion for systems with hundreds to millions of particles. It is designed for biochemical molecules such as proteins, lipids, and nucleic acids, which have many complicated bonded interactions.

System requirements

Before running the NGC GROMACS container, please ensure your system meets the following requirements.

  • One of the following container runtimes
  • One of the following NVIDIA GPU(s)
    • Pascal (sm60)
    • Volta (sm70)
    • Ampere (sm80)
    • Hopper (sm90)


  • x86_64
    • CPU with AVX instruction support
    • One of the following CUDA driver versions
      • r520 (>= 520.61.05)
      • >= 450.80.02


  • arm64
    • Marvell ThunderX2 CPU
    • CUDA driver version >= r460
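
The driver requirement can be checked before pulling the container. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH and GNU `sort -V` is available; the helper name `version_ge` and the variable `MIN_DRIVER` are illustrative and not part of the container.

```shell
#!/usr/bin/env bash
# Return success if version $1 is greater than or equal to version $2.
# Hypothetical helper; relies on GNU sort's -V (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

MIN_DRIVER="450.80.02"   # one of the driver versions listed above

if command -v nvidia-smi >/dev/null 2>&1; then
    # Query the installed driver version as reported for the first GPU.
    driver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)"
    if version_ge "$driver" "$MIN_DRIVER"; then
        echo "driver $driver meets the minimum $MIN_DRIVER"
    else
        echo "driver $driver is older than $MIN_DRIVER" >&2
    fi
else
    echo "nvidia-smi not found; cannot check the driver version" >&2
fi

# On Linux, AVX support appears as a CPU flag in /proc/cpuinfo.
if grep -qw avx /proc/cpuinfo 2>/dev/null; then
    echo "CPU reports AVX support"
fi
```

Adjust `MIN_DRIVER` to whichever listed driver version applies to your system.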


The following examples demonstrate using the NGC GROMACS container to run the STMV benchmark. Reference performance on a range of systems is published separately. Throughout this example, the container version is referenced as $GROMACS_TAG; replace this with the tag you wish to run.

Download the STMV benchmark archive, then extract it and change into the stmv directory:

tar xf GROMACS_heterogeneous_parallelization_benchmark_info_and_systems_JCP.tar.gz 
cd GROMACS_heterogeneous_parallelization_benchmark_info_and_systems_JCP/stmv
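
The mdrun commands in this example pass no -s option, so mdrun looks for its default input file, topol.tpr, in the working directory. A quick check that the extraction worked might look like the sketch below; the function name `check_input` is hypothetical.

```shell
#!/usr/bin/env bash
# Report whether mdrun's default input file is present in a directory.
# Hypothetical helper; the directory defaults to the current one.
check_input() {
    local dir="${1:-.}"
    if [ -f "${dir}/topol.tpr" ]; then
        echo "found ${dir}/topol.tpr"
    else
        echo "no topol.tpr in ${dir}" >&2
        return 1
    fi
}

check_input . || echo "extract the benchmark and cd into stmv first"
```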


Run GROMACS using 4 GPUs (with IDs 0, 1, 2, 3). Here we use 2 thread-MPI tasks per GPU (-ntmpi 8), which we find gives good performance, and 16 OpenMP threads per thread-MPI task (-ntomp 16), which assumes at least 128 CPU cores in the system. Adjust both values to match your hardware, and experiment to find the best performance.


# Docker
DOCKER="docker run --gpus all -it --rm -v ${PWD}:/host_pwd --workdir /host_pwd nvcr.io/hpc/gromacs:${GROMACS_TAG}"
${DOCKER} gmx mdrun -ntmpi 8 -ntomp 16 -nb gpu -pme gpu -npme 1 -update gpu -bonded gpu -nsteps 100000 -resetstep 90000 -noconfout -dlb no -nstlist 300 -pin on -v -gpu_id 0123


# Singularity
SINGULARITY="singularity run --nv -B ${PWD}:/host_pwd --pwd /host_pwd docker://nvcr.io/hpc/gromacs:${GROMACS_TAG}"
${SINGULARITY} gmx mdrun -ntmpi 8 -ntomp 16 -nb gpu -pme gpu -npme 1 -update gpu -bonded gpu -nsteps 100000 -resetstep 90000 -noconfout -dlb no -nstlist 300 -pin on -v -gpu_id 0123
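
The thread layout in the commands above follows from the GPU and core counts: 2 thread-MPI tasks per GPU on 4 GPUs gives 8 tasks, and 128 cores divided by 8 tasks gives 16 OpenMP threads each. A minimal sketch of that arithmetic, assuming one OpenMP thread per CPU core; the function name `mdrun_layout` and its arguments are illustrative only:

```shell
#!/usr/bin/env bash
# Print -ntmpi/-ntomp values for a given GPU count, number of thread-MPI
# tasks per GPU, and CPU core count. Hypothetical helper, not part of GROMACS.
mdrun_layout() {
    local ngpus="$1" tasks_per_gpu="$2" ncores="$3"
    local ntmpi=$(( ngpus * tasks_per_gpu ))
    local ntomp=$(( ncores / ntmpi ))   # integer division; leftover cores stay unused
    echo "-ntmpi ${ntmpi} -ntomp ${ntomp}"
}

mdrun_layout 4 2 128   # prints: -ntmpi 8 -ntomp 16
mdrun_layout 4 2 120   # prints: -ntmpi 8 -ntomp 15
```

The result is only a starting point; the best values for a given system still need to be found by experiment.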

Running on Base Command Platform

NVIDIA Base Command Platform (BCP) offers a ready-to-use cloud-hosted solution that manages the end-to-end lifecycle of development, workflows, and resource management. Before running the commands below, install and configure the NGC CLI; more information can be found in the NGC CLI documentation.

Uploading the Dataset to BCP

Upload the stmv dataset using the command below:

ngc dataset upload --source ./stmv/ --desc "GROMACS stmv dataset" gromacs_dataset

Running GROMACS on BCP

Note: the mounted working directory is read-only, so the run command must include -g <md-log-path> and -e <energy-log-path> to write the output logs to a writable mounted directory.

Single-node run of the stmv dataset on 4 GPUs, with 2 thread-MPI tasks per GPU and 15 OpenMP threads per thread-MPI task, for a total of 120 CPU cores.

ngc batch run --name "gromacs_reducentomp120cores" --priority NORMAL --order 50 --preempt RUNONCE \
  --min-timeslice 0s --total-runtime 0s --ace <your-ace> --instance dgxa100.80g.4.norm \
  --commandline "/usr/bin/nventry -build_base_dir=/usr/local/gromacs -build_default=avx2_256 gmx mdrun -g /results/md.log -e /results/ener.edr -ntmpi 8 -ntomp 15 -nb gpu -pme gpu -npme 1 -update gpu -bonded gpu -nsteps 100000 -resetstep 90000 -noconfout -dlb no -nstlist 300 -pin on -v -gpu_id 0123" \
  --result /results/ --image "hpc/gromacs:2023.2" --org <your-org> --datasetid <dataset-id>:/host_pwd/
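
The -g and -e options in the command above redirect the log and energy files to the writable results mount. As a small sketch of how those two flags could be derived from a results directory, assuming the default file names md.log and ener.edr used above; the function name `output_flags` is hypothetical:

```shell
#!/usr/bin/env bash
# Build the mdrun log (-g) and energy (-e) output flags for a writable
# results directory. Hypothetical helper, not part of GROMACS or the NGC CLI.
output_flags() {
    local results_dir="${1%/}"   # strip one trailing slash, if present
    echo "-g ${results_dir}/md.log -e ${results_dir}/ener.edr"
}

output_flags /results/   # prints: -g /results/md.log -e /results/ener.edr
```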

Suggested Reading



GROMACS Documentation

GROMACS GPU Acceleration

GROMACS 2020 GPU optimization

Maximizing GROMACS Throughput with Multiple Simulations per GPU Using MPS and MIG | NVIDIA Technical Blog

Massively Improved Multi-node NVIDI

BCP User Guide