Sample dataset prepared by the NVIDIA BioNeMo team from the PoseBusters benchmark set.
Latest Version
January 18, 2024
Compressed Size
228.22 MB

DiffDock Dataset

This dataset was generated by applying the preparation steps used for training the DiffDock score and confidence models to the PoseBusters benchmark set. The PoseBusters benchmark set is a new set of 428 carefully selected, publicly available crystal complexes from the PDB. It is a diverse set of recent, high-quality protein-ligand complexes that contain drug-like molecules, and it only contains complexes released since 2021. A subset of 50 complexes from this set is used to create the train/validation/test datasets for training and testing DiffDock.
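
As an illustration of how a small set of complexes can be partitioned into train/validation/test subsets, here is a minimal Python sketch. It is not the procedure used to build this dataset: the complex identifiers, the 80/10/10 ratio, the random seed, and the function name `split_complexes` are all hypothetical and chosen only for the example.

```python
import random


def split_complexes(complex_ids, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle identifiers reproducibly and cut them into three disjoint subsets.

    The fractions and seed are illustrative assumptions, not the values
    used to prepare the dataset described here.
    """
    ids = sorted(complex_ids)          # sort first so the result does not depend on input order
    rng = random.Random(seed)          # local RNG keeps the split reproducible
    rng.shuffle(ids)

    n_test = int(round(len(ids) * test_fraction))
    n_val = int(round(len(ids) * val_fraction))

    return {
        "test": ids[:n_test],
        "validation": ids[n_test:n_test + n_val],
        "train": ids[n_test + n_val:],
    }


# Placeholder identifiers standing in for the 50 selected complexes.
example_ids = [f"complex_{i:02d}" for i in range(50)]
splits = split_complexes(example_ids)

for name, members in splits.items():
    print(name, len(members))
```

Fixing the seed and sorting the identifiers before shuffling keeps the split stable across runs, which makes it easier to compare training results on the same subsets.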

How to use the dataset?

You can use the BioNeMo Framework to train the DiffDock score and confidence models with this dataset.


This dataset is redistributed under the same license as the PoseBusters benchmark set: Creative Commons Attribution 4.0 International (CC BY 4.0).