This is a dataset generated following the preparation steps for PoseBusters benchmark set used for training diffdock score and confidence model. The PoseBusters Benchmark set is a new set of 428 carefully-selected publicly-available crystal complexes from the PDB. It is a diverse set of recent high-quality protein-ligand complexes which contain drug-like molecules. It only contains complexes released since 2021. A subset of 50 complexes from this database is used to create and train/validation/test datasets for training test of diffdock as presented in Posebusters DiffDock Processed Sample Data Resource. And 2 samples are selected for use in the data preprocessing test.
How to use the dataset?
You can use BioNeMo Framework to run DiffDock Score/Confidence model training using this dataset.
This dataset is being re-distributed under the same license as PoseBusters benchmark set (Creative Commons Attribution 4.0 International (CC BY 4.0) License)