Model Overview
Description:
DSMBind [1,2] is an energy-based model that has been trained on protein-ligand complexes to predict binding affinities. The model produces comparative values that are useful for ranking protein-ligand binding affinities. This model is for research and development only.
Third-Party Community Consideration
This model is not owned or developed by NVIDIA. It has been developed and built to a third party's requirements for this application and use case; see the model card from the Broad Institute of MIT and Harvard.
License
DSMBind is provided under the Apache License 2.0.
References:
[1] Wengong Jin, Siranush Sarkizova, Xun Chen, Nir Hacohen, and Caroline Uhler. "Unsupervised protein-ligand binding energy prediction via neural Euler's rotation equation." Advances in Neural Information Processing Systems 36 (2024).
[2] Wengong Jin, Xun Chen, Amrita Vetticaden, Siranush Sarkizova, Raktima Raychowdhury, Caroline Uhler, and Nir Hacohen. "DSMBind: SE(3) denoising score matching for unsupervised binding energy prediction and nanobody design." bioRxiv (2023): 2023-12.
Model Architecture:
Architecture Type: Energy-Based Model (EBM)
Network Architecture: SE(3)-Invariant Neural Network
Input:
Input Type(s): Text (PDB, SDF)
Input Format(s): Protein Data Bank (PDB) Structure files for proteins, Structural Data Files (SDF) for ligands
Output:
Output Type(s): Numerical scores (indicating binding affinities)
Output Format: List of scalar values
Other Properties Related to Output: Only the rank of the predicted values matters because the model produces comparative values instead of absolute binding energies.
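Because only the ordering of the scores is meaningful, a typical way to use the output is to sort candidate ligands by score. The snippet below is a minimal sketch of that step, not part of the DSMBind or BioNeMo API: the function name `rank_ligands`, the ligand names, and the score values are all hypothetical, and the `higher_is_stronger` flag is an assumption about score direction that should be checked against the model's documentation before use.

```python
from typing import Dict, List


def rank_ligands(scores: Dict[str, float], higher_is_stronger: bool = True) -> List[str]:
    """Return ligand names ordered from strongest to weakest predicted binder.

    `higher_is_stronger` is an assumption about the score direction; set it
    to False if lower scores indicate stronger binding for your checkpoint.
    """
    return sorted(scores, key=scores.get, reverse=higher_is_stronger)


# Hypothetical scores for three ligands docked against the same protein.
example_scores = {"ligand_a": 1.2, "ligand_b": -0.4, "ligand_c": 3.1}

ranking = rank_ligands(example_scores)
print(ranking)
```

Only the resulting order should be interpreted; the gaps between scores are not calibrated binding energies.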
Software Integration:
Runtime Engine(s):
- BioNeMo (1.7), NeMo
Supported Hardware Microarchitecture Compatibility:
- Ampere
- Hopper
Preferred/Supported Operating System(s):
- Linux
Model Version(s):
dsmbind.pth, version: 1.7
Training & Evaluation:
Training Dataset:
Link: A subset of the Protein Data Bank (PDB)
Data Collection Method by dataset
- Human
Properties (Quantity, Dataset Descriptions, Sensor(s)): Our DSMBind checkpoint was trained using a subset of PDB. This subset includes a total of 25,561 samples, each representing a unique protein-ligand complex.
Evaluation Dataset:
Link: CASF-16
Data Collection Method by dataset
- Human
Labeling Method by dataset - Hybrid: Human & Automated
Properties (Quantity, Dataset Descriptions, Sensor(s)): CASF-16 is an open challenge for comparative assessment of scoring functions. This benchmark has 285 protein-ligand complexes with binding affinity labels.
Inference:
Engine: BioNeMo, NeMo
Test Hardware:
- Ampere
Evaluation Results
During training, we use Gaussian noise to perturb the ligand coordinates. We evaluate the trained DSMBind model on the CASF-16 benchmark, using the Pearson correlation coefficient to measure the linear relationship between the predicted scalar values and the experimental binding affinities. The trained checkpoint achieves a Pearson correlation coefficient of 0.64.
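For readers who want to reproduce this kind of evaluation on their own data, the Pearson correlation coefficient can be computed as sketched below. This is an illustrative, dependency-free implementation, not the evaluation script used for the reported result; the helper name `pearson_r` and the toy values are hypothetical and are chosen to be perfectly linear so the behavior is easy to check.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical predicted scores and measured affinities (toy values).
predicted = [0.5, 1.0, 1.5, 2.0]
measured = [4.0, 5.0, 6.0, 7.0]

r = pearson_r(predicted, measured)
print(f"Pearson r = {r:.2f}")
```

Pearson correlation is unchanged by shifting or positively rescaling either variable, which is why it can be applied to scores that are comparative rather than absolute.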
Limitations
DSMBind produces comparative values that are useful for ranking complexes, but it does not provide absolute measures that are directly comparable to experimentally measured affinities.