DSMBind

Publisher: NVIDIA
Latest Version: 1.7
Modified: November 27, 2024
Size: 11.7 MB

Model Overview

Description:

DSMBind [1,2] is an energy-based model that has been trained on protein-ligand complexes to predict binding affinities. The model produces comparative values that are useful for ranking protein-ligand binding affinities. This model is for research and development only.

Third-Party Community Consideration

This model is not owned or developed by NVIDIA. It has been developed and built to a third party's requirements for this application and use case; see the model card by the Broad Institute of MIT and Harvard.

License

DSMBind is provided under the Apache License 2.0.

References:

[1] Wengong Jin, Siranush Sarkizova, Xun Chen, Nir Hacohen, and Caroline Uhler. "Unsupervised protein-ligand binding energy prediction via neural Euler's rotation equation." Advances in Neural Information Processing Systems 36 (2024).

[2] Wengong Jin, Xun Chen, Amrita Vetticaden, Siranush Sarkizova, Raktima Raychowdhury, Caroline Uhler, and Nir Hacohen. "DSMBind: SE(3) denoising score matching for unsupervised binding energy prediction and nanobody design." bioRxiv (2023): 2023-12.

Model Architecture:

Architecture Type: Energy-Based Model (EBM)
Network Architecture: SE(3)-Invariant Neural Network

Input:

Input Type(s): Text (PDB, SDF)
Input Format(s): Protein Data Bank (PDB) structure files for proteins, structure-data files (SDF) for ligands
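
To make the protein input format concrete, here is a minimal sketch of reading atom coordinates from PDB text using the fixed-column layout of ATOM/HETATM records. It is not the BioNeMo or DSMBind data loader; the function names `parse_pdb_atoms` and `format_atom` and the two example atoms are hypothetical and exist only for illustration.

```python
from typing import List, Tuple

Atom = Tuple[str, float, float, float]


def parse_pdb_atoms(pdb_text: str) -> List[Atom]:
    """Return (atom_name, x, y, z) for each ATOM/HETATM record.

    PDB coordinate records are fixed-width: the atom name occupies
    columns 13-16 and x, y, z occupy columns 31-38, 39-46 and 47-54
    (1-based), which map to the 0-based slices used below.
    """
    atoms: List[Atom] = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            name = line[12:16].strip()
            x = float(line[30:38])
            y = float(line[38:46])
            z = float(line[46:54])
            atoms.append((name, x, y, z))
    return atoms


def format_atom(serial: int, name: str, res_name: str, chain: str,
                res_seq: int, x: float, y: float, z: float) -> str:
    """Build a simplified ATOM record (hypothetical helper for the demo).

    Atom-name alignment is simplified; real files follow stricter rules.
    """
    return (
        f"{'ATOM':<6}{serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}"
        + " " * 4
        + f"{x:8.3f}{y:8.3f}{z:8.3f}"
    )


# Made-up coordinates, used only to exercise the parser.
example_pdb = "\n".join([
    format_atom(1, "N", "MET", "A", 1, 27.340, 24.430, 2.614),
    format_atom(2, "CA", "MET", "A", 1, 26.266, 25.413, 2.842),
    "END",
])

if __name__ == "__main__":
    for atom in parse_pdb_atoms(example_pdb):
        print(atom)
```

Ligand SDF files use a different layout (a molfile block followed by data items), so they need a separate reader, typically one from a cheminformatics toolkit.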

Output:

Output Type(s): Numerical scores (indicating binding affinities)
Output Format: List of scalar values
Other Properties Related to Output: Only the rank of the predicted values matters because the model produces comparative values instead of absolute binding energies.
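
Because only the ordering of the scores carries meaning, a typical downstream step is to sort candidate ligands by their predicted value. The sketch below shows one way to do that. The ligand names and scores are made up, `rank_by_score` is a hypothetical helper, and the direction of the ranking (whether a higher or a lower score indicates stronger binding) is an assumption that should be checked against the model documentation.

```python
from typing import Dict, List


def rank_by_score(scores: Dict[str, float],
                  higher_is_better: bool = True) -> List[str]:
    """Return ligand names ordered from best to worst predicted score.

    ASSUMPTION: the score direction is not stated in this model card,
    so it is exposed as a parameter instead of being hard-coded.
    """
    return sorted(scores, key=lambda name: scores[name],
                  reverse=higher_is_better)


# Hypothetical scores for three ligands against the same protein.
example_scores = {"ligand_a": -1.2, "ligand_b": 0.4, "ligand_c": -0.3}

if __name__ == "__main__":
    print(rank_by_score(example_scores))
    print(rank_by_score(example_scores, higher_is_better=False))
```

Only the resulting order should be interpreted; the gaps between scores are not calibrated to experimental affinity units.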

Software Integration:

Runtime Engine(s):

  • BioNeMo (1.7), NeMo

Supported Hardware Microarchitecture Compatibility:

  • Ampere
  • Hopper

Preferred/Supported Operating System(s):

  • Linux

Model Version(s):

dsmbind.pth, version: 1.7

Training & Evaluation:

Training Dataset:

Link: a subset from PDB

Data Collection Method by dataset:

  • Human

Properties (Quantity, Dataset Descriptions, Sensor(s)): Our DSMBind checkpoint was trained using a subset of PDB. This subset includes a total of 25,561 samples, each representing a unique protein-ligand complex.

Evaluation Dataset:

Link: CASF-16

Data Collection Method by dataset:

  • Human

Labeling Method by dataset:

  • Hybrid: Human & Automated

Properties (Quantity, Dataset Descriptions, Sensor(s)): CASF-16 is an open challenge for comparative assessment of scoring functions. This benchmark has 285 protein-ligand complexes with binding affinity labels.

Inference:

Engine: BioNeMo, NeMo
Test Hardware:

  • Ampere

Evaluation Results

We use Gaussian noise to perturb the ligand coordinates during training. We evaluate the trained DSMBind model on the CASF-16 benchmark, measuring the Pearson correlation coefficient to assess the linear relationship between the predicted scalar values and the actual binding affinities. The trained checkpoint achieves a Pearson correlation coefficient of 0.64.
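
For readers who want to reproduce this kind of metric on their own predictions, the following is a minimal sketch of the Pearson correlation coefficient in plain Python. It is not the evaluation script used for the reported number; `pearson_r` is a hypothetical helper, and the example values are synthetic rather than CASF-16 data.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products and squares; the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Synthetic values: predicted scores vs. "measured" affinities.
predicted = [1.0, 2.0, 3.0, 4.0]
measured = [1.0, 4.0, 9.0, 16.0]

if __name__ == "__main__":
    print(round(pearson_r(predicted, measured), 4))
```

Pearson correlation measures linear agreement. Since the model's outputs are meant for ranking, a rank-based statistic such as Spearman correlation can be a useful complement when analyzing predictions.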

Limitations

DSMBind produces comparative values that are useful for ranking complexes, but it does not provide absolute measures that are directly comparable to experimental ground-truth affinities.