DSMBind | NVIDIA NGC

NGC Catalog

CLASSIC

Welcome Guest

For downloads and more information, please view on a desktop device.

Description

DSMBind is an energy-based model that has been trained on protein-ligand complexes to predict binding affinities. The model produces comparative values that are useful for ranking protein-ligand binding affinities.

Publisher

NVIDIA

Latest Version

1.7

Modified

November 27, 2024

Size

11.7 MB

Model Overview

Description:

DSMBind [1,2] is an energy-based model that has been trained on protein-ligand complexes to predict binding affinities. The model produces comparative values that are useful for ranking protein-ligand binding affinities. This model is for research and development only.

Third-Party Community Consideration

This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case; see link to the model card by Broad Institute of MIT and Harvard.

License

DSMBind is provided under the Apache License 2.0.

References:

[1] Wengong Jin, Siranush Sarkizova, Xun Chen, Nir Hacohen, and Caroline Uhler. "Unsupervised protein-ligand binding energy prediction via neural euler's rotation equation." Advances in Neural Information Processing Systems 36 (2024).

[2] Wengong Jin, Xun Chen, Amrita Vetticaden, Siranush Sarzikova, Raktima Raychowdhury, Caroline Uhler, and Nir Hacohen. "DSMBind: SE (3) denoising score matching for unsupervised binding energy prediction and nanobody design." bioRxiv (2023): 2023-12.

Model Architecture:

Architecture Type: Energy-Based Model (EBM)
Network Architecture: SE(3)-Invariant Neural Network

Input:

Input Type(s): Text (PDB, SDF)
Input Format(s): Protein Data Bank (PDB) Structure files for proteins, Structural Data Files (SDF) for ligands

Output:

Output Type(s): Numerical scores (indicating binding affinities)
Output Format: List of scalar values
Other Properties Related to Output: Only the rank of the predicted values matters because the model produces comparative values instead of absolute binding energies.

Software Integration:

Runtime Engine(s):

BioNeMo (1.7), NeMo

Supported Hardware Microarchitecture Compatibility:

Ampere
Hopper

Preferred/Supported Operating System(s):

Linux

Model Version(s):

dsmbind.pth, version: 1.7

Training & Evaluation:

Training Dataset:

Link: a subset from PDB
Data Collection Method by dataset

Human
Properties (Quantity, Dataset Descriptions, Sensor(s)): Our DSMBind checkpoint was trained using a subset of PDB. This subset includes a total of 25,561 samples, each representing a unique protein-ligand complex.

Evaluation Dataset:

Link: CASF-16
Data Collection Method by dataset

Human
Labeling Method by dataset
Hybrid: Human & Automated
Properties (Quantity, Dataset Descriptions, Sensor(s)): CASF-16 is an open challenge for comparative assessment of scoring functions. This benchmark has 285 protein-ligand complexes with binding affinity labels.

Inference:

Engine: BioNeMo, NeMo
Test Hardware:

Ampere

Evaluation Results

We use gaussian noise to perturbe the ligand coordinates during training. We evaluate our trained DSMBind model on the CASF-16 benchmark. We measure the Pearson correlation coefficient to assess the linear relationship between the predicted scalar values and actual binding affinities. The trained checkpoint can achieve a Pearson correlation coefficient of 0.64.

Limitations

DSMBind produces comparative values which are useful to rank complexes. But it does not provide absolute measures that are directly comparable to experimental ground truth affinities.