DSMBind

Model Overview

Description:

DSMBind [1,2] is an energy-based model that has been trained on protein-ligand complexes to predict binding affinities. The model produces comparative values that are useful for ranking protein-ligand binding affinities. This model is for research and development only.

Third-Party Community Consideration

This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case; see link to the model card by Broad Institute of MIT and Harvard.

License

DSMBind is provided under the Apache License 2.0.

References:

[1] Wengong Jin, Siranush Sarkizova, Xun Chen, Nir Hacohen, and Caroline Uhler. "Unsupervised Protein-Ligand Binding Energy Prediction via Neural Euler's Rotation Equation." Advances in Neural Information Processing Systems 36 (2024).

[2] Wengong Jin, Xun Chen, Amrita Vetticaden, Siranush Sarkizova, Raktima Raychowdhury, Caroline Uhler, and Nir Hacohen. "DSMBind: SE(3) denoising score matching for unsupervised binding energy prediction and nanobody design." bioRxiv (2023): 2023-12.

Model Architecture:

Architecture Type: Energy-Based Model (EBM)

Network Architecture: SE(3)-Invariant Neural Network

Input:

Input Type(s): Text (PDB, SDF)

Input Format(s): Protein Data Bank (PDB) Structure files for proteins, Structural Data Files (SDF) for ligands

Output:

Output Type(s): Numerical scores (indicating binding affinities)

Output Format: List of scalar values

Other Properties Related to Output: Only the rank of the predicted values matters because the model produces comparative values instead of absolute binding energies.
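As a minimal sketch of what "only the rank matters" means in practice, the snippet below orders a few complexes by score and checks that a monotonically increasing rescaling leaves the order unchanged. The complex names, the score values, and the `higher_is_better` flag are all illustrative assumptions; this model card does not state which direction of the score corresponds to stronger binding, so confirm that against the model documentation before relying on it.

```python
def rank_complexes(scores, higher_is_better=True):
    """Return complex names ordered from best to worst predicted binder.

    `higher_is_better` is an assumption about the sign convention of the
    scores; flip it if lower values indicate stronger binding.
    """
    return sorted(scores, key=scores.get, reverse=higher_is_better)


# Hypothetical scores for three complexes (made-up values, not model output).
scores = {"complex_a": -1.2, "complex_b": 0.7, "complex_c": 0.1}
ranking = rank_complexes(scores)

# A monotonically increasing rescaling changes the values but not the order,
# which is why the raw numbers should not be read as absolute energies.
rescaled = {name: 10.0 * s + 3.0 for name, s in scores.items()}
same_order = rank_complexes(rescaled) == ranking
```

Because only the ordering carries meaning, scores are best compared within a single set of candidates scored by the same checkpoint.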

Software Integration:

Runtime Engine(s):

  • BioNeMo (1.7), NeMo

Supported Hardware Microarchitecture Compatibility:

  • Ampere
  • Hopper

Preferred/Supported Operating System(s):

  • Linux

Model Version(s):

dsmbind.pth, version: 1.7

Training & Evaluation:

Training Dataset:

Link: a subset from PDB

Data Collection Method by dataset

  • Human

Properties (Quantity, Dataset Descriptions, Sensor(s)): The DSMBind checkpoint was trained on a subset of PDB. This subset contains 25,561 samples, each representing a unique protein-ligand complex.

Evaluation Dataset:

Link: CASF-16

Data Collection Method by dataset

  • Human

Labeling Method by dataset

  • Hybrid: Human & Automated

Properties (Quantity, Dataset Descriptions, Sensor(s)): CASF-16 is an open challenge for comparative assessment of scoring functions. This benchmark has 285 protein-ligand complexes with binding affinity labels.

Inference:

Engine: BioNeMo, NeMo

Test Hardware:

  • Ampere

Evaluation Results

During training, we perturb the ligand coordinates with Gaussian noise. We evaluate the trained DSMBind model on the CASF-16 benchmark, using the Pearson correlation coefficient to measure the linear relationship between the predicted scalar values and the experimental binding affinities. The trained checkpoint achieves a Pearson correlation coefficient of 0.64.
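The two steps described above can be sketched in plain Python. This is an illustration under stated assumptions, not the BioNeMo implementation: `perturb_coordinates` adds independent zero-mean Gaussian noise to each coordinate, with a hypothetical `sigma` (the noise scale used in training is not given here), and `pearson` computes the standard Pearson correlation coefficient. All coordinates, scores, and affinities below are made-up values.

```python
import math
import random


def perturb_coordinates(coords, sigma, rng):
    """Add independent zero-mean Gaussian noise to every x, y, z coordinate."""
    return [tuple(c + rng.gauss(0.0, sigma) for c in atom) for atom in coords]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ligand with three atoms; sigma=0.5 is an arbitrary choice.
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.3)]
noisy = perturb_coordinates(ligand, sigma=0.5, rng=random.Random(0))

# Hypothetical predicted scores and measured affinities for four complexes.
predicted = [0.2, 1.1, 0.7, 1.9]
measured = [4.9, 6.3, 6.0, 7.8]
r = pearson(predicted, measured)
```

Pearson correlation is invariant to positive linear rescaling of either input, so it can be computed directly on the comparative scores without first converting them to physical units.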

Limitations

DSMBind produces comparative values that are useful for ranking complexes, but it does not provide absolute measures that are directly comparable to experimentally measured affinities.

Publisher: NVIDIA
Latest Version: 1.7
Updated: November 27, 2024 UTC
Compressed Size: 11.7 MB