EquiDock DB5 | NVIDIA NGC


Description

EquiDock is an SE(3)-equivariant model that predicts protein-protein complex formation from two individual proteins.

Publisher

NVIDIA

Latest Version

1.1

Modified

January 18, 2024

Size

1.9 MB

Model Overview

EquiDock [1] is an SE(3)-equivariant model that predicts how two individual proteins form a protein-protein complex. It treats both proteins as rigid bodies and assumes no conformational change during complex formation.

Intended Use

Almost all biological processes involve proteins interacting with each other. Predicting protein complexes can help in understanding disease progression and in designing novel proteins that preferentially interact with chosen target proteins.

Model Architecture

EquiDock was implemented using the BioNeMo framework. It uses an Independent E(3)-Equivariant Graph Matching Network (IEGMN) that performs graph feature embedding updates in addition to E(3)-equivariant coordinate updates. A novel attention-based mechanism selects keypoints that represent the binding pocket of each protein, and the keypoints are used with a differentiable variant of the Kabsch algorithm [2]. The architecture also guarantees commutativity, i.e., docking protein 1 with respect to protein 2 gives the same result as docking protein 2 with respect to protein 1.
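To make the role of the Kabsch algorithm [2] concrete, here is a minimal NumPy sketch of the classical, non-differentiable version: given two matched point sets, it returns the rotation and translation that best superimpose one onto the other. This is not EquiDock's implementation. The function name `kabsch` and the `(N, 3)` array layout are illustrative assumptions.

```python
import numpy as np

def kabsch(P, Q):
    """Return (R, t) such that P @ R.T + t best matches Q in the least-squares sense.

    P and Q are (N, 3) arrays of matched points (hypothetical keypoints).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    p_mean = P.mean(axis=0)
    q_mean = Q.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets.
    H = (P - p_mean).T @ (Q - q_mean)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t
```

Applying `P @ R.T + t` moves the first point set onto the second. A differentiable variant, as used in the model, would express the same steps with operations that support automatic differentiation.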

Limitations

EquiDock treats proteins as rigid bodies and is not expected to work well for proteins that undergo significant conformational change upon binding.

Training

Dataset

The EquiDock model leverages two primary datasets: Docking Benchmark 5.5 (DB5.5) [3] and the Database of Interacting Protein Structures (DIPS) [4]. DB5.5, considered a gold standard for data quality, consists of 253 structures. DIPS is a larger dataset of protein complex structures obtained from the Protein Data Bank (PDB) and tailored for rigid-body docking. DIPS contains only bound structures, which makes it suitable for rigid docking, while DB5.5 also includes unbound protein structures, most of which are largely rigid.

Models are first trained on the DIPS training split using the Adam optimizer with a learning rate of 2e-4. Early stopping is employed with a patience of 30 epochs. The best model is selected by ligand RMSD on the DIPS validation set.
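The early-stopping rule can be sketched in plain Python as follows. This is an illustrative sketch only: the class name `EarlyStopping` and its `step` method are hypothetical and are not part of the BioNeMo or NeMo API. The monitored metric is assumed to be one where lower is better, such as validation ligand RMSD.

```python
class EarlyStopping:
    """Stop when the monitored metric has not improved for `patience` epochs."""

    def __init__(self, patience):
        self.patience = patience
        self.best = float("inf")   # best (lowest) metric value seen so far
        self.best_epoch = -1       # epoch at which the best value occurred
        self.bad_epochs = 0        # consecutive epochs without improvement

    def step(self, epoch, metric):
        """Record one validation result; return True if training should stop."""
        if metric < self.best:
            self.best = metric
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

With `EarlyStopping(patience=30)`, training would halt after 30 consecutive epochs without a new best validation value, and `best_epoch` identifies the checkpoint to keep.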

For DB5.5, the DIPS-pretrained model is fine-tuned on the DB5.5 training set using a learning rate of 1e-4 and early stopping with a patience of 150 epochs. During training, the roles of ligand and receptor are randomly assigned, and a random rotation and translation are applied to the ligand, even though the model is invariant to these operations.
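The random role assignment and rigid perturbation described above can be illustrated with a small NumPy sketch. The function names, the `max_shift` parameter, and the QR-based sampling of the rotation are assumptions made for illustration. The sampling here is not claimed to be exactly uniform over rotations, and it is not taken from the EquiDock code.

```python
import numpy as np

def assign_roles(protein_a, protein_b, rng):
    """Randomly decide which protein acts as ligand and which as receptor."""
    if rng.random() < 0.5:
        return protein_a, protein_b
    return protein_b, protein_a

def random_rotation(rng):
    """Draw a random 3x3 rotation from the QR decomposition of a Gaussian matrix."""
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    # Q is orthogonal; flip one column if needed so that det(Q) = +1.
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]
    return Q

def randomize_ligand(coords, rng, max_shift=5.0):
    """Rotate the ligand about its centroid and shift it by a random offset."""
    R = random_rotation(rng)
    t = rng.uniform(-max_shift, max_shift, size=3)
    center = coords.mean(axis=0)
    return (coords - center) @ R.T + center + t, R, t
```

Because the transformation is rigid, all distances within the ligand are preserved. Only its starting pose relative to the receptor changes.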

Infrastructure and Configuration

The EquiDock model can be trained from scratch with the code and datasets provided. The equidock_dips.nemo and equidock_db5.nemo model checkpoints provided here were adapted from the public checkpoints available on GitHub.

Benchmarks

Two pretrained EquiDock checkpoints are available for download: equidock_dips.nemo and equidock_db5.nemo. These have been converted to the NeMo format from publicly available checkpoints. This section provides accuracy benchmarks for these models, as well as information on expected training speed performance. Currently, models trained from randomly initialized weights within the BioNeMo framework are not provided. The production of these models is ongoing work.

Accuracy Benchmarks

The accuracy of EquiDock was measured using the complex and interface root-mean-square deviation (RMSD) [1], evaluated on the test splits of DIPS and DB5.
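As a rough illustration of the two metrics, the sketch below computes an RMSD between predicted and reference coordinates and selects interface points by a distance cutoff. It assumes the predicted complex has already been superimposed onto the reference. The function names and the 8.0 cutoff are assumptions for illustration, and the exact evaluation protocol is the one defined in [1].

```python
import numpy as np

def rmsd(pred, true):
    """Root-mean-square deviation between two matched (N, 3) coordinate arrays."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    return float(np.sqrt(np.mean(np.sum((pred - true) ** 2, axis=1))))

def interface_masks(ligand, receptor, cutoff=8.0):
    """Boolean masks of ligand and receptor points within `cutoff` of the other protein."""
    ligand = np.asarray(ligand, dtype=float)
    receptor = np.asarray(receptor, dtype=float)
    # Pairwise distances, shape (n_ligand, n_receptor).
    d = np.linalg.norm(ligand[:, None, :] - receptor[None, :, :], axis=-1)
    return d.min(axis=1) < cutoff, d.min(axis=0) < cutoff
```

Under these assumptions, complex RMSD would be `rmsd` over the concatenated ligand and receptor coordinates, and interface RMSD would be the same quantity restricted to the points selected by `interface_masks` on the reference complex.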

RMSD type	Checkpoint Name	Dataset	Dataset Split for Measurement	Metric	BioNeMo	Public
Complex RMSD	equidock_dips.nemo	DIPS	test	median	13.39	13.39
Complex RMSD	equidock_dips.nemo	DIPS	test	mean	14.52	14.52
Complex RMSD	equidock_dips.nemo	DIPS	test	std	7.13	7.13
Interface RMSD	equidock_dips.nemo	DIPS	test	median	10.24	10.24
Interface RMSD	equidock_dips.nemo	DIPS	test	mean	11.92	11.92
Interface RMSD	equidock_dips.nemo	DIPS	test	std	7.01	7.01
Complex RMSD	equidock_db5.nemo	DB5	test	median	14.13	14.13
Complex RMSD	equidock_db5.nemo	DB5	test	mean	14.73	14.72
Complex RMSD	equidock_db5.nemo	DB5	test	std	5.31	5.31
Interface RMSD	equidock_db5.nemo	DB5	test	median	11.97	11.97
Interface RMSD	equidock_db5.nemo	DB5	test	mean	13.23	13.23
Interface RMSD	equidock_db5.nemo	DB5	test	std	4.93	4.93

Training Performance Benchmarks

Training speed was tested on DGX-A100 systems, on GPUs with 80 GB of memory, using 8 IEGMN layers and one epoch over the DIPS dataset.

References

[1] Octavian-Eugen Ganea, Xinyuan Huang, Charlotte Bunne, Yatao Bian, Regina Barzilay, Tommi Jaakkola, and Andreas Krause. Independent SE(3)-equivariant models for end-to-end rigid protein docking. arXiv preprint arXiv:2111.07786, 2021.
[2] Wolfgang Kabsch. A solution for the best rotation to relate two sets of vectors. Acta Crystallographica Section A: Crystal Physics, Diffraction, Theoretical and General Crystallography, 32(5):922–923, 1976.
[3] Thom Vreven, Iain H Moal, Anna Vangone, Brian G Pierce, Panagiotis L Kastritis, Mieczyslaw Torchala, Raphael Chaleil, Brian Jiménez-García, Paul A Bates, Juan Fernandez-Recio, and others. Updates to the integrated protein–protein interaction benchmarks: docking benchmark version 5 and affinity benchmark version 2. Journal of Molecular Biology, 427(19):3031–3041, 2015.
[4] Raphael Townshend, Rishi Bedi, Patricia Suriana, and Ron Dror. End-to-end learning on 3D protein structure for interface prediction. Advances in Neural Information Processing Systems, 2019.

License

EquiDock is provided under the Apache 2.0 license.