NVIDIA
Consistency Distilled Synthetic Protein Database

This resource contains the Consistency Distilled Dataset used to train the Proteina-Atomistica model.

Dataset Description:

The Consistency Distilled Synthetic Protein Database is a curated collection of high-quality, codesignable protein sequence–structure pairs, built to overcome limitations of datasets derived from the AlphaFold Database (AFDB). Many AFDB sequence–structure pairs cannot be reproduced by state-of-the-art folding models such as ESMFold, AlphaFold2, or Boltz-1, which suggests that many of those sequences may not actually fold into their predicted structures. To address this, the Proteina-Atomistica Consistency Distilled Database was built from scratch: ProteinMPNN was used to generate multiple synthetic sequences for each Foldseek AFDB cluster representative structure, and these sequences were then re-folded to obtain fully atomistic, self-consistent models. The result combines the structural diversity of the original AFDB with the sequence–structure consistency enforced by inverse folding and re-folding.
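The self-consistency idea above can be made concrete with a Cα RMSD check between a designed structure and its re-folded prediction. The sketch below is an illustration only, not the pipeline used to build this dataset: it assumes both structures are available as residue-matched (N, 3) NumPy arrays of Cα coordinates, uses standard Kabsch superposition, and applies a hypothetical 2.0 Å cutoff (a value commonly used in the design literature, not necessarily this dataset's curation criterion).

```python
import numpy as np

# Hypothetical self-consistency cutoff in angstroms; a common choice in the
# design literature, not necessarily the value used to build this dataset.
SC_RMSD_CUTOFF = 2.0


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """C-alpha RMSD between two (N, 3) arrays after optimal rigid superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two (N, 3) coordinate arrays of equal shape")
    # Remove translation by centering both point sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # SVD of the 3x3 cross-covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate the centered p onto q and take the root-mean-square deviation.
    diff = p_c @ r.T - q_c
    return float(np.sqrt(np.mean(np.sum(diff**2, axis=1))))


def is_self_consistent(designed_ca, refolded_ca, cutoff: float = SC_RMSD_CUTOFF) -> bool:
    """True if the re-folded structure reproduces the designed backbone within cutoff."""
    return kabsch_rmsd(designed_ca, refolded_ca) <= cutoff
```

A rigidly rotated and translated copy of a backbone gives an RMSD near zero, while a mirror image does not, because the reflection correction restricts the superposition to proper rotations.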

This dataset is ready for commercial and non-commercial use.

Dataset Owner(s):

NVIDIA Corporation

Dataset Creation Date:

May 15, 2025

License/Terms of Use:

CC BY 4.0 (Creative Commons Attribution 4.0 International)

Intended Usage:

Intended for protein designers and researchers who wish to scale their protein AI models for structure, sequence, and property prediction.

Dataset Quantification:

Record count: 455,473 protein structures

Feature count: 6 metadata features for each structure (id, length, plddt_avg, plddt_std, rmsd_ca, pmpnn_seq)

Measurement of Total Data Storage: 10.96 GB
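The six metadata fields listed above lend themselves to simple quality filtering. The sketch below assumes, hypothetically, that the metadata is available as CSV text with one column per field name; the actual file layout of this release is not described here. It also assumes pLDDT on a 0 to 100 scale and that rmsd_ca is a Cα RMSD in ångströms between the designed and re-folded structures. The default thresholds are illustrative, not the ones used to curate the dataset.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinRecord:
    """One metadata row; field names follow the dataset's six listed features."""
    id: str
    length: int
    plddt_avg: float  # assumed: mean pLDDT on a 0-100 scale
    plddt_std: float  # assumed: standard deviation of per-residue pLDDT
    rmsd_ca: float    # assumed: C-alpha RMSD (angstroms), designed vs. re-folded
    pmpnn_seq: str    # ProteinMPNN-designed sequence


def load_records(text: str) -> list[ProteinRecord]:
    """Parse metadata from CSV text (a hypothetical layout: one column per field)."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        ProteinRecord(
            id=row["id"],
            length=int(row["length"]),
            plddt_avg=float(row["plddt_avg"]),
            plddt_std=float(row["plddt_std"]),
            rmsd_ca=float(row["rmsd_ca"]),
            pmpnn_seq=row["pmpnn_seq"],
        )
        for row in reader
    ]


def select(
    records: list[ProteinRecord],
    min_plddt: float = 80.0,  # illustrative threshold, not the curation value
    max_rmsd: float = 2.0,    # illustrative threshold, not the curation value
    max_length: int | None = None,
) -> list[ProteinRecord]:
    """Keep records that pass simple confidence, consistency, and length filters."""
    return [
        r
        for r in records
        if r.plddt_avg >= min_plddt
        and r.rmsd_ca <= max_rmsd
        and (max_length is None or r.length <= max_length)
    ]
```

For example, `select(records, min_plddt=90.0, max_length=256)` keeps only high-confidence records of up to 256 residues. For scale, 10.96 GB spread across 455,473 structures averages roughly 24 kB per structure, assuming decimal gigabytes.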

Reference(s):

https://arxiv.org/pdf/2512.01976

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloading or using this dataset in accordance with our terms of service, developers should work with their internal model team to ensure it meets the requirements of the relevant industry and use case and addresses unforeseen product misuse.
Please report model quality, risk, security vulnerabilities, or NVIDIA AI Concerns here: https://app.intigriti.com/programs/nvidia/nvidiavdp/detail

Publisher:

NVIDIA

Latest Version:

release

Updated:

December 9, 2025 UTC

Compressed Size:

10.96 GB
