NVIDIA NIM for GenMol

Description
GenMol is a masked diffusion model trained on molecular SAFE representations for fragment-based molecule generation, which can serve as a generalist model for various drug discovery tasks.
Publisher
NVIDIA
Latest Tag
latest
Modified
May 30, 2025
Compressed Size
8.16 GB
Multinode Support
No
Multi-Arch Support
No
Platform
Linux / amd64

Model Overview

Description:

GenMol is a masked diffusion model [1] trained on molecular Sequential Attachment-based Fragment Embedding (SAFE) representations [2] for fragment-based molecule generation. It can serve as a generalist model for various drug discovery tasks, including de novo generation, linker design, motif extension, scaffold decoration/morphing, hit generation, and lead optimization.

This model is ready for commercial use.

License/Terms of Use:

This NIM is licensed under the NVIDIA AI Foundation Models Community License Agreement. By using this NIM, you accept the terms and conditions of this license. You are responsible for ensuring that your use of NVIDIA AI Foundation Models complies with all applicable laws.


References:

@misc{sahoo2024simpleeffectivemaskeddiffusion,
      title={Simple and Effective Masked Diffusion Language Models}, 
      author={Subham Sekhar Sahoo and Marianne Arriola and Yair Schiff and Aaron Gokaslan and Edgar Marroquin and Justin T Chiu and Alexander Rush and Volodymyr Kuleshov},
      year={2024},
      eprint={2406.07524},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2406.07524}, 
}
@misc{noutahi2023gottasafenewframework,
      title={Gotta be SAFE: A New Framework for Molecular Design}, 
      author={Emmanuel Noutahi and Cristian Gabellini and Michael Craig and Jonathan S. C Lim and Prudencio Tossou},
      year={2023},
      eprint={2310.10773},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2310.10773}, 
}

Model Architecture:

Architecture Type: Transformer
Network Architecture: BERT

Input:

Input Type(s): Text (Molecular Sequence), Number (Molecules to generate, softmax temperature scaling factor, randomness factor, diffusion step size), Enumeration (Scoring method), Binary (Show unique molecules only)
Input Format(s): Text: String (Sequential Attachment-based Fragment Embedding (SAFE)); Number: Integer, FP32; Enumeration: String (QED, LogP); Binary: Boolean
Input Parameters: 1D
Other Properties Related to Input: Maximum input length is 512 tokens.
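
The inputs listed above can be gathered into one request object and checked before they are sent to the service. The following is a minimal Python sketch under stated assumptions: the class name `GenMolRequest`, the field names, and the default values are hypothetical and chosen for illustration only; they are not the NIM's documented request schema. The length check counts characters as a conservative stand-in for the 512-token limit, since the actual tokenizer is not described here.

```python
from dataclasses import dataclass, asdict

MAX_INPUT_LENGTH = 512               # maximum input length stated above, in tokens
SCORING_METHODS = ("QED", "LogP")    # enumeration values listed above


@dataclass
class GenMolRequest:
    """Hypothetical container for the inputs described in this section."""
    sequence: str                 # molecular sequence in SAFE format
    num_molecules: int = 10       # molecules to generate (default is an assumption)
    temperature: float = 1.0      # softmax temperature scaling factor
    randomness: float = 0.0       # randomness factor
    step_size: int = 1            # diffusion step size (integer type is an assumption)
    scoring: str = "QED"          # scoring method
    unique: bool = False          # show unique molecules only

    def validate(self) -> None:
        """Raise ValueError if any field is outside the ranges assumed here."""
        if not self.sequence:
            raise ValueError("sequence must be a non-empty string")
        # Assumption: character count stands in for the token count.
        if len(self.sequence) > MAX_INPUT_LENGTH:
            raise ValueError("sequence exceeds the maximum input length")
        if self.num_molecules < 1:
            raise ValueError("num_molecules must be at least 1")
        if self.temperature <= 0:
            raise ValueError("temperature must be positive")
        if self.step_size < 1:
            raise ValueError("step_size must be at least 1")
        if self.scoring not in SCORING_METHODS:
            raise ValueError(f"scoring must be one of {SCORING_METHODS}")

    def to_payload(self) -> dict:
        """Validate the fields and return them as a plain dictionary."""
        self.validate()
        return asdict(self)


# Placeholder input string; substitute a real SAFE string in practice.
request = GenMolRequest(sequence="c1ccccc1", num_molecules=5, scoring="LogP")
payload = request.to_payload()
```

Validating on the client side keeps malformed requests from reaching the container; the dictionary returned by `to_payload` would still need to be mapped onto whatever field names the deployed service actually expects.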

Output:

Output Type(s): Text (List of molecule sequences), Number (List of scores)
Output Format: Text: Array of string (Sequential Attachment-based Fragment Embedding (SAFE)); Number: Array of FP32 (Scores)
Output Parameters: 2D
Other Properties Related to Output: Maximum output length is 512 tokens.
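
Because the output is a list of molecule sequences with a parallel list of scores, a common post-processing step is to pair the two and rank the results. The sketch below assumes the two lists have already been extracted from the response; the function name `rank_molecules` is hypothetical, and sorting in descending order assumes that a higher score is preferable, which suits QED but may not match the goal when scoring by LogP.

```python
from typing import List, Tuple


def rank_molecules(
    molecules: List[str],
    scores: List[float],
    unique: bool = False,
) -> List[Tuple[str, float]]:
    """Pair each molecule with its score and sort by score, highest first."""
    if len(molecules) != len(scores):
        raise ValueError("molecules and scores must have the same length")

    # sorted() is stable, so ties keep their original relative order.
    ranked = sorted(zip(molecules, scores), key=lambda pair: pair[1], reverse=True)

    if unique:
        # Keep only the best-scoring entry for each distinct string.
        seen = set()
        deduplicated = []
        for molecule, score in ranked:
            if molecule not in seen:
                seen.add(molecule)
                deduplicated.append((molecule, score))
        ranked = deduplicated

    return ranked


# Placeholder strings and scores for illustration only.
results = rank_molecules(["CCO", "c1ccccc1", "CCO"], [0.2, 0.9, 0.5], unique=True)
```

Deduplication here compares strings exactly, so two different strings that encode the same molecule are treated as distinct.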

Software Integration:

Runtime Engine(s): PyTorch >= 2.5.1

Supported Hardware Microarchitecture Compatibility:
NVIDIA Ampere
NVIDIA Ada Lovelace
NVIDIA Hopper
NVIDIA Grace Hopper
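
One way to check a GPU against the list above is to look up its CUDA compute capability. The sketch below is an illustration, not part of the NIM: the mapping from compute capability to architecture name comes from NVIDIA's general CUDA documentation rather than from this model card, and the function name `architecture_for` is hypothetical. Grace Hopper systems carry a Hopper GPU, so they report the same capability as Hopper.

```python
from typing import Optional, Tuple

# Compute capability (major, minor) for the architectures listed above.
SUPPORTED_ARCHITECTURES = {
    (8, 0): "NVIDIA Ampere",        # e.g. A100
    (8, 6): "NVIDIA Ampere",        # e.g. RTX A6000
    (8, 7): "NVIDIA Ampere",
    (8, 9): "NVIDIA Ada Lovelace",  # e.g. L40, L40S
    (9, 0): "NVIDIA Hopper",        # e.g. H100, and the GPU in Grace Hopper
}


def architecture_for(capability: Tuple[int, int]) -> Optional[str]:
    """Return the supported architecture name, or None if it is not listed."""
    return SUPPORTED_ARCHITECTURES.get(tuple(capability))


# With PyTorch installed, the tuple could come from
# torch.cuda.get_device_capability(); a literal is used here instead.
print(architecture_for((8, 9)))
print(architecture_for((7, 5)))
```

A result of `None` only means the capability is absent from this table; it does not by itself prove that the container will fail on that device.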

Supported Operating System(s):
Linux

Model Version(s):

GenMol v1.0

Training & Evaluation Dataset:

Training and Testing Dataset:

Link: SAFE-GPT GitHub, HuggingFace
Data Collection Method by dataset: Automated
Labeling Method by dataset: Automated
Properties: 1.1B SAFE strings covering various molecule types (drug-like compounds, peptides, multi-fragment molecules, polymers, reagents, and non-small molecules).
Dataset License(s): CC-BY-4.0

Evaluation Dataset:

Link: SAFE-DRUGS GitHub, HuggingFace
Data Collection Method by dataset: Not Applicable
Labeling Method by dataset: Not Applicable
Properties: SAFE-DRUGS consists of 26 known therapeutic drugs.
Dataset License(s): CC-BY-4.0

Inference:

Engine: PyTorch
Test Hardware: A6000, A100, L40, L40S, H100

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards here.

Please report security vulnerabilities or NVIDIA AI Concerns here.