Clara Discovery

Clara Discovery is a collection of frameworks, applications, and AI models enabling GPU-accelerated computational drug discovery
May 17, 2024

What is NVIDIA Clara Discovery?

Clara Discovery is a collection of frameworks, applications, and AI models enabling GPU-accelerated computational drug discovery.

Drug development is a cross-disciplinary endeavor. Clara Discovery can be applied across the drug discovery process and combines accelerated computing, AI, and machine learning in genomics, proteomics, microscopy, virtual screening, computational chemistry, visualization, clinical imaging, and natural language processing.

What is in this Collection?

Clara Discovery is a growing collection of frameworks, applications, and models enabling GPU-accelerated computational drug discovery. Specifically, Clara Discovery supports genomics workflows with Clara Parabricks, cryo-EM pipelines with RELION, virtual screening with AutoDock, and protein structure prediction with MELD. It also includes several third-party applications for molecular simulation, the Clara Imaging pretrained models and training framework, and Clara NLP with the pretrained models BioMegatron and BioBERT and the NeMo training framework.


Clara Parabricks is a computational framework supporting genomics applications from DNA to RNA. It employs NVIDIA's CUDA, HPC, AI, and data analytics stacks to build GPU-accelerated libraries, pipelines, and reference application workflows for primary, secondary, and tertiary analysis. Clara Parabricks is a complete portfolio of off-the-shelf solutions coupled with a toolkit that supports new application development to address the needs of genomics labs.


Clara Train SDK is a domain-optimized developer application framework that includes APIs for AI-assisted annotation and a TensorFlow-based training framework with pretrained models to kick-start AI development with techniques like transfer learning, federated learning, and AutoML.


Clara NLP is a collection of state-of-the-art biomedical pretrained language models as well as highly optimized pipelines for training NLP models on biomedical and clinical text. Using NeMo and BioMegatron, researchers and data scientists can build even more powerful NLP models on the large corpus of textual data that they have at hand.

Protein Structure

RELION (REgularized LIkelihood OptimizatioN) implements an empirical Bayesian approach for the analysis of electron cryo-microscopy (cryo-EM) data. Specifically, it provides methods for refining single or multiple 3D reconstructions as well as 2D class averages. RELION is an important tool in the study of living cells.

MELD is a molecular simulation framework for determining protein structures by combining semi-reliable data with atomistic physical models by Bayesian inference.
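The Bayesian combination mentioned above can be written in its generic textbook form. This is only a sketch of the idea, not MELD's actual formulation: the symbols x (a protein conformation), D (the semi-reliable data), and E(x) (the energy from the atomistic physical model) are notation assumed here for illustration.

```latex
% Posterior over conformations x given data D (Bayes' rule, up to normalization)
p(x \mid D) \;\propto\; p(D \mid x)\, p(x)

% Assumed prior: Boltzmann weight from the physical model's energy E(x)
p(x) \;\propto\; \exp\!\left(-\frac{E(x)}{k_B T}\right)
```

Read this way, the physical model supplies the prior over structures and the experimental or heuristic data supplies the likelihood, so structures that are both physically plausible and consistent with the data receive the highest posterior weight.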


Cheminformatics is a demonstration of real-time exploration and analysis of a database of chemical compounds. Molecules are clustered based on chemical similarity and visualized with an interactive plot. Users are able to explore in real time regions of interest in chemical space, generate molecules, and see the corresponding chemical structures and physical properties.
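To make "clustered based on chemical similarity" concrete, here is a minimal sketch. The demo's actual fingerprints and clustering algorithm are not described in this text, so everything below is an assumption: Tanimoto similarity over sets of fingerprint bits, a simple greedy "leader" clustering, and toy fingerprints invented for illustration.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def leader_cluster(fingerprints, threshold=0.5):
    """Greedy clustering: each molecule joins the first cluster whose leader
    is at least `threshold` similar; otherwise it starts a new cluster."""
    clusters = []  # list of (leader_name, [member_names])
    for name, bits in fingerprints.items():
        for leader, members in clusters:
            if tanimoto(bits, fingerprints[leader]) >= threshold:
                members.append(name)
                break
        else:
            # No existing leader was similar enough.
            clusters.append((name, [name]))
    return [members for _, members in clusters]


# Hypothetical fingerprints: sets of bit positions, invented for illustration.
toy_fingerprints = {
    "mol_a": {1, 4, 7, 9},
    "mol_b": {1, 4, 7, 10},
    "mol_c": {2, 3, 5},
    "mol_d": {2, 3, 5, 8},
}

if __name__ == "__main__":
    print(leader_cluster(toy_fingerprints, threshold=0.5))
```

In practice the bit sets would come from a cheminformatics toolkit, and a GPU-accelerated clustering method would replace the greedy loop; the similarity measure is the part this sketch is meant to illustrate.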

MegaMolBART is a seq2seq transformer model that understands chemistry and can be used for a variety of cheminformatics applications in drug discovery. The embeddings from its encoder can be used as features for predictive models. Alternatively, the encoder and decoder can be used together to generate novel molecules by sampling the model's latent space. MegaMolBART can be used in the real-time explorer or through the gRPC service.

SE(3)-Transformer is a graph neural network that uses a variant of self-attention to process 3D points and graphs. This model enables you to predict quantum chemical properties of small organic molecules in the QM9 dataset.

Virtual Screening

AutoDock is a growing collection of methods for computational docking and virtual screening, for use in structure-based drug discovery and exploration of the basic mechanisms of biomolecular structure and function.

Molecular Simulation

GROMACS is a molecular dynamics application for atomistic simulation. GROMACS is designed to simulate biomolecular systems like proteins, lipids, and nucleic acids. A wide variety of methods are supported.

NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD uses the popular molecular graphics program VMD for simulation setup and trajectory analysis, but is also file-compatible with AMBER, CHARMM, and X-PLOR.

Tinker-HP is a CPU- and GPU-based, multi-precision, massively parallel MPI package dedicated to long polarizable molecular dynamics simulations and to polarizable QM/MM.

VMD is designed for modeling, visualization, and analysis of biomolecular systems such as proteins, nucleic acids, lipid membranes, carbohydrate structures, etc. VMD provides a wide variety of graphical representations for visualizing and coloring molecular structures as well as a powerful scripting language for post-processing and automation of visualization tasks.

TorchANI is a PyTorch implementation of ANI (Accurate NeurAl networK engINe for Molecular Energies), created and maintained by the Roitberg group. TorchANI contains classes like AEVComputer, ANIModel, and EnergyShifter that can be pipelined to compute molecular energies from the 3D coordinates of molecules.

DeePMD-kit is a package written in Python/C++, designed to minimize the effort required to build deep-learning-based models of interatomic potential energy and force fields and to perform molecular dynamics (MD). This brings new hope for addressing the accuracy-versus-efficiency dilemma in molecular simulations. Applications of DeePMD-kit span from finite molecules to extended systems and from metallic systems to chemically bonded systems.

Hardware Requirements

Clara supports CUDA compute capability 6.0 and higher. This corresponds to GPUs in the Pascal, Volta, Turing, and Ampere families.
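As a rough illustration of the rule above, the following sketch checks a compute capability string against the 6.0 minimum. The function names are hypothetical, and the architecture table lists only a few common capability values for the families named in this section; it is not an exhaustive or official mapping.

```python
# Minimum compute capability stated in this section.
MIN_COMPUTE_CAPABILITY = (6, 0)

# A few common capability values for the families named above (illustrative,
# not exhaustive). Architectures not mentioned in this section are left out.
ARCHITECTURES = {
    (6, 0): "Pascal",
    (6, 1): "Pascal",
    (7, 0): "Volta",
    (7, 5): "Turing",
    (8, 0): "Ampere",
    (8, 6): "Ampere",
}


def parse_capability(text):
    """Turn a string such as '7.5' into the tuple (7, 5)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def is_supported(capability):
    """Return True when the (major, minor) capability meets the 6.0 minimum."""
    return capability >= MIN_COMPUTE_CAPABILITY


def describe(text):
    """Summarize the support status for one compute capability string."""
    capability = parse_capability(text)
    family = ARCHITECTURES.get(capability, "unknown family")
    status = "supported" if is_supported(capability) else "not supported"
    return f"{text} ({family}): {status}"


if __name__ == "__main__":
    for value in ["5.2", "6.1", "7.0", "7.5", "8.6"]:
        print(describe(value))
```

Comparing (major, minor) tuples rather than floats avoids mistakes such as treating 6.10 and 6.1 as the same value.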

Clara Train recommends NGC-Ready systems, with NVIDIA Tesla V100 GPUs or NVIDIA Tesla T4 GPUs.

Driver Requirements

Clara is based on NVIDIA CUDA 10.1.243, which requires NVIDIA Driver release 418.xx. However, if you are running on Tesla (for example, T4 or any other Tesla board), you may use NVIDIA driver release 384.111+ or 410. You may also use driver release 396 on Tesla T4.
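The driver rules above can be sketched as a small check. This encodes one reading of the paragraph, and the reading itself is an assumption: "418.xx" is taken to mean branch 418 or newer, "384.111+" is taken to mean version 384.111 or later within the 384 branch, and a T4 is treated as a Tesla board. The function names and flags are hypothetical.

```python
def parse_driver(version):
    """Turn a driver string such as '418.87.01' into a tuple of ints: (418, 87, 1)."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_ok(version, is_tesla=False, is_t4=False):
    """Check a driver version against the requirements stated above.

    Assumptions (not confirmed by the text): 418.xx means branch 418 or newer,
    and 384.111+ stays within the 384 branch.
    """
    parts = parse_driver(version)
    branch = parts[0]

    # General case: CUDA 10.1 needs driver branch 418 (or, assumed, newer).
    if branch >= 418:
        return True

    # Tesla boards (including T4) may also use 410 or 384.111+.
    if is_tesla or is_t4:
        if branch == 410:
            return True
        if branch == 384 and parts >= (384, 111):
            return True

    # Tesla T4 may additionally use the 396 branch.
    if is_t4 and branch == 396:
        return True

    return False


if __name__ == "__main__":
    print(driver_ok("418.87.01"))               # general case
    print(driver_ok("410.48", is_tesla=True))   # Tesla exception
    print(driver_ok("396.26", is_t4=True))      # T4 exception
```

On a real system the version string could be read from the driver tooling; here it is passed in directly so the logic stays easy to test.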


License

The End User License Agreement is included with the product. Licenses are also available along with the model application zip file. By pulling and using the Clara Train SDK container and downloading models, you accept the terms and conditions of these licenses.

Technical Support

Use the NVIDIA Devtalk forum for questions regarding this software.