NGC Catalog

NeMo Framework

Description: NVIDIA NeMo™ framework supports enterprise development of LLMs and generative AI models with automated data processing, model training techniques, and flexible deployment options.
Publisher: NVIDIA
Latest Tag: 25.04
Modified: May 7, 2025
Compressed Size: 28.86 GB
Multinode Support: Yes
Multi-Arch Support: Yes
25.04 (Latest) Security Scan Results: available for Linux / amd64 and Linux / arm64.

What is the NeMo Framework Container?

NVIDIA NeMo™ is an end-to-end platform for developing custom generative AI models anywhere. Designed for enterprise development, the NeMo framework uses NVIDIA's state-of-the-art technology to support a complete workflow: automated distributed data processing, training of large-scale bespoke models with sophisticated 3D parallelism techniques, and deployment with retrieval-augmented generation for large-scale inference on the infrastructure of your choice, whether on-premises or in the cloud.

For enterprises running their business on AI, NVIDIA AI Enterprise provides a production-grade, secure, end-to-end software platform that includes NeMo as well as generative AI reference applications and enterprise support to streamline adoption. Now organizations can integrate AI into their operations, streamlining processes, enhancing decision-making capabilities, and ultimately driving greater value.

What You Get with NVIDIA NeMo Framework Container

At the heart of the NeMo framework is the unification of distributed training and advanced parallelism. NeMo coordinates GPU compute and memory across nodes, and by partitioning both the model and the training data it enables seamless multi-node, multi-GPU training that significantly reduces training time. A standout feature of NeMo is its support for a range of parallelism and memory-saving techniques:

Parallelism Techniques

  • Data Parallelism
  • Fully Sharded Data Parallelism (FSDP)
  • Tensor Parallelism
  • Pipeline Parallelism
  • Sequence Parallelism
  • Expert Parallelism
  • Context Parallelism
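As a rough sketch of how these techniques compose, the degrees of tensor, pipeline, and data parallelism multiply to determine the total number of GPUs a job occupies. The function and parameter names below are illustrative, not NeMo's actual API:

```python
# Hypothetical sketch: how parallelism degrees factor into the total GPU count.
# Parameter names mirror common Megatron-style settings but are illustrative only.

def required_gpus(tensor_parallel: int, pipeline_parallel: int,
                  data_parallel: int) -> int:
    """Each model replica spans TP * PP GPUs; data parallelism replicates it."""
    return tensor_parallel * pipeline_parallel * data_parallel

# Example: tensor parallelism of 4 within a node, pipeline parallelism of 2
# across nodes, and 8 data-parallel replicas.
print(required_gpus(4, 2, 8))  # 64 GPUs
```

Sequence, expert, and context parallelism further subdivide work along other axes (the sequence dimension, MoE experts, and long-context attention, respectively) without changing this basic multiplicative budget.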

Memory-Saving Techniques

  • Selective Activation Recompute (SAR)
  • CPU offloading (Activation, Weights)
  • Attention: Flash Attention (FA), Grouped Query Attention (GQA), Multi-Query Attention (MQA), Sliding Window Attention (SWA)
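To give a feel for the recompute idea, the toy sketch below caches only each layer's input during the forward pass and regenerates activations on demand, trading compute for memory. This is an illustration of the general technique, not NeMo's implementation:

```python
# Toy illustration of activation recompute: instead of caching every
# intermediate activation for the backward pass, cache only layer inputs
# and recompute activations when needed. Real frameworks do this inside
# autograd; this sketch only shows the memory/compute trade-off.

def forward_with_recompute(layers, x):
    """Run layers, saving only each layer's input (not its output)."""
    saved_inputs = []
    for layer in layers:
        saved_inputs.append(x)   # cheap to store
        x = layer(x)             # activation NOT stored
    return x, saved_inputs

def recompute_activation(layer, saved_input):
    """Re-run a layer during the backward pass to regenerate its activation."""
    return layer(saved_input)

layers = [lambda v: v * 2, lambda v: v + 3]
out, saved = forward_with_recompute(layers, 5)
print(out)                                         # 13
print(recompute_activation(layers[1], saved[1]))   # 13 (regenerated, not cached)
```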

The NeMo framework container is a leading solution for multimodal training at scale. The platform supports language and multimodal models including Llama 2, Falcon, CLIP, Stable Diffusion, and LLaVA, as well as text-based generative AI architectures including GPT, T5, BERT, MoE, and RETRO. In addition to large language models (LLMs), NeMo provides pretrained models for computer vision (CV), automatic speech recognition (ASR), natural language processing (NLP), and text-to-speech (TTS).

The NeMo framework container offers an array of techniques for refining pretrained LLMs for specialized use cases, including p-tuning, LoRA, supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), SteerLM, and more. These diverse customization options give NeMo the wide-ranging flexibility needed to meet varying business requirements.
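To illustrate the core idea behind one of these techniques, LoRA: rather than updating a full weight matrix W, it learns a low-rank delta B·A (rank r much smaller than the matrix dimensions), so only the small A and B matrices are trained. The plain-Python sketch below is illustrative only and does not use NeMo's API:

```python
# Hedged sketch of the LoRA idea: y = W x + alpha * B (A x), where W is frozen
# and only the low-rank factors A (r x k) and B (d x r) are trained.
# Plain-Python matrices for illustration only.

def matvec(m, v):
    """Matrix-vector product on nested lists."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

def lora_forward(W, A, B, x, alpha=1.0):
    """Frozen base path plus the scaled low-rank update."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + alpha * d for b, d in zip(base, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen 2x2 base weight
A = [[1.0, 1.0]]               # 1x2: projects down to rank 1
B = [[0.5], [0.5]]             # 2x1: projects back up
print(lora_forward(W, A, B, [2.0, 4.0]))  # [5.0, 7.0]
```

Because only A and B are updated, the number of trainable parameters drops from d·k to r·(d + k), which is what makes LoRA cheap enough to run on modest hardware.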

Getting Started With NVIDIA NeMo

Refer to the NVIDIA NeMo playbooks page for step-by-step instructions on how to get started quickly with the NeMo framework.

Developer Container

For developers who want access to features that have been implemented but are not yet included in a major release, use the nvcr.io/nvidia/nemo:dev container.

The NeMo Framework developer container is released on a weekly cadence. If you encounter problems, please report the issue on GitHub and specify the date the container was pulled.

Questions? See the current discussions and submit a question.

Found a bug? You can report a developer container bug.

Technical Blogs

  • Building and Deploying Generative AI Models with NVIDIA NeMo Framework (video)
  • Getting Started with Large Language Models (blog)
  • How to Create a Custom Large Language Model (blog)
  • NVIDIA SteerLM: A Simple and Practical Technique to Customize LLMs During Inference (blog)
  • Unlocking the Power of Enterprise-Ready LLMs with NVIDIA NeMo (blog)

Documentation

More detailed documentation is available in the NeMo framework documentation.

License

NeMo is licensed under the NVIDIA AI Product Agreement. By pulling and using the container, you accept the terms and conditions of this license.

This container contains Llama Materials governed by the Meta Llama 3 Community License Agreement and is Built with Meta Llama 3.