NeMo Retriever Extraction

Description: NeMo Retriever extraction is a scalable, performance-oriented document content and metadata extraction microservice.
Publisher: NVIDIA
Latest Tag: 25.4.2
Modified: May 7, 2025
Compressed Size: 3.17 GB
Multinode Support: No
Multi-Arch Support: No
25.4.2 (Latest) Security Scan Results

Linux / amd64

NeMo Retriever extraction (also known as NVIDIA Ingest and nv-ingest)

NeMo Retriever extraction is a scalable, performance-oriented document content and metadata extraction microservice. It supports parsing PDFs, Word documents, and PowerPoint documents, and uses specialized NVIDIA NIMs to find, contextualize, and extract text, tables, charts, and images for use in downstream generative applications.

NeMo Retriever extraction parallelizes the work of splitting documents into pages. On each page, artifacts are classified (as text, tables, charts, or images), extracted, and further contextualized through optical character recognition (OCR), then written to a well-defined JSON schema. From there, NeMo Retriever extraction can optionally compute embeddings for the extracted content and optionally store the results in a Milvus vector database.
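The flow described above (split into pages, process pages in parallel, emit typed JSON records) can be illustrated with a small, self-contained Python sketch. This is not the nv-ingest API: every name here (`Artifact`, `split_into_pages`, `extract_page`, `extract`) and the record fields are hypothetical, and the toy input arrives already labelled, whereas the real service relies on NIMs and OCR to classify and contextualize artifacts.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from dataclasses import asdict, dataclass

# Artifact categories named in the description above.
ARTIFACT_KINDS = ("text", "table", "chart", "image")


@dataclass
class Artifact:
    """Hypothetical output record; the real JSON schema is not shown here."""
    document: str
    page: int
    kind: str
    content: str


def split_into_pages(document):
    """Stand-in splitter: turn (name, pages) into one work item per page."""
    name, pages = document
    return [(name, number, items) for number, items in enumerate(pages, start=1)]


def extract_page(page):
    """Keep the recognised artifacts on one page and normalise their content."""
    name, number, items = page
    return [
        Artifact(name, number, kind, content.strip())
        for kind, content in items
        if kind in ARTIFACT_KINDS
    ]


def extract(documents, max_workers=4):
    """Process all pages concurrently and return JSON-serialisable records."""
    pages = [page for doc in documents for page in split_into_pages(doc)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_page = list(pool.map(extract_page, pages))  # map preserves page order
    return [asdict(artifact) for artifacts in per_page for artifact in artifacts]


if __name__ == "__main__":
    docs = [
        ("report.pdf", [
            [("text", " Q1 summary "), ("table", "a,b\n1,2")],
            [("chart", "revenue by month")],
        ]),
    ]
    print(json.dumps(extract(docs), indent=2))
```

Because each page is an independent work item, the per-page step can be scaled out without changing the shape of the output records. Embedding and vector-store steps would consume the same records downstream.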

Documentation

For more details, please visit the NVIDIA Ingest GitHub Repository.

Governing Terms

The container is governed by the NVIDIA Software License Agreement and the Product Specific Terms for AI Products (both listed under NVIDIA Agreements, Enterprise Software). NeMo Retriever extraction itself is released under the Apache-2.0 license.

You are responsible for ensuring that your use of NVIDIA AI Foundation Models complies with all applicable laws.