Linux / amd64
NeMo Retriever extraction (nv-ingest) is a scalable, performance-oriented microservice for extracting document content and metadata. It supports parsing PDFs, Word documents, and PowerPoint documents, and uses specialized NVIDIA image NIMs to find, contextualize, and extract text, tables, charts, and images for use in downstream generative applications.
NeMo Retriever extraction parallelizes the work of splitting documents into pages, where artifacts (such as text, tables, charts, and images) are classified, extracted, and further contextualized through optical character recognition (OCR) into a well-defined JSON schema. From there, it can optionally compute embeddings for the extracted content and optionally store the results in the Milvus vector database.
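The flow described above can be illustrated with a minimal, self-contained Python sketch. It does not use the nv-ingest API: every name in it (Artifact, classify, extract_page, run_pipeline), the "kind: content" prefix convention used to classify blocks, and the record fields are hypothetical and chosen only for illustration. It shows per-page work running in parallel, artifacts tagged by type and emitted as JSON-serializable records, and the embedding and storage stages being optional.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import asdict, dataclass
from typing import Callable, List, Optional
import json


@dataclass
class Artifact:
    page: int
    kind: str      # "text", "table", "chart", or "image"
    content: str


def classify(block: str) -> str:
    # Toy classifier (assumption): a block is typed by a "kind:" prefix.
    # A real service would use detection and OCR models instead.
    for kind in ("table", "chart", "image"):
        if block.startswith(kind + ":"):
            return kind
    return "text"


def extract_page(page_number: int, blocks: List[str]) -> List[Artifact]:
    # Classify every block on one page and strip the illustrative prefix.
    artifacts = []
    for block in blocks:
        kind = classify(block)
        content = block if kind == "text" else block.split(":", 1)[1].strip()
        artifacts.append(Artifact(page_number, kind, content))
    return artifacts


def run_pipeline(
    pages: List[List[str]],
    embed: Optional[Callable[[str], List[float]]] = None,
    store: Optional[Callable[[List[dict]], None]] = None,
) -> List[dict]:
    # Pages are independent, so they can be processed in parallel.
    # pool.map returns results in page order.
    with ThreadPoolExecutor() as pool:
        per_page = list(
            pool.map(lambda item: extract_page(*item), enumerate(pages, start=1))
        )
    records = [asdict(a) for page in per_page for a in page]
    if embed is not None:      # optional embedding stage
        for record in records:
            record["embedding"] = embed(record["content"])
    if store is not None:      # optional storage stage (e.g. a vector database)
        store(records)
    return records


pages = [["Quarterly summary", "table: region,revenue"], ["chart: revenue by year"]]
records = run_pipeline(pages)
print(json.dumps(records, indent=2))
```

Passing embed or store callables switches the optional stages on; leaving them as None returns only the extracted records, which mirrors the optional embedding and storage steps described above.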
For more details, please visit the NVIDIA Ingest GitHub Repository.
The container is governed by the NVIDIA Software License Agreement and the Product Specific Terms for AI Product (both listed under NVIDIA Agreements | Enterprise Software). NeMo Retriever extraction itself is released under the Apache-2.0 license.
You are responsible for ensuring that your use of NVIDIA AI Foundation Models complies with all applicable laws.