Linux / amd64
NV-Ingest is a scalable, performance-oriented microservice for extracting document content and metadata. It supports parsing PDF, Word, and PowerPoint documents, and uses specialized NVIDIA image NIMs to find, contextualize, and extract text, tables, charts, and images for use in downstream generative applications.
Based on NVIDIA Morpheus, NV-Ingest parallelizes the process of splitting documents into pages whose contents are classified (as tables, charts, images, or text), extracted into discrete content, and further contextualized (through OCR) into a well-defined JSON schema. From there, NV-Ingest can optionally manage the computation of embeddings for the extracted content and store the results in a Milvus vector database.
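The flow described above (split a document into pages, process pages in parallel, classify each element, and emit structured records) can be pictured with a small standalone sketch. This is not the NV-Ingest API or its JSON schema: every name here (`ExtractedItem`, `split_into_pages`, `extract_page`, `ingest`, and the record fields) is hypothetical, the input is a pre-parsed toy document, and a thread pool stands in for the real pipeline. Embedding computation and Milvus storage are left out.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from dataclasses import asdict, dataclass

# The four content classes named in the description above.
CONTENT_TYPES = ("text", "table", "chart", "image")


@dataclass
class ExtractedItem:
    """Hypothetical record shape; the real schema is defined by the service."""
    document: str
    page: int
    content_type: str
    content: str


def split_into_pages(document):
    """Turn one document into independent (name, page_number, elements) jobs."""
    return [
        (document["name"], number, elements)
        for number, elements in enumerate(document["pages"], start=1)
    ]


def extract_page(job):
    """Classify each element on a page and wrap it in a discrete record."""
    name, page_number, elements = job
    items = []
    for element in elements:
        # Unknown kinds fall back to "text" in this toy classifier.
        kind = element["kind"] if element["kind"] in CONTENT_TYPES else "text"
        items.append(ExtractedItem(name, page_number, kind, element["data"]))
    return items


def ingest(document, max_workers=4):
    """Process pages in parallel and flatten the results into JSON-ready dicts."""
    jobs = split_into_pages(document)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_page = list(pool.map(extract_page, jobs))  # map preserves page order
    return [asdict(item) for page_items in per_page for item in page_items]


# Toy input: a two-page document whose elements are already parsed.
sample = {
    "name": "report.pdf",
    "pages": [
        [
            {"kind": "text", "data": "Quarterly summary"},
            {"kind": "table", "data": "revenue,cost"},
        ],
        [{"kind": "chart", "data": "bar chart of revenue"}],
    ],
}

records = ingest(sample)
print(json.dumps(records, indent=2))
```

Because each page is an independent job, the per-page work can be spread across workers without coordination, which is the property the description attributes to the pipeline.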
For more details, please visit the NVIDIA Ingest GitHub Repository.
By using the NVIDIA Ingest service, you acknowledge that you have read and agreed to the NVIDIA Evaluation License Agreement.