Download this resource to run enterprise RAG applications based on NVIDIA services with Docker Compose.
Install Docker Engine and Docker Compose.
Verify NVIDIA GPU driver version 535 or later is installed.
$ nvidia-smi --query-gpu=driver_version --format=csv,noheader
535.129.03
$ nvidia-smi -q -d compute
==============NVSMI LOG==============
Timestamp : Sun Nov 26 21:17:25 2023
Driver Version : 535.129.03
CUDA Version : 12.2
Attached GPUs : 1
GPU 00000000:CA:00.0
Compute Mode : Default
Refer to the NVIDIA Linux driver installation instructions for more information.
Install the NVIDIA Container Toolkit.
Verify the toolkit is installed and configured as the default container runtime.
$ cat /etc/docker/daemon.json
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
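If the file does not show nvidia as the default runtime, one way to produce this configuration (a sketch, assuming a recent NVIDIA Container Toolkit that provides the nvidia-ctk CLI) is:
$ sudo nvidia-ctk runtime configure --runtime=docker --set-as-default
$ sudo systemctl restart docker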
$ sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi -L
GPU 0: NVIDIA A100 80GB PCIe (UUID: GPU-d8ce95c1-12f7-3174-6395-e573163a2ace)
Create an NGC account and API key. Refer to the NGC documentation for instructions on creating an account and generating an API key.
Log in to the NVIDIA container registry using the following command:
docker login nvcr.io
Export the NGC_API_KEY environment variable:
export NGC_API_KEY=<ngc-api-key>
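With the key exported, you can also log in non-interactively instead of entering credentials at the prompt. A minimal sketch, assuming the $oauthtoken username convention for NGC API keys:
echo "${NGC_API_KEY}" | docker login nvcr.io --username '$oauthtoken' --password-stdin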
Refer to Accessing And Pulling an NGC Container Image via the Docker CLI for more information.
This example showcases a baseline LlamaIndex-based RAG pipeline that deploys the NIM for LLMs microservice to host a TensorRT-optimized LLM and the NeMo Retriever Embedding microservice. Milvus is deployed as the vector database that stores the embeddings used to retrieve context for responses to queries.
LLM Model | Embedding | Framework | Document Type | Vector Database | Model deployment platform |
---|---|---|---|---|---|
meta/llama3-8b-instruct | nv-embedqa-e5-v5 | llama-index | PDF/Text | Milvus | On Prem |
Complete the Common Prerequisites.
Run the pipeline.
cd rag-app-text-chatbot-llamaindex/
USERID=$(id -u) docker compose --profile local-nim --profile milvus up -d
Check the status of the containers.
docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"
CONTAINER ID NAMES STATUS
32515fcb8ad2 rag-playground Up 26 minutes
d60e0cee49f7 rag-application-text-chatbot-llamaindex Up 27 minutes
02c8062f15da nemo-retriever-embedding-microservice Up 27 minutes (healthy)
7bd4d94dc7a7 nemollm-inference-microservice Up 27 minutes
55135224e8fd milvus-standalone Up 48 minutes (healthy)
5844248a08df milvus-minio Up 48 minutes (healthy)
c42df344bb25 milvus-etcd Up 48 minutes (healthy)
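Optionally, before opening the browser, you can confirm that the locally deployed NIM for LLMs endpoint reports ready. A minimal check, assuming the service is published on host port 8000 as in the compose file:
curl http://localhost:8000/v1/health/ready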
Open your browser and interact with the RAG Playground at http://localhost:3001/converse.
This example showcases a baseline LangChain-based RAG pipeline that deploys the NIM for LLMs microservice to host a TensorRT-optimized LLM and the NeMo Retriever Embedding microservice. Milvus is deployed as the vector database that stores the embeddings used to retrieve context for responses to queries.
LLM Model | Embedding | Framework | Document Type | Vector Database | Model deployment platform |
---|---|---|---|---|---|
meta/llama3-8b-instruct | nv-embedqa-e5-v5 | langchain | PDF/Text | Milvus | On Prem |
Complete the Common Prerequisites.
Run the pipeline.
cd rag-app-text-chatbot-langchain/
USERID=$(id -u) docker compose --profile local-nim --profile milvus up -d
Check the status of the containers.
docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"
CONTAINER ID NAMES STATUS
32515fcb8ad2 rag-playground Up 26 minutes
d60e0cee49f7 rag-application-text-chatbot-langchain Up 27 minutes
02c8062f15da nemo-retriever-embedding-microservice Up 27 minutes (healthy)
7bd4d94dc7a7 nemollm-inference-microservice Up 27 minutes
55135224e8fd milvus-standalone Up 48 minutes (healthy)
5844248a08df milvus-minio Up 48 minutes (healthy)
c42df344bb25 milvus-etcd Up 48 minutes (healthy)
Open your browser and interact with the RAG Playground at http://localhost:3001/converse.
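If the playground does not respond to queries, you can follow the chain server logs while you retry. A sketch, using the chain server container name shown in the docker ps output above:
docker logs -f rag-application-text-chatbot-langchain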
This example showcases a multimodal use case in a RAG pipeline. The example can interpret any kind of image in a PDF, such as graphs and plots, alongside text and tables. The example uses multimodal models from the NVIDIA API Catalog to answer queries.
LLM Model | Embedding | Framework | Document Type | Vector Database | Model deployment platform |
---|---|---|---|---|---|
meta/llama3-8b-instruct for response generation, ai-google-Deplot for graph-to-text conversion, ai-Neva-22B for image-to-text conversion | nv-embedqa-e5-v5 | Custom Python | PDF with images/PPTX | Milvus | Cloud - NVIDIA API Catalog for ai-google-Deplot and ai-Neva-22B; On Prem NeMo Inference Microservices for the meta/llama3-8b-instruct model |
Complete the Common Prerequisites.
Export your NVIDIA API key in your terminal. The key is used to access the Neva 22B and Deplot models from the NVIDIA API Catalog.
export NVIDIA_API_KEY="nvapi-*"
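A quick sanity check before continuing (a sketch; this only confirms the variable is set and has the expected prefix, not that the key is valid):
[ -n "${NVIDIA_API_KEY}" ] || echo "NVIDIA_API_KEY is not set"
echo "${NVIDIA_API_KEY}" | grep -q '^nvapi-' || echo "Warning: key does not start with nvapi-"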
Run the pipeline.
cd rag-app-multimodal-chatbot/
USERID=$(id -u) docker compose --profile local-nim --profile milvus up -d
Check the status of the containers.
docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"
CONTAINER ID NAMES STATUS
32515fcb8ad2 rag-playground Up 26 minutes
d60e0cee49f7 rag-application-multimodal-chatbot Up 27 minutes
02c8062f15da nemo-retriever-embedding-microservice Up 27 minutes (healthy)
7bd4d94dc7a7 nemollm-inference-microservice Up 27 minutes
55135224e8fd milvus-standalone Up 48 minutes (healthy)
5844248a08df milvus-minio Up 48 minutes (healthy)
c42df344bb25 milvus-etcd Up 48 minutes (healthy)
Open your browser and interact with the RAG Playground at http://localhost:3001/converse.
This example showcases a multi-turn use case in a RAG pipeline. The example stores the conversation history and knowledge base in Milvus and retrieves them at runtime to understand contextual queries. The example deploys NIM for LLMs and the NeMo Retriever Embedding microservice.
The example supports ingestion of PDF and .txt files. The documents are ingested into a dedicated document vector store, and the prompt for the example is tuned to act as a document chatbot. To maintain the conversation history, the chain server stores each previous user query and generated answer as a text entry in a separate, dedicated vector store for conversation history. Both vector stores are part of a LangChain LCEL chain as LangChain retrievers. When the chain is invoked with a query, the query is passed through both retrievers: one retrieves context from the document vector store, and the other retrieves the closest matching conversation history from the conversation history vector store. The document chunks retrieved from the document vector store are then passed through a reranker model to determine the most relevant top_k context, which is passed to the LLM prompt for response generation.
LLM Model | Embedding | Ranking(Optional) | Framework | Document Type | Vector Database | Model deployment platform |
---|---|---|---|---|---|---|
meta/llama3-8b-instruct | nv-embedqa-e5-v5 | nv-rerankqa-mistral-4b-v3 | LangChain Expression Language | PDF/Text | Milvus | On Prem |
Complete the Common Prerequisites.
Run the pipeline.
cd rag-app-multiturn-chatbot/
USERID=$(id -u) docker compose --profile local-nim --profile milvus up -d
Deploy the NVIDIA NeMo Retriever Text Reranking NIM container.
USERID=$(id -u) docker compose -f docker-compose-nim-ms.yaml up -d ranking-ms
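To confirm that the reranking service was created, you can list it with the same compose file and service name used above (a sketch):
docker compose -f docker-compose-nim-ms.yaml ps ranking-ms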
Check the status of the containers.
docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"
CONTAINER ID NAMES STATUS
32515fcb8ad2 rag-playground Up 26 minutes
d60e0cee49f7 rag-application-multiturn-chatbot Up 27 minutes
02c8062f15da nemo-retriever-embedding-microservice Up 27 minutes (healthy)
02c8062f15da nemo-retriever-reranking-microservice Up 27 minutes (healthy)
7bd4d94dc7a7 nemollm-inference-microservice Up 27 minutes
55135224e8fd milvus-standalone Up 48 minutes (healthy)
5844248a08df milvus-minio Up 48 minutes (healthy)
c42df344bb25 milvus-etcd Up 48 minutes (healthy)
Open your browser and interact with the RAG Playground at http://localhost:3001/converse.
This example showcases a RAG use case built using the task decomposition paradigm. The chain server breaks a query down into smaller subtasks and then combines the results from the subtasks to formulate the final answer. It uses models from the NVIDIA API Catalog and LangChain as the framework.
LLM Model | Embedding | Framework | Document Type | Vector Database | Model deployment platform |
---|---|---|---|---|---|
meta/llama3-70b-instruct | nv-embedqa-e5-v5 | LangChain Agent | PDF/Text | Milvus | On Prem |
Complete the Common Prerequisites.
Run the pipeline.
cd rag-app-query-decomposition-agent/
USERID=$(id -u) docker compose --profile local-nim --profile milvus up -d
Check the status of the containers.
docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"
CONTAINER ID NAMES STATUS
32515fcb8ad2 rag-playground Up 26 minutes
d60e0cee49f7 rag-app-query-decomposition-agent Up 27 minutes
02c8062f15da nemo-retriever-embedding-microservice Up 27 minutes (healthy)
7bd4d94dc7a7 nemollm-inference-microservice Up 27 minutes
55135224e8fd milvus-standalone Up 48 minutes (healthy)
5844248a08df milvus-minio Up 48 minutes (healthy)
c42df344bb25 milvus-etcd Up 48 minutes (healthy)
Open your browser and interact with the RAG Playground at http://localhost:3001/converse.
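When you are finished experimenting, you can stop the pipeline by running the down command with the same profiles used to start it, for example:
USERID=$(id -u) docker compose --profile local-nim --profile milvus down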
This example demonstrates a use case of RAG with structured CSV data. This approach does not involve embedding models or vector database solutions and instead uses PandasAI for interacting with the CSV data. For ingestion, the structured data is loaded from a CSV file into a Pandas dataframe. The pipeline can ingest multiple CSV files, provided they have identical columns. Ingestion of CSV files with differing columns is not currently supported and results in an exception.
The core functionality utilizes a PandasAI agent to extract information from the dataframe. This agent combines the query with the structure of the dataframe into an LLM prompt. The LLM then generates Python code to extract the required information from the dataframe. Subsequently, this generated code is executed on the dataframe, yielding the output dataframe.
To test the example, sample CSV files are available.
These are part of the structured data example Helm chart and represent a subset of the Microsoft Azure Predictive Maintenance dataset from Kaggle. The CSV data retrieval prompt is specifically tuned for three CSV files from this dataset: PdM_machines.csv, PdM_errors.csv, and PdM_failures.csv.
Specify the CSV file to use by updating the CSV_NAME environment variable in the rag-app-structured-data-chatbot.yaml file within this resource. By default, the file is set to PdM_machines, but you can change it to PdM_errors or PdM_failures.
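To locate the variable before editing, a quick search such as the following can help (assuming the file name given in this resource):
grep -n "CSV_NAME" rag-app-structured-data-chatbot.yaml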
Currently, customization of the CSV data retrieval prompt is not supported.
LLM Model (Data Retrieval) | LLM Model (Response Paraphrasing) | Embedding | Framework | Document Type | Vector Database | Model deployment platform |
---|---|---|---|---|---|---|
meta/llama3-70b-instruct | meta/llama3-70b-instruct | Not Used | PandasAI | CSV | Not Used | On Prem |
Complete the Common Prerequisites.
Run the pipeline.
cd rag-app-structured-data-chatbot/
USERID=$(id -u) docker compose --profile local-nim up -d
Check the status of the containers.
docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"
CONTAINER ID NAMES STATUS
32515fcb8ad2 rag-playground Up 26 minutes
d60e0cee49f7 rag-app-structured-data-chatbot Up 27 minutes
7bd4d94dc7a7 nemollm-inference-microservice Up 27 minutes
Open your browser and interact with the RAG Playground at http://localhost:3001/converse.
Perform the following steps to use Milvus as the vector database for an example.
Edit the Docker Compose file for the example, such as rag-app-text-chatbot-llamaindex.yaml.
Update the environment variables within the chain server service:
services:
  chain-server:
    container_name: chain-server
    environment:
      APP_VECTORSTORE_NAME: "milvus"
      APP_VECTORSTORE_URL: "http://milvus:19530"
Optional: Stop the services if they are running:
$ USERID=$(id -u) docker compose --profile local-nim --profile milvus down
Start the services:
$ USERID=$(id -u) docker compose --profile local-nim --profile milvus up -d
Optional: View the chain server logs to confirm the vector database.
View the logs:
$ docker logs -f <rag-example>
Try out a query after uploading any document and confirm the log output includes the vector database:
INFO:example:Ingesting <file-name>.pdf in vectorDB
INFO:RetrievalAugmentedGeneration.common.utils:Using milvus as vector store
INFO:RetrievalAugmentedGeneration.common.utils:Using milvus collection: <example-name>
Edit the Docker Compose file for the example to use pgvector as the vector database.
Update the environment variables within the chain server service:
services:
  chain-server:
    container_name: chain-server
    environment:
      APP_VECTORSTORE_NAME: "pgvector"
      APP_VECTORSTORE_URL: "pgvector:5432"
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-password}
      POSTGRES_USER: ${POSTGRES_USER:-postgres}
      POSTGRES_DB: ${POSTGRES_DB:-api}
The preceding example shows the default values for the database user, password, and database.
To override the defaults, edit the values in the Docker Compose file, or set the values in the compose.env file.
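Because the compose file uses ${VAR:-default} substitution, another option (a sketch, assuming you start the stack from the same shell) is to export the values before bringing the services up:
export POSTGRES_PASSWORD=<postgres-password>
export POSTGRES_USER=<postgres-user>
export POSTGRES_DB=<postgres-db>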
Optional: Stop the services if they are running:
$ USERID=$(id -u) docker compose --profile local-nim --profile pgvector down
Start the services:
$ USERID=$(id -u) docker compose --profile local-nim --profile pgvector up -d
Optional: View the chain server logs to confirm the vector database.
View the logs:
$ docker logs -f <rag-example>
Try out a query after uploading any document and confirm the log output includes the vector database:
INFO:example:Ingesting <file-name>.pdf in vectorDB
INFO:RetrievalAugmentedGeneration.common.utils:Using pgvector as vector store
INFO:RetrievalAugmentedGeneration.common.utils:Using PGVector collection: <example-name>
The Llama 3 70B Instruct model with NVIDIA NIM for LLMs is the only supported alternative model.
Edit the docker-compose-nim-ms.yaml file and update the image for the nemollm-inference-microservice service. Update the image to nvcr.io/nim/meta/llama3-70b-instruct:1.0.0, as shown in the following sample.
services:
  nemollm-inference:
    container_name: nemollm-inference-microservice
    image: nvcr.io/nim/meta/llama3-70b-instruct:1.0.0
    volumes:
      - ${MODEL_DIRECTORY}:/opt/nim/.cache
    ports:
      - "8000:8000"
    expose:
      - "8000"
    environment:
      NGC_API_KEY: ${NGC_API_KEY}
    shm_size: 20gb
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: ${INFERENCE_GPU_COUNT:-all}
              capabilities: [gpu]
    profiles: ["local-nim", "nemo-retriever"]
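Because the compose file reads INFERENCE_GPU_COUNT, you can optionally pin how many GPUs the container reserves before starting it; for example, to reserve four GPUs (adjust to your hardware and the model's requirements):
export INFERENCE_GPU_COUNT=4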
Start the container.
cd <example_dir>
docker compose --profile local-nim up -d
Clone the repository:
git lfs clone https://github.com/NVIDIA/GenerativeAIExamples.git
After you modify the chain server code, build the image:
USERID=$(id -u) docker compose build --no-cache chain-server
Tag and push the container to a private registry, such as NVIDIA NGC:
docker tag chain-server:latest nvcr.io/<org-name>/<team-name>/<example-name>:<version>
docker push nvcr.io/<org-name>/<team-name>/<example-name>:<version>
Modify the compose file for the enterprise example that you want to run, and set your image for the chain server:
services:
  chain-server:
    container_name: rag-application-text-chatbot
    image: nvcr.io/<org-name>/<team-name>/<example-name>:<version>
    ...
Start the containers by running docker compose up -d.
Learn more about how to use NVIDIA NIM microservices for RAG through the NVIDIA Deep Learning Institute course.
The RAG applications are shared as reference architectures and are provided “as is.” Their security in production environments is the responsibility of the end users deploying them. When deploying in a production environment, have security experts review any potential risks and threats (including direct and indirect prompt injection), define the trust boundaries, secure the communication channels, integrate AuthN and AuthZ with appropriate access controls, keep the deployment (including the containers) up to date, and ensure the containers are secure and free of vulnerabilities.
By downloading or using this artifact included in the AI Chatbot workflows you agree to the terms of the NVIDIA Software License Agreement and Product-specific Terms for AI products.