NVIDIA

AI Chatbot - Docker workflow

AI Chatbots with RAG - Docker Workflow

Download this resource to run enterprise RAG applications based on NVIDIA services with Docker Compose.

Example RAG Applications

Canonical RAG Llamaindex
Canonical RAG Langchain
Multimodal RAG
Multi-Turn RAG
Query Decomposition RAG
Structured Data based RAG

Common Customizations

Configuring Milvus as the Vector Database
Configuring pgvector as the Vector Database
Inference Model: Llama 3 70B Instruct
Custom Chain Server

Common Prerequisites

Install Docker Engine and Docker Compose.

Verify NVIDIA GPU driver version 535 or later is installed.

$ nvidia-smi --query-gpu=driver_version --format=csv,noheader
535.129.03

$ nvidia-smi -q -d compute

==============NVSMI LOG==============

Timestamp                                 : Sun Nov 26 21:17:25 2023
Driver Version                            : 535.129.03
CUDA Version                              : 12.2

Attached GPUs                             : 1
GPU 00000000:CA:00.0
    Compute Mode                          : Default

Refer to the NVIDIA Linux driver installation instructions for more information.
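
The driver version check above can also be scripted. The following is a minimal sketch, assuming a POSIX shell; `driver_at_least` is a hypothetical helper name and is not part of the NVIDIA tooling.

```shell
# Hypothetical helper: succeed if the major part of a driver version
# string (for example 535.129.03) is at least the required major version.
driver_at_least() {
    version="$1"
    required_major="$2"
    major="${version%%.*}"      # keep the text before the first dot
    [ "$major" -ge "$required_major" ]
}

# Example usage on a host with the driver installed:
#   version=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   driver_at_least "$version" 535 && echo "driver OK" || echo "driver too old"
```

Only the major version is compared, which is enough for the "535 or later" requirement stated above.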

Install the NVIDIA Container Toolkit.

Verify the toolkit is installed and configured as the default container runtime.

$ cat /etc/docker/daemon.json
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

$ sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi -L
GPU 0: NVIDIA A100 80GB PCIe (UUID: GPU-d8ce95c1-12f7-3174-6395-e573163a2ace)
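
The check of `daemon.json` can be automated in a setup script. This is a sketch only; `nvidia_is_default_runtime` is a hypothetical helper, and it assumes the configuration file is formatted like the sample above.

```shell
# Hypothetical helper: succeed if the given Docker daemon configuration
# file sets "default-runtime" to "nvidia".
nvidia_is_default_runtime() {
    config_file="${1:-/etc/docker/daemon.json}"
    grep -Eq '"default-runtime"[[:space:]]*:[[:space:]]*"nvidia"' "$config_file"
}

# Example usage:
#   nvidia_is_default_runtime || echo "nvidia is not the default runtime" >&2
```

A pattern match is used instead of a JSON parser so that the sketch has no extra dependencies. It does not validate the rest of the file.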

Create an NGC account and API Key. Refer to the instructions to create an account and generate an NGC API key.

Log in to the NVIDIA container registry using the following command:
```
docker login nvcr.io
```
Export the NGC_API_KEY environment variable:
```
export NGC_API_KEY=<ngc-api-key>
```
Refer to Accessing And Pulling an NGC Container Image via the Docker CLI for more information.
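
For unattended setups, the login can be done without an interactive prompt. The sketch below assumes that NGC_API_KEY is already exported; `ngc_login` is a hypothetical wrapper name. The nvcr.io registry expects the literal user name `$oauthtoken` with the API key as the password.

```shell
# Hypothetical wrapper: log in to nvcr.io without an interactive prompt.
# The user name is the literal string $oauthtoken (single quotes prevent
# shell expansion); the API key is passed on stdin as the password.
ngc_login() {
    if [ -z "${NGC_API_KEY:-}" ]; then
        echo "NGC_API_KEY is not set" >&2
        return 1
    fi
    printf '%s' "$NGC_API_KEY" |
        docker login nvcr.io --username '$oauthtoken' --password-stdin
}

# Example usage:
#   export NGC_API_KEY=<ngc-api-key>
#   ngc_login
```

Passing the key on stdin keeps it out of the process list and the shell history.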

01: Canonical RAG Llamaindex

This example showcases a baseline LlamaIndex-based RAG pipeline that deploys the NIM for LLMs microservice to host a TensorRT-optimized LLM and the NeMo Retriever Embedding microservice. Milvus is deployed as the vector database and stores the embeddings that are retrieved to generate responses to queries.

LLM Model	Embedding	Framework	Document Type	Vector Database	Model deployment platform
meta/llama3-8b-instruct	nv-embedqa-e5-v5	llama-index	PDF/Text	Milvus	On Prem

Deployment

Complete the Common Prerequisites.

Run the pipeline.

cd rag-app-text-chatbot-llamaindex/
USERID=$(id -u) docker compose --profile local-nim --profile milvus up -d

Check status of the containers.

docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"

CONTAINER ID   NAMES                                   STATUS
32515fcb8ad2   rag-playground                          Up 26 minutes
d60e0cee49f7   rag-application-text-chatbot-llamaindex Up 27 minutes
02c8062f15da   nemo-retriever-embedding-microservice   Up 27 minutes (healthy)
7bd4d94dc7a7   nemollm-inference-microservice          Up 27 minutes
55135224e8fd   milvus-standalone                       Up 48 minutes (healthy)
5844248a08df   milvus-minio                            Up 48 minutes (healthy)
c42df344bb25   milvus-etcd                             Up 48 minutes (healthy)

Open your browser and interact with the RAG Playground at http://localhost:3001/converse.
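
Model containers can take several minutes to become ready, so it can help to wait for the health checks instead of rerunning `docker ps` by hand. This is a minimal sketch; `wait_healthy` is a hypothetical helper, and it only applies to containers that report a health status, such as `milvus-standalone` and `nemo-retriever-embedding-microservice` in the output above.

```shell
# Hypothetical helper: poll a container's health status until it reports
# "healthy" or the timeout (in seconds) expires.
wait_healthy() {
    name="$1"
    timeout="${2:-300}"
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        status=$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null)
        if [ "$status" = "healthy" ]; then
            return 0
        fi
        sleep 5
        waited=$((waited + 5))
    done
    echo "timed out waiting for $name" >&2
    return 1
}

# Example usage:
#   wait_healthy milvus-standalone 600 &&
#       wait_healthy nemo-retriever-embedding-microservice 600
```

Containers without a configured health check never report "healthy" through this query, so the helper times out for them.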

02: Canonical RAG Langchain

This example showcases a baseline LangChain-based RAG pipeline that deploys the NIM for LLMs microservice to host a TensorRT-optimized LLM and the NeMo Retriever Embedding microservice. Milvus is deployed as the vector database and stores the embeddings that are retrieved to generate responses to queries.

LLM Model	Embedding	Framework	Document Type	Vector Database	Model deployment platform
meta/llama3-8b-instruct	nv-embedqa-e5-v5	langchain	PDF/Text	Milvus	On Prem

Deployment

Complete the Common Prerequisites.

Run the pipeline.

cd rag-app-text-chatbot-langchain/
USERID=$(id -u) docker compose --profile local-nim --profile milvus up -d

Check status of the containers.

docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"

CONTAINER ID   NAMES                                   STATUS
32515fcb8ad2   rag-playground                          Up 26 minutes
d60e0cee49f7   rag-application-text-chatbot-langchain  Up 27 minutes
02c8062f15da   nemo-retriever-embedding-microservice   Up 27 minutes (healthy)
7bd4d94dc7a7   nemollm-inference-microservice          Up 27 minutes
55135224e8fd   milvus-standalone                       Up 48 minutes (healthy)
5844248a08df   milvus-minio                            Up 48 minutes (healthy)
c42df344bb25   milvus-etcd                             Up 48 minutes (healthy)

Open your browser and interact with the RAG Playground at http://localhost:3001/converse.

03: Multimodal RAG

This example showcases a multimodal use case in a RAG pipeline. The example understands any kind of image in a PDF, such as graphs and plots, alongside text and tables. The example uses multimodal models from the NVIDIA API Catalog to answer queries.

LLM Model	Embedding	Framework	Document Type	Vector Database	Model deployment platform
meta/llama3-8b-instruct for response generation, ai-google-Deplot for graph to text conversion, ai-Neva-22B for image to text conversion	nv-embedqa-e5-v5	Custom Python	PDF with images/PPTX	Milvus	Cloud - NVIDIA API Catalog for ai-google-Deplot and ai-Neva-22B. On Prem NeMo Inference Microservices for meta/llama3-8b-instruct model

Deployment

Complete the Common Prerequisites.
Export your NVIDIA API key in your terminal. The key is used to access Neva 22B and Deplot models from NVIDIA API Catalog.
```
export NVIDIA_API_KEY="nvapi-*"
```
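
A quick check before starting the pipeline can catch a missing or mistyped key. This is a sketch under the assumption that valid keys start with the `nvapi-` prefix shown in the export command; `check_nvidia_api_key` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if NVIDIA_API_KEY is set and starts with
# the "nvapi-" prefix followed by at least one more character.
check_nvidia_api_key() {
    case "${NVIDIA_API_KEY:-}" in
        nvapi-?*) return 0 ;;
        *) echo "NVIDIA_API_KEY is missing or does not start with nvapi-" >&2
           return 1 ;;
    esac
}

# Example usage:
#   check_nvidia_api_key || exit 1
```

The check only looks at the shape of the value. It does not confirm that the key is accepted by the NVIDIA API Catalog.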

Run the pipeline.

cd rag-app-multimodal-chatbot/
USERID=$(id -u) docker compose --profile local-nim --profile milvus up -d

Check status of the containers.

docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"

CONTAINER ID   NAMES                                   STATUS
32515fcb8ad2   rag-playground                          Up 26 minutes
d60e0cee49f7   rag-application-multimodal-chatbot      Up 27 minutes
02c8062f15da   nemo-retriever-embedding-microservice   Up 27 minutes (healthy)
7bd4d94dc7a7   nemollm-inference-microservice          Up 27 minutes
55135224e8fd   milvus-standalone                       Up 48 minutes (healthy)
5844248a08df   milvus-minio                            Up 48 minutes (healthy)
c42df344bb25   milvus-etcd                             Up 48 minutes (healthy)

Open your browser and interact with the RAG Playground at http://localhost:3001/converse.

04: Multi-Turn RAG

This example showcases a multi-turn use case in a RAG pipeline. The example stores the conversation history and knowledge base in Milvus and retrieves them at runtime to understand contextual queries. The example deploys NIM for LLMs and NeMo Retriever Embedding Microservice.

The example supports ingestion of PDF and .txt files. The documents are ingested into a dedicated document vector store. The prompt for the example is tuned to act as a document chat bot. To maintain the conversation history, the chain server stores the previous user queries and the generated answers as text entries in a separate vector store dedicated to conversation history. Both vector stores are part of a LangChain LCEL chain as LangChain Retrievers. When the chain is invoked with a query, the query is passed through both retrievers: one retrieves context from the document vector store, and the other retrieves the closest matching conversation history from the conversation history vector store. The document chunks retrieved from the document vector store are then passed through a reranker model to determine the most relevant top_k context. The context is then passed on to the LLM prompt for response generation.

LLM Model	Embedding	Ranking(Optional)	Framework	Document Type	Vector Database	Model deployment platform
meta/llama3-8b-instruct	nv-embedqa-e5-v5	nv-rerankqa-mistral-4b-v3	LangChain Expression Language	PDF/Text	Milvus	On Prem

Deployment

Complete the Common Prerequisites.

Run the pipeline.

cd rag-app-multiturn-chatbot/
USERID=$(id -u) docker compose --profile local-nim --profile milvus up -d

Deploy the NVIDIA NeMo Retriever Text Reranking NIM container.

USERID=$(id -u) docker compose -f docker-compose-nim-ms.yaml up -d ranking-ms

Check status of the containers.

docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"

CONTAINER ID   NAMES                                   STATUS
32515fcb8ad2   rag-playground                          Up 26 minutes
d60e0cee49f7   rag-application-multiturn-chatbot       Up 27 minutes
02c8062f15da   nemo-retriever-embedding-microservice   Up 27 minutes (healthy)
02c8062f15da   nemo-retriever-reranking-microservice   Up 27 minutes (healthy)
7bd4d94dc7a7   nemollm-inference-microservice          Up 27 minutes
55135224e8fd   milvus-standalone                       Up 48 minutes (healthy)
5844248a08df   milvus-minio                            Up 48 minutes (healthy)
c42df344bb25   milvus-etcd                             Up 48 minutes (healthy)

Open your browser and interact with the RAG Playground at http://localhost:3001/converse.

05: Query Decomposition RAG

This example showcases a RAG use case built using the task decomposition paradigm. The chain server breaks down a query into smaller subtasks and then combines results from different subtasks to formulate the final answer. It uses models from NVIDIA API Catalog and LangChain as the framework.

LLM Model	Embedding	Framework	Document Type	Vector Database	Model deployment platform
meta/llama3-70b-instruct	nv-embedqa-e5-v5	LangChain Agent	PDF/Text	Milvus	On Prem

Deployment

Complete the Common Prerequisites.

Run the pipeline.

cd rag-app-query-decomposition-agent/
USERID=$(id -u) docker compose --profile local-nim --profile milvus up -d

Check status of the containers.

docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"

CONTAINER ID   NAMES                                   STATUS
32515fcb8ad2   rag-playground                          Up 26 minutes
d60e0cee49f7   rag-app-query-decomposition-agent       Up 27 minutes
02c8062f15da   nemo-retriever-embedding-microservice   Up 27 minutes (healthy)
7bd4d94dc7a7   nemollm-inference-microservice          Up 27 minutes
55135224e8fd   milvus-standalone                       Up 48 minutes (healthy)
5844248a08df   milvus-minio                            Up 48 minutes (healthy)
c42df344bb25   milvus-etcd                             Up 48 minutes (healthy)

Open your browser and interact with the RAG Playground at http://localhost:3001/converse.

06: Structured Data RAG

This example demonstrates a use case of RAG with structured CSV data. This approach does not involve embedding models or vector database solutions and instead uses PandasAI for interacting with the CSV data. For ingestion, the structured data is loaded from a CSV file into a Pandas dataframe. The pipeline can ingest multiple CSV files, provided they have identical columns. Ingestion of CSV files with differing columns is not currently supported and results in an exception.

The core functionality utilizes a PandasAI agent to extract information from the dataframe. This agent combines the query with the structure of the dataframe into an LLM prompt. The LLM then generates Python code to extract the required information from the dataframe. Subsequently, this generated code is executed on the dataframe, yielding the output dataframe.

To test the example, sample CSV files are available. These are part of the structured data example Helm chart and represent a subset of the Microsoft Azure Predictive Maintenance dataset from Kaggle. The CSV data retrieval prompt is specifically tuned for three CSV files from this dataset: PdM_machines.csv, PdM_errors.csv, and PdM_failures.csv. Specify the CSV file to use in the rag-app-structured-data-chatbot.yaml file within this resource by updating the environment variable CSV_NAME. By default, the file is set to PdM_machines, but you can change it to PdM_errors or PdM_failures. Currently, customization of the CSV data retrieval prompt is not supported.
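
Because the prompt is tuned for only three files, a small guard can reject other names before the compose file is edited. This is a sketch; `valid_csv_name` is a hypothetical helper and is not part of the example.

```shell
# Hypothetical helper: succeed only for the three CSV names that the
# data retrieval prompt is tuned for.
valid_csv_name() {
    case "$1" in
        PdM_machines|PdM_errors|PdM_failures) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage:
#   valid_csv_name PdM_errors || echo "unsupported CSV name" >&2
#   grep -n 'CSV_NAME' rag-app-structured-data-chatbot.yaml   # find the line to edit
```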

LLM Model (Data Retrieval)	LLM Model (Response Paraphrasing)	Embedding	Framework	Document Type	Vector Database	Model deployment platform
meta/llama3-70b-instruct	meta/llama3-70b-instruct	Not Used	PandasAI	CSV	Not Used	On Prem

Deployment

Complete the Common Prerequisites.

Run the pipeline.

cd rag-app-structured-data-chatbot/
USERID=$(id -u) docker compose --profile local-nim up -d

Check status of the containers.

docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"

CONTAINER ID   NAMES                                   STATUS
32515fcb8ad2   rag-playground                          Up 26 minutes
d60e0cee49f7   rag-app-structured-data-chatbot         Up 27 minutes
7bd4d94dc7a7   nemollm-inference-microservice          Up 27 minutes

Open your browser and interact with the RAG Playground at http://localhost:3001/converse.

Configuring Milvus as the Vector Database

Perform the following steps to use Milvus as the vector database for an example.

Edit the Docker Compose file for the example, such as rag-app-text-chatbot-llamaindex.yaml.

Update the environment variables within the chain server service:

services:
  chain-server:
    container_name: chain-server
    environment:
      APP_VECTORSTORE_NAME: "milvus"
      APP_VECTORSTORE_URL: "http://milvus:19530"

Optional: Stop the services if they are already running:

$ USERID=$(id -u) docker compose --profile local-nim --profile milvus down

Start the services:

$ USERID=$(id -u) docker compose --profile local-nim --profile milvus up -d

Optional: View the chain server logs to confirm the vector database.

View the logs:
```
$ docker logs -f <rag-example>
```

Try out a query after uploading any document and confirm the log output includes the vector database:

INFO:example:Ingesting <file-name>.pdf in vectorDB
INFO:RetrievalAugmentedGeneration.common.utils:Using milvus as vector store
INFO:RetrievalAugmentedGeneration.common.utils:Using milvus collection: <example-name>
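
The log check can be scripted instead of read by eye. The sketch below assumes the log line has the form shown above; `uses_vectorstore` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the container logs contain the
# "Using <store> as vector store" line for the given store name.
uses_vectorstore() {
    container="$1"
    store="$2"
    # docker logs replays the container's stdout and stderr, so merge both.
    docker logs "$container" 2>&1 | grep -q "Using $store as vector store"
}

# Example usage (replace <rag-example> with the chain server container name):
#   uses_vectorstore <rag-example> milvus && echo "milvus is in use"
```

The line appears only after a document has been uploaded and a query has been run, so the helper fails on a freshly started container.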

Configuring pgvector as the Vector Database

Edit the Docker Compose file for the example to use pgvector as the vector database.

Update the environment variables within the chain server service:

services:
  chain-server:
    container_name: chain-server
    environment:
      APP_VECTORSTORE_NAME: "pgvector"
      APP_VECTORSTORE_URL: "pgvector:5432"
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-password}
      POSTGRES_USER: ${POSTGRES_USER:-postgres}
      POSTGRES_DB: ${POSTGRES_DB:-api}

The preceding example shows the default values for the database user, password, and database. To override the defaults, edit the values in the Docker Compose file, or set the values in the compose.env file.
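
The `${VAR:-default}` expressions follow the same rule in Compose as in the shell: the default applies when the variable is unset or empty. The sketch below shows the effective user and database for variables set in the current shell; `show_postgres_settings` is a hypothetical helper, it does not read compose.env, and it omits the password so that it is not echoed to the terminal.

```shell
# Hypothetical helper: print the values that the ${VAR:-default}
# expressions above resolve to for the current shell environment.
# POSTGRES_PASSWORD follows the same rule but is not printed here.
show_postgres_settings() {
    echo "POSTGRES_USER=${POSTGRES_USER:-postgres}"
    echo "POSTGRES_DB=${POSTGRES_DB:-api}"
}

# Example usage:
#   export POSTGRES_DB=rag
#   show_postgres_settings
```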

Optional: Stop the services if they are already running:

$ USERID=$(id -u) docker compose --profile local-nim --profile pgvector down

Start the services:

$ USERID=$(id -u) docker compose --profile local-nim --profile pgvector up -d

Optional: View the chain server logs to confirm the vector database.

View the logs:
```
$ docker logs -f <rag-example>
```

Try out a query after uploading any document and confirm the log output includes the vector database:

INFO:example:Ingesting <file-name>.pdf in vectorDB
INFO:RetrievalAugmentedGeneration.common.utils:Using pgvector as vector store
INFO:RetrievalAugmentedGeneration.common.utils:Using PGVector collection: <example-name>

Alternative LLM Model: Deploy Meta Llama 3 70B Instruct

The Llama 3 70B Instruct model with NVIDIA NIM for LLMs is the only supported alternative model.

Edit the docker-compose-nim-ms.yaml file and update the image for the nemollm-inference service, which runs the nemollm-inference-microservice container.

Update the image to nvcr.io/nim/meta/llama3-70b-instruct:1.0.0, as shown in the following sample.

services:
  nemollm-inference:
    container_name: nemollm-inference-microservice
    image: nvcr.io/nim/meta/llama3-70b-instruct:1.0.0
    volumes:
    - ${MODEL_DIRECTORY}:/opt/nim/.cache
    ports:
    - "8000:8000"
    expose:
    - "8000"
    environment:
      NGC_API_KEY: ${NGC_API_KEY}
    shm_size: 20gb
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: ${INFERENCE_GPU_COUNT:-all}
              capabilities: [gpu]
    profiles: ["local-nim", "nemo-retriever"]

Start the container.

cd <example_dir>
docker compose --profile local-nim up -d
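
The service definition above reads `${MODEL_DIRECTORY}` and `${NGC_API_KEY}` from the environment, so it can help to confirm both are set before starting the container. This is a minimal sketch; `require_env` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if every named variable is set and
# non-empty; report each missing name on stderr.
require_env() {
    missing=0
    for name in "$@"; do
        eval "value=\${$name:-}"    # indirect lookup that works in POSIX sh
        if [ -z "$value" ]; then
            echo "missing environment variable: $name" >&2
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]
}

# Example usage:
#   require_env MODEL_DIRECTORY NGC_API_KEY &&
#       docker compose --profile local-nim up -d
```

The helper sees any variable set in the current shell. The variables must also be exported for Docker Compose to pick them up.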

Custom Chain Server

Clone the repository:

git lfs clone https://github.com/NVIDIA/GenerativeAIExamples.git

After you modify the chain server code, build the image:

USERID=$(id -u) docker compose build --no-cache chain-server

Tag and push the container to a private registry, such as NVIDIA NGC:

docker tag chain-server:latest nvcr.io/<org-name>/<team-name>/<example-name>:<version>

docker push nvcr.io/<org-name>/<team-name>/<example-name>:<version>
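
Building the image reference once and reusing it avoids a mismatch between the tag and push commands. This is a sketch; `ngc_image_ref` is a hypothetical helper, and the organization, team, example, and version values in the usage comment are placeholders.

```shell
# Hypothetical helper: build the nvcr.io image reference from its parts.
ngc_image_ref() {
    org="$1"
    team="$2"
    example="$3"
    version="$4"
    printf 'nvcr.io/%s/%s/%s:%s\n' "$org" "$team" "$example" "$version"
}

# Example usage (all four arguments are placeholder values):
#   ref=$(ngc_image_ref my-org my-team rag-text-chatbot 0.1.0)
#   docker tag chain-server:latest "$ref"
#   docker push "$ref"
```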

Modify the compose file for the enterprise example to run and set your image for the chain server:

services:
  chain-server:
    container_name: rag-application-text-chatbot
    image: nvcr.io/<org-name>/<team-name>/<example-name>:<version>
    ...

Start the containers by running docker compose up -d.

Additional Resources

Learn more about how to use NVIDIA NIM microservices for RAG through our Deep Learning Institute. Access the course here.

Security considerations

The RAG applications are shared as reference architectures and are provided “as is”. Securing them in production environments is the responsibility of the end users who deploy them. When deploying in a production environment, have security experts review any potential risks and threats (including direct and indirect prompt injection), define the trust boundaries, secure the communication channels, integrate AuthN and AuthZ with appropriate access controls, keep the deployment, including the containers, up to date, and ensure the containers are secure and free of vulnerabilities.

Licenses

By downloading or using this artifact included in the AI Chatbot workflows you agree to the terms of the NVIDIA Software License Agreement and Product-specific Terms for AI products.

Publisher

NVIDIA

Latest Version: 24.08

Updated: September 4, 2025 UTC

Compressed Size: 174.57 KB