AI Chatbots with RAG - Docker workflow

Download this resource to run enterprise RAG applications based on NVIDIA services with Docker Compose.

Example RAG Applications

  1. Canonical RAG Llamaindex
  2. Canonical RAG Langchain
  3. Multimodal RAG
  4. Multi-Turn RAG
  5. Query Decomposition RAG
  6. Structured Data based RAG

Common Customizations

  1. Configuring Milvus as the Vector Database
  2. Configuring pgvector as the Vector Database
  3. Inference Model: Llama 3 70B Instruct
  4. Custom Chain Server

Common Prerequisites

  1. Install Docker Engine and Docker Compose.

  2. Verify NVIDIA GPU driver version 535 or later is installed.

    $ nvidia-smi --query-gpu=driver_version --format=csv,noheader
    535.129.03

    $ nvidia-smi -q -d compute

    ==============NVSMI LOG==============

    Timestamp                                 : Sun Nov 26 21:17:25 2023
    Driver Version                            : 535.129.03
    CUDA Version                              : 12.2

    Attached GPUs                             : 1
    GPU 00000000:CA:00.0
        Compute Mode                          : Default

    Refer to the NVIDIA Linux driver installation instructions for more information.

  3. Install the NVIDIA Container Toolkit.

    Verify the toolkit is installed and configured as the default container runtime.

    $ cat /etc/docker/daemon.json
    {
        "default-runtime": "nvidia",
        "runtimes": {
            "nvidia": {
                "path": "/usr/bin/nvidia-container-runtime",
                "runtimeArgs": []
            }
        }
    }

    $ sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi -L
    GPU 0: NVIDIA A100 80GB PCIe (UUID: GPU-d8ce95c1-12f7-3174-6395-e573163a2ace)
  4. Create an NGC account and API Key.
    Refer to the instructions to create an account and generate an NGC API key.

    Log in to the NVIDIA container registry using the following command:

        docker login nvcr.io

    Export the NGC API key as an environment variable:

        export NGC_API_KEY=<ngc-api-key>

    Refer to Accessing And Pulling an NGC Container Image via the Docker CLI for more information.
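
The driver and API key checks above can be scripted before starting any example. The following is a minimal sketch, not part of this resource: the function names `driver_at_least`, `ngc_key_present`, and `preflight` are hypothetical, and the script only compares the major driver version against 535.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the major part of a driver version
# string (for example "535.129.03") is at least the required major version.
driver_at_least() {
    required_major=$1
    version=$2
    major=${version%%.*}            # keep only the text before the first dot
    case $major in
        ''|*[!0-9]*) return 2 ;;    # empty or non-numeric: cannot decide
    esac
    [ "$major" -ge "$required_major" ]
}

# Hypothetical helper: succeeds when NGC_API_KEY is set and non-empty.
ngc_key_present() {
    [ -n "${NGC_API_KEY:-}" ]
}

# Hypothetical wrapper: queries the first GPU's driver version with the same
# nvidia-smi command shown above, then runs both checks.
preflight() {
    version=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    if ! driver_at_least 535 "$version"; then
        echo "driver version '$version' is missing or older than 535" >&2
        return 1
    fi
    if ! ngc_key_present; then
        echo "NGC_API_KEY is not set" >&2
        return 1
    fi
    echo "prerequisites look OK"
}
```

Nothing runs until `preflight` is called, so the file can be sourced on a machine without a GPU. The sketch does not verify the Docker or NVIDIA Container Toolkit installation.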

01: Canonical RAG Llamaindex

This example showcases a baseline LlamaIndex-based RAG pipeline that deploys the NIM for LLMs microservice to host a TensorRT-optimized LLM and the NeMo Retriever Embedding microservice.
Milvus is deployed as the vector database. It stores the embeddings that the pipeline retrieves to generate responses to queries.

| LLM Model | Embedding | Framework | Document Type | Vector Database | Model deployment platform |
| --- | --- | --- | --- | --- | --- |
| meta/llama3-8b-instruct | nv-embedqa-e5-v5 | llama-index | PDF/Text | Milvus | On Prem |

Deployment

  1. Complete the Common Prerequisites.

  2. Run the pipeline.

    cd rag-app-text-chatbot-llamaindex/
    USERID=$(id -u) docker compose --profile local-nim --profile milvus up -d
  3. Check status of the containers.

    docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"
    CONTAINER ID   NAMES                                    STATUS
    32515fcb8ad2   rag-playground                           Up 26 minutes
    d60e0cee49f7   rag-application-text-chatbot-langchain   Up 27 minutes
    02c8062f15da   nemo-retriever-embedding-microservice    Up 27 minutes (healthy)
    7bd4d94dc7a7   nemollm-inference-microservice           Up 27 minutes
    55135224e8fd   milvus-standalone                        Up 48 minutes (healthy)
    5844248a08df   milvus-minio                             Up 48 minutes (healthy)
    c42df344bb25   milvus-etcd                              Up 48 minutes (healthy)
  4. Open your browser and interact with the RAG Playground at http://localhost:3001/converse.
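
Reading the status column by eye gets tedious when a deployment is still starting. As a rough sketch, the filter below reads `name<TAB>status` lines and prints the containers that are not up or not yet healthy. The function name `containers_not_ready` is hypothetical, and the status strings it matches (`Up ...`, `(unhealthy)`, `(health: starting)`) are assumed from typical `docker ps` output.

```shell
#!/bin/sh
# Hypothetical helper: reads "name<TAB>status" lines on stdin and prints the
# name of every container that is not "Up", or that is up but reports
# "(unhealthy)" or "(health: starting)".
containers_not_ready() {
    awk -F '\t' '
        $2 !~ /^Up / || $2 ~ /\((unhealthy|health: starting)\)/ { print $1 }
    '
}

# Example invocation (requires Docker, so it is left as a comment):
#   docker ps -a --format '{{.Names}}\t{{.Status}}' | containers_not_ready
```

Empty output means every listed container is up and none reports a failing or pending health check. Containers without a health check, such as the playground in the listing above, count as ready as soon as they are up.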

02: Canonical RAG Langchain

This example showcases a baseline LangChain-based RAG pipeline that deploys the NIM for LLMs microservice to host a TensorRT-optimized LLM and the NeMo Retriever Embedding microservice.
Milvus is deployed as the vector database. It stores the embeddings that the pipeline retrieves to generate responses to queries.

| LLM Model | Embedding | Framework | Document Type | Vector Database | Model deployment platform |
| --- | --- | --- | --- | --- | --- |
| meta/llama3-8b-instruct | nv-embedqa-e5-v5 | LangChain | PDF/Text | Milvus | On Prem |

Deployment

  1. Complete the Common Prerequisites.

  2. Run the pipeline.

    cd rag-app-text-chatbot-langchain/
    USERID=$(id -u) docker compose --profile local-nim --profile milvus up -d
  3. Check status of the containers.

    docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"
    CONTAINER ID   NAMES                                     STATUS
    32515fcb8ad2   rag-playground                            Up 26 minutes
    d60e0cee49f7   rag-application-text-chatbot-llamaindex   Up 27 minutes
    02c8062f15da   nemo-retriever-embedding-microservice     Up 27 minutes (healthy)
    7bd4d94dc7a7   nemollm-inference-microservice            Up 27 minutes
    55135224e8fd   milvus-standalone                         Up 48 minutes (healthy)
    5844248a08df   milvus-minio                              Up 48 minutes (healthy)
    c42df344bb25   milvus-etcd                               Up 48 minutes (healthy)
  4. Open your browser and interact with the RAG Playground at http://localhost:3001/converse.

03: Multimodal RAG

This example showcases a multimodal use case in a RAG pipeline.
The example interprets images embedded in PDFs, such as graphs and plots, alongside text and tables.
The example uses multimodal models from NVIDIA API Catalog to answer queries.

| LLM Model | Embedding | Framework | Document Type | Vector Database | Model deployment platform |
| --- | --- | --- | --- | --- | --- |
| meta/llama3-8b-instruct for response generation, ai-google-Deplot for graph-to-text conversion, ai-Neva-22B for image-to-text conversion | nv-embedqa-e5-v5 | Custom Python | PDF with images/PPTX | Milvus | Cloud: NVIDIA API Catalog for ai-google-Deplot and ai-Neva-22B. On Prem: NeMo Inference Microservices for the meta/llama3-8b-instruct model |

Deployment

  1. Complete the Common Prerequisites.

  2. Export your NVIDIA API key in your terminal.
    The key is used to access Neva 22B and Deplot models from NVIDIA API Catalog.

       export NVIDIA_API_KEY="nvapi-*"
  3. Run the pipeline.

    cd rag-app-multimodal-chatbot/
    USERID=$(id -u) docker compose --profile local-nim --profile milvus up -d
  4. Check status of the containers.

    docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"
    CONTAINER ID   NAMES                                   STATUS
    32515fcb8ad2   rag-playground                          Up 26 minutes
    d60e0cee49f7   rag-application-multimodal-chatbot      Up 27 minutes
    02c8062f15da   nemo-retriever-embedding-microservice   Up 27 minutes (healthy)
    7bd4d94dc7a7   nemollm-inference-microservice          Up 27 minutes
    55135224e8fd   milvus-standalone                       Up 48 minutes (healthy)
    5844248a08df   milvus-minio                            Up 48 minutes (healthy)
    c42df344bb25   milvus-etcd                             Up 48 minutes (healthy)
  5. Open your browser and interact with the RAG Playground at http://localhost:3001/converse.

04: Multi-Turn RAG

This example showcases a multi-turn use case in a RAG pipeline.
The example stores the conversation history and knowledge base in Milvus and retrieves them at runtime to understand contextual queries.
The example deploys NIM for LLMs and NeMo Retriever Embedding Microservice.

The example supports ingestion of PDF and .txt files.
The documents are ingested in a dedicated document vector store.
The prompt for the example is tuned to act as a document chat bot.
For maintaining the conversation history, the chain server stores the previous user queries and the generated answers as a text entry in a different dedicated vector store for conversation history.
Both vector stores are part of a LangChain LCEL chain as LangChain Retrievers.
When the chain is invoked with a query, the query is passed through both retrievers.
The retrievers fetch context from the document vector store and the closest-matching conversation history from the conversation history vector store. The document chunks retrieved from the document vector store are then passed through a reranker model to select the top_k most relevant chunks. The resulting context is then inserted into the LLM prompt for response generation.

| LLM Model | Embedding | Ranking (Optional) | Framework | Document Type | Vector Database | Model deployment platform |
| --- | --- | --- | --- | --- | --- | --- |
| meta/llama3-8b-instruct | nv-embedqa-e5-v5 | nv-rerankqa-mistral-4b-v3 | LangChain Expression Language | PDF/Text | Milvus | On Prem |

Deployment

  1. Complete the Common Prerequisites.

  2. Run the pipeline.

    cd rag-app-multiturn-chatbot/
    USERID=$(id -u) docker compose --profile local-nim --profile milvus up -d
  3. Deploy the NVIDIA NeMo Retriever Text Reranking NIM container.

       USERID=$(id -u) docker compose -f docker-compose-nim-ms.yaml up -d ranking-ms
  4. Check status of the containers.

    docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"
    CONTAINER ID   NAMES                                   STATUS
    32515fcb8ad2   rag-playground                          Up 26 minutes
    d60e0cee49f7   rag-application-multiturn-chatbot       Up 27 minutes
    02c8062f15da   nemo-retriever-embedding-microservice   Up 27 minutes (healthy)
    02c8062f15da   nemo-retriever-reranking-microservice   Up 27 minutes (healthy)
    7bd4d94dc7a7   nemollm-inference-microservice          Up 27 minutes
    55135224e8fd   milvus-standalone                       Up 48 minutes (healthy)
    5844248a08df   milvus-minio                            Up 48 minutes (healthy)
    c42df344bb25   milvus-etcd                             Up 48 minutes (healthy)
  5. Open your browser and interact with the RAG Playground at http://localhost:3001/converse.

05: Query Decomposition RAG

This example showcases a RAG use case built using the task decomposition paradigm.
The chain server breaks down a query into smaller subtasks and then combines results from different subtasks to formulate the final answer.
It uses models from NVIDIA API Catalog and LangChain as the framework.

| LLM Model | Embedding | Framework | Document Type | Vector Database | Model deployment platform |
| --- | --- | --- | --- | --- | --- |
| meta/llama3-70b-instruct | nv-embedqa-e5-v5 | LangChain Agent | PDF/Text | Milvus | On Prem |

Deployment

  1. Complete the Common Prerequisites.

  2. Run the pipeline.

    cd rag-app-query-decomposition-agent/
    USERID=$(id -u) docker compose --profile local-nim --profile milvus up -d
  3. Check status of the containers.

    docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"
    CONTAINER ID   NAMES                                   STATUS
    32515fcb8ad2   rag-playground                          Up 26 minutes
    d60e0cee49f7   rag-app-query-decomposition-agent       Up 27 minutes
    02c8062f15da   nemo-retriever-embedding-microservice   Up 27 minutes (healthy)
    7bd4d94dc7a7   nemollm-inference-microservice          Up 27 minutes
    55135224e8fd   milvus-standalone                       Up 48 minutes (healthy)
    5844248a08df   milvus-minio                            Up 48 minutes (healthy)
    c42df344bb25   milvus-etcd                             Up 48 minutes (healthy)
  4. Open your browser and interact with the RAG Playground at http://localhost:3001/converse.

06: Structured Data RAG

This example demonstrates a use case of RAG with structured CSV data.
This approach does not involve embedding models or vector database solutions and instead uses PandasAI for interacting with the CSV data.
For ingestion, the structured data is loaded from a CSV file into a Pandas dataframe.
The pipeline can ingest multiple CSV files, provided they have identical columns.
Ingestion of CSV files with differing columns is not currently supported and results in an exception.
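
Because ingestion fails when columns differ, it can help to compare the header rows before uploading. The following is a minimal sketch under stated assumptions: the function name `csv_headers_match` is hypothetical, and it treats the first line of each file as the header and requires the same column names in the same order, which may be stricter than what the pipeline enforces.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when every CSV file passed as an argument has
# the same first line (header row) as the first file. Carriage returns are
# stripped so files with Windows line endings compare equal.
csv_headers_match() {
    [ "$#" -ge 1 ] || return 2
    reference=$(head -n 1 "$1" | tr -d '\r')
    for file in "$@"; do
        header=$(head -n 1 "$file" | tr -d '\r')
        if [ "$header" != "$reference" ]; then
            echo "header mismatch in $file" >&2
            return 1
        fi
    done
    return 0
}

# Example invocation with placeholder file names:
#   csv_headers_match first.csv second.csv && echo "safe to ingest together"
```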

The core functionality utilizes a PandasAI agent to extract information from the dataframe.
This agent combines the query with the structure of the dataframe into an LLM prompt.
The LLM then generates Python code to extract the required information from the dataframe.
Subsequently, this generated code is executed on the dataframe, yielding the output dataframe.

To test the example, sample CSV files are available.
These are part of the structured data example Helm chart and represent a subset of the Microsoft Azure Predictive Maintenance dataset from Kaggle.
The CSV data retrieval prompt is specifically tuned for three CSV files from this dataset: PdM_machines.csv, PdM_errors.csv, and PdM_failures.csv.
Specify the CSV file to use in the rag-app-structured-data-chatbot.yaml file within this resource by updating the environment variable CSV_NAME.
By default, the file is set to PdM_machines but you can change it to PdM_errors or PdM_failures.
Currently, customization of the CSV data retrieval prompt is not supported.

| LLM Model (Data Retrieval) | LLM Model (Response Paraphrasing) | Embedding | Framework | Document Type | Vector Database | Model deployment platform |
| --- | --- | --- | --- | --- | --- | --- |
| meta/llama3-70b-instruct | meta/llama3-70b-instruct | Not Used | PandasAI | CSV | Not Used | On Prem |

Deployment

  1. Complete the Common Prerequisites.

  2. Run the pipeline.

    cd rag-app-structured-data-chatbot/
    USERID=$(id -u) docker compose --profile local-nim up -d
  3. Check status of the containers.

    docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"
    CONTAINER ID   NAMES                             STATUS
    32515fcb8ad2   rag-playground                    Up 26 minutes
    d60e0cee49f7   rag-app-structured-data-chatbot   Up 27 minutes
    7bd4d94dc7a7   nemollm-inference-microservice    Up 27 minutes
  4. Open your browser and interact with the RAG Playground at http://localhost:3001/converse.

Configuring Milvus as the Vector Database

Perform the following steps to use Milvus as the vector database for an example.

  1. Edit the Docker Compose file for the example, such as rag-app-text-chatbot-llamaindex.yaml.

    Update the environment variables within the chain server service:

    services:
      chain-server:
        container_name: chain-server
        environment:
          APP_VECTORSTORE_NAME: "milvus"
          APP_VECTORSTORE_URL: "http://milvus:19530"
  2. Optional: If the services are already running, stop them:

       $ USERID=$(id -u) docker compose --profile local-nim --profile milvus down
  3. Start the services:

       $ USERID=$(id -u) docker compose --profile local-nim --profile milvus up -d
  4. Optional: View the chain server logs to confirm the vector database.

    1. View the logs:

            $ docker logs -f <rag-example>
    2. Try out a query after uploading any document and confirm the log output includes the vector database:

      INFO:example:Ingesting <file-name>.pdf in vectorDB
      INFO:RetrievalAugmentedGeneration.common.utils:Using milvus as vector store
      INFO:RetrievalAugmentedGeneration.common.utils:Using milvus collection: <example-name>
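
The log check in the last step can also be done without reading the log by hand. This is a sketch only: the function name `log_confirms_vector_store` is hypothetical, and it assumes the chain server keeps emitting the `Using <name> as vector store` line exactly as shown in the sample output above.

```shell
#!/bin/sh
# Hypothetical helper: reads chain server log text on stdin and succeeds when
# it contains the "Using <store> as vector store" line for the given store.
log_confirms_vector_store() {
    store=$1
    grep -q "Using ${store} as vector store"
}

# Example invocation (requires Docker, so it is left as a comment):
#   docker logs <rag-example> 2>&1 | log_confirms_vector_store milvus \
#       && echo "milvus is in use"
```

Run it only after uploading a document and sending a query, because the line is logged at that point rather than at startup.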

Configuring pgvector as the Vector Database

  1. Edit the Docker Compose file for the example to use pgvector as the vector database.

    Update the environment variables within the chain server service:

    services:
      chain-server:
        container_name: chain-server
        environment:
          APP_VECTORSTORE_NAME: "pgvector"
          APP_VECTORSTORE_URL: "pgvector:5432"
          POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-password}
          POSTGRES_USER: ${POSTGRES_USER:-postgres}
          POSTGRES_DB: ${POSTGRES_DB:-api}

    The preceding example shows the default values for the database user, password, and database.
    To override the defaults, edit the values in the Docker Compose file, or set the values in the compose.env file.

  2. Optional: If the services are already running, stop them:

       $ USERID=$(id -u) docker compose --profile local-nim --profile pgvector down
  3. Start the services:

       $ USERID=$(id -u) docker compose --profile local-nim --profile pgvector up -d
  4. Optional: View the chain server logs to confirm the vector database.

    1. View the logs:

            $ docker logs -f <rag-example>
    2. Try out a query after uploading any document and confirm the log output includes the vector database:

      INFO:example:Ingesting <file-name>.pdf in vectorDB
      INFO:RetrievalAugmentedGeneration.common.utils:Using pgvector as vector store
      INFO:RetrievalAugmentedGeneration.common.utils:Using PGVector collection: <example-name>
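
The `${POSTGRES_USER:-postgres}` form in the Compose file uses the same default-value syntax as the shell: the value after `:-` applies when the variable is unset or empty. The sketch below illustrates that behavior in plain shell. The function name `show_pg_settings` and the value `ragdocs` are hypothetical and used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: prints what the ${VAR:-default} expressions from the
# Compose file resolve to in the current shell environment.
show_pg_settings() {
    echo "user=${POSTGRES_USER:-postgres} db=${POSTGRES_DB:-api}"
}

# Example session:
#   unset POSTGRES_USER POSTGRES_DB
#   show_pg_settings          # user=postgres db=api
#   POSTGRES_DB=ragdocs
#   show_pg_settings          # user=postgres db=ragdocs
```

An empty value falls back to the default as well, so an override must be a non-empty string to take effect.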

Alternative LLM Model: Deploy Meta Llama 3 70B Instruct

The Llama 3 70B Instruct model with NVIDIA NIM for LLMs is the only supported alternative model.

  1. Edit the docker-compose-nim-ms.yaml file and update the image for the nemollm-inference-microservice service.

    Update the image to nvcr.io/nim/meta/llama3-70b-instruct:1.0.0, as shown in the following sample.

    services:
      nemollm-inference:
        container_name: nemollm-inference-microservice
        image: nvcr.io/nim/meta/llama3-70b-instruct:1.0.0
        volumes:
          - ${MODEL_DIRECTORY}:/opt/nim/.cache
        ports:
          - "8000:8000"
        expose:
          - "8000"
        environment:
          NGC_API_KEY: ${NGC_API_KEY}
        shm_size: 20gb
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: ${INFERENCE_GPU_COUNT:-all}
                  capabilities: [gpu]
        profiles: ["local-nim", "nemo-retriever"]
  2. Start the container.

    cd <example_dir>
    docker compose --profile local-nim up -d
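
A 70B model can take a while to download and load, so the container may be up well before it can answer requests. The following is a minimal sketch of a polling loop. The function name `retry_until_ok` is hypothetical, and the health URL in the usage comment is an assumption about the NIM container that is not stated in this resource; only port 8000 comes from the sample above.

```shell
#!/bin/sh
# Hypothetical helper: runs a command repeatedly until it succeeds or the
# attempts run out.
# Usage: retry_until_ok <attempts> <delay-seconds> <command> [args...]
retry_until_ok() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while :; do
        if "$@"; then
            return 0                 # the probe succeeded
        fi
        if [ "$i" -ge "$attempts" ]; then
            return 1                 # give up after the last attempt
        fi
        i=$((i + 1))
        sleep "$delay"
    done
}

# Example invocation (the /v1/health/ready path is an assumption):
#   retry_until_ok 60 10 curl -sf http://localhost:8000/v1/health/ready \
#       && echo "inference service is ready"
```

Any command that exits with status 0 on success can serve as the probe, so the same loop works with a different endpoint or a `docker inspect` check.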

Custom Chain Server

  1. Clone the repository:

    git lfs clone https://github.com/NVIDIA/GenerativeAIExamples.git
  2. After you modify the chain server code, build the image:

    USERID=$(id -u) docker compose build --no-cache chain-server
  3. Tag and push the container to a private registry, such as NVIDIA NGC:

    docker tag chain-server:latest nvcr.io/<org-name>/<team-name>/<example-name>:<version>
    docker push nvcr.io/<org-name>/<team-name>/<example-name>:<version>
  4. Modify the compose file for the enterprise example to run and set your image for the chain server:

    services:
      chain-server:
        container_name: rag-application-text-chatbot
        image: nvcr.io/<org-name>/<team-name>/<example-name>:<version>
        ...
  5. Start the containers by running docker compose up -d.
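
The same image reference appears in the tag command, the push command, and the compose file, so building it once avoids typos. This is a small sketch: the function name `image_ref` and the sample values in the usage comment are hypothetical placeholders.

```shell
#!/bin/sh
# Hypothetical helper: assembles the nvcr.io image reference from its parts.
# Usage: image_ref <org-name> <team-name> <example-name> <version>
image_ref() {
    printf 'nvcr.io/%s/%s/%s:%s\n' "$1" "$2" "$3" "$4"
}

# Example invocation with placeholder values (requires Docker and registry
# access, so it is left as a comment):
#   ref=$(image_ref my-org my-team text-chatbot 0.1.0)
#   docker tag chain-server:latest "$ref" && docker push "$ref"
```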

Additional Resources

Learn more about how to use NVIDIA NIM microservices for RAG through our Deep Learning Institute. Access the course here.

Security considerations

The RAG applications are shared as reference architectures and are provided “as is”. Securing them in production environments is the responsibility of the end users who deploy them. When deploying in a production environment, have security experts review any potential risks and threats (including direct and indirect prompt injection), define the trust boundaries, secure the communication channels, integrate AuthN and AuthZ with appropriate access controls, keep the deployment (including the containers) up to date, and ensure the containers are secure and free of vulnerabilities.

Licenses

By downloading or using this artifact, which is included in the AI Chatbot workflows, you agree to the terms of the NVIDIA Software License Agreement and Product-specific Terms for AI products.

Publisher: NVIDIA
Latest Version: 24.08
Updated: September 4, 2025 UTC
Compressed Size: 174.57 KB