AI Chatbot - Docker workflow

Description: AI Chatbots with RAG - Docker workflow
Publisher: NVIDIA
Latest Version: 24.08
Modified: August 26, 2024
Compressed Size: 174.57 KB

AI Chatbots with RAG - Docker Workflow

Download this resource to run enterprise RAG applications based on NVIDIA services with Docker Compose.

Example RAG Applications

  1. Canonical RAG Llamaindex
  2. Canonical RAG Langchain
  3. Multimodal RAG
  4. Multi-Turn RAG
  5. Query Decomposition RAG
  6. Structured Data RAG

Common Customizations

  1. Configuring Milvus as the Vector Database
  2. Configuring pgvector as the Vector Database
  3. Alternative LLM Model: Llama 3 70B Instruct
  4. Custom Chain Server

Common Prerequisites

  1. Install Docker Engine and Docker Compose.

  2. Verify NVIDIA GPU driver version 535 or later is installed.

    $ nvidia-smi --query-gpu=driver_version --format=csv,noheader
    535.129.03
    
    $ nvidia-smi -q -d compute
    
    ==============NVSMI LOG==============
    
    Timestamp                                 : Sun Nov 26 21:17:25 2023
    Driver Version                            : 535.129.03
    CUDA Version                              : 12.2
    
    Attached GPUs                             : 1
    GPU 00000000:CA:00.0
        Compute Mode                          : Default
    

    Refer to the NVIDIA Linux driver installation instructions for more information.

  3. Install the NVIDIA Container Toolkit.

    Verify the toolkit is installed and configured as the default container runtime.

    $ cat /etc/docker/daemon.json
    {
        "default-runtime": "nvidia",
        "runtimes": {
            "nvidia": {
                "path": "/usr/bin/nvidia-container-runtime",
                "runtimeArgs": []
            }
        }
    }
    
    $ sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi -L
    GPU 0: NVIDIA A100 80GB PCIe (UUID: GPU-d8ce95c1-12f7-3174-6395-e573163a2ace)
    
  4. Create an NGC account and generate an NGC API key. Refer to the NGC documentation for instructions on creating an account and generating a key.

    Log in to the NVIDIA container registry using the following command:

    docker login nvcr.io
    

    Export the NGC_API_KEY environment variable:

    export NGC_API_KEY=<ngc-api-key>
    

    Refer to Accessing And Pulling an NGC Container Image via the Docker CLI for more information.
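
When scripting the login, you can avoid the interactive prompt by piping the API key to docker login. The following is a minimal sketch that assumes NGC_API_KEY is already exported; for API-key logins to nvcr.io the username is the literal string $oauthtoken:

    # Non-interactive login to the NVIDIA container registry
    echo "${NGC_API_KEY}" | docker login nvcr.io --username '$oauthtoken' --password-stdin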

01: Canonical RAG Llamaindex

This example showcases a baseline LlamaIndex-based RAG pipeline that deploys the NIM for LLMs microservice to host a TensorRT-optimized LLM together with the NeMo Retriever Embedding microservice. Milvus is deployed as the vector database to store the embeddings that are retrieved when generating responses to queries.

LLM Model: meta/llama3-8b-instruct
Embedding: nv-embedqa-e5-v5
Framework: llama-index
Document Type: PDF/Text
Vector Database: Milvus
Model deployment platform: On Prem

Deployment

  1. Complete the Common Prerequisites.

  2. Run the pipeline.

    cd rag-app-text-chatbot-llamaindex/
    USERID=$(id -u) docker compose --profile local-nim --profile milvus up -d
    
  3. Check status of the containers.

    docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"
    
    CONTAINER ID   NAMES                                   STATUS
    32515fcb8ad2   rag-playground                          Up 26 minutes
    d60e0cee49f7   rag-application-text-chatbot-llamaindex Up 27 minutes
    02c8062f15da   nemo-retriever-embedding-microservice   Up 27 minutes (healthy)
    7bd4d94dc7a7   nemollm-inference-microservice          Up 27 minutes
    55135224e8fd   milvus-standalone                       Up 48 minutes (healthy)
    5844248a08df   milvus-minio                            Up 48 minutes (healthy)
    c42df344bb25   milvus-etcd                             Up 48 minutes (healthy)
    
  4. Open your browser and interact with the RAG Playground at http://localhost:3001/converse.
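
Optionally, you can verify the NIM for LLMs endpoint before using the playground. This sketch assumes the inference microservice publishes host port 8000, as in the docker-compose-nim-ms.yaml included with this resource, and uses the OpenAI-compatible API exposed by NIM for LLMs:

    # List the models served by the local NIM for LLMs container (assumes port 8000 is published)
    curl -s http://localhost:8000/v1/models

    # Send a single chat completion request directly to the LLM
    curl -s http://localhost:8000/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{"model": "meta/llama3-8b-instruct", "messages": [{"role": "user", "content": "Say hello"}], "max_tokens": 32}'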

02: Canonical RAG Langchain

This example showcases a baseline LangChain-based RAG pipeline that deploys the NIM for LLMs microservice to host a TensorRT-optimized LLM together with the NeMo Retriever Embedding microservice. Milvus is deployed as the vector database to store the embeddings that are retrieved when generating responses to queries.

LLM Model: meta/llama3-8b-instruct
Embedding: nv-embedqa-e5-v5
Framework: LangChain
Document Type: PDF/Text
Vector Database: Milvus
Model deployment platform: On Prem

Deployment

  1. Complete the Common Prerequisites.

  2. Run the pipeline.

    cd rag-app-text-chatbot-langchain/
    USERID=$(id -u) docker compose --profile local-nim --profile milvus up -d
    
  3. Check status of the containers.

    docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"
    
    CONTAINER ID   NAMES                                   STATUS
    32515fcb8ad2   rag-playground                          Up 26 minutes
    d60e0cee49f7   rag-application-text-chatbot-langchain  Up 27 minutes
    02c8062f15da   nemo-retriever-embedding-microservice   Up 27 minutes (healthy)
    7bd4d94dc7a7   nemollm-inference-microservice          Up 27 minutes
    55135224e8fd   milvus-standalone                       Up 48 minutes (healthy)
    5844248a08df   milvus-minio                            Up 48 minutes (healthy)
    c42df344bb25   milvus-etcd                             Up 48 minutes (healthy)
    
  4. Open your browser and interact with the RAG Playground at http://localhost:3001/converse.

03: Multimodal RAG

This example showcases a multimodal use case in a RAG pipeline. The example can interpret images embedded in PDFs, such as graphs and plots, alongside text and tables, and uses multimodal models from the NVIDIA API Catalog to answer queries.

LLM Model: meta/llama3-8b-instruct for response generation, ai-google-Deplot for graph-to-text conversion, ai-Neva-22B for image-to-text conversion
Embedding: nv-embedqa-e5-v5
Framework: Custom Python
Document Type: PDF with images/PPTX
Vector Database: Milvus
Model deployment platform: Cloud - NVIDIA API Catalog for ai-google-Deplot and ai-Neva-22B; On Prem NeMo Inference Microservices for the meta/llama3-8b-instruct model

Deployment

  1. Complete the Common Prerequisites.

  2. Export your NVIDIA API key in your terminal. The key is used to access the Neva 22B and Deplot models from the NVIDIA API Catalog. A sketch for verifying the key follows these deployment steps.

    export NVIDIA_API_KEY="nvapi-*"
    
  3. Run the pipeline.

    cd rag-app-multimodal-chatbot/
    USERID=$(id -u) docker compose --profile local-nim --profile milvus up -d
    
  4. Check status of the containers.

    docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"
    
    CONTAINER ID   NAMES                                   STATUS
    32515fcb8ad2   rag-playground                          Up 26 minutes
    d60e0cee49f7   rag-application-multimodal-chatbot      Up 27 minutes
    02c8062f15da   nemo-retriever-embedding-microservice   Up 27 minutes (healthy)
    7bd4d94dc7a7   nemollm-inference-microservice          Up 27 minutes
    55135224e8fd   milvus-standalone                       Up 48 minutes (healthy)
    5844248a08df   milvus-minio                            Up 48 minutes (healthy)
    c42df344bb25   milvus-etcd                             Up 48 minutes (healthy)
    
  5. Open your browser and interact with the RAG Playground at http://localhost:3001/converse.
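
As noted in step 2, you can sanity-check the exported NVIDIA_API_KEY before ingesting documents. The sketch below assumes the key is valid for the NVIDIA API Catalog's OpenAI-compatible endpoint at integrate.api.nvidia.com; an authorization error indicates the key needs to be regenerated:

    # List the models this API key can access on the NVIDIA API Catalog
    curl -s https://integrate.api.nvidia.com/v1/models \
      -H "Authorization: Bearer ${NVIDIA_API_KEY}"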

04: Multi-Turn RAG

This example showcases a multi-turn use case in a RAG pipeline. The example stores the conversation history and knowledge base in Milvus and retrieves them at runtime to understand contextual queries. The example deploys NIM for LLMs and NeMo Retriever Embedding Microservice.

The example supports ingestion of PDF and .txt files, which are ingested into a dedicated document vector store, and its prompt is tuned to act as a document chatbot. To maintain the conversation history, the chain server stores the previous user queries and the generated answers as text entries in a separate, dedicated vector store for conversation history.

Both vector stores are part of a LangChain LCEL chain as LangChain retrievers. When the chain is invoked with a query, the query is passed through both retrievers: one retrieves context from the document vector store and the other retrieves the closest matching conversation history from the conversation history vector store. The document chunks retrieved from the document vector store are then passed through a reranker model to determine the most relevant top_k context, which is finally passed to the LLM prompt for response generation.

LLM Model: meta/llama3-8b-instruct
Embedding: nv-embedqa-e5-v5
Ranking (Optional): nv-rerankqa-mistral-4b-v3
Framework: LangChain Expression Language
Document Type: PDF/Text
Vector Database: Milvus
Model deployment platform: On Prem

Deployment

  1. Complete the Common Prerequisites.

  2. Run the pipeline.

    cd rag-app-multiturn-chatbot/
    USERID=$(id -u) docker compose --profile local-nim --profile milvus up -d
    
  3. Deploy the NVIDIA NeMo Retriever Text Reranking NIM container.

    USERID=$(id -u) docker compose -f docker-compose-nim-ms.yaml up -d ranking-ms
    
  4. Check status of the containers.

    docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"
    
    CONTAINER ID   NAMES                                   STATUS
    32515fcb8ad2   rag-playground                          Up 26 minutes
    d60e0cee49f7   rag-application-multiturn-chatbot       Up 27 minutes
    02c8062f15da   nemo-retriever-embedding-microservice   Up 27 minutes (healthy)
    02c8062f15da   nemo-retriever-reranking-microservice   Up 27 minutes (healthy)
    7bd4d94dc7a7   nemollm-inference-microservice          Up 27 minutes
    55135224e8fd   milvus-standalone                       Up 48 minutes (healthy)
    5844248a08df   milvus-minio                            Up 48 minutes (healthy)
    c42df344bb25   milvus-etcd                             Up 48 minutes (healthy)
    
  5. Open your browser and interact with the RAG Playground at http://localhost:3001/converse.

05: Query Decomposition RAG

This example showcases a RAG use case built using the task decomposition paradigm. The chain server breaks down a query into smaller subtasks and then combines results from different subtasks to formulate the final answer. It uses models from NVIDIA API Catalog and LangChain as the framework.

LLM Model: meta/llama3-70b-instruct
Embedding: nv-embedqa-e5-v5
Framework: LangChain Agent
Document Type: PDF/Text
Vector Database: Milvus
Model deployment platform: On Prem

Deployment

  1. Complete the Common Prerequisites.

  2. Run the pipeline.

    cd rag-app-query-decomposition-agent/
    USERID=$(id -u) docker compose --profile local-nim --profile milvus up -d
    
  3. Check status of the containers.

    docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"
    
    CONTAINER ID   NAMES                                   STATUS
    32515fcb8ad2   rag-playground                          Up 26 minutes
    d60e0cee49f7   rag-app-query-decomposition-agent       Up 27 minutes
    02c8062f15da   nemo-retriever-embedding-microservice   Up 27 minutes (healthy)
    7bd4d94dc7a7   nemollm-inference-microservice          Up 27 minutes
    55135224e8fd   milvus-standalone                       Up 48 minutes (healthy)
    5844248a08df   milvus-minio                            Up 48 minutes (healthy)
    c42df344bb25   milvus-etcd                             Up 48 minutes (healthy)
    
  4. Open your browser and interact with the RAG Playground at http://localhost:3001/converse.

06: Structured Data RAG

This example demonstrates a use case of RAG with structured CSV data. This approach does not involve embedding models or vector database solutions and instead uses PandasAI for interacting with the CSV data. For ingestion, the structured data is loaded from a CSV file into a Pandas dataframe. The pipeline can ingest multiple CSV files, provided they have identical columns. Ingestion of CSV files with differing columns is not currently supported and results in an exception.

The core functionality utilizes a PandasAI agent to extract information from the dataframe. This agent combines the query with the structure of the dataframe into an LLM prompt. The LLM then generates Python code to extract the required information from the dataframe. Subsequently, this generated code is executed on the dataframe, yielding the output dataframe.

To test the example, sample CSV files are available. These are part of the structured data example Helm chart and represent a subset of the Microsoft Azure Predictive Maintenance dataset from Kaggle. The CSV data retrieval prompt is specifically tuned for three CSV files from this dataset: PdM_machines.csv, PdM_errors.csv, and PdM_failures.csv. Specify the CSV file to use in the rag-app-structured-data-chatbot.yaml file within this resource by updating the CSV_NAME environment variable. By default, the file is set to PdM_machines, but you can change it to PdM_errors or PdM_failures. Currently, customization of the CSV data retrieval prompt is not supported.
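
Switching the sample data only requires changing the CSV_NAME variable for the chain server. The exact layout of the service block in rag-app-structured-data-chatbot.yaml may differ from this sketch, which only illustrates the relevant setting:

    services:
      chain-server:
        environment:
          # Illustrative placement; the default is PdM_machines, and PdM_failures also works
          CSV_NAME: PdM_errors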

LLM Model (Data Retrieval): meta/llama3-70b-instruct
LLM Model (Response Paraphrasing): meta/llama3-70b-instruct
Embedding: Not Used
Framework: PandasAI
Document Type: CSV
Vector Database: Not Used
Model deployment platform: On Prem

Deployment

  1. Complete the Common Prerequisites.

  2. Run the pipeline.

    cd rag-app-structured-data-chatbot/
    USERID=$(id -u) docker compose --profile local-nim up -d
    
  3. Check status of the containers.

    docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"
    
    CONTAINER ID   NAMES                                   STATUS
    32515fcb8ad2   rag-playground                          Up 26 minutes
    d60e0cee49f7   rag-app-structured-data-chatbot         Up 27 minutes
    7bd4d94dc7a7   nemollm-inference-microservice          Up 27 minutes
    
  4. Open your browser and interact with the RAG Playground at http://localhost:3001/converse.

Configuring Milvus as the Vector Database

Perform the following steps to use Milvus as the vector database for an example.

  1. Edit the Docker Compose file for the example, such as rag-app-text-chatbot-llamaindex.yaml.

    Update the environment variables within the chain server service:

    services:
      chain-server:
        container_name: chain-server
        environment:
          APP_VECTORSTORE_NAME: "milvus"
          APP_VECTORSTORE_URL: "http://milvus:19530"
    
  2. Optional: Stop the services if they are running:

    $ USERID=$(id -u) docker compose --profile local-nim --profile milvus down
    
  3. Start the services:

    $ USERID=$(id -u) docker compose --profile local-nim --profile milvus up -d
    
  4. Optional: View the chain server logs to confirm the vector database.

    1. View the logs:

      $ docker logs -f <rag-example>
      
    2. Try out a query after uploading any document and confirm the log output includes the vector database:

      INFO:example:Ingesting <file-name>.pdf in vectorDB
      INFO:RetrievalAugmentedGeneration.common.utils:Using milvus as vector store
      INFO:RetrievalAugmentedGeneration.common.utils:Using milvus collection: <example-name>
      
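Because the Milvus containers define health checks (they report as healthy in the docker ps output), you can also query their health status directly; a small sketch:

    # Report the health-check status of the Milvus containers
    docker inspect --format '{{.Name}}: {{.State.Health.Status}}' milvus-standalone milvus-etcd milvus-minio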

Configuring pgvector as the Vector Database

  1. Edit the Docker Compose file for the example to use pgvector as the vector database.

    Update the environment variables within the chain server service:

    services:
      chain-server:
        container_name: chain-server
        environment:
          APP_VECTORSTORE_NAME: "pgvector"
          APP_VECTORSTORE_URL: "pgvector:5432"
          POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-password}
          POSTGRES_USER: ${POSTGRES_USER:-postgres}
          POSTGRES_DB: ${POSTGRES_DB:-api}
    

    The preceding example shows the default values for the database user, password, and database name. To override the defaults, edit the values in the Docker Compose file, or set the values in the compose.env file (a sketch follows these steps).

  2. Optional: Stop the services if they are running:

    $ USERID=$(id -u) docker compose --profile local-nim --profile pgvector down
    
  3. Start the services:

    $ USERID=$(id -u) docker compose --profile local-nim --profile pgvector up -d
    
  4. Optional: View the chain server logs to confirm the vector database.

    1. View the logs:

      $ docker logs -f <rag-example>
      
    2. Try out a query after uploading any document and confirm the log output includes the vector database:

      INFO:example:Ingesting <file-name>.pdf in vectorDB
      INFO:RetrievalAugmentedGeneration.common.utils:Using pgvector as vector store
      INFO:RetrievalAugmentedGeneration.common.utils:Using PGVector collection: <example-name>
      
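As mentioned in step 1, the database credentials can also be overridden through the compose.env file instead of editing the Docker Compose file. The following is an illustrative sketch with placeholder values; source the file, or export the same variables in your shell, before running docker compose:

    # compose.env -- placeholder values for overriding the pgvector defaults
    export POSTGRES_USER=raguser
    export POSTGRES_PASSWORD=<strong-password>
    export POSTGRES_DB=ragdb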

Alternative LLM Model: Deploy Meta Llama 3 70B Instruct

The Llama 3 70B Instruct model with NVIDIA NIM for LLMs is the only supported alternative model.

  1. Edit the docker-compose-nim-ms.yaml file and update the image for the nemollm-inference-microservice service.

    Update the image to nvcr.io/nim/meta/llama3-70b-instruct:1.0.0, as shown in the following sample.

    services:
      nemollm-inference:
        container_name: nemollm-inference-microservice
        image: nvcr.io/nim/meta/llama3-70b-instruct:1.0.0
        volumes:
        - ${MODEL_DIRECTORY}:/opt/nim/.cache
        ports:
        - "8000:8000"
        expose:
        - "8000"
        environment:
          NGC_API_KEY: ${NGC_API_KEY}
        shm_size: 20gb
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: ${INFERENCE_GPU_COUNT:-all}
                  capabilities: [gpu]
        profiles: ["local-nim", "nemo-retriever"]
    
  2. Start the container.

    cd <example_dir>
    docker compose --profile local-nim up -d
    
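Two optional follow-ups, sketched below on the assumption that host port 8000 remains published as in the sample above: you can export INFERENCE_GPU_COUNT before starting the container to pin the GPU reservation that the compose file otherwise defaults to all GPUs, and once the container is up you can confirm which model the endpoint serves:

    # Run before "docker compose ... up" to pin the GPU reservation (illustrative value)
    export INFERENCE_GPU_COUNT=4

    # After the container starts, list the models served by the NIM endpoint
    curl -s http://localhost:8000/v1/models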

Custom Chain Server

  1. Clone the repository:

    git lfs clone https://github.com/NVIDIA/GenerativeAIExamples.git
    
  2. After you modify the chain server code, build the image:

    USERID=$(id -u) docker compose build --no-cache chain-server
    
  3. Tag and push the container to a private registry, such as NVIDIA NGC:

    docker tag chain-server:latest nvcr.io/<org-name>/<team-name>/<example-name>:<version>
    
    docker push nvcr.io/<org-name>/<team-name>/<example-name>:<version> 
    
  4. Modify the compose file for the enterprise example to run and set your image for the chain server:

    services:
      chain-server:
        container_name: rag-application-text-chatbot
        image: nvcr.io/<org-name>/<team-name>/<example-name>:<version>
        ...
    
  5. Start the containers by running docker compose up -d.

Additional Resources

Learn more about how to use NVIDIA NIM microservices for RAG through the course offered by the NVIDIA Deep Learning Institute.

Security considerations

The RAG applications are shared as reference architectures and are provided “as is.” Their security in production environments is the responsibility of the end users who deploy them. When deploying in a production environment, have security experts review any potential risks and threats (including direct and indirect prompt injection); define the trust boundaries; secure the communication channels; integrate authentication and authorization with appropriate access controls; keep the deployment, including the containers, up to date; and ensure the containers are secure and free of vulnerabilities.

Licenses

By downloading or using the artifacts included in the AI Chatbot workflows, you agree to the terms of the NVIDIA Software License Agreement and the Product-Specific Terms for AI Products.