RAG Application: Multimodal Chatbot
Description: This example showcases a multimodal use case in a RAG pipeline. It can understand images (such as graphs and plots) embedded in PDF or .pptx files, alongside text and tables.
Publisher: NVIDIA
Latest Version: 24.08
Compressed Size: 12.14 KB
Modified: August 26, 2024

Multimodal RAG

Description

This example showcases a multimodal use case in a RAG pipeline. It can understand images (such as graphs and plots) embedded in PDF or .pptx files, alongside text and tables. It uses multimodal models from the NVIDIA API Catalog to answer queries.

  • LLM Model: meta/llama3-8b-instruct for response generation; ai-google-Deplot for graph-to-text conversion; ai-Neva-22B for image-to-text conversion
  • Embedding: nv-embedqa-e5-v5
  • Framework: Custom Python
  • Document Type: PDF with images / .pptx
  • Vector Database: Milvus
  • Model Deployment Platform: Cloud (NVIDIA API Catalog) for ai-google-Deplot and ai-Neva-22B; on-prem NeMo Inference Microservices for meta/llama3-8b-instruct

Prerequisites

  • You have the NGC CLI available on your client machine. You can download the CLI from https://ngc.nvidia.com/setup/installers/cli.

  • You have Kubernetes installed and running on Ubuntu 22.04. Refer to the Kubernetes documentation or the NVIDIA Cloud Native Stack repository for more information.

  • You have a default storage class available in the cluster for PVC provisioning. One option is the local path provisioner by Rancher. Refer to the installation section of the README in the GitHub repository.

    kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/v0.0.26/deploy/local-path-storage.yaml
    kubectl get pods -n local-path-storage
    kubectl get storageclass
    
  • If the local path storage class is not set as the default, you can make it the default using the command below:

    kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
    
  • You have installed the NVIDIA GPU Operator following the steps in the Install the NVIDIA GPU Operator section below.

Deployment

  1. Fetch the Helm chart from NGC:

    helm fetch https://helm.ngc.nvidia.com/nvidia/aiworkflows/charts/rag-app-multimodal-chatbot-24.08.tgz --username='$oauthtoken' --password=<YOUR API KEY>
    
  2. Deploy NVIDIA NIM LLM and the NVIDIA NeMo Retriever Embedding Microservice following the steps in the Deploying NVIDIA NIM Microservices section below.

  3. Deploy the Milvus vector store following the steps in the Deploying Milvus Vectorstore Helm Chart section below.

  4. Create the example namespace:

    kubectl create namespace multimodal-rag
    
  5. Set the NVIDIA API Catalog API key:

    kubectl create secret -n multimodal-rag generic nv-api-catalog-secret --from-literal=NVIDIA_API_KEY="nvapi-*"
    
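    You can optionally confirm that the secret was created:

    kubectl get secret nv-api-catalog-secret -n multimodal-rag
    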
  6. Export the NGC API key in the environment:

    export NGC_CLI_API_KEY="<YOUR NGC API KEY>"
    
  7. Create the Helm pipeline instance for the core multimodal RAG services:

    helm install multimodal-rag rag-app-multimodal-chatbot-24.08.tgz -n multimodal-rag --set imagePullSecret.password=$NGC_CLI_API_KEY
    
  8. Verify that the pods are running and ready:

    kubectl get pods -n multimodal-rag
    

    Example Output

    NAME                                   READY   STATUS    RESTARTS   AGE
    chain-server-multimodal-rag-5bdcd6b848-ps2ht     1/1     Running   0          74m
    rag-playground-multimodal-rag-6d7ff8ddf6-kgtcn   1/1     Running   0          74m
    
  9. Access the app using port-forwarding:

    kubectl port-forward service/rag-playground-multimodal-rag -n multimodal-rag 30004:3001
    

    Open a browser and access the rag-playground UI at http://localhost:30004/converse.
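
    To confirm the backend is responding, you can also probe the chain server. This is a minimal sketch: the service name chain-server-multimodal-rag, port 8081, and the /health path are assumptions that may differ by release; verify them with kubectl get svc -n multimodal-rag.

    # Hypothetical health check for the chain server; service name,
    # port, and path are assumptions -- confirm with:
    #   kubectl get svc -n multimodal-rag
    kubectl port-forward service/chain-server-multimodal-rag -n multimodal-rag 8081:8081 &
    curl -s http://localhost:8081/health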

Install the NVIDIA GPU Operator

Use the NVIDIA GPU Operator to install, configure, and manage the NVIDIA GPU driver and NVIDIA container runtime on the Kubernetes node.

  1. Add the NVIDIA Helm repository:

    helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
       && helm repo update
    
  2. Install the Operator:

    helm install --wait --generate-name \
       -n gpu-operator --create-namespace \
       nvidia/gpu-operator
    
  3. Optional: Configure GPU time-slicing if you have fewer than three GPUs.

    • Create a file, time-slicing-config-all.yaml, with the following content:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: time-slicing-config-all
      data:
        any: |-
          version: v1
          flags:
            migStrategy: none
          sharing:
            timeSlicing:
              resources:
              - name: nvidia.com/gpu
                replicas: 3
      

      The sample configuration creates three replicas from each GPU on the node.

    • Add the config map to the Operator namespace:

      kubectl create -n gpu-operator -f time-slicing-config-all.yaml
      
    • Configure the device plugin with the config map and set the default time-slicing configuration:

      kubectl patch clusterpolicies.nvidia.com/cluster-policy \
          -n gpu-operator --type merge \
          -p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config-all", "default": "any"}}}}'
      
    • Verify that at least 3 GPUs are allocatable:

      kubectl get nodes -l nvidia.com/gpu.present -o json | jq '.items[0].status.allocatable | with_entries(select(.key | startswith("nvidia.com/"))) | with_entries(select(.value != "0"))'
      

      Example Output

      {
        "nvidia.com/gpu": "3"
      }
      

Deploying NVIDIA NIM Microservices

Deploying NVIDIA NIM for LLMs

(The default flow deploys meta/llama3-8b-instruct.)

  1. Follow the steps from the nim-deploy repository to deploy the NIM LLM microservice with meta/llama3-8b-instruct as the default LLM model.

Deploying NVIDIA NeMo Retriever Embedding Microservice

  1. Follow the steps in the NeMo Retriever Embedding Microservice documentation to fetch and deploy the NeMo Retriever Embedding Microservice Helm chart.

Note: While deploying the NREM Helm chart, use the following command to explicitly set the embedding image to the GA version of the embedding model:

helm upgrade --install \
  --namespace nrem \
  --set image.repository=nvcr.io/nim/nvidia/nv-embedqa-e5-v5 \
  --set image.tag=1.0.0 \
  nemo-embedder \
  text-embedding-nim-1.0.0.tgz
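
After the chart is installed, verify that the embedder pod is ready. The readiness check below is a hedged sketch: <embedder-service> is a placeholder for the actual service name (find it with kubectl get svc -n nrem), and the /v1/health/ready endpoint on port 8080 is an assumption that may vary by NIM version.

# Confirm the embedder pod reaches the Ready state
kubectl get pods -n nrem

# Optional readiness probe; replace <embedder-service> with the real
# service name from: kubectl get svc -n nrem
kubectl port-forward -n nrem service/<embedder-service> 8080:8080 &
curl -s http://localhost:8080/v1/health/ready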

Deploying Milvus Vectorstore Helm Chart

  1. Create a new namespace for the vector store:

    kubectl create namespace vectorstore
    
  2. Add the Milvus Helm repository:

    helm repo add milvus https://zilliztech.github.io/milvus-helm/
    
  3. Update the Helm repository:

    helm repo update
    
  4. Create a file named custom_value.yaml with the content below to utilize GPUs:

    standalone:
      resources:
        requests:
          nvidia.com/gpu: "1"
        limits:
          nvidia.com/gpu: "1"
    
  5. Install the Helm chart and point to the file created above using the -f argument, as shown below:

    helm install milvus milvus/milvus --set cluster.enabled=false --set etcd.replicaCount=1 --set minio.mode=standalone --set pulsar.enabled=false -f custom_value.yaml -n vectorstore
    
  6. Check the status of the pods:

    kubectl get pods -n vectorstore
    
  7. All pods should be running and in a ready state within a couple of minutes:

    NAME                                READY   STATUS    RESTARTS        AGE
    milvus-etcd-0                       1/1     Running   0               5m34s
    milvus-minio-76f9d647d5-44799       1/1     Running   0               5m34s
    milvus-standalone-9ccf56df4-m4tpm   1/1     Running   3 (4m35s ago)   5m34s
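
Optionally, confirm that Milvus is reachable. This is a minimal sketch that assumes the standalone service is named milvus and exposes Milvus's health/metrics port 9091; verify the actual service names and ports with kubectl get svc -n vectorstore.

    # Hypothetical connectivity check against Milvus's health endpoint;
    # service name and port are assumptions -- confirm with:
    #   kubectl get svc -n vectorstore
    kubectl port-forward -n vectorstore service/milvus 9091:9091 &
    curl -s http://localhost:9091/healthz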

Configuring Examples

You can configure various parameters such as prompts and vectorstore using environment variables. Modify the environment variables in the env section of the query service in the values.yaml file of the respective examples.

Configuring Prompts


Each example utilizes a prompt.yaml file that defines prompts for different contexts. These prompts guide the RAG model in generating appropriate responses. You can tailor these prompts to fit your specific needs and achieve desired responses from the models.

Accessing Prompts

The prompts are loaded as a Python dictionary within the application. To access this dictionary, you can use the get_prompts() function provided by the utils module. This function retrieves the complete dictionary of prompts.

Consider the following prompt.yaml file, which is located in the files directory of each Helm chart:

chat_template: |
    You are a helpful, respectful and honest assistant.
    Always answer as helpfully as possible, while being safe.
    Please ensure that your responses are positive in nature.

rag_template: |
    You are a helpful AI assistant named Envie.
    You will reply to questions only based on the context that you are provided.
    If something is out of context, you will refrain from replying and politely decline to respond to the user.

You can access its chat_template using the following code in your chain server:

from RAG.src.chain_server.utils import get_prompts

prompts = get_prompts()

chat_template = prompts.get("chat_template", "")

Once you have updated the prompts, you can update the deployment for any of the examples using the command below.

helm upgrade <rag-example-name> <rag-example-helm-chart-path> -n <rag-example-namespace> --set imagePullSecret.password=$NGC_CLI_API_KEY
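
For example, using the release name, chart, and namespace from the Deployment section above:

helm upgrade multimodal-rag rag-app-multimodal-chatbot-24.08.tgz -n multimodal-rag --set imagePullSecret.password=$NGC_CLI_API_KEY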

Configuring VectorStore

The vector store can be modified from environment variables. You can update:

  1. APP_VECTORSTORE_NAME: The vector store name. Currently, milvus and pgvector are supported. Note: this only specifies the vector store name; the vector store container needs to be started separately.

  2. APP_VECTORSTORE_URL: The host machine URL where the vector store is running.
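
As a hedged illustration, the env section of the query (chain server) service in values.yaml might look like the snippet below. The key names and nesting are hypothetical and differ between charts, so check the chart's own values.yaml; the URL assumes the Milvus service deployed earlier in the vectorstore namespace, on Milvus's default port 19530.

query:
  env:
    # "milvus" or "pgvector"; the vector store container must already be running
    APP_VECTORSTORE_NAME: "milvus"
    # Host URL where the vector store is reachable (hypothetical example)
    APP_VECTORSTORE_URL: "http://milvus.vectorstore:19530"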

Additional Resources

Learn more about how to use NVIDIA NIM microservices for RAG through our Deep Learning Institute. Access the course here.

Security considerations

The RAG applications are shared as reference architectures and are provided “as is”. Securing them in production environments is the responsibility of the end users deploying them. When deploying in a production environment, have security experts review any potential risks and threats (including direct and indirect prompt injection); define the trust boundaries; secure the communication channels; integrate AuthN and AuthZ with appropriate access controls; keep the deployment, including the containers, up to date; and ensure the containers are secure and free of vulnerabilities.

Licenses

By downloading or using NVIDIA NIM inference microservices included in the AI Chatbot workflows you agree to the terms of the NVIDIA Software License Agreement and Product-specific Terms for AI products.