This example demonstrates a RAG use case over structured CSV data, built with models from the NVIDIA API Catalog. It does not use an embedding model or a vector database; instead, PandasAI manages the workflow. During ingestion, the structured data is loaded from a CSV file into a Pandas dataframe. Multiple CSV files can be ingested, provided they have identical columns; ingesting CSV files with differing columns is not currently supported and results in an exception. The core functionality uses a PandasAI agent to extract information from the dataframe. The agent combines the user query with the structure of the dataframe into an LLM prompt. The LLM then generates Python code that extracts the required information, and this generated code is executed on the dataframe to produce the output dataframe.
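The retrieval flow can be illustrated with a minimal sketch. This is not the actual PandasAI implementation; `call_llm` is a hypothetical helper standing in for the meta/llama3-8b-instruct call, and the prompt wording and example column are illustrative only.

```python
import pandas as pd

def retrieve(df: pd.DataFrame, question: str, call_llm) -> pd.DataFrame:
    # Combine the user query with the dataframe structure into an LLM prompt.
    prompt = (
        f"You are given a pandas dataframe `df` with columns {list(df.columns)} "
        f"and dtypes {df.dtypes.astype(str).to_dict()}.\n"
        "Write Python code that stores the answer to the question below "
        "in a variable named `result`.\n"
        f"Question: {question}"
    )
    # The LLM returns Python code, e.g. "result = df[df['age'] > 10]".
    generated_code = call_llm(prompt)
    # Execute the generated code against the dataframe to produce the output.
    scope = {"df": df, "pd": pd}
    exec(generated_code, scope)
    return scope["result"]

# Example usage with one of the sample CSV files:
# df = pd.read_csv("PdM_machines.csv")
# answer = retrieve(df, "How many machines are older than 10 years?", call_llm)
```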
To test the example, sample CSV files are available. These files are part of the structured data example Helm chart and represent a subset of the Microsoft Azure Predictive Maintenance dataset from Kaggle.
The CSV data retrieval prompt is specifically tuned for three CSV files from this dataset: PdM_machines.csv, PdM_errors.csv, and PdM_failures.csv.
The CSV file to be used can be specified in the values.yaml file within the Helm chart by updating the environment variable CSV_NAME. By default, it is set to PdM_machines, but it can be changed to PdM_errors or PdM_failures.
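Inside the chain server, this selection amounts to an environment variable lookup. The snippet below is a hypothetical sketch of how the configured CSV might be resolved; the data directory is an assumption and may differ from the actual chart layout.

```python
import os
import pandas as pd

# CSV_NAME comes from the Helm chart's values.yaml; PdM_machines is the default.
csv_name = os.environ.get("CSV_NAME", "PdM_machines")

# Hypothetical data directory; adjust to wherever the chart mounts the CSV files.
df = pd.read_csv(f"/data/{csv_name}.csv")
```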
Currently, customization of the CSV data retrieval prompt is not supported.
| LLM Model (Data Retrieval) | LLM Model (Response Paraphrasing) | Embedding | Framework | Document Type | Vector Database | Model Deployment Platform |
|---|---|---|---|---|---|---|
| meta/llama3-8b-instruct | meta/llama3-8b-instruct | Not Used | PandasAI | CSV | Not Used | On Prem |
You have the NGC CLI available on your client machine. You can download the CLI from https://ngc.nvidia.com/setup/installers/cli.
You have Kubernetes installed and running on Ubuntu 22.04. Refer to the Kubernetes documentation or the NVIDIA Cloud Native Stack repository for more information.
You have a default storage class available in the cluster for PVC provisioning. One option is the local path provisioner by Rancher. Refer to the installation section of the README in the GitHub repository.
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/v0.0.26/deploy/local-path-storage.yaml
kubectl get pods -n local-path-storage
kubectl get storageclass
If the local path storage class is not set as the default, you can make it the default using the following command:
kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
You have installed the NVIDIA GPU Operator by following the steps here.
Fetch the Helm chart from NGC
helm fetch https://helm.ngc.nvidia.com/nvidia/aiworkflows/charts/rag-app-structured-data-chatbot-24.08.tgz --username='$oauthtoken' --password=<YOUR API KEY>
Deploy the NVIDIA NIM LLM by following the steps in this section.
Create the example namespace
kubectl create namespace structured-rag
Export the NGC API Key in the environment.
export NGC_CLI_API_KEY="<YOUR NGC API KEY>"
Create the Helm pipeline instance and start the services.
helm install structured-rag rag-app-structured-data-chatbot-24.08.tgz -n structured-rag --set imagePullSecret.password=$NGC_CLI_API_KEY
Verify the pods are running and ready.
kubectl get pods -n structured-rag
Example Output
NAME READY STATUS RESTARTS AGE
chain-server-structured-rag-5bdcd6b848-ps2ht 1/1 Running 0 74m
rag-playground-structured-rag-6d7ff8ddf6-kgtcn 1/1 Running 0 74m
Access the app using port-forwarding.
kubectl port-forward service/rag-playground-structured-rag -n structured-rag 30005:3001
Open a browser and access the rag-playground UI at http://localhost:30005/converse.
Use the NVIDIA GPU Operator to install, configure, and manage the NVIDIA GPU driver and NVIDIA container runtime on the Kubernetes node.
Add the NVIDIA Helm repository:
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
&& helm repo update
Install the Operator:
helm install --wait --generate-name \
-n gpu-operator --create-namespace \
nvidia/gpu-operator
Optional: Configure GPU time-slicing if you have fewer than three GPUs.
Create a file, time-slicing-config-all.yaml, with the following content:
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config-all
data:
  any: |-
    version: v1
    flags:
      migStrategy: none
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 3
The sample configuration creates three replicas from each GPU on the node.
Add the config map to the Operator namespace:
kubectl create -n gpu-operator -f time-slicing-config-all.yaml
Configure the device plugin with the config map and set the default time-slicing configuration:
kubectl patch clusterpolicies.nvidia.com/cluster-policy \
-n gpu-operator --type merge \
-p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config-all", "default": "any"}}}}'
Verify that at least 3 GPUs are allocatable:
kubectl get nodes -l nvidia.com/gpu.present -o json | jq '.items[0].status.allocatable | with_entries(select(.key | startswith("nvidia.com/"))) | with_entries(select(.value != "0"))'
Example Output
{
"nvidia.com/gpu": "3"
}
The example uses meta/llama3-8b-instruct as the default LLM model.
Create a namespace for the vector store:
kubectl create namespace vectorstore
helm repo add milvus https://zilliztech.github.io/milvus-helm/
helm repo update
Create a file named custom_value.yaml with the following content so that the Milvus standalone pod requests a GPU:
standalone:
  resources:
    requests:
      nvidia.com/gpu: "1"
    limits:
      nvidia.com/gpu: "1"
helm install milvus milvus/milvus --set cluster.enabled=false --set etcd.replicaCount=1 --set minio.mode=standalone --set pulsar.enabled=false -f custom_value.yaml -n vectorstore
kubectl get pods -n vectorstore
NAME READY STATUS RESTARTS AGE
milvus-etcd-0 1/1 Running 0 5m34s
milvus-minio-76f9d647d5-44799 1/1 Running 0 5m34s
milvus-standalone-9ccf56df4-m4tpm   1/1     Running   3 (4m35s ago)   5m34s
You can configure various parameters such as prompts and the vector store using environment variables. Modify the environment variables in the env section of the query service in the values.yaml file of the respective example.
Each example utilizes a prompt.yaml file that defines prompts for different contexts. These prompts guide the RAG model in generating appropriate responses. You can tailor these prompts to fit your specific needs and achieve desired responses from the models.
The prompts are loaded as a Python dictionary within the application. To access this dictionary, you can use the get_prompts() function provided by the utils module. This function retrieves the complete dictionary of prompts.
Consider the following prompt.yaml file, which is under the files directory of each Helm chart:
chat_template: |
  You are a helpful, respectful and honest assistant.
  Always answer as helpfully as possible, while being safe.
  Please ensure that your responses are positive in nature.

rag_template: |
  You are a helpful AI assistant named Envie.
  You will reply to questions only based on the context that you are provided.
  If something is out of context, you will refrain from replying and politely decline to respond to the user.
You can access its chat_template using the following code in your chain server:
from RAG.src.chain_server.utils import get_prompts

# Load the full prompt dictionary defined in prompt.yaml
prompts = get_prompts()

# Read chat_template, falling back to an empty string if it is not defined
chat_template = prompts.get("chat_template", "")
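As a hypothetical follow-on, the retrieved template can then be combined with the user's question before calling the LLM; the exact prompt assembly in the chain server may differ:

```python
# Illustrative only: prepend the template to the user question.
question = "How many machines are older than 10 years?"
llm_prompt = f"{chat_template}\n\nUser: {question}\nAssistant:"
```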
For any other example, you can restart it by changing the example name in the following command:
helm upgrade <rag-example-name> <rag-example-helm-chart-path> -n <rag-example-namespace> --set imagePullSecret.password=$NGC_CLI_API_KEY
The vector store can be modified through environment variables. You can update:
APP_VECTORSTORE_NAME: The vector store name. Currently, milvus and pgvector are supported. Note: this only specifies the vector store name; the vector store container needs to be started separately.
APP_VECTORSTORE_URL: The URL of the host machine where the vector store is running.
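For example, a chain server could read these variables and select the matching client. This is a hedged sketch, not the actual implementation; the default URL and the client libraries named in the comments are assumptions.

```python
import os

# Values come from the env section of the query service in values.yaml.
vectorstore_name = os.environ.get("APP_VECTORSTORE_NAME", "milvus")
vectorstore_url = os.environ.get("APP_VECTORSTORE_URL", "http://localhost:19530")

if vectorstore_name == "milvus":
    # e.g. connect with pymilvus using vectorstore_url
    pass
elif vectorstore_name == "pgvector":
    # e.g. connect with psycopg and the pgvector extension using vectorstore_url
    pass
else:
    raise ValueError(f"Unsupported vector store: {vectorstore_name}")
```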
Learn more about how to use NVIDIA NIM microservices for RAG through our Deep Learning Institute. Access the course here.
The RAG applications are shared as reference architectures and are provided “as is”. Their security in production environments is the responsibility of the end users deploying them. When deploying in a production environment, have security experts review any potential risks and threats (including direct and indirect prompt injection); define the trust boundaries; secure the communication channels; integrate AuthN and AuthZ with appropriate access controls; keep the deployment, including the containers, up to date; and ensure the containers are secure and free of vulnerabilities.
By downloading or using NVIDIA NIM inference microservices included in the AI Chatbot workflows you agree to the terms of the NVIDIA Software License Agreement and Product-specific Terms for AI products.