This example demonstrates a RAG use case over structured CSV data, built with models from the NVIDIA API Catalog. It does not use an embedding model or a vector database; instead, PandasAI manages the workflow. During ingestion, the structured data is loaded from a CSV file into a Pandas dataframe. Multiple CSV files can be ingested, provided they have identical columns; ingesting CSV files with differing columns is not currently supported and results in an exception. The core functionality uses a PandasAI agent to extract information from the dataframe. The agent combines the user query with the structure of the dataframe into an LLM prompt. The LLM then generates Python code that extracts the required information, and this generated code is executed on the dataframe to produce the output dataframe.
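The retrieval flow can be illustrated with a minimal sketch. This is not the actual PandasAI implementation; `call_llm` is a hypothetical helper standing in for the meta/llama3-8b-instruct call, and the prompt wording and example column are illustrative only.

```python
import pandas as pd

def retrieve(df: pd.DataFrame, question: str, call_llm) -> pd.DataFrame:
    # Combine the user query with the dataframe structure into an LLM prompt.
    prompt = (
        f"You are given a pandas dataframe `df` with columns {list(df.columns)} "
        f"and dtypes {df.dtypes.astype(str).to_dict()}.\n"
        "Write Python code that stores the answer to the question below "
        "in a variable named `result`.\n"
        f"Question: {question}"
    )
    # The LLM returns Python code, e.g. "result = df[df['age'] > 10]".
    generated_code = call_llm(prompt)
    # Execute the generated code against the dataframe to produce the output.
    scope = {"df": df, "pd": pd}
    exec(generated_code, scope)
    return scope["result"]

# Example usage with one of the sample CSV files:
# df = pd.read_csv("PdM_machines.csv")
# answer = retrieve(df, "How many machines are older than 10 years?", call_llm)
```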
To test the example, sample CSV files are available. These files are part of the structured data example Helm chart and represent a subset of the Microsoft Azure Predictive Maintenance dataset from Kaggle.
The CSV data retrieval prompt is specifically tuned for three CSV files from this dataset: PdM_machines.csv, PdM_errors.csv, and PdM_failures.csv.
The CSV file to be used can be specified in the values.yaml file within the Helm chart by updating the environment variable CSV_NAME. By default, it is set to PdM_machines, but it can be changed to PdM_errors or PdM_failures.
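Inside the chain server, this selection amounts to an environment variable lookup. The snippet below is a hypothetical sketch of how the configured CSV might be resolved; the data directory is an assumption and may differ from the actual chart layout.

```python
import os
import pandas as pd

# CSV_NAME comes from the Helm chart's values.yaml; PdM_machines is the default.
csv_name = os.environ.get("CSV_NAME", "PdM_machines")

# Hypothetical data directory; adjust to wherever the chart mounts the CSV files.
df = pd.read_csv(f"/data/{csv_name}.csv")
```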
Currently, customization of the CSV data retrieval prompt is not supported.
| LLM Model (Data Retrieval) | LLM Model (Response Paraphrasing) | Embedding | Framework | Document Type | Vector Database | Model Deployment Platform |
|---|---|---|---|---|---|---|
| meta/llama3-8b-instruct | meta/llama3-8b-instruct | Not Used | PandasAI | CSV | Not Used | On Prem |
You have the NGC CLI available on your client machine. You can download the CLI from https://ngc.nvidia.com/setup/installers/cli.
You have Kubernetes installed and running on Ubuntu 22.04. Refer to the Kubernetes documentation or the NVIDIA Cloud Native Stack repository for more information.
You have a default storage class available in the cluster for PVC provisioning. One option is the local path provisioner by Rancher. Refer to the installation section of the README in the GitHub repository.
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/v0.0.26/deploy/local-path-storage.yaml
kubectl get pods -n local-path-storage
kubectl get storageclass
If the local path storage class is not set as the default, you can make it the default using the following command:
kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
You have installed the NVIDIA GPU Operator by following the steps here.
Fetch the Helm chart from NGC
helm fetch https://helm.ngc.nvidia.com/nvidia/aiworkflows/charts/rag-app-structured-data-chatbot-24.08.tgz --username='$oauthtoken' --password=<YOUR API KEY>
Deploy the NVIDIA NIM LLM by following the steps in this section.
Create the example namespace
kubectl create namespace structured-rag
Export the NGC API Key in the environment.
export NGC_CLI_API_KEY="<YOUR NGC API KEY>"
Create the Helm pipeline instance and start the services.
helm install structured-rag rag-app-structured-data-chatbot-24.08.tgz -n structured-rag --set imagePullSecret.password=$NGC_CLI_API_KEY
Verify the pods are running and ready.
kubectl get pods -n structured-rag
Example Output
NAME READY STATUS RESTARTS AGE
chain-server-structured-rag-5bdcd6b848-ps2ht 1/1 Running 0 74m
rag-playground-structured-rag-6d7ff8ddf6-kgtcn 1/1 Running 0 74m
Access the app using port-forwarding.
kubectl port-forward service/rag-playground-structured-rag -n structured-rag 30005:3001
Open a browser and access the rag-playground UI at http://localhost:30005/converse.
Use the NVIDIA GPU Operator to install, configure, and manage the NVIDIA GPU driver and NVIDIA container runtime on the Kubernetes node.
Add the NVIDIA Helm repository:
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
&& helm repo update
Install the Operator:
helm install --wait --generate-name \
-n gpu-operator --create-namespace \
nvidia/gpu-operator
Optional: Configure GPU time-slicing if you have fewer than three GPUs.
Create a file, time-slicing-config-all.yaml, with the following content:
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config-all
data:
  any: |-
    version: v1
    flags:
      migStrategy: none
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 3
The sample configuration creates three replicas from each GPU on the node.
Add the config map to the Operator namespace:
kubectl create -n gpu-operator -f time-slicing-config-all.yaml
Configure the device plugin with the config map and set the default time-slicing configuration:
kubectl patch clusterpolicies.nvidia.com/cluster-policy \
-n gpu-operator --type merge \
-p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config-all", "default": "any"}}}}'
Verify that at least 3 GPUs are allocatable:
kubectl get nodes -l nvidia.com/gpu.present -o json | jq '.items[0].status.allocatable | with_entries(select(.key | startswith("nvidia.com/"))) | with_entries(select(.value != "0"))'
Example Output
{
"nvidia.com/gpu": "3"
}
The example uses meta/llama3-8b-instruct as the default LLM model.
Create a namespace for the vector store:
kubectl create namespace vectorstore
helm repo add milvus https://zilliztech.github.io/milvus-helm/
helm repo update
Create a file named custom_value.yaml with the following content so that the Milvus standalone pod requests a GPU:
standalone:
  resources:
    requests:
      nvidia.com/gpu: "1"
    limits:
      nvidia.com/gpu: "1"
helm install milvus milvus/milvus --set cluster.enabled=false --set etcd.replicaCount=1 --set minio.mode=standalone --set pulsar.enabled=false -f custom_value.yaml -n vectorstore
kubectl get pods -n vectorstore
NAME READY STATUS RESTARTS AGE
milvus-etcd-0 1/1 Running 0 5m34s
milvus-minio-76f9d647d5-44799 1/1 Running 0 5m34s
milvus-standalone-9ccf56df4-m4tpm   1/1     Running   3 (4m35s ago)   5m34s
You can configure various parameters such as prompts and the vector store using environment variables. Modify the environment variables in the env section of the query service in the values.yaml file of the respective example.
Each example utilizes a prompt.yaml file that defines prompts for different contexts. These prompts guide the RAG model in generating appropriate responses. You can tailor these prompts to fit your specific needs and achieve desired responses from the models.
The prompts are loaded as a Python dictionary within the application. To access this dictionary, you can use the get_prompts() function provided by the utils module. This function retrieves the complete dictionary of prompts.
Consider the following prompt.yaml file, which is under the files directory of each Helm chart:
chat_template: |
  You are a helpful, respectful and honest assistant.
  Always answer as helpfully as possible, while being safe.
  Please ensure that your responses are positive in nature.

rag_template: |
  You are a helpful AI assistant named Envie.
  You will reply to questions only based on the context that you are provided.
  If something is out of context, you will refrain from replying and politely decline to respond to the user.
You can access its chat_template using the following code in your chain server:
from RAG.src.chain_server.utils import get_prompts

# Load the full prompt dictionary defined in prompt.yaml
prompts = get_prompts()

# Read chat_template, falling back to an empty string if it is not defined
chat_template = prompts.get("chat_template", "")
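As a hypothetical follow-on, the retrieved template can then be combined with the user's question before calling the LLM; the exact prompt assembly in the chain server may differ:

```python
# Illustrative only: prepend the template to the user question.
question = "How many machines are older than 10 years?"
llm_prompt = f"{chat_template}\n\nUser: {question}\nAssistant:"
```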
For any other example, you can restart it by changing the example name in the following command:
helm upgrade <rag-example-name> <rag-example-helm-chart-path> -n <rag-example-namespace> --set imagePullSecret.password=$NGC_CLI_API_KEY
The vector store can be modified through environment variables. You can update:
APP_VECTORSTORE_NAME: The vector store name. Currently, milvus and pgvector are supported. Note: this only specifies the vector store name; the vector store container needs to be started separately.
APP_VECTORSTORE_URL: The URL of the host machine where the vector store is running.
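For example, a chain server could read these variables and select the matching client. This is a hedged sketch, not the actual implementation; the default URL and the client libraries named in the comments are assumptions.

```python
import os

# Values come from the env section of the query service in values.yaml.
vectorstore_name = os.environ.get("APP_VECTORSTORE_NAME", "milvus")
vectorstore_url = os.environ.get("APP_VECTORSTORE_URL", "http://localhost:19530")

if vectorstore_name == "milvus":
    # e.g. connect with pymilvus using vectorstore_url
    pass
elif vectorstore_name == "pgvector":
    # e.g. connect with psycopg and the pgvector extension using vectorstore_url
    pass
else:
    raise ValueError(f"Unsupported vector store: {vectorstore_name}")
```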
Learn more about how to use NVIDIA NIM microservices for RAG through our Deep Learning Institute. Access the course here.
The RAG applications are shared as reference architectures and are provided “as is”. Their security in production environments is the responsibility of the end users deploying them. When deploying in a production environment, have security experts review any potential risks and threats (including direct and indirect prompt injection); define the trust boundaries; secure the communication channels; integrate AuthN and AuthZ with appropriate access controls; keep the deployment, including the containers, up to date; and ensure the containers are secure and free of vulnerabilities.
By downloading or using NVIDIA NIM inference microservices included in the AI Chatbot workflows you agree to the terms of the NVIDIA Software License Agreement and Product-specific Terms for AI products.