
Build an AI Chatbot with RAG

Description
Use a reference application to build a fully functional retrieval-augmented generation (RAG)-based AI chatbot with NVIDIA NIM™ microservices.
Curator
NVIDIA
Modified
March 14, 2025
Collection contents: Containers, Helm Charts, Models, Resources

Overview

This reference application demonstrates how to use NVIDIA NIM inference microservices to develop generative AI-powered chatbots using retrieval-augmented generation (RAG).

This application showcases how to build canonical RAG-based chatbots using LangChain and LlamaIndex. It also includes advanced applications: Q&A chatbots over multimodal and structured datasets, as well as agentic RAG-based chatbots.

Leverage NVIDIA NIM microservices, including the Llama-3-8b-instruct LLM and the NVIDIA NeMo™ Retriever embedding and reranking models, in end-to-end sample RAG chain servers built with LangChain, LlamaIndex, PandasAI, and a GPU-accelerated Milvus vector database. Deploy with either Helm charts or Docker Compose, optimized for LLM inference performance and scaling.
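The chain servers all implement the same underlying RAG pattern: embed documents into vectors, retrieve the nearest ones for a query, and augment the LLM prompt with them. Below is a minimal, self-contained sketch of that pattern; the hashed-bigram "embedding" and in-memory store are toy stand-ins for the NeMo Retriever embedding NIM and the Milvus vector database that the real workflow uses.

```python
# Toy sketch of the RAG pattern this workflow implements at scale.
# embed() and VectorStore are illustrative stand-ins, NOT the NIM or
# Milvus APIs: a real deployment calls the embedding NIM over HTTP and
# stores vectors in Milvus.
import hashlib
import math

DIM = 64  # toy embedding dimensionality


def embed(text: str) -> list[float]:
    """Hash character bigrams into a fixed-size, L2-normalized vector."""
    vec = [0.0] * DIM
    for i in range(len(text) - 1):
        h = int(hashlib.md5(text[i:i + 2].lower().encode()).hexdigest(), 16)
        vec[h % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class VectorStore:
    """In-memory stand-in for Milvus: holds (vector, text) pairs."""

    def __init__(self) -> None:
        self.items: list[tuple[list[float], str]] = []

    def add(self, doc: str) -> None:
        self.items.append((embed(doc), doc))

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[0]),
                        reverse=True)
        return [doc for _, doc in ranked[:k]]


def build_prompt(query: str, store: VectorStore) -> str:
    """Augment the user question with retrieved context for the LLM."""
    context = "\n".join(store.retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

In the actual workflow the embedding call goes to the NeMo Retriever NIM, the store is Milvus, and the augmented prompt is sent to the Llama-3-8b-instruct NIM, with LangChain or LlamaIndex handling the orchestration.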

NVIDIA AI Chatbot with RAG architecture diagram

Key benefits of the workflow include:

  • Contextually accurate responses to naturally formed questions, grounded in different types of knowledge bases.
  • Simplified orchestration and scaling of RAG pods on Kubernetes in production.
  • Deployment on your preferred on-premises or cloud platform.

Getting Started

To get started with Helm charts, select any of the following applications and follow the instructions in its overview section.

  • RAG Application: LangChain Text QA Chatbot
  • RAG Application: LlamaIndex Text QA Chatbot
  • RAG Application: Multiturn Chatbot
  • RAG Application: Multimodal Chatbot
  • RAG Application: Structured Data Chatbot
  • RAG Application: Query Decomposition Agent
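The Helm flow for any of these applications follows the usual NGC pattern, sketched below. The placeholders in angle brackets (org, chart name, release name) are illustrative, not actual NGC paths; each application's overview section has the exact commands and values.

```shell
# Hedged sketch of the generic NGC Helm deployment flow.
# Everything in <angle brackets> is a placeholder -- take the real
# values from the chosen application's overview section on NGC.
export NGC_API_KEY=<your-ngc-api-key>

# Fetch the chart from the NGC Helm registry, authenticating with the
# '$oauthtoken' username convention NGC uses for API-key access.
helm fetch https://helm.ngc.nvidia.com/<org>/charts/<rag-app-chart>.tgz \
  --username='$oauthtoken' --password="$NGC_API_KEY"

# Install into a dedicated namespace.
helm install <release-name> <rag-app-chart>.tgz \
  --namespace rag --create-namespace

# Confirm the RAG pods come up.
kubectl get pods --namespace rag
```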

To get started with the Docker workflow, follow the instructions in the following NGC resource:

  • AI Chatbots with RAG - Docker workflow
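At a high level, the Docker workflow clones the examples repository, authenticates against the NGC container registry, and brings the services up with Docker Compose. The sketch below uses a placeholder for the compose file path; the NGC resource above documents the exact file and environment variables.

```shell
# Hedged sketch of the Docker Compose flow; <path-to-compose-file> is
# a placeholder -- the NGC resource documents the exact steps.
git clone https://github.com/NVIDIA/GenerativeAIExamples.git
cd GenerativeAIExamples

# Log in to the NGC container registry (nvcr.io) with an NGC API key.
export NGC_API_KEY=<your-ngc-api-key>
docker login nvcr.io --username '$oauthtoken' --password "$NGC_API_KEY"

# Start the chain server, vector database, and sample UI in the background.
docker compose -f <path-to-compose-file>.yaml up -d

# Verify that all services are running.
docker compose -f <path-to-compose-file>.yaml ps
```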

Documentation

Documentation and source code for each of the above reference architectures can be found in the NVIDIA GenerativeAIExamples GitHub repo.

  • LangChain Text QA Chatbot
  • LlamaIndex Text QA Chatbot
  • Multiturn Chatbot
  • Multimodal Chatbot
  • Structured Data Chatbot
  • Query Decomposition Agent

Additional Resources

Learn more about how to use NVIDIA NIM microservices for RAG through the NVIDIA Deep Learning Institute course.

Security Considerations

The RAG applications are shared as reference architectures and are provided "as is." Their security in production environments is the responsibility of the end users deploying them. When deploying in a production environment, have security experts review potential risks and threats (including direct and indirect prompt injection); define trust boundaries; secure the communication channels; integrate authentication and authorization with appropriate access controls; keep the deployment, including the containers, up to date; and ensure the containers are secure and free of known vulnerabilities.

License

By downloading or using the NVIDIA NIM inference microservices included in the workflow, you agree to the terms of the NVIDIA Software License Agreement and the Product-Specific Terms for AI Products.