
Build an AI Chatbot with RAG

Description
Use a reference application to build a fully functional retrieval-augmented generation (RAG)-based AI chatbot with NVIDIA NIM™ microservices.
Curator
NVIDIA
Modified
March 14, 2025
Collection contents: Containers, Helm Charts, Models, Resources

Overview

This reference application demonstrates how to use NVIDIA NIM inference microservices to develop generative AI-powered chatbots using retrieval-augmented generation (RAG).

This application showcases how to build canonical RAG-based chatbots using LangChain and LlamaIndex. It also includes advanced applications: Q&A chatbots over multimodal and structured datasets, as well as agentic RAG-based chatbots.

Leverage NVIDIA NIM microservices, including the Llama-3-8b-instruct LLM and the NVIDIA NeMo™ Retriever embedding and reranking models, in end-to-end sample RAG chain servers built with LangChain, LlamaIndex, PandasAI, and a GPU-accelerated Milvus vector database. Deploy with either Helm charts or Docker Compose, optimized for LLM inference performance and scaling.
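The chain servers all implement the same underlying RAG pattern: embed documents into vectors, retrieve the nearest ones for a query, and augment the LLM prompt with them. Below is a minimal, self-contained sketch of that pattern; the hashed-bigram "embedding" and in-memory store are toy stand-ins for the NeMo Retriever embedding NIM and the Milvus vector database that the real workflow uses.

```python
# Toy sketch of the RAG pattern this workflow implements at scale.
# embed() and VectorStore are illustrative stand-ins, NOT the NIM or
# Milvus APIs: a real deployment calls the embedding NIM over HTTP and
# stores vectors in Milvus.
import hashlib
import math

DIM = 64  # toy embedding dimensionality


def embed(text: str) -> list[float]:
    """Hash character bigrams into a fixed-size, L2-normalized vector."""
    vec = [0.0] * DIM
    for i in range(len(text) - 1):
        h = int(hashlib.md5(text[i:i + 2].lower().encode()).hexdigest(), 16)
        vec[h % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class VectorStore:
    """In-memory stand-in for Milvus: holds (vector, text) pairs."""

    def __init__(self) -> None:
        self.items: list[tuple[list[float], str]] = []

    def add(self, doc: str) -> None:
        self.items.append((embed(doc), doc))

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[0]),
                        reverse=True)
        return [doc for _, doc in ranked[:k]]


def build_prompt(query: str, store: VectorStore) -> str:
    """Augment the user question with retrieved context for the LLM."""
    context = "\n".join(store.retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

In the actual workflow the embedding call goes to the NeMo Retriever NIM, the store is Milvus, and the augmented prompt is sent to the Llama-3-8b-instruct NIM, with LangChain or LlamaIndex handling the orchestration.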

NVIDIA AI Chatbot with RAG architecture diagram

Key benefits of the workflow include:

  • Contextually accurate responses to naturally formed questions, grounded in different types of knowledge bases.
  • Simplified orchestration and scaling of RAG pods on Kubernetes in production.
  • Deployment on your preferred on-premises or cloud platform.

Getting Started

To get started with Helm charts, select any of the following applications and follow the instructions in its overview section.

  • RAG Application: LangChain Text QA Chatbot
  • RAG Application: LlamaIndex Text QA Chatbot
  • RAG Application: Multiturn Chatbot
  • RAG Application: Multimodal Chatbot
  • RAG Application: Structured Data Chatbot
  • RAG Application: Query Decomposition Agent
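The Helm flow for any of these applications follows the usual NGC pattern, sketched below. The placeholders in angle brackets (org, chart name, release name) are illustrative, not actual NGC paths; each application's overview section has the exact commands and values.

```shell
# Hedged sketch of the generic NGC Helm deployment flow.
# Everything in <angle brackets> is a placeholder -- take the real
# values from the chosen application's overview section on NGC.
export NGC_API_KEY=<your-ngc-api-key>

# Fetch the chart from the NGC Helm registry, authenticating with the
# '$oauthtoken' username convention NGC uses for API-key access.
helm fetch https://helm.ngc.nvidia.com/<org>/charts/<rag-app-chart>.tgz \
  --username='$oauthtoken' --password="$NGC_API_KEY"

# Install into a dedicated namespace.
helm install <release-name> <rag-app-chart>.tgz \
  --namespace rag --create-namespace

# Confirm the RAG pods come up.
kubectl get pods --namespace rag
```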

To get started with the Docker workflow, follow the instructions in the following NGC resource:

  • AI Chatbots with RAG - Docker workflow
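At a high level, the Docker workflow clones the examples repository, authenticates against the NGC container registry, and brings the services up with Docker Compose. The sketch below uses a placeholder for the compose file path; the NGC resource above documents the exact file and environment variables.

```shell
# Hedged sketch of the Docker Compose flow; <path-to-compose-file> is
# a placeholder -- the NGC resource documents the exact steps.
git clone https://github.com/NVIDIA/GenerativeAIExamples.git
cd GenerativeAIExamples

# Log in to the NGC container registry (nvcr.io) with an NGC API key.
export NGC_API_KEY=<your-ngc-api-key>
docker login nvcr.io --username '$oauthtoken' --password "$NGC_API_KEY"

# Start the chain server, vector database, and sample UI in the background.
docker compose -f <path-to-compose-file>.yaml up -d

# Verify that all services are running.
docker compose -f <path-to-compose-file>.yaml ps
```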

Documentation

Documentation and source code for each of the above reference architectures can be found in the NVIDIA GenerativeAIExamples GitHub repo.

  • LangChain Text QA Chatbot
  • LlamaIndex Text QA Chatbot
  • Multiturn Chatbot
  • Multimodal Chatbot
  • Structured Data Chatbot
  • Query Decomposition Agent

Additional Resources

Learn more about how to use NVIDIA NIM microservices for RAG through the NVIDIA Deep Learning Institute course.

Security Considerations

The RAG applications are shared as reference architectures and are provided "as is." Their security in production environments is the responsibility of the end users deploying them. When deploying in a production environment, have security experts review potential risks and threats (including direct and indirect prompt injection); define trust boundaries; secure the communication channels; integrate authentication and authorization with appropriate access controls; keep the deployment, including the containers, up to date; and ensure the containers are secure and free of known vulnerabilities.

License

By downloading or using the NVIDIA NIM inference microservices included in the workflow, you agree to the terms of the NVIDIA Software License Agreement and the Product-Specific Terms for AI Products.