NGC | Catalog

AI Chatbot With Retrieval Augmented Generation



This sample application demonstrates a fully functional co-pilot system performing Retrieval Augmented Generation with:

  • NVIDIA TensorRT
  • NVIDIA Triton
  • Milvus
  • LlamaIndex + LangChain
  • Meta's Llama 2 models




November 21, 2023
Helm Charts


Augmenting an existing AI foundation model gives enterprises an advanced starting point and a low-cost way to generate accurate, clear responses for their specific use case. The Retrieval Augmented Generation (RAG)-based AI chatbot workflow accelerates building and deploying enterprise LLM solutions. It is currently in private early access for NVIDIA AI Enterprise customers.

This RAG-based reference chatbot workflow contains:

  • NVIDIA NeMo framework - part of NVIDIA AI Enterprise solution
  • NVIDIA TensorRT-LLM (TRT-LLM) for low-latency, high-throughput LLM inference
  • LangChain and LlamaIndex for combining language model components and easily constructing question-answering systems over a company’s database
  • Sample Jupyter Notebooks and chatbot web application/API calls so that you can test the chat system in an interactive manner

Key benefits include:

  • A NeMo-powered LLM generates responses based on real-time information from the company’s knowledge base.
  • Inference is accelerated with Triton Inference Server and TRT-LLM.
  • The entire workflow can be deployed on your preferred on-premises or cloud platform.
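
To make the retrieval step concrete, here is a minimal sketch of the general RAG pattern: find the knowledge-base passages most similar to a question, then place them in the prompt sent to the LLM. It is not the code shipped with this workflow. The bag-of-words `embed` function, the in-memory list standing in for a vector database such as Milvus, and the names `retrieve`, `build_prompt`, and `knowledge_base` are all assumptions made for illustration, and the sample passages are invented. The LLM call itself is left out.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Assemble an LLM prompt that grounds the answer in retrieved text."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Hypothetical knowledge base; a real deployment would index company documents.
knowledge_base = [
    "The support desk is open from 9am to 5pm on weekdays.",
    "Expense reports must be submitted within 30 days.",
    "The cafeteria serves lunch from noon until 2pm.",
]

prompt = build_prompt("When is the support desk open?", knowledge_base, k=1)
print(prompt)
```

In a deployment along the lines described here, the toy similarity search would be replaced by a learned embedding model and a vector store, and the assembled prompt would be sent to the LLM served through Triton Inference Server and TRT-LLM.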

Getting Started

To get started, review the documentation linked below to learn what is included in this RAG-based AI chatbot workflow and how to run it.


Additional Resources

Learn more about generative AI through our Deep Learning Institute. Access the courses here.

Contact NVIDIA to learn more about options for accessing the AI Chatbot with Retrieval Augmented Generation workflow, Triton Inference Server, and NeMo.


By accessing NeMo as part of the AI chatbot with RAG workflow, you accept the terms and conditions of this End User License Agreement.