The NVIDIA K8s Developer LLM Operator is an open-source, easy-to-deploy Kubernetes Operator for self-hosting generative AI workflows.
In the initial release, the operator deploys a reference Retrieval-Augmented Generation (RAG) workflow: a chatbot that answers questions about public press releases and tech blogs. It handles document ingestion and serves a Q&A interface using open-source models deployed on any cloud or in a customer datacenter. The workflow leverages GPU-accelerated Milvus for efficient vector storage and retrieval, together with TensorRT-LLM (TRT-LLM) and a custom LangChain LLM wrapper, to achieve fast inference.
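As an illustration of the custom LLM-wrapper pattern mentioned above, the sketch below shows a minimal client that sends a prompt to an inference endpoint and returns the generated text. The endpoint URL, JSON schema, and class name are illustrative assumptions for this sketch, not the operator's actual API; the transport is injectable so the pattern can be exercised without a live TRT-LLM server.

```python
# Hedged sketch of the LLM-wrapper pattern: post a prompt to an inference
# endpoint and return the generated text. The URL and JSON fields below are
# assumptions for illustration, not the operator's real interface.
import json
from urllib import request


class TrtLlmClient:
    """Minimal wrapper: send a prompt, return the model's completion."""

    def __init__(self, endpoint: str, transport=request.urlopen):
        self.endpoint = endpoint
        # Injectable transport makes the wrapper testable offline.
        self.transport = transport

    def generate(self, prompt: str) -> str:
        req = request.Request(
            self.endpoint,
            data=json.dumps({"prompt": prompt}).encode(),
            headers={"Content-Type": "application/json"},
        )
        with self.transport(req) as resp:
            return json.loads(resp.read())["text"]
```

In a RAG chain, a wrapper like this would sit behind the LangChain LLM interface, receiving prompts assembled from the retrieved Milvus context.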
For information on support and getting started, visit the official documentation on GitHub.
We're posting these examples on GitHub to better support the community, facilitate feedback, and collect and implement contributions using GitHub Issues and pull requests. We welcome all contributions!
In each of the READMEs within the GitHub repository, we indicate the level of support provided.
By pulling and using the container, you accept the terms and conditions of the NVIDIA AI Product License.