LLM-as-a-Judge Benchmark Tool

Description: Evaluation benchmark using LLMs as judges
Publisher: NVIDIA
Latest Tag: 0.12.15
Modified: April 18, 2025
Compressed Size: 3.2 GB
Multinode Support: No
Multi-Arch Support: No
0.12.15 (Latest) Security Scan Results

Linux / amd64

LLM as a Judge Benchmark container

Overview

This container image runs LLM-as-a-judge evaluation benchmarks on behalf of the NeMo Evaluator microservice. During an evaluation, the container downloads a dataset, performs inference, and uploads the results and logs to the datastore.
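
The download, inference, and upload stages can be pictured with a minimal Python sketch. This is an illustration of the ordering only, not the container's actual code or the NeMo Evaluator API: `EvaluationRun`, `run_evaluation`, and the three stage callables are hypothetical names, and the stand-in stages below touch no real dataset, model, or datastore.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class EvaluationRun:
    """Collects what one evaluation run produces (hypothetical structure)."""
    results: List[Dict[str, str]] = field(default_factory=list)
    logs: List[str] = field(default_factory=list)


def run_evaluation(
    download_dataset: Callable[[], List[str]],
    infer: Callable[[str], str],
    upload: Callable[[EvaluationRun], None],
) -> EvaluationRun:
    """Run the three stages in order: download, inference, upload."""
    run = EvaluationRun()

    # Stage 1: fetch the prompts to evaluate.
    prompts = download_dataset()
    run.logs.append(f"downloaded {len(prompts)} prompts")

    # Stage 2: run inference on every prompt and keep the responses.
    for prompt in prompts:
        run.results.append({"prompt": prompt, "response": infer(prompt)})
    run.logs.append(f"completed inference on {len(run.results)} prompts")

    # Stage 3: hand results and logs to the upload step.
    upload(run)
    return run


# Stand-in stages (assumptions) so the sketch runs without any external service.
uploaded: List[EvaluationRun] = []
run = run_evaluation(
    download_dataset=lambda: ["What is 2 + 2?", "Name a prime number."],
    infer=lambda prompt: f"stub answer to: {prompt}",
    upload=uploaded.append,
)
```

Passing the stages in as callables keeps the sketch independent of any particular dataset source, model endpoint, or datastore client.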

To get started with NeMo Evaluator, refer to the Evaluation Tutorials.

Note: Use, distribution or deployment of this microservice in production requires an NVIDIA AI Enterprise License.

Governing Terms

The software and materials are governed by the NVIDIA Software License Agreement and the Product-Specific Terms for NVIDIA AI Products.