NVIDIA Tau2-Bench

NVIDIA NeMo Evaluator-compatible container with Tau2-Bench support

NVIDIA NeMo Evaluator

The goal of NVIDIA NeMo Evaluator is to advance and refine state-of-the-art methodologies for model evaluation, and deliver them as modular evaluation packages (evaluation containers and pip wheels) that teams can use as standardized building blocks.

Overview

Tau2-Bench implements a simulation framework for evaluating customer service agents across various domains.

Key Features:

  • Comprehensive evaluation across multiple task types
  • Standardized benchmarking methodology
  • Support for diverse model architectures

Quick Start Guide

List the available evaluations:

$ nemo-evaluator ls
tau2_bench:
  * tau2_bench_airline
  * tau2_bench_retail
  * tau2_bench_telecom

NVIDIA NeMo Evaluator provides evaluation clients built specifically to evaluate model endpoints through our Standard API.

Launching an Evaluation

Run the Evaluation of Your Choice

export API_KEY=your_nvidia_api_key_here

nemo-evaluator run_eval \
    --eval_type tau2_bench_telecom \
    --model_id openai/gpt-oss-120b \
    --model_type chat \
    --model_url https://integrate.api.nvidia.com/v1/chat/completions \
    --api_key_name API_KEY \
    --output_dir './results/tau2bench'
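
To cover all three tasks shown by `nemo-evaluator ls`, the command above can be wrapped in a small loop. The following is a minimal sketch, not an official workflow: `DRY_RUN`, `RESULTS_ROOT`, and the two function names are hypothetical helpers introduced here, and giving each task its own output subdirectory is an assumption rather than a documented requirement. The `nemo-evaluator` flags are copied unchanged from the example above.

```shell
#!/bin/sh
# Sketch: run each Tau2-Bench task in turn with the flags from the example above.
# DRY_RUN and RESULTS_ROOT are hypothetical helper variables, not nemo-evaluator options.
DRY_RUN="${DRY_RUN:-1}"                              # 1 = print commands only (default)
RESULTS_ROOT="${RESULTS_ROOT:-./results/tau2bench}"  # assumed per-task output layout

run_tau2_task() {
    task="$1"
    # Build the argument list; only the task name and output directory vary.
    set -- nemo-evaluator run_eval \
        --eval_type "$task" \
        --model_id openai/gpt-oss-120b \
        --model_type chat \
        --model_url https://integrate.api.nvidia.com/v1/chat/completions \
        --api_key_name API_KEY \
        --output_dir "$RESULTS_ROOT/$task"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"                                    # show what would be executed
    else
        : "${API_KEY:?export API_KEY before running}" # abort if the key is missing
        "$@"
    fi
}

run_all_tau2_tasks() {
    # Task names as listed by `nemo-evaluator ls`.
    for t in tau2_bench_airline tau2_bench_retail tau2_bench_telecom; do
        run_tau2_task "$t" || return 1               # stop at the first failing task
    done
}

run_all_tau2_tasks
```

By default the script only prints the three commands so they can be reviewed first. Setting `DRY_RUN=0` (with `API_KEY` exported as shown earlier) executes them one after another.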

3rd Party Source Code

Users can download the third-party source code from the URL provided in the container's README, located in workdir.

Publisher: NVIDIA
Latest Tag: 26.03
Updated: March 11, 2026 UTC
Compressed Size: 605.3 MB
Multinode Support: No
Multi-Arch Support: Yes
