NVIDIA NeMo Evaluator-compatible container with AA-LCR support
NVIDIA NeMo Evaluator
NVIDIA NeMo Evaluator aims to advance and refine state-of-the-art methodologies for model evaluation and to deliver them as modular evaluation packages (evaluation containers and pip wheels) that teams can use as standardized building blocks.
AA-LCR
The AA-LCR evaluation package is part of NVIDIA NeMo Evaluator, which provides standardized evaluation methodologies for assessing large language models (LLMs).
Overview
Artificial Analysis - Long Context Reasoning (AA-LCR) is a benchmark that evaluates long-context performance by testing reasoning across multiple long documents (~100k tokens).
Quick Start Guide
NVIDIA NeMo Evaluator provides evaluation clients built specifically to evaluate model endpoints using our Standard API.
Launching an Evaluation
List the available evaluations:
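As a rough sketch of the listing step, the Python snippet below shells out to the evaluator command-line client and returns whatever it prints. It assumes the container exposes an entry point named `nemo-evaluator` with an `ls` subcommand; both names are assumptions here, and some releases may ship a differently named client, so check the README inside your container. The helper `list_evaluations` is hypothetical and exists only for illustration.

```python
import shutil
import subprocess

# Assumed entry point name; verify it inside your container.
CLI = "nemo-evaluator"


def list_evaluations(cli=CLI):
    """Return the client's listing output, or None if the client is not on PATH."""
    if shutil.which(cli) is None:
        return None
    # "ls" is the assumed subcommand for listing available evaluations.
    result = subprocess.run([cli, "ls"], capture_output=True, text=True, check=False)
    return result.stdout


listing = list_evaluations()
print(listing if listing is not None else f"{CLI} not found on PATH")
```

The task names reported by the listing are the values to pass when launching a run.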
Run the Evaluation of Your Choice
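Running an evaluation against a model endpoint could be scripted along the lines below. This is a minimal sketch that only assembles and prints the command; it does not execute it. It assumes a `nemo-evaluator run_eval` subcommand that accepts `--eval_type`, `--model_id`, `--model_url`, `--model_type`, `--api_key_name`, and `--output_dir`. The task name `aa_lcr`, the model ID, the endpoint URL, the environment variable name, and the output path are all hypothetical placeholders; use the task name reported by the listing step and the details of your own endpoint.

```python
import shlex


def build_run_eval_command(eval_type, model_id, model_url, output_dir,
                           model_type="chat", api_key_name=None,
                           cli="nemo-evaluator"):
    """Assemble a run_eval invocation as an argument list (nothing is executed)."""
    cmd = [
        cli, "run_eval",
        "--eval_type", eval_type,
        "--model_id", model_id,
        "--model_url", model_url,
        "--model_type", model_type,
        "--output_dir", output_dir,
    ]
    if api_key_name:
        # Name of the environment variable that holds the API key, not the key itself.
        cmd += ["--api_key_name", api_key_name]
    return cmd


command = build_run_eval_command(
    eval_type="aa_lcr",                                      # hypothetical task name
    model_id="my-org/my-model",                              # hypothetical model ID
    model_url="http://localhost:8000/v1/chat/completions",   # hypothetical endpoint
    output_dir="/workspace/results",                         # hypothetical output path
    api_key_name="MY_API_KEY",                               # hypothetical variable name
)
print(shlex.join(command))
```

Building the command as a list keeps quoting safe if you later pass it to `subprocess.run`, and printing it first makes it easy to compare against the flags documented in the container before launching a long run.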
3rd Party Source Code
Users can download the third-party source code from the URL provided in the container's README, located in workdir.