NVIDIA NeMo Evaluator

The goal of NVIDIA NeMo Evaluator is to advance and refine state-of-the-art methodologies for model evaluation, and deliver them as modular evaluation packages (evaluation containers and pip wheels) that teams can use as standardized building blocks.

AA-LCR

AA-LCR is part of NVIDIA NeMo Evaluator, providing standardized evaluation methodologies for assessing LLMs.

Overview

Artificial Analysis - Long Context Reasoning (AA-LCR) is a benchmark that evaluates long-context performance by testing a model's ability to reason across multiple long documents (~100k tokens).

Quick Start Guide

List the available evaluations:

$ nemo-evaluator ls
AA-LCR:
  * aa_lcr
  * 1

NVIDIA NeMo Evaluator provides evaluation clients built specifically to evaluate model endpoints using our Standard API.

Launching an Evaluation

Run the Evaluation of Your Choice

export API_KEY=your_nvidia_api_key_here

nemo-evaluator run_eval \
    --eval_type aa_lcr \
    --model_id meta/llama-3.1-70b-instruct \
    --model_type chat \
    --model_url https://integrate.api.nvidia.com/v1/chat/completions \
    --api_key_name API_KEY \
    --output_dir './results/aa_lcr'
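
Before launching a long-context run, it can help to confirm that the key and endpoint are usable. The following is a minimal pre-flight sketch and is not part of NeMo Evaluator: `require_env` and `smoke_test` are hypothetical helper names, and the request assumes the endpoint accepts an OpenAI-style chat completions payload with Bearer authentication read from `API_KEY`.

```shell
# Pre-flight helpers (hypothetical; not provided by nemo-evaluator).

# require_env NAME: succeed only if the environment variable NAME is non-empty.
require_env() {
    _re_name="$1"
    # Indirect variable lookup that works in POSIX sh.
    eval "_re_value=\${$_re_name:-}"
    if [ -z "$_re_value" ]; then
        echo "error: $_re_name is not set" >&2
        return 1
    fi
    echo "ok: $_re_name is set"
}

# smoke_test URL MODEL: send one tiny chat request and print the HTTP status code.
# Assumption: an OpenAI-style chat completions endpoint with Bearer auth.
smoke_test() {
    curl -sS -o /dev/null -w '%{http_code}\n' --max-time 60 \
        -H "Authorization: Bearer $API_KEY" \
        -H "Content-Type: application/json" \
        -d "{\"model\": \"$2\", \"messages\": [{\"role\": \"user\", \"content\": \"ping\"}], \"max_tokens\": 8}" \
        "$1"
}
```

For example, `require_env API_KEY && smoke_test https://integrate.api.nvidia.com/v1/chat/completions meta/llama-3.1-70b-instruct` prints the HTTP status code returned by the endpoint. Anything other than a 2xx status suggests that the key, URL, or model ID needs attention before the evaluation is started.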

3rd Party Source Code

Users can download the third-party source code from the URL provided in the container's README, located in workdir.

Publisher

NVIDIA

Latest Tag: latest

Updated: March 15, 2026 UTC

Compressed Size: 476.22 MB

Multinode Support: No

Multi-Arch Support: Yes

System

signed images

Labels

AI, Language Modeling, ML, Natural Language Processing, NeMo, NSPECT-JL1B-TVGU