

retro-8b-base-4k
InstructRetro is an autoregressive decoder-only language model (LM) with retrieval-augmented pretraining and instruction tuning
Latest Version
January 17, 2024
97.65 GB


Retro (Borgeaud et al., 2022) is an autoregressive decoder-only language model (LM) pretrained with retrieval augmentation. Retro scales practically to large-scale pretraining from scratch by retrieving from trillions of tokens. Compared with storing factual knowledge implicitly in the network's parameters, pretraining with retrieval provides a more efficient storage mechanism for that knowledge, which substantially reduces the number of model parameters while achieving lower perplexity than a standard GPT. Retro also makes it possible to update the knowledge stored in LMs (Wang et al., 2023a) by updating the retrieval database, without retraining the LM.

InstructRetro (Wang et al., 2023b) further scales Retro up to 48B parameters, making it the largest LLM pretrained with retrieval (as of December 2023). The resulting foundation model, Retro 48B, substantially outperforms its GPT counterpart in terms of perplexity. After instruction tuning on Retro, InstructRetro shows significant improvement over the instruction-tuned GPT on downstream tasks in the zero-shot setting. Specifically, InstructRetro's average improvement over its GPT counterpart is 7% across 8 short-form QA tasks and 10% across 4 challenging long-form QA tasks. We also find that one can ablate the encoder from the InstructRetro architecture and directly use the InstructRetro decoder backbone as a GPT, while achieving comparable results.

Model Overview


The use of this model is governed by the NVIDIA AI Foundation Models Community License Agreement.

Supported Hardware

  • H100
  • A100 80GB, A100 40GB

Model Version(s)

retro-8b-base-4k: Pretrained Retro 8B LM without instruction tuning.

Using base models without instruction tuning for downstream task evaluation is not recommended.


Megatron-LM Framework


We recommend using a Docker environment to run the code.

Docker image

We provide a docker build file in Dockerfile for the reproduction. The docker image is based on

Install dependencies

Clone the Megatron repo:

git clone --branch InstructRetro

If docker is not available, we recommend starting from a clean conda environment with the following runtime dependencies:

  • Python 3.10
  • NVIDIA CUDA® 12.2.1
  • NVIDIA cuDNN 8.9.5
  • NVIDIA NCCL 2.18.5
  • PyTorch 2.1.0a0+32f93b1

Then install Retro-specific dependencies, including:

pip install -U faiss-gpu
pip install -U transformers
pip install -U sentencepiece
pip install -U h5py
pip install -U nltk
pip install -U einops

Evaluation Command

Download our model checkpoint and tokenizer.

Specify the blank args in the tools/retro/text_generation/ script, including model path, Retro workdir, and model related params.

Parameter   Value   Explanation
mod_par     4       Tensor parallelism
layers      32      Number of layers in the model
hid_dim     4096    Hidden dimension size
heads       32      Number of attention heads
pip_par     1       Pipeline parallelism
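
As a quick sanity check on these values, the sketch below derives the per-head dimension and the number of GPUs implied by the parallelism settings. It is illustrative only: `check_config` is a hypothetical helper, and the divisibility checks reflect common constraints of tensor and pipeline parallelism rather than anything stated on this page.

```python
# Values copied from the parameter table above.
config = {"mod_par": 4, "layers": 32, "hid_dim": 4096, "heads": 32, "pip_par": 1}


def check_config(cfg):
    """Derive per-head size and GPU count; raise if the settings are inconsistent."""
    if cfg["hid_dim"] % cfg["heads"] != 0:
        raise ValueError("hid_dim must be divisible by heads")
    if cfg["heads"] % cfg["mod_par"] != 0:
        raise ValueError("heads must be divisible by the tensor-parallel size")
    if cfg["layers"] % cfg["pip_par"] != 0:
        raise ValueError("layers must be divisible by the pipeline-parallel size")
    return {
        "head_dim": cfg["hid_dim"] // cfg["heads"],            # 4096 / 32
        "heads_per_tp_rank": cfg["heads"] // cfg["mod_par"],   # 32 / 4
        "layers_per_pp_stage": cfg["layers"] // cfg["pip_par"],
        "min_gpus": cfg["mod_par"] * cfg["pip_par"],           # one model replica
    }


print(check_config(config))
# {'head_dim': 128, 'heads_per_tp_rank': 8, 'layers_per_pp_stage': 32, 'min_gpus': 4}
```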

The following example command runs Retro generation with the InstructRetro checkpoints on the Natural Questions (NQ) task. The command is for the 8b InstructRetro. Specify the directory for the NQ dataset and update the command accordingly for other checkpoints.

bash tools/retro/text_generation/ nq 8b greedy test  0 20000 1000 5 pp1 <path/to/checkpoint> 2

The generated responses are saved in the corresponding checkpoint directory. For example, for the 8b InstructRetro, they are saved to <path/to/retro>/retro-generate-nq_5_2_8b_test_greedy_0_20000_1000.txt.
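
To locate the output file for a different run, one can rebuild the file name from the command-line arguments. The sketch below infers the ordering purely from the single example above (command arguments versus resulting file name); `output_filename` is a hypothetical helper, and the actual script may construct the name differently for other settings.

```python
def output_filename(args):
    """Rebuild the output file name from the positional arguments of the example command.

    The index order is an assumption inferred from the one example on this page.
    """
    order = [0, 7, 10, 1, 3, 2, 4, 5, 6]
    return "retro-generate-" + "_".join(args[i] for i in order) + ".txt"


# Positional arguments exactly as they appear in the example command.
example_args = "nq 8b greedy test 0 20000 1000 5 pp1 <path/to/checkpoint> 2".split()
print(output_filename(example_args))
# retro-generate-nq_5_2_8b_test_greedy_0_20000_1000.txt
```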

To evaluate the F1 / Exact Match (EM) scores of the generated responses, we provide an example script to run the evaluation on the NQ dataset. Please specify the directory for the NQ dataset and update the command accordingly for other checkpoints and downstream tasks.

python3 tools/retro/text_generation/
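
For readers unfamiliar with the two metrics, the sketch below shows one common way to compute token-level F1 and Exact Match for short-form QA, using SQuAD-style answer normalization. It is not the repo's evaluation script and may differ from it in details; all function names here are illustrative.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction, reference):
    """Harmonic mean of token-level precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_references(metric, prediction, references):
    """Score a prediction against several acceptable answers and keep the best."""
    return max(metric(prediction, ref) for ref in references)
```

For example, `exact_match("The Eiffel Tower.", "eiffel tower")` returns `1.0`, and `f1_score("Paris France", "Paris")` returns about `0.667` (precision 0.5, recall 1.0).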


See more details from our papers:

Shall we Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study.

Boxin Wang, Wei Ping, Peng Xu, Lawrence McAfee, Zihan Liu, Mohammad Shoeybi, Yi Dong, Oleksii Kuchaiev, Bo Li, Chaowei Xiao, Anima Anandkumar, Bryan Catanzaro. (EMNLP 2023)

InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining.

Boxin Wang, Wei Ping, Lawrence McAfee, Peng Xu, Bo Li, Mohammad Shoeybi, Bryan Catanzaro.

Please cite the papers as follows if you use the data or code from this repo:

    title   = {Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study},
    author  = {Boxin Wang and Wei Ping and Peng Xu and Lawrence McAfee and Zihan Liu and Mohammad Shoeybi and Yi Dong and Oleksii Kuchaiev and Bo Li and Chaowei Xiao and Anima Anandkumar and Bryan Catanzaro},
    journal = {The 2023 Conference on Empirical Methods in Natural Language Processing},
    year    = {2023}

    title   = {InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining},
    author  = {Boxin Wang and Wei Ping and Lawrence McAfee and Peng Xu and Bo Li and Mohammad Shoeybi and Bryan Catanzaro},
    year    = {2023},
    journal = {arXiv preprint arXiv: 2310.07713}