
BART PyT checkpoint (Summarization, XSum)

BART PyT checkpoint for summarization on XSum dataset
NVIDIA Deep Learning Examples
Latest Version
April 4, 2023
4.14 GB

Model Overview

BART is a denoising autoencoder for pretraining sequence-to-sequence models.

Model Architecture

BART uses a standard sequence-to-sequence Transformer architecture with GeLU activations. The base model has six layers in both the encoder and the decoder, while the large model has 12 in each. Overall, the architecture has roughly 10% more parameters than an equivalently sized BERT model.
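One architectural detail worth noting is the GeLU activation, which BART uses in place of the ReLU of the original Transformer. The following is a standalone sketch of the exact GeLU formulation based on the Gaussian CDF, not code from this checkpoint:

```python
import math

def gelu(x: float) -> float:
    """Exact GeLU: x * Phi(x), where Phi is the standard normal CDF."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Unlike ReLU, GeLU is smooth around zero and lets small negative
# inputs pass through slightly attenuated rather than clipping them.
print(gelu(0.0))
print(round(gelu(1.0), 4))
```

In practice frameworks also offer a faster tanh-based approximation of this function; the exact form above is sufficient to convey the behavior.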

BART is pretrained by corrupting documents and then optimizing a reconstruction loss between the decoder output and the original document. The corruption scheme combines randomly shuffling the order of the original sentences with a novel text-infilling scheme, in which spans of text are replaced with a single mask token.
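The two corruption operations described above can be sketched in plain Python. The naive sentence splitting, the Poisson-distributed span lengths (the BART paper uses λ = 3), and the `<mask>` token string are illustrative assumptions here, not the actual training code:

```python
import math
import random

MASK = "<mask>"

def permute_sentences(text: str) -> str:
    """Randomly shuffle the order of the original sentences."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    random.shuffle(sentences)
    return ". ".join(sentences) + "."

def poisson(lam: float) -> int:
    """Sample a Poisson-distributed span length (Knuth's method)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

def text_infilling(tokens: list[str], mask_ratio: float = 0.3) -> list[str]:
    """Replace token spans with a single MASK until ~mask_ratio of tokens are covered.

    A sampled span of length 0 corresponds to inserting a mask token,
    as in the BART paper's in-filling scheme.
    """
    tokens = tokens[:]
    budget = int(len(tokens) * mask_ratio)
    masked = 0
    while masked < budget and tokens:
        span = min(poisson(3.0), budget - masked)
        start = random.randrange(len(tokens))
        span = min(span, len(tokens) - start)
        tokens[start:start + span] = [MASK]  # whole span -> one mask token
        masked += max(span, 1)  # count 0-length insertions so the loop terminates
    return tokens
```

Because each span is collapsed to a single mask token, the model must also predict how many tokens are missing, which is what distinguishes this scheme from BERT-style per-token masking.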


This model was trained using the scripts available on NGC and in the NVIDIA Deep Learning Examples GitHub repository.


The following datasets were used to train this model:

  • Extreme Summarization (XSum) - A dataset consisting of 226,711 Wayback-archived BBC articles spanning almost a decade (2010 to 2017) and covering a wide variety of domains (e.g., News, Politics, Sports, Weather, Business, Technology, Science, Health, Family, Education, Entertainment, and Arts). Each article is paired with a single-sentence summary.


Performance numbers for this model are available in NGC.



This model was trained using open-source software available in the NVIDIA Deep Learning Examples repository. For terms of use, please refer to the license of the scripts and of the datasets from which the model was derived.