
BART PyT checkpoint (Summarization, XSum)

Description: BART PyT checkpoint for summarization on the XSum dataset
Publisher: NVIDIA Deep Learning Examples
Latest Version: 20.11.0_amp
Modified: April 4, 2023
Size: 4.14 GB

Model Overview

BART is a denoising autoencoder for pretraining sequence-to-sequence models.

Model Architecture

BART uses a standard sequence-to-sequence Transformer architecture with GeLU activations. The base model has 6 layers in each of the encoder and decoder, while the large model has 12. Overall, the architecture contains roughly 10% more parameters than the equivalently sized BERT model.
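
For illustration, the shape described above can be instantiated with the Hugging Face transformers library. This is only a sketch of the BART-large configuration, not necessarily how the NVIDIA training scripts construct the model.

    # Sketch of the BART-large shape (12 encoder / 12 decoder layers, GeLU activations),
    # using Hugging Face transformers for illustration only.
    from transformers import BartConfig, BartForConditionalGeneration

    config = BartConfig(
        vocab_size=50265,            # byte-level BPE vocabulary shared with RoBERTa
        d_model=1024,                # hidden size of BART-large
        encoder_layers=12,           # "large" uses 12 encoder layers...
        decoder_layers=12,           # ...and 12 decoder layers (base uses 6 each)
        encoder_attention_heads=16,
        decoder_attention_heads=16,
        activation_function="gelu",  # GeLU activations, as noted above
    )
    model = BartForConditionalGeneration(config)
    print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M parameters")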

BART is trained by corrupting documents and then optimizing a reconstruction loss between the decoder's output and the original document. The pretraining task combines randomly shuffling the order of the original sentences with a novel text-infilling scheme, in which spans of text are replaced with a single mask token.
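
The toy sketch below illustrates these two noising operations (it is not the reference implementation): sentence permutation, and text infilling where each sampled span, whose length the paper draws from a Poisson distribution, collapses to a single mask token.

    # Toy illustration of BART's noising: sentence permutation + text infilling.
    # Not the reference implementation; span lengths follow the paper's Poisson(3).
    import random
    import numpy as np

    MASK = "<mask>"

    def permute_sentences(sentences):
        shuffled = list(sentences)
        random.shuffle(shuffled)                  # randomly reorder the original sentences
        return shuffled

    def text_infilling(tokens, mask_ratio=0.3, poisson_lambda=3.0):
        tokens = list(tokens)
        budget = int(len(tokens) * mask_ratio)    # how many tokens to corrupt in total
        masked = 0
        while masked < budget:
            span = min(int(np.random.poisson(poisson_lambda)), len(tokens))
            start = random.randrange(len(tokens) - span + 1)
            tokens[start:start + span] = [MASK]   # the whole span becomes ONE mask token
            masked += max(span, 1)                # 0-length spans simply insert a mask
        return tokens

    sentences = ["the cat sat on the mat .", "it was a warm day .", "birds sang outside ."]
    corrupted = " ".join(permute_sentences(sentences)).split()
    print(" ".join(text_infilling(corrupted)))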

Training

This model was trained using the scripts available on NGC and in the NVIDIA Deep Learning Examples GitHub repository.
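
As a usage sketch, a fine-tuned BART summarization checkpoint is typically restored into a PyTorch model and used for beam-search generation as below. The file name and state-dict layout shown are assumptions; consult the Deep Learning Examples scripts for the actual entry points and checkpoint format.

    # Illustrative only: restoring a fine-tuned BART summarization checkpoint and
    # generating a summary. File name and state-dict keys below are assumptions;
    # see the Deep Learning Examples scripts for the actual format.
    import torch
    from transformers import BartForConditionalGeneration, BartTokenizer

    tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
    model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

    state = torch.load("bart_xsum_checkpoint.pt", map_location="cpu")  # hypothetical path
    model.load_state_dict(state.get("model", state), strict=False)     # layout is an assumption
    model.eval()

    article = "Full text of a BBC article to summarize goes here."
    inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        ids = model.generate(**inputs, num_beams=6, max_length=60, early_stopping=True)
    print(tokenizer.decode(ids[0], skip_special_tokens=True))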

Dataset

The following datasets were used to train this model:

  • Extreme Summarization (XSum) - Dataset of 226,711 Wayback-archived BBC articles spanning almost a decade (2010 to 2017) and covering a wide variety of domains (e.g., News, Politics, Sports, Weather, Business, Technology, Science, Health, Family, Education, Entertainment and Arts); see the loading sketch below.

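For quick experimentation, XSum can be obtained through the Hugging Face datasets library as sketched below; the NVIDIA training scripts may download and preprocess the data differently.

    # One common way to load XSum for experimentation (illustrative only;
    # the NVIDIA scripts may fetch and preprocess the data differently).
    from datasets import load_dataset

    xsum = load_dataset("EdinburghNLP/xsum")   # splits: train / validation / test
    example = xsum["train"][0]
    print(example["document"][:200])           # full BBC article text
    print(example["summary"])                  # single-sentence reference summary
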
Performance

Performance numbers for this model are available on NGC.

References

  • BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (Lewis et al., 2019)
License

This model was trained using open-source software available in the Deep Learning Examples repository. For terms of use, please refer to the licenses of the scripts and the datasets from which the model was derived.