BART PyT checkpoint (Summarization, XSum)

NGC Catalog

CLASSIC

Welcome Guest

For downloads and more information, please view on a desktop device.

Description

BART PyT checkpoint for summarization on XSum dataset

Publisher

NVIDIA Deep Learning Examples

Latest Version

20.11.0_amp

Modified

April 4, 2023

Size

4.14 GB

Model Overview

BART is a denoising autoencoder for pretraining sequence-to-sequence models.

Model Architecture

BART uses a standard sequence-to-sequence Transformer architecture with GeLU activations. The base model consists of 6 layers in encoder and decoder, whereas large consists of 12. The architecture has roughly 10% more parameters than BERT.

BART is trained by corrupting documents and then optimizing the reconstruction loss. The pretraining task involves randomly shuffling the order of the original sentences and a novel in-filling scheme, where spans of text are replaced with a single mask token.

Training

This model was trained using script available on NGC and in GitHub repo.

Dataset

The following datasets were used to train this model:

Extreme Summarization - Dataset consisting of 226,711 Wayback archived BBC articles ranging over almost a decade (2010 to 2017) and covering a wide variety of domains (e.g., News, Politics, Sports, Weather, Business, Technology, Science, Health, Family, Education, Entertainment and Arts).

Performance

Performance numbers for this model are available in NGC.

References

License

This model was trained using open-source software available in Deep Learning Examples repository. For terms of use, please refer to the license of the script and the datasets the model was derived from.