BART PyT checkpoint (Summarization, CNN-DM)

Description: BART PyT checkpoint for summarization on the CNN-DM dataset
Publisher: NVIDIA Deep Learning Examples
Latest Version: 20.11.1_amp
Modified: April 4, 2023
Size: 4.14 GB

Model Overview

BART is a denoising autoencoder for pretraining sequence-to-sequence models. This checkpoint contains BART weights fine-tuned for abstractive summarization on the CNN-DailyMail (CNN-DM) dataset.
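
Below is a minimal inference sketch, assuming a Hugging Face-compatible BART summarization checkpoint. The public facebook/bart-large-cnn weights are used here only as a stand-in for the checkpoint distributed on NGC, and the generation parameters are illustrative.

```python
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

# Stand-in for the NGC checkpoint: a publicly available BART model
# fine-tuned for summarization on CNN-DM.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn").eval()

article = "..."  # a CNN/DailyMail-style news article (placeholder)
inputs = tokenizer(article, max_length=1024, truncation=True, return_tensors="pt")

with torch.no_grad():
    summary_ids = model.generate(
        inputs["input_ids"],
        num_beams=4,        # beam search, as is typical for CNN-DM summarization
        max_length=142,
        min_length=56,
        length_penalty=2.0,
        early_stopping=True,
    )

print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```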

Model Architecture

BART uses a standard sequence-to-sequence Transformer architecture with GeLU activations. The base model has 6 layers in both the encoder and the decoder, while the large model has 12. Overall, the architecture has roughly 10% more parameters than a comparably sized BERT.
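
For concreteness, the two configurations can be sketched with the Hugging Face transformers BartConfig (an assumption about tooling, not the configuration used in the NVIDIA repository); only the layer counts mentioned above plus the standard published BART dimensions are set.

```python
from transformers import BartConfig

# BART-base: 6 encoder and 6 decoder layers.
bart_base = BartConfig(
    encoder_layers=6, decoder_layers=6,
    d_model=768, encoder_ffn_dim=3072, decoder_ffn_dim=3072,
    encoder_attention_heads=12, decoder_attention_heads=12,
    activation_function="gelu",
)

# BART-large: 12 encoder and 12 decoder layers (the transformers defaults
# already correspond to this variant).
bart_large = BartConfig(
    encoder_layers=12, decoder_layers=12,
    d_model=1024, encoder_ffn_dim=4096, decoder_ffn_dim=4096,
    encoder_attention_heads=16, decoder_attention_heads=16,
    activation_function="gelu",
)
```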

BART is pretrained by corrupting documents and optimizing a reconstruction loss between the decoder output and the original text. The pretraining corruption combines randomly shuffling the order of the original sentences with a novel text-infilling scheme in which spans of text are replaced with a single mask token.
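
The corruption can be sketched as follows. This is an illustrative re-implementation of the two noising operations described above (sentence permutation and text infilling with Poisson-distributed span lengths), not the code used to pretrain this checkpoint.

```python
import random
import numpy as np

def corrupt(sentences, mask_token="<mask>", mask_ratio=0.3, poisson_lambda=3.0):
    """Sentence permutation + text infilling, roughly as described in the BART paper."""
    # Sentence permutation: shuffle the order of the original sentences.
    shuffled = sentences[:]
    random.shuffle(shuffled)
    tokens = " ".join(shuffled).split()

    # Text infilling: sample span lengths from Poisson(lambda) and replace each
    # span with a single mask token until roughly mask_ratio of tokens are masked.
    num_to_mask = int(len(tokens) * mask_ratio)
    masked = 0
    while masked < num_to_mask and len(tokens) > 1:
        span = int(np.random.poisson(poisson_lambda))
        start = random.randrange(len(tokens))
        end = min(start + span, len(tokens))
        tokens[start:end] = [mask_token]  # a 0-length span still inserts one mask token
        masked += max(end - start, 1)
    return " ".join(tokens)
```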

Training

This model was trained using the training scripts available on NGC and in the NVIDIA Deep Learning Examples GitHub repository.

Dataset

The following datasets were used to train this model:

  • CNN/DailyMail - a dataset of news articles (39 sentences on average) paired with multi-sentence summaries.
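
As a quick way to inspect the data (the NGC/GitHub training scripts ship their own download and preprocessing steps), the dataset can be loaded with the Hugging Face datasets library:

```python
from datasets import load_dataset

# CNN/DailyMail, version 3.0.0: "article" holds the news article,
# "highlights" holds the multi-sentence reference summary.
cnn_dm = load_dataset("cnn_dailymail", "3.0.0", split="train")

example = cnn_dm[0]
print(example["article"][:300])
print(example["highlights"])
```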

Performance

Performance numbers for this model are available on NGC.
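
Summarization quality on CNN-DM is typically reported as ROUGE. The snippet below is a minimal scoring sketch using the rouge_score package, not the evaluation code used to produce the numbers reported on NGC.

```python
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeLsum"], use_stemmer=True)

# Compare a generated summary against the reference "highlights".
scores = scorer.score(
    target="the reference summary from the dataset",
    prediction="the summary generated by the model",
)
print({name: round(s.fmeasure, 4) for name, s in scores.items()})
```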

References

  • Lewis et al., "BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension" (arXiv:1910.13461)

License

This model was trained using open-source software available in the Deep Learning Examples repository. For terms of use, please refer to the licenses of the training scripts and of the datasets from which the model was derived.