
Megatron GPT2 345M


Description: 345M parameter GPT generative Megatron model

Publisher: NVIDIA NeMo

Latest Version: 1

Modified: April 4, 2023

Size: 1.32 GB

Megatron-LM GPT2 345M

Megatron is a large, powerful transformer. For this particular Megatron model, we trained a generative, left-to-right transformer in the style of GPT-2. The model contains 345 million parameters, with 24 layers, 16 attention heads, and a hidden size of 1024.
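The stated size can be roughly sanity-checked from these hyperparameters. The sketch below is an illustrative back-of-the-envelope estimate only; the vocabulary and context sizes in it are assumptions (GPT-2 BPE, 1024-token context) rather than values taken from this page, and the exact total depends on vocabulary padding and which terms are counted. The nominal "345M" follows the GPT-2 medium naming convention.

```python
# Back-of-the-envelope parameter count for a GPT-2-style decoder with
# hidden size 1024 and 24 layers. The vocabulary and context sizes below
# are assumptions (GPT-2 BPE, 1024 tokens), not taken from this page.
hidden = 1024
layers = 24
vocab = 50257      # assumed GPT-2 BPE vocabulary, before any padding
context = 1024     # assumed maximum sequence length

per_layer = 12 * hidden ** 2              # attention (4*h^2) + MLP (8*h^2), biases ignored
embeddings = (vocab + context) * hidden   # token + position embeddings

total = layers * per_layer + embeddings
print(f"~{total / 1e6:.0f}M parameters")  # ~355M; close to the nominal 345M figure
```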

This model was trained on text sourced from Wikipedia, RealNews, OpenWebText, and CC-Stories.

For more information about NeMo Megatron, visit https://github.com/NVIDIA/NeMo.

How to use this Model

NVIDIA NeMo can be used with this model for text generation and for prompt learning (p-tuning). Tutorial notebooks on p-tuning the model for multiple NLP tasks can be found on the NeMo tutorials page.
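The following is a minimal loading-and-generation sketch rather than an official recipe. It assumes a NeMo 1.x installation with the NLP collection, a single GPU, and that the downloaded checkpoint is saved locally as megatron_gpt_345m.nemo (a hypothetical file name). The class path, restore arguments, and generate() parameters differ between NeMo releases, so consult the NeMo tutorials and the megatron_gpt_eval.py example in the repository for a version-matched workflow.

```python
# Minimal text-generation sketch (not an official recipe).
# Assumptions: NeMo 1.x with the NLP collection installed, one GPU, and the
# downloaded checkpoint saved locally as "megatron_gpt_345m.nemo" (hypothetical
# file name). Class path and generate() parameters vary across NeMo releases.
from pytorch_lightning import Trainer
from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel

# Megatron-based NeMo models expect a Trainer to be attached at restore time.
trainer = Trainer(devices=1, accelerator="gpu")

model = MegatronGPTModel.restore_from(
    restore_path="megatron_gpt_345m.nemo",  # assumed path to the downloaded .nemo file
    trainer=trainer,
)
model.freeze()  # inference only

# Parameter names follow recent NeMo releases and may need adjusting for the
# version you have installed; see megatron_gpt_eval.py in the NeMo repository.
response = model.generate(
    inputs=["Deep learning is"],
    length_params={"max_length": 50, "min_length": 0},
    sampling_params={
        "use_greedy": False,
        "temperature": 0.8,
        "top_k": 0,
        "top_p": 0.9,
        "repetition_penalty": 1.0,
        "add_BOS": True,
        "all_probs": False,
        "compute_logprob": False,
        "end_strings": ["<|endoftext|>"],
    },
)
print(response["sentences"][0])
```

P-tuning, as covered in the tutorial notebooks, trains a small set of prompt parameters on top of this frozen base model rather than updating all 345 million weights.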

The source code and developer guide are available at https://github.com/NVIDIA/NeMo. Refer to the documentation at https://docs.nvidia.com/deeplearning/nemo/neural-modules-release-notes/index.html.

Limitations

No known limitations at this time.

References

  1. P-tuning: An Effective Prompt Engineering Method to Significantly Improve the Performance of Your Large NLP Model (https://reg.rainfocus.com/flow/nvidia/gtcspring2022/aplive/page/ap/session/1638550963192001Z0Gr)

License

License to use this model is covered by the NGC TERMS OF USE unless another License/Terms Of Use/EULA is clearly specified. By downloading the public and release version of the model, you accept the terms and conditions of the NGC TERMS OF USE.