345M parameter GPT generative Megatron model
Megatron-LM GPT2 345M
Megatron is a large, powerful transformer. For this particular Megatron model we trained a generative, left-to-right transformer in the style of GPT-2. The model has 345 million parameters, with 24 layers, 16 attention heads, and a hidden size of 1024.
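The layer count, head count, and hidden size above give a rough sense of where the parameters sit. The following is a minimal sketch, not an official accounting: it assumes a standard GPT-2 style block (fused query/key/value projection, output projection, a two-layer MLP with a 4x inner width, two layer norms, all with biases). The 4x multiplier and the helper name transformer_block_params are assumptions for illustration and do not come from this card.

```python
# Architecture values stated in the model card.
NUM_LAYERS = 24
NUM_HEADS = 16
HIDDEN_SIZE = 1024

# Assumption: GPT-2 style MLP with inner width 4 * hidden (not stated in the card).
FFN_MULTIPLIER = 4


def transformer_block_params(hidden: int, ffn_mult: int) -> int:
    """Count weights and biases in one GPT-2 style transformer block (hypothetical helper)."""
    ffn = ffn_mult * hidden
    # Fused QKV projection plus the attention output projection.
    attention = (hidden * 3 * hidden + 3 * hidden) + (hidden * hidden + hidden)
    # Two linear layers: hidden -> ffn and ffn -> hidden.
    mlp = (hidden * ffn + ffn) + (ffn * hidden + hidden)
    # Two layer norms, each with a scale and a bias vector.
    layer_norms = 2 * (2 * hidden)
    return attention + mlp + layer_norms


head_dim = HIDDEN_SIZE // NUM_HEADS
per_block = transformer_block_params(HIDDEN_SIZE, FFN_MULTIPLIER)
all_blocks = NUM_LAYERS * per_block

print(f"per-head dimension: {head_dim}")
print(f"parameters per block: {per_block:,}")
print(f"parameters in all {NUM_LAYERS} blocks: {all_blocks:,}")
```

Under these assumptions the transformer blocks account for roughly 302 million parameters. The rest of the total comes mainly from token and position embeddings, whose size depends on the vocabulary size and context length, which this card does not state.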
This model was trained on text sourced from Wikipedia, RealNews, OpenWebText, and CC-Stories.
For more information about NeMo Megatron visit https://github.com/NVIDIA/NeMo
How to use this Model
NVIDIA NeMo can be used with this model for text generation and for prompt tuning and p-tuning. Tutorial notebooks on p-tuning the model for multiple NLP tasks can be found on the tutorials page of NeMo.
Source code and the developer guide are available at https://github.com/NVIDIA/NeMo. Refer to the documentation at https://docs.nvidia.com/deeplearning/nemo/neural-modules-release-notes/index.html
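To make the idea behind prompt tuning and p-tuning concrete, here is a library-independent sketch of the shapes involved. It does not use the NeMo API and is not how NeMo implements either method. It only illustrates the common idea that a small set of trainable "virtual token" vectors, each as wide as the model's hidden size, is combined with the frozen model's input embeddings. Placing the virtual tokens at the front, the number of virtual tokens, and all function names are assumptions for illustration.

```python
import random

HIDDEN_SIZE = 1024        # hidden size stated in the model card
NUM_VIRTUAL_TOKENS = 10   # hypothetical value chosen for this sketch


def make_embeddings(count, hidden, rng):
    """Return `count` random vectors of width `hidden` (stand-ins for real embeddings)."""
    return [[rng.gauss(0.0, 0.02) for _ in range(hidden)] for _ in range(count)]


def prepend_virtual_tokens(virtual, inputs):
    """Place the virtual-token vectors in front of the input embeddings."""
    width = len(virtual[0])
    for row in virtual + inputs:
        if len(row) != width:
            raise ValueError("all embeddings must share the model's hidden size")
    return virtual + inputs


rng = random.Random(0)
# The only trainable parameters in this sketch: one vector per virtual token.
virtual_prompt = make_embeddings(NUM_VIRTUAL_TOKENS, HIDDEN_SIZE, rng)
# Stand-in for the frozen model's embeddings of a 5-token input.
input_embeddings = make_embeddings(5, HIDDEN_SIZE, rng)

model_input = prepend_virtual_tokens(virtual_prompt, input_embeddings)
print(len(model_input), len(model_input[0]))
```

With 10 virtual tokens, only 10 x 1024 values would be trained in this sketch, while the base model's weights stay fixed. For the actual workflow, refer to the NeMo tutorial notebooks.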
Limitations
No limitations are known at this time.
References
- P-tuning: An Effective Prompt Engineering Method to Significantly Improve the Performance of Your Large NLP Model (https://reg.rainfocus.com/flow/nvidia/gtcspring2022/aplive/page/ap/session/1638550963192001Z0Gr)
License
License to use this model is covered by the NGC TERMS OF USE unless another License/Terms Of Use/EULA is clearly specified. By downloading the public and release version of the model, you accept the terms and conditions of the NGC TERMS OF USE.