Megatron is a large, powerful transformer. This particular Megatron model is a generative, left-to-right transformer trained in the style of GPT-2. It has 345 million parameters, with 24 layers, 16 attention heads, and a hidden size of 1024.
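As a rough sanity check on these numbers, the sketch below estimates the parameter count of a GPT-2 style decoder from its layer count and hidden size. The vocabulary size (50257) and maximum sequence length (1024) are assumptions borrowed from the original GPT-2 and are not stated in this description. The function names are hypothetical, and the formula assumes a standard block layout (fused QKV projection, 4x MLP expansion, two layer norms per block, tied input and output embeddings).

```python
def transformer_block_params(hidden: int) -> int:
    """Parameters in one GPT-2 style block, biases included."""
    # QKV projection (3*h*h + 3*h) plus attention output projection (h*h + h)
    attention = 4 * hidden * hidden + 4 * hidden
    # MLP: h -> 4h (4*h*h + 4*h) and 4h -> h (4*h*h + h)
    mlp = 8 * hidden * hidden + 5 * hidden
    # Two layer norms, each with a weight and a bias vector
    layer_norms = 2 * 2 * hidden
    return attention + mlp + layer_norms


def estimate_params(
    layers: int = 24,
    hidden: int = 1024,
    vocab: int = 50257,          # assumption: GPT-2 BPE vocabulary size
    max_positions: int = 1024,   # assumption: GPT-2 context length
) -> int:
    """Approximate total parameters, with tied input/output embeddings."""
    embeddings = vocab * hidden + max_positions * hidden
    final_layer_norm = 2 * hidden
    return layers * transformer_block_params(hidden) + embeddings + final_layer_norm


if __name__ == "__main__":
    heads = 16
    print("per-head dimension:", 1024 // heads)
    print("block parameters:  ", 24 * transformer_block_params(1024))
    print("total (estimated): ", estimate_params())
```

Under these assumptions the estimate lands within a few percent of the stated 345 million. The exact figure depends on the vocabulary size and any padding applied to it, which this description does not specify.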
This model was trained on text sourced from Wikipedia, RealNews, OpenWebText, and CC-Stories.
For more information about NeMo Megatron, visit https://github.com/NVIDIA/NeMo
Source code and the developer guide are available at https://github.com/NVIDIA/NeMo. Refer to the documentation at https://docs.nvidia.com/deeplearning/nemo/neural-modules-release-notes/index.html
There are no known limitations at this time.