This is a .nemo file for Megatron BERT 345M with the uncased BERT vocabulary.
Please be sure to download the latest version in order to ensure compatibility with the latest NeMo release.
NeMo Megatron is a new capability in the NeMo framework that allows developers to effectively train and scale language models to billions of parameters. Unlike in BERT, the positions of the layer normalization and the residual connection in the model architecture are swapped (similar to the GPT-2 architecture), which allowed the models to continue to improve as they were scaled up. This model reaches higher scores than BERT on a range of Natural Language Processing (NLP) tasks.
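To make the reordering concrete, here is a minimal pure-Python sketch of the two block layouts. It is an illustration only, not NeMo or Megatron code: the function names, the toy sublayer, and the unparameterized layer normalization (no learned scale or bias) are all assumptions made for the example.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale/bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def post_ln_block(x, sublayer):
    """Original BERT ordering: add the residual first, then normalize the sum."""
    summed = [a + b for a, b in zip(x, sublayer(x))]
    return layer_norm(summed)


def pre_ln_block(x, sublayer):
    """GPT-2-like ordering: normalize the input, apply the sublayer,
    then add the untouched residual."""
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]


def toy_sublayer(x):
    """Hypothetical stand-in for an attention or MLP sublayer."""
    return [0.5 * v for v in x]


x = [1.0, 2.0, 4.0]
print("post-LN:", post_ln_block(x, toy_sublayer))
print("pre-LN: ", pre_ln_block(x, toy_sublayer))
```

In the second layout the residual path carries the input through unchanged, so a sublayer that outputs zeros leaves the block as an identity mapping. In the first layout every block output passes through a normalization.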
This 345M-parameter model has 24 layers (Transformer blocks), 1024 hidden units, and 16 attention heads.
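As a rough sanity check on these dimensions, the sketch below estimates the parameter count of a BERT-style encoder. Several values are assumptions not stated in this card: a vocabulary of 30522 tokens (the standard uncased BERT vocabulary), 512 positions, 2 token types, and a feed-forward width of 4 times the hidden size. The function name is hypothetical.

```python
def estimate_bert_params(num_layers=24, hidden=1024, vocab_size=30522,
                         max_positions=512, type_vocab=2):
    """Estimate encoder parameters for a BERT-style model.

    vocab_size, max_positions, type_vocab, and the 4x feed-forward width
    are assumptions, not values taken from the model card.
    """
    ffn = 4 * hidden
    # Attention: query, key, value, and output projections, each with a bias.
    attention = 4 * (hidden * hidden + hidden)
    # Feed-forward: two dense layers with biases.
    mlp = (hidden * ffn + ffn) + (ffn * hidden + hidden)
    # Two layer norms per block, each with a scale and a bias vector.
    norms = 2 * 2 * hidden
    per_layer = attention + mlp + norms
    # Token, position, and token-type embeddings plus one embedding layer norm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + 2 * hidden
    return {
        "per_layer": per_layer,
        "embeddings": embeddings,
        "total": num_layers * per_layer + embeddings,
    }


counts = estimate_bert_params()
print(f"per layer:  {counts['per_layer']:,}")
print(f"embeddings: {counts['embeddings']:,}")
print(f"total:      {counts['total']:,}")
print(f"head size:  {1024 // 16}")  # hidden units divided by attention heads
```

Under these assumptions the estimate comes to roughly 334 million parameters. It leaves out the pooler, task heads, and any vocabulary padding, so it is expected to fall somewhat below the nominal 345M figure.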
For more information about NeMo Megatron visit https://github.com/NVIDIA/NeMo
This model was trained on text sourced from Wikipedia, RealNews, OpenWebText, and CC-Stories. We offer versions of this model pretrained with both cased and uncased vocabularies.
How to use this Model
NVIDIA NeMo can be used for easy fine-tuning on a number of different tasks. Tutorial notebooks on fine-tuning the model for Named Entity Recognition and Relation Extraction can be found on the tutorials page of NeMo.
Source code and the developer guide are available at https://github.com/NVIDIA/NeMo and the documentation is at https://docs.nvidia.com/deeplearning/nemo/neural-modules-release-notes/index.html
The following examples show how to fine-tune BioMegatron on different downstream tasks.
Usage example 1: Fine-tune on the RE dataset ChemProt: https://github.com/NVIDIA/NeMo/blob/r1.7.2/tutorials/nlp/Relation_Extraction-BioMegatron.ipynb
Usage example 2: Fine-tune on the NER dataset NCBI: https://github.com/NVIDIA/NeMo/blob/r1.7.2/tutorials/nlp/Token_Classification-BioMegatron.ipynb
There are no known limitations at this time.