This is a checkpoint for BioMegatron 345m uncased. BioMegatron is a Megatron language model pretrained on uncased PubMed (https://catalog.data.gov/dataset/pubmed), a biomedical domain dataset, and it gives state-of-the-art results on a range of biomedical natural language downstream tasks. The model has around 345 million parameters and was trained with Megatron-LM (https://github.com/NVIDIA/Megatron-LM).
Megatron is a large, powerful transformer developed by the Applied Deep Learning Research team at NVIDIA, trained across multiple nodes with mixed precision. Unlike in BERT, the positions of the layer normalization and the residual connection in the model architecture are swapped (similar to the GPT-2 architecture), which allowed the models to continue to improve as they were scaled up. This model reaches higher scores than BERT on a range of Natural Language Processing (NLP) tasks. BioMegatron has the same network architecture as Megatron-LM/BERT but is pretrained on a different dataset, PubMed, a large biomedical text corpus, and it achieves better performance on biomedical downstream tasks than the original Megatron.
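The swap of layer normalization and residual connection described above can be illustrated with a minimal sketch. This is not the Megatron-LM implementation: the function names, the toy sublayer, and the layer normalization without learned scale and bias are all assumptions made for illustration.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale/bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def add(a, b):
    """Element-wise sum of two equal-length vectors."""
    return [u + v for u, v in zip(a, b)]


def post_ln_block(x, sublayer):
    """Original BERT ordering: add the residual first, then normalize."""
    return layer_norm(add(x, sublayer(x)))


def pre_ln_block(x, sublayer):
    """Rearranged (GPT-2 style) ordering: normalize the sublayer input,
    then add the untouched residual."""
    return add(x, sublayer(layer_norm(x)))


def toy_sublayer(x):
    # Hypothetical stand-in for self-attention or the feed-forward network.
    return [0.5 * v for v in x]


x = [1.0, 2.0, 4.0]
post_out = post_ln_block(x, toy_sublayer)
pre_out = pre_ln_block(x, toy_sublayer)
```

In the rearranged ordering the input reaches the output through the residual path without being normalized, so the identity path through the stacked blocks is preserved. In the original ordering every block output is re-normalized.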
This 345m parameter model has 24 layers (Transformer blocks), 1024 hidden units, and 16 attention heads. It uses the original BERT uncased vocabulary learned from the Wikipedia and Books corpora. For more detail about the Megatron-LM/BERT architecture, please refer to the Megatron-LM or BERT papers.
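As a rough sanity check on the size quoted above, the parameter count of a BERT-style encoder can be estimated from the layer count and hidden size. The sketch below rests on assumptions not stated in this card: a feed-forward width of four times the hidden size, 512 learned position embeddings, 2 token-type embeddings, and a vocabulary of 30522 entries (the size of the standard BERT uncased vocabulary). It also ignores the pooler and output heads.

```python
def bert_style_param_count(num_layers, hidden, vocab_size,
                           max_positions=512, type_vocab=2):
    """Estimate encoder parameters; all defaults are illustrative assumptions."""
    # Self-attention: query, key, value, and output projections with biases.
    attention = 4 * hidden * hidden + 4 * hidden
    # Feed-forward: hidden -> 4*hidden -> hidden, with biases.
    ffn = 2 * hidden * (4 * hidden) + 4 * hidden + hidden
    # Two layer norms per block, each with a scale and a bias vector.
    norms = 2 * 2 * hidden
    per_layer = attention + ffn + norms
    # Token, position, and token-type embeddings plus the embedding layer norm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + 2 * hidden
    return num_layers * per_layer + embeddings


total = bert_style_param_count(num_layers=24, hidden=1024, vocab_size=30522)
print(f"{total / 1e6:.1f}M parameters")
```

Under these assumptions the estimate comes out a few percent below the quoted 345 million. That is consistent with "around 345 million" being a rounded nominal size and with the sketch leaving out the pooler and output heads.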
For more information about BioBERT or Megatron visit https://ngc.nvidia.com/catalog/models/nvidia:biobertbasecasedfornemo or https://github.com/NVIDIA/Megatron-LM.
The entire pre-training takes about 400 hours on 8 DGX-2 machines with Tesla V100 GPUs. Loss function and hyper-parameter settings are the same as pre-training the BERT language models with the Megatron-LM codebase.
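The training cost above can be converted into GPU-hours with simple arithmetic. The sketch assumes each DGX-2 is fully populated with 16 V100 GPUs and that all of them were in use for the whole run. This card states neither, so the result is only an order-of-magnitude figure.

```python
machines = 8              # from the text above
gpus_per_machine = 16     # assumption: fully populated DGX-2
wall_clock_hours = 400    # "about 400 hours" from the text above

total_gpus = machines * gpus_per_machine
gpu_hours = total_gpus * wall_clock_hours
print(f"{total_gpus} GPUs, roughly {gpu_hours:,} GPU-hours")
```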
The model achieves state-of-the-art results on BioASQ-7b-factoid biomedical question answering:
How to use this Model
NVIDIA NeMo can be used for easy fine-tuning to a number of different tasks. Tutorial notebooks on fine-tuning the model for Named Entity Recognition, Relation Extraction, and Question Answering can be found on the tutorials page of NeMo.
Alternatively, users can develop their own fine-tuning script from the Megatron-LM codebase, or use any other PyTorch-based framework such as Hugging Face Transformers by installing the Megatron-LM package from PyPI.
No known limitations available at this time.
- Shin, Hoo-Chang and Zhang, Yang and Bakhturina, Evelina and Puri, Raul and Patwary, Mostofa and Shoeybi, Mohammad and Mani, Raghav, 2020, November. BioMegatron: Larger Biomedical Domain Language Model. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 4700--4706).