
BioMegatron345mCased

Description
Megatron pretrained on cased biomedical dataset PubMed with 345 million parameters.
Publisher
-
Latest Version
0
Modified
April 4, 2023
Size
634.64 MB

Overview

This is a checkpoint for BioMegatron 345m cased. BioMegatron is Megatron (https://arxiv.org/abs/1909.08053) pretrained on cased PubMed (https://catalog.data.gov/dataset/pubmed), a biomedical-domain text corpus, which yields improved results on a range of biomedical downstream tasks. The model has around 345 million parameters and was trained with https://github.com/NVIDIA/Megatron-LM.

The model achieves weighted SAcc/MRR/LAcc of 45.12/61.17/51.4 on the BioASQ-7b-factoid test set (after fine-tuning on the SQuAD v1.1 dataset), and macro precision/recall/F1 of 82.46/78.79/80.52 on the relation extraction (RE) dataset ChemProt.

Please be sure to download the latest version in order to ensure compatibility with the latest NeMo release. The download contains the following files:

  • MegatronBERT.pt - pretrained Megatron model weights
  • config.json - the config file used to initialize model network architecture in NeMo
  • vocab.txt - the vocabulary file used to train this checkpoint (see the loading sketch below)
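As a minimal sketch, assuming the three files above have been downloaded to the current directory, their contents can be inspected with plain PyTorch and the Python standard library before wiring them into NeMo. The exact layout of the saved checkpoint (flat state dict vs. weights nested under a key) is an assumption here and may differ.

```python
import json

import torch

# Network configuration used to initialize the model architecture in NeMo
# (hidden size, number of layers/heads, vocabulary size, etc.).
with open("config.json") as f:
    config = json.load(f)
print(config)

# Vocabulary the checkpoint was trained with; the same file must be used
# for tokenization at fine-tuning time.
with open("vocab.txt", encoding="utf-8") as f:
    vocab = [line.rstrip("\n") for line in f]
print(f"vocabulary size: {len(vocab)}")

# Pretrained Megatron weights. Depending on how the checkpoint was saved,
# the tensors may sit at the top level or be nested under a key such as
# "state_dict" (assumption).
ckpt = torch.load("MegatronBERT.pt", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
for name, value in state_dict.items():
    print(name, tuple(value.shape) if hasattr(value, "shape") else type(value))
```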

More Details

Megatron is a large, powerful transformer developed by the Applied Deep Learning Research team at NVIDIA, trained with multi-node, mixed-precision training. Unlike BERT, the positions of the layer normalization and the residual connection in the model architecture (similar to the GPT-2 architecture) are swapped, which allows the models to continue to improve as they are scaled up. As a result, Megatron reaches higher scores than BERT on a range of Natural Language Processing (NLP) tasks. BioMegatron has the same network architecture as Megatron, but is pretrained on a different dataset, PubMed, a large biomedical text corpus, and achieves better performance on biomedical downstream tasks than the original Megatron.
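The following is a minimal PyTorch sketch of that architectural difference, not the actual Megatron-LM implementation: BERT applies layer normalization after the residual addition (post-LN), while Megatron/GPT-2-style blocks apply it before each sublayer (pre-LN), keeping the residual path un-normalized.

```python
import torch
from torch import nn


class PostLNBlock(nn.Module):
    """BERT-style sublayer wrapper: LayerNorm is applied after the residual add."""

    def __init__(self, d_model: int, sublayer: nn.Module):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.norm(x + self.sublayer(x))


class PreLNBlock(nn.Module):
    """Megatron/GPT-2-style sublayer wrapper: LayerNorm is applied before the
    sublayer and the residual path stays un-normalized, which helps training
    stability as the model is scaled up."""

    def __init__(self, d_model: int, sublayer: nn.Module):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.sublayer(self.norm(x))


# Tiny smoke test with a feed-forward sublayer standing in for attention/MLP.
x = torch.randn(2, 8, 16)
ffn = nn.Sequential(nn.Linear(16, 64), nn.GELU(), nn.Linear(64, 16))
print(PostLNBlock(16, ffn)(x).shape, PreLNBlock(16, ffn)(x).shape)
```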

For more information about BioBERT or Megatron, visit https://ngc.nvidia.com/catalog/models/nvidia:biobertbaseuncasedfornemo, https://ngc.nvidia.com/catalog/models/nvidia:biobertlargeuncasedfornemo, or https://github.com/NVIDIA/Megatron-LM.

Documentation

Source code and a developer guide are available at https://github.com/NVIDIA/NeMo. Refer to the documentation at https://docs.nvidia.com/deeplearning/nemo/neural-modules-release-notes/index.html.

This model checkpoint can be used for fine-tuning on biomedical downstream tasks such as named entity recognition (NER), question answering (QA), and relation extraction (RE).

In the following, we show examples of how to fine-tune this checkpoint on different downstream tasks using the BioBERT notebooks in NeMo.
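The notebooks below handle fine-tuning end to end. As a rough, hypothetical sketch of what SQuAD/BioASQ-style QA fine-tuning adds on top of the encoder, a span-prediction head projects each token's hidden state to start/end logits; the encoder output here is a random placeholder rather than actual BioMegatron hidden states, and the hidden size of 1024 is an assumption based on typical 345M-parameter Megatron configurations.

```python
import torch
from torch import nn


class QASpanHead(nn.Module):
    """Predict answer-span start/end logits from encoder hidden states,
    as in SQuAD-style extractive question answering."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, 2)  # one logit each for start and end

    def forward(self, hidden_states: torch.Tensor):
        logits = self.proj(hidden_states)             # (batch, seq_len, 2)
        start_logits, end_logits = logits.unbind(-1)  # each (batch, seq_len)
        return start_logits, end_logits


# Placeholder tensor standing in for BioMegatron encoder output.
batch, seq_len, hidden = 4, 128, 1024
hidden_states = torch.randn(batch, seq_len, hidden)

head = QASpanHead(hidden)
start_logits, end_logits = head(hidden_states)

# Cross-entropy over token positions against gold start/end indices.
start_pos = torch.randint(0, seq_len, (batch,))
end_pos = torch.randint(0, seq_len, (batch,))
loss = (nn.functional.cross_entropy(start_logits, start_pos)
        + nn.functional.cross_entropy(end_logits, end_pos)) / 2
print(loss.item())
```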

Usage example 1: Finetune on BioASQ-factoid dataset

Visit https://github.com/NVIDIA/NeMo/blob/master/examples/nlp/biobert_notebooks/biobert_qa.ipynb

Usage example 2: Finetune on RE dataset ChemProt

Visit https://github.com/NVIDIA/NeMo/blob/master/examples/nlp/biobert_notebooks/biobert_re.ipynb

Usage example 3: Finetune on NER dataset NCBI

Visit https://github.com/NVIDIA/NeMo/blob/master/examples/nlp/biobert_notebooks/biobert_ner.ipynb