NGC | Catalog
BioMegatron345mUncasedSQuADv1


BioMegatron 345M uncased model for Question Answering finetuned with NeMo on SQuAD v1.1 dataset.



Use Case: Language Modeling

Framework: PyTorch with NeMo

Latest Version:

Modified: April 13, 2021

Size: 637.65 MB


This is a checkpoint of BioMegatron 345M uncased for question answering, finetuned in NeMo on the SQuAD v1.1 question answering dataset. BioMegatron is Megatron pretrained on uncased PubMed, a biomedical domain corpus. This checkpoint is useful if you want to further finetune on biomedical question answering datasets; it gives better results than a BioMegatron model that has not first been finetuned on SQuAD v1.1.

The model achieves a weighted SAcc/MRR/LAcc of 48/63.88/53.87 on the BioASQ-7b-factoid test set.
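The BioASQ factoid measures reported above can be computed from ranked answer lists. The sketch below is illustrative (the function name and the top-5 cutoff for the lenient measures follow the usual BioASQ convention, but are assumptions, not taken from this card):

```python
def factoid_metrics(ranked_answers, gold_answers):
    """Compute SAcc, LAcc and MRR for factoid QA, BioASQ-style.

    ranked_answers: one ranked candidate list per question (best first)
    gold_answers:   one set of acceptable gold strings per question
    """
    sacc = lacc = mrr = 0.0
    n = len(ranked_answers)
    for cands, gold in zip(ranked_answers, gold_answers):
        # Strict accuracy: the top-1 candidate is a gold answer.
        if cands and cands[0] in gold:
            sacc += 1
        # Lenient accuracy: any top-5 candidate is a gold answer.
        if any(c in gold for c in cands[:5]):
            lacc += 1
        # MRR: reciprocal rank of the first correct candidate in the top 5.
        for rank, c in enumerate(cands[:5], start=1):
            if c in gold:
                mrr += 1.0 / rank
                break
    return sacc / n, lacc / n, mrr / n
```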

Please be sure to download the latest version in order to ensure compatibility with the latest NeMo release.

  • finetuned Megatron model weights
  • finetuned question answering head weights
  • config.json - the config file used to initialize the model network architecture in NeMo
  • vocab.txt - the vocabulary file used to train this checkpoint

More Details

Megatron is a large, powerful transformer developed by the Applied Deep Learning Research team at NVIDIA. It was trained multi-node and with mixed precision. Unlike BERT, the positions of the layer normalization and the residual connection in the model architecture (similar to the GPT-2 architecture) are swapped, which allows the models to continue to improve as they are scaled up. Megatron reaches higher scores than BERT on a range of Natural Language Processing (NLP) tasks, including the SQuAD question answering dataset.

BioMegatron has the same network architecture as Megatron but is pretrained on a different dataset: PubMed, a large biomedical text corpus. This yields better performance on biomedical downstream tasks, such as question answering (QA).

The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage.

In addition to the Megatron encoder architecture, this model includes a question answering head stacked on top of Megatron. This head is a token classifier; more specifically, it is a single fully connected layer.
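A single fully connected layer over the encoder output can be sketched as follows. This is a minimal, dependency-free illustration of the idea, not the NeMo implementation; the function name and the plain-list tensor layout are assumptions:

```python
def qa_head(hidden_states, W, b):
    """Single fully connected layer mapping each token's encoder hidden
    state to two logits: answer-span start and answer-span end.

    hidden_states: seq_len x hidden list of lists (encoder output)
    W: hidden x 2 weight matrix, b: length-2 bias (hypothetical parameters)
    """
    start_logits, end_logits = [], []
    for h in hidden_states:
        # Affine map: logits[j] = h . W[:, j] + b[j] for j in {start, end}
        logits = [sum(hi * W[i][j] for i, hi in enumerate(h)) + b[j]
                  for j in range(2)]
        start_logits.append(logits[0])
        end_logits.append(logits[1])
    return start_logits, end_logits
```

In the real model this runs on the full batch of Megatron hidden states; the per-token independence is what makes the head a token classifier.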

Training works as follows: the user provides training and evaluation data as text in JSON format. This data is parsed by scripts in NeMo and converted into model input. The input sequence is a concatenation of a tokenized query and its corresponding reading passage. The question answering head predicts, for each token in the reading passage (context), whether it is the start or end of the answer span. The model is trained using a cross entropy loss.
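The two steps above, concatenating the query with its passage and scoring the span with cross entropy, can be sketched as below. This is a simplified illustration, not NeMo's data pipeline; the special-token ids default to the common BERT uncased values, which is an assumption:

```python
import math

def build_qa_input(query_ids, passage_ids, cls_id=101, sep_id=102):
    """Concatenate a tokenized query with its reading passage, BERT-style.
    Token-type ids mark which tokens belong to the passage, where the
    answer span is predicted. (Special-token ids are placeholders.)"""
    input_ids = [cls_id] + query_ids + [sep_id] + passage_ids + [sep_id]
    token_type = [0] * (len(query_ids) + 2) + [1] * (len(passage_ids) + 1)
    return input_ids, token_type

def span_cross_entropy(start_logits, end_logits, start_pos, end_pos):
    """Cross entropy over the start and end positions of the answer span,
    averaged over the two predictions."""
    def ce(logits, target):
        m = max(logits)  # subtract max for numerical stability
        logz = m + math.log(sum(math.exp(x - m) for x in logits))
        return logz - logits[target]
    return 0.5 * (ce(start_logits, start_pos) + ce(end_logits, end_pos))
```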

For more information about BERT or Megatron, please visit or


Source code and a developer guide are available at Refer to the documentation at Code to pretrain and reproduce this model checkpoint is available at

This model checkpoint can be used for either inference or finetuning on biomedical question answering datasets, as long as they are in the required format. More details at

Usage example 1: Finetune on BioASQ-factoid dataset