This is a checkpoint for BioMegatron 345m uncased for Question Answering, finetuned in NeMo https://github.com/NVIDIA/NeMo on the question answering dataset SQuADv1.1 https://rajpurkar.github.io/SQuAD-explorer/. BioMegatron is Megatron https://arxiv.org/abs/1909.08053 pretrained on uncased PubMed https://catalog.data.gov/dataset/pubmed, a biomedical domain dataset. BioMegatron was pretrained with https://github.com/NVIDIA/Megatron-LM. This model is a useful starting point if you want to further finetune on biomedical question answering datasets, and it gives better results there than a BioMegatron checkpoint that was not finetuned on SQuADv1.1 beforehand.
On the BioASQ-7b-factoid test set, the model achieves a weighted strict accuracy (SAcc) of 48, mean reciprocal rank (MRR) of 53.87, and lenient accuracy (LAcc) of 63.88.
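For readers unfamiliar with the BioASQ factoid metrics, the following minimal Python sketch shows the standard definitions: strict accuracy counts a question as correct only if the top-ranked candidate is right, lenient accuracy counts it if any of the top five candidates is right, and MRR averages the reciprocal rank of the first correct candidate (0 if none is correct). By construction SAcc ≤ MRR ≤ LAcc. The function name and toy data are hypothetical, answer matching is simplified to case-insensitive exact match, and the sketch computes plain per-question averages, so it does not reproduce the "weighted" averaging behind the numbers above.

```python
def factoid_metrics(predictions, gold):
    """Return (SAcc, MRR, LAcc) for factoid questions.

    predictions: one ranked list of candidate answers per question (best first).
    gold: one set of acceptable answer strings per question, same order.
    Matching is a simple case-insensitive exact match (an assumption; the
    official BioASQ evaluation may normalize answers differently).
    """
    n = len(predictions)
    strict = lenient = rr_sum = 0.0
    for ranked, answers in zip(predictions, gold):
        accepted = {a.lower() for a in answers}
        top5 = [a.lower() for a in ranked[:5]]  # only the first 5 candidates count
        # 1-based rank of the first correct candidate, or None if none is correct.
        rank = next((i + 1 for i, a in enumerate(top5) if a in accepted), None)
        if rank is not None:
            lenient += 1
            rr_sum += 1.0 / rank
            if rank == 1:
                strict += 1
    return strict / n, rr_sum / n, lenient / n


# Hypothetical toy data: correct at rank 1, correct at rank 3, never correct.
preds = [["BRCA1", "TP53"], ["insulin", "glucagon", "leptin"], ["aspirin"]]
gold = [{"brca1"}, {"leptin"}, {"ibuprofen"}]
sacc, mrr, lacc = factoid_metrics(preds, gold)
print(f"SAcc={sacc:.3f} MRR={mrr:.3f} LAcc={lacc:.3f}")  # SAcc=0.333 MRR=0.444 LAcc=0.667
```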
Please be sure to download the latest version in order to ensure compatibility with the latest NeMo release.
Megatron is a large, powerful transformer developed by the Applied Deep Learning Research team at NVIDIA, trained on multiple nodes using mixed precision. Unlike in BERT, the positions of the layer normalization and the residual connection are swapped (similar to the GPT-2 architecture), which allowed the models to keep improving as they were scaled up. Megatron reaches higher scores than BERT on a range of Natural Language Processing (NLP) tasks, including the SQuAD question answering dataset. BioMegatron has the same network architecture as Megatron but is pretrained on a different dataset, PubMed, a large biomedical text corpus, which gives it better performance on biomedical downstream tasks such as question answering (QA).
The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage.
Apart from the Megatron encoder, this model also includes a question answering head stacked on top of it. This head is a token classifier; more specifically, it is a single fully connected layer.
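To make the swapped ordering concrete, here is a minimal pure-Python sketch contrasting the BERT ordering (residual add, then layer norm, often called post-LN) with the GPT-2-style ordering that Megatron is described as following (layer norm on the sublayer input, often called pre-LN). The `sublayer` argument stands in for self-attention or the MLP, the layer norm omits learned scale and shift, and all helper names are hypothetical; this is an illustration, not Megatron-LM code.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def add(a, b):
    """Element-wise sum, used for the residual connection."""
    return [u + v for u, v in zip(a, b)]


def post_ln_block(x, sublayer):
    """BERT ordering: residual add first, then layer norm on the sum."""
    return layer_norm(add(x, sublayer(x)))


def pre_ln_block(x, sublayer):
    """GPT-2-style ordering: layer norm on the sublayer input; residual path untouched."""
    return add(x, sublayer(layer_norm(x)))


def zero_sublayer(v):
    """A stand-in sublayer that contributes nothing."""
    return [0.0] * len(v)


x = [1.0, 2.0, 3.0]
print(pre_ln_block(x, zero_sublayer))   # [1.0, 2.0, 3.0]
print(post_ln_block(x, zero_sublayer))  # renormalized, no longer equal to x
```

With a sublayer that outputs zeros, the pre-LN block returns its input unchanged while the post-LN block still renormalizes it. That clean identity path through the residual stream is one commonly cited intuition for why the pre-LN ordering scales more gracefully.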
Training works as follows: the user provides training and evaluation data as text in JSON format. NeMo scripts parse this data and convert it into model input. Each input sequence is the concatenation of a tokenized query and its corresponding reading passage (the context). For each token in the context, the question answering head predicts whether it is the start or the end of the answer span. The model is trained using cross-entropy loss.
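The steps above can be sketched in plain Python under a few stated assumptions: the query and context are packed BERT-style as `[CLS] query [SEP] context [SEP]`, a tiny two-dimensional "hidden state" list stands in for the Megatron encoder output, the head's weights are made up for the example, and the training loss averages the start and end cross-entropy terms (a common convention in BERT-style QA code, assumed here). All function names are hypothetical; NeMo's actual implementation is in PyTorch.

```python
import math


def build_input(query_tokens, context_tokens):
    """Pack query and context BERT-style (assumed): [CLS] query [SEP] context [SEP]."""
    tokens = ["[CLS]"] + query_tokens + ["[SEP]"] + context_tokens + ["[SEP]"]
    context_start = len(query_tokens) + 2  # index of the first context token
    context_end = context_start + len(context_tokens) - 1  # index of the last one
    return tokens, context_start, context_end


def qa_head(hidden_states, weight, bias):
    """Single fully connected layer: hidden vector -> (start logit, end logit) per token."""
    start_logits, end_logits = [], []
    for h in hidden_states:
        start_logits.append(sum(w * v for w, v in zip(weight[0], h)) + bias[0])
        end_logits.append(sum(w * v for w, v in zip(weight[1], h)) + bias[1])
    return start_logits, end_logits


def cross_entropy(logits, target):
    """Numerically stable -log softmax(logits)[target]."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]


def span_loss(start_logits, end_logits, start_pos, end_pos):
    """Average of the start- and end-position cross-entropy losses."""
    return (cross_entropy(start_logits, start_pos) + cross_entropy(end_logits, end_pos)) / 2


def best_span(start_logits, end_logits, first, last, max_len=30):
    """Highest-scoring (start, end) pair inside the context with start <= end."""
    best, best_score = None, -math.inf
    for s in range(first, last + 1):
        for e in range(s, min(s + max_len, last + 1)):
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


# Toy run with hypothetical values.
tokens, c0, c1 = build_input(["what", "binds", "dna"], ["p53", "binds", "dna", "strongly"])
hidden = [[0.0, 0.0] for _ in tokens]
hidden[c0] = [1.0, 1.0]  # pretend the encoder marked "p53" as the answer
start, end = qa_head(hidden, weight=[[2.0, 0.0], [0.0, 2.0]], bias=[0.0, 0.0])
s, e = best_span(start, end, c0, c1)
print(tokens[s:e + 1])  # ['p53']
print(span_loss(start, end, c0, c0))
```

Restricting `best_span` to the context positions keeps the prediction from landing on the query or special tokens; the `max_len` cap on span length is an illustrative choice, not a documented NeMo setting.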
For more information about BERT or Megatron, please visit https://ngc.nvidia.com/catalog/models/nvidia:bertlargeuncasedfornemo or https://github.com/NVIDIA/Megatron-LM.
Source code and a developer guide are available at https://github.com/NVIDIA/NeMo. Refer to the documentation at https://docs.nvidia.com/deeplearning/nemo/neural-modules-release-notes/index.html. Code to pretrain and reproduce this model checkpoint is available at https://github.com/NVIDIA/Megatron-LM.
This model checkpoint can be used either for inference or for finetuning on biomedical question answering datasets, as long as they are in the required format. More details are available at https://github.com/NVIDIA/NeMo.
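The required format is not spelled out here. Assuming it is the SQuAD v1.1 JSON layout (the dataset this checkpoint was finetuned on), a minimal Python sketch for converting hypothetical (id, question, context, answer) records could look like the following; check the NeMo question answering documentation for the exact fields your version expects. The function name, title, and sample record are illustrative only.

```python
import json


def make_squad_v11(records, title="biomedical"):
    """Wrap (id, question, context, answer) tuples in a SQuAD v1.1-style dict.

    answer_start is the character offset of the first occurrence of the
    answer in the context; adjust by hand if the answer appears more than once.
    """
    paragraphs = []
    for qid, question, context, answer in records:
        start = context.find(answer)
        if start < 0:
            raise ValueError(f"answer {answer!r} not found in context of {qid}")
        paragraphs.append({
            "context": context,
            "qas": [{
                "id": qid,
                "question": question,
                "answers": [{"text": answer, "answer_start": start}],
            }],
        })
    return {"version": "1.1", "data": [{"title": title, "paragraphs": paragraphs}]}


records = [("q1", "Which gene is mutated?", "Mutations in BRCA1 raise cancer risk.", "BRCA1")]
print(json.dumps(make_squad_v11(records), indent=2))
```

To produce a training or evaluation file, pass the returned dict to `json.dump` with an open file handle; the file name is up to you.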