This is an uncased question answering model with a BERT Large encoder, fine-tuned on the SQuAD v1.1 dataset. In question answering (also called reading comprehension), the model is given a question and a passage of text (the context) that may contain the answer, and it predicts the answer span within the passage as a start position and an end position.
The model is based on the architecture presented in the paper "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". In this particular instance, the encoder has 24 Transformer blocks. On top of the encoder sits a span prediction head, which is equivalent to token classification with two classes: one for the start of the span and one for the end of the span. All model parameters are fine-tuned jointly on the downstream task. More specifically, the input text is fed to the BERT encoder, and the encoder's output states are then fed to the span prediction head.
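The span prediction step described above can be illustrated with a minimal sketch. It assumes the head has already produced one start score and one end score per token. The function name `best_span`, the `max_answer_len` limit, and the toy scores are hypothetical and are not taken from the NeMo implementation.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end, score) for the highest-scoring valid span.

    A span is valid when start <= end and it covers at most
    max_answer_len tokens. The score is start_logit + end_logit.
    """
    best = (0, 0, float("-inf"))
    for s, s_score in enumerate(start_logits):
        # Only consider end positions at or after the start position.
        last = min(len(end_logits), s + max_answer_len)
        for e in range(s, last):
            score = s_score + end_logits[e]
            if score > best[2]:
                best = (s, e, score)
    return best


# Toy example: per-token scores are made up for illustration.
tokens = ["the", "model", "has", "24", "transformer", "blocks"]
start_logits = [0.1, 0.2, 0.0, 3.5, 0.3, 0.1]
end_logits = [0.0, 0.1, 0.2, 1.0, 0.5, 2.8]

start, end, score = best_span(start_logits, end_logits)
answer = " ".join(tokens[start:end + 1])
print(start, end, answer)
```

An exhaustive search like this is quadratic in the sequence length; practical decoders usually restrict the search to the top few start and end candidates.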
The model was trained starting from the NeMo BERT Large uncased checkpoint.
The model was trained on the SQuAD v1.1 corpus for question answering. For datasets like SQuAD v1.1, this model only supports cases where the answer is contained in the context.
Evaluation on the SQuAD1.1 dev set:
Exact Match 85.44%
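For reference, the sketch below shows how an exact match score is commonly computed for SQuAD-style data. It assumes the reported figure follows the usual SQuAD normalization (lowercasing, then removing punctuation, articles, and extra whitespace). The function names are illustrative and are not NeMo APIs.

```python
import re
import string


def normalize_answer(text):
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold_answers):
    """Return 1.0 if the prediction matches any gold answer after normalization."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(gold) for gold in gold_answers))


def corpus_exact_match(predictions, references):
    """Average exact match in percent over parallel lists of predictions and gold answers."""
    scores = [exact_match(p, golds) for p, golds in zip(predictions, references)]
    return 100.0 * sum(scores) / len(scores)


predictions = ["The Eiffel Tower.", "Paris"]
references = [["Eiffel Tower"], ["London", "Berlin"]]
print(corpus_exact_match(predictions, references))
```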
The model is available for use in the NeMo toolkit, and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.
import nemo
import nemo.collections.nlp as nemo_nlp
model = nemo_nlp.models.question_answering.qa_model.QAModel.from_pretrained(model_name="qa_squadv1.1_bertlarge")
python [NEMO_GIT_FOLDER]/examples/nlp/question_answering/question_answering_squad.py do_training=false pretrained_model=qa_squadv1.1_bertlarge model.validation_ds.file=[SOURCE_FILE] model.dataset.do_lower_case=true
The model takes as input a JSON file that follows the SQuAD format.
The model writes its predictions and the n-best list as JSON.
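As an illustration of the SQuAD v1.1 layout, the sketch below builds a minimal input document with a single question. The context, question, title, and ID are made-up placeholders. The field names (`data`, `paragraphs`, `context`, `qas`, `answers`, `answer_start`) follow the published SQuAD v1.1 files.

```python
import json

# Placeholder passage and answer, used only to show the structure.
context = "NeMo is a toolkit for conversational AI."
answer_text = "conversational AI"

squad_like = {
    "version": "1.1",
    "data": [
        {
            "title": "example",
            "paragraphs": [
                {
                    "context": context,
                    "qas": [
                        {
                            "id": "example-0001",
                            "question": "What is NeMo a toolkit for?",
                            "answers": [
                                {
                                    "text": answer_text,
                                    # Character offset of the answer in the context.
                                    "answer_start": context.find(answer_text),
                                }
                            ],
                        }
                    ],
                }
            ],
        }
    ],
}

serialized = json.dumps(squad_like, indent=2)
print(serialized)
```

Saving `serialized` to a file gives something that could be passed where the command above expects [SOURCE_FILE].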
The length of the input text is currently constrained by the maximum sequence length of the uncased encoder model, which is 512 tokens after tokenization.
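One common way to handle contexts that exceed such a limit is to split the tokenized context into overlapping windows and run the model on each window. The sketch below is a generic illustration of that idea, not a description of NeMo's internals. The function `split_into_windows`, the stride value, and the query length are assumptions. The context budget subtracts the question tokens and three special tokens ([CLS] and two [SEP]) from the 512-token limit.

```python
def split_into_windows(context_tokens, max_context_len, stride):
    """Split tokens into overlapping windows.

    Returns a list of (start_offset, window_tokens). Consecutive windows
    start `stride` tokens apart, so they overlap when stride < max_context_len.
    """
    if stride <= 0 or max_context_len <= 0:
        raise ValueError("stride and max_context_len must be positive")
    windows = []
    start = 0
    while start < len(context_tokens):
        end = min(start + max_context_len, len(context_tokens))
        windows.append((start, context_tokens[start:end]))
        if end == len(context_tokens):
            break  # The last window reached the end of the context.
        start += stride
    return windows


max_seq_length = 512   # limit stated for the encoder
num_query_tokens = 12  # hypothetical question length after tokenization
num_special_tokens = 3 # [CLS] question [SEP] context [SEP]
context_budget = max_seq_length - num_query_tokens - num_special_tokens

context_tokens = [f"tok{i}" for i in range(1200)]  # stand-in for real word pieces
windows = split_into_windows(context_tokens, context_budget, stride=128)
print(len(windows), [offset for offset, _ in windows][:3])
```

Each window then yields its own span prediction, and the highest-scoring span across windows is typically kept, with its position shifted by the window's start offset.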