This is a pre-trained autoencoding language model trained on English Wikipedia and BookCorpus using a sequence length of 512.
The model is based on the architecture presented in the paper "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" [1]. In this particular instance, the model has 12 Transformer blocks and uses a WordPiece tokenizer [2].
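To illustrate what a WordPiece tokenizer does, the following is a minimal sketch of the greedy longest-match-first splitting commonly associated with WordPiece. It is not the tokenizer shipped with this model: the function name `wordpiece_tokenize`, the `[UNK]` token, the `##` continuation prefix, and the toy vocabulary are all assumptions chosen for illustration.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one word into sub-word pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking until a vocab hit.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Assumed convention: continuation pieces carry a "##" prefix.
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece matched at this position: map the whole word to unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary; a real vocabulary is far larger.
toy_vocab = {"play", "##ing", "##ed", "un", "##believ", "##able"}

print(wordpiece_tokenize("playing", toy_vocab))       # ['play', '##ing']
print(wordpiece_tokenize("unbelievable", toy_vocab))  # ['un', '##believ', '##able']
print(wordpiece_tokenize("xyz", toy_vocab))           # ['[UNK]']
```

Splitting rare words into known pieces keeps the vocabulary small while still letting the model represent words it never saw whole during training.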
The model was trained from scratch on preprocessed English Wikipedia and BookCorpus using a sequence length of 512. The data was preprocessed using NVIDIA Deep Learning Examples [4].
The accuracy of language models is often measured on downstream tasks such as SQuAD [3]. On SQuAD v1.1 the model reaches EM=82.78 and F1=89.97; on SQuAD v2.0 it reaches EM=75.04 and F1=78.08.
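For readers unfamiliar with the two metrics, here is a minimal sketch of how exact match (EM) and token-level F1 are typically computed for a single predicted answer against a single reference. The normalization steps (lowercasing, removing punctuation and English articles) follow the convention commonly used for SQuAD-style evaluation; the function names and example strings are assumptions for illustration, and this is not the script used to produce the numbers above.

```python
import re
import string
from collections import Counter

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction, reference):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Empty answers only agree with other empty answers.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("the Eiffel Tower", "Eiffel Tower"))     # 1.0
print(exact_match("Eiffel Tower in Paris", "Eiffel Tower"))  # 0.0
print(round(f1_score("Eiffel Tower in Paris", "Eiffel Tower"), 4))  # 0.6667
```

Dataset-level scores are usually reported as percentages averaged over all questions, taking the best score across the available reference answers for each question.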
The model is available for use in the NeMo toolkit [5], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.
import nemo
import nemo.collections.nlp as nemo_nlp

# Download the pre-trained checkpoint by name and instantiate the model.
model = nemo_nlp.models.language_modeling.BERTLMModel.from_pretrained(model_name="bertbaseuncased")
python [NEMO_GIT_FOLDER]/examples/nlp/language_modeling/ --config-name=bert_pretraining_from_preprocessed_config.yaml
The model takes preprocessed data as input.
The model outputs the masked language model loss and, optionally, the next sentence prediction.
The length of the input text is currently constrained by the maximum sequence length of the model, which is 512 tokens after tokenization.
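One way to work within the 512-token limit is to truncate long inputs or to split them into overlapping windows. The sketch below shows both on a list of already tokenized pieces. It assumes that the limit includes the two special tokens `[CLS]` and `[SEP]` used in the BERT paper; the function names and the `stride` parameter are hypothetical and not part of NeMo.

```python
MAX_SEQ_LENGTH = 512  # maximum sequence length stated for this model


def fit_to_max_length(tokens, max_seq_length=MAX_SEQ_LENGTH):
    """Truncate a token list so it fits once [CLS] and [SEP] are added."""
    budget = max_seq_length - 2  # assumed: two slots reserved for special tokens
    return ["[CLS]"] + tokens[:budget] + ["[SEP]"]


def split_into_windows(tokens, max_seq_length=MAX_SEQ_LENGTH, stride=128):
    """Split a long token list into overlapping windows that each fit the limit.

    `stride` is the step between window starts; keep it positive and no
    larger than max_seq_length - 2 so that no tokens are skipped.
    """
    budget = max_seq_length - 2
    windows = []
    start = 0
    while True:
        windows.append(["[CLS]"] + tokens[start:start + budget] + ["[SEP]"])
        if start + budget >= len(tokens):
            break  # this window already reaches the end of the input
        start += stride
    return windows


# Hypothetical input of 1000 placeholder tokens.
long_input = [f"tok{i}" for i in range(1000)]

print(len(fit_to_max_length(long_input)))  # 512
windows = split_into_windows(long_input)
print(len(windows), max(len(w) for w in windows))  # 5 512
```

Truncation is simpler but discards everything past the limit; overlapping windows keep all of the text at the cost of running the model several times per input.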
License to use this model is covered by the NGC TERMS OF USE unless another License/Terms Of Use/EULA is clearly specified. By downloading the public and release version of the model, you accept the terms and conditions of the NGC TERMS OF USE.