
NER En Bert



Named Entity Recognition model with BERT



Use Case

Named Entity Recognition


PyTorch with NeMo

Latest Version



June 21, 2022


388.4 MB

Model Overview

The TokenClassification Model in TAO supports Named Entity Recognition (NER) and other token-level classification tasks, as long as the data follows the format specified below. This model card will focus on the NER task.

Named entity recognition (NER), also referred to as entity chunking, identification, or extraction, is the task of detecting and classifying key information (entities) in text. In other words, an NER model takes a piece of text as input and, for each word in the text, identifies the category the word belongs to. For example, in the sentence "Mary lives in Santa Clara and works at NVIDIA", the model should detect that Mary is a person, Santa Clara is a location, and NVIDIA is a company.

Trained or fine-tuned NeMo models (with the file extension .nemo) can be converted to Riva models (with the file extension .riva) and then deployed. Here is a pre-trained Riva model for NER with BERT in English.

Model Architecture

The current version of the Named Entity Recognition model consists of the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model [1] followed by a token classification head. All model parameters are jointly fine-tuned on the downstream task. More specifically, an input text is fed to the BERT model, and then the hidden representation of each token is passed to the classification layer(s), which predict a label for that token.
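
Because the model outputs a label for every token, the classification head is applied at each token position. The following is a minimal sketch of that idea in plain Python, with lists standing in for tensors. The toy hidden size, the random weights, the three-label set, and the helper names linear and classify_tokens are assumptions made for illustration; they are not the NeMo implementation.

```python
import random

def linear(vector, weights, bias):
    """Apply one dense layer: returns weights @ vector + bias as a list."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

def classify_tokens(hidden_states, weights, bias, labels):
    """Score every token's hidden vector and pick the best label for each."""
    predictions = []
    for vector in hidden_states:
        logits = linear(vector, weights, bias)
        best = max(range(len(logits)), key=logits.__getitem__)
        predictions.append(labels[best])
    return predictions

# Toy sizes: BERT base uses 768-dimensional hidden states; 4 keeps this readable.
random.seed(0)
hidden_size = 4
labels = ["O", "B-PER", "B-LOC"]  # assumed subset of the label set
weights = [[random.uniform(-1, 1) for _ in range(hidden_size)] for _ in labels]
bias = [0.0] * len(labels)

# Stand-in for the encoder output: one hidden vector per input token.
hidden_states = [[random.uniform(-1, 1) for _ in range(hidden_size)]
                 for _ in range(5)]

token_labels = classify_tokens(hidden_states, weights, bias, labels)
print(token_labels)  # one label per token
```

The point of the sketch is the shape of the computation: five token vectors go in and five labels come out, one per position.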


The model was trained starting from the NeMo BERT base uncased checkpoint.


The model was trained on the GMB (Groningen Meaning Bank) corpus for entity recognition. GMB is a fairly large corpus with many annotations. Note that GMB is not completely human-annotated and is not considered 100% correct. The data is labeled using the IOB format (short for inside, outside, beginning): a B- prefix marks the first word of an entity, an I- prefix marks a word inside an entity, and O marks a word outside any entity. The following classes appear in the dataset:

  • LOC = Geographical Entity
  • ORG = Organization
  • PER = Person
  • GPE = Geopolitical Entity
  • TIME = Time indicator
  • ART = Artifact
  • EVE = Event
  • NAT = Natural Phenomenon
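
To make the IOB format concrete, here is a small sketch that groups IOB tags into entity spans. The tags below were written by hand for the example sentence and are not model output; the function name iob_to_spans is hypothetical.

```python
def iob_to_spans(words, tags):
    """Group IOB tags into (entity_text, entity_type) spans."""
    spans = []
    current_words, current_type = [], None
    # A trailing "O" sentinel flushes an entity that ends the sentence.
    for word, tag in zip(list(words) + [""], list(tags) + ["O"]):
        prefix, _, entity_type = tag.partition("-")
        if prefix == "I" and entity_type == current_type:
            current_words.append(word)  # continue the open entity
            continue
        if current_words:  # any other tag closes the open entity
            spans.append((" ".join(current_words), current_type))
        if prefix in ("B", "I"):  # a stray "I-" is treated as a new entity
            current_words, current_type = [word], entity_type
        else:
            current_words, current_type = [], None
    return spans

words = "Mary lives in Santa Clara and works at NVIDIA".split()
# Hand-written illustrative tags, one per word.
tags = ["B-PER", "O", "O", "B-LOC", "I-LOC", "O", "O", "O", "B-ORG"]

spans = iob_to_spans(words, tags)
print(spans)
# [('Mary', 'PER'), ('Santa Clara', 'LOC'), ('NVIDIA', 'ORG')]
```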

For this model, the classes ART, EVE, and NAT were combined into a single MISC class because each of them has only a small number of examples.
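
The class merge can be pictured as a simple tag remapping. This is only a sketch of the idea; the actual pre-processing script is not shown here, and the names RARE_CLASSES and merge_rare are hypothetical.

```python
# Classes folded into MISC, as listed above.
RARE_CLASSES = {"ART", "EVE", "NAT"}

def merge_rare(tag):
    """Map B-/I- tags of rare classes to MISC; leave other tags unchanged."""
    prefix, sep, entity_type = tag.partition("-")
    if sep and entity_type in RARE_CLASSES:
        return f"{prefix}-MISC"  # keep the B-/I- prefix, swap the class
    return tag

tags = ["B-ART", "I-ART", "O", "B-EVE", "B-PER", "I-NAT"]
merged = [merge_rare(tag) for tag in tags]
print(merged)
# ['B-MISC', 'I-MISC', 'O', 'B-MISC', 'B-PER', 'I-MISC']
```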

The pre-processed data that was used for training and evaluation can be found here.


Each word in the input sequence can be split into one or more tokens. As a result, there are two possible ways to evaluate the model:

  • marking the whole entity with a single label
  • performing evaluation at the sub-token level

Here, the first approach was applied: the prediction for the first token of each word was used to label the whole word. Because the classes are highly imbalanced, the suggested metric for this model is the F1 score (with macro averaging).
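
The first-token rule can be sketched as follows. The sub-token split and the per-sub-token labels are made up for illustration (real splits depend on the tokenizer vocabulary), and word_level_labels is a hypothetical helper, not a NeMo function.

```python
def word_level_labels(subtokens_per_word, subtoken_labels):
    """Keep the prediction of each word's first sub-token."""
    labels = []
    position = 0
    for pieces in subtokens_per_word:
        labels.append(subtoken_labels[position])  # first sub-token decides
        position += len(pieces)  # skip the remaining pieces of this word
    return labels

# Hypothetical WordPiece-style split of the example sentence.
subtokens_per_word = [
    ["mary"], ["lives"], ["in"], ["santa"], ["clara"],
    ["and"], ["works"], ["at"], ["nv", "##idi", "##a"],
]
# Made-up predictions, one per sub-token (11 in total).
subtoken_labels = [
    "B-PER", "O", "O", "B-LOC", "I-LOC",
    "O", "O", "O", "B-ORG", "I-ORG", "O",
]

labels = word_level_labels(subtokens_per_word, subtoken_labels)
print(labels)
# ['B-PER', 'O', 'O', 'B-LOC', 'I-LOC', 'O', 'O', 'O', 'B-ORG']
```

Note that the predictions for the second and third pieces of the last word are ignored, so a disagreement among sub-tokens does not affect the word-level score.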

Evaluation on the GMB dataset dev set:

                            precision    recall  f1-score   support
          O (label id: 0)     0.9913    0.9917    0.9915    131141
      B-GPE (label id: 1)     0.9574    0.9420    0.9496      2362
      B-LOC (label id: 2)     0.8402    0.9037    0.8708      5346
     B-MISC (label id: 3)     0.4124    0.3077    0.3524       130
      B-ORG (label id: 4)     0.7732    0.6805    0.7239      2980
      B-PER (label id: 5)     0.8335    0.8510    0.8422      2577
     B-TIME (label id: 6)     0.9176    0.9133    0.9154      2975
      I-GPE (label id: 7)     0.8889    0.3478    0.5000        23
      I-LOC (label id: 8)     0.7782    0.7835    0.7808      1030
     I-MISC (label id: 9)     0.3036    0.2267    0.2595        75
     I-ORG (label id: 10)     0.7712    0.7466    0.7587      2384
     I-PER (label id: 11)     0.8710    0.8820    0.8765      2687
    I-TIME (label id: 12)     0.8255    0.8273    0.8264       938
                 accuracy                         0.9689    154648
                macro avg     0.7818    0.7234    0.7421    154648
             weighted avg     0.9685    0.9689    0.9686    154648
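
The two averaged F1 values can be recomputed from the per-class rows, which also shows why macro averaging is the more informative number here: every class counts equally, so the weak MISC and I-GPE scores are not hidden by the very large O class. The sketch below uses only the F1 and support columns copied from the table.

```python
# (label, f1, support) copied from the per-class rows of the table.
rows = [
    ("O", 0.9915, 131141),
    ("B-GPE", 0.9496, 2362),
    ("B-LOC", 0.8708, 5346),
    ("B-MISC", 0.3524, 130),
    ("B-ORG", 0.7239, 2980),
    ("B-PER", 0.8422, 2577),
    ("B-TIME", 0.9154, 2975),
    ("I-GPE", 0.5000, 23),
    ("I-LOC", 0.7808, 1030),
    ("I-MISC", 0.2595, 75),
    ("I-ORG", 0.7587, 2384),
    ("I-PER", 0.8765, 2687),
    ("I-TIME", 0.8264, 938),
]

# Macro average: unweighted mean over classes.
macro_f1 = sum(f1 for _, f1, _ in rows) / len(rows)

# Weighted average: each class weighted by its support.
total_support = sum(support for _, _, support in rows)
weighted_f1 = sum(f1 * support for _, f1, support in rows) / total_support

print(f"macro F1:    {macro_f1:.4f}")     # 0.7421
print(f"weighted F1: {weighted_f1:.4f}")  # 0.9686
```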

How to use this model

The model is available for use in the NeMo toolkit [2], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.


The model takes text as input.


The model outputs category labels for each token in the input text.

Automatically load the model from NGC

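import nemo.collections.nlp as nemo_nlp
# Download the pre-trained checkpoint from NGC and restore the model.
model = nemo_nlp.models.TokenClassificationModel.from_pretrained(model_name="ner_en_bert")
# Annotate each query with the predicted entity labels.
queries = ['we bought four shirts from the nvidia gear store in santa clara.', 'NVIDIA is a company.']
results = model.add_predictions(queries)
for query, result in zip(queries, results):
    print(query, '->', result)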
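
The annotated strings can be post-processed with a small parser. The sketch below assumes that each labeled word is followed by its label in square brackets (for example santa[B-LOC]); check this against the output of your NeMo version. The sample string is hand-written for illustration and is not actual model output, and extract_tagged_words is a hypothetical helper.

```python
import re

# Assumed output convention: a word immediately followed by "[B-XXX]" or "[I-XXX]".
TAGGED_WORD = re.compile(r"(\S+?)\[([BI]-[A-Z]+)\]")

def extract_tagged_words(annotated):
    """Return (word, label) pairs for every labeled word in the string."""
    return TAGGED_WORD.findall(annotated)

# Hand-written sample in the assumed format.
sample = ("we bought four shirts from the nvidia[B-LOC] gear store "
          "in santa[B-LOC] clara[I-LOC].")

pairs = extract_tagged_words(sample)
print(pairs)
# [('nvidia', 'B-LOC'), ('santa', 'B-LOC'), ('clara', 'I-LOC')]
```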


The length of the input text is currently constrained by the maximum sequence length of the BERT base uncased model, which is 512 tokens after tokenization.
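
One simple way to stay under the 512-token limit is to split long inputs into smaller pieces before inference. The sketch below splits on whitespace, which is only a rough proxy: a word may become several tokens after tokenization, and special tokens also take up positions, so the word budget should be well below 512. The value 128 and the helper name chunk_words are arbitrary assumptions, and an entity that straddles a chunk boundary may be labeled incorrectly.

```python
def chunk_words(text, max_words):
    """Split text into consecutive chunks of at most max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [" ".join(words[start:start + max_words])
            for start in range(0, len(words), max_words)]

# A synthetic 300-word input stands in for a long document.
long_text = " ".join(f"w{i}" for i in range(300))

chunks = chunk_words(long_text, max_words=128)  # 128 is an assumed budget
print([len(chunk.split()) for chunk in chunks])
# [128, 128, 44]
```

Each chunk can then be passed to the model as a separate query.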


[1] Devlin, Jacob, et al. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding." arXiv preprint arXiv:1810.04805 (2018).

[2] NVIDIA NeMo Toolkit


License to use this model is covered by the NGC TERMS OF USE unless another License/Terms Of Use/EULA is clearly specified. By downloading the public and release version of the model, you accept the terms and conditions of the NGC TERMS OF USE.