The TokenClassification Model in TAO supports named entity recognition (NER) and other token-level classification tasks, as long as the data follows the format specified below. This model card focuses on the NER task.
Named entity recognition (NER), also referred to as entity chunking, identification or extraction, is the task of detecting and classifying key information (entities) in text. In other words, a NER model takes a piece of text as input and for each word in the text, the model identifies a category the word belongs to.
For example, in the sentence "Mary lives in Santa Clara and works at NVIDIA", the model should detect that "Mary" is a person, "Santa Clara" is a location, and "NVIDIA" is a company.
The primary use case of this model is to identify entities in a given input text. The model supports the following categories:
Input of the model: Text
Output of the model: Category labels for each token in the input text.
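To make the input and output concrete, here is a minimal sketch of a per-word labeling of the example sentence and of how the labels group into entities. The tags used (B-PER, B-LOC, I-LOC, B-ORG) are written by hand for illustration, following the IOB label names that appear in the evaluation tables; they are not actual model output, and `group_entities` is a hypothetical helper, not part of TAO.

```python
# Illustrative only: hand-written labels, not real model predictions.
sentence = "Mary lives in Santa Clara and works at NVIDIA"
words = sentence.split()

# One label per word, using the IOB scheme (assumed tags for this example).
labels = ["B-PER", "O", "O", "B-LOC", "I-LOC", "O", "O", "O", "B-ORG"]

def group_entities(words, labels):
    """Merge B-/I- tagged words into (entity_text, category) spans."""
    entities, current, category = [], [], None
    for word, label in zip(words, labels):
        if label.startswith("B-"):
            # A new entity starts; close the previous one if any.
            if current:
                entities.append((" ".join(current), category))
            current, category = [word], label[2:]
        elif label.startswith("I-") and current and label[2:] == category:
            # Continuation of the entity currently being built.
            current.append(word)
        else:
            # "O" (or an inconsistent I- tag) ends the current entity.
            if current:
                entities.append((" ".join(current), category))
            current, category = [], None
    if current:
        entities.append((" ".join(current), category))
    return entities

entities = group_entities(words, labels)
print(entities)
```

The grouping step is a common post-processing convenience; the model itself only emits one label per token.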
The current version of the Named Entity Recognition model consists of the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model followed by a token classification head. All model parameters are jointly fine-tuned on the downstream task. More specifically, an input text is fed to the BERT model, and then the output representation of each token in the sequence is passed to the classification layer(s), which predict a label for that token.
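The idea of a token classification head can be sketched in a few lines, assuming the encoder yields one hidden vector per token. Everything below is a toy stand-in: the 2-dimensional vectors, the weights, and the shortened label set are made up for illustration (a real BERT base encoder produces much larger hidden vectors), and this is not the TAO implementation. The point is only that the same linear layer is applied to every token position independently.

```python
# Toy sketch of a token classification head (not the TAO implementation).
LABELS = ["O", "B-PER", "B-LOC"]  # hypothetical, shortened label set

def linear(vector, weights, bias):
    """Compute weights @ vector + bias for one token's hidden vector."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

def classify_tokens(hidden_states, weights, bias):
    """Apply the same linear layer to each token and take the argmax."""
    predictions = []
    for vector in hidden_states:
        logits = linear(vector, weights, bias)
        best = max(range(len(logits)), key=logits.__getitem__)
        predictions.append(LABELS[best])
    return predictions

# Stand-in for encoder output: one 2-dimensional vector per token.
hidden_states = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
weights = [[-1.0, -1.0],   # row producing the logit for "O"
           [2.0, 0.0],     # row producing the logit for "B-PER"
           [0.0, 2.0]]     # row producing the logit for "B-LOC"
bias = [0.5, 0.0, 0.0]

predicted = classify_tokens(hidden_states, weights, bias)
print(predicted)
```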
The length of the input text is currently constrained by the maximum sequence length of the BERT base uncased model, which is 512 tokens after tokenization.
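One simple way to respect the 512-token limit is to truncate the tokenized input. The sketch below assumes that the special [CLS] and [SEP] tokens used by BERT count toward the limit and that plain truncation is acceptable for the use case; `truncate_tokens` is a hypothetical helper, and TAO's own handling of long inputs may differ.

```python
MAX_SEQ_LENGTH = 512  # limit stated above for BERT base uncased

def truncate_tokens(tokens, max_seq_length=MAX_SEQ_LENGTH):
    """Keep as many tokens as fit once [CLS] and [SEP] are added."""
    budget = max_seq_length - 2  # reserve room for the two special tokens
    return ["[CLS]"] + tokens[:budget] + ["[SEP]"]

long_input = ["tok"] * 600          # hypothetical, already-tokenized input
model_input = truncate_tokens(long_input)
print(len(model_input))
```

Tokens beyond the budget are dropped and receive no label, so long documents are usually split into shorter segments before inference.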
The model was trained on the GMB (Groningen Meaning Bank) corpus for entity recognition. The GMB dataset is a fairly large corpus with a lot of annotations. Note that GMB is not completely human-annotated and is not considered 100% correct. The data is labeled using the IOB format (short for inside, outside, beginning). The following classes appear in the dataset:
For this model, the classes ART, EVE, and NAT were combined into a MISC class due to the small number of examples for these classes.
The pre-processed data that was used for training and evaluation can be found here.
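The class merging described above can be sketched as a simple label remapping that keeps the IOB prefix and replaces the category. This assumes the tags are spelled as an uppercase category after a `B-` or `I-` prefix; the raw GMB files may use a different spelling, and `merge_rare_classes` is a hypothetical helper rather than the pre-processing script actually used.

```python
MERGED = {"ART", "EVE", "NAT"}  # classes folded into MISC, as described above

def merge_rare_classes(label):
    """Map e.g. 'B-ART' -> 'B-MISC'; leave 'O' and other classes unchanged."""
    if label == "O":
        return label
    prefix, _, category = label.partition("-")
    if category.upper() in MERGED:
        return f"{prefix}-MISC"
    return label

original = ["O", "B-ART", "I-ART", "B-PER", "B-EVE", "I-NAT"]  # made-up sample
remapped = [merge_rare_classes(label) for label in original]
print(remapped)
```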
Evaluation results with the BERT base uncased model. Each word in the input sequence can be split into one or more tokens. As a result, there are two possible ways to evaluate the model: (1) mark each whole word with a single label, or (2) evaluate at the sub-token level.
Here, the first approach was applied: the prediction for the first token of each word was used to label the whole word. Because the classes are highly imbalanced, the suggested metric for this model is the F1 score (with macro averaging).
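The first-token approach can be sketched as follows. The sub-token split and the per-token predictions are made up for illustration (they are not output from the actual tokenizer or model), and `word_labels_from_subtokens` is a hypothetical helper.

```python
def word_labels_from_subtokens(subtoken_counts, subtoken_predictions):
    """Label each word with the prediction of its first sub-token.

    subtoken_counts[i] is how many sub-tokens word i was split into.
    """
    labels, position = [], 0
    for count in subtoken_counts:
        labels.append(subtoken_predictions[position])
        position += count  # skip the remaining sub-tokens of this word
    return labels

# Hypothetical WordPiece-style split: "NVIDIA" -> ["nv", "##id", "##ia"]
words = ["works", "at", "NVIDIA"]
subtoken_counts = [1, 1, 3]
subtoken_predictions = ["O", "O", "B-ORG", "I-ORG", "O"]  # made-up outputs
word_level = word_labels_from_subtokens(subtoken_counts, subtoken_predictions)
print(list(zip(words, word_level)))
```

Predictions on the second and later sub-tokens of a word are ignored, so an inconsistent prediction there (such as the trailing "O" above) does not affect the word-level score.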
Evaluation on the GMB dataset dev set:
label                  precision    recall  f1-score   support
O (label id: 0)           0.9913    0.9917    0.9915    131141
B-GPE (label id: 1)       0.9574    0.9420    0.9496      2362
B-LOC (label id: 2)       0.8402    0.9037    0.8708      5346
B-MISC (label id: 3)      0.4124    0.3077    0.3524       130
B-ORG (label id: 4)       0.7732    0.6805    0.7239      2980
B-PER (label id: 5)       0.8335    0.8510    0.8422      2577
B-TIME (label id: 6)      0.9176    0.9133    0.9154      2975
I-GPE (label id: 7)       0.8889    0.3478    0.5000        23
I-LOC (label id: 8)       0.7782    0.7835    0.7808      1030
I-MISC (label id: 9)      0.3036    0.2267    0.2595        75
I-ORG (label id: 10)      0.7712    0.7466    0.7587      2384
I-PER (label id: 11)      0.8710    0.8820    0.8765      2687
I-TIME (label id: 12)     0.8255    0.8273    0.8264       938

accuracy                                      0.9689    154648
macro avg                 0.7818    0.7234    0.7421    154648
weighted avg              0.9685    0.9689    0.9686    154648
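To see why macro averaging is the suggested metric under class imbalance, the two aggregate F1 scores can be recomputed from the per-class rows of the GMB table. The sketch below uses only numbers copied from that table; small differences from the reported aggregates can arise because the per-class values are rounded to four decimals.

```python
# (f1, support) pairs copied from the GMB dev set table, label ids 0..12.
rows = [
    (0.9915, 131141), (0.9496, 2362), (0.8708, 5346), (0.3524, 130),
    (0.7239, 2980), (0.8422, 2577), (0.9154, 2975), (0.5000, 23),
    (0.7808, 1030), (0.2595, 75), (0.7587, 2384), (0.8765, 2687),
    (0.8264, 938),
]

# Macro average: every class counts equally, regardless of its size.
macro_f1 = sum(f1 for f1, _ in rows) / len(rows)

# Weighted average: classes count in proportion to their support.
total = sum(support for _, support in rows)
weighted_f1 = sum(f1 * support for f1, support in rows) / total

print(f"macro F1:    {macro_f1:.4f}")
print(f"weighted F1: {weighted_f1:.4f}")
```

The weighted average is dominated by the O class, which accounts for most of the support, while the macro average exposes the weaker performance on rare classes such as MISC.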
Evaluation on the CoNLL-2003 dev set (https://www.clips.uantwerpen.be/conll2003/ner/):
label                  precision    recall  f1-score   support
O (label id: 0)           0.9834    0.9313    0.9566     42759
B-GPE (label id: 1)       0.0000    0.0000    0.0000         0
B-LOC (label id: 2)       0.7847    0.7757    0.7802      1837
B-MISC (label id: 3)      0.2917    0.0304    0.0550       922
B-ORG (label id: 4)       0.5664    0.5884    0.5772      1341
B-PER (label id: 5)       0.8591    0.7845    0.8201      1842
B-TIME (label id: 6)      0.0000    0.0000    0.0000         0
I-GPE (label id: 7)       0.0000    0.0000    0.0000         0
I-LOC (label id: 8)       0.4624    0.6226    0.5307       257
I-MISC (label id: 9)      0.3605    0.0896    0.1435       346
I-ORG (label id: 10)      0.4510    0.6618    0.5364       751
I-PER (label id: 11)      0.8800    0.8585    0.8691      1307
I-TIME (label id: 12)     0.0000    0.0000    0.0000         0

accuracy                                      0.8823     51362
macro avg                 0.4338    0.4110    0.4053     51362
weighted avg              0.9313    0.8823    0.9034     51362
Note that some labels of the GMB dataset are missing from the CoNLL dataset, which leads to the multiple zeros in the table above and lowers the aggregated metrics (the three lines at the bottom of the table).
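The effect of the missing labels on the macro average can be checked with the per-class F1 values from the CoNLL-2003 table. The second number below is an illustrative recomputation that skips classes with zero support; it is not a result reported for this model.

```python
# (f1, support) pairs copied from the CoNLL-2003 table, label ids 0..12.
rows = [
    (0.9566, 42759), (0.0000, 0), (0.7802, 1837), (0.0550, 922),
    (0.5772, 1341), (0.8201, 1842), (0.0000, 0), (0.0000, 0),
    (0.5307, 257), (0.1435, 346), (0.5364, 751), (0.8691, 1307),
    (0.0000, 0),
]

# Macro F1 as reported: all 13 classes, including those with no examples.
macro_all = sum(f1 for f1, _ in rows) / len(rows)

# Macro F1 restricted to classes that actually occur in the CoNLL dev set.
present = [f1 for f1, support in rows if support > 0]
macro_present = sum(present) / len(present)

print(f"macro F1 over all {len(rows)} classes:   {macro_all:.4f}")
print(f"macro F1 over {len(present)} present classes: {macro_present:.4f}")
```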
This pre-trained model needs to be used with NVIDIA Hardware and Software.
This model can be used with the Train Adapt Optimize (TAO) Toolkit with the model load key: tlt_encode (make sure to use this as the key for all TAO commands that require a model load key).
For more details on how to prepare your data for this model and how to fine-tune the model with TAO, see the TAO user guide for the Token Classification model.
By downloading and using the models and resources packaged with TAO Conversational AI, you accept the terms of the Riva license.
NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.