BioBERT TF checkpoint (Base, Uncased, NER, ChemProt, AMP)

BioBERT Base Uncased checkpoint, fine-tuned for named entity recognition (NER) on the BC5CDR chemical dataset.
NVIDIA Deep Learning Examples
Latest Version
April 4, 2023
1.23 GB

Model Overview

BERT for biomedical text-mining.

Model Architecture

In the original paper, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", pre-training is done on Wikipedia and Books Corpus, and state-of-the-art results are demonstrated on the SQuAD (Stanford Question Answering Dataset) benchmark.

Meanwhile, many works, including BioBERT, SciBERT, NCBI-BERT, ClinicalBERT (MIT), ClinicalBERT (NYU, Princeton), and others presented at the BioNLP'19 workshop, show that additional pre-training of BERT on a large biomedical text corpus such as PubMed improves performance on biomedical text-mining tasks.

This repository provides scripts and a recipe for adapting the NVIDIA BERT code base to achieve state-of-the-art results on biomedical text-mining benchmark tasks. This checkpoint targets one of them: named entity recognition on the BC5CDR chemical dataset.


This model was trained using the script available on NGC and in the GitHub repository.


The following datasets were used to train this model:

  • PubMed - A database containing more than 33 million citations and abstracts of biomedical literature.
  • BioCreative V CDR - A dataset of 1,500 PubMed articles with 4,409 annotated chemicals, 5,818 annotated diseases, and 3,116 chemical-disease interactions.
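
To make the NER task on the CDR chemical annotations more concrete, here is a minimal Python sketch of how per-token predictions could be turned into chemical mentions. It assumes the labels follow a B/I/O scheme (B = first token of a mention, I = continuation, O = outside); this page does not state the label set, so treat that as an assumption. `extract_entities` is a hypothetical helper written for illustration and is not part of the NVIDIA code base. The sentence and tags are hand-written examples, not model output.

```python
from typing import List, Tuple


def extract_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group parallel token / B-I-O tag lists into (text, start, end) spans.

    `end` is exclusive. An "I" tag with no open mention is treated
    leniently as the start of a new mention.
    """
    entities: List[Tuple[str, int, int]] = []
    start = None  # index of the first token of the mention being built
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # Close any open mention before starting a new one.
            if start is not None:
                entities.append((" ".join(tokens[start:i]), start, i))
            start = i
        elif tag == "O" and start is not None:
            entities.append((" ".join(tokens[start:i]), start, i))
            start = None
    # Flush a mention that runs to the end of the sentence.
    if start is not None:
        entities.append((" ".join(tokens[start:]), start, len(tokens)))
    return entities


# Illustrative input (assumed labels, not real predictions).
tokens = ["Naloxone", "reverses", "the", "effect", "of", "clonidine", "."]
tags = ["B", "O", "O", "O", "O", "B", "O"]
print(extract_entities(tokens, tags))
```

In practice the tokens would come from the WordPiece tokenizer and the tags from the fine-tuned checkpoint's output layer; merging sub-word pieces back into words is left out of this sketch.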


Performance numbers for this model are available in NGC.



This model was trained using open-source software available in the Deep Learning Examples repository. For terms of use, please refer to the licenses of the script and of the datasets the model was derived from.