BioBERT for TensorFlow
Description: BERT for biomedical text-mining
Publisher: NVIDIA
Latest Version: -
Modified: April 4, 2023
Compressed Size: 0 B

This resource is a subproject of bert_for_tensorflow. Visit the parent project to download the code and get more information about the setup.

In the original BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding paper, pre-training is done on Wikipedia and BooksCorpus, with state-of-the-art results demonstrated on the SQuAD (Stanford Question Answering Dataset) benchmark.

Meanwhile, many works, including BioBERT, SciBERT, NCBI-BERT, ClinicalBERT (MIT), ClinicalBERT (NYU, Princeton), and others presented at the BioNLP'19 workshop, show that additional pre-training of BERT on a large biomedical text corpus such as PubMed yields better performance on biomedical text-mining tasks.

This repository provides scripts and a recipe to adapt the NVIDIA BERT code base to achieve state-of-the-art results on the following biomedical text-mining benchmark tasks:

  • BC5CDR-disease: a Named-Entity-Recognition (NER) task to recognize diseases mentioned in a collection of 1500 PubMed titles and abstracts (Li et al., 2016); see the token-classification sketch after this list.

  • BC5CDR-chemical: a Named-Entity-Recognition task to recognize chemicals mentioned in the same collection of 1500 PubMed titles and abstracts (Li et al., 2016).

  • ChemProt: a Relation-Extraction task to determine chemical-protein interactions in a collection of 1820 PubMed abstracts (Krallinger et al., 2017); see the sequence-classification sketch after this list.
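
For the two NER tasks, fine-tuning amounts to placing a token-level classification head on top of BERT's per-token outputs. The sketch below is a minimal TF2/Keras illustration, not the repo's actual (TF1) fine-tuning script: the encoder is stubbed out with a random tensor standing in for BERT's sequence output, and the three-label BIO tag set for the disease task is an assumption.

```python
# Minimal sketch of a token-classification (NER) head, assuming a
# BERT-Base encoder (hidden size 768). The encoder itself is stubbed
# with random values; in practice sequence_output would come from the
# pre-trained BioBERT checkpoint.
import tensorflow as tf

NUM_LABELS = 3            # assumed BIO tag set: B-Disease, I-Disease, O
HIDDEN_SIZE = 768         # BERT-Base hidden size
BATCH, SEQ_LEN = 8, 128

# Stand-in for BERT's per-token representations: [batch, seq, hidden]
sequence_output = tf.random.normal([BATCH, SEQ_LEN, HIDDEN_SIZE])

# One dense projection per token position, giving logits over the tag set
dropout = tf.keras.layers.Dropout(0.1)
classifier = tf.keras.layers.Dense(NUM_LABELS)
logits = classifier(dropout(sequence_output, training=True))  # [batch, seq, labels]

# Cross-entropy averaged over real (non-padding) tokens only
labels = tf.random.uniform([BATCH, SEQ_LEN], maxval=NUM_LABELS, dtype=tf.int32)
mask = tf.ones([BATCH, SEQ_LEN])  # 1.0 for real tokens, 0.0 for padding
per_token = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels,
                                                           logits=logits)
loss = tf.reduce_sum(per_token * mask) / tf.reduce_sum(mask)
```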
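
ChemProt relation extraction, by contrast, is typically cast as whole-sequence classification: the pooled [CLS] representation feeds a single softmax over the relation classes. The sketch below makes the same simplifying assumptions as above (stubbed encoder, TF2/Keras style); the six-class label count (five ChemProt relation types plus a "no relation" class) is likewise an assumption.

```python
# Minimal sketch of a sequence-classification (relation extraction) head.
# pooled_output stands in for BERT's per-example [CLS] vector.
import tensorflow as tf

NUM_CLASSES = 6          # assumed: five ChemProt relation types + "no relation"
HIDDEN_SIZE = 768        # BERT-Base hidden size
BATCH = 8

pooled_output = tf.random.normal([BATCH, HIDDEN_SIZE])   # stubbed [CLS] vectors
logits = tf.keras.layers.Dense(NUM_CLASSES)(
    tf.keras.layers.Dropout(0.1)(pooled_output, training=True))

# One label per abstract-level example, not per token
labels = tf.random.uniform([BATCH], maxval=NUM_CLASSES, dtype=tf.int32)
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
```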