BioBERT for TensorFlow
Description: BERT for biomedical text-mining
Publisher: NVIDIA
Latest Version: -
Modified: April 4, 2023
Compressed Size: 0 B

This resource is a subproject of bert_for_tensorflow. Visit the parent project to download the code and get more information about the setup.

In the original BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding paper, pre-training is done on Wikipedia and BooksCorpus, with state-of-the-art results demonstrated on the SQuAD (Stanford Question Answering Dataset) benchmark.

Meanwhile, many works, including BioBERT, SciBERT, NCBI-BERT, ClinicalBERT (MIT), ClinicalBERT (NYU, Princeton), and others presented at the BioNLP'19 workshop, show that additional pre-training of BERT on a large biomedical text corpus such as PubMed yields better performance on biomedical text-mining tasks.

This repository provides scripts and a recipe to adapt the NVIDIA BERT code base to achieve state-of-the-art results on the following biomedical text-mining benchmark tasks:

  • BC5CDR-disease: a Named-Entity-Recognition (NER) task to recognize diseases mentioned in a collection of 1500 PubMed titles and abstracts (Li et al., 2016); see the token-classification sketch after this list.

  • BC5CDR-chemical: a Named-Entity-Recognition task to recognize chemicals mentioned in the same collection of 1500 PubMed titles and abstracts (Li et al., 2016).

  • ChemProt: a Relation-Extraction task to determine chemical-protein interactions in a collection of 1820 PubMed abstracts (Krallinger et al., 2017); see the sequence-classification sketch after this list.
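
For the two NER tasks, fine-tuning amounts to placing a token-level classification head on top of BERT's per-token outputs. The sketch below is a minimal TF2/Keras illustration, not the repo's actual (TF1) fine-tuning script: the encoder is stubbed out with a random tensor standing in for BERT's sequence output, and the three-label BIO tag set for the disease task is an assumption.

```python
# Minimal sketch of a token-classification (NER) head, assuming a
# BERT-Base encoder (hidden size 768). The encoder itself is stubbed
# with random values; in practice sequence_output would come from the
# pre-trained BioBERT checkpoint.
import tensorflow as tf

NUM_LABELS = 3            # assumed BIO tag set: B-Disease, I-Disease, O
HIDDEN_SIZE = 768         # BERT-Base hidden size
BATCH, SEQ_LEN = 8, 128

# Stand-in for BERT's per-token representations: [batch, seq, hidden]
sequence_output = tf.random.normal([BATCH, SEQ_LEN, HIDDEN_SIZE])

# One dense projection per token position, giving logits over the tag set
dropout = tf.keras.layers.Dropout(0.1)
classifier = tf.keras.layers.Dense(NUM_LABELS)
logits = classifier(dropout(sequence_output, training=True))  # [batch, seq, labels]

# Cross-entropy averaged over real (non-padding) tokens only
labels = tf.random.uniform([BATCH, SEQ_LEN], maxval=NUM_LABELS, dtype=tf.int32)
mask = tf.ones([BATCH, SEQ_LEN])  # 1.0 for real tokens, 0.0 for padding
per_token = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels,
                                                           logits=logits)
loss = tf.reduce_sum(per_token * mask) / tf.reduce_sum(mask)
```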
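
ChemProt relation extraction, by contrast, is typically cast as whole-sequence classification: the pooled [CLS] representation feeds a single softmax over the relation classes. The sketch below makes the same simplifying assumptions as above (stubbed encoder, TF2/Keras style); the six-class label count (five ChemProt relation types plus a "no relation" class) is likewise an assumption.

```python
# Minimal sketch of a sequence-classification (relation extraction) head.
# pooled_output stands in for BERT's per-example [CLS] vector.
import tensorflow as tf

NUM_CLASSES = 6          # assumed: five ChemProt relation types + "no relation"
HIDDEN_SIZE = 768        # BERT-Base hidden size
BATCH = 8

pooled_output = tf.random.normal([BATCH, HIDDEN_SIZE])   # stubbed [CLS] vectors
logits = tf.keras.layers.Dense(NUM_CLASSES)(
    tf.keras.layers.Dropout(0.1)(pooled_output, training=True))

# One label per abstract-level example, not per token
labels = tf.random.uniform([BATCH], maxval=NUM_CLASSES, dtype=tf.int32)
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
```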