
FasterTransformer BERT base SQuAD 1.1 TF QAT model


Description

BERT SQuAD 1.1 TensorFlow QAT checkpoint for FasterTransformer

Publisher

NVIDIA

Use Case

Natural Language Processing

Framework

TensorFlow

Latest Version

3.0

Modified

September 24, 2020

Size

1.29 GB


FasterTransformer is a highly optimized transformer implementation for inference, and it is tested and maintained by NVIDIA.

In NLP, the encoder and decoder are two important components, and the transformer layer has become a popular architecture for both. FasterTransformer implements a highly optimized transformer layer for both the encoder and decoder for inference. On Ampere, Volta, and Turing GPUs, the computing power of Tensor Cores is used automatically.

In FasterTransformer 3.0, we implemented INT8 quantization for the encoder (with support for the effective transformer). With INT8 quantization, we can take advantage of the powerful INT8 Tensor Cores in Turing and Ampere GPUs to achieve better inference performance. We also provide quantization tools for TensorFlow.
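To illustrate what INT8 quantization does to a tensor, the sketch below shows symmetric per-tensor quantization in plain NumPy: map the float range [-amax, amax] onto signed 8-bit integers [-127, 127]. This is a minimal illustration of the general technique, not FasterTransformer's actual kernels; the function names and the choice of per-tensor symmetric scaling are assumptions for the example.

```python
import numpy as np

def quantize_int8(x):
    # Symmetric per-tensor quantization: map [-amax, amax] to [-127, 127].
    # (Illustrative only; real toolchains often use per-channel scales.)
    amax = np.abs(x).max()
    scale = max(amax, 1e-8) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original float tensor.
    return q.astype(np.float32) * scale

x = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize(q, scale)
# Rounding limits the reconstruction error to at most half a quantization step.
err = np.abs(x - x_hat).max()
```

The reconstruction error is bounded by half the quantization step (scale / 2), which is the precision loss that quantization-aware training compensates for.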

Because of the precision loss of INT8 inference, quantization-aware training (QAT) is needed to obtain better accuracy. This model checkpoint was trained with the bert-tf-quantization tool to demonstrate FasterTransformer 3.0 INT8 inference.
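The core idea of quantization-aware training can be sketched in a few lines: during the forward pass, weights are passed through a "fake quantization" step (quantize then dequantize), while gradients flow through as if no quantization happened (the straight-through estimator). The toy regression below is an assumed minimal illustration of this idea, not the bert-tf-quantization tool itself.

```python
import numpy as np

def fake_quant(x, amax, num_bits=8):
    # Quantize to signed integers, then dequantize back to float,
    # so the training loss sees the quantization error.
    qmax = 2 ** (num_bits - 1) - 1
    scale = max(amax, 1e-8) / qmax
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

# Toy regression: fit y = 0.7 * x with a weight that is fake-quantized
# in the forward pass (hypothetical example, not the real training setup).
rng = np.random.default_rng(0)
x = rng.standard_normal(256)
y = 0.7 * x
w, lr = 0.0, 0.1
for _ in range(100):
    wq = fake_quant(w, amax=1.0)           # forward pass uses the quantized weight
    grad = 2 * np.mean((wq * x - y) * x)   # straight-through estimator: gradient
    w -= lr * grad                         # is applied to the float weight w
```

After training, the quantized weight lands on the INT8 grid point nearest the true value, so the model's accuracy under INT8 inference is close to the float baseline.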

Usage

Please refer to our GitHub repository. Details are in the "Run FasterTransformer for SQuAD 1.1 dataset" section of the 3.0 version directory.

Accuracy

TensorFlow baseline: F1 88.53%, EM 81.05%

TensorFlow with FasterTransformer op: F1 88.33%, EM 80.65% (tensorflow:19.07-py2 docker image)

TensorFlow with FasterTransformer op: F1 88.27%, EM 80.52% (tensorflow:20.03-tf1-py3 docker image)