SGD-QA model with BERT-Base-Cased backbone trained on Google SGD dataset
Model Overview
This is a cased dialogue state tracking model with a BERT [1] Base encoder fine-tuned on the Google SGD dataset [2].
The model is based on the architecture presented in the paper "SGD-QA: Fast Schema-Guided Dialogue State Tracking for Unseen Services" [3].
Model Architecture
SGD-QA is a multi-pass NLU model consisting of a BERT [1] encoder with multi-task heads. We use a pre-trained BERT model
with four classification heads and one span prediction head for slot value extraction. The SGD-QA model relies on a question
answering approach and uses the schema description as the query and the dialogue turn as the context.
Each input instance is intended for exactly one task, which is implemented by masking out the losses of the other tasks.
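As a rough illustration of the loss masking described above, the sketch below sums per-example losses while zeroing every head except the one the example was built for. The task names, the dictionary layout and the function `masked_total_loss` are assumptions made for this example; they are not taken from the SGD-QA implementation.

```python
# Hypothetical names for the four classification heads and the span head.
TASKS = ["intent", "requested_slot", "slot_status", "categorical_value", "span"]


def masked_total_loss(per_task_losses, task_ids):
    """Sum losses over a batch, keeping only each example's own task loss.

    per_task_losses: dict mapping task name -> list of per-example losses
    task_ids: list holding the task name each example belongs to
    """
    total = 0.0
    for i, task in enumerate(task_ids):
        for name in TASKS:
            mask = 1.0 if name == task else 0.0  # zero out every other head
            total += mask * per_task_losses[name][i]
    return total


# Two examples: the first is an intent example, the second a span example.
losses = {
    "intent": [0.5, 9.0],
    "requested_slot": [9.0, 9.0],
    "slot_status": [9.0, 9.0],
    "categorical_value": [9.0, 9.0],
    "span": [9.0, 0.25],
}
total = masked_total_loss(losses, ["intent", "span"])
print(total)
```

Only the intent loss of the first example and the span loss of the second contribute to the total; the large placeholder values of the other heads are ignored.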
Intent prediction, requested slot prediction and categorical slot value prediction are formulated as
binary sequence classification. The slot status prediction task is a 3-way sequence classification to
predict whether a slot is active, non-active ("none") or does not matter to the user ("dontcare"). Multiple slots can
become active in a single turn. Non-categorical slot value extraction extracts the values for non-categorical slots
by using a span-based prediction head, similar to SQuAD, where two token classification layers detect the start and
end positions of the slot value in the dialogue turn. Like the categorical slot value prediction task, its prediction
is only used when the slot status is active.
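The interplay between the slot status head and the span head can be sketched as follows. This is a simplified illustration, not the decoding used by the model: the label order in `STATUS_LABELS`, the function name `decode_noncategorical_slot` and the greedy start-then-end selection are all assumptions, and a real implementation may instead score start/end pairs jointly.

```python
STATUS_LABELS = ["none", "dontcare", "active"]  # assumed label order


def argmax(values):
    """Index of the largest value in a non-empty list."""
    return max(range(len(values)), key=lambda i: values[i])


def decode_noncategorical_slot(status_logits, start_logits, end_logits, tokens):
    """Return the slot value, "dontcare", or None if the slot is not active."""
    status = STATUS_LABELS[argmax(status_logits)]
    if status == "none":
        return None
    if status == "dontcare":
        return "dontcare"
    # The span prediction is only consulted when the slot status is active.
    start = argmax(start_logits)
    end = start + argmax(end_logits[start:])  # restrict the end to >= start
    return " ".join(tokens[start:end + 1])


tokens = ["book", "a", "table", "in", "san", "jose"]
value = decode_noncategorical_slot(
    [0.1, 0.2, 2.0],           # slot status logits: "active" wins
    [0.0, 0.0, 0.0, 0.0, 3.0, 1.0],  # start logits
    [0.0, 0.0, 0.0, 0.0, 1.0, 4.0],  # end logits
    tokens,
)
print(value)
```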
Training
The model is trained on the full Google SGD dataset [2]. We used an NVIDIA DGX-1 with 8 V100 GPUs.
Hyperparameters can be found in [3].
Dataset
The Google SGD dataset [2] is the biggest dataset for goal-oriented dialogue state tracking, with over 20k annotated
dialogues for 45 services spanning 20 domains. The SGD dataset defines an ontology, called schema, that contains
descriptions in natural language for all entities associated with a particular service. Slots are further classified
into non-categorical slots and categorical slots. For categorical slots, the schema also includes a list of possible
values. The user-system dialogues can be either single-domain or multi-domain, where a user can request two or more
services per dialogue. Each turn is labeled with relevant schema information, called dialogue state, comprising the
active intent, requested user slots and slot assignments that occurred throughout the dialogue. The SGD dataset is
designed to test services beyond those seen at training: 57% of the dev and 78% of the test dataset stem from unseen
services. Nevertheless, seen and unseen services can share similar slots and functionality.
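To make the distinction between categorical and non-categorical slots concrete, the sketch below parses a small schema entry and separates the two slot types. The field names (`service_name`, `slots`, `is_categorical`, `possible_values`) are assumed to match the public schema files in [2] and should be checked against them; the service, descriptions and values shown here are made up for illustration.

```python
import json

# Illustrative schema entry; field names are assumed, values are invented.
SCHEMA_JSON = """
{
  "service_name": "Restaurants_1",
  "description": "A service for finding and reserving restaurants",
  "slots": [
    {"name": "city", "description": "City where the restaurant is located",
     "is_categorical": false, "possible_values": []},
    {"name": "price_range", "description": "Price range of the restaurant",
     "is_categorical": true,
     "possible_values": ["inexpensive", "moderate", "expensive"]}
  ]
}
"""


def split_slots(service):
    """Return (categorical slot -> possible values, non-categorical slot names)."""
    categorical = {}
    noncategorical = []
    for slot in service["slots"]:
        if slot["is_categorical"]:
            categorical[slot["name"]] = slot["possible_values"]
        else:
            noncategorical.append(slot["name"])
    return categorical, noncategorical


service = json.loads(SCHEMA_JSON)
categorical, noncategorical = split_slots(service)
print(categorical)
print(noncategorical)
```

The natural-language `description` fields are what a schema-guided model reads as the query, which is why services unseen during training can still be handled when their descriptions resemble known ones.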
Performance
The latest model in the version history achieves the following joint goal accuracy (in percent), which measures the
fraction of turns for which all slot assignments are predicted correctly.
Dev set
ALL SERVICES 59.72
SEEN SERVICES 65.7
UNSEEN SERVICES 51.96
Test set
ALL SERVICES 45.85
SEEN SERVICES 49.44
UNSEEN SERVICES 44.65
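The joint goal accuracy reported above can be approximated with a few lines of code. The sketch below is a simplified exact-match version under stated assumptions: each dialogue state is represented as a dictionary of slot assignments, and `joint_goal_accuracy` is a name chosen for this example. The official evaluation may differ in details such as how non-categorical values are matched and how turns are aggregated.

```python
def joint_goal_accuracy(predicted_states, reference_states):
    """Fraction of turns whose predicted slot assignments all match the reference."""
    if len(predicted_states) != len(reference_states):
        raise ValueError("predictions and references must have the same length")
    if not reference_states:
        return 0.0
    correct = sum(
        1 for pred, ref in zip(predicted_states, reference_states) if pred == ref
    )
    return correct / len(reference_states)


# Two turns: the first is fully correct, the second has one wrong slot value.
predictions = [
    {"city": "san jose"},
    {"city": "san jose", "time": "7 pm"},
]
references = [
    {"city": "san jose"},
    {"city": "san jose", "time": "8 pm"},
]
jga = joint_goal_accuracy(predictions, references)
print(jga)
```

A single wrong slot makes the whole turn count as incorrect, which is why joint goal accuracy is typically much lower than per-slot accuracy.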
How to Use this Model
Automatically load the model from NGC
Test the model from NGC
If PATH_TO_PREPROCESSED_DATA does not exist, the data will be preprocessed from EXISTING_DATA_DIR. If it already exists and you want to load from the cache for better performance, additionally set model.dataset.use_cache=true.
Input
The schema and dialogue JSON files in their original format.
Output
Dialogue state predictions made during evaluation can be dumped as JSON files.
Limitations
Since this model was trained on publicly available datasets, its performance might degrade on custom data that the model has not been trained on.
References
[1] https://arxiv.org/pdf/1810.04805.pdf
[2] https://github.com/google-research-datasets/dstc8-schema-guided-dialogue
[3] https://arxiv.org/abs/2105.08049
License
License to use this model is covered by the NGC TERMS OF USE unless another License/Terms Of Use/EULA is clearly specified. By downloading the public and release version of the model, you accept the terms and conditions of the NGC TERMS OF USE.