
Parabricks DeepVariant Retraining Notebook

Description: This notebook shows how to retrain DeepVariant models using Parabricks.
Publisher: NVIDIA
Latest Version: 4.0.0-2
Modified: June 22, 2023
Compressed Size: 8.06 KB

Retraining DeepVariant using Parabricks

This repository includes a Jupyter notebook for retraining a DeepVariant model using Parabricks. The notebook first generates a baseline VCF using the out-of-the-box DeepVariant model, then retrains the model on custom data and re-evaluates its performance. It is intended as a reference guide, and you are encouraged to try it on your own data.

The zip file provided by this resource has the following structure:

.
├── Retraining_DeepVariant.ipynb
└── scripts
    ├── download_data.sh
    └── shuffle_tfrecords_lowmem.py

Retraining_DeepVariant.ipynb is the notebook that contains the code to run.

download_data.sh downloads the dataset needed to run the notebook. The data should be placed in <path_to_notebook>/data.

shuffle_tfrecords_lowmem.py is a helper script that the notebook calls to shuffle the dataset.
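
Before opening the notebook, it can help to confirm that the archive was extracted as described and that the data directory has been populated. The following is a minimal sketch, not part of the resource itself: the helper names missing_files and data_dir_ready are hypothetical, and it assumes the script is run from the directory that contains the notebook.

```python
from pathlib import Path

# Relative paths taken from the directory tree shown in this README.
EXPECTED_FILES = [
    "Retraining_DeepVariant.ipynb",
    "scripts/download_data.sh",
    "scripts/shuffle_tfrecords_lowmem.py",
]


def missing_files(root):
    """Return the expected files that are absent under `root` (hypothetical helper)."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


def data_dir_ready(root):
    """Return True if `<root>/data` exists and contains at least one entry."""
    data = Path(root) / "data"
    return data.is_dir() and any(data.iterdir())


if __name__ == "__main__":
    root = Path(".")  # assumption: the current directory holds the notebook
    missing = missing_files(root)
    if missing:
        print("Missing files:", ", ".join(missing))
    elif not data_dir_ready(root):
        print("Layout looks complete, but data/ is empty or absent.")
    else:
        print("Layout and data directory look complete.")
```

The check only confirms that data/ is non-empty. It does not validate the dataset contents, which depend on what download_data.sh fetches.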

This notebook runs on V100, T4, or A100 GPUs. Accuracy can be improved by increasing the number of training steps. The default is set fairly low, at 5,000 steps, so that the code runs quickly; for full results, set it to 50,000 steps.