
Container for Training DeepVariant


Features

Description

This container runs the data preprocessing steps of the DeepVariant training pipeline, including make_examples and shuffle.

Publisher

NVIDIA Clara Parabricks

Latest Tag

4.2.0-1

Modified

November 1, 2023

Compressed Size

1.2 GB

Multinode Support

No

Multi-Arch Support

No

4.2.0-1 (Latest) Security Scan Results

Linux / amd64

This container runs the data preprocessing steps of the DeepVariant training pipeline, including make_examples and shuffle. It generates tfrecord.gz output files, which can be fed to DeepVariant's model_train step.

This is a template command for running GPU-accelerated make_examples:

docker run --gpus all --rm -v <DATA_DIR>:<DATA_DIR> nvcr.io/nvidia/clara/deepvariant_train:4.2.0-1 \
  pbrun make_examples --ref <REF_FILE> --reads <BAM_FILE> --truth-variants <TRUTH_VCF> --confident_regions <TRUTH_BED> \
  --examples <TFRECORD_FILE> --disable-use-window-selector-model --channel-insert-size \
  --num-gpus <GPU_NUM> --num-cpu-threads-per-stream <WORKER_THREAD_NUM> --num-zipper-threads <ZIPPER_THREAD_NUM>

This is a template command for running accelerated shuffle:

docker run --gpus all --rm -v <DATA_DIR>:<DATA_DIR> nvcr.io/nvidia/clara/deepvariant_train:4.2.0-1 \
  pbrun shuffle --input_pattern_list <INPUT_PATTERN_LIST> --output_pattern_prefix <OUTPUT_PATTERN_PREFIX> \
  --output_dataset_config <OUTPUT_PBTXT_FILE> --output_dataset_name <DATASET_NAME> --direct-num-workers <WORKER_THREAD_NUM>
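The two templates above can be chained in a small wrapper script. The following is a minimal sketch, not an official workflow: every path, the dataset name, and the GPU and thread counts are hypothetical placeholders, and it assumes that the file written by --examples can be passed directly as the shuffle input pattern. The `run` and `in_container` helpers are illustrative names introduced here. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: fill in the make_examples and shuffle templates and run them in order.
# All values below are hypothetical placeholders; replace them with your own.

IMAGE="nvcr.io/nvidia/clara/deepvariant_train:4.2.0-1"
DATA_DIR="${DATA_DIR:-/data/dv_train}"   # mounted at the same path inside the container
REF_FILE="$DATA_DIR/ref.fa"
BAM_FILE="$DATA_DIR/sample.bam"
TRUTH_VCF="$DATA_DIR/truth.vcf.gz"
TRUTH_BED="$DATA_DIR/truth.bed"
EXAMPLES="$DATA_DIR/examples.tfrecord.gz"
SHUFFLED_PREFIX="$DATA_DIR/examples.shuffled"
DATASET_PBTXT="$DATA_DIR/dataset_config.pbtxt"
DATASET_NAME="my_dataset"
GPU_NUM=1
WORKER_THREADS=4
ZIPPER_THREADS=4

# Print the command when DRY_RUN=1 (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Docker prefix shared by both steps, taken from the templates above.
in_container() {
  run docker run --gpus all --rm -v "$DATA_DIR:$DATA_DIR" "$IMAGE" "$@"
}

# Step 1: generate training examples; stop if this step fails.
in_container pbrun make_examples \
  --ref "$REF_FILE" --reads "$BAM_FILE" \
  --truth-variants "$TRUTH_VCF" --confident_regions "$TRUTH_BED" \
  --examples "$EXAMPLES" \
  --disable-use-window-selector-model --channel-insert-size \
  --num-gpus "$GPU_NUM" \
  --num-cpu-threads-per-stream "$WORKER_THREADS" \
  --num-zipper-threads "$ZIPPER_THREADS" || exit 1

# Step 2: shuffle the examples (assumes step 1 wrote exactly $EXAMPLES).
in_container pbrun shuffle \
  --input_pattern_list "$EXAMPLES" \
  --output_pattern_prefix "$SHUFFLED_PREFIX" \
  --output_dataset_config "$DATASET_PBTXT" \
  --output_dataset_name "$DATASET_NAME" \
  --direct-num-workers "$WORKER_THREADS"
```

Keeping the image tag and the volume mount in one helper means both steps see the same files at the same paths, which is what the `-v <DATA_DIR>:<DATA_DIR>` mapping in the templates relies on.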