NGC | Catalog

Container for Training DeepVariant

Description: This container is used to run the data-preprocessing part of the DeepVariant training pipeline, including make_examples and shuffle.
Publisher: NVIDIA Clara Parabricks
Latest Tag: 4.2.0-1
Modified: April 1, 2024
Compressed Size: 1.2 GB
Multinode Support: No
Multi-Arch Support: No
Security Scan Results, 4.2.0-1 (Latest): Linux / amd64

This container is used to run the data-preprocessing part of the DeepVariant training pipeline, including make_examples and shuffle. It writes its output as tfrecord.gz files (gzip-compressed TFRecords), which can be fed to DeepVariant's model_train step.
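
Before passing these files to model_train, a quick integrity check can catch truncated or empty outputs. The following is a minimal sketch, assuming each output is an ordinary gzip-compressed file as the .tfrecord.gz suffix suggests; the function name check_tfrecord_gz is hypothetical, and it verifies only the gzip stream, not the TFRecord records inside it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: confirm that each given .tfrecord.gz file exists, is non-empty,
# and is an intact gzip stream. It does NOT parse the TFRecord records inside.
check_tfrecord_gz() {
  local status=0 f
  for f in "$@"; do
    if [[ ! -s "$f" ]]; then
      echo "MISSING OR EMPTY: $f" >&2
      status=1
    elif ! gzip -t "$f" 2>/dev/null; then
      # gzip -t decompresses in memory and fails on truncated or non-gzip data
      echo "CORRUPT GZIP: $f" >&2
      status=1
    else
      echo "OK: $f ($(wc -c < "$f") bytes compressed)"
    fi
  done
  # Non-zero if any file failed, so the check can gate a training script
  return "$status"
}
```

For example, check_tfrecord_gz <DATA_DIR>/*.tfrecord.gz prints one line per file and returns a non-zero status if any file is missing, empty, or corrupt.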

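The following is a template command for running GPU-accelerated make_examples. Input and output paths must lie under <DATA_DIR>, the only host directory mounted into the container: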

docker run --gpus all --rm -v <DATA_DIR>:<DATA_DIR> nvcr.io/nvidia/clara/deepvariant_train:4.2.0-1 \
  pbrun make_examples --ref <REF_FILE> --reads <BAM_FILE> --truth-variants <TRUTH_VCF> --confident_regions <TRUTH_BED> \
  --examples <TFRECORD_FILE> --disable-use-window-selector-model --channel-insert-size \
  --num-gpus <GPU_NUM> --num-cpu-threads-per-stream <WORKER_THREAD_NUM> --num-zipper-threads <ZIPPER_THREAD_NUM>
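
If make_examples is run repeatedly (for example, once per sample), the template can be wrapped in a small script. The sketch below uses only the flags from the template above; the function name run_make_examples, the DRY_RUN switch, the NUM_GPUS / CPU_THREADS_PER_STREAM / ZIPPER_THREADS variables, and their defaults are hypothetical placeholders, not tuned recommendations.

```shell
#!/usr/bin/env bash
set -euo pipefail

IMAGE="nvcr.io/nvidia/clara/deepvariant_train:4.2.0-1"

# Hypothetical wrapper around the make_examples template. All file arguments
# must be paths under data_dir, because only that directory is mounted.
run_make_examples() {
  local data_dir="$1" ref="$2" bam="$3" truth_vcf="$4" truth_bed="$5" examples="$6"
  # Placeholder defaults (assumptions); override via environment variables
  local num_gpus="${NUM_GPUS:-1}"
  local cpu_threads="${CPU_THREADS_PER_STREAM:-4}"
  local zipper_threads="${ZIPPER_THREADS:-4}"

  # Build the command as an array so paths with spaces stay single arguments
  local cmd=(
    docker run --gpus all --rm -v "${data_dir}:${data_dir}" "$IMAGE"
    pbrun make_examples
    --ref "$ref" --reads "$bam"
    --truth-variants "$truth_vcf" --confident_regions "$truth_bed"
    --examples "$examples"
    --disable-use-window-selector-model --channel-insert-size
    --num-gpus "$num_gpus"
    --num-cpu-threads-per-stream "$cpu_threads"
    --num-zipper-threads "$zipper_threads"
  )

  # DRY_RUN=1 prints the shell-quoted command instead of executing it
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%q ' "${cmd[@]}"; echo
  else
    "${cmd[@]}"
  fi
}
```

Running DRY_RUN=1 run_make_examples <DATA_DIR> <REF_FILE> <BAM_FILE> <TRUTH_VCF> <TRUTH_BED> <TFRECORD_FILE> prints the fully quoted docker command for review; omitting DRY_RUN executes it.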

The following is a template command for running the accelerated shuffle step:

docker run --gpus all --rm -v <DATA_DIR>:<DATA_DIR> nvcr.io/nvidia/clara/deepvariant_train:4.2.0-1 \
  pbrun shuffle --input_pattern_list <INPUT_PATTERN_LIST> --output_pattern_prefix <OUTPUT_PATTERN_PREFIX> \
  --output_dataset_config <OUTPUT_PBTXT_FILE> --output_dataset_name <DATASET_NAME> --direct-num-workers <WORKER_THREAD_NUM>
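
When several make_examples outputs should end up in one shuffled training set, they can be combined into a single --input_pattern_list value. This sketch assumes the flag accepts a comma-separated list of file patterns, as the flag of the same name does in upstream DeepVariant's shuffle_tfrecords_beam.py; check this against the Parabricks documentation before relying on it. The helper names join_patterns and run_shuffle, the SHUFFLE_WORKERS variable, and its default are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

IMAGE="nvcr.io/nvidia/clara/deepvariant_train:4.2.0-1"

# Join arguments with commas: join_patterns a b c -> a,b,c
join_patterns() {
  local IFS=','
  echo "$*"
}

# Hypothetical wrapper around the shuffle template. Arguments after the first
# four are input patterns (assumed comma-joined for --input_pattern_list).
run_shuffle() {
  local data_dir="$1" prefix="$2" config="$3" name="$4"
  shift 4
  (( $# > 0 )) || { echo "need at least one input pattern" >&2; return 1; }
  local patterns
  patterns="$(join_patterns "$@")"

  local cmd=(
    docker run --gpus all --rm -v "${data_dir}:${data_dir}" "$IMAGE"
    pbrun shuffle
    --input_pattern_list "$patterns"
    --output_pattern_prefix "$prefix"
    --output_dataset_config "$config"
    --output_dataset_name "$name"
    --direct-num-workers "${SHUFFLE_WORKERS:-4}"
  )

  # DRY_RUN=1 prints the shell-quoted command instead of executing it
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%q ' "${cmd[@]}"; echo
  else
    "${cmd[@]}"
  fi
}
```

Quote glob patterns such as '<DATA_DIR>/sample1/*.tfrecord.gz' when calling run_shuffle, so that the pattern itself, rather than a host-side expansion of it, is passed to the shuffle step.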