This container runs the data preprocessing stage of the DeepVariant training pipeline, which consists of make_examples and shuffle. It generates tfrecord.gz output files that can be fed to DeepVariant's model_train step.
This is a template command for running GPU-accelerated make_examples:
docker run --gpus all --rm -v <DATA_DIR>:<DATA_DIR> nvcr.io/nvidia/clara/deepvariant_train:4.2.0-1 \
pbrun make_examples --ref <REF_FILE> --reads <BAM_FILE> --truth-variants <TRUTH_VCF> --confident_regions <TRUTH_BED> \
--examples <TFRECORD_FILE> --disable-use-window-selector-model --channel-insert-size \
--num-gpus <GPU_NUM> --num-cpu-threads-per-stream <WORKER_THREAD_NUM> --num-zipper-threads <ZIPPER_THREAD_NUM>
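As a rough sketch of how the placeholders in the make_examples template might be filled in, the script below assembles the command from shell variables and prints it for review instead of running it. Every path, file name, GPU count, and thread count here is a hypothetical value chosen for illustration; only the image tag and the flags are taken from the template. It also assumes the paths contain no spaces.

```shell
#!/bin/sh
# Hypothetical values for illustration only; substitute your own.
DATA_DIR=/data/dv_train
REF_FILE=$DATA_DIR/ref.fasta
BAM_FILE=$DATA_DIR/sample.bam
TRUTH_VCF=$DATA_DIR/truth.vcf.gz
TRUTH_BED=$DATA_DIR/truth.bed
TFRECORD_FILE=$DATA_DIR/examples.tfrecord.gz
GPU_NUM=2
WORKER_THREAD_NUM=4
ZIPPER_THREAD_NUM=4

# Assemble the make_examples command line as a single string.
# Flags are copied from the template; only the values are made up.
make_examples_cmd() {
  echo "docker run --gpus all --rm -v $DATA_DIR:$DATA_DIR" \
    "nvcr.io/nvidia/clara/deepvariant_train:4.2.0-1" \
    "pbrun make_examples --ref $REF_FILE --reads $BAM_FILE" \
    "--truth-variants $TRUTH_VCF --confident_regions $TRUTH_BED" \
    "--examples $TFRECORD_FILE --disable-use-window-selector-model" \
    "--channel-insert-size --num-gpus $GPU_NUM" \
    "--num-cpu-threads-per-stream $WORKER_THREAD_NUM" \
    "--num-zipper-threads $ZIPPER_THREAD_NUM"
}

# Dry run: print the command instead of executing it.
make_examples_cmd
```

Mounting <DATA_DIR> at the same path inside the container, as the template does, means the host paths in the variables can be passed to pbrun unchanged. Once the printed command looks right, it can be pasted into a terminal to run it.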
This is a template command for running accelerated shuffle:
docker run --gpus all --rm -v <DATA_DIR>:<DATA_DIR> nvcr.io/nvidia/clara/deepvariant_train:4.2.0-1 \
pbrun shuffle --input_pattern_list <INPUT_PATTERN_LIST> --output_pattern_prefix <OUTPUT_PATTERN_PREFIX> \
--output_dataset_config <OUTPUT_PBTXT_FILE> --output_dataset_name <DATASET_NAME> --direct-num-workers <WORKER_THREAD_NUM>
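The shuffle template can be filled in the same way. The sketch below is a dry run that prints the command with hypothetical values: the directory, the output prefix, the dataset name, and the worker count are all assumptions, and the input is assumed to be the tfrecord.gz file written by a preceding make_examples run. Only the image tag and the flags come from the template.

```shell
#!/bin/sh
# Hypothetical values for illustration only; substitute your own.
DATA_DIR=/data/dv_train
# Assumed to be the file written by make_examples via --examples.
INPUT_PATTERN_LIST=$DATA_DIR/examples.tfrecord.gz
OUTPUT_PATTERN_PREFIX=$DATA_DIR/shuffled/train
OUTPUT_PBTXT_FILE=$DATA_DIR/shuffled/train.dataset_config.pbtxt
DATASET_NAME=example_dataset
WORKER_THREAD_NUM=4

# Assemble the shuffle command line as a single string.
# Flags are copied from the template; only the values are made up.
shuffle_cmd() {
  echo "docker run --gpus all --rm -v $DATA_DIR:$DATA_DIR" \
    "nvcr.io/nvidia/clara/deepvariant_train:4.2.0-1" \
    "pbrun shuffle --input_pattern_list $INPUT_PATTERN_LIST" \
    "--output_pattern_prefix $OUTPUT_PATTERN_PREFIX" \
    "--output_dataset_config $OUTPUT_PBTXT_FILE" \
    "--output_dataset_name $DATASET_NAME" \
    "--direct-num-workers $WORKER_THREAD_NUM"
}

# Dry run: print the command instead of executing it.
shuffle_cmd
```

Keeping the shuffle outputs under the mounted <DATA_DIR> means they remain on the host after the container exits, since the template runs docker with --rm.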