Parabricks umi_fgbio is a pipeline that processes sequencing reads tagged with molecular barcodes (UMIs) and improves accuracy by correcting errors at the consensus-read level.
Parabricks_umi_fgbio
The Parabricks umi_fgbio container houses an accelerated fgbio toolkit that uses molecular tagging (UMIs) to distinguish true biological variants from sequencing errors, supporting high-fidelity genomic data analysis.
Description
This UMI pipeline is based on the Fulcrum Genomics fgbio toolkit. It processes sequencing
reads with molecular barcodes (also known as Unique Molecular Indices, UMIs), which enable
error correction and increased accuracy at the consensus-read level.
UMI data can be processed using a workflow based on fgbio methods. The tools can be run
standalone, or the whole set of tools can be run with a single command.
The container components are ready for commercial/non-commercial use.
License/Terms of Use:
Use of this software is governed by the NVIDIA Software License Agreement and the Product-Specific Terms for NVIDIA AI Products.
Deployment Geography:
Global
Release Date:
NGC - 01/13/2026
Program Classes
GPU-accelerated software for processing UMI-tagged DNA sequencing reads to produce error-corrected consensus reads for downstream genomic analysis.
Deployment Details:
Our accelerated fgbio tool is designed and optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware parallelism and optimized software frameworks, the toolkit processes genomic data and calls consensus reads significantly faster than standard CPU-based Java implementations.
Reference(s):
Fennell, T., Homer, N., & Fulcrum Genomics. (n.d.). fgbio: Tools for working with genomic and high throughput sequencing data. GitHub. https://github.com/fulcrumgenomics/fgbio
Container Version(s):
v1.4.0
Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal developer team to ensure this container meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report quality, risk, security vulnerabilities or NVIDIA AI Concerns here.
Get Help
Enterprise Support
Get access to knowledge base articles and support cases or submit a ticket.
Quick Start Run Commands
Dual Header
pbrun umi_fgbio --in-fq firstInput.fastq.gz secondInput.fastq.gz \
--ref theReferenceFile.fasta \
--out-dir directoryWithOutput \
--umi-in-header \
--strategy paired
Dual Read
pbrun umi_fgbio --in-fq firstInput.fastq.gz secondInput.fastq.gz \
--ref theReferenceFile.fasta \
--out-dir directoryWithOutput \
--strategy paired \
--read-structures 7M1B+T 7M1B+T
Dual Separate
pbrun umi_fgbio --in-fq secondInput.fastq.gz fourthInput.fastq.gz firstInput.fastq.gz thirdInput.fastq.gz \
--ref theReferenceFile.fasta \
--out-dir directoryWithOutput \
--strategy paired \
--read-structures 1B+T +T 7M 7M
Single Header
pbrun umi_fgbio --in-fq firstInput.fastq.gz secondInput.fastq.gz \
--ref theReferenceFile.fasta \
--out-dir directoryWithOutput \
--umi-in-header \
--strategy adjacency
Single Separate
pbrun umi_fgbio --in-fq secondInput.fastq.gz fourthInput.fastq.gz firstInput.fastq.gz \
--ref theReferenceFile.fasta \
--out-dir directoryWithOutput \
--strategy adjacency \
--read-structures 1B+T +T 7M
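The read structures used in the commands above follow the fgbio read-structure notation: each segment is a length (or `+` for "the rest of the read") followed by a type letter, where `T` is template, `B` is sample barcode, `M` is molecular barcode (UMI), and `S` is skipped bases. The sketch below spells out a structure segment by segment. The function name `describe_read_structure` is hypothetical and not part of Parabricks or fgbio, and the sketch does not enforce every rule of the notation (for example, it does not check that `+` appears only on the last segment).

```shell
#!/usr/bin/env bash
# Hypothetical helper: expand an fgbio-style read structure into readable segments.
describe_read_structure() {
  local rs="$1" out="" len type name
  # One segment = a number or "+", then one of T, B, M, S; the rest is kept for the next pass.
  local re='^([0-9]+|[+])([TBMS])(.*)$'
  while [[ -n "$rs" ]]; do
    if [[ "$rs" =~ $re ]]; then
      len="${BASH_REMATCH[1]}"
      type="${BASH_REMATCH[2]}"
      rs="${BASH_REMATCH[3]}"
    else
      echo "invalid read structure segment: $rs" >&2
      return 1
    fi
    case "$type" in
      T) name="template" ;;
      B) name="sample-barcode" ;;
      M) name="umi" ;;
      S) name="skip" ;;
    esac
    if [[ "$len" == "+" ]]; then
      len="rest"
    fi
    out+="${out:+ }${len}:${name}"
  done
  echo "$out"
}

describe_read_structure "7M1B+T"   # prints: 7:umi 1:sample-barcode rest:template
```

Read this way, the Dual Read example takes a 7-base UMI and a 1-base sample barcode from the start of each read, while the Dual Separate example keeps the UMIs in their own FASTQ files (`7M`).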
Input/Output file options
--in-fq [IN_FQ [IN_FQ ...]]
Path to one or more fastq files, each corresponding to
a sub-read. Files can be in fastq or fastq.gz format.
(default: None)
**Option is required.**
--ref REF
Path to the reference file (default: None)
**Option is required.**
--metadata METADATA
Path to a file containing the metadata about the
samples. If no file is provided, output reads will be
put into unmatched files only. (default: None)
--out-dir OUT_DIR
Path to the directory that will contain all of the
generated files.
(default: None)
**Option is required.**
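Because `--in-fq`, `--ref`, and `--out-dir` are marked as required (as is `--strategy`, listed under Pipeline Options), a wrapper script may want to confirm they are present before launching a long run. The following is a minimal sketch of such a preflight check; `check_required_opts` is a hypothetical helper, not a Parabricks feature, and it only looks for the option names without validating their values.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check: confirm the options documented as required
# appear in an argument list intended for "pbrun umi_fgbio".
check_required_opts() {
  local required=(--in-fq --ref --out-dir --strategy)
  local opt arg found missing=0
  for opt in "${required[@]}"; do
    found=0
    for arg in "$@"; do
      # Accept both "--opt value" and "--opt=value" spellings.
      if [[ "$arg" == "$opt" || "$arg" == "$opt="* ]]; then
        found=1
        break
      fi
    done
    if (( found == 0 )); then
      echo "missing required option: $opt" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: the arguments of the "Dual Header" quick-start command pass the check.
check_required_opts --in-fq firstInput.fastq.gz secondInput.fastq.gz \
  --ref theReferenceFile.fasta --out-dir directoryWithOutput \
  --umi-in-header --strategy paired && echo "required options present"
```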
Pipeline Options
--umi-in-header
Specifies that UMIs are in the read header. (default:
False)
-L INTERVAL, --interval INTERVAL
Interval within which to call bqsr from the input
reads. All intervals will have a padding of 100 to get
read records, and overlapping intervals will be
combined. Interval files should be passed using the
--interval-file option. This option can be used
multiple times (e.g. "-L chr1 -L chr2:10000 -L
chr3:20000+ -L chr4:10000-20000") (default: None)
--bwa-options BWA_OPTIONS
Pass supported bwa mem options as one string. The
original bwa mem options currently supported are -M, -Y,
and -T (e.g. --bwa-options="-M -Y") (default: None)
--no-warnings
Suppress warning messages about system thread and
memory usage. (default: None)
--read-group-sm READ_GROUP_SM
SM tag for read groups in this run. (default: None)
--read-group-lb READ_GROUP_LB
LB tag for read groups in this run. (default: None)
--read-group-pl READ_GROUP_PL
PL tag for read groups in this run. (default: None)
--read-group-id-prefix READ_GROUP_ID_PREFIX
Prefix for the ID and PU tags for read groups in this
run. This prefix will be used for all pairs of fastq
files in this run. The ID and PU tags will consist of
this prefix, and an identifier that will be unique for
a pair of fastq files. (default: None)
-ip INTERVAL_PADDING, --interval-padding INTERVAL_PADDING
Amount of padding (in base pairs) to add to each
interval you are including. (default: None)
--read-structures [READ_STRUCTURES [READ_STRUCTURES ...]]
The read structure for each of the FASTQs. There must
be one read structure per input fastq file. (default:
None)
--no-barcode
Remove the requirement that input read structures must
contain sample barcodes. (default: None)
--out-metrics OUT_METRICS
The file to which per-barcode metrics are written in
the output directory. If none given, a file named
demux_barcode_metrics.txt will be written to the
output directory. If UMIs are in read header, no
metrics file will be generated. (default: None)
--num-zip-threads NUM_ZIP_THREADS
Number of CPUs to use for zipping BAM files in a run
(default 16 for coordinate sorts and 10 otherwise).
(default: None)
--num-sort-threads NUM_SORT_THREADS
Number of CPUs to use for sorting in a run (default 10
for coordinate sorts and 16 otherwise). (default:
None)
--max-records-in-ram MAX_RECORDS_IN_RAM
Maximum number of records in RAM when using a
queryname or template coordinate sort mode; lowering
this number will decrease maximum memory usage.
(default: 65000000)
--mem-limit MEM_LIMIT
Memory limit in GBs during sorting and postsorting. By
default, the limit is half of the total system memory.
(default: 1007)
--gpuwrite
Use one GPU to accelerate writing final BAM/CRAM.
(default: None)
--gpuwrite-deflate-algo GPUWRITE_DEFLATE_ALGO
Choose the nvCOMP DEFLATE algorithm to use with
--gpuwrite. Note these options do not correspond to CPU
DEFLATE options. Valid options are 1, 2, and 4. Option
1 is fastest, while options 2 and 4 have progressively
lower throughput but higher compression ratios. The
default value is 1 when the user does not provide an
input (i.e., None) (default: None)
--gpusort
Use GPUs to accelerate sorting and marking.
(default: None)
--strategy STRATEGY
The UMI assignment strategy, which can be adjacency or
paired. (default: )
**Option is required.**
--min-map-q MIN_MAP_Q
Minimum mapping quality. (default: 30)
--num-worker-threads NUM_WORKER_THREADS
Number of threads for worker. (default: 14)
--error-rate-pre-umi ERROR_RATE_PRE_UMI
The Phred-scaled error rate for an error prior to the
UMIs being integrated. (default: 45)
--error-rate-post-umi ERROR_RATE_POST_UMI
The Phred-scaled error rate for an error after the UMIs
have been integrated. (default: 40)
--min-input-base-quality MIN_INPUT_BASE_QUALITY
Ignore bases in raw reads that have Q below this
value. (default: 10)
--min-consensus-base-quality MIN_CONSENSUS_BASE_QUALITY
Mask (make ‘N’) consensus bases with quality less than
this threshold. (default: 2)
--min-reads MIN_READS
The minimum number of reads to produce a consensus
base. (default: 1)
--out-suffixF OUT_SUFFIXF
Output suffix used for paired reads that are first in
pair. The suffix must end with ".gz" (default:
_1.fastq.gz)
--out-suffixF2 OUT_SUFFIXF2
Output suffix used for paired reads that are second in
pair. The suffix must end with ".gz" (default:
_2.fastq.gz)
--out-suffixO OUT_SUFFIXO
Output suffix used for orphan/unmatched reads that are
first in pair. The suffix must end with ".gz". If no
suffix is provided, these reads will be ignored
(default: None)
--out-suffixO2 OUT_SUFFIXO2
Output suffix used for orphan/unmatched reads that are
second in pair. The suffix must end with ".gz". If no
suffix is provided, these reads will be ignored
(default: None)
--out-suffixS OUT_SUFFIXS
Output suffix used for single-end/unpaired reads. The
suffix must end with ".gz". If no suffix is provided,
these reads will be ignored (default: None)
--rg-tag RG_TAG
Split reads into different fastq files based on the
read group tag. Must be either PU or ID (default:
None)
--remove-qc-failure
Remove reads from the output that have abstract QC
failure. (default: None)
--num-threads NUM_THREADS
Number of threads to run. (default: 8)
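Several of the options above (`--error-rate-pre-umi`, `--error-rate-post-umi`, and the base-quality thresholds) are Phred-scaled. Under the standard Phred convention, a value Q corresponds to an error probability of 10^(-Q/10). The sketch below does that conversion with awk; `phred_to_prob` is a hypothetical helper name used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a Phred-scaled value Q to an error probability
# using the standard relation p = 10^(-Q/10).
phred_to_prob() {
  # LC_ALL=C keeps the decimal point as "." regardless of the user's locale.
  LC_ALL=C awk -v q="$1" 'BEGIN { printf "%.3e\n", 10 ^ (-q / 10) }'
}

phred_to_prob 45   # default of --error-rate-pre-umi
phred_to_prob 40   # default of --error-rate-post-umi
```

With the documented defaults, Q45 corresponds to roughly 3.2 errors per 100,000 bases and Q40 to 1 error per 10,000 bases.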