NVIDIA

Parabricks Umi_Fgbio

Container

Parabricks Umi_Fgbio is a pipeline that processes sequencing reads carrying molecular barcodes and builds consensus reads from them, which corrects sequencing errors and increases accuracy.

NVIDIA AI Enterprise Supported

Parabricks_umi_fgbio

The Parabricks umi_fgbio container houses an accelerated version of the fgbio toolkit, which uses molecular tags (UMIs) to distinguish true biological variants from sequencing errors, supporting high-fidelity genomic data analysis.

Description

This UMI pipeline is based on the Fulcrum Genomics fgbio toolkit. It processes sequencing reads with molecular barcodes (also known as Unique Molecular Indices, UMIs) and produces consensus reads, which correct sequencing errors and increase accuracy.

UMI data can be processed with a workflow based on fgbio methods. The individual tools can be run standalone, or the whole set of tools can be run with a single command.

The container components are ready for commercial/non-commercial use.

License/Terms of Use:

Use of this software is governed by the NVIDIA Software License Agreement and the Product-Specific Terms for NVIDIA AI Products.

Deployment Geography:

Global

Release Date:

NGC - 01/13/2026

Program Classes

GPU-accelerated software for processing UMI-tagged DNA sequencing reads to produce error-corrected consensus reads for downstream genomic analysis.

Deployment Details:

Our accelerated fgbio tool is designed and optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware parallelism and optimized software frameworks, it processes genomic data and calls consensus reads significantly faster than the standard CPU-based Java implementation of fgbio.

Reference(s):

Fennell, T., Homer, N., & Fulcrum Genomics. (n.d.). fgbio: Tools for working with genomic and high throughput sequencing data. GitHub. https://github.com/fulcrumgenomics/fgbio

Container Version(s):

v1.4.0

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal developer team to ensure this container meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report quality, risk, security vulnerabilities or NVIDIA AI Concerns here.

Get Help

Enterprise Support

Get access to knowledge base articles and support cases or submit a ticket.

Quick Start Run Commands

Dual Header

pbrun umi_fgbio --in-fq firstInput.fastq.gz secondInput.fastq.gz \
    --ref theReferenceFile.fasta \
    --out-dir directoryWithOutput \
    --umi-in-header \
    --strategy paired

Dual Read

pbrun umi_fgbio --in-fq firstInput.fastq.gz secondInput.fastq.gz \
    --ref theReferenceFile.fasta \
    --out-dir directoryWithOutput \
    --strategy paired \
    --read-structures 7M1B+T 7M1B+T

Dual Separate

pbrun umi_fgbio --in-fq secondInput.fastq.gz fourthInput.fastq.gz firstInput.fastq.gz thirdInput.fastq.gz \
    --ref theReferenceFile.fasta \
    --out-dir directoryWithOutput \
    --strategy paired \
    --read-structures 1B+T +T 7M 7M

Single Header

pbrun umi_fgbio --in-fq firstInput.fastq.gz secondInput.fastq.gz \
    --ref theReferenceFile.fasta \
    --out-dir directoryWithOutput \
    --umi-in-header \
    --strategy adjacency

Single Separate

pbrun umi_fgbio --in-fq secondInput.fastq.gz fourthInput.fastq.gz firstInput.fastq.gz \
    --ref theReferenceFile.fasta \
    --out-dir directoryWithOutput \
    --strategy adjacency \
    --read-structures 1B+T +T 7M
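The --read-structures values in these examples use fgbio's read-structure notation: each segment is a length (or + for "all remaining bases") followed by a type letter, where T is template, B is sample barcode, M is molecular barcode (the UMI), and S is skip. The bash function below is a minimal sketch for decoding a read structure into its segments; the name describe_read_structure is hypothetical and not part of pbrun, it handles only the four segment types listed here, and it does no validation beyond the segment syntax.

```bash
#!/usr/bin/env bash
# Sketch: decode an fgbio-style read structure (e.g. "7M1B+T") into segments.
# Only the segment types T, B, M and S are handled; "+" (all remaining bases)
# is reported as "rest". No further validation is performed.
describe_read_structure() {
  local rs=$1
  local re='^([0-9]+|[+])([TBMS])(.*)$'
  local len type
  while [[ -n $rs ]]; do
    if ! [[ $rs =~ $re ]]; then
      echo "cannot parse segment at: $rs" >&2
      return 1
    fi
    len=${BASH_REMATCH[1]}   # segment length, or "+"
    type=${BASH_REMATCH[2]}  # segment type letter
    rs=${BASH_REMATCH[3]}    # remainder still to decode
    case $type in
      T) type=template ;;
      B) type=sample_barcode ;;
      M) type=umi ;;
      S) type=skip ;;
    esac
    [[ $len == "+" ]] && len=rest
    echo "$len $type"
  done
}

describe_read_structure 7M1B+T
```

For the Dual Read structure 7M1B+T this prints three lines: "7 umi", "1 sample_barcode", and "rest template", that is, a 7-base UMI, a 1-base sample barcode, and the remainder of the read as template.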

Input/Output file options

--in-fq [IN_FQ [IN_FQ ...]]
    Path to one or more fastq files, each corresponding to
    a sub-read. Files can be in fastq or fastq.gz format.
    (default: None)

    **Option is required.**


--ref REF
    Path to the reference file (default: None)

    **Option is required.**


--metadata METADATA
    Path to a file containing the metadata about the
    samples. If no file is provided, output reads will be
    put into unmatched files only. (default: None)


--out-dir OUT_DIR
    Path to the directory that will contain all of the
    generated files.
    (default: None)

    **Option is required.**

Pipeline Options

--umi-in-header
    Specifies that UMIs are in the read header. (default:
    False)


-L INTERVAL, --interval INTERVAL
    Interval within which to call bqsr from the input
    reads. All intervals will have a padding of 100 to get
    read records, and overlapping intervals will be
    combined. Interval files should be passed using the
    --interval-file option. This option can be used
    multiple times (e.g. "-L chr1 -L chr2:10000 -L
    chr3:20000+ -L chr4:10000-20000") (default: None)
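Because --interval can be repeated, a script that targets several regions has to emit one -L per region. The bash sketch below collects the regions into an array so that each region stays a single, correctly quoted argument. The function name build_interval_args and the global array INTERVAL_ARGS are hypothetical; region strings are passed through unchanged and are left for pbrun to validate.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build repeated "-L <region>" arguments for
# pbrun umi_fgbio. Results are stored in the global array INTERVAL_ARGS
# so that each region remains a single, properly quoted argument.
build_interval_args() {
  INTERVAL_ARGS=()
  local region
  for region in "$@"; do
    INTERVAL_ARGS+=(-L "$region")
  done
}

# Example using the region formats shown in the help text above.
build_interval_args chr1 chr2:10000 chr3:20000+ chr4:10000-20000
printf '%s\n' "${INTERVAL_ARGS[*]}"
```

The array can then be expanded into the command line as pbrun umi_fgbio ... "${INTERVAL_ARGS[@]}", which avoids the word-splitting problems of building the flags as one unquoted string.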


--bwa-options BWA_OPTIONS
    Pass supported bwa mem options as a single string. The
    currently supported bwa mem options are -M, -Y, and
    -T (e.g. --bwa-options="-M -Y") (default: None)
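Because --bwa-options takes all bwa mem flags as one string, the value must be quoted when it contains spaces. The sketch below uses a throwaway function, count_args (hypothetical, not part of pbrun), to show how bash splits the quoted and unquoted forms:

```bash
#!/usr/bin/env bash
# Illustration only: count_args is a hypothetical helper that reports how
# many arguments the shell passes to a command.
count_args() {
  echo "$#"
}

count_args --bwa-options="-M -Y"   # prints 1: the quotes keep "-M -Y" together
count_args --bwa-options=-M -Y     # prints 2: "-Y" becomes a separate argument
```

In the unquoted form, pbrun would receive -Y as a separate command-line argument rather than as part of the bwa mem options.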


--no-warnings
    Suppress warning messages about system thread and
    memory usage. (default: None)


--read-group-sm READ_GROUP_SM
    SM tag for read groups in this run. (default: None)


--read-group-lb READ_GROUP_LB
    LB tag for read groups in this run. (default: None)


--read-group-pl READ_GROUP_PL
    PL tag for read groups in this run. (default: None)


--read-group-id-prefix READ_GROUP_ID_PREFIX
    Prefix for the ID and PU tags for read groups in this
    run. This prefix will be used for all pairs of fastq
    files in this run. The ID and PU tags will consist of
    this prefix, and an identifier that will be unique for
    a pair of fastq files. (default: None)


-ip INTERVAL_PADDING, --interval-padding INTERVAL_PADDING
    Amount of padding (in base pairs) to add to each
    interval you are including. (default: None)


--read-structures [READ_STRUCTURES [READ_STRUCTURES ...]]
    The read structure for each of the FASTQs. There must
    be one read structure per input fastq file. (default:
    None)
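Since there must be exactly one read structure per input FASTQ file, and structures are matched to files by position, a wrapper script can check the counts before launching a job. The following is a minimal bash sketch of such a pre-flight check; check_read_structures is a hypothetical helper, not a pbrun feature, and it assumes file paths contain no spaces because it splits both lists on whitespace.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of pbrun): verify that the number
# of read structures matches the number of FASTQ files before launching.
# Usage: check_read_structures "<fastq list>" "<read-structure list>"
check_read_structures() {
  local -a fastqs structures
  read -r -a fastqs <<< "$1"
  read -r -a structures <<< "$2"
  if (( ${#fastqs[@]} != ${#structures[@]} )); then
    echo "error: ${#fastqs[@]} FASTQ file(s) but ${#structures[@]} read structure(s)" >&2
    return 1
  fi
  # Show which read structure will be applied to which file, in order.
  local i
  for i in "${!fastqs[@]}"; do
    echo "${fastqs[i]} -> ${structures[i]}"
  done
}
```

For example, check_read_structures "secondInput.fastq.gz fourthInput.fastq.gz firstInput.fastq.gz" "1B+T +T 7M" prints the file-to-structure pairing used by the Single Separate example and returns 0, while a mismatched pair of lists returns 1 before any pbrun job is started.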


--no-barcode
    Remove the requirement that input read structures must
    contain sample barcodes. (default: None)


--out-metrics OUT_METRICS
    The file to which per-barcode metrics are written in
    the output directory. If none given, a file named
    demux_barcode_metrics.txt will be written to the
    output directory. If UMIs are in read header, no
    metrics file will be generated. (default: None)


--num-zip-threads NUM_ZIP_THREADS
    Number of CPUs to use for zipping BAM files in a run
    (default 16 for coordinate sorts and 10 otherwise).
    (default: None)


--num-sort-threads NUM_SORT_THREADS
    Number of CPUs to use for sorting in a run (default 10
    for coordinate sorts and 16 otherwise). (default:
    None)


--max-records-in-ram MAX_RECORDS_IN_RAM
    Maximum number of records in RAM when using a
    queryname or template coordinate sort mode; lowering
    this number will decrease maximum memory usage.
    (default: 65000000)


--mem-limit MEM_LIMIT
    Memory limit in GB during sorting and postsorting. By
    default, the limit is half of the total system memory,
    so the value shown here depends on the machine on which
    this help text was generated. (default: 1007)
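To see roughly what "half of the total system memory" amounts to on a given Linux host before choosing an explicit --mem-limit, the bash sketch below reads MemTotal from a /proc/meminfo-style file. It assumes GB here means GiB (MemTotal in kB divided by 1024 twice) and that the result is truncated to an integer; the exact rule pbrun uses is not stated here, and half_mem_gb is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: estimate "half of total system memory" in GB from a
# /proc/meminfo-style file (Linux). MemTotal is reported in kB; this
# divides by 1024*1024 (treating GB as GiB) and truncates. Whether
# pbrun rounds the same way is an assumption.
half_mem_gb() {
  local meminfo=${1:-/proc/meminfo}
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 / 2 }' "$meminfo"
}
```

Calling half_mem_gb with no argument reads the local /proc/meminfo; on a shared node, a smaller explicit value can instead be passed with --mem-limit.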


--gpuwrite
    Use one GPU to accelerate writing final BAM/CRAM.
    (default: None)


--gpuwrite-deflate-algo GPUWRITE_DEFLATE_ALGO
    Choose the nvCOMP DEFLATE algorithm to use with
    --gpuwrite. Note these options do not correspond to CPU
    DEFLATE options. Valid options are 1, 2, and 4. Option
    1 is fastest, while options 2 and 4 have progressively
    lower throughput but higher compression ratios. The
    default value is 1 when the user does not provide an
    input (i.e., None) (default: None)


--gpusort
    Use GPUs to accelerate sorting and marking.
    (default: None)


--strategy STRATEGY
    The UMI assignment strategy, which can be adjacency or
    paired. (default: )

    **Option is required.**
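The Quick Start examples use --strategy paired for the Dual layouts (Dual Header, Dual Read, Dual Separate) and --strategy adjacency for the Single layouts (Single Header, Single Separate). The bash sketch below simply encodes that correspondence so a wrapper script can choose the value; the function strategy_for_umi_layout and its dual/single inputs are hypothetical, and other library designs may call for a different choice.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the Quick Start examples: Dual layouts
# use --strategy paired, Single layouts use --strategy adjacency.
strategy_for_umi_layout() {
  case $1 in
    dual)   echo paired ;;
    single) echo adjacency ;;
    *)      echo "unknown UMI layout: $1" >&2; return 1 ;;
  esac
}
```

A wrapper could then pass --strategy "$(strategy_for_umi_layout dual)" and fail early on an unrecognized layout instead of starting a job with an invalid strategy.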


--min-map-q MIN_MAP_Q
    Minimum mapping quality. (default: 30)


--num-worker-threads NUM_WORKER_THREADS
    Number of worker threads. (default: 14)


--error-rate-pre-umi ERROR_RATE_PRE_UMI
    The Phred-scaled error rate for an error prior to the
    UMIs being integrated. (default: 45)


--error-rate-post-umi ERROR_RATE_POST_UMI
    The Phred-scaled error rate for an error after the UMIs
    have been integrated. (default: 40)
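Phred scaling maps a quality value Q to an error probability. Applied to the two default error rates above, the standard conversion gives:

```latex
\begin{aligned}
P_{\text{err}} &= 10^{-Q/10} \\
Q = 45 &\;\Rightarrow\; P_{\text{err}} = 10^{-4.5} \approx 3.16 \times 10^{-5} \\
Q = 40 &\;\Rightarrow\; P_{\text{err}} = 10^{-4} = 1.0 \times 10^{-4}
\end{aligned}
```

So the default pre-UMI rate corresponds to roughly 1 error in 31,600 bases, and the default post-UMI rate to 1 error in 10,000 bases; each increase of 10 in Q divides the error probability by 10.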


--min-input-base-quality MIN_INPUT_BASE_QUALITY
    Ignore bases in raw reads that have Q below this
    value. (default: 10)


--min-consensus-base-quality MIN_CONSENSUS_BASE_QUALITY
    Mask (make ‘N’) consensus bases with quality less than
    this threshold. (default: 2)


--min-reads MIN_READS
    The minimum number of reads to produce a consensus
    base. (default: 1)


--out-suffixF OUT_SUFFIXF
    Output suffix used for paired reads that are first in
    pair. The suffix must end with ".gz" (default:
    _1.fastq.gz)


--out-suffixF2 OUT_SUFFIXF2
    Output suffix used for paired reads that are second in
    pair. The suffix must end with ".gz" (default:
    _2.fastq.gz)


--out-suffixO OUT_SUFFIXO
    Output suffix used for orphan/unmatched reads that are
    first in pair. The suffix must end with ".gz". If no
    suffix is provided, these reads will be ignored
    (default: None)


--out-suffixO2 OUT_SUFFIXO2
    Output suffix used for orphan/unmatched reads that are
    second in pair. The suffix must end with ".gz". If no
    suffix is provided, these reads will be ignored
    (default: None)


--out-suffixS OUT_SUFFIXS
    Output suffix used for single-end/unpaired reads. The
    suffix must end with ".gz". If no suffix is provided,
    these reads will be ignored (default: None)
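All of the --out-suffix* options above require the suffix to end with ".gz". The bash sketch below checks a list of suffixes before a run; check_out_suffixes is a hypothetical helper, not part of pbrun, and it checks only the ".gz" ending.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: every --out-suffix* value must end in ".gz".
# Reports each offending suffix and returns 1 if any fails.
check_out_suffixes() {
  local suffix status=0
  for suffix in "$@"; do
    if [[ $suffix != *.gz ]]; then
      echo "error: suffix '$suffix' does not end with .gz" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, check_out_suffixes _1.fastq.gz _2.fastq.gz succeeds, while check_out_suffixes _R1.fastq reports the problem and returns 1.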


--rg-tag RG_TAG
    Split reads into different fastq files based on the
    read group tag. Must be either PU or ID (default:
    None)


--remove-qc-failure
    Remove reads from the output that have abstract QC
    failure. (default: None)


--num-threads NUM_THREADS
    Number of threads to run. (default: 8)

Publisher: NVIDIA

Latest Tag: 1.4.0-1

Updated: January 16, 2026 UTC

Compressed Size: 373.21 MB

Multinode Support: No

Multi-Arch Support: No

System: signed images

Labels: Clara, Parabricks, Genomics, Healthcare, High Performance Computing, HPC, Life Sciences, NSPECT-0S9C-XAVB