Linux / amd64
DeepSAP is a Transformer-based workflow designed to enhance splice junction detection in RNA-seq data. By default, DeepSAP utilizes the highly sensitive GSNAP TGGA aligner for FASTQ inputs. Alternatively, it can process pre-aligned BAM files directly.
To run DeepSAP, you need Docker with GPU support. Ensure that the DeepSAP Docker image nvcr.io/nvidia/clara/clara-parabricks-deepsap:latest is available locally.
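A minimal sketch of checking for the image before the first run. It assumes the docker CLI is on the PATH; ensure_image is a hypothetical helper name and is not part of DeepSAP. Depending on the registry's access settings, the pull may first require authenticating with docker login nvcr.io.

```shell
# Hypothetical helper (not part of DeepSAP): make sure the image exists locally.
IMAGE="nvcr.io/nvidia/clara/clara-parabricks-deepsap:latest"

ensure_image() {
    # `docker image inspect` exits non-zero when the image is not present locally.
    if docker image inspect "$1" >/dev/null 2>&1; then
        echo "image available locally: $1"
    else
        echo "image not found locally, pulling: $1"
        docker pull "$1"
    fi
}

# Example call (commented out so that sourcing this file has no side effects):
# ensure_image "$IMAGE"
```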
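DeepSAP comes with a sample dataset located in the test folder to help users get started. The example commands below use its paired-end FASTQ reads, genome FASTA file, GTF annotation, alignment BAM file, and GSNAP index.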
You can use this dataset to test DeepSAP's functionality with minimal setup. Update the paths in the provided example commands to point to the files in the test folder.
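Before running the examples, it can help to confirm that the files they reference are in place. The sketch below is a hypothetical pre-flight check, not part of DeepSAP. The file names are taken from the example commands, and the helper name check_test_data is an assumption. It covers only the FASTQ examples; the BAM example additionally needs alignments.bam.

```shell
# Hypothetical pre-flight check; file names come from the example commands.
check_test_data() {
    data_dir="$1/short_read_pe_dataset"
    missing=0
    for f in reads_1.fastq reads_2.fastq malaria_genome.fa malaria_annotation.gtf; do
        if [ ! -f "$data_dir/$f" ]; then
            echo "missing: $data_dir/$f" >&2
            missing=1
        fi
    done
    # Create the output directory up front so it belongs to the current user;
    # with --volume, Docker typically creates a missing host directory as root.
    mkdir -p "$1/outputdir"
    return "$missing"
}

# Example call, from the directory that contains the test folder:
# check_test_data "$(pwd)/test" && echo "test data looks complete"
```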
1- Running DeepSAP with short-read RNA-seq FASTQ files
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --rm \
--volume $(pwd)/test:/workdir \
--volume $(pwd)/test/outputdir:/outputdir \
nvcr.io/nvidia/clara/clara-parabricks-deepsap:latest \
--out /outputdir/ \
--prefix test_run \
--mate_1 /workdir/short_read_pe_dataset/reads_1.fastq \
--mate_2 /workdir/short_read_pe_dataset/reads_2.fastq \
--fasta /workdir/short_read_pe_dataset/malaria_genome.fa \
--gtf /workdir/short_read_pe_dataset/malaria_annotation.gtf
2- Running DeepSAP with short-read RNA-seq FASTQ files and a prebuilt GSNAP index
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --rm \
--volume $(pwd)/test:/workdir \
--volume $(pwd)/test/outputdir:/outputdir \
nvcr.io/nvidia/clara/clara-parabricks-deepsap:latest \
--out /outputdir/ \
--prefix test_run \
--mate_1 /workdir/short_read_pe_dataset/reads_1.fastq \
--mate_2 /workdir/short_read_pe_dataset/reads_2.fastq \
--fasta /workdir/short_read_pe_dataset/malaria_genome.fa \
--gtf /workdir/short_read_pe_dataset/malaria_annotation.gtf \
--gsnap_idx /workdir/gsnap_idx/
3- Running DeepSAP with an alignment BAM file generated from short-read RNA-seq data
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --rm \
--volume $(pwd)/test:/workdir \
--volume $(pwd)/test/outputdir:/outputdir \
nvcr.io/nvidia/clara/clara-parabricks-deepsap:latest \
--out /outputdir/ \
--prefix test_run \
--sam /workdir/short_read_pe_dataset/alignments.bam \
--fasta /workdir/short_read_pe_dataset/malaria_genome.fa \
--gtf /workdir/short_read_pe_dataset/malaria_annotation.gtf
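The three commands above share the same Docker options, volume mounts, and output path. As a convenience sketch, those shared parts can be factored into a small shell function that passes the remaining DeepSAP arguments through unchanged. The name deepsap_run is hypothetical; the options themselves are copied from the examples.

```shell
# Hypothetical wrapper: shared Docker options from the examples above,
# with DeepSAP's own arguments passed through unchanged via "$@".
deepsap_run() {
    docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --rm \
        --volume "$(pwd)/test:/workdir" \
        --volume "$(pwd)/test/outputdir:/outputdir" \
        nvcr.io/nvidia/clara/clara-parabricks-deepsap:latest \
        --out /outputdir/ "$@"
}

# Example call equivalent to the BAM example (commented out; it needs Docker and a GPU):
# deepsap_run --prefix test_run \
#     --sam /workdir/short_read_pe_dataset/alignments.bam \
#     --fasta /workdir/short_read_pe_dataset/malaria_genome.fa \
#     --gtf /workdir/short_read_pe_dataset/malaria_annotation.gtf
```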
Argument | Description | Required |
---|---|---|
-o, --out | Path to the output folder | Yes |
--prefix | Prefix string for output files | Yes |
-g, --gtf | Path to the GTF annotation file compatible with the BAM file | Yes |
-f, --fasta | Path to the FASTA genome file compatible with the BAM file | Yes |
-s, --sam | Path to the SAM/BAM file or directory of files | Yes (if BAM) |
--mate_1 | Path to FASTQ file of mate 1 (for paired-end reads) | Yes (if FASTQ) |
--mate_2 | Path to FASTQ file of mate 2 (for paired-end reads) | Yes (if FASTQ) |
--gsnap_idx | Path to GSNAP index | No |
-c, --config | Config .json file to control DeepSAP internal parameters | No |
--batch | Batch size for inference | No |
--set_size | Set size to split datasets for inference | No |
-t, --threads | Number of threads | No |
--score_reads | Also classify reads using the Transformer model and add scores to SAM, as opposed to scoring only splice junctions (SJ) | No |
--n_reads | Number of reads to classify if --score_reads is used | No |
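The Required column implies two input modes: a SAM/BAM input via --sam, or paired FASTQ inputs via --mate_1 and --mate_2. The sketch below is a hypothetical helper (input_mode is not part of DeepSAP) that reports which mode a given argument list selects. It treats any other combination as invalid, which is a conservative assumption rather than documented DeepSAP behavior.

```shell
# Hypothetical helper: report which input mode a DeepSAP argument list selects.
input_mode() {
    sam="" m1="" m2=""
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -s|--sam) sam="${2:-}" ;;   # value is the following argument, if any
            --mate_1) m1="${2:-}" ;;
            --mate_2) m2="${2:-}" ;;
        esac
        shift
    done
    if [ -n "$sam" ] && [ -z "$m1$m2" ]; then
        echo "bam"
    elif [ -z "$sam" ] && [ -n "$m1" ] && [ -n "$m2" ]; then
        echo "fastq"
    else
        # Assumption: mixed or incomplete inputs are rejected here.
        echo "invalid"
        return 1
    fi
}

# Example calls:
# input_mode --sam alignments.bam --fasta genome.fa --gtf annotation.gtf
# input_mode --mate_1 reads_1.fastq --mate_2 reads_2.fastq
```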
Governing Terms: The software and materials are governed by the NVIDIA Software License Agreement (found at https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-software-license-agreement/) and the Product-Specific Terms for NVIDIA AI Products (found at https://www.nvidia.com/en-us/agreements/enterprise-software/product-specific-terms-for-ai-products/); except for the model, which is governed by the NVIDIA Models Community License Agreement (found at NVIDIA Community Model License). ADDITIONAL INFORMATION: Apache 2.0.