Linux / amd64
DeepSAP is a transformer-based workflow that improves splice junction detection in RNA-seq data. For FASTQ inputs, DeepSAP aligns reads with the highly sensitive GSNAP TGGA aligner by default. Alternatively, it can directly process BAM files that were already aligned with GSNAP.
To use DeepSAP, you must have Docker with GPU support enabled and make sure the DeepSAP Docker image is available on your system. You can obtain the image by running the following command:
$ docker pull nvcr.io/nvidia/clara/clara-parabricks-deepsap:<TAG>
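Before running the examples, it can help to confirm that Docker is installed and that the pulled image is present locally. The bash sketch below does that with `docker image inspect`, a standard Docker CLI command. The function names are hypothetical helpers, the tag is not filled in for you (pass the one you used for `<TAG>`), and the sketch does not verify GPU access, which still depends on your Docker GPU setup.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: confirm Docker and the DeepSAP image are available.
# No tag is assumed; pass the one you pulled (the <TAG> placeholder above).

# Print the full image reference for a given tag.
deepsap_image() {
  if [[ -z "${1:-}" ]]; then
    echo "error: pass the DeepSAP image tag you pulled" >&2
    return 1
  fi
  printf 'nvcr.io/nvidia/clara/clara-parabricks-deepsap:%s\n' "$1"
}

# Check that the docker CLI exists and the image is present locally.
# Note: this does not verify GPU access inside containers.
check_deepsap_prereqs() {
  local image
  image="$(deepsap_image "${1:-}")" || return 1
  if ! command -v docker >/dev/null 2>&1; then
    echo "error: docker not found on PATH" >&2
    return 1
  fi
  if ! docker image inspect "$image" >/dev/null 2>&1; then
    echo "error: $image not found locally; run: docker pull $image" >&2
    return 1
  fi
  echo "ok: $image"
}
```

For example, `check_deepsap_prereqs latest` prints the image reference when that tag is present locally and otherwise suggests the matching `docker pull` command.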
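You can use the accompanying dataset malaria_short_pe, located in the test folder, to test DeepSAP's functionality with minimal setup. The example commands use the following files from this dataset:
- SRR14793977_10K_1.fastq.gz and SRR14793977_10K_2.fastq.gz (paired-end FASTQ reads)
- Plasmodium_falciparum.ASM276v2.60.gtf (GTF annotation)
- Plasmodium_falciparum.ASM276v2.dna.toplevel.fa (FASTA genome)
Update the paths in the example commands to point to the files in the test/ folder.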
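A quick way to confirm the dataset is in place is to check for the four files that the example commands reference. This is a minimal bash sketch: `check_test_dataset` is a hypothetical helper, and its default directory, test/malaria_short_pe, is assumed from the volume mounts used in the examples.

```bash
#!/usr/bin/env bash
# Files referenced by the example commands (names copied from them).
DEEPSAP_TEST_FILES=(
  SRR14793977_10K_1.fastq.gz
  SRR14793977_10K_2.fastq.gz
  Plasmodium_falciparum.ASM276v2.60.gtf
  Plasmodium_falciparum.ASM276v2.dna.toplevel.fa
)

# Hypothetical helper: report each missing file on stderr and
# return non-zero if any file is absent.
# The default directory is assumed from the examples' volume mounts.
check_test_dataset() {
  local dir="${1:-test/malaria_short_pe}" missing=0 f
  for f in "${DEEPSAP_TEST_FILES[@]}"; do
    if [[ ! -f "$dir/$f" ]]; then
      echo "missing: $dir/$f" >&2
      missing=$((missing + 1))
    fi
  done
  [[ "$missing" -eq 0 ]]
}
```

Run from the directory that contains test/, `check_test_dataset && echo "dataset ready"` lists any missing files before you start a container.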
1- Running DeepSAP with short-read RNA-seq FASTQ files
docker run --gpus all --ulimit memlock=-1 --ulimit stack=67108864 --rm \
--volume $(pwd)/test:/workdir \
--volume $(pwd)/test/outputdir:/outputdir \
nvcr.io/nvidia/clara/clara-parabricks-deepsap:latest \
--out /outputdir/ \
--prefix test_run_10K \
--mate_1 /workdir/malaria_short_pe/SRR14793977_10K_1.fastq.gz \
--mate_2 /workdir/malaria_short_pe/SRR14793977_10K_2.fastq.gz \
--gtf /workdir/malaria_short_pe/Plasmodium_falciparum.ASM276v2.60.gtf \
--fasta /workdir/malaria_short_pe/Plasmodium_falciparum.ASM276v2.dna.toplevel.fa
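The same FASTQ command can be assembled as a bash array. This keeps host paths that contain spaces intact (the unquoted `$(pwd)` in the example would split them) and lets you print the exact command before running it. `build_fastq_cmd` and `DEEPSAP_CMD` are hypothetical names; the flags and container paths are copied from the example above.

```bash
#!/usr/bin/env bash
DEEPSAP_IMAGE="nvcr.io/nvidia/clara/clara-parabricks-deepsap:latest"

# Hypothetical helper: fill DEEPSAP_CMD with the FASTQ-mode docker command.
# $1 = host folder mounted at /workdir, $2 = output prefix.
build_fastq_cmd() {
  local host_dir="$1" prefix="$2"
  local ds=/workdir/malaria_short_pe
  DEEPSAP_CMD=(
    docker run --gpus all --ulimit memlock=-1 --ulimit stack=67108864 --rm
    --volume "${host_dir}:/workdir"
    --volume "${host_dir}/outputdir:/outputdir"
    "$DEEPSAP_IMAGE"
    --out /outputdir/
    --prefix "$prefix"
    --mate_1 "$ds/SRR14793977_10K_1.fastq.gz"
    --mate_2 "$ds/SRR14793977_10K_2.fastq.gz"
    --gtf "$ds/Plasmodium_falciparum.ASM276v2.60.gtf"
    --fasta "$ds/Plasmodium_falciparum.ASM276v2.dna.toplevel.fa"
  )
}

# Usage (the last line starts the container):
#   build_fastq_cmd "$(pwd)/test" test_run_10K
#   printf '%q ' "${DEEPSAP_CMD[@]}"; echo   # inspect the command
#   "${DEEPSAP_CMD[@]}"                      # run it
```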
2- Running DeepSAP with short-read RNA-seq FASTQ files and a GSNAP index
docker run --gpus all --ulimit memlock=-1 --ulimit stack=67108864 --rm \
--volume $(pwd)/test:/workdir \
--volume $(pwd)/test/outputdir:/outputdir \
nvcr.io/nvidia/clara/clara-parabricks-deepsap:latest \
--out /outputdir/ \
--prefix test_run_10K \
--mate_1 /workdir/malaria_short_pe/SRR14793977_10K_1.fastq.gz \
--mate_2 /workdir/malaria_short_pe/SRR14793977_10K_2.fastq.gz \
--gtf /workdir/malaria_short_pe/Plasmodium_falciparum.ASM276v2.60.gtf \
--fasta /workdir/malaria_short_pe/Plasmodium_falciparum.ASM276v2.dna.toplevel.fa \
--gsnap_idx /workdir/gsnap_idx/
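Because `--gsnap_idx` is optional, a wrapper might pass it only when an index folder actually exists. In this sketch, `gsnap_idx_args` and `GSNAP_ARGS` are hypothetical, and the index is assumed to sit in gsnap_idx/ inside the host folder mounted at /workdir, which is what /workdir/gsnap_idx/ in the example resolves to. Appending flags to an array also avoids trailing-backslash mistakes in long multi-line commands.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set GSNAP_ARGS to (--gsnap_idx /workdir/gsnap_idx/)
# when <host_workdir>/gsnap_idx exists, or to an empty array otherwise.
gsnap_idx_args() {
  local host_workdir="$1"
  GSNAP_ARGS=()
  if [[ -d "$host_workdir/gsnap_idx" ]]; then
    GSNAP_ARGS=(--gsnap_idx /workdir/gsnap_idx/)
  fi
}

# Usage: gsnap_idx_args "$(pwd)/test", then append "${GSNAP_ARGS[@]}"
# to the docker run arguments after --fasta.
```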
Argument | Description | Required |
---|---|---|
-o, --out | Path to the output folder | Yes |
--prefix | Prefix string for output files | Yes |
-g, --gtf | Path to the GTF annotation file compatible with the BAM file | Yes |
-f, --fasta | Path to the FASTA genome file compatible with the BAM file | Yes |
-s, --sam | Path to a SAM/BAM file or a directory of files | Yes (if BAM) |
--mate_1 | Path to the FASTQ file of mate 1 (for paired-end reads) | Yes (if FASTQ) |
--mate_2 | Path to the FASTQ file of mate 2 (for paired-end reads) | Yes (if FASTQ) |
--gsnap_idx | Path to the GSNAP index | No |
-c, --config | Config .json file controlling DeepSAP internal parameters | No |
--batch | Batch size for inference | No |
--set_size | Set size used to split datasets for inference | No |
-t, --threads | Number of threads | No |
--score_reads | Also classify reads with the transformer model and add their scores to the SAM output, as opposed to classifying only splice junctions (SJ) | No |
--n_reads | Number of reads to classify if --score_reads is used | No |
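The Required column implies that a run needs either `--sam` or both FASTQ mates, and that `--n_reads` only matters together with `--score_reads`. The bash sketch below turns those rules into checks. Two points are assumptions of the sketch, not documented behavior: it treats `--sam` and FASTQ input as mutually exclusive, and it assumes `--score_reads` is a flag that takes no value. All function and variable names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers that mirror the "Required" column of the table.
# Assumptions: --sam and the FASTQ mates are not combined, and
# --score_reads is a value-less flag.

# Sets INPUT_ARGS to either (--sam PATH) or (--mate_1 R1 --mate_2 R2).
deepsap_input_args() {
  local sam="${1:-}" mate1="${2:-}" mate2="${3:-}"
  INPUT_ARGS=()
  if [[ -n "$sam" ]]; then
    if [[ -n "$mate1" || -n "$mate2" ]]; then
      echo "error: use either --sam or --mate_1/--mate_2, not both" >&2
      return 1
    fi
    INPUT_ARGS=(--sam "$sam")
  elif [[ -n "$mate1" && -n "$mate2" ]]; then
    INPUT_ARGS=(--mate_1 "$mate1" --mate_2 "$mate2")
  else
    echo "error: FASTQ input needs both --mate_1 and --mate_2" >&2
    return 1
  fi
}

# Sets SCORING_ARGS; --n_reads is only passed together with --score_reads.
deepsap_scoring_args() {
  local score_reads="${1:-}" n_reads="${2:-}"
  SCORING_ARGS=()
  if [[ "$score_reads" == yes ]]; then
    SCORING_ARGS=(--score_reads)
    if [[ -n "$n_reads" ]]; then
      SCORING_ARGS+=(--n_reads "$n_reads")
    fi
  elif [[ -n "$n_reads" ]]; then
    echo "error: --n_reads is only used together with --score_reads" >&2
    return 1
  fi
}
```

For a pre-aligned BAM, `deepsap_input_args /workdir/sample.bam "" ""` (a hypothetical path) sets `INPUT_ARGS` to `--sam /workdir/sample.bam`, which would go after the image name in a `docker run` command like the ones above, in place of the `--mate_1`/`--mate_2` pair.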
By pulling and using the Parabricks container, you accept the governing terms: The software and materials are governed by the NVIDIA Software License Agreement (found at https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-software-license-agreement/) and the Product-Specific Terms for NVIDIA AI Products (found at https://www.nvidia.com/en-us/agreements/enterprise-software/product-specific-terms-for-ai-products/); except for the model which is governed by the NVIDIA Models Community License Agreement (found at NVIDIA Community Model License). ADDITIONAL INFORMATION: Apache 2.0.