
Dsc Atac-seq Low Cell Count 1M Reads


Description

Model trained on single-cell ATAC-seq data generated with the dsc-ATAC protocol.

Publisher

NVIDIA

Latest Version

0.3

Modified

April 4, 2023

Size

1.42 GB

Model Overview

This model denoises low-quality ATAC-seq data, producing a higher-quality signal at a much lower cost than collecting high-quality data directly.

Model Architecture

The model is a residual neural network consisting of 5 residual blocks that produce the denoised ATAC-seq signal, followed by 2 additional residual blocks and a sigmoid layer to produce classification (peak calling) output. All residual blocks are composed of 1-dimensional convolutional layers and ReLU activations.
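The data flow described above can be sketched in plain Python, without a deep-learning framework. This is a minimal illustration, not the AtacWorks implementation: it uses a single channel, and the kernel size, the random weights, and the exact placement of the ReLU inside each block are assumptions made for the example. Only the block counts (5 for denoising, 2 more for peak calling) and the final sigmoid come from the description.

```python
import math
import random


def conv1d_same(x, kernel):
    """1-D convolution with zero padding, so the output has the input's length."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(x):
                acc += w * x[j]
        out.append(acc)
    return out


def relu(x):
    return [max(0.0, v) for v in x]


def residual_block(x, kernel_a, kernel_b):
    """conv -> ReLU -> conv, then add the block input back (skip connection).

    The ordering of layers inside the block is an assumption for this sketch.
    """
    h = relu(conv1d_same(x, kernel_a))
    h = conv1d_same(h, kernel_b)
    return [a + b for a, b in zip(x, h)]


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def forward(noisy, denoise_blocks, peak_blocks):
    """Return (denoised signal, per-position peak probability)."""
    h = noisy
    for kernel_a, kernel_b in denoise_blocks:   # 5 blocks in the described model
        h = residual_block(h, kernel_a, kernel_b)
    denoised = h
    for kernel_a, kernel_b in peak_blocks:      # 2 additional blocks
        h = residual_block(h, kernel_a, kernel_b)
    peak_prob = [sigmoid(v) for v in h]         # sigmoid gives values in [0, 1]
    return denoised, peak_prob


# Hypothetical weights and input, for illustration only.
rng = random.Random(0)


def random_block(size=5):
    return ([rng.uniform(-0.1, 0.1) for _ in range(size)],
            [rng.uniform(-0.1, 0.1) for _ in range(size)])


noisy_track = [0.0, 1.0, 4.0, 2.0, 0.0, 0.0, 3.0, 1.0]
denoised, peak_prob = forward(
    noisy_track,
    [random_block() for _ in range(5)],
    [random_block() for _ in range(2)],
)
print(len(denoised), len(peak_prob))
```

Both outputs have the same length as the input track, which reflects the per-position nature of the denoising and peak-calling tasks.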

Training

This model was trained on low cell count single-cell ATAC-seq data from human blood cells (B cells and monocytes), generated with the dsc-ATAC protocol. The clean data has a read depth of 48 million reads from 2400 cells; the noisy data has a read depth of 1 million reads from 50 cells. Data was obtained from Lareau, C. A. et al. Droplet-based combinatorial indexing for massive-scale single-cell chromatin accessibility. Nat. Biotechnol. 37, 916–924 (2019).
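The relationship between the noisy and clean data can be checked with a few lines of arithmetic. The sketch below assumes that the stated read depths are totals across the stated cell counts; the per-cell figure is therefore an approximate average, not a number reported in the source.

```python
from fractions import Fraction

clean_reads, noisy_reads = 48_000_000, 1_000_000
clean_cells, noisy_cells = 2400, 50

# The noisy data is the same fraction of the clean data by reads and by cells.
print(Fraction(noisy_reads, clean_reads))   # 1/48
print(Fraction(noisy_cells, clean_cells))   # 1/48

# Average reads per cell under the assumption stated above.
print(clean_reads // clean_cells)           # 20000
print(noisy_reads // noisy_cells)           # 20000
```

In other words, the model is asked to recover a signal from roughly 2% of the cells and reads used to produce the clean target.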

Dataset

Noisy Data: Click on the File Browser tab above. Click on train_data -> noisy_data.

Clean Data: Click on the File Browser tab above. Click on train_data -> clean_data.

Performance

No performance information available at this time.

How to Use this Model

A step-by-step tutorial for denoising ATAC-seq data with a pre-trained model is available in the AtacWorks repository: https://github.com/clara-parabricks/AtacWorks/blob/master/tutorials/tutorial2.md

Make sure to edit Steps 2, 3, and 4 so that the download links point to the files for this model.

A step-by-step tutorial for training a model is available in the AtacWorks repository: https://github.com/clara-parabricks/AtacWorks/blob/master/tutorials/tutorial1.md

Make sure to edit Step 2 so that the download links point to the files for this model.

Pretrained Models: The best-performing model can be found under the File Browser tab, in the models directory.

Configs: AtacWorks can read config files that specify its command line options, which makes it easy to share the exact parameters the model was trained with. You can find the configs under the File Browser tab, in the configs folder.

Input

The input data for this model consists of low cell count ATAC-seq coverage tracks. The labels to train the model are high cell count ATAC-seq coverage tracks and peak calls based on the high cell count coverage tracks.

Output

This model learned to denoise ATAC-seq data with a read depth of 1 million so that it resembles data with a read depth of 48 million. AtacWorks outputs two files in bedGraph format (optionally BigWig): one containing the denoised tracks and the other containing the denoised peaks.

Output Files: For the convenience of users, we have uploaded the output files generated by AtacWorks. They can be found under the File Browser tab, in the output directory.
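A bedGraph file like the ones described above can be read with the Python standard library alone. The sketch below assumes the standard four-column bedGraph layout (chromosome, zero-based start, exclusive end, value); the sample intervals and values are invented for illustration and are not taken from the uploaded output files.

```python
import io


def read_bedgraph(handle):
    """Yield (chrom, start, end, value) tuples from a bedGraph text stream."""
    for line in handle:
        line = line.strip()
        # Skip blank lines and optional header or comment lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        yield chrom, int(start), int(end), float(value)


def mean_signal(intervals):
    """Width-weighted mean of the value column over all listed intervals."""
    total_bases = 0
    weighted_sum = 0.0
    for _, start, end, value in intervals:
        width = end - start
        total_bases += width
        weighted_sum += width * value
    return weighted_sum / total_bases if total_bases else 0.0


# Hypothetical content standing in for a denoised-track file.
example = io.StringIO("chr1\t0\t100\t0.5\nchr1\t100\t150\t2.0\n")
records = list(read_bedgraph(example))
print(records)
print(mean_signal(records))
```

To read a real file, pass an open file handle to `read_bedgraph` in place of the `io.StringIO` object. BigWig is a binary format and would need a dedicated library instead.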

Limitations

No information about known limitations available at this time.

Reference

Lal, A., Chiang, Z.D., Yakovenko, N. et al. Deep learning-based enhancement of epigenomics data with AtacWorks. Nat Commun 12, 1507 (2021). https://doi.org/10.1038/s41467-021-21765-5

License

The model architecture, training and inference pipelines for this model were developed by the NVIDIA Genomics team. The relevant source code is open sourced under a custom NVIDIA license available here.