This model denoises ATAC-seq data, producing a higher-quality signal at a much lower cost than collecting high-quality data directly.
The model is a residual neural network consisting of 5 residual blocks that produce the denoised ATAC-seq signal, followed by 2 additional residual blocks and a sigmoid layer that produce the classification (peak-calling) output. All residual blocks are composed of 1-dimensional convolutional layers and ReLU activations.
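The data flow described above can be sketched in plain Python. This is only an illustration of the block layout (5 residual blocks for the denoised signal, then 2 more plus a sigmoid for peak probabilities), not the AtacWorks implementation. The single channel, the kernel size of 5, the two convolutions per block, the conv -> ReLU ordering inside a block, and the random untrained weights are all assumptions made for the example.

```python
import math
import random


def conv1d_same(x, kernel):
    """1D convolution with zero padding so the output length equals the input length.

    Assumes an odd kernel length.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(x):
    return [1.0 / (1.0 + math.exp(-v)) for v in x]


def residual_block(x, kernels):
    """Apply conv -> ReLU for each kernel, then add the block input back (skip connection)."""
    h = x
    for kernel in kernels:
        h = relu(conv1d_same(h, kernel))
    return [a + b for a, b in zip(x, h)]


def make_block(rng, n_layers=2, kernel_size=5):
    """Hypothetical helper: random weights standing in for trained ones."""
    return [
        [rng.uniform(-0.1, 0.1) for _ in range(kernel_size)]
        for _ in range(n_layers)
    ]


def forward(noisy, denoise_blocks, classify_blocks):
    """Return (denoised signal, per-position peak probability)."""
    h = list(noisy)
    for block in denoise_blocks:      # 5 blocks -> regression (denoising) output
        h = residual_block(h, block)
    denoised = h
    for block in classify_blocks:     # 2 more blocks + sigmoid -> classification output
        h = residual_block(h, block)
    return denoised, sigmoid(h)


rng = random.Random(0)
denoise_blocks = [make_block(rng) for _ in range(5)]
classify_blocks = [make_block(rng) for _ in range(2)]

noisy = [0.0, 1.0, 0.0, 3.0, 2.0, 0.0, 0.0, 1.0]  # toy per-base coverage
denoised, peak_prob = forward(noisy, denoise_blocks, classify_blocks)
```

Both outputs have one value per input position, so the denoised track and the peak probabilities stay aligned with the input coverage track.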
This model was trained on bulk ATAC-seq data from 4 blood cell types (B cells, NK cells, CD4+ T cells and CD8+ T cells). The noisy data has a depth of 0.2 million reads and the clean data has a depth of 50 million reads. Data was obtained from Corces, M. R. et al. Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution. Nature Genetics vol. 48 1193–1203 (2016).
Noisy Data: Click on the File Browser tab above. Click on train_data -> noisy_data.
Clean Data: Click on the File Browser tab above. Click on train_data -> clean_data.
No performance information available at this time.
A step-by-step tutorial for denoising ATAC-seq data using a pre-trained model can be found in the AtacWorks repository: https://github.com/clara-parabricks/AtacWorks/blob/master/tutorials/tutorial2.md Make sure to edit Steps 2, 3 and 4 of that tutorial so that the download links point to the files for this model.
A step-by-step tutorial for training a model can be found in the AtacWorks repository: https://github.com/clara-parabricks/AtacWorks/blob/master/tutorials/tutorial1.md
Pretrained Models: The best-performing model can be found under the File Browser tab, in the models directory.
Configs: AtacWorks can take config files that specify various command-line options. This makes it easy to share the exact parameters the model was trained with. You can find the configs for this model under the File Browser tab, in the configs folder.
The input data for this model consists of noisy ATAC-seq coverage tracks. The labels to train the model are clean ATAC-seq coverage tracks and peak calls based on the clean coverage tracks.
This model learned to denoise an ATAC-seq signal at a depth of 0.2 million reads so that it resembles a signal at a depth of 50 million reads, and to call peaks from the noisy ATAC-seq signal. AtacWorks outputs two bedGraph (optionally bigWig) files: one containing the denoised coverage track and the other containing the peak calls.
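To make the bedGraph output concrete, here is a minimal sketch of how a per-base track could be written as bedGraph lines (tab-separated chromosome, start, end, value, with 0-based half-open intervals). It is not AtacWorks code. The function name `coverage_to_bedgraph`, the merging of equal-valued runs, the skipping of zero-valued runs, and the 0.5 probability threshold for peak calls are assumptions made for illustration.

```python
def coverage_to_bedgraph(chrom, coverage, start=0):
    """Collapse per-base values into bedGraph lines.

    Runs of equal value are merged into one interval and zero-valued runs
    are skipped. Intervals are 0-based and half-open, as in bedGraph.
    """
    lines = []
    run_start = 0
    for i in range(1, len(coverage) + 1):
        # Close the current run at the end of the track or when the value changes.
        if i == len(coverage) or coverage[i] != coverage[run_start]:
            value = coverage[run_start]
            if value != 0:
                lines.append(
                    f"{chrom}\t{start + run_start}\t{start + i}\t{value:g}"
                )
            run_start = i
    return lines


# Toy denoised coverage track starting at position 100 on chr1.
denoised = [0, 0, 2, 2, 2, 0, 1, 1]
track_lines = coverage_to_bedgraph("chr1", denoised, start=100)

# Toy peak probabilities, converted to binary calls with an assumed 0.5 cutoff.
peak_prob = [0.1, 0.2, 0.9, 0.8, 0.7, 0.3, 0.6, 0.4]
peak_calls = [1 if p >= 0.5 else 0 for p in peak_prob]
peak_lines = coverage_to_bedgraph("chr1", peak_calls, start=100)

print("\n".join(track_lines))
print("\n".join(peak_lines))
```

With these toy inputs the coverage track collapses to two intervals (102 to 105 with value 2, and 106 to 108 with value 1), and the peak track to two intervals (102 to 105 and 106 to 107).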
Output Files: For convenience, we have uploaded the output files generated by AtacWorks. They can be found under the File Browser tab, in the output directory.
This model was trained on data processed using the method in https://github.com/zchiang/atacworks_analysis. It may not deliver suitable results on data processed with different methods.
Lal, A., Chiang, Z.D., Yakovenko, N. et al. Deep learning-based enhancement of epigenomics data with AtacWorks. Nat Commun 12, 1507 (2021). https://doi.org/10.1038/s41467-021-21765-5
The model architecture, training and inference pipelines for this model were developed by the NVIDIA Genomics team. The relevant source code is open sourced under a custom NVIDIA license available here.