This document describes data generation, processing, and QC standards in PanKbase for single-nucleus ATAC-sequencing (snATAC-seq) studies. The standards are adapted from the snATAC HPAP protocol and the Gaulton Lab snATAC data processing pipeline.
The data generation standards below follow standard practices for Chromium Single Cell ATAC Sequencing (10x Genomics) and the snATAC HPAP protocol:
1. Sequencing Depth
Recommended depth: ~25,000–50,000 read pairs per nucleus.
For a targeted recovery of 5,000 nuclei (as stated in the protocol), the total sequencing depth would typically fall between 125 million and 250 million read pairs (5,000 nuclei × 25,000–50,000 read pairs per nucleus).
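The depth arithmetic above can be reproduced with a short calculation. This is a minimal sketch; the function name `total_read_pairs` is illustrative and not part of any PanKbase or 10x Genomics tooling.

```python
def total_read_pairs(nuclei: int, pairs_per_nucleus: int) -> int:
    """Total read pairs needed for a targeted nuclei recovery."""
    return nuclei * pairs_per_nucleus


# Recommended range: 25,000-50,000 read pairs per nucleus, 5,000 nuclei targeted.
low = total_read_pairs(5_000, 25_000)
high = total_read_pairs(5_000, 50_000)
print(f"{low / 1e6:.0f}-{high / 1e6:.0f} million read pairs")
```

The same function can be used to budget sequencing for a different targeted recovery by changing the first argument.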
2. Read Length
Chromium Single Cell ATAC uses paired-end sequencing.
Standard read length:
Read 1: 50 bp (genomic insert, read from one transposase cut site)
Read 2: 50 bp (genomic insert, read from the opposite end of the fragment; snATAC libraries carry no UMI)
Index 1 (i7): 8 bp (sample index)
Index 2 (i5): 16 bp (cell barcode)
3. Quality Control Metrics
Cell viability threshold: ≥85%.
Nuclei concentration after isolation: ~500–5,000 nuclei/μl, with a final concentration of ~30 nuclei/μl during transposition.
Nuclei recovery rate: ~25–50% of input cells after processing.
Low debris and ambient DNA: DNase treatment can help reduce contamination.
4. Data Standards
Fragment size distribution: Periodic enrichment at roughly 200 bp and its multiples (nucleosome phasing).
TSS enrichment score: ≥10 (for high-quality libraries).
Fraction of reads in peaks (FRiP): ≥15–20%.
Duplicate rate: ≤10%.
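The data standards listed above can be checked programmatically once per-library metrics are available. The sketch below is illustrative only: the `LibraryQC` class and `failed_checks` function are hypothetical names, rates are expressed as fractions, and the lower bound of the FRiP range (15%) is assumed as the default cutoff.

```python
from dataclasses import dataclass


@dataclass
class LibraryQC:
    tss_enrichment: float   # TSS enrichment score
    frip: float             # fraction of reads in peaks, 0-1
    duplicate_rate: float   # fraction of duplicate reads, 0-1


def failed_checks(qc, min_tss=10.0, min_frip=0.15, max_dup=0.10):
    """Return the names of the metrics that miss the thresholds."""
    failures = []
    if qc.tss_enrichment < min_tss:
        failures.append("tss_enrichment")
    if qc.frip < min_frip:  # assumption: 15% taken as the minimum
        failures.append("frip")
    if qc.duplicate_rate > max_dup:
        failures.append("duplicate_rate")
    return failures


good = LibraryQC(tss_enrichment=12.3, frip=0.22, duplicate_rate=0.08)
poor = LibraryQC(tss_enrichment=8.0, frip=0.10, duplicate_rate=0.25)
print(failed_checks(good))
print(failed_checks(poor))
```

Returning the list of failed metrics, rather than a single pass/fail flag, makes it easier to see which standard a library misses.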
For exact sequencing parameters, the final experimental design and requirements should follow recommendations in the Chromium Single Cell ATAC Solution User Guide provided by 10x Genomics.
References
10x Genomics Chromium Single Cell ATAC Solution User Guide
This pipeline is designed for reproducibility and transparency:
Purpose: Generate position-sorted BAM files using Cell Ranger ATAC.
Command:
for SAMPLE in HPAP-109; do
~/scripts/cellranger-atac-2.0.0/cellranger-atac count \
--id ${SAMPLE} \
--fastqs ~/hpap/atac/${SAMPLE}/Upenn_scATACseq/fastq/ \
--sample ${SAMPLE} \
--reference ~/refdata-cellranger-arc-GRCh38-2020-A-2.0.0/ \
--localcores 24 \
--disable-ui;
done
Inputs: Raw FASTQ files
Outputs: QC HTML files, position-sorted BAM files with barcode annotations
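For batches with many samples, the same command can be assembled from Python. The sketch below only builds and prints the command; it reuses the paths and flags from the shell loop above, while the function name `build_count_command` is a hypothetical helper and not part of the pipeline.

```python
import os
import shlex

# Paths copied from the shell command above.
CELLRANGER_ATAC = "~/scripts/cellranger-atac-2.0.0/cellranger-atac"
REFERENCE = "~/refdata-cellranger-arc-GRCh38-2020-A-2.0.0/"


def build_count_command(sample: str, cores: int = 24) -> list[str]:
    """Build the cellranger-atac count argument list for one sample."""
    expand = os.path.expanduser
    return [
        expand(CELLRANGER_ATAC), "count",
        "--id", sample,
        "--fastqs", expand(f"~/hpap/atac/{sample}/Upenn_scATACseq/fastq/"),
        "--sample", sample,
        "--reference", expand(REFERENCE),
        "--localcores", str(cores),
        "--disable-ui",
    ]


for sample in ["HPAP-109"]:
    cmd = build_count_command(sample)
    print(shlex.join(cmd))
    # To launch the run, pass cmd to subprocess.run(cmd, check=True).
```

Passing the arguments as a list avoids shell quoting problems if a sample ID ever contains unusual characters.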
Purpose: Process BAM files to filter reads, calculate QC metrics, and generate chromatin accessibility matrices.
Command:
python3 snATAC_pipeline_hg38_10X.py \
-b SAMPLE/possorted_bam.bam \
-o SAMPLE \
-n SAMPLE \
-t 24 \
-m 2 \
--minimum-reads 1000
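The `--minimum-reads 1000` setting in the command above drops barcodes with too few reads. How the pipeline script applies this internally is not shown here; the sketch below only illustrates the idea, with a hypothetical function name, toy barcodes, and the assumption that the threshold is inclusive.

```python
from collections import Counter


def barcodes_passing(read_barcodes, minimum_reads=1000):
    """Count reads per barcode and keep barcodes at or above the threshold."""
    counts = Counter(read_barcodes)
    return {bc: n for bc, n in counts.items() if n >= minimum_reads}


# Toy input: one barcode per read (real barcodes are 16 bp sequences).
reads = ["AAAC"] * 3 + ["GGTT"] * 1 + ["CCAG"] * 2
print(barcodes_passing(reads, minimum_reads=2))
```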
Options:
-b: Input BAM file
-o: Output directory
-n: Prefix for output names
-t: Number of threads (default: 8)
-m: Memory per thread in GB (default: 4)
--minimum-reads: Minimum reads per barcode (default: 500)
Outputs:
Purpose: Prepare data for peak-based clustering by performing window-based clustering.
Command:
Rscript 01_Seurat_snATAC_windows_Harmony_reducePCs.r
Workflow:
Outputs:
HVG_all_samples.rds
hvw.txt
Purpose: Detect multiplets using AMULET.
Command:
AMULET.sh --forcesorted --bambc CB --bcidx 0 --cellidx 8 --iscellidx 9 \
BAM_PATH CSV_PATH autosomes.txt blacklist.bed OUTPUT_DIR
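AMULET flags multiplets from the number of genomic regions in a cell where more than two reads overlap, since a single diploid nucleus should rarely exceed two. The sketch below illustrates that counting idea for the fragments of one cell on one chromosome. It is not AMULET's implementation: the function name and the half-open interval convention are assumptions made for illustration.

```python
def regions_over_two(fragments):
    """Count maximal regions covered by more than two fragments.

    fragments: list of (start, end) half-open intervals for one cell
    on one chromosome.
    """
    events = []
    for start, end in fragments:
        events.append((start, 1))   # fragment opens
        events.append((end, -1))    # fragment closes
    # At equal positions, closes (-1) sort before opens (+1),
    # so fragments that only touch do not count as overlapping.
    events.sort()
    depth = 0
    regions = 0
    in_region = False
    for _, delta in events:
        depth += delta
        if depth > 2 and not in_region:
            regions += 1
            in_region = True
        elif depth <= 2 and in_region:
            in_region = False
    return regions


print(regions_over_two([(0, 100), (50, 150), (80, 120)]))
print(regions_over_two([(0, 100), (50, 150)]))
```

A cell with many such regions is a multiplet candidate; the statistical test that AMULET applies to these counts is outside the scope of this sketch.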
Outputs:
Purpose: Remove multiplets and refine clusters.
Command:
Rscript 02_Removing_multiplets.r
Outputs:
Purpose: Perform peak calling by generating and merging TagAlign files, splitting by cell type, and subsampling.
Key Steps:
Command Example:
bash call_peaks_unparallel.sh -c cells.txt -t tagAligns.txt -b barcodes.txt -o peaks/
Outputs:
mergedPeak.txt
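The splitting and subsampling described in this step can be pictured as follows. This is a sketch under stated assumptions, not the logic of `call_peaks_unparallel.sh`: it assumes the barcode sits in the fourth tab-separated column of each TagAlign line, and the function name, cell-type labels, and `max_reads` parameter are hypothetical.

```python
import random
from collections import defaultdict


def split_by_cell_type(tagalign_lines, barcode_to_type, max_reads=None, seed=0):
    """Group TagAlign lines by cell type, optionally subsampling each group."""
    groups = defaultdict(list)
    for line in tagalign_lines:
        fields = line.rstrip("\n").split("\t")
        barcode = fields[3]  # assumption: column 4 holds the cell barcode
        cell_type = barcode_to_type.get(barcode)
        if cell_type is not None:  # skip barcodes without a cell-type label
            groups[cell_type].append(line)
    if max_reads is not None:
        rng = random.Random(seed)  # fixed seed keeps subsampling reproducible
        for cell_type, lines in groups.items():
            if len(lines) > max_reads:
                groups[cell_type] = rng.sample(lines, max_reads)
    return dict(groups)


lines = [
    "chr1\t100\t150\tBC1\t60\t+",
    "chr1\t200\t250\tBC1\t60\t-",
    "chr1\t300\t350\tBC2\t60\t+",
    "chr1\t400\t450\tBC3\t60\t+",
]
labels = {"BC1": "alpha", "BC2": "beta"}  # hypothetical labels
result = split_by_cell_type(lines, labels, max_reads=1)
print({cell_type: len(v) for cell_type, v in result.items()})
```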
Purpose: Create long-format matrices by intersecting reads with peaks.
Command:
python3 Josh_10XPipeline_withPeaks_justLFM.py \
-o output_dir \
-k barcodes.txt \
-a tagAlign.gz \
-n SAMPLE \
-t 24 \
-m 2
Outputs:
.long_fmt_mtx.txt.gz
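Intersecting reads with peaks to obtain a long-format matrix amounts to counting, for each (peak, barcode) pair, the reads that overlap the peak. The sketch below illustrates this under assumptions: the exact column layout of `.long_fmt_mtx.txt.gz` is not documented here, peaks are taken to be non-overlapping half-open intervals, and `long_format_counts` is a hypothetical name.

```python
import bisect
from collections import Counter


def long_format_counts(reads, peaks):
    """Return sorted (peak, barcode, count) rows.

    reads: iterable of (chrom, start, end, barcode)
    peaks: list of (chrom, start, end), non-overlapping within a chromosome
    """
    by_chrom = {}
    for chrom, start, end in sorted(peaks):
        by_chrom.setdefault(chrom, []).append((start, end))
    counts = Counter()
    for chrom, start, end, barcode in reads:
        chrom_peaks = by_chrom.get(chrom, [])
        # Last peak that starts before the read ends, then walk backwards
        # while peaks still extend past the read start.
        i = bisect.bisect_left(chrom_peaks, (end,)) - 1
        while i >= 0 and chrom_peaks[i][1] > start:
            p_start, p_end = chrom_peaks[i]
            counts[(f"{chrom}:{p_start}-{p_end}", barcode)] += 1
            i -= 1
    return sorted((peak, bc, n) for (peak, bc), n in counts.items())


peaks = [("chr1", 100, 200), ("chr1", 300, 400)]
reads = [
    ("chr1", 150, 180, "A"),
    ("chr1", 190, 310, "A"),  # spans both peaks
    ("chr1", 350, 360, "B"),
    ("chr1", 200, 300, "B"),  # falls between the peaks
    ("chr2", 100, 200, "A"),  # chromosome without peaks
]
for row in long_format_counts(reads, peaks):
    print(*row, sep="\t")
```

Storing only nonzero (peak, barcode, count) rows keeps the output compact, because most peak and barcode combinations have no reads.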
Purpose: Process and analyze snATAC-seq data using peaks for clustering.
Command:
Rscript 03_Seurat_snATAC_peaks_Harmony_reducePCs_all_samples_final.r
Key Steps:
Outputs:
atac_obj_peak_based_final.rds
This pipeline provides a structured workflow for processing single-nucleus ATAC-seq data, ensuring high reproducibility and quality control at each step. It integrates tools like Cell Ranger, Seurat, Signac, and AMULET for comprehensive data preprocessing, clustering, and analysis. For support, visit the GitHub Repository.