This pipeline is designed for reproducibility and transparency:
Purpose: Generate position-sorted BAM files using Cell Ranger ATAC.
Command:
for SAMPLE in HPAP-109; do
    ~/scripts/cellranger-atac-2.0.0/cellranger-atac count \
        --id "${SAMPLE}" \
        --fastqs ~/hpap/atac/"${SAMPLE}"/Upenn_scATACseq/fastq/ \
        --sample "${SAMPLE}" \
        --reference ~/refdata-cellranger-arc-GRCh38-2020-A-2.0.0/ \
        --localcores 24 \
        --disable-ui
done
Inputs: Raw FASTQ files
Outputs: QC HTML files, position-sorted BAM files with barcode annotations
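When more samples are added, it can help to assemble the same command programmatically. The following is a minimal sketch, not part of the pipeline: the helper name `build_count_command` and its parameters are hypothetical, and only the flags shown in the command above are used.

```python
import os
import shlex


def build_count_command(sample, fastq_root, reference, cores=24,
                        binary="cellranger-atac"):
    """Return the argv list for one `cellranger-atac count` run.

    Hypothetical helper: the directory layout below mirrors the paths
    used in the shell loop above and is an assumption for other samples.
    """
    fastqs = os.path.join(fastq_root, sample, "Upenn_scATACseq", "fastq")
    return [
        binary, "count",
        "--id", sample,
        "--fastqs", fastqs,
        "--sample", sample,
        "--reference", reference,
        "--localcores", str(cores),
        "--disable-ui",
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed.
    for sample in ["HPAP-109"]:
        cmd = build_count_command(
            sample,
            os.path.expanduser("~/hpap/atac"),
            os.path.expanduser("~/refdata-cellranger-arc-GRCh38-2020-A-2.0.0"),
        )
        print(shlex.join(cmd))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would execute it; printing first makes it easy to confirm paths before a long run.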
Purpose: Process BAM files to filter reads, calculate QC metrics, and generate chromatin accessibility matrices.
Command:
python3 snATAC_pipeline_hg38_10X.py \
    -b SAMPLE/possorted_bam.bam \
    -o SAMPLE \
    -n SAMPLE \
    -t 24 \
    -m 2 \
    --minimum-reads 1000
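For orientation, the flags in this command and the defaults given in the option list for this step could be declared roughly as follows. This is a sketch only: the real script's argument parser is not shown here, the `dest` names are hypothetical, and treating the `--minimum-reads` threshold as inclusive is an assumption.

```python
import argparse


def build_parser():
    """Declare the documented options (attribute names are hypothetical)."""
    parser = argparse.ArgumentParser(description="snATAC preprocessing (sketch)")
    parser.add_argument("-b", dest="bam", required=True, help="Input BAM file")
    parser.add_argument("-o", dest="outdir", required=True, help="Output directory")
    parser.add_argument("-n", dest="name", required=True, help="Prefix for output names")
    parser.add_argument("-t", dest="threads", type=int, default=8,
                        help="Number of threads")
    parser.add_argument("-m", dest="memory", type=int, default=4,
                        help="Memory per thread in GB")
    parser.add_argument("--minimum-reads", type=int, default=500,
                        help="Minimum reads per barcode")
    return parser


def keep_barcodes(read_counts, minimum_reads):
    """Return barcodes with at least `minimum_reads` reads.

    Assumption: the threshold is inclusive; the actual script may differ.
    """
    return {bc for bc, n in read_counts.items() if n >= minimum_reads}


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["-b", "SAMPLE/possorted_bam.bam", "-o", "SAMPLE", "-n", "SAMPLE",
         "-t", "24", "-m", "2", "--minimum-reads", "1000"]
    )
    print(args)
```

Note that the example command overrides all three defaults (24 threads, 2 GB per thread, 1000 reads per barcode).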
Options:
-b: Input BAM file
-o: Output directory
-n: Prefix for output names
-t: Number of threads (default: 8)
-m: Memory per thread in GB (default: 4)
--minimum-reads: Minimum reads per barcode (default: 500)
Outputs:
Purpose: Prepare data for peak-based clustering by performing window-based clustering.
Command:
Rscript 01_Seurat_snATAC_windows_Harmony_reducePCs.r
Workflow:
Outputs:
HVG_all_samples.rds
hvw.txt
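The idea behind window-based counting can be illustrated outside R. The sketch below is not the R script's implementation: the 5 kb window size, the assignment of each read by its midpoint, and the function name `window_counts` are all assumptions made for illustration.

```python
from collections import Counter


def window_counts(reads, window_size=5000):
    """Count reads per (chromosome, window start, barcode).

    `reads` is an iterable of (chrom, start, end, barcode) tuples.
    Each read is assigned to the fixed-width window containing its
    midpoint (an assumption; other assignment rules are possible).
    """
    counts = Counter()
    for chrom, start, end, barcode in reads:
        midpoint = (start + end) // 2
        window_start = (midpoint // window_size) * window_size
        counts[(chrom, window_start, barcode)] += 1
    return counts
```

A table of this shape (window by barcode) is the kind of input from which variable windows could then be selected for clustering.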
Purpose: Detect multiplets using AMULET.
Command:
AMULET.sh --forcesorted --bambc CB --bcidx 0 --cellidx 8 --iscellidx 9 \
    BAM_PATH CSV_PATH autosomes.txt blacklist.bed OUTPUT_DIR
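The three index flags can be read as zero-based column positions in the per-barcode CSV (barcode, cell identifier, and an is-cell indicator). The sketch below illustrates that reading only; it is not AMULET's code, and the presence of a header row and the `"1"` encoding for cells are assumptions.

```python
import csv
import io


def read_cell_barcodes(handle, bcidx=0, cellidx=8, iscellidx=9):
    """Map barcode -> cell id for rows flagged as cells.

    Assumes a header row and that the is-cell column holds "1" for
    called cells; both are assumptions about the CSV layout.
    """
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    cells = {}
    for row in reader:
        if row[iscellidx] == "1":
            cells[row[bcidx]] = row[cellidx]
    return cells


if __name__ == "__main__":
    # Tiny made-up table with ten columns, for illustration only.
    demo = io.StringIO(
        "c0,c1,c2,c3,c4,c5,c6,c7,c8,c9\n"
        "AAA-1,0,0,0,0,0,0,0,cell_0,1\n"
        "CCC-1,0,0,0,0,0,0,0,None,0\n"
    )
    print(read_cell_barcodes(demo))
```

Checking the CSV against these indices before running AMULET is a quick way to catch a column layout that differs between Cell Ranger versions.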
Outputs:
Purpose: Remove multiplets and refine clusters.
Command:
Rscript 02_Removing_multiplets.r
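Conceptually, this step drops every barcode that the previous step flagged. A minimal Python illustration follows; the R script's actual logic is not shown in this document, and the function name `remove_multiplets` is hypothetical.

```python
def remove_multiplets(barcodes, multiplets):
    """Return barcodes not flagged as multiplets, preserving input order."""
    flagged = set(multiplets)
    return [bc for bc in barcodes if bc not in flagged]
```

For example, `remove_multiplets(["AAA-1", "CCC-1", "GGG-1"], ["CCC-1"])` keeps `AAA-1` and `GGG-1`; clusters would then be recomputed on the remaining barcodes.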
Outputs:
Purpose: Perform peak calling by generating and merging TagAlign files, splitting by cell type, and subsampling.
Key Steps:
Command Example:
bash call_peaks_unparallel.sh -c cells.txt -t tagAligns.txt -b barcodes.txt -o peaks/
Outputs:
mergedPeak.txt
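The splitting and subsampling described for this step could look roughly like the sketch below. It is not taken from `call_peaks_unparallel.sh`: it assumes tab-separated tagAlign lines with the cell barcode in the fourth column, and the function names and the fixed random seed are hypothetical.

```python
import random
from collections import defaultdict


def split_by_cell_type(tagalign_lines, barcode_to_type):
    """Group tagAlign lines by the cell type of their barcode.

    Assumption: the barcode is stored in column 4 (index 3).
    Lines whose barcode has no assigned cell type are skipped.
    """
    groups = defaultdict(list)
    for line in tagalign_lines:
        fields = line.rstrip("\n").split("\t")
        cell_type = barcode_to_type.get(fields[3])
        if cell_type is not None:
            groups[cell_type].append(line)
    return dict(groups)


def subsample(lines, max_reads, seed=0):
    """Return at most `max_reads` lines, sampled reproducibly."""
    if len(lines) <= max_reads:
        return list(lines)
    return random.Random(seed).sample(lines, max_reads)
```

Fixing the seed keeps the subsample identical between runs, which matters if peaks are to be reproducible.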
Purpose: Create long-format matrices by intersecting reads with peaks.
Command:
python3 Josh_10XPipeline_withPeaks_justLFM.py \
    -o output_dir \
    -k barcodes.txt \
    -a tagAlign.gz \
    -n SAMPLE \
    -t 24 \
    -m 2
Outputs:
.long_fmt_mtx.txt.gz
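To make "long format" concrete, the sketch below intersects reads with peaks and emits one count per (peak, barcode) pair. It is an illustration under stated assumptions, not the script's implementation: intervals are half-open, peaks on the same chromosome do not overlap each other, and the `chrom:start-end` peak naming and function names are hypothetical.

```python
import bisect
from collections import Counter, defaultdict


def build_peak_index(peaks):
    """Index (chrom, start, end) peaks as sorted start/end lists per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        index[chrom] = ([s for s, _ in intervals], [e for _, e in intervals])
    return index


def long_format_counts(reads, index):
    """Count reads per (peak, barcode).

    `reads` is an iterable of (chrom, start, end, barcode) tuples.
    Because peaks are assumed non-overlapping, their ends are sorted too,
    so a binary search finds the first peak ending after the read start.
    """
    counts = Counter()
    for chrom, start, end, barcode in reads:
        if chrom not in index:
            continue
        starts, ends = index[chrom]
        i = bisect.bisect_right(ends, start)
        while i < len(starts) and starts[i] < end:
            counts[(f"{chrom}:{starts[i]}-{ends[i]}", barcode)] += 1
            i += 1
    return counts
```

Each key/value pair corresponds to one row of a long-format table (peak, barcode, count), which is far smaller than a dense peak-by-cell matrix because most entries are zero.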
Purpose: Process and analyze snATAC-seq data using peaks for clustering.
Command:
Rscript 03_Seurat_snATAC_peaks_Harmony_reducePCs_all_samples_final.r
Key Steps:
Outputs:
atac_obj_peak_based_final.rds
This pipeline provides a structured, reproducible workflow for processing single-nucleus ATAC-seq data, with quality control at each step. It integrates tools such as Cell Ranger ATAC, Seurat, Signac, and AMULET for data preprocessing, clustering, and analysis. For support, visit the GitHub Repository.