a. Information on genome build
b. Basic processing to obtain alignment, read filtering, barcode counts, and UMI counts
c. Ambient RNA correction
d. Doublet detection
a. Genotype checks to match donor identification and samples (optional)
b. Quality control for read files using tools such as FastQC and MultiQC
c. Distinction between empty and likely-true droplets using tools such as EmptyDrops (optional)
d. Filtering cells based on mitochondrial read rates
e. Filtering cells based on distribution of gene counts and UMI counts
f. Number of cells per sample
g. Barcode rank plots
h. Filtering clusters based on doublet identification and doublet rates
a. Information on covariates included in integration model
b. Information on cell annotation approaches

| Requirement | Implementation |
|---|---|
| Genome build | Human genome GRCh38.p13, GENCODE v39 annotation |
| Basic processing to obtain alignment, read filtering, barcode counts, and UMI counts | STARsolo, custom scripts<br>https://github.com/PanKbase/snRNAseq-NextFlow |
| Ambient RNA correction | CellBender<br>CellBender was run twice: once with default parameters and once with modified settings |
| Doublet detection | DoubletFinder<br>https://github.com/PanKbase/Multiome-Doublet-Detection-NextFlow<br>DoubletFinder was run twice; the second run excluded all doublets detected in the first round |
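
The two-round doublet detection described in the table can be sketched as follows. This is only an illustration of the control flow, not the pipeline's code: `two_round_doublets`, the `Detector` signature, and `toy_detector` are hypothetical names, the toy rule is not DoubletFinder, and taking the union of both rounds as the final call set is an assumption.

```python
from statistics import median
from typing import Callable, Dict, Set

# Hypothetical detector signature: takes {barcode: UMI count} and returns the
# set of barcodes called doublets. It stands in for one DoubletFinder run.
Detector = Callable[[Dict[str, int]], Set[str]]


def two_round_doublets(cells: Dict[str, int], detect: Detector) -> Set[str]:
    """Run a detector twice; the second run excludes first-round doublets."""
    first = detect(cells)
    remaining = {bc: n for bc, n in cells.items() if bc not in first}
    second = detect(remaining)
    # Assumption: the final doublet set is the union of both rounds.
    return first | second


def toy_detector(cells: Dict[str, int]) -> Set[str]:
    """Toy rule (NOT DoubletFinder): flag cells above 2x the median UMI count."""
    cutoff = 2 * median(cells.values())
    return {bc for bc, n in cells.items() if n > cutoff}


cells = {"AAAC": 900, "AAAG": 1000, "AAAT": 1100,
         "AACC": 2300, "AACA": 5000, "AACG": 6000}
doublets = two_round_doublets(cells, toy_detector)
```

In this toy input, the first round flags only the two most extreme barcodes; once they are removed, the median drops and the second round also flags `AACC`, which is the reason for re-running on the cleaned set.
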

| Requirement | Implementation |
|---|---|
| Genotype checks | mbv tool and manual search |
| Read file checks | FastQC; remove read files with flow cell issues (indicated by Per Tile Quality plots) or low-quality reads (indicated by Quality Score plots) |
| Distinction between empty and likely-true droplets | - EmptyDrops: FDR < 0.005<br>- CellBender:<br> - Cell probability > 0.99<br> - Fraction of ambient reads < a dynamic threshold determined per sample using the Multi-Otsu thresholding algorithm |
| Filtering cells based on mitochondrial read rates | Retain cells with a fraction of mitochondrial reads < a dynamic threshold determined per sample using the Multi-Otsu thresholding algorithm |
| Filtering cells based on distribution of gene counts and UMI counts | After integration, test whether each cluster differs significantly in its gene-count and UMI-count profiles using the Wilcoxon rank-sum test. A cluster is considered significantly different if its adjusted p-value is < 0.05 and its fold change is > 2. |
| Number of cells per sample | Retain samples with > 200 cells that pass the EmptyDrops, CellBender, and mitochondrial-read QC criteria. |
| Barcode rank plots | Obtain using custom scripts<br>https://github.com/PanKbase/PanKbase-scRNA-seq/blob/main/1_preprocessing/2_barcode_qc.ipynb |
| Filtering clusters based on doublet identification and doublet rates | - Remove doublets and doublet-enriched clusters, defined as clusters with a doublet rate > 65%.<br>- Remove doublet-like cells, which exhibit high UMI counts and express markers from at least two cell populations. |
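
The fixed cutoffs in the table above can be combined into a small filtering sketch. The numeric constants come from the table; everything else is assumed for illustration: the function names, the per-cell field names (`emptydrops_fdr`, `cellbender_prob`, `ambient_frac`, `mito_frac`), and the choice to pass the per-sample dynamic thresholds in as plain numbers rather than compute them with Multi-Otsu.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Cutoffs taken from the table above.
EMPTYDROPS_FDR_MAX = 0.005
CELLBENDER_PROB_MIN = 0.99
MIN_CELLS_PER_SAMPLE = 200
DOUBLET_CLUSTER_RATE_MAX = 0.65


def passes_cell_qc(cell: Dict[str, float], ambient_max: float,
                   mito_max: float) -> bool:
    """Per-cell filters. ambient_max and mito_max are per-sample dynamic
    thresholds supplied by the caller (field names are hypothetical)."""
    return (
        cell["emptydrops_fdr"] < EMPTYDROPS_FDR_MAX
        and cell["cellbender_prob"] > CELLBENDER_PROB_MIN
        and cell["ambient_frac"] < ambient_max
        and cell["mito_frac"] < mito_max
    )


def samples_to_keep(passing_cells: Dict[str, int]) -> Set[str]:
    """Keep samples with more than 200 QC-passing cells."""
    return {s for s, n in passing_cells.items() if n > MIN_CELLS_PER_SAMPLE}


def doublet_enriched_clusters(clusters: List[str],
                              is_doublet: List[bool]) -> Set[str]:
    """Return clusters whose doublet rate exceeds 65%."""
    total: Dict[str, int] = defaultdict(int)
    doublets: Dict[str, int] = defaultdict(int)
    for cluster, flag in zip(clusters, is_doublet):
        total[cluster] += 1
        doublets[cluster] += int(flag)
    return {c for c in total
            if doublets[c] / total[c] > DOUBLET_CLUSTER_RATE_MAX}
```

For example, `samples_to_keep({"s1": 200, "s2": 201})` keeps only `s2`, because the criterion is strictly greater than 200 cells.
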

| Requirement | Implementation |
|---|---|
| Integration model | Harmony, correcting for the following covariates: sex, BMI, age, study, treatment, chemistry, and tissue source. |
| Cell annotation approach | Annotate using known marker genes:<br>- Beta: 'INS', 'IAPP'<br>- Alpha: 'GCG'<br>- Delta: 'SST'<br>- Gamma: 'PPY'<br>- Epsilon: 'GHRL'<br>- Ductal: 'KRT19'<br>- Acinar: 'REG1A', 'CTRB2', 'PRSS1', 'PRSS2', 'CPA1'<br>- Active Stellate: 'PDGFRB', 'COL6A1'<br>- Quiescent Stellate: 'PDGFRB', 'COL6A1', 'RGS5'<br>- Endothelial: 'PECAM1', 'PLVAP', 'ESAM', 'VWF'<br>- Immune: 'PTPRC'<br>- Cycling Alpha: 'GCG', 'MKI67', 'CDK1' |
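
As a rough illustration of marker-based annotation, the marker lists above can be stored as a lookup table and matched against per-cluster mean expression. The marker genes are taken from the table; the decision rule is an assumption made for this sketch (a label matches when all of its markers reach a hypothetical `min_expr` cutoff, and the label with the most markers wins), and `annotate_cluster` is a hypothetical helper, not the pipeline's method.

```python
from typing import Dict, List, Optional

# Marker genes as listed in the table above.
MARKERS: Dict[str, List[str]] = {
    "Beta": ["INS", "IAPP"],
    "Alpha": ["GCG"],
    "Delta": ["SST"],
    "Gamma": ["PPY"],
    "Epsilon": ["GHRL"],
    "Ductal": ["KRT19"],
    "Acinar": ["REG1A", "CTRB2", "PRSS1", "PRSS2", "CPA1"],
    "Active Stellate": ["PDGFRB", "COL6A1"],
    "Quiescent Stellate": ["PDGFRB", "COL6A1", "RGS5"],
    "Endothelial": ["PECAM1", "PLVAP", "ESAM", "VWF"],
    "Immune": ["PTPRC"],
    "Cycling Alpha": ["GCG", "MKI67", "CDK1"],
}


def annotate_cluster(mean_expr: Dict[str, float],
                     min_expr: float = 1.0) -> Optional[str]:
    """Return the most specific label whose markers are all expressed.

    Assumed rule: every marker of a label must have mean expression of at
    least min_expr (a hypothetical cutoff). Among matching labels, the one
    with the most markers is chosen; ties go to the label listed first.
    """
    matches = [
        label for label, genes in MARKERS.items()
        if all(mean_expr.get(gene, 0.0) >= min_expr for gene in genes)
    ]
    if not matches:
        return None
    return max(matches, key=lambda label: len(MARKERS[label]))
```

Preferring the label with the most markers is what separates overlapping definitions here: a cluster expressing `GCG`, `MKI67`, and `CDK1` is labeled Cycling Alpha rather than Alpha, and one expressing `RGS5` alongside `PDGFRB` and `COL6A1` is labeled Quiescent Stellate rather than Active Stellate.
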