Ha TH Vu, Han Sun, Seth Sharp, Marcela Brissova, Kyle Gaulton, Anna L Gloyn, Stephen CJ Parker
| Cell type | Number of cells |
|---|---|
| Alpha | 225637 |
| Beta | 160063 |
| Acinar | 54489 |
| Ductal | 47221 |
| Active Stellate | 28747 |
| Delta | 17292 |
| Gamma + Epsilon | 13163 |
| Endothelial | 4031 |
| MUC5B+ Ductal | 3828 |
| Quiescent Stellate | 3663 |
| Cycling Alpha | 3646 |
| Immune (Macrophage) | 3608 |
scRNA-seq data and metadata generated from islet tissues using the 10X Chromium platform were downloaded from https://hpap.pmacs.upenn.edu/ (accessed March 2024) [1], [2], [3], [4].
Literature searches were conducted in order to identify original studies involving scRNA-seq generated using islets distributed by IIDP and Prodo. Literature searches were conducted in order to identify original studies involving scRNA-seq generated using islets distributed by IIDP and Prodo. Briefly, pancreatic islet datasets of expression profiling by high throughput sequencing were identified via GEO search for the date range before July 2024, then studies with scRNA-seq data were determined manually. Datasets were then downloaded from the studies' corresponding GEO page using the fastq-dump command from the SRA toolkit (v3.1.1) [5]. Data from the following GEO accession IDs were used: GSE142465 [6], GSE159556 [7], GSE183568 [8], GSE190447 [9], GSE201256 [10], GSE217837 [11], GSE221156 [12], GSE251730 [13], GSE251912 [14].
For each sample, DoubletFinder (v2.0.3) [26] was used on the RNA ambient-corrected gene count matrix. Next, we jointly clustered libraries using Seurat (v4.4.0) [27] with doublets included, and then filtered out any cells that were marked as doublets or were in clusters with doublet rates > 37.5%.
All data integration was performed using Seurat. In particular, due to the large number of samples, we first combined all samples per source (data generated using HPAP, IIDP, and Prodo samples) to create three Seurat objects. For each object, we then retained a) protein-coding genes that were in ≥ 0.01% of the cells, and b) cells that expressed ≥ 5% of the protein-coding genes. Subsequently, we merged all objects and carried out integration using Harmony (v1.2.0) [28], correcting for the following covariates: samples, studies, donor identifier, tissue sources, and chemistry. As a result, we obtained a set of 665,256 cells expressing 18,704 protein-coding genes.
Next, we adopted an iterative approach to further clean up the data:
This work used data acquired from the database (https://hpap.pmacs.upenn.edu/) of the Human Pancreas Analysis Program (HPAP; RRID:SCR_016202; PMID: 31127054; PMID: 36206763; PMID: 35228745 and PMID: 37188822). HPAP is part of a Human Islet Research Network (RRID:SCR_014393) consortium (UC4-DK112217, U01-DK123594, UC4-DK112232, and U01-DK123716).
Additional islet samples and tissues were provided by the Integrated Islet Distribution Program (IIDP) (RRID:SCR_014387), funded by NIH Grant #2UC4DK098085.
[1] S. N. Shapira, A. Naji, M. A. Atkinson, A. C. Powers, and K. H. Kaestner, "Understanding islet dysfunction in type 2 diabetes through multidimensional pancreatic phenotyping: The Human Pancreas Analysis Program," Cell Metab., vol. 34, no. 12, pp. 1906–1913, Dec. 2022, doi: 10.1016/j.cmet.2022.09.013.
[2] K. H. Kaestner, A. C. Powers, A. Naji, HPAP Consortium, and M. A. Atkinson, "NIH Initiative to Improve Understanding of the Pancreas, Islet, and Autoimmunity in Type 1 Diabetes: The Human Pancreas Analysis Program (HPAP)," Diabetes, vol. 68, no. 7, pp. 1394–1402, Jul. 2019, doi: 10.2337/db19-0058.
[3] A. R. Patil, J. Schug, A. Naji, K. H. Kaestner, R. B. Faryabi, and G. Vahedi, "Single-cell expression profiling of islets generated by the Human Pancreas Analysis Program," Nat. Metab., vol. 5, no. 5, pp. 713–715, May 2023, doi: 10.1038/s42255-023-00806-x.
[4] M. Fasolino et al., "Single-cell multi-omics analysis of human pancreatic islets reveals novel cellular states in type 1 diabetes," Nat. Metab., vol. 4, no. 2, pp. 284–299, Feb. 2022, doi: 10.1038/s42255-022-00531-x.
[5] SRA Toolkit Development Team, SRA Toolkit. [Online]. Available: https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software
[6] B. Marquina-Sanchez et al., "Single-cell RNA-seq with spike-in cells enables accurate quantification of cell-specific drug effects in pancreatic islets," Genome Biol., vol. 21, no. 1, p. 106, Dec. 2020, doi: 10.1186/s13059-020-02006-2.
[7] X. Tang et al., "SARS-CoV-2 infection induces beta cell transdifferentiation," Cell Metab., vol. 33, no. 8, pp. 1577-1591.e7, Aug. 2021, doi: 10.1016/j.cmet.2021.05.015.
[8] S. Shrestha et al., "Combinatorial transcription factor profiles predict mature and functional human islet α and β cells," JCI Insight, vol. 6, no. 18, p. e151621, Sep. 2021, doi: 10.1172/jci.insight.151621.
[9] G. Basile et al., "Excess pancreatic elastase alters acinar-β cell communication by impairing the mechano-signaling and the PAR2 pathways," Cell Metab., vol. 35, no. 7, pp. 1242-1260.e9, Jul. 2023, doi: 10.1016/j.cmet.2023.05.007.
[10] W. Xu et al., "Architecture of androgen receptor pathways amplifying glucagon-like peptide-1 insulinotropic action in male pancreatic β cells," Cell Rep., vol. 42, no. 5, p. 112529, May 2023, doi: 10.1016/j.celrep.2023.112529.
[11] R. B. Kang et al., "Single-nucleus RNA sequencing of human pancreatic islets identifies novel gene sets and distinguishes β-cell subpopulations with dynamic transcriptome profiles," Genome Med., vol. 15, no. 1, p. 30, May 2023, doi: 10.1186/s13073-023-01179-2.
[12] K. Bandesh et al., "Single-cell decoding of human islet cell type-specific alterations in type 2 diabetes reveals converging genetic- and state-driven β-cell gene expression defects," Jan. 22, 2025, Genomics. doi: 10.1101/2025.01.17.633590.
[13] J. S. Stancill, M. Y. Kasmani, W. Cui, and J. A. Corbett, "Single Cell RNAseq Analysis of Cytokine-Treated Human Islets: Association of Cellular Stress with Impaired Cytokine Responsiveness," Function, vol. 5, no. 4, p. zqae015, Jul. 2024, doi: 10.1093/function/zqae015.
[14] E. K. Sokolowski et al., "Multi-omic human pancreatic islet endoplasmic reticulum and cytokine stress response mapping provides type 2 diabetes genetic insights," Cell Metab., vol. 36, no. 11, pp. 2468-2488.e7, Nov. 2024, doi: 10.1016/j.cmet.2024.09.006.
[15] S. Andrews, FastQC. [Online]. Available: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
[16] P. Ewels, M. Magnusson, S. Lundin, and M. Käller, "MultiQC: summarize analysis results for multiple tools and samples in a single report," Bioinformatics, vol. 32, no. 19, pp. 3047–3048, Oct. 2016, doi: 10.1093/bioinformatics/btw354.
[17] B. Kaminow, D. Yunusov, and A. Dobin, "STARsolo: accurate, fast and versatile mapping/quantification of single-cell and single-nucleus RNA-seq data," May 05, 2021, Bioinformatics. doi: 10.1101/2021.05.05.442755.
[18] H. Li et al., "The Sequence Alignment/Map format and SAMtools," Bioinformatics, vol. 25, no. 16, pp. 2078–2079, Aug. 2009, doi: 10.1093/bioinformatics/btp352.
[19] A. Fort et al., "MBV : a method to solve sample mislabeling and detect technical bias in large combined genotype and sequencing assay datasets," Bioinformatics, vol. 33, no. 12, pp. 1895–1897, Jun. 2017, doi: 10.1093/bioinformatics/btx074.
[20] O. Delaneau, H. Ongen, A. A. Brown, A. Fort, N. I. Panousis, and E. T. Dermitzakis, "A complete tool set for molecular QTL discovery and analysis," Nat. Commun., vol. 8, no. 1, p. 15452, May 2017, doi: 10.1038/ncomms15452.
[21] 10x Genomics, "Quality Assessment Using the Cell Ranger Web Summary." [Online]. Available: https://www.10xgenomics.com/analysis-guides/quality-assessment-using-the-cell-ranger-web-summary
[22] A. T. L. Lun et al., "EmptyDrops: distinguishing cells from empty droplets in droplet-based single-cell RNA sequencing data," Genome Biol., vol. 20, no. 1, p. 63, Dec. 2019, doi: 10.1186/s13059-019-1662-y.
[23] Abraham. Savitzky and M. J. E. Golay, "Smoothing and Differentiation of Data by Simplified Least Squares Procedures.," Anal. Chem., vol. 36, no. 8, pp. 1627–1639, Jul. 1964, doi: 10.1021/ac60214a047.
[24] S. J. Fleming et al., "Unsupervised removal of systematic background noise from droplet-based single-cell experiments using CellBender," Nat. Methods, vol. 20, no. 9, pp. 1323–1335, Sep. 2023, doi: 10.1038/s41592-023-01943-7.
[25] J. G. Aaron Lun, DropletUtils. (2018). Bioconductor. doi: 10.18129/B9.BIOC.DROPLETUTILS.
[26] C. S. McGinnis, L. M. Murrow, and Z. J. Gartner, "DoubletFinder: Doublet Detection in Single-Cell RNA Sequencing Data Using Artificial Nearest Neighbors," Cell Syst., vol. 8, no. 4, pp. 329-337.e4, Apr. 2019, doi: 10.1016/j.cels.2019.03.003.
[27] Y. Hao et al., "Integrated analysis of multimodal single-cell data," Cell, vol. 184, no. 13, pp. 3573-3587.e29, Jun. 2021, doi: 10.1016/j.cell.2021.04.048.
[28] I. Korsunsky et al., "Fast, sensitive and accurate integration of single-cell data with Harmony," Nat. Methods, vol. 16, no. 12, pp. 1289–1296, Dec. 2019, doi: 10.1038/s41592-019-0619-0.
Visit our Citation Policies for details on how to cite this dataset. Please cite all underlying resources whose data were used to generate the composite dataset.