File Format Specifications – PanKbase

This document summarizes the required formats for the different file types hosted by PanKbase.


Sequencing Data


Sequence Alignments


Normalized Genomic Signal


Gene Quantifications

  • Format: Tab-delimited text
  • Description: Contains gene- or transcript-level counts and normalized expression values (e.g., from bulk RNA-seq).
  • Specification: GTEx Gene Quantifications Format
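
A minimal Python sketch of loading such a tab-delimited file, for illustration only. The column names gene_id, count and tpm, and the function read_quantifications, are hypothetical placeholders; the GTEx Gene Quantifications Format defines the actual columns and header layout.

```python
import csv
import io

# Hypothetical column names; consult the GTEx Gene Quantifications Format for the real ones.
REQUIRED_COLUMNS = ("gene_id", "count", "tpm")

def read_quantifications(handle):
    """Parse a tab-delimited quantification table into {gene_id: (count, tpm)}."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    quant = {}
    for row in reader:
        quant[row["gene_id"]] = (float(row["count"]), float(row["tpm"]))
    return quant

# GENE_A / GENE_B are placeholder identifiers.
example = "gene_id\tcount\ttpm\nGENE_A\t120\t35.2\nGENE_B\t0\t0.0\n"
quant = read_quantifications(io.StringIO(example))
print(quant["GENE_A"])  # (120.0, 35.2)
```

Checking for required columns up front makes a malformed upload fail with a clear message instead of a KeyError partway through the file.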

Gene Count Matrix

  • Format: Not specified
  • Description: Matrix of gene expression values across multiple samples or cells.
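
Because no on-disk format is specified here, the following Python sketch only illustrates the in-memory structure: per-sample gene counts combined into a genes-by-samples matrix. The function name build_count_matrix and the zero-fill for genes absent from a sample are assumptions made for the example.

```python
def build_count_matrix(per_sample):
    """Combine {sample: {gene: count}} into (genes, samples, rows).

    rows[i][j] is the count for genes[i] in samples[j]; genes absent
    from a sample are filled with 0 (an assumption for this sketch).
    """
    samples = sorted(per_sample)
    genes = sorted({g for counts in per_sample.values() for g in counts})
    rows = [[per_sample[s].get(g, 0) for s in samples] for g in genes]
    return genes, samples, rows

# Placeholder sample and gene identifiers.
genes, samples, rows = build_count_matrix({
    "sample_1": {"GENE_A": 120, "GENE_B": 3},
    "sample_2": {"GENE_A": 98},
})
```

Sorting genes and samples gives every matrix built from the same inputs an identical row and column order, which makes matrices easier to compare or merge.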

QTL Summary Statistics

  • Format: Tab-delimited text
  • Description: Results from QTL mapping (e.g., eQTL, caQTL).
  • Specification: To be added
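
Until the specification is added, the Python sketch below shows one common operation on tab-delimited QTL results: keeping the lead (smallest p-value) variant for each molecular feature. The columns feature_id, variant_id and pvalue, and the function lead_variants, are hypothetical and do not reflect a PanKbase schema.

```python
import csv
import io

def lead_variants(handle):
    """Return {feature_id: (variant_id, pvalue)}, keeping the smallest p-value per feature.

    Column names feature_id, variant_id and pvalue are hypothetical.
    """
    best = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        p = float(row["pvalue"])
        feature = row["feature_id"]
        if feature not in best or p < best[feature][1]:
            best[feature] = (row["variant_id"], p)
    return best

# Placeholder features and variants.
example = (
    "feature_id\tvariant_id\tpvalue\n"
    "GENE_A\tvar1\t1e-4\n"
    "GENE_A\tvar2\t3e-9\n"
    "GENE_B\tvar3\t0.02\n"
)
leads = lead_variants(io.StringIO(example))
```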

Genetic Association Summary Statistics

  • Format: Tab-delimited text
  • Description: Results from GWAS or other genetic association analyses.
  • Specification: To be added
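
As an illustration only, pending the specification: a Python sketch that extracts variants passing the conventional genome-wide significance threshold of 5e-8 from a tab-delimited summary-statistics file. The columns variant_id and pvalue and the function significant_hits are hypothetical.

```python
import csv
import io

GENOME_WIDE = 5e-8  # conventional genome-wide significance threshold

def significant_hits(handle, threshold=GENOME_WIDE):
    """Return [(variant_id, pvalue)] below threshold, sorted by p-value.

    Column names variant_id and pvalue are hypothetical.
    """
    hits = []
    for row in csv.DictReader(handle, delimiter="\t"):
        p = float(row["pvalue"])
        if p < threshold:
            hits.append((row["variant_id"], p))
    return sorted(hits, key=lambda hit: hit[1])

# Placeholder variants.
example = (
    "variant_id\tpvalue\n"
    "var1\t2e-10\n"
    "var2\t1e-6\n"
    "var3\t4e-8\n"
)
hits = significant_hits(io.StringIO(example))
```

Real summary-statistics files may contain missing p-values (e.g., "NA"), which this sketch does not handle.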

Gene Sets

  • Format: .gmt (Gene Matrix Transposed)
  • Description: Lists of genes grouped by pathway, function, or disease association.
  • Specification: GMT Format – Broad GSEA Wiki
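
In a GMT file, each line describes one gene set as tab-separated fields: the set name, a description, and then the member genes, so lines can differ in length. A minimal Python sketch of a reader follows; the function name read_gmt and the decision to silently skip lines with fewer than three fields are illustrative choices, not part of the format.

```python
import io

def read_gmt(handle):
    """Parse GMT text into {set_name: (description, [genes])}."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines (a choice made for this sketch)
        name, description, *genes = fields
        gene_sets[name] = (description, [g for g in genes if g])
    return gene_sets

# Placeholder set names and genes.
example = "SET_1\tna\tGENE_A\tGENE_B\nSET_2\tpathway description\tGENE_C\n"
sets = read_gmt(io.StringIO(example))
```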