Primary Cell Schema and Standards

This document describes the recommended metadata standards in PanKbase for reporting pancreatic primary cell biosamples from human donors.

These recommendations are based on standards used by Human Islet Resource Network (HIRN) resources, including the Integrated Islet Distribution Program (IIDP) and the Human Pancreas Analysis Program (HPAP), as well as the University of Alberta IsletCore. In addition, these standards are consistent with those outlined in Hart N, Powers A, Diabetologia 2018.


Schema URL: https://data.pankbase.org/profiles/primary_cell

The Primary Cell data model captures information about cells directly harvested from donors, such as fibroblasts or immune cells. This model is designed to track the source, characteristics, and modifications of primary cells in the PanKbase Database.


Required Fields

The following fields must be included for a valid submission:

  • award: Grant associated with the submission (link to Award)
  • donors: Donor(s) the sample was derived from (link to Donor)
  • lab: Lab associated with the submission (link to Lab)
  • sample_terms: Ontology terms identifying the biosample
  • sources: The originating lab(s) or vendor(s)
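
A pre-submission check for the required fields above can be sketched in a few lines of Python. This is a minimal illustration, assuming a record is held as a plain dictionary keyed by field name; the helper name and the placeholder values are hypothetical and are not part of PanKbase.

```python
# Required field names are copied from the list above.
REQUIRED_FIELDS = ("award", "donors", "lab", "sample_terms", "sources")


def missing_required_fields(record):
    """Return the required field names that are absent or empty in a record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Hypothetical record: every value is a placeholder, not a real PanKbase identifier.
example = {
    "award": "example-award",
    "lab": "example-lab",
    "donors": ["example-donor"],
    "sample_terms": ["example-term"],
}

print(missing_required_fields(example))  # prints ['sources']
```

The server-side schema remains the authority on validity; a check like this only catches omissions before upload.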

Sample Metadata

  • description: A plain text description of the object
  • lower_bound_age / upper_bound_age: Lower and upper bounds of the age of the organism at the time of collection
  • age_units: Units of time associated with age (minute, hour, day, week, month, year)
  • disease_terms: Ontology terms of diseases associated with the biosample
  • date_obtained: The date the sample was harvested (YYYY-MM-DD format)
  • embryonic: Boolean indicating if the biosample is embryonic
  • virtual: Boolean indicating if the sample represents a hypothetical entity
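
The format constraints in this group (the YYYY-MM-DD date, the listed age units, and the boolean flags) could be checked locally along the following lines. This is a sketch under the assumption that a record is a plain dictionary; the function name and message wording are hypothetical, and the schema itself may enforce additional rules not shown here.

```python
import re
from datetime import datetime

# Allowed values copied from the age_units entry above.
AGE_UNITS = {"minute", "hour", "day", "week", "month", "year"}


def check_sample_metadata(record):
    """Return a list of problems found in a record (hypothetical helper)."""
    problems = []

    date_obtained = record.get("date_obtained")
    if date_obtained is not None:
        # Require zero-padded YYYY-MM-DD first, then confirm it is a real calendar date.
        if not (isinstance(date_obtained, str)
                and re.fullmatch(r"\d{4}-\d{2}-\d{2}", date_obtained)):
            problems.append("date_obtained is not in YYYY-MM-DD format")
        else:
            try:
                datetime.strptime(date_obtained, "%Y-%m-%d")
            except ValueError:
                problems.append("date_obtained is not a valid calendar date")

    age_units = record.get("age_units")
    if age_units is not None and age_units not in AGE_UNITS:
        problems.append(f"age_units {age_units!r} is not a listed unit")

    for name in ("embryonic", "virtual"):
        if name in record and not isinstance(record[name], bool):
            problems.append(f"{name} must be a boolean")

    return problems
```

For example, `check_sample_metadata({"date_obtained": "2024-02-30"})` reports one problem, because the string has the right shape but is not a real date.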

Sample Relationships

  • part_of: Links to a larger biosample from which this sample was taken
  • originated_from: Links to source biosample (for modified samples)
  • pooled_from: The biosamples this sample is pooled from (minimum 2)
  • sorted_from: Links to source sample for sorted fractions
  • sorted_from_detail: Details about the sorting process
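
The minimum of two entries for pooled_from is easy to overlook. A small guard might look like the following, assuming pooled_from is submitted as a list of sample identifiers; the helper name is hypothetical.

```python
def pooled_from_problem(record):
    """Return a message if pooled_from is present but lists fewer than 2 samples."""
    pooled_from = record.get("pooled_from")
    if pooled_from is None:
        return None  # the field is optional, so absence is not a problem
    if len(pooled_from) < 2:
        return "pooled_from must list at least 2 biosamples"
    return None
```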

Processing and Quantification

  • starting_amount: The initial quantity of samples obtained
  • starting_amount_units: Units for quantifying samples (cells, g, mg, etc.)
  • passage_number: Number of passages including those from the source
  • cellular_sub_pool: Cellular sub-pool fraction of the sample
  • treatments: List of treatments applied to the biosample
  • modifications: Links to modifications applied to this biosample
  • biomarkers: Biological markers associated with this sample

Vector Introduction

  • construct_library_sets: Construct library sets introduced to this sample
  • moi: Multiplicity of infection for introduced vectors
  • nucleic_acid_delivery: Method of introducing nucleic acid into the cell
  • time_post_library_delivery: Time elapsed after construct library introduction
  • time_post_library_delivery_units: Units for time post library delivery

Reference Information

  • protocols: Links to protocols on Protocols.io
  • lot_id: Lot identifier provided by the originating lab or vendor
  • product_id: Product identifier from the originating lab or vendor
  • documents: Documents providing additional information
  • publication_identifiers: Publication identifiers with more information
  • url: External resource with additional information

Identifiers and References

  • accession: Unique identifier (prefixed with PKB, assigned by server)
  • aliases: Lab-specific identifiers
  • dbxrefs: External resource identifiers (admin-only)

Derived and Computed Fields

The following fields are computed or derived and should not be submitted:

  • sex: Sex of the sample (derived from donor information)
  • age: Age of organism at collection time (computed from bounds)
  • upper_bound_age_in_hours / lower_bound_age_in_hours: Age in hours (computed)
  • taxa: Species of the organism (derived from donor)
  • classifications: General category of sample type
  • summary: Auto-generated summary of the sample
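
When a record has been downloaded, edited, and is about to be resubmitted, the computed fields listed above need to be removed first. One possible approach, assuming the record is a plain dictionary (the helper name is hypothetical):

```python
# Computed or derived field names copied from the list above.
COMPUTED_FIELDS = {
    "sex",
    "age",
    "upper_bound_age_in_hours",
    "lower_bound_age_in_hours",
    "taxa",
    "classifications",
    "summary",
}


def strip_computed_fields(record):
    """Return a copy of the record without computed or derived fields."""
    return {key: value for key, value in record.items() if key not in COMPUTED_FIELDS}
```

The function returns a new dictionary and leaves the original record untouched, so the downloaded copy can still be compared against the submission.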

Related Objects

Fields that track relationships to other objects:

  • file_sets: File sets linked to this sample
  • multiplexed_in: Multiplexed samples including this sample
  • sorted_fractions: Fractions into which this sample has been sorted
  • origin_of: Samples which originate from this sample
  • parts: Parts into which this sample has been divided
  • pooled_in: Pooled samples including this sample
  • institutional_certificates: Certificates approving sample use

Administrator-Only Fields

The following fields are for administrative purposes only:

  • status: Status of the metadata object
  • release_timestamp: Date the object was released
  • schema_version: JSON schema version (current: 19)
  • uuid: Unique identifier
  • collections: Data collections (for DACC use only)
  • creation_timestamp: Creation date
  • submitted_by: User who submitted the object
  • notes: DACC internal notes
  • revoke_detail: Explanation for revoked status

Submission Guidelines

  • Ensure all required fields are completed
  • Include comprehensive metadata for better data discoverability
  • Follow proper formatting for identifiers (RRIDs, accessions, etc.)
  • Link to appropriate ontology terms for sample and disease terms
  • Include relationships to donor and derived samples when applicable
  • Provide detailed information about cell processing when available

Schema Dependencies

Several fields have dependencies requiring additional fields when specified:

  • When specifying lot_id, you must also include product_id
  • When specifying sorted_from, you must include sorted_from_detail (and vice versa)
  • When specifying starting_amount, you must include starting_amount_units (and vice versa)
  • Age fields (lower_bound_age, upper_bound_age, and age_units) must be specified together
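
The dependency rules above can be expressed as a short local check. The sketch below assumes a record is a plain dictionary and treats a field as "specified" when its key is present; the function name and messages are hypothetical, and the rules are taken only from the list above.

```python
# Pairs that must appear together (each requires the other).
MUTUAL_PAIRS = (
    ("sorted_from", "sorted_from_detail"),
    ("starting_amount", "starting_amount_units"),
)

# Age fields that must all be given if any one of them is given.
AGE_FIELDS = ("lower_bound_age", "upper_bound_age", "age_units")


def dependency_problems(record):
    """Return a list of dependency violations found in a record."""
    problems = []

    # One-way dependency: lot_id requires product_id, but not the reverse.
    if "lot_id" in record and "product_id" not in record:
        problems.append("lot_id requires product_id")

    for first, second in MUTUAL_PAIRS:
        if (first in record) != (second in record):
            problems.append(f"{first} and {second} must be specified together")

    present = [name for name in AGE_FIELDS if name in record]
    if present and len(present) != len(AGE_FIELDS):
        problems.append("lower_bound_age, upper_bound_age, and age_units must be specified together")

    return problems


print(dependency_problems({"lot_id": "L-1", "starting_amount": 5}))
```

For the record shown, the check reports two problems: the missing product_id and the missing starting_amount_units.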