PanKbase — Primary Cell Schema and Standards

This document describes the recommended metadata standards in PanKbase for reporting pancreatic primary cell biosamples from human donors.

These recommendations are based on standards used by Human Islet Resource Network (HIRN) resources, including the Integrated Islet Distribution Program (IIDP) and the Human Pancreas Analysis Program (HPAP), as well as the University of Alberta IsletCore. In addition, these standards are consistent with those outlined in Hart N, Powers A. Diabetologia 2018.

Schema URL: https://data.pankbase.org/profiles/primary_cell

The Primary Cell data model captures information about cells directly harvested from donors, such as fibroblasts or immune cells. This model is designed to track the source, characteristics, and modifications of primary cells in the PanKbase Database.

Required Fields

The following fields must be included for a valid submission:

award: Grant associated with the submission (link to Award)
donors: Donor(s) the sample was derived from (link to Donor)
lab: Lab associated with the submission (link to Lab)
sample_terms: Ontology terms identifying the biosample
sources: The originating lab(s) or vendor(s)
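
As a minimal sketch, a submitter could check a record for the required fields before sending it. The helper name missing_required_fields and the example paths are hypothetical placeholders; only the five field names come from the list above.

```python
# Required field names, taken from the list above.
REQUIRED_FIELDS = ("award", "donors", "lab", "sample_terms", "sources")


def missing_required_fields(record):
    """Return the required field names that are absent or empty in record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Hypothetical record: the paths are placeholders, not real PanKbase objects.
record = {
    "award": "/awards/example-award/",
    "lab": "/labs/example-lab/",
    "donors": ["/donors/example-donor/"],
    "sample_terms": ["/sample-terms/example-term/"],
}

print(missing_required_fields(record))  # ['sources']
```

An empty list means all required fields are present; it does not confirm that the linked objects exist on the server.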

Sample Metadata

description: A plain text description of the object
lower_bound_age / upper_bound_age: Lower and upper bounds on the age of the organism at the time of collection
age_units: Units of time associated with age (minute, hour, day, week, month, year)
disease_terms: Ontology terms of diseases associated with the biosample
date_obtained: The date the sample was harvested (YYYY-MM-DD format)
embryonic: Boolean indicating if the biosample is embryonic
virtual: Boolean indicating if the sample represents a hypothetical entity
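
The format constraints on date_obtained and age_units can be checked locally before submission. The following is a sketch under the assumption that the server accepts only zero-padded YYYY-MM-DD dates and the six units listed above; the function names are hypothetical.

```python
from datetime import datetime

# Allowed values for age_units, as listed above.
AGE_UNITS = {"minute", "hour", "day", "week", "month", "year"}


def is_valid_date_obtained(value):
    """Return True if value is a real calendar date written as YYYY-MM-DD."""
    try:
        parsed = datetime.strptime(value, "%Y-%m-%d")
    except (TypeError, ValueError):
        return False
    # Round-trip to reject non-zero-padded input such as "2023-7-14".
    return parsed.strftime("%Y-%m-%d") == value


def is_valid_age_units(value):
    """Return True if value is one of the listed age units."""
    return value in AGE_UNITS


print(is_valid_date_obtained("2023-07-14"))  # True
print(is_valid_date_obtained("14/07/2023"))  # False
print(is_valid_age_units("year"))            # True
```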

Sample Relationships

part_of: Links to a larger biosample from which this sample was taken
originated_from: Links to source biosample (for modified samples)
pooled_from: The biosamples this sample is pooled from (minimum 2)
sorted_from: Links to source sample for sorted fractions
sorted_from_detail: Details about the sorting process

Processing and Quantification

starting_amount: The initial quantity of samples obtained
starting_amount_units: Units for quantifying samples (cells, g, mg, etc.)
passage_number: Number of passages including those from the source
cellular_sub_pool: Cellular sub-pool fraction of the sample
treatments: List of treatments applied to the biosample
modifications: Links to modifications applied to this biosample
biomarkers: Biological markers associated with this sample

Vector Introduction

construct_library_sets: Construct library sets introduced to this sample
moi: Multiplicity of infection for introduced vectors
nucleic_acid_delivery: Method of introducing nucleic acid into the cell
time_post_library_delivery: Time elapsed after construct library introduction
time_post_library_delivery_units: Units for time post library delivery

Reference Information

protocols: Links to protocols on Protocols.io
lot_id: Lot identifier provided by the originating lab or vendor
product_id: Product identifier from the originating lab or vendor
documents: Documents providing additional information
publication_identifiers: Publication identifiers with more information
url: External resource with additional information

Identifiers and References

accession: Unique identifier (prefixed with PKB, assigned by server)
aliases: Lab-specific identifiers
dbxrefs: External resource identifiers (admin-only)

Derived and Computed Fields

The following fields are computed or derived and should not be submitted:

sex: Sex of the sample (derived from donor information)
age: Age of organism at collection time (computed from bounds)
upper_bound_age_in_hours / lower_bound_age_in_hours: Age in hours (computed)
taxa: Species of the organism (derived from donor)
classifications: General category of sample type
summary: Auto-generated summary of the sample
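
When a record is fetched from the server, edited, and resubmitted, the computed fields listed above may still be present in it. One possible way to drop them first is sketched below; the helper name strip_computed_fields and the example values are hypothetical.

```python
# Computed or derived field names, taken from the list above.
COMPUTED_FIELDS = {
    "sex",
    "age",
    "upper_bound_age_in_hours",
    "lower_bound_age_in_hours",
    "taxa",
    "classifications",
    "summary",
}


def strip_computed_fields(record):
    """Return a copy of record without computed or derived fields."""
    return {key: value for key, value in record.items() if key not in COMPUTED_FIELDS}


# Hypothetical fetched record with illustrative values.
fetched = {
    "description": "Example primary cell sample",
    "sex": "female",
    "summary": "auto-generated text",
}

print(strip_computed_fields(fetched))  # {'description': 'Example primary cell sample'}
```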

Related Objects

Fields that track relationships to other objects:

file_sets: File sets linked to this sample
multiplexed_in: Multiplexed samples including this sample
sorted_fractions: Fractions into which this sample has been sorted
origin_of: Samples which originate from this sample
parts: Parts into which this sample has been divided
pooled_in: Pooled samples including this sample
institutional_certificates: Certificates approving sample use

Administrator-Only Fields

The following fields are for administrative purposes only:

status: Status of the metadata object
release_timestamp: Date the object was released
schema_version: JSON schema version (current: 19)
uuid: Unique identifier
collections: Data collections (for DACC use only)
creation_timestamp: Creation date
submitted_by: User who submitted the object
notes: DACC internal notes
revoke_detail: Explanation for revoked status

Submission Guidelines

Ensure all required fields are completed
Include comprehensive metadata for better data discoverability
Follow proper formatting for identifiers (RRIDs, accessions, etc.)
Link to appropriate ontology terms for sample and disease terms
Include relationships to donor and derived samples when applicable
Provide detailed information about cell processing when available

Schema Dependencies

Several fields have dependencies requiring additional fields when specified:

When specifying lot_id, you must also include product_id
When specifying sorted_from, you must include sorted_from_detail (and vice versa)
When specifying starting_amount, you must include starting_amount_units (and vice versa)
Age fields (lower_bound_age, upper_bound_age, and age_units) must be specified together
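
The dependency rules above can be expressed as a small pre-submission check. This is a sketch, not the server's validation logic: the function name dependency_errors and the error messages are hypothetical, and a field counts as specified whenever its key is present in the record.

```python
# Pairs where each field requires the other.
MUTUAL_PAIRS = (
    ("sorted_from", "sorted_from_detail"),
    ("starting_amount", "starting_amount_units"),
)

# Fields that must all be present or all be absent.
AGE_FIELDS = ("lower_bound_age", "upper_bound_age", "age_units")


def dependency_errors(record):
    """Return a list of messages for violated field dependencies."""
    errors = []

    # One-way dependency: lot_id requires product_id, but not the reverse.
    if "lot_id" in record and "product_id" not in record:
        errors.append("lot_id requires product_id")

    # Two-way dependencies: exactly one field of a pair present is an error.
    for first, second in MUTUAL_PAIRS:
        if (first in record) != (second in record):
            errors.append(f"{first} and {second} must be specified together")

    # Age fields: a partial set is an error.
    present = [name for name in AGE_FIELDS if name in record]
    if present and len(present) != len(AGE_FIELDS):
        errors.append("age fields must be specified together")

    return errors


print(dependency_errors({"lot_id": "LOT-1"}))      # ['lot_id requires product_id']
print(dependency_errors({"product_id": "PROD-1"}))  # []
```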