2. Analysis Set Metadata
a) Intermediate Analysis Set
b) Principal Analysis Set
Overview
An Analysis Set represents the results of a computational analysis of raw genomic data or of other analyses. Analysis Sets can be either intermediate (part of a larger analysis chain) or principal (final interpretable results).
Required Fields
The following fields are required when creating an Analysis Set:
- award: Grant associated with the submission
- file_set_type: The level of this analysis set ("intermediate analysis" or "principal analysis")
- lab: Lab associated with the submission
Additionally, input_file_sets is required if the file_set_type is "principal analysis".
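The required-field rules above can be sketched as a small pre-submission check. This is a minimal illustration, not the portal's own validation: the function name missing_required_fields and the placeholder values in the example payload (award, lab, and input file set paths) are assumptions made for this example.

```python
# Fields that are always required, taken from the list above.
REQUIRED_FIELDS = ("award", "file_set_type", "lab")


def missing_required_fields(payload):
    """Return a sorted list of required fields that are absent or empty."""
    missing = [name for name in REQUIRED_FIELDS if not payload.get(name)]
    # input_file_sets becomes required only for a principal analysis.
    is_principal = payload.get("file_set_type") == "principal analysis"
    if is_principal and not payload.get("input_file_sets"):
        missing.append("input_file_sets")
    return sorted(missing)


# Hypothetical payload; every value here is a placeholder, not a real object.
example_payload = {
    "award": "/awards/example-award/",
    "lab": "/labs/example-lab/",
    "file_set_type": "principal analysis",
    "input_file_sets": ["/measurement-sets/example-measurement-set/"],
}

print(missing_required_fields(example_payload))
```

An empty list means the payload carries every field this section marks as required; it says nothing about whether the values themselves are valid.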
Important Rules
- Mutually Exclusive Fields: Specification of samples is mutually exclusive with specification of donors.
- Status Requirements: release_timestamp is required if status is "released", "archived", or "revoked".
- Principal Analysis Requirement: If file_set_type is "principal analysis", then input_file_sets must be specified.
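The three rules above can be expressed as one function that reports every violation it finds. This is a sketch under stated assumptions: the function name rule_violations and the wording of the messages are invented for illustration, and the object is assumed to be a plain dictionary keyed by the field names used in this document.

```python
# Statuses that require release_timestamp, per the Status Requirements rule.
STATUSES_NEEDING_RELEASE_TIMESTAMP = {"released", "archived", "revoked"}


def rule_violations(obj):
    """Return a list of human-readable rule violations (empty if none)."""
    problems = []
    # Mutually Exclusive Fields: samples and donors cannot both be specified.
    if obj.get("samples") and obj.get("donors"):
        problems.append("samples and donors are mutually exclusive")
    # Status Requirements: certain statuses need a release_timestamp.
    status = obj.get("status")
    if status in STATUSES_NEEDING_RELEASE_TIMESTAMP and not obj.get("release_timestamp"):
        problems.append("release_timestamp is required for status " + status)
    # Principal Analysis Requirement: input_file_sets must be present.
    if obj.get("file_set_type") == "principal analysis" and not obj.get("input_file_sets"):
        problems.append("input_file_sets is required for principal analysis")
    return problems
```

Collecting all violations instead of stopping at the first one lets a submitter correct an object in a single pass.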
Field Descriptions
Basic Information
- accession: A unique identifier prefixed with PKB (server-assigned)
- aliases: Lab-specific identifiers to reference an object (Format: lab-name:identifier)
- description: A plain text description of the object
- file_set_type: The level of this analysis set ("intermediate analysis" or "principal analysis")
- status: The status of the metadata object (admin-only, default: "in progress")
- submitter_comment: Additional information from submitter
Analysis Information
- input_file_sets: File set(s) required for this analysis (required for principal analysis)
- samples: Sample(s) associated with this file set (mutually exclusive with donors)
- donors: Donors of the samples (not directly submittable)
- assay_titles: Titles of assays that produced data analyzed (not directly submittable)
Related Data
- files: Files associated with this file set (not directly submittable)
- control_for: File sets for which this is a control (not directly submittable)
- input_file_set_for: File sets that use this as input (not directly submittable)
References
- dbxrefs: External resource identifiers (Pattern: ^GEO:GSE\d+$)
- documents: Documents providing additional information (links to Document)
- publication_identifiers: Publication identifiers (various formats accepted)
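The dbxrefs pattern listed above can be checked locally before submission. The sketch below uses only the pattern given in this section; the helper name invalid_dbxrefs is an assumption made for illustration. It uses re.fullmatch so that a trailing newline is rejected, which is slightly stricter than Python's default handling of `$`.

```python
import re

# Pattern copied from the dbxrefs field description above.
DBXREF_PATTERN = re.compile(r"^GEO:GSE\d+$")


def invalid_dbxrefs(dbxrefs):
    """Return the entries that do not match the documented dbxrefs pattern."""
    return [ref for ref in dbxrefs if not DBXREF_PATTERN.fullmatch(ref)]
```

For example, "GEO:GSE12345" passes, while "GEO:GSM1", "geo:GSE1", and "GEO:GSE" are all returned as invalid.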
Administrative Fields
- award: Grant associated with the submission (links to Award)
- lab: Lab associated with the submission (links to Lab)
- uuid: Unique identifier for the object (server-assigned)
- collections: Data collections the samples are part of (admin-only)
- schema_version: JSON schema version (default: "7")
- alternate_accessions: Previous accessions for merged objects (admin-only)
- creation_timestamp: Object creation date (server-assigned)
- release_timestamp: Object release date (admin-only)
- submitted_by: User who submitted the object (server-assigned)
- submitted_files_timestamp: Timestamp of first file creation (not submittable)
- notes: DACC internal notes (admin-only)
- revoke_detail: Explanation for revoked status (admin-only)
Types of Analysis Sets
Intermediate Analysis
- Processed data which are not the final results of an experiment
- May not be interpretable on their own
- Part of a larger analysis pipeline
Principal Analysis
- Processed data which are the final results of an experiment
- Results can be interpretable on their own
- Requires specification of input_file_sets
Relationship with Measurement Sets
Analysis Sets often process data from Measurement Sets. The connection is made through the input_file_sets field, which can reference Measurement Sets or other Analysis Sets that served as input for the current analysis.
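Because input_file_sets can point at other Analysis Sets, following it repeatedly traces an analysis back to the file sets it ultimately depends on. The sketch below illustrates that traversal over an in-memory dictionary. The function name upstream_file_sets, the dictionary layout, and the identifiers in the example are all hypothetical; real objects would be referenced by their own identifiers.

```python
def upstream_file_sets(start, file_sets):
    """Return sorted identifiers of every file set reachable from `start`
    by following input_file_sets links."""
    seen = set()
    stack = list(file_sets[start].get("input_file_sets", []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        # File sets missing from the dictionary are treated as having no inputs.
        stack.extend(file_sets.get(current, {}).get("input_file_sets", []))
    return sorted(seen)


# Hypothetical chain: a principal analysis built on an intermediate analysis,
# which in turn was built on two measurement sets.
example_file_sets = {
    "principal-1": {"input_file_sets": ["intermediate-1"]},
    "intermediate-1": {"input_file_sets": ["measurement-a", "measurement-b"]},
    "measurement-a": {},
    "measurement-b": {},
}

print(upstream_file_sets("principal-1", example_file_sets))
```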