Expanding the 'organism' table

Mara Kim-2
Hello everyone,

I would like to propose some extensions to the organism table:
specifically, adding strain and version fields and extending the
current unique constraint to incorporate those additional columns.
The goal is to generate a unique organism ID for each strain/version.
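The sketch below shows why organism_c1 has to grow along with the new columns. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. This is an assumption: SQLite cannot drop a constraint in place, so each variant builds a fresh table, and character varying becomes TEXT. The helper accepts_two_strains and the two yeast strain names are illustrative only.

```python
import sqlite3


def accepts_two_strains(unique_cols):
    """Hypothetical helper: build an in-memory organism table whose
    organism_c1 covers unique_cols, then report whether two strains of
    the same species can both be stored."""
    conn = sqlite3.connect(":memory:")
    # strain/version already exist here, i.e. the state after ADD COLUMN
    # but with whichever unique constraint we want to test.
    conn.execute(
        "CREATE TABLE organism ("
        " organism_id INTEGER PRIMARY KEY,"
        " genus TEXT NOT NULL,"
        " species TEXT NOT NULL,"
        " strain TEXT DEFAULT '',"
        " version TEXT DEFAULT '',"
        f" CONSTRAINT organism_c1 UNIQUE ({', '.join(unique_cols)}))"
    )
    rows = [("Saccharomyces", "cerevisiae", "S288C"),
            ("Saccharomyces", "cerevisiae", "RM11-1a")]
    try:
        conn.executemany(
            "INSERT INTO organism (genus, species, strain) VALUES (?, ?, ?)",
            rows,
        )
        return True
    except sqlite3.IntegrityError:
        # The second strain collides with the first under this constraint.
        return False
    finally:
        conn.close()


print(accepts_two_strains(["genus", "species"]))                      # False
print(accepts_two_strains(["genus", "species", "strain", "version"]))  # True
```

Under the current (genus, species) key the second strain is rejected; only the widened key lets each strain get its own row and organism_id.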

The proposed SQL for an in-place change is:
ALTER TABLE organism
  ADD COLUMN strain character varying(255) DEFAULT ''::character varying,
  ADD COLUMN version character varying(255) DEFAULT ''::character varying,
  DROP CONSTRAINT organism_c1,
  ADD CONSTRAINT organism_c1 UNIQUE (genus, species, strain, version);

Or for a clean install:
CREATE TABLE organism (
    organism_id integer NOT NULL,
    abbreviation character varying(255),
    genus character varying(255) NOT NULL,
    species character varying(255) NOT NULL,
    strain character varying(255) DEFAULT ''::character varying,
    version character varying(255) DEFAULT ''::character varying,
    common_name character varying(255),
    comment text,
    CONSTRAINT organism_c1 UNIQUE (genus, species, strain, version),
    CONSTRAINT organism_pkey PRIMARY KEY (organism_id)
);
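The following sketch is a quick sanity check of the clean-install definition. It again assumes SQLite as a stand-in for PostgreSQL, and add_organism and the E. coli strain labels are hypothetical. It shows three things: each strain gets a distinct ID, a repeated key is rejected, and NULLs need special care.

```python
import sqlite3

# SQLite approximation of the clean-install table (varchar -> TEXT).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE organism (
        organism_id INTEGER PRIMARY KEY,
        abbreviation TEXT,
        genus TEXT NOT NULL,
        species TEXT NOT NULL,
        strain TEXT DEFAULT '',
        version TEXT DEFAULT '',
        common_name TEXT,
        comment TEXT,
        CONSTRAINT organism_c1 UNIQUE (genus, species, strain, version)
    )
""")


def add_organism(genus, species, strain):
    """Insert one row (version falls back to its '' default); return its id."""
    cur = conn.execute(
        "INSERT INTO organism (genus, species, strain) VALUES (?, ?, ?)",
        (genus, species, strain),
    )
    return cur.lastrowid


# Two strains of one species receive distinct organism_ids.
a = add_organism("Escherichia", "coli", "MG1655")
b = add_organism("Escherichia", "coli", "EDL933")
print(a != b)  # True

# Re-inserting an existing (genus, species, strain, version) key fails.
try:
    add_organism("Escherichia", "coli", "MG1655")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True

# Caveat: the '' default applies only when the column is omitted. Explicit
# NULLs count as distinct under UNIQUE (in SQLite, and in PostgreSQL by
# default), so both of these rows are accepted.
add_organism("Homo", "sapiens", None)
add_organism("Homo", "sapiens", None)
null_rows = conn.execute(
    "SELECT COUNT(*) FROM organism WHERE genus = 'Homo'").fetchone()[0]
print(null_rows)  # 2
```

Declaring strain and version NOT NULL (while keeping the '' default) would close that gap. This is a possible refinement, not part of the proposal as written.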

The strain field is meant to capture sequenced genomes that belong to
the same species but have distinct sequences. This is necessary
because there are often multiple published genomes from different
sequencing centers that used slightly different strains. As these are
distinct biological replicates, the resulting annotations (genes)
should be treated as distinct from one another. In response to the
inevitable suggestion to use the 'stock' table: I see stocks as an
even finer gradation, where you might catalog samples in a specific
experiment without sequencing their genomes. You would still want to
know which strain a stock came from, so that its genome can be used as
the reference.

The version field is necessary to support annotations built on
different versions of a reference genome. For example, while it would
be nice to maintain only one copy of the human genome, published
datasets use a wide range of revisions of the human genome as their
reference.
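The sketch below shows how the version column might be used. It assumes SQLite in place of PostgreSQL and a hypothetical organism_id_for lookup helper. Each human reference revision gets its own organism row, so features loaded against GRCh37 and GRCh38 can point at different organism_ids. The revision labels are example values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE organism (
        organism_id INTEGER PRIMARY KEY,
        genus TEXT NOT NULL,
        species TEXT NOT NULL,
        strain TEXT DEFAULT '',
        version TEXT DEFAULT '',
        CONSTRAINT organism_c1 UNIQUE (genus, species, strain, version)
    )
""")
# One organism row per human reference revision (illustrative labels).
conn.executemany(
    "INSERT INTO organism (genus, species, version) VALUES (?, ?, ?)",
    [("Homo", "sapiens", "GRCh37"), ("Homo", "sapiens", "GRCh38")],
)


def organism_id_for(genus, species, version, strain=""):
    """Hypothetical helper: resolve the organism_id for one reference revision."""
    row = conn.execute(
        "SELECT organism_id FROM organism"
        " WHERE genus = ? AND species = ? AND strain = ? AND version = ?",
        (genus, species, strain, version),
    ).fetchone()
    return row[0] if row else None


hg37 = organism_id_for("Homo", "sapiens", "GRCh37")
hg38 = organism_id_for("Homo", "sapiens", "GRCh38")
print(hg37, hg38)  # 1 2
print(organism_id_for("Homo", "sapiens", "NCBI36"))  # None (never loaded)
```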

Ideally, there wouldn't be a version field in the organism table at
all. Instead, versions would live in a separate 'organism_version'
table. However, since sequence features are all specific to a given
version of the genome, that would require replacing the 'organism_id'
foreign key in the feature table with 'organism_version_id', which is
a much more serious change.
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN

Gmod-schema mailing list