Strain and stock tables in Chado

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

Strain and stock tables in Chado

Scott Cain
Hi Dave, other Harvard folks and all,

We are making a hard push to get a new version of Chado out, and one of the things we're trying to reconcile is the strain tables that FlyBase contributed versus the way, well, just about everybody else is storing similar information.  I think we exchanged emails about this a few years ago, but I'd like to get this sorted.  

Before I go into the strain tables, I want to mention that one of the changes we want to incorporate into the organism table is a infraspecific_name field, which would store strain, cultivar and subspecies names.  The reason we want to add this to the organism table is so that features and other things that have organism as a foreign key can be associated directly with the subspecies that it belongs to.  We'll make the default value in the infraspecific_name field be an empty string which should make it backward compatible with software the queries the organism table and as long as loading software doesn't need to deal with the infraspecific_name field, it will be backward compatible with loading software too.

Now, about storing strains and the like: when the Natural Diversity paper was published, we advocated for treating strains, cultivars, and crosses as stocks, and that has largely worked (I'm sure plant people could point out where there are short comings, and we're working on those too), where the stock module has several tables that are analogous to the strain tables you committed, or will with the next release of Chado (like the feature_stock table, or is it stock_feature, I can't remember).

Anyway, what I'd really like from you, especially since you've now had a few years of using your strain tables, is whether there is something "missing" about the stock table approach that you think would be better captured using the strain tables.  My concern is that having both the strain and stock tables represent a sort of denormalization, where the same data could be stored in two places.  That makes the DBA inside of me queasy. 

Thanks much for your feedback.
Scott





--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

------------------------------------------------------------------------------
One dashboard for servers and applications across Physical-Virtual-Cloud
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Strain and stock tables in Chado

David Emmert
Hi Scott,

Good to hear from you and we're glad to give feedback, but need to talk it over here first.   One question: would organism.infraspecific_name be part of the unique key on organism?

Best,

-D

On Thu, May 14, 2015 at 3:01 PM, Scott Cain <[hidden email]> wrote:
Hi Dave, other Harvard folks and all,

We are making a hard push to get a new version of Chado out, and one of the things we're trying to reconcile is the strain tables that FlyBase contributed versus the way, well, just about everybody else is storing similar information.  I think we exchanged emails about this a few years ago, but I'd like to get this sorted.  

Before I go into the strain tables, I want to mention that one of the changes we want to incorporate into the organism table is a infraspecific_name field, which would store strain, cultivar and subspecies names.  The reason we want to add this to the organism table is so that features and other things that have organism as a foreign key can be associated directly with the subspecies that it belongs to.  We'll make the default value in the infraspecific_name field be an empty string which should make it backward compatible with software the queries the organism table and as long as loading software doesn't need to deal with the infraspecific_name field, it will be backward compatible with loading software too.

Now, about storing strains and the like: when the Natural Diversity paper was published, we advocated for treating strains, cultivars, and crosses as stocks, and that has largely worked (I'm sure plant people could point out where there are short comings, and we're working on those too), where the stock module has several tables that are analogous to the strain tables you committed, or will with the next release of Chado (like the feature_stock table, or is it stock_feature, I can't remember).

Anyway, what I'd really like from you, especially since you've now had a few years of using your strain tables, is whether there is something "missing" about the stock table approach that you think would be better captured using the strain tables.  My concern is that having both the strain and stock tables represent a sort of denormalization, where the same data could be stored in two places.  That makes the DBA inside of me queasy. 

Thanks much for your feedback.
Scott





--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     <a href="tel:216-392-3087" value="+12163923087" target="_blank">216-392-3087
Ontario Institute for Cancer Research


------------------------------------------------------------------------------
One dashboard for servers and applications across Physical-Virtual-Cloud
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Strain and stock tables in Chado

Scott Cain
Hi Dave,

Yes, it would have to be.



On Fri, May 15, 2015 at 9:23 AM, David Emmert <[hidden email]> wrote:
Hi Scott,

Good to hear from you and we're glad to give feedback, but need to talk it over here first.   One question: would organism.infraspecific_name be part of the unique key on organism?

Best,

-D

On Thu, May 14, 2015 at 3:01 PM, Scott Cain <[hidden email]> wrote:
Hi Dave, other Harvard folks and all,

We are making a hard push to get a new version of Chado out, and one of the things we're trying to reconcile is the strain tables that FlyBase contributed versus the way, well, just about everybody else is storing similar information.  I think we exchanged emails about this a few years ago, but I'd like to get this sorted.  

Before I go into the strain tables, I want to mention that one of the changes we want to incorporate into the organism table is a infraspecific_name field, which would store strain, cultivar and subspecies names.  The reason we want to add this to the organism table is so that features and other things that have organism as a foreign key can be associated directly with the subspecies that it belongs to.  We'll make the default value in the infraspecific_name field be an empty string which should make it backward compatible with software the queries the organism table and as long as loading software doesn't need to deal with the infraspecific_name field, it will be backward compatible with loading software too.

Now, about storing strains and the like: when the Natural Diversity paper was published, we advocated for treating strains, cultivars, and crosses as stocks, and that has largely worked (I'm sure plant people could point out where there are short comings, and we're working on those too), where the stock module has several tables that are analogous to the strain tables you committed, or will with the next release of Chado (like the feature_stock table, or is it stock_feature, I can't remember).

Anyway, what I'd really like from you, especially since you've now had a few years of using your strain tables, is whether there is something "missing" about the stock table approach that you think would be better captured using the strain tables.  My concern is that having both the strain and stock tables represent a sort of denormalization, where the same data could be stored in two places.  That makes the DBA inside of me queasy. 

Thanks much for your feedback.
Scott





--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     <a href="tel:216-392-3087" value="+12163923087" target="_blank">216-392-3087
Ontario Institute for Cancer Research




--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

------------------------------------------------------------------------------
One dashboard for servers and applications across Physical-Virtual-Cloud
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Strain and stock tables in Chado

David Emmert
Hey Scott,

Anyway, what I'd really like from you, especially since you've now had a few years of using your strain tables, is whether there is something "missing" about the stock table approach that you think would be better captured using the strain tables.

I don't think there's anything wrong with the stock table approach for managing strains, and in fact after talking over your note with other FlyBase developers here at Harvard, and noting that using the stock module for managing strain data is pretty much the consensus among other chado users, I think we've agreed that FlyBase should probably bite the bullet and re-implement our strain data in the stock module. 

I can't say when this will happen, as we're really busy with other things at the moment, but its on our todo list.   We've tried all along to keep our chado implementation as similar as possible to the GMOD consensus, and its been four or five years since we've gone back and reconciled our chado instance with the official GMOD schema, so its high time we do this...

One thing, though: it looks like we would probably need to add a few tables to the stock module, to capture some attributes of strains that the stock module couldn't handle at the mo, but I assume we can work out details once we start the reimplementation in earnest.   At first glance, it doesn't look like any of the new tables would cause any backward compatibility issues.

My concern is that having both the strain and stock tables represent a sort of denormalization, where the same data could be stored in two places.  That makes the DBA inside of me queasy.

Yep!  Makes my inner DBA queasy too!    I totally agree that such denormalization should be avoided in the chado schema.   But having said that, isn't the proposed organism.infraspecific_name field (the notion of which I really like) not also a denormalization, since presumably the same name would be used for the strain in the stock module?

Best,

-Dave

On Fri, May 15, 2015 at 9:53 AM, Scott Cain <[hidden email]> wrote:
Hi Dave,

Yes, it would have to be.



On Fri, May 15, 2015 at 9:23 AM, David Emmert <[hidden email]> wrote:
Hi Scott,

Good to hear from you and we're glad to give feedback, but need to talk it over here first.   One question: would organism.infraspecific_name be part of the unique key on organism?

Best,

-D

On Thu, May 14, 2015 at 3:01 PM, Scott Cain <[hidden email]> wrote:
Hi Dave, other Harvard folks and all,

We are making a hard push to get a new version of Chado out, and one of the things we're trying to reconcile is the strain tables that FlyBase contributed versus the way, well, just about everybody else is storing similar information.  I think we exchanged emails about this a few years ago, but I'd like to get this sorted.  

Before I go into the strain tables, I want to mention that one of the changes we want to incorporate into the organism table is a infraspecific_name field, which would store strain, cultivar and subspecies names.  The reason we want to add this to the organism table is so that features and other things that have organism as a foreign key can be associated directly with the subspecies that it belongs to.  We'll make the default value in the infraspecific_name field be an empty string which should make it backward compatible with software the queries the organism table and as long as loading software doesn't need to deal with the infraspecific_name field, it will be backward compatible with loading software too.

Now, about storing strains and the like: when the Natural Diversity paper was published, we advocated for treating strains, cultivars, and crosses as stocks, and that has largely worked (I'm sure plant people could point out where there are short comings, and we're working on those too), where the stock module has several tables that are analogous to the strain tables you committed, or will with the next release of Chado (like the feature_stock table, or is it stock_feature, I can't remember).

Anyway, what I'd really like from you, especially since you've now had a few years of using your strain tables, is whether there is something "missing" about the stock table approach that you think would be better captured using the strain tables.  My concern is that having both the strain and stock tables represent a sort of denormalization, where the same data could be stored in two places.  That makes the DBA inside of me queasy. 

Thanks much for your feedback.
Scott





--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     <a href="tel:216-392-3087" value="+12163923087" target="_blank">216-392-3087
Ontario Institute for Cancer Research




--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     <a href="tel:216-392-3087" value="+12163923087" target="_blank">216-392-3087
Ontario Institute for Cancer Research


------------------------------------------------------------------------------
One dashboard for servers and applications across Physical-Virtual-Cloud
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Strain and stock tables in Chado

Scott Cain
Hi Dave,

Hmm, I think I'd have to answer that last question with a hesitant "probably, I think, but" :-)  My gut feeling is that the main thing that the orgainism table is for is relating to features, and for the most part, there will be relatively few strains for which there will be sequences and features (that is relatively few compared to the number of entries in the stock table).  This does point to something that is going have to be cleared up at the very least via documentation.  We certainly don't want a 1:1 correspondence between entries in the stock table and entries in the organism table.

Scott


On Fri, May 15, 2015 at 4:02 PM, David Emmert <[hidden email]> wrote:
Hey Scott,

Anyway, what I'd really like from you, especially since you've now had a few years of using your strain tables, is whether there is something "missing" about the stock table approach that you think would be better captured using the strain tables.

I don't think there's anything wrong with the stock table approach for managing strains, and in fact after talking over your note with other FlyBase developers here at Harvard, and noting that using the stock module for managing strain data is pretty much the consensus among other chado users, I think we've agreed that FlyBase should probably bite the bullet and re-implement our strain data in the stock module. 

I can't say when this will happen, as we're really busy with other things at the moment, but its on our todo list.   We've tried all along to keep our chado implementation as similar as possible to the GMOD consensus, and its been four or five years since we've gone back and reconciled our chado instance with the official GMOD schema, so its high time we do this...

One thing, though: it looks like we would probably need to add a few tables to the stock module, to capture some attributes of strains that the stock module couldn't handle at the mo, but I assume we can work out details once we start the reimplementation in earnest.   At first glance, it doesn't look like any of the new tables would cause any backward compatibility issues.

My concern is that having both the strain and stock tables represent a sort of denormalization, where the same data could be stored in two places.  That makes the DBA inside of me queasy.

Yep!  Makes my inner DBA queasy too!    I totally agree that such denormalization should be avoided in the chado schema.   But having said that, isn't the proposed organism.infraspecific_name field (the notion of which I really like) not also a denormalization, since presumably the same name would be used for the strain in the stock module?

Best,

-Dave

On Fri, May 15, 2015 at 9:53 AM, Scott Cain <[hidden email]> wrote:
Hi Dave,

Yes, it would have to be.



On Fri, May 15, 2015 at 9:23 AM, David Emmert <[hidden email]> wrote:
Hi Scott,

Good to hear from you and we're glad to give feedback, but need to talk it over here first.   One question: would organism.infraspecific_name be part of the unique key on organism?

Best,

-D

On Thu, May 14, 2015 at 3:01 PM, Scott Cain <[hidden email]> wrote:
Hi Dave, other Harvard folks and all,

We are making a hard push to get a new version of Chado out, and one of the things we're trying to reconcile is the strain tables that FlyBase contributed versus the way, well, just about everybody else is storing similar information.  I think we exchanged emails about this a few years ago, but I'd like to get this sorted.  

Before I go into the strain tables, I want to mention that one of the changes we want to incorporate into the organism table is a infraspecific_name field, which would store strain, cultivar and subspecies names.  The reason we want to add this to the organism table is so that features and other things that have organism as a foreign key can be associated directly with the subspecies that it belongs to.  We'll make the default value in the infraspecific_name field be an empty string which should make it backward compatible with software the queries the organism table and as long as loading software doesn't need to deal with the infraspecific_name field, it will be backward compatible with loading software too.

Now, about storing strains and the like: when the Natural Diversity paper was published, we advocated for treating strains, cultivars, and crosses as stocks, and that has largely worked (I'm sure plant people could point out where there are short comings, and we're working on those too), where the stock module has several tables that are analogous to the strain tables you committed, or will with the next release of Chado (like the feature_stock table, or is it stock_feature, I can't remember).

Anyway, what I'd really like from you, especially since you've now had a few years of using your strain tables, is whether there is something "missing" about the stock table approach that you think would be better captured using the strain tables.  My concern is that having both the strain and stock tables represent a sort of denormalization, where the same data could be stored in two places.  That makes the DBA inside of me queasy. 

Thanks much for your feedback.
Scott





--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     <a href="tel:216-392-3087" value="+12163923087" target="_blank">216-392-3087
Ontario Institute for Cancer Research




--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     <a href="tel:216-392-3087" value="+12163923087" target="_blank">216-392-3087
Ontario Institute for Cancer Research




--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

------------------------------------------------------------------------------
One dashboard for servers and applications across Physical-Virtual-Cloud
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Strain and stock tables in Chado

Karl O. Pinc
FYI.

Babase is highly individual/demography centric.  

We pretty much ignore organism.  

Stocks represent physical samples, many of which
are from a single individual baboon taken in different
ways at different times.  

We relate both sequencing of individuals
and stock (samples) back to individuals using dbxrefs.



Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

------------------------------------------------------------------------------
One dashboard for servers and applications across Physical-Virtual-Cloud
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Strain and stock tables in Chado

Siddhartha Basu
In reply to this post by Scott Cain
Hi,

On Thu, 14 May 2015, Scott Cain wrote:

>    Hi Dave, other Harvard folks and all,
>    We are making a hard push to get a new version of Chado out, and one of
>    the things we're trying to reconcile is the strain tables that FlyBase
>    contributed versus the way, well, just about everybody else is storing
>    similar information.A  I think we exchanged emails about this a few years
>    ago, but I'd like to get this sorted. A
>    Before I go into the strain tables, I want to mention that one of the
>    changes we want to incorporate into the organism table is a
>    infraspecific_name field, which would store strain, cultivar and
>    subspecies names.A  The reason we want to add this to the organism table
>    is so that features and other things that have organism as a foreign key
>    can be associated directly with the subspecies that it belongs to.A  We'll
>    make the default value in the infraspecific_name field be an empty string
>    which should make it backward compatible with software the queries the
>    organism table and as long as loading software doesn't need to deal with
>    the infraspecific_name field, it will be backward compatible with loading
>    software too.
This will be a welcoming change at least for our use case at dictyBase
as we will be able to store our strain name without appending it to the
species name, couldn't be more happier than that.
Just wonder, how the unique field will work, will genus,species and
infraspecific_name now becomes the unique field with this change.


>    Now, about storing strains and the like: when the Natural Diversity paper
>    was published, we advocated for treating strains, cultivars, and crosses
>    as stocks, and that has largely worked (I'm sure plant people could point
>    out where there are short comings, and we're working on those too), where
>    the stock module has several tables that are analogous to the strain
>    tables you committed, or will with the next release of Chado (like the
>    feature_stock table, or is it stock_feature, I can't remember).
>    Anyway, what I'd really like from you, especially since you've now had a
>    few years of using your strain tables, is whether there is something
>    "missing" about the stock table approach that you think would be better
>    captured using the strain tables.A  My concern is that having both the
>    strain and stock tables represent a sort of denormalization, where the
>    same data could be stored in two places.A  That makes the DBA inside of me
>    queasy.A
Here is the previous discussion about strain tables
http://generic-model-organism-system-database.450254.n5.nabble.com/Opening-up-comments-on-changes-to-Chado-tp5711717.html
As i said earlier, it would be nice to have some use cases where these
new tables might come handly and hopefully we will get the good of both
worlds.

thanks,
-siddhartha

>    Thanks much for your feedback.
>    Scott
>    --
>    ------------------------------------------------------------------------
>    Scott Cain, Ph. D.A  A  A  A  A  A  A  A  A  A  A  A  A  A  A  A  A A
>    scott at scottcain dot net
>    GMOD Coordinator (http://gmod.org/)A  A  A  A  A  A  A  A  A  A A
>    216-392-3087
>    Ontario Institute for Cancer Research

> ------------------------------------------------------------------------------
> One dashboard for servers and applications across Physical-Virtual-Cloud
> Widest out-of-the-box monitoring support with 50+ applications
> Performance metrics, stats and reports that give you Actionable Insights
> Deep dive visibility with transaction tracing using APM Insight.
> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y

> _______________________________________________
> Gmod-schema mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/gmod-schema


------------------------------------------------------------------------------
One dashboard for servers and applications across Physical-Virtual-Cloud
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Strain and stock tables in Chado

Stephen Ficklin-3
Hi Siddhartha,

In answer to your question about the unique constraint on the proposed organism table.  Here is the proposed DDL:

CREATE TABLE organism (
   organism_id serial NOT NULL,
   primary key (organism_id),
   abbreviation varchar(255) NULL,
   genus varchar(255) NOT NULL,
   species varchar(255) NOT NULL,
   common_name varchar(255) NULL,
   infraspecific_name varchar(1024) NULL,
   type_id int default NULL,
   comment text NULL,
   FOREIGN KEY (type_id) REFERENCES cvterm (cvterm_id) ON DELETE CASCADE,
   CONSTRAINT organism_c1 unique (genus,species, type_id, infraspecific_name)
);

There are two new fields:  type_id and infraspecific name.  The type_id is a foreign key on the cvterm table. It is used for terms like subspecies, variety, subvariety, forma and subforma.  For example, to store the name   "Acanthocalycium klimpelianum var. macranthum" the pieces would go as follows:

organism.genus = Acanthocalycium
organism.species =
klimpelianum
organism.type_id =
cvterm that has a value of 'variety'
organism.infraspecific_name =
macranthum

The unique constraint will be expanded to include both the type_id and infraspecific name but both can be NULL.

Stephen


On 15-05-18 12:11 PM, Siddhartha Basu wrote:
Hi,

On Thu, 14 May 2015, Scott Cain wrote:

   Hi Dave, other Harvard folks and all,
   We are making a hard push to get a new version of Chado out, and one of
   the things we're trying to reconcile is the strain tables that FlyBase
   contributed versus the way, well, just about everybody else is storing
   similar information.A  I think we exchanged emails about this a few years
   ago, but I'd like to get this sorted. A 
   Before I go into the strain tables, I want to mention that one of the
   changes we want to incorporate into the organism table is a
   infraspecific_name field, which would store strain, cultivar and
   subspecies names.A  The reason we want to add this to the organism table
   is so that features and other things that have organism as a foreign key
   can be associated directly with the subspecies that it belongs to.A  We'll
   make the default value in the infraspecific_name field be an empty string
   which should make it backward compatible with software the queries the
   organism table and as long as loading software doesn't need to deal with
   the infraspecific_name field, it will be backward compatible with loading
   software too.
This will be a welcoming change at least for our use case at dictyBase
as we will be able to store our strain name without appending it to the
species name, couldn't be more happier than that.
Just wonder, how the unique field will work, will genus,species and
infraspecific_name now becomes the unique field with this change.


   Now, about storing strains and the like: when the Natural Diversity paper
   was published, we advocated for treating strains, cultivars, and crosses
   as stocks, and that has largely worked (I'm sure plant people could point
   out where there are short comings, and we're working on those too), where
   the stock module has several tables that are analogous to the strain
   tables you committed, or will with the next release of Chado (like the
   feature_stock table, or is it stock_feature, I can't remember).
   Anyway, what I'd really like from you, especially since you've now had a
   few years of using your strain tables, is whether there is something
   "missing" about the stock table approach that you think would be better
   captured using the strain tables.A  My concern is that having both the
   strain and stock tables represent a sort of denormalization, where the
   same data could be stored in two places.A  That makes the DBA inside of me
   queasy.A 
Here is the previous discussion about strain tables
http://generic-model-organism-system-database.450254.n5.nabble.com/Opening-up-comments-on-changes-to-Chado-tp5711717.html
As i said earlier, it would be nice to have some use cases where these
new tables might come handly and hopefully we will get the good of both
worlds.

thanks,
-siddhartha

   Thanks much for your feedback.
   Scott
   --
   ------------------------------------------------------------------------
   Scott Cain, Ph. D.A  A  A  A  A  A  A  A  A  A  A  A  A  A  A  A  A A 
   scott at scottcain dot net
   GMOD Coordinator (http://gmod.org/)A  A  A  A  A  A  A  A  A  A A 
   216-392-3087
   Ontario Institute for Cancer Research

      
------------------------------------------------------------------------------
One dashboard for servers and applications across Physical-Virtual-Cloud 
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y

      
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema

------------------------------------------------------------------------------
One dashboard for servers and applications across Physical-Virtual-Cloud 
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema


------------------------------------------------------------------------------

_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Strain and stock tables in Chado

Scott Cain
Hi Stephen,

Did we decide against supplying a default empty string for infraspecific_name?  While Postgres allows null in unique constraints, since nulls aren't equal to one another, you could get a case where two rows have genus and species the same and nulls in infraspecific_name and type_id, but they would be allowed because the nulls make them not equal.  If instead the default value of an empty space is supplied for infraspecific_name, the duplicate would be rejected.

Oh, crap, no it wouldn't since the null in type_id would still keep the rows from being the same.  The only solution I can think of would be to supply a default value for infraspecific name and not include type in the unique constraint.  I understand why you want to have type_id in the constraint (so that you could have different types of infraspecific names with the same string, which probably does happen, because biology), but I think it introduces more problems that it fixes.

Scott


On Thu, May 28, 2015 at 8:44 AM, Stephen Ficklin <[hidden email]> wrote:
Hi Siddhartha,

In answer to your question about the unique constraint on the proposed organism table.  Here is the proposed DDL:

CREATE TABLE organism (
   organism_id serial NOT NULL,
   primary key (organism_id),
   abbreviation varchar(255) NULL,
   genus varchar(255) NOT NULL,
   species varchar(255) NOT NULL,
   common_name varchar(255) NULL,
   infraspecific_name varchar(1024) NULL,
   type_id int default NULL,
   comment text NULL,
   FOREIGN KEY (type_id) REFERENCES cvterm (cvterm_id) ON DELETE CASCADE,
   CONSTRAINT organism_c1 unique (genus,species, type_id, infraspecific_name)
);

There are two new fields:  type_id and infraspecific name.  The type_id is a foreign key on the cvterm table. It is used for terms like subspecies, variety, subvariety, forma and subforma.  For example, to store the name   "Acanthocalycium klimpelianum var. macranthum" the pieces would go as follows:

organism.genus = Acanthocalycium
organism.species =
klimpelianum
organism.type_id =
cvterm that has a value of 'variety'
organism.infraspecific_name =
macranthum

The unique constraint will be expanded to include both the type_id and infraspecific name but both can be NULL.

Stephen



On 15-05-18 12:11 PM, Siddhartha Basu wrote:
Hi,

On Thu, 14 May 2015, Scott Cain wrote:

   Hi Dave, other Harvard folks and all,
   We are making a hard push to get a new version of Chado out, and one of
   the things we're trying to reconcile is the strain tables that FlyBase
   contributed versus the way, well, just about everybody else is storing
   similar information.A  I think we exchanged emails about this a few years
   ago, but I'd like to get this sorted. A 
   Before I go into the strain tables, I want to mention that one of the
   changes we want to incorporate into the organism table is a
   infraspecific_name field, which would store strain, cultivar and
   subspecies names.A  The reason we want to add this to the organism table
   is so that features and other things that have organism as a foreign key
   can be associated directly with the subspecies that it belongs to.A  We'll
   make the default value in the infraspecific_name field be an empty string
   which should make it backward compatible with software the queries the
   organism table and as long as loading software doesn't need to deal with
   the infraspecific_name field, it will be backward compatible with loading
   software too.
This will be a welcoming change at least for our use case at dictyBase
as we will be able to store our strain name without appending it to the
species name, couldn't be more happier than that.
Just wonder, how the unique field will work, will genus,species and
infraspecific_name now becomes the unique field with this change.


   Now, about storing strains and the like: when the Natural Diversity paper
   was published, we advocated for treating strains, cultivars, and crosses
   as stocks, and that has largely worked (I'm sure plant people could point
   out where there are short comings, and we're working on those too), where
   the stock module has several tables that are analogous to the strain
   tables you committed, or will with the next release of Chado (like the
   feature_stock table, or is it stock_feature, I can't remember).
   Anyway, what I'd really like from you, especially since you've now had a
   few years of using your strain tables, is whether there is something
   "missing" about the stock table approach that you think would be better
   captured using the strain tables.A  My concern is that having both the
   strain and stock tables represent a sort of denormalization, where the
   same data could be stored in two places.A  That makes the DBA inside of me
   queasy.A 
Here is the previous discussion about strain tables
http://generic-model-organism-system-database.450254.n5.nabble.com/Opening-up-comments-on-changes-to-Chado-tp5711717.html
As i said earlier, it would be nice to have some use cases where these
new tables might come handly and hopefully we will get the good of both
worlds.

thanks,
-siddhartha

   Thanks much for your feedback.
   Scott
   --
   ------------------------------------------------------------------------
   Scott Cain, Ph. D.A  A  A  A  A  A  A  A  A  A  A  A  A  A  A  A  A A 
   scott at scottcain dot net
   GMOD Coordinator (http://gmod.org/)A  A  A  A  A  A  A  A  A  A A 
   <a href="tel:216-392-3087" value="+12163923087" target="_blank">216-392-3087
   Ontario Institute for Cancer Research

      
------------------------------------------------------------------------------
One dashboard for servers and applications across Physical-Virtual-Cloud 
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y

      
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
------------------------------------------------------------------------------
One dashboard for servers and applications across Physical-Virtual-Cloud 
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema


------------------------------------------------------------------------------

_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema




--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

------------------------------------------------------------------------------

_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Strain and stock tables in Chado

Karl O. Pinc
On Thu, 28 May 2015 12:16:17 -0400
Scott Cain <[hidden email]> wrote:

> Hi Stephen,
>
> Did we decide against supplying a default empty string for
> infraspecific_name?  While Postgres allows null in unique constraints,
> since nulls aren't equal to one another, you could get a case where
> two rows have genus and species the same and nulls in
> infraspecific_name and type_id, but they would be allowed because the
> nulls make them not equal. If instead the default value of an empty
> space is supplied for infraspecific_name, the duplicate would be
> rejected.
>
> Oh, crap, no it wouldn't since the null in type_id would still keep
> the rows from being the same.  The only solution I can think of would
> be to supply a default value for infraspecific name and not include
> type in the unique constraint.  I understand why you want to have
> type_id in the constraint (so that you could have different types of
> infraspecific names with the same string, which probably does happen,
> because biology), but I think it introduces more problems that it
> fixes.

Often the best approach is to use the WHERE syntax in the
constraint.  (Really, create a unique index, which is what
a constraint does.)

To unique index with 2 columns allowed to be null
you need 4 indexes:

create unique index i0 on organism
 (genus, species)
 where type_id is null and infraspecific_name is null;

create unique index i1 on organism
 (genus, species, type_id)
 where type_id is not null and infraspecific_name is null;

create unique index i2 on organism
 (genus, species, infraspecific_name)
 where type_id is null and infraspecific_name is not null;

create unique index i3 on organism
 (genus, species, type_id, infraspecific_name)
 where type_id is not null and infraspecific_name is not null;

Since organism is not a large table there's no problems
with having 4 indexes.

Alternately, you could index on an expression.


Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

------------------------------------------------------------------------------
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema