Property tables and units

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

Property tables and units

Karl O. Pinc
Hello,

What I keep coming back to vis chado's property tables is the      
question of units.  Just where are the units in which the value is
measured supposed to be recorded?    


Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=157005751&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Property tables and units

Scott Cain
Hi Karl,

I think most people who use Chado fall into the "you're just supposed to know" category, and I imagine much of the time that works.  I can't think of a simple solution to applying units to entries in prop tables, but I can think of a moderately complex way :-)  (Of course, it's after 11pm, so I may regret writing this tomorrow).

You could allow a composition of two cvterms from separate CVs, one for type of the property and one for the units.  So if you had "concentration" as a type for the property and "molar" for the units, you could dynamically (via application code) create a new term "concentration (molar)" in a CV created to hold these compositions, where the newly created term would be related to the two parent terms via the cvterm_relationship table.

I have no doubt that there are a host of problems with this approach, but I can't think of any at the moment, so I've cc'ed Chris Mungall so he can slap me around a bit.

Scott


On Mon, Dec 1, 2014 at 7:03 PM, Karl O. Pinc <[hidden email]> wrote:
Hello,

What I keep coming back to vis chado's property tables is the
question of units.  Just where are the units in which the value is
measured supposed to be recorded?


Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=157005751&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema



--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Fwd: Property tables and units

Sook Jung
In reply to this post by Karl O. Pinc
Hi Karl,

Forgot to cc the mailing list and then I realized I didn't mention more important point.. Since none of the cvterms that we store in prop tables (prop.type_id) have units, I focused on the phenotype descriptors (phenotype.attr_id).. But to follow the same way, you'll just have to create a cvterm for each units, length_m, length_cm etc. The actual unit (m or cm) can be stored in cvtermprop as I wrote before (either in value or type_id if you use unit ontology). I think it's simpler than trying to figure out how to break up the cvterm into two - 'length' and 'm' / 'cm' - and then try to concatenate them again..

Cheers

Sook




---------- Forwarded message ----------
From: Sook Jung <[hidden email]>
Date: Wed, Dec 3, 2014 at 10:55 PM
Subject: Re: [Gmod-schema] Property tables and units
To: "Karl O. Pinc" <[hidden email]>


Hi Karl,

I know there have been some discussions about how to store units when people met to re-think the phenotype module but I'm not sure whether they came up with a consensus or not.  

We (GDR, CottoGen, etc) just store units of phenotypic descriptors in cvtermprop. cvtermprop,type_id is a cvterm_id for 'unit' and cvtermprop.value is the actual unit (eg. cm, g, kg, etc). Specific phenotypic descriptors with units can later be associated with more generic ontology like Trait Ontology.

For more formal approach, we could use unit ontology. So the phenotypic descriptors are associated with a specific unit ontology stored as cvtermprop.type_id.

Hopefully more people who actually store units will tell us more about it..

Thanks

Sook

On Mon, Dec 1, 2014 at 7:03 PM, Karl O. Pinc <[hidden email]> wrote:
Hello,

What I keep coming back to vis chado's property tables is the
question of units.  Just where are the units in which the value is
measured supposed to be recorded?


Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=157005751&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema



------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Fwd: Property tables and units

Scott Cain
Hi Sook,

Essentially what you're advocating for is creating a cross product of two ontologies or CVs. I'm not against that; the thing to look out for is that it may become unwieldy depending on how many terms you have to create.  Also it may be more difficult for biologists to do in a consistent way.  By giving the biologist access to the full ontologies to do the composition, you (the admin) have control over the starting material, but the biologists can compose terms as needed. 


Sent from my iPad

On Dec 3, 2014, at 11:09 PM, Sook Jung <[hidden email]> wrote:

Hi Karl,

Forgot to cc the mailing list and then I realized I didn't mention more important point.. Since none of the cvterms that we store in prop tables (prop.type_id) have units, I focused on the phenotype descriptors (phenotype.attr_id).. But to follow the same way, you'll just have to create a cvterm for each units, length_m, length_cm etc. The actual unit (m or cm) can be stored in cvtermprop as I wrote before (either in value or type_id if you use unit ontology). I think it's simpler than trying to figure out how to break up the cvterm into two - 'length' and 'm' / 'cm' - and then try to concatenate them again..

Cheers

Sook




---------- Forwarded message ----------
From: Sook Jung <[hidden email]>
Date: Wed, Dec 3, 2014 at 10:55 PM
Subject: Re: [Gmod-schema] Property tables and units
To: "Karl O. Pinc" <[hidden email]>


Hi Karl,

I know there have been some discussions about how to store units when people met to re-think the phenotype module but I'm not sure whether they came up with a consensus or not.  

We (GDR, CottoGen, etc) just store units of phenotypic descriptors in cvtermprop. cvtermprop,type_id is a cvterm_id for 'unit' and cvtermprop.value is the actual unit (eg. cm, g, kg, etc). Specific phenotypic descriptors with units can later be associated with more generic ontology like Trait Ontology.

For more formal approach, we could use unit ontology. So the phenotypic descriptors are associated with a specific unit ontology stored as cvtermprop.type_id.

Hopefully more people who actually store units will tell us more about it..

Thanks

Sook

On Mon, Dec 1, 2014 at 7:03 PM, Karl O. Pinc <[hidden email]> wrote:
Hello,

What I keep coming back to vis chado's property tables is the
question of units.  Just where are the units in which the value is
measured supposed to be recorded?


Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=157005751&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema


------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Fwd: Property tables and units

Mara Kim-3
Hey guys,

This reminds me a bit about something I have been toying around with.  I have been thinking about creating `${TABLE}cvprop` tables.  Essentially, for every `${TABLE}prop` table, you could conceivably have a table with the following structure:

CREATE TABLE ${TABLE}cvprop (
  ${TABLE}cvprop_id serial,
  ${TABLE}_id integer NOT NULL REFERENCES ${TABLE}
    ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED,
  value integer NOT NULL REFERENCES cvterm
    ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED,
  rank integer NOT NULL DEFAULT 0,
  type_id integer NOT NULL REFERENCES cvterm
    ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED,
  UNIQUE(rank, type_id, ${TABLE}_id)
);

Then, you could use store things like, the property of type `length` is value `cm`, with both the type and value being controlled vocabulary terms.  This eliminates the need to generate concatenated cross products of cvs and instead compose them using the schema.

On Thu, Dec 4, 2014 at 7:15 PM, Scott Cain <[hidden email]> wrote:
Hi Sook,

Essentially what you're advocating for is creating a cross product of two ontologies or CVs. I'm not against that; the thing to look out for is that it may become unwieldy depending on how many terms you have to create.  Also it may be more difficult for biologists to do in a consistent way.  By giving the biologist access to the full ontologies to do the composition, you (the admin) have control over the starting material, but the biologists can compose terms as needed. 


Sent from my iPad

On Dec 3, 2014, at 11:09 PM, Sook Jung <[hidden email]> wrote:

Hi Karl,

Forgot to cc the mailing list and then I realized I didn't mention more important point.. Since none of the cvterms that we store in prop tables (prop.type_id) have units, I focused on the phenotype descriptors (phenotype.attr_id).. But to follow the same way, you'll just have to create a cvterm for each units, length_m, length_cm etc. The actual unit (m or cm) can be stored in cvtermprop as I wrote before (either in value or type_id if you use unit ontology). I think it's simpler than trying to figure out how to break up the cvterm into two - 'length' and 'm' / 'cm' - and then try to concatenate them again..

Cheers

Sook




---------- Forwarded message ----------
From: Sook Jung <[hidden email]>
Date: Wed, Dec 3, 2014 at 10:55 PM
Subject: Re: [Gmod-schema] Property tables and units
To: "Karl O. Pinc" <[hidden email]>


Hi Karl,

I know there have been some discussions about how to store units when people met to re-think the phenotype module but I'm not sure whether they came up with a consensus or not.  

We (GDR, CottoGen, etc) just store units of phenotypic descriptors in cvtermprop. cvtermprop,type_id is a cvterm_id for 'unit' and cvtermprop.value is the actual unit (eg. cm, g, kg, etc). Specific phenotypic descriptors with units can later be associated with more generic ontology like Trait Ontology.

For more formal approach, we could use unit ontology. So the phenotypic descriptors are associated with a specific unit ontology stored as cvtermprop.type_id.

Hopefully more people who actually store units will tell us more about it..

Thanks

Sook

On Mon, Dec 1, 2014 at 7:03 PM, Karl O. Pinc <[hidden email]> wrote:
Hello,

What I keep coming back to vis chado's property tables is the
question of units.  Just where are the units in which the value is
measured supposed to be recorded?


Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=157005751&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema


------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema




--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Fwd: Property tables and units

Karl O. Pinc
On 12/05/2014 12:36:58 PM, Mara Kim wrote:

> Hey guys,
>
> This reminds me a bit about something I have been toying around with.
> I
> have been thinking about creating `${TABLE}cvprop` tables.
> Essentially,
> for every `${TABLE}prop` table, you could conceivably have a table
> with the
> following structure:
>
> CREATE TABLE ${TABLE}cvprop (
>   ${TABLE}cvprop_id serial,
>   ${TABLE}_id integer NOT NULL REFERENCES ${TABLE}
>     ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY
> DEFERRED,
>   value integer NOT NULL REFERENCES cvterm
>     ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY
> DEFERRED,
>   rank integer NOT NULL DEFAULT 0,
>   type_id integer NOT NULL REFERENCES cvterm
>     ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY
> DEFERRED,
>   UNIQUE(rank, type_id, ${TABLE}_id)
> );
>
> Then, you could use store things like, the property of type `length`
> is
> value `cm`, with both the type and value being controlled vocabulary
> terms.  This eliminates the need to generate concatenated cross
> products of
> cvs and instead compose them using the schema.

The question is whether there's ever going to be more than
one of these rows per ${TABLE} row.  If not then may as well
just add a unit column to ${TABLE}.



Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Fwd: Property tables and units

Mara Kim-3

My personal reason for these tables are to describe test and control tissue samples in expression data sets using the Uberon anatomy ontology. For example, one expression study may have compared 'control_tissue' 'liver' to a 'test_tissue' 'brain'. Being able to describe the property values in a hierarchical manner is crucial here so that it is possible to infer, for example, that 'cerebellum' is 'part_of' the 'nervous_system', and group expression studies accordingly.

On Dec 5, 2014 12:47 PM, "Karl O. Pinc" <[hidden email]> wrote:
On 12/05/2014 12:36:58 PM, Mara Kim wrote:
> Hey guys,
>
> This reminds me a bit about something I have been toying around with.
> I
> have been thinking about creating `${TABLE}cvprop` tables.
> Essentially,
> for every `${TABLE}prop` table, you could conceivably have a table
> with the
> following structure:
>
> CREATE TABLE ${TABLE}cvprop (
>   ${TABLE}cvprop_id serial,
>   ${TABLE}_id integer NOT NULL REFERENCES ${TABLE}
>     ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY
> DEFERRED,
>   value integer NOT NULL REFERENCES cvterm
>     ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY
> DEFERRED,
>   rank integer NOT NULL DEFAULT 0,
>   type_id integer NOT NULL REFERENCES cvterm
>     ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY
> DEFERRED,
>   UNIQUE(rank, type_id, ${TABLE}_id)
> );
>
> Then, you could use store things like, the property of type `length`
> is
> value `cm`, with both the type and value being controlled vocabulary
> terms.  This eliminates the need to generate concatenated cross
> products of
> cvs and instead compose them using the schema.

The question is whether there's ever going to be more than
one of these rows per ${TABLE} row.  If not then may as well
just add a unit column to ${TABLE}.



Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Fwd: Property tables and units

Karl O. Pinc
On 12/05/2014 12:53:05 PM, Mara Kim wrote:

> My personal reason for these tables are to describe test and control
> tissue
> samples in expression data sets using the Uberon anatomy ontology.
> For
> example, one expression study may have compared 'control_tissue'
> 'liver' to
> a 'test_tissue' 'brain'. Being able to describe the property values
> in
> a
> hierarchical manner is crucial here so that it is possible to infer,
> for
> example, that 'cerebellum' is 'part_of' the 'nervous_system', and
> group
> expression studies accordingly.


I don't feel like I'm really familiar with chado so please bear
with me.  Aren't they described hierarchically anyway by way of
the properties being typed with cvterms, and the cvterms
being organized hierarchically via the cvterm_relationship table?  
(And then queryable via cvterm_path?)



Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Fwd: Property tables and units

Mara Kim-3
What you are describing applies to the `type` of the property, but not the `value`.  The easiest way I've found to think about it is key-value pairs in a hashmap.  The ${TABLE}prop tables constrain the keys to be controlled vocabulary terms, but the value is essentially free text (thus not able to be reasoned with in a computational manner).  The idea of the ${TABLE}cvprop tables is to constrain values to be cvterms as well.

On Fri, Dec 5, 2014 at 1:52 PM, Karl O. Pinc <[hidden email]> wrote:
On 12/05/2014 12:53:05 PM, Mara Kim wrote:
> My personal reason for these tables are to describe test and control
> tissue
> samples in expression data sets using the Uberon anatomy ontology.
> For
> example, one expression study may have compared 'control_tissue'
> 'liver' to
> a 'test_tissue' 'brain'. Being able to describe the property values
> in
> a
> hierarchical manner is crucial here so that it is possible to infer,
> for
> example, that 'cerebellum' is 'part_of' the 'nervous_system', and
> group
> expression studies accordingly.


I don't feel like I'm really familiar with chado so please bear
with me.  Aren't they described hierarchically anyway by way of
the properties being typed with cvterms, and the cvterms
being organized hierarchically via the cvterm_relationship table?
(And then queryable via cvterm_path?)



Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein



--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Fwd: Property tables and units

Karl O. Pinc
On 12/05/2014 01:56:35 PM, Mara Kim wrote:

> What you are describing applies to the `type` of the property, but
> not
> the
> `value`.  The easiest way I've found to think about it is key-value
> pairs
> in a hashmap.  The ${TABLE}prop tables constrain the keys to be
> controlled
> vocabulary terms, but the value is essentially free text (thus not
> able to
> be reasoned with in a computational manner).  The idea of the
> ${TABLE}cvprop tables is to constrain values to be cvterms as well.

Ok.  Very different from what I'm interested in.  Except that
we both want 2 cvterms.  But I want 2 cvterms and a value
and you just want a tuple of 2 cvterms.




Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Fwd: Property tables and units

Mara Kim-3
Ah, I see.  How are you storing your data?  Are the units for your properties going to be consistent for an entire data set, or will each individual value use different units?  If your units are consistent within a whole data set, it would reduce redundancy (and potential inconsistencies) to describe the units at the level of the entire data set rather than for each value of the analysis.  In that case you could use a structure like this:

my_dataset
|-> properties
|  |-> length : cm
|  |-> weight : kg
|
|-> data
   |-> sample1
   |  |-> length : 5
   |  |-> weight : 2
   |
   |-> sample2
   | etc.


On Fri, Dec 5, 2014 at 3:44 PM, Karl O. Pinc <[hidden email]> wrote:
On 12/05/2014 01:56:35 PM, Mara Kim wrote:
> What you are describing applies to the `type` of the property, but
> not
> the
> `value`.  The easiest way I've found to think about it is key-value
> pairs
> in a hashmap.  The ${TABLE}prop tables constrain the keys to be
> controlled
> vocabulary terms, but the value is essentially free text (thus not
> able to
> be reasoned with in a computational manner).  The idea of the
> ${TABLE}cvprop tables is to constrain values to be cvterms as well.

Ok.  Very different from what I'm interested in.  Except that
we both want 2 cvterms.  But I want 2 cvterms and a value
and you just want a tuple of 2 cvterms.




Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein



--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Fwd: Property tables and units

Karl O. Pinc
On 12/05/2014 04:19:03 PM, Mara Kim wrote:

> Ah, I see.  How are you storing your data?  Are the units for your
> properties going to be consistent for an entire data set, or will
> each
> individual value use different units?  If your units are consistent
> within
> a whole data set, it would reduce redundancy (and potential
> inconsistencies) to describe the units at the level of the entire
> data
> set
> rather than for each value of the analysis.  In that case you could
> use a
> structure like this:
>
> my_dataset
> |-> properties
> |  |-> length : cm
> |  |-> weight : kg
> |
> |-> data
>    |-> sample1
>    |  |-> length : 5
>    |  |-> weight : 2
>    |
>    |-> sample2
>    | etc.


This would put me back in your camp of wanting to
have 'cm' and 'kg' be types.

But I'm also worried about having multiple
metrics of the same sort, say, volume:

my_dataset
|-> properties
|  |-> corn_metric : bushels
|  |-> wheat_metric : pecks
|
|-> data
   |-> sample1
   |  |-> corn : 5
   |  |-> wheat : 2
   |
   |-> sample2
   | etc.

Clearly the details matter and I made
up my example out of whole cloth, it has
nothing to do with what we're really doing
(baboons and SNVs).  But it still seems
like there could be potential for doubling
the number of types just to stick a '_metric'
on the end.  It does not feel right.

(At this point I'm ready to stop thinking about
it and get back to work.  ;-)

Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Fwd: Property tables and units

Valentin Guignon
Hi there, sorry for such a long mail but there's some content that can't
really be shortened.




[CONTEXT]
a couple of month ago, you discussed about property tables. Then during
GMOD meeting at PAG in January, I also expressed a need about
properties: we need to be able to use controlled vocabulary terms as
values. And today, we should talk about the next releases of Chado (RFC:
Chado relase v1.3) that's why I bring back this topic here now.




[NEEDS]
First, I'd like to sum up what's needed/uses cases I saw (correct me if
I missed some):
-store numeric values or even dates as they are, and not as text in
order to be able to use SQL comparison function that works on numbers
and not on text storing numbers.
-use CV terms as values
-be able to compose properties in order to qualify values (for instance
having a numeric value coming with its unit)




[PROPOSITIONS]
So I came up with a proposition to be discussed here and that I'll post
on the google doc for the next releases (cf. RFC: Chado relase v1.3
mail). Actually, I got several flavors to offer:
1) replacing existing table by a new table with "typed" properties
2) adding a new table with "typed" properties and keep the old one
3) adding a new table with "typed" properties and remove the old one
2b) same as 2 with triggers to auto-synchronize the 2 tables
3b) same as 3 with a view and rules in order to have virtual old
property tables that would behave just like before but store data in the
new table

note: the idea behind 2b et 3b is to have retro-compatibility with "old"
softwares.

The main idea is to have several typed value fields that will hold typed
data and add a rule (check) in order to prevent missuses and only allow
one of those value fields to be used at once. To compose values, I added
a "compose_order" field that would allow to order composition while
still having the possibility to use the rank field to have several
composed-values (for instance a same length in 2 different units).




[EXAMPLES]
Let's see several use cases on the stock table for instance.

Storing:
I got a plant stock stock_id=42 with the following properties:
* stem length of 30.5cm or 12inches (inches stored as int while cm as float)
* dark green color
* acquisition date being the 2nd of January, 2015
* description: 'my favorite plant'

Some cv and cvterms:
* cv 'units':
** cvterm 'cm' cvterm_id=13
** cvterm 'inches' cvterm_id=34
* cv 'colors':
** cvterm 'dark green' cvterm_id=304
* cv 'crop descriptors':
** cvterm 'stem length' cvterm_id=98
** cvterm 'leaf color' cvterm_id=145
** cvterm 'acquisition date' cvterm_id=167
** cvterm 'description' cvterm_id=204

Insert statement:
INSERT INTO stocktprop (
     stock_id,
     type_id,
     text_value,
     int_value,
     float_value,
     time_value,
     cv_value,
     rank,
     compose_order
   )
VALUES (
     42,
     98, -- stem length
     NULL,
     NULL,
     30.5, -- float
     NULL,
     NULL,
     0,
     0
   ), (
     42,
     98, -- stem length
     NULL,
     NULL,
     NULL,
     NULL,
     13, -- cv term cm
     0,
     1 -- next part of the value
   ), (
     42,
     98, -- stem length
     NULL,
     12, -- int
     NULL,
     NULL,
     NULL,
     1, -- next rank
     0
   ), (
     42,
     98, -- stem length
     NULL,
     NULL,
     NULL,
     NULL,
     34, -- cv term inches
     1,
     1
   ), (
     42,
     145, -- leaf color
     NULL,
     NULL,
     NULL,
     NULL,
     304, -- cv term dark green
     0,
     0
   ), (
     42,
     167, -- acquisition date
     NULL,
     NULL,
     NULL,
     '2015-01-02',
     NULL,
     0,
     0
   )
;

And for the last value for flavors 1, 2 and 3:
INSERT INTO stocktprop (
     stock_id,
     type_id,
     text_value,
     int_value,
     float_value,
     time_value,
     cv_value,
     rank,
     compose_order
   )
VALUES (
     42,
     204, -- description
     'my favorite plant',
     NULL,
     NULL,
     NULL,
     NULL,
     0,
     0
   )
;

Alternatively, for flavors 2b and 3b, we can use above query but also
this one which would behave a bit differently for flavor 2b as the data
will be duplicated in stockprop and stocktprop:
INSERT INTO stockprop (
     stock_id,
     type_id,
     value,
     rank
   )
VALUES (
     42,
     204, -- description
     'my favorite plant',
     0
   )
;

Note: the same CV 'colors' could be used to describe other parts than
leaves, that's why we couldn't use stock_cvterm in that case.

To get all stocks that have a stem shorter than or equal to 12 inches
from the:
SELECT sp.stock_id, COALESCE(sp.int_value, sp.float_value) AS "stem
length", cvt.name AS "unit"
   FROM stocktprop sp
     JOIN stocktprop spunit ON sp.type_id = spunit.type_id AND sp.rank =
spunit.rank
     JOIN cvterm cvt ON cvt.cvterm_id = spunit.cv_value
   WHERE
     sp.type_id = 98 -- stem length
     AND sp.compose_order = 0 -- value part
     AND (sp.int_value <= 12 OR sp.float_value <= 12) -- allows to
switch unit if we want or handle inches stored as float
     AND spunit.compose_order = 1 -- unit part
     AND spunit.cv_value = 34 -- inches
ORDER BY "stem length" DESC;


Note: with flavors 2b) and 3b), you can still query the stockprop
table/view but you can't select values below or equal to 12.
Also note that you can insert/update data into stockprop table/view with
old softwares and have that data appear in the new stocktprop table
allowing new softwares (supporting stocktprop table) to use that data.
However, in flavor 2b, data inserted into stocktprop is not also
inserted into stockprop (a trigger could be done to do so though...).




[SQL CODE]
----------

-- FLAVOR 1): just remove the 't' to have [% TABLENAME %]prop and
--            replace the field name 'text_value' by 'value'
--            In case of existing database update, you will only
--            have to write the SQL queries to alter your schema
--            according to the CREATE TABLE statement.
-- ALL FLAVORS
-- New typed-property (tprop) table creation
CREATE SEQUENCE [% TABLENAME %]tprop_[% TABLENAME %]tprop_id_seq;
CREATE TABLE [% TABLENAME %]tprop (
   [% TABLENAME %]tprop_id INTEGER NOT NULL DEFAULT NEXTVAL('[%
TABLENAME %]tprop_[% TABLENAME %]tprop_id_seq'),
   [% TABLENAME %]_id INTEGER NOT NULL REFERENCES [% TABLENAME %]
     ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED,
   type_id INTEGER NOT NULL REFERENCES cvterm
     ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED,

   text_value TEXT NULL DEFAULT NULL,
   int_value BIGINT NULL DEFAULT NULL,
   float_value DOUBLE PRECISION NULL DEFAULT NULL,
   time_value TIMESTAMP NULL DEFAULT NULL,
   cv_value INTEGER NULL DEFAULT NULL REFERENCES cvterm
     ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED,

   rank INTEGER NOT NULL DEFAULT 0,
   compose_order INTEGER NOT NULL DEFAULT 0,
   -- we add compose_order in the constraint
   UNIQUE([% TABLENAME %]_id, type_id, rank, compose_order),
   -- rule to make sure only 1 value field at a time will be used
   CHECK(1 =
     ( CAST(text_value IS NOT NULL AS INTEGER)
     + CAST(int_value IS NOT NULL AS INTEGER)
     + CAST(float_value IS NOT NULL AS INTEGER)
     + CAST(time_value IS NOT NULL AS INTEGER)
     + CAST(cv_value IS NOT NULL AS INTEGER)
     )
   )
);
COMMENT ON TABLE [% TABLENAME %]tprop IS 'Fill only one value field for
an entry and use compose_order field to group values together in a
specific order. Use rank to store several values of the same type.';
-- ALL BUT FLAVOR 1: keep previous data
INSERT INTO [% TABLENAME %]tprop ([% TABLENAME %]_id, type_id,
text_value, rank)
   SELECT ([% TABLENAME %]_id, type_id, value, rank) FROM [% TABLENAME
%]prop;




-- FLAVORS 3) and 3b) only on existing databases:
DROP TABLE [% TABLENAME %]prop;




-- FALVOR 3b) only:
-- keep backward compatibility using views (no data duplicates)
-- View creation
CREATE OR REPLACE VIEW [% TABLENAME %]prop (
     [% TABLENAME %]prop_id,
     [% TABLENAME %]_id,
     type_id,
     value,
     rank
   ) AS
   SELECT
     min(p.[% TABLENAME %]tprop_id),
     p.[% TABLENAME %]_id,
     p.type_id,
     string_agg(
       COALESCE(
         p.text_value,
         CAST(p.int_value AS TEXT),
         CAST(p.float_value AS TEXT),
         CAST(p.time_value AS TEXT),
         cvt.name
       ),
       ':'
     ORDER BY p.compose_order),
     p.rank
   FROM [% TABLENAME %]tprop p
     LEFT JOIN cvterm cvt ON cvt.cvterm_id = p.cv_value
   GROUP BY p.[% TABLENAME %]_id, p.type_id, p.rank
;

-- Manage inserts on the view
CREATE RULE [% TABLENAME %]prop_insert AS
ON INSERT TO [% TABLENAME %]prop
DO INSTEAD
   INSERT INTO [% TABLENAME %]tprop VALUES (
     COALESCE(NEW.[% TABLENAME %]prop_id, NEXTVAL('[% TABLENAME
%]tprop_[% TABLENAME %]tprop_id_seq')),
     NEW.[% TABLENAME %]_id,
     NEW.type_id,
     NEW.value,
     NULL,
     NULL,
     NULL,
     NULL,
     COALESCE(NEW.rank, 0),
     DEFAULT
   )
;

-- Manage updates on the view
CREATE RULE [% TABLENAME %]prop_update AS
ON UPDATE TO [% TABLENAME %]prop
DO INSTEAD
   UPDATE [% TABLENAME %]tprop SET
     [% TABLENAME %]tprop_id = NEW.[% TABLENAME %]prop_id,
     [% TABLENAME %]_id      = NEW.[% TABLENAME %]_id,
     type_id         = NEW.type_id,
     text_value      = NEW.value,
     int_value       = NULL,
     float_value     = NULL,
     time_value      = NULL,
     cv_value        = NULL,
     rank            = NEW.rank
   WHERE
     [% TABLENAME %]_id = OLD.[% TABLENAME %]_id
     AND type_id = OLD.type_id
     AND rank = OLD.rank
;

-- Manage deletes on the view
CREATE RULE [% TABLENAME %]prop_delete AS
ON UPDATE TO [% TABLENAME %]prop
DO INSTEAD
   DELETE FROM [% TABLENAME %]tprop
   WHERE
     [% TABLENAME %]_id = OLD.[% TABLENAME %]_id
     AND type_id = OLD.type_id
     AND rank = OLD.rank
;




-- FALVOR 2b) only:
-- backward compatibility keeping old table with triggers (to duplicate
data)
-- Insert trigger
CREATE OR REPLACE FUNCTION [% TABLENAME %]prop_to_[% TABLENAME
%]tprop_insert() RETURNS TRIGGER AS
$$
   BEGIN
     INSERT INTO [% TABLENAME %]tprop VALUES (
       COALESCE(NEW.[% TABLENAME %]prop_id, NEXTVAL('[% TABLENAME
%]tprop_[% TABLENAME %]tprop_id_seq')),
       NEW.[% TABLENAME %]_id,
       NEW.type_id,
       NEW.value,
       NULL,
       NULL,
       NULL,
       NULL,
       COALESCE(NEW.rank, 0),
       DEFAULT
     );
     RETURN NEW;
   END
$$
LANGUAGE plpgsql;
CREATE TRIGGER [% TABLENAME %]prop_to_[% TABLENAME %]tprop_insert_trigger
   AFTER INSERT ON [% TABLENAME %]prop
   FOR EACH ROW
   EXECUTE PROCEDURE [% TABLENAME %]prop_to_[% TABLENAME %]tprop_insert();

-- Update trigger
CREATE OR REPLACE FUNCTION [% TABLENAME %]prop_to_[% TABLENAME
%]tprop_update() RETURNS TRIGGER AS
$$
   BEGIN
     UPDATE [% TABLENAME %]tprop SET
       [% TABLENAME %]tprop_id = NEW.[% TABLENAME %]prop_id,
       [% TABLENAME %]_id      = NEW.[% TABLENAME %]_id,
       type_id         = NEW.type_id,
       text_value      = NEW.value,
       int_value       = NULL,
       float_value     = NULL,
       time_value      = NULL,
       cv_value        = NULL,
       rank            = NEW.rank
     WHERE
       [% TABLENAME %]_id = OLD.[% TABLENAME %]_id
       AND type_id = OLD.type_id
       AND rank = OLD.rank
     ;
     RETURN NEW;
   END
$$
LANGUAGE plpgsql;
CREATE TRIGGER [% TABLENAME %]prop_to_[% TABLENAME %]tprop_update_trigger
   AFTER UPDATE ON [% TABLENAME %]prop
   FOR EACH ROW
   EXECUTE PROCEDURE [% TABLENAME %]prop_to_[% TABLENAME %]tprop_update();

-- Delete trigger
CREATE OR REPLACE FUNCTION [% TABLENAME %]prop_to_[% TABLENAME
%]tprop_delete() RETURNS TRIGGER AS
$$
   BEGIN
     DELETE FROM [% TABLENAME %]tprop
     WHERE
       [% TABLENAME %]_id = OLD.[% TABLENAME %]_id
       AND type_id = OLD.type_id
       AND rank = OLD.rank
     ;
     RETURN OLD;
   END
$$
LANGUAGE plpgsql;
CREATE TRIGGER [% TABLENAME %]prop_to_[% TABLENAME %]tprop_delete_trigger
   AFTER DELETE ON [% TABLENAME %]prop
   FOR EACH ROW
   EXECUTE PROCEDURE [% TABLENAME %]prop_to_[% TABLENAME %]tprop_delete();

-- Truncate trigger
CREATE OR REPLACE FUNCTION [% TABLENAME %]prop_to_[% TABLENAME
%]tprop_truncate() RETURNS TRIGGER AS
$$
   BEGIN
     TRUNCATE [% TABLENAME %]tprop;
     RETURN NULL;
   END
$$
LANGUAGE plpgsql;
CREATE TRIGGER [% TABLENAME %]prop_to_[% TABLENAME %]tprop_truncate_trigger
   AFTER TRUNCATE ON [% TABLENAME %]prop
   FOR EACH STATEMENT
   EXECUTE PROCEDURE [% TABLENAME %]prop_to_[% TABLENAME %]tprop_truncate();

----------




[SUMMARY]
To solve table property issues, I propose to the Chado community to
debate and vote on the 5 flavors presented here to keep only one of them.

Flavors:
1) replacing existing table by a new table with "typed" properties
Pros: keeps the same schema with minor changes on the 'prop tables
Cons: some old software may "not like" to see new columns and would
   need to be updated.

2) adding a new table with "typed" properties and keep the old one
Pros: backward compatibility...
Cons: ...but properties will come from 2 different tables and
conflicting values could appear.
Double table management.

3) adding a new table with "typed" properties and remove the old one
Pros: forces people to be aware and use the new way of storing properties
Cons: no backward compatibility

2b) same as 2 with triggers to auto-synchronize the 2 tables
Pros: backward compatibility
Cons: double data storage, working with the old property table may have
side effect issues.

3b) same as 3 with a view and rules in order to have virtual old
property tables that would behave just like before but store data in the
new table
Pros: backward compatibility
Cons: inserting/updating through the view replacing the old property
tables may have side effect issues.


Personally, I would prefer flavor 1 or 3b.
Regards,



    Valentin Guignon


On 05/12/2014 23:39, Karl O. Pinc wrote:

> On 12/05/2014 04:19:03 PM, Mara Kim wrote:
>> Ah, I see.  How are you storing your data?  Are the units for your
>> properties going to be consistent for an entire data set, or will
>> each
>> individual value use different units?  If your units are consistent
>> within
>> a whole data set, it would reduce redundancy (and potential
>> inconsistencies) to describe the units at the level of the entire
>> data
>> set
>> rather than for each value of the analysis.  In that case you could
>> use a
>> structure like this:
>>
>> my_dataset
>> |-> properties
>> |  |-> length : cm
>> |  |-> weight : kg
>> |
>> |-> data
>>     |-> sample1
>>     |  |-> length : 5
>>     |  |-> weight : 2
>>     |
>>     |-> sample2
>>     | etc.
>
>
> This would put me back in your camp of wanting to
> have 'cm' and 'kg' be types.
>
> But I'm also worried about having multiple
> metrics of the same sort, say, volume:
>
> my_dataset
> |-> properties
> |  |-> corn_metric : bushels
> |  |-> wheat_metric : pecks
> |
> |-> data
>     |-> sample1
>     |  |-> corn : 5
>     |  |-> wheat : 2
>     |
>     |-> sample2
>     | etc.
>
> Clearly the details matter and I made
> up my example out of whole cloth, it has
> nothing to do with what we're really doing
> (baboons and SNVs).  But it still seems
> like there could be potential for doubling
> the number of types just to stick a '_metric'
> on the end.  It does not feel right.
>
> (At this point I'm ready to stop thinking about
> it and get back to work.  ;-)
>
> Karl <[hidden email]>
> Free Software:  "You don't pay back, you pay forward."
>                   -- Robert A. Heinlein
>
> ------------------------------------------------------------------------------
> Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
> from Actuate! Instantly Supercharge Your Business Reports and Dashboards
> with Interactivity, Sharing, Native Excel Exports, App Integration & more
> Get technology previously reserved for billion-dollar corporations, FREE
> http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
> _______________________________________________
> Gmod-schema mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>

------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema