Proposal for storing EQ statements in Chado

Proposal for storing EQ statements in Chado

Cannon, Ethalinda K [COM S]

A small group of us has been working out a method for storing EQ statements in Chado. These are post-composed terms with specific syntax rules. The basic structure is:

primary entity - [quality - [secondary entity]]


Each of the three parts may itself contain one or more terms:

primary entity: primary entity 1 [primary relationship [primary entity 2]]

quality: quality [qualifier]

secondary entity: secondary entity 1 [secondary relationship [secondary entity 2]]
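
The structure above can be sketched as a small data model. This is only an illustration in Python, not the schema proposed on the wiki page: the class names, the field names, and the example term labels are all made up for this sketch, and the validation rules are one reading of the bracket nesting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    """An entity part: a term, optionally followed by a relationship and a second term."""
    term: str
    relationship: Optional[str] = None
    related_term: Optional[str] = None

    def __post_init__(self):
        # The nesting "[relationship [entity 2]]" means a second entity
        # cannot appear without a relationship in front of it.
        if self.related_term is not None and self.relationship is None:
            raise ValueError("a second entity requires a relationship")

    def render(self) -> str:
        parts = [self.term, self.relationship, self.related_term]
        return " ".join(p for p in parts if p is not None)


@dataclass(frozen=True)
class Quality:
    """The quality part: a quality term with an optional qualifier."""
    term: str
    qualifier: Optional[str] = None

    def render(self) -> str:
        parts = [self.term, self.qualifier]
        return " ".join(p for p in parts if p is not None)


@dataclass(frozen=True)
class EQStatement:
    """primary entity - [quality - [secondary entity]]"""
    primary: Entity
    quality: Optional[Quality] = None
    secondary: Optional[Entity] = None

    def __post_init__(self):
        # A secondary entity is nested inside the optional quality part.
        if self.secondary is not None and self.quality is None:
            raise ValueError("a secondary entity requires a quality")

    def render(self) -> str:
        parts = [self.primary, self.quality, self.secondary]
        return " - ".join(p.render() for p in parts if p is not None)


# Hypothetical term labels, chosen only to show the shape of a statement.
stmt = EQStatement(
    primary=Entity("leaf", "part_of", "shoot system"),
    quality=Quality("yellow"),
)
print(stmt.render())
```

In a real database each string here would be a reference to a cvterm rather than a label.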

In addition to these particular post-composed terms, we wanted a system that could potentially accommodate any post-composed syntax while still maintaining the ability to attach single-term phenotypes and traits to objects.

We have a proposed structure using the new Group Module and minor changes to existing tables. A description of our proposal is available here: http://gmod.org/wiki/Chado_Post-Composed_Phenotypes#Overview

The page shows two options. The first is not advisable as it ties the phenotype module directly to the Group Module, but the second option seems excessively complex. It would be great to get some feedback from the Chado group.

Ethy

_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Re: Proposal for storing EQ statements in Chado

Bob MacCallum
Hi Ethy,

This looks interesting; we have come across similar issues developing a population biology database on top of Chado with the new(ish) Natural Diversity module.

When it comes to phenotypes we don't mind shoving what we can into the observable/attribute/value fields of the Phenotype table, but for other things we've needed what I guess you could call post-composed ontology terms, or sentences.

An example of what we need is "insecticide: 4% DDT"

which we express as a list of cvterms:

1. insecticidal substance (MIRO:10000239)
2. DDT (MIRO:10000157)
3. concentration of (PATO:0000033)
4. percent (UO:0000187)

followed by a plain text/number value "4".

Here's that example in the wild:
https://www.vectorbase.org/popbio/assay/?id=VBA0117010

You might ask "but DDT is_a insecticidal substance, so why do you need both?" and I would vaguely answer: "it's all about what we want to display on the webpage..." - we want to explicitly flag up to our users that this is an insecticide.

We find that we need these "cvterm sentences" for various different prop tables (the above example is for nd_experimentprop), and so we call them "multiprops". A "multiprop" is defined as an arbitrary-length sequence of cvterms followed by an optional single plain text value.
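
To make that definition concrete, here is a minimal Python sketch of a multiprop as a value object. It is not taken from the VBPopBio code: the class name, the field names, and the `describe` helper are assumptions for illustration, as is the rule that a multiprop needs at least one cvterm. The accessions are the ones from the DDT example above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Multiprop:
    """An ordered sequence of cvterms plus an optional plain text value."""
    cvterms: Tuple[str, ...]
    value: Optional[str] = None

    def __post_init__(self):
        # Assumption for this sketch: an empty sequence is not a multiprop.
        if not self.cvterms:
            raise ValueError("a multiprop needs at least one cvterm")

    def describe(self) -> str:
        """Hypothetical helper: join the terms and append the value, if any."""
        text = " > ".join(self.cvterms)
        return text if self.value is None else f"{text} = {self.value}"


# "insecticide: 4% DDT" from the example above.
ddt = Multiprop(
    cvterms=(
        "MIRO:10000239",  # insecticidal substance
        "MIRO:10000157",  # DDT
        "PATO:0000033",   # concentration of
        "UO:0000187",     # percent
    ),
    value="4",
)
print(ddt.describe())
```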

We already have a wrapper around Bio::Chado::Schema so we figured out a way to hack these multiprops into the normal props tables, and our wrapper takes care of inserting and retrieving them.

So, I wonder if you would consider prototyping a grouped cvterm schema that wasn't just for phenotypes?  If it's possible, of course.
I guess making these grouped cvterms available for all props tables would be good, but I think there's no easy way to do it without adding a column to every prop table...

Also, do your composed terms ever include plain text - or would you put that into the phenotype.value field?

cheers,
Bob



On Thu, Jan 22, 2015 at 6:28 PM, Cannon, Ethalinda K [E CPE] <[hidden email]> wrote:

Re: Proposal for storing EQ statements in Chado

Karl O. Pinc
On Fri, 23 Jan 2015 11:53:07 +0000
Bob MacCallum <[hidden email]> wrote:

> This looks interesting, we have come across similar issues developing
> a population biology database on top of Chado with the new(ish)
> Natural Diversity module.
>
> When it comes to phenotypes we don't mind shoving what we can into the
> observable/attribute/value fields of the Phenotype table, but for
> other things we've needed, I guess you could call them, post-composed
> ontology terms, or sentences.
>
> An example of what we need is "insecticide: 4% DDT"
>
> which we express as a list of cvterms:
>
> 1. insecticidal substance (MIRO:10000239)
> 2. DDT (MIRO:10000157)
> 3. concentration of (PATO:0000033)
> 4. percent (UO:0000187)
>
> followed by a plain text/number value "4".

This would seem to relate to the "metric problem",
where users are supposed to "just know" the
unit of measurement of a property value.

I've not been following, but it'd be nice if
some solution arose.



Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

Re: Proposal for storing EQ statements in Chado

Cannon, Ethalinda K [COM S]
In reply to this post by Bob MacCallum

Hi Bob,


Thank you! We've been working on this from the fairly narrow perspective of EQ statements for plant phenotypes, so it's great to get an example well out of that realm.


Regarding text values: yes, I do have a vague notion that the value field in the phenotype table would handle this in cases where a trait is being represented, but I haven't worked through examples to see if this is feasible, or if there may be cases where a text value is needed in the middle of a statement. Because the end goal is to be able to search and compute these statements, it seems that inserting text terms could thwart the intent.


I did work out an alternative method for representing post-composed terms using the group module alone, then attaching the root of the group to the phenotype record. I added it to the bottom of the wiki page:

http://gmod.org/wiki/Chado_Post-Composed_Phenotypes#Alternative:_Represented_directly_in_Group_Module

We are planning to start testing option 2, but the Group Module approach might work for statements like yours.


Ethy



Re: Proposal for storing EQ statements in Chado

Mara Kim
In reply to this post by Bob MacCallum

Hi Bob,

How have you implemented the arbitrary length CV term chains?

Re: Proposal for storing EQ statements in Chado

Bob MacCallum
Hi Mara,

I said it was a bit of a hack, and it is...

A prop where value="," (yes, a comma, but you could use any other magic value) indicates that the following prop is part of the chain. The chain terminates with either null value or a non-magic value.
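
The chaining rule just described can be sketched in a few lines of Python. This is not the Perl implementation linked in this message; the function names and the `(cvterm, value)` row shape are assumptions for illustration, and the rows are assumed to arrive already sorted in their stored order (for example by the prop table's rank column).

```python
from typing import Iterable, List, Optional, Sequence, Tuple

MAGIC = ","  # the magic value that means "the next prop continues this chain"

PropRow = Tuple[str, Optional[str]]       # (cvterm accession, value)
Chain = Tuple[List[str], Optional[str]]   # (cvterms in order, final value)


def encode_chain(cvterms: Sequence[str], value: Optional[str] = None) -> List[PropRow]:
    """Turn one chain into ordinary prop rows linked by the magic value."""
    if not cvterms:
        raise ValueError("a chain needs at least one cvterm")
    if value == MAGIC:
        # The terminating value must be null or non-magic.
        raise ValueError("the final value cannot be the magic value")
    rows: List[PropRow] = [(term, MAGIC) for term in cvterms[:-1]]
    rows.append((cvterms[-1], value))
    return rows


def decode_chains(rows: Iterable[PropRow]) -> List[Chain]:
    """Rebuild chains from prop rows read in their stored order."""
    chains: List[Chain] = []
    current: List[str] = []
    for term, value in rows:
        current.append(term)
        if value != MAGIC:
            # A null or non-magic value terminates the chain.
            chains.append((current, value))
            current = []
    if current:
        raise ValueError("prop rows end in the middle of a chain")
    return chains


# The "insecticide: 4% DDT" example, followed by a single-term prop.
# "EX:0000001" is a made-up accession used only in this sketch.
rows = encode_chain(
    ["MIRO:10000239", "MIRO:10000157", "PATO:0000033", "UO:0000187"], "4"
) + encode_chain(["EX:0000001"])

for chain in decode_chains(rows):
    print(chain)
```

Note that a single-term chain is stored exactly like a normal prop, and that a chain boundary can only be found by walking the rows in order.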

Here's the code for adding and getting these chains.
https://github.com/bobular/VBPopBio/blob/master/api/Bio-Chado-VBPopBio/lib/Bio/Chado/VBPopBio/Util/Multiprops.pm
It's highly inefficient because all the props have to be retrieved into memory even if you only want one of the chains (using the filter option/argument to get_multiprops).

HTH
cheers,
Bob



On Mon, Jan 26, 2015 at 6:00 PM, Mara Kim <[hidden email]> wrote:
