Feature Lists

Feature Lists

Ben J Woodcroft
Hi,

Is there any table or accepted best practise in Chado for storing sets
of features?

Thanks,
ben

-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema

Re: Feature Lists

David Emmert
Hi Ben,

Did somebody answer your question?

Chado doesn't have a single convention for grouping sets of features.  
Different kinds of feature are grouped differently.  Do you have some
examples of what you want to do that you'd be willing to share, and we
could work from those?

-Dave

On Sat, 28 Jun 2008 5:11, Ben Woodcroft wrote:

> Hi,
>
> Is there any table or accepted best practise in Chado for storing sets
> of features?
>
> Thanks,
> ben

Re: Feature Lists

Ben J Woodcroft
In reply to this post by Ben J Woodcroft
Hi,

Thanks for the reply - no one had responded - I was just sending from
the wrong email address, so it wasn't being accepted by the list.

I have 2 examples I can think of:
* random genes grouped by a user - something like the useful
functionality at plasmodb.org - user runs a query, then the results
get saved to lists.
* gene lists from analysis as is often published in papers - a list of
differentially expressed genes for a microarray that has been manually
inspected for instance.


Thanks,
ben


2008/7/1 Jay Sundaram <[hidden email]>:

> Ben,
>
> How are the features related?
>
> Perhaps you could insert a term like 'feature_set' into cvterm.
> Then insert one feature record per set with feature.type_id = [feature_set].
> Finally, insert records into feature_relationship with
> feature_relationship.type_id == [member_of] (one record per feature in the
> set).
> All of your related features become members of that feature_set.
>
> Alternatively: you could group the features via featureprop records.
>
> Jay



--
FYI: My email addresses at unimelb, uq and gmail all redirect to the same place.


Re: Feature Lists

David Emmert
In reply to this post by Ben J Woodcroft

Hi Ben,

>> * random genes grouped by a user - something like the useful
>> functionality at plasmodb.org - user runs a query, then the results
>> get saved to lists.

As far as I know, we've never modeled this!  Features are typically grouped
in chado by some biological or experimental attribute.  

Off the top of my head, one solution using existing chado mechanisms might be
to use the cv module.  You could implement a record in cv with cv.name
something like "arbitrary_user_feature_group", then for implementing
particular groups, create a cvterm which identifies that group, perhaps
with cvterm.name composed of user-name and datestamp, and then feature_cvterm
links to the features in the group.  If you wanted to have control over who is
authorized to make groups, you might restrict the cvterms simply to names of
authorized users, and for implementing particular groups, create a pub
record (of type "personal communication" or whatever), and use links via the
feature_cvterm.pub_id to implement the group.   This is a novel use of
the whole feature_cvterm mechanism; I wonder what other chado developers think
of it?
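
A minimal sketch of the cv-module idea above, run against an in-memory SQLite database rather than a real Chado instance. The table and column names (cv, cvterm, pub, feature, feature_cvterm, feature_cvterm.pub_id) follow Chado, but the schema is heavily cut down as an assumption for illustration; the helper names (open_demo_db, save_group, group_members) and the placeholder pub record are hypothetical. It shows only the first variant, where one cvterm identifies one group.

```python
import sqlite3

# Cut-down stand-ins for Chado's cv, cvterm, pub, feature and feature_cvterm
# tables. ASSUMPTION: real Chado has further required columns (for example
# cvterm.dbxref_id, feature.organism_id, feature.type_id) that are omitted here.
SCHEMA = """
CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    cv_id INTEGER NOT NULL REFERENCES cv (cv_id),
    name TEXT NOT NULL,
    UNIQUE (cv_id, name));
CREATE TABLE pub (pub_id INTEGER PRIMARY KEY, uniquename TEXT NOT NULL UNIQUE);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT NOT NULL UNIQUE);
CREATE TABLE feature_cvterm (
    feature_cvterm_id INTEGER PRIMARY KEY,
    feature_id INTEGER NOT NULL REFERENCES feature (feature_id),
    cvterm_id INTEGER NOT NULL REFERENCES cvterm (cvterm_id),
    pub_id INTEGER NOT NULL REFERENCES pub (pub_id),
    UNIQUE (feature_id, cvterm_id, pub_id));
"""

GROUP_CV = "arbitrary_user_feature_group"    # cv.name suggested in the post
PLACEHOLDER_PUB = "user_group_placeholder"   # hypothetical pub.uniquename


def open_demo_db():
    """Return an in-memory database holding the simplified tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def save_group(conn, user, datestamp, feature_names):
    """Create one cvterm for the group and link each named feature to it."""
    conn.execute("INSERT OR IGNORE INTO cv (name) VALUES (?)", (GROUP_CV,))
    cv_id = conn.execute(
        "SELECT cv_id FROM cv WHERE name = ?", (GROUP_CV,)).fetchone()[0]
    # feature_cvterm rows need a pub, so one shared placeholder pub is used.
    conn.execute(
        "INSERT OR IGNORE INTO pub (uniquename) VALUES (?)", (PLACEHOLDER_PUB,))
    pub_id = conn.execute(
        "SELECT pub_id FROM pub WHERE uniquename = ?",
        (PLACEHOLDER_PUB,)).fetchone()[0]
    group_name = f"{user}_{datestamp}"   # cvterm.name = user name + datestamp
    cvterm_id = conn.execute(
        "INSERT INTO cvterm (cv_id, name) VALUES (?, ?)",
        (cv_id, group_name)).lastrowid
    for name in feature_names:
        # Names that match no feature row are silently skipped.
        conn.execute(
            "INSERT INTO feature_cvterm (feature_id, cvterm_id, pub_id) "
            "SELECT feature_id, ?, ? FROM feature WHERE uniquename = ?",
            (cvterm_id, pub_id, name))
    return group_name


def group_members(conn, group_name):
    """List the uniquenames of the features linked to one group cvterm."""
    rows = conn.execute(
        "SELECT f.uniquename FROM feature f "
        "JOIN feature_cvterm fc ON fc.feature_id = f.feature_id "
        "JOIN cvterm t ON t.cvterm_id = fc.cvterm_id "
        "JOIN cv ON cv.cv_id = t.cv_id "
        "WHERE cv.name = ? AND t.name = ? ORDER BY f.uniquename",
        (GROUP_CV, group_name))
    return [row[0] for row in rows]
```

Calling save_group(conn, "ben", "2008-07-02", [...]) after loading some feature rows stores the list, and group_members reads it back. The second variant described above (cvterms restricted to authorized user names, one pub record per group) would move the group identity from cvterm.name to pub.uniquename; it is not shown here.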

>> * gene lists from analysis as is often published in papers - a list of
>> differentially expressed genes for a microarray that has been manually
>> inspected for instance.

I'm not completely sure I understand what you mean here, but wouldn't it be
the case that if you implemented the (differential) expression data
effectively, you could generate the list you want?

Best,

-Dave


Re: Feature Lists

Jonathan Crabtree-4

Hi Dave,

I think it's worth noting that the method described by Jay (i.e., linking the desired set of chado features  to a new feature that represents the group, via feature_relationships of type "member_of") has been used both at JCVI(1) (formerly TIGR) and, from what I remember, at a couple of other sites too, to represent protein/polypeptide clusters (e.g., for putative groups of orthologous polypeptides).  The reason for grouping features may be different in the current instance, but I don't immediately see a reason why the same implementation couldn't be used for both protein clusters and also more generally-defined groups of features.  If there's agreement on that point then I, for one, would argue for using the implementation that has already been tested in a couple of places for the protein cluster case.  I also don't think the cvterm-based approach you described would work as well for the protein cluster case, since there is no analysiscvterm table, which you'd need in order to represent groups generated by an algorithm.

Jonathan

(1) Actually the JCVI/TIGR implementation used featureloc instead of feature_relationship, but in retrospect this turned out to be unnecessary, as no coordinate information was stored; there is also a natural fit, IMO, in using "member_of" feature_relationships to represent group membership.  The minus, of course, is that you are expanding the interpretation of "feature" to include groups of features.
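
For comparison, a minimal sketch of the feature_relationship method described above (one grouping feature per set, with "member_of" relationships pointing at it), again against an in-memory SQLite database. The column names subject_id, object_id and type_id follow Chado's feature_relationship table; everything else is an assumption for illustration: the schema is cut down, the term names "feature_set" and "member_of" are taken from Jay's suggestion rather than from any particular ontology, and the helper names are hypothetical.

```python
import sqlite3

# Cut-down stand-ins for Chado's cvterm, feature and feature_relationship
# tables. ASSUMPTION: real Chado requires more columns (cvterm.cv_id,
# cvterm.dbxref_id, feature.organism_id, ...) and keys features on
# (organism_id, uniquename, type_id) rather than on uniquename alone.
SCHEMA = """
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL UNIQUE,
    type_id INTEGER NOT NULL REFERENCES cvterm (cvterm_id));
CREATE TABLE feature_relationship (
    feature_relationship_id INTEGER PRIMARY KEY,
    subject_id INTEGER NOT NULL REFERENCES feature (feature_id),
    object_id INTEGER NOT NULL REFERENCES feature (feature_id),
    type_id INTEGER NOT NULL REFERENCES cvterm (cvterm_id),
    UNIQUE (subject_id, object_id, type_id));
"""


def open_demo_db():
    """Return an in-memory database holding the simplified tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def term_id(conn, name):
    """Return the cvterm_id for a term name, creating the term if needed."""
    conn.execute("INSERT OR IGNORE INTO cvterm (name) VALUES (?)", (name,))
    return conn.execute(
        "SELECT cvterm_id FROM cvterm WHERE name = ?", (name,)).fetchone()[0]


def add_feature(conn, uniquename, type_name):
    """Insert one feature of the given type and return its feature_id."""
    return conn.execute(
        "INSERT INTO feature (uniquename, type_id) VALUES (?, ?)",
        (uniquename, term_id(conn, type_name))).lastrowid


def create_feature_set(conn, set_name, member_names):
    """Insert one grouping feature plus one member_of row per member."""
    set_id = add_feature(conn, set_name, "feature_set")
    member_of = term_id(conn, "member_of")
    for name in member_names:
        # subject = the member, object = the set ("subject member_of object").
        conn.execute(
            "INSERT INTO feature_relationship (subject_id, object_id, type_id) "
            "SELECT feature_id, ?, ? FROM feature WHERE uniquename = ?",
            (set_id, member_of, name))
    return set_id


def set_members(conn, set_name):
    """List the uniquenames of the features that are member_of the set."""
    rows = conn.execute(
        "SELECT m.uniquename FROM feature_relationship fr "
        "JOIN feature m ON m.feature_id = fr.subject_id "
        "JOIN feature s ON s.feature_id = fr.object_id "
        "JOIN cvterm t ON t.cvterm_id = fr.type_id "
        "WHERE s.uniquename = ? AND t.name = 'member_of' "
        "ORDER BY m.uniquename",
        (set_name,))
    return [row[0] for row in rows]
```

The design trade-off is the one noted in the footnote: the set itself becomes a row in the feature table, so queries over "all features" need to filter on type_id if they should not see grouping features.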


On Wed, Jul 2, 2008 at 9:11 AM, David Emmert <[hidden email]> wrote:

Re: Feature Lists

David Emmert
In reply to this post by Ben J Woodcroft
Hi Jonathan,

Yes, I'm aware that there are groups who are using grouping features and
feature_relationships (or featurelocs!) to implement protein clusters.  I
just recently had a look at a chado instance of P-POD, which does
something similar.

As we originally designed chado, we defined a "feature" as a biological
sequence or something which could be localized on a biological sequence.  
The ortholog grouping-features you mention are a departure from this, but
they can slide because at least they're a sort of abstraction of valid
features.

If I'm understanding Ben's description of arbitrary groupings of features, it
doesn't sound like the grouping-feature would necessarily have any biological
relevance whatsoever.   Wouldn't this utterly break the definition of
"feature" as we've defined it in chado?

If we wanted to, we could implement pretty much any data as features linked
by feature_relationships.  I've often threatened to implement my address book
in chado, just to show that we weren't kidding when we said it's generic.   But
in seriousness I think we need to exercise some discipline in what we allow to
go into the table, or the concept of "feature" is going to get horribly
muddled.  This is why I suggested the cvterm / feature_cvterm approach for
the arbitrary grouping of features.

Does that make sense to you?

-Dave


Re: Feature Lists

Jonathan Crabtree-4

Hi Dave,

I agree with you completely that some discipline is needed to prevent the schema from becoming a complete free-for-all.  Where I think we disagree is on the question of what types of usage would violate (or have violated) the original definition of "feature."  Your position, if I understand it correctly, is that:

1. Representing algorithmically-derived protein clusters as features is a "departure" from the original definition of feature but is acceptable "because at least they're a sort of abstraction of valid features."
2. Representing arbitrary groups of features as a feature would "utterly break" the definition of "feature", because such groups of features need not have any biological relevance.

So it sounds as though your principal objection (to using the feature table to represent arbitrary feature groups) is the (potential) lack of biological relevance in the proposed user-defined groups of features.  My (current) view, on the other hand, is that:

1. Representing algorithmically-derived protein clusters as features breaks the original definition of feature.
2. Representing arbitrary groups of features as features breaks the original definition of feature.
3. Given that the cat (1.) is already out of the bag, so to speak, why not also allow 2.?

In other words, I see the exception currently granted for protein clusters as implicitly broadening the definition of feature from "biological sequence or something localizable to a biological sequence" to "biological sequence or something localizable to a biological sequence OR a group of such sequences and/or features".  I don't think it's a necessary condition for the group itself to have biological relevance.  (I have certainly seen automatically-generated protein clusters in the past whose biological relevance was questionable at best!)

Or, to put it another way, if a scientist comes along and says "Hey, I think this group of genes is worth looking at right now because of X,Y, and Z" then shouldn't that in and of itself constitute sufficient biological relevance?  (I believe this is the use-case under consideration, with the caveat that the scientist won't necessarily tell you the "X,Y, and Z" part.)

If that's not a sufficiently compelling argument, then consider the following questions:

1. How do you propose to define "a sort of abstraction of valid feature" in a sufficiently rigorous manner to allow future chado users to tell which departures from the definition of "feature" are acceptable and which utterly break the definition?
2. If biological relevance is the crucial criterion that determines whether a group of features is allowed to be represented as a feature via feature_relationship, then doesn't this imply that _some_ user-defined feature groups should be represented as features and some should be represented via the cvterm mechanism you described?  What if we required the end-user to check a box in the user interface that said "I certify that this group of genes has biological relevance."?  Would you then allow it to be stored as a feature, but make it a cvterm if the user failed to check the box?

I'm taking things to the logical extreme here, of course, but I'm trying to make the point that--when applied to groups of biological sequences or features--the term "biological relevance" is something that different people are liable to disagree over.  To me at least, the crucial question here is simply "How should we represent groups of features in chado?" and given that a protein cluster is a special case of a group of features, I would prefer that there be a single answer to this question.  What do you think?  Am I misconstruing what you mean by biological relevance somehow?

Jonathan


On Wed, Jul 2, 2008 at 2:19 PM, David Emmert <[hidden email]> wrote:
Hi Jonathan,

Yes, I'm aware that there are groups who are using grouping features and
feature_relationships (or featurelocs!) to implement protein clusters.  I
just recently had a look at a chado instance of P-POD, which is doing
similarly.

As we originally designed chado, we defined a "feature" as a biological
sequence or something which could be localized on a biological sequence.
The ortholog grouping-features you mention are a departure from this, but
they can slide because at least they're a sort of abstraction of valid
features.

If I'm understanding Ben's description of arbitrary groupings of features, it
doesn't sound like the grouping-feature would necessarily have any biological
relevance whatsoever.   Wouldn't this utterly break the definition of
"feature" as we've defined it in chado?

If we wanted to, we could implement pretty much any data as features linked
by feature_relationships.  I've often threatened to implement my address book
in chado, just to show that we weren't kidding when we said its generic.   But
in seriousness I think we need to exercise some discipline in what we allow to
go into the table, or the concept of "feature" is going to get horribly
muddled.  This is why I suggested the cvterm / feature_cvterm approach for
the arbitrary grouping of features.

Does that make sense to you?

-Dave


>From [hidden email] Wed Jul  2 11:55:24 2008
>> To: "David Emmert" <[hidden email]>
>> Subject: Re: [Gmod-schema] Feature Lists
>> Cc: [hidden email], [hidden email],
>>         [hidden email]
>>
>> Hi Dave,
>>
>> I think it's worth noting that the method described by Jay (i.e., linking
>> the desired set of chado features  to a new feature that represents the
>> group, via feature_relationships of type "member_of") has been used both at
>> JCVI(1) (formerly TIGR) and, from what I remember, at a couple of other
>> sites too, to represent protein/polypeptide clusters (e.g., for putative
>> groups of orthologous polypeptides).  The reason for grouping features may
>> be different in the current instance, but I don't immediately see a reason
>> why the same implementation couldn't be used for both protein clusters and
>> also more generally-defined groups of features.  If there's agreement on
>> that point then I, for one, would argue for using the implementation that
>> has already been tested in a couple of places for the protein cluster case.
>> I also don't think the cvterm-based approach you described would work as
>> well for the protein cluster case, since there is no analysiscvterm table,
>> which you'd need in order to represent groups generated by an algorithm.
>>
>> Jonathan
>>
>> (1) Actually the JCVI/TIGR implementation used featureloc instead of
>> feature_relationship, but in retrospect this turned out to be unnecessary,
>> as no coordinate information was stored, and plus there's a natural fit IMO
>> in using "member_of" feature_relationships to represent group membership.
>> The minus, of course, is that you are expanding the interpretation of
>> "feature" to include groups of features.
>>
>>
>> On Wed, Jul 2, 2008 at 9:11 AM, David Emmert <[hidden email]>
>> wrote:
>>
>> >
>> > Hi Ben,
>> >
>> > >> * random genes grouped by a user - something like the useful
>> > >> functionality at plasmodb.org - user runs a query, then the results
>> > >> get saved to lists.
>> >
>> > As far as I know, we've never modeled this!  Features are typically grouped
>> > in chado by some biological or experimental attribute.
>> >
>> > Off the top of my head, one solution using existing chado mechanisms might
>> > be to use the cv module.  You could implement a record in cv with cv.name
>> > something like "arbitrary_user_feature_group", then for implementing
>> > particular groups, create a cvterm which identifies that group, perhaps
>> > with cvterm.name composed of user-name and datestamp, and then
>> > feature_cvterm links to the features in the group.  If you wanted to have
>> > control over who is authorized to make groups, you might restrict the
>> > cvterms simply to names of authorized users, and for implementing
>> > particular groups, create a pub record (of type "personal communication"
>> > or whatever), and use links via the feature_cvterm.pub_id to implement
>> > the group.  This is a novel use of the whole feature_cvterm mechanism; I
>> > wonder what other chado developers think of it?
>> >
>> > >> * gene lists from analysis as is often published in papers - a list of
>> > >> differentially expressed genes for a microarray that has been manually
>> > >> inspected for instance.
>> >
>> > I'm not completely sure I understand what you mean here, but wouldn't it be
>> > the case that if you implemented the (differential) expression data
>> > effectively, you could generate the list you want?
>> >
>> > Best,
>> >
>> > -Dave
>> >
>> > >From [hidden email] Mon Jun 30 19:41:35 2008
>> > >> To: "Jay Sundaram" <[hidden email]>, [hidden email]
>> > >> Subject: Re: [Gmod-schema] Feature Lists
>> > >>
>> > >>
>> > >> Hi,
>> > >>
>> > >> Thanks for the reply - no one had responded - I was just sending using
>> > >> the wrong email address, so it wasn't being accepted by the list.
>> > >>
>> > >> I have 2 examples I can think of:
>> > >> * random genes grouped by a user - something like the useful
>> > >> functionality at plasmodb.org - user runs a query, then the results
>> > >> get saved to lists.
>> > >> * gene lists from analysis as is often published in papers - a list of
>> > >> differentially expressed genes for a microarray that has been manually
>> > >> inspected for instance.
>> > >>
>> > >>
>> > >> Thanks,
>> > >> ben
>> > >>
>> > >>
>> > >> 2008/7/1 Jay Sundaram <[hidden email]>:
>> > >> > Ben,
>> > >> >
>> > >> > How are the features related?
>> > >> >
>> > >> > Perhaps you could insert a term like 'feature_set' into cvterm.
>> > >> > Then insert one feature record per set with feature.type_id =
>> > >> > [feature_set].
>> > >> > Finally, insert records into feature_relationship with
>> > >> > feature_relationship.type_id == [member_of] (one record per feature
>> > >> > in the set).
>> > >> > All of your related features become members of that feature_set.
>> > >> >
>> > >> > Alternatively: you could group the features via featureprop records.
>> > >> >
>> > >> > Jay
>> > >> >
>> > >> > Ben Woodcroft wrote:
>> > >> >
>> > >> >> Hi,
>> > >> >>
>> > >> >> Is there any table or accepted best practise in Chado for storing
>> > >> >> sets of features?
>> > >> >>
>> > >> >> Thanks,
>> > >> >> ben
>> > >> >>
>> > >> >>
>> > >> >>
>> > >> >
>> > >> >
>> > >> >
>> > >>
>> > >>
>> > >>
>> > >> --
>> > >> FYI: My email addresses at unimelb, uq and gmail all redirect to the
>> > >> same place.
>> > >>
>> > >>
>> > >>
>> > >>
>> >
>> >
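
The cv-based grouping described in the quoted message above (one cv named "arbitrary_user_feature_group", one cvterm per saved group, feature_cvterm rows linking the members) can be sketched roughly as below. This is only a minimal sketch under stated assumptions: it uses an in-memory SQLite database with cut-down stand-ins for the cv, cvterm, feature and feature_cvterm tables rather than a real Chado instance, it leaves out the feature_cvterm.pub_id link mentioned in that message, and the helper names (save_group, group_members), the user name and the gene names are hypothetical.

```python
import sqlite3

# Cut-down stand-ins for four Chado tables. A real Chado schema (PostgreSQL)
# has more columns and constraints; this keeps only what the sketch needs.
SCHEMA = """
CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    cv_id INTEGER NOT NULL REFERENCES cv(cv_id),
    name TEXT NOT NULL,
    UNIQUE (cv_id, name)
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT UNIQUE NOT NULL
);
CREATE TABLE feature_cvterm (
    feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    cvterm_id INTEGER NOT NULL REFERENCES cvterm(cvterm_id),
    UNIQUE (feature_id, cvterm_id)
);
"""

GROUP_CV = "arbitrary_user_feature_group"  # cv.name suggested in the thread


def save_group(conn, user, stamp, feature_names):
    """Create one cvterm for the group and link each named feature to it."""
    conn.execute("INSERT OR IGNORE INTO cv (name) VALUES (?)", (GROUP_CV,))
    (cv_id,) = conn.execute(
        "SELECT cv_id FROM cv WHERE name = ?", (GROUP_CV,)
    ).fetchone()
    group_name = f"{user}_{stamp}"  # user name plus datestamp
    cur = conn.execute(
        "INSERT INTO cvterm (cv_id, name) VALUES (?, ?)", (cv_id, group_name)
    )
    cvterm_id = cur.lastrowid
    for name in feature_names:
        (feature_id,) = conn.execute(
            "SELECT feature_id FROM feature WHERE uniquename = ?", (name,)
        ).fetchone()
        conn.execute(
            "INSERT INTO feature_cvterm (feature_id, cvterm_id) VALUES (?, ?)",
            (feature_id, cvterm_id),
        )
    return group_name


def group_members(conn, group_name):
    """Return the uniquenames of all features linked to the group's cvterm."""
    rows = conn.execute(
        """SELECT f.uniquename
           FROM feature f
           JOIN feature_cvterm fc ON fc.feature_id = f.feature_id
           JOIN cvterm t ON t.cvterm_id = fc.cvterm_id
           JOIN cv ON cv.cv_id = t.cv_id
           WHERE cv.name = ? AND t.name = ?
           ORDER BY f.uniquename""",
        (GROUP_CV, group_name),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO feature (uniquename) VALUES (?)",
    [("geneA",), ("geneB",), ("geneC",)],  # hypothetical genes
)
saved = save_group(conn, "ben", "2008-07-02", ["geneA", "geneC"])
members = group_members(conn, saved)
```

With this layout the group itself never becomes a feature row, which is the property the cv-based proposal is after; the trade-off raised in the thread is that there is nowhere obvious to attach an analysis to the group.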



Re: Feature Lists

Don Gilbert-2-3
In reply to this post by Ben J Woodcroft
Dave and Jonathan,

This is certainly an interesting discussion.  For genome databases, one
operational definition of a feature's biological relevance would be
that the feature type exists in the Sequence Ontology. Yes?
Try this one for your bag of interesting genes:

name: gene_group
def: "A collection of related genes." [SO:ma]
subset: SOFA
is_a: SO:0000001 ! region

- Don
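
To make the feature-based alternative concrete: a group could be stored as one feature row typed with the gene_group term above, with each member attached through a "member_of" feature_relationship, as discussed earlier in the thread. The following is a rough sketch only, assuming an in-memory SQLite database with simplified stand-ins for the cvterm, feature and feature_relationship tables (a real Chado schema has more columns); the helper names and gene names are hypothetical.

```python
import sqlite3

# Simplified stand-ins for three Chado tables; not the full Chado schema.
SCHEMA = """
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT UNIQUE NOT NULL,
    type_id INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
);
CREATE TABLE feature_relationship (
    subject_id INTEGER NOT NULL REFERENCES feature(feature_id),
    object_id INTEGER NOT NULL REFERENCES feature(feature_id),
    type_id INTEGER NOT NULL REFERENCES cvterm(cvterm_id),
    UNIQUE (subject_id, object_id, type_id)
);
"""


def term_id(conn, name):
    """Return the cvterm_id for a term name, creating the term if needed."""
    conn.execute("INSERT OR IGNORE INTO cvterm (name) VALUES (?)", (name,))
    return conn.execute(
        "SELECT cvterm_id FROM cvterm WHERE name = ?", (name,)
    ).fetchone()[0]


def add_feature(conn, uniquename, type_name):
    """Insert one feature of the given type and return its feature_id."""
    cur = conn.execute(
        "INSERT INTO feature (uniquename, type_id) VALUES (?, ?)",
        (uniquename, term_id(conn, type_name)),
    )
    return cur.lastrowid


def add_group(conn, group_name, member_ids):
    """Create a gene_group feature and one member_of row per member."""
    group_id = add_feature(conn, group_name, "gene_group")
    member_of = term_id(conn, "member_of")
    conn.executemany(
        "INSERT INTO feature_relationship (subject_id, object_id, type_id) "
        "VALUES (?, ?, ?)",
        # subject = member, object = group
        [(member_id, group_id, member_of) for member_id in member_ids],
    )
    return group_id


def group_member_names(conn, group_id):
    """Return the uniquenames of the features that are member_of the group."""
    rows = conn.execute(
        """SELECT f.uniquename
           FROM feature_relationship fr
           JOIN feature f ON f.feature_id = fr.subject_id
           JOIN cvterm t ON t.cvterm_id = fr.type_id
           WHERE fr.object_id = ? AND t.name = 'member_of'
           ORDER BY f.uniquename""",
        (group_id,),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
gene_ids = [add_feature(conn, name, "gene") for name in ("geneA", "geneB", "geneC")]
group_id = add_group(conn, "interesting_genes", gene_ids[:2])
member_names = group_member_names(conn, group_id)
```

The cost is the one already noted in the thread: the group occupies a row in feature, so "feature" now also means "group of features".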


Re: Feature Lists

David Emmert
In reply to this post by Ben J Woodcroft
Jonathan, Don,

I can't dispute Jonathan's assertion that the cat's out of the bag, alas, and
Don's point is a good one.  Without conceding that every grouping of features,
no matter the nature of the grouping, however un-biological, is legitimately
implemented using a feature, I guess we should leave it to Ben to decide how
he wants to implement his data.  He certainly knows some of the issues now!

Many thanks for this thoughtful discussion.  I find this sort of exchange
very useful and hope you did too.

I'd love to hear what you end up doing, Ben.

Best,

-Dave


>From [hidden email] Wed Jul  2 15:57:27 2008
>> To: [hidden email], [hidden email]
>> Cc: [hidden email]
>> Subject: Re: [Gmod-schema] Feature Lists
>>
>> Dave and Jonathan,
>>
>> This is certainly an interesting discussion.  One operational
>> definition for genome databases of biological relevance of a feature
>> would be that the feature type exist in Sequence Ontology. Yes?
>> Try this one for your bag of interesting genes:
>>
>> name: gene_group
>> def: "A collection of related genes." [SO:ma]
>> subset: SOFA
>> is_a: SO:0000001 ! region
>>
>> - Don
