Question on Chado Group Module

classic Classic list List threaded Threaded
19 messages Options
Reply | Threaded
Open this post in threaded view
|

Question on Chado Group Module

David Emmert
Hi gmod/schema,

FlyBase is preparing to implement gene group curations in our chado instance.   We've been having a look at the chado group module, and are wondering if anyone has actually implemented data in it, or if it is still under development.   Like all of the other chado modules, there's a lot of room for variable implementations in the group module, and if anyone has implemented data similar to ours, we'd be interested in seeing if we can follow similar conventions.

In case it helps, here's a description of our aims for gene group curation (from our curators):

Aim/motivation:
To group genes that are acknowledged in the published literature (typically a review) to share biological and/or functional attributes (e.g. genes encoding actins, GPCR receptors, or ribosomal proteins), and present the membership together with relevant associated data and links on the FlyBase website.

Genomic-scale, meta-analyses that define 'gene groups' are increasingly described in the literature, but these haven't been captured  as such in FlyBase to date.  Doing so now therefore reflects community activity, and will provide an intuitive way for FlyBase users (particularly non-fly researchers) to find all members of a particular group and to easily access their associated data.


Data associated with each group:
- Name, symbol, ID, synonyms
- type of group (CV, including 'functional group', 'protein complex group', 'gene array'...)
- list of members (= genes) of group
- description of group (free text)
- key GO terms associated with group
- parent (supergroup) relationships are defined; child (subgroup) relationships are derived
- internal links to related datasets in FlyBase
- external links to orthologous groups (currently: HGNC, WormBase, TAIR)
- external links to other related websites/databases
- free text comments
- date of last internal review


Similar existing resources:
- HGNC 'Gene Families/Groupings':  http://www.genenames.org/genefamilies/a-z


Good to know:
- The number of members in a 'gene group' can range from 1-200, though the mean is ~10.
- We are currently only curating clearly defined gene groups based on function, and will be for the foreseeable future.  However, it's possible that gene groups reflecting signalling/metabolic pathways (e.g. WNT signalling), or biological processes (e.g. genes involved in gastrulation) may be curated in future.  This will be harder as such gene groups are less well defined/agreed upon, and additional database fields will likely be required to accurately capture these concepts.
- We are currently only curating D. melanogaster groups, and will be for the foreseeable future.  However, it's conceivable that other non-melanogaster groups may be curated in this way by in future FlyBase projects.  Should this happen, it may be desirable to be able to group genes across multiple species (e.g. actin genes from all Drosophila species).

Best,

-Dave

------------------------------------------------------------------------------
Meet PCI DSS 3.0 Compliance Requirements with EventLog Analyzer
Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI DSS Reports
Are you Audit-Ready for PCI DSS 3.0 Compliance? Download White paper
Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog Analyzer
http://pubads.g.doubleclick.net/gampad/clk?id=154622311&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Question on Chado Group Module

Mara Kim-3
I implemented the consensus of our previous discussions here:

There is a PNG/DIA describing the module structure, as well as a working SQL implementation.  Note that this implementation depends on module specific linker tables (see the bottom of the SQL file).  The linker tables for 'feature' and 'organism' are already provided.

On Tue, Sep 23, 2014 at 2:19 PM, David Emmert <[hidden email]> wrote:
Hi gmod/schema,

FlyBase is preparing to implement gene group curations in our chado instance.   We've been having a look at the chado group module, and are wondering if anyone has actually implemented data in it, or if it is still under development.   Like all of the other chado modules, there's a lot of room for variable implementations in the group module, and if anyone has implemented data similar to ours, we'd be interested in seeing if we can follow similar conventions.

In case it helps, here's a description of our aims for gene group curation (from our curators):

Aim/motivation:
To group genes that are acknowledged in the published literature (typically a review) to share biological and/or functional attributes (e.g. genes encoding actins, GPCR receptors, or ribosomal proteins), and present the membership together with relevant associated data and links on the FlyBase website.

Genomic-scale, meta-analyses that define 'gene groups' are increasingly described in the literature, but these haven't been captured  as such in FlyBase to date.  Doing so now therefore reflects community activity, and will provide an intuitive way for FlyBase users (particularly non-fly researchers) to find all members of a particular group and to easily access their associated data.


Data associated with each group:
- Name, symbol, ID, synonyms
- type of group (CV, including 'functional group', 'protein complex group', 'gene array'...)
- list of members (= genes) of group
- description of group (free text)
- key GO terms associated with group
- parent (supergroup) relationships are defined; child (subgroup) relationships are derived
- internal links to related datasets in FlyBase
- external links to orthologous groups (currently: HGNC, WormBase, TAIR)
- external links to other related websites/databases
- free text comments
- date of last internal review


Similar existing resources:
- HGNC 'Gene Families/Groupings':  http://www.genenames.org/genefamilies/a-z


Good to know:
- The number of members in a 'gene group' can range from 1-200, though the mean is ~10.
- We are currently only curating clearly defined gene groups based on function, and will be for the foreseeable future.  However, it's possible that gene groups reflecting signalling/metabolic pathways (e.g. WNT signalling), or biological processes (e.g. genes involved in gastrulation) may be curated in future.  This will be harder as such gene groups are less well defined/agreed upon, and additional database fields will likely be required to accurately capture these concepts.
- We are currently only curating D. melanogaster groups, and will be for the foreseeable future.  However, it's conceivable that other non-melanogaster groups may be curated in this way by in future FlyBase projects.  Should this happen, it may be desirable to be able to group genes across multiple species (e.g. actin genes from all Drosophila species).

Best,

-Dave

------------------------------------------------------------------------------
Meet PCI DSS 3.0 Compliance Requirements with EventLog Analyzer
Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI DSS Reports
Are you Audit-Ready for PCI DSS 3.0 Compliance? Download White paper
Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog Analyzer
http://pubads.g.doubleclick.net/gampad/clk?id=154622311&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema




--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN

------------------------------------------------------------------------------
Meet PCI DSS 3.0 Compliance Requirements with EventLog Analyzer
Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI DSS Reports
Are you Audit-Ready for PCI DSS 3.0 Compliance? Download White paper
Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog Analyzer
http://pubads.g.doubleclick.net/gampad/clk?id=154622311&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Question on Chado Group Module

Mara Kim-3
David,

I looks like this implementation should satisfy most of your needs, except for the need for a 'grp_synonym' and the addition of 'comment' and 'date' columns to the 'grp' table.  These seem pretty reasonable additions to me.  Let me know what you think.

On Tue, Sep 23, 2014 at 2:48 PM, Mara Kim <[hidden email]> wrote:
I implemented the consensus of our previous discussions here:

There is a PNG/DIA describing the module structure, as well as a working SQL implementation.  Note that this implementation depends on module specific linker tables (see the bottom of the SQL file).  The linker tables for 'feature' and 'organism' are already provided.

On Tue, Sep 23, 2014 at 2:19 PM, David Emmert <[hidden email]> wrote:
Hi gmod/schema,

FlyBase is preparing to implement gene group curations in our chado instance.   We've been having a look at the chado group module, and are wondering if anyone has actually implemented data in it, or if it is still under development.   Like all of the other chado modules, there's a lot of room for variable implementations in the group module, and if anyone has implemented data similar to ours, we'd be interested in seeing if we can follow similar conventions.

In case it helps, here's a description of our aims for gene group curation (from our curators):

Aim/motivation:
To group genes that are acknowledged in the published literature (typically a review) to share biological and/or functional attributes (e.g. genes encoding actins, GPCR receptors, or ribosomal proteins), and present the membership together with relevant associated data and links on the FlyBase website.

Genomic-scale, meta-analyses that define 'gene groups' are increasingly described in the literature, but these haven't been captured  as such in FlyBase to date.  Doing so now therefore reflects community activity, and will provide an intuitive way for FlyBase users (particularly non-fly researchers) to find all members of a particular group and to easily access their associated data.


Data associated with each group:
- Name, symbol, ID, synonyms
- type of group (CV, including 'functional group', 'protein complex group', 'gene array'...)
- list of members (= genes) of group
- description of group (free text)
- key GO terms associated with group
- parent (supergroup) relationships are defined; child (subgroup) relationships are derived
- internal links to related datasets in FlyBase
- external links to orthologous groups (currently: HGNC, WormBase, TAIR)
- external links to other related websites/databases
- free text comments
- date of last internal review


Similar existing resources:
- HGNC 'Gene Families/Groupings':  http://www.genenames.org/genefamilies/a-z


Good to know:
- The number of members in a 'gene group' can range from 1-200, though the mean is ~10.
- We are currently only curating clearly defined gene groups based on function, and will be for the foreseeable future.  However, it's possible that gene groups reflecting signalling/metabolic pathways (e.g. WNT signalling), or biological processes (e.g. genes involved in gastrulation) may be curated in future.  This will be harder as such gene groups are less well defined/agreed upon, and additional database fields will likely be required to accurately capture these concepts.
- We are currently only curating D. melanogaster groups, and will be for the foreseeable future.  However, it's conceivable that other non-melanogaster groups may be curated in this way by in future FlyBase projects.  Should this happen, it may be desirable to be able to group genes across multiple species (e.g. actin genes from all Drosophila species).

Best,

-Dave

------------------------------------------------------------------------------
Meet PCI DSS 3.0 Compliance Requirements with EventLog Analyzer
Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI DSS Reports
Are you Audit-Ready for PCI DSS 3.0 Compliance? Download White paper
Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog Analyzer
http://pubads.g.doubleclick.net/gampad/clk?id=154622311&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema




--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN



--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN

------------------------------------------------------------------------------
Meet PCI DSS 3.0 Compliance Requirements with EventLog Analyzer
Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI DSS Reports
Are you Audit-Ready for PCI DSS 3.0 Compliance? Download White paper
Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog Analyzer
http://pubads.g.doubleclick.net/gampad/clk?id=154622311&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Question on Chado Group Module

Kathleen Falls-2
Mara,

Thanks for your comments. Actually it would be helpful if the grp
table could have the column is_obsolete (boolean, not null default
false) similar to many other Chado tables. The 'comment' and 'date'
can be stored in grpprop as type/value/rank so no need to add them to
the grp table.
FlyBase makes use of synonyms for our curated data so we would add a
grp_synonym table similar to the feature_synonym and library_synonym
tables.

While we at FlyBase have some idea of how the GMOD grp/grpmember
module would fit our curation needs we were actually hoping for
use-cases from others to see if we were on the right track.
Is this something you or others would be willing to share?


Cheers,
Kathleen

On Tue, Sep 23, 2014 at 3:54 PM, Mara Kim <[hidden email]> wrote:

> David,
>
> I looks like this implementation should satisfy most of your needs, except
> for the need for a 'grp_synonym' and the addition of 'comment' and 'date'
> columns to the 'grp' table.  These seem pretty reasonable additions to me.
> Let me know what you think.
>
> On Tue, Sep 23, 2014 at 2:48 PM, Mara Kim <[hidden email]> wrote:
>>
>> I implemented the consensus of our previous discussions here:
>> https://github.com/autochthe/Chado-group-module
>>
>> There is a PNG/DIA describing the module structure, as well as a working
>> SQL implementation.  Note that this implementation depends on module
>> specific linker tables (see the bottom of the SQL file).  The linker tables
>> for 'feature' and 'organism' are already provided.
>>
>> On Tue, Sep 23, 2014 at 2:19 PM, David Emmert <[hidden email]>
>> wrote:
>>>
>>> Hi gmod/schema,
>>>
>>> FlyBase is preparing to implement gene group curations in our chado
>>> instance.   We've been having a look at the chado group module, and are
>>> wondering if anyone has actually implemented data in it, or if it is still
>>> under development.   Like all of the other chado modules, there's a lot of
>>> room for variable implementations in the group module, and if anyone has
>>> implemented data similar to ours, we'd be interested in seeing if we can
>>> follow similar conventions.
>>>
>>> In case it helps, here's a description of our aims for gene group
>>> curation (from our curators):
>>>
>>> Aim/motivation:
>>> To group genes that are acknowledged in the published literature
>>> (typically a review) to share biological and/or functional attributes (e.g.
>>> genes encoding actins, GPCR receptors, or ribosomal proteins), and present
>>> the membership together with relevant associated data and links on the
>>> FlyBase website.
>>>
>>> Genomic-scale, meta-analyses that define 'gene groups' are increasingly
>>> described in the literature, but these haven't been captured  as such in
>>> FlyBase to date.  Doing so now therefore reflects community activity, and
>>> will provide an intuitive way for FlyBase users (particularly non-fly
>>> researchers) to find all members of a particular group and to easily access
>>> their associated data.
>>>
>>>
>>> Data associated with each group:
>>> - Name, symbol, ID, synonyms
>>> - type of group (CV, including 'functional group', 'protein complex
>>> group', 'gene array'...)
>>> - list of members (= genes) of group
>>> - description of group (free text)
>>> - key GO terms associated with group
>>> - parent (supergroup) relationships are defined; child (subgroup)
>>> relationships are derived
>>> - internal links to related datasets in FlyBase
>>> - external links to orthologous groups (currently: HGNC, WormBase, TAIR)
>>> - external links to other related websites/databases
>>> - free text comments
>>> - date of last internal review
>>>
>>>
>>> Similar existing resources:
>>> - HGNC 'Gene Families/Groupings':
>>> http://www.genenames.org/genefamilies/a-z
>>> - WormBase 'Gene Classes': http://www.wormbase.org/resources/gene_class/
>>> - TAIR 'Gene Families': http://www.arabidopsis.org/browse/genefamily/
>>>
>>>
>>> Good to know:
>>> - The number of members in a 'gene group' can range from 1-200, though
>>> the mean is ~10.
>>> - We are currently only curating clearly defined gene groups based on
>>> function, and will be for the foreseeable future.  However, it's possible
>>> that gene groups reflecting signalling/metabolic pathways (e.g. WNT
>>> signalling), or biological processes (e.g. genes involved in gastrulation)
>>> may be curated in future.  This will be harder as such gene groups are less
>>> well defined/agreed upon, and additional database fields will likely be
>>> required to accurately capture these concepts.
>>> - We are currently only curating D. melanogaster groups, and will be for
>>> the foreseeable future.  However, it's conceivable that other
>>> non-melanogaster groups may be curated in this way by in future FlyBase
>>> projects.  Should this happen, it may be desirable to be able to group genes
>>> across multiple species (e.g. actin genes from all Drosophila species).
>>>
>>> Best,
>>>
>>> -Dave
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Meet PCI DSS 3.0 Compliance Requirements with EventLog Analyzer
>>> Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI DSS Reports
>>> Are you Audit-Ready for PCI DSS 3.0 Compliance? Download White paper
>>> Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog Analyzer
>>>
>>> http://pubads.g.doubleclick.net/gampad/clk?id=154622311&iu=/4140/ostg.clktrk
>>> _______________________________________________
>>> Gmod-schema mailing list
>>> [hidden email]
>>> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>>>
>>
>>
>>
>> --
>> Mara Kim
>>
>> Ph.D. Candidate
>> Computational Biology
>> Vanderbilt University
>> Nashville, TN
>
>
>
>
> --
> Mara Kim
>
> Ph.D. Candidate
> Computational Biology
> Vanderbilt University
> Nashville, TN
>
> ------------------------------------------------------------------------------
> Meet PCI DSS 3.0 Compliance Requirements with EventLog Analyzer
> Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI DSS Reports
> Are you Audit-Ready for PCI DSS 3.0 Compliance? Download White paper
> Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog Analyzer
> http://pubads.g.doubleclick.net/gampad/clk?id=154622311&iu=/4140/ostg.clktrk
> _______________________________________________
> Gmod-schema mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>

------------------------------------------------------------------------------
Meet PCI DSS 3.0 Compliance Requirements with EventLog Analyzer
Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI DSS Reports
Are you Audit-Ready for PCI DSS 3.0 Compliance? Download White paper
Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog Analyzer
http://pubads.g.doubleclick.net/gampad/clk?id=154622311&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Question on Chado Group Module

Stephen Ficklin-3
Hi All,

Adding to this thread, I have a question about the group module. It has
been a while since we chatted about it so forgive me if I have forgotten
some detail.   In the schema diagram
(http://gmod.org/wiki/Chado_Group_Module) there is the grpmember table
which links to other Chado tables (e.g. feature, stock, organism) via
"submodule" linker tables (e.g. organism_grpmember, feature_grpmember).  
If I remember correctly this allows us to store different data classes
in the same group.

Suppose, I am exploring a populated instance of the group module. I can
get the list of members from the grpmember table and I can classify them
by their type_id, but I don't know which table to join on to get further
information (e.g. feature_grpmember -> feature).   It seems the group
module requires a bit of human knowledge to know which tables to find
information about group members.  I do not remember the mechanism we
decided to know which table to join on?   Did we have a suggested method
or did we expect that knowledge be built into the application to know
which tables to join?  If the latter perhaps we should revisit the
schema as I think it would be best if the schema provided all
information about data relationships.

Thanks,
Stephen

On 9/24/2014 12:42 PM, Kathleen Falls wrote:

> Mara,
>
> Thanks for your comments. Actually it would be helpful if the grp
> table could have the column is_obsolete (boolean, not null default
> false) similar to many other Chado tables. The 'comment' and 'date'
> can be stored in grpprop as type/value/rank so no need to add them to
> the grp table.
> FlyBase makes use of synonyms for our curated data so we would add a
> grp_synonym table similar to the feature_synonym and library_synonym
> tables.
>
> While we at FlyBase have some idea of how the GMOD grp/grpmember
> module would fit our curation needs we were actually hoping for
> use-cases from others to see if we were on the right track.
> Is this something you or others would be willing to share?
>
>
> Cheers,
> Kathleen
>
> On Tue, Sep 23, 2014 at 3:54 PM, Mara Kim <[hidden email]> wrote:
>> David,
>>
>> I looks like this implementation should satisfy most of your needs, except
>> for the need for a 'grp_synonym' and the addition of 'comment' and 'date'
>> columns to the 'grp' table.  These seem pretty reasonable additions to me.
>> Let me know what you think.
>>
>> On Tue, Sep 23, 2014 at 2:48 PM, Mara Kim <[hidden email]> wrote:
>>> I implemented the consensus of our previous discussions here:
>>> https://github.com/autochthe/Chado-group-module
>>>
>>> There is a PNG/DIA describing the module structure, as well as a working
>>> SQL implementation.  Note that this implementation depends on module
>>> specific linker tables (see the bottom of the SQL file).  The linker tables
>>> for 'feature' and 'organism' are already provided.
>>>
>>> On Tue, Sep 23, 2014 at 2:19 PM, David Emmert <[hidden email]>
>>> wrote:
>>>> Hi gmod/schema,
>>>>
>>>> FlyBase is preparing to implement gene group curations in our chado
>>>> instance.   We've been having a look at the chado group module, and are
>>>> wondering if anyone has actually implemented data in it, or if it is still
>>>> under development.   Like all of the other chado modules, there's a lot of
>>>> room for variable implementations in the group module, and if anyone has
>>>> implemented data similar to ours, we'd be interested in seeing if we can
>>>> follow similar conventions.
>>>>
>>>> In case it helps, here's a description of our aims for gene group
>>>> curation (from our curators):
>>>>
>>>> Aim/motivation:
>>>> To group genes that are acknowledged in the published literature
>>>> (typically a review) to share biological and/or functional attributes (e.g.
>>>> genes encoding actins, GPCR receptors, or ribosomal proteins), and present
>>>> the membership together with relevant associated data and links on the
>>>> FlyBase website.
>>>>
>>>> Genomic-scale, meta-analyses that define 'gene groups' are increasingly
>>>> described in the literature, but these haven't been captured  as such in
>>>> FlyBase to date.  Doing so now therefore reflects community activity, and
>>>> will provide an intuitive way for FlyBase users (particularly non-fly
>>>> researchers) to find all members of a particular group and to easily access
>>>> their associated data.
>>>>
>>>>
>>>> Data associated with each group:
>>>> - Name, symbol, ID, synonyms
>>>> - type of group (CV, including 'functional group', 'protein complex
>>>> group', 'gene array'...)
>>>> - list of members (= genes) of group
>>>> - description of group (free text)
>>>> - key GO terms associated with group
>>>> - parent (supergroup) relationships are defined; child (subgroup)
>>>> relationships are derived
>>>> - internal links to related datasets in FlyBase
>>>> - external links to orthologous groups (currently: HGNC, WormBase, TAIR)
>>>> - external links to other related websites/databases
>>>> - free text comments
>>>> - date of last internal review
>>>>
>>>>
>>>> Similar existing resources:
>>>> - HGNC 'Gene Families/Groupings':
>>>> http://www.genenames.org/genefamilies/a-z
>>>> - WormBase 'Gene Classes': http://www.wormbase.org/resources/gene_class/
>>>> - TAIR 'Gene Families': http://www.arabidopsis.org/browse/genefamily/
>>>>
>>>>
>>>> Good to know:
>>>> - The number of members in a 'gene group' can range from 1-200, though
>>>> the mean is ~10.
>>>> - We are currently only curating clearly defined gene groups based on
>>>> function, and will be for the foreseeable future.  However, it's possible
>>>> that gene groups reflecting signalling/metabolic pathways (e.g. WNT
>>>> signalling), or biological processes (e.g. genes involved in gastrulation)
>>>> may be curated in future.  This will be harder as such gene groups are less
>>>> well defined/agreed upon, and additional database fields will likely be
>>>> required to accurately capture these concepts.
>>>> - We are currently only curating D. melanogaster groups, and will be for
>>>> the foreseeable future.  However, it's conceivable that other
>>>> non-melanogaster groups may be curated in this way by in future FlyBase
>>>> projects.  Should this happen, it may be desirable to be able to group genes
>>>> across multiple species (e.g. actin genes from all Drosophila species).
>>>>
>>>> Best,
>>>>
>>>> -Dave
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Meet PCI DSS 3.0 Compliance Requirements with EventLog Analyzer
>>>> Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI DSS Reports
>>>> Are you Audit-Ready for PCI DSS 3.0 Compliance? Download White paper
>>>> Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog Analyzer
>>>>
>>>> http://pubads.g.doubleclick.net/gampad/clk?id=154622311&iu=/4140/ostg.clktrk
>>>> _______________________________________________
>>>> Gmod-schema mailing list
>>>> [hidden email]
>>>> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>>>>
>>>
>>>
>>> --
>>> Mara Kim
>>>
>>> Ph.D. Candidate
>>> Computational Biology
>>> Vanderbilt University
>>> Nashville, TN
>>
>>
>>
>> --
>> Mara Kim
>>
>> Ph.D. Candidate
>> Computational Biology
>> Vanderbilt University
>> Nashville, TN
>>
>> ------------------------------------------------------------------------------
>> Meet PCI DSS 3.0 Compliance Requirements with EventLog Analyzer
>> Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI DSS Reports
>> Are you Audit-Ready for PCI DSS 3.0 Compliance? Download White paper
>> Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog Analyzer
>> http://pubads.g.doubleclick.net/gampad/clk?id=154622311&iu=/4140/ostg.clktrk
>> _______________________________________________
>> Gmod-schema mailing list
>> [hidden email]
>> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>>
> ------------------------------------------------------------------------------
> Meet PCI DSS 3.0 Compliance Requirements with EventLog Analyzer
> Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI DSS Reports
> Are you Audit-Ready for PCI DSS 3.0 Compliance? Download White paper
> Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog Analyzer
> http://pubads.g.doubleclick.net/gampad/clk?id=154622311&iu=/4140/ostg.clktrk
> _______________________________________________
> Gmod-schema mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/gmod-schema


------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Question on Chado Group Module

Karl O. Pinc
Hi Stephen,

My 2 cents.

On 12/04/2014 12:35:33 PM, Stephen Ficklin wrote:

> Suppose, I am exploring a populated instance of the group module. I
> can
> get the list of members from the grpmember table and I can classify
> them
> by their type_id, but I don't know which table to join on to get
> further
> information (e.g. feature_grpmember -> feature).   It seems the group
> module requires a bit of human knowledge to know which tables to find
> information about group members.  I do not remember the mechanism we
> decided to know which table to join on?   Did we have a suggested
> method
> or did we expect that knowledge be built into the application to know
> which tables to join?  If the latter perhaps we should revisit the
> schema as I think it would be best if the schema provided all
> information about data relationships.

I don't know anything about the group module, but coming from
a database perspective the answer is that if you want to find
out if there's data on the other side of a potential join
the way to find out is to do the join.  To do otherwise
is to invite trouble by storing the same data in two places.

In the classic case where a person can have multiple addresses
you'd join the people table to the addresses table to count
whether and how many addresses a person has.  You wouldn't
also have an "address_count" column on the people table.

Of course you could make a view, materialized or otherwise,
that looks like the people table but also has an address_count
column.  But this is different from actually encoding
in the schema what's going on with the join.

Regards,

Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Question on Chado Group Module

Mara Kim-2
This was a point of contention during our discussions.  The linker tables are necessary to preserve referential integrity with foreign key constraints, but also allows an organism and a feature to be linked to the same grpmember which seems wrong.  This means that the goal of referential integrity is at odds with relationship multiplicity, and the consensus during our conferences was that referential integrity was more important. 

We could partially solve this by using the information_schema to do introspection, and record the table name for each grpmember as a column (ex. `linked_table`).  Then in each linker table, make a `linked_table` table with a CHECK constraint to force it to contain the linked table name, and then use a FK constraint on grpmember_id and linked_table.  That *should* work, but its a pretty ugly hack.

On Thu, Dec 4, 2014 at 1:38 PM, Karl O. Pinc <[hidden email]> wrote:
Hi Stephen,

My 2 cents.

On 12/04/2014 12:35:33 PM, Stephen Ficklin wrote:

> Suppose, I am exploring a populated instance of the group module. I
> can
> get the list of members from the grpmember table and I can classify
> them
> by their type_id, but I don't know which table to join on to get
> further
> information (e.g. feature_grpmember -> feature).   It seems the group
> module requires a bit of human knowledge to know which tables to find
> information about group members.  I do not remember the mechanism we
> decided to know which table to join on?   Did we have a suggested
> method
> or did we expect that knowledge be built into the application to know
> which tables to join?  If the latter perhaps we should revisit the
> schema as I think it would be best if the schema provided all
> information about data relationships.

I don't know anything about the group module, but coming from
a database perspective the answer is that if you want to find
out if there's data on the other side of a potential join
the way to find out is to do the join.  To do otherwise
is to invite trouble by storing the same data in two places.

In the classic case where a person can have multiple addresses
you'd join the people table to the addresses table to count
whether and how many addresses a person has.  You wouldn't
also have an "address_count" column on the people table.

Of course you could make a view, materialized or otherwise,
that looks like the people table but also has an address_count
column.  But this is different from actually encoding
in the schema what's going on with the join.

Regards,

Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema



--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Question on Chado Group Module

Mara Kim-2
I'm sorry, there is an error:

Then in each linker table, make a `linked_table` column with a CHECK constraint to force it to contain the linked table name, and then use a FK constraint on grpmember_id and linked_table.


On Thu, Dec 4, 2014 at 2:10 PM, Mara Kim <[hidden email]> wrote:
This was a point of contention during our discussions.  The linker tables are necessary to preserve referential integrity with foreign key constraints, but also allows an organism and a feature to be linked to the same grpmember which seems wrong.  This means that the goal of referential integrity is at odds with relationship multiplicity, and the consensus during our conferences was that referential integrity was more important. 

We could partially solve this by using the information_schema to do introspection, and record the table name for each grpmember as a column (ex. `linked_table`).  Then in each linker table, make a `linked_table` table with a CHECK constraint to force it to contain the linked table name, and then use a FK constraint on grpmember_id and linked_table.  That *should* work, but its a pretty ugly hack.

On Thu, Dec 4, 2014 at 1:38 PM, Karl O. Pinc <[hidden email]> wrote:
Hi Stephen,

My 2 cents.

On 12/04/2014 12:35:33 PM, Stephen Ficklin wrote:

> Suppose, I am exploring a populated instance of the group module. I
> can
> get the list of members from the grpmember table and I can classify
> them
> by their type_id, but I don't know which table to join on to get
> further
> information (e.g. feature_grpmember -> feature).   It seems the group
> module requires a bit of human knowledge to know which tables to find
> information about group members.  I do not remember the mechanism we
> decided to know which table to join on?   Did we have a suggested
> method
> or did we expect that knowledge be built into the application to know
> which tables to join?  If the latter perhaps we should revisit the
> schema as I think it would be best if the schema provided all
> information about data relationships.

I don't know anything about the group module, but coming from
a database perspective the answer is that if you want to find
out if there's data on the other side of a potential join
the way to find out is to do the join.  To do otherwise
is to invite trouble by storing the same data in two places.

In the classic case where a person can have multiple addresses
you'd join the people table to the addresses table to count
whether and how many addresses a person has.  You wouldn't
also have an "address_count" column on the people table.

Of course you could make a view, materialized or otherwise,
that looks like the people table but also has an address_count
column.  But this is different from actually encoding
in the schema what's going on with the join.

Regards,

Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema



--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN



--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Question on Chado Group Module

Stephen Ficklin-3
In reply to this post by Mara Kim-2
Thanks Mara and Karl for the responses.   I think your idea, Mara, would work. I'm at a loss for a more elegant solution.  At least if we had a 'linked_table' column the knowledge of how to join is there.  And even thought it would take multiple queries to do the join (one to figure which tables to join and a second (or more) to do all the necessary joining) at least I could do it programmatically.

To help solve both problems of say an organism and a feature sharing the same grpmember and to knowing what table to link to... what if we added an intermediary table named 'grpmemberlinker'.  This table will have the 'linked_table' field and an FK relationship with the 'grpmember' table.  It will also have a unique constraint on the grpmember_id.  This will only allow that grpmember_id to be linked to a record in only one table. Then in the "submodule" linking tables we use the 'grpmemberlinker_id' rather than the 'grpmember_id' (Example schema below).  It adds an extra join but would perhaps help with both problems?

grpmember
---------------
grpmember_id
grp_id
type_id
rank

grpmemberlinker
------------------------
grpmemberlinker_id
grpmember_id  (unique constraint)
linking_table (contains the name of the table where the group member is found)

xxxx_grpmember  (links to the group member table (e.g. feature_grpmember, organism_grpmember)).
----------------------
xxxx_grpmember_id
xxxx_id
grpmemberlinker_id  (uses the grpmemberlinker_id rather than the grpmember_id).


On 12/4/2014 3:10 PM, Mara Kim wrote:
This was a point of contention during our discussions.  The linker tables are necessary to preserve referential integrity with foreign key constraints, but also allows an organism and a feature to be linked to the same grpmember which seems wrong.  This means that the goal of referential integrity is at odds with relationship multiplicity, and the consensus during our conferences was that referential integrity was more important. 

We could partially solve this by using the information_schema to do introspection, and record the table name for each grpmember as a column (ex. `linked_table`).  Then in each linker table, make a `linked_table` table with a CHECK constraint to force it to contain the linked table name, and then use a FK constraint on grpmember_id and linked_table.  That *should* work, but its a pretty ugly hack.

On Thu, Dec 4, 2014 at 1:38 PM, Karl O. Pinc <[hidden email]> wrote:
Hi Stephen,

My 2 cents.

On 12/04/2014 12:35:33 PM, Stephen Ficklin wrote:

> Suppose, I am exploring a populated instance of the group module. I
> can
> get the list of members from the grpmember table and I can classify
> them
> by their type_id, but I don't know which table to join on to get
> further
> information (e.g. feature_grpmember -> feature).   It seems the group
> module requires a bit of human knowledge to know which tables to find
> information about group members.  I do not remember the mechanism we
> decided to know which table to join on?   Did we have a suggested
> method
> or did we expect that knowledge be built into the application to know
> which tables to join?  If the latter perhaps we should revisit the
> schema as I think it would be best if the schema provided all
> information about data relationships.

I don't know anything about the group module, but coming from
a database perspective the answer is that if you want to find
out if there's data on the other side of a potential join
the way to find out is to do the join.  To do otherwise
is to invite trouble by storing the same data in two places.

In the classic case where a person can have multiple addresses
you'd join the people table to the addresses table to count
whether and how many addresses a person has.  You wouldn't
also have an "address_count" column on the people table.

Of course you could make a view, materialized or otherwise,
that looks like the people table but also has an address_count
column.  But this is different from actually encoding
in the schema what's going on with the join.

Regards,

Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema



--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN


------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Question on Chado Group Module

Sook Jung
Hi,
Sorry for asking this now but I'm just wondering why not just have grp_feature, grp_organism, etc? Is that because there is a case where someone want to add a feature and an organism into the same group?
Sook

On Thu, Dec 4, 2014 at 4:02 PM, Stephen Ficklin <[hidden email]> wrote:
Thanks Mara and Karl for the responses.   I think your idea, Mara, would work. I'm at a loss for a more elegant solution.  At least if we had a 'linked_table' column the knowledge of how to join is there.  And even thought it would take multiple queries to do the join (one to figure which tables to join and a second (or more) to do all the necessary joining) at least I could do it programmatically.

To help solve both problems of say an organism and a feature sharing the same grpmember and to knowing what table to link to... what if we added an intermediary table named 'grpmemberlinker'.  This table will have the 'linked_table' field and an FK relationship with the 'grpmember' table.  It will also have a unique constraint on the grpmember_id.  This will only allow that grpmember_id to be linked to a record in only one table. Then in the "submodule" linking tables we use the 'grpmemberlinker_id' rather than the 'grpmember_id' (Example schema below).  It adds an extra join but would perhaps help with both problems?

grpmember
---------------
grpmember_id
grp_id
type_id
rank

grpmemberlinker
------------------------
grpmemberlinker_id
grpmember_id  (unique constraint)
linking_table (contains the name of the table where the group member is found)

xxxx_grpmember  (links to the group member table (e.g. feature_grpmember, organism_grpmember)).
----------------------
xxxx_grpmember_id
xxxx_id
grpmemberlinker_id  (uses the grpmemberlinker_id rather than the grpmember_id).



On 12/4/2014 3:10 PM, Mara Kim wrote:
This was a point of contention during our discussions.  The linker tables are necessary to preserve referential integrity with foreign key constraints, but also allows an organism and a feature to be linked to the same grpmember which seems wrong.  This means that the goal of referential integrity is at odds with relationship multiplicity, and the consensus during our conferences was that referential integrity was more important. 

We could partially solve this by using the information_schema to do introspection, and record the table name for each grpmember as a column (ex. `linked_table`).  Then in each linker table, make a `linked_table` table with a CHECK constraint to force it to contain the linked table name, and then use a FK constraint on grpmember_id and linked_table.  That *should* work, but its a pretty ugly hack.

On Thu, Dec 4, 2014 at 1:38 PM, Karl O. Pinc <[hidden email]> wrote:
Hi Stephen,

My 2 cents.

On 12/04/2014 12:35:33 PM, Stephen Ficklin wrote:

> Suppose, I am exploring a populated instance of the group module. I
> can
> get the list of members from the grpmember table and I can classify
> them
> by their type_id, but I don't know which table to join on to get
> further
> information (e.g. feature_grpmember -> feature).   It seems the group
> module requires a bit of human knowledge to know which tables to find
> information about group members.  I do not remember the mechanism we
> decided to know which table to join on?   Did we have a suggested
> method
> or did we expect that knowledge be built into the application to know
> which tables to join?  If the latter perhaps we should revisit the
> schema as I think it would be best if the schema provided all
> information about data relationships.

I don't know anything about the group module, but coming from
a database perspective the answer is that if you want to find
out if there's data on the other side of a potential join
the way to find out is to do the join.  To do otherwise
is to invite trouble by storing the same data in two places.

In the classic case where a person can have multiple addresses
you'd join the people table to the addresses table to count
whether and how many addresses a person has.  You wouldn't
also have an "address_count" column on the people table.

Of course you could make a view, materialized or otherwise,
that looks like the people table but also has an address_count
column.  But this is different from actually encoding
in the schema what's going on with the join.

Regards,

Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema



--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN



------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Question on Chado Group Module

Mara Kim-3
@Stephen:

I don't see the benefit to having the grpmemberlinker as it the resulting grpmember-grpmemberlinker structure is equivalent to a singular grpmember table in Boyce-Codd Normal Form, since the grpmemberlinker table has a unique constraint on grpmember_id. The solution I was imagining is essentially this:

grpmember
---------------
grpmember_id
grp_id
type_id
rank
linking_table (contains the name of the table where the group member is found)

xxxx_grpmember  (links to the group member table (e.g. feature_grpmember, organism_grpmember)).
----------------------
xxxx_grpmember_id
xxxx_id
grpmember_id
linking_table DEFAULT "xxx" CHECK (linking_table = "xxx")
FOREIGN KEY (grpmember_id,linking_table) REFERENCES grpmember(grpmember_id,linking_table)


@Sook:

At one point the was a design where we had grp tables in the style of featuregrp, organismgrp, etc.  The reason why we decided not to have xxxgrp tables was to avoid the multiplicative effect that would result from creating xxxgrp_synonym, xxxgrpprop, xxxgrp_pub, etc. for every single type of group.  It would require the cross product of every tagging table for groups with every table type that could form a group.


On Thu, Dec 4, 2014 at 3:14 PM, Sook Jung <[hidden email]> wrote:
Hi,
Sorry for asking this now but I'm just wondering why not just have grp_feature, grp_organism, etc? Is that because there is a case where someone want to add a feature and an organism into the same group?
Sook

On Thu, Dec 4, 2014 at 4:02 PM, Stephen Ficklin <[hidden email]> wrote:
Thanks Mara and Karl for the responses.   I think your idea, Mara, would work. I'm at a loss for a more elegant solution.  At least if we had a 'linked_table' column the knowledge of how to join is there.  And even thought it would take multiple queries to do the join (one to figure which tables to join and a second (or more) to do all the necessary joining) at least I could do it programmatically.

To help solve both problems of say an organism and a feature sharing the same grpmember and to knowing what table to link to... what if we added an intermediary table named 'grpmemberlinker'.  This table will have the 'linked_table' field and an FK relationship with the 'grpmember' table.  It will also have a unique constraint on the grpmember_id.  This will only allow that grpmember_id to be linked to a record in only one table. Then in the "submodule" linking tables we use the 'grpmemberlinker_id' rather than the 'grpmember_id' (Example schema below).  It adds an extra join but would perhaps help with both problems?

grpmember
---------------
grpmember_id
grp_id
type_id
rank

grpmemberlinker
------------------------
grpmemberlinker_id
grpmember_id  (unique constraint)
linking_table (contains the name of the table where the group member is found)

xxxx_grpmember  (links to the group member table (e.g. feature_grpmember, organism_grpmember)).
----------------------
xxxx_grpmember_id
xxxx_id
grpmemberlinker_id  (uses the grpmemberlinker_id rather than the grpmember_id).



On 12/4/2014 3:10 PM, Mara Kim wrote:
This was a point of contention during our discussions.  The linker tables are necessary to preserve referential integrity with foreign key constraints, but also allows an organism and a feature to be linked to the same grpmember which seems wrong.  This means that the goal of referential integrity is at odds with relationship multiplicity, and the consensus during our conferences was that referential integrity was more important. 

We could partially solve this by using the information_schema to do introspection, and record the table name for each grpmember as a column (ex. `linked_table`).  Then in each linker table, make a `linked_table` table with a CHECK constraint to force it to contain the linked table name, and then use a FK constraint on grpmember_id and linked_table.  That *should* work, but its a pretty ugly hack.

On Thu, Dec 4, 2014 at 1:38 PM, Karl O. Pinc <[hidden email]> wrote:
Hi Stephen,

My 2 cents.

On 12/04/2014 12:35:33 PM, Stephen Ficklin wrote:

> Suppose, I am exploring a populated instance of the group module. I
> can
> get the list of members from the grpmember table and I can classify
> them
> by their type_id, but I don't know which table to join on to get
> further
> information (e.g. feature_grpmember -> feature).   It seems the group
> module requires a bit of human knowledge to know which tables to find
> information about group members.  I do not remember the mechanism we
> decided to know which table to join on?   Did we have a suggested
> method
> or did we expect that knowledge be built into the application to know
> which tables to join?  If the latter perhaps we should revisit the
> schema as I think it would be best if the schema provided all
> information about data relationships.

I don't know anything about the group module, but coming from
a database perspective the answer is that if you want to find
out if there's data on the other side of a potential join
the way to find out is to do the join.  To do otherwise
is to invite trouble by storing the same data in two places.

In the classic case where a person can have multiple addresses
you'd join the people table to the addresses table to count
whether and how many addresses a person has.  You wouldn't
also have an "address_count" column on the people table.

Of course you could make a view, materialized or otherwise,
that looks like the people table but also has an address_count
column.  But this is different from actually encoding
in the schema what's going on with the join.

Regards,

Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema



--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN



------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema




--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Question on Chado Group Module

Stephen Ficklin-3
Hi Mara,

Oh, I missed that there would also be a linking_table field in the submodule table as well.  I like that better that what I suggested.

Stephen

On 12/5/2014 2:35 PM, Mara Kim wrote:
@Stephen:

I don't see the benefit to having the grpmemberlinker as it the resulting grpmember-grpmemberlinker structure is equivalent to a singular grpmember table in Boyce-Codd Normal Form, since the grpmemberlinker table has a unique constraint on grpmember_id. The solution I was imagining is essentially this:

grpmember
---------------
grpmember_id
grp_id
type_id
rank
linking_table (contains the name of the table where the group member is found)

xxxx_grpmember  (links to the group member table (e.g. feature_grpmember, organism_grpmember)).
----------------------
xxxx_grpmember_id
xxxx_id
grpmember_id
linking_table DEFAULT "xxx" CHECK (linking_table = "xxx")
FOREIGN KEY (grpmember_id,linking_table) REFERENCES grpmember(grpmember_id,linking_table)


@Sook:

At one point the was a design where we had grp tables in the style of featuregrp, organismgrp, etc.  The reason why we decided not to have xxxgrp tables was to avoid the multiplicative effect that would result from creating xxxgrp_synonym, xxxgrpprop, xxxgrp_pub, etc. for every single type of group.  It would require the cross product of every tagging table for groups with every table type that could form a group.


On Thu, Dec 4, 2014 at 3:14 PM, Sook Jung <[hidden email]> wrote:
Hi,
Sorry for asking this now but I'm just wondering why not just have grp_feature, grp_organism, etc? Is that because there is a case where someone want to add a feature and an organism into the same group?
Sook

On Thu, Dec 4, 2014 at 4:02 PM, Stephen Ficklin <[hidden email]> wrote:
Thanks Mara and Karl for the responses.   I think your idea, Mara, would work. I'm at a loss for a more elegant solution.  At least if we had a 'linked_table' column the knowledge of how to join is there.  And even thought it would take multiple queries to do the join (one to figure which tables to join and a second (or more) to do all the necessary joining) at least I could do it programmatically.

To help solve both problems of say an organism and a feature sharing the same grpmember and to knowing what table to link to... what if we added an intermediary table named 'grpmemberlinker'.  This table will have the 'linked_table' field and an FK relationship with the 'grpmember' table.  It will also have a unique constraint on the grpmember_id.  This will only allow that grpmember_id to be linked to a record in only one table. Then in the "submodule" linking tables we use the 'grpmemberlinker_id' rather than the 'grpmember_id' (Example schema below).  It adds an extra join but would perhaps help with both problems?

grpmember
---------------
grpmember_id
grp_id
type_id
rank

grpmemberlinker
------------------------
grpmemberlinker_id
grpmember_id  (unique constraint)
linking_table (contains the name of the table where the group member is found)

xxxx_grpmember  (links to the group member table (e.g. feature_grpmember, organism_grpmember)).
----------------------
xxxx_grpmember_id
xxxx_id
grpmemberlinker_id  (uses the grpmemberlinker_id rather than the grpmember_id).



On 12/4/2014 3:10 PM, Mara Kim wrote:
This was a point of contention during our discussions.  The linker tables are necessary to preserve referential integrity with foreign key constraints, but also allows an organism and a feature to be linked to the same grpmember which seems wrong.  This means that the goal of referential integrity is at odds with relationship multiplicity, and the consensus during our conferences was that referential integrity was more important. 

We could partially solve this by using the information_schema to do introspection, and record the table name for each grpmember as a column (ex. `linked_table`).  Then in each linker table, make a `linked_table` table with a CHECK constraint to force it to contain the linked table name, and then use a FK constraint on grpmember_id and linked_table.  That *should* work, but its a pretty ugly hack.

On Thu, Dec 4, 2014 at 1:38 PM, Karl O. Pinc <[hidden email]> wrote:
Hi Stephen,

My 2 cents.

On 12/04/2014 12:35:33 PM, Stephen Ficklin wrote:

> Suppose, I am exploring a populated instance of the group module. I
> can
> get the list of members from the grpmember table and I can classify
> them
> by their type_id, but I don't know which table to join on to get
> further
> information (e.g. feature_grpmember -> feature).   It seems the group
> module requires a bit of human knowledge to know which tables to find
> information about group members.  I do not remember the mechanism we
> decided to know which table to join on?   Did we have a suggested
> method
> or did we expect that knowledge be built into the application to know
> which tables to join?  If the latter perhaps we should revisit the
> schema as I think it would be best if the schema provided all
> information about data relationships.

I don't know anything about the group module, but coming from
a database perspective the answer is that if you want to find
out if there's data on the other side of a potential join
the way to find out is to do the join.  To do otherwise
is to invite trouble by storing the same data in two places.

In the classic case where a person can have multiple addresses
you'd join the people table to the addresses table to count
whether and how many addresses a person has.  You wouldn't
also have an "address_count" column on the people table.

Of course you could make a view, materialized or otherwise,
that looks like the people table but also has an address_count
column.  But this is different from actually encoding
in the schema what's going on with the join.

Regards,

Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema



--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN



------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema




--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN


------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Question on Chado Group Module

Sook Jung
Hi Mara,

You may have difficult time working today answering two different email thread :)

I am a little bit cautious about making the group module open to any base table.. One of the main idea of Natural Diversity Module was to use 'stock' and 'project' table to store hierarchical entries using '_relationship' table (following the characteristics of feature table). 

So a sample of a germplasm, germplasm and population can be stock entries. A project can have a sub-project and all of them can be stored in project table. With group module open to every base table, some folk may end up using grouping stocks using group table which will make sharing chado-related code very difficult..

We could start with absolutely necessary tables - (eg. featuregrp) and then have further discussions about the necessity of grouping other tables..

Sook

On Fri, Dec 5, 2014 at 3:07 PM, Stephen Ficklin <[hidden email]> wrote:
Hi Mara,

Oh, I missed that there would also be a linking_table field in the submodule table as well.  I like that better that what I suggested.

Stephen


On 12/5/2014 2:35 PM, Mara Kim wrote:
@Stephen:

I don't see the benefit to having the grpmemberlinker as it the resulting grpmember-grpmemberlinker structure is equivalent to a singular grpmember table in Boyce-Codd Normal Form, since the grpmemberlinker table has a unique constraint on grpmember_id. The solution I was imagining is essentially this:

grpmember
---------------
grpmember_id
grp_id
type_id
rank
linking_table (contains the name of the table where the group member is found)

xxxx_grpmember  (links to the group member table (e.g. feature_grpmember, organism_grpmember)).
----------------------
xxxx_grpmember_id
xxxx_id
grpmember_id
linking_table DEFAULT "xxx" CHECK (linking_table = "xxx")
FOREIGN KEY (grpmember_id,linking_table) REFERENCES grpmember(grpmember_id,linking_table)


@Sook:

At one point the was a design where we had grp tables in the style of featuregrp, organismgrp, etc.  The reason why we decided not to have xxxgrp tables was to avoid the multiplicative effect that would result from creating xxxgrp_synonym, xxxgrpprop, xxxgrp_pub, etc. for every single type of group.  It would require the cross product of every tagging table for groups with every table type that could form a group.


On Thu, Dec 4, 2014 at 3:14 PM, Sook Jung <[hidden email]> wrote:
Hi,
Sorry for asking this now but I'm just wondering why not just have grp_feature, grp_organism, etc? Is that because there is a case where someone want to add a feature and an organism into the same group?
Sook

On Thu, Dec 4, 2014 at 4:02 PM, Stephen Ficklin <[hidden email]> wrote:
Thanks Mara and Karl for the responses.   I think your idea, Mara, would work. I'm at a loss for a more elegant solution.  At least if we had a 'linked_table' column the knowledge of how to join is there.  And even thought it would take multiple queries to do the join (one to figure which tables to join and a second (or more) to do all the necessary joining) at least I could do it programmatically.

To help solve both problems of say an organism and a feature sharing the same grpmember and to knowing what table to link to... what if we added an intermediary table named 'grpmemberlinker'.  This table will have the 'linked_table' field and an FK relationship with the 'grpmember' table.  It will also have a unique constraint on the grpmember_id.  This will only allow that grpmember_id to be linked to a record in only one table. Then in the "submodule" linking tables we use the 'grpmemberlinker_id' rather than the 'grpmember_id' (Example schema below).  It adds an extra join but would perhaps help with both problems?

grpmember
---------------
grpmember_id
grp_id
type_id
rank

grpmemberlinker
------------------------
grpmemberlinker_id
grpmember_id  (unique constraint)
linking_table (contains the name of the table where the group member is found)

xxxx_grpmember  (links to the group member table (e.g. feature_grpmember, organism_grpmember)).
----------------------
xxxx_grpmember_id
xxxx_id
grpmemberlinker_id  (uses the grpmemberlinker_id rather than the grpmember_id).



On 12/4/2014 3:10 PM, Mara Kim wrote:
This was a point of contention during our discussions.  The linker tables are necessary to preserve referential integrity with foreign key constraints, but also allows an organism and a feature to be linked to the same grpmember which seems wrong.  This means that the goal of referential integrity is at odds with relationship multiplicity, and the consensus during our conferences was that referential integrity was more important. 

We could partially solve this by using the information_schema to do introspection, and record the table name for each grpmember as a column (ex. `linked_table`).  Then in each linker table, make a `linked_table` table with a CHECK constraint to force it to contain the linked table name, and then use a FK constraint on grpmember_id and linked_table.  That *should* work, but its a pretty ugly hack.

On Thu, Dec 4, 2014 at 1:38 PM, Karl O. Pinc <[hidden email]> wrote:
Hi Stephen,

My 2 cents.

On 12/04/2014 12:35:33 PM, Stephen Ficklin wrote:

> Suppose, I am exploring a populated instance of the group module. I
> can
> get the list of members from the grpmember table and I can classify
> them
> by their type_id, but I don't know which table to join on to get
> further
> information (e.g. feature_grpmember -> feature).   It seems the group
> module requires a bit of human knowledge to know which tables to find
> information about group members.  I do not remember the mechanism we
> decided to know which table to join on?   Did we have a suggested
> method
> or did we expect that knowledge be built into the application to know
> which tables to join?  If the latter perhaps we should revisit the
> schema as I think it would be best if the schema provided all
> information about data relationships.

I don't know anything about the group module, but coming from
a database perspective the answer is that if you want to find
out if there's data on the other side of a potential join
the way to find out is to do the join.  To do otherwise
is to invite trouble by storing the same data in two places.

In the classic case where a person can have multiple addresses
you'd join the people table to the addresses table to count
whether and how many addresses a person has.  You wouldn't
also have an "address_count" column on the people table.

Of course you could make a view, materialized or otherwise,
that looks like the people table but also has an address_count
column.  But this is different from actually encoding
in the schema what's going on with the join.

Regards,

Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema



--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN



------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema




--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN



------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Question on Chado Group Module

Andy Schroeder
FlyBase is already planning on going with the group module as Mara has specified it for the reasons she described.  I don't see the incompatibility or conflict of the _relationship example for hierarchical organization and the grp for grouping.  The nice thing about groups is that there is no direction or hierarchy specification.  An in fact one can have a hierarchy of groups using grp_relationship, which we will be making extensive use of.  Hopefully the hypothetical shared code would be generic and configurable enough to account for the implementation needs of the user and their desired implementation.

cheers,
Andy

cheers,
Andy

On Fri, Dec 5, 2014 at 3:20 PM, Sook Jung <[hidden email]> wrote:
Hi Mara,

You may have difficult time working today answering two different email thread :)

I am a little bit cautious about making the group module open to any base table.. One of the main idea of Natural Diversity Module was to use 'stock' and 'project' table to store hierarchical entries using '_relationship' table (following the characteristics of feature table). 

So a sample of a germplasm, germplasm and population can be stock entries. A project can have a sub-project and all of them can be stored in project table. With group module open to every base table, some folk may end up using grouping stocks using group table which will make sharing chado-related code very difficult..

We could start with absolutely necessary tables - (eg. featuregrp) and then have further discussions about the necessity of grouping other tables..

Sook

On Fri, Dec 5, 2014 at 3:07 PM, Stephen Ficklin <[hidden email]> wrote:
Hi Mara,

Oh, I missed that there would also be a linking_table field in the submodule table as well.  I like that better that what I suggested.

Stephen


On 12/5/2014 2:35 PM, Mara Kim wrote:
@Stephen:

I don't see the benefit to having the grpmemberlinker as it the resulting grpmember-grpmemberlinker structure is equivalent to a singular grpmember table in Boyce-Codd Normal Form, since the grpmemberlinker table has a unique constraint on grpmember_id. The solution I was imagining is essentially this:

grpmember
---------------
grpmember_id
grp_id
type_id
rank
linking_table (contains the name of the table where the group member is found)

xxxx_grpmember  (links to the group member table (e.g. feature_grpmember, organism_grpmember)).
----------------------
xxxx_grpmember_id
xxxx_id
grpmember_id
linking_table DEFAULT "xxx" CHECK (linking_table = "xxx")
FOREIGN KEY (grpmember_id,linking_table) REFERENCES grpmember(grpmember_id,linking_table)


@Sook:

At one point the was a design where we had grp tables in the style of featuregrp, organismgrp, etc.  The reason why we decided not to have xxxgrp tables was to avoid the multiplicative effect that would result from creating xxxgrp_synonym, xxxgrpprop, xxxgrp_pub, etc. for every single type of group.  It would require the cross product of every tagging table for groups with every table type that could form a group.


On Thu, Dec 4, 2014 at 3:14 PM, Sook Jung <[hidden email]> wrote:
Hi,
Sorry for asking this now but I'm just wondering why not just have grp_feature, grp_organism, etc? Is that because there is a case where someone want to add a feature and an organism into the same group?
Sook

On Thu, Dec 4, 2014 at 4:02 PM, Stephen Ficklin <[hidden email]> wrote:
Thanks Mara and Karl for the responses.   I think your idea, Mara, would work. I'm at a loss for a more elegant solution.  At least if we had a 'linked_table' column the knowledge of how to join is there.  And even thought it would take multiple queries to do the join (one to figure which tables to join and a second (or more) to do all the necessary joining) at least I could do it programmatically.

To help solve both problems of say an organism and a feature sharing the same grpmember and to knowing what table to link to... what if we added an intermediary table named 'grpmemberlinker'.  This table will have the 'linked_table' field and an FK relationship with the 'grpmember' table.  It will also have a unique constraint on the grpmember_id.  This will only allow that grpmember_id to be linked to a record in only one table. Then in the "submodule" linking tables we use the 'grpmemberlinker_id' rather than the 'grpmember_id' (Example schema below).  It adds an extra join but would perhaps help with both problems?

grpmember
---------------
grpmember_id
grp_id
type_id
rank

grpmemberlinker
------------------------
grpmemberlinker_id
grpmember_id  (unique constraint)
linking_table (contains the name of the table where the group member is found)

xxxx_grpmember  (links to the group member table (e.g. feature_grpmember, organism_grpmember)).
----------------------
xxxx_grpmember_id
xxxx_id
grpmemberlinker_id  (uses the grpmemberlinker_id rather than the grpmember_id).



On 12/4/2014 3:10 PM, Mara Kim wrote:
This was a point of contention during our discussions.  The linker tables are necessary to preserve referential integrity with foreign key constraints, but also allows an organism and a feature to be linked to the same grpmember which seems wrong.  This means that the goal of referential integrity is at odds with relationship multiplicity, and the consensus during our conferences was that referential integrity was more important. 

We could partially solve this by using the information_schema to do introspection, and record the table name for each grpmember as a column (ex. `linked_table`).  Then in each linker table, make a `linked_table` table with a CHECK constraint to force it to contain the linked table name, and then use a FK constraint on grpmember_id and linked_table.  That *should* work, but its a pretty ugly hack.

On Thu, Dec 4, 2014 at 1:38 PM, Karl O. Pinc <[hidden email]> wrote:
Hi Stephen,

My 2 cents.

On 12/04/2014 12:35:33 PM, Stephen Ficklin wrote:

> Suppose, I am exploring a populated instance of the group module. I
> can
> get the list of members from the grpmember table and I can classify
> them
> by their type_id, but I don't know which table to join on to get
> further
> information (e.g. feature_grpmember -> feature).   It seems the group
> module requires a bit of human knowledge to know which tables to find
> information about group members.  I do not remember the mechanism we
> decided to know which table to join on?   Did we have a suggested
> method
> or did we expect that knowledge be built into the application to know
> which tables to join?  If the latter perhaps we should revisit the
> schema as I think it would be best if the schema provided all
> information about data relationships.

I don't know anything about the group module, but coming from
a database perspective the answer is that if you want to find
out if there's data on the other side of a potential join
the way to find out is to do the join.  To do otherwise
is to invite trouble by storing the same data in two places.

In the classic case where a person can have multiple addresses
you'd join the people table to the addresses table to count
whether and how many addresses a person has.  You wouldn't
also have an "address_count" column on the people table.

Of course you could make a view, materialized or otherwise,
that looks like the people table but also has an address_count
column.  But this is different from actually encoding
in the schema what's going on with the join.

Regards,

Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema



--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN



------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema




--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN



------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema



------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Question on Chado Group Module

Stephen Ficklin-3
Thanks all for the discussion.  Mara,  there has been some discussions in the past about changes to the group module.  Here's a summary:

1) Add a grp_synonym table
2) Add two new columns to the grp table: comment and date? or should those go to the grpprop table?
3) Add is_obsolete (boolean, not null default false) to the grp table
4) Add a link_table field to the grpmember and xxxx_grpmember tables with FK (grpmember_id, link_table) in the xxx_grpmember table.

In the future I want to explore using it for storing networks, and it sounds like the Flybase folks are about ready to use it.  Which once someone starts using it, it makes it harder to adjust later.   So, I'm just wondering so that I can prepare for the future, if these changes will be added ?

Thanks,
Stephen

On 12/5/2014 3:50 PM, Andy Schroeder wrote:
FlyBase is already planning on going with the group module as Mara has specified it for the reasons she described.  I don't see the incompatibility or conflict of the _relationship example for hierarchical organization and the grp for grouping.  The nice thing about groups is that there is no direction or hierarchy specification.  An in fact one can have a hierarchy of groups using grp_relationship, which we will be making extensive use of.  Hopefully the hypothetical shared code would be generic and configurable enough to account for the implementation needs of the user and their desired implementation.

cheers,
Andy

cheers,
Andy

On Fri, Dec 5, 2014 at 3:20 PM, Sook Jung <[hidden email]> wrote:
Hi Mara,

You may have difficult time working today answering two different email thread :)

I am a little bit cautious about making the group module open to any base table.. One of the main idea of Natural Diversity Module was to use 'stock' and 'project' table to store hierarchical entries using '_relationship' table (following the characteristics of feature table). 

So a sample of a germplasm, germplasm and population can be stock entries. A project can have a sub-project and all of them can be stored in project table. With group module open to every base table, some folk may end up using grouping stocks using group table which will make sharing chado-related code very difficult..

We could start with absolutely necessary tables - (eg. featuregrp) and then have further discussions about the necessity of grouping other tables..

Sook

On Fri, Dec 5, 2014 at 3:07 PM, Stephen Ficklin <[hidden email]> wrote:
Hi Mara,

Oh, I missed that there would also be a linking_table field in the submodule table as well.  I like that better that what I suggested.

Stephen


On 12/5/2014 2:35 PM, Mara Kim wrote:
@Stephen:

I don't see the benefit to having the grpmemberlinker as it the resulting grpmember-grpmemberlinker structure is equivalent to a singular grpmember table in Boyce-Codd Normal Form, since the grpmemberlinker table has a unique constraint on grpmember_id. The solution I was imagining is essentially this:

grpmember
---------------
grpmember_id
grp_id
type_id
rank
linking_table (contains the name of the table where the group member is found)

xxxx_grpmember  (links to the group member table (e.g. feature_grpmember, organism_grpmember)).
----------------------
xxxx_grpmember_id
xxxx_id
grpmember_id
linking_table DEFAULT "xxx" CHECK (linking_table = "xxx")
FOREIGN KEY (grpmember_id,linking_table) REFERENCES grpmember(grpmember_id,linking_table)


@Sook:

At one point the was a design where we had grp tables in the style of featuregrp, organismgrp, etc.  The reason why we decided not to have xxxgrp tables was to avoid the multiplicative effect that would result from creating xxxgrp_synonym, xxxgrpprop, xxxgrp_pub, etc. for every single type of group.  It would require the cross product of every tagging table for groups with every table type that could form a group.


On Thu, Dec 4, 2014 at 3:14 PM, Sook Jung <[hidden email]> wrote:
Hi,
Sorry for asking this now but I'm just wondering why not just have grp_feature, grp_organism, etc? Is that because there is a case where someone want to add a feature and an organism into the same group?
Sook

On Thu, Dec 4, 2014 at 4:02 PM, Stephen Ficklin <[hidden email]> wrote:
Thanks Mara and Karl for the responses.   I think your idea, Mara, would work. I'm at a loss for a more elegant solution.  At least if we had a 'linked_table' column the knowledge of how to join is there.  And even thought it would take multiple queries to do the join (one to figure which tables to join and a second (or more) to do all the necessary joining) at least I could do it programmatically.

To help solve both problems of say an organism and a feature sharing the same grpmember and to knowing what table to link to... what if we added an intermediary table named 'grpmemberlinker'.  This table will have the 'linked_table' field and an FK relationship with the 'grpmember' table.  It will also have a unique constraint on the grpmember_id.  This will only allow that grpmember_id to be linked to a record in only one table. Then in the "submodule" linking tables we use the 'grpmemberlinker_id' rather than the 'grpmember_id' (Example schema below).  It adds an extra join but would perhaps help with both problems?

grpmember
---------------
grpmember_id
grp_id
type_id
rank

grpmemberlinker
------------------------
grpmemberlinker_id
grpmember_id  (unique constraint)
linking_table (contains the name of the table where the group member is found)

xxxx_grpmember  (links to the group member table (e.g. feature_grpmember, organism_grpmember)).
----------------------
xxxx_grpmember_id
xxxx_id
grpmemberlinker_id  (uses the grpmemberlinker_id rather than the grpmember_id).



On 12/4/2014 3:10 PM, Mara Kim wrote:
This was a point of contention during our discussions.  The linker tables are necessary to preserve referential integrity with foreign key constraints, but also allows an organism and a feature to be linked to the same grpmember which seems wrong.  This means that the goal of referential integrity is at odds with relationship multiplicity, and the consensus during our conferences was that referential integrity was more important. 

We could partially solve this by using the information_schema to do introspection, and record the table name for each grpmember as a column (ex. `linked_table`).  Then in each linker table, make a `linked_table` table with a CHECK constraint to force it to contain the linked table name, and then use a FK constraint on grpmember_id and linked_table.  That *should* work, but its a pretty ugly hack.

On Thu, Dec 4, 2014 at 1:38 PM, Karl O. Pinc <[hidden email]> wrote:
Hi Stephen,

My 2 cents.

On 12/04/2014 12:35:33 PM, Stephen Ficklin wrote:

> Suppose, I am exploring a populated instance of the group module. I
> can
> get the list of members from the grpmember table and I can classify
> them
> by their type_id, but I don't know which table to join on to get
> further
> information (e.g. feature_grpmember -> feature).   It seems the group
> module requires a bit of human knowledge to know which tables to find
> information about group members.  I do not remember the mechanism we
> decided to know which table to join on?   Did we have a suggested
> method
> or did we expect that knowledge be built into the application to know
> which tables to join?  If the latter perhaps we should revisit the
> schema as I think it would be best if the schema provided all
> information about data relationships.

I don't know anything about the group module, but coming from
a database perspective the answer is that if you want to find
out if there's data on the other side of a potential join
the way to find out is to do the join.  To do otherwise
is to invite trouble by storing the same data in two places.

In the classic case where a person can have multiple addresses
you'd join the people table to the addresses table to count
whether and how many addresses a person has.  You wouldn't
also have an "address_count" column on the people table.

Of course you could make a view, materialized or otherwise,
that looks like the people table but also has an address_count
column.  But this is different from actually encoding
in the schema what's going on with the join.

Regards,

Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema



--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN



------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema




--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN



------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema




------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Question on Chado Group Module

Mara Kim-3
As of Release Candidate 2, there is a grp_synonym table, as well as an is_obsolete column on the grp table.

I am almost ready to publish RC3.  All future versions (including RC3) will include update commands to allow upgrades from any previous release candidate except RC1.  If someone is still on RC1 and needs an upgrade script, I will work on one, otherwise I'll assume there isn't a need.

Features slated for RC3:
grp_contact table
two new columns to the grp table: comment and date
link_table field in grpmember and xxxx_grpmember tables with FK grpmember(grpmember_id, link_table)

On Fri, Dec 5, 2014 at 4:15 PM, Stephen Ficklin <[hidden email]> wrote:
Thanks all for the discussion.  Mara,  there has been some discussions in the past about changes to the group module.  Here's a summary:

1) Add a grp_synonym table
2) Add two new columns to the grp table: comment and date? or should those go to the grpprop table?
3) Add is_obsolete (boolean, not null default false) to the grp table
4) Add a link_table field to the grpmember and xxxx_grpmember tables with FK (grpmember_id, link_table) in the xxx_grpmember table.

In the future I want to explore using it for storing networks, and it sounds like the Flybase folks are about ready to use it.  Which once someone starts using it, it makes it harder to adjust later.   So, I'm just wondering so that I can prepare for the future, if these changes will be added ?

Thanks,
Stephen


On 12/5/2014 3:50 PM, Andy Schroeder wrote:
FlyBase is already planning on going with the group module as Mara has specified it for the reasons she described.  I don't see the incompatibility or conflict of the _relationship example for hierarchical organization and the grp for grouping.  The nice thing about groups is that there is no direction or hierarchy specification.  An in fact one can have a hierarchy of groups using grp_relationship, which we will be making extensive use of.  Hopefully the hypothetical shared code would be generic and configurable enough to account for the implementation needs of the user and their desired implementation.

cheers,
Andy

cheers,
Andy

On Fri, Dec 5, 2014 at 3:20 PM, Sook Jung <[hidden email]> wrote:
Hi Mara,

You may have difficult time working today answering two different email thread :)

I am a little bit cautious about making the group module open to any base table.. One of the main idea of Natural Diversity Module was to use 'stock' and 'project' table to store hierarchical entries using '_relationship' table (following the characteristics of feature table). 

So a sample of a germplasm, germplasm and population can be stock entries. A project can have a sub-project and all of them can be stored in project table. With group module open to every base table, some folk may end up using grouping stocks using group table which will make sharing chado-related code very difficult..

We could start with absolutely necessary tables - (eg. featuregrp) and then have further discussions about the necessity of grouping other tables..

Sook

On Fri, Dec 5, 2014 at 3:07 PM, Stephen Ficklin <[hidden email]> wrote:
Hi Mara,

Oh, I missed that there would also be a linking_table field in the submodule table as well.  I like that better that what I suggested.

Stephen


On 12/5/2014 2:35 PM, Mara Kim wrote:
@Stephen:

I don't see the benefit to having the grpmemberlinker as it the resulting grpmember-grpmemberlinker structure is equivalent to a singular grpmember table in Boyce-Codd Normal Form, since the grpmemberlinker table has a unique constraint on grpmember_id. The solution I was imagining is essentially this:

grpmember
---------------
grpmember_id
grp_id
type_id
rank
linking_table (contains the name of the table where the group member is found)

xxxx_grpmember  (links to the group member table (e.g. feature_grpmember, organism_grpmember)).
----------------------
xxxx_grpmember_id
xxxx_id
grpmember_id
linking_table DEFAULT "xxx" CHECK (linking_table = "xxx")
FOREIGN KEY (grpmember_id,linking_table) REFERENCES grpmember(grpmember_id,linking_table)


@Sook:

At one point the was a design where we had grp tables in the style of featuregrp, organismgrp, etc.  The reason why we decided not to have xxxgrp tables was to avoid the multiplicative effect that would result from creating xxxgrp_synonym, xxxgrpprop, xxxgrp_pub, etc. for every single type of group.  It would require the cross product of every tagging table for groups with every table type that could form a group.


On Thu, Dec 4, 2014 at 3:14 PM, Sook Jung <[hidden email]> wrote:
Hi,
Sorry for asking this now but I'm just wondering why not just have grp_feature, grp_organism, etc? Is that because there is a case where someone want to add a feature and an organism into the same group?
Sook

On Thu, Dec 4, 2014 at 4:02 PM, Stephen Ficklin <[hidden email]> wrote:
Thanks Mara and Karl for the responses.   I think your idea, Mara, would work. I'm at a loss for a more elegant solution.  At least if we had a 'linked_table' column the knowledge of how to join is there.  And even thought it would take multiple queries to do the join (one to figure which tables to join and a second (or more) to do all the necessary joining) at least I could do it programmatically.

To help solve both problems of say an organism and a feature sharing the same grpmember and to knowing what table to link to... what if we added an intermediary table named 'grpmemberlinker'.  This table will have the 'linked_table' field and an FK relationship with the 'grpmember' table.  It will also have a unique constraint on the grpmember_id.  This will only allow that grpmember_id to be linked to a record in only one table. Then in the "submodule" linking tables we use the 'grpmemberlinker_id' rather than the 'grpmember_id' (Example schema below).  It adds an extra join but would perhaps help with both problems?

grpmember
---------------
grpmember_id
grp_id
type_id
rank

grpmemberlinker
------------------------
grpmemberlinker_id
grpmember_id  (unique constraint)
linking_table (contains the name of the table where the group member is found)

xxxx_grpmember  (links to the group member table (e.g. feature_grpmember, organism_grpmember)).
----------------------
xxxx_grpmember_id
xxxx_id
grpmemberlinker_id  (uses the grpmemberlinker_id rather than the grpmember_id).



On 12/4/2014 3:10 PM, Mara Kim wrote:
This was a point of contention during our discussions.  The linker tables are necessary to preserve referential integrity with foreign key constraints, but also allows an organism and a feature to be linked to the same grpmember which seems wrong.  This means that the goal of referential integrity is at odds with relationship multiplicity, and the consensus during our conferences was that referential integrity was more important. 

We could partially solve this by using the information_schema to do introspection, and record the table name for each grpmember as a column (ex. `linked_table`).  Then in each linker table, make a `linked_table` table with a CHECK constraint to force it to contain the linked table name, and then use a FK constraint on grpmember_id and linked_table.  That *should* work, but its a pretty ugly hack.

On Thu, Dec 4, 2014 at 1:38 PM, Karl O. Pinc <[hidden email]> wrote:
Hi Stephen,

My 2 cents.

On 12/04/2014 12:35:33 PM, Stephen Ficklin wrote:

> Suppose, I am exploring a populated instance of the group module. I
> can
> get the list of members from the grpmember table and I can classify
> them
> by their type_id, but I don't know which table to join on to get
> further
> information (e.g. feature_grpmember -> feature).   It seems the group
> module requires a bit of human knowledge to know which tables to find
> information about group members.  I do not remember the mechanism we
> decided to know which table to join on?   Did we have a suggested
> method
> or did we expect that knowledge be built into the application to know
> which tables to join?  If the latter perhaps we should revisit the
> schema as I think it would be best if the schema provided all
> information about data relationships.

I don't know anything about the group module, but coming from
a database perspective the answer is that if you want to find
out if there's data on the other side of a potential join
the way to find out is to do the join.  To do otherwise
is to invite trouble by storing the same data in two places.

In the classic case where a person can have multiple addresses
you'd join the people table to the addresses table to count
whether and how many addresses a person has.  You wouldn't
also have an "address_count" column on the people table.

Of course you could make a view, materialized or otherwise,
that looks like the people table but also has an address_count
column.  But this is different from actually encoding
in the schema what's going on with the join.

Regards,

Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema



--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN



------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema




--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN



------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema




------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema




--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Question on Chado Group Module

Kathleen Falls-2
Mara,

I would suggest storing comments in the grpprop table. FlyBase has stored several 'types' of comments in the featureprop. We are counting on storing comments in grpprop.
 Please could someone give a use case for date in the grp table? What date would be stored and would it be updated, when/why? Our specs for the first kind of use of the grp module (GeneGroup) calls for a date value (mmddyyyy) to be stored in grpprop. 
FlyBase actually uses an audit table to track inserts, updates deletes including date for most tables. There are populated by triggers. 

Cheers,

Kathleen 

On Fri, Dec 5, 2014 at 5:29 PM, Mara Kim <[hidden email]> wrote:
As of Release Candidate 2, there is a grp_synonym table, as well as an is_obsolete column on the grp table.

I am almost ready to publish RC3.  All future versions (including RC3) will include update commands to allow upgrades from any previous release candidate except RC1.  If someone is still on RC1 and needs an upgrade script, I will work on one, otherwise I'll assume there isn't a need.

Features slated for RC3:
grp_contact table
two new columns to the grp table: comment and date
link_table field in grpmember and xxxx_grpmember tables with FK grpmember(grpmember_id, link_table)

On Fri, Dec 5, 2014 at 4:15 PM, Stephen Ficklin <[hidden email]> wrote:
Thanks all for the discussion.  Mara,  there has been some discussions in the past about changes to the group module.  Here's a summary:

1) Add a grp_synonym table
2) Add two new columns to the grp table: comment and date? or should those go to the grpprop table?
3) Add is_obsolete (boolean, not null default false) to the grp table
4) Add a link_table field to the grpmember and xxxx_grpmember tables with FK (grpmember_id, link_table) in the xxx_grpmember table.

In the future I want to explore using it for storing networks, and it sounds like the Flybase folks are about ready to use it.  Which once someone starts using it, it makes it harder to adjust later.   So, I'm just wondering so that I can prepare for the future, if these changes will be added ?

Thanks,
Stephen


On 12/5/2014 3:50 PM, Andy Schroeder wrote:
FlyBase is already planning on going with the group module as Mara has specified it for the reasons she described.  I don't see the incompatibility or conflict of the _relationship example for hierarchical organization and the grp for grouping.  The nice thing about groups is that there is no direction or hierarchy specification.  An in fact one can have a hierarchy of groups using grp_relationship, which we will be making extensive use of.  Hopefully the hypothetical shared code would be generic and configurable enough to account for the implementation needs of the user and their desired implementation.

cheers,
Andy

cheers,
Andy

On Fri, Dec 5, 2014 at 3:20 PM, Sook Jung <[hidden email]> wrote:
Hi Mara,

You may have difficult time working today answering two different email thread :)

I am a little bit cautious about making the group module open to any base table.. One of the main idea of Natural Diversity Module was to use 'stock' and 'project' table to store hierarchical entries using '_relationship' table (following the characteristics of feature table). 

So a sample of a germplasm, germplasm and population can be stock entries. A project can have a sub-project and all of them can be stored in project table. With group module open to every base table, some folk may end up using grouping stocks using group table which will make sharing chado-related code very difficult..

We could start with absolutely necessary tables - (eg. featuregrp) and then have further discussions about the necessity of grouping other tables..

Sook

On Fri, Dec 5, 2014 at 3:07 PM, Stephen Ficklin <[hidden email]> wrote:
Hi Mara,

Oh, I missed that there would also be a linking_table field in the submodule table as well.  I like that better that what I suggested.

Stephen


On 12/5/2014 2:35 PM, Mara Kim wrote:
@Stephen:

I don't see the benefit to having the grpmemberlinker as it the resulting grpmember-grpmemberlinker structure is equivalent to a singular grpmember table in Boyce-Codd Normal Form, since the grpmemberlinker table has a unique constraint on grpmember_id. The solution I was imagining is essentially this:

grpmember
---------------
grpmember_id
grp_id
type_id
rank
linking_table (contains the name of the table where the group member is found)

xxxx_grpmember  (links to the group member table (e.g. feature_grpmember, organism_grpmember)).
----------------------
xxxx_grpmember_id
xxxx_id
grpmember_id
linking_table DEFAULT "xxx" CHECK (linking_table = "xxx")
FOREIGN KEY (grpmember_id,linking_table) REFERENCES grpmember(grpmember_id,linking_table)


@Sook:

At one point the was a design where we had grp tables in the style of featuregrp, organismgrp, etc.  The reason why we decided not to have xxxgrp tables was to avoid the multiplicative effect that would result from creating xxxgrp_synonym, xxxgrpprop, xxxgrp_pub, etc. for every single type of group.  It would require the cross product of every tagging table for groups with every table type that could form a group.


On Thu, Dec 4, 2014 at 3:14 PM, Sook Jung <[hidden email]> wrote:
Hi,
Sorry for asking this now but I'm just wondering why not just have grp_feature, grp_organism, etc? Is that because there is a case where someone want to add a feature and an organism into the same group?
Sook

On Thu, Dec 4, 2014 at 4:02 PM, Stephen Ficklin <[hidden email]> wrote:
Thanks Mara and Karl for the responses.   I think your idea, Mara, would work. I'm at a loss for a more elegant solution.  At least if we had a 'linked_table' column the knowledge of how to join is there.  And even thought it would take multiple queries to do the join (one to figure which tables to join and a second (or more) to do all the necessary joining) at least I could do it programmatically.

To help solve both problems of say an organism and a feature sharing the same grpmember and to knowing what table to link to... what if we added an intermediary table named 'grpmemberlinker'.  This table will have the 'linked_table' field and an FK relationship with the 'grpmember' table.  It will also have a unique constraint on the grpmember_id.  This will only allow that grpmember_id to be linked to a record in only one table. Then in the "submodule" linking tables we use the 'grpmemberlinker_id' rather than the 'grpmember_id' (Example schema below).  It adds an extra join but would perhaps help with both problems?

grpmember
---------------
grpmember_id
grp_id
type_id
rank

grpmemberlinker
------------------------
grpmemberlinker_id
grpmember_id  (unique constraint)
linking_table (contains the name of the table where the group member is found)

xxxx_grpmember  (links to the group member table (e.g. feature_grpmember, organism_grpmember)).
----------------------
xxxx_grpmember_id
xxxx_id
grpmemberlinker_id  (uses the grpmemberlinker_id rather than the grpmember_id).



On 12/4/2014 3:10 PM, Mara Kim wrote:
This was a point of contention during our discussions.  The linker tables are necessary to preserve referential integrity with foreign key constraints, but also allows an organism and a feature to be linked to the same grpmember which seems wrong.  This means that the goal of referential integrity is at odds with relationship multiplicity, and the consensus during our conferences was that referential integrity was more important. 

We could partially solve this by using the information_schema to do introspection, and record the table name for each grpmember as a column (ex. `linked_table`).  Then in each linker table, make a `linked_table` table with a CHECK constraint to force it to contain the linked table name, and then use a FK constraint on grpmember_id and linked_table.  That *should* work, but its a pretty ugly hack.

On Thu, Dec 4, 2014 at 1:38 PM, Karl O. Pinc <[hidden email]> wrote:
Hi Stephen,

My 2 cents.

On 12/04/2014 12:35:33 PM, Stephen Ficklin wrote:

> Suppose, I am exploring a populated instance of the group module. I
> can
> get the list of members from the grpmember table and I can classify
> them
> by their type_id, but I don't know which table to join on to get
> further
> information (e.g. feature_grpmember -> feature).   It seems the group
> module requires a bit of human knowledge to know which tables to find
> information about group members.  I do not remember the mechanism we
> decided to know which table to join on?   Did we have a suggested
> method
> or did we expect that knowledge be built into the application to know
> which tables to join?  If the latter perhaps we should revisit the
> schema as I think it would be best if the schema provided all
> information about data relationships.

I don't know anything about the group module, but coming from
a database perspective the answer is that if you want to find
out if there's data on the other side of a potential join
the way to find out is to do the join.  To do otherwise
is to invite trouble by storing the same data in two places.

In the classic case where a person can have multiple addresses
you'd join the people table to the addresses table to count
whether and how many addresses a person has.  You wouldn't
also have an "address_count" column on the people table.

Of course you could make a view, materialized or otherwise,
that looks like the people table but also has an address_count
column.  But this is different from actually encoding
in the schema what's going on with the join.

Regards,

Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema



--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN



------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema




--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN



------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema




------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema




--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema



------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Question on Chado Group Module

Mara Kim-3

Kathleen, you bring up some good points. I am inclined to agree with you, especially about the comments. What are your thoughts Stephen? Do you think Tripal could just use the prop table?

Mara,

I would suggest storing comments in the grpprop table. FlyBase has stored several 'types' of comments in the featureprop. We are counting on storing comments in grpprop.
 Please could someone give a use case for date in the grp table? What date would be stored and would it be updated, when/why? Our specs for the first kind of use of the grp module (GeneGroup) calls for a date value (mmddyyyy) to be stored in grpprop. 
FlyBase actually uses an audit table to track inserts, updates deletes including date for most tables. There are populated by triggers. 

Cheers,

Kathleen 

On Fri, Dec 5, 2014 at 5:29 PM, Mara Kim <[hidden email]> wrote:
As of Release Candidate 2, there is a grp_synonym table, as well as an is_obsolete column on the grp table.

I am almost ready to publish RC3.  All future versions (including RC3) will include update commands to allow upgrades from any previous release candidate except RC1.  If someone is still on RC1 and needs an upgrade script, I will work on one, otherwise I'll assume there isn't a need.

Features slated for RC3:
grp_contact table
two new columns to the grp table: comment and date
link_table field in grpmember and xxxx_grpmember tables with FK grpmember(grpmember_id, link_table)

On Fri, Dec 5, 2014 at 4:15 PM, Stephen Ficklin <[hidden email]> wrote:
Thanks all for the discussion.  Mara,  there has been some discussions in the past about changes to the group module.  Here's a summary:

1) Add a grp_synonym table
2) Add two new columns to the grp table: comment and date? or should those go to the grpprop table?
3) Add is_obsolete (boolean, not null default false) to the grp table
4) Add a link_table field to the grpmember and xxxx_grpmember tables with FK (grpmember_id, link_table) in the xxx_grpmember table.

In the future I want to explore using it for storing networks, and it sounds like the Flybase folks are about ready to use it.  Which once someone starts using it, it makes it harder to adjust later.   So, I'm just wondering so that I can prepare for the future, if these changes will be added ?

Thanks,
Stephen


On 12/5/2014 3:50 PM, Andy Schroeder wrote:
FlyBase is already planning on going with the group module as Mara has specified it for the reasons she described.  I don't see the incompatibility or conflict of the _relationship example for hierarchical organization and the grp for grouping.  The nice thing about groups is that there is no direction or hierarchy specification.  An in fact one can have a hierarchy of groups using grp_relationship, which we will be making extensive use of.  Hopefully the hypothetical shared code would be generic and configurable enough to account for the implementation needs of the user and their desired implementation.

cheers,
Andy

cheers,
Andy

On Fri, Dec 5, 2014 at 3:20 PM, Sook Jung <[hidden email]> wrote:
Hi Mara,

You may have difficult time working today answering two different email thread :)

I am a little bit cautious about making the group module open to any base table.. One of the main idea of Natural Diversity Module was to use 'stock' and 'project' table to store hierarchical entries using '_relationship' table (following the characteristics of feature table). 

So a sample of a germplasm, germplasm and population can be stock entries. A project can have a sub-project and all of them can be stored in project table. With group module open to every base table, some folk may end up using grouping stocks using group table which will make sharing chado-related code very difficult..

We could start with absolutely necessary tables - (eg. featuregrp) and then have further discussions about the necessity of grouping other tables..

Sook

On Fri, Dec 5, 2014 at 3:07 PM, Stephen Ficklin <[hidden email]> wrote:
Hi Mara,

Oh, I missed that there would also be a linking_table field in the submodule table as well.  I like that better that what I suggested.

Stephen


On 12/5/2014 2:35 PM, Mara Kim wrote:
@Stephen:

I don't see the benefit to having the grpmemberlinker as it the resulting grpmember-grpmemberlinker structure is equivalent to a singular grpmember table in Boyce-Codd Normal Form, since the grpmemberlinker table has a unique constraint on grpmember_id. The solution I was imagining is essentially this:

grpmember
---------------
grpmember_id
grp_id
type_id
rank
linking_table (contains the name of the table where the group member is found)

xxxx_grpmember  (links to the group member table (e.g. feature_grpmember, organism_grpmember)).
----------------------
xxxx_grpmember_id
xxxx_id
grpmember_id
linking_table DEFAULT "xxx" CHECK (linking_table = "xxx")
FOREIGN KEY (grpmember_id,linking_table) REFERENCES grpmember(grpmember_id,linking_table)


@Sook:

At one point the was a design where we had grp tables in the style of featuregrp, organismgrp, etc.  The reason why we decided not to have xxxgrp tables was to avoid the multiplicative effect that would result from creating xxxgrp_synonym, xxxgrpprop, xxxgrp_pub, etc. for every single type of group.  It would require the cross product of every tagging table for groups with every table type that could form a group.


On Thu, Dec 4, 2014 at 3:14 PM, Sook Jung <[hidden email]> wrote:
Hi,
Sorry for asking this now but I'm just wondering why not just have grp_feature, grp_organism, etc? Is that because there is a case where someone want to add a feature and an organism into the same group?
Sook

On Thu, Dec 4, 2014 at 4:02 PM, Stephen Ficklin <[hidden email]> wrote:
Thanks Mara and Karl for the responses.   I think your idea, Mara, would work. I'm at a loss for a more elegant solution.  At least if we had a 'linked_table' column the knowledge of how to join is there.  And even thought it would take multiple queries to do the join (one to figure which tables to join and a second (or more) to do all the necessary joining) at least I could do it programmatically.

To help solve both problems of say an organism and a feature sharing the same grpmember and to knowing what table to link to... what if we added an intermediary table named 'grpmemberlinker'.  This table will have the 'linked_table' field and an FK relationship with the 'grpmember' table.  It will also have a unique constraint on the grpmember_id.  This will only allow that grpmember_id to be linked to a record in only one table. Then in the "submodule" linking tables we use the 'grpmemberlinker_id' rather than the 'grpmember_id' (Example schema below).  It adds an extra join but would perhaps help with both problems?

grpmember
---------------
grpmember_id
grp_id
type_id
rank

grpmemberlinker
------------------------
grpmemberlinker_id
grpmember_id  (unique constraint)
linking_table (contains the name of the table where the group member is found)

xxxx_grpmember  (links to the group member table (e.g. feature_grpmember, organism_grpmember)).
----------------------
xxxx_grpmember_id
xxxx_id
grpmemberlinker_id  (uses the grpmemberlinker_id rather than the grpmember_id).



On 12/4/2014 3:10 PM, Mara Kim wrote:
This was a point of contention during our discussions.  The linker tables are necessary to preserve referential integrity with foreign key constraints, but also allows an organism and a feature to be linked to the same grpmember which seems wrong.  This means that the goal of referential integrity is at odds with relationship multiplicity, and the consensus during our conferences was that referential integrity was more important. 

We could partially solve this by using the information_schema to do introspection, and record the table name for each grpmember as a column (ex. `linked_table`).  Then in each linker table, make a `linked_table` table with a CHECK constraint to force it to contain the linked table name, and then use a FK constraint on grpmember_id and linked_table.  That *should* work, but its a pretty ugly hack.

On Thu, Dec 4, 2014 at 1:38 PM, Karl O. Pinc <[hidden email]> wrote:
Hi Stephen,

My 2 cents.

On 12/04/2014 12:35:33 PM, Stephen Ficklin wrote:

> Suppose, I am exploring a populated instance of the group module. I
> can
> get the list of members from the grpmember table and I can classify
> them
> by their type_id, but I don't know which table to join on to get
> further
> information (e.g. feature_grpmember -> feature).   It seems the group
> module requires a bit of human knowledge to know which tables to find
> information about group members.  I do not remember the mechanism we
> decided to know which table to join on?   Did we have a suggested
> method
> or did we expect that knowledge be built into the application to know
> which tables to join?  If the latter perhaps we should revisit the
> schema as I think it would be best if the schema provided all
> information about data relationships.

I don't know anything about the group module, but coming from
a database perspective the answer is that if you want to find
out if there's data on the other side of a potential join
the way to find out is to do the join.  To do otherwise
is to invite trouble by storing the same data in two places.

In the classic case where a person can have multiple addresses
you'd join the people table to the addresses table to count
whether and how many addresses a person has.  You wouldn't
also have an "address_count" column on the people table.

Of course you could make a view, materialized or otherwise,
that looks like the people table but also has an address_count
column.  But this is different from actually encoding
in the schema what's going on with the join.

Regards,

Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema



--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN



------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema




--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN



------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema




------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema




--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema



------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Question on Chado Group Module

Stephen Ficklin-3
Hi Mara,

Tripal could handle it either way.  Some of the "base" tables do have a comments/description field (e.g. assay, analysis, organism, stock, project, study), but some don't (e.g. library, feature, contact).  And amongst those that do it's inconsistently defined ('character varying(255)' or 'text').   So, we've had to deal with them on a case-by-case basis.   I like to keep things consistent, but there doesn't seem to be a rule for use of comments/descriptions in base tables of Chado.   It would definitely be easier to deal with a 'description' field in the 'grp' table for Tripal. I would just recommend it be set as a 'text' field and not a 'character varying' (if it isn't already).  If we do put a 'description' field with the 'grp' table but folks store multiple comments in the 'grprop' table then Tripal can handle both cases. 

I don't know what the date field would be for.  Some of the tables have 'timeaccessioned' or 'timelastmodified'.  There isn't a consistent application of those fields across tables.  And when they are used, they have specific names for 'datetime' fields (e.g. assay.assaydate, analysis.timeexecuted, acquisition.acquisitiondate, quantification.quantificationdate).  So, for the sake of trying to be as consistent as possible, I think if we want a 'date' field in the 'grp' table that we name it to identify what it is used for.  I think one disadvantage for storing a date in the prop table is it makes it more difficult to compare dates within an SQL query.  So, if we have a use-case for it that's ubiquitous then I think it makes sense to give it a meaningful name and put it in the 'grp' table. 

Stephen

On 12/6/2014 2:40 PM, Mara Kim wrote:

Kathleen, you bring up some good points. I am inclined to agree with you, especially about the comments. What are your thoughts Stephen? Do you think Tripal could just use the prop table?

Mara,

I would suggest storing comments in the grpprop table. FlyBase has stored several 'types' of comments in the featureprop. We are counting on storing comments in grpprop.
 Please could someone give a use case for date in the grp table? What date would be stored and would it be updated, when/why? Our specs for the first kind of use of the grp module (GeneGroup) calls for a date value (mmddyyyy) to be stored in grpprop. 
FlyBase actually uses an audit table to track inserts, updates deletes including date for most tables. There are populated by triggers. 

Cheers,

Kathleen 

On Fri, Dec 5, 2014 at 5:29 PM, Mara Kim <[hidden email]> wrote:
As of Release Candidate 2, there is a grp_synonym table, as well as an is_obsolete column on the grp table.

I am almost ready to publish RC3.  All future versions (including RC3) will include update commands to allow upgrades from any previous release candidate except RC1.  If someone is still on RC1 and needs an upgrade script, I will work on one, otherwise I'll assume there isn't a need.

Features slated for RC3:
grp_contact table
two new columns to the grp table: comment and date
link_table field in grpmember and xxxx_grpmember tables with FK grpmember(grpmember_id, link_table)

On Fri, Dec 5, 2014 at 4:15 PM, Stephen Ficklin <[hidden email]> wrote:
Thanks all for the discussion.  Mara,  there has been some discussions in the past about changes to the group module.  Here's a summary:

1) Add a grp_synonym table
2) Add two new columns to the grp table: comment and date? or should those go to the grpprop table?
3) Add is_obsolete (boolean, not null default false) to the grp table
4) Add a link_table field to the grpmember and xxxx_grpmember tables with FK (grpmember_id, link_table) in the xxx_grpmember table.

In the future I want to explore using it for storing networks, and it sounds like the Flybase folks are about ready to use it.  Which once someone starts using it, it makes it harder to adjust later.   So, I'm just wondering so that I can prepare for the future, if these changes will be added ?

Thanks,
Stephen


On 12/5/2014 3:50 PM, Andy Schroeder wrote:
FlyBase is already planning on going with the group module as Mara has specified it for the reasons she described.  I don't see the incompatibility or conflict of the _relationship example for hierarchical organization and the grp for grouping.  The nice thing about groups is that there is no direction or hierarchy specification.  An in fact one can have a hierarchy of groups using grp_relationship, which we will be making extensive use of.  Hopefully the hypothetical shared code would be generic and configurable enough to account for the implementation needs of the user and their desired implementation.

cheers,
Andy

cheers,
Andy

On Fri, Dec 5, 2014 at 3:20 PM, Sook Jung <[hidden email]> wrote:
Hi Mara,

You may have difficult time working today answering two different email thread :)

I am a little bit cautious about making the group module open to any base table.. One of the main idea of Natural Diversity Module was to use 'stock' and 'project' table to store hierarchical entries using '_relationship' table (following the characteristics of feature table). 

So a sample of a germplasm, germplasm and population can be stock entries. A project can have a sub-project and all of them can be stored in project table. With group module open to every base table, some folk may end up using grouping stocks using group table which will make sharing chado-related code very difficult..

We could start with absolutely necessary tables - (eg. featuregrp) and then have further discussions about the necessity of grouping other tables..

Sook

On Fri, Dec 5, 2014 at 3:07 PM, Stephen Ficklin <[hidden email]> wrote:
Hi Mara,

Oh, I missed that there would also be a linking_table field in the submodule table as well.  I like that better that what I suggested.

Stephen


On 12/5/2014 2:35 PM, Mara Kim wrote:
@Stephen:

I don't see the benefit to having the grpmemberlinker as it the resulting grpmember-grpmemberlinker structure is equivalent to a singular grpmember table in Boyce-Codd Normal Form, since the grpmemberlinker table has a unique constraint on grpmember_id. The solution I was imagining is essentially this:

grpmember
---------------
grpmember_id
grp_id
type_id
rank
linking_table (contains the name of the table where the group member is found)

xxxx_grpmember  (links to the group member table (e.g. feature_grpmember, organism_grpmember)).
----------------------
xxxx_grpmember_id
xxxx_id
grpmember_id
linking_table DEFAULT "xxx" CHECK (linking_table = "xxx")
FOREIGN KEY (grpmember_id,linking_table) REFERENCES grpmember(grpmember_id,linking_table)


@Sook:

At one point the was a design where we had grp tables in the style of featuregrp, organismgrp, etc.  The reason why we decided not to have xxxgrp tables was to avoid the multiplicative effect that would result from creating xxxgrp_synonym, xxxgrpprop, xxxgrp_pub, etc. for every single type of group.  It would require the cross product of every tagging table for groups with every table type that could form a group.


On Thu, Dec 4, 2014 at 3:14 PM, Sook Jung <[hidden email]> wrote:
Hi,
Sorry for asking this now but I'm just wondering why not just have grp_feature, grp_organism, etc? Is that because there is a case where someone want to add a feature and an organism into the same group?
Sook

On Thu, Dec 4, 2014 at 4:02 PM, Stephen Ficklin <[hidden email]> wrote:
Thanks Mara and Karl for the responses.   I think your idea, Mara, would work. I'm at a loss for a more elegant solution.  At least if we had a 'linked_table' column the knowledge of how to join is there.  And even thought it would take multiple queries to do the join (one to figure which tables to join and a second (or more) to do all the necessary joining) at least I could do it programmatically.

To help solve both problems of say an organism and a feature sharing the same grpmember and to knowing what table to link to... what if we added an intermediary table named 'grpmemberlinker'.  This table will have the 'linked_table' field and an FK relationship with the 'grpmember' table.  It will also have a unique constraint on the grpmember_id.  This will only allow that grpmember_id to be linked to a record in only one table. Then in the "submodule" linking tables we use the 'grpmemberlinker_id' rather than the 'grpmember_id' (Example schema below).  It adds an extra join but would perhaps help with both problems?

grpmember
---------------
grpmember_id
grp_id
type_id
rank

grpmemberlinker
------------------------
grpmemberlinker_id
grpmember_id  (unique constraint)
linking_table (contains the name of the table where the group member is found)

xxxx_grpmember  (links to the group member table (e.g. feature_grpmember, organism_grpmember)).
----------------------
xxxx_grpmember_id
xxxx_id
grpmemberlinker_id  (uses the grpmemberlinker_id rather than the grpmember_id).



On 12/4/2014 3:10 PM, Mara Kim wrote:
This was a point of contention during our discussions.  The linker tables are necessary to preserve referential integrity with foreign key constraints, but also allows an organism and a feature to be linked to the same grpmember which seems wrong.  This means that the goal of referential integrity is at odds with relationship multiplicity, and the consensus during our conferences was that referential integrity was more important. 

We could partially solve this by using the information_schema to do introspection, and record the table name for each grpmember as a column (ex. `linked_table`).  Then in each linker table, make a `linked_table` table with a CHECK constraint to force it to contain the linked table name, and then use a FK constraint on grpmember_id and linked_table.  That *should* work, but its a pretty ugly hack.

On Thu, Dec 4, 2014 at 1:38 PM, Karl O. Pinc <[hidden email]> wrote:
Hi Stephen,

My 2 cents.

On 12/04/2014 12:35:33 PM, Stephen Ficklin wrote:

> Suppose, I am exploring a populated instance of the group module. I
> can
> get the list of members from the grpmember table and I can classify
> them
> by their type_id, but I don't know which table to join on to get
> further
> information (e.g. feature_grpmember -> feature).   It seems the group
> module requires a bit of human knowledge to know which tables to find
> information about group members.  I do not remember the mechanism we
> decided to know which table to join on?   Did we have a suggested
> method
> or did we expect that knowledge be built into the application to know
> which tables to join?  If the latter perhaps we should revisit the
> schema as I think it would be best if the schema provided all
> information about data relationships.

I don't know anything about the group module, but coming from
a database perspective the answer is that if you want to find
out if there's data on the other side of a potential join
the way to find out is to do the join.  To do otherwise
is to invite trouble by storing the same data in two places.

In the classic case where a person can have multiple addresses
you'd join the people table to the addresses table to count
whether and how many addresses a person has.  You wouldn't
also have an "address_count" column on the people table.

Of course you could make a view, materialized or otherwise,
that looks like the people table but also has an address_count
column.  But this is different from actually encoding
in the schema what's going on with the join.

Regards,

Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema



--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN



------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema




--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN



------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema




------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema




--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN

------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema




------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema