RFC: Chado release v1.3

Stephen Ficklin-3
Dear Chado User Community,

Representatives from the Tripal (Stephen Ficklin, Lacey Sanderson) and
Chado (Scott Cain) projects have combined efforts to work towards a v1.3
release of Chado.    To do this, we have compiled a list of the
requested changes that we knew about or that were posted to the GMOD
Schema mailing list.   You can find the list on the Google Doc at this link:

https://docs.google.com/document/d/1IZ3VMpIoG1hhpbHYi6rbChImLgrlmbyy7Ewms-EpaeU/edit?usp=sharing

We are requesting comments on the document.   For v1.3 we are proposing
a quick release that will include mostly new linking and property tables
to existing Chado tables (see Google doc for complete list).  If you
have any additional linking tables that you would like to request for
the v1.3 release please make a suggestion so we can add them to the list
for consideration.

Aside from these linking tables, we are considering the following changes
for the v1.3 release.

1)  Add a new 'infraspecific' field to the organism table to allow for
storing the names of subspecies, varieties, subvarieties, forma and
subforma.   However, we would like to know: should the infraspecific
field also be used for storing the names of strains and cultivars?  If so, then
the recommendation would be to store details about individual strains
and cultivars in the Stock module tables. Alternatively, FlyBase has
suggested a separate set of tables for storing strains.   Please comment
on the Google Doc if you have opinions on the best way to
represent/store strains/cultivars in Chado.

2)  The addition of an 'organism_relationship' table that allows for
storing relationships (not taxonomy) between organisms.  An example use
case would be for storing breeding relationships (e.g. sterile_with,
incompatible_with, fertile_with).

3)  Move the 'db' and 'dbxref' tables into a new module called 'DB'.  
This will not require any SQL changes, just a name change in the
documentation.

4) Change 'feature.seqlen' to a bigint to accommodate longer sequences.
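
As a rough illustration only, items 1, 2 and 4 might translate into PostgreSQL DDL along these lines. The column type chosen for 'infraspecific', the layout of 'organism_relationship' (modelled on the subject/object/type pattern of existing Chado relationship tables), the integer key types and the constraint name are all assumptions, not the agreed v1.3 design.

```sql
-- Item 1 (assumed type): a free-text column for the infraspecific name.
ALTER TABLE organism ADD COLUMN infraspecific varchar(1024);

-- Item 2 (assumed layout): subject/type/object, as in other Chado
-- relationship tables such as feature_relationship.
CREATE TABLE organism_relationship (
    organism_relationship_id serial PRIMARY KEY,
    subject_id integer NOT NULL REFERENCES organism (organism_id) ON DELETE CASCADE,
    object_id  integer NOT NULL REFERENCES organism (organism_id) ON DELETE CASCADE,
    type_id    integer NOT NULL REFERENCES cvterm (cvterm_id) ON DELETE CASCADE,
    rank       integer NOT NULL DEFAULT 0,
    CONSTRAINT organism_relationship_c1
        UNIQUE (subject_id, object_id, type_id, rank)
);

-- Item 4: widen seqlen so it can hold lengths beyond the 32-bit integer range.
ALTER TABLE feature ALTER COLUMN seqlen TYPE bigint;
```

Note that PostgreSQL refuses to change the type of a column that a view depends on, so any view that selects feature.seqlen would have to be dropped before the last statement and recreated afterwards.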

The more complex issues we are reserving for a potential v1.4 release
after more discussion is held.

Thanks for any input!
Stephen






------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema

Re: RFC: Chado release v1.3

Lukas Mueller
Hi Stephen,

just to add another, totally unrelated request: could we have timestamps for creation time (and maybe also update time) in the nd_experiment and stock (and possibly project) tables? In breeding programs, people would like to know when, and how many, things have been added…

cheers
Lukas

Re: RFC: Chado release v1.3

Sook Jung
Hi,
I recently also realized that it would be really useful to have timestamps in the tables that Lukas mentioned.
Thanks
Sook

On Mon, Mar 9, 2015 at 10:37 AM, Lukas A. Mueller <[hidden email]> wrote:
> Hi Stephen,
>
> just to add another totally unrelated request: Could we have timestamps for creation time (maybe also update) in the nd_experiment and stock (possibly project) tables? In breeding programs, people would like to know when and how many things have been added…
>
> cheers
> Lukas

Re: RFC: Chado release v1.3

Andrew Farmer
In reply to this post by Stephen Ficklin-3
Hi Stephen,
it's great to see this moving forward!

Here are a couple of additional, very minor changes from our ongoing work with the phylogeny module that I think
would fit the spirit of the proposed v1.3 release:

  • add a phylotreeprop table
  • add an index on phylonode.parent_phylonode_id (we had some serious performance issues with tree deletions
    until the omission of this index was discovered)

Also, unrelated to phylogeny but possibly worth including in the set of linker tables to be added this round:

  • biomaterial_project

We found this useful when representing BioSample/BioProject info taken from NCBI, but we also made some associated
changes to existing tables that may be outside the scope of the v1.3 release:
  • project table
    • add dbxref_id => dbxref (supports NCBI BioProject ids, for example)
    • add type_id => cvterm (supports classification of projects, initially using cv derived from NCBI's BioProject vocabulary)
    • ALTER COLUMN description TYPE text (BioProject descriptions can be long)
  • biomaterial table
    • add stock_id => stock (allows tracking cultivar, etc. for samples; but this may be unnecessary if subspecies info is now going to be included in organism)
    • add project_id => project (links samples to primary projects, as is done in NCBI)
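
A rough PostgreSQL sketch of what DDL for the changes listed above could look like. This is not the DDL used in the work described here: the index name, the integer key types, the ON DELETE behaviour and the layout of biomaterial_project are all assumptions made for illustration.

```sql
-- Index on the parent pointer, so that checks and cascades on the
-- self-referencing key can find child nodes without scanning the table.
CREATE INDEX phylonode_parent_phylonode_id_idx
    ON phylonode (parent_phylonode_id);

-- project table changes.
ALTER TABLE project
    ADD COLUMN dbxref_id integer REFERENCES dbxref (dbxref_id),
    ADD COLUMN type_id   integer REFERENCES cvterm (cvterm_id),
    ALTER COLUMN description TYPE text;

-- biomaterial table changes.
ALTER TABLE biomaterial
    ADD COLUMN stock_id   integer REFERENCES stock (stock_id),
    ADD COLUMN project_id integer REFERENCES project (project_id);

-- Linker table (assumed layout: one row per biomaterial/project pair).
CREATE TABLE biomaterial_project (
    biomaterial_project_id serial PRIMARY KEY,
    biomaterial_id integer NOT NULL
        REFERENCES biomaterial (biomaterial_id) ON DELETE CASCADE,
    project_id     integer NOT NULL
        REFERENCES project (project_id) ON DELETE CASCADE,
    UNIQUE (biomaterial_id, project_id)
);
```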

Hope that is helpful; let us know if you need more info to justify their inclusion, or whatever else (DDL, etc.) would make
it easier for you to get the changes incorporated into v1.3.

thanks again

Andrew Farmer




-- 
...all concepts in which an entire process is semiotically concentrated
elude definition; only that which has no history is definable.

Friedrich Nietzsche

Re: RFC: Chado release v1.3

Karl O. Pinc
In reply to this post by Stephen Ficklin-3
On Mon, 09 Mar 2015 09:20:49 -0400
Stephen Ficklin <[hidden email]> wrote:

> Dear Chado User Community,
>
> Representatives from the Tripal (Stephen Ficklin, Lacey Sanderson)
> and Chado (Scott Cain) projects have combined efforts to work towards
> a v1.3 release of Chado.    To do this, we have compiled a list of
> the requested changes that we knew about or that were posted to the
> GMOD Schema mailing list.   You can find the list on the Google Doc
> at this link:
>
> https://docs.google.com/document/d/1IZ3VMpIoG1hhpbHYi6rbChImLgrlmbyy7Ewms-EpaeU/edit?usp=sharing
>
> We are requesting comments on the document.   For v1.3 we are
> proposing a quick release that will include mostly new linking and
> property tables to existing Chado tables (see Google doc for complete
> list).  If you have any additional linking tables that you would like
> to request for the v1.3 release please make a suggestion so we can
> add them to the list for consideration.

You may wish to consider these indexes.  I've found them
essential to making the queries we run perform well.

-- The effectiveness of this will vary based on
-- whether you have more subjects or objects.
create index feature_relationship_idx1b
  on feature_relationship (object_id, subject_id, type_id);

create index featureloc_idx1b
  on featureloc (feature_id, fmin, fmax);

create index feature_idx1b
  on feature (feature_id, dbxref_id)
  where dbxref_id is not null;

---------------------------

I would like to see the installation process change to
be _way_ more friendly.  The following would be a good
start:

Do _not_ delete tables before installing.  (If the table
already exists the transaction should roll back.)

Do not be so chatty, display only warnings
and errors, not informational messages.
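
A minimal sketch of how a psql install script could behave this way. The table is a made-up placeholder, not part of Chado; the psql variable and the server setting are standard PostgreSQL features.

```sql
-- Stop at the first error instead of carrying on with later statements.
\set ON_ERROR_STOP on

-- Suppress NOTICE and INFO chatter; warnings and errors are still shown.
SET client_min_messages = warning;

BEGIN;

-- No DROP TABLE and no IF NOT EXISTS: if the table is already there,
-- CREATE TABLE raises an error, psql stops, and the open transaction
-- is rolled back when the session ends, leaving existing data untouched.
CREATE TABLE example_module_table (
    example_module_table_id serial PRIMARY KEY,
    name varchar(255) NOT NULL
);

COMMIT;
```

Much the same effect is available from the command line with psql's --quiet and --single-transaction options together with -v ON_ERROR_STOP=1.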

--------------------------

I would like to have it be possible to install
Chado into its own schema.  The first step
for this is to get rid of the multiple
schemas that Chado currently uses.

------------------------------

I would like to see Chado be able to be installed
modularly.  At present this is rather-to-very difficult.
(At least it's difficult if you want only Chado.  I
don't know the process if Tripal is involved.)

The way to make this happen is to remove dependencies
between modules as regards installation, as far as is possible.
You'd do this by doing the following:

Separate each module.sql file into pieces:

  Cascading destruction of the objects in the module.

  Creation of the module's tables.

  Creation of the module's constraints and indexes.

  Creation of the module's views.

  Creation of the module's triggers and functions.

By doing this a user can then create the tables of each
of the desired modules in any order and not have to know
ahead of time which modules require other modules.  No
foreign key constraints get in the way of table
creation.  Afterwards create the constraints and
triggers in any order desired.  Afterwards create the
views.  Wrap the execution of each file in a
transaction.  If an error is raised at any point during
constraint or view creation then you're missing a
required table (really, a module).  Roll back the
transaction and install the tables (and later the
constraints) of the missing modules.  Then re-install
until it works.
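
To make the split concrete, here is a small sketch using cut-down stand-ins for the 'db' and 'dbxref' tables. The column lists are simplified and the constraint and index names are invented for illustration; the real module files would contain the full definitions.

```sql
-- Step 1 ("tables" file): tables only, with no foreign keys, so the
-- statements succeed in any order. dbxref is created before db here
-- on purpose.
BEGIN;
CREATE TABLE dbxref (
    dbxref_id serial PRIMARY KEY,
    db_id     integer NOT NULL,
    accession varchar(255) NOT NULL
);
CREATE TABLE db (
    db_id serial PRIMARY KEY,
    name  varchar(255) NOT NULL
);
COMMIT;

-- Step 2 ("constraints and indexes" file): run once all wanted tables
-- exist. If db were missing, the ALTER TABLE would raise an error and
-- the whole transaction would roll back.
BEGIN;
ALTER TABLE dbxref
    ADD CONSTRAINT dbxref_db_id_fkey
    FOREIGN KEY (db_id) REFERENCES db (db_id) ON DELETE CASCADE;
CREATE INDEX dbxref_db_id_idx ON dbxref (db_id);
COMMIT;
```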

(Note that this assumes that it's possible to notice
that an error is raised during the install.  The present
system is so chatty that it's very easy to miss error
messages.)

This can be automated and much improved, if you want to
get fancy.  The design proposed here also solves the
problem of "linking tables" (e.g., analysis_feature) as
regards determining in which module such tables belong.
Linking tables and views would not belong to any module
but would be installed on an as-requirements-are-met
basis.  There is also the advantage, once setup, of very
little on-going maintenance.

(In my opinion if you're going to go through the work of
re-structuring the DDL statements you may as well go all
the way.  I believe that restructuring the existing DDL
files will be more labor intensive than the programming
required, although this may be optimistic.)

The first goal is to make it easy for a program to tell
what tables and views exist in each module, and what
object has triggers, constraints and indexes.  The idea
is to reflect this information in file/directory names
within a modified version of the existing
one-module-per-directory structure.  Likewise the
proposed directory structure reveals which tables are
linking tables.

Instead of having a single file for all tables in a
module, separate out the linking tables from the regular
tables.  Put each of the CREATE TABLE statements for the
regular tables into files, one per table, in a "tables"
directory within the directory for the module.
Throughout the design proposed here each file would have
a name that is the name of the table it creates (or
creates constraints and indexes for, or creates triggers
for, etc.).  Likewise, within each module there must be
a directory containing per-table files holding each
table's constraints and indexes, and an analogous
structure for triggers.  Put each of the CREATE TABLE
statements for the linking tables into a module-level
directory, shared by all modules, with one file per
linking table.  Do the same for the linking tables'
constraints and indexes, etc.  Put all the views in all
modules into a single view directory for all modules, as
with the linking tables.  The views would be defined
one-per-file and each file name would be the name of the
view.  Note that views may have triggers.  There would
be another directory for these.

After installing all of Chado into a test db Postgres'
introspection can be used to determine which tables have
non-null columns containing foreign keys, including
which linking tables require which other tables, and
which views require which other tables or views.
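
One possible way to do this introspection, sketched against the standard PostgreSQL catalogs. The queries are an illustration of the idea rather than a tested part of any Chado tooling, and they report every schema in the database unless a schema filter is added.

```sql
-- Tables whose NOT NULL columns hold a foreign key to another table.
SELECT con.conrelid::regclass  AS dependent_table,
       att.attname             AS fk_column,
       con.confrelid::regclass AS required_table
FROM pg_constraint AS con
JOIN pg_attribute  AS att
  ON att.attrelid = con.conrelid
 AND att.attnum   = ANY (con.conkey)
WHERE con.contype = 'f'                -- foreign key constraints only
  AND att.attnotnull                   -- the referencing column is NOT NULL
  AND con.conrelid <> con.confrelid    -- ignore self-references
ORDER BY con.conrelid::regclass::text,
         con.confrelid::regclass::text,
         att.attname;

-- Tables and views that each view reads from.
SELECT view_schema, view_name, table_schema, table_name
FROM information_schema.view_table_usage
ORDER BY view_schema, view_name, table_schema, table_name;
```

Be aware that information_schema.view_table_usage only lists objects owned by a role the current user has enabled, so the second query is best run as the owner of the Chado objects.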

Note that installing all of Chado is easy.  Since all
the tables will exist, the files which comprise the
totality of each step, table creation, index and
constraint creation, etc., can be installed in any
order.  The only un-addressed problem is views that
depend on other views.  But this is only a problem for
the developers of Chado, need only be solved once and
that can be on an ad-hoc basis.  It is not a problem for
the user who wants to install individual Chado modules.

Because it is now known what db objects require other db
objects a program that installs modules can ensure that
the tables in pre-requisite modules are installed.  It
would know that a module is a pre-requisite of another
if the latter contains a table with a non-NULL column
storing a foreign key of the former.  Likewise, an
installation program can look through all the linking
tables that exist, and all the views that exist, and
determine if the prerequisites are met for the
installation of any given linking table or view, and
install said linking table or view when and only when
its prerequisites are met.

In summary, module installation should then be as simple
as giving an install program a module name.  I've not
thought through module deletion, but it seems like
deletion could also be as straightforward for the user.
(Since creation and deletion would be easy, a
deletion/creation cycle provides a handy way of removing
all content from a module -- working around Chado's
cascading deletes that could impact data in other
modules.  This could provide a way to "do-over" the
loading of data into an unfamiliar module.)

Note that the db introspection does not have to occur at
the time of install.  The results of introspection on a
complete Chado install can be cached at development time
and distributed as part of the Chado distribution.
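
A sketch of how such a cache might be stored and queried. The table_dependency table and its sample rows are hypothetical; the rows are meant only to resemble a few real Chado dependencies.

```sql
-- Hypothetical cache of "table X needs table Y" edges, generated at
-- development time and shipped with the distribution.
CREATE TABLE table_dependency (
    dependent_table text NOT NULL,
    required_table  text NOT NULL,
    PRIMARY KEY (dependent_table, required_table)
);

INSERT INTO table_dependency (dependent_table, required_table) VALUES
    ('featureloc', 'feature'),
    ('feature',    'organism'),
    ('feature',    'cvterm'),
    ('cvterm',     'cv'),
    ('cvterm',     'dbxref'),
    ('dbxref',     'db');

-- Everything that must exist before featureloc can be installed.
-- UNION (rather than UNION ALL) discards rows already seen, so the
-- recursion also terminates if the cached edges contain a cycle.
WITH RECURSIVE prereq (table_name) AS (
    SELECT required_table
    FROM table_dependency
    WHERE dependent_table = 'featureloc'
  UNION
    SELECT d.required_table
    FROM table_dependency AS d
    JOIN prereq AS p ON d.dependent_table = p.table_name
)
SELECT table_name FROM prereq ORDER BY table_name;
```

An install program could run the same query for each table of a requested module and install whatever is missing first.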

Triggers (functions really) are the only difficult part.
In the case of the "simple version" and manual piecemeal
module install, above, since triggers and functions
don't raise errors until they are used you won't know
that you're missing a dependent module.  As regards a
more complete revamp of the installation system there's
no way to use introspection to determine what
tables/views a given trigger requires.  So, triggers
require the same sort of hack that's used now for whole
modules, a manually written list of the tables and views
the trigger uses, to inform the installation system of
each trigger's prerequisites.

Fortunately, there don't seem to be many triggers.

Regards,

Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

Re: RFC: Chado release v1.3

Karl O. Pinc
On Wed, 11 Mar 2015 11:49:51 -0500
"Karl O. Pinc" <[hidden email]> wrote:

> You may wish to consider these indexes.  I've found them
> essential to making the queries we run perform well.

Note that I chose index names so as not to collide with
official Chado index names.  You will want to change
the index names from what was written.

> --------------------------
>
> I would like to have it be possible to install
> Chado into it's own schema.  The first step
> for this is to get rid of the multiple
> schemas that Chado currently uses.

A second step might be to have chado install into
a schema named "chado".

Right now it installs, IIRC,
some stuff directly into "public", other stuff into
whatever happens to be at the front of the
search_path, and still other stuff into a couple of
other schemas whose names are hardcoded.

Of course a recommendation
(or default change) to frob the default search path
to contain "chado" would be required.
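
A minimal sketch of that setup. The database name my_chado_db and the example table are placeholders.

```sql
-- A dedicated schema to hold all Chado objects.
CREATE SCHEMA chado;

-- Make it the default for new sessions on this database (placeholder name).
ALTER DATABASE my_chado_db SET search_path = chado, public;

-- The database-level default only applies to new connections, so set it
-- for the current session as well.
SET search_path = chado, public;

-- Unqualified names are now created in the first schema on the
-- search_path, so this table lands in chado rather than public.
CREATE TABLE example_table (
    example_table_id serial PRIMARY KEY
);
```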


Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

Re: RFC: Chado release v1.3

Siddhartha Basu
In reply to this post by Stephen Ficklin-3
Hi,
Thanks for taking this step. Really appreciate that.

On Mon, 09 Mar 2015, Stephen Ficklin wrote:

> Dear Chado User Community,
>
> Representatives from the Tripal (Stephen Ficklin, Lacey Sanderson) and
> Chado (Scott Cain) projects have combined efforts to work towards a v1.3
> release of Chado.    To do this, we have compiled a list of the
> requested changes that we knew about or that were posted to the GMOD
> Schema mailing list.   You can find the list on the Google Doc at this link:
>
> https://docs.google.com/document/d/1IZ3VMpIoG1hhpbHYi6rbChImLgrlmbyy7Ewms-EpaeU/edit?usp=sharing
>
> We are requesting comments on the document.   For v1.3 we are proposing
> a quick release that will include mostly new linking and property tables
> to existing Chado tables (see Google doc for complete list).  If you
> have any additional linking tables that you would like to request for
> the v1.3 release please make a suggestion so we can add them to the list
> for consideration.
>
> Aside from these linking tables we are considering the following changes
> to the v1.3 release.
>
> 1)  Add a new 'infraspecific' field for the organism table to allow for
> storing the names of subspecies, varieties, subvarieties, forma and
> subforma.   However, we would like to know.... should the infraspecific
> field be used for storing names of strains and cultivars?  If so, then
> the recommendation would be to store details about individual strains
> and cultivars in the Stock module tables. Alternatively, FlyBase has
> suggested a separate set of tables for storing strains.   Please comment
> on the Google Doc if you have opinions on the best way to
> represent/store strains/cultivars in Chado.
We do need to store the name of our strain, which is what actually ties it to
the genome, as that is what got sequenced and what we have all the annotations for.
For example: Dictyostelium discoideum AX4, Dictyostelium
discoideum AX2, Dictyostelium discoideum NC4, etc. At present I append
the strain to the species. Is this change designed to take care of this
limitation (having a separate column instead of stringifying)? And no,
using the Stock module does not address this problem.


>
> 2)  The addition of an 'organism_relationship' table that allows for
> storing relationships (not taxonomy) between organisms.  An example use
> case would be for storing breeding relationships (e.g. sterile_with,
> incompatible_with, fertile_with).
Seems reasonable to me, as long as I can also store arbitrary relationships.
>
> 3)  Move the 'db' and 'dbxref' tables into a new module called 'DB'.  
> This will not require any SQL changes, just a name change in the
> documentation.
>
> 4) Change 'feature.seqlen' to a bigint to accommodate longer sequences.
Great.

One more thing that comes to mind is to have a datetime column in
most of the central tables, for example in the pub table. It simply allows me to
track the state of a row without making my own changes to the core tables or
adding additional linking tables and application logic. The idea is
similar to what the Ruby on Rails framework adds to every table once you run
the migration through it(date_created, date_updated).
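
For illustration, such columns could be added to the pub table and kept current as sketched below. The column names follow the ones mentioned above; the function and trigger names are invented, and the whole thing is a suggestion rather than an agreed schema change.

```sql
-- Creation and last-update timestamps, filled in automatically on insert.
ALTER TABLE pub
    ADD COLUMN date_created timestamp with time zone NOT NULL DEFAULT now(),
    ADD COLUMN date_updated timestamp with time zone NOT NULL DEFAULT now();

-- Generic trigger function: stamp the row whenever it is updated.
CREATE FUNCTION set_date_updated() RETURNS trigger AS $$
BEGIN
    NEW.date_updated := now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Attach it to pub; the same function could be reused for other tables
-- that carry a date_updated column.
CREATE TRIGGER pub_set_date_updated
    BEFORE UPDATE ON pub
    FOR EACH ROW EXECUTE PROCEDURE set_date_updated();
```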

thanks,
-siddhartha



>
> The more complex issues we are reserving for a potential v1.4 release
> after more discussion is held.
>
> Thanks for any input!
> Stephen

Re: RFC: Chado relase v1.3

Siddhartha Basu
One more thing: is there a timeline for the v1.3 release?
And what will the versioning scheme look like? Will it use semantic
versioning or something arbitrary?

-siddhartha



Re: RFC: Chado relase v1.3

Siddhartha Basu
In reply to this post by Karl O. Pinc
Hi,
Some great feedback from Karl; it will take a while to process everything.
From a quick look, there are a few things I would like to agree with and
add to (more might come later):

* We need a default/official way to manage version upgrades.
* Allow installing into a schema of choice (as Karl suggested).
* Decouple the schema from data-loading software. It's just a database
  schema, so release only the packaged DDL. If we have version management
  software, that piece could be shipped along with it, or people could
  easily do that using a package management file. Something like this:

  * To just install Chado: download the tarball, untar it, and run psql
    on an SQL file.

that's all for now.

thanks,
-siddhartha



On Wed, 11 Mar 2015, Karl O. Pinc wrote:

> On Mon, 09 Mar 2015 09:20:49 -0400
> Stephen Ficklin <[hidden email]> wrote:
>
> > Dear Chado User Community,
> >
> > Representatives from the Tripal (Stephen Ficklin, Lacey Sanderson)
> > and Chado (Scott Cain) projects have combined efforts to work towards
> > a v1.3 release of Chado.    To do this, we have compiled a list of
> > the requested changes that we knew about or that were posted to the
> > GMOD Schema mailing list.   You can find the list on the Google Doc
> > at this link:
> >
> > https://docs.google.com/document/d/1IZ3VMpIoG1hhpbHYi6rbChImLgrlmbyy7Ewms-EpaeU/edit?usp=sharing
> >
> > We are requesting comments on the document.   For v1.3 we are
> > proposing a quick release that will include mostly new linking and
> > property tables to existing Chado tables (see Google doc for complete
> > list).  If you have any additional linking tables that you would like
> > to request for the v1.3 release please make a suggestion so we can
> > add them to the list for consideration.
>
> You may wish to consider these indexes.  I've found them
> essential to making the queries we run perform well.
>
> -- The effectiveness of this will vary based on
> -- whether you have more subjects or objects.
> create index feature_relationship_idx1b
>   on feature_relationship (object_id, subject_id, type_id);
>
> create index featureloc_idx1b
>   on featureloc (feature_id, fmin, fmax);
>
> create index feature_idx1b
>   on feature (feature_id, dbxref_id)
>   where dbxref_id is not null;
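
One way to check whether an index like the first one helps on a given
database is to look at the query plan. This is only a sketch: the object_id
value is a placeholder, and the plan chosen depends on table size and
statistics.

```sql
-- If the planner considers it worthwhile, the plan will mention
-- feature_relationship_idx1b instead of a sequential scan.
EXPLAIN
SELECT subject_id, type_id
  FROM feature_relationship
 WHERE object_id = 42;
```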
>
> ---------------------------
>
> I would like to see the installation process change to
> be _way_ more friendly.  The following would be a good
> start:
>
> Do _not_ delete tables before installing.  (If the table
> already exists the transaction should roll back.)
>
> Do not be so chatty, display only warnings
> and errors, not informational messages.
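
Both points can be sketched in a psql script along these lines. The table
is a stand-in, not a Chado table.

```sql
-- psql meta-command: stop the script at the first error.
\set ON_ERROR_STOP on
-- Suppress NOTICE-level messages; warnings and errors are still shown.
SET client_min_messages = warning;

BEGIN;
-- No DROP TABLE beforehand: if example_table already exists, CREATE TABLE
-- raises an error, psql stops, and the open transaction is rolled back.
CREATE TABLE example_table (
    example_table_id serial PRIMARY KEY,
    name text NOT NULL
);
COMMIT;
```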
>
> --------------------------
>
> I would like to have it be possible to install
> Chado into its own schema.  The first step
> for this is to get rid of the multiple
> schemas that Chado currently uses.
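
Once the DDL no longer names schemas itself, installing into a schema of
choice could be as small as the following sketch. The schema name "chado"
is just an example.

```sql
BEGIN;
CREATE SCHEMA chado;
-- Unqualified CREATE statements later in this transaction land in "chado".
SET LOCAL search_path TO chado;
-- ... the Chado DDL would run here ...
COMMIT;
```

Any statement in the DDL that hard-codes a schema name or resets
search_path would bypass this, which is why the single-schema cleanup
comes first.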
>
> ------------------------------
>
> I would like to see Chado be able to be installed
> modularly.  At present this is rather-to-very difficult.
> (At least it's difficult if you want only Chado.  I
> don't know the process if Tripal is involved.)
>
> The way to make this happen is to remove dependencies
> between modules as regards installation, as far as is possible.
> You'd do this by doing the following:
>
> Separate each module.sql file into pieces:
>
>   Cascading destruction of the objects in the module.
>
>   Creation of the module's tables.
>
>   Creation of the module's constraints and indexes.
>
>   Creation of the module's views.
>
>   Creation of the module's triggers and functions.
>
> By doing this a user can then create the tables of each
> of the desired modules in any order and not have to know
> ahead of time which modules require other modules.  No
> foreign key constraints get in the way of table
> creation.  Afterwards create the constraints and
> triggers in any order desired.  Afterwards create the
> views.  Wrap the execution of each file in a
> transaction.  If an error is raised at any point during
> constraint or view creation then you're missing a
> required table (really, a module).  Roll back the
> transaction and install the tables (and later the
> constraints) of the missing modules.  Then re-install
> until it works.
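
The split can be illustrated with two stand-in tables (hypothetical names,
not Chado tables). The first file creates tables without foreign keys, so
order does not matter; the second adds the constraints and indexes.

```sql
-- Tables file: the child is created before its parent without complaint,
-- because no foreign key is declared yet.
BEGIN;
CREATE TABLE example_child (
    example_child_id  serial PRIMARY KEY,
    example_parent_id integer NOT NULL
);
CREATE TABLE example_parent (
    example_parent_id serial PRIMARY KEY
);
COMMIT;

-- Constraints file: raises an error if example_parent was never installed,
-- which aborts the transaction so nothing from this file is kept.
BEGIN;
ALTER TABLE example_child
    ADD CONSTRAINT example_child_parent_fk
    FOREIGN KEY (example_parent_id)
    REFERENCES example_parent (example_parent_id);
CREATE INDEX example_child_idx1 ON example_child (example_parent_id);
COMMIT;
```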
>
> (Note that this assumes that it's possible to notice
> that an error is raised during the install.  The present
> system is so chatty that it's very easy to miss error
> messages.)
>
> This can be automated and much improved, if you want to
> get fancy.  The design proposed here also solves the
> problem of "linking tables" (e.g., analysis_feature) as
> regards determining in which module such tables belong.
> Linking tables and views would not belong to any module
> but would be installed on an as-requirements-are-met
> basis.  There is also the advantage, once set up, of very
> little on-going maintenance.
>
> (In my opinion if you're going to go through the work of
> re-structuring the DDL statements you may as well go all
> the way.  I believe that restructuring the existing DDL
> files will be more labor intensive than the programming
> required, although this may be optimistic.)
>
> The first goal is to make it easy for a program to tell
> what tables and views exist in each module, and what
> object has triggers, constraints and indexes.  The idea
> is to reflect this information in file/directory names
> within a modified version of the existing
> one-module-per-directory structure.  Likewise the
> proposed directory structure reveals which tables are
> linking tables.
>
> Instead of having a single file for all tables in a
> module, separate out the linking tables from the regular
> tables.  Put each of the CREATE TABLE statements for the
> regular tables into files, one per table, in a "tables"
> directory within the directory for the module.
> Throughout the design proposed here each file would have
> a name that is the name of the table it creates (or
> creates constraints and indexes for, or creates triggers
> for, etc.).  Likewise, within each module there must be
> a directory containing per-table files holding each
> table's constraints and indexes, and an analogous
> structure for triggers.  Put each of the CREATE TABLE
> statements for the linking tables into a module-level
> directory, shared by all modules, with one file per
> linking table.  Do the same for the linking tables'
> constraints and indexes, etc.  Put all the views in all
> modules into a single view directory for all modules, as
> with the linking tables.  The views would be defined
> one-per-file and each file name would be the name of the
> view.  Note that views may have triggers.  There would
> be another directory for these.
>
> After installing all of Chado into a test db Postgres'
> introspection can be used to determine which tables have
> non-null columns containing foreign keys, including
> which linking tables require which other tables, and
> which views require which other tables or views.
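
The kind of introspection described could be done with catalog queries
roughly like these. This is a sketch against the standard PostgreSQL
catalogs, run in the test db after a full install.

```sql
-- Tables whose NOT NULL columns hold a foreign key, and the table each
-- one therefore requires.
SELECT con.conrelid::regclass  AS dependent_table,
       con.confrelid::regclass AS required_table,
       att.attname             AS fk_column
  FROM pg_constraint con
  JOIN pg_attribute att
    ON att.attrelid = con.conrelid
   AND att.attnum = ANY (con.conkey)
 WHERE con.contype = 'f'
   AND att.attnotnull
 ORDER BY 1, 2;

-- Tables and views that each view selects from.
SELECT view_schema, view_name, table_schema, table_name
  FROM information_schema.view_table_usage
 ORDER BY view_schema, view_name;
```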
>
> Note that installing all of Chado is easy.  Since all
> the tables will exist, the files which comprise the
> totality of each step, table creation, index and
> constraint creation, etc., can be installed in any
> order.  The only un-addressed problem is views that
> depend on other views.  But this is only a problem for
> the developers of Chado, need only be solved once and
> that can be on an ad-hoc basis.  It is not a problem for
> the user who wants to install individual Chado modules.
>
> Because it is now known what db objects require other db
> objects a program that installs modules can ensure that
> the tables in pre-requisite modules are installed.  It
> would know that a module is a pre-requisite of another
> if the latter contains a table with a non-NULL column
> storing a foreign key of the former.  Likewise, an
> installation program can look through all the linking
> tables that exist, and all the views that exist, and
> determine if the prerequisites are met for the
> installation of any given linking table or view, and
> install said linking table or view when and only when
> its pre-requisites are met.
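
If the introspection results were cached as a table of "module requires
module" rows, an installer could expand a requested module into everything
it needs with a recursive query. The table, its columns, and the module
names below are all hypothetical.

```sql
-- Hypothetical cache of the dependency information.
CREATE TEMPORARY TABLE module_dependency (
    module   text NOT NULL,
    requires text NOT NULL,
    PRIMARY KEY (module, requires)
);

INSERT INTO module_dependency (module, requires) VALUES
    ('module_c', 'module_b'),
    ('module_b', 'module_a');

-- Everything that must be present before module_c is complete.
-- UNION (not UNION ALL) drops duplicates, so the recursion also
-- terminates if the dependency data contains a cycle.
WITH RECURSIVE needed (module) AS (
    SELECT 'module_c'::text
    UNION
    SELECT d.requires
      FROM module_dependency d
      JOIN needed n ON d.module = n.module
)
SELECT module FROM needed;
```

With the rows above, the query returns module_c together with module_b and
module_a. The installer would then create the tables of all three before
adding constraints.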
>
> In summary, module installation should then be as simple
> as giving an install program a module name.  I've not
> thought through module deletion, but it seems like
> deletion could also be as straightforward for the user.
> (Since creation and deletion would be easy, a
> deletion/creation cycle provides a handy way of removing
> all content from a module -- working around Chado's
> cascading deletes that could impact data in other
> modules.  This could provide a way to "do-over" the
> loading of data into an unfamiliar module.)
>
> Note that the db introspection does not have to occur at
> the time of install.  The results of introspection on a
> complete Chado install can be cached at development time
> and distributed as part of the Chado distribution.
>
> Triggers (functions really) are the only difficult part.
> In the case of the "simple version" and manual piecemeal
> module install, above, since triggers and functions
> don't raise errors until they are used you won't know
> that you're missing a dependent module.  As regards a
> more complete revamp of the installation system there's
> no way to use introspection to determine what
> tables/views a given trigger requires.  So, triggers
> require the same sort of hack that's used now for whole
> modules, a manually written list of the tables and views
> the trigger uses, to inform the installation system of
> each trigger's prerequisites.
>
> Fortunately, there don't seem to be many triggers.
>
> Regards,
>
> Karl <[hidden email]>
> Free Software:  "You don't pay back, you pay forward."
>                  -- Robert A. Heinlein
>


Re: RFC: Chado relase v1.3

Karl O. Pinc
In reply to this post by Stephen Ficklin-3
On Mon, 09 Mar 2015 09:20:49 -0400
Stephen Ficklin <[hidden email]> wrote:

> Representatives from the Tripal (Stephen Ficklin, Lacey Sanderson)
> and Chado (Scott Cain) projects have combined efforts to work towards
> a v1.3 release of Chado.

> We are requesting comments...

You could make Papio anubis (baboon) organism_id 13....


Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein
