Avoiding the empty string in text

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Avoiding the empty string in text

Karl O. Pinc
Hi,

I notice that Chado sometimes defaults text
columns to ''.  There is, IMO, a better way to
do this.  (NULL is a better choice.)

Appended is an example patch.

Note that this works back to PG 7.2.  Internally
it is (roughly) equivalent to the current code since
creating a unique constraint makes an index
anyway.


Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein


-----------------<snip>-------------------

--- /tmp/general.sql 2014-06-26 12:09:14.534790558 -0400
+++ /tmp/general.sql-new 2014-06-26 12:33:26.872864761 -0400
@@ -55,13 +55,18 @@
     db_id int not null,
     foreign key (db_id) references db (db_id) on delete cascade
INITIALLY DEFERRED,
     accession varchar(255) not null,
-    version varchar(255) not null default '',
-    description text,
-    constraint dbxref_c1 unique (db_id,accession,version)
+    version varchar(255),
+    description text
 );
 create index dbxref_idx1 on dbxref (db_id);
 create index dbxref_idx2 on dbxref (accession);
 create index dbxref_idx3 on dbxref (version);
+create unique index dbxref_idx4
+       on dbxref (db_id,accession,version)
+       where version is not null;
+create unique index dbxref_idx5
+       on dbxref (db_id,accession)
+       where version is null;
 
 COMMENT ON TABLE dbxref IS 'A unique, global, public, stable
identifier. Not necessarily an external reference - can reference data
items inside the particular chado instance being used. Typically a row
in a table can be uniquely identified with a primary identifier (called
dbxref_id); a table may also have secondary identifiers (in a linking
table <T>_dbxref). A dbxref is generally written as <DB>:<ACCESSION> or
as <DB>:<ACCESSION>:<VERSION>.';
 

------------------------------------------------------------------------------
Open source business process management suite built on Java and Eclipse
Turn processes into business applications with Bonita BPM Community Edition
Quickly connect people, data, and systems into organized workflows
Winner of BOSSIE, CODIE, OW2 and Gartner awards
http://p.sf.net/sfu/Bonitasoft
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Avoiding the empty string in text

Siddhartha Basu
Hi,

On Thu, 26 Jun 2014, Karl O. Pinc wrote:

> Hi,
>
> I notice that Chado sometimes defaults text
> columns to ''.  There is, IMO, a better way to
> do this.  (NULL is a better choice.)
I think this is a good idea, hopefully the patch gets in the next
release of chado.

thanks,
-sidd




>
> Appended is an example patch.
>
> Note that this works back to PG 7.2.  Internally
> it is (roughly) equivalent to the current code since
> creating a unique constraint makes an index
> anyway.
>
>
> Karl <[hidden email]>
> Free Software:  "You don't pay back, you pay forward."
>                  -- Robert A. Heinlein
>
>
> -----------------<snip>-------------------
>
> --- /tmp/general.sql 2014-06-26 12:09:14.534790558 -0400
> +++ /tmp/general.sql-new 2014-06-26 12:33:26.872864761 -0400
> @@ -55,13 +55,18 @@
>      db_id int not null,
>      foreign key (db_id) references db (db_id) on delete cascade
> INITIALLY DEFERRED,
>      accession varchar(255) not null,
> -    version varchar(255) not null default '',
> -    description text,
> -    constraint dbxref_c1 unique (db_id,accession,version)
> +    version varchar(255),
> +    description text
>  );
>  create index dbxref_idx1 on dbxref (db_id);
>  create index dbxref_idx2 on dbxref (accession);
>  create index dbxref_idx3 on dbxref (version);
> +create unique index dbxref_idx4
> +       on dbxref (db_id,accession,version)
> +       where version is not null;
> +create unique index dbxref_idx5
> +       on dbxref (db_id,accession)
> +       where version is null;
>  
>  COMMENT ON TABLE dbxref IS 'A unique, global, public, stable
> identifier. Not necessarily an external reference - can reference data
> items inside the particular chado instance being used. Typically a row
> in a table can be uniquely identified with a primary identifier (called
> dbxref_id); a table may also have secondary identifiers (in a linking
> table <T>_dbxref). A dbxref is generally written as <DB>:<ACCESSION> or
> as <DB>:<ACCESSION>:<VERSION>.';
>  
>
> ------------------------------------------------------------------------------
> Open source business process management suite built on Java and Eclipse
> Turn processes into business applications with Bonita BPM Community Edition
> Quickly connect people, data, and systems into organized workflows
> Winner of BOSSIE, CODIE, OW2 and Gartner awards
> http://p.sf.net/sfu/Bonitasoft
> _______________________________________________
> Gmod-schema mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/gmod-schema

------------------------------------------------------------------------------
Open source business process management suite built on Java and Eclipse
Turn processes into business applications with Bonita BPM Community Edition
Quickly connect people, data, and systems into organized workflows
Winner of BOSSIE, CODIE, OW2 and Gartner awards
http://p.sf.net/sfu/Bonitasoft
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Avoiding the empty string in text

Karl O. Pinc
On 06/27/2014 12:35:28 PM, Siddhartha Basu wrote:

> Hi,
>
> On Thu, 26 Jun 2014, Karl O. Pinc wrote:
>
> > Hi,
> >
> > I notice that Chado sometimes defaults text
> > columns to ''.  There is, IMO, a better way to
> > do this.  (NULL is a better choice.)
> I think this is a good idea, hopefully the patch gets in the next
> release of chado.

The patch does not address all the places where
this happens.  I just picked a random example
to patch.  "Standard is better than better."
Best to patch everywhere or not at all.

I was reluctant to put any more work into
a patch when unsure of the reception.


Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

------------------------------------------------------------------------------
Open source business process management suite built on Java and Eclipse
Turn processes into business applications with Bonita BPM Community Edition
Quickly connect people, data, and systems into organized workflows
Winner of BOSSIE, CODIE, OW2 and Gartner awards
http://p.sf.net/sfu/Bonitasoft
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Avoiding the empty string in text

Hilmar Lapp-3
I agree that a empty string as default is a bad choice. I wasn't even aware that Chado had this in places.

 -hilmar


On Fri, Jun 27, 2014 at 2:36 PM, Karl O. Pinc <[hidden email]> wrote:
On 06/27/2014 12:35:28 PM, Siddhartha Basu wrote:
> Hi,
>
> On Thu, 26 Jun 2014, Karl O. Pinc wrote:
>
> > Hi,
> >
> > I notice that Chado sometimes defaults text
> > columns to ''.  There is, IMO, a better way to
> > do this.  (NULL is a better choice.)
> I think this is a good idea, hopefully the patch gets in the next
> release of chado.

The patch does not address all the places where
this happens.  I just picked a random example
to patch.  "Standard is better than better."
Best to patch everywhere or not at all.

I was reluctant to put any more work into
a patch when unsure of the reception.


Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

------------------------------------------------------------------------------
Open source business process management suite built on Java and Eclipse
Turn processes into business applications with Bonita BPM Community Edition
Quickly connect people, data, and systems into organized workflows
Winner of BOSSIE, CODIE, OW2 and Gartner awards
http://p.sf.net/sfu/Bonitasoft
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema



--
Hilmar Lapp -:- lappland.io


------------------------------------------------------------------------------
Open source business process management suite built on Java and Eclipse
Turn processes into business applications with Bonita BPM Community Edition
Quickly connect people, data, and systems into organized workflows
Winner of BOSSIE, CODIE, OW2 and Gartner awards
http://p.sf.net/sfu/Bonitasoft
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Avoiding the empty string in text

Scott Cain
This is actually a little complicated.  Let me see if I can reason it out (though I wasn't involved in the initial design decision): the problem is that sometimes accessions sometimes have versions, sometimes they do not.  If we allow nulls, then an accession that has a version (like GenBank) would be allowed to have a null version, which would be bad, so we want to require them to always have a version, so that implies "NOT NULL".  Also, since the "real" primary key on the table (not the surrogate dbxref_id primary key) on the table is the combination of db_id, accession and version we have a unique constraint on that combination.  If we allow nulls, (I think) you could have a case where somebody enters the same db_id and accession and a null version twice, but since NULL != NULL it would survive the constraint (I really can't remember if that's true or not :-/

Does that make sense?


On Sun, Jun 29, 2014 at 9:06 PM, Hilmar Lapp <[hidden email]> wrote:
I agree that a empty string as default is a bad choice. I wasn't even aware that Chado had this in places.

 -hilmar


On Fri, Jun 27, 2014 at 2:36 PM, Karl O. Pinc <[hidden email]> wrote:
On 06/27/2014 12:35:28 PM, Siddhartha Basu wrote:
> Hi,
>
> On Thu, 26 Jun 2014, Karl O. Pinc wrote:
>
> > Hi,
> >
> > I notice that Chado sometimes defaults text
> > columns to ''.  There is, IMO, a better way to
> > do this.  (NULL is a better choice.)
> I think this is a good idea, hopefully the patch gets in the next
> release of chado.

The patch does not address all the places where
this happens.  I just picked a random example
to patch.  "Standard is better than better."
Best to patch everywhere or not at all.

I was reluctant to put any more work into
a patch when unsure of the reception.


Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

------------------------------------------------------------------------------
Open source business process management suite built on Java and Eclipse
Turn processes into business applications with Bonita BPM Community Edition
Quickly connect people, data, and systems into organized workflows
Winner of BOSSIE, CODIE, OW2 and Gartner awards
http://p.sf.net/sfu/Bonitasoft
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema



--
Hilmar Lapp -:- lappland.io


------------------------------------------------------------------------------
Open source business process management suite built on Java and Eclipse
Turn processes into business applications with Bonita BPM Community Edition
Quickly connect people, data, and systems into organized workflows
Winner of BOSSIE, CODIE, OW2 and Gartner awards
http://p.sf.net/sfu/Bonitasoft
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema




--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

------------------------------------------------------------------------------
Open source business process management suite built on Java and Eclipse
Turn processes into business applications with Bonita BPM Community Edition
Quickly connect people, data, and systems into organized workflows
Winner of BOSSIE, CODIE, OW2 and Gartner awards
http://p.sf.net/sfu/Bonitasoft
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Avoiding the empty string in text

Andy Schroeder
I think Scott's second point is more pertinent here, at least in the case of dbxref.  Having NULL as part of a primary key is not good practice even though you can do it in postgres.  

cheers,
Andy


On Tue, Jul 8, 2014 at 2:42 PM, Scott Cain <[hidden email]> wrote:
This is actually a little complicated.  Let me see if I can reason it out (though I wasn't involved in the initial design decision): the problem is that sometimes accessions sometimes have versions, sometimes they do not.  If we allow nulls, then an accession that has a version (like GenBank) would be allowed to have a null version, which would be bad, so we want to require them to always have a version, so that implies "NOT NULL".  Also, since the "real" primary key on the table (not the surrogate dbxref_id primary key) on the table is the combination of db_id, accession and version we have a unique constraint on that combination.  If we allow nulls, (I think) you could have a case where somebody enters the same db_id and accession and a null version twice, but since NULL != NULL it would survive the constraint (I really can't remember if that's true or not :-/

Does that make sense?


On Sun, Jun 29, 2014 at 9:06 PM, Hilmar Lapp <[hidden email]> wrote:
I agree that a empty string as default is a bad choice. I wasn't even aware that Chado had this in places.

 -hilmar


On Fri, Jun 27, 2014 at 2:36 PM, Karl O. Pinc <[hidden email]> wrote:
On 06/27/2014 12:35:28 PM, Siddhartha Basu wrote:
> Hi,
>
> On Thu, 26 Jun 2014, Karl O. Pinc wrote:
>
> > Hi,
> >
> > I notice that Chado sometimes defaults text
> > columns to ''.  There is, IMO, a better way to
> > do this.  (NULL is a better choice.)
> I think this is a good idea, hopefully the patch gets in the next
> release of chado.

The patch does not address all the places where
this happens.  I just picked a random example
to patch.  "Standard is better than better."
Best to patch everywhere or not at all.

I was reluctant to put any more work into
a patch when unsure of the reception.


Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

------------------------------------------------------------------------------
Open source business process management suite built on Java and Eclipse
Turn processes into business applications with Bonita BPM Community Edition
Quickly connect people, data, and systems into organized workflows
Winner of BOSSIE, CODIE, OW2 and Gartner awards
http://p.sf.net/sfu/Bonitasoft
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema



--
Hilmar Lapp -:- lappland.io


------------------------------------------------------------------------------
Open source business process management suite built on Java and Eclipse
Turn processes into business applications with Bonita BPM Community Edition
Quickly connect people, data, and systems into organized workflows
Winner of BOSSIE, CODIE, OW2 and Gartner awards
http://p.sf.net/sfu/Bonitasoft
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema




--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     <a href="tel:216-392-3087" value="+12163923087" target="_blank">216-392-3087
Ontario Institute for Cancer Research

------------------------------------------------------------------------------
Open source business process management suite built on Java and Eclipse
Turn processes into business applications with Bonita BPM Community Edition
Quickly connect people, data, and systems into organized workflows
Winner of BOSSIE, CODIE, OW2 and Gartner awards
http://p.sf.net/sfu/Bonitasoft
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema



------------------------------------------------------------------------------
Open source business process management suite built on Java and Eclipse
Turn processes into business applications with Bonita BPM Community Edition
Quickly connect people, data, and systems into organized workflows
Winner of BOSSIE, CODIE, OW2 and Gartner awards
http://p.sf.net/sfu/Bonitasoft
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Avoiding the empty string in text

Karl O. Pinc
In reply to this post by Scott Cain
On 07/08/2014 01:42:30 PM, Scott Cain wrote:

> This is actually a little complicated.  Let me see if I can reason it
> out
> (though I wasn't involved in the initial design decision): the
> problem
> is
> that sometimes accessions sometimes have versions, sometimes they do
> not.
>  If we allow nulls, then an accession that has a version (like
> GenBank)
> would be allowed to have a null version, which would be bad, so we
> want to
> require them to always have a version, so that implies "NOT NULL".

Sure, but currently those accessions required to have a version
can have a version consisting of the empty string so a not-NULL
constraint does not buy you anything.

> Also,
> since the "real" primary key on the table (not the surrogate
> dbxref_id
> primary key) on the table is the combination of db_id, accession and
> version we have a unique constraint on that combination.  If we allow
> nulls, (I think) you could have a case where somebody enters the same
> db_id
> and accession and a null version twice, but since NULL != NULL it
> would
> survive the constraint (I really can't remember if that's true or not
> :-/

It _would_ survive the constraint, except that the technique of using
a WHERE and creating 2 indexes ensures that there is only a single
NULL-and-some-other-unique-pair combination.

>From the original patch:

+create unique index dbxref_idx4
+       on dbxref (db_id,accession,version)
+       where version is not null;
+create unique index dbxref_idx5
+       on dbxref (db_id,accession)
+       where version is null;
 
I believe that while there's other ways to do this this is
most clear and efficient.

> > On Fri, Jun 27, 2014 at 2:36 PM, Karl O. Pinc <[hidden email]> wrote:
> >
> >> On 06/27/2014 12:35:28 PM, Siddhartha Basu wrote:
> >> > Hi,
> >> >
> >> > On Thu, 26 Jun 2014, Karl O. Pinc wrote:
> >> >
> >> > > Hi,
> >> > >
> >> > > I notice that Chado sometimes defaults text
> >> > > columns to ''.  There is, IMO, a better way to
> >> > > do this.  (NULL is a better choice.)
> >> > I think this is a good idea, hopefully the patch gets in the
> next
> >> > release of chado.
> >>
> >> The patch does not address all the places where
> >> this happens.  I just picked a random example
> >> to patch.  "Standard is better than better."
> >> Best to patch everywhere or not at all.
> >>
> >> I was reluctant to put any more work into
> >> a patch when unsure of the reception.
> >>
> >>
> >> Karl <[hidden email]>
> >> Free Software:  "You don't pay back, you pay forward."
> >>                  -- Robert A. Heinlein
> >>
> >>
> >>
> ------------------------------------------------------------------------------
> >> Open source business process management suite built on Java and
> Eclipse
> >> Turn processes into business applications with Bonita BPM
> Community
> >> Edition
> >> Quickly connect people, data, and systems into organized workflows
> >> Winner of BOSSIE, CODIE, OW2 and Gartner awards
> >> http://p.sf.net/sfu/Bonitasoft
> >> _______________________________________________
> >> Gmod-schema mailing list
> >> [hidden email]
> >> https://lists.sourceforge.net/lists/listinfo/gmod-schema
> >>
> >
> >
> >
> > --
> > Hilmar Lapp -:- lappland.io
> >
> >
> >
> >
> ------------------------------------------------------------------------------
> > Open source business process management suite built on Java and
> Eclipse
> > Turn processes into business applications with Bonita BPM Community
> Edition
> > Quickly connect people, data, and systems into organized workflows
> > Winner of BOSSIE, CODIE, OW2 and Gartner awards
> > http://p.sf.net/sfu/Bonitasoft
> > _______________________________________________
> > Gmod-schema mailing list
> > [hidden email]
> > https://lists.sourceforge.net/lists/listinfo/gmod-schema
> >
> >
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at
> scottcain dot
> net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research
>




Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

------------------------------------------------------------------------------
Open source business process management suite built on Java and Eclipse
Turn processes into business applications with Bonita BPM Community Edition
Quickly connect people, data, and systems into organized workflows
Winner of BOSSIE, CODIE, OW2 and Gartner awards
http://p.sf.net/sfu/Bonitasoft
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema