Creating GBrowse database


Creating GBrowse database

Vaneet Lotay

Hello,

 

We are having trouble setting up a new GBrowse database in MySQL for a brand-new track, and I’m not sure exactly what is going wrong. We have a FASTA file and a GFF3 file. We created a new GBrowse database and followed the tutorial (http://cpansearch.perl.org/src/LDS/GBrowse-2.43/htdocs/tutorial/tutorial.html#mysql), changing only the user names and database names to ones relevant to us. We used the following command from the tutorial (replacing volvox, of course), and it loads successfully:

 

bp_seqfeature_load.pl -c -f -a DBI::mysql -d volvox volvox_all.fa volvox_all.gff3

 

Afterwards we checked the database to verify that the sequence and features were loaded properly, with matching sequence IDs linking them, and it’s all there. However, when we go to our test server we get a ‘Not found’ error, as if it can’t find those tables in the database. Some gene searches eventually produce another common error, “Chromosome/contig not found”. The detailed message actually shows the correct gene name from our attributes column as well as the correct scaffold ID, yet it says the scaffold is not defined in the database.

 

We created a very simple subset of the original FASTA and GFF3 files containing only one scaffold sequence and one gene/mRNA with 1 exon and 1 CDS. We created a brand-new database and loaded these small files, and it still comes up with the same error. We’re wondering what we could be doing wrong; we’ve changed a lot of small things that might be causing this disconnect, but to no avail. I’ve attached these small subset files as well as the configuration file, and you can visit our test server for this new database here:

 

http://gbrowse-test.xenbase.org/fgb2/gbrowse/xl_wt1_0/

 

With these new files loaded, if you visit the test server page and search for LOC, the first 3 letters of the actual gene name (LOC100489), it comes up with the ‘contig not found’ message and shows the detail below. As I said above, this suggests that it at least found the GFF3 features and possibly the Chr01 sequence, but it still won’t display them for some reason:

 

Cannot display LOC100489 because the chromosome/contig named Chr01 is not defined in the database.

 

Please help if you know what initial steps might be going wrong.

 

Thanks,


Vaneet

 

Vaneet Lotay

Xenbase Bioinformatician

724 ICT Building - University of Calgary

2500 University Drive NW

Calgary AB T2N 1N4

CANADA

 


------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/
_______________________________________________
Gmod-gbrowse mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse

Attachments:
xl_wt1_0.conf (3K)
Xlaevis1503_no_dupl_fixed_alias-test2.gff3 (530 bytes)
Xlaevis1503_v2.assembly-test.fa (1K)

Re: Creating GBrowse database

Alexey Morozov
That's because "Chromosome/contig named Chr01 is not defined in the database". It means your GFF3 file needs a separate line declaring each of your chromosomes/scaffolds/whatever you have. For example, my GFF goes as follows:

#gff-version 3
scaffold00001   newbler contig  1       1348726 .       .       .       ID=scaffold00001;Name=scaffold00001
#
#The same for all scaffolds
#This is the thing you missed. It goes just like a gene definition.
#
scaffold00001   maker   gene    56082   56739   .       +       .       ID=4447-gene;Name=4447-gene
scaffold00001   maker   mRNA    56082   56739   .       +       .       ID=4447;Parent=4447-gene;Name=4447;_AED=0.02;_eAED=0.02;_QI=0|0|0|1|1|1|2|0|184
scaffold00001   maker   exon    56082   56220   .       +       .       ID=4447:exon:0;Parent=4447
scaffold00001   maker   exon    56324   56739   .       +       .       ID=4447:exon:1;Parent=4447
scaffold00001   maker   CDS     56082   56220   .       +       0       ID=4447:cds;Parent=4447
scaffold00001   maker   CDS     56324   56739   .       +       2       ID=4447:cds;Parent=4447
#
#Same for all genes on them
#
>scaffold00001  length=1348726
CTGTTTCATCTCAAAGGTCTTCCTTAATTTTAATCCATGGTGATCCAGGCTCTGGAAAAA
GCACTCTTGTTCAGGCATTTATAGATAAGTTACCTAAATCTGTTTTGTTCGCCGTTGGGA
ATTTCGACCGGCCGAAAAATCATTCTCCCTACTCTGCCTTAGTTGCAGCATCTGATATTC
TTTGCCGTCAGATTATTCGAATGAAGAATTGGGAAGAAATTAGCAAAAACATCAGAGATG
#
#Actual sequences for all the scaffolds
#
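
To check whether a GFF3 file is missing such declaring lines, something like the following Python sketch might help. It assumes that a sequence counts as declared when some feature line on it has an ID or Name attribute equal to the value in column 1, as in the scaffold00001 line above; the function names and the sample coordinates are made up for illustration and are not part of GBrowse or BioPerl.

```python
def parse_attributes(column9):
    """Split a GFF3 attributes column into a dict (no URL-unescaping)."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def find_undeclared_seqids(gff_lines):
    """Return sorted column-1 IDs that no feature line declares by ID or Name."""
    used, declared = set(), set()
    for raw in gff_lines:
        line = raw.rstrip("\r\n")
        # Stop at an embedded FASTA section, if there is one.
        if line.startswith(">") or line.startswith("##FASTA"):
            break
        # Skip blank lines, comments and directives.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            # Fall back to whitespace in case tabs were lost in copy/paste.
            cols = line.split(None, 8)
        if len(cols) != 9:
            continue
        seqid = cols[0]
        used.add(seqid)
        attrs = parse_attributes(cols[8])
        if seqid in (attrs.get("ID"), attrs.get("Name")):
            declared.add(seqid)
    return sorted(used - declared)


if __name__ == "__main__":
    # Hypothetical two-line file: a gene on Chr01 but no line declaring Chr01.
    gene_only = [
        "##gff-version 3",
        "Chr01\tmaker\tgene\t100\t200\t.\t+\t.\tID=LOC100489;Name=LOC100489",
    ]
    print(find_undeclared_seqids(gene_only))  # ['Chr01']
```

An empty list means every sequence ID used in column 1 has a declaring line under that assumption.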

Sequences don't really HAVE to be in the same file, but I find it useful to keep all the data in one place. Plus, you'll never accidentally upload GFF and FASTA from different versions of the assembly and waste time checking why it suddenly stopped working.
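
If there are many scaffolds, the declaring lines could be generated from the FASTA file rather than typed by hand. Below is a rough Python sketch, assuming the sequence ID is the first whitespace-delimited word of each FASTA header and that each scaffold runs from 1 to its sequence length. The function names and the default "assembly" source and "contig" type are placeholders I made up; use whatever fits your data (my file above uses newbler and contig).

```python
def fasta_lengths(fasta_lines):
    """Map each sequence ID (first word of the '>' header) to its length."""
    lengths = {}
    current = None
    for raw in fasta_lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else None
            if current is not None:
                lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def reference_lines(fasta_lines, source="assembly", feature_type="contig"):
    """Build one tab-separated GFF3 line per sequence, spanning 1..length."""
    return [
        "\t".join([
            seqid, source, feature_type, "1", str(length),
            ".", ".", ".", "ID={0};Name={0}".format(seqid),
        ])
        for seqid, length in fasta_lengths(fasta_lines).items()
    ]


if __name__ == "__main__":
    # Hypothetical tiny FASTA input; real use would pass an open file handle.
    demo = [">Chr01 test scaffold", "ACGT", "AC"]
    for gff_line in reference_lines(demo):
        print(gff_line)
```

The generated lines would go near the top of the GFF3 file, after the #gff-version 3 line, before reloading the database.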

2015-03-31 2:06 GMT+08:00 Vaneet Lotay <[hidden email]>:





--
Alexey Morozov,
LIN SB RAS, bioinformatics group.
Irkutsk, Russia.


Re: Creating GBrowse database

Vaneet Lotay

Thanks Alexey, that worked.

 

Vaneet

 

From: Alexey Morozov [mailto:[hidden email]]
Sent: Monday, March 30, 2015 11:37 PM
To: Vaneet Lotay
Cc: [hidden email]
Subject: Re: [Gmod-gbrowse] Creating GBrowse database

 
