UGENE Forum
https://forum.ugene.net/forum/YaBB.pl
https://forum.ugene.net/forum/YaBB.pl?num=1253431935

Message started by vigil on Sep 20th, 2009 at 2:32pm

Title: UGENE-produced gb files cannot be parsed in biopython or bioperl
Post by vigil on Sep 20th, 2009 at 2:32pm
Thank you again for a great program, soon to be cited in my paper!

UGENE produces gb files of the hits it finds, for instance when searching for repeats or pattern matches. I've been trying to process these results further by parsing them with BioPerl and Biopython, but neither parser accepts the GenBank output I get from UGENE.

I've just noticed that I'm using 1.4.1 (on Windows, 1.4 GHz single processor on a tc4200 laptop), not the most recent version, but I can't see anything about this problem in the later bug fixes.

The Perl code that produced the latest error looked like this:


Code:
use Bio::SeqIO;

### This is a file from UGENE's find pattern tool: ###

my $filename = "C:/Documents and Settings/Carl/My Documents/NewBioInf/myGB/hitsOniS3.gb";

# Open the file as GenBank and read the first record.
my $seqio_obj = Bio::SeqIO->new(-file => $filename, -format => "genbank");
my $seq_obj = $seqio_obj->next_seq;


And here's the error:

Code:
C:\Documents and Settings\Carl>perl readGB.pl

--------------------- WARNING ---------------------
MSG: Unknown alphabet:
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: Unexpected error in feature table for  Skipping feature, attempting to recover
---------------------------------------------------

------------- EXCEPTION -------------
MSG: Alphabet '1' is not a valid alphabet ('dna','protein','rna') lowercase
STACK Bio::PrimarySeq::alphabet C:/Perl/site/lib/Bio/PrimarySeq.pm:571
STACK Bio::PrimarySeq::new C:/Perl/site/lib/Bio/PrimarySeq.pm:208
STACK Bio::Seq::new C:/Perl/site/lib/Bio/Seq.pm:484
STACK Bio::Seq::RichSeq::new C:/Perl/site/lib/Bio/Seq/RichSeq.pm:110
STACK Bio::Seq::SeqFactory::create C:/Perl/site/lib/Bio/Seq/SeqFactory.pm:116
STACK Bio::Factory::ObjectFactoryI::create_object C:/Perl/site/lib/Bio/Factory/ObjectFactoryI.pm:102
STACK Bio::Seq::SeqBuilder::make_object C:/Perl/site/lib/Bio/Seq/SeqBuilder.pm:337
STACK Bio::SeqIO::genbank::next_seq C:/Perl/site/lib/Bio\SeqIO\genbank.pm:717
STACK toplevel readGB.pl:5
-------------------------------------


Best regards,

Theo Vigil

Title: Re: UGENE-produced gb files cannot be parsed in biopython or bioperl
Post by Ivan Efremov on Sep 20th, 2009 at 6:48pm
Hi Theo,
please post the file - we will try to investigate the problem.

Title: Re: UGENE-produced gb files cannot be parsed in biopython or bioperl
Post by vigil on Oct 19th, 2009 at 2:29pm
I've forgotten which file this was, but I think I fixed the problem by adjusting the number of spaces around the nucleotide position in the GenBank file: from 9 to 10, or from 11 to 10, or something similar.
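
The post does not say which field was affected. Assuming it was the sequence lines in the ORIGIN block, the usual GenBank flat-file layout puts the base number right-justified in columns 1-9, followed by one space and then the bases in blocks of 10. A minimal Python sketch of that layout follows; the function name format_origin_line is hypothetical and this is not UGENE's actual writer.

```python
def format_origin_line(position: int, bases: str) -> str:
    """Format one ORIGIN line of a GenBank record.

    Assumed layout: position right-justified in 9 columns, one space,
    then the bases split into space-separated blocks of 10.
    """
    blocks = [bases[i:i + 10] for i in range(0, len(bases), 10)]
    return f"{position:>9} " + " ".join(blocks)


# Example with made-up bases; the first base lands in column 11.
line = format_origin_line(1, "acgtacgtacgtacgtacgt")
print(repr(line))
```

A parser that reads these lines by fixed column would be thrown off by a position field that is one character too narrow or too wide, which would be consistent with the fix described above.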

Title: Re: UGENE-produced gb files cannot be parsed in biopython or bioperl
Post by Ivan Efremov on Oct 19th, 2009 at 3:05pm
OK, we will try to investigate the problem.
Thanks for the report!

Title: Re: UGENE-produced gb files cannot be parsed in biopython or bioperl
Post by Ivan Efremov on Oct 19th, 2009 at 5:26pm
Theo,
I've tried to reproduce the error and found the following.

The exception is thrown only when I open a .gb file that does not contain a sequence (see no_seq.gb). I don't think this is really incorrect behavior: you are trying to read a sequence object from a file that has none, so BioPerl has to report some error.

When I open files that do contain sequences (bioperl_ok.gb), everything is OK.
https://forum.ugene.net/forum/YaBB.pl?action=downloadfile;file=no_seq.gb
https://forum.ugene.net/forum/YaBB.pl?action=downloadfile;file=bioperl_ok.gb
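
One way to avoid the exception is to check, before handing a file to a parser, whether the record has any sequence data at all. The Python sketch below uses only the standard library and looks for an ORIGIN block with at least one numbered sequence line. The function name has_sequence and the two sample records are made up for illustration; they are not the attached files.

```python
def has_sequence(genbank_text: str) -> bool:
    """Return True if the text has an ORIGIN block with at least one sequence line."""
    in_origin = False
    for line in genbank_text.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
            continue
        if line.startswith("//"):
            # "//" terminates a record.
            in_origin = False
            continue
        if in_origin:
            # Sequence lines look like "        1 acgtacgtac gtacgtacgt".
            parts = line.split()
            if len(parts) >= 2 and parts[0].isdigit():
                return True
    return False


# Hypothetical annotation-only record (features, no ORIGIN block).
NO_SEQ = """LOCUS       hits                       0 bp    DNA
FEATURES             Location/Qualifiers
     misc_feature    10..20
//
"""

# The same record with a short made-up sequence added.
WITH_SEQ = NO_SEQ.replace("//\n", "ORIGIN\n        1 acgtacgtac gtacgtacgt\n//\n")

print(has_sequence(NO_SEQ), has_sequence(WITH_SEQ))
```

Files for which the check returns False hold annotations only, so a script can skip them or report them instead of asking the parser for a sequence object.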
