UGENE Bulletin Board
re-write of original FASTA files by UGENE (Read 10590 times)
Apr 12th, 2012 at 4:11pm

dustman   Offline
YaBB Newbies

Posts: 7
*
 
I've noticed that a FASTA file containing a number of sequences is modified by UGENE when ClustalW is run. The new file looks almost the same, except that lines are wrapped (not an issue) and all sequences are padded to the same length with dashes. In theory this gives a nice alignment in text form, but it may render the file unsuitable for other programs, i.e. outside of UGENE. Most programs have to be told explicitly to treat '-' as an empty symbol, or they return an error. Plus, combining elements of the original for a new alignment becomes unnecessarily tedious.

I can't see a way to prevent changes to the original file in alignment dialog box, or a method to discard them later. Can it be done?
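For files that have already been overwritten, the dashes can be stripped again outside UGENE. The following is only a minimal sketch in plain Python, not anything UGENE provides: the function names are made up for illustration, and it assumes '-' is the only gap character and that every record starts with a '>' line.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def strip_gaps(records, gap_chars="-"):
    """Remove gap characters (assumed to be '-') from every sequence."""
    return [(h, "".join(c for c in s if c not in gap_chars)) for h, s in records]


def write_fasta(records, width=60):
    """Format records as FASTA text, wrapping sequences at `width` columns."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "".join(line + "\n" for line in lines)
```

Reading the aligned file, passing the result through `strip_gaps`, and writing it to a new path with `write_fasta` leaves the header lines untouched and only removes the padding.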
 
 
Reply #1 - Apr 12th, 2012 at 6:08pm

Yuriy Vaskin   Offline
Global Moderator

Gender: male
Posts: 138
*****
 
Thanks for using UGENE!

Let's see if I understand your issue correctly.

You open a multiple FASTA file, export it into an alignment (did you do that using the "Join sequences into alignment..." option in the "Sequence reading option" dialog when you opened your file?), then align it. At this point your original file is not modified yet; it is overwritten only when you click on "Save selected document" (pic1 - 1).

Just click on "Save a copy" (pic1 - 2) to keep the original file and save the modified one to another file. By the way, you may choose another format for the alignment; FASTA is not a good one for that. If you choose a multiple alignment format (CLUSTALW, Mega, ...), there will be no problem opening it with other tools.

Also, you can export your alignment to sequences (right-click on the document - Export - Export alignment to sequence format) (pic1 - 3) and remove the dashes from the output file by choosing the corresponding option (pic 2).

Please let me know if it doesn't work for you.
 

Attachments: al1.jpg (137 KB), al2.jpg (28 KB)
 
Reply #2 - Apr 13th, 2012 at 2:13pm

dustman   Offline
YaBB Newbies

Posts: 7
*
 
Thank you for the reply, Yuriy.

No, I manually create/parse a FASTA file for every sequencing run I do and store it in that form beside the originals (separate FASTA/ABI). Since ClustalW can use FASTA for alignments, it's convenient. The problem, as stated, is that UGENE modifies the original FASTA without asking the user if it's OK. And for me it's not OK, since I will use these files in non-alignment-related programs, often web tools. Besides, FASTA is the most common and widely accepted sequence format, albeit limited for annotations.
 
 
Reply #3 - Apr 13th, 2012 at 2:30pm

dustman   Offline
YaBB Newbies

Posts: 7
*
 
Export to FASTA with the 'trim' option works fine, but it leaves out quite a bit of info from the sequence description line (the one starting with '>').

For example, from
>9466317.seq - ID: HAT-A4-GATC-fwd-pHAT-seq-617775 on 2012/4/5-20:22:30 automatically edited with PhredPhrap, start with base no.: 25  Internal Params: Windowsize: 20, Goodqual: 19, Badqual: 10, Minseqlength: 50, nbadelimit: 1
to
>9466317.seq_-_ID__HAT-A4-GATC-

In this particular case the sequencing primer info and the forward direction are missing. Even for aesthetic reasons, it's hard to understand limiting the identifier to less than 50 characters (31 with '>'), when the line length commonly accepted on most terminals is 60-80.
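Until that is fixed, the full description lines could be put back from the original file. This is a rough sketch only: the renaming rule in `sanitize` (spaces and colons become underscores, name cut to 30 characters) is an assumption inferred from the single example above, not documented UGENE behavior, and both function names are hypothetical.

```python
import re


def sanitize(header, max_len=30):
    """Mimic the observed renaming (ASSUMED rule, inferred from one example):
    spaces and colons become underscores, then the name is cut to max_len."""
    return re.sub(r"[ :]", "_", header)[:max_len]


def restore_headers(original_headers, exported_records, max_len=30):
    """Replace truncated names in exported (name, sequence) pairs with the
    matching original header. Names that match no original, or match more
    than one, are left unchanged."""
    lookup = {}
    ambiguous = set()
    for header in original_headers:
        key = sanitize(header, max_len)
        if key in lookup and lookup[key] != header:
            ambiguous.add(key)  # two originals collapse to the same name
        lookup[key] = header
    restored = []
    for name, seq in exported_records:
        if name in lookup and name not in ambiguous:
            restored.append((lookup[name], seq))
        else:
            restored.append((name, seq))
    return restored
```

Ambiguous names are deliberately left alone: if two original headers share the same first 30 characters, there is no safe way to tell them apart from the truncated name.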
 
 
Reply #4 - Apr 17th, 2012 at 7:44pm

Yuriy Vaskin   Offline
Global Moderator

Gender: male
Posts: 138
*****
 
You are right. It's not nice behavior to modify sequences if users don't need that. We'll fix the names issue (https://ugene.unipro.ru/tracker/browse/UGENE-934).
Have you found any other cases where the sequences are modified somehow?
 
 
Reply #5 - Apr 21st, 2012 at 3:03pm

dustman   Offline
YaBB Newbies

Posts: 7
*
 
Thank you. I haven't used UGENE much, so I can't say more about file re-writing.
 
 