UGENE Forum
https://forum.ugene.net/forum/YaBB.pl
https://forum.ugene.net/forum/YaBB.pl?num=1334221907

Message started by dustman on Apr 12th, 2012 at 4:11pm

Title: re-write of original FASTA files by UGENE
Post by dustman on Apr 12th, 2012 at 4:11pm
I've noticed that a FASTA file containing a number of sequences is modified by UGENE when ClustalW is run. The new file looks almost the same, except that lines are wrapped (not an issue) and every sequence is padded with dashes to the same total length. In theory this gives a nice alignment in text form, but it may render the file unsuitable for other programs, i.e. outside of UGENE. Most programs must be told explicitly to treat '-' as an empty symbol, or they return an error. Plus, combining elements of the original for a new alignment becomes unnecessarily tedious.

I can't see a way to prevent changes to the original file in the alignment dialog box, or a method to discard them later. Can it be done?

Title: Re: re-write of original FASTA files by UGENE
Post by Yuriy Vaskin on Apr 12th, 2012 at 6:08pm
Thanks for using UGENE!

Let's see if I understand your issue correctly.

You open a multiple FASTA file, convert it into an alignment (did you do that using the "Join sequences into alignment..." option in the "Sequence reading option" dialog when opening your file?), then align it. At that point your original file is not modified yet; it is overwritten only if you click on "Save selected document" (pic1 - 1).

Just click on "Save a copy" (pic1 - 2) to keep the original file and save the modified one to another file. By the way, you may choose another format for the alignment; FASTA is not a good one for that. If you choose a multiple alignment format (CLUSTALW, Mega, ...), other tools will have no problem opening it.

You may also export your alignment to sequences (right-click the document - Export - Export alignment to sequence format) (pic1 - 3), and remove the dashes from the output file by choosing the corresponding option (pic 2).
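
If a file has already been saved with the dashes in it, the same clean-up can be done outside UGENE. Below is a minimal Python sketch, not UGENE's own export code: the function name strip_gaps, the 60-character wrap width, and the file names in the usage comment are all assumptions chosen for illustration. It joins the wrapped lines of each record, removes the gap character, and re-wraps the sequence, leaving the '>' lines untouched.

```python
def strip_gaps(fasta_text, gap_chars="-", width=60):
    """Return FASTA text with gap characters removed from the sequence lines.

    Hypothetical helper: 'gap_chars' and 'width' are illustrative defaults,
    not settings taken from UGENE.
    """
    records = []  # list of (header line, list of sequence chunks)
    for line in fasta_text.splitlines():
        line = line.rstrip()
        if line.startswith(">"):
            records.append((line, []))
        elif line and records:
            # Sequence line belonging to the most recent header.
            records[-1][1].append(line)

    out = []
    for header, chunks in records:
        seq = "".join(chunks)
        for ch in gap_chars:
            seq = seq.replace(ch, "")
        out.append(header)  # description line is kept exactly as it was
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n" if out else ""


# Usage (file names are placeholders); write to a NEW file so the
# aligned version is not overwritten:
#
#   with open("aligned.fa") as src:
#       cleaned = strip_gaps(src.read())
#   with open("ungapped.fa", "w") as dst:
#       dst.write(cleaned)
```

Writing the result to a separate file mirrors the "Save a copy" advice above: neither the aligned file nor the original is touched.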

Please let me know if it doesn't work for you.

al1.jpg (137 KB)
al2.jpg (28 KB)

Title: Re: re-write of original FASTA files by UGENE
Post by dustman on Apr 13th, 2012 at 2:13pm
Thank you for the reply, Yuriy,

No, I manually create/parse a FASTA file for every sequencing run I do and store it in that form beside the originals (separate FASTA/ABI). Since ClustalW can use FASTA for alignments, it's convenient. The problem, as stated, is that UGENE modifies the original FASTA without asking the user whether that's OK. And for me it's not OK, since I will use these files in non-alignment-related programs, often web tools. Besides, FASTA is the most common and accepted sequence format, albeit limited for annotations.

Title: Re: re-write of original FASTA files by UGENE
Post by dustman on Apr 13th, 2012 at 2:30pm
Export to FASTA with the 'trim' option works fine, but it leaves out quite a bit of info from the sequence description line, the one starting with '>'.

For example, from
>9466317.seq - ID: HAT-A4-GATC-fwd-pHAT-seq-617775 on 2012/4/5-20:22:30 automatically edited with PhredPhrap, start with base no.: 25  Internal Params: Windowsize: 20, Goodqual: 19, Badqual: 10, Minseqlength: 50, nbadelimit: 1
to
>9466317.seq_-_ID__HAT-A4-GATC-

In this particular case the sequencing primer info and the 'fwd' (forward) tag are missing. Even for aesthetic reasons, it's hard to understand leaving a line of fewer than 50 characters for an identifier (31 with '>'), when the line length commonly accepted on most terminals is 60-80.
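
Until the names are preserved on export, the full description lines can be copied back from the original file. The following Python sketch is only a workaround under stated assumptions: the exported file is assumed to contain the same records in the same order as the original, and the shortened name is assumed to still begin with the first whitespace-delimited word of the original line (true for the single example above, not verified in general). The function name restore_headers is hypothetical.

```python
def restore_headers(original_text, exported_text):
    """Replace each '>' line of exported_text with the corresponding
    '>' line of original_text, matching records by position.

    Assumption: both texts hold the same records in the same order.
    """
    full = [ln.rstrip() for ln in original_text.splitlines()
            if ln.startswith(">")]
    out = []
    idx = 0
    for line in exported_text.splitlines():
        if not line.startswith(">"):
            out.append(line)  # sequence lines are passed through unchanged
            continue
        if idx >= len(full):
            raise ValueError("exported text has more records than the original")
        # Sanity check based on the one example in this thread: the
        # shortened name keeps the first word of the original line.
        words = full[idx][1:].split()
        if words and not line[1:].startswith(words[0]):
            raise ValueError("record %d does not match: %r vs %r"
                             % (idx + 1, line, full[idx]))
        out.append(full[idx])
        idx += 1
    if idx != len(full):
        raise ValueError("exported text has fewer records than the original")
    return "\n".join(out) + "\n" if out else ""
```

Matching by position rather than by name is a deliberate choice here, because the shortened names may no longer be unique once everything past the cut-off is dropped.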

Title: Re: re-write of original FASTA files by UGENE
Post by Yuriy Vaskin on Apr 17th, 2012 at 7:44pm
You are right. It's not nice behavior to modify sequences when users don't need that. We'll fix the names issue (https://ugene.unipro.ru/tracker/browse/UGENE-934).
Have you found other particular cases when the sequences are modified somehow?

Title: Re: re-write of original FASTA files by UGENE
Post by dustman on Apr 21st, 2012 at 3:03pm
Thank you. I haven't used UGENE much so can't comment much on file re-writing.
