UGENE Bulletin Board
Creating workflow for analysis of multiple files at once: output question
Mar 30th, 2012 at 12:53am

protein_introns   Offline
YaBB Newbies

Posts: 5
*
 
I am trying to design a workflow for HMM analysis of multiple files at once. Ideally, I would like to feed in ~2000 files with bacterial genomes (~20 Mb/file) at once and run an HMM search on each sequence.

At the end of the workflow I want to write gb files with the new annotations. My problem is: how can I make the output file names the same as the input file names? Format and overwriting are not an issue, because the input files are gbk and the new files will be gb. So far I have figured out that I can choose the name for the first output file, and all the others get consecutive numbers. This is very inconvenient, because I do not know which file was processed first and have to open each generated file to find out what its parent file was.
Is there any option for this? Please help!
 

workflow_1.jpg (215 KB)
 
Reply #1 - Mar 30th, 2012 at 1:17pm

Yuriy Vaskin   Offline
Global Moderator

Gender: male
Posts: 138
*****
 
Thank you for your question!

Unfortunately, there is no way to do that with the standard Workflow Designer (WD) features yet. But it’s a good chance to try the extended ones! You need to use WD Scripts.

1.      Download the latest snapshot of UGENE (http://ugene.unipro.ru/snapshot.html), because WD Scripts do not work in recent versions for some reason.
2.      Create your scheme from scratch. Note that the “Write Genbank” element has been removed; use the “Write Sequence” element with the “genbank” document format option instead.
3.      Switch the script mode on (pict1).
4.      Set up the script (pict2 and pict3).
5.      Run the scheme and find the output files in the input files' directory.

The script mode is rather limited in scope and may be unstable, but it works for your task. You may experiment with the scripts (they are written in JavaScript) to set them up as you want. In that case it would be better to back up your initial data first.
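The naming logic itself is plain string handling and can be tried outside UGENE first. The sketch below is only an illustration under stated assumptions: `outputPathFor` and `replaceExtension` are hypothetical helper names, not part of the WD Scripts API, and it assumes the script has the input file path available as a string. How that path is obtained inside the script is shown in pict2 and pict3, not here.

```javascript
// Hypothetical helpers (not UGENE API): they only show how an output path
// can be derived from an input path so each result is traceable to its parent.

// Append a suffix to the full input path. The output lands in the same
// directory as the input and keeps the input file name visible.
function outputPathFor(inputPath, suffix) {
  return inputPath + suffix;
}

// Alternative: swap the extension instead of appending to it.
function replaceExtension(inputPath, newExtension) {
  var slash = Math.max(inputPath.lastIndexOf("/"), inputPath.lastIndexOf("\\"));
  var dot = inputPath.lastIndexOf(".");
  // Treat the dot as an extension separator only if it is inside the file name,
  // not inside a directory name earlier in the path.
  var stem = dot > slash ? inputPath.slice(0, dot) : inputPath;
  return stem + "." + newExtension;
}
```

Appending a suffix keeps the original name intact and avoids overwriting the gbk input; replacing the extension gives shorter names but relies on the input and output extensions being different.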

If you have any questions I will be happy to answer!
« Last Edit: Mar 30th, 2012 at 2:25pm by Yuriy Vaskin »  

script1.jpg (182 KB)
script2.jpg (67 KB)
script3.jpg (45 KB)
 
Reply #2 - Apr 5th, 2012 at 12:08am

protein_introns   Offline
YaBB Newbies

Posts: 5
*
 
Thanks a lot!

I have another question. Right now UGENE can process only 30 files with my workflow; after the HMMER search has finished on 30 files, UGENE quits.

Is there any way to fix this?
 
 
Reply #3 - Apr 5th, 2012 at 12:25am

protein_introns   Offline
YaBB Newbies

Posts: 5
*
 
Here is the message I get:

Exception with code C++ exception - Unhandled exception

Operation system: Windows x86

UGENE version: 1.11.0-dev

ActiveWindow: None

Log:
None
Task tree:
Workflow run from cmdline      (Running)      100
-Execute workflow      (Running)      100
--Default iteration      (Running)      100

 
 
Reply #4 - Apr 5th, 2012 at 12:33am

protein_introns   Offline
YaBB Newbies

Posts: 5
*
 
Also, it turns out that the new files are truncated:

[13:29:27] 'Opening document: C:/Users/Olga/actinobacteria/NC_006511.gbkhmmer.gb' task failed: Subtask {Loading documents} is failed: Subtask {Opening view for document: NC_006511.gbkhmmer.gb} is failed: Subtask {Load document: 'NC_006511.gbkhmmer.gb'} is failed: Sequence is truncated

I cannot open them in UGENE anymore.
 
 
Reply #5 - Apr 5th, 2012 at 3:42pm

Yuriy Vaskin   Offline
Global Moderator

Gender: male
Posts: 138
*****
 
Thank you for the feedback!

The truncation problem will be fixed as soon as possible (https://ugene.unipro.ru/tracker/browse/UGENE-913) in the upcoming versions of UGENE.

There is no limit on the number of files in UGENE. The crash appears to be a memory management problem, a leak: running the search on a long list of files requires too much memory. Anyway, we will fix it. Here is the corresponding issue in our bug tracker (https://ugene.unipro.ru/tracker/browse/UGENE-914). Until it is fixed, you may try loading, say, 20 files and running the search on them, then loading another 20, and so on. Working in such batches avoids the huge memory consumption and the crash.
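Splitting the file list into batches can be prepared ahead of time with a small script. This is a minimal sketch, assuming the file names are already collected in an array; `splitIntoBatches` is a hypothetical helper that runs outside UGENE and does not start any workflow itself.

```javascript
// Hypothetical helper: split a list of file names into consecutive batches
// of at most `batchSize` entries, preserving the original order.
function splitIntoBatches(files, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  var batches = [];
  for (var i = 0; i < files.length; i += batchSize) {
    // slice() stops at the end of the array, so the last batch may be shorter.
    batches.push(files.slice(i, i + batchSize));
  }
  return batches;
}
```

Each resulting batch can then be loaded and processed as one run, with the next batch started only after the previous one has finished.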

 
 
Reply #6 - Apr 5th, 2012 at 8:58pm

protein_introns   Offline
YaBB Newbies

Posts: 5
*
 
Thanks!

An update:
I found that when I analyze fewer than 30 files, the final .gb files are fine; they are not truncated. It seems that because UGENE could not finish properly when more than 30 files were fed to the program, the final .gb files were left truncated. It now works fine when I give it fewer than 30 files, but this is very time-consuming. I have ~2600 files and I need to run the HMM search with several profiles.

Do you think I could analyze more than 30 files at once if I used a more powerful machine? Thanks once again!
 
 
Reply #7 - Apr 6th, 2012 at 1:02pm

Yuriy Vaskin   Offline
Global Moderator

Gender: male
Posts: 138
*****
 
Thanks for the additional information!

When I ran the analysis, it consumed ~500 MB on 20 files and kept allocating memory. So a rough estimate is 25 MB per file. By that estimate you would need ~63 GB for 2600 files.

So I don't think a more powerful machine is a good solution. You should wait until the problem is fixed (in the next versions of UGENE, which are coming soon); then you will be able to run the analysis on any machine. It will be only a question of time, not memory.
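The estimate follows from simple proportional scaling. The sketch below reproduces it, assuming memory use grows linearly with the number of files (which is what a per-file leak would imply) and that 1 GB is taken as 1024 MB; `estimateMemoryGb` is a hypothetical name used only for illustration.

```javascript
// Hypothetical helper: extrapolate memory use linearly from an observed run.
// observedMb    - memory consumed in the observed run, in MB
// observedFiles - number of files processed in that run
// targetFiles   - number of files to extrapolate to
function estimateMemoryGb(observedMb, observedFiles, targetFiles) {
  var perFileMb = observedMb / observedFiles; // 500 / 20 = 25 MB per file
  return (perFileMb * targetFiles) / 1024;    // convert MB to GB
}

// 25 MB per file * 2600 files = 65000 MB, roughly 63.5 GB.
var neededGb = estimateMemoryGb(500, 20, 2600);
```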
« Last Edit: Apr 6th, 2012 at 3:38pm by Yuriy Vaskin »  
 