UGENE Forum

Can UGENE handle NGS data?

Support for next-generation sequencing (NGS) is a high-priority direction for the UGENE project. We have already implemented several steps towards NGS data analysis and visualization.

Do you have any particular tasks in mind?

I have Solexa/Illumina reads in fastq format and contigs in fasta format. I tried several things below.

1. I want to create a local DB from the reads and then run blastn with the contigs against this DB. It seems that the FormatDB tool does not support fastq sequences: an empty DB was created.

2. I would like to annotate around 40k contigs by running Blastn against RefRNA or some other DB. I designed a workflow for this: Read sequence => Remote Blastn => Write sequences. Is it possible to transfer the blast results into annotations attached directly to the contig sequences? Of course I can do this by selecting a single sequence for blasting, but not in a workflow.

Do you have any solutions for these tasks?

Thanks in advance,

Hi Hieu,

here are the answers:

1. You can convert fastq to fasta using the "Save a copy" button in the project view, or using a simple workflow "Read Sequence -> Write Fasta". After the conversion you will be able to use the FormatDB tool.

2. In the Genbank writer, you have to select that the annotations from the remote blast should be used. See the attached screenshot.

Feel free to ask more questions if further help is needed.

save-a-copy.png (12 KB)

workflow-remote-blast.png (33 KB)

Thanks for your quick answers. However, for the 1st task, I used the simple workflow "Read Sequence -> Write Fasta" to convert fastq to fasta format and got the following error message.

Code:

Exception with code C++ exception - Unhandled exception

Operation system: Windows x86

UGENE version: 1.9.1

ActiveWindow: Workflow Designer - New schema

TaskLog:
[00:25::25.373] promoting task {Read sequences from C:/Users/Hieu/Desktop/sequence/UGENE/GPL3/110216.fastq} to 'Running'
[00:25::25.373] acquiring resource: 'Threads':1, state: 1/1000
[00:25::25.373] promoting task {Default iteration} to 'Running'
[00:25::25.373] promoting task {Execute workflow} to 'Running'
[00:25::25.373] promoting task {Read sequences from C:/Users/Hieu/Desktop/sequence/UGENE/GPL3/110216.fastq} to 'Prepared'
[00:25::25.373] promoting task {Default iteration} to 'Prepared'
[00:25::25.373] promoting task {Execute workflow} to 'Prepared'
[00:25::25.373] Registering new task: Execute workflow
[00:25::25.373] Deleting task: Enable 'External tools support' service
[00:25::25.373] Uregistering task: Enable 'External tools support' service
[00:25::25.373] promoting task {Enable 'External tools support' service} to 'Finished'
[00:25::25.373] promoting task {Enable 'External tools support' service} to 'Running'
[00:25::25.373] promoting task {Enable 'External tools support' service} to 'Prepared'
[00:25::25.373] Registering new task: Enable 'External tools support' service
[00:25::25.373] Deleting task: Enable 'DNA export service' service
[00:25::25.373] Uregistering task: Enable 'DNA export service' service
[00:25::25.373] promoting task {Enable 'DNA export service' service} to 'Finished'
[00:25::25.373] promoting task {Enable 'DNA export service' service} to 'Running'
[00:25::25.373] promoting task {Enable 'DNA export service' service} to 'Prepared'
[00:25::25.373] Registering new task: Enable 'DNA export service' service
[00:25::25.373] Deleting task: Enable 'ProjectView' service
[00:25::25.373] Deleting task: Enable ProjectView
[00:25::25.373] Uregistering task: Enable 'ProjectView' service
[00:25::25.374] promoting task {Enable 'ProjectView' service} to 'Finished'
[00:25::25.374] promoting task {Enable ProjectView} to 'Finished'
[00:25::25.374] promoting task {Enable ProjectView} to 'Running'
[00:25::25.374] promoting task {Enable 'ProjectView' service} to 'Running'
[00:25::25.374] promoting task {Enable ProjectView} to 'Prepared'
[00:25::25.374] promoting task {Enable 'ProjectView' service} to 'Prepared'
[00:25::25.374] Registering new task: Enable 'ProjectView' service
[00:25::25.374] Deleting task: Open project/document
[00:25::25.374] Deleting task: Register project
[00:25::25.374] Deleting task: Register 'Project' service
[00:25::25.374] Deleting task: Enable 'Project' service
[00:25::25.374] Deleting task: Enable Project
[00:25::25.374] Uregistering task: Open project/document
[00:25::25.374] promoting task {Open project/document} to 'Finished'
[00:25::25.374] promoting task {Register project} to 'Finished'
[00:25::25.374] promoting task {Register 'Project' service} to 'Finished'
[00:25::25.374] promoting task {Enable 'Project' service} to 'Finished'
[00:25::25.374] promoting task {Enable Project} to 'Finished'
[00:25::25.374] promoting task {Enable Project} to 'Running'
[00:25::25.374] promoting task {Enable 'Project' service} to 'Running'
[00:25::25.374] promoting task {Register 'Project' service} to 'Running'
[00:25::25.374] promoting task {Register project} to 'Running'
[00:25::25.374] promoting task {Open project/document} to 'Running'
[00:25::25.374] promoting task {Enable Project} to 'Prepared'
[00:25::25.374] promoting task {Enable 'Project' service} to 'Prepared'
[00:25::25.374] promoting task {Register 'Project' service} to 'Prepared'
[00:25::25.374] promoting task {Register project} to 'Prepared'
[00:25::25.374] promoting task {Open project/document} to 'Prepared'

Task tree:
Execute workflow (Running) 1
-Default iteration (Running) 1
--Read sequences from C:/Users/Hieu/Desktop/sequence/UGENE/GPL3/110216.fastq (Running) 1

Many thanks for the report. We will try to fix the bug as soon as possible.
It would be very helpful if you could share with us the workflow schema file you are running and the fastq file. If you cannot share the fastq file, please tell us its size and the number of sequences.

Thanks!

The fastq file is 5.8 GB in size and contains about 20 million reads/sequences.

I attached the schema below.

https://forum.ugene.net/forum/YaBB.pl?action=downloadfile;file=remoteBLAST.uwl (3 KB)

Ok, thanks. We will re-check and fix the handling of big data files in UGENE.
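While large inputs crash the workflow, the fastq-to-fasta conversion from task 1 could in principle be done outside UGENE. The following is only a minimal sketch in Python, not UGENE's own converter. It assumes plain 4-line fastq records (header, sequence, "+" line, qualities) with no line wrapping, and the function name `fastq_to_fasta` is made up for this illustration. It reads one record at a time, so memory use does not grow with file size.

```python
import io
from itertools import islice


def fastq_to_fasta(fastq_handle, fasta_handle):
    """Stream 4-line FASTQ records to FASTA; return the number of records written.

    Assumption: each record is exactly four lines (no wrapped sequences).
    """
    count = 0
    while True:
        # Pull the next four lines: @header, sequence, + separator, qualities.
        record = [line.rstrip("\r\n") for line in islice(fastq_handle, 4)]
        if not record:
            break  # end of input
        if len(record) < 4 and not any(record):
            break  # only trailing blank lines remain
        if (
            len(record) < 4
            or not record[0].startswith("@")
            or not record[2].startswith("+")
        ):
            raise ValueError(f"malformed FASTQ record number {count + 1}")
        # Keep the header text, drop the quality line.
        fasta_handle.write(">" + record[0][1:] + "\n" + record[1] + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # Tiny in-memory demonstration with two made-up reads.
    demo = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n")
    out = io.StringIO()
    print(fastq_to_fasta(demo, out), "records converted")
    print(out.getvalue(), end="")
```

For real files, the same function can be called with two handles from `open(...)`, one opened for reading the fastq file and one for writing the fasta file. The resulting fasta file should then be usable as FormatDB input, as described in the answer to task 1.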

For the 2nd task, I created a workflow that 1) reads sequences from the contigs => 2) runs remote BLASTn => 3) writes to a Genbank format file with annotations from the remote BLAST. It looks like your screenshot.

My question is: do we need to create a blank file named 1.gb in the target location before running the schema? In my schema the remote BLASTn task ran well, but somehow the output file could not be written. Also, what is the difference between the overwrite and append modes?

BR,

You don't have to create an empty file, it will be created automatically. If this doesn't happen, something is wrong. What was the error message after the schema execution?

The difference between the "append" and "overwrite" modes is the following:
in "overwrite" mode the output Genbank file is overwritten on every schema launch, while in "append" mode the data is added to the end of the file.
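The two modes behave much like the standard "w" and "a" file modes in most programming languages. The Python sketch below is only an analogy, not UGENE code: `run_schema` and `demo` are hypothetical names, and each call to `run_schema` stands in for one schema launch. It also shows that the output file is created automatically in both modes.

```python
import os
import tempfile


def run_schema(path, record, mode):
    """Simulate one schema launch that writes `record` to the output file.

    "overwrite" truncates the file first; "append" adds to its end.
    In both cases the file is created if it does not exist yet.
    """
    file_mode = {"overwrite": "w", "append": "a"}[mode]
    with open(path, file_mode) as handle:
        handle.write(record + "\n")


def demo(mode):
    """Launch the simulated schema twice and return the file's lines."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "1.gb")  # no file exists here beforehand
        run_schema(path, "first launch", mode)
        run_schema(path, "second launch", mode)
        with open(path) as handle:
            return handle.read().splitlines()


if __name__ == "__main__":
    print("overwrite:", demo("overwrite"))
    print("append:   ", demo("append"))
```

After two launches, the "overwrite" file holds only the second result, while the "append" file holds both in order.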

thank you.

Is there any function in UGENE that can help us summarize the information in a contig file? For instance, calculating N50, or counting how many contigs/sequences are longer than 1 kb.

Right now there is no such functionality in UGENE, so we will think over how to add this. For now, it looks like the easiest way would be to create a simple script worker for Workflow Designer.
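Until such a feature exists, the requested statistics can be sketched as a small standalone script. The Python example below is an illustration only and is not a Workflow Designer script worker; the function names and the 1000 bp default threshold are assumptions chosen for this example. N50 is computed in the usual way: sort the contig lengths in descending order and report the length at which the running sum first reaches half of the total assembly length.

```python
import io


def read_fasta_lengths(handle):
    """Return the sequence length of every record in a FASTA stream."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L cover at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def summarize(lengths, min_length=1000):
    """Collect a few basic assembly statistics in a dictionary."""
    return {
        "contigs": len(lengths),
        "total_bases": sum(lengths),
        "n50": n50(lengths),
        "longer_than_min": sum(1 for n in lengths if n > min_length),
    }


if __name__ == "__main__":
    # Tiny made-up FASTA input; a real run would pass an opened contig file.
    demo = io.StringIO(">c1\nACGTAC\n>c2\nGG\nTT\n>c3\nA\n")
    print(summarize(read_fasta_lengths(demo), min_length=3))
```

Only the lengths are kept in memory, so a file with around 40k contigs should not be a problem for this approach.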

Hieu Cao wrote on May 15th, 2011 at 9:02pm:

Is there any function in UGENE that can help us summarize the information in a contig file? For instance, calculating N50, or counting how many contigs/sequences are longer than 1 kb.

Please check this issue for the progress: https://ugene.unipro.ru/tracker/browse/UGENE-313

https://forum.ugene.net/forum/YaBB.pl
General Category >> Help and How-to >> Next Gen Sequencing
https://forum.ugene.net/forum/YaBB.pl?num=1300429237
Message started by vijay on Mar 18th, 2011 at 1:20pm