UGENE Forum - Print Page

Dear All,
I would like to prepare a statistical report from my FASTQ data.
I was responsible for a project on a bacterial genome:
1. 1st step - de novo assembly (after that I chose a possible reference sequence, and...)
2. 2nd step - resequencing (I chose Bowtie2 alignment)
After that I also used the BASys platform for automated annotation.

Now, I have some questions:
A. Is it possible to prepare statistics about a sequencing project with the UGENE platform and its included scripts? I mean length distribution, GC content (I think I saw that), average Phred score, and base-quality distribution.

B. Unmatched sequences - as I wrote, I used a chosen sequence as the reference. After de novo assembly I saw some contigs in which I recognised plasmid sequences, but after the resequencing step I think I lost this data (or I do not know where to find these reads...). Can you suggest something in that case? I mean de novo assembly using the data that do not match the "refseq".

Sorry for so many questions, but I really enjoy your platform because it is easy to use for people without much knowledge of Unix...

Regards
L.

Hi Leiga!

A:
UGENE integrates the FastQC tool for getting statistics like those shown in the image. You can watch this video to learn how to interpret FastQC reports: http://www.youtube.com/watch?v=bz93ReOv87Y

To run it in UGENE 1.16, select "Tools > NGS data analysis > Reads quality control" in the main menu.
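
For anyone who prefers a script to the GUI, here is a minimal sketch in Python of the statistics asked about above (length distribution, GC content, average Phred score, per-position base quality). It is not how FastQC or UGENE computes them. It assumes an uncompressed FASTQ file with exactly four lines per record and Phred+33 quality encoding; the function names `fastq_records` and `fastq_stats` are made up for this illustration.

```python
import io
from collections import Counter


def fastq_records(handle):
    """Yield (sequence, quality) pairs, assuming 4 lines per FASTQ record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().strip()
        yield seq, qual


def fastq_stats(handle, offset=33):
    """Collect simple read statistics; offset=33 assumes Phred+33 encoding."""
    lengths = Counter()
    pos_sum, pos_n = [], []  # per-position quality sums and read counts
    reads = gc = total_bases = qual_sum = 0
    for seq, qual in fastq_records(handle):
        reads += 1
        lengths[len(seq)] += 1
        total_bases += len(seq)
        gc += sum(1 for base in seq.upper() if base in "GC")
        for i, ch in enumerate(qual):
            q = ord(ch) - offset
            if i == len(pos_sum):  # first read reaching this position
                pos_sum.append(0)
                pos_n.append(0)
            pos_sum[i] += q
            pos_n[i] += 1
            qual_sum += q
    return {
        "reads": reads,
        "length_distribution": dict(sorted(lengths.items())),
        "gc_percent": 100 * gc / total_bases if total_bases else 0.0,
        "mean_phred": qual_sum / total_bases if total_bases else 0.0,
        "mean_phred_per_position": [s / n for s, n in zip(pos_sum, pos_n)],
    }


if __name__ == "__main__":
    # Tiny in-memory example; replace with open("reads.fastq") for a real file.
    demo = "@r1\nACGT\n+\nIIII\n@r2\nGGCCAA\n+\nIIIII5\n"
    for key, value in fastq_stats(io.StringIO(demo)).items():
        print(key, value)
```

For a real data set, pass an open file handle instead of the in-memory demo string. With a coverage near 1000x the file will be large, so expect this pure-Python loop to be much slower than FastQC.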

B:
Could you please give more details? What is the goal? Wouldn't it help to map the reads to the plasmid sequences?

By the way, have you succeeded in using SPAdes in UGENE (http://ugene.unipro.ru/forum/YaBB.pl?num=1422521083), or did you use another de novo assembler?

[Attached image: fastqc.png (102 KB)]

Dear Olga,

My goal?
A) Main task - to identify the bacterium and obtain the sequence of its DNA.
I decided to use de novo assembly as a first step (for free on the Illumina BaseSpace platform, but I have too many reads to use the free SPAdes tool there, which is why I was looking for a system like yours). It gave me only partial information, but it was helpful in looking for a possible reference sequence (I checked it manually using BLAST).
After that I chose a "refseq" and did resequencing... (very nice coverage, nearly 1000x, no gaps..., but I also do not know how to sort out the reads that do not match my refseq).
B) Minor task - plasmids... or other add-ons.

For resequencing I used your platform and Bowtie2, and I uploaded the obtained consensus to the BASys platform (now I have an annotated sequence).
Am I doing something wrong? What is a typical pipeline for this kind of task?
Now I am using the CLC tool. I have a similar but different consensus sequence obtained from the same reads and the same refseq. Using the unmatched reads I was able to perform de novo assembly, and I have some huge fragments of DNA (a different set of plasmids, I am still analysing it).

Another project - sequencing of unknown bacteriophages - I think I need SPAdes for that...
but I still have a problem with my data... I used the new release of UGENE but it does not change anything. Tomorrow I will have my IT specialist here. I am not familiar with Unix systems, so he will help me with the command line (I know, you have written almost everything for me :)). Tomorrow I will also try the Windows version of UGENE, and I will see what I get after using the CLC tool...

Sorry for my silence, but I had a long break from work, and today is my second day back.

Regards
L.

Dear Leiga,

Thank you for the description!

Quote:

but I also do not know how to sort out the reads that do not match my refseq

Quote:

Using the unmatched reads I was able to perform de novo assembly, and I have some huge fragments of DNA (a different set of plasmids, I am still analysing it)

We're going to add a new feature to UGENE to simplify the extraction of unmapped reads (see UGENE-4123).
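
Until then, the unmapped reads can be pulled out of the alignment by hand. Below is a minimal Python sketch, assuming the Bowtie2 result is available as an uncompressed SAM text file (a BAM file would have to be converted first). It relies only on the SAM format itself: bit 0x4 of the FLAG column marks an unmapped read. The function name `unmapped_to_fastq` is made up for this illustration and is not part of UGENE. On the command line, Bowtie2's own `--un` and `--un-conc` options write the unaligned reads to separate files during mapping, which may be simpler.

```python
import io

UNMAPPED = 0x4   # SAM FLAG bit: this read is unmapped
FIRST = 0x40     # SAM FLAG bit: first read of a pair
LAST = 0x80      # SAM FLAG bit: second read of a pair


def unmapped_to_fastq(sam_in, fastq_out):
    """Copy unmapped reads from a SAM text stream to FASTQ; return the count."""
    written = 0
    for line in sam_in:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        name, flag = fields[0], int(fields[1])
        seq, qual = fields[9], fields[10]
        if not flag & UNMAPPED:
            continue  # read aligned to the reference, skip it
        if seq == "*" or qual == "*":
            continue  # sequence or qualities not stored, cannot write FASTQ
        # Mark mates so that paired reads keep distinct names.
        if flag & FIRST:
            name += "/1"
        elif flag & LAST:
            name += "/2"
        fastq_out.write(f"@{name}\n{seq}\n+\n{qual}\n")
        written += 1
    return written


if __name__ == "__main__":
    # Tiny in-memory example; use open("aligned.sam") and
    # open("unmapped.fastq", "w") for real files.
    demo = (
        "@SQ\tSN:ref\tLN:100\n"
        "r1\t0\tref\t1\t42\t4M\t*\t0\t0\tACGT\tIIII\n"
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\tFFFF\n"
    )
    out = io.StringIO()
    print(unmapped_to_fastq(io.StringIO(demo), out), "unmapped read(s)")
    print(out.getvalue(), end="")
```

The resulting FASTQ file can then be given to a de novo assembler to look for plasmids or other sequences missing from the reference. Note that this sketch writes both mates of a pair into one file, so they would need to be separated again for an assembler that expects paired input.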

Quote:

Another project - sequencing of unknown bacteriophages - I think I need SPAdes for that... but I still have a problem with my data...

Thanks again for testing! As I wrote to you in the other forum thread, it would help a lot if you could share the data with us, so that we can reproduce the issue. If that is possible, of course.

Quote:

Sorry for my silence, but I had a long break from work, and today is my second day back.

No worries! I'm glad that you're back :)

UGENE Forum
https://forum.ugene.net/forum/YaBB.pl
General Category >> Help and How-to >> Report generation
https://forum.ugene.net/forum/YaBB.pl?num=1425320889
Message started by Leiga on Mar 3rd, 2015 at 1:28am