UGENE Bulletin Board
EMBOSS and BLAST component (Read 21038 times)
Aug 10th, 2010 at 5:57pm

Poh   Offline
YaBB Newbies

Posts: 46
*
 
Hi! Is there any plan to integrate EMBOSS or BLAST as components in UGENE?

I think they would be a good source of bioinformatics components to build workflows on.

If this is not yet under consideration, I think it would be a good target to include. Many things could be developed around it.

Thanks.
 
 
Reply #1 - Aug 10th, 2010 at 7:06pm

Mikhail Fursov   Offline
YaBB Administrator

Gender: male
Posts: 162
*****
 
You can already run remote BLAST queries against NCBI servers. Check the "Remote Query" plugin: http://ugene.unipro.ru/plugin_remote_query.html

Local BLAST is our number one priority today.

Actually, we have the NCBI-toolkit version of the algorithm integrated directly into the UGENE sources, but we decided to support it as a separate tool, since the BLAST source package is several times larger than the entire UGENE source tree.

So with v1.7.2 you will get CLUSTAL and MAFFT support as external tools. Local BLAST will be added in v1.8.0, also as an external tool.

We can also add support for the EMBOSS toolset. Which EMBOSS tools would you like to see supported first?
 

---
UGENE team
 
Reply #2 - Aug 11th, 2010 at 1:18pm

Poh   Offline
YaBB Newbies

Posts: 46
*
 
Good to know local BLAST is the number one priority. It would be great if it could also be configured to run via SGE through UGENE.

Is it possible to configure UGENE to execute jobs remotely on a cluster or a grid?

On EMBOSS:
It is a long list. I would prioritize the sequence analysis tools first; they would expand UGENE's functionality significantly. I made a short list as follows.

$ wossname seq
Finds programs by keywords in their short description
SEARCH FOR 'SEQ'
backtranambig  Back-translate a protein sequence to ambiguous nucleotide sequence
backtranseq    Back-translate a protein sequence to a nucleotide sequence
cons           Create a consensus sequence from a multiple alignment
consambig      Create an ambiguous consensus sequence from a multiple alignment
cusp           Create a codon usage table from nucleotide sequence(s)
dotmatcher     Draw a threshold dotplot of two sequences
dotpath        Draw a non-overlapping wordmatch dotplot of two sequences
dottup         Displays a wordmatch dotplot of two sequences
dreg           Regular expression search of nucleotide sequence(s)
equicktandem   Finds tandem repeats in nucleotide sequences
est2genome     Align EST sequences to genomic DNA sequence
etandem        Finds tandem repeats in a nucleotide sequence
fuzznuc        Search for patterns in nucleotide sequences
fuzzpro        Search for patterns in protein sequences
fuzztran       Search for patterns in protein sequences (translated)
geecee         Calculate fractional GC content of nucleic acid sequences
infoalign      Display basic information about a multiple sequence alignment
infoseq        Display basic information about sequences
isochore       Plots isochores in DNA sequences
needle         Needleman-Wunsch global alignment of two sequences
needleall      Many-to-many pairwise alignments of two sequence sets
palindrome     Finds inverted repeats in nucleotide sequence(s)
patmatdb       Searches protein sequences with a sequence motif
patmatmotifs   Scan a protein sequence with motifs from the PROSITE database
plotorf        Plot potential open reading frames in a nucleotide sequence
polydot        Draw dotplots for all-against-all comparison of a sequence set
preg           Regular expression search of protein sequence(s)
prettyplot     Draw a sequence alignment with pretty formatting
prettyseq      Write a nucleotide sequence and its translation to file
primersearch   Search DNA sequences for matches with primer pairs
profit         Scan one or more sequences with a simple frequency matrix
prophet        Scan one or more sequences with a Gribskov or Henikoff profile
pscan          Scans protein sequence(s) with fingerprints from the PRINTS database
remap          Display restriction enzyme binding sites in a nucleotide sequence
restrict       Report restriction enzyme cleavage sites in a nucleotide sequence
revseq         Reverse and complement a nucleotide sequence
seqmatchall    All-against-all word comparison of a sequence set
seqret         Reads and writes (returns) sequences
seqretsetall   Reads and writes (returns) many sets of sequences
seqretsplit    Reads sequences and writes them to individual files
showalign      Display a multiple sequence alignment in pretty format
showfeat       Display features of a sequence in pretty format
showorf        Display a nucleotide sequence and translation in pretty format
showpep        Displays protein sequences with features in pretty format
showseq        Displays sequences with features in pretty format
shuffleseq     Shuffles a set of sequences maintaining composition
sigcleave      Reports on signal cleavage sites in a protein sequence
sixpack        Display a DNA sequence with 6-frame translation and ORFs
sizeseq        Sort sequences by size
skipredundant  Remove redundant sequences from an input set
skipseq        Reads and writes (returns) sequences, skipping first few
splitsource    Split sequence(s) into original source sequences
splitter       Split sequence(s) into smaller sequences
stretcher      Needleman-Wunsch rapid global alignment of two sequences
supermatcher   Calculate approximate local pair-wise alignments of larger sequences
syco           Draw synonymous codon usage statistic plot for a nucleotide sequence
textsearch     Search the textual description of sequence(s)
tfscan         Identify transcription factor binding sites in DNA sequences
tmap           Predict and plot transmembrane segments in protein sequences
transeq        Translate nucleic acid sequences
trimest        Remove poly-A tails from nucleotide sequences
trimseq        Remove unwanted characters from start and end of sequence(s)
vectorstrip    Removes vectors from the ends of nucleotide sequence(s)
water          Smith-Waterman local alignment of sequences
whichdb        Search all sequence databases for an entry and retrieve it
wobble         Plot third base position variability in a nucleotide sequence
wordcount      Count and extract unique words in molecular sequence(s)
wordfinder     Match large sequences against one or more other sequences
wordmatch      Finds regions of identity (exact matches) of two sequences
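
A listing like the one above is easy to turn into a lookup table when deciding which tools to wrap first. The following is only a minimal Python sketch: the function name `parse_wossname` is made up for illustration, and it assumes every program line starts with a lowercase alphanumeric name followed by whitespace and a description, as in the output shown here.

```python
def parse_wossname(text):
    """Map each EMBOSS program name to its one-line description."""
    programs = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        # Skip blank lines, the shell prompt, and the two header lines:
        # program names in the listing are lowercase and alphanumeric.
        if len(parts) < 2 or not parts[0].islower() or not parts[0].isalnum():
            continue
        programs[parts[0]] = parts[1].strip()
    return programs


# A short excerpt of the listing, used here as sample input.
listing = """\
$ wossname seq
Finds programs by keywords in their short description
SEARCH FOR 'SEQ'
needle         Needleman-Wunsch global alignment of two sequences
water          Smith-Waterman local alignment of sequences
est2genome     Align EST sequences to genomic DNA sequence
"""

programs = parse_wossname(listing)
```

With the full listing as input, `programs` would hold one entry per tool, which makes it simple to tick off the ones UGENE already covers.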

 
 
Reply #3 - Aug 12th, 2010 at 3:07pm

Mikhail Fursov   Offline
YaBB Administrator

Gender: male
Posts: 162
*****
 
A lot of these EMBOSS features are already implemented in UGENE, for example: backtranseq, cons, cusp, dotmatcher and others.

It looks like we need better docs for them and need to provide command-line interfaces.

We will review all these commands, enrich the command-line interface, and in future releases add wrappers or our own implementations for everything we missed.
 

---
UGENE team
 
Reply #4 - Jan 7th, 2011 at 3:36pm

Poh   Offline
YaBB Newbies

Posts: 46
*
 
Mikhail Fursov wrote on Aug 12th, 2010 at 3:07pm:
A lot of these EMBOSS features are already implemented in UGENE, for example: backtranseq, cons, cusp, dotmatcher and others.

It looks like we need better docs for them and need to provide command-line interfaces.

We will review all these commands, enrich the command-line interface, and in future releases add wrappers or our own implementations for everything we missed.



I just tried this on UGENE 1.9.0:

./ugenecl.exe --help

I cannot find any EMBOSS package tools...
If there are any, I would suggest including them in the menu.

The command-line interface of UGENE is good, and I see potential to use it for scripting. Is there a way to develop a pipeline that runs both in Workflow Designer and on the command line? That would help in developing pipelines: hopefully there would be less need to port each one into UGENE as a package, and it could exist independently.


 
 
Reply #5 - Jan 7th, 2011 at 8:04pm

Konstantin Okonechnikov   Offline
Global Moderator

Posts: 173
*****
 
Poh,
any pipeline (workflow schema) created with Workflow Designer can be configured to launch as a command-line tool! Actually, this is a very handy feature for scripting :)

Check this video tutorial:
http://www.youtube.com/watch?v=ZfxmX_2Ot5M

Also more documentation is available here:
http://ugene.unipro.ru/documentation/manual/appendixes/command_line/other_schema...
 
 
Reply #6 - Jan 9th, 2011 at 1:24pm

Poh   Offline
YaBB Newbies

Posts: 46
*
 
Konstantin Okonechnikov wrote on Jan 7th, 2011 at 8:04pm:
Poh,
any pipeline (workflow schema) created with Workflow Designer can be configured to launch as a command-line tool! Actually, this is a very handy feature for scripting :)

Check this video tutorial:
http://www.youtube.com/watch?v=ZfxmX_2Ot5M

Also more documentation is available here:
http://ugene.unipro.ru/documentation/manual/appendixes/command_line/other_schema...


I think that is very much limited to UGENE's built-in or ported applications and readers. I would still need prior knowledge of how an application works before designing a workflow and using it.

Wouldn't it be better if UGENE came with all the common workflows packaged, so that users only need to define the parameters?

At the moment, I have yet to see how to add a command-line tool as a component in UGENE. How is that possible with UGENE 1.9.0?

 
 
Reply #7 - Jan 10th, 2011 at 6:44pm

Mikhail Fursov   Offline
YaBB Administrator

Gender: male
Posts: 162
*****
 
> Wouldn't it be better that ugene have all the common workflows packaged with it where users only needed to define the parameters?

Please check the /data/cmdline folder:

align-clustalw.uwl
align-kalign.uwl
align-mafft.uwl
align.uwl
bowtie-build.uwl
bowtie.uwl
convert-msa.uwl
convert-seq.uwl
extract-sequence.uwl
find-orfs.uwl
find-repeats.uwl
find-sw.uwl
hmm2-build.uwl
hmm2-search.uwl
join-quality.uwl
local-blast.uwl
local-blast+.uwl
pfm-build.uwl
pfm-search.uwl
pwm-build.uwl
pwm-search.uwl
remote-request.uwl
sitecon-build.uwl
sitecon-search.uwl

All these workflows are available as command-line options. If you put your own workflow into this folder, it will automatically work as a command-line parameter.

Command example: ugene find-orfs
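
To see which commands a given installation offers, the folder can simply be scanned for `.uwl` files. This is a minimal Python sketch, not part of UGENE: the helper names `workflow_commands` and `ugene_argv` are hypothetical, and it assumes the command name is the file name without the `.uwl` extension, which is what the list above and the `ugene find-orfs` example suggest. The location of the folder depends on your installation, and workflow parameters are not covered because they are not shown in this thread.

```python
from pathlib import Path


def workflow_commands(cmdline_dir):
    """Return sorted command names derived from the *.uwl files in a folder.

    Assumption: command name == file name minus the ".uwl" extension.
    """
    return sorted(p.stem for p in Path(cmdline_dir).glob("*.uwl"))


def ugene_argv(command, executable="ugene"):
    """Build the argument list for launching one packaged workflow.

    Only the bare command is built; any workflow parameters would have
    to be appended by the caller.
    """
    return [executable, command]
```

For example, `workflow_commands("data/cmdline")` on a folder with the files listed above would include `"find-orfs"`, and `ugene_argv("find-orfs")` gives a list that can be passed to `subprocess.run`.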

We plan to make this command set more complete in upcoming releases.


>At the moment, I am yet to see how am I to add a component of a commandline tool into ugene. How is that possible with ugene 1.9.0?

Yes, but today you have to write C++ code to do it.
Please check the integration code for the BLAST, MAFFT or CLUSTAL tools as an example.
The main reason you need C++ today: you have to derive, from the tool's output, a data model that is compatible with the internal UGENE data model.
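
To illustrate what "deriving a data model from a tool output" means, here is a small sketch in Python (the real integration is C++, as noted above). It is not UGENE code: the `Hit` record and `parse_tabular` function are hypothetical, and the input is assumed to be the standard 12-column tab-separated BLAST+ output (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore).

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical in-memory record for one alignment hit."""
    query: str
    subject: str
    identity: float  # percent identity
    start: int       # hit start on the query
    end: int         # hit end on the query
    evalue: float


def parse_tabular(text):
    """Turn 12-column tab-separated BLAST output into Hit records."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[6]), int(f[7]), float(f[10])))
    return hits


# One made-up result line, used only as sample input.
sample = "q1\ts1\t98.50\t200\t3\t0\t1\t200\t11\t210\t1e-100\t360\n"
hits = parse_tabular(sample)
```

Each external tool needs a step like this, mapping its own output format onto whatever structures the host application uses internally, which is why the integration is currently done in code rather than by configuration.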

Actually, we could support script-based integration for external tools and give the user a JavaScript-like (QtScript) interface. It is just a question of priorities.
 

---
UGENE team
 