UGENE Forum
https://forum.ugene.net/forum/YaBB.pl
General Category >> Feature Requests >> EMBOSS and BLAST component
https://forum.ugene.net/forum/YaBB.pl?num=1281437854
Message started by Poh on Aug 10th, 2010 at 5:57pm
Title: EMBOSS and BLAST component Post by Poh on Aug 10th, 2010 at 5:57pm
Hi! Is there any plan to integrate EMBOSS or BLAST as components in UGENE?
I think they would be a good source of bioinformatics components for building workflows. If this is not yet under consideration, I think it would be a good target to include, since many things can be developed around them. Thanks.
Title: Re: EMBOSS and BLAST component Post by Mikhail Fursov on Aug 10th, 2010 at 7:06pm
You can already run remote BLAST queries against NCBI servers. Check the "Remote Query" plugin: http://ugene.unipro.ru/plugin_remote_query.html
Local BLAST is our No. 1 priority today. Actually, we have the NCBI toolkit version of the algorithm integrated directly into the UGENE sources, but we decided to support it as a separate tool, because the BLAST source package is several times larger than the entire UGENE source tree. So with v1.7.2 you will get CLUSTAL and MAFFT support as external tools. Local BLAST will also be added as an external tool in v1.8.0. We can also add support for the EMBOSS toolset. Which EMBOSS tools would you like to see supported first?
Title: Re: EMBOSS and BLAST component Post by Poh on Aug 11th, 2010 at 1:18pm
Nice to know local BLAST is the No. 1 priority. It would be great if there were a provision for configuring it to run via SGE through ugene.
Is it possible to configure ugene to execute jobs remotely on a cluster or a grid?
On EMBOSS: It is a long list. I would prioritize the sequence analysis bit first. It would expand ugene's functionality significantly. I made a short list as follows.

$ wossname seq
Finds programs by keywords in their short description
SEARCH FOR 'SEQ'
backtranambig  Back-translate a protein sequence to ambiguous nucleotide sequence
backtranseq    Back-translate a protein sequence to a nucleotide sequence
cons           Create a consensus sequence from a multiple alignment
consambig      Create an ambiguous consensus sequence from a multiple alignment
cusp           Create a codon usage table from nucleotide sequence(s)
dotmatcher     Draw a threshold dotplot of two sequences
dotpath        Draw a non-overlapping wordmatch dotplot of two sequences
dottup         Displays a wordmatch dotplot of two sequences
dreg           Regular expression search of nucleotide sequence(s)
equicktandem   Finds tandem repeats in nucleotide sequences
est2genome     Align EST sequences to genomic DNA sequence
etandem        Finds tandem repeats in a nucleotide sequence
fuzznuc        Search for patterns in nucleotide sequences
fuzzpro        Search for patterns in protein sequences
fuzztran       Search for patterns in protein sequences (translated)
geecee         Calculate fractional GC content of nucleic acid sequences
infoalign      Display basic information about a multiple sequence alignment
infoseq        Display basic information about sequences
isochore       Plots isochores in DNA sequences
needle         Needleman-Wunsch global alignment of two sequences
needleall      Many-to-many pairwise alignments of two sequence sets
palindrome     Finds inverted repeats in nucleotide sequence(s)
patmatdb       Searches protein sequences with a sequence motif
patmatmotifs   Scan a protein sequence with motifs from the PROSITE database
plotorf        Plot potential open reading frames in a nucleotide sequence
polydot        Draw dotplots for all-against-all comparison of a sequence set
preg           Regular expression search of protein sequence(s)
prettyplot     Draw a sequence alignment with pretty formatting
prettyseq      Write a nucleotide sequence and its translation to file
primersearch   Search DNA sequences for matches with primer pairs
profit         Scan one or more sequences with a simple frequency matrix
prophet        Scan one or more sequences with a Gribskov or Henikoff profile
pscan          Scans protein sequence(s) with fingerprints from the PRINTS database
remap          Display restriction enzyme binding sites in a nucleotide sequence
restrict       Report restriction enzyme cleavage sites in a nucleotide sequence
revseq         Reverse and complement a nucleotide sequence
seqmatchall    All-against-all word comparison of a sequence set
seqret         Reads and writes (returns) sequences
seqretsetall   Reads and writes (returns) many sets of sequences
seqretsplit    Reads sequences and writes them to individual files
showalign      Display a multiple sequence alignment in pretty format
showfeat       Display features of a sequence in pretty format
showorf        Display a nucleotide sequence and translation in pretty format
showpep        Displays protein sequences with features in pretty format
showseq        Displays sequences with features in pretty format
shuffleseq     Shuffles a set of sequences maintaining composition
sigcleave      Reports on signal cleavage sites in a protein sequence
sixpack        Display a DNA sequence with 6-frame translation and ORFs
sizeseq        Sort sequences by size
skipredundant  Remove redundant sequences from an input set
skipseq        Reads and writes (returns) sequences, skipping first few
splitsource    Split sequence(s) into original source sequences
splitter       Split sequence(s) into smaller sequences
stretcher      Needleman-Wunsch rapid global alignment of two sequences
supermatcher   Calculate approximate local pair-wise alignments of larger sequences
syco           Draw synonymous codon usage statistic plot for a nucleotide sequence
textsearch     Search the textual description of sequence(s)
tfscan         Identify transcription factor binding sites in DNA sequences
tmap           Predict and plot transmembrane segments in protein sequences
transeq        Translate nucleic acid sequences
trimest        Remove poly-A tails from nucleotide sequences
trimseq        Remove unwanted characters from start and end of sequence(s)
vectorstrip    Removes vectors from the ends of nucleotide sequence(s)
water          Smith-Waterman local alignment of sequences
whichdb        Search all sequence databases for an entry and retrieve it
wobble         Plot third base position variability in a nucleotide sequence
wordcount      Count and extract unique words in molecular sequence(s)
wordfinder     Match large sequences against one or more other sequences
wordmatch      Finds regions of identity (exact matches) of two sequences
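
To make two of the simpler entries in the list above concrete, here is a minimal Python sketch of the kind of result revseq (reverse and complement) and geecee (fractional GC content) report. It is not EMBOSS code: the function names are hypothetical, and it assumes plain A/C/G/T input with no ambiguity codes, which the real tools handle more carefully.

```python
# Complement table for unambiguous DNA bases (assumption: A/C/G/T only;
# any other character is passed through unchanged).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse and complement a nucleotide sequence (the revseq idea)."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (the geecee idea)."""
    if not seq:
        return 0.0  # avoid division by zero on empty input
    upper = seq.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)
```

Operations this small are cheap to provide natively, which may be why a tool would reimplement them rather than wrap the external program.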
Title: Re: EMBOSS and BLAST component Post by Mikhail Fursov on Aug 12th, 2010 at 3:07pm
Many of these EMBOSS features are already implemented in UGENE, for example backtranseq, cons, cusp, dotmatcher, ...
It looks like we need better documentation for them, and we need to provide command-line interfaces. In future releases we will review all these commands, extend the command-line interface, and add wrappers or our own implementations for those we have missed.
Title: Re: EMBOSS and BLAST component Post by Poh on Jan 7th, 2011 at 3:36pm Mikhail Fursov wrote on Aug 12th, 2010 at 3:07pm:
I just tried this on ugene 1.9.0:
./ugenecl.exe --help
I cannot find any EMBOSS package tools... If there are any, I would suggest including them in the menu. The command-line interface of ugene is good, and I see potential to use it for scripting. Is there a way to develop a pipeline that runs both in Workflow Designer and on the command line? That would help in developing pipelines that, hopefully, would not need to be ported into ugene as a package but could exist independently.
Title: Re: EMBOSS and BLAST component Post by Konstantin Okonechnikov on Jan 7th, 2011 at 8:04pm
Poh,
any pipeline (workflow schema) created with Workflow Designer can be configured to launch as a command-line tool! Actually, this is a very handy feature for scripting :)
Check this video tutorial: http://www.youtube.com/watch?v=ZfxmX_2Ot5M
More documentation is also available here: http://ugene.unipro.ru/documentation/manual/appendixes/command_line/other_schema.html
Title: Re: EMBOSS and BLAST component Post by Poh on Jan 9th, 2011 at 1:24pm Konstantin Okonechnikov wrote on Jan 7th, 2011 at 8:04pm:
I think this is very much limited to ugene's built-in or ported applications and readers. I would still need prior knowledge of how an application works before designing a workflow and using it. Wouldn't it be better if ugene had all the common workflows packaged with it, so that users only needed to define the parameters? At the moment, I have yet to see how I can add a command-line tool as a component in ugene. How is that possible with ugene 1.9.0?
Title: Re: EMBOSS and BLAST component Post by Mikhail Fursov on Jan 10th, 2011 at 6:44pm
> Wouldn't it be better if ugene had all the common workflows packaged with it, so that users only needed to define the parameters?
Please check the /data/cmdline folder:
align-clustalw.uwl
align-kalign.uwl
align-mafft.uwl
align.uwl
bowtie-build.uwl
bowtie.uwl
convert-msa.uwl
convert-seq.uwl
extract-sequence.uwl
find-orfs.uwl
find-repeats.uwl
find-sw.uwl
hmm2-build.uwl
hmm2-search.uwl
join-quality.uwl
local-blast.uwl
local-blast+.uwl
pfm-build.uwl
pfm-search.uwl
pwm-build.uwl
pwm-search.uwl
remote-request.uwl
sitecon-build.uwl
sitecon-search.uwl
All these workflows are available as command-line options. If you put your own workflow into this folder, it will automatically work as a command-line parameter. Command example:
ugene find-orfs
We plan to make this command set more complete in upcoming releases.
> At the moment, I have yet to see how I can add a command-line tool as a component in ugene. How is that possible with ugene 1.9.0?
Yes, but today you have to code in C++ to do it. Please check the BLAST, MAFFT or CLUSTAL tool integration code as an example. The main reason you need to code in C++ today is that you have to derive a data model from the tool output that is compatible with the internal UGENE data model. Actually, we could support script-based integration for external tools and provide the user with a JavaScript-like (QtScript) interface. This is just a question of priorities.
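
The folder-to-command mapping described in the post above can be sketched roughly as follows. This is a minimal Python illustration, not UGENE's actual implementation: the function name `list_workflow_commands` is hypothetical, and it assumes the command name is simply the file name with `.uwl` removed, as the `find-orfs.uwl` and `ugene find-orfs` pairing suggests.

```python
from pathlib import Path


def list_workflow_commands(cmdline_dir):
    """Return command names derived from the .uwl files in cmdline_dir.

    Assumption: a workflow file `name.uwl` is exposed as the command
    `name` (e.g. find-orfs.uwl -> find-orfs). Other files are ignored.
    """
    folder = Path(cmdline_dir)
    return sorted(p.stem for p in folder.glob("*.uwl") if p.is_file())


# Usage: print the commands a given folder would provide. The relative
# path is a placeholder; point it at the data/cmdline folder of an install.
for name in list_workflow_commands("data/cmdline"):
    print(f"ugene {name}")
```

Under this assumption, dropping a file such as `my-pipeline.uwl` into the folder would add `my-pipeline` to the list without any other registration step.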