UGENE Forum - Workflow

Aug 14th, 2020 at 8:21pm

SB
YaBB Newbies


Is it possible to ask a basic question about workflows?

I have just started using UGene and the pre-prepared workflow for mapping Sanger reads to a reference.
It works well.

I then open the alignment manually and remove a section (ATG to TAG) [1], save it, remove another section (between 2 fixed 24 base pair sequences) [2], save it, and then translate that section and save it.
Then I manually combine the 3 saved files into one.
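Steps [1] and [2] both come down to finding fixed sequences in the read and keeping what lies between them. Outside UGene, a minimal Python sketch of that step might look like this, assuming exact (error-free) matches to the flanking sequences; the function name and the example sequences are hypothetical.

```python
from typing import Optional


def extract_between(seq: str, left_flank: str, right_flank: str) -> Optional[str]:
    """Return the bases strictly between the first left_flank and the next
    right_flank (flanks excluded), or None if either flank is not found.

    Assumes exact matches: a base-calling error inside a flank makes it
    unfindable, so such a read is returned as None rather than cut wrongly.
    """
    seq = seq.upper()
    start = seq.find(left_flank.upper())
    if start == -1:
        return None
    start += len(left_flank)  # skip past the left flank itself
    end = seq.find(right_flank.upper(), start)
    if end == -1:
        return None
    return seq[start:end]
```

For example, `extract_between("GGGAAACCCTTTGGG", "AAA", "TTT")` returns `"CCC"`.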

Please can I ask if I can do all of that as an automated workflow without manual intervention?

Also, the ABIF files I use contain quality information, which can be extracted, e.g. with Biopython, as described here:
{link removed biostars.org/p/309727/#416388}

I would quite like to reject the sequence labelled [2] above based on quality score. Can I ask if that is possible in a UGene Workflow ?
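If the per-base quality values are available (in Biopython, `SeqIO.read(path, "abi")` returns a record whose `letter_annotations["phred_quality"]` holds them), rejecting section [2] becomes a check on a slice of that list. A sketch, assuming Phred scores indexed by read position; the function name is hypothetical and the thresholds (mean 30, minimum 20) are arbitrary placeholders, not recommended values.

```python
from statistics import mean
from typing import Sequence


def region_passes(quals: Sequence[int], start: int, end: int,
                  min_mean: float = 30.0, min_base: int = 20) -> bool:
    """Accept the region quals[start:end] only if its mean Phred score is at
    least min_mean AND no single base in it falls below min_base.

    An empty region (e.g. the section was not found) is rejected.
    """
    region = list(quals[start:end])
    if not region:
        return False
    return mean(region) >= min_mean and min(region) >= min_base
```

Checking both the mean and the worst base is one design choice among several; a single bad call inside a 27 bp loop can change an amino acid even when the average looks fine.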

Thanks! I will delve much deeper into workflows if my basic workflow is possible.


Reply #1 - Aug 17th, 2020 at 12:19pm

Dmitrii Sukhomlinov
YaBB Administrator
Russia


Hello,
> Please can I ask if I can do all of that as an automated workflow without manual intervention?
Could you please describe the actions you want to automate more clearly? I didn't understand this part well. Do you want to replace all occurrences of a subsequence (e.g. ATG) with another one (e.g. TAG)? Or do you want to remove some subsequences entirely?

> I would quite like to reject the sequence labelled [2] above based on quality score. Can I ask if that is possible in a UGene Workflow ?
The "Map Sanger reads to Reference" feature has a "Trimming quality threshold" setting, which sets the lower bound for quality: if the quality is lower than the threshold, the read is filtered. However, there is no option that configures an upper bound for quality.
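To illustrate what a lower-bound threshold on quality can do, here is a generic end-trimming sketch. It is an assumption for illustration only and is not UGENE's actual trimming algorithm: leading and trailing bases below the threshold are dropped, and a read with nothing left is treated as filtered out. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def trim_ends(quals: Sequence[int], threshold: int) -> Tuple[int, int]:
    """Return (start, end) such that quals[start:end] has every leading and
    trailing base with Phred score < threshold removed.

    Returns (0, 0) when no base reaches the threshold, i.e. the whole read
    would be discarded. Low-quality bases in the middle are NOT removed.
    """
    start = 0
    while start < len(quals) and quals[start] < threshold:
        start += 1
    end = len(quals)
    while end > start and quals[end - 1] < threshold:
        end -= 1
    if start == end:
        return (0, 0)
    return (start, end)
```

With `trim_ends([5, 8, 30, 35, 10, 32, 7], 20)` the result is `(2, 6)`: the internal base with quality 10 survives because only the ends are trimmed, which is why checking a specific region needs a separate step.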

Best regards,
Dmitrii Sukhomlinov,
The UGENE team.


Reply #2 - Aug 18th, 2020 at 11:09pm

SB
YaBB Newbies


Many thanks for the response.
I would love to know if the following is possible in an automated workflow.

[1]
Import 600 bp Sanger Sequence ABIF.
[2]
Align to a 450bp reference which starts with ATG and ends with TAG with 2 9 amino acid loops inside.
The sequence is completely conserved apart from the 2 x 9 amino acid loops which are variable.
[3]
Ideally I want to cut out the 450bp DNA sequence,
[3b]
cut out the 2 x 27 base pair loops and
[4]
translate the 2 loops to 2 x 9 amino acids.
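Translating a 27 bp loop into its 9 amino acids is a codon-by-codon lookup. A sketch using the standard genetic code (NCBI translation table 1), assuming the loop is in frame with the ATG start; Biopython's `Seq.translate()` does the same job.

```python
from itertools import product

# Standard genetic code (NCBI table 1). Codons are enumerated with the first
# base varying slowest, in T, C, A, G order: TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence; '*' marks a stop codon and 'X'
    any codon containing a non-ACGT character (e.g. an N base call)."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError(f"length {len(dna)} is not a multiple of 3")
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna), 3))
```

For example, `translate("ATGTGGTAG")` returns `"MW*"`, and any 27 bp loop gives a 9-character protein string.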

I've done it using your pre-prepared Sanger workflow and then performed all of the extractions and translations manually. It works well - just trying to work out if it can be fully automated.

If so, the only other part I need is to exclude sequences based on the quality of either the complete 450 bp region or the 2 x 27 bp loops.

Quality information is available in the ABIF (I believe), but I haven't been able to do this manually in UGene.
I'm more familiar with quality scores in NGS data, and have just found out they're in Sanger ABIF files too.

So the question is in two parts: can we perform the first part (fully automatic extraction/translation/compilation of sections of a sequence) in an automated workflow?

And is it possible to generate an automated workflow which excludes sequences based on the quality score of the extracted sequence (from an ABIF)?

I really appreciate the help and will work out how to do it myself if I get confirmation that it's possible.


Reply #3 - Aug 19th, 2020 at 1:46pm

Dmitrii Sukhomlinov
YaBB Administrator
Russia


It's becoming clearer, but I still have some questions about it.
First of all, what input data do you have? Do you have a bunch of ABIF files? Or something else? Please describe the format of the input and, if possible, share it with me.

>Import 600 bp Sanger Sequence ABIF.
Import from what into what? What input data do you have, and what do you want to import it into?

>Align to a 450bp reference which starts with ATG and ends with TAG with 2 9 amino acid loops inside. The sequence is completely conserved apart from the 2 x 9 amino acid loops which are variable.
So, the main idea is that you have some reference, and you want to align the data from the previous step to this reference, am I right?

I understand it's a lot of questions, so to speed the process up, I've got a suggestion: let's organize a meeting on Skype or Zoom, so you can share your screen with me and I can help you with each point. Would that be possible for you?

Best regards,
Dmitrii Sukhomlinov,
The UGENE team.


Reply #4 - Aug 19th, 2020 at 7:06pm

SB
YaBB Newbies


Hi!
So, the files are generated by a sequencing company, and they give us standard .AB1F files. I've opened one in a text editor and it makes no sense, so I'm not really sure what an AB1F file contains, but I think it has quality information and chromatograms.
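An AB1 (ABIF) file is binary, which is why it looks like nonsense in a text editor. It is a directory of tagged data items; according to Applied Biosystems' ABIF format specification, the base calls are stored under tag `PBAS` 2 and the per-base quality values under `PCON` 2 (one byte per base). A minimal reader, assuming a well-formed file; the function name is hypothetical, and in practice Biopython's `SeqIO.read(path, "abi")` does this for you.

```python
import struct
from typing import Dict, Tuple

# One 28-byte big-endian directory entry: tag name, tag number, element type,
# element size, number of elements, data size, data offset, data handle.
DIR_ENTRY = struct.Struct(">4sihhiiii")


def read_abif_tags(data: bytes) -> Dict[Tuple[str, int], bytes]:
    """Map (tag name, tag number) to the raw bytes of each data item."""
    if data[:4] != b"ABIF":
        raise ValueError("not an ABIF file")
    # The root directory entry follows the 4-byte magic and 2-byte version.
    _, _, _, _, n_entries, _, dir_offset, _ = DIR_ENTRY.unpack_from(data, 6)
    tags = {}
    for i in range(n_entries):
        entry_pos = dir_offset + i * DIR_ENTRY.size
        name, number, _, _, _, size, offset, _ = DIR_ENTRY.unpack_from(data, entry_pos)
        if size <= 4:
            # Items of 4 bytes or less are stored inline, in the offset field.
            offset = entry_pos + 20
        tags[(name.decode("latin-1"), number)] = data[offset:offset + size]
    return tags
```

Usage: `tags = read_abif_tags(open("66F02.ab1", "rb").read())`, then `tags[("PBAS", 2)].decode()` gives the bases and `list(tags[("PCON", 2)])` the quality scores (the file name is hypothetical).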

I want to bring AB1F (a 600 bp sequence) into UGene and align to reference (this I do using your pre-written workflow).

All I then want to do is to automatically extract sections of this 600 bp sequence, which I can do manually in UGene but am not sure can be automated. In the first place, I just want to know whether a mechanism for automatic extraction exists.

I believe CLC Workbench (which seems similar to UGene) allows aligning to a reference but not extracting a subsection in a workflow, and that's the reason for asking. I'm not sure if it's possible in UGene to extract a section (any section) from an imported sequence which has been aligned to an annotated reference.
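Once a read is aligned to the annotated reference, extracting "the part of the read under an annotation" means walking the alignment columns and translating reference coordinates into read bases. A sketch, assuming the pairwise alignment is available as two equal-length gapped strings (gaps as `-`) and the annotation uses 0-based, end-exclusive coordinates on the ungapped reference; the function name is hypothetical.

```python
def extract_aligned_region(ref_aln: str, read_aln: str, start: int, end: int) -> str:
    """Return the read bases aligned to reference positions [start, end).

    Read insertions (gap in the reference row) are kept only when they fall
    strictly inside the region; read deletions simply contribute nothing.
    """
    if len(ref_aln) != len(read_aln):
        raise ValueError("aligned rows must have equal length")
    out = []
    ref_pos = 0  # number of ungapped reference bases seen before this column
    for ref_char, read_char in zip(ref_aln, read_aln):
        if ref_char != "-":
            inside = start <= ref_pos < end
            ref_pos += 1
        else:
            inside = start < ref_pos < end
        if inside and read_char != "-":
            out.append(read_char)
    return "".join(out)
```

For example, with reference row `--ATG-CCTAG--` and read row `GGATGACC-AGTT`, the region `[0, 8)` yields `ATGACCAG`: the read overhang is dropped, the inserted `A` is kept and the deleted `T` is skipped.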

So - what I'm after is:

1. Standard AB1F brought into UGene covering 600 bp.
2. Alignment of AB1F to annotated 450 bp reference with ATG start, TAG stop and 2 x 27 bp loops within.
3. Extract a subsection of my sequence based on annotation (ATG to TAG) and then extract based on Loop 1 annotation and Loop 2 annotation.
4. Translate Loop 1 and 2.
5. Compile extracted DNA and protein sequences in 1 file.
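For the final compile step, one simple reading of "compile" is a single multi-FASTA file holding both the DNA pieces and their translations, one record per piece. A sketch, assuming that layout; the function name and record names such as `66F02_loop1_dna` are hypothetical.

```python
from typing import Iterable, Tuple


def to_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Format (name, sequence) pairs as multi-FASTA text, wrapping each
    sequence at `width` characters per line."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

DNA and protein records can share one file this way; writing it out is `open("compiled.fasta", "w").write(to_fasta(records))`.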

I can do all of this manually in UGene - it's just whether it can all be automated is the question. All that I'm after is knowing which bits can and can't - and then I'll work out how.

I have been using CLC Workbench, and it can align to a reference but cannot extract, cannot compile a report and cannot assess sub-section quality (it only does the ends of the imported sequence), so I'm hoping that UGene will.

What I want is possible given Biopython and the pre-written code to convert AB1F to FASTQ, but I'm slooooow at writing code and am more laboratory-based, so I don't have the time.
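The AB1-to-FASTQ conversion is a one-liner in Biopython, `SeqIO.convert("read.ab1", "abi", "read.fastq", "fastq")`. What it writes is simple: Sanger-style FASTQ encodes each Phred score as the character with code `q + 33`. A sketch of that encoding, assuming the bases and scores are already in hand; the function name is hypothetical.

```python
from typing import List


def to_fastq(name: str, seq: str, quals: List[int]) -> str:
    """Format one read as a FASTQ record with Phred+33 quality encoding.

    Scores above 93 are capped, since 93 + 33 = 126 ('~') is the highest
    printable ASCII character used by this encoding.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    qual_str = "".join(chr(min(q, 93) + 33) for q in quals)
    return f"@{name}\n{seq}\n+\n{qual_str}\n"
```

For example, `to_fastq("r1", "ACG", [0, 30, 40])` returns `"@r1\nACG\n+\n!?I\n"`.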

Love the workflows in UGene as they're quick to string together!! Anyway - just wanted to know if it's possible as I believe that an automated workflow of the type I'm after may not be possible in UGene. I really appreciate the offer of help - but I think I'll be wasting your time if UGene can't do what I'm after.


Reply #5 - Aug 20th, 2020 at 11:17am

Dmitrii Sukhomlinov
YaBB Administrator
Russia


OK, let's see what we've got.
As far as I understand, the set of ".ab1f" files is mapped to the reference using the "Trim and Map Sanger reads" workflow (we call these "sample workflows", not "pre-defined workflows"). After this step, you get the Sanger alignment in the Chromatogram Alignment Editor (which looks like the one in the attached picture), am I right?

The next thing to do is to extract a subsequence, but I still don't understand which subsequence you mean. In this view, there are several components: reference, consensus and reads. The reference and reads are the input files, so I think you need to extract the consensus from this alignment. If so, unfortunately, this can only be done from the Multiple Chromatogram Alignment Editor.
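To show what "extracting the consensus" means outside the editor, here is a simple majority-rule consensus over aligned reads. This is only one of several consensus methods and is not necessarily the algorithm UGENE uses; it assumes the reads are given as equal-length gapped rows, and the function name is hypothetical.

```python
from collections import Counter
from typing import Sequence


def majority_consensus(rows: Sequence[str], gap: str = "-") -> str:
    """Majority-rule consensus of equal-length aligned rows.

    Gaps are ignored when counting; columns where every row has a gap are
    dropped. Ties go to the base seen first in row order, because
    Counter.most_common keeps first-encountered order for equal counts.
    """
    if not rows or len({len(r) for r in rows}) != 1:
        raise ValueError("need at least one row, all of equal length")
    out = []
    for column in zip(*rows):
        counts = Counter(c for c in column if c != gap)
        if not counts:
            continue  # every read has a gap here
        base, _ = counts.most_common(1)[0]
        out.append(base)
    return "".join(out)
```

For example, `majority_consensus(["ACGT-", "ACGTA", "AGGTA"])` returns `"ACGTA"`.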

And I've got some questions about the further steps. I'm not sure what "annotated 450bp reference with ATG start, TAG stop and 2 x 27bp loops within" is; I mean, what are the "loops"? Could you give an example?

Also, a question about the "Translate Loop 1 and 2." step: do you mean that you have a sequence and want to translate it into its amino acid sequence?

And a question about "Compile extracted DNA and protein sequences in 1 file.": compile in which way? Concatenate them one after another? Make an alignment? Or something else?

[Attached image: sa.png]

Best regards,
Dmitrii Sukhomlinov,
The UGENE team.


Reply #6 - Aug 20th, 2020 at 10:23pm

SB
YaBB Newbies


Excellent, thanks! What I'd love to know is whether the following can be automated using a workflow. I'll use images.

1. Trim and Map Sanger Reads
2. See Image Left and Right.

The reference sequence is at the top - it is about 400 base pairs and is labelled 2016.1

The 2 Sanger sequences are below - they are about 600 base pairs in length and are labelled 66F02 and 67B07.

I want to cut out only the section of the Sanger sequence which aligns with the reference sequence.

The reference sequence only runs from the ATG shown in Left to the TAG shown in right.

The bottom pane shows that there's about 100 base pairs of excess Sanger sequence on the left and about the same amount on the right.

It's easy to extract this sequence manually. I'm trying to work out if the extraction can be automated based on, e.g., the consensus sequence it always begins with:
ATG->ATCCCGCGTGGCCTG etc... ...
and the consensus sequence which it always ends with:
CATCATCACCAC<-TAG
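Because the read always starts with ATG->ATCCCGCGTGGCCTG and ends with CATCATCACCAC<-TAG, the cut can be located by exact string search. A sketch, assuming exact matches (a base-calling error inside either motif makes it return None; tolerating mismatches would need approximate matching) and that a read may come from either strand. The motif constants are taken from the post above; the function names are hypothetical.

```python
from typing import Optional

START_MOTIF = "ATG" + "ATCCCGCGTGGCCTG"  # ATG->ATCCCGCGTGGCCTG
END_MOTIF = "CATCATCACCAC" + "TAG"       # CATCATCACCAC<-TAG
# Characters other than A, C, G, T, N (e.g. IUPAC codes) pass through unchanged.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def extract_orf(read: str, start_motif: str = START_MOTIF,
                end_motif: str = END_MOTIF) -> Optional[str]:
    """Return the read from the start motif through the end motif
    (both included), trying the forward strand first and then the
    reverse complement. Returns None if the pair is not found."""
    read = read.upper()
    for strand in (read, reverse_complement(read)):
        start = strand.find(start_motif)
        if start == -1:
            continue
        end = strand.find(end_motif, start + len(start_motif))
        if end != -1:
            return strand[start:end + len(end_motif)]
    return None
```

The returned ATG-to-TAG region is what the loop extraction and translation steps would then work on.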

That's probably the most important part of the question.

Thanks!!

EDIT: there was an error message when uploading the left image.

[Attached image: right.jpg]


Reply #7 - Aug 25th, 2020 at 10:42am

Dmitrii Sukhomlinov
YaBB Administrator
Russia


Yes, I understand now!
Unfortunately, these are very specialized actions and there is no way to automate them; they have to be done by hand.

Best regards,
Dmitrii Sukhomlinov,
The UGENE team.


Reply #8 - Aug 25th, 2020 at 5:07pm

SB
YaBB Newbies


Thanks for your time ... ... ... into Python then, I suppose.

