
A crucial part of any molecular experiment is the validation phase, known as sequence alignment, in which you verify whether the template DNA sequence you designed is identical to the actual sequence you have in hand. This phase is quick and easy when your software tool lets you set the criteria for the sequence alignment and edit your template sequence based on your sequencing results.
Here are the top 5 essential features to look for in your sequence alignment software:
- Efficient Visualization
A convenient visualization tool is the first ‘must have’ feature. When you first view the alignments, it is essential to have an immediate high level representation of the data (Figure 1). This gives you an instant impression of the quality of your sequencing.
Figure 1 - High level representation of alignments against template sequence:
Additionally, the ability to quickly pinpoint discrepancies will save you a lot of time searching through the sequence. For example, in Figure 2 below, you can immediately pinpoint specific, incompatible bases. Selecting this region in the high level circular view automatically selects the matching area in the corresponding sequence view, where you can take a closer look and work out the specific issue.
Figure 2 - Quickly pinpointing discrepancies between your template and aligned sequences:
- Manual and Auto Trimming
Sequencing files contain noisy data at the 5' and 3' ends of the sequence. This data needs to be trimmed to avoid false positive detection of misaligned bases. Make sure to work with a software tool that allows you to trim the sequence both automatically and manually, and to conveniently view the trimmed areas. When auto-trimming your sequence, a good software tool should trim according to a set of user-defined parameters (Figure 3), such as the percentage cut-off for good bases within a defined base pair range.
Figure 3 - Auto-trimming settings:
Figure 4 - Viewing auto-trimmed sequence:
Additionally, you should be able to manually trim the data at the sequence ends, for a more precise selection.
Figure 5 - Manually trimming a sequence:
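As a rough illustration of the auto-trimming rule described above (a percentage cut-off for good bases within a defined base pair range), here is a minimal Python sketch. It is not how any particular tool implements trimming. The function name `auto_trim` and the parameters `threshold`, `window` and `min_good_fraction` are hypothetical, and the input is assumed to be a plain list of per-base confidence scores.

```python
def auto_trim(scores, threshold=25, window=20, min_good_fraction=0.9):
    """Return (start, end) of the region to keep, as a half-open interval.

    Hypothetical rule: a window of `window` bases is acceptable when at
    least `min_good_fraction` of its bases score >= `threshold`. The kept
    region runs from the first acceptable window (scanning from the 5' end)
    to the last acceptable window (scanning from the 3' end).
    """
    n = len(scores)
    if n < window:
        return (0, 0)  # too short to evaluate, so keep nothing

    def window_ok(i):
        chunk = scores[i:i + window]
        good = sum(1 for q in chunk if q >= threshold)
        return good / len(chunk) >= min_good_fraction

    starts = range(n - window + 1)
    first = next((i for i in starts if window_ok(i)), None)
    if first is None:
        return (0, 0)  # no acceptable window anywhere in the read
    last = next(i for i in reversed(starts) if window_ok(i))
    return (first, last + window)


# Example: noisy ends around a clean core of 50 bases.
scores = [5] * 10 + [40] * 50 + [8] * 10
start, end = auto_trim(scores, threshold=25, window=10, min_good_fraction=1.0)
trimmed = scores[start:end]
```

With a stricter `min_good_fraction` the kept region shrinks towards the clean core; a looser value keeps a few borderline bases at each end, which is where manual trimming becomes useful.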
- Confidence Score
The confidence score is a number assigned to each base in a chromatogram that indicates how far that base call can be trusted. A low confidence score means that the predicted base identity is uncertain. A high confidence score means that the base identity can be trusted.
While most tools work with a default threshold confidence score of 25-30 (bases scoring below the threshold are considered untrusted), make sure to work with a tool that lets you manually define this score as well. This ensures that bases are automatically trimmed according to your own preference.
Figure 6 - Setting a confidence score for auto-trimming data:
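To give a feel for what a threshold of 25-30 means, the sketch below assumes the confidence score is a Phred-style quality value, where a score Q corresponds to an error probability of 10^(-Q/10). The article does not say which scale a given tool uses, so treat this as an assumption. The function names are hypothetical.

```python
def error_probability(q):
    """Convert a Phred-style quality score to the probability the base call is wrong."""
    return 10 ** (-q / 10)


def low_confidence_positions(scores, threshold=25):
    """Return 0-based positions whose score falls below the threshold."""
    return [i for i, q in enumerate(scores) if q < threshold]


# On the Phred scale, Q20 is a 1 in 100 chance of a wrong call, Q30 is 1 in 1000.
for q in (20, 25, 30):
    print(q, round(error_probability(q), 5))

print(low_confidence_positions([40, 12, 30, 24], threshold=25))
```

Under that assumption, raising the threshold from 25 to 30 roughly means accepting only bases with less than a 0.1% chance of being wrong, instead of about 0.3%.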
- Chromatogram Height
Another important feature that helps you investigate the sequencing results, and how well they match the original template file, is the ability to toggle the chromatogram height. This lets you easily look into specific areas where the sequencing signal is low but the confidence score is still high, such as the example in Figure 7 and Figure 8.
Figure 7 - Low chromatogram peaks:
Figure 8 - The same area of sequence after increasing the height of the chromatogram peaks:
- Edit your template sequence based on the alignment results
Last, but definitely not least, you should be able to easily edit your template sequence according to the sequencing results. Such edits can include:
- Fix mismatches
You should be able to easily change any mismatch between the sequence and the original template, whether to match the sequencing result or the original template.
Figure 9 - Fixing mismatches between alignment and template:
- Fix ambiguous base calls
It should be possible to easily change an ambiguous base to any other base according to the chromatogram results, as in Figure 10.
Figure 10 - Changing ambiguous bases:
- Edit any gaps in the sequence
Depending on where the gap falls (in an important area of the sequence or not), you should be able to decide whether to delete the gap from the original template sequence or to copy the base pairs from the original template into the actual sequence result (Figure 11).
Figure 11 - Deleting gaps or adding bases to the alignment sequence:
- Add or remove any additions that were found in the sequencing results
In cases like these, you should be able to choose whether to add the additional base pairs found in the alignment sequence to the original template sequence, or to delete them from the alignment itself (Figure 12).
Figure 12 - Adding or removing additions detected in the alignment sequence:
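The four kinds of edits above map onto four kinds of discrepancy in an alignment. As a minimal sketch, assuming the template and the sequencing read are already aligned as two equal-length strings with `-` marking gaps, the following Python function labels each differing column. The function name, the labels and the input format are all hypothetical and not taken from any particular tool.

```python
# IUPAC codes that stand for more than one possible base.
AMBIGUOUS = set("NRYSWKMBDHV")


def classify_columns(template, read):
    """Label every alignment column where the template and the read differ.

    Returns a list of (column, kind, template_base, read_base) tuples, where
    column is a 0-based position in the alignment and kind is one of
    "mismatch", "ambiguous", "gap" or "addition".
    """
    if len(template) != len(read):
        raise ValueError("aligned sequences must have the same length")

    issues = []
    for col, (t, r) in enumerate(zip(template.upper(), read.upper())):
        if t == r:
            continue
        if t == "-":
            kind = "addition"   # base present in the read, absent from the template
        elif r == "-":
            kind = "gap"        # base present in the template, missing from the read
        elif r in AMBIGUOUS:
            kind = "ambiguous"  # the base caller could not decide on a single base
        else:
            kind = "mismatch"
        issues.append((col, kind, t, r))
    return issues


for issue in classify_columns("ACGT-ACGTA", "ACTTGAC-NA"):
    print(issue)
```

Each label then points to one of the decisions described above: change the base, resolve the ambiguity from the chromatogram, close or fill the gap, or keep or drop the addition.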
All of these must-have features are available in Genome Compiler’s free all-in-one software tool. To learn more about how to align your sequences using Genome Compiler, check out this short tutorial video.
How about manually editing the alignment itself? If the matching is not great, ClustalO might not give a correct output, so can one manually open/move the chromatogram on the template? As another example, what if I want to align a cDNA to a DNA? How can I correct the gaps, if needed?