341 Tutorial1 Answers

This document summarizes a bioinformatics tutorial that includes: 1. Transcribing and translating a DNA sequence to an amino acid sequence. A deletion mutation is introduced that changes the amino acid sequence and halves the length of the gene. 2. Calculating the Hamming distance between two strings. 3. Determining the best alignment of two protein sequences using BLOSUM62. 4. Calculating the compositional complexity of amino acid and DNA sequences. 5. Extending a local sequence match into a high-scoring pair (HSP) as in BLAST.

Uploaded by

snowstarffh

Bioinformatics Tutorial 1 - Answers

1. Transcribe the following DNA to RNA, then use the genetic code to translate it to a sequence of amino acids.

TCATAATACGTTTTGTATTCGCCAGCGCTTCGGTGT

To transcribe the DNA, first substitute each base with its complement (i.e., G for C, C for G, T for A and A for T):

AGTATTATGCAAAACATAAGCGGTCGCGAAGCCACA

Next, remember that in RNA each thymine (T) is replaced by uracil (U). Hence our sequence becomes:

AGUAUUAUGCAAAACAUAAGCGGUCGCGAAGCCACA

Using the genetic code is also easy: just split the RNA sequence into triplets:

AGU AUU AUG CAA AAC AUA AGC GGU CGC GAA GCC ACA

then look each triplet (codon) up in the genetic code table. So AGU becomes serine, which we can write as Ser, or just S. AUU becomes isoleucine (Ile), which we write as I. Carrying on in this way, we get:

SIMQNISGREAT

Remove the first letter from the RNA sequence, and start again. Use this example to explain why mutations (including deletions and insertions) are usually deleterious to an organism.

Removing the first letter and splitting into codons again gives us:

GUA UUA UGC AAA ACA UAA GCG GUC GCG AAG CCA CA

GUA translates to Val (V), UUA translates to Leu (L), UGC translates to Cys (C), AAA translates to Lys (K), ACA translates to Thr (T), and UAA translates to STOP. This gives us the sequence:

VLCKT STOP

If we carried on translating past the STOP codon, we would get:

AVAKP

So, if the DNA sequence from which the RNA was transcribed was actually a gene, the deletion would have roughly halved its effective length (translation stops after 5 residues instead of 12), in addition to changing every amino acid in the residue sequence it generates. Given that a protein's function is largely dictated by its shape, and its shape is largely dictated by its residue sequence, it is not surprising that a random mutation such as a deletion will cause harm, or even death, to an organism.
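The steps in this answer can be checked with a short Python sketch. The function names `transcribe` and `translate`, and the `stop_at_stop` flag, are illustrative choices rather than anything from the tutorial. The codon table is the standard genetic code, built from the usual U, C, A, G ordering.

```python
from itertools import product

# Standard genetic code, listed in UCAG order (third base varies fastest).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def transcribe(dna):
    """Complement each base, then write U in place of T (the two steps above)."""
    complement = dna.translate(str.maketrans("ACGT", "TGCA"))
    return complement.replace("T", "U")


def translate(rna, stop_at_stop=True):
    """Translate codon by codon; '*' marks a STOP codon.

    Trailing bases that do not fill a whole codon are ignored.
    """
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*" and stop_at_stop:
            break
        protein.append(aa)
    return "".join(protein)


dna = "TCATAATACGTTTTGTATTCGCCAGCGCTTCGGTGT"
rna = transcribe(dna)
print(rna)                                   # RNA sequence
print(translate(rna))                        # original reading frame
print(translate(rna[1:]))                    # frame after deleting the first base
print(translate(rna[1:], stop_at_stop=False))  # reading on past the STOP codon
```

Passing `rna[1:]` models the single-base deletion: every later codon boundary moves by one position, which is why the whole residue sequence changes.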

2. What is the Hamming distance between these two strings? (Ignore the overhanging end.)

BIOINFORMATICS_IS_THE
BEST_FOR_STRUCTURE_PREDICTION

To calculate the Hamming distance, just count the number of pairs of letters in the alignment which are not the same. The first letter in both sequences is B, so we don't count that. However, the second letter in the first sequence is I, but it is E in the second, so we must count this. If we put a star between the two strings wherever they mismatch, we get:

BIOINFORMATICS_IS_THE
 ****   ** **********
BEST_FOR_STRUCTURE_PREDICTION

There are 16 stars, so the Hamming distance is 16. Note that the underscores are counted as normal letters in the sequence.
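The count can be reproduced with a few lines of Python. This is a minimal sketch: the names `hamming_distance` and `mismatch_markers` are illustrative, and the overhanging end of the longer string is ignored, as the question asks.

```python
def hamming_distance(a, b):
    """Count mismatching positions over the shared length of a and b."""
    # zip stops at the shorter string, so the overhanging end is ignored.
    return sum(x != y for x, y in zip(a, b))


def mismatch_markers(a, b):
    """Return a line with '*' under each mismatch and ' ' under each match."""
    return "".join("*" if x != y else " " for x, y in zip(a, b))


first = "BIOINFORMATICS_IS_THE"
second = "BEST_FOR_STRUCTURE_PREDICTION"

print(first)
print(mismatch_markers(first, second))
print(second)
print(hamming_distance(first, second))
```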

3. Using the BLOSUM62 substitution matrix, what is the best alignment of these two sequences? (Slide one over the other, and score -1 for end gaps, i.e., letters hanging over either end.)

FYGNYK
DGSFNW

To work out the best alignment, we have to write down all the ways to overlap these sequences and work out the BLOSUM62 score for each alignment, remembering to take off 1 for every gap (-). It's possible to use a heuristic: look first for any obviously good overlaps and score those. It may then become obvious that none of the others can give a good score. All possible overlaps are listed below with their scores.

Second sequence shifted to the right (or not shifted):

FYGNYK-----
-----DGSFNW    score -11

FYGNYK----
----DGSFNW     score -13

FYGNYK---
---DGSFNW      score -8

FYGNYK--
--DGSFNW       score -10

FYGNYK-
-DGSFNW        score 5

FYGNYK
DGSFNW         score -14

First sequence shifted to the right:

-FYGNYK
DGSFNW-        score -2

--FYGNYK
DGSFNW--       score -7

---FYGNYK
DGSFNW---      score -4

----FYGNYK
DGSFNW----     score -9

-----FYGNYK
DGSFNW-----    score -9

The best overlap is therefore the only one with a positive score (5):

FYGNYK-
-DGSFNW
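The exhaustive sliding procedure can be sketched in Python as follows. To keep the example self-contained, only the BLOSUM62 entries needed for these two sequences are included; the names `ROWS`, `SCORES` and `slide_scores` and the `shift` convention are illustrative assumptions, not part of the tutorial.

```python
# BLOSUM62 entries for residues of FYGNYK (rows) against DGSFNW (columns).
COLS = "DGSFNW"
ROWS = {
    "F": (-3, -3, -2, 6, -3, 1),
    "Y": (-3, -3, -2, 3, -2, 2),
    "G": (-1, 6, 0, -3, 0, -2),
    "N": (1, 0, 1, -3, 6, -4),
    "K": (-1, -2, 0, -3, 0, -3),
}
SCORES = {(r, c): s for r, row in ROWS.items() for c, s in zip(COLS, row)}


def slide_scores(top, bottom, end_gap=-1):
    """Score every ungapped overlap of top and bottom.

    shift > 0 means bottom is moved right by that many positions;
    shift < 0 means top is moved right instead.
    """
    results = {}
    for shift in range(-(len(bottom) - 1), len(top)):
        pairs = [
            (top[i], bottom[i - shift])
            for i in range(len(top))
            if 0 <= i - shift < len(bottom)
        ]
        # Every letter without a partner hangs over an end and costs end_gap.
        overhang = len(top) + len(bottom) - 2 * len(pairs)
        results[shift] = sum(SCORES[p] for p in pairs) + end_gap * overhang
    return results


results = slide_scores("FYGNYK", "DGSFNW")
for shift, score in sorted(results.items()):
    print(shift, score)
best_shift = max(results, key=results.get)
print("best shift:", best_shift, "score:", results[best_shift])
```

A shift of 1 corresponds to the alignment FYGNYK- over -DGSFNW.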

4. What is the compositional complexity of these residue sequences?

KKKKTRAITERMMMM and TRAITER

Remember that the formula we use for compositional complexity is the following:

\[ \frac{1}{L} \log_{N}\left( \frac{L!}{\prod_{i} n_i!} \right) \]

Note that L is the sequence length, N is the number of letters in the alphabet, and the n_i are the numbers of occurrences of each letter of the alphabet in the sequence. As our sequence is a residue sequence, there can only be twenty different letters in the sequence, so N = 20. We'll work out the complexity of the longer sequence first.

To calculate the compositional complexity using this formula, we need to work out the values we will be putting into it. Firstly, we need the length, L, of the sequence, which is 15. Next, we need the number of occurrences of each letter in the sequence. Letters that do not appear in the sequence occur zero times. For the rest: there are 4 Ks, 2 Ts, 2 Rs, 1 A, 1 I, 1 E and 4 Ms. So we can write:

nK = 4, nT = 2, nR = 2, nA = 1, nI = 1, nE = 1 and nM = 4

Now we need to multiply together the factorials of all these numbers. 0! = 1, so we don't need to worry about the letters which aren't there, as we would just be multiplying by 1. Hence, we need to calculate:

4! * 2! * 2! * 1! * 1! * 1! * 4! = 24 * 2 * 2 * 1 * 1 * 1 * 24 = 2304

We now divide L! by this number: 15!/2304 = 567567000, and take the log to the base 20 of this big number: log20(567567000) = 6.729. To do this calculation with your calculator, you may need to remember that logx(y) = ln(y)/ln(x), where ln(y) is the natural log of y, which your calculator should handle. We finish by dividing our value by the length of the sequence, 15. So finally, our answer is: 6.729/15 = 0.449.

The same calculation for TRAITER yields:

1/7 (log20(7!/(2!*2!))) = 1/7 (log20(7!/4)) = 1/7 (log20(1260)) = 1/7 (2.383) = 0.340

Hence we see that the second sequence is less complex than the first.

Which of these base sequences do you think is more complex?

AAAGTGTGTAAC and CCCCAGATAGGATT

What is the compositional complexity of these base sequences?
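The formula used in this question can be sketched in Python as follows. The function name `compositional_complexity` and its `alphabet_size` parameter are illustrative. The alphabet size is passed in, so the same function covers residue sequences (20 letters) and base sequences (4 letters).

```python
from collections import Counter
from math import factorial, log


def compositional_complexity(seq, alphabet_size):
    """Return (1/L) * log_N(L! / product of n_i!) for the given sequence."""
    # Letters that never occur contribute 0! = 1, so only observed counts matter.
    denominator = 1
    for count in Counter(seq).values():
        denominator *= factorial(count)
    arrangements = factorial(len(seq)) // denominator  # always an exact integer
    return log(arrangements, alphabet_size) / len(seq)


for seq, size in [
    ("KKKKTRAITERMMMM", 20),
    ("TRAITER", 20),
    ("AAAGTGTGTAAC", 4),
    ("CCCCAGATAGGATT", 4),
]:
    print(seq, round(compositional_complexity(seq, size), 3))
```

`math.log(x, base)` performs the same change of base as the calculator identity logx(y) = ln(y)/ln(x).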

It seems likely that the second sequence is more complex than the first, but we'll see what the calculations say. We can use the same formula as in the first part of the question, but we must remember to change the 20s to 4s, because we are now working with DNA sequences, which contain only four letters. The first sequence has 5 As, 1 C, 3 Gs and 3 Ts, so the calculation comes to:

1/12 (log4(12!/(5!*1!*3!*3!))) = 1/12 (log4(12!/4320)) = 1/12 (log4(110880)) = 1/12 (8.379) = 0.698

The second sequence has 4 Cs, 4 As, 3 Gs and 3 Ts, so:

1/14 (log4(14!/(4!*4!*3!*3!))) = 1/14 (log4(14!/20736)) = 1/14 (log4(4204200)) = 1/14 (11.002) = 0.786

So we see that our guess was correct: the second sequence is indeed more complex than the first.

5. Extend this matching sequence (the triple DYW) into an HSP as in the BLAST algorithm:

CPAGNDYWMIHRLV

WWCTGANDYWVMREH

What is the final BLOSUM62 score for the HSP?

To expand this triple into an HSP, we can first extend it to the left. Remember that we can continue expanding it only until the BLOSUM score for the whole HSP decreases. The current BLOSUM score of DYW:DYW is 24 (6 + 7 + 11). In the BLOSUM62 matrix, N:N scores 6, so the HSP NDYW:NDYW scores 30, which is fine. G:A scores 0, so the score stays the same, which is still OK. A:G scores 0, so we are still OK. However, P:T scores -1, hence the score for PAGNDYW:TGANDYW is 29, which is less than 30. Hence, we cut off our extension to the left before the P:T, giving us AGNDYW:GANDYW.

Working to the right now, we can extend past M:V, because this scores 1. We can go past I:M, because this scores 1. H:R scores 0, so this is OK. R:E scores 0, so this is OK. Finally, L:H scores -3, so we cannot extend the HSP all the way, and our final HSP looks like this:

AGNDYWMIHR
GANDYWVMRE

And it scores 32.
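The extension described in this answer can be sketched in Python. This follows the tutorial's simplified rule (stop as soon as a step would lower the running score); BLAST implementations typically use a drop-off threshold instead, so treat this only as an illustration. The names `PAIR_SCORES`, `pair_score` and `extend_hsp` are hypothetical, and the table holds just the BLOSUM62 entries needed for this example.

```python
# BLOSUM62 entries needed here, keyed by the two residues in sorted order.
PAIR_SCORES = {
    "DD": 6, "YY": 7, "WW": 11, "NN": 6,
    "AG": 0, "PT": -1, "MV": 1, "IM": 1,
    "HR": 0, "ER": 0, "HL": -3,
}


def pair_score(a, b):
    """Look up a substitution score; the matrix is symmetric."""
    return PAIR_SCORES["".join(sorted(a + b))]


def extend_hsp(query, subject, seed):
    """Extend an exact seed match left, then right, without gaps.

    Extension in a direction stops at the first pair with a negative score,
    i.e. the first step that would decrease the total.
    """
    q_lo, s_lo = query.find(seed), subject.find(seed)
    q_hi, s_hi = q_lo + len(seed), s_lo + len(seed)  # exclusive ends
    total = sum(pair_score(c, c) for c in seed)

    while q_lo > 0 and s_lo > 0:
        step = pair_score(query[q_lo - 1], subject[s_lo - 1])
        if step < 0:
            break
        total += step
        q_lo, s_lo = q_lo - 1, s_lo - 1

    while q_hi < len(query) and s_hi < len(subject):
        step = pair_score(query[q_hi], subject[s_hi])
        if step < 0:
            break
        total += step
        q_hi, s_hi = q_hi + 1, s_hi + 1

    return query[q_lo:q_hi], subject[s_lo:s_hi], total


hsp = extend_hsp("CPAGNDYWMIHRLV", "WWCTGANDYWVMREH", "DYW")
print(hsp)
```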
