

Bioinfo Lab - Exp 5 9921001004

The document details a bioinformatics laboratory experiment using the BLAST program to perform a database similarity search for the protein 'actin' in Homo sapiens. It outlines the background of BLAST, its various programs, and the parameters used in the algorithm, along with a step-by-step procedure for conducting the search and analyzing the results. The inference highlights BLAST's capability to retrieve similar sequences and provide additional information regarding lineage and taxonomy.

212BIT1304 – Bioinformatics

Laboratory Record
Date: 03.01.2023

Reg. No: 9921001004

Name of the Student: Annie magdaline

Experiment Number: 5

Experiment Title: Database Similarity Search using BLAST

Aim: To perform a database similarity search using BLAST

Background Information:

BLAST (Basic Local Alignment Search Tool) was designed by Stephen Altschul, Warren Gish, Webb
Miller, Eugene Myers, and David J. Lipman, in work led from the National Institutes of Health
(NIH), USA. It is a heuristic search algorithm: rather than examining every possible alignment,
it uses shortcuts to find good matches quickly. It takes a nucleotide or protein sequence as
input and compares it with the sequences in existing databases, such as GenBank and the other
databases hosted by NCBI. BLAST finds regions of local similarity between sequences and
calculates the statistical significance of the matches. It can also be used to infer functional
and evolutionary relationships between sequences. The search starts by breaking the query into
words of a fixed size, comparing each word with the database sequences, and assigning a score to
each comparison. A word match that scores at or above the threshold is taken as a seed, and the
alignment is extended from it in both directions. After the extension is completed, the total
score is calculated, and the alignment is displayed on the BLAST results page only if the total
score exceeds the threshold value.
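
The word-matching and extension steps described above can be illustrated with a small Python sketch. This is a toy model and not the actual BLAST implementation: it assumes exact-match words, a +1/-1 match/mismatch score, ungapped extension, and a simple drop-off rule. The function names and default values are hypothetical and chosen only for illustration.

```python
# Toy "seed and extend" search (illustrative only, not the real BLAST code).
# Assumptions: exact-match seeds, +1/-1 scoring, no gaps, simple drop-off rule.

def find_seeds(query, subject, word_size):
    """Return (query_pos, subject_pos) pairs where a query word occurs in the subject."""
    index = {}
    for j in range(len(subject) - word_size + 1):
        index.setdefault(subject[j:j + word_size], []).append(j)
    seeds = []
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            seeds.append((i, j))
    return seeds


def extend_seed(query, subject, i, j, word_size, match=1, mismatch=-1, drop_off=2):
    """Extend a seed in both directions without gaps.

    Returns (best_score, query_start, query_end) with query_end exclusive.
    Extension in a direction stops once the running score falls more than
    `drop_off` below the best score seen so far.
    """
    score = best = word_size * match

    # Extend to the right of the seed.
    qi, sj = i + word_size, j + word_size
    best_end = qi
    while qi < len(query) and sj < len(subject) and best - score <= drop_off:
        score += match if query[qi] == subject[sj] else mismatch
        qi += 1
        sj += 1
        if score > best:
            best, best_end = score, qi

    # Extend to the left of the seed, starting again from the best score.
    score = best
    qi, sj = i - 1, j - 1
    best_start = i
    while qi >= 0 and sj >= 0 and best - score <= drop_off:
        score += match if query[qi] == subject[sj] else mismatch
        if score > best:
            best, best_start = score, qi
        qi -= 1
        sj -= 1

    return best, best_start, best_end


def search(query, subject, word_size=3, threshold=5):
    """Report extended alignments whose total score reaches the threshold."""
    hits = set()
    for i, j in find_seeds(query, subject, word_size):
        score, start, end = extend_seed(query, subject, i, j, word_size)
        if score >= threshold:
            hits.add((score, start, end))
    return sorted(hits, reverse=True)
```

For example, `search("ACGTTGCA", "TTACGTTGCATT")` finds several overlapping seeds, and every one of them extends to the same full-length match of the query.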

Nucleotide BLAST Programs:

- BLASTN: Searches a nucleotide query against a nucleotide sequence database.

- Mega BLAST: Searches for highly similar sequences.

- Discontiguous Mega BLAST: Searches for more dissimilar sequences.

Protein BLAST Programs:

- BLASTP: Finds the similarity between the query protein sequence and the protein
sequences available in the protein database.

- PSI-BLAST: Position-Specific Iterated BLAST is the most sensitive BLAST program. It
is used to find very distantly related proteins or new members of a protein family.

- PHI-BLAST: Pattern-Hit Initiated BLAST is used to find protein sequences which
contain a pattern specified by the user and are similar to the query near that pattern.

Parameters used in BLAST algorithm:


Threshold: The cut-off score used to filter words during comparison. Only word matches that
score at or above the threshold are kept as starting points for extension.

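True Homology: Homology means that two sequences share a common evolutionary ancestor. BLAST
does not observe homology directly; it reports sequence similarity, and a statistically
significant similarity to the query sequence is taken as evidence of true homology.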

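E-value: The expected number of alignments with a score at least as high as the observed one
that would occur in the database search by chance. It decreases exponentially as the score
assigned to an alignment between two sequences increases, so a lower E-value indicates a more
significant match.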

Word size: The length of the words into which the query is split to start the search. Each word
is compared with the database sequences and a score is assigned to each comparison. The default
word size is given as 11 for nucleic acids and 3 for proteins.

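Putative conserved domains: Regions of the query that match known conserved domains. Each
domain is typically associated with a particular function, so these matches give a first
indication of what the protein does.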
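
The exponential relationship between alignment score and E-value noted in the parameter list can be sketched with the Karlin-Altschul estimate E = K * m * n * exp(-lambda * S), where S is the raw score, m the query length, and n the database length. The sketch below is a simplification: the default values of `lam` and `k` are illustrative assumptions (the actual values depend on the scoring matrix and gap penalties), the lengths in the usage note are hypothetical, and the function names are not part of any BLAST library.

```python
import math

# Illustrative statistical parameters (assumed values, not taken from this experiment).
DEFAULT_LAMBDA = 0.267
DEFAULT_K = 0.041


def expected_hits(raw_score, query_len, db_len, lam=DEFAULT_LAMBDA, k=DEFAULT_K):
    """Karlin-Altschul estimate: E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score, lam=DEFAULT_LAMBDA, k=DEFAULT_K):
    """Normalised score S' = (lambda * S - ln K) / ln 2, so that E = m * n * 2**(-S')."""
    return (lam * raw_score - math.log(k)) / math.log(2)
```

With a hypothetical query of 375 residues and a database of 10^9 residues, raising the raw score from 100 to 110 multiplies the E-value by exp(-0.267 * 10), which is about a 14-fold decrease.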

Procedure:

Let us take “actin” as our protein of interest. Our aim is to get the sequence of
“actin” of Homo sapiens from the database and to find similar sequences in other
organisms. To run the search with the protein sequence of “actin” of Homo sapiens,
follow the steps below. (Note: Actin is a globular, roughly 42-kDa moonlighting
protein found in all eukaryotic cells (the only known exception being nematode
sperm), where it may be present at concentrations of over 100 μM. It is also one of
the most highly conserved proteins, differing by no more than 20% in species as
diverse as algae and humans.)

1. Open the BLAST home page.


2. Select “BLASTP” from the list of BLAST variants.
3. To submit the sequence for BLAST, there are two options. You can copy
the sequence from the “actin-blast” notepad file and paste it into the box provided
on the BLAST page, or click on the “upload” button, select the “actin-blast” file,
and click on “open”.
4. Click on “Submit”.
5. It will take a few minutes to be directed to the result page.
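
The same submission can in principle be scripted instead of using the web form. The sketch below only builds the request parameters and does not contact the server. It assumes the NCBI BLAST URL API at `https://blast.ncbi.nlm.nih.gov/Blast.cgi` with its `CMD`, `PROGRAM`, `DATABASE`, `QUERY`, `RID` and `FORMAT_TYPE` parameters. The helper names, the choice of the `nr` database, and the short placeholder peptide in the usage note are assumptions for illustration and are not taken from this experiment.

```python
from urllib.parse import urlencode

# Assumed endpoint of the NCBI BLAST URL API.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_submit_params(sequence, program="blastp", database="nr"):
    """Build the form body for submitting a query (CMD=Put).

    Whitespace and line breaks in the pasted sequence are removed first.
    """
    cleaned = "".join(sequence.split())
    if not cleaned:
        raise ValueError("empty query sequence")
    return urlencode({
        "CMD": "Put",
        "PROGRAM": program,
        "DATABASE": database,
        "QUERY": cleaned,
    })


def build_fetch_params(request_id, format_type="Text"):
    """Build the query string for retrieving results of a submitted search (CMD=Get)."""
    return urlencode({
        "CMD": "Get",
        "RID": request_id,
        "FORMAT_TYPE": format_type,
    })
```

For instance, `build_submit_params("MDDD IAAL\nVV")` returns the body that would be sent to `BLAST_URL` for a placeholder peptide; in practice the full sequence from the “actin-blast” file would be used, and the request ID returned by the server would be passed to `build_fetch_params`.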

BLAST result analysis

The BLAST result page has three parts.
1. Graphic summary
2. Description
3. Alignment

The graphic summary includes the conserved domains detected in the given sequence
and the distribution of 100 BLAST hits on the query sequence. The conserved domain
result tells us that the query sequence belongs to the “actin superfamily”. In the
distribution of BLAST hits, the regions of the query sequence that match other
sequences are highlighted with colored lines. The color of each line indicates the
alignment score range of that hit, according to the color key shown with the graphic.

In the “descriptions” part of the result page there is a table containing
information such as query coverage, E-value and percent identity. The lower the
E-value, the higher the significance of the result (for more details, refer to the
BLAST explanation in the previous pages of this book). The results give 100% identity
for the first two hits, followed by 99% for the following hits.
The last column of the description table contains some links (single-letter symbols).
Clicking on any of the symbols leads to other websites which provide more
information regarding the “query sequence”.
The legends for the links to other resources:

- UniGene
- GEO
- Gene

The third part of the result page contains the alignment of the query sequence with
each of the hit sequences. Each alignment shows the query sequence in the first line
and the hit (subject) sequence in the last line; the centre line shows the identical
or similar amino acids for each amino acid pair in the query and hit. A letter in the
centre line indicates that the two amino acids are identical. The “+” symbol in the
centre line indicates that the two amino acids (in query and hit respectively) are
similar but not identical. A blank space in the centre line indicates that the two
amino acids are neither identical nor similar. The “-” symbol appears in the query or
hit line itself and marks a gap introduced into that sequence.
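
The way the centre line is derived from an aligned query and hit can be sketched in Python. BLAST decides similarity from a full substitution matrix; the sketch below instead assumes a small hand-picked set of similar amino acid pairs, so it is an illustration only. The names `SIMILAR_PAIRS`, `midline` and `summarize` are hypothetical.

```python
# Hand-picked subset of conservative substitutions (an assumption for illustration;
# BLAST itself uses a complete substitution matrix to decide which pairs get "+").
SIMILAR_PAIRS = {frozenset(pair) for pair in ("IV", "IL", "LM", "KR", "DE", "FY", "ST", "QE")}


def midline(query_row, subject_row):
    """Build the centre line for two aligned rows of equal length.

    Identical residues are echoed, similar residues give "+",
    and mismatches or gap positions give a blank space.
    """
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have equal length")
    out = []
    for q, s in zip(query_row, subject_row):
        if q == "-" or s == "-":
            out.append(" ")
        elif q == s:
            out.append(q)
        elif frozenset((q, s)) in SIMILAR_PAIRS:
            out.append("+")
        else:
            out.append(" ")
    return "".join(out)


def summarize(query_row, subject_row):
    """Return (identities, positives, gaps) for an aligned pair of rows."""
    mid = midline(query_row, subject_row)
    identities = sum(1 for c in mid if c.isalpha())
    positives = identities + mid.count("+")
    gaps = sum(1 for q, s in zip(query_row, subject_row) if "-" in (q, s))
    return identities, positives, gaps
```

For the toy rows `"MKV-LDE"` and `"MRIALDQ"`, `midline` gives `"M++ LD+"`: three identities, three further similar pairs, and one gap position that is left blank in the centre line.
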

Activity 1: BLASTN
Take the ‘actin’ nucleotide sequence as the query sequence, perform BLASTN, and display
the result analysis.

Screenshots:

Figure 5.1: BLASTN result for the nucleotide sequence of interest.


Activity 2: BLASTP
Take the ‘actin’ protein sequence as the query sequence, perform BLASTP, and display the
result analysis.

Screenshots:

Figure 5.2: BLASTP result for the protein sequence of interest.

Inference:
BLAST is a platform used to retrieve sequences similar to a query sequence. The sequences can
be nucleotide or protein sequences. The results also present additional information about the
sequences in graphical form, and they show the lineage and taxonomy of each sequence, which can
be used for further reference.
