Biological Databases Lab
BBIT418P
Assessment – 1
By:
Suryaa .A
21BCB0032
Experiment 1
Accessing Biological Databases/Tools
Aim: To become familiar with different biological databases and tools.
1. Answer the following questions using a search engine.
a.) What is data related to biology?
Biological data refers to any information related to living organisms. This
includes data on genes, proteins, cellular components, and entire ecosystems. It
can come from experiments (e.g., DNA sequences, protein structures),
observations (e.g., ecological data), or computational predictions (e.g., gene
functions). Biological data is often used to understand the structure, function,
and evolution of organisms.
b.) What are databases?
A database is an organized collection of data that can be easily accessed,
managed, and updated. Databases store large amounts of information in a
structured format, allowing for efficient retrieval and manipulation of data. They
can be used for a variety of purposes, from managing business information to
storing scientific data.
c.) What are biological databases?
Biological databases are specialized databases that store data related to biology.
These databases collect, organize, and make accessible information such as
DNA sequences, protein structures, gene functions, and more. They are essential
tools for researchers in the fields of genomics, proteomics, and other areas of
biology, enabling them to retrieve and analyze biological data efficiently.
d.) Give an example of a sequence database, a structure database and a
literature database.
Sequence Database: GenBank is a comprehensive public database of
nucleotide sequences and supporting bibliographic and biological
annotation.
Structure Database: Protein Data Bank (PDB) is a database of 3D
structural data of large biological molecules, such as proteins and nucleic
acids.
Literature Database: PubMed is a free search engine accessing
primarily the MEDLINE database of references and abstracts on life
sciences and biomedical topics.
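Besides the web pages, NCBI databases such as GenBank can also be accessed programmatically through the E-utilities web service. The minimal sketch below uses only the Python standard library and only builds an efetch URL that would return one nucleotide record as plain text. The helper name build_efetch_url is hypothetical, and the accession NC_045512.2 (the SARS-CoV-2 reference genome) is used purely as an example ID.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities "efetch" endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession: str, db: str = "nuccore", rettype: str = "fasta") -> str:
    """Return an efetch URL that downloads one record as plain text.

    db="nuccore" is the Entrez nucleotide database (which holds GenBank records);
    rettype="fasta" requests FASTA, rettype="gb" the GenBank flat file.
    """
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EFETCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Example ID only (hypothetical choice for illustration).
    url = build_efetch_url("NC_045512.2")
    print(url)
    # Downloading needs internet access:
    # from urllib.request import urlopen
    # print(urlopen(url).read().decode()[:200])
```

Switching rettype to "gb" would return the same record with its full GenBank annotation instead of just the sequence.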
2. Visit the following biological database websites and answer the
questions below.
[Link]
[Link]
[Link] or [Link]
[Link]
[Link]
a) How are sequences arranged in the NCBI database?
b) Which of the above databases is specific to literature data?
c) Name the database that is specific to protein 3D structures and describe
its contents.
d) Write the features of the following tools:
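i) BLAST – NCBI
Sequence Alignment: Compares a query nucleotide or protein sequence against a
sequence database and reports local alignments with similar sequences.
Fast Searches: Uses a heuristic (short word matches that are then extended)
instead of exhaustive alignment, so very large databases can be searched quickly.
Customizable: Adjustable parameters (e.g., expect threshold, word size, scoring
matrix) and various output formats.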
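BLAST gets its speed by first finding short word matches ("seeds") between the query and database sequences and only then extending them into alignments. The Python sketch below illustrates only that seeding step on two short DNA strings. It is a simplified teaching example (the function find_seeds is hypothetical), not the NCBI BLAST implementation. NCBI BLAST also uses neighbourhood words for proteins, scored extension, and E-value statistics.

```python
from collections import defaultdict


def find_seeds(query: str, subject: str, k: int = 3) -> list[tuple[int, int, str]]:
    """Return (query_pos, subject_pos, word) for every exact k-letter word
    shared by query and subject. Only the seeding idea is modelled here;
    extension and significance statistics are deliberately left out."""
    # Index every k-letter word of the query by its start position.
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)

    # Scan the subject once and report every word that also occurs in the query.
    seeds = []
    for j in range(len(subject) - k + 1):
        word = subject[j:j + k]
        for i in index.get(word, []):
            seeds.append((i, j, word))
    return seeds


if __name__ == "__main__":
    print(find_seeds("GACAGTCTT", "TTGACAGTA", k=4))
    # -> [(0, 2, 'GACA'), (1, 3, 'ACAG'), (2, 4, 'CAGT')]
```

All three hits lie on the same diagonal (subject position minus query position is 2). A BLAST-like tool would extend this kind of run into one longer alignment.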
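ii) Clustal Omega
Multiple Sequence Alignment: Aligns many sequences together by progressive
alignment along a guide tree.
Scalability: Builds guide trees with the mBed method, so very large sets of
sequences can be aligned efficiently.
High Accuracy: Uses HMM profile-profile alignment (HHalign) for accurate alignments.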
iii) Swiss-PdbViewer
3D Visualization: Displays detailed protein structures.
Mutagenesis: Models and analyzes protein mutations by replacing residues in silico.
Energy Minimization: Optimizes protein conformations using the GROMOS force field.
iv) SwissDock
Molecular Docking: Predicts how small molecules bind to target proteins.
Web-Based: Accessible through a web interface, so no local installation is needed.
v) STRING-covid
Protein-Protein Interactions: Maps interactions between proteins related
to COVID-19.
Data Integration: Combines data from various sources to study viral
mechanisms.
Experiment 2
Conversion of sequence/structure from one database format to another
using file format converter
AIM: To convert a sequence/structure from one database format to another using
a file format converter
PROTOCOL:
1. Open a web browser (Chrome or Firefox) on your system.
2. Visit the Open Babel chemical file format converter
([Link]
[Link])
Part A
I) Input query sequence (SMILES format):
1. C1=CN=CC=C1C(=O)NN
2. CC1=CC(=C(C(=N1)SC)NC(=O)CN2CCN(CC2)CCSC3=NC4=CC=CC=C4N3)SC
Try the following and save the results:
a. Retrieve the PDB format for the input query sequence:
i) C1=CN=CC=C1C(=O)NN
ii) CC1=CC(=C(C(=N1)SC)NC(=O)CN2CCN(CC2)CCSC3=NC4=CC=CC=C4N3)SC
b. Retrieve the SDF format for the input query sequence:
i) C1=CN=CC=C1C(=O)NN
ii) CC1=CC(=C(C(=N1)SC)NC(=O)CN2CCN(CC2)CCSC3=NC4=CC=CC=C4N3)SC
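The same conversions can also be run outside the browser with Open Babel's command-line program obabel, assuming it is installed locally. The minimal Python sketch below builds the command `obabel -:SMILES -O output --gen3d`. In this command, the output format is taken from the file extension and --gen3d generates 3D coordinates. The sketch runs the command only if obabel is found on the PATH. The helper names obabel_command and convert are hypothetical.

```python
import os
import shlex
import shutil
import subprocess


def obabel_command(smiles: str, out_file: str) -> list[str]:
    """Build an Open Babel command that converts one SMILES string into the
    format implied by out_file's extension (e.g. .pdb or .sdf).
    '-:' passes the SMILES directly on the command line; '--gen3d' asks
    Open Babel to generate 3D coordinates, which a PDB file needs."""
    return ["obabel", f"-:{smiles}", "-O", out_file, "--gen3d"]


def convert(smiles: str, out_file: str) -> bool:
    """Run obabel if it is on the PATH. Returns True only if it exited with
    status 0 and the output file exists afterwards."""
    if shutil.which("obabel") is None:
        return False  # Open Babel not installed: use the web converter instead.
    result = subprocess.run(obabel_command(smiles, out_file),
                            capture_output=True, text=True)
    return result.returncode == 0 and os.path.exists(out_file)


if __name__ == "__main__":
    # Print the command for query i) in a form that can be pasted into a shell.
    print(shlex.join(obabel_command("C1=CN=CC=C1C(=O)NN", "query1.pdb")))
```

For example, convert("C1=CN=CC=C1C(=O)NN", "query1.sdf") would write the SDF file for query i) when obabel is available. Changing the extension to .pdb gives the PDB file.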
II) Input query sequence (SDF format):
[Link] the following structure in the structure editing panel
OUTPUT:
Convert the following sequence (text file) into FASTA format:
GACAGTCTTACGTACTAATGCGTCGTACGTCGTCGTGTCGTAGTA
GTCGTGTCTGCAATCAGTAGTGCTATGCGTCGTCGTAGTAGTCCT
GTCGTGATGCTGACTGTGCTGCTGCGTACTGAGCTGTCGCGCGT
GTCTGCAGTCGTCGTAGTCGTACGTACGTACGCCGTGTACGTACG
TGTACGTACTGCTGCACGTAGCTAGATCGTACTCATGTCATCGAT
C
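A FASTA record is simply a '>' header line followed by the sequence, so the conversion can be sketched in a few lines of Python. This is a minimal sketch that assumes a DNA-only input (A, C, G, T, N). The function name to_fasta, the header text query_sequence and the 60-character line width are illustrative choices, not part of the lab protocol.

```python
def to_fasta(raw_text: str, header: str, width: int = 60) -> str:
    """Turn a plain-text DNA sequence into one FASTA record.

    Whitespace and line breaks are removed, letters are upper-cased, and the
    sequence is written in lines of at most `width` characters under a '>' header.
    """
    sequence = "".join(raw_text.split()).upper()
    # Assumption: the input is DNA, so only A, C, G, T and N are accepted.
    invalid = set(sequence) - set("ACGTN")
    if invalid:
        raise ValueError(f"unexpected characters in sequence: {sorted(invalid)}")
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([">" + header] + lines) + "\n"


if __name__ == "__main__":
    raw = """GACAGTCTTACGTACTAATGCGTCGTACGTCGTCGTGTCGTAGTA
GTCGTGTCTGCAATCAGTAGTGCTATGCGTCGTCGTAGTAGTCCT
GTCGTGATGCTGACTGTGCTGCTGCGTACTGAGCTGTCGCGCGT
GTCTGCAGTCGTCGTAGTCGTACGTACGTACGCCGTGTACGTACG
TGTACGTACTGCTGCACGTAGCTAGATCGTACTCATGTCATCGAT
C"""
    # "query_sequence" is a placeholder header; any identifier can be used.
    print(to_fasta(raw, "query_sequence"), end="")
```

With the default width, the 225-base sequence above is written as three lines of 60 bases followed by one line of 45.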
OUTPUT: