Name: …………………..
Class: …………………..
Student-ID: …………….
Introduction to Ensembl
https://www.youtube.com/watch?time_continue=1442&v=lA2xq3YkWko
Exercise 1 – Panda
(a) Go to the species homepage for Giant Panda. What is the name of the genome
assembly for Panda?
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
(b) Click on More information and statistics. How long is the Panda genome (in bp)?
How many coding genes have been annotated?
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
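The assembly name and genome length can also be read from the Ensembl REST API. The sketch below only builds the request URL and pulls two fields out of an already-parsed JSON response. It assumes that the `info/assembly` endpoint returns `assembly_name` and `golden_path` (the total assembly length in bp). The toy response is invented for illustration and is not real panda data. The sketch does not give the number of coding genes, so use the statistics page for that.

```python
from urllib.parse import quote

# Public Ensembl REST server (current releases).
REST_SERVER = "https://rest.ensembl.org"


def assembly_info_url(species: str) -> str:
    """Build the URL for the REST endpoint GET info/assembly/:species."""
    return f"{REST_SERVER}/info/assembly/{quote(species)}?content-type=application/json"


def summarise_assembly(info: dict) -> tuple[str, int]:
    """Return (assembly name, golden path length in bp) from a parsed JSON response."""
    return info["assembly_name"], int(info["golden_path"])


print(assembly_info_url("ailuropoda_melanoleuca"))
# Hypothetical toy response for illustration only; NOT real panda values.
toy_info = {"assembly_name": "ToyAssembly1.0", "golden_path": 1234567}
print(summarise_assembly(toy_info))  # ('ToyAssembly1.0', 1234567)
```

With network access, you could fetch the URL with any HTTP client (for example `urllib.request.urlopen`) and pass the decoded JSON to `summarise_assembly`.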
Exercise 2 – Zebrafish
What previous assemblies are available for zebrafish?
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
Exercise 3 – Mosquitoes
(a) Go to Ensembl Metazoa. How many species of the genus Anopheles are
represented in Ensembl Metazoa?
(b) When was the current Anopheles gambiae genome assembly last revised?
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
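If you count species from a list of genome names, for example names taken from the REST endpoint `info/genomes/division/EnsemblMetazoa`, decide first whether several strains of one species count once. The sketch below counts each distinct binomial (genus plus species epithet) once. Two points are assumptions: which response field holds the scientific name, and whether the question means distinct species or genome entries. The toy list is invented.

```python
def count_species_in_genus(scientific_names: list[str], genus: str) -> int:
    """Count distinct binomial species names (genus + epithet) belonging to `genus`.

    Several strains or assemblies of the same species are counted once.
    """
    species = set()
    for name in scientific_names:
        words = name.split()
        if len(words) >= 2 and words[0] == genus:
            species.add(f"{words[0]} {words[1]}")
    return len(species)


# Hypothetical toy list; NOT the real Ensembl Metazoa species list.
toy_names = ["Anopheles gambiae", "Anopheles gambiae PEST", "Anopheles funestus", "Aedes aegypti"]
print(count_species_in_genus(toy_names, "Anopheles"))  # 2
```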
Exercise 4 – Bacteria
Go to Ensembl Bacteria and find the species Belliella baltica. How many coding and
non-coding genes does it have?
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
Exercise 5 – Exploring the human MYH9 gene
(a) Find the human MYH9 (myosin, heavy chain 9, non-muscle) gene and go to the
Gene tab.
• On which chromosome and which strand of the genome is this gene located?
• How many transcripts (splice variants) are there and how many are protein coding?
• What is the longest transcript that codes for protein, and how long is the protein it
encodes?
(b) Click on Phenotype in the menu on the left side of the page. Are there any diseases
associated with this gene, according to OMIM (Online Mendelian Inheritance in Man)?
(c) In the transcript table, click on the transcript ID for MYH9-201, and go to the
Transcript tab.
• How many exons does it have?
• Are any of the exons completely or partially untranslated?
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
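The gene summary asked for in (a) can be sketched from the REST response of `lookup/symbol/homo_sapiens/MYH9?expand=1`. This assumes that the expanded response has `seq_region_name`, `strand` (1 or -1) and a `Transcript` list, and that each transcript has a `biotype` and, if it is translated, a `Translation` with a `length` in amino acids. The function picks the transcript with the longest protein, which is not necessarily the longest transcript in bp, so compare with the transcript table. The toy gene is invented.

```python
def summarise_gene(gene: dict) -> dict:
    """Summarise a parsed response from GET lookup/symbol/:species/:symbol?expand=1."""
    transcripts = gene.get("Transcript", [])
    coding = [t for t in transcripts if t.get("biotype") == "protein_coding"]
    # Only transcripts with a Translation have a protein length (amino acids).
    translated = [t for t in coding if "Translation" in t]
    longest = max(translated, key=lambda t: t["Translation"]["length"], default=None)
    return {
        "chromosome": gene["seq_region_name"],
        "strand": "forward" if gene["strand"] == 1 else "reverse",
        "n_transcripts": len(transcripts),
        "n_protein_coding": len(coding),
        "longest_protein": (
            None if longest is None
            else (longest["display_name"], longest["Translation"]["length"])
        ),
    }


# Hypothetical toy gene; NOT real MYH9 data.
toy_gene = {
    "seq_region_name": "1",
    "strand": 1,
    "Transcript": [
        {"display_name": "TOY-201", "biotype": "protein_coding", "Translation": {"length": 120}},
        {"display_name": "TOY-202", "biotype": "retained_intron"},
        {"display_name": "TOY-203", "biotype": "protein_coding", "Translation": {"length": 80}},
    ],
}
print(summarise_gene(toy_gene))
```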
Exercise 6 – Finding a gene associated with a phenotype
Phenylketonuria is a genetic disorder caused by an inability to metabolise phenylalanine
in any body tissue. This results in an accumulation of phenylalanine, causing seizures
and intellectual disability.
(a) Search for phenylketonuria from the Ensembl homepage and narrow down your
search to only genes. What gene is associated with this disorder?
(b) How many protein coding transcripts does this gene have? View all of these in
the transcript comparison view.
(c) What is the OMIM gene identifier for this gene?
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
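Once you have the Ensembl gene ID from part (a), the OMIM gene identifier in (c) can also be read from the gene's cross-references. The sketch below assumes three things: the REST endpoint `xrefs/id/:id` accepts `external_db=MIM_GENE`, each returned entry has `dbname` and `primary_id` fields, and `ENSG_EXAMPLE` is a hypothetical placeholder ID. The toy cross-references are invented.

```python
REST_SERVER = "https://rest.ensembl.org"


def omim_xref_url(gene_id: str) -> str:
    """URL for GET xrefs/id/:id, restricted to OMIM gene entries (MIM_GENE)."""
    return f"{REST_SERVER}/xrefs/id/{gene_id}?external_db=MIM_GENE&content-type=application/json"


def omim_gene_ids(xrefs: list[dict]) -> list[str]:
    """Keep the primary IDs of cross-references whose database is MIM_GENE."""
    return [x["primary_id"] for x in xrefs if x.get("dbname") == "MIM_GENE"]


# Hypothetical toy cross-references; NOT real identifiers for any gene.
toy_xrefs = [
    {"dbname": "MIM_GENE", "primary_id": "000000"},
    {"dbname": "HGNC", "primary_id": "HGNC:0"},
]
print(omim_gene_ids(toy_xrefs))  # ['000000']
print(omim_xref_url("ENSG_EXAMPLE"))  # hypothetical placeholder ID
```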
Exercise 7 – Exploring the human genome
(a) What is the current human genome assembly? What are the previous
assemblies?
(b) What is the size of the human genome? How many coding genes are annotated
(primary assembly)?
(c) Search for the gene BRCA2. What is its position? How many protein coding
transcripts does this gene have?
(d) Answer question (c) again, but using the GRCh37 version of the human genome
assembly.
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
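Questions (c) and (d) ask the same thing of two assemblies. Ensembl keeps a separate GRCh37 site and REST server, so the only difference is the server you query. The sketch below assumes that the expanded `lookup/symbol` response has `seq_region_name`, `start`, `end`, `strand` and a `Transcript` list with `biotype`. It formats the location roughly the way the web pages show it. The toy record is invented.

```python
# Ensembl runs a separate REST server that stays on the GRCh37 assembly.
SERVERS = {
    "GRCh38": "https://rest.ensembl.org",
    "GRCh37": "https://grch37.rest.ensembl.org",
}


def lookup_symbol_url(symbol: str, assembly: str = "GRCh38") -> str:
    """URL for GET lookup/symbol/homo_sapiens/:symbol on the chosen assembly's server."""
    return f"{SERVERS[assembly]}/lookup/symbol/homo_sapiens/{symbol}?expand=1&content-type=application/json"


def format_position(gene: dict) -> str:
    """Format a gene location roughly as the Ensembl web pages display it."""
    strand = "forward" if gene["strand"] == 1 else "reverse"
    return f"Chromosome {gene['seq_region_name']}: {gene['start']:,}-{gene['end']:,} {strand} strand"


def count_protein_coding(gene: dict) -> int:
    """Count transcripts whose biotype is protein_coding."""
    return sum(1 for t in gene.get("Transcript", []) if t.get("biotype") == "protein_coding")


# Hypothetical toy gene record; NOT real BRCA2 coordinates.
toy = {"seq_region_name": "1", "start": 1000, "end": 25000, "strand": 1,
       "Transcript": [{"biotype": "protein_coding"}, {"biotype": "nonsense_mediated_decay"}]}
print(lookup_symbol_url("BRCA2", "GRCh37"))
print(format_position(toy))  # Chromosome 1: 1,000-25,000 forward strand
print(count_protein_coding(toy))  # 1
```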
Exercise 8 – Human population genetics and phenotype data
The SNP rs1738074 in the 5’ UTR of the human TAGAP gene has been identified as a
genetic risk factor for a few diseases.
(a) In which transcripts is this SNP found?
(b) What is the least frequent genotype for this SNP in the Yoruba (YRI) population from
the 1000 Genomes phase 3?
(c) What is the ancestral allele? Is it conserved in the 90 eutherian mammals
EPO-Extended alignment?
(d) With which diseases is this SNP associated? Are there any known risk (or
associated) alleles?
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
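Question (b) is a lookup of the smallest frequency within one population. The sketch below assumes four things:
- `variation/homo_sapiens/:id?population_genotypes=1` returns a `population_genotypes` list.
- Each entry in that list has `population`, `genotype` and `frequency` fields.
- The YRI population is labelled `1000GENOMES:phase_3:YRI`.
- The ancestral allele for (c) is in an `ancestral_allele` field.

The toy data are invented and are not the real frequencies for rs1738074.

```python
REST_SERVER = "https://rest.ensembl.org"


def variant_url(rsid: str) -> str:
    """URL for GET variation/homo_sapiens/:id including population genotype frequencies."""
    return f"{REST_SERVER}/variation/homo_sapiens/{rsid}?population_genotypes=1&content-type=application/json"


def least_frequent_genotype(pop_genotypes: list[dict], population: str):
    """Return (genotype, frequency) of the rarest listed genotype in `population`.

    Genotypes that were never observed may be missing from the list, so a
    frequency of zero would not show up here.
    """
    rows = [g for g in pop_genotypes if g["population"] == population]
    if not rows:
        return None
    rarest = min(rows, key=lambda g: g["frequency"])
    return rarest["genotype"], rarest["frequency"]


# Hypothetical toy data; NOT the real genotype frequencies for rs1738074.
toy = [
    {"population": "1000GENOMES:phase_3:YRI", "genotype": "A|A", "frequency": 0.5},
    {"population": "1000GENOMES:phase_3:YRI", "genotype": "A|G", "frequency": 0.4},
    {"population": "1000GENOMES:phase_3:YRI", "genotype": "G|G", "frequency": 0.1},
    {"population": "1000GENOMES:phase_3:CEU", "genotype": "G|G", "frequency": 0.05},
]
print(least_frequent_genotype(toy, "1000GENOMES:phase_3:YRI"))  # ('G|G', 0.1)
```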
Exercise 9 – Exploring a Coprinopsis cinerea okayama region
(a) Go to the region 7:1400000-1425000 in Coprinopsis cinerea okayama in Ensembl
Fungi.
(b) How many complete genes are found in this region? How many on the forward and
how many on the reverse strand?
(c) Zoom in on the largest gene EFI27358. How many exons does this gene have?
(d) Export the cDNA sequence of the transcript variant EFI27358.
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
………………………………………………………………………………………………………
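A region query such as `overlap/region/:species/7:1400000-1425000?feature=gene` returns every gene that overlaps the region, including genes cut off at its edges. Counting "complete" genes therefore means keeping only genes that lie entirely inside the region. The sketch below does this per strand and also reports the largest complete gene. It assumes that each gene has `id`, `start`, `end` and `strand` (1 or -1). The species' REST name is not given here, and the toy genes are invented.

```python
def parse_region(region: str) -> tuple[str, int, int]:
    """Split an Ensembl-style region string such as '7:1400000-1425000'."""
    seq_region, span = region.split(":")
    start, end = span.split("-")
    return seq_region, int(start), int(end)


def summarise_region(features: list[dict], region: str) -> dict:
    """Count genes lying completely inside `region`, per strand, and find the largest."""
    _, start, end = parse_region(region)
    complete = [f for f in features if f["start"] >= start and f["end"] <= end]
    largest = max(complete, key=lambda f: f["end"] - f["start"] + 1, default=None)
    return {
        "forward": sum(1 for f in complete if f["strand"] == 1),
        "reverse": sum(1 for f in complete if f["strand"] == -1),
        "largest": None if largest is None else largest["id"],
    }


# Hypothetical toy genes; NOT the real contents of this region.
toy_genes = [
    {"id": "GENE_A", "start": 1401000, "end": 1405000, "strand": 1},
    {"id": "GENE_B", "start": 1406000, "end": 1416000, "strand": -1},
    {"id": "GENE_C", "start": 1399000, "end": 1402000, "strand": 1},  # crosses the left edge
]
print(summarise_region(toy_genes, "7:1400000-1425000"))
# {'forward': 1, 'reverse': 1, 'largest': 'GENE_B'}
```

For part (d), the cDNA of a single transcript can, as far as we know, be requested with `sequence/id/:id?type=cdna`. The Export data button in the browser gives the same kind of sequence.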