Long Study Guide
This guide includes detailed explanations of key concepts and discussion questions with answers to
enhance your understanding.
1. Sequencing Genomes
This section covers various techniques for studying genomes, from amplifying DNA and RNA to
advanced sequencing technologies and bioinformatics tools.
Key Concepts:
Studying Genomes
o Genomes are too large and complex to study directly, so they are broken into
small fragments.
o PCR (Polymerase Chain Reaction): Amplifies specific DNA sequences in vitro.
The target DNA ideally doubles each cycle through three steps: Denaturation (separating
dsDNA at 95°C), Annealing (primers bind to target DNA at ~55°C, depending on the primers),
and Extension (Taq DNA polymerase extends primers at 72°C).
o Cloning: Creates identical copies of DNA fragments using host cells, typically
bacteria.
Steps: Insert DNA into a cloning vector (must have a replication origin,
multiple cloning sites, selectable markers, and screenable markers),
transform recombinant plasmid into a host cell, select colonies with
antibiotics and color screening, and grow overnight cultures to get plasmid
DNA.
Colony Screening: Plasmids often contain the lacZ gene, which codes for
the enzyme β-galactosidase. This enzyme cleaves X-gal (a colorless galactose analog),
releasing a product that turns the colony blue.
Blue colonies indicate that the target DNA was not inserted
because the lacZ gene is intact.
White colonies indicate that the target DNA was inserted,
disrupting the lacZ gene, preventing β-galactosidase production
and X-gal cleavage.
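The PCR doubling described above can be expressed as a simple calculation: after n cycles, the number of target copies is the starting number multiplied by (1 + E)^n, where E is the per-cycle efficiency (E = 1 means perfect doubling). The sketch below is a minimal illustration that assumes a constant efficiency; the function name pcr_copies is hypothetical, and real reactions eventually plateau as reagents become limiting.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Estimate target copies after a number of PCR cycles.

    Each cycle multiplies the target by (1 + efficiency). efficiency = 1.0 is
    the ideal case in which every template is copied (perfect doubling).
    Assumes the efficiency stays constant, so the plateau phase is ignored.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


print(pcr_copies(10, 30))        # 10737418240.0 (10 x 2^30, ideal doubling)
print(pcr_copies(10, 30, 0.9))   # fewer copies at an assumed 90% efficiency
```

Starting from 10 copies, 30 ideal cycles give about 10^10 copies, which is why even a single target molecule can become detectable.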
RNA Techniques
o RT-qPCR (Reverse Transcription Quantitative PCR): Amplifies cDNA made from RNA
while monitoring fluorescence in real time; the signal reflects the amount of starting RNA.
Quantifies specific transcripts like mRNA and miRNA.
Requires reverse transcription (RNA to cDNA), a fluorescent
probe/dye (e.g., SYBR green, TaqMan), and yields a Cq (quantification cycle) value
reflecting transcript abundance: the lower the Cq, the more abundant the transcript.
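SYBR green: Binds double-stranded DNA (mainly in the minor groove) regardless of
sequence, so it can also detect non-specific products like primer dimers.
TaqMan: Uses a sequence-specific probe with a reporter dye at the 5’ end and a quencher
dye at the 3’ end; the 5’→3’ exonuclease activity of Taq polymerase cleaves the probe,
separating reporter from quencher and causing fluorescence.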
Applications: Gene expression analysis and validation of RNA-seq
results.
o RNA-seq (RNA Sequencing): Unbiased, hypothesis-free method to profile gene
expression by sequencing the transcriptome.
Detects transcripts, alternative splicing, and novel transcripts (those
transcribed from the genome but not in existing gene annotations or
databases).
Sensitive enough for single-cell analysis.
Uses sample barcoding for pooling and cost efficiency.
Requires bioinformatics for large data analysis.
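Cq values from RT-qPCR are often turned into relative fold changes. The notes above do not specify an analysis method, so the sketch below assumes one common approach, the 2^-ΔΔCq (Livak) method: the target gene's Cq is first normalized to a reference (housekeeping) gene, then to a control sample. It also assumes roughly 100% amplification efficiency for both genes, so one Cq unit corresponds to a two-fold difference. The function and argument names are hypothetical.

```python
def relative_expression(cq_target_treated: float, cq_ref_treated: float,
                        cq_target_control: float, cq_ref_control: float) -> float:
    """Fold change of a target transcript by the 2^-ddCq (Livak) method.

    Assumes ~100% amplification efficiency for target and reference genes,
    so a difference of one Cq equals a two-fold difference in starting RNA.
    """
    delta_treated = cq_target_treated - cq_ref_treated   # normalize to reference gene
    delta_control = cq_target_control - cq_ref_control
    delta_delta = delta_treated - delta_control          # normalize to control sample
    return 2.0 ** (-delta_delta)


# Target Cq drops from 25 to 23 while the reference gene stays at 18.
print(relative_expression(23.0, 18.0, 25.0, 18.0))  # 4.0
```

Because a lower Cq means more starting transcript, a target whose Cq drops by two cycles relative to an unchanged reference gene comes out as a roughly four-fold increase.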
DNA Techniques
o dPCR (Digital PCR): Partitions samples into thousands of individual PCR
reactions, each with 0, 1, or a few DNA molecules.
Because some partitions receive more than one molecule, a Poisson
distribution correction is applied so the DNA concentration is not underestimated.
More accurate and precise than qPCR for absolute quantification.
Provides a binary readout (positive partitions fluoresce, negative ones do
not), offering better sensitivity.
Does not require a standard curve because it directly counts molecules.
Less affected by PCR inhibitors than qPCR.
o FISH (Fluorescence in situ Hybridization): Detects specific RNA or DNA
molecules in cells or tissues using fluorescently labeled RNA/DNA probes.
Offers high spatial resolution, allowing gene expression analysis at
cellular or organ levels.
Can be multiplexed with different colored probes.
o Methyl-seq (Methylation Sequencing): Maps DNA methylation patterns, which
are epigenetic marks often associated with gene silencing.
Methylated CpG sites often correlate with gene silencing, especially in
promoter regions.
Method: Uses bisulfite conversion, which converts unmethylated cytosine
to uracil (read as thymine after PCR), while methylated cytosines remain unchanged.
After sequencing, bisulfite-treated DNA is compared to the untreated (reference)
sequence: a C that remains C marks a methylated site, while a C read as T was unmethylated.
Applications: Epigenetics, developmental biology, and cancer research.
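The Poisson correction mentioned under dPCR can be sketched as follows. If template molecules are assumed to be randomly (Poisson) distributed across partitions, the fraction of empty partitions equals e^(-λ), so the mean number of copies per partition is λ = -ln(1 - positive/total). Dividing by the partition volume gives an absolute concentration without a standard curve. The partition counts and the 0.85 nL volume below are illustrative assumptions, not values from these notes, and dpcr_copies_per_ul is a hypothetical name.

```python
import math


def dpcr_copies_per_ul(positive: int, total: int, partition_volume_nl: float) -> float:
    """Estimate target concentration (copies per uL) from a digital PCR run.

    A partition can hold more than one molecule, so the raw fraction of
    positive partitions underestimates the true number of copies. If molecules
    are Poisson-distributed, P(partition empty) = exp(-lam), so the mean
    number of copies per partition is lam = -ln(1 - positive/total).
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total")
    lam = -math.log(1.0 - positive / total)       # mean copies per partition
    partition_volume_ul = partition_volume_nl * 1e-3
    return lam / partition_volume_ul


# Illustrative numbers only: 6,000 of 20,000 partitions positive, 0.85 nL each.
corrected = dpcr_copies_per_ul(6000, 20000, 0.85)
naive = (6000 / 20000) / (0.85 * 1e-3)   # wrongly assumes one copy per positive partition
print(round(corrected), round(naive))    # 420 353
```

Treating every positive partition as a single copy would underestimate the concentration (about 353 instead of about 420 copies/µL here), which is the error the Poisson correction addresses.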
Protein Techniques
o Western Blotting (Immunoblotting): A method to detect specific proteins.
Steps: Protein separation by SDS-PAGE, transfer to a membrane, and
detection with specific antibody probes.
Chromatin Configuration Techniques
o Hi-C-seq (Chromosome Conformation Capture Sequencing): Maps 3D
chromatin interactions within the nucleus.
Method: Crosslink DNA and associated proteins, digest with a restriction enzyme,
label the cut ends with biotin, ligate, and purify the biotinylated chimeric DNA
fragments that reflect spatial interactions, then sequence.
Applications: Detects TADs (topologically associating domains),
chromatin loops, A/B compartments, and chromosome territories. Used to
understand gene regulation and its disruption in diseases like cancer.
o ChIP-seq (Chromatin Immunoprecipitation Sequencing): Identifies DNA-
protein binding sites, such as transcription factors (TFs) and histones.
Methods: Crosslink proteins to DNA, fragment DNA (e.g., by sonication),
immunoprecipitate with a specific antibody, and then sequence the bound
DNA.
Applications: Maps TF binding and histone modifications (e.g., marks of
euchromatin vs. heterochromatin), and tracks changes in chromatin binding
under different conditions.
o ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing):
Maps open chromatin regions (euchromatin).
Uses Tn5 transposase, which cuts accessible DNA and inserts adapters.
Shows regulatory elements, TF motifs, and changes in accessibility across
cell types or conditions.
Single Cell Genomics (SCG)
o Aim: Characterize gene expression, epigenetic marks, and mutations in individual
cells to create a complete catalog of human cell types and reveal cellular
heterogeneity.
o Technologies: Single-cell RNA-seq and ATAC-seq.
o Applications: Lineage tracing, biomarker discovery, therapeutic monitoring. It
can identify rare cell types (e.g., stem cells, cancer precursors), study disease
progression and developmental trajectories, track cellular lineages, and contribute
to projects like the Human Cell Atlas.
Sequencing Techniques
o Sanger Sequencing (1st Generation): A chain termination method.
Uses a DNA template, primer, DNA polymerase, dNTPs, and
fluorescently labeled ddNTPs (dideoxynucleotides).
ddNTPs terminate elongation because they lack a 3’ OH group.
Products are separated by capillary electrophoresis, and the sequence is
read from a fluorescence-based chromatogram/electropherogram.
Applications: Good for single gene sequencing, validating mutations
identified by NGS, and genotyping known variants. Used in small
sequencing projects (e.g., verifying mutations) and early genome projects
(e.g., human, mouse, dog).
Strengths: Read length of approximately 1000 bp, low error rate, and
robust (still used for specific tasks).
Limitations: Low throughput, time-consuming, and expensive per base
for large projects.
o Next Generation Sequencing (NGS, 2nd Generation): Involves parallel
sequencing of millions of fragments and requires amplification.
Steps: DNA fragmentation, adapter ligation, amplification (PCR-based),
sequencing by synthesis (e.g., Illumina using fluorescence-based
reversible terminator chemistry), detection and base calling, alignment to a
reference genome, and variant calling and annotation.
Strengths: Massive data output, suitable for discovering new genes, and
rare variant discovery (SNPs, indels, CNVs).
Limitations: High initial equipment costs, requires strong bioinformatics
support, short reads, and data overload.
Technologies:
Illumina (Solexa): Short reads, high throughput, low cost per base.
Challenges include difficulty with repetitive regions,
overclustering, and signal loss.
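Ion Torrent: Semiconductor sequencing that detects the hydrogen ions
(pH changes) released as nucleotides are incorporated; various chemistries
and read lengths.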
Applications: Whole genome/exome/transcriptome sequencing, RNA-seq,
ChIP-seq, and identifying SNPs, indels, CNVs, and alternative splicing.
o Third Generation Sequencing (TGS): Single-molecule sequencing, usually in
real time and often without PCR, which minimizes GC bias.
Strengths: Long reads to resolve complex genomic regions and minimal
GC bias.
Limitations: Lower accuracy than NGS, but improving; might require
intensive data analysis.
Technologies:
PacBio (SMRT - Single Molecule Real-Time sequencing): Long
reads, no PCR (minimal GC bias), uses SMRTbell (circular DNA)
as a template, real-time fluorescent detection as DNA polymerase
incorporates nucleotides, and directly detects base modifications
(e.g., methylation).
Oxford Nanopore (MinION): Portable, very long reads, low cost,
real-time sequencing. Weaknesses include a high error rate.
Measures changes in electric current as DNA passes through a
protein nanopore. No PCR needed.
Applications: Genome assembly, structural variation, full-length
transcriptomics, and epigenetics.
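The Sanger chain-termination logic described above can be modeled in a few lines. This is a teaching sketch, not an instrument algorithm: it assumes a ddNTP is eventually incorporated at every position across the molecule population, so one product ends at each base; sorting the products by length mimics capillary electrophoresis, and the dye on each terminal ddNTP gives the base. The template sequence and function names are hypothetical.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def terminated_products(template_3to5: str) -> list[str]:
    """All chain-terminated products for a template written 3'->5'.

    The new strand is built 5'->3' as the complement of the template. Across
    many molecules a dye-labelled ddNTP (no 3' OH) is eventually incorporated
    at every position, so there is one product ending at each position.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template_3to5)
    return [new_strand[: i + 1] for i in range(len(new_strand))]


def read_electropherogram(products: list[str]) -> str:
    """Order products by length (shortest first), as capillary electrophoresis
    does, and read the dye on each product's terminal ddNTP."""
    return "".join(p[-1] for p in sorted(products, key=len))


products = terminated_products("TACGGATC")
random.shuffle(products)                 # products are unordered in the tube
print(read_electropherogram(products))   # ATGCCTAG
```

The read-out is the complement of the template, which is why the electropherogram sequence is reported for the newly synthesized strand.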
SNP Genotyping
o SNP genotyping: Identifies Single Nucleotide Polymorphisms (SNPs).
o Methods:
SNP chips/microarrays: Probes for thousands of known SNPs across the
genome; high throughput, used in genetic mapping and association studies
for disease screening, expression profiling, and breed mapping in animals.
TaqMan Assays: Fluorescent probe-based qPCR targeting specific SNP
alleles, with allele-specific probes having different reporter dyes (e.g.,
FAM, VIC) and a quencher dye; highly specific.
RFLP (Restriction Fragment Length Polymorphism): SNPs must
affect a restriction enzyme site; involves PCR, enzymatic digestion, and
gel electrophoresis.
Sequencing-based genotyping: Direct sequencing (e.g., Sanger, NGS) to
detect SNPs.
Microsatellite genotyping: Uses STRs (short tandem repeats), which are
very polymorphic; involves PCR and polyacrylamide gel electrophoresis.
Applications: Forensics, paternity testing, and population genetics.
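The RFLP approach works only when the SNP creates or destroys a restriction site. The sketch below digests two hypothetical 20 bp PCR amplicons, one carrying an intact EcoRI site (G^AATTC) and one where an A>G SNP destroys it, and lists the bands each genotype would show on a gel. The sequences and function names are made up for illustration; because GAATTC is palindromic, searching one strand is enough here.

```python
def digest(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths after cutting at every occurrence of `site`,
    `cut_offset` bases into the recognition sequence."""
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = sequence.find(site, pos + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def gel_bands(alleles: list[str], site: str, cut_offset: int) -> list[int]:
    """Distinct band sizes expected for a genotype, largest first."""
    bands = set()
    for allele in alleles:
        bands.update(digest(allele, site, cut_offset))
    return sorted(bands, reverse=True)


ECORI_SITE, ECORI_CUT = "GAATTC", 1     # EcoRI cuts G^AATTC
allele_1 = "ATGCCGAATTCGGTACCTTA"       # hypothetical amplicon, site intact
allele_2 = "ATGCCGAGTTCGGTACCTTA"       # hypothetical A>G SNP destroys the site

print(gel_bands([allele_1, allele_1], ECORI_SITE, ECORI_CUT))  # [14, 6]
print(gel_bands([allele_2, allele_2], ECORI_SITE, ECORI_CUT))  # [20]
print(gel_bands([allele_1, allele_2], ECORI_SITE, ECORI_CUT))  # [20, 14, 6]
```

The three band patterns distinguish the two homozygotes and the heterozygote, which is the basis of RFLP genotyping.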
Bioinformatics
o Definition: Application of computational tools to understand, organize, and
analyze molecular biological data; described as a "management information
system for molecular biology".
o Disciplines: Combines biology, computer science, medicine, physics,
mathematics, and statistics.
o Core Objectives: Model biological systems (e.g., protein folding, cellular
pathways), understand sequences, organize vast sequence data, and handle the
"data deluge" from high-throughput technologies (e.g., Illumina, PacBio).
Important for studying gene expression, regulatory networks, and evolutionary
relationships.
o Evolution of the Field:
Pre-genomic era: Focused on sequence → structure → function.
Post-genomic era: Shifted to genome → cell function → phenotype, with
an expansion of "omics" disciplines.
o Modern Implications: Biology has become an information science, and
experimental biology now relies on computations for candidate gene discovery,
phenotype-genotype correlations, visualization, and simulation of cellular
networks.
o Comparing Biological Sequences: Aims to identify homologous sequences.
Homologs: Sequences sharing a common ancestor.
Orthologs: Genes in different species that evolved from a common
ancestral gene through speciation.
Paralogs: Genes related by duplication within the same genome.
Evolutionary Events: Insertion, deletion, substitution, duplication,
inversion, and translocation all drive sequence divergence across species
and within genomes.
o Sequence Alignment: Measures similarity by comparing nucleotides or amino
acids.
Scores for matches, mismatches, and gaps (indels).
Types: Global alignment (aligns entire sequence end-to-end) or Local
alignment (finds best matching subsections/domains).
Scoring and Speed: Scoring matrices (e.g., BLOSUM, PAM for proteins)
weigh biologically likely substitutions and conserved residues. Gap
penalties (gap opening and extension cost models) discourage excessive
indels. Computationally heavy, so optimization and heuristics like BLAST
are used for feasibility.
o Gene Annotation: Assigns biological meaning to raw sequences.
Methods include similarity search, comparative genomics, and de
novo prediction.
o Genome Browsers: Software tools enabling visualization and exploration of
annotated genomic sequences.
Display conserved sequences, synteny (conserved gene order), and
orthologous genes across species.
Examples: NCBI Genome Data Viewer, UCSC Genome Browser,
Ensembl.
Integrated with BLAST tools, comparative genomics functions, and
transcript annotations (e.g., RefSeq, GENCODE).
Information Provided: Gene names, symbols, synonyms, transcript
variants (alternative splicing), protein domains and structure, Gene
Ontology (GO) annotations, expression data (e.g., RNA-seq), variation
data (SNPs, indels), and links to literature and databases.
Practical Use: Using these browsers can prevent "running experiments
blindly".
Features: Interactive interface for zooming, track-based format (users can
turn data layers on/off), and links to functional data (e.g., GO), clinical
significance (e.g., ClinVar), and population variants (e.g., 1000 Genomes
Project).
Applications: Identify gene structures and regulatory elements, compare
genomes across species (conservation analysis), design primers or probes,
view experimental data (e.g., ChIP-seq, RNA-seq), and for educational
and research exploration of genome architecture.
o Tools for Homology-based Searching: BLAST, BLAT, Gene trees (Ensembl),
Genome alignments and conservation tracks.
o BLAST (Basic Local Alignment Search Tool): A tool to quickly compare
sequences.
Steps: Break query into short "words", match to database, extend
alignments around word matches.
Applications: Identify unknown genes/proteins, annotate new genomes,
and find conserved regions across species.
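The three BLAST steps listed above (words, database matches, extension) can be sketched as a toy seed-and-extend search. This is a simplified illustration, not the NCBI implementation: it uses exact word matches only, ungapped extension with an assumed match/mismatch scoring and X-drop cutoff, and no statistics, whereas real BLAST also uses scoring matrices (e.g., BLOSUM for proteins), gapped extension, and E-values. All names and parameter values are hypothetical.

```python
def word_hits(query: str, subject: str, w: int) -> list[tuple[int, int]]:
    """Steps 1-2: break the query into w-letter words and find exact matches
    in the subject via a word index. Returns (query_pos, subject_pos) seeds."""
    index: dict[str, list[int]] = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    return [(i, j)
            for i in range(len(query) - w + 1)
            for j in index.get(query[i:i + w], [])]


def extend_seed(query, subject, qi, si, w, match=1, mismatch=-2, x_drop=3):
    """Step 3: extend a seed left and right without gaps, stopping when the
    running score drops more than x_drop below the best seen (X-drop rule).
    Returns (score, query_start, query_end) of the best segment."""
    def s(a, b):
        return match if a == b else mismatch

    seed = sum(s(query[qi + k], subject[si + k]) for k in range(w))

    best_r = run = len_r = k = 0                 # extend to the right
    while qi + w + k < len(query) and si + w + k < len(subject):
        run += s(query[qi + w + k], subject[si + w + k])
        k += 1
        if run > best_r:
            best_r, len_r = run, k
        elif best_r - run > x_drop:
            break

    best_l = run = len_l = k = 0                 # extend to the left
    while qi - k - 1 >= 0 and si - k - 1 >= 0:
        run += s(query[qi - k - 1], subject[si - k - 1])
        k += 1
        if run > best_l:
            best_l, len_l = run, k
        elif best_l - run > x_drop:
            break

    return seed + best_l + best_r, qi - len_l, qi + w + len_r


def best_hit(query, subject, w=4):
    """Highest-scoring ungapped segment over all seeds, or None if no seed."""
    hsps = [extend_seed(query, subject, qi, si, w)
            for qi, si in word_hits(query, subject, w)]
    return max(hsps, default=None)


print(best_hit("AACTGCAT", "GGAACTGCATGG"))  # (8, 0, 8)
```

Indexing the words first is what makes the search fast: only regions that share an exact word with the query are ever extended, instead of aligning the query against every position.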
1. Question: Explain how RT-qPCR and RNA-seq differ in their approach to studying gene
expression, and what unique insights each technique offers.
o Answer: RT-qPCR focuses on quantifying specific, pre-selected transcripts by
real-time amplification of cDNA from RNA, providing precise measurements of
transcript abundance for a targeted set of genes. It is highly sensitive and specific.
In contrast, RNA-seq is an unbiased, hypothesis-free method that sequences the
entire transcriptome, providing a comprehensive profile of all expressed genes.
RNA-seq can detect novel transcripts and alternative splicing events, and it is sensitive
enough for single-cell analysis, offering broader insights into gene expression
dynamics without prior knowledge of specific genes. RT-qPCR is well suited to
validation, whereas RNA-seq is suited to discovery.
2. Question: Describe the principle of DNA cloning, including the role of the lacZ gene and
X-gal in distinguishing successful DNA insertion.
o Answer: DNA cloning involves creating identical copies of DNA fragments by
inserting them into a cloning vector, which is then introduced into a host cell,
typically bacteria. The vector must have a replication origin, multiple cloning
sites, and markers for selection and screening. A common screening method uses
the lacZ gene, which codes for β-galactosidase. This enzyme breaks down X-gal
(a colorless galactose analog), producing a blue color. When the target DNA is successfully inserted into
the cloning site within the lacZ gene, it disrupts the gene, preventing β-galactosidase
production. As a result, colonies containing the inserted DNA
remain white because X-gal is not cleaved. If the target DNA is not inserted,
the lacZ gene remains intact, β-galactosidase is produced, and the colony
turns blue when X-gal is present.
3. Question: Compare and contrast Sanger sequencing, Next Generation Sequencing
(NGS), and Third Generation Sequencing (TGS) in terms of their core methodology,
strengths, and limitations.
o Answer:
Sanger Sequencing (1st Gen) uses a chain termination method with
ddNTPs to stop DNA synthesis at specific bases, followed by capillary
electrophoresis to separate fragments by size, and a fluorescence-based
read-out. Its strengths include read lengths of ~1000 bp, low error rates,
and robustness for single gene sequencing or validating other methods.
Its limitations are low throughput, time consumption, and high cost per
base for large projects.
NGS (2nd Gen) involves parallel sequencing of millions of DNA
fragments after amplification. It typically uses sequencing by synthesis
with reversible terminator chemistry (e.g., Illumina). Its strengths are
massive data output, suitability for discovering new genes, and rare variant
detection (SNPs, indels, CNVs). Its limitations include high initial
equipment costs, requirement for strong bioinformatics support, short
reads, and potential data overload.
TGS (3rd Gen) performs single-molecule sequencing, often in real-time
and without PCR, minimizing GC bias. Technologies like PacBio (SMRT)
detect nucleotides as they are incorporated, and Oxford Nanopore
(MinION) measures electrical current changes. Strengths are very long
reads, which are excellent for resolving complex genomic regions and
structural variations, and minimal GC bias. Limitations currently include
lower accuracy than NGS, though this is improving, and potentially
intensive data analysis.
4. Question: Discuss the role of bioinformatics in the post-genomic era. How has the field
evolved, and what are its core objectives and modern implications?
o Answer: In the post-genomic era, bioinformatics has shifted from focusing on
sequence-to-function predictions to understanding how the entire genome relates
to cell function and phenotype. The field has expanded significantly due to the
"omics" revolution. Its core objectives include modeling biological systems,
understanding sequences, organizing vast amounts of sequence data, and
managing the "data deluge" generated by high-throughput technologies like
Illumina and PacBio. Modern implications are that biology has become
an information science, with experimental biology increasingly relying on
computational methods for tasks like candidate gene discovery, genotype-
phenotype correlations, and visualization/simulation of cellular networks.
Bioinformatics is crucial for comparing biological sequences to identify
homologs, orthologs, and paralogs, and for performing sequence alignment and
gene annotation to assign biological meaning to raw data.
2. Organization of Human Genomes
This section details the structure and composition of the human mitochondrial and nuclear
genomes, including protein-coding genes, non-coding RNAs, and repetitive DNA elements, as
well as the modern definition of a gene from the ENCODE project.
Key Concepts:
1. Question: Compare and contrast the key features of the human mitochondrial and
nuclear genomes, highlighting their structural, inheritable, and functional differences.
o Answer: The mitochondrial genome is a circular DNA molecule, inherited
exclusively from the mother, exists in many copies per cell (many mitochondria
per cell, none in RBCs), is relatively small (16,569 bp), contains 37 genes (13
protein-coding, 22 tRNAs, 2 rRNAs), and notably lacks introns. It has a higher
mutation rate due to less robust error checking. Its primary function is ATP
production through cellular respiration. In contrast, the nuclear genome consists
of linear DNA organized into 23 pairs of chromosomes, inherited from both
parents, is diploid in somatic cells, is vastly larger (about 3.1 billion bp per haploid set), contains
approximately 40,000 genes (21,000 protein-coding), and contains introns. It has
a lower mutation rate due to more robust repair systems. The nuclear genome is
the primary determinant for traits and diseases, encoding diverse structural
proteins, enzymes, and regulatory elements.
2. Question: Discuss the various classes of non-coding RNAs in the human genome and
their diverse roles, with a particular focus on microRNAs (miRNAs) and their biogenesis.
o Answer: Non-coding RNAs (ncRNAs) are abundant and play diverse roles in the
human genome. Main classes include rRNAs (essential for ribosomes)
and tRNAs (for protein translation). Short ncRNAs include snRNAs (for RNA
splicing), snoRNAs (modify rRNAs/tRNAs), scaRNAs (modify snRNAs),
piRNAs (silence transposons), and most notably, miRNAs. miRNAs are small
(18-25 nt) post-transcriptional regulators of gene expression.
miRNA Biogenesis: It begins with transcription by RNA Pol II into
a primary-miRNA (pri-miRNA). In the nucleus, the pri-miRNA is
cleaved by the Drosha enzyme into a smaller hairpin-structured pre-
miRNA. This pre-miRNA is then exported from the nucleus to the
cytoplasm by Exportin-5. In the cytoplasm, the Dicer enzyme cleaves the
pre-miRNA into a mature miRNA duplex. Finally, one strand of this
duplex (the guide strand) is loaded into the RNA-induced silencing
complex (RISC), while the other strand is degraded. The miRNA-RISC
complex then binds to complementary sequences, typically in the 3' UTR
of target mRNAs, to either cleave and degrade the mRNA or repress its
translation.
Long ncRNAs (lncRNAs) are over 200 nt and regulate transcription,
chromatin, and epigenetic states. These diverse ncRNAs highlight that the
genome's functionality extends far beyond protein-coding genes.
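The final step of the miRNA pathway, pairing with complementary sequences in a 3' UTR, can be illustrated as a string search. The sketch assumes the commonly used convention that the miRNA "seed" (nucleotides 2-8 of the mature miRNA) drives target recognition, and it looks only for perfect seed matches; real target prediction weighs additional factors. The let-7a sequence is used purely as an example, the UTR fragment is hypothetical, and the function names are illustrative.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (5'->3' in, 5'->3' out)."""
    return rna.translate(RNA_COMPLEMENT)[::-1]


def seed_sites(mirna: str, utr: str) -> list[int]:
    """0-based positions in a 3' UTR that pair perfectly with the miRNA seed.

    Assumption: the seed is nucleotides 2-8 of the mature miRNA; real target
    prediction also considers site type, sequence context, and conservation.
    """
    site = reverse_complement(mirna[1:8])
    hits, pos = [], utr.find(site)
    while pos != -1:
        hits.append(pos)
        pos = utr.find(site, pos + 1)
    return hits


let7a = "UGAGGUAGUAGGUUGUAUAGUU"      # mature let-7a, used for illustration
utr = "AAUGCUACCUCAAAGGCUACCUCUU"    # hypothetical 3' UTR fragment
print(reverse_complement(let7a[1:8]))  # CUACCUC
print(seed_sites(let7a, utr))          # [4, 16]
```

Because only a short seed needs to match, a single miRNA can recognize sites in many different mRNAs, consistent with one miRNA regulating many targets.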
3. Question: Explain the redefinition of a "gene" as proposed by the ENCODE project.
How did ENCODE findings challenge the previous understanding of the human genome,
particularly the concept of "junk DNA"?
o Answer: The ENCODE project (2003-2012) redefined a gene as "a union of
genomic sequences encoding a coherent set of potentially overlapping functional
products". This updated definition reflects the complexity revealed by ENCODE,
moving beyond simply protein-coding sequences. ENCODE findings significantly
challenged the concept of "junk DNA" by demonstrating that approximately 80%
of the human genome has biochemical functions. It found that 75-80% of the
genome is transcribed into RNA in at least one cell type, with most of these
transcripts being non-coding RNAs. Furthermore, it showed that DNA and
histone modifications vary across cell types, influencing gene regulation, and that
protein-coding genes can produce multiple transcripts through alternative
splicing. These discoveries revealed that vast non-coding regions previously
considered "junk" actually play crucial regulatory roles, contributing to the
complexity of human biology.
This section explores the intricate mechanisms that control gene expression, from chromatin
structure and transcription factors to epigenetic modifications and post-transcriptional
processing.
Key Concepts:
Gene Regulation
o Definition: Control of when, where, and how much a gene is expressed.
o Regulation occurs at multiple levels to ensure spatial (where)
and temporal (when) control.
o It ensures tissue specificity, developmental stage-specific expression, and
responses to environmental signals.
Chromatin/Accessibility/TADs
o Chromatin: Composed of DNA, histones, and other proteins.
Euchromatin: Open, transcriptionally active.
Heterochromatin: Condensed, transcriptionally inactive.
Remodeling changes DNA accessibility to promoters, aiding or hindering
RNA polymerase binding.
o TADs (Topologically Associating Domains): 3D chromatin domains that bring
enhancers near promoters.
Delineated by boundary elements (also called TAD insulators): regulatory
DNA sequences, bound by boundary proteins, that prevent the spread of
heterochromatin and block enhancers from activating unintended genes.
TAD boundaries are usually found near promoters and transcription start
sites, are highly conserved across species, and are often bound by the
CTCF transcription factor.
Crucial for gene regulation, and their disruption is linked to rare diseases.
Levels of Regulation
o Transcriptional (in the nucleus):
1. Chromatin Remodeling: Controls DNA accessibility through
modifications of histones.
Modifications include acetylation (generally activates gene
expression), methylation (can activate or repress depending on
amino acid and degree), and phosphorylation.
Involves writers (add modifications), erasers (remove
modifications), and readers (interpret modifications).
2. Promoter Accessibility: Promoters are DNA sequences located around and
mainly upstream of the transcription start site of a gene.
Core promoters: Contain elements like the TATA box, Inr
(initiator element), and DPE (downstream promoter element).
Proximal promoters: Contain elements like the GC box and
CCAAT-box.
Bidirectional promoters allow transcription in opposite directions
on both strands.
3. Transcription Factors (TFs): Proteins with DNA binding domains
(e.g., zinc fingers, helix-turn-helix) and activation domains.
General TFs: Needed for all transcription.
Specific TFs: Act as activators or repressors for specific tissues or
signals.
4. RNA Polymerases:
Pol I: Transcribes rRNAs.
Pol II: Transcribes mRNAs, miRNAs, snRNAs, and tissue-specific
genes.
Pol III: Transcribes tRNAs and 5S rRNAs.
Mitochondrial RNA polymerase also exists.
Epigenetic Mechanisms
o These are heritable changes in gene expression that occur without altering the
underlying DNA sequence.
o 1. DNA Methylation: Addition of methyl groups to CpG islands (regions rich in
cytosine and guanine) in promoter regions by DNA methyltransferases.
Blocks TFs from binding and recruits repressors.
High methylation in promoter regions silences gene expression.
Heritable through mitosis (cell division) but largely erased and reset
during gamete formation and early development (imprinted genes are an exception).
Plays a role in development, cell lineage commitment, and genomic
imprinting.
o 2. Histone Modification: Chemical modifications to histone tails (proteins
around which DNA is wrapped).
Often interpreted as a "histone code" that defines chromatin states.
Heritable through mitosis.
Acetylation: Generally activates gene expression.
Methylation: Can activate or repress based on specific amino acid
residues and degree of methylation.
Phosphorylation also occurs.
Like chromatin remodeling, involves writers, erasers, and readers.
o 3. Imprinting: Gene expression depends on whether the gene is inherited from
the father or the mother.
Results in monoallelic expression, where only one parental allele is
expressed.
Affects approximately 1% of mammalian genes, which are usually found in
clusters.
Example: The IGF2 gene, where only the paternal allele is expressed.
o 4. X-inactivation: In females, one of the two X chromosomes is inactivated to
balance gene dosage between sexes.
Controlled by XIST, a long non-coding RNA that silences the chosen X
chromosome.
Leads to mosaic expression, meaning different cells in the same organism
can have different active X chromosomes, as seen in the fur color patterns
of calico cats.
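CpG islands were described above as regions rich in cytosine and guanine. One commonly used operational definition (an assumption here, since the notes give no thresholds) requires a window of at least 200 bp with GC content above 50% and an observed/expected CpG ratio above 0.6. The sketch below computes both statistics for a sliding window; the function names are hypothetical, and overlapping windows are not merged into islands.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA window.

    Observed/expected CpG = (number of CG dinucleotides * length) / (C count * G count).
    """
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")          # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def cpg_island_windows(seq: str, window: int = 200, min_gc: float = 0.5,
                       min_obs_exp: float = 0.6) -> list[int]:
    """Start positions of windows passing both assumed thresholds
    (GC > 50% and observed/expected CpG > 0.6 over a 200 bp window)."""
    starts = []
    for start in range(len(seq) - window + 1):
        gc, oe = cpg_stats(seq[start:start + window])
        if gc > min_gc and oe > min_obs_exp:
            starts.append(start)
    return starts


print(cpg_stats("CG" * 100))            # (1.0, 2.0)
print(cpg_island_windows("AT" * 100))   # []
print(cpg_island_windows("CG" * 100))   # [0]
```

The observed/expected ratio matters because CpG dinucleotides are depleted in most of the genome; a high ratio flags the unusual CpG-dense regions that often overlap promoters.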
Post-Transcriptional Control (mostly in the cytoplasm)
o 1. Pre-mRNA Processing: Occurs in the nucleus before mRNA is exported to the
cytoplasm.
RNA Splicing: Removal of introns (non-coding regions) and joining of
exons (coding regions) to form mature mRNA.
Alternative Splicing: Different combinations of exons from a single gene
can be joined to produce different mRNAs, significantly increasing protein
diversity. Important for tissue-specific expression (e.g., in the CNS) and
for making protein isoforms with different functions or localizations.
5’ Capping: Addition of a modified guanine nucleotide to the 5' end of
mRNA, crucial for stability and translation initiation.
Poly-A Tail: Addition of a string of adenine nucleotides to the 3' end,
important for mRNA stability and translation.
o 2. mRNA Stability: Regulation of how long an mRNA molecule persists in the
cytoplasm before degradation.
mRNA can be degraded by 5’→3’ exonucleases (after 5’ cap removal) or
3’→5’ exonucleases (after polyA tail shortens).
MicroRNAs (miRNAs) regulate mRNA degradation or inhibit translation
by binding to the 3’ UTR of mRNA. One miRNA can target many
mRNAs.
o 3. Translation Control: Regulation of protein synthesis from mRNA.
miRNAs binding to the 3’ UTR of mRNA, often with RNA-binding
proteins, can inhibit translation initiation at the ribosome or cause mRNA
degradation.
This provides a faster cellular response compared to transcriptional
control.
o 4. Protein Processing: Modifications to a polypeptide chain after translation to
become a functional protein.
Misfolded proteins are degraded.
Covalent modifications: e.g., phosphorylation (addition of a phosphate
group).
Proteolytic cleavage: Cutting of a protein to generate mature, active
proteins.
o 5. Protein Transport: Signal sequences on proteins direct them to specific
cellular destinations. This process is regulated and essential for proper cell
function.
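Alternative splicing multiplies the number of possible mRNAs from one gene. The sketch below models only one splicing pattern, cassette exons that can each be included or skipped independently, so k such exons give up to 2^k isoforms; other patterns (mutually exclusive exons, alternative 5' or 3' splice sites, intron retention) are not modeled. Exon labels and the function name are hypothetical.

```python
from itertools import product


def cassette_isoforms(first_exon: str, cassettes: list[str], last_exon: str) -> list[str]:
    """Enumerate mature mRNAs when each internal 'cassette' exon can be
    independently included or skipped, always keeping exon order."""
    isoforms = []
    for choice in product([True, False], repeat=len(cassettes)):
        middle = "".join(exon for exon, keep in zip(cassettes, choice) if keep)
        isoforms.append(first_exon + middle + last_exon)
    return isoforms


print(cassette_isoforms("E1-", ["E2-", "E3-"], "E4"))
# ['E1-E2-E3-E4', 'E1-E2-E4', 'E1-E3-E4', 'E1-E4']
```

With only two optional exons the gene already yields four transcripts, which illustrates how alternative splicing expands protein diversity without adding genes.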
Techniques to Measure Gene Expression/Epigenetic Modifications
o At RNA level:
qPCR: Measures mRNA levels of specific genes; sensitive, specific, and
high-throughput.
RNA-seq: NGS-based, provides a comprehensive transcriptome profile,
detects alternative splicing, isoforms, and novel transcripts.
Northern Blot: Less common now, detects RNA size and abundance.
Microarrays: Hybridization-based method to compare expression
profiles; involves reverse transcription of RNA into cDNA, labeling, and
hybridization to a microarray chip.
o At Protein level:
Western Blot: Detects specific proteins with antibodies; semi-
quantitative.
ELISA (Enzyme-Linked Immunosorbent Assay): Quantitative analysis
of protein concentration.
Mass Spectrometry: High-resolution protein identification and Post-
Translational Modification (PTM) mapping.
Immunohistochemistry/Immunofluorescence: Spatial visualization of
protein expression in tissues.
o Epigenetic Modification Measurement:
Bisulfite Sequencing: Detects DNA methylation by converting
unmethylated cytosines to uracil.
ChIP-Seq (Chromatin Immunoprecipitation Sequencing): Measures
histone modifications and transcription factor binding sites.
ATAC-Seq/DNase-Seq: Measures chromatin accessibility.
MeDIP-Seq (Methylated DNA Immunoprecipitation Sequencing):
Captures methylated DNA regions.
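The bisulfite logic behind methylation sequencing can be simulated on a single strand. The sketch assumes the read is already aligned base-for-base to the reference (no indels) and that conversion is complete: unmethylated C becomes U and is read as T after PCR, while methylated C stays C. The reference sequence, methylated positions, and function names are hypothetical.

```python
def bisulfite_convert(seq: str, methylated_positions: set[int]) -> str:
    """Model bisulfite treatment plus PCR on one strand: unmethylated C -> U,
    read as T; methylated C is protected and still read as C."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference: str, bisulfite_read: str) -> dict[int, bool]:
    """For every C in the reference, report True (methylated) if the aligned
    bisulfite read still shows C, or False if it shows T."""
    return {
        i: bisulfite_read[i] == "C"
        for i, base in enumerate(reference)
        if base == "C"
    }


reference = "ACGTTCGACCGA"
read = bisulfite_convert(reference, methylated_positions={1, 9})
print(read)                               # ACGTTTGATCGA
print(call_methylation(reference, read))  # {1: True, 5: False, 8: False, 9: True}
```

Comparing the converted read against the reference is what turns a chemical difference (methylated vs. unmethylated C) into a sequence difference that can be read out.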
1. Question: Explain the concept of Topologically Associating Domains (TADs) and their
importance in gene regulation. What role do boundary proteins play, and what happens if
TADs are disrupted?
o Answer: Topologically Associating Domains (TADs) are 3D chromatin
domains within the nucleus that bring enhancers close to their target promoters,
facilitating gene regulation. They are crucial for ensuring proper gene expression
patterns, are typically conserved across species, and are delineated by boundary
proteins (or boundary elements/insulators). These boundary elements, often bound
by the CTCF transcription factor, act as borders that prevent the spread of
heterochromatin and block enhancers from activating unintended
genes. Disruption of TADs can lead to abnormal gene regulation, such as ectopic
gene expression, and is linked to rare diseases.
2. Question: Describe the major epigenetic mechanisms that regulate gene expression. How
do DNA methylation and histone modifications work, and what is their impact on gene
activity?
o Answer: The major epigenetic mechanisms are DNA methylation, histone
modification, imprinting, and X-inactivation.
DNA methylation involves adding methyl groups to CpG islands,
typically in promoter regions, by DNA methyltransferases. This can
silence genes by blocking transcription factors from binding or by
recruiting repressor proteins. High methylation in promoters correlates
with gene silencing and is heritable through mitosis.
Histone modifications are chemical changes to histone tails, such as
acetylation, methylation, and phosphorylation. These modifications are
interpreted as a "histone code" that defines the chromatin state. For
example, acetylation generally opens chromatin and activates gene
expression, while methylation can either activate or repress gene
expression depending on the specific amino acid and degree of
modification. These modifications influence DNA accessibility for
transcription and are also heritable through mitosis.
3. Question: Discuss the different levels at which gene expression is regulated,
differentiating between transcriptional and post-transcriptional control. Provide examples
of specific mechanisms at each level.
o Answer: Gene expression is regulated at multiple levels to control when, where,
and how much a gene is expressed.
Transcriptional Control (in the nucleus) determines if and how much
mRNA is made from a gene. Key mechanisms include:
Chromatin remodeling, where the open (euchromatin) or
condensed (heterochromatin) state of DNA influences accessibility
for RNA polymerase, often through histone modifications like
acetylation (activation) or methylation (activation/repression).
Promoter accessibility, as transcription factors bind to core and
proximal promoter elements to initiate or block transcription.
The activity of various RNA polymerases (Pol I, II, III) dictates
which types of RNA are transcribed.
Post-Transcriptional Control regulates gene expression after mRNA has
been transcribed (pre-mRNA processing takes place in the nucleus; most
later steps take place in the cytoplasm). Mechanisms include:
Pre-mRNA processing, such as RNA splicing (removal of introns
and joining of exons to form mature mRNA) and alternative
splicing (producing different protein isoforms from a single gene).
5' capping and poly-A tail addition also regulate stability and
translation.
mRNA stability, controlled by how long an mRNA molecule
persists before being degraded by exonucleases.
Translation control, where mechanisms like microRNAs
(miRNAs) bind to the 3' UTR of mRNAs to either inhibit
translation initiation or cause mRNA degradation, providing a fast
cellular response.
Protein processing (e.g., misfolded protein degradation, covalent
modifications like phosphorylation, or proteolytic cleavage)
ensures proteins are functional.
Protein transport directs proteins to their correct cellular
destinations.
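The miRNA mechanism under translation control can be illustrated with the widely used "seed" rule: a canonical target site in the 3' UTR is complementary to miRNA nucleotides 2-8. The sketch below assumes perfect 7-nt seed pairing only and ignores non-canonical sites, site context, and other targeting rules. The let-7a sequence is used as an example; the 3' UTR is an invented toy sequence.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna: str) -> str:
    """mRNA site (5'->3') complementary to miRNA nucleotides 2-8 (the seed)."""
    seed = mirna.upper().replace("T", "U")[1:8]  # 1-based positions 2..8
    return seed.translate(COMPLEMENT)[::-1]      # reverse complement


def find_seed_matches(mirna: str, utr3: str) -> list[int]:
    """0-based start positions of perfect 7-nt seed matches in a 3' UTR."""
    site = seed_site(mirna)
    utr = utr3.upper().replace("T", "U")
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


LET7A = "UGAGGUAGUAGGUUGUAUAGUU"
print(seed_site(LET7A))                             # CUACCUC
print(find_seed_matches(LET7A, "AAACUACCUCAAA"))    # [3]
```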
4. Genetic Variation
This section delves into the origins and types of genetic variations, their consequences,
mechanisms of repair, and how these variations are studied using various molecular techniques.
It also touches on comparative genomics and ancient DNA.
Key Concepts:
Genetic Variation
o Origin:
Replication errors: DNA polymerase mispairs bases (can be fixed by
DNA mismatch repair).
Replication slippage: Occurs typically in Short Tandem Repeats (STRs),
leading to insertions or deletions (indels).
Chromosome segregation/recombination errors: Can cause inversions,
deletions, and translocations, or aneuploidy (e.g., non-disjunction leading
to Down, Turner, Klinefelter syndromes).
Endogenous DNA damage: Spontaneous loss of bases, Reactive Oxygen
Species (ROS), deamination (C→U; 5-methylcytosine→T).
External mutagens: Ionizing radiation (breaks DNA), UV radiation
(forms thymine dimers), hydrocarbons from smoke/pollution.
o Types of Mutations:
1. Substitutions (SNVs - Single Nucleotide Variants): Change a single
nucleotide.
Transition: Purine to purine (A↔G) or pyrimidine to pyrimidine
(C↔T).
Transversion: Purine to pyrimidine or vice versa.
2. Indels: Insertions or deletions of nucleotides, often causing frameshifts.
3. Structural Variants (SVs): Large-scale changes, including Copy Number
Variants (CNVs: duplications or deletions of DNA segments) as well as
inversions and translocations.
4. Tandem repeat expansions: Increase in the number of short repetitive
DNA sequences, causing diseases like Huntington's and myotonic
dystrophy (often due to RNA toxicity).
Balanced mutations: No net gain or loss of DNA, but can still disrupt
genes (e.g., inversions, reciprocal translocations).
Unbalanced mutations: Change in copy number, typically cause disease
(e.g., deletions, duplications, aneuploidy).
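The transition/transversion distinction for SNVs depends only on whether the two bases belong to the same chemical class. A minimal sketch (the function name is illustrative):

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snv(ref: str, alt: str) -> str:
    """Label a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError("expected two different bases from A, C, G, T")
    # Transition: both bases are purines, or both are pyrimidines.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


print(classify_snv("A", "G"))  # transition
print(classify_snv("A", "C"))  # transversion
```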
Consequences of Genetic Variation
o Loss of Function:
Nonsense mutation: Introduces a premature stop codon, which can trigger
nonsense-mediated mRNA decay if it occurs early in the coding region
(e.g., in ASS1 leading to citrullinemia).
Missense mutation: Changes a single amino acid.
Tolerated: Chemically similar substitution, mild or no effect.
Not tolerated: Major disruption in function, high risk (e.g., in beta
thalassemia).
Synonymous (silent) mutation: Changes a codon without changing the
amino acid; low risk.
Splice site mutation: Disrupts intron/exon boundaries (5’ splice donor or
3’ splice acceptor). Leads to exon skipping, new exons, or enlarged exons.
Frameshift mutations: Shift the reading frame due to indels, usually
leading to an early stop codon and truncated or non-functional proteins.
o Gain of Function:
Structural rearrangements: Can lead to chimeric genes (combinations of
two or more distinct genes) or ectopic expression (gene expressed where
it's not normally). Example: Enhancers placed next to oncogenes.
RNA toxicity: Usually from unstable repeat expansions where abnormally
long RNA transcripts form stable secondary structures (hairpins) that trap
RNA-binding proteins, disrupting RNA processing (e.g., myotonic
dystrophy).
Gene duplications: Lead to more transcripts (e.g., IGF2 duplication).
Missense mutations: Can change protein functions to a detrimental gain
(e.g., dominant oncogenes).
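The coding consequences listed above (synonymous, missense, nonsense, and frameshift) can be derived mechanically from the standard genetic code. The sketch below assumes an in-frame coding sequence starting at its first codon and uses the standard codon table; the sequences are invented toy examples, and real annotation tools handle splicing, transcripts, and strand as well.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ("*" = stop).
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def snv_consequence(cds: str, pos: int, alt: str) -> str:
    """Classify a substitution at 0-based position pos of an in-frame CDS."""
    cds = cds.upper()
    start = pos - pos % 3
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:pos % 3] + alt.upper() + ref_codon[pos % 3 + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"      # premature stop codon
    if ref_aa == "*":
        return "stop-loss"
    return "missense"


def translate(cds: str) -> str:
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(snv_consequence("ATGGAATGG", 3, "T"))  # nonsense (GAA -> TAA)
# A 1 bp insertion shifts the frame: ATG|TAA... stops after Met.
print(translate("ATGAAATGGTTTCCC"), translate("ATGTAAATGGTTTCCC"))  # MKWFP M
```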
Repair Mechanisms
o Base-excision repair: Fixes modified bases (e.g., uracil produced by cytosine
deamination).
o Nucleotide-excision repair: Removes bulky lesions (e.g., UV-induced thymine
dimers).
o Mismatch repair: Fixes replication errors.
o Nonsense-mediated decay: Degrades transcripts with premature stop codons (an
mRNA surveillance pathway rather than a DNA repair mechanism).
Population Genetic Variation
o Negative (Purifying) Selection: Removes harmful alleles from a population,
leading to conserved regions. In bottlenecked or inbred populations, long Runs
of Homozygosity (ROH), stretches where both chromosome copies are identical,
expose recessive harmful alleles to selection so they can be removed. This
process is called "purging".
o Positive (Adaptive) Selection: Favors beneficial mutations, leading to "selective
sweeps" where advantageous alleles become more common and heterozygosity is
reduced in selected regions. Example: Olfactory genes in dogs.
Chromosomal Abnormalities & Structural Variations
o Chromosome Types and Structure:
Euploidy: Complete set of chromosomes (e.g., 46 in humans).
Aneuploidy: Missing or extra individual chromosomes (e.g., monosomy,
trisomy, nullisomy). Examples: Trisomy 21 (Down syndrome), Trisomy
13 (Patau syndrome), Trisomy 18 (Edwards syndrome), Monosomy X
(Turner syndrome).
Structural abnormalities: Can be balanced (e.g., inversions, reciprocal
translocations, no gain/loss of genetic material) or unbalanced (e.g.,
deletions, duplications, Robertsonian translocations, gain/loss of material).
A/B chromatin compartments: Active (A) vs. repressed (B) chromatin
regions.
CTCF boundaries: DNA insulators critical for organizing TADs.
o Chromosomal Translocation:
Reciprocal: Swap between two non-homologous chromosomes.
Robertsonian: Fusion of two acrocentric chromosomes at or near their
centromeres, joining the long arms and losing the short arms (can cause
trisomies).
o TADs (Topologically Associating Domains): 3D genome units that restrict
enhancer-promoter interactions. Disruption can cause ectopic gene expression,
cancer, and developmental disorders. Studied with Hi-C sequencing.
Techniques to Study Chromosomal Arrangements
o Karyotyping: Visualizes the full set of chromosomes in a cell, usually after
staining (e.g., G-banding); useful for detecting large-scale changes, including
balanced abnormalities.
o Chromosome painting: Uses fluorescently labeled DNA probes that bind to
specific chromosomes, detecting translocations or fusions.
o FISH (Fluorescence in situ Hybridization): Uses DNA probes with fluorescent
dyes to bind to DNA sequences on chromosomes, detecting translocations,
inversions, duplications, deletions, and CNVs.
o Comparative mapping: Aligns genes or markers across species to identify
conserved synteny (blocks of genes with the same order). Tools include genome
browsers, sequence alignments (e.g., BLAST), linkage maps, and sequencing
data.
o CGH arrays (Comparative Genomic Hybridization arrays): Detect copy
number differences smaller than 5 Mb, below the resolution of standard
karyotyping.
o Whole Genome Sequencing (WGS) and Hi-C sequencing are also used.
Comparative Genomics
o Definition: Study of genetic differences and similarities across species.
o Purpose: Identify conserved genes and regulatory elements, understand genome
evolution and function, and select optimal model organisms.
o Purifying/Negative Selection: Removes harmful mutations, preserving
functionally important sequences (coding and non-coding). Indicated by a Ka/Ks
ratio < 1 (Ka: nonsynonymous substitutions per nonsynonymous site; Ks:
synonymous substitutions per synonymous site).
o Positive/Adaptive Selection: Drives lineage-specific adaptation and the
development of unique traits suited to an environment (e.g., expanded olfactory
genes in dogs). Indicated by a Ka/Ks ratio > 1.
o G-value paradox: No direct link between gene number and organism complexity;
complexity primarily comes from regulation, not gene count (e.g., humans vs.
wheat).
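o Gene duplication: Generates paralogs (duplicated gene copies within a species);
orthologs, by contrast, are homologous genes in different species that diverged
through speciation. Duplication leads to new functions (neofunctionalization),
division of the ancestral functions between copies (subfunctionalization),
redundant backup copies, and increased complexity.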
o Genome evolution: Includes gene duplication, exon duplication/shuffling
(rearranging exons to produce new genes, increasing functional complexity),
expansion of noncoding regulatory regions (especially in vertebrates), and the G-
value paradox. Also involves lineage-specific changes (e.g., in immune response,
toxin degradation, sensory genes).
o Animal models: Comparative models aid cross-species annotations and disease
gene identification. Dogs (inbred breeds useful for Mendelian diseases) and pigs
(close to humans, useful in metabolic/obesity studies) are examples.
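The Ka/Ks interpretation above can be made concrete with a simplified calculation. The sketch assumes the substitution and site counts are already known and applies no correction for multiple substitutions at the same site, which real methods include; the counts in the example are invented.

```python
def ka_ks(nonsyn_subs: int, nonsyn_sites: float,
          syn_subs: int, syn_sites: float) -> float:
    """Simplified Ka/Ks: substitutions per site, with no multiple-hit correction."""
    ka = nonsyn_subs / nonsyn_sites
    ks = syn_subs / syn_sites
    if ks == 0:
        raise ValueError("Ks is zero; the ratio is undefined")
    return ka / ks


def selection_label(ratio: float) -> str:
    """Map a Ka/Ks ratio onto the type of selection it suggests."""
    if ratio < 1:
        return "purifying"
    if ratio > 1:
        return "positive"
    return "neutral"


# Invented counts: few amino-acid changes relative to silent changes.
r = ka_ks(nonsyn_subs=10, nonsyn_sites=300, syn_subs=20, syn_sites=100)
print(round(r, 3), selection_label(r))  # 0.167 purifying
```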
Techniques to Study DNA Variation
o DNA-Seq (WGS): Detects SNVs, indels, and structural variations; replacing
Sanger sequencing in diagnostics.
o RFLP (Restriction Fragment Length Polymorphism): Restriction enzymes cut
DNA at specific sequences, and variations change band patterns on gels; largely
replaced by SNP chips and sequencing.
o SNP chip: High-throughput genotyping of known SNPs, used for population
genetics, GWAS, and parentage testing.
o TaqMan Assay: Uses allele-specific probes with fluorescent dyes (e.g., FAM,
VIC) in qPCR-based genotyping, used in dog disease testing.
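The RFLP principle (an SNV that destroys a restriction site changes the fragment pattern on a gel) can be simulated in a few lines. The sketch assumes a linear sequence and EcoRI's site, cut between G and AATTC; only one strand is scanned, which is sufficient for palindromic sites like this one. The sequences are invented.

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Fragment lengths after cutting a linear sequence at every site.

    cut_offset is the top-strand cut position within the site
    (G^AATTC -> 1 for EcoRI).
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


allele_1 = "A" * 10 + "GAATTC" + "A" * 14  # intact EcoRI site
allele_2 = "A" * 10 + "GAGTTC" + "A" * 14  # SNV abolishes the site
print(digest(allele_1), digest(allele_2))  # [11, 19] [30]
```

A heterozygote would show all three bands (30, 19, and 11 bp) on the gel.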
Ancient DNA (aDNA)
o Sources: Good sources are cold environments (permafrost), dry places (deserts,
mummies), and stable, neutral pH conditions (caves). Bad sources are humid/hot
climates, acidic/basic soils, and fluctuating temperatures/humidity.
o Degradation & Damage: aDNA is prone to fragmentation and chemical changes
(e.g., cytosine deamination, whose uracil products can be removed with USER
enzyme).
o Contamination: From the environment, lab, humans, reagents, and PCR products;
contaminating nucleases (DNase and RNase) can also degrade samples.
o Good practice: Sterile collection, dedicated clean labs, no PCR products in
aDNA areas, use of DNase/RNase-free materials, and accepting when samples are
too degraded.
o Ancient proteins: More stable than DNA, can persist longer, and can be detected
with mass spectrometry; reveal metabolic and phylogenetic information (e.g.,
collagen in T. rex).
o Ancient RNA (aRNA): Historically thought too fragile, but recent studies have
recovered it from mammoths; it is preserved through fast desiccation, freezing,
or chemical treatment and provides insight into gene expression in ancient
tissues.
Ethics: Genetic variation studies raise ethical considerations in areas like dog breeding,
human genetic testing, and gene editing in humans and animals.
This section covers the principles of Mendelian genetics, genetic mapping techniques, and the
use of Whole Genome Sequencing (WGS) to identify causative mutations for single-gene
disorders.
Key Concepts:
Basic Concepts on Mendelian Genetics
o Heritability: The proportion of phenotypic variation in a population attributable
to genetic variation among individuals. It is population-specific and does not
imply inevitability at the individual level. Important in animal breeding for
choosing selection strategies for desirable traits.
o Mendelian patterns of inheritance: Autosomal dominant/recessive, X-linked
dominant/recessive, Y-linked, and mitochondrial inheritance (from mother).
o Mendel's Laws:
1. Law of Segregation: Each parent passes one of two alleles to offspring
randomly.
2. Law of Independent Assortment: Alleles for different genes segregate
independently during gamete formation.
3. Law of Dominance: Dominant alleles mask recessive ones in
heterozygotes.
o Forces that change allele frequency: Mutation, gene flow, genetic drift, natural
selection, and non-random mating.
o Hardy-Weinberg Equations: Used to describe allele and genotype frequencies
in a population at equilibrium (large, randomly mating, and free of mutation,
migration, and selection).
Allele frequency: p + q = 1.
Genotype frequency: p² + 2pq + q² = 1.
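The two Hardy-Weinberg equations translate directly into code. The sketch below counts allele frequencies from observed genotypes, computes expected genotype frequencies, and applies the common shortcut for a fully penetrant recessive disorder (q² = frequency of affected individuals). The genotype counts and disease frequency are invented examples.

```python
import math


def allele_freqs(n_AA: int, n_Aa: int, n_aa: int) -> tuple[float, float]:
    """Allele frequencies (p, q) counted from observed genotypes."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = (2 * n_aa + n_Aa) / (2 * n)
    return p, q


def hardy_weinberg(q: float) -> dict[str, float]:
    """Expected genotype frequencies p², 2pq, q² at equilibrium."""
    p = 1.0 - q
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


def q_from_affected(freq_affected: float) -> float:
    """For a fully penetrant recessive trait, q² equals the affected frequency."""
    return math.sqrt(freq_affected)


print(allele_freqs(36, 48, 16))   # (0.6, 0.4)
q = q_from_affected(1 / 10_000)   # recessive disease in 1 of 10,000
print(hardy_weinberg(q)["Aa"])    # carriers: about 0.0198, roughly 1 in 50
```

Comparing observed genotype counts with these expectations (e.g., with a chi-square test) is how departures from equilibrium are usually detected.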
Genetic Heterogeneity
o Different genes or mutations can produce the same phenotype.
o 1. Allele heterogeneity: Different mutations within the same gene lead to the
same phenotype (e.g., over 12 mutations in CFTR cause cystic fibrosis).
o 2. Locus heterogeneity: Mutations in different genes cause the same phenotype
(e.g., retinitis pigmentosa from mutations in over 16 genes).
o 3. Clinical heterogeneity: Different mutations within the same gene lead to
different phenotypes (e.g., different dystrophin mutations cause Duchenne or
Becker muscular dystrophy).
Haplotype: A combination of alleles at adjacent loci on a chromosome that are inherited
together. Important for tracking inheritance patterns and identifying disease loci.
Genetic Distance vs. Physical Distance in the Genome
o Genetic distance: Measured in centimorgans (cM), reflects recombination
frequency.
o Physical distance: Measured in base pairs (bp).
o Recombination is not evenly distributed across the genome (has hotspots) and
sex-specific recombination influences genetic maps.
Other Concepts
o Penetrance: The percentage of individuals with a given genotype who express its
associated phenotype.
o Codominance: Both alleles are fully expressed in the heterozygote.
o Incomplete dominance: Heterozygotes show an intermediate phenotype.
o Variable Expressivity: The degree or intensity of gene expression varies among
individuals with the same genotype.
o Epistasis: One gene masks or modifies the effect of another gene.
o Pleiotropy: One gene affects multiple seemingly unrelated traits.
o Recombination: Exchange of genetic material between homologous
chromosomes during meiosis, forming the basis for estimating distances between
markers on chromosomes via genetic maps.
o Linkage phase: The arrangement of alleles on parental chromosomes (coupling
phase: dominant alleles linked; repulsion phase: one dominant linked to one
recessive allele).
o Informative meiosis: Meiosis that allows determination of whether
recombination has occurred, crucial for linkage analysis.
Purpose of Genetic Mapping
o Locate genes responsible for traits (e.g., disease loci).
o Determine the relative positions of genes or markers on chromosomes.
o Understand recombination patterns and genome architecture.
o Enable marker-assisted selection in breeding programs.
Genetic Markers: Polymorphic DNA sequences used to trace inheritance.
o Types: SNPs (single base changes), Microsatellites/STRs (repetitive DNA, very
polymorphic), RFLPs (detected by restriction enzymes).
o Criteria: Must be polymorphic (variable in the population), stable, and easily
genotyped.
Genetic Maps/Fine Mapping
o Genetic maps: Show the relative positions of markers based on recombination
rates. A map unit of 1 cM (centimorgan) equals 1% recombination frequency.
o Fine mapping: High-resolution mapping to narrow down the location of disease-
causing mutations using dense marker panels and recombination events. Used in
techniques like GWAS and linkage analysis.
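The rule "1 cM = 1% recombination" holds well only for closely spaced markers, because double crossovers between distant markers go unseen. The sketch below estimates a recombination fraction from counted meioses and converts it either with that simple rule or with Haldane's map function, which is swapped in here as an assumption (it assumes no crossover interference); the counts are invented.

```python
import math


def recombination_fraction(recombinants: int, informative_meioses: int) -> float:
    """Fraction of informative meioses that were recombinant."""
    return recombinants / informative_meioses


def cm_simple(r: float) -> float:
    """1 cM = 1% recombination; reasonable only for small r."""
    return 100.0 * r


def cm_haldane(r: float) -> float:
    """Haldane map function: d (Morgans) = -0.5 * ln(1 - 2r), returned in cM."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5); r = 0.5 means unlinked")
    return -50.0 * math.log(1.0 - 2.0 * r)


r = recombination_fraction(8, 100)
print(cm_simple(r), round(cm_haldane(r), 2))  # 8.0 8.72
```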
Linkage Mapping
o Main Concepts: Identifies disease loci by examining the inheritance of traits with
nearby genetic markers. It relies on the principle that loci located close together
on the same chromosome are inherited together more often due to reduced
recombination.
Unlinked genes: On different chromosomes or far apart on the same
chromosome, show independent assortment, with a recombination
frequency of 50%.
Linked genes: Close together on the same chromosome, usually inherited
together, with a lower recombination frequency. The lower the
recombination frequency, the closer the loci.
o Informative meiosis/recombination: Crucial because it tells you whether
recombination has occurred between a marker and a disease allele.
o LOD score (Logarithm of Odds): Estimates the likelihood of linkage versus no
linkage.
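Formula: LOD(θ) = log₁₀ [L(θ) / L(θ = 0.5)], i.e., the likelihood of the
data if the loci are linked at recombination fraction θ, divided by the
likelihood if they are unlinked (θ = 0.5).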
LOD > 3: Significant evidence for linkage (1000:1 odds in favor).
LOD < -2: Evidence against linkage.
Values between are inconclusive.
The peak of a LOD score curve indicates the most probable recombination
frequency.
o Evaluating evidence for linkage in a pedigree: Involves counting informative
meiosis, identifying recombinant vs. parental gametes, visualizing segregation
patterns, and determining the phase (coupling or repulsion).
o Limitations: Requires large, informative families; less useful for complex traits;
needs clear phenotype classification; and has limited resolution, defining only
broad regions unless recombination is very frequent.
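For the simplest case, where every meiosis is informative and phase is known, each meiosis is scored as recombinant (R) or non-recombinant (NR), and the LOD score becomes log₁₀[θ^R (1 − θ)^NR / 0.5^(R+NR)]. The sketch below evaluates that formula and scans θ to find the peak of the LOD curve; real pedigree software also handles unknown phase and uninformative meioses, which this does not. The counts in the example are invented.

```python
import math


def lod(theta: float, r: int, nr: int) -> float:
    """LOD for r recombinant and nr non-recombinant phase-known meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must be between 0 and 0.5")
    if theta == 0.0 and r > 0:
        return -math.inf  # any recombinant rules out complete linkage
    log_linked = (r * math.log10(theta) if r else 0.0) + nr * math.log10(1 - theta)
    log_unlinked = (r + nr) * math.log10(0.5)
    return log_linked - log_unlinked


def max_lod(r: int, nr: int, steps: int = 50) -> tuple[float, float]:
    """Scan theta over 0..0.5 and return (peak LOD, theta at the peak)."""
    return max((lod(0.5 * i / steps, r, nr), 0.5 * i / steps)
               for i in range(steps + 1))


print(round(lod(0.0, 0, 10), 2))  # 10 meioses, no recombinants: 3.01
print(max_lod(1, 9))              # peak near theta = 0.1, LOD about 1.6
```

With ten fully informative meioses and no recombinants the LOD only just passes 3, which is why linkage mapping needs large, informative families.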
Identification of Mutations for Mendelian Traits by WGS
o Whole Genome Sequencing (WGS): Considered the gold standard for
identifying causative mutations in monogenic (Mendelian) traits.
o Analysis of NGS data: NGS technologies (e.g., Illumina HiSeq) generate billions
of short read sequences, resulting in datasets ranging from gigabytes to terabytes.
This high volume necessitates automated pipelines, high-performance computing,
and systematic data processing strategies.
o Overview of omics data analysis workflow (pipeline):
1. Quality control of raw reads: Trimming low-quality bases and removing
adapter sequences.
2. Alignment to reference genome: Mapping each read to its position in the
reference, producing SAM/BAM files of aligned reads.
3. Detect genetic variants: (e.g., SNPs, indels) in aligned reads compared
to the reference genome.
4. Annotate variants: With potential functional consequences and filter
out irrelevant variants to prioritize those likely to affect gene function or
cause diseases.
o Handling data in multi-node computer clusters: Analyses run in parallel across
the nodes (individual computers) of a cluster or cloud platform. This allows data
to be distributed across nodes, tools to run simultaneously, and storage to scale
easily.
o Translating technical data to biological insight: The ultimate goal is to identify
a single causative mutation or meaningful pattern.
PRA example: In the case of Progressive Retinal Atrophy (PRA) in dogs,
filtering eliminated millions of variants down to one, which was then
biologically validated to cause the disease, leading to the development of a
genetic test to prevent breeding affected dogs.
o WGS workflow:
Starts with genomic DNA.
Sonication to shear DNA into smaller fragments.
End repair to fix fragment ends to be blunt or compatible.
Ligate sequencing adapters.
PCR amplification.
Select appropriately sized DNA fragments.
Sequencing.
o Example: PRA in Old Danish Pointer:
Compared 5 cases vs. 5 controls, performed 30X sequencing with Illumina
HiSeq.
Bioinformatics pipeline (GATK, Ensembl VEP) identified variants.
Filtering prioritized variants that were homozygous in cases but not in
controls, and were located in known protein-coding regions.
Found a 1 bp insertion in exon 1 of a specific gene, causing a frameshift
that introduces a premature stop codon. This led to a genetic test for the
breed.
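The filtering logic in the PRA example (a recessive model: homozygous in every case, not homozygous in any control, inside a protein-coding region) can be sketched as below. The `Variant` class and the toy records are hypothetical simplifications; they do not parse real VCF output from GATK or Ensembl VEP, and genotypes are assumed to be written as "0/0", "0/1", or "1/1".

```python
from dataclasses import dataclass


@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    in_protein_coding: bool   # e.g., taken from a VEP-style annotation
    genotypes: dict[str, str]  # sample name -> "0/0", "0/1" or "1/1"


def recessive_candidates(variants, cases, controls):
    """Keep coding variants homozygous-alt in all cases and in no control."""
    keep = []
    for v in variants:
        if not v.in_protein_coding:
            continue
        all_cases_hom = all(v.genotypes.get(s) == "1/1" for s in cases)
        any_control_hom = any(v.genotypes.get(s) == "1/1" for s in controls)
        if all_cases_hom and not any_control_hom:
            keep.append(v)
    return keep


# Invented toy data: only the first variant fits the recessive model.
TOY_VARIANTS = [
    Variant("chr1", 100, "A", "AT", True,
            {"case1": "1/1", "case2": "1/1", "ctrl1": "0/1", "ctrl2": "0/0"}),
    Variant("chr2", 200, "G", "C", True,
            {"case1": "1/1", "case2": "0/1", "ctrl1": "0/0", "ctrl2": "0/0"}),
    Variant("chr3", 300, "T", "C", False,
            {"case1": "1/1", "case2": "1/1", "ctrl1": "0/0", "ctrl2": "0/0"}),
]
hits = recessive_candidates(TOY_VARIANTS, ["case1", "case2"], ["ctrl1", "ctrl2"])
print([(v.chrom, v.pos) for v in hits])  # [('chr1', 100)]
```

Allowing heterozygous controls (carriers) while rejecting homozygous ones is what makes this a recessive filter; a stricter filter would require the variant to be absent from all controls.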
1. Question: Define genetic heterogeneity and differentiate between its three types: allelic,
locus, and clinical heterogeneity. Provide an example for each.
o Answer: Genetic heterogeneity describes situations where different genes or
mutations can produce the same or similar phenotypes.
Allelic heterogeneity occurs when different mutations within the same
gene cause the same phenotype. For example, over 12 distinct mutations
in the CFTR gene can all lead to cystic fibrosis.
Locus heterogeneity describes cases where mutations in different
genes result in the same phenotype. An example is retinitis pigmentosa,
which can be caused by mutations in over 16 different genes.
Clinical heterogeneity refers to situations where different
mutations within the same gene lead to different phenotypes. For instance,
various mutations in the dystrophin gene can cause either Duchenne
muscular dystrophy (more severe) or Becker muscular dystrophy (milder).
2. Question: Explain the principles of linkage mapping and the significance of the LOD
score in establishing genetic linkage. What are the main limitations of this technique?
o Answer: Linkage mapping aims to identify disease loci by observing how a trait
(like a disease) co-segregates with known genetic markers within families. It
operates on the principle that genes or markers located physically close on the
same chromosome (linked genes) will be inherited together more frequently than
if they were unlinked, due to reduced recombination between them.
The recombination frequency between linked loci is less than 50%.
The LOD score (Logarithm of Odds) is a statistical measure used to
determine the likelihood of linkage. A LOD score of 3 or
greater (meaning the odds of linkage are 1000 times higher than the odds
of no linkage) is considered significant evidence for linkage, while a score
of -2 or less provides evidence against linkage.
Limitations of linkage mapping include the need for large, informative
families, its reduced utility for complex traits (which have multiple genetic
and environmental factors), the requirement for clear phenotype
classification, and its limited resolution, often only narrowing down broad
chromosomal regions.
3. Question: Outline the general workflow for identifying causative mutations in Mendelian
traits using Whole Genome Sequencing (WGS). How does bioinformatics play a critical
role in this process, using the PRA example?
o Answer: The WGS workflow for identifying causative mutations in Mendelian
traits typically starts with obtaining genomic DNA, which is then fragmented
(e.g., by sonication). The ends of these fragments are repaired, and sequencing
adapters are ligated. After PCR amplification and size selection, the DNA
fragments are sequenced.
Bioinformatics is critical for processing the massive amounts of data
generated by WGS. The workflow involves several steps: quality
control of raw reads (trimming low-quality bases and adapter
sequences), alignment of reads to a reference genome (creating
SAM/BAM files), variant detection (identifying SNPs and indels by
comparing aligned reads to the reference), and variant annotation and
filtering (assigning potential functional consequences and prioritizing
variants that are likely to affect gene function or cause disease).
In the PRA example, WGS was performed on affected dogs and controls.
Bioinformatics tools were used to align reads, detect variants, and then
filter these variants to find ones that were homozygous in affected dogs
but absent in controls, particularly prioritizing those in protein-coding
regions. This systematic filtering and annotation, powered by
bioinformatics, enabled the identification of a 1 bp insertion causing a
frameshift and a premature stop codon, which was then validated as the
causative mutation.
This process demonstrates how bioinformatics translates raw sequencing
data into biologically meaningful insights and ultimately can lead to
practical applications like genetic tests.
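The first bioinformatics step, quality control of raw reads, rests on Phred scores, where Q = −10·log₁₀(error probability). The sketch below decodes Phred+33 quality strings (the encoding used by current Illumina FASTQ files), trims low-quality bases from the 3' end, and clips an adapter. The threshold of Q20 and the adapter sequence are illustrative assumptions, and dedicated trimming tools use more elaborate strategies (e.g., sliding windows and mismatch-tolerant adapter matching).

```python
def phred33(qual_string: str) -> list[int]:
    """Decode a FASTQ quality string with Phred+33 encoding."""
    return [ord(c) - 33 for c in qual_string]


def error_probability(q: int) -> float:
    """Phred score Q -> probability that the base call is wrong."""
    return 10 ** (-q / 10)


def trim_3prime(seq: str, quals: list[int], min_q: int = 20) -> str:
    """Drop bases from the 3' end while their quality is below min_q."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end]


def clip_adapter(seq: str, adapter: str, min_overlap: int = 5) -> str:
    """Remove an exact adapter match, or a partial adapter at the 3' end."""
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx]
    for k in range(min(len(adapter), len(seq)) - 1, min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return seq


print(error_probability(30))                                # about 0.001
print(trim_3prime("ACGTAC", phred33("IIII+&")))              # ACGT
print(clip_adapter("ACGTACGTAGATCGGAAGAGC", "AGATCGGAAGAGC"))  # ACGTACGT
```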
This section focuses on the genetic basis of complex traits, which are influenced by multiple
genes and environmental factors. It covers concepts like Linkage Disequilibrium (LD), Genome-
Wide Association Studies (GWAS), and the challenges in mapping these traits.
Key Concepts:
7. Biomarkers
This section defines biomarkers, explores their various types and applications, and focuses on
microRNAs as promising diagnostic and therapeutic tools in personalized medicine.
Key Concepts:
1. Question: Define what a biomarker is and explain its various types, providing a specific
example for each.
o Answer: A biomarker is a measurable characteristic that indicates a normal
biological process, a pathogenic process, or a pharmacological response to
therapy.
Diagnostic biomarkers define the presence or type of disease
(e.g., PSA for prostate cancer).
Prognostic biomarkers predict the likely outcome or course of a disease
(e.g., HER2 status in breast cancer predicts aggressive tumor behavior).
Predictive biomarkers indicate the likelihood of a patient responding to a
specific treatment (e.g., EGFR mutations predicting response to certain
lung cancer drugs).
Mechanistic biomarkers offer insight into the molecular mechanisms
underlying a disease or drug action (e.g., measuring the activation of a
specific signaling pathway affected by a drug).
Safety biomarkers monitor responses to adverse or toxic drug effects
(e.g., creatinine levels to monitor kidney function during drug treatment).
2. Question: Discuss the role of biomarkers in addressing the challenges of
pharmacotherapy and advancing personalized medicine. What is a "companion
diagnostic," and what are its key characteristics?
o Answer: Biomarkers are crucial for addressing key challenges in
pharmacotherapy and driving personalized medicine, which aims to tailor
medical treatment to individual patient characteristics. Pharmacotherapy faces
issues like disease heterogeneity (same diagnosis, different molecular
mechanisms), the failure of "one-size-fits-all" blockbuster drugs, high failure rates
in drug development, and over/mistreatment due to a lack of precise guidance.
Biomarkers help overcome these by allowing patient stratification,
ensuring the right drug reaches the right patient. A companion diagnostic
(CDx) is a specific type of test developed concurrently with a therapeutic
drug. Its purpose is to identify patients who are most likely to benefit from
the drug, minimize adverse effects, and optimize dosing. Key
characteristics of a CDx: it can be expensive; the paired drug typically
shows minimal efficacy in biomarker-negative patients but high benefit in
the biomarker-positive (stratified) subgroup; and the test plays a critical
role in avoiding treatment failure.
3. Question: Describe the biogenesis of microRNAs (miRNAs) and explain why circulating
miRNAs are considered useful as non-invasive biomarkers.
o Answer: The biogenesis of miRNAs is a multi-step process:
It begins with transcription of miRNA genes by RNA Polymerase II into
a primary-miRNA (pri-miRNA).
In the nucleus, the pri-miRNA is processed by the Drosha enzyme into a
hairpin-shaped pre-miRNA.
The pre-miRNA is then exported from the nucleus to the cytoplasm
by Exportin-5.
In the cytoplasm, the Dicer enzyme cleaves the pre-miRNA into a mature
miRNA duplex (18-25 nt).
Finally, one strand of this duplex (the guide strand) is loaded into
the RNA-induced silencing complex (RISC), which then targets
complementary mRNA sequences for degradation or translational
repression.
Circulating miRNAs are found stably in various body fluids
like blood (serum/plasma), urine, and saliva. They are remarkably stable
extracellularly because they are protected within exosomes (small
vesicles) or bound to proteins. This stability and their presence in easily
accessible fluids make them highly useful as non-invasive
biomarkers for conditions like cancer, as exemplified by miRNA-21 in
breast cancer. They offer a less invasive alternative to tissue biopsies for
diagnosis, prognosis, and therapeutic monitoring.
8. Cancer Genetics and Genomics
This section explores cancer as a genetic disease, its evolutionary nature, the hallmarks of
cancer, the roles of oncogenes and tumor suppressors, cell plasticity, the tumor
microenvironment, and various techniques used to study cancer at a molecular level.
Key Concepts:
3. Evasion of apoptosis
This section explores the principles and techniques of genetic manipulation, including animal
models, transgenic animals, gene editing technologies like CRISPR-Cas9, and genetic
approaches to treat diseases, such as gene and cell therapies. It also addresses ethical
considerations and risk assessment.
Key Concepts:
Concepts/Principles of Genetic Manipulation
o Genetic manipulation: Involves the insertion, deletion, or alteration of genes in
organisms.
o Purposes: Model human diseases (e.g., Alzheimer's in mice), understand gene
roles during development, and produce therapeutic proteins (e.g., insulin).
o Gene Therapy: Aims to correct genetic disorders by modifying gene expression
in affected cells.
Different Animal Models (Pros/Cons)
o Flies (Drosophila) & C. elegans: Cheap, short life cycles, many offspring, good
for genetics. Cons: Evolutionarily distant from humans, poor physiological
relevance.
o Frogs & Zebrafish: Transparent embryos, good for early developmental studies.
Cons: Evolutionarily distant from humans.
o Mice: Mammalian, extensive genetic tools, short life cycles, many established
disease models. Cons: Differences in complex organs (e.g., brain), short lifespan.
o Dogs, pigs, non-human primates: Closer physiology to humans, similar
behavior, immunity, useful for neurological studies. Cons: Expensive, ethical
concerns, long generation times, not many offspring.
Transgenic Animals
o Definition: Animals genetically modified to carry foreign DNA integrated into
their genome and passed to progeny.
o How to Generate:
1. Insert into germ cells: Ensures all cells in the animal have the genetic
modification, which is then passed on to the next generation.
Pronuclear injection: DNA injected directly into the haploid
pronucleus of a fertilized oocyte before nuclear fusion of sperm
and egg.
Gene transfer into early embryos or gametes: DNA introduced
at gamete or early embryonic stages (e.g., sperm nuclei carrying
integrated DNA injected into an oocyte, trackable with green
fluorescence).
Somatic cell nuclear transfer (SCNT): Nucleus from a modified
adult cell is inserted into an enucleated egg (e.g., Dolly the sheep).
2. Gene targeting (precise modification): Uses a plasmid as a template,
where the plasmid and target gene have complementary homology arms,
allowing for specific gene insertion or mutation via homologous
recombination. Often involves backcrossing (wild type x heterozygous
GM) to ensure only the targeted gene is genetically different.
Techniques (nuclease-based gene editing):
Zinc Finger Nucleases (ZFNs): Engineered proteins in which
each zinc finger recognizes a 3 bp DNA sequence, so multiple
"fingers" are combined to recognize a longer target. They
have a DNA binding domain and a FokI nuclease domain. Two
ZFNs bind to sequences next to each other around the target
site, their FokI domains dimerize, activating nuclease
activity to cut DNA (double-strand break - DSB). Repair
occurs via NHEJ (knockout via small indels) or HDR (precise
repair with DNA template).
TALENs (Transcription Activator-Like Effector
Nucleases): Similar to ZFNs, also use DNA binding
domains fused to a FokI nuclease domain. Two TALENs
bind to adjacent sequences, their FokI domains dimerize,
and activate the nuclease to create a DSB at the spacer
region between their binding sites. Repair via NHEJ
(knockout) or HDR (precise repair/knockin). Used in CAR
T cells.
CRISPR-Cas9: Uses a Cas9 nuclease guided by an RNA
molecule (guide RNA) to a specific DNA sequence, which
must be next to a PAM (Protospacer Adjacent Motif)
sequence. Cas9 cuts DNA, creating a DSB. Repair via
NHEJ (knockout via indels) or HDR (precise repair if a
DNA template is provided). Can be used for selective
reproduction (introducing/removing traits before birth).
NHEJ (Non-Homologous End Joining): An error-prone
repair pathway that often leads to small insertions or
deletions (indels), resulting in a gene knockout.
HDR (Homology-Directed Repair): A precise repair
pathway that requires a donor DNA template with
homology arms to the target site. Used
for knockin (inserting a gene or correcting a mutation).
Conditional gene editing (Cre-lox recombination): Used to
control where (tissue-specific) and when (time-specific) a gene is
modified. Useful for studying gene function without unwanted
developmental effects. Cre recombinase is an enzyme that
recognizes and recombines DNA at loxP sites (short DNA sequences that
flank the target gene). Cre is introduced under tissue-specific or inducible
promoters. When expressed, Cre excises the DNA between loxP sites
oriented in the same direction (or inverts it if the sites face opposite
directions); excising the flanked gene produces a conditional knockout.
3. Random mutagenesis: Involves inserting DNA randomly into the
genome. Less control and may unintentionally disrupt endogenous genes.
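The CRISPR-Cas9 targeting rule described above (a 20-nt protospacer immediately followed by a PAM) can be turned into a simple guide-site finder. The sketch assumes SpCas9 with its NGG PAM, scans only the given strand (a full search would also scan the reverse complement), and reports the blunt cut about 3 bp upstream of the PAM. It ignores off-target scoring entirely, and the example sequence is invented.

```python
import re


def find_spcas9_targets(seq: str, guide_len: int = 20):
    """Return (start, protospacer, PAM, cut_index) for every NGG-adjacent site."""
    seq = seq.upper()
    # Lookahead so overlapping candidate sites are all reported.
    pattern = r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len
    hits = []
    for m in re.finditer(pattern, seq):
        protospacer, pam = m.group(1), m.group(2)
        cut = m.start() + guide_len - 3  # DSB falls 3 bp upstream of the PAM
        hits.append((m.start(), protospacer, pam, cut))
    return hits


print(find_spcas9_targets("A" * 20 + "TGG"))
# [(0, 'AAAAAAAAAAAAAAAAAAAA', 'TGG', 17)]
```

Repair of the resulting break by NHEJ or HDR then determines whether the edit is a knockout or a precise knockin.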
o Applications of Transgenic Animals: Disease modeling (e.g., Alzheimer's,
Huntington's), functional genomics (studying gene function and regulatory
regions), pharmaceutical production (e.g., monoclonal antibodies in CHO cells),
xenotransplantation, and vaccine development.
o Knockin: A gene or mutation is inserted at a specific locus, leading to a gain
of function. Achieved via HDR. Can involve inserting a gene or reporter (e.g.,
LacZ) to track expression under the control of a known promoter.
o Knockdown: Partial reduction in gene expression, usually achieved via RNA
interference (RNAi) or short hairpin RNA (shRNA).
o Applications: Disease correction, gene silencing in cell lines, and modifications
in live animals.
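The difference between a knockout and a knockin can be sketched at the sequence level. In this hypothetical Python toy, an NHEJ-style indel is applied at a cut site and the first in-frame stop codon is located: an indel whose length is not a multiple of 3 shifts the reading frame (here creating a premature stop), while a 3-bp deletion keeps the frame. build_hdr_donor assembles a donor as left homology arm + insert + right homology arm. The toy coding sequence, the use of "A" for inserted bases, and the short default arm length are assumptions; real NHEJ outcomes are random and real homology arms are usually much longer.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def nhej_repair(cds: str, cut: int, indel: int) -> str:
    """Toy NHEJ outcome at `cut`: insert `indel` bases (as 'A') if positive,
    delete |indel| bases just downstream of the cut if negative."""
    if indel >= 0:
        return cds[:cut] + "A" * indel + cds[cut:]
    return cds[:cut] + cds[cut - indel:]  # cut - indel == cut + |indel|


def is_frameshift(indel: int) -> bool:
    """Indels whose length is not a multiple of 3 shift the reading frame."""
    return indel % 3 != 0


def first_inframe_stop(cds: str):
    """Codon number (0-based) of the first in-frame stop codon, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def build_hdr_donor(locus: str, cut: int, insert: str, arm_len: int = 20) -> str:
    """Donor template = left homology arm + insert + right homology arm.

    The arms copy the sequence on either side of the cut, so HDR can use
    the donor to place `insert` exactly at the cut site."""
    if cut < arm_len or cut + arm_len > len(locus):
        raise ValueError("cut site too close to the ends for this arm length")
    return locus[cut - arm_len:cut] + insert + locus[cut:cut + arm_len]


# Toy coding sequence: ATG GCT GAT AAA GGC TTC TAA (stop at codon 6)
cds = "ATGGCTGATAAAGGCTTCTAA"
print(first_inframe_stop(cds))                                         # 6
print(is_frameshift(1), first_inframe_stop(nhej_repair(cds, 6, 1)))    # True 3
print(is_frameshift(-3), first_inframe_stop(nhej_repair(cds, 6, -3)))  # False 5
```

The +1 insertion moves the stop codon from position 6 to position 3, truncating the protein (a likely knockout), whereas the in-frame deletion only removes one codon.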
Genetic Approaches to Treat Disease
o Production of reagents:
1. Therapeutic proteins: Produced using genetically engineered
organisms or cell lines (e.g., bacteria or yeast for insulin, CHO cells for
monoclonal antibodies). Requires precise control of expression,
purification, and post-translational modifications (PTMs).
2. Vaccines: Engineered viral vectors (e.g., the Ebola vaccine uses a
genetically modified vesicular stomatitis virus, VSV), mRNA vaccines
(synthetic mRNA encoding antigens such as the spike protein for
COVID-19), and DNA vaccines (plasmids delivered into host cells to
express antigens).
3. Gene therapy products (e.g., Casgevy for sickle cell disease).
o Cell therapy: Uses cells as therapeutic agents.
Applications: iPSC-derived neurons for neurodegenerative diseases,
hematopoietic stem cell transplants, and regenerative repair (e.g., spinal
injury, retinal degeneration).
Limitations/Risks: Tumor risk (especially with pluripotent stem cells like
iPSCs), immune rejection in allogeneic transplants, poor control over
differentiation (wrong tissue formation), ethical issues with embryonic
stem cells, and difficulty in effective delivery/targeting of damaged
tissues.
o CAR T-cells (Chimeric Antigen Receptor T-cells): A type of cell therapy for
cancer.
Process: T cells are extracted from patients, genetically modified with a
viral vector to express an artificial CAR19 receptor (which targets CD19-
positive cancer cells). A second gene can be modified to prevent donor T
cells from being destroyed or rejected by the host immune system. These
modified T cells are then infused back into patients to destroy CD19-
positive malignant B cells.
If donor T cells are used (allogeneic transplant), graft-versus-host disease
(GvHD) is a risk. TALENs can be used to genetically modify donor T cells to
disrupt T cell receptor expression (so they cannot recognize host MHC
molecules and attack host tissue) and to knock out CD52 (making them
resistant to alemtuzumab, an anti-CD52 antibody used to deplete the
patient's own lymphocytes during treatment).
Stem Cells
o Use: Regenerative medicine (e.g., spinal injury, macular degeneration), disease
modeling of cell types within a specific lineage (e.g., adult stem cells in bone
marrow).
o How to Make:
1. Embryonic Stem Cells (ESCs): Derived from the inner cell mass of a
blastocyst. They are pluripotent and stable (not prone to mutations).
2. Adult Stem Cells: Found in adult tissues (e.g., hematopoietic stem cells
in bone marrow). They are multipotent and can be used to make iPSCs.
3. Induced Pluripotent Stem Cells (iPSCs): Created by reprogramming
somatic cells (e.g., skin fibroblasts) into a pluripotent state using specific
transcription factors (Oct4, Sox2, Klf4, c-Myc). Techniques include
lentiviral transduction and episomal vectors.
Pros: Patient-specific (lower risk of immune rejection), fewer
ethical issues (no embryos needed).
Cons: Tumor risk, low reprogramming efficiency, possible
epigenetic memory of the source cell, may not fully match the
quality of ESCs, and prone to mutations.
Concepts Learnt from Risk Assessment Lecture
o Environmental risk assessment and human health risk assessment are central.
o GMO regulation varies worldwide (EU is strictest).
o GMOs in the EU are regulated under two directives: one for deliberate release
into the environment and one for contained use.
o Environmental Risk Assessment (ERA): Evaluates risks to human health and
the environment, identifying, characterizing, and managing potential risks before
GMOs are used in clinical trials, agriculture, or industry.
o Genome editing technologies (e.g., CRISPR) are currently regulated as GMOs,
but this is evolving.
o Considerations in GMO medicines (gene therapies, cell therapies, vaccines):
Need extensive preclinical trials in model organisms before approval.
Risks:
Shedding: Genetically modified material released into the
environment (e.g., from dead embryos), potentially infecting
unintended individuals.
Replication competence: Ability of a viral vector to replicate
uncontrollably after introduction, spreading within the patient or
environment if the vector has not been made replication-defective.
Insertional mutagenesis: Gene inserted in the wrong place,
potentially leading to cancer.
Genetic stability of the construct: Whether the transgene
(inserted gene) remains intact and unchanged over time in host
cells; instability could lead to ineffectiveness or harm (e.g.,
producing toxic proteins).
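Recombination of the construct: Inserted genetic material could
recombine with host DNA or other viral sequences in unintended
ways, potentially generating new, harmful viruses or hybrid genes.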
CRISPR-edited chickens: Engineered so male embryos die early in
development, while females survive and are non-transgenic (transgene not
heritable). Regulations must check for horizontal gene transfer or
incomplete lethality of males, ensure transgenic animals do not enter the
environment, and assess human exposure to transgenic embryos or eggs
with GMO remnants.
o Key Distinction:
Risk assessment: Science-based process to evaluate hazards and
likelihood (e.g., of GMOs on human health, environment).
Risk management: Policy-driven process that uses scientific input to
make regulatory decisions.
Risk communication: Interactive exchange of information and opinions
concerning risks.
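The assessment/management split above can be illustrated with a small qualitative risk estimate. In ERA, risk is commonly characterized by combining how severe a hazard's consequences would be with how likely it is to occur. The Python sketch below is a hypothetical illustration: the four-level scale, the multiply-then-bin rule, the "overall = highest individual risk" choice and the hazard names and scores are all assumptions, not taken from any official guidance or real assessment.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical four-point qualitative scale."""
    NEGLIGIBLE = 0
    LOW = 1
    MODERATE = 2
    HIGH = 3


def estimate_risk(consequence: Level, likelihood: Level) -> Level:
    """Combine magnitude of consequences with likelihood (assumed rule).

    Multiplying the two scores means a negligible value on either axis
    gives negligible risk; the cut-offs below are arbitrary choices."""
    score = consequence * likelihood  # integer from 0 to 9
    if score == 0:
        return Level.NEGLIGIBLE
    if score <= 2:
        return Level.LOW
    if score <= 4:
        return Level.MODERATE
    return Level.HIGH


def characterize(hazards: dict) -> dict:
    """Risk assessment step only: estimate a level for each identified hazard.
    Deciding what to do with these levels is risk management."""
    return {name: estimate_risk(cons, like) for name, (cons, like) in hazards.items()}


# Placeholder hazards and scores, not real assessment data
example = {
    "hazard_A": (Level.HIGH, Level.NEGLIGIBLE),
    "hazard_B": (Level.MODERATE, Level.LOW),
    "hazard_C": (Level.HIGH, Level.MODERATE),
}
result = characterize(example)
overall = max(result.values())  # assumed: overall risk = highest individual risk
print({name: level.name for name, level in result.items()}, overall.name)
```

The code stops at producing risk levels; whether a HIGH result blocks approval or triggers extra containment is a policy decision, which is exactly the risk management step.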
Organisms that can be Genetically Modified (and their purpose):
o Bacteria (E. coli, Bacillus subtilis): Production of insulin, enzymes, vitamins.
o Plants (Maize, soybean, tomato, potato): Pest resistance, herbicide tolerance,
improved nutrition.
o Animals (Mice, zebrafish, dogs, pigs, chickens): Disease models, drug testing,
organ donor models.
o Fish (salmon, e.g., AquaAdvantage salmon): Faster growth via modified hormone
regulation.
o Human cells (CAR T cells, gene therapies): Treat genetic diseases, cancer
immunotherapy.
o Purpose: Research, therapeutics, agriculture, industrial biotechnology.
1. Question: Describe the main strategies for generating transgenic animals. How do these
differ, and what are their respective advantages and disadvantages?
o Answer: There are two main strategies for generating transgenic animals:
1. Insertion into germ cells (random insertion): This involves directly
injecting foreign DNA into fertilized oocytes (pronuclear injection) or
transferring DNA into early embryos or gametes. This method results in
the random integration of the foreign DNA into the host genome, meaning
all cells of the animal will carry the modification, and it will be passed to
progeny. An advantage is its relative simplicity for initial transgenesis. A
disadvantage is the lack of control over where the DNA inserts, which can
lead to insertional mutagenesis (disrupting an endogenous gene) or
variable expression levels.
2. Gene targeting (precise modification): This strategy uses homologous
recombination to insert or alter genes at a specific, predetermined locus.
Techniques like ZFNs, TALENs, and CRISPR-Cas9 create double-strand
breaks (DSBs) at precise genomic locations, which are then repaired via
either NHEJ (error-prone, leading to knockout by small indels)
or HDR (precise repair allowing knockin of new sequences or gene
correction if a donor template is provided). The advantage here is precise
control over the genetic modification, allowing for specific gene
knockouts, knockins, or even conditional gene editing (Cre-lox
recombination) that enables tissue- and time-specific modifications. The
disadvantage is that these methods are more complex and require careful
design and validation.
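The insertional-mutagenesis drawback of random insertion can be put in rough numbers with a toy model. Assuming, unrealistically, that integration is uniform across the genome, the chance that a single insertion lands inside a gene equals the fraction of the genome covered by genes. The hypothetical Python sketch below computes that fraction directly and checks it with a seeded Monte Carlo simulation. The genome size and gene coordinates are invented, and real transgenes and viral vectors do not integrate uniformly.

```python
import random


def exact_hit_fraction(genome_len: int, genes: list) -> float:
    """Fraction of positions inside a gene (half-open, non-overlapping intervals)."""
    return sum(end - start for start, end in genes) / genome_len


def simulated_hit_fraction(genome_len: int, genes: list, n: int, seed: int = 0) -> float:
    """Monte Carlo: place n insertions uniformly at random and count gene hits."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        pos = rng.randrange(genome_len)
        if any(start <= pos < end for start, end in genes):
            hits += 1
    return hits / n


# Hypothetical toy genome of 1 Mb with three "genes" covering 30% of it
GENOME_LEN = 1_000_000
GENES = [(10_000, 60_000), (200_000, 350_000), (700_000, 800_000)]
print(exact_hit_fraction(GENOME_LEN, GENES))
print(simulated_hit_fraction(GENOME_LEN, GENES, n=20_000))
```

Even in this idealized case, roughly one random insertion in three would land inside a gene, which is why gene targeting's control over the insertion site matters.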
2. Question: Compare and contrast gene knockout, knockin, and knockdown techniques in
terms of their molecular outcomes and applications.
o Answer: These techniques manipulate gene expression levels:
Gene Knockout: Involves the complete inactivation of a gene, leading to
a total loss of function. This is typically achieved by disrupting the gene,
often through error-prone Non-Homologous End Joining (NHEJ) after a
double-strand break. Applications include studying gene function by
observing the phenotypic consequences of its absence and modeling
human diseases caused by gene inactivation.
Gene Knockin: Involves the precise insertion of a new gene, a modified
version of an existing gene, or a specific mutation at a particular locus.
This is achieved through Homology-Directed Repair (HDR), which
requires a donor DNA template. Knockin can result in a gain of
function (e.g., introducing a hyperactive gene) or correction of a mutation.
Applications include modeling human diseases caused by specific
mutations, tracking gene expression in vivo by inserting reporter genes,
and developing gene therapies to correct genetic defects.
Gene Knockdown: Involves a partial reduction in gene expression,
rather than complete inactivation. This is commonly achieved using RNA
interference (RNAi) or short hairpin RNA (shRNA) molecules that target
and reduce the amount of specific mRNA. Applications include studying
the effects of reduced gene dosage, which might more closely mimic
certain disease states or allow for dose-dependent studies, and gene
silencing in cell lines for research.
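Because knockdown is a partial reduction of mRNA, its extent is typically measured by RT-qPCR. A minimal sketch, assuming the comparative 2^(-ΔΔCq) method: amplification efficiency near 100% (product doubles each cycle) and a stably expressed reference gene. The function names and Cq values below are hypothetical.

```python
def relative_expression(cq_target_kd: float, cq_ref_kd: float,
                        cq_target_ctrl: float, cq_ref_ctrl: float) -> float:
    """Target mRNA in knockdown cells relative to control, via 2^(-ddCq)."""
    dcq_kd = cq_target_kd - cq_ref_kd        # normalize to the reference gene
    dcq_ctrl = cq_target_ctrl - cq_ref_ctrl
    ddcq = dcq_kd - dcq_ctrl                 # knockdown compared with control
    return 2 ** (-ddcq)                      # assumes doubling every cycle


def knockdown_percent(rel_expr: float) -> float:
    """Percentage reduction in target mRNA."""
    return (1 - rel_expr) * 100


# Hypothetical Cq values: target appears 2 cycles later after shRNA treatment
rel = relative_expression(26.0, 18.0, 24.0, 18.0)
print(rel, knockdown_percent(rel))  # 0.25 75.0
```

A residual expression of 0.25 (75% knockdown) is a partial reduction, which is what distinguishes a knockdown from a complete knockout.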
3. Question: Discuss the ethical and safety considerations involved in the genetic
manipulation of mammalian cells for therapeutic purposes (e.g., gene therapies, cell
therapies). What are some key risks that need to be assessed during preclinical trials?
o Answer: Genetic manipulation of mammalian cells for therapeutics, such as gene
therapies and cell therapies, involves significant ethical and safety considerations.
Ethical concerns particularly arise with the use of embryonic stem cells
(ESCs) due to their origin, though induced pluripotent stem cells (iPSCs)
offer an alternative with fewer ethical concerns as they are derived from
adult somatic cells. Other ethical debates involve gene editing in humans
and animals.
Safety considerations are paramount and necessitate extensive preclinical
trials in model organisms before approval. Key risks that must be assessed
include:
Shedding: The potential for genetically modified material (e.g.,
viral vectors) to be released from the patient into the environment,
potentially infecting unintended individuals.
Replication competence: Ensuring that viral vectors used for gene
delivery are replication-defective to prevent uncontrolled spread
within the patient or environment.
Insertional mutagenesis: The risk that the therapeutic gene might
insert into the wrong place in the host genome, potentially
disrupting an essential gene or activating an oncogene, leading to
cancer.
Genetic stability of the construct: Ensuring that the introduced
transgene remains intact and functional over time within the host
cells to maintain efficacy and avoid harmful protein production.
Recombination of the construct: The possibility that the inserted
genetic material could recombine with host DNA or other viral
sequences in unintended ways, potentially generating new, harmful
viruses or hybrid genes.
Molecular characteristics of the construct: Poorly designed
constructs might lead to overexpression of the therapeutic gene,
silencing of host genes, or unintended off-target effects. These
risks are meticulously evaluated through processes like
Environmental Risk Assessment (ERA) and human health risk
assessment.