

Long Study Guide

This study guide provides an overview of genome sequencing techniques, including PCR, RNA-seq, and various DNA and protein analysis methods. It discusses the principles and applications of each technique, such as RT-qPCR for gene expression quantification and Sanger vs. Next Generation Sequencing. Additionally, it covers bioinformatics tools for data analysis and gene annotation, along with discussion questions to enhance understanding of the material.

Uploaded by

teesriv
Copyright
© All Rights Reserved

This study guide provides a comprehensive overview of the topics covered in the sources,
including detailed explanations of key concepts and discussion questions with answers to
enhance your understanding.

1. Sequencing Genomes

This section covers various techniques for studying genomes, from amplifying DNA and RNA to
advanced sequencing technologies and bioinformatics tools.

Key Concepts:

 Studying Genomes
o Genomes are too large and complex to study directly, so they are broken into
small fragments.
o PCR (Polymerase Chain Reaction): Amplifies specific DNA sequences in vitro.
 DNA doubles each cycle through three steps: Denaturation (separating
dsDNA at 95°C), Annealing (primers bind to target DNA at ~55°C, depending
on the primers' melting temperature), and Extension (Taq DNA polymerase
extends primers at 72°C).
o Cloning: Creates identical copies of DNA fragments using host cells, typically
bacteria.
 Steps: Insert DNA into a cloning vector (must have a replication origin,
multiple cloning sites, selectable markers, and screenable markers),
transform recombinant plasmid into a host cell, select colonies with
antibiotics and color screening, and grow overnight cultures to get plasmid
DNA.
 Colony Selection: Plasmids often contain the lacZ gene, which codes for
the enzyme β-galactosidase. This enzyme cleaves X-gal (a colorless lactose
analog), releasing a blue product that turns the colony blue.
 Blue colonies indicate that the target DNA was not inserted
because the lacZ gene is intact.
 White colonies indicate that the target DNA was inserted,
disrupting the lacZ gene, preventing β-galactosidase production
and X-gal cleavage.
 RNA Techniques
o RT-qPCR (Reverse Transcription Quantitative PCR, also called real-time RT-PCR):
Amplifies cDNA made from RNA and monitors the reaction in real time, with
fluorescence reflecting the amount of starting RNA.
 Quantifies specific transcripts like mRNA and miRNA.
 Requires reverse transcription (RNA to cDNA), a fluorescent
probe/dye (e.g., SYBR Green, TaqMan), and a Cq (quantification cycle)
value output: the cycle at which fluorescence crosses a threshold. A lower
Cq means more starting transcript.
 SYBR Green: Binds dsDNA (mainly in the minor groove), so it also
detects non-specific products like primer dimers.
 TaqMan: Uses a probe with a reporter dye at the 5’ end and a quencher
dye at the 3’ end; the 5’→3’ exonuclease activity of Taq polymerase
cleaves the probe during extension, separating reporter from quencher and
producing fluorescence.
 Applications: Gene expression analysis and validation of RNA-seq
results.
o RNA-seq (RNA Sequencing): Unbiased, hypothesis-free method to profile gene
expression by sequencing the transcriptome.
 Detects transcripts, alternative splicing, and novel transcripts (those
transcribed from the genome but not in existing gene annotations or
databases).
 Sensitive enough for single-cell analysis.
 Uses sample barcoding for pooling and cost efficiency.
 Requires bioinformatics for large data analysis.
 DNA Techniques
o dPCR (Digital PCR): Partitions samples into thousands of individual PCR
reactions, each with 0, 1, or a few DNA molecules.
 The Poisson distribution corrects for partitions that received more
than one molecule, which would otherwise cause the DNA concentration to be
underestimated.
 More precise than qPCR for absolute quantification.
 Provides a binary readout (positive partitions fluoresce, negative ones
do not), offering better sensitivity for rare targets.
 Does not require a standard curve because it directly counts molecules.
 More tolerant of PCR inhibitors than qPCR.
o FISH (Fluorescence in situ Hybridization): Detects specific RNA or DNA
molecules in cells or tissues using fluorescently labeled RNA/DNA probes.
 Offers high spatial resolution, allowing gene expression analysis at
cellular or organ levels.
 Can be multiplexed with different colored probes.
o Methyl-seq (Methylation Sequencing): Maps DNA methylation patterns, a type
of epigenetic mark.
 Methylated CpG sites often correlate with gene silencing, especially in
promoter regions.
 Method: Uses bisulfite conversion, which converts unmethylated cytosine
to uracil (read as thymine after PCR), while methylated cytosines remain
unchanged. After sequencing, the bisulfite-treated sequence is compared to
the untreated (reference) sequence: a C still read as C was methylated,
and a C read as T was unmethylated.
 Applications: Epigenetics, developmental biology, and cancer research.
 Protein Techniques
o Western Blotting (Immunoblotting): A method to detect specific proteins.
 Steps: Protein separation by SDS-PAGE, transfer to a membrane, and
detection with specific antibody probes.
 Chromatin Configuration Techniques
o Hi-C-seq (Chromosome Conformation Capture Sequencing): Maps 3D
chromatin interactions within the nucleus.
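 Method: Crosslink DNA and associated proteins, digest with a restriction
enzyme, fill in the cut ends with biotin-labeled nucleotides, ligate nearby
ends into chimeric fragments that reflect spatial interactions, then purify
the biotin-labeled junctions and sequence.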
 Applications: Detects TADs (topologically associating domains),
chromatin loops, A/B compartments, and chromosome territories. Used to
understand gene regulation and its disruption in diseases like cancer.
o ChIP-seq (Chromatin Immunoprecipitation Sequencing): Identifies DNA-
protein binding sites, such as transcription factors (TFs) and histones.
 Methods: Crosslink proteins to DNA, fragment DNA (e.g., by sonication),
immunoprecipitate with a specific antibody, and then sequence the bound
DNA.
 Applications: Maps TF binding, histone modifications, and changes in
chromatin binding under different conditions (e.g., euchromatin vs.
heterochromatin).
o ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing):
Maps open chromatin regions (euchromatin).
 Uses Tn5 transposase, which cuts accessible DNA and inserts adapters.
 Shows regulatory elements, TF motifs, and changes in accessibility across
cell types or conditions.
 Single Cell Genomics (SCG)
o Aim: Characterize gene expression, epigenetic marks, and mutations in individual
cells to create a complete catalog of human cell types and reveal cellular
heterogeneity.
o Technologies: Single-cell RNA-seq and single-cell ATAC-seq.
o Applications: Lineage tracing, biomarker discovery, therapeutic monitoring. It
can identify rare cell types (e.g., stem cells, cancer precursors), study disease
progression and developmental trajectories, track cellular lineages, and contribute
to projects like the Human Cell Atlas.
 Sequencing Techniques
o Sanger Sequencing (1st Generation): A chain termination method.
 Uses a DNA template, primer, DNA polymerase, dNTPs, and
fluorescently labeled ddNTPs (dideoxynucleotides).
 ddNTPs terminate elongation because they lack a 3’ OH group.
 Products are separated by capillary electrophoresis, and the sequence is
read from a fluorescence-based chromatogram/electropherogram.
 Applications: Good for single gene sequencing, validating mutations
identified by NGS, and genotyping known variants. Used in small
sequencing projects (e.g., verifying mutations) and early genome projects
(e.g., human, mouse, dog).
 Strengths: Read length of approximately 1000 bp, low error rate, and
robust (still used for specific tasks).
 Limitations: Low throughput, time-consuming, and expensive per base
for large projects.
o Next Generation Sequencing (NGS, 2nd Generation): Involves parallel
sequencing of millions of fragments and requires amplification.
 Steps: DNA fragmentation, adapter ligation, amplification (PCR-based),
sequencing by synthesis (e.g., Illumina using fluorescence-based
reversible terminator chemistry), detection and base calling, alignment to a
reference genome, and variant calling and annotation.
 Strengths: Massive data output, suitable for discovering new genes, and
rare variant discovery (SNPs, indels, CNVs).
 Limitations: High initial equipment costs, requires strong bioinformatics
support, short reads, and data overload.
 Technologies:
 Illumina (Solexa): Short reads, high throughput, low cost per base.
Challenges include difficulty with repetitive regions,
overclustering, and signal loss.
 Ion Torrent: Detects the pH change caused by hydrogen ions released
as nucleotides are incorporated; offers various chemistries and read lengths.
 Applications: Whole genome/exome/transcriptome sequencing, RNA-seq,
ChIP-seq, and identifying SNPs, indels, CNVs, and alternative splicing.
o Third Generation Sequencing (TGS): Single-molecule sequencing, usually in
real time and often without PCR, which minimizes GC bias.
 Strengths: Long reads to resolve complex genomic regions and minimal
GC bias.
 Limitations: Lower accuracy than NGS, but improving; might require
intensive data analysis.
 Technologies:
 PacBio (SMRT - Single Molecule Real-Time sequencing): Long
reads, no PCR (minimal GC bias), uses SMRTbell (circular DNA)
as a template, real-time fluorescent detection as DNA polymerase
incorporates nucleotides, and directly detects base modifications
(e.g., methylation).
 Oxford Nanopore (MinION): Portable, very long reads, low cost,
real-time sequencing. Weaknesses include a high error rate.
Measures changes in electric current as DNA passes through a
protein nanopore. No PCR needed.
 Applications: Genome assembly, structural variation, full-length
transcriptomics, and epigenetics.
 SNP Genotyping
o SNP genotyping: Identifies Single Nucleotide Polymorphisms (SNPs).
o Methods:
 SNP chips/microarrays: Probes for thousands of known SNPs across the
genome; high throughput, used in genetic mapping and association studies
for disease screening, expression profiling, and breed mapping in animals.
 TaqMan Assays: Fluorescent probe-based qPCR targeting specific SNP
alleles, with allele-specific probes having different reporter dyes (e.g.,
FAM, VIC) and a quencher dye; highly specific.
 RFLP (Restriction Fragment Length Polymorphism): SNPs must
affect a restriction enzyme site; involves PCR, enzymatic digestion, and
gel electrophoresis.
 Sequencing-based genotyping: Direct sequencing (e.g., Sanger, NGS) to
detect SNPs.
 Microsatellite genotyping: Uses STRs (short tandem repeats), which are
very polymorphic; involves PCR and polyacrylamide gel electrophoresis.
Applications: Forensics, paternity testing, and population genetics.
 Bioinformatics
o Definition: Application of computational tools to understand, organize, and
analyze molecular biological data; described as a "management information
system for molecular bio".
o Disciplines: Combines biology, computer science, medicine, physics,
mathematics, and statistics.
o Core Objectives: Model biological systems (e.g., protein folding, cellular
pathways), understand sequences, organize vast sequence data, and handle the
"data deluge" from high-throughput technologies (e.g., Illumina, PacBio).
Important for studying gene expression, regulatory networks, and evolutionary
relationships.
o Evolution of the Field:
 Pre-genomic era: Focused on sequence → structure → function.
 Post-genomic era: Shifted to genome → cell function → phenotype, with
an expansion of "omics" disciplines.
o Modern Implications: Bioinformatics has become an information science, and
experimental biology now relies on computations for candidate gene discovery,
phenotype-genotype correlations, visualization, and simulation of cellular
networks.
o Comparing Biological Sequences: Aims to identify homologous sequences.
 Homologs: Sequences sharing a common ancestor.
 Orthologs: Genes in different species that evolved from a common
ancestral gene through speciation.
 Paralogs: Genes related by duplication within the same genome.
 Evolutionary Events: Insertion, deletion, substitution, duplication,
inversion, and translocation all drive sequence divergence across species
and within genomes.
o Sequence Alignment: Measures similarity by comparing nucleotides or amino
acids.
 Scores for matches, mismatches, and gaps (indels).
 Types: Global alignment (aligns entire sequence end-to-end) or Local
alignment (finds best matching subsections/domains).
 Scoring and Speed: Scoring matrices (e.g., BLOSUM, PAM for proteins)
weigh biologically likely substitutions and conserved residues. Gap
penalties (gap opening and extension cost models) discourage excessive
indels. Computationally heavy, so optimization and heuristics like BLAST
are used for feasibility.
o Gene Annotation: Assigns biological meaning to raw sequences.
 Methods include similarity search, comparative genomics, and de
novo prediction.
o Genome Browsers: Software tools enabling visualization and exploration of
annotated genomic sequences.
 Display conserved sequences, synteny (conserved gene order), and
orthologous genes across species.
 Examples: NCBI Genome Data Viewer, UCSC Genome Browser,
Ensembl.
 Integrated with BLAST tools, comparative genomics functions, and
transcript annotations (e.g., RefSeq, GENCODE).
 Information Provided: Gene names, symbols, synonyms, transcript
variants (alternative splicing), protein domains and structure, Gene
Ontology (GO) annotations, expression data (e.g., RNA-seq), variation
data (SNPs, indels), and links to literature and databases.
 Practical Use: Using these browsers can prevent "running experiments
blindly".
 Features: Interactive interface for zooming, track-based format (users can
turn data layers on/off), and links to functional data (e.g., GO), clinical
significance (e.g., ClinVar), and population variants (e.g., 1000 Genomes
Project).
 Applications: Identify gene structures and regulatory elements, compare
genomes across species (conservation analysis), design primers or probes,
view experimental data (e.g., ChIP-seq, RNA-seq), and for educational
and research exploration of genome architecture.
o Tools for Homology-based Searching: BLAST, BLAT, Gene trees (Ensembl),
Genome alignments and conservation tracks.
o BLAST (Basic Local Alignment Search Tool): A tool to quickly compare
sequences.
 Steps: Break query into short "words", match to database, extend
alignments around word matches.
 Applications: Identify unknown genes/proteins, annotate new genomes,
and find conserved regions across species.
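
The exponential doubling described under PCR can be made concrete with simple arithmetic: starting from N0 templates, n ideal cycles give N0 × 2^n copies. The Python sketch below is illustrative only; the `efficiency` parameter is an assumption not taken from this guide, added to show how real reactions fall short of perfect doubling.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Estimate product copies after a number of PCR cycles.

    Each cycle (denaturation, annealing, extension) copies a fraction
    `efficiency` of the existing templates. efficiency=1.0 is the ideal
    case in which the DNA exactly doubles every cycle.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    for n in (10, 20, 30):
        # Ideal doubling from a single template molecule.
        print(f"{n} cycles: {pcr_copies(1, n):,.0f} copies")
```

With perfect doubling, 30 cycles turn one template into about a billion copies (2^30 = 1,073,741,824).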
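
The Cq value from RT-qPCR is often converted into a relative fold change. One common convention, not described in this guide, is the 2^-ΔΔCq (Livak) method, which normalizes the target transcript to a reference (housekeeping) gene in each sample. A minimal sketch, assuming both genes amplify at close to 100% efficiency; the Cq values in the example are hypothetical.

```python
def delta_delta_cq_fold_change(
    cq_target_treated: float,
    cq_reference_treated: float,
    cq_target_control: float,
    cq_reference_control: float,
) -> float:
    """Relative expression of a target transcript, treated vs. control.

    Assumes target and reference genes both amplify at ~100% efficiency,
    so one cycle of Cq difference equals a two-fold difference in
    starting cDNA. A lower Cq means more starting template.
    """
    # Normalize the target to the reference gene within each sample.
    delta_treated = cq_target_treated - cq_reference_treated
    delta_control = cq_target_control - cq_reference_control
    # Compare the normalized values between conditions.
    delta_delta = delta_treated - delta_control
    return 2.0 ** (-delta_delta)


if __name__ == "__main__":
    # Hypothetical Cq values: after normalization, the target crosses the
    # threshold 3 cycles earlier in the treated sample.
    print(delta_delta_cq_fold_change(22.0, 18.0, 25.0, 18.0))  # 8.0
```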
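
The Poisson correction in dPCR can be written out directly. If molecules are distributed randomly across partitions, the probability that a partition is empty is e^(-λ), so the mean number of copies per partition is λ = -ln(negative partitions / total partitions). A minimal sketch; the partition count, positive count and partition volume in the example are hypothetical.

```python
import math


def dpcr_estimate(positive: int, total: int, partition_volume_ul: float):
    """Poisson-corrected estimate from a digital PCR binary readout.

    A positive partition may contain more than one molecule, so counting
    positive partitions underestimates the target. Assuming random
    loading, P(partition empty) = exp(-lam), so
    lam = -ln(negative / total) mean copies per partition.
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total")
    fraction_negative = (total - positive) / total
    lam = -math.log(fraction_negative)          # mean copies per partition
    copies = lam * total                        # molecules in the reaction
    concentration = lam / partition_volume_ul   # copies per microlitre
    return lam, copies, concentration


if __name__ == "__main__":
    # Hypothetical run: 20,000 partitions of 0.00085 uL, 5,000 positive.
    lam, copies, conc = dpcr_estimate(5000, 20000, 0.00085)
    print(f"lambda={lam:.4f}, copies={copies:.0f} (naive count 5000), "
          f"{conc:.1f} copies/uL")
```

The corrected estimate is always at least the naive positive-partition count, and the gap grows as more partitions become positive.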
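
The bisulfite logic behind Methyl-seq can be simulated on a toy sequence. This is a minimal sketch, assuming a single strand, a read already aligned to the reference with no sequencing errors, and a hypothetical sequence and methylation pattern.

```python
def bisulfite_read(seq: str, methylated: set[int]) -> str:
    """Simulate how one strand reads after bisulfite conversion and PCR.

    Unmethylated C is converted to U, which PCR copies as T; methylated C
    (positions listed in `methylated`) is protected and still reads as C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def call_methylation(reference: str, read: str) -> dict[int, bool]:
    """Compare a bisulfite read with the untreated reference.

    For every C in the reference: still C -> methylated (True);
    C read as T -> unmethylated (False).
    """
    return {
        i: read_base == "C"
        for i, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper()))
        if ref_base == "C"
    }


if __name__ == "__main__":
    reference = "ACGTTCGACG"                          # hypothetical sequence
    read = bisulfite_read(reference, methylated={1})  # only the C at index 1 is methylated
    print(read)                                       # ACGTTTGATG
    print(call_methylation(reference, read))          # {1: True, 5: False, 8: False}
```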
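
For RFLP genotyping, a SNP is only visible if it creates or destroys a restriction site, which changes the fragment lengths seen on the gel. The sketch below uses EcoRI (recognition site GAATTC, cut between G and A) as an example enzyme; the 18 bp amplicons are hypothetical, and only the given strand is scanned, which is sufficient for palindromic sites like this one.

```python
def digest(seq: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths after cutting at every occurrence of a recognition site.

    cut_offset is where the enzyme cuts within its site; EcoRI recognizes
    GAATTC and cuts between G and A (G^AATTC), so its offset is 1.
    """
    seq, site = seq.upper(), site.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Hypothetical amplicons differing by one SNP (A -> G) inside the site.
    allele_1 = "AAAAGAATTCAAAAAAAA"  # site intact
    allele_2 = "AAAAGAGTTCAAAAAAAA"  # SNP destroys the EcoRI site
    print(digest(allele_1, "GAATTC", 1))  # [5, 13]
    print(digest(allele_2, "GAATTC", 1))  # [18]
```

On a gel, a homozygote for allele 1 would show two bands, a homozygote for allele 2 one uncut band, and a heterozygote all three.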
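
Microsatellite genotyping reports each allele as a number of tandem repeats. In the lab this number is inferred from PCR product length on a gel; the sketch below counts it directly from sequence instead, using two hypothetical alleles of an AGAT tetranucleotide repeat.

```python
def repeat_count(seq: str, motif: str) -> int:
    """Longest run of consecutive, tandem copies of `motif` in `seq`."""
    if not motif:
        raise ValueError("motif must be non-empty")
    seq, motif = seq.upper(), motif.upper()
    best = 0
    for start in range(len(seq)):
        count, pos = 0, start
        # Walk forward one motif length at a time while copies continue.
        while seq.startswith(motif, pos):
            count += 1
            pos += len(motif)
        best = max(best, count)
    return best


if __name__ == "__main__":
    # Two hypothetical alleles of an AGAT STR with shared flanking sequence.
    allele_a = "CC" + "AGAT" * 7 + "TT"
    allele_b = "CC" + "AGAT" * 9 + "TT"
    genotype = sorted(repeat_count(a, "AGAT") for a in (allele_a, allele_b))
    print(genotype)  # [7, 9]
```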

Discussion Questions and Answers:

1. Question: Explain how RT-qPCR and RNA-seq differ in their approach to studying gene
expression, and what unique insights each technique offers.
o Answer: RT-qPCR focuses on quantifying specific, pre-selected transcripts by
real-time amplification of cDNA from RNA, providing precise measurements of
transcript abundance for a targeted set of genes. It is highly sensitive and specific.
In contrast, RNA-seq is an unbiased, hypothesis-free method that sequences the
entire transcriptome, providing a comprehensive profile of all expressed genes.
RNA-seq can detect novel transcripts, alternative splicing events, and is sensitive
enough for single-cell analysis, offering broader insights into gene expression
dynamics without prior knowledge of specific genes. While RT-qPCR is good for
validation, RNA-seq is for discovery.
2. Question: Describe the principle of DNA cloning, including the role of the lacZ gene and
X-gal in distinguishing successful DNA insertion.
o Answer: DNA cloning involves creating identical copies of DNA fragments by
inserting them into a cloning vector, which is then introduced into a host cell,
typically bacteria. The vector must have a replication origin, multiple cloning
sites, and markers for selection and screening. A common screening method uses
the lacZ gene, which codes for β-galactosidase. This enzyme cleaves X-gal (a
colorless lactose analog), producing a blue color. When the target DNA is
successfully inserted into the cloning site within the lacZ gene, it disrupts
the gene, preventing β-galactosidase production. As a result, colonies containing the inserted DNA
remain white because X-gal is not cleaved. If the target DNA is not inserted,
the lacZ gene remains intact, β-galactosidase is produced, and the colony
turns blue when X-gal is present.
3. Question: Compare and contrast Sanger sequencing, Next Generation Sequencing
(NGS), and Third Generation Sequencing (TGS) in terms of their core methodology,
strengths, and limitations.
o Answer:
 Sanger Sequencing (1st Gen) uses a chain termination method with
ddNTPs to stop DNA synthesis at specific bases, followed by capillary
electrophoresis to separate fragments by size, and a fluorescence-based
read-out. Its strengths include read lengths of ~1000 bp, low error rates,
and robustness for single gene sequencing or validating other methods.
Its limitations are low throughput, time consumption, and high cost per
base for large projects.
 NGS (2nd Gen) involves parallel sequencing of millions of DNA
fragments after amplification. It typically uses sequencing by synthesis
with reversible terminator chemistry (e.g., Illumina). Its strengths are
massive data output, suitability for discovering new genes, and rare variant
detection (SNPs, indels, CNVs). Its limitations include high initial
equipment costs, requirement for strong bioinformatics support, short
reads, and potential data overload.
 TGS (3rd Gen) performs single-molecule sequencing, often in real-time
and without PCR, minimizing GC bias. Technologies like PacBio (SMRT)
detect nucleotides as they are incorporated, and Oxford Nanopore
(MinION) measures electrical current changes. Strengths are very long
reads, which are excellent for resolving complex genomic regions and
structural variations, and minimal GC bias. Limitations currently include
lower accuracy than NGS, though this is improving, and potentially
intensive data analysis.
4. Question: Discuss the role of bioinformatics in the post-genomic era. How has the field
evolved, and what are its core objectives and modern implications?
o Answer: In the post-genomic era, bioinformatics has shifted from focusing on
sequence-to-function predictions to understanding how the entire genome relates
to cell function and phenotype. The field has expanded significantly due to the
"omics" revolution. Its core objectives include modeling biological systems,
understanding sequences, organizing vast amounts of sequence data, and
managing the "data deluge" generated by high-throughput technologies like
Illumina and PacBio. Modern implications are that biology has become
an information science, with experimental biology increasingly relying on
computational methods for tasks like candidate gene discovery, genotype-
phenotype correlations, and visualization/simulation of cellular networks.
Bioinformatics is crucial for comparing biological sequences to identify
homologs, orthologs, and paralogs, and for performing sequence alignment and
gene annotation to assign biological meaning to raw data.
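
The Sanger chain-termination principle from the answer above can be simulated in a few lines. A minimal sketch, assuming a ddNTP is incorporated at every position in some fraction of molecules and that product length is counted from the end of the primer; the template is hypothetical.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def new_strand(template: str) -> str:
    """Strand synthesized by DNA polymerase from a template written 5'->3'.

    The new strand is the reverse complement, also written 5'->3'.
    """
    return "".join(COMPLEMENT[b] for b in reversed(template.upper()))


def termination_products(strand: str) -> list[tuple[int, str]]:
    """One dye-labelled product per position: (length, terminal ddNTP)."""
    return [(i + 1, base) for i, base in enumerate(strand)]


def read_capillary(products: list[tuple[int, str]]) -> str:
    """Capillary electrophoresis orders products by size; read each dye color."""
    return "".join(base for _, base in sorted(products))


if __name__ == "__main__":
    template = "ACCGTA"                          # hypothetical template, 5'->3'
    products = termination_products(new_strand(template))
    random.shuffle(products)                     # products start as an unordered mixture
    print(read_capillary(products))              # TACGGT (reverse complement of template)
```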
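
The global and local alignment types mentioned under bioinformatics are usually computed by dynamic programming, classically the Needleman-Wunsch (global) and Smith-Waterman (local) algorithms. The sketch below returns only the optimal score and, as a simplification, uses a single linear gap penalty instead of the separate gap-opening and extension costs described in this guide; the match, mismatch and gap values are arbitrary assumptions.

```python
def alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap: int = -2, local: bool = False) -> int:
    """Optimal alignment score by dynamic programming.

    local=False: global (Needleman-Wunsch), aligning both sequences end to end.
    local=True: local (Smith-Waterman); cells are floored at 0 so the best
    matching subsection can start and end anywhere.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Leading gaps are penalized only in global alignment.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


if __name__ == "__main__":
    print(alignment_score("ACGT", "ACT"))              # 1 (ACGT vs AC-T)
    print(alignment_score("ACGT", "ACT", local=True))  # 2 (AC vs AC)
```

For protein sequences, the fixed match and mismatch values would be replaced by lookups in a substitution matrix such as BLOSUM or PAM.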
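
The three BLAST steps (break the query into words, match words to the database, extend around the matches) can be illustrated with a toy seed-and-extend search. This is a simplified sketch, not the BLAST implementation: real BLAST also uses scored neighborhood words, two-hit triggering, gapped extension and E-values, none of which are modeled here, and the sequences and `x_drop` cutoff are hypothetical.

```python
def word_index(seq: str, k: int) -> dict[str, list[int]]:
    """Map every length-k word in seq to the positions where it occurs."""
    index: dict[str, list[int]] = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], []).append(i)
    return index


def find_seeds(query: str, subject: str, k: int = 3) -> list[tuple[int, int]]:
    """Exact word matches (query_pos, subject_pos) between query and subject."""
    index = word_index(subject, k)
    return [
        (q, s)
        for q in range(len(query) - k + 1)
        for s in index.get(query[q:q + k], [])
    ]


def extend_seed(query: str, subject: str, q: int, s: int, k: int,
                match: int = 1, mismatch: int = -1, x_drop: int = 2) -> int:
    """Ungapped extension to the right and left of a seed. Each direction
    stops once its running score falls more than x_drop below its best."""
    def best_gain(pairs) -> int:
        best = score = 0
        for qa, sb in pairs:
            score += match if qa == sb else mismatch
            best = max(best, score)
            if best - score > x_drop:
                break
        return best

    right = zip(query[q + k:], subject[s + k:])
    left = zip(reversed(query[:q]), reversed(subject[:s]))
    return k * match + best_gain(right) + best_gain(left)


if __name__ == "__main__":
    query, subject = "GATTACA", "CCGATTACAGG"
    seeds = find_seeds(query, subject)
    print(seeds)  # [(0, 2), (1, 3), (2, 4), (3, 5), (4, 6)]
    print(max(extend_seed(query, subject, q, s, 3) for q, s in seeds))  # 7
```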

2. Organization of Human Genomes

This section details the structure and composition of the human mitochondrial and nuclear
genomes, including protein-coding genes, non-coding RNAs, and repetitive DNA elements, as
well as the modern definition of a gene from the ENCODE project.

Key Concepts:

 Mitochondrial vs. Nuclear Genomes
o Mitochondrial Genome: Circular DNA structure, inherited from the mother,
many copies per mitochondrion (none in mature red blood cells), 16,569 bp in
size, 37 genes (13 proteins, 22 tRNAs, 2 rRNAs), no introns, higher mutation
rate (less efficient DNA repair), 93% coding content, D-loop used to track
maternal lineages. Its primary function is ATP production via cellular
respiration.
o Nuclear Genome: Linear DNA in 23 pairs of chromosomes, inherited from both
parents, diploid in somatic cells, 3.1 billion bp in size (haploid), approx.
40,000 genes (21,000 protein-coding + ncRNAs), introns present, lower mutation
rate (robust repair systems), ~2% protein-coding content, primary genome for
traits and diseases.
 Composition of the Mitochondrial Genome
o Genes: 13 protein-coding genes (all for ATP production), 22 tRNAs (for
mitochondrial translation), 2 rRNAs (16S and 12S).
o Features: No introns, common gene overlap, very compact, 60 sense codons
(compared with 61 in the standard nuclear code), and 4 STOP codons (UAA, UAG,
AGA, AGG; the standard code has 3). Relies on approximately 1,700 nuclear
genes for its function.
o Control region: The D-loop, which is hypervariable and used in forensic tracking
(e.g., Romanov case).
 Composition of the Nuclear Genome
o Chromosomes: 23 pairs (22 autosomes, 1 pair of sex chromosomes).
o GC Content: Approximately 41.5%.
o Heterochromatin: Approx. 200 Mb, condensed and unsequenced.
o Protein-Coding Genes: Include exons, introns, and promoters.
 Gene Families: Arise from gene duplication, often clustered or dispersed
(e.g., globins, MHC, olfactory receptor genes). May include pseudogenes.
Important in evolution and functional redundancy.
 Pseudogenes: Defective gene copies.
 Non-processed: Arise from gene duplication.
 Processed: From reverse-transcribed mRNA (lack introns).
 Usually non-functional, though some are functional (retrogenes).
Approximately 20,000 in the human genome.
 Non-Coding RNA (ncRNA)
o Main Classes: rRNAs (28S, 18S, 5.8S, 5S encoded in large arrays) and tRNAs
(approx. 600 genes scattered across genome).
o Short ncRNAs:
 snRNAs: Involved in RNA splicing (e.g., U1, U2).
 snoRNAs: Modify rRNAs and tRNAs.
 scaRNAs: Modify snRNAs in Cajal bodies.
 miRNAs (microRNAs): Small (18-25 nt) noncoding RNAs that regulate
gene expression post-transcriptionally.
 Often located in introns of other genes, can cluster and be co-
transcribed as polycistronic transcripts. Play a very large role in
regulatory networks.
 Biogenesis: Transcribed by RNA Pol II as primary-miRNA (pri-
miRNA). Cleaved by Drosha (RNase III enzyme) into pre-miRNA
(~70-nt hairpin) in the nucleus. Transported into the cytoplasm by
Exportin-5. Dicer (another RNase III enzyme) cleaves pre-miRNA
into a miRNA duplex. One strand (guide strand) is loaded onto the
RNA-induced silencing complex (RISC), while the other
(passenger strand) is degraded.
 Function: Downregulate target genes by the miRNA-RISC
complex binding to complementary sequences in the 3’
untranslated region (3’ UTR) of target mRNAs. If perfectly
complementary, the target mRNA is cleaved and degraded; if
partially complementary, it represses translation or destabilizes the
mRNA.
 One miRNA can target hundreds of mRNAs, and one mRNA can
be regulated by multiple miRNAs.
 Clinical Relevance: Dysregulated miRNAs, including overexpressed
oncogenic miRNAs (oncomiRs) and lost tumor-suppressor miRNAs, are
linked to cancer, cardiovascular disease, and neurodegenerative
disorders. Used as biomarkers in blood or tissues.
 Therapeutic Potential: miRNA mimics can restore tumor-
suppressor miRNAs, and miRNA inhibitors (antagomirs) can
silence overexpressed oncomiRs.
 piRNAs: Silence transposons, especially in spermatogenesis.
o Long ncRNAs (lncRNAs): >200 nucleotides, low coding potential.
 Includes intergenic (lincRNAs), intronic, and antisense types.
 Approximately 78% show tissue/developmental specificity.
 Regulate transcription, chromatin, and epigenetic states.
 Highly Repetitive DNA Elements
o Heterochromatin/Tandem Repeats: Found in centromeres and telomeres,
generally not transcribed.
 Types: Satellites (repeated ncDNA sequences in centromeres/telomeres),
minisatellites, and microsatellites.
o Transposons (Interspersed Repeats): Mobile genetic elements.
 Class I (Retrotransposons): "Copy and paste" mechanism involving
transcription, reverse transcription, and integration. Can be with or without
LTRs (long terminal repeats).
 LINEs (Long Interspersed Nuclear Elements): Autonomous,
approximately 17% of the genome.
 SINEs (Short Interspersed Nuclear Elements): Non-autonomous
(rely on LINEs), e.g., Alu elements.
 Class II (DNA Transposons): "Cut and paste" mechanism.
o Importance: Contribute to structural integrity and genome evolution, but can also
be mutagenic.
 Gene Definition
o Historical Evolution:
 Pre-1900: Unit of heredity.
 1950s: Molecule.
 1960s: Transcribed unit.
 1970s-80s: ORF-based (Open Reading Frame).
 Post-1990s: Genomic entity.
 Post-ENCODE: A gene is defined as "a union of genomic sequences
encoding a coherent set of potentially overlapping functional products".
o ENCODE Project (Encyclopedia of DNA Elements): (2003-2012) redefined the
understanding of genome functionality.
 Key Findings: Assigned biochemical functions to approximately 80% of
the genome. 75-80% of the genome is transcribed in at least one cell type.
Most transcripts are non-coding RNA. Each protein-coding gene produces
6-7 transcripts on average via alternative splicing. DNA and histone
modifications vary by cell type and affect regulation. Challenged the notion
of "junk DNA" by showing that many non-coding regions have regulatory roles. Produced a
catalogue of human peptides and proteins.
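
The mitochondrial codon counts listed above (60 sense codons, 4 stop codons) can be checked by building both genetic codes. The sketch below assumes the NCBI standard code (table 1) and vertebrate mitochondrial code (table 2); table 2 also reassigns UGA to tryptophan and AUA to methionine, which is why UGA drops out of the stop list.

```python
from itertools import product

BASES = "UCAG"
# Amino acids for codons in UUU, UUC, UUA, UUG, UCU, ... order (NCBI table 1).
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)}

# Vertebrate mitochondrial code (NCBI table 2): four reassignments.
MITO = dict(STANDARD)
MITO.update({"UGA": "W", "AUA": "M", "AGA": "*", "AGG": "*"})


def codon_summary(code: dict[str, str]) -> tuple[int, list[str]]:
    """Return (number of sense codons, sorted list of stop codons)."""
    stops = sorted(c for c, aa in code.items() if aa == "*")
    return len(code) - len(stops), stops


def translate(rna: str, code: dict[str, str]) -> str:
    """Translate codon by codon; '*' marks a stop codon."""
    return "".join(code[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))


if __name__ == "__main__":
    print(codon_summary(STANDARD))  # (61, ['UAA', 'UAG', 'UGA'])
    print(codon_summary(MITO))      # (60, ['AGA', 'AGG', 'UAA', 'UAG'])
    print(translate("AUGAUAUGA", STANDARD), translate("AUGAUAUGA", MITO))  # MI* MMW
```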

Discussion Questions and Answers:

1. Question: Compare and contrast the key features of the human mitochondrial and
nuclear genomes, highlighting their structural, inheritable, and functional differences.
o Answer: The mitochondrial genome is a circular DNA molecule, inherited
exclusively from the mother, exists in many copies per cell (many mitochondria
per cell, none in RBCs), is relatively small (16,569 bp), contains 37 genes (13
protein-coding, 22 tRNAs, 2 rRNAs), and notably lacks introns. It has a higher
mutation rate due to less robust error checking. Its primary function is ATP
production through cellular respiration. In contrast, the nuclear genome consists
of linear DNA organized into 23 pairs of chromosomes, inherited from both
parents, is diploid in somatic cells, is vastly larger (3.1 billion bp), contains
approximately 40,000 genes (21,000 protein-coding), and contains introns. It has
a lower mutation rate due to more robust repair systems. The nuclear genome is
the primary determinant for traits and diseases, encoding diverse structural
proteins, enzymes, and regulatory elements.
2. Question: Discuss the various classes of non-coding RNAs in the human genome and
their diverse roles, with a particular focus on microRNAs (miRNAs) and their biogenesis.
o Answer: Non-coding RNAs (ncRNAs) are abundant and play diverse roles in the
human genome. Main classes include rRNAs (essential for ribosomes)
and tRNAs (for protein translation). Short ncRNAs include snRNAs (for RNA
splicing), snoRNAs (modify rRNAs/tRNAs), scaRNAs (modify snRNAs),
piRNAs (silence transposons), and most notably, miRNAs. miRNAs are small
(18-25 nt) regulators of gene expression post-transcriptionally.
 miRNA Biogenesis: It begins with transcription by RNA Pol II into
a primary-miRNA (pri-miRNA). In the nucleus, the pri-miRNA is
cleaved by the Drosha enzyme into a smaller hairpin-structured pre-
miRNA. This pre-miRNA is then exported from the nucleus to the
cytoplasm by Exportin-5. In the cytoplasm, the Dicer enzyme cleaves the
pre-miRNA into a mature miRNA duplex. Finally, one strand of this
duplex (the guide strand) is loaded into the RNA-induced silencing
complex (RISC), while the other strand is degraded. The miRNA-RISC
complex then binds to complementary sequences, typically in the 3' UTR
of target mRNAs, to either cleave and degrade the mRNA or repress its
translation.
 Long ncRNAs (lncRNAs) are over 200 nt and regulate transcription,
chromatin, and epigenetic states. These diverse ncRNAs highlight that the
genome's functionality extends far beyond protein-coding genes.
3. Question: Explain the redefinition of a "gene" as proposed by the ENCODE project.
How did ENCODE findings challenge the previous understanding of the human genome,
particularly the concept of "junk DNA"?
o Answer: The ENCODE project (2003-2012) redefined a gene as "a union of
genomic sequences encoding a coherent set of potentially overlapping functional
products". This updated definition reflects the complexity revealed by ENCODE,
moving beyond simply protein-coding sequences. ENCODE findings significantly
challenged the concept of "junk DNA" by demonstrating that approximately 80%
of the human genome has biochemical functions. It found that 75-80% of the
genome is transcribed into RNA in at least one cell type, with most of these
transcripts being non-coding RNAs. Furthermore, it showed that DNA and
histone modifications vary across cell types, influencing gene regulation, and that
protein-coding genes can produce multiple transcripts through alternative
splicing. These discoveries indicated that many non-coding regions previously
considered "junk" have regulatory roles, contributing to the
complexity of human biology.
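
The difference between perfect and partial complementarity in miRNA targeting can be sketched as a string search in a 3' UTR. This sketch assumes the widely used "seed" convention (miRNA nucleotides 2-8), which this guide does not describe, and ignores G:U wobble pairs and site context that real target-prediction tools consider; the miRNA and UTR sequences are hypothetical.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence, written 5'->3'."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def classify_site(mirna: str, utr: str) -> Optional[str]:
    """Classify the best match of a miRNA in a 3' UTR (both written 5'->3').

    'perfect': the UTR contains the full reverse complement (cleavage).
    'seed': only miRNA nucleotides 2-8 pair (repression/destabilization).
    None: no site found.
    """
    utr = utr.upper()
    if reverse_complement(mirna) in utr:
        return "perfect"
    if reverse_complement(mirna[1:8]) in utr:
        return "seed"
    return None


if __name__ == "__main__":
    mirna = "UCCGAAUGCAGUCAGGAUCCAU"  # hypothetical 22-nt miRNA
    print(classify_site(mirna, "AAAA" + reverse_complement(mirna) + "AAAA"))       # perfect
    print(classify_site(mirna, "AAAA" + reverse_complement(mirna[1:8]) + "AAAA"))  # seed
    print(classify_site(mirna, "A" * 30))                                          # None
```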

3. Gene Regulation and Epigenome

This section explores the intricate mechanisms that control gene expression, from chromatin
structure and transcription factors to epigenetic modifications and post-transcriptional
processing.

Key Concepts:

 Gene Regulation
o Definition: When, where, and how much a gene is expressed.
o Regulation occurs at multiple levels to ensure spatial (where)
and temporal (when) control.
o It ensures tissue specificity, developmental stage-specific expression, and
responses to environmental signals.
 Chromatin/Accessibility/TADs
o Chromatin: Composed of DNA, histones, and other proteins.
 Euchromatin: Open, transcriptionally active.
 Heterochromatin: Condensed, transcriptionally inactive.
 Remodeling changes DNA accessibility to promoters, aiding or hindering
RNA polymerase binding.
o TADs (Topologically Associating Domains): 3D chromatin domains that bring
enhancers near promoters.
 Delineated by boundary elements (also called TAD insulators):
regulatory DNA sequences that, together with the proteins bound to them,
prevent the spread of heterochromatin and block enhancers from activating
unintended genes.
 TAD boundaries are usually found near promoters and transcription start
sites, are highly conserved across species, and are often bound by the
CTCF transcription factor.
 Crucial for gene regulation, and their disruption is linked to rare diseases.
 Levels of Regulation
o Transcriptional (in the nucleus):
 1. Chromatin Remodeling: Controls DNA accessibility through
modifications of histones.
 Modifications include acetylation (generally activates gene
expression), methylation (can activate or repress depending on
amino acid and degree), and phosphorylation.
 Involves writers (add modifications), erasers (remove
modifications), and readers (interpret modifications).
 2. Promoter Accessibility: Promoters are DNA sequences upstream of
genes.
 Core promoters: Contain elements like the TATA box, Inr
(initiator element), and DPE (downstream promoter element).
 Proximal promoters: Contain elements like the GC box and
CCAAT-box.
 Bidirectional promoters allow transcription in opposite directions
on both strands.
 3. Transcription Factors (TFs): Proteins with DNA binding domains
(e.g., zinc fingers, helix-turn-helix) and activation domains.
 General TFs: Needed for all transcription.
 Specific TFs: Act as activators or repressors for specific tissues or
signals.
 4. RNA Polymerases:
 Pol I: Transcribes the large rRNAs (28S, 18S, 5.8S).
 Pol II: Transcribes mRNAs (all protein-coding genes), miRNAs, and
most snRNAs.
 Pol III: Transcribes tRNAs and 5S rRNA.
 Mitochondrial RNA polymerase also exists.
 Epigenetic Mechanisms
o These are heritable changes in gene expression that occur without altering the
underlying DNA sequence.
o 1. DNA Methylation: Addition of methyl groups to cytosines in CpG dinucleotides,
often clustered in CpG islands (regions rich in cytosine and guanine) in promoter
regions, by DNA methyltransferases.
 Blocks TFs from binding and recruits repressors.
 High methylation in promoter regions silences gene expression.
 Heritable through mitosis (cell division) but largely erased and reset
during meiosis (gamete formation).
 Plays a role in development, cell lineage commitment, and genomic
imprinting.
o 2. Histone Modification: Chemical modifications to histone tails (histones are
the proteins around which DNA is wrapped).
 Often interpreted as a "histone code" that defines chromatin states.
 Heritable through mitosis.
 Acetylation: Generally activates gene expression.
 Methylation: Can activate or repress based on specific amino acid
residues and degree of methylation.
 Phosphorylation also occurs.
 Like chromatin remodeling, involves writers, erasers, and readers.
o 3. Imprinting: Gene expression depends on whether the gene is inherited from
the father or the mother.
 Results in monoallelic expression, where only one parental allele is
expressed.
 Affects approximately 1% of mammalian genes, which are usually found in
clusters.
 Example: The IGF2 gene, where only the paternal allele is expressed.
o 4. X-inactivation: In females, one of the two X chromosomes is inactivated to
balance gene dosage between sexes.
 Controlled by XIST, a long non-coding RNA that silences the chosen X
chromosome.
 Leads to mosaic expression, meaning different cells in the same organism
can have different active X chromosomes, as seen in the fur color patterns
of calico cats.
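To make the idea of a CpG island concrete, the sketch below computes two commonly used criteria for a DNA window: GC content and the observed/expected CpG ratio (CpG count divided by C count × G count / length). The default thresholds (length ≥ 200 bp, GC > 50%, obs/exp > 0.6) follow the widely cited Gardiner-Garden and Frommer definition and are an assumption here; the function names are hypothetical.

```python
from typing import Tuple


def cpg_metrics(seq: str) -> Tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c, g = seq.count("C"), seq.count("G")
    # "CG" cannot overlap itself, so str.count gives the exact CpG dinucleotide count
    cpg = seq.count("CG")
    gc_fraction = (c + g) / n
    # CpGs expected if C and G were placed independently along the sequence
    expected = c * g / n
    obs_exp = cpg / expected if expected else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply the (assumed) length, GC-content and obs/exp CpG thresholds."""
    gc, obs_exp = cpg_metrics(seq)
    return len(seq) >= min_len and gc > min_gc and obs_exp > min_obs_exp


# A CpG-dense window passes; a GC-rich window with no CpGs does not
print(looks_like_cpg_island("CG" * 100))     # True
print(looks_like_cpg_island("CCAGG" * 40))   # False
```

The second example shows why GC content alone is not enough: methylation targets CpG dinucleotides specifically, and a GC-rich sequence can still contain none.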
 Post-Transcriptional Control (mostly in the cytoplasm)
o 1. Pre-mRNA Processing: Occurs in the nucleus before mRNA is exported to the
cytoplasm.
 RNA Splicing: Removal of introns (non-coding regions) and joining of
exons (coding regions) to form mature mRNA.
 Alternative Splicing: Different combinations of exons from a single gene
can be joined to produce different mRNAs, significantly increasing protein
diversity. Important for tissue-specific expression (e.g., in the CNS) and
for making protein isoforms with different functions or localizations.
 5’ Capping: Addition of a modified guanine nucleotide to the 5' end of
mRNA, crucial for stability and translation initiation.
 Poly-A Tail: Addition of a string of adenine nucleotides to the 3' end,
important for mRNA stability and translation.
o 2. mRNA Stability: Regulation of how long an mRNA molecule persists in the
cytoplasm before degradation.
 mRNA can be degraded by 5’→3’ exonucleases (after 5’ cap removal) or
3’→5’ exonucleases (after polyA tail shortens).
 MicroRNAs (miRNAs) regulate mRNA degradation or inhibit translation
by binding to the 3’ UTR of mRNA. One miRNA can target many
mRNAs.
o 3. Translation Control: Regulation of protein synthesis from mRNA.
 miRNAs binding to the 3’ UTR of mRNA, often with RNA-binding
proteins, can inhibit translation initiation at the ribosome or cause mRNA
degradation.
 This provides a faster cellular response compared to transcriptional
control.
o 4. Protein Processing: Modifications to a polypeptide chain after translation to
become a functional protein.
 Misfolded proteins are degraded.
 Covalent modifications: e.g., phosphorylation (addition of a phosphate
group).
 Proteolytic cleavage: Cutting of a protein to generate mature, active
proteins.
o 5. Protein Transport: Signal sequences on proteins direct them to specific
cellular destinations. This process is regulated and essential for proper cell
function.
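The miRNA binding to the 3' UTR described above is usually predicted from the miRNA "seed" (nucleotides 2 to 8), which pairs with a complementary site in the target mRNA. The sketch below scans a 3' UTR for perfect 7-nt seed matches only; real target prediction also weighs site type, context and conservation. The miRNA and UTR sequences are made-up examples and the function names are hypothetical.

```python
# Base-pairing partners in RNA (A-U, C-G)
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna: str) -> str:
    """mRNA site (5'->3') complementary to miRNA nucleotides 2-8."""
    seed = mirna.upper().replace("T", "U")[1:8]   # 1-based positions 2..8
    return seed.translate(COMPLEMENT)[::-1]       # reverse complement


def find_seed_matches(utr3: str, mirna: str) -> list:
    """0-based start positions of perfect seed matches in a 3' UTR."""
    site = seed_site(mirna)
    utr3 = utr3.upper().replace("T", "U")
    return [i for i in range(len(utr3) - len(site) + 1)
            if utr3[i:i + len(site)] == site]


mirna = "UGAGGUAGUAGGUUGUAUAGUU"              # example sequence
utr = "AAAA" + seed_site(mirna) + "GGG" + seed_site(mirna)
print(seed_site(mirna))                       # CUACCUC
print(find_seed_matches(utr, mirna))          # [4, 14]
```

Because a 7-nt site is short, it occurs by chance in many 3' UTRs, which is one reason a single miRNA can target many mRNAs.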
 Techniques to Measure Gene Expression/Epigenetic Modifications
o At RNA level:
 RT-qPCR: Measures mRNA levels of specific genes after reverse
transcription to cDNA; sensitive, specific, and high-throughput.
 RNA-seq: NGS-based, provides a comprehensive transcriptome profile,
detects alternative splicing, isoforms, and novel transcripts.
 Northern Blot: Less common now, detects RNA size and abundance.
 Microarrays: Hybridization-based method to compare expression
profiles; involves reverse transcription of RNA into cDNA, labeling, and
hybridization to a microarray chip.
o At Protein level:
 Western Blot: Detects specific proteins with antibodies; semi-
quantitative.
 ELISA (Enzyme-Linked Immunosorbent Assay): Quantitative analysis
of protein concentration.
 Mass Spectrometry: High-resolution protein identification and Post-
Translational Modification (PTM) mapping.
 Immunohistochemistry/Immunofluorescence: Spatial visualization of
protein expression in tissues.
o Epigenetic Modification Measurement:
 Bisulfite Sequencing: Detects DNA methylation by converting
unmethylated cytosines to uracil (read as thymine after PCR), while
methylated cytosines remain unchanged.
 ChIP-Seq (Chromatin Immunoprecipitation Sequencing): Measures
histone modifications and transcription factor binding sites.
 ATAC-Seq/DNase-Seq: Measures chromatin accessibility.
 MeDIP-Seq (Methylated DNA Immunoprecipitation Sequencing):
Captures methylated DNA regions.
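qPCR results are often reported as relative expression using the 2^(-ΔΔCt) (Livak) method: the target gene's Ct is normalized to a reference (housekeeping) gene, then compared with a control sample. The sketch below assumes roughly 100% amplification efficiency for both genes, which is the method's key assumption; the Ct values and function name are invented for illustration.

```python
def relative_expression(ct_target_treated: float, ct_ref_treated: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of a target gene (treated vs. control) by the 2^-ddCt method.

    Assumes ~100% PCR efficiency, i.e. the product doubles each cycle.
    """
    # Normalize the target to the reference gene within each sample
    d_ct_treated = ct_target_treated - ct_ref_treated
    d_ct_control = ct_target_control - ct_ref_control
    # Compare the normalized values between samples
    dd_ct = d_ct_treated - d_ct_control
    # A lower Ct means more starting template; each cycle is a 2-fold step
    return 2 ** (-dd_ct)


# Target reaches threshold 2 cycles earlier (relative to the reference) in treated cells
print(relative_expression(22.0, 18.0, 24.0, 18.0))   # 4.0
```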
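The logic of bisulfite sequencing can be shown with a toy, single-strand simulation: unmethylated C becomes U (read as T after PCR) while methylated C stays C, so comparing the read with the reference reveals the methylation state of each cytosine. This ignores incomplete conversion, the opposite strand and read alignment; the sequence, methylated positions and function names are made up.

```python
from typing import Dict, Set


def bisulfite_convert(seq: str, methylated: Set[int]) -> str:
    """Simulate the read obtained after bisulfite treatment and PCR (top strand)."""
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            out.append("T")   # unmethylated C -> U -> read as T
        else:
            out.append(base)  # methylated C and all other bases are unchanged
    return "".join(out)


def call_methylation(reference: str, read: str) -> Dict[int, bool]:
    """For each reference C: True if still C (methylated), False if T (unmethylated)."""
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper())):
        if ref_base == "C" and read_base in ("C", "T"):
            calls[i] = read_base == "C"
    return calls


reference = "ACGTTCGAC"
read = bisulfite_convert(reference, methylated={1, 5})   # the two CpG cytosines
print(read)                               # ACGTTCGAT
print(call_methylation(reference, read))  # {1: True, 5: True, 8: False}
```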

Discussion Questions and Answers:

1. Question: Explain the concept of Topologically Associating Domains (TADs) and their
importance in gene regulation. What role do boundary proteins play, and what happens if
TADs are disrupted?
o Answer: Topologically Associating Domains (TADs) are 3D chromatin
domains within the nucleus that bring enhancers close to their target promoters,
facilitating gene regulation. They are crucial for ensuring proper gene expression
patterns, are typically conserved across species, and are delineated by boundary
elements (insulators), regulatory DNA sequences bound by boundary proteins. These
boundaries, often bound by the CTCF transcription factor, act as borders that
prevent the spread of heterochromatin and block enhancers from activating
unintended genes. Disruption of TADs can lead to abnormal gene regulation, such as
ectopic gene expression, and is linked to rare diseases.
2. Question: Describe the major epigenetic mechanisms that regulate gene expression. How
do DNA methylation and histone modifications work, and what is their impact on gene
activity?
o Answer: The major epigenetic mechanisms are DNA methylation, histone
modification, imprinting, and X-inactivation.
 DNA methylation involves adding methyl groups to CpG islands,
typically in promoter regions, by DNA methyltransferases. This can
silence genes by blocking transcription factors from binding or by
recruiting repressor proteins. High methylation in promoters correlates
with gene silencing and is heritable through mitosis.
 Histone modifications are chemical changes to histone tails, such as
acetylation, methylation, and phosphorylation. These modifications are
interpreted as a "histone code" that defines the chromatin state. For
example, acetylation generally opens chromatin and activates gene
expression, while methylation can either activate or repress gene
expression depending on the specific amino acid and degree of
modification. These modifications influence DNA accessibility for
transcription and are also heritable through mitosis.
3. Question: Discuss the different levels at which gene expression is regulated,
differentiating between transcriptional and post-transcriptional control. Provide examples
of specific mechanisms at each level.
o Answer: Gene expression is regulated at multiple levels to control when, where,
and how much a gene is expressed.
 Transcriptional Control (in the nucleus) determines if and how much
mRNA is made from a gene. Key mechanisms include:
 Chromatin remodeling, where the open (euchromatin) or
condensed (heterochromatin) state of DNA influences accessibility
for RNA polymerase, often through histone modifications like
acetylation (activation) or methylation (activation/repression).
 Promoter accessibility, as transcription factors bind to core and
proximal promoter elements to initiate or block transcription.
 The activity of various RNA polymerases (Pol I, II, III) dictates
which types of RNA are transcribed.
 Post-Transcriptional Control (mostly in the cytoplasm) regulates gene
expression after mRNA has been transcribed. Mechanisms include:
 Pre-mRNA processing, such as RNA splicing (removal of introns
and joining of exons to form mature mRNA) and alternative
splicing (producing different protein isoforms from a single gene).
5' capping and poly-A tail addition also regulate stability and
translation.
 mRNA stability, controlled by how long an mRNA molecule
persists before being degraded by exonucleases.
 Translation control, where mechanisms like microRNAs
(miRNAs) bind to the 3' UTR of mRNAs to either inhibit
translation initiation or cause mRNA degradation, providing a fast
cellular response.
 Protein processing (e.g., misfolded protein degradation, covalent
modifications like phosphorylation, or proteolytic cleavage)
ensures proteins are functional.
 Protein transport directs proteins to their correct cellular
destinations.

4. Genetic Variation

This section delves into the origins and types of genetic variations, their consequences,
mechanisms of repair, and how these variations are studied using various molecular techniques.
It also touches on comparative genomics and ancient DNA.

Key Concepts:

 Genetic Variation
o Origin:
 Replication errors: DNA polymerase mispairs bases (can be fixed by
DNA mismatch repair).
 Replication slippage: Occurs typically in Short Tandem Repeats (STRs),
leading to insertions or deletions (indels).
 Chromosome segregation/recombination errors: Can cause inversions,
deletions, and translocations, or aneuploidy (e.g., non-disjunction leading
to Down, Turner, Klinefelter syndromes).
 Endogenous DNA damage: Spontaneous loss of bases, Reactive Oxygen
Species (ROS), deamination (C→U; 5-methylcytosine→T).
 External mutagens: Ionizing radiation (breaks DNA), UV radiation
(forms thymine dimers), hydrocarbons from smoke/pollution.
o Types of Mutations:
 1. Substitutions (SNVs - Single Nucleotide Variants): Change a single
nucleotide.
 Transition: Purine to purine (A↔G) or pyrimidine to pyrimidine
(C↔T).
 Transversion: Purine to pyrimidine or vice versa.
 2. Indels: Insertions or deletions of nucleotides, often causing frameshifts.
 3. Structural variants (SVs): Large-scale rearrangements, including Copy
Number Variants (CNVs: duplications or deletions of DNA segments) as well as
inversions and translocations.
 4. Tandem repeat expansions: Increase in the number of short repetitive
DNA sequences, causing diseases like Huntington's disease and myotonic
dystrophy (the latter largely through RNA toxicity).
 Balanced mutations: No net gain or loss of DNA, but can still disrupt
genes (e.g., inversions, reciprocal translocations).
 Unbalanced mutations: Change in copy number, typically cause disease
(e.g., deletions, duplications, aneuploidy).
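The transition/transversion distinction above reduces to asking whether the reference and alternate bases belong to the same chemical class (purine or pyrimidine). A minimal sketch with hypothetical function names; the transition/transversion (Ts/Tv) ratio it also computes is a common summary statistic for a set of SNV calls.

```python
from typing import Iterable, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snv(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a substitution: {ref}>{alt}")
    # Same class (purine->purine or pyrimidine->pyrimidine) = transition
    return "transition" if (ref in PURINES) == (alt in PURINES) else "transversion"


def ts_tv_ratio(snvs: Iterable[Tuple[str, str]]) -> float:
    """Ratio of transitions to transversions in a list of (ref, alt) pairs."""
    kinds = [classify_snv(r, a) for r, a in snvs]
    tv = kinds.count("transversion")
    return kinds.count("transition") / tv if tv else float("inf")


print(classify_snv("A", "G"))   # transition
print(classify_snv("C", "A"))   # transversion
print(ts_tv_ratio([("A", "G"), ("C", "T"), ("G", "T")]))   # 2.0
```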
 Consequences of Genetic Variation
o Loss of Function:
 Nonsense mutation: Introduces a premature stop codon, which can trigger
nonsense-mediated mRNA decay if it occurs early in the coding region
(e.g., in ASS1 leading to citrullinemia).
 Missense mutation: Changes a single amino acid; compared here with
synonymous changes:
 Synonymous: No amino acid change (not missense), low risk.
 Tolerated missense: Chemically similar substitution, mild or no effect.
 Not tolerated missense: Major disruption in function, high risk (e.g.,
in beta thalassemia).
 Splice site mutation: Disrupts intron/exon boundaries (5’ splice donor or
3’ splice acceptor). Leads to exon skipping, new exons, or enlarged exons.
 Frameshift mutations: Shift the reading frame due to indels, usually
leading to an early stop codon and truncated or non-functional proteins.
o Gain of Function:
 Structural rearrangements: Can lead to chimeric genes (combinations of
two or more distinct genes) or ectopic expression (gene expressed where
it's not normally). Example: Enhancers placed next to oncogenes.
 RNA toxicity: Usually from unstable repeat expansions where abnormally
long RNA transcripts form stable secondary structures (hairpins) that trap
RNA-binding proteins, disrupting RNA processing (e.g., myotonic
dystrophy).
 Gene duplications: Lead to more transcripts (e.g., IGF2 duplication).
 Missense mutations: Can change protein functions to a detrimental gain
(e.g., dominant oncogenes).
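How a single-base change is classified (synonymous, missense, nonsense) depends only on the affected codon, and whether an indel causes a frameshift depends only on whether its length is a multiple of 3. The sketch below uses the standard genetic code and assumes a coding sequence (CDS) that begins at the start codon; it ignores splicing, UTRs and nonsense-mediated decay, and the example CDS and function names are made up.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_substitution(cds: str, pos: int, alt: str) -> str:
    """Effect of replacing the base at 0-based CDS position `pos` with `alt`."""
    cds = cds.upper()
    start = pos - pos % 3                    # first base of the affected codon
    ref_codon = cds[start:start + 3]
    offset = pos % 3
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"       # premature stop codon
    if ref_aa == "*":
        return "stop-loss"
    return "missense"


def is_frameshift(indel_length: int) -> bool:
    """Indels whose length is not a multiple of 3 shift the reading frame."""
    return indel_length % 3 != 0


cds = "ATGGCTTGGTAA"                           # Met-Ala-Trp-Stop
print(classify_substitution(cds, 5, "C"))     # synonymous (GCT -> GCC, Ala)
print(classify_substitution(cds, 3, "A"))     # missense (GCT -> ACT, Ala -> Thr)
print(classify_substitution(cds, 8, "A"))     # nonsense (TGG -> TGA)
print(is_frameshift(1), is_frameshift(3))     # True False
```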
 Repair Mechanisms
o Base-excision repair: Fixes modified bases (e.g., deaminated C to U).
o Nucleotide-excision repair: Removes bulky lesions (e.g., UV damage).
o Mismatch repair: Fixes replication errors.
o Nonsense-mediated decay: Degrades transcripts with premature stop codons (an
mRNA surveillance pathway rather than a DNA repair mechanism).
 Population Genetic Variation
o Negative (Purifying) Selection: Removes harmful alleles from a population,
leading to conserved regions. In bottlenecked or inbred populations, Runs of
Homozygosity (ROH), long stretches where both chromosome copies are identical,
expose recessive harmful alleles to selection; their removal in this setting is
called "purging".
o Positive (Adaptive) Selection: Favors beneficial mutations, leading to "selective
sweeps" where advantageous alleles become more common and heterozygosity is
reduced in selected regions. Example: Olfactory genes in dogs.
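Runs of homozygosity can be located by scanning genotypes along a chromosome for long stretches without heterozygous calls. The sketch assumes genotypes coded as 0, 1 or 2 copies of the alternate allele at consecutive SNPs (so 1 = heterozygous) and a hypothetical minimum run length; dedicated tools additionally consider physical length, SNP density, and tolerance for occasional genotyping errors or missing calls.

```python
from typing import List, Sequence, Tuple


def find_roh(genotypes: Sequence[int], min_snps: int = 5) -> List[Tuple[int, int]]:
    """Return (first, last) marker indices of runs of >= min_snps homozygous calls.

    Genotypes are coded 0, 1 or 2 copies of the alternate allele; 1 is heterozygous.
    Any value other than 0 or 2 ends the current run.
    """
    runs = []
    start = None
    for i, g in enumerate(list(genotypes) + [1]):   # trailing heterozygote closes a final run
        if g in (0, 2):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    return runs


genotypes = [0, 2, 2, 0, 2, 1, 0, 0, 1, 2, 2, 2, 2, 2, 2]
print(find_roh(genotypes))   # [(0, 4), (9, 14)]
```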
 Chromosomal Abnormalities & Structural Variations
o Chromosome Types and Structure:
 Euploidy: Complete set of chromosomes (e.g., 46 in humans).
 Aneuploidy: Missing or extra individual chromosomes (e.g., monosomy,
trisomy, nullisomy). Examples: Trisomy 21 (Down syndrome), Trisomy
13 (Patau syndrome), Trisomy 18 (Edwards syndrome), Monosomy X
(Turner syndrome).
 Structural abnormalities: Can be balanced (e.g., inversions, reciprocal
translocations, no gain/loss of genetic material) or unbalanced (e.g.,
deletions, duplications, Robertsonian translocations, gain/loss of material).
 A/B chromatin compartments: Active (A) vs. repressed (B) chromatin
regions.
 CTCF boundaries: CTCF-bound DNA insulators critical for organizing TADs.
o Chromosomal Translocation:
 Reciprocal: Swap between two non-homologous chromosomes.
 Robertsonian: Fusion of the long arms of two acrocentric chromosomes at
the centromere, with loss of the short arms (can cause trisomies in offspring).
o TADs (Topologically Associating Domains): 3D genome units that restrict
enhancer-promoter activity. Disruption can cause ectopic gene expression, cancer,
and developmental disorders. Studied with Hi-C sequencing.
 Techniques to Study Chromosomal Arrangements
o Karyotyping: Visualizes the full set of chromosomes in a cell (e.g., with
G-banding); useful for detecting large-scale changes, including balanced
abnormalities.
o Chromosome painting: Uses fluorescently labeled DNA probes that bind to
specific chromosomes, detecting translocations or fusions.
o FISH (Fluorescence in situ Hybridization): Uses DNA probes with fluorescent
dyes to bind to DNA sequences on chromosomes, detecting translocations,
inversions, duplications, deletions, and CNVs.
o Comparative mapping: Aligns genes or markers across species to identify
conserved synteny (blocks of genes with the same order). Tools include genome
browsers, sequence alignments (e.g., BLAST), linkage maps, and sequencing
data.
o CGH arrays (Comparative Genomic Hybridization arrays): Detect copy
number differences, including those smaller than 5 Mb that karyotyping cannot
resolve, but not balanced rearrangements.
o Whole Genome Sequencing (WGS) and Hi-C sequencing are also used.
 Comparative Genomics
o Definition: Study of genetic differences and similarities across species.
o Purpose: Identify conserved genes and regulatory elements, understand genome
evolution and function, and select optimal model organisms.
o Purifying/Negative Selection: Removes harmful mutations, preserving
functionally important sequences (coding and non-coding). Indicated by a Ka/Ks
ratio < 1 (Ka: nonsynonymous substitutions, Ks: synonymous substitutions).
o Positive/Adaptive Selection: Drives lineage-specific adaptation and the
development of unique traits suited to an environment (e.g., expanded olfactory
genes in dogs). Indicated by a Ka/Ks ratio > 1.
o G-value paradox: No direct link between gene number and organism complexity;
complexity primarily comes from regulation, not gene count (e.g., humans vs.
wheat).
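o Gene duplication: Generates paralogs (gene copies within a species); orthologs,
by contrast, are related genes in different species that diverged through
speciation. Duplicates can gain new functions (neofunctionalization), split the
ancestral functions between them (subfunctionalization), or act as redundant
backups, increasing complexity.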
o Genome evolution: Includes gene duplication, exon duplication/shuffling
(rearranging exons to produce new genes, increasing functional complexity),
expansion of noncoding regulatory regions (especially in vertebrates), and the G-
value paradox. Also involves lineage-specific changes (e.g., in immune response,
toxin degradation, sensory genes).
o Animal models: Comparative models aid cross-species annotations and disease
gene identification. Dogs (inbred breeds useful for Mendelian diseases) and pigs
(close to humans, useful in metabolic/obesity studies) are examples.
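Once nonsynonymous and synonymous differences and sites have been counted for an aligned pair of coding sequences (for example with the Nei-Gojobori method, not shown), Ka/Ks is the ratio of the two substitution rates. The sketch applies the Jukes-Cantor correction for multiple substitutions at the same site to each proportion; the counts in the example and the function names are invented.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct an observed proportion of differing sites for multiple substitutions."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion must be in [0, 0.75) for Jukes-Cantor")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """Ka/Ks from difference and site counts (e.g. from a Nei-Gojobori count)."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)   # nonsynonymous rate
    ks = jukes_cantor(syn_diffs / syn_sites)         # synonymous rate
    if ks == 0:
        raise ZeroDivisionError("no synonymous differences: Ka/Ks undefined")
    return ka / ks


print(round(ka_ks(5, 300, 10, 100), 3))   # 0.157: below 1, purifying selection
print(ka_ks(30, 300, 2, 100) > 1)         # True: above 1, positive selection
```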
 Techniques to Study DNA Variation
o DNA-Seq (WGS): Detects SNVs, indels, and structural variations; replacing
Sanger sequencing in diagnostics.
o RFLP (Restriction Fragment Length Polymorphism): Restriction enzymes cut
DNA at specific sequences, and variations change band patterns on gels; largely
replaced by SNP chips and sequencing.
o SNP chip: High-throughput genotyping of known SNPs, used for population
genetics, GWAS, and parentage testing.
o TaqMan Assay: Uses allele-specific probes with fluorescent dyes (e.g., FAM,
VIC) in qPCR-based genotyping, used in dog disease testing.
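An RFLP is read from the fragment sizes a restriction enzyme produces. The sketch digests a linear sequence at every occurrence of a recognition site; EcoRI (GAATTC, cutting after the G) is used as the example enzyme, the two alleles and the function name are made up, and only one strand is modeled, so sticky ends are ignored.

```python
from typing import List


def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> List[int]:
    """Fragment lengths after cutting a linear DNA at every recognition site.

    cut_offset is where the enzyme cuts within the site (EcoRI: G^AATTC -> 1).
    """
    seq, site = seq.upper(), site.upper()
    cuts = []
    hit = seq.find(site)
    while hit != -1:
        cuts.append(hit + cut_offset)
        hit = seq.find(site, hit + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


allele_1 = "A" * 10 + "GAATTC" + "A" * 14   # site present
allele_2 = "A" * 10 + "GAATTA" + "A" * 14   # SNP destroys the site
print(digest(allele_1))   # [11, 19] -> two bands
print(digest(allele_2))   # [30]     -> one uncut band
```

A heterozygote would show all three bands (30, 19 and 11 bp) on a gel, which is how the band pattern reveals the genotype.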
 Ancient DNA (aDNA)
o Sources: Good sources are cold environments (permafrost), dry places (deserts,
mummies), and stable, neutral pH conditions (caves). Bad sources are humid/hot
climates, acidic/basic soils, and fluctuating temperatures/humidity.
o Degradation & Damage: aDNA is prone to fragmentation and chemical changes
(e.g., cytosine deamination to uracil; the resulting uracils can be removed with
the USER enzyme).
o Contamination: From the environment, lab, humans, reagents, and previous PCR
products; DNases and RNases introduced during handling can further degrade samples.
o Good practice: Sterile collection, dedicated clean labs, no PCR products in
aDNA areas, use DNase/RNAse-free materials, and accepting when samples are
too degraded.
o Ancient proteins: More stable than DNA, can persist longer, and are detected with
mass spectrometry; they reveal metabolic and phylogenetic information (e.g.,
collagen in T. rex).
o Ancient RNA (aRNA): Historically thought too fragile to survive, but recent
studies have recovered it from mammoths; it is preserved through fast desiccation,
freezing, or chemical treatment, and provides insight into gene expression in
ancient tissues.
 Ethics: Genetic variation studies raise ethical considerations in areas like dog breeding,
human genetic testing, and gene editing in humans and animals.

Discussion Questions and Answers:

1. Question: Describe the various origins of genetic variation. Differentiate between
balanced and unbalanced mutations and provide examples of how each type can impact
an organism.
o Answer: Genetic variation originates from replication errors (DNA pol
mispairing), replication slippage (in STRs, leading to indels), chromosome
segregation/recombination errors (causing inversions, translocations,
aneuploidy), endogenous DNA damage (e.g., deamination, ROS), and external
mutagens (e.g., ionizing radiation, UV light, hydrocarbons).
 Balanced mutations involve no net gain or loss of genetic material, but
DNA segments are rearranged. Examples include inversions (a segment
of a chromosome is reversed) and reciprocal translocations (segments
swap between non-homologous chromosomes). While they do not change
the total amount of DNA, they can still disrupt genes by breaking them or
placing them under new regulatory control.
 Unbalanced mutations involve a change in the copy number of genetic
material. Examples include deletions (loss of a DNA
segment), duplications (gain of a segment), and aneuploidy (missing or
extra individual chromosomes, like Trisomy 21 causing Down syndrome).
These mutations typically have more severe consequences because they
alter gene dosage and usually cause disease.
2. Question: Explain the concepts of "loss of function" and "gain of function" mutations.
Provide specific examples of each and the molecular mechanisms that lead to these
outcomes.
o Answer:
 Loss of function mutations result in a gene product that is non-
functional, partially functional, or absent. Examples include:
 Nonsense mutations, which introduce a premature stop codon,
leading to a truncated protein or triggering mRNA degradation
(e.g., ASS1 mutation causing citrullinemia).
 Frameshift mutations, caused by indels, shift the reading frame,
usually resulting in a non-functional or severely truncated protein.
 Splice site mutations, which disrupt intron/exon boundaries,
leading to incorrect protein products due to exon skipping or
inclusion.
 Gain of function mutations result in a gene product with new or
enhanced activity, or overexpression of a normal gene product. Examples
include:
 Structural rearrangements, which can lead to chimeric
genes (fusions of two distinct genes) or ectopic expression (a gene
expressed in an abnormal location, e.g., an enhancer being placed
next to an oncogene, leading to its overexpression).
 Gene duplications, which result in more transcripts and thus more
protein product (e.g., IGF2 duplication).
 RNA toxicity, where unstable repeat expansions in RNA lead to
abnormal RNA structures that trap RNA-binding proteins,
disrupting normal RNA processing (e.g., myotonic dystrophy).
3. Question: How do comparative genomics studies contribute to our understanding of
genome evolution and function? Discuss the significance of the Ka/Ks ratio and the G-
value paradox in this context.
o Answer: Comparative genomics involves studying genetic differences and
similarities across species to identify conserved genes and regulatory elements.
This helps understand how genomes evolve and function, aiding in the selection
of optimal model organisms.
 The Ka/Ks ratio (ratio of nonsynonymous to synonymous base
substitutions) is a key metric. A ratio < 1 indicates purifying (negative)
selection, where harmful mutations are removed, preserving functionally
important sequences. A ratio > 1 indicates positive (adaptive) selection,
favoring beneficial mutations and driving lineage-specific adaptations.
 The G-value paradox highlights that there is no direct correlation
between the number of protein-coding genes and the complexity of an
organism. For example, humans have ~20,000 genes, while wheat has
~100,000. This suggests that organismal complexity largely arises from
the regulation of genes, rather than simply the total gene count,
emphasizing the importance of non-coding regulatory regions and
alternative splicing in genome evolution.

5. Mapping Mendelian Characters

This section covers the principles of Mendelian genetics, genetic mapping techniques, and the
use of Whole Genome Sequencing (WGS) to identify causative mutations for single-gene
disorders.

Key Concepts:
 Basic Concepts on Mendelian Genetics
o Heritability: The proportion of phenotypic variation in a population attributable
to genetic variation among individuals. It is population-specific and does not
imply inevitability at the individual level. Important in animal breeding for
selecting strategies for desirable traits.
o Mendelian patterns of inheritance: Autosomal dominant/recessive, X-linked
dominant/recessive, Y-linked, and mitochondrial inheritance (from mother).
o Mendel's Laws:
 1. Law of Segregation: Each parent passes one of two alleles to offspring
randomly.
 2. Law of Independent Assortment: Alleles for different genes segregate
independently during gamete formation.
 3. Law of Dominance: Dominant alleles mask recessive ones in
heterozygotes.
o Forces that change allele frequency: Mutation, gene flow, genetic drift, natural
selection, and non-random mating.
o Hardy-Weinberg Equations: Describe allele and genotype frequencies in a
population at equilibrium (large, randomly mating, with no mutation, migration,
or selection).
 Allele frequency: p + q = 1.
 Genotype frequency: p² + 2pq + q² = 1.
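The Hardy-Weinberg equations can be used to test whether observed genotype counts at a biallelic locus fit equilibrium expectations. The sketch estimates p and q from the counts, computes expected counts as p²N, 2pqN and q²N, and returns a chi-square statistic; with 1 degree of freedom, values above about 3.84 indicate a significant departure at the 5% level. The counts and the function name are invented.

```python
from typing import Tuple


def hardy_weinberg(n_AA: int, n_Aa: int, n_aa: int
                   ) -> Tuple[float, float, Tuple[float, float, float], float]:
    """Return (p, q, expected genotype counts, chi-square) for a biallelic locus."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)     # frequency of allele A (2 alleles per individual)
    q = 1 - p                           # p + q = 1
    expected = (p * p * n, 2 * p * q * n, q * q * n)   # p^2, 2pq, q^2 scaled to N
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return p, q, expected, chi2


p, q, expected, chi2 = hardy_weinberg(25, 50, 25)
print(p, expected, chi2)      # 0.5 (25.0, 50.0, 25.0) 0.0
p, q, expected, chi2 = hardy_weinberg(50, 0, 50)
print(chi2)                   # 100.0: far from equilibrium (no heterozygotes)
```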
 Genetic Heterogeneity
o Different genes or mutations can produce the same phenotype.
o 1. Allelic heterogeneity: Different mutations within the same gene lead to the
same phenotype (e.g., over 12 mutations in CFTR cause cystic fibrosis).
o 2. Locus heterogeneity: Mutations in different genes cause the same phenotype
(e.g., retinitis pigmentosa from mutations in over 16 genes).
o 3. Clinical heterogeneity: Different mutations within the same gene lead to
different phenotypes (e.g., different dystrophin mutations cause Duchenne or
Becker muscular dystrophy).
 Haplotype: A combination of alleles at adjacent loci on a chromosome that are inherited
together. Important for tracking inheritance patterns and identifying disease loci.
 Genetic Distance vs. Physical Distance in the Genome
o Genetic distance: Measured in centimorgans (cM), reflects recombination
frequency.
o Physical distance: Measured in base pairs (bp).
o Recombination is not evenly distributed across the genome (has hotspots) and
sex-specific recombination influences genetic maps.
 Other Concepts
o Penetrance: The percentage of individuals with a given genotype who express its
associated phenotype.
o Codominance: Both alleles are fully expressed in the heterozygote.
o Incomplete dominance: Heterozygotes show an intermediate phenotype.
o Variable Expressivity: The degree or intensity of gene expression varies among
individuals with the same genotype.
o Epistasis: One gene masks or modifies the effect of another gene.
o Pleiotropy: One gene affects multiple seemingly unrelated traits.
o Recombination: Exchange of genetic material between homologous
chromosomes during meiosis, forming the basis for estimating distances between
markers on chromosomes via genetic maps.
o Linkage phase: The arrangement of alleles on parental chromosomes (coupling
phase: the two dominant alleles on one chromosome and the two recessive alleles
on the other; repulsion phase: each chromosome carries one dominant and one
recessive allele).
o Informative meiosis: Meiosis that allows determination of whether
recombination has occurred, crucial for linkage analysis.
 Purpose of Genetic Mapping
o Locate genes responsible for traits (e.g., disease loci).
o Determine the relative positions of genes or markers on chromosomes.
o Understand recombination patterns and genome architecture.
o Enable marker-assisted selection in breeding programs.
 Genetic Markers: Polymorphic DNA sequences used to trace inheritance.
o Types: SNPs (single base changes), Microsatellites/STRs (repetitive DNA, very
polymorphic), RFLPs (detected by restriction enzymes).
o Criteria: Must be polymorphic (variable in the population), stable, and easily
genotyped.
 Genetic Maps/Fine Mapping
o Genetic maps: Show the relative positions of markers based on recombination
rates. A map unit of 1 cM (centimorgan) equals 1% recombination frequency.
o Fine mapping: High-resolution mapping to narrow down the location of disease-
causing mutations using dense marker panels and recombination events. Used in
techniques like GWAS and linkage analysis.
 Linkage Mapping
o Main Concepts: Identifies disease loci by examining the inheritance of traits with
nearby genetic markers. It relies on the principle that loci located close together
on the same chromosome are inherited together more often due to reduced
recombination.
 Unlinked genes: On different chromosomes or far apart on the same
chromosome, show independent assortment, with a recombination
frequency of 50%.
 Linked genes: Close together on the same chromosome, usually inherited
together, with a lower recombination frequency. The lower the
recombination frequency, the closer the loci.
o Informative meiosis/recombination: Crucial because it tells you whether
recombination has occurred between a marker and a disease allele.
o LOD score (Logarithm of Odds): Estimates the likelihood of linkage versus no
linkage.
 Formula: LOD(θ) = log₁₀ [L(θ) / L(θ = 0.5)], the likelihood of the data at
recombination frequency θ (linked) divided by its likelihood at θ = 0.5 (unlinked).
 LOD > 3: Significant evidence for linkage (1000:1 odds in favor).
 LOD < -2: Evidence against linkage.
 Values between are inconclusive.
 The peak of a LOD score curve indicates the most probable recombination
frequency.
o Evaluating evidence for linkage in a pedigree: Involves counting informative
meiosis, identifying recombinant vs. parental gametes, visualizing segregation
patterns, and determining the phase (coupling or repulsion).
o Limitations: Requires large, informative families; less useful for complex traits;
needs clear phenotype classification; and has limited resolution, defining only
broad regions unless recombination is very frequent.
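For the simplest case, phase-known meioses that are all informative, the likelihood of the data at recombination frequency θ is θ^R (1 - θ)^NR, where R and NR are the numbers of recombinant and non-recombinant meioses, and no linkage corresponds to θ = 0.5. The sketch evaluates the LOD score over a grid of θ values and reports the peak; real pedigrees with unknown phase or missing genotypes need dedicated linkage software. The counts and function names are invented.

```python
import math
from typing import Tuple


def lod(theta: float, recombinants: int, nonrecombinants: int) -> float:
    """LOD = log10[ L(theta) / L(0.5) ] for phase-known, fully informative meioses."""
    if recombinants > 0 and theta == 0:
        return float("-inf")           # any recombinant rules out theta = 0
    log_l_linked = nonrecombinants * math.log10(1 - theta)
    if recombinants > 0:
        log_l_linked += recombinants * math.log10(theta)
    log_l_unlinked = (recombinants + nonrecombinants) * math.log10(0.5)
    return log_l_linked - log_l_unlinked


def max_lod(recombinants: int, nonrecombinants: int,
            step: float = 0.01) -> Tuple[float, float]:
    """Grid-search theta in [0, 0.5] and return (theta at the peak, peak LOD)."""
    thetas = [i * step for i in range(int(round(0.5 / step)) + 1)]
    best = max(thetas, key=lambda t: lod(t, recombinants, nonrecombinants))
    return best, lod(best, recombinants, nonrecombinants)


print(round(lod(0.0, 0, 10), 2))          # 3.01: 10 meioses, no recombinants
theta, score = max_lod(2, 18)
print(round(theta, 2), round(score, 2))   # 0.1 3.2 -> significant (LOD > 3)
```

The peak falls at θ = R / (R + NR), the observed recombination frequency; here 2 recombinants in 20 meioses gives θ = 0.1, roughly 10 cM.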
 Identification of Mutations for Mendelian Traits by WGS
o Whole Genome Sequencing (WGS): Considered the gold standard for
identifying causative mutations in monogenic (Mendelian) traits.
o Analysis of NGS data: NGS technologies (e.g., Illumina HiSeq) generate billions
of short read sequences, resulting in datasets ranging from gigabytes to terabytes.
This high volume necessitates automated pipelines, high-performance computing,
and systematic data processing strategies.
o Overview of omics data analysis workflow (pipeline):
 1. Quality control of raw reads: Trimming low-quality bases and removing
adapter sequences to reduce sequencing errors and artifacts.
 2. Alignment to reference genome: Maps each read to its position in the
reference, producing SAM/BAM files of aligned reads used to locate variants.
 3. Detect genetic variants: (e.g., SNPs, indels) in aligned reads compared
to the reference genome.
 4. Annotate variants: With potential functional consequences and filter
out irrelevant variants to prioritize those likely to affect gene function or
cause diseases.
o Handling data in multi-node computer clusters: A cluster is made of multiple
nodes (computers); on clusters or cloud platforms, data are distributed across
nodes so that tools can run simultaneously (parallel processing), and shared
storage holds the large files.
o Translating technical data to biological insight: The ultimate goal is to identify
a single causative mutation or meaningful pattern.
 PRA example: In the case of Progressive Retinal Atrophy (PRA) in dogs,
filtering eliminated millions of variants down to one, which was then
biologically validated to cause the disease, leading to the development of a
genetic test to prevent breeding affected dogs.
o WGS workflow:
 Starts with genomic DNA.
 Sonication to shear DNA into smaller fragments.
 End repair to make fragment ends blunt and compatible with adapter ligation.
 Ligate sequencing adapters.
 PCR amplification.
 Select appropriately sized DNA fragments.
 Sequencing.
o Example: PRA in Old Danish Pointer:
 Compared 5 cases vs. 5 controls, performed 30X sequencing with Illumina
HiSeq.
 Bioinformatics pipeline (GATK, Ensembl VEP) identified variants.
 Filtering prioritized variants that were homozygous in cases but not in
controls, and were located in known protein-coding regions.
 Found a 1 bp insertion in exon 1 of a specific gene, causing a frameshift
that introduces a premature stop codon. This led to a genetic test for the breed.
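The quality-control step of the pipeline can be sketched for a single FASTQ read: Phred scores are decoded from the quality string (assuming the common Phred+33 encoding, where Q20 means a 1% error probability), any read-through adapter is cut off, and low-quality bases are trimmed from the 3' end. The adapter sequence, quality threshold and function names are assumptions (use your library kit's adapter), and partial adapter matches at the very end of a read are not handled; production pipelines use dedicated trimming tools.

```python
from typing import List, Tuple


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def trim_read(seq: str, qual: str, adapter: str = "AGATCGGAAGAGC",
              min_quality: int = 20) -> Tuple[str, str]:
    """Remove a full adapter match, then trim 3' bases below min_quality."""
    hit = seq.find(adapter)
    if hit != -1:                        # read ran into the adapter: keep the insert only
        seq, qual = seq[:hit], qual[:hit]
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1                         # drop trailing low-quality bases
    return seq[:end], qual[:end]


print(trim_read("ACGTACGTAC", "IIIIIIII##"))            # ('ACGTACGT', 'IIIIIIII')
print(trim_read("ACGTAGATCGGAAGAGCTT", "I" * 19)[0])    # ACGT
```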
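The PRA filtering logic (keep variants homozygous for the alternate allele in every case, not homozygous in any control, and predicted to alter a protein) can be written as a short filter. The Variant record, sample names and genotype strings are hypothetical; the consequence labels are Sequence Ontology terms of the kind reported by annotation tools such as Ensembl VEP, and which terms count as protein-altering is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Consequence terms treated as protein-altering in this sketch (assumed list)
PROTEIN_ALTERING = {
    "stop_gained", "frameshift_variant", "missense_variant",
    "splice_donor_variant", "splice_acceptor_variant",
}


@dataclass
class Variant:
    variant_id: str
    consequence: str
    genotypes: Dict[str, str] = field(default_factory=dict)   # sample -> "0/0", "0/1", "1/1"


def filter_recessive(variants: List[Variant], cases: List[str],
                     controls: List[str]) -> List[Variant]:
    """Keep protein-altering variants homozygous-alt in all cases and in no control."""
    kept = []
    for v in variants:
        if v.consequence not in PROTEIN_ALTERING:
            continue
        if not all(v.genotypes.get(s) == "1/1" for s in cases):
            continue
        if any(v.genotypes.get(s) == "1/1" for s in controls):
            continue
        kept.append(v)
    return kept


cases, controls = ["case1", "case2"], ["ctrl1", "ctrl2"]
variants = [
    Variant("v1", "frameshift_variant",
            {"case1": "1/1", "case2": "1/1", "ctrl1": "0/1", "ctrl2": "0/0"}),
    Variant("v2", "intron_variant",
            {"case1": "1/1", "case2": "1/1", "ctrl1": "0/0", "ctrl2": "0/0"}),
    Variant("v3", "missense_variant",
            {"case1": "1/1", "case2": "1/1", "ctrl1": "1/1", "ctrl2": "0/0"}),
]
print([v.variant_id for v in filter_recessive(variants, cases, controls)])   # ['v1']
```

Heterozygous controls (carriers, "0/1") are deliberately allowed, as expected for a recessive disease like PRA.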

Discussion Questions and Answers:

1. Question: Define genetic heterogeneity and differentiate between its three types: allelic,
locus, and clinical heterogeneity. Provide an example for each.
o Answer: Genetic heterogeneity describes situations where different genes or
mutations can produce the same or similar phenotypes.
 Allelic heterogeneity occurs when different mutations within the same
gene cause the same phenotype. For example, over 12 distinct mutations
in the CFTR gene can all lead to cystic fibrosis.
 Locus heterogeneity describes cases where mutations in different
genes result in the same phenotype. An example is retinitis pigmentosa,
which can be caused by mutations in over 16 different genes.
 Clinical heterogeneity refers to situations where different
mutations within the same gene lead to different phenotypes. For instance,
various mutations in the dystrophin gene can cause either Duchenne
muscular dystrophy (more severe) or Becker muscular dystrophy (milder).
2. Question: Explain the principles of linkage mapping and the significance of the LOD
score in establishing genetic linkage. What are the main limitations of this technique?
o Answer: Linkage mapping aims to identify disease loci by observing how a trait
(like a disease) co-segregates with known genetic markers within families. It
operates on the principle that genes or markers located physically close on the
same chromosome (linked genes) will be inherited together more frequently than
if they were unlinked, due to reduced recombination between them.
The recombination frequency between linked loci is less than 50%.
 The LOD score (Logarithm of Odds) is a statistical measure used to
determine the likelihood of linkage. A LOD score of 3 or
greater (meaning the odds of linkage are 1000 times higher than the odds
of no linkage) is considered significant evidence for linkage, while a score
of -2 or less provides evidence against linkage.
 Limitations of linkage mapping include the need for large, informative
families, its reduced utility for complex traits (which have multiple genetic
and environmental factors), the requirement for clear phenotype
classification, and its limited resolution, often only narrowing down broad
chromosomal regions.
3. Question: Outline the general workflow for identifying causative mutations in Mendelian
traits using Whole Genome Sequencing (WGS). How does bioinformatics play a critical
role in this process, using the PRA example?
o Answer: The WGS workflow for identifying causative mutations in Mendelian
traits typically starts with obtaining genomic DNA, which is then fragmented
(e.g., by sonication). The ends of these fragments are repaired, and sequencing
adapters are ligated. After PCR amplification and size selection, the DNA
fragments are sequenced.
 Bioinformatics is critical for processing the massive amounts of data
generated by WGS. The workflow involves several steps: quality
control of raw reads (trimming low-quality bases and adapter
sequences), alignment of reads to a reference genome (creating
SAM/BAM files), variant detection (identifying SNPs and indels by
comparing aligned reads to the reference), and variant annotation and
filtering (assigning potential functional consequences and prioritizing
variants that are likely to affect gene function or cause disease).
 In the PRA example, WGS was performed on affected dogs and controls.
Bioinformatics tools were used to align reads, detect variants, and then
filter these variants to find ones that were homozygous in affected dogs
but absent in controls, particularly prioritizing those in protein-coding
regions. This systematic filtering and annotation, powered by
bioinformatics, enabled the identification of a 1 bp insertion leading to a
nonsense mutation, which was then validated as the causative mutation.
This process demonstrates how bioinformatics translates raw sequencing
data into biologically meaningful insights and ultimately can lead to
practical applications like genetic tests.
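The LOD score from Question 2 can be computed directly for the simplest case. The sketch below assumes phase-known, fully informative meioses, so the likelihood at recombination fraction θ is θ^R (1 - θ)^NR, and LOD = log10[L(θ) / L(0.5)]. The function names are illustrative; real pedigree analyses use dedicated linkage software that handles unknown phase and missing genotypes.

```python
import math


def lod(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """LOD(theta) = log10[L(theta) / L(0.5)] for phase-known, fully informative meioses.

    L(theta) = theta**R * (1 - theta)**NR; theta = 0.5 corresponds to no linkage.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if theta == 0.0:
        if recombinants > 0:
            return float("-inf")  # a recombinant is impossible at theta = 0
        log_l_theta = 0.0         # 0**0 * 1**NR = 1
    else:
        log_l_theta = (recombinants * math.log10(theta)
                       + nonrecombinants * math.log10(1 - theta))
    log_l_null = (recombinants + nonrecombinants) * math.log10(0.5)
    return log_l_theta - log_l_null


def max_lod(recombinants: int, nonrecombinants: int, step: float = 0.001):
    """Grid search over theta; returns (maximum LOD, theta at the maximum)."""
    n_steps = int(round(0.5 / step))
    grid = [i * step for i in range(n_steps + 1)]
    return max((lod(recombinants, nonrecombinants, t), t) for t in grid)


if __name__ == "__main__":
    # 10 meioses, no recombinants: LOD at theta = 0 equals 10 * log10(2), about 3.01
    print(round(lod(0, 10, 0.0), 2))
    score, theta_hat = max_lod(2, 18)  # 2 recombinants in 20 meioses
    print(round(score, 2), round(theta_hat, 3))
```

The first example shows why about ten fully informative, non-recombinant meioses are needed to reach the LOD 3 threshold, which is one reason linkage mapping needs large, informative families.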

6. Mapping Genes for Complex Phenotypes

This section focuses on the genetic basis of complex traits, which are influenced by multiple
genes and environmental factors. It covers concepts like Linkage Disequilibrium (LD), Genome-
Wide Association Studies (GWAS), and the challenges in mapping these traits.

Key Concepts:

 Purpose of Mapping Genes for Complex Phenotypes
o Identify genetic loci that contribute to phenotypic variation in traits influenced by
many genetic and environmental factors.
o Helps detect QTLs (Quantitative Trait Loci) and common SNPs associated with
diseases.
o Complements approaches used for Mendelian traits by targeting non-Mendelian,
multifactorial phenotypes.
o Importance: Understand disease mechanisms, predict risks for stratified
medicine, inform breeding strategies (especially in animals), and identify drug
targets and therapeutic strategies.
 Complex Phenotype/Disease
o Definition: A phenotype affected by both genetic and environmental factors.
o Includes:
 Quantitative traits: Measured on a continuous scale (e.g., BMI, blood
pressure, height).
 Complex dichotomous traits: Can be separated into two categories (e.g.,
diabetes, heart disease).
o Key Concepts:
 Multifactorial inheritance: Traits influenced by many genes plus
environmental factors, leading to continuous variation.
 QTL (Quantitative Trait Loci): Regions of the genome associated with
variation of a quantitative trait. These traits can be controlled by many loci
with small additive effects, studied by linkage and association
mapping. QTL mapping is used in animal genetics to locate these loci.
 Family clustering: Complex traits often show family clustering,
suggesting a genetic component, but shared environmental factors must
also be considered (e.g., through twin/adoption studies).
 Phenotypes cannot always be classified into a diagnostic box because they
can be continuous.
 Obesity Case Study
o Definition: Abnormal or excess fat accumulation that may impair health, often
classified by BMI (BMI ≥ 30 kg/m² defines obesity; 30-34.9 is Class I).
o Causes (multifactorial): Genetic predisposition (e.g., polygenic inheritance)
interacting with environmental/lifestyle factors (diet, activity), hormone/metabolic
dysregulation, altered gut-brain axis and microbiome, and neuroendocrine
regulation (e.g., leptin, ghrelin, GLP-1).
o Consequences (many comorbidities): Type 2 diabetes, NAFLD (non-alcoholic
fatty liver disease), cardiovascular disease, OSA (obstructive sleep apnea), PCOS,
cancer, joint issues, depression.
o Treatments:
 Lifestyle changes: Diet, physical activity, behavioral therapy (first line,
but high relapse rates).
 Pharmacotherapy: GLP-1 receptor agonists like semaglutide (increase
satiety, delay gastric emptying, reduce appetite). Semaglutide effects
include lower glucagon, increased insulin, suppressed appetite, and
promoted satiety.
 Surgery: For severe cases (e.g., bariatric procedures like gastric bypass).
o Case study: Göttingen Minipig Model: Used in preclinical drug discovery to
manipulate diet and test pharmacological interventions. Highlights semaglutide's
role in reducing weight while preserving lean mass and maintaining energy
expenditure. Semaglutide reduced food intake and decreased weight gain,
preserved fat-free mass, and led to better maintenance of resting metabolic rate
(RMR) under weight loss conditions.
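BMI is weight in kilograms divided by the square of height in metres. The sketch below applies the WHO adult cut-offs used in the obesity definition above (human adults only). The function names are illustrative.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def who_bmi_class(value: float) -> str:
    """WHO adult BMI categories; obesity starts at 30 kg/m^2."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal weight"
    if value < 30:
        return "overweight"
    if value < 35:
        return "obesity class I"
    if value < 40:
        return "obesity class II"
    return "obesity class III"


if __name__ == "__main__":
    b = bmi(98, 1.75)  # 98 kg, 1.75 m
    print(round(b, 1), who_bmi_class(b))
```

BMI is a crude proxy for fat mass; it does not separate fat from lean mass, which is why studies such as the minipig work also measure fat-free mass directly.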
 Preclinical Drug Discovery
o Target to treatment pipeline: Identify genes with key roles in disease, confirm
modification affects disease outcomes, find molecules that interact with the gene,
improve potency and selectivity, perform preclinical testing (safety/efficacy in
animal models) before clinical trials.
o Energy expenditure: Important for weight maintenance, includes resting
metabolic rate, thermic effect of food, and physical activity. Can be estimated
using the Weir equation.
o Animal models: Used to mimic obesity physiology (e.g., rodents and pigs). Pigs are
useful because, like adult humans, they have little or no brown adipose tissue. The
Göttingen Minipig is a model closer to the human metabolic profile.
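The Weir equation estimates energy expenditure from indirect calorimetry (oxygen consumption and carbon dioxide production). A sketch of the commonly used abbreviated form, which omits the urinary nitrogen correction term; the units (litres per minute, kcal per day) and function name are choices made for this example.

```python
MINUTES_PER_DAY = 1440


def weir_kcal_per_day(vo2_l_per_min: float, vco2_l_per_min: float) -> float:
    """Abbreviated Weir equation (urinary nitrogen term omitted).

    EE (kcal/day) = (3.941 * VO2 + 1.106 * VCO2) * 1440, with VO2 and VCO2 in L/min.
    """
    return (3.941 * vo2_l_per_min + 1.106 * vco2_l_per_min) * MINUTES_PER_DAY


if __name__ == "__main__":
    # Hypothetical resting measurements: VO2 = 0.25 L/min, VCO2 = 0.20 L/min
    print(round(weir_kcal_per_day(0.25, 0.20)))
```

Comparing this estimate before and after a weight-loss intervention is one way to assess whether resting metabolic rate is maintained.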
 Linkage Disequilibrium (LD)
o Definition: Non-random association of alleles at different loci in a population.
Alleles are in LD if they occur together more or less often than expected by
chance.
o Importance in GWAS: GWAS detects SNPs that are in LD with a causal variant,
not necessarily the causal variant itself. LD patterns vary by species, breed,
population history, and recombination rate.
o Measures: D’ and r² quantify LD.
o Decay: LD decays over time due to recombination; the extent of decay affects the
resolution of association studies.
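The LD measures and decay described above can be computed from haplotype and allele frequencies. D = p_AB - p_A·p_B; D' scales D by its maximum possible value given the allele frequencies; r² is D² divided by the product of the allele-frequency variances. The decay function assumes random mating and a constant recombination fraction c, so D_t = D_0 (1 - c)^t. This is a sketch; allele frequencies are assumed to lie strictly between 0 and 1.

```python
def ld_measures(p_ab: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A at locus 1 and allele B at locus 2.
    p_a, p_b: frequencies of alleles A and B (assumed strictly between 0 and 1).
    """
    d = p_ab - p_a * p_b  # deviation from independence (random association)
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max  # signed, in [-1, 1]; |D'| is usually reported
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


def ld_after_generations(d0: float, c: float, t: int) -> float:
    """Expected D after t generations of random mating: D_t = D_0 * (1 - c)**t."""
    return d0 * (1 - c) ** t


if __name__ == "__main__":
    print(ld_measures(0.5, 0.5, 0.5))  # (0.25, 1.0, 1.0): only AB and ab haplotypes exist
    print(ld_measures(0.4, 0.5, 0.6))  # partial LD: D' = 0.5 but r^2 is much lower
    print(ld_after_generations(0.25, 0.01, 100))  # roughly 0.09 after 100 generations
```

The decay function shows why loosely linked markers (larger c) lose LD quickly, whereas in populations with few generations since a bottleneck long-range LD persists.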
 GWAS (Genome-Wide Association Studies)
o Association: Co-occurrence of a SNP and a phenotype.
o Causes of association: Direct causation, natural selection/epistatic effect,
population stratification, Type 1 error (false positives), and linkage disequilibrium
(LD).
o Reliance on LD: GWAS scans the entire genome for markers that are in LD with
disease-associated variants. It uses probes matching known SNPs as markers,
assuming that disease-causing alleles are inherited along with neighboring
alleles in haploblocks (large stretches of DNA inherited together). LD generally
decreases as the distance between markers increases.
 Bottleneck populations have limited recombination over generations,
resulting in long haploblocks, requiring smaller sample sizes because one
marker can cover a large region due to long LD. Humans have shorter LD
blocks, requiring many SNP markers. Cross-breeding can increase
recombination to narrow down candidate regions.
o Problem of "hidden heritability": GWAS often explains only a small
proportion of the heritability estimated from family studies.
 Possible causes: Small effect sizes missed due to lack of power, gene-
gene (epistatic) and gene-environment interactions, structural variants or
rare variants not captured, and epigenetics or non-additive genetic
architecture.
o Multiple testing problem: GWAS tests millions of SNPs, which increases the
chance of false positives (Type 1 error). False positives can also arise from
population stratification (differences in allele frequencies due to ancestry).
 Correction: P-values (plotted as -log10(p) in Manhattan plots) must be
adjusted for the number of tests. Options include Bonferroni correction
(conservative; 0.05 / 1,000,000 tests = 5 × 10⁻⁸, the conventional genome-wide
significance threshold), permutation tests, and False Discovery Rate (FDR)
control.
o Uses of GWAS: Identify genetic loci associated with disease susceptibility,
generate biological hypotheses, predict risk (polygenic risk scores), and guide
precision medicine and breeding programs.
o Limitations of GWAS: Does not directly identify the causal variant. Has less
power to detect rare variants, epistasis, or environmental interactions. Often
identifies variants in non-coding regions, making biological interpretation
challenging. Can be confounded by population stratification and phenotyping
challenges (e.g., diagnostic thresholds, continuous vs. binary traits).
o Outcome of a GWAS analysis: Identification of significant SNPs associated with
a phenotype, visualized via Manhattan plots.
o Follow-up steps: Identify haplotype blocks with associations, sequence candidate
regions to find causal variants, validate candidates by genotyping (e.g., TaqMan
assays), and perform functional studies to determine biological effects.
o Example: MMVD in Cavalier King Charles Spaniels: GWAS identified SNPs
in LD with a causal variant. Subsequent genotyping revealed a HYAL4
variant associated with disease risk. The risk estimate (odds ratio) was translated
into relative risk.
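The source does not say which method was used to translate the MMVD odds ratio into relative risk. One commonly used approximation (Zhang and Yu) is RR = OR / ((1 - P0) + P0 × OR), where P0 is the disease risk in animals without the risk genotype. The sketch assumes a 2×2 table of risk-genotype carriers vs non-carriers; the counts and P0 below are hypothetical.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """2x2 table: a = cases with the risk genotype, b = controls with it,
    c = cases without it, d = controls without it."""
    return (a * d) / (b * c)


def or_to_rr(or_value: float, p0: float) -> float:
    """Zhang & Yu approximation: RR = OR / ((1 - p0) + p0 * OR).

    p0 is the disease risk in individuals without the risk genotype.
    """
    return or_value / ((1 - p0) + p0 * or_value)


if __name__ == "__main__":
    or_value = odds_ratio(60, 40, 30, 70)  # hypothetical counts
    print(or_value)
    print(round(or_to_rr(or_value, 0.30), 2))  # baseline risk assumed to be 30%
```

When the outcome is rare (small P0) the denominator approaches 1 and RR ≈ OR; for common conditions such as MMVD in this breed the OR overstates the relative risk, which is why the conversion matters.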

Discussion Questions and Answers:

1. Question: Explain what a "complex phenotype" or "complex disease" is, providing
examples. How does the concept of Quantitative Trait Loci (QTLs) relate to these
conditions, and what is the role of family clustering in their study?
o Answer: A complex phenotype or complex disease is a trait or condition
influenced by multiple genetic and environmental factors, rather than a single
gene. Examples include quantitative traits like BMI and blood
pressure (measured on a continuous scale) or complex dichotomous traits
like diabetes and heart disease (where individuals either have the condition or
not).
 Quantitative Trait Loci (QTLs) are regions of the genome that are
associated with variation in a quantitative trait. Complex traits are often
controlled by many QTLs, each having a small additive effect.
 Family clustering is common for complex traits, suggesting a genetic
component. However, when studying family clustering, it's crucial to also
consider shared environmental factors (e.g., diet, lifestyle) that might
contribute to the observed patterns, which can be distinguished using
studies like twin or adoption studies.
2. Question: Describe Linkage Disequilibrium (LD) and its critical importance in Genome-
Wide Association Studies (GWAS). How do factors like population bottlenecks influence
LD patterns and, consequently, GWAS design?
o Answer: Linkage Disequilibrium (LD) refers to the non-random association of
alleles at different loci in a population, meaning certain combinations of alleles
occur together more or less frequently than would be expected by chance.
In GWAS, LD is critical because the studies often do not identify the causal
variant directly but instead detect SNPs that are in LD with the causal variant.
These SNPs act as markers, indicating the presence of a nearby disease-causing
allele within a haploblock (a large stretch of DNA inherited together).
 Population bottlenecks (severe reductions in population size)
significantly influence LD patterns. In such populations, recombination is
limited over generations, leading to longer haploblocks and extended
regions of high LD. This means that fewer SNP markers are needed to
cover the genome in GWAS, and smaller sample sizes can be sufficient
because one marker can effectively represent a larger genomic region. In
contrast, human populations generally have shorter LD blocks, requiring a
greater density of SNP markers for GWAS.
3. Question: Discuss the "problem of hidden heritability" and the "multiple testing
problem" in the context of GWAS. How are researchers attempting to address these
challenges?
o Answer:
 The "problem of hidden heritability" refers to the observation that
GWAS typically explains only a small proportion of the heritability
estimated from family studies for complex traits. Possible reasons include
that GWAS may miss small effect sizes due to insufficient statistical
power, fail to capture gene-gene (epistatic) or gene-environment
interactions, overlook structural variants or rare variants, or not account
for epigenetic factors or non-additive genetic architecture.
 The "multiple testing problem" arises because GWAS involves testing
millions of SNPs across the genome, which dramatically increases the
chance of obtaining false positive associations (Type 1 errors) simply by
chance. This problem can be exacerbated by population stratification,
where allele frequency differences between subgroups of differing
ancestries can lead to spurious associations.
 Researchers address the multiple testing problem by using stringent
statistical corrections, such as Bonferroni correction, permutation tests,
or controlling the False Discovery Rate (FDR), to assess the significance
of associations after performing millions of tests. Addressing hidden
heritability requires integrating data from different "omics" layers (e.g.,
epigenomics, transcriptomics), considering rare variants, and exploring
complex interaction models beyond simple additive effects.
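The two multiple-testing corrections named above can be compared on a small set of p-values. Bonferroni divides α by the number of tests (controlling the family-wise error rate), while the Benjamini-Hochberg step-up procedure controls the FDR and is less conservative. A minimal sketch; the p-values are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that controls the family-wise error rate at alpha."""
    return alpha / n_tests


def benjamini_hochberg(pvalues: list[float], q: float = 0.05) -> list[int]:
    """Indices of tests called significant by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k_max = rank  # keep the largest rank k with p_(k) <= k * q / m
    return sorted(order[:k_max])


if __name__ == "__main__":
    print(bonferroni_threshold(0.05, 1_000_000))  # genome-wide threshold for 1M SNPs
    pvals = [0.001, 0.012, 0.025, 0.2, 0.6]
    cutoff = bonferroni_threshold(0.05, len(pvals))
    print([i for i, p in enumerate(pvals) if p <= cutoff])  # Bonferroni hits
    print(benjamini_hochberg(pvals))                         # FDR hits
```

With these values Bonferroni keeps only the smallest p-value, while Benjamini-Hochberg keeps three, illustrating the trade-off between fewer false positives and more power.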

7. Biomarkers

This section defines biomarkers, explores their various types and applications, and focuses on
microRNAs as promising diagnostic and therapeutic tools in personalized medicine.

Key Concepts:

 Biomarker Definition: A characteristic that is measured and evaluated to indicate
normal biological processes, pathogenic processes, or pharmacological responses to a
therapeutic intervention.
 Types:
o Diagnostic: Defines the presence or type of disease (e.g., PSA for prostate
cancer).
o Prognostic: Predicts disease outcome (e.g., HER2 in breast cancer).
o Predictive: Indicates the likelihood of response to a specific treatment.
o Mechanistic: Provides insight into molecular mechanisms of disease or drug
action.
o Safety: Monitors adverse or toxic drug effects.
 Examples:
o Physiological: Body temperature, blood pressure (BP), heart rate (HR).
o Biochemical: Proteins (e.g., PSA), hormones (e.g., Insulin, cortisol),
carbohydrates (e.g., glucose), lipids (e.g., cholesterol), nucleic acids (e.g., DNA,
mRNA, miRNA), epigenetic markers (e.g., DNA methylation, histone
modifications), and metabolites (e.g., creatinine).
 Personalized Medicine: A medical approach that tailors treatment to the individual
characteristics of each patient.
 Challenges of Pharmacotherapy:
o Disease heterogeneity: Different molecular mechanisms in patients with the
same diagnosis.
o Blockbuster drug model: "One size fits all" drugs often fail in large subgroups
of patients.
o High failure rates in drug development: Only ~5% success from Phase 1 to
approval.
o Overtreatment & mistreatment: Often due to a lack of predictive biomarkers.
o Unknown mechanism: Insufficient understanding of pathophysiology makes it
hard to design targeted drugs.
 Companion Diagnostic (CDx): A test developed alongside a therapeutic drug to identify
patients likely to benefit from the drug, minimize adverse effects, and guide dose
optimization.
o Characteristics: Can be expensive; the paired drug shows minimal efficacy in
biomarker-negative patients but high benefit in stratified subgroups. A CDx is
especially important where treatment failure has critical clinical consequences.
 Main Features of MicroRNAs (miRNAs)
o Definition: Small, noncoding RNAs (18-25 nucleotides) that function post-
transcriptionally by binding to target mRNA at the 3’ UTR, causing degradation
or translational repression. Highly conserved across species.
o Biogenesis of miRNAs:
 1. Transcription: miRNA genes are transcribed by RNA Pol II into
a primary-miRNA (pri-miRNA).
 2. Drosha processing: The pri-miRNA is cleaved by the RNase III
enzyme Drosha into a pre-miRNA (a ~70 nt hairpin structure) in the
nucleus.
 3. Export: The pre-miRNA is transported to the cytoplasm by Exportin-5.
 4. Dicer processing: The RNase III enzyme Dicer cleaves the pre-miRNA
into a mature miRNA duplex.
 5. RISC loading: One strand of the miRNA duplex (the guide strand) is
incorporated into the RNA-induced silencing complex (RISC).
 6. Function: The RISC-miRNA complex targets mRNA via sequence
complementarity to modulate gene expression.
o Functions/Circulating miRNAs:
 Regulate key processes in development, immune response, inflammation,
and cancer.
 Circulating miRNAs are found stably in body fluids like blood
(serum/plasma), urine, feces, saliva, and cerebrospinal fluid (CSF).
 They are very stable extracellularly because they are protected in vesicles
(exosomes) or bound to proteins.
 Useful as non-invasive biomarkers (e.g., miRNA-21 in breast cancer).
o Therapeutic Use:
 miRNA mimics: Designed to enhance the expression of downregulated
miRNAs.
 AntagomiRs/anti-miRs: Designed to inhibit overexpressed miRNAs.
 Clinical trials are ongoing for miRNA-targeted therapies in cancer,
fibrosis, and viral infections.
 Examples: miRNA-21 upregulation linked to trastuzumab resistance in
HER2+ breast cancer; miRNA-27a associated with GI cancer progression
in dogs.
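Target recognition by complementarity can be illustrated with the miRNA "seed" (nucleotides 2-8 from the 5' end), which pairs with a matching site in the target mRNA's 3' UTR. The sketch searches a 3' UTR for exact 7mer-m8 seed sites only; tools such as TargetScan also weigh other site types, conservation and context. The hsa-miR-21-5p sequence is as listed in miRBase (check the current release before use); the UTR is hypothetical.

```python
MIR21_5P = "UAGCUUAUCAGACUGAUGUUGA"  # hsa-miR-21-5p mature sequence (miRBase)

_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence (Watson-Crick pairs only)."""
    return "".join(_PAIRS[base] for base in reversed(seq.upper()))


def seed_site(mirna: str) -> str:
    """7mer-m8 target site: reverse complement of miRNA nucleotides 2-8 (1-based)."""
    return reverse_complement_rna(mirna[1:8])


def find_seed_sites(utr: str, mirna: str) -> list[int]:
    """0-based start positions of exact 7mer-m8 sites in a 3' UTR (DNA or RNA input)."""
    site = seed_site(mirna)
    rna = utr.upper().replace("T", "U")
    return [i for i in range(len(rna) - len(site) + 1) if rna[i:i + len(site)] == site]


if __name__ == "__main__":
    print(seed_site(MIR21_5P))
    print(find_seed_sites("GGAUAAGCUCCAUAAGCUA", MIR21_5P))  # hypothetical UTR
```

Exact seed matching over-predicts targets; it only shows the basic sequence logic behind miRNA-mRNA pairing.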
 Techniques to Detect miRNAs:
o 1. RT-qPCR: Considered the gold standard; sensitive, quantitative. Challenges
include the short miRNA length and lack of a poly-A tail.
o 2. Small RNA-seq: Global, unbiased profiling method; generally less sensitive
than qPCR for low-abundance miRNAs.
o 3. ISH (in situ hybridization): Provides spatial localization in tissues, using LNA
probes or RNAscope for visualization.
o 4. Microarrays: High-throughput method, but less common now due to the rise
of RNA-seq.
o 5. Digital PCR: Very quantitative and useful in scenarios with low RNA input.
o Many proteins and RNAs can be detected simultaneously, allowing researchers to
study gene expression and protein localization in a single experiment or combine
marker detection.
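RT-qPCR results for a miRNA are often reported as relative expression using the 2^-ΔΔCt (Livak) method: the target Ct is normalized to a reference small RNA, then compared between test and control samples. The sketch assumes roughly 100% efficiency (doubling per cycle) for both assays; the reference RNA and the Ct values are hypothetical.

```python
def relative_expression(ct_target_test: float, ct_ref_test: float,
                        ct_target_ctrl: float, ct_ref_ctrl: float,
                        amplification_factor: float = 2.0) -> float:
    """Fold change of a target miRNA in test vs control samples (2^-ddCt method).

    amplification_factor = 2.0 assumes ~100% PCR efficiency for target and reference.
    """
    delta_ct_test = ct_target_test - ct_ref_test  # normalize to the reference RNA
    delta_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    delta_delta_ct = delta_ct_test - delta_ct_ctrl
    return amplification_factor ** (-delta_delta_ct)


if __name__ == "__main__":
    # Target detected 3 cycles earlier (after normalization) in the test sample
    print(relative_expression(24.0, 20.0, 27.0, 20.0))  # 8.0, i.e., ~8-fold higher
```

A lower Ct means more starting template, so a negative ΔΔCt corresponds to up-regulation.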

Discussion Questions and Answers:

1. Question: Define what a biomarker is and explain its various types, providing a specific
example for each.
o Answer: A biomarker is a measurable characteristic that indicates a normal
biological process, a pathogenic process, or a pharmacological response to
therapy.
 Diagnostic biomarkers define the presence or type of disease
(e.g., PSA for prostate cancer).
 Prognostic biomarkers predict the likely outcome or course of a disease
(e.g., HER2 status in breast cancer predicts aggressive tumor behavior).
 Predictive biomarkers indicate the likelihood of a patient responding to a
specific treatment (e.g., EGFR mutations predicting response to certain
lung cancer drugs).
 Mechanistic biomarkers offer insight into the molecular mechanisms
underlying a disease or drug action (e.g., measuring the activation of a
specific signaling pathway affected by a drug).
 Safety biomarkers monitor responses to adverse or toxic drug effects
(e.g., creatinine levels to monitor kidney function during drug treatment).
2. Question: Discuss the role of biomarkers in addressing the challenges of
pharmacotherapy and advancing personalized medicine. What is a "companion
diagnostic," and what are its key characteristics?
o Answer: Biomarkers are crucial for addressing key challenges in
pharmacotherapy and driving personalized medicine, which aims to tailor
medical treatment to individual patient characteristics. Pharmacotherapy faces
issues like disease heterogeneity (same diagnosis, different molecular
mechanisms), the failure of "one-size-fits-all" blockbuster drugs, high failure rates
in drug development, and over/mistreatment due to a lack of precise guidance.
 Biomarkers help overcome these by allowing patient stratification,
ensuring the right drug reaches the right patient. A companion diagnostic
(CDx) is a specific type of test developed concurrently with a therapeutic
drug. Its purpose is to identify patients who are most likely to benefit from
the drug, minimize adverse effects, and optimize dosing. Key
characteristics of a CDx include its potential to be expensive, showing
minimal efficacy in biomarker-negative patients while demonstrating high
benefit in a stratified subgroup, and playing a critical role in avoiding
treatment failure.
3. Question: Describe the biogenesis of microRNAs (miRNAs) and explain why circulating
miRNAs are considered useful as non-invasive biomarkers.
o Answer: The biogenesis of miRNAs is a multi-step process:
 It begins with transcription of miRNA genes by RNA Polymerase II into
a primary-miRNA (pri-miRNA).
 In the nucleus, the pri-miRNA is processed by the Drosha enzyme into a
hairpin-shaped pre-miRNA.
 The pre-miRNA is then exported from the nucleus to the cytoplasm
by Exportin-5.
 In the cytoplasm, the Dicer enzyme cleaves the pre-miRNA into a mature
miRNA duplex (18-25 nt).
 Finally, one strand of this duplex (the guide strand) is loaded into
the RNA-induced silencing complex (RISC), which then targets
complementary mRNA sequences for degradation or translational
repression.
Circulating miRNAs are found stably in various body fluids
like blood (serum/plasma), urine, and saliva. They are remarkably stable
extracellularly because they are protected within exosomes (small
vesicles) or bound to proteins. This stability and their presence in easily
accessible fluids make them highly useful as non-invasive
biomarkers for conditions like cancer, as exemplified by miRNA-21 in
breast cancer. They offer a less invasive alternative to tissue biopsies for
diagnosis, prognosis, and therapeutic monitoring.

8. Cancer Genetics and Genomics

This section explores cancer as a genetic disease, its evolutionary nature, the hallmarks of
cancer, the roles of oncogenes and tumor suppressors, cell plasticity, the tumor
microenvironment, and various techniques used to study cancer at a molecular level.

Key Concepts:

 Define Cancer/Evolution of a Cancer/Basic Concepts in Cancer
o Cancer: A collection of related diseases characterized by the accumulation
of somatic mutations. It represents a failure of growth control where normal cells
acquire the ability to bypass proliferation signals, evade apoptosis, invade tissues,
and metastasize. Progression often follows Normal → Dysplasia → Carcinoma.
o Evolution of a cancer: Follows Darwinian selection at the cellular level, where
random mutations arise, and those providing growth or survival advantages
become fixed in the cell population. It is a multistage, multistep
process (Normal epithelial → Dysplasia → Invasive cancer).
o Hallmarks of Cancer (Hanahan & Weinberg): All cancers must acquire these
characteristics for malignant growth.
1. Self-sufficiency in growth signals
2. Insensitivity to anti-growth signals
3. Evasion of apoptosis
4. Limitless replicative potential
5. Sustained angiogenesis (formation of new blood vessels)
6. Tissue invasion and metastasis

o Causes of mutation and gene dysregulation: Environmental mutagens, errors in DNA
replication, and familial predisposition cause mutations; epigenetic dysregulation
(gene expression changes without altering the DNA sequence) also contributes, as
cancer cells hijack these mechanisms to silence tumor suppressors or activate
oncogenes.
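The Darwinian view of cancer evolution can be illustrated with a toy selection model (not from the source): a clone carrying an advantageous mutation with relative fitness 1 + s, competing with normal cells of fitness 1. The sketch assumes an infinitely large population, constant s, and no drift or further mutations, so p_next = p(1 + s) / (1 + ps).

```python
def clone_fraction_trajectory(p0: float, s: float, generations: int) -> list[float]:
    """Fraction of cells carrying an advantageous mutation over cell generations.

    Toy deterministic model: mutant fitness 1 + s, wild type 1,
    p_next = p * (1 + s) / (1 + p * s); no drift and no new mutations.
    """
    trajectory = [p0]
    p = p0
    for _ in range(generations):
        p = p * (1 + s) / (1 + p * s)  # denominator is the mean population fitness
        trajectory.append(p)
    return trajectory


if __name__ == "__main__":
    traj = clone_fraction_trajectory(p0=0.001, s=0.1, generations=200)
    for gen in (0, 50, 100, 150, 200):
        print(gen, round(traj[gen], 4))
```

Even a clone starting at 0.1% of cells with a 10% growth advantage comes to dominate within a couple of hundred generations, which is the clonal expansion described in the text.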
 Therapeutically Target Non-Mutational Mechanisms
o These include epigenetic changes, cell plasticity, and the tumor microenvironment
(TME).
o 1. Epigenetic alterations: DNA methylation and histone modifications can
silence tumor suppressors or activate oncogenes. These changes are reversible.
o 2. Stress-induced adaptations: Temporary adaptive states from acidosis,
hypoxia, and therapy stress can enhance survival without new mutations.
o 3. lncRNAs & non-coding RNAs: Regulate chromatin, transcription, and cell
fate (e.g., LINC00673 is essential for cancer initiation in pancreatic models).
o Biomarkers: Based on genetic profiles of TME composition.
o Therapeutics: Epigenetic drugs, metabolic therapies targeting acidic tumor
environments, and agents that modulate the TME to improve immune
infiltration. Combination therapies are often more effective, addressing both
genetic and non-genetic changes.
 Genes Involved in Cancer
o TP53 (encodes p53 protein): Known as the "guardian of the genome". It is a
transcription factor that preserves genome integrity by coordinating cellular
responses (cell cycle arrest or apoptosis) to DNA damage, oxidative stress,
hypoxia, and oncogene activation.
 p53 prevents proliferation by:
 1. Cell cycle arrest (especially at the G1/S checkpoint): Induces
transcription of the CDK inhibitor p21, preventing cells from moving from G1 to S
phase and giving them time to repair DNA.
 2. Upregulating DNA repair genes (e.g., GADD45) to fix DNA
damage during arrest.
 3. Apoptosis (programmed cell death): If damage cannot be
repaired, p53 upregulates pro-apoptotic genes (BAX, PUMA,
NOXA) to prevent malignant transformation.

 Loss of p53: Often due to missense mutations in TP53, leads to continued
replication of damaged DNA. Inherited germline TP53 mutations cause Li-Fraumeni
syndrome (multiple tumors early in life).
o Oncogenes: Derived from proto-oncogenes (normal genes involved in cell growth
and proliferation). Become oncogenes when mutated or dysregulated, leading to
a gain of function.
 Activation mechanisms:
 Point mutations: Single nucleotide changes that alter amino acids,
producing constitutive growth signaling without external stimuli (e.g., KRAS).
 Gene amplification: Multiple copies of the gene are produced,
leading to excess protein production (e.g., MYCN in
neuroblastoma).
 Chromosomal translocations: Segments from two different
chromosomes break and rejoin incorrectly, creating a fusion gene (e.g.,
BCR-ABL in CML).
 Enhancer hijacking: An enhancer from one gene is aberrantly
placed next to an oncogene, driving its overexpression (e.g.,
immunoglobulin enhancers driving oncogene expression).

o Tumor suppressors: Normally restrain cell proliferation, repair damaged DNA,
and promote apoptosis.
 Silencing mechanisms:
 Mutations: Nonsense, missense, and frameshift mutations (often
in TP53).
 Promoter hypermethylation: Methylation of CpG islands prevents
transcription (epigenetic silencing).
 Loss of Heterozygosity (LOH): The remaining normal allele is lost
after the first is already mutated, through deletion, mitotic recombination,
etc.
 Knudson's "two-hit hypothesis": Frequently, biallelic inactivation (both
alleles silenced) is required for tumor suppressor loss of function.
o Genomic instability: Unstable genomes have higher mutation rates,
chromosomal rearrangements, and defects in DNA repair pathways, all
contributing to cancer.
 Cell Plasticity Role in Cancer (Luis Arnes seminar)
o Cell Plasticity: The ability of a differentiated cell to dedifferentiate,
transdifferentiate, or adopt different identities under stress or injury, often without
new mutations.
o Role in Cancer: Enables adaptations to therapy and hostile conditions, promotes
metastasis via EMT (epithelial-mesenchymal transition), and facilitates drug
resistance even in the absence of new mutations.
o Pancreatic Cancer: Acinar-to-ductal metaplasia (ADM), where acinar cells
(normally producing digestive enzymes) become duct-like cells, is driven by
injury and the oncogenic KRAS gene. ADM becomes a route to Pancreatic Ductal
Adenocarcinoma (PDAC).
o Sox9 and Sox4: Transcription factors crucial for plasticity and neoplastic
transformation. Sox9 is linked to tumor initiation in PDAC. Loss of Sox can
prevent healing, impair tissue regeneration, increase susceptibility to
tumorigenesis, and promote a protumorigenic microenvironment, leading to cystic
lesions.
 Tumor Microenvironment (TME)
o The TME is the complex cellular and non-cellular environment surrounding a
tumor. It profoundly shapes tumor behavior, progression (angiogenesis &
metastasis), and resistance to treatment.
o Components:
 CAFs (cancer-associated fibroblasts): Secrete growth factors (GF),
cytokines, and extracellular matrix (ECM) to support tumor growth and
immune evasion.
 Immune cells: Tumor-associated macrophages (TAMs) and myeloid-
derived suppressor cells (MDSCs) suppress anti-tumor immune responses.
 Endothelial cells: Support angiogenesis (formation of new blood vessels
from existing ones).
 Extracellular matrix: Provides structural support and biochemical
signals.
o Pancreatic cancer: Characterized by a dense stroma and an immunosuppressive
microenvironment, which are major obstacles to treatment.
 Aging and Environment: Pancreas regeneration declines with age. Acidosis (an acidic
tumor environment) can enhance metastatic potential by supporting stem-like traits.
 Therapies for Cancer: Personalized Medicine
o Personalized medicine: Uses individual genetic and molecular tumor profiles to
tailor treatment.
o Inherited mutations: Germline mutations in key tumor suppressor genes can cause
hereditary cancer syndromes (e.g., BRCA1/2 mutations → hereditary breast and ovarian
cancer; TP53 → Li-Fraumeni; APC → Familial Adenomatous Polyposis (FAP)).
o Targeted therapies: Examples include Imatinib for BCR-ABL fusion (in CML)
and Trastuzumab for HER2 amplification (in breast cancer).
o Requires integration of: Whole-genome or exome sequencing, transcriptomics
(RNA-seq), and epigenetic profiling.
o Advantages: Higher efficacy, fewer side effects, and targeting driver mutations
rather than just symptoms.
 Techniques to Study Cancer at Molecular Level
o PCR-based Methods: For mutation detection (e.g., TaqMan for SNPs). Used in
veterinary genetic testing (e.g., NPHP4 mutation in dogs).
o Whole Genome Sequencing (WGS) / Whole Exome Sequencing (WES):
Detects point mutations, indels, and structural variants. Identifies driver (causal)
vs. passenger (non-causal) mutations.
o RNA-seq: Measures gene expression. Identifies gene fusions and splice variants.
o Epigenetic Assays (DNA methylation and histone modification profiling):
 Methylation analysis: Bisulfite sequencing, MeDIP-seq.
 ChIP-seq: Maps histone modifications.
o Reverse Phase Protein Arrays (RPPA): Profiles protein and phosphorylation
levels.
o Fluorescence-based genotyping (Microsatellites/STRs): Used for paternity
testing and population genetics, involves PCR with fluorescent primers and
polyacrylamide gel electrophoresis.
o Multiplex PCR Panels: Co-amplify multiple microsatellites in one reaction (e.g.,
Canine Genotypes Panel 1.1). Used in both paternity and forensic analyses.
o Next-Generation Sequencing in Forensics: Uses STRs and SNPs for identity,
ancestry, and age estimation, often on Ion Torrent & Illumina platforms.
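The logic of bisulfite sequencing, listed above for methylation analysis, can be sketched in Python. Bisulfite converts unmethylated cytosines to uracil (read as thymine after PCR), while methylated cytosines are protected and still read as C. The sketch assumes a read already aligned to the same strand as the reference, with equal length and no indels; the function names and sequences are hypothetical, and real aligners (e.g., Bismark) handle both strands and mismatches.

```python
def call_cpg_methylation(reference: str, read: str) -> dict[int, bool]:
    """Methylation call at each reference CpG covered by an aligned bisulfite read.

    Assumes the read is aligned to the same strand as `reference`,
    has the same length, and contains no indels.
    """
    ref, obs = reference.upper(), read.upper()
    if len(ref) != len(obs):
        raise ValueError("reference and read must have the same length")
    calls: dict[int, bool] = {}
    for i in range(len(ref) - 1):
        if ref[i] == "C" and ref[i + 1] == "G":  # CpG site in the reference
            if obs[i] == "C":
                calls[i] = True   # protected from conversion: methylated
            elif obs[i] == "T":
                calls[i] = False  # C -> U -> T after PCR: unmethylated
            # any other base (e.g., a sequencing error) is left uncalled
    return calls


def methylation_fraction(calls: dict[int, bool]) -> float:
    """Fraction of called CpGs that are methylated (NaN if none were called)."""
    return sum(calls.values()) / len(calls) if calls else float("nan")


if __name__ == "__main__":
    calls = call_cpg_methylation("ACGTTCGACG", "ATGTTCGATG")
    print(calls, round(methylation_fraction(calls), 2))
```

Averaging such calls over many reads at a promoter CpG island gives the methylation level used to detect epigenetic silencing of tumor suppressors.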

Discussion Questions and Answers:

1. Question: Explain the concept of "cancer evolution" in the context of Darwinian
selection. Describe the "Hallmarks of Cancer" and how they contribute to malignant
growth.
o Answer: Cancer evolution follows Darwinian selection at the cellular level.
Within a population of cells, random mutations arise. Mutations that
provide a growth or survival advantage (e.g., faster proliferation, resistance
to cell death) are "selected for" and spread through the population, leading
to clonal expansion. This is a multistage, multistep process, progressing
from normal cells to dysplasia and eventually to invasive cancer. The "Hallmarks
of Cancer" are a set of six acquired capabilities that most, if not all, cancers
develop on the way to malignancy:
1. Self-sufficiency in growth signals: Cancer cells can proliferate without
external stimuli.
2. Insensitivity to anti-growth signals: They ignore signals that would
normally stop cell division.
3. Evasion of apoptosis: They bypass programmed cell death, allowing
damaged cells to survive.
4. Limitless replicative potential: They can divide indefinitely, overcoming
normal cellular senescence.
5. Sustained angiogenesis: They induce the formation of new blood vessels
to supply nutrients and oxygen for tumor growth.
6. Tissue invasion and metastasis: They gain the ability to invade
surrounding tissues and spread to distant sites.
These hallmarks collectively enable uncontrolled growth and spread.
2. Question: Discuss the role of the TP53 gene and its protein product, p53, in preventing
cancer. What are the consequences of losing p53 function, and how is it related to Li-
Fraumeni syndrome?
o Answer: The TP53 gene encodes the p53 protein, widely known as the
"guardian of the genome". p53 is a crucial transcription factor that preserves
genome integrity by coordinating cellular responses to various stresses, including
DNA damage, oxidative stress, hypoxia, and oncogene activation. It primarily
prevents proliferation by:
 Inducing cell cycle arrest (especially at the G1/S checkpoint) to allow
time for DNA repair.
 Upregulating DNA repair genes.
 If DNA damage is irreparable, p53 triggers apoptosis (programmed cell
death) by upregulating pro-apoptotic genes (e.g., BAX, PUMA, NOXA),
thus preventing cells with damaged DNA from becoming malignant.
Loss of p53 function, often due to missense mutations in TP53, leads to
continued replication of damaged DNA, allowing mutations to accumulate
and promoting cancerous transformation. A germline TP53 mutation causes
Li-Fraumeni syndrome, an inherited cancer predisposition syndrome
characterized by multiple tumors appearing early in life.
3. Question: How are oncogenes activated to promote cancer, and how are tumor
suppressor genes silenced? Provide examples of molecular mechanisms for both.
o Answer:
 Oncogenes are activated from normal proto-oncogenes (which regulate
cell growth) through gain-of-function mutations. Key mechanisms
include:
 Point mutations: Single nucleotide changes that alter the protein
product, leading to constitutive activation of growth signals
without external stimuli.
 Gene amplification: Multiple copies of the oncogene are
produced, resulting in excessive protein production (e.g., MYCN
amplification in neuroblastoma).
 Chromosomal translocations: Portions of two different
chromosomes break and incorrectly rejoin, creating a novel fused
gene with oncogenic properties (e.g., BCR-ABL fusion).
 Enhancer hijacking: An enhancer region from one gene is
aberrantly placed near an oncogene, driving its overexpression
(e.g., immunoglobulin enhancers driving oncogene expression).
 Tumor suppressor genes (which normally restrain cell proliferation,
repair DNA, and promote apoptosis) are silenced through loss-of-
function mechanisms. These include:
 Mutations: Nonsense, missense, or frameshift mutations that
inactivate the gene product.
 Promoter hypermethylation: This epigenetic modification adds
methyl groups to CpG islands in the gene's promoter region,
preventing transcription and silencing the gene.
 Loss of Heterozygosity (LOH): If one allele of a tumor
suppressor gene is already mutated, LOH refers to the loss of the
remaining healthy allele through events like deletion, mitotic
recombination, or non-disjunction, leading to complete inactivation
of the gene. Often, biallelic inactivation (both copies
lost/mutated) is required, as described by Knudson's "two-hit
hypothesis".
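The arithmetic behind Knudson's two-hit hypothesis explains why inherited tumor suppressor mutations cause early, frequent tumors. In a sporadic case, one cell must independently lose both alleles. In a hereditary case, every cell already carries the first hit. The Python sketch below is a deliberately simplified model. The per-allele inactivation probability `MU` and the cell count `N_CELLS` are invented values, not measured rates, and the model ignores cell division and the specific LOH mechanisms listed above.

```python
import math


def prob_cell_fully_inactivated(mu: float, hits_needed: int) -> float:
    """Chance that one cell loses all remaining functional alleles.

    Simplified: each required hit is treated as an independent event with probability mu.
    """
    return mu ** hits_needed


def tumor_risk(p_per_cell: float, n_cells: float) -> float:
    """Probability that at least one of n_cells is fully inactivated: 1 - (1 - p)^n."""
    # log1p/expm1 keep the result accurate when p is very small.
    return -math.expm1(n_cells * math.log1p(-p_per_cell))


MU = 1e-6       # hypothetical chance that a given allele is inactivated in a cell lineage
N_CELLS = 1e6   # hypothetical number of at-risk cells

sporadic = tumor_risk(prob_cell_fully_inactivated(MU, 2), N_CELLS)    # two somatic hits needed
hereditary = tumor_risk(prob_cell_fully_inactivated(MU, 1), N_CELLS)  # germline hit already present
print(f"sporadic risk: {sporadic:.2e}, hereditary risk: {hereditary:.2f}")
```

With these toy numbers, the hereditary risk is several hundred thousand times higher than the sporadic risk. This mirrors the qualitative point of the two-hit hypothesis: when one hit is inherited, a single somatic event is enough for biallelic inactivation.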

9. Genetic Manipulation of Mammalian Cells

This section explores the principles and techniques of genetic manipulation, including animal
models, transgenic animals, gene editing technologies like CRISPR-Cas9, and genetic
approaches to treat diseases, such as gene and cell therapies. It also addresses ethical
considerations and risk assessment.

Key Concepts:
 Concepts/Principles of Genetic Manipulation
o Genetic manipulation: Involves the insertion, deletion, or alteration of genes in
organisms.
o Purposes: Model human diseases (e.g., Alzheimer's in mice), understand gene
roles during development, and produce therapeutic proteins (e.g., insulin).
o Gene Therapy: Aims to correct genetic disorders by modifying gene expression
in affected cells.
 Different Animal Models (Pros/Cons)
o Flies (Drosophila) & C. elegans: Cheap, short life cycles, many offspring, good
for genetics. Cons: Evolutionarily distant from humans, poor physiological
relevance.
o Frogs & Zebrafish: Transparent embryos, good for early developmental studies.
Cons: Evolutionarily distant from humans.
o Mice: Mammalian, extensive genetic tools, short life cycles, many established
disease models. Cons: Differences in complex organs (e.g., brain), short lifespan.
o Dogs, pigs, non-human primates: Closer physiology to humans, similar
behavior, immunity, useful for neurological studies. Cons: Expensive, ethical
concerns, long generation times, not many offspring.
 Transgenic Animals
o Definition: Animals genetically modified to carry foreign DNA integrated into
their genome and passed to progeny.
o How to Generate:
 1. Insert into germ cells: Ensures all cells in the animal have the genetic
modification, which is then passed on to the next generation.
 Pronuclear injection: DNA injected directly into the haploid
pronucleus of a fertilized oocyte before nuclear fusion of sperm
and egg.
 Gene transfer into early embryos or gametes: DNA introduced
at gamete or early embryonic stages (e.g., male sperm nuclei with
integrated DNA injected into a female oocyte, trackable with green
fluorescence).
 Somatic cell nuclear transfer (SCNT): Nucleus from a modified
adult cell is inserted into an enucleated egg (e.g., Dolly the sheep).
 2. Gene targeting (precise modification): Uses a plasmid as a template,
where the plasmid and target gene have complementary homology arms,
allowing for specific gene insertion or mutation via homologous
recombination. Often involves backcrossing (wild type x heterozygous
GM) to ensure only the targeted gene is genetically different.
 Techniques (programmable nuclease-based gene editing):
 Zinc Finger Nucleases (ZFNs): Engineered proteins in which
each zinc finger recognizes a ~3 bp DNA sequence, so several
fingers are combined to recognize a longer target. They have a
DNA binding domain and a FokI nuclease domain. Two ZFNs bind
to adjacent sequences flanking the target site; their FokI
domains dimerize, activating nuclease activity to cut DNA
(double-strand break - DSB). Repair occurs via NHEJ (knockout
via small indels) or HDR (precise repair with a DNA template).
 TALENs (Transcription Activator-Like Effector
Nucleases): Similar to ZFNs, also use DNA binding
domains fused to a FokI nuclease domain. Two TALENs
bind to adjacent sequences, their FokI domains dimerize,
and activate the nuclease to create a DSB at the spacer
region between their binding sites. Repair via NHEJ
(knockout) or HDR (precise repair/knockin). Used to engineer
CAR T cells.
 CRISPR-Cas9: Uses a Cas9 nuclease guided by an RNA
molecule (guide RNA) to a specific DNA sequence, which
must be next to a PAM (Protospacer Adjacent Motif)
sequence. Cas9 cuts DNA, creating a DSB. Repair via
NHEJ (knockout via indels) or HDR (precise repair if a
DNA template is provided). Can be used for selective
reproduction (introducing/removing traits before birth).
 NHEJ (Non-Homologous End Joining): An error-prone
repair pathway that often leads to small insertions or
deletions (indels), resulting in a gene knockout.
 HDR (Homology-Directed Repair): A precise repair
pathway that requires a donor DNA template with
homology arms to the target site. Used
for knockin (inserting a gene or correcting a mutation).
 Conditional gene editing (Cre-lox recombination): Used to
control where (tissue-specific) and when (time-specific) a gene is
modified. Useful for studying gene function without unwanted
developmental effects. Cre recombinase is an enzyme that
recognizes and cuts at loxP sites (short DNA regions that flank the
target gene). Cre is introduced under tissue-specific or inducible
promoters. When expressed, Cre excises the DNA between loxP sites
that point in the same direction, or inverts it if they point in
opposite directions. Excising a loxP-flanked ("floxed") gene or exon
produces a gene knockout.
 3. Random mutagenesis: Involves inserting DNA randomly into the
genome. Less control and may unintentionally disrupt endogenous genes.
o Applications of Transgenic Animals: Disease modeling (e.g., Alzheimer's,
Huntington's), functional genomics (studying gene function and regulatory
regions), pharmaceutical production (e.g., monoclonal antibodies in CHO cells),
xenotransplantation, and vaccine development.
o Knockin: A gene or mutation is inserted at a specific locus, leading to a gain
of function. Achieved via HDR. Can involve inserting a gene or reporter (e.g.,
LacZ) to track expression from a known promoter.
o Knockdown: Partial reduction in gene expression, usually achieved via RNA
interference (RNAi) or short hairpin RNA (shRNA).
o Applications: Disease correction, gene silencing in cell lines, and modifications
in live animals.
 Genetic Approaches to Treat Disease
o Production of reagents:
 1. Therapeutic proteins: Produced using genetically engineered
organisms or cells (e.g., E. coli or yeast for insulin, CHO cells for
monoclonal antibodies). Requires precise control of expression,
purification, and post-translational modifications (PTMs).
 2. Vaccines: Engineered viral vectors (e.g., Ebola vaccine uses GM VSV
virus), mRNA vaccines (synthetic mRNA encoding antigens like spike
protein for COVID-19), and DNA vaccines (deliver plasmids into host
cells to express antigens).
 3. Gene therapy products: (e.g., Casgevy for sickle cell disease).
o Cell therapy: Uses cells as therapeutic agents.
 Applications: iPSC-derived neurons for neurodegenerative diseases,
hematopoietic stem cell transplants, and regenerative repair (e.g., spinal
cord injury, retinal degeneration).
 Limitations/Risks: Tumor risk (especially with pluripotent stem cells like
iPSCs), immune rejection in allogeneic transplants, poor control over
differentiation (wrong tissue formation), ethical issues with embryonic
stem cells, and difficulty in effective delivery/targeting of damaged
tissues.
o CAR T-cells (Chimeric Antigen Receptor T-cells): A type of cell therapy for
cancer.
 Process: T cells are extracted from the patient and genetically modified
with a viral vector to express an artificial chimeric antigen receptor
(CAR19) that targets CD19-positive cancer cells. When donor T cells are
used, a second gene can be modified to prevent them from being
destroyed or rejected by the host immune system. The modified T cells
are then infused into the patient to destroy CD19-positive malignant B cells.
 If donor T cells are used (allogeneic transplant), graft-versus-host disease is
a risk. TALENs can be used to genetically modify donor T cells to disrupt
T cell receptor expression (so the donor T cells cannot recognize host MHC
molecules and attack host tissues) and to knock out CD52 (making them
resistant to alemtuzumab, an anti-CD52 antibody used in treatment).
 Stem Cells
o Use: Regenerative medicine (e.g., spinal cord injury, macular degeneration), disease
modeling of cell types within a specific lineage (e.g., adult stem cells in bone
marrow).
o How to Make:
 1. Embryonic Stem Cells (ESCs): Derived from the inner cell mass of a
blastocyst. They are pluripotent and stable (not prone to mutations).
 2. Adult Stem Cells: Found in adult tissues (e.g., hematopoietic stem cells
in bone marrow). They are multipotent and can be used to make iPSCs.
 3. Induced Pluripotent Stem Cells (iPSCs): Created by reprogramming
somatic cells (e.g., skin fibroblasts) into a pluripotent state using specific
transcription factors (Oct4, Sox2, Klf4, c-Myc). Techniques include
lentiviral transduction and episomal vectors.
 Pros: Patient-specific (not attacked by the immune system) and fewer
ethical issues (no embryos needed).
 Cons: Tumor risk, low reprogramming efficiency, possible
epigenetic memory of the cell of origin, not fully equivalent to
ESCs, and prone to mutations.
 Concepts Learnt from Risk Assessment Lecture
o Environmental risk assessment and human health risk assessment are central.
o GMO regulation varies worldwide (EU is strictest).
o GMOs in the EU are regulated under two directives: one for deliberate release
into the environment and one for contained use.
o Environmental Risk Assessment (ERA): Evaluates risks to human health and
the environment, identifying, characterizing, and managing potential risks before
GMOs are used in clinical trials, agriculture, or industry.
o Genome editing technologies (e.g., CRISPR) are currently regulated as GMOs,
but this is evolving.
o Considerations in GMO medicines (gene therapies, cell therapies, vaccines):
Need extensive preclinical trials in model organisms before approval.
 Risks:
 Shedding: Genetically modified material released into the
environment (e.g., from dead embryos), potentially infecting
unintended individuals.
 Replication competence: Ability of a viral vector to replicate
uncontrollably after introduction, spreading within the patient or
environment if the vector is not reliably replication-defective.
 Insertional mutagenesis: Gene inserted in the wrong place,
potentially leading to cancer.
 Genetic stability of the construct: Whether the transgene
(inserted gene) remains intact and unchanged over time in host
cells; instability could lead to ineffectiveness or harm (e.g.,
producing toxic proteins).
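 Recombination of the construct: Inserted genetic material could
recombine with host DNA or other viral sequences in unintended ways,
potentially generating new, harmful viruses or hybrid genes.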
 CRISPR-edited chickens: Engineered so male embryos die early in
development, while females survive and are non-transgenic (transgene not
heritable). Regulators must check for horizontal gene transfer or
incomplete lethality of males, ensure transgenic animals do not enter the
environment, and assess human exposure to transgenic embryos or eggs
with GMO remnants.
o Key Distinction:
 Risk assessment: Science-based process to evaluate hazards and
likelihood (e.g., of GMOs on human health, environment).
 Risk management: Policy-driven process that uses scientific input to
make regulatory decisions.
 Risk communication: Interactive exchange of information and opinions
concerning risks.
 Organisms that can be Genetically Modified (and their purpose):
o Bacteria (E. coli, Bacillus subtilis): Production of insulin, enzymes, vitamins.
o Plants (Maize, soybean, tomato, potato): Pest resistance, herbicide tolerance,
improved nutrition.
o Animals (Mice, zebrafish, dogs, pigs, chickens): Disease models, drug testing,
organ donor models.
o Fish (salmon, e.g., AquaAdvantage salmon): Faster growth via modified growth hormone
regulation.
o Human cells (CAR T cells, gene therapies): Treat genetic diseases, cancer
immunotherapy.
o Purpose: Research, therapeutics, agriculture, industrial biotechnology.
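The CRISPR-Cas9 targeting rule described above (a guide RNA matches a DNA sequence that must sit next to a PAM) can be illustrated by scanning a sequence for candidate target sites. Below is a minimal Python sketch. It assumes the commonly used Streptococcus pyogenes Cas9 (SpCas9), whose PAM is NGG, together with a 20-nt protospacer and a blunt cut 3 bp upstream of the PAM. Only the strand given is scanned, and the function name and example sequence are hypothetical. Real guide design would also search the reverse complement and score potential off-target sites.

```python
import re
from typing import List, Tuple


def find_spcas9_targets(dna: str, protospacer_len: int = 20) -> List[Tuple[str, str, int]]:
    """Return (protospacer, PAM, cut_index) for every NGG PAM on the given strand.

    Assumes SpCas9: PAM = NGG directly 3' of the protospacer, blunt cut 3 bp
    upstream of the PAM. cut_index is the index of the first base after the cut.
    """
    dna = dna.upper()
    targets = []
    # Zero-width lookahead so overlapping PAMs (e.g. "AGGG") are all reported.
    for match in re.finditer(r"(?=[ACGT]GG)", dna):
        pam_start = match.start()
        if pam_start < protospacer_len:
            continue  # not enough upstream sequence for a full protospacer
        protospacer = dna[pam_start - protospacer_len:pam_start]
        pam = dna[pam_start:pam_start + 3]
        targets.append((protospacer, pam, pam_start - 3))
    return targets


example = "GACGTTACCGGATCAAGCTT" + "TGG" + "CA"  # invented sequence
for protospacer, pam, cut in find_spcas9_targets(example):
    print(protospacer, pam, "cut before index", cut)
```

The CGG inside the example sequence is skipped because it has fewer than 20 bases upstream. This shows why a usable target needs both a PAM and a full-length protospacer next to it.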

Discussion Questions and Answers:

1. Question: Describe the main strategies for generating transgenic animals. How do these
differ, and what are their respective advantages and disadvantages?
o Answer: There are two main strategies for generating transgenic animals:
 1. Insertion into germ cells (random insertion): This involves directly
injecting foreign DNA into fertilized oocytes (pronuclear injection) or
transferring DNA into early embryos or gametes. This method results in
the random integration of the foreign DNA into the host genome, meaning
all cells of the animal will carry the modification, and it will be passed to
progeny. An advantage is its relative simplicity for initial transgenesis. A
disadvantage is the lack of control over where the DNA inserts, which can
lead to insertional mutagenesis (disrupting an endogenous gene) or
variable expression levels.
 2. Gene targeting (precise modification): This strategy uses homologous
recombination to insert or alter genes at a specific, predetermined locus.
Techniques like ZFNs, TALENs, and CRISPR-Cas9 create double-strand
breaks (DSBs) at precise genomic locations, which are then repaired via
either NHEJ (error-prone, leading to knockout by small indels)
or HDR (precise repair allowing knockin of new sequences or gene
correction if a donor template is provided). The advantage here is precise
control over the genetic modification, allowing for specific gene
knockouts, knockins, or even conditional gene editing (Cre-lox
recombination) that enables tissue- and time-specific modifications. The
disadvantage is that these methods are more complex and require careful
design and validation.
2. Question: Compare and contrast gene knockout, knockin, and knockdown techniques in
terms of their molecular outcomes and applications.
o Answer: These techniques manipulate gene expression levels:
 Gene Knockout: Involves the complete inactivation of a gene, leading to
a total loss of function. This is typically achieved by disrupting the gene,
often through error-prone Non-Homologous End Joining (NHEJ) after a
double-strand break. Applications include studying gene function by
observing the phenotypic consequences of its absence and modeling
human diseases caused by gene inactivation.
 Gene Knockin: Involves the precise insertion of a new gene, a modified
version of an existing gene, or a specific mutation at a particular locus.
This is achieved through Homology-Directed Repair (HDR), which
requires a donor DNA template. Knockin can result in a gain of
function (e.g., introducing a hyperactive gene) or correction of a mutation.
Applications include modeling human diseases caused by specific
mutations, tracking gene expression in vivo by inserting reporter genes,
and developing gene therapies to correct genetic defects.
 Gene Knockdown: Involves a partial reduction in gene expression,
rather than complete inactivation. This is commonly achieved using RNA
interference (RNAi) or short hairpin RNA (shRNA) molecules that target
and reduce the amount of specific mRNA. Applications include studying
the effects of reduced gene dosage, which might more closely mimic
certain disease states or allow for dose-dependent studies, and gene
silencing in cell lines for research.
3. Question: Discuss the ethical and safety considerations involved in the genetic
manipulation of mammalian cells for therapeutic purposes (e.g., gene therapies, cell
therapies). What are some key risks that need to be assessed during preclinical trials?
o Answer: Genetic manipulation of mammalian cells for therapeutics, such as gene
therapies and cell therapies, involves significant ethical and safety considerations.
 Ethical concerns particularly arise with the use of embryonic stem cells
(ESCs) due to their origin, though induced pluripotent stem cells (iPSCs)
offer an alternative with fewer ethical concerns as they are derived from
adult somatic cells. Other ethical debates involve gene editing in humans
and animals.
 Safety considerations are paramount and necessitate extensive preclinical
trials in model organisms before approval. Key risks that must be assessed
include:
 Shedding: The potential for genetically modified material (e.g.,
viral vectors) to be released from the patient into the environment,
potentially infecting unintended individuals.
 Replication competence: Ensuring that viral vectors used for gene
delivery are replication-defective to prevent uncontrolled spread
within the patient or environment.
 Insertional mutagenesis: The risk that the therapeutic gene might
insert into the wrong place in the host genome, potentially
disrupting an essential gene or activating an oncogene, leading to
cancer.
 Genetic stability of the construct: Ensuring that the introduced
transgene remains intact and functional over time within the host
cells to maintain efficacy and avoid harmful protein production.
 Recombination of the construct: The possibility that the inserted
genetic material could recombine with host DNA or other viral
sequences in unintended ways, potentially generating new, harmful
viruses or hybrid genes.
 Molecular characteristics of the construct: Poorly designed
constructs might lead to overexpression of the therapeutic gene,
silencing of host genes, or unintended off-target effects. These
risks are meticulously evaluated through processes like
Environmental Risk Assessment (ERA) and human health risk
assessment.
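Whether an NHEJ repair event produces a knockout depends on the reading frame. A small indel whose length is not a multiple of 3 shifts every downstream codon, which usually scrambles the protein or introduces a premature stop. An in-frame indel, by contrast, adds or removes whole codons and may leave a partly functional protein. The toy Python illustration below uses an invented coding sequence, and the helper names are hypothetical.

```python
def codons(seq: str) -> list:
    """Split a coding sequence into complete codons, dropping any trailing bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def delete(seq: str, pos: int, length: int) -> str:
    """Simulate an NHEJ-style deletion of `length` bases starting at index `pos`."""
    return seq[:pos] + seq[pos + length:]


def causes_frameshift(indel_length: int) -> bool:
    """An indel shifts the reading frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


cds = "ATGGCTGAAACCTGGTAA"  # hypothetical ORF: ATG GCT GAA ACC TGG TAA (stop)

print(codons(cds))
print(codons(delete(cds, 4, 1)), causes_frameshift(1))  # downstream codons change, stop lost
print(codons(delete(cds, 3, 3)), causes_frameshift(3))  # one codon lost, frame and stop kept
```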
