M.sc Transcriptome Analysis 2025

The document outlines the process of transcriptome analysis, detailing the importance of understanding RNA molecules in cells for interpreting genomic functions and disease mechanisms. It describes the steps involved in transcriptome sequencing, including RNA isolation, cDNA library preparation, quality checks, and bioinformatic analyses using tools like HISAT2 and DESeq2. Additionally, it covers the concepts of transcript assembly, quantification, and differential expression analysis, emphasizing the significance of normalization techniques for accurate comparisons across samples.

Transcriptome Analysis
Background and Pipeline
(HISAT2 and DESeq2)
M.Sc. Bioinformatic Practical 29.04.2025
Introduction
• The transcriptome is the complete set of RNA molecules produced in a cell
under specific conditions and at a specific developmental stage.

• Understanding the transcriptome is essential for:

• interpreting the functional elements of the genome

• revealing the molecular constituents of cells and tissues

• understanding development and disease

• Unlike the genome, which is roughly fixed, the transcriptome varies with
external conditions.
Transcriptomics
• Transcriptomics: the study of RNA in any of its forms.

• Also referred to as expression profiling, it most often examines the expression levels of
mRNA, which indicates which genes are being expressed in a cell at a
particular condition or stage.

• Transcriptomics aims to
• catalogue all species of transcripts, including mRNA, ncRNA and small RNA.

• determine the transcriptional structure of genes in terms of their start and end sites,
splicing patterns etc.
• quantify the changing expression levels of each transcript under various conditions.
Basic steps of transcriptome sequencing and data analyses
[Figure: Sampling → RNA isolation → cDNA library preparation → Sequencing → Bioinformatic analyses. Created with BioRender.com]

Pipeline used for bioinformatic analysis
cDNA Library preparation
Terminologies used
• Single-end: only one read is generated, from one end of the fragment.

• Paired-end: two reads are generated, one from each end of the fragment.

• N50: the contig (transcript) length such that contigs of that length or longer
contain at least 50% of the total assembled bases.

• Phred quality score: indicates the confidence of a base call in sequencing
data.

Q = -10 × log₁₀(P), where P = probability that the base call is wrong.

Phred Score    Accuracy    Error Rate
20             99%         1 in 100
30             99.9%       1 in 1000
40             99.99%      1 in 10,000
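
The Phred formula and the N50 definition can be checked with a few lines of Python. This is a minimal sketch, not part of the original practical: the function names are hypothetical, and the ASCII offset of 33 assumes the common Sanger / Illumina 1.8+ FASTQ encoding.

```python
import math


def phred_to_error_prob(q):
    """Invert Q = -10 * log10(P): return the probability the base call is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def decode_fastq_quality(quality_string, offset=33):
    """Turn a FASTQ quality line into Phred scores (offset 33 is an assumption)."""
    return [ord(char) - offset for char in quality_string]


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50 needs at least one contig length")


print(phred_to_error_prob(30))
print(decode_fastq_quality("I5"))
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Walking the contigs from longest to shortest and stopping once half of the assembled bases are covered is the usual way to compute N50; a few long transcripts can therefore dominate the value.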
1. Quality checks
• Reads are quality-checked before any analysis step.

• FASTQ files: sequences (as in FASTA) together with per-base quality information.

• Poor-quality bases at the read ends need to be trimmed.

• Left-over adapter sequences need to be removed.

• This leaves the best-quality reads for data analysis.

• Tools used for read trimming and filtering: fastp, Trimmomatic.

1. QC: Read Filtering and Adapter Removal with fastp

Option                          Purpose
-i, -I                          Input paired-end reads (read 1, read 2)
-o, -O                          Output filtered paired-end reads (read 1, read 2)
--detect_adapter_for_pe         Auto-detect adapter sequences (very important for paired-end!)
--correction                    Correct mismatches between overlapping paired-end reads
--qualified_quality_phred 30    Count a base as qualified only if its Phred quality is ≥30; reads with too many unqualified bases are discarded (very strict, good for downstream)
-w 16                           Use 16 CPU threads (speeds up fastp)
--html, --json                  Create quality control reports
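
The options in this table could be assembled into one fastp call along the following lines. This is a sketch under assumptions: the file names, the sample name and the helper `fastp_command` are hypothetical, and the command is only printed, not executed.

```python
import shlex


def fastp_command(r1, r2, out1, out2, sample, threads=16, min_phred=30):
    """Build the fastp argument list from the options listed on the slide."""
    return [
        "fastp",
        "-i", r1, "-I", r2,                # paired-end input
        "-o", out1, "-O", out2,            # filtered paired-end output
        "--detect_adapter_for_pe",         # auto-detect adapters for paired-end data
        "--correction",                    # fix mismatches in overlapping regions
        "--qualified_quality_phred", str(min_phred),
        "-w", str(threads),
        "--html", f"{sample}_fastp.html",  # QC reports
        "--json", f"{sample}_fastp.json",
    ]


# Hypothetical file names, for illustration only.
cmd = fastp_command(
    "sample1_R1.fastq.gz", "sample1_R2.fastq.gz",
    "sample1_R1.clean.fastq.gz", "sample1_R2.clean.fastq.gz",
    sample="sample1",
)
print(shlex.join(cmd))
```

Keeping the command as a list makes it safe to hand to `subprocess.run(cmd, check=True)` once fastp is installed, without shell quoting problems.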
Transcriptome Assembly
• There are two types of assembly:
• Reference-based assembly: reads are aligned (mapped) to an existing
reference genome or transcriptome.

• tools: HISAT2, STAR, TopHat

• De novo assembly: reads are assembled from scratch, without any reference,
based only on their overlaps.

• tools: Trinity, SOAPdenovo-Trans

• Commonly used: Trinity for de novo assemblies. It assembles the reads by
constructing de Bruijn graphs.
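
To make the de Bruijn graph idea concrete, here is a toy construction in Python. It is an illustration only, not Trinity's actual implementation: every k-mer in a read becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix, so reads that overlap end up sharing nodes. The reads and the value of k are made up.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for start in range(len(read) - k + 1):
            kmer = read[start:start + k]
            graph[kmer[:-1]].add(kmer[1:])  # prefix -> suffix edge
    return dict(graph)


# Two toy reads that overlap by four bases (GGCG).
reads = ["ATGGCG", "GGCGTA"]
graph = de_bruijn_graph(reads, k=4)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

Following the edges from ATG spells out ATGGCGTA, the sequence both reads came from; real assemblers additionally have to handle sequencing errors, repeats and alternative isoforms, which appear as branches in the graph.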
2. Reference Genome Indexing with HISAT2

• hisat2-build creates a set of index files that HISAT2 needs for fast mapping.
• Saff_A2_final_assembly.fa is your reference genome file.
• saff is the base name for the output index files.
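
The indexing command itself did not survive in the slide text. Based on the three bullets above, it presumably has the form `hisat2-build <reference> <base name>`. The sketch below rebuilds it with a hypothetical helper; the optional `threads` argument, which maps to hisat2-build's `-p` option, is an addition not mentioned on the slide.

```python
import shlex


def hisat2_build_command(reference_fasta, index_basename, threads=None):
    """Build the hisat2-build argument list: reference FASTA, then index base name."""
    cmd = ["hisat2-build"]
    if threads is not None:
        cmd += ["-p", str(threads)]  # optional; not mentioned on the slide
    cmd += [reference_fasta, index_basename]
    return cmd


cmd = hisat2_build_command("Saff_A2_final_assembly.fa", "saff")
print(shlex.join(cmd))  # hisat2-build Saff_A2_final_assembly.fa saff
```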

3. Alignment of Reads to the Reference Genome using HISAT2

Option            Purpose
--dta             Optimizes output for transcriptome assemblers like StringTie
-p 40             Use 40 CPU threads
-x                Prefix path to genome index files created by hisat2-build
-1, -2            Input paired-end reads
--summary-file    Save the alignment summary (read counts and alignment rates) to a file

Then
• samtools view converts the output SAM to BAM (compressed binary format).
• samtools sort sorts the BAM file by genomic coordinates (required by StringTie).
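
Put together, the alignment step and the two samtools steps might look like the sketch below. The sample and file names are hypothetical, the helper `alignment_commands` is not from the slides, and the commands are only printed. Writing an intermediate SAM file is one possible layout; piping HISAT2 straight into samtools is also common.

```python
import shlex


def alignment_commands(index_prefix, r1, r2, sample, threads=40):
    """Return the HISAT2 and samtools commands for one paired-end sample."""
    sam = f"{sample}.sam"
    bam = f"{sample}.bam"
    sorted_bam = f"{sample}.sorted.bam"
    align = [
        "hisat2", "--dta", "-p", str(threads),
        "-x", index_prefix,
        "-1", r1, "-2", r2,
        "--summary-file", f"{sample}_summary.txt",
        "-S", sam,                                          # write alignments as SAM
    ]
    to_bam = ["samtools", "view", "-b", "-o", bam, sam]      # SAM -> BAM
    sort = ["samtools", "sort", "-o", sorted_bam, bam]      # coordinate sort
    return [align, to_bam, sort]


commands = alignment_commands(
    "saff", "sample1_R1.clean.fastq.gz", "sample1_R2.clean.fastq.gz", "sample1"
)
for command in commands:
    print(shlex.join(command))
```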
4. Transcript Assembly and Quantification using StringTie
Option    Purpose
--ref     (OPTIONAL) Provide the reference genome FASTA (StringTie can optionally use it)
-G        Use known gene annotation (GTF) for better transcript prediction (mandatory with -e)
-o        Output predicted transcript GTF file
-A        Output gene abundances into a simple tab-separated table
-B        Create output suitable for Ballgown differential analysis (also useful if you want alternative DE tools)
-e        Strict mode: only quantify reference transcripts (faster, no novel transcript assembly)

Option           Purpose
--merge          Merge multiple GTF files (from different samples) into a non-redundant reference annotation
-G final.gtf     Use the original annotation GTF as a guide during merging (helps to maintain consistency)
m_file.txt       A text file listing paths to all sample GTF files (one per line)
-o merged.gtf    Output the merged master GTF
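
A sketch of how these options might fit together follows. It assumes the common StringTie order of per-sample assembly, then merge, then re-quantification against the merged annotation with -e; the slides do not state the order explicitly. The helper names and sample file names are hypothetical, and nothing is executed.

```python
import shlex


def stringtie_assemble(bam, annotation_gtf, out_gtf, threads=8):
    """Per-sample assembly guided by the known annotation."""
    return ["stringtie", bam, "-G", annotation_gtf, "-o", out_gtf, "-p", str(threads)]


def stringtie_merge(gtf_list_file, annotation_gtf, merged_gtf="merged.gtf"):
    """Merge the per-sample GTFs listed in gtf_list_file (one path per line)."""
    return ["stringtie", "--merge", "-G", annotation_gtf, "-o", merged_gtf, gtf_list_file]


def stringtie_quantify(bam, merged_gtf, out_gtf, abundance_table):
    """Re-estimate abundances for the merged transcripts only (-e), with Ballgown tables (-B)."""
    return ["stringtie", bam, "-e", "-B", "-G", merged_gtf,
            "-o", out_gtf, "-A", abundance_table]


steps = [
    stringtie_assemble("sample1.sorted.bam", "final.gtf", "sample1.gtf"),
    stringtie_merge("m_file.txt", "final.gtf"),
    stringtie_quantify("sample1.sorted.bam", "merged.gtf",
                       "sample1/sample1.gtf", "sample1/gene_abund.tab"),
]
for step in steps:
    print(shlex.join(step))
```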
Quantification of reads (or read count)
• Quantification = Counting how many reads map to each gene or transcript.

• Tools: salmon, stringtie, kallisto

• Normalization = Adjusting those counts to remove biases. It is important because of:

➢ Different sequencing depths

➢ Gene length differences

➢ Library composition bias

• Tools: kallisto, edgeR, DESeq2

Normalized values
• FPKM (Fragments Per Kilobase per Million)
• Used for paired-end reads.

• RPKM (Reads Per Kilobase per Million)

• Used for single-end reads.

• FPKM/RPKM: each gene's count is first scaled by sequencing depth (per million mapped reads or fragments)
and then by gene length (per kilobase). The sum of the values can differ between samples, so they are
best used when comparing genes within the same sample.

• TPM (Transcripts Per Million)

• TPM improves on FPKM by reordering the normalization steps.
• First normalize read counts by gene length, then scale within the sample so the values sum to 1 million.
• Because every sample then has the same total, TPM values are more directly comparable across samples.
• Used when comparing genes across different samples.
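
The difference in the order of operations is easiest to see in code. The sketch below uses made-up counts and gene lengths and plain Python lists; the function names are illustrative.

```python
def fpkm(counts, lengths_bp):
    """Depth first (per million fragments), then length (per kilobase)."""
    total = sum(counts)
    return [count * 1e9 / (total * length)
            for count, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Length first (reads per base), then rescale so the sample sums to 1e6."""
    rates = [count / length for count, length in zip(counts, lengths_bp)]
    scale = sum(rates)
    return [rate / scale * 1e6 for rate in rates]


# Toy data: three genes in one sample.
counts = [100, 300, 600]
lengths_bp = [1000, 1500, 6000]

fpkm_values = fpkm(counts, lengths_bp)
tpm_values = tpm(counts, lengths_bp)
print(fpkm_values, sum(fpkm_values))
print(tpm_values, sum(tpm_values))
```

The TPM values always sum to one million, whereas the FPKM total depends on the length distribution of the expressed genes. TPM can also be obtained from FPKM by dividing each value by the FPKM total and multiplying by one million.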
5. Preparing for DESeq2 (Differential Expression Analysis)

Option                            Purpose
-i merged_file.txt                Input file listing sample names and their GTF files (same as used before)
-l 151                            Read length used for sequencing (important for accurate count estimation); here, it's 151 bp reads
-g gene_count_matrix.csv          Output gene-level count matrix (for DESeq2, edgeR, etc.)
-t transcript_count_matrix.csv    Output transcript-level count matrix (useful for more detailed isoform analyses)
-v                                Verbose output (prints log messages, useful for debugging)
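
These options match StringTie's prepDE.py helper script, although the slide text does not name the tool. Assuming that is what is used, the read length matters because counts are back-calculated from per-transcript coverage as roughly coverage × transcript length / read length. The sketch below illustrates that relationship; the function name is hypothetical and rounding up is an assumption about the script's behaviour.

```python
import math


def estimate_read_count(coverage, transcript_length, read_length=151):
    """Approximate the number of reads from average per-base coverage."""
    return math.ceil(coverage * transcript_length / read_length)


# A 1510 bp transcript covered 10x on average:
print(estimate_read_count(10.0, 1510, read_length=151))
# The same coverage interpreted with a wrong read length gives a different count:
print(estimate_read_count(10.0, 1510, read_length=75))
```

Passing a read length that does not match the sequencing run therefore scales every count by the same wrong factor, which is why -l should be set to the actual read length.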
6. DESeq2 (Differential Expression Analysis)
STEP 1:
• DESeqDataSetFromMatrix():
• countData: the gene count matrix (gene_info), with genes as rows and samples as columns.
• colData: a data frame (sample_info) containing metadata for the samples, including the condition variable.
• design: specifies the model design (~ condition), where condition is the factor used to compare differential expression (e.g., two experimental groups).
• DESeq(): runs the DESeq2 analysis: estimates size factors and dispersions, then fits the model and tests for differential expression.
• saveRDS(dds, file = "dds.rds"): saves the DESeqDataSet object to an RDS file for future reference.
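
The size factors estimated by DESeq() follow the median-of-ratios idea: each sample is compared with a per-gene geometric mean across samples, and the median ratio becomes that sample's scaling factor. The plain-Python sketch below illustrates the idea on a toy matrix. It is not the DESeq2 implementation, and it simply skips genes that have a zero count in any sample.

```python
import math
from statistics import median


def size_factors(count_matrix):
    """count_matrix: one row per gene, one raw count per sample in each row."""
    n_samples = len(count_matrix[0])
    usable_rows = [row for row in count_matrix if all(c > 0 for c in row)]
    # Log of the geometric mean of each usable gene across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable_rows]
    factors = []
    for sample in range(n_samples):
        log_ratios = [math.log(row[sample]) - log_mean
                      for row, log_mean in zip(usable_rows, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy matrix: sample 2 was sequenced twice as deeply as sample 1.
counts = [
    [10, 20],
    [100, 200],
    [50, 100],
    [0, 5],    # skipped: contains a zero
]
factors = size_factors(counts)
print(factors)
```

Dividing each sample's counts by its size factor puts the two samples on the same scale; using the median makes the estimate robust to a few strongly differentially expressed genes.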
STEP 2:
• Get unique conditions: conditions <- unique(sample_info$condition) extracts the distinct experimental conditions from the sample_info data frame.
• Create output directory: dir.create("DESeq2_pairwise_results", showWarnings = FALSE) creates a directory to store the results; showWarnings = FALSE suppresses the warning if the directory already exists.
• Loop over condition pairs: the nested for loops iterate through all pairs of conditions (conditionA vs. conditionB), ensuring that each pair is compared only once. The cat() function prints which comparison is being processed.
• Run DESeq2 analysis: results(dds, contrast = c("condition", conditionB, conditionA)) computes the differential expression between conditionB and conditionA. The contrast argument specifies which levels of the condition factor are compared; log2 fold changes are reported as conditionB relative to conditionA.
• Order results: res[order(res$padj), ] sorts the results by adjusted p-value (padj), so the most significant genes come first.
• Save results: write.csv() writes the ordered results to a CSV file in the DESeq2_pairwise_results directory. The filename includes the comparison (e.g., DESeq2_conditionB_vs_conditionA.csv).
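
The bookkeeping in this step (each pair compared once, file names built from the pair, results ordered by adjusted p-value) can be sketched in Python. The condition names and p-values are made up, the helper names are hypothetical, and `bh_adjust` is a plain Benjamini-Hochberg adjustment, the method behind padj, without DESeq2's independent filtering.

```python
from itertools import combinations


def pairwise_comparisons(conditions):
    """Return (conditionB, conditionA) pairs so that each pair appears exactly once."""
    unique = list(dict.fromkeys(conditions))  # unique values, original order kept
    return [(b, a) for a, b in combinations(unique, 2)]


def result_filename(condition_b, condition_a):
    return f"DESeq2_{condition_b}_vs_{condition_a}.csv"


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest p-value down
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


pairs = pairwise_comparisons(["ctrl", "ctrl", "heat", "cold"])
for condition_b, condition_a in pairs:
    print(result_filename(condition_b, condition_a))

padj = bh_adjust([0.01, 0.04, 0.03, 0.005])
ranked = sorted(range(len(padj)), key=lambda i: padj[i])  # most significant first
print(padj, ranked)
```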
Output
[Figures: PCA plot; heatmap]
Presented by Manvi Sharma

Thank you very much!
