Abstract
Whole transcriptome analysis plays an essential role in deciphering genome structure and function, identifying genetic networks underlying cellular, physiological, biochemical and biological systems, and establishing molecular biomarkers that respond to diseases, pathogens and environmental challenges. Here, we review transcriptome analysis methods and technologies that have been used to conduct whole transcriptome shotgun sequencing or whole transcriptome tag/target sequencing analyses. We focus on how adaptors/linkers are added to both the 5′ and 3′ ends of mRNA molecules for cloning or PCR amplification before sequencing. Challenges and potential solutions are also discussed. In brief, next-generation sequencing platforms have accelerated the release of large amounts of gene expression data. It is now time for the genome research community to assemble whole transcriptomes of all species and collect signature targets for each gene/transcript, so that known genes/transcripts can be used to determine known transcriptomes directly in the near future.






Acknowledgments
This work was supported by the Eunice Kennedy Shriver National Institute of Child Health & Human Development of the National Institutes of Health under Award Number R21HD076845 to ZJ. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Conflict of interest
The authors have declared that no competing interest exists.
Cite this article
Jiang, Z., Zhou, X., Li, R. et al. Whole transcriptome analysis with sequencing: methods, challenges and potential solutions. Cell. Mol. Life Sci. 72, 3425–3439 (2015). https://doi.org/10.1007/s00018-015-1934-y
DOI: https://doi.org/10.1007/s00018-015-1934-y