CN110684830A - RNA analysis method for paraffin section tissue - Google Patents
- Publication number
- CN110684830A (application number CN201910962113.2A)
- Authority
- CN
- China
- Prior art keywords
- paraffin section
- sample
- analysis
- rna
- quality control
- Prior art date
- Legal status
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/158—Expression markers
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Health & Medical Sciences (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Organic Chemistry (AREA)
- Engineering & Computer Science (AREA)
- Zoology (AREA)
- Physics & Mathematics (AREA)
- Wood Science & Technology (AREA)
- Analytical Chemistry (AREA)
- Immunology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Genetics & Genomics (AREA)
- Biotechnology (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Microbiology (AREA)
- Molecular Biology (AREA)
- Biochemistry (AREA)
- General Engineering & Computer Science (AREA)
- Pathology (AREA)
- Theoretical Computer Science (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Medical Informatics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Hospice & Palliative Care (AREA)
- Oncology (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention provides a paraffin section tissue RNA analysis method comprising the following steps: carrying out DNA degradation on the paraffin section tissue and extracting sample RNA; preparing a paraffin section sample nucleic acid library and sequencing the sample RNA; performing quality control on the sequencing data; aligning the quality-controlled data to a reference genome and performing quality control on the alignment results; and performing transcriptome assembly and transcript quantification on the data that pass alignment quality control, followed by quantitative analysis of gene expression, differential gene expression analysis and fusion gene analysis. The invention provides a complete set of metrics and a detection method for evaluating the RNA quality of paraffin section tissue. The evaluation is comprehensive, accurate and effective, and provides a sound basis for the accuracy of subsequent analysis.
Description
Technical Field
The invention belongs to the field of second-generation high-throughput sequencing analysis, and particularly relates to a paraffin section tissue RNA analysis method.
Background
RNA sequencing (RNA-seq) is a sensitive and accurate method for quantifying gene expression. Second-generation high-throughput sequencing (NGS) has opened a new era for RNA-seq transcriptome analysis. Designing an RNA-seq analysis workflow for broad application involves the sequencing technology, the sample type, the analysis requirements, the genome and the available computational resources, and workflows are evaluated on accuracy, computational speed and cost.
The gene expression profile of a tumor sample is a powerful prognostic and predictive biomarker. To date, transcriptome profiling has been performed on a large number of frozen cancer tissue samples. However, because fresh frozen tumor tissue from clinical patients is difficult to collect and to store for long-term follow-up, formalin-fixed paraffin-embedded (FFPE) tissue is the more widely used biomaterial in medicine. Genome-wide expression profiling of tumor samples is essential for cancer research and also enables large retrospective clinical genomic studies. FFPE tissue undergoes fixation, paraffin embedding, sectioning and staining to prevent degradation of the cellular tissue, but these preparation steps and subsequent storage markedly reduce DNA and RNA quality. FFPE samples commonly show severe degradation, chemical modification, cross-linking of nucleic acids and proteins, and variability in tissue handling and processing. These molecular changes directly affect data quality: degradation lowers alignment quality and increases the proportion of soft-clipped and repetitive sequences, and formaldehyde fixation causes random C-to-T substitutions in nucleic acids, which makes nucleic acids isolated from FFPE poorly compatible with downstream high-throughput molecular techniques. Beyond increasing sequencing depth to compensate for nucleic acid degradation, a complete set of metrics and a detection method for evaluating the RNA quality of paraffin tissue are urgently needed to ensure the reliability of subsequent analysis. A complete paraffin section RNA analysis workflow is also needed to study differences in gene expression of organisms across environments or physiological states, so as to understand the body's response mechanisms and construct intracellular regulatory networks.
In addition, fusion genes, formed when all or part of two genes are joined in tandem through chromosomal translocation or reverse splicing, play an important role in studying the causes and development of many cancer types.
Disclosure of Invention
To solve the above technical problems, the invention provides a paraffin section tissue RNA analysis method.
A method for analyzing RNA of paraffin section tissue, the method comprising the steps of:
carrying out DNA degradation on the paraffin section tissue, and extracting sample RNA;
preparing a paraffin section sample nucleic acid library and sequencing the sample RNA based on the library;
performing quality control on the sample data obtained by sequencing to remove rRNA data;
aligning the quality-controlled sample data to a reference genome, and performing quality control on the alignment results;
carrying out transcriptome assembly and transcript quantification on the sample data after quality control, and carrying out quantitative analysis on gene expression;
performing differential gene expression analysis based on the transcript quantification results.
Further, the analysis method can also perform fusion gene analysis;
the fusion gene analysis is performed using one or more of JAFFA, STAR-Fusion, TopHat-Fusion, FusionCatcher, or SOAPfuse.
Further, the preparation of the paraffin section sample nucleic acid library comprises the following steps:
extracting nucleic acid in the paraffin section sample;
synthesizing a single-stranded cDNA based on the nucleic acid;
synthesizing a double-stranded cDNA based on the single-stranded cDNA;
repairing the double-stranded cDNA ends;
ligating adapters to the double-stranded cDNA, and performing PCR amplification on the adapter-ligated DNA to obtain the nucleic acid library of the paraffin section sample.
Further, the quality control of the sample data obtained by sequencing comprises:
removing sequencing adapter sequences, low-quality sequences and sequences containing N bases;
screening the filtered data on the following metrics: number of bases after adapter removal, percentage of bases with quality above 20 (Q20), percentage of bases with quality above 30 (Q30), GC content, N content, average read length, rRNA alignment rate and number of reads after filtering;
and selecting the data and samples that meet the set thresholds for subsequent analysis.
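The base-level metrics screened in this step can be computed directly from FASTQ records. The Python sketch below is illustrative only: it assumes uncompressed FASTQ with Phred+33 qualities, counts Q20/Q30 as Phred >= 20 / >= 30 (the usual convention; the text says "above", so adjust if a strict inequality is intended), and leaves adapter trimming and the rRNA alignment rate to the QC and alignment tools. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass
class ReadStats:
    """Running base-level counts over one sample's reads."""
    reads: int = 0
    bases: int = 0
    q20: int = 0
    q30: int = 0
    gc: int = 0
    n: int = 0

    def add(self, seq: str, qual: str, offset: int = 33) -> None:
        # Assumes Phred+33 quality encoding (Illumina 1.8+).
        self.reads += 1
        self.bases += len(seq)
        for base, ch in zip(seq.upper(), qual):
            score = ord(ch) - offset
            # Counted as Phred >= 20 / >= 30, the usual Q20/Q30 convention.
            self.q20 += int(score >= 20)
            self.q30 += int(score >= 30)
            self.gc += int(base in "GC")
            self.n += int(base == "N")

    def summary(self) -> dict:
        def pct(count: int) -> float:
            return 100.0 * count / self.bases if self.bases else 0.0
        return {
            "clean_base_mb": self.bases / 1e6,
            "q20_pct": pct(self.q20),
            "q30_pct": pct(self.q30),
            "gc_pct": pct(self.gc),
            "n_pct": pct(self.n),
            "mean_read_len": self.bases / self.reads if self.reads else 0.0,
            "reads": self.reads,
        }


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sequence, quality) pairs from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield seq, qual
```

For a real sample one would stream an open file handle through `read_fastq`, call `add` per record, and compare `summary()` with the set thresholds.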
Further, aligning the quality-controlled sample data to the reference genome comprises:
aligning the obtained sample sequences containing nucleic acid sequence information to a reference genome;
obtaining the sample sequences aligned to the reference genome.
Further, the sample sequences are aligned to the reference genome using one or more of TopHat, STAR, or HISAT2.
Further, the quality control of the alignment results comprises:
performing quality evaluation on the alignment result file of the paraffin section tissue;
removing duplicate sequences.
Further, the quality evaluation of the alignment result file of the paraffin section tissue comprises:
evaluating one or more of the duplication rate, mapping rate, unique mapping rate, exonic rate, intronic rate, intergenic rate, expression profiling efficiency, number of detected transcripts, number of detected genes, or sequence coverage uniformity.
Further, gene expression is quantified using one or more of RSEM, eXpress, HTSeq, Cufflinks, StringTie, Sailfish, Salmon, quasi-mapping, or Kallisto.
Further, differential gene expression analysis is performed using one or more of DESeq, limma, edgeR, Cuffdiff, Ballgown, DESeq2, or sleuth.
The invention provides a complete set of metrics and a detection method for evaluating the RNA quality of paraffin section tissue, and can be used to study differences in gene expression of organisms across environments or physiological states, so as to understand the organism's response mechanisms and construct intracellular regulatory networks. The invention can also perform fusion gene analysis; detecting and analyzing fusion genes can play an important role in studying the causes and development of many cancer types.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 shows a flow chart of one embodiment of the RNA analysis method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
As shown in FIG. 1, a method for analyzing RNA of paraffin section tissue comprises the following steps: carrying out DNA degradation on the paraffin section tissue and extracting sample RNA; preparing a paraffin section sample nucleic acid library and sequencing the sample RNA based on the library; performing quality control on the sequencing data to remove rRNA data; aligning the quality-controlled data to a reference genome and performing quality control on the alignment results; performing transcriptome assembly and transcript quantification on the quality-controlled data and quantitatively analyzing gene expression; and performing differential gene expression analysis based on the transcript quantification results. Fusion gene analysis is also performed in this example. The software used in each step and the resulting files are detailed in Table 1.
TABLE 1 Result files of the paraffin section RNA analysis method
Sample preparation: three gastric cancer paraffin section tissues (Beijing Jiyin medical examination laboratory; sample numbers 199003859T, 199003855T and 199003848T) and three paracancerous paraffin section tissues from the gastric cancer patients (Beijing Gionee plus medical laboratory; sample numbers 199003859N, 199003848N and 199003855N) were selected. DNA was degraded in RNase-free, nuclease-free water, and purified total RNA was obtained with an RNA extraction kit for FFPE samples (MagMAX™ FFPE DNA/RNA Ultra Kit); rRNA was then removed with the Ribo-Zero™ rRNA removal kit.
Library construction and sequencing: preparation of the paraffin section sample nucleic acid library comprises the following steps: extracting nucleic acid from the paraffin section sample; synthesizing single-stranded cDNA from the nucleic acid; synthesizing double-stranded cDNA from the single-stranded cDNA; repairing the ends of the double-stranded cDNA; ligating adapters to the double-stranded cDNA and performing PCR amplification on the adapter-ligated DNA to obtain the nucleic acid library of the paraffin section sample. Specifically, a kit that gives high-quality library yields from as little as 10 ng to 1 μg of RNA (Ultra™ RNA library preparation kit) was used, and the DNA library was precisely quantified with a Qubit fluorometer to obtain high-quality sequencing results. The fragment length distribution of the DNA library was checked with an Agilent 2100 Bioanalyzer; the library showed a narrow peak at 300 bp. RNA sequencing (RNA-Seq) was performed on a second-generation high-throughput sequencing platform (Illumina HiSeq X Ten).
Quality control of the sequencing data comprises: removing sequencing adapter sequences, low-quality sequences and sequences containing N bases; screening the filtered data on the number of bases after adapter removal, the percentage of bases with quality above 20 (Q20), the percentage of bases with quality above 30 (Q30), GC content, N content, average read length, rRNA alignment rate and number of reads after filtering; and selecting the data and samples that meet the set thresholds for subsequent analysis. Specifically, quality control was performed with the fast software (a data quality control tool); the quality-controlled data were then aligned with bowtie2 (a tool for aligning sequencing reads to reference sequences) against the ribosomal RNA (rRNA) database of the National Center for Biotechnology Information (NCBI), and reads aligning to rRNA were removed. The filtering thresholds for the QC metrics are: number of bases after adapter removal (Clean_Base; the clean-data reads in Table 1 are 150 bp long) > 2500 Mb; percentage of bases with quality above 20 (Q20) > 90%; percentage of bases with quality above 30 (Q30) > 85%; GC content > 40% and < 60%; N content < 0.100%; average read length > 120 bp and < 150 bp; rRNA alignment rate < 40%; and number of reads after filtering (reads remaining after removing reads that fail QC and rRNA reads) > 4 × 10^7. The bowtie2 alignment parameters were "--sensitive -D 15 -R 2 -N 0 -L 22 -i S,1,1.15".
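The rRNA-filtering alignment described above could be assembled as a bowtie2 command roughly as follows. This is a sketch under assumptions: paired-end input, a prebuilt bowtie2 index of the NCBI rRNA sequences (the index and file names are hypothetical), and `--un-conc` used to keep read pairs that do not align concordantly to rRNA. The seed settings are the ones quoted in the text, which bowtie2 documents as its `--sensitive` preset.

```python
from typing import List

# Seed/extension settings quoted in the text; bowtie2 documents these same
# values as its --sensitive preset for end-to-end mode.
SENSITIVE_SETTINGS = ["-D", "15", "-R", "2", "-N", "0", "-L", "22", "-i", "S,1,1.15"]


def bowtie2_rrna_cmd(rrna_index: str, r1: str, r2: str, out_prefix: str,
                     threads: int = 8) -> List[str]:
    """Align read pairs to an rRNA index; pairs that fail to align
    concordantly are written to FASTQ and kept as putative non-rRNA reads."""
    return [
        "bowtie2",
        *SENSITIVE_SETTINGS,
        "-p", str(threads),
        "-x", rrna_index,
        "-1", r1,
        "-2", r2,
        # bowtie2 replaces '%' with 1 and 2 for the two mate files.
        "--un-conc", f"{out_prefix}.rRNA_removed.R%.fastq",
        # SAM records of the rRNA hits are not needed, so discard them.
        "-S", "/dev/null",
    ]
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it; bowtie2 prints its alignment summary to stderr, and the overall alignment rate there can serve as an approximate rRNA alignment rate for screening.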
Specifically, as shown in Table 2, the rRNA percentage of paracancerous tissue sample 199003855N is 85.37%, above the threshold, and only 31,165,304 reads (sequences generated by the high-throughput sequencing platform) remain after rRNA filtering, below the required number of filtered reads. This sample therefore does not meet the requirements for subsequent analysis and must be resampled or have its rRNA removed again.
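The filtering standard can be expressed as a small threshold table. In this hypothetical sketch each bound is strict, as in the text, and metrics missing from the input are skipped, so a sample can be checked with only the values at hand. The metric keys are illustrative names, not fields of any particular QC tool.

```python
from typing import Dict, List, Optional, Tuple

# (metric, lower bound, upper bound); None means unbounded on that side.
# Bounds are strict, mirroring the ">" and "<" thresholds in the text.
SEQ_QC_THRESHOLDS: List[Tuple[str, Optional[float], Optional[float]]] = [
    ("clean_base_mb", 2500, None),
    ("q20_pct", 90, None),
    ("q30_pct", 85, None),
    ("gc_pct", 40, 60),
    ("n_pct", None, 0.1),
    ("mean_read_len", 120, 150),
    ("rrna_pct", None, 40),
    ("filtered_reads", 4e7, None),
]


def failed_seq_qc(metrics: Dict[str, float]) -> List[str]:
    """Return the metrics that violate a threshold; absent metrics are skipped."""
    failures = []
    for name, low, high in SEQ_QC_THRESHOLDS:
        if name not in metrics:
            continue
        value = metrics[name]
        too_low = low is not None and not value > low
        too_high = high is not None and not value < high
        if too_low or too_high:
            failures.append(name)
    return failures
```

With the two values reported for sample 199003855N, `failed_seq_qc({"rrna_pct": 85.37, "filtered_reads": 31_165_304})` flags both `rrna_pct` and `filtered_reads`, matching the conclusion above.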
Aligning the quality-controlled sample data to the reference genome comprises: aligning the obtained sample sequences containing nucleic acid sequence information to a reference genome, and obtaining the sample sequences aligned to the reference genome. Specifically, HISAT2 (an RNA-Seq genome alignment tool) is used for alignment, with human genome build 37 (Genome Reference Consortium Human Genome Build 37, GRCh37) as the reference genome; a HISAT2 index must first be built for the reference genome. Alignment uses default parameters, and the parameters for individual samples are adjusted based on the alignment QC results of the next step. Alternatively, this embodiment may use TopHat (a Bowtie-based RNA-Seq analysis tool) or STAR (Spliced Transcripts Alignment to a Reference, an RNA-Seq genome alignment tool) instead of HISAT2, or a combination of one or more of TopHat, STAR or HISAT2, to align the sample sequences to the reference genome.
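A minimal sketch of the HISAT2 step, assuming paired-end reads and hypothetical file names (`GRCh37.fa`, an index prefix of your choice). The final `samtools sort` call is an addition not stated at this point in the text; it is included because later steps index the alignment file and StringTie expects sorted input.

```python
from typing import List, Tuple


def hisat2_cmds(ref_fasta: str, index_prefix: str, r1: str, r2: str,
                sample: str, threads: int = 8) -> Tuple[List[str], List[str], List[str]]:
    """Return (index build, alignment, coordinate sort) commands."""
    # One-off: build the HISAT2 index for the reference genome (e.g. GRCh37).
    build = ["hisat2-build", ref_fasta, index_prefix]
    # Paired-end alignment with default parameters, SAM output.
    align = ["hisat2", "-p", str(threads), "-x", index_prefix,
             "-1", r1, "-2", r2, "-S", f"{sample}.sam"]
    # Coordinate-sorted BAM for indexing, QC and assembly downstream.
    sort = ["samtools", "sort", "-@", str(threads),
            "-o", f"{sample}.sorted.bam", f"{sample}.sam"]
    return build, align, sort
```

Per-sample parameter adjustments prompted by the alignment QC would be added to the `align` list.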
Quality control of the alignment results comprises: performing quality evaluation on the alignment result file of the paraffin section tissue, and removing duplicate sequences. Specifically, RNA-SeQC (a tool for quality control and expression evaluation of RNA-Seq data) is used for the analysis. The alignment result file must be indexed with the command samtools index; the reference genome GRCh37 must also be indexed with samtools faidx, and a .dict sequence dictionary must be created with CreateSequenceDictionary. The contig names in the BAM file, the reference genome and the genome GTF annotation file must be consistent. Quality control of the alignment results allows samples to be evaluated and unqualified samples to be removed, which improves the reliability of the analysis results.
Quality evaluation of the alignment result file of the paraffin section tissue uses the following thresholds: duplication rate (Duplication Rate) < 60%; mapping rate (Mapping Rate) > 85%; unique mapping rate (Mapped Unique Rate) > 50%; exonic alignment rate > 50%; intronic alignment rate < 40%; intergenic rate (Intergenic Rate) < 10%; expression profiling efficiency > 45%; detected transcripts > 130,000; detected genes > 2,000; and sequence coverage uniformity bias < 0.500%.
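The contig-name consistency requirement can be checked before running RNA-SeQC. This hypothetical sketch reads sequence names from a `samtools faidx` .fai index, from column 1 of a GTF file, and from the `@SQ SN:` fields of a BAM header (as printed by `samtools view -H`), then reports names missing from the reference. A typical mismatch it catches is `chr1` versus `1`.

```python
from typing import Dict, Iterable, List, Set


def fai_contigs(lines: Iterable[str]) -> Set[str]:
    """Sequence names from a samtools faidx .fai index (first column)."""
    return {line.split("\t", 1)[0] for line in lines if line.strip()}


def gtf_contigs(lines: Iterable[str]) -> Set[str]:
    """Sequence names (column 1) used in a GTF annotation, skipping comments."""
    return {line.split("\t", 1)[0] for line in lines
            if line.strip() and not line.startswith("#")}


def bam_header_contigs(header_text: str) -> Set[str]:
    """Sequence names from @SQ SN: fields, e.g. `samtools view -H` output."""
    names = set()
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def contig_mismatches(bam: Set[str], ref: Set[str], gtf: Set[str]) -> Dict[str, List[str]]:
    """Names used in the BAM or GTF that are absent from the reference index."""
    return {
        "bam_not_in_ref": sorted(bam - ref),
        "gtf_not_in_ref": sorted(gtf - ref),
    }
```

Open file handles can be passed directly to `fai_contigs` and `gtf_contigs`; empty lists in the result mean the three files agree on naming.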
Table 2 shows the alignment and quality control results for the 6 samples in this example. The Duplicate Rate of paracancerous tissue sample 199003859N is relatively high at 62.52%, so stricter parameters must be used in the subsequent duplicate removal step to bring it within the threshold. The Mapping Rate of paracancerous tissue sample 199003855N is below 85%, so looser alignment settings are needed to raise it. Any sample that fails a threshold and therefore cannot proceed directly to expression analysis must be resubmitted, or, provided the required data volume is maintained, have its quality control adjusted or be re-aligned with stricter parameters until its data meet the standard.
TABLE 2 Sample alignment and quality control results
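The alignment thresholds and the follow-up actions described for failing samples can be combined into one review step. In this sketch the metric keys are hypothetical, rates are assumed to be percentages (RNA-SeQC may report some of them as fractions, so convert first), and only the two specific actions come from the text; every other failure maps to the general remedy stated above.

```python
from typing import Dict, List, Optional, Tuple

DEFAULT_ACTION = "resubmit, or adjust QC/alignment parameters while keeping enough data"

# (metric, lower bound, upper bound, suggested follow-up); bounds are strict.
ALIGN_QC: List[Tuple[str, Optional[float], Optional[float], str]] = [
    ("duplication_rate", None, 60.0, "stricter duplicate removal"),
    ("mapping_rate", 85.0, None, "looser alignment settings"),
    ("mapped_unique_rate", 50.0, None, DEFAULT_ACTION),
    ("exonic_rate", 50.0, None, DEFAULT_ACTION),
    ("intronic_rate", None, 40.0, DEFAULT_ACTION),
    ("intergenic_rate", None, 10.0, DEFAULT_ACTION),
    ("expression_profiling_efficiency", 45.0, None, DEFAULT_ACTION),
    ("transcripts_detected", 130000, None, DEFAULT_ACTION),
    ("genes_detected", 2000, None, DEFAULT_ACTION),
    # Copied as stated ("< 0.500%"); confirm the unit of the coverage-bias
    # metric reported by the RNA-SeQC version in use.
    ("coverage_bias", None, 0.5, DEFAULT_ACTION),
]


def review_alignment(metrics: Dict[str, float]) -> Dict[str, str]:
    """Map each failing metric to a suggested follow-up; absent metrics are skipped."""
    flagged = {}
    for name, low, high, action in ALIGN_QC:
        if name not in metrics:
            continue
        value = metrics[name]
        if (low is not None and not value > low) or (high is not None and not value < high):
            flagged[name] = action
    return flagged
```

For sample 199003859N, `review_alignment({"duplication_rate": 62.52})` returns the stricter-deduplication action, as the text prescribes.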
Duplicate removal: PCR duplicates are removed with Picard (a toolkit for manipulating high-throughput sequencing data and formats), because PCR amplification generates duplicate sequences that interfere with the true enrichment signal. Picard removes PCR duplicates when the parameter REMOVE_DUPLICATES=true is added; otherwise duplicates are only marked and not removed.
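A sketch of the corresponding Picard MarkDuplicates call, assuming a `picard` wrapper on PATH (as installed by some package managers; otherwise prefix with `java -jar picard.jar`) and hypothetical file names. It uses the KEY=value argument style matching `REMOVE_DUPLICATES=true` in the text; newer Picard releases document an `-I/-O/-M` form, so check which syntax your version accepts.

```python
from typing import List


def mark_duplicates_cmd(in_bam: str, out_bam: str, metrics_txt: str,
                        remove: bool = True) -> List[str]:
    """Picard MarkDuplicates: REMOVE_DUPLICATES=true drops duplicate reads,
    false only flags them in the output BAM."""
    return [
        "picard", "MarkDuplicates",
        f"I={in_bam}",
        f"O={out_bam}",
        # Duplication metrics are written here and can feed the QC review.
        f"M={metrics_txt}",
        f"REMOVE_DUPLICATES={'true' if remove else 'false'}",
    ]
```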
Transcriptome assembly and transcript quantification: StringTie (a transcript assembly and expression quantification tool) is used. The deduplicated alignment file can be used as input only after it has been sorted, and a reference genome annotation file is also required. The parameter "-m 200" is used; -m sets the minimum length allowed for predicted transcripts. When StringTie is run with the --merge option, known transcripts and newly assembled transcripts are merged into a non-redundant set of transcripts. Alternatively, this embodiment may use one or more of RSEM (RNA-Seq by Expectation-Maximization, an RNA-Seq quantification tool), eXpress (an RNA-Seq quantification tool), HTSeq (an RNA-Seq analysis tool), Cufflinks (an RNA-Seq transcriptome assembly tool), Sailfish (a fast RNA-Seq quantification tool), Salmon (an RNA-Seq quantification tool), quasi-mapping (an alignment-free RNA-Seq quantification approach), or Kallisto (a fast RNA-Seq quantification tool) to quantify gene expression.
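The StringTie workflow above can be sketched as three command stages. File names are hypothetical, and the third stage (re-estimating abundances against the merged set with `-e`) is an assumption borrowed from common StringTie usage rather than a step stated in the text. The merge list file must be written separately, one per-sample GTF path per line.

```python
from typing import Dict, List, Tuple


def stringtie_cmds(bams: Dict[str, str], annotation_gtf: str,
                   merged_gtf: str = "merged.gtf",
                   gtf_list: str = "gtf_list.txt",
                   min_len: int = 200) -> Tuple[List[List[str]], List[str], List[List[str]]]:
    # 1) Assemble each sorted, deduplicated BAM; -m is the minimum transcript length.
    assemble = [["stringtie", bam, "-G", annotation_gtf, "-m", str(min_len),
                 "-o", f"{name}.gtf"] for name, bam in bams.items()]
    # 2) Merge known and novel transcripts into one non-redundant set;
    #    gtf_list names the per-sample GTFs, one per line.
    merge = ["stringtie", "--merge", "-G", annotation_gtf, "-o", merged_gtf, gtf_list]
    # 3) Assumed extra step: quantify every sample against the merged set
    #    (-e restricts estimation to the given transcripts).
    quantify = [["stringtie", "-e", bam, "-G", merged_gtf,
                 "-o", f"{name}.merged.gtf"] for name, bam in bams.items()]
    return assemble, merge, quantify
```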
Differential expression analysis: the transcript quantification results from the previous step are used as input, and differential expression is analyzed with DESeq2 (a read-count-based RNA-Seq differential expression tool). Cancer tissue samples are separated by commas, paracancerous tissue samples are likewise separated by commas, and the two groups are separated by a space. Alternatively, this embodiment may use one or more of DESeq (a read-count-based RNA-Seq differential expression tool), limma (a read-count-based RNA-Seq differential expression tool), edgeR (a read-count-based RNA-Seq differential expression tool), Cuffdiff (an assembly-based RNA-Seq differential expression tool), Ballgown (an assembly-based RNA-Seq differential expression tool), or sleuth (an alignment-free RNA-Seq differential expression tool) for differential gene expression analysis.
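The group specification described above (commas within a group, one space between groups) can be built and parsed as follows. This is a hypothetical helper, not part of DESeq2 itself: the parsed rows have the shape of the sample-to-condition table (colData) that DESeq2 needs, and the condition labels are illustrative.

```python
from typing import List, Tuple


def group_arg(cancer: List[str], paracancer: List[str]) -> str:
    """Comma-join each group and separate the two groups by a single space."""
    for sample in cancer + paracancer:
        if "," in sample or " " in sample:
            raise ValueError(f"sample ID may not contain ',' or ' ': {sample!r}")
    if set(cancer) & set(paracancer):
        raise ValueError("a sample cannot be in both groups")
    return ",".join(cancer) + " " + ",".join(paracancer)


def sample_conditions(arg: str) -> List[Tuple[str, str]]:
    """Parse the string back into (sample, condition) rows."""
    cancer, paracancer = arg.split(" ")
    return ([(s, "cancer") for s in cancer.split(",")] +
            [(s, "paracancer") for s in paracancer.split(",")])
```

With the six samples of this example, the cancer group comes first, followed by a space and the paracancerous group.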
Fusion gene analysis: fusion genes were predicted with FusionCatcher (a fusion gene analysis tool). The -d parameter specifies the directory containing the reference data for the species, the -i parameter specifies the directory containing the sample's raw sequencing FASTQ files, and the -o parameter specifies the output directory. For human, the developers provide a prebuilt database based on Ensembl release 90. Alternatively, this embodiment may use one or more of JAFFA (a fusion detection tool that compares a transcriptome against a reference transcriptome), STAR-Fusion (a fusion gene detection tool based on STAR alignments), TopHat-Fusion (a tool for identifying fusion genes from RNA-Seq data), or SOAPfuse (an open-source tool for detecting fusion transcripts genome-wide from human RNA-Seq data) for fusion gene analysis.
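Using only the three parameters named in the text, one FusionCatcher run per sample can be sketched as below. The executable name `fusioncatcher`, the data directory and the per-sample input/output layout are assumptions for illustration.

```python
from pathlib import Path
from typing import Dict, List


def fusioncatcher_cmds(data_dir: str, sample_fastq_dirs: Dict[str, str],
                       out_root: str = "fusions") -> List[List[str]]:
    """One run per sample: -d reference data directory, -i directory holding
    that sample's raw FASTQ files, -o per-sample output directory."""
    return [
        ["fusioncatcher", "-d", data_dir, "-i", fastq_dir,
         "-o", str(Path(out_root) / sample)]
        for sample, fastq_dir in sample_fastq_dirs.items()
    ]
```

Keeping each sample in its own input and output directory avoids FASTQ files from different samples being read together.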
Although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A method for RNA analysis of paraffin section tissue, characterized by comprising the following steps:
degrading DNA in the paraffin section tissue and extracting sample RNA;
preparing a paraffin section sample nucleic acid library and sequencing the sample RNA using the library;
performing quality control on the sample data obtained by sequencing to remove rRNA data;
aligning the quality-controlled sample data to a reference genome and performing quality control on the alignment results;
performing transcriptome assembly and transcript quantification on the quality-controlled sample data, and quantitatively analyzing gene expression;
performing differential gene expression analysis based on the transcript quantification results.
2. The method for RNA analysis of paraffin section tissue according to claim 1, further comprising performing fusion gene analysis;
wherein the fusion gene analysis is performed with one or more software tools selected from JAFFA, STAR-Fusion, TopHat-Fusion, FusionCatcher, and SOAPfuse.
3. The method for RNA analysis of paraffin section tissue according to claim 1 or 2, wherein the preparation of the paraffin section sample nucleic acid library comprises the steps of:
extracting nucleic acid in the paraffin section sample;
synthesizing a single-stranded cDNA based on the nucleic acid;
synthesizing a double-stranded cDNA based on the single-stranded cDNA;
repairing the ends of the double-stranded cDNA;
ligating adapters to the double-stranded cDNA and performing PCR amplification on the adapter-ligated DNA to obtain the nucleic acid library of the paraffin section sample.
4. The method for RNA analysis of paraffin section tissue according to claim 1 or 2, wherein the quality control of the sample data obtained by sequencing comprises:
removing reads containing sequencing adapter sequences, low-quality reads, and reads containing N bases;
evaluating, for the data filtered after adapter removal, the number of bases, the percentage of bases with quality greater than 20, the percentage of bases with quality greater than 30, the GC content, the N content, the average read length, the rRNA alignment rate, and the number of reads after filtering;
and selecting the data and samples that meet the set thresholds for subsequent analysis.
5. The method for RNA analysis of paraffin section tissue according to claim 1 or 2, wherein aligning the quality-controlled sample data to the reference genome comprises:
aligning the obtained sample sequences containing nucleic acid sequence information to the reference genome;
obtaining the sample sequences aligned to the reference genome.
6. The method for RNA analysis of paraffin section tissue according to claim 5, wherein one or more of TopHat, STAR, or HISAT2 is used to align the sample sequences containing nucleic acid sequence information to the reference genome.
7. The method for RNA analysis of paraffin section tissue according to claim 1 or 2, wherein the quality control of the alignment results comprises:
performing quality evaluation on the alignment result files of the paraffin section tissue;
removing duplicate sequences.
8. The method for RNA analysis of paraffin section tissue according to claim 7, wherein the quality evaluation of the alignment result files of the paraffin section tissue comprises:
evaluating one or more of the following: duplicate sequence rate, alignment rate, unique alignment rate, exon alignment rate, intron alignment rate, intergenic region alignment rate, expression efficiency, number of detected transcripts, number of detected genes, or uniformity of sequence coverage.
9. The method for RNA analysis of paraffin section tissue according to claim 1 or 2, wherein the quantitative analysis of gene expression is performed with one or more software tools selected from RSEM, eXpress, HTSeq, Cufflinks, StringTie, Sailfish, Salmon, quasi-mapping, and Kallisto.
10. The method for RNA analysis of paraffin section tissue according to claim 1 or 2, wherein the differential gene expression analysis is performed with one or more software tools selected from DESeq, limma, edgeR, Cuffdiff, Ballgown, DESeq2, and sleuth.
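Part of the read-level quality control in claim 4 can be sketched in Python as below. This is an illustrative simplification under stated assumptions, not the method's actual implementation:

- Qualities are assumed to be Phred+33 encoded.
- Reads containing the assumed Illumina adapter prefix `AGATCGGAAGAGC` are discarded rather than trimmed.
- The values in `THRESHOLDS` are hypothetical, because the claim leaves them open.
- The rRNA alignment rate requires aligning reads to rRNA sequences and is not computed here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

PHRED_OFFSET = 33          # assumption: Phred+33 (Sanger / Illumina 1.8+) qualities
ADAPTER = "AGATCGGAAGAGC"  # assumption: common Illumina adapter prefix

# Hypothetical thresholds for illustration; the claim does not give values.
THRESHOLDS: Dict[str, Tuple[str, float]] = {
    "q30_pct": ("min", 80.0),
    "n_pct": ("max", 1.0),
    "mean_read_length": ("min", 50.0),
}


@dataclass
class QcSummary:
    reads: int
    bases: int
    q20_pct: float
    q30_pct: float
    gc_pct: float
    n_pct: float
    mean_read_length: float


def keep_read(seq: str, qual: str, min_mean_q: float = 20.0, max_n: int = 0) -> bool:
    """Discard reads that contain the adapter, too many N bases, or low mean quality."""
    seq = seq.upper()
    if ADAPTER in seq or seq.count("N") > max_n:
        return False
    scores = [ord(q) - PHRED_OFFSET for q in qual]
    return bool(scores) and sum(scores) / len(scores) >= min_mean_q


def summarize(reads: Iterable[Tuple[str, str]]) -> QcSummary:
    """Compute base count, Q20/Q30, GC and N percentages, and mean read length."""
    n_reads = bases = q20 = q30 = gc = n_count = 0
    for seq, qual in reads:
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        n_reads += 1
        bases += len(seq)
        for base, q in zip(seq.upper(), qual):
            score = ord(q) - PHRED_OFFSET
            # The claim says "greater than"; many tools count >= for Q20/Q30.
            q20 += score > 20
            q30 += score > 30
            gc += base in ("G", "C")
            n_count += base == "N"
    if bases == 0:
        raise ValueError("no bases to summarize")

    def pct(count: int) -> float:
        return 100.0 * count / bases

    return QcSummary(n_reads, bases, pct(q20), pct(q30), pct(gc), pct(n_count), bases / n_reads)


def passes(summary: QcSummary, thresholds: Dict[str, Tuple[str, float]] = THRESHOLDS) -> bool:
    """Return True only if every configured metric is within its limit."""
    for field, (kind, limit) in thresholds.items():
        value = getattr(summary, field)
        if (kind == "min" and value < limit) or (kind == "max" and value > limit):
            return False
    return True


if __name__ == "__main__":
    raw = [
        ("ACGTACGTAC", "IIIIIIIIII"),       # clean read, Phred 40 throughout
        ("ACGTNNNNAC", "IIII####II"),       # contains N bases -> discarded
        ("GGCCAGATCGGAAGAGCAA", "I" * 19),  # contains the adapter -> discarded
    ]
    kept = [r for r in raw if keep_read(*r)]
    summary = summarize(kept)
    print(summary)
    print("passes thresholds:", passes(summary))
```

Real pipelines usually trim adapters and low-quality tails with a dedicated tool instead of discarding whole reads. The discard rule here just keeps the example short.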
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910962113.2A (CN110684830A) | 2019-10-11 | 2019-10-11 | RNA analysis method for paraffin section tissue |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910962113.2A (CN110684830A) | 2019-10-11 | 2019-10-11 | RNA analysis method for paraffin section tissue |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110684830A | 2020-01-14 |
Family ID: 69111995
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201910962113.2A (CN110684830A) | Pending | 2019-10-11 | 2019-10-11 | RNA analysis method for paraffin section tissue |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110684830A |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070254305A1 * | 2006-04-28 | 2007-11-01 | Nsabp Foundation, Inc. | Methods of whole genome or microarray expression profiling using nucleic acids prepared from formalin fixed paraffin embedded tissue |
CN102409099A * | 2011-11-29 | 2012-04-11 | 浙江大学 | Method for analyzing difference of gene expression of porcine mammary gland tissue by sequencing technology |
CN102485979A * | 2010-12-02 | 2012-06-06 | 深圳华大基因科技有限公司 | Formalin-fixed paraffin-embedded (FFPE) sample nucleic acid library |
CN104630206A * | 2015-02-05 | 2015-05-20 | 北京诺禾致源生物信息科技有限公司 | Method for constructing transcriptome library |
CN104657628A * | 2015-01-08 | 2015-05-27 | 深圳华大基因科技服务有限公司 | Proton-based transcriptome sequencing data comparison and analysis method and system |
CN106055925A * | 2016-05-24 | 2016-10-26 | 中国水产科学研究院 | Method and apparatus for assembling genome sequence based on transcriptome paired-end sequencing data |
CN107828857A * | 2017-11-23 | 2018-03-23 | 南宁科城汇信息科技有限公司 | Transcriptome sequencing and RNA-seq data analysis method |
CN108823297A * | 2018-06-13 | 2018-11-16 | 领星生物科技(上海)有限公司 | Transcriptome sequencing method based on RT-WES |
CN109182329A * | 2018-09-14 | 2019-01-11 | 求臻医学科技(北京)有限公司 | RNA extraction method for paraffin-embedded tissue samples and its application in high-throughput sequencing |
- 2019-10-11: Application CN201910962113.2A filed in CN (CN110684830A, Pending)
Non-Patent Citations (4)
Title |
---|
ANIRUDDHA CHATTERJEE ET AL.: "A Guide for Designing and Analyzing RNA-Seq Data", 《METHODS IN MOLECULAR BIOLOGY》 * |
ISAAC D. RAPLEE ET AL.: "Aligning the Aligners: Comparison of RNA Sequencing Data Alignment and Gene Expression Quantification Tools for Clinical Breast Cancer Research", 《J. PERS. MED.》 * |
MILICA VUKMIROVIC ET AL.: "Identification and validation of differentially expressed transcripts by RNA-sequencing of formalin-fixed, paraffin-embedded (FFPE) lung tissue from patients with Idiopathic Pulmonary Fibrosis", 《BMC PULMONARY MEDICINE》 * |
XIAN ADICONIS ET AL.: "Comprehensive comparative analysis of RNA sequencing methods for degraded or low input samples", 《NAT METHODS》 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111696629A * | 2020-06-29 | 2020-09-22 | 电子科技大学 | Method for calculating gene expression quantity of RNA sequencing data |
CN111696629B * | 2020-06-29 | 2023-04-18 | 电子科技大学 | Method for calculating gene expression quantity of RNA sequencing data |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200114 |