
CN113436679B - Method and system for determining mutation rate of nucleic acid sample to be tested - Google Patents

Info

Publication number
CN113436679B
CN113436679B CN202010207884.3A CN202010207884A CN113436679B CN 113436679 B CN113436679 B CN 113436679B CN 202010207884 A CN202010207884 A CN 202010207884A CN 113436679 B CN113436679 B CN 113436679B
Authority
CN
China
Prior art keywords
sequencing
variation
nucleic acid
acid sample
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010207884.3A
Other languages
Chinese (zh)
Other versions
CN113436679A (en
Inventor
谢震
黄慧雅
廖微曦
曹玉冰
郭亚琨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Syngentech Co ltd
Tsinghua University
Original Assignee
Beijing Syngentech Co ltd
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Syngentech Co ltd, Tsinghua University filed Critical Beijing Syngentech Co ltd
Priority to CN202010207884.3A priority Critical patent/CN113436679B/en
Publication of CN113436679A publication Critical patent/CN113436679A/en
Application granted granted Critical
Publication of CN113436679B publication Critical patent/CN113436679B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844Nucleic acid amplification reactions
    • C12Q1/6858Allele-specific amplification
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10Sequence alignment; Homology search

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biotechnology (AREA)
  • Analytical Chemistry (AREA)
  • Organic Chemistry (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Wood Science & Technology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Medical Informatics (AREA)
  • Genetics & Genomics (AREA)
  • Theoretical Computer Science (AREA)
  • Zoology (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a method for determining the mutation rate of a nucleic acid sample to be tested. The method comprises the following steps: sequencing the nucleic acid sample to be tested to obtain a sequencing result; aligning the sequencing result with a reference genome sequence of the nucleic acid sample to be tested to obtain an alignment result; determining and correcting structural variation and single nucleotide and/or small fragment variation, respectively, based on the matched sequencing reads and the average length of the library fragments; aligning the unmatched sequencing reads with genome reference sequences of other species, and determining the proportion of sequencing reads originating from each other species and from unknown sources; splicing (assembling) the unmatched sequencing reads, aligning the splicing result with the reference genome sequence of the nucleic acid sample to be tested to determine exogenous variation, and determining possible source species based on the splicing result; and summarizing the structural variation, the single nucleotide and/or small fragment variation and the exogenous variation to determine the mutation rate of the nucleic acid sample to be tested.

Description

Method and system for determining mutation rate of nucleic acid sample to be tested
Technical Field
The present invention relates to the field of bioinformatics, and in particular to a method and system for determining the mutation rate of a nucleic acid sample to be tested, a computer-readable storage medium, and an electronic device.
Background
Viruses often have stable structures, simple genomes, broad-spectrum infectivity and efficient packaging capability, and have therefore become widely used engineered vectors for DNA delivery and expression. Exploiting the immunogenicity of viruses, researchers have used inactivated, attenuated or engineered viruses as effective vaccines. Furthermore, by exploiting the ability of viruses to lyse host cells during amplification, researchers have engineered viruses into oncolytic viruses that retain replication and packaging capability and specifically kill tumor cells. With the development of virus-related research, various viruses such as adenovirus, lentivirus and herpes simplex virus-1 are currently targets of engineering, and various virus products are applied in clinical treatment.
Although viruses have the above advantages as engineering vectors, their pathogenicity and susceptibility to mutation also increase safety risks. Detection of exogenous contamination and self-variation is an important aspect of quality control in the engineering and production processes. In the case of adenoviruses, the FDA requires that the level of replication-competent adenovirus (RCA) in non-replicating adenoviruses be less than 1 RCA per 3×10^10 viral particles (VP). At present, exogenous and variant fragments in a virus sample are detected mainly by low-throughput methods, namely PCR detection and first-generation sequencing of specific regions. Primers must be designed for the fragments to be detected according to the possible variant types, so it is difficult to cover all exogenous and variant fragments in the sample; moreover, owing to the limited specificity and amplicon length of PCR reactions, highly homologous fragments and long fragments are difficult to detect. Deep sequencing, by contrast, builds a library by randomly fragmenting the sample to be tested, can detect all fragments in the sample with high throughput, captures rich neighborhood information around the fragment to be detected, and, combined with suitable analysis techniques, can effectively detect exogenous and variant fragments in a virus sample.
In summary, detecting exogenous and variant fragments in virus samples is an important part of quality control, but conventional detection methods still suffer from low throughput, incomplete coverage, and difficulty in detecting highly homologous fragments and long fragments. The inventors have therefore established, based on high-throughput deep sequencing and related analysis techniques, a detection and analysis workflow from the sample to be tested to the analysis report, so as to effectively and comprehensively detect exogenous contamination and self-variation in the sample.
Disclosure of Invention
The present application has been made based on the findings and knowledge of the inventors regarding the following facts and problems:
In the quality control of engineered virus modification and production, in order to detect exogenous contamination and self-variation, the inventors established, based on high-throughput deep sequencing and related analysis techniques, a detection and analysis workflow from the sample to be tested to the analysis report, which effectively and comprehensively detects and analyzes exogenous contamination and self-variation in the sample.
In a first aspect, the invention provides a method for determining the mutation rate of a nucleic acid sample to be tested. According to an embodiment of the invention, the method comprises: (1) sequencing the nucleic acid sample to be tested to obtain a sequencing result, wherein the lowest effective sequencing depth is 10-100, the data volume of the sequencing result is determined based on the length of the reference genome, the lowest effective sequencing depth and a preset lowest detectable mutation rate, and the sequencing result consists of a plurality of sequencing reads; (2) aligning the sequencing result with a reference genome sequence of the nucleic acid sample to be tested to obtain an alignment result, wherein the alignment result comprises matched sequencing reads and unmatched sequencing reads, and determining the average length of the sequenced library fragments based on the matched sequencing reads; (3) determining and correcting structural variation and single nucleotide and/or small fragment variation, respectively, based on the matched sequencing reads and the average length of the library fragments; (4) splicing (assembling) the unmatched sequencing reads, and aligning the splicing result with the reference genome sequence of the nucleic acid sample to be tested to determine exogenous variation; (5) summarizing the structural variation, the single nucleotide and/or small fragment variation and the exogenous variation to determine the mutation rate of the nucleic acid sample to be tested. According to an embodiment of the invention, the nucleic acid sample to be tested comprises a viral genome. The method according to the embodiment of the invention can effectively and comprehensively detect and analyze exogenous contamination and self-variation in the nucleic acid sample to be tested.
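Step (1) states that the data volume depends on the reference genome length, the lowest effective depth and the preset lowest detectable mutation rate, but gives no formula. The following is a minimal sketch of one plausible relationship, assuming that a variant present at fraction r must still be covered by the lowest effective depth d, so the total depth is d / r. The function name, its parameters and the default read length are illustrative assumptions, not part of the patent.

```python
import math


def required_read_pairs(genome_length, min_effective_depth,
                        min_variant_rate, read_length=150):
    """Estimate the total bases and paired-end read pairs to sequence.

    Assumed relationship (not stated in the source): a variant at
    fraction `min_variant_rate` needs `min_effective_depth` supporting
    coverage, so the total depth is min_effective_depth / min_variant_rate.
    """
    if not 0 < min_variant_rate <= 1:
        raise ValueError("min_variant_rate must be in (0, 1]")
    total_depth = min_effective_depth / min_variant_rate
    total_bases = genome_length * total_depth
    # Each read pair contributes two reads of `read_length` bases.
    read_pairs = math.ceil(total_bases / (2 * read_length))
    return total_bases, read_pairs


if __name__ == "__main__":
    bases, pairs = required_read_pairs(40000, 10, 0.25)
    print(f"{bases:.0f} bases, {pairs} read pairs")
```

In practice the estimate would be padded to allow for reads lost to quality filtering, duplicates and contamination.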
According to an embodiment of the present invention, the method may further comprise at least one of the following additional technical features:
According to an embodiment of the present invention, before step (2) is performed, the sequencing result is subjected to quality assessment and screening, and the lowest detectable mutation rate is re-determined based on the screening result; if the re-determined lowest detectable mutation rate is lower than a predetermined threshold, the amount of the nucleic acid sample in step (1) is increased.
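The re-determination after screening can be sketched as follows. This is an illustration under stated assumptions: the lowest detectable rate is taken to be the lowest effective depth divided by the mean depth remaining after filtering, and the condition for adding sample is read as "the achievable sensitivity no longer reaches the preset target". Both the formula and that reading are assumptions, and the function names are hypothetical.

```python
def lowest_detectable_rate(filtered_bases, genome_length, min_effective_depth):
    """Re-estimate the lowest detectable mutation rate after filtering.

    Assumed relationship: rate = min_effective_depth / mean depth.
    """
    if filtered_bases <= 0:
        return 1.0  # no usable data: nothing below 100% is detectable
    mean_depth = filtered_bases / genome_length
    return min(1.0, min_effective_depth / mean_depth)


def needs_more_sample(filtered_bases, genome_length,
                      min_effective_depth, target_rate):
    """True when the data can no longer resolve variants at target_rate."""
    achieved = lowest_detectable_rate(
        filtered_bases, genome_length, min_effective_depth)
    return achieved > target_rate


if __name__ == "__main__":
    # 12 Mb of filtered data on a 40 kb genome gives a mean depth of 300.
    print(lowest_detectable_rate(12_000_000, 40_000, 30))
    print(needs_more_sample(12_000_000, 40_000, 30, target_rate=0.05))
```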
According to an embodiment of the present invention, in step (3), the structural variation is determined using Pindel, and half of the predetermined lowest variation rate of the detectable variation is employed as the variation rate screening threshold.
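A minimal sketch of the screening rule described above: candidate calls are kept only if their estimated rate is at least half of the preset lowest detectable mutation rate. The record layout (`support_reads`, `reference_reads`) is a hypothetical simplification and does not reproduce Pindel's actual output format.

```python
def screening_threshold(preset_lowest_rate):
    """Half of the preset lowest detectable mutation rate."""
    return preset_lowest_rate / 2


def filter_calls(calls, preset_lowest_rate):
    """Keep calls whose estimated rate reaches the screening threshold.

    Each call is a dict with hypothetical keys 'support_reads' and
    'reference_reads'; the estimated rate is support / (support + reference).
    """
    threshold = screening_threshold(preset_lowest_rate)
    kept = []
    for call in calls:
        depth = call["support_reads"] + call["reference_reads"]
        if depth == 0:
            continue  # no coverage at this site, rate undefined
        rate = call["support_reads"] / depth
        if rate >= threshold:
            kept.append({**call, "rate": rate})
    return kept


if __name__ == "__main__":
    candidates = [
        {"id": "del_a", "support_reads": 3, "reference_reads": 97},
        {"id": "del_b", "support_reads": 6, "reference_reads": 94},
    ]
    for call in filter_calls(candidates, preset_lowest_rate=0.1):
        print(call["id"], call["rate"])
```

Screening at half the preset rate rather than at the preset rate itself leaves a margin, so that a variant truly present at the preset rate is less likely to be dropped because of sampling noise; that rationale is an interpretation, as the source does not state one.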
According to an embodiment of the invention, after the structural variation and the single nucleotide and/or small fragment variation are determined, the sequencing reads involved in the variations are re-aligned (secondary alignment), detected variations of the same type are merged, and false positive detection results caused by low-quality bases, alignment errors and the like are corrected.
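The merging of same-type detections can be illustrated with a small sketch. It assumes that two calls describe the same event when they share a type and their breakpoints differ by no more than a tolerance; the tolerance value and the tuple layout are assumptions chosen for illustration, not the patent's correction software.

```python
def merge_calls(calls, tolerance=10):
    """Merge same-type calls whose breakpoints lie within `tolerance` bp.

    `calls` is a list of (variant_type, start, end, support) tuples.
    Support counts of merged calls are summed; the coordinates of the
    first call in each cluster are kept.
    """
    merged = []
    # Sorting groups calls by type, then by start coordinate.
    for vtype, start, end, support in sorted(calls):
        last = merged[-1] if merged else None
        if (last is not None
                and last["type"] == vtype
                and abs(start - last["start"]) <= tolerance
                and abs(end - last["end"]) <= tolerance):
            last["support"] += support
        else:
            merged.append({"type": vtype, "start": start,
                           "end": end, "support": support})
    return merged


if __name__ == "__main__":
    raw = [("DEL", 100, 200, 5), ("DEL", 103, 198, 3),
           ("INV", 100, 200, 2), ("DEL", 500, 600, 4)]
    for call in merge_calls(raw):
        print(call)
```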
According to an embodiment of the present invention, in step (3), the secondary alignment is performed using different software than the alignment in step (2).
According to the embodiment of the invention, in the step (3), common mutation types are excluded according to the public data and the historical detection data.
According to an embodiment of the invention, in step (3), the detection of the single nucleotide and/or small fragment variations is performed using Mutect2.
According to an embodiment of the invention, in step (4), a possible source species is determined based on the splice result.
According to embodiments of the invention, the unmatched sequencing reads are further aligned with genomic reference sequences of other species, and the sequencing read proportions of each other species source and unknown source are determined.
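The proportion calculation itself is simple once each unmatched read has been assigned a source. The sketch below assumes the assignment has already been made by an aligner and is supplied as a list of labels, with `None` standing for reads that match none of the tested genomes; the labels are illustrative.

```python
from collections import Counter


def source_proportions(read_assignments):
    """Return the fraction of unmatched reads attributed to each source.

    `read_assignments` holds one label per read (e.g. "human",
    "mycoplasma"); None marks a read that matched no tested genome
    and is counted as "unknown".
    """
    counts = Counter(
        label if label is not None else "unknown"
        for label in read_assignments
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {source: n / total for source, n in counts.items()}


if __name__ == "__main__":
    labels = ["human"] * 6 + ["mycoplasma"] * 3 + [None]
    print(source_proportions(labels))
```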
According to an embodiment of the invention, the genome of the other species comprises a human genome and/or a mycoplasma genome.
According to an embodiment of the invention, the method further comprises performing PCR validation on the structural variation and/or exogenous variation.
In a second aspect of the present invention, the present invention proposes a computer-readable storage medium having a computer program stored thereon. According to an embodiment of the present invention, the program when executed by a processor implements the method for determining the mutation rate of a nucleic acid sample to be tested as described above.
In a third aspect of the invention, the invention provides an electronic device. According to an embodiment of the invention, the electronic device comprises a memory, a processor; wherein the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory for implementing the method of determining the mutation rate of a nucleic acid sample to be tested as described above.
In a fourth aspect, the invention provides a system for determining the mutation rate of a nucleic acid sample to be tested. According to an embodiment of the invention, the system comprises: a sequencing device for sequencing the nucleic acid sample to be tested to obtain a sequencing result, wherein the lowest effective sequencing depth is 10-100, the data volume of the sequencing result is determined based on the length of the reference genome, the lowest effective sequencing depth and a preset lowest detectable mutation rate, and the sequencing result consists of a plurality of sequencing reads; a comparison device, connected with the sequencing device, for aligning the sequencing result with a reference genome sequence of the nucleic acid sample to be tested to obtain an alignment result, wherein the alignment result comprises matched sequencing reads and unmatched sequencing reads, and the average length of the sequenced library fragments is determined based on the matched sequencing reads; a matched sequencing read analysis device, connected with the comparison device, for determining and correcting structural variation and single nucleotide and/or small fragment variation, respectively, based on the matched sequencing reads and the average length of the library fragments; an unmatched sequencing read analysis device, connected with the comparison device, for splicing the unmatched sequencing reads and aligning the splicing result with the reference genome sequence of the nucleic acid sample to be tested to determine exogenous variation; and an output device, connected with the matched sequencing read analysis device and the unmatched sequencing read analysis device, for summarizing the structural variation, the single nucleotide and/or small fragment variation and the exogenous variation to determine the mutation rate of the nucleic acid sample to be tested. According to an embodiment of the invention, the nucleic acid sample to be tested is a viral genome. The system according to the embodiment of the invention is suitable for executing the above method for determining the mutation rate of a nucleic acid sample to be tested, and effectively and comprehensively detects and analyzes exogenous contamination and self-variation in the sample.
According to an embodiment of the present invention, the above system may further include at least one of the following technical features:
According to an embodiment of the invention, the system further comprises a lowest mutation rate determining device, which is connected with the sequencing device and the comparison device and is used for performing quality evaluation and screening on the sequencing result in advance and re-determining the lowest detectable mutation rate based on the screening result; if the lowest detectable mutation rate is lower than a predetermined threshold, the amount of the nucleic acid sample in the sequencing device is increased, and if it is not lower than the predetermined threshold, the sequencing result is input into the comparison device.
According to an embodiment of the invention, the system further comprises an unmatched sequencing read source analysis device connected to the alignment device for aligning the unmatched sequencing reads with other species genome reference sequences and determining the sequencing read proportions of each other species source and unknown source, and inputting the results to the output device.
According to an embodiment of the invention, the system further comprises a splice result source analysis device connected to the unmatched sequencing read analysis device for determining possible source species based on the splice result, the result being input to the output device.
According to an embodiment of the present invention, the system further comprises a PCR device, which is connected to the matched sequencing read analysis device, the unmatched sequencing read source analysis device and the unmatched sequencing read analysis device, and is configured to perform PCR verification on the structural variation and/or the exogenous variation, and input the result to the output device.
Drawings
FIG. 1 is a schematic diagram of the virus variation detection and analysis workflow;
FIGS. 2A-2F show simulation tests of the accuracy of different tools in detecting deletion variations of different lengths at different proportions;
FIGS. 3A-3F show simulation tests of the accuracy of different tools in detecting inversion variations of different lengths at different proportions;
FIGS. 4A-4F show simulation tests of the accuracy of different tools in detecting insertion variations of different lengths at different proportions;
FIGS. 5A-5F show simulation tests of the accuracy of different tools in detecting copy number variations of different lengths at different proportions;
FIG. 6 shows the exogenous insertion/substitution variation detection workflow;
FIGS. 7A-7D show simulation tests of the accuracy of splicing-based detection of exogenous substitution variations of different lengths at different proportions;
FIGS. 8A-8C show experimental tests of the accuracy of detecting deletion and inversion variations of different lengths at different proportions;
FIG. 9 shows the experimental test and PCR validation of an adenovirus sample;
FIG. 10 shows the high-resolution correction workflow for Pindel variants;
FIG. 11 is a schematic diagram showing the structure of a system for determining the mutation rate of a nucleic acid sample to be tested according to an embodiment of the present invention;
FIG. 12 is a schematic diagram showing a system for determining the mutation rate of a nucleic acid sample to be tested according to another embodiment of the present invention;
FIG. 13 is a schematic diagram showing a system for determining the mutation rate of a nucleic acid sample to be tested according to another embodiment of the present invention;
FIG. 14 is a schematic diagram showing a system for determining the mutation rate of a nucleic acid sample to be tested according to another embodiment of the present invention;
FIG. 15 is a schematic diagram showing a system for determining the mutation rate of a nucleic acid sample according to another embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings. The embodiments described below by referring to the drawings are illustrative and intended to explain the present invention and should not be construed as limiting the invention.
The invention provides a system for determining the mutation rate of a nucleic acid sample to be tested. Referring to fig. 11, the system includes: a sequencing device 100 for sequencing the nucleic acid sample to be tested to obtain a sequencing result, wherein the lowest effective sequencing depth is 10-100, the data volume of the sequencing result is determined based on the length of the reference genome, the lowest effective sequencing depth and a preset lowest detectable mutation rate, and the sequencing result consists of a plurality of sequencing reads; a comparison device 200, connected with the sequencing device 100, for aligning the sequencing result with a reference genome sequence of the nucleic acid sample to be tested to obtain an alignment result, wherein the alignment result comprises matched sequencing reads and unmatched sequencing reads, and the average length of the sequenced library fragments is determined based on the matched sequencing reads; a matched sequencing read analysis device 300, connected with the comparison device 200, for determining and correcting structural variation and single nucleotide and/or small fragment variation, respectively, based on the matched sequencing reads and the average length of the library fragments; an unmatched sequencing read analysis device 400, connected with the comparison device 200, for splicing the unmatched sequencing reads and aligning the splicing result with the reference genome sequence of the nucleic acid sample to be tested to determine exogenous variation; and an output device 500, connected with the matched sequencing read analysis device 300 and the unmatched sequencing read analysis device 400, for summarizing the structural variation, the single nucleotide and/or small fragment variation and the exogenous variation to determine the mutation rate of the nucleic acid sample to be tested. According to an embodiment of the invention, the nucleic acid sample to be tested is a viral genome. The system according to the embodiment of the invention is suitable for executing the above method for determining the mutation rate of a nucleic acid sample to be tested, and effectively and comprehensively detects and analyzes exogenous contamination and self-variation in the sample.
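The role of the comparison device, separating matched from unmatched reads and estimating the average library fragment length from the matched ones, can be sketched on plain SAM text. This is an illustrative assumption about how such a summary could be computed, not the patent's implementation: it uses the standard SAM FLAG bits (0x4 unmapped, 0x2 properly paired, 0x40 first in pair, 0x100 secondary, 0x800 supplementary) and the TLEN column as the fragment length.

```python
def summarize_sam(lines):
    """Count matched reads, collect unmatched read names, and estimate
    the mean fragment length from properly paired first-in-pair reads.
    """
    matched = 0
    unmatched = []
    fragment_lengths = []
    for line in lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        name, flag, tlen = fields[0], int(fields[1]), int(fields[8])
        if flag & 0x900:
            continue  # skip secondary and supplementary records
        if flag & 0x4:
            unmatched.append(name)
            continue
        matched += 1
        # Use one read per pair so each fragment is counted once.
        if flag & 0x2 and flag & 0x40 and tlen != 0:
            fragment_lengths.append(abs(tlen))
    mean_length = (sum(fragment_lengths) / len(fragment_lengths)
                   if fragment_lengths else None)
    return matched, unmatched, mean_length


if __name__ == "__main__":
    sam = [
        "@SQ\tSN:ref\tLN:40000",
        "r1\t99\tref\t100\t60\t150M\t=\t250\t300\tACGT\tIIII",
        "r1\t147\tref\t250\t60\t150M\t=\t100\t-300\tACGT\tIIII",
        "r2\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    ]
    print(summarize_sam(sam))
```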
According to an embodiment of the present invention, the above system may further include at least one of the following technical features:
According to an embodiment of the invention, referring to fig. 12, the system further comprises a lowest mutation rate determining device 600, which is connected to the sequencing device 100 and the comparison device 200 and is configured to perform quality evaluation and screening on the sequencing result in advance and to re-determine the lowest detectable mutation rate based on the screening result; if the lowest detectable mutation rate is lower than a predetermined threshold, the amount of the nucleic acid sample in the sequencing device is increased, and if it is not lower than the predetermined threshold, the sequencing result is input to the comparison device.
According to an embodiment of the present invention, referring to fig. 13, the system further comprises an unmatched sequencing read source analysis device 700, wherein the unmatched sequencing read source analysis device 700 is connected to the alignment device 200, and is used for aligning the unmatched sequencing read with genome reference sequences of other species, determining sequencing read proportions of sources of the other species and unknown sources, and inputting the results to the output device 500.
According to an embodiment of the invention, referring to fig. 14, the system further comprises a splice result source analysis device 800, said splice result source analysis device 800 being connected to the unmatched sequencing read analysis device 400 for determining possible source species based on said splice result, the result being input to said output device 500.
According to an embodiment of the present invention, referring to fig. 15, the system further comprises a PCR device 900, wherein the PCR device 900 is connected to the matched sequencing read analysis device 300, the unmatched sequencing read source analysis device 700 and the unmatched sequencing read analysis device 400, and is used for performing PCR verification on the structural variation and/or the exogenous variation, and the result is input to the output device 500.
In the quality control of engineered virus modification and production, in order to detect exogenous contamination and self-variation, the inventors established, based on high-throughput deep sequencing and related analysis techniques, a detection and analysis workflow from the sample to be tested to the analysis report, which effectively and comprehensively detects and analyzes exogenous contamination and self-variation in the sample.
The specific flow is as follows:
1) Obtain the reference genome sequence of the virus to be tested through literature review, first-generation sequencing or other means; extract high-quality viral genomic DNA after virus purification.
2) Construct a library from the extracted viral genome and sequence it by deep sequencing to obtain high-throughput sequencing data of sufficient depth. The minimum effective sequencing depth is an empirical value of 10-100; the total data volume required can be estimated from the length of the genome sequence, the minimum effective sequencing depth and the preset lowest detectable mutation rate.
3) Evaluate sequencing quality using sequencing data quality evaluation software (such as FastQC), focusing on base quality distribution and adapter contamination; preprocess the data using sequencing data preprocessing software (such as Cutadapt), selecting the adapter types, base quality threshold and sequence length threshold in light of the quality evaluation results; repeat the quality evaluation after preprocessing to confirm its effect; re-estimate the lowest detectable mutation rate from the total data volume after preprocessing, and add test sample if the preset value is not reached.
4) Align the preprocessed data to the reference genome using sequence alignment software (such as BWA) to obtain the alignment result, retaining suboptimal alignments; remove duplicates from the alignment result and extract unmatched sequencing reads using alignment result processing software (such as Samblaster); sort the alignment result using alignment result processing software (such as Sambamba) and estimate the average length of the library fragments.
5) Perform structural variation analysis using structural variation detection software (such as Pindel) based on the alignment result and the average length of the library fragments, selecting a suitable range of variation lengths to detect according to the length of the reference genome sequence, and using half of the preset lowest detectable mutation rate as the mutation rate screening threshold.
6) Correct the variation detection results and the detected mutation rates using high-resolution variation detection correction software: based on re-alignment of the data involved in the detected variations, merge detected variations of the same type, eliminate false positive detection results caused by low-quality bases, alignment errors and the like, and exclude common variation types according to public data and historical detection data.
7) Perform single nucleotide and small fragment variation analysis based on the alignment result using single nucleotide and small fragment variation detection software (such as Mutect2), and exclude common variation types according to published virus polymorphism data and historical detection data. Compare the results with the structural variation detection results with the aim of reducing false negatives: single nucleotide and small fragment variations missed by structural variation detection are added, and the estimated mutation rates of those also detected by structural variation detection are corrected.
8) Align the unmatched sequencing reads to possible contaminant genomes (such as the human genome and mycoplasma genomes) respectively, and count the proportion of each contamination source and of unknown sources in the sample; re-estimate the total data volume originating from the tested virus and the lowest detectable mutation rate according to the proportion of contaminant sequences, and add test sample if the preset lowest detectable mutation rate is not reached.
9) Splice (assemble) the unmatched sequencing reads using splicing software (such as SPAdes), adjusting the k-mer parameters to obtain the best splicing rate and spliced length. Detect exogenous substitution variation from the exogenous fragment splicing result using substitution variation detection software: align the spliced fragments to the virus reference genome, screen for spliced fragments in which the half-read-length segments at both ends can be matched to the reference genome, analyze possible exogenous insertion variation according to the matched positions, and estimate the mutation rate from the reference genome data depth and the spliced fragment data depth.
10) Search the spliced fragments using sequence similarity search software (such as BLAST), and analyze the possible source species and gene information of the spliced fragments.
11) For the detected structural variations and exogenous fragment variations, design corresponding PCR experiments for verification, and recover the PCR fragments for first-generation sequencing verification.
12) Integrate the above analysis results to generate the final analysis report.
The flow chart of the invention is shown in figure 1.
Embodiments of the present invention will be described in further detail below, examples of which are illustrated in the accompanying drawings. The embodiments described below by referring to the drawings are illustrative and intended to explain the present invention and should not be construed as limiting the invention.
Example 1 Simulation test of the virus variation detection and analysis procedure
Experiment one
Simulation test of the accuracy of each tool in detecting deletion, inversion, insertion and copy number variations of various lengths
In each simulation, a 40000 bp sequence is randomly generated as the reference genome sequence, and the tool ART is used to simulate library construction and generate 40000 pairs of Illumina PE150 reads as the non-mutated sequencing data. Deletion variations of 1, 10, 100, 200 and 1000 bp are each randomly generated in the reference genome sequence, and ART is used in the same way to generate 40000 pairs of Illumina PE150 reads as the mutated sequencing data. The non-mutated and mutated sequencing data are mixed at mutation rates of 0.1, 0.2 and 0.5, and the resulting variation is detected with each of the tools Mutect2, FreeBayes, Pindel, Delly, Gridss and Lumpy. The simulation is repeated 200 times to evaluate the accuracy of each tool in detecting deletion variations of different lengths.
In each simulation, a 40000 bp sequence is randomly generated as the reference genome sequence, and the tool ART is used to simulate library construction and generate 40000 pairs of Illumina PE150 reads as the non-mutated sequencing data. Inversion variations of 10, 100, 200 and 1000 bp are each randomly generated in the reference genome sequence, and ART is used in the same way to generate 40000 pairs of Illumina PE150 reads as the mutated sequencing data. The non-mutated and mutated sequencing data are mixed at mutation rates of 0.1, 0.2 and 0.5, and the resulting variation is detected with each of the tools Mutect2, FreeBayes, Pindel, Delly, Gridss and Lumpy. The simulation is repeated 200 times to evaluate the accuracy of each tool in detecting inversion variations of different lengths.
In each simulation, a 40000 bp sequence is randomly generated as the reference genome sequence, and the tool ART is used to simulate library construction and generate 40000 pairs of Illumina PE150 reads as the non-mutated sequencing data. Insertion variations of 1, 10, 100 and 200 bp are each randomly generated in the reference genome sequence, and ART is used in the same way to generate 40000 pairs of Illumina PE150 reads as the mutated sequencing data. The non-mutated and mutated sequencing data are mixed at mutation rates of 0.1, 0.2 and 0.5, and the resulting variation is detected with each of the tools Mutect2, FreeBayes, Pindel, Delly, Gridss and Lumpy. The simulation is repeated 200 times to evaluate the accuracy of each tool in detecting insertion variations of different lengths.
In each simulation, a 40000 bp sequence is randomly generated as the reference genome sequence, and the tool ART is used to simulate library construction and generate 40000 pairs of Illumina PE150 reads as the non-mutated sequencing data. 2X and 3X copy number variations of 25, 50, 100, 200 and 1000 bp are each randomly generated in the reference genome sequence, and ART is used in the same way to generate 40000 pairs of Illumina PE150 reads as the mutated sequencing data. The non-mutated and mutated sequencing data are mixed at mutation rates of 0.1, 0.2 and 0.5, and the resulting variation is detected with each of the tools Mutect2, Pindel, Delly and Gridss. The simulation is repeated 200 times to evaluate the accuracy of each tool in detecting copy number variations of different lengths.
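The mixing step shared by the four simulations above can be sketched as follows. The sketch assumes that the mixed sample keeps a total of 40000 read pairs of 2 x 150 bp, so that the mutated reads alone cover the 40000 bp genome at 30X when the mutation rate is 0.1; the function names are illustrative and do not come from the original text.

```python
from typing import Tuple


def mix_counts(total_pairs: int, mutation_rate: float) -> Tuple[int, int]:
    """Return (non-mutated pairs, mutated pairs) for a given mixing rate."""
    if not 0.0 <= mutation_rate <= 1.0:
        raise ValueError("mutation_rate must be in [0, 1]")
    mutated = round(total_pairs * mutation_rate)
    return total_pairs - mutated, mutated


def coverage(read_pairs: int, read_len: int, genome_len: int) -> float:
    """Mean depth contributed by paired-end reads of the given length."""
    return read_pairs * 2 * read_len / genome_len


if __name__ == "__main__":
    for rate in (0.1, 0.2, 0.5):
        normal, mutated = mix_counts(40000, rate)
        print(rate, normal, mutated, coverage(mutated, 150, 40000))
```

Drawing the two subsets at random from the simulated read sets (for example with a fixed random seed) would complete the mixing; that part is omitted here because the original text does not say how the subsets were drawn.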
The detection results are shown in Figs. 2A-2F, 3A-3F, 4A-4F and 5A-5F. Mutect2 and FreeBayes are tools for detecting single-nucleotide variation and small-fragment insertion/deletion variation. In the test they detected deletion, insertion and inversion variations of at most 10 bp, Mutect2 detected 2X copy number variations of at most 50 bp, and the variation rates estimated by these tools were close to the actual variation rates. Delly, Lumpy and Gridss are tools for detecting structural variation. In the test they detected deletion and inversion variations of at least 100 bp; Gridss, owing to its local assembly function, also detected part of the 100 bp and 200 bp insertion variations. Delly detected 2X copy number variations of at least 200 bp, and Gridss detected 3X copy number variations of at least 25 bp. None of Delly, Lumpy and Gridss can estimate the variation rate. Pindel detected variations of all tested lengths, and its estimated variation rate was close to the actual variation rate. Taking the 30X coverage corresponding to a simulated variation rate of 0.1 as the detection limit, each tool performed consistently with its performance on simulated data with a variation rate of 0.2 or 0.5. In conclusion, Pindel can comprehensively detect the various types of variation at the various lengths and can be used as the main tool for detecting virus variation; Mutect2 and FreeBayes can be used as supplements for detecting single-nucleotide variation and small-fragment insertion/deletion variation; longer exogenous insertion variations cannot be detected by tools based on alignment to a known reference and must be detected by assembly-based tools.
Experiment two
Simulation test of the accuracy of each tool in detecting exogenous fragment variations of various lengths
In each simulation, a 40000 bp sequence is randomly generated as the reference genome sequence, and the tool ART is used to simulate library construction and generate 40000 pairs of Illumina PE150 reads as the non-mutated sequencing data. Substitution variations with deletion lengths of 0, 200, 500, 1000 and 10000 bp and insertion lengths of 200, 500, 1000 and 10000 bp are randomly generated in the reference genome sequence, and ART is used in the same way to generate 40000 pairs of Illumina PE150 reads as the mutated sequencing data. The non-mutated and mutated sequencing data are mixed at mutation rates of 0.1, 0.2 and 0.5, and the tool SPAdes is used to assemble the unmatched sequencing reads in order to detect the resulting variation. The exogenous insertion/substitution variation detection flow is shown in Fig. 6. The simulation is repeated 200 times to evaluate the accuracy of the assembly tool SPAdes in detecting substitution variations with inserts of different lengths.
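The screening of assembled fragments described in step 9), in which half a read length at each end of the fragment must match the reference genome, can be sketched as follows. This is a simplified illustration only: it uses exact, forward-strand string matching instead of an aligner, it assumes that each fragment ends with exactly half a read length of flanking reference sequence, and the names `ExogenousCall` and `screen_fragment` are hypothetical.

```python
import random
from typing import NamedTuple, Optional


class ExogenousCall(NamedTuple):
    ref_start: int      # 0-based reference position right after the left anchor
    deleted_len: int    # reference bases lying between the two anchors
    inserted_len: int   # fragment bases lying between the two anchors


def screen_fragment(fragment: str, reference: str,
                    read_len: int = 150) -> Optional[ExogenousCall]:
    """Keep a fragment only if both end anchors match the reference."""
    anchor = read_len // 2
    if len(fragment) <= 2 * anchor:
        return None
    left_pos = reference.find(fragment[:anchor])
    if left_pos < 0:
        return None
    left_end = left_pos + anchor
    # The right anchor must lie at or after the end of the left anchor.
    right_pos = reference.find(fragment[-anchor:], left_end)
    if right_pos < 0:
        return None
    return ExogenousCall(left_end, right_pos - left_end,
                         len(fragment) - 2 * anchor)


if __name__ == "__main__":
    rng = random.Random(0)
    reference = "".join(rng.choice("ACGT") for _ in range(2000))
    insert = "".join(rng.choice("ACGT") for _ in range(300))
    # Replace reference[175:375] (200 bp) with a 300 bp exogenous insert.
    fragment = reference[100:175] + insert + reference[375:450]
    print(screen_fragment(fragment, reference))
```

A deletion length of 0 in this representation corresponds to a pure insertion, matching the 0 bp deletion case in the simulation.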
The detection results are shown in Figs. 7A-7D. For substitution variations with inserted and deleted fragments of different lengths, the SPAdes tool assembled the exogenous fragments, the fragments were accurately aligned to the positions where the substitution variations occurred, and the estimated variation rates were close to the actual variation rates. Taking the 30X coverage corresponding to a simulated variation rate of 0.1 as the detection limit, the detection accuracy was consistent with that on simulated data with a variation rate of 0.2 or 0.5. This shows that, for longer exogenous fragment insertion/substitution variations, the assembly-based SPAdes tool can accurately detect the exogenous fragments and the positions where the variations occur.
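The text states that the variation rate of an exogenous fragment is estimated from the depth of the reference genome data and the depth of the assembled fragment data, but gives no formula. One plausible reading, shown here purely as an assumption, is the ratio of the mean depth over the exogenous fragment to the mean depth over the reference genome, capped at 1. The function name and the example depths are hypothetical.

```python
from statistics import mean
from typing import Sequence


def estimate_variation_rate(fragment_depths: Sequence[float],
                            reference_depths: Sequence[float]) -> float:
    """Assumed estimate: mean fragment depth / mean reference depth."""
    if not fragment_depths or not reference_depths:
        raise ValueError("depth lists must not be empty")
    ref_depth = mean(reference_depths)
    if ref_depth <= 0:
        raise ValueError("reference depth must be positive")
    # Cap at 1.0 because a rate above 100% has no meaning here.
    return min(1.0, mean(fragment_depths) / ref_depth)


if __name__ == "__main__":
    # Hypothetical depths: about 30X over the insert, 300X over the reference.
    print(estimate_variation_rate([28, 30, 32], [300, 300, 300]))
```

Under this reading, a fragment covered at about 30X in a sample whose reference is covered at about 300X corresponds to a variation rate of about 0.1, which agrees with the detection limit quoted for the simulation.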
Example 2 Adenovirus experimental test of the virus variation detection and analysis procedure
Experiment one
Experimental test of the accuracy of the virus variation detection and analysis procedure in detecting deletion and inversion variations of various lengths
An adenovirus packaging plasmid of about 40000 bp is constructed as the non-mutated vector; deletion and inversion variations of different lengths are introduced at different positions of the non-mutated vector, as shown in Table 1; the non-mutated vector and each mutated vector are mixed at mutation rates of 0.001, 0.01 and 0.1, and Illumina PE150 deep sequencing is performed at a sequencing depth of 1G; the variation in each sample is detected by the virus variation detection and analysis procedure and compared with the introduced variation and the actual variation rate to evaluate the accuracy of the procedure.
The detection results are shown in Figs. 8A-8C. The virus variation detection and analysis procedure accurately detected the deletion and inversion variations of different lengths at the different positions tested. Taking the coverage of about 30X corresponding to an actual mutation rate of 0.001 as the detection limit, the detection accuracy was consistent with that at mutation rates of 0.01 and 0.1.
Table 1:
Variation     Type        Length (bp)   Initial position (bp)
Variation A   Deletion    15            1177
Variation B   Deletion    347           1837
Variation C   Deletion    1042          2907
Variation D   Inversion   12            6084
Variation E   Inversion   320           5047
Variation F   Inversion   2079          4504
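Comparing the detected variations with the introduced variations of Table 1 can be sketched as follows. The variation data are taken from Table 1; the matching rule (same type, with start position and length each within a tolerance of 10 bp) and all names in the code are assumptions made for illustration, since the text does not specify how a detection is judged correct.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Variant:
    kind: str     # "deletion" or "inversion"
    length: int   # bp
    start: int    # initial position, bp


# Introduced variations, copied from Table 1.
TABLE_1: Dict[str, Variant] = {
    "A": Variant("deletion", 15, 1177),
    "B": Variant("deletion", 347, 1837),
    "C": Variant("deletion", 1042, 2907),
    "D": Variant("inversion", 12, 6084),
    "E": Variant("inversion", 320, 5047),
    "F": Variant("inversion", 2079, 4504),
}


def is_match(call: Variant, truth: Variant, tolerance: int = 10) -> bool:
    """Assumed rule: same type, start and length within the tolerance."""
    return (call.kind == truth.kind
            and abs(call.start - truth.start) <= tolerance
            and abs(call.length - truth.length) <= tolerance)


def evaluate(calls: List[Variant], tolerance: int = 10) -> Dict[str, bool]:
    """For each introduced variation, report whether any call matches it."""
    return {name: any(is_match(c, truth, tolerance) for c in calls)
            for name, truth in TABLE_1.items()}


if __name__ == "__main__":
    # Hypothetical calls from one sample.
    calls = [Variant("deletion", 15, 1178), Variant("inversion", 2079, 4504)]
    print(evaluate(calls))
```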
Experiment two
Experimental test of the virus variation detection and analysis procedure on virus sample SYN
The genome of virus sample SYN is extracted and subjected to Illumina PE150 deep sequencing at a sequencing depth of 1G; the variation in the sample is detected by the virus variation detection and analysis procedure, and the detection results are verified by PCR and first-generation sequencing.
Analysis of the sequencing results detected exogenous fragment 1 and exogenous fragment 2. Corresponding PCR primers were designed for each; the gel electrophoresis results, shown in Fig. 9, are consistent with the lengths and positions of the detected exogenous fragments. The PCR fragments were recovered for first-generation sequencing, and their sequences are consistent with the detected exogenous fragments.
Example 3 Test of the high-resolution correction procedure for Pindel-detected variation
Experiment one
Test of the high-resolution correction procedure for Pindel-detected variation in correcting the detection results of virus sample SYN2
The high-resolution correction procedure for Pindel-detected variation is shown in Fig. 10. Pindel results with a variation depth greater than 10 were screened, giving 254 results in total. After high-resolution correction, a total of 34 microsatellite instability variations and 8 other variations were detected, 2 of the latter being merged variations. Compared with the historical detection results, all of the microsatellite instability variations were previously known, 5 of the other variations were previously known, and 3 were newly detected. The results are shown in Table 2. The accuracy of the Pindel detection results is effectively improved by the visual correction procedure.
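The initial screening of Pindel results can be sketched as follows. The sketch combines the depth filter used in this example (variation depth greater than 10) with the rate threshold stated in claim 4 (half of the predetermined lowest detectable mutation rate). The record fields `variant_depth` and `total_depth` and the function name are hypothetical; they are not Pindel's output format, and parsing of Pindel output is deliberately left out.

```python
from typing import Dict, List


def screen_calls(calls: List[Dict],
                 min_variant_depth: int = 10,
                 lowest_detectable_rate: float = 0.01) -> List[Dict]:
    """Keep calls with enough supporting depth and a high enough rate."""
    rate_threshold = lowest_detectable_rate / 2  # threshold from claim 4
    kept = []
    for call in calls:
        if call["total_depth"] <= 0:
            continue  # no usable coverage at this site
        rate = call["variant_depth"] / call["total_depth"]
        if call["variant_depth"] > min_variant_depth and rate >= rate_threshold:
            kept.append({**call, "rate": rate})
    return kept


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    calls = [
        {"id": "v1", "variant_depth": 12, "total_depth": 1000},
        {"id": "v2", "variant_depth": 8, "total_depth": 1000},
        {"id": "v3", "variant_depth": 11, "total_depth": 10000},
    ]
    print([c["id"] for c in screen_calls(calls)])
```

The calls that pass this screening would then go through the high-resolution correction itself (secondary alignment, merging of calls of the same type, removal of false positives), which is not sketched here.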
Table 2:
Example 4 Lentivirus experimental test of the virus variation detection and analysis procedure
Experiment one
Experimental test of the virus variation detection and analysis procedure on lentivirus samples
A lentiviral vector carrying a fragment of a certain length is constructed as the non-mutated vector; deletion and inversion variations of different lengths are introduced at different positions of the non-mutated vector; the non-mutated vector and each mutated vector are mixed at a mutation rate of 0.01, and Illumina PE150 deep sequencing is performed at a sequencing depth of 1G; the variation in each sample is detected by the virus variation detection and analysis procedure and compared with the introduced variation and the actual variation rate to evaluate the accuracy of the procedure.
The virus variation detection and analysis procedure accurately detected the deletion and inversion variations of different lengths at different positions, and the detection accuracy was consistent at the mutation rate of 0.01.
Example 5 Adeno-associated virus experimental test of the virus variation detection and analysis procedure
Experiment one
Experimental test of the virus variation detection and analysis procedure on adeno-associated virus samples
An adeno-associated virus vector carrying a fragment of a certain length is constructed as the non-mutated vector; deletion and inversion variations of different lengths are introduced at different positions of the non-mutated vector; the non-mutated vector and each mutated vector are mixed at a mutation rate of 0.01, and Illumina PE150 deep sequencing is performed at a sequencing depth of 1G; the variation in each sample is detected by the virus variation detection and analysis procedure and compared with the introduced variation and the actual variation rate to evaluate the accuracy of the procedure.
The virus variation detection and analysis procedure accurately detected the deletion and inversion variations of different lengths at different positions, and the detection accuracy was consistent at the mutation rate of 0.01.
Example 6 Herpes simplex virus experimental test of the virus variation detection and analysis procedure
Experiment one
Experimental test of the virus variation detection and analysis procedure on herpes simplex virus samples
A herpes simplex virus vector carrying a fragment of a certain length is constructed as the non-mutated vector; deletion and inversion variations of different lengths are introduced at different positions of the non-mutated vector; the non-mutated vector and each mutated vector are mixed at a mutation rate of 0.01, and Illumina PE150 deep sequencing is performed at a sequencing depth of 1G; the variation in each sample is detected by the virus variation detection and analysis procedure and compared with the introduced variation and the actual variation rate to evaluate the accuracy of the procedure.
The virus variation detection and analysis procedure accurately detected the deletion and inversion variations of different lengths at different positions, and the detection accuracy was consistent at the mutation rate of 0.01.
In this specification, a description referring to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples" means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. Schematic uses of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples, and those skilled in the art may combine the different embodiments or examples described in this specification, and their features, provided that they do not contradict one another.
While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the invention, and that changes, modifications, substitutions and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.

Claims (20)

1. A method for determining the mutation rate of a nucleic acid sample to be tested, comprising: (1) sequencing the nucleic acid sample to be tested to obtain a sequencing result, wherein the minimum effective depth of the sequencing is 10 to 100, the data volume of the sequencing result is determined based on the length of the reference genome, the minimum effective sequencing depth, and a predetermined minimum mutation rate of detectable mutations, and the sequencing result is composed of multiple sequencing reads; (2) comparing the sequencing result with the reference genome sequence of the nucleic acid sample to be tested to obtain a comparison result, wherein the comparison result includes matched sequencing reads and unmatched sequencing reads, and determining the average length of the sequenced library construction fragments based on the matched sequencing reads; (3) determining and correcting structural variations and single nucleotide and/or small fragment variations, respectively, based on the matched sequencing reads and the average length of the library construction fragments; (4) splicing the unmatched sequencing reads, and comparing the splicing result with the reference genome sequence of the nucleic acid sample to be tested to determine the exogenous variation; (5) summarizing the structural variation, the single nucleotide and/or small fragment variation, and the exogenous variation to determine the variation rate of the nucleic acid sample to be tested.
2. The method according to claim 1, characterized in that the nucleic acid sample to be tested includes a viral genome.
3. The method according to claim 1, characterized in that, before performing step (2), the sequencing results are quality assessed and screened in advance, and based on the screening results, the minimum detectable mutation rate is re-determined, and if the minimum detectable mutation rate is lower than a predetermined threshold, the amount of the nucleic acid sample is increased in step (1).
4. The method according to claim 1, characterized in that, in step (3), Pindel is used to determine the structural variation, and half of the predetermined minimum variation rate of detectable variation is used as the variation rate screening threshold.
5. The method according to claim 1, characterized in that, after determining the structural variation, the single nucleotide variation and/or the small fragment variation, a secondary alignment is performed on the sequencing reads involved in the structural variation, the single nucleotide variation and/or the small fragment variation, detected variations of the same type are merged, and false positive detection results caused by low-quality bases and errors in the alignment results are corrected.
6. The method according to claim 5, characterized in that, in step (3), the secondary comparison and the comparison in step (2) use different software.
7. The method according to claim 1, characterized in that, in step (3), common variant types are excluded based on public data and historical detection data.
8. The method according to claim 1, characterized in that, in step (3), Mutect2 is used to detect the single nucleotide and/or small fragment variation.
9. The method according to claim 1, characterized in that the unmatched sequencing reads are further compared with genome reference sequences of other species, and the proportions of sequencing reads from each other species and from unknown sources are determined.
10. The method according to claim 9, characterized in that the genomes of other species include a human genome and/or a mycoplasma genome.
11. The method according to claim 1, characterized in that, in step (4), the possible source species is determined based on the splicing result.
12. The method according to claim 1, characterized in that it further comprises PCR verification of the structural variation and/or exogenous variation.
13. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, the method for determining the mutation rate of a nucleic acid sample to be tested according to any one of claims 1 to 12 is implemented.
14. An electronic device, characterized in that it comprises a memory and a processor; wherein the processor runs a program corresponding to executable program code by reading the executable program code stored in the memory, so as to implement the method for determining the mutation rate of a nucleic acid sample to be tested according to any one of claims 1 to 12.
15. A system for determining the mutation rate of a nucleic acid sample to be tested, comprising: a sequencing device, used to sequence the nucleic acid sample to be tested so as to obtain a sequencing result, wherein the minimum effective depth of the sequencing is 10 to 100, the data volume of the sequencing result is determined based on the length of the reference genome, the minimum effective sequencing depth, and a predetermined minimum mutation rate of detectable mutations, and the sequencing result is composed of a plurality of sequencing reads; a comparison device, connected to the sequencing device and used to compare the sequencing result with the reference genome sequence of the nucleic acid sample to be tested so as to obtain a comparison result, wherein the comparison result includes matched sequencing reads and unmatched sequencing reads, and to determine the average length of the sequenced library construction fragments based on the matched sequencing reads; a matched sequencing read analysis device, connected to the comparison device and used to determine and correct structural variations and single nucleotide and/or small fragment variations, respectively, based on the matched sequencing reads and the average length of the library construction fragments; an unmatched sequencing read analysis device, connected to the comparison device and used to splice the unmatched sequencing reads and compare the splicing result with the reference genome sequence of the nucleic acid sample to be tested so as to determine the exogenous variation; and an output device, connected to the matched sequencing read analysis device and the unmatched sequencing read analysis device and used to summarize the structural variation, the single nucleotide and/or small fragment variation, and the exogenous variation so as to determine the variation rate of the nucleic acid sample to be tested.
16. The system according to claim 15, characterized in that the nucleic acid sample to be tested includes a viral genome.
17. The system according to claim 15, further comprising: a minimum mutation rate determination device, connected to the sequencing device and the comparison device and used to perform quality assessment and screening on the sequencing results in advance and, based on the screening results, to re-determine the minimum mutation rate of detectable mutations; if the minimum mutation rate of detectable mutations is lower than a predetermined threshold, the amount of the nucleic acid sample is increased in the sequencing device; if the minimum mutation rate of detectable mutations is not lower than the predetermined threshold, the sequencing results are input into the comparison device.
18. The system according to claim 15, characterized in that it further comprises an unmatched sequencing read source analysis device, which is connected to the comparison device and is used to compare the unmatched sequencing reads with genome reference sequences of other species and determine the proportions of sequencing reads from each other species and from unknown sources, the results being input into the output device.
19. The system according to claim 15, characterized in that it further comprises a splicing result source analysis device, which is connected to the unmatched sequencing read analysis device and is used to determine the possible source species based on the splicing result, the result being input into the output device.
20. The system according to claim 18, characterized in that it further comprises a PCR device, which is connected to the matched sequencing read analysis device, the unmatched sequencing read analysis device, and the unmatched sequencing read source analysis device, and is used to perform PCR verification on the structural variation and/or exogenous variation, the results being input into the output device.
CN202010207884.3A 2020-03-23 2020-03-23 Method and system for determining mutation rate of nucleic acid sample to be tested Active CN113436679B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010207884.3A CN113436679B (en) 2020-03-23 2020-03-23 Method and system for determining mutation rate of nucleic acid sample to be tested

Publications (2)

Publication Number Publication Date
CN113436679A CN113436679A (en) 2021-09-24
CN113436679B true CN113436679B (en) 2024-05-10

Family

ID=77753267

Country Status (1)

Country Link
CN (1) CN113436679B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant