CN106566877A - Gene mutation detection method and apparatus - Google Patents
- Publication number
- CN106566877A (application CN201610932451.8A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
Abstract
The invention discloses a method and an apparatus for detecting gene mutations. The method comprises the following steps: 1, acquiring sequencing data of a sample to be tested and of control samples; and 2, determining whether an SNP mutation, an InDel mutation and/or a deletion mutation exists in the sequencing data of the sample to be tested. The deletion determination comprises: (1) performing normalization, calculating the standard deviation and the median, and calculating the deviation Z value according to formula (1); and (2) deletion determination: a deletion mutation is judged to exist in a window of the sample to be tested if the Z value is greater than 3.
$$Z = \frac{\text{normalized sequence count of the sample to be tested} - \text{median}}{\text{standard deviation}} \tag{1}$$
The method and the apparatus improve detection throughput and detection accuracy.
Description
Technical field
The present invention relates to the field of gene mutation detection, and in particular to a method and an apparatus for detecting gene mutations.
Background art
Data show that in 2012 there were 14.1 million new cancer patients worldwide, of whom 3.065 million (22%) were in China; of the 8.2 million cancer deaths worldwide, China accounted for 27%, i.e. 2.206 million people. Colorectal cancer is a common malignant tumor of the digestive tract, with about 1.02 million new cases and 530,000 deaths worldwide each year. China has joined the ranks of regions with a high incidence of colorectal cancer, and the disease increasingly threatens people's physical and mental health. China has up to 130,000 new colorectal cancer cases each year, a number rising at an average annual rate of 4%, and the disease ranks third in tumor mortality among women.
Two classes of disease account for the majority of colorectal cancer: Lynch syndrome and familial adenomatous polyposis. Lynch syndrome is mainly caused by pathogenic mutations in the MLH1, MSH2, MSH6, PMS2 and EPCAM genes, whereas familial adenomatous polyposis is mainly caused by mutations in the APC and MUTYH genes.
It is well known that the gene mutations that can cause diseases such as colorectal cancer are of various types, including SNPs, InDels (in which relatively few bases are inserted or deleted, generally rarely more than 100 bp), and large-fragment deletion mutations (deletion mutations) or large-fragment duplication mutations (typically deletions or duplications at the kb scale). Many methods for detecting gene mutations exist in the prior art. Among them, the methods capable of detecting fragment deletions include multiplex ligation-dependent probe amplification (MLPA), fluorescence quantitative PCR, Sanger sequencing and second-generation sequencing.
The basic principle of MLPA is to hybridize probes to the target DNA sequence, ligate the probes specifically, amplify by PCR, separate the amplification products by capillary electrophoresis and collect the data, and then analyze the collected data with analysis software to reach a conclusion. The basic principle of fluorescence quantitative PCR is to quantitatively analyze the PCR products of the corresponding gene loci of the sample to be tested and of a control group, and to conclude by comparison whether an insertion or a duplication is present. Sanger sequencing identifies the deleted region by direct sequencing. However, these methods suffer from complex primer design, low throughput, high labor intensity and high cost, and cannot meet the demand for detecting gene mutations in large numbers of samples or in batches, so their application is limited.
With the development of high-throughput sequencing technology, its high throughput and high accuracy have made second-generation sequencing a popular means of detecting gene mutations. However, quickly and accurately finding the mutation sites and mutation types of all target genes in the huge volume of sequencing data produced by high-throughput sequencing remains difficult. There is therefore an urgent need for a method that detects all mutation types in batches, so as to improve the throughput and accuracy of detection.
Summary of the invention
The main object of the present invention is to provide a method and an apparatus for detecting gene mutations, so as to remedy the low throughput and low accuracy of detection in the prior art.
To achieve the above object, according to one aspect of the present invention, a method for detecting gene mutations is provided. The method comprises the following steps: obtaining sequencing data of a sample to be tested and of control samples; determining whether an SNP mutation and/or an InDel mutation exists in the sequencing data of the sample to be tested; and determining whether a deletion mutation exists in the sequencing data of the sample to be tested. The step of determining whether a deletion mutation exists comprises: normalization, in which the sequencing data are divided into windows, the number of sequences of the sample to be tested and of the control samples in each window is counted, and the sequence count of each window is normalized to obtain the normalized sequence count of the sample to be tested and of the control samples in each window; calculation of the standard deviation and median, in which the standard deviation and the median of the normalized sequence counts of the control group samples in each window are calculated; calculation of the deviation, in which the deviation Z value between the normalized sequence count of the sample to be tested and the median of the control samples in each window is calculated according to formula (1); and deletion determination, in which the sample to be tested is judged to have a deletion mutation in a window when the Z value is greater than 3.
$$Z = \frac{\text{normalized sequence count of the sample to be tested} - \text{median}}{\text{standard deviation}} \tag{1}$$
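The deviation calculation and the threshold test described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function name `deletion_z_values`, the list-based data layout and the use of the sample standard deviation are all assumptions, and the comparison `Z > 3` simply follows the wording of the text.

```python
from statistics import median, stdev

def deletion_z_values(test_norm, control_norm, threshold=3.0):
    """Apply formula (1) window by window.

    test_norm    : normalized sequence counts of the sample to be tested,
                   one value per window.
    control_norm : one list per control sample, in the same window order.
    Returns a list of (window_index, z_value, is_flagged) tuples.
    """
    calls = []
    for w, test_value in enumerate(test_norm):
        control_values = [sample[w] for sample in control_norm]
        med = median(control_values)
        # Assumption: sample standard deviation; the text does not say which form is used.
        sd = stdev(control_values)
        if sd == 0:
            # Z is undefined when the controls do not vary; leave the window unflagged.
            calls.append((w, None, False))
            continue
        z = (test_value - med) / sd
        calls.append((w, z, z > threshold))
    return calls
```

For example, with three control samples and two windows, `deletion_z_values([10.0, 30.0], [[9.0, 10.0], [10.0, 10.5], [11.0, 9.5]])` leaves the first window unflagged and flags the second.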
Further, in the normalization step, the sequencing data are divided into contiguous, non-overlapping windows.
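As an illustration of contiguous, non-overlapping windows, the sketch below counts sequences per window over a region. The function name, the half-open coordinate convention and the choice to assign each sequence to a window by its start position are assumptions made for the example; the text does not specify a window size or an assignment rule.

```python
def count_reads_per_window(read_starts, region_start, region_end, window_size):
    """Count sequences in contiguous, non-overlapping windows.

    The region [region_start, region_end) is split into windows of
    window_size bases; the last window may be shorter.
    Assumption: a sequence belongs to the window containing its start position.
    """
    n_windows = -(-(region_end - region_start) // window_size)  # ceiling division
    counts = [0] * n_windows
    for pos in read_starts:
        if region_start <= pos < region_end:
            counts[(pos - region_start) // window_size] += 1
    return counts
```

Because every position in the region maps to exactly one window index, no sequence is counted twice and no position is skipped.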
Further, the normalization step comprises: dividing the sequencing data of the sample to be tested and of each control sample into windows, and recording the number of sequences of each sample in each window as its first sequence count; recording the sum of each sample's first sequence counts as its second sequence count; and normalizing the first sequence counts of the sample to be tested and of each control sample according to formula (2), to obtain the normalized sequence count of the sample to be tested and of each control sample in each window.
$$\text{normalized sequence count} = \frac{\text{first sequence count} \times 1000}{\text{second sequence count}} \tag{2}$$
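Formula (2) amounts to scaling each window's count by the sample's total, so that samples sequenced to different depths become comparable. A minimal sketch, assuming the per-window counts are already available as a list (the function name `normalize_counts` is hypothetical):

```python
def normalize_counts(first_counts, scale=1000):
    """Apply formula (2) to one sample.

    first_counts : sequence count of each window (the "first sequence count").
    The sum over all windows is the "second sequence count".
    """
    second_count = sum(first_counts)
    if second_count == 0:
        raise ValueError("sample has no sequences in any window")
    return [count * scale / second_count for count in first_counts]
```

With counts `[2, 2, 1]` the total is 5, so the normalized counts are `[400.0, 400.0, 200.0]`. A sample with twice the depth, `[4, 4, 2]`, normalizes to the same values.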
Further, the step of being mutated with the presence or absence of SNP mutation and/or InDel in the sequencing data of sample to be tested is judged
Including:Sequence alignment, the sequencing data of sample to be tested and reference gene group is compared and obtains comparison result;Sieve for the first time
Choosing, filters out the site that there is SNP mutation and/or InDel mutation from comparison result, is designated as the first candidate locus;Second
Screening, filters out site of crowd's mutation frequency less than 2% from the first candidate locus, is designated as the second candidate locus;SNP and/
Or InDel mutation judge, according to, to the functional annotation of the second candidate locus, judging the second candidate locus in functional annotation data base
In with the presence or absence of causing the SNP mutation site and/or InDel mutational sites that gene function changes;If existing, by second
Candidate locus are designated as the 3rd candidate locus;And SNP and/or InDel mutation confirm, when there are three candidate locus, by the
Three candidate locus are defined as SNP mutation site and/or InDel mutational sites.
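The staged screening described above can be pictured as three successive filters. The sketch below assumes alignment and annotation have already been done elsewhere and that each candidate carries a population frequency and a functional-annotation flag; the `Variant` record, its field names and the function `screen_variants` are hypothetical and only illustrate the order of the filters.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    site: str                # illustrative site label
    kind: str                # "SNP", "InDel", or anything else from the alignment result
    population_freq: float   # population mutation frequency, as a fraction
    alters_function: bool    # result of looking the site up in an annotation database

def screen_variants(variants, max_population_freq=0.02):
    """Return the third candidate sites after the three screening stages."""
    # First screening: keep sites with an SNP or InDel mutation.
    first = [v for v in variants if v.kind in ("SNP", "InDel")]
    # Second screening: keep sites whose population frequency is below 2%.
    second = [v for v in first if v.population_freq < max_population_freq]
    # Determination: keep sites annotated as changing gene function.
    third = [v for v in second if v.alters_function]
    return third
```

Each stage only narrows the previous list, so a site reaches the confirmation step only if it passes all three criteria.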
Further, before the step of obtaining the sequencing data of the sample to be tested and the control samples, the method also comprises a step of preparing exon libraries for the sample to be tested and for the control samples respectively, in which the exon libraries are prepared by liquid-phase capture.
Further, before the libraries are prepared by liquid-phase capture, the method also comprises a step of designing liquid-phase capture probes according to the exon regions of the target genes.
Further, the exon library preparation step comprises preparing exon libraries for a plurality of target genes, the plurality of target genes including at least the following genes: MLH1, MSH2, MSH3, MSH6, PMS1, PMS2, BUB1, BUB3, STK11, PTEN, SMAD4, APC, MUTYH, EPCAM, SETD2, MAX, TSC2, ATM and FANCC.
According to another aspect of the present invention, an apparatus for detecting deletion mutations is provided. The apparatus comprises: an acquisition module, for obtaining sequencing data of a sample to be tested and of control samples; a first determination module, for determining whether an SNP mutation and/or an InDel mutation exists in the sequencing data of the sample to be tested; and a second determination module, for determining whether a deletion mutation exists in the sequencing data of the sample to be tested. The second determination module comprises: a normalization submodule, for dividing the sequencing data into windows, counting the number of sequences of the sample to be tested and of the control samples in each window, and normalizing the sequence count of each window to obtain the normalized sequence count of the sample to be tested and of the control samples in each window; a first calculation submodule, for calculating the standard deviation and the median of the normalized sequence counts of the control group samples in each window; a second calculation submodule, for calculating, according to formula (1), the deviation Z value between the normalized sequence count of the sample to be tested and the median of the control samples in each window; and a deletion determination submodule, for judging that the sample to be tested has a deletion mutation in a window when the Z value is greater than 3.
$$Z = \frac{\text{normalized sequence count of the sample to be tested} - \text{median}}{\text{standard deviation}} \tag{1}$$
Further, the normalization submodule comprises: a counting unit, for counting the number of sequences of the sample to be tested and of each control sample in each window, recorded as the respective first sequence counts, and for summing each sample's first sequence counts over all windows, recorded as the respective second sequence count; and a calculation unit, for normalizing the first sequence counts of the sample to be tested and of each control sample in each window according to formula (2), to obtain the normalized sequence count of the sample to be tested and of each control sample in each window.
$$\text{normalized sequence count} = \frac{\text{first sequence count} \times 1000}{\text{second sequence count}} \tag{2}$$
Further, the first determination module comprises: a sequence alignment submodule, for aligning the sequencing data of the sample to be tested against a reference genome to obtain an alignment result; a first screening submodule, for selecting from the alignment result the sites having an SNP mutation and/or an InDel mutation, recorded as first candidate sites; a second screening submodule, for selecting from the first candidate sites the sites whose population mutation frequency is below 2%, recorded as second candidate sites; an SNP and/or InDel mutation determination submodule, for determining, according to the functional annotation of the second candidate sites in a functional annotation database, whether the second candidate sites include SNP mutation sites and/or InDel mutation sites that change gene function, and for recording any such second candidate sites as third candidate sites; and an SNP and/or InDel mutation confirmation submodule, for confirming the third candidate sites, when they exist, as SNP mutation sites and/or InDel mutation sites.
Further, upstream of the acquisition module, the apparatus also comprises an exon library preparation module, for preparing the exon libraries of the sample to be tested and the control samples by liquid-phase capture.
Further, upstream of the exon library preparation module, the apparatus also comprises a probe design module, for designing liquid-phase capture probes according to the exon regions of the target genes.
By applying the technical solution of the present invention, SNP and/or InDel mutation determination and deletion mutation determination are each performed on the sequencing data of the sample to be tested and the control samples, so that the sites of all of the above mutation types present in the sample to be tested can be detected in a single assay. Moreover, in the deletion mutation determination, whether a deletion exists in a given window is judged by a statistical method based on how far the normalized sequence count of that window deviates from the median of the control samples in the same window. Compared with a statistical method based on deviation from the mean of the normalized sequence counts, this approach is statistically more valid and more accurate, and makes false positives easier to distinguish.
Description of the drawings
The accompanying drawings, which form a part of this application, are provided for a further understanding of the present invention. The illustrative embodiments of the invention and their description serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 shows the deletion mutation results in a specific embodiment of the present invention.
Specific embodiment
It should be noted that, provided they do not conflict, the embodiments in this application and the features in those embodiments may be combined with one another. The present invention is described in detail below with reference to the drawings and in conjunction with the embodiments.
It should first be explained that high-throughput sequencing can detect the various mutations that may be present in a sample, including InDels, SNPs and large-fragment deletions. The deletion mutations detected by the method and device of the present invention are inferred mainly from a statistical standpoint: they indicate the gene mutation sites that may be present in the test sample and their specific mutation types. Whether such sites are directly or indirectly related to disease must be verified by multiple other test results. The method and device are therefore suitable only for scientific and basic academic research, and are not suitable for the clinical diagnosis of disease.
As mentioned in the background section, when gene mutations are detected by high-throughput sequencing in the prior art, mutation sites and all possible mutation types cannot be detected accurately in batches. To remedy this defect, in a typical embodiment of the present invention, as shown in Fig. 1, a method for detecting gene mutations is provided. The method includes the following steps: obtaining the sequencing data of the test sample and the control samples; judging whether an SNP mutation and/or InDel mutation is present in the data to be tested; and judging whether a deletion mutation is present in the data to be tested. The step of judging whether a deletion mutation is present includes:
Homogenization processing: dividing the sequencing data into windows, counting the sequences of the test sample and of the control samples in each window, and homogenizing the sequence count of each window to obtain the homogenized sequence counts of the test sample and the control samples in each window;
Standard deviation and median calculation: calculating the standard deviation and the median of the homogenized sequence counts of the control group samples in each window;
Deviation calculation: calculating, for each window according to formula (1), the deviation Z of the homogenized sequence count of the test sample from the median of the control samples:
Z = (homogenized sequence count of the test sample - median) / standard deviation (1)
Deletion judgment: when the Z value is greater than 3, judging that the window carries a deletion mutation.
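The deviation calculation and the deletion judgment can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names are hypothetical, the sample standard deviation is assumed (the text does not say which estimator is used), and the threshold is applied literally as "Z greater than 3" because the text does not state whether the sign or the magnitude of Z is intended.

```python
from statistics import median, stdev


def deviation_z(test_value, control_values):
    """Formula (1): Z = (test homogenized count - control median) / control SD."""
    med = median(control_values)
    sd = stdev(control_values)  # assumption: sample standard deviation
    if sd == 0:
        raise ValueError("control standard deviation is zero; Z is undefined")
    return (test_value - med) / sd


def call_windows(test_values, controls_by_window, threshold=3.0):
    """Return the indices of windows whose Z value exceeds the threshold.

    test_values[i] is the homogenized count of the test sample in window i;
    controls_by_window[i] holds the homogenized counts of all control samples
    in the same window.
    """
    flagged = []
    for index, (value, controls) in enumerate(zip(test_values, controls_by_window)):
        if deviation_z(value, controls) > threshold:  # literal reading of the text
            flagged.append(index)
    return flagged
```

Each window is judged independently, so the same routine applies whatever window length is chosen.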
By performing SNP and/or InDel mutation judgment and deletion mutation judgment on the sequencing data of the test sample and the control samples, the above method of the present invention can detect all sites in the test sample that carry any of these mutation types. Moreover, in the deletion mutation judgment, whether a window carries a deletion is decided from the deviation of the homogenized sequence count of that window from the median of the control samples in the same window. Compared with measuring the deviation from the mean of the homogenized sequence counts, this is statistically more valid and more accurate, and makes false positives easier to identify.
In the homogenization step above, sequences are counted per window. The window size can thus be chosen flexibly according to the sequencing depth of the data and the size of the target deletion fragment, which widens the size range of the deletions that can be detected. When deciding whether a window carries a deletion mutation, the difference between the homogenized sequence count of the test sample in the window and the median of the control samples in that window is calculated first; the window is then judged by whether the ratio of this difference to the standard deviation of the control samples' homogenized sequence counts in that window is greater than 3. This method takes the median of a group of control samples as the reference for comparison. Unlike the mean, the median is not easily affected by individual abnormally fluctuating homogenized sequence counts, so the judgment is more accurate.
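The robustness argument can be checked with a small numeric sketch. The control values below are invented purely for illustration, and the helper name is hypothetical; the point is only that one abnormal control value moves the mean far more than it moves the median.

```python
from statistics import mean, median


def reference_shift(control_values, outlier):
    """Return (median shift, mean shift) caused by adding one outlying control value."""
    extended = list(control_values) + [outlier]
    median_shift = abs(median(extended) - median(control_values))
    mean_shift = abs(mean(extended) - mean(control_values))
    return median_shift, mean_shift


# Invented homogenized counts of five control samples in one window.
controls = [10.0, 10.5, 9.8, 10.2, 10.1]
median_shift, mean_shift = reference_shift(controls, 40.0)
print(f"median shift: {median_shift:.2f}, mean shift: {mean_shift:.2f}")
```

Because the reference in formula (1) is the median, a single aberrant control sample changes the Z values of the test sample only slightly.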
In the above method of the present invention, the way the sequencing data are divided into windows can be set by weighing detection sensitivity against detection accuracy. In the present invention, the windows are preferably contiguous, non-overlapping windows: a longer exon is divided into contiguous, non-overlapping windows, whereas a shorter exon is placed whole in a single window. When the window is set to a smaller value, smaller deletion mutations are easier to find, but the sequence counts in the same window vary more between samples and are harder to compare. When the window is set to a larger value, the sequence counts in the same window vary less between samples, but smaller deletion mutations cannot be found.
In the above embodiment, the window length is set according to the sequencing depth of the test sample and the control samples and the length of the deletion mutation expected to be detected. Because the lengths of all genes and of their exons are known in this sequencing assay, the window size can be set according to the mutation length expected to be detected: a smaller value if the mutant fragments of interest are small, and a larger value otherwise. The shorter the window, the higher the detection sensitivity and, correspondingly, the lower the accuracy; the longer the window, the higher the accuracy and the lower the sensitivity. In a preferred embodiment of the present invention, when the sequencing depth is greater than or equal to 300×, the length of each window is 50 to 160 bp. At a sequencing depth of 300× or more, keeping the window length within 50 to 100 bp gives a better balance between detection sensitivity and detection accuracy.
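The window division described above can be sketched as follows. This is only an illustration under assumptions: coordinates are treated as half-open intervals, the function name is hypothetical, the 100 bp default is an arbitrary value inside the 50 to 160 bp range given above, and the handling of a final remainder shorter than the window length (kept as its own window here) is not specified in the text.

```python
def split_exon(start, end, window_len=100):
    """Divide the half-open interval [start, end) into contiguous, non-overlapping windows.

    An exon no longer than window_len is returned whole as a single window.
    """
    if end <= start:
        raise ValueError("exon end must be greater than exon start")
    if window_len <= 0:
        raise ValueError("window length must be positive")
    if end - start <= window_len:
        return [(start, end)]
    windows = []
    position = start
    while position < end:
        # Assumption: a short final remainder becomes its own window.
        windows.append((position, min(position + window_len, end)))
        position += window_len
    return windows
```

Choosing a smaller `window_len` yields more, shorter windows and so favors sensitivity; a larger value favors accuracy, as discussed above.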
In the above method of the present invention, the purpose of the homogenization step is mainly to normalize the sequence count of each window, so that differences in sequencing depth between the test sample and the control samples do not bias the comparison of their per-window sequence counts. Any operation in the art for homogenizing data is therefore applicable to the present invention. In the present invention, preferably, the sequencing data of the test sample and of each control sample are divided into windows; the sequence count of each window is recorded as the first sequence count, and the sum of all first sequence counts of a sample is recorded as the second sequence count. The sequence counts of the test sample and the control samples are then homogenized according to formula (2) to obtain their homogenized sequence counts in each window:
Homogenized sequence count = first sequence count × 1000 / second sequence count (2)
In the above embodiment, homogenizing with formula (2) more effectively removes the bias in sequence counts that differences in sequencing depth introduce between samples, so that the per-window sequence counts of the samples are relatively uniform.
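Formula (2) can be sketched in a few lines. The function name is hypothetical and the input is assumed to be the list of per-window sequence counts of a single sample; the constant 1000 is taken from the formula.

```python
def homogenize(window_counts, scale=1000):
    """Formula (2): homogenized count = first sequence count * scale / second sequence count.

    window_counts holds the first sequence counts (one per window) of one sample;
    their sum is the second sequence count.
    """
    second_count = sum(window_counts)
    if second_count == 0:
        raise ValueError("sample has no sequences in any window")
    return [count * scale / second_count for count in window_counts]


# Two samples with the same profile but a two-fold difference in depth
# end up with identical homogenized counts.
shallow = homogenize([10, 30, 60])
deep = homogenize([20, 60, 120])
```

After this step the values of each sample sum to the scale constant, which is what makes windows comparable across samples sequenced to different depths.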
In the above method, the step of judging whether an SNP mutation and/or InDel mutation is present in the sequencing data of the test sample determines whether the test sample carries an SNP mutation or an InDel mutation, whether both are present at once, or whether different sites of interest in the test sample carry one or both of these mutation types. Judgment methods conventional in the art may therefore be used.
In a preferred embodiment of the present invention, the step of judging whether an SNP mutation and/or InDel mutation is present in the sequencing data of the test sample includes: a sequence alignment step, in which the sequencing data are aligned against a reference genome to obtain an alignment result; a first screening step, in which the sites carrying SNP and/or InDel mutations are screened out from the alignment result and recorded as first candidate sites; a second screening step, in which the sites whose population mutation frequency is below 2% are screened out from the first candidate sites and recorded as second candidate sites; an SNP and/or InDel mutation judging step, in which, according to the functional annotation of the second candidate sites in a functional annotation database, it is judged whether the second candidate sites include SNP and/or InDel mutation sites that alter gene function and, if so, those sites are recorded as third candidate sites; and an SNP and/or InDel mutation confirmation step, in which the third candidate sites, when present, are determined to be SNP mutation sites and/or InDel mutation sites.
In the above preferred embodiment, sequence alignment can use software such as SOAP (http://soap.genomics.org.cn/) to map the sequenced reads to the corresponding positions of the reference genome, which yields the SNP and/or InDel sites that differ from the reference genome. In practice, the coverage depth (the number of times a site was sequenced) of each SNP and/or InDel site is also counted, and sites with a coverage depth below 30 are removed to ensure the accuracy of the mutation sites found. Then, among the remaining SNP and/or InDel sites, the functional annotation of each site in the functional annotation database is used to determine whether any site affects gene function; if one or more such sites exist, they are confirmed as carrying SNP and/or InDel mutations. In addition, depending on the function of the gene of interest, other auxiliary methods may be used to exclude or confirm a site as a mutation site that causes dysfunction. For example, to confirm whether such a site is disease-related, besides judging dysfunction from existing database information, one can take the SNP and/or InDel mutation site information of disease samples and normal control samples, select the sites with a population frequency below 2%, and predict their effect on protein function with the SIFT software; sites that alter protein function are taken as candidate pathogenic sites for the disease.
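The successive screening steps can be sketched as a simple filter chain. This is a hypothetical illustration: the `Variant` record and its field names are invented here, the alignment, population-frequency lookup and functional annotation are assumed to have been done upstream, and only the thresholds (coverage depth of at least 30, population frequency below 2%) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one candidate site; all field names are assumptions."""
    site: str                    # for example "chr3:37034946"
    kind: str                    # "SNP" or "InDel"
    depth: int                   # coverage depth at the site
    population_frequency: float  # mutation frequency in the population, 0..1
    alters_function: bool        # taken from a functional annotation database


def select_mutation_sites(variants, min_depth=30, max_frequency=0.02):
    """Apply the depth, population-frequency and functional-annotation filters in turn."""
    first = [v for v in variants if v.depth >= min_depth]                     # first candidates
    second = [v for v in first if v.population_frequency < max_frequency]     # second candidates
    third = [v for v in second if v.alters_function]                          # third candidates
    return third
```

Keeping the three stages as separate lists mirrors the first, second and third candidate sites of the description and makes it easy to report how many sites each filter removes.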
In the above method of the present invention, before step S1, the method further includes a step of preparing exon libraries for the test sample and the control samples respectively, in which the exon libraries are prepared by liquid-phase capture. Preparing exon libraries by liquid-phase capture gives a higher capture efficiency and saves a great deal of time.
In the above method of the present invention, before the libraries are prepared by liquid-phase capture, the method further includes a step of designing liquid-phase capture probes according to the target exon regions. The liquid-phase capture probes can be designed by probe design methods conventional in the art, for example by ordering custom liquid-phase probes according to the official manual of Agilent or the official manual of NimbleGen.
In the exon library preparation step, depending on the research purpose, exon libraries may be prepared for multiple genes of interest in one test sample, or for one or more genes in each of multiple test samples. In a preferred embodiment of the present invention, when exon libraries are prepared for multiple genes, the genes include at least the following: MLH1, MSH2, MSH3, MSH6, PMS1, PMS2, BUB1, BUB3, STK11, PTEN, SMAD4, APC, MUTYH, EPCAM, SETD2, MAX, TSC2, ATM and FANCC. When the sequencing data cover these genes, the mutation sites that may be present in them and their mutation types can be detected simultaneously by the above method.
These genes are all known. In the present invention, the inventors provide, for the first time, a method for detecting at least these 19 genes together, so that the mutation sites that may be present in these genes and their mutation types can be detected in a single run on the same test sample.
In a specific embodiment of the present invention, the step of preparing the exon libraries of the test sample and the control samples includes: fragmenting the genomic DNA of the test sample and the control samples to obtain fragmented DNA; end-repairing and A-tailing the fragmented DNA to obtain repaired DNA with an "A" at the 3' end; ligating adapters to the repaired DNA to obtain adapter-ligated DNA; amplifying the adapter-ligated DNA by PCR to obtain amplified DNA; and hybridizing liquid-phase capture probes with the amplified DNA to obtain the exon libraries of the test sample and the control samples. Because the exon library preparation uses liquid-phase capture to obtain a sequencing library containing the exon regions of the target genes, the exons are obtained efficiently and quickly.
In the above method of the present invention, after the exon libraries of the test sample and the control samples are obtained and before the exon libraries are sequenced, the method further includes a step of denaturing the exon libraries. The purpose of the denaturation is to prepare the libraries for high-throughput sequencing.
In another typical embodiment of the present invention, a device for gene mutation detection is provided. The device includes: an acquisition module, for obtaining the sequencing data of the test sample and the control samples; a first judging module, for judging whether an SNP mutation and/or InDel mutation is present in the data to be tested; and a second judging module, for judging whether a deletion mutation is present in the data to be tested. The second judging module includes: a homogenization submodule, for dividing the sequencing data into windows, counting the sequences of the test sample and of the control samples in each window, and homogenizing the sequence count of each window to obtain the homogenized sequence counts of the test sample and the control samples in each window; a first calculation submodule, for calculating the standard deviation and the median of the homogenized sequence counts of the control group samples in each window; a second calculation submodule, for calculating, for each window according to formula (1), the deviation Z of the homogenized sequence count of the test sample from the median of the control samples:
Z = (homogenized sequence count of the test sample - median) / standard deviation (1)
and a deletion judging submodule, for judging that a window carries a deletion mutation when its Z value is greater than 3.
The above device of the present invention obtains the sequencing data of the test sample and the control samples through the acquisition module, judges with the first judging module whether an SNP mutation and/or InDel mutation is present in the data to be tested, and judges with the second judging module whether a deletion mutation is present in the data to be tested. Within the second judging module, the homogenization submodule divides the sequencing data into windows, counts the sequences of the test sample and of the control samples in each window, and homogenizes the sequence count of each window to obtain the homogenized sequence counts of the test sample and the control samples in each window. The first calculation submodule then calculates the standard deviation and the median of the homogenized sequence counts of the control group samples in each window. The second calculation submodule calculates, for each window according to formula (1), the deviation Z of the homogenized sequence count of the test sample from the median of the control samples:
Z = (homogenized sequence count of the test sample - median) / standard deviation (1)
Finally, the deletion judging submodule judges that a window carries a deletion mutation when its Z value is greater than 3.
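The division of the second judging module into submodules can be pictured with the following sketch. It is a hypothetical arrangement, not the device's actual implementation: the class and method names are invented, each method stands in for one submodule, the sample standard deviation is assumed, and the cut-off is applied literally as "Z greater than 3".

```python
from statistics import median, stdev


class SecondJudgingModule:
    """Hypothetical sketch: one method per submodule of the second judging module."""

    def __init__(self, scale=1000, threshold=3.0):
        self.scale = scale          # constant from formula (2)
        self.threshold = threshold  # Z cut-off given in the description

    def homogenize(self, window_counts):
        # Homogenization submodule: formula (2) applied to one sample.
        total = sum(window_counts)
        if total == 0:
            raise ValueError("sample has no sequences")
        return [count * self.scale / total for count in window_counts]

    def control_stats(self, control_samples):
        # First calculation submodule: per-window (median, standard deviation)
        # over the homogenized control samples (at least two are required).
        homogenized = [self.homogenize(sample) for sample in control_samples]
        return [(median(w), stdev(w)) for w in zip(*homogenized)]

    def z_values(self, test_counts, control_samples):
        # Second calculation submodule: formula (1) for every window.
        test = self.homogenize(test_counts)
        values = []
        for value, (med, sd) in zip(test, self.control_stats(control_samples)):
            if sd == 0:
                raise ValueError("zero standard deviation in a control window")
            values.append((value - med) / sd)
        return values

    def judge(self, test_counts, control_samples):
        # Deletion judging submodule: literal comparison against the threshold.
        return [z > self.threshold for z in self.z_values(test_counts, control_samples)]
```

A call such as `SecondJudgingModule().judge(test_counts, control_samples)` takes raw per-window counts (one list per sample) and returns one boolean per window.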
By using the homogenization submodule, the above device allows the window size to be chosen flexibly according to the sequencing depth of the data and the size of the target deletion fragment, which widens the size range of the deletions that can be detected. Moreover, when deciding whether a window carries a deletion mutation, the second judging module uses the median of the control samples as the reference against which the deviation of the test sample's homogenized sequence count in each window is measured. Compared with using the mean as the reference, the median is not easily affected by individual abnormal sequence counts, so false positives are easier to identify and the result is more accurate.
In the above device of the present invention, the homogenization submodule may be obtained by suitably adapting a homogenization submodule commonly used in the art; any homogenization submodule capable of normalizing the per-window sequence counts is suitable for the present invention. In a preferred embodiment, the homogenization submodule further includes: a counting unit, for counting the sequences of the test sample and of the control samples in each window, recorded as the respective first sequence counts, and for summing the first sequence counts over all windows of each sample, recorded as the respective second sequence count; and a calculation unit, for homogenizing the first sequence counts of the test sample and the control samples in each window according to formula (2) to obtain their homogenized sequence counts in each window:
Homogenized sequence count = first sequence count × 1000 / second sequence count (2)
In this embodiment, the homogenization submodule effectively reduces the influence of differences in sequencing depth between samples on the result.
In the above device of the present invention, the first judging module judges whether the test sample carries an SNP mutation or an InDel mutation, whether both are present at once, or whether different sites of interest in the test sample carry one or both of these mutation types. A judging module conventional in the art may therefore be used.
In a preferred embodiment of the present invention, the first judging module includes: a sequence alignment submodule, for aligning the sequencing data against a reference genome to obtain an alignment result; a first screening submodule, for screening out from the alignment result the sites carrying SNP and/or InDel mutations, recorded as first candidate sites; a second screening submodule, for screening out from the first candidate sites the sites whose population mutation frequency is below 2%, recorded as second candidate sites; an SNP and/or InDel mutation judging submodule, for judging, according to the functional annotation of the second candidate sites in a functional annotation database, whether the second candidate sites include SNP and/or InDel mutation sites that alter gene function and, if so, recording those sites as third candidate sites; and an SNP and/or InDel mutation confirmation submodule, for determining the third candidate sites, when present, to be SNP mutation sites and/or InDel mutation sites.
In the above preferred embodiment, the sequence alignment submodule can use an alignment module such as SOAP (http://soap.genomics.org.cn/). Depending on the sequencing quality of the actual data, the first screening submodule may also include a screening submodule that removes sites with a coverage depth below 30. The second screening submodule screens the first candidate sites for sites whose population mutation frequency, as recorded in existing databases, is below 2%. The resulting second candidate sites all have a population mutation frequency below 2%, which suggests that they are probably not high-frequency mutations reflecting individual variation and may instead be disease-related. The SNP and/or InDel mutation judging submodule then judges, according to the annotation of gene function in known databases, whether the mutation at a given site alters gene function. If such sites exist, the SNP and/or InDel mutation confirmation submodule determines the sites that alter gene function to be SNP mutation sites and/or InDel mutation sites.
The databases used above to annotate gene function include, but are not limited to, dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/), HGMD (www.hgmd.cf.ac.uk), ClinVar (http://www.ncbi.nlm.nih.gov/clinvar/) and LOVInSiGHT (http://insight-group.org/lovd.html).
In the above device, upstream of the acquisition module, the device also includes an exon library preparation module, for preparing the exon libraries of the test sample and the control samples by liquid-phase capture. Because the exon library preparation module uses liquid-phase capture to obtain a sequencing library containing the exon regions of the target genes, the exons in the library are captured efficiently and quickly.
In the above device, upstream of the exon library preparation module, the device also includes a probe design module, for designing liquid-phase capture probes according to the target exon regions. The design principle of the probe design module is to design small fragments complementary to the target region so as to capture the target region sequences. A probe design module commonly used in the art can be adopted, for example ordering custom liquid-phase probes according to the official manual of Agilent or the official manual of NimbleGen.
The beneficial effects of the present invention are further illustrated below with reference to a specific example.
It should be noted that the following example describes the method of the present invention in detail using the 19 genes listed in Table 1. Unless otherwise indicated, the reagents, chemicals and instruments used are all from Agilent (USA). In this example, 96 possible carriers of gene mutations and 10 normal individuals were recruited and signed written informed consent; the mutant genes possibly present in the carriers and their specific mutation types were then detected. Buccal swab samples were extracted according to the buccal swab extraction method, chip preparation and hybridization were performed according to the Agilent instructions, and sequencing was performed according to the Illumina instructions. The specific steps are as follows:
Table 1:
| MLH1 | MSH2 | MSH3 | MSH6 | PMS1 | PMS2 |
| BUB1 | BUB3 | STK11 | PTEN | SMAD4 | APC |
| MUTYH | EPCAM | SETD2 | MAX | TSC2 | ATM |
| FANCC |
Test 1: Chip design
The reference sequences are the exon sequences of the above 19 genes in NCBI build 37/hg19 (from www.ncbi.nlm.nih.gov), together with the 10 bp flanking each exon on either side. The design was completed by Agilent (USA).
Test 2: DNA extraction
1) Sample handling: transfer the cotton swab that has been wiped across the inside of the cheek into a 2 ml centrifuge tube and cut off the cotton tip with scissors.
2) Add cell lysis buffer and proteinase K, and incubate at 56 °C for 60 min to digest the cells.
3) Add buffer, incubate at 70 °C for 10 min, squeeze out and discard the cotton swab, and transfer the lysate to a new centrifuge tube.
4) Add absolute ethanol to precipitate the DNA.
5) Load the solution onto the adsorption column, centrifuge, and discard the waste liquid in the collection tube.
6) Add buffer, centrifuge, and discard the waste liquid in the collection tube.
7) Add wash buffer, centrifuge, and discard the waste liquid in the collection tube. Repeat once.
8) Dry the column matrix.
9) Add elution buffer to the adsorption column to elute the DNA.
10) Collect the DNA by centrifugation, repeat the elution once, and store the DNA product at -20 °C.
Test 3: Library preparation
Step 1: DNA fragmentation
1) gDNA quality control: make sure the DNA is of acceptable quality (not degraded; A260/A280 between 1.8 and 2.0). Measure the concentration of the sample gDNA with Qubit.
2) Set up the Covaris according to the parameters in Table 2. The specific operations are as follows:
Table 2:
| Setting | Value |
| Duty factor | 10% |
| Peak incident power (PIP) | 175 |
| Cycles per burst | 200 |
| Treatment time | 360 sec |
| Temperature | 4 °C to 8 °C |
a. Add deionized water to the Covaris water bath until the water level reaches mark "12";
b. Check that the water level covers the glass part of the sample tube;
c. Set the chiller temperature to 2-5 °C and cool the bath to 5 °C;
d. Optionally, add ethylene glycol to 20% of the total volume to prevent freezing;
e. Press the "Degas" button on the panel and run "Degas" for at least 30 min before use.
3) In a 1.5 ml EP tube, dilute 3 ug gDNA to 130 ul with 1X Low TE Buffer;
4) Load a Covaris microTube onto the Covaris;
5) Carefully draw up the 130 ul DNA sample with a tapered pipette tip and add it to the Covaris microTube. (Operate carefully; do not let bubbles form at the bottom of the tube.)
6) Fragment the DNA with the Covaris parameters set in Table 2; the main peak of the fragmented product is at 150-200 bp.
7) Carefully transfer the fragmented DNA sample with a tapered pipette tip into a new 1.5 ml EP tube.
Step 2: Purify the DNA sample with Agencourt AMPure XP magnetic beads
1) Leave the AMPure XP beads at room temperature for at least 30 min, then mix the AMPure XP bead suspension thoroughly until its color is uniform (do not freeze).
2) In a new 1.5 ml tube, combine 180 ul of the mixed AMPure XP bead suspension with the fragmented DNA library (~130 ul). Vortex to mix and leave at room temperature for 5 min.
3) Place the tube on the magnetic rack and let it stand for about 3-5 min until the solution clears.
4) With the tube on the magnetic rack, carefully remove the supernatant; the pipette tip must not touch the beads.
5) With the tube on the magnetic rack, add 500 ul of 70% ethanol to each tube. Freshly prepared ethanol gives better results.
6) Let the tube stand for 1 min so that the beads settle, then remove the ethanol.
7) Repeat steps 5) and 6) once.
8) Heat on a heat block at 37 °C for 5 min, or until the residual ethanol in the tube has evaporated completely.
Note: Do not heat until the bead surface cracks. Over-dried beads markedly reduce the elution efficiency.
9) Add 50 ul of RNase-free water, mix on a vortex mixer, and leave at room temperature for 2 min.
10) Place the EP tube on the magnetic rack and let it stand for about 2-3 min until the solution clears.
11) Transfer about 50 ul of supernatant to a new 1.5 ml tube. The beads can be discarded after this step. If the next step is not carried out immediately, store the sample in a -20 °C freezer.
Step 3: End repair
1) Prepare the reaction mix on ice using the SureSelect Library Prep Kit, ILM.
2) Prepare the reaction mix in PCR tubes (or strip tubes, or a PCR plate) according to the formulation in Table 3, and mix.
3) Add 52 ul of reaction mix to each PCR tube (or well).
4) Add 48 ul of DNA sample to each PCR tube (or well) and mix by pipetting.
5) Place in the PCR instrument and incubate at 20 °C for 30 min; do not use the heated lid.
Table 3:
Step 4: Purify the DNA sample with Agencourt AMPure XP magnetic beads (same procedure as Step 2)
Step 5: A-tailing of the DNA fragment ends
1) Prepare the reaction mix on ice using the SureSelect Library Prep Kit, ILM., according to the formulation in Table 4.
Table 4:
2) Place in the PCR instrument and incubate at 37 °C for 30 min. If the heated lid is used, make sure the lid temperature is below 50 °C.
Step 6: Purify the DNA sample with Agencourt AMPure XP magnetic beads (same purification procedure as Step 2)
Step 7: Ligate the adapters carrying special tags
1) Prepare the adapter ligation reaction mix according to the formulation in Table 5;
Table 5:
2) Place in the PCR instrument and incubate at 20 °C for 15 min. Do not use the heated lid. If the next step is not carried out immediately, store the sample in a -20 °C freezer.
Step 8: Purify the DNA sample with Agencourt AMPure XP magnetic beads (same as Step 2)
Step 9: Amplify the adapter-ligated library
1) Use only one third of the adapter-ligated library for amplification; store the remaining sample in a -20 °C freezer.
2) Prepare the PCR reaction mix according to the formulation in Table 6:
Table 6:
Note: The amount of DNA library added may also be 250 ng (quantified with a Bioanalyzer DNA 1000 chip).
3) Place in the PCR instrument, set the PCR program according to Table 7, and run the reaction.
Table 7:
Step 10: Purify the DNA sample with Agencourt AMPure XP magnetic beads (same as Step 2)
Test the capture of four liquid phases
Step one: Library hybridization
This part comprises the following steps: the prepared library is hybridized with the hybridization reagent, the blocking reagent and the SureSelect capture probe library. Each DNA library must be hybridized and captured separately, and the index is then introduced by a PCR reaction.
Each library undergoes one hybridization and one capture; samples must not be pooled at this step. Hybridization requires 750 ng of starting DNA in a maximum volume of 3.4 ul. The procedure is as follows:
1) Prepare the hybridization buffer at room temperature according to the formula in Table 8 below.
Table 8:
2) Prepare the SureSelect capture library mix (Capture library mix) for target capture in a PCR plate; keep the tubes on ice. For each sample, add the appropriate amount of SureSelect Capture Library according to the size (Mb) of the target region, in the proportions shown in Table 9 below, and dilute the SureSelect RNase Block with RNase-free water as specified in Table 9. Prepare enough diluent for all sample reactions according to Table 9, leaving some excess. Add the SureSelect RNase Block diluent as specified in Table 9 and mix by pipetting.
Table 9:
3) Prepare enough SureSelect Block Mix for all sample reactions according to Table 10.
Table 10:
4) In another PCR plate, process the prepared library for target capture.
A. Designate two rows, A and B; to each well of row B, add 3.4 ul of library at 221 ng/ul.
B. To each well of row B, add 5.6 ul of SureSelect Block Mix. Mix by pipetting up and down.
C. Seal the well of each sample with a cap and place the plate in the PCR instrument.
D. Run the program in Table 11;
Table 11:
| Step | Temperature | Time |
|---|---|---|
| 1 | 95 °C | 5 min |
| 2 | 65 °C | Hold |
5) During the 65 °C incubation, use the heated lid at 105 °C.
6) Keeping the PCR plate at 65 °C, add 40 ul of hybridization buffer to each well of row A of the 96-well plate; the number of wells filled equals the number of libraries in row B of the plate. Note: Before proceeding to step 10, make sure the PCR plate has been incubated at 65 °C for at least 5 min.
7) Add the capture library mix prepared in step 2 to the PCR plate:
A. Keeping the PCR plate at 65 °C, add 7 ul of capture library mix to the wells of row C of the above 96-well plate.
B. Seal the wells with strip caps, making sure the seal is tight.
C. Incubate at 65 °C for 2 min.
8) Keeping the PCR plate at 65 °C, use a multichannel pipette to transfer 13 ul of hybridization buffer from row A into the capture library mix in row C.
9) Keeping the PCR plate at 65 °C, use a multichannel pipette to transfer the entire library mixture from row B into the hybridization solution in row C. Pipette slowly up and down 8-10 times to mix thoroughly. The hybridization mixture now has a volume of about 27-29 ul, depending on how much volume was lost to evaporation during the preceding incubation.
10) Seal with strip caps or double adhesive film, making sure all wells are tightly sealed.
Note: Use new strip caps or sealing film; used ones lose integrity during heating. If strip tubes are used, check evaporation in a preliminary test before first use to make sure the evaporated volume does not exceed 3-4 ul.
11) Incubate the hybridization mixture at 65 °C for 24 h with the heated lid at 105 °C.
Step 2: Prepare the magnetic beads
This step uses reagents from SureSelect Target Enrichment Kit Box#1: SureSelect Binding Buffer and SureSelect Wash 2.
1) Preheat SureSelect Wash 2 to 65 °C in a water bath or heat block for use in Step 3.
2) The magnetic beads settle during storage; vortex vigorously to resuspend the Dynabeads MyOne Streptavidin T1.
3) For each hybridization, transfer 50 ul of Dynabeads MyOne Streptavidin T1 to a 1.5 ml centrifuge tube.
4) Wash the magnetic beads:
A. Add 200 ul of SureSelect Binding Buffer and vortex for 5 s.
B. Place the tube on a magnetic rack and remove the supernatant once the solution has cleared.
C. Repeat steps a-b twice, for 3 washes in total.
5) Resuspend the magnetic beads in 200 ul of SureSelect Binding Buffer.
Step 3: Capture and elution
This step uses reagents from SureSelect Target Enrichment Kit Box#1: SureSelect Wash 1 and SureSelect Wash 2.
1) After the 24-hour incubation, estimate (with a pipette) and record the volume of the remaining hybridization mixture.
2) Keeping the PCR plate at 65 °C, add the hybridization mixture directly to the bead solution and invert 3-5 times to mix.
Note: If excessive evaporation has occurred after 24 h of hybridization and the remaining volume is less than 20 ul, subsequent capture performance will be affected.
3) Place the mixture on a nutator (wobbler) and mix at room temperature for 30 min.
4) Centrifuge briefly.
5) Place the tube on a magnetic rack, let stand until the solution clears, and remove the supernatant.
6) Add 500 ul of SureSelect Wash 1 and vortex for 5 s to resuspend the beads.
7) Leave at room temperature for 15 min, vortexing several times during this period.
8) Centrifuge briefly.
9) Place the tube on a magnetic rack, let stand until the solution clears, and remove the supernatant.
10) Wash the beads:
A. Add 500 ul of SureSelect Wash 2 preheated to 65 °C and vortex for 5 s to resuspend the beads.
B. Incubate at 65 °C for 10 min in a water bath or heat block, vortexing several times during this period.
C. If the beads have settled, invert the tube several times to resuspend them.
D. Centrifuge briefly.
E. Place the tube on a magnetic rack, let stand until the solution clears, and remove the supernatant.
F. Repeat steps a-e twice, for 3 washes in total. Make sure all of the wash buffer is removed.
G. Add 30 ul of nuclease-free water and vortex for 5 s to resuspend the beads.
Experiment five: Post-hybridization PCR amplification and index introduction
This part comprises the following experimental steps: index introduction by PCR amplification, PCR product purification, and library quality control.
Step one: Index introduction by PCR amplification
Reagents used in this step:
·Herculase II Fusion DNA Polymerase(Agilent)
·SureSelect Target Enrichment Kit ILM Indexing Hyb Module Box#2
·SureSelect Library Prep Kit,ILM
Note: Do not use PCR enzymes other than Herculase II Fusion DNA Polymerase; the performance of other enzymes has not been verified.
1) Each hybridization gets one PCR reaction, plus one negative control (no template).
2) Place the samples on ice and proceed as follows:
A. Prepare the reaction mix according to the formula in Table 12 below and mix;
B. Add 35 ul of reaction mix to each PCR tube (or well).
C. Take PCR Primer Index 1 through Index 16 (clear caps) from the "SureSelect Library Prep Kit, ILM" and add 1 ul of the appropriate index to each well; mix by pipetting. Use a different index primer for each sample to be sequenced on the same lane.
E. Pipette each DNA sample to make sure the bead solution is homogeneous.
F. Transfer 14 ul of each sample to the corresponding PCR tube (or well) and pipette up and down to mix.
Table 12: Herculase II Master Mix formula
* Taken from the Herculase II Fusion DNA Polymerase (Agilent) kit. Do not use the buffer or dNTP mix from other kits.
a Taken from the kit: SureSelect Target Enrichment Kit ILM Indexing Hyb Module Box#2.
b Use 1 of the 16 primers in the SureSelect Library Prep Kit, ILM.
3) Place the PCR tubes in the PCR instrument for amplification; the amplification program is given in Table 13 below:
Table 13:
Step 2: Purify the DNA sample with Agencourt AMPure XP magnetic beads (same as Step 2 of Experiment three)
Experiment six: High-throughput sequencing
Step one: Library dilution and denaturation
1) Prepare 0.2 N NaOH for denaturation: add 200 μL of 1.0 N NaOH to 800 μL of pure water to make a 0.2 N NaOH solution.
2) Dilute each library to 2 nM and pool according to the amount of data required from each library, giving a library dilution at 2 nM.
3) Take 10 μL of the 2 nM library dilution and add an equal volume (10 μL) of 0.2 N NaOH; pipette 3 times to mix, then start a 5 min timer. Mix during this time: vortex for 10 s, centrifuge, and repeat the vortex and centrifuge cycle twice.
4) After 5 min of denaturation, add 970 μL of HT1 to the denatured library and vortex to mix the library solution.
5) Centrifuge at 280 × g for 1 min to obtain the 20 pM denatured library.
6) Dilute the 20 pM denatured library to 3 pM for loading: add 450 μL of the denatured library solution to 2550 μL of pre-chilled HT1, invert several times to mix, and centrifuge to obtain 3 mL of 3 pM denatured library.
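The dilution volumes above can be checked with the usual C1·V1 = C2·V2 relation. The following is a minimal sketch for checking the arithmetic only; the function name `dilute` and the variable names are illustrative assumptions and do not come from the kit documentation.

```python
def dilute(conc: float, sample_vol: float, diluent_vol: float) -> float:
    """Concentration after mixing sample_vol at conc with diluent_vol of diluent."""
    return conc * sample_vol / (sample_vol + diluent_vol)


# Denaturation: 10 uL of 2 nM library (2000 pM) + 10 uL NaOH + 970 uL HT1.
denatured_pm = dilute(2.0 * 1000, 10, 10 + 970)

# Loading dilution: 450 uL of the nominal 20 pM library into 2550 uL HT1.
loading_pm = dilute(20.0, 450, 2550)

print(f"denatured library: {denatured_pm:.1f} pM")
print(f"loading library:   {loading_pm:.1f} pM")
```

The first value comes out at about 20.2 pM, which the protocol rounds to 20 pM; the second is 3 pM in a total volume of 3 mL.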
Step 2: Loading the sequencer
1) Prepare the reagent cartridge (Reagent Cartridge): thaw, inspect, and add sodium hypochlorite. Prepare the sequencing chip (flow cell): equilibrate to room temperature, unpack, and inspect.
2) Prepare the reagent cartridge (Reagent Cartridge): first thaw the cartridge, then inspect its large reservoirs to confirm that the reagents have thawed completely.
(1) Thawing the reagent cartridge (Reagent Cartridge): the cartridge can be thawed overnight at 2-8 °C. At this temperature the reagents need at least 18 h to thaw completely, and they can be kept at this temperature for one week. ① Take the reagent cartridge out of -25 °C to -15 °C storage; ② place the cartridge in a room-temperature water bath deep enough to immerse its bottom. Note: the water must not reach the top of the cartridge. ③ Thaw the reagents in the room-temperature water bath for about 60 min, until completely thawed. ④ Take the cartridge out of the water bath, tap it gently on the bench to remove water from its bottom, and dry the bottom of the cartridge.
(2) Inspecting the reagent cartridge (Reagent Cartridge): ① Invert the cartridge 5 times to mix the thawed reagents. ② Inspect reservoirs 29, 30, 31 and 32 at the bottom of the cartridge to make sure the reagents in these reservoirs have thawed completely. ③ Tap the cartridge gently on the bench to drive out air bubbles in the reagents.
(3) Adding fresh NaOCl: to avoid contamination of the instrument from the previous run, add diluted NaOCl to the Reagent Cartridge before loading it into the NextSeq 500. Illumina recommends diluting 3%-6% NaOCl to 0.03%-0.06%. Note: use the prepared NaOCl within 24 h. ① Prepare 2 mL of 0.03%-0.06% NaOCl by mixing 20 μL of 3%-6% NaOCl with 1980 μL of pure water. ② Invert the centrifuge tube several times to mix; ③ wipe the seal over well 28 clean with a paper towel; ④ pierce the seal over well 28 with a clean 1 mL pipette tip; ⑤ add 2 mL of 0.03%-0.06% NaOCl to well 28.
(4) Preparing the sequencing chip (flow cell): take the flow cell out of 2-8 °C storage, open the package, wipe the flow cell clean, and set it aside for loading.
3) Add the library dilution to well 10 of the reagent cartridge.
4) Select Sequence on the software interface to start the run setup steps.
5) Load the sequencing chip (flow cell);
6) Empty the waste tray and put it back. Load the buffer cartridge, then load the reagent cartridge.
7) Review the run parameters and the pre-run check results. Select Start to begin the run.
8) Monitor the run with the NCS and SAV software.
Experiment seven: Data analysis
Step one: Data filtering
Raw sequencing data are stored in fastq format (file name: *.fq). Before the next analysis step, the data must be filtered as follows:
(1) Reads containing adapter sequence are filtered out;
(2) When the proportion of N bases in a single-end read exceeds 10% of the read length, the read pair (paired reads) is removed;
(3) When the number of low-quality (Q <= 5) bases in a single-end read exceeds 50% of the read length, the read pair (paired reads) is removed.
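The three filtering rules can be sketched as a small predicate over read pairs. This is an illustrative sketch only, under assumptions that are not stated in the text: quality strings use Phred+33 encoding, adapter contamination is detected by exact substring match, and a pair is discarded when either mate contains adapter sequence. The function names and the example adapter are hypothetical.

```python
from typing import Tuple

Read = Tuple[str, str]  # (bases, quality string); Phred+33 encoding is assumed


def read_fails(seq: str, qual: str, adapter: str,
               max_n_frac: float = 0.10, low_q: int = 5,
               max_low_q_frac: float = 0.50, phred_offset: int = 33) -> bool:
    """Return True if a single read violates any of the three rules."""
    if not seq:
        return True
    # Rule (1): adapter sequence present (exact match is an assumption).
    if adapter and adapter in seq:
        return True
    # Rule (2): N content above 10% of the read length.
    if seq.upper().count("N") / len(seq) > max_n_frac:
        return True
    # Rule (3): bases with Q <= 5 above 50% of the read length.
    low = sum(1 for c in qual if ord(c) - phred_offset <= low_q)
    return low / len(seq) > max_low_q_frac


def keep_pair(r1: Read, r2: Read, adapter: str) -> bool:
    """Keep a pair only when neither mate fails; the whole pair is dropped otherwise."""
    return not (read_fails(*r1, adapter) or read_fails(*r2, adapter))
```

For example, `keep_pair(("ACGTACGTAC", "IIIIIIIIII"), ("ACGTNNACGT", "IIIIIIIIII"), "AGATCGGAAGAGC")` rejects the pair because the second mate is 20% N.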
Step 2: Sequence alignment and quality control
After strict filtering of the sequencing data, high-quality reads (Clean data) are obtained. The reads are aligned to the NCBI build 37/hg19 reference genome with BWA (Burrows-Wheeler Alignment tool); duplicates are removed from the alignment result with picard (http://broadinstitute.github.io/picard/), and reads with more than 5 mismatched bases are filtered out.
Step 3: Pathogenic mutation analysis of the target sequences
3.1 SNP and InDel analysis comprises the following steps:
(1) The sequenced reads are mapped to the corresponding positions of the human genome with the software SOAP (http://soap.genomics.org.cn/);
(2) The coverage depth of SNPs and InDels is counted, and sites with a coverage depth below 30 are removed.
(3) Based on the disease and normal sample information, sites with a population frequency below 2% are selected for further analysis; protein function is predicted with the SIFT software, and the resulting sites are taken as candidate pathogenic sites for the disease.
(4) The mutation sites are annotated by combining dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/), HGMD (www.hgmd.cf.ac.uk), ClinVar (http://www.ncbi.nlm.nih.gov/clinvar/) and LOVD InSiGHT (http://insight-group.org/lovd.html). The candidate pathogenic sites obtained from the analysis are shown in Table 14 below.
Table 14:
Notes:
"-": no change was detected, or no relevant information is available.
Heterozygous: the two alleles at the same site have different genotypes.
Homozygous: the two alleles at the same site have the same genotype.
Nonsense mutation: a base change converts a codon for an amino acid into a stop codon, so that peptide chain synthesis terminates prematurely.
Missense mutation: after a base substitution, a codon encoding one amino acid becomes a codon encoding another amino acid, changing the amino acid type and sequence of the polypeptide chain.
Splice site: a variant that may affect the formation of messenger RNA from the gene transcript.
Insertion mutation: a mutation in which nucleotides are inserted into the genome, changing the gene's coding.
Deletion mutation: a mutation in which several nucleotides are lost from the genome, changing the gene's coding.
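The depth and population-frequency screens in steps (2) and (3) of section 3.1 can be sketched as a simple filter. This is a hedged illustration: the `Variant` record, its field names and the example coordinates are hypothetical, and the SIFT prediction and database annotation steps are not modelled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Variant:
    chrom: str
    pos: int
    depth: int        # coverage depth at the site
    pop_freq: float   # population frequency, as a fraction (0.02 == 2%)


def screen_candidates(variants: List[Variant],
                      min_depth: int = 30,
                      max_pop_freq: float = 0.02) -> List[Variant]:
    """Drop sites with depth below 30, then keep sites with frequency below 2%."""
    return [v for v in variants
            if v.depth >= min_depth and v.pop_freq < max_pop_freq]


# Hypothetical example sites (positions are made up for illustration).
sites = [
    Variant("5", 1000, 120, 0.001),  # kept
    Variant("5", 2000, 25, 0.001),   # removed: depth below 30
    Variant("2", 3000, 80, 0.150),   # removed: common in the population
]
candidates = screen_candidates(sites)
```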
3.2 Large fragment deletion analysis comprises the following steps:
(1) Window partitioning: 100 bp is chosen as the window size for the analysis, and longer target regions are divided into windows of 100 bp. To avoid overly short windows, target regions shorter than 160 bp are not divided.
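The window partitioning can be sketched as follows. The text specifies only the 100 bp window size and the 160 bp no-split threshold; how a trailing remainder is handled is not stated. The sketch assumes half-open coordinates and assumes that a remainder of at least 60 bp (160 minus 100) becomes its own window while a shorter remainder is merged into the preceding window. Both choices, and the function name, are hypothetical.

```python
from typing import List, Tuple


def split_region(start: int, end: int,
                 window: int = 100, min_divide: int = 160) -> List[Tuple[int, int]]:
    """Split the half-open region [start, end) into analysis windows."""
    length = end - start
    if length <= 0:
        raise ValueError("region must have positive length")
    # Regions shorter than 160 bp are kept whole, as stated in the text.
    if length < min_divide:
        return [(start, end)]
    # Assumed tail rule: the shortest stand-alone tail is min_divide - window bp.
    min_tail = min_divide - window
    windows = [(s, s + window) for s in range(start, end - window + 1, window)]
    tail_start = windows[-1][1]
    tail = end - tail_start
    if tail >= min_tail:
        windows.append((tail_start, end))       # tail long enough to stand alone
    elif tail > 0:
        windows[-1] = (windows[-1][0], end)     # merge short tail into last window
    return windows
```

Under these assumptions a 250 bp region yields windows of 100 bp and 150 bp, and no window is ever shorter than 60 bp.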
(2) The DepthOfCoverage module of the GATK (The Genome Analysis Toolkit) tool is used to count the sequences of the target sample and of the control sample group in each window, and both are homogenized. The homogenization formula is:
Homogenized sequence count in a window = original sequence count in the window * 1000 / total sequence count over all windows
(3) Using the homogenized sequence counts, the standard deviation of the control samples' sequence counts in each window is calculated and denoted Sd. The median of the control samples' sequence counts in each window is calculated and denoted Med.
(4) For a given window, the difference between the test sample's homogenized sequence count and the control samples' median is computed to obtain the degree of deviation from the median; a deletion mutation is called when the deviation exceeds 3*Sd. The deletion formula is:
Zi = (homogenized sequence count of the test sample in window i - Medi) / Sdi
A deletion is called in the i-th window when Zi is greater than 3.
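Steps (2) to (4) can be sketched in a few lines. This is a minimal illustration rather than the implementation used in the experiment: per-window counts are passed in as plain lists instead of being read from GATK output, the sample standard deviation (`statistics.stdev`) is assumed because the text does not say which estimator is used, and windows where the controls show no spread return NaN. The threshold test applies the rule Zi > 3 literally as written; the text does not state whether the deviation is taken as a magnitude. All function names are hypothetical.

```python
import statistics
from typing import List, Sequence


def homogenize(counts: Sequence[float]) -> List[float]:
    """Per-window count * 1000 / total count over all windows."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no sequences in any window")
    return [c * 1000 / total for c in counts]


def z_scores(test_counts: Sequence[float],
             control_counts: Sequence[Sequence[float]]) -> List[float]:
    """Z_i = (homogenized test count_i - Med_i) / Sd_i over the control samples."""
    test = homogenize(test_counts)
    controls = [homogenize(c) for c in control_counts]
    zs = []
    for i, t in enumerate(test):
        column = [c[i] for c in controls]
        med = statistics.median(column)
        sd = statistics.stdev(column)  # assumption: sample standard deviation
        zs.append((t - med) / sd if sd > 0 else float("nan"))
    return zs


def flagged_windows(zs: Sequence[float], threshold: float = 3.0) -> List[int]:
    """Indices of windows meeting the stated rule Z_i > 3."""
    return [i for i, z in enumerate(zs) if z > threshold]


# Toy counts for three windows: three control samples and one test sample.
controls = [[100, 100, 200], [120, 80, 200], [80, 120, 200]]
zs = z_scores([200, 50, 150], controls)
```

With these toy counts the first two windows have Med = 250 and Sd = 50 after homogenization, giving Z values of 5 and -2.5; the third window has no spread among the controls and is reported as NaN.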
The genes found to carry a deletion mutation by the above method are shown in Table 15 below:
Table 15:
| Chromosome | Genomic position | Fragment length | Gene | Variant region | Variant type |
|---|---|---|---|---|---|
| 5 | 112043353-112198302 | 154949bp | APC | Exon region | Large fragment deletion |
The above method was compared with two existing detection methods: deviation detection using the mean, and a ratio to the median below 0.6. The comparison results are shown in Table 16 below:
Table 16:
| Method | Detection rate | Positive predictive value (PPV) |
|---|---|---|
| The present invention | 13.54% | 100% |
| Deviation calculated relative to the mean | 15.63% | 86.67% |
| Ratio to the median below 0.6 | 20.8% | 65% |
As Table 16 shows, compared with the prior-art methods, the method of the present invention reduces the detection of false positives, so that the positive predictive value reaches 100%, indicating that the accuracy of positive predictions by the method of the present invention is significantly improved.
Experiment eight: Verification
To verify the accuracy of the above large fragment deletion results, the mutation results of positive carriers were analyzed by DHPLC. The experimental steps are as follows:
1: Primers were designed for the target sites with the software Primer Premier 5.0 (www.premierbiosoft.com/primerdesign). See Table 17 for details:
Table 17:
2: PCR amplification. The amplification system is given in Table 18 below and the amplification program in Table 19 below.
Table 18:
| PCR reaction component | Amount per reaction |
|---|---|
| 10×Buffer I | 5 μl |
| 2.5 mM dNTP | 4 μl |
| Primer set | 10 μl |
| HS Taq enzyme (5 U/μl) | 0.4 μl |
| DNA | 2.0 μl |
| ddH2O | Make up to 50 μl |
Table 19:
3. Sequencing of PCR products
Take 1 μl of PCR product for detection by 2.0% agarose gel electrophoresis, and send it for sequencing.
4. The sequencing results are shown in Fig. 1.
In Fig. 1, the DHPLC results are analyzed mainly by the peak area of the amplification products. Because the peak area is approximately base (peak width) × height (peak height) / 2, the amount of PCR product (i.e. copy number) can be judged indirectly by comparing peak heights between the test sample and the standard control. Each peak in Fig. 1 is a separately designed product; apart from one standard reference gene, all are different exons of the gene under test. Once the reference gene peaks are aligned (peak base and peak height), the copy number differences of the different exons can be judged by comparing the heights of the other products (peaks) between the test sample and the standard control. The peak heights of the test sample and the control in Fig. 1 show that APC in the test sample carries a large fragment deletion, which is consistent with the second-generation sequencing result. This demonstrates the validity and accuracy of the sequencing method of the present invention.
As can be seen from the above description, the above embodiments of the present invention achieve the following technical effects. Sequence counts are computed by cutting the sequencing data of the test sample and the control samples into windows, so that the window size can be adjusted flexibly according to the sequencing depth of different sequencing data and the size of the target deletion fragment, which widens the size range of detectable deletion fragments. Moreover, whether a given window carries a deletion mutation is determined from the test sample's sequence count in each window relative to the median of the control samples. Using the median of the control samples as the reference for comparison, rather than the mean and standard deviation, makes false positives easier to distinguish and the result more accurate, because when a window has no copy number variation, using the mean and standard deviation as the reference can affect the accuracy of the determination.
It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above may be implemented with a general-purpose computing device. They may be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; or they may each be made into a separate integrated circuit module, or several of the modules or steps may be made into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Sequence listing
<110>Tianjin Novogene Bioinformatics Technology Co., Ltd.
<120>Method and apparatus for detecting gene mutation
<130> PN41432NHZY
<160> 14
<170> PatentIn version 3.5
<210> 1
<211> 20
<212> DNA
<213>Synthetic
<400> 1
tcgggaagcg gagagagaag 20
<210> 2
<211> 20
<212> DNA
<213>Synthetic
<400> 2
agacagtgcg agggaaaacc 20
<210> 3
<211> 20
<212> DNA
<213>Synthetic
<400> 3
atttaccagt gagggacggg 20
<210> 4
<211> 20
<212> DNA
<213>Synthetic
<400> 4
acgcttttga gggttgattc 20
<210> 5
<211> 20
<212> DNA
<213>Synthetic
<400> 5
taaggtgcgt gctttgagag 20
<210> 6
<211> 21
<212> DNA
<213>Synthetic
<400> 6
acatcctgag ggtaaggcta a 21
<210> 7
<211> 25
<212> DNA
<213>Synthetic
<400> 7
tgactgtaat attctaagtc ctacc 25
<210> 8
<211> 20
<212> DNA
<213>Synthetic
<400> 8
gagattctga agttgagcgt 20
<210> 9
<211> 22
<212> DNA
<213>Synthetic
<400> 9
cacaacatca ttcactcaca gc 22
<210> 10
<211> 22
<212> DNA
<213>Synthetic
<400> 10
tacttggatt tttgtcctgg tc 22
<210> 11
<211> 25
<212> DNA
<213>Synthetic
<400> 11
tgacaaagga agaacagata gcaaa 25
<210> 12
<211> 22
<212> DNA
<213>Synthetic
<400> 12
aagcctgggt gacagagtga ga 22
<210> 13
<211> 19
<212> DNA
<213>Synthetic
<400> 13
tgttgactcg atccacccc 19
<210> 14
<211> 21
<212> DNA
<213>Synthetic
<400> 14
tgagctgcaa gtttggctga a 21
Claims (12)
1. A method for detecting gene mutation, characterized in that the method comprises the following steps:
obtaining sequencing data of a test sample and control samples;
determining whether an SNP mutation and/or an InDel mutation is present in the sequencing data of the test sample; and
determining whether a deletion mutation is present in the sequencing data of the test sample;
wherein the step of determining whether a deletion mutation is present in the sequencing data of the test sample comprises:
homogenization: cutting the sequencing data into windows, counting the sequences of the test sample and of the control samples in each window, and homogenizing the sequence count of each window to obtain the homogenized sequence counts of the test sample and of the control samples in each window;
standard deviation and median calculation: calculating the standard deviation and the median of the homogenized sequence counts of the control samples in each window;
deviation calculation: calculating, according to formula (1), the deviation Z of the homogenized sequence count of the test sample from the median of the control samples in each window; and
Z = (homogenized sequence count of the test sample - median) / standard deviation (1)
deletion determination: when the Z value is greater than 3, determining that the test sample has a deletion mutation in that window.
2. The method according to claim 1, characterized in that, in the homogenization step, the sequencing data are cut into contiguous, non-overlapping windows.
3. The method according to claim 1, characterized in that the homogenization step comprises:
cutting the respective sequencing data of the test sample and the control samples into windows, recording the sequence count of each in each window as its first sequence count, and recording the sum of its first sequence counts as its second sequence count; and
homogenizing the respective first sequence counts of the test sample and the control samples according to formula (2) to obtain the homogenized sequence counts of the test sample and the control samples in each window:
homogenized sequence count = first sequence count * 1000 / second sequence count (2)
4. The method according to claim 1, characterized in that the step of determining whether an SNP mutation and/or an InDel mutation is present in the sequencing data of the test sample comprises:
sequence alignment: aligning the sequencing data of the test sample to a reference genome to obtain an alignment result;
first screening: screening out, from the alignment result, sites carrying an SNP mutation and/or an InDel mutation, recorded as first candidate sites;
second screening: screening out, from the first candidate sites, sites whose population mutation frequency is below 2%, recorded as second candidate sites;
SNP and/or InDel mutation determination: determining, according to the functional annotation of the second candidate sites in a functional annotation database, whether the second candidate sites include SNP mutation sites and/or InDel mutation sites that cause a change in gene function; if so, recording those second candidate sites as third candidate sites; and
SNP and/or InDel mutation confirmation: when third candidate sites exist, identifying the third candidate sites as SNP mutation sites and/or InDel mutation sites.
5. The method according to any one of claims 1 to 4, characterized in that, before the step of obtaining the sequencing data of the test sample and the control samples, the method further comprises a step of preparing an exon library for each of the test sample and the control samples, wherein the exon libraries are prepared by liquid-phase capture.
6. The method according to claim 5, characterized in that, before the preparation by liquid-phase capture, the method further comprises a step of designing liquid-phase capture probes according to the exon regions of the target genes.
7. The method according to claim 6, characterized in that the exon library preparation step comprises preparing exon libraries for multiple target genes, the multiple target genes including at least the following genes: MLH1, MSH2, MSH3, MSH6, PMS1, PMS2, BUB1, BUB3, STK11, PTEN, SMAD4, APC, MUTYH, EPCAM, SETD2, MAX, TSC2, ATM and FANCC.
8. A device for detecting deletion mutations, characterized in that the device comprises:
an acquisition module, configured to obtain sequencing data of a test sample and control samples;
a first determination module, configured to determine whether an SNP mutation and/or an InDel mutation is present in the sequencing data of the test sample; and
a second determination module, configured to determine whether a deletion mutation is present in the sequencing data of the test sample;
wherein the second determination module comprises:
a homogenization submodule, configured to cut the sequencing data into windows, count the sequences of the test sample and of the control samples in each window, and homogenize the sequence count of each window to obtain the homogenized sequence counts of the test sample and of the control samples in each window;
a first calculation submodule, configured to calculate the standard deviation and the median of the homogenized sequence counts of the control samples in each window;
a second calculation submodule, configured to calculate, according to formula (1), the deviation Z of the homogenized sequence count of the test sample from the median of the control samples in each window; and
Z = (homogenized sequence count of the test sample - median) / standard deviation (1)
a deletion determination submodule, configured to determine, when the Z value is greater than 3, that the test sample has a deletion mutation in that window.
9. The device according to claim 8, characterized in that the homogenization submodule further comprises:
a counting unit, configured to count the sequences of the test sample and of the control samples in each window, recorded as their respective first sequence counts, and to sum the first sequence counts over all windows for each, recorded as their respective second sequence counts; and
a calculation unit, configured to homogenize the first sequence counts of the test sample and the control samples in each window according to formula (2) to obtain the homogenized sequence counts of the test sample and the control samples in each window:
homogenized sequence count = first sequence count * 1000 / second sequence count (2)
10. The device according to claim 8, characterized in that the first determination module comprises:
a sequence alignment submodule, configured to align the sequencing data of the test sample to a reference genome to obtain an alignment result;
a first screening submodule, configured to screen out, from the alignment result, sites carrying an SNP mutation and/or an InDel mutation, recorded as first candidate sites;
a second screening submodule, configured to screen out, from the first candidate sites, sites whose population mutation frequency is below 2%, recorded as second candidate sites;
an SNP and/or InDel mutation determination submodule, configured to determine, according to the functional annotation of the second candidate sites in a functional annotation database, whether the second candidate sites include SNP mutation sites and/or InDel mutation sites that cause a change in gene function, and if so, to record those second candidate sites as third candidate sites; and
an SNP and/or InDel mutation confirmation submodule, configured to identify, when third candidate sites exist, the third candidate sites as SNP mutation sites and/or InDel mutation sites.
11. The device according to claim 8, characterized in that the device further comprises an exon library preparation module, which is configured to prepare exon libraries of the sample to be tested and of the control sample using a liquid-phase capture method before the acquisition module acquires the sequencing data of the sample to be tested and of the control sample.
12. The device according to claim 11, characterized in that the device further comprises a probe design module, which is configured to design liquid-phase capture probes according to the exon regions of the target genes before the exon library preparation module prepares the exon libraries of the sample to be tested and of the control sample.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610932451.8A CN106566877A (en) | 2016-10-31 | 2016-10-31 | Gene mutation detection method and apparatus |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610932451.8A CN106566877A (en) | 2016-10-31 | 2016-10-31 | Gene mutation detection method and apparatus |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN106566877A true CN106566877A (en) | 2017-04-19 |
Family
ID=60414390
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201610932451.8A Pending CN106566877A (en) | 2016-10-31 | 2016-10-31 | Gene mutation detection method and apparatus |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN106566877A (en) |
Cited By (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107180166A (en) * | 2017-04-21 | 2017-09-19 | 北京希望组生物科技有限公司 | Whole-genome structural variation analysis method and system based on third-generation sequencing |
| CN108192971A (en) * | 2018-02-02 | 2018-06-22 | 厦门基源医疗科技有限公司 | Method for detecting Lynch syndrome-related gene variants |
| CN108410979A (en) * | 2018-05-25 | 2018-08-17 | 安徽华大医学检验所有限公司 | Chip for simultaneously detecting single-gene disorders and chromosomal disorders, and its application |
| CN108470114A (en) * | 2018-04-27 | 2018-08-31 | 元码基因科技(北京)股份有限公司 | Method for analyzing tumor mutation load based on second-generation sequencing data of a single sample |
| CN108486251A (en) * | 2018-04-16 | 2018-09-04 | 南京大学(苏州)高新技术研究院 | A target gene capture sequencing method for rapid diagnosis and differential diagnosis of BHD syndrome and its application |
| CN108690871A (en) * | 2018-03-29 | 2018-10-23 | 深圳裕策生物科技有限公司 | Method, device and storage medium for detecting insertion and deletion mutations based on second-generation sequencing |
| CN109949861A (en) * | 2019-03-29 | 2019-06-28 | 深圳裕策生物科技有限公司 | Tumor mutation load detection method, device and storage medium |
| CN110016498A (en) * | 2019-04-24 | 2019-07-16 | 北京诺赛基因组研究中心有限公司 | Method for determining single nucleotide polymorphisms in Sanger sequencing |
| CN112837749A (en) * | 2021-02-01 | 2021-05-25 | 北京百奥纳芯生物科技有限公司 | Optimization method of gene chip probe for cancer screening |
| CN113724781A (en) * | 2021-11-03 | 2021-11-30 | 北京雅康博生物科技有限公司 | Method and apparatus for detecting homozygous deletions |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2014039556A1 (en) * | 2012-09-04 | 2014-03-13 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
| CN104204221A (en) * | 2011-12-31 | 2014-12-10 | 深圳华大基因科技服务有限公司 | Method and system for testing fusion gene |
| CN104428425A (en) * | 2012-05-04 | 2015-03-18 | 考利达基因组股份有限公司 | Methods for determining absolute genome-wide copy number variations of complex tumors |
| CN104462869A (en) * | 2014-11-28 | 2015-03-25 | 天津诺禾致源生物信息科技有限公司 | Method and device for detecting somatic cell SNP |
| CN104561289A (en) * | 2014-12-26 | 2015-04-29 | 北京诺禾致源生物信息科技有限公司 | Detection method and device of gene deletion mutation |
| WO2015181718A1 (en) * | 2014-05-26 | 2015-12-03 | Ebios Futura S.R.L. | Method of prenatal diagnosis |
| CN105408496A (en) * | 2013-03-15 | 2016-03-16 | 夸登特健康公司 | Systems and methods for detecting rare mutations and copy number variations |
2016
- 2016-10-31 CN CN201610932451.8A patent/CN106566877A/en active Pending
Patent Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN104204221A (en) * | 2011-12-31 | 2014-12-10 | 深圳华大基因科技服务有限公司 | Method and system for testing fusion gene |
| CN104428425A (en) * | 2012-05-04 | 2015-03-18 | 考利达基因组股份有限公司 | Methods for determining absolute genome-wide copy number variations of complex tumors |
| WO2014039556A1 (en) * | 2012-09-04 | 2014-03-13 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
| CN105408496A (en) * | 2013-03-15 | 2016-03-16 | 夸登特健康公司 | Systems and methods for detecting rare mutations and copy number variations |
| WO2015181718A1 (en) * | 2014-05-26 | 2015-12-03 | Ebios Futura S.R.L. | Method of prenatal diagnosis |
| CN104462869A (en) * | 2014-11-28 | 2015-03-25 | 天津诺禾致源生物信息科技有限公司 | Method and device for detecting somatic cell SNP |
| CN104561289A (en) * | 2014-12-26 | 2015-04-29 | 北京诺禾致源生物信息科技有限公司 | Detection method and device of gene deletion mutation |
Non-Patent Citations (4)
| Title |
|---|
| ISAAC J. NIJMAN ET AL.: ""Targeted next-generation sequencing: A novel diagnostic tool for primary immunodeficiencies"", 《J ALLERGY CLIN IMMUNOL》 * |
| LENNART F. JOHANSSON ET AL.: ""CoNVaDING: Single Exon Variation Detection in Targeted NGS Data"", 《HUMAN MUTATION》 * |
| TAYLOR J. JENSEN ET AL.: ""Detection of Microdeletion 22q11.2 in a Fetus by Next-Generation Sequencing of Maternal Plasma"", 《CLINICAL CHEMISTRY》 * |
| YANFANG GUAN ET AL.: ""Detection of inherited mutations for hereditary cancer using target enrichment and next generation sequencing"", 《 FAMILIAL CANCER》 * |
Cited By (16)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107180166A (en) * | 2017-04-21 | 2017-09-19 | 北京希望组生物科技有限公司 | Whole-genome structural variation analysis method and system based on third-generation sequencing |
| CN108192971A (en) * | 2018-02-02 | 2018-06-22 | 厦门基源医疗科技有限公司 | Method for detecting Lynch syndrome-related gene variants |
| CN108192971B (en) * | 2018-02-02 | 2020-10-16 | 厦门基源医疗科技有限公司 | Method for detecting gene variation related to Lynch syndrome |
| CN108690871B (en) * | 2018-03-29 | 2022-05-20 | 深圳裕策生物科技有限公司 | Method, device and storage medium for detecting insertion deletion mutation based on next generation sequencing |
| CN108690871A (en) * | 2018-03-29 | 2018-10-23 | 深圳裕策生物科技有限公司 | Method, device and storage medium for detecting insertion and deletion mutations based on second-generation sequencing |
| CN108486251A (en) * | 2018-04-16 | 2018-09-04 | 南京大学(苏州)高新技术研究院 | A target gene capture sequencing method for rapid diagnosis and differential diagnosis of BHD syndrome and its application |
| CN108470114B (en) * | 2018-04-27 | 2020-02-28 | 元码基因科技(北京)股份有限公司 | Method for analyzing tumor mutation load based on second-generation sequencing data of single sample |
| CN108470114A (en) * | 2018-04-27 | 2018-08-31 | 元码基因科技(北京)股份有限公司 | Method for analyzing tumor mutation load based on second-generation sequencing data of a single sample |
| CN108410979A (en) * | 2018-05-25 | 2018-08-17 | 安徽华大医学检验所有限公司 | Chip for simultaneously detecting single-gene disorders and chromosomal disorders, and its application |
| CN109949861A (en) * | 2019-03-29 | 2019-06-28 | 深圳裕策生物科技有限公司 | Tumor mutation load detection method, device and storage medium |
| CN110016498A (en) * | 2019-04-24 | 2019-07-16 | 北京诺赛基因组研究中心有限公司 | Method for determining single nucleotide polymorphisms in Sanger sequencing |
| CN110016498B (en) * | 2019-04-24 | 2020-05-08 | 北京诺赛基因组研究中心有限公司 | Method for determining single nucleotide polymorphism in Sanger method sequencing |
| CN112837749A (en) * | 2021-02-01 | 2021-05-25 | 北京百奥纳芯生物科技有限公司 | Optimization method of gene chip probe for cancer screening |
| CN112837749B (en) * | 2021-02-01 | 2021-11-26 | 北京百奥纳芯生物科技有限公司 | Optimization method of gene chip probe for cancer screening |
| US11710537B2 (en) | 2021-02-01 | 2023-07-25 | Beijing Bionaxin Biotech Co., Ltd | Optimal selection method of gene chip probes for cancer screening |
| CN113724781A (en) * | 2021-11-03 | 2021-11-30 | 北京雅康博生物科技有限公司 | Method and apparatus for detecting homozygous deletions |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN106566877A (en) | Gene mutation detection method and apparatus | |
| AU2019203491B2 (en) | Using cell-free DNA fragment size to determine copy number variations | |
| EP3567120B1 (en) | Using cell-free dna fragment size to determine copy number variations | |
| DK3078752T3 (en) | SOLUTION OF REFRACTIONS USING POLYMORPHISM COUNTIES | |
| US9260745B2 (en) | Detecting and classifying copy number variation | |
| US9323888B2 (en) | Detecting and classifying copy number variation | |
| US10388403B2 (en) | Analyzing copy number variation in the detection of cancer | |
| CN110800063A (en) | Detection of tumor-associated variants using cell-free DNA fragment size | |
| CA2878246A1 (en) | Detecting and classifying copy number variation in a cancer genome | |
| HK1244844A1 (en) | Using cell-free dna fragment size to determine copy number variations | |
| CA3186272A1 (en) | A method for detecting a genetic variant | |
| CN107475370A (en) | Gene group and kit and diagnostic method for pulmonary cancer diagnosis | |
| CN105695567A (en) | Kit, primers, probe sequence and method for detecting fetus chromosome aneuploid | |
| CN115786459B (en) | Method for detecting tiny residual disease of solid tumor by high-throughput sequencing | |
| CN104561289A (en) | Detection method and device of gene deletion mutation | |
| CN114196740A (en) | Digital amplification detection method, detection product and detection kit for simultaneously identifying multiple gene types | |
| KR20190078715A (en) | Next generation sequencing (ngs)-based hybrid diagnostic panel for analyzing variation of cancer gene and anticancer drug-related gene | |
| CN121335988A (en) | Non-invasive in vitro diagnostic methods | |
| EP4441506A2 (en) | Methods for characterization of circulating tumor cells | |
| CN108866197A (en) | PTCH2 gene mutation site is judging the application in young breast cancer susceptibility | |
| CN120464724A (en) | Cleavage failure and/or early embryonic development arrest detection panel, detection kit and application thereof | |
| Cattelan | Development of a NGS workflow for diagnostic applications in oncology | |
| HK40055868B (en) | Using cell-free dna fragment size to determine copy number variations | |
| CN120683254A (en) | Markers, probe sets, kits and detection methods for thyroid cancer-related gene detection | |
| CN114023442A (en) | Biogenic analysis method and model based on multi-group chemical data osteosarcoma molecular typing |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20170419 |