CN105653898A

CN105653898A - Cancer detection kit based on large-scale data mining and detection method

Info

Publication number: CN105653898A
Application number: CN201610018232.9A
Authority: CN
Inventors: 钱自亮; 郑媛虹; 魏国鹏
Original assignee: Jiangsu Gezhi Life Technology Co Ltd
Current assignee: Jiangsu Gezhi Life Technology Co Ltd
Priority date: 2016-01-12
Filing date: 2016-01-12
Publication date: 2016-06-08

Abstract

The invention discloses a cancer detection kit based on large-scale data mining and a detection method, and belongs to the technical field of biomedical detection. The kit comprises a DNA extraction reagent, a high-throughput sequencing library preparation reagent, gene sequence alignment software and chromosome coverage calculation software. In the method, peripheral blood is first collected from a subject and the plasma is separated; DNA polymerase is used for amplification and a sequencing library is prepared; the prepared library is subjected to large-scale sequencing; the sequencing reads are aligned, their distribution across the genome is tabulated, and the presence or absence of cancer-cell genomic sequence originating from tumor tissue is determined, which indicates whether the subject has cancer. The kit and method overcome the poor specificity and sensitivity of existing cancer screening, involve no radiation and no wound, and require only 4-10 mL of peripheral blood.

Description

A cancer detection kit based on large-scale data mining, and a detection method

Technical field

The present invention relates to a cancer detection kit based on large-scale data mining and to a detection method, and in particular to a noninvasive cancer detection kit and detection method based on mining high-throughput sequencing data. It belongs to the field of biomedical detection technology.

Technical background

Malignant tumors are often associated with variations in the number and structure of chromosomes. Cells usually contain two classes of chromosomes: some carry genes that promote malignancy, while others carry genes that suppress it. If a chromosome is altered, the genes on the chromosomes become imbalanced and a tumor develops.

Current cancer detection relies on blood biochemical markers and imaging. Biochemical markers include CEA, AFP and PSA, and their main drawbacks are: (1) they cover a limited range of cancers, and most cancers cannot be detected effectively; (2) specificity is poor: some underlying diseases cause false calls, for example hepatitis can raise AFP and prostatitis can raise PSA; (3) the half-life in blood is long: AFP protein, for example, persists in blood for 7 to 28 days or longer, which makes it difficult to track the effect of surgery, radiotherapy or drug therapy in real time. Imaging techniques include B-mode ultrasound, CT and PET, and their main drawbacks are: (1) most of them involve some radiation hazard, so they cannot be used too frequently and cannot be used for real-time detection of cancer; (2) they are relatively complicated to perform, which makes large-scale population screening for tumors difficult.

Another monitoring approach is liquid biopsy, which uses high-throughput sequencing to measure ctDNA fragments in tumor patients; the oncogene information carried by the ctDNA reflects the characteristics of the tumor comprehensively. Being accurate, sensitive, noninvasive and high-throughput, liquid biopsy can assist doctors and patients in tumor diagnosis, treatment reference, medication guidance, disease monitoring and early warning of recurrence. In recent years, a growing number of biomedical researchers have found that NGS sequencing of human peripheral blood DNA can detect whether cancer has occurred and predict cancer risk. Whole-genome sequencing of ctDNA makes it possible to calculate changes in chromosomal copy number and thereby judge whether a tumor is present.

A method of this kind, based on large-scale high-throughput sequencing, is not limited to sequencing a few known sites; it simultaneously examines a large number of known and unknown fragments in the patient's cancer-cell DNA sequence. With such a large volume of data, computing and interpreting it accurately into useful information becomes the key to tumor liquid biopsy.

Current domestic methods include techniques based on target-site capture and methylation detection techniques based on the cancer genome. Target-site capture techniques generally suffer from low sequencing coverage, which results in insufficient sensitivity. Methylation-based detection suffers from low specificity in early-stage cancer because the ctDNA content is too low. These shortcomings directly limit the use of liquid biopsy in clinical, commercial and research settings.

By developing a stable data-computation method grounded in cancer biology, this technology makes an innovative advance in data interpretation and clinical application. Building on this international frontier research and focusing on the Chinese population, the technique disclosed in this invention aims to apply it to cancer detection and risk indication in the Chinese population. Because blood DNA testing is noninvasive and radiation-free and its cost is controllable, it is expected to help raise the early-detection rate of cancer in China and improve overall population health.

Summary of the invention

To overcome the insufficient specificity and sensitivity of existing tumor screening techniques, the present invention provides a noninvasive tumor detection kit and detection method based on high-throughput sequencing of cell-free DNA from peripheral blood. The detection method based on this kit involves no radiation and no wound, and requires only 4-10 mL of peripheral blood to detect cancer. It is applicable to the early diagnosis of head and neck cancers (oral cancer, nasopharyngeal carcinoma, tongue cancer, laryngeal cancer), digestive tract cancers (esophageal cancer, gastric cancer, liver cancer, pancreatic cancer, intestinal cancer), brain cancer, lung cancer, reproductive system cancers (breast cancer, cervical cancer, uterine cancer, prostate cancer, testicular cancer, etc.), urinary system cancers (bladder cancer, kidney cancer), skin cancer, lymphoma and leukemia.

The present invention provides a cancer detection kit based on large-scale data mining. The kit comprises a DNA extraction reagent, a high-throughput sequencing library preparation reagent, gene sequence alignment software and chromosome coverage calculation software.

The DNA extraction reagent is used to extract cell-free DNA from peripheral blood; the high-throughput sequencing library preparation reagent is used to prepare a high-throughput sequencing library; the library is sequenced; the gene sequence alignment software aligns the sequencing data to the human reference genome; the chromosome coverage calculation software analyzes the data to distinguish cancer patients from non-cancer subjects; and further analysis determines the tumor type and the likely primary organ.

Specifically, cancer and non-cancer subjects are distinguished as follows: the sequencing coverage of each chromosome is calculated and divided by the total sequencing amount to give the relative coverage. In a healthy person each chromosome is present in two copies, and copy number differs very little between chromosomes; conversely, if the relative coverage exceeds a certain threshold, the subject is judged to be a suspected cancer patient.

The relative coverage of each chromosome is calculated with the following formula:

\mathrm{Score} = \frac{\sum_{i=0}^{\mathrm{Chr\_size}} \mathrm{Cov}_{i}}{\sum_{i=0}^{\mathrm{Genome\_size}} \mathrm{Cov}_{i}}

In the formula, Chr_size is the length of the chromosome, and Genome_size is the total length of the human genome, that is, the combined length of all chromosomes; Cov_i is the number of times position i of the genome or chromosome was sequenced.
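The relative coverage formula can be illustrated with a minimal Python sketch. The function name `relative_coverage`, the dictionary layout and the toy depth values are assumptions made for illustration only; they are not part of the disclosed software.

```python
from typing import Dict, Sequence


def relative_coverage(cov_by_chrom: Dict[str, Sequence[int]]) -> Dict[str, float]:
    """Score per chromosome: summed Cov_i over the chromosome divided by
    summed Cov_i over the whole genome (all chromosomes together)."""
    chrom_totals = {chrom: sum(depths) for chrom, depths in cov_by_chrom.items()}
    genome_total = sum(chrom_totals.values())
    if genome_total == 0:
        raise ValueError("no sequencing coverage in input")
    return {chrom: total / genome_total for chrom, total in chrom_totals.items()}


# Hypothetical per-position depths for three very short "chromosomes".
toy = {"chr1": [2, 2, 2, 2], "chr2": [1, 1, 1, 1], "chr3": [1, 1]}
scores = relative_coverage(toy)
```

With these toy values the chromosome totals are 8, 4 and 2, so the scores are 8/14, 4/14 and 2/14 and sum to 1.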

Cancer and non-cancer subjects are distinguished by comparing the tumor genome copy number with the healthy reference copy number to obtain a z-score:

z\text{-score} = \frac{\mathrm{Score}(\mathrm{sample}) - \mathrm{Average}(\mathrm{Score}(\mathrm{reference}))}{\mathrm{SD}(\mathrm{Score}(\mathrm{reference}))}

In the formula, Score(sample) is the chromosome relative coverage of the test sample, Score(reference) is the chromosome relative coverage of the healthy reference, Average denotes the mean, and SD denotes the standard deviation.

A sample whose z-score exceeds a certain threshold (+3/-3 or +6/-6) is judged to be suspected cancer; otherwise it is judged to be a non-cancer sample.
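A minimal sketch of the z-score test follows. It assumes the sample standard deviation (`statistics.stdev`) for SD and flags a sample when any chromosome reaches the threshold. Both choices, the function names and the numeric values are illustrative assumptions; the text does not specify them.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def z_score(sample_score: float, reference_scores: Sequence[float]) -> float:
    """(Score(sample) - Average(Score(reference))) / SD(Score(reference))."""
    sd = stdev(reference_scores)  # assumption: sample standard deviation
    if sd == 0:
        raise ValueError("reference scores have zero spread")
    return (sample_score - mean(reference_scores)) / sd


def is_suspected_cancer(z_by_chrom: Dict[str, float], threshold: float = 3.0) -> bool:
    """Assumed rule: flag the sample if any chromosome has |z| >= threshold."""
    return any(abs(z) >= threshold for z in z_by_chrom.values())


# Hypothetical relative coverage of one chromosome in five healthy references.
reference = [0.080, 0.082, 0.081, 0.079, 0.078]
z_tumor = z_score(0.090, reference)    # clearly elevated coverage
z_healthy = z_score(0.081, reference)  # within the reference spread
```

Passing `threshold=6.0` applies the stricter +6/-6 cut-off mentioned above.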

The tumor type and likely primary organ are determined as follows: a database of cancer patient genome copy numbers is searched to judge the tumor type of the suspected cancer. The similarity between the test subject and a cancer patient database record is calculated as:

Similarity(sample, db_i) = pearson_cor(sample, db_i);

In the formula, sample is the test subject, db_i is record i of the database, pearson_cor computes the Pearson coefficient, and Similarity is the similarity between the test subject and the database record. The similarity may be computed with the Pearson correlation coefficient, the Spearman rank correlation coefficient, or another statistical correlation measure.

The tumor type of the test subject is judged to be the 1-5 tumor types of the most similar database records:

Type(query) = Type(db_i): i = max_i(pearson_cor(query, db_i));

In the formula, db_i is record i of the database, max denotes taking the maximum, Type is the cancer type sought, and max_i denotes the record i with the largest Pearson correlation coefficient.

The algorithm is as follows: the top 1-10000 records in the database search results are selected, the frequency of each tumor type among these records is counted, and the 1-5 most frequent tumor types are taken as the final call.
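The search-and-vote procedure above can be sketched as follows. The database is modeled as a list of hypothetical (tumor type, copy-number profile) pairs. The function names, the profile values and the choice of `top_records=2` are illustrative assumptions, not the disclosed implementation.

```python
from collections import Counter
from math import sqrt
from typing import List, Sequence, Tuple


def pearson_cor(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("constant profile has no defined correlation")
    return cov / sqrt(vx * vy)


def predict_tumor_types(sample, database, top_records=10, top_types=5):
    """Rank records by similarity, keep the best `top_records`, then return
    the `top_types` most frequent tumor types among them with their counts."""
    ranked = sorted(database, key=lambda rec: pearson_cor(sample, rec[1]), reverse=True)
    counts = Counter(tumor_type for tumor_type, _ in ranked[:top_records])
    return counts.most_common(top_types)


# Hypothetical per-chromosome profiles (four chromosomes only, for brevity).
db: List[Tuple[str, List[float]]] = [
    ("LUSC", [1.0, 1.3, 0.7, 1.0]),
    ("LUSC", [1.1, 1.2, 0.9, 1.0]),
    ("HNSC", [1.0, 0.8, 1.2, 1.0]),
    ("ESCC", [0.9, 1.0, 1.0, 1.1]),
]
query = [1.0, 1.2, 0.8, 1.0]
calls = predict_tumor_types(query, db, top_records=2)
```

Swapping `pearson_cor` for a Spearman rank correlation would follow the alternative similarity measure that the text allows.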

The cancer patient genome reference database used is either a whole-genome copy number database based on chip technology (Affymetrix SNP array, Agilent copy number chip) or a whole-genome sequencing (WGS) database. The whole-genome sequencing database comes from tissue sequencing (such as 1000 Genomes data and TCGA whole-genome data) or from peripheral blood cell-free DNA sequencing data. Specific examples are the TCGA cancer genome database (http://cancergenome.nih.gov/) and the GenomeSpace database (http://www.genomespace.org/).

The cancer is a solid tumor or a hematological disease, including head and neck cancer, digestive tract cancer, brain cancer, lung cancer, reproductive system cancer, urinary system cancer, skin cancer, lymphoma and leukemia. The head and neck cancer is oral cancer, nasopharyngeal carcinoma, tongue cancer or laryngeal cancer; the digestive tract cancer is esophageal cancer, gastric cancer, liver cancer, pancreatic cancer or intestinal cancer; the reproductive system cancer is breast cancer, cervical cancer, uterine cancer, prostate cancer or testicular cancer; the urinary system cancer is bladder cancer or kidney cancer.

The high-throughput sequencing technology is selected from Roche/454; Illumina sequencers (NextSeq series, HiSeq series, MiSeq series, XTen and subsequent sequencer series); BGI (Huada) sequencers (BGI500 series and subsequent sequencers); LifeTech sequencers (Ion, Proton and subsequent sequencer series); PacBio sequencers (RSII, Sequel and subsequent sequencers); or nanopore-based sequencers (Genia, Nanopore and similar third-generation sequencers).

The present invention also provides a cancer detection method based on large-scale data mining. The technical scheme mainly comprises the following steps:

(1) collecting peripheral blood from the subject, separating the plasma, and extracting cell-free DNA with the DNA extraction reagent;

(2) amplifying with DNA polymerase and building a sequencing library using the high-throughput sequencing library preparation reagent;

(3) subjecting the prepared library to high-throughput sequencing;

(4) aligning the sequencing data to the human reference genome with the gene sequence alignment software;

(5) analyzing the data with the chromosome coverage calculation software to distinguish cancer patients from non-cancer subjects;

(6) determining the tumor type and the likely primary organ by further data analysis.

In step (5), cancer and non-cancer subjects are distinguished as follows: the sequencing coverage of each chromosome is calculated and divided by the total sequencing amount to give the relative coverage. In a healthy person each chromosome is present in two copies, and copy number differs very little between chromosomes; conversely, if the relative coverage exceeds a certain threshold, the subject is judged to be a suspected cancer patient.

\mathrm{Score} = \frac{\sum_{i=0}^{\mathrm{Chr\_size}} \mathrm{Cov}_{i}}{\sum_{i=0}^{\mathrm{Genome\_size}} \mathrm{Cov}_{i}}

In the formula, Chr_size is the length of the chromosome, and Genome_size is the total length of the human genome, that is, the combined length of all chromosomes. Cov_i is the number of times position i of the genome or chromosome was sequenced.

In step (5), cancer and non-cancer subjects are distinguished by comparing the tumor genome copy number with the healthy reference copy number to obtain a z-score:

z\text{-score} = \frac{\mathrm{Score}(\mathrm{sample}) - \mathrm{Average}(\mathrm{Score}(\mathrm{reference}))}{\mathrm{SD}(\mathrm{Score}(\mathrm{reference}))}

Z-score exceedes certain threshold value, namely z-score >=3 or��-3 or z-score >=6 or��-6 be namely judged as doubtful cancer; Otherwise it is judged as non-cancer sample.

In step (6), the tumor type and likely primary organ are determined as follows: a database of cancer patient genome copy numbers is searched to judge the tumor type of the suspected cancer. The similarity between the test subject and a cancer patient database record is calculated as:

Similarity(sample, db_i) = pearson_cor(sample, db_i);

Type(query) = Type(db_i): i = max_i(pearson_cor(query, db_i));

The high-throughput sequencing technology in step (3) is selected from Roche/454; Illumina sequencers (NextSeq series, HiSeq series, MiSeq series, XTen and subsequent sequencer series); BGI (Huada) sequencers (BGI500 series and subsequent sequencers); LifeTech sequencers (Ion, Proton and subsequent sequencer series); PacBio sequencers (RSII, Sequel and subsequent sequencers); or nanopore-based sequencers (Genia, Nanopore and similar third-generation sequencers).

The beneficial effects of the present invention are:

(1) Compared with similar methods, the present invention is the first to use whole-genome sequencing of peripheral blood ctDNA to detect chromosomal copy number, a pioneering improvement that raises the safety and accuracy of tumor detection.

(2) Compared with biochemical markers, the present invention covers more tumor types and more tumor patients.

(3) Compared with imaging (PET, CT), the present invention carries no radiation hazard, can be repeated frequently, and can be used for real-time tracking of tumors.

(4) Compared with methylation detection, chromosomal copy number change is more tumor-specific and can substantially reduce the false positive rate in clinical use.

(5) Compared with methylation detection, the present invention is simpler to perform and more stable across clinical laboratories and hospital environments.

(6) Compared with target-site detection, the present invention uses whole-genome sequencing, which greatly increases coverage of the tumor.

(7) The ctDNA large-scale sequencing and data processing steps of the present invention are noninvasive, fast and suited to broad-spectrum screening. Applying them markedly improves tumor diagnosis and screening and improves the health of many people, giving the invention positive clinical value in liquid biopsy.

(8) The present invention can be applied to cancer screening and diagnosis, evaluation of treatment effect (after surgery or medication) and prompting of tumor recurrence, and is suitable for research, development and application work in hospitals, universities and enterprises in related fields.

(9) Compared with other blood tests, the present invention can further identify the tumor type and the likely primary organ, which helps confirm the tumor diagnosis.

Description of the drawings

Fig. 1 shows the operating procedure of the method of the present invention;

Fig. 2 shows the implementation of the algorithm of the present invention;

Fig. 3 compares the scores of peripheral blood from tumor patients with those of healthy people.

Detailed description of the invention

For a better understanding of the present invention, it is further explained below with reference to specific embodiments and the drawings.

Embodiment 1: collecting peripheral blood and extracting cell-free DNA with the DNA extraction reagent

1. Collect 4 mL of the subject's peripheral blood into an EDTA anticoagulant tube. All subjects participated voluntarily, and the cancer specimens came from the Affiliated Hospital of Jiangsu University.

2. Within 4 hours of collection, centrifuge the fresh blood at 1600 g for 10 minutes and transfer the supernatant to a 1.5 mL EP tube, keeping the pipette tip away from the intermediate layer and the red blood cells at the bottom.

3. Centrifuge at 16000 g for 10 minutes, transfer the supernatant to a new 1.5 mL EP tube, and store frozen at -80 °C.

4. Cell-free DNA extraction: use the standard Qiagen cell-free DNA extraction kit (QIAGEN, QIAamp DNA Blood Mini Kit, 55114) according to the instructions; each 4 mL of peripheral blood yields 1-50 ng of DNA.

The specific operating steps are as follows:

(1) Thaw one tube of plasma on ice, then add 100 µL of QIAGEN Proteinase K.

(2) Add 0.8 mL of Buffer ACL (with 1.0 µg of carrier RNA added in advance), close the tube cap, and vortex for 30 s until the liquid in the tube is homogeneous.

(3) Incubate at 60 °C for 15-20 min.

(4) Add 1.8 mL of Buffer ACB and vortex for 15-30 s to mix; place on ice for 5 min.

(5) Insert the QIAamp Mini column into the VacConnector on the QIAvac 24 Plus, and insert a 20 mL tube extender into the QIAamp Mini column.

(6) Carefully add the lysate mixture from step (4) into the tube extender of the QIAamp Mini column and switch on the vacuum pump. When all of the lysate has passed through the column, switch off the vacuum pump, release the pressure to 0 mbar, and carefully remove and discard the tube extender.

(7) Add 600 µL of Buffer ACW1 to the column. Keep the lid open and switch on the vacuum pump; after all of the Buffer ACW1 has passed through the QIAamp Mini column, switch off the vacuum pump and release the pressure to 0 mbar.

(8) Add 750 µL of Buffer ACW2 to the QIAamp Mini column. Keep the lid open and switch on the vacuum pump; after all of the Buffer ACW2 has passed through the QIAamp Mini column, switch off the vacuum pump and release the pressure to 0 mbar.

(9) Add 750 µL of ethanol (96-100%) to the QIAamp Mini column. Keep the lid open and switch on the vacuum pump; after all of the ethanol has passed through, switch off the vacuum pump and release the pressure to 0 mbar.

(10) Close the lid, remove the QIAamp Mini column from the vacuum manifold and discard the VacConnector. Place the QIAamp Mini column in a new 2 mL collection tube and centrifuge at full speed (20,000 x g; 14,000 rpm) for 3 min.

(11) Place the QIAamp Mini column in a new 2 mL collection tube, open the lid, and incubate at 56 °C for 10 min.

(12) Place the QIAamp Mini column in a new 1.5 mL elution tube and discard the collection tube from the previous step. Carefully add 20-150 µL of Buffer AVE to the center of the membrane. Close the lid and incubate at room temperature for 3 min.

(13) Centrifuge at full speed (20,000 x g; 14,000 rpm) for 1 min to elute the nucleic acid, yielding cell-free double-stranded DNA from plasma.

Embodiment 2: amplifying with DNA polymerase and building the sequencing library using the high-throughput sequencing library preparation reagent

The standard KAPA next-generation sequencing library kit (KAPA HyperPlus library preparation kit, 28100) is used to build a library from the extracted cell-free DNA, which is then sequenced on a high-throughput sequencer from Illumina.

In the present embodiment, the specific operating steps are as follows:

(1) End repair reaction:

Prepare the following reaction mixture:

Incubate the reaction mixture at 20 °C for 30 min, then purify immediately (using the Agencourt AMPure XP Beads kit, catalog No. A63882) to obtain the end-repair product.

Make sure the AMPure XP Beads have equilibrated to room temperature. Add AMPure XP Beads to the end-repair reaction to prepare the following mixture:

End-repair product 100 µL

AMPure XP Beads 160 µL

Pipette the mixture up and down more than 10 times and vortex; leave at room temperature for 15 min so that the DNA binds to the beads; place the tube on the matching magnetic rack for 15 min or until the liquid is completely clear; carefully remove 255 µL of liquid without touching any beads (some liquid may remain in the tube); with the tube on the magnet and without disturbing the beads, wash with 200 µL of 80% ethanol for at least 30 s; carefully remove the ethanol without touching the beads, and repeat the 80% ethanol wash twice; take the tube off the magnet and dry the beads at room temperature for 15 min; elute with 32.5 µL of elution buffer, leaving the tube at room temperature for 2 min to release the DNA; place the tube on the magnet for 15 min or until the liquid is completely clear; transfer 30 µL of DNA supernatant to the tube for the A-tailing reaction.

(2) A-tailing reaction:

Prepare the following reaction mixture:

Incubate at 30 °C for 30 min, then purify immediately (using the Agencourt AMPure XP Beads kit):

Make sure the AMPure XP Beads have equilibrated to room temperature, then add AMPure XP Beads:

A-tailing product 50 µL

AMPure XP Beads 90 µL

Total 140 µL

Pipette up and down more than 10 times and vortex; leave at room temperature for 15 min so that the DNA binds to the beads; place the tube on the matching magnetic rack for 15 min or until the liquid is completely clear; carefully remove 135 µL of liquid without touching any beads (some liquid may remain in the tube); with the tube on the magnet and without disturbing the beads, wash with 200 µL of 80% ethanol for at least 30 s; carefully remove the ethanol without touching the beads, and repeat the 80% ethanol wash twice; take the tube off the magnet and dry the beads at room temperature for 15 min; add 32.5 µL of elution buffer and leave at room temperature for 2 min to release the DNA; place the tube on the magnet for 15 min or until the liquid is completely clear; transfer 30 µL of DNA supernatant to the tube for the adapter ligation reaction.

Add MinElute:

A-tailing product 50 µL

ERC buffer 300 µL

Load the mixture onto the column, centrifuge, and discard the flow-through. Wash with 750 µL of PE buffer and discard the flow-through. Centrifuge at 10000 g for 2 min to remove the ethanol. For elution, transfer the column to a clean sterile microcentrifuge tube and add 31 µL of EB buffer. Leave at room temperature for 1 min, centrifuge, and recover about 30 µL of liquid.

(3) Adapter ligation

Adapter ligation reaction: prepare the following reaction mixture:

Incubate at 20 °C for 15 min, then purify immediately:

Step 1: AMPure XP Beads purification:

Make sure the AMPure XP Beads have equilibrated to room temperature, then add AMPure XP Beads:

Adapter ligation product 50 µL

AMPure XP Beads 50 µL

Total 100 µL

Pipette up and down more than 10 times and vortex; leave at room temperature for 15 min so that the DNA binds to the beads; place the tube on the matching magnetic rack for 15 min or until the liquid is completely clear; carefully remove 95 µL of liquid without touching any beads (some liquid may remain in the tube); with the tube on the magnet and without disturbing the beads, wash with 200 µL of 80% ethanol for at least 30 s; carefully remove the ethanol without touching the beads, and repeat the 80% ethanol wash twice; take the tube off the magnet and dry the beads at room temperature for 15 min; add 52.5 µL of elution buffer and leave at room temperature for 2 min to release the DNA; place the tube on the magnet for 15 min or until the liquid is completely clear; transfer 50 µL of DNA supernatant to the tube for the second AMPure XP Beads purification step.

Step 2: AMPure XP Beads purification

Library DNA from the first purification step 50 µL

AMPure XP Beads 50 µL

Total 100 µL

Pipette up and down more than 10 times and vortex; leave at room temperature for 15 min so that the DNA binds to the beads; place the tube on the matching magnetic rack for 15 min or until the liquid is completely clear; carefully remove 95 µL of liquid without touching any beads (some liquid may remain in the tube); with the tube on the magnet and without disturbing the beads, wash with 200 µL of 80% ethanol for at least 30 s; carefully remove the ethanol without touching the beads, and repeat the 80% ethanol wash twice; take the tube off the magnet and dry the beads at room temperature for 15 min; add 52.5 µL of elution buffer and leave at room temperature for 2 min to release the DNA; place the tube on the magnet for 15 min or until the liquid is completely clear; transfer 50 µL of DNA supernatant to the tube in which the library DNA is stored.

(4) Fragment size selection

Unligated adapter molecules are removed before library amplification to prevent the formation of adapter dimers and other short adapter-derived molecules, which would interfere with downstream amplification and sequencing. In the present embodiment, fragment size is selected manually by agarose gel electrophoresis, gel excision and purification.

In the present embodiment, electrophoresis is carried out in a 2% agarose gel, and the band at 280 to 320 bp is excised and recovered (ctDNA is 150-180 bp and the adapters are 120 bp, so the library fragments with adapters attached run at 280-320 bp). The target fragments are recovered with the QIAquick Gel Extraction Kit (QIAGEN, 28706).

(5) Library enrichment/amplification

A. Preparation:

Fully thaw, briefly vortex and spin down the KAPA HiFi HotStart ReadyMix (2×) and the PCR primers.

Fully thaw, briefly vortex and spin down the adapter-ligated, size-selected and purified library DNA.

Program the thermal cycler in advance with the specific cycling protocol.

B. Reaction system:

Set up the PCR; the reaction system is:

Change the pipette tip after each aspiration. Close the tube, mix gently, and spin down briefly.

C. Cycling parameters: 98 °C for 2 min; 17 cycles of 98 °C for 30 s, 65 °C for 30 s and 72 °C for 1 min; 72 °C for 5 min.

D. PCR purification:

Purify with AMPure XP Beads.

E. Library validation:

Check the size of the PCR-enriched fragments and their size distribution by agarose gel electrophoresis and on a Bioanalyzer (Agilent 2100, Agilent Technologies Shanghai branch). The results show that the main peak of each library is at about 300 bp, with a single peak shape, no stray peaks, and no adapter or primer dimers, so the insert size is judged acceptable.

The constructed library is quantified by real-time PCR; a library concentration above 1 ng/µL is judged to meet the requirements for sequencing.
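As an illustration of the cycling parameters in step C, the program can be written down as data and its total hold time tallied. This sketch assumes that the 17 cycles apply to the three-step 98/65/72 °C block; the class and variable names are hypothetical, and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int   # block temperature in degrees Celsius
    seconds: int  # hold time


INITIAL_DENATURATION = Step(98, 120)
CYCLE = (Step(98, 30), Step(65, 30), Step(72, 60))  # denature, anneal, extend
N_CYCLES = 17
FINAL_EXTENSION = Step(72, 300)


def total_hold_seconds() -> int:
    """Sum of programmed hold times, excluding ramping between temperatures."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return INITIAL_DENATURATION.seconds + N_CYCLES * per_cycle + FINAL_EXTENSION.seconds


total_minutes = total_hold_seconds() / 60
```

Under these assumptions the programmed holds add up to 2460 s, or 41 min.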

Embodiment 2: high-throughput sequencing, large-scale data analysis and tumor detection

(1) Sequence alignment

The sequencing data produced by the Illumina sequencer (read lengths of 50, 75, 100, 150, etc.) are processed by removing sequencing adapters, removing sample barcodes and trimming low-quality regions to obtain valid data.

The reads are aligned to the standard human reference genome hg19 with the software bwa-mem (http://bio-bwa.sourceforge.net/), and the alignments are stored in a .bam file.
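A minimal sketch of how such an alignment command line might be assembled is shown below. The file names are hypothetical, the reference is assumed to have been indexed beforehand with `bwa index`, and the pipeline assumes a samtools 1.x installation where `samtools sort -o` writes a sorted BAM. The helper only builds the command string and does not run it.

```python
import shlex


def alignment_pipeline(reference_fa: str, reads_fq: str, out_bam: str, threads: int = 4) -> str:
    """Build a shell pipeline: bwa mem writes SAM to stdout, samtools sort
    reads it from stdin ("-") and writes a coordinate-sorted BAM file."""
    bwa = ["bwa", "mem", "-t", str(threads), reference_fa, reads_fq]
    sort = ["samtools", "sort", "-o", out_bam, "-"]
    return " | ".join(" ".join(shlex.quote(arg) for arg in cmd) for cmd in (bwa, sort))


# Hypothetical file names, for illustration only.
cmd = alignment_pipeline("hg19.fa", "sample.trimmed.fq.gz", "sample.sorted.bam")
print(cmd)
```

The resulting string could be passed to a shell once the tools and input files are in place.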

(2) Chromosome coverage statistics

The software samtools (samtools.sourceforge.net) is used to convert the .bam file into an .mpileup file, which gives the sequencing coverage at each site of the genome. From the per-site coverage, the mean over each chromosome is calculated. That is, the algorithm calculates the sequencing coverage of each chromosome (the average number of times a single base is sequenced) and divides it by the total sequencing amount to obtain the relative coverage (the average number of times a single base is sequenced divided by the average number of times all sites are sequenced). The calculation is as follows:

\mathrm{Score} = \frac{\sum_{i=0}^{\mathrm{Chr\_size}} \mathrm{Cov}_{i}}{\sum_{i=0}^{\mathrm{Genome\_size}} \mathrm{Cov}_{i}}

In the formula, Chr_size is the length of the chromosome, and Genome_size is the total length of the human genome, that is, the combined length of all chromosomes. Cov_i is the number of times position i of the genome or chromosome was sequenced.
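The step from an .mpileup file to the Score can be sketched as below. The sketch relies on the standard samtools mpileup text layout, in which the tab-separated columns are chromosome, 1-based position, reference base, read depth, read bases and base qualities. The function names and the toy lines are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable


def chrom_depth_totals(mpileup_lines: Iterable[str]) -> Dict[str, int]:
    """Sum the depth column (4th field) of mpileup lines per chromosome."""
    totals: Dict[str, int] = defaultdict(int)
    for line in mpileup_lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        totals[fields[0]] += int(fields[3])
    return dict(totals)


def score_from_mpileup(mpileup_lines: Iterable[str]) -> Dict[str, float]:
    """Score per chromosome = chromosome depth total / genome depth total."""
    totals = chrom_depth_totals(mpileup_lines)
    genome_total = sum(totals.values())
    if genome_total == 0:
        raise ValueError("no coverage found in mpileup input")
    return {chrom: depth / genome_total for chrom, depth in totals.items()}


# Toy mpileup content (hypothetical reads on two tiny chromosomes).
toy_lines = [
    "chr1\t1\tA\t4\t....\tIIII",
    "chr1\t2\tC\t4\t.,.,\tIIII",
    "chr2\t1\tG\t3\t...\tIII",
    "chr2\t2\tT\t1\t.\tI",
]
toy_scores = score_from_mpileup(toy_lines)
```

In practice the lines would be streamed from the file rather than held in a list, since a whole-genome pileup is large.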

(3) Determining whether cancer is present

In a healthy person each chromosome is present in two copies, and copy number differs very little between chromosomes. Conversely, if the relative coverage exceeds a certain threshold, the subject is judged to be a suspected cancer patient.

The specific steps are as follows:

Using the relative coverage from step (2), calculate the z-score of each chromosome by comparing the relative coverage of the tumor genome with that of the healthy reference.

z\text{-score} = \frac{\mathrm{Score}(\mathrm{sample}) - \mathrm{Average}(\mathrm{Score}(\mathrm{reference}))}{\mathrm{SD}(\mathrm{Score}(\mathrm{reference}))}

In the formula, Score(sample) is the chromosome relative coverage of the test sample, Score(reference) is the chromosome relative coverage of the healthy reference, Average denotes the mean, and SD denotes the standard deviation.

Result of calculation numerical value is as follows:

Each chromosomal z-score result of each sample and Healthy People comparison in table 1. the present embodiment

A sample with a z-score greater than or equal to 3, or less than or equal to -3, is judged to be suspected cancer.
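The z-score comparison and the threshold rule can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names and the two-chromosome toy values are hypothetical, SD is taken as the sample standard deviation (`statistics.stdev`), which the text does not specify, and a sample is flagged when any chromosome reaches the threshold.

```python
from statistics import mean, stdev
from typing import Dict, List


def chromosome_z_scores(
    sample: Dict[str, float], reference: List[Dict[str, float]]
) -> Dict[str, float]:
    """z-score of each chromosome's relative coverage against healthy references."""
    z: Dict[str, float] = {}
    for chrom, score in sample.items():
        ref_scores = [ref[chrom] for ref in reference]
        # Assumption: SD is the sample standard deviation of the reference scores.
        z[chrom] = (score - mean(ref_scores)) / stdev(ref_scores)
    return z


def is_suspected_cancer(z: Dict[str, float], threshold: float = 3.0) -> bool:
    """Flag the sample if any chromosome has z >= threshold or z <= -threshold."""
    return any(abs(value) >= threshold for value in z.values())


# Made-up relative coverage values for five healthy references.
reference = [
    {"chr1": 0.080, "chr2": 0.050},
    {"chr1": 0.081, "chr2": 0.051},
    {"chr1": 0.079, "chr2": 0.049},
    {"chr1": 0.080, "chr2": 0.050},
    {"chr1": 0.080, "chr2": 0.050},
]
tumor_like = {"chr1": 0.085, "chr2": 0.0505}    # chr1 is over-represented
healthy_like = {"chr1": 0.0803, "chr2": 0.0499}

z_tumor = chromosome_z_scores(tumor_like, reference)
z_healthy = chromosome_z_scores(healthy_like, reference)
```

With these toy numbers, the chr1 z-score of the tumor-like sample is far above 3, so `is_suspected_cancer(z_tumor)` is true, while both chromosomes of the healthy-like sample stay within the threshold range.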

As shown in Fig. 3, each bar chart shows the scores of the 21 autosomes of one sample; from left to right, the samples are lung cancer, liver cancer and healthy controls.

Every cancer sample has multiple chromosomes whose z-scores exceed the threshold, whereas in every healthy control all chromosomes fall within the threshold range. In the present embodiment, 6/6 = 100% of the cancer samples have z-scores above 3 or below -3, and in 5/6 = 83% of the cancer blood samples the chromosome z-scores are far outside the normal range; 0/5 of the normal blood samples exceed the normal range. The sensitivity of this test is therefore 100% and the specificity is 100%.
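The sensitivity and specificity figures follow from the counts stated above (6 of 6 cancer samples flagged, 0 of 5 healthy samples flagged). A short sketch of the arithmetic, using the usual definitions and hypothetical function names:

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of cancer samples that the test flags."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Fraction of healthy samples that the test does not flag."""
    return true_negatives / (true_negatives + false_positives)


# Counts from the present embodiment: 6/6 cancer samples flagged, 0/5 healthy flagged.
sens = sensitivity(true_positives=6, false_negatives=0)
spec = specificity(true_negatives=5, false_positives=0)
```

Both values evaluate to 1.0, i.e. 100%, for this cohort of 11 samples.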

(4) Determining the tumor type

The tumor type of a suspected cancer is determined by searching the cancer patient genome copy number database TCGA (http://cancergenome.nih.gov/). The similarity between the test subject and the cancer patient database data is calculated as:

Similarity (sample, db_i)=pearson_cor (sample, db_i);

Here, sample is the test subject, db_i is entry i of the database, pearson_cor denotes the Pearson correlation coefficient, and Similarity is the similarity between the test subject and a database record.

Type (query)=Type (db_i): i=max_i(pearson_cor (query, db_i));
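The similarity search can be sketched as a nearest-neighbour lookup by Pearson correlation. This is an assumption-laden illustration: the profile vectors (here, four made-up per-chromosome values), the toy database and the function `top_hits` are hypothetical and do not reflect the actual TCGA data format or any TCGA API.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def pearson_cor(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def top_hits(
    query: Sequence[float],
    database: List[Tuple[str, Sequence[float]]],
    k: int = 1,
) -> List[str]:
    """Cancer types of the k database entries most correlated with the query."""
    ranked = sorted(
        database, key=lambda entry: pearson_cor(query, entry[1]), reverse=True
    )
    return [cancer_type for cancer_type, _ in ranked[:k]]


# Hypothetical database entries: (cancer type, copy number profile).
toy_db = [
    ("LUSC", [0.5, -0.4, 0.3, 0.0]),
    ("LIHC", [-0.3, 0.6, -0.2, 0.1]),
    ("LUAD", [0.1, 0.0, -0.5, 0.4]),
]
query = [0.45, -0.35, 0.25, 0.05]

best = top_hits(query, toy_db, k=1)
best_two = top_hits(query, toy_db, k=2)
```

With k = 1 this corresponds to Type(query) = Type(db_i) for the entry i with the largest correlation; a larger k returns a ranked hit list of the kind reported for the samples in this embodiment.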

Searching the TCGA cancer patient gene copy number database, the database hits for sample pulmonary carcinoma-005 are, in order, LUSC (lung squamous cell carcinoma), HNSC (head & neck squamous cell carcinoma), LUSC, LUSC, ESCC (esophageal squamous cell carcinoma), HNSC, LUSC, LUSC, LUSC and ESCC. Most of the hits are lung squamous cell carcinoma, so the cancer type is interpreted as lung squamous cell carcinoma (probability 50%), esophageal squamous cell carcinoma (probability 20%) and other squamous cell carcinoma (probability 20%).

The database hits for sample pulmonary carcinoma-004 are LUAD (lung adenocarcinoma), STAD (stomach adenocarcinoma), LUAD, CRC (colorectal adenocarcinoma), PRAD (pancreatic adenocarcinoma), STAD, LUAD, LUAD and CRC. Most of the database hits are lung adenocarcinoma, and the final interpretation is lung adenocarcinoma (probability 50%), intestinal cancer (probability 20%) and other adenocarcinoma (probability 30%).

The database hits for sample hepatocarcinoma-002 are LIHC (liver hepatocellular carcinoma), LIHC, LIHC, LIHC, LUAD, LIHC, LIHC, LIHC, LIHC, STAD and LIHC. Most hits are liver hepatocellular carcinoma, and the final interpretation is liver cancer (probability 80%), with other cancers at a probability of 20%.

The same decision rule applies to the interpretation of the other samples, pulmonary carcinoma-003, pulmonary carcinoma-002 and hepatocarcinoma 002, and is not repeated here.

Table 2. Detection results of the method of the present invention for various cancers

Claims

1. A cancer detection kit based on large-scale data mining, characterized in that the kit comprises: a DNA extraction reagent, a high-throughput sequencing library preparation reagent, gene sequence alignment software, and chromosome coverage calculation software; wherein the DNA extraction reagent is used to extract cell-free DNA from peripheral blood; a high-throughput sequencing library is prepared with the high-throughput sequencing library preparation reagent; the sequencing library is subjected to high-throughput sequencing; the sequencing data are aligned to the human reference genome with the gene sequence alignment software; data analysis is performed with the chromosome coverage calculation software to distinguish cancer patients from non-cancer patients; and further analysis determines the tumor type and the possible primary tumor organ.

2. The cancer detection kit based on large-scale data mining according to claim 1, characterized in that the data analysis to distinguish cancer patients from non-cancer patients is specifically performed by: calculating the sequencing coverage of each chromosome and dividing it by the total sequencing amount to obtain the relative coverage; in a healthy state, every human chromosome is present in two copies, and the copy number differences between chromosomes are very small; conversely, if the relative coverage exceeds a certain threshold, the subject is judged to be a suspected cancer patient; the relative coverage of each chromosome is calculated according to the following formula:

$$\mathrm{Score} = \frac{\sum_{i=0}^{\mathrm{Chr\_size}} \mathrm{Cov}_{i}}{\sum_{i=0}^{\mathrm{Genome\_size}} \mathrm{Cov}_{i}}$$

3. The cancer detection kit based on large-scale data mining according to claim 1 or 2, characterized in that the method for distinguishing cancer patients from non-cancer patients is: comparing the tumor genome copy number with the healthy reference copy number to obtain a z-score; if the z-score exceeds a certain threshold, namely z-score ≥ 3 or ≤ -3, or z-score ≥ 6 or ≤ -6, the sample is judged to be suspected cancer; otherwise it is judged to be a non-cancer sample;

$$z\text{-score} = \frac{\mathrm{Score}(\mathrm{sample}) - \mathrm{Average}(\mathrm{Score}(\mathrm{reference}))}{\mathrm{SD}(\mathrm{Score}(\mathrm{reference}))}$$

4. The cancer detection kit based on large-scale data mining according to claim 1, characterized in that the tumor type and the possible primary tumor organ are specifically determined by: searching a cancer reference database to determine the tumor type of the suspected cancer; the similarity between the test subject and the cancer patient database data is calculated as:

Similarity (sample, db_i)=pearson_cor (sample, db_i);

where sample is the test subject, db_i is entry i of the database, pearson_cor denotes the Pearson correlation coefficient, and Similarity is the similarity between the test subject and a database record; the tumor type of the test subject is judged to be the top 1-5 tumor types among the most similar database records:

Type (query)=Type (db_i): i=max_i(pearson_cor (query, db_i));

where db_i is entry i of the database, max denotes taking the maximum, Type denotes the cancer type sought, and max_i denotes the entry i with the largest Pearson correlation coefficient;

5. The cancer detection kit based on large-scale data mining according to any one of claims 1-4, characterized in that the cancer is a solid tumor or a hematological disease, including head and neck cancer, digestive tract cancer, brain cancer, lung cancer, reproductive system cancer, urinary system cancer, skin cancer, lymphoma and leukemia.

6. The cancer detection kit according to claim 5, characterized in that the head and neck cancer is oral cancer, nasopharyngeal carcinoma, tongue cancer or laryngeal cancer; the digestive tract cancer is esophageal cancer, gastric cancer, liver cancer, pancreatic cancer or intestinal cancer; the reproductive system cancer is breast cancer, cervical cancer, uterine cancer, prostate cancer or testicular cancer; and the urinary system cancer is bladder cancer or kidney cancer.

7. A cancer detection method based on large-scale data mining, characterized in that the method is a non-invasive tumor detection method based on high-throughput sequencing of peripheral blood cell-free DNA, using large-scale data mining and stable data calculation, and specifically comprises the following steps:

(3) performing high-throughput sequencing on the prepared library;

8. The cancer detection method based on large-scale data mining according to claim 7, characterized in that the data analysis in step (5) to distinguish cancer patients from non-cancer patients is specifically performed by: calculating the sequencing coverage of each chromosome and dividing it by the total sequencing amount to obtain the relative coverage; in a healthy state, every human chromosome is present in two copies, and the copy number differences between chromosomes are very small; conversely, if the relative coverage exceeds a certain threshold, the subject is judged to be a suspected cancer patient;

The relative coverage of each chromosome is calculated according to the following formula:

$$\mathrm{Score} = \frac{\sum_{i=0}^{\mathrm{Chr\_size}} \mathrm{Cov}_{i}}{\sum_{i=0}^{\mathrm{Genome\_size}} \mathrm{Cov}_{i}}$$

9. The cancer detection method based on large-scale data mining according to claim 7 or 8, characterized in that the method for distinguishing cancer patients from non-cancer patients is: comparing the tumor genome copy number with the healthy reference copy number to obtain a z-score:

$$z\text{-score} = \frac{\mathrm{Score}(\mathrm{sample}) - \mathrm{Average}(\mathrm{Score}(\mathrm{reference}))}{\mathrm{SD}(\mathrm{Score}(\mathrm{reference}))}$$

where Score(sample) is the chromosome relative coverage of the test sample, Score(reference) is the chromosome relative coverage of the healthy reference, Average denotes taking the mean, and SD denotes taking the standard deviation;

if the z-score exceeds a certain threshold, namely z-score ≥ 3 or ≤ -3, or z-score ≥ 6 or ≤ -6, the sample is judged to be suspected cancer; otherwise it is judged to be a non-cancer sample.

10. The cancer detection method based on large-scale data mining according to claim 7, characterized in that the tumor type and the possible primary tumor organ in step (6) are specifically determined by: searching a cancer reference database to determine the tumor type of the suspected cancer; the similarity between the test subject and the cancer patient database data is calculated as:

Similarity (sample, db_i)=pearson_cor (sample, db_i);

Type (query)=Type (db_i): i=max_i(pearson_cor (query, db_i));

11. The cancer detection method based on large-scale data mining according to any one of claims 7-10, characterized in that the cancer is a solid tumor or a hematological disease, including head and neck cancer, digestive tract cancer, brain cancer, lung cancer, reproductive system cancer, urinary system cancer, skin cancer, lymphoma and leukemia.

12. The cancer detection method based on large-scale data mining according to claim 11, characterized in that the head and neck cancer is oral cancer, nasopharyngeal carcinoma, tongue cancer or laryngeal cancer; the digestive tract cancer is esophageal cancer, gastric cancer, liver cancer, pancreatic cancer or intestinal cancer; the reproductive system cancer is breast cancer, cervical cancer, uterine cancer, prostate cancer or testicular cancer; and the urinary system cancer is bladder cancer or kidney cancer.

13. The cancer detection method based on large-scale data mining according to claim 7, characterized in that the high-throughput sequencing technology is selected from: Roche/454; Illumina sequencers, such as the NextSeq series, HiSeq series, MiSeq series, XTen and subsequent sequencer series; BGI sequencers, such as the BGI sequencer series, BGI500 and subsequent sequencers; LifeTech sequencers, such as Ion, Proton and subsequent sequencer series; PacBio sequencers, such as RSII, Sequel and subsequent sequencers; or nanopore-based sequencers, such as Genia, Nanopore and similar third-generation sequencers.

14. The cancer detection method based on large-scale data mining according to claim 10, characterized in that the cancer patient genome reference database is a whole-genome copy number database based on chip technology or a whole-genome sequencing database; wherein the whole-genome sequencing database is derived from tissue sequencing data or peripheral blood cell-free DNA sequencing data.