CN105653898A - Cancer detection kit based on large-scale data mining and detection method - Google Patents
- Publication number: CN105653898A
- Application number: CN201610018232.9A
- Authority: CN (China)
- Prior art keywords: cancer, carcinoma, data, chromosome, genome
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
Abstract
The invention discloses a cancer detection kit and a detection method based on large-scale data mining, and belongs to the technical field of biomedical detection. The kit comprises a DNA extraction reagent, a high-throughput sequencing library preparation reagent, gene sequence alignment software and chromosome coverage calculation software. In the method, peripheral blood is first collected from a subject and the plasma is separated; DNA polymerase is used for amplification and a sequencing library is prepared; the prepared library is sequenced at large scale; the sequencing reads are aligned, their distribution across the genome is tallied, and it is determined whether genomic sequences of cancer cells from tumor tissue are present, and hence whether the subject has cancer. The kit and method overcome the poor specificity and sensitivity of existing cancer screening, involve no radiation or injury, and can detect cancer using only 4-10 mL of peripheral blood.
Description
Technical field
The present invention relates to a cancer detection kit and detection method based on large-scale data mining, specifically a non-invasive cancer detection kit and detection method based on mining high-throughput sequencing data, and belongs to the field of biomedical detection technology.
Technical background
Malignant tumors are often associated with variation in chromosome number and structure. A cell normally contains two kinds of chromosomes: some carry genes that promote malignant tumors, while others carry genes that suppress them. If a chromosome is altered, the genes on the chromosomes can fall out of balance and a tumor can arise.
Current cancer detection relies on blood biochemical markers and imaging. Biochemical markers include CEA, AFP and PSA. Their main drawbacks are: (1) they cover a limited range of cancer types, and most cancers cannot be detected effectively; (2) their specificity is poor: some underlying conditions cause misjudgment, for example hepatitis can raise AFP and prostatitis can raise PSA; (3) their half-life in blood is long: AFP protein, for example, can persist in blood for 7 to 28 days or longer, which makes it hard to track the effect of surgery, radiotherapy or drug therapy in real time. Imaging techniques include B-mode ultrasound, CT and PET. Their main drawbacks are: (1) most of these examinations carry some radiation hazard, so they cannot be used too frequently and cannot be used for real-time detection of cancer; (2) they are complicated to operate, which makes large-scale population tumor screening difficult.
Another monitoring approach is liquid biopsy, which in essence uses high-throughput sequencing to measure ctDNA fragments in tumor patients; the tumor gene information carried by the ctDNA can then reflect the characteristics of the tumor comprehensively. Being accurate, sensitive, non-invasive and high-throughput, liquid biopsy can help doctors and patients with tumor diagnosis, treatment reference, medication guidance, disease monitoring and recurrence warning. In recent years, a growing number of biomedical researchers have found that NGS sequencing of human peripheral blood DNA can be used to detect whether cancer is present and to predict cancer risk. Whole-genome sequencing of ctDNA makes it possible to calculate changes in chromosomal copy number and thereby judge whether a tumor is present.
A method based on large-scale high-throughput sequencing of this kind is not limited to sequencing a few known sites; it detects a large number of known and unknown fragments in the patient's cancer cell DNA simultaneously. With such a large data volume, computing and interpreting the data accurately to produce useful information becomes the key to tumor liquid biopsy.
Current domestic methods include techniques based on target-site capture and DNA methylation detection based on the cancer genome. Target-site capture techniques generally suffer from low sequencing coverage, which leads to insufficient sensitivity. Methylation-based detection has insufficient specificity in early cancer because the ctDNA content is too low. This directly affects and limits the application of liquid biopsy at the clinical, commercial and research levels.
This technology develops a stable data calculation method grounded in cancer biology and makes innovative advances in data interpretation and clinical application. Building on this international frontier research and focusing on the Chinese population, the new technique disclosed in this invention aims to apply this cutting-edge technology to cancer detection and risk indication in the Chinese population. Because blood DNA testing is non-invasive and radiation-free and its cost is controllable, it should help raise the early cancer detection rate in China and improve overall population health.
Summary of the invention
To overcome the insufficient specificity and sensitivity of existing tumor screening techniques, the present invention provides a non-invasive tumor detection kit and detection method based on high-throughput sequencing of peripheral blood cell-free DNA. The detection method based on this kit involves no radiation and no injury, and needs only 4-10 mL of peripheral blood to detect cancer. It is suitable for early diagnosis of head and neck cancer (oral cancer, nasopharyngeal carcinoma, tongue cancer, laryngeal carcinoma), digestive tract cancer (esophageal cancer, gastric cancer, liver cancer, pancreatic cancer, intestinal cancer), brain cancer, lung cancer, reproductive system cancer (breast cancer, cervical cancer, uterine cancer, prostate cancer, testicular cancer, etc.), urinary system cancer (bladder cancer, kidney cancer), skin cancer, lymphatic cancer and leukemia.
The present invention provides a cancer detection kit based on large-scale data mining. The kit comprises a DNA extraction reagent, a high-throughput sequencing library preparation reagent, gene sequence alignment software and chromosome coverage calculation software.
The DNA extraction reagent is used to extract cell-free DNA from peripheral blood, and the high-throughput sequencing library preparation reagent is used to prepare a high-throughput sequencing library. After the library is sequenced, the gene sequence alignment software aligns the sequencing data to the human reference genome, the chromosome coverage calculation software analyzes the data to distinguish cancer from non-cancer patients, and further analysis determines the tumor type and the likely primary tumor organ.
Distinguishing cancer from non-cancer patients by data analysis works as follows: the sequencing coverage of each chromosome is calculated and divided by the total sequencing amount to give the relative coverage. In a healthy person every chromosome is present in two copies, so the copy number differences between chromosomes are very small; conversely, if the relative coverage exceeds a certain threshold, the subject is judged to be a suspected cancer patient.
The relative coverage of each chromosome is calculated according to the following formula:
In the formula, Chr_size denotes the length of the chromosome and Genome_size denotes the total length of the human genome, i.e. the combined length of all chromosomes; Cov_i denotes the number of times position i of the genome or chromosome is sequenced.
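The formula itself does not appear in this text. The following is a minimal sketch of one reading that is consistent with the symbol definitions above: the mean per-position depth of a chromosome (the sum of Cov_i over the chromosome divided by Chr_size) divided by the mean per-position depth of the whole genome (the sum of Cov_i over all chromosomes divided by Genome_size). This reading is an assumption rather than the patent's confirmed formula, and the function name `relative_coverage` and the toy depths are hypothetical.

```python
def relative_coverage(depth_by_chrom):
    """Return {chromosome: relative coverage}.

    depth_by_chrom maps a chromosome name to a non-empty list of
    per-position sequencing depths (Cov_i). ASSUMED reading of the formula:
        (sum(Cov_i over chromosome) / Chr_size)
          / (sum(Cov_i over genome) / Genome_size)
    """
    genome_size = sum(len(d) for d in depth_by_chrom.values())
    genome_total = sum(sum(d) for d in depth_by_chrom.values())
    if genome_size == 0 or genome_total == 0:
        raise ValueError("no coverage data")
    genome_mean = genome_total / genome_size
    return {
        chrom: (sum(depths) / len(depths)) / genome_mean
        for chrom, depths in depth_by_chrom.items()
    }


# Toy example: three tiny "chromosomes"; chrC is sequenced more deeply.
toy = {
    "chrA": [10, 10, 10, 10],
    "chrB": [10, 10],
    "chrC": [15, 15],
}
for chrom, value in relative_coverage(toy).items():
    print(chrom, round(value, 3))
```

With this normalization a uniformly covered genome gives a relative coverage of 1 for every chromosome, so deviations from 1 are what the later threshold test looks at.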
Cancer and non-cancer patients are distinguished by comparing the genome copy number of the test sample with the reference copy number of healthy people to obtain a z-score:
z-score = (score(sample) - average(score(reference))) / SD(score(reference))
In the formula, score(sample) denotes the chromosome relative coverage of the test sample, score(reference) denotes the chromosome relative coverage of the healthy reference, average denotes taking the mean, and SD denotes taking the standard deviation.
A sample whose z-score exceeds a certain threshold (+3/-3 or +6/-6) is judged to be suspected cancer; otherwise it is judged to be a non-cancer sample.
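The z-score test can be sketched as below, assuming the usual definition z = (score(sample) - average(score(reference))) / SD(score(reference)). The function names are hypothetical, the use of the sample standard deviation (n - 1 denominator) is an assumption, and the reference values are made-up numbers for illustration only.

```python
from statistics import mean, stdev


def z_score(sample_score, reference_scores):
    """z = (sample - mean(reference)) / SD(reference)."""
    if len(reference_scores) < 2:
        raise ValueError("need at least two reference values")
    sd = stdev(reference_scores)  # sample SD (n - 1); an assumption
    if sd == 0:
        raise ValueError("reference values have zero spread")
    return (sample_score - mean(reference_scores)) / sd


def is_suspected_cancer(sample_score, reference_scores, threshold=3.0):
    """Flag the sample when |z| reaches the threshold (3 or 6 in the text)."""
    return abs(z_score(sample_score, reference_scores)) >= threshold


# Made-up relative coverages of one chromosome in five healthy references.
healthy = [0.98, 1.00, 1.02, 1.00, 1.00]
print(is_suspected_cancer(1.05, healthy))               # looser threshold of 3
print(is_suspected_cancer(1.05, healthy, threshold=6))  # stricter threshold of 6
```

The stricter threshold trades sensitivity for fewer false positives; which of the two values to use is left open by the text.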
The tumor type and the likely primary tumor organ are determined as follows: a database of cancer patient genome copy numbers is searched to determine the tumor type of the suspected cancer. The similarity between the test subject and a record in the cancer patient database is calculated as:
Similarity(sample, db_i) = pearson_cor(sample, db_i)
In the formula, sample denotes the test subject, db_i denotes data item i of the database, pearson_cor denotes computing the Pearson coefficient, and Similarity denotes the similarity between the test subject and the database record. The similarity may be computed with the Pearson correlation coefficient, the Spearman rank correlation coefficient, or another statistical correlation measure.
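A minimal sketch of the similarity search, assuming each profile is a list of per-chromosome relative coverages in a fixed chromosome order. The database layout (`record_id -> (tumor_type, profile)`), the record names and the numbers are hypothetical; only the Pearson coefficient itself comes from the text.

```python
from math import sqrt


def pearson_cor(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sum(a * b for a, b in zip(dx, dy)) / denom


def rank_by_similarity(sample, database):
    """Return (similarity, record_id, tumor_type), most similar first."""
    scored = [
        (pearson_cor(sample, profile), rec_id, tumor_type)
        for rec_id, (tumor_type, profile) in database.items()
    ]
    return sorted(scored, reverse=True)


# Hypothetical three-chromosome profiles.
db = {
    "rec1": ("lung", [1.0, 1.1, 0.9]),
    "rec2": ("colon", [0.9, 1.1, 1.0]),
}
best = rank_by_similarity([1.0, 1.2, 0.8], db)[0]
print(best[1], best[2])
```

Pearson correlation compares the shape of the two profiles and ignores overall scale, which suits relative coverages that are already normalized.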
The tumor type of the test subject is judged to be the 1-5 most similar tumor types among the database records:
Type(query) = Type(db_i), i = max_i(pearson_cor(query, db_i))
In the formula, db_i denotes data item i of the database, max denotes taking the maximum, Type denotes the cancer type sought, and max_i denotes the entry i with the largest Pearson correlation coefficient.
The specific algorithm is: select the top 1-10000 records from the database search results, count how often each tumor type occurs among these records, and take the 1-5 most frequent tumor types as the final interpretation.
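The voting step can be sketched as follows. The function name and the default values `top_records=100` and `top_types=3` are assumptions chosen for illustration; the text only fixes the allowed ranges (1-10000 records, 1-5 tumor types).

```python
from collections import Counter


def vote_tumor_types(ranked_types, top_records=100, top_types=3):
    """Majority vote over the best-matching database records.

    ranked_types: tumor-type labels of database records, ordered from most
        to least similar to the test sample.
    top_records: how many leading records to keep (the text allows 1-10000).
    top_types: how many tumor types to report (the text allows 1-5).
    """
    if not 1 <= top_records <= 10000:
        raise ValueError("top_records must be between 1 and 10000")
    if not 1 <= top_types <= 5:
        raise ValueError("top_types must be between 1 and 5")
    counts = Counter(ranked_types[:top_records])
    return counts.most_common(top_types)


# Hypothetical search result, most similar record first.
hits = ["lung", "lung", "colon", "lung", "gastric", "colon"]
print(vote_tumor_types(hits, top_records=5, top_types=2))
```

Voting over many records rather than taking only the single best match makes the call less sensitive to one unusual database entry.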
The cancer patient genome reference database used is a whole-genome copy number database based on chip technology (Affymetrix SNP array, Agilent copy number chip) or a whole-genome sequencing (WGS) database. The whole-genome sequencing database is derived from tissue sequencing (for example 1000 Genomes data and TCGA whole-genome data) or from peripheral blood cell-free DNA sequencing data. Examples are the TCGA cancer genome database (http://cancergenome.nih.gov/) and the GenomeSpace database (http://www.genomespace.org/).
The cancer is a solid tumor or a hematological disease, including head and neck cancer, digestive tract cancer, brain cancer, lung cancer, reproductive system cancer, urinary system cancer, skin cancer, lymphatic cancer and leukemia. The head and neck cancer is oral cancer, nasopharyngeal carcinoma, tongue cancer or laryngeal carcinoma; the digestive tract cancer is esophageal cancer, gastric cancer, liver cancer, pancreatic cancer or intestinal cancer; the reproductive system cancer is breast cancer, cervical cancer, uterine cancer, prostate cancer or testicular cancer; the urinary system cancer is bladder cancer or kidney cancer.
The high-throughput sequencing technology is selected from Roche/454, Illumina sequencers (NextSeq series, HiSeq series, MiSeq series, XTen and subsequent sequencer series), BGI sequencers (BGI500 series and subsequent sequencers), LifeTech sequencers (Ion, Proton and subsequent sequencer series), PacBio sequencers (RSII, Sequel and subsequent sequencers) and nanopore-based sequencers (Genia, Nanopore and similar third-generation sequencers).
The present invention also provides a cancer detection technique based on large-scale data mining. The technical scheme mainly comprises the following steps:
(1) collect peripheral blood from the subject, separate the plasma, and extract cell-free DNA with the DNA extraction reagent;
(2) use the high-throughput sequencing library preparation reagent to perform DNA polymerase amplification and build a sequencing library;
(3) sequence the prepared library by high-throughput sequencing;
(4) use the gene sequence alignment software to align the sequencing data to the human reference genome;
(5) analyze the data with the chromosome coverage calculation software to distinguish cancer from non-cancer patients;
(6) determine the tumor type and the likely primary tumor organ by further data analysis.
In step (5), distinguishing cancer from non-cancer patients by data analysis works as follows: the sequencing coverage of each chromosome is calculated and divided by the total sequencing amount to give the relative coverage. In a healthy person every chromosome is present in two copies, so the copy number differences between chromosomes are very small; conversely, if the relative coverage exceeds a certain threshold, the subject is judged to be a suspected cancer patient.
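To see why a copy number change in the tumor shifts the relative coverage, consider a hedged back-of-the-envelope calculation. The symbols f (the fraction of plasma cell-free DNA that comes from tumor cells) and c (the copy number of one chromosome in the tumor) are assumptions introduced here for illustration and do not appear in the source; the calculation also ignores the small change that the altered chromosome makes to the genome-wide average.

```latex
\[
\text{average copy number} = 2(1-f) + c\,f = 2 + (c-2)\,f,
\qquad
\frac{\text{average copy number}}{2} = 1 + \frac{(c-2)\,f}{2}.
\]
\[
\text{Example: } c = 3,\; f = 0.10
\;\Rightarrow\;
1 + \frac{(3-2)\times 0.10}{2} = 1.05 .
\]
```

Under these assumptions a single-copy gain carried by 10% of the cell-free DNA raises that chromosome's relative coverage by about 5%, so the comparison against healthy references has to resolve small shifts.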
The relative coverage of each chromosome is calculated according to the following formula:
In the formula, Chr_size denotes the length of the chromosome and Genome_size denotes the total length of the human genome, i.e. the combined length of all chromosomes. Cov_i denotes the number of times position i of the genome or chromosome is sequenced.
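In practice, aligned reads are often summarized as one read count per chromosome rather than a depth at every position. The sketch below works under that assumption: with reads of a fixed length, the sum of Cov_i over a chromosome is approximately the read count times the read length, so the mean depth is reads x read_length / Chr_size. Dividing by the genome-wide mean depth to obtain a relative coverage is an assumed reading of the formula, which is not reproduced in this text. The function names, the fixed read length and the placeholder chromosome sizes are all hypothetical.

```python
def mean_depth_from_counts(read_counts, chrom_sizes, read_length):
    """Approximate mean depth per chromosome from aligned read counts.

    Assumes every read has the same length and lies wholly on one
    chromosome, so sum(Cov_i) ~= reads * read_length.
    """
    depths = {}
    for chrom, size in chrom_sizes.items():
        if size <= 0:
            raise ValueError(f"non-positive length for {chrom}")
        depths[chrom] = read_counts.get(chrom, 0) * read_length / size
    return depths


def relative_from_counts(read_counts, chrom_sizes, read_length):
    """Mean chromosome depth divided by the genome-wide mean depth."""
    depths = mean_depth_from_counts(read_counts, chrom_sizes, read_length)
    genome_size = sum(chrom_sizes.values())
    total_bases = sum(read_counts.get(c, 0) for c in chrom_sizes) * read_length
    if total_bases == 0:
        raise ValueError("no aligned reads")
    genome_mean = total_bases / genome_size
    return {chrom: d / genome_mean for chrom, d in depths.items()}


# Placeholder sizes and counts, not real chromosome lengths.
sizes = {"chrA": 1000, "chrB": 500}
counts = {"chrA": 20, "chrB": 15}
print(relative_from_counts(counts, sizes, read_length=50))
```

Note that the read length cancels in the ratio, so under these assumptions the relative coverage depends only on read counts and chromosome lengths.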
In step (5), cancer and non-cancer patients are distinguished by comparing the genome copy number of the test sample with the reference copy number of healthy people to obtain a z-score:
z-score = (score(sample) - average(score(reference))) / SD(score(reference))
In the formula, score(sample) denotes the chromosome relative coverage of the test sample, score(reference) denotes the chromosome relative coverage of the healthy reference, average denotes taking the mean, and SD denotes taking the standard deviation.
A sample is judged to be suspected cancer when its z-score exceeds a certain threshold, namely z-score ≥ 3 or ≤ -3, or z-score ≥ 6 or ≤ -6; otherwise it is judged to be a non-cancer sample.
In step (6), the tumor type and the likely primary tumor organ are determined as follows: a database of cancer patient genome copy numbers is searched to determine the tumor type of the suspected cancer. The similarity between the test subject and a record in the cancer patient database is calculated as:
Similarity(sample, db_i) = pearson_cor(sample, db_i)
In the formula, sample denotes the test subject, db_i denotes data item i of the database, pearson_cor denotes computing the Pearson coefficient, and Similarity denotes the similarity between the test subject and the database record. The similarity may be computed with the Pearson correlation coefficient, the Spearman rank correlation coefficient, or another statistical correlation measure.
The tumor type of the test subject is judged to be the 1-5 most similar tumor types among the database records:
Type(query) = Type(db_i), i = max_i(pearson_cor(query, db_i))
In the formula, db_i denotes data item i of the database, max denotes taking the maximum, Type denotes the cancer type sought, and max_i denotes the entry i with the largest Pearson correlation coefficient.
The specific algorithm is: select the top 1-10000 records from the database search results, count how often each tumor type occurs among these records, and take the 1-5 most frequent tumor types as the final interpretation.
Wherein, the cancer patient's genome reference database used is based on full-length genome copy number data base or genome sequencing (WGS) data base of chip technology (affymetrixSNParray, Agilent copy number chip); Wherein sequencing data of whole genome storehouse is for coming from tissue order-checking (such as 1000genome data and TCGA full-length genome data) or peripheral blood dissociative DNA sequencing data; Concrete such as TCGA cancer gene group data base (http://cancergenome.nih.gov/) or GenomeSpace data base (http://www.genomespace.org/);
Wherein, the cancer is a solid tumor or a hematological disease, including head and neck cancer, digestive tract cancer, brain cancer, lung cancer, reproductive system cancer, urinary system cancer, skin cancer, lymphoma, and leukemia; the head and neck cancer is oral cancer, nasopharyngeal carcinoma, tongue cancer, or laryngeal cancer; the digestive tract cancer is esophageal cancer, gastric cancer, liver cancer, pancreatic cancer, or intestinal cancer; the reproductive system cancer is breast cancer, cervical cancer, uterine cancer, prostate cancer, or testicular cancer; the urinary system cancer is bladder cancer or kidney cancer.
Wherein, the high-throughput sequencing technology in step (3) is selected from Roche/454; Illumina sequencers (NextSeq series, HiSeq series, MiSeq series, X Ten, and subsequent sequencer series); BGI sequencers (BGI500 series and subsequent sequencers); Life Technologies sequencers (Ion, Proton, and subsequent series); PacBio sequencers (RS II, Sequel, and subsequent sequencers); or Nanopore-based sequencers (Genia, Nanopore, and similar third-generation sequencers).
The beneficial effects of the present invention are:
(1) Compared with similar methods, the present invention is the first to use whole-genome sequencing of peripheral blood ctDNA to detect chromosomal copy number, an original improvement that raises the safety and accuracy of tumor detection.
(2) Compared with biochemical markers, the present invention covers more tumor types and more tumor patients.
(3) Compared with imaging (PET, CT), the present invention involves no radiation hazard, can be repeated frequently, and can be used for real-time tracking of tumors.
(4) Compared with DNA methylation detection, chromosomal copy number changes have higher tumor specificity, which can greatly reduce the false positive rate in clinical practice.
(5) Compared with DNA methylation detection, the present invention is simpler to operate and performs more stably across clinical laboratories and hospital environments.
(6) Compared with targeted site detection, the present invention uses whole-genome sequencing, which greatly increases the coverage of tumors.
(7) In the present invention, the ctDNA large-scale sequencing and data processing steps are non-invasive, fast, and suited to broad-spectrum screening; applying these features can markedly improve tumor diagnosis and screening and improve the health of many people, giving the invention positive clinical value in liquid biopsy.
(8) The present invention can be applied to cancer screening and diagnosis, assessment of the effect of cancer treatment (after surgery or medication), and alerts for tumor recurrence, and is suitable for research, development, or application work in hospitals, universities, and enterprises in related fields.
(9) Compared with other blood tests, the present invention can further identify the tumor type and the likely organ of origin, which further helps confirm the tumor diagnosis.
Brief description of the drawings
Fig. 1 shows the operating procedure of the method of the present invention;
Fig. 2 shows the implementation of the algorithm of the present invention;
Fig. 3 shows the comparison of peripheral blood scores of tumor patients and healthy people.
Detailed description of the invention
For a better understanding of the present invention, it is further explained below with reference to specific embodiments and the drawings.
Embodiment 1: collecting peripheral blood and extracting cell-free DNA with a DNA extraction reagent
1. Collect 4 mL of the subject's peripheral blood into an EDTA anticoagulant tube. All subjects participated voluntarily, and the cancer specimens came from the Affiliated Hospital of Jiangsu University.
2. Within 4 hours of collection, centrifuge the fresh blood at 1600 g for 10 minutes and transfer the supernatant to a 1.5 mL EP tube, keeping the pipette tip away from the intermediate layer and the red blood cells at the bottom.
3. Centrifuge at 16,000 g for 10 minutes, transfer the supernatant to a new 1.5 mL EP tube, and store frozen at -80 °C.
4. Cell-free DNA extraction: use a standard Qiagen cell-free DNA extraction kit (QIAGEN, QIAamp DNA Blood Mini Kit, 55114) according to the manufacturer's instructions; every 4 mL of peripheral blood yields 1-50 ng of DNA.
The specific steps are as follows:
(1) Thaw 1 tube of plasma on ice, then add 100 µL of QIAGEN Proteinase K (EC 3.4.21.64).
(2) Add 0.8 mL of Buffer ACL (with 1.0 µg of carrier RNA added in advance), close the cap, and vortex for 30 s until the liquid in the tube is homogeneous.
(3) Incubate at 60 °C for 15-20 min.
(4) Add 1.8 mL of Buffer ACB and vortex for 15-30 s to mix; place on ice for 5 min.
(5) Insert the QIAamp Mini column into the Vac adapter placed on the QIAvac 24 Plus, and insert a 20 mL tube extender into the QIAamp Mini column.
(6) Carefully add the lysate mixture from step (4) to the tube extender of the QIAamp Mini column and switch on the vacuum pump. When all of the lysate has been drawn through the column, switch off the vacuum pump, release the pressure to 0 mbar, and carefully remove and discard the tube extender.
(7) Add 600 µL of Buffer ACW1 to the column. Leave the lid open and switch on the vacuum pump; after Buffer ACW1 has passed completely through the QIAamp Mini column, switch off the pump and release the pressure to 0 mbar.
(8) Add 750 µL of Buffer ACW2 to the QIAamp Mini column. Leave the lid open and switch on the vacuum pump; after Buffer ACW2 has passed completely through the column, switch off the pump and release the pressure to 0 mbar.
(9) Add 750 µL of ethanol (96-100%) to the QIAamp Mini column. Leave the lid open and switch on the vacuum pump; after all of the ethanol has passed through the column, switch off the pump and release the pressure to 0 mbar.
(10) Close the lid, remove the QIAamp Mini column from the vacuum manifold, and discard the Vac adapter. Place the column in a new 2 mL collection tube and centrifuge at full speed (20,000 x g; 14,000 rpm) for 3 min.
(11) Place the QIAamp Mini column in a new 2 mL collection tube, open the lid, and incubate at 56 °C for 10 min.
(12) Place the QIAamp Mini column in a new 1.5 mL elution tube and discard the collection tube from the previous step. Carefully apply 20-150 µL of Buffer AVE to the center of the membrane, close the lid, and incubate at room temperature for 3 min.
(13) Centrifuge at full speed (20,000 x g; 14,000 rpm) for 1 min to elute the nucleic acid, collecting the plasma cell-free double-stranded DNA.
Embodiment 2: amplifying the DNA with DNA polymerase and building a sequencing library using high-throughput sequencing library preparation reagents
A standard KAPA next-generation sequencing library kit (KAPA HyperPlus library preparation kit, 28100) is used to build a library from the cell-free DNA obtained above, which is then sequenced on a high-throughput sequencer from Illumina.
In the present embodiment, the specific steps are as follows:
(1) End-repair reaction:
Prepare the following reaction mixture:
Incubate the reaction mixture at 20 °C for 30 min, then purify it immediately (using the Agencourt AMPure XP Beads kit, catalog no. A63882) to obtain the end-repair product.
Make sure the AMPure XP Beads have equilibrated to room temperature. Add AMPure XP Beads to the end-repair reaction to prepare the following mixture:
End-repair product 100 µL
AMPure XP Beads 160 µL
Pipette the mixture up and down at least 10 times and vortex. Leave at room temperature for 15 min so that the DNA binds to the beads. Place the tube on the matching magnetic rack for 15 min, or until the liquid is completely clear. Carefully remove 255 µL of liquid without touching any beads; some liquid may remain in the tube. Keeping the tube on the magnet and without disturbing the beads, wash with 200 µL of 80% ethanol for at least 30 s. Carefully remove the ethanol without touching the beads, and repeat the 80% ethanol wash twice. Remove the tube from the magnet and dry the beads at room temperature for 15 min. Elute with 32.5 µL of elution buffer, leaving the tube at room temperature for 2 min to release the DNA. Place the tube on the magnet for 15 min, or until the liquid is completely clear. Transfer 30 µL of the DNA supernatant to the tube for the A-tailing reaction.
(2) A-tailing reaction:
Prepare the following reaction mixture:
Incubate at 30 °C for 30 min, then purify immediately (using the Agencourt AMPure XP Beads kit):
Make sure the AMPure XP Beads have equilibrated to room temperature, then add the AMPure XP Beads:
A-tailing product 50 µL
AMPure XP Beads 90 µL
Total 140 µL
Pipette up and down at least 10 times and vortex. Leave at room temperature for 15 min so that the DNA binds to the beads. Place the tube on the matching magnetic rack for 15 min, or until the liquid is completely clear. Carefully remove 135 µL of liquid without touching any beads; some liquid may remain in the tube. Keeping the tube on the magnet and without disturbing the beads, wash with 200 µL of 80% ethanol for at least 30 s. Carefully remove the ethanol without touching the beads, and repeat the 80% ethanol wash twice. Remove the tube from the magnet and dry the beads at room temperature for 15 min. Add 32.5 µL of elution buffer and leave at room temperature for 2 min to release the DNA. Place the tube on the magnet for 15 min, or until the liquid is completely clear. Transfer 30 µL of the DNA supernatant to the tube for the adapter ligation reaction.
MinElute column purification:
A-tailing product 50 µL
ERC buffer 300 µL
Apply the mixture to the column, centrifuge, and discard the flow-through. Wash with 750 µL of PE buffer and discard the flow-through. Centrifuge at 10,000 g for 2 min to remove the ethanol. For elution, transfer the column to a clean sterile microcentrifuge tube and add 31 µL of EB buffer. Leave at room temperature for 1 min, centrifuge, and recover about 30 µL of liquid.
(3) Adapter ligation
For the adapter ligation reaction, prepare the following reaction mixture:
Incubate at 20 °C for 15 min, then purify immediately:
First step: AMPure XP Beads purification:
Make sure the AMPure XP Beads have equilibrated to room temperature, then add the AMPure XP Beads:
Adapter ligation product 50 µL
AMPure XP Beads 50 µL
Total 100 µL
Pipette up and down at least 10 times and vortex. Leave at room temperature for 15 min so that the DNA binds to the beads. Place the tube on the matching magnetic rack for 15 min, or until the liquid is completely clear. Carefully remove 95 µL of liquid without touching any beads; some liquid may remain in the tube. Keeping the tube on the magnet and without disturbing the beads, wash with 200 µL of 80% ethanol for at least 30 s. Carefully remove the ethanol without touching the beads, and repeat the 80% ethanol wash twice. Remove the tube from the magnet and dry the beads at room temperature for 15 min. Add 52.5 µL of elution buffer and leave at room temperature for 2 min to release the DNA. Place the tube on the magnet for 15 min, or until the liquid is completely clear. Transfer 50 µL of the DNA supernatant to the tube for the second AMPure XP Beads purification step.
Second step: AMPure XP Beads purification
Make sure the AMPure XP Beads have equilibrated to room temperature, then add the AMPure XP Beads:
Library DNA from the first purification step 50 µL
AMPure XP Beads 50 µL
Total 100 µL
Pipette up and down at least 10 times and vortex. Leave at room temperature for 15 min so that the DNA binds to the beads. Place the tube on the matching magnetic rack for 15 min, or until the liquid is completely clear. Carefully remove 95 µL of liquid without touching any beads; some liquid may remain in the tube. Keeping the tube on the magnet and without disturbing the beads, wash with 200 µL of 80% ethanol for at least 30 s. Carefully remove the ethanol without touching the beads, and repeat the 80% ethanol wash twice. Remove the tube from the magnet and dry the beads at room temperature for 15 min. Add 52.5 µL of elution buffer and leave at room temperature for 2 min to release the DNA. Place the tube on the magnet for 15 min, or until the liquid is completely clear. Transfer 50 µL of the DNA supernatant to the tube used to store the library DNA.
(4) Fragment size selection
Unligated adapter molecules are removed before library amplification to prevent the formation of adapter dimers and other short adapter-derived molecules, which would interfere with downstream amplification and sequencing. In the present embodiment, fragment size selection is done manually by agarose gel electrophoresis, gel excision, and purification.
In the present embodiment, electrophoresis is carried out in a 2% agarose gel, and the band at 280 to 320 bp is excised and recovered (the ctDNA is 150-180 bp and the adapters are 120 bp, so after library construction the fragment plus adapters is taken as 280-320 bp); the target fragment is recovered with the QIAquick Gel Extraction Kit (QIAGEN, 28706).
(5) Library enrichment/amplification
A. Preparation:
Completely thaw, briefly vortex, and centrifuge the KAPA HiFi HotStart ReadyMix (2×) and the PCR primers.
Completely thaw, briefly vortex, and centrifuge the adapter-ligated, size-selected, and purified library DNA.
Set up the specific cycling program on the thermal cycler in advance.
B. Reaction system:
Prepare the PCR system; the reaction system is:
Change the pipette tip after each aspiration. Close the tube, mix gently, and centrifuge briefly.
C. Cycling parameters: 98 °C for 2 min; 17 cycles of 98 °C for 30 s, 65 °C for 30 s, and 72 °C for 1 min; 72 °C for 5 min.
D. PCR purification:
Purify with AMPure XP Beads.
E. Library validation:
Verify the size of the PCR-enriched fragments by agarose gel electrophoresis and on a Bioanalyzer (Agilent 2100, Agilent Technologies Shanghai branch), and check the fragment size distribution. The results show a library main peak at about 300 bp with a single peak shape, no stray peaks, and no adapter or primer dimers, so the insert size is judged to be acceptable.
The constructed library is quantified by real-time PCR; a library concentration above 1 ng/µL is judged to meet the requirement for loading on the sequencer.
Embodiment 3: high-throughput sequencing, large-scale data analysis, and tumor detection
(1) Sequence alignment
The sequencing data obtained from the Illumina sequencer (read lengths of 50, 75, 100, 150, etc.) are processed by removing sequencing adapters, removing sample labels (barcodes), and trimming low-quality regions to obtain valid data.
The reads are aligned to the standard human reference genome hg19 using the software bwa-mem (http://bio-bwa.sourceforge.net/), and the alignments are saved in a .bam file.
(2) Computing chromosome coverage
The software samtools (samtools.sourceforge.net) is used to convert the .bam file into an .mpileup file, which gives the sequencing coverage at each site of the genome. From the per-site coverage, the mean over each chromosome is calculated. That is, the algorithm computes the sequencing coverage of each chromosome (the average number of times a single base on the chromosome is sequenced) and divides it by the total sequencing amount (the average number of times a site is sequenced across all sites of the genome) to obtain the relative coverage. The calculation is as follows:
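The formula itself is not reproduced in this text. Based on the verbal description above and the symbol definitions that follow, a plausible reconstruction (an assumption, not necessarily the original expression) is the ratio of the chromosome's mean per-base coverage to the genome-wide mean per-base coverage:

```latex
\text{Relative\_coverage}(chr) =
  \frac{\dfrac{1}{\text{Chr\_size}} \sum_{i \in chr} Cov_i}
       {\dfrac{1}{\text{Genome\_size}} \sum_{i \in genome} Cov_i}
```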
where Chr_size denotes the length of the chromosome, Genome_size denotes the total length of the human genome (the sum of the lengths of all chromosomes), and Cov_i denotes the number of times position i of the genome or chromosome is sequenced.
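As an illustration of this step, the sketch below computes a relative coverage per chromosome from pileup-style text lines. It assumes the usual tab-separated pileup layout in which the first column is the chromosome name and the fourth column is the read depth at that position; the function name and the `chr_sizes` input are hypothetical, and chromosome lengths are passed in explicitly because positions with no reads may be absent from the file.

```python
from collections import defaultdict


def relative_coverage(mpileup_lines, chr_sizes):
    """Relative coverage of each chromosome.

    mpileup_lines -- iterable of tab-separated pileup lines
                     (column 1: chromosome, column 4: depth)
    chr_sizes     -- dict mapping chromosome name to its length (Chr_size)
    Returns a dict: chromosome -> mean depth on the chromosome divided by
    the mean depth over all chromosomes listed in chr_sizes.
    """
    depth_sum = defaultdict(int)
    for line in mpileup_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip malformed or empty lines
        chrom, depth = fields[0], int(fields[3])
        if chrom in chr_sizes:
            depth_sum[chrom] += depth  # sum of Cov_i over the chromosome

    genome_size = sum(chr_sizes.values())  # Genome_size
    genome_mean = sum(depth_sum.values()) / genome_size
    if genome_mean == 0:
        raise ValueError("no coverage found on the listed chromosomes")

    return {
        chrom: (depth_sum[chrom] / size) / genome_mean
        for chrom, size in chr_sizes.items()
    }
```

A value close to 1 indicates that the chromosome is covered about as deeply as the genome as a whole; values above or below 1 suggest relative gain or loss.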
(3) Determining whether cancer is present
Under healthy conditions each human chromosome is present in two copies, and the copy number difference between chromosomes is very small. Conversely, if the relative coverage exceeds a certain threshold, the subject is judged to be a suspected cancer patient.
The specific steps are as follows:
Following step (2), a z-score is calculated for each chromosome by comparing the relative coverage of the tumor genome with the reference relative coverage of healthy people.
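The z-score formula is not reproduced in this text. From the symbol definitions that follow, it presumably takes the standard form below (a reconstruction offered as an assumption):

```latex
z\text{-score} =
  \frac{score(sample) - average\bigl(score(reference)\bigr)}
       {SD\bigl(score(reference)\bigr)}
```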
where score(sample) denotes the chromosome relative coverage of the test sample, score(reference) denotes the chromosome relative coverage of the healthy reference, average denotes taking the mean, and SD denotes taking the standard deviation.
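A minimal Python sketch of this comparison is given below. It assumes that each sample is represented as a dict of per-chromosome relative coverages and that the healthy reference is a list of such dicts; the function names are hypothetical, and the use of the sample standard deviation (rather than the population standard deviation) is an assumption, since the text does not specify which is meant.

```python
import statistics


def chromosome_z_scores(sample, references):
    """z-score of each chromosome of the sample against healthy references.

    sample     -- dict: chromosome -> relative coverage of the test sample
    references -- list of dicts with the same keys, one per healthy person
                  (at least two are needed to estimate a standard deviation)
    """
    z_scores = {}
    for chrom, value in sample.items():
        ref_values = [ref[chrom] for ref in references]
        mean = statistics.mean(ref_values)
        sd = statistics.stdev(ref_values)  # sample SD; an assumption
        if sd == 0:
            raise ValueError(f"zero reference variance for {chrom}")
        z_scores[chrom] = (value - mean) / sd
    return z_scores


def is_suspected_cancer(z_scores, threshold=3.0):
    """True if any chromosome has z-score >= threshold or <= -threshold."""
    return any(abs(z) >= threshold for z in z_scores.values())
```

Passing `threshold=6.0` gives the stricter cut-off that the text mentions alongside the threshold of 3.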
The calculated values are as follows:
Table 1. Per-chromosome z-scores of each sample compared with healthy people in the present embodiment
Samples with a z-score greater than or equal to 3 or less than or equal to -3 are judged to be suspected cancer.
As shown in Fig. 3, each histogram shows the scores of the 21 autosomes of one sample; from left to right the samples are lung cancer, liver cancer, and healthy controls.
Every cancer sample has multiple chromosomes whose z-scores exceed the threshold, whereas in every healthy control all chromosomes are within the threshold range. In the present embodiment, 6/6 = 100% of the cancer samples have z-scores above 3 or below -3, and in 5/6 = 83% of the cancer blood samples the chromosome z-scores are far outside the normal range; 0/5 blood samples from normal subjects exceed the normal range. The sensitivity of this test is therefore 100% and the specificity is 100%.
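The sensitivity and specificity quoted above follow directly from the counts in this embodiment (6 of 6 cancer samples flagged, 0 of 5 healthy controls flagged). The short sketch below, with a hypothetical function name, shows the arithmetic.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Counts taken from this embodiment: 6 cancer samples, all flagged;
# 5 healthy controls, none flagged.
sens, spec = sensitivity_specificity(true_pos=6, false_neg=0, true_neg=5, false_pos=0)
```

With only 11 samples in total, these percentages describe this embodiment and carry wide statistical uncertainty.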
(4) Judging the tumor type
The tumor type of the suspected cancer is judged by searching the cancer patient genome copy number database TCGA (http://cancergenome.nih.gov/). The similarity between the test object and a record in the cancer patient database is calculated as:
Similarity(sample, db_i) = pearson_cor(sample, db_i);
where sample denotes the test object, db_i denotes data item i of the database, pearson_cor denotes computing the Pearson correlation coefficient, and Similarity denotes the similarity between the test object and the database record.
The tumor type of the test object is judged to be the 1-5 tumor types of the most similar database records:
Type(query) = Type(db_i): i = max_i(pearson_cor(query, db_i));
where db_i denotes data item i of the database, max denotes taking the maximum, Type denotes the cancer type sought, and max_i denotes the entry i with the largest Pearson correlation coefficient.
Searching and comparing against the TCGA cancer patient gene copy number database, the database hits for lung cancer-005 are, in order, LUSC (lung squamous cell carcinoma), HNSC (head & neck squamous cell carcinoma), LUSC, LUSC, ESCC (esophageal squamous cell carcinoma), HNSC, LUSC, LUSC, LUSC, and ESCC. Most hits are lung squamous cell carcinoma, and the cancer type is interpreted as lung squamous cell carcinoma (probability 50%), esophageal squamous cell carcinoma (probability 20%), and other squamous cell carcinomas (probability 20%).
The database hits for lung cancer-004 are, in order, LUAD (lung adenocarcinoma), STAD (stomach adenocarcinoma), LUAD, CRC (colorectal adenocarcinoma), PRAD (pancreatic adenocarcinoma), STAD, LUAD, LUAD, and CRC. Most database hits are lung adenocarcinoma, and the final interpretation is lung adenocarcinoma (probability 50%), intestinal cancer (probability 20%), and other adenocarcinomas (probability 30%).
The database hits for liver cancer-002 are LIHC (liver hepatocellular carcinoma), LIHC, LIHC, LIHC, LUAD, LIHC, LIHC, LIHC, LIHC, STAD, and LIHC. Most hits are liver cancer, and the final interpretation is liver cancer (probability 80%), with other cancers at a probability of 20%.
The same decision rule applies to the interpretation of the other samples (lung cancer-003, lung cancer-002, and liver cancer-002), which is not repeated here.
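The tallying step behind these interpretations can be illustrated with the hit list reported above for the liver cancer sample numbered 002. The sketch below is only an illustration: the function and variable names are hypothetical, and the conversion from hit counts to the stated probabilities is assumed to be a simple share of hits, which the text does not spell out.

```python
from collections import Counter

# Database hits for the liver cancer sample 002, in the order listed in the text.
liver_002_hits = [
    "LIHC", "LIHC", "LIHC", "LIHC", "LUAD", "LIHC",
    "LIHC", "LIHC", "LIHC", "STAD", "LIHC",
]


def hit_frequencies(hits):
    """Return (tumor type, count, share of hits) tuples, most frequent first."""
    counts = Counter(hits)
    total = len(hits)
    return [(tumor, n, n / total) for tumor, n in counts.most_common()]


frequencies = hit_frequencies(liver_002_hits)
```

Here LIHC accounts for 9 of the 11 listed hits, a share of roughly 0.82, which is in line with the stated probability of about 80% for liver cancer.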
Table 2. Detection results of the method of the present invention for various cancers
Claims (14)
1. A cancer detection kit based on large-scale data mining, characterized in that the kit comprises: a DNA extraction reagent, high-throughput sequencing library preparation reagents, gene sequence alignment software, and chromosome coverage calculation software; wherein the DNA extraction reagent is used to extract peripheral blood cell-free DNA, the high-throughput sequencing library preparation reagents are used to prepare a high-throughput sequencing library, the sequencing library is subjected to high-throughput sequencing, the gene sequence alignment software is used to align the sequencing data to the human reference genome, the chromosome coverage calculation software is used to analyze the data to distinguish cancer from non-cancer patients, and further analysis judges the tumor type and the likely primary tumor organ.
2. The cancer detection kit based on large-scale data mining according to claim 1, characterized in that the data analysis to distinguish cancer from non-cancer patients is specifically: the sequencing coverage of each chromosome is calculated and divided by the total sequencing amount to obtain the relative coverage; under healthy conditions each human chromosome is present in two copies and the copy number difference between chromosomes is very small; conversely, if the relative coverage exceeds a certain threshold, the subject is judged to be a suspected cancer patient; the relative coverage of each chromosome is calculated according to the following formula:
where Chr_size denotes the length of the chromosome, Genome_size denotes the total length of the human genome (the sum of the lengths of all chromosomes), and Cov_i denotes the number of times position i of the genome or chromosome is sequenced.
3. The cancer detection kit based on large-scale data mining according to claim 1 or 2, characterized in that the method for distinguishing cancer from non-cancer patients is: the tumor genome copy number is compared with the healthy reference copy number to obtain a z-score, and a z-score exceeding a certain threshold, namely z-score ≥ 3 or ≤ -3, or z-score ≥ 6 or ≤ -6, is judged to be suspected cancer; otherwise the sample is judged to be a non-cancer sample;
where score(sample) denotes the chromosome relative coverage of the test sample, score(reference) denotes the chromosome relative coverage of the healthy reference, average denotes taking the mean, and SD denotes taking the standard deviation.
4. The cancer detection kit based on large-scale data mining according to claim 1, characterized in that the specific method for judging the tumor type and the likely primary tumor organ is: the tumor type of the suspected cancer is judged by searching a cancer reference database; the similarity between the test object and a record in the cancer patient database is calculated as:
Similarity(sample, db_i) = pearson_cor(sample, db_i);
where sample denotes the test object, db_i denotes data item i of the database, pearson_cor denotes computing the Pearson correlation coefficient, and Similarity denotes the similarity between the test object and the database record; the tumor type of the test object is judged to be the 1-5 tumor types of the most similar database records:
Type(query) = Type(db_i): i = max_i(pearson_cor(query, db_i));
where db_i denotes data item i of the database, max denotes taking the maximum, Type denotes the cancer type sought, and max_i denotes the entry i with the largest Pearson correlation coefficient;
the specific algorithm is: select the top 1-10000 records from the database search results, count how often each tumor type occurs among these records, and select the 1-5 most frequent tumor types as the final interpretation.
5. The cancer detection kit based on large-scale data mining according to any one of claims 1-4, characterized in that the cancer is a solid tumor or a hematological disease, including head and neck cancer, digestive tract cancer, brain cancer, lung cancer, reproductive system cancer, urinary system cancer, skin cancer, lymphoma, and leukemia.
6. The cancer detection kit according to claim 5, characterized in that the head and neck cancer is oral cancer, nasopharyngeal carcinoma, tongue cancer, or laryngeal cancer; the digestive tract cancer is esophageal cancer, gastric cancer, liver cancer, pancreatic cancer, or intestinal cancer; the reproductive system cancer is breast cancer, cervical cancer, uterine cancer, prostate cancer, or testicular cancer; the urinary system cancer is bladder cancer or kidney cancer.
7. A cancer detection method based on large-scale data mining, characterized in that the method is a non-invasive tumor detection method based on high-throughput sequencing of peripheral blood cell-free DNA, using large-scale data mining and robust data calculation, and specifically comprises the following steps:
(1) collecting peripheral blood from the subject, separating the plasma, and extracting cell-free DNA with a DNA extraction reagent;
(2) amplifying the DNA with DNA polymerase and building a sequencing library using high-throughput sequencing library preparation reagents;
(3) subjecting the prepared library to high-throughput sequencing;
(4) aligning the sequencing data to the human reference genome using gene sequence alignment software;
(5) analyzing the data with chromosome coverage calculation software to distinguish cancer from non-cancer patients;
(6) judging the tumor type and the likely primary tumor organ by further data analysis.
8. The cancer detection method based on large-scale data mining according to claim 7, characterized in that the data analysis in step (5) to distinguish cancer from non-cancer patients is specifically: the sequencing coverage of each chromosome is calculated and divided by the total sequencing amount to obtain the relative coverage; under healthy conditions each human chromosome is present in two copies and the copy number difference between chromosomes is very small; conversely, if the relative coverage exceeds a certain threshold, the subject is judged to be a suspected cancer patient;
the relative coverage of each chromosome is calculated according to the following formula:
where Chr_size denotes the length of the chromosome, Genome_size denotes the total length of the human genome (the sum of the lengths of all chromosomes), and Cov_i denotes the number of times position i of the genome or chromosome is sequenced.
9. The cancer detection method based on large-scale data mining according to claim 7 or 8, characterized in that the method for distinguishing cancer from non-cancer patients is: the tumor genome copy number is compared with the healthy reference copy number to obtain a z-score:
where score(sample) denotes the chromosome relative coverage of the test sample, score(reference) denotes the chromosome relative coverage of the healthy reference, average denotes taking the mean, and SD denotes taking the standard deviation;
a z-score exceeding a certain threshold, namely z-score ≥ 3 or ≤ -3, or z-score ≥ 6 or ≤ -6, is judged to be suspected cancer; otherwise the sample is judged to be a non-cancer sample.
10. The cancer detection method based on large-scale data mining according to claim 7, characterized in that the specific method for judging the tumor type and the likely primary tumor organ in step (6) is: the tumor type of the suspected cancer is judged by searching a cancer reference database; the similarity between the test object and a record in the cancer patient database is calculated as:
Similarity(sample, db_i) = pearson_cor(sample, db_i);
where sample denotes the test object, db_i denotes data item i of the database, pearson_cor denotes computing the Pearson correlation coefficient, and Similarity denotes the similarity between the test object and the database record; the tumor type of the test object is judged to be the 1-5 tumor types of the most similar database records:
Type(query) = Type(db_i): i = max_i(pearson_cor(query, db_i));
where db_i denotes data item i of the database, max denotes taking the maximum, Type denotes the cancer type sought, and max_i denotes the entry i with the largest Pearson correlation coefficient;
the specific algorithm is: select the top 1-10000 records from the database search results, count how often each tumor type occurs among these records, and select the 1-5 most frequent tumor types as the final interpretation.
11. The cancer detection method based on large-scale data mining according to any one of claims 7-10, characterized in that the cancer is a solid tumor or a hematological disease, including head and neck cancer, digestive tract cancer, brain cancer, lung cancer, reproductive system cancer, urinary system cancer, skin cancer, lymphoma, and leukemia.
12. The cancer detection method based on large-scale data mining according to claim 11, characterized in that the head and neck cancer is oral cancer, nasopharyngeal carcinoma, tongue cancer, or laryngeal cancer; the digestive tract cancer is esophageal cancer, gastric cancer, liver cancer, pancreatic cancer, or intestinal cancer; the reproductive system cancer is breast cancer, cervical cancer, uterine cancer, prostate cancer, or testicular cancer; the urinary system cancer is bladder cancer or kidney cancer.
13. The cancer detection method based on large-scale data mining according to claim 7, characterized in that the high-throughput sequencing technology is selected from Roche/454; Illumina sequencers, such as the NextSeq series, HiSeq series, MiSeq series, X Ten, and subsequent sequencer series; BGI sequencers, such as the BGI500 series and subsequent sequencers; Life Technologies sequencers, such as Ion, Proton, and subsequent series; PacBio sequencers, such as RS II, Sequel, and subsequent sequencers; or Nanopore-based sequencers, such as Genia, Nanopore, and similar third-generation sequencers.
14. The cancer detection method based on large-scale data mining according to claim 10, characterized in that the cancer patient genome reference database is a whole-genome copy number database based on chip technology or a whole-genome sequencing database; wherein the whole-genome sequencing database is derived from tissue sequencing or from peripheral blood cell-free DNA sequencing data.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610018232.9A CN105653898A (en) | 2016-01-12 | 2016-01-12 | Cancer detection kit based on large-scale data mining and detection method |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610018232.9A CN105653898A (en) | 2016-01-12 | 2016-01-12 | Cancer detection kit based on large-scale data mining and detection method |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN105653898A (en) | 2016-06-08 |
Family
ID=56486555
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201610018232.9A Pending CN105653898A (en) | 2016-01-12 | 2016-01-12 | Cancer detection kit based on large-scale data mining and detection method |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN105653898A (en) |
Cited By (14)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106372459A (en) * | 2016-08-30 | 2017-02-01 | 天津诺禾致源生物信息科技有限公司 | Method and device for detecting copy number variation based on amplicon next generation sequencing |
| CN107992719A (en) * | 2017-11-23 | 2018-05-04 | 南方医科大学 | A bladder cancer detection kit based on high-throughput sequencing |
| CN108624584A (en) * | 2017-03-16 | 2018-10-09 | 上海融享生物科技有限公司 | A library construction method for low-frequency ctDNA detection |
| CN109680049A (en) * | 2018-12-03 | 2019-04-26 | 东南大学 | A method for analyzing the physiological state of an individual based on high-throughput sequencing of cell-free DNA (cfDNA) in blood, and its application |
| WO2019128233A1 (en) * | 2017-12-29 | 2019-07-04 | 南京格致基因生物科技有限公司 | Method and system for determining cervical cancer |
| CN109988835A (en) * | 2017-12-29 | 2019-07-09 | 南京格致基因生物科技有限公司 | Method and apparatus for screening and diagnosing high-grade serous ovarian carcinoma based on high-throughput sequencing |
| CN110272985A (en) * | 2019-06-26 | 2019-09-24 | 广州市雄基生物信息技术有限公司 | Tumor screening kit based on peripheral blood plasma DNA high-throughput sequencing technology, and system and method thereof |
| CN110580934A (en) * | 2019-07-19 | 2019-12-17 | 南方医科大学 | A method for predicting pregnancy-related diseases based on high-throughput sequencing of peripheral blood cell-free DNA |
| CN110736834A (en) * | 2018-07-19 | 2020-01-31 | 南京格致基因生物科技有限公司 | Method, device and system for screening and diagnosing liver cancer based on high-throughput sequencing method |
| CN110791564A (en) * | 2018-10-10 | 2020-02-14 | 杭州翱锐基因科技有限公司 | Method and apparatus for analyzing early cancer |
| CN110880356A (en) * | 2018-09-05 | 2020-03-13 | 南京格致基因生物科技有限公司 | Method and apparatus for screening, diagnosing or risk stratification for ovarian cancer |
| WO2021077411A1 (en) * | 2019-10-25 | 2021-04-29 | 苏州宏元生物科技有限公司 | Chromosome instability detection method, system and test kit |
| CN113969316A (en) * | 2021-10-15 | 2022-01-25 | 上海缘悉生物科技有限公司 | Application of chromosome instability score |
| CN115346603A (en) * | 2022-08-23 | 2022-11-15 | 天津市肿瘤医院(天津医科大学肿瘤医院) | Use of the Genome Instability Score as a Marker for Meningeal Metastases |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN104428425A (en) * | 2012-05-04 | 2015-03-18 | 考利达基因组股份有限公司 | Methods for determining absolute genome-wide copy number variations of complex tumors |
| CN104560697A (en) * | 2015-01-26 | 2015-04-29 | 上海美吉生物医药科技有限公司 | Detection device for instability of genome copy number |
| CN104611410A (en) * | 2013-11-04 | 2015-05-13 | 北京贝瑞和康生物技术有限公司 | Noninvasive cancer detection method and its kit |
| CN105112569A (en) * | 2015-09-14 | 2015-12-02 | 中国医学科学院病原生物学研究所 | Virus infection detection and identification method based on metagenomics |
Cited By (18)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106372459B (en) * | 2016-08-30 | 2019-03-15 | 天津诺禾致源生物信息科技有限公司 | Method and device for detecting copy number variation based on amplicon next-generation sequencing |
| CN106372459A (en) * | 2016-08-30 | 2017-02-01 | 天津诺禾致源生物信息科技有限公司 | Method and device for detecting copy number variation based on amplicon next generation sequencing |
| CN108624584A (en) * | 2017-03-16 | 2018-10-09 | 上海融享生物科技有限公司 | A library construction method for low-frequency ctDNA detection |
| CN107992719B (en) * | 2017-11-23 | 2021-08-06 | 南方医科大学 | A high-throughput sequencing-based detection kit for bladder cancer |
| CN107992719A (en) * | 2017-11-23 | 2018-05-04 | 南方医科大学 | A bladder cancer detection kit based on high-throughput sequencing |
| CN109988835A (en) * | 2017-12-29 | 2019-07-09 | 南京格致基因生物科技有限公司 | Method and apparatus for screening and diagnosing high-grade serous ovarian carcinoma based on high-throughput sequencing |
| WO2019128233A1 (en) * | 2017-12-29 | 2019-07-04 | 南京格致基因生物科技有限公司 | Method and system for determining cervical cancer |
| CN110736834A (en) * | 2018-07-19 | 2020-01-31 | 南京格致基因生物科技有限公司 | Method, device and system for screening and diagnosing liver cancer based on high-throughput sequencing method |
| CN110880356A (en) * | 2018-09-05 | 2020-03-13 | 南京格致基因生物科技有限公司 | Method and apparatus for screening, diagnosing or risk stratification for ovarian cancer |
| CN110791564A (en) * | 2018-10-10 | 2020-02-14 | 杭州翱锐基因科技有限公司 | Method and apparatus for analyzing early cancer |
| CN109680049A (en) * | 2018-12-03 | 2019-04-26 | 东南大学 | A method for analyzing the physiological state of an individual based on high-throughput sequencing of cell-free DNA (cfDNA) in blood, and its application |
| CN110272985A (en) * | 2019-06-26 | 2019-09-24 | 广州市雄基生物信息技术有限公司 | Tumor screening kit based on peripheral blood plasma DNA high-throughput sequencing technology, and system and method thereof |
| CN110272985B (en) * | 2019-06-26 | 2021-08-17 | 广州市雄基生物信息技术有限公司 | Tumor screening kit based on peripheral blood plasma free DNA high-throughput sequencing technology, system and method thereof |
| CN110580934A (en) * | 2019-07-19 | 2019-12-17 | 南方医科大学 | A method for predicting pregnancy-related diseases based on high-throughput sequencing of peripheral blood cell-free DNA |
| CN110580934B (en) * | 2019-07-19 | 2022-05-10 | 南方医科大学 | A prediction method for pregnancy-related diseases based on high-throughput sequencing of peripheral blood cell-free DNA |
| WO2021077411A1 (en) * | 2019-10-25 | 2021-04-29 | 苏州宏元生物科技有限公司 | Chromosome instability detection method, system and test kit |
| CN113969316A (en) * | 2021-10-15 | 2022-01-25 | 上海缘悉生物科技有限公司 | Application of chromosome instability score |
| CN115346603A (en) * | 2022-08-23 | 2022-11-15 | 天津市肿瘤医院(天津医科大学肿瘤医院) | Use of the Genome Instability Score as a Marker for Meningeal Metastases |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN105653898A (en) | Cancer detection kit based on large-scale data mining and detection method | |
| CN106047998B (en) | A detection method for lung cancer genes and its application | |
| CN105219844B (en) | Gene marker combination, kit and disease risk prediction model for screening eleven diseases | |
| CN106156543B (en) | A statistical method for tumor ctDNA information | |
| CN109097471A (en) | A kit for detecting colorectal cancer and precancerous lesions, and its method of use | |
| CN107523563A (en) | A bioinformatics method for circulating tumor DNA analysis | |
| CN116064755B (en) | Device for detecting MRD marker based on linkage gene mutation | |
| CN107475403A (en) | Method and kit for detecting circulating tumor DNA from peripheral blood cell-free DNA, and method for analyzing the sequencing results | |
| CN114694750B (en) | Single-sample tumor somatic mutation discrimination and TMB (tumor mutational burden) detection method based on an NGS (next-generation sequencing) platform | |
| EP3249051B1 (en) | Use of methylation sites in y chromosome as prostate cancer diagnosis marker | |
| CN110452981A (en) | Kit for early screening of lung cancer based on peripheral blood | |
| CN105132407A (en) | Method for low-frequency mutant-enriched sequencing of DNA of exfoliative cells | |
| CN108949979A (en) | A method for determining whether a lung neoplasm is benign or malignant from a blood sample | |
| CN109616198A (en) | Method for selecting a combination of specific DNA methylation detection sites used solely for single-cancer screening of liver cancer | |
| CN116121383A (en) | Composition for clinical diagnosis and treatment of hematological malignant tumor and application thereof | |
| CN108300787A (en) | Use of specific methylation sites as markers for early diagnosis of breast cancer | |
| CN108070658A (en) | Non-diagnostic method for detecting MSI | |
| CN105821147A (en) | Primer and method for detecting rectal-cancer-susceptibility-related SNP site | |
| CN112951325A (en) | Design method and application of probe combination for cancer detection | |
| CN111968702A (en) | Early malignant tumor screening system based on circulating tumor DNA | |
| CN113817822B (en) | Tumor diagnosis kit based on methylation detection and application thereof | |
| Wilmott et al. | Tumour procurement, DNA extraction, coverage analysis and optimisation of mutation-detection algorithms for human melanoma genomes | |
| US20240312563A1 (en) | Method for preparation of multi-analytical prediction model for cancer diagnosis | |
| CN106755330B (en) | Cancer-related gene expression difference detection kit and application thereof | |
| CN114277132B (en) | Application of immune-related lncRNA expression profile in predicting benefit of small cell lung cancer auxiliary chemotherapy and prognosis |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 2016-06-08 |