CN106566877A

CN106566877A - Gene mutation detection method and apparatus

Info

Publication number: CN106566877A
Application number: CN201610932451.8A
Authority: CN
Inventors: 李雷; 李瑞强; 臧晚春; 余盼; 马丽娟; 于洋; 蒋智
Original assignee: TIANJIN NOVOGENE BIOLOGICAL INFORMATION TECHNOLOGY Co Ltd
Current assignee: TIANJIN NOVOGENE BIOLOGICAL INFORMATION TECHNOLOGY Co Ltd
Priority date: 2016-10-31
Filing date: 2016-10-31
Publication date: 2017-04-19

Abstract

The invention discloses a method and apparatus for detecting gene mutations. The method comprises the following steps: 1) acquiring sequencing data of a sample to be tested and of control samples; and 2) determining whether SNP mutations, InDel mutations and/or deletion mutations exist in the sequencing data of the sample to be tested. Deletion detection comprises: (1) normalizing the per-window read counts, calculating the standard deviation and median of the control samples, and calculating a deviation value Z according to formula (1), Z = (normalized read count of the sample to be tested − median) / standard deviation; and (2) determining that a deletion mutation exists in a window of the sample to be tested if Z is greater than 3. The method and apparatus improve detection throughput and detection accuracy.

Description

Method and apparatus for detecting gene mutations

Technical field

The present invention relates to the field of gene mutation detection, and in particular to a method and apparatus for detecting gene mutations.

Background

Data show that in 2012 there were 14.1 million new cancer cases worldwide, of which 3.065 million (22%) were in China; of the 8.2 million cancer deaths worldwide, China accounted for 27%, i.e. 2.206 million people. Colorectal cancer is a common malignant tumor of the digestive tract, with about 1.02 million new cases and 530,000 deaths worldwide each year. China has entered the ranks of colorectal cancer hotspots, and the disease increasingly threatens people's physical and mental health. New colorectal cancer cases in China reach 130,000 per year and continue to rise at an average annual rate of 4%; colorectal cancer ranks third in tumor mortality among women.

Two hereditary conditions account for most of these colorectal cancers: Lynch syndrome and familial adenomatous polyposis. Lynch syndrome is mainly caused by pathogenic mutations in the MLH1, MSH2, MSH6, PMS2 and EPCAM genes, whereas familial adenomatous polyposis is mainly caused by mutations in the APC and MUTYH genes.

It is well known that the gene mutations that can cause diseases such as colorectal cancer are of several types, including SNPs, InDels (insertions or deletions of relatively few bases, generally not exceeding 100 bp), and large-fragment deletions (deletion mutations) or large-fragment duplications (typically deletions or duplications at the kb scale). The prior art offers many methods for detecting gene mutations; those capable of detecting fragment deletions include multiplex ligation-dependent probe amplification (MLPA), quantitative fluorescent PCR, Sanger sequencing and next-generation sequencing.

The basic principle of MLPA is to hybridize probes with the target DNA sequence, ligate the probes specifically, amplify by PCR, separate the amplification products by capillary electrophoresis, collect the data, and finally analyze the collected data with analysis software to reach a conclusion. The basic principle of quantitative fluorescent PCR is to quantitatively analyze the PCR products at the corresponding gene positions of the test sample and of a control group, and to determine by comparison whether an insertion or duplication is present. Sanger sequencing detects deleted regions by direct sequencing. However, these methods suffer from complex primer design, low throughput, high labor intensity and high cost, and cannot meet the demands of large sample numbers or batch gene mutation detection, so their application is limited.

With the development of high-throughput sequencing technologies, whose throughput is large and accuracy high, next-generation sequencing has become the popular means of detecting gene mutations. However, finding the mutation sites and mutation types of all target genes quickly and accurately in the huge volume of sequencing data produced by high-throughput sequencing has become the difficulty. There is therefore an urgent need for a method that detects all mutation types in batch, so as to improve detection throughput and accuracy.

Summary of the invention

The main object of the present invention is to provide a method and apparatus for detecting gene mutations, so as to overcome the low detection throughput and low accuracy of the prior art.

To achieve the above object, according to one aspect of the present invention, a method for detecting gene mutations is provided, comprising the following steps: acquiring sequencing data of a sample to be tested and of control samples; determining whether SNP mutations and/or InDel mutations exist in the sequencing data of the sample to be tested; and determining whether a deletion mutation exists in the sequencing data of the sample to be tested. The step of determining whether a deletion mutation exists comprises: normalization, i.e. dividing the sequencing data into windows, counting the reads of the sample to be tested and of each control sample in each window, and normalizing the read count of each window to obtain the normalized read count of the sample to be tested and of each control sample in each window; standard deviation and median calculation, i.e. calculating the standard deviation and median of the control samples' normalized read counts in each window; deviation calculation, i.e. calculating, for each window according to formula (1), the deviation value Z of the normalized read count of the sample to be tested from the median of the control samples; and deletion determination, i.e. when Z is greater than 3, determining that the sample to be tested has a deletion mutation in that window.

Z = (normalized read count of the sample to be tested − median of the control samples) / standard deviation of the control samples    (1)
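Formula (1) and the threshold of 3 can be sketched per window as follows. This is a minimal sketch, assuming the normalized read counts are already available as one list per sample (one value per window) and that "standard deviation" means the sample standard deviation of the control samples; the text does not say which. The text states the test as "Z greater than 3", but a deletion lowers the normalized read count, so formula (1) as written then yields a negative Z. The sketch therefore flags |Z| > 3, which is an interpretive assumption rather than the stated rule. Function names and data are hypothetical.

```python
import math
import statistics


def window_z_values(test_norm: list[float],
                    control_norms: list[list[float]]) -> list[float]:
    """Formula (1) per window: Z = (test value - control median) / control SD."""
    z_values = []
    for w, test_value in enumerate(test_norm):
        controls = [sample[w] for sample in control_norms]  # needs >= 2 controls
        median = statistics.median(controls)
        sd = statistics.stdev(controls)  # sample SD (assumption)
        # A window with zero spread among the controls gives no usable Z
        z_values.append((test_value - median) / sd if sd > 0 else math.nan)
    return z_values


def flag_deletions(z_values: list[float], threshold: float = 3.0) -> list[int]:
    # Assumption: "Z greater than 3" is read as |Z| > 3;
    # NaN windows compare False and are never flagged.
    return [w for w, z in enumerate(z_values) if abs(z) > threshold]


controls = [[100, 300], [102, 298], [98, 302], [100, 300]]  # hypothetical controls
test = [50, 301]  # window 0 at roughly half coverage
z = window_z_values(test, controls)
print(flag_deletions(z))  # [0]
```

Here window 0 gives Z of about −30.6 and window 1 about 0.6, so only window 0 is reported.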

Further, in the normalization step, the sequencing data are divided into contiguous, non-overlapping windows.

Further, the normalization step comprises: dividing the sequencing data of the sample to be tested and of each control sample into windows, and recording the number of reads in each window as that sample's first read count; summing all first read counts of each sample and recording the sum as that sample's second read count; and normalizing the first read counts of the sample to be tested and of each control sample according to formula (2), to obtain the normalized read count of each sample in each window.

normalized read count = first read count × 1000 / second read count    (2)
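Formula (2) can be sketched in a few lines. This is a minimal illustration with a hypothetical function name and hypothetical counts; it assumes the first read counts of one sample are given as a list with one entry per window.

```python
def normalize_counts(window_counts: list[int]) -> list[float]:
    """Formula (2): normalized = first read count * 1000 / second read count."""
    second = sum(window_counts)  # second read count: total over all windows
    if second == 0:
        raise ValueError("sample has no reads in any window")
    return [first * 1000 / second for first in window_counts]


shallow = [100, 300, 600]  # hypothetical per-window first read counts
deep = [200, 600, 1200]    # same coverage profile, sequenced twice as deeply
print(normalize_counts(shallow))  # [100.0, 300.0, 600.0]
print(normalize_counts(deep))     # [100.0, 300.0, 600.0]
```

Each sample's normalized counts sum to 1000, so samples sequenced at different depths become directly comparable window by window.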

Further, the step of determining whether SNP mutations and/or InDel mutations exist in the sequencing data of the sample to be tested comprises: sequence alignment, i.e. aligning the sequencing data of the sample to be tested with a reference genome to obtain alignment results; first screening, i.e. selecting from the alignment results the sites carrying SNP and/or InDel mutations, recorded as first candidate sites; second screening, i.e. selecting from the first candidate sites those with a population mutation frequency below 2%, recorded as second candidate sites; SNP and/or InDel mutation judgment, i.e. determining, according to the functional annotation of the second candidate sites in a functional annotation database, whether the second candidate sites include SNP and/or InDel mutation sites that alter gene function, and if so, recording those sites as third candidate sites; and SNP and/or InDel mutation confirmation, i.e. when third candidate sites exist, determining them as SNP mutation sites and/or InDel mutation sites.
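The screening cascade can be sketched as a chain of list filters. This is a minimal sketch under stated assumptions: the `Site` record layout and function names are hypothetical, the population frequency and the functional-annotation verdict are assumed to have been looked up already (the text names no specific database), and a site is treated as an SNP when both alleles are single bases and as an InDel when the allele lengths differ.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """One position from the alignment results (hypothetical record layout)."""
    chrom: str
    pos: int
    ref: str
    alt: str                # equals ref when the sample matches the reference
    population_freq: float  # population mutation frequency, 0.0 to 1.0
    alters_function: bool   # verdict from the functional annotation database


def variant_type(site: Site) -> str:
    if len(site.ref) == len(site.alt) == 1:
        return "SNP"
    if len(site.ref) != len(site.alt):
        return "InDel"
    return "other"  # multi-base substitutions fall outside both categories


def screen_snp_indel(sites: list[Site]) -> list[tuple[Site, str]]:
    # First screening: sites where the sample differs from the reference
    first = [s for s in sites if s.ref != s.alt]
    # Second screening: population mutation frequency below 2%
    second = [s for s in first if s.population_freq < 0.02]
    # Judgment: keep sites annotated as altering gene function (third candidates)
    third = [s for s in second if s.alters_function]
    # Confirmation: third candidates are reported as SNP / InDel mutation sites
    return [(s, variant_type(s)) for s in third]


sites = [
    Site("chr1", 100, "A", "A", 0.0, False),     # matches the reference
    Site("chr1", 200, "C", "T", 0.15, True),     # common polymorphism
    Site("chr1", 300, "G", "A", 0.001, True),    # rare, function-altering SNP
    Site("chr1", 400, "AT", "A", 0.0, False),    # rare InDel, no functional effect
    Site("chr1", 500, "T", "TGC", 0.005, True),  # rare, function-altering InDel
]
for site, kind in screen_snp_indel(sites):
    print(site.chrom, site.pos, kind)
# chr1 300 SNP
# chr1 500 InDel
```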

Further, before the step of acquiring the sequencing data of the sample to be tested and the control samples, the method further comprises preparing exon libraries from the sample to be tested and the control samples respectively, the exon libraries being prepared by liquid-phase capture.

Further, before preparation by liquid-phase capture, the method further comprises designing liquid-phase capture probes according to the exon regions of the target genes.

Further, the exon library preparation step comprises preparing exon libraries of multiple target genes, the multiple target genes comprising at least the following genes: MLH1, MSH2, MSH3, MSH6, PMS1, PMS2, BUB1, BUB3, STK11, PTEN, SMAD4, APC, MUTYH, EPCAM, SETD2, MAX, TSC2, ATM and FANCC.

According to another aspect of the present invention, an apparatus for detecting deletion mutations is provided, comprising: an acquisition module, configured to acquire sequencing data of a sample to be tested and of control samples; a first judgment module, configured to determine whether SNP mutations and/or InDel mutations exist in the sequencing data of the sample to be tested; and a second judgment module, configured to determine whether a deletion mutation exists in the sequencing data of the sample to be tested. The second judgment module comprises: a normalization submodule, configured to divide the sequencing data into windows, count the reads of the sample to be tested and of each control sample in each window, and normalize the read count of each window to obtain the normalized read count of each sample in each window; a first calculating submodule, configured to calculate the standard deviation and median of the control samples' normalized read counts in each window; a second calculating submodule, configured to calculate, for each window according to formula (1), the deviation value Z of the normalized read count of the sample to be tested from the median of the control samples; and a deletion determination submodule, configured to determine that the sample to be tested has a deletion mutation in a window when Z is greater than 3.

Further, the normalization submodule further comprises: a statistic unit, configured to count the reads of the sample to be tested and of each control sample in each window, recorded as that sample's first read counts, and to sum the first read counts of all windows of each sample, recorded as that sample's second read count; and a computing unit, configured to normalize the first read count of each window of the sample to be tested and of the control samples according to formula (2), obtaining the normalized read count of each sample in each window.

Further, the first judgment module comprises: a sequence alignment submodule, configured to align the sequencing data of the sample to be tested with a reference genome to obtain alignment results; a first screening submodule, configured to select from the alignment results the sites carrying SNP and/or InDel mutations, recorded as first candidate sites; a second screening submodule, configured to select from the first candidate sites those with a population mutation frequency below 2%, recorded as second candidate sites; an SNP and/or InDel mutation judgment submodule, configured to determine, according to the functional annotation of the second candidate sites in a functional annotation database, whether the second candidate sites include SNP and/or InDel mutation sites that alter gene function, and if so, to record those sites as third candidate sites; and an SNP and/or InDel mutation confirmation submodule, configured to determine the third candidate sites, when they exist, as SNP mutation sites and/or InDel mutation sites.

Further, upstream of the acquisition module, the apparatus further comprises an exon library preparation module, configured to prepare exon libraries of the sample to be tested and the control samples by liquid-phase capture.

Further, upstream of the exon library preparation module, the apparatus further comprises a probe design module, configured to design liquid-phase capture probes according to the exon regions of the target genes.

With the technical solution of the present invention, performing both SNP and/or InDel mutation determination and deletion mutation determination on the sequencing data of the sample to be tested and the control samples allows the sites of all the above mutation types present in the sample to be tested to be detected in a single test. Moreover, in the deletion determination, whether a window contains a deletion is decided from the degree to which the normalized read count of the sample in that window deviates from the control samples' median for that window. Compared with measuring the deviation from the mean of the normalized read counts, this statistic is more robust and accurate, and false positives are easier to distinguish.

Description of the drawings

The accompanying drawings, which form part of this application, are provided for a further understanding of the present invention; the illustrative embodiments of the present invention and their descriptions serve to explain the present invention and do not unduly limit it. In the drawings:

Fig. 1 shows the deletion mutation results in a specific embodiment of the present invention.

Detailed description of the embodiments

It should be noted that, where no conflict arises, the embodiments of this application and the features in the embodiments may be combined with each other. The present invention is described in detail below with reference to the drawings and in combination with the embodiments.

It should first be noted that high-throughput sequencing can detect the various mutations that may be present in a sample, including InDels, SNPs and large-fragment deletions. The deletion mutations detected by the method and apparatus of the present invention are inferred mainly from a statistical standpoint, indicating the gene mutation sites that may be present in the sample to be tested and their specific mutation types. Whether these are directly or indirectly related to a disease must be verified by other test results; the method and apparatus are therefore suitable only for scientific and basic academic research, and not for clinical diagnosis of disease.

As mentioned in the background section, when gene mutations are detected by high-throughput sequencing, the prior art cannot accurately detect, in batch, mutation sites and all possible mutation types. To overcome this drawback, in a typical embodiment of the present invention, as shown in Fig. 1, a method for detecting gene mutations is provided, comprising the following steps: acquiring sequencing data of a sample to be tested and of control samples; determining whether SNP mutations and/or InDel mutations exist in the data to be tested; and determining whether a deletion mutation exists in the data to be tested. The step of determining whether a deletion mutation exists comprises: normalization, i.e. dividing the sequencing data into windows, counting the reads of the sample to be tested and of each control sample in each window, and normalizing the read count of each window to obtain the normalized read count of each sample in each window; standard deviation and median calculation, i.e. calculating the standard deviation and median of the control samples' normalized read counts in each window; deviation calculation, i.e. calculating, for each window according to formula (1), the deviation value Z of the normalized read count of the sample to be tested from the median of the control samples; and deletion determination, i.e. when Z is greater than 3, determining that the window contains a deletion mutation.

By performing both SNP and/or InDel mutation determination and deletion mutation determination on the sequencing data of the sample to be tested and the control samples, the above method of the present invention can detect all sites of the above mutation types present in the sample to be tested. Moreover, in the deletion determination, whether a window contains a deletion is decided from how far the normalized read count of the sample in that window deviates from the control samples' median for that window. Compared with measuring deviation from the mean of the normalized read counts, this statistic is more robust and accurate, and false positives are easier to distinguish.

In the above normalization step, read counts are computed per window, which makes it easy to adjust the window size flexibly according to the sequencing depth of different datasets and the size of the target deletion fragments, broadening the size range of detectable deletions. Furthermore, when determining whether a window contains a deletion mutation, the difference between the normalized read count of the sample to be tested in that window and the control samples' median for that window is calculated first, and the window is then judged to contain a deletion according to whether the ratio of this difference to the standard deviation of the control samples' normalized read counts in that window is greater than 3. Compared with using the mean, using the median of a group of control samples as the reference is less affected by individual abnormally fluctuating normalized read counts, so the result is more accurate.
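The robustness argument can be illustrated with a single window. The numbers below are hypothetical: ten control samples, one of which shows an abnormal drop in its normalized read count. The mean is pulled toward the outlier, so a test sample with entirely normal coverage already appears to deviate from it, whereas the median is essentially unaffected.

```python
import statistics

# Hypothetical normalized read counts of one window in ten control samples;
# the last control shows an abnormal fluctuation.
controls = [100.0, 101.0, 99.0, 100.0, 102.0, 98.0, 100.0, 101.0, 99.0, 10.0]
test_value = 100.0  # normal coverage in the sample to be tested

for name, center in (("mean", statistics.mean(controls)),
                     ("median", statistics.median(controls))):
    print(name, center, test_value - center)
# mean 91.0 9.0
# median 100.0 0.0
```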

In the above method of the present invention, the way the sequencing data are divided into windows can be weighed and set according to the trade-off between detection sensitivity and detection accuracy. In the present invention, the windows are preferably contiguous and non-overlapping. A longer exon is divided into contiguous, non-overlapping windows, while a shorter exon is placed whole into a single window. When the window is small, smaller deletion mutations are easy to find, but read counts in the same window vary greatly between samples, making comparison difficult. When the window is large, read counts in the same window vary less between samples, but smaller deletion mutations cannot be found.

In the above embodiment, the window length is set according to the sequencing depth of the sample to be tested and the control samples and the expected length of the gene deletions to be detected. Because the lengths of all genes sequenced in this assay, and of the exons within them, are known, the window size can be set according to the expected mutation length: a smaller value if the mutant fragments to be detected are small, and a larger value otherwise. The smaller the window, the higher the detection sensitivity and the correspondingly lower the accuracy; the larger the window, the higher the accuracy and the correspondingly lower the sensitivity. In a preferred embodiment of the present invention, when the sequencing depth is at least 300×, the length of each window is 50 to 160 bp; at a sequencing depth of at least 300×, keeping each window at 50 to 100 bp better balances detection sensitivity and detection accuracy.
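The windowing rule (contiguous, non-overlapping windows; short exons kept whole) can be sketched as below. This is a minimal sketch, assuming exons are half-open `[start, end)` intervals; the function name is hypothetical and the default of 100 bp is taken from the preferred 50 to 100 bp range above. The text does not say how a remainder is handled when an exon length is not a multiple of the window length, so, as an assumption, the sketch splits a long exon into `ceil(length / window_len)` windows of near-equal size instead of leaving a short tail window.

```python
import math


def split_exon(start: int, end: int, window_len: int = 100) -> list[tuple[int, int]]:
    """Split a half-open exon interval [start, end) into contiguous,
    non-overlapping windows of at most window_len bp."""
    length = end - start
    if length <= 0:
        raise ValueError("exon must have positive length")
    # Short exons stay whole; longer ones get ceil(length / window_len) windows
    n_windows = math.ceil(length / window_len)
    bounds = [start + length * i // n_windows for i in range(n_windows + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


print(split_exon(1000, 1080))  # [(1000, 1080)]
print(split_exon(0, 250))      # [(0, 83), (83, 166), (166, 250)]
```

Near-equal windows avoid a very short final window whose read count would fluctuate strongly between samples, which is the problem with small windows described above.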

In the above method of the present invention, the normalization step mainly serves to homogenize the read count of each window, so that differences in sequencing depth between the sample to be tested and the control samples do not bias the comparison of their per-window read counts; accordingly, normalization operations used in the art are applicable to the present invention. Preferably, the sequencing data of the sample to be tested and of each control sample are divided into windows, the read count of each window is recorded as the first read count, and the sum of all first read counts is recorded as the second read count; the read counts of the sample to be tested and of each control sample are then normalized according to formula (2) to obtain the normalized read count of each sample in each window.

In the above embodiment, normalization with formula (2) more effectively eliminates the bias in read counts caused by different sequencing depths between samples, so that the read counts of each window are comparable across samples.
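A short derivation shows why formula (2) removes a uniform depth difference. Assume, as an idealization, that sample B is sequenced k times as deeply as sample A with the same coverage profile, so that each first read count scales as $n_i^{B} = k\,n_i^{A}$, where $i$ indexes windows and the second read count is the sum over all windows $j$:

```latex
\tilde{n}_i^{B}
  = \frac{1000\, n_i^{B}}{\sum_j n_j^{B}}
  = \frac{1000\, k\, n_i^{A}}{k \sum_j n_j^{A}}
  = \frac{1000\, n_i^{A}}{\sum_j n_j^{A}}
  = \tilde{n}_i^{A}
```

The factor k cancels, and every sample's normalized counts sum to 1000. Real depth differences are not perfectly uniform across windows, so in practice the cancellation is approximate.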

In the above method, the step of determining whether SNP mutations and/or InDel mutations exist in the sequencing data of the sample to be tested detects whether the sample carries an SNP mutation or an InDel mutation, or both, or whether different target sites in the sample carry one or both of these mutation types; conventional determination methods in the art are therefore used.

In a preferred embodiment of the present invention, the step of determining whether SNP mutations and/or InDel mutations exist in the sequencing data of the sample to be tested comprises: a sequence alignment step, aligning the sequencing data with a reference genome to obtain alignment results; a first screening step, selecting from the alignment results the sites carrying SNP and/or InDel mutations, recorded as first candidate sites; a second screening step, selecting from the first candidate sites those with a population mutation frequency below 2%, recorded as second candidate sites; an SNP and/or InDel mutation judgment step, determining, according to the functional annotation of the second candidate sites in a functional annotation database, whether the second candidate sites include SNP and/or InDel mutation sites that alter gene function, and if so, recording those sites as third candidate sites; and an SNP and/or InDel mutation confirmation step, determining the third candidate sites, when they exist, as SNP mutation sites and/or InDel mutation sites.

In the above preferred embodiment, software such as SOAP (http://soap.genomics.org.cn/) can be used during sequence alignment to map the sequenced reads to their corresponding positions in the reference genome, yielding the SNP sites and/or InDel sites that differ from the reference genome at those positions. In practice, the coverage depth of each SNP and/or InDel site (i.e. the number of times the site is sequenced) must also be counted; to ensure the accuracy of the detected mutation sites, sites with a coverage depth below 30 are removed. Then, among the remaining SNP and/or InDel sites, whether any site affects gene function is determined according to the functional annotation of each site in the functional annotation database; if one or more such sites exist, they can be confirmed as carrying SNP and/or InDel mutations. In addition, depending on the function of the genes of interest, other auxiliary methods can be used to exclude or confirm whether a site is a mutation that causes dysfunction. For example, to confirm whether a site is disease-related, in addition to judging dysfunction from existing database information, one can use the SNP and/or InDel mutation sites of disease samples and normal control samples, select the sites with a population frequency below 2%, predict their effect on protein function with the SIFT software, and take the sites that alter protein function as candidate pathogenic sites for the disease.
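The coverage-depth filter described above can be sketched as follows. It assumes hypothetical inputs: aligned reads given as half-open `(start, end)` spans on a single contig, and candidate sites given as positions on that contig. In practice depth would normally be taken from the aligner or variant caller output rather than recomputed like this; only the threshold of 30 comes from the text, and the function names are illustrative.

```python
def site_depth(pos: int, reads: list[tuple[int, int]]) -> int:
    """Coverage depth: number of aligned reads whose [start, end) span covers pos."""
    return sum(1 for start, end in reads if start <= pos < end)


def keep_well_covered(candidates: list[int], reads: list[tuple[int, int]],
                      min_depth: int = 30) -> list[int]:
    # Sites covered by fewer than min_depth reads are removed
    return [pos for pos in candidates if site_depth(pos, reads) >= min_depth]


reads = [(0, 150)] * 35 + [(200, 350)] * 12  # hypothetical read spans on one contig
print(keep_well_covered([75, 250], reads))    # [75]
```

Position 75 is covered 35 times and is kept; position 250 is covered only 12 times and is removed.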

In the above method of the present invention, before the step of acquiring the sequencing data, the method further comprises preparing exon libraries from the sample to be tested and the control samples respectively, the exon libraries being prepared by liquid-phase capture. Preparing exon libraries by liquid-phase capture gives higher capture efficiency and saves considerable time.

In the above method of the present invention, before preparation by liquid-phase capture, the method further comprises designing liquid-phase capture probes according to the exon regions of the target genes. The probes can be designed with conventional probe design methods in the art, for example custom liquid-phase probe design following the official manual of Agilent or of NimbleGen.

In the above exon library preparation step, depending on the research purpose, exon libraries may be prepared for multiple genes of interest in the sample to be tested, or for one or more genes in each of multiple samples to be tested. When exon libraries of multiple genes are prepared, in a preferred embodiment of the present invention, the multiple genes comprise at least the following genes: MLH1, MSH2, MSH3, MSH6, PMS1, PMS2, BUB1, BUB3, STK11, PTEN, SMAD4, APC, MUTYH, EPCAM, SETD2, MAX, TSC2, ATM and FANCC. When the sequencing data cover these genes, the mutation sites and mutation types that may be present in all of them can be detected simultaneously by the above method.

The above genes are individually known; in the present invention, the inventors provide for the first time a method for jointly detecting at least these 19 genes, so that the mutation sites and mutation types that may be present in them can be detected in a single test of the same sample.

In a specific embodiment of the present invention, the step of preparing the exon libraries of the sample to be tested and the control samples comprises: fragmenting the genomic DNA of the sample to be tested and the control samples to obtain fragmented DNA; performing end repair and A-tailing on the fragmented DNA to obtain repaired DNA with an "A" overhang at the 3' end; ligating adapters to the repaired DNA to obtain adapter-ligated DNA; amplifying the adapter-ligated DNA by PCR to obtain amplified DNA; and hybridizing the amplified DNA with the liquid-phase capture probes to obtain the exon libraries of the sample to be tested and the control samples. Preparing the libraries by liquid-phase capture yields sequencing libraries containing the exon regions of the target genes, with high exon capture efficiency and time savings.

In the above method of the present invention, after the exon libraries of the sample to be tested and the control samples are obtained and before they are sequenced, the method further comprises denaturing the exon libraries. The denaturation prepares the libraries for high-throughput sequencing.

In another typical embodiment of the present invention, an apparatus for detecting deletion mutations is provided, comprising: an acquisition module, configured to acquire sequencing data of a sample to be tested and of control samples; a first judgment module, configured to determine whether SNP mutations and/or InDel mutations exist in the data to be tested; and a second judgment module, configured to determine whether a deletion mutation exists in the data to be tested. The second judgment module comprises: a normalization submodule, configured to divide the sequencing data into windows, count the reads of the sample to be tested and of each control sample in each window, and normalize the read count of each window to obtain the normalized read count of each sample in each window; a first calculating submodule, configured to calculate the standard deviation and median of the control samples' normalized read counts in each window; a second calculating submodule, configured to calculate, for each window according to formula (1), the deviation value Z of the normalized read count of the sample to be tested from the median of the control samples; and a deletion determination submodule, configured to determine that a window contains a deletion mutation when Z is greater than 3.

In the above apparatus of the present invention, the acquisition module acquires the sequencing data of the sample to be tested and the control samples; the first judgment module determines whether SNP mutations and/or InDel mutations exist in the data to be tested; and the second judgment module determines whether a deletion mutation exists in the data to be tested. Within the second judgment module, the normalization submodule divides the sequencing data into windows, counts the reads of the sample to be tested and of each control sample in each window, and normalizes the per-window read counts to obtain the normalized read count of each sample in each window; the first calculating submodule then calculates the standard deviation and median of the control samples' normalized read counts in each window; the second calculating submodule calculates, for each window according to formula (1), the deviation value Z of the normalized read count of the sample to be tested from the median of the control samples; and finally the deletion determination submodule determines that a window contains a deletion mutation when Z is greater than 3.
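To show how the submodules described above might chain together, the following is a hypothetical sketch of the second judgment module with each submodule written as a method. Class names, method names and data are illustrative assumptions, not the patent's implementation. The control standard deviation is taken as the sample standard deviation, and because a deletion makes Z in formula (1) negative, the judgment uses |Z| > 3 as an interpretive assumption.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class DeletionJudgeModule:
    """Hypothetical sketch of the second judgment module and its submodules."""
    threshold: float = 3.0  # deletion threshold stated in the text

    def normalize(self, window_counts: list[int]) -> list[float]:
        # Normalization submodule: statistic unit (second read count = total)
        # followed by computing unit (formula (2)).
        total = sum(window_counts)
        if total == 0:
            raise ValueError("sample has no reads")
        return [count * 1000 / total for count in window_counts]

    def control_stats(self, control_norms: list[list[float]]):
        # First calculating submodule: per-window median and sample SD.
        per_window = list(zip(*control_norms))
        medians = [statistics.median(values) for values in per_window]
        sds = [statistics.stdev(values) for values in per_window]
        return medians, sds

    def deviation(self, test_norm, medians, sds) -> list[float]:
        # Second calculating submodule: formula (1); NaN where controls have no spread.
        return [(t - m) / s if s > 0 else math.nan
                for t, m, s in zip(test_norm, medians, sds)]

    def run(self, test_counts: list[int], control_counts: list[list[int]]) -> list[int]:
        test_norm = self.normalize(test_counts)
        medians, sds = self.control_stats([self.normalize(c) for c in control_counts])
        z_values = self.deviation(test_norm, medians, sds)
        # Deletion determination submodule; |Z| > threshold is an assumption.
        return [w for w, z in enumerate(z_values) if abs(z) > self.threshold]


controls = [[1000] * 10, [1050, 950] * 5, [950, 1050] * 5, [1000] * 10]
test = [500] + [1000] * 9  # hypothetical: window 0 at half coverage
print(DeletionJudgeModule().run(test, controls))  # [0]
```

Halving window 0 also raises the other nine windows' normalized counts from 100 to about 105.3, because formula (2) divides by the sample total; with the hypothetical control spread used here (SD of about 4.1) those windows stay at Z of about 1.3, while window 0 reaches about −11.6.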

Through the normalization submodule, the above apparatus makes it easy to adjust the window size flexibly according to the sequencing depth of different datasets and the size of the target deletion fragments, broadening the size range of detectable deletions. Moreover, when determining whether a window contains a deletion mutation, the second judgment module uses the median of the control samples as the reference against which the deviation of the test sample's normalized read count in each window is measured. Compared with taking the mean and standard deviation as the reference, the median is less affected by individual abnormal read counts, which makes false positives easier to distinguish and the result more accurate.

In the apparatus of the present invention, the normalization submodule may be an appropriately modified version of a normalization submodule commonly used in the art. Any normalization submodule that can standardize the read count of each window is suitable for the invention. In a preferred embodiment, the normalization submodule further comprises:

- a statistics unit, which counts the reads in each window for the test sample and for each control sample (recorded as that sample's first read count) and sums the first read counts over all windows (recorded as that sample's second read count);
- a calculation unit, which normalizes the first read count of each window for the test sample and the control samples according to formula (2), yielding the normalized read count of each sample in each window:

$$\text{normalized read count} = \frac{\text{first read count} \times 1000}{\text{second read count}} \qquad (2)$$

In this embodiment, the normalization submodule effectively reduces the influence of differences in sequencing depth between samples on the results.

In the apparatus of the present invention, the first judgment module determines whether the test sample carries an SNP mutation, an InDel mutation, or both. It may also determine whether different target sites in the test sample carry one or both of these mutation types. A judgment module conventional in the art can therefore be used.

In a preferred embodiment of the invention, the first judgment module comprises the following submodules:

- a sequence alignment submodule, which aligns the sequencing data to a reference genome to obtain alignment results;
- a first screening submodule, which selects from the alignment results the sites carrying SNP and/or InDel mutations, recorded as first candidate sites;
- a second screening submodule, which selects from the first candidate sites those with a population mutation frequency below 2%, recorded as second candidate sites;
- an SNP and/or InDel mutation judgment submodule, which uses the functional annotation of the second candidate sites in functional annotation databases to determine whether they include SNP and/or InDel mutation sites that alter gene function, and if so records those sites as third candidate sites;
- an SNP and/or InDel mutation confirmation submodule, which, when third candidate sites exist, confirms them as SNP mutation sites and/or InDel mutation sites.

In the above preferred embodiment, the sequence alignment submodule may use an alignment tool such as SOAP (http://soap.genomics.org.cn/). Depending on the sequencing quality of the actual data, the first screening submodule may also include a filter that removes sites with a coverage depth below 30.

The second screening submodule screens the first candidate sites against population mutation frequencies recorded in existing databases and keeps sites with a frequency below 2%. The resulting second candidate sites are unlikely to be high-frequency variants reflecting ordinary individual variation and may instead be disease-related.

The SNP and/or InDel mutation judgment submodule then checks each site against the gene-function annotations of existing databases to decide whether its mutation alters gene function. If such sites exist, the SNP and/or InDel mutation confirmation submodule confirms them as SNP mutation sites and/or InDel mutation sites.
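The screening cascade of the first judgment module can be sketched as a chain of filters. This is a minimal illustration, not the patent's implementation, and it rests on three assumptions. The `CandidateSite` record and its field names are hypothetical. The functional-annotation outcome is reduced to a precomputed boolean. A site with no recorded population frequency is treated as frequency 0.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CandidateSite:
    """Hypothetical record for one SNP/InDel call after alignment."""
    chrom: str
    pos: int
    depth: int                        # coverage depth at the site
    population_freq: Optional[float]  # population mutation frequency; None if unrecorded
    alters_function: bool             # outcome of functional annotation


def screen_sites(sites: List[CandidateSite], min_depth: int = 30,
                 max_population_freq: float = 0.02) -> List[CandidateSite]:
    # First candidates: called sites with coverage depth of at least 30
    first = [s for s in sites if s.depth >= min_depth]
    # Second candidates: population frequency below 2% (None treated as 0)
    second = [s for s in first
              if (s.population_freq or 0.0) < max_population_freq]
    # Third candidates: annotated as altering gene function
    return [s for s in second if s.alters_function]
```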

Databases for annotating gene function include, but are not limited to, dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/), HGMD (www.hgmd.cf.ac.uk), ClinVar (http://www.ncbi.nlm.nih.gov/clinvar/) and LOVD InSiGHT (http://insight-group.org/lovd.html).

Upstream of the acquisition module, the apparatus further includes an exon library preparation module, which prepares exon libraries of the test sample and the control samples by liquid-phase capture. Liquid-phase capture of a sequencing library containing the target gene exon regions gives high capture efficiency for the library exons and saves time.

Upstream of the exon library preparation module, the apparatus further includes a probe design module, which designs liquid-phase capture probes based on the target exon regions. The module designs short fragments complementary to the target regions so that the target sequences can be captured. Probe design modules commonly used in the art can be adopted, for example custom liquid-phase probes designed following the official manuals of Agilent or NimbleGen.

The beneficial effects of the invention are further illustrated below with a specific example.

The following example describes the method in detail using the 19 genes listed in Table 1. Unless otherwise specified, all reagents and instruments are from Agilent (USA). The study recruited 96 possible carriers of gene mutations and 10 healthy individuals, all of whom signed written informed consent. The mutant genes the carriers might harbor, and their specific mutation types, were then detected. DNA was extracted from buccal swab samples by the buccal swab extraction method. Chip preparation and hybridization followed Agilent's instructions, and sequencing followed Illumina's instructions. The specific steps are as follows:

Table 1：

MLH1	MSH2	MSH3	MSH6	PMS1	PMS2
BUB1	BUB3	STK11	PTEN	SMAD4	APC
MUTYH	EPCAM	SETD2	MAX	TSC2	ATM
FANCC

Experiment 1: Chip design

The reference sequences are the exon sequences of the above 19 genes in the NCBI build 37/hg19 genome (from www.ncbi.nlm.nih.gov), plus 10 bp of flanking sequence on each side. The design was carried out by Agilent (USA).

Experiment 2: DNA extraction

1) Sample handling: place the cotton swab that was rubbed against the inside of the cheek into a 2 ml centrifuge tube and cut off the cotton-tipped part with scissors.

2) Add cell lysis buffer and proteinase K (EC 3.4.21.64), and incubate at 56 °C for 60 min to digest the cells.

3) Add buffer and incubate at 70 °C for 10 min. Squeeze out and discard the swab, and transfer the lysate to a new centrifuge tube.

4) Add absolute ethanol to precipitate the DNA.

5) Load the solution onto the spin column, centrifuge, and discard the flow-through in the collection tube.

6) Add buffer, centrifuge, and discard the flow-through in the collection tube.

7) Add wash buffer, centrifuge, and discard the flow-through in the collection tube. Repeat once.

8) Dry the column matrix.

9) Add elution buffer to the column to elute the DNA.

10) Collect the DNA by centrifugation, repeat the elution once, and store the DNA at -20 °C.

Experiment 3: Library preparation

Step 1: DNA shearing

1) gDNA quality control: ensure that the DNA passes QC (no degradation; A260/A280 between 1.8 and 2.0). Measure the gDNA concentration with Qubit.

2) Set up the Covaris according to the parameters in Table 2, as follows:

Table 2：

Setting	Value
Duty factor	10%
Peak incident power (PIP)	175
Cycles per burst	200
Treatment time	360 s
Temperature	4–8 °C

A. Fill the Covaris water tank with deionized water up to the "12" mark.

B. Check that the water level covers the glass portion of the sample tube.

C. Set the chiller temperature to 2–5 °C and cool to 5 °C.

D. Optionally, add ethylene glycol to 20% of the total volume to prevent freezing.

E. Press the "Degas" button on the panel and run "Degas" for at least 30 min before use.

3) In a 1.5 ml EP tube, dilute 3 µg of gDNA to 130 µl with 1X Low TE Buffer.

4) Mount the Covaris microTUBE holder on the Covaris.

5) Using a tapered pipette tip, carefully transfer the 130 µl DNA sample into a Covaris microTUBE. Work carefully to avoid bubbles at the bottom of the tube.

6) Shear the DNA using the Covaris parameters set in Table 2. The main peak of the sheared product should lie at 150–200 bp.

7) Using a tapered pipette tip, carefully transfer the sheared DNA sample into a new 1.5 ml EP tube.

Step 2: Purify the DNA samples with Agencourt AMPure XP beads

1) Equilibrate the AMPure XP beads at room temperature for at least 30 min. Then mix the bead suspension thoroughly until its color is uniform. Do not freeze the beads.

2) In a new 1.5 ml tube, combine 180 µl of well-mixed AMPure XP bead suspension with the sheared DNA library (about 130 µl). Vortex to mix and incubate at room temperature for 5 min.

3) Place the tube on a magnetic rack and let it stand for about 3–5 min until the solution is clear.

4) With the tube on the magnetic rack, carefully remove the supernatant without touching the beads with the pipette tip.

5) With the tube on the magnetic rack, add 500 µl of 70% ethanol to each tube. Freshly prepared ethanol gives better results.

6) Let the beads settle for 1 min, then remove the ethanol.

7) Repeat steps 5) and 6) once.

8) Heat at 37 °C for 5 min on a heat block, or until the residual ethanol in the tube has evaporated completely. Note: do not heat the beads until their surface cracks, because over-dried beads markedly reduce elution efficiency.

9) Add 50 µl of RNase-free water, mix on a vortex mixer, and incubate at room temperature for 2 min.

10) Place the tube on the magnetic rack and let it stand for about 2–3 min until the solution is clear.

11) Transfer about 50 µl of supernatant to a new 1.5 ml tube. The beads can be discarded after this step. If you are not proceeding to the next step, store the sample in a -20 °C freezer.

Step 3: End repair

1) Prepare the reaction mix on ice using the SureSelect Library Prep Kit, ILM.

2) Prepare the reaction mix in a PCR tube (or strip tube, or PCR plate) according to the formulation in Table 3, and mix.

3) Add 52 µl of reaction mix to each PCR tube (or well).

4) Add 48 µl of DNA sample to each PCR tube (or well) and mix by pipetting.

5) Place in a thermal cycler and incubate at 20 °C for 30 min without a heated lid.

Table 3：

Step 4: Purify the DNA samples with Agencourt AMPure XP beads (same procedure as Step 2)

Step 5: A-tailing of DNA fragment ends

1) Prepare the reaction mix on ice using the SureSelect Library Prep Kit, ILM, according to the formulation in Table 4.

Table 4：

2) Place in a thermal cycler and incubate at 37 °C for 30 min. If a heated lid is used, keep its temperature below 50 °C.

Step 6: Purify the DNA samples with Agencourt AMPure XP beads (same procedure as Step 2)

Step 7: Ligation of adapters carrying specific tags

1) Prepare the adapter ligation reaction mix according to the formulation in Table 5.

Table 5：

2) Place in a thermal cycler and incubate at 20 °C for 15 min without a heated lid. If you are not proceeding to the next step, store the sample in a -20 °C freezer.

Step 8: Purify the DNA samples with Agencourt AMPure XP beads (same as Step 2)

Step 9: Amplify the adapter-ligated library

1) Amplify only one third of the adapter-ligated library and store the remainder in a -20 °C freezer.

2) Prepare the PCR reaction mix according to the formulation in Table 6:

Table 6：

Note: the amount of DNA library added can also be 250 ng, quantified with a Bioanalyzer DNA 1000 chip.

3) Place the tubes in a thermal cycler, set the PCR program according to Table 7, and run the reaction.

Table 7：

Step 10: Purify the DNA samples with Agencourt AMPure XP beads (same as Step 2)

Experiment 4: Liquid-phase capture

Step 1: Library hybridization

In this part, the prepared library is hybridized with hybridization reagents, blocking reagents and the SureSelect capture probe library. Each DNA library must be hybridized and captured separately; indexes are introduced afterwards by PCR.

Each library is hybridized once and captured once; do not pool samples at this step. Hybridization requires 750 ng of input DNA in a volume of no more than 3.4 µl. The procedure is as follows:

1) Prepare the hybridization buffer at room temperature according to the formulation in Table 8.

Table 8：

2) Prepare the SureSelect Capture Library Mix for target capture in a PCR plate, and keep the plate on ice. For each sample, add the appropriate amount of SureSelect Capture Library in the proportions shown in Table 9, according to the size of the target region (Mb). Dilute the SureSelect RNase Block with RNase-free water as shown in Table 9, preparing enough dilution for all sample reactions plus some excess. Add the diluted SureSelect RNase Block as shown in Table 9 and mix by pipetting.

Table 9：

3) Prepare enough SureSelect Block Mix for all sample reactions according to Table 10.

Table 10：

4) In another PCR plate, process the prepared libraries for target capture:

A. Arrange the samples in two rows, A and B. To each well in row B, add 3.4 µl of library at 221 ng/µl.

B. To each well in row B, add 5.6 µl of SureSelect Block Mix and mix by pipetting up and down.

C. Cap the wells of each sample and place the plate in a thermal cycler.

D. Run the program in Table 11.

Table 11：

Step	Temperature	Time
1	95 °C	5 min
2	65 °C	Hold

5) During the 65 °C incubation, use a heated lid at 105 °C.

6) Keeping the PCR plate at 65 °C, add 40 µl of hybridization buffer to each well in row A of the 96-well plate. Fill as many wells as there are libraries in row B. Note: make sure the PCR plate has been incubated at 65 °C for at least 5 min before step 10).

7) Add the capture library mix prepared in step 2) to the PCR plate:

A. Keeping the PCR plate at 65 °C, add 7 µl of capture library mix to the wells in row C of the 96-well plate.

B. Seal with strip caps, making sure the seal is tight.

C. Incubate at 65 °C for 2 min.

8) Keeping the PCR plate at 65 °C, use a multichannel pipette to transfer 13 µl of hybridization buffer from row A into the capture library mix in row C.

9) Keeping the PCR plate at 65 °C, use a multichannel pipette to transfer the entire library mix from row B into the hybridization solution in row C. Mix thoroughly by slowly pipetting up and down 8–10 times. The hybridization mix now has a volume of about 27–29 µl, depending on evaporation losses during the preceding incubation.

10) Seal with strip caps or double adhesive film, making sure every well is tightly sealed.

Note: use new strip caps or sealing film, because used ones lose integrity during heating. If strip tubes are used, check evaporation in a preliminary experiment before first use. The evaporated volume must not exceed 3–4 µl.

11) Incubate the hybridization mix at 65 °C for 24 h with a heated lid at 105 °C.
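The input requirement above (750 ng in at most 3.4 µl) sets a minimum library concentration of about 220.6 ng/µl. This is consistent with step 4) A, which loads 3.4 µl of a 221 ng/µl library. A small Python check of this arithmetic follows. The function name and its defaults are illustrative only.

```python
def hybridization_input(conc_ng_per_ul, volume_ul=3.4,
                        required_ng=750.0, max_volume_ul=3.4):
    """Return (DNA mass in ng, whether the input meets both requirements)."""
    mass_ng = conc_ng_per_ul * volume_ul
    return mass_ng, (mass_ng >= required_ng and volume_ul <= max_volume_ul)


min_conc = 750.0 / 3.4                # lowest concentration that fits: about 220.6 ng/ul
mass, ok = hybridization_input(221)   # about 751.4 ng, so the requirement is met
```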

Step 2: Bead preparation

This step uses the following reagents from SureSelect Target Enrichment Kit Box #1: SureSelect Binding Buffer and SureSelect Wash 2.

1) Prewarm SureSelect Wash 2 to 65 °C in a water bath or heat block for use in Step 3.

2) The beads settle during storage. Vortex vigorously to resuspend the Dynabeads MyOne Streptavidin T1.

3) For each hybridization, transfer 50 µl of Dynabeads MyOne Streptavidin T1 into a 1.5 ml centrifuge tube.

4) Wash the beads:

A. Add 200 µl of SureSelect Binding Buffer and vortex for 5 s.

B. Place the tube on a magnetic rack and remove the supernatant once the solution is clear.

C. Repeat steps A–B twice, for a total of 3 washes.

5) Resuspend the beads in 200 µl of SureSelect Binding Buffer.

Step 3: Capture and elution

This step uses the following reagents from SureSelect Target Enrichment Kit Box #1: SureSelect Wash 1 and SureSelect Wash 2.

1) After the 24 h incubation, estimate the remaining volume of the hybridization mix with a pipette and record it.

2) Keeping the PCR plate at 65 °C, transfer the hybridization mix directly into the bead solution and mix by inverting 3–5 times.

Note: if excessive evaporation during the 24 h hybridization leaves less than 20 µl, subsequent capture efficiency will be affected.

3) Place the mixture on a nutator and mix at room temperature for 30 min.

4) Centrifuge briefly.

5) Place the tube on a magnetic rack, let it stand until the solution is clear, and remove the supernatant.

6) Add 500 µl of SureSelect Wash 1 and vortex for 5 s to resuspend the beads.

7) Incubate at room temperature for 15 min, vortexing several times during the incubation.

8) Centrifuge briefly.

9) Place the tube on a magnetic rack, let it stand until the solution is clear, and remove the supernatant.

10) Wash the beads:

A. Add 500 µl of SureSelect Wash 2 prewarmed to 65 °C and vortex for 5 s to resuspend the beads.

B. Incubate at 65 °C for 10 min in a water bath or heat block, vortexing several times during the incubation.

C. If the beads have settled, invert the tube a few times to resuspend them.

D. Centrifuge briefly.

E. Place the tube on a magnetic rack, let it stand until the solution is clear, and remove the supernatant.

F. Repeat steps A–E twice, for a total of 3 washes. Make sure all of the wash buffer is removed.

G. Add 30 µl of nuclease-free water and vortex for 5 s to resuspend the beads.

Experiment 5: Post-hybridization PCR amplification and index introduction

This part covers three steps: introducing indexes by PCR amplification, purifying the PCR products, and library quality control.

Step 1: Introduce indexes by PCR amplification

Reagents used in this step:

·Herculase II Fusion DNA Polymerase(Agilent)

·SureSelect Target Enrichment Kit ILM Indexing Hyb Module Box#2

·SureSelect Library Prep Kit, ILM

Note: do not use any PCR enzyme other than Herculase II Fusion DNA Polymerase; other enzymes have not been validated.

1) Set up one PCR reaction per hybridization, plus one negative control without template.

2) Keep the samples on ice and proceed as follows:

A. Prepare the reaction mix according to the formulation in Table 12 and mix.

B. Add 35 µl of reaction mix to each PCR tube (or well).

C. Take PCR Primer Index 1 through Index 16 (clear caps) from the SureSelect Library Prep Kit, ILM. Add 1 µl of the appropriate index to each well and mix by pipetting.

D. Use a different index primer for each sample that will be sequenced in the same lane.

E. Pipette each DNA sample up and down to make sure the bead suspension is homogeneous.

F. Transfer 14 µl of each sample to the corresponding PCR tube (or well) and mix by pipetting up and down.

Table 12: Herculase II Master Mix formulation

* From the Herculase II Fusion DNA Polymerase (Agilent) kit. Do not use the buffer or dNTP mix from other kits.

a From the SureSelect Target Enrichment Kit ILM Indexing Hyb Module Box #2.

b Use one of the 16 index primers in the SureSelect Library Prep Kit, ILM.

3) Place the PCR tubes in a thermal cycler and amplify using the program in Table 13:

Table 13：

Step 2: Purify the DNA samples with Agencourt AMPure XP beads (as in Step 2 of Experiment 3)

Experiment 6: High-throughput sequencing

Step 1: Library dilution and denaturation

1) Prepare 0.2 N NaOH for denaturation: add 200 µl of 1.0 N NaOH to 800 µl of pure water.

2) Dilute each library to 2 nM and pool the libraries according to the data volume required for each. This gives a pooled library dilution at 2 nM.

3) Add an equal volume (10 µl) of 0.2 N NaOH to 10 µl of the 2 nM library dilution, mix by pipetting 3 times, and start a 5 min timer. During this time, vortex for 10 s and centrifuge, repeating the vortex and centrifugation cycle twice.

4) After 5 min of denaturation, add 970 µl of HT1 to the denatured library. Vortex to mix and centrifuge at 280 × g for 1 min, giving a 20 pM denatured library.

5) Dilute the 20 pM denatured library to 3 pM for loading: add 450 µl of the denatured library solution to 2550 µl of prechilled HT1, invert several times to mix, and centrifuge. This gives 3 ml of 3 pM denatured library.
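The concentrations in this step follow from C1·V1 = C2·V2. The sketch below reproduces them as an illustrative check, not as part of the protocol. It assumes a 1.0 N NaOH stock, since that is what 200 µl diluted into 1 ml requires to give 0.2 N.

```python
def dilute(stock_conc, stock_volume, final_volume):
    """Solve C1*V1 = C2*V2 for C2; both volumes in the same unit."""
    return stock_conc * stock_volume / final_volume


naoh_n = dilute(1.0, 200, 200 + 800)         # 0.2 N NaOH (assumes 1.0 N stock)
denatured_pm = dilute(2000, 10, 10 + 10)     # 2 nM = 2000 pM, halved by the NaOH
ht1_pm = dilute(denatured_pm, 20, 20 + 970)  # about 20.2 pM, the "20 pM" library
loading_pm = dilute(20, 450, 450 + 2550)     # 3 pM in 3 ml for loading
```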

Step 2: Loading the sequencer

1) Prepare the reagent cartridge by thawing it, inspecting it, and adding sodium hypochlorite. Prepare the flow cell by equilibrating it to room temperature, unpacking it, and inspecting it.

2) Preparing the reagent cartridge: first thaw the cartridge, then inspect its large reservoirs to confirm that the reagents have thawed completely.

(1) Thawing the reagent cartridge: the cartridge can be thawed overnight at 2–8 °C. At this temperature the reagents need at least 18 h to thaw completely, and they can be stored at this temperature for up to one week. Water-bath thawing: ① remove the cartridge from -25 °C to -15 °C storage; ② place it in a room-temperature water bath deep enough to submerge the bottom of the cartridge (note: the water must not reach the top of the cartridge); ③ thaw the reagents in the room-temperature water bath for about 60 min, until they are completely thawed; ④ remove the cartridge from the water bath, tap it on the bench to remove water from its base, and dry the base.

(2) Inspecting the reagent cartridge: ① invert the cartridge 5 times to mix the thawed reagents; ② inspect reservoirs 29, 30, 31 and 32 at the bottom of the cartridge to make sure their reagents have thawed completely; ③ tap the cartridge on the bench to dislodge bubbles from the reagents.

(3) Adding fresh NaOCl: to avoid instrument contamination from the previous run, add diluted NaOCl to the Reagent Cartridge before loading it into the NextSeq 500. Illumina recommends diluting 3%–6% NaOCl to 0.03%–0.06%. Note: use the prepared NaOCl within 24 h.

① Prepare 2 ml of 0.03%–0.06% NaOCl by combining 20 µl of 3%–6% NaOCl with 1980 µl of pure water.
② Invert the centrifuge tube several times to mix.
③ Wipe the foil seal over reservoir 28 clean with a paper towel.
④ Pierce the foil seal of reservoir 28 with a clean 1 ml pipette tip.
⑤ Add the 2 ml of 0.03%–0.06% NaOCl to reservoir 28.

(4) Preparing the flow cell: remove the flow cell from 2–8 °C storage, unpack it, wipe it clean, and keep it ready for loading.

3) Add the diluted library to reservoir 10 of the reagent cartridge.

4) In the software interface, select Sequence to start the run setup.

5) Load the flow cell.

6) Empty the waste container and put it back. Load the buffer cartridge, then the reagent cartridge.

7) Review the run parameters and the pre-run check results, then select Start.

8) Monitor the run with the NCS and SAV software.

Experiment 7: Data analysis

Step 1: Data filtering

Raw sequencing data are stored in FASTQ format (file names: *.fq). The data must be filtered before further analysis, as follows:

(1) Remove reads containing adapter sequences.

(2) Remove the read pair (paired reads) when N bases make up more than 10% of the length of either read.

(3) Remove the read pair (paired reads) when low-quality bases (Q ≤ 5) make up more than 50% of the length of either read.
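The three filtering rules can be sketched in Python for one read pair. The sketch makes three assumptions:

- Qualities are Phred+33 encoded.
- Adapter contamination is detected by exact substring match against a supplied adapter fragment. The example uses AGATCGGAAGAGC, the common Illumina adapter prefix, so substitute the kit's own sequence.
- A pair is dropped if either read fails any rule.

Production pipelines normally use dedicated trimming tools. This only illustrates the thresholds.

```python
PHRED_OFFSET = 33              # assumes Phred+33 quality encoding
ADAPTER = "AGATCGGAAGAGC"      # example adapter prefix; replace with the kit's sequence


def read_fails(seq: str, qual: str, adapter: str = ADAPTER) -> bool:
    """True if a single read violates any of rules (1)-(3)."""
    if adapter in seq:                                    # rule (1): adapter present
        return True
    if seq.upper().count("N") > 0.10 * len(seq):          # rule (2): N content > 10%
        return True
    low_quality = sum(1 for ch in qual if ord(ch) - PHRED_OFFSET <= 5)
    return low_quality > 0.50 * len(seq)                  # rule (3): >50% bases Q <= 5


def keep_pair(read1, read2, adapter: str = ADAPTER) -> bool:
    """read1 and read2 are (sequence, quality) tuples of one read pair."""
    return not (read_fails(*read1, adapter) or read_fails(*read2, adapter))
```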

Step 2: Sequence alignment and quality control

Stringent filtering of the sequencing data yields high-quality clean reads (clean data). The clean reads are aligned to the NCBI build 37/hg19 reference genome with BWA (Burrows-Wheeler Alignment tool). Duplicates are removed from the alignments with Picard (http://broadinstitute.github.io/picard/), and reads with more than 5 base mismatches are filtered out.

Step 3: Pathogenic mutation analysis of the target sequences

3.1 SNP and InDel analysis comprises the following steps:

(1) Map the sequenced reads to their corresponding positions in the human genome with SOAP (http://soap.genomics.org.cn/).

(2) Compute the coverage depth of each SNP and InDel, and remove sites with a coverage depth below 30.

(3) Based on the disease and normal sample information, select sites with a population frequency below 2% for further interpretation, and predict their effect on protein function with SIFT. The resulting sites serve as candidate pathogenic sites.

(4) Annotate the mutation sites using dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/), HGMD (www.hgmd.cf.ac.uk), ClinVar (http://www.ncbi.nlm.nih.gov/clinvar/) and LOVD InSiGHT (http://insight-group.org/lovd.html). The candidate pathogenic sites obtained from this analysis are shown in Table 14.

Table 14：

Notes:

"-": no change detected, or no relevant information.

Heterozygous: the two alleles at the same site differ.

Homozygous: the two alleles at the same site are identical.

Nonsense mutation: a base change converts a codon that encodes an amino acid into a stop codon, so that peptide chain synthesis terminates prematurely.

Missense mutation: a base substitution turns a codon encoding one amino acid into a codon encoding another, changing the amino acid composition and sequence of the polypeptide chain.

Splice site: a variant that may affect the splicing of the gene transcript into mature messenger RNA.

Insertion mutation: insertion of nucleotides into the genome that alters the gene's coding sequence.

Deletion mutation: loss of one or more nucleotides from the genome that alters the gene's coding sequence.

3.2 Large-fragment deletion analysis comprises the following steps:

(1) Window partitioning: 100 bp is chosen as the analysis window size, and longer target regions are divided into 100 bp windows. To avoid overly short windows, target regions shorter than 160 bp are not divided.

(2) Use the DepthOfCoverage module of GATK (The Genome Analysis Toolkit) to count the reads of the test sample and of the control sample group in each window, then normalize both. The normalization formula is:

$$\text{normalized read count of a window} = \frac{\text{raw read count of the window} \times 1000}{\text{total read count over all windows}}$$

(3) Using the normalized read counts, compute for each window the standard deviation of the control samples' read counts, denoted Sd, and the median of the control samples' read counts, denoted Med.

(4) For each window, compute the difference between the test sample's normalized read count and the control median, i.e. its deviation from the median. A deletion mutation is called when this deviation exceeds 3 × Sd. The deletion criterion is:

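$$Z_i = \frac{N_i - \mathrm{Med}_i}{\mathrm{Sd}_i}$$

where N_i is the normalized read count of the test sample on window i, and Med_i and Sd_i are the median and standard deviation of the control samples on window i. A deletion is called on the i-th window when Z_i is greater than 3.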
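Steps (1), (3) and (4) can be sketched in Python as below, taking already-normalized counts as input. The sketch makes two choices the text does not state, and both are assumptions:

- A leftover piece shorter than one window is merged into the previous window.
- The sign convention for Z. A deletion lowers read depth, so this sketch calls a deletion when the test sample lies more than 3 standard deviations below the control median, that is, when -Z_i > 3.

The sample standard deviation (`statistics.stdev`) is used for Sd_i, and windows with zero spread among controls are skipped.

```python
from statistics import median, stdev
from typing import List, Sequence, Tuple


def split_target(start: int, end: int, window: int = 100,
                 min_split_len: int = 160) -> List[Tuple[int, int]]:
    """Split a half-open target region [start, end) into analysis windows."""
    if end - start < min_split_len:
        return [(start, end)]                 # short targets are not divided
    windows = [(b, min(b + window, end)) for b in range(start, end, window)]
    # Assumption: a trailing piece shorter than one window is merged
    # into the preceding window instead of being kept as a short window.
    if windows[-1][1] - windows[-1][0] < window:
        windows[-2:] = [(windows[-2][0], end)]
    return windows


def call_deletions(test: Sequence[float],
                   controls: Sequence[Sequence[float]],
                   threshold: float = 3.0) -> List[int]:
    """Return indices of windows called as deletions.

    test: normalized read counts of the test sample, one per window.
    controls: one sequence of normalized counts per control sample.
    """
    calls = []
    for i, value in enumerate(test):
        column = [sample[i] for sample in controls]   # controls on window i
        med, sd = median(column), stdev(column)       # Med_i and Sd_i
        if sd == 0:
            continue                 # Z_i undefined; skipped in this sketch
        z = (value - med) / sd       # Z_i as defined above
        if -z > threshold:           # assumption: deviation taken below the median
            calls.append(i)
    return calls
```

The median is used as the centre because, as argued earlier, it is less affected by an abnormal control than the mean.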

The genes with deletion mutations detected by the above method are shown in Table 15:

Table 15：

Chromosome	Genomic position	Fragment length	Gene	Variant region	Variant type
5	112043353-112198302	154949 bp	APC	Exon region	Large-fragment deletion

The method above was compared with two existing methods. The first measures deviation from the mean. The second calls a deletion when the ratio to the median is below 0.6. The results are shown in Table 16:

Table 16：

Method	Detection rate	Positive predictive value (PPV)
Present invention	13.54%	100%
Deviation from the mean	15.63%	86.67%
Ratio to median below 0.6	20.8%	65%

As Table 16 shows, the method of the present invention reduces the number of false positives among the detected cases compared with the prior-art methods. This brings the positive predictive value to 100% and indicates a marked improvement in the accuracy of positive predictions.
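The percentages in Table 16 are consistent with 96 tested carriers and 13 confirmed positives, with the three methods flagging 13, 15 and 20 carriers respectively. This breakdown is an inference from the reported figures and is not stated in the text. The sketch below shows that these counts reproduce the table. It reads "detection rate" as positive calls divided by samples tested, and PPV as confirmed positives divided by positive calls.

```python
def detection_and_ppv(n_called, n_true_positive, n_tested):
    """Detection rate = calls / tested; PPV = confirmed positives / calls."""
    return n_called / n_tested, n_true_positive / n_called


# Inferred counts: 96 carriers tested, 13 confirmed positives
for name, called in [("present invention", 13),
                     ("deviation from the mean", 15),
                     ("ratio to median below 0.6", 20)]:
    rate, ppv = detection_and_ppv(called, 13, 96)
    print(f"{name}: detection {rate:.3%}, PPV {ppv:.3%}")
```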

Experiment 8: Validation

To verify the accuracy of the large-fragment deletion results above, the mutations of the carriers with positive findings were analyzed by DHPLC. The experimental steps are as follows:

1. Design primers for the target sites with Primer Premier 5.0 (www.premierbiosoft.com/primerdesign). The primers are listed in Table 17:

Table 17:

2. PCR amplification. The reaction system is shown in Table 18 and the amplification program in Table 19.

Table 18：

PCR component	Volume per reaction
10× Buffer I	5 µl
2.5 mM dNTP	4 µl
Primer set	10 µl
HS Taq enzyme (5 U/µl)	0.4 µl
DNA	2.0 µl
ddH₂O	to 50 µl

Table 19：

3. Sequencing of the PCR products

Run 1 µl of PCR product on a 2.0% agarose gel, then send the product for sequencing.

4. The results are shown in Fig. 1.

In Fig. 1, the DHPLC analysis mainly considers the peak areas of the amplification products. Peak area is approximately base (peak width) × height (peak height) / 2. The amount of PCR product (i.e. the copy number) can therefore be judged indirectly by comparing the peak heights of the test sample with those of the standard control. Each peak in Fig. 1 is a separately designed product. Apart from one standard reference gene, the peaks correspond to different exons of the gene under test. Once the reference-gene peaks are aligned (peak base and peak height), comparing the heights of the other peaks between the test sample and the standard control reveals copy-number differences between exons. The peak heights in Fig. 1 show a large-fragment deletion of APC in the test sample, consistent with the next-generation sequencing result. This demonstrates the effectiveness and accuracy of the sequencing-based method of the present invention.
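The peak comparison can be expressed as a ratio of ratios. Each exon peak is first divided by the reference-gene peak in the same run, and the test sample's ratio is then divided by the standard control's. The sketch below is illustrative and assumes a diploid locus: a value near 1.0 suggests two copies, and a value near 0.5 suggests a heterozygous deletion. The function names and peak values are hypothetical.

```python
def triangle_peak_area(width, height):
    """Approximate a chromatogram peak's area as base x height / 2."""
    return width * height / 2


def relative_copy_number(test_exon, test_ref, control_exon, control_ref):
    """(exon / reference in test) divided by (exon / reference in control).

    Any consistent peak measure (height or approximate area) can be used.
    """
    return (test_exon / test_ref) / (control_exon / control_ref)


# Hypothetical peak heights: the exon peak is halved in the test sample
ratio = relative_copy_number(test_exon=40, test_ref=100,
                             control_exon=80, control_ref=100)
print(ratio)  # 0.5: on a diploid locus this suggests one copy instead of two
```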

As can be seen from the above description, the embodiments of the present invention achieve the expected technical effects. Sequence counts are calculated by cutting the sequencing data of the test sample and the control samples into windows, so the window size can be chosen flexibly according to the sequencing depth of the data and the size of the target deletion fragment, which widens the range of deletion sizes that can be detected. In addition, whether a given window carries a deletion mutation is determined from the ratio of the test sample's second sequence count in that window to the median of the control samples. Using the median of the control samples as the comparison standard, rather than the mean and standard deviation, makes false positives easier to distinguish and the results more accurate, because when a window carries no copy-number variation, a determination that uses the mean and standard deviation as the comparison standard compromises the accuracy of the result.
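A general property that helps explain the preference for the control median: one outlying control sample shifts the mean noticeably but barely moves the median. The values below are hypothetical normalized counts for a single window; this is our illustration, not data from the experiments above.

```python
from statistics import mean, median

# Hypothetical normalized counts for one window across six control samples;
# the last control is an outlier (e.g. it carries its own copy-number change).
controls = [1.00, 0.98, 1.02, 1.01, 0.99, 0.50]

print(round(mean(controls), 3))    # 0.917: pulled toward the outlier
print(round(median(controls), 3))  # 0.995: barely affected
```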

It should be noted that the steps illustrated in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions. Although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in an order different from the one given here.

Obviously, those skilled in the art should understand that the modules or steps of the present invention described above may be implemented with a general-purpose computing device. They may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they may each be fabricated as an individual integrated-circuit module, or several of the modules or steps may be fabricated as a single integrated-circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Sequence listing

<110> Tianjin Novogene Biological Information Technology Co., Ltd.

<120> Method and apparatus for detecting gene mutations

<130> PN41432NHZY

<160> 14

<170> PatentIn version 3.5

<210> 1

<211> 20

<212> DNA

<213>Synthetic

<400> 1

tcgggaagcg gagagagaag 20

<210> 2

<211> 20

<212> DNA

<213>Synthetic

<400> 2

agacagtgcg agggaaaacc 20

<210> 3

<211> 20

<212> DNA

<213>Synthetic

<400> 3

atttaccagt gagggacggg 20

<210> 4

<211> 20

<212> DNA

<213>Synthetic

<400> 4

acgcttttga gggttgattc 20

<210> 5

<211> 20

<212> DNA

<213>Synthetic

<400> 5

taaggtgcgt gctttgagag 20

<210> 6

<211> 21

<212> DNA

<213>Synthetic

<400> 6

acatcctgag ggtaaggcta a 21

<210> 7

<211> 25

<212> DNA

<213>Synthetic

<400> 7

tgactgtaat attctaagtc ctacc 25

<210> 8

<211> 20

<212> DNA

<213>Synthetic

<400> 8

gagattctga agttgagcgt 20

<210> 9

<211> 22

<212> DNA

<213>Synthetic

<400> 9

cacaacatca ttcactcaca gc 22

<210> 10

<211> 22

<212> DNA

<213>Synthetic

<400> 10

tacttggatt tttgtcctgg tc 22

<210> 11

<211> 25

<212> DNA

<213>Synthetic

<400> 11

tgacaaagga agaacagata gcaaa 25

<210> 12

<211> 22

<212> DNA

<213>Synthetic

<400> 12

aagcctgggt gacagagtga ga 22

<210> 13

<211> 19

<212> DNA

<213>Synthetic

<400> 13

tgttgactcg atccacccc 19

<210> 14

<211> 21

<212> DNA

<213>Synthetic

<400> 14

tgagctgcaa gtttggctga a 21
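As a quick consistency check on the listing, the sketch below parses entries in the layout used above, compares each declared <211> length with the residues actually given, and reports GC content, a property commonly checked for PCR primers. It handles only single-line sequences such as these primers, and only SEQ ID NOs 1 and 13 are copied in for brevity.

```python
import re

# SEQ ID NOs 1 and 13, copied from the sequence listing above.
LISTING = """
<210> 1
<211> 20
<212> DNA
<213>Synthetic
<400> 1
tcgggaagcg gagagagaag 20
<210> 13
<211> 19
<212> DNA
<213>Synthetic
<400> 13
tgttgactcg atccacccc 19
"""


def parse_listing(text: str) -> dict[int, tuple[int, str]]:
    """Map SEQ ID NO -> (declared <211> length, residues without spaces)."""
    entries: dict[int, tuple[int, str]] = {}
    seq_id, declared = 0, 0
    for line in text.strip().splitlines():
        line = line.strip()
        if m := re.match(r"<210>\s*(\d+)", line):
            seq_id = int(m.group(1))
        elif m := re.match(r"<211>\s*(\d+)", line):
            declared = int(m.group(1))
        elif re.fullmatch(r"[acgtn ]+\d+", line):
            # Residue line: blocks of 10 bases followed by a running position count.
            entries[seq_id] = (declared, re.sub(r"[\s\d]", "", line))
    return entries


def gc_fraction(seq: str) -> float:
    return (seq.count("g") + seq.count("c")) / len(seq)


for seq_id, (declared, seq) in parse_listing(LISTING).items():
    status = "OK" if declared == len(seq) else "MISMATCH"
    print(seq_id, len(seq), status, round(gc_fraction(seq), 2))
# 1 20 OK 0.6
# 13 19 OK 0.58
```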

Claims

1. A method for detecting gene mutations, characterized in that the method comprises the following steps:

Obtaining sequencing data of a test sample and of control samples;

Judging whether an SNP mutation and/or an InDel mutation exists in the sequencing data of the test sample; and

Judging whether a deletion mutation exists in the sequencing data of the test sample;

Wherein the step of judging whether a deletion mutation exists in the sequencing data of the test sample comprises:

Normalization: cutting the sequencing data into windows, counting the sequence count (number of reads) of the test sample and of the control samples in each window, and normalizing the sequence count of each window, to obtain the normalized sequence counts of the test sample and of the control samples in each window;

Standard deviation and median calculation: calculating the standard deviation and the median of the normalized sequence counts of the control samples in each window;

Deviation calculation: calculating, according to formula (1), for each window, the degree of deviation (Z value) of the normalized sequence count of the test sample from the median of the control samples; and

Deletion judgment: when the Z value is greater than 3, judging that the test sample has a deletion mutation in that window.
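The per-window statistics of claim 1 can be sketched as below, starting from already normalized counts. Formula (1) itself is not reproduced in this text; the sketch uses the form given in the abstract, Z = (normalized count of the test sample − control median) / control standard deviation. Because a deletion lowers the test sample's count, the sketch treats the Z > 3 criterion as |Z| > 3 with the test value below the median; that sign handling, and the use of the sample standard deviation, are our assumptions.

```python
from statistics import median, stdev


def call_deletions(test_norm: list[float], control_norm: list[list[float]],
                   z_threshold: float = 3.0) -> list[bool]:
    """test_norm[w]: normalized count of the test sample in window w.
    control_norm[c][w]: normalized count of control sample c in window w.
    Returns one deletion call (True/False) per window."""
    calls = []
    for w, x in enumerate(test_norm):
        column = [sample[w] for sample in control_norm]
        med = median(column)
        sd = stdev(column)  # sample SD (assumption); needs at least 2 controls
        if sd == 0:
            calls.append(False)  # no spread among controls: Z undefined, no call
            continue
        z = (x - med) / sd  # formula (1) as stated in the abstract
        # Assumption: a deletion means |Z| above threshold AND a count below the median.
        calls.append(abs(z) > z_threshold and x < med)
    return calls


# Hypothetical normalized counts: 4 control samples x 3 windows.
controls = [
    [1.00, 1.00, 1.00],
    [1.02, 0.98, 1.01],
    [0.98, 1.02, 0.99],
    [1.01, 0.99, 1.00],
]
test = [1.00, 0.50, 1.01]
print(call_deletions(test, controls))  # [False, True, False]
```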

2. The method according to claim 1, characterized in that, in the normalization step, the sequencing data is cut into contiguous, non-overlapping windows.

3. The method according to claim 1, characterized in that the normalization step comprises:

Cutting the respective sequencing data of the test sample and of the control samples into windows, recording each sample's sequence count in each window as its first sequence count, and summing each sample's first sequence counts to obtain its second sequence count; and

Normalizing the respective first sequence counts of the test sample and of the control samples according to formula (2), to obtain the normalized sequence counts of the test sample and of each control sample in each window.
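The windowing and normalization of claims 2 and 3 can be sketched as follows. Assigning each read to the window that contains its start position is our assumption, and because formula (2) is not reproduced here, the normalization is shown as the plain fraction first sequence count / second sequence count, without any scaling constant the original formula may apply. The window size is a parameter, so it can be matched to sequencing depth and target deletion size.

```python
def window_counts(read_starts: list[int], region_length: int, window_size: int) -> list[int]:
    """Cut the region into contiguous, non-overlapping windows and count reads per
    window (the first sequence count). Reads are assigned by start position (assumption)."""
    n_windows = -(-region_length // window_size)  # ceiling division
    counts = [0] * n_windows
    for pos in read_starts:
        if 0 <= pos < region_length:
            counts[pos // window_size] += 1
    return counts


def normalize(first_counts: list[int]) -> list[float]:
    """Divide each window's first sequence count by the sample total (the second
    sequence count). A stand-in for formula (2), which is not shown in this text."""
    second = sum(first_counts)
    if second == 0:
        raise ValueError("sample has no reads in the region")
    return [c / second for c in first_counts]


# Hypothetical 0-based read start positions on a 1,000 bp region, 250 bp windows.
reads = [10, 60, 300, 320, 610, 640, 700, 990]
counts = window_counts(reads, region_length=1000, window_size=250)
print(counts)             # [2, 2, 3, 1]
print(normalize(counts))  # [0.25, 0.25, 0.375, 0.125]
```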

4. The method according to claim 1, characterized in that the step of judging whether an SNP mutation and/or an InDel mutation exists in the sequencing data of the test sample comprises:

Sequence alignment: aligning the sequencing data of the test sample against a reference genome to obtain an alignment result;

First screening: selecting, from the alignment result, the sites at which an SNP mutation and/or an InDel mutation exists, recorded as first candidate sites;

Second screening: selecting, from the first candidate sites, the sites whose population mutation frequency is below 2%, recorded as second candidate sites;

SNP and/or InDel mutation judgment: judging, according to the functional annotation of the second candidate sites in a functional annotation database, whether the second candidate sites include an SNP mutation site and/or an InDel mutation site that changes gene function; if so, recording such second candidate sites as third candidate sites; and

SNP and/or InDel mutation confirmation: when third candidate sites exist, determining the third candidate sites to be SNP mutation sites and/or InDel mutation sites.
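The screening cascade of claim 4 can be sketched as successive filters. The Variant record, its field names and the example sites are hypothetical; in practice alignment, variant calling and functional annotation would come from dedicated tools whose output is reduced here to a type label, a population frequency and a single function-altering flag.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # Hypothetical record summarizing one called site after annotation.
    chrom: str
    pos: int
    kind: str                # "SNP", "InDel" or anything else
    population_af: float     # population mutation frequency, 0-1
    function_altering: bool  # flag taken from a functional annotation database


def screen(variants: list[Variant], max_pop_af: float = 0.02) -> list[Variant]:
    # First screening: keep SNP and InDel sites (first candidate sites).
    first = [v for v in variants if v.kind in ("SNP", "InDel")]
    # Second screening: population frequency below 2% (second candidate sites).
    second = [v for v in first if v.population_af < max_pop_af]
    # Judgment: keep sites annotated as changing gene function (third candidate sites).
    third = [v for v in second if v.function_altering]
    # Confirmation: any remaining third candidate sites are reported as mutations.
    return third


calls = [
    Variant("chr1", 100, "SNP", 0.0001, True),   # rare and function-altering: kept
    Variant("chr1", 200, "SNP", 0.30, True),     # common in the population: dropped
    Variant("chr1", 300, "InDel", 0.001, False), # no functional effect: dropped
]
print([(v.chrom, v.pos) for v in screen(calls)])  # [('chr1', 100)]
```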

5. The method according to any one of claims 1 to 4, characterized in that, before the step of obtaining the sequencing data of the test sample and the control samples, the method further comprises a step of preparing exon libraries of the test sample and of the control samples respectively, the exon libraries being prepared by liquid-phase capture.

6. The method according to claim 5, characterized in that, before the preparation by liquid-phase capture, the method further comprises a step of designing liquid-phase capture probes according to the exon regions of target genes.

7. The method according to claim 6, characterized in that the exon library preparation step comprises preparing exon libraries of multiple target genes, the multiple target genes including at least the following genes: MLH1, MSH2, MSH3, MSH6, PMS1, PMS2, BUB1, BUB3, STK11, PTEN, SMAD4, APC, MUTYH, EPCAM, SETD2, MAX, TSC2, ATM and FANCC.

8. A device for detecting deletion mutations, characterized in that the device comprises:

An acquisition module, configured to obtain sequencing data of a test sample and of control samples;

A first judgment module, configured to judge whether an SNP mutation and/or an InDel mutation exists in the sequencing data of the test sample; and

A second judgment module, configured to judge whether a deletion mutation exists in the sequencing data of the test sample;

Wherein the second judgment module comprises:

A normalization submodule, configured to cut the sequencing data into windows, count the sequence counts of the test sample and of the control samples in each window, and normalize the sequence count of each window, to obtain the normalized sequence counts of the test sample and of the control samples in each window;

A first calculation submodule, configured to calculate the standard deviation and the median of the normalized sequence counts of the control samples in each window;

A second calculation submodule, configured to calculate, according to formula (1), for each window, the degree of deviation (Z value) of the normalized sequence count of the test sample from the median of the control samples; and

A deletion judgment submodule, configured to judge, when the Z value is greater than 3, that the test sample has a deletion mutation in that window.

9. The device according to claim 8, characterized in that the normalization submodule further comprises:

A statistics unit, configured to count the sequence count of the test sample and of each control sample in each window, recorded as the respective first sequence count, and to sum the first sequence counts over all windows of each sample, recorded as the respective second sequence count; and

A calculation unit, configured to normalize the first sequence counts of the test sample and of the control samples in each window according to formula (2), to obtain the normalized sequence counts of the test sample and of each control sample in each window.

10. The device according to claim 8, characterized in that the first judgment module comprises:

A sequence alignment submodule, configured to align the sequencing data of the test sample against a reference genome to obtain an alignment result;

A first screening submodule, configured to select, from the alignment result, the sites at which an SNP mutation and/or an InDel mutation exists, recorded as first candidate sites;

A second screening submodule, configured to select, from the first candidate sites, the sites whose population mutation frequency is below 2%, recorded as second candidate sites;

An SNP and/or InDel mutation judgment submodule, configured to judge, according to the functional annotation of the second candidate sites in a functional annotation database, whether the second candidate sites include an SNP mutation site and/or an InDel mutation site that changes gene function, and if so, to record such second candidate sites as third candidate sites; and

An SNP and/or InDel mutation confirmation submodule, configured to determine, when third candidate sites exist, the third candidate sites to be SNP mutation sites and/or InDel mutation sites.

11. The device according to claim 8, characterized in that, before the acquisition module obtains the sequencing data of the test sample and the control samples, the device further comprises an exon library preparation module, configured to prepare exon libraries of the test sample and of the control samples by liquid-phase capture.

12. The device according to claim 11, characterized in that, before the exon library preparation module prepares the exon libraries of the test sample and the control samples, the device further comprises a probe design module, configured to design liquid-phase capture probes according to the exon regions of target genes.