WO2021077411A1

WO2021077411A1 - Chromosome instability detection method, system and test kit

Info

Publication number: WO2021077411A1
Application number: PCT/CN2019/113332
Authority: WO
Inventors: 钱自亮; 王白云; 徐文胜
Original assignee: Suzhou Hongyuan Biotech Co Ltd
Current assignee: Suzhou Hongyuan Biotech Co Ltd
Priority date: 2019-10-25
Filing date: 2019-10-25
Publication date: 2021-04-29
Anticipated expiration: 2022-04-25

Abstract

A chromosome instability detection method, a system and a test kit, the detection method comprising: dividing a reference DNA sequence into a number n of regions; sequencing extracted DNA in a sample, obtaining a DNA sequence including multiple fragments; comparing the obtained DNA sequence and the reference DNA sequence, and outputting all fragment numbers, fragment start and termination positions and fragment nucleobase comparison results having a high degree of similarity; counting a number Ci of all highly similar fragments falling within a region numbered i; performing normalization of Ci; determining whether a normalized number Sigi is significantly higher than normalized numbers for other regions, and if so, reporting that an abnormality has occurred for region number i and the normalized number Sigi, and acquiring normalized numbers corresponding to all regions of the n regions in the present sample for which an abnormality has occurred; acquiring a sample again, and obtaining normalized numbers corresponding to abnormalities in the newly acquired sample; calculating a degree of similarity value for detection values of the former and latter samples; if the degree of similarity value is greater than a set value, outputting a chromosome instability detection result.

Description

Chromosome instability detection method, system and kit

Technical field

The present invention relates to the technical field of chromosome instability, and in particular to a chromosome instability detection method, a chromosome instability detection system, and a kit containing the chromosome instability detection system.

Background art

Chromosomal instability is usually associated with tumors and specifically includes the deletion or amplification of copies of entire chromosomes or chromosome fragments. Amplifications and deletions of chromosomes or chromosome fragments related to tumorigenesis are often specific to tumorigenesis. Detecting the chromosomally unstable regions of a tumor is therefore essential both for studying tumorigenesis and for developing tumor diagnostic techniques.

Summary of the invention

In view of the problems and deficiencies of the prior art, the present invention provides a novel chromosome instability detection method, system and kit.

The present invention solves the above technical problems through the following technical solutions:

The present invention provides a chromosome instability detection method, characterized in that it comprises the following steps:

S1. Divide all fragments of the reference DNA sequence into intervals, obtaining n intervals;

S2. Obtain a sample, extract the required DNA from the sample, sequence the extracted DNA to obtain a set of DNA sequences containing multiple fragments, and assign a fragment number to each fragment;

S3. Compare all fragments of the obtained DNA sequences base by base with the corresponding fragments of the reference DNA sequence and, for every fragment that is highly similar to its corresponding fragment of the reference DNA sequence, output the fragment number, the fragment start position, the fragment end position and the base-by-base comparison result;

S4. Count the number C_i of highly similar fragments that fall into the i-th interval of the reference DNA sequence, 1≤i≤n;

S5. Normalize the number C_i to obtain a normalized number Sig_i;

S6. Determine whether the normalized number Sig_i of the i-th interval is significantly higher than the normalized numbers of the other n-1 intervals; if so, report that the i-th interval is abnormal and report the normalized number Sig_i, thereby obtaining the normalized numbers of all intervals, among the n intervals, in which an abnormality occurs in this sample;

S7. Obtain a sample again, and obtain the normalized numbers of all intervals, among the n intervals, in which an abnormality occurs in the new sample;

S8. Calculate a similarity value between the detection values of the two successive samples based on the normalized numbers of the two samples;

S9. Determine whether the similarity value is greater than a set value; if so, output the chromosome instability detection result.

Preferably, step S6 is replaced by: combining the n intervals to obtain k interval combinations, calculating the normalized number Sig_r of each of the k interval combinations, and determining whether the normalized number Sig_r of the r-th interval combination is significantly higher than the normalized numbers of the other k-1 interval combinations; if so, reporting that the r-th interval is abnormal and reporting the normalized number Sig_r, thereby obtaining the normalized numbers of all interval combinations, among the k interval combinations, in which an abnormality occurs in this sample, 1≤r≤k, 1≤k≤n;

Step S7 is replaced by: obtaining a sample again, and obtaining the normalized numbers of all interval combinations, among the k interval combinations, in which an abnormality occurs in the new sample.

Preferably, in step S2, the sample includes blood, biological tissue, exfoliated cells, interstitial fluid, pleural effusion, saliva or cerebrospinal fluid, and the DNA is sequenced using a DNA sequencing system.

Preferably, in step S5, the algorithm used for normalization is a GC content algorithm, a secondary structure algorithm or another DNA sequence feature algorithm.

Preferably, in step S6, the statistical method for determining whether the normalized number Sig_i of the i-th interval is significantly higher than the normalized numbers of the other n-1 intervals includes Sig_i deviating from the mean of the normalized numbers of the other n-1 intervals by no less than 2 times the variance.

Preferably, in step S6, the statistical method for determining whether the normalized number Sig_r of the r-th interval combination is significantly higher than the normalized numbers of the other k-1 interval combinations includes a T-test algorithm.

Preferably, in step S8, the statistical methods for calculating the similarity value between the detection values of the two successive samples include, but are not limited to, the Pearson correlation coefficient and the rank correlation coefficient.

The present invention also provides a chromosome instability detection system, characterized in that it comprises a division module, an acquisition module, a comparison module, a counting module, a processing module, a judgment module, a calculation module and an output module;

The division module is used to divide all fragments of the reference DNA sequence into intervals, obtaining n intervals;

The acquisition module is used to acquire a sample, extract the required DNA from the acquired sample, sequence the extracted DNA to obtain a set of DNA sequences containing multiple fragments, and assign a fragment number to each fragment;

The comparison module is used to compare all fragments of the obtained DNA sequences base by base with the corresponding fragments of the reference DNA sequence and, for every fragment that is highly similar to its corresponding fragment of the reference DNA sequence, output the fragment number, the fragment start position, the fragment end position and the base-by-base comparison result;

The counting module is used to count the number C_i of highly similar fragments that fall into the i-th interval of the reference DNA sequence, 1≤i≤n;

The processing module is used to normalize the number C_i to obtain a normalized number Sig_i;

The judgment module is used to determine whether the normalized number Sig_i of the i-th interval is significantly higher than the normalized numbers of the other n-1 intervals; if so, it reports that the i-th interval is abnormal and reports the normalized number Sig_i, thereby obtaining the normalized numbers of all intervals, among the n intervals, in which an abnormality occurs in this sample;

The acquisition module, comparison module, counting module, processing module and judgment module are invoked again to acquire a sample again and obtain the normalized numbers of all intervals, among the n intervals, in which an abnormality occurs in the new sample;

The calculation module is used to calculate a similarity value between the detection values of the two successive samples based on the normalized numbers of the two samples;

The output module is used to determine whether the similarity value is greater than a set value and, if so, to output the chromosome instability detection result.

Preferably, the system further comprises a combination module, which is used to combine the n intervals to obtain k interval combinations and to calculate the normalized number Sig_r of each of the k interval combinations;

The judgment module is used to determine whether the normalized number Sig_r of the r-th interval combination is significantly higher than the normalized numbers of the other k-1 interval combinations; if so, it reports that the r-th interval is abnormal and reports the normalized number Sig_r, thereby obtaining the normalized numbers of all interval combinations, among the k interval combinations, in which an abnormality occurs in this sample, 1≤r≤k, 1≤k≤n;

The acquisition module, comparison module, counting module, processing module, combination module and judgment module are invoked again to acquire a sample again and obtain the normalized numbers of all interval combinations, among the k interval combinations, in which an abnormality occurs in the new sample.

Preferably, the sample includes blood, biological tissue, exfoliated cells, interstitial fluid, pleural effusion, saliva or cerebrospinal fluid, and the DNA is sequenced using a DNA sequencing system.

Preferably, the algorithm used for normalization is a GC content algorithm, a secondary structure algorithm or another DNA sequence feature algorithm.

Preferably, the statistical method used by the judgment module to determine whether the normalized number Sig_i of the i-th interval is significantly higher than the normalized numbers of the other n-1 intervals includes Sig_i deviating from the mean of the normalized numbers of the other n-1 intervals by no less than 2 times the variance.

Preferably, the statistical method used by the judgment module to determine whether the normalized number Sig_r of the r-th interval combination is significantly higher than the normalized numbers of the other k-1 interval combinations includes a T-test algorithm.

Preferably, the statistical methods used by the calculation module to calculate the similarity value between the detection values of the two successive samples include, but are not limited to, the Pearson correlation coefficient and the rank correlation coefficient.

The present invention also provides a kit, characterized in that it comprises the chromosome instability detection system described above.

On the basis of common knowledge in the field, the above preferred conditions can be combined arbitrarily to obtain the preferred examples of the present invention.

The positive and progressive effects of the present invention are as follows:

The present invention collects samples and can accurately detect the chromosomally unstable regions in the samples, providing a scientific basis for subsequent early clinical diagnosis and for formulating individualized treatment plans.

Description of the drawings

Fig. 1 is a flowchart of the chromosome instability detection method according to Embodiment 1 of the present invention.

Fig. 2 is a structural block diagram of the chromosome instability detection system according to Embodiment 1 of the present invention.

Fig. 3 is a flowchart of the chromosome instability detection method according to Embodiment 2 of the present invention.

Fig. 4 is a structural block diagram of the chromosome instability detection system according to Embodiment 2 of the present invention.

Detailed description of the embodiments

To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative work fall within the protection scope of the present invention.

Embodiment 1

As shown in Fig. 1, this embodiment provides a chromosome instability detection method, which comprises the following steps:

Step 101: Divide all fragments of the reference DNA sequence into intervals, obtaining n intervals.
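The interval division of step 101 can be illustrated with a minimal sketch. The text does not specify how the intervals are chosen, so fixed-size, non-overlapping intervals are an assumption here, and the names `divide_reference`, `chrom_lengths` and `bin_size`, as well as the toy chromosome lengths, are hypothetical.

```python
from typing import Dict, List, Tuple

# An interval is (chromosome, start, end), 0-based and half-open: [start, end).
Interval = Tuple[str, int, int]


def divide_reference(chrom_lengths: Dict[str, int], bin_size: int) -> List[Interval]:
    """Split each reference sequence into consecutive fixed-size intervals.

    Fixed-size intervals are an assumption of this sketch; the source only
    requires that the reference be divided into n intervals.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    intervals: List[Interval] = []
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, bin_size):
            # The last interval of a sequence may be shorter than bin_size.
            intervals.append((chrom, start, min(start + bin_size, length)))
    return intervals


# Toy reference with made-up lengths; n is the number of intervals produced.
intervals = divide_reference({"chrA": 2500, "chrB": 1000}, bin_size=1000)
n = len(intervals)
```

With these toy lengths the reference is split into four intervals, the third of which is shorter than the others because it ends at the end of `chrA`.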

Step 102: Obtain a sample, extract the required DNA from the sample, sequence the extracted DNA to obtain a set of DNA sequences containing multiple fragments, and assign a fragment number to each fragment.

In step 102, the sample includes blood, biological tissue, exfoliated cells, interstitial fluid, pleural effusion, saliva or cerebrospinal fluid.

The DNA is sequenced using a DNA sequencing system, which includes but is not limited to Illumina HiSeq, NextSeq, Life Ion, BGISEQ200, BGISEQ500, BGISEQ2000, PacBio, MinIon, Nanopore, Roche Solid, etc.

Step 103: Compare all fragments of the obtained DNA sequences base by base with the corresponding fragments of the reference DNA sequence and, for every fragment that is highly similar to its corresponding fragment of the reference DNA sequence, output the fragment number, the fragment start position, the fragment end position and the base-by-base comparison result.
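As a rough illustration of the output of step 103, the sketch below keeps only fragments whose base-by-base identity with the reference exceeds a threshold. The source does not define "highly similar", so the 0.95 cut-off, the ungapped comparison, and the names `Alignment`, `identity` and `highly_similar` are all assumptions made for illustration; in practice the comparison would normally be produced by a dedicated read aligner.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Alignment:
    fragment_id: int  # fragment number assigned after sequencing
    chrom: str
    start: int        # 0-based start position on the reference
    end: int          # exclusive end position on the reference
    read_bases: str   # bases of the sequenced fragment
    ref_bases: str    # reference bases over [start, end)


def identity(aln: Alignment) -> float:
    """Fraction of positions where the fragment base equals the reference base."""
    if len(aln.read_bases) != len(aln.ref_bases) or not aln.ref_bases:
        return 0.0
    matches = sum(a == b for a, b in zip(aln.read_bases, aln.ref_bases))
    return matches / len(aln.ref_bases)


def highly_similar(alns: List[Alignment], min_identity: float = 0.95) -> List[Alignment]:
    """Keep fragments whose identity reaches the (assumed) threshold."""
    return [a for a in alns if identity(a) >= min_identity]


# Toy data: fragment 1 matches perfectly, fragment 2 has 5 mismatches in 20 bases.
alignments = [
    Alignment(1, "chrA", 0, 20, "ACGT" * 5, "ACGT" * 5),
    Alignment(2, "chrA", 40, 60, "AAAAA" + "ACGT" * 3 + "ACG", "CCCCC" + "ACGT" * 3 + "ACG"),
]
kept = highly_similar(alignments)
kept_ids = [a.fragment_id for a in kept]
```

Each retained record carries the fragment number and the start and end positions that the counting step needs.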

Step 104: Count the number C_i of highly similar fragments that fall into the i-th interval of the reference DNA sequence, 1≤i≤n.
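The counting in step 104 can be sketched as follows. The source does not say how a fragment that spans two intervals is assigned, so this sketch assigns each fragment to the interval containing its start position, which is an assumption; the function name `count_fragments` and the toy data are hypothetical, and the intervals are assumed to be listed in ascending order within each chromosome.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import List, Tuple

Interval = Tuple[str, int, int]   # (chromosome, start, end), half-open
Fragment = Tuple[str, int, int]   # (chromosome, start, end) of an aligned fragment


def count_fragments(intervals: List[Interval], fragments: List[Fragment]) -> List[int]:
    """Return C_i, the number of fragments whose start falls in interval i."""
    starts = defaultdict(list)  # chromosome -> interval start positions
    index = defaultdict(list)   # chromosome -> interval indices, same order
    for i, (chrom, start, _end) in enumerate(intervals):
        starts[chrom].append(start)
        index[chrom].append(i)

    counts = [0] * len(intervals)
    for chrom, frag_start, _frag_end in fragments:
        if chrom not in starts:
            continue  # fragment aligned outside the divided reference
        pos = bisect_right(starts[chrom], frag_start) - 1
        if pos < 0:
            continue
        i = index[chrom][pos]
        if frag_start < intervals[i][2]:
            counts[i] += 1
    return counts


intervals = [("chrA", 0, 1000), ("chrA", 1000, 2000), ("chrB", 0, 1000)]
fragments = [
    ("chrA", 10, 160),
    ("chrA", 990, 1140),   # spans two intervals, counted in the first
    ("chrA", 1500, 1650),
    ("chrB", 999, 1149),
    ("chrC", 5, 155),      # chromosome not in the reference, ignored
]
counts = count_fragments(intervals, fragments)
```

Binary search keeps the lookup logarithmic in the number of intervals per chromosome, which matters when n is large.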

Step 105: Normalize the number C_i to obtain a normalized number Sig_i.

In step 105, the algorithm used for normalization is a GC content algorithm, a secondary structure algorithm or another DNA sequence feature algorithm.
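The source names a "GC content algorithm" for step 105 without giving its details. One possible reading, sketched below as an assumption rather than the method of the invention, groups intervals with similar GC fraction and divides each count by the median count of its group. The names `gc_fraction` and `normalize_by_gc`, the stratum width `step`, and the toy numbers are hypothetical.

```python
from collections import defaultdict
from statistics import median
from typing import List


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a reference interval sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def normalize_by_gc(counts: List[int], gc: List[float], step: float = 0.05) -> List[float]:
    """Return Sig_i = C_i divided by the median count of intervals with similar GC."""
    keys = [round(g / step) for g in gc]  # GC stratum of each interval
    strata = defaultdict(list)
    for key, c in zip(keys, counts):
        strata[key].append(c)
    medians = {key: median(values) for key, values in strata.items()}
    return [c / medians[k] if medians[k] > 0 else 0.0 for k, c in zip(keys, counts)]


# Toy data: three intervals near 40% GC and two near 60% GC.
counts = [100, 120, 300, 50, 60]
gc = [0.40, 0.41, 0.39, 0.60, 0.61]
sig = normalize_by_gc(counts, gc)
```

After this kind of normalization a value near 1 means the interval has typical coverage for its GC content, so the third interval (2.5) stands out while the GC-rich intervals are no longer penalized for their lower raw counts.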

Step 106: Determine whether the normalized number Sig_i of the i-th interval is significantly higher than the normalized numbers of the other n-1 intervals; if so, report that the i-th interval is abnormal and report the normalized number Sig_i, thereby obtaining the normalized numbers of all intervals, among the n intervals, in which an abnormality occurs in this sample.

In step 106, the statistical methods for determining whether the normalized number Sig_i of the i-th interval is significantly higher than the normalized numbers of the other n-1 intervals include, but are not limited to, Sig_i deviating from the mean of the normalized numbers of the other n-1 intervals by no less than 2 times the variance.
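The criterion of step 106 can be written as a leave-one-out test. The sketch below follows the wording of the text (deviation of at least 2 times the variance of the other n-1 values); reading "significantly higher" as a one-sided test and using the population variance are assumptions, and the function name `abnormal_intervals` is hypothetical.

```python
from statistics import mean, pvariance
from typing import Dict, List


def abnormal_intervals(sig: List[float], factor: float = 2.0) -> Dict[int, float]:
    """Map the index of each abnormal interval to its normalized number Sig_i.

    Interval i is reported when Sig_i exceeds the mean of the other n-1
    values by at least `factor` times their variance.
    """
    abnormal: Dict[int, float] = {}
    for i, value in enumerate(sig):
        others = sig[:i] + sig[i + 1:]
        if len(others) < 2:
            continue  # variance of fewer than two values is not informative
        if value - mean(others) >= factor * pvariance(others):
            abnormal[i] = value
    return abnormal


sig = [1.0, 1.1, 0.9, 1.0, 2.5]
flagged = abnormal_intervals(sig)
```

Because the variance is expressed in squared units, this threshold becomes very small when the values cluster tightly around 1, so a variant that uses the standard deviation instead may be worth comparing on real data.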

Step 107: Obtain a sample again, and obtain the normalized numbers of all intervals, among the n intervals, in which an abnormality occurs in the new sample.

Step 108: Calculate a similarity value between the detection values of the two successive samples based on the normalized numbers of the two samples.

In step 108, the statistical methods for calculating the similarity value between the detection values of the two successive samples include, but are not limited to, the Pearson correlation coefficient and the rank correlation coefficient.

Step 109: Determine whether the similarity value is greater than a set value; if so, output the chromosome instability detection result.
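Steps 107 to 109 can be sketched with a plain implementation of the Pearson coefficient and a rank (Spearman) coefficient. The source does not say how the two samples' abnormal intervals are paired, so comparing both samples over the union of their abnormal intervals is an assumption, as are the set value of 0.8, the function names and the toy data.

```python
from math import sqrt
from typing import Iterable, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for j in range(pos, end + 1):
            result[order[j]] = (pos + end) / 2 + 1
        pos = end + 1
    return result


def similarity(first_sig: Sequence[float], second_sig: Sequence[float],
               abnormal_first: Iterable[int], abnormal_second: Iterable[int],
               method: str = "pearson") -> float:
    """Compare two samples over the union of their abnormal intervals (assumed pairing)."""
    idx = sorted(set(abnormal_first) | set(abnormal_second))
    x = [first_sig[i] for i in idx]
    y = [second_sig[i] for i in idx]
    if method == "rank":
        return pearson(ranks(x), ranks(y))
    return pearson(x, y)


# Toy normalized numbers for the first and the repeated sample.
first_sig = [1.0, 2.5, 1.1, 3.0, 0.9, 1.8]
second_sig = [0.9, 2.4, 1.0, 3.2, 1.0, 1.9]
r = similarity(first_sig, second_sig, {1, 3, 5}, {1, 3})
rho = similarity(first_sig, second_sig, {1, 3, 5}, {1, 3}, method="rank")

SET_VALUE = 0.8  # hypothetical threshold; the source only calls it a "set value"
report_instability = r > SET_VALUE
```

The rank coefficient only looks at the ordering of the values, so it is less sensitive than the Pearson coefficient to a single extreme interval.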

As shown in Fig. 2, this embodiment also provides a chromosome instability detection system, which comprises a division module 11, an acquisition module 12, a comparison module 13, a counting module 14, a processing module 15, a judgment module 16, a calculation module 17 and an output module 18.

The division module 11 is used to divide all fragments of the reference DNA sequence into intervals, obtaining n intervals.

The acquisition module 12 is used to acquire a sample, extract the required DNA from the acquired sample, sequence the extracted DNA to obtain a set of DNA sequences containing multiple fragments, and assign a fragment number to each fragment. The sample includes blood, biological tissue, exfoliated cells, interstitial fluid, pleural effusion, saliva or cerebrospinal fluid, and the DNA is sequenced using a DNA sequencing system.

The comparison module 13 is used to compare all fragments of the obtained DNA sequences base by base with the corresponding fragments of the reference DNA sequence and, for every fragment that is highly similar to its corresponding fragment of the reference DNA sequence, output the fragment number, the fragment start position, the fragment end position and the base-by-base comparison result.

The counting module 14 is used to count the number C_i of highly similar fragments that fall into the i-th interval of the reference DNA sequence, 1≤i≤n.

The processing module 15 is used to normalize the number C_i to obtain a normalized number Sig_i; the algorithm used for normalization is a GC content algorithm, a secondary structure algorithm or another DNA sequence feature algorithm.

The judgment module 16 is used to determine whether the normalized number Sig_i of the i-th interval is significantly higher than the normalized numbers of the other n-1 intervals; if so, it reports that the i-th interval is abnormal and reports the normalized number Sig_i, thereby obtaining the normalized numbers of all intervals, among the n intervals, in which an abnormality occurs in this sample.

The statistical method used by the judgment module 16 to determine whether the normalized number Sig_i of the i-th interval is significantly higher than the normalized numbers of the other n-1 intervals includes Sig_i deviating from the mean of the normalized numbers of the other n-1 intervals by no less than 2 times the variance.

The acquisition module 12, comparison module 13, counting module 14, processing module 15 and judgment module 16 are invoked again to acquire a sample again and obtain the normalized numbers of all intervals, among the n intervals, in which an abnormality occurs in the new sample.

The calculation module 17 is used to calculate a similarity value between the detection values of the two successive samples based on the normalized numbers of the two samples. The statistical methods used by the calculation module to calculate the similarity value include, but are not limited to, the Pearson correlation coefficient and the rank correlation coefficient.

The output module 18 is used to determine whether the similarity value is greater than a set value and, if so, to output the chromosome instability detection result.

This embodiment also provides a kit, which comprises the chromosome instability detection system described above.

Embodiment 2

As shown in Fig. 3, this embodiment provides a chromosome instability detection method, which comprises the following steps:

Step 201: Divide all fragments of the reference DNA sequence into intervals, obtaining n intervals.

Step 202: Obtain a sample, extract the required DNA from the sample, sequence the extracted DNA to obtain a set of DNA sequences containing multiple fragments, and assign a fragment number to each fragment.

In step 202, the sample includes blood, biological tissue, exfoliated cells, interstitial fluid, pleural effusion, saliva or cerebrospinal fluid.

Step 203: Compare all fragments of the obtained DNA sequences base by base with the corresponding fragments of the reference DNA sequence and, for every fragment that is highly similar to its corresponding fragment of the reference DNA sequence, output the fragment number, the fragment start position, the fragment end position and the base-by-base comparison result.

Step 204: Count the number C_i of highly similar fragments that fall into the i-th interval of the reference DNA sequence, 1≤i≤n.

Step 205: Normalize the number C_i to obtain a normalized number Sig_i.

In step 205, the algorithm used for normalization is a GC content algorithm, a secondary structure algorithm or another DNA sequence feature algorithm.

Step 206: Combine the n intervals to obtain k interval combinations, calculate the normalized number Sig_r of each of the k interval combinations, and determine whether the normalized number Sig_r of the r-th interval combination is significantly higher than the normalized numbers of the other k-1 interval combinations; if so, report that the r-th interval is abnormal and report the normalized number Sig_r, thereby obtaining the normalized numbers of all interval combinations, among the k interval combinations, in which an abnormality occurs in this sample, 1≤r≤k, 1≤k≤n.

In step 206, the statistical methods for determining whether the normalized number Sig_r of the r-th interval combination is significantly higher than the normalized numbers of the other k-1 interval combinations include, but are not limited to, a T-test algorithm.
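Step 206 can be sketched as follows. The source does not state how intervals are combined, how Sig_r is derived from its member intervals, or which form of T-test is meant. The sketch therefore assumes that consecutive intervals are grouped in fixed-size runs, that Sig_r is the mean of the member values, and that a one-sided Welch t statistic is compared with a fixed cut-off instead of a p-value. All names and the cut-off of 3.0 are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Dict, List, Sequence


def combine(sig: Sequence[float], size: int) -> List[List[float]]:
    """Group consecutive intervals into combinations of `size` intervals."""
    return [list(sig[i:i + size]) for i in range(0, len(sig), size)]


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic for mean(a) - mean(b); needs at least 2 values each."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 0.0  # no spread at all: treat as not significant in this sketch
    return (mean(a) - mean(b)) / se


def abnormal_combinations(sig: Sequence[float], size: int,
                          t_threshold: float = 3.0) -> Dict[int, float]:
    """Map the index r of each abnormal combination to Sig_r (mean of its members)."""
    groups = combine(sig, size)
    abnormal: Dict[int, float] = {}
    for r, group in enumerate(groups):
        rest = [v for j, g in enumerate(groups) if j != r for v in g]
        if len(group) < 2 or len(rest) < 2:
            continue  # sample variance needs at least two values
        if welch_t(group, rest) >= t_threshold:
            abnormal[r] = mean(group)
    return abnormal


# Nine normalized numbers grouped into k = 3 combinations of 3 intervals.
sig = [1.0, 1.1, 0.9, 2.4, 2.6, 2.5, 1.0, 0.9, 1.1]
flagged = abnormal_combinations(sig, size=3)
```

Testing a combination against all remaining intervals pools more observations than testing a single interval, which is one reason combining intervals can make a gain easier to detect.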

Step 207: Obtain a sample again, and obtain the normalized numbers of all interval combinations, among the k interval combinations, in which an abnormality occurs in the new sample.

Step 208: Calculate a similarity value between the detection values of the two successive samples based on the normalized numbers of the two samples.

In step 208, the statistical methods for calculating the similarity value between the detection values of the two successive samples include, but are not limited to, the Pearson correlation coefficient and the rank correlation coefficient.

Step 209: Determine whether the similarity value is greater than a set value; if so, output the chromosome instability detection result.

如图4所示，本实施例还提供一种染色体不稳定性检测系统，其包括划分模块21、采集模块22、对比模块23、计数模块24、处理模块25、组合模块26、判断模块27、计算模块28和输出模块29。As shown in Figure 4, this embodiment also provides a chromosome instability detection system, which includes a dividing module 21, an acquisition module 22, a comparison module 23, a counting module 24, a processing module 25, a combination module 26, a judgment module 27, Calculation module 28 and output module 29.

所述划分模块21用于对参考DNA序列的所有片段进行区间划分以划分出n个区间。The dividing module 21 is used to divide all segments of the reference DNA sequence to divide n intervals.

所述采集模块22用于采集样本，从采集的样本中提取出所需的DNA，并对提取出的DNA进行DNA序列测序以获得一组含有多个片段的DNA序列，对每一片段进行片段编号。样本包括血液、生物组织、脱落细胞、细胞间隙液、胸腔积液、唾液或脑脊液，采用DNA序列测序系统进行DNA序列测序。The collection module 22 is used to collect samples, extract the required DNA from the collected samples, and perform DNA sequence sequencing on the extracted DNA to obtain a set of DNA sequences containing multiple fragments, and perform fragmentation on each fragment. Numbering. Samples include blood, biological tissues, exfoliated cells, intercellular fluid, pleural effusion, saliva or cerebrospinal fluid, and DNA sequence sequencing is performed using a DNA sequencing system.

所述对比模块23用于将获得的DNA序列的所有片段与参考DNA序列的对应片段进行碱基的一一对比，输出所有的DNA序列的片段与参考DNA序列的对应片段高度相似的片段编号、片段起始位置、片段终止位置和片段碱基一一对比结果。The comparison module 23 is used to compare all the fragments of the obtained DNA sequence with the corresponding fragments of the reference DNA sequence one by one, and output the fragment numbers of all the fragments of the DNA sequence that are highly similar to the corresponding fragments of the reference DNA sequence. Fragment start position, fragment end position and fragment base are compared one by one.

所述计数模块24用于计数所有高度相似的片段中落入参考DNA序列的第i个区间的数目C _i，1≤i≤n。 _{The counting module 24 is used to count the number C i of} all highly similar fragments falling in the i-th interval of the reference DNA sequence, 1≤i≤n.

所述处理模块25用于对数目C _i进行归一化处理以获得归一化数目Sig _i，归一化采用的算法为GC含量算法、二级结构算法或其他DNA序列特征算法。 The processing module 25 is used to _{normalize the number C i} to obtain the normalized number Sig _i , and the algorithm used for the normalization is a GC content algorithm, a secondary structure algorithm or other DNA sequence feature algorithms.

所述组合模块26用于对n个区间进行区间组合以获得k个区间组合，计算k个区间组合中每一个区间组合对应的归一化数目Sig _r。 The combination module 26 is configured to perform interval combinations on n intervals to obtain k interval combinations, and calculate the normalized number Sig _r corresponding to each interval combination in the k interval combinations.

所述判断模块27用于判断第r个区间组合的归一化数目Sig _r是否显著高于其他k-1个区间组合的归一化数目，若是则报告第r个区间发生异常并同时报告归一化数目Sig _r，从而获取该次样本在k个区间组合中发生异常的所有区间组合对应的归一化数目，1≤r≤k，1≤k≤n。 The judging module 27 is used to judge whether the normalized number Sig _{r of} the rth interval combination is significantly higher than the normalized number of other k-1 interval combinations, and if so, report that the rth interval is abnormal and report the normalized number at the same time. The number Sig _r is unified, so as to obtain the normalized number corresponding to all the interval combinations in the k interval combinations of the sample, 1≤r≤k, 1≤k≤n.

The statistical methods used by the judgment module 27 to judge whether the normalized number Sig_r of the r-th interval combination is significantly higher than the normalized numbers of the other k-1 interval combinations include, but are not limited to, the T-test algorithm.
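The description does not state which form of the T-test is applied. The following sketch assumes one possible form: Sig_r is treated as a single observation and compared with the other k-1 values using the statistic t = (Sig_r - mean) / (s * sqrt(1 + 1/m)), where m = k-1 and s is the sample standard deviation of the other values. The function names and the critical value `t_crit=3.0` are hypothetical; in practice the critical value would be taken from the t distribution with m-1 degrees of freedom at the chosen significance level.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic_vs_others(sig, r):
    """t statistic of sig[r] against the remaining values (single observation vs. sample)."""
    others = sig[:r] + sig[r + 1:]
    m = len(others)
    if m < 2:
        raise ValueError("need at least two other interval combinations")
    s = stdev(others)
    if s == 0:
        # All other values are identical; any excess is treated as unbounded.
        return float("inf") if sig[r] > others[0] else 0.0
    return (sig[r] - mean(others)) / (s * sqrt(1 + 1 / m))


def abnormal_combinations(sig, t_crit=3.0):
    """Map each abnormal index r (0-based) to its Sig_r using a one-sided test."""
    return {
        r: sig[r]
        for r in range(len(sig))
        if t_statistic_vs_others(sig, r) > t_crit
    }


print(abnormal_combinations([1.0, 1.1, 0.9, 1.0, 5.0, 1.0]))
```

The test is one-sided because only values significantly higher than the others are reported as abnormal.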

The collection module 22, comparison module 23, counting module 24, processing module 25, combination module 26, and judgment module 27 are called again to collect a second sample and obtain the normalized numbers corresponding to all interval combinations, among the k interval combinations, in which an abnormality occurs in the second sample.

The calculation module 28 is used to calculate a similarity value between the detection values of the earlier and later samples, based on the normalized numbers corresponding to the two samples. The statistical calculation methods used by the calculation module to calculate this similarity value include, but are not limited to, the Pearson correlation coefficient and the rank correlation coefficient.
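The two similarity measures named above can be sketched as follows. The sketch assumes that `x` and `y` hold the normalized numbers of the earlier and later samples over the same ordered set of intervals or interval combinations; how that common set is chosen is not specified here. The rank correlation is computed as the Pearson correlation of average ranks (Spearman's coefficient); the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


first = [1.8, 2.4, 1.1, 3.0]
second = [1.7, 2.6, 1.0, 3.3]
print(pearson(first, second), spearman(first, second))
```

The Pearson coefficient is sensitive to the magnitudes of the normalized numbers, whereas the rank coefficient depends only on their ordering, which makes it less sensitive to outlying values.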

The output module 29 is used to judge whether the similarity value is greater than a set value and, if so, to output the chromosome instability detection result.

Although specific embodiments of the present invention are described above, those skilled in the art should understand that these are only examples, and that the protection scope of the present invention is defined by the appended claims. Those skilled in the art can make various changes or modifications to these embodiments without departing from the principle and essence of the present invention, and all such changes and modifications fall within the protection scope of the present invention.

Claims

A method for detecting chromosome instability, which is characterized in that it comprises the following steps:

S1. Divide all fragments of the reference DNA sequence into n intervals;

S2. Obtain a sample, extract the required DNA from the obtained sample, sequence the extracted DNA to obtain a set of DNA sequences containing multiple fragments, and number each fragment;

S3. Compare all fragments of the obtained DNA sequence base by base with the corresponding fragments of the reference DNA sequence, and output, for every fragment of the DNA sequence that is highly similar to its corresponding fragment of the reference DNA sequence, the fragment number, the fragment start position, the fragment end position, and the base-by-base comparison result;

S4. Count the number C_i of all highly similar fragments that fall into the i-th interval of the reference DNA sequence, 1≤i≤n;

S5. Normalize the number C_i to obtain a normalized number Sig_i;

S6. Determine whether the normalized number Sig_i of the i-th interval is significantly higher than the normalized numbers of the other n-1 intervals, and if so, report that the i-th interval is abnormal and at the same time report the normalized number Sig_i, thereby obtaining the normalized numbers corresponding to all intervals, among the n intervals, in which an abnormality occurs in this sample;

S7. Obtain a sample again, and obtain the normalized numbers corresponding to all intervals, among the n intervals, in which an abnormality occurs in the second sample;

S8. Calculate a similarity value between the detection values of the earlier and later samples based on the normalized numbers corresponding to the two samples;

S9. Determine whether the similarity value is greater than the set value, and if so, output the chromosome instability detection result.

The method for detecting chromosome instability according to claim 1, wherein step S6 is replaced by: combining the n intervals to obtain k interval combinations, calculating the normalized number Sig_r corresponding to each interval combination in the k interval combinations, and determining whether the normalized number Sig_r of the r-th interval combination is significantly higher than the normalized numbers of the other k-1 interval combinations, and if so, reporting that the r-th interval is abnormal and at the same time reporting the normalized number Sig_r, thereby obtaining the normalized numbers corresponding to all interval combinations, among the k interval combinations, in which an abnormality occurs in this sample, 1≤r≤k, 1≤k≤n;

Step S7 is replaced by: obtaining a sample again, and obtaining the normalized numbers corresponding to all interval combinations, among the k interval combinations, in which an abnormality occurs in the second sample.

The method for detecting chromosome instability according to claim 1 or 2, wherein in step S2, the sample includes blood, biological tissue, exfoliated cells, intercellular fluid, pleural effusion, saliva, or cerebrospinal fluid, and the DNA sequencing is performed using a DNA sequencing system.

The method for detecting chromosome instability according to claim 1 or 2, wherein in step S5, the algorithm used for normalization is a GC content algorithm, a secondary structure algorithm, or other DNA sequence feature algorithms.

The method for detecting chromosome instability according to claim 1, wherein in step S6, the statistical method for determining whether the normalized number Sig_i of the i-th interval is significantly higher than the normalized numbers of the other n-1 intervals includes: the deviation of Sig_i from the mean of the normalized numbers of the other n-1 intervals is not less than 2 times the variance.

The method for detecting chromosome instability according to claim 2, wherein in step S6, the statistical method for determining whether the normalized number Sig_r of the r-th interval combination is significantly higher than the normalized numbers of the other k-1 interval combinations includes the T-test algorithm.

The method for detecting chromosome instability according to claim 1 or 2, characterized in that, in step S8, the statistical calculation methods for calculating the similarity value of the detection values of the earlier and later samples include, but are not limited to, the Pearson correlation coefficient and the rank correlation coefficient.

A chromosome instability detection system, which is characterized in that it includes a division module, an acquisition module, a comparison module, a counting module, a processing module, a judgment module, a calculation module, and an output module;

The division module is used to divide all fragments of the reference DNA sequence into n intervals;

The acquisition module is used to collect a sample, extract the required DNA from the collected sample, sequence the extracted DNA to obtain a set of DNA sequences containing multiple fragments, and number each fragment;

The comparison module is used to compare all fragments of the obtained DNA sequence base by base with the corresponding fragments of the reference DNA sequence, and to output, for every fragment of the DNA sequence that is highly similar to its corresponding fragment of the reference DNA sequence, the fragment number, the fragment start position, the fragment end position, and the base-by-base comparison result;

The counting module is used to count the number C_i of all highly similar fragments falling in the i-th interval of the reference DNA sequence, 1≤i≤n;

The processing module is used to normalize the number C_i to obtain the normalized number Sig_i;

The judgment module is used to judge whether the normalized number Sig_i of the i-th interval is significantly higher than the normalized numbers of the other n-1 intervals, and if so, to report that the i-th interval is abnormal and at the same time report the normalized number Sig_i, thereby obtaining the normalized numbers corresponding to all intervals, among the n intervals, in which an abnormality occurs in this sample;

The acquisition module, comparison module, counting module, processing module, and judgment module are called again to collect a sample again and obtain the normalized numbers corresponding to all intervals, among the n intervals, in which an abnormality occurs in the second sample;

The calculation module is used to calculate a similarity value between the detection values of the earlier and later samples based on the normalized numbers corresponding to the two samples;

The output module is used to judge whether the similarity value is greater than the set value, and if so, output the chromosome instability detection result.

The chromosome instability detection system according to claim 8, wherein the system further comprises a combination module, the combination module being used to combine the n intervals to obtain k interval combinations and to calculate the normalized number Sig_r corresponding to each interval combination in the k interval combinations;

The judgment module is used to judge whether the normalized number Sig_r of the r-th interval combination is significantly higher than the normalized numbers of the other k-1 interval combinations, and if so, to report that the r-th interval is abnormal and at the same time report the normalized number Sig_r, thereby obtaining the normalized numbers corresponding to all interval combinations, among the k interval combinations, in which an abnormality occurs in this sample, 1≤r≤k, 1≤k≤n;

The acquisition module, the comparison module, the counting module, the processing module, the combination module, and the judgment module are called again to collect a sample again and obtain the normalized numbers corresponding to all interval combinations, among the k interval combinations, in which an abnormality occurs in the second sample.

The chromosome instability detection system according to claim 8 or 9, wherein the sample includes blood, biological tissue, exfoliated cells, intercellular fluid, pleural effusion, saliva, or cerebrospinal fluid, and the DNA sequencing is performed using a DNA sequencing system.

The chromosome instability detection system according to claim 8 or 9, wherein the algorithm used for normalization is a GC content algorithm, a secondary structure algorithm, or other DNA sequence feature algorithm.

The chromosome instability detection system according to claim 8, wherein the statistical method used by the judgment module to judge whether the normalized number Sig_i of the i-th interval is significantly higher than the normalized numbers of the other n-1 intervals includes: the deviation of Sig_i from the mean of the normalized numbers of the other n-1 intervals is not less than 2 times the variance.

The chromosome instability detection system according to claim 9, wherein the statistical methods used by the judgment module to judge whether the normalized number Sig_r of the r-th interval combination is significantly higher than the normalized numbers of the other k-1 interval combinations include the T-test algorithm.

The chromosome instability detection system according to claim 8 or 9, wherein the statistical calculation methods used by the calculation module to calculate the similarity value of the detection values of the earlier and later samples include, but are not limited to, the Pearson correlation coefficient and the rank correlation coefficient.

A kit, characterized in that it comprises the chromosome instability detection system according to any one of claims 8-14.