CN111312336A - Method and system for establishing biological edge identification system
- Publication number
- CN111312336A CN202010108036.7A
- Authority
- CN
- China
- Prior art keywords
- data
- gene
- edge
- states
- state
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
The invention discloses a method and a system for establishing a biological edge identification system, which can simply and efficiently identify changes in key interactions as biomarkers of the occurrence and development of disease. The technical scheme is as follows: collect data with two states; select gene pairs whose correlations meet a significant-difference condition; for those gene pairs, convert the expression value data of the gene pairs into edge data representing correlation through matrix transformation; apply a feature selection algorithm to find the gene pairs with the best classification ability in the edge data, and take those gene pairs as biological edge markers, thereby establishing the biological edge identification system.
Description
The present application is a divisional application of invention patent application No. 201410640410.2, filed on November 13, 2014 and entitled "Method and System for Establishing Biological Edge Identification System".
Technical Field
The present invention relates to computational systems biology and bioinformatics, and in particular to a method and system for processing biomarkers.
Background
Biomarker research has long been an important topic in biomedicine. A successful biomarker can help doctors make an accurate diagnosis or propose an effective treatment plan, so finding suitable biomarkers is of great significance for overcoming diseases, especially complex diseases.
Complex human disease is a general term for a class of diseases with unclear etiology, many contributing factors, and no effective treatment, such as various cancers and diabetes. Since the 1980s, the rapid development of high-throughput biotechnologies (such as DNA chips and high-throughput sequencing) has brought opportunities for the study of complex human diseases.
Finding useful biomarkers in the massive data generated by these technologies is a major challenge in biomarker research today. Early studies focused on differentially expressed biomolecules such as genes or proteins and used molecules with discriminating ability as biomarkers. These methods are simple and intuitive and work well for some simple diseases, but they do not consider the complex interactions between molecules. Many complex diseases arise from changes in these interactions, so these methods perform poorly on complex diseases.
For this reason, many researchers have begun to look for biomarkers from a system or network perspective, that is, to consider the network formed by the various interactions between biomolecules and to use sub-networks or edge sets with discriminating ability as biomarkers. At present there are few satisfactory methods for achieving this.
Summary of the Invention
A brief summary of one or more aspects is presented below to provide a basic understanding of these aspects. This summary is not an exhaustive overview of all contemplated aspects; it is intended neither to identify key or critical elements of all aspects nor to delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description that is presented later.
The purpose of the present invention is to provide a method and a system for establishing a biological edge identification system that can simply and efficiently identify changes in key interactions as biomarkers of the occurrence and development of disease.
The technical scheme of the present invention is as follows. The present invention discloses a method for establishing a biological edge identification system, including:
collecting data with two states;
selecting gene pairs whose correlations meet a significant-difference condition;
for the gene pairs whose correlations meet the significant-difference condition, converting the expression value data of the gene pairs into edge data representing correlation through matrix transformation;
applying a feature selection algorithm to find the gene pairs with the best classification ability in the edge data, and taking those gene pairs as biological edge markers, thereby establishing the biological edge identification system.
According to an embodiment of the method for establishing a biological edge identification system of the present invention, the two-state data includes: normal-state data and disease-state data; metastatic-state data and non-metastatic-state data; and data with drug resistance and data without drug resistance.
According to an embodiment of the method for establishing a biological edge identification system of the present invention, the data type of the two-state data includes expression profile or abundance profile data of gene pairs.
According to an embodiment of the method for establishing a biological edge identification system of the present invention, after the step of collecting data with two states, the method further includes:
preprocessing the data to remove genes whose mean expression is lower than a set value or whose coefficient of variation is higher than a set value.
According to an embodiment of the method for establishing a biological edge identification system of the present invention, in the step of selecting gene pairs whose correlations meet the significant-difference condition, the correlation coefficient of each gene pair is calculated in each of the two states, and whether the correlation meets the significant-difference condition is determined by comparing the absolute value of the difference between the two correlation coefficients with a threshold.
According to an embodiment of the method for establishing a biological edge identification system of the present invention, in the step of converting the expression value data of the gene pairs whose correlations meet the significant-difference condition into edge data representing correlation through matrix transformation, the expression value data of the gene pairs is held in matrix form, where x_ij is the expression profile value or abundance profile value of biomolecule i in the j-th sample of the first of the two states, and y_ij is the expression profile value or abundance profile value of biomolecule i in the j-th sample of the second state.
In the matrix transformation, a transformation is applied to each given gene pair u and v. It produces <u,v>_N and <u,v>_D, the edge features of the gene pair (u, v) in the first state and the second state respectively. The transformation uses the means of the expression profile or abundance profile values of genes u and v in the first state and in the second state; Sx_u, Sx_v, Sy_u and Sy_v, the variances of genes u and v in the first state and the second state respectively; and the correction coefficients k1 and k2. The matrix formed by the <u,v>_N and <u,v>_D values of all gene pairs whose correlations meet the significant-difference condition is the edge data for those gene pairs. The edge data represents the correlation of each gene pair in the different states, and each gene pair is characterized by two dual variables, or features, in the edge data.
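The transformation formula itself does not appear in this text. One form consistent with the definitions above, given here only as an assumption and not as the patent's exact expression, standardizes the value z_uj of gene u in sample j (from either state) against the statistics of one state and multiplies the two standardized values of the pair. The bar notation for the state means, the use of the square root of the variance as the scale, and the omission of k1 and k2 (taken as 1) are all assumptions of this sketch:

```latex
\langle u,v\rangle_N^{(j)} =
  \frac{z_{uj}-\bar{x}_u}{\sqrt{Sx_u}}\cdot\frac{z_{vj}-\bar{x}_v}{\sqrt{Sx_v}},
\qquad
\langle u,v\rangle_D^{(j)} =
  \frac{z_{uj}-\bar{y}_u}{\sqrt{Sy_u}}\cdot\frac{z_{vj}-\bar{y}_v}{\sqrt{Sy_v}}
```

Under this assumed form, and with population variances, averaging the first expression over the first-state samples gives exactly the Pearson correlation coefficient of u and v in the first state, which is one way a per-sample quantity can be said to represent the correlation of the pair.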
According to an embodiment of the method for establishing a biological edge identification system of the present invention, the correction coefficients k1 and k2 both take the value 1.
According to an embodiment of the method for establishing a biological edge identification system of the present invention, the feature selection algorithm includes Sequential Forward Floating Selection (SFFS) and the Support Vector Machine (SVM) from machine learning.
The present invention also discloses a biological edge identification system, including:
an information collection module, which collects data with two states;
a gene pair selection module, which selects gene pairs whose correlations meet a significant-difference condition;
an edge data acquisition module, which, for the gene pairs whose correlations meet the significant-difference condition, converts the expression value data of the gene pairs into edge data representing correlation through matrix transformation;
a biological edge marker establishment module, which applies a feature selection algorithm to find the gene pairs with the best classification ability in the edge data and takes those gene pairs as biological edge markers, thereby establishing the biological edge identification system.
According to an embodiment of the biological edge identification system of the present invention, the two-state data includes: normal-state data and disease-state data; metastatic-state data and non-metastatic-state data; and data with drug resistance and data without drug resistance.
According to an embodiment of the biological edge identification system of the present invention, the data type of the two-state data includes expression profile or abundance profile data of gene pairs.
According to an embodiment of the biological edge identification system of the present invention, the following module is further connected after the information collection module:
a preprocessing module, which preprocesses the data to remove genes whose mean expression is lower than a set value or whose coefficient of variation is higher than a set value.
According to an embodiment of the biological edge identification system of the present invention, the gene pair selection module calculates the correlation coefficient of each gene pair in each of the two states and determines whether the correlation meets the significant-difference condition by comparing the absolute value of the difference between the two correlation coefficients with a threshold.
According to an embodiment of the biological edge identification system of the present invention, in the edge data acquisition module the expression value data of the gene pairs is held in matrix form, where x_ij is the expression profile value or abundance profile value of biomolecule i in the j-th sample of the first of the two states, and y_ij is the expression profile value or abundance profile value of biomolecule i in the j-th sample of the second state.
In the matrix transformation, a transformation is applied to each given gene pair u and v. It produces <u,v>_N and <u,v>_D, the edge features of the gene pair (u, v) in the first state and the second state respectively. The transformation uses the means of the expression profile or abundance profile values of genes u and v in the first state and in the second state; Sx_u, Sx_v, Sy_u and Sy_v, the variances of genes u and v in the first state and the second state respectively; and the correction coefficients k1 and k2. The matrix formed by the <u,v>_N and <u,v>_D values of all gene pairs whose correlations meet the significant-difference condition is the edge data for those gene pairs. The edge data represents the correlation of each gene pair in the different states, and each gene pair is characterized by two dual variables, or features, in the edge data.
According to an embodiment of the biological edge identification system of the present invention, the correction coefficients k1 and k2 both take the value 1.
According to an embodiment of the biological edge identification system of the present invention, the feature selection algorithm in the biological edge marker establishment module includes Sequential Forward Floating Selection (SFFS) and the Support Vector Machine (SVM) from machine learning.
Compared with the prior art, the present invention has the following beneficial effects. A traditional biomarker uses the expression levels of one or more genes (also called molecules) to distinguish different states, whereas a biological edge marker uses the correlation between the genes of a pair to distinguish different states. In living organisms, molecules form an intricate network of interactions, and changes in these interactions are often the key factors driving the occurrence and development of complex diseases. Biological edge markers therefore carry stronger biological meaning than traditional biomarkers: identifying these key interactions as biomarkers of the occurrence and development of disease can better reveal the underlying mechanisms.
Brief Description of the Drawings
FIG. 1 shows a flowchart of a first embodiment of the method for establishing a biological edge identification system of the present invention.
FIG. 2 shows a flowchart of a second embodiment of the method for establishing a biological edge identification system of the present invention.
FIG. 3 shows a schematic diagram of a first embodiment of the biological edge identification system of the present invention.
FIG. 4 shows a schematic diagram of a second embodiment of the biological edge identification system of the present invention.
FIG. 5 shows a schematic flowchart of the establishment and application of the biological edge identification system.
Detailed Description
The above features and advantages of the present invention can be better understood after reading the detailed description of the embodiments of the present disclosure in conjunction with the following drawings. In the drawings, components are not necessarily drawn to scale, and components with similar related characteristics or features may have the same or similar reference numerals.
FIG. 1 shows the flow of the first embodiment of the method for establishing a biological edge identification system of the present invention. Referring to FIG. 1, the method of this embodiment is implemented in the following steps.
Step S11: Collect data with two states.
Two-state data includes: normal-state data and disease-state data; metastatic-state data and non-metastatic-state data; and data with drug resistance and data without drug resistance. The data type of the two-state data includes expression profile or abundance profile data of gene pairs, such as microarray data, protein profiles, metabolite mass spectrometry data, and epigenetic profile data.
Step S12: Select gene pairs whose correlations meet the significant-difference condition.
In this step, the correlation coefficient of each gene pair is calculated in each of the two states, and whether the correlation meets the significant-difference condition is determined by comparing the absolute value of the difference between the two correlation coefficients with a threshold.
For example, if the correlation coefficient between gene 1 and gene 2 is 0.8 in the normal state and -0.6 in the disease state, the absolute value of the difference between their correlations is 1.4. With an assumed threshold of 0.8, gene 1 and gene 2 are identified as a gene pair whose correlation differs significantly.
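The selection in step S12 can be sketched as follows. This is a minimal illustration and not the patent's implementation: the choice of the Pearson coefficient, the strict `>` comparison, and the names `pearson` and `differential_pairs` are assumptions, since the text only speaks of "correlation coefficients" compared with a threshold.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def differential_pairs(state1, state2, threshold):
    """Return gene pairs whose correlation differs between the two states.

    state1 and state2 map a gene name to its list of expression values
    (one value per sample) in the first and second state respectively.
    Genes with constant values are not handled (zero variance).
    """
    selected = []
    for u, v in combinations(sorted(state1), 2):
        r1 = pearson(state1[u], state1[v])
        r2 = pearson(state2[u], state2[v])
        # Assumed rule: keep the pair when |r1 - r2| exceeds the threshold.
        if abs(r1 - r2) > threshold:
            selected.append((u, v, r1, r2))
    return selected


# Toy data (invented for illustration): g2 follows g1 in state 1
# and runs against it in state 2.
normal = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [1, 3, 2, 4]}
disease = {"g1": [1, 2, 3, 4], "g2": [8, 6, 4, 2], "g3": [1, 3, 2, 4]}
pairs = differential_pairs(normal, disease, threshold=0.8)
```

With the numbers from the example in the text, `abs(0.8 - (-0.6))` exceeds the threshold 0.8, so that pair would be kept by the same rule.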
Step S13: For the gene pairs whose correlations meet the significant-difference condition, convert the expression value data of the gene pairs into edge data representing correlation through matrix transformation.
This step is the key to implementing the present invention. The expression value data of the gene pairs is held in matrix form, where x_ij is the expression profile value or abundance profile value of biomolecule i in the j-th sample of the first of the two states, and y_ij is the expression profile value or abundance profile value of biomolecule i in the j-th sample of the second state.
In the matrix transformation, a transformation is applied to each given gene pair u and v. It produces <u,v>_N and <u,v>_D, the edge features of the gene pair (u, v) in the first state and the second state respectively. The transformation uses the means of the expression profile or abundance profile values of genes u and v in the first state and in the second state; Sx_u, Sx_v, Sy_u and Sy_v, the variances of genes u and v in the first state and the second state respectively; and the correction coefficients k1 and k2, both of which take the value 1. The matrix formed by the <u,v>_N and <u,v>_D values of all gene pairs whose correlations meet the significant-difference condition is the edge data for those gene pairs. The edge data represents the correlation of each gene pair in the different states, and each gene pair is characterized by two dual variables, or features, in the edge data.
Under this transformation each gene pair yields a small 2×(m+n) matrix, and stacking these small matrices by rows gives one large matrix, the so-called edge data. Each row of the edge data corresponds to a gene pair (two rows per pair, one for each state), and each column corresponds to a sample.
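A minimal sketch of how one gene pair could be turned into its 2×(m+n) block is given below. The exact formula is an assumption: each sample is standardized against the mean and standard deviation of one state and the two standardized values are multiplied, with k1 = k2 = 1. The function name `edge_features`, the use of the population standard deviation, and the convention that n counts first-state samples and m second-state samples are likewise assumptions.

```python
from statistics import mean, pstdev


def edge_features(xu, xv, yu, yv):
    """Turn the expression values of one gene pair into a 2 x (n + m) block.

    xu, xv: values of genes u and v over the n first-state samples.
    yu, yv: values of genes u and v over the m second-state samples.
    Row 0 standardizes every sample against the first-state statistics,
    row 1 against the second-state statistics (assumed form, k1 = k2 = 1).
    """
    all_u = list(xu) + list(yu)
    all_v = list(xv) + list(yv)

    def row(ref_u, ref_v):
        mu_u, mu_v = mean(ref_u), mean(ref_v)
        sd_u, sd_v = pstdev(ref_u), pstdev(ref_v)
        # Product of the two standardized values, one entry per sample.
        return [((a - mu_u) / sd_u) * ((b - mu_v) / sd_v)
                for a, b in zip(all_u, all_v)]

    return [row(xu, xv), row(yu, yv)]


# Toy values (invented): u and v move together in state 1, oppositely in state 2.
block = edge_features([1, 2, 3, 4], [2, 4, 6, 8], [1, 2, 3, 4], [8, 6, 4, 2])
```

Under this assumed form, the average of row 0 over the first-state columns equals the Pearson correlation of the pair in the first state, and the average of row 1 over the second-state columns equals its correlation in the second state.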
Step S14: Apply a feature selection algorithm to find the gene pairs with the best classification ability in the edge data, and take those gene pairs as biological edge markers, thereby establishing the biological edge identification system.
In this step, the feature selection algorithm includes Sequential Forward Floating Selection (SFFS) and the Support Vector Machine (SVM) from machine learning.
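The floating search named in step S14 can be sketched generically as follows. This is an illustrative SFFS loop, not the patent's code: the function name `sffs`, the stopping rule (a fixed subset size `k`, which must not exceed the number of candidates), and the pluggable `score` callable are assumptions. In the setting described here, `score` would plausibly be the cross-validated accuracy of an SVM trained on the selected edge-data rows, for instance with scikit-learn's `SVC` and `cross_val_score`; the toy score below is invented so that the example runs without third-party packages.

```python
def sffs(features, score, k):
    """Sequential Forward Floating Selection of k features.

    features: iterable of candidate feature identifiers.
    score: callable mapping a tuple of features to a number (higher is better).
    """
    candidates = list(features)
    selected = []
    best = {}  # best score seen so far for each subset size

    while len(selected) < k:
        # Forward step: add the feature that gives the highest score.
        remaining = [f for f in candidates if f not in selected]
        add = max(remaining, key=lambda f: score(tuple(selected + [f])))
        selected.append(add)
        size = len(selected)
        best[size] = max(best.get(size, float("-inf")), score(tuple(selected)))

        # Floating step: drop a feature while the reduced subset beats
        # the best subset of that smaller size seen so far.
        while len(selected) > 2:
            drop = max(selected,
                       key=lambda f: score(tuple(x for x in selected if x != f)))
            reduced = [x for x in selected if x != drop]
            reduced_score = score(tuple(reduced))
            if reduced_score > best[len(reduced)]:
                selected = reduced
                best[len(reduced)] = reduced_score
            else:
                break
    return selected


# Toy score (invented): each edge feature contributes a fixed weight.
weights = {"edge_a": 3, "edge_b": 2, "edge_c": 1}
chosen = sffs(weights, lambda subset: sum(weights[f] for f in subset), k=2)
```

Keeping the classifier behind a `score` callable separates the search strategy from the SVM, so the same loop can be reused with any evaluation criterion.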
FIG. 2 shows the flow of the second embodiment of the method for establishing a biological edge identification system of the present invention. Referring to FIG. 2, the method of this embodiment is implemented in the following steps.
Step S21: Collect data with two states.
Two-state data includes: normal-state data and disease-state data; metastatic-state data and non-metastatic-state data; and data with drug resistance and data without drug resistance. The data type of the two-state data includes expression profile or abundance profile data of gene pairs, such as microarray data, protein profiles, metabolite mass spectrometry data, and epigenetic profile data.
Step S22: Preprocess the data to remove genes whose mean expression is lower than a set value or whose coefficient of variation is higher than a set value.
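The filter in step S22 can be sketched as below. The function name `filter_genes`, the parameter names, the use of the population standard deviation for the coefficient of variation, and the toy thresholds are assumptions; the text only states that genes with a low mean or a high coefficient of variation are removed.

```python
from statistics import mean, pstdev


def filter_genes(expression, min_mean, max_cv):
    """Keep genes whose mean is at least min_mean and whose CV is at most max_cv.

    expression maps a gene name to its list of expression values.
    CV (coefficient of variation) is the standard deviation divided by the mean.
    """
    kept = {}
    for gene, values in expression.items():
        mu = mean(values)
        if mu < min_mean or mu <= 0:
            continue  # mean below the set value, or CV undefined
        cv = pstdev(values) / mu
        if cv <= max_cv:
            kept[gene] = values
    return kept


# Toy data (invented): one weakly expressed gene, one noisy gene, one stable gene.
data = {
    "low": [0.1, 0.2, 0.1],
    "noisy": [1, 20, 1, 20],
    "ok": [10, 11, 9, 10],
}
kept = filter_genes(data, min_mean=1.0, max_cv=0.5)
```

Filtering before the pairwise correlation step also reduces the number of gene pairs that have to be examined, since that number grows quadratically with the number of genes.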
Step S23: Select gene pairs whose correlations meet the significant-difference condition.
In this step, the correlation coefficient of each gene pair is calculated in each of the two states, and whether the correlation meets the significant-difference condition is determined by comparing the absolute value of the difference between the two correlation coefficients with a threshold.
For example, if the correlation coefficient between gene 1 and gene 2 is 0.8 in the normal state and -0.6 in the disease state, the absolute value of the difference between their correlations is 1.4. With an assumed threshold of 0.8, gene 1 and gene 2 are identified as a gene pair whose correlation differs significantly.
Step S24: For the gene pairs whose correlations meet the significant-difference condition, convert the expression value data of the gene pairs into edge data representing correlation through matrix transformation.
This step is the key to implementing the present invention. The expression value data of the gene pairs is held in matrix form, where x_ij is the expression profile value or abundance profile value of biomolecule i in the j-th sample of the first of the two states, and y_ij is the expression profile value or abundance profile value of biomolecule i in the j-th sample of the second state.
In the matrix transformation, a transformation is applied to each given gene pair u and v. It produces <u,v>_N and <u,v>_D, the edge features of the gene pair (u, v) in the first state and the second state respectively. The transformation uses the means of the expression profile or abundance profile values of genes u and v in the first state and in the second state; Sx_u, Sx_v, Sy_u and Sy_v, the variances of genes u and v in the first state and the second state respectively; and the correction coefficients k1 and k2, both of which take the value 1. The matrix formed by the <u,v>_N and <u,v>_D values of all gene pairs whose correlations meet the significant-difference condition is the edge data for those gene pairs. The edge data represents the correlation of each gene pair in the different states, and each gene pair is characterized by two dual variables, or features, in the edge data.
Under this transformation each gene pair yields a small 2×(m+n) matrix, and stacking these small matrices by rows gives one large matrix, the so-called edge data. Each row of the edge data corresponds to a gene pair (two rows per pair, one for each state), and each column corresponds to a sample.
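The row-wise stacking of the small matrices into the edge data can be sketched as follows. The function name `stack_edge_data`, the row labels, and the input layout (a mapping from a gene pair to its two rows) are assumptions made for illustration; the numbers are invented.

```python
def stack_edge_data(blocks):
    """Stack per-pair 2 x (n + m) blocks into one edge-data matrix.

    blocks maps a gene pair (u, v) to its two rows [row_first, row_second].
    Returns (labels, matrix), where labels[i] names the row matrix[i].
    """
    labels, matrix = [], []
    for (u, v), (row_first, row_second) in blocks.items():
        labels.append(f"<{u},{v}>_N")
        matrix.append(list(row_first))
        labels.append(f"<{u},{v}>_D")
        matrix.append(list(row_second))
    # Every row must cover the same samples, in the same column order.
    if len({len(row) for row in matrix}) > 1:
        raise ValueError("all rows must cover the same n + m samples")
    return labels, matrix


# Two gene pairs over three samples (values invented).
labels, edge_data = stack_edge_data({
    ("g1", "g2"): [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    ("g2", "g3"): [[7.0, 8.0, 9.0], [0.0, 1.0, 2.0]],
})
```

Keeping the labels alongside the matrix makes it possible to map a row picked by feature selection back to its gene pair and state.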
Step S25: Apply a feature selection algorithm to find the gene pairs with the best classification ability in the edge data, and use these gene pairs as biological edge identifiers, thereby establishing the biological edge identification system.
In this step, the feature selection algorithm includes the machine-learning methods Sequential Forward Floating Selection (SFFS) and Support Vector Machine (SVM).
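As a rough illustration of the floating selection named in this step, the sketch below implements a generic SFFS loop. The `score` callable is hypothetical: under the reading of this step it would be the classification performance of an SVM trained on the candidate subset of edge features, but any function that returns a higher value for a better subset can be plugged in. The sketch assumes k does not exceed the number of candidate features.

```python
def sffs(features, score, k):
    """Sequential forward floating selection of k features.

    features: list of candidate feature ids (e.g. gene-pair labels).
    score:    callable(subset) -> float, higher is better (hypothetical;
              e.g. an SVM's cross-validated accuracy on that subset).
    """
    selected = []
    best = {}  # subset size -> best score seen so far at that size
    while len(selected) < k:
        # Forward step: add the single feature that helps most.
        cand = max((f for f in features if f not in selected),
                   key=lambda f: score(selected + [f]))
        selected = selected + [cand]
        size = len(selected)
        best[size] = max(score(selected), best.get(size, float("-inf")))
        # Floating step: remove features while the reduced subset beats
        # the best subset previously seen at that smaller size.
        while len(selected) > 2:
            drop = max(selected,
                       key=lambda f: score([g for g in selected if g != f]))
            reduced = [g for g in selected if g != drop]
            s_red = score(reduced)
            if s_red > best.get(len(reduced), float("-inf")):
                selected = reduced
                best[len(reduced)] = s_red
            else:
                break
    return selected
```

With a toy additive score such as `lambda s: sum({"a": 3, "b": 2, "c": 1, "d": 0}[f] for f in s)`, the loop simply picks the highest-weighted features in order; the floating step matters only when features interact.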
Figure 3 shows the principle of the first embodiment of the biological edge identification system of the present invention. Referring to Figure 3, the biological edge identification system of this embodiment includes an information collection module 11, a gene pair selection module 12, an edge data acquisition module 13, and a biological edge identification establishment module 14.
The modules are connected in sequence: the information collection module 11 is followed by the gene pair selection module 12, which is followed by the edge data acquisition module 13, which in turn is followed by the biological edge identification establishment module 14.
The information collection module 11 collects dual-state data. Dual-state data include normal-state and disease-state data, metastatic-state and non-metastatic-state data, and drug-resistant-state and non-drug-resistant-state data. The data types are expression-profile or abundance-profile data of gene pairs, including microarray, protein profile, metabolite mass spectrometry, and epigenetic profile data.
The gene pair selection module 12 selects gene pairs whose correlations meet the significant-difference condition. It calculates the correlation coefficient of each gene pair in the two states and compares the absolute value of the difference between the two coefficients with a threshold to determine whether the correlation meets the significant-difference condition.
For example, if the correlation coefficient of gene 1 and gene 2 is 0.8 in the normal state and -0.6 in the disease state, the absolute value of the difference is 1.4. Assuming a threshold of 0.8, gene 1 and gene 2 are determined to be a gene pair with a significantly different correlation.
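The selection rule in this example can be sketched as follows. The text does not name the correlation coefficient, so the use of Pearson correlation is an assumption, as is the strict "greater than" comparison with the threshold; the function names are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length value lists (assumed measure)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / sqrt(va * vb)

def is_differential_pair(r_first, r_second, threshold):
    """True if the correlations in the two states differ by more than threshold."""
    return abs(r_first - r_second) > threshold

# Worked example from the text: 0.8 (normal) vs -0.6 (disease), threshold 0.8.
selected = is_differential_pair(0.8, -0.6, 0.8)  # |0.8 - (-0.6)| = 1.4 > 0.8
```

In practice `r_first` and `r_second` would come from `pearson` applied to the two genes' values within each state.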
For gene pairs whose correlations meet the significant-difference condition, the edge data acquisition module 13 converts the expression value data of the gene pairs into edge data representing the correlation by means of a matrix transformation.
The edge data acquisition module 13 is the key to implementing the present invention. The expression value data of the gene pairs are in matrix form:
Here, $x_{ij}$ denotes the expression-profile or abundance-profile value of biomolecule i in the j-th sample of the first of the two states, and $y_{ij}$ denotes the corresponding value of biomolecule i in the j-th sample of the second state.
The matrix transformation proceeds as follows:
For a given gene pair u and v, apply the following transformation:
Here, $\langle u,v\rangle_N$ and $\langle u,v\rangle_D$ are the edge features of the gene pair u, v in the first state and the second state, respectively. The transformation uses the means of the expression-profile or abundance-profile values of u and v in the first and second states; $Sx_u$, $Sx_v$, $Sy_u$, $Sy_v$ are the variances of u and v in the first and second states; and $k_1$, $k_2$ are correction coefficients, both set to 1. The matrix formed by the $\langle u,v\rangle_N$ and $\langle u,v\rangle_D$ values of all gene pairs whose correlations meet the significant-difference condition is the edge data for those gene pairs. The edge data represent the correlation of each gene pair in the different states, and each gene pair is characterized by two dual variables (features) in the edge data. For each gene pair, the transformation yields a small 2×(m+n) matrix; stacking these small matrices row-wise gives the large matrix referred to as the edge data. In the edge data, rows correspond to gene pairs (two dual rows per pair) and each column corresponds to a sample.
The biological edge identification establishment module 14 applies a feature selection algorithm to find the gene pairs with the best classification ability in the edge data, and uses these gene pairs as biological edge identifiers, thereby establishing the biological edge identification system. The feature selection algorithm includes the machine-learning methods Sequential Forward Floating Selection (SFFS) and Support Vector Machine (SVM).
Figure 4 shows the principle of the second embodiment of the biological edge identification system of the present invention. Referring to Figure 4, the biological edge identification system of this embodiment includes an information collection module 21, a preprocessing module 22, a gene pair selection module 23, an edge data acquisition module 24, and a biological edge identification establishment module 25.
The modules are connected in sequence: the information collection module 21 is followed by the preprocessing module 22, which is followed by the gene pair selection module 23, which is followed by the edge data acquisition module 24, which in turn is followed by the biological edge identification establishment module 25.
The information collection module 21 collects dual-state data. Dual-state data include normal-state and disease-state data, metastatic-state and non-metastatic-state data, and drug-resistant-state and non-drug-resistant-state data. The data types are expression-profile or abundance-profile data of gene pairs, including microarray, protein profile, metabolite mass spectrometry, and epigenetic profile data.
The preprocessing module 22 preprocesses the data, removing genes whose mean expression is below a set value or whose coefficient of variation is above a set value, to reduce the influence of noise on the results.
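A minimal sketch of this filtering step is given below. The names `filter_genes`, `min_mean`, and `max_cv` are hypothetical, the set values are left to the caller, and the coefficient of variation is assumed to be the population standard deviation divided by the mean. Expression values are assumed to be non-negative, so genes with a non-positive mean are dropped as well.

```python
from statistics import mean, pstdev

def filter_genes(expr, min_mean, max_cv):
    """Keep genes with mean >= min_mean and coefficient of variation <= max_cv.

    expr: dict mapping gene name -> list of expression values across samples.
    """
    kept = {}
    for gene, values in expr.items():
        mu = mean(values)
        if mu < min_mean or mu <= 0:
            continue  # too weakly expressed (or CV undefined)
        cv = pstdev(values) / mu
        if cv > max_cv:
            continue  # too variable relative to its mean
        kept[gene] = values
    return kept

example = {
    "g1": [10, 11, 9],      # stable, well expressed: kept
    "g2": [0.1, 0.2, 0.1],  # mean below the set value: removed
    "g3": [1, 20, 1],       # coefficient of variation too high: removed
}
filtered = filter_genes(example, min_mean=1.0, max_cv=0.5)
```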
The gene pair selection module 23 selects gene pairs whose correlations meet the significant-difference condition. It calculates the correlation coefficient of each gene pair in the two states and compares the absolute value of the difference between the two coefficients with a threshold to determine whether the correlation meets the significant-difference condition.
For example, if the correlation coefficient of gene 1 and gene 2 is 0.8 in the normal state and -0.6 in the disease state, the absolute value of the difference is 1.4. Assuming a threshold of 0.8, gene 1 and gene 2 are determined to be a gene pair with a significantly different correlation.
For gene pairs whose correlations meet the significant-difference condition, the edge data acquisition module 24 converts the expression value data of the gene pairs into edge data representing the correlation by means of a matrix transformation.
The edge data acquisition module 24 is the key to implementing the present invention. The expression value data of the gene pairs are in matrix form:
Here, $x_{ij}$ denotes the expression-profile or abundance-profile value of biomolecule i in the j-th sample of the first of the two states, and $y_{ij}$ denotes the corresponding value of biomolecule i in the j-th sample of the second state.
The matrix transformation proceeds as follows:
For a given gene pair u and v, apply the following transformation:
Here, $\langle u,v\rangle_N$ and $\langle u,v\rangle_D$ are the edge features of the gene pair u, v in the first state and the second state, respectively. The transformation uses the means of the expression-profile or abundance-profile values of u and v in the first and second states; $Sx_u$, $Sx_v$, $Sy_u$, $Sy_v$ are the variances of u and v in the first and second states; and $k_1$, $k_2$ are correction coefficients, both set to 1. The matrix formed by the $\langle u,v\rangle_N$ and $\langle u,v\rangle_D$ values of all gene pairs whose correlations meet the significant-difference condition is the edge data for those gene pairs. The edge data represent the correlation of each gene pair in the different states, and each gene pair is characterized by two dual variables (features) in the edge data. For each gene pair, the transformation yields a small 2×(m+n) matrix; stacking these small matrices row-wise gives the large matrix referred to as the edge data. In the edge data, rows correspond to gene pairs (two dual rows per pair) and each column corresponds to a sample.
The biological edge identification establishment module 25 applies a feature selection algorithm to find the gene pairs with the best classification ability in the edge data, and uses these gene pairs as biological edge identifiers, thereby establishing the biological edge identification system. The feature selection algorithm includes the machine-learning methods Sequential Forward Floating Selection (SFFS) and Support Vector Machine (SVM).
Figure 5 also shows a schematic flow of establishing and applying an example of the biological edge identification system. Once the biological edge identification system has been established according to the present invention, a corresponding classifier or diagnostic model can be built. With this diagnostic model, for a sample to be tested, the edge values corresponding to the biological edge identifiers are calculated by the matrix transformation and used as the input of the diagnostic model; the state of the sample, for example whether disease is present or whether cancer metastasis has occurred, is then judged from the model output.
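The application flow described here can be sketched roughly as below. Everything in the sketch is illustrative: the edge value is again assumed to be a product of standardized deviations (the patent's own formula is not reproduced in the text), `ref_stats` is a hypothetical table of per-state means and spreads saved from the training data, and `classifier` stands in for whatever diagnostic model was trained, such as an SVM.

```python
def edge_vector(sample, markers, ref_stats):
    """Edge values of one test sample for the selected gene pairs.

    sample:    dict gene -> measured value in the sample to be tested.
    markers:   list of (u, v) gene pairs chosen as edge identifiers.
    ref_stats: dict state -> dict gene -> (mean, spread), from training data.
    ASSUMPTION: the product form below is illustrative only.
    """
    vec = []
    for u, v in markers:
        for state in ("first", "second"):  # two dual features per pair
            mu_u, s_u = ref_stats[state][u]
            mu_v, s_v = ref_stats[state][v]
            vec.append((sample[u] - mu_u) * (sample[v] - mu_v) / (s_u * s_v))
    return vec

def diagnose(sample, markers, ref_stats, classifier):
    """Feed the sample's edge values to a trained model (any callable)."""
    return classifier(edge_vector(sample, markers, ref_stats))

# Toy usage with made-up reference statistics and a stand-in classifier.
stats = {
    "first":  {"g1": (2.0, 1.0), "g2": (4.0, 2.0)},
    "second": {"g1": (2.0, 1.0), "g2": (8.0, 2.0)},
}
vec = edge_vector({"g1": 3.0, "g2": 6.0}, [("g1", "g2")], stats)
label = diagnose({"g1": 3.0, "g2": 6.0}, [("g1", "g2")], stats,
                 lambda v: "first" if v[0] >= v[1] else "second")
```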
Taking gene expression data as an example, existing methods mainly select, from the differentially expressed genes, those with the greatest discriminating ability as biomarkers; whether these biomarkers interact with one another is unknown. The present invention instead focuses on genes whose interactions differ between states, and from them identifies the gene pairs with the greatest discriminating ability as biomarkers, called biological edge identifiers. Edge identifiers found in this way are mechanistically more likely to reflect the causes of disease onset and progression.
Although the above methods are illustrated and described as a series of acts for simplicity of explanation, it should be understood and appreciated that the methods are not limited by the order of the acts. In accordance with one or more embodiments, some acts may occur in a different order and/or concurrently with other acts that are illustrated and described herein, or that are not illustrated and described herein but would be understood by those skilled in the art.
Those skilled in the art will further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the specific application and the design constraints imposed on the overall system. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The various illustrative logical blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, for example a combination of a DSP and a microprocessor, multiple microprocessors, one or more microprocessors cooperating with a DSP core, or any other such configuration.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integrated into the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside in a user terminal as discrete components.
In one or more exemplary embodiments, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software as a computer program product, the functions may be stored on, or transmitted over, a computer-readable medium as one or more instructions or code. Computer-readable media include both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a computer. By way of example and not limitation, such computer-readable media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Any connection is also properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The previous description of the present disclosure is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to the present disclosure will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other variations without departing from the spirit or scope of the present disclosure. Thus, the present disclosure is not intended to be limited to the examples and designs described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010108036.7A CN111312336A (en) | 2014-11-13 | 2014-11-13 | Method and system for establishing biological edge identification system |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410640410.2A CN105590037A (en) | 2014-11-13 | 2014-11-13 | Biological edge mark system and establishing method thereof |
CN202010108036.7A CN111312336A (en) | 2014-11-13 | 2014-11-13 | Method and system for establishing biological edge identification system |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410640410.2A Division CN105590037A (en) | 2014-11-13 | 2014-11-13 | Biological edge mark system and establishing method thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111312336A true CN111312336A (en) | 2020-06-19 |
Family
ID=55929613
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010108036.7A Pending CN111312336A (en) | 2014-11-13 | 2014-11-13 | Method and system for establishing biological edge identification system |
CN201410640410.2A Pending CN105590037A (en) | 2014-11-13 | 2014-11-13 | Biological edge mark system and establishing method thereof |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410640410.2A Pending CN105590037A (en) | 2014-11-13 | 2014-11-13 | Biological edge mark system and establishing method thereof |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN111312336A (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107292128A (en) * | 2017-06-27 | 2017-10-24 | 湖南农业大学 | One kind pairing interacting genes detection method and forecast model |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6128608A (en) * | 1998-05-01 | 2000-10-03 | Barnhill Technologies, Llc | Enhancing knowledge discovery using multiple support vector machines |
CN101996284A (en) * | 2010-11-29 | 2011-03-30 | 昆明理工大学 | Screening method of characteristic gene of certain disease |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103761451B (en) * | 2014-01-02 | 2017-04-05 | 中国科学院数学与系统科学研究院 | Biomarker combined recognising method and system based on biomedical big data |
Non-Patent Citations (1)
Title |
---|
WANWEI ZHANG ET AL.: "EdgeMarker: Identifying differentially correlated molecule pairs as edge-biomarkers" * |
Also Published As
Publication number | Publication date |
---|---|
CN105590037A (en) | 2016-05-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right |
Effective date of registration: 20200703 Address after: 200031 building 35, No. 320, Yueyang Road, Xuhui District, Shanghai Applicant after: Center for excellence and innovation of molecular cell science, Chinese Academy of Sciences Address before: 200031 Yueyang Road, Shanghai, No. 319, No. Applicant before: SHANGHAI INSTITUTES FOR BIOLOGICAL SCIENCES, CHINESE ACADEMY OF SCIENCES |
|
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20200619 |