CN106778078A - DNA sequence similarity comparison method based on Kendall correlation coefficient - Google Patents
- Publication number
- CN106778078A (application number CN201611186639.9A)
- Authority
- CN
- China
- Prior art keywords
- dna sequence
- dna
- words
- kendall
- sequence dna
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
Abstract
The invention discloses a DNA sequence similarity comparison method based on the Kendall correlation coefficient, comprising the following steps: 1) obtain N DNA sequences to be compared; 2) select a length k, extract the corresponding k-words of each pair of DNA sequences with a sliding window, and combine them into the corresponding vectors; 3) for the k-words obtained in step 2), count the number of times each k-word occurs in the DNA sequence, i.e. compute the k-word frequency vector of the sequence, denote each entry $x_i$, and denote the full set of k-word frequencies of the sequence $X=\{x_i\}$; 4) combine the k-word vectors of the N DNA sequences in pairs, giving $N(N-1)/2$ combinations, and denote the two k-word frequency vectors of each combination x, y; 5) for the k-word frequency vectors x, y of each combination, compute the corresponding Kendall correlation coefficient; 6) build the N×N similarity coefficient matrix of the N DNA sequences to obtain the similarity of the DNA sequences and their evolutionary relationship diagram. The invention improves the quality of DNA sequence similarity comparison, reduces computational complexity, and shortens computation time.
Description
Technical field
The invention relates to the field of computing and bioinformatics processing, and in particular to a DNA sequence similarity comparison method based on the Kendall correlation coefficient.
Background
The central task of bioinformatics is to extract usable knowledge from the vast body of DNA sequence data. Bioinformaticians must not only provide efficient means of data storage but also develop effective data analysis tools. Only with new, effective data analysis tools can DNA sequence information be converted into biological knowledge, the structural and functional information it carries be clarified, and its biological significance be fully understood.
The theoretical basis of DNA sequence comparison is the theory of evolution. If two DNA sequences are sufficiently similar, it is inferred that they may share a common evolutionary ancestor and have each evolved from it through genetic variation processes such as residue substitution, deletion of residues or sequence fragments, and sequence recombination. Similarity and homology are different concepts: the degree of similarity between DNA sequences is a quantifiable parameter, whereas homology must be verified by evolutionary evidence. In practice, DNA sequence comparison applies a specific mathematical model or algorithm to find the maximum number of matching bases between two or more DNA sequences.
Huang Yujuan, Wang Tianming et al. used the frequency and position information of the k-words in a DNA sequence to construct a probability distribution, which is used to express the distance between two vectors; the smaller the value, the closer the species. Vinga and Almeida proposed a DNA sequence comparison method based on word frequency: a sliding window counts the occurrences of every word of length k, giving a k-word count or frequency vector. This maps a DNA sequence to a vector in a high-dimensional Euclidean space, so that comparing the similarity of DNA sequences becomes comparing vectors.
Pairwise DNA sequence alignment uses a specific algorithm to compare two DNA sequences and find the match of greatest similarity between them. The Kendall correlation coefficient is widely used for correlation prediction in time series and in hydrological and water-quality series, but it has not previously been used for DNA sequence similarity matching.
Summary of the invention
The purpose of the present invention is to overcome the deficiencies of the prior art by providing a DNA sequence similarity comparison method based on the Kendall correlation coefficient, which constructs an N×N similarity coefficient matrix for N DNA sequences, obtains the evolutionary relationships among the N sequences, and improves both the efficiency of DNA sequence similarity comparison and the computational efficiency.
The technical scheme adopted by the present invention is:
A DNA sequence similarity comparison method based on the Kendall correlation coefficient, comprising the following steps:
1) obtain N DNA sequences to be compared;
2) select a length k, extract the corresponding k-words of each pair of DNA sequences with a sliding window, and combine them into the corresponding vectors;
3) for the k-words obtained in step 2), count the number of times each k-word occurs in the DNA sequence, i.e. compute the frequency vector of the k-words in the DNA sequence, and denote its entries $x_i$;
4) combine the k-word vectors of the N DNA sequences in pairs, giving $N(N-1)/2$ combinations, and denote the two vectors of each combination $X=\{x_i\}$, $Y=\{y_i\}$;
5) for the k-word frequency vectors $x_i$, $y_i$ of each combination, compute the corresponding Kendall correlation coefficient;
6) build the N×N correlation coefficient matrix of the N DNA sequences to obtain the similarity information and the evolutionary relationship diagram of the DNA sequences.
Further, in step 2), the frequency vector of the words of length k is taken for each DNA sequence.
Further, in step 5), the Kendall correlation coefficient of the k-words of the DNA sequences can be obtained through the following steps:
a) obtain the k-words of the DNA sequence A to be compared by the following formula, where the length of A is set to n;
b) compute the frequency of each k-word: $x_i$ = the number of times the i-th k-word occurs in DNA sequence A;
c) for the combined vectors X and Y, compute the Kendall correlation coefficient τ by the following formula, where $t_x$ is the number of concordant pairs in $\{x_i\}$, $\{y_i\}$, $t_y$ is the number of discordant pairs in $\{x_i, y_i\}$, and T is the total number of distinct k-words in $\{x_i, y_i\}$;
d) $t_x$ and $t_y$ in step c) are obtained as follows: a pair of k-words (i, j) for which $(x_i - x_j)$ and $(y_i - y_j)$ have the same sign is called a concordant pair of $\{x_i, y_i\}$ and is counted in $t_x$; a pair for which they have opposite signs is called a discordant pair of $\{x_i, y_i\}$ and is counted in $t_y$.
The resulting Kendall correlation coefficient τ lies in [-1, 1]. The closer τ is to 1, the stronger the correlation between the two DNA sequences; the closer τ is to -1, the stronger the negative correlation between them; a value of τ close to 0 indicates that the two DNA sequences are not correlated.
An N×N Kendall correlation coefficient matrix is constructed. The matrix is symmetric with 1 on the diagonal; it gives the pairwise similarity of the N DNA sequences, from which their evolutionary relationships are constructed.
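A minimal sketch of the matrix construction, assuming the tie-free tau-a form of Kendall's coefficient (the source does not preserve its exact formula). The function names `kendall_tau` and `similarity_matrix` are hypothetical, and the diagonal is set to 1 by definition, as the text states, rather than computed.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Tie-free Kendall tau-a for two equal-length count vectors (assumed form)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have equal length of at least 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # s == 0 is a tie and counts toward neither total
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def similarity_matrix(vectors):
    """Symmetric N x N matrix of pairwise tau values, diagonal fixed at 1."""
    n = len(vectors)
    m = [[1.0] * n for _ in range(n)]
    for a, b in combinations(range(n), 2):
        # Only the upper triangle is computed; the lower one is mirrored.
        m[a][b] = m[b][a] = kendall_tau(vectors[a], vectors[b])
    return m
```

Because the matrix is symmetric, only N(N-1)/2 coefficients are actually computed, one per pair of sequences.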
The DNA sequence similarity comparison method of the present invention, based on the Kendall correlation coefficient, uses a sliding window to obtain the k-word frequency vector of each DNA sequence to be analysed, combines the k-word vectors of the N DNA sequences in pairs, and computes the Kendall correlation coefficient of the k-word frequency vectors of each pair. Similarity detection can thus be performed on multiple DNA sequences, and the results effectively reflect the evolutionary relationships among them. The method is simple: only one symmetric matrix, with 1 on the main diagonal, needs to be constructed, which reduces computational complexity and improves computational efficiency. The Kendall coefficient can serve as a feature value for DNA sequence similarity prediction and gives good accuracy.
Description of drawings
The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a schematic flow chart of the DNA sequence similarity comparison method based on the Kendall correlation coefficient of the present invention;
Fig. 2 is the evolutionary relationship diagram of DNA sequences obtained with the DNA sequence similarity comparison method based on the Kendall correlation coefficient of the present invention.
Detailed description
As shown in Fig. 1 and Fig. 2, the method of the present invention is described in further detail using the DNA coding sequences of 20 species as the object of analysis. As shown in Fig. 1, the DNA sequence similarity comparison method based on the Kendall correlation coefficient of this embodiment comprises the following steps:
1) select the DNA coding sequences of 20 species as the initial DNA sequences; the names and lengths of the 20 sequences are listed in Table 1;
Table 1: Species DNA sequence information
2) Obtain the k-words of each initial DNA sequence from step 1) and combine them to obtain the k-word frequency vector of the sequence (see Vinga, S., Almeida, J. S. Alignment-free sequence comparison: a review [J]. Bioinformatics, 2003: 513-523). This method uses a sliding window to count how often each short sequence of length k occurs in the DNA sequence under test. For the four DNA bases {A, T, G, C}, k = 2 gives $4^2 = 16$ possible k-words and k = 3 gives $4^3 = 64$. For example, for the DNA sequence fragment A = ATAACTA, the k-words are $W_2$ = {AT, TA, AA, TT, AG, GA, AC, CA, CT, …} and the frequency vector is {1, 2, 1, 0, 0, 0, 1, 0, 1, 0, …}; for the fragment B = ACAACTTA, the k-word frequency vector is {0, 1, 1, 1, 0, 0, 2, 1, 1, 0, …};
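The sliding-window count in step 2) can be sketched as follows. The function names `kword_counts` and `frequency_vector` are hypothetical, and the ordering of the 16 words produced by `itertools.product` is an assumption that differs from the order in which the example above lists them, so the check below looks words up by name.

```python
from collections import Counter
from itertools import product


def kword_counts(seq, k):
    """Count every length-k window of seq, moving the window one base at a time."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def frequency_vector(seq, k, alphabet="ATGC"):
    """Return all 4**k possible k-words and their counts in seq, in one fixed order."""
    counts = kword_counts(seq, k)
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    return words, [counts[w] for w in words]  # Counter yields 0 for absent words


# Reproduce the two example vectors in the order used by the text.
listed = ["AT", "TA", "AA", "TT", "AG", "GA", "AC", "CA", "CT"]
vec_a = [kword_counts("ATAACTA", 2)[w] for w in listed]   # [1, 2, 1, 0, 0, 0, 1, 0, 1]
vec_b = [kword_counts("ACAACTTA", 2)[w] for w in listed]  # [0, 1, 1, 1, 0, 0, 2, 1, 1]
```

A sequence of length n contributes n - k + 1 windows, so the entries of the full vector for ATAACTA sum to 6.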
3) For N DNA sequences, N k-word frequency vectors are obtained; combining them in pairs gives $N(N-1)/2$ combinations, and the two frequency vectors of each combination are denoted X, Y;
4) Compute the Kendall correlation coefficient by the following formula, where $t_x$ is the number of concordant pairs that $\{x_i, y_i\}$ forms with the other k-word frequencies, $t_y$ is the number of discordant pairs that $\{x_i, y_i\}$ forms with the other k-word frequencies, and T is the total number of distinct k-words in $\{x_i, y_i\}$; for the fragments A and B in step 2), the total number of k-words is T = 7;
5) $t_x$ and $t_y$ in step 4) are obtained as follows: a pair of k-words (i, j) for which $(x_i - x_j)$ and $(y_i - y_j)$ have the same sign is called a concordant pair of $\{x_i, y_i\}$ and is counted in $t_x$; a pair for which they have opposite signs is called a discordant pair of $\{x_i, y_i\}$ and is counted in $t_y$;
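As a worked illustration of steps 4) and 5) on the fragments A and B, the sketch below counts concordant and discordant pairs over the distinct k-words that occur in either fragment. The function name `pair_counts` is hypothetical, and the final line assumes the tie-free tau-a form, since the source formula is not preserved.

```python
from collections import Counter
from itertools import combinations


def pair_counts(seq_a, seq_b, k):
    """Return (t_x, t_y, T) for two sequences, using k-words seen in either one."""
    ca = Counter(seq_a[i:i + k] for i in range(len(seq_a) - k + 1))
    cb = Counter(seq_b[i:i + k] for i in range(len(seq_b) - k + 1))
    words = sorted(set(ca) | set(cb))  # distinct k-words of the pair
    t_x = t_y = 0
    for u, v in combinations(words, 2):
        s = (ca[u] - ca[v]) * (cb[u] - cb[v])
        if s > 0:
            t_x += 1      # same sign: concordant
        elif s < 0:
            t_y += 1      # opposite sign: discordant
    return t_x, t_y, len(words)


t_x, t_y, T = pair_counts("ATAACTA", "ACAACTTA", 2)  # (3, 3, 7)
tau = (t_x - t_y) / (T * (T - 1) / 2)                # assumed tau-a form
```

The count T = 7 agrees with the value given in step 4). Most of the 21 pairs in this toy example are ties, which is common for short fragments with small counts.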
6) Construct the N×N Kendall correlation coefficient matrix. The matrix is symmetric with diagonal values of 1, so it can be listed as an upper triangular matrix. Because similarity is negatively correlated with distance, the similarity values are negated to convert them into distances before the evolutionary relationship diagram is constructed, and the diagram is built from these distances; see Fig. 2.
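The conversion described in step 6) can be sketched as below. The helper names `similarity_to_distance` and `closest_pair` are hypothetical; `closest_pair` is only an illustration of how the negated values order the sequences and is not the tree-building procedure used for Fig. 2, which the text does not specify.

```python
def similarity_to_distance(sim):
    """Negate every similarity value, as described for step 6)."""
    return [[-value for value in row] for row in sim]


def closest_pair(dist, names):
    """Return the names of the two distinct sequences with the smallest distance."""
    n = len(dist)
    _, i, j = min((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    return names[i], names[j]


# Illustrative similarity values, not results from the patent.
sim = [[1, 0.8, -0.2],
       [0.8, 1, 0.1],
       [-0.2, 0.1, 1]]
dist = similarity_to_distance(sim)
nearest = closest_pair(dist, ["s1", "s2", "s3"])
```

Negation preserves the ordering of pairs but yields negative values; a tree-building routine that expects non-negative distances would need a shift such as 1 - τ, which is an assumption not stated in the source.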
Result analysis: computing the Pearson correlation coefficient between the DNA sequence similarity obtained with the Kendall coefficient and the edit distance gives -0.94. This indicates that the DNA sequence similarity computed by the method of the present invention is highly accurate and can be computed quickly, making it a very effective alternative to edit distance.
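The kind of check described in the result analysis could be sketched as follows, assuming that "edit distance" means the Levenshtein distance, which the source does not specify. The function names `edit_distance` and `pearson` are hypothetical, and the code does not reproduce the reported value of -0.94, which depends on the 20 sequences of Table 1.

```python
from math import sqrt


def edit_distance(a, b):
    """Levenshtein distance by dynamic programming over two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def pearson(u, v):
    """Pearson correlation of two equal-length lists (undefined if either is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)
```

In use, one list would hold the τ value of every sequence pair and the other the edit distance of the same pairs, in the same order; a strongly negative coefficient means that pairs with high τ tend to have small edit distance.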
The above is only an embodiment of the present invention and does not limit its patent scope. Any equivalent structure or equivalent process transformation made using the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.
Claims (5)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611186639.9A CN106778078B (en) | 2016-12-20 | 2016-12-20 | DNA Sequence Similarity Alignment Method Based on Kendall's Correlation Coefficient |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611186639.9A CN106778078B (en) | 2016-12-20 | 2016-12-20 | DNA Sequence Similarity Alignment Method Based on Kendall's Correlation Coefficient |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106778078A | 2017-05-31 |
CN106778078B | 2019-04-09 |
Family
ID=58896076
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611186639.9A Expired - Fee Related CN106778078B (en) | 2016-12-20 | 2016-12-20 | DNA Sequence Similarity Alignment Method Based on Kendall's Correlation Coefficient |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106778078B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108846262A (en) * | 2018-05-31 | 2018-11-20 | 广西大学 | The method that RNA secondary structure distance based on DFT calculates phylogenetic tree construction |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040101846A1 (en) * | 2002-11-22 | 2004-05-27 | Collins Patrick J. | Methods for identifying suitable nucleic acid probe sequences for use in nucleic acid arrays |
CN102732609A (en) * | 2011-04-08 | 2012-10-17 | 博奥生物有限公司 | Method for detecting similarity of oligonucleotide and target genome |
WO2014019164A1 (en) * | 2012-08-01 | 2014-02-06 | 深圳华大基因研究院 | Method and device for analyzing microbial community composition |
CN104395900A (en) * | 2013-03-15 | 2015-03-04 | 北京未名博思生物智能科技开发有限公司 | Spatial arithmetic method of sequence alignment |
CN104657628A (en) * | 2015-01-08 | 2015-05-27 | 深圳华大基因科技服务有限公司 | Proton-based transcriptome sequencing data comparison and analysis method and system |
WO2016058089A1 (en) * | 2014-10-17 | 2016-04-21 | The Hospital For Sick Children | Dna methylation markers for overgrowth syndromes |
EP3081257A1 (en) * | 2015-04-17 | 2016-10-19 | Sorin CRM SAS | Active implantable medical device for cardiac stimulation comprising means for detecting a remodelling or reverse remodelling phenomenon of the patient |
CN106203471A (en) * | 2016-06-22 | 2016-12-07 | 南京航空航天大学 | A kind of based on the Spectral Clustering merging Kendall Tau distance metric |
Non-Patent Citations (1)
Title |
---|
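Huang Yujuan: "Model research and application of DNA sequence analysis based on k-words", China Doctoral Dissertations Full-text Database (Basic Sciences) * |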
Also Published As
Publication number | Publication date |
---|---|
CN106778078B (en) | 2019-04-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20190409 |