[go: up one dir, main page]

CN107220333B - A Character Search Method Based on Sunday Algorithm - Google Patents

A Character Search Method Based on Sunday Algorithm Download PDF

Info

Publication number
CN107220333B
CN107220333B CN201710375615.6A CN201710375615A CN107220333B CN 107220333 B CN107220333 B CN 107220333B CN 201710375615 A CN201710375615 A CN 201710375615A CN 107220333 B CN107220333 B CN 107220333B
Authority
CN
China
Prior art keywords
string
characters
character
text
pattern string
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201710375615.6A
Other languages
Chinese (zh)
Other versions
CN107220333A (en
Inventor
刘小垒
张小松
牛伟纳
胡斌
王中晴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201710375615.6A priority Critical patent/CN107220333B/en
Publication of CN107220333A publication Critical patent/CN107220333A/en
Application granted granted Critical
Publication of CN107220333B publication Critical patent/CN107220333B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/90335Query processing
    • G06F16/90344Query processing by using string matching techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the field of computer software application, and discloses character searching methods based on a Sunday algorithm, which match a pattern string and a text pattern string by judging whether a combination of the last bit of the text window and the lower bit characters outside the text window appears in the pattern string, wherein if the matching is successful, a program is ended, if the matching is unsuccessful, the text window is moved, and the judgment is continued by using the method until the text window reaches the tail end of the text string or the matching is successful, the program is ended.

Description

一种基于Sunday算法的字符搜索方法A Character Search Method Based on Sunday Algorithm

技术领域technical field

本发明涉及字符搜索,特别是一种基于Sunday算法的字符搜索方法,用于计算机网络领域中对字符进行搜索。The invention relates to character search, in particular to a character search method based on Sunday algorithm, which is used for character search in the field of computer networks.

背景技术Background technique

字符串是计算机科学中常见的基本概念,搜索问题也是计算机科学中的基本问题。随着互联网的日渐庞大,信息也是越来越多,如何在海量的信息中快速查找自己所要的信息是网络搜索研究的热点所在;在这其中,字符串匹配算法起着非常重要的作用,一个高效的字符串匹配算法,可以极大的提高搜索的效率和质量。字符串匹配在网络领域有着广泛的应用。比如,拼写检查、语言翻译、数据压缩、搜索引擎、网络入侵检测等。Strings are a common basic concept in computer science, and the search problem is also a basic problem in computer science. With the growing size of the Internet, there is more and more information. How to quickly find the information you want in the massive amount of information is the hotspot of network search research; in this, the string matching algorithm plays a very important role, a Efficient string matching algorithm can greatly improve the efficiency and quality of search. String matching has a wide range of applications in the network field. For example, spell checking, language translation, data compression, search engines, network intrusion detection, etc.

现有的常见的字符串匹配算法有Brute force、KMP、Boyer Moore、Sunday、robin_karp和bitap等。其中,Sunday算法是平均效率较高的一种匹配方案。Existing common string matching algorithms include Brute force, KMP, Boyer Moore, Sunday, robin_karp, and bitap. Among them, the Sunday algorithm is a matching scheme with higher average efficiency.

Sunday算法是Daniel M.Sunday于1990年提出的一种字符串模式匹配算法。其核心思想是:在匹配过程中,模式串并不被要求一定要按从左向右进行比较还是从右向左进行比较,它在发现不匹配时,算法能跳过尽可能多的字符以进行下一步的匹配,从而提高了匹配效率。Sunday algorithm is a string pattern matching algorithm proposed by Daniel M.Sunday in 1990. The core idea is: in the matching process, the pattern string is not required to be compared from left to right or from right to left. When it finds a mismatch, the algorithm can skip as many characters as possible to avoid The next step of matching is performed, thereby improving the matching efficiency.

Sunday算法思想跟BM算法很相似,在匹配失败时关注的是文本串中参加匹配的最末位字符的下一位字符。如果该字符没有在匹配串中出现则直接跳过,即移动步长=匹配串长度+1;否则,同BM算法一样其移动步长=匹配串中最右端的该字符到末尾的距离+1。The idea of the Sunday algorithm is very similar to the BM algorithm. When the match fails, the focus is on the next character of the last character in the text string that participates in the match. If the character does not appear in the matching string, skip it directly, that is, the moving step = the length of the matching string + 1; otherwise, the same as the BM algorithm, the moving step = the distance from the rightmost character in the matching string to the end + 1 .

Sunday算法在模式串很长的时候(很多实际应用中,模式串都会很长),文本窗口外下一个字符在模式串中出现的概率会很高,这样会增加很多无效匹配,由此Sunday算法的效率会显著降低。When the pattern string of the Sunday algorithm is very long (in many practical applications, the pattern string will be very long), the probability of the next character outside the text window appearing in the pattern string will be very high, which will increase a lot of invalid matches, so the Sunday algorithm efficiency will be significantly reduced.

发明内容SUMMARY OF THE INVENTION

基于以上技术问题,本发明提供了一种基于Sunday算法的字符搜索方法,从而解决了在字符搜索中产生很多无效匹配的技术问题。Based on the above technical problems, the present invention provides a character search method based on the Sunday algorithm, thereby solving the technical problem of generating many invalid matches in the character search.

本发明的技术方案如下:The technical scheme of the present invention is as follows:

一种基于Sunday算法的字符搜索方法,包括以下步骤:A character search method based on Sunday algorithm, including the following steps:

步骤1:利用模式串中相邻两字符的信息构建辅助数组,所述模式串为待匹配的字符串;Step 1: use the information of two adjacent characters in the pattern string to construct an auxiliary array, and the pattern string is the character string to be matched;

步骤2:文本窗口与文本字符串左对齐,所述文本字符串为待搜索文本,所述文本窗口在所述文本字符串上滑动;利用所述辅助数组判断所述文本窗口内最后两个字符是否在模式串中出现;若未出现,跳转到步骤4;若出现,则将模式串中出现的字符与所述最后两个字符对齐,判断模式串与文本字符串对应位置上的字符是否匹配,若匹配,则匹配成功程序结束;若不匹配,则跳转到步骤3;Step 2: The text window is left aligned with the text string, the text string is the text to be searched, and the text window slides on the text string; the auxiliary array is used to determine the last two characters in the text window Whether it appears in the pattern string; if not, jump to step 4; if it does, align the characters appearing in the pattern string with the last two characters, and judge whether the characters in the corresponding positions of the pattern string and the text string are If it matches, the program ends if the match is successful; if it does not match, jump to step 3;

步骤3:采用Sunday算法进行下一次的匹配和跳转,若采用Sunday算法匹配成功,则程序结束;若采用Sunday算法匹配失败,则判断Sunday算法跳转的长度是否超过文本窗口的长度,若超过文本窗口长度,则跳转到步骤4;若未超过文本窗口长度,则重复步骤3;Step 3: Use the Sunday algorithm to perform the next match and jump. If the Sunday algorithm matches successfully, the program ends; if the Sunday algorithm fails to match, judge whether the length of the Sunday algorithm jump exceeds the length of the text window. If the length of the text window is exceeded, go to step 4; if the length of the text window is not exceeded, repeat step 3;

步骤4:文本窗口向右移动J个字符长度,将文本窗口中的末位字符和文本窗外与所述末位字符相邻的字符作为组合字符,利用辅助数组判断所述组合字符是否在模式串中出现;若出现,跳转到步骤5;若未出现,判断文本窗口是否超出文本字符串,若超出,则匹配失败程序结束;若未超出,重复步骤4的内容;其中模式串的长度和文本窗口的长度均为J个字符长度;Step 4: Move the text window to the right by J characters, take the last character in the text window and the character adjacent to the last character outside the text window as the combined character, and use the auxiliary array to determine whether the combined character is in the pattern string. If it appears, jump to step 5; if it does not appear, judge whether the text window exceeds the text string, if it does, the match fails and the program ends; if it does not exceed, repeat the content of step 4; the length of the pattern string and The length of the text window is J characters;

步骤5:模式串中出现的字符与所述组合字符对齐,判断模式串与文本字符串对应位置上的字符是否匹配;若匹配则匹配成功程序结束;若不匹配,跳转到步骤3。Step 5: The characters appearing in the pattern string are aligned with the combined characters, and it is judged whether the pattern string matches the characters in the corresponding positions of the text string;

进一步的,步骤1中辅助数组构建方式如下:Further, the auxiliary array construction method in step 1 is as follows:

S201:利用模式串中字符的ASCII码值和设定的index值,计算模式串中相邻两字符的哈希值,采用如下公式:S201: Use the ASCII code value of the character in the pattern string and the set index value to calculate the hash value of two adjacent characters in the pattern string, using the following formula:

H[i]=A[i]×index+A[i+1]×(index+1) (1)H[i]=A[i]×index+A[i+1]×(index+1) (1)

其中,i表示模式串中字符的位置序号;H[i]表示模式串中相邻两字符的哈希值;A[i]表示第i个字符的ASCII码值;Among them, i represents the position number of the character in the pattern string; H[i] represents the hash value of two adjacent characters in the pattern string; A[i] represents the ASCII code value of the i-th character;

S202:利用所有字符两两组合产生的最大哈希值作为辅助数组中元素的个数,且字符两两组合产生的哈希值的大小为所述元素的位置序号;S202: the maximum hash value generated by the combination of all characters is used as the number of elements in the auxiliary array, and the size of the hash value generated by the combination of characters is the position sequence number of the element;

S203:将S201中计算出的哈希值映射到辅助数组中,具体方式如下:S203: Map the hash value calculated in S201 to the auxiliary array, and the specific method is as follows:

哈希值H[i]对应位置上的元素为1,若模式串中两组及以上相邻两字符组合产生的哈希值H[i]相同,则将该哈希值H[i]对应位置上的元素设为大于1的数,辅助数组中其余位置上的元素均为0。The element at the corresponding position of the hash value H[i] is 1. If the hash value H[i] generated by the combination of two or more adjacent characters in the pattern string is the same, then the hash value H[i] corresponds to The element at the position is set to a number greater than 1, and the elements at the remaining positions in the auxiliary array are all 0.

进一步的,所述利用辅助数组进行判断方法为:计算文本字符串中需要判断的相邻两字符的哈希值,利用该哈希值查找辅助数组中对应位置上的元素,若元素为1,则代表文本字符串中相邻两字符与模式串匹配或出现在模式串中;若元素为0,则代表文本字符串中相邻两字符与模式串不匹配或未出现在模式串中;若元素为大于1的数,则代表该字符组合与其余字符组合的哈希值相同,则进行步骤3的内容。Further, the judging method using the auxiliary array is: calculating the hash value of two adjacent characters that need to be judged in the text string, and using the hash value to find the element on the corresponding position in the auxiliary array, if the element is 1, It means that two adjacent characters in the text string match the pattern string or appear in the pattern string; if the element is 0, it means that the two adjacent characters in the text string do not match the pattern string or do not appear in the pattern string; if If the element is a number greater than 1, it means that the hash value of the combination of characters is the same as that of the other combinations of characters, and the content of step 3 is performed.

综上所述,由于采用了上述技术方案,本发明的有益效果是:To sum up, due to the adoption of the above-mentioned technical solutions, the beneficial effects of the present invention are:

采用字符的ASCII值和index值进行哈希值的计算,简便易行;利用组合字符形式来进行匹配,可有效的降低无效匹配的概率,从而提高Sunday算法的效率。Using the ASCII value and index value of the character to calculate the hash value is simple and easy; using the combination of characters to match can effectively reduce the probability of invalid matching, thereby improving the efficiency of the Sunday algorithm.

模式串一旦确定,通过预处理得出辅助数组后再匹配,提高在实际应用中的使用效率;比如网络入侵中恶意代码的检测、论文检索中大段文本的搜索,病毒多特征扫描,由于这些应用的模式串一般都很长,所以在本发明的应用上能有较大效率的提升。Once the pattern string is determined, the auxiliary array is obtained through preprocessing and then matched to improve the efficiency of use in practical applications; such as the detection of malicious code in network intrusion, the search of large texts in paper retrieval, and multi-feature scanning of viruses. The applied pattern string is generally very long, so the application of the present invention can greatly improve the efficiency.

附图说明Description of drawings

图1是本发明的流程图;Fig. 1 is the flow chart of the present invention;

图2是本发明的实验结果图Fig. 2 is the experimental result figure of the present invention

具体实施方式Detailed ways

本说明书中公开的所有特征,除了互相排斥的特征和/或步骤以外,均可以以任何方式组合。All features disclosed in this specification, except mutually exclusive features and/or steps, may be combined in any way.

下面结合附图对本发明作详细说明。The present invention will be described in detail below with reference to the accompanying drawings.

一种基于Sunday算法的字符搜索方法,包括以下步骤:A character search method based on Sunday algorithm, including the following steps:

步骤1:利用模式串中相邻两字符的信息构建辅助数组,所述模式串为待匹配的字符串;Step 1: use the information of two adjacent characters in the pattern string to construct an auxiliary array, and the pattern string is the character string to be matched;

辅助数组构建方式如下:The auxiliary array is constructed as follows:

S201:利用模式串中字符的ASCII码值和设定的index值,计算模式串中相邻两字符的哈希值,采用如下公式:S201: Use the ASCII code value of the character in the pattern string and the set index value to calculate the hash value of two adjacent characters in the pattern string, using the following formula:

H[i]=A[i]×index+A[i+1]×(index+1) (2)H[i]=A[i]×index+A[i+1]×(index+1) (2)

其中,i表示模式串中字符的位置序号;H[i]表示模式串中相邻两字符的哈希值;A[i]表示第i个字符的ASCII码值;Among them, i represents the position number of the character in the pattern string; H[i] represents the hash value of two adjacent characters in the pattern string; A[i] represents the ASCII code value of the i-th character;

S202:利用所有字符两两组合产生的最大哈希值作为辅助数组中元素的个数,且字符两两组合产生的哈希值的大小为所述元素的位置序号;S202: the maximum hash value generated by the combination of all characters is used as the number of elements in the auxiliary array, and the size of the hash value generated by the combination of characters is the position sequence number of the element;

S203:将S201中计算出的哈希值映射到辅助数组中,具体方式如下:S203: Map the hash value calculated in S201 to the auxiliary array, and the specific method is as follows:

哈希值H[i]对应位置上的元素为1,若模式串中两组及以上相邻两字符组合产生的哈希值H[i]相同,则将该哈希值H[i]对应位置上的元素设为大于1的数,辅助数组中其余位置上的元素均为0。The element at the corresponding position of the hash value H[i] is 1. If the hash value H[i] generated by the combination of two or more adjacent characters in the pattern string is the same, then the hash value H[i] corresponds to The element at the position is set to a number greater than 1, and the elements at the remaining positions in the auxiliary array are all 0.

步骤2:文本窗口与文本字符串左对齐,所述文本字符串为待搜索文本,所述文本窗口在所述文本字符串上滑动;计算所述文本窗口内最后两个字符的哈希值,在辅助数组中,若计算出的哈希值对应位置上的元素为0,则代表最后两个字符未出现在模式串中,跳转到步骤4;若计算出的哈希值对应位置上的元素为1,则代表则代表最后两个字符出现在模式串中,则将模式串中出现的字符与所述最后两个字符对齐,判断模式串与文本字符串对应位置上的字符是否匹配,若匹配,则匹配成功程序结束;若不匹配,则跳转到步骤3;若计算出的哈希值对应位置上的元素为大于1的值,则代表存在另外组合字符的哈希值与该哈希值相同,为减小错误率,跳转到步骤3。Step 2: the text window is left aligned with the text string, the text string is the text to be searched, and the text window slides on the text string; the hash value of the last two characters in the text window is calculated, In the auxiliary array, if the element in the corresponding position of the calculated hash value is 0, it means that the last two characters do not appear in the pattern string, and jump to step 4; if the corresponding position of the calculated hash value is 0 If the element is 1, it means that the last two characters appear in the pattern string, then align the characters appearing in the pattern string with the last two characters, and judge whether the pattern string matches the characters in the corresponding position of the text string, If it matches, the matching program ends; if it does not match, jump to step 3; if the element at the corresponding position of the calculated hash value is a value greater than 1, it means that there is another combined character hash value and this The hash values are the same. To reduce the error rate, go to step 3.

步骤3:采用Sunday算法进行下一次的匹配和跳转,若采用Sunday算法匹配成功,则程序结束;若采用Sunday算法匹配失败,则判断Sunday算法跳转的长度是否超过文本窗口的长度,若超过文本窗口长度,则跳转到步骤4;若未超过文本窗口长度,则重复步骤3;Step 3: Use the Sunday algorithm to perform the next match and jump. If the Sunday algorithm matches successfully, the program ends; if the Sunday algorithm fails to match, judge whether the length of the Sunday algorithm jump exceeds the length of the text window. If the length of the text window is exceeded, go to step 4; if the length of the text window is not exceeded, repeat step 3;

步骤4:文本窗口向右移动J个字符长度,将文本窗口中的末位字符和文本窗外与所述末位字符相邻的字符作为组合字符,计算所述组合字符的哈希值,在辅助数组中,若计算出的哈希值对应位置上的元素为0,则代表最后两个字符未出现在模式串中,判断文本窗口是否超出文本字符串,若超出,则匹配失败程序结束;若未超出,重复步骤4的内容;若计算出的哈希值对应位置上的元素为1,则代表则代表最后两个字符出现在模式串中,跳转到步骤5;若计算出的哈希值对应位置上的元素为大于1的值,则代表存在另外组合字符的哈希值与该哈希值相同,为减小错误率,跳转到步骤3;其中模式串的长度和文本窗口的长度均为J个字符长度;Step 4: Move the text window to the right by J characters, take the last character in the text window and the character adjacent to the last character outside the text window as a combined character, calculate the hash value of the combined character, and use the auxiliary In the array, if the element at the corresponding position of the calculated hash value is 0, it means that the last two characters do not appear in the pattern string, and it is judged whether the text window exceeds the text string. If it exceeds, the matching fails and the program ends; if If it does not exceed, repeat the content of step 4; if the element in the corresponding position of the calculated hash value is 1, it means that the last two characters appear in the pattern string, and jump to step 5; if the calculated hash value is 1 If the element at the corresponding position of the value is a value greater than 1, it means that the hash value of another combined character is the same as the hash value. In order to reduce the error rate, go to step 3; the length of the pattern string is the same as that of the text window. The length is all J characters long;

步骤5:模式串中出现的字符与所述组合字符对齐,判断模式串与文本字符串对应位置上的字符是否匹配;若匹配则匹配成功程序结束;若不匹配,跳转到步骤3。Step 5: The characters appearing in the pattern string are aligned with the combined characters, and it is judged whether the pattern string matches the characters in the corresponding positions of the text string;

本发明的工作原理是:构建辅助数组,文本窗口在文本字符串上滑动,滑动长度为模式串的长度;将文本窗口的最后一个字符与文本窗外的下一个字符进行组合,通过辅助数组来判断该字符组合是否出现在模式串中,以此来对字符串进行匹配。The working principle of the invention is as follows: an auxiliary array is constructed, the text window slides on the text string, and the sliding length is the length of the pattern string; the last character of the text window and the next character outside the text window are combined, and the auxiliary array is used to judge Whether the character combination appears in the pattern string to match the string.

下面,结合具体实施例来对本发明做进一步详细说明。Hereinafter, the present invention will be further described in detail with reference to specific embodiments.

具体实施例specific embodiment

具体实施例1Specific Example 1

以下为本发明C语言的实现程序:The following is the implementation program of the C language of the present invention:

具体实施例2Specific embodiment 2

以下为本发明的实验结果图,具体为与Sunday算法的效率比较图。The following is a graph of experimental results of the present invention, specifically an efficiency comparison graph with the Sunday algorithm.

生成10KB的随机纯英文字母和数字作为文本字符串,选取特定特点的字符串作模式串,用原Sunday算法和本发明采用的算法进行搜索,比较它们的实际效率,重复调用1000次求总时间(单位:毫秒),得出下列统计图。Generate 10KB random pure English letters and numbers as text strings, select character strings with specific characteristics as pattern strings, use the original Sunday algorithm and the algorithm adopted by the present invention to search, compare their actual efficiency, and repeat the call 1000 times to find the total time (unit: milliseconds), the following statistical graphs are obtained.

图中,A代表Sunday算法,B代表本发明采用的算法;In the figure, A represents the Sunday algorithm, and B represents the algorithm adopted by the present invention;

柱体1采用的模式串为:abcdef;The pattern string used by cylinder 1 is: abcdef;

柱体2采用模式串为:abcdefghijklmnopqrstuvwxyz;Cylinder 2 adopts the pattern string: abcdefghijklmnopqrstuvwxyz;

柱体3采用模式串为:Cylinder 3 adopts the pattern string as:

abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ;abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ;

柱体4采用模式串为:Cylinder 4 adopts the pattern string as:

abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567

如上所述即为本发明的实施例。本发明不局限于上述实施方式,任何人应该得知在本发明的启示下做出的结构变化,凡是与本发明具有相同或相近的技术方案,均落入本发明的保护范围之内。The above are the embodiments of the present invention. The present invention is not limited to the above-mentioned embodiments, and anyone should know that structural changes made under the inspiration of the present invention, and all technical solutions that are the same or similar to those of the present invention, fall within the protection scope of the present invention.

Claims (3)

1, character search method based on Sunday algorithm, which is characterized in that the method comprises the following steps:
step 1: constructing an auxiliary array by using information of two adjacent characters in a pattern string, wherein the pattern string is a character string to be matched;
step 2: the method comprises the following steps that a text window is aligned with a text character string on the left, the text character string is a text to be searched, and the text window slides on the text character string; judging whether the last two characters in the text window appear in the pattern string by using the auxiliary array; if not, jumping to the step 4; if yes, aligning the characters appearing in the pattern string with the last two characters, judging whether the characters on the corresponding positions of the pattern string and the text character string are matched, and if yes, ending the matching success program; if not, jumping to the step 3;
step 3, adopting a Sunday algorithm to carry out matching and jumping for times, if the Sunday algorithm is successfully matched, ending the program, if the Sunday algorithm is unsuccessfully matched, judging whether the jumping length of the Sunday algorithm exceeds the length of a text window, if the jumping length of the Sunday algorithm exceeds the length of the text window, jumping to the step 4, and if the jumping length of the text window is not exceeded, repeating the step 3;
and 4, step 4: moving the text window to the right by J character length, taking the last character in the text window and the character outside the text window and adjacent to the last character as combined characters, and judging whether the combined characters appear in the pattern string by using an auxiliary array; if yes, jumping to the step 5; if not, judging whether the text window exceeds the text character string, if so, ending the matching failure program; if not, repeating the content of the step 4; the length of the mode string and the length of the text window are J characters;
and 5: aligning the characters appearing in the pattern string with the combined characters, and judging whether the characters on the corresponding positions of the pattern string and the text character string are matched or not; if the matching is successful, the matching procedure is ended; and if not, jumping to the step 3.
2. The character searching method based on Sunday algorithm, as claimed in claim 1, wherein the auxiliary array is constructed in the following manner in step 1:
s201: calculating the hash value of two adjacent characters in the pattern string by using the ASCII code value of the characters in the pattern string and the set index value, and adopting the following formula:
H[i]=A[i]×index+A[i+1]×(index+1)
wherein i represents the position serial number of the character in the pattern string; h [ i ] represents the hash value of two adjacent characters in the pattern string; a [ i ] represents an ASCII code value of the ith character;
s202: the maximum hash value generated by combining all characters in pairs is used as the number of elements in the auxiliary array, and the size of the hash value generated by combining all the characters in pairs is the position serial number of the elements;
s203: mapping the hash value calculated in S201 to the auxiliary array in the following specific manner:
the elements at the positions corresponding to the hash values Hi are 1, if the hash values Hi generated by two or more groups of adjacent character combinations in the pattern string are the same, the elements at the positions corresponding to the hash values Hi are set to be numbers larger than 1, and the elements at the rest positions in the auxiliary array are 0.
3. The character searching method based on Sunday algorithm, as claimed in claim 2, wherein the method for determining by using the auxiliary array comprises calculating the hash value of two adjacent characters to be determined in the text string, searching the element at the corresponding position in the auxiliary array by using the hash value, if the element is 1, then representing that the two adjacent characters in the text string match the pattern string or appear in the pattern string, if the element is 0, then representing that the two adjacent characters in the text string do not match the pattern string or do not appear in the pattern string, and if the element is a number greater than 1, then representing that the hash value of the character combination is the same as that of the other character combinations, then proceeding with the content of step 3.
CN201710375615.6A 2017-05-24 2017-05-24 A Character Search Method Based on Sunday Algorithm Expired - Fee Related CN107220333B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710375615.6A CN107220333B (en) 2017-05-24 2017-05-24 A Character Search Method Based on Sunday Algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710375615.6A CN107220333B (en) 2017-05-24 2017-05-24 A Character Search Method Based on Sunday Algorithm

Publications (2)

Publication Number Publication Date
CN107220333A CN107220333A (en) 2017-09-29
CN107220333B true CN107220333B (en) 2020-01-31

Family

ID=59944598

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710375615.6A Expired - Fee Related CN107220333B (en) 2017-05-24 2017-05-24 A Character Search Method Based on Sunday Algorithm

Country Status (1)

Country Link
CN (1) CN107220333B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109348304B (en) * 2018-09-30 2021-04-27 武汉斗鱼网络科技有限公司 Bullet screen data verification method and device and terminal
CN109977276B (en) * 2019-03-22 2020-12-22 华南理工大学 Sunday algorithm-based improved single-mode matching method
CN111814009B (en) * 2020-06-28 2022-03-01 四川长虹电器股份有限公司 Mode matching method based on search engine retrieval information
CN112671413B (en) * 2020-12-25 2022-09-06 浪潮云信息技术股份公司 Data compression method and system based on LZSS algorithm and Sunday algorithm

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20120063879A (en) * 2010-12-08 2012-06-18 서울대학교산학협력단 Method for searching string matching on multi-byte character set texts
CN104519056A (en) * 2014-12-15 2015-04-15 广东科学技术职业学院 Double-jump-based single mode matching method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20120063879A (en) * 2010-12-08 2012-06-18 서울대학교산학협력단 Method for searching string matching on multi-byte character set texts
CN104519056A (en) * 2014-12-15 2015-04-15 广东科学技术职业学院 Double-jump-based single mode matching method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Poster:A Y_C sunday algorithm based on improved sunday algorithm;Yang Z等;《International conference oncommunications & networking in china》;20151231;正文第680-681页 *
Sunday字符串匹配算法的效率改进;徐珊等;《计算机工程与应用》;20111231;正文第96-98页 *
改进的SUnday模式匹配算法;万晓榆等;《计算机工程》;20090430;正文第125-126页 *

Also Published As

Publication number Publication date
CN107220333A (en) 2017-09-29

Similar Documents

Publication Publication Date Title
EP2585962B1 (en) Password checking
CN107220333B (en) A Character Search Method Based on Sunday Algorithm
CN101976253B (en) Chinese variation text matching recognition method
CN114022882B (en) Text recognition model training method, text recognition device, text recognition equipment and medium
CN109977276B (en) Sunday algorithm-based improved single-mode matching method
CN110765458A (en) Malicious software detection method and device based on deep learning
JP5862413B2 (en) Information conversion rule generation program, information conversion rule generation device, and information conversion rule generation method
CN103198149A (en) Method and system for query error correction
WO2015003421A1 (en) Algorithm for fast character string matching
JP2013206187A (en) Information conversion device, information search device, information conversion method, information search method, information conversion program and information search program
CN102750379A (en) Fast character string matching method based on filtering type
CN108647511A (en) The password strength assessment method derived based on weak passwurd
CN103927325B (en) Method and device for classifying URLs
CN115088038A (en) Improved quality value compression framework in aligned sequencing data based on new context
CN115563626A (en) Vulnerability availability prediction method for CVE
CN113407693B (en) Text similarity comparison method and device for full-media reading
CN118199922A (en) Malicious mining domain name detection method based on deep learning
EP4293956A1 (en) Method for predicting malicious domains
CN117010368A (en) Chinese error correction data enhancement method based on font similarity
CN117010366A (en) Text specific sentence-oriented content identification and correction method
CN113065419B (en) Pattern matching algorithm and system based on flow high-frequency content
CN112632526B (en) A User Password Modeling and Strength Evaluation Method Based on Synthetic Segmentation
CN109002423A (en) text search method and device
WO2021072892A1 (en) Legal provision search method based on neural network hybrid model, and related device
CN115525801A (en) Pattern matching algorithm for network security system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200131