
CN102867049A - Chinese PINYIN quick word segmentation method based on word search tree - Google Patents


Info

Publication number
CN102867049A
Authority
CN
China
Prior art keywords
node
word
search
tree
chinese
Prior art date
2012-09-10
Legal status
Granted
Application number
CN2012103320727A
Other languages
Chinese (zh)
Other versions
CN102867049B (en)
Inventor
于少飞
袁美英
杨震威
Current Assignee
Conway Communication Technology Co Ltd
Original Assignee
Shandong Conwell Communication Technology Co Ltd
Priority date
2012-09-10
Filing date
2012-09-10
Publication date
2013-01-09
Application filed by Shandong Conwell Communication Technology Co Ltd
2012-09-10: Priority to CN201210332072.7A
2013-01-09: Publication of CN102867049A
Application granted
2015-02-25: Publication of CN102867049B
Legal status: Active (current)
Anticipated expiration

Landscapes

  • Document Processing Apparatus (AREA)

Abstract

The invention discloses a Chinese Pinyin fast word segmentation method based on a word search tree. The method is implemented by a computer or an embedded mobile device and comprises the following working steps: 1, building a Chinese character Pinyin search tree from all known Chinese character Pinyin lists; 2, using the built word search tree, combining the search tree with hash tables to segment a given string of Chinese Pinyin; 3, outputting the word segmentation result; and 4, destroying the search tree and releasing resources. Exploiting the common prefixes of character strings saves construction space and greatly reduces unnecessary string comparisons; the indexed redundant hash tables improve search efficiency; and the time complexity of the algorithm is reduced to a minimum.

Description

A Chinese Pinyin fast word segmentation method based on a word lookup tree
Technical field
The invention belongs to the field of Chinese information processing on computers and various handheld embedded mobile devices, and particularly relates to a Chinese Pinyin fast word segmentation method based on a word lookup tree.
Background art
Automatically identifying each single-character Pinyin syllable in a continuous string of Chinese Pinyin by a software algorithm is a technique that Pinyin input methods and search engines (which suggest Chinese sentences from Pinyin-typed keywords) must use. One approach uses the Pinyin of every Chinese character as a key to build a hash table; a continuous Pinyin string can then be segmented by repeatedly looking up and matching against this table, but segmentation done this way is inefficient.
To improve efficiency, the prior art modifies this hash table as follows: the initial letter of each character's Pinyin is used as the key, and each element of the hash table is a singly linked list that stores all single-character Pinyin syllables beginning with that letter. After this improvement, each lookup first uses the initial letter to fetch the head pointer of a singly linked list from the hash table, then traverses the list until a match is found. The improved hash table raises segmentation efficiency, but ambiguous strings such as "xian" or "piao" still require special handling. After "xi" is found, one scheme immediately removes "xi" from the string and continues by looking up "an", but then the syllable "xian" is lost. The other scheme keeps "xi" in the string and continues searching the syllables that begin with x until the end of the singly linked list is reached, then removes "xi" and finally looks up "an"; this finds all possible segmentations ("xi", "an", and "xian"), but its search efficiency is low.
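A minimal C sketch of the prior-art structure just described, for illustration only: a table of 26 singly linked lists keyed by initial letter. The names `InitialTable`, `table_add` and `table_match` are hypothetical. It shows why every lookup must walk the whole list for the initial letter: to find every syllable that is a prefix of "xian", each syllable beginning with x has to be compared.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Prior-art layout (sketch): one singly linked list per initial letter. */
typedef struct SylNode {
    const char *syllable;          /* points at caller-owned storage */
    struct SylNode *next;
} SylNode;

typedef struct {
    SylNode *bucket[26];           /* bucket[0] holds syllables starting with 'a' */
} InitialTable;

/* Push a lowercase syllable onto the list for its initial letter. */
static int table_add(InitialTable *t, const char *syllable) {
    assert(syllable[0] >= 'a' && syllable[0] <= 'z');
    SylNode *n = malloc(sizeof *n);
    if (n == NULL) return -1;
    int i = syllable[0] - 'a';
    n->syllable = syllable;
    n->next = t->bucket[i];
    t->bucket[i] = n;
    return 0;
}

/* Count the syllables that are prefixes of s. The whole list for s[0] must be
   walked to find every candidate; *compared receives the number of comparisons. */
static int table_match(const InitialTable *t, const char *s, int *compared) {
    int found = 0;
    *compared = 0;
    if (s[0] < 'a' || s[0] > 'z') return 0;
    for (const SylNode *n = t->bucket[s[0] - 'a']; n != NULL; n = n->next) {
        (*compared)++;
        if (strncmp(s, n->syllable, strlen(n->syllable)) == 0) found++;
    }
    return found;
}
```

With "xi", "xia", "xian", "xiang" and "xiao" loaded, matching "xian" finds three prefixes ("xi", "xia", "xian") but performs five string comparisons, one per syllable in the x list.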
Chinese patent No. 200710118921, a memory processing method and device for a telephone number mapping domain name server, mentions a search tree and a hash table, but it uses only their basic storage and lookup functions, without any improvement or extension. Its purpose is also fundamentally different: that patent merely stores data in the nodes of a search tree and a hash table and uses them to find the stored data, whereas the present invention mainly achieves fast segmentation of Chinese Pinyin through a combined variant of a search tree and a hash table. That patent looks up stored data; the present invention segments Chinese Pinyin quickly; the two documents differ in essence in both usage and purpose.
Chinese patent No. 200810129141.8, a method and apparatus for adjusting the order of candidate words, also mentions a search tree and a hash table, but its trie uses only one of a search tree or a hash tree. Unlike the present invention, it does not combine a search tree with a hash table, whereas in the present invention the two are tightly combined and both are indispensable. The purpose also differs: that patent uses a search tree or hash table only for storage, to judge whether a stored Pinyin string is a standard spelling, and does not use them for segmentation; the present invention uses a combined variant of a search tree and a hash tree to segment Pinyin quickly and ultimately form Pinyin strings.
Chinese patent No. 200910107961.1, an implementation method for fast word segmentation, is also a segmentation method, but its search tree is implemented with a one-level index table and a HASH multiway tree. Its shortcoming is that it easily drops syllables when processing ambiguous strings. Avoiding dropped syllables requires a method that lowers search efficiency, so although the correctness of the lookup result is guaranteed, the direct consequence is low search efficiency.
Although the singly linked lists in the above schemes are easy to maintain in memory and simple to insert into and delete from, they suffer from poor query performance and either low search efficiency or dropped syllables.
Summary of the invention
The purpose of the present invention is to solve the above problems by providing a Chinese Pinyin fast word segmentation method based on a word lookup tree. The method combines a search tree with a hash table and segments Chinese Pinyin quickly using a variant of the hash tree. This segmentation approach avoids poor query performance, low efficiency, and dropped syllables, while improving search efficiency and achieving fast segmentation.
To achieve the above purpose, the present invention adopts the following technical scheme:
A Chinese Pinyin fast word segmentation method based on a word lookup tree, implemented by a computer or an embedded mobile device, whose main working steps are as follows:
Step 1: build a single-character Pinyin search tree from all known single-character Pinyin tables;
Step 2: using the word lookup tree that has been built, combine the search tree with the hash table and segment a given string of Chinese Pinyin;
Step 3: output the word segmentation result;
Step 4: destroy the search tree and release the resources.
In step 1, the single-character Pinyin search tree is built from all known single-character Pinyin tables, mainly according to the following rules:
(1) the root node contains no character, and every node other than the root contains exactly one character;
(2) all child nodes of any given node contain different characters;
(3) every node except a leaf node has a hash table of length 26, indexed by the 26 English letters in ascending order; each element stores one child node, so a node has at most 26 actual children;
(4) each node contains an identification field whose value, 0 or 1, indicates whether the characters on the path from the root node to this node, concatenated, form a complete single-character Pinyin syllable.
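The four rules above can be sketched in C as follows. This is an illustrative sketch, not the patent's code: `TrieNode`, `node_new` and `trie_insert` are hypothetical names, and for simplicity every node, leaves included, carries the 26-slot child table (rule 3 lets leaves omit it). Syllables are assumed to be lowercase a to z without tone marks.

```c
#include <assert.h>
#include <stdlib.h>

#define ALPHABET 26

/* Rules (1)-(4): one character per non-root node, a 26-slot table indexed
   'a'..'z', and a 0/1 flag marking the end of a complete syllable. */
typedef struct TrieNode {
    struct TrieNode *child[ALPHABET];  /* child['c' - 'a'], NULL if absent */
    int is_syllable;                   /* identification field: 0 or 1 */
} TrieNode;

/* calloc zeroes the table and the flag (null pointers are all-bits-zero on
   mainstream platforms). The root is created the same way and keeps flag 0. */
static TrieNode *node_new(void) {
    return calloc(1, sizeof(TrieNode));
}

/* Insert one lowercase syllable; returns 0 on success, -1 if allocation fails.
   Rule (2) holds automatically: each letter maps to exactly one slot. */
static int trie_insert(TrieNode *root, const char *syllable) {
    TrieNode *pt = root;
    for (const char *p = syllable; *p != '\0'; p++) {
        assert(*p >= 'a' && *p <= 'z');   /* dictionary entries must be a-z */
        int i = *p - 'a';
        if (pt->child[i] == NULL && (pt->child[i] = node_new()) == NULL)
            return -1;
        pt = pt->child[i];
    }
    pt->is_syllable = 1;                  /* root-to-here path spells a syllable */
    return 0;
}
```

Building the whole tree is then a loop that calls `trie_insert` for every entry of the syllable table on a root obtained from `node_new()`.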
In step 2, using the word lookup tree that has been built, the search tree is combined with the hash table to segment a given string of Chinese Pinyin, mainly by the following steps:
A) start each search from the root node;
B) take the first letter of the keyword to be searched, use it to select the corresponding subtree from the hash table, and move to that subtree to continue the search;
C) in that subtree, take the second letter of the keyword and likewise select the corresponding subtree to search;
D) iterate: take the 1st, 2nd, …, nth letters of the keyword in turn and continue the search.
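Steps A to D amount to walking down the tree one letter at a time. The following hedged C sketch uses hypothetical names and a simplified node in which every node carries a 26-slot child table. It also records every position where a complete syllable ends, which is the information step 3 acts on.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct TrieNode {
    struct TrieNode *child[26];   /* indexed by letter - 'a' */
    int is_syllable;              /* identification field (0/1) */
} TrieNode;

static TrieNode *node_new(void) { return calloc(1, sizeof(TrieNode)); }

/* Insert a lowercase syllable; 0 on success, -1 on allocation failure. */
static int trie_insert(TrieNode *pt, const char *s) {
    for (; *s != '\0'; s++) {
        assert(*s >= 'a' && *s <= 'z');
        TrieNode **slot = &pt->child[*s - 'a'];
        if (*slot == NULL && (*slot = node_new()) == NULL) return -1;
        pt = *slot;
    }
    pt->is_syllable = 1;
    return 0;
}

/* Steps A-D: start at the root (A), use each successive letter of s to pick
   the next subtree (B-D), and store in ends[] the length of every prefix of s
   that is a complete syllable. The walk stops at the end of s, at a character
   outside a-z, or when the current node has no child for the next letter.
   Returns the number of entries written (at most max_ends). */
static int trie_prefixes(const TrieNode *root, const char *s,
                         int *ends, int max_ends) {
    const TrieNode *pt = root;
    int count = 0;
    for (int i = 0; s[i] >= 'a' && s[i] <= 'z'; i++) {
        pt = pt->child[s[i] - 'a'];
        if (pt == NULL) break;
        if (pt->is_syllable && count < max_ends) ends[count++] = i + 1;
    }
    return count;
}
```

With "xi", "xia", "xian" and "an" loaded, walking "xian" reports prefix lengths 2, 3 and 4 ("xi", "xia", "xian") in a single pass, which is exactly the ambiguity described in the background section.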
In step 3, the concrete steps are as follows:
1) at a given node, if all letters of the keyword have been consumed or the node's identification field is 1, output in order all characters on the path from the root to the current node, together with the current node's identification field value;
2) if all letters of the keyword have been consumed, the search ends; otherwise take the next letter of the keyword, return to the root node of the search tree, and continue the iterative search;
3) for a keyword with several possible readings, all possible values are output. For example, "piao" can be read as the single syllable "piao" (票, "ticket") or as the two syllables "pi ao" (皮袄, "fur-lined jacket").
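One way to realize "output all possible values" is a depth-first enumeration of every segmentation over the tree. This is a swapped-in backtracking sketch, not the patent's pc1/pc2/pc3 flowchart. `segment_all`, `Segmentations`, `MAX_OUT` and `MAX_LEN` are hypothetical, and results beyond the fixed capacities are silently dropped.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct TrieNode {
    struct TrieNode *child[26];
    int is_syllable;
} TrieNode;

static TrieNode *node_new(void) { return calloc(1, sizeof(TrieNode)); }

static int trie_insert(TrieNode *pt, const char *s) {
    for (; *s != '\0'; s++) {
        assert(*s >= 'a' && *s <= 'z');
        TrieNode **slot = &pt->child[*s - 'a'];
        if (*slot == NULL && (*slot = node_new()) == NULL) return -1;
        pt = *slot;
    }
    pt->is_syllable = 1;
    return 0;
}

#define MAX_OUT 16   /* hypothetical cap on stored segmentations */
#define MAX_LEN 128  /* hypothetical cap on one space-separated result */

typedef struct {
    char result[MAX_OUT][MAX_LEN];
    int count;
} Segmentations;

/* buf[0..blen) holds the syllables chosen so far, separated by spaces. */
static void segment_from(const TrieNode *root, const char *s, size_t pos,
                         char *buf, size_t blen, Segmentations *out) {
    if (s[pos] == '\0') {                       /* whole string consumed */
        if (out->count < MAX_OUT) {
            memcpy(out->result[out->count], buf, blen);
            out->result[out->count][blen] = '\0';
            out->count++;
        }
        return;
    }
    const TrieNode *pt = root;                  /* each syllable starts at the root */
    for (size_t i = pos; s[i] >= 'a' && s[i] <= 'z'; i++) {
        pt = pt->child[s[i] - 'a'];
        if (pt == NULL) break;                  /* no syllable continues this way */
        if (!pt->is_syllable) continue;
        size_t len = i + 1 - pos, nb = blen;
        if (blen + 1 + len >= MAX_LEN) break;   /* result would not fit in buf */
        if (nb > 0) buf[nb++] = ' ';
        memcpy(buf + nb, s + pos, len);
        /* later candidates simply overwrite buf past blen, so nothing to undo */
        segment_from(root, s, i + 1, buf, nb + len, out);
    }
}

/* Enumerate every way of splitting s into complete syllables. */
static void segment_all(const TrieNode *root, const char *s, Segmentations *out) {
    char buf[MAX_LEN];
    out->count = 0;
    segment_from(root, s, 0, buf, 0, out);
}
```

With "pi", "piao", "a" and "ao" loaded, segmenting "piao" yields "pi ao" and then "piao", the two readings above; a string containing a character that no syllable can absorb yields no segmentation. The number of segmentations can grow quickly with string length, so a production version would more likely cap or stream them.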
In step 4, after segmentation is finished, the search tree is destroyed and its resources are released, reclaiming the memory it occupies.
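Step 4 is a post-order traversal that frees every child before its parent. A minimal C sketch, with hypothetical names and a simplified node in which every node carries a 26-slot child table; returning the number of freed nodes is only there to make the release checkable.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct TrieNode {
    struct TrieNode *child[26];
    int is_syllable;
} TrieNode;

static TrieNode *node_new(void) { return calloc(1, sizeof(TrieNode)); }

static int trie_insert(TrieNode *pt, const char *s) {
    for (; *s != '\0'; s++) {
        assert(*s >= 'a' && *s <= 'z');
        TrieNode **slot = &pt->child[*s - 'a'];
        if (*slot == NULL && (*slot = node_new()) == NULL) return -1;
        pt = *slot;
    }
    pt->is_syllable = 1;
    return 0;
}

/* Free all children first, then the node itself; returns the number of nodes
   released (root included). Recursion depth is bounded by the length of the
   longest syllable plus one, so it stays shallow. */
static size_t trie_destroy(TrieNode *pt) {
    if (pt == NULL) return 0;
    size_t freed = 1;
    for (int i = 0; i < 26; i++)
        freed += trie_destroy(pt->child[i]);
    free(pt);
    return freed;
}
```

For a tree holding "a", "ai", "an", "ao" and "ang", this releases six nodes: the root plus a, i, n, o and g.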
The search tree is a data structure that supports dynamic sets, with operations such as insertion, deletion, and search.
A hash table is a data structure that is accessed directly according to key values.
Beneficial effects of the invention:
The invention is a new breakthrough in Chinese information processing on computers and handheld embedded mobile devices: it comprehensively solves the problems of dropped syllables and low search efficiency and offers a new approach for this field.
The present invention uses the common prefixes of strings to save construction space and minimize meaningless string comparisons. This not only improves Pinyin search efficiency but also improves the memory efficiency, and hence the operating efficiency, of computers and handheld embedded mobile devices;
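The prefix-sharing claim can be made concrete by counting how many new nodes each inserted syllable needs. The following C sketch is illustrative only (hypothetical names, simplified node layout); it counts characters stored, not bytes.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct TrieNode {
    struct TrieNode *child[26];
    int is_syllable;
} TrieNode;

static TrieNode *node_new(void) { return calloc(1, sizeof(TrieNode)); }

/* Insert a syllable and return how many new nodes it required, or -1 if an
   allocation fails. Letters shared with earlier syllables cost no new node. */
static int trie_insert_count(TrieNode *pt, const char *s) {
    int created = 0;
    for (; *s != '\0'; s++) {
        assert(*s >= 'a' && *s <= 'z');
        TrieNode **slot = &pt->child[*s - 'a'];
        if (*slot == NULL) {
            if ((*slot = node_new()) == NULL) return -1;
            created++;
        }
        pt = *slot;
    }
    pt->is_syllable = 1;
    return created;
}
```

Loading "a", "ai", "an", "ao" and "ang" (10 letters in total) creates only 5 non-root nodes, because each later syllable reuses the path of an earlier one. Each node here also carries a 26-pointer table, so whether memory is actually saved in bytes depends on the node representation chosen.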
The present invention uses an indexed redundant hash table to improve search efficiency and minimize the time complexity of the algorithm. It reduces the number of node lookups, improves the real-time performance of the algorithm, and ensures lookup accuracy, while also saving query time.
The segmentation approach of the present invention avoids poor query performance, low efficiency, and dropped syllables, while improving search efficiency and achieving fast segmentation. As search and segmentation efficiency improve, the productivity of Chinese Pinyin users can correspondingly rise, saving working time and reducing labor intensity.
Description of the drawings
Fig. 1: flowchart of search-tree segmentation;
Fig. 2: structure diagram of the search tree;
Fig. 3: flowchart of the segmentation process.
Detailed description of the embodiments
The invention is further described below with reference to the drawings and embodiments.
As shown in Fig. 1, a hash tree combining a search tree with hash tables is first built from the existing single-character Pinyin table; the given continuous string of Chinese Pinyin is then segmented and the analysis result is output; finally the search tree is destroyed, releasing resources and reclaiming memory.
Building the hash tree: a word lookup tree is built from all known single-character Pinyin tables. The root node of the search tree contains no character, and every node other than the root contains exactly one character. All child nodes of any node of the search tree contain different characters. Every node except a leaf node has a hash table of length 26, indexed by the 26 English letters in ascending order; each element stores one child node, so a node has at most 26 actual children. Every node except the root contains an identification field whose value, 0 or 1, indicates whether the characters on the path from the root node to this node, concatenated, form a complete single-character Pinyin syllable.
Segmentation: the given string of Chinese Pinyin is segmented using the word lookup tree that has been built. Each search starts from the root node. The first letter of the keyword is taken, the corresponding subtree is selected from the hash table according to that letter, and the search continues in that subtree. In that subtree, the second letter of the keyword is taken and the corresponding subtree is again selected. The 3rd, 4th, …, nth letters of the keyword are processed in the same way, each selecting the corresponding subtree to search.
Outputting the segmentation result: at a given node, if all letters of the keyword have been consumed or the node's identification field is 1, all characters on the path from the root to the current node are output in order, together with the current node's identification field value. If all letters of the keyword have been consumed, the search ends; otherwise the next letter of the keyword is taken and the iterative search continues from the root node of the search tree. For a keyword with several possible readings, such as "piao", which can be read as "piao" ("ticket") or as "pi ao" ("fur-lined jacket"), the segmentation result outputs all possible values.
Destroying the hash tree: after segmentation is finished, the hash tree is destroyed and the memory it occupies is reclaimed.
Fig. 2 shows the tree structure; for reasons of space, it lists only a representative subset of Pinyin syllables.
"a": the root node contains an identification field (value 0) and a hash table. The first element of the root's hash table leads to a first-level child node whose character is a. This node also contains an identification field, whose value here is 1, because the letter a by itself is a complete Chinese Pinyin syllable.
"ai", "an", "ao": the node whose character is a also has a hash table, because besides "a" itself, the syllables "ai", "an", and "ao" also begin with the letter a. The a node therefore has three child nodes with characters "i", "n", and "o", and the identification fields of all three are 1.
"ang": likewise, the node whose character is n contains in its hash table a child node whose character is g, and that child's flag is also 1. Concatenating the characters a–n–g on the path from the root therefore yields the syllable "ang".
"pi", "po", "pian": these are derived on the same principle.
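The Fig. 2 example can be reproduced with a short C sketch that loads exactly the syllables discussed and then checks the flag at the end of each path, for example a–n–g. Names are hypothetical, and the node layout is a simplification in which every node carries a 26-slot child table.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct TrieNode {
    struct TrieNode *child[26];
    int is_syllable;              /* identification field (0/1) */
} TrieNode;

static TrieNode *node_new(void) { return calloc(1, sizeof(TrieNode)); }

static int trie_insert(TrieNode *pt, const char *s) {
    for (; *s != '\0'; s++) {
        assert(*s >= 'a' && *s <= 'z');
        TrieNode **slot = &pt->child[*s - 'a'];
        if (*slot == NULL && (*slot = node_new()) == NULL) return -1;
        pt = *slot;
    }
    pt->is_syllable = 1;
    return 0;
}

/* Follow the path spelled by s from the root; NULL if that path does not exist. */
static const TrieNode *trie_find(const TrieNode *pt, const char *s) {
    for (; *s != '\0' && pt != NULL; s++) {
        if (*s < 'a' || *s > 'z') return NULL;
        pt = pt->child[*s - 'a'];
    }
    return pt;
}
```

After loading "a", "ai", "an", "ao", "ang", "pi", "po" and "pian", the nodes for "a", "ang" and "pian" have flag 1, intermediate nodes such as "p" and "pia" exist with flag 0, and a path such as "ag" does not exist at all.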
As shown in Fig. 3, parts of this flowchart describe the flow in C-like pseudocode.
The flowchart describes the complete segmentation process, divided into one main flow and three sub-flows. It includes the handling of ambiguous Pinyin strings (strings with more than one valid segmentation), which uses recursive calls.
The flow begins by inputting a string of Chinese Pinyin and initializing local variables: pc1 records the start of a single-syllable segment; pc2 is a moving cursor marking the current character; pc3 records the end of a single-syllable segment and is used to handle ambiguous strings; pt is a moving cursor marking the current search-tree node.
The main flow is iterative: it makes a series of decisions based on the flag of the pt node, the character pointed to by pc2, and the child nodes of the pt node, and then jumps to the appropriate sub-flow.
The steps of the main flow are:
Step 1: input the Pinyin string to be segmented;
Step 2: declare the following variables: (1) character pointers pc1, pc2, pc3; (2) search-tree node pointer pt. Set pc1 = pc2 = the first character of the Pinyin string, and pc3 = null;
Step 3: set pt = the root node of the search tree;
Step 4: if the identification field of the pt node equals 1, enter sub-flow 1; otherwise go to step 5;
Step 5: if the character pointed to by pc2 is empty (the end of the string), enter sub-flow 2; otherwise go to step 6;
Step 6: check whether a child node can be found in the hash table of the pt node using the character pointed to by pc2 as the index; if so, go to step 7; otherwise enter sub-flow 3;
Step 7: execute pc2++ and set pt = that child node; return to step 4.
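Main-flow steps 4 to 7 form a small state machine. The hedged C sketch below covers only those steps: the sub-flows, the output, and the recursion are left out, and `main_flow_steps` and `Subflow` are hypothetical names. It shows the order of the checks and where each sub-flow is entered; the node layout is a simplification in which every node carries a 26-slot child table.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct TrieNode {
    struct TrieNode *child[26];
    int is_syllable;              /* identification field (0/1) */
} TrieNode;

static TrieNode *node_new(void) { return calloc(1, sizeof(TrieNode)); }

static int trie_insert(TrieNode *pt, const char *s) {
    for (; *s != '\0'; s++) {
        assert(*s >= 'a' && *s <= 'z');
        TrieNode **slot = &pt->child[*s - 'a'];
        if (*slot == NULL && (*slot = node_new()) == NULL) return -1;
        pt = *slot;
    }
    pt->is_syllable = 1;
    return 0;
}

typedef enum {
    SUBFLOW_1_SYLLABLE,   /* step 4: pt's flag is 1, a syllable ends before pc2 */
    SUBFLOW_2_END,        /* step 5: *pc2 is the terminating '\0' */
    SUBFLOW_3_NO_CHILD    /* step 6: pt has no child for *pc2 */
} Subflow;

/* Runs steps 4-7, advancing *pc2 and *pt until a sub-flow would be entered,
   and reports which one. Pass check_flag = 0 to resume at step 5, which is
   where sub-flow 1 returns to. */
static Subflow main_flow_steps(const TrieNode **pt, const char **pc2, int check_flag) {
    for (;;) {
        if (check_flag && (*pt)->is_syllable)            /* step 4 */
            return SUBFLOW_1_SYLLABLE;
        check_flag = 1;
        char c = **pc2;
        if (c == '\0')                                   /* step 5 */
            return SUBFLOW_2_END;
        const TrieNode *child =
            (c >= 'a' && c <= 'z') ? (*pt)->child[c - 'a'] : NULL;
        if (child == NULL)                               /* step 6 */
            return SUBFLOW_3_NO_CHILD;
        (*pc2)++;                                        /* step 7 */
        *pt = child;
    }
}
```

With "xi", "xian" and "an" loaded and the input "xian", a first call stops after "xi" and a resumed call stops after "xian", each time signalling sub-flow 1; a third call reaches sub-flow 2. For "ix", sub-flow 3 is reported immediately because no syllable starts with i.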
Sub-flow 1: when the flag of the pt node is 1, a complete single-character Pinyin syllable has been found. The syllable is output, ambiguous strings are then processed recursively according to the value of pc3 (the ambiguity marker variable), and control finally returns to the main flow.
The detailed steps of sub-flow 1 are:
Step (1-1): output all characters from pc1 to pc2-1 and report them as a single-character Pinyin syllable;
Step (1-2): if pc3 is empty, go to step (1-4); otherwise go to step (1-3);
Step (1-3): extract all characters from pc3 to pc2-1, run this whole flow on them recursively from the beginning, then go to step (1-4);
Step (1-4): set pc3 = pc2; return to step 5 of the main flow.
Sub-flow 2: when the character pointed to by pc2 is empty, the end of the Pinyin string being segmented has been reached. If pc1 (the start of the current single-syllable segment) is not equal to pc2 at this point, all characters from pc1 to pc2-1 are output and reported as an illegal string; the whole segmentation flow then exits.
The detailed steps of sub-flow 2 are:
Step (2-1): if pc1 equals pc2, finish; otherwise go to step (2-2);
Step (2-2): output all characters from pc1 to pc2-1, report them as an illegal string, and finish.
Sub-flow 3: no matching node was found among the children of pt (the current search-tree node), so a further decision is made based on the value of pc3 (the ambiguity marker variable). If pc3 is null, an illegal character has been encountered (for example the character i, which does not begin any single-character Pinyin syllable); the illegal string is output, the variables are reset, and control jumps back to the start of the main-flow loop (step 3). Otherwise, control must return to the position of pc3 (for example, in "pipi", pc1 is at the start of the string while pc2 and pc3 are at the second p); again, after the variables are reset, control jumps back to step 3 of the main flow.
The detailed steps of sub-flow 3 are:
Step (3-1): if pc3 is empty, go to step (3-2); otherwise go to step (3-6);
Step (3-2): output all characters from pc1 to pc2 and report them as an illegal string; go to step (3-3);
Step (3-3): if pc1 == pc2, go to step (3-4); otherwise go to step (3-5);
Step (3-4): pc2++; go to step (3-5);
Step (3-5): pc1 = pc2; go to step 3 of the main flow;
Step (3-6): pc1 = pc3; pc2 = pc3; pc3 = null; go to step 3 of the main flow.
Although specific embodiments of the invention have been described above with reference to the drawings, they do not limit the scope of protection of the invention. Those skilled in the art should understand that various modifications or variations that can be made on the basis of the technical scheme of the invention without creative effort remain within the scope of protection of the invention.

Claims (6)

1. A Chinese Pinyin fast word segmentation method based on a word lookup tree, implemented by a computer or an embedded mobile device, characterized in that the main working steps are as follows:
Step 1: build a single-character Pinyin search tree from all known single-character Pinyin tables;
Step 2: using the word lookup tree that has been built, combine the search tree with the hash table and segment a given string of Chinese Pinyin;
Step 3: output the word segmentation result;
Step 4: destroy the search tree and release the resources.
2. The Chinese Pinyin fast word segmentation method based on a word lookup tree according to claim 1, characterized in that in step 1 the single-character Pinyin search tree is built from all known single-character Pinyin tables, mainly according to the following rules:
(1) the root node contains no character, and every node other than the root contains exactly one character;
(2) all child nodes of any given node contain different characters;
(3) every node except a leaf node has a hash table of length 26, indexed by the 26 English letters in ascending order; each element stores one child node, so a node has at most 26 actual children;
(4) each node contains an identification field whose value, 0 or 1, indicates whether the characters on the path from the root node to this node, concatenated, form a complete single-character Pinyin syllable.
3. The Chinese Pinyin fast word segmentation method based on a word lookup tree according to claim 1, characterized in that in step 2, using the word lookup tree that has been built, the search tree is combined with the hash table to segment a given string of Chinese Pinyin, mainly by the following steps:
A) start each search from the root node;
B) take the first letter of the keyword to be searched, use it to select the corresponding subtree from the hash table, and move to that subtree to continue the search;
C) in that subtree, take the second letter of the keyword and likewise select the corresponding subtree to search;
D) iterate: take the 1st, 2nd, …, nth letters of the keyword in turn and continue the search.
4. The Chinese Pinyin fast word segmentation method based on a word lookup tree according to claim 1, characterized in that the concrete steps of step 3 are as follows:
1) at a given node, if all letters of the keyword have been consumed or the node's identification field is 1, output in order all characters on the path from the root to the current node, together with the current node's identification field value;
2) if all letters of the keyword have been consumed, the search ends; otherwise take the next letter of the keyword, return to the root node of the search tree, and continue the iterative search;
3) for a keyword with several possible readings, output the word segmentation result.
5. The Chinese Pinyin fast word segmentation method based on a word lookup tree according to claim 1, characterized in that in step 4, after segmentation is finished, the search tree is destroyed and its resources are released, reclaiming the memory it occupies.
6. The Chinese Pinyin fast word segmentation method based on a word lookup tree according to claim 1, characterized in that the search tree is a data structure that supports dynamic sets, including insertion, deletion, and search operations.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210332072.7A CN102867049B (en) 2012-09-10 2012-09-10 Chinese PINYIN quick word segmentation method based on word search tree


Publications (2)

Publication Number Publication Date
CN102867049A 2013-01-09
CN102867049B 2015-02-25

Family

ID=47445918


Country Status (1)

Country Link
CN (1) CN102867049B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020060386A1 (en) * 1998-05-19 2002-05-23 Konieczny David J. Device for clamping and holding an elongated workpiece
CN101576877A (en) * 2009-06-16 2009-11-11 程治永 Fast word segmentation realization method
CN101699438A (en) * 2009-11-04 2010-04-28 北京锋力信息科技有限公司 Data access method and system
CN101751430A (en) * 2008-12-12 2010-06-23 汉王科技股份有限公司 Electronic dictionary fuzzy searching method
CN101833547A (en) * 2009-03-09 2010-09-15 三星电子(中国)研发中心 Phase level forecast inputting method based on personal corpus


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
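杨毅 (Yang Yi) et al.: "Dictionary mechanism for Chinese word segmentation: hashing on the Pinyin initial of the second character" (中文分词词典机制:次字拼音首字母哈希机制), Computer Engineering and Design (计算机工程与设计), vol. 31, no. 6, 28 March 2010 (2010-03-28) *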

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105955986A (en) * 2016-04-18 2016-09-21 乐视控股(北京)有限公司 Character converting method and apparatus
CN110019649A (en) * 2017-12-25 2019-07-16 北京新媒传信科技有限公司 Method and device for building and searching an index tree
CN109947779A (en) * 2019-03-29 2019-06-28 北京金山安全软件有限公司 Storage method, device and equipment for user input vocabulary
CN111310450A (en) * 2020-03-23 2020-06-19 中国建设银行股份有限公司 Character string word segmentation method, device, equipment and storage medium
CN111310450B (en) * 2020-03-23 2023-07-14 中国建设银行股份有限公司 Character string word segmentation method, device, equipment and storage medium
CN111444413A (en) * 2020-04-08 2020-07-24 作业不凡(北京)教育科技有限公司 Data query method and device and computing equipment
CN113065340A (en) * 2021-03-11 2021-07-02 国网河北省电力有限公司 Similar asset prompting method and device based on similar word mining
CN113094470A (en) * 2021-04-08 2021-07-09 蔡堃 Text searching method and system
CN113094470B (en) * 2021-04-08 2022-05-24 蔡堃 Text searching method and system
CN113961768A (en) * 2021-12-22 2022-01-21 金蝶软件(中国)有限公司 Sensitive word detection method and device, computer equipment and storage medium
CN113961768B (en) * 2021-12-22 2022-05-13 金蝶软件(中国)有限公司 Sensitive word detection method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN102867049B (en) 2015-02-25

Similar Documents

Publication Publication Date Title
CN102867049B (en) Chinese PINYIN quick word segmentation method based on word search tree
US8301437B2 (en) Tokenization platform
JP3672242B2 (en) PATTERN SEARCH METHOD, PATTERN SEARCH DEVICE, COMPUTER PROGRAM, AND STORAGE MEDIUM
Baeza-Yates Introduction to Data Structures and Algorithms Related to Information Retrieval.
CN101976253B (en) Chinese variation text matching recognition method
CN100565515C (en) Chinese automatic question-answering method and system
CN107784110B (en) Index establishment method and apparatus
CN109902142B (en) A String Fuzzy Matching and Query Method Based on Edit Distance
CN105138514A (en) Dictionary-based method for maximum matching of Chinese word segmentations through successive one word adding in forward direction
CN104063384A (en) Data retrieval method and device
CN105589894A (en) Document index establishing method and device as well as document retrieving method and device
US20220005546A1 (en) Non-redundant gene set clustering method and system, and electronic device
CN109885641B (en) Method and system for searching Chinese full text in database
KR101089722B1 (en) Prefix tree-based indexing method and apparatus, recording medium thereof
CN113076733A (en) Text matching method, terminal device and storage medium
CN108304384B (en) Word splitting method and device
CN108549679B (en) File extension fast matching method and device for URL analysis system
CN118114660A (en) Text detection method, system and computer readable storage medium
CN105426490A (en) Tree structure based indexing method
CN113886420B (en) SQL sentence generation method and device, electronic equipment and storage medium
CN109657108B (en) Domain name asset data storage and query method and system
CN105095276B (en) A method and device for mining maximum repeat sequences
CN102722527B (en) Full-text search method supporting search request containing missing symbols
CN116450639A (en) Data processing method, data processing device, electronic device and readable storage medium
CN108595584B (en) Chinese character output method and system based on digital marks

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CP03 Change of name, title or address

Address after: 250101 A203, block F, entrepreneurship square, Qilu Software Park, No.1 Shunhua Road, high tech Development Zone, Jinan City, Shandong Province

Patentee after: Conway Communication Technology Co., Ltd

Address before: 250101 Qilu Software Park, No. 1 Shunhua Road, Jinan High-tech Development Zone, Shandong Province, A 203

Patentee before: SHANDONG KANGWEI COMMUNICATION TECHNOLOGY Co.,Ltd.