
CN102200984A - Search method based on compound words and search engine server - Google Patents


Info

Publication number
CN102200984A
Authority
CN
China
Prior art keywords
word
combination
compound
compound word
search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2010101380777A
Other languages
Chinese (zh)
Inventor
刘致远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Shiji Guangsu Information Technology Co Ltd
Original Assignee
Shenzhen Tencent Computer Systems Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Shenzhen Tencent Computer Systems Co Ltd
Priority to CN2010101380777A
Publication of CN102200984A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a search method based on compound words, comprising the steps of: performing frequency statistics on the word combinations of each web page stored in a web page library, wherein each word combination is composed of unary words obtained by performing word segmentation on the web page; establishing a compound word index for the word combinations whose frequency is greater than a preset threshold, wherein a word combination whose frequency is greater than the preset threshold is a compound word; and, when an obtained search request contains a keyword matching a compound word, returning search results according to the compound word index. Correspondingly, the invention discloses a search engine server. By establishing the compound word index, the degree to which the query entered by a user is split is reduced, search efficiency is improved and computation time is shortened, so that the user's search request can be answered rapidly.

Description

Search method based on compound words and search engine server
Technical field
The present invention relates to the Internet field, and in particular to a search method based on compound words and a search engine server.
Background
With the development of the Internet, the information available online has become increasingly abundant and plays an increasingly visible role in users' daily life, work and study; the Internet is gradually becoming the main channel through which users obtain information. A large proportion of that information is obtained through search engines. Search engines have therefore become a fiercely competitive field, in which the key points of competition are, besides rich content, the user experience. In general, besides guaranteeing the quality of the search results, a search engine must also attend to another important factor, the speed of search. Users expect their search requests to be answered quickly, so search speed has become one of the deciding factors in the quality of the user experience.
To process a user's search request, a search engine splits the user's query into words, looks up the index for each resulting word separately, and obtains a result set for each word. For example, when the user searches for "win tomato", the search engine proceeds as follows:
After synonym normalization of the search request, the query is split into "win" and "tomato";
The index is searched for "win", giving result set A;
The index is searched for "tomato", giving result set B;
Set operations are performed on the index results of "win" and "tomato": the intersection X of A and B is computed, and the union Y of A and B is computed;
Finally the search results are output to the user in the following order: the web pages in set X come first, followed by the elements of Y that are not in X, and finally the elements of A and B that are not in set Y.
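The flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy inverted index, the page ids and the function name `baseline_search` are hypothetical and do not come from the patent. The third ranking tier is omitted because Y, being the union of A and B, already contains every element of both sets.

```python
# Hypothetical toy inverted index: unary word -> set of page ids.
index = {
    "win": {1, 2, 3},
    "tomato": {2, 3, 4},
}

def baseline_search(terms, index):
    """Rank pages: those matching every term first, then those matching any term."""
    result_sets = [index.get(term, set()) for term in terms]
    if not result_sets:
        return []
    x = set.intersection(*result_sets)  # X: pages containing all terms
    y = set.union(*result_sets)         # Y: pages containing at least one term
    # sorted() is used only to make the order inside each tier deterministic.
    return sorted(x) + sorted(y - x)

print(baseline_search(["win", "tomato"], index))  # prints [2, 3, 1, 4]
```

Every additional unary word in the query adds one more index lookup and one more set to combine, which is the cost the compound word index is meant to reduce.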
In the prior art, a long query entered by the user has to be split into many unary words; each unary word obtained by the split is then looked up in the index separately, the results are combined pairwise, and only then is the result output. For example, if the user enters the query "People's Bank of China", the index must be searched separately for "China", "people" and "bank", and the results then combined pairwise. This approach requires repeated search operations. Moreover, some combinations of unary words carry no practical meaning yet still have to be indexed and computed, which wastes storage space and processing resources and lengthens the whole search process.
Summary of the invention
The technical problem to be solved by the embodiments of the present invention is to provide a search method based on compound words. By establishing a compound word index, the degree to which the user's query is split can be reduced and the computation time shortened, so that the user's search request can be answered more quickly.
To solve the above technical problem, an embodiment of the present invention provides a search method based on compound words, the method comprising:
performing frequency statistics on the word combinations of each web page stored in a web page library, wherein each word combination is composed of unary words obtained by performing word segmentation on the web page;
establishing a compound word index for the word combinations whose frequency is greater than a preset threshold, wherein a word combination whose frequency is greater than the preset threshold is a compound word;
when an obtained search request contains a keyword matching a compound word, returning search results according to the compound word index.
Correspondingly, an embodiment of the present invention provides a search engine server, the server comprising:
a frequency statistics unit, configured to perform frequency statistics on the word combinations of each web page stored in a web page library, wherein each word combination is composed of unary words obtained by performing word segmentation on the web page;
a screening unit, configured to select the word combinations whose frequency is greater than a preset threshold, wherein a word combination whose frequency is greater than the preset threshold is a compound word;
a generation unit, configured to establish a compound word index for the compound words;
a search processing unit, configured to return search results according to the compound word index when an obtained search request contains a keyword matching a compound word.
In the embodiments of the present invention, compound words are selected according to the counted frequency of word combinations, a compound word index is established for the compound words, and when an obtained search request contains a keyword matching a compound word, search results are returned according to the compound word index. This reduces the degree to which the user's query is split, saves processing resources and shortens computation time, so that the user's search request can be answered more quickly.
Brief description of the drawings
Fig. 1 is a flowchart of a first embodiment of the search method based on compound words according to the present invention;
Fig. 2 is a flowchart of a second embodiment of the search method based on compound words according to the present invention;
Fig. 3 is a schematic structural diagram of a first embodiment of the search engine server according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a second embodiment of the search engine server according to an embodiment of the present invention.
Detailed description
To make the purpose, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of the first embodiment of the search method based on compound words according to the present invention. As shown in Fig. 1, the method comprises:
Step S101: perform frequency statistics on the word combinations of each web page stored in the web page library, wherein each word combination is composed of unary words obtained by performing word segmentation on the web page;
Step S102: establish a compound word index for the word combinations whose frequency is greater than a preset threshold, wherein a word combination whose frequency is greater than the preset threshold is a compound word;
Step S103: when an obtained search request contains a keyword matching a compound word, return search results according to the compound word index.
In this embodiment of the present invention, compound words are selected according to the counted frequency of word combinations, a compound word index is established for the compound words, and when an obtained search request contains a keyword matching a compound word, search results are returned according to the compound word index. This reduces the degree to which the user's query is split, saves processing resources and shortens computation time, so that the user's search request can be answered more quickly.
Fig. 2 is a flowchart of the second embodiment of the search method based on compound words according to the present invention. As shown in Fig. 2, the method comprises:
Step S201: set up a compound word candidate library to be used for counting word combinations; the compound word candidate library stores the word combinations and a counter corresponding to each word combination.
In this embodiment, the above setup is done by initializing an empty compound word candidate library. Specifically, a compound word candidate library M is defined as a set whose purpose is to hold elements A of the following form:
A = struct {
    String           // the compound entry
    Int32 iCounter   // the counter of the compound entry
}
In a program, M is in practice implemented as an array of structures A.
Step S202: for each web page in the web page library, first read the web page and perform word segmentation on it.
In this embodiment, word segmentation is the process of cutting a sentence into words. The main strategy when segmenting a web page is to remove stop words and perform synonym normalization, after which the segmentation result is obtained. For example, suppose the web page content read is "international strategies of Intellectual Property in China is selected to arrange with domestic strategy". After stop words are removed and synonym normalization and segmentation are performed, the unary words obtained are: "China", "knowledge", "property right", "world", "strategy", "selection", "domestic", "strategy", "arrangement".
Step S203: permute and combine the unary words obtained by the word segmentation to obtain word combinations.
In this embodiment, the unary words obtained by the word segmentation are permuted and combined to obtain word combinations. A unary word is a basic word or term from which phrases or sentences are formed. Taking the unary words obtained in the example of step S202, permutation and combination yields various two-word combinations such as "Chinese knowledge", "intellecture property", "international strategies", "domestic strategy", "strategy is arranged", "selecting domestic" and "strategic choice", which are not enumerated one by one here. Various three-word combinations such as "Intellectual Property in China" and "selecting domestic strategy" can also be obtained, which likewise are not enumerated one by one here. Word combinations of higher arity can be obtained in the same way, and the maximum arity of a word combination can be set through an arrangement rule. The arrangement rule indicates that the unary words obtained by the word segmentation are to be permuted and combined according to the configured rule; besides the maximum arity of a word combination, it may also specify other settings, such as whether words are combined at random.
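Step S203 can be illustrated with a short Python sketch. It assumes one possible arrangement rule, namely that only adjacent unary words are combined, up to a maximum arity; the patent text leaves the rule configurable, and the function name `word_combinations` and the parameter `max_arity` are hypothetical.

```python
def word_combinations(unary_words, max_arity=3):
    """Return combinations of 2..max_arity adjacent unary words, in page order.

    Assumption: only contiguous words are combined. Other arrangement
    rules (for example random combination) are not covered here.
    """
    combos = []
    for arity in range(2, max_arity + 1):
        for start in range(len(unary_words) - arity + 1):
            combos.append(tuple(unary_words[start:start + arity]))
    return combos

words = ["China", "knowledge", "property right", "world", "strategy"]
for combo in word_combinations(words, max_arity=3):
    print(" + ".join(combo))
```

Restricting combinations to adjacent words keeps the number of candidates linear in the page length for a fixed maximum arity, whereas unrestricted permutation would grow much faster.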
Step S204: judge whether the obtained word combination already exists in the compound word candidate library. If so, add one to the counter of that word combination; otherwise, add the word combination to the compound word candidate library and set its counter to one.
In this embodiment, the frequency of a word combination is the number of times the word combination occurs in the web pages of the web page library. For example, if a word combination occurs 30 times in the web pages of the web page library, its frequency in that library is 30.
Each word combination obtained by permutation and combination is checked: if it already exists in the compound word candidate library, the counter of that word combination is increased by one; if it does not exist in the compound word candidate library, it is added to the library and its counter is set to one. For each word combination, denoted Xi, the algorithm is as follows:
IF (Xi is in candidate library M)
{
    access the record for Xi in M;
    add one to the iCounter counter of the record for Xi in M;
}
ELSE
{
    add a record for Xi to M;
    set the iCounter counter of the record for Xi in M to one;
}
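The update rule above maps naturally onto a hash map. The following Python sketch is one possible rendering, assuming the candidate library M is kept as a dictionary from word combination to count (the dictionary value plays the role of `iCounter`); the function name `count_combinations` is hypothetical.

```python
def count_combinations(candidate_library, combinations):
    """Update the candidate library in place with one page's word combinations."""
    for combo in combinations:
        if combo in candidate_library:
            candidate_library[combo] += 1  # existing record: add one to its counter
        else:
            candidate_library[combo] = 1   # new record: counter starts at one
    return candidate_library

# Usage: M starts empty (step S201) and is updated once per web page.
M = {}
count_combinations(M, [("China", "knowledge"), ("knowledge", "property right")])
count_combinations(M, [("China", "knowledge")])
print(M)
```

A dictionary gives constant-time membership checks on average, whereas a plain array of records, as the text describes, would need a linear scan for each word combination.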
After the word combinations in a web page have been counted, it is checked whether all web pages in the web page library have been counted. If so, step S205 is executed; otherwise, a web page that has not yet been counted is read and steps S202 to S204 are executed on it.
Step S205: establish a compound word index for the word combinations whose frequency is greater than a preset threshold, wherein a word combination whose frequency is greater than the preset threshold is a compound word.
In this embodiment, once the frequencies of the word combinations of all web pages in the web page library have been counted, a threshold is set, the whole compound word candidate library is traversed, and all word combinations whose frequency is greater than the threshold are output; a word combination whose frequency is greater than the preset threshold is a compound word. The threshold may take the maximum frequency among the counted word combinations as a reference and be set relative to that maximum according to actual operating needs. Alternatively, the frequencies of all word combinations in the compound word candidate library are normalized to obtain the probability of occurrence of each word combination, and the word combinations whose probability exceeds a certain value, set according to actual operating needs, are selected as compound words. The selected compound words together form a compound word dictionary.
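Both selection strategies described above can be sketched as follows. This is a minimal illustration, assuming the candidate library is a dictionary from word combination to count and that normalization means dividing each count by the sum of all counts; the function names and the sample numbers are hypothetical.

```python
def select_by_frequency(candidate_library, threshold):
    """Keep the word combinations whose raw frequency exceeds the threshold."""
    return {combo for combo, count in candidate_library.items() if count > threshold}

def select_by_probability(candidate_library, min_probability):
    """Normalize counts to probabilities and keep those above min_probability."""
    total = sum(candidate_library.values())
    if total == 0:
        return set()
    return {combo for combo, count in candidate_library.items()
            if count / total > min_probability}

library = {"intellectual property": 30, "select domestic": 5, "domestic strategy": 15}
print(select_by_frequency(library, 10))
print(select_by_probability(library, 0.5))
```

The probability variant makes the cut-off independent of the size of the web page library, which may be convenient when the same setting is reused across libraries of different sizes.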
Step S206: match the obtained search request against the compound words.
In this embodiment, after the user's search request is obtained, the query in the request is matched against the compound words in the above compound word dictionary. If the query contains a compound word from the compound word dictionary, the match succeeds.
Step S207: after a successful match, perform the search operation using the compound word index and return the result of the search operation.
In this embodiment, after a compound word has been matched successfully, the query in the search request is split:
If the query splits into exactly one compound word, search results are returned according to the index of that compound word;
If the query splits into a plurality of compound words, the results obtained from the index of each compound word are combined by intersection and union operations, and the computed result is returned;
If the query splits into compound words and unary words, the result sets obtained from the compound word index and the result sets obtained from the unary word index are combined by intersection and union operations, and the computed result is returned.
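The three cases above can be covered by a single routine. The sketch below assumes a greedy longest-match strategy for splitting the query, which is one possible choice and is not prescribed by the patent; the index layout (tuples of unary words mapped to sets of page ids), the function names and the sample data are all hypothetical.

```python
def split_query(query_words, compound_dictionary, max_arity=3):
    """Split a segmented query, preferring the longest compound word at each position."""
    terms, i = [], 0
    while i < len(query_words):
        for arity in range(min(max_arity, len(query_words) - i), 1, -1):
            candidate = tuple(query_words[i:i + arity])
            if candidate in compound_dictionary:
                terms.append(candidate)
                i += arity
                break
        else:
            terms.append((query_words[i],))  # no compound word here: keep the unary word
            i += 1
    return terms

def search(query_words, compound_dictionary, index):
    """Look up each term once, then rank the intersection ahead of the rest of the union."""
    terms = split_query(query_words, compound_dictionary)
    result_sets = [index.get(term, set()) for term in terms]
    if not result_sets:
        return []
    x = set.intersection(*result_sets)
    y = set.union(*result_sets)
    return sorted(x) + sorted(y - x)

compound_dictionary = {("China", "people", "bank")}
index = {("China", "people", "bank"): {7, 9}, ("loan",): {9, 12}}
print(search(["China", "people", "bank"], compound_dictionary, index))          # prints [7, 9]
print(search(["China", "people", "bank", "loan"], compound_dictionary, index))  # prints [9, 7, 12]
```

With the compound word in the dictionary, the query "People's Bank of China" needs a single index lookup instead of three lookups followed by pairwise set operations.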
In this embodiment of the present invention, compound words are selected according to the counted frequency of word combinations, a compound word index is established for the compound words, and when an obtained search request contains a keyword matching a compound word, search results are returned according to the compound word index. This reduces the degree to which the user's query is split, saves processing resources, shortens computation time and improves retrieval efficiency, so that the user's search request can be answered more quickly.
Fig. 3 is a schematic structural diagram of the first embodiment of the search engine server according to an embodiment of the present invention. As shown in Fig. 3, the search engine server comprises a frequency statistics unit 310, a screening unit 320, a generation unit 330 and a search processing unit 340.
The frequency statistics unit 310 is configured to perform frequency statistics on the word combinations of each web page stored in the web page library, wherein each word combination is composed of unary words obtained by performing word segmentation on the web page.
The screening unit 320 is configured to select the word combinations whose frequency is greater than a preset threshold, wherein a word combination whose frequency is greater than the preset threshold is a compound word.
The generation unit 330 is configured to establish a compound word index for the compound words.
The search processing unit 340 is configured to return search results according to the compound word index when an obtained search request contains a keyword matching a compound word.
In this embodiment of the present invention, compound words are selected according to the counted frequency of word combinations, a compound word index is established for the compound words, and when an obtained search request contains a keyword matching a compound word, search results are returned according to the compound word index. This reduces the degree to which the user's query is split, saves processing resources, shortens computation time and improves retrieval efficiency, so that the user's search request can be answered more quickly.
Fig. 4 is a schematic structural diagram of the second embodiment of the search engine server according to an embodiment of the present invention. As shown in Fig. 4, the search engine server comprises a setting storage unit 410, a frequency statistics unit 420, a screening unit 430, a generation unit 440 and a search processing unit 450.
The setting storage unit 410 is configured to set up a compound word candidate library to be used for counting word combinations; the compound word candidate library stores the word combinations and a counter corresponding to each word combination.
In this embodiment, the setting storage unit 410 performs the above setup by initializing an empty compound word candidate library. Specifically, the setting storage unit 410 defines a compound word candidate library M as a set whose purpose is to hold elements A of the following form:
A = struct {
    String           // the compound entry
    Int32 iCounter   // the counter of the compound entry
}
In a program, M is in practice implemented as an array of structures A.
The frequency statistics unit 420 is configured to perform frequency statistics on the word combinations of each web page stored in the web page library, wherein each word combination is composed of unary words obtained by performing word segmentation on the web page. The frequency statistics unit 420 comprises a word segmentation module 421, an arrangement module 422, a judgment module 423 and a statistics module 424.
The word segmentation module 421 is configured to read each web page in the web page library and perform word segmentation on it.
In this embodiment, word segmentation is the process of cutting a sentence into words. The main strategy of the word segmentation module 421 when segmenting a web page is to remove stop words and perform synonym normalization, after which the segmentation result is obtained. For example, suppose the web page content read is "international strategies of Intellectual Property in China is selected to arrange with domestic strategy". After stop words are removed and synonym normalization and segmentation are performed, the unary words obtained are: "China", "knowledge", "property right", "world", "strategy", "selection", "domestic", "strategy", "arrangement".
The arrangement module 422 is configured to permute and combine the unary words obtained by the word segmentation to obtain word combinations.
In this embodiment, the unary words obtained by the word segmentation are permuted and combined to obtain word combinations. Taking the unary words obtained by the word segmentation module 421 above as an example, permutation and combination yields various two-word combinations such as "Chinese knowledge", "intellecture property", "international strategies", "domestic strategy", "strategy is arranged", "selecting domestic" and "strategic choice", which are not enumerated one by one here. Various three-word combinations such as "Intellectual Property in China" and "selecting domestic strategy" can also be obtained, which likewise are not enumerated one by one here. Word combinations of higher arity can be obtained in the same way; a rule setting module 425 for setting the arrangement rule of the permutation and combination may be provided, and the maximum arity of a word combination is set through the arrangement rule. The arrangement rule indicates that the unary words obtained by the word segmentation are to be permuted and combined according to the configured rule; besides the maximum arity of a word combination, it may also specify other settings, such as whether words are combined at random.
The judgment module 423 is configured to judge, after the arrangement module has obtained a word combination, whether the obtained word combination already exists in the compound word candidate library. The statistics module 424 is configured to add one to the counter of the word combination when the judgment module determines that the word combination exists in the compound word candidate library, and otherwise to add the word combination to the compound word candidate library and set its counter to one.
In this embodiment, the judgment module 423 checks each word combination obtained by permutation and combination: if it already exists in the compound word candidate library, the statistics module 424 adds one to the counter of that word combination; if it does not exist in the compound word candidate library, the statistics module 424 adds it to the library and sets its counter to one. For each word combination, denoted Xi, the algorithm is as follows:
IF (Xi is in candidate library M)
{
    access the record for Xi in M;
    add one to the iCounter counter of the record for Xi in M;
}
ELSE
{
    add a record for Xi to M;
    set the iCounter counter of the record for Xi in M to one;
}
After the word combinations in a web page have been counted, it is checked whether all web pages in the web page library have been counted. If so, processing continues in the screening unit 430; otherwise, the word segmentation module 421 performs word segmentation on a web page that has not yet been counted, and the arrangement module 422, the judgment module 423 and the statistics module 424 then complete the frequency statistics of its word combinations.
The screening unit 430 is configured to select the word combinations whose frequency is greater than a preset threshold, wherein a word combination whose frequency is greater than the preset threshold is a compound word.
In this embodiment, once the frequencies of the word combinations of all web pages in the web page library have been counted, the screening unit 430 sets a threshold, traverses the whole compound word candidate library and outputs all word combinations whose frequency is greater than the threshold; a word combination whose frequency is greater than the preset threshold is a compound word. The threshold may take the maximum frequency among the counted word combinations as a reference and be set relative to that maximum according to actual operating needs. Alternatively, the frequencies of all word combinations in the compound word candidate library are normalized to obtain the probability of occurrence of each word combination, and the word combinations whose probability exceeds a certain value, set according to actual operating needs, are selected as compound words. The selected compound words together form a compound word dictionary.
The generation unit 440 is configured to establish a compound word index for the compound words.
The search processing unit 450 is configured to return search results according to the compound word index when an obtained search request contains a keyword matching a compound word. The search processing unit 450 comprises an acquisition module 451, a matching module 452 and a computing module 453.
The acquisition module 451 is configured to obtain the user's search request; the matching module 452 is configured to match the obtained search request against the compound words; the computing module 453 is configured to perform the search operation using the compound word index and return the result of the search operation.
In this embodiment, after the acquisition module 451 obtains the user's search request, the matching module 452 matches the query in the request against the compound words in the above compound word dictionary. If the query contains a compound word from the compound word dictionary, the match succeeds. After a compound word has been matched successfully, the query in the search request is split:
If the query splits into exactly one compound word, the computing module 453 returns search results according to the index of that compound word;
If the query splits into a plurality of compound words, the computing module 453 combines the results obtained from the index of each compound word by intersection and union operations, and returns the computed result;
If the query splits into compound words and unary words, the computing module 453 combines the result sets obtained from the compound word index and the result sets obtained from the unary word index by intersection and union operations, and returns the computed result.
In this embodiment of the present invention, compound words are selected according to the counted frequency of word combinations, a compound word index is established for the compound words, and when an obtained search request contains a keyword matching a compound word, search results are returned according to the compound word index. This reduces the degree to which the user's query is split, saves processing resources, shortens computation time and improves retrieval efficiency, so that the user's search request can be answered more quickly.
The above are only preferred embodiments of the present invention and of course do not limit the scope of rights of the present invention; equivalent variations made according to the claims of the present invention therefore still fall within the scope covered by the present invention.

Claims (10)

1. the method based on the searching method of compound word is characterized in that, comprising:
The word of each webpage that web page library is preserved is combined into line frequency degree statistics, and described word combination is formed by this webpage being carried out the monobasic word that word segmentation processing obtains;
For frequency is set up the compound word index greater than the word combination of predetermined threshold value; Described frequency is combined as compound word greater than the word of predetermined threshold value;
When in obtaining searching request, containing the keyword that mates with described compound word, return Search Results according to described compound word index.
2. the method for claim 1 is characterized in that, described word in the webpage of web page library preservation is combined into comprises before the line frequency degree is added up:
The compound word candidate storehouse of preparing to be used to add up the word combination is set, and described compound word candidate storehouse is used to deposit the word combination, and the counter of corresponding each word combination.
3. The method of claim 2, characterized in that performing frequency statistics on the word combinations of each web page stored in the web page library comprises:
for each web page in the web page library, first reading the web page and performing word segmentation on it;
permuting and combining the unitary words obtained by the word segmentation to obtain word combinations;
if an obtained word combination already exists in the compound word candidate library, incrementing the counter value corresponding to that word combination by one;
if the obtained word combination does not exist in the compound word candidate library, adding the word combination to the compound word candidate library and setting the counter value corresponding to that word combination to one.
4. The method of claim 3, characterized in that, before permuting and combining the unitary words obtained by the word segmentation to obtain word combinations, the method further comprises:
setting an arrangement rule for the permutation and combination, wherein the arrangement rule instructs that the unitary words obtained by the word segmentation be permuted and combined according to the set rule.
5. The method of any one of claims 1 to 4, characterized in that returning search results according to the compound word index when the obtained search request contains a keyword matching the compound word comprises:
matching the obtained search request against the compound words;
after a successful match, performing the search operation using the compound word index and returning the result of the search operation.
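As an illustration only, the statistics and indexing steps of claims 1 to 4 could be sketched in Python as follows. The sketch assumes that each web page has already been segmented into unitary words, and it assumes an arrangement rule that groups adjacent words in their original order. The function names, the default combination size of two and the posting-list layout are hypothetical and are not taken from the claims.

```python
from collections import Counter, defaultdict


def word_combinations(unitary_words, size=2):
    """Assumed arrangement rule: every run of `size` adjacent words, in order."""
    return [
        tuple(unitary_words[i:i + size])
        for i in range(len(unitary_words) - size + 1)
    ]


def build_compound_index(pages, threshold, size=2):
    """Count word combinations over all pages and index the frequent ones.

    pages: dict mapping a page id to its list of unitary words
           (segmentation is assumed to have been done already).
    Returns a dict mapping each compound word (a tuple of unitary words)
    to the sorted list of page ids that contain it.
    """
    # Compound word candidate library: word combination -> counter.
    candidates = Counter()
    # Pages in which each combination occurs (hypothetical posting lists).
    postings = defaultdict(set)

    for page_id, unitary_words in pages.items():
        for combo in word_combinations(unitary_words, size):
            # Counter handles both branches of claim 3: a combination not
            # yet in the library starts at one, an existing one is
            # incremented by one.
            candidates[combo] += 1
            postings[combo].add(page_id)

    # Only combinations whose frequency is greater than the preset
    # threshold are treated as compound words and indexed.
    return {
        combo: sorted(postings[combo])
        for combo, count in candidates.items()
        if count > threshold
    }
```

With three short pages in which the pair ("mobile", "phone") occurs three times and every other pair occurs at most twice, a threshold of 2 would leave only that pair in the compound word index.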
6. A search engine server, characterized in that it comprises:
a frequency statistics unit, configured to perform frequency statistics on the word combinations of each web page stored in a web page library, wherein each word combination is composed of unitary words obtained by performing word segmentation on that web page;
a screening unit, configured to screen out the word combinations whose frequency is greater than a preset threshold, wherein a word combination whose frequency is greater than the preset threshold is a compound word;
a generation unit, configured to establish a compound word index for the compound words;
a search processing unit, configured to return search results according to the compound word index when an obtained search request contains a keyword matching the compound word.
7. The search engine server of claim 6, characterized in that it further comprises:
a setting and storage unit, configured to set up a compound word candidate library for counting word combinations, wherein the compound word candidate library stores the word combinations and a counter corresponding to each word combination.
8. The search engine server of claim 7, characterized in that the frequency statistics unit comprises:
a word segmentation module, configured to read each web page in the web page library and perform word segmentation on it;
an arrangement module, configured to permute and combine the unitary words obtained by the word segmentation module to obtain word combinations;
a judgment module, configured to judge, after the arrangement module obtains a word combination, whether the obtained word combination exists in the compound word candidate library;
a statistics module, configured to increment the counter value corresponding to the word combination by one when the judgment module determines that the obtained word combination exists in the compound word candidate library, and otherwise to add the word combination to the compound word candidate library and set the counter value corresponding to that word combination to one.
9. The search engine server of claim 8, characterized in that the frequency statistics unit further comprises:
a rule setting module, configured to set an arrangement rule for the permutation and combination, wherein the arrangement rule instructs that the unitary words obtained by the word segmentation be permuted and combined according to the set rule.
10. The search engine server of any one of claims 6 to 9, characterized in that the search processing unit further comprises:
an acquisition module, configured to obtain a user's search request;
a matching module, configured to match the obtained search request against the compound words;
a computing module, configured to perform the search operation using the compound word index and return the result of the search operation.
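The query-side behaviour of claims 5 and 10 (obtain the request, match it against the compound words, search with the compound word index) might look like the following Python sketch. The greedy left-to-right matching of adjacent word pairs, the fallback to a per-word index for unmatched words and the intersection of the partial results are assumptions made for illustration; the claims only state that the compound word index is used after a successful match. The names `search`, `compound_index` and `unitary_index` are hypothetical.

```python
def search(query_words, compound_index, unitary_index):
    """Return the sorted ids of pages matching every part of the query.

    query_words:    the search request, already segmented into unitary words.
    compound_index: dict mapping a pair of words (tuple) to a list of page ids.
    unitary_index:  dict mapping a single word to a list of page ids
                    (assumed fallback when no compound word matches).
    """
    result = None
    i = 0
    while i < len(query_words):
        pair = tuple(query_words[i:i + 2])
        if len(pair) == 2 and pair in compound_index:
            # Successful match: one lookup in the compound word index
            # replaces two single-word lookups and their intersection.
            hits = set(compound_index[pair])
            i += 2
        else:
            # No compound word starts here: use the single-word index.
            hits = set(unitary_index.get(query_words[i], ()))
            i += 1
        # Pages must satisfy every part of the query.
        result = hits if result is None else result & hits
    return sorted(result or ())
```

In this sketch a query that consists entirely of one compound word is answered by a single index lookup, which is the reduction in query splitting that the abstract attributes to the compound word index.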
CN2010101380777A 2010-03-24 2010-03-24 Search method based on compound words and search engine server Pending CN102200984A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010101380777A CN102200984A (en) 2010-03-24 2010-03-24 Search method based on compound words and search engine server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010101380777A CN102200984A (en) 2010-03-24 2010-03-24 Search method based on compound words and search engine server

Publications (1)

Publication Number Publication Date
CN102200984A true CN102200984A (en) 2011-09-28

Family

ID=44661668

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010101380777A Pending CN102200984A (en) 2010-03-24 2010-03-24 Search method based on compound words and search engine server

Country Status (1)

Country Link
CN (1) CN102200984A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103377226A (en) * 2012-04-25 2013-10-30 中国移动通信集团公司 Intelligent search method and system thereof
WO2017063525A1 (en) * 2015-10-12 2017-04-20 广州神马移动信息科技有限公司 Query processing method, device and apparatus
CN106844638A (en) * 2017-01-19 2017-06-13 王碧波 Information retrieval method, device and electronic equipment
CN106909638A (en) * 2012-12-07 2017-06-30 合网络技术(北京)有限公司 A kind of method and apparatus for finding hot video in real time based on user's inquiry log
CN107894979A (en) * 2017-11-21 2018-04-10 北京百度网讯科技有限公司 The compound process method, apparatus and its equipment excavated for semanteme

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6823341B1 (en) * 1999-12-06 2004-11-23 International Business Machines Corporation Method, system and program for providing indexed web page contents to a search engine database
CN101079056A (en) * 2007-02-06 2007-11-28 腾讯科技(深圳)有限公司 Retrieving method and system
CN101315623A (en) * 2007-05-29 2008-12-03 阿里巴巴集团控股有限公司 Text subject recommending method and device

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103377226A (en) * 2012-04-25 2013-10-30 中国移动通信集团公司 Intelligent search method and system thereof
CN103377226B (en) * 2012-04-25 2016-08-03 中国移动通信集团公司 A kind of intelligent search method and system thereof
CN106909638A (en) * 2012-12-07 2017-06-30 合网络技术(北京)有限公司 A kind of method and apparatus for finding hot video in real time based on user's inquiry log
WO2017063525A1 (en) * 2015-10-12 2017-04-20 广州神马移动信息科技有限公司 Query processing method, device and apparatus
CN106844638A (en) * 2017-01-19 2017-06-13 王碧波 Information retrieval method, device and electronic equipment
CN106844638B (en) * 2017-01-19 2020-11-03 杭州汇数智通科技有限公司 Information retrieval method and device and electronic equipment
CN107894979A (en) * 2017-11-21 2018-04-10 北京百度网讯科技有限公司 The compound process method, apparatus and its equipment excavated for semanteme

Similar Documents

Publication Publication Date Title
US10452691B2 (en) Method and apparatus for generating search results using inverted index
CN101685455B (en) Method and system of data retrieval
CN107784110B (en) A kind of index establishment method and apparatus
KR101195341B1 (en) Method and apparatus for determining category of an unknown word
CN105095368B (en) Method and device for sequencing news information
KR20150036117A (en) Query expansion
CN103377226A (en) Intelligent search method and system thereof
CN101963965A (en) Document indexing method, data query method and server based on search engine
CN102929902A (en) Character splitting method and device based on Chinese retrieval
CN108897819B (en) Data searching method and device
WO2022105178A1 (en) Keyword extraction method and related device
CN102200984A (en) Search method based on compound words and search engine server
US20180225382A1 (en) System and method for automatic creation of ontological databases and semantic searching
CN102339294A (en) Searching method and system for preprocessing keywords
US11836175B1 (en) Systems and methods for semantic search via focused summarizations
CN108629375B (en) Electric power customer classification method, system, terminal and computer-readable storage medium
CN103034656A (en) Chapter content tiering method and device, and article content tiering method and device
CN103020489A (en) Novel method for forecasting siRNA interference efficiency based on ARM (Advanced RISC Machines) microprocessor
CN115374775A (en) Method, device and equipment for determining text similarity and storage medium
US20110191347A1 (en) Adaptive routing of documents to searchable indexes
CN105022794A (en) Method and apparatus for fast searching for required article contents
Cheng et al. MISDA: web services discovery approach based on mining interface semantics
CN120705184A (en) Data query caching method, device, equipment and storage medium
CN109977227A (en) Text feature, system, device based on feature coding
CN117312486B (en) Dictionary division two-layer structure encryption index creation method supporting quick encryption document ordering retrieval

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
ASS Succession or assignment of patent right

Owner name: SHENZHEN SHIJI LIGHT SPEED INFORMATION TECHNOLOGY

Free format text: FORMER OWNER: SHENZHEN TENCENT COMPUTER SYSTEM CO., LTD.

Effective date: 20131031

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 518129 SHENZHEN, GUANGDONG PROVINCE TO: 518057 SHENZHEN, GUANGDONG PROVINCE

TA01 Transfer of patent application right

Effective date of registration: 20131031

Address after: A Tencent Building in Shenzhen Nanshan District City, Guangdong streets in Guangdong province science and technology 518057 16

Applicant after: Shenzhen Shiji Guangsu Information Technology Co., Ltd.

Address before: The South Road in Guangdong province Shenzhen city Fiyta building 518129 floor 5-10 Nanshan District high tech Zone

Applicant before: Shenzhen Tencent Computer System Co., Ltd.

C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20110928