CN101820592A - Method and device for mobile search - Google Patents
Method and device for mobile search
- Publication number
- CN101820592A (CN200910140119A)
- Authority
- CN
- China
- Prior art keywords
- search
- interest
- user
- search type
- score value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a mobile search method and device. The method includes: receiving a search request that contains one or more query keywords; calculating a score for each search type domain, where the score is the score of any one of the following items or a combined score of several of them: the similarity between the search request and the search type domain, the popular search rate of the search request for the search type domain, and the personalized user interest score of the search type domain, the popular search rate being the number of popular searches or the number of clicks on popular search results; and selecting, according to the scores of the search type domains, one or several search type domains in which to search for the query keywords. The invention can provide users with personalized and accurate search results.
Description
Technical field
The invention relates to mobile communication technology, and in particular to a mobile search method and device.
Background
Mobile search, which combines search engines and mobile communication, two of the most active fields in today's information industry, has become a new highlight and growth point for mobile value-added services. The mobile search framework is an open, meta-search-based platform that integrates the capabilities of many specialized (vertical) search engines to give users a comprehensive search capability.
When using mobile search, a user usually enters search keywords and searches directly, without selecting a search type domain. The prior art offers no good solution for correctly understanding the user's search intent and providing personalized, accurate search results.
Summary of the invention
Embodiments of the present invention provide a mobile search method and device that can provide users with personalized and accurate search results.
An embodiment of the present invention provides a mobile search method, including:
receiving a search request, the search request containing one or more query keywords;
calculating a score for each search type domain, the score being the score of any one of the following items or a combined score of several of them: the similarity between the search request and the search type domain, the popular search rate of the search request for the search type domain, and the personalized user interest score of the search type domain;
selecting, according to the scores of the search type domains, one or several search type domains in which to search for the query keywords.
An embodiment of the present invention provides a mobile search device, including:
a receiving unit, configured to receive a search request containing one or more query keywords;
a calculation unit, configured to calculate a score for each search type domain, the score being the score of any one of the following items or a combined score of several of them: the similarity between the search request and the search type domain, the popular search rate of the search request for the search type domain, and the personalized user interest score of the search type domain;
a selection unit, configured to select one or several search type domains according to the scores of the search type domains;
a search unit, configured to search for the query keywords in the search type domains selected by the selection unit.
The mobile search method and device provided by the embodiments of the present invention determine the user's personalized query classification by analyzing the popular interest corresponding to the user and the user's personalized interest, thereby providing the user with personalized and accurate search results.
Brief description of the drawings
Fig. 1 is a flowchart of the mobile search method according to an embodiment of the present invention;
Fig. 2 is a flowchart of one implementation of the mobile search method according to an embodiment of the present invention;
Fig. 3 is a flowchart of another implementation of the mobile search method according to an embodiment of the present invention;
Fig. 4 is a flowchart of another implementation of the mobile search method according to an embodiment of the present invention;
Fig. 5 is a flowchart of another implementation of the mobile search method according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of the mobile search device according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of one specific structure of the mobile search device according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of another specific structure of the mobile search device according to an embodiment of the present invention;
Fig. 9 is a schematic diagram of another specific structure of the mobile search device according to an embodiment of the present invention;
Fig. 10 is a schematic diagram of one structure of the interest model extraction subunit in the device shown in Fig. 9;
Fig. 11 is a schematic diagram of another structure of the interest model extraction subunit in the device shown in Fig. 9;
Fig. 12 is a schematic diagram of another specific structure of the mobile search device according to an embodiment of the present invention.
Detailed description
To help those skilled in the art better understand the solutions of the embodiments of the present invention, the embodiments are described in further detail below with reference to the drawings and implementations.
For a user's search request, the mobile search method and device of the embodiments of the present invention determine the user's personalized query classification by analyzing the popular interest corresponding to the user and the user's personalized interest. Specifically, a score is calculated for each search type domain; the score is the score of any one of the following items or a combined score of several of them: the similarity between the search request and the search type domain, the popular search rate of the search request for the search type domain, and the personalized user interest score of the search type domain. The popular search rate is the number of popular searches or the number of clicks on popular search results. One or several search type domains are then selected according to their scores and searched for the query keywords, thereby providing the user with personalized and accurate search results.
Fig. 1 is a flowchart of the mobile search method according to an embodiment of the present invention.
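Step 101: Receive a search request containing one or more query keywords.
Step 102: Calculate a score for each search type domain. The score is the score of any one of the following items or a combined score of several of them: the similarity between the search request and the search type domain, the popular search rate of the search request for the search type domain, and the personalized user interest score of the search type domain. The popular search rate is the number of popular searches or the number of clicks on popular search results.
Step 103: Select, according to the scores of the search type domains, one or several search type domains in which to search for the query keywords.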
In the embodiments of the present invention, the user's personalized query classification can be determined in several ways. For example, one or several search type domains with high similarity can be selected according to the similarity between the search request and each search type domain; one or several search type domains with a high popular search rate can be selected according to the popular search rate of the search request for each search type domain; or one or several search type domains with a high personalized user interest score can be selected. These items can also be considered together: a combined score is calculated for each search type domain, and one or several search type domains with a high combined score are selected for searching. Each approach is described in detail below with examples.
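Whichever score is used, the selection step reduces to picking the highest-scoring search type domains. The following is a minimal sketch of that step only; the function name `select_domains`, the parameter `k`, and the example scores are hypothetical and do not come from the patent text.

```python
from typing import Dict, List


def select_domains(scores: Dict[str, float], k: int = 2) -> List[str]:
    """Return the k search type domains with the highest scores.

    Ties are broken by domain name so the result is deterministic.
    """
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [domain for domain, _ in ranked[:k]]


# Hypothetical scores for three search type domains.
example_scores = {"catering": 0.82, "music": 0.15, "travel": 0.47}
selected = select_domains(example_scores, k=2)
```

The query keywords would then be searched only in the domains listed in `selected`.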
Fig. 2 is a flowchart of one implementation of the mobile search method according to an embodiment of the present invention.
In this embodiment, a search type domain is selected for searching according to the similarity between the search request and each search type domain, so as to provide the user with personalized and accurate search results.
Step 201: Receive a search request containing one or more query keywords.
Step 202: Calculate, according to the query keywords, the similarity between the search request and each search type domain.
A weight can be set for each query keyword in the search request, and these weights form a query vector Query(q1, q2, ..., qn'), where q1, q2, ..., qn' are the weights of the respective query keywords. Specifically, all keywords can be given the same weight, for example weight = 1. Alternatively, different keywords can be given different weights: for example, the maximum weight for the first keyword (such as weight = 1), an intermediate weight for keywords in the middle (such as a weight between 0.5 and 1), and the minimum weight for the last keyword (such as weight = 0.5).
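The position-based weighting can be sketched as follows. The text only fixes the end points (1 for the first keyword, 0.5 for the last); the linear interpolation used here for the middle keywords is an assumption, as are the function and parameter names.

```python
from typing import List


def query_weights(keywords: List[str], first: float = 1.0, last: float = 0.5) -> List[float]:
    """Assign a weight to each query keyword by its position.

    The first keyword gets `first`, the last gets `last`, and keywords in
    between are interpolated linearly (an illustrative assumption).
    """
    n = len(keywords)
    if n == 0:
        return []
    if n == 1:
        return [first]
    step = (first - last) / (n - 1)
    return [first - i * step for i in range(n)]
```

For a three-keyword request this yields weights of 1, 0.75, and 0.5, which satisfies the ordering described above.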
A domain vector is generated for each search type domain from the weights of its words. For example, a weight is set for every topic word and related word of the search type domain, and these weights form the domain vector Domain(t1, t2, ..., tn), where t1, t2, ..., tn are the weights of the words in that search type domain. The similarity between the search request and the search type domain is obtained from the query vector and the domain vector.
The similarity between the vector Domain(t1, t2, ..., tn) and the vector Query(q1, q2, ..., qn') can be calculated according to formula (1):
where ti1, ti2, ..., tin' are the weights, in the vector Domain(t1, t2, ..., tn), of the words that are identical to the query keywords corresponding to the weights q1, q2, ..., qn'.
Assume there are m search type domains with domain vectors Domain1(t1, t2, ..., tn), Domain2(t1, t2, ..., tn), ..., Domainm(t1, t2, ..., tn). The similarity between the vector Query(q1, q2, ..., qn') and each of these domain vectors is then calculated according to formula (1).
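Formula (1) itself is not available in the text. Purely as an illustration, the sketch below assumes a cosine-style similarity in which each query keyword is matched to the identical word in the domain vector, and a keyword absent from the domain contributes a weight of 0. The actual formula (1) may differ; the dictionaries and names used here are hypothetical.

```python
import math
from typing import Dict


def similarity(query: Dict[str, float], domain: Dict[str, float]) -> float:
    """Cosine-style similarity between a query vector and a domain vector.

    Both vectors are given as word -> weight mappings (an assumption).
    """
    # Weights in the domain of the words identical to each query keyword.
    matched = [domain.get(word, 0.0) for word in query]
    dot = sum(q * t for q, t in zip(query.values(), matched))
    q_norm = math.sqrt(sum(q * q for q in query.values()))
    d_norm = math.sqrt(sum(t * t for t in domain.values()))
    if q_norm == 0.0 or d_norm == 0.0:
        return 0.0
    return dot / (q_norm * d_norm)


# Hypothetical domain vectors and query.
catering = {"sichuan cuisine": 1.0, "spicy": 0.8, "fragrant": 0.5}
music = {"song": 1.0, "singer": 0.8}
query = {"spicy": 1.0, "song": 0.5}
```

Under this assumption the query above scores higher against `catering` than against `music`, so the catering domain would be preferred.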
Step 203: Select one or more search type domains with high similarity for searching.
In this embodiment, the topic words and related words of each search type domain, and the weight of each word, can be set in several ways.
1. Manual assignment
Topic words are given the largest weight, strongly related words an intermediate weight, and weakly related words the smallest weight.
For example, a topic word (such as "Sichuan cuisine" in the catering search type domain) is given weight 1, a strongly related word (such as "spicy" in the catering search type domain) weight 0.8, and a weakly related word (such as "fragrant" in the catering search type domain) weight 0.5.
2. Automatic assignment through learning
The specific process is as follows:
(1) for each search type domain, obtain training text corpus samples for that search type domain;
(2) segment the corpus samples into words to generate the lexicon of the search type domain;
(3) calculate the weight of each word in the lexicon as weight = TF*GIDF, where TF is the total frequency of the word in all corpus samples of the search type domain and GIDF is the global inverse document frequency, GIDF = log(1+N/GDF), where N is the total number of corpus samples of all search type domains and GDF is the global corpus sample frequency, that is, the number of corpus samples across all search type domains that contain the word;
(4) determine the topic words and related words of the search type domain according to the weights of the words.
Assume the lexicon of a search type domain contains n words with weights T1, T2, ..., Tn, where T1 > T2 > ... > Tn. The word corresponding to T1 can then be regarded as the topic word and the other words as related words.
Further, all words in the lexicon can be divided by weight into sets of different tiers, a final score can be set for each tier, and the final score of each tier can be used as the weight of every word in that tier. For example, with L tiers in total, the first tier is given the highest score, the middle tiers intermediate scores, and the L-th tier the lowest score. The words in the lexicon and their final scores then form the domain vector of the corresponding search type domain.
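The weight computation in step (3) can be sketched as below. The corpus layout (domain name mapped to a list of already segmented samples) and the natural logarithm are assumptions; the text does not state the base of the logarithm or how samples are stored.

```python
import math
from collections import Counter
from typing import Dict, List

# domain name -> list of corpus samples, each sample a list of segmented words
Corpus = Dict[str, List[List[str]]]


def domain_word_weights(corpus: Corpus, domain: str) -> Dict[str, float]:
    """Compute weight = TF * log(1 + N / GDF) for every word of one domain."""
    all_samples = [sample for samples in corpus.values() for sample in samples]
    n_total = len(all_samples)  # N: all samples of all domains

    # GDF: number of samples, across all domains, that contain the word.
    gdf = Counter()
    for sample in all_samples:
        gdf.update(set(sample))

    # TF: total frequency of the word in the samples of this domain.
    tf = Counter()
    for sample in corpus[domain]:
        tf.update(sample)

    return {word: tf[word] * math.log(1 + n_total / gdf[word]) for word in tf}


# Hypothetical toy corpus.
toy_corpus = {
    "catering": [["sichuan", "spicy"], ["sichuan", "hot"]],
    "music": [["song", "hot"]],
}
weights = domain_word_weights(toy_corpus, "catering")
topic_word = max(weights, key=weights.get)
```

Sorting `weights` in descending order gives the ranking T1 > T2 > ... > Tn from which the topic word and the tiers described above can be derived.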
Of course, the embodiments of the present invention are not limited to the settings described above; the topic words, related words, and word weights of each search type domain can also be set in other ways, which are not described in detail here.
In the mobile search method of this embodiment, for a user's search request, the similarity between the query vector of the search request and the domain vector of each search type domain is calculated, and one or several search type domains with high similarity are selected for searching, so that a personalized query classification can be determined for the user and personalized, accurate search results provided.
Fig. 3 is a flowchart of another implementation of the mobile search method according to an embodiment of the present invention.
In this embodiment, a search type domain is selected for searching according to the popular search rate of the search request for each search type domain, so as to provide the user with personalized and accurate search results.
Step 301: Receive a search request containing one or more query keywords.
Step 302: Calculate, according to the query keywords, the popular search rate of the search request for each search type domain.
Step 303: Select one or more search type domains with a high popular search rate for searching.
In the embodiments of the present invention, the popular search rate may specifically be the number of popular searches, the number of clicks on popular search results, or the like.
The processes of calculating the number of popular searches and the number of clicks on popular search results of the search request for each search type domain are described in detail below.
The number of popular searches of a search type domain corresponding to the search request is calculated as follows:
(1) calculate, for each keyword in the search request, the total number of popular searches of the search type domain;
Based on historical records, the number of times that all users chose the search type domain for search requests containing a given keyword of the search request is summed. This sum is the total number of times the public searched in that search type domain for the keyword, that is, the keyword's total number of popular searches for the search type domain;
(2) take the sum, over all keywords in the search request, of the total numbers of popular searches of the search type domain as the total number of popular searches of the search type domain corresponding to the search request.
Similarly, the number of clicks on popular search results of a search type domain corresponding to the search request is calculated as follows:
(1) calculate, for each keyword in the search request, the total number of clicks on popular search results of the search type domain;
Based on historical records, the number of search result clicks made by all users who chose the search type domain for search requests containing a given keyword of the search request is summed. This sum is the total number of clicks the public made on search results of that search type domain for the keyword, that is, the keyword's total number of popular search result clicks for the search type domain;
(2) take the sum, over all keywords in the search request, of the total numbers of popular search result clicks of the search type domain as the total number of popular search result clicks of the search type domain corresponding to the search request.
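The two aggregation procedures above can be sketched with a single function. The history record layout (keywords of a past request, the domain the user chose, and the number of result clicks) is a hypothetical assumption; the patent does not describe how historical records are stored.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (keywords of a past request, chosen search type domain, result clicks)
Record = Tuple[List[str], str, int]


def popular_rates(
    history: Iterable[Record],
    request_keywords: List[str],
    use_clicks: bool = False,
) -> Dict[str, int]:
    """Sum, per domain and over all request keywords, the popular search rate.

    With use_clicks=False the rate is the number of popular searches;
    with use_clicks=True it is the number of popular search result clicks.
    """
    rates: Dict[str, int] = defaultdict(int)
    for past_keywords, domain, clicks in history:
        past = set(past_keywords)
        for keyword in request_keywords:
            # A past request counts once for each request keyword it contains.
            if keyword in past:
                rates[domain] += clicks if use_clicks else 1
    return dict(rates)


# Hypothetical history of all users.
history = [
    (["pizza", "cheap"], "catering", 3),
    (["pizza"], "shopping", 1),
    (["jazz"], "music", 5),
]
```

The domain with the largest value in the returned mapping is the one with the highest popular search rate for the request.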
In the mobile search method of this embodiment, for a user's search request, the popular search rate of the search request for each search type domain is calculated, and one or several search type domains with a high popular search rate are selected for searching, so that a personalized query classification can be determined for the user and personalized, accurate search results provided.
Fig. 4 is a flowchart of another implementation of the mobile search method according to an embodiment of the present invention.
In this embodiment, a search type domain with a high score is selected for searching according to the personalized user interest scores of the search type domains, so as to provide the user with personalized and accurate search results.
Step 401: Receive a search request containing one or more query keywords.
Step 402: Extract the user's interest model from user data.
The user's interest model is a vector of the scores of the user data on multiple interest dimensions, for example IM(I1, I2, ..., In), where Ii is the score of the user's i-th interest dimension. The user interest model can be extracted from the user's personalized data (such as the static profile, search and click history, presence service information, and local information). Alternatively, user interest models can be extracted from the personalized data in advance and saved, and the required model can be read directly from the saved models when needed.
The user's interest model can be a static interest model, a dynamic interest model, or an interest model generated by combining the two.
The user's static interest model can be extracted from the user's static profile in either of the following two ways:
(1) calculate the sum of the word frequencies of all words in the user's static profile that belong to each interest dimension, use it as the score of that interest dimension, and form the user interest model as the vector of these scores;
(2) calculate the similarity score between the user's static profile and each interest dimension, use it as the score of that interest dimension, and form the user interest model as the vector of these scores.
The user's dynamic interest model can be extracted from user data in either of the following two ways:
(1) calculate the sum of the word frequencies of all words in the user's search click history that belong to each interest dimension, use it as the score of that interest dimension, and form the user's dynamic interest model as the vector of these scores;
(2) calculate the similarity score between the search click history and each interest dimension, use it as the score of that interest dimension, and form the user's dynamic interest model as the vector of these scores.
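Way (1) is the same for the static and the dynamic model; only the input text differs. A minimal sketch follows, assuming the profile or click history has already been segmented into words and that each interest dimension is described by a set of words. Both assumptions, and all names, are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def interest_model(tokens: List[str], dimensions: Dict[str, Set[str]]) -> List[float]:
    """Score each interest dimension by the summed frequency of its words.

    The returned vector follows the insertion order of `dimensions`.
    """
    freq = Counter(tokens)
    return [float(sum(freq[word] for word in words)) for words in dimensions.values()]


# Hypothetical interest dimensions and segmented user data.
dims = {"sports": {"football", "tennis"}, "music": {"jazz"}}
profile_tokens = ["football", "jazz", "football", "tennis", "news"]
model = interest_model(profile_tokens, dims)
```

Passing the words of the static profile gives the static interest model; passing the words of the search click history gives the dynamic one.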
An interest model that combines the static and dynamic interest models can be generated in either of the following ways:
(1) first normalize the static interest models and the dynamic interest models separately, then calculate the sum of the one or more normalized static interest models and the one or more normalized dynamic interest models, and use this sum as the user's interest model;
(2) first compute a weighted sum of one or more static interest models and one or more dynamic interest models, then normalize the weighted sum, and use the normalized result as the user's interest model.
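Both combination orders can be sketched as follows. The text does not say which normalization is meant; scaling each vector to unit Euclidean length is an assumption made here, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit Euclidean length (assumed normalization)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else [0.0] * len(v)


def combine_normalize_then_sum(models: Sequence[Sequence[float]]) -> Vector:
    """Way (1): normalize every model, then add them element by element."""
    normalized = [normalize(m) for m in models]
    return [sum(column) for column in zip(*normalized)]


def combine_weighted_then_normalize(
    models: Sequence[Sequence[float]], weights: Sequence[float]
) -> Vector:
    """Way (2): weighted element-wise sum of the models, then normalize."""
    summed = [sum(w * x for w, x in zip(weights, column)) for column in zip(*models)]
    return normalize(summed)
```

Way (1) gives every model the same influence regardless of its magnitude, while way (2) lets the weights control how much the static and dynamic models contribute.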
Step 403: Take the sum of the scores of the one or more interest dimensions of the user interest model that correspond to the search type domain as the personalized user interest score of the search type domain.
Step 404: Select one or more search type domains with high scores in which to search for the query keywords.
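Step 403 can be sketched as a lookup and sum. The mapping from each search type domain to its interest dimensions is a hypothetical input; the patent text does not say how this correspondence is defined.

```python
from typing import Dict, List


def domain_interest_scores(
    interest: Dict[str, float], domain_dims: Dict[str, List[str]]
) -> Dict[str, float]:
    """Personalized score of each domain: sum of its interest dimension scores."""
    return {
        domain: sum(interest.get(dim, 0.0) for dim in dims)
        for domain, dims in domain_dims.items()
    }


# Hypothetical interest model (dimension -> score) and domain mapping.
interest = {"catering": 0.5, "travel": 0.25, "music": 0.125}
domain_dims = {"local life": ["catering", "travel"], "entertainment": ["music"]}
scores = domain_interest_scores(interest, domain_dims)
```

Step 404 then selects the domains with the highest values in `scores`.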
For example, the user's interests are represented by n dimensions, such as: news, sports, entertainment, finance, technology, real estate, games, women, forums, weather, commodities, home appliances, music, reading, blogs, mobile phones, military, education, travel, MMS, ring-back tones, catering, civil aviation, industry, agriculture, computers, geography, and so on. The user interest model is then a vector W(r1, r2, r3, ..., rn) formed by the scores of the user's interest in each dimension.
When the user interest model is extracted from the user's personalized data, it can be extracted from the user's static profile or from the user's search history data.
The user interest model W1 can be extracted from the user's static profile in the following ways:
(1) W1 = (p1, p2, p3, ..., pn), where pi is the sum of the word frequencies of all words in the static profile that belong to the i-th interest dimension.
(2) W1 = (p1, p2, p3, ..., pn), where pi is the similarity score between the static profile and the i-th interest dimension.
The similarity pi between the static profile and an interest dimension is calculated as follows:
(a) extract the feature lexicon of the classifier, specifically:
(i) collect a corpus set for each of the user's interest dimensions to generate a corpus;
(ii) segment the corpus into words to form a series of terms;
(iii) determine whether each term obtained by segmentation is a feature word, for which the chi-square statistic (CHI) can be used:
$$\chi^2(t,c) = \frac{N\,(AD-BC)^2}{(A+C)(B+D)(A+B)(C+D)}$$
where the parameters have the following meanings: t: a term; c: a category; N: the total number of training texts; A: the number of training texts that belong to c and contain t; B: the number of texts that do not belong to c but contain t; C: the number of texts that belong to c but do not contain t; D: the number of texts that neither belong to c nor contain t. If C and D are both 0, then χ²(t, c) = 0;
The CHI value of term t over the entire training set can be defined as
$$\chi^2_{avg}(t) = \sum_{i=1}^{n} P(C_i)\,\chi^2(t, C_i) \quad\text{or}\quad \chi^2_{max}(t) = \max_{1 \le i \le n} \chi^2(t, C_i)$$
Terms whose CHI value is below a specified threshold need not be considered as feature words.
P(c) is calculated as follows:
Let the categories be C1, C2, ..., Cn;
then
$$P(C_i) = \frac{N(C_i)}{\sum_{j=1}^{n} N(C_j)}$$
where N(Ci) is the number of training texts in category Ci;
or
$$P(C_i) = \frac{M(C_i)}{M}$$
where M(Ci) is the total number of terms in all training texts of category Ci, and M is the total number of terms in all training texts.
The feature terms finally obtained are denoted t1, t2, ..., tn.
Of course, determining whether a term obtained by segmentation is a feature word is not limited to the CHI algorithm above; other algorithms can also be used, for example χ²(t, c) = |AD-BC|.
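The feature word test can be sketched as follows, assuming the standard chi-square statistic N(AD-BC)^2 / ((A+C)(B+D)(A+B)(C+D)) with the counts A, B, C, D and N defined above, and using the maximum over categories as the term's overall CHI value. The corpus layout (category mapped to a list of word sets) and the function names are hypothetical.

```python
from typing import Dict, List, Set


def chi_square(n: int, a: int, b: int, c: int, d: int) -> float:
    """Standard chi-square statistic of a term for one category."""
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:  # covers the case C = D = 0 mentioned in the text
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def chi_max(term: str, docs_by_category: Dict[str, List[Set[str]]]) -> float:
    """Largest chi-square value of the term over all categories."""
    n = sum(len(docs) for docs in docs_by_category.values())
    best = 0.0
    for category, docs in docs_by_category.items():
        others = [
            doc
            for other, other_docs in docs_by_category.items()
            if other != category
            for doc in other_docs
        ]
        a = sum(term in doc for doc in docs)    # in category, contains term
        c = len(docs) - a                       # in category, lacks term
        b = sum(term in doc for doc in others)  # outside category, contains term
        d = len(others) - b                     # outside category, lacks term
        best = max(best, chi_square(n, a, b, c, d))
    return best


# Hypothetical training texts, already segmented into word sets.
training = {
    "sports": [{"football", "goal"}, {"football", "match"}],
    "music": [{"song", "match"}, {"song", "jazz"}],
}
```

A term such as "football", which appears only in one category, receives a high value, while "match", spread evenly across both categories, receives 0 and would fall below any positive threshold.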
(b) from the feature words obtained in step (a), generate the feature vector of the i-th interest dimension Wi = (wi1, wi2, ..., wij, ..., win), where wij is the weight of feature word tj in the i-th interest dimension.
wij = TFij*log(1+N/GDFj), where TFij is the frequency of feature word tj in all corpus texts of the i-th interest dimension, N is the total number of documents in all corpora of all interest dimensions, and GDFj (global document frequency) is the number of documents, across all corpora of all interest dimensions, that contain feature word tj.
(c) from the feature words obtained in step (a), generate the feature vector of the user's static profile S = (s1, s2, ..., sn), where sj is the weight of feature word tj in the user's static profile.
sj = the frequency of feature word tj in the static profile.
(d) calculate the similarity between the user's static profile vector S and the feature vector Wi of the i-th interest dimension to obtain the similarity score pi.
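Steps (c) and (d) can be sketched as below. The similarity formula for step (d) is not available in the text, so cosine similarity is used here purely as an assumption; the dimension vectors are taken as given inputs, and all names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def profile_vector(profile_tokens: List[str], features: List[str]) -> List[float]:
    """Step (c): frequency of each feature word in the static profile."""
    freq = Counter(profile_tokens)
    return [float(freq[term]) for term in features]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (assumed for step (d))."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def similarity_scores(
    profile_tokens: List[str],
    features: List[str],
    dimension_vectors: Dict[str, List[float]],
) -> Dict[str, float]:
    """Step (d): similarity of the profile vector to every dimension vector."""
    s = profile_vector(profile_tokens, features)
    return {dim: cosine(s, w) for dim, w in dimension_vectors.items()}


# Hypothetical feature words and dimension feature vectors.
features = ["football", "jazz"]
dimension_vectors = {"sports": [2.0, 0.0], "music": [0.0, 3.0]}
p = similarity_scores(["football", "football", "jazz"], features, dimension_vectors)
```

The values in `p` play the role of the scores pi that make up W1 in way (2).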
The user interest model W2 can be extracted from the user's search history data as follows:
W2 = d1+d2+d3+...+dm, where di is the interest model vector corresponding to a document the user clicked;
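The sum W2 = d1+d2+...+dm is an element-wise vector sum over the clicked documents. A minimal sketch, assuming every di has already been computed and has the same length n (the function name is hypothetical):

```python
from typing import List, Sequence


def sum_click_vectors(click_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Add the interest model vectors of all clicked documents element-wise."""
    if not click_vectors:
        return []
    return [sum(column) for column in zip(*click_vectors)]


# Hypothetical vectors d1 and d2 over three interest dimensions.
w2 = sum_click_vectors([[1.0, 0.0, 2.0], [0.5, 1.5, 0.0]])
```

Each newly clicked document contributes one more vector to the sum, so W2 can also be maintained incrementally.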
The interest model vector of a clicked document can be obtained in two ways:
(1) di = (t1, t2, t3, ..., tn), where, for the document the user has just clicked, tj is the sum of the word frequencies of all words in the document that belong to the j-th interest dimension.
(2) di = (t1, t2, t3, ..., tn), where tj is the similarity score between the document and the j-th interest dimension. It is calculated as follows:
(a) Extract the feature vocabulary of the classifier, specifically:
(i) Collect a corpus set for each of the user's interest dimensions to build a corpus;
(ii) Segment the corpus into words to form a series of terms;
(iii) Determine whether each segmented term is a feature word, for which the CHI algorithm can be used:
where the parameters have the following meanings: t: a term; c: a category; N: the total number of training texts; A: the number of texts that belong to c and contain t; B: the number of texts that do not belong to c but contain t; C: the number of texts that belong to c but do not contain t; D: the number of texts that neither belong to c nor contain t. If C and D are both 0, then χ²(t,c) = 0.
The CHI value of term t over the entire training set can be defined in one of two alternative forms; terms whose value is below a specified threshold need not be considered as feature words.
With the categories denoted C1, C2, ..., Cn, P(c) is calculated as follows:
P(Ci) = N(Ci)/N, where N(Ci) is the number of training texts contained in category Ci;
or P(Ci) = M(Ci)/M, where M(Ci) is the total number of terms contained in all training texts of category Ci, and M is the total number of terms contained in all training texts.
The resulting feature terms are denoted t1, t2, ..., tn.
Of course, determining whether a segmented term is a feature word is not limited to the CHI algorithm above; other measures can also be used, for example χ²(t,c) = |AD-BC|.
(b) From the feature words obtained in step (a), generate the feature vector of the i-th interest dimension, Wi = (wi1, wi2, ..., wij, ..., win), where wij is the weight of feature word tj in the i-th interest dimension.
wij = TFij*log(1+N/GDFj), where TFij is the term frequency of feature word tj in all corpora belonging to the i-th interest dimension, N is the total number of documents in the corpora of all interest dimensions, and GDFj (global document frequency) is the number of documents, across the corpora of all interest dimensions, that contain feature word tj.
(c) From the feature words obtained in step (a), generate the feature vector of the document, V = (v1, v2, ..., vn), where vj is the weight of feature word tj in the document, taken as the term frequency of tj in the document.
(d) Calculate the similarity between the document's feature vector V and the feature vector Wi of the i-th interest dimension to obtain the similarity score that forms the i-th component of di.
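The feature-word selection in step (a) can be sketched as follows. The CHI formula is not reproduced in this text, so the sketch assumes the standard chi-square statistic for a 2x2 contingency table, which agrees with the parameter definitions A, B, C, D and with the stated special case (the value is 0 when C and D are both 0). The two ways of aggregating over categories (maximum and prior-weighted average) are likewise common choices assumed here, not forms confirmed by the source. All function names are hypothetical.

```python
def chi_square(a, b, c, d):
    """Standard chi-square statistic of term t for category c from the counts A, B, C, D.

    Returns 0.0 whenever the denominator is zero, which covers the case C = D = 0.
    """
    n = a + b + c + d  # total number of training texts
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom

def simplified_chi(a, b, c, d):
    """The alternative measure mentioned in the text: |AD - BC|."""
    return abs(a * d - b * c)

def category_priors(texts_per_category):
    """P(Ci) = N(Ci) / N, using the number of training texts per category."""
    total = sum(texts_per_category)
    return [n / total for n in texts_per_category]

def chi_max(per_category_values):
    """Assumed aggregation 1: the largest CHI value over all categories."""
    return max(per_category_values)

def chi_avg(per_category_values, priors):
    """Assumed aggregation 2: CHI values averaged with the category priors P(Ci)."""
    return sum(p * x for p, x in zip(priors, per_category_values))

def select_features(term_scores, threshold):
    """Keep the terms whose aggregated CHI value reaches the threshold."""
    return {term for term, score in term_scores.items() if score >= threshold}
```

For instance, a term that appears in all 10 texts of a category and in none of the 10 texts outside it gets `chi_square(10, 0, 0, 10)`, the largest possible value for 20 texts.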
If the user rates a clicked document and the rating is good, the vector di is multiplied by a positive constant c, indicating that the importance of the document increases, i.e. di = c*di = (c*t1, c*t2, c*t3, ..., c*tn); if the rating is bad, di is multiplied by the reciprocal of c, indicating that the importance of the document decreases, i.e. di = (1/c)*di = ((1/c)*t1, (1/c)*t2, (1/c)*t3, ..., (1/c)*tn);
After a period of time, the value of tj is automatically reduced by a certain percentage, indicating that its importance fades over time, until after a longer period the value of tj has fallen to zero, at which point di can be deleted from the history.
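The rating and aging rules for clicked-document vectors can be sketched as below. Several details are assumptions made only for illustration: c is taken to be greater than 1 so that a good rating really increases importance, the decay percentage is a parameter, and a small floor value stands in for "reduced to zero", because repeated percentage reductions never reach exactly zero.

```python
def apply_rating(d, good, c=2.0):
    """Scale a click vector by c for a good rating and by 1/c for a bad one.

    c > 1 is an assumption; the text only requires a positive constant.
    """
    factor = c if good else 1.0 / c
    return [factor * t for t in d]

def decay(d, percent=10.0, floor=1e-3):
    """Reduce every component by `percent`; components below `floor` are set to zero.

    The floor is a hypothetical cut-off that lets a vector eventually expire.
    """
    reduced = [t * (1 - percent / 100.0) for t in d]
    return [t if t >= floor else 0.0 for t in reduced]

def dynamic_model(history):
    """W2 = d1 + d2 + ... + dm over the vectors that have not decayed to all zeros."""
    live = [d for d in history if any(t > 0 for t in d)]
    if not live:
        return []
    return [sum(column) for column in zip(*live)]
```

Calling `decay` on every stored vector at a fixed interval and then rebuilding W2 with `dynamic_model` keeps recent clicks dominant while old ones drop out of the history.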
W1 and W2 are each normalized, and the user interest model is then obtained as W = r1*W1 + r2*W2, where r1 + r2 = 1.
In the mobile search method of this embodiment, for a user's search request, the personalized user interest score of each search type domain is calculated and the one or more search type domains with the highest scores are selected for searching, so that a personalized query classification can be determined for the user and personalized, accurate search results can be provided.
In the embodiments above, the search type domain is selected on the basis of, respectively, the similarity between the search request and the search type domain, the popular search rate of the search request for the search type domain, and the personalized user interest score of the search type domain, thereby determining the user's personalized query classification and providing personalized, accurate search results.
In embodiments of the present invention, any two or more of the above items may also be considered together to calculate a composite score for each search type domain, and the one or more search type domains with the highest composite scores are selected for searching. The following describes an embodiment in detail, taking the combination of all three items as the basis for selecting search type domains as an example.
FIG. 5 is a flowchart of another implementation of the mobile search method according to an embodiment of the present invention.
Step 501: receive a search request that contains one or more query keywords.
Step 502: calculate, respectively, the similarity between the search request and each search type domain, the popular search rate of the search request for each search type domain, and the personalized user interest score of each search type domain.
Step 503: normalize the values obtained for each search type domain and derive the composite score of each search type domain.
For example, calculate the similarity between the search request and a given search type domain and normalize it to obtain the value Score1;
calculate the popular search rate of the search request for that search type domain and normalize it to obtain the value Score2;
calculate the personalized user interest score of that search type domain and normalize it to obtain the value Score3;
calculate the composite score of the search type domain = r1*Score1 + r2*Score2 + r3*Score3, where r1, r2, and r3 are the weights of Score1, Score2, and Score3 respectively, and r1 + r2 + r3 = 1.
The composite score can also be calculated in other ways, such as:
composite score = Score1*Score2*Score3, or
composite score = (Score1 + Score2 + Score3)/3, and so on.
Step 504: select the one or more search type domains with the highest composite scores for searching.
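Steps 503 and 504 can be sketched in Python as follows, assuming the three values per domain have already been normalized. The three combination modes mirror the weighted, product, and average formulas above; the domain names, scores, and weights are hypothetical.

```python
import math

def composite(scores, weights=None, mode="weighted"):
    """Combine normalized scores (Score1, Score2, Score3) into one composite score."""
    if mode == "weighted":
        if abs(sum(weights) - 1.0) > 1e-9:
            raise ValueError("weights must sum to 1")
        return sum(w * s for w, s in zip(weights, scores))
    if mode == "product":
        return math.prod(scores)
    if mode == "mean":
        return sum(scores) / len(scores)
    raise ValueError("unknown mode: " + mode)

def select_domains(domain_scores, k=1):
    """Return the k search type domains with the highest composite scores."""
    ranked = sorted(domain_scores, key=domain_scores.get, reverse=True)
    return ranked[:k]

# Hypothetical normalized (similarity, popular search rate, interest) values per domain.
normalized = {
    "music": (0.8, 0.4, 0.4),
    "news": (0.2, 0.6, 0.4),
    "maps": (0.0, 0.0, 0.2),
}
weights = (0.5, 0.25, 0.25)  # r1, r2, r3
totals = {name: composite(s, weights) for name, s in normalized.items()}
chosen = select_domains(totals, k=2)
```

With these toy numbers the weighted mode ranks "music" first and "news" second. Note that the product mode drives any domain with a zero factor to a composite score of zero, which may or may not be desirable.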
As can be seen, in this embodiment multiple factors are considered together to determine the user's personalized query classification: a composite score is calculated for each search type domain, and the one or more search type domains with the highest composite scores are selected for searching, thereby providing the user with personalized, accurate search results.
Those of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments can be carried out by a program instructing the relevant hardware, and that the program can be stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk, or an optical disc.
An embodiment of the present invention also provides a mobile search device; FIG. 6 is a schematic structural diagram of the device.
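In this embodiment, the device includes a receiving unit 601, a calculation unit 602, a selection unit 603, and a search unit 604, where:
the receiving unit 601 is configured to receive a search request that contains one or more query keywords;
the calculation unit 602 is configured to calculate a score for each search type domain, the score being the score of any one of the following items or a composite score of several of them: the similarity between the search request and the search type domain, the popular search rate of the search request for the search type domain, and the personalized user interest score of the search type domain;
when the calculation unit 602 calculates a composite score for each search type domain, it computes a product score, an average score, or a weighted score from two or more of the similarity between the search request and the search type domain, the popular search rate of the search request for the search type domain, and the personalized user interest score of the search type domain;
the selection unit 603 is configured to select one or more search type domains according to the scores of the search type domains;
the search unit 604 is configured to search for the query keywords in the search type domains selected by the selection unit.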
In embodiments of the present invention, the calculation unit 602 and the selection unit 603 can determine the user's personalized query classification in several ways. For example, the one or more search type domains with the highest similarity to the search request can be selected for searching; the one or more search type domains with the highest popular search rate for the search request can be selected; or the one or more search type domains with the highest personalized user interest scores can be selected. Of course, several of these items can also be considered together to calculate a composite score for each search type domain, and the one or more search type domains with the highest composite scores are then selected for searching. Accordingly, the calculation unit 602 includes any one or more of the following units:
a similarity calculation unit, configured to calculate the similarity between the search request and each search type domain;
a popular search rate calculation unit, configured to calculate the popular search rate of the search request for each search type domain;
a user interest score calculation unit, configured to calculate the personalized user interest score of each search type domain.
Each of these is described in detail below by way of example.
FIG. 7 is a schematic diagram of one specific structure of the mobile search device according to an embodiment of the present invention.
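In this embodiment, the device includes a receiving unit 701, a similarity calculation unit 702, a selection unit 703, and a search unit 704. The receiving unit 701, the selection unit 703, and the search unit 704 are the same as the corresponding units in the embodiment shown in FIG. 6 and are not described again here.
The similarity calculation unit 702 includes a weight setting subunit 721, a query vector generation subunit 722, a domain vector generation unit 723, and a first calculation subunit 724. The weight setting subunit 721 is configured to set weights for the query keywords; the query vector generation subunit 722 is configured to generate a query vector from the weights of the query keywords; the domain vector generation unit 723 is configured to generate, from the weights of the words in a search type domain, the domain vector of that search type domain; and the first calculation subunit 724 is configured to obtain the similarity between the search request and the search type domain from the query vector and the domain vector.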
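In this embodiment, the device may further include a setting unit (not shown) or a learning unit 705. The setting unit is configured to determine manually the topic words and related words in the search type domain, together with the weight of each word; the learning unit 705 is configured to determine the topic words and related words in the search type domain, together with the weight of each word, through automatic learning.
The learning unit 705 includes a corpus sample acquisition subunit 751, a vocabulary generation subunit 752, a weight calculation subunit 753, and a topic word determination subunit 754. The corpus sample acquisition subunit 751 is configured to obtain, for each search type domain, training text corpus samples for that domain; the vocabulary generation subunit 752 is configured to segment the corpus samples into words and generate the vocabulary of the search type domain;
the weight calculation subunit 753 is configured to calculate the weight of each word in the vocabulary; and the topic word determination subunit 754 is configured to determine the topic words and related words of the search type domain according to the weights of the words.
In embodiments of the present invention, the learning unit 705 may further include a tier division subunit 755 and a score setting subunit 756. The tier division subunit 755 is configured to divide all words in the vocabulary into sets of different tiers according to their weights; the score setting subunit 756 is configured to set a final score for each tier and to use the final score of each tier as the weight of every word in that tier.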
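The tier division and score setting performed by subunits 755 and 756 can be sketched as below. The text does not say how tier boundaries are chosen, so fixed descending thresholds are assumed here; the words, weights, and tier scores are hypothetical.

```python
def tier_weights(word_weights, boundaries, tier_scores):
    """Replace each word's raw weight with the final score of the tier it falls into.

    boundaries: descending thresholds separating the tiers (an assumed scheme).
    tier_scores: one final score per tier, so len(tier_scores) == len(boundaries) + 1.
    """
    if len(tier_scores) != len(boundaries) + 1:
        raise ValueError("need exactly one score per tier")
    result = {}
    for word, weight in word_weights.items():
        # The tier index is the number of thresholds the weight falls below.
        tier = sum(1 for b in boundaries if weight < b)
        result[word] = tier_scores[tier]
    return result

# Hypothetical learned weights for a "music" search type domain.
raw = {"song": 0.9, "album": 0.4, "ticket": 0.1}
final = tier_weights(raw, boundaries=[0.6, 0.3], tier_scores=[1.0, 0.5, 0.1])
```

Collapsing raw weights into a few tier scores makes the domain vector less sensitive to small fluctuations in the learned weights, at the cost of some resolution.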
For a user's search request, the mobile search device of this embodiment calculates the similarity between the search request and each search type domain and selects the one or more search type domains with the highest similarity for searching, so that a personalized query classification can be determined for the user and personalized, accurate search results can be provided. For the detailed process, refer to the description of the embodiment shown in FIG. 2 above; it is not repeated here.
FIG. 8 is a schematic diagram of another specific structure of the mobile search device according to an embodiment of the present invention.
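In this embodiment, the device includes a receiving unit 801, a popular search rate calculation unit 802, a selection unit 803, and a search unit 804. The receiving unit 801, the selection unit 803, and the search unit 804 are the same as the corresponding units in the embodiment shown in FIG. 6 and are not described again here.
The popular search rate calculation unit 802 includes a second calculation subunit 821 and an adding subunit 822. The second calculation subunit 821 is configured to calculate, for each query keyword in the search request, the popular search rate of each search type domain; the adding subunit 822 is configured to take the sum of the popular search rates of the same search type domain over all query keywords in the search request as the popular search rate of the search request for that search type domain.
In embodiments of the present invention, the popular search rate may be the number of popular searches. To calculate, for each keyword in the search request, the total number of popular searches for a given search type domain, the second calculation subunit 821 can use the history records to sum, over all users, the number of times a search request containing that keyword was searched using that search type domain; this sum is the total number of popular searches of that search type domain for the keyword. The adding subunit 822 then takes the sum of these totals over all keywords in the search request as the total number of popular searches of the search request for that search type domain.
In embodiments of the present invention, the popular search rate may also be the number of clicks on popular search results. To calculate, for each keyword in the search request, the total number of popular search result clicks for a given search type domain, the second calculation subunit 821 can use the history records to sum, over all users, the number of search result clicks for search requests that contained that keyword and were searched using that search type domain; this sum is the total number of popular search result clicks of that search type domain for the keyword. The adding subunit 822 then takes the sum of these totals over all keywords in the search request as the total number of popular search result clicks of the search request for that search type domain.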
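The two counting schemes can be sketched with a small in-memory history log. The record layout (keywords, domain used, number of result clicks) and all names are assumptions made for illustration; a real system would read these from its own search logs.

```python
from collections import Counter, defaultdict

def build_counts(log):
    """Aggregate all users' history into per-keyword search and click counts by domain.

    log: iterable of (keywords, domain, clicks) records, a hypothetical layout.
    """
    searches = defaultdict(Counter)
    clicks = defaultdict(Counter)
    for keywords, domain, n_clicks in log:
        for kw in set(keywords):
            searches[kw][domain] += 1
            clicks[kw][domain] += n_clicks
    return searches, clicks

def popular_search_rate(query_keywords, per_keyword_counts):
    """Sum the per-keyword counts of each domain over all keywords in the request."""
    total = Counter()
    for kw in query_keywords:
        total.update(per_keyword_counts.get(kw, Counter()))
    return total

# Hypothetical history of all users.
log = [
    (["jay", "chou"], "music", 3),
    (["jay"], "news", 1),
    (["chou"], "music", 0),
    (["weather"], "news", 2),
]
searches, clicks = build_counts(log)
by_searches = popular_search_rate(["jay", "chou"], searches)
by_clicks = popular_search_rate(["jay", "chou"], clicks)
```

`build_counts` plays the role of the second calculation subunit and `popular_search_rate` that of the adding subunit. A request that contains both keywords contributes once per keyword, which follows the per-keyword summation described above.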
For a user's search request, the mobile search device of this embodiment calculates the popular search rate of the search request for each search type domain and selects the one or more search type domains with the highest popular search rate for searching, so that a personalized query classification can be determined for the user and personalized, accurate search results can be provided. For the detailed process, refer to the description of the embodiment shown in FIG. 3 above; it is not repeated here.
FIG. 9 is a schematic diagram of another specific structure of the mobile search device according to an embodiment of the present invention.
In this embodiment, the device includes a receiving unit 901, a user interest score calculation unit 902, a selection unit 903, and a search unit 904. The receiving unit 901, the selection unit 903, and the search unit 904 are the same as the corresponding units in the embodiment shown in FIG. 6 and are not described again here.
The user interest score calculation unit 902 includes an interest model extraction subunit 921 and a third calculation subunit 922. The interest model extraction subunit 921 is configured to extract the user's interest model from user data, the interest model being a vector of the scores of the user data for multiple interest dimensions; the third calculation subunit 922 is configured to take the sum of the scores of the one or more interest dimensions of the user interest model that correspond to a search type domain as the personalized user interest score of that search type domain.
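The third calculation subunit's summation can be sketched as below. The mapping from search type domains to interest dimensions is not given in this passage, so the mapping, the dimension names, and the scores are all hypothetical.

```python
def interest_score(interest_model, domain_dimensions):
    """Sum the interest-model scores of the dimensions that a search type domain maps to."""
    return sum(interest_model.get(dim, 0.0) for dim in domain_dimensions)

# Hypothetical user interest model: one score per interest dimension.
interest_model = {"sports": 0.5, "music": 0.25, "finance": 0.125, "travel": 0.125}

# Hypothetical mapping from search type domains to interest dimensions.
domain_to_dimensions = {
    "news": ["sports", "finance"],
    "mp3": ["music"],
}

scores = {
    domain: interest_score(interest_model, dims)
    for domain, dims in domain_to_dimensions.items()
}
```

A dimension missing from the model simply contributes zero, so a domain that maps to no known dimension gets a score of zero rather than raising an error.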
In this embodiment, the user's interest model is a static interest model or a dynamic interest model, or an interest model generated by combining the static and dynamic interest models. Accordingly, the interest model extraction subunit 921 can be structured in several ways.
The interest model extraction subunit 921 may include only a first extraction subunit (not shown), configured to calculate, for each interest dimension, the sum of the term frequencies of all words in the user's static profile that belong to that dimension, to use this sum as the score of the dimension, and to generate the user's static interest model as the vector of these scores;
The interest model extraction subunit 921 may instead include only a second extraction subunit (not shown), configured to calculate, for each interest dimension, the sum of the term frequencies of all words that belong to that dimension in the documents clicked in the user's search history, to use this sum as the score of the dimension, and to generate the user's dynamic interest model as the vector of these scores.
As shown in FIG. 10, the interest model extraction subunit 921 may also include the first extraction subunit 1001 and the second extraction subunit 1002 together with a first processing subunit 1003 and a first weighting subunit 1004. The first processing subunit 1003 is configured to normalize the static interest model and the dynamic interest model separately; the first weighting subunit 1004 is configured to calculate the sum of the normalized static interest model and the normalized dynamic interest model and to use this sum as the user's interest model.
As shown in FIG. 11, the interest model extraction subunit 921 may also include the first extraction subunit 1101 and the second extraction subunit 1102 together with a second weighting subunit 1103 and a second processing subunit 1104. The second weighting subunit 1103 is configured to compute the weighted sum of the static interest model and the dynamic interest model; the second processing subunit 1104 is configured to normalize the output of the second weighting subunit and to use the normalized result as the user's interest model.
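The structures of FIG. 10 and FIG. 11 differ only in whether normalization happens before or after the two models are combined. The sketch below contrasts the two orders. The normalization method is not specified in the text, so dividing by the sum of the components is assumed; the weights r1 and r2 follow the method description (r1 + r2 = 1), and the vectors are hypothetical.

```python
def normalize(v):
    """Scale a vector so its components sum to 1 (an assumed normalization)."""
    total = sum(v)
    return [x / total for x in v] if total else list(v)

def normalize_then_combine(static, dynamic, r1, r2):
    """FIG. 10 order: normalize each model first, then add them with weights r1, r2."""
    n1, n2 = normalize(static), normalize(dynamic)
    return [r1 * a + r2 * b for a, b in zip(n1, n2)]

def combine_then_normalize(static, dynamic, r1, r2):
    """FIG. 11 order: weighted sum of the raw models first, then normalize the result."""
    return normalize([r1 * a + r2 * b for a, b in zip(static, dynamic)])

# Hypothetical two-dimension models; the dynamic one has much larger raw counts.
static_model = [1.0, 3.0]
dynamic_model_vec = [30.0, 10.0]

w_fig10 = normalize_then_combine(static_model, dynamic_model_vec, 0.5, 0.5)
w_fig11 = combine_then_normalize(static_model, dynamic_model_vec, 0.5, 0.5)
```

With these numbers the FIG. 10 order yields equal scores for both dimensions, while the FIG. 11 order is dominated by the dynamic model because its raw counts are larger. Which behavior is preferable depends on whether raw magnitudes are meant to carry information.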
For a user's search request, the mobile search device of this embodiment calculates the personalized user interest score of each search type domain and selects the one or more search type domains with the highest scores for searching, so that a personalized query classification can be determined for the user and personalized, accurate search results can be provided. For the detailed process, refer to the description of the mobile search method above.
In the mobile search devices of the embodiments above, the search type domain is selected on the basis of, respectively, the similarity between the search request and the search type domain, the popular search rate of the search request for the search type domain, and the personalized user interest score of the search type domain, thereby determining the user's personalized query classification and providing personalized, accurate search results.
In embodiments of the present invention, any two or more of the above items may also be considered together to calculate a composite score for each search type domain, and the one or more search type domains with the highest composite scores are selected for searching. The following describes an embodiment in detail, taking the combination of all three items as the basis for selecting search type domains as an example.
FIG. 12 is another structural diagram of the mobile search device according to an embodiment of the present invention.
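In this embodiment, the device includes a receiving unit 1201, a calculation unit 1202, a selection unit 1203, and a search unit 1204. The receiving unit 1201 is configured to receive a search request that contains one or more query keywords; the calculation unit 1202 is configured to calculate a score for each search type domain, the score being the score of any one of the following items or a composite score of several of them: the similarity between the search request and the search type domain, the popular search rate of the search request for the search type domain, and the personalized user interest score of the search type domain; the selection unit 1203 is configured to select one or more search type domains according to the scores of the search type domains; and the search unit 1204 is configured to search for the query keywords in the search type domains selected by the selection unit.
In this embodiment, the calculation unit 1202 includes a similarity calculation unit 1221, a popular search rate calculation unit 1222, a user interest score calculation unit 1223, a normalization unit 1224, and a combining unit 1225. The similarity calculation unit 1221 is configured to calculate the similarity between the search request and each search type domain; the popular search rate calculation unit 1222 is configured to calculate the popular search rate of the search request for each search type domain; the user interest score calculation unit 1223 is configured to calculate the personalized user interest score of each search type domain; the normalization unit 1224 is configured to normalize the values calculated by the similarity calculation unit, the popular search rate calculation unit, and the user interest score calculation unit; and the combining unit 1225 is configured to combine any two or more of the normalized values obtained by the normalization unit 1224, for example by product, average, or weighted sum, to obtain the score of each search type domain.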
As can be seen, the mobile search device of this embodiment considers multiple factors together to determine the user's personalized query classification: it calculates a composite score for each search type domain and selects the one or more search type domains with the highest composite scores for searching, so that personalized, accurate search results can be provided to the user.
The embodiments of the present invention have been described in detail above, and specific implementations have been used to explain the invention. The description of the embodiments is intended only to help in understanding the method and device of the invention. Those of ordinary skill in the art may, following the idea of the invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.
Claims (27)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200910140119A CN101820592A (en) | 2009-02-27 | 2009-07-01 | Method and device for mobile search |
PCT/CN2009/074758 WO2010096986A1 (en) | 2009-02-27 | 2009-11-05 | Mobile search method and device |
US13/219,058 US20110314059A1 (en) | 2009-02-27 | 2011-08-26 | Mobile search method and apparatus |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200910118632.7 | 2009-02-27 | ||
CN200910140119A CN101820592A (en) | 2009-02-27 | 2009-07-01 | Method and device for mobile search |
Publications (1)
Publication Number | Publication Date |
---|---|
CN101820592A true CN101820592A (en) | 2010-09-01 |
Family
ID=42655489
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN200910140119A Pending CN101820592A (en) | 2009-02-27 | 2009-07-01 | Method and device for mobile search |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101820592A (en) |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102364467A (en) * | 2011-09-29 | 2012-02-29 | 北京亿赞普网络技术有限公司 | Network search method and system |
CN102436495A (en) * | 2011-11-14 | 2012-05-02 | 百度在线网络技术(北京)有限公司 | Method and device for providing dynamic search page |
CN102436496A (en) * | 2011-11-14 | 2012-05-02 | 百度在线网络技术(北京)有限公司 | Method for providing personated searching labels and device thereof |
CN102521350A (en) * | 2011-12-12 | 2012-06-27 | 浙江大学 | Selection method of distributed information retrieval sets based on historical click data |
CN102955813A (en) * | 2011-08-29 | 2013-03-06 | 中国移动通信集团四川有限公司 | Information searching method and information searching system |
CN102999521A (en) * | 2011-09-15 | 2013-03-27 | 北京百度网讯科技有限公司 | Method and device for identifying search requirement |
CN102999520A (en) * | 2011-09-15 | 2013-03-27 | 北京百度网讯科技有限公司 | Method and device for identifying search request |
CN103339623A (en) * | 2010-09-08 | 2013-10-02 | 纽昂斯通讯公司 | Method and apparatus relating to internet searching |
CN103455499A (en) * | 2012-05-29 | 2013-12-18 | 北京百度网讯科技有限公司 | Method and system for automatically matching search types according to search terms in mobile terminal |
CN103530385A (en) * | 2013-10-18 | 2014-01-22 | 北京奇虎科技有限公司 | Method and device for searching for information based on vertical searching channels |
CN103729359A (en) * | 2012-10-12 | 2014-04-16 | 阿里巴巴集团控股有限公司 | Method and system for recommending search terms |
CN104699737A (en) * | 2013-12-09 | 2015-06-10 | 国际商业机器公司 | Method and system for managing a search |
CN104933090A (en) * | 2015-05-18 | 2015-09-23 | 深圳市金立通信设备有限公司 | Information searching method and terminal |
CN105245589A (en) * | 2015-09-28 | 2016-01-13 | 小米科技有限责任公司 | Information display method and device |
CN105512298A (en) * | 2015-12-10 | 2016-04-20 | 成都陌云科技有限公司 | Interested content prediction method based on machine learning |
CN105550282A (en) * | 2015-12-10 | 2016-05-04 | 成都陌云科技有限公司 | User interest forecasting method by utilizing multidimensional data |
CN108415903A (en) * | 2018-03-12 | 2018-08-17 | 武汉斗鱼网络科技有限公司 | Judge evaluation method, storage medium and the equipment of search intention identification validity |
CN104915429B (en) * | 2015-06-15 | 2018-09-04 | 小米科技有限责任公司 | Keyword search methodology and device |
2009-07-01: CN application CN200910140119A (publication CN101820592A), status Pending
Cited By (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103339623A (en) * | 2010-09-08 | 2013-10-02 | 纽昂斯通讯公司 | Method and apparatus relating to internet searching |
CN102955813B (en) * | 2011-08-29 | 2015-11-25 | 中国移动通信集团四川有限公司 | A kind of information search method and system |
CN102955813A (en) * | 2011-08-29 | 2013-03-06 | 中国移动通信集团四川有限公司 | Information searching method and information searching system |
CN102999521A (en) * | 2011-09-15 | 2013-03-27 | 北京百度网讯科技有限公司 | Method and device for identifying search requirement |
CN102999520A (en) * | 2011-09-15 | 2013-03-27 | 北京百度网讯科技有限公司 | Method and device for identifying search request |
CN102999521B (en) * | 2011-09-15 | 2016-06-15 | 北京百度网讯科技有限公司 | A kind of method and device identifying search need |
CN102999520B (en) * | 2011-09-15 | 2016-04-27 | 北京百度网讯科技有限公司 | A kind of method and apparatus of search need identification |
CN102364467A (en) * | 2011-09-29 | 2012-02-29 | 北京亿赞普网络技术有限公司 | Network search method and system |
CN102436496A (en) * | 2011-11-14 | 2012-05-02 | 百度在线网络技术(北京)有限公司 | Method for providing personated searching labels and device thereof |
CN102436495A (en) * | 2011-11-14 | 2012-05-02 | 百度在线网络技术(北京)有限公司 | Method and device for providing dynamic search page |
CN102521350A (en) * | 2011-12-12 | 2012-06-27 | 浙江大学 | Selection method of distributed information retrieval sets based on historical click data |
CN102521350B (en) * | 2011-12-12 | 2014-07-16 | 浙江大学 | Selection method of distributed information retrieval sets based on historical click data |
CN103455499A (en) * | 2012-05-29 | 2013-12-18 | 北京百度网讯科技有限公司 | Method and system for automatically matching search types according to search terms in mobile terminal |
US9489688B2 (en) | 2012-10-12 | 2016-11-08 | Alibaba Group Holding Limited | Method and system for recommending search phrases |
CN103729359A (en) * | 2012-10-12 | 2014-04-16 | 阿里巴巴集团控股有限公司 | Method and system for recommending search terms |
CN103729359B (en) * | 2012-10-12 | 2017-03-01 | 阿里巴巴集团控股有限公司 | A kind of method and system recommending search word |
CN103530385A (en) * | 2013-10-18 | 2014-01-22 | 北京奇虎科技有限公司 | Method and device for searching for information based on vertical searching channels |
CN104699737A (en) * | 2013-12-09 | 2015-06-10 | 国际商业机器公司 | Method and system for managing a search |
US11176124B2 (en) | 2013-12-09 | 2021-11-16 | International Business Machines Corporation | Managing a search |
US10176227B2 (en) | 2013-12-09 | 2019-01-08 | International Business Machines Corporation | Managing a search |
US9996588B2 (en) | 2013-12-09 | 2018-06-12 | International Business Machines Corporation | Managing a search |
CN104933090A (en) * | 2015-05-18 | 2015-09-23 | 深圳市金立通信设备有限公司 | Information searching method and terminal |
CN104915429B (en) * | 2015-06-15 | 2018-09-04 | 小米科技有限责任公司 | Keyword search methodology and device |
CN105245589A (en) * | 2015-09-28 | 2016-01-13 | 小米科技有限责任公司 | Information display method and device |
CN105245589B (en) * | 2015-09-28 | 2019-06-14 | 小米科技有限责任公司 | Information displaying method and device |
CN105550282A (en) * | 2015-12-10 | 2016-05-04 | 成都陌云科技有限公司 | User interest forecasting method by utilizing multidimensional data |
CN105512298A (en) * | 2015-12-10 | 2016-04-20 | 成都陌云科技有限公司 | Interested content prediction method based on machine learning |
CN108415903A (en) * | 2018-03-12 | 2018-08-17 | 武汉斗鱼网络科技有限公司 | Judge evaluation method, storage medium and the equipment of search intention identification validity |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101820592A (en) | Method and device for mobile search | |
CN109815308B (en) | Method and device for determining intention recognition model and method and device for searching intention recognition | |
CN103593425B (en) | Intelligent retrieval method and system based on preference | |
CN101661475B (en) | Search method and system | |
CN106815297B (en) | Academic resource recommendation service system and method | |
CN102056335B (en) | Mobile search method, device and system | |
US8380697B2 (en) | Search and retrieval methods and systems of short messages utilizing messaging context and keyword frequency | |
US20110314059A1 (en) | Mobile search method and apparatus | |
CN107958014B (en) | Search engine | |
CN108694647B (en) | Method and device for mining merchant recommendation reason and electronic equipment | |
CN105260390B (en) | A Group-Oriented Item Recommendation Method Based on Joint Probability Matrix Factorization | |
CN111090771B (en) | Song searching method, device and computer storage medium | |
CN103310003A (en) | Method and system for predicting click rate of new advertisement based on click log | |
CN108334610A (en) | A kind of newsletter archive sorting technique, device and server | |
CN102831128A (en) | Method and device for sorting information of namesake persons on Internet | |
CN102968417A (en) | Searching method and system applied to computer network | |
CN103440242A (en) | User search behavior-based personalized recommendation method and system | |
CN103020049A (en) | Searching method and searching system | |
CN101685456B (en) | A search method, system and device | |
CN103473244A (en) | Device and method for recommending applications used in application group | |
CN103744918A (en) | Vertical domain based micro blog searching ranking method and system | |
CN105653546B (en) | Method and system for retrieving a target subject | |
CN106951420A (en) | Literature search method and apparatus, author's searching method and equipment | |
CN115168700A (en) | Information flow recommendation method, system and medium based on pre-training algorithm | |
CN104077327A (en) | Core word importance recognition method and equipment and search result sorting method and equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C12 | Rejection of a patent application after its publication | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20100901 |