
CN106649818B - Application search intent identification method, device, application search method and server

Info

Publication number
CN106649818B
Authority
CN
China
Prior art keywords
search
word
keyword
search term
search word
Prior art date
Legal status
Expired - Fee Related
Application number
CN201611246921.1A
Other languages
Chinese (zh)
Other versions
CN106649818A (en)
Inventor
庞伟
Current Assignee
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd
Priority to CN201611246921.1A
Publication of CN106649818A
Application granted
Publication of CN106649818B
Expired - Fee Related
Anticipated expiration


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 - Details of database functions independent of the retrieved data types
    • G06F 16/95 - Retrieval from the web
    • G06F 16/951 - Indexing; Web crawling techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/30 - Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an application search intent identification method and device, an application search method, and a server. The method includes: obtaining the search words of each query session from a query session log of an application search engine; mining a label system for each search word according to the search words in each query session and a preset strategy; and identifying the application search intent corresponding to each search word according to its label system. The scheme proposes a user intent identification method matched to the app tagging system, the tag method, which flexibly expresses the user's fine-grained query intent. The user intent tag system is constructed with unsupervised machine learning, abandoning the traditional user intent classification method, and an automated user intent mining process is realized that can generate a user intent tag list with high accuracy and recall. User intents and apps are mapped into the same tag system, so that when searching for applications, users can quickly and accurately obtain apps that satisfy their intent.

Description

Application search intention identification method and device, application search method and server
Technical Field
The invention relates to the field of data mining, in particular to an application search intention identification method and device, an application search method and a server.
Background
An application search engine is a search service for mobile software applications that provides search and download of apps on mobile phones, such as 360 Mobile Assistant, Tencent's app store, Google Play, and the App Store. Because an application search engine is a mobile search service installed on the phone (for example, the 360 Mobile Assistant app) and the display area for search results is small, only accurate search results give the best user experience; this is also one of the important differences between mobile search and PC web search. The number of mobile apps is huge, in the millions, and an app search engine can accurately present the desired apps to the user only on the premise of understanding the user's query intent.
The premise of providing an accurate search service with an application search engine is accurately understanding the user's query intent. A potential search intent hides behind every query request. If the application search engine can sense the user's need, map the search word text to the corresponding app functions or app categories, and rank the app results that better match the user's intent at the front, the user's search experience is markedly enhanced. User intent identification is therefore the core technology of an application search engine and the key to realizing functional search.
In existing traditional web search engine technology, user search intents are manually sorted and classified into three types: navigational, informational and resource queries. However, this web-oriented user intent classification is not suitable for app application scenarios. Each app belongs to a fixed application field and provides people with a specific function, so mining the user's fine-grained functional need with tags is appropriate, whereas a classification-based method is too coarse-grained and broad to fit. To date, therefore, there is no sufficiently flexible and efficient method to meet the growing user demand for fast and accurate app search.
Disclosure of Invention
In view of the above problems, the present invention has been made to provide an identification method, apparatus, application search method, and server of an application search intention that overcome the above problems or at least partially solve the above problems.
According to an aspect of the present invention, there is provided an identification method of an application search intention, the method including:
obtaining search terms in each query session from a query session log of an application search engine;
mining a label system of each search term according to the search term in each query session and a preset strategy;
and identifying the application search intention corresponding to each search word according to the label system of each search word.
Optionally, mining a label system of each search term according to the search term in each query session and a preset policy includes:
obtaining a training corpus set according to search terms in each query session;
inputting the training corpus set into an LDA model for training to obtain a search word-subject probability distribution result and a subject-keyword probability distribution result output by the LDA model;
and calculating to obtain a label system of each search word according to the search word-theme probability distribution result and the theme-keyword probability distribution result.
Optionally, the obtaining a corpus set according to the search term in each query session includes:
obtaining the original corpus of each search term according to the search terms in each query session;
the original linguistic data of each search word form an original linguistic data set; and preprocessing the original corpus set to obtain a training corpus set.
Optionally, the obtaining the original corpus of the search terms according to the search terms in each query session includes:
obtaining a search word sequence set corresponding to a plurality of query sessions according to search words in each query session; obtaining a search term set corresponding to a plurality of query sessions;
training the search word sequence set to obtain an N-dimensional search word vector file;
for each search word in the search word set, calculating the association degree between the search word and other search words according to the N-dimensional search word vector file; and taking other search terms of which the association degrees with the search terms accord with preset conditions as the original linguistic data of the search terms.
Optionally, the obtaining the search word sequence sets corresponding to the plurality of query sessions includes:
for each query session, arranging the search terms in the query session into a sequence in sequence; if one search term in the sequence corresponds to an application download operation, inserting the name of the downloaded application into the rear adjacent position of the corresponding search term in the sequence; obtaining a search word sequence corresponding to the query session;
the obtaining a set of search terms corresponding to a plurality of query sessions comprises: and taking the set of search terms in the plurality of query sessions as the set of search terms corresponding to the plurality of query sessions.
Optionally, training the search word sequence set to obtain an N-dimensional search word vector file includes:
and taking each search word in the search word sequence set as a word, and training the search word sequence set by using a deep learning tool kit word2vec to generate an N-dimensional search word vector file.
Optionally, for each search term in the search term set, calculating a degree of association between the search term and each other search term according to the N-dimensional search term vector file; taking other search terms with the association degree meeting preset conditions with the search term as the original corpus of the search term, wherein the other search terms comprise:
calculating the search word set and the N-dimensional search word vector file by using a KNN algorithm, and calculating the distance between every two search words in the search word set according to the N-dimensional search word vector file;
and for each search word in the search word set, sorting the search words according to the distance from the search word from large to small, and selecting the search words with the first preset threshold as the original corpus of the search word.
Optionally, the preprocessing the original corpus set includes:
in the original corpus set,
for each original corpus, performing word segmentation processing on the original corpus to obtain word segmentation results containing a plurality of lexical items; searching phrases formed by adjacent terms in the word segmentation result; and reserving the phrases, the lexical items belonging to nouns and the lexical items belonging to verbs in the word segmentation result as the corresponding reserved keywords of the original corpus.
Optionally, the searching for phrases composed of adjacent terms in the word segmentation result includes:
and calculating the cPMId value of every two adjacent terms in the word segmentation result, and determining that two adjacent terms form a phrase when their cPMId value is larger than a second preset threshold.
Optionally, the preprocessing the original corpus set further includes:
using the key words correspondingly reserved in the original material of each search word as the first-stage training corpus of the search word;
the first-stage training corpus of each search word forms a first-stage training corpus set; and carrying out data cleaning on the keywords in the first-stage corpus set.
Optionally, the performing data cleaning on the keywords in the first-stage corpus set includes:
in the first-stage corpus set,
calculating a TF-IDF value of each keyword in a first-stage training corpus of each search word; deleting the key words with the TF-IDF value higher than a third preset threshold value and/or lower than a fourth preset threshold value to obtain a training corpus of the search word;
the corpus of each search term constitutes a corpus set.
Optionally, the calculating, according to the search term-topic probability distribution result and the topic-keyword probability distribution result, a tag system of each search term includes:
calculating to obtain a search word-keyword probability distribution result according to the search word-topic probability distribution result and the topic-keyword probability distribution result;
and according to the search word-keyword probability distribution result, for each search word, sorting the keywords according to the probability of the search word from large to small, and selecting the keywords with the number of the top fifth preset threshold value.
Optionally, the calculating a search term-keyword probability distribution result according to the search term-topic probability distribution result and the topic-keyword probability distribution result includes:
for each search word, obtaining the probability of each topic about the search word according to the search word-topic probability distribution result;
for each topic, obtaining the probability of each keyword about the topic according to the topic-keyword probability distribution result;
for each keyword, taking the product of the probability of the keyword about a subject and the probability of the subject about a search word as the probability of the keyword about the search word based on the subject; and taking the probability of the keyword about the search word as the probability of the keyword based on the sum of the probabilities of the topics about the search word.
Optionally, the step of obtaining a tag system of each search term by calculation according to the search term-topic probability distribution result and the topic-keyword probability distribution result further includes:
taking the keywords with the number of the first fifth preset threshold value correspondingly selected by each search word as a first-stage label system of the search word;
for the first-stage label system of each search word, calculating a semantic relation value between each keyword in the first-stage label system of the search word and the search word; for each keyword, taking the product of the semantic relation value corresponding to the keyword and the probability of the keyword relative to the search word as the corrected probability of the keyword relative to the search word; and sorting all the keywords in the first-stage label system of the search word according to the correction probability of the search word from large to small, and selecting the first sixth preset threshold number of keywords to form the label system of the search word.
Optionally, calculating a semantic relationship value between each keyword in the first-stage tagging system of the search term and the search term includes:
obtaining a search word sequence set corresponding to a plurality of query sessions according to search words in each query session; training the search word sequence set to obtain an N-dimensional keyword vector file;
calculating word vectors of the keywords according to the N-dimensional keyword vector files, and calculating the word vectors of each term in the search words;
calculating cosine similarity between the word vector of the keyword and the word vector of each term, and taking the cosine similarity as a semantic relation value of the keyword and the corresponding term;
and taking the sum of the semantic relation values of the keyword and each term as the semantic relation value between the keyword and the search word.
Optionally, the training the search word sequence set to obtain an N-dimensional keyword vector file includes:
and performing word segmentation processing on the search word sequence set, and training the search word sequence set subjected to word segmentation processing by using a deep learning tool package word2vec to generate an N-dimensional keyword vector file.
Optionally, the step of obtaining a tag system of each search term by calculation according to the search term-topic probability distribution result and the topic-keyword probability distribution result further includes:
taking the first sixth preset threshold number of keywords correspondingly selected by each search word as a second stage label system of the search word;
for the second-stage label system of each search word, counting the TF-IDF value of each keyword in the second-stage label system of the search word in the training corpus of the search word; for each keyword, taking the product of the probability of the keyword about the search word and the TF-IDF value as the secondary correction probability of the keyword about the search word; and sequencing all the keywords in the second stage label system of the search word from large to small according to the secondary correction probability of the search word, and selecting the first K keywords to form the label system of the search word.
Optionally, the tag system for forming the search term by the first K selected keywords includes:
acquiring the query times of the search terms in a preset time period from a query session log of an application search engine;
selecting the first K key words to form a label system of the search word according to the query times; wherein the value of K is used as a broken line function of the query times corresponding to the search term.
According to another aspect of the present invention, there is provided an application search method, including:
constructing a search word tag database, wherein the search word tag database comprises a tag system of a plurality of search words;
receiving a current search word uploaded by a client, and acquiring a tag system of the current search word according to the search word tag database;
calculating the degree of association between the label system of the current search term and the label system of each application;
when the correlation degree between the label system of the current search word and the label system of one application meets a preset condition, returning the relevant information of the application to the client for displaying;
constructing the search term tag database by a method according to any one of the first aspect of the invention.
Optionally, the obtaining a tag system of the current search term according to the search term tag database includes:
calculating semantic similarity between the current search word and each search word in the search word tag database, sorting the search words according to the semantic similarity from large to small, and selecting the first preset threshold search words;
and obtaining the label system of the current search word according to the label system of each selected search word.
Optionally, the calculating semantic similarity between the current search word and each search word in the search word tag database includes: calculating the Euclidean distance between the current search word and each search word in the search word tag database, and taking the Euclidean distance between each search word and the current search word as the semantic similarity corresponding to the search word;
the obtaining of the label system of the current search term according to the label system of each selected search term includes: the semantic similarity corresponding to each search word is used as the weight of each label in the label system of the search word; adding the weights of the same labels for the labels corresponding to the label system of each search term to obtain the final weight of each label; and sorting according to the final weight from large to small, and selecting the labels with the first second preset threshold value to form a label system of the current search term.
According to another aspect of the present invention, there is provided an apparatus for identifying an application search intention, the apparatus including:
the acquisition unit is suitable for acquiring search terms in each query session from a query session log of an application search engine;
the mining unit is suitable for mining a label system of each search term according to the search term in each query session and a preset strategy;
and the identification unit is suitable for identifying the application search intention corresponding to each search word according to the label system of the search word.
Optionally, the mining unit is adapted to obtain a corpus set according to search terms in each query session; inputting the training corpus set into an LDA model for training to obtain a search word-subject probability distribution result and a subject-keyword probability distribution result output by the LDA model; and calculating to obtain a label system of each search word according to the search word-theme probability distribution result and the theme-keyword probability distribution result.
Optionally, the mining unit is adapted to obtain an original corpus of each search term according to the search term in each query session; the original linguistic data of each search word form an original linguistic data set; and preprocessing the original corpus set to obtain a training corpus set.
Optionally, the mining unit is adapted to obtain a search word sequence set corresponding to a plurality of query sessions according to search words in each query session; obtaining a search term set corresponding to a plurality of query sessions; training the search word sequence set to obtain an N-dimensional search word vector file; for each search word in the search word set, calculating the association degree between the search word and other search words according to the N-dimensional search word vector file; and taking other search terms of which the association degrees with the search terms accord with preset conditions as the original linguistic data of the search terms.
Optionally, the mining unit is adapted to, for each query session, arrange the search terms in the query session into a sequence in order; if one search term in the sequence corresponds to an application download operation, inserting the name of the downloaded application into the rear adjacent position of the corresponding search term in the sequence; obtaining a search word sequence corresponding to the query session; and taking the set of search terms in the plurality of query sessions as the set of search terms corresponding to the plurality of query sessions.
Optionally, the mining unit is adapted to train the search word sequence set by using a deep learning tool package word2vec to generate an N-dimensional search word vector file, where each search word in the search word sequence set is used as a word.
Optionally, the mining unit is adapted to perform an operation on the search term set and the N-dimensional search term vector file by using a KNN algorithm, and calculate a distance between every two search terms in the search term set according to the N-dimensional search term vector file; and for each search word in the search word set, sorting the search words according to the distance from the search word from large to small, and selecting the search words with the first preset threshold as the original corpus of the search word.
Optionally, the mining unit is adapted to perform word segmentation processing on each original corpus in the original corpus set to obtain a word segmentation result including a plurality of terms; searching phrases formed by adjacent terms in the word segmentation result; and reserving the phrases, the lexical items belonging to nouns and the lexical items belonging to verbs in the word segmentation result as the corresponding reserved keywords of the original corpus.
Optionally, the mining unit is adapted to calculate the cPMId value of every two adjacent terms in the word segmentation result, and determine that two adjacent terms form a phrase when their cPMId value is greater than a second preset threshold.
Optionally, the mining unit is further adapted to use a keyword, which is reserved corresponding to the original material of each search term, as a first-stage training corpus of the search term; the first-stage training corpus of each search word forms a first-stage training corpus set; and carrying out data cleaning on the keywords in the first-stage corpus set.
Optionally, the mining unit is adapted to calculate, in the first-stage corpus, for a first-stage corpus of each search word, a TF-IDF value of each keyword in the first-stage corpus; deleting the key words with the TF-IDF value higher than a third preset threshold value and/or lower than a fourth preset threshold value to obtain a training corpus of the search word; the corpus of each search term constitutes a corpus set.
Optionally, the mining unit is adapted to calculate a search term-keyword probability distribution result according to the search term-topic probability distribution result and the topic-keyword probability distribution result; and according to the search word-keyword probability distribution result, for each search word, sorting the keywords according to the probability of the search word from large to small, and selecting the keywords with the number of the top fifth preset threshold value.
Optionally, the mining unit is adapted to, for each search word, obtain, according to the search word-topic probability distribution result, a probability of each topic about the search word; for each topic, obtaining the probability of each keyword about the topic according to the topic-keyword probability distribution result; for each keyword, taking the product of the probability of the keyword about a subject and the probability of the subject about a search word as the probability of the keyword about the search word based on the subject; and taking the probability of the keyword about the search word as the probability of the keyword based on the sum of the probabilities of the topics about the search word.
Optionally, the mining unit is further adapted to use the first-fifth preset threshold number of keywords, which are selected corresponding to each search word, as a first-stage tag system of the search word; for the first-stage label system of each search word, calculating a semantic relation value between each keyword in the first-stage label system of the search word and the search word; for each keyword, taking the product of the semantic relation value corresponding to the keyword and the probability of the keyword relative to the search word as the corrected probability of the keyword relative to the search word; and sorting all the keywords in the first-stage label system of the search word according to the correction probability of the search word from large to small, and selecting the first sixth preset threshold number of keywords to form the label system of the search word.
Optionally, the mining unit is adapted to obtain a search word sequence set corresponding to a plurality of query sessions according to search words in each query session; training the search word sequence set to obtain an N-dimensional keyword vector file; calculating word vectors of the keywords according to the N-dimensional keyword vector files, and calculating the word vectors of each term in the search words; calculating cosine similarity between the word vector of the keyword and the word vector of each term, and taking the cosine similarity as a semantic relation value of the keyword and the corresponding term; and taking the sum of the semantic relation values of the keyword and each term as the semantic relation value between the keyword and the search word.
Optionally, the mining unit is adapted to perform word segmentation on the search word sequence set, and train the search word sequence set after word segmentation by using a deep learning tool package word2vec to generate an N-dimensional keyword vector file.
Optionally, the mining unit is further adapted to use a first sixth preset threshold number of keywords, which are selected corresponding to each search word, as a second-stage tag system of the search word; for the second-stage label system of each search word, counting the TF-IDF value of each keyword in the second-stage label system of the search word in the training corpus of the search word; for each keyword, taking the product of the probability of the keyword about the search word and the TF-IDF value as the secondary correction probability of the keyword about the search word; and sequencing all the keywords in the second stage label system of the search word from large to small according to the secondary correction probability of the search word, and selecting the first K keywords to form the label system of the search word.
Optionally, the mining unit is adapted to obtain, from a query session log of an application search engine, a number of queries about the search term in a preset time period; selecting the first K key words to form a label system of the search word according to the query times; wherein the value of K is used as a broken line function of the query times corresponding to the search term.
According to still another aspect of the present invention, there is provided an application search server, including:
the database construction unit is suitable for constructing a search word tag database which comprises a plurality of tag systems of search words;
the interaction unit is suitable for receiving the current search terms uploaded by the client;
the search processing unit is suitable for acquiring a label system of the current search word according to the search word label database; calculating the degree of association between the label system of the current search term and the label system of each application;
the interaction unit is also suitable for returning the relevant information of the application to the client side for displaying when the association degree between the label system of the current search word and the label system of the application meets the preset condition;
the database construction unit is the same as the process of constructing the search word tag database by the recognition apparatus of application search intention according to any one of claims 22 to 39.
Optionally, the search processing unit is adapted to calculate semantic similarities between the current search word and the search words in the search word tag database, sort the search words according to the semantic similarities from large to small, and select a first preset threshold number of search words; and obtaining the label system of the current search word according to the label system of each selected search word.
Optionally, the search processing unit is adapted to calculate an euclidean distance between a current search word and each search word in the search word tag database, and use the euclidean distance between each search word and the current search word as a semantic similarity corresponding to the search word; the semantic similarity corresponding to each search word is used as the weight of each label in the label system of the search word; adding the weights of the same labels for the labels corresponding to the label system of each search term to obtain the final weight of each label; and sorting according to the final weight from large to small, and selecting the labels with the first second preset threshold value to form a label system of the current search term.
According to the scheme of the invention, a user intent identification method matched to the app tagging system, the tag method, is provided. The tag system corresponding to each search word is mined flexibly, effectively and accurately, and a search word tag database is established, so that a search word input by the user can be accurately described by its tag system and the problem of user intent identification is solved. Further, the user intent and the apps can be mapped into the same tag system, and search matching then yields more accurate application search results. The scheme thus solves both the user intent identification problem and the relevance calculation problem of the application search engine, and lays the foundation for the core technology of application search engines, functional search.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 illustrates a flow diagram of a method for identifying application search intent, according to one embodiment of the present invention;
FIG. 2 illustrates a flow diagram of a method of application searching in accordance with one embodiment of the present invention;
FIG. 3 is a diagram illustrating an apparatus for recognizing application search intention according to an embodiment of the present invention; and
FIG. 4 shows a schematic diagram of an application search server, according to one embodiment of the invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Hereinafter, app denotes an application, query denotes a search term, tag denotes a label, and session denotes a query session.
The invention provides a new user intent identification method for application search engines. A tag method is used to flexibly and effectively express the user's fine-grained query intent, and a tag system of user intent is constructed based on unsupervised machine learning, abandoning the traditional user intent classification method. A set of automated user intent mining processes is realized that can generate a user intent tag list with high accuracy and recall, and the user's queries and apps are mapped into a common tag system, which simultaneously solves the user intent identification and relevance calculation problems of the application search engine and achieves very good results.
FIG. 1 shows a flow diagram of a method for identifying application search intent, according to one embodiment of the invention. As shown in fig. 1, the method includes:
step S110, obtaining search terms in each query session from a query session log of an application search engine;
step S120, mining a label system of each search term according to the search term in each query session and a preset strategy;
step S130, identifying the application search intention corresponding to each search word according to the label system of the search word.
Traditional user intent identification is a classification method designed for web pages and is not suitable for app application scenarios: each application belongs to a fixed application field and provides people with a specific function, so mining the user's fine-grained functional need with tags is appropriate, whereas a classification-based method is too coarse-grained to fit. This scheme provides a user intent identification method matched to the application tagging system, the tag method, which is flexible and effective. It maps the user intent and the applications into the same tag system, which solves both the user intent identification problem and the relevance calculation problem of the application search engine, and is the basis for realizing the core technology of application search engines, functional search.
In general, a user's search word is a short text; the features that can be constructed from it are sparse and cannot comprehensively describe the user's need. However, if a user is looking for an app with a single functional scenario within a short period of time, the query search words are usually rewritten around that single need, and there is usually a strong semantic relationship between the issued queries. This is an important characteristic of an application search engine.
In a search engine service, the system automatically records the relevant information of each user search and stores it in a query log. For example, a user opens a Baidu search page, inputs search words such as "game", "game software", "funny game" and "game application download" in sequence, and enters the result pages; after entering a certain result page, the user may continue to input search words and perform further searches until the search event is complete and the whole Baidu search page is closed. The entire process is called a query session.
In an embodiment of the present invention, the step S120 of mining a label system of each search term according to the search term in each query session and a preset policy includes: obtaining a training corpus set according to search terms in each query session; inputting the training corpus set into an LDA model for training to obtain a search word-subject probability distribution result and a subject-keyword probability distribution result output by the LDA model; and calculating to obtain a label system of each search word according to the search word-theme probability distribution result and the theme-keyword probability distribution result.
In the process of obtaining the training corpus, the technical difficulty is to expand each short query text into a long text so that one query can be regarded as a document; this is the key to effectively utilizing the LDA topic model and generating intent tags with high accuracy and high recall. Intent tags are divided into categorical tags and functional tags: a categorical tag reflects the application field of the user's need, and a functional tag reflects the user's specific need.
Wherein the obtaining of the corpus set according to the search terms in each query session includes:
obtaining the original corpus of each search term according to the search terms in each query session; the original linguistic data of each search word form an original linguistic data set; and preprocessing the original corpus set to obtain a training corpus set. Specifically, the obtaining the original corpus of the search terms according to the search terms in each query session includes: obtaining a search word sequence set corresponding to a plurality of query sessions according to search words in each query session; and obtaining a set of search terms corresponding to the plurality of query sessions.
The query search word sequence within a query session is kept, with each search word treated as a whole. If the user downloads an app under a certain query, the app name is spliced immediately after that query in the sequence. For example: a user session sequence is query1, query2, query3, and the user downloads app1 after entering query2, so app1 is spliced after query2 and before query3, giving query1, query2, app1, query3. Each session sequence is one line and is output to a file session_query-app_list, and all queries are output to another file query_all.
Training the search word sequence set to obtain an N-dimensional search word vector file; for each search word in the search word set, calculating the association degree between the search word and other search words according to the N-dimensional search word vector file; and taking other search terms of which the association degrees with the search terms meet preset conditions as the original linguistic data of the search terms.
In an embodiment of the present invention, the obtaining the search word sequence sets corresponding to the plurality of query sessions includes: for each query session, arranging the search terms in the query session into a sequence in sequence; if one search term in the sequence corresponds to an application download operation, inserting the name of the downloaded application into the rear adjacent position of the corresponding search term in the sequence; obtaining a search word sequence corresponding to the query session; the obtaining a set of search terms corresponding to a plurality of query sessions comprises: and taking the set of search terms in the plurality of query sessions as the set of search terms corresponding to the plurality of query sessions.
For example, a user enters "search term 1", "search term 2", and "search term 3" in sequence in a query session, and the user downloads an app1 after entering "search term 2". Therefore, the search word sequence corresponding to the query session is: search term 1, search term 2, app1, search term 3. The search word sequence corresponding to each query session is a row, and the search word sequences corresponding to the plurality of query sessions are collected into a plurality of rows.
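A minimal sketch of assembling such a sequence (the helper and variable names below are illustrative only and not taken from the patent):

    # Illustrative sketch: build the search word sequence of one query session,
    # splicing a downloaded app's name right after the query that led to it.
    def build_session_sequence(session_events):
        """session_events: list of (search_term, downloaded_app_or_None) in time order."""
        sequence = []
        for search_term, downloaded_app in session_events:
            sequence.append(search_term)
            if downloaded_app is not None:
                # insert the app name immediately after the corresponding search term
                sequence.append(downloaded_app)
        return sequence

    # Example from the text: three search terms, with app1 downloaded after the second one.
    events = [("search term 1", None), ("search term 2", "app1"), ("search term 3", None)]
    print(build_session_sequence(events))
    # -> ['search term 1', 'search term 2', 'app1', 'search term 3']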
The training of the search word sequence set to obtain the N-dimensional search word vector file includes: taking each search word in the search word sequence set as a word, and training the search word sequence set with the deep learning toolkit word2vec to generate an N-dimensional search word vector file. For example, training with word2vec generates 300-dimensional query vectors and produces a query vector file query_w2v_300.dict, i.e., the search word vector file.
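A minimal sketch of this training step, assuming the gensim implementation of word2vec (the patent only names the word2vec toolkit; the library, the parameters other than the 300 dimensions, and the output call are assumptions):

    # Each whole search word (query) is one token; one session sequence is one "sentence".
    from gensim.models import Word2Vec

    session_sequences = [
        ["search term 1", "search term 2", "app1", "search term 3"],
        # ... one list per query session, read from session_query-app_list
    ]

    model = Word2Vec(sentences=session_sequences, vector_size=300, window=5,
                     min_count=1, sg=1, workers=4)
    # the N-dimensional search word vector file
    model.wv.save_word2vec_format("query_w2v_300.dict")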
In fact, when searching for a desired application, a user may input search words in various forms: a noun (e.g., "game"), a phrase (e.g., "funny game"), or a sentence (e.g., "I want to download a funny game").
In an embodiment of the present invention, the search term vector file obtained in the foregoing is used as a basis for calculating a term vector for each search term in the search term set, and for each search term in the search term set, the degree of association between the search term and each other search term is calculated according to the N-dimensional search term vector file; taking other search terms, the association degrees of which accord with preset conditions, as the original corpus of the search terms, specifically including:
calculating the search word set and the N-dimensional search word vector file by using a KNN algorithm, and calculating the distance between every two search words in the search word set according to the N-dimensional search word vector file; and for each search word in the search word set, sorting the search words according to the distance from the search word from large to small, and selecting the search words with the first preset threshold as the original corpus of the search word.
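A minimal sketch of this nearest-neighbor expansion, using scikit-learn's NearestNeighbors as a stand-in for the KNN computation described above (the library choice and parameter values are assumptions):

    # Illustrative sketch: for every search word, take the k most closely related search
    # words in the 300-dimensional vector space as that search word's original corpus.
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def nearest_query_neighbors(query_vectors, k=10):
        """query_vectors: dict mapping search word -> 300-dim numpy vector."""
        queries = list(query_vectors)
        matrix = np.stack([query_vectors[q] for q in queries])
        knn = NearestNeighbors(n_neighbors=k + 1, metric="euclidean").fit(matrix)
        distances, indices = knn.kneighbors(matrix)
        corpus = {}
        for row, query in enumerate(queries):
            # skip index 0, which is the query itself (distance 0)
            corpus[query] = [(queries[j], d)
                             for j, d in zip(indices[row][1:], distances[row][1:])]
        return corpus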
Table 1 shows the top 10 nearest neighbors of the search term "Sogou" in one embodiment of the present invention. The neighbors include both search terms and app names, such as "Sogou mobile phone input method" and "Sogou input method" in the first column of Table 1; the first preset threshold in this example is 10. The remaining columns of Table 1 give statistical indices based on the Euclidean distance between the corresponding nearest neighbor and the search term "Sogou".
TABLE 1
Nearest neighbor                        Statistical index (based on Euclidean distance)
Sogou mobile phone input method         38      303.827   0.838104
Sogou input method                      26      323.494   0.845153
Sogou                                   20332   372.525   0.778589
Dog collecting device                   6986    385.809   0.76965
Sogou Pinyin                            14577   410.986   0.753037
Sogou input method (Xiaomi edition)     4042    423.929   0.746941
Sogou Pinyin input method               4927    435.273   0.736172
Sohu input method                       18233   452.955   0.724872
Sogou input                             10274   455.505   0.720034
Mobile phone Sogou input method         3075    476.93    0.721099
Table 2 shows the top 10 nearest neighbors of the search term "lottery drawing inquiry" in an embodiment of the present invention; the meanings of the columns are the same as in Table 1 and are not repeated here.

TABLE 2 (rendered as an image in the original publication and not reproduced here)
In an embodiment of the present invention, after obtaining an original corpus set corresponding to each search term, the preprocessing the original corpus set includes:
in the original corpus set, performing word segmentation processing on each original corpus to obtain a word segmentation result containing a plurality of terms; searching phrases formed by adjacent terms in the word segmentation result; and reserving the phrases, the lexical items belonging to nouns and the lexical items belonging to verbs in the word segmentation result as the corresponding reserved keywords of the original corpus.
For example, if a user inputs a search word "download game", the term of the search word belonging to the noun is "game", and the term of the search word belonging to the verb is "download".
Wherein the searching for phrases composed of adjacent terms in the word segmentation result comprises:
calculating the cPMId value of every two adjacent terms in the word segmentation result, and determining that two adjacent terms form a phrase when their cPMId value is larger than a second preset threshold.
Equation 1 shows how the cPMId value is calculated, where D(x, y) denotes the co-occurrence frequency of the two terms x and y, D(x) the occurrence frequency of term x, D(y) the occurrence frequency of term y, D the total number of apps, and δ = 0.7.
Equation 1 (rendered as an image in the original publication and not reproduced here)
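Since the formula itself is only available as an image, the following is a plausible, non-authoritative reconstruction under the assumption that cPMId here denotes the corrected pointwise-mutual-information measure commonly defined from exactly the quantities listed above, D(x, y), D(x), D(y), D and the parameter δ:

    \mathrm{cPMId}(x, y) = \log \frac{D(x, y)}{\dfrac{D(x)\,D(y)}{D} + \sqrt{D(x)}\,\sqrt{\dfrac{\ln \delta}{-2}}}

Larger values indicate that the two adjacent terms co-occur far more often than chance, so pairs above the threshold are merged into phrases.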
For example, the term pairs are sorted in descending order of their cPMId values, the combinations whose cPMId exceeds the threshold value 5 are selected as phrases, the phrases are merged with the verbs and nouns just retained, and a new file query_corp_seg_nouns_verbs_phrase is generated.
Further, in an embodiment of the present invention, the preprocessing the original corpus set further includes: using the key words correspondingly reserved in the original material of each search word as the first-stage training corpus of the search word; the first-stage training corpus of each search word forms a first-stage training corpus set; and carrying out data cleaning on the keywords in the first-stage corpus set.
Specifically, the data cleaning of the keywords in the first-stage corpus set includes: in the first-stage training corpus set, for the first-stage training corpus of each search word, calculating a TF-IDF value of each keyword in the first-stage training corpus; deleting the key words with the TF-IDF value higher than a third preset threshold value and/or lower than a fourth preset threshold value to obtain a training corpus of the search word; the corpus of each search term constitutes a corpus set.
This step mines the non-tag words in the first-stage corpus set for data cleansing. A term appearing with very high or very low frequency is unlikely to be a tag. Using the tf-idf statistic, the tf-idf weight of each term and phrase is computed over the first-stage training corpus set, and terms or phrases above or below certain thresholds are treated as non-tag words; the thresholds depend on the specific corpus, so concrete values are not listed here. The non-tag words form a blacklist black_tag.list, the non-tag words in the first-stage training corpus set are filtered out, and a new training corpus set is generated in the format: search term_id \t term 1 term 2 … term n.
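A minimal sketch of this cleaning step, assuming scikit-learn's TfidfVectorizer and illustrative threshold values (the patent does not give concrete thresholds):

    # Illustrative sketch: compute tf-idf weights over the first-stage corpus and drop
    # keywords whose weight is too high or too low (the non-tag words).
    from sklearn.feature_extraction.text import TfidfVectorizer

    def clean_corpus(first_stage_corpus, low=0.05, high=0.9):
        """first_stage_corpus: dict mapping search word -> list of kept keywords."""
        queries = list(first_stage_corpus)
        docs = [" ".join(first_stage_corpus[q]) for q in queries]
        vectorizer = TfidfVectorizer(analyzer=lambda text: text.split())
        tfidf = vectorizer.fit_transform(docs)
        vocab = vectorizer.get_feature_names_out()
        cleaned = {}
        for row, query in enumerate(queries):
            weights = dict(zip(vocab, tfidf[row].toarray().ravel()))
            cleaned[query] = [w for w in first_stage_corpus[query]
                              if low <= weights.get(w, 0.0) <= high]  # drop non-tag words
        return cleaned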
Table 3 shows some of the non-tag words discarded during data cleansing; they occur with either too high or too low a frequency and are not meaningful for user search.

TABLE 3 (rendered as an image in the original publication and not reproduced here)
After the training corpus set is obtained, the GibbsLDA++ implementation of the LDA model is selected. The GibbsLDA++ source code is modified so that identical terms in the query corpus are initialized to the same topic; in the original code, each term is randomly initialized to a topic, so the same repeated term can be initialized to several topics. For example, LDA training selects 120 topics and iterates for 300 rounds, outputting two pieces of data: the topic-term probability distribution and the document-topic probability distribution.
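As an illustrative analogue only (the patent uses a modified GibbsLDA++; gensim's LdaModel is substituted here and does not reproduce the same-topic initialization change described above):

    # Illustrative analogue of the LDA training step over the cleaned corpus.
    from gensim import corpora
    from gensim.models import LdaModel

    def train_lda(training_corpus, num_topics=120, passes=300):
        """training_corpus: dict mapping search word -> list of cleaned keywords."""
        queries = list(training_corpus)
        texts = [training_corpus[q] for q in queries]
        dictionary = corpora.Dictionary(texts)
        bow = [dictionary.doc2bow(text) for text in texts]
        # passes=300 mirrors the 300 training rounds mentioned above
        lda = LdaModel(corpus=bow, id2word=dictionary, num_topics=num_topics, passes=passes)
        # document-topic distribution (one list of (topic, prob) per query document)
        doc_topics = [lda.get_document_topics(doc, minimum_probability=0.0) for doc in bow]
        # topic-keyword probability matrix, shape (num_topics, vocabulary size)
        topic_terms = lda.get_topics()
        return queries, dictionary, doc_topics, topic_terms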
According to the scheme, a label system of each search word is calculated according to the search word-topic probability distribution result and the topic-keyword probability distribution result, and the method comprises the following steps:
calculating to obtain a search word-keyword probability distribution result according to the search word-topic probability distribution result and the topic-keyword probability distribution result; and according to the search word-keyword probability distribution result, for each search word, sorting the keywords according to the probability of the search word from large to small, and selecting the keywords with the number of the top fifth preset threshold value.
Wherein, the calculating the search word-keyword probability distribution result according to the search word-topic probability distribution result and the topic-keyword probability distribution result comprises: for each search word, obtaining the probability of each topic about the search word according to the search word-topic probability distribution result; for each topic, obtaining the probability of each keyword about the topic according to the topic-keyword probability distribution result; for each keyword, taking the product of the probability of the keyword about a subject and the probability of the subject about a search word as the probability of the keyword about the search word based on the subject; and taking the probability of the keyword about the search word as the probability of the keyword based on the sum of the probabilities of the topics about the search word.
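Written out, the computation described above is a standard marginalization over topics (notation chosen here for illustration: q a search word, t a topic, w a keyword):

    P(w \mid q) = \sum_{t} P(w \mid t)\, P(t \mid q)

The per-topic products are the topic-based probabilities of the keyword for the search word, and their sum over all topics is the keyword's overall probability for the search word.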
This step is the initial LDA tag generation process, which yields the tags generated by LDA. LDA outputs a topic probability distribution for each query and a term probability distribution for each topic. To obtain the tags of each query, the topic distributions and term distributions are sorted in descending order of probability, the top 50 topics under each query and the top 120 terms under each topic are selected, and the term probabilities are weighted by the topic probabilities and sorted, so that each tag term carries an LDA weight representing its importance under the query. The tag list generated by LDA is obtained by sorting the tags in descending order of this weight; at this stage the list still contains much noise and the ordering of the tags is inaccurate.
Further, to fine-tune the prediction result of the LDA model so that the order of the important tag of each query is advanced, in an embodiment of the present invention, the calculating the label system of each search term according to the search term-topic probability distribution result and the topic-keyword probability distribution result further includes: taking the keywords with the number of the first fifth preset threshold value correspondingly selected by each search word as a first-stage label system of the search word; for the first-stage label system of each search word, calculating a semantic relation value between each keyword in the first-stage label system of the search word and the search word; for each keyword, taking the product of the semantic relation value corresponding to the keyword and the probability of the keyword relative to the search word as the corrected probability of the keyword relative to the search word; and sorting all the keywords in the first-stage label system of the search word according to the correction probability of the search word from large to small, and selecting the first sixth preset threshold number of keywords to form the label system of the search word.
Calculating a semantic relation value between each keyword in a first-stage label system of the search term and the search term comprises the following steps: obtaining a search word sequence set corresponding to a plurality of query sessions according to search words in each query session; training the search word sequence set to obtain an N-dimensional keyword vector file; calculating word vectors of the keywords according to the N-dimensional keyword vector files, and calculating the word vectors of each term in the search words; calculating cosine similarity between the word vector of the keyword and the word vector of each term, and taking the cosine similarity as a semantic relation value of the keyword and the corresponding term; and taking the sum of the semantic relation values of the keyword and each term as the semantic relation value between the keyword and the search word.
For example, the semantic relationship between each tag word and the query is calculated using the trained word vector file term_w2v_300.dict, as follows: the cosine similarity between the tag word's vector and the word vector of each word in the query is calculated and the similarities are accumulated; the larger the value, the more important the tag. Each tag is then weighted by its LDA weight and the tags are re-sorted in descending order.
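A minimal sketch of this re-weighting step (the data structures below are assumptions):

    # Illustrative sketch: re-weight each LDA-generated tag of a query by the summed cosine
    # similarity between the tag's word vector and the word vectors of the query's terms.
    import numpy as np

    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def reweight_tags(query_terms, tag_probs, term_vectors):
        """query_terms: segmented terms of the query; tag_probs: dict tag -> P(tag | query);
        term_vectors: dict word -> 300-dim vector loaded from term_w2v_300.dict."""
        corrected = {}
        for tag, prob in tag_probs.items():
            semantic = sum(cosine(term_vectors[tag], term_vectors[t])
                           for t in query_terms
                           if t in term_vectors and tag in term_vectors)
            corrected[tag] = semantic * prob  # corrected probability of the tag for this query
        return dict(sorted(corrected.items(), key=lambda kv: kv[1], reverse=True))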
Specifically, the training the search word sequence set to obtain an N-dimensional keyword vector file includes: and performing word segmentation processing on the search word sequence set, and training the search word sequence set subjected to word segmentation processing by using a deep learning tool package word2vec to generate an N-dimensional keyword vector file.
For example, the search word sequence set is segmented into Chinese words and trained with the deep learning toolkit word2vec to generate 300-dimensional word vectors, producing another vector file term_w2v_300.dict, i.e., the keyword vector file.
Still further, in an embodiment of the present invention, the tag system for obtaining each search term by calculation according to the search term-topic probability distribution result and the topic-keyword probability distribution result further includes: taking the first sixth preset threshold number of keywords correspondingly selected by each search word as a second stage label system of the search word; for the second-stage label system of each search word, counting the TF-IDF value of each keyword in the second-stage label system of the search word in the training corpus of the search word; for each keyword, taking the product of the probability of the keyword about the search word and the TF-IDF value as the secondary correction probability of the keyword about the search word; and sequencing all the keywords in the second stage label system of the search word from large to small according to the secondary correction probability of the search word, and selecting the first K keywords to form the label system of the search word.
For example, each tag is weighted by its TF-IDF value in the expanded corpus of the query, the weights are normalized, and the tag order is rearranged accordingly.
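A sketch of this secondary correction, assuming the TF-IDF values of the query's expanded corpus are available as a dict; the normalization shown mirrors the description above, but the exact scheme is an assumption.

```python
def rerank_second_stage_tags(stage2_tags, tfidf_of_query_corpus, k):
    """stage2_tags: list of (keyword, prob) pairs after the first correction.
    tfidf_of_query_corpus: dict mapping keyword -> TF-IDF value in the query's
    expanded training corpus. Returns the final top-k tag list."""
    scored = []
    for keyword, prob in stage2_tags:
        tfidf = tfidf_of_query_corpus.get(keyword, 0.0)
        scored.append((keyword, prob * tfidf))          # secondary corrected probability
    # normalize so the weights of one query sum to 1, then sort descending
    total = sum(w for _, w in scored) or 1.0
    scored = [(kw, w / total) for kw, w in scored]
    scored.sort(key=lambda kv: kv[1], reverse=True)
    return scored[:k]
```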
After these two corrections, the accuracy of the tag sequence in expressing the query intention is greatly improved.
In an embodiment of the present invention, selecting the first K keywords to form the label system of the search term includes: acquiring the number of queries for the search term within a preset time period from the query session log of the application search engine; and selecting the first K keywords to form the label system of the search term according to the number of queries, wherein the value of K is a broken-line function of the number of queries corresponding to the search term.
This step determines the number of tags for each query: the top k tag words are kept, where k is a broken-line function of the query's search frequency, so each query keeps between 2 and 5 tags. The accuracy is 88% and the recall rate is 75%. At this step a query intent dictionary, query_intent_tag, is generated.
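A sketch of such a broken-line function; the patent only states that k ranges from 2 to 5 with query frequency, so the breakpoints below are purely illustrative.

```python
def tags_to_keep(query_count):
    """Broken-line (piecewise) function mapping a query's search frequency to the
    number of tags K to keep. The breakpoints are illustrative assumptions; the text
    only states that each query keeps between 2 and 5 tags."""
    if query_count < 100:
        return 2
    elif query_count < 1000:
        return 3
    elif query_count < 10000:
        return 4
    return 5
```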
Further, in a specific example, the scheme labels about 2.6 million queries with tag words expressing user intent, each query being treated as a whole. When a user reformulates or rewrites a query, the new query may not be in the query intent dictionary; in that case the semantic similarity between the new query and the queries in the dictionary is calculated, and the intent tags of the semantically similar queries are assigned to the new query. The calculation method is as follows: the word vectors of the terms in the new query are accumulated to form the new query vector, the Euclidean distance to the query vectors of the query intent dictionary is calculated, and the 3 nearest queries are selected, using a KdTree to reduce the computational complexity; the Euclidean distances are smoothed with a Gaussian kernel and used as the weights of the tag words, the intent tag words of the 3 neighboring queries are combined to generate the intent tag words of the new query, and the first 3 tags are kept to satisfy the user's search intent, with an accuracy of 80%.
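A sketch of this nearest-neighbour tag transfer, assuming the word vectors and tokenizer from the earlier sketches and a query_intent_tag dict mapping each known query to a list of (tag, weight) pairs; the Gaussian bandwidth is an assumption.

```python
from collections import defaultdict
import numpy as np
from sklearn.neighbors import KDTree

def query_vector(query):
    """Sum the word vectors of the query's terms to form the query vector."""
    vecs = [word_vectors[t] for t in jieba.lcut(query) if t in word_vectors]
    return np.sum(vecs, axis=0) if vecs else np.zeros(word_vectors.vector_size)

# query_intent_tag: dict mapping each known query to its list of (tag, weight) pairs.
known_queries = list(query_intent_tag.keys())
tree = KDTree(np.vstack([query_vector(q) for q in known_queries]))  # Euclidean KD-tree

def tags_for_new_query(new_query, bandwidth=1.0, keep=3):
    dist, idx = tree.query(query_vector(new_query).reshape(1, -1), k=3)
    weights = np.exp(-dist[0] ** 2 / (2 * bandwidth ** 2))           # Gaussian kernel smoothing
    merged = defaultdict(float)
    for w, i in zip(weights, idx[0]):
        for tag, tag_weight in query_intent_tag[known_queries[i]]:
            merged[tag] += w * tag_weight
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)[:keep]
```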
FIG. 2 shows a flowchart of an application search method according to an embodiment of the present invention, the method comprising:
step 210, a search term tag database is constructed, wherein the search term tag database comprises a tag system of a plurality of search terms.
Step 220, receiving the current search word uploaded by the client, and acquiring the label system of the current search word according to the search word label database.
Step 230, calculating the degree of association between the label system of the current search term and the label system of each application.
Step 240, when the degree of association between the label system of the current search word and the label system of an application meets a preset condition, returning the relevant information of the application to the client for display; a sketch of this matching flow is given below.
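A sketch of steps 220-240, assuming tag systems are represented as tag-to-weight dicts and reusing the tags_for_new_query sketch above; the association measure (a weighted overlap) and the threshold are assumptions, since the text does not fix them.

```python
def association_degree(query_tags, app_tags):
    """Weighted overlap between the tag system of the query and that of an app.
    Both arguments are dicts mapping tag -> weight. The exact association measure
    is not fixed by the text; a dot product over shared tags is used here."""
    return sum(w * app_tags.get(tag, 0.0) for tag, w in query_tags.items())

def search_apps(current_query, app_tag_index, threshold=0.1):
    """app_tag_index: dict mapping app name -> its tag dict (assumed layout)."""
    query_tags = dict(tags_for_new_query(current_query))   # tag system of the query
    hits = [(app, association_degree(query_tags, tags))
            for app, tags in app_tag_index.items()]
    # return apps whose association degree meets the preset condition, best first
    return sorted([h for h in hits if h[1] >= threshold],
                  key=lambda kv: kv[1], reverse=True)
```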
In the process of constructing the search term tag database, the mining of the tag system of the search term in step S210 is the same as the mining process of the tag system of the search term shown in any embodiment of the method shown in fig. 1.
In an embodiment of the present invention, obtaining the tag system of the current search term according to the search term tag database includes: calculating the semantic similarity between the current search word and each search word in the search word tag database, sorting the search words in descending order of semantic similarity, and selecting the first first-preset-threshold number of search words; and obtaining the label system of the current search word according to the label system of each selected search word.
In one embodiment of the present invention, calculating the semantic similarity between the current search word and each search word in the search word tag database includes: calculating the Euclidean distance between the current search word and each search word in the search word tag database, and taking the Euclidean distance between each search word and the current search word as the semantic similarity corresponding to that search word. Obtaining the label system of the current search term according to the label system of each selected search term includes: taking the semantic similarity corresponding to each search word as the weight of each label in the label system of that search word; for the labels corresponding to the label systems of the search terms, adding the weights of identical labels to obtain the final weight of each label; and sorting in descending order of final weight, and selecting the first second-preset-threshold number of labels to form the label system of the current search term.
Table 4 shows the intent tag words of some search terms in the 360 Mobile Assistant application search.
TABLE 4
[Table 4 is reproduced as an image in the original publication.]
Fig. 3 shows an apparatus for identifying an application search intention according to an embodiment of the present invention, where the apparatus 300 for identifying an application search intention includes:
an obtaining unit 310, adapted to obtain a search term in each query session from a query session log of an application search engine;
the mining unit 320 is adapted to mine a label system of each search term according to the search term in each query session and a preset strategy;
the identifying unit 330 is adapted to identify an application search intention corresponding to each search term according to the label system of the search term.
In an embodiment of the present invention, the mining unit 320 is adapted to obtain a corpus set according to search terms in each query session; inputting the training corpus set into an LDA model for training to obtain a search word-subject probability distribution result and a subject-keyword probability distribution result output by the LDA model; and calculating to obtain a label system of each search word according to the search word-theme probability distribution result and the theme-keyword probability distribution result.
In an embodiment of the present invention, the mining unit 320 is adapted to obtain an original corpus of each search term according to the search term in each query session; the original linguistic data of each search word form an original linguistic data set; and preprocessing the original corpus set to obtain a training corpus set.
Specifically, in an embodiment of the present invention, the mining unit 320 is adapted to obtain a search word sequence set corresponding to a plurality of query sessions according to search words in each query session; obtaining a search term set corresponding to a plurality of query sessions; training the search word sequence set to obtain an N-dimensional search word vector file; for each search word in the search word set, calculating the association degree between the search word and other search words according to the N-dimensional search word vector file; and taking other search terms of which the association degrees with the search terms meet preset conditions as the original linguistic data of the search terms.
That is, the mining unit 320 is adapted to, for each query session, arrange the search terms in the query session into a sequence in order; if a search term in the sequence corresponds to an application download operation, insert the name of the downloaded application into the position immediately after the corresponding search term in the sequence, obtaining the search word sequence corresponding to the query session; and take the set of search terms in the plurality of query sessions as the search term set corresponding to the plurality of query sessions.
For example, the mining unit 320 is adapted to train the search word sequence set using a deep learning tool package word2vec to generate an N-dimensional search word vector file, with each search word in the search word sequence set as a word.
On this basis, in an embodiment of the present invention, the mining unit 320 is adapted to operate on the search term set and the N-dimensional search term vector file by using a KNN algorithm, and calculate the distance between every two search terms in the search term set according to the N-dimensional search term vector file; and, for each search word in the search word set, sort the search words in descending order of distance from the search word, and select the first first-preset-threshold number of search words as the original corpus of the search word.
In the preprocessing process, in an embodiment of the present invention, the mining unit 320 is adapted to perform word segmentation on each original corpus in the original corpus set to obtain a word segmentation result including a plurality of terms; searching phrases formed by adjacent terms in the word segmentation result; and reserving the phrases, the lexical items belonging to nouns and the lexical items belonging to verbs in the word segmentation result as the corresponding reserved keywords of the original corpus.
Specifically, the mining unit 320 is adapted to calculate the cPMId value of every two adjacent terms in the word segmentation result, and determine that two adjacent terms form a phrase when their cPMId value is greater than a second preset threshold.
Further, in an embodiment of the present invention, the mining unit 320 is further adapted to use the retained keywords corresponding to the original corpus of each search term as the first-stage training corpus of the search term; the first-stage training corpus of each search word forms a first-stage training corpus set; and perform data cleaning on the keywords in the first-stage training corpus set.
Specifically, in an embodiment of the present invention, the mining unit 320 is adapted to, in the first-stage training corpus set, for the first-stage training corpus of each search word, calculate the TF-IDF value of each keyword in that first-stage training corpus; delete the keywords whose TF-IDF value is higher than a third preset threshold and/or lower than a fourth preset threshold to obtain the training corpus of the search word; the training corpus of each search term constitutes the training corpus set.
In an embodiment of the present invention, the mining unit 320 is adapted to calculate a search term-keyword probability distribution result according to the search term-topic probability distribution result and the topic-keyword probability distribution result; and according to the search word-keyword probability distribution result, for each search word, sorting the keywords according to the probability of the search word from large to small, and selecting the keywords with the number of the top fifth preset threshold value.
In an embodiment of the present invention, the mining unit 320 is adapted to, for each search word, obtain a probability of each topic about the search word according to the search word-topic probability distribution result; for each topic, obtaining the probability of each keyword about the topic according to the topic-keyword probability distribution result; for each keyword, taking the product of the probability of the keyword about a subject and the probability of the subject about a search word as the probability of the keyword about the search word based on the subject; and taking the probability of the keyword about the search word as the probability of the keyword based on the sum of the probabilities of the topics about the search word.
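The marginalization described here can be written compactly as p(keyword | query) = sum over topics of p(keyword | topic) * p(topic | query); a sketch follows, assuming the two LDA outputs are stored as nested dicts (the layout is an assumption).

```python
def keyword_probability(keyword, query, p_topic_given_query, p_keyword_given_topic):
    """p(keyword | query) = sum_t p(keyword | topic t) * p(topic t | query).
    p_topic_given_query[query] maps topic -> probability;
    p_keyword_given_topic[topic] maps keyword -> probability."""
    return sum(p_keyword_given_topic[t].get(keyword, 0.0) * p_t
               for t, p_t in p_topic_given_query[query].items())
```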
Further, in an embodiment of the present invention, the mining unit 320 is further adapted to use the first fifth-preset-threshold number of keywords selected for each search term as the first-stage tag system of the search term; for the first-stage label system of each search word, calculate the semantic relation value between each keyword in the first-stage label system of the search word and the search word; for each keyword, take the product of the semantic relation value corresponding to the keyword and the probability of the keyword with respect to the search word as the corrected probability of the keyword with respect to the search word; and sort all the keywords in the first-stage label system of the search word in descending order of their corrected probability with respect to the search word, and select the first sixth-preset-threshold number of keywords to form the label system of the search word.
In an embodiment of the present invention, the mining unit 320 is adapted to obtain a set of search word sequences corresponding to a plurality of query sessions according to search words in each query session; training the search word sequence set to obtain an N-dimensional keyword vector file; calculating word vectors of the keywords according to the N-dimensional keyword vector files, and calculating the word vectors of each term in the search words; calculating cosine similarity between the word vector of the keyword and the word vector of each term, and taking the cosine similarity as a semantic relation value of the keyword and the corresponding term; and taking the sum of the semantic relation values of the keyword and each term as the semantic relation value between the keyword and the search word.
In an embodiment of the present invention, the mining unit 320 is adapted to perform word segmentation on the search word sequence set, and train the search word sequence set after word segmentation by using a deep learning tool package word2vec to generate an N-dimensional keyword vector file.
Further, in an embodiment of the present invention, the mining unit 320 is further adapted to use the first sixth preset threshold number of keywords, which are selected corresponding to each search word, as a second-stage tag system of the search word; for the second-stage label system of each search word, counting the TF-IDF value of each keyword in the second-stage label system of the search word in the training corpus of the search word; for each keyword, taking the product of the probability of the keyword about the search word and the TF-IDF value as the secondary correction probability of the keyword about the search word; and sequencing all the keywords in the second stage label system of the search word from large to small according to the secondary correction probability of the search word, and selecting the first K keywords to form the label system of the search word.
In an embodiment of the present invention, the mining unit 320 is adapted to obtain, from a query session log of an application search engine, a number of queries about the search term in a preset time period; selecting the first K key words to form a label system of the search word according to the query times; wherein the value of K is used as a broken line function of the query times corresponding to the search term.
Fig. 4 shows an application search server according to an embodiment of the present invention, the application search server 400 including:
a database construction unit 410 adapted to construct a search term tag database including a tag system of a plurality of search terms;
an interaction unit 420 adapted to receive a current search term uploaded by a client;
the search processing unit 430 is adapted to obtain a tag system of a current search term according to the search term tag database; calculating the degree of association between the label system of the current search term and the label system of each application;
the interaction unit 420 is further adapted to return the relevant information of an application to the client for displaying when the degree of association between the tag system of the current search term and the tag system of the application meets a preset condition;
the scheme of mining the tag system of the search term in the process of building the search term tag database by the database building unit 410 is the same as the scheme of mining the tag system of the search term by the identification device 300 applying the search intention according to any one of the above embodiments of the present invention.
In an embodiment of the present invention, the search processing unit 430 is adapted to calculate semantic similarities between the current search word and the search words in the search word tag database, sort the search words according to the semantic similarities from large to small, and select a first preset threshold number of search words; and obtaining the label system of the current search word according to the label system of each selected search word.
In one embodiment of the present invention, the search processing unit 430 is adapted to calculate the euclidean distance between the current search word and each search word in the search word tag database, and use the euclidean distance between each search word and the current search word as the semantic similarity corresponding to the search word; the semantic similarity corresponding to each search word is used as the weight of each label in the label system of the search word; adding the weights of the same labels for the labels corresponding to the label system of each search term to obtain the final weight of each label; and sorting according to the final weight from large to small, and selecting the labels with the first second preset threshold value to form a label system of the current search term.
It should be noted that the embodiments of the apparatus shown in fig. 3-4 are the same as the embodiments of the method shown in fig. 1-2, and the detailed description is given above and will not be repeated herein.
In summary, with the identification method and apparatus for application search intent, the application search method and the server provided by the invention, a tag-based user intent identification method matched with the app tag system is proposed, which flexibly expresses the user's fine-grained query intent. A tag system of user intent is constructed based on unsupervised machine learning; the traditional user intent classification approach is abandoned, and an automated user intent mining pipeline is realized that can generate a user intent tag list with high accuracy and recall. User intent and apps are mapped into the same tag system, which solves both the problem of user intent identification and the relevance computation problem of an application search engine, and lays a foundation for the core function-search technology in an application search engine.
It should be noted that:
the algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose devices may be used with the teachings herein. The required structure for constructing such a device will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. It will be appreciated by those skilled in the art that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the application search intent recognition apparatus and application search server according to embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.

Claims (36)

1. A method for identifying application search intent, comprising:
obtaining the search terms in each query session from a query session log of an application search engine;
mining a tag system of each search term according to the search terms in each query session and a preset strategy;
identifying the application search intent corresponding to each search term according to the tag system of the search term;
wherein mining the tag system of each search term according to the search terms in each query session and the preset strategy comprises:
obtaining a training corpus set according to the search terms in each query session;
inputting the training corpus set into an LDA model for training, and obtaining a search term-topic probability distribution result and a topic-keyword probability distribution result output by the LDA model;
calculating the tag system of each search term according to the search term-topic probability distribution result and the topic-keyword probability distribution result;
wherein obtaining the training corpus set according to the search terms in each query session comprises:
obtaining an original corpus of each search term according to the search terms in each query session;
the original corpora of the search terms constituting an original corpus set; and preprocessing the original corpus set to obtain the training corpus set;
wherein obtaining the original corpus of each search term according to the search terms in each query session comprises:
obtaining a search word sequence set corresponding to a plurality of query sessions according to the search terms in each query session; and obtaining a search term set corresponding to the plurality of query sessions;
training the search word sequence set to obtain an N-dimensional search word vector file;
for each search term in the search term set, calculating the degree of association between the search term and the other search terms according to the N-dimensional search word vector file; and taking the other search terms whose degree of association with the search term meets a preset condition as the original corpus of the search term.

2. The method of claim 1, wherein obtaining the search word sequence set corresponding to the plurality of query sessions comprises:
for each query session, arranging the search terms in the query session into a sequence in order; if a search term in the sequence corresponds to an application download operation, inserting the name of the downloaded application into the position immediately after the corresponding search term in the sequence; and obtaining the search word sequence corresponding to the query session;
and wherein obtaining the search term set corresponding to the plurality of query sessions comprises: taking the set of search terms in the plurality of query sessions as the search term set corresponding to the plurality of query sessions.

3. The method of claim 1, wherein training the search word sequence set to obtain the N-dimensional search word vector file comprises:
taking each search word in the search word sequence set as a word, and training the search word sequence set with the deep learning toolkit word2vec to generate the N-dimensional search word vector file.

4. The method of claim 1, wherein calculating, for each search term in the search term set, the degree of association between the search term and the other search terms according to the N-dimensional search word vector file, and taking the other search terms whose degree of association with the search term meets the preset condition as the original corpus of the search term comprises:
operating on the search term set and the N-dimensional search word vector file with a KNN algorithm, and calculating the distance between every two search terms in the search term set according to the N-dimensional search word vector file;
for each search term in the search term set, sorting in descending order of distance from the search term, and selecting the first first-preset-threshold number of search terms as the original corpus of the search term.

5. The method of claim 1, wherein preprocessing the original corpus set comprises:
in the original corpus set, for each original corpus, performing word segmentation on the original corpus to obtain a word segmentation result containing a plurality of terms; finding phrases formed by adjacent terms in the word segmentation result; and retaining the phrases, the terms that are nouns and the terms that are verbs in the word segmentation result as the retained keywords corresponding to the original corpus.

6. The method of claim 5, wherein finding phrases formed by adjacent terms in the word segmentation result comprises:
calculating the cPMId value of every two adjacent terms in the word segmentation result, and determining that two adjacent terms constitute a phrase when their cPMId value is greater than a second preset threshold.

7. The method of any one of claims 1-6, wherein preprocessing the original corpus set further comprises:
taking the retained keywords corresponding to the original corpus of each search term as the first-stage training corpus of the search term;
the first-stage training corpora of the search terms constituting a first-stage training corpus set; and performing data cleaning on the keywords in the first-stage training corpus set.

8. The method of claim 7, wherein performing data cleaning on the keywords in the first-stage training corpus set comprises:
in the first-stage training corpus set, for the first-stage training corpus of each search term, calculating the TF-IDF value of each keyword in the first-stage training corpus; deleting the keywords whose TF-IDF value is higher than a third preset threshold and/or lower than a fourth preset threshold to obtain the training corpus of the search term;
the training corpora of the search terms constituting the training corpus set.

9. The method of claim 1, wherein calculating the tag system of each search term according to the search term-topic probability distribution result and the topic-keyword probability distribution result comprises:
calculating a search term-keyword probability distribution result according to the search term-topic probability distribution result and the topic-keyword probability distribution result;
according to the search term-keyword probability distribution result, for each search term, sorting the keywords in descending order of probability with respect to the search term, and selecting the first fifth-preset-threshold number of keywords.

10. The method of claim 1, wherein calculating the search term-keyword probability distribution result according to the search term-topic probability distribution result and the topic-keyword probability distribution result comprises:
for each search term, obtaining the probability of each topic with respect to the search term according to the search term-topic probability distribution result;
for each topic, obtaining the probability of each keyword with respect to the topic according to the topic-keyword probability distribution result;
then, for each keyword, taking the product of the probability of the keyword with respect to a topic and the probability of the topic with respect to a search term as the probability of the keyword with respect to the search term based on that topic; and taking the sum of the probabilities of the keyword with respect to the search term based on the respective topics as the probability of the keyword with respect to the search term.

11. The method of claim 1, wherein calculating the tag system of each search term according to the search term-topic probability distribution result and the topic-keyword probability distribution result further comprises:
taking the first fifth-preset-threshold number of keywords selected for each search term as the first-stage tag system of the search term;
for the first-stage tag system of each search term, calculating the semantic relation value between each keyword in the first-stage tag system of the search term and the search term; for each keyword, taking the product of the semantic relation value corresponding to the keyword and the probability of the keyword with respect to the search term as the corrected probability of the keyword with respect to the search term; and sorting the keywords in the first-stage tag system of the search term in descending order of corrected probability with respect to the search term, and selecting the first sixth-preset-threshold number of keywords to form the tag system of the search term.

12. The method of claim 1, wherein calculating the semantic relation value between each keyword in the first-stage tag system of the search term and the search term comprises:
obtaining a search word sequence set corresponding to a plurality of query sessions according to the search terms in each query session; training the search word sequence set to obtain an N-dimensional keyword vector file;
calculating the word vector of the keyword and the word vector of each term in the search term according to the N-dimensional keyword vector file;
calculating the cosine similarity between the word vector of the keyword and the word vector of each term as the semantic relation value between the keyword and the corresponding term;
taking the sum of the semantic relation values of the keyword and the respective terms as the semantic relation value between the keyword and the search term.

13. The method of claim 1, wherein training the search word sequence set to obtain the N-dimensional keyword vector file comprises:
performing word segmentation on the search word sequence set, and training the segmented search word sequence set with the deep learning toolkit word2vec to generate the N-dimensional keyword vector file.

14. The method of claim 1, wherein calculating the tag system of each search term according to the search term-topic probability distribution result and the topic-keyword probability distribution result further comprises:
taking the first sixth-preset-threshold number of keywords selected for each search term as the second-stage tag system of the search term;
for the second-stage tag system of each search term, counting the TF-IDF value, in the training corpus of the search term, of each keyword in the second-stage tag system of the search term; for each keyword, taking the product of the probability of the keyword with respect to the search term and the TF-IDF value as the secondary corrected probability of the keyword with respect to the search term; and sorting the keywords in the second-stage tag system of the search term in descending order of secondary corrected probability with respect to the search term, and selecting the first K keywords to form the tag system of the search term.

15. The method of claim 14, wherein selecting the first K keywords to form the tag system of the search term comprises:
obtaining the number of queries for the search term within a preset time period from the query session log of the application search engine;
selecting the first K keywords to form the tag system of the search term according to the number of queries; wherein the value of K is a broken-line function of the number of queries corresponding to the search term.

16. An application search method, comprising:
constructing a search term tag database, the search term tag database including the tag systems of a plurality of search terms;
receiving a current search term uploaded by a client, and obtaining the tag system of the current search term according to the search term tag database;
calculating the degree of association between the tag system of the current search term and the tag system of each application;
when the degree of association between the tag system of the current search term and the tag system of an application meets a preset condition, returning the relevant information of the application to the client for display;
wherein the search term tag database is constructed by the method of any one of claims 1-15.

17. The method of claim 16, wherein obtaining the tag system of the current search term according to the search term tag database comprises:
calculating the semantic similarity between the current search term and each search term in the search term tag database, sorting in descending order of semantic similarity, and selecting the first first-preset-threshold number of search terms;
obtaining the tag system of the current search term according to the tag systems of the selected search terms.

18. The method of claim 16 or 17, wherein
calculating the semantic similarity between the current search term and each search term in the search term tag database comprises: calculating the Euclidean distance between the current search term and each search term in the search term tag database, and taking the Euclidean distance between each search term and the current search term as the semantic similarity corresponding to that search term;
obtaining the tag system of the current search term according to the tag systems of the selected search terms comprises: taking the semantic similarity corresponding to each search term as the weight of each tag in the tag system of that search term; for the tags corresponding to the tag systems of the search terms, adding the weights of identical tags to obtain the final weight of each tag; and sorting in descending order of final weight, and selecting the first second-preset-threshold number of tags to form the tag system of the current search term.

19. An apparatus for identifying application search intent, comprising:
an obtaining unit, adapted to obtain the search terms in each query session from a query session log of an application search engine;
a mining unit, adapted to mine a tag system of each search term according to the search terms in each query session and a preset strategy;
an identification unit, adapted to identify the application search intent corresponding to each search term according to the tag system of the search term;
the mining unit being adapted to obtain a training corpus set according to the search terms in each query session; input the training corpus set into an LDA model for training, and obtain a search term-topic probability distribution result and a topic-keyword probability distribution result output by the LDA model; and calculate the tag system of each search term according to the search term-topic probability distribution result and the topic-keyword probability distribution result;
the mining unit being adapted to obtain an original corpus of each search term according to the search terms in each query session; the original corpora of the search terms constituting an original corpus set; and preprocess the original corpus set to obtain the training corpus set;
the mining unit being adapted to obtain a search word sequence set corresponding to a plurality of query sessions according to the search terms in each query session, and obtain a search term set corresponding to the plurality of query sessions; train the search word sequence set to obtain an N-dimensional search word vector file; for each search term in the search term set, calculate the degree of association between the search term and the other search terms according to the N-dimensional search word vector file; and take the other search terms whose degree of association with the search term meets a preset condition as the original corpus of the search term.

20. The identification apparatus of claim 19, wherein
the mining unit is adapted to, for each query session, arrange the search terms in the query session into a sequence in order; if a search term in the sequence corresponds to an application download operation, insert the name of the downloaded application into the position immediately after the corresponding search term in the sequence; obtain the search word sequence corresponding to the query session; and take the set of search terms in the plurality of query sessions as the search term set corresponding to the plurality of query sessions.

21. The identification apparatus of claim 19, wherein
the mining unit is adapted to take each search word in the search word sequence set as a word, and train the search word sequence set with the deep learning toolkit word2vec to generate the N-dimensional search word vector file.

22. The identification apparatus of claim 19, wherein
the mining unit is adapted to operate on the search term set and the N-dimensional search word vector file with a KNN algorithm, and calculate the distance between every two search terms in the search term set according to the N-dimensional search word vector file; and, for each search term in the search term set, sort in descending order of distance from the search term, and select the first first-preset-threshold number of search terms as the original corpus of the search term.

23. The identification apparatus of claim 19, wherein
the mining unit is adapted to, in the original corpus set, for each original corpus, perform word segmentation on the original corpus to obtain a word segmentation result containing a plurality of terms; find phrases formed by adjacent terms in the word segmentation result; and retain the phrases, the terms that are nouns and the terms that are verbs in the word segmentation result as the retained keywords corresponding to the original corpus.

24. The apparatus of claim 23, wherein
the mining unit is adapted to calculate the cPMId value of every two adjacent terms in the word segmentation result, and determine that two adjacent terms constitute a phrase when their cPMId value is greater than a second preset threshold.

25. The identification apparatus of claim 19, wherein
the mining unit is further adapted to take the retained keywords corresponding to the original corpus of each search term as the first-stage training corpus of the search term; the first-stage training corpora of the search terms constituting a first-stage training corpus set; and perform data cleaning on the keywords in the first-stage training corpus set.

26. The identification apparatus of claim 25, wherein
the mining unit is adapted to, in the first-stage training corpus set, for the first-stage training corpus of each search term, calculate the TF-IDF value of each keyword in the first-stage training corpus; and delete the keywords whose TF-IDF value is higher than a third preset threshold and/or lower than a fourth preset threshold to obtain the training corpus of the search term; the training corpora of the search terms constituting the training corpus set.

27. The identification apparatus of claim 19, wherein
the mining unit is adapted to calculate a search term-keyword probability distribution result according to the search term-topic probability distribution result and the topic-keyword probability distribution result; and, according to the search term-keyword probability distribution result, for each search term, sort the keywords in descending order of probability with respect to the search term and select the first fifth-preset-threshold number of keywords.

28. The identification apparatus of claim 19, wherein
the mining unit is adapted to, for each search term, obtain the probability of each topic with respect to the search term according to the search term-topic probability distribution result; for each topic, obtain the probability of each keyword with respect to the topic according to the topic-keyword probability distribution result; then, for each keyword, take the product of the probability of the keyword with respect to a topic and the probability of the topic with respect to a search term as the probability of the keyword with respect to the search term based on that topic; and take the sum of the probabilities of the keyword with respect to the search term based on the respective topics as the probability of the keyword with respect to the search term.

29. The identification apparatus of claim 19, wherein
the mining unit is further adapted to take the first fifth-preset-threshold number of keywords selected for each search term as the first-stage tag system of the search term; for the first-stage tag system of each search term, calculate the semantic relation value between each keyword in the first-stage tag system of the search term and the search term; for each keyword, take the product of the semantic relation value corresponding to the keyword and the probability of the keyword with respect to the search term as the corrected probability of the keyword with respect to the search term; and sort the keywords in the first-stage tag system of the search term in descending order of corrected probability with respect to the search term, and select the first sixth-preset-threshold number of keywords to form the tag system of the search term.

30. The identification apparatus of claim 19, wherein
the mining unit is adapted to obtain a search word sequence set corresponding to a plurality of query sessions according to the search terms in each query session; train the search word sequence set to obtain an N-dimensional keyword vector file; calculate the word vector of the keyword and the word vector of each term in the search term according to the N-dimensional keyword vector file; calculate the cosine similarity between the word vector of the keyword and the word vector of each term as the semantic relation value between the keyword and the corresponding term; and take the sum of the semantic relation values of the keyword and the respective terms as the semantic relation value between the keyword and the search term.

31. The identification apparatus of claim 19, wherein
the mining unit is adapted to perform word segmentation on the search word sequence set, and train the segmented search word sequence set with the deep learning toolkit word2vec to generate the N-dimensional keyword vector file.

32. The identification apparatus of claim 19, wherein
the mining unit is further adapted to take the first sixth-preset-threshold number of keywords selected for each search term as the second-stage tag system of the search term; for the second-stage tag system of each search term, count the TF-IDF value, in the training corpus of the search term, of each keyword in the second-stage tag system of the search term; for each keyword, take the product of the probability of the keyword with respect to the search term and the TF-IDF value as the secondary corrected probability of the keyword with respect to the search term; and sort the keywords in the second-stage tag system of the search term in descending order of secondary corrected probability with respect to the search term, and select the first K keywords to form the tag system of the search term.

33. The identification apparatus of claim 32, wherein
the mining unit is adapted to obtain, from the query session log of the application search engine, the number of queries for the search term within a preset time period; and select the first K keywords to form the tag system of the search term according to the number of queries; wherein the value of K is a broken-line function of the number of queries corresponding to the search term.

34. An application search server, comprising:
a database construction unit, adapted to construct a search term tag database, the search term tag database including the tag systems of a plurality of search terms;
an interaction unit, adapted to receive a current search term uploaded by a client;
a search processing unit, adapted to obtain the tag system of the current search term according to the search term tag database, and calculate the degree of association between the tag system of the current search term and the tag system of each application;
the interaction unit being further adapted to, when the degree of association between the tag system of the current search term and the tag system of an application meets a preset condition, return the relevant information of the application to the client for display;
wherein the database construction unit constructs the search term tag database by the same process as the apparatus for identifying application search intent of any one of claims 19-33.

35. The server of claim 34, wherein
the search processing unit is adapted to calculate the semantic similarity between the current search term and each search term in the search term tag database, sort in descending order of semantic similarity, and select the first first-preset-threshold number of search terms; and obtain the tag system of the current search term according to the tag systems of the selected search terms.

36. The server of claim 34 or 35, wherein
the search processing unit is adapted to calculate the Euclidean distance between the current search term and each search term in the search term tag database, and take the Euclidean distance between each search term and the current search term as the semantic similarity corresponding to that search term; take the semantic similarity corresponding to each search term as the weight of each tag in the tag system of that search term; for the tags corresponding to the tag systems of the search terms, add the weights of identical tags to obtain the final weight of each tag; and sort in descending order of final weight, and select the first second-preset-threshold number of tags to form the tag system of the current search term.
CN201611246921.1A 2016-12-29 2016-12-29 Application search intent identification method, device, application search method and server Expired - Fee Related CN106649818B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611246921.1A CN106649818B (en) 2016-12-29 2016-12-29 Application search intent identification method, device, application search method and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611246921.1A CN106649818B (en) 2016-12-29 2016-12-29 Application search intent identification method, device, application search method and server

Publications (2)

Publication Number Publication Date
CN106649818A CN106649818A (en) 2017-05-10
CN106649818B true CN106649818B (en) 2020-05-15

Family

ID=58835949

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611246921.1A Expired - Fee Related CN106649818B (en) 2016-12-29 2016-12-29 Application search intent identification method, device, application search method and server

Country Status (1)

Country Link
CN (1) CN106649818B (en)

Families Citing this family (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107357875B (en) * 2017-07-04 2021-09-10 北京奇艺世纪科技有限公司 Voice search method and device and electronic equipment
CN107247709B (en) * 2017-07-28 2021-03-16 广州多益网络股份有限公司 Encyclopedic entry label optimization method and system
CN110147479B (en) * 2017-10-31 2021-06-11 北京搜狗科技发展有限公司 Search behavior recognition method and device and search behavior recognition device
CN107862027B (en) * 2017-10-31 2019-03-12 北京小度信息科技有限公司 Retrieve intension recognizing method, device, electronic equipment and readable storage medium storing program for executing
CN108170664B (en) * 2017-11-29 2021-04-09 有米科技股份有限公司 Key word expansion method and device based on key words
CN109948140B (en) * 2017-12-20 2023-06-23 普天信息技术有限公司 Word vector embedding method and device
CN108804532B (en) * 2018-05-03 2020-06-26 腾讯科技(深圳)有限公司 Query intention mining method and device and query intention identification method and device
CN109002849B (en) * 2018-07-05 2022-05-17 百度在线网络技术(北京)有限公司 Method and device for identifying development stage of object
CN110688846B (en) * 2018-07-06 2023-11-07 北京京东尚科信息技术有限公司 Periodic word mining method, system, electronic equipment and readable storage medium
CN109034248B (en) * 2018-07-27 2022-04-05 电子科技大学 A deep learning-based classification method for images with noisy labels
CN109460473B (en) * 2018-11-21 2021-11-02 中南大学 Multi-label classification method of electronic medical records based on symptom extraction and feature representation
CN109522275B (en) * 2018-11-27 2020-11-20 掌阅科技股份有限公司 Label mining method based on user production content, electronic device and storage medium
CN109710612B (en) * 2018-12-25 2021-05-18 百度在线网络技术(北京)有限公司 Vector index recall method and device, electronic equipment and storage medium
CN109800296B (en) * 2019-01-21 2022-03-01 四川长虹电器股份有限公司 Semantic fuzzy recognition method based on user real intention
CN111563208B (en) * 2019-01-29 2023-06-30 株式会社理光 Method and device for identifying intention and computer readable storage medium
US11138285B2 (en) * 2019-03-07 2021-10-05 Microsoft Technology Licensing, Llc Intent encoder trained using search logs
CN110175241B (en) * 2019-05-23 2021-08-03 腾讯科技(深圳)有限公司 Question and answer library construction method and device, electronic equipment and computer readable medium
CN110347959B (en) * 2019-06-27 2022-11-01 杭州数跑科技有限公司 Anonymous user identification method, device, computer equipment and storage medium
CN112749321B (en) * 2019-10-31 2024-05-28 阿里巴巴集团控股有限公司 Data processing method, client, server, system and storage medium
CN110909535B (en) * 2019-12-06 2023-04-07 北京百分点科技集团股份有限公司 Named entity checking method and device, readable storage medium and electronic equipment
CN113569128B (en) * 2020-04-29 2024-12-24 北京金山云网络技术有限公司 Data retrieval method, device and electronic equipment
CN111783440B (en) * 2020-07-02 2024-04-26 北京字节跳动网络技术有限公司 Intention recognition method and device, readable medium and electronic equipment
CN111859148B (en) * 2020-07-30 2025-01-24 深圳前海微众银行股份有限公司 Method, device, apparatus and computer-readable storage medium for extracting subject matter
CN113343028B (en) * 2021-05-31 2022-09-02 北京达佳互联信息技术有限公司 Method and device for training intention determination model
CN113609379B (en) * 2021-07-12 2022-07-22 北京达佳互联信息技术有限公司 Label system construction method and device, electronic equipment and storage medium
CN115618873A (en) * 2021-07-13 2023-01-17 腾讯科技(深圳)有限公司 Data processing method, device, computer equipment and storage medium
CN113792116B (en) * 2021-08-25 2024-03-29 北京库睿科技有限公司 Multi-vertical-domain multi-intention hierarchical judgment method and system based on search word semantics
CN113836307B (en) * 2021-10-15 2024-02-20 国网北京市电力公司 A method, system, device and storage medium for hotspot discovery of power supply service work orders
CN113987266A (en) * 2021-10-26 2022-01-28 北京京东尚科信息技术有限公司 Method, device and system for training feature learning model and storage medium
CN114003750B (en) * 2021-10-29 2024-03-26 平安银行股份有限公司 Material online method, device, equipment and storage medium
CN114417116A (en) * 2021-12-21 2022-04-29 北京百度网讯科技有限公司 Search method, apparatus, device, medium, and program product based on search word
CN114168756B (en) * 2022-01-29 2022-05-13 浙江口碑网络技术有限公司 Query understanding method and device for search intention, storage medium and electronic device
CN114610576A (en) * 2022-03-15 2022-06-10 中国银行股份有限公司 Log generation monitoring method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7707284B2 (en) * 2005-08-03 2010-04-27 Novell, Inc. System and method of searching for classifying user activity performed on a computer system

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103425710A (en) * 2012-05-25 2013-12-04 北京百度网讯科技有限公司 Subject-based searching method and device
CN104636402A (en) * 2013-11-13 2015-05-20 阿里巴巴集团控股有限公司 Classification, search and push methods and systems of service objects
CN103995845A (en) * 2014-05-06 2014-08-20 百度在线网络技术(北京)有限公司 Information search method and device
CN104408095A (en) * 2014-11-15 2015-03-11 北京广利核系统工程有限公司 Improvement-based KNN (K Nearest Neighbor) text classification method
CN105095474A (en) * 2015-08-11 2015-11-25 北京奇虎科技有限公司 Method and device for establishing recommendation relation between searching terms and application data
CN105224661A (en) * 2015-09-30 2016-01-06 北京奇虎科技有限公司 Conversational information search method and device
CN105760522A (en) * 2016-02-29 2016-07-13 网易(杭州)网络有限公司 Information search method and device based on application program
CN106156357A (en) * 2016-07-27 2016-11-23 成都四象联创科技有限公司 Text data beam search method

Also Published As

Publication number Publication date
CN106649818A (en) 2017-05-10

Similar Documents

Publication Publication Date Title
CN106649818B (en) Application search intent identification method, device, application search method and server
CN109189942B (en) Method and device for constructing knowledge graph of patent data
CN110968684B (en) Information processing method, device, equipment and storage medium
CN109815308B (en) Method and device for determining intention recognition model and method and device for searching intention recognition
CN109145153B (en) Intention category identification method and device
CN106599278B (en) Application search intention identification method and device
CN106709040B (en) Application search method and server
CN106682169B (en) Application label mining method and device, application searching method and server
US9305083B2 (en) Author disambiguation
CN111324771B (en) Video tag determination method and device, electronic equipment and storage medium
WO2023108980A1 (en) Information push method and device based on text adversarial sample
CN106970991B (en) Similar application identification method and device, application search recommendation method and server
CN109388743B (en) Language model determining method and device
CN103699625A (en) Method and device for retrieving based on keyword
CN111291177A (en) Information processing method and device and computer storage medium
CN103678576A (en) Full-text retrieval system based on dynamic semantic analysis
CN106126619A (en) A kind of video retrieval method based on video content and system
CN112989208B (en) Information recommendation method and device, electronic equipment and storage medium
CN104484380A (en) Personalized search method and personalized search device
CN108038099B (en) A low-frequency keyword recognition method based on word clustering
CN112559684A (en) Keyword extraction and information retrieval method
CN110008473B (en) Medical text named entity identification and labeling method based on iteration method
CN109446313B (en) Sequencing system and method based on natural language analysis
CN108875065B (en) A content-based recommendation method for Indonesian news pages
CN103577534A (en) Searching method and search engine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee (granted publication date: 20200515)