CN113220838B - Method, device, electronic equipment and storage medium for determining key information - Google Patents
- Publication number
- CN113220838B (CN202110520029.2A)
- Authority
- CN
- China
- Prior art keywords
- phrase
- target
- determining
- speech
- service
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Mathematical Physics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The disclosure provides a method, a device, electronic equipment and a storage medium for determining key information, which are applied to the technical field of electronics, and are particularly applied to the fields of natural language processing, deep learning and data mining. The specific implementation scheme of the method for determining the key information is as follows: determining query information related to a target service as target query information; extracting candidate key phrases from the target query information; and determining the target phrase in the candidate key phrases as key information for the target service based on the similarity between the candidate key phrases and the service names of the target service.
Description
Technical Field
The present disclosure relates to the field of electronic technology, in particular to the fields of natural language processing, deep learning, and data mining, and more particularly to a method, apparatus, device, and storage medium for determining key information.
Background
With the development of electronic technology and internet technology, people increasingly query information over networks. To improve the user experience, how to provide users with service information that meets their needs, based on the query information they input, has become a research hotspot.
To provide the user with service information meeting the demand, trigger phrases may be set for various types of services. In this manner, the type of service information provided to the user may be determined based on the trigger phrase included in the query information.
Disclosure of Invention
Provided are a method, apparatus, electronic device, and storage medium for determining key information, which improve the accuracy of the determined key information.
According to one aspect of the present disclosure, there is provided a method of determining key information, the method comprising: determining query information related to a target service as target query information; extracting candidate key phrases from the target query information; and determining the target phrase in the candidate key phrases as key information for the target service based on the similarity between the candidate key phrases and the service names of the target service.
According to another aspect of the present disclosure, there is provided an apparatus for determining key information, the apparatus comprising: a query information determination module for determining query information related to a target service as target query information; a phrase extraction module for extracting candidate key phrases from the target query information; and a key information determination module for determining a target phrase in the candidate key phrases as key information for the target service based on the similarity between the candidate key phrases and the service name of the target service.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of determining key information provided by the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of determining key information provided by the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of determining key information provided by the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are provided for a better understanding of the present solution and are not to be construed as limiting the present disclosure. In the drawings:
FIG. 1 is a schematic view of an application scenario of a method, apparatus, electronic device and storage medium for determining key information according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of a method of determining key information according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of determining query information related to a target service according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of extracting candidate key-phrases from target query information according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of determining target phrases in candidate key phrases according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a method of determining key information according to an embodiment of the present disclosure;
FIG. 7 is a block diagram of an apparatus for determining key information according to an embodiment of the present disclosure; and
FIG. 8 is a block diagram of an electronic device for implementing a method of determining key information of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The present disclosure provides a method of determining key information, the method comprising a query information determination stage, a phrase extraction stage, and a key information determination stage. In the query information determination stage, query information related to a target service is determined as target query information. In the phrase extraction stage, candidate key phrases are extracted from the target query information. In the key information determination stage, a target phrase in the candidate key phrases is determined as key information for the target service based on the similarity between the candidate key phrases and the service name of the target service.
An application scenario of the method and apparatus provided by the present disclosure will be described below with reference to fig. 1.
Fig. 1 is an application scenario diagram of a method and apparatus for determining key information according to an embodiment of the present disclosure.
As shown in fig. 1, the application scenario 100 includes a user 110, a terminal device 120, and a server 130. Terminal device 120 may be communicatively coupled to server 130 via a network, which may include wired or wireless communication links.
The user 110 may use the terminal device 120 to interact with the server 130 via the network, for example to receive or send messages. The terminal device 120 may be a terminal device with a display screen and processing capabilities, including, but not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, and the like.
Illustratively, the terminal device 120 may obtain the query information 140 in response to a user operation, or may recognize voice information of the user through a voice recognition technology, thereby obtaining the query information 140. The terminal device 120 may also send the query information 140 to the server 130 to obtain service information 150 from the server 130 that meets the user's needs. The query information 140 may be, for example, a query in a search scene.
The server 130 may be a server providing various services, such as a background management server providing support for web sites or client applications browsed by the user using the terminal device 120. The server 130 may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
For example, in response to receiving the query information 140, the server 130 may match the query information 140 against trigger phrases maintained in advance for multiple types of service information, determine the trigger phrase matched by the query information 140, and feed back the service information 150 corresponding to the matched trigger phrase to the terminal device 120.
According to an embodiment of the present disclosure, as shown in fig. 1, the application scenario 100 may further include a database 160. The database 160 stores historical query information. The server 130 may access the database 160, for example, to determine trigger phrases for multiple classes of service information based on historical query information.
The aforementioned multiple types of service information may include, for example, service information related to the fields of work, life, education, and the like, such as "scenic spot ticket" information, "full-time recruitment" information, and "home maintenance" information.
It should be noted that the key information determined by the present disclosure may be the foregoing trigger phrase. The method of determining key information provided by the present disclosure may be performed by the server 130. Accordingly, the apparatus for determining key information provided by the present disclosure may be provided in the server 130.
It should be understood that the number and types of terminal devices, servers, and databases in fig. 1 are merely illustrative. There may be any number and type of terminal devices, servers, and databases as desired for implementation.
Fig. 2 is a flowchart of a method of determining key information according to an embodiment of the present disclosure.
As shown in fig. 2, the method 200 of determining key information of this embodiment may include operations S210 to S230.
In operation S210, query information related to a target service is determined as target query information.
The target service may be a service capable of providing the aforementioned multiple types of service information, and may include, for example, a "scenic spot ticket" service, a "home maintenance" service, a "full-time recruitment" service, and/or a "house rental" service. The target service may be any one of the service items provided in fields such as life, work, education, and medical treatment.
For example, query information related to the target service may be recalled from the historical query information. The historical query information may be stored, for example, by category of the service concerned and indexed by that service. In this embodiment, the target service may be determined first, and the historical query information indexed by the target service is taken as the target query information. Alternatively, query information in the historical query information that has a high similarity to the service name of the target service may be taken as the target query information. There may be one or more items of target query information. To improve the accuracy of the determined key information, multiple items of target query information may be recalled.
In operation S220, candidate key phrases are extracted from the target query information.
According to embodiments of the present disclosure, a phrase extraction algorithm may be employed to extract candidate key phrases. The phrase extraction algorithm may be the Rapid Automatic Keyword Extraction (RAKE) algorithm, an extraction algorithm based on mutual information and left and right information entropy, or a KEY PHRASE-extract operator provided by a natural language processing platform, among others. A phrase is a language unit without sentence intonation, formed by combining language units that can collocate at the syntactic, semantic, and pragmatic levels; it is also called a word group. A phrase is a grammatical unit that is larger than a word but is not a sentence.
It will be appreciated that the phrase extraction algorithm described above is merely an example to facilitate an understanding of the present disclosure, which is not limited thereto.
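As one possible illustration of phrase extraction, the following is a minimal sketch of RAKE-style scoring: text is split into candidate phrases at stop words and punctuation, and each phrase is scored as the sum of degree/frequency over its words. The stop-word list, function names, and sample queries are assumptions made for illustration only and are not taken from the disclosure.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "of", "for", "in", "to", "and", "near", "me", "how", "is"}

def rake_candidates(text):
    """Split text into candidate phrases at stop words and punctuation."""
    phrases, current = [], []
    for token in re.findall(r"[a-z0-9]+|[^\sa-z0-9]", text.lower()):
        if token in STOP_WORDS or not token.isalnum():
            if current:
                phrases.append(tuple(current))
            current = []
        else:
            current.append(token)
    if current:
        phrases.append(tuple(current))
    return phrases

def rake_scores(texts):
    """Score each candidate phrase as the sum of degree/frequency of its words."""
    phrases = [p for t in texts for p in rake_candidates(t)]
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrence degree, counting the word itself
    return {" ".join(p): sum(degree[w] / freq[w] for w in p) for p in set(phrases)}

# Hypothetical target query information.
scores = rake_scores([
    "cheap scenic spot ticket near me",
    "how to book a scenic spot ticket",
])
```

Longer phrases made of frequently co-occurring words receive higher scores, so a caller could keep the top-scoring phrases as candidate key phrases.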
In operation S230, a target phrase among the candidate key phrases is determined as key information for the target service based on the similarity between the candidate key phrases and the service names of the target service.
After the candidate key phrases are obtained, a semantic similarity calculation method may be used to determine the similarity between each candidate key phrase and the service name. Phrases whose similarity to the service name is greater than a similarity threshold are then selected from the candidate key phrases and used as key information for the target service. The similarity threshold may be set according to actual requirements, which is not limited in this disclosure.
Illustratively, semantic similarity calculation methods may include a method based on Word Mover's Distance (WMD), a method based on Smooth Inverse Frequency (SIF), a method employing Deep Structured Semantic Models (DSSM), and the like. It will be appreciated that the above-described semantic similarity calculation methods are merely examples to facilitate understanding of the present disclosure, which is not limited thereto.
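The threshold-based selection in operation S230 can be sketched as follows. Note that the similarity function here is a crude stand-in (cosine similarity over character-bigram counts), not Word Mover's Distance, Smooth Inverse Frequency, or DSSM; it is used only so that the example runs without external models. The function names, sample phrases, and threshold value are assumptions.

```python
import math
from collections import Counter

def char_bigrams(text):
    """Count character bigrams of the text with spaces removed."""
    s = text.replace(" ", "").lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))

def similarity(a, b):
    """Cosine similarity of bigram counts; a stand-in for a semantic model."""
    va, vb = char_bigrams(a), char_bigrams(b)
    dot = sum(va[g] * vb[g] for g in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

def select_key_information(candidates, service_name, threshold):
    """Keep candidate phrases whose similarity to the service name exceeds the threshold."""
    return [c for c in candidates if similarity(c, service_name) > threshold]

key_info = select_key_information(
    ["scenic spot tickets", "weather forecast", "spot ticket booking"],
    "scenic spot ticket",
    threshold=0.5,  # hypothetical value; the disclosure leaves the threshold open
)
```

In practice the stand-in would be replaced by a real semantic similarity model, since surface-form overlap cannot find phrases that differ in wording from the service name but are related in meaning.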
After the key information is determined, if a query obtained by the terminal device includes the key information, the embodiment may feed back the service information of the target service to the terminal device, so that the service information is displayed to the user as the response to the query.
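As a rough illustration of this feedback step, the sketch below checks whether a query contains any key information and returns the matching services. The function name, the dictionary layout, and the sample phrases are assumptions made for illustration; the disclosure does not prescribe them.

```python
def services_for_query(query, key_information):
    """Return the services whose key information appears in the query.

    key_information maps a service name to a set of key phrases (hypothetical layout).
    """
    return [
        service
        for service, phrases in key_information.items()
        if any(phrase in query for phrase in phrases)
    ]

# Hypothetical key information mined for two services.
key_information = {
    "scenic spot ticket": {"scenic spot ticket", "park admission"},
    "home maintenance": {"fix leaking tap"},
}

matched = services_for_query("park admission price this weekend", key_information)
```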
In the related art, key information is obtained by rewriting the service name (for example, the service item name) of the target service. By contrast, the embodiment of the present disclosure recalls query information related to the target service from the historical query information and mines, from that query information, phrases with high similarity to the service name as key information. This enlarges the range of source text for the key information and makes it possible to mine key information whose wording differs from the service name but whose semantics are related. The embodiment can therefore obtain key information that is more accurate and better fits the usage scenario, which in turn helps provide users with a better search service.
Fig. 3 is a schematic diagram of determining query information related to a target service according to an embodiment of the present disclosure.
The embodiment of the disclosure can generate the dictionary tree in advance based on the service names of a plurality of services, and when the target query information is recalled, the query information related to the target service is searched from the historical query information based on the dictionary tree. Therefore, the efficiency and the accuracy of recalling the target query information are improved. The plurality of services include the target service, and the plurality of services may be set according to actual requirements, which is not limited in this disclosure.
When the dictionary tree is generated, the service name of each of the plurality of services may be divided into words to obtain a phrase (that is, a sequence of words) for each service. A branch is then generated for each service based on its phrase, and the resulting plurality of branches form the predetermined dictionary tree.
For example, the service name of each service may be segmented according to a custom dictionary or with the jieba word segmentation tool. For the service name "scenic spot ticket", word segmentation yields the phrase {scenic spot, ticket}. Nodes indicating the characters of the words are then established in the order in which they appear in the phrase, and the nodes are connected in that order to obtain the branch for the service. For the phrase {scenic spot, ticket}, a first node indicating "scenery", a second node indicating "point", a third node indicating "gate", and a fourth node indicating "ticket" may be established in sequence, one for each character of the original name, and the four nodes are connected in sequence to obtain the branch for the "scenic spot ticket" service. Connecting the plurality of branches to a root node yields the dictionary tree. The root node does not indicate character information.
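The branch construction described above can be sketched as a small trie. In this sketch each node is keyed by one unit of the service name, using the translated glosses "scenery", "point", "gate", and "ticket" as the units; the class name, function name, and the second service are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # unit (e.g. one character) -> TrieNode
        self.service = None  # set on the last node of a branch

def build_trie(branches):
    """Build a dictionary tree; branches maps a service name to its ordered units."""
    root = TrieNode()  # the root node carries no character information
    for service, units in branches.items():
        node = root
        for unit in units:
            # Reuse an existing node when two branches share a prefix.
            node = node.children.setdefault(unit, TrieNode())
        node.service = service
    return root

root = build_trie({
    "scenic spot ticket": ["scenery", "point", "gate", "ticket"],
    # Hypothetical second service that shares the first two nodes.
    "scenic spot guide": ["scenery", "point", "guide"],
})
```

Branches with a common prefix share nodes, which is what makes lookups against many service names efficient.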
Illustratively, after the plurality of branches are connected to the root node, the embodiment may also employ a string matching algorithm to add mismatch pointers to the nodes in the branches. The string matching algorithm may be, for example, the Knuth-Morris-Pratt (KMP) algorithm or the Boyer-Moore (BM) algorithm. Adding mismatch pointers improves the query efficiency of the dictionary tree.
When searching the historical query information for query information related to the target service based on the dictionary tree, the historical query information may be matched against the predetermined dictionary tree, and the query information that matches the branch for the target service in the predetermined dictionary tree may be used as the target query information.
For example, the predetermined dictionary tree may be queried with each item of historical query information. If an item of historical query information includes a string that matches the phrase indicated by a branch in the predetermined dictionary tree, that item is retained, and a query-service pair is formed from the item and the service to which the phrase belongs. In this way, multiple query-service pairs can be obtained. The historical query information that forms a query-service pair with the target service is taken as the target query information. If mismatch pointers have been added to the predetermined dictionary tree, the tree may be queried using the aforementioned string matching algorithm. For example, the predetermined dictionary tree may be generated and queried based on the Aho-Corasick (AC) automaton algorithm, which is not limited by the present disclosure.
In an embodiment, after the query information matching the branch for the target service is recalled by querying the predetermined dictionary tree, the recalled query information may be further filtered according to its similarity to the service name of the target service, and the query information that passes the filter is used as the target query information. In this way, the correlation between the target query information and the target service is improved, which in turn helps improve the accuracy of the key information determined for the target service.
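The matching described above can be sketched with a small Aho-Corasick automaton, in which the failure links play the role of the mismatch pointers. This is a minimal character-level sketch under stated assumptions: the function names, the list-based state layout, and the sample queries are hypothetical, and service names are matched as plain substrings.

```python
from collections import deque

def build_automaton(patterns):
    """Build goto, failure (mismatch pointer), and output tables.

    patterns maps a service name to the string spelled by its branch.
    """
    goto, fail, out = [{}], [0], [[]]
    for service, text in patterns.items():
        state = 0
        for ch in text:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(service)
    # Breadth-first pass: depth-1 states keep failure link 0.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def match_services(query, automaton):
    """Return the set of services whose branch string occurs in the query."""
    goto, fail, out = automaton
    state, found = 0, set()
    for ch in query:
        while state and ch not in goto[state]:
            state = fail[state]  # follow mismatch pointers instead of restarting
        state = goto[state].get(ch, 0)
        found.update(out[state])
    return found

automaton = build_automaton({
    "scenic spot ticket": "scenic spot ticket",
    "home maintenance": "home maintenance",
})
history = ["cheap scenic spot ticket today", "home maintenance price", "weather tomorrow"]
pairs = [(q, s) for q in history for s in sorted(match_services(q, automaton))]
target_queries = [q for q, s in pairs if s == "scenic spot ticket"]
```

Each historical query is scanned once regardless of how many service names the tree holds, and queries that match no branch produce no query-service pair.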
Illustratively, as shown in fig. 3, when determining query information related to a target service, the embodiment 300 may first query the predetermined dictionary tree 320 with each item of historical query information in the query information base 310, that is, match the historical query information against the predetermined dictionary tree, and take the query information that matches the branch for the target service in the predetermined dictionary tree as candidate query information 330. After the candidate query information is obtained, a similarity 350 between each item of candidate query information 330 and the service name 340 of the target service is determined. Candidate query information whose similarity to the service name of the target service is greater than or equal to a first threshold is selected from the candidate query information 330 to obtain the target query information 360. The similarity 350 between each item of candidate query information 330 and the service name 340 of the target service may be determined, for example, by using the semantic similarity calculation method described above, which is not limited in this disclosure. The first threshold may be set according to actual requirements, for example, to any value not less than 0.5, which is not limited in the present disclosure. For example, if the candidate query information is "modern minimalist style", the service name of the target service is "decoration", and the similarity between the two is 0.58, the candidate query information "modern minimalist style" is taken as an item of target query information.
In an embodiment, after the candidate query information 330 or the target query information is obtained, the obtained query information may be filtered, for example, based on a pre-constructed blacklist, to filter out query information that includes words in the blacklist. The pre-constructed blacklist may include forbidden words issued by authorities, or sensitive words that violate laws or social ethics or jeopardize public interests.
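The two filters described above (the first threshold and the blacklist) can be combined in a short sketch. The similarity scores are supplied as precomputed numbers, and the queries, scores, and blacklist entry are hypothetical; only the threshold comparison (greater than or equal to) and the blacklist check follow the text.

```python
FIRST_THRESHOLD = 0.5  # the text allows any value not less than 0.5

def filter_queries(scored_queries, blacklist, threshold=FIRST_THRESHOLD):
    """Filter candidate queries.

    scored_queries is a list of (query, similarity to the service name) pairs.
    """
    kept = []
    for query, score in scored_queries:
        if score < threshold:
            continue  # not related closely enough to the target service
        if any(word in query for word in blacklist):
            continue  # contains a blacklisted word
        kept.append(query)
    return kept

target_queries = filter_queries(
    [
        ("minimalist living room", 0.58),    # hypothetical score, above the threshold
        ("cheap paint", 0.41),               # below the threshold
        ("forbiddenword renovation", 0.77),  # related, but blacklisted
    ],
    blacklist={"forbiddenword"},
)
```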
Fig. 4 is a schematic diagram of extracting candidate key-phrases from target query information according to an embodiment of the present disclosure.
In embodiments of the present disclosure, a predetermined blacklist of content that cannot be used as key information may be obtained from the results of word segmentation, part-of-speech analysis, and similar processing of the target query information. When candidate key phrases are extracted from the target query information, words belonging to the predetermined blacklist may be removed from the key phrases extracted from the target query information, and the phrases remaining after removal are used as the candidate key phrases. In this way, the determined candidate key phrases satisfy the setting requirements for key information to a certain extent, which improves the efficiency and accuracy of determining the key information.
According to an embodiment of the present disclosure, as shown in fig. 4, the target query information may be plural, and in this embodiment 400, when extracting candidate key phrases from the target query information, the key phrases of each of the plural target query information 410 may be extracted first, so as to obtain plural key phrases 420. Candidate key-phrases of the plurality of key-phrases 420 are then determined based on the predetermined blacklist 430. When determining candidate key phrases in the plurality of key phrases, words belonging to a preset blacklist in the key phrases can be removed based on the preset blacklist, so that the candidate key phrases are obtained.
In one embodiment, the predetermined blacklist 430 may include non-allowed parts of speech 431. When determining candidate key phrases among the plurality of key phrases, the part of speech of each word included in each of the plurality of key phrases 420 may be determined first. Words whose part of speech is a non-allowed part of speech are then removed from each phrase to obtain the candidate key phrases. A lexical analyzer (LA) or a natural language analysis tool of the related art may be used to perform part-of-speech analysis on the key phrases and determine the part of speech of the words included in each phrase. The non-allowed parts of speech may be at least one of group/organization nouns, adverbs, English abbreviations, and the like.
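A minimal sketch of the part-of-speech culling follows. It assumes the key phrases have already been tagged by some lexical analyzer; the tag names (ADV, ORG, and so on), the function name, and the sample phrases are hypothetical.

```python
def strip_non_allowed_pos(tagged_phrases, non_allowed_pos):
    """Remove words whose part of speech is non-allowed from each phrase.

    tagged_phrases is a list of phrases, each a list of (word, part_of_speech) pairs.
    Phrases that become empty are dropped.
    """
    cleaned = []
    for phrase in tagged_phrases:
        kept = [word for word, pos in phrase if pos not in non_allowed_pos]
        if kept:
            cleaned.append(" ".join(kept))
    return cleaned

cleaned_phrases = strip_non_allowed_pos(
    [
        [("very", "ADV"), ("cheap", "ADJ"), ("ticket", "NOUN")],
        [("acme", "ORG"), ("scenic", "ADJ"), ("spot", "NOUN")],
        [("quickly", "ADV")],
    ],
    non_allowed_pos={"ADV", "ORG"},  # e.g. adverbs and organization nouns
)
```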
In an embodiment, after the above embodiment eliminates words with parts of speech that are not allowed parts of speech from each phrase, the resulting key phrase may be used as the third phrase 440. Candidate key phrases are then screened from the third phrase based on the similarity between the third phrase 440 and the business name 450 of the target business. Specifically, a third phrase having a greater similarity with the service name 450 of the target service may be used as the candidate key phrase. Wherein, the third phrase with larger similarity can be selected according to a preset fourth threshold value. The fourth threshold may be greater than the first threshold described above, thereby progressively increasing the requirements for relevance of the screened phrase to the target business. According to the embodiment, after the words are removed from the key phrases, the third phrases are screened based on the similarity between the third phrases and the service names, so that the accuracy of the key information determined later can be improved, and the unnecessary waste of calculation resources is reduced to a certain extent.
In one embodiment, non-permitted words 432 may also be included in the predetermined blacklist 430. When determining candidate key phrases among the plurality of key phrases, the fourth phrase 460 may be obtained after selecting the third phrase with greater similarity according to the fourth threshold according to the foregoing embodiment. The fourth phrase 460 is a third phrase having a similarity to the business name 450 of the target business greater than or equal to a fourth threshold. After the fourth phrase 460 is obtained, non-allowed words may be removed from the fourth phrase 460, resulting in candidate key phrases 470. Wherein, for example, the fourth phrase 460 may be character-aligned with the non-permitted words 432 to identify non-permitted words in the fourth phrase 460 and cull the non-permitted words. Wherein the non-permissible words in this embodiment may include, for example, words belonging to the non-permissible parts of speech described above. Because of the fact that analysis is inaccurate when part-of-speech analysis is carried out on the phrases, words in the phrases are further removed based on non-allowed words, the determined candidate key phrases can be further enabled to be more in accordance with setting rules of key information, and the efficiency and accuracy of determining the key information are improved.
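The sequence from third phrases to fourth phrases to candidate key phrases can be sketched as follows. Similarities are supplied as a precomputed dictionary, and the phrases, scores, threshold value, and non-allowed word are hypothetical; word-level comparison is used here in place of the character comparison mentioned in the text.

```python
def candidate_key_phrases(third_phrases, similarity, fourth_threshold, non_allowed_words):
    """Screen third phrases by similarity, then remove non-allowed words.

    similarity maps each third phrase to its similarity with the service name.
    """
    # Fourth phrases: third phrases at or above the fourth threshold.
    fourth_phrases = [p for p in third_phrases if similarity[p] >= fourth_threshold]
    candidates = []
    for phrase in fourth_phrases:
        words = [w for w in phrase.split() if w not in non_allowed_words]
        if words:
            candidates.append(" ".join(words))
    return candidates

candidates = candidate_key_phrases(
    ["book scenic ticket", "train schedule", "scenic ticket inc"],
    similarity={"book scenic ticket": 0.8, "train schedule": 0.2, "scenic ticket inc": 0.7},
    fourth_threshold=0.6,       # hypothetical; chosen above a first threshold of 0.5
    non_allowed_words={"inc"},  # hypothetical word missed by part-of-speech analysis
)
```

Screening by similarity before the word-level pass means the second pass only touches phrases that are already likely to be useful.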
According to an embodiment of the present disclosure, to facilitate the removal of words from the key phrases based on non-allowed parts of speech, the embodiment may generate the non-allowed parts of speech in advance based on a plurality of pieces of target query information. For example, part-of-speech analysis may be performed on the plurality of pieces of target query information 410 to obtain the part of speech of each word included in each piece of target query information. By counting the parts of speech of the words in the plurality of pieces of target query information, a vocabulary can be generated for each part of speech, listing the words in the target query information that belong to that part of speech. A vocabulary is thus obtained for each of the multiple parts of speech. After the vocabularies are obtained, the average similarity between the words in the vocabulary for each part of speech and the service name of the target service may be determined as the average similarity for that part of speech. The parts of speech whose average similarity is less than a third threshold are listed as non-allowed parts of speech.
Illustratively, after the vocabulary for each part of speech is obtained, the similarity between each word in the vocabulary and the service name of the target service may be calculated. The mean of the similarities between all the words in the vocabulary and the service name is then taken as the average similarity for that part of speech. The third threshold may be set according to actual requirements, for example to 0.5, which is not limited in the present disclosure.
According to an embodiment of the present disclosure, to facilitate the removal of words from the third phrases based on non-allowed words, the embodiment may generate the non-allowed words based on the plurality of pieces of target query information after the non-allowed parts of speech have been determined. For example, a target part of speech among the non-allowed parts of speech may be determined, and all the words in the vocabulary for the target part of speech may be taken as non-allowed words. A part of speech whose vocabulary contains only words having a similarity to the service name of the target service smaller than a fifth threshold may be used as the target part of speech. The fifth threshold may be any value less than the third threshold described above, for example 0.3, which is not limited by the present disclosure.
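The generation of non-allowed parts of speech from per-part-of-speech average similarity might look like the sketch below. It assumes the queries have already been segmented and tagged, and it uses a toy lookup table (`toy_scores`) in place of a real similarity model. The names, tags, and scores are illustrative assumptions.

```python
from collections import defaultdict


def build_vocab_by_pos(tagged_words):
    """Group the words of all target queries into one vocabulary per part of speech."""
    vocab = defaultdict(set)
    for word, pos in tagged_words:
        vocab[pos].add(word)
    return vocab


def non_allowed_pos(vocab, similarity_to_service, third_threshold=0.5):
    """Return the parts of speech whose average word similarity is below the threshold."""
    blocked = set()
    for pos, words in vocab.items():
        avg = sum(similarity_to_service(w) for w in words) / len(words)
        if avg < third_threshold:
            blocked.add(pos)
    return blocked


# Hypothetical tagged words and similarity scores against the service name.
tagged = [("refrigerator", "noun"), ("maintenance", "noun"), ("repair", "verb"),
          ("please", "adverb"), ("quickly", "adverb")]
toy_scores = {"refrigerator": 0.9, "maintenance": 0.8, "repair": 0.7,
              "please": 0.1, "quickly": 0.2}

vocab = build_vocab_by_pos(tagged)
blocked = non_allowed_pos(vocab, toy_scores.get, third_threshold=0.5)
print(blocked)
```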
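A minimal sketch of deriving the non-allowed words follows, assuming the per-part-of-speech vocabularies and the non-allowed parts of speech are already available. The function name, the sample vocabularies, and the scores are hypothetical.

```python
def non_allowed_words(vocab, blocked_pos, similarity_to_service, fifth_threshold=0.3):
    """Collect all words of each non-allowed part of speech whose every word
    scores below the fifth threshold (the target parts of speech)."""
    words = set()
    for pos in blocked_pos:
        if all(similarity_to_service(w) < fifth_threshold for w in vocab[pos]):
            words |= set(vocab[pos])
    return words


# Hypothetical vocabularies for two non-allowed parts of speech.
vocab = {"adverb": {"please", "quickly"}, "adjective": {"broken", "nice"}}
toy_scores = {"please": 0.1, "quickly": 0.2, "broken": 0.45, "nice": 0.1}

blocked_words = non_allowed_words(vocab, {"adverb", "adjective"}, toy_scores.get)
print(sorted(blocked_words))
```

In this toy data the adjective vocabulary is not taken over as a whole, because one of its words ("broken") scores at or above the fifth threshold. Only the adverbs become non-allowed words.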
Fig. 5 is a schematic diagram of determining target phrases in candidate key phrases according to an embodiment of the present disclosure.
According to an embodiment of the present disclosure, when the target phrase among the candidate key phrases is determined, phrases with a low confusion degree (perplexity) can be selected as target phrases, so that phrases with a high confusion degree are filtered out. The determined target phrases then fit the way query information is expressed more closely, which improves the accuracy of the service information fed back to the user. In a natural language processing model, the lower the confusion degree, the higher the probability of each word in the phrase, and the more well-formed and fluent the text.
For example, in determining the target phrase among the candidate key phrases, the confusion degree of each candidate key phrase may be determined first, and the candidate key phrases whose confusion degree is smaller than a confusion threshold are taken as target phrases. For example, a statistical language model or a deep learning language model may be employed to determine the confusion degree of the candidate key phrases. The statistical language model may be an N-Gram model or the like. The deep learning language model may be a recurrent neural network model, a Bidirectional Encoder Representations from Transformers (BERT) model, or the like, and the recurrent neural network model may be a Long Short-Term Memory (LSTM) network model, a bidirectional LSTM (Bi-LSTM) network model, a deep contextualized word representations (ELMo) model, or the like. It will be appreciated that these models for determining the confusion degree are merely examples to facilitate understanding of the present disclosure, which is not limited thereto. The confusion threshold may be set according to actual requirements, for example to 0.5, which is not limited by the present disclosure.
For example, for the candidate key phrase "bad with decoration", the determined confusion degree may be high because the semantics are unclear, so the candidate key phrase may be filtered out. For the candidate key phrase "brief wind decoration", the determined confusion degree may be low because the semantics are clear, so the candidate key phrase may be taken as a target phrase.
According to an embodiment of the present disclosure, when the candidate key phrases include phrases from which some words have been removed based on the blacklist, the similarity between a candidate key phrase and the service name of the target service may differ from the similarity between the target query information and the service name of the target service. In this embodiment, after the phrases with a low confusion degree are selected, phrases with a high similarity to the service name of the target service may be selected from them as target phrases. For example, phrases whose similarity to the service name of the target service is greater than or equal to a second threshold may be selected. The second threshold may be any value greater than the first threshold described above, and may also be greater than the fourth threshold described above, so as to progressively raise the required relevance of the screened phrases to the target service. This similarity-based screening further improves the relevance of the determined target phrases to the target service and thereby their accuracy.
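To make the confusion-degree (perplexity) filter concrete, the following sketch scores phrases with a small add-one smoothed bigram model, one possible instance of the N-Gram option named above. The class, the training corpus, and the threshold of 5.0 are assumptions chosen for this toy example. The scale of a suitable threshold depends on the model and on how the score is normalized.

```python
import math
from collections import Counter


class BigramModel:
    """Add-one smoothed bigram language model (illustrative only)."""

    def __init__(self, corpus):
        self.context_counts = Counter()
        self.bigram_counts = Counter()
        self.vocab = {"<s>", "</s>"}
        for sentence in corpus:
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            self.vocab.update(tokens)
            self.context_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens, tokens[1:]))

    def prob(self, prev, word):
        # Laplace smoothing keeps unseen bigrams from getting zero probability.
        return (self.bigram_counts[(prev, word)] + 1) / (
            self.context_counts[prev] + len(self.vocab)
        )

    def perplexity(self, phrase):
        tokens = ["<s>"] + phrase.split() + ["</s>"]
        log_prob = sum(math.log(self.prob(p, w))
                       for p, w in zip(tokens, tokens[1:]))
        # Exponential of the average negative log-probability per transition.
        return math.exp(-log_prob / (len(tokens) - 1))


# Hypothetical query corpus used to fit the model.
corpus = ["refrigerator maintenance",
          "refrigerator maintenance service",
          "washing machine maintenance"]
model = BigramModel(corpus)

candidates = ["refrigerator maintenance", "maintenance refrigerator"]
PERPLEXITY_THRESHOLD = 5.0  # assumed value for this toy model
targets = [p for p in candidates if model.perplexity(p) < PERPLEXITY_THRESHOLD]
print(targets)
```

The scrambled phrase receives a higher perplexity than the fluent one because its word transitions were never observed in the corpus, so only the fluent phrase passes the threshold.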
According to an embodiment of the present disclosure, in order to improve the hit rate of the service information provided based on the determined target phrases and to provide the service information accurately, longer candidate key phrases may additionally undergo phrase extraction, or have words that do not satisfy the key information rules of the target service removed.
Illustratively, as shown in fig. 5, the embodiment 500 may divide the candidate key phrases 510, according to the length of each phrase, into fifth phrases 520 having a length less than or equal to a predetermined length and first phrases 530 having a length greater than the predetermined length. For a first phrase 530, a phrase having a length less than or equal to the predetermined length may be extracted from the first phrase 530 as a second phrase 550 based on the target part of speech 540 for the target service. The first phrase is then replaced with the second phrase 550, resulting in the replaced candidate key phrases 560. Then, after the confusion degree of each phrase in the replaced candidate key phrases 560 is determined, the phrases 570 with a lower confusion degree (i.e., phrases whose confusion degree is lower than the confusion threshold) are selected from the replaced candidate key phrases 560, and the phrases with a higher similarity to the service name of the target service are selected from the phrases 570 as the target phrases 580.
The predetermined length may be set according to actual requirements, for example to 6 or any other value. The target part of speech 540 may be set in advance according to actual requirements and may include, for example, verbs and nouns. For example, if the candidate key phrase is "bad refrigerator needs maintenance" and its length exceeds the predetermined length, word segmentation and part-of-speech analysis may be performed on the candidate key phrase, and the words "refrigerator" and "maintenance", whose parts of speech are target parts of speech, are retained to obtain the second phrase "refrigerator maintenance".
For example, if the phrase formed by the words belonging to the target part of speech in the candidate key phrase is still longer than the predetermined length, a word string of the predetermined length located at the front of that phrase may be selected as the second phrase. Alternatively, a phrase that is still longer than the predetermined length may be filtered out directly.
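The extraction of a second phrase from an over-long first phrase can be sketched as below. The example assumes phrases arrive already segmented and tagged, measures length in tokens (the disclosure does not state the unit here), and uses hypothetical part-of-speech tags. The truncation fallback keeps the leading words, which is one of the two options described above.

```python
def shorten_phrase(tagged_phrase, target_pos, max_len):
    """Return the phrase unchanged if it is short enough; otherwise keep only
    the words of a target part of speech, truncated to the first max_len words."""
    if len(tagged_phrase) <= max_len:
        return " ".join(word for word, _ in tagged_phrase)
    kept = [word for word, pos in tagged_phrase if pos in target_pos]
    return " ".join(kept[:max_len])


# Hypothetical tags for the example phrase from the description.
tagged = [("bad", "adjective"), ("refrigerator", "noun"),
          ("needs", "auxiliary"), ("maintenance", "noun")]
second_phrase = shorten_phrase(tagged, target_pos={"noun", "verb"}, max_len=3)
print(second_phrase)
```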
According to embodiments of the present disclosure, after the replaced candidate key phrase 560 is obtained, a deduplication operation may be performed on the replaced candidate key phrase 560. The confusion is then calculated for the candidate key-phrases remaining after the deduplication operation. In this way, repeated processing of the same phrase can be reduced, and processing efficiency can be improved to some extent.
Fig. 6 is a schematic diagram of a method of determining key information according to an embodiment of the present disclosure.
As shown in fig. 6, in an application scenario in which service items are provided to a user in response to query information, the method 600 of determining key information of this embodiment may be divided into a target query determination flow 610, a candidate key phrase determination flow 620, and a trigger phrase determination flow 630 as a whole.
In the target query determination flow 610, word segmentation may be performed on each service item name in the service item category 611 to obtain a word pair 612 for each service item. After the word pairs 612 are obtained, a dictionary tree (Trie tree 613) may be generated based on the word pairs 612, in a manner similar to the dictionary tree generation described above. After the dictionary tree is obtained, the Trie tree 613 may be queried with each query in the query library 614, and each query that matches the Trie tree, together with the service item to which the matched branch is directed, forms a query-service item pair 615. The queries in the query-service item pairs 615 are then screened based on the similarity between each query and the name of its paired service item, and the queries with higher similarity are kept as target queries 616 for use in the subsequent flows, which completes the target query determination flow 610.
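A compact sketch of the Trie-based matching is given below. It assumes that a query matches a branch when the words of a service item name occur as a contiguous run in the query. This matching rule, along with the class and function names, is an assumption for illustration, since the exact rule is defined elsewhere in the disclosure.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.service = None  # set on the node that ends a service item name


def build_trie(service_names):
    """Insert each segmented service item name as one branch of the tree."""
    root = TrieNode()
    for name in service_names:
        node = root
        for word in name.split():
            node = node.children.setdefault(word, TrieNode())
        node.service = name
    return root


def match_services(root, query):
    """Return the service items whose name occurs as a run of words in the query."""
    tokens = query.split()
    matched = []
    for start in range(len(tokens)):
        node = root
        for word in tokens[start:]:
            node = node.children.get(word)
            if node is None:
                break
            if node.service is not None and node.service not in matched:
                matched.append(node.service)
    return matched


trie = build_trie(["refrigerator maintenance", "house decoration"])
queries = ["cheap refrigerator maintenance near me", "book a flight"]
pairs = [(q, s) for q in queries for s in match_services(trie, q)]
print(pairs)
```

The resulting query-service item pairs would then be screened by similarity before being used as target queries.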
After completing the target query determination process 610, the target queries may be first grouped based on the service items paired with the target queries, so as to divide the queries paired with the same service item into a group, and obtain a query group for each service item. The candidate key phrase determination flow 620 and trigger phrase determination flow 630 are then performed with each service item as the target business described previously.
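The grouping step can be illustrated with a few lines of code. The pair format and the sample data below are hypothetical.

```python
from collections import defaultdict


def group_by_service(pairs):
    """Group target queries by the service item they are paired with."""
    groups = defaultdict(list)
    for query, service in pairs:
        groups[service].append(query)
    return dict(groups)


pairs = [("fix my refrigerator", "refrigerator maintenance"),
         ("refrigerator maintenance price", "refrigerator maintenance"),
         ("modern house decoration", "house decoration")]
groups = group_by_service(pairs)
print(groups)
```

Each resulting group can then be processed independently, with its service item acting as the target service.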
In the candidate key phrase determining process 620, each query in the query group for each service item may be subjected to word segmentation and part of speech analysis, and a predetermined blacklist 621 may be generated according to a similarity statistical result between each word obtained by the word segmentation and the service item name, where the predetermined blacklist 621 includes a non-allowed part of speech 6211 and a non-allowed word 6212. The predetermined blacklist 621 is generated in a similar manner to the method described above. The key phrase 622 is obtained by extracting phrases from each target query in the set of queries for the target service using the phrase extraction algorithm described previously. After obtaining the predetermined blacklist 621 and the key phrase 622, words with parts of speech being non-allowed parts of speech 6211 in the key phrase 622 may be removed based on the predetermined blacklist 621, and the non-allowed words 6212 may be removed, to obtain candidate key phrases 623. Wherein the method of deriving candidate key-phrases 623 based on a predetermined blacklist is similar to the method described previously.
In the trigger phrase determination flow 630, operation S631 is first performed to determine whether the length of each phrase in the candidate key phrases 623 is less than or equal to a predetermined length. If the length is less than or equal to the predetermined length, the confusion 633 of the phrase is calculated directly. If the phrase is longer than the predetermined length, the words of the target part of speech may first be selected from the phrase and combined into a second phrase 632 corresponding to that phrase, the second phrase 632 having a length less than or equal to the predetermined length. The confusion 633 of the second phrase 632 is then calculated. After the confusion of every phrase whose length is less than or equal to the predetermined length has been obtained, the phrases 634 with low confusion (for example, phrases whose confusion is less than the confusion threshold described above) are selected from all the phrases. Finally, the trigger phrases 635 are obtained by deduplicating the low-confusion phrases 634 and screening them by similarity, where the similarity screening selects the phrases having a higher similarity to the service item name.
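The trigger phrase determination flow can be summarized end to end in the sketch below. The confusion and similarity scorers are passed in as callables and replaced by toy lookup tables here. All names, tags, scores, and thresholds are assumptions for illustration, and phrase length is measured in tokens.

```python
def select_trigger_phrases(tagged_candidates, target_pos, max_len,
                           perplexity, perplexity_threshold,
                           similarity, second_threshold):
    # Step 1: shorten over-long phrases to their target-part-of-speech words.
    shortened = []
    for tagged in tagged_candidates:
        if len(tagged) > max_len:
            tagged = [(w, p) for w, p in tagged if p in target_pos][:max_len]
        shortened.append(" ".join(w for w, _ in tagged))
    # Step 2: keep the phrases whose confusion is below the threshold.
    fluent = [p for p in shortened if perplexity(p) < perplexity_threshold]
    # Step 3: deduplicate while preserving order, then screen by similarity.
    unique = list(dict.fromkeys(fluent))
    return [p for p in unique if similarity(p) >= second_threshold]


candidates = [
    [("bad", "adjective"), ("refrigerator", "noun"),
     ("needs", "auxiliary"), ("maintenance", "noun")],
    [("refrigerator", "noun"), ("maintenance", "noun")],
    [("maintenance", "noun"), ("refrigerator", "noun")],
    [("fast", "adjective"), ("delivery", "noun")],
]
# Toy scores standing in for a language model and a similarity model.
toy_perplexity = {"refrigerator maintenance": 3.2,
                  "maintenance refrigerator": 9.6,
                  "fast delivery": 2.5}
toy_similarity = {"refrigerator maintenance": 1.0, "fast delivery": 0.1}

triggers = select_trigger_phrases(
    candidates, target_pos={"noun", "verb"}, max_len=3,
    perplexity=lambda p: toy_perplexity[p], perplexity_threshold=5.0,
    similarity=lambda p: toy_similarity[p], second_threshold=0.6,
)
print(triggers)
```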
Based on the above method for determining key information, the present disclosure further provides an apparatus for determining key information, which will be described in detail below with reference to fig. 7.
Fig. 7 is a block diagram of an apparatus for determining key information according to an embodiment of the present disclosure.
As shown in fig. 7, the apparatus 700 for determining key information may include a query information determination module 710, a phrase extraction module 720, and a key information determination module 730.
The query information determination module 710 is configured to determine query information related to a target service as target query information. In an embodiment, the query information determination module 710 may be configured to perform operation S210 described above, which will not be repeated here.
The phrase extraction module 720 is configured to extract candidate key phrases from the target query information. In an embodiment, the phrase extraction module 720 may be configured to perform operation S220 described above, which will not be repeated here.
The key information determination module 730 is configured to determine, based on the similarity between the candidate key phrases and the service name of the target service, a target phrase among the candidate key phrases as key information for the target service. In an embodiment, the key information determination module 730 may be configured to perform operation S230 described above, which will not be repeated here.
According to an embodiment of the present disclosure, the query information determination module 710 may include a matching sub-module and an information determination sub-module. The matching sub-module is configured to match historical query information against a predetermined dictionary tree to obtain candidate query information. The information determination sub-module is configured to determine, as the target query information, the candidate query information whose similarity to the service name of the target service is greater than or equal to a first threshold. The predetermined dictionary tree includes a plurality of branches, the plurality of branches include a branch for the target service, and the candidate query information matches the branch for the target service.
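The cooperation of the three modules can be pictured with the following structural sketch. The class name, the callable signatures, and the stub implementations are hypothetical and serve only to show how the output of one module feeds the next.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class KeyInfoApparatus:
    determine_queries: Callable[[str], List[str]]              # role of module 710
    extract_phrases: Callable[[List[str]], List[str]]          # role of module 720
    determine_key_info: Callable[[List[str], str], List[str]]  # role of module 730

    def run(self, service_name: str) -> List[str]:
        queries = self.determine_queries(service_name)
        candidates = self.extract_phrases(queries)
        return self.determine_key_info(candidates, service_name)


# Stub modules standing in for the real query, extraction, and screening logic.
apparatus = KeyInfoApparatus(
    determine_queries=lambda service: [f"cheap {service} near me"],
    extract_phrases=lambda queries: [
        q.replace("cheap ", "").replace(" near me", "") for q in queries
    ],
    determine_key_info=lambda phrases, service: [p for p in phrases if p == service],
)
key_info = apparatus.run("refrigerator maintenance")
print(key_info)
```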
The key information determination module 730 may include a confusion determination submodule and a target phrase determination submodule according to an embodiment of the present disclosure. The confusion determination submodule is used for determining the confusion of the candidate key phrases. The target phrase determination submodule is used for determining the target phrase based on the candidate key phrase with the confusion degree smaller than the confusion degree threshold value.
According to an embodiment of the present disclosure, the above-mentioned target phrase determination submodule is specifically configured to determine, as the target phrase, a key phrase whose similarity to the service name of the target service is greater than or equal to a second threshold from among the candidate key phrases whose confusion degree is smaller than the confusion threshold.
According to an embodiment of the present disclosure, the above-described confusion determination submodule may include a first phrase determination unit, a phrase extraction unit, and a confusion determination unit. The first phrase determination unit is configured to determine, among the candidate key phrases, a first phrase having a length greater than a predetermined length. The phrase extraction unit is configured to extract, from the first phrase and based on the target part of speech for the target service, a second phrase having a length less than or equal to the predetermined length, and to replace the first phrase with the second phrase. The confusion determination unit is configured to determine the confusion degree of the second phrase.
According to an embodiment of the present disclosure, the target query information is plural, and the phrase extraction module 720 may include a phrase extraction sub-module and a candidate phrase determination sub-module. The phrase extraction sub-module is used for extracting the key phrases of each of the plurality of target query information to obtain a plurality of key phrases. The candidate phrase determination submodule is used for determining candidate key phrases in the plurality of key phrases based on a predetermined blacklist. Wherein the predetermined blacklist is generated based on the plurality of target query information.
According to an embodiment of the present disclosure, the predetermined blacklist includes non-allowed parts of speech. The candidate phrase determination submodule includes a part-of-speech determination unit, a word removal unit, and a second phrase determination unit. The part-of-speech determination unit is configured to determine, for each phrase of the plurality of key phrases, the part of speech of each word in the phrase. The word removal unit is configured to remove, from each phrase, the words whose parts of speech are non-allowed parts of speech to obtain a third phrase. The second phrase determination unit is configured to determine the candidate key phrases based on the similarity between the third phrase and the service name of the target service.
The apparatus 700 for determining key information may further include a part-of-speech generation module for generating a non-allowed part-of-speech based on the plurality of target query information, according to an embodiment of the present disclosure. The part-of-speech generation module may include a part-of-speech statistics sub-module, a similarity determination sub-module, and a first part-of-speech determination sub-module. The part-of-speech statistics sub-module is used for counting the parts of speech of words in the plurality of target query information, and obtaining a vocabulary aiming at each part of speech in the plurality of parts of speech. The similarity determination submodule is used for determining average similarity between each word in the word list for each part of speech and the service name of the target service as the average similarity for each part of speech. The first part-of-speech determination submodule is used for determining that the part of speech for which the average similarity is smaller than a third threshold is a non-allowed part of speech.
The predetermined blacklist may further include non-permitted words according to an embodiment of the present disclosure. The second phrase determining unit may include a phrase determining subunit and a word rejecting subunit. The phrase determining subunit is configured to determine a third phrase with a similarity with the service name of the target service being greater than or equal to a fourth threshold value, so as to obtain a fourth phrase. The word eliminating subunit is used for eliminating non-allowed words from the fourth phrase to obtain candidate key phrases, wherein the non-allowed words comprise words belonging to non-allowed parts of speech.
The apparatus 700 for determining key information may further include a word generation module for generating non-allowed words based on the plurality of target query information according to an embodiment of the present disclosure. The word generation module may include a second part-of-speech determination sub-module and a word determination sub-module. The second part-of-speech determining sub-module is used for determining a target part of speech in the non-allowed part of speech, and the similarity between each word in the vocabulary aiming at the target part of speech and the service name of the target service is smaller than a fifth threshold value. The word determination submodule is used for determining that words in the word list aiming at the target part of speech are non-allowed words.
According to an embodiment of the present disclosure, the apparatus 700 for determining key information may further include a dictionary tree generation module for generating a predetermined dictionary tree based on service names of a plurality of services. The dictionary tree generation module may include a phrase obtaining sub-module and a dictionary tree generation sub-module. The phrase obtaining sub-module is used for dividing service names of each service in the plurality of services by taking words as units to obtain phrases aiming at each service. The dictionary tree generation submodule is used for generating branches for each service according to the phrase for each service to obtain a plurality of branches so as to form a preset dictionary tree.
It should be noted that, in the technical solution of the present disclosure, the acquisition, storage, and application of users' personal information all comply with the relevant laws and regulations and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 8 illustrates a schematic block diagram of an electronic device 800 that may be used to implement the method of determining key information of embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. Various programs and data required for the operation of the device 800 can also be stored in the RAM 803. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Various components in device 800 are connected to I/O interface 805, including: an input unit 806 such as a keyboard, mouse, etc.; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, etc.; and a communication unit 809, such as a network card, modem, wireless communication transceiver, or the like. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 801 performs the methods and processes described above, for example the method of determining key information. For example, in some embodiments, the method of determining key information may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the method of determining key information described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the method of determining key information by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out the methods of the present disclosure may be written in any combination of one or more programming languages. Such program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also referred to as a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the difficult management and weak service scalability of traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system or a server that incorporates a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, which is not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.
Claims (21)
1. A method of determining key information, comprising:
determining query information related to a target service as target query information;
extracting candidate key phrases from the target query information; and
determining target phrases in the candidate key phrases based on the similarity between the candidate key phrases and the service name of the target service, wherein the target phrases are used as key information for the target service;
wherein there are a plurality of pieces of target query information; the extracting candidate key phrases from the target query information comprises:
extracting key phrases from each of the plurality of pieces of target query information to obtain a plurality of key phrases; and
determining candidate key phrases among the plurality of key phrases based on a predetermined blacklist,
wherein the predetermined blacklist is generated based on the plurality of pieces of target query information; the predetermined blacklist includes non-allowed parts of speech; the method further includes generating the non-allowed parts of speech based on the plurality of pieces of target query information by:
counting the parts of speech of words in the plurality of pieces of target query information to obtain a vocabulary for each part of speech of a plurality of parts of speech;
determining the average similarity between the words in the vocabulary for each part of speech and the service name of the target service as the average similarity for that part of speech; and
determining a part of speech for which the average similarity is smaller than a third threshold as a non-allowed part of speech.
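The generation of non-allowed parts of speech recited in claim 1 can be sketched as follows. This is only an illustrative sketch, not the claimed implementation: the claims do not fix a similarity measure, so the character-level Jaccard function `char_jaccard`, the function name `non_allowed_parts_of_speech`, and the assumption that queries arrive already segmented and tagged as `(word, part_of_speech)` pairs are all hypothetical choices.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set, Tuple

TaggedQuery = List[Tuple[str, str]]  # (word, part-of-speech) pairs; tagging is assumed done upstream


def char_jaccard(a: str, b: str) -> float:
    """Toy similarity (an assumption): Jaccard overlap of the two character sets."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def non_allowed_parts_of_speech(
    tagged_queries: Iterable[TaggedQuery],
    service_name: str,
    third_threshold: float,
    similarity: Callable[[str, str], float] = char_jaccard,
) -> Set[str]:
    # Step 1: collect one vocabulary (set of distinct words) per part of speech.
    vocab: Dict[str, Set[str]] = defaultdict(set)
    for query in tagged_queries:
        for word, pos in query:
            vocab[pos].add(word)

    # Steps 2 and 3: average the word-to-service-name similarity per part of
    # speech and block every part of speech whose average is below the threshold.
    blocked: Set[str] = set()
    for pos, words in vocab.items():
        average = sum(similarity(w, service_name) for w in words) / len(words)
        if average < third_threshold:
            blocked.add(pos)
    return blocked


queries = [
    [("how", "adv"), ("repair", "verb"), ("phone", "noun")],
    [("please", "adv"), ("phone", "noun"), ("repair", "noun")],
]
blocked = non_allowed_parts_of_speech(queries, "phone repair", third_threshold=0.4)
```

With this toy data only the adverbs fall below the threshold, so `blocked` contains just `"adv"`. In practice the similarity function would be swapped for whatever measure the deployment uses.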
2. The method of claim 1, wherein determining query information related to a target service comprises:
matching historical query information against a predetermined dictionary tree to obtain candidate query information; and
determining candidate query information whose similarity with the service name of the target service is greater than or equal to a first threshold as the target query information,
wherein the predetermined dictionary tree comprises a plurality of branches including a branch for the target service, and the candidate query information matches the branch for the target service.
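The two-stage selection in claim 2 (branch match, then similarity threshold) might look like the sketch below. The dictionary-tree lookup is abstracted behind a caller-supplied `match_branch` function, and `word_jaccard` is a stand-in similarity; both names, and the lambda used in the example, are assumptions made purely for illustration.

```python
from typing import Callable, Iterable, List, Optional


def word_jaccard(a: str, b: str) -> float:
    """Toy similarity (an assumption): Jaccard overlap of whitespace-separated words."""
    sa, sb = set(a.split()), set(b.split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def select_target_queries(
    historical_queries: Iterable[str],
    match_branch: Callable[[str], Optional[str]],
    target_service: str,
    first_threshold: float,
    similarity: Callable[[str, str], float] = word_jaccard,
) -> List[str]:
    # Stage 1: keep queries that the dictionary tree maps to the target service's branch.
    candidates = [q for q in historical_queries if match_branch(q) == target_service]
    # Stage 2: keep candidates whose similarity to the service name reaches the threshold.
    return [q for q in candidates if similarity(q, target_service) >= first_threshold]


# Hypothetical stand-in for the dictionary-tree lookup.
lookup = lambda q: "phone repair" if "phone" in q.split() else None
history = ["phone repair near me", "phone case colors", "car wash"]
targets = select_target_queries(history, lookup, "phone repair", first_threshold=0.4)
```

Here `"phone case colors"` passes the branch match but is dropped by the threshold, which illustrates why the claim applies both filters.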
3. The method of claim 1 or 2, wherein determining a target phrase of the candidate key phrases comprises:
determining a degree of confusion (perplexity) of each candidate key phrase; and
determining the target phrase based on the candidate key phrases whose degree of confusion is smaller than a confusion threshold.
4. The method of claim 3, wherein determining the target phrase comprises:
determining, from the candidate key phrases whose degree of confusion is smaller than the confusion threshold, a key phrase whose similarity with the service name of the target service is greater than or equal to a second threshold as the target phrase.
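Claims 3 and 4 together amount to a two-step filter, sketched below. The sketch reads "degree of confusion" as language-model perplexity and, as a loud simplification, scores phrases with an add-one-smoothed unigram model; the claims do not say which language model is used. The names `unigram_perplexity`, `select_target_phrases`, and `word_jaccard` are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List


def word_jaccard(a: str, b: str) -> float:
    """Toy similarity (an assumption): Jaccard overlap of whitespace-separated words."""
    sa, sb = set(a.split()), set(b.split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def unigram_perplexity(tokens: List[str], counts: Counter, vocab_size: int) -> float:
    """Perplexity under an add-one-smoothed unigram model (a stand-in language model)."""
    if not tokens:
        return float("inf")
    total = sum(counts.values())
    log_prob = sum(math.log((counts[t] + 1) / (total + vocab_size)) for t in tokens)
    return math.exp(-log_prob / len(tokens))


def select_target_phrases(
    candidates: Iterable[str],
    counts: Counter,
    vocab_size: int,
    service_name: str,
    confusion_threshold: float,
    second_threshold: float,
) -> List[str]:
    # Claim 3: keep phrases whose perplexity is below the confusion threshold.
    fluent = [
        p for p in candidates
        if unigram_perplexity(p.split(), counts, vocab_size) < confusion_threshold
    ]
    # Claim 4: of those, keep phrases similar enough to the service name.
    return [p for p in fluent if word_jaccard(p, service_name) >= second_threshold]


counts = Counter(["screen", "repair", "screen", "repair", "price"])
vocab_size = len(counts) + 1  # one extra slot reserved for unseen words
result = select_target_phrases(
    ["screen repair", "zzz qqq", "screen price"],
    counts, vocab_size, "phone repair",
    confusion_threshold=5.0, second_threshold=0.3,
)
```

In this example `"zzz qqq"` is rejected for high perplexity and `"screen price"` for low similarity, leaving `"screen repair"`.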
5. The method of claim 3, wherein determining the degree of confusion of the candidate key phrase comprises:
determining a first phrase with a length greater than a predetermined length in the candidate key phrases;
extracting a second phrase with a length less than or equal to the predetermined length from the first phrase based on a target part of speech for the target service, and replacing the first phrase with the second phrase; and
determining a degree of confusion for the second phrase.
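The shortening step of claim 5 can be illustrated with the sketch below. The claim does not say how the second phrase is extracted beyond being "based on a target part of speech", so this sketch assumes that length is counted in words, that words of the target parts of speech are kept in order, and that any excess is truncated. The function name `shorten_phrase` is hypothetical.

```python
from typing import List, Set, Tuple


def shorten_phrase(
    tagged_phrase: List[Tuple[str, str]],
    target_pos: Set[str],
    max_len: int,
) -> List[str]:
    """Return the phrase unchanged if short enough; otherwise keep only words
    whose part of speech is in target_pos, truncated to max_len words."""
    if len(tagged_phrase) <= max_len:
        return [word for word, _ in tagged_phrase]
    kept = [word for word, pos in tagged_phrase if pos in target_pos]
    return kept[:max_len]  # truncation is an assumption, not stated in the claim


long_phrase = [
    ("how", "adv"), ("to", "prep"), ("repair", "verb"),
    ("cracked", "adj"), ("phone", "noun"), ("screen", "noun"),
]
second_phrase = shorten_phrase(long_phrase, {"verb", "noun"}, max_len=3)
```

The resulting second phrase would then be scored for its degree of confusion in place of the original first phrase.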
6. The method of claim 1 or 2, wherein the determining candidate key phrases among the plurality of key phrases comprises:
determining, for each phrase of the plurality of key phrases, the part of speech of each word in the phrase;
removing, from each phrase, words whose part of speech is a non-allowed part of speech to obtain a third phrase; and
determining the candidate key phrases based on the similarity between the third phrase and the service name of the target service.
7. The method of claim 6, wherein the predetermined blacklist further includes non-allowed words; determining the candidate key phrases includes:
determining a third phrase whose similarity with the service name of the target service is greater than or equal to a fourth threshold to obtain a fourth phrase; and
removing the non-allowed words from the fourth phrase to obtain the candidate key phrase,
wherein the non-allowed words include words belonging to the non-allowed parts of speech.
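The blacklist filtering of claims 6 and 7 can be read as a three-step pipeline, sketched below under stated assumptions: phrases arrive as lists of `(word, part_of_speech)` pairs, similarity is the toy `word_jaccard` function, and phrases that end up empty are dropped. None of these details, nor the name `filter_with_blacklist`, come from the claims.

```python
from typing import Iterable, List, Set, Tuple


def word_jaccard(a: str, b: str) -> float:
    """Toy similarity (an assumption): Jaccard overlap of whitespace-separated words."""
    sa, sb = set(a.split()), set(b.split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def filter_with_blacklist(
    tagged_phrases: Iterable[List[Tuple[str, str]]],
    blocked_pos: Set[str],
    blocked_words: Set[str],
    service_name: str,
    fourth_threshold: float,
) -> List[str]:
    candidates: List[str] = []
    for phrase in tagged_phrases:
        # Third phrase: drop words whose part of speech is non-allowed.
        third = [word for word, pos in phrase if pos not in blocked_pos]
        # Fourth phrase: keep only third phrases similar enough to the service name.
        if not third or word_jaccard(" ".join(third), service_name) < fourth_threshold:
            continue
        # Candidate key phrase: drop individually non-allowed words.
        candidate = [word for word in third if word not in blocked_words]
        if candidate:
            candidates.append(" ".join(candidate))
    return candidates


phrases = [
    [("please", "adv"), ("phone", "noun"), ("repair", "noun"), ("cost", "noun")],
    [("very", "adv"), ("nice", "adj"), ("weather", "noun")],
]
kept = filter_with_blacklist(phrases, {"adv"}, {"cost"}, "phone repair", fourth_threshold=0.5)
```

The first phrase survives as `"phone repair"`; the second shares no words with the service name and is discarded at the threshold step.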
8. The method of claim 7, further comprising generating the non-allowed words based on the plurality of target query information by:
determining a target part of speech among the non-allowed parts of speech, wherein the similarity between each word in the vocabulary for the target part of speech and the service name of the target service is smaller than a fifth threshold; and
determining the words in the vocabulary for the target part of speech as the non-allowed words.
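A minimal sketch of the non-allowed word generation in claim 8 follows. It assumes the per-part-of-speech vocabularies have already been built, and it reuses a toy character-level Jaccard similarity; `char_jaccard` and `non_allowed_words` are hypothetical names, and the similarity measure is an assumption.

```python
from typing import Callable, Dict, Set


def char_jaccard(a: str, b: str) -> float:
    """Toy similarity (an assumption): Jaccard overlap of the two character sets."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def non_allowed_words(
    vocab_by_pos: Dict[str, Set[str]],
    blocked_pos: Set[str],
    service_name: str,
    fifth_threshold: float,
    similarity: Callable[[str, str], float] = char_jaccard,
) -> Set[str]:
    words: Set[str] = set()
    for pos in blocked_pos:
        vocab = vocab_by_pos.get(pos, set())
        # A target part of speech is one where EVERY word falls below the threshold.
        if vocab and all(similarity(w, service_name) < fifth_threshold for w in vocab):
            words |= vocab
    return words


vocab_by_pos = {"adv": {"how", "please"}, "prep": {"near", "in"}}
banned = non_allowed_words(vocab_by_pos, {"adv", "prep"}, "phone repair", fifth_threshold=0.3)
```

In this example the preposition vocabulary is spared because one of its words (`"near"`) is above the threshold, so only the adverbs become non-allowed words.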
9. The method of claim 2, further comprising generating the predetermined dictionary tree based on service names of a plurality of services by:
dividing the service name of each service of the plurality of services into words to obtain a phrase for each service; and
generating a branch for each service according to the phrase for that service, the branches together forming the predetermined dictionary tree.
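The dictionary tree (trie) generation of claim 9 can be sketched with nested dictionaries, one level per word of the segmented service name. The whitespace segmenter, the sentinel key `_END`, and the matching rule in `match_services` (a branch matches when its word sequence appears contiguously in the query) are assumptions for illustration; a Chinese-language deployment would substitute a real word segmenter.

```python
from typing import Any, Callable, Dict, Iterable, List, Set

_END = object()  # sentinel key marking the end of a branch; can never collide with a word


def build_trie(
    service_names: Iterable[str],
    segment: Callable[[str], List[str]] = str.split,
) -> Dict[Any, Any]:
    root: Dict[Any, Any] = {}
    for name in service_names:
        node = root
        # One trie level per word of the segmented service name.
        for word in segment(name):
            node = node.setdefault(word, {})
        node[_END] = name  # record which service this branch belongs to
    return root


def match_services(trie: Dict[Any, Any], query_tokens: List[str]) -> Set[str]:
    """Return services whose full branch appears as a contiguous run in the query."""
    matched: Set[str] = set()
    for start in range(len(query_tokens)):
        node = trie
        for token in query_tokens[start:]:
            if token not in node:
                break
            node = node[token]
            if _END in node:
                matched.add(node[_END])
    return matched


trie = build_trie(["phone repair", "phone recycling", "car wash"])
hits = match_services(trie, "cheap phone repair near me".split())
```

Services that share a leading word, such as the two "phone" services here, share the first level of the tree and diverge at the second.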
10. An apparatus for determining key information, comprising:
a query information determining module, configured to determine query information related to a target service as target query information;
a phrase extraction module, configured to extract candidate key phrases from the target query information; and
a key information determining module, configured to determine, based on the similarity between the candidate key phrases and the service name of the target service, target phrases in the candidate key phrases as key information for the target service;
wherein there are a plurality of pieces of target query information; the phrase extraction module comprises:
a phrase extraction sub-module, configured to extract key phrases from each of the plurality of pieces of target query information to obtain a plurality of key phrases; and
a candidate phrase determination sub-module, configured to determine candidate key phrases among the plurality of key phrases based on a predetermined blacklist,
wherein the predetermined blacklist is generated based on the plurality of pieces of target query information; the predetermined blacklist includes non-allowed parts of speech; the apparatus further includes a part-of-speech generation module configured to generate the non-allowed parts of speech based on the plurality of pieces of target query information; the part-of-speech generation module comprises:
a part-of-speech statistics sub-module, configured to count the parts of speech of words in the plurality of pieces of target query information to obtain a vocabulary for each part of speech of a plurality of parts of speech;
a similarity determination sub-module, configured to determine the average similarity between the words in the vocabulary for each part of speech and the service name of the target service as the average similarity for that part of speech; and
a first part-of-speech determining sub-module, configured to determine a part of speech for which the average similarity is smaller than a third threshold as a non-allowed part of speech.
11. The apparatus of claim 10, wherein the query information determining module comprises:
a matching sub-module, configured to match historical query information against a predetermined dictionary tree to obtain candidate query information; and
an information determination sub-module, configured to determine candidate query information whose similarity with the service name of the target service is greater than or equal to a first threshold as the target query information,
wherein the predetermined dictionary tree comprises a plurality of branches including a branch for the target service, and the candidate query information matches the branch for the target service.
12. The apparatus of claim 10 or 11, wherein the key information determining module comprises:
a confusion determining sub-module, configured to determine a degree of confusion (perplexity) of each candidate key phrase; and
a target phrase determining sub-module, configured to determine the target phrase based on the candidate key phrases whose degree of confusion is smaller than a confusion threshold.
13. The apparatus of claim 12, wherein the target phrase determining sub-module is specifically configured to:
determine, from the candidate key phrases whose degree of confusion is smaller than the confusion threshold, a key phrase whose similarity with the service name of the target service is greater than or equal to a second threshold as the target phrase.
14. The apparatus of claim 12, wherein the confusion determining sub-module comprises:
a first phrase determining unit, configured to determine a first phrase with a length greater than a predetermined length from the candidate key phrases;
a phrase extraction unit, configured to extract a second phrase having a length less than or equal to the predetermined length from the first phrase based on a target part of speech for the target service, and replace the first phrase with the second phrase; and
a confusion determining unit, configured to determine a degree of confusion of the second phrase.
15. The apparatus of claim 10 or 11, wherein the phrase extraction submodule comprises:
a part-of-speech determining unit, configured to determine, for each phrase of the plurality of key phrases, the part of speech of each word in the phrase;
a word eliminating unit, configured to remove, from each phrase, words whose part of speech is a non-allowed part of speech to obtain a third phrase; and
a second phrase determining unit, configured to determine the candidate key phrases based on the similarity between the third phrase and the service name of the target service.
16. The apparatus of claim 15, wherein the predetermined blacklist further includes non-allowed words; the second phrase determining unit includes:
a phrase determining subunit, configured to determine a third phrase whose similarity with the service name of the target service is greater than or equal to a fourth threshold to obtain a fourth phrase; and
a word eliminating subunit, configured to remove the non-allowed words from the fourth phrase to obtain the candidate key phrase,
wherein the non-allowed words include words belonging to the non-allowed parts of speech.
17. The apparatus of claim 16, further comprising a word generation module configured to generate the non-allowed words based on the plurality of target query information; the word generation module includes:
a second part-of-speech determining sub-module, configured to determine a target part of speech among the non-allowed parts of speech, wherein the similarity between each word in the vocabulary for the target part of speech and the service name of the target service is smaller than a fifth threshold; and
a word determining sub-module, configured to determine the words in the vocabulary for the target part of speech as the non-allowed words.
18. The apparatus of claim 11, further comprising a dictionary tree generation module configured to generate the predetermined dictionary tree based on service names of a plurality of services; the dictionary tree generation module comprises:
a phrase obtaining sub-module, configured to divide the service name of each service of the plurality of services into words to obtain a phrase for each service; and
a dictionary tree generation sub-module, configured to generate a branch for each service according to the phrase for that service, the branches together forming the predetermined dictionary tree.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
20. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-9.
21. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110520029.2A CN113220838B (en) | 2021-05-12 | 2021-05-12 | Method, device, electronic equipment and storage medium for determining key information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110520029.2A CN113220838B (en) | 2021-05-12 | 2021-05-12 | Method, device, electronic equipment and storage medium for determining key information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113220838A (en) | 2021-08-06 |
CN113220838B (en) | 2024-09-17 |
Family
ID=77095275
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110520029.2A Active CN113220838B (en) | 2021-05-12 | 2021-05-12 | Method, device, electronic equipment and storage medium for determining key information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113220838B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114691829B (en) * | 2022-03-28 | 2025-05-13 | 北京捷通华声科技股份有限公司 | Information query method and device |
CN114996494A (en) * | 2022-06-24 | 2022-09-02 | 广东电网有限责任公司 | Image processing method, device, electronic device and storage medium |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112632292A (en) * | 2020-12-23 | 2021-04-09 | 深圳壹账通智能科技有限公司 | Method, device and equipment for extracting service keywords and storage medium |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7627559B2 (en) * | 2005-12-15 | 2009-12-01 | Microsoft Corporation | Context-based key phrase discovery and similarity measurement utilizing search engine query logs |
CN105447020B (en) * | 2014-08-22 | 2018-11-27 | 阿里巴巴集团控股有限公司 | A kind of method and device of determining business object keyword |
CN105701253B (en) * | 2016-03-04 | 2019-03-26 | 南京大学 | The knowledge base automatic question-answering method of Chinese natural language question semanteme |
CN108121737B (en) * | 2016-11-29 | 2022-04-26 | 阿里巴巴集团控股有限公司 | Method, device and system for generating business object attribute identifier |
CN112115227B (en) * | 2020-08-14 | 2024-05-24 | 咪咕文化科技有限公司 | Data query method and device, electronic equipment and storage medium |
CN112328762B (en) * | 2020-11-04 | 2023-12-19 | 平安科技(深圳)有限公司 | Question-answer corpus generation method and device based on text generation model |
Also Published As
Publication number | Publication date |
---|---|
CN113220838A (en) | 2021-08-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||