CN117009470A - Entity and intention recognition method, device, equipment, storage medium and product - Google Patents
Info
- Publication number
- CN117009470A (application number CN202211167620.5A)
- Authority
- CN
- China
- Prior art keywords
- words
- word
- vector
- entity
- hidden
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3334—Selection or weighting of terms from queries, including natural language queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Mathematical Physics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Human Computer Interaction (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Machine Translation (AREA)
Abstract
The application provides an entity and intention recognition method, apparatus, device, storage medium, and product, belonging to the technical field of artificial intelligence. The method comprises the following steps: determining a plurality of words in a text based on a first vocabulary and a second vocabulary, the first vocabulary comprising a plurality of non-rare words, and the second vocabulary comprising a plurality of rare words and the entity categories to which the rare words belong; acquiring feature vectors of the plurality of words, where the feature vector of a rare word among the plurality of words is represented based on the feature vector of the entity category to which the rare word belongs; acquiring feature vectors of a plurality of characters in the text; obtaining a hidden vector sequence of the text based on the feature vectors of the words and the feature vectors of the characters, the hidden vector sequence comprising a hidden vector sequence of the characters and a hidden vector sequence of the words; performing entity recognition on the text based on the hidden vector sequence of the characters to obtain an entity recognition result; and performing intention recognition on the text based on the hidden vector sequence to obtain an intention recognition result. The method improves the accuracy of both entity recognition and intention recognition.
Description
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a device, a storage medium, and a product for identifying an entity and an intention.
Background
In intelligent question-answering systems, entity recognition and intention recognition are essential for answering questions correctly. Entity recognition and intention recognition are typically implemented with recognition models. However, because the range of entity words is wide, it is difficult for a recognition model's training data to cover all entity words, and the model is generally trained only on common entity words. As a result, when an entity word that the model was not trained on appears in the text to be recognized, recognition errors occur, making the entity recognition result and the intention recognition result inaccurate.
Disclosure of Invention
The embodiment of the application provides a method, a device, equipment, a storage medium and a product for identifying an entity and an intention, which can improve the accuracy of entity identification and intention identification. The technical scheme is as follows:
in one aspect, a method of identifying an entity and an intention is provided, the method comprising:
determining a plurality of words in a text based on a first vocabulary and a second vocabulary, the first vocabulary comprising a plurality of non-rare words, the second vocabulary comprising a plurality of rare words and the entity categories to which the plurality of rare words belong;
acquiring feature vectors of the plurality of words, wherein the feature vector of a rare word among the plurality of words is represented based on the feature vector of the entity category to which the rare word belongs;
acquiring feature vectors of a plurality of characters in the text;
obtaining a hidden vector sequence of the text based on the feature vectors of the plurality of words and the feature vectors of the plurality of characters, wherein the hidden vector sequence comprises a hidden vector sequence of the characters and a hidden vector sequence of the words, the hidden vector sequence of the characters comprises the hidden vectors of the plurality of characters, and the hidden vector sequence of the words comprises the hidden vectors of the plurality of words;
performing entity recognition on the text based on the hidden vector sequence of the characters in the hidden vector sequence to obtain an entity recognition result; and
performing intention recognition on the text based on the hidden vector sequence to obtain an intention recognition result.
In another aspect, an entity and intention recognition apparatus is provided, the apparatus comprising:
a word determining module, configured to determine a plurality of words in a text based on a first vocabulary and a second vocabulary, where the first vocabulary includes a plurality of non-rare words, and the second vocabulary includes a plurality of rare words and the entity categories to which the plurality of rare words belong;
a feature vector acquisition module, configured to acquire feature vectors of the plurality of words, where the feature vector of a rare word among the plurality of words is represented based on the feature vector of the entity category to which the rare word belongs;
the feature vector acquisition module being further configured to acquire feature vectors of a plurality of characters in the text;
a hidden vector sequence determining module, configured to obtain a hidden vector sequence of the text based on the feature vectors of the plurality of words and the feature vectors of the plurality of characters, where the hidden vector sequence includes a hidden vector sequence of the characters and a hidden vector sequence of the words, the hidden vector sequence of the characters includes the hidden vectors of the plurality of characters, and the hidden vector sequence of the words includes the hidden vectors of the plurality of words;
an entity recognition module, configured to perform entity recognition on the text based on the hidden vector sequence of the characters in the hidden vector sequence to obtain an entity recognition result; and
an intention recognition module, configured to perform intention recognition on the text based on the hidden vector sequence to obtain an intention recognition result.
In some embodiments, the entity recognition result and the intention recognition result are obtained based on a target recognition model, the target recognition model including an input module, an encoding module, an entity recognition module, and an intention recognition module;
the feature vector acquisition module is configured to input the text into the input module of the target recognition model and obtain the feature vectors of the plurality of words through the input module;
the hidden vector sequence determining module is configured to input the feature vectors of the plurality of words and the feature vectors of the plurality of characters into the encoding module, and to extract attention features from these feature vectors through the encoding module to obtain the hidden vector sequence;
the entity recognition module of the apparatus is configured to input the hidden vector sequence of the characters into the entity recognition module of the model and obtain the entity recognition result through it; and
the intention recognition module of the apparatus is configured to input the hidden vector sequence into the intention recognition module of the model and obtain the intention recognition result through it.
In some embodiments, the feature vector acquisition module is configured to:
input the feature vectors of the plurality of words and the feature vectors of the plurality of characters into the encoding module;
perform a linear transformation on a first feature matrix through the encoding module to obtain a second feature matrix, where each row vector in the first feature matrix is the feature vector of one word or one character;
obtain a weight matrix based on the plurality of row vectors in the first feature matrix, where each element in the weight matrix represents the degree of association between two row vectors in the first feature matrix; and
perform activation processing on the weight matrix, and obtain the hidden vector sequence based on the activated weight matrix and the second feature matrix.
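The steps above describe a self-attention pass over the feature matrix. A minimal sketch follows; the query/key/value projection matrices `Wq`, `Wk`, `Wv` and the softmax activation are assumptions, since the patent only specifies a linear transformation, a pairwise association weight matrix, and an activation step.

```python
import numpy as np

def encode(X, Wq, Wk, Wv):
    """X is the first feature matrix: one row per word/character feature vector.
    Returns the hidden vector sequence, one hidden vector per input row."""
    V = X @ Wv                                   # linear transform -> second feature matrix
    scores = (X @ Wq) @ (X @ Wk).T               # association degree between row-vector pairs
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # activation (softmax)
    return weights @ V                           # weighted combination -> hidden vectors
```

Each output row is a convex combination of the transformed input rows, so every hidden vector mixes information from the whole word/character sequence.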
In some embodiments, the weight matrix includes a plurality of elements, and the feature vector acquisition module is configured to:
acquire a first position and a second position, where the first position is the position in the text of the character or word indicated by the i-th row vector in the first feature matrix, the second position is the position in the text of the character or word indicated by the j-th row vector in the first feature matrix, and i and j are integers greater than 0;
determine a position vector based on the position difference between the first position and the second position; and
fuse the i-th row vector, the j-th row vector, and the position vector to obtain the element located in the i-th row and j-th column of the weight matrix.
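One way to realize this element-level fusion is sketched below; the dot-product fusion and the learned relative-position lookup table are assumptions, since the patent only states that the two row vectors and a position vector are fused.

```python
import numpy as np

def weight_element(row_i, row_j, pos_i, pos_j, position_table):
    """Compute element (i, j) of the weight matrix from two row vectors
    and a position vector indexed by their (clipped) position difference."""
    half = len(position_table) // 2
    diff = max(-half, min(half - 1, pos_i - pos_j))   # clip difference to table range
    position_vec = position_table[diff + half]        # position vector lookup
    # Fuse the two row vectors and the position vector; a sum of dot
    # products is one plausible fusion (assumption).
    return float(row_i @ row_j + row_i @ position_vec)
```

Making the score depend on the position difference (rather than absolute positions) lets the same table entry serve every pair of tokens at the same distance.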
In some embodiments, the entity recognition result is a target tag sequence of the text, where the target tag sequence includes category tags of the plurality of characters, and the category tag of each character represents the entity category of the target word in which the character is located and the position of the character within that target word;
the entity recognition module is configured to input the hidden vector sequence of the characters into the entity recognition module of the model;
determine, through the entity recognition module, for any character of the plurality of characters, degree parameters for labeling the character with each of a plurality of preset category labels, based on the hidden vector of the character and the preset category label of a character having a preset position difference from it, where the degree parameter of each preset category label indicates how appropriate it is to label the character with that preset category label;
determine probabilities of a plurality of candidate tag sequences based on the degree parameters of the plurality of characters being labeled with the plurality of preset category labels, where the plurality of candidate tag sequences are obtained by combining the plurality of preset category labels; and
output the candidate tag sequence with the highest probability among the plurality of candidate tag sequences as the target tag sequence.
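Selecting the highest-scoring candidate tag sequence without enumerating every label combination is typically done with Viterbi decoding, as in a CRF layer; that algorithm choice is an assumption here. A compact sketch with hypothetical per-character emission scores and tag-transition scores:

```python
def viterbi(emissions, transitions):
    """emissions[t][j]: degree parameter for labeling character t with tag j.
    transitions[i][j]: score for tag i immediately followed by tag j.
    Returns the highest-scoring tag sequence (the target tag sequence)."""
    n, k = len(emissions), len(emissions[0])
    score = list(emissions[0])                 # best score ending in each tag so far
    backpointers = []
    for t in range(1, n):
        new_score, ptr = [], []
        for j in range(k):
            best_i = max(range(k), key=lambda i: score[i] + transitions[i][j])
            ptr.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
        score = new_score
        backpointers.append(ptr)
    best = max(range(k), key=lambda j: score[j])
    path = [best]
    for ptr in reversed(backpointers):         # trace the best path backwards
        path.append(ptr[path[-1]])
    return path[::-1]
```

The transition scores are what encode constraints between labels of characters at a preset position difference, e.g. that an "inside" tag should follow a "begin" tag of the same entity category.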
In some embodiments, the intention recognition result includes probabilities of a plurality of intention categories, and the intention recognition module is configured to:
input the hidden vector sequence into the intention recognition module of the model;
determine a plurality of target preset category labels from a plurality of preset category labels through the intention recognition module, where any preset category label indicates the entity category of the target word in which a labeled character is located and the position of the character within that word, and a target preset category label indicates that the labeled character is located at the head of its target word;
for each target preset category label among the plurality of target preset category labels, determine the probabilities that the plurality of characters are respectively labeled with that target preset category label, based on the hidden vectors of the plurality of characters;
determine the probability that the target preset category label appears in the text based on the probabilities that the plurality of characters are respectively labeled with it;
determine a first probability vector based on the probabilities that the plurality of target preset category labels appear in the text;
extract attention features from the hidden vector sequence to obtain an attention vector; and
fuse the first probability vector and the attention vector to obtain a second probability vector, where the second probability vector includes the probabilities of the plurality of intention categories.
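A minimal sketch of this final fusion step; concatenating the two vectors before a softmax classifier is one plausible fusion (the patent does not fix the fusion operation), and `W` and `b` are assumed classifier parameters.

```python
import numpy as np

def intent_probabilities(first_prob_vec, attention_vec, W, b):
    """Fuse the entity-derived probability vector with the attention vector
    and map the result to probabilities over intention categories."""
    fused = np.concatenate([first_prob_vec, attention_vec])  # concatenation fusion
    logits = W @ fused + b
    exp = np.exp(logits - logits.max())                      # stable softmax
    return exp / exp.sum()                                   # second probability vector
```

The design point is that the intent classifier sees both what entities the text likely contains (the first probability vector) and a pooled representation of the whole text (the attention vector).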
In some embodiments, the intention recognition module is configured to determine, for each character of the plurality of characters, the probability that the character is labeled with the target preset category label based on the hidden vector of the character and target model parameters, the target model parameters being model parameters obtained from the entity recognition module that are used to determine the probability that a character is labeled with any preset category label.
In some embodiments, the intention recognition module is configured to:
perform a linear transformation on a first hidden vector matrix to obtain a second hidden vector matrix, where each row vector in the first hidden vector matrix is one hidden vector in the hidden vector sequence;
obtain a weight vector based on the plurality of row vectors in the first hidden vector matrix, where each element in the weight vector represents the importance of one row vector in the first hidden vector matrix; and
perform activation processing on the weight vector, and obtain the attention vector based on the activated weight vector and the second hidden vector matrix.
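These three steps amount to softmax-weighted attention pooling over the hidden vectors. A sketch under that reading; the importance-scoring vector `w` and the value projection `Wv` are assumed parameter names.

```python
import numpy as np

def attention_pool(H, Wv, w):
    """H is the first hidden vector matrix: one row per hidden vector.
    Returns the attention vector as a weighted sum of transformed rows."""
    V = H @ Wv                              # linear transform -> second hidden vector matrix
    scores = H @ w                          # weight vector: importance of each row
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()       # activation (softmax)
    return weights @ V                      # attention vector
```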
In some embodiments, the apparatus further comprises a training module, configured to:
input a training sample into the target recognition model, obtain a predicted entity recognition result through the entity recognition module, and determine a first loss value based on the predicted entity recognition result and a ground-truth entity recognition result, where the first loss value represents the difference between the predicted entity recognition result and the ground-truth entity recognition result;
obtain a predicted intention recognition result through the intention recognition module, and determine a second loss value based on the predicted intention recognition result and a ground-truth intention recognition result, where the second loss value represents the difference between the predicted intention recognition result and the ground-truth intention recognition result;
perform a weighted summation of the first loss value and the second loss value to obtain a third loss value; and
adjust model parameters of the target recognition model based on the third loss value.
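The joint training objective reduces to a weighted sum of the two task losses. A trivial sketch; the weight values are assumed hyperparameters, not specified by the patent:

```python
def third_loss(first_loss, second_loss, w_entity=0.5, w_intent=0.5):
    # Weighted summation of the entity-recognition loss (first loss value)
    # and the intention-recognition loss (second loss value).
    return w_entity * first_loss + w_intent * second_loss
```

Optimizing a single combined loss is what makes this multi-task training: one backward pass updates the shared encoding module from both objectives at once.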
In another aspect, a computer device is provided, the computer device comprising a processor and a memory, the memory being configured to store at least one computer program that is loaded and executed by the processor to implement the entity and intention recognition method in the embodiments of the present application.
In another aspect, a computer-readable storage medium is provided, having stored therein at least one computer program that is loaded and executed by a processor to implement the entity and intention recognition method in the embodiments of the present application.
In another aspect, a computer program product is provided, the computer program product comprising computer program code stored in a computer-readable storage medium; a processor of a computer device reads the computer program code from the computer-readable storage medium and executes it, causing the computer device to perform the entity and intention recognition method described in any of the above implementations.
The embodiments of the present application provide an entity and intention recognition method that determines a plurality of words in a text based on a first vocabulary comprising non-rare words and a second vocabulary comprising rare words, so that rare words in the text can also be determined and the determined words are more comprehensive. Furthermore, the feature vector of a rare word is represented based on the feature vector of the entity category to which the rare word belongs, so that relevant information about the rare word can be obtained, producing more accurate results during entity recognition and intention recognition and thereby improving the accuracy of both.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for describing the embodiments are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present application, and a person skilled in the art may derive other drawings from them without inventive effort.
FIG. 1 is a schematic illustration of an implementation environment provided by an embodiment of the present application;
FIG. 2 is a flow chart of a method for identifying entities and intents provided by an embodiment of the present application;
FIG. 3 is a flow chart of another method for identifying entities and intents provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of an object recognition model according to an embodiment of the present application;
fig. 5 is an interface schematic diagram of an application scenario provided in an embodiment of the present application;
FIG. 6 is a block diagram of an entity and intent recognition device provided by an embodiment of the present application;
fig. 7 is a block diagram of a terminal according to an embodiment of the present application;
fig. 8 is a block diagram of a server according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
The terms "first", "second", and the like in the present application are used to distinguish between identical or similar items having substantially the same function. It should be understood that there is no logical or chronological dependency among "first", "second", and "n-th", and that these terms place no limitation on quantity or order of execution.
The term "at least one" in the present application means one or more, and "a plurality of" means two or more.
It should be noted that, the information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, presented data, etc.), and signals related to the present application are all authorized by the user or are fully authorized by the parties, and the collection, use, and processing of the related data is required to comply with the relevant laws and regulations and standards of the relevant countries and regions. For example, the text referred to in this application is obtained with sufficient authorization.
Hereinafter, terms related to the present application will be explained.
Artificial Intelligence (AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the capabilities of perception, reasoning, and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision, speech processing, natural language processing, and machine learning/deep learning.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies the theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics; research in this field involves natural language, the language people use daily, and is therefore closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question-answering robots, and knowledge graph techniques.
Machine Learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specifically studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills, and how it can reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied throughout the various fields of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, induction learning, and teaching learning.
The following describes an implementation environment according to the present application:
the entity and intention identification method provided by the embodiment of the application can be executed by computer equipment. In some embodiments, the computer device is at least one of a terminal and a server. An implementation environment schematic diagram of the entity and intention recognition method provided by the embodiment of the application is described below. Referring to fig. 1, the implementation environment includes a terminal 101 and a server 102. The terminal 101 and the server 102 can be directly or indirectly connected through wired or wireless communication, and the present application is not limited herein.
In some embodiments, the terminal 101 is provided with a target application, which may be an application for intelligent question answering. After the terminal 101 obtains input text containing a question, it sends the text to the server 102. The server 102 is a background server of the target application and is configured to perform entity and intention recognition on the text, obtain a pre-configured corresponding answer based on the entity recognition result and the intention recognition result, and send the answer to the terminal 101 for display.
In some embodiments, after the terminal 101 obtains the input text containing the question, the terminal 101 performs entity and intention recognition on the text, then sends the entity recognition result and the intention recognition result to the server 102, and the server 102 obtains a pre-configured corresponding answer based on the entity recognition result and the intention recognition result, and sends the answer to the terminal 101 for display.
In some embodiments, after the terminal 101 obtains the input text containing the question, the terminal 101 performs entity and intention recognition on the text, and further obtains a pre-configured corresponding answer based on the entity recognition result and the intention recognition result, and then displays the answer.
The method provided by the embodiment of the application can be applied to the search field, and the search field can search positions, search weather and the like, namely, the method provided by the embodiment of the application provides search services in the search field.
In some embodiments, the terminal 101 may be, but is not limited to, a smartphone, a tablet computer, a notebook computer, a desktop computer, a smart voice interaction device, a smart home appliance, a vehicle-mounted terminal, an aircraft, or the like. In some embodiments, the server 102 is a server cluster or a distributed system composed of a plurality of servers, and can also be a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs (Content Delivery Networks), big data, and artificial intelligence platforms. The server 102 is used to provide background services for the target application installed on the terminal 101. In some embodiments, the server 102 undertakes the primary computing work and the terminal 101 undertakes the secondary computing work; alternatively, the server 102 undertakes the secondary computing work and the terminal 101 undertakes the primary computing work; alternatively, a distributed computing architecture is used for collaborative computing between the server 102 and the terminal 101.
Fig. 2 is a flowchart of a method for identifying entities and intents according to an embodiment of the present application, referring to fig. 2, in an embodiment of the present application, the method is described by taking a computer device as an example, and includes the following steps:
201. The computer device determines a plurality of words in the text based on a first vocabulary and a second vocabulary, the first vocabulary including a plurality of non-rare words, and the second vocabulary including a plurality of rare words and the entity categories to which the plurality of rare words belong.
In the embodiment of the present application, the text is the text whose entities and intention are to be recognized; it may be a word, a sentence, or a passage. The text may contain a question, so that an answer to the question can be obtained based on the entity recognition result and the intention recognition result of the text.
In the embodiment of the present application, the non-rare words are words with a frequency of use greater than a preset frequency of use, and the rare words are words with a frequency of use not greater than the preset frequency of use, for example, the first vocabulary includes non-rare words such as "today", "some", "eat", and the like. The second vocabulary includes rare words and entity categories of rare words such as "photophobia (symptoms), whispering (symptoms), bond teeth (diseases)".
In an embodiment of the present application, the plurality of rare words are rare words in at least one domain; for example, the plurality of rare words are each rare words in the medical field, or the plurality of rare words are rare words in the plurality of fields of medical, agricultural, education, and the like.
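Determining the words in the text from the two vocabularies can be illustrated with forward maximum matching, which is an assumption here — the patent does not name the matching algorithm:

```python
def segment(text, first_vocab, second_vocab):
    """Split text into the longest words found in either vocabulary;
    unmatched characters fall through as single-character tokens."""
    vocab = set(first_vocab) | set(second_vocab)  # second_vocab maps rare word -> category
    max_len = max((len(w) for w in vocab), default=1)
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a match or one character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in vocab:
                words.append(candidate)
                i += length
                break
    return words
```

Because the second vocabulary is merged in, rare domain words are segmented as whole units instead of being broken into characters.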
202. The computer device obtains feature vectors of the plurality of words, where the feature vector of a rare word among the plurality of words is represented based on the feature vector of the entity category to which the rare word belongs.
In the embodiment of the present application, the feature vector of the entity category to which a rare word belongs represents that entity category, and this feature vector can effectively represent the rare word. The feature vector of a non-rare word among the plurality of words is determined based on the non-rare word itself and represents that word.
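This lookup can be sketched as follows; the embedding tables and the example entries are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4
word_emb = {"today": rng.normal(size=dim)}        # embeddings for non-rare words
category_emb = {"symptom": rng.normal(size=dim)}  # embeddings for entity categories
second_vocab = {"photophobia": "symptom"}         # rare word -> entity category

def feature_vector(word):
    # A rare word is represented by its entity category's feature vector,
    # so even a rarely seen word carries meaningful category information.
    if word in second_vocab:
        return category_emb[second_vocab[word]]
    return word_emb[word]
```

Sharing one category embedding across all rare words of that category avoids having to learn a reliable embedding for each rare word individually.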
203. The computer device obtains feature vectors of a plurality of characters in the text.
In the embodiment of the present application, the feature vector of each character is used to represent that character.
204. The computer device obtains a hidden vector sequence of the text based on the feature vectors of the plurality of words and the feature vectors of the plurality of characters, the hidden vector sequence comprising a hidden vector sequence of the characters and a hidden vector sequence of the words, the hidden vector sequence of the characters comprising the hidden vectors of the plurality of characters, and the hidden vector sequence of the words comprising the hidden vectors of the plurality of words.
In the embodiment of the present application, the hidden vectors of the plurality of characters are arranged in order to form the hidden vector sequence of the characters, and the hidden vectors of the plurality of words are arranged in order to form the hidden vector sequence of the words. The hidden vector of a character is used to represent that character, and the hidden vector of a word is used to represent that word.
205. The computer device performs entity recognition on the text based on the character hidden vector sequence within the hidden vector sequence, obtaining an entity recognition result.
In this embodiment of the present application, the entity recognition result indicates which entity categories the words in the text respectively belong to.
206. The computer device performs intention recognition on the text based on the hidden vector sequence, obtaining an intention recognition result.
In this embodiment of the present application, the intention recognition result indicates which intention categories the text belongs to.
The embodiment of the present application provides an entity and intention recognition method that determines a plurality of words in a text based on a first vocabulary including non-rare words and a second vocabulary including rare words, so that the rare words in the text can also be determined and the determined words are more comprehensive. Moreover, because the feature vector of a rare word is represented based on the feature vector of the entity category to which it belongs, relevant information about the rare word can be obtained, yielding more accurate results during entity recognition and intention recognition and thereby improving the accuracy of both.
Fig. 2 above shows the basic flow of the entity and intention recognition method provided by the embodiment of the present application; the method is further described below with reference to fig. 3. Fig. 3 is a flowchart of an entity and intention recognition method according to an embodiment of the present application. This embodiment takes the target recognition model shown in fig. 4 as an example for obtaining the entity recognition result and the intention recognition result; the target recognition model includes an input module 401, an encoding module 402, an entity recognition module 403, and an intention recognition module 404.
Referring to FIG. 3, the embodiment of the present application is described taking execution by a computer device as an example. The method includes the following steps:
301. The computer device inputs the text into the input module of the target recognition model and obtains, through the input module, the feature vectors of the plurality of words and the feature vectors of the plurality of characters.
In this embodiment of the present application, after the computer device inputs the text into the target recognition model, the plurality of words in the text must first be determined. Accordingly, the process by which the computer device inputs the text into the input module of the target recognition model and obtains the feature vectors of the plurality of words and the feature vectors of the plurality of characters through the input module includes the following steps: the computer device inputs the text into the input module of the target recognition model, determines the plurality of words in the text based on the first vocabulary and the second vocabulary through the input module, and obtains the feature vectors of the plurality of words and the feature vectors of the plurality of characters in the text.
In this embodiment of the present application, because the first vocabulary and the second vocabulary are provided, any word in either vocabulary can be found as long as it appears in the input text, ensuring the comprehensiveness of determining the words in the text. For example, for the input text "today somewhat photophobic", the words "today" and "somewhat" appear in the first vocabulary and "photophobia" appears in the second vocabulary, so they are found and determined as words of the text.
In some embodiments, the computer device sets a word vector table in the input module so as to determine the feature vectors of the plurality of words based on the word vector table. Accordingly, the process by which the computer device obtains the feature vectors of the plurality of words through the input module includes the following steps: the computer device queries the feature vectors of the plurality of words from the word vector table through the input module, where the word vector table includes the feature vectors of a plurality of non-rare words and the feature vectors of the entity categories to which the plurality of rare words belong.
Here, for each entity category, the rare words of that entity category share one feature vector. For example, for the entity category "symptom", rare words belonging to the symptom category, such as "photophobia" and "soma", share a feature vector and are represented using the feature vector of the symptom category. As another example, for the entity category "disease", rare words belonging to the disease category, such as "glass doll", share a feature vector and are represented using the feature vector of the disease category.
In some embodiments, a rare word may not be categorizable into any entity category; such uncategorizable rare words are optionally represented collectively by a common feature vector, which indicates that the rare word is not categorized into an entity category. In this embodiment of the present application, setting the word vector table improves the efficiency of obtaining the feature vectors of the plurality of words, and representing the feature vectors of rare words based on the feature vectors of their entity categories makes it possible to obtain relevant information about the rare words in the text.
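The shared-vector lookup described above can be sketched as follows (a minimal illustration, not the patent's implementation; all function names, dictionary layouts, and example words are hypothetical):

```python
def lookup_feature_vector(word, word_vectors, rare_word_classes,
                          class_vectors, shared_vector):
    """Return the feature vector representing `word`.

    Non-rare words have their own vector; rare words resolve to the
    vector of their entity category; rare words with no category share
    one common vector, as described above.
    """
    if word in word_vectors:            # non-rare word: its own vector
        return word_vectors[word]
    cls = rare_word_classes.get(word)   # rare word -> entity category
    if cls in class_vectors:
        return class_vectors[cls]
    return shared_vector                # uncategorizable rare word
```

Because every rare word of a category resolves to the same vector, adding a new rare word only requires a new entry in `rare_word_classes`, with no change to the trained vectors.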
In this embodiment of the present application, the feature vectors in the word vector table are organized in matrix form so that they form part of the target recognition model and participate in model training, enabling the input module to determine the feature vectors of the plurality of words.
In this embodiment of the present application, whether a word is a rare word is judged based on the word's frequency of use; optionally, this judgment is made based on the word's number of occurrences in the training sample set, which reflects its frequency of use. For example, a word is determined to be non-rare if its occurrence count is greater than a preset fixed threshold, and rare if its occurrence count is not greater than that threshold. The training sample set is used for model training of the target recognition model and includes a plurality of training samples, each of which is a text. Determining the rare and non-rare words based on the training sample set makes the determined rare and non-rare words better matched to the target recognition model.
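The occurrence-count rule above can be sketched with a small helper (an illustrative sketch of the stated threshold rule; the tokenizer and threshold value are assumptions):

```python
from collections import Counter

def split_vocabulary(training_texts, tokenize, threshold):
    """Split words into non-rare and rare sets: a word is non-rare if
    its occurrence count in the training sample set exceeds `threshold`,
    and rare otherwise, per the rule described above."""
    counts = Counter(w for text in training_texts for w in tokenize(text))
    non_rare = {w for w, c in counts.items() if c > threshold}
    rare = {w for w, c in counts.items() if c <= threshold}
    return non_rare, rare
```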
In this embodiment of the present application, the first vocabulary includes a plurality of non-rare words; the second vocabulary includes a plurality of rare words and their entity categories; and the word vector table includes the feature vectors of the non-rare words and the feature vectors of the entity categories to which the rare words belong, from which the feature vectors of the words are obtained. The target recognition model can therefore jointly exploit the vocabularies and the word vector table: even if the text includes entity words that occur rarely, or never, in the training data, the target recognition model can obtain corresponding information from the word vector table as long as the word can be found in the second vocabulary, yielding more accurate results when recognizing entities and intentions. Furthermore, because rare words in the same entity category share one feature vector, new rare words can be dynamically added to the second vocabulary after model training is completed; the new rare words can be represented by the trained feature vector of their entity category without retraining the whole model, improving the flexibility and convenience of model updating.
In some embodiments, the computer device sets a character vector table in the input module and determines the feature vectors of the plurality of characters based on the character vector table. Accordingly, the process by which the computer device obtains the feature vectors of the plurality of characters in the text includes the following steps: the computer device converts the text into a character sequence and queries the feature vector of each character in the sequence from the character vector table, thereby obtaining the feature vectors of the plurality of characters; the character vector table includes the feature vectors of a plurality of characters.
In this embodiment of the present application, the feature vectors in the word vector table and the character vector table are pre-trained feature vectors, and any of them may be a vector obtained by embedding. Setting the word vector table and the character vector table therefore improves the efficiency of obtaining feature vectors, which in turn improves recognition efficiency.
302. The computer device inputs the feature vectors of the plurality of words and the feature vectors of the plurality of characters into the encoding module, and performs attention feature extraction on them through the encoding module to obtain the hidden vector sequence.
In this embodiment of the present application, this process includes the following steps A1-A3.
A1: the computer equipment inputs the feature vectors of the plurality of words and the feature vectors of the plurality of words into the encoding module; and carrying out linear transformation on the first feature matrix through the coding module to obtain a second feature matrix, wherein each row vector in the first feature matrix is a feature vector of a word or a feature vector of a word.
In this embodiment of the present application, the first feature matrix is denoted X, with X ∈ R^(n1×d1), where n1 denotes the number of rows of the first feature matrix, d1 denotes the number of columns of the first feature matrix, and R denotes the real numbers.
In some embodiments, the computer device performs the linear transformation on the first feature matrix through the encoding module by the following equation (1):
V_1 = X W_v (1);
where V_1 denotes the second feature matrix, X denotes the first feature matrix, and W_v denotes the model parameters of the linear transformation.
In this embodiment of the present application, using both the feature vectors of the plurality of words and the feature vectors of the plurality of characters as input to the encoding module effectively handles the fact that a Chinese sentence may admit multiple word segmentations, so that a more comprehensive and accurate hidden vector sequence can be obtained.
A2: the computer equipment obtains a weight matrix based on a plurality of row vectors in the first feature matrix through the coding module, and each element in the weight matrix represents the association degree between two row vectors in the first feature matrix.
In the embodiment of the application, each element in the weight matrix is obtained not only based on the row vector in the first feature matrix, but also based on the positions of the words and the words in the text. Correspondingly, the determining process of the elements positioned in the ith row and the jth column in the weight matrix comprises the following steps: the method comprises the steps that computer equipment obtains a first position and a second position, wherein the first position is the position of a character or word indicated by an ith row vector in a first feature matrix in a text, the second position is the position of the character or word indicated by a jth row vector in a second feature matrix in the text, and i and j are integers larger than 0; determining a position vector based on a position difference of the first position and the second position; the computer equipment fuses the ith row vector, the jth row vector and the position vector to obtain elements positioned in the ith row and the jth column in the weight matrix. In the embodiment of the application, the element in the weight matrix is determined based on the positions of the words and the characters in the text, so that the information represented by the element is more rich and comprehensive.
In this embodiment of the present application, the first position and the second position each include the start position in the text and the end position in the text of the word or character indicated by the corresponding vector. For example, in the text "today is somewhat photophobic", the start position of "photophobia" is 5 and the end position is 6. Accordingly, the position difference between the first position and the second position includes the difference of the start positions and the difference of the end positions.
In this embodiment of the present application, the computer device extracts the position of each character and word in the text through the input module and then inputs the extracted positions into the encoding module; the encoding module obtains the first position and the second position from the positions of the plurality of words and characters. For example, with continued reference to fig. 1, the input module extracts the start and end positions of each character and word in the text "today is somewhat photophobic" and inputs them into the encoding module together with the feature vectors of the plurality of words and the feature vectors of the plurality of characters. Extracting the positions of the characters and words through the input module in this way amounts to preprocessing, which improves data processing efficiency in the encoding module.
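The start/end positions can be illustrated with a small helper (a sketch that assumes 1-based positions and words that tile the text in order; a real lattice input may contain overlapping candidate words):

```python
def char_and_word_spans(chars, words):
    """Return 1-based (start, end) spans: each character occupies one
    position; each word spans the positions of its characters."""
    spans = [(i + 1, i + 1) for i in range(len(chars))]
    pos = 1
    for w in words:
        spans.append((pos, pos + len(w) - 1))
        pos += len(w)
    return spans
```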
In some embodiments, the computer device determines, by the encoding module, a position vector based on a position difference of the first position and the second position, by the following equation (2).
R_ij = ReLU(W_r (PE(L_i − L_j) ⊕ PE(R_i − R_j))) (2);
where R_ij denotes the position vector obtained based on the i-th row vector and the j-th row vector of the first feature matrix, L_i denotes the start position within the first position, L_j denotes the start position within the second position, R_i denotes the end position within the first position, R_j denotes the end position within the second position, W_r denotes the model parameters for determining the position vector, the ⊕ symbol denotes vector concatenation, PE denotes the conversion of a position difference into a vector, and ReLU denotes the nonlinear function used to obtain the position vector.
In some embodiments, when converting a position difference into a vector, PE is implemented by the following equation (3).
PE_2k(x) = sin(x / 10000^(2k/d2)), PE_(2k+1)(x) = cos(x / 10000^(2k/d2)) (3);
where x denotes the position difference, d2 denotes the dimension of the converted vector, PE_2k denotes the 2k-th element of the converted vector, and PE_(2k+1) denotes the (2k+1)-th element of the converted vector.
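Treating PE as the standard Transformer-style sinusoidal encoding (an assumption consistent with the sin/cos element structure of equation (3)), a sketch:

```python
import numpy as np

def positional_encoding(x, d):
    """Encode a (possibly negative) position difference x as a
    d-dimensional vector: even indices use sin, odd indices use cos.
    Assumes d is even."""
    k = np.arange(d // 2)
    freq = 1.0 / (10000.0 ** (2.0 * k / d))
    pe = np.empty(d)
    pe[0::2] = np.sin(x * freq)
    pe[1::2] = np.cos(x * freq)
    return pe
```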
In some embodiments, the computer device fuses the i-th row vector, the j-th row vector, and the position vector through the encoding module to obtain the element in row i and column j of the weight matrix by the following equation (4):
A_ij = (X_i W_q + u)(X_j W_(k,E))^T + (X_i W_q + v)(R_ij W_(k,R))^T (4);
where A_ij denotes the element in row i and column j of the weight matrix, X_i denotes the i-th row vector, X_j denotes the j-th row vector, R_ij denotes the position vector, T denotes the transpose of a vector or matrix, and W_q, W_(k,E), W_(k,R), u, and v denote the model parameters for determining the elements of the weight matrix.
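A relative-position attention score of the kind described above can be sketched as follows (the exact term grouping of equation (4) is not fully recoverable from the source, so this follows Flat-Lattice-Transformer-style scoring as an assumption; shapes and names are illustrative):

```python
import numpy as np

def attention_scores(X, R, Wq, WkE, WkR, u, v):
    """A[i, j] = (X_i Wq + u)·(X_j WkE) + (X_i Wq + v)·(R_ij WkR):
    fuse the i-th row vector, the j-th row vector, and their position
    vector R[i, j] into one weight-matrix element."""
    n = X.shape[0]
    Q = X @ Wq          # transformed queries
    KE = X @ WkE        # content keys
    A = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            KR = R[i, j] @ WkR   # position key for the (i, j) pair
            A[i, j] = (Q[i] + u) @ KE[j] + (Q[i] + v) @ KR
    return A
```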
In this embodiment of the present application, each element of the weight matrix represents the degree of association between two row vectors of the first feature matrix; the degree of association may be the magnitude of one vector's influence on another. For example, the element A_ij may represent the magnitude of the influence of the j-th row vector on the i-th row vector.
A3: the computer equipment activates the weight matrix through the coding module, and based on the activated weight matrix and the second feature matrix, the hidden vector sequence is obtained.
In some embodiments, the above computer device performs activation processing on the weight matrix through the encoding module, and obtains the hidden vector sequence based on the activated weight matrix and the second feature matrix, which is implemented by the following formula (5).
H = softmax(A) V_1 (5);
where H denotes the matrix formed by the hidden vector sequence, each hidden vector in the sequence being a row vector of that matrix, softmax denotes the activation function, A denotes the weight matrix, and V_1 denotes the second feature matrix.
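Equation (5) amounts to a row-wise softmax followed by a matrix product; a minimal sketch:

```python
import numpy as np

def attention_output(A, V1):
    """H = softmax(A) V1: activate the weight matrix row-wise, then
    weight the second feature matrix, per equation (5)."""
    A = A - A.max(axis=1, keepdims=True)   # numerical stability
    W = np.exp(A)
    W /= W.sum(axis=1, keepdims=True)
    return W @ V1
```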
The above steps A1-A3 describe obtaining the hidden vector sequence based on a single-head attention mechanism. In other embodiments, the encoding module obtains the hidden vector sequence based on a multi-head attention mechanism (Multi-head Attention); the encoding module includes an attention mechanism sub-module containing multiple attention units (heads). Accordingly, the computer device inputs the feature vectors of the plurality of words and the feature vectors of the plurality of characters into the attention mechanism sub-module of the encoding module, obtains the hidden vector sequence output by each attention unit through the multiple attention units of the sub-module, and then obtains the hidden vector sequence of the text based on the hidden vector sequences output by the respective attention units. The attention mechanism sub-module concatenates the hidden vector sequences output by the multiple attention units to obtain the hidden vector sequence of the text.
For example, the hidden vector sequence of the text is denoted H, with H ∈ R^(n2×d3), where n2 denotes the number of rows and d3 the number of columns of the text's hidden vector sequence. The hidden vector sequence output by each attention unit is denoted H', with H' ∈ R^(n2×d4) and m·d4 = d3, where the number of rows of each attention unit's output equals that of the text's hidden vector sequence, d4 denotes the number of columns of each attention unit's output, and m denotes the number of attention units. H is the final output of the attention mechanism sub-module.
In this embodiment of the present application, the multi-head attention mechanism concatenates the outputs of multiple independent attention heads; that is, the multiple attention units allow attention operations with different emphases over different parts of the first feature matrix, which can improve the accuracy of the determined hidden vector sequence.
In some embodiments, the encoding module uses a Transformer Encoder model for attention feature extraction, for example the structure of the Flat-Lattice Transformer (FLAT, a Chinese entity recognition model). Accordingly, the vocabularies and vector tables in this embodiment of the present application correspond to Chinese characters or words, which mitigates the difficulty of recognizing rare Chinese words. Moreover, the model is a small neural network based on the Transformer Encoder, with advantages such as a small parameter count, a small memory footprint at run time, and high recognition speed.
In this embodiment of the present application, the hidden vector sequence is obtained by performing attention feature extraction on the feature vectors of the words and the feature vectors of the characters in the text, so the hidden vector sequence can express the words and characters of the text more accurately.
303. The computer device inputs the character hidden vector sequence into the entity recognition module and obtains the entity recognition result through the entity recognition module.
In this embodiment of the present application, the entity recognition result is a target tag sequence of the text; the target tag sequence includes the category labels of a plurality of characters, where each character's category label indicates the entity category of the target word containing that character and the character's position within that word. Accordingly, the process by which the computer device inputs the character hidden vector sequence into the entity recognition module and obtains the entity recognition result through the entity recognition module includes the following steps B1-B3.
B1: the computer device inputs a sequence of hidden vectors of words into an entity recognition module by which, for any one of a plurality of words, a degree parameter for each of the plurality of predetermined category labels, the degree parameter being indicative of a degree to which the word is labeled as being appropriate for the predetermined category label, is determined based on the hidden vector of the word and the predetermined category label of the word having a predetermined position difference from the word.
In the embodiment of the application, the entity identification module adopts a conditional random field (Conditional Random Fields) model to label the text. Accordingly, the category labels are BIO (begin, interior, exterior) labels, including B-t labels, I-t labels and O labels, and the word labeled B-t labels represents that the entity category to which the target word to which the word belongs is t, and the word is located at the first position of the target word. The word marked with the I-t label indicates that the entity class to which the target word where the word belongs is t, and the word is not located at the first position of the target word. The O-tagged word indicates that the word does not belong to any entity class. For example, with continued reference to FIG. 1, the entity recognition module labels each word in the text "today's bit photophobia" with a category label B-time, I-time, O, O, B-symptom, I-symptom, respectively.
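The BIO scheme above can be illustrated with a helper that expands segmented words into per-character labels (the segmentation input format is an illustrative assumption):

```python
def bio_tags(segments):
    """Expand (word, entity_category_or_None) pairs into per-character
    BIO tags: B-t for the first character of an entity word of category
    t, I-t for its remaining characters, and O for characters outside
    any entity."""
    tags = []
    for word, category in segments:
        if category is None:
            tags.extend("O" for _ in word)
        else:
            tags.append("B-" + category)
            tags.extend("I-" + category for _ in word[1:])
    return tags
```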
In this embodiment of the present application, the character hidden vector sequence is expressed as H_c = (h_1, h_2, …, h_(n_c)), where each h_i denotes the hidden vector of a character and n_c is the number of characters. The target tag sequence is expressed as y = (y_1, y_2, …, y_(n_c)), where each y_i denotes the preset category label of a character.
In this embodiment of the present application, the character at a preset position difference from a given character may be at least one of: the character one position before it, the character two positions before it, the character three positions before it, and so on. In this embodiment, the character at a preset position difference is taken to be the character immediately preceding the given character, that is, the preceding character at a position difference of one.
Accordingly, for any character among the plurality of characters, the computer device determines through the entity recognition module the degree parameters of labeling that character with the plurality of preset category labels, based on the hidden vector of the character and the preset category label of the character at the preset position difference, as implemented by the following equation (6).
f(y, y', h) = exp(U_(y,y') + W_y h + b_y) (6);
where h denotes the hidden vector of the current character, y' denotes the preset category label of the preceding character of the two, y denotes the preset category label of the following character of the two, U_(y,y') denotes the transition parameter between the preset category labels y' and y, W_y and b_y denote the weight vector and bias corresponding to category label y, and f(y, y', h) denotes the degree parameter of labeling the current character with preset category label y when its hidden vector is h and the preceding character's preset category label is y'. If a character is the first character of the text, the preset category label of its preceding character is taken to be a fixed initial category label y_0 for convenience of calculation.
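Equation (6) can be sketched directly (the index layout of the transition matrix U — previous label as row, current label as column — is an assumption):

```python
import numpy as np

def degree_parameter(y, y_prev, h, U, W, b):
    """f(y, y', h) = exp(U[y', y] + W[y]·h + b[y]): the unnormalized
    suitability of labeling the current character with label y, given
    the previous character's label y' and hidden vector h."""
    return float(np.exp(U[y_prev, y] + W[y] @ h + b[y]))
```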
B2: the computer device is used for marking the degree parameters of the plurality of words as the plurality of preset category labels and the plurality of candidate label sequences through the entity recognition module, and the probability of the plurality of candidate label sequences is determined respectively, and the plurality of candidate label sequences are obtained based on the combination of the plurality of preset category labels.
In some embodiments, the probability of a candidate tag sequence represents the likelihood that it is a true tag sequence of text. The probability of each candidate tag sequence is based on the product of the degree parameters of the plurality of words respectively marked as the preset category tags in the candidate tag sequence. Accordingly, the determination process of the probability of each candidate tag sequence comprises the following steps: the computer equipment calculates products of the degree parameters of the preset category labels in the candidate label sequences, which are marked as the candidate label sequences, through the entity identification module, and takes the products as the probability of the candidate label sequences, so that the rationality and convenience of determining the probability of the candidate label sequences are improved.
In some embodiments, for each candidate tag sequence, the probability of that candidate tag sequence is determined based on equation (7) below.
p(y|H_c) = ∏_i f(y_i, y_(i−1), h_i) (7);
where p(y|H_c) denotes the probability of candidate tag sequence y, and f(y_i, y_(i−1), h_i) denotes the degree parameter of labeling the i-th character of the text with a given preset category label.
In this embodiment of the present application, the plurality of candidate tag sequences may be determined in advance, and the numbers of preset category labels in the candidate tag sequences may correspond to texts with different numbers of characters. Accordingly, the computer device may select from the plurality of candidate tag sequences those whose length corresponds to the number of characters of the text to be recognized and perform entity recognition based on the selected candidate tag sequences; this narrows the range of candidate tag sequences and thus improves the efficiency of entity recognition.
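Steps B2-B3 can be sketched by scoring every candidate tag sequence and returning the argmax (brute-force enumeration for clarity only; a practical CRF decoder would use the Viterbi algorithm; the tag names and initial label y0 are illustrative):

```python
import numpy as np
from itertools import product

def best_tag_sequence(H, tags, U, W, b, y0=0):
    """Score each candidate sequence as the product of per-character
    degree parameters exp(U[y', y] + W[y]·h + b[y]), then output the
    highest-probability candidate."""
    n = len(H)
    best_seq, best_score = None, -1.0
    for seq in product(range(len(tags)), repeat=n):
        score, prev = 1.0, y0
        for i, y in enumerate(seq):
            score *= float(np.exp(U[prev, y] + W[y] @ H[i] + b[y]))
            prev = y
        if score > best_score:
            best_seq, best_score = seq, score
    return [tags[y] for y in best_seq]
```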
B3: and the computer equipment outputs the candidate tag sequence with the highest probability of the plurality of candidate tag sequences as a target tag sequence through the entity identification module.
It should be noted that, because a word in a text generally consists of multiple characters, the preset category labels of the characters making up a word generally match one another; preset category labels therefore usually appear in pairs, forming category label pairs. For "today", for example, the preset category labels of its two characters are "B-time" and "I-time", which appear as a pair. Some category label pairs occur readily while others do not, so in this embodiment of the present application the degree parameters of labeling a character with the preset category labels are determined based on the preset category label of the preceding character. This fully accounts for the compatibility between the preset category labels of two adjacent characters, which further improves labeling accuracy and hence the accuracy of the determined target tag sequence.
304. The computer device inputs the hidden vector sequence into the intention recognition module and obtains the intention recognition result through the intention recognition module.
In this embodiment of the present application, the intention recognition result includes the probabilities of a plurality of intention categories, where the probability of each intention category represents the likelihood that it is the true intention category of the text. Accordingly, the process by which the computer device inputs the hidden vector sequence into the intention recognition module and obtains the intention recognition result includes the following steps C1-C6.
C1: the computer equipment inputs the hidden vector sequence into an intention recognition module; and determining a plurality of target preset category labels from the plurality of preset category labels through the intention recognition module, wherein any one preset category label indicates the entity category of the target word where the marked word is located and the position of the word in the target word, and the target preset category label indicates that the marked word is located at the first position of the target word.
In the embodiment of the application, the target preset category label refers to a B-t label, the word marked with the B-t label indicates that the entity category to which the target word where the word belongs is t, and the word is positioned at the head of the target word.
C2: for each target preset category label in the plurality of target preset category labels, the computer equipment determines the probability that the plurality of words are respectively marked as the target preset category label based on the hidden vectors of the plurality of words through the intention recognition module.
In an embodiment of the present application, the process of determining, by the intention recognition module, a probability that a plurality of words are respectively labeled as target preset category labels based on hidden vectors of the plurality of words includes the following steps: the intention recognition module determines, for each word of the plurality of words, a probability that the word is labeled as a target preset category label based on the hidden vector of the word and a target model parameter, which is a model parameter obtained from the entity recognition module for determining the probability that the word is labeled as any preset category label. In the embodiment of the application, the target model parameters are directly obtained from the entity recognition module, so that the aim that the target model parameters need to be trained by the intended recognition module is avoided, and the training efficiency of the model is improved.
In the embodiment of the application, because the probability of determining that the word is marked as the target preset class label based on the conditional random field model is complex and can influence the recognition time, the probability of marking the word as the target preset class label is determined only based on the hidden vector of the word and the target model parameter, thereby improving the recognition efficiency.
In this embodiment of the present application, the target model parameters are obtained from the entity recognition module and are used to determine the probability that a character is labeled with any preset category label. Accordingly, for each character, the computer device determines through the intention recognition module the probability that the character is labeled with the target preset category label, based on the hidden vector of the character and the target model parameters, as implemented by the following equation (8).
p_i = softmax(W_y h_i + b_y)_t (8);

Wherein p_i represents the probability that the i-th word is labeled as the target preset category label t, h_i represents the hidden vector of the i-th word, and W_y and b_y represent the target model parameters, which are the same as W_y and b_y in formula (6).
And C3: the computer equipment determines, through the intention recognition module, the probability that the target preset category label is labeled in the text based on the probabilities that the plurality of words are respectively labeled as the target preset category label.
In the embodiment of the present application, the computer device determines, through the intention recognition module, the probability that the target preset category label is labeled in the text based on the probabilities that the plurality of words are respectively labeled as the target preset category label, which is implemented based on the following formula (9).
P(t) = 1 - ∏_i (1 - p_i) (9);
Wherein P(t) represents the probability that the target preset category label is labeled in the text, and p_i represents the probability that the i-th word is labeled as the target preset category label.
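As an illustrative sketch, the per-word probability of formula (8) and the text-level aggregation of formula (9) can be written in a few lines of numpy. The softmax form of the per-word probability is an assumption: the source only states that it is computed from the hidden vector h_i and the shared parameters W_y and b_y taken from the entity recognition module. All dimensions are illustrative.

```python
import numpy as np

def label_probabilities(H, W_y, b_y):
    # Per-word distributions over preset category labels. The softmax form
    # is an assumption; the patent only states that the probability is
    # computed from the hidden vector h_i and the shared parameters
    # W_y, b_y obtained from the entity recognition module (formula (8)).
    scores = H @ W_y + b_y                       # (num_words, num_labels)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

def text_label_probability(p_i):
    # Formula (9): P(t) = 1 - prod_i (1 - p_i), the probability that the
    # target label t is annotated somewhere in the text.
    return 1.0 - np.prod(1.0 - p_i)

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))     # hidden vectors of 5 words, hidden size 8
W_y = rng.normal(size=(8, 4))   # 4 preset category labels
b_y = np.zeros(4)

P = label_probabilities(H, W_y, b_y)
p_target = P[:, 2]              # probabilities for one target label t
P_t = text_label_probability(p_target)
```

Note that P(t) is never smaller than the largest per-word probability, which matches the intuition of formula (9): the label is annotated in the text if it is annotated on at least one word.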
And C4: the computer device determines, through the intent recognition module, a first probability vector based on the probabilities that the multiple target preset category labels are respectively labeled in the text.
In the embodiment of the application, each element in the first probability vector is a probability that a target preset category label is marked in a text.
In the embodiment of the application, the computer equipment forms the probability of labeling each target preset category label in the text into a vector form through an intention recognition module to obtain a first probability vector.
C5: the computer equipment extracts attention features of the hidden vector sequence through the intention recognition module to obtain an attention vector.
In an embodiment of the application, the intent recognition module derives an attention vector based on an attention mechanism. Correspondingly, the computer equipment extracts attention characteristics of the hidden vector sequence through the intention recognition module to obtain an attention vector, and the method comprises the following steps of: the computer equipment carries out linear transformation on the first hidden vector matrix through the intention recognition module to obtain a second hidden vector matrix, wherein each row vector in the first hidden vector matrix is one hidden vector in the hidden vector sequence; the computer equipment obtains a weight vector based on a plurality of row vectors in the first hidden vector matrix through the intention recognition module, and each element in the weight vector represents the importance degree of one row vector in the first hidden vector matrix; the computer equipment activates the weight vector through the intention recognition module, and obtains the attention vector based on the activated weight vector and the second hidden vector matrix.
The intention recognition module comprises an attention mechanism sub-module, wherein the computer equipment inputs the hidden vector sequence into the attention mechanism sub-module, and the attention vector is obtained through the attention mechanism sub-module.
In some embodiments, the computer device performs linear transformation on the first hidden vector matrix through the intention recognition module to obtain a second hidden vector matrix, which is implemented through the following formula (10).
V_2 = H W_ID,v (10);
Wherein V_2 represents the second hidden vector matrix, H represents the first hidden vector matrix, and W_ID,v represents the model parameter used for the linear transformation.
In some embodiments, each element in the weight vector is obtained by the following formula (11).
a_i = u_ID^T tanh(W_ID,q z + W_ID,k H_i) (11);
Wherein a_i represents the i-th element in the weight vector, H_i represents the i-th row vector in the first hidden vector matrix, z represents the query vector (for the first layer, a randomly initialized vector; see below), and W_ID,q, W_ID,k, W_ID,v and u_ID represent trainable model parameters, which may be matrices or vectors.
In some embodiments, the computer device performs activation processing on the weight vector through the intention recognition module, and obtains the attention vector based on the weight vector and the second hidden vector matrix after the activation processing, which is implemented through the following formula (12).
h = softmax(a) V_2 (12);
Wherein h represents the attention vector, a represents the weight vector, softmax represents the activation function, and V_2 represents the second hidden vector matrix.
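Formulas (10)-(12) can be sketched as follows in numpy. The additive scoring form used for formula (11) is a reconstruction, since the source formula is garbled, and all shapes are illustrative.

```python
import numpy as np

def attention_vector(H, W_q, W_k, W_v, u, z):
    # Formula (10): linear transformation of the first hidden vector matrix.
    V2 = H @ W_v
    # Formula (11), reconstructed as additive attention (the exact form is
    # garbled in the source): a_i = u^T tanh(W_q z + W_k H_i).
    a = np.array([u @ np.tanh(W_q @ z + W_k @ h) for h in H])
    # Formula (12): h = softmax(a) V_2.
    w = np.exp(a - a.max())
    w /= w.sum()
    return w @ V2

rng = np.random.default_rng(1)
d, d2 = 6, 4
H = rng.normal(size=(7, d))        # hidden vector sequence, 7 hidden vectors
W_q = rng.normal(size=(d2, d2))
W_k = rng.normal(size=(d2, d))
W_v = rng.normal(size=(d, d2))
u = rng.normal(size=d2)
z = rng.normal(size=d2)            # query; randomly initialised for layer 1

h_att = attention_vector(H, W_q, W_k, W_v, u, z)
```

In the multi-layer variant described below, the output of one layer would be fed back in as the query z of the next layer.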
In the embodiment of the present application, the above steps are described by taking the attention mechanism sub-module as an example of a single-layer structure. In other embodiments, the attention mechanism sub-module is a multi-layer structure, and for each layer, the output of the previous layer is taken as the input of that layer, so as to obtain the attention vector of that layer, until the final attention vector is output through the last layer. For example, z in formula (11) is the vector output by the previous layer. For the first layer in the multi-layer structure, z is a randomly initialized vector.
In some embodiments, the intent recognition module derives the attention vector based on a multi-head attention mechanism. Accordingly, the computer device inputs the hidden vector sequence into the attention mechanism sub-module of the intention recognition module, the attention vector output by each attention unit is obtained through the multi-head attention units in the attention mechanism sub-module, and the attention vector of the text is obtained based on the attention vectors respectively output by the multi-head attention units. The attention mechanism sub-module concatenates the attention vectors respectively output by the multi-head attention units to obtain the attention vector of the text.
In the embodiment of the application, the outputs of the multiple independent attention heads are concatenated by the multi-head attention mechanism; that is, the multi-head attention units allow attention operations with different emphases on different parts of the hidden vector sequence, so that the accuracy of the determined attention vector can be improved.
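A minimal sketch of the multi-head concatenation described above. Each head's internal computation is assumed to mirror the single-head formulas (10)-(12); it is simplified here to a weighted pooling with fixed softmax weights, so only the concatenation step is faithful to the text.

```python
import numpy as np

rng = np.random.default_rng(2)
H = rng.normal(size=(5, 6))       # hidden vector sequence

def make_head(seed, d_out):
    # One attention unit; the internal computation is assumed to mirror
    # formulas (10)-(12) and is simplified to fixed softmax pooling here.
    r = np.random.default_rng(seed)
    W_v = r.normal(size=(H.shape[1], d_out))
    a = r.normal(size=H.shape[0])
    w = np.exp(a - a.max())
    w = w / w.sum()
    return lambda X: w @ (X @ W_v)

heads = [make_head(s, 3) for s in range(4)]   # 4 heads, 3 dims each
# Concatenate the per-head outputs into the attention vector of the text.
h_text = np.concatenate([head(H) for head in heads])
```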
C6: the computer equipment fuses the first probability vector and the attention vector through the intention recognition module to obtain a second probability vector, wherein the second probability vector comprises probabilities of a plurality of intention categories.
In some embodiments, the computer device fuses, through the intent recognition module, the first probability vector and the attention vector to obtain a second probability vector, which is implemented by the following formula (13).
p_ID = sigmoid(W(h_e ⊕ h_p) + b) (13);
Wherein p_ID represents the second probability vector, sigmoid represents the activation function, W and b represent trainable model parameters, h_e represents the attention vector, h_p represents the first probability vector, and ⊕ represents vector concatenation.
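The fusion of formula (13) can be sketched as follows; all dimensions are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse(h_e, h_p, W, b):
    # Formula (13): p_ID = sigmoid(W (h_e ⊕ h_p) + b), where ⊕ denotes
    # vector concatenation and W, b are trainable model parameters.
    return sigmoid(W @ np.concatenate([h_e, h_p]) + b)

rng = np.random.default_rng(3)
h_e = rng.normal(size=8)              # attention vector
h_p = rng.uniform(size=5)             # first probability vector (5 labels)
n_intents = 10
W = rng.normal(size=(n_intents, 13))  # 13 = 8 + 5; shapes are illustrative
b = np.zeros(n_intents)

p_ID = fuse(h_e, h_p, W, b)           # second probability vector
```

Because the output activation is a sigmoid applied element-wise, each entry of p_ID is an independent probability, which is what allows one text to belong to multiple intention categories.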
In the embodiment of the application, the second probability vector is obtained based on the first probability vector and the attention vector, and the first probability vector is based on the probability that a plurality of words in the text are respectively marked as the target preset category labels, so that the result of entity recognition is also considered when the intention recognition is carried out, and the accuracy of the intention recognition can be further improved.
In some embodiments, the intent recognition module includes a classification head sub-module, by which the first probability vector and the attention vector are fused to obtain the second probability vector. With continued reference to fig. 4, the intent recognition module includes an attention mechanism sub-module and a classification head sub-module, and the classification head sub-module derives the second probability vector based on the first probability vector and the attention vector output by the attention mechanism sub-module. Optionally, the computer device may further output the intention category corresponding to the maximum probability in the second probability vector, to obtain the intention category to which the text most likely belongs. In the embodiment of the application, the classification head sub-module can use a single-layer fully connected layer or a multi-layer perceptron structure.
It should be noted that, in the embodiment of the present application, the execution order of steps 303 and 304 may be changed; the step numbers are for convenience of description and do not limit the execution order. Step 303 may be performed before step 304, after step 304, or simultaneously with step 304. The embodiment of the present application takes the case where step 303 and step 304 are executed simultaneously as an example, so that entity recognition and intention recognition can be performed at the same time, improving recognition efficiency. Moreover, because entity recognition and intention recognition are jointly performed based on the target recognition model, the information required by both only needs to be obtained by executing steps 301-302 once, which avoids the cumbersome process of two separate modules each executing steps 301-302; the intention recognition module also utilizes part of the entity recognition result, so a better intention recognition effect can be achieved.
In the embodiment of the application, the target recognition model is obtained by training based on a plurality of training samples in a training sample set, and the training process of the target recognition model includes the following steps: the computer equipment inputs a training sample into the target recognition model, obtains a predicted entity recognition result through the entity recognition module, and determines a first loss value based on the predicted entity recognition result and the real entity recognition result, wherein the first loss value represents the difference between the predicted entity recognition result and the real entity recognition result; the computer equipment obtains a predicted intention recognition result through the intention recognition module, and determines a second loss value based on the predicted intention recognition result and the real intention recognition result, wherein the second loss value represents the difference between the predicted intention recognition result and the real intention recognition result; the computer equipment performs weighted summation on the first loss value and the second loss value to obtain a third loss value; and the computer equipment adjusts the model parameters of the target recognition model based on the third loss value.
In some embodiments, Negative Log-Likelihood is used as the model training loss function of the entity recognition module, i.e., the first loss value is obtained by this function. See formula (14) below.
L_ER = -∑_x log p(y_x | H_c,x) (14);
Wherein L_ER represents the first loss value, x represents a training sample, y_x represents the true tag sequence of training sample x, and H_c,x represents the hidden vector sequence of the words of training sample x.
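A minimal numerical sketch of formula (14); the sequence probabilities are illustrative stand-ins for p(y_x | H_c,x).

```python
import numpy as np

def nll_loss(seq_probs):
    # Formula (14): L_ER = -sum_x log p(y_x | H_c,x). Each entry of
    # seq_probs is the probability the model assigns to the true tag
    # sequence y_x of one training sample x (values are illustrative).
    return -np.sum(np.log(np.asarray(seq_probs)))

loss = nll_loss([0.9, 0.5])   # two training samples
```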
In some embodiments, Binary Cross Entropy is used as the model training loss function of the intent recognition module, i.e., the second loss value is obtained by this function. See formula (15) below.
L_ID = -∑_x ∑_y [y_x log p_ID(x,y) + (1-y_x) log(1-p_ID(x,y))] (15);
Wherein L_ID represents the second loss value, x represents a training sample, y represents an intent category, y_x indicates whether training sample x belongs to intent category y (y_x = 1 if so, otherwise y_x = 0), and p_ID(x, y) represents the element of the second probability vector of training sample x that corresponds to intent category y.
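Formula (15) can be sketched as follows, with the conventional leading minus sign of binary cross-entropy; the targets and predictions are illustrative.

```python
import numpy as np

def bce_intent_loss(P, Y, eps=1e-12):
    # Formula (15): binary cross-entropy summed over samples x and intent
    # categories y. P[x, y] is the element of sample x's second probability
    # vector for category y; Y[x, y] is 1 iff sample x belongs to category y.
    P = np.clip(P, eps, 1.0 - eps)   # avoid log(0)
    return -np.sum(Y * np.log(P) + (1.0 - Y) * np.log(1.0 - P))

Y = np.array([[1, 0, 1], [0, 1, 0]], dtype=float)   # multi-label targets
P = np.array([[0.9, 0.1, 0.8], [0.2, 0.7, 0.1]])    # predicted probabilities
loss = bce_intent_loss(P, Y)
```

The summation over y is what supports the multi-label setting; for the single-intent case noted below in the source, the loss would be replaced by a Cross Entropy over a softmax output.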
It should be noted that the above embodiment obtains the second probability vector and the second loss value by taking the case where one text may belong to multiple intention categories as an example. In the case where one text can only belong to one intention category, the sigmoid function in formula (13) for determining the second probability vector is changed to a softmax function, and the loss function of the intention recognition module is changed to a Cross Entropy function, thereby improving the flexibility of intention recognition and model training.
In an embodiment of the present application, the computer device performs weighted summation on the first loss value and the second loss value to obtain a third loss value, which is implemented by the following formula (16).
L = λ L_ER + (1-λ) L_ID (16);
Wherein L represents the third loss value, L_ER represents the first loss value, L_ID represents the second loss value, λ represents the weight of the first loss value, and 1-λ represents the weight of the second loss value. λ ∈ [0,1] is a hyperparameter used to adjust the relative importance of the entity recognition module and the intention recognition module during model training.
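Formula (16) is a simple weighted sum; a worked example:

```python
def joint_loss(L_ER, L_ID, lam=0.5):
    # Formula (16): L = λ·L_ER + (1-λ)·L_ID with λ ∈ [0, 1], a
    # hyperparameter weighting entity recognition against intent recognition.
    assert 0.0 <= lam <= 1.0
    return lam * L_ER + (1.0 - lam) * L_ID

L = joint_loss(2.0, 4.0, lam=0.25)   # 0.25*2 + 0.75*4 = 3.5
```

Setting λ = 1 trains only the entity recognition objective and λ = 0 only the intent recognition objective; intermediate values trade the two off.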
The training process is an iterative training process, and training stops when an iteration stop condition is reached. The iteration stop condition includes: the number of iterations reaching a preset number of iterations, the third loss value converging, or the third loss value reaching a preset loss value.
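The stop conditions can be sketched as a single predicate; the names and thresholds (max_iterations, preset_loss, tol) are illustrative, not taken from the source.

```python
def should_stop(iteration, loss, prev_loss,
                max_iterations=100, preset_loss=0.01, tol=1e-6):
    # Sketch of the three iteration stop conditions listed above.
    reached_iterations = iteration >= max_iterations  # preset iteration count
    converged = abs(loss - prev_loss) < tol           # third loss value converged
    below_preset = loss <= preset_loss                # reached preset loss value
    return reached_iterations or converged or below_preset

stop = should_stop(100, 0.5, 0.6)   # True: preset iteration count reached
```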
In the embodiment of the application, in the training process of the target recognition model, the model parameters can be updated by gradient descent methods such as Stochastic Gradient Descent (SGD) and Adam (adaptive moment estimation).
In some embodiments, because the entity recognition capability is poor in the initial stage of model training, in order to enable the intention recognition module to better learn how to utilize the model parameters of the entity recognition module, the entity recognition module is first trained alone using the first loss value, and joint training is then performed based on both loss values, that is, the target recognition model is trained to perform entity recognition and intention recognition simultaneously, so that the training effect of the target recognition model can be improved.
By the method provided by the embodiment of the application, entity and intention recognition can be completed quickly, and using the second vocabulary, which can be expanded in real time, improves the recognition accuracy when rare words appear in the text.
In the embodiment of the application, in order to show the effect of the provided method on entity recognition and intention recognition, experiments were carried out using entity recognition and intention recognition data from a hospital customer service question-answering scenario. The data includes 55 intent categories and 20 entity categories, covering about 60,000 manually labeled questions; each question may have one or more correct intent categories, and the questions are divided into training, development and test sets in an 8:1:1 ratio. The results of comparing the method provided by the embodiment of the present application with methods that perform only intention recognition or only entity recognition are shown in table 1 below. As can be seen from the table, compared with the method that performs only intention recognition, the method provided by the embodiment of the application greatly improves the effect of intention recognition, and compared with the method that performs only entity recognition, it also achieves a better entity recognition effect. The F1 Score is an index used in statistics to measure the accuracy of binary classification models; it combines precision and recall.
TABLE 1
In some embodiments, the effect of the method provided by the embodiments of the present application can also be seen from the two texts in table 2. The "no second vocabulary" method in table 2 represents the method obtained by removing the second vocabulary portion from the method provided in the embodiment of the present application, and the "second vocabulary" method represents the method provided in the embodiment of the present application. In both texts in table 2, "photophobia" and "bond tooth" are rare words; the method without the second vocabulary makes incorrect entity recognition and intention recognition due to the lack of information about these rare words, while the method provided by the embodiment of the present application makes the correct recognition.
TABLE 2
The method provided by the embodiment of the application can be applied to a medical question-answering system. For a question input into the system, the method can identify the corresponding entity categories and intention, and the identified result is used to answer the question and plays an important role in downstream AI algorithms. The method can serve as a key technology in intelligent triage guidance and intelligent hospital question-answering assistance. For example, for an input question asking where the pediatrics department is located, the method provided by the embodiment of the application can identify that the most likely intention of the question is "asking for directions", and that one key entity is "pediatrics" with entity category "department", so it is judged that the purpose of the question is to learn the location of the "pediatrics" department in the hospital, and a correct answer is given according to the corresponding answer preconfigured in the system. Referring to fig. 5, fig. 5 is an interface schematic diagram of an application scenario provided in an embodiment of the present application.
The embodiment of the application provides a method for identifying entities and intentions, which can determine a plurality of words in a text based on a first vocabulary comprising non-rare words and a second vocabulary comprising rare words, so that the rare words in the text can be determined, and the determined words are more comprehensive. And the feature vector of the rare word is represented based on the feature vector of the entity class to which the rare word belongs, so that the related information of the rare word can be obtained, and more accurate recognition results can be obtained when the entity recognition and the intention recognition are carried out, thereby improving the accuracy of the entity recognition and the intention recognition.
Fig. 6 is a block diagram of an entity and intent recognition device provided in accordance with an embodiment of the present application. The device is used for executing the steps when the entity and intention identification method is executed, and referring to fig. 6, the device comprises:
a word determining module 601, configured to determine a plurality of words in the text based on a first vocabulary and a second vocabulary, where the first vocabulary includes a plurality of non-rare words, and the second vocabulary includes a plurality of rare words and entity categories to which the plurality of rare words belong;
a feature vector obtaining module 602, configured to obtain feature vectors of a plurality of words, where feature vectors of rare words in the plurality of words are represented based on feature vectors of entity classes to which the rare words belong;
The feature vector obtaining module 602 is further configured to obtain feature vectors of a plurality of characters in the text;
the hidden vector sequence determining module 603 is configured to obtain a hidden vector sequence of the text based on the feature vectors of the plurality of words and the feature vectors of the plurality of characters, where the hidden vector sequence includes a hidden vector sequence of the words and a hidden vector sequence of the characters; the hidden vector sequence of the words includes hidden vectors of the plurality of words, and the hidden vector sequence of the characters includes hidden vectors of the plurality of characters;
the entity recognition module 604 is configured to perform entity recognition on the text based on the hidden vector sequence of the words in the hidden vector sequence, so as to obtain an entity recognition result;
the intention recognition module 605 is configured to perform intention recognition on the text based on the hidden vector sequence, and obtain an intention recognition result.
In some embodiments, the entity recognition result and the intent recognition result are obtained based on a target recognition model, the target recognition model including an input module, a coding module, an entity recognition module, and an intent recognition module;
the feature vector obtaining module 602 is configured to input the text into the input module of the target recognition model, and obtain the feature vectors of the plurality of words and the feature vectors of the plurality of characters through the input module;
the hidden vector sequence determining module 603 is configured to input the feature vectors of the plurality of words and the feature vectors of the plurality of characters into the encoding module, and perform attention feature extraction on them through the encoding module to obtain the hidden vector sequence;
The entity recognition module 604 is configured to input the hidden vector sequence of the word into the entity recognition module, and obtain an entity recognition result through the entity recognition module;
the intention recognition module 605 is configured to input the hidden vector sequence into the intention recognition module, and obtain an intention recognition result through the intention recognition module.
In some embodiments, the feature vector acquisition module 602 is configured to:
inputting the feature vectors of the plurality of words and the feature vectors of the plurality of characters into the encoding module;
performing linear transformation on the first feature matrix through the encoding module to obtain a second feature matrix, wherein each row vector in the first feature matrix is a feature vector of a word or a feature vector of a character;
obtaining a weight matrix based on a plurality of row vectors in the first feature matrix, wherein each element in the weight matrix represents the degree of association between two row vectors in the first feature matrix;
and performing activation processing on the weight matrix, and obtaining a hidden vector sequence based on the weight matrix and the second feature matrix after the activation processing.
In some embodiments, the weight matrix includes a plurality of elements, a feature vector acquisition module 602 to:
acquiring a first position and a second position, wherein the first position is the position, in the text, of the character or word indicated by the i-th row vector in the first feature matrix, the second position is the position, in the text, of the character or word indicated by the j-th row vector in the first feature matrix, and i and j are integers greater than 0;
Determining a position vector based on a position difference of the first position and the second position;
and fusing the ith row vector, the jth row vector and the position vector to obtain elements positioned in the ith row and the jth column in the weight matrix.
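The three steps above can be sketched as follows. The exact way the two row vectors and the position vector are "fused" is not specified in the source, so the dot-product-plus-position form below is an assumption, as are all dimensions.

```python
import numpy as np

def weight_element(F, i, j, positions, W_r):
    # One element (row i, column j) of the weight matrix: the i-th and j-th
    # row vectors of the first feature matrix F are fused with a position
    # vector derived from the position difference of the corresponding
    # characters/words. The dot-product-plus-position fusion is an
    # assumption; the patent only states that the three vectors are fused.
    diff = np.array([positions[i] - positions[j]], dtype=float)
    pos_vec = W_r @ diff                 # position vector from the difference
    return F[i] @ F[j] + F[i] @ pos_vec

rng = np.random.default_rng(4)
F = rng.normal(size=(4, 5))              # first feature matrix, 4 row vectors
positions = [0, 1, 2, 3]                 # positions in the text
W_r = rng.normal(size=(5, 1))            # hypothetical position projection
A = np.array([[weight_element(F, i, j, positions, W_r)
               for j in range(4)] for i in range(4)])
```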
In some embodiments, the entity recognition result is a target tag sequence of text, the target tag sequence includes category tags of a plurality of words, and the category tag of each word is used for representing an entity category of a target word where the word is located and a position of the word in the target word;
an entity recognition module 604 for inputting the hidden vector sequence of words into the entity recognition module;
determining, by the entity recognition module, for any one of the plurality of words, a degree parameter for each of a plurality of preset category labels based on the hidden vector of the word and the preset category label of a word having a preset position difference from the word, where the degree parameter is used for indicating how appropriate it is for the word to be labeled as the corresponding preset category label;
determining probabilities of a plurality of candidate tag sequences based on the degree parameters of the plurality of words being labeled as the plurality of preset category labels, where the plurality of candidate tag sequences are obtained by combining the plurality of preset category labels;
And outputting the candidate tag sequence with the highest probability among the plurality of candidate tag sequences as a target tag sequence.
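The selection of the target tag sequence can be sketched as follows; enumerating candidate sequences is only illustrative (a conditional random field would use Viterbi decoding instead), and the BIO-style labels are hypothetical.

```python
def best_tag_sequence(candidates, probabilities):
    # Output the candidate tag sequence with the highest probability as the
    # target tag sequence.
    best = max(range(len(candidates)), key=lambda k: probabilities[k])
    return candidates[best]

# Hypothetical labels: "B-dep"/"I-dep" mark the beginning/inside of a word
# of entity category "department"; "O" marks no entity.
candidates = [["B-dep", "I-dep", "O"],
              ["O", "B-dep", "I-dep"],
              ["O", "O", "O"]]
probabilities = [0.2, 0.7, 0.1]
target = best_tag_sequence(candidates, probabilities)
```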
In some embodiments, the intent recognition result includes probabilities of multiple intent categories, and the intent recognition module 605 is configured to:
inputting the hidden vector sequence into an intention recognition module;
determining multiple target preset category labels from multiple preset category labels through an intention recognition module, wherein any one preset category label indicates the entity category of a target word where a marked word is located and the position of the word in the target word, and the target preset category label indicates that the marked word is located at the first position of the target word;
for each target preset category label in the plurality of target preset category labels, determining the probability that the plurality of words are respectively marked as the target preset category labels based on the hidden vectors of the plurality of words;
determining the probability of labeling the target preset category label in the text based on the probability of labeling the plurality of words as the target preset category label respectively;
determining a first probability vector based on the probability that a plurality of target preset category labels are marked in the text;
extracting attention features of the hidden vector sequence to obtain attention vectors;
and fusing the first probability vector and the attention vector to obtain a second probability vector, wherein the second probability vector comprises probabilities of a plurality of intention categories.
In some embodiments, the intent recognition module 605 is configured to determine, for each word of the plurality of words, a probability that the word is labeled as a target preset category label based on the hidden vector of the word and a target model parameter, the target model parameter being a model parameter obtained from the entity recognition module for determining the probability that the word is labeled as any preset category label.
In some embodiments, the intent recognition module 605 is configured to:
performing linear transformation on the first hidden vector matrix to obtain a second hidden vector matrix, wherein each row vector in the first hidden vector matrix is one hidden vector in the hidden vector sequence;
obtaining a weight vector based on a plurality of row vectors in the first hidden vector matrix, wherein each element in the weight vector represents the importance degree of one row vector in the first hidden vector matrix;
and performing activation processing on the weight vector, and obtaining the attention vector based on the weight vector and the second hidden vector matrix after the activation processing.
In some embodiments, the apparatus further comprises a training module for:
inputting a training sample into a target recognition model, obtaining a predicted entity recognition result through an entity recognition module, and determining a first loss value based on the predicted entity recognition result and a real entity recognition result, wherein the first loss value represents the difference between the predicted entity recognition result and the real entity recognition result;
obtaining a predicted intention recognition result through the intention recognition module, and determining a second loss value based on the predicted intention recognition result and the real intention recognition result, wherein the second loss value represents the difference between the predicted intention recognition result and the real intention recognition result;
the first loss value and the second loss value are weighted and summed to obtain a third loss value;
model parameters of the object recognition model are adjusted based on the third loss value.
The embodiment of the application provides an entity and intention recognition device. Because a plurality of words in a text are determined based on a first vocabulary including non-rare words and a second vocabulary including rare words, the rare words in the text can be determined, so that the determined words are more comprehensive. And the feature vector of a rare word is represented based on the feature vector of the entity category to which the rare word belongs, so that the related information of the rare word can be obtained, and more accurate recognition results can be obtained when entity recognition and intention recognition are performed, thereby improving the accuracy of entity recognition and intention recognition.
Fig. 7 shows a block diagram of a terminal 700 according to an exemplary embodiment of the present application. The terminal 700 may be a portable mobile terminal such as a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 700 may also be referred to by other names such as user device, portable terminal, laptop terminal, or desktop terminal.
In general, the terminal 700 includes: a processor 701 and a memory 702.
Processor 701 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like. The processor 701 may be implemented in at least one hardware form of DSP (Digital Signal Processing ), FPGA (Field-Programmable Gate Array, field programmable gate array), PLA (Programmable Logic Array ). The processor 701 may also include a main processor, which is a processor for processing data in an awake state, also referred to as a CPU (Central Processing Unit ); a coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 701 may be integrated with a GPU (Graphics Processing Unit, image processor) for taking care of rendering and drawing of content that the display screen is required to display. In some embodiments, the processor 701 may also include an AI (Artificial Intelligence ) processor for processing computing operations related to machine learning.
Memory 702 may include one or more computer-readable storage media, which may be non-transitory. The memory 702 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 702 is used to store at least one program code for execution by processor 701 to implement the method of identifying entities and intents provided by the method embodiments of the present application.
In some embodiments, the terminal 700 may further optionally include: a peripheral interface 703 and at least one peripheral. The processor 701, the memory 702, and the peripheral interface 703 may be connected by a bus or signal lines. The individual peripheral devices may be connected to the peripheral device interface 703 via buses, signal lines or a circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 704, a display 705, a camera assembly 706, audio circuitry 707, and a power supply 708.
The peripheral interface 703 may be used to connect at least one I/O (Input/Output)-related peripheral device to the processor 701 and the memory 702. In some embodiments, the processor 701, the memory 702, and the peripheral interface 703 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 701, the memory 702, and the peripheral interface 703 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 704 is configured to receive and transmit RF (Radio Frequency) signals, also referred to as electromagnetic signals. The radio frequency circuit 704 communicates with communication networks and other communication devices via electromagnetic signals, converting electrical signals into electromagnetic signals for transmission and converting received electromagnetic signals back into electrical signals. Optionally, the radio frequency circuit 704 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so on. The radio frequency circuit 704 may communicate with other terminals via at least one wireless communication protocol, including but not limited to: the World Wide Web, metropolitan area networks, intranets, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 704 may also include NFC (Near Field Communication)-related circuitry, which is not limited by the present application.
The display screen 705 is used to display a UI (User Interface), which may include graphics, text, icons, video, and any combination thereof. When the display 705 is a touch display, it can also collect touch signals on or above its surface; such a touch signal may be input to the processor 701 as a control signal for processing. In this case, the display 705 may also provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display 705, disposed on the front panel of the terminal 700; in other embodiments, there may be at least two displays 705, disposed on different surfaces of the terminal 700 or in a folded design; in still other embodiments, the display 705 may be a flexible display disposed on a curved or folded surface of the terminal 700. The display 705 may even be arranged in a non-rectangular irregular pattern, i.e., a shaped screen. The display 705 may be made of materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
The camera assembly 706 is used to capture images or video. Optionally, the camera assembly 706 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth camera can be fused to realize a background blurring function, or the main camera and the wide-angle camera can be fused to realize panoramic shooting, VR (Virtual Reality) shooting, or other fused shooting functions. In some embodiments, the camera assembly 706 may also include a flash, which can be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash combines a warm-light flash and a cold-light flash and can be used for light compensation under different color temperatures.
The audio circuit 707 may include a microphone and a speaker. The microphone collects sound waves from the user and the environment, converts them into electrical signals, and inputs them to the processor 701 for processing or to the radio frequency circuit 704 for voice communication. For stereo acquisition or noise reduction, a plurality of microphones may be disposed at different parts of the terminal 700. The microphone may also be an array microphone or an omnidirectional pickup microphone. The speaker converts electrical signals from the processor 701 or the radio frequency circuit 704 into sound waves. The speaker may be a conventional thin-film speaker or a piezoelectric ceramic speaker. A piezoelectric ceramic speaker can convert electrical signals not only into sound waves audible to humans but also into sound waves inaudible to humans, for purposes such as ranging. In some embodiments, the audio circuit 707 may also include a headphone jack.
The power supply 708 is used to power the various components in the terminal 700. The power supply 708 may use alternating current, direct current, disposable batteries, or rechargeable batteries. When the power supply 708 includes a rechargeable battery, the battery may support wired charging (charged through a wired line) or wireless charging (charged through a wireless coil), and may also support fast-charging technology.
In some embodiments, the terminal 700 further includes one or more sensors 709. The one or more sensors 709 include, but are not limited to: acceleration sensor 710, gyro sensor 711, pressure sensor 712, optical sensor 713, and proximity sensor 714.
The acceleration sensor 710 may detect the magnitudes of acceleration on the three coordinate axes of a coordinate system established with the terminal 700; for example, it may detect the components of gravitational acceleration on the three axes. The processor 701 may control the display screen 705 to display the user interface in landscape or portrait view according to the gravitational acceleration signal acquired by the acceleration sensor 710. The acceleration sensor 710 may also be used to collect motion data for games or for the user.
The gyro sensor 711 may detect a body direction and a rotation angle of the terminal 700, and the gyro sensor 711 may collect a 3D motion of the user on the terminal 700 in cooperation with the acceleration sensor 710. The processor 701 may implement the following functions according to the data collected by the gyro sensor 711: motion sensing (e.g., changing UI according to a tilting operation by a user), image stabilization at shooting, game control, and inertial navigation.
The pressure sensor 712 may be disposed on a side frame of the terminal 700 and/or beneath the display screen 705. When disposed on a side frame, it can detect the user's grip signal on the terminal 700, and the processor 701 performs left/right-hand recognition or shortcut operations according to that grip signal. When disposed beneath the display screen 705, the processor 701 controls operability controls on the UI according to the user's pressure operations on the display screen 705. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The optical sensor 713 is used to collect the intensity of ambient light. In one embodiment, the processor 701 may control the display brightness of the display screen 705 based on the ambient light intensity collected by the optical sensor 713. Specifically, when the intensity of the ambient light is high, the display brightness of the display screen 705 is turned up; when the ambient light intensity is low, the display brightness of the display screen 705 is turned down. In another embodiment, the processor 701 may also dynamically adjust the shooting parameters of the camera assembly 706 based on the ambient light intensity collected by the optical sensor 713.
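The brightness logic described above (raise display brightness when ambient light is strong, lower it when weak) might be sketched as a simple mapping; the lux thresholds and output range below are purely illustrative assumptions, not values from the source:

```python
def adjust_brightness(ambient_lux, lo=50.0, hi=500.0):
    # lo/hi thresholds are illustrative assumptions, not from the patent.
    if ambient_lux <= lo:
        return 0.2          # dim floor for dark environments
    if ambient_lux >= hi:
        return 1.0          # full brightness in strong light
    # Linear interpolation between the floor and full brightness.
    return 0.2 + 0.8 * (ambient_lux - lo) / (hi - lo)
```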
The proximity sensor 714, also known as a distance sensor, is typically provided on the front panel of the terminal 700 and is used to collect the distance between the user and the front of the terminal. In one embodiment, when the proximity sensor 714 detects that this distance is gradually decreasing, the processor 701 controls the display 705 to switch from the on-screen state to the off-screen state; when it detects that the distance is gradually increasing, the processor 701 controls the display 705 to switch from the off-screen state back to the on-screen state.
Those skilled in the art will appreciate that the structure shown in fig. 7 is not limiting of the terminal 700 and may include more or fewer components than shown, or may combine certain components, or may employ a different arrangement of components.
Fig. 8 is a schematic structural diagram of a server according to an embodiment of the present application. The server 800 may vary considerably depending on configuration or performance, and may include one or more processors (Central Processing Units, CPUs) 801 and one or more memories 802, where the memories 802 store executable program code and the processors 801 execute that code to implement the entity and intention recognition method provided by the foregoing method embodiments. Of course, the server may also have a wired or wireless network interface, a keyboard, an input/output interface, and other components for implementing the functions of the device, which are not described here.
The embodiment of the present application also provides a computer-readable storage medium storing at least one program code, which is loaded and executed by a processor to implement the entity and intention recognition method of any of the above implementations.
The embodiment of the present application also provides a computer program product comprising computer program code stored in a computer-readable storage medium. A processor of the terminal reads the computer program code from the computer-readable storage medium and executes it, causing the terminal to perform the entity and intention recognition method of any of the above implementations.
In some embodiments, the computer program product according to an embodiment of the present application may be deployed and executed on one terminal, on a plurality of terminals located at one site, or on a plurality of terminals distributed across a plurality of sites and interconnected by a communication network; the terminals distributed across a plurality of sites and interconnected by a communication network may constitute a blockchain system.
The foregoing is merely illustrative of the present application and is not intended to limit it; any modifications, equivalent substitutions, improvements, and the like made within the spirit and principles of the present application shall fall within its scope of protection.
Claims (13)
1. A method of identifying an entity and an intent, the method comprising:
determining a plurality of words in a text based on a first vocabulary and a second vocabulary, the first vocabulary comprising a plurality of non-rare words, the second vocabulary comprising a plurality of rare words and the entity categories to which the plurality of rare words belong;
acquiring feature vectors of the plurality of words, wherein the feature vectors of rare words in the plurality of words are represented based on the feature vectors of entity categories to which the rare words belong;
acquiring feature vectors of a plurality of characters in the text;
obtaining a hidden vector sequence of the text based on the feature vectors of the words and the feature vectors of the characters, wherein the hidden vector sequence comprises a hidden vector sequence of the words and a hidden vector sequence of the characters, the hidden vector sequence of the words comprising the hidden vectors of the words and the hidden vector sequence of the characters comprising the hidden vectors of the characters;
carrying out entity recognition on the text based on the hidden vector sequence of the characters in the hidden vector sequence to obtain an entity recognition result;
and carrying out intention recognition on the text based on the hidden vector sequence to obtain an intention recognition result.
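The rare-word representation in claim 1 (a rare word is represented via the feature vector of the entity category it belongs to) can be sketched as follows; the vocabularies, words, and vectors here are toy illustrative assumptions, not data from the patent:

```python
import numpy as np

# Hypothetical toy vocabularies (all names and vectors are illustrative).
first_vocab = {"play": np.array([1.0, 0.0]), "song": np.array([0.0, 1.0])}
second_vocab = {"zorblax": "artist"}                 # rare word -> entity category
category_vectors = {"artist": np.array([0.5, 0.5])}  # category feature vector

def word_feature(word):
    # A rare word borrows the feature vector of its entity category;
    # non-rare words use their own vectors from the first vocabulary.
    if word in second_vocab:
        return category_vectors[second_vocab[word]]
    return first_vocab[word]

features = [word_feature(w) for w in ["play", "zorblax", "song"]]
```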
2. The method of claim 1, wherein the entity recognition result and the intent recognition result are derived based on a target recognition model, the target recognition model comprising an input module, a coding module, an entity recognition module, and an intent recognition module;
the process of acquiring the feature vectors of the plurality of words and the feature vectors of the plurality of characters comprises the following steps:
inputting the text into the input module of the target recognition model, and obtaining the feature vectors of the words and the feature vectors of the characters through the input module;
the obtaining the hidden vector sequence of the text based on the feature vectors of the words and the feature vectors of the characters comprises the following steps:
inputting the feature vectors of the plurality of words and the feature vectors of the plurality of characters into the coding module, and performing attention feature extraction on the feature vectors of the plurality of words and the feature vectors of the plurality of characters through the coding module to obtain the hidden vector sequence;
the carrying out entity recognition on the text based on the hidden vector sequence of the characters in the hidden vector sequence to obtain an entity recognition result comprises the following steps:
inputting the hidden vector sequence of the characters into the entity recognition module, and obtaining the entity recognition result through the entity recognition module;
the intention recognition is carried out on the text based on the hidden vector sequence to obtain an intention recognition result, which comprises the following steps:
and inputting the hidden vector sequence into the intention recognition module, and obtaining the intention recognition result through the intention recognition module.
3. The method according to claim 2, wherein the inputting the feature vectors of the plurality of words and the feature vectors of the plurality of characters into the encoding module and performing attention feature extraction on them through the encoding module to obtain the hidden vector sequence comprises:
inputting the feature vectors of the plurality of words and the feature vectors of the plurality of characters into the encoding module;
performing linear transformation on a first feature matrix through the coding module to obtain a second feature matrix, wherein each row vector in the first feature matrix is the feature vector of one word or the feature vector of one character;
obtaining a weight matrix based on a plurality of row vectors in the first feature matrix, wherein each element in the weight matrix represents the degree of association between two row vectors in the first feature matrix;
and performing activation processing on the weight matrix, and obtaining the hidden vector sequence based on the weight matrix after the activation processing and the second feature matrix.
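The encoding step in claim 3 resembles self-attention: a linear transform produces the second feature matrix, pairwise associations between row vectors form the weight matrix, and an activation weights the transformed rows into hidden vectors. A minimal numpy sketch under that reading — the dot-product association, scaling, and softmax activation are assumptions, since the claim does not fix the exact operators:

```python
import numpy as np

def encode(X, W_v):
    # X: first feature matrix, one row per word/character feature vector.
    # W_v: assumed linear transform producing the second feature matrix.
    V = X @ W_v                                   # second feature matrix
    scores = (X @ X.T) / np.sqrt(X.shape[1])      # association of row-vector pairs
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)             # activation (softmax)
    return w @ V                                  # hidden-vector sequence
```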
4. The method according to claim 3, wherein the weight matrix comprises a plurality of elements, and the determination of the element in the ith row and the jth column of the weight matrix comprises:
acquiring a first position and a second position, wherein the first position is the position in the text of the character or word indicated by the ith row vector in the first feature matrix, the second position is the position in the text of the character or word indicated by the jth row vector in the first feature matrix, and i and j are integers greater than 0;
determining a position vector based on a position difference of the first position and the second position;
and fusing the ith row vector, the jth row vector, and the position vector to obtain the element located in the ith row and the jth column of the weight matrix.
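The per-element computation in claim 4 can be sketched as follows; the additive fusion of dot products is an assumption, since the claim specifies only which vectors are fused (the two row vectors and a position vector derived from their position difference), not the fusion operator:

```python
import numpy as np

def weight_element(X, pos_table, i, j):
    # X: first feature matrix; pos_table maps a position difference to a
    # position vector (an assumed lookup table). The fusion below — sum of
    # dot products — is illustrative, not specified by the claim.
    p = pos_table[i - j]            # position vector from the position difference
    return float(X[i] @ X[j] + X[i] @ p)
```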
5. The method of claim 2, wherein the entity recognition result is a target tag sequence of the text, the target tag sequence comprising category tags of the plurality of characters, the category tag of each character representing the entity category of the target word in which the character is located and the position of the character in that target word;
the inputting the hidden vector sequence of the characters into the entity recognition module and obtaining the entity recognition result through the entity recognition module comprises the following steps:
inputting the hidden vector sequence of the characters into the entity recognition module;
determining, by the entity recognition module, for any one of the plurality of characters, degree parameters of the character being labeled with each of a plurality of preset category labels, based on the hidden vector of the character and the preset category labels of characters having a preset position difference from the character, wherein the degree parameter for each preset category label indicates how suitable it is to label the character with that preset category label;
determining probabilities of a plurality of candidate tag sequences based on the degree parameters of the plurality of characters being labeled with the plurality of preset category labels, the plurality of candidate tag sequences being obtained by combining the plurality of preset category labels;
and outputting the candidate tag sequence with the highest probability among the plurality of candidate tag sequences as the target tag sequence.
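Selecting the highest-scoring candidate tag sequence from per-token degree parameters plus scores between neighbouring labels is classically done with Viterbi decoding over a CRF-style objective, which the wording of claim 5 suggests. A minimal sketch under that assumption (the score layout and decoding procedure are not fixed by the claim):

```python
import numpy as np

def viterbi(emissions, transitions):
    # emissions[t, k]: degree parameter for labeling token t with label k.
    # transitions[a, b]: score for label b following label a.
    # Returns the highest-scoring candidate tag sequence as label indices.
    T, K = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)     # best previous label for each label
        score = cand.max(axis=0)
    seq = [int(score.argmax())]
    for t in range(T - 1, 0, -1):         # backtrack to recover the sequence
        seq.append(int(back[t, seq[-1]]))
    return seq[::-1]
```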
6. The method of claim 2, wherein the intention recognition result comprises probabilities of a plurality of intention categories, and the inputting the hidden vector sequence into the intention recognition module and obtaining the intention recognition result through the intention recognition module comprises:
inputting the hidden vector sequence into the intention recognition module;
determining, by the intention recognition module, a plurality of target preset category labels from a plurality of preset category labels, wherein any preset category label indicates the entity category of the target word in which a labeled character is located and the position of the character in that target word, and a target preset category label indicates that the labeled character is located at the head of the target word;
for each of the plurality of target preset category labels, determining the probabilities that the plurality of characters are respectively labeled with that target preset category label, based on the hidden vectors of the plurality of characters;
determining the probability that the target preset category label is labeled in the text based on the probabilities that the plurality of characters are respectively labeled with that target preset category label;
determining a first probability vector based on the probability that the plurality of target preset category labels are marked in the text;
extracting attention features of the hidden vector sequence to obtain an attention vector;
and fusing the first probability vector and the attention vector to obtain a second probability vector, wherein the second probability vector comprises probabilities of a plurality of intention categories.
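The final fusion step of claim 6 — combining the first probability vector (built from head-of-word labels) with the attention vector to obtain intent-category probabilities — might look like the following; the concatenation fusion and the softmax projection are assumed details, since the claim does not specify the fusion operator:

```python
import numpy as np

def intent_probs(first_prob_vec, attn_vec, W, b):
    # Fuse the label-probability vector with the attention vector
    # (concatenation is an assumed fusion) and project to intent
    # category probabilities with a softmax (also assumed).
    fused = np.concatenate([first_prob_vec, attn_vec])
    logits = W @ fused + b
    e = np.exp(logits - logits.max())
    return e / e.sum()            # second probability vector
```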
7. The method of claim 6, wherein the determining the probabilities that the plurality of characters are respectively labeled with the target preset category label based on the hidden vectors of the plurality of characters comprises:
for each of the plurality of characters, determining the probability that the character is labeled with the target preset category label based on the hidden vector of the character and a target model parameter, wherein the target model parameter is a model parameter acquired from the entity recognition module and used for determining the probability that a character is labeled with any preset category label.
8. The method of claim 6, wherein the performing attention feature extraction on the sequence of hidden vectors to obtain an attention vector comprises:
performing linear transformation on the first hidden vector matrix to obtain a second hidden vector matrix, wherein each row vector in the first hidden vector matrix is one hidden vector in the hidden vector sequence;
obtaining a weight vector based on a plurality of row vectors in the first hidden vector matrix, wherein each element in the weight vector represents the importance degree of one row vector in the first hidden vector matrix;
and performing activation processing on the weight vector, and obtaining the attention vector based on the weight vector after the activation processing and the second hidden vector matrix.
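Claim 8's attention pooling — an importance weight per hidden vector, an activation, then a weighted combination of the linearly transformed vectors — might be sketched as follows; the dot-product scoring vector and the softmax activation are assumed details:

```python
import numpy as np

def attention_pool(H, W_v, w_score):
    # H: first hidden-vector matrix (one row per hidden vector).
    # W_v: assumed linear transform -> second hidden-vector matrix.
    # w_score: assumed scoring vector giving each row's importance.
    V = H @ W_v
    scores = H @ w_score               # importance of each row vector
    a = np.exp(scores - scores.max())
    a /= a.sum()                       # activation (softmax) -> weight vector
    return a @ V                       # attention vector
```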
9. The method of claim 2, wherein the training process of the object recognition model comprises:
inputting a training sample into the target recognition model, obtaining a predicted entity recognition result through the entity recognition module, and determining a first loss value based on the predicted entity recognition result and a real entity recognition result, wherein the first loss value represents the gap between the predicted entity recognition result and the real entity recognition result;
obtaining a predicted intent recognition result through the intent recognition module, and determining a second loss value based on the predicted intent recognition result and a real intent recognition result, wherein the second loss value represents a gap between the predicted intent recognition result and the real intent recognition result;
weighting and summing the first loss value and the second loss value to obtain a third loss value;
model parameters of the object recognition model are adjusted based on the third loss value.
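The training objective in claim 9 is a weighted sum of the entity-recognition loss and the intent-recognition loss. A minimal sketch, with the weight `alpha` as an assumed hyperparameter (the claim fixes only that the third loss is a weighted sum):

```python
def joint_loss(entity_loss, intent_loss, alpha=0.5):
    # Third loss value: weighted sum of the two task losses.
    # alpha balances entity recognition against intent recognition
    # and is an assumed hyperparameter, not specified in the source.
    return alpha * entity_loss + (1.0 - alpha) * intent_loss
```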
10. An apparatus for identifying an entity and an intent, the apparatus comprising:
a word determining module, configured to determine a plurality of words in a text based on a first vocabulary and a second vocabulary, where the first vocabulary includes a plurality of non-rare words, and the second vocabulary includes a plurality of rare words and entity categories to which the plurality of rare words belong;
a feature vector acquisition module, configured to acquire feature vectors of the plurality of words, the feature vector of a rare word among the plurality of words being represented based on the feature vector of the entity category to which the rare word belongs;
the feature vector acquisition module being further configured to acquire feature vectors of a plurality of characters in the text;
a hidden vector sequence determining module, configured to obtain a hidden vector sequence of the text based on the feature vectors of the words and the feature vectors of the characters, wherein the hidden vector sequence comprises a hidden vector sequence of the words and a hidden vector sequence of the characters, the hidden vector sequence of the words comprising the hidden vectors of the words and the hidden vector sequence of the characters comprising the hidden vectors of the characters;
an entity recognition module, configured to perform entity recognition on the text based on the hidden vector sequence of the characters in the hidden vector sequence to obtain an entity recognition result;
and the intention recognition module is used for carrying out intention recognition on the text based on the hidden vector sequence to obtain an intention recognition result.
11. A computer device, characterized in that it comprises a processor and a memory, the memory being used to store at least one segment of a computer program, which is loaded by the processor to perform the entity and intention recognition method of any one of claims 1 to 9.
12. A computer readable storage medium, characterized in that the computer readable storage medium is for storing at least one segment of a computer program for performing the entity and intention recognition method of any one of claims 1 to 9.
13. A computer program product, characterized in that it comprises a computer program code, which is stored in a computer readable storage medium, from which computer program code a processor of a computer device reads, which processor executes the computer program code, so that the computer device performs the method of identifying an entity and an intention as claimed in any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211167620.5A CN117009470A (en) | 2022-09-23 | 2022-09-23 | Entity and intention recognition method, device, equipment, storage medium and product |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117009470A true CN117009470A (en) | 2023-11-07 |
Family
ID=88573367
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211167620.5A Pending CN117009470A (en) | 2022-09-23 | 2022-09-23 | Entity and intention recognition method, device, equipment, storage medium and product |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117009470A (en) |
Similar Documents

Publication | Title
---|---
CN110490213B (en) | Image recognition method, device and storage medium
CN111985240B (en) | Named entity recognition model training method, named entity recognition method and named entity recognition device
CN112669928B (en) | Structured information construction method and device, computer equipment and storage medium
CN112069309B (en) | Information acquisition method, information acquisition device, computer equipment and storage medium
CN110209784B (en) | Message interaction method, computer device and storage medium
CN113515942A (en) | Text processing method and device, computer equipment and storage medium
CN111680123B (en) | Training method and device for dialogue model, computer equipment and storage medium
CN113516143B (en) | Text image matching method, device, computer equipment and storage medium
CN110555102A (en) | Media title recognition method, device and storage medium
CN113763931B (en) | Waveform feature extraction method, waveform feature extraction device, computer equipment and storage medium
CN113569042A (en) | Text information classification method and device, computer equipment and storage medium
CN117633198A (en) | Training method of role dialogue model, dialogue generation method, device and equipment
CN117273019A (en) | Training method of dialogue model, dialogue generation method, device and equipment
CN115116437B (en) | Speech recognition method, device, computer equipment, storage medium and product
CN110990549B (en) | Method, device, electronic equipment and storage medium for obtaining answer
CN115130456A (en) | Sentence parsing and matching model training method, device, equipment and storage medium
CN112488157B (en) | Dialogue state tracking method and device, electronic equipment and storage medium
CN113822084A (en) | Statement translation method and device, computer equipment and storage medium
CN117454954A (en) | Model training method, device, computer equipment and storage medium
CN112988984B (en) | Feature acquisition method and device, computer equipment and storage medium
CN116956814A (en) | Punctuation prediction method, punctuation prediction device, punctuation prediction equipment and storage medium
CN111597823B (en) | Method, device, equipment and storage medium for extracting center word
CN117009470A (en) | Entity and intention recognition method, device, equipment, storage medium and product
CN114510942A (en) | Method for acquiring entity words, and method, device and equipment for training model
CN114996514B (en) | Text generation method, device, computer equipment and medium
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination