WO2024179519A1 - Semantic recognition method and apparatus - Google Patents
- Publication number
- WO2024179519A1 (application PCT/CN2024/079034, CN2024079034W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- text
- keyword
- texts
- word
- vectors
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3347—Query execution using vector based model
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Creation or modification of classes or clusters
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
Definitions
- the present application belongs to the field of data processing technology, and specifically relates to a semantic recognition method and device thereof.
- the relevant technology usually performs semantic recognition on the text according to a pre-set vocabulary to obtain the text category corresponding to the text.
- the above vocabulary includes keywords and text categories. It is easy to understand that the above vocabulary can represent the mapping relationship between keywords and text categories, and the above keywords can be represented by the word vectors included in the vocabulary. However, the word vectors representing the keywords in the above vocabulary may not be accurate enough, which affects the recognition results of semantic recognition and thus reduces the accuracy of semantic recognition.
- the purpose of the embodiments of the present application is to provide a semantic recognition method and device thereof, which can solve the problem of low accuracy of semantic recognition.
- an embodiment of the present application provides a semantic recognition method, which includes: obtaining N first texts and second texts; each of the first texts includes a first keyword in a preset vocabulary, where N is a positive integer greater than 1; performing first text processing on each of the first texts through a preset text processing model to obtain M character vectors corresponding to the first keyword in each of the first texts, where M is a positive integer greater than 1; determining a first word vector corresponding to each of the first texts based on the M character vectors; and performing semantic recognition on the second text based on the first word vector corresponding to each of the first texts.
- an embodiment of the present application provides a semantic recognition device, which includes: an acquisition module, used to acquire N first texts and second texts; each of the first texts includes a first keyword in a preset vocabulary, N is a positive integer greater than 1; a first processing module, used to perform first text processing on each of the first texts through a preset text processing model, and obtain M character vectors corresponding to the first keyword in each of the first texts, M is a positive integer greater than 1; a first determination module, used to determine the first word vector corresponding to each of the first texts based on the M character vectors; a first recognition module, used to perform semantic recognition on the second text based on the first word vector corresponding to each of the first texts.
- an embodiment of the present application provides an electronic device, which includes a processor, a memory, and a program or instruction stored in the memory and executable on the processor, wherein the program or instruction, when executed by the processor, implements the steps of the method described in the first aspect.
- an embodiment of the present application provides a readable storage medium, on which a program or instruction is stored, and when the program or instruction is executed by a processor, the steps of the method described in the first aspect are implemented.
- an embodiment of the present application provides a chip, comprising a processor and a communication interface, wherein the communication interface is coupled to the processor, and the processor is used to run a program or instruction to implement the method described in the first aspect.
- an embodiment of the present application provides a computer program product, which is stored in a storage medium and is executed by at least one processor to implement the method described in the first aspect.
- the embodiment of the present application performs first text processing on each first text through a preset text processing model, obtains M character vectors corresponding to the first keyword in each first text, and then determines the first word vector based on the M character vectors.
- the above-mentioned first word vector is determined based on the M character vectors, which fully considers the context information in the first text, and does not directly determine the text feature vector corresponding to the first text as the word vector as in the related art;
- the above-mentioned M character vectors are obtained by performing first text processing on each first text through the text processing model, thereby taking into account the semantics of the first keyword in the first text.
- the above method ensures that the first word vector retains the semantics of the keyword while not being affected by contextual information, thereby improving the accuracy of the subsequent semantic recognition of the second text according to the first word vector corresponding to each first text.
- FIG1 is a flow chart of a semantic recognition method provided by an embodiment of the present application.
- FIG2 is an application flow chart of updating a vocabulary table provided in an embodiment of the present application.
- FIG3 is a structural diagram of a semantic recognition device provided in an embodiment of the present application.
- FIG4 is a structural diagram of an electronic device provided in an embodiment of the present application.
- FIG5 is a hardware structure diagram of an electronic device provided in an embodiment of the present application.
- the terms "first", "second", etc. in the specification and claims of this application are used to distinguish similar objects, and are not used to describe a specific order or sequence. It should be understood that the data used in this way can be interchanged under appropriate circumstances, so that the embodiments of the present application can be implemented in an order other than those illustrated or described here, and the objects distinguished by "first", "second", etc. are generally of one type, and the number of objects is not limited.
- for example, the first object can be one or more.
- "and/or" in the specification and claims represents at least one of the connected objects, and the character "/" generally indicates that the objects associated with each other are in an "or" relationship.
- the text is usually classified by a pre-set vocabulary, and the vocabulary can represent the mapping relationship between keywords and text categories, and the keywords can be represented by word vectors included in the vocabulary.
- the text keywords in the text are extracted, and the similarity between the word vectors of the text keywords and the word vectors representing the keywords in the vocabulary is calculated. According to the similarity value, the text category corresponding to the text is determined to achieve semantic recognition of the text.
- the word vectors of keywords are usually obtained in the following two ways to build a vocabulary:
- the above text processing model includes a BERT model, a Markov model or other types of text processing models.
- the text including the keyword is input into a preset text processing model to obtain a text feature vector corresponding to the text, and then the text feature vector is determined as the word vector corresponding to the keyword.
- the text feature vector is a sentence vector.
- the word vector obtained by the above method 1 cannot fully express the semantics of the keyword, which in turn affects the accuracy of the vocabulary. In the process of classifying texts through the vocabulary, the accuracy of the semantic recognition results is also reduced.
- the word vector obtained by the above method 2 is easily affected by the context information in the text, that is, the word vector obtained by method 2 is not sufficiently accurate, which in turn affects the accuracy of the vocabulary. In the process of classifying the text through the vocabulary, the accuracy of the semantic recognition result is also reduced.
- an embodiment of the present application provides a semantic recognition method.
- the semantic recognition method provided by the embodiment of the present application is described in detail below through specific embodiments and application scenarios in combination with the accompanying drawings.
- the present application embodiment provides a semantic recognition method. Please refer to Figure 1, which is a flow chart of the semantic recognition method provided by the present application embodiment.
- the semantic recognition method provided by the present application embodiment includes the following steps:
- N first texts are obtained, and each first text includes a first keyword, where N is a positive integer greater than 1.
- the N first texts may be obtained from a preset database, or downloaded from the Internet.
- the specific method for obtaining the N first texts is not limited herein.
- the first text mentioned above is a public opinion text used in a public opinion analysis project.
- the first text may be a sentence.
- a second text is obtained, and the second text can be understood as a text to be classified.
- the second text includes a second keyword, and optionally, the second keyword can be a manually annotated opinion word or an object word representing an opinion.
- S102 Perform first text processing on each of the first texts using a preset text processing model to obtain M character vectors corresponding to the first keyword in each of the first texts.
- a text processing model is pre-set.
- the text processing model includes a BERT model, a Markov model or other types of text processing models.
- N first texts are input into the text processing model, and each first text is processed by the text processing model to obtain M character vectors corresponding to the first keyword in each first text.
- M is a positive integer greater than 1.
- the above character vector is also called a token vector, and the above character vector is a vector corresponding to each character in the first keyword.
- the processing in which the text processing model processes the input text to obtain a vector corresponding to each character in the input text is called the first text processing.
- S103 Determine a first word vector corresponding to each of the first texts according to the M character vectors.
- these M character vectors can be used to determine the first word vector corresponding to each first text, wherein the first word vector corresponding to the above first text is used to represent the first keyword in the first text.
- S104 Perform semantic recognition on the second text according to the first word vector corresponding to each of the first texts.
- semantic recognition can be performed on the second text based on the N first word vectors.
- each first text is processed by a preset text processing model to obtain M character vectors corresponding to the first keyword in each first text, and the first word vector is then determined based on the M character vectors.
- the above-mentioned first word vector is determined based on M character vectors, taking full account of the context information in the first text, rather than directly determining the text feature vector corresponding to the first text as the word vector as in the related art; the above-mentioned M character vectors are obtained by performing first text processing on each first text through a text processing model, thereby taking into account the semantics of the first keyword in the first text.
- the above-mentioned method ensures that the first word vector will not be affected by the context information while retaining the semantics of the keyword, thereby improving the accuracy of semantic recognition in the subsequent process of semantic recognition of the second text based on the first word vector corresponding to each first text.
- performing text processing on each of the first texts by using a preset text processing model to obtain M character vectors corresponding to the first keyword in each of the first texts includes:
- An average value of at least part of the character vectors corresponding to each of the target characters is determined as the M character vectors corresponding to the first keyword in the first text.
- multiple character vectors corresponding to each character in the first text are obtained; then, based on the position of the character corresponding to the first keyword in the first text, M target characters for representing the first keyword from the multiple characters included in the first text are determined; the average value of at least part of the character vector corresponding to each target character is determined as the character vector corresponding to the target character, and then the M character vectors corresponding to the M target characters are determined as the M character vectors corresponding to the first keyword.
- the text processing model is a BERT model including 25 network layers
- the first text is a sentence including 10 characters, "Cannot charge, charging is too slow"
- the first keyword is "too slow".
- the first text is input into the BERT model, and each network layer of the BERT model outputs 10 character vectors, that is, the BERT model outputs a total of 250 character vectors, and each character in the first text corresponds to 25 character vectors.
- the output result of the i-th layer of the BERT model can be expressed as [vec_i1, vec_i2, ..., vec_i10].
- the first keyword is located at the last three characters in the first text, and then the last three characters in the first text are determined as 3 target characters.
- the character vectors output by the i-th layer for the target characters can be represented as [vec_i8, vec_i9, vec_i10].
- each character corresponds to 25 character vectors, that is, each target character also corresponds to 25 character vectors.
- the average value of the 10th to 25th character vectors corresponding to each target character can be determined as the character vector corresponding to that target character.
- taking one character of the first keyword as an example, that character corresponds to 25 character vectors
- these 25 character vectors are the character vectors output by the 1st to 25th network layers of the BERT model respectively; the average value of the character vectors output by the 10th to 25th network layers of the BERT model is calculated, and the average value is determined as the character vector corresponding to that character of the first keyword.
- the same method is used to determine a character vector for each of the other two characters of the first keyword, and the resulting three character vectors are then determined as the three character vectors corresponding to the first keyword.
- the average value of all character vectors corresponding to each target character may also be determined as the character vector corresponding to that target character.
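- the layer-averaging example above can be sketched as follows. This is a minimal illustration rather than the implementation of the application: the function names, the nested-list layout `layer_outputs[layer][character]`, and the toy vectors standing in for real model outputs are all assumptions made for the example.

```python
from typing import List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise average of equally sized vectors."""
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]


def keyword_char_vectors(
    layer_outputs: List[List[Vector]],  # layer_outputs[layer][char] -> vector
    keyword_start: int,                 # 0-based index of the keyword's first character
    keyword_length: int,                # M, the number of target characters
    first_layer: int = 10,              # 1-based, inclusive
    last_layer: int = 25,               # 1-based, inclusive
) -> List[Vector]:
    """Return one layer-averaged vector per target character of the keyword."""
    selected_layers = layer_outputs[first_layer - 1:last_layer]
    result = []
    for position in range(keyword_start, keyword_start + keyword_length):
        per_layer = [layer[position] for layer in selected_layers]
        result.append(mean_vector(per_layer))
    return result


# Toy stand-in for a 25-layer model over a 10-character text (hypothetical data):
# the vector of character c at layer l is [l, c], both counted from 1.
toy_outputs = [[[float(l), float(c)] for c in range(1, 11)] for l in range(1, 26)]

# The keyword occupies the last three characters (0-based positions 7, 8, 9).
char_vectors = keyword_char_vectors(toy_outputs, keyword_start=7, keyword_length=3)
print(char_vectors)  # [[17.5, 8.0], [17.5, 9.0], [17.5, 10.0]]
```

Passing `first_layer=1` averages over all 25 layers, which corresponds to the alternative of averaging all character vectors of each target character.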
- each first text is processed by a preset text processing model to obtain M character vectors corresponding to the first keyword in each first text, so that the first text including the first keyword and some context information is processed, taking into account the semantics of the first keyword in the first text to avoid semantic loss of the first keyword.
- determining, based on the M character vectors, a first word vector corresponding to each of the first texts includes:
- an average value of the M character vectors corresponding to the first keyword in the first text is determined as the first word vector corresponding to the first text.
- the average value of the M character vectors corresponding to the first keyword can be determined as the first word vector corresponding to the first text. It should be understood that in other embodiments, the M character vectors corresponding to the first keyword can be weighted and summed, and the weighted sum result can be determined as the first word vector corresponding to the first text.
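- the two ways of combining the M character vectors described above (plain average, or weighted sum) might look like this. The function name and the example weights are hypothetical; the application does not specify how weights would be chosen.

```python
from typing import List, Optional

Vector = List[float]


def word_vector(char_vectors: List[Vector],
                weights: Optional[List[float]] = None) -> Vector:
    """Combine M character vectors into one word vector.

    With weights=None the result is the plain average;
    otherwise it is the weighted sum of the character vectors.
    """
    if not char_vectors:
        raise ValueError("at least one character vector is required")
    if weights is None:
        count = len(char_vectors)
        return [sum(component) / count for component in zip(*char_vectors)]
    if len(weights) != len(char_vectors):
        raise ValueError("one weight per character vector is required")
    dimension = len(char_vectors[0])
    return [
        sum(weight * vector[d] for weight, vector in zip(weights, char_vectors))
        for d in range(dimension)
    ]


chars = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]   # M = 3 toy character vectors
averaged = word_vector(chars)                   # [2.0, 2.0]
weighted = word_vector(chars, [0.5, 0.25, 0.25])  # [1.75, 1.5], hypothetical weights
print(averaged, weighted)
```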
- the first word vector corresponding to each first text is determined based on M character vectors, rather than directly determining the text feature vector corresponding to the first text as the word vector as in the related art, so as to avoid the representation of the first word vector being affected by the context information in the first text.
- performing semantic recognition on the second text according to the first word vector corresponding to each of the first texts includes:
- the text category corresponding to the second text is determined through the updated vocabulary.
- a vocabulary is pre-set, the vocabulary includes multiple first keywords and multiple text categories, and the vocabulary is used to represent the mapping relationship between the first keywords and the text categories.
- the vocabulary can be used to classify texts, as described in the background technology, extract text keywords from the text to be classified, calculate the similarity between the word vector of the above text keyword and the word vector representing the first keyword in the vocabulary, and determine the text category corresponding to the text to be classified according to the similarity value.
- the above-mentioned preset vocabulary can be updated according to the first word vector corresponding to each first text, and then the text category corresponding to the second text is determined by the updated vocabulary, so as to achieve semantic recognition of the second text.
- the word vector of the second text may also be extracted, and at least one first word vector may be determined based on the similarity between each first word vector and the word vector of the second text, and then the text type corresponding to the first word vector may be determined as the text type corresponding to the second text, thereby achieving semantic recognition of the second text.
- the vocabulary can represent the mapping relationship between a first keyword having multiple semantics and the text category; in this case, the technical solution involved in the following embodiment can be used to update the vocabulary:
- updating a preset vocabulary according to the first word vector corresponding to each of the first texts includes:
- the word list is updated according to the K second word vectors corresponding to the K clusters; the updated word list includes K second word vectors for representing the first keyword, and each second word vector is used to represent a semantics corresponding to the first keyword.
- N first word vectors are clustered, and first word vectors with similar semantics are clustered in one cluster to obtain K clusters; wherein each cluster includes at least one first word vector, and K is a positive integer less than or equal to N.
- the average value of at least one first word vector included in each cluster is determined as the second word vector corresponding to the cluster.
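- the second word vector corresponding to the cluster can be determined by the following formula:
- $\overline{\mathrm{Vec}}_I = \frac{1}{N_I} \sum_{i=1}^{N_I} \mathrm{Vec}_i$
- $\overline{\mathrm{Vec}}_I$ represents the second word vector corresponding to the I-th cluster
- $N_I$ represents the number of first word vectors included in the I-th cluster
- $\mathrm{Vec}_i$ represents the i-th first word vector in the I-th cluster.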
- the updated word list includes K second word vectors for representing the first keyword, that is, the first keyword can be represented by K second word vectors, and each second word vector is used to represent a semantics corresponding to the first keyword.
- the updated word list includes 3 second word vectors for representing the first keyword, indicating that the updated word list includes 3 semantics of the first keyword.
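- the clustering step can be illustrated with a small sketch. The application does not name a clustering algorithm, so the greedy, threshold-based grouping by cosine similarity used here is an assumption chosen for brevity, as are the function names and the threshold value; any clustering method that groups semantically similar first word vectors would fit the description.

```python
import math
from typing import List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise average of equally sized vectors."""
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def cluster_word_vectors(first_word_vectors: List[Vector],
                         threshold: float = 0.8) -> List[List[Vector]]:
    """Greedy clustering (assumed method): each vector joins the most similar
    existing cluster if the similarity reaches the threshold, else starts a new one."""
    clusters: List[List[Vector]] = []
    for vector in first_word_vectors:
        best_index, best_similarity = -1, threshold
        for index, members in enumerate(clusters):
            similarity = cosine_similarity(vector, mean_vector(members))
            if similarity >= best_similarity:
                best_index, best_similarity = index, similarity
        if best_index < 0:
            clusters.append([vector])
        else:
            clusters[best_index].append(vector)
    return clusters


def second_word_vectors(clusters: List[List[Vector]]) -> List[Vector]:
    """One second word vector per cluster: the average of its first word vectors."""
    return [mean_vector(members) for members in clusters]


# N = 4 toy first word vectors that fall into K = 2 semantic groups.
first_vectors = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0], [0.0, 3.0]]
clusters = cluster_word_vectors(first_vectors)
seconds = second_word_vectors(clusters)
print(len(clusters), seconds)  # 2 [[2.0, 0.0], [0.0, 2.0]]
```

Because every vector either joins an existing cluster or opens a new one, the number of clusters K never exceeds N.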
- the vocabulary is used to represent a mapping relationship between a first keyword and a text category, and determining the text category corresponding to the second text through the updated vocabulary includes:
- the text feature vector corresponding to the second text is obtained by performing second text processing on the second text by the text processing model;
- the text category corresponding to the second text is determined according to the mapping relationship between the first keyword represented by the updated vocabulary and the text category.
- the vocabulary is used to characterize the mapping relationship between the first keyword and the text category.
- the text can be classified according to the updated vocabulary.
- the second keyword is queried in the updated vocabulary. If the updated vocabulary includes the second keyword, that is, the second keyword and the first keyword are the same keyword, the text category corresponding to the second text can be directly determined based on the mapping relationship represented by the updated vocabulary.
- the second text can be processed by a preset text processing model to obtain a text feature vector corresponding to the second text; the processing in which the text processing model processes the input text to obtain a text feature vector corresponding to the input text is called the second text processing.
- the second text is a sentence including the second keyword
- the text feature vector corresponding to the second text is a sentence vector.
- the similarity between the text feature vector and each second word vector in the vocabulary is calculated. If there is a second word vector whose similarity value with the text feature vector is higher than a preset threshold, the text category that has a mapping relationship with the second word vector is determined as the text category corresponding to the second text.
- the second word vector with the highest similarity value among the multiple second word vectors is determined, and the text category that has a mapping relationship with the second word vector is determined as the text category corresponding to the second text.
- the above-mentioned method of determining the similarity between the text feature vector and the second word vector includes, but is not limited to, based on cosine similarity between vectors, based on Euclidean distance between vectors or other calculation methods.
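- a minimal sketch of the similarity-based classification described above, assuming the updated vocabulary is held as a list of (second word vector, text category) pairs. The data layout, the category names and the threshold value are hypothetical. Cosine similarity drives the decision here; the Euclidean distance helper is shown only as the alternative measure mentioned in the text.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def euclidean_distance(a: Vector, b: Vector) -> float:
    """Straight-line distance between a and b; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(text_vector: Vector,
             vocabulary: List[Tuple[Vector, str]],
             threshold: float = 0.7) -> Optional[str]:
    """Return the category mapped to the most similar second word vector,
    provided its similarity is higher than the threshold; otherwise None."""
    best_category: Optional[str] = None
    best_similarity = threshold
    for word_vector, category in vocabulary:
        similarity = cosine_similarity(text_vector, word_vector)
        if similarity > best_similarity:
            best_category, best_similarity = category, similarity
    return best_category


# Hypothetical vocabulary entries and categories, for illustration only.
vocabulary = [([1.0, 0.0], "charging speed"), ([0.0, 1.0], "battery life")]
print(classify([0.9, 0.1], vocabulary))    # charging speed
print(classify([-1.0, -1.0], vocabulary))  # None
```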
- the updated vocabulary is used to perform text classification to achieve semantic recognition of the text. Since the second word vector included in the updated vocabulary can accurately represent the keyword, it will not affect the similarity calculation result between the word vector of the text keyword and the word vector representing the keyword in the vocabulary, thereby improving the accuracy of the semantic recognition result.
- the vocabulary can represent the mapping relationship between a first keyword having a single semantic and the text category; in this case, the technical solution involved in the following embodiment can be used to update the vocabulary:
- the method further includes:
- the N first texts are screened to obtain L first texts
- the benchmark word vector is obtained by performing first text processing on a preset benchmark text through the text processing model
- the third word vector is an average value of L fourth word vectors
- the L fourth word vectors are obtained by performing first text processing on the L first texts based on the text processing model
- the second text processing is performed on N first texts through the text processing model to obtain N text feature vectors.
- the implementation method of the second text processing is consistent with the implementation method of the second text processing involved in the above embodiment, and will not be repeated here.
- the above first text is a sentence
- the text feature vector is a sentence vector.
- the text processing model performs first text processing on the preset benchmark text to obtain a benchmark word vector.
- the benchmark text includes a first keyword, and the implementation of the first text processing is consistent with the implementation of the first text processing involved in the above embodiment, which will not be repeated here.
- the similarity between each text feature vector and the benchmark word vector is calculated, and the first texts corresponding to the text feature vectors whose similarity is higher than a second preset threshold are selected to obtain L first texts.
- the L first texts are processed by the text processing model to obtain L fourth word vectors; the average value of the L fourth word vectors is determined as the third word vector, and then the second text is semantically recognized based on the third word vector.
- a preset word list can be updated based on the third word vector, wherein the updated word list includes a third word vector for representing the first keyword, that is, the first keyword can be represented by 1 third word vector, and the third word vector is used to represent a semantics corresponding to the first keyword.
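- the screening and averaging steps above can be sketched as follows. For simplicity this example assumes that a keyword word vector has already been computed for every one of the N first texts, so the L fourth word vectors are simply the ones belonging to the texts that pass the screen; the function name, the use of cosine similarity and the threshold value are likewise assumptions.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def third_word_vector(
    text_feature_vectors: List[Vector],   # N sentence vectors (second text processing)
    keyword_word_vectors: List[Vector],   # N keyword vectors (first text processing)
    benchmark_word_vector: Vector,
    threshold: float,                     # the "second preset threshold"
) -> Tuple[Optional[Vector], List[int]]:
    """Screen the N first texts against the benchmark, then average the
    keyword vectors of the L texts that pass. Returns (vector, kept indices)."""
    kept = [
        index
        for index, text_vector in enumerate(text_feature_vectors)
        if cosine_similarity(text_vector, benchmark_word_vector) > threshold
    ]
    if not kept:
        return None, kept
    selected = [keyword_word_vectors[index] for index in kept]
    average = [sum(component) / len(selected) for component in zip(*selected)]
    return average, kept


# Toy data: the third text is unrelated to the benchmark and is screened out.
sentence_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
keyword_vectors = [[2.0, 0.0], [4.0, 2.0], [9.0, 9.0]]
vector, kept = third_word_vector(sentence_vectors, keyword_vectors, [1.0, 0.0], 0.8)
print(vector, kept)  # [3.0, 1.0] [0, 1]
```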
- a vocabulary is pre-set, and the preset vocabulary can be updated, and then the second text is semantically recognized through the updated vocabulary.
- the application process of updating the vocabulary provided by the present application is: obtain N first texts including the first keyword.
- the vocabulary is updated in the following manner: determine N first word vectors corresponding to N first texts, perform semantic clustering on the N first word vectors, and obtain K clusters; determine the second word vector corresponding to each cluster; and update the vocabulary according to the K second word vectors corresponding to the K clusters.
- the vocabulary is updated in the following manner: second text processing is performed on N first texts through a text processing model to obtain N text feature vectors; based on the similarity between the N text feature vectors and the benchmark word vector, L first texts are screened and obtained; the third word vector corresponding to the L first texts is determined; and based on the third word vector, the vocabulary is updated.
- performing semantic recognition on the second text according to the third word vector corresponding to the L first texts includes:
- a preset word list is updated;
- the updated word list includes the third word vector used to represent the first keyword, and the updated word list is used to represent the mapping relationship between the first keyword and the text category;
- the text feature vector corresponding to the second text is obtained by performing second text processing on the second text by the text processing model;
- the text category corresponding to the second text is determined according to the mapping relationship between the first keyword represented by the updated vocabulary and the text category.
- a vocabulary is pre-set, the vocabulary includes a plurality of first keywords and a plurality of text categories, and the vocabulary is used to represent the mapping relationship between the first keywords and the text categories.
- the vocabulary can be used to categorize text.
- the preset word list can be updated according to the third word vector.
- the specific updating method is consistent with the above-mentioned method of updating the word list according to the first word vector, and will not be repeated here.
- the updated word list includes a third word vector for representing the first keyword, that is, the first keyword can be represented by 1 third word vector, and the third word vector is used to represent a semantics corresponding to the first keyword.
- the text can be classified according to the updated vocabulary, thereby achieving semantic recognition of the text.
- the second keyword included in the second text is queried in the updated vocabulary. If the updated vocabulary includes the second keyword, that is, the second keyword and the first keyword are the same keyword, the text category corresponding to the second text can be directly determined based on the mapping relationship represented by the updated vocabulary.
- the second text can be processed by a preset text processing model to obtain a text feature vector corresponding to the second text; then the similarity between the text feature vector and the third word vector in the vocabulary is calculated. If the similarity value between the text feature vector and the third word vector is higher than a third preset threshold, the text category that has a mapping relationship with the third word vector is determined as the text category corresponding to the second text.
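- the two-stage decision described above (direct keyword lookup first, similarity against the third word vector otherwise) could be sketched like this. The dictionary layout mapping each keyword to a (third word vector, text category) pair, the keyword and category strings, the threshold value, and the choice to return the best match when several entries pass the threshold are all assumptions made for the example.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify_second_text(
    second_keyword: str,
    text_feature_vector: Vector,
    vocabulary: Dict[str, Tuple[Vector, str]],  # keyword -> (third word vector, category)
    threshold: float = 0.7,                     # the "third preset threshold"
) -> Optional[str]:
    """Use the direct mapping if the keyword is in the vocabulary; otherwise
    fall back to similarity between the sentence vector and each third word vector."""
    if second_keyword in vocabulary:
        return vocabulary[second_keyword][1]
    best_category: Optional[str] = None
    best_similarity = threshold
    for word_vector, category in vocabulary.values():
        similarity = cosine_similarity(text_feature_vector, word_vector)
        if similarity > best_similarity:
            best_category, best_similarity = category, similarity
    return best_category


# Hypothetical single-entry vocabulary.
vocabulary = {"too slow": ([1.0, 0.0], "charging speed")}
print(classify_second_text("too slow", [0.0, 1.0], vocabulary))   # charging speed (direct hit)
print(classify_second_text("sluggish", [0.9, 0.1], vocabulary))   # charging speed (similarity)
print(classify_second_text("sluggish", [0.0, 1.0], vocabulary))   # None
```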
- the text is classified by the updated vocabulary to achieve semantic recognition of the text. Since the third word vector included in the updated vocabulary can accurately represent the keyword, it will not affect the similarity calculation result between the word vector of the text keyword and the word vector representing the keyword in the vocabulary, thereby improving the accuracy of the semantic recognition result.
- the semantic recognition device 300 includes:
- An acquisition module 301 is used to acquire N first texts and second texts; each of the first texts includes a first keyword in a preset vocabulary, and N is a positive integer greater than 1;
- the first processing module 302 is used to perform first text processing on each of the first texts using a preset text processing model to obtain M character vectors corresponding to the first keyword in each of the first texts, where M is a positive integer greater than 1;
- a first determination module 303 configured to determine a first word vector corresponding to each of the first texts according to the M character vectors;
- the first recognition module 304 is used to perform semantic recognition on the second text according to the first word vector corresponding to each of the first texts.
- the first processing module 302 is specifically configured to:
- An average value of at least part of the character vectors corresponding to each of the target characters is determined as the M character vectors corresponding to the first keyword in the first text.
- the first determining module 303 is specifically configured to:
- an average value of the M character vectors corresponding to the first keyword in the first text is determined as the first word vector corresponding to the first text.
- the first identification module 304 is specifically configured to:
- a preset word list is updated; the word list includes the first keyword in each of the first texts;
- the text category corresponding to the second text is determined through the updated vocabulary.
- the first identification module 304 is further specifically configured to:
- each cluster includes at least one first word vector, and K is a positive integer less than or equal to N;
- the word list is updated according to the K second word vectors corresponding to the K clusters; the updated word list includes K second word vectors for representing the first keyword, and each second word vector is used to represent a semantics corresponding to the first keyword.
- the vocabulary is used to characterize a mapping relationship between the first keyword and the text category
- the first identification module 304 is further specifically configured to:
- the text feature vector corresponding to the second text is obtained by performing second text processing on the second text by the text processing model;
- the text category corresponding to the second text is determined according to the mapping relationship between the first keyword represented by the updated vocabulary and the text category.
- the semantic recognition device 300 further includes:
- a second processing module configured to perform second text processing on the N first texts by using the text processing model to obtain N text feature vectors corresponding to the N first texts one by one;
- a screening module configured to screen the N first texts according to the similarity between the N text feature vectors and the benchmark word vectors to obtain L first texts; the benchmark word vectors are obtained by performing first text processing on a preset benchmark text through the text processing model;
- a second determination module is used to determine third word vectors corresponding to L first texts; the third word vector is an average value of L fourth word vectors, and the L fourth word vectors are obtained by performing first text processing on the L first texts based on the text processing model;
- the second recognition module is used to perform semantic recognition on the second text according to the L third word vectors corresponding to the first texts.
- the second identification module is specifically used to:
- a preset word list is updated;
- the updated word list includes the third word vector used to represent the first keyword, and the updated word list is used to represent the mapping relationship between the first keyword and the text category;
- the text category corresponding to the second text is determined according to the similarity between the text feature vector corresponding to the second text and the third word vector; the text feature vector corresponding to the second text is obtained by performing second text processing on the second text by the text processing model;
- the text category corresponding to the second text is determined according to the mapping relationship between the first keyword represented by the updated vocabulary and the text category.
- the embodiment of the present application performs first text processing on each first text through a preset text processing model, obtains M character vectors corresponding to the first keyword in each first text, and then determines the first word vector based on the M character vectors.
- the above-mentioned first word vector is determined based on the M character vectors, which fully considers the context information in the first text, and does not directly determine the text feature vector corresponding to the first text as the word vector as in the related art;
- the above-mentioned M character vectors are obtained by performing first text processing on each first text through the text processing model, thereby taking into account the semantics of the first keyword in the first text.
- the above method ensures that the first word vector will not be affected by the context information while retaining the semantics of the keyword, thereby improving the accuracy of semantic recognition in the subsequent process of semantic recognition of the second text based on the first word vector corresponding to each first text.
- the semantic recognition device in the embodiment of the present application can be an electronic device, or a component in the electronic device, such as an integrated circuit or a chip.
- the electronic device can be a terminal, or it can be other devices other than a terminal.
- the electronic device can be a mobile phone, a tablet computer, a laptop computer, a PDA, a car-mounted electronic device, a mobile Internet device (Mobile Internet Device, MID), an augmented reality (augmented reality, AR)/virtual reality (virtual reality, VR) device, a robot, a wearable device, an ultra-mobile personal computer (ultra-mobile personal computer, UMPC), a netbook or a personal digital assistant (personal digital assistant, PDA), etc.
- NAS Network Attached Storage
- PC personal computer
- TV television
- teller machine a self-service machine
- the semantic recognition device in the embodiment of the present application may be a device having an operating system.
- the operating system may be an Android operating system, an iOS operating system, or other possible operating systems, which are not specifically limited in the embodiment of the present application.
- the embodiment of the present application provides a semantic recognition device that can implement various aspects of the method embodiment of FIG. 1. To avoid repetition, the process will not be described here.
- an embodiment of the present application also provides an electronic device 400, including a processor 401, a memory 402, and a program or instruction stored in the memory 402 and executable on the processor 401.
- When the program or instruction is executed by the processor 401, each process of the above-mentioned semantic recognition method embodiment is implemented, and the same technical effect can be achieved. To avoid repetition, details are not repeated here.
- the electronic devices in the embodiments of the present application include the mobile electronic devices and non-mobile electronic devices mentioned above.
- FIG. 5 is a schematic diagram of the hardware structure of an electronic device implementing an embodiment of the present application.
- the electronic device 500 includes but is not limited to: a radio frequency unit 501, a network module 502, an audio output unit 503, an input unit 504, a sensor 505, a display unit 506, a user input unit 507, an interface unit 508, a memory 509, and a processor 510.
- the electronic device 500 may also include a power source (such as a battery) for supplying power to each component, and the power source may be logically connected to the processor 510 through a power management system, so that the power management system can manage charging, discharging, and power consumption.
- the electronic device structure shown in FIG. 5 does not constitute a limitation on the electronic device, and the electronic device may include more or fewer components than shown, or combine certain components, or arrange components differently, which will not be described in detail here.
- the input unit 504 is further used to obtain N first texts and second texts;
- the processor 510 is further configured to perform first text processing on each first text by using a preset text processing model to obtain M character vectors corresponding to the first keyword in each first text;
- the embodiment of the present application performs first text processing on each first text through a preset text processing model, obtains M character vectors corresponding to the first keyword in each first text, and then determines the first word vector based on the M character vectors.
- the above first word vector is determined based on the M character vectors, taking full account of the context information in the first text, rather than directly determining the text feature vector corresponding to the first text as the word vector as in the related art; the above M character vectors are obtained by performing first text processing on each first text through the text processing model, thereby taking into account the semantics of the first keyword in the first text.
- the above method ensures that the first word vector will not be affected by context information while retaining the semantics of the keyword, thereby improving the accuracy of semantic recognition in the subsequent process of semantic recognition of the second text based on the first word vector corresponding to each first text.
- the input unit 504 may include a graphics processing unit (GPU) 5041 and a microphone 5042, and the graphics processor 5041 processes the image data of the static picture or video obtained by the image capture device (such as a camera) in the video capture mode or the image capture mode.
- the display unit 506 may include a display panel 5061, and the display panel 5061 may be configured in the form of a liquid crystal display, an organic light emitting diode, etc.
- the user input unit 507 includes at least one of a touch panel 5071 and other input devices 5072.
- the touch panel 5071 is also called a touch screen.
- the touch panel 5071 may include two parts: a touch detection device and a touch controller.
- Other input devices 5072 may include, but are not limited to, a physical keyboard, function keys (such as a volume control key, a switch key, etc.), a trackball, a mouse, and a joystick, which will not be repeated here.
- the memory 509 can be used to store software programs and various data.
- the memory 509 may mainly include a first storage area for storing programs or instructions and a second storage area for storing data, wherein the first storage area may store an operating system, an application program or instructions required for at least one function (such as a sound playback function, an image playback function, etc.), etc.
- the memory 509 may include a volatile memory or a non-volatile memory, or the memory 509 may include both volatile and non-volatile memories.
- the non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory.
- the volatile memory may be random access memory (RAM), static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDRSDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory (SLDRAM) and Direct Rambus RAM (DRRAM)
- the memory 509 in the embodiment of the present application includes but is
- the processor 510 may include one or more processing units; optionally, the processor 510 integrates an application processor and a modem processor, wherein the application processor mainly processes operations related to an operating system, a user interface, and application programs, and the modem processor mainly processes wireless communication signals, such as a baseband processor. It is understandable that the modem processor may not be integrated into the processor 510.
- An embodiment of the present application also provides a readable storage medium, on which a program or instruction is stored.
- when the program or instruction is executed by a processor, the various processes of the above-mentioned semantic recognition method embodiment are implemented and the same technical effect can be achieved. To avoid repetition, details are not repeated here.
- the processor is the processor in the electronic device described in the above embodiment.
- the readable storage medium includes a computer readable storage medium, such as a computer read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
- An embodiment of the present application further provides a chip, which includes a processor and a communication interface, wherein the communication interface is coupled to the processor, and the processor is used to run programs or instructions to implement the various processes of the above-mentioned semantic recognition method embodiment, and can achieve the same technical effect. To avoid repetition, it will not be repeated here.
- the chip mentioned in the embodiments of the present application can also be called a system-level chip, a system chip, a chip system or a system-on-chip chip, etc.
- An embodiment of the present application provides a computer program product, which is stored in a storage medium.
- the program product is executed by at least one processor to implement the various processes of the above-mentioned semantic recognition method embodiment and can achieve the same technical effect. To avoid repetition, it will not be repeated here.
- the technical solution of the present application can be embodied in the form of a computer software product, which is stored in a storage medium (such as ROM/RAM, a disk, or an optical disk), and includes a number of instructions for a terminal (which can be a mobile phone, a computer, a server, or a network device, etc.) to execute the methods described in each embodiment of the present application.
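The clustering step mentioned in the device description above (grouping the N first word vectors into K clusters and keeping K second word vectors, one per sense of the first keyword) can be sketched as follows. This is only an illustration under stated assumptions: the text does not name a clustering algorithm, so plain k-means is swapped in here, the second word vector is taken to be the cluster mean, and the names `kmeans`, `mean_vector`, and `sq_dist` are hypothetical.

```python
def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, iterations=10):
    """Assumed clustering choice: k-means with a deterministic start.

    Returns k cluster centers; each center plays the role of one
    'second word vector' in the updated vocabulary.
    """
    # Deterministic initialization for the sketch: the first k vectors.
    centers = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda j: sq_dist(v, centers[j]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster happens to be empty.
        centers = [
            mean_vector(c) if c else centers[j]
            for j, c in enumerate(clusters)
        ]
    return centers


# N = 4 toy first word vectors that fall into K = 2 obvious groups.
first_word_vectors = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0]]
second_word_vectors = kmeans(first_word_vectors, k=2)
```

With K less than or equal to N, each cluster center stands for one usage of the keyword, which matches the statement that each second word vector represents one semantics of the first keyword.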
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Character Discrimination (AREA)
Abstract
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to Chinese patent application No. 202310181906.7, filed with the China Patent Office on March 1, 2023 and entitled “Semantic Recognition Method and Device Thereof”, the entire contents of which are incorporated herein by reference.
The present application belongs to the field of data processing technology, and specifically relates to a semantic recognition method and device thereof.
At present, the related art usually performs semantic recognition on a text according to a preset vocabulary to obtain the text category corresponding to the text. The vocabulary includes keywords and text categories; that is, it represents the mapping relationship between keywords and text categories, and each keyword can be represented by a word vector included in the vocabulary. However, the word vectors representing the keywords in the vocabulary may not be accurate enough, which affects the recognition result and thus reduces the accuracy of semantic recognition.
Summary of the invention
The purpose of the embodiments of the present application is to provide a semantic recognition method and device thereof, which can solve the problem of low accuracy of semantic recognition.
In a first aspect, an embodiment of the present application provides a semantic recognition method, which includes: obtaining N first texts and a second text, where each of the first texts includes a first keyword in a preset vocabulary and N is a positive integer greater than 1; performing first text processing on each of the first texts through a preset text processing model to obtain M character vectors corresponding to the first keyword in each of the first texts, where M is a positive integer greater than 1; determining a first word vector corresponding to each of the first texts based on the M character vectors; and performing semantic recognition on the second text based on the first word vector corresponding to each of the first texts.
In a second aspect, an embodiment of the present application provides a semantic recognition device, which includes: an acquisition module, used to acquire N first texts and a second text, where each of the first texts includes a first keyword in a preset vocabulary and N is a positive integer greater than 1; a first processing module, used to perform first text processing on each of the first texts through a preset text processing model to obtain M character vectors corresponding to the first keyword in each of the first texts, where M is a positive integer greater than 1; a first determination module, used to determine the first word vector corresponding to each of the first texts based on the M character vectors; and a first recognition module, used to perform semantic recognition on the second text based on the first word vector corresponding to each of the first texts.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor, a memory, and a program or instruction stored in the memory and executable on the processor, wherein the program or instruction, when executed by the processor, implements the steps of the method described in the first aspect.
In a fourth aspect, an embodiment of the present application provides a readable storage medium, on which a program or instruction is stored, and when the program or instruction is executed by a processor, the steps of the method described in the first aspect are implemented.
In a fifth aspect, an embodiment of the present application provides a chip, comprising a processor and a communication interface, wherein the communication interface is coupled to the processor, and the processor is used to run a program or instruction to implement the method described in the first aspect.
In a sixth aspect, an embodiment of the present application provides a computer program product, which is stored in a storage medium and is executed by at least one processor to implement the method described in the first aspect.
The embodiment of the present application performs first text processing on each first text through a preset text processing model, obtains M character vectors corresponding to the first keyword in each first text, and then determines the first word vector based on the M character vectors. The first word vector is determined based on the M character vectors, which fully considers the context information in the first text, rather than directly taking the text feature vector corresponding to the first text as the word vector as in the related art; the M character vectors are obtained by performing first text processing on each first text through the text processing model, thereby taking into account the semantics of the first keyword in the first text. This ensures that the first word vector retains the semantics of the keyword without being affected by the context information, thereby improving the accuracy of semantic recognition when the second text is subsequently recognized based on the first word vector corresponding to each first text.
FIG. 1 is a flow chart of a semantic recognition method provided by an embodiment of the present application;
FIG. 2 is an application flow chart of updating a vocabulary provided by an embodiment of the present application;
FIG. 3 is a structural diagram of a semantic recognition device provided by an embodiment of the present application;
FIG. 4 is a structural diagram of an electronic device provided by an embodiment of the present application;
FIG. 5 is a hardware structure diagram of an electronic device provided by an embodiment of the present application.
The technical solutions in the embodiments of the present application are described clearly below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are some of the embodiments of the present application, rather than all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application fall within the scope of protection of this application.
The terms "first", "second", etc. in the specification and claims of this application are used to distinguish similar objects, and are not used to describe a specific order or sequence. It should be understood that the data used in this way are interchangeable under appropriate circumstances, so that the embodiments of the present application can be implemented in an order other than those illustrated or described here. The objects distinguished by "first", "second", etc. are generally of one type, and the number of objects is not limited; for example, the first object can be one or more. In addition, "and/or" in the specification and claims represents at least one of the connected objects, and the character "/" generally indicates that the associated objects are in an "or" relationship.
In the related art, a text is usually classified by a preset vocabulary. The vocabulary represents the mapping relationship between keywords and text categories, and each keyword can be represented by a word vector included in the vocabulary. Specifically, after a text is obtained, the text keywords in the text are extracted, and the similarity between the word vector of each text keyword and the word vectors representing keywords in the vocabulary is calculated. The text category corresponding to the text is determined according to the similarity value, thereby achieving semantic recognition of the text.
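The related-art lookup described above can be sketched as follows. This is a minimal illustration rather than the method of the application: the names `cosine` and `classify`, the 0.8 threshold, and the two-dimensional toy vectors are all assumptions made for the example, and cosine similarity is used only because the text does not name a specific similarity measure.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(keyword_vec, vocabulary, threshold=0.8):
    """Return the category of the most similar vocabulary entry.

    vocabulary: list of (word_vector, text_category) pairs (assumed layout).
    Returns None when no entry is more similar than the threshold.
    """
    best_category, best_sim = None, threshold
    for word_vec, category in vocabulary:
        sim = cosine(keyword_vec, word_vec)
        if sim > best_sim:
            best_category, best_sim = category, sim
    return best_category


# Toy vocabulary: two keywords mapped to two hypothetical text categories.
vocabulary = [([1.0, 0.0], "charging"), ([0.0, 1.0], "screen")]
result = classify([0.9, 0.1], vocabulary)
```

The quality of this lookup depends entirely on how well each vocabulary vector represents its keyword, which is the weakness the application sets out to address.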
The word vectors of keywords are usually obtained in one of the following two ways to build the vocabulary:
Method 1:
The keyword is input into a preset text processing model to obtain the word vector corresponding to the keyword. The text processing model includes a BERT model, a Markov model, or other types of text processing models.
Method 2:
A text that includes the keyword is input into a preset text processing model to obtain the text feature vector corresponding to the text, and the text feature vector is then taken as the word vector corresponding to the keyword. Optionally, if the text is a sentence, the text feature vector is a sentence vector.
However, the word vector obtained by Method 1 cannot fully express the semantics of the keyword, which affects the accuracy of the vocabulary and, when texts are classified through the vocabulary, reduces the accuracy of the semantic recognition result.
The word vector obtained by Method 2 is easily affected by the context information in the text; that is, the word vector obtained by Method 2 is not accurate enough, which affects the accuracy of the vocabulary and, when texts are classified through the vocabulary, reduces the accuracy of the semantic recognition result.
Based on the above technical problems, an embodiment of the present application provides a semantic recognition method. The method is described in detail below through specific embodiments and their application scenarios, with reference to the accompanying drawings.
An embodiment of the present application provides a semantic recognition method. Please refer to FIG. 1, which is a flow chart of the semantic recognition method provided by the embodiment of the present application. The method includes the following steps:
S101, obtaining N first texts and a second text.
In this step, N first texts are obtained, each first text includes the first keyword, and N is a positive integer greater than 1.
Optionally, the N first texts may be obtained from a preset database or downloaded from the Internet. The specific way of obtaining the N first texts is not limited herein.
Optionally, the first text is a public opinion text used in a public opinion analysis project.
Optionally, the first text may be a sentence.
In this step, a second text is also obtained; the second text can be understood as the text to be classified. The second text includes a second keyword. Optionally, the second keyword may be a manually annotated opinion word or an object word representing an opinion.
S102, performing first text processing on each of the first texts using a preset text processing model to obtain M character vectors corresponding to the first keyword in each of the first texts.
In this embodiment, a text processing model is preset. As mentioned above, the text processing model includes a BERT model, a Markov model, or other types of text processing models.
In this step, the N first texts are input into the text processing model, and first text processing is performed on each first text by the text processing model to obtain M character vectors corresponding to the first keyword in each first text. M is a positive integer greater than 1.
The character vectors are also called token vectors; each character vector is the vector corresponding to one character in the first keyword.
The way in which the text processing model processes an input text to obtain a vector corresponding to each character in the input text is referred to as first text processing.
S103, determining a first word vector corresponding to each of the first texts according to the M character vectors.
In this step, after the M character vectors corresponding to the first keyword in each first text are obtained, these M character vectors can be used to determine the first word vector corresponding to each first text. The first word vector corresponding to a first text is used to represent the first keyword in that first text.
S104, performing semantic recognition on the second text according to the first word vector corresponding to each of the first texts.
In this step, after the first word vector corresponding to each first text, that is, N first word vectors, is obtained, semantic recognition can be performed on the second text based on the N first word vectors. For the specific way of performing semantic recognition on the second text, please refer to the subsequent embodiments.
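Steps S101 to S103 can be sketched as follows, under stated assumptions. `toy_token_vectors` is a hypothetical stand-in for the text processing model (a real model such as BERT would be called here instead), one token is assumed per character, and the first word vector is taken as the mean of the M character vectors, as the device description above does for the first determination module. None of these names come from the application.

```python
from typing import Callable, List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def toy_token_vectors(text: str) -> List[Vector]:
    """Hypothetical model stand-in: one 2-d vector per character.

    The second component depends on the position, so the same character
    gets a different vector in a different context.
    """
    return [[float(ord(ch) % 10), float(i)] for i, ch in enumerate(text)]


def first_word_vector(
    text: str,
    keyword: str,
    encode: Callable[[str], List[Vector]] = toy_token_vectors,
) -> Vector:
    """S102 + S103 for one first text."""
    start = text.find(keyword)
    if start < 0:
        raise ValueError("each first text must contain the first keyword")
    token_vectors = encode(text)  # S102: one vector per character
    keyword_vectors = token_vectors[start:start + len(keyword)]  # M vectors
    return mean_vector(keyword_vectors)  # S103: assumed to be the mean


# S101: N = 2 toy first texts that both contain the keyword "slow".
first_texts = ["abc slow", "slow xyz"]
word_vectors = [first_word_vector(t, "slow") for t in first_texts]
```

The two resulting vectors differ even though the keyword is the same, which is the point of deriving one first word vector per first text rather than one per keyword.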
本申请实施例通过预设的文本处理模型对每个第一文本进行第一文本处理,获得每个第一文本中第一关键词对应的M个字符向量,进而根据M个 字符向量,确定第一词向量。上述第一词向量是根据M个字符向量确定的,充分考虑到了第一文本中的上下文信息,并非如相关技术中那样将第一文本对应的文本特征向量直接确定为词向量;上述M个字符向量是通过文本处理模型对每个第一文本进行第一文本处理得到的,以此考虑到了第一关键词在第一文本中的语义。通过上述方式确保第一词向量在保留有关键词的语义的同时不会受到上下文信息的影响,进而在后续根据每个第一文本对应的第一词向量,对第二文本进行语义识别的过程中,提高语义识别的准确性。In the embodiment of the present application, each first text is processed by a preset text processing model to obtain M character vectors corresponding to the first keyword in each first text, and then the M character vectors are processed according to the M Character vector, determine the first word vector. The above-mentioned first word vector is determined based on M character vectors, taking full account of the context information in the first text, rather than directly determining the text feature vector corresponding to the first text as the word vector as in the related art; the above-mentioned M character vectors are obtained by performing first text processing on each first text through a text processing model, thereby taking into account the semantics of the first keyword in the first text. The above-mentioned method ensures that the first word vector will not be affected by the context information while retaining the semantics of the keyword, thereby improving the accuracy of semantic recognition in the subsequent process of semantic recognition of the second text based on the first word vector corresponding to each first text.
可选地,所述通过预设的文本处理模型对每个所述第一文本进行文本处理,获得每个所述第一文本中第一关键词对应的M个字符向量,包括:Optionally, performing text processing on each of the first texts by using a preset text processing model to obtain M character vectors corresponding to the first keyword in each of the first texts includes:
对于每个所述第一文本,通过所述文本处理模型对所述第一文本进行第一文本处理,获得所述第一文本中每个字符对应的至少两个字符向量;For each of the first texts, performing first text processing on the first text by using the text processing model to obtain at least two character vectors corresponding to each character in the first text;
根据所述第一关键词在所述第一文本中的位置,确定M个目标字符;所述M个目标字符用于表征所述第一关键词;Determine M target characters according to the position of the first keyword in the first text; the M target characters are used to represent the first keyword;
将每个所述目标字符对应的至少部分字符向量的平均值,确定为所述第一文本中第一关键词对应的M个字符向量。An average value of at least part of the character vectors corresponding to each of the target characters is determined as the M character vectors corresponding to the first keyword in the first text.
In this embodiment, after first text processing is performed on the first text by the text processing model, multiple character vectors are obtained for each character in the first text. Then, according to the position in the first text of the characters that make up the first keyword, the M target characters that represent the first keyword are identified among the characters of the first text. For each target character, the average of at least some of its character vectors is determined as the character vector of that target character, and the M character vectors of the M target characters are determined as the M character vectors corresponding to the first keyword.
For example, the text processing model is a Bert model with 25 network layers, the first text is the 10-character sentence "充不进电,充电太慢了" ("Cannot charge, charging is too slow"; the comma counts as one character), and the first keyword is "太慢了" ("too slow").
这种情况下,将第一文本输入至Bert模型,Bert模型的每个网络层输出10个字符向量,即Bert模型总共输出250个字符向量,第一文本中的每个字符对应25个字符向量。示例性的,Bert模型第i层的输出结果可以表示为[vec_i1,vec_i2,…,vec_i10]。 In this case, the first text is input into the Bert model, and each network layer of the Bert model outputs 10 character vectors, that is, the Bert model outputs a total of 250 character vectors, and each character in the first text corresponds to 25 character vectors. Exemplarily, the output result of the i-th layer of the Bert model can be expressed as [vec_i1, vec_i2, ..., vec_i10].
第一关键词在第一文本中的位置为后三个字符,进而将第一文本中的后三个字符确定为3个目标字符。示例性的,在Bert模型的第i层,目标字符可以表示为[vec_i8,vec_i9,vec_i10]。The first keyword is located at the last three characters in the first text, and then the last three characters in the first text are determined as 3 target characters. Exemplarily, in the i-th layer of the Bert model, the target characters can be represented as [vec_i8, vec_i9, vec_i10].
As described above, each character corresponds to 25 character vectors, so each target character also corresponds to 25 character vectors. Optionally, the average of the 10th to 25th character vectors corresponding to each target character can be determined as the character vector of that target character, that is, as one of the character vectors corresponding to the first keyword.
示例性的,第一关键词中的字符“太”对应25个字符向量,且这25个字符向量分别为Bert模型的第1个网络层至第25个网络层输出的字符向量;计算Bert模型第10个网络层至第25个网络层输出的字符向量的平均值,将该平均值确定为第一关键词中的字符“太”对应的一个字符向量。Exemplarily, the character "太" in the first keyword corresponds to 25 character vectors, and these 25 character vectors are the character vectors output by the 1st to 25th network layers of the Bert model respectively; the average value of the character vectors output by the 10th to 25th network layers of the Bert model is calculated, and the average value is determined as a character vector corresponding to the character "太" in the first keyword.
In the same manner, one character vector is determined for the character "慢" in the first keyword and one character vector is determined for the character "了" in the first keyword; the three character vectors obtained in this way are determined as the 3 character vectors corresponding to the first keyword.
Optionally, the average of all the character vectors corresponding to each target character may instead be determined as the character vector of that target character.
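As a worked instance of the averaging in this example, write $vec_{i,j}$ for the vector that layer $i$ outputs for the $j$-th character (the comma in the subscript and the symbol $\overline{vec}_{j}$ are notation added here for readability and do not appear in the original). The three character vectors for "太", "慢" and "了", at positions 8, 9 and 10, would then be:

$$\overline{vec}_{j} = \frac{1}{16}\sum_{i=10}^{25} vec_{i,j}, \qquad j \in \{8, 9, 10\}$$

The factor $1/16$ reflects the 16 layers from layer 10 to layer 25 inclusive.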
本实施例中,通过预设的文本处理模型对每个第一文本进行第一文本处理,获得每个第一文本中第一关键词对应的M个字符向量,以此对包括第一关键词和部分上下文信息的第一文本进行文本处理,考虑到了第一关键词在第一文本中的语义,避免第一关键词的语义缺失。In this embodiment, each first text is processed by a preset text processing model to obtain M character vectors corresponding to the first keyword in each first text, so that the first text including the first keyword and some context information is processed, taking into account the semantics of the first keyword in the first text to avoid semantic loss of the first keyword.
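The per-character averaging over layers described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the nested list `layer_outputs` is toy data standing in for the per-layer outputs of the model (no real model is called), and the function names `mean_vector` and `keyword_char_vectors` are hypothetical. It also assumes one vector per character with no extra special tokens, as in the example in the text.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def keyword_char_vectors(
    layer_outputs: Sequence[Sequence[Sequence[float]]],
    start: int,
    length: int,
    first_layer: int = 10,
    last_layer: int = 25,
) -> List[Vector]:
    """Return one averaged vector per keyword character.

    layer_outputs[l][p] is the vector of character p (0-based) produced by
    layer l + 1, so layers are 1-based as in the text.
    """
    selected = layer_outputs[first_layer - 1:last_layer]
    return [
        mean_vector([layer[pos] for layer in selected])
        for pos in range(start, start + length)
    ]


# Toy stand-in for the model output: 25 layers x 10 characters x 2 dimensions.
# Each vector is [layer number, character position] so the averages are easy to check.
NUM_LAYERS, NUM_CHARS = 25, 10
layer_outputs = [
    [[float(layer), float(pos)] for pos in range(NUM_CHARS)]
    for layer in range(1, NUM_LAYERS + 1)
]

text = "充不进电,充电太慢了"
keyword = "太慢了"
start = text.find(keyword)  # 0-based position of the keyword in the text

char_vectors = keyword_char_vectors(layer_outputs, start, len(keyword))
print(char_vectors)
```

Passing `first_layer=1` would correspond to the optional variant in which all 25 character vectors of each target character are averaged.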
可选地,所述根据M个所述字符向量,确定每个所述第一文本对应的第一词向量包括:Optionally, determining, based on the M character vectors, a first word vector corresponding to each of the first texts includes:
对于每个所述第一文本,将所述第一文本中第一关键词对应的M个字符向量的平均值,确定为所述第一文本对应的第一词向量。For each of the first texts, an average value of the M character vectors corresponding to the first keyword in the first text is determined as the first word vector corresponding to the first text.
本实施例中,可以将第一关键词对应的M个字符向量的平均值,确定为第一文本对应的第一词向量。应理解,在其他实施例中,可以对第一关键词对应的M个字符向量进行加权求和,将加权求和结果确定为第一文本对应的第一词向量。 In this embodiment, the average value of the M character vectors corresponding to the first keyword can be determined as the first word vector corresponding to the first text. It should be understood that in other embodiments, the M character vectors corresponding to the first keyword can be weighted and summed, and the weighted sum result can be determined as the first word vector corresponding to the first text.
本实施例中,根据M个字符向量,确定每个第一文本对应的第一词向量,而并非如相关技术中那样将第一文本对应的文本特征向量直接确定为词向量,以此避免第一词向量的表示受到第一文本中上下文信息的影响。In this embodiment, the first word vector corresponding to each first text is determined based on M character vectors, rather than directly determining the text feature vector corresponding to the first text as the word vector as in the related art, so as to avoid the representation of the first word vector being affected by the context information in the first text.
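The two ways of combining the M character vectors into a first word vector (plain average, or weighted sum in other embodiments) might look like the sketch below. The input vectors and the weights `[0.5, 0.25, 0.25]` are made-up illustrative values; the original does not specify how the weights would be chosen.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def weighted_sum(vectors: Sequence[Sequence[float]], weights: Sequence[float]) -> Vector:
    """Element-wise weighted sum; one weight per input vector."""
    if len(vectors) != len(weights):
        raise ValueError("need exactly one weight per vector")
    return [
        sum(w * x for w, x in zip(weights, column))
        for column in zip(*vectors)
    ]


# M = 3 character vectors for the keyword (illustrative values only).
char_vectors = [[17.5, 7.0], [17.5, 8.0], [17.5, 9.0]]

first_word_vector = mean_vector(char_vectors)
weighted_word_vector = weighted_sum(char_vectors, [0.5, 0.25, 0.25])  # hypothetical weights
print(first_word_vector, weighted_word_vector)
```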
可选地,所述根据每个所述第一文本对应的第一词向量,对所述第二文本进行语义识别,包括:Optionally, performing semantic recognition on the second text according to the first word vector corresponding to each of the first texts includes:
根据每个所述第一文本对应的第一词向量,更新预设的词表;Update a preset vocabulary according to the first word vector corresponding to each of the first texts;
通过更新后的词表,确定所述第二文本对应的文本类别。The text category corresponding to the second text is determined through the updated vocabulary.
本实施例中预先设置有词表,该词表包括多个第一关键词和多个文本类别,且该词表用于表征第一关键词和文本类别之间的映射关系。该词表可以用于对文本进行分类,如背景技术中阐述的那样,提取待分类文本中的文本关键词,计算上述文本关键词的词向量与词表中表征第一关键词的词向量之间的相似度,根据该相似度数值,确定待分类文本对应的文本类别。In this embodiment, a vocabulary is pre-set, the vocabulary includes multiple first keywords and multiple text categories, and the vocabulary is used to represent the mapping relationship between the first keywords and the text categories. The vocabulary can be used to classify texts, as described in the background technology, extract text keywords from the text to be classified, calculate the similarity between the word vector of the above text keyword and the word vector representing the first keyword in the vocabulary, and determine the text category corresponding to the text to be classified according to the similarity value.
本实施例中,在确定N个第一词向量之后,可以根据每个第一文本对应的第一词向量更新上述预设的词表,进而通过更新后的词表确定第二文本对应的文本类别,以此实现对第二文本的语义识别。In this embodiment, after determining N first word vectors, the above-mentioned preset vocabulary can be updated according to the first word vector corresponding to each first text, and then the text category corresponding to the second text is determined by the updated vocabulary, so as to achieve semantic recognition of the second text.
In other embodiments, a word vector of the second text may instead be extracted; at least one first word vector is selected according to the similarity between each first word vector and the word vector of the second text, and the text category corresponding to the selected first word vector is determined as the text category corresponding to the second text, thereby achieving semantic recognition of the second text.
If the first keyword has multiple semantics and the vocabulary can represent the mapping relationship between a first keyword with multiple semantics and the text categories, the vocabulary can be updated using the technical solution of the following embodiment:
可选地,所述根据每个所述第一文本对应的第一词向量,更新预设的词表,包括:Optionally, updating a preset vocabulary according to the first word vector corresponding to each of the first texts includes:
对N个第一词向量进行语义聚类处理,获得K个聚类簇;Perform semantic clustering on the N first word vectors to obtain K clusters;
将每个所述聚类簇包括的至少一个第一词向量的平均值,确定为所述聚类簇对应的第二词向量; Determine the average value of at least one first word vector included in each of the clusters as the second word vector corresponding to the cluster;
根据K个所述聚类簇对应的K个第二词向量,更新所述词表;更新后的词表包括用于表征所述第一关键词的K个第二词向量,且每个第二词向量用于表征所述第一关键词对应的一种语义。The word list is updated according to the K second word vectors corresponding to the K clusters; the updated word list includes K second word vectors for representing the first keyword, and each second word vector is used to represent a semantics corresponding to the first keyword.
本实施例中,对N个第一词向量进行聚类,将语义相近的第一词向量聚在一个簇中,获得K个聚类簇;其中,每个聚类簇包括至少一个第一词向量,K为小于或等于N的正整数。In this embodiment, N first word vectors are clustered, and first word vectors with similar semantics are clustered in one cluster to obtain K clusters; wherein each cluster includes at least one first word vector, and K is a positive integer less than or equal to N.
The average of the at least one first word vector included in each cluster is determined as the second word vector corresponding to that cluster. Specifically, the second word vector corresponding to a cluster can be determined by the following formula:

$$\overline{Vec} = \frac{1}{N_I}\sum_{i=1}^{N_I} Vec\_i$$

where $\overline{Vec}$ denotes the second word vector of the cluster, $N_I$ denotes the number of first word vectors included in the $I$-th cluster, and $Vec\_i$ denotes the $i$-th first word vector in that cluster.
在获得K个聚类簇以及每个聚类簇对应的第二词向量之后,更新词表。其中,更新后的词表包括用于表征第一关键词的K个第二词向量,即第一关键词可以用K个第二词向量表征,且每个第二词向量用于表征第一关键词对应的一种语义。例如,更新后的词表包括用于表征第一关键词的3个第二词向量,表示在更新后的词表包括了第一关键词的3种语义。After obtaining K clusters and the second word vector corresponding to each cluster, the word list is updated. The updated word list includes K second word vectors for representing the first keyword, that is, the first keyword can be represented by K second word vectors, and each second word vector is used to represent a semantics corresponding to the first keyword. For example, the updated word list includes 3 second word vectors for representing the first keyword, indicating that the updated word list includes 3 semantics of the first keyword.
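The clustering and per-cluster averaging can be sketched as follows. The original does not name a clustering algorithm, so the greedy cosine-threshold grouping used here, the threshold 0.8, and the function names are all assumptions made for illustration; any semantic clustering method could take its place.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_word_vectors(vectors: Sequence[Vector], threshold: float = 0.8) -> List[List[Vector]]:
    """Greedy grouping (an assumed stand-in for 'semantic clustering').

    Each vector joins the first cluster whose current mean is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    """
    clusters: List[List[Vector]] = []
    for vec in vectors:
        for members in clusters:
            if cosine(vec, mean_vector(members)) >= threshold:
                members.append(vec)
                break
        else:
            clusters.append([vec])
    return clusters


# N = 4 first word vectors for the same keyword (illustrative values only).
first_word_vectors = [[1.0, 0.0], [0.75, 0.25], [0.0, 1.0], [0.25, 0.75]]

clusters = cluster_word_vectors(first_word_vectors)
# One second word vector per cluster: the mean of the cluster's members.
second_word_vectors = [mean_vector(members) for members in clusters]
print(len(clusters), second_word_vectors)
```

With this toy input the keyword ends up with two second word vectors, that is, two semantics in the updated vocabulary.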
可选地,所述词表用于表征第一关键词与文本类别之间的映射关系,所述通过更新后的词表,确定所述第二文本对应的文本类别,包括:Optionally, the vocabulary is used to represent a mapping relationship between a first keyword and a text category, and determining the text category corresponding to the second text through the updated vocabulary includes:
在所述第二文本包括的第二关键词与所述第一关键词不为同一关键词的情况下,根据所述第二文本对应的文本特征向量与每个第二词向量之间的相似度,确定所述第二文本对应的文本类别;所述第二文本对应的文本特征向量通过所述文本处理模型对所述第二文本进行第二文本处理获得;In the case where the second keyword included in the second text is not the same keyword as the first keyword, determining the text category corresponding to the second text according to the similarity between the text feature vector corresponding to the second text and each second word vector; the text feature vector corresponding to the second text is obtained by performing second text processing on the second text by the text processing model;
在所述第二关键词与所述第一关键词为同一关键词的情况下,根据所述更新后的词表表征的第一关键词与文本类别之间的映射关系,确定所述第二文本对应的文本类别。 In the case that the second keyword and the first keyword are the same keyword, the text category corresponding to the second text is determined according to the mapping relationship between the first keyword represented by the updated vocabulary and the text category.
应理解,词表用于表征第一关键词与文本类别之间的映射关系,本实施例中,可以根据更新后的词表,对文本进行分类。It should be understood that the vocabulary is used to characterize the mapping relationship between the first keyword and the text category. In this embodiment, the text can be classified according to the updated vocabulary.
在更新后的词表中对该第二关键词进行查询,若更新后的词表包括该第二关键词,即第二关键词与第一关键词为同一关键词,则可以根据更新后的词表表征的映射关系,直接确定第二文本对应的文本类别。The second keyword is queried in the updated vocabulary. If the updated vocabulary includes the second keyword, that is, the second keyword and the first keyword are the same keyword, the text category corresponding to the second text can be directly determined based on the mapping relationship represented by the updated vocabulary.
若更新后的词表不包括该第二关键词,即第二关键词与第一关键词不为同一关键词,这种情况下,可以通过预设的文本处理模型对第二文本进行第二文本处理,获得该第二文本对应的文本特征向量,其中,文本处理模型对输入文本进行处理,获得输入文本对应的文本特征向量的处理方式,称为第二文本处理方式。可选地,第二文本为包括第二关键词的句子,第二文本对应的文本特征向量为句向量。If the updated vocabulary does not include the second keyword, that is, the second keyword is not the same keyword as the first keyword, in this case, the second text can be processed by a preset text processing model to obtain a text feature vector corresponding to the second text, wherein the text processing model processes the input text to obtain a text feature vector corresponding to the input text, which is called a second text processing method. Optionally, the second text is a sentence including the second keyword, and the text feature vector corresponding to the second text is a sentence vector.
The similarity between the text feature vector and each second word vector in the vocabulary is calculated. If there is a second word vector whose similarity with the text feature vector is higher than a first preset threshold, the text category that has a mapping relationship with that second word vector is determined as the text category corresponding to the second text.
若存在多个与文本特征向量的相似度数值高于第一预设阈值的第二词向量,则确定多个第二词向量中相似度数值最高的第二词向量,并将与该第二词向量存在映射关系的文本类别,确定为第二文本对应的文本类别。If there are multiple second word vectors whose similarity values with the text feature vector are higher than the first preset threshold, the second word vector with the highest similarity value among the multiple second word vectors is determined, and the text category that has a mapping relationship with the second word vector is determined as the text category corresponding to the second text.
应理解,上述确定文本特征向量与第二词向量之间的相似度的方式包括但不限于,基于向量之间的余弦相似度、基于向量之间的欧氏距离或其他计算方式。It should be understood that the above-mentioned method of determining the similarity between the text feature vector and the second word vector includes, but is not limited to, based on cosine similarity between vectors, based on Euclidean distance between vectors or other calculation methods.
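The two classification branches (direct lookup when the second keyword is in the vocabulary, otherwise similarity against the second word vectors) can be sketched as below. The vocabulary layout (a dictionary plus a list of vector and category pairs), the category labels, the threshold 0.7, the example keyword "很卡", and the choice of cosine similarity are all assumptions made for this illustration.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(
    second_keyword: str,
    text_feature_vector: Vector,
    keyword_to_category: Dict[str, str],
    semantic_vectors: Sequence[Tuple[Vector, str]],
    threshold: float = 0.7,
) -> Optional[str]:
    """Return the text category of the second text, or None if nothing matches."""
    # Branch 1: the second keyword is already in the vocabulary.
    if second_keyword in keyword_to_category:
        return keyword_to_category[second_keyword]
    # Branch 2: keep the most similar second word vector whose similarity
    # is strictly higher than the threshold.
    best_category, best_score = None, threshold
    for vector, category in semantic_vectors:
        score = cosine(text_feature_vector, vector)
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical updated vocabulary for illustration.
keyword_to_category = {"太慢了": "charging_speed"}
semantic_vectors = [
    ([0.875, 0.125], "charging_speed"),
    ([0.125, 0.875], "network_speed"),
]

known = classify("太慢了", [0.0, 1.0], keyword_to_category, semantic_vectors)
unseen = classify("很卡", [0.2, 0.8], keyword_to_category, semantic_vectors)
no_match = classify("很卡", [-1.0, -1.0], keyword_to_category, semantic_vectors)
print(known, unseen, no_match)
```

Swapping `cosine` for a distance such as Euclidean distance would require inverting the comparison, since a smaller distance means a closer match.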
In this embodiment, text classification is performed using the updated vocabulary, thereby achieving semantic recognition of the text. Because the second word vectors included in the updated vocabulary represent the keyword accurately, they do not distort the similarity calculated between the word vector of a text keyword and the word vectors that represent keywords in the vocabulary, which improves the accuracy of the semantic recognition result.
If the first keyword has multiple semantics but the vocabulary represents the mapping relationship between the text categories and a first keyword with only one semantics, the vocabulary can be updated using the technical solution of the following embodiment:
可选地,所述获取N个第一文本和第二文本之后,所述方法还包括:Optionally, after obtaining N first texts and second texts, the method further includes:
通过所述文本处理模型对N个所述第一文本进行第二文本处理,获得N个所述第一文本一一对应的N个文本特征向量;Performing second text processing on the N first texts by using the text processing model to obtain N text feature vectors corresponding to the N first texts one by one;
根据N个所述文本特征向量与基准词向量之间的相似度,对N个所述第一文本进行筛选,获得L个第一文本;所述基准词向量为通过所述文本处理模型对预设的基准文本进行第一文本处理获得;According to the similarity between the N text feature vectors and the benchmark word vector, the N first texts are screened to obtain L first texts; the benchmark word vector is obtained by performing first text processing on a preset benchmark text through the text processing model;
确定L个所述第一文本对应的第三词向量;所述第三词向量为L个第四词向量的平均值,L个所述第四词向量基于所述文本处理模型对L个所述第一文本进行第一文本处理获得;Determine third word vectors corresponding to L first texts; the third word vector is an average value of L fourth word vectors, and the L fourth word vectors are obtained by performing first text processing on the L first texts based on the text processing model;
根据L个所述第一文本对应的第三词向量,对所述第二文本进行语义识别。Perform semantic recognition on the second text based on L third word vectors corresponding to the first texts.
本实施例中,通过文本处理模型对N个第一文本进行第二文本处理,获得N个文本特征向量。第二文本处理的实施方式与上述实施例中涉及的第二文本处理的实施方式一致,在此不做重复阐述。可选地,上述第一文本为句子,文本特征向量为句向量。In this embodiment, the second text processing is performed on N first texts through the text processing model to obtain N text feature vectors. The implementation method of the second text processing is consistent with the implementation method of the second text processing involved in the above embodiment, and will not be repeated here. Optionally, the above first text is a sentence, and the text feature vector is a sentence vector.
通过文本处理模型对预设的基准文本进行第一文本处理,获得基准词向量。上述基准文本包括第一关键词,第一文本处理的实施方式与上述实施例中涉及的第一文本处理的实施方式一致,在此不做重复阐述。The text processing model performs first text processing on the preset reference text to obtain a reference word vector. The reference text includes a first keyword, and the implementation of the first text processing is consistent with the implementation of the first text processing involved in the above embodiment, which will not be repeated here.
计算每个文本特征向量与基准词向量之间的相似度,筛选相似度高于第二预设阈值的文本特征向量对应的第一文本,以此获得L个第一文本。The similarity between each text feature vector and the benchmark word vector is calculated, and the first texts corresponding to the text feature vectors whose similarity is higher than a second preset threshold are selected to obtain L first texts.
通过文本处理模型对L个第一文本进行第一文本处理,获得L个第四词向量;将这L个第四词向量的平均值确定为第三词向量,进而根据第三词向量,对第二文本进行语义识别。可选地,可以根据第三词向量更新预设的词表,其中,更新后的词表包括用于表征第一关键词的第三词向量,即第一关键词可以用1个第三词向量表征,且该第三词向量用于表征第一关键词对应的一种语义。The L first texts are processed by the text processing model to obtain L fourth word vectors; the average value of the L fourth word vectors is determined as the third word vector, and then the second text is semantically recognized based on the third word vector. Optionally, a preset word list can be updated based on the third word vector, wherein the updated word list includes a third word vector for representing the first keyword, that is, the first keyword can be represented by 1 third word vector, and the third word vector is used to represent a semantics corresponding to the first keyword.
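The filtering and averaging steps of this embodiment can be sketched as below. For brevity the sketch takes a precomputed keyword vector for every first text and averages only those of the texts that pass the filter, whereas the text computes the fourth word vectors only for the L selected texts. The vectors, the threshold 0.8, cosine similarity, and the function names are illustrative assumptions.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def third_word_vector(
    text_feature_vectors: Sequence[Vector],
    keyword_vectors: Sequence[Vector],
    benchmark_vector: Vector,
    threshold: float,
) -> Tuple[List[int], Optional[Vector]]:
    """Select the first texts close to the benchmark and average their keyword vectors.

    Returns the indices of the L selected texts and the third word vector
    (None if no text passes the filter).
    """
    kept = [
        index
        for index, vector in enumerate(text_feature_vectors)
        if cosine(vector, benchmark_vector) > threshold
    ]
    if not kept:
        return kept, None
    return kept, mean_vector([keyword_vectors[index] for index in kept])


# N = 3 first texts (illustrative values only).
text_feature_vectors = [[1.0, 0.0], [0.75, 0.25], [0.0, 1.0]]  # sentence vectors
keyword_vectors = [[2.0, 0.0], [4.0, 2.0], [9.0, 9.0]]         # per-text keyword vectors
benchmark_vector = [1.0, 0.0]                                   # from the benchmark text

kept, third = third_word_vector(text_feature_vectors, keyword_vectors, benchmark_vector, 0.8)
print(kept, third)
```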
To facilitate understanding of the technical solutions of some embodiments provided in this application, refer to FIG. 2. As shown in FIG. 2, in some embodiments of this application a vocabulary is preset; the preset vocabulary can be updated, and semantic recognition is then performed on the second text through the updated vocabulary. The application flow for updating the vocabulary provided in this application is as follows: obtain N first texts that include the first keyword.
In the case where the first keyword has multiple semantics and the vocabulary can represent the mapping relationship between a first keyword with multiple semantics and the text categories, the vocabulary is updated as follows: determine the N first word vectors corresponding to the N first texts; perform semantic clustering on the N first word vectors to obtain K clusters; determine the second word vector corresponding to each cluster; and update the vocabulary according to the K second word vectors corresponding to the K clusters.
In the case where the first keyword has multiple semantics but the vocabulary represents the mapping relationship between the text categories and a first keyword with only one semantics, the vocabulary is updated as follows: perform second text processing on the N first texts through the text processing model to obtain N text feature vectors; select L first texts according to the similarity between the N text feature vectors and the benchmark word vector; determine the third word vector corresponding to the L first texts; and update the vocabulary according to the third word vector.
可选地,所述根据L个所述第一文本对应的第三词向量,对所述第二文本进行语义识别,包括:Optionally, performing semantic recognition on the second text according to the L third word vectors corresponding to the first texts includes:
根据L个所述第一文本对应的第三词向量,更新预设的词表;更新后的词表包括用于表征所述第一关键词的第三词向量,且更新后的词表用于表征第一关键词与文本类别之间的映射关系;According to the third word vectors corresponding to the L first texts, a preset word list is updated; the updated word list includes the third word vector used to represent the first keyword, and the updated word list is used to represent the mapping relationship between the first keyword and the text category;
在所述第二文本包括的第二关键词与所述第一关键词不为同一关键词的情况下,根据所述第二文本对应的文本特征向量与所述第三词向量之间的相似度,确定所述第二文本对应的文本类别;所述第二文本对应的文本特征向量通过所述文本处理模型对所述第二文本进行第二文本处理获得;In the case where the second keyword included in the second text is not the same keyword as the first keyword, determining the text category corresponding to the second text according to the similarity between the text feature vector corresponding to the second text and the third word vector; the text feature vector corresponding to the second text is obtained by performing second text processing on the second text by the text processing model;
在所述第二关键词与所述第一关键词为同一关键词的情况下,根据所述更新后的词表表征的第一关键词与文本类别之间的映射关系,确定所述第二文本对应的文本类别。In the case that the second keyword and the first keyword are the same keyword, the text category corresponding to the second text is determined according to the mapping relationship between the first keyword represented by the updated vocabulary and the text category.
In this embodiment, a vocabulary is preset. The vocabulary includes multiple first keywords and multiple text categories, and represents the mapping relationship between the first keywords and the text categories. The vocabulary can be used to classify text.
本实施例中,可以根据第三词向量更新预设的词表,具体的更新方式与上述根据第一词向量更新词表的方式一致,在此不做重复阐述。其中,更新后的词表包括用于表征第一关键词的第三词向量,即第一关键词可以用1个第三词向量表征,且该第三词向量用于表征第一关键词对应的一种语义。In this embodiment, the preset word list can be updated according to the third word vector. The specific updating method is consistent with the above-mentioned method of updating the word list according to the first word vector, and will not be repeated here. Among them, the updated word list includes a third word vector for representing the first keyword, that is, the first keyword can be represented by 1 third word vector, and the third word vector is used to represent a semantics corresponding to the first keyword.
本实施例中,可以根据更新后的词表,对文本进行分类,进而实现对文本的语义识别。In this embodiment, the text can be classified according to the updated vocabulary, thereby achieving semantic recognition of the text.
在更新后的词表中对第二文本包括的第二关键词进行查询,若更新后的词表包括该第二关键词,即第二关键词与第一关键词为同一关键词,则可以根据更新后的词表表征的映射关系,直接确定第二文本对应的文本类别。The second keyword included in the second text is queried in the updated vocabulary. If the updated vocabulary includes the second keyword, that is, the second keyword and the first keyword are the same keyword, the text category corresponding to the second text can be directly determined based on the mapping relationship represented by the updated vocabulary.
若更新后的词表不包括该第二关键词,即第二关键词与第一关键词不为同一关键词,这种情况下,可以通过预设的文本处理模型对第二文本进行第二文本处理,获得该第二文本对应的文本特征向量;进而计算文本特征向量与词表中第三词向量之间的相似度,若文本特征向量与第三词向量之间的相似度数值高于第三预设阈值,则将与该第三词向量存在映射关系的文本类别,确定为第二文本对应的文本类别。If the updated vocabulary does not include the second keyword, that is, the second keyword is not the same keyword as the first keyword, in this case, the second text can be processed by a preset text processing model to obtain a text feature vector corresponding to the second text; then the similarity between the text feature vector and the third word vector in the vocabulary is calculated. If the similarity value between the text feature vector and the third word vector is higher than a third preset threshold, the text category that has a mapping relationship with the third word vector is determined as the text category corresponding to the second text.
In this embodiment, text classification is performed using the updated vocabulary, thereby achieving semantic recognition of the text. Because the third word vector included in the updated vocabulary represents the keyword accurately, it does not distort the similarity calculated between the word vector of a text keyword and the word vectors that represent keywords in the vocabulary, which improves the accuracy of the semantic recognition result.
下面结合附图,通过具体的实施例及其应用场景对本申请实施例提供的语义识别装置进行详细地说明。The semantic recognition device provided in the embodiment of the present application is described in detail below through specific embodiments and their application scenarios in conjunction with the accompanying drawings.
如图3所示,语义识别装置300包括:As shown in FIG3 , the semantic recognition device 300 includes:
获取模块301,用于获取N个第一文本和第二文本;每个所述第一文本均包括预设的词表中的第一关键词,N为大于1的正整数;An acquisition module 301 is used to acquire N first texts and second texts; each of the first texts includes a first keyword in a preset vocabulary, and N is a positive integer greater than 1;
A first processing module 302, configured to perform first text processing on each of the first texts by using a preset text processing model to obtain M character vectors corresponding to the first keyword in each of the first texts, where M is a positive integer greater than 1;
第一确定模块303,用于根据M个所述字符向量,确定每个所述第一文本对应的第一词向量;A first determination module 303, configured to determine a first word vector corresponding to each of the first texts according to the M character vectors;
第一识别模块304,用于根据每个所述第一文本对应的第一词向量,对所述第二文本进行语义识别。The first recognition module 304 is used to perform semantic recognition on the second text according to the first word vector corresponding to each of the first texts.
可选地,所述第一处理模块302,具体用于:Optionally, the first processing module 302 is specifically configured to:
对于每个所述第一文本,通过所述文本处理模型对所述第一文本进行第一文本处理,获得所述第一文本中每个字符对应的至少两个字符向量;For each of the first texts, performing first text processing on the first text by using the text processing model to obtain at least two character vectors corresponding to each character in the first text;
根据所述第一关键词在所述第一文本中的位置,确定M个目标字符;所述M个目标字符用于表征所述第一关键词;Determine M target characters according to the position of the first keyword in the first text; the M target characters are used to represent the first keyword;
将每个所述目标字符对应的至少部分字符向量的平均值,确定为所述第一文本中第一关键词对应的M个字符向量。An average value of at least part of the character vectors corresponding to each of the target characters is determined as the M character vectors corresponding to the first keyword in the first text.
可选地,所述第一确定模块303,具体用于:Optionally, the first determining module 303 is specifically configured to:
对于每个所述第一文本,将所述第一文本中第一关键词对应的M个字符向量的平均值,确定为所述第一文本对应的第一词向量。For each of the first texts, an average value of the M character vectors corresponding to the first keyword in the first text is determined as the first word vector corresponding to the first text.
可选地,所述第一识别模块304,具体用于:Optionally, the first identification module 304 is specifically configured to:
根据每个所述第一文本对应的第一词向量,更新预设的词表;所述词表包括每个所述第一文本中的第一关键词;According to the first word vector corresponding to each of the first texts, a preset word list is updated; the word list includes the first keyword in each of the first texts;
通过更新后的词表,确定所述第二文本对应的文本类别。The text category corresponding to the second text is determined through the updated vocabulary.
可选地,所述第一识别模块304,还具体用于:Optionally, the first identification module 304 is further specifically configured to:
对N个第一词向量进行语义聚类处理,获得K个聚类簇;每个聚类簇包括至少一个第一词向量,K为小于或等于N的正整数;Perform semantic clustering on the N first word vectors to obtain K clusters; each cluster includes at least one first word vector, and K is a positive integer less than or equal to N;
将每个所述聚类簇包括的至少一个第一词向量的平均值,确定为所述聚类簇对应的第二词向量;Determine the average value of at least one first word vector included in each of the clusters as the second word vector corresponding to the cluster;
根据K个所述聚类簇对应的K个第二词向量,更新所述词表;更新后的词表包括用于表征所述第一关键词的K个第二词向量,且每个第二词向量用于表征所述第一关键词对应的一种语义。 The word list is updated according to the K second word vectors corresponding to the K clusters; the updated word list includes K second word vectors for representing the first keyword, and each second word vector is used to represent a semantics corresponding to the first keyword.
可选地,所述词表用于表征第一关键词与文本类别之间的映射关系;Optionally, the vocabulary is used to characterize a mapping relationship between the first keyword and the text category;
所述第一识别模块304,还具体用于:The first identification module 304 is further specifically configured to:
在所述第二文本包括的第二关键词与所述第一关键词不为同一关键词的情况下,根据所述第二文本对应的文本特征向量与每个第二词向量之间的相似度,确定所述第二文本对应的文本类别;所述第二文本对应的文本特征向量通过所述文本处理模型对所述第二文本进行第二文本处理获得;In the case where the second keyword included in the second text is not the same keyword as the first keyword, determining the text category corresponding to the second text according to the similarity between the text feature vector corresponding to the second text and each second word vector; the text feature vector corresponding to the second text is obtained by performing second text processing on the second text by the text processing model;
在所述第二关键词与所述第一关键词为同一关键词的情况下,根据所述更新后的词表表征的第一关键词与文本类别之间的映射关系,确定所述第二文本对应的文本类别。In the case that the second keyword and the first keyword are the same keyword, the text category corresponding to the second text is determined according to the mapping relationship between the first keyword represented by the updated vocabulary and the text category.
Optionally, the semantic recognition apparatus 300 further includes:
a second processing module, configured to perform second text processing on the N first texts using the text processing model to obtain N text feature vectors in one-to-one correspondence with the N first texts;
a screening module, configured to screen the N first texts according to the similarity between each of the N text feature vectors and a benchmark word vector to obtain L first texts, where the benchmark word vector is obtained by performing first text processing on a preset benchmark text using the text processing model;
a second determination module, configured to determine a third word vector corresponding to the L first texts, where the third word vector is the average of L fourth word vectors, and the L fourth word vectors are obtained by performing first text processing on the L first texts using the text processing model;
a second recognition module, configured to perform semantic recognition on the second text according to the third word vector corresponding to the L first texts.
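The screening and averaging performed by these modules can be sketched as follows. The text only says that screening is based on similarity to the benchmark word vector, so the cosine measure and the fixed `threshold` parameter are assumptions, as are the function names and the sample numbers.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def screen_and_average(
    text_feature_vectors: Sequence[Sequence[float]],
    fourth_word_vectors: Sequence[Sequence[float]],
    benchmark_vector: Sequence[float],
    threshold: float,
) -> Tuple[List[int], List[float]]:
    """Keep the texts whose feature vector is close enough to the benchmark,
    then average their word vectors into a single (third) word vector.

    Returns the indices of the L kept texts and the averaged vector.
    """
    kept = [
        i
        for i, feature in enumerate(text_feature_vectors)
        if cosine_similarity(feature, benchmark_vector) >= threshold
    ]
    if not kept:
        raise ValueError("no first text passed the screening threshold")
    selected = [fourth_word_vectors[i] for i in kept]
    dim = len(selected[0])
    third = [sum(v[d] for v in selected) / len(selected) for d in range(dim)]
    return kept, third


# Hypothetical data: N = 3 first texts, of which L = 2 pass the screening.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
word_vectors = [[2.0, 0.0], [9.0, 9.0], [4.0, 2.0]]
kept, third_vector = screen_and_average(features, word_vectors, [1.0, 0.0], 0.5)
```

In this example the second text is discarded because its feature vector is orthogonal to the benchmark, so it does not pull the averaged vector away from the intended sense.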
Optionally, the second recognition module is specifically configured to:
update a preset vocabulary according to the third word vector corresponding to the L first texts, where the updated vocabulary includes the third word vector used to represent the first keyword, and the updated vocabulary is used to represent the mapping relationship between the first keyword and the text category;
in a case where a second keyword included in the second text is not the same keyword as the first keyword, determine the text category corresponding to the second text according to the similarity between the text feature vector corresponding to the second text and the third word vector, where the text feature vector corresponding to the second text is obtained by performing second text processing on the second text using the text processing model;
in a case where the second keyword and the first keyword are the same keyword, determine the text category corresponding to the second text according to the mapping relationship between the first keyword and the text category represented by the updated vocabulary.
In the embodiments of the present application, first text processing is performed on each first text using a preset text processing model to obtain M character vectors corresponding to the first keyword in each first text, and the first word vector is then determined from the M character vectors. Because the first word vector is determined from the M character vectors, it takes the context information in the first text fully into account, rather than directly using the text feature vector of the first text as the word vector as in the related art. Because the M character vectors are obtained by performing first text processing on each first text with the text processing model, they take into account the semantics of the first keyword in the first text. In this way, the first word vector retains the semantics of the keyword without being affected by the context information, which improves the accuracy of semantic recognition when the second text is subsequently recognized based on the first word vector corresponding to each first text.
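As a rough illustration of deriving a first word vector from the M character vectors of a keyword, the sketch below averages the per-character vectors that fall inside the keyword span. This passage does not say how the M character vectors are combined, so mean pooling is an assumption, as is the premise that the text processing model returns exactly one vector per character. The function name and sample values are hypothetical.

```python
from typing import List, Sequence


def keyword_word_vector(
    text: str,
    keyword: str,
    char_vectors: Sequence[Sequence[float]],
) -> List[float]:
    """Average the character vectors covering the first occurrence of keyword.

    char_vectors is assumed to contain one vector per character of text,
    for example the per-position outputs of a contextual encoder.
    """
    if not keyword:
        raise ValueError("keyword must not be empty")
    if len(char_vectors) != len(text):
        raise ValueError("expected one character vector per character")
    start = text.find(keyword)
    if start < 0:
        raise ValueError("keyword does not occur in text")
    span = char_vectors[start:start + len(keyword)]  # the M character vectors
    dim = len(span[0])
    return [sum(v[d] for v in span) / len(span) for d in range(dim)]


# Hypothetical 2-dimensional character vectors for a 4-character text.
vectors = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
first_word_vector = keyword_word_vector("abcd", "bc", vectors)
```

Only the vectors at the keyword positions are pooled, so the result differs from a feature vector computed over the whole text.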
The semantic recognition apparatus in the embodiments of the present application may be an electronic device, or a component in an electronic device, such as an integrated circuit or a chip. The electronic device may be a terminal or a device other than a terminal. By way of example, the electronic device may be a mobile phone, a tablet computer, a laptop computer, a handheld computer, an in-vehicle electronic device, a mobile Internet device (MID), an augmented reality (AR)/virtual reality (VR) device, a robot, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA); it may also be a server, a network attached storage (NAS) device, a personal computer (PC), a television (TV), a teller machine, or a self-service machine. The embodiments of the present application impose no specific limitation in this regard.
The semantic recognition apparatus in the embodiments of the present application may be an apparatus having an operating system. The operating system may be the Android operating system, the iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.
The semantic recognition apparatus provided in the embodiments of the present application can implement each process implemented by the method embodiment of FIG. 1. To avoid repetition, details are not repeated here.
Optionally, as shown in FIG. 4, an embodiment of the present application further provides an electronic device 400, including a processor 401, a memory 402, and a program or instructions stored in the memory 402 and executable on the processor 401. When executed by the processor 401, the program or instructions implement each process of the foregoing semantic recognition method embodiment and can achieve the same technical effect. To avoid repetition, details are not repeated here.
It should be noted that the electronic devices in the embodiments of the present application include the mobile electronic devices and non-mobile electronic devices described above.
FIG. 5 is a schematic diagram of the hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 500 includes, but is not limited to, components such as a radio frequency unit 501, a network module 502, an audio output unit 503, an input unit 504, a sensor 505, a display unit 506, a user input unit 507, an interface unit 508, a memory 509, and a processor 510.
Those skilled in the art will understand that the electronic device 500 may further include a power supply (such as a battery) that supplies power to the components. The power supply may be logically connected to the processor 510 through a power management system, so that charging, discharging, and power consumption are managed through the power management system. The structure shown in FIG. 5 does not limit the electronic device: the electronic device may include more or fewer components than shown, combine certain components, or arrange the components differently, which is not described in detail here.
The input unit 504 is further configured to obtain N first texts and a second text;
the processor 510 is further configured to perform first text processing on each first text using a preset text processing model to obtain M character vectors corresponding to the first keyword in each first text;
determine, according to the M character vectors, a first word vector corresponding to each first text;
perform semantic recognition on the second text according to the first word vector corresponding to each first text.
In the embodiments of the present application, first text processing is performed on each first text using a preset text processing model to obtain M character vectors corresponding to the first keyword in each first text, and the first word vector is then determined from the M character vectors. Because the first word vector is determined from the M character vectors, it takes the context information in the first text fully into account, rather than directly using the text feature vector of the first text as the word vector as in the related art. Because the M character vectors are obtained by performing first text processing on each first text with the text processing model, they take into account the semantics of the first keyword in the first text. In this way, the first word vector retains the semantics of the keyword without being affected by the context information, which improves the accuracy of semantic recognition when the second text is subsequently recognized based on the first word vector corresponding to each first text.
It should be understood that, in the embodiments of the present application, the input unit 504 may include a graphics processing unit (GPU) 5041 and a microphone 5042. The graphics processing unit 5041 processes image data of still pictures or video obtained by an image capture apparatus (such as a camera) in a video capture mode or an image capture mode. The display unit 506 may include a display panel 5061, which may be configured in the form of a liquid crystal display, an organic light-emitting diode, or the like. The user input unit 507 includes at least one of a touch panel 5071 and other input devices 5072. The touch panel 5071 is also referred to as a touchscreen and may include two parts: a touch detection apparatus and a touch controller. The other input devices 5072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and a power key), a trackball, a mouse, and a joystick, which are not described in detail here.
The memory 509 may be used to store software programs and various data. The memory 509 may mainly include a first storage area for storing programs or instructions and a second storage area for storing data, where the first storage area may store an operating system and the application programs or instructions required by at least one function (such as a sound playback function or an image playback function). In addition, the memory 509 may include volatile memory or non-volatile memory, or both. The non-volatile memory may be read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), or Direct Rambus RAM (DRRAM). The memory 509 in the embodiments of the present application includes, but is not limited to, these and any other suitable types of memory.
The processor 510 may include one or more processing units. Optionally, the processor 510 integrates an application processor and a modem processor, where the application processor mainly handles operations involving the operating system, the user interface, and application programs, and the modem processor, such as a baseband processor, mainly handles wireless communication signals. It can be understood that the modem processor may alternatively not be integrated into the processor 510.
An embodiment of the present application further provides a readable storage medium storing a program or instructions. When executed by a processor, the program or instructions implement each process of the foregoing semantic recognition method embodiment and can achieve the same technical effect. To avoid repetition, details are not repeated here.
The processor is the processor in the electronic device described in the foregoing embodiment. The readable storage medium includes a computer-readable storage medium, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
An embodiment of the present application further provides a chip, including a processor and a communication interface coupled to the processor. The processor is configured to run a program or instructions to implement each process of the foregoing semantic recognition method embodiment and can achieve the same technical effect. To avoid repetition, details are not repeated here.
It should be understood that the chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip.
An embodiment of the present application provides a computer program product stored in a storage medium. The program product is executed by at least one processor to implement each process of the foregoing semantic recognition method embodiment and can achieve the same technical effect. To avoid repetition, details are not repeated here.
It should be noted that, in this document, the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or apparatus that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element preceded by "includes a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that includes the element. In addition, it should be noted that the scope of the methods and apparatuses in the embodiments of the present application is not limited to performing functions in the order shown or discussed; functions may also be performed substantially simultaneously or in the reverse order, depending on the functions involved. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
From the description of the foregoing embodiments, those skilled in the art can clearly understand that the methods of the foregoing embodiments can be implemented by software together with a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, may be embodied in the form of a computer software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions that cause a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the specific implementations described above, which are merely illustrative rather than restrictive. Inspired by the present application, those of ordinary skill in the art may devise many other forms without departing from the purpose of the present application and the scope protected by the claims, all of which fall within the protection of the present application.
Claims (20)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310181906.7 | 2023-03-01 | | |
| CN202310181906.7A CN116187341A (en) | 2023-03-01 | 2023-03-01 | Semantic recognition method and device thereof |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024179519A1 (en) | 2024-09-06 |
Family
ID=86446059
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2024/079034 Ceased WO2024179519A1 (en) | 2023-03-01 | 2024-02-28 | Semantic recognition method and apparatus |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN116187341A (en) |
| WO (1) | WO2024179519A1 (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116187341A (en) * | 2023-03-01 | 2023-05-30 | 维沃移动通信有限公司 | Semantic recognition method and device thereof |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106599269A (en) * | 2016-12-22 | 2017-04-26 | 东软集团股份有限公司 | Keyword extracting method and device |
| US20190171792A1 (en) * | 2017-12-01 | 2019-06-06 | International Business Machines Corporation | Interaction network inference from vector representation of words |
| US20190325029A1 (en) * | 2018-04-18 | 2019-10-24 | HelpShift, Inc. | System and methods for processing and interpreting text messages |
| CN110704638A (en) * | 2019-09-30 | 2020-01-17 | 南京邮电大学 | A Construction Method of Electric Power Text Dictionary Based on Clustering Algorithm |
| CN111241819A (en) * | 2020-01-07 | 2020-06-05 | 北京百度网讯科技有限公司 | Word vector generation method, device and electronic device |
| CN113961666A (en) * | 2021-09-18 | 2022-01-21 | 腾讯科技(深圳)有限公司 | Keyword recognition method, apparatus, device, medium, and computer program product |
| CN116187341A (en) * | 2023-03-01 | 2023-05-30 | 维沃移动通信有限公司 | Semantic recognition method and device thereof |
- 2023-03-01: CN application CN202310181906.7A, published as CN116187341A, status: Pending
- 2024-02-28: WO application PCT/CN2024/079034, published as WO2024179519A1, status: Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| CN116187341A (en) | 2023-05-30 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN111259148B (en) | Information processing method, device and storage medium | |
| CN109522538B (en) | Automatic listing method, device, equipment and storage medium for table contents | |
| CN107102746B (en) | Candidate word generation method and device and candidate word generation device | |
| US11436282B2 (en) | Methods, devices and media for providing search suggestions | |
| CN107608532B (en) | Association input method and device and electronic equipment | |
| US11856277B2 (en) | Method and apparatus for processing video, electronic device, medium and product | |
| CN108829893A (en) | Determine method, apparatus, storage medium and the terminal device of video tab | |
| CN109189879B (en) | Electronic book display method and device | |
| WO2018045646A1 (en) | Artificial intelligence-based method and device for human-machine interaction | |
| CN110019675B (en) | Keyword extraction method and device | |
| CN114328838A (en) | Event extraction method, apparatus, electronic device, and readable storage medium | |
| CN112631437A (en) | Information recommendation method and device and electronic equipment | |
| CN111814481B (en) | Shopping intention recognition method, device, terminal equipment and storage medium | |
| WO2024012289A1 (en) | Video generation method and apparatus, electronic device and medium | |
| WO2024179519A1 (en) | Semantic recognition method and apparatus | |
| CN117520544A (en) | Information identification method and device based on artificial intelligence and computer equipment | |
| CN108197105A (en) | Natural language processing method, apparatus, storage medium and electronic equipment | |
| CN115309487A (en) | Display method, display device, electronic equipment and readable storage medium | |
| US10241988B2 (en) | Prioritizing smart tag creation | |
| CN115035891A (en) | Voice recognition method and device, electronic equipment and time sequence fusion language model | |
| WO2024149183A1 (en) | Document display method and apparatus, and electronic device | |
| CN119006888A (en) | Image labeling method, device, equipment and medium | |
| CN119903135A (en) | A knowledge question answering method, device and storage medium | |
| CN115494965B (en) | Request sending method and device for sending request | |
| CN116069936B (en) | Method and device for generating digital media article |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24763199; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |