CN113191157A

CN113191157A - Method and system for processing text unit

Info

Publication number: CN113191157A
Application number: CN202110539425.XA
Authority: CN
Inventors: 史元春; 喻纯; 杨欢
Original assignee: Interactive Future Beijing Technology Co ltd; Tsinghua University
Current assignee: Interactive Future Beijing Technology Co ltd; Tsinghua University
Priority date: 2021-05-18
Filing date: 2021-05-18
Publication date: 2021-07-30
Anticipated expiration: 2041-05-18
Also published as: CN113191157B

Abstract

The present invention provides a method and system for processing text units. The method comprises: using a pre-trained semantic recognition model to perform intention classification on the voice content with which a user edits a target text, obtaining a text to be analyzed and an intention classification result; determining, based on the content of the text to be analyzed, whether the text to be analyzed is in the homophone word-formation format; if so, extracting the last text unit in the text to be analyzed as the text unit to be processed, that is, the homophonic text unit that needs to be processed; and editing the target text according to the intention classification result and the text unit to be processed. This helps visually impaired people accurately input homophonic text units, thereby improving the user experience.

Description

Method and system for processing text unit

Technical Field

The invention relates to the technical field of voice interaction, in particular to a method and a system for processing a text unit.

Background

With the development of speech recognition technology, voice interaction is used in more and more scenarios. During voice interaction, homophonic characters or words are frequently misrecognized, and the user then has to correct them manually. Because such manual correction is difficult for visually impaired people, a method is needed that helps them accurately input homophonic characters and words.

Disclosure of Invention

In view of this, embodiments of the present invention provide a method and a system for processing text units, to help visually impaired people accurately input homophonic characters or words.

In order to achieve the above purpose, the embodiments of the present invention provide the following technical solutions:

The first aspect of the embodiments of the present invention discloses a method for processing a text unit, the method including:

performing, using a pre-trained semantic recognition model, intention classification on the voice content with which a user edits a target text, to obtain a text to be analyzed and an intention classification result, where the intention classification result is a text input intention, a replacement intention, an insertion intention, or a deletion intention;

determining, based on the content of the text to be analyzed, whether the text to be analyzed is in the homophone word-formation format;

if the text to be analyzed is in the homophone word-formation format, extracting the last text unit in the text to be analyzed as the text unit to be processed, where a text unit comprises at least one consecutive Chinese character;

and editing the target text according to the intention classification result and the text unit to be processed.

Preferably, performing, using the pre-trained semantic recognition model, intention classification on the voice content with which the user edits the target text, to obtain a text to be analyzed and an intention classification result, includes:

performing, using the pre-trained semantic recognition model, intention classification on the voice content with which the user edits the target text, to obtain an intention classification result;

if the intention classification result is a text input intention, taking the voice content as a text to be analyzed;

and if the intention classification result is a replacement intention, an insertion intention or a deletion intention, extracting key information in the voice content by using the semantic recognition model, and taking the key information as a text to be analyzed.

Preferably, if the intention classification result is a replacement intention, an insertion intention, or a deletion intention, extracting key information from the voice content using the semantic recognition model and taking the key information as the text to be analyzed includes:

if the intention classification result is a replacement intention, extracting a replacement text unit and a replaced text unit from the voice content using the semantic recognition model, taking the phrase in the voice content containing the replacement text unit as a first text to be analyzed, and taking the phrase in the voice content containing the replaced text unit as a second text to be analyzed;

if the intention classification result is an insertion intention, extracting a positioning text unit and a text unit to be inserted in the voice content by using the semantic recognition model, and taking a phrase containing the positioning text unit and the text unit to be inserted in the voice content as a third text to be analyzed;

if the intention classification result is the deletion intention, extracting a text unit to be deleted in the voice content by using the semantic recognition model, and taking a phrase containing the text unit to be deleted in the voice content as a fourth text to be analyzed.
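As a rough illustration of the two preceding claims, the Python sketch below selects the text(s) to be analyzed from an intention classification result. It assumes the semantic recognition model has already produced the intent and a dictionary of labeled phrases; the `Intent` enum, the function name, and the label keys `replacing`, `replaced`, `insert`, and `delete` are hypothetical, not taken from the patent.

```python
from enum import Enum
from typing import Dict


class Intent(Enum):
    """The four intention classification results named in the claims."""
    TEXT_INPUT = "text input"
    REPLACEMENT = "replacement"
    INSERTION = "insertion"
    DELETION = "deletion"


def texts_to_analyze(intent: Intent, speech: str, phrases: Dict[str, str]) -> Dict[str, str]:
    """Return the text(s) to be analyzed, keyed by their role in the claims.

    `phrases` stands in for key information already extracted by the
    semantic recognition model; its keys are hypothetical label names.
    """
    if intent is Intent.TEXT_INPUT:
        # Text input: the whole voice content is the text to be analyzed.
        return {"text": speech}
    if intent is Intent.REPLACEMENT:
        # Phrase with the replacement unit -> first text; replaced unit -> second text.
        return {"first": phrases["replacing"], "second": phrases["replaced"]}
    if intent is Intent.INSERTION:
        # A single phrase containing both the positioning unit and the unit to insert.
        return {"third": phrases["insert"]}
    # Deletion: the phrase containing the unit to be deleted.
    return {"fourth": phrases["delete"]}
```

Keying the result by role ("first", "second", and so on) keeps the later extraction and editing steps independent of how the phrases were labeled.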

Preferably, the editing the target text according to the intention classification result and the text unit to be processed includes:

if the intention classification result is a text input intention, inputting the text unit to be processed into the target text;

if the intention classification result is a replacement intention, replacing the replaced text unit in the target text with the replacement text unit, wherein the text unit to be processed of the first text to be analyzed is the replacement text unit, and the text unit to be processed of the second text to be analyzed is the replaced text unit;

if the intention classification result is an insertion intention, inserting the text unit to be inserted at the positioning text unit in the target text, wherein the text unit to be processed of the third text to be analyzed is the text unit to be inserted;

and if the intention classification result is a deletion intention, deleting the text unit to be deleted in the target text, wherein the text unit to be processed of the fourth text to be analyzed is the text unit to be deleted.
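The four editing operations in the claim above can be sketched on a plain Python string as follows. This is a minimal sketch under stated assumptions: the function `apply_edit`, its string intent labels, appending at the end for text input, and editing only the first occurrence are all choices made here for illustration, since the claims do not specify a cursor position or which occurrence is meant.

```python
def apply_edit(target: str, intent: str, unit: str, *,
               replaced: str = "", anchor: str = "", before: bool = False) -> str:
    """Apply one editing operation to `target` (hypothetical helper).

    intent: "input", "replace", "insert" or "delete" (illustrative labels).
    unit:   the text unit to be processed (input, replacement, insertion or deletion unit).
    Only the first occurrence is edited, an assumption the claims do not settle.
    """
    if intent == "input":
        # Append at the end; a real editor would insert at the cursor.
        return target + unit
    if intent == "replace":
        if replaced not in target:
            raise ValueError(f"replaced unit {replaced!r} not in target text")
        return target.replace(replaced, unit, 1)
    if intent == "insert":
        pos = target.find(anchor)
        if pos < 0:
            raise ValueError(f"positioning unit {anchor!r} not in target text")
        # before=True inserts in front of the positioning unit, otherwise after it.
        if not before:
            pos += len(anchor)
        return target[:pos] + unit + target[pos:]
    if intent == "delete":
        if unit not in target:
            raise ValueError(f"unit to delete {unit!r} not in target text")
        return target.replace(unit, "", 1)
    raise ValueError(f"unknown intent {intent!r}")
```

The `before` flag is one way to cover both directions of insertion mentioned in the description; a real editor would more likely resolve the positioning text unit to a cursor position.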

Preferably, determining, based on the content of the text to be analyzed, whether the text to be analyzed is in the homophone word-formation format includes:

determining whether the penultimate character in the text to be analyzed is a designated character;

if the penultimate character in the text to be analyzed is a designated character, judging whether a text unit before the penultimate character in the text to be analyzed is a word or not, wherein the text unit comprises at least one continuous Chinese character;

if the text unit before the penultimate character is a word, judging whether the text unit before the penultimate character comprises the last text unit in the text to be analyzed;

and if the text unit before the penultimate character contains the last text unit in the text to be analyzed, determining that the text to be analyzed is in the homophone word-formation format.

Preferably, the method further comprises the following steps:

and if the text to be analyzed is not in the homophone word-formation format, editing the target text according to the intention classification result and the text to be analyzed.
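The format check of the claims above, together with the fallback of the last claim, might look like this in Python. The mini-lexicon, the choice of 的 as the designated character, and all function names are assumptions for illustration; the claims only require deciding whether the text unit before the penultimate character is a word.

```python
from typing import AbstractSet, Optional

# Hypothetical mini-lexicon standing in for the "is it a word?" check;
# a real system would use a dictionary or a word segmenter.
LEXICON = frozenset({"阴天", "音乐", "明天", "我们"})

# Assumed designated character: the linking particle 的 ("of").
DESIGNATED = "的"


def preceding_word(prefix: str, lexicon: AbstractSet[str]) -> Optional[str]:
    """Longest lexicon word ending exactly where `prefix` ends, or None."""
    for n in range(len(prefix), 0, -1):
        if prefix[-n:] in lexicon:
            return prefix[-n:]
    return None


def is_homophone_format(text: str, lexicon: AbstractSet[str] = LEXICON) -> bool:
    """The claimed checks in order: designated penultimate character,
    a word right before it, and that word containing the last text unit."""
    if len(text) < 3 or text[-2] != DESIGNATED:
        return False
    word = preceding_word(text[:-2], lexicon)
    # With the penultimate-character test, the last text unit is one character.
    return word is not None and text[-1] in word


def unit_to_process(text: str, lexicon: AbstractSet[str] = LEXICON) -> str:
    """Last text unit if the format matches; otherwise the whole text (fallback)."""
    return text[-1] if is_homophone_format(text, lexicon) else text
```

Matching the longest dictionary word that ends right before 的 lets the same check work on a longer phrase, such as a third text to be analyzed that also names the positioning unit.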

A second aspect of the embodiments of the present invention discloses a system for processing text units, the system including:

the classification unit is configured to perform, using a pre-trained semantic recognition model, intention classification on the voice content with which a user edits a target text, to obtain a text to be analyzed and an intention classification result, where the intention classification result is a text input intention, a replacement intention, an insertion intention, or a deletion intention;

the determining unit is configured to determine, based on the content of the text to be analyzed, whether the text to be analyzed is in the homophone word-formation format;

the extraction unit is configured to extract the last text unit in the text to be analyzed as the text unit to be processed if the text to be analyzed is in the homophone word-formation format, where a text unit comprises at least one consecutive Chinese character;

and the processing unit is used for editing the target text according to the intention classification result and the text unit to be processed.
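One way to read the second aspect is as four pluggable components. The sketch below, with hypothetical names throughout, wires them as injected callables and shows stub units that handle only the text input intention; it is an architectural illustration, not the patent's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class TextUnitSystem:
    """Hypothetical wiring of the four units; each unit is an injected callable."""
    classify: Callable[[str], Tuple[str, str]]  # classification unit: speech -> (intent, text to be analyzed)
    matches_format: Callable[[str], bool]       # determining unit
    extract: Callable[[str], str]               # extraction unit
    edit: Callable[[str, str, str], str]        # processing unit: (target, intent, unit) -> new target

    def handle(self, speech: str, target: str) -> str:
        intent, text = self.classify(speech)
        # Use the extracted unit when the format matches, else the whole text.
        unit = self.extract(text) if self.matches_format(text) else text
        return self.edit(target, intent, unit)


# Stub units that only support the text input intention (illustrative).
system = TextUnitSystem(
    classify=lambda speech: ("input", speech),
    matches_format=lambda t: len(t) >= 3 and t[-2] == "的" and t[-1] in t[:-2],
    extract=lambda t: t[-1],
    edit=lambda target, intent, unit: target + unit if intent == "input" else target,
)
```

Injecting the units keeps the orchestration in `handle` fixed while the model-backed classifier or a dictionary-backed format check can be swapped in later.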

Preferably, the classification unit includes:

the classification module is configured to perform, using the pre-trained semantic recognition model, intention classification on the voice content with which the user edits the target text, to obtain an intention classification result;

the first processing module is used for taking the voice content as a text to be analyzed if the intention classification result is a text input intention;

and the second processing module is used for extracting key information in the voice content by utilizing the semantic recognition model if the intention classification result is a replacement intention, an insertion intention or a deletion intention, and taking the key information as a text to be analyzed.

Preferably, the second processing module is specifically configured to: if the intention classification result is a replacement intention, extract a replacement text unit and a replaced text unit from the voice content using the semantic recognition model, take the phrase in the voice content containing the replacement text unit as a first text to be analyzed, and take the phrase in the voice content containing the replaced text unit as a second text to be analyzed;

Preferably, the processing unit, configured to edit the target text according to the intention classification result and the text unit to be processed, is specifically configured to: if the intention classification result is a text input intention, input the text unit to be processed into the target text;

With the method and system for processing text units provided by the embodiments of the present invention, the voice content with which a user edits a target text is classified by intention using a pre-trained semantic recognition model, yielding a text to be analyzed and an intention classification result; whether the text to be analyzed is in the homophone word-formation format is determined from its content; if it is, the last text unit in the text to be analyzed is extracted as the text unit to be processed, which is the homophonic text unit that needs to be processed; and the target text is edited according to the intention classification result and the text unit to be processed. This helps visually impaired people accurately input homophonic text units and improves the user experience.

Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a flowchart of a method for processing text units according to an embodiment of the present invention;

FIG. 2 is a flowchart for determining the format of a text to be analyzed according to an embodiment of the present invention;

FIG. 3 is a block diagram of a system for processing text units according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In this application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

As noted in the background, homophonic characters or words are often misrecognized in voice interaction, and correcting such misrecognitions is very difficult for visually impaired people, so a method that helps them accurately input homophonic characters or words is needed.

The embodiments of the present invention provide a method and a system for processing text units. A semantic recognition model performs intention classification on the user's voice input to obtain the corresponding text to be analyzed and intention classification result. If the text to be analyzed is determined to be in the homophone word-formation format, its last text unit is extracted as the text unit to be processed, that is, the homophonic text unit that needs to be processed. The target text is then edited according to the intention classification result and the text unit to be processed, which helps visually impaired people accurately input homophonic text units and further improves the user experience.

It can be understood that, when visually impaired people use voice interaction and text is read aloud character by character, each character is read as part of a word so that the listener knows exactly which character it is. For example, the character 阴 (yīn) is read as "阴天的阴" ("the 阴 of 阴天", where 阴天 means "cloudy day"), so the listener learns that it is the 阴 of 阴天.

It should be noted that the method for processing text units provided in the embodiments of the present invention is applied specifically to homophonic characters or words; it can likewise be applied to similar-sounding (near-homophonic) characters or words. The method is described in detail in the following embodiments.
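The reading convention described above can be sketched as a lookup from a character to a word that identifies it. The table `DISAMBIGUATING_WORDS` and both function names are hypothetical; a real screen reader would draw on a much larger dictionary.

```python
from typing import Dict, List, Mapping

# Hypothetical table mapping a character to a common word that identifies it.
DISAMBIGUATING_WORDS: Dict[str, str] = {"阴": "阴天", "音": "音乐", "明": "明天"}


def spoken_form(char: str, words: Mapping[str, str] = DISAMBIGUATING_WORDS) -> str:
    """Read one character as '<word>的<char>' so homophones can be told apart."""
    word = words.get(char)
    # Characters without an entry are read on their own.
    return f"{word}的{char}" if word else char


def read_by_character(text: str) -> List[str]:
    """Spoken form of each character, in order."""
    return [spoken_form(c) for c in text]
```

The output of `spoken_form` has exactly the shape that the format check later parses, which is why a user can answer in the same "word of character" form when dictating.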

Referring to fig. 1, a flowchart of a method for processing a text unit according to an embodiment of the present invention is shown, where the method includes:

Step S101: Perform intention classification, using a pre-trained semantic recognition model, on the voice content with which the user edits the target text, to obtain a text to be analyzed and an intention classification result.

It should be noted that the intention classification result is a text input intention, a replacement intention, an insertion intention, or a deletion intention.

When step S101 is implemented, the user's voice content is acquired as the user edits the target text by voice, and the pre-trained semantic recognition model performs intention classification on this voice content to obtain an intention classification result.

That is, the semantic recognition model can be used to perform intent recognition on the voice content and classify the recognition result to obtain an intent classification result corresponding to the voice content.

It is understood that the intention classification result corresponding to the user's voice content is a text input intention, a replacement intention, an insertion intention (which can be divided into forward insertion and backward insertion), or a deletion intention. The text input intention means inputting the voice content into the target text; the replacement intention means replacing a text unit (a character or a word) in the target text with a text unit from the voice content; the insertion intention means inserting a text unit from the voice content at a position in the target text; and the deletion intention means deleting text units from the target text as indicated by the voice content.

It can be understood that a user editing a target text by voice generally has one of two purposes: to input the voice content into the target text, or to modify the target text using the voice content (in which case the voice content can be regarded as a modification instruction). For example, if the voice content is "change sunny day to cloudy day", "sunny day" in the target text is replaced with "cloudy day".

After the intention classification result of the voice content of the target text edited by the user is obtained, if the intention classification result is a text input intention, the voice content is used as a text to be analyzed.

If the intention classification result is a replacement intention, an insertion intention or a deletion intention, extracting key information in the voice content by using a semantic recognition model, and taking the extracted key information as a text to be analyzed.

It can be understood that, when the semantic recognition model extracts key information from the voice content, a sequence labeling algorithm first assigns a label to each part of the voice content, and the key information is then extracted from these labels in combination with the intention classification result.

For example, when the intention classification result is a deletion intention, the extracted key information is the object to be deleted (labeled with a deletion label); when it is a replacement intention, the extracted key information is the replaced object (labeled with a replaced label) and the replacement object (labeled with a replacement label), meaning that the replaced object in the target text is replaced with the replacement object; and when it is an insertion intention, the extracted key information is the positioning word used as a reference (labeled with a positioning label) and the target word to be inserted (labeled with an insertion label).
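A minimal sketch of this two-stage extraction follows. It assumes the sequence labeler emits BIO tags (a common convention that the patent does not name) and uses hypothetical label names `DEL`, `REPLACED`, `REPLACING`, `LOC`, and `INS` for the labels described above.

```python
from typing import Dict, List, Optional

# Labels that carry key information for each intent (hypothetical names).
KEY_LABELS: Dict[str, List[str]] = {
    "delete": ["DEL"],
    "replace": ["REPLACED", "REPLACING"],
    "insert": ["LOC", "INS"],
}


def decode_bio(text: str, tags: List[str]) -> Dict[str, str]:
    """Group B-/I- tagged characters into spans keyed by label.

    If a label occurs twice, the later span wins (a simplification).
    """
    spans: Dict[str, str] = {}
    label: Optional[str] = None
    buf: List[str] = []
    for ch, tag in zip(text, tags):
        outside = not tag.startswith(("B-", "I-"))  # "O" or any unknown tag
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label)
        if starts_new or outside:
            if label is not None:
                spans[label] = "".join(buf)  # close the open span
            label, buf = (tag[2:], [ch]) if starts_new else (None, [])
        else:
            buf.append(ch)  # I- tag continuing the current span
    if label is not None:
        spans[label] = "".join(buf)
    return spans


def key_information(intent: str, text: str, tags: List[str]) -> Dict[str, str]:
    """Keep only the spans whose labels matter for this intent."""
    spans = decode_bio(text, tags)
    return {lab: spans[lab] for lab in KEY_LABELS.get(intent, []) if lab in spans}
```

The per-intent table is what ties the labels to the intention classification result: the same tagged utterance yields different key information depending on which labels the intent calls for.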

In combination with the above example, in some specific embodiments, the aforementioned "taking the extracted key information as the text to be analyzed" mainly includes three cases.

In the first case: if the intention classification result of the user's voice content for editing the target text is a replacement intention, the semantic recognition model extracts the replacement text unit and the replaced text unit from the voice content; the phrase containing the replacement text unit is taken as the first text to be analyzed, and the phrase containing the replaced text unit as the second text to be analyzed.

For example, assume the voice content is "change the 阴 of 阴天 (cloudy day) to the 音 of 音乐 (music)". The extracted replacement text unit is 音 and the replaced text unit is 阴; the phrase "音乐的音" is taken as the first text to be analyzed, and the phrase "阴天的阴" as the second text to be analyzed.

It should be noted that a text unit comprises at least one consecutive Chinese character; that is, a text unit can represent a single character or a word.

In the second case: if the intention classification result of the user's voice content for editing the target text is an insertion intention, the semantic recognition model extracts the positioning text unit and the text unit to be inserted from the voice content, and the phrase containing both is taken as the third text to be analyzed.

In the third case: if the intention classification result of the user's voice content for editing the target text is a deletion intention, the semantic recognition model extracts the text unit to be deleted from the voice content, and the phrase containing it is taken as the fourth text to be analyzed.

Step S102: Determine, based on the content of the text to be analyzed, whether the text to be analyzed is in the homophone word-formation format. If it is, step S103 is executed; if it is not, step S105 is executed.

It should be noted that the homophone word-formation format is: "a word formed with the homophonic text unit" + "的" + "the homophonic text unit", where 的 is the linking particle rendered as "of" in the English glosses.

This format can be used to judge whether a given phrase follows the homophone word-formation pattern; for example, the phrase "阴天的阴" (the 阴 of 阴天, "cloudy day") is in the homophone word-formation format.

When step S102 is implemented, the penultimate character of the text to be analyzed is compared with the designated character (which may be, for example, the particle "的"). If the penultimate character is the designated character, whether the text to be analyzed is in the homophone word-formation format is judged from the text unit before the penultimate character and the last text unit of the text to be analyzed.

If the text to be analyzed is determined to be in the homophone word-formation format, step S103 is executed; otherwise, step S105 is executed.
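The template itself can be matched structurally with a regular expression. This sketch assumes 的 is the linking character; unlike the claimed check it does not consult a lexicon, and it also admits a multi-character final unit, which is an assumption made here rather than something the claims state.

```python
import re
from typing import Optional, Tuple

# "<word>" + 的 + "<unit>"; 的 is assumed to be the linking character.
# The greedy (.+) for the word means the split happens at the last 的.
TEMPLATE = re.compile(r"(?P<word>.+)的(?P<unit>.+)")


def match_template(phrase: str) -> Optional[Tuple[str, str]]:
    """Return (word, unit) if the phrase fits the template and the word contains the unit."""
    m = TEMPLATE.fullmatch(phrase)
    if m and m["unit"] in m["word"]:
        return m["word"], m["unit"]
    return None
```

Because it only inspects shape, a check like this would need the lexicon test from the claims to reject phrases where the part before 的 is not actually a word.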

As can be seen from step S101, the text to be analyzed depends on the intention classification result of the user's voice content for editing the target text, so the text whose format is checked also differs:

If the intention classification result is a text input intention, it is judged whether the text to be analyzed (that is, the voice content) is in the homophone word-formation format.

If the intention classification result is a replacement intention, it is judged whether the first and second texts to be analyzed are in the homophone word-formation format.

If the intention classification result is an insertion intention, it is judged whether the third text to be analyzed is in the homophone word-formation format.

If the intention classification result is a deletion intention, it is judged whether the fourth text to be analyzed is in the homophone word-formation format.

Step S103: Extract the last text unit in the text to be analyzed as the text unit to be processed, and execute step S104.

It should be noted that a text unit comprises at least one consecutive Chinese character.

When step S103 is implemented, if the text to be analyzed is determined to be in the homophone word-formation format, its last text unit is extracted as the text unit to be processed, which is the homophonic text unit used to edit the target text, and step S104 is then executed.

As described in step S101, the text to be analyzed depends on the intention classification result of the user's voice content for editing the target text, as follows:

If the intention classification result is a text input intention, the voice content is taken as the text to be analyzed, and its last text unit is extracted as the text unit to be processed.

It can be understood that, when the text to be analyzed is in the homophone word-formation format, its last text unit is the text unit to be input into the target text. For example, for the voice content "mobile phone of smart phone" (the text to be analyzed), the last text unit, "mobile phone", is the text unit to be input.

If the intention classification result is a replacement intention, the phrase in the voice content containing the replacement text unit is the first text to be analyzed and the phrase containing the replaced text unit is the second text to be analyzed; the last text unit of each is extracted as its text unit to be processed.

It can be understood that, when the texts to be analyzed are in the homophone word-formation format, the last text unit of the first text to be analyzed is the replacement text unit, and the last text unit of the second text to be analyzed is the replaced text unit. For example, for the voice content "change the 阴 of 阴天 to the 音 of 音乐", the first text to be analyzed is "音乐的音" and the second is "阴天的阴"; the last text unit of the first, 音, is the replacement text unit, and the last text unit of the second, 阴, is the replaced text unit.

If the intention classification result is an insertion intention, the phrase in the voice content containing the positioning text unit and the text unit to be inserted is taken as the third text to be analyzed, and its last text unit is extracted as the text unit to be processed.

It can be understood that, when the text to be analyzed is in the homophone word-formation format, the last text unit of the third text to be analyzed is the text unit to be inserted. For example, for the third text to be analyzed "insert the 'bright' of 'bright day' after 'us'", the last text unit, "bright", is the text unit to be inserted.

If the intention classification result is a deletion intention, the phrase in the voice content containing the text unit to be deleted is taken as the fourth text to be analyzed, and its last text unit is extracted as the text unit to be processed.

It can be understood that, when the text to be analyzed is in the homophone word-formation format, the last text unit of the fourth text to be analyzed is the text unit to be deleted. For example, for the fourth text to be analyzed "apple to be deleted", the last text unit, "apple", is the text unit to be deleted.
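The correspondence between each text to be analyzed and the role of its extracted unit can be summarized in a small table. In this sketch the role names are illustrative, and `looks_like_template` is a simplified, hypothetical stand-in for the format check that only tests for 的 in the penultimate position and the final character occurring earlier; texts that fail it are passed through unchanged, mirroring the fallback.

```python
from typing import Dict

# Role of the unit extracted from each text to be analyzed (illustrative names).
ROLE: Dict[str, str] = {
    "text": "unit_to_input",
    "first": "replacement_unit",
    "second": "replaced_unit",
    "third": "unit_to_insert",
    "fourth": "unit_to_delete",
}


def looks_like_template(text: str) -> bool:
    """Simplified check: '...的X' where X also occurs before 的 (assumed designated char)."""
    return len(text) >= 3 and text[-2] == "的" and text[-1] in text[:-2]


def extract_units(texts: Dict[str, str]) -> Dict[str, str]:
    """Map each named text to its text unit to be processed, keyed by role."""
    units: Dict[str, str] = {}
    for name, text in texts.items():
        # Last text unit when the format matches; otherwise keep the whole text.
        units[ROLE[name]] = text[-1] if looks_like_template(text) else text
    return units
```

For a replacement intention both the first and second texts pass through the same extraction, which is why one mapping covers all four intents.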

Step S104: and editing the target text according to the intention classification result and the text unit to be processed.

In the process of implementing step S104 specifically, after determining that the format of the text to be analyzed is the format of the homophonic text unit word formation and extracting the text unit to be processed, the target text is edited according to the classification result of the intention of the user for editing the speech content of the target text and in combination with the extracted text unit to be processed. According to the difference of the intention classification result, the specific ways of editing the target text are mainly divided into the following four editing ways, which are described in detail below.

The first editing mode: if the intention classification result is a text input intention, the contents are known, the voice content is used as a text to be analyzed, and a text unit to be processed of the text to be analyzed is input into the target text. That is, if the intention classification result of the voice content is a text input intention, the last text unit (i.e., text unit to be processed) in the determined text to be analyzed is input into the target text.

It should be noted that the above steps can also be considered as follows: and replacing the text to be analyzed with the text unit to be processed, and inputting the text unit to be processed into the target text.

For example: if the intention classification result is the text input intention, the voice content (namely the text to be analyzed) is the sound of music, and the text unit to be processed is the word of sound, then the sound is input into the target text.

The second editing mode: if the intention classification result is a replacement intention, it can be known from the above that a first text to be analyzed and a second text to be analyzed can be determined, the text unit to be processed of the first text to be analyzed (i.e., the last text unit) is a replacement text unit, the text unit to be processed of the second text to be analyzed is a replaced text unit (i.e., the last text unit), and the replaced text unit in the target text is replaced with the replacement text unit.

For example, if the intention classification result is a replacement intention and the voice content is "change the 阴 of 阴天 (cloudy day) to the 音 of 音乐 (music)", the first text to be analyzed is "音乐的音" and the second text to be analyzed is "阴天的阴". The text unit to be processed of the first text to be analyzed is the character "音" (the replacement text unit), and that of the second text to be analyzed is the character "阴" (the replaced text unit), so "阴" in the target text is changed to "音".
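A similar sketch for the second editing mode, again assuming single-character text units. The function name `apply_replacement` is hypothetical, and replacing only the first occurrence of the replaced unit is an assumption, since the description does not say which occurrence is meant.

```python
def apply_replacement(target: str, first_text: str, second_text: str) -> str:
    """Replace the replaced text unit with the replacement text unit.

    Assumes single-character text units and changes only the first occurrence
    of the replaced unit (an assumption; the description does not specify).
    """
    replacement = first_text[-1]  # last unit of the first text to be analyzed
    replaced = second_text[-1]    # last unit of the second text to be analyzed
    if replaced not in target:
        raise ValueError(f"{replaced!r} does not occur in the target text")
    return target.replace(replaced, replacement, 1)


# A misrecognized "阴乐会" is corrected to "音乐会" (concert).
print(apply_replacement("我们去听阴乐会", "音乐的音", "阴天的阴"))  # 我们去听音乐会
```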

The third editing mode: if the intention classification result is an insertion intention, then, as described above, a third text to be analyzed is determined. Its text unit to be processed (i.e., the last text unit) is the text unit to be inserted, which is inserted at the position of the positioning text unit in the target text.

For example, if the intention classification result is an insertion intention and the third text to be analyzed is "insert the 教 of 教师 (teacher) after 我们 (we)", the text unit to be processed of the third text to be analyzed is the character "教" (the text unit to be inserted) and the positioning text unit is "我们", so "教" is inserted after "我们" in the target text.
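For the third editing mode, the sketch below assumes the positioning text unit has already been extracted and that the new unit goes immediately after its first occurrence, matching the "after 我们" example; `apply_insertion` is a hypothetical name.

```python
def apply_insertion(target: str, positioning_unit: str, third_text: str) -> str:
    """Insert the last text unit of the third text right after the positioning unit.

    Assumes "after" semantics, as in the example, and uses the first match of
    the positioning text unit in the target text.
    """
    unit = third_text[-1]  # text unit to be inserted
    idx = target.find(positioning_unit)
    if idx == -1:
        raise ValueError(f"{positioning_unit!r} does not occur in the target text")
    pos = idx + len(positioning_unit)
    return target[:pos] + unit + target[pos:]


print(apply_insertion("我们学生", "我们", "教师的教"))  # 我们教学生
```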

The fourth editing mode: if the intention classification result is a deletion intention, then, as described above, a fourth text to be analyzed is determined. Its text unit to be processed (i.e., the last text unit) is the text unit to be deleted, which is deleted from the target text.

For example, if the intention classification result is a deletion intention and the fourth text to be analyzed is "delete the 阴 of 阴天 (cloudy day)", the text unit to be processed of the fourth text to be analyzed is the character "阴" (the text unit to be deleted), and "阴" is deleted from the target text.
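The fourth editing mode reduces to removing one text unit. This sketch (hypothetical `apply_deletion`) assumes a single-character unit and deletes only its first occurrence in the target text.

```python
def apply_deletion(target: str, fourth_text: str) -> str:
    """Delete the last text unit of the fourth text from the target text.

    Assumes a single-character unit and removes only its first occurrence.
    """
    unit = fourth_text[-1]  # text unit to be deleted
    if unit not in target:
        raise ValueError(f"{unit!r} does not occur in the target text")
    return target.replace(unit, "", 1)


print(apply_deletion("我喜欢阴音乐", "阴天的阴"))  # 我喜欢音乐
```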

It should be noted that the examples of the four editing modes all take a single Chinese character as the text unit. The text unit may equally be a word; in that case the target text is edited in the same manner as described above, and the details are not repeated here.

Step S105: edit the target text according to the intention classification result and the text to be analyzed.

In implementing step S105, if the format of the text to be analyzed is determined not to be the homophonic text unit word-formation format, the target text is edited according to the intention classification result of the voice content that the user uses to edit the target text, combined with the text to be analyzed itself.

If the intention classification result is a text input intention, the text to be analyzed (which in this case is the voice content itself) is input into the target text.

If the intention classification result is a replacement intention, then, as described above, the first and second texts to be analyzed are determined; the replacement text unit is determined from the first text to be analyzed and the replaced text unit from the second, and the replaced text unit in the target text is replaced with the replacement text unit.

If the intention classification result is an insertion intention, the third text to be analyzed is determined, the positioning text unit and the text unit to be inserted are determined from it, and the text unit to be inserted is inserted at the position of the positioning text unit in the target text.

If the intention classification result is a deletion intention, the fourth text to be analyzed is determined, the text unit to be deleted is determined from it, and that text unit is deleted from the target text.
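Steps S104 and S105 differ only in what serves as the operand of each edit: the last text unit when the homophonic format was detected, and the text to be analyzed itself otherwise. The sketch below puts both branches behind one dispatcher. `EditCommand`, the role names (`content`, `replacement`, `replaced`, `anchor`, `insert`, `delete`) and the per-role `homophone` flags are hypothetical, and the text input case simply appends to the end of the target text.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class EditCommand:
    """Hypothetical container for one classified voice command."""
    intent: str  # "input", "replace", "insert" or "delete"
    texts: Dict[str, str]  # role -> text to be analyzed
    homophone: Dict[str, bool] = field(default_factory=dict)  # role -> format check result


def operand(cmd: EditCommand, role: str) -> str:
    text = cmd.texts[role]
    # Step S104: homophonic format detected -> keep only the last text unit.
    # Step S105: otherwise use the text to be analyzed as-is.
    return text[-1] if cmd.homophone.get(role, False) else text


def edit_target(target: str, cmd: EditCommand) -> str:
    if cmd.intent == "input":
        return target + operand(cmd, "content")
    if cmd.intent == "replace":
        return target.replace(operand(cmd, "replaced"), operand(cmd, "replacement"), 1)
    if cmd.intent == "insert":
        anchor = operand(cmd, "anchor")
        pos = target.index(anchor) + len(anchor)  # raises ValueError if absent
        return target[:pos] + operand(cmd, "insert") + target[pos:]
    if cmd.intent == "delete":
        return target.replace(operand(cmd, "delete"), "", 1)
    raise ValueError(f"unknown intent: {cmd.intent}")


# Step S105: a plain replacement with no homophone phrasing.
print(edit_target("今天下雨", EditCommand("replace", {"replacement": "明天", "replaced": "今天"})))  # 明天下雨
```

Keeping the S104/S105 choice inside `operand` means the four edit operations are written once and shared by both branches.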

In this embodiment of the invention, the semantic recognition model performs intention classification on the user's voice input to obtain the corresponding text to be analyzed and intention classification result. If the format of the text to be analyzed is determined to be the homophonic text unit word-formation format, the last text unit of the text to be analyzed is extracted as the text unit to be processed, i.e., the homophonic text unit that needs to be processed. The target text is then edited according to the intention classification result and the text unit to be processed, helping visually impaired users input homophonic text units accurately and thereby improving the user experience.

Fig. 2 shows a flowchart for determining a format of a text to be analyzed according to an embodiment of the present invention, which includes:

Step S201: determine whether the penultimate character of the text to be analyzed is the designated character. If it is, step S202 is executed; if not, step S205 is executed.

In the process of implementing step S201 specifically, it is determined whether the penultimate character in the text to be analyzed is a designated character, if it is determined that the penultimate character is the designated character, step S202 is executed to continue the subsequent determination, and if it is determined that the penultimate character is not the designated character, step S205 is executed to determine that the format of the text to be analyzed is not the format of the homophonic text unit word group.

For example, it is determined whether the penultimate character of the text to be analyzed is the designated character "的" (roughly "of", as in "音乐的音", "the 音 of 音乐"). If it is, step S202 is executed; if not, step S205 is executed.

Step S202: determine whether the text unit before the penultimate character of the text to be analyzed is a word. If it is, step S203 is executed; if not, step S205 is executed.

In implementing step S202, after the penultimate character has been determined to be the designated character, a pre-constructed Chinese lexicon is used to determine whether the text unit before the penultimate character of the text to be analyzed is a word. If it is, step S203 is executed to continue the determination; if not, step S205 is executed and the format of the text to be analyzed is determined not to be the homophonic text unit word-formation format.

Step S203: determine whether the text unit before the penultimate character of the text to be analyzed contains the last text unit of the text to be analyzed. If it does, step S204 is executed; if not, step S205 is executed.

In implementing step S203, after the text unit before the penultimate character has been determined to be a word, it is determined whether that word contains the last text unit of the text to be analyzed; for example, whether the word before the penultimate character contains the last character of the text to be analyzed (in "音乐的音", the word "音乐" contains the last character "音").

If the text unit before the penultimate character contains the last text unit in the text to be analyzed, step S204 is executed to determine that the format of the text to be analyzed is the format of the homophonic text unit word formation, and if the text unit before the penultimate character does not contain the last text unit in the text to be analyzed, step S205 is executed to determine that the format of the text to be analyzed is not the format of the homophonic text unit word formation.

Step S204: determine that the format of the text to be analyzed is the homophonic text unit word-formation format.

Step S205: determine that the format of the text to be analyzed is not the homophonic text unit word-formation format.
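The decision flow of Fig. 2 (steps S201 to S205) can be sketched in Python as follows. The designated character is assumed to be "的", the four-word `LEXICON` is a toy stand-in for the pre-constructed Chinese word library, and taking the longest lexicon word that ends immediately before the penultimate character as "the text unit before the penultimate character" is an assumption made so that longer spoken phrases also work.

```python
from typing import Optional, Set

# Hypothetical toy lexicon standing in for the pre-constructed Chinese word library.
LEXICON: Set[str] = {"音乐", "阴天", "教师", "我们"}
DESIGNATED_CHAR = "的"  # assumed designated character


def word_before_penultimate(text: str, lexicon: Set[str]) -> Optional[str]:
    """Return the longest lexicon word ending right before the penultimate character.

    Choosing the longest matching suffix is an assumption; the description only
    says that a pre-constructed lexicon is consulted.
    """
    prefix = text[:-2]
    for start in range(len(prefix)):
        candidate = prefix[start:]
        if candidate in lexicon:
            return candidate
    return None


def is_homophone_format(text: str, lexicon: Set[str] = LEXICON) -> bool:
    # S201: the penultimate character must be the designated character.
    if len(text) < 3 or text[-2] != DESIGNATED_CHAR:
        return False
    # S202: the text unit before the penultimate character must be a word.
    word = word_before_penultimate(text, lexicon)
    if word is None:
        return False
    # S203: that word must contain the last text unit -> S204 (True) or S205 (False).
    return text[-1] in word


for sample in ["音乐的音", "音乐的阴", "今天天气很好"]:
    print(sample, is_homophone_format(sample))
```

Here "音乐的音" passes all three checks, "音乐的阴" fails step S203 because "音乐" does not contain "阴", and "今天天气很好" fails step S201.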

In this embodiment of the invention, the penultimate character of the text to be analyzed is compared with the designated character (e.g., "的"), and the text unit before the penultimate character is checked against the last text unit of the text to be analyzed, to determine whether the format of the text to be analyzed is the homophonic text unit word-formation format.

Corresponding to the method for processing a text unit provided in the foregoing embodiment of the present invention, referring to fig. 3, an embodiment of the present invention further provides a block diagram of a system for processing a text unit, where the system includes: a classification unit 301, a determination unit 302, an extraction unit 303, and a processing unit 304;

The classification unit 301 is configured to perform intention classification, using a semantic recognition model obtained through pre-training, on the voice content that the user uses to edit the target text, so as to obtain a text to be analyzed and an intention classification result, where the intention classification result is a text input intention, a replacement intention, an insertion intention or a deletion intention.

The determining unit 302 is configured to determine whether the format of the text to be analyzed is a format of homophone text unit word formation based on the content in the text to be analyzed.

The extracting unit 303 is configured to, if the format of the text to be analyzed is a homophonic text unit word formation format, extract a last text unit in the text to be analyzed and use the last text unit as a text unit to be processed, where the text unit includes at least one continuous Chinese character.

And the processing unit 304 is used for editing the target text according to the intention classification result and the text unit to be processed.

Preferably, the processing unit 304 is further configured to: if the format of the text to be analyzed is not the homophonic text unit word-formation format, edit the target text according to the intention classification result and the text to be analyzed.

Preferably, with reference to Fig. 3, the classification unit 301 includes a classification module, a first processing module and a second processing module, which operate as follows:

The classification module is configured to perform intention classification, using the pre-trained semantic recognition model, on the voice content that the user uses to edit the target text, to obtain the intention classification result.

The first processing module is configured to use the voice content as the text to be analyzed if the intention classification result is a text input intention.

The second processing module is configured to extract key information from the voice content using the semantic recognition model and to use the key information as the text to be analyzed if the intention classification result is a replacement intention, an insertion intention or a deletion intention.

In a specific implementation, the second processing module is specifically configured to: if the intention classification result is a replacing intention, extracting a replacing text unit and a replaced text unit in the voice content by using a semantic recognition model, taking a phrase containing the replacing text unit in the voice content as a first text to be analyzed, and taking a phrase containing the replaced text unit in the voice content as a second text to be analyzed; if the intention classification result is an insertion intention, extracting a positioning text unit and a text unit to be inserted in the voice content by using a semantic recognition model, and taking a phrase containing the positioning text unit and the text unit to be inserted in the voice content as a third text to be analyzed; and if the intention classification result is the deletion intention, extracting a text unit to be deleted in the voice content by using a semantic recognition model, and taking a phrase containing the text unit to be deleted in the voice content as a fourth text to be analyzed.
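The semantic recognition model itself is not specified here. Purely to show what the classification and second processing modules hand on, the sketch below substitutes a few hypothetical regular-expression command templates (把…改成…, 在…后面插入…, 删除…) for the trained model. Unlike the description, which takes a single phrase containing both the positioning text unit and the text unit to be inserted as the third text to be analyzed, this sketch returns them as two separate fields.

```python
import re
from typing import Dict, Tuple

# Hypothetical command templates: a rule-based stand-in for the
# pre-trained semantic recognition model, not the patent's method.
TEMPLATES = [
    ("replace", re.compile(r"^把(?P<replaced>.+)改成(?P<replacement>.+)$")),
    ("insert", re.compile(r"^在(?P<anchor>.+?)后面插入(?P<insert>.+)$")),
    ("delete", re.compile(r"^删除(?P<delete>.+)$")),
]


def classify(speech: str) -> Tuple[str, Dict[str, str]]:
    """Return (intent, key information); unmatched speech is treated as text input."""
    for intent, pattern in TEMPLATES:
        match = pattern.match(speech)
        if match:
            # Named groups become role-labelled texts to be analyzed.
            return intent, match.groupdict()
    # Text input intention: the whole voice content is the text to be analyzed.
    return "input", {"content": speech}


print(classify("把阴天的阴改成音乐的音"))
```

For the replacement command this yields the intent "replace" with "阴天的阴" as the replaced phrase and "音乐的音" as the replacement phrase. A trained classifier would take the place of `classify` in practice; the templates only illustrate the shape of the key information passed on to the determining unit.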

Correspondingly, the processing unit 304, which edits the target text according to the intention classification result and the text unit to be processed, is specifically configured to: if the intention classification result is a text input intention, input the text unit to be processed into the target text; if the intention classification result is a replacement intention, replace the replaced text unit in the target text with the replacement text unit, where the text unit to be processed of the first text to be analyzed is the replacement text unit and the text unit to be processed of the second text to be analyzed is the replaced text unit; if the intention classification result is an insertion intention, insert the text unit to be inserted at the position of the positioning text unit in the target text, where the text unit to be processed of the third text to be analyzed is the text unit to be inserted; and if the intention classification result is a deletion intention, delete the text unit to be deleted from the target text, where the text unit to be processed of the fourth text to be analyzed is the text unit to be deleted.

Preferably, with reference to Fig. 3, the determining unit 302 includes a first determining module, a first judging module, a second judging module and a second determining module, which operate as follows:

The first determining module is configured to determine whether the penultimate character of the text to be analyzed is the designated character.

The first judging module is configured to determine, if the penultimate character of the text to be analyzed is the designated character, whether the text unit before the penultimate character is a word, where the text unit includes at least one consecutive Chinese character.

The second judging module is configured to determine, if the text unit before the penultimate character is a word, whether that text unit contains the last text unit of the text to be analyzed.

The second determining module is configured to determine that the format of the text to be analyzed is the homophonic text unit word-formation format if the text unit before the penultimate character contains the last text unit of the text to be analyzed.

In summary, embodiments of the present invention provide a method and a system for processing a text unit, which perform intent classification on a speech input content of a user by using a semantic recognition model to obtain a corresponding text to be analyzed and an intent classification result. And if the format of the text to be analyzed is determined to be the format of the homophonic text unit word group, extracting the last text unit in the text to be analyzed and taking the last text unit as a text unit to be processed, wherein the text unit to be processed is the homophonic text unit needing to be processed. And editing the target text according to the intention classification result and the text unit to be processed, and assisting the visually impaired people to accurately input the homophonic text unit so as to improve the user experience.

The embodiments in this specification are described progressively; for the same or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiments are substantially similar to the method embodiments and are therefore described relatively briefly; for related points, refer to the description of the method embodiments. The system embodiments described above are only illustrative: units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without inventive effort.

Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method for processing text units, wherein the method comprises:

Using the semantic recognition model obtained by pre-training, perform intent classification on the speech content that the user uses to edit the target text, to obtain the text to be analyzed and the intent classification result, where the intent classification result is a text input intent, a replacement intent, an insertion intent or a deletion intent;

Based on the content in the text to be analyzed, determine whether the format of the text to be analyzed is the format of homophonetic text unit words;

If the format of the to-be-analyzed text is the format of homophonic text unit words, extract the last text unit in the to-be-analyzed text and use it as the to-be-processed text unit, and the text unit includes at least one continuous Chinese character;

The target text is edited according to the intent classification result and the to-be-processed text unit.

2. The method according to claim 1, wherein using the semantic recognition model obtained by pre-training to perform intent classification on the speech content that the user uses to edit the target text, and obtaining the text to be analyzed and the intent classification result, comprises:

Using the pre-trained semantic recognition model, classify the intent of the speech content used by the user to edit the target text, and obtain the intent classification result;

If the intent classification result is a text input intent, use the voice content as the text to be analyzed;

If the intent classification result is a replacement intent, an insertion intent, or a deletion intent, the semantic recognition model is used to extract key information in the speech content, and the key information is used as the text to be analyzed.

3. The method according to claim 2, wherein, if the intent classification result is a replacement intent, an insertion intent or a deletion intent, using the semantic recognition model to extract the key information in the speech content and using the key information as the text to be analyzed comprises:

If the intent classification result is a replacement intent, use the semantic recognition model to extract the replacement text unit and the replaced text unit in the speech content, use the phrase in the speech content that contains the replacement text unit as the first text to be analyzed, and use the phrase in the speech content that contains the replaced text unit as the second text to be analyzed;

If the intent classification result is an insertion intent, use the semantic recognition model to extract the positioning text unit and the to-be-inserted text unit in the speech content, and use the phrase in the speech content that contains the positioning text unit and the to-be-inserted text unit as the third text to be analyzed;

If the intent classification result is a deletion intent, use the semantic recognition model to extract the text unit to be deleted in the speech content, and use the phrase in the speech content that contains the text unit to be deleted as the fourth text to be analyzed.

4. The method according to claim 3, wherein the editing the target text according to the intent classification result and the to-be-processed text unit comprises:

If the intent classification result is a text input intent, input the to-be-processed text unit into the target text;

If the intent classification result is a replacement intent, replace the replaced text unit in the target text with the replacement text unit, wherein the to-be-processed text unit of the first to-be-analyzed text is the replacement text unit, and the to-be-processed text unit of the second to-be-analyzed text is the replaced text unit;

If the intent classification result is an insertion intent, insert the to-be-inserted text unit at the positioning text unit in the target text, wherein the to-be-processed text unit of the third to-be-analyzed text is the to-be-inserted text unit;

If the intent classification result is a deletion intent, delete the to-be-deleted text unit in the target text, wherein the to-be-processed text unit of the fourth to-be-analyzed text is the to-be-deleted text unit.

5. The method according to claim 1, wherein determining, based on the content in the text to be analyzed, whether the format of the text to be analyzed is the format of homophonic text unit words comprises:

determining whether the penultimate character in the text to be analyzed is a specified character;

If the penultimate character in the text to be analyzed is a specified character, determine whether the text unit before the penultimate character in the text to be analyzed is a word, and the text unit includes at least one continuous Chinese character;

If the text unit before the second-to-last character is a word, determine whether the text unit before the second-to-last character includes the last text unit in the text to be analyzed;

If the text unit before the penultimate character includes the last text unit in the to-be-analyzed text, the format of the to-be-analyzed text is determined to be the format of homophonetic text unit words.

6. The method of claim 1, further comprising:

If the format of the to-be-analyzed text is not the format of homophonetic text unit words, the target text is edited according to the intent classification result and the to-be-analyzed text.

7. A system for processing text units, wherein the system comprises:

The classification unit is configured to perform intent classification, using the semantic recognition model obtained by pre-training, on the speech content that the user uses to edit the target text, to obtain the text to be analyzed and the intent classification result, the intent classification result being a text input intent, a replacement intent, an insertion intent or a deletion intent;

A determination unit, configured to determine, based on the content in the text to be analyzed, whether the format of the text to be analyzed is the format of homophonic text unit words;

The extraction unit is configured to, if the format of the text to be analyzed is the format of homophonic text unit words, extract the last text unit in the text to be analyzed and use it as the to-be-processed text unit, the text unit including at least one consecutive Chinese character;

and a processing unit, configured to edit the target text according to the intent classification result and the to-be-processed text unit.

8. The system according to claim 7, wherein the classification unit comprises:

The classification module is used to use the pre-trained semantic recognition model to classify the speech content of the user for editing the target text, and obtain the intention classification result;

a first processing module, configured to use the voice content as the text to be analyzed if the intent classification result is a text input intent;

The second processing module is configured to extract key information in the speech content by using the semantic recognition model, and use the key information as the text to be analyzed if the intent classification result is a replacement intent, an insertion intent or a deletion intent.

9. The system according to claim 8, wherein the second processing module is specifically configured to: if the intent classification result is a replacement intent, extract the replacement text unit and the replaced text unit in the speech content by using the semantic recognition model, use the phrase containing the replacement text unit in the speech content as the first text to be analyzed, and use the phrase containing the replaced text unit in the speech content as the second text to be analyzed;

10. The system according to claim 9, wherein the processing unit, configured to edit the target text according to the intent classification result and the to-be-processed text unit, is specifically configured to: if the intent classification result is a text input intent, input the to-be-processed text unit into the target text;