
CN113191157A - Method and system for processing text unit - Google Patents


Info

Publication number
CN113191157A
Authority
CN
China
Prior art keywords
text
unit
analyzed
intent
text unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110539425.XA
Other languages
Chinese (zh)
Other versions
CN113191157B (en)
Inventor
史元春
喻纯
杨欢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Interactive Future Beijing Technology Co ltd
Tsinghua University
Original Assignee
Interactive Future Beijing Technology Co ltd
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Interactive Future Beijing Technology Co ltd, Tsinghua University
Priority to CN202110539425.XA
Publication of CN113191157A
Application granted
Publication of CN113191157B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Machine Translation (AREA)

Abstract

The present invention provides a method and a system for processing a text unit. The method comprises: performing intention classification, by using a pre-trained semantic recognition model, on the voice content with which a user edits a target text, to obtain a text to be analyzed and an intention classification result; determining, based on the content in the text to be analyzed, whether the format of the text to be analyzed is the format of homophonic text unit word formation; if so, extracting the last text unit in the text to be analyzed and taking it as the text unit to be processed, the text unit to be processed being the homophonic text unit that needs to be processed; and editing the target text according to the intention classification result and the text unit to be processed, so as to assist visually impaired people in accurately inputting homophonic text units, thereby improving the user experience.


Description

Method and system for processing text unit
Technical Field
The invention relates to the technical field of voice interaction, in particular to a method and a system for processing a text unit.
Background
With the development of voice recognition technology, voice interaction is applied in more and more scenarios. During voice interaction, homophonic characters or homophonic words are frequently misrecognized, and the user has to correct the misrecognized characters or words manually. Such manual correction is difficult for visually impaired people, so a method is needed to assist visually impaired people in accurately inputting homophonic characters or homophonic words.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and a system for processing a text unit, to assist visually impaired people in accurately inputting homophonic characters or homophonic words.
In order to achieve the above purpose, the embodiments of the present invention provide the following technical solutions:
the first aspect of the embodiments of the present invention discloses a method for processing a text unit, where the method includes:
performing intention classification, by using a pre-trained semantic recognition model, on the voice content with which a user edits a target text, to obtain a text to be analyzed and an intention classification result, wherein the intention classification result is a text input intention, a replacement intention, an insertion intention or a deletion intention;
determining whether the format of the text to be analyzed is the format of homophonic text unit word formation or not based on the content in the text to be analyzed;
if the format of the text to be analyzed is the format of homophonic text unit word formation, extracting the last text unit in the text to be analyzed and taking the last text unit as a text unit to be processed, wherein the text unit comprises at least one continuous Chinese character;
and editing the target text according to the intention classification result and the text unit to be processed.
Preferably, the performing intention classification on the voice content with which the user edits the target text by using the pre-trained semantic recognition model to obtain a text to be analyzed and an intention classification result includes:
performing intention classification on the voice content with which the user edits the target text by using the pre-trained semantic recognition model to obtain an intention classification result;
if the intention classification result is a text input intention, taking the voice content as a text to be analyzed;
and if the intention classification result is a replacement intention, an insertion intention or a deletion intention, extracting key information in the voice content by using the semantic recognition model, and taking the key information as a text to be analyzed.
Preferably, the extracting key information in the voice content by using the semantic recognition model if the intention classification result is a replacement intention, an insertion intention or a deletion intention, and taking the key information as a text to be analyzed, includes:
if the intention classification result is a replacement intention, extracting a replacement text unit and a replaced text unit in the voice content by using the semantic recognition model, taking a phrase containing the replacement text unit in the voice content as a first text to be analyzed, and taking a phrase containing the replaced text unit in the voice content as a second text to be analyzed;
if the intention classification result is an insertion intention, extracting a positioning text unit and a text unit to be inserted in the voice content by using the semantic recognition model, and taking a phrase containing the positioning text unit and the text unit to be inserted in the voice content as a third text to be analyzed;
if the intention classification result is the deletion intention, extracting a text unit to be deleted in the voice content by using the semantic recognition model, and taking a phrase containing the text unit to be deleted in the voice content as a fourth text to be analyzed.
Preferably, the editing the target text according to the intention classification result and the text unit to be processed includes:
if the intention classification result is a text input intention, inputting the text unit to be processed into the target text;
if the intention classification result is a replacement intention, replacing the replaced text unit in the target text with the replacement text unit, wherein the text unit to be processed of the first text to be analyzed is the replacement text unit, and the text unit to be processed of the second text to be analyzed is the replaced text unit;
if the intention classification result is an insertion intention, inserting the text unit to be inserted at the positioning text unit in the target text, wherein the text unit to be processed of the third text to be analyzed is the text unit to be inserted;
and if the intention classification result is a deletion intention, deleting the text unit to be deleted in the target text, wherein the text unit to be processed of the fourth text to be analyzed is the text unit to be deleted.
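The four editing branches above can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the `Intent` enum, the `apply_edit` function and its parameters are hypothetical names, and several behaviours are assumptions the text does not specify (text input appends at the end of the target text, only the first occurrence of a unit is edited, and insertion defaults to placing the unit after the positioning text unit).

```python
from enum import Enum


class Intent(Enum):
    """Hypothetical names for the four intention classification results."""
    INPUT = "input"
    REPLACE = "replace"
    INSERT = "insert"
    DELETE = "delete"


def apply_edit(target, intent, *, unit, replaced=None, anchor=None, before=False):
    """Edit `target` according to `intent` and the text unit to be processed.

    unit     -- text unit to input / replacement unit / unit to insert / unit to delete
    replaced -- replaced text unit (replacement intention only)
    anchor   -- positioning text unit (insertion intention only)
    before   -- assumption: insert before the anchor instead of after it
    """
    if intent is Intent.INPUT:
        # Assumption: new text is appended to the end of the target text.
        return target + unit
    if intent is Intent.REPLACE:
        # Assumption: only the first occurrence is replaced.
        return target.replace(replaced, unit, 1)
    if intent is Intent.INSERT:
        pos = target.find(anchor)
        if pos < 0:
            return target  # positioning unit not found: leave the text unchanged
        if not before:
            pos += len(anchor)
        return target[:pos] + unit + target[pos:]
    if intent is Intent.DELETE:
        # Assumption: only the first occurrence is deleted.
        return target.replace(unit, "", 1)
    raise ValueError(f"unknown intent: {intent!r}")
```

For instance, under these assumptions, replacing "晴" with "阴" in the target text "今天是晴天" gives "今天是阴天".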
Preferably, the determining whether the format of the text to be analyzed is the format of homophonic text unit word formation based on the content in the text to be analyzed includes:
determining whether the penultimate character in the text to be analyzed is a designated character;
if the penultimate character in the text to be analyzed is a designated character, judging whether a text unit before the penultimate character in the text to be analyzed is a word or not, wherein the text unit comprises at least one continuous Chinese character;
if the text unit before the penultimate character is a word, judging whether the text unit before the penultimate character comprises the last text unit in the text to be analyzed;
and if the text unit before the penultimate character comprises the last text unit in the text to be analyzed, determining that the format of the text to be analyzed is the format of homophonic text unit word formation.
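The format check described above can be sketched as follows. The sketch rests on several assumptions that are not stated in this passage: the designated character is taken to be "的", the "is a word" test is a lookup in a small hypothetical vocabulary (a real system would presumably use a lexicon or a word segmenter), and the text is split at the last designated character so that a multi-character last text unit is also handled; when the last text unit is a single character, this coincides with checking the penultimate character.

```python
DESIGNATED_CHAR = "的"  # assumption: the designated character

# Hypothetical toy vocabulary standing in for a real lexicon.
VOCABULARY = {"阴天", "音乐", "智能手机", "明天"}


def split_homophone_phrase(text, vocabulary=VOCABULARY):
    """Return (word, last_unit) if `text` is word + designated char + unit, else None."""
    head, sep, tail = text.rpartition(DESIGNATED_CHAR)
    if not sep or not head or not tail:
        return None  # no designated character, or nothing on one side of it
    if head not in vocabulary:
        return None  # the text unit before the designated character is not a word
    if tail not in head:
        return None  # the word does not contain the last text unit
    return head, tail


def is_homophone_format(text, vocabulary=VOCABULARY):
    """True if `text` is in the format of homophonic text unit word formation."""
    return split_homophone_phrase(text, vocabulary) is not None
```

Passing the vocabulary as a parameter keeps the word test replaceable without touching the format logic.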
Preferably, the method further comprises the following steps:
and if the format of the text to be analyzed is not the format of homophonic text unit word formation, editing the target text according to the intention classification result and the text to be analyzed.
A second aspect of the embodiments of the present invention discloses a system for processing text units, the system including:
the classification unit is used for performing intention classification, by using a pre-trained semantic recognition model, on the voice content with which a user edits a target text, to obtain a text to be analyzed and an intention classification result, wherein the intention classification result is a text input intention, a replacement intention, an insertion intention or a deletion intention;
the determining unit is used for determining whether the format of the text to be analyzed is the format of homophonic text unit word formation based on the content in the text to be analyzed;
the extraction unit is used for extracting the last text unit in the text to be analyzed and taking the last text unit as a text unit to be processed if the format of the text to be analyzed is the format of homophonic text unit word formation, wherein the text unit comprises at least one continuous Chinese character;
and the processing unit is used for editing the target text according to the intention classification result and the text unit to be processed.
Preferably, the classification unit includes:
the classification module is used for performing intention classification on the voice content with which the user edits the target text by using the pre-trained semantic recognition model to obtain an intention classification result;
the first processing module is used for taking the voice content as a text to be analyzed if the intention classification result is a text input intention;
and the second processing module is used for extracting key information in the voice content by utilizing the semantic recognition model if the intention classification result is a replacement intention, an insertion intention or a deletion intention, and taking the key information as a text to be analyzed.
Preferably, the second processing module is specifically configured to: if the intention classification result is a replacement intention, extracting a replacement text unit and a replaced text unit in the voice content by using the semantic recognition model, taking a phrase containing the replacement text unit in the voice content as a first text to be analyzed, and taking a phrase containing the replaced text unit in the voice content as a second text to be analyzed;
if the intention classification result is an insertion intention, extracting a positioning text unit and a text unit to be inserted in the voice content by using the semantic recognition model, and taking a phrase containing the positioning text unit and the text unit to be inserted in the voice content as a third text to be analyzed;
if the intention classification result is the deletion intention, extracting a text unit to be deleted in the voice content by using the semantic recognition model, and taking a phrase containing the text unit to be deleted in the voice content as a fourth text to be analyzed.
Preferably, the processing unit, configured to edit the target text according to the intention classification result and the text unit to be processed, is specifically configured to: if the intention classification result is a text input intention, inputting the text unit to be processed into the target text;
if the intention classification result is a replacement intention, replacing the replaced text unit in the target text with the replacement text unit, wherein the text unit to be processed of the first text to be analyzed is the replacement text unit, and the text unit to be processed of the second text to be analyzed is the replaced text unit;
if the intention classification result is an insertion intention, inserting the text unit to be inserted at the positioning text unit in the target text, wherein the text unit to be processed of the third text to be analyzed is the text unit to be inserted;
and if the intention classification result is a deletion intention, deleting the text unit to be deleted in the target text, wherein the text unit to be processed of the fourth text to be analyzed is the text unit to be deleted.
Based on the method and the system for processing a text unit provided by the embodiments of the present invention, the method is: performing intention classification, by using a pre-trained semantic recognition model, on the voice content with which a user edits a target text, to obtain a text to be analyzed and an intention classification result; determining whether the format of the text to be analyzed is the format of homophonic text unit word formation based on the content in the text to be analyzed; if so, extracting the last text unit in the text to be analyzed and taking it as the text unit to be processed; and editing the target text according to the intention classification result and the text unit to be processed. In this scheme, the semantic recognition model classifies the intention of the user's voice input to obtain the corresponding text to be analyzed and intention classification result. If the format of the text to be analyzed is determined to be the format of homophonic text unit word formation, the last text unit in the text to be analyzed is extracted and taken as the text unit to be processed, which is the homophonic text unit that needs to be processed. The target text is then edited according to the intention classification result and the text unit to be processed, so as to assist visually impaired people in accurately inputting homophonic text units and improve the user experience.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a flowchart of a method for processing text units according to an embodiment of the present invention;
FIG. 2 is a flowchart for determining a format of a text to be analyzed according to an embodiment of the present invention;
FIG. 3 is a block diagram of a system for processing text units according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In this application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
As known from the background art, homophonic characters or homophonic words are often misrecognized during voice interaction, and it is very difficult for visually impaired people to correct them, so a method capable of assisting visually impaired people in accurately inputting homophonic characters or homophonic words is needed.
The embodiments of the present invention provide a method and a system for processing a text unit, in which a semantic recognition model classifies the intention of the user's voice input to obtain the corresponding text to be analyzed and intention classification result. If the format of the text to be analyzed is determined to be the format of homophonic text unit word formation, the last text unit in the text to be analyzed is extracted and taken as the text unit to be processed, which is the homophonic text unit that needs to be processed. The target text is then edited according to the intention classification result and the text unit to be processed, so as to assist visually impaired people in accurately inputting homophonic text units and further improve the user experience.
It can be understood that, when a visually impaired person uses voice interaction and the read-out unit is set to a single character, each character is read out as part of a word, so that the user knows exactly which character it is. For example, the character "阴" (yin) is read out as "阴天的阴" ("the yin of cloudy day"), so that the user learns that the character is the "阴" of "阴天" (cloudy day).
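The read-out convention described above can be illustrated with a short sketch. It is an assumption-laden illustration rather than anything specified in the text: the function names and the `example_words` mapping (character to an example word containing it) are hypothetical, and the connecting character is assumed to be "的".

```python
DESIGNATED_CHAR = "的"  # assumption: character joining the word and the character


def describe_character(char, example_words):
    """Build the spoken description 'word + 的 + char' for a single character.

    `example_words` is a hypothetical mapping from a character to a word that
    contains it; if no suitable word is known, the bare character is returned.
    """
    word = example_words.get(char)
    if not word or char not in word:
        return char
    return f"{word}{DESIGNATED_CHAR}{char}"


def read_out(text, example_words):
    """Describe every character of `text` in turn."""
    return [describe_character(c, example_words) for c in text]
```

With the mapping `{"阴": "阴天"}`, `describe_character("阴", ...)` yields "阴天的阴", which is the same phrase shape the user later speaks when entering a homophonic character.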
It should be noted that the method for processing a text unit provided in the embodiments of the present invention is specifically applied to processing homophonic characters or homophonic words; similarly, it may also be applied to processing similar-sounding (near-homophonic) characters or words. The details of the method are described in the following embodiments.
Referring to fig. 1, a flowchart of a method for processing a text unit according to an embodiment of the present invention is shown, where the method includes:
Step S101: performing intention classification, by using a pre-trained semantic recognition model, on the voice content with which the user edits the target text, to obtain a text to be analyzed and an intention classification result.
It should be noted that the intention classification result is a text input intention, a replacement intention, an insertion intention, or a deletion intention.
In the process of implementing step S101, when the user edits the target text by voice, the voice content of the user is acquired. The pre-trained semantic recognition model then performs intention classification on this voice content to obtain an intention classification result.
That is, the semantic recognition model can be used to perform intent recognition on the voice content and classify the recognition result to obtain an intent classification result corresponding to the voice content.
It is understood that the intention classification result corresponding to the voice content of the user is a text input intention, a replacement intention, an insertion intention (which can be divided into forward insertion and backward insertion), or a deletion intention. The text input intention means inputting the voice content into the target text; the replacement intention means replacing a certain text unit (namely a character or a word) in the target text with a certain text unit in the voice content; the insertion intention means inserting a certain text unit in the voice content at a certain position in the target text; the deletion intention means deleting certain text units in the target text as instructed by the voice content.
It can be understood that a user edits a target text by voice for one of two purposes: one is to input the voice content into the target text; the other is to modify the target text with the voice content (the voice content can then be regarded as a modification instruction). For example, if the voice content is "change sunny day to cloudy day", "sunny day" in the target text is replaced by "cloudy day".
After the intention classification result of the voice content of the target text edited by the user is obtained, if the intention classification result is a text input intention, the voice content is used as a text to be analyzed.
If the intention classification result is a replacement intention, an insertion intention or a deletion intention, extracting key information in the voice content by using a semantic recognition model, and taking the extracted key information as a text to be analyzed.
It can be understood that, in the process of extracting the key information in the voice content by using the semantic recognition model, a sequence labeling algorithm is first used to label each part of the voice content (each part receives a corresponding label), and the labels are then combined with the intention classification result to extract the key information in the voice content.
For example: when the intention classification result of the voice content is a deletion intention, the extracted key information is the object to be deleted (its label is a deletion label); when the intention classification result is a replacement intention, the extracted key information is the replaced object (its label is a replaced label) and the replacement object (its label is a replacement label), that is, the replaced object in the target text is replaced by the replacement object; when the intention classification result is an insertion intention, the extracted key information is the positioning word used as a reference (its label is a positioning label) and the target word to be inserted (its label is an insertion label).
In combination with the above example, in some specific embodiments, the aforementioned "taking the extracted key information as the text to be analyzed" mainly includes three cases.
In the first case: if the intention classification result of the voice content is a replacement intention, extracting a replacement text unit and a replaced text unit in the voice content by using the semantic recognition model, taking a phrase containing the replacement text unit in the voice content as a first text to be analyzed, and taking a phrase containing the replaced text unit in the voice content as a second text to be analyzed.
For example: assuming that the voice content is "change the yin of cloudy day to the yin of music", the extracted replacement text unit is "音" (yin, "sound") and the replaced text unit is "阴" (yin, "overcast"); the phrase "音乐的音" ("the yin of music") is taken as the first text to be analyzed, and the phrase "阴天的阴" ("the yin of cloudy day") is taken as the second text to be analyzed.
It should be noted that the text unit includes at least one continuous Chinese character, that is, a text unit may represent either a character or a word.
In the second case: if the intention classification result of the voice content of the target text edited by the user is the insertion intention, extracting a positioning text unit and a text unit to be inserted in the voice content by using a semantic recognition model, and taking a phrase containing the positioning text unit and the text unit to be inserted in the voice content as a third text to be analyzed.
In the third case: if the intention classification result of the voice content of the target text edited by the user is deletion intention, extracting a text unit to be deleted in the voice content by using a semantic recognition model, and taking a phrase containing the text unit to be deleted in the voice content as a fourth text to be analyzed.
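How the text to be analyzed is chosen for each intention can be sketched as below. The semantic recognition model and its sequence labeler are not reproduced here; the sketch assumes their output is already available as `(phrase, label)` pairs. The `Intent` enum, the label strings and the dictionary keys are all hypothetical names introduced for illustration.

```python
from enum import Enum


class Intent(Enum):
    """Hypothetical names for the four intention classification results."""
    INPUT = "input"
    REPLACE = "replace"
    INSERT = "insert"
    DELETE = "delete"


def texts_to_analyze(intent, voice_content, labeled_phrases=()):
    """Select the text(s) to be analyzed for a given intention.

    labeled_phrases -- assumed sequence-labeling output: (phrase, label) pairs,
                       with hypothetical labels "replacing", "replaced",
                       "insertion" and "deletion".
    """
    if intent is Intent.INPUT:
        # Text input intention: the whole voice content is the text to be analyzed.
        return {"text": voice_content}
    by_label = {label: phrase for phrase, label in labeled_phrases}
    if intent is Intent.REPLACE:
        return {"first": by_label["replacing"], "second": by_label["replaced"]}
    if intent is Intent.INSERT:
        # Phrase containing both the positioning unit and the unit to insert.
        return {"third": by_label["insertion"]}
    if intent is Intent.DELETE:
        return {"fourth": by_label["deletion"]}
    raise ValueError(f"unknown intent: {intent!r}")
```

Keeping the labeler output as plain pairs means the selection logic can be exercised without any trained model.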
Step S102: determining, based on the content in the text to be analyzed, whether the format of the text to be analyzed is the format of homophonic text unit word formation. If so, step S103 is executed; if not, step S105 is executed.
It should be noted that the format of homophonic text unit word formation is specifically: "word formed with the homophonic text unit" + "的" ("of") + "homophonic text unit".
With this format, it can be judged whether a given phrase is in the format of homophonic text unit word formation. For example, the phrase "阴天的阴" ("the yin of cloudy day") is in this format.
In the process of implementing step S102, the penultimate character in the text to be analyzed is compared with the designated character (the designated character may be "的"). If the penultimate character in the text to be analyzed is the designated character, whether the format of the text to be analyzed is the format of homophonic text unit word formation is judged from the text unit before the penultimate character and the last text unit in the text to be analyzed.
If the format of the text to be analyzed is determined to be the format of homophonic text unit word formation, step S103 is executed; otherwise, step S105 is executed.
As can be seen from step S101, the text to be analyzed differs according to the intention classification result of the voice content, so the text whose format is checked differs as well.
If the intention classification result is a text input intention, it is judged whether the format of the text to be analyzed (namely the voice content) is the format of homophonic text unit word formation.
If the intention classification result is a replacement intention, it is judged whether the formats of the first text to be analyzed and the second text to be analyzed are the format of homophonic text unit word formation.
If the intention classification result is an insertion intention, it is judged whether the format of the third text to be analyzed is the format of homophonic text unit word formation.
If the intention classification result is a deletion intention, it is judged whether the format of the fourth text to be analyzed is the format of homophonic text unit word formation.
Step S103: extracting the last text unit in the text to be analyzed and taking it as the text unit to be processed, and then executing step S104.
It should be noted that the text unit includes at least one continuous Chinese character.
In the process of implementing step S103, if the format of the text to be analyzed is determined to be the format of homophonic text unit word formation, the last text unit in the text to be analyzed is extracted and taken as the text unit to be processed, which is the homophonic text unit used for editing the target text, and step S104 is executed.
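A minimal sketch of this extraction, together with the fallback to the whole text when the format does not match, is given below. It assumes that the designated character is "的" and that the result of the format check from step S102 is passed in as a boolean; the function names are hypothetical.

```python
def extract_last_unit(text, designated_char="的"):
    """Return the text unit after the last designated character.

    Assumption: `text` has already been found to be in the format
    word + designated character + text unit; if the designated character
    is absent, the text is returned unchanged.
    """
    _, sep, tail = text.rpartition(designated_char)
    return tail if sep else text


def payload_for_edit(text, matches_format):
    """Choose what is used to edit the target text.

    matches_format -- result of the format check in step S102 (supplied by the caller).
    If the format matches, the last text unit is used (step S103);
    otherwise the whole text to be analyzed is used (step S105).
    """
    return extract_last_unit(text) if matches_format else text
```

Separating the format decision from the extraction keeps each step testable on its own.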
It can be understood from the content in the above step S101 that, when determining the text to be analyzed, the determined text to be analyzed is different according to the intention classification result of the user for editing the voice content of the target text, and the details are as follows:
and if the intention classification result is a text input intention, taking the voice content as a text to be analyzed, and extracting the last text unit in the text to be analyzed and taking the last text unit as a text unit to be processed.
It can be understood that, in the case that the format of the text to be analyzed is determined to be the format of the homophone text unit word group, the last text unit of the text to be analyzed is the text unit to be input (the text unit of the target text to be input). For example: for the speech content of "mobile phone of smart phone" (i.e. the text to be analyzed), the last text unit in the text to be analyzed is the word of the text unit to be input, i.e. the "mobile phone".
If the intention classification result is a replacement intention, the phrase containing the replacement text unit in the voice content is taken as the first text to be analyzed, and the phrase containing the replaced text unit in the voice content is taken as the second text to be analyzed. The last text unit in the first text to be analyzed is then extracted as a text unit to be processed, and the last text unit in the second text to be analyzed is likewise extracted as a text unit to be processed.
It can be understood that, when the format of the text to be analyzed is determined to be the homophonic text unit word-formation format, the last text unit in the first text to be analyzed is the replacement text unit, and the last text unit in the second text to be analyzed is the replaced text unit. For example: for the voice content "change the shade of cloudy day to the sound of music", the first text to be analyzed is "the sound of music" and the second text to be analyzed is "the shade of cloudy day"; the last text unit in the first text to be analyzed is the replacement text unit "sound", and the last text unit in the second text to be analyzed is the replaced text unit "shade".
If the intention classification result is an insertion intention, the phrase containing the positioning text unit and the text unit to be inserted in the voice content is taken as the third text to be analyzed, and the last text unit in the third text to be analyzed is extracted and taken as the text unit to be processed.
It can be understood that, when the format of the text to be analyzed is determined to be the homophonic text unit word-formation format, the last text unit in the third text to be analyzed is the text unit to be inserted. For example: for the third text to be analyzed "bright day inserted behind us", the last text unit in the third text to be analyzed is the text unit to be inserted, "bright".
If the intention classification result is the deletion intention, the phrase containing the text unit to be deleted in the voice content is taken as the fourth text to be analyzed, and the last text unit in the fourth text to be analyzed is extracted and taken as the text unit to be processed.
It can be understood that, when the format of the text to be analyzed is determined to be the homophonic text unit word-formation format, the last text unit in the fourth text to be analyzed is the text unit to be deleted. For example: for the fourth text to be analyzed "apple to be deleted", the last text unit in the fourth text to be analyzed is the text unit to be deleted, "apple".
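The extraction in step S103 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the patented implementation: the designated character is assumed to be the particle "的" (rendered as "of" in the translated examples), the Chinese strings are back-translations of those examples, and the names `Intent`, `last_text_unit`, and `units_to_process` are hypothetical. The texts are assumed to have already passed the format check.

```python
from enum import Enum
from typing import Dict, List

# Assumption: the designated character is the particle rendered as "of".
DESIGNATED_CHAR = "的"


class Intent(Enum):
    INPUT = "input"
    REPLACE = "replace"
    INSERT = "insert"
    DELETE = "delete"


# Role played by the last text unit of each text to be analyzed, per intent.
_ROLES: Dict[Intent, List[str]] = {
    Intent.INPUT: ["to_input"],
    Intent.REPLACE: ["replacement", "replaced"],  # first text, second text
    Intent.INSERT: ["to_insert"],                 # third text
    Intent.DELETE: ["to_delete"],                 # fourth text
}


def last_text_unit(text: str) -> str:
    """Return the text unit that follows the last designated character."""
    _, sep, unit = text.rpartition(DESIGNATED_CHAR)
    if not sep or not unit:
        raise ValueError(f"no text unit after {DESIGNATED_CHAR!r} in {text!r}")
    return unit


def units_to_process(intent: Intent, texts: List[str]) -> Dict[str, str]:
    """Map each text to be analyzed to its text unit to be processed."""
    roles = _ROLES[intent]
    if len(texts) != len(roles):
        raise ValueError(f"{intent} expects {len(roles)} text(s) to be analyzed")
    return {role: last_text_unit(t) for role, t in zip(roles, texts)}
```

For a replacement intention, the two texts are passed in the order first text to be analyzed, second text to be analyzed, so the result distinguishes the replacement unit from the replaced unit.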
Step S104: edit the target text according to the intention classification result and the text unit to be processed.
In the specific implementation of step S104, after the format of the text to be analyzed is determined to be the homophonic text unit word-formation format and the text unit to be processed is extracted, the target text is edited according to the intention classification result of the voice content the user uses to edit the target text, in combination with the extracted text unit to be processed. Depending on the intention classification result, the target text is edited in one of the following four editing modes, which are described in detail below.
The first editing mode: if the intention classification result is a text input intention, then, as described above, the voice content is taken as the text to be analyzed, and the text unit to be processed of the text to be analyzed is input into the target text. That is, if the intention classification result of the voice content is a text input intention, the last text unit (i.e., the text unit to be processed) in the determined text to be analyzed is input into the target text.
It should be noted that the above step can also be viewed as follows: the text to be analyzed is replaced with the text unit to be processed, and the text unit to be processed is input into the target text.
For example: if the intention classification result is the text input intention, the voice content (namely the text to be analyzed) is "the sound of music", and the text unit to be processed is "sound", then "sound" is input into the target text.
The second editing mode: if the intention classification result is a replacement intention, then, as described above, a first text to be analyzed and a second text to be analyzed can be determined. The text unit to be processed (i.e., the last text unit) of the first text to be analyzed is the replacement text unit, the text unit to be processed (i.e., the last text unit) of the second text to be analyzed is the replaced text unit, and the replaced text unit in the target text is replaced with the replacement text unit.
For example: if the intention classification result is a replacement intention and the voice content is "change the shade of cloudy day to the sound of music", the determined first text to be analyzed is "the sound of music" and the second text to be analyzed is "the shade of cloudy day". The text unit to be processed of the first text to be analyzed is "sound" (the replacement text unit), and the text unit to be processed of the second text to be analyzed is "shade" (the replaced text unit), so "shade" in the target text is changed to "sound".
The third editing mode: if the intention classification result is an insertion intention, then, as described above, a third text to be analyzed can be determined. The text unit to be processed (i.e., the last text unit) of the third text to be analyzed is the text unit to be inserted, and the text unit to be inserted is inserted at the position of the positioning text unit in the target text.
For example: if the intention classification result is an insertion intention and the third text to be analyzed is "teaching of a teacher inserted behind us", the text unit to be processed of the third text to be analyzed is "teaching" (the text unit to be inserted) and the positioning text unit is "us", so "teaching" is inserted after "us" in the target text.
The fourth editing mode: if the intention classification result is the deletion intention, then, as described above, a fourth text to be analyzed can be determined. The text unit to be processed (i.e., the last text unit) of the fourth text to be analyzed is the text unit to be deleted, and the text unit to be deleted is deleted from the target text.
For example: if the intention classification result is the deletion intention and the fourth text to be analyzed is "negative for deleting cloudy days", the text unit to be processed of the fourth text to be analyzed is "negative" (the text unit to be deleted), and "negative" is deleted from the target text.
It should be noted that the examples in the four editing modes above take a single character as the text unit. The text unit may equally be a word; in that case the target text is edited in the same way as described above, and the details are not repeated here.
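The four editing modes can be illustrated with a short sketch. It is not the patented implementation and rests on several assumptions that the description leaves open: input appends to the end of the target text, replacement and deletion act on the first occurrence only, and insertion places the unit immediately after the positioning text unit. The function name `apply_edit` and its parameters are hypothetical, and the Chinese strings are illustrative.

```python
def apply_edit(target: str, intent: str, unit: str, *,
               replaced: str = "", anchor: str = "") -> str:
    """Edit `target` with the text unit to be processed (`unit`).

    intent   -- "input", "replace", "insert" or "delete"
    replaced -- replaced text unit (replacement intention only)
    anchor   -- positioning text unit (insertion intention only)
    """
    if not unit:
        raise ValueError("the text unit to be processed must not be empty")
    if intent == "input":
        # Assumption: new text is appended at the end of the target text.
        return target + unit
    if intent == "replace":
        if not replaced or replaced not in target:
            raise ValueError("replaced text unit not found in target text")
        # Assumption: only the first occurrence is replaced.
        return target.replace(replaced, unit, 1)
    if intent == "insert":
        idx = target.find(anchor) if anchor else -1
        if idx < 0:
            raise ValueError("positioning text unit not found in target text")
        cut = idx + len(anchor)  # insert right after the positioning unit
        return target[:cut] + unit + target[cut:]
    if intent == "delete":
        if unit not in target:
            raise ValueError("text unit to be deleted not found in target text")
        # Assumption: only the first occurrence is deleted.
        return target.replace(unit, "", 1)
    raise ValueError(f"unknown intent: {intent!r}")
```

A real editor would presumably need a rule for choosing among several occurrences of the same character; the first-occurrence policy here is only a placeholder.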
Step S105: edit the target text according to the intention classification result and the text to be analyzed.
In the specific implementation of step S105, if it is determined that the format of the text to be analyzed is not the homophonic text unit word-formation format, the target text is edited according to the intention classification result of the voice content the user uses to edit the target text, in combination with the text to be analyzed.
If the intention classification result is a text input intention, the text to be analyzed (namely the voice content at this time) is input into the target text.
If the intention classification result is a replacement intention, then, as described above, the first text to be analyzed and the second text to be analyzed can be determined, the replacement text unit in the first text to be analyzed and the replaced text unit in the second text to be analyzed can be determined, and the replaced text unit in the target text can be replaced with the replacement text unit.
If the intention classification result is an insertion intention, the third text to be analyzed can be determined, the positioning text unit and the text unit to be inserted in the third text to be analyzed can be determined, and the text unit to be inserted can be inserted at the position of the positioning text unit in the target text.
If the intention classification result is the deletion intention, the fourth text to be analyzed can be determined, the text unit to be deleted in the fourth text to be analyzed can be determined, and the text unit to be deleted in the target text can be deleted.
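The difference between steps S104 and S105 is easiest to see for the text input intention. The sketch below is illustrative only: `handle_text_input` is a hypothetical name, the format check is passed in as a predicate rather than defined here, the last text unit is assumed to be a single character, and appending to the end of the target text is an assumption.

```python
from typing import Callable


def handle_text_input(target: str, speech: str,
                      is_homophone_format: Callable[[str], bool]) -> str:
    """Text input intention: choose between step S104 and step S105."""
    if is_homophone_format(speech):
        # Steps S103/S104: only the last text unit is entered
        # (assumed here to be a single character).
        return target + speech[-1]
    # Step S105: the whole voice content is entered as dictated.
    return target + speech
```

With a predicate that accepts "音乐的音", only "音" is entered; an ordinary utterance such as "听音乐" is entered unchanged.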
In the embodiment of the invention, the semantic recognition model is used to perform intention classification on the voice input content of the user to obtain the corresponding text to be analyzed and the intention classification result. If the format of the text to be analyzed is determined to be the homophonic text unit word-formation format, the last text unit in the text to be analyzed is extracted and taken as the text unit to be processed, which is the homophonic text unit that needs to be processed. The target text is then edited according to the intention classification result and the text unit to be processed, which helps visually impaired users input homophonic text units accurately and improves the user experience.
Fig. 2 shows a flowchart for determining a format of a text to be analyzed according to an embodiment of the present invention, which includes:
Step S201: determine whether the penultimate character in the text to be analyzed is the designated character. If the penultimate character in the text to be analyzed is the designated character, step S202 is executed; if it is not the designated character, step S205 is executed.
In the process of implementing step S201 specifically, it is determined whether the penultimate character in the text to be analyzed is a designated character, if it is determined that the penultimate character is the designated character, step S202 is executed to continue the subsequent determination, and if it is determined that the penultimate character is not the designated character, step S205 is executed to determine that the format of the text to be analyzed is not the format of the homophonic text unit word group.
For example: with "of" as the designated character, it is determined whether the penultimate character in the text to be analyzed is "of". If it is, step S202 is performed; if it is not, step S205 is performed.
Step S202: judge whether the text unit before the penultimate character in the text to be analyzed is a word. If the text unit before the penultimate character in the text to be analyzed is a word, step S203 is executed; if it is not a word, step S205 is executed.
In the specific implementation of step S202, after the penultimate character in the text to be analyzed is determined to be the designated character, a pre-constructed Chinese word library is used to determine whether the text unit before the penultimate character in the text to be analyzed is a word. If it is a word, step S203 is executed to continue the determination; if it is not a word, step S205 is executed to determine that the format of the text to be analyzed is not the homophonic text unit word-formation format.
Step S203: judge whether the text unit before the penultimate character in the text to be analyzed contains the last text unit in the text to be analyzed. If the text unit before the penultimate character contains the last text unit in the text to be analyzed, step S204 is performed; if it does not, step S205 is performed.
In the specific implementation of step S203, after the text unit before the penultimate character in the text to be analyzed is determined to be a word, it is determined whether that text unit contains the last text unit in the text to be analyzed. For example: it is judged whether the word before the penultimate character in the text to be analyzed contains the last character in the text to be analyzed.
If the text unit before the penultimate character contains the last text unit in the text to be analyzed, step S204 is executed to determine that the format of the text to be analyzed is the format of the homophonic text unit word formation, and if the text unit before the penultimate character does not contain the last text unit in the text to be analyzed, step S205 is executed to determine that the format of the text to be analyzed is not the format of the homophonic text unit word formation.
Step S204: determine that the format of the text to be analyzed is the homophonic text unit word-formation format.
Step S205: determining that the format of the text to be analyzed is not the format of the homophonic text unit word group.
In the embodiment of the invention, the penultimate character in the text to be analyzed is compared with the designated character, and the text unit before the penultimate character and the last text unit in the text to be analyzed are then considered together to judge whether the format of the text to be analyzed is the homophonic text unit word-formation format.
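Steps S201 to S205 amount to three successive checks, which can be sketched as below. This is an illustration under assumptions rather than the patented implementation: the designated character is assumed to be "的", the pre-constructed Chinese word library is stood in for by a plain Python set, the last text unit is assumed to be a single character, and the function name is hypothetical.

```python
from typing import Set

# Assumption: the designated character is the particle rendered as "of".
DESIGNATED_CHAR = "的"


def is_homophone_word_format(text: str, lexicon: Set[str],
                             designated: str = DESIGNATED_CHAR) -> bool:
    """Return True if `text` has the form <word> + designated + <character>,
    where <character> occurs inside <word>."""
    if len(text) < 3:
        return False  # too short to hold a word, the particle and a character
    # S201: the penultimate character must be the designated character.
    if text[-2] != designated:
        return False  # S205
    word, last_unit = text[:-2], text[-1]
    # S202: the text before the penultimate character must be a known word.
    if word not in lexicon:
        return False  # S205
    # S203: that word must contain the last text unit.
    # True corresponds to S204, False to S205.
    return last_unit in word
```

For instance, with a lexicon containing "音乐", the text "音乐的音" passes all three checks, while "音乐的阴" fails step S203 because "阴" does not occur in "音乐".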
Corresponding to the method for processing a text unit provided in the foregoing embodiment of the present invention, referring to fig. 3, an embodiment of the present invention further provides a block diagram of a system for processing a text unit, where the system includes: a classification unit 301, a determination unit 302, an extraction unit 303, and a processing unit 304;
The classification unit 301 is configured to perform intention classification, by using a semantic recognition model obtained through pre-training, on the voice content the user uses to edit the target text, so as to obtain a text to be analyzed and an intention classification result, where the intention classification result is a text input intention, a replacement intention, an insertion intention, or a deletion intention.
The determining unit 302 is configured to determine whether the format of the text to be analyzed is a format of homophone text unit word formation based on the content in the text to be analyzed.
The extracting unit 303 is configured to, if the format of the text to be analyzed is a homophonic text unit word formation format, extract a last text unit in the text to be analyzed and use the last text unit as a text unit to be processed, where the text unit includes at least one continuous Chinese character.
The processing unit 304 is configured to edit the target text according to the intention classification result and the text unit to be processed.
Preferably, the processing unit 304 is further configured to: if the format of the text to be analyzed is not the homophonic text unit word-formation format, edit the target text according to the intention classification result and the text to be analyzed.
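One way to picture how units 301 to 304 cooperate is as four pluggable callables wired into a single pipeline. The sketch below is purely illustrative: the class name `TextUnitSystem`, the callable signatures, and the rule that every text to be analyzed must pass the format check before extraction are assumptions, and the semantic recognition model is represented only by whatever function is supplied as `classify`.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class TextUnitSystem:
    # Unit 301: voice content -> (intention, texts to be analyzed)
    classify: Callable[[str], Tuple[str, List[str]]]
    # Unit 302: text to be analyzed -> is it in the word-formation format?
    is_homophone_format: Callable[[str], bool]
    # Unit 303: text to be analyzed -> text unit to be processed
    extract: Callable[[str], str]
    # Unit 304: (target text, intention, payloads) -> edited target text
    process: Callable[[str, str, List[str]], str]

    def run(self, target: str, speech: str) -> str:
        intent, texts = self.classify(speech)
        # Assumption: extraction applies only if every text passes the check.
        if texts and all(self.is_homophone_format(t) for t in texts):
            payloads = [self.extract(t) for t in texts]
        else:
            # Fallback: edit with the texts to be analyzed themselves.
            payloads = texts
        return self.process(target, intent, payloads)
```

Keeping the units as injected callables mirrors the remark in the description that the units may or may not be physically separate.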
Preferably, in conjunction with the content shown in fig. 3, the classification unit 301 includes a classification module, a first processing module, and a second processing module. The execution principle of each module is as follows:
The classification module is configured to perform intention classification, by using the semantic recognition model obtained through pre-training, on the voice content the user uses to edit the target text, so as to obtain an intention classification result.
The first processing module is configured to take the voice content as the text to be analyzed if the intention classification result is the text input intention.
The second processing module is configured to extract key information from the voice content by using the semantic recognition model and take the key information as the text to be analyzed if the intention classification result is a replacement intention, an insertion intention, or a deletion intention.
In a specific implementation, the second processing module is specifically configured to: if the intention classification result is a replacing intention, extracting a replacing text unit and a replaced text unit in the voice content by using a semantic recognition model, taking a phrase containing the replacing text unit in the voice content as a first text to be analyzed, and taking a phrase containing the replaced text unit in the voice content as a second text to be analyzed; if the intention classification result is an insertion intention, extracting a positioning text unit and a text unit to be inserted in the voice content by using a semantic recognition model, and taking a phrase containing the positioning text unit and the text unit to be inserted in the voice content as a third text to be analyzed; and if the intention classification result is the deletion intention, extracting a text unit to be deleted in the voice content by using a semantic recognition model, and taking a phrase containing the text unit to be deleted in the voice content as a fourth text to be analyzed.
Correspondingly, the processing unit 304, configured to edit the target text according to the intention classification result and the text unit to be processed, is specifically configured to: if the intention classification result is a text input intention, input the text unit to be processed into the target text; if the intention classification result is a replacement intention, replace the replaced text unit in the target text with the replacement text unit, wherein the text unit to be processed of the first text to be analyzed is the replacement text unit, and the text unit to be processed of the second text to be analyzed is the replaced text unit; if the intention classification result is an insertion intention, insert the text unit to be inserted at the position of the positioning text unit in the target text, wherein the text unit to be processed of the third text to be analyzed is the text unit to be inserted; and if the intention classification result is the deletion intention, delete the text unit to be deleted from the target text, wherein the text unit to be processed of the fourth text to be analyzed is the text unit to be deleted.
In the embodiment of the invention, the semantic recognition model is used to perform intention classification on the voice input content of the user to obtain the corresponding text to be analyzed and the intention classification result. If the format of the text to be analyzed is determined to be the homophonic text unit word-formation format, the last text unit in the text to be analyzed is extracted and taken as the text unit to be processed, which is the homophonic text unit that needs to be processed. The target text is then edited according to the intention classification result and the text unit to be processed, which helps visually impaired users input homophonic text units accurately and improves the user experience.
Preferably, in conjunction with what is shown in fig. 3, the determining unit 302 includes a first determining module, a first judging module, a second judging module, and a second determining module. The execution principle of each module is as follows:
The first determining module is configured to determine whether the penultimate character in the text to be analyzed is the designated character.
The first judging module is configured to judge, if the penultimate character in the text to be analyzed is the designated character, whether the text unit before the penultimate character in the text to be analyzed is a word, where a text unit includes at least one continuous Chinese character.
The second judging module is configured to judge, if the text unit before the penultimate character is a word, whether the text unit before the penultimate character contains the last text unit in the text to be analyzed.
The second determining module is configured to determine that the format of the text to be analyzed is the homophonic text unit word-formation format if the text unit before the penultimate character contains the last text unit in the text to be analyzed.
In the embodiment of the invention, the penultimate character in the text to be analyzed is compared with the designated character, and the text unit before the penultimate character and the last text unit in the text to be analyzed are then considered together to judge whether the format of the text to be analyzed is the homophonic text unit word-formation format.
In summary, embodiments of the present invention provide a method and a system for processing a text unit, which use a semantic recognition model to perform intention classification on the voice input content of a user to obtain the corresponding text to be analyzed and the intention classification result. If the format of the text to be analyzed is determined to be the homophonic text unit word-formation format, the last text unit in the text to be analyzed is extracted and taken as the text unit to be processed, which is the homophonic text unit that needs to be processed. The target text is then edited according to the intention classification result and the text unit to be processed, which helps visually impaired users input homophonic text units accurately and improves the user experience.
The embodiments in the present specification are described in a progressive manner; the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiments are substantially similar to the method embodiments and are therefore described relatively briefly; for related points, reference may be made to the corresponding descriptions of the method embodiments. The system embodiments described above are only illustrative: the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for processing a text unit, characterized in that the method comprises:
performing intention classification, by using a semantic recognition model obtained through pre-training, on the voice content a user uses to edit a target text, to obtain a text to be analyzed and an intention classification result, the intention classification result being a text input intention, a replacement intention, an insertion intention, or a deletion intention;
determining, based on the content of the text to be analyzed, whether the format of the text to be analyzed is the homophonic text unit word-formation format;
if the format of the text to be analyzed is the homophonic text unit word-formation format, extracting the last text unit in the text to be analyzed and taking it as a text unit to be processed, the text unit comprising at least one continuous Chinese character;
editing the target text according to the intention classification result and the text unit to be processed.

2. The method according to claim 1, characterized in that performing intention classification, by using the semantic recognition model obtained through pre-training, on the voice content the user uses to edit the target text, to obtain the text to be analyzed and the intention classification result, comprises:
performing intention classification, by using the semantic recognition model obtained through pre-training, on the voice content the user uses to edit the target text, to obtain the intention classification result;
if the intention classification result is a text input intention, taking the voice content as the text to be analyzed;
if the intention classification result is a replacement intention, an insertion intention, or a deletion intention, extracting key information from the voice content by using the semantic recognition model, and taking the key information as the text to be analyzed.

3. The method according to claim 2, characterized in that, if the intention classification result is a replacement intention, an insertion intention, or a deletion intention, extracting key information from the voice content by using the semantic recognition model and taking the key information as the text to be analyzed comprises:
if the intention classification result is a replacement intention, extracting a replacement text unit and a replaced text unit from the voice content by using the semantic recognition model, taking the phrase containing the replacement text unit in the voice content as a first text to be analyzed, and taking the phrase containing the replaced text unit in the voice content as a second text to be analyzed;
if the intention classification result is an insertion intention, extracting a positioning text unit and a text unit to be inserted from the voice content by using the semantic recognition model, and taking the phrase containing the positioning text unit and the text unit to be inserted in the voice content as a third text to be analyzed;
if the intention classification result is a deletion intention, extracting a text unit to be deleted from the voice content by using the semantic recognition model, and taking the phrase containing the text unit to be deleted in the voice content as a fourth text to be analyzed.

4. The method according to claim 3, characterized in that editing the target text according to the intention classification result and the text unit to be processed comprises:
if the intention classification result is a text input intention, inputting the text unit to be processed into the target text;
if the intention classification result is a replacement intention, replacing the replaced text unit in the target text with the replacement text unit, wherein the text unit to be processed of the first text to be analyzed is the replacement text unit, and the text unit to be processed of the second text to be analyzed is the replaced text unit;
if the intention classification result is an insertion intention, inserting the text unit to be inserted at the positioning text unit in the target text, wherein the text unit to be processed of the third text to be analyzed is the text unit to be inserted;
if the intention classification result is a deletion intention, deleting the text unit to be deleted from the target text, wherein the text unit to be processed of the fourth text to be analyzed is the text unit to be deleted.

5. The method according to claim 1, characterized in that determining, based on the content of the text to be analyzed, whether the format of the text to be analyzed is the homophonic text unit word-formation format comprises:
determining whether the penultimate character in the text to be analyzed is a designated character;
if the penultimate character in the text to be analyzed is the designated character, judging whether the text unit before the penultimate character in the text to be analyzed is a word, the text unit comprising at least one continuous Chinese character;
if the text unit before the penultimate character is a word, judging whether the text unit before the penultimate character contains the last text unit in the text to be analyzed;
if the text unit before the penultimate character contains the last text unit in the text to be analyzed, determining that the format of the text to be analyzed is the homophonic text unit word-formation format.

6. The method according to claim 1, characterized by further comprising:
if the format of the text to be analyzed is not the homophonic text unit word-formation format, editing the target text according to the intention classification result and the text to be analyzed.

7. A system for processing a text unit, characterized in that the system comprises:
a classification unit, configured to perform intention classification, by using a semantic recognition model obtained through pre-training, on the voice content a user uses to edit a target text, to obtain a text to be analyzed and an intention classification result, the intention classification result being a text input intention, a replacement intention, an insertion intention, or a deletion intention;
a determining unit, configured to determine, based on the content of the text to be analyzed, whether the format of the text to be analyzed is the homophonic text unit word-formation format;
an extraction unit, configured to extract, if the format of the text to be analyzed is the homophonic text unit word-formation format, the last text unit in the text to be analyzed and take it as a text unit to be processed, the text unit comprising at least one continuous Chinese character;
a processing unit, configured to edit the target text according to the intention classification result and the text unit to be processed.

8. The system according to claim 7, characterized in that the classification unit comprises:
a classification module, configured to perform intention classification, by using the semantic recognition model obtained through pre-training, on the voice content the user uses to edit the target text, to obtain the intention classification result;
a first processing module, configured to take the voice content as the text to be analyzed if the intention classification result is a text input intention;
a second processing module, configured to extract key information from the voice content by using the semantic recognition model and take the key information as the text to be analyzed if the intention classification result is a replacement intention, an insertion intention, or a deletion intention.

9. The system according to claim 8, characterized in that the second processing module is specifically configured to: if the intention classification result is a replacement intention, extract a replacement text unit and a replaced text unit from the voice content by using the semantic recognition model, take the phrase containing the replacement text unit in the voice content as a first text to be analyzed, and take the phrase containing the replaced text unit in the voice content as a second text to be analyzed;
if the intention classification result is an insertion intention, extract a positioning text unit and a text unit to be inserted from the voice content by using the semantic recognition model, and take the phrase containing the positioning text unit and the text unit to be inserted in the voice content as a third text to be analyzed;
if the intention classification result is a deletion intention, extract a text unit to be deleted from the voice content by using the semantic recognition model, and take the phrase containing the text unit to be deleted in the voice content as a fourth text to be analyzed.

10. The system according to claim 9, characterized in that the processing unit, configured to edit the target text according to the intention classification result and the text unit to be processed, is specifically configured to: if the intention classification result is a text input intention, input the text unit to be processed into the target text;
The system according to claim 9, wherein the processing unit for editing the target text according to the intent classification result and the to-be-processed text unit is specifically used for: if the The intent classification result is a text input intent, and the to-be-processed text unit is input into the target text; 若所述意图分类结果为替换意图,将所述目标文本中的所述被替换文本单元替换为所述替换文本单元,其中,所述第一待分析文本的所述待处理文本单元为所述替换文本单元,所述第二待分析文本的所述待处理文本单元为所述被替换文本单元;If the intent classification result is a replacement intent, replace the replaced text unit in the target text with the replacement text unit, wherein the to-be-processed text unit of the first to-be-analyzed text is the a replacement text unit, the to-be-processed text unit of the second to-be-analyzed text is the replaced text unit; 若所述意图分类结果为插入意图,在所述目标文本中的所述定位文本单元处插入所述待插入文本单元,其中,所述第三待分析文本的所述待处理文本单元为所述待插入文本单元;If the intent classification result is an insertion intent, insert the to-be-inserted text unit at the positioned text unit in the target text, wherein the to-be-processed text unit of the third to-be-analyzed text is the The text cell to be inserted; 若所述意图分类结果为删除意图,将所述目标文本中的所述待删除文本单元删除,其中,所述第四待分析文本的所述待处理文本单元为所述待删除文本单元。If the intent classification result is a deletion intent, delete the to-be-deleted text unit in the target text, wherein the to-be-processed text unit of the fourth to-be-analyzed text is the to-be-deleted text unit.
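The format check of claim 5, the fallback of claim 6, and the four editing branches of claim 10 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions made only for illustration: the "specified character" is taken to be "的", the "is a word" test is a lookup in a tiny hypothetical lexicon (`LEXICON`), insertion is placed immediately after the positioning text unit, and only the first occurrence of a unit in the target text is edited. The claims do not fix any of these choices, and all function names are hypothetical.

```python
import re
from typing import Optional

# Assumption: the claims only say "a specified character"; "的" is illustrative.
SPECIFIED_CHAR = "的"
# Hypothetical stand-in for a real dictionary or word-segmentation lookup.
LEXICON = {"天气", "今天", "晴朗"}
# One or more consecutive Chinese characters (CJK Unified Ideographs block).
_HAN = re.compile(r"[\u4e00-\u9fff]+")


def is_word(unit: str) -> bool:
    """Assumed word test: membership in the illustrative lexicon."""
    return unit in LEXICON


def extract_unit(text: str) -> Optional[str]:
    """Return the last text unit if `text` has the form <word><specified char><unit>,
    where the word contains the unit (e.g. "天气的气"); otherwise return None."""
    if len(text) < 3 or text[-2] != SPECIFIED_CHAR:
        return None                      # penultimate character is not the specified one
    head, last = text[:-2], text[-1]
    if not _HAN.fullmatch(head) or not _HAN.fullmatch(last):
        return None                      # both parts must consist of Chinese characters
    if not is_word(head) or last not in head:
        return None                      # head must be a word that contains the last unit
    return last


def resolve_unit(text: str) -> str:
    """Claim 6 fallback: use the text itself when it is not in the format."""
    return extract_unit(text) or text


def apply_edit(target: str, intent: str, units: dict) -> str:
    """Edit `target` according to the intent; `units` holds already-resolved text units."""
    if intent == "input":
        return target + units["text"]
    if intent == "replace":
        return target.replace(units["replaced"], units["replacement"], 1)
    if intent == "insert":
        i = target.find(units["anchor"])
        if i < 0:
            return target                # positioning unit not found: leave unchanged
        pos = i + len(units["anchor"])   # assumption: insert right after the anchor
        return target[:pos] + units["inserted"] + target[pos:]
    if intent == "delete":
        return target.replace(units["deleted"], "", 1)
    raise ValueError(f"unknown intent: {intent}")
```

For example, with a target text of "今天天起晴朗" and a replacement request whose two to-be-analyzed texts are "天气的气" and "起", `resolve_unit` yields "气" and "起", and `apply_edit(target, "replace", {"replaced": "起", "replacement": "气"})` corrects the homophone error.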
CN202110539425.XA 2021-05-18 2021-05-18 A method and system for processing text units Active CN113191157B (en)

Priority Applications (1)

Application Number Publication Number Priority Date Filing Date Title
CN202110539425.XA CN113191157B (en) 2021-05-18 2021-05-18 A method and system for processing text units

Publications (2)

Publication Number Publication Date
CN113191157A CN113191157A (en) 2021-07-30
CN113191157B CN113191157B (en) 2024-11-19

Family

ID=76982605

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
CN202110539425.XA Active CN113191157B (en) 2021-05-18 2021-05-18 A method and system for processing text units

Country Status (1)

Country Link
CN (1) CN113191157B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107315737A (en) * 2017-07-04 2017-11-03 北京奇艺世纪科技有限公司 A kind of semantic logic processing method and system
WO2020119496A1 (en) * 2018-12-14 2020-06-18 深圳壹账通智能科技有限公司 Communication method, device and equipment based on artificial intelligence and readable storage medium
CN111415656A (en) * 2019-01-04 2020-07-14 上海擎感智能科技有限公司 Voice semantic recognition method and device and vehicle
CN112242143A (en) * 2019-07-19 2021-01-19 北京字节跳动网络技术有限公司 Voice interaction method and device, terminal equipment and storage medium
CN112468665A (en) * 2020-11-05 2021-03-09 中国建设银行股份有限公司 Method, device, equipment and storage medium for generating conference summary

Also Published As

Publication number Publication date
CN113191157B (en) 2024-11-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant