Disclosure of Invention
Embodiments of the invention provide an information processing method, an information processing apparatus and an electronic device, which aim to solve the problem that existing ways of looking up characters are cumbersome to operate and inefficient.
In a first aspect, an embodiment of the present invention provides an information processing method, including:
acquiring target voice, wherein the target voice comprises pronunciation of a first word in a first text displayed on electronic equipment;
determining characters to be recognized in the first text according to the target voice, wherein the characters to be recognized are different from the first characters;
acquiring target information of the character to be recognized, wherein the target information comprises at least one of pronunciation and annotation.
Optionally, the target speech further includes pronunciation of a preset cue word;
determining the characters to be recognized in the first text according to the target voice, wherein the determining comprises the following steps:
identifying the target voice to obtain a second text corresponding to the target voice;
determining second characters in the second text except the preset prompt words;
determining a target word sentence matched with the second character in the first text;
and determining characters to be recognized in the target words and sentences, wherein the characters to be recognized are different from the second characters.
Optionally, the determining second words in the second text except for the preset cue word includes:
according to the preset cue word, the second text is segmented to obtain a third character in the second text before the preset cue word and a fourth character after the preset cue word, wherein the second character comprises the third character and the fourth character;
the determining the target words and phrases in the first text that match the second words includes:
determining a fifth word in the first text, which is matched with the third word, and a sixth word which is matched with the fourth word;
and determining a target word sentence comprising the fifth word and the sixth word from the first text.
Optionally, the determining a target word sentence including the fifth word and the sixth word from the first text includes:
under the condition that the number of the fifth words or the sixth words is larger than 1, determining target fifth words and target sixth words with the smallest position intervals in the first text, wherein the target fifth words are before the target sixth words;
and determining a target word and sentence in the first text by taking the target fifth character as a starting word and the target sixth character as an ending word.
Optionally, the determining the characters to be recognized in the target words and sentences includes:
and determining characters positioned between the target fifth character and the target sixth character in the target words and sentences as characters to be recognized.
Optionally, the determining, according to the target speech, a character to be recognized in the first text includes:
identifying the target voice to obtain a third text corresponding to the target voice;
determining a seventh word in the first text that matches the third text;
receiving a first input of a user;
responding to the first input, and determining the word number K of the character to be recognized, wherein K is a positive integer;
and determining K characters positioned after the seventh character in the first text as the characters to be recognized.
Optionally, the receiving a first input of the user includes:
receiving a tap input of a user on a screen of the electronic equipment;
the responding to the first input, and determining the word number K of the characters to be recognized includes:
and determining the number of taps K of the tap input as the word number of the characters to be recognized.
Optionally, the number of the seventh characters is L, and L is an integer greater than 1;
the determining that K characters located after the seventh character in the first text are the characters to be recognized includes:
determining K characters respectively positioned behind each seventh character in the first text as candidate characters to obtain L groups of candidate characters;
receiving a second input of the user;
and responding to the second input, determining a target candidate character from the L groups of candidate characters, and determining the target candidate character as the character to be recognized.
Optionally, before receiving the second input of the user, the method further includes:
identifying the L groups of candidate words;
the receiving of the second input of the user comprises:
and receiving selection input of the L groups of candidate characters from a user.
In a second aspect, an embodiment of the present invention further provides an information processing apparatus, including:
a first acquisition module, configured to acquire a target voice, wherein the target voice comprises pronunciation of a first character in a first text displayed on the electronic equipment;
the determining module is used for determining characters to be recognized in the first text according to the target voice, wherein the characters to be recognized are different from the first characters;
and the second acquisition module is used for acquiring target information of the character to be recognized, wherein the target information comprises at least one of pronunciation and annotation.
Optionally, the target speech further includes pronunciation of a preset cue word;
the determining module comprises:
the first recognition submodule is used for recognizing the target voice to obtain a second text corresponding to the target voice;
the first determining submodule is used for determining second characters except the preset prompt words in the second text;
the second determining submodule is used for determining a target word and sentence matched with the second word in the first text;
and the third determining submodule is used for determining characters to be recognized in the target words and sentences, wherein the characters to be recognized are different from the second characters.
Optionally, the first determining sub-module is configured to segment the second text according to the preset cue word to obtain a third word before the preset cue word and a fourth word after the preset cue word in the second text, where the second word includes the third word and the fourth word;
the second determination submodule includes:
a first determining unit, configured to determine a fifth word in the first text that matches the third word, and a sixth word that matches the fourth word;
a second determining unit, configured to determine a target word sentence including the fifth word and the sixth word from the first text.
Optionally, the second determining unit includes:
a first determining subunit, configured to determine, when the number of the fifth words or the sixth words is greater than 1, a target fifth word and a target sixth word that are located at a minimum interval in the first text, where the target fifth word precedes the target sixth word;
and the second determining subunit is configured to determine a target word and sentence in the first text by using the target fifth character as a start word and the target sixth character as an end word.
Optionally, the third determining submodule is configured to determine, as a character to be recognized, a character in the target sentence, which is located between the target fifth character and the target sixth character.
Optionally, the determining module includes:
the second recognition submodule is used for recognizing the target voice to obtain a third text corresponding to the target voice;
a fourth determining submodule, configured to determine a seventh word in the first text, where the seventh word matches the third text;
the receiving submodule is used for receiving a first input of a user;
the fifth determining submodule is used for responding to the first input and determining the number K of the characters to be recognized, wherein K is a positive integer;
and the sixth determining submodule is used for determining K characters behind the seventh character in the first text as the characters to be recognized.
Optionally, the receiving sub-module is configured to receive a tap input of a user on the screen of the electronic device;
and the fifth determining submodule is used for determining the number of taps K of the tap input as the word number of the characters to be recognized.
Optionally, the number of the seventh characters is L, and L is an integer greater than 1;
the sixth determination submodule includes:
a third determining unit, configured to determine that K characters respectively located after each seventh character in the first text are candidate characters, so as to obtain L groups of candidate characters;
a receiving unit for receiving a second input of the user;
and the fourth determining unit is used for responding to the second input, determining a target candidate character from the L groups of candidate characters and determining the target candidate character as the character to be recognized.
Optionally, the sixth determining sub-module further includes:
an identification unit, configured to identify the L groups of candidate characters;
the receiving unit is used for receiving the selection input of the L groups of candidate characters by the user.
In a third aspect, an embodiment of the present invention further provides an electronic device, including: a transceiver, a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps in the information processing method as described above when executing the computer program.
In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps in the information processing method as described above.
In the embodiment of the invention, a target voice is acquired, wherein the target voice comprises pronunciation of a first character in a first text displayed on electronic equipment; characters to be recognized in the first text are determined according to the target voice, wherein the characters to be recognized are different from the first characters; and target information of the characters to be recognized is acquired, wherein the target information comprises at least one of pronunciation and annotation. In this way, when a user reading on the electronic equipment encounters unknown characters, the user can read aloud the text displayed on the equipment. This triggers the electronic equipment to locate the characters to be recognized according to what the user read aloud, and then to acquire information such as the pronunciation or annotation of those characters.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart of an information processing method according to an embodiment of the present invention, and as shown in fig. 1, the method includes the following steps:
step 101, obtaining a target voice, wherein the target voice comprises a pronunciation of a first character in a first text displayed on an electronic device.
The embodiment of the invention can be applied to scenarios in which a user, while reading on an electronic device, wants to quickly look up the pronunciation of unknown characters such as rare characters, quickly look up the explanation of words whose literal meaning is unknown, or quickly look up the pronunciation of, or other information about, unknown English words.
In the embodiment of the invention, when the electronic equipment is on a reading page, that is, when the first text is displayed, the user may encounter unknown characters while reading. The user can then read aloud the characters before and after the unknown characters, and the electronic equipment collects the voice produced as the user reads, that is, acquires the target voice. The first text may refer to the text currently displayed on the electronic device, the first characters may refer to a segment of characters before and after the characters to be recognized in the first text, and the target voice is the voice produced by the user reading the first characters aloud. It should be noted that the first characters may be characters of different languages, such as Chinese or English.
For example, the electronic device displays the text "ancient human shape and sound meaning is debated from badness", and the user does not recognize the pronunciation of the two characters " " while reading. The user can read aloud the characters before and after the two characters " ", namely "ancient human shape and sound meaning" and "badness"; for the two characters " " themselves, the user can pause without reading them, or substitute another cue word. Alternatively, the user may read only the text "ancient ideographical debate" preceding the characters " ".
Optionally, the step 101 includes:
and acquiring the target voice under the condition of receiving preset input of a user.
The preset input may be an input preset for triggering the electronic device to collect the user's voice. For example, as shown in fig. 2, the preset input may be a touch on a character recognition function button 21 displayed on the interface of the electronic device 20, or a voice input that wakes up the sound pickup function of the electronic device. That is, in this embodiment, the voice collecting module is turned on to obtain the target voice only when an input for triggering the voice collecting function is received from the user, which ensures that the electronic device starts the voice collecting function at an appropriate time.
Step 102, determining characters to be recognized in the first text according to the target voice, wherein the characters to be recognized are different from the first characters.
After the target voice is obtained, a field currently read by a user can be positioned in the first text according to the target voice, and specific characters to be recognized are determined according to a preset rule.
Specifically, speech recognition may be performed on the target voice to convert it into text, and the position of the converted text may then be found in the first text, so that the user's reading position is located. The characters at the position that the user did not read aloud, or several characters following that position, may then be determined as the characters to be recognized. The specific number of characters may be a system default or may be determined from a word-number parameter further input by the user. That is, the characters to be recognized are different from the first characters read aloud by the user, and may be located between the first characters or immediately after the first characters.
For example, when the text corresponding to the target voice is recognized as "ancient pictophonetic ideology and malignance", it may be determined that the user is reading the text "ancient pictophonetic ideology malignance" displayed on the electronic device, and two characters " " in which the user has not read may be determined as the characters to be recognized; or, when the text corresponding to the target voice is recognized as the ancient human form and sound meaning, it can be determined that the user is reading the text of the ancient human form and sound meaning displayed on the electronic equipment, and the next two characters "" "" are determined as the characters to be recognized by default.
Optionally, the target speech further includes pronunciation of a preset cue word;
the step 102 comprises:
identifying the target voice to obtain a second text corresponding to the target voice;
determining second characters in the second text except the preset prompt words;
determining a target word sentence matched with the second character in the first text;
and determining characters to be recognized in the target words and sentences, wherein the characters to be recognized are different from the second characters.
In one embodiment, when a user reads an unknown character, the user may use a specific cue word to replace the unknown character, where the specific cue word may be a preset cue word used to replace the character to be recognized, such as a "shape, sound, meaning, word learning," or a "learning sound," and thus the target speech may further include a pronunciation of the preset cue word, and the electronic device may accurately determine that the word in the first text corresponding to the specific cue word is the character to be recognized based on the preset cue word.
In this embodiment, the electronic device may recognize the target voice to obtain a second text corresponding to the target voice, extract from the second text the second characters other than the preset cue word, find in the first text a target word sentence that matches the second characters, and determine the characters in the target word sentence at the position corresponding to the preset cue word. These characters are the characters to be recognized.
For example, as shown in FIG. 2, when the user reads the three pieces displayed on the electronic device 20, the user can read the sentence aloud without knowing the four words of the three pieces, and the preset prompt word of shape and sound meaning instead of the three pieces, the electronic device can read out the voice information read out by the user and convert the voice information into text information; the text that converts is distinguished for "other sound will, very mobile", electronic equipment distinguishes "and text" other sound will distinguish through predetermineeing the suggestion word "the shape sound will, very mobile" compares, can confirm the target word and sentence of matching and for "other pieces and pieces, very mobile" stands to and can further confirm in this word and sentence with "shape sound will distinguish" the characters of waiting to discern that the position corresponds are "piece and piece with writings" in this sentence, electronic equipment can carry out the mark corresponding at the demonstration position department of "piece writings" four characters in obtaining "piece writings" four characters with writings.
Therefore, through the embodiment, the user only needs to read the unknown text paragraph and uses the preset cue word to replace the unknown text during reading, so that the electronic equipment can be triggered to accurately position the text to be recognized according to the user's pronunciation, and the target information of the text to be recognized of the user can be fed back in real time.
Optionally, the determining second words in the second text except for the preset cue word includes:
according to the preset cue word, the second text is segmented to obtain a third character in the second text before the preset cue word and a fourth character after the preset cue word, wherein the second character comprises the third character and the fourth character;
the determining the target words and phrases in the first text that match the second words includes:
determining a fifth word in the first text, which is matched with the third word, and a sixth word which is matched with the fourth word;
and determining a target word sentence comprising the fifth word and the sixth word from the first text.
Specifically, when determining the second characters in the second text other than the preset cue word, the second text may be segmented using the position of the preset cue word in the second text as a boundary. This yields the third characters, which are the characters in the second text before the preset cue word, and the fourth characters, which are the characters after the preset cue word. The second characters comprise the third characters and the fourth characters.
For example, by taking "geometric-phonetic-meaning-distinguishment" as a preset hint word for example, the converted text is "geometric-phonetic-meaning-distinguishment, which is very mobile", the positions of the four words of "geometric-phonetic-meaning-distinguishment" in the converted text can be determined, then, "geometric-phonetic-meaning-distinguishment" can be used as a separator, and the converted text is split into "front section" [ geometric-phonetic-meaning-distinguishment ] "rear section", for example, the converted text is split into "other" [ geometric-phonetic-meaning-distinguishment ] "very mobile", "front section" is "other", and "rear section" is "very mobile".
Then, the third characters and the fourth characters can each be used as a matching keyword, so that fifth characters matching the third characters and sixth characters matching the fourth characters are found in the first text. Finally, a target word sentence comprising the fifth characters and the sixth characters is determined from the first text. Specifically, the position of the fifth characters is found in the first text, the search continues from that position to find the position of the sixth characters, and the passage comprising the fifth characters and the sixth characters is taken as the target word sentence.
Therefore, through the implementation mode, the words and sentences of the characters to be recognized can be accurately and quickly positioned from the first text.
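The segmentation and matching described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the implementation of the embodiment: the function names `split_at_cue` and `find_all`, the English sample strings and the cue word "BLANK" are hypothetical, and exact substring matching is assumed in place of whatever tolerance real speech recognition output would require.

```python
def split_at_cue(second_text: str, cue: str) -> tuple[str, str]:
    """Split the recognized text at the cue word.

    Returns (third_characters, fourth_characters): the text before and
    after the first occurrence of the cue word.
    """
    before, found, after = second_text.partition(cue)
    if not found:
        raise ValueError("cue word not present in the recognized text")
    return before.strip(), after.strip()


def find_all(first_text: str, fragment: str) -> list[int]:
    """Return the start index of every occurrence of fragment in first_text."""
    if not fragment:
        return []
    positions = []
    start = first_text.find(fragment)
    while start != -1:
        positions.append(start)
        # Step forward by one so overlapping occurrences are also found.
        start = first_text.find(fragment, start + 1)
    return positions


# Hypothetical page text and recognized speech (assumptions for illustration).
page = "the red fox saw the quick brown fox jump"
third, fourth = split_at_cue("the BLANK fox", "BLANK")  # ("the", "fox")
fifth_positions = find_all(page, third)    # [0, 16]
sixth_positions = find_all(page, fourth)   # [8, 32]
```

Both fragments occur twice in the sample page, which is the situation in which a further rule is needed to decide which occurrences the user actually read.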
Optionally, the determining a target word sentence including the fifth word and the sixth word from the first text includes:
under the condition that the number of the fifth words or the sixth words is larger than 1, determining target fifth words and target sixth words with the smallest position intervals in the first text, wherein the target fifth words are before the target sixth words;
and determining a target word and sentence in the first text by taking the target fifth character as a starting word and the target sixth character as an ending word.
In this embodiment, when a plurality of fifth characters matching with the third characters are determined from the first text, or a plurality of sixth characters matching with the fourth characters are determined from the first text, it is necessary to further determine words and phrases including the characters to be recognized, which are actually read by the user.
Specifically, the target fifth characters and the target sixth characters actually read by the user may be determined based on the positional relationship and the position interval between each fifth character group and each sixth character group. For example, the position intervals between each fifth character group and each sixth character group may be compared one by one, finally yielding the pair of fifth characters and sixth characters that has the smallest position interval and in which the fifth characters precede the sixth characters. This pair is the target fifth characters and the target sixth characters. The target word sentence may then be determined in the first text by using the target fifth characters as the start and the target sixth characters as the end, where the target word sentence includes the characters between the target fifth characters and the target sixth characters.
Therefore, under the condition that a plurality of matched characters exist in the first text, the words and sentences where the characters to be recognized are actually read by the user can be accurately positioned through the implementation mode.
Optionally, the determining the characters to be recognized in the target words and sentences includes:
and determining characters positioned between the target fifth character and the target sixth character in the target words and sentences as characters to be recognized.
When the target word and sentence is determined to be a word and sentence in the first text, which takes the target fifth character as a starting word and takes the target sixth character as an ending word, the character between the target fifth character and the target sixth character in the target word and sentence can be directly determined to be a character to be recognized. Of course, after the target fifth word and the target sixth word are determined, the word between the target fifth word and the target sixth word in the first text may also be directly determined as the word to be recognized.
For example, referring to FIG. 3, assume P represents a text string of the content of the page displayed on electronic device 20, V represents a string of characters after the user speaks the page text and converts the speech of the prompt "our Chinese character recognition" into text, V1 represents a string of characters before "ideograph" in the converted text, i.e., "our Chinese character", and V2 represents a string of characters after "ideograph" in the converted text, i.e., "whine".
As shown in fig. 3, in the content of the page displayed on the electronic device 20, a plurality of fields matching V1, assuming X1, X2, and X3 from front to back, respectively, by the display position, and a plurality of fields matching V2, assuming Y1 and Y2 from front to back, respectively, by the display position, may be determined.
The specific matching steps may be as follows:
1) after the user reads the sentence at the character position to be recognized, the electronic equipment waits for several seconds to convert the current reading sound into a character string V;
2) extracting the prompting words of 'shape, sound and meaning' contained in the V character string, and respectively defining the contents before and after the prompting words as V1 and V2;
3) starting from front to back in P, retrieve V1, resulting in X1, X2, and X3, respectively;
4) searching V2 from back to front (or from front to back) in P to obtain Y1 and Y2 respectively;
5) position interval minimum match calculation: judging the position relations of X1, X2 and X3 and Y1 respectively, and judging the position relations of X1, X2 and X3 and Y2 respectively, thereby excluding the matching relation of X3 and Y2; calculating the position intervals of X1 and Y1 and Y2, X2 and Y1 and Y2, and X3 and Y1 respectively; determining X3 and Y1 with the minimum position interval as a matching pair;
6) determining the character string located between X3 and Y1 as the characters to be recognized, and sending the string to a back-end interface to query its pinyin information.
Through the implementation mode, the characters to be recognized really expected by the user can be quickly and accurately positioned.
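Steps 5) and 6) above can be sketched as follows, assuming the match positions from steps 3) and 4) are already available as lists of start indices. The function names `closest_pair` and `text_between` and the sample page are hypothetical, and measuring the interval from the end of the V1 match to the start of the V2 match is one possible reading of "position interval"; the source does not fix the exact measure.

```python
def closest_pair(x_starts, y_starts, x_len):
    """Pick the (x, y) start indices with the smallest position interval.

    x_starts: start indices of the matches for V1 in the page text.
    y_starts: start indices of the matches for V2 in the page text.
    x_len:    length of V1, used to find where each V1 match ends.
    Pairs in which the V2 match does not come after the V1 match are excluded.
    Returns None when no valid pair exists.
    """
    best = None
    for x in x_starts:
        x_end = x + x_len
        for y in y_starts:
            gap = y - x_end
            if gap < 0:
                continue  # V2 match lies before the end of V1: wrong order
            if best is None or gap < best[0]:
                best = (gap, x, y)
    if best is None:
        return None
    return best[1], best[2]


def text_between(page, x_start, x_len, y_start):
    """Return the characters located between the chosen V1 and V2 matches."""
    return page[x_start + x_len:y_start]


# Hypothetical page with V1 = "the" at 0 and 16 and V2 = "fox" at 8 and 32.
page = "the red fox saw the quick brown fox jump"
pair = closest_pair([0, 16], [8, 32], len("the"))       # (0, 8)
unknown = text_between(page, pair[0], len("the"), pair[1]).strip()  # "red"
```

The string in `unknown` is what would then be sent to the back-end interface for the pronunciation query.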
Optionally, the determining, according to the target speech, a character to be recognized in the first text includes:
identifying the target voice to obtain a third text corresponding to the target voice;
determining a seventh word in the first text that matches the third text;
receiving a first input of a user;
responding to the first input, and determining the word number K of the character to be recognized, wherein K is a positive integer;
and determining K characters positioned after the seventh character in the first text as the characters to be recognized.
In another embodiment, the user may only read a segment of text before the unknown text, and the electronic device locates the text matching the user's pronunciation in the first text by the user's pronunciation, and determines how many characters following the selected matching text are the text to be recognized based on the user's input.
In this embodiment, the electronic device may recognize the target voice to obtain a third text corresponding to the target voice, and find in the first text the characters that match the third text, that is, the seventh characters. The user may then perform a first input for determining the word number of the characters to be recognized. The electronic device receives the first input, determines the word number K of the characters to be recognized based on the first input, and determines the K characters located after the seventh characters in the first text as the characters to be recognized.
The first input may be inputting a specific number on the display interface of the electronic device, such as directly handwriting a number on the display interface, or inputting a corresponding number in a pop-up window for a user to input a word count, or clicking K times in a blank of the display interface, or tapping K times on a screen, or inputting a number by voice.
Therefore, through the embodiment, the user only needs to read the unknown text paragraph and input the word number of the character to be recognized during reading, and the electronic equipment can be triggered to accurately position the character to be recognized according to the user's reading and the input word number.
Optionally, the receiving a first input of the user includes:
receiving a tap input of a user on a screen of the electronic equipment;
the responding to the first input, and determining the word number K of the characters to be recognized includes:
and determining the number of taps K of the tap input as the word number of the characters to be recognized.
In an embodiment, the first input may be a tapping input on the screen of the electronic device, after the user pronounces a segment of text before the text to be recognized, the electronic device may locate the reading position of the user based on the pronunciation of the user, and the user may tap the screen of the electronic device for K times to prompt the electronic device to select K texts after the currently located position as the text to be recognized.
For example, for a text displayed on an electronic device, the text "ancient pictophonetic ideological dyscrasia", demon of drought, front , chi, 3957, 3953 ", wherein a user does not recognize demon of drought, front , chi, 3957, and 3953", the user can read the text "ancient pictophonetic ideographic dyscrasia", the electronic device can locate the sentence based on the user's reading, the user can continue to tap 8 times on the screen of the electronic device, and the electronic device can monitor the tap times of the user by tapping a monitoring module, so that 8 characters "demon of drought, front , chi, 57, 39397" after the "ancient pictophonetic ideographic dyscrasia" can be determined as characters to be recognized.
Therefore, the user can trigger the electronic equipment to accurately position the character to be recognized by the user only by matching with the knocking operation with the pronunciation of a section of the character before the character to be recognized, and the target information of the character to be recognized can be quickly acquired.
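One way the tap count K might be derived from raw tap events is sketched below. The source only says that a tap monitoring module counts the taps; the timestamp list, the function name `count_taps` and the 0.6-second maximum gap between consecutive taps are assumptions introduced for illustration.

```python
def count_taps(tap_times, max_gap=0.6):
    """Count the taps in the first uninterrupted run of taps.

    tap_times: tap timestamps in seconds, in ascending order.
    max_gap:   assumed longest pause (seconds) still treated as the same run.
    """
    if not tap_times:
        return 0
    count = 1
    for previous, current in zip(tap_times, tap_times[1:]):
        if current - previous > max_gap:
            break  # a long pause ends the run
        count += 1
    return count


# Three quick taps, then a tap after a long pause that is not counted.
k = count_taps([0.0, 0.3, 0.7, 2.5])  # 3
```

Under these assumptions K would be 3, so the three characters after the located reading position would be taken as the characters to be recognized.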
Optionally, the number of the seventh characters is L, and L is an integer greater than 1;
the determining that K characters located after the seventh character in the first text are the characters to be recognized includes:
determining K characters respectively positioned behind each seventh character in the first text as candidate characters to obtain L groups of candidate characters;
receiving a second input of the user;
and responding to the second input, determining a target candidate character from the L groups of candidate characters, and determining the target candidate character as the character to be recognized.
That is, in one embodiment, when a plurality of seventh characters matching with the pronunciation of the user are determined from the first text, the position of the character that the user really needs to recognize needs to be further determined.
Specifically, when L seventh characters are matched, the electronic device may, for each seventh character, determine the K characters located after it in the first text as candidate characters, thereby obtaining L groups of candidate characters. The user may then perform a second input for selecting the target candidate characters; the electronic device receives the second input, determines the target candidate characters from the L groups of candidate characters based on it, and determines the target candidate characters as the characters to be recognized.
The second input may be tapping the screen a number of times to indicate which group contains the target candidate characters; for example, tapping 3 times determines the 3rd group of candidate characters as the target candidate characters. The second input may also be clicking a number of times on a blank area of the display interface, clicking the position of the target candidate characters, or speaking a number.
That is, when the system determines that only one position meets the condition, it can directly take the K characters after the seventh character as the target characters to be recognized. When the system determines that more than one position meets the condition, it can monitor the number M of consecutive taps in the user's second round of tapping and select the K characters after the M-th seventh character as the target characters to be recognized.
Therefore, under the condition that a plurality of matched candidate characters exist in the first text, the position of the character which is really needed to be identified by the user can be accurately and conveniently located through the implementation mode.
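The disambiguation among L groups of candidate characters can be sketched as follows. This is an illustrative sketch under stated assumptions: matching is exact substring search, the second input is reduced to an integer tap count M, and the names `candidate_groups` and `pick_group` as well as the sample text are hypothetical.

```python
def candidate_groups(first_text: str, spoken: str, k: int) -> list:
    """Return (start, end) spans of the k characters after every match of `spoken`."""
    spans = []
    pos = first_text.find(spoken)
    while pos != -1:
        start = pos + len(spoken)
        spans.append((start, min(start + k, len(first_text))))
        pos = first_text.find(spoken, pos + 1)
    return spans


def pick_group(first_text: str, spans: list, taps=None) -> str:
    """Pick the only group directly, or the M-th group when there are several."""
    if not spans:
        raise LookupError("no candidate group was found")
    if len(spans) == 1:
        start, end = spans[0]          # unique match: no second input needed
    else:
        if taps is None or not 1 <= taps <= len(spans):
            raise ValueError("a tap count between 1 and L is required")
        start, end = spans[taps - 1]   # taps are counted from 1
    return first_text[start:end]


text = "our words are old, our words are new, our words are ours"
groups = candidate_groups(text, "our words are ", 3)   # L = 3 groups, K = 3
print(pick_group(text, groups, taps=2))  # prints new
```

Returning spans rather than strings keeps the positions available, which a display layer could use to mark the candidates on screen.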
Optionally, before receiving the second input of the user, the method further includes:
identifying the L groups of candidate words;
the receiving of the second input of the user comprises:
and receiving selection input of the L groups of candidate characters from a user.
That is, in this embodiment, after the K characters located after each seventh character in the first text are determined as candidate characters and L groups of candidate characters are obtained, the L groups of candidate characters may be identified in the first text, for example by highlighting them or displaying them in a specific color, so as to visually show the user where the selectable candidate characters are located. The user may then perform a selection input on the identified L groups of candidate characters, for example by clicking the group of candidate characters to be recognized, or by counting from front to back to find which group it is and tapping the screen of the electronic device the corresponding number of times, so as to trigger the electronic device to determine that group of candidate characters as the characters to be recognized.
After determining L seventh words in the first text, the L seventh words may also be identified, and after determining the number of words of the word to be recognized based on the user input, candidate words following each seventh word may be further identified.
Therefore, identifying the L groups of candidate characters visually shows the user where the candidate characters are located and helps the user select the target characters to be recognized accurately and conveniently.
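Identifying the candidate groups can be illustrated with a plain-text stand-in for highlighting. The sketch below assumes the candidate spans are already known, sorted from front to back and non-overlapping; the bracket notation and the function name `mark_groups` are hypothetical, and an actual device would use highlighting or color as described above.

```python
def mark_groups(first_text: str, spans: list) -> str:
    """Wrap each (start, end) span in a numbered marker such as [1:...]."""
    marked = first_text
    # Work from the last span backwards so that earlier offsets stay valid.
    for number, (start, end) in reversed(list(enumerate(spans, start=1))):
        marked = (
            marked[:start]
            + f"[{number}:"
            + marked[start:end]
            + "]"
            + marked[end:]
        )
    return marked


sample = "ab XY ab ZW"
marked_sample = mark_groups(sample, [(3, 5), (9, 11)])
print(marked_sample)  # prints ab [1:XY] ab [2:ZW]
```

The group numbers follow the front-to-back order, so the number shown next to a group is also the number of taps that would select it.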
For example, referring to fig. 4, the electronic device 20 displays a text in which the phrase "our Chinese characters" appears three times, each occurrence followed by further text. While reading, the user does not recognize the 8 characters that follow the third occurrence. The user can read "our Chinese characters" aloud and lightly tap the screen 8 times. The electronic device identifies the display positions of the three occurrences of "our Chinese characters" and identifies the 8 characters after each occurrence as candidate characters. The user can then lightly tap the screen 3 times, so that the third group of candidate characters is determined as the characters to be recognized. The electronic device can then look up the pinyin of the third group of characters and, after obtaining the pinyin, mark the pronunciation at the position of those characters.
It should be noted that the above process of recognizing the target speech may be executed on the terminal side or on the server side. If it is executed on the terminal side, the terminal may directly acquire and recognize the target speech. If it is executed on the server side, the terminal may acquire the target speech uttered by the user and send it to the server for speech-to-text recognition and conversion.
Step 103, acquiring target information of the character to be recognized, wherein the target information comprises at least one of pronunciation and annotation.
After the character to be recognized is determined, target information of the character to be recognized, such as pronunciation, word sense annotation, and the like, can be directly acquired to help a user recognize or understand the character to be recognized.
The target information of the characters to be recognized may be obtained by looking up their pronunciation, meaning and similar information in a database, where the database may include data such as a Chinese dictionary and an English-Chinese dictionary. Alternatively, a networked search for the characters to be recognized may be performed in the background, and information such as pronunciation and meaning may be extracted from the search results.
It should be noted that, once the target information of the characters to be recognized is obtained, it may be marked directly in the first text or output as a voice prompt.
The information processing method of the embodiment of the invention comprises: acquiring target voice, wherein the target voice comprises pronunciation of a first word in a first text displayed on electronic equipment; determining characters to be recognized in the first text according to the target voice, wherein the characters to be recognized are different from the first characters; and acquiring target information of the characters to be recognized, wherein the target information comprises at least one of pronunciation and annotation. Therefore, when a user reading on the electronic equipment encounters unknown characters, the user can read aloud characters displayed on the equipment to trigger the electronic equipment to locate the characters to be recognized according to the user's reading, and then information such as the pronunciation or annotation of the characters to be recognized is acquired. In addition, the embodiment of the invention is also applicable to characters displayed on the interface that cannot be selected or copied.
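The lookup in step 103 can be sketched as a local dictionary query with an optional search fallback. This is only an illustration: `LOCAL_DICTIONARY` is a hypothetical two-entry stand-in for the database, its annotations are illustrative glosses, and `online_search` is a hypothetical callable standing in for the background networked search, which the embodiment does not specify further.

```python
# Hypothetical in-memory stand-in for the dictionary database.
LOCAL_DICTIONARY = {
    "魑": {"pronunciation": "chī", "annotation": "a mountain demon in folklore"},
    "魅": {"pronunciation": "mèi", "annotation": "a spirit or goblin; to charm"},
}


def get_target_info(char, online_search=None):
    """Look the character up locally, then fall back to the search callable."""
    info = LOCAL_DICTIONARY.get(char)
    if info is None and online_search is not None:
        info = online_search(char)  # assumed to return a dict or None
    return info


def annotate(text, online_search=None):
    """Mark the pronunciation in parentheses after each character that is found."""
    parts = []
    for char in text:
        info = get_target_info(char, online_search)
        parts.append(f"{char}({info['pronunciation']})" if info else char)
    return "".join(parts)


print(annotate("魑魅"))  # prints 魑(chī)魅(mèi)
```

Marking the pronunciation inline corresponds to identifying the target information in the first text; a voice prompt would instead pass the same information to a speech output component.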
The embodiment of the invention also provides an information processing device. Referring to fig. 5, fig. 5 is a block diagram of an information processing apparatus according to an embodiment of the present invention. Because the principle of solving the problem of the information processing device is similar to the information processing method in the embodiment of the invention, the implementation of the information processing device can refer to the implementation of the method, and repeated details are not repeated.
As shown in fig. 5, the information processing apparatus 500 includes:
a first obtaining module 501, configured to obtain a target voice, where the target voice includes a pronunciation of a first word in a first text displayed on an electronic device;
a determining module 502, configured to determine, according to the target voice, a to-be-recognized word in the first text, where the to-be-recognized word is different from the first word;
a second obtaining module 503, configured to obtain target information of the text to be recognized, where the target information includes at least one of pronunciation and annotation.
Optionally, the target speech further includes pronunciation of a preset cue word;
the determination module 502 includes:
the first recognition submodule is used for recognizing the target voice to obtain a second text corresponding to the target voice;
the first determining submodule is used for determining second characters except the preset prompt words in the second text;
the second determining submodule is used for determining a target word and sentence matched with the second word in the first text;
and the third determining submodule is used for determining characters to be recognized in the target words and sentences, wherein the characters to be recognized are different from the second characters.
Optionally, the first determining sub-module is configured to segment the second text according to the preset cue word to obtain a third word before the preset cue word and a fourth word after the preset cue word in the second text, where the second word includes the third word and the fourth word;
the second determination submodule includes:
a first determining unit, configured to determine a fifth word in the first text that matches the third word, and a sixth word that matches the fourth word;
a second determining unit, configured to determine a target word sentence including the fifth word and the sixth word from the first text.
Optionally, the second determining unit includes:
a first determining subunit, configured to determine, when the number of the fifth words or the sixth words is greater than 1, a target fifth word and a target sixth word that are located at a minimum interval in the first text, where the target fifth word precedes the target sixth word;
and the second determining subunit is configured to determine a target word and sentence in the first text by using the target fifth character as a start word and the target sixth character as an end word.
Optionally, the third determining submodule is configured to determine, as a character to be recognized, a character in the target sentence, which is located between the target fifth character and the target sixth character.
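The cue-word variant handled by these submodules can be sketched as follows. The sketch assumes exact substring matching, a hypothetical cue word "什么", and an invented sample sentence; the names `find_all` and `locate_between` are likewise hypothetical. When several pairs have the same interval, this sketch keeps the earliest one, which is an assumption not stated in the embodiment.

```python
def find_all(text: str, piece: str) -> list:
    """Return the start index of every occurrence of `piece` in `text`."""
    positions = []
    pos = text.find(piece)
    while pos != -1:
        positions.append(pos)
        pos = text.find(piece, pos + 1)
    return positions


def locate_between(first_text: str, second_text: str, cue_word: str) -> str:
    """Split the recognized text at the cue word and return the characters
    between the closest matching start piece and end piece."""
    before, found, after = second_text.partition(cue_word)
    if not found or not before or not after:
        raise ValueError("expected text on both sides of the cue word")
    best = None
    for start in find_all(first_text, before):       # candidate fifth characters
        gap_start = start + len(before)
        for end in find_all(first_text, after):      # candidate sixth characters
            # The start piece must precede the end piece; keep the smallest gap.
            if end >= gap_start and (best is None or end - gap_start < best[1] - best[0]):
                best = (gap_start, end)
    if best is None:
        raise LookupError("no matching start/end pair in the displayed text")
    return first_text[best[0]:best[1]]


displayed = "山中有魑魅魍出没，林中有魉出没"
print(locate_between(displayed, "中有什么出没", "什么"))  # prints 魉
```

In the sample, both "中有" and "出没" occur twice, and the pair with the smallest interval surrounds the single character in the second clause.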
Optionally, the determining module 502 includes:
the second recognition submodule is used for recognizing the target voice to obtain a third text corresponding to the target voice;
a fourth determining submodule, configured to determine a seventh word in the first text, where the seventh word matches the third text;
the receiving submodule is used for receiving a first input of a user;
the fifth determining submodule is used for responding to the first input and determining the number K of the characters to be recognized, wherein K is a positive integer;
and the sixth determining submodule is used for determining K characters behind the seventh character in the first text as the characters to be recognized.
Optionally, the receiving sub-module is configured to receive a tap input of a user on the screen of the electronic device;
and the fifth determining submodule is used for determining the number of taps K of the tap input as the number of characters to be recognized.
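Counting taps can be sketched by grouping tap timestamps into bursts. The embodiment does not describe how consecutive taps are delimited, so the pause threshold `max_gap` (in seconds), the function name `tap_bursts`, and the sample timestamps are all assumptions made for illustration.

```python
def tap_bursts(timestamps, max_gap=0.6):
    """Group tap times (in seconds) into bursts and return the tap count of each.

    Taps separated by more than `max_gap` seconds start a new burst.
    """
    bursts = []
    previous = None
    for t in sorted(timestamps):
        if previous is None or t - previous > max_gap:
            bursts.append(0)      # a long pause starts a new burst
        bursts[-1] += 1
        previous = t
    return bursts


# Four quick taps, a pause, then two quick taps.
counts = tap_bursts([0.0, 0.3, 0.6, 0.9, 3.0, 3.2])
print(counts)  # prints [4, 2]
```

Under this assumption, the first burst could supply the number K of characters to be recognized, and a later burst could supply the group number when several candidate groups exist.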
Optionally, the number of the seventh characters is L, and L is an integer greater than 1;
the sixth determination submodule includes:
a third determining unit, configured to determine that K characters respectively located after each seventh character in the first text are candidate characters, so as to obtain L groups of candidate characters;
a receiving unit for receiving a second input of the user;
and the fourth determining unit is used for responding to the second input, determining a target candidate character from the L groups of candidate characters and determining the target candidate character as the character to be recognized.
Optionally, the sixth determining sub-module further includes:
an identification unit, configured to identify the L groups of candidate characters;
the receiving unit is used for receiving the selection input of the L groups of candidate characters by the user.
The information processing apparatus provided in the embodiment of the present invention may implement the method embodiments described above, and the implementation principle and the technical effect are similar, which are not described herein again.
The information processing apparatus 500 of the embodiment of the present invention obtains a target voice, where the target voice includes a pronunciation of a first word in a first text displayed on an electronic device; determines characters to be recognized in the first text according to the target voice, where the characters to be recognized are different from the first characters; and acquires target information of the characters to be recognized, where the target information includes at least one of pronunciation and annotation. Therefore, when a user reading on the electronic equipment encounters unknown characters, the user can read aloud characters displayed on the equipment to trigger the electronic equipment to locate the characters to be recognized according to the user's reading, and then information such as the pronunciation or annotation of the characters to be recognized is acquired.
The embodiment of the invention also provides the electronic equipment. Because the principle of the electronic device for solving the problem is similar to the information processing method in the embodiment of the present invention, the implementation of the electronic device may refer to the implementation of the method, and repeated details are not described again. As shown in fig. 6, the electronic device according to the embodiment of the present invention includes: the processor 600, which is used to read the program in the memory 620, executes the following processes:
acquiring target voice, wherein the target voice comprises pronunciation of a first word in a first text displayed on electronic equipment;
determining characters to be recognized in the first text according to the target voice, wherein the characters to be recognized are different from the first characters;
acquiring target information of the character to be recognized, wherein the target information comprises at least one of pronunciation and annotation.
A transceiver 610 for receiving and transmitting data under the control of the processor 600.
In fig. 6, the bus architecture may include any number of interconnected buses and bridges, which link together various circuits, in particular one or more processors represented by the processor 600 and memory represented by the memory 620. The bus architecture may also link together various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art and therefore will not be described any further herein. The bus interface provides an interface. The transceiver 610 may be a number of elements, including a transmitter and a receiver, providing a means for communicating with various other apparatus over a transmission medium. For different user devices, the user interface 630 may also be an interface capable of externally connecting a desired device, including but not limited to a keypad, display, speaker, microphone, joystick, etc. The processor 600 is responsible for managing the bus architecture and general processing, and the memory 620 may store data used by the processor 600 in performing operations.
Optionally, the target speech further includes pronunciation of a preset cue word;
the processor 600 is also used to read the program in the memory 620 and execute the following steps:
identifying the target voice to obtain a second text corresponding to the target voice;
determining second characters in the second text except the preset prompt words;
determining a target word sentence matched with the second character in the first text;
and determining characters to be recognized in the target words and sentences, wherein the characters to be recognized are different from the second characters.
Optionally, the processor 600 is further configured to read the program in the memory 620, and perform the following steps:
according to the preset cue word, the second text is segmented to obtain a third character in the second text before the preset cue word and a fourth character after the preset cue word, wherein the second character comprises the third character and the fourth character;
determining a fifth word in the first text, which is matched with the third word, and a sixth word which is matched with the fourth word;
and determining a target word sentence comprising the fifth word and the sixth word from the first text.
Optionally, the processor 600 is further configured to read the program in the memory 620, and perform the following steps:
under the condition that the number of the fifth words or the sixth words is larger than 1, determining target fifth words and target sixth words with the smallest position intervals in the first text, wherein the target fifth words are before the target sixth words;
and determining a target word and sentence in the first text by taking the target fifth character as a starting word and the target sixth character as an ending word.
Optionally, the processor 600 is further configured to read the program in the memory 620, and perform the following steps:
and determining characters positioned between the target fifth character and the target sixth character in the target words and sentences as characters to be recognized.
Optionally, the processor 600 is further configured to read the program in the memory 620, and perform the following steps:
identifying the target voice to obtain a third text corresponding to the target voice;
determining a seventh word in the first text that matches the third text;
receiving a first input of a user;
responding to the first input, and determining the word number K of the character to be recognized, wherein K is a positive integer;
and determining K characters positioned after the seventh character in the first text as the characters to be recognized.
Optionally, the processor 600 is further configured to read the program in the memory 620, and perform the following steps:
receiving a tap input of a user on a screen of the electronic equipment;
and determining the number of taps K of the tap input as the number of characters to be recognized.
Optionally, the number of the seventh characters is L, and L is an integer greater than 1;
the processor 600 is also used to read the program in the memory 620 and execute the following steps:
determining K characters respectively positioned behind each seventh character in the first text as candidate characters to obtain L groups of candidate characters;
receiving a second input of the user;
and responding to the second input, determining a target candidate character from the L groups of candidate characters, and determining the target candidate character as the character to be recognized.
Optionally, the processor 600 is further configured to read the program in the memory 620, and perform the following steps:
identifying the L groups of candidate words;
and receiving selection input of the L groups of candidate characters from a user.
The electronic device provided by the embodiment of the present invention can execute the above method embodiments, and the implementation principle and technical effect are similar, which are not described herein again.
Furthermore, a computer-readable storage medium of an embodiment of the present invention stores a computer program executable by a processor to implement:
acquiring target voice, wherein the target voice comprises pronunciation of a first word in a first text displayed on electronic equipment;
determining characters to be recognized in the first text according to the target voice, wherein the characters to be recognized are different from the first characters;
acquiring target information of the character to be recognized, wherein the target information comprises at least one of pronunciation and annotation.
Optionally, the target speech further includes pronunciation of a preset cue word;
determining the characters to be recognized in the first text according to the target voice, wherein the determining comprises the following steps:
identifying the target voice to obtain a second text corresponding to the target voice;
determining second characters in the second text except the preset prompt words;
determining a target word sentence matched with the second character in the first text;
and determining characters to be recognized in the target words and sentences, wherein the characters to be recognized are different from the second characters.
Optionally, the determining second words in the second text except for the preset cue word includes:
according to the preset cue word, the second text is segmented to obtain a third character in the second text before the preset cue word and a fourth character after the preset cue word, wherein the second character comprises the third character and the fourth character;
the determining the target words and phrases in the first text that match the second words includes:
determining a fifth word in the first text, which is matched with the third word, and a sixth word which is matched with the fourth word;
and determining a target word sentence comprising the fifth word and the sixth word from the first text.
Optionally, the determining a target word sentence including the fifth word and the sixth word from the first text includes:
under the condition that the number of the fifth words or the sixth words is larger than 1, determining target fifth words and target sixth words with the smallest position intervals in the first text, wherein the target fifth words are before the target sixth words;
and determining a target word and sentence in the first text by taking the target fifth character as a starting word and the target sixth character as an ending word.
Optionally, the determining the characters to be recognized in the target words and sentences includes:
and determining characters positioned between the target fifth character and the target sixth character in the target words and sentences as characters to be recognized.
Optionally, the determining, according to the target speech, a character to be recognized in the first text includes:
identifying the target voice to obtain a third text corresponding to the target voice;
determining a seventh word in the first text that matches the third text;
receiving a first input of a user;
responding to the first input, and determining the word number K of the character to be recognized, wherein K is a positive integer;
and determining K characters positioned after the seventh character in the first text as the characters to be recognized.
Optionally, the receiving a first input of the user includes:
receiving a tap input of a user on a screen of the electronic equipment;
the responding to the first input and determining the number K of characters to be recognized comprises the following steps:
and determining the number of taps K of the tap input as the number of characters to be recognized.
Optionally, the number of the seventh characters is L, and L is an integer greater than 1;
the determining that K characters located after the seventh character in the first text are the characters to be recognized includes:
determining K characters respectively positioned behind each seventh character in the first text as candidate characters to obtain L groups of candidate characters;
receiving a second input of the user;
and responding to the second input, determining a target candidate character from the L groups of candidate characters, and determining the target candidate character as the character to be recognized.
Optionally, before receiving the second input of the user, the method further includes:
identifying the L groups of candidate words;
the receiving of the second input of the user comprises:
and receiving selection input of the L groups of candidate characters from a user.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division of the units is only one logical division, and other divisions may be used in practice; for instance, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling, direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be physically included alone, or two or more units may be integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) to execute some steps of the information processing method according to various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.