Summary of the invention
The object of the invention is to overcome the above defects by providing a video telephone control method and device. The video telephone is controlled remotely by speech recognition, so that it can be operated conveniently even when the user is some distance away from it.
A video telephone control method provided to achieve the object of the invention comprises the following steps:
Step A) when the user uses the video telephone, the video telephone receives the user's voice instruction;
Step B) the video telephone determines that the current voice input mode is the voice instruction input mode, converts the voice instruction to text, and compares the text with the predefined voice instruction text to obtain the corresponding instruction code;
Step C) according to the instruction code, the video telephone switches the current voice input mode from the voice instruction input mode to the voice data input mode, and the user converses in the ordinary way; in the voice data input mode, the video telephone cannot be controlled by speech recognition.
The control method of the invention may further comprise the following step:
by default, the video telephone sets the current voice input mode to the voice instruction input mode.
The control method of the invention may further comprise the following step:
Step D) in the voice instruction input mode, when the input voice instruction is invalid, the invalid voice instruction is discarded.
Step D) further comprises the following step: Step D1) when the voice instruction cannot be recognized, an instruction is entered by keyboard operation on the video telephone, and according to that instruction the video telephone switches the current voice input mode from the voice instruction input mode to the voice data input mode.
The video telephone control method of the invention may further comprise the following steps:
during a call, the voice data input mode can be switched to the voice instruction input mode through the video telephone control module by keyboard operation;
when the call ends and the phone is hung up, the video telephone switches the voice data input mode to the voice instruction input mode, automatically or by keyboard operation.
Step B) comprises the following steps:
when the instruction "hands-free" is input, the speech recognition and instruction conversion module processes the input and then sends the command CMD_SPEARER_ON to the video telephone control module, which sets the loudspeaker to the hands-free state after receiving the CMD_SPEARER_ON command;
when the instruction "dial + number" is input, the speech recognition and instruction conversion module processes the input and then sends the command CMD_DIAL to the video telephone control module, which controls the video telephone communication module to place the call after receiving the CMD_DIAL command;
when the instruction "answer call" is input, the speech recognition and instruction conversion module processes the input and then sends the command CMD_TALK_START to the video telephone control module, which controls the video telephone communication module to start the call after receiving the CMD_TALK_START command.
The invention also provides a speech-recognition-controlled video telephone, comprising a voice input receiving terminal, a speech processing module, a video telephone communication module, a video telephone display module, a current voice input state judgment module, a speech recognition and instruction conversion module, and a video telephone control module.
The current voice input state judgment module judges the current voice input mode and, according to the result, sends the voice signal to either the speech recognition and instruction conversion module or the speech processing module.
The speech recognition and instruction conversion module, when the current voice input mode is the voice instruction input mode, converts the input voice instruction to text, compares the text with the predefined voice instruction text to obtain the corresponding instruction code, and sends the code to the video telephone control module.
The video telephone control module controls the video telephone to conduct calls and, on receiving a voice input mode conversion instruction, switches the current voice input mode to the voice instruction input mode or the voice data input mode; in the voice data input mode, the video telephone cannot be controlled by speech recognition.
The video telephone of the invention may further comprise a keyboard input module.
The keyboard input module is used to input instructions that control the video telephone; when a voice input mode conversion instruction is input, it is passed to the video telephone control module, which sets the current voice input mode.
The invention uses speech recognition to control the video telephone, providing a method and device that make control as convenient as the usual keyboard control. It determines whether the user's voice input is a voice instruction or voice data, and accordingly either controls the video telephone or receives the voice data. This provides simple remote control of the video telephone, so that it can be controlled smoothly even when the user is some distance away, and the user can use it more conveniently at a distance.
Embodiment
The video telephone control method and device of the invention are further described below with reference to Figures 1 and 2.
Figure 1 shows the system structure of the speech recognition video telephone in this embodiment, which comprises a voice input receiving terminal 1, a current voice input state judgment module 2, a speech processing module 5, a speech recognition and instruction conversion module 3, a video telephone control module 4, a video telephone display module 7, a video telephone communication module 6, and a keyboard input module 8.
The voice input receiving terminal 1 receives the user's voice and passes it to the current voice input state judgment module 2.
The user's voice is generally captured by a pickup device, such as a microphone, that picks up the sound the user makes in the surrounding environment. The resulting voice signal is generally an analog signal, or is converted from an analog signal to a digital signal before being passed to the current voice input state judgment module 2.
Preferably, after the voice input receiving terminal 1 in this embodiment receives the user's voice, it converts the analog voice signal to a digital signal and only then passes it to the current voice input state judgment module 2.
The conversion of the analog voice signal to a digital signal can be implemented with a known A/D converter.
The current voice input state judgment module 2 judges the current voice input mode and, according to the result, sends the voice signal to either the speech recognition and instruction conversion module 3 or the speech processing module 5.
By judging the current voice input state, the current voice input state judgment module 2 decides whether the current input voice is treated as an instruction input or as plain voice data.
When the voice is input as a voice instruction, it is converted to a control command by the speech recognition and instruction conversion module 3, and the command is then passed to the video telephone control module 4 to control the video telephone.
When the voice is input as voice data, for example voice input during a call or a recording input, the voice data is sent to the speech processing module 5 of the video telephone for processing.
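The routing performed by the current voice input state judgment module 2 can be sketched as follows. This is only a minimal illustration: the names `VoiceInputMode`, `route_voice`, `recognizer`, and `speech_processor` are hypothetical and do not appear in the source, and the two callables merely stand in for the speech recognition and instruction conversion module 3 and the speech processing module 5.

```python
from enum import Enum, auto


class VoiceInputMode(Enum):
    """Hypothetical names for the two voice input modes described in the text."""
    INSTRUCTION = auto()  # voice is interpreted as a command
    DATA = auto()         # voice is call, recording, or loopback audio


def route_voice(mode, frame, recognizer, speech_processor):
    """Forward one audio frame according to the current voice input mode.

    `recognizer` stands in for module 3 and `speech_processor` for module 5;
    both are assumed to be plain callables that accept the frame.
    """
    if mode is VoiceInputMode.INSTRUCTION:
        return recognizer(frame)
    return speech_processor(frame)
```

In this sketch the judgment step holds no state of its own; the mode is supplied by the caller, which would correspond to the video telephone control module 4 owning the current mode.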
The speech recognition and instruction conversion module 3, when the current voice input mode is the voice instruction input mode, converts the input voice instruction to text, compares the text with the predefined voice instruction text to obtain the corresponding instruction code, and sends the code to the video telephone control module 4.
The video telephone control module 4 controls the video telephone to conduct calls and, on receiving a voice input mode conversion instruction, switches the current voice input mode to the voice instruction input mode or the voice data input mode.
To convert voice to instructions, this embodiment defines a text expression for each command. When voice is converted to an instruction, speech recognition is performed first to obtain the text the user intends to express; this text is then compared with the defined text expressions of the commands to determine which command the user wants, and the video telephone control module 4 processes the command to control the video telephone.
The voice control method of the video telephone of the invention is further described below with reference to Figure 2:
1) By default, the video telephone control module 4 sets the current voice input state to the voice instruction input mode.
2) When the user, by voice instruction or keyboard operation, uses the video telephone for a call, recording, or loopback, the voice input receiving terminal receives the user's voice instruction.
A voice instruction in the invention is generally a phrase of no more than 10 characters (Chinese characters or English words); the user can set the content of each voice instruction, and instructions are parsed by the speech recognition and instruction conversion module 3. When the user starts a call (or a recording or loopback) on the video telephone by voice instruction or keyboard operation, the speech recognition and instruction conversion module 3 or the keyboard input module 8, respectively, sends the corresponding command to the video telephone control module 4, which processes the command and thereby completes the switch from the voice instruction input state to the voice data input state.
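The length limit on user-defined voice instructions could be checked along the following lines. This is a sketch under assumptions: the text does not say how characters are counted, so the code assumes each Chinese character and each English word counts as one unit and that digits and punctuation are not counted. The names `MAX_UNITS`, `instruction_length`, and `is_valid_instruction` are hypothetical.

```python
import re

MAX_UNITS = 10  # limit stated in the text

# Assumption: one unit is a single CJK ideograph or one English word
# (letters, optionally joined by a hyphen or apostrophe).
UNIT = re.compile(r"[\u4e00-\u9fff]|[A-Za-z]+(?:[-'][A-Za-z]+)*")


def instruction_length(text):
    """Count Chinese characters and English words in an instruction."""
    return len(UNIT.findall(text))


def is_valid_instruction(text):
    """Accept a user-defined instruction of 1 to MAX_UNITS units."""
    return 1 <= instruction_length(text) <= MAX_UNITS
```

A check of this kind would run when the user sets the content of an instruction, before the phrase is stored for later comparison.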
Loopback means outputting the sound received at the video telephone's audio input from the audio output port.
The voice format is 16-bit PCM audio data.
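As an illustration of the 16-bit PCM format, the sketch below packs sample values into signed 16-bit PCM bytes. The text specifies only the 16-bit PCM format; the little-endian byte order, the signed encoding, the float input range of [-1.0, 1.0], and the function name `to_pcm16` are all assumptions made for this example.

```python
import struct


def to_pcm16(samples):
    """Pack float samples in [-1.0, 1.0] as little-endian signed 16-bit PCM.

    Byte order and scaling are assumptions; the source only states
    that the voice data is 16-bit PCM.
    """
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))          # clip out-of-range samples
        ints.append(int(round(s * 32767)))  # scale to the 16-bit range
    return struct.pack("<%dh" % len(ints), *ints)
```

Each sample occupies two bytes, so a frame of N samples is 2N bytes long regardless of whether it is later routed to speech recognition or to ordinary speech processing.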
3) The current voice input state judgment module 2 determines that the current voice input mode is the voice instruction input mode and passes the voice instruction to the speech recognition and instruction conversion module 3, which converts the voice instruction to text, compares the text with the predefined voice instruction text to obtain the corresponding instruction code, and sends the code to the video telephone control module 4.
The text expression of each command can be set according to the user's needs, but a predefined voice instruction must be provided for each command offered.
For example, "dial + number" means placing a call to the given number, "hands-free" means enabling the hands-free speakerphone, "answer call" means starting the call, and so on.
The voice instruction input is converted to text by speech recognition and then compared with the voice instruction text stored in advance to obtain the corresponding instruction code.
In the instruction input mode, when the instruction "hands-free" is input, the speech recognition and instruction conversion module 3 processes the input and then sends the command CMD_SPEARER_ON to the video telephone control module 4, which sets the loudspeaker to the hands-free state after receiving the CMD_SPEARER_ON command.
That is: voice input "hands-free" --> text "hands-free" --> CMD_SPEARER_ON (parameter is empty)
In the instruction input mode, when the instruction "dial + number" is input, the speech recognition and instruction conversion module 3 processes the input and then sends the command CMD_DIAL to the video telephone control module 4, which controls the video telephone communication module 6 to place the call after receiving the CMD_DIAL command.
That is: voice input "dial + number" --> text "dial + number" --> CMD_DIAL (parameter is the number)
In the instruction input mode, when the instruction "answer call" is input, the speech recognition and instruction conversion module 3 processes the input and then sends the command CMD_TALK_START to the video telephone control module 4, which controls the video telephone communication module 6 to start the call after receiving the CMD_TALK_START command.
That is: voice input "answer call" --> text "answer call" --> CMD_TALK_START (parameter is empty)
The commands that the speech recognition and instruction conversion module 3 sends to the video telephone control module 4 are CMD_TALK_START, CMD_SPEARER_ON, and CMD_DIAL; according to the command, the video telephone communicates through the video telephone communication module 6 and displays the image to the user through the video telephone display module 7.
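The mapping from recognized text to a command code and parameter might be sketched as below. The command names are taken from the text (including the source's spelling CMD_SPEARER_ON); everything else is an assumption for illustration: the English phrasings, the regular expression for the dialed number, and the function name `text_to_command`. Speech recognition itself is assumed to have already produced the text.

```python
import re

# Command codes named in the text; the spelling follows the source.
CMD_SPEARER_ON = "CMD_SPEARER_ON"
CMD_DIAL = "CMD_DIAL"
CMD_TALK_START = "CMD_TALK_START"

# Hypothetical English phrasings of the predefined instruction text.
FIXED_PHRASES = {
    "hands-free": CMD_SPEARER_ON,
    "answer call": CMD_TALK_START,
}
# Assumed form of the "dial + number" instruction: the word, then digits.
DIAL_PATTERN = re.compile(r"dial\s+([0-9]+)")


def text_to_command(text):
    """Return (command, parameter), or None for an invalid instruction."""
    normalized = " ".join(text.lower().split())
    if normalized in FIXED_PHRASES:
        return FIXED_PHRASES[normalized], None  # parameter is empty
    match = DIAL_PATTERN.fullmatch(normalized)
    if match:
        return CMD_DIAL, match.group(1)         # parameter is the number
    return None  # unmatched text is treated as invalid and discarded
```

Returning `None` for unmatched text reflects the behavior described for invalid instructions, which are simply dropped rather than forwarded to the control module.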
4) According to the instruction code, the video telephone control module 4 switches the current voice input mode from the voice instruction input mode to the voice data input mode, and the user converses in the ordinary way through the speech processing module 5.
After the voice input state of the video telephone changes from the instruction input mode to the voice data input mode, the user begins the call and the transmission of video images. After the voice input receiving terminal receives the voice data input by the user, the current voice input state judgment module determines that the current voice input state is the voice data input mode and sends the input voice data to the speech processing module 5, so that the user converses normally.
While the user is using the video telephone for a normal call (or a recording or loopback), the video telephone cannot be controlled by speech recognition; all voice input at this time is treated as voice data rather than as voice instructions.
5) In the voice instruction input mode, when an invalid instruction is input, the speech recognition and instruction conversion module 3 discards it.
Preferably, because voice control supplements keyboard control, keyboard control has a higher priority than voice control.
When the voice cannot be recognized, an instruction can be entered by keyboard operation through the keyboard input module 8, and according to that instruction the video telephone control module 4 switches the current voice input state from the instruction input mode to the voice data input mode.
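One possible reading of the priority rule is that, when commands from both sources are pending, keyboard commands are handled first. The sketch below illustrates that reading only; the text does not specify how priority is enforced, and the class `CommandQueue`, the constants `KEYBOARD` and `VOICE`, and the command name `KEY_MODE_SWITCH` are hypothetical.

```python
import heapq
import itertools

KEYBOARD, VOICE = 0, 1  # lower value is served first


class CommandQueue:
    """Pending commands, with keyboard commands served before voice ones."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # keeps arrival order within a source

    def push(self, source, command):
        heapq.heappush(self._heap, (source, next(self._order), command))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


# Usage: a keyboard command overtakes voice commands queued earlier.
queue = CommandQueue()
queue.push(VOICE, "CMD_DIAL")
queue.push(KEYBOARD, "KEY_MODE_SWITCH")
queue.push(VOICE, "CMD_TALK_START")
handled = [queue.pop() for _ in range(len(queue))]
```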
6) During a call, the voice data input mode can be switched to the voice instruction input mode through the video telephone control module 4 by keyboard operation, for example by pressing a special function key on the keyboard; alternatively, when the call ends and the phone is hung up, the video telephone control module 4 switches the voice data input mode to the voice instruction input mode, automatically or by keyboard operation.
For the switch during a call, the keyboard input module 8 needs to provide a dedicated key that performs the conversion from the voice data input mode to the voice instruction input mode.
For the switch at hang-up, the video telephone control module 4 receives the on-hook response and converts the current voice data input mode to the voice instruction input mode, automatically or by keyboard operation.
When the video telephone of the invention is in use, the current voice input mode is by default the voice instruction input mode. When the user, by voice instruction or keyboard operation, starts a call, recording, or loopback on the video telephone, the current voice input mode changes from the voice instruction input mode to the voice data input mode. While the user is using the video telephone for a call, recording, or loopback, the video telephone cannot be controlled by speech recognition, and all voice input is treated as voice data rather than as voice instructions; returning to the state in which speech recognition commands control the video telephone requires a keyboard operation. When the call ends and the phone is hung up, the current voice input mode is changed from the voice data input mode to the voice instruction input mode, automatically or by keyboard operation.
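The mode transitions summarized above amount to a two-state machine, which can be sketched as follows. The class `ModeController` and all event names are hypothetical labels introduced for this illustration, and the sketch assumes the automatic variant of the switch at hang-up.

```python
from enum import Enum, auto


class Mode(Enum):
    INSTRUCTION = auto()  # voice instruction input mode
    DATA = auto()         # voice data input mode


# Hypothetical event names for the triggers described in the text.
START_EVENTS = {"call_started", "recording_started", "loopback_started"}
RETURN_EVENTS = {"mode_key_pressed", "hung_up"}


class ModeController:
    """Tracks the current voice input mode."""

    def __init__(self):
        self.mode = Mode.INSTRUCTION  # default state

    def on_event(self, event):
        """Apply one event and return the resulting mode."""
        if self.mode is Mode.INSTRUCTION and event in START_EVENTS:
            self.mode = Mode.DATA
        elif self.mode is Mode.DATA and event in RETURN_EVENTS:
            self.mode = Mode.INSTRUCTION
        return self.mode

    def accepts_voice_commands(self):
        """Voice control is available only in the instruction mode."""
        return self.mode is Mode.INSTRUCTION
```

Events that do not apply to the current mode are ignored, so a stray key press in the instruction mode leaves the mode unchanged.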
The invention uses mature, well-known speech recognition technology to control the video telephone, so that control is as convenient as with the usual means such as keyboard control or remote-controller control. The video telephone is controlled remotely by speech recognition, so that it can be used conveniently and smoothly even at some distance from the video telephone, typically 0 to 5 meters.
This embodiment is described in detail so that those of ordinary skill in the art can understand the invention. It will be appreciated, however, that other changes and modifications can be made without departing from the scope covered by the claims of the invention, and all such changes and modifications fall within the protection scope of the invention.