CN1282946C - Speech recognition conversation selection device, speech recognition conversation system, speech recognition conversation selection method, and program - Google Patents
- Publication number
- CN1282946C (CNB038003465A, CN03800346A)
- Authority
- CN
- China
- Prior art keywords
- interface
- speech recognition
- dispensing device
- data
- speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/226—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
- G10L2015/228—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of application context
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Telephonic Communication Services (AREA)
Abstract
In a voice recognition dialogue system having a plurality of recognition dialogue servers, there is no framework to select and determine one recognition dialogue server. A client 10 transmits its ability information stored in a terminal information storage 140 to a recognition dialogue selecting server 20. The ability of the client 10 includes a CODEC ability (CODEC type, CODEC compression mode, etc.), a voice data format (compressed voice data, feature vector, etc.), a recorded voice I/O function, a synthesized voice I/O function (without synthesizing engine, with intermediate representation input engine, with character string input engine, etc.), and service contents. The recognition dialogue selecting server 20 receives the ability information transmitted from the client 10, and determines the optimum recognition dialogue server according to ability information of plural recognition dialogue servers which has been stored in a recognition dialogue server information storage 230 and information of the requested service contents.
Description
Technical field
The present invention relates to a speech recognition dialogue system, a speech recognition dialogue selection method, a speech recognition dialogue selection device, and a recording medium for a speech recognition dialogue selection program. With these, voice data input to a terminal (client), such as a mobile phone or an in-vehicle terminal, is sent over a network to a recognition dialogue server, and a voice dialogue based on speech recognition and response is carried out at the recognition dialogue server.
Background technology
Conventionally, a speech recognition dialogue system that uses VoIP (Voice over Internet Protocol) is a client-server type system in which voice data output from a client is sent over a packet network to a recognition dialogue server, and the speech recognition dialogue processing is then performed at the recognition dialogue server. Such a system is described in detail in, for example, Nikkei Internet Technology, March 1998, pp. 130-137.
In a system that uses VoIP, speech recognition, or a voice dialogue based on speech recognition and response (synthesized speech, recorded speech, and so on), is carried out within a framework in which the IP addresses of the client and the recognition dialogue server are known. In this framework, the client and the recognition dialogue server are connected to each other by their IP addresses so that packet communication is possible, and the speech recognition dialogue is carried out by sending packets of voice data from the client to the recognition dialogue server.
Japanese Patent Laid-Open No. 10-333693 discloses a method and a system capable of providing an automatic speech recognition service. The system is configured so that voice data is recognized by sending it from a client to a speech recognition server over a packet network.
However, in the conventional VoIP-based system described above, speech recognition and voice dialogue must be carried out within a framework in which the IP addresses of both the client and the recognition dialogue server are known. Therefore, when a plurality of recognition dialogue servers exist, a new mechanism must be developed that selects the recognition dialogue server best suited to the client and associates that server with the client.
Similarly, for the method and system capable of providing an automatic speech recognition service disclosed in Japanese Patent Laid-Open No. 10-333693, when a plurality of recognition dialogue servers exist, a new mechanism must also be developed that selects the recognition dialogue server best suited to the client and associates that server with the client.
An object of the present invention is to provide a speech recognition dialogue system, a speech recognition dialogue selection method, a speech recognition dialogue selection device, and a recording medium for a speech recognition dialogue selection program that, when a plurality of recognition dialogue servers exist, can select the optimum recognition dialogue server on the basis of the capability of the client and the capabilities of the recognition dialogue servers, and can carry out a speech recognition dialogue between the determined recognition dialogue server and the client.
Summary of the invention
To achieve the above object, the speech recognition dialogue system of the present invention comprises: a plurality of dialogue devices for carrying out speech recognition dialogue; a transmitting device for sending speech information to a dialogue device; a network connecting the transmitting device and the dialogue devices; and a selecting device that selects one dialogue device from the plurality of dialogue devices according to the capability of the transmitting device and the capabilities of the plurality of dialogue devices.
In addition, the speech recognition dialogue system of the present invention may comprise: a plurality of dialogue devices for carrying out speech recognition dialogue; a transmitting device for sending speech information to a dialogue device; a network connecting the transmitting device and the dialogue devices; and a selecting device that selects one dialogue device from the plurality of dialogue devices according to the capability of the transmitting device and the capabilities of the plurality of dialogue devices, sends information specifying the selected dialogue device to the transmitting device, and exchanges, between the selected dialogue device and the transmitting device, the information required to carry out the speech recognition dialogue.
In addition, the speech recognition dialogue system of the present invention may further comprise a requesting device for requesting a service from a dialogue device, in which case the network connects the transmitting device, the requesting device, and the dialogue devices. It may also comprise a service holding device for holding the service content requested of the dialogue device, together with a network connecting the service holding device, the transmitting device, and the dialogue devices.
In the above speech recognition dialogue system, the selecting device may be replaced by another selecting device that has the functions of sending information specifying the selected dialogue device to the transmitting device and of exchanging service content and speech information between the selected dialogue device and the requesting and transmitting devices. A selecting device that has the function of changing the selected dialogue device to another dialogue device may also be used.
As the selecting device, one may use a selecting device that compares the capability of the transmitting device with the capabilities of the plurality of dialogue devices and, according to the comparison result, determines a dialogue device whose input format for speech information matches the output format of the speech information output by the transmitting device. Alternatively, one may use a selecting device that compares the service content and capability of the transmitting device with the capabilities of the plurality of dialogue devices and, according to the comparison result, determines a dialogue device whose input format for speech information matches the output format of the speech information output by the transmitting device.
The speech information output from the transmitting device is preferably digitized voice data, compressed voice data, or feature vector data. The data used to determine the capability of the transmitting device preferably includes data on CODEC capability, voice data format, and recorded/synthesized speech input/output functions. The data used to determine the capability of a dialogue device preferably includes data on CODEC capability, voice data format, recorded/synthesized speech output functions, service content, recognition capability, and operation information.
More specifically, the speech recognition dialogue system of the present invention may comprise: a plurality of speech recognition dialogue servers for carrying out speech recognition dialogue; a client for sending speech information and the service content requested of a speech recognition dialogue server; a speech recognition dialogue selecting server for selecting one dialogue device from the plurality of dialogue devices; and a network connecting the client, the speech recognition dialogue servers, and the speech recognition dialogue selecting server.
The client may comprise: a data input unit for inputting speech information and service content data; a terminal information storage for storing the capability data of the client; a data communication unit for communicating over the network with the speech recognition dialogue servers and the speech recognition dialogue selecting server, and for sending speech information to the selected speech recognition dialogue server; and a controller for controlling the operation of the client.
The speech recognition dialogue selecting server may comprise: a data communication unit for communicating over the network with the client and the speech recognition dialogue servers; a recognition dialogue server information storage for storing the capability of each speech recognition dialogue server; and a recognition dialogue server determining unit that reads the capability data of the client stored in the terminal information storage, compares it with the capability data of the speech recognition dialogue servers stored in the recognition dialogue server information storage, determines at least one speech recognition dialogue server from the plurality of speech recognition dialogue servers, and then sends the client the information required to specify the determined speech recognition dialogue server.
The speech recognition dialogue server may comprise: a speech recognition dialogue execution unit for carrying out the speech recognition dialogue according to the speech information input from the client; a data communication unit for communicating over the network with the client and the speech recognition dialogue selecting server; and a controller for controlling the operation of the speech recognition dialogue server.
In this case, the speech recognition dialogue system may comprise a service content holding server that is connected to the network and holds the service content requested by the client, and a reading unit that is arranged in the speech recognition dialogue server and reads the service content held in the service content holding server. The speech recognition dialogue system may further comprise a process transfer means that is arranged in the speech recognition dialogue server and outputs, to the speech recognition dialogue selecting server, a request to transfer the speech recognition dialogue processing to another speech recognition dialogue server. The speech information output by the client is preferably digitized voice data, compressed voice data, or feature vector data.
In addition, the data used to determine the capability of the client preferably includes data on CODEC capability, voice data format, and recorded/synthesized speech input/output functions. The data used to determine the capability of a speech recognition dialogue server preferably includes data on CODEC capability, voice data format, recorded/synthesized speech output functions, service content, recognition capability, and operation information.
The speech recognition dialogue selection method of the present invention performs data communication between a transmitting device and a plurality of dialogue devices over a network and executes processing for sending the speech information data output from the transmitting device to a specified dialogue device. It comprises: a first step of receiving speech information data from the transmitting device; a second step of requesting the capability data of the transmitting device from the transmitting device; a third step of sending the capability data of the transmitting device from the transmitting device; a fourth step of comparing the capability data of the transmitting device with the capability data of the plurality of dialogue devices and determining the specified dialogue device according to the comparison result; a fifth step of notifying the transmitting device of information specifying the determined dialogue device; and a sixth step of carrying out speech recognition dialogue processing between the transmitting device and the determined dialogue device. In this case, the speech recognition dialogue selection method may further comprise: a seventh step of sending, during the speech recognition dialogue processing between the transmitting device and the dialogue device, a request from the dialogue device to transfer the destination of the transmitting device to another dialogue device; an eighth step of requesting the capability data of the transmitting device from the transmitting device; a ninth step of sending the capability data of the transmitting device from the transmitting device in response to the request in the eighth step; a tenth step of comparing the capability data of the transmitting device with the capability data of the plurality of dialogue devices and determining a new dialogue device according to the comparison result; an eleventh step of notifying the transmitting device of the information required to specify the dialogue device determined in the tenth step; and a twelfth step of carrying out speech recognition dialogue processing between the dialogue device determined in the tenth step and the transmitting device.
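The request/report/compare/notify sequence of the second to fifth steps, and its repetition in the eighth to eleventh steps when a transfer is requested, can be sketched roughly as follows. This is only an illustrative sketch: the class names `Client` and `SelectionServer`, the single-string capability, and the "first match wins" rule are all assumptions made for the example and do not come from the patent text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Client:
    """Hypothetical transmitting device; `capability` stands in for its capability data."""
    capability: str
    log: list = field(default_factory=list)

    def report_capability(self) -> str:
        # Third (or ninth) step: the client answers the capability request.
        self.log.append("capability sent")
        return self.capability


@dataclass
class SelectionServer:
    """Hypothetical selecting device holding {server address: accepted data format}."""
    servers: dict

    def select(self, client: Client, exclude: Optional[str] = None) -> str:
        # Second (or eighth) step: request the capability data from the client.
        capability = client.report_capability()
        # Fourth (or tenth) step: compare with the stored server capabilities.
        for address, accepted in self.servers.items():
            if address != exclude and accepted == capability:
                # Fifth (or eleventh) step: notify the client of the chosen server.
                client.log.append(f"assigned {address}")
                return address
        raise LookupError("no dialogue server matches the client capability")
```

A transfer during the dialogue would then be a second call such as `selector.select(client, exclude=current_address)`, after which the client continues the dialogue with the newly assigned server.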
In addition, the speech recognition dialogue selection method of the present invention may be configured to perform data communication among a transmitting device, a plurality of dialogue devices, and a service holding device over a network and to execute processing for sending the speech information data output from the transmitting device to a specified dialogue device. The method may comprise: a first step of receiving a request, output from the transmitting device, for a service that includes speech recognition dialogue processing; a second step of requesting the capability data of the transmitting device from the transmitting device; a third step of sending the capability data of the transmitting device from the transmitting device; a fourth step of comparing the capability data of the transmitting device with the capability data of the plurality of dialogue devices and determining the specified dialogue device from the plurality of dialogue devices according to the comparison result; a fifth step of notifying the transmitting device of the information required to specify the dialogue device determined in the fourth step; a sixth step of carrying out speech recognition dialogue processing between the transmitting device and the dialogue device determined in the fourth step; a seventh step in which the dialogue device determined in the fourth step requests, from the service holding device, the service content requested by the transmitting device; an eighth step of sending the service content requested in the seventh step to the dialogue device determined in the fourth step; a ninth step in which the dialogue device determined in the fourth step reads the service content sent in the eighth step; and a tenth step of carrying out, according to the service content that was read, speech recognition dialogue processing between the transmitting device and the dialogue device determined in the fourth step.
In this case, the speech recognition dialogue selection method may further comprise: an eleventh step of sending, during the speech recognition dialogue processing between the transmitting device and the dialogue device, a request from the dialogue device to transfer the destination of the transmitting device to another dialogue device; a twelfth step of requesting the capability data of the transmitting device from the transmitting device; a thirteenth step of sending the capability data of the transmitting device from the transmitting device; a fourteenth step of comparing the capability data of the transmitting device with the capability data of the plurality of dialogue devices and determining a new dialogue device according to the comparison result; a fifteenth step of notifying the transmitting device of the information required to specify the dialogue device determined in the fourteenth step; and a sixteenth step of carrying out speech recognition dialogue processing between the dialogue device determined in the fourteenth step and the transmitting device.
The speech information is preferably digitized voice data, compressed voice data, or feature vector data. In addition, the capability data of the transmitting device preferably includes data on CODEC capability, voice data format, recorded/synthesized speech input/output functions, and service content. The capability data of a dialogue device preferably includes data on CODEC capability, voice data format, recorded/synthesized speech output functions, service content, recognition capability, and operation information.
The speech recognition dialogue selection device of the present invention may be configured to perform data communication between a transmitting device and a plurality of dialogue devices over a network and to comprise a selecting device that selects a specified dialogue device and sends the speech information data output from the transmitting device to that dialogue device. When selecting, the selecting device specifies the dialogue device according to the capability of the transmitting device and the capabilities of the plurality of dialogue devices.
In addition, the speech recognition dialogue selection device of the present invention may be configured to perform data communication between a transmitting device and a plurality of dialogue devices over a network and to execute processing for selecting a specified dialogue device and sending the speech information data output from the transmitting device to that dialogue device. It comprises: a first means for receiving, from the transmitting device, speech information and data indicating that the dialogue device is to be changed; a second means for requesting the capability data of the transmitting device from the transmitting device; a third means for having the capability data sent from the transmitting device in response to the request from the second means; a fourth means for comparing the capability data of the transmitting device with the capability data of the plurality of dialogue devices and determining a dialogue device according to the comparison result; and a fifth means for notifying the transmitting device of information specifying the dialogue device determined by the fourth means.
In this case, the speech information preferably includes digitized voice data, compressed voice data, or feature vector data. In addition, the capability data of the transmitting device preferably includes data on CODEC capability, voice data format, recorded/synthesized speech input/output functions, and service content. The capability data of a dialogue device preferably includes data on CODEC capability, voice data format, recorded/synthesized speech output functions, service content, recognition capability, and operation information.
The present invention can be realized by recording a speech recognition dialogue selection program on a recording medium. That is, the recording medium according to the present invention records a speech recognition dialogue selection program that performs data communication between a transmitting device and a plurality of dialogue devices over a network and executes processing for sending the speech information data output from the transmitting device to a specified dialogue device. The recorded program comprises: a first step of receiving speech information data from the transmitting device; a second step of requesting the capability data of the transmitting device from the transmitting device; a third step of sending the capability data of the transmitting device from the transmitting device; a fourth step of comparing the capability data of the transmitting device with the capability data of the plurality of dialogue devices and determining the specified dialogue device according to the comparison result; a fifth step of notifying the transmitting device of information specifying the determined dialogue device; and a sixth step of carrying out speech recognition dialogue processing between the transmitting device and the determined dialogue device.
In this case, the speech recognition dialogue selection program recorded on the recording medium may further comprise: a seventh step of sending, during the speech recognition dialogue processing between the transmitting device and the dialogue device, a request from the dialogue device to transfer the destination of the transmitting device to another dialogue device; an eighth step of requesting the capability data of the transmitting device from the transmitting device; a ninth step of sending the capability data of the transmitting device from the transmitting device in response to the request in the eighth step; a tenth step of comparing the capability data of the transmitting device with the capability data of the plurality of dialogue devices and determining a new dialogue device according to the comparison result; an eleventh step of notifying the transmitting device of the information required to specify the dialogue device determined in the tenth step; and a twelfth step of carrying out speech recognition dialogue processing between the dialogue device determined in the tenth step and the transmitting device.
The speech recognition dialogue selection program recorded on the recording medium is preferably a program that performs data communication among a transmitting device, a plurality of dialogue devices, and a service holding device over a network and executes processing for sending the speech information data output from the transmitting device to a specified dialogue device. This program comprises: a first step of receiving a request, output from the transmitting device, for service content that includes speech recognition dialogue processing; a second step of requesting the capability data of the transmitting device from the transmitting device; a third step of sending the capability data of the transmitting device from the transmitting device; a fourth step of comparing the capability data of the transmitting device with the capability data of the plurality of dialogue devices and determining the specified dialogue device according to the comparison result; a fifth step of notifying the transmitting device of the information required to specify the dialogue device determined in the fourth step; a sixth step of carrying out speech recognition dialogue processing between the transmitting device and the dialogue device determined in the fourth step; a seventh step in which the dialogue device determined in the fourth step requests, from the service holding device, the service content requested by the transmitting device; an eighth step of sending the service content requested in the seventh step to the dialogue device determined in the fourth step; a ninth step in which the dialogue device determined in the fourth step reads the service content sent in the eighth step; and a tenth step of carrying out, according to the service content that was read, speech recognition dialogue processing between the transmitting device and the dialogue device determined in the fourth step.
In this case, the speech recognition dialogue selection program preferably further comprises: an eleventh step of sending, during the speech recognition dialogue processing between the transmitting device and the dialogue device, a request from the dialogue device to transfer the destination of the transmitting device to another dialogue device; a twelfth step of requesting the capability data of the transmitting device from the transmitting device; a thirteenth step of sending the capability data of the transmitting device from the transmitting device; a fourteenth step of comparing the capability data of the transmitting device with the capability data of the plurality of dialogue devices and determining a new dialogue device according to the comparison result; a fifteenth step of notifying the transmitting device of the information required to specify the dialogue device determined in the fourteenth step; and a sixteenth step of carrying out speech recognition dialogue processing between the dialogue device determined in the fourteenth step and the transmitting device. The speech information is preferably digitized voice data, compressed voice data, or feature vector data. In addition, the capability data of the transmitting device preferably includes data on CODEC capability, voice data format, recorded/synthesized speech input/output functions, and service content. The capability data of a dialogue device preferably includes data on CODEC capability, voice data format, recorded/synthesized speech output functions, service content, recognition capability, and operation information.
The speech recognition dialogue system according to the present invention is a system in which a client and a plurality of recognition dialogue servers are connected by a network. Even when a plurality of recognition dialogue servers exist, the optimum recognition dialogue server can be selected and determined from among them, and the speech recognition dialogue can be carried out at that optimum server.
One example of a method for determining the optimum recognition dialogue server is to compare the capability data of the client and of the recognition dialogue servers and, among those recognition dialogue servers whose input matches the output of the client 10, to select the recognition dialogue server 30 that has the highest capability and is currently operating.
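The determination method described above (keep the servers whose input matches the client's output, drop those not operating, take the most capable) can be illustrated with a short sketch. The field names and the single numeric `score` used to rank capability are assumptions for the example; the patent does not say how "highest capability" is quantified.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ServerInfo:
    """Hypothetical record kept by the selecting server for one recognition dialogue server."""
    name: str
    input_format: str   # e.g. "compressed_voice" or "feature_vector"
    operating: bool     # taken from the server's operation information
    score: float        # assumed scalar ranking of recognition capability


def choose_server(client_output_format: str,
                  servers: Sequence[ServerInfo]) -> Optional[ServerInfo]:
    """Return the operating server with the highest score whose input matches, or None."""
    candidates = [
        s for s in servers
        if s.operating and s.input_format == client_output_format
    ]
    # max() with default=None covers the case where no server qualifies.
    return max(candidates, key=lambda s: s.score, default=None)
```

Returning `None` when nothing matches leaves the fallback policy (reject the request, retry later, and so on) to the caller, since the text does not specify one.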
The data used to determine the capability of the client includes data on: CODEC capability (CODEC type, CODEC compression mode, etc.), voice data format (compressed voice data, feature vector, etc.), recorded speech input/output function, synthesized speech input/output function (without synthesis engine, with intermediate representation input engine, with character string input engine, etc.), service content, and so on. The data used to determine the capability of a recognition dialogue server includes data on: CODEC capability (CODEC type, CODEC decompression mode, etc.), recorded speech output function, synthesized speech output function (without synthesis engine, with intermediate representation output engine, with waveform output engine, etc.), service content, recognition engine capability (task-dedicated engine, instruction engine, command recognition engine, etc.), operation information, and so on. The CODEC type may be AMR-NB, AMR-WB, etc. An example of the intermediate representation for synthesized speech is the representation obtained after a character string has been converted into a phonetic symbol string. Service content includes services such as address recognition, name recognition, ringtone recognition, telephone number recognition, and credit card number recognition.
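As an illustration, the capability items listed above could be held in records like the following. The record layout and the `is_compatible` check are a sketch only; in particular, the pairing between client-side synthesis input types and server-side synthesis output types in `_SYNTH_PAIRS` is an assumption made for the example and is not stated in the patent text.

```python
from dataclasses import dataclass
from enum import Enum


class ClientSynthInput(Enum):
    NONE = "without synthesis engine"
    INTERMEDIATE = "intermediate representation input engine"
    CHARACTER_STRING = "character string input engine"


class ServerSynthOutput(Enum):
    NONE = "without synthesis engine"
    INTERMEDIATE = "intermediate representation output engine"
    WAVEFORM = "waveform output engine"


@dataclass(frozen=True)
class ClientCapability:
    codec_type: str                 # e.g. "AMR-NB" or "AMR-WB"
    synth_input: ClientSynthInput
    service_content: str            # e.g. "address recognition"


@dataclass(frozen=True)
class ServerCapability:
    codec_types: frozenset          # CODEC types the server can decompress
    synth_output: ServerSynthOutput
    service_contents: frozenset
    operating: bool                 # operation information


# Assumed pairing: a client with no synthesis engine needs ready-made waveforms,
# while a client that accepts character strings needs no server-side synthesis.
_SYNTH_PAIRS = {
    ClientSynthInput.NONE: ServerSynthOutput.WAVEFORM,
    ClientSynthInput.INTERMEDIATE: ServerSynthOutput.INTERMEDIATE,
    ClientSynthInput.CHARACTER_STRING: ServerSynthOutput.NONE,
}


def is_compatible(client: ClientCapability, server: ServerCapability) -> bool:
    """True if the server is operating and matches the client's CODEC, service and synthesis type."""
    return (
        server.operating
        and client.codec_type in server.codec_types
        and client.service_content in server.service_contents
        and _SYNTH_PAIRS[client.synth_input] is server.synth_output
    )
```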
The processing unit that determines the recognition dialogue server may be included in a web server, in the recognition dialogue selecting server, or in a recognition dialogue server; it may also be included in more than one of these.
According to the present invention, the speech recognition dialogue can be carried out using the optimum recognition dialogue server. In addition, because a recognition dialogue server itself can have the ability to determine a recognition dialogue server, the terminal can automatically access another suitable recognition dialogue server during a dialogue.
According to the present invention, service content can also be received from a server other than the recognition dialogue server (for example, a web server or a content provider's server), and the speech recognition dialogue can be carried out according to the received service content. The service content may take the form of, for example, a VoiceXML document or a service name.
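Since the service content may arrive either as a VoiceXML document or as a bare service name, a recognition dialogue server needs some way to tell the two apart. One possible approach, sketched below, is to try parsing the content as XML and check for a `vxml` root element (the root element used by VoiceXML documents). The function name and the returned labels are hypothetical.

```python
import xml.etree.ElementTree as ET


def classify_service_content(content: str) -> str:
    """Return "voicexml" for a VoiceXML document, otherwise "service_name"."""
    try:
        root = ET.fromstring(content)
    except ET.ParseError:
        # Not well-formed XML, so treat it as a plain service name.
        return "service_name"
    # ElementTree writes namespaced tags as "{namespace}tag"; keep only the local name.
    local_name = root.tag.rsplit("}", 1)[-1]
    return "voicexml" if local_name == "vxml" else "service_name"
```

For example, `classify_service_content("address recognition")` falls into the service-name branch because the string is not well-formed XML.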
Description of drawings
Fig. 1 is a diagram showing the structure of the speech recognition dialogue system according to the embodiment of the present invention.
Fig. 2 is a block diagram showing the structure of the client 10 according to the present invention.
Fig. 3 is a block diagram showing the structure of the recognition dialogue server 30 according to the embodiment of the present invention.
Fig. 4 is a block diagram showing the structure of the recognition dialogue selecting server 20 according to the present invention.
Fig. 5 is a flowchart showing the process by which the recognition dialogue selecting server 20 determines a recognition dialogue server in the speech recognition dialogue system according to the embodiment of the present invention.
Fig. 6 is a flowchart showing the speech recognition dialogue process in the speech recognition dialogue method according to the embodiment of the present invention.
Fig. 7 is a flowchart showing the process by which the recognition dialogue selecting server 20 determines a new recognition dialogue server 80 while the recognition dialogue server 30 is carrying out recognition dialogue processing, in the speech recognition dialogue system according to the embodiment of the present invention.
Fig. 8 is a block diagram showing the recognition dialogue performance server 40 according to the embodiment of the present invention.
Fig. 9 is a flowchart showing the process by which the recognition dialogue performance server 40 determines a new recognition dialogue server 80 during recognition dialogue processing, in the speech recognition dialogue method according to the embodiment of the present invention.
Fig. 10 is a diagram showing the recognition dialogue server C 50 according to the embodiment of the present invention, obtained by adding a speech recognition dialogue start unit and a service content reading unit to the device shown in Fig. 4.
Fig. 11 is a flowchart showing the process by which the recognition dialogue server C 50 reads service content from the service content holding server 60, in the speech recognition dialogue method according to the embodiment of the present invention.
Fig. 12 is a schematic diagram showing a server computer 901 and a recording medium 902 on which a program is recorded, with which the speech recognition dialogue method according to the embodiment of the present invention is executed as a program.
Embodiment
Embodiments of the present invention are explained in detail below with reference to the accompanying drawings.
The present invention is a speech recognition conversational system that uses a network to provide a speech recognition dialogue service. When a plurality of identification dialog servers exist, the system has a function of selecting and determining the best identification dialog server.
Fig. 1 is a diagram showing the configuration of the speech recognition conversational system according to the embodiment of the invention. The client computer 10 is connected through the network 1 to the identification dialogue selection server 20, the identification dialog server 30, the identification dialogue performance server 40, the identification dialog server C 50, the new identification dialog server 80, and the service content reservation server 60. Here, the client computer 10 serves both as the dispensing device that sends speech information and as the request unit that requests service content.
The network 1 can be the Internet (wired or wireless) or an intranet.
Fig. 2 is a block diagram of the client computer 10 of the present invention. The client computer 10 can be a portable terminal, a PDA, an in-vehicle terminal, a personal computer, or a home terminal. The client computer 10 consists of a controller 120 for controlling the client computer 10, a terminal information storage 140 for holding the performance of the client computer 10, and a data communication unit 130 for communicating over the network 1.
The data used to judge the performance of the client computer 10 are: CODEC performance (CODEC type, CODEC compression mode, etc.), voice data format (compressed voice data, feature vectors, etc.), recorded speech input/output function, synthetic speech input/output function (no synthesis engine, an engine that takes an intermediate representation as input, an engine that takes a character string as input, etc.), and service content.
Note that an Internet browser can be provided to the client computer 10 as a user interface. The service content data comprise service data such as address recognition, name recognition, recognition of incoming-call tone titles, telephone number recognition, and credit card number recognition.
Fig. 3 is a block diagram of the identification dialog server 30 according to the embodiment of the invention. The identification dialog server 30 consists of a controller 320 for controlling the identification dialog server 30, a speech recognition dialogue execution unit 330 for carrying out speech recognition and dialogue, and a data communication unit 310 for communicating over the network 1.
Fig. 4 is a block diagram of the identification dialogue selection server 20 according to the present invention. The identification dialogue selection server 20 consists of a data communication unit 210 for communicating over the network 1, an identification dialog server determining unit 220 for selecting and determining the best identification dialog server when a plurality of identification dialog servers exist, and an identification dialog server information storage 230 for storing the performance information of the identification dialog servers to be selected and determined. Here, the identification dialogue selection server 20 includes the selecting arrangement that selects and specifies one Interface from among a plurality of dialog servers according to the performance of the client computer 10, which serves as the dispensing device and request unit, and the performance of the identification dialog servers, which serve as Interfaces.
The data used to judge the performance of an identification dialog server are: CODEC performance (CODEC type, CODEC expansion mode, etc.), voice data format (compressed voice data, feature vectors, etc.), recorded speech output function, synthetic speech output function (no synthesis engine, an engine that takes an intermediate representation as input, a waveform output engine, etc.), service content, recognition engine performance (task-dedicated engine, instruction engine, command recognition engine, etc.), and operation information.
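As an illustration only, the performance data listed above for the client computer 10 and for an identification dialog server could be held in simple records such as the following. This is a minimal sketch: the class names, field names and string values are hypothetical and do not appear in the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientPerformance:
    """Performance data of the client (all field names are hypothetical)."""
    codec_types: frozenset          # placeholder CODEC names, e.g. {"codec-a"}
    voice_data_formats: frozenset   # e.g. {"compressed", "feature_vector"}
    recorded_speech_io: bool        # recorded speech input/output function
    synthetic_speech_io: str        # "none", "intermediate" or "string"
    service_contents: frozenset     # e.g. {"address recognition"}


@dataclass(frozen=True)
class ServerPerformance:
    """Performance data of one identification dialog server (hypothetical)."""
    codec_types: frozenset
    voice_data_formats: frozenset
    recorded_speech_output: bool
    synthetic_speech_output: str    # "none", "intermediate" or "waveform"
    service_contents: frozenset
    recognition_engine: str         # e.g. "task-dedicated", "command"
    operating: bool                 # operation information: server is running


def shared_voice_formats(client: ClientPerformance,
                         server: ServerPerformance) -> frozenset:
    """Return the voice data formats that both sides can handle."""
    return client.voice_data_formats & server.voice_data_formats
```

Keeping both records immutable makes it safe to store them in the information storage and compare them repeatedly during selection.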
The new identification dialog server 80 is identical to any one of the identification dialog server 30, the identification dialogue performance server 40, and the identification dialog server C 50.
Next, the operation of the speech recognition conversational system according to the embodiment of the invention is described.
First, the case is described in which the identification dialogue selection server 20 carries out the process of determining the identification dialog server 30 that is to carry out speech recognition and dialogue, and the speech recognition dialog process is then carried out on the determined identification dialog server 30. Fig. 5 is a flowchart showing the process by which the identification dialogue selection server 20 determines the identification dialog server 30 in the speech recognition conversational system according to the embodiment of the invention.
First, the client computer 10 requests from the identification dialogue selection server 20 a service that includes the speech recognition dialog process (step 501). More specifically, using a command such as an HTTP command, the data communication unit 130 of the client computer 10 sends to the identification dialogue selection server 20 the URL of the CGI program that executes the service, together with the arguments required for processing.
Next, on receiving the service request from the client computer 10, the identification dialogue selection server 20 requests the performance information of the client computer 10 (step 502).
Next, on receiving the request for performance information from the identification dialogue selection server 20, the client computer 10, under the control of the controller 120, sends the performance information of the client computer 10 stored in the terminal information storage 140 from the data communication unit 130 to the identification dialogue selection server 20 (step 503). The performance of the client computer 10 comprises CODEC performance (CODEC type, CODEC compression mode, etc.), voice data format (compressed voice data, feature vectors, etc.), recorded speech input/output function, synthetic speech input/output function (no synthesis engine, an engine that takes an intermediate representation as input, an engine that takes a character string as input, etc.), service content, and so on.
The identification dialogue selection server 20 receives the performance information sent from the client computer 10 and reads the performance information of the plurality of identification dialog servers stored in the identification dialog server information storage 230. Then, in the identification dialog server determining unit 220, it compares the performance information of the client computer 10 with the performance information of the plurality of identification dialog servers (step 504) and, additionally taking into account the information on the service content requested by the client computer 10, determines the best identification dialog server (step 505).
The performance of an identification dialog server comprises CODEC performance (CODEC type, CODEC expansion mode, etc.), voice data format (compressed voice data, feature vectors, etc.), recorded speech output function, synthetic speech output function (no synthesis engine, an engine that outputs an intermediate representation, a waveform output engine, etc.), service content, recognition engine performance (task-dedicated engine, instruction engine, command recognition engine, etc.), operation information, and so on.
One example of a method for determining the best identification dialog server 30 is to compare the performance of the client computer 10 with the performance of the identification dialog servers and, from among those identification dialog servers whose input/output is consistent with that of the client computer 10, to select the one that offers the highest performance and is in operation. Another example applies when a dedicated identification dialog server 30 exists for each service content, for example an address task server, a name task server, a telephone number task server, and a card ID task server. In that case, the identification dialog server that can carry out the service content requested by the client computer 10 is selected.
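The two determination methods described above can be combined in a short sketch. This is only an illustration under stated assumptions: the embodiment does not define how "highest performance" is measured, so the scalar `score` field, the record layouts and the function name `select_server` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Client:
    voice_formats: frozenset    # voice data formats the client can send
    result_formats: frozenset   # result formats the client can present


@dataclass(frozen=True)
class Server:
    address: str
    voice_formats: frozenset    # voice data formats the server accepts
    result_formats: frozenset   # result formats the server can return
    services: frozenset         # service contents the server can carry out
    score: float                # hypothetical stand-in for "performance"
    operating: bool             # operation information


def select_server(client: Client, servers: Sequence[Server],
                  service: str) -> Optional[Server]:
    """Pick the operating, compatible server with the highest score."""
    candidates = [
        s for s in servers
        if s.operating
        and service in s.services                     # requested service content
        and client.voice_formats & s.voice_formats    # input formats agree
        and client.result_formats & s.result_formats  # output formats agree
    ]
    # max() with default=None returns None when no server qualifies.
    return max(candidates, key=lambda s: s.score, default=None)
```

Returning `None` when no candidate qualifies leaves the caller free to decide how to report the failure to the client.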
Next, the identification dialogue selection server 20 notifies the client computer 10 of the information on the identification dialog server determined in the identification dialog server determining unit 220 (step 506). One example of a notification method is to embed, in an HTML screen or the like, the address of the identification dialog server 30 or the address of the executive program that carries out the identification dialogue on the identification dialog server 30.
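As an illustration of notification through an HTML screen, the following sketch embeds the address of the executive program in a form. The page layout, the function name and the example address are assumptions made for this sketch; the embodiment only states that the address is embedded in an HTML screen or the like.

```python
from html import escape


def notification_screen(program_url: str) -> str:
    """Build an HTML screen that carries the determined server's address."""
    # escape(..., quote=True) keeps the URL safe inside the attribute value.
    action = escape(program_url, quote=True)
    return (
        "<html><body>\n"
        f'<form method="post" action="{action}">\n'
        '<input type="submit" value="Start speech recognition dialogue">\n'
        "</form>\n"
        "</body></html>"
    )
```

For example, `notification_screen("http://dialog.example/cgi-bin/dialog")` yields a page whose form posts to that hypothetical address.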
Next, the client computer 10 receives the information on the identification dialog server 30 from the identification dialogue selection server 20 and requests the notified identification dialog server 30 to start the speech recognition dialogue (step 507). One example of a method for requesting the start of the speech recognition dialogue is to send, with an HTTP POST command, the URL of the executive program that carries out the identification dialogue together with the arguments required for the speech recognition dialogue. Examples of arguments include a document describing the service content (VoiceXML, etc.), a service name, and a command for carrying out the speech recognition dialogue.
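The start request of step 507 might be assembled as follows. This is a sketch under assumptions: the argument names `service` and `command`, the example URL, and the form encoding are hypothetical, since the embodiment specifies only that the URL and the required arguments are sent with an HTTP POST command. The request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_start_request(program_url: str, service_name: str,
                        command: str = "start") -> Request:
    """Build (without sending) a POST request that starts the dialogue."""
    # Form-encode the hypothetical arguments for the executive program.
    body = urlencode({"service": service_name, "command": command})
    return Request(
        program_url,
        data=body.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would actually send it; a termination request could be built the same way with a different command value.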
Next, on receiving the request to start the speech recognition dialogue from the client computer 10, the identification dialog server 30 carries out the speech recognition dialogue (step 508). In Fig. 5, the dotted line connecting step 508 and step 509 indicates that data are exchanged several times between the terminal and the identification dialog server. The speech recognition dialog process is described in detail later with reference to Fig. 6.
When the speech recognition dialogue is to be terminated, the client computer 10 requests termination of the identification dialogue (step 509). Examples of a method for requesting termination include sending, with an HTTP POST command, the address of an executive program that terminates the identification dialogue, and sending, with an HTTP POST command, the address of the executive program that carries out the identification dialogue together with a command for terminating it. The identification dialog server receives the termination request from the client computer 10 and terminates the identification dialogue (step 510).
Next, the speech recognition dialog process is described. Fig. 6 is a flowchart showing the speech recognition dialog process in the speech recognition dialogue method according to the embodiment of the invention.
First, speech input to the data input unit 110 of the client computer 10 is passed to the controller 120, which then processes the data. Examples of data processing include digitization, speech detection, and speech analysis.
Next, the processed voice data are sent from the data communication unit 130 to the identification dialog server (step 601). Examples of voice data include digitized voice data, compressed voice data, and feature vectors.
In the identification dialog server 30, the data communication unit 310 receives the voice data sent continuously from the client computer 10 (step 602). The controller 320 then determines that the received data are voice data and passes them to the speech recognition dialogue execution unit 330. The speech recognition dialogue execution unit 330, which has the recognition engine, recognition dictionary, synthesis engine, and synthesis dictionary required for the speech recognition dialogue, continuously carries out the speech recognition dialog process (step 603).
The processing carried out in the speech recognition dialogue varies with the type of voice data sent by the client computer 10. For example, if the voice data sent are compressed voice data, the compressed data are expanded, and speech analysis and recognition are then carried out. If feature vectors are sent, only recognition is carried out. After recognition is complete, the recognition result is sent to the client computer 10 (step 604). The recognition result can take the form of text, synthesized or recorded speech corresponding to the text, a URL screen reflecting the recognized content, and so on. The client computer 10 processes the recognition result received from the identification dialog server 30 according to its form (step 605). For example, it outputs speech when the result is synthesized or recorded speech, and displays the screen when the result is a URL screen.
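The dependence of processing on data type described above can be sketched as two small dispatch functions, one for the server side and one for the client side. The string labels and function names are hypothetical. The embodiment does not state how the server handles digitized voice data or how the client presents a text result, so the first is left unsupported here and the second is marked as an assumption.

```python
def processing_stages(voice_data_type: str) -> list:
    """Server side: stages to run for the voice data type received."""
    if voice_data_type == "compressed":
        # Compressed data are expanded, then analyzed, then recognized.
        return ["expand", "analyze", "recognize"]
    if voice_data_type == "feature_vector":
        # Feature vectors are already analyzed, so only recognition runs.
        return ["recognize"]
    raise ValueError(f"unsupported voice data type: {voice_data_type!r}")


def handle_result(result_format: str) -> str:
    """Client side: action to take for the recognition result format."""
    if result_format in ("synthetic_speech", "recorded_speech"):
        return "output_speech"
    if result_format == "url_screen":
        return "display_screen"
    if result_format == "text":
        return "display_text"  # assumption: not specified in the embodiment
    raise ValueError(f"unsupported result format: {result_format!r}")
```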
In this way, steps 601 to 605 are repeated several times to carry out the voice dialogue.
Second, the case is described in which, in the speech recognition conversational system according to the embodiment of the invention, the identification dialog server 30 carrying out the speech recognition dialog process is replaced by another, new identification dialog server 80.
Fig. 7 is a flowchart showing the process by which, in the speech recognition conversational system according to the embodiment of the invention, the identification dialogue selection server 20 determines the new identification dialog server 80 while the identification dialog server 30 is carrying out the identification dialog process.
In Fig. 7, when processing needs to be carried out on the new identification dialog server 80 after data have been exchanged several times between the client computer 10 and the identification dialog server 30, the identification dialog server 30 requests the identification dialogue selection server 20 to transfer processing to the new identification dialog server 80 (step 703). In Fig. 7, the dotted line connecting step 702 and step 703 indicates that data are exchanged several times between the terminal and the identification dialog server.
A server transfer request can arise when the service content changes during a session, when the service content and the server performance are inconsistent, or when the identification dialog server fails.
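The three conditions that can trigger a transfer request can be written as a simple check. This is an illustrative sketch only: the function names and the boolean inputs are hypothetical, and how each condition is detected is not specified in the embodiment.

```python
def transfer_reasons(service_changed: bool,
                     service_matches_server: bool,
                     server_failed: bool) -> list:
    """Return every condition that currently calls for a server transfer."""
    reasons = []
    if service_changed:
        reasons.append("service content changed during the session")
    if not service_matches_server:
        reasons.append("service content inconsistent with server performance")
    if server_failed:
        reasons.append("identification dialog server failure")
    return reasons


def needs_transfer(service_changed: bool,
                   service_matches_server: bool,
                   server_failed: bool) -> bool:
    """A transfer is requested when at least one condition holds."""
    return bool(transfer_reasons(service_changed,
                                 service_matches_server,
                                 server_failed))
```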
Next, the identification dialogue selection server 20 requests from the client computer 10 the performance information of the client computer 10 (step 704).
On receiving the request for performance information from the identification dialogue selection server 20, the client computer 10, under the control of the controller 120, sends the performance information of the client computer 10 stored in the terminal information storage 140 from the data communication unit 130 to the identification dialogue selection server 20 (step 705).
The identification dialogue selection server 20 receives the performance information sent from the client computer 10, reads the performance information of the plurality of identification dialog servers stored in the identification dialog server information storage 230, and compares, in the identification dialog server determining unit 220, the performance information of the client computer 10 with the performance information of the plurality of identification dialog servers (step 706). Additionally taking into account the information on the service content that caused the identification dialog server's transfer request, it determines the best identification dialog server (step 707). The performance information of the client computer 10, the performance information of the identification dialog servers, and the method of determining the identification dialog server are the same as described above.
Next, the identification dialogue selection server 20 notifies the client computer 10 of the information on the new identification dialog server 80 determined in the identification dialog server determining unit 220 (step 708). One example of a notification method is to embed, in an HTML screen, the address of the new identification dialog server 80 and the address of the executive program that carries out the identification dialogue on the new identification dialog server 80.
Next, the client computer 10 receives the address information of the new identification dialog server 80 and requests the notified new identification dialog server 80 to start the speech recognition dialogue (step 709). One example of a method for requesting the start of the speech recognition dialogue is to send, with an HTTP POST command, the URL of the executive program that carries out the identification dialogue together with the parameters required for the speech recognition dialogue.
Third, in the speech recognition conversational system according to the embodiment of the invention, the identification dialogue selection server 20 and the identification dialog server 30 described above can be placed on the same server, forming the identification dialogue performance server 40, which can both carry out the speech recognition dialogue and select a suitable speech recognition dialog server.
Fig. 8 is a block diagram of the identification dialogue performance server 40 according to the embodiment of the invention.
As shown in Fig. 8, the identification dialogue performance server 40 is formed by adding an identification dialog server determining unit 440 and an identification dialog server information storage 450 to the identification dialog server 30 shown in Fig. 3. The other components, namely the data communication unit 410, the controller 420, and the speech recognition dialogue execution unit 430, are identical to the corresponding components in Fig. 3.
When a plurality of identification dialog servers exist, the identification dialog server determining unit 440 selects and determines the best identification dialog server. The identification dialog server information storage 450 stores the performance information of the identification dialog servers to be selected and determined. Examples of the performance of an identification dialog server are the same as in the first case and comprise CODEC performance (CODEC type, CODEC compression mode, etc.), voice data format (compressed voice data, feature vectors, etc.), recorded speech output function, synthetic speech output function (no synthesis engine, an engine that outputs an intermediate representation, a waveform output engine, etc.), service content, recognition engine performance (task-dedicated engine, instruction engine, command recognition engine, etc.), operation information, and so on.
In this case, the identification dialogue performance server 40 itself carries out the process shown in Fig. 5.
Next, the case is described in which the identification dialogue performance server 40 carrying out the speech recognition dialog process is replaced by another, new identification dialog server 80 that carries out the speech recognition dialog process.
Fig. 9 is a flowchart showing the process by which, in the speech recognition dialogue method according to the embodiment of the invention, the identification dialogue performance server 40 determines the new identification dialog server 80 during the identification dialog process.
Referring to Fig. 9, when processing needs to be carried out on the new identification dialog server 80 after data have been exchanged several times between the terminal and the identification dialog server, the identification dialogue performance server 40 requests from the client computer 10 the performance information of the client computer 10 (step 903). In Fig. 9, the dotted line connecting step 902 and step 903 indicates that data are exchanged several times between the terminal and the identification dialog server.
The performance information of the client computer 10 may be requested when the service content changes during a session, when the service content and the server performance are inconsistent, when the identification dialog server fails, and in similar situations.
Next, on receiving the request for performance information from the identification dialogue performance server 40, the client computer 10, under the control of the controller 120, sends the performance information of the client computer 10 stored in the terminal information storage 140 from the data communication unit 130 to the identification dialogue performance server 40 (step 904).
The identification dialogue performance server 40 receives the performance information sent by the client computer 10, reads the performance information of the plurality of identification dialog servers stored in the identification dialog server information storage 450, and compares, in the identification dialog server determining unit 440, the performance information of the client computer 10 with the performance information of the plurality of identification dialog servers (step 905). Additionally taking into account the information on the service content requested by the client computer 10, it determines the best identification dialog server (step 906). The performance information of the client computer 10, the performance information of the identification dialog servers, and the method of determining the identification dialog server are the same as described above.
Next, the identification dialogue performance server 40 notifies the client computer 10 of the information on the new identification dialog server 80 determined in the identification dialog server determining unit 440 (step 907). One example of a notification method is to embed, in an HTML screen, the address of the new identification dialog server 80 or the address of the executive program that carries out the identification dialogue on the new identification dialog server 80.
Next, the client computer 10 receives the address information of the new identification dialog server 80 and requests the notified new identification dialog server 80 to start the speech recognition dialogue (step 908). One example of a method for requesting the start of the speech recognition dialogue is to send, with an HTTP POST command, the URL of the executive program that carries out the identification dialogue together with the parameters required for the speech recognition dialogue.
Fourth, the case is described in which, in the speech recognition conversational system according to the embodiment of the invention, the identification dialog server C 50 reads in service content from the service content reservation server 60, such as that of a content provider. In this case, the service content reservation server 60 can be placed in the identification dialogue selection server 20, forming a web server that uses the web to provide a service interface to the user. Also in this case, a web browser can be provided to the client computer 10 as an interface for selecting or entering service content.
Fig. 10 is a schematic diagram of the identification dialog server C (identification dialog server device) 50 according to the embodiment of the invention. The identification dialog server device 50 shown in Fig. 10 is configured by adding a speech recognition dialogue start unit 530 and a service content reading unit 540 to the identification dialogue performance server 40 shown in Fig. 8. The other components, such as the data communication unit 510, the controller 520, the speech recognition dialogue execution unit 550, the identification dialog server determining unit 560, and the identification dialog server information storage 570, are identical to the corresponding components in Fig. 8.
The speech recognition dialogue start unit 530 starts the speech recognition dialog process according to the service information sent by the client computer 10, and requests the service content from the server that holds the service content. The service content includes address recognition, name recognition, recognition of incoming-call tone titles, telephone number recognition, and credit card number recognition.
The service content reading unit 540 reads in service content from the service content reservation server 60. The speech recognition dialogue execution unit 550, the controller 520, and the data communication unit 510 are identical to the speech recognition dialogue execution unit 430, the controller 420, and the data communication unit 410, respectively. The identification dialog server information storage 570 and the identification dialog server determining unit 560 are optional. When they are not provided, the identification dialog server is determined by the identification dialogue selection server 20. When they are provided, they are identical to the identification dialog server information storage 450 and the identification dialog server determining unit 440, respectively.
Fig. 11 is a flowchart showing the process by which, in the speech recognition dialogue method according to the embodiment of the invention, the identification dialog server C 50 reads in service content from the service content reservation server 60.
The processing of steps 1101 to 1105 in Fig. 11 is identical to the processing of steps 501 to 506 described above.
Next, according to the information on the identification dialog server C 50 notified by the identification dialogue selection server 20, the client computer 10 requests the identification dialog server C 50 to start the speech recognition dialogue (step 1106). Service information is sent with the request.
One example of a method for requesting the start of the speech recognition dialogue is to send, with an HTTP POST command, the URL of the executive program that carries out the identification dialogue together with the service content information. The service content information includes a document describing the service content (VoiceXML, etc.) and a service name.
Next, the identification dialog server C 50 receives the request from the client computer 10 at the data communication unit 510, starts the speech recognition dialog process in the speech recognition dialogue start unit 530, and requests the service content from the service content reservation server 60 according to the service information sent by the client computer 10 (step 1107).
One example of a method for requesting the service content applies when the service content information sent from the client computer 10 is an address: that address is accessed. As another example, when the service content information sent from the client computer 10 is a service name, the address corresponding to the service name is obtained and that address is accessed.
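The two ways of locating the service content described above can be sketched as a single lookup. This is an illustration under assumptions: the directory mapping service names to addresses, the example addresses, and the rule that anything with an http or https scheme counts as an address are all hypothetical choices made for this sketch.

```python
from urllib.parse import urlparse


def resolve_service_address(service_info: str, directory: dict) -> str:
    """Return the address at which the service content can be fetched."""
    parsed = urlparse(service_info)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        # The client sent an address: access it directly.
        return service_info
    # Otherwise treat the information as a service name and look it up.
    try:
        return directory[service_info]
    except KeyError:
        raise LookupError(f"unknown service name: {service_info!r}") from None
```

A caller would then fetch the service content, for example a VoiceXML document, from the returned address.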
Next, the service content reservation server 60 receives the request from the identification dialog server C 50 and sends the service content (step 1108). The identification dialog server C 50 receives the service content at the data communication unit 510, reads it in at the service content reading unit 540 (step 1109), and then starts the speech recognition dialog process (step 1110).
The processes of steps 1110 to 1112 in Fig. 11 are identical to the processes of steps 507 to 510. The dotted line connecting step 1110 and step 1111 indicates that data are exchanged several times between the terminal and the identification dialog server.
In the system described above, an example has been given in which both the identification dialogue selection server 20 and the identification dialog server C 50 are connected to the network. However, a configuration in which only one of them is connected to the network is also acceptable.
Each of the steps described above can be realized by a program running on the server computer 901. Fig. 12 is a schematic diagram showing the server computer 901, on which a program carrying out the speech recognition dialogue method according to the embodiment of the invention runs, and the recording medium 902, on which the program is recorded.
Industrial applicability
According to the present invention as described above, even when a plurality of identification dialog servers exist, the best identification dialog server can be selected and determined from among them to carry out the speech recognition dialogue.
In addition, even when, for any of various reasons, processing needs to be carried out on a new identification dialog server during a session, the client computer can automatically access another suitable identification dialog server, so that the identification dialog process can continue.
Claims (32)
1. A speech recognition Interface, comprising:
a plurality of Interfaces for carrying out a speech recognition dialogue;
a dispensing device for sending speech information to the Interfaces;
a network connecting the dispensing device and the Interfaces; and
a selecting arrangement that selects one Interface from among the plurality of Interfaces according to the performance of the dispensing device and the performance of the plurality of Interfaces, and that has a function of sending, to the dispensing device, information specifying the selected Interface, so that the information required to carry out the speech recognition dialogue is exchanged between the selected Interface and the dispensing device.
2. The speech recognition Interface according to claim 1, further comprising a request unit for requesting a service from the Interfaces, wherein the network connects the dispensing device, the request unit, and the Interfaces.
3. The speech recognition Interface according to claim 1 or 2, further comprising a service retaining device for holding the service content requested from the Interfaces, wherein the network connects the service retaining device, the dispensing device, and the Interfaces.
4. The speech recognition Interface according to claim 2, wherein the selecting arrangement has a function of sending, to the dispensing device, information specifying the selected Interface, so that service content and speech information are exchanged among the selected Interface, the request unit, and the dispensing device.
5. The speech recognition Interface according to claim 1, wherein the selecting arrangement has a function of changing the selected Interface to another Interface.
6. The speech recognition Interface according to claim 4, wherein the selecting arrangement has a function of changing the selected Interface to another Interface.
7. The speech recognition Interface according to any one of claims 1, 2 and 3, wherein the selecting arrangement has a function of comparing the performance of the dispensing device with the performance of the plurality of Interfaces, and a function of determining, according to the comparison result, the Interface having the desired performance, the desired performance meaning that the input format of the speech information input to the Interface is consistent with the output format of the speech information output to the dispensing device.
8. The speech recognition Interface according to claim 5, wherein the selecting arrangement has a function of comparing the performance of the dispensing device with the performance of the plurality of Interfaces, and a function of determining, according to the comparison result, the Interface having the desired performance, the desired performance meaning that the input format of the speech information input to the Interface is consistent with the output format of the speech information output to the dispensing device.
9. The speech recognition Interface according to claim 6, wherein the selecting arrangement has a function of comparing the performance of the dispensing device with the performance of the plurality of Interfaces, and a function of determining, according to the comparison result, the Interface having the desired performance, the desired performance meaning that the input format of the speech information input to the Interface is consistent with the output format of the speech information output to the dispensing device.
10. The speech recognition Interface according to claim 2 or 4, wherein the selecting arrangement has a function of comparing the service and performance of the dispensing device with the performance of the plurality of Interfaces, and a function of determining, according to the comparison result, the Interface having the desired performance, the desired performance meaning that the input format of the speech information input to the Interface is consistent with the output format of the speech information output to the dispensing device.
11. speech recognition Interface according to claim 5, wherein selecting arrangement has the function that the performance with the service of dispensing device and performance and a plurality of Interfaces compares, and have the function of determining to have the Interface of desired properties according to comparative result, described desired properties is meant that the input format of the speech information that is input to Interface is consistent with the output format of the speech information that outputs to dispensing device.
12. speech recognition Interface according to claim 6, wherein selecting arrangement has the function that the performance with the service of dispensing device and performance and a plurality of Interfaces compares, and have the function of determining to have the Interface of desired properties according to comparative result, described desired properties is meant that the input format of the speech information that is input to Interface is consistent with the output format of the speech information that outputs to dispensing device.
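The selection rule recited in claims 7 to 12 can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `InterfaceInfo`, `select_interface`, and `CANDIDATES`, as well as the format labels, are hypothetical. Returning the first match is also an assumption, since the claims only require that the Interface's input format be consistent with the dispensing device's output format.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class InterfaceInfo:
    """Hypothetical record of the speech formats one Interface accepts."""
    name: str
    input_formats: FrozenSet[str]


def select_interface(device_output_format: str,
                     interfaces: Iterable[InterfaceInfo]) -> Optional[InterfaceInfo]:
    """Return the first Interface whose input formats include the device's
    output format, or None when no Interface is compatible."""
    for candidate in interfaces:
        if device_output_format in candidate.input_formats:
            return candidate
    return None


# Illustrative data only; the format labels are assumptions.
CANDIDATES = [
    InterfaceInfo("interface_a", frozenset({"compressed"})),
    InterfaceInfo("interface_b", frozenset({"digitized", "feature_vector"})),
]
```

A real selecting arrangement would presumably weigh further performance data (codec, service content, load) before choosing; the sketch shows only the format-consistency test.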
13. The speech recognition Interface according to claim 1, wherein the speech information output from the dispensing device is digitized voice data, compressed speech data, or characteristic vector data.
14. The speech recognition Interface according to claim 1, wherein the data used to determine the performance of the dispensing device comprises data on codec performance, voice data format, and recorded/synthesized speech input/output functions.
15. The speech recognition Interface according to claim 1, wherein the data used to determine the performance of the Interface comprises data on codec performance, voice data format, recorded/synthesized speech output functions, service content, recognition performance, and operation information.
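As a rough illustration of the data enumerated in claims 13 to 15, the records below group the listed items into Python dataclasses. The class names, field names, and field types are all assumptions made for this sketch; the claims name the categories of data but do not prescribe any representation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class SpeechInfoKind(Enum):
    """The three kinds of speech information named in claim 13."""
    DIGITIZED = "digitized voice data"
    COMPRESSED = "compressed speech data"
    FEATURE_VECTOR = "characteristic vector data"


@dataclass(frozen=True)
class DevicePerformance:
    """Data describing a dispensing device (categories from claim 14)."""
    codecs: Tuple[str, ...]
    voice_data_format: SpeechInfoKind
    speech_io_functions: Tuple[str, ...]  # recorded/synthesized speech I/O


@dataclass(frozen=True)
class InterfacePerformance:
    """Data describing an Interface (categories from claim 15)."""
    codecs: Tuple[str, ...]
    voice_data_formats: Tuple[SpeechInfoKind, ...]
    speech_output_functions: Tuple[str, ...]
    service_content: Tuple[str, ...]
    recognition_performance: str
    operation_info: str


def shared_codecs(device: DevicePerformance,
                  iface: InterfacePerformance) -> Tuple[str, ...]:
    """Codecs supported on both sides, kept in the device's order."""
    return tuple(c for c in device.codecs if c in iface.codecs)
```

Keeping the records immutable (`frozen=True`) is a design choice of the sketch: performance data is compared, not modified, during selection.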
16. A speech recognition Interface, comprising:
A plurality of speech recognition dialog servers for carrying out a speech recognition dialogue;
A client computer for sending speech information and the service content requested of a speech recognition dialog server;
A speech recognition dialogue selection server for selecting one Interface from among a plurality of Interfaces; and
A network connecting the client computer, the speech recognition dialog servers, and the speech recognition dialogue selection server; wherein
The client computer comprises: a data input unit for inputting the speech information and the data of the service content; a terminal information memory for storing the performance data of the client computer; a data communication unit for communicating over the network with the speech recognition dialog servers and the speech recognition dialogue selection server, and for sending the speech information to the selected speech recognition dialog server; and a controller for controlling the operation of the client computer,
The speech recognition dialogue selection server comprises: a data communication unit for communicating over the network with the client computer and the speech recognition dialog servers; a recognition dialog server information memory for storing the performance of each speech recognition dialog server; and a recognition dialog server determining unit for reading the performance data of the client computer stored in the terminal information memory, comparing this performance data with the performance data of the speech recognition dialog servers stored in the recognition dialog server information memory, determining at least one speech recognition dialog server among the plurality of speech recognition dialog servers, and then sending to the client computer the information required to specify the determined speech recognition dialog server,
The speech recognition dialog server comprises: a speech recognition dialogue execution unit for carrying out the speech recognition dialogue according to the speech information input from the client computer; a data communication unit for communicating over the network with the client computer and the speech recognition dialogue selection server; and a controller for controlling the operation of the speech recognition dialog server.
17. The speech recognition Interface according to claim 16, further comprising: a service content retaining server connected to the network and retaining the service content requested by the client computer; and a reading unit arranged in the speech recognition dialog server for reading in the service content retained in the service content retaining server.
18. The speech recognition Interface according to claim 16 or 17, further comprising a process transfer device arranged in the speech recognition dialog server for outputting, to the speech recognition dialogue selection server, a request to transfer the speech recognition dialog process to another speech recognition dialog server.
19. The speech recognition Interface according to claim 16, wherein the speech information output from the client computer is digitized voice data, compressed speech data, or characteristic vector data.
20. The speech recognition Interface according to claim 16, wherein the performance data of the client computer comprises data on codec performance, voice data format, and recorded/synthesized speech input/output functions.
21. The speech recognition Interface according to claim 16, wherein the performance data of the speech recognition dialog server comprises data on codec performance, voice data format, recorded/synthesized speech output functions, service content, recognition capability, and operation information.
22. A speech recognition dialogue selection method for carrying out data communication between a dispensing device and a plurality of Interfaces over a network, and for carrying out a process of sending the speech information data output from the dispensing device to a specified Interface, comprising:
A first step of receiving the speech information data from the dispensing device;
A second step of requesting the performance data of the dispensing device from the dispensing device;
A third step of sending the performance data of the dispensing device from the dispensing device;
A fourth step of comparing the performance data of the dispensing device with the performance data of the plurality of Interfaces, and determining the specified Interface according to the comparison result;
A fifth step of notifying the dispensing device of the information used to specify the determined Interface; and
A sixth step of carrying out the speech recognition dialog process between the dispensing device and the determined Interface.
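The six steps of claim 22 can be traced with a small in-process Python sketch. Everything named here (`DispensingDevice`, `run_selection`, the `speech_format` key, the trace messages) is hypothetical, and the network exchange is replaced by direct method calls. What happens when no Interface is compatible is not stated in the claim, so returning `None` is an assumption.

```python
from typing import Dict, List, Optional, Set


class DispensingDevice:
    """Hypothetical stand-in for the dispensing device."""

    def __init__(self, speech_format: str) -> None:
        self.speech_format = speech_format
        self.assigned_interface: Optional[str] = None

    def report_performance(self) -> Dict[str, str]:
        # Third step: the device sends its own performance data.
        return {"speech_format": self.speech_format}

    def notify(self, interface_name: str) -> None:
        # Fifth step: the device learns which Interface was determined.
        self.assigned_interface = interface_name


def run_selection(device: DispensingDevice,
                  speech_data: bytes,
                  interface_formats: Dict[str, Set[str]],
                  trace: List[str]) -> Optional[str]:
    """Walk through the six steps; return the chosen Interface name or None."""
    trace.append("1: received %d bytes of speech information" % len(speech_data))
    trace.append("2: requested performance data")
    performance = device.report_performance()
    trace.append("3: performance data received")
    chosen = next((name for name, formats in interface_formats.items()
                   if performance["speech_format"] in formats), None)
    trace.append("4: compared performance data")
    if chosen is None:
        return None  # assumption: stop when no Interface is compatible
    device.notify(chosen)
    trace.append("5: notified device of %s" % chosen)
    trace.append("6: dialog runs between device and %s" % chosen)
    return chosen
```

The `trace` list exists only to make the step order visible; an actual implementation would perform network requests at steps 2, 3, and 5.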
23. The speech recognition dialogue selection method according to claim 22, further comprising:
A seventh step of sending, during the speech recognition dialog process between the dispensing device and the Interface, a request to transfer the handling of the dispensing device from this Interface to another Interface;
An eighth step of requesting the performance data of the dispensing device from the dispensing device;
A ninth step of sending the performance data of the dispensing device from the dispensing device in response to the request of the eighth step;
A tenth step of comparing the performance data of the dispensing device with the performance data of the plurality of Interfaces, and determining a new Interface according to the comparison result;
An eleventh step of notifying the dispensing device of the information required to specify the Interface determined in the tenth step; and
A twelfth step of carrying out the speech recognition dialog process between the Interface determined in the tenth step and the dispensing device.
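The transfer described in claim 23 amounts to repeating the comparison while a dialog is already running. The Python sketch below is one hedged reading of it: `Session`, `reselect_interface`, and the format table are hypothetical, and skipping the Interface that asked for the transfer is an assumption, since the claim only says that a new Interface is determined.

```python
from typing import Dict, Optional, Set


def reselect_interface(current: str,
                       device_format: str,
                       interface_formats: Dict[str, Set[str]]) -> Optional[str]:
    """Pick a compatible Interface after a transfer request.

    Assumption: the Interface that requested the transfer is skipped.
    """
    for name, formats in interface_formats.items():
        if name != current and device_format in formats:
            return name
    return None


class Session:
    """Tracks which Interface currently serves one dispensing device."""

    def __init__(self, device_format: str, interface: str) -> None:
        self.device_format = device_format
        self.interface = interface

    def handle_transfer_request(self,
                                interface_formats: Dict[str, Set[str]]) -> bool:
        """Steps 7 to 12 in miniature; return True if the session moved."""
        new_interface = reselect_interface(self.interface, self.device_format,
                                           interface_formats)
        if new_interface is None:
            return False  # assumption: stay with the current Interface
        self.interface = new_interface
        return True
```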
24. The speech recognition dialogue selection method according to claim 22, wherein the speech information used comprises digitized voice data, compressed speech data, or characteristic vector data.
25. The speech recognition dialogue selection method according to claim 22, wherein the performance data of the dispensing device comprises data on codec performance, voice data format, recorded/synthesized speech input/output functions, and service content.
26. The speech recognition dialogue selection method according to claim 22, wherein the performance data of the Interface comprises data on codec performance, voice data format, recorded/synthesized speech output functions, service content, recognition capability, and operation information.
27. A speech recognition dialogue selection method for carrying out data communication over a network among a dispensing device, a plurality of Interfaces, and a service retaining device, and for carrying out a process of sending the speech information data output from the dispensing device to a specified Interface, the method comprising the steps of:
A first step of receiving a request, output from the dispensing device, that includes the service content of the speech recognition dialog process;
A second step of requesting the performance data of the dispensing device from the dispensing device;
A third step of sending the performance data of the dispensing device from the dispensing device;
A fourth step of comparing the performance data of the dispensing device with the performance data of the plurality of Interfaces, and determining the specified Interface among the plurality of Interfaces according to the comparison result;
A fifth step of notifying the dispensing device of the information required to specify the Interface determined in the fourth step;
A sixth step of carrying out the speech recognition dialog process between the dispensing device and the Interface determined in the fourth step;
A seventh step in which the Interface determined in the fourth step requests, from the service retaining device, the service content requested by the dispensing device;
An eighth step of sending the service content requested in the seventh step to the Interface determined in the fourth step;
A ninth step in which the Interface determined in the fourth step reads in the service content sent in the eighth step; and
A tenth step of carrying out, according to the service content read in, the speech recognition dialog process between the dispensing device and the Interface determined in the fourth step.
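Steps seven to ten of claim 27 describe the selected Interface fetching service content and then conducting the dialog according to it. A minimal Python sketch of that part follows. The classes `ServiceRetainingDevice` and `DialogInterface`, the dictionary layout of the service content, and the fallback reply are all assumptions introduced for illustration.

```python
from typing import Dict, Optional


class ServiceRetainingDevice:
    """Hypothetical store holding service content by identifier."""

    def __init__(self, contents: Dict[str, Dict[str, str]]) -> None:
        self._contents = contents

    def fetch(self, service_id: str) -> Optional[Dict[str, str]]:
        # Eighth step: send the requested service content, if it exists.
        return self._contents.get(service_id)


class DialogInterface:
    """Hypothetical Interface that answers according to loaded content."""

    def __init__(self, store: ServiceRetainingDevice) -> None:
        self._store = store
        self._prompts: Dict[str, str] = {}

    def load_service(self, service_id: str) -> bool:
        # Seventh and ninth steps: request the content, then read it in.
        content = self._store.fetch(service_id)
        if content is None:
            return False
        self._prompts = dict(content)
        return True

    def respond(self, recognized_text: str) -> str:
        # Tenth step: the dialog follows the service content that was read in.
        # The fallback reply is an assumption of this sketch.
        return self._prompts.get(recognized_text, "Sorry, please repeat.")
```

Separating the store from the Interface mirrors the claim's arrangement, in which the service retaining device is a distinct node on the network.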
28. The speech recognition dialogue selection method according to claim 27, further comprising:
An eleventh step of sending, during the speech recognition dialog process between the dispensing device and the Interface, a request to transfer the handling of the dispensing device from this Interface to another Interface;
A twelfth step of requesting the performance data of the dispensing device from the dispensing device;
A thirteenth step of sending the performance data of the dispensing device from the dispensing device;
A fourteenth step of comparing the performance data of the dispensing device with the performance data of the plurality of Interfaces, and determining a new Interface according to the comparison result;
A fifteenth step of notifying the dispensing device of the information required to specify the Interface determined in the fourteenth step; and
A sixteenth step of carrying out the speech recognition dialog process between the Interface determined in the fourteenth step and the dispensing device.
29. A speech recognition dialogue selecting arrangement for carrying out data communication between a dispensing device and a plurality of Interfaces over a network, so as to carry out a process of selecting a specified Interface and sending the speech information data output from the dispensing device to the specified Interface, the arrangement comprising:
A first device for receiving, from the dispensing device, speech information and data indicating that the Interface is to be changed;
A second device for requesting the performance data of the dispensing device from the dispensing device;
A third device for sending the performance data from the dispensing device in response to the request of the second device;
A fourth device for comparing the performance data of the dispensing device with the performance data of the plurality of Interfaces, and determining an Interface according to the comparison result; and
A fifth device for notifying the dispensing device of the information used to specify the Interface determined by the fourth device.
30. The speech recognition dialogue selecting arrangement according to claim 29, wherein the speech information comprises digitized voice data, compressed speech data, or characteristic vector data.
31. The speech recognition dialogue selecting arrangement according to claim 29, wherein the performance data of the dispensing device comprises data on codec performance, voice data format, recorded/synthesized speech input/output functions, and service content.
32. The speech recognition dialogue selecting arrangement according to claim 29, wherein the performance data of the Interface comprises data on codec performance, voice data format, recorded/synthesized speech output functions, service content, recognition capability, and operation information.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP102274/2002 | 2002-04-04 | ||
JP2002102274A JP2003295890A (en) | 2002-04-04 | 2002-04-04 | Apparatus, system, and method for speech recognition interactive selection, and program |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1514995A CN1514995A (en) | 2004-07-21 |
CN1282946C true CN1282946C (en) | 2006-11-01 |
Family
ID=28786256
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB038003465A Expired - Fee Related CN1282946C (en) | 2002-04-04 | 2003-03-12 | Speech recognition conversation selection device, speech recogntion conversation system, speech recognition conversation selection method, and program |
Country Status (6)
Country | Link |
---|---|
US (1) | US20040162731A1 (en) |
EP (1) | EP1394771A4 (en) |
JP (1) | JP2003295890A (en) |
CN (1) | CN1282946C (en) |
TW (1) | TWI244065B (en) |
WO (1) | WO2003085640A1 (en) |
Families Citing this family (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3885523B2 (en) * | 2001-06-20 | 2007-02-21 | 日本電気株式会社 | Server / client type speech recognition apparatus and method |
FR2853126A1 (en) * | 2003-03-25 | 2004-10-01 | France Telecom | DISTRIBUTED SPEECH RECOGNITION PROCESS |
US8311822B2 (en) * | 2004-11-02 | 2012-11-13 | Nuance Communications, Inc. | Method and system of enabling intelligent and lightweight speech to text transcription through distributed environment |
GB2427500A (en) * | 2005-06-22 | 2006-12-27 | Symbian Software Ltd | Mobile telephone text entry employing remote speech to text conversion |
US7957975B2 (en) * | 2005-08-09 | 2011-06-07 | Mobile Voice Control, LLC | Voice controlled wireless communication device system |
WO2007050358A2 (en) * | 2005-10-21 | 2007-05-03 | Callminer, Inc. | Method and apparatus for processing heterogeneous units of work |
US9330668B2 (en) * | 2005-12-20 | 2016-05-03 | International Business Machines Corporation | Sharing voice application processing via markup |
US20080154608A1 (en) * | 2006-12-26 | 2008-06-26 | Voice Signal Technologies, Inc. | On a mobile device tracking use of search results delivered to the mobile device |
US20080154870A1 (en) * | 2006-12-26 | 2008-06-26 | Voice Signal Technologies, Inc. | Collection and use of side information in voice-mediated mobile search |
US20080154612A1 (en) * | 2006-12-26 | 2008-06-26 | Voice Signal Technologies, Inc. | Local storage and use of search results for voice-enabled mobile communications devices |
US20080153465A1 (en) * | 2006-12-26 | 2008-06-26 | Voice Signal Technologies, Inc. | Voice search-enabled mobile device |
CN101079885B (en) * | 2007-06-26 | 2010-09-01 | 中兴通讯股份有限公司 | A system and method for providing automatic voice identification integrated development platform |
DE102008033056A1 (en) | 2008-07-15 | 2010-01-21 | Volkswagen Ag | Motor vehicle, has controller detecting manual input taken place by operating device, detecting acoustic input allowed corresponding to manual input, and acoustically outputting determined allowed acoustic input by loudspeaker |
US10387140B2 (en) | 2009-07-23 | 2019-08-20 | S3G Technology Llc | Modification of terminal and service provider machines using an update server machine |
CN102237087B (en) * | 2010-04-27 | 2014-01-01 | 中兴通讯股份有限公司 | Voice control method and voice control device |
US20120059655A1 (en) * | 2010-09-08 | 2012-03-08 | Nuance Communications, Inc. | Methods and apparatus for providing input to a speech-enabled application program |
WO2014020835A1 (en) * | 2012-07-31 | 2014-02-06 | 日本電気株式会社 | Agent control system, method, and program |
CN103024169A (en) * | 2012-12-10 | 2013-04-03 | 深圳市永利讯科技股份有限公司 | Method and device for starting communication terminal application program through voice |
US9413891B2 (en) | 2014-01-08 | 2016-08-09 | Callminer, Inc. | Real-time conversational analytics facility |
CN103870547A (en) * | 2014-02-26 | 2014-06-18 | 华为技术有限公司 | Grouping processing method and device of contact persons |
JP2018037819A (en) * | 2016-08-31 | 2018-03-08 | 京セラ株式会社 | Electronic device, control method and program |
US11663535B2 (en) | 2016-10-03 | 2023-05-30 | Google Llc | Multi computational agent performance of tasks |
JP6882463B2 (en) * | 2016-10-03 | 2021-06-02 | グーグル エルエルシーGoogle LLC | Computer-based selection of synthetic speech for agents |
CN106998359A (en) * | 2017-03-24 | 2017-08-01 | 百度在线网络技术(北京)有限公司 | The method for network access and device of speech-recognition services based on artificial intelligence |
JP6843388B2 (en) * | 2017-03-31 | 2021-03-17 | 株式会社アドバンスト・メディア | Information processing system, information processing device, information processing method and program |
EP3596616A1 (en) | 2018-05-03 | 2020-01-22 | Google LLC. | Coordination of overlapping processing of audio queries |
JP6555838B1 (en) * | 2018-12-19 | 2019-08-07 | Jeインターナショナル株式会社 | Voice inquiry system, voice inquiry processing method, smart speaker operation server apparatus, chatbot portal server apparatus, and program. |
CN109949817B (en) * | 2019-02-19 | 2020-10-23 | 一汽-大众汽车有限公司 | Voice arbitration method and device based on dual-operating-system dual-voice recognition engine |
CN110718219B (en) * | 2019-09-12 | 2022-07-22 | 百度在线网络技术(北京)有限公司 | Voice processing method, device, equipment and computer storage medium |
JP7377668B2 (en) * | 2019-10-04 | 2023-11-10 | エヌ・ティ・ティ・コミュニケーションズ株式会社 | Control device, control method and computer program |
CN113450785B (en) * | 2020-03-09 | 2023-12-19 | 上海擎感智能科技有限公司 | Implementation method, system, medium and cloud server for vehicle-mounted voice processing |
Family Cites Families (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5708697A (en) * | 1996-06-27 | 1998-01-13 | Mci Communications Corporation | Communication network call traffic manager |
US6292782B1 (en) * | 1996-09-09 | 2001-09-18 | Philips Electronics North America Corp. | Speech recognition and verification system enabling authorized data transmission over networked computer systems |
US6078886A (en) * | 1997-04-14 | 2000-06-20 | At&T Corporation | System and method for providing remote automatic speech recognition services via a packet network |
CN1163869C (en) * | 1997-05-06 | 2004-08-25 | 语音工程国际公司 | System and method for developing interactive speech applications |
US7251315B1 (en) * | 1998-09-21 | 2007-07-31 | Microsoft Corporation | Speech processing for telephony API |
US7003463B1 (en) * | 1998-10-02 | 2006-02-21 | International Business Machines Corporation | System and method for providing network coordinated conversational services |
US6408272B1 (en) * | 1999-04-12 | 2002-06-18 | General Magic, Inc. | Distributed voice user interface |
US6363349B1 (en) * | 1999-05-28 | 2002-03-26 | Motorola, Inc. | Method and apparatus for performing distributed speech processing in a communication system |
US6792086B1 (en) * | 1999-08-24 | 2004-09-14 | Microstrategy, Inc. | Voice network access provider system and method |
US6937977B2 (en) * | 1999-10-05 | 2005-08-30 | Fastmobile, Inc. | Method and apparatus for processing an input speech signal during presentation of an output audio signal |
US6633846B1 (en) * | 1999-11-12 | 2003-10-14 | Phoenix Solutions, Inc. | Distributed realtime speech recognition system |
JP2001142488A (en) * | 1999-11-17 | 2001-05-25 | Oki Electric Ind Co Ltd | Voice recognition communication system |
US6396898B1 (en) * | 1999-12-24 | 2002-05-28 | Kabushiki Kaisha Toshiba | Radiation detector and x-ray CT apparatus |
JP2001222292A (en) * | 2000-02-08 | 2001-08-17 | Atr Interpreting Telecommunications Res Lab | Voice processing system and computer readable recording medium having voice processing program stored therein |
US6505161B1 (en) * | 2000-05-01 | 2003-01-07 | Sprint Communications Company L.P. | Speech recognition that adjusts automatically to input devices |
JP3728177B2 (en) * | 2000-05-24 | 2005-12-21 | キヤノン株式会社 | Audio processing system, apparatus, method, and storage medium |
US6934756B2 (en) * | 2000-11-01 | 2005-08-23 | International Business Machines Corporation | Conversational networking via transport, coding and control conversational protocols |
CN1266625C (en) * | 2001-05-04 | 2006-07-26 | 微软公司 | Server for identifying WEB invocation |
GB2376394B (en) * | 2001-06-04 | 2005-10-26 | Hewlett Packard Co | Speech synthesis apparatus and selection method |
US6996525B2 (en) * | 2001-06-15 | 2006-02-07 | Intel Corporation | Selecting one of multiple speech recognizers in a system based on performance predections resulting from experience |
US20030078777A1 (en) * | 2001-08-22 | 2003-04-24 | Shyue-Chin Shiau | Speech recognition system for mobile Internet/Intranet communication |
US7146321B2 (en) * | 2001-10-31 | 2006-12-05 | Dictaphone Corporation | Distributed speech recognition system |
US6785654B2 (en) * | 2001-11-30 | 2004-08-31 | Dictaphone Corporation | Distributed speech recognition system with speech recognition engines offering multiple functionalities |
US6898567B2 (en) * | 2001-12-29 | 2005-05-24 | Motorola, Inc. | Method and apparatus for multi-level distributed speech recognition |
GB2389217A (en) * | 2002-05-27 | 2003-12-03 | Canon Kk | Speech recognition system |
US6834265B2 (en) * | 2002-12-13 | 2004-12-21 | Motorola, Inc. | Method and apparatus for selective speech recognition |
US7076428B2 (en) * | 2002-12-30 | 2006-07-11 | Motorola, Inc. | Method and apparatus for selective distributed speech recognition |
US20050177371A1 (en) * | 2004-02-06 | 2005-08-11 | Sherif Yacoub | Automated speech recognition |
2002
- 2002-04-04 JP JP2002102274A patent/JP2003295890A/en active Pending
2003
- 2003-03-12 EP EP03708563A patent/EP1394771A4/en not_active Withdrawn
- 2003-03-12 WO PCT/JP2003/002952 patent/WO2003085640A1/en active Application Filing
- 2003-03-12 US US10/476,638 patent/US20040162731A1/en not_active Abandoned
- 2003-03-12 CN CNB038003465A patent/CN1282946C/en not_active Expired - Fee Related
- 2003-04-03 TW TW092107581A patent/TWI244065B/en not_active IP Right Cessation
Also Published As
Publication number | Publication date |
---|---|
JP2003295890A (en) | 2003-10-15 |
TWI244065B (en) | 2005-11-21 |
CN1514995A (en) | 2004-07-21 |
EP1394771A1 (en) | 2004-03-03 |
TW200307908A (en) | 2003-12-16 |
WO2003085640A1 (en) | 2003-10-16 |
US20040162731A1 (en) | 2004-08-19 |
EP1394771A4 (en) | 2005-10-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN1282946C (en) | Speech recognition conversation selection device, speech recogntion conversation system, speech recognition conversation selection method, and program | |
CN1276367C (en) | Multimedia data flow code conversion system | |
CN1492656A (en) | Method, apparatus and system for sharing application session information over multiple channels | |
CN1286304C (en) | Method of realizing scene chat between customers in instant communication | |
CN101075983A (en) | Instant speech telecommunication terminal, server, system and instant speech telecommunication method | |
CN1969316A (en) | Centralized biometric authentication | |
CN1573928A (en) | Semantic object synchronous understanding implemented with speech application language tags | |
CN1444748A (en) | Network service system and method | |
CN1574795A (en) | System and method for using packed compressed buffers for improved client server communication | |
CN1799217A (en) | System and method for authorizing a party to join a conference | |
EP2204965A1 (en) | Device and method for receiving scalable content from multiple sources having different content quality | |
CN1612580A (en) | Mobile phone system with incoming melody designating function and mobile phone | |
CN1976322A (en) | Method and system for realizing multimedia immediate communicating and control flow | |
CN1274175C (en) | Mobile communication terminal device, control method and programme thereof | |
CN1792081A (en) | Method for setting up a call between selected subscriber terminals via a dedicated communication device | |
CN1831762A (en) | Development framework for mixing semantics-driven and state driven dialog | |
CN1901707A (en) | Monitoring mobile phone and its remote monitoring method | |
CN1617075A (en) | Modal synchronization control method and multi-modal interface system | |
CN1742461A (en) | Methods and apparatus for identifying patterns in messages and generating actions | |
CN1652543A (en) | Method and apparatus for connecting heterogeneous protocol nodes | |
CN1905598A (en) | Method and system for searching and obtaining WAP network address based on speech identifying technique | |
US7403605B1 (en) | System and method for local replacement of music-on-hold | |
CN2881786Y (en) | Information processing device | |
CN1716861A (en) | Method for providing a cellular phone or a portable terminal with news or other information | |
CN1909506A (en) | Device and method for controlling broadcast of media resource in soft exchange |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20061101 Termination date: 20160312 |