
CN1282946C - Speech recognition conversation selection device, speech recognition conversation system, speech recognition conversation selection method, and program - Google Patents

Speech recognition conversation selection device, speech recognition conversation system, speech recognition conversation selection method, and program

Info

Publication number
CN1282946C
Authority
CN
China
Prior art keywords
interface
speech recognition
dispensing device
data
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB038003465A
Other languages
Chinese (zh)
Other versions
CN1514995A (en
Inventor
山田荣子
羽金广
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp
Publication of CN1514995A
Application granted
Publication of CN1282946C
Anticipated expiration
Expired - Fee Related



Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/226 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
    • G10L2015/228 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of application context

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

In a voice recognition dialogue system having a plurality of recognition dialogue servers, there is no framework to select and determine one recognition dialogue server. A client 10 transmits its ability information stored in a terminal information storage 140 to a recognition dialogue selecting server 20. The ability of the client 10 includes a CODEC ability (CODEC type, CODEC compression mode, etc.), a voice data format (compressed voice data, feature vector, etc.), a recorded voice I/O function, a synthesized voice I/O function (without synthesizing engine, with intermediate representation input engine, with character string input engine, etc.), and service contents. The recognition dialogue selecting server 20 receives the ability information transmitted from the client 10, and determines the optimum recognition dialogue server according to ability information of plural recognition dialogue servers which has been stored in a recognition dialogue server information storage 230 and information of the requested service contents.

Description

Speech recognition dialogue selecting arrangement and method and speech recognition Interface
Technical field
The present invention relates to a speech recognition Interface, a speech recognition dialogue selection method, a speech recognition dialogue selecting arrangement, and a recording medium storing a speech recognition dialogue selection program. With these, voice data input to a terminal (client computer) such as a mobile phone or an in-vehicle terminal is sent over a network to a recognition dialog server, and a voice dialogue is carried out at the recognition dialog server by means of speech recognition and responses.
Background technology
Conventionally, a speech recognition conversational system that uses VoIP (Voice over Internet Protocol) is often called a client-server type speech recognition Interface. In such a system, voice data output from the client computer is sent over a packet network to the recognition dialog server, which then carries out the speech recognition dialog process. Such a system is described in detail in, for example, Nikkei Internet Technology, March 1998, pp. 130-137.
In a system that uses VoIP, speech recognition, or a voice dialogue by speech recognition and responses (synthesized speech, recorded speech, etc.), is carried out within a framework in which the IP addresses of the client computer and the recognition dialog server are known. In this framework, the client computer and the recognition dialog server are interconnected by their IP addresses so that packet communication is possible, and in that state packets of voice data are sent from the client computer to the recognition dialog server to carry out the speech recognition dialogue.
Japanese Patent Laid-Open No. 10-333693 discloses a method and system for providing an automatic speech recognition service. The system is configured so that voice data is recognized by sending it from the client computer to a speech recognition server over a packet network.
However, in the conventional VoIP-based system described above, speech recognition and voice dialogue must be carried out within a framework in which the IP addresses of both the client computer and the recognition dialog server are known. Therefore, when a plurality of recognition dialog servers exist, a new system must be developed to select the recognition dialog server best suited to the client computer and to associate that server with the client computer.
Similarly, for the method and system for providing an automatic speech recognition service disclosed in Japanese Patent Laid-Open No. 10-333693, when a plurality of recognition dialog servers exist, a new system must also be developed to select the recognition dialog server best suited to the client computer and to associate that server with the client computer.
An object of the present invention is to provide a speech recognition Interface, a speech recognition dialogue selection method, a speech recognition dialogue selecting arrangement, and a recording medium storing a speech recognition dialogue selection program which, when a plurality of recognition dialog servers exist, can select the best recognition dialog server on the basis of the performance of the client computer and the performance of the recognition dialog servers, and can carry out a speech recognition dialogue between the determined recognition dialog server and the client computer.
Summary of the invention
To achieve the above object, the speech recognition Interface of the present invention comprises: a plurality of Interfaces for carrying out a speech recognition dialogue; a dispensing device for sending speech information to an Interface; a network connecting the dispensing device and the Interfaces; and a selecting arrangement that selects one Interface from the plurality of Interfaces according to the performance (ability) of the dispensing device and the performance of the plurality of Interfaces.
In addition, the speech recognition Interface of the present invention may comprise: a plurality of Interfaces for carrying out a speech recognition dialogue; a dispensing device for sending speech information to an Interface; a network connecting the dispensing device and the Interfaces; and a selecting arrangement that selects one Interface from the plurality of Interfaces according to the performance of the dispensing device and the performance of the plurality of Interfaces, and that has the functions of sending information specifying the selected Interface to the dispensing device and of having the information required for the speech recognition dialogue exchanged between the selected Interface and the dispensing device.
In addition, the speech recognition Interface of the present invention may further comprise a request unit for requesting a service from an Interface, in which case the network connects the dispensing device, the request unit, and the Interfaces. It may also comprise a service retaining device for holding the service content requested of the Interface, and a network connecting the service retaining device, the dispensing device, and the Interfaces.
In the above speech recognition Interface, the selecting arrangement may be replaced by another selecting arrangement that has the function of sending information specifying the selected Interface to the dispensing device so that service content and speech information are exchanged between the selected Interface and the request unit and dispensing device. A selecting arrangement that has the function of changing one selected Interface to another selected Interface may also be used.
As the selecting arrangement, one having the following function may be used: it compares the performance of the dispensing device with that of the plurality of Interfaces and, according to the comparison result, determines an Interface having the desired performance, namely one whose input format for speech information matches the output format of the speech information output from the dispensing device. Alternatively, a selecting arrangement may be used that compares the service and performance of the dispensing device with the performance of the plurality of Interfaces and, according to the comparison result, determines an Interface whose input format for speech information matches the output format of the speech information output from the dispensing device.
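The format-matching criterion described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the record type `InterfaceInfo`, the function `interfaces_matching`, and the format names are hypothetical, and it assumes each Interface simply advertises a set of accepted input formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterfaceInfo:
    """Hypothetical record of the speech input formats one Interface accepts."""
    name: str
    input_formats: frozenset


def interfaces_matching(output_format, interfaces):
    """Keep the Interfaces whose input formats include the device's output format."""
    return [i for i in interfaces if output_format in i.input_formats]


# Illustrative data only; the format names are assumptions.
candidates = [
    InterfaceInfo("A", frozenset({"compressed_speech"})),
    InterfaceInfo("B", frozenset({"feature_vector", "compressed_speech"})),
    InterfaceInfo("C", frozenset({"digitized_speech"})),
]
matched_names = [i.name for i in interfaces_matching("feature_vector", candidates)]
```

Here only Interface B accepts feature vectors, so it is the only candidate left for a dispensing device that outputs that format.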
The speech information output from the dispensing device preferably consists of digitized voice data, compressed speech data, or feature vector data. The data used to determine the performance of the dispensing device preferably includes data on CODEC performance, voice data format, and recorded/synthesized speech I/O function. The data used to determine the performance of an Interface preferably includes data on CODEC performance, voice data format, recorded/synthesized speech output function, service content, recognition performance, and operation information.
More specifically, the speech recognition Interface of the present invention may comprise: a plurality of speech recognition dialog servers for carrying out a speech recognition dialogue; a client computer for sending speech information and the service content requested of a speech recognition dialog server; a speech recognition dialogue selecting server for selecting one Interface from the plurality of Interfaces; and a network connecting the client computer, the speech recognition dialog servers, and the speech recognition dialogue selecting server.
The client computer may comprise: a data input unit for inputting speech information and service content data; a terminal information storage for storing the performance data of the client computer; a data communication unit for communicating over the network with the speech recognition dialog servers and the speech recognition dialogue selecting server and for sending speech information to the selected speech recognition dialog server; and a controller for controlling the operation of the client computer.
The speech recognition dialogue selecting server may comprise: a data communication unit for communicating over the network with the client computer and the speech recognition dialog servers; a recognition dialog server information storage for storing the performance of each speech recognition dialog server; and a recognition dialog server determining unit that reads the performance data of the client computer stored in the terminal information storage, compares it with the performance data of the speech recognition dialog servers stored in the recognition dialog server information storage, determines at least one speech recognition dialog server from the plurality of speech recognition dialog servers, and then sends the client computer the information needed to specify the determined speech recognition dialog server.
The speech recognition dialog server may comprise: a speech recognition dialogue executing unit for carrying out the speech recognition dialogue according to the speech information input from the client computer; a data communication unit for communicating over the network with the client computer and the speech recognition dialogue selecting server; and a controller for controlling the operation of the speech recognition dialog server.
In this case, the speech recognition Interface may comprise: a service content reservation server that is connected to the network and holds the service content requested by the client computer; and a reading unit that is provided in the speech recognition dialog server and reads in the service content held in the service content reservation server. The speech recognition Interface may further comprise a process transfer device that is provided in the speech recognition dialog server and outputs, to the speech recognition dialogue selecting server, a request to transfer the speech recognition dialog process to another speech recognition dialog server. The speech information output by the client computer preferably consists of digitized voice data, compressed speech data, or feature vector data.
In addition, the data used to determine the performance of the client computer preferably includes data on CODEC performance, voice data format, and recorded/synthesized speech I/O function. The data used to determine the performance of a speech recognition dialog server preferably includes data on CODEC performance, voice data format, recorded/synthesized speech output function, service content, recognition capability, and operation information.
The speech recognition dialogue selection method of the present invention performs data communication over a network between a dispensing device and a plurality of Interfaces and carries out processing to send the speech information data output from the dispensing device to a specified Interface. It comprises: a first step of receiving speech information data from the dispensing device; a second step of requesting the dispensing device to send its performance data; a third step of sending the performance data of the dispensing device from the dispensing device; a fourth step of comparing the performance data of the dispensing device with the performance data of the plurality of Interfaces and determining the specified Interface according to the comparison result; a fifth step of notifying the dispensing device of information specifying the determined Interface; and a sixth step of carrying out the speech recognition dialog process between the dispensing device and the determined Interface. In this case, the method may further comprise: a seventh step of issuing, during the speech recognition dialog process between the dispensing device and the Interface, a request from the Interface to hand the dispensing device over to another Interface; an eighth step of requesting the dispensing device to send its performance data; a ninth step of sending, in response to the request in the eighth step, the performance data of the dispensing device from the dispensing device; a tenth step of comparing the performance data of the dispensing device with the performance data of the plurality of Interfaces and determining a new Interface according to the comparison result; an eleventh step of notifying the dispensing device of the information needed to specify the Interface determined in the tenth step; and a twelfth step of carrying out the speech recognition dialog process between the Interface determined in the tenth step and the dispensing device.
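The message sequence of the steps above can be sketched as a small simulation. This is a hedged illustration rather than the patented procedure: the dictionaries, the `speech_format` and `input_formats` keys, and the idea of excluding the current Interface when a handover is requested are all assumptions made for the example.

```python
def determine_interface(device_caps, interface_caps, exclude=()):
    """Steps 4 and 10: return the first non-excluded Interface accepting the device's format."""
    for name, caps in interface_caps.items():
        if name not in exclude and device_caps["speech_format"] in caps["input_formats"]:
            return name
    return None


def selection_round(device, interface_caps, exclude=()):
    """Run one request/compare/notify round; return (chosen Interface, message log)."""
    log = ["selector -> device: request performance data"]      # steps 2 and 8
    caps = device["performance_data"]                             # steps 3 and 9
    log.append("device -> selector: performance data")
    chosen = determine_interface(caps, interface_caps, exclude)   # steps 4 and 10
    log.append(f"selector -> device: use Interface {chosen}")     # steps 5 and 11
    log.append(f"device <-> {chosen}: speech recognition dialogue")  # steps 6 and 12
    return chosen, log


# Hypothetical dispensing device and Interfaces.
device = {"performance_data": {"speech_format": "feature_vector"}}
interfaces = {
    "if-1": {"input_formats": {"feature_vector"}},
    "if-2": {"input_formats": {"feature_vector", "compressed_speech"}},
}

first, first_log = selection_round(device, interfaces)
# Step 7: the current Interface asks for a handover, so it is excluded (an assumption).
second, second_log = selection_round(device, interfaces, exclude={first})
```

The second round reuses the same request, compare, and notify sequence, which mirrors how steps eight to twelve repeat steps two to six.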
The speech recognition dialogue selection method of the present invention may also be configured to perform data communication over a network among a dispensing device, a plurality of Interfaces, and a service retaining device, and to carry out a process of sending the speech information data output from the dispensing device to a specified Interface. This method may comprise: a first step of receiving a request, output from the dispensing device, for service content that involves the speech recognition dialog process; a second step of requesting the performance data of the dispensing device from the dispensing device; a third step of sending the performance data of the dispensing device from the dispensing device; a fourth step of comparing the performance data of the dispensing device with the performance data of the plurality of Interfaces and determining the specified Interface from the plurality of Interfaces according to the comparison result; a fifth step of notifying the dispensing device of the information needed to specify the Interface determined in the fourth step; a sixth step of carrying out the speech recognition dialog process between the dispensing device and the Interface determined in the fourth step; a seventh step in which the Interface determined in the fourth step requests, from the service retaining device, the service content requested by the dispensing device; an eighth step of sending the service content requested in the seventh step to the Interface determined in the fourth step; a ninth step in which the Interface determined in the fourth step reads in the service content sent in the eighth step; and a tenth step of carrying out, according to the service content read in, the speech recognition dialog process between the dispensing device and the Interface determined in the fourth step.
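Steps seven to ten, in which the determined Interface obtains service content from the service retaining device before continuing the dialogue, can be sketched as follows. The classes `ServiceRetainingDevice` and `DialogueInterface`, the `prompt` field, and the reply format are hypothetical names chosen for illustration.

```python
class ServiceRetainingDevice:
    """Hypothetical store mapping a service identifier to its content."""

    def __init__(self, contents):
        self._contents = dict(contents)

    def send(self, service_id):
        # Step 8: send the requested service content to the Interface.
        return self._contents[service_id]


class DialogueInterface:
    """Hypothetical Interface that must read in service content before a dialogue."""

    def __init__(self):
        self.service_content = None

    def read_service(self, store, service_id):
        # Steps 7 and 9: request the content, then read in what was sent.
        self.service_content = store.send(service_id)

    def respond(self, utterance):
        # Step 10: carry out the dialogue according to the content read in.
        if self.service_content is None:
            raise RuntimeError("service content has not been read in")
        return f"{self.service_content['prompt']} (heard: {utterance})"


store = ServiceRetainingDevice({"address": {"prompt": "Please say an address."}})
interface = DialogueInterface()
interface.read_service(store, "address")
reply = interface.respond("Tokyo")
```

Raising an error when no content has been read in is a design choice of this sketch; it makes the ordering of steps nine and ten explicit.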
In this case, the speech recognition dialogue selection method may further comprise: an eleventh step of issuing, during the speech recognition dialog process between the dispensing device and the Interface, a request from the Interface to hand the dispensing device over to another Interface; a twelfth step of requesting the performance data of the dispensing device from the dispensing device; a thirteenth step of sending the performance data of the dispensing device from the dispensing device; a fourteenth step of comparing the performance data of the dispensing device with the performance data of the plurality of Interfaces and determining a new Interface according to the comparison result; a fifteenth step of notifying the dispensing device of the information needed to specify the Interface determined in the fourteenth step; and a sixteenth step of carrying out the speech recognition dialog process between the Interface determined in the fourteenth step and the dispensing device.
As the speech information, it is preferable to use speech information comprising digitized voice data, compressed speech data, or feature vector data. The performance data of the dispensing device preferably includes data on CODEC performance, voice data format, recorded/synthesized speech I/O function, and service content. The performance data of the Interface preferably includes data on CODEC performance, voice data format, recorded/synthesized speech output function, service content, recognition capability, and operation information.
The speech recognition dialogue selecting arrangement of the present invention may be configured to perform data communication over a network between a dispensing device and a plurality of Interfaces, and comprises a selecting arrangement for selecting a specified Interface and sending the speech information data output from the dispensing device to the specified Interface. In making the selection, the selecting arrangement specifies the Interface according to the performance of the dispensing device and the performance of the plurality of Interfaces.
In addition, the speech recognition dialogue selecting arrangement of the present invention may be configured to perform data communication over a network between a dispensing device and a plurality of Interfaces, and to carry out a process of selecting a specified Interface and sending the speech information data output from the dispensing device to the specified Interface. It comprises: a first device for receiving, from the dispensing device, speech information and data indicating that the Interface is to be changed; a second device for requesting the performance data of the dispensing device from the dispensing device; a third device for sending the performance data from the dispensing device in response to the request from the second device; a fourth device for comparing the performance data of the dispensing device with the performance data of the plurality of Interfaces and determining an Interface according to the comparison result; and a fifth device for notifying the dispensing device of information specifying the Interface determined by the fourth device.
In this case, the speech information preferably includes digitized voice data, compressed speech data, or feature vector data. The performance data of the dispensing device preferably includes data on CODEC performance, voice data format, recorded/synthesized speech I/O function, and service content. The performance data of the Interface also preferably includes data on CODEC performance, voice data format, recorded/synthesized speech output function, service content, recognition capability, and operation information.
The present invention can be realized by recording a speech recognition dialogue selection program on a recording medium. That is, the recording medium according to the present invention may store a speech recognition dialogue selection program that performs data communication over a network between a dispensing device and a plurality of Interfaces and carries out a process of sending the speech information data output from the dispensing device to a specified Interface. The recorded speech recognition dialogue selection program comprises: a first step of receiving speech information data from the dispensing device; a second step of requesting the performance data of the dispensing device from the dispensing device; a third step of sending the performance data of the dispensing device from the dispensing device; a fourth step of comparing the performance data of the dispensing device with the performance data of the plurality of Interfaces and determining the specified Interface according to the comparison result; a fifth step of notifying the dispensing device of information specifying the determined Interface; and a sixth step of carrying out the speech recognition dialog process between the dispensing device and the determined Interface.
In this case, the speech recognition dialogue selection program recorded on the recording medium may further comprise: a seventh step of issuing, during the speech recognition dialog process between the dispensing device and the Interface, a request from the Interface to hand the dispensing device over to another Interface; an eighth step of requesting the dispensing device to send its performance data; a ninth step of sending, in response to the request in the eighth step, the performance data of the dispensing device from the dispensing device; a tenth step of comparing the performance data of the dispensing device with the performance data of the plurality of Interfaces and determining a new Interface according to the comparison result; an eleventh step of notifying the dispensing device of the information needed to specify the Interface determined in the tenth step; and a twelfth step of carrying out the speech recognition dialog process between the Interface determined in the tenth step and the dispensing device.
The speech recognition dialogue selection program recorded on the recording medium is preferably one that performs data communication over a network among a dispensing device, a plurality of Interfaces, and a service retaining device, and carries out a process of sending the speech information data output from the dispensing device to a specified Interface. This program comprises: a first step of receiving a request, output from the dispensing device, for service content that involves the speech recognition dialog process; a second step of requesting the performance data of the dispensing device from the dispensing device; a third step of sending the performance data of the dispensing device from the dispensing device; a fourth step of comparing the performance data of the dispensing device with the performance data of the plurality of Interfaces and determining the specified Interface according to the comparison result; a fifth step of notifying the dispensing device of the information needed to specify the Interface determined in the fourth step; a sixth step of carrying out the speech recognition dialog process between the dispensing device and the Interface determined in the fourth step; a seventh step in which the Interface determined in the fourth step requests, from the service retaining device, the service content requested by the dispensing device; an eighth step of sending the service content requested in the seventh step to the Interface determined in the fourth step; a ninth step in which the Interface determined in the fourth step reads in the service content sent in the eighth step; and a tenth step of carrying out, according to the service content read in, the speech recognition dialog process between the dispensing device and the Interface determined in the fourth step.
In this case, the speech recognition dialogue selection program preferably further comprises: an eleventh step of issuing, during the speech recognition dialog process between the dispensing device and the Interface, a request from the Interface to hand the dispensing device over to another Interface; a twelfth step of requesting the performance data of the dispensing device from the dispensing device; a thirteenth step of sending the performance data of the dispensing device from the dispensing device; a fourteenth step of comparing the performance data of the dispensing device with the performance data of the plurality of Interfaces and determining a new Interface according to the comparison result; a fifteenth step of notifying the dispensing device of the information needed to specify the Interface determined in the fourteenth step; and a sixteenth step of carrying out the speech recognition dialog process between the Interface determined in the fourteenth step and the dispensing device. As the speech information, it is preferable to use speech information comprising digitized voice data, compressed speech data, or feature vector data. The performance data of the dispensing device preferably includes data on CODEC performance, voice data format, recorded/synthesized speech I/O function, and service content. The performance data of the Interface preferably includes data on CODEC performance, voice data format, recorded/synthesized speech output function, service content, recognition capability, and operation information.
The speech recognition conversational system according to the present invention is a system in which a client computer and a plurality of recognition dialog servers are connected by a network. Even when a plurality of recognition dialog servers exist, the best recognition dialog server can be selected and determined from among them, so that the speech recognition dialogue is carried out on the best recognition dialog server.
One example of a method for determining the best recognition dialog server is to compare the performance data of the client computer with that of the recognition dialog servers and, from among the recognition dialog servers for which the output of client computer 10 and the input of recognition dialog server 30 match, to select the one that has the highest performance and is in operation.
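A minimal sketch of this determination rule follows, assuming a single numeric `performance` score per server and one accepted input format per server. Neither assumption comes from the patent, and `ServerInfo` and `best_server` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ServerInfo:
    """Hypothetical entry in the recognition dialog server information storage."""
    name: str
    input_format: str
    performance: int   # assumed score; higher means better recognition performance
    operating: bool    # operation information: is the server currently running?


def best_server(client_output_format, servers):
    """Among operating servers whose input matches the client's output, pick the best."""
    usable = [
        s for s in servers
        if s.operating and s.input_format == client_output_format
    ]
    return max(usable, key=lambda s: s.performance, default=None)


servers = [
    ServerInfo("s1", "feature_vector", 5, True),
    ServerInfo("s2", "feature_vector", 9, False),     # best score, but not operating
    ServerInfo("s3", "feature_vector", 7, True),
    ServerInfo("s4", "compressed_speech", 10, True),  # format does not match
]
best = best_server("feature_vector", servers)
```

With this data the function returns `s3`: `s2` scores higher but is not in operation, and `s4` does not accept the client's output format.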
The data used to determine the performance of the client computer includes data on CODEC performance (CODEC type, CODEC compression mode, etc.), voice data format (compressed voice data, feature vectors, etc.), recorded speech I/O function, synthesized speech I/O function (without synthesis engine, with intermediate representation input engine, with character string input engine, etc.), service content, and so on. The data used to determine the performance of a recognition dialog server includes data on CODEC performance (CODEC type, CODEC decompression mode, etc.), recorded speech output function, synthesized speech output function (without synthesis engine, with intermediate representation output engine, with waveform output engine, etc.), service content, recognition engine performance (task-dedicated engine, instruction engine, command recognition engine, etc.), operation information, and so on. The CODEC type may be AMR-NB, AMR-WB, or the like. An example of an intermediate representation of synthesized speech is the representation obtained after a character string has been converted into a phonetic symbol string. Service content includes services such as address recognition, name recognition, incoming-call tone recognition, telephone number recognition, and credit card number recognition.
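The performance data listed above could be held in records such as the following. This is only a sketch under stated assumptions: the field names, the enum members, and the helper `common_codecs` are hypothetical, and the patent does not prescribe any particular data layout.

```python
from dataclasses import dataclass
from enum import Enum


class ClientSynthesis(Enum):
    """Synthesized speech I/O function on the client side."""
    NO_ENGINE = "without synthesis engine"
    INTERMEDIATE_INPUT = "with intermediate representation input engine"
    STRING_INPUT = "with character string input engine"


class ServerSynthesis(Enum):
    """Synthesized speech output function on the server side."""
    NO_ENGINE = "without synthesis engine"
    INTERMEDIATE_OUTPUT = "with intermediate representation output engine"
    WAVEFORM_OUTPUT = "with waveform output engine"


@dataclass
class ClientPerformance:
    codec_types: set            # e.g. {"AMR-NB"}
    voice_data_formats: set     # e.g. {"compressed_voice_data"}
    recorded_speech_io: bool
    synthesis: ClientSynthesis
    service_content: str        # the service the client requests


@dataclass
class ServerPerformance:
    codec_types: set
    voice_data_formats: set
    recorded_speech_output: bool
    synthesis: ServerSynthesis
    service_contents: set       # services this server can provide
    recognition_engine: str
    operating: bool             # operation information


def common_codecs(client, server):
    """Return the CODEC types that both sides support."""
    return client.codec_types & server.codec_types


client = ClientPerformance(
    {"AMR-NB"}, {"compressed_voice_data"}, True,
    ClientSynthesis.STRING_INPUT, "address recognition",
)
server = ServerPerformance(
    {"AMR-NB", "AMR-WB"}, {"compressed_voice_data", "feature_vector"}, True,
    ServerSynthesis.WAVEFORM_OUTPUT, {"address recognition", "name recognition"},
    "command recognition engine", True,
)
shared = common_codecs(client, server)
```

Keeping the client and server records separate reflects the asymmetry in the text: the client describes speech I/O, while the server describes speech output plus recognition engine and operation information.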
The processing unit that determines the recognition dialog server may be included in a web server, in the recognition dialogue selecting server, or in a recognition dialog server; it may also be included both in a web server or the recognition dialogue selecting server and in a recognition dialog server.
According to the present invention, the speech recognition dialogue can be carried out using the best recognition dialog server. In addition, because the recognition dialog server itself has the ability to determine a recognition dialog server, the terminal can automatically access another suitable recognition dialog server during the dialogue.
According to the present invention, service content can also be received from a server other than the recognition dialog server (for example, a web server or a content provider's server), and the speech recognition dialogue can be carried out according to the received service content. The service content may take the form of, for example, a VoiceXML document or a service name.
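Because service content may arrive either as a VoiceXML document or as a plain service name, a receiving server has to tell the two apart. The sketch below shows one possible way, using Python's standard `xml.etree.ElementTree` parser and relying on the fact that a VoiceXML document has `vxml` as its root element. The function name and the three return labels are hypothetical.

```python
import xml.etree.ElementTree as ET


def classify_service_content(content):
    """Return 'voicexml', 'service_name', or 'unknown' for received service content."""
    text = content.strip()
    if not text.startswith("<"):
        # Anything that is not markup is treated as a service name (an assumption).
        return "service_name"
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return "unknown"
    # Drop an XML namespace prefix of the form {uri}tag, if present.
    tag = root.tag.rsplit("}", 1)[-1]
    return "voicexml" if tag == "vxml" else "unknown"


document = '<vxml version="2.0"><form><block>Say an address.</block></form></vxml>'
kind_of_document = classify_service_content(document)
kind_of_name = classify_service_content("address recognition")
```

A real server would go on to interpret the VoiceXML dialog or look up the named service; the sketch stops at classification.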
Description of drawings
Fig. 1 is a structural diagram of the speech recognition conversational system according to the embodiment of the invention.
Fig. 2 is a block diagram showing the structure of client computer 10 according to the present invention.
Fig. 3 is a block diagram showing the structure of identification dialog server 30 according to the embodiment of the invention.
Fig. 4 is a block diagram showing the structure of identification dialogue selection server 20 according to the present invention.
Fig. 5 is a flow chart of the process by which identification dialogue selection server 20 determines the identification dialog server in the speech recognition conversational system according to the embodiment of the invention.
Fig. 6 is a flow chart of the speech recognition dialog process in the speech recognition dialogue method according to the embodiment of the invention.
Fig. 7 is a flow chart of the process in which, in the speech recognition conversational system according to the embodiment of the invention, identification dialogue selection server 20 determines new identification dialog server 80 while identification dialog server 30 is carrying out the identification dialog process.
Fig. 8 is a block diagram of identification dialogue performance server 40 according to the embodiment of the invention.
Fig. 9 is a flow chart of the process in which, in the speech recognition dialogue method according to the embodiment of the invention, identification dialogue performance server 40 determines new identification dialog server 80 during the identification dialog process.
Fig. 10 is a schematic diagram of identification dialog server C 50 according to the embodiment of the invention, obtained by adding a speech recognition dialogue start unit and a service content read unit to the device shown in Fig. 4.
Fig. 11 is a flow chart of the process in which, in the speech recognition dialogue method according to the embodiment of the invention, identification dialog server C 50 reads in service content from service content reservation server 60.
Fig. 12 is a schematic diagram of server computer 901, which runs a program that carries out the speech recognition dialogue method according to the embodiment of the invention, and of recording medium 902, on which the program is recorded.
Embodiment
Embodiments of the invention are explained in detail below with reference to the accompanying drawings.
The present invention is a speech recognition conversational system that provides a speech recognition dialogue service over a network. When a plurality of identification dialog servers exist, the system has a function of selecting and determining the optimum identification dialog server.
Fig. 1 is a structural diagram of the speech recognition conversational system according to the embodiment of the invention. Client computer 10 is connected through network 1 to identification dialogue selection server 20, identification dialog server 30, identification dialogue performance server 40, identification dialog server C 50, new identification dialog server 80, and service content reservation server 60. Here, client computer 10 serves both as the dispensing device that sends speech information and as the request unit that requests service content.
Network 1 can be the Internet (wired or wireless) or an intranet.
Fig. 2 is a block diagram of client computer 10 of the present invention. Client computer 10 can be a portable terminal, a PDA, an automobile terminal, a personal computer, or a home terminal. Client computer 10 is composed of controller 120, which controls client computer 10; terminal information memory 140, which holds the performance of client computer 10; and data communication unit 130, which communicates through network 1.
The data used to judge the performance of client computer 10 are CODEC performance (CODEC type, CODEC compression mode, etc.), voice data format (compressed voice data, feature vectors, etc.), recorded-speech input/output function, synthesized-speech input/output function (no synthesis engine, an engine that accepts intermediate-representation input, an engine that accepts character-string input, etc.), and service content.
Note that client computer 10 can be provided with an Internet browser as its user interface. The service content data include service data such as address recognition, name recognition, recognition of incoming call tone titles, telephone number recognition, and credit card number recognition.
Fig. 3 is a block diagram of identification dialog server 30 according to the embodiment of the invention. Identification dialog server 30 is composed of controller 320, which controls identification dialog server 30; speech recognition dialogue execution unit 330, which carries out speech recognition and dialogue; and data communication unit 310, which communicates through network 1.
Fig. 4 is a block diagram of identification dialogue selection server 20 according to the present invention. Identification dialogue selection server 20 is composed of data communication unit 210, which communicates through network 1; identification dialog server determining unit 220, which selects and determines the optimum identification dialog server when a plurality of identification dialog servers exist; and identification dialog server information-storing device 230, which stores the performance information of the identification dialog servers to be selected and determined. Here, identification dialogue selection server 20 includes the selecting arrangement that selects and specifies one Interface from among the plurality of dialog servers according to the performance of client computer 10, which serves as the dispensing device and request unit, and the performance of the identification dialog servers, which serve as Interfaces.
The data used to judge the performance of an identification dialog server are CODEC performance (CODEC type, CODEC expansion mode, etc.), voice data format (compressed voice data, feature vectors, etc.), recorded-speech output function, synthesized-speech output function (no synthesis engine, an engine that produces intermediate-representation output, an engine that produces waveform output, etc.), service content, recognition engine performance (task-dedicated engine, instruction engine, command recognition engine, etc.), and operation information.
New identification dialog server 80 is identical to any one of identification dialog server 30, identification dialogue performance server 40, and identification dialog server C 50.
Identification dialogue selection server 20, identification dialog server 30, identification dialogue performance server 40, identification dialog server C 50, and new identification dialog server 80 can each be a computer running the Windows NT (registered trademark) or Windows 2000 (registered trademark) operating system, or a server running the Solaris (registered trademark) operating system. The structures of identification dialogue performance server 40 and identification dialog server C 50 are described later. Identification dialogue selection server 20, identification dialog server 30, identification dialogue performance server 40, identification dialog server C 50, new identification dialog server 80, and the like operate as the Interfaces mentioned above.
Next, the operation of the speech recognition conversational system according to the embodiment of the invention is described.
First, the case is described in which identification dialogue selection server 20 carries out the process of determining the identification dialog server 30 that is to carry out speech recognition and dialogue, and the speech recognition dialog process is then carried out in the determined identification dialog server 30. Fig. 5 is a flow chart of the process by which identification dialogue selection server 20 determines identification dialog server 30 in the speech recognition conversational system according to the embodiment of the invention.
First, client computer 10 requests a service that includes the speech recognition dialog process from identification dialogue selection server 20 (step 501). More specifically, data communication unit 130 of client computer 10 uses a command such as an HTTP command to send identification dialogue selection server 20 the CGI URL of the program that executes the service, together with the arguments required for the processing.
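The request in step 501 can be sketched as follows. This is only an illustration: the host name, the CGI path, and the argument name `service` are hypothetical, since the text specifies only that a CGI URL and its arguments are sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_service_request_url(cgi_url: str, arguments: dict) -> str:
    """Append the arguments required by the service program to its CGI URL."""
    return f"{cgi_url}?{urlencode(arguments)}"


# Hypothetical address of the program on identification dialogue selection server 20.
SERVICE_URL = build_service_request_url(
    "http://selection-server.example/cgi-bin/service",
    {"service": "address_recognition"},  # hypothetical argument name and value
)

# The server side can recover the arguments from the query string.
RECOVERED = parse_qs(urlsplit(SERVICE_URL).query)
```

The client would issue this URL with an HTTP command of its choice; the sketch stops at building the URL so that it does not depend on a live server.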
Next, after receiving the service request from client computer 10, identification dialogue selection server 20 requests the performance information of client computer 10 (step 502).
Next, after receiving the request for performance information from identification dialogue selection server 20, client computer 10, through controller 120, sends the performance information of client computer 10 stored in terminal information memory 140 from data communication unit 130 to identification dialogue selection server 20 (step 503). The performance of client computer 10 includes CODEC performance (CODEC type, CODEC compression mode, etc.), voice data format (compressed voice data, feature vectors, etc.), recorded-speech input/output function, synthesized-speech input/output function (no synthesis engine, an engine that accepts intermediate-representation input, an engine that accepts character-string input, etc.), service content, and so on.
Identification dialogue selection server 20 receives the performance information sent from client computer 10 and reads the performance information of the plurality of identification dialog servers stored in identification dialog server information-storing device 230. Identification dialog server determining unit 220 then compares the performance information of client computer 10 with the performance information of the plurality of identification dialog servers (step 504) and, additionally taking into account the service content requested by client computer 10, determines the optimum identification dialog server (step 505).
The performance of an identification dialog server includes CODEC performance (CODEC type, CODEC expansion mode, etc.), voice data format (compressed voice data, feature vectors, etc.), recorded-speech output function, synthesized-speech output function (no synthesis engine, an engine that produces intermediate-representation output, an engine that produces waveform output, etc.), service content, recognition engine performance (task-dedicated engine, instruction engine, command recognition engine, etc.), operation information, and so on.
One example of a method of determining the optimum identification dialog server 30 is to compare the performance of client computer 10 with the performance of each identification dialog server and, among the identification dialog servers whose input/output is consistent with that of client computer 10, to select the one that is in operation and has the highest performance. Another example applies where an identification dialog server 30 exists for each service content, for example as dedicated servers such as an address task server, a name task server, a telephone number task server, and a card ID task server; in that case, the identification dialog server that can execute the service content requested by client computer 10 is selected.
Next, identification dialogue selection server 20 notifies client computer 10 of the information on the identification dialog server determined by identification dialog server determining unit 220 (step 506). One example of a notification method is to embed, in an HTML screen or the like, the address of identification dialog server 30 or the address of the executable program that carries out the identification dialogue on identification dialog server 30.
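As an illustration of the notification in step 506, the sketch below embeds a server address in a small HTML screen and reads it back on the client side. The element `id` and the address are hypothetical; the text says only that the address is embedded in an HTML screen.

```python
from html import escape
from html.parser import HTMLParser
from typing import Optional


def notification_screen(address: str) -> str:
    """Embed the determined server's address in a link on an HTML screen."""
    href = escape(address, quote=True)
    return f'<html><body><a id="dialog-server" href="{href}">Start dialogue</a></body></html>'


class AddressExtractor(HTMLParser):
    """Collects the href of the link marked with the hypothetical id."""

    def __init__(self) -> None:
        super().__init__()
        self.address: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("id") == "dialog-server":
            self.address = attributes.get("href")


def extract_address(screen: str) -> Optional[str]:
    parser = AddressExtractor()
    parser.feed(screen)
    return parser.address


# Hypothetical address of the dialogue program on identification dialog server 30.
ADDRESS = "http://dialog-server.example/cgi-bin/dialog?task=address&lang=ja"
SCREEN = notification_screen(ADDRESS)
```

Escaping the address before embedding it keeps the screen well-formed even when the address contains `&` or quotation marks.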
Next, client computer 10 receives the information on identification dialog server 30 from identification dialogue selection server 20 and requests the notified identification dialog server 30 to start the speech recognition dialogue (step 507). One example of a method of requesting the start of the speech recognition dialogue is to use the HTTP POST command to send the URL of the executable program that carries out the identification dialogue, together with the arguments required for the speech recognition dialogue. Examples of the arguments include a document describing the service content (VoiceXML, etc.), a service name, and a command for carrying out the speech recognition dialogue.
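The start request of step 507 can be sketched with the standard library as below. The argument names `service` and `command`, and the example URL, are assumptions; the text names only the kinds of argument that may be sent. The request object is built but not sent, so the sketch does not require a reachable server.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_start_request(program_url: str, service_name: str, command: str = "start") -> Request:
    """Build an HTTP POST request that asks the dialogue program to start."""
    body = urlencode({"service": service_name, "command": command}).encode("ascii")
    return Request(
        program_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical address of the dialogue program on identification dialog server 30.
START_REQUEST = build_start_request(
    "http://dialog-server.example/cgi-bin/dialog", "name_recognition"
)
```

Passing the object to `urllib.request.urlopen` would send it. The termination request of step 509 could be built the same way with a different command value.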
Next, on receiving the request to start the speech recognition dialogue from client computer 10, identification dialog server 30 carries out the speech recognition dialogue (step 508). In Fig. 5, the dotted line connecting step 508 and step 509 indicates that data are exchanged several times between the terminal and the identification dialog server. The speech recognition dialog process is described in detail later with reference to Fig. 6.
When the speech recognition dialogue is to be ended, client computer 10 requests termination of the identification dialogue (step 509). Examples of how to request termination include using the HTTP POST command to send the address of an executable program that stops the identification dialogue, and using the HTTP POST command to send the address of the executable program that carries out the identification dialogue together with a command to stop it. The identification dialog server receives the termination request from client computer 10 and stops the identification dialogue (step 510).
Next, the speech recognition dialog process is described. Fig. 6 is a flow chart of the speech recognition dialog process in the speech recognition dialogue method according to the embodiment of the invention.
First, the speech input to data input unit 110 of client computer 10 is passed to controller 120, which then carries out data processing. Examples of the data processing include digitization, speech detection, and speech analysis.
Next, the processed voice data are sent from data communication unit 130 to the identification dialog server (step 601). Examples of the voice data include digitized voice data, compressed voice data, and feature vectors.
In identification dialog server 30, data communication unit 310 receives the voice data sent continuously from client computer 10 (step 602); controller 320 then determines that the received data are voice data and passes them to speech recognition dialogue execution unit 330. Speech recognition dialogue execution unit 330, which has the recognition engine, recognition dictionary, synthesis engine, and synthesis dictionary required for the speech recognition dialogue, continuously carries out the speech recognition dialog process (step 603).
The content of the speech recognition dialog processing can vary according to the type of voice data sent by client computer 10. For example, if the voice data sent are compressed voice data, expansion of the compressed data, speech analysis, and recognition processing are carried out. If feature vectors are sent, only the speech recognition processing is carried out. After the recognition processing is finished, the recognition result is sent to client computer 10 (step 604). The recognition result can take the form of text, synthesized or recorded speech corresponding to the text, the URL of a screen reflecting the recognized content, and so on. Client computer 10 processes the recognition result received from identification dialog server 30 according to its form (step 605). For example, it outputs speech when the result is synthesized or recorded speech, and displays the screen when the result is the URL of a screen.
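The dependence of the server-side processing on the voice data type can be sketched as a simple dispatch. The format labels and step names are hypothetical; only the two cases given in the text are covered, and other formats are rejected rather than guessed.

```python
from typing import List


def processing_steps(data_format: str) -> List[str]:
    """Return the server-side steps needed for a given voice data format."""
    if data_format == "compressed":
        # Compressed voice data: expand, analyze, then recognize.
        return ["expand", "analyze", "recognize"]
    if data_format == "feature_vector":
        # Feature vectors were already analyzed on the client side.
        return ["recognize"]
    # The text does not spell out the handling of other formats.
    raise ValueError(f"unsupported voice data format: {data_format}")


STEPS_FOR_COMPRESSED = processing_steps("compressed")
STEPS_FOR_FEATURES = processing_steps("feature_vector")
```

Sending feature vectors shortens the server-side pipeline, which is why the voice data format is part of the performance information compared during server selection.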
In this way, the process from step 601 to step 605 is repeated several times, whereby the voice dialogue is carried out.
Second, the case is described in which, in the speech recognition conversational system according to the embodiment of the invention, the identification dialog server 30 carrying out the speech recognition dialog process is replaced with another, new identification dialog server 80.
Fig. 7 is a flow chart of the process in which, in the speech recognition conversational system according to the embodiment of the invention, identification dialogue selection server 20 determines new identification dialog server 80 while identification dialog server 30 is carrying out the identification dialog process.
In Fig. 7, when, after data have been exchanged several times between client computer 10 and identification dialog server 30, the processing needs to be carried out on new identification dialog server 80, identification dialog server 30 requests identification dialogue selection server 20 to transfer the processing to new identification dialog server 80 (step 703). In Fig. 7, the dotted line connecting step 702 and step 703 indicates that data are exchanged several times between the terminal and the identification dialog server.
A server transfer request can arise, for example, when the service content changes during the dialogue, when a mismatch arises between the service content and the server performance, or when the identification dialog server fails.
Next, identification dialogue selection server 20 requests from client computer 10 the performance information of client computer 10 (step 704).
After receiving the request for performance information from identification dialogue selection server 20, client computer 10, through controller 120, sends the performance information of client computer 10 stored in terminal information memory 140 from data communication unit 130 to identification dialogue selection server 20 (step 705).
Identification dialogue selection server 20 receives the performance information sent from client computer 10 and reads the performance information of the plurality of identification dialog servers stored in identification dialog server information-storing device 230. Identification dialog server determining unit 220 compares the performance information of client computer 10 with the performance information of the plurality of identification dialog servers (step 706) and, additionally taking into account the service content that caused the transfer request from the identification dialog server, determines the optimum identification dialog server (step 707). The performance information of client computer 10, the performance information of the identification dialog servers, and the method of determining the identification dialog server are the same as described above.
Next, identification dialogue selection server 20 notifies client computer 10 of the information on new identification dialog server 80 determined by identification dialog server determining unit 220 (step 708). One example of a notification method is to embed, in an HTML screen, the address of new identification dialog server 80 or the address of the executable program that carries out the identification dialogue on new identification dialog server 80.
Next, client computer 10 receives the address information of new identification dialog server 80 and requests the notified new identification dialog server 80 to start the speech recognition dialogue (step 709). One example of a method of requesting the start of the speech recognition dialogue is to use the HTTP POST command to send the URL of the executable program that carries out the identification dialogue, together with the parameters required for the speech recognition dialogue.
Third, in the speech recognition conversational system according to the embodiment of the invention, the identification dialogue selection server 20 and identification dialog server 30 described above can be arranged in the same server, forming identification dialogue performance server 40, which can both carry out the speech recognition dialogue and select a suitable speech recognition dialog server.
Fig. 8 shows the block diagram according to the identification dialogue performance server 40 of the embodiment of the invention.
As shown in Fig. 8, identification dialogue performance server 40 is formed by adding identification dialog server determining unit 440 and identification dialog server information-storing device 450 to identification dialog server 30 shown in Fig. 3. The other parts, namely data communication unit 410, controller 420, and speech recognition dialogue execution unit 430, are identical to the corresponding components in Fig. 3.
That is, controller 420, speech recognition dialogue execution unit 430, which carries out speech recognition and dialogue, and data communication unit 410, which communicates through network 1, are identical to controller 320, speech recognition dialogue execution unit 330, and data communication unit 310, respectively.
When a plurality of identification dialog servers exist, identification dialog server determining unit 440 selects and determines the optimum identification dialog server. Identification dialog server information-storing device 450 stores the performance information of the identification dialog servers to be selected and determined. Examples of the performance of an identification dialog server are the same as in the first case and include CODEC performance (CODEC type, CODEC compression mode, etc.), voice data format (compressed voice data, feature vectors, etc.), recorded-speech output function, synthesized-speech output function (no synthesis engine, an engine that produces intermediate-representation output, an engine that produces waveform output, etc.), service content, recognition engine performance (task-dedicated engine, instruction engine, command recognition engine, etc.), operation information, and so on.
In this case, identification dialogue performance server 40 carries out the process shown in Fig. 5 by itself.
Next, the case is described in which the identification dialogue performance server 40 carrying out the speech recognition dialog process is replaced with another, new identification dialog server 80.
Fig. 9 is a flow chart of the process in which, in the speech recognition dialogue method according to the embodiment of the invention, identification dialogue performance server 40 determines new identification dialog server 80 during the identification dialog process.
Referring to Fig. 9, when, after data have been exchanged several times between the terminal and the identification dialog server, the processing needs to be carried out on new identification dialog server 80, identification dialogue performance server 40 requests from client computer 10 the performance information of client computer 10 (step 903). In Fig. 9, the dotted line connecting step 902 and step 903 indicates that data are exchanged several times between the terminal and the identification dialog server.
The performance information of client computer 10 can be requested, for example, when the service content changes during the dialogue, when a mismatch arises between the service content and the server performance, or when the identification dialog server fails.
Next, after receiving the request for performance information from identification dialogue performance server 40, client computer 10, through controller 120, sends the performance information of client computer 10 stored in terminal information memory 140 from data communication unit 130 to identification dialogue performance server 40 (step 904).
Identification dialogue performance server 40 receives the performance information sent from client computer 10 and reads the performance information of the plurality of identification dialog servers stored in identification dialog server information-storing device 450. Identification dialog server determining unit 440 compares the performance information of client computer 10 with the performance information of the plurality of identification dialog servers (step 905) and, additionally taking into account the service content requested by client computer 10, determines the optimum identification dialog server (step 906). The performance information of client computer 10, the performance information of the identification dialog servers, and the method of determining the identification dialog server are the same as described above.
Next, identification dialogue performance server 40 notifies client computer 10 of the information on new identification dialog server 80 determined by identification dialog server determining unit 440 (step 907). One example of a notification method is to embed, in an HTML screen, the address of new identification dialog server 80 or the address of the executable program that carries out the identification dialogue on new identification dialog server 80.
Next, client computer 10 receives the address information of new identification dialog server 80 and requests the notified new identification dialog server 80 to start the speech recognition dialogue (step 908). One example of a method of requesting the start of the speech recognition dialogue is to use the HTTP POST command to send the URL of the executable program that carries out the identification dialogue, together with the parameters required for the speech recognition dialogue.
Fourth, the case is described in which, in the speech recognition conversational system according to the embodiment of the invention, identification dialog server C 50 reads in service content from service content reservation server 60, such as a content provider's server. In this case, service content reservation server 60 can be arranged within identification dialogue selection server 20, forming a web server that uses the web as the interface through which the service is provided to the user. Also in this case, client computer 10 can be provided with a web browser as the interface for selecting or entering service content.
Fig. 10 is a schematic diagram of identification dialog server C 50 (identification dialog server device) according to the embodiment of the invention. Identification dialog server device 50 shown in Fig. 10 is configured by adding speech recognition dialogue start unit 530 and service content read unit 540 to identification dialogue performance server 40 shown in Fig. 8. The other parts, such as data communication unit 510, controller 520, speech recognition dialogue execution unit 550, identification dialog server determining unit 560, and identification dialog server information-storing device 570, are identical to the corresponding components in Fig. 8.
Speech recognition dialogue start unit 530 starts the speech recognition dialog process according to the service information sent by client computer 10 and requests the service content from the server that holds it. Service content includes address recognition, name recognition, recognition of incoming call tone titles, telephone number recognition, and credit card number recognition.
Service content read unit 540 reads in the service content from service content reservation server 60. Speech recognition dialogue execution unit 550, controller 520, and data communication unit 510 are identical to speech recognition dialogue execution unit 430, controller 420, and data communication unit 410, respectively. Identification dialog server information-storing device 570 and identification dialog server determining unit 560 are optional. When they are not provided, the identification dialog server is determined by identification dialogue selection server 20. When they are provided, they are identical to identification dialog server information-storing device 450 and identification dialog server determining unit 440, respectively.
Fig. 11 is a flow chart of the process in which, in the speech recognition dialogue method according to the embodiment of the invention, identification dialog server C 50 reads in service content from service content reservation server 60.
The processing of steps 1101 to 1105 in Fig. 11 is identical to the processing of steps 501 to 506 described above.
Next, according to the information on identification dialog server C 50 notified by identification dialogue selection server 20, client computer 10 requests identification dialog server C 50 to start the speech recognition dialogue (step 1106). The service information is sent together with the request.
One example of a method of requesting the start of the speech recognition dialogue is to use the HTTP POST command to send the URL of the executable program that carries out the identification dialogue, together with the service content information. The service content information includes a document describing the service content (VoiceXML, etc.) and a service name.
Next, identification dialog server C 50 receives the request from client computer 10 at data communication unit 510, starts the speech recognition dialog process in speech recognition dialogue start unit 530, and requests the service content from service content reservation server 60 according to the service information sent by client computer 10 (step 1107).
One example of a method of requesting the service content applies when the service content information sent from client computer 10 is an address; in that case, the address is accessed. When the service content information sent from client computer 10 is a service name, another example is to obtain the address corresponding to the service name and access that address.
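The two request methods above can be sketched as a single resolution step. The lookup table, its entries, and the rule that anything starting with `http://` or `https://` counts as an address are assumptions made for illustration; the text does not say how a service name is mapped to an address.

```python
from typing import Mapping

# Hypothetical directory mapping service names to service content addresses.
SERVICE_DIRECTORY = {
    "address_recognition": "http://content.example/services/address.vxml",
    "name_recognition": "http://content.example/services/name.vxml",
}


def resolve_service_address(service_info: str, directory: Mapping[str, str]) -> str:
    """Return the address to access for the given service content information."""
    if service_info.startswith(("http://", "https://")):
        # The client sent an address: access it directly.
        return service_info
    try:
        # The client sent a service name: look up the corresponding address.
        return directory[service_info]
    except KeyError:
        raise LookupError(f"unknown service name: {service_info}") from None


BY_NAME = resolve_service_address("name_recognition", SERVICE_DIRECTORY)
BY_ADDRESS = resolve_service_address("http://other.example/card.vxml", SERVICE_DIRECTORY)
```

Identification dialog server C 50 would then fetch the resolved address from service content reservation server 60, which corresponds to steps 1107 to 1109.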
Next, service content reservation server 60 receives the request from identification dialog server C 50 and sends the service content (step 1108). Identification dialog server C 50 receives the service content at data communication unit 510, reads it in with service content read unit 540 (step 1109), and then starts the speech recognition dialog process (step 1110).
The processing of steps 1110 to 1112 in Fig. 11 is identical to the processing of steps 507 to 510 in Fig. 5, and the dotted line connecting step 1110 and step 1111 indicates that data are exchanged several times between the terminal and the identification dialog server.
In the system described above, an example has been given in which both identification dialogue selection server 20 and identification dialog server C 50 are connected to the network. However, a configuration in which only one of them is connected to the network is also acceptable.
Each of the steps described above can be realized by a program running on server computer 901. Fig. 12 is a schematic diagram of server computer 901, which runs the program that carries out the speech recognition dialogue method according to the embodiment of the invention, and of recording medium 902, on which the program is recorded.
Industrial applicability
According to the present invention as described above, even when a plurality of identification dialog servers exist, the optimum identification dialog server can be selected and determined from among them to carry out the speech recognition dialogue.
In addition, even if, for any of various reasons, processing needs to be carried out on a new identification dialog server during the dialogue, the client computer can automatically access another suitable identification dialog server so that the identification dialog process can continue.

Claims (32)

1. A speech recognition Interface, comprising:
a plurality of Interfaces for carrying out a speech recognition dialogue;
a dispensing device for sending speech information to the Interfaces;
a network connecting the dispensing device and the Interfaces; and
a selecting arrangement that selects one Interface from the plurality of Interfaces according to the performance of the dispensing device and the performance of the plurality of Interfaces, sends information specifying the selected Interface to the dispensing device, and causes the information required for the speech recognition dialogue to be exchanged between the selected Interface and the dispensing device.
2. The speech recognition Interface according to claim 1, further comprising a request unit for requesting a service from the Interface, wherein the network connects the dispensing device, the request unit, and the Interface.
3. The speech recognition Interface according to claim 1 or 2, further comprising a service retaining device for retaining the service content requested of the Interface, wherein the network connects the service retaining device, the dispensing device, and the Interface.
4. The speech recognition Interface according to claim 2, wherein the selecting arrangement has a function of sending information specifying the selected Interface to the dispensing device and of causing service content and speech information to be exchanged among the selected Interface, the request unit, and the dispensing device.
5. The speech recognition Interface according to claim 1, wherein the selecting arrangement has a function of changing the selected Interface to another Interface.
6. The speech recognition Interface according to claim 4, wherein the selecting arrangement has a function of changing the selected Interface to another Interface.
7. The speech recognition Interface according to any one of claims 1, 2, and 3, wherein the selecting arrangement has a function of comparing the performance of the dispensing device with the performance of the plurality of Interfaces and a function of determining, according to the comparison result, an Interface having the required performance, the required performance being that the input format of the speech information input to the Interface matches the output format of the speech information output from the dispensing device.
8. The speech recognition Interface according to claim 5, wherein the selecting arrangement has a function of comparing the performance of the dispensing device with the performance of the plurality of Interfaces and a function of determining, according to the comparison result, an Interface having the required performance, the required performance being that the input format of the speech information input to the Interface matches the output format of the speech information output from the dispensing device.
9. The speech recognition Interface according to claim 6, wherein the selecting arrangement has a function of comparing the performance of the dispensing device with the performance of the plurality of Interfaces and a function of determining, according to the comparison result, an Interface having the required performance, the required performance being that the input format of the speech information input to the Interface matches the output format of the speech information output from the dispensing device.
10. The speech recognition Interface according to claim 2 or 4, wherein the selecting arrangement has a function of comparing the service and performance of the dispensing device with the performance of the plurality of Interfaces and a function of determining, according to the comparison result, an Interface having the required performance, the required performance being that the input format of the speech information input to the Interface matches the output format of the speech information output from the dispensing device.
11. The speech recognition Interface according to claim 5, wherein the selecting arrangement has a function of comparing the service and performance of the dispensing device with the performance of the plurality of Interfaces and a function of determining, according to the comparison result, an Interface having the required performance, the required performance being that the input format of the speech information input to the Interface matches the output format of the speech information output from the dispensing device.
12. The speech recognition Interface according to claim 6, wherein the selecting arrangement has a function of comparing the service and performance of the dispensing device with the performance of the plurality of Interfaces and a function of determining, according to the comparison result, an Interface having the required performance, the required performance being that the input format of the speech information input to the Interface matches the output format of the speech information output from the dispensing device.
13. The speech recognition Interface according to claim 1, wherein the speech information output from the dispensing device consists of digitized voice data, compressed speech data, or characteristic vector data.
14. The speech recognition Interface according to claim 1, wherein the data used to determine the performance of the dispensing device comprises data on codec capability, voice data format, and recorded/synthesized speech input/output function.
15. The speech recognition Interface according to claim 1, wherein the data used to determine the performance of the Interface comprises data on codec capability, voice data format, recorded/synthesized speech output function, service content, recognition capability, and operation information.
16. A speech recognition Interface, comprising:
a plurality of speech recognition dialog servers for carrying out a speech recognition dialogue;
a client computer for sending speech information and the service content requested of the speech recognition dialog server;
a speech recognition dialogue selection server for selecting one Interface from among a plurality of Interfaces; and
a network connecting the client computer, the speech recognition dialog servers, and the speech recognition dialogue selection server; wherein
the client computer comprises: a data input unit for inputting speech information and service content data; a terminal information memory for storing the performance data of the client computer; a data communication unit for communicating with the speech recognition dialog server and the speech recognition dialogue selection server via the network and for sending speech information to the selected speech recognition dialog server; and a controller for controlling the operation of the client computer;
the speech recognition dialogue selection server comprises: a data communication unit for communicating with the client computer and the speech recognition dialog servers via the network; an identification dialog server information memory for storing the performance of each speech recognition dialog server; and an identification dialog server determining unit for reading the performance data of the client computer stored in the terminal information memory, comparing it with the performance data of the speech recognition dialog servers stored in the identification dialog server information memory, determining at least one speech recognition dialog server from among the plurality of speech recognition dialog servers, and then sending the client computer the information required to specify the determined speech recognition dialog server; and
the speech recognition dialog server comprises: a speech recognition dialogue execution unit for carrying out the speech recognition dialogue according to the speech information input from the client computer; a data communication unit for communicating with the client computer and the speech recognition dialogue selection server via the network; and a controller for controlling the operation of the speech recognition dialog server.
17. The speech recognition Interface according to claim 16, further comprising: a service content reservation server that is connected to the network and retains the service content requested by the client computer; and a reading unit that is arranged in the speech recognition dialog server and reads in the service content retained in the service content reservation server.
18. The speech recognition Interface according to claim 16 or 17, further comprising a process transfer device that is arranged in the speech recognition dialog server and outputs, to the speech recognition dialogue selection server, a request to transfer the speech recognition dialog process to another speech recognition dialog server.
19. The speech recognition Interface according to claim 16, wherein the speech information output from the client computer consists of digital voice data, compressed speech data, or characteristic vector data.
20. The speech recognition Interface according to claim 16, wherein the performance data of the client computer comprises data on codec capability, voice data format, and recorded/synthesized speech input/output function.
21. The speech recognition Interface according to claim 16, wherein the performance data of the speech recognition dialog server comprises data on codec capability, voice data format, recorded/synthesized speech output function, service content, recognition capability, and operation information.
22. A speech recognition dialogue selection method for carrying out data communication between a dispensing device and a plurality of Interfaces via a network and for carrying out a process of sending speech information data output from the dispensing device to a specified Interface, the method comprising:
a first step of receiving speech information data from the dispensing device;
a second step of requesting the performance data of the dispensing device from the dispensing device;
a third step of sending the performance data of the dispensing device from the dispensing device;
a fourth step of comparing the performance data of the dispensing device with the performance data of the plurality of Interfaces and determining the specified Interface according to the comparison result;
a fifth step of notifying the dispensing device of information specifying the determined Interface; and
a sixth step of carrying out the speech recognition dialog process between the dispensing device and the determined Interface.
23. The speech recognition dialogue selection method according to claim 22, further comprising:
a seventh step of sending, during the speech recognition dialog process between the dispensing device and the Interface, a request to transfer the dispensing device's counterpart from that Interface to another Interface;
an eighth step of requesting the performance data of the dispensing device from the dispensing device;
a ninth step of sending the performance data of the dispensing device from the dispensing device in response to the request in the eighth step;
a tenth step of comparing the performance data of the dispensing device with the performance data of the plurality of Interfaces and determining a new Interface according to the comparison result;
an eleventh step of notifying the dispensing device of the information required to specify the Interface determined in the tenth step; and
a twelfth step of carrying out the speech recognition dialog process between the Interface determined in the tenth step and the dispensing device.
24. The speech recognition dialogue selection method according to claim 22, wherein speech information comprising digitized voice data, compressed speech data, or characteristic vector data is used as the speech information.
25. The speech recognition dialogue selection method according to claim 22, wherein the performance data of the dispensing device comprises data on codec capability, voice data format, recorded/synthesized speech input/output function, and service content.
26. The speech recognition dialogue selection method according to claim 22, wherein the performance data of the Interface comprises data on codec capability, voice data format, recorded/synthesized speech output function, service content, recognition capability, and operation information.
27. A speech recognition dialogue selection method for carrying out data communication among a dispensing device, a plurality of Interfaces, and a service retaining device via a network and for carrying out a process of sending speech information data output from the dispensing device to a specified Interface, the method comprising:
a first step of receiving a request, output from the dispensing device, that includes the service content of the speech recognition dialog process;
a second step of requesting the performance data of the dispensing device from the dispensing device;
a third step of sending the performance data of the dispensing device from the dispensing device;
a fourth step of comparing the performance data of the dispensing device with the performance data of the plurality of Interfaces and determining the specified Interface from among the plurality of Interfaces according to the comparison result;
a fifth step of notifying the dispensing device of the information required to specify the Interface determined in the fourth step;
a sixth step of carrying out the speech recognition dialog process between the dispensing device and the Interface determined in the fourth step;
a seventh step of requesting, from the Interface determined in the fourth step to the service retaining device, the service content requested by the dispensing device;
an eighth step of sending the service content requested in the seventh step to the Interface determined in the fourth step;
a ninth step of reading in, by the Interface determined in the fourth step, the service content sent in the eighth step; and
a tenth step of carrying out, according to the service content read in, the speech recognition dialog process between the dispensing device and the Interface determined in the fourth step.
28. The speech recognition dialogue selection method according to claim 27, further comprising:
an eleventh step of sending, during the speech recognition dialog process between the dispensing device and the Interface, a request to transfer the dispensing device's counterpart from that Interface to another Interface;
a twelfth step of requesting the performance data of the dispensing device from the dispensing device;
a thirteenth step of sending the performance data of the dispensing device from the dispensing device;
a fourteenth step of comparing the performance data of the dispensing device with the performance data of the plurality of Interfaces and determining a new Interface according to the comparison result;
a fifteenth step of notifying the dispensing device of the information required to specify the Interface determined in the fourteenth step; and
a sixteenth step of carrying out the speech recognition dialog process between the Interface determined in the fourteenth step and the dispensing device.
29. A speech recognition dialogue selecting arrangement for carrying out data communication between a dispensing device and a plurality of Interfaces via a network, so as to carry out a process of selecting a specified Interface and sending speech information data output from the dispensing device to the specified Interface, the arrangement comprising:
a first device for receiving, from the dispensing device, speech information and data indicating that the Interface is to be changed;
a second device for requesting the performance data of the dispensing device from the dispensing device;
a third device for sending the performance data from the dispensing device in response to the request from the second device;
a fourth device for comparing the performance data of the dispensing device with the performance data of the plurality of Interfaces and determining an Interface according to the comparison result; and
a fifth device for notifying the dispensing device of information specifying the Interface determined by the fourth device.
30. The speech recognition dialogue selecting arrangement according to claim 29, wherein the speech information comprises digitized voice data, compressed speech data, or characteristic vector data.
31. The speech recognition dialogue selecting arrangement according to claim 29, wherein the performance data of the dispensing device comprises data on codec capability, voice data format, recorded/synthesized speech input/output function, and service content.
32. The speech recognition dialogue selecting arrangement according to claim 29, wherein the performance data of the Interface comprises data on codec capability, voice data format, recorded/synthesized speech output function, service content, recognition capability, and operation information.
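The comparison recited in the claims (matching the output format of the speech information from the dispensing device against the input format accepted by each Interface) can be illustrated with the short sketch below. It is not part of the claims. The type names, the `select_interface` function, the format labels such as "pcm" and "feature_vector", and the "first match wins" policy are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientCapability:
    # Speech data formats the dispensing device can output (assumed labels).
    output_formats: frozenset


@dataclass(frozen=True)
class ServerCapability:
    name: str
    # Speech data formats the Interface accepts as input (assumed labels).
    input_formats: frozenset
    # Simplified stand-in for the "operation information" in the performance data.
    in_operation: bool = True


def select_interface(client, servers):
    """Return the first operating server whose input format matches
    one of the client's output formats, or None if there is no match."""
    for server in servers:
        if server.in_operation and client.output_formats & server.input_formats:
            return server
    return None
```

A real selection could also weigh codec capability, recognition capability, and service content, all of which appear in the performance data listed in the claims; the sketch checks only the format match and the operating state.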
CNB038003465A 2002-04-04 2003-03-12 Speech recognition conversation selection device, speech recogntion conversation system, speech recognition conversation selection method, and program Expired - Fee Related CN1282946C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP102274/2002 2002-04-04
JP2002102274A JP2003295890A (en) 2002-04-04 2002-04-04 Apparatus, system, and method for speech recognition interactive selection, and program

Publications (2)

Publication Number Publication Date
CN1514995A CN1514995A (en) 2004-07-21
CN1282946C true CN1282946C (en) 2006-11-01

Family

ID=28786256

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB038003465A Expired - Fee Related CN1282946C (en) 2002-04-04 2003-03-12 Speech recognition conversation selection device, speech recogntion conversation system, speech recognition conversation selection method, and program

Country Status (6)

Country Link
US (1) US20040162731A1 (en)
EP (1) EP1394771A4 (en)
JP (1) JP2003295890A (en)
CN (1) CN1282946C (en)
TW (1) TWI244065B (en)
WO (1) WO2003085640A1 (en)

Also Published As

Publication number Publication date
JP2003295890A (en) 2003-10-15
TWI244065B (en) 2005-11-21
CN1514995A (en) 2004-07-21
EP1394771A1 (en) 2004-03-03
TW200307908A (en) 2003-12-16
WO2003085640A1 (en) 2003-10-16
US20040162731A1 (en) 2004-08-19
EP1394771A4 (en) 2005-10-19

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20061101

Termination date: 20160312
