WO2015141700A1

WO2015141700A1 - Dialogue system construction support apparatus and method

Info

Publication number: WO2015141700A1
Application number: PCT/JP2015/057970
Authority: WO
Inventors: Yumiko Shimogori; Kenji Iwata; Masahiro Ito; Hisayoshi Nagae
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2014-03-18
Filing date: 2015-03-11
Publication date: 2015-09-24
Anticipated expiration: 2016-09-18
Also published as: JP2015176099A

Abstract

According to an embodiment, a dialogue system construction support apparatus includes the following units. The speech recognition unit performs speech recognition for utterances included in a dialogue to generate texts. The spoken language understanding unit understands intentions of the utterances based on the texts and obtain a spoken language understanding result including types of the utterances, the intentions of the utterances, words included in the texts, and semantic classes of the words. The scenario construction unit acquires, as an attribute, a word having a semantic class common to an utterance of question and an utterance of answer and constructs a scenario using the attribute and an action executed by the operator concerning the dialogue.

Description

D E S C R I P T I O N

DIALOGUE SYSTEM CONSTRUCTION SUPPORT APPARATUS AND METHOD

Cross-Reference to Related Applications This application is based upon and claims the benefit of priority from Japanese Patent Application

No. 2014-054491, filed March 18, 2014, the entire contents of which are incorporated herein by reference.

Field

Embodiments described herein relate generally to a dialogue system construction support apparatus and method.

Background

There exists a dialogue system such as an interactive voice response apparatus that automatically responds to an utterance of a user. Such a dialogue system responds in accordance with a scenario constructed in advance. The dialogue system may fail in responding because of a

scenario that does not meet a request of a user. In this case, an operator responds to the user. To respond well to a similar request received later, a new scenario needs to be added to the dialogue system.

It is necessary to construct a scenario from the dialogue between the user and the operator concerning the response failure of the dialogue system. JP-B 4901738 discloses an automated response system that performs learning using a conversation set between an agent and a user. However, there is no technique of constructing a scenario from a dialogue between a user and an operator in consideration of an action the operator has taken during the response .

Brief Description of Drawings

FIG. 1 is a block diagram schematically showing a dialogue system according to an embodiment;

FIG. 2 is a flowchart showing an example of the procedure of dialogue log recording according to the embodiment ;

FIG. 3 is a view showing an example of a dialogue between a user and an operator;

FIG. 4 is a view showing examples of an utterance type;

FIG. 5 is a view showing examples of an intention tag; FIG. 6 is a view showing examples of an action;

FIG. 7 is a view showing examples of a semantic class; FIG. 8 is a view showing an example of action

contents ;

FIG. 9 is a view showing a dialogue log concerning the dialogue shown in FIG . 3 ;

FIG. 10 is a flowchart showing an example of the procedure of scenario construction according to the

embodiment ;

FIG. 11A and FIG. 11B are views showing examples of a scenario constructed from the dialogue log shown in FIG. 9;

FIG. 12 is a view showing an example of a dialogue between the user and the operator, in which the user makes an utterance other than an answer in response to a question of the operator;

FIG. 13 is a view showing evaluation data used by a scenario construction unit shown in FIG. 1 to evaluate a scenario;

FIG. 14 is a flowchart showing an example of the procedure of action candidate display according to the embodiment ; and

FIG. 15 is a view showing an example of contents displayed by a dialogue state display unit shown in FIG. 1.

Detailed Description

According to an embodiment, a dialogue system

construction support apparatus includes a speech

recognition unit, a spoken language understanding unit, a dialogue information storage unit, and a scenario

construction unit. The speech recognition unit is

configured to perform speech recognition for utterances included in a dialogue between a user and an operator and generate a speech recognition result including texts corresponding to the utterances . The spoken language understanding unit is configured to understand intentions of the utterances based on the texts and obtain a spoken language understanding result including types of the utterances, the intentions of the utterances, words

included in the texts, and semantic classes of the words. The dialogue information storage unit is configured to store the speech recognition result, the spoken language understanding result, and an action executed by the

operator concerning the dialogue in association with each other. The scenario construction unit is configured to acquire, as an attribute, a word having a semantic class common to an utterance of question and an utterance of answer included in the dialogue and construct a scenario using the attribute and the action.

Hereinafter, embodiments will be described with reference to the accompanying drawings . The embodiments are directed to a dialogue system that automatically responds to an utterance of a user. This dialogue system is used in, for example, a contact center. The dialogue system selects a scenario meeting the utterance of the user from scenarios (dialogue scenarios) registered in advance and responds in accordance with the scenario. When the dialogue system has failed in responding, an operator responds via a dialogue with the user. The dialogue system can construct a new scenario based on the dialogue between the user and the operator and an action of the operator. As a result, the dialogue system can respond well to a similar request received later. In addition, the scenario construction cost can be reduced. It is also possible to decrease the necessary number of operators.

FIG. 1 schematically shows a dialogue system 100 according to an embodiment. As shown in FIG. 1, the dialogue system 100 includes a speech recognition unit 101, a spoken language understanding unit 102, a dialogue management unit 103, a response generation unit 104, a dialogue extraction unit 105, a scenario construction unit 106, a scenario updating unit 107, a dictionary storage unit 108, a spoken language understanding model storage unit 109, a scenario storage unit 110, a dialogue log storage unit (also called a dialogue information storage unit) 111, a dialogue state display unit 112, a scenario searching unit 113, and a scenario object database (DB) 114.

Automatic response processing of the dialogue system 100 will briefly be explained first. For example, the user communicates with the dialogue system 100 via a network using a terminal such as a mobile-phone or a smartphone. The dialogue system 100 provides a service to the terminal via the network by the automatic response processing. For example, the dialogue system 100 transmits, to the

terminal, data of a map illustrating the destination of the user, as in an example to be described later.

The dialogue system 100 executes the automatic

response processing using the speech recognition unit 101, the spoken language understanding unit 102, the dialogue management unit 103, the response generation unit 104, the dictionary storage unit 108, the spoken language

understanding model storage unit 109, and the scenario object DB 114. The speech recognition unit 101 performs speech recognition for the utterance of the user, and generates a natural language text (to be simply referred to as a text hereinafter) corresponding to the utterance. The spoken language understanding unit 102 analyzes the text by referring to the dictionary storage unit 108 and the spoken language understanding model storage unit 109 so as to understand the intention of the utterance, and outputs the spoken language understanding result. The dialogue

management unit 103 selects a scenario corresponding to the spoken language understanding result from the scenario object DB 114, and executes an action (for example,

transmission of map data) defined in the selected scenario. The response generation unit 104 generates a response sentence corresponding to the action executed by the dialogue management unit 103. The response sentence is converted into speech by a speech synthesis technology and output .

Scenario construction processing of the dialogue system 100 will be described next.

In the dialogue system 100, the dialogue with the user may fail because, for example, a scenario meeting a request of the user does not exist in the scenario object DB 114. When the dialogue with the user has failed, the dialogue management unit 103 transfers the connection with the user to an operator. The dialogue management unit 103 can also transfer the connection with the user to the operator when a predetermined condition has occurred during a response. A dialogue between the user and the operator thus starts . The dialogue system 100 analyzes the dialogue between the user and the operator. The dialogue system 100

constructs a new scenario based on the analysis result so as to respond well to a similar request received later. The scenario construction processing is performed using the speech recognition unit 101, the spoken language

understanding unit 102, the dialogue extraction unit 105, the scenario construction unit 106, the scenario updating unit 107, the dictionary storage unit 108, the spoken language understanding model storage unit 109, the scenario storage unit 110, the dialogue log storage unit 111, the dialogue state display unit 112, and the scenario searching unit 113. A portion including these elements associated with the scenario construction processing will be referred to as a dialogue system construction support unit. The dialogue system construction support unit may be included in the dialogue system 100, as shown in FIG. 1, or provided outside the dialogue system 100. When the dialogue system construction support unit is included in the dialogue system 100, the speech recognition unit 101, the spoken language understanding unit 102, the dictionary storage unit 108, and the spoken language understanding model storage unit 109 can be shared by automatic response processing and the scenario construction processing.

The speech recognition unit 101 performs speech recognition for a plurality of utterances included in the dialogue between the user and the operator, and generates a plurality of texts corresponding to the plurality of utterances, respectively. That is, the speech recognition unit 101 converts the plurality of utterances into the plurality of texts by a speech recognition technology.

Based on each text generated by the speech recognition unit 101, the spoken language understanding unit 102 understands the intention of the utterance corresponding to the text. More specifically, the spoken language

understanding unit 102 performs morphological analysis of each text, thereby dividing the text into words on a morpheme basis. Next, referring to a dictionary stored in the dictionary storage unit 108, the spoken language understanding unit 102 assigns a semantic class

representing the meaning of a word to each of nouns, proper nouns, verbs, and unknown words by a named entity

extraction technology. A plurality of words are registered in the dictionary in association with semantic classes.

The spoken language understanding unit 102 understands the intention of an utterance by referring to a spoken language understanding model stored in the spoken language understanding model storage unit 109 using features such as morphemes, the semantic classes of words, and notations of words, and outputs a spoken language understanding result. Spoken language understanding models are generated by learning using semantic classes, words, and the like from a number of utterance samples as features . The spoken language understanding method is not limited to the example described here.

The dialogue extraction unit 105 receives the spoken language understanding result from the spoken language understanding unit 102, and detects an operation performed for the dialogue system 100 by the operator during a response as the action of the operator. The action can be detected based on information received from a computer terminal operated by the operator. More specifically, the dialogue extraction unit 105 can receive, from the computer terminal, information representing the contents of an action executed by the operator. The dialogue extraction unit 105 records the analysis result of the dialogue between the user and the operator and the action of the operator in the dialogue log storage unit 111 in

association with each other. The analysis result of the dialogue includes the speech recognition result and the spoken language understanding result concerning an

utterance of the user and the speech recognition result and the spoken language understanding result concerning an utterance of the operator.

The scenario construction unit 106 constructs a scenario by referring to the dialogue log storage unit 111, and stores the scenario in the scenario storage unit 110. The scenario updating unit 107 updates the scenario object DB 114 by referring to the scenario storage unit 110. More specifically, the scenario updating unit 107 converts a scenario stored in the scenario storage unit 110 into an object executable by the dialogue management unit 103, and adds it to the scenario object DB 114 at an arbitrary timing. For example, a scenario stored in the scenario storage unit 110 is a text-based scenario, and a scenario stored in the scenario object DB 114 is an object-based scenario. Note that a scenario stored in the scenario object DB 114 may be a text-based scenario.

The scenario searching unit 113 extracts a scenario feature word from the dialogue between the user and the operator, and selects, as a similar scenario, a scenario associated with the scenario feature word from the scenario storage unit 110. The scenario feature word will be described later. The dialogue state display unit 112 displays the similar scenario. The dialogue state display unit 112 also displays the analysis result of the dialogue between the user and the operator.

The operation of the dialogue system 100 will be described next.

FIG. 2 schematically shows the procedure of dialogue log recording of the dialogue system 100. Here, a detailed example will be explained using a dialogue shown in FIG. 3. In step S201 of FIG. 2, the dialogue between the user and the operator starts. At this time, the dialogue extraction unit 105 records a dialogue start label representing the start of the dialogue in the dialogue log storage unit 111.

In step S202, the user or operator utters. In the dialogue example of FIG. 3, the user first utters "Where can I pick up the rental car I have reserved earlier?" In step S203, the speech recognition unit 101 performs speech recognition for the utterance input in step S202. In the dialogue example of FIG. 3, a text "Where can I pick up the rental car I have reserved earlier?" can be obtained as a speech recognition result.

In step S204, the spoken language understanding unit 102 understands the intention of the utterance from the speech recognition result, and outputs a spoken language understanding result. The spoken language understanding result includes an utterance type, an intention tag, and a semantic class . The utterance type represents the role of the utterance in the dialogue. Examples of the utterance type are "request", "greeting", "question", "response", "proposal", "confirmation", and "answer", as shown in

FIG. 4. The utterance type is output in a form

understandable by the machine, for example, as an utterance type ID. The intention tag is information representing an intention such as "flight timetable display", "rental car search", "rental car location display", "hotel rate

search", or "hotel reservation", as shown in FIG. 5. The intention tag is output in a form understandable by the machine, for example, as an intention tag ID.

In step S205, the dialogue extraction unit 105

extracts any one piece of information out of the intention tag, attribute, attribute value, and action contents from the utterance input in step S202, and records the speech recognition result, the spoken language understanding result, and the extracted information in the dialogue log storage unit 111 in association with each other. The process of step S205 will be described later.

In step S206, it is determined whether the dialogue has ended. For example, when an utterance representing the end of the dialogue is detected or when the operator executes an action, it is determined that the dialogue has ended. If the dialogue continues, the process returns to step S202. When the process returns to step S202, the next utterance occurs. In the dialogue example of FIG. 3, the operator utters "Location to pick up the rental car?" The processes of steps S203, S204, and S205 are executed for this utterance. Similarly, the operator's utterance "At which airport are you?", the user's utterance "At OO

airport", the operator's utterance "Which airline did you use?", and the user's utterance "xx airline", and the operator's utterance "I will send the map of the location to pick up the rental car" are sequentially processed. The operator utters "I will send the map of the location to pick up the rental car" and simultaneously transmits the map data to the terminal of the user by operating the computer terminal. The dialogue extraction unit 105 detects the action of the operator based on the spoken language understanding result of the utterance "I will send the map of the location to pick up the rental car" . The dialogue extraction unit 105 acquires the contents of the action executed by the operator during the response, and records them in the dialogue log storage unit 111.

Examples of the action are "transfer to car rental

operator", "flight timetable display", "airport facility information display", and "rental car search", as shown in FIG. 6. Each action is associated with an action ID.

When the dialogue has ended, the process advances to step S207. In step S207, the dialogue extraction unit 105 determines that the dialogue between the user and the operator has ended, and records a dialogue end label representing the end of the dialogue in the dialogue log storage unit 111. In the dialogue log storage unit 111, a log concerning one dialogue is recorded between a dialogue start label and a dialogue end label. The dialogue log concerning one dialogue includes the analysis result of the dialogue, scenario feature words, intention tags,

attributes and their semantic classes, attribute values and their semantic classes, and action contents.

The process of step S205 will be described in more detail .

In step S205-1, if the type of the utterance input in step S202 is confirmation, the dialogue extraction unit 105 extracts a scenario feature word from this utterance and a counterpart utterance. More specifically, the dialogue extraction unit 105 extracts, as the scenario feature word, a word common to the utterance of confirmation of one party (for example, operator) and the immediately preceding utterance of the other party (for example, user) . In the dialogue example of FIG. 3, the utterance of confirmation is the operator's utterance "Location to pick up the rental car?" The utterance as the counterpart to this is the immediately preceding user's utterance "Where can I pick up the rental car I have reserved earlier?" The common words are "rental car" and "pick up". Hence, "rental car" and "pick up" are extracted as the scenario feature words.

In step S205-2, the dialogue extraction unit 105 determines whether the utterance type is question. If the utterance type is question, the process advances to step S205-3. Otherwise, the process advances to step S205-4. In step S205-4, the dialogue extraction unit 105 determines whether the utterance type is answer. If the utterance type is answer, the process advances to step S205-5.

Otherwise, the process advances to step S205-6. In step S205-6, the dialogue extraction unit 105 determines whether the utterance is associated with the action of the

operator. If the utterance is associated with the action, the process advances to step S205-8. Otherwise, the process advances to step S205-7.

The dialogue extraction unit 105 acquires the

attribute from the utterance of question (step S205-3) , and acquires an attribute value from the utterance of answer that is the counterpart of the utterance of question (step S205-5) . Semantic classes can be defined by hierarchically classifying meanings, as shown in FIG. 7. Note that the semantic classes need not always be expressed in the hierarchical structure. The attribute value is an argument used to attain the intention represented by the intention tag.

More specifically, the dialogue extraction unit 105 acquires, out of words having a semantic class common to the utterance of question and the utterance of answer, a word in the utterance of question as an attribute and a word in the utterance of answer as an attribute value . In the dialogue example of FIG. 3, the user's answer to the operator's question "At which airport are you?" is "At OO airport". A semantic class common to these utterances is "Location_STATION_AIR" . In the operator's utterance "At which airport are you?", the word having the semantic class "Location_STATION_AIR" is "airport", and "airport" is extracted as an attribute. In the user's utterance "At OO airport", the word having the semantic class

"Location_STATION_AIR" is "OO airport", and "OO airport" is extracted as an attribute value. In addition, the user's answer to the operator's question "Which airline did you use?" is "xx airline". A semantic class common to these utterances is "Organization_COMPANY_AIR" . In the operator's utterance "Which airline did you use?", the word having the semantic class "Organization_COMPANY_AIR" is "airline", and "airline" is extracted as an attribute. In the user's utterance "xx airline", the word having the semantic class "Organization_COMPA Y_AIR" is "xx airline", and "xx airline" is extracted as an attribute value. The set of attribute "airport", attribute value "OO airport", and semantic class "Location_STATION_AIR" and the set of attribute "airline", attribute value "xx airline", and semantic class "Organization_COMPANY_AIR" are obtained from the dialogue example shown in FIG. 3.

Note that the dialogue extraction unit 105 does not necessarily extract the same word that appears in both an utterance of confirmation and an utterance as the

counterpart to that utterance as a scenario feature word, as in the above-described example, and may extract the same word that appears in a pair of an operator ¹ s utterance and a user's utterance such as a pair of question and answer as a scenario feature word.

Upon detecting the action of the operator, the

dialogue extraction unit 105 acquires the action contents (step S205-8) . The action contents include an operation that the operator actually executed for the system. FIG. 8 shows an example of action contents obtained when the operator operates an application in association with the dialogue example shown in FIG. 3. The action contents shown in FIG. 8 represent sending of a map illustrating the location to pick up the rental car.

The dialogue extraction unit 105 acquires an intention tag from an utterance that is neither of utterances

associated with question, answer, and action (step S205-7) . This utterance is recorded in the dialogue log storage unit 111 as an utterance having an intention that does not contribute to attain the purpose of the dialogue .

FIG. 9 shows a dialogue log associated with the dialogue example shown in FIG. 3. Referring to FIG. 9, "START OPERATOR" is the dialogue start label, and "END OPERATOR" is the dialogue end label. Pieces of information about utterances and actions are recorded between the dialogue start label and the dialogue end label . In the example of FIG. 9, the log of an utterance is described using colon separation as utterance subj ect : utterance type : utterance contents : intention tag. The utterance contents include a speech recognition result, words, and their semantic classes. Each semantic class is described in parentheses immediately after a word. The log of an action is described using colon separation as action subj ect : action contents.

FIG. 10 schematically shows the processing procedure of constructing a scenario from a dialogue log. In step S301 of FIG. 10, the scenario construction unit 106 loads a dialogue log from the dialogue log storage unit 111, and extracts a dialogue start label and a dialogue end label concerning a scenario construction target dialogue from the loaded dialogue log. In step S302, the scenario

construction unit 106 loads the log between the dialogue start label and the dialogue end label. In step S303, the scenario construction unit 106 generates a set of "input", "operation", and "state" as a unit of the scenario. FIGS. 11A and 11B show examples of a scenario constructed based on the dialogue log shown in FIG. 9. The scenario shown in FIG. 11A includes three states. The scenario shown in FIG. 11B includes one state. An input includes an intention tag and an attribute. An operation includes an operation tag.

In step S304, the scenario construction unit 106 acquires a semantic class common to an utterance whose type is question and an utterance whose type is answer and the word of the semantic class. Here, "common" is used as a term that means "same" or "being in inclusion relation". The scenario construction unit 106 uses the acquired word or semantic class as the attribute of the input.

The process of step S304 will be described in more detail. In step S304-1, the scenario construction unit 106 acquires words from an utterance whose type is question as attribute candidates and stores them in a memory. If the type of the next utterance is answer, in step S304-2, the scenario construction unit 106 acquires words from the utterance as attribute candidates and holds them in the memory. In step S304-3, the semantic classes of the words acquired in steps S304-1 and S304 -2 are compared, and an attribute is obtained from words having a common semantic class. For example, "airport" is acquired as an attribute from the pair of the operator ¹ s utterance "At which airport are you?" and the user's utterance "At OO airport". Note that the attribute acquisition method may be the same as that described concerning the process of step S205-3. Two attributes, "airport" and "airline", are obtained from the dialogue log of FIG. 9.

When an attribute is obtained in step S304-3, the process advances to step S304-5. In step S304-5, the scenario construction unit 106 generates an input condition using the attribute obtained in step S304 -3. More

specifically, the scenario construction unit 106 registers the attribute in a scenario as an input attribute

corresponding to the intention tag of the latest utterance whose type is request.

If no attribute is obtained as in a case where the utterance next to the utterance of question is not answer, the process advances to step S304-4. In step S304-4, the scenario construction unit 106 determines whether the user has returned a question in response to the question of the operator. For example, in a dialogue example shown in FIG. 12, the user responds by "Urn? I don't know" to the operator's question "At which terminal are you?" If the type of an utterance to a question is not answer, as described above, the scenario construction unit 106

determines it as a redundant response in the scenario (or may determine it as another response type) , that is, determines that the efficiency is low, and sets the

evaluation of the scenario under construction low. In step S304-6, the scenario construction unit 106 waits for an utterance whose type is answer. Upon detecting an utterance whose type is answer, the scenario construction unit 106 acquires an attribute from the pair of the

utterance of question and the utterance of answer and generates an input condition based on the attribute.

If the type of the utterance next to the utterance of question is not answer in step S304 -4, the process advances to step S304-7. In step S304 -7, the spoken language understanding unit 102 acquires an intention tag, and

"input" is generated based on the intention tag as well as the input condition generated in step S304-5.

In step S305, the scenario construction unit 106 ends the load of the dialogue log. In step S306, the scenario construction unit 106 replaces the word included in the action content with a semantic class serving as a variable. In step S307, the scenario construction unit 106 stores the constructed scenario in the scenario storage unit 110. The scenario is stored in association with a scenario feature word so as to enable a search by the scenario feature word.

Note that the scenario can be constructed so as to faithfully reproduce the dialogue between the user and the operator, as in the example of FIG. 11A, or constructed so as to receive necessary attributes at once, as in the example of FIG. 11B.

The scenario updating unit 107 converts the scenario stored in the scenario storage unit 110 into an object executable by the dialogue management unit 103 and adds it to the scenario object DB 114. As for the timing, the updating may be done automatically or based on an operation by an administrator. Similar scenarios may simultaneously be constructed for a plurality of operators. As shown in FIG. 13, the scenario storage unit 110 stores each scenario in association with a scenario feature word, the number of states, the number of response steps, and the number of response failures. The number of response failures

represents the number of failures in response as in a case where the user has made an utterance other than answer to a question of the operator. The number of states, the number of response steps, and the number of response failures are examples of evaluation data to evaluate a scenario, which are used when the administrator of the dialogue system 100 decides whether to add the scenario to the scenario object DB 114. The scenario updating unit 107 can display the evaluation data together with the scenarios so that the administrator of the dialogue system 100 can select the scenarios to be added to the scenario object DB 114.

FIG. 14 shows a procedure of presenting a candidate of an action to be executed to the operator during a response. In step S401 of FIG. 14, the scenario searching unit 113 extracts one or more scenario feature words from the dialogue between the user and the operator during the response of the operator. More specifically, the scenario searching unit 113 extracts, as the scenario feature words, words common to an utterance whose type is confirmation and an utterance as the counterpart to it. In step S402, the scenario searching unit 113 searches the scenario storage unit 110 using the scenario feature words as search keys . In step S403, it is determined whether there exists a similar scenario that is a scenario in which all or some of the scenario feature words match. If a similar scenario exists, the process advances to step

5404. Otherwise, the processing ends.

In step S404, the scenario searching unit 113 acquires action contents included in the similar scenario. In step

5405, the scenario searching unit 113 displays the acquired action contents as an action candidate via the dialogue state display unit 112. The operator decides an action to be executed with reference to the displayed action

candidate .

It is possible to assist the operator by displaying the action candidate in this way.

FIG. 15 shows an example of contents displayed by the dialogue state display unit 112. The dialogue state display unit 112 includes a conversation monitor, a spoken language understanding monitor, and an operation monitor. The conversation monitor displays the speech recognition result for the dialogue between the user and the operator by the speech recognition unit 101. The spoken language understanding monitor displays the spoken language

understanding result for the dialogue between the user and the operator by the spoken language understanding unit 102. The operation monitor displays an action candidate acquired by the scenario searching unit 113. In the example of FIG. 15, three action candidates are displayed.

When the dialogue state display unit 112 is provided, the operator can visually confirm the request of the user. If there are inadequacies in the speech recognition result and the spoken language understanding result, the speech recognition result and the spoken language understanding result need to be corrected to construct a useful scenario. In the example of FIG. 15, spoken language understanding fails because of a recognition error in speech recognition. When the speech recognition result and the spoken language understanding result are presented to the operator during the response, the operator can correct the speech

recognition result and the spoken language understanding result .

As described above, according to this embodiment, a necessary scenario can easily be added to the dialogue system by constructing the scenario based on the dialogue log including the analysis result of the dialogue between the user and the operator and the action of the operator.

Note that the dialogue system 100 according to the embodiment can also be implemented by, for example, using a general-purpose computer apparatus as basic hardware. That is, the speech recognition unit 101, the spoken language understanding unit 102, the dialogue management unit 103, the response generation unit 104, the dialogue extraction unit 105, the scenario construction unit 106, the scenario updating unit 107, the dialogue state display unit 112, and the scenario searching unit 113 can be implemented by causing a processor included in the computer apparatus to execute a program. At this time, the dialogue system can be implemented by installing the program in the computer apparatus in advance or by distributing the program stored in a storage medium such as a CD-ROM or via a network and installing the program in the computer apparatus as needed. The dialogue log storage unit, the scenario storage unit, the dictionary storage unit, and the spoken language understanding model storage unit can be implemented using an internal or external memory of the computer apparatus, a hard disk, or a storage medium such as a CD-R, CD-RW,

DVD-RAM, or DVD-R as needed.

While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions.

Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.

Claims

C L A I M S

1. A dialogue system construction support apparatus comprising:

a speech recognition unit configured to perform speech recognition for utterances included in a dialogue between a user and an operator and generate a speech recognition result including texts corresponding to the utterances; a spoken language understanding unit configured to understand intentions of the utterances based on the texts and obtain a spoken language understanding result including types of the utterances, the intentions of the utterances, words included in the texts, and semantic classes of the words ;

a dialogue information storage unit configured to store the speech recognition result, the spoken language understanding result, and an action executed by the

operator concerning the dialogue in association with each other; and

a scenario construction unit configured to acquire, as an attribute, a word having a semantic class common to an utterance of question and an utterance of answer included in the dialogue and construct a scenario using the

attribute and the action.

2. The apparatus according to claim 1, further comprising a dialogue extraction unit configured to

extract, as a scenario feature word, a common word that appears in a pair of an utterance of the operator and an utterance of the user included in the dialogue.

3. The apparatus according to claim 2, wherein the dialogue extraction unit extracts, as the scenario feature word, a word common to an utterance of confirmation and an utterance as a counterpart to the utterance of

confirmation, which are included in the dialogue.

4. The apparatus according to claim 2, further comprising:

a scenario storage unit configured to store scenarios in association with scenario feature words;

a scenario searching unit configured to search the scenario storage unit to acquire, as a similar scenario, a scenario associated with the scenario feature word

extracted by the dialogue extraction unit; and

a display unit configured to display an action included in the similar scenario.

5. The apparatus according to claim 1, further comprising a display unit configured to display the speech recognition result and the spoken language understanding result .

6. The apparatus according to claim 1, further comprising a scenario updating unit configured to add the scenario to a database of a dialogue system.

7. The apparatus according to claim 6, wherein the scenario construction unit generates evaluation data to evaluate the scenario, and

the scenario updating unit displays the scenario together with the evaluation data so that whether to add the scenario to the database is configured to be selected.

8. A dialogue system construction support method comprising :

performing speech recognition for utterances included in a dialogue between a user and an operator and generating a speech recognition result including texts corresponding to the utterances ;

understanding intentions of the utterances based on the texts and obtaining a spoken language understanding result including types of the utterances, the intentions of the utterances, words included in the texts, and semantic classes of the words;

storing the speech recognition result, the spoken language understanding result, and an action executed by the operator concerning the dialogue in association with each other; and

acquiring, as an attribute, a word having a semantic class common to an utterance of question and an utterance of answer included in the dialogue and constructing a scenario using the attribute and the action.

9. A non-transitory computer readable medium

including computer executable instructions, wherein the instructions, when executed by a processor, cause the processor to perform a method comprising:

understanding intentions of the utterances based on the texts and obtaining a spoken language understanding result including types of the utterances, the intentions o the utterances, words included in the texts, and semantic classes of the words;