WO2015141700A1 - Dialogue system construction support apparatus and method
- Publication number
- WO2015141700A1 (PCT/JP2015/057970)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- dialogue
- scenario
- utterance
- utterances
- speech recognition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/226—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
- G10L2015/228—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of application context
- FIG. 1 schematically shows a dialogue system 100 according to an embodiment.
- the dialogue system 100 includes a speech recognition unit 101, a spoken language understanding unit 102, a dialogue management unit 103, a response generation unit 104, a dialogue extraction unit 105, a scenario construction unit 106, a scenario updating unit 107, a dictionary storage unit 108, a spoken language understanding model storage unit 109, a scenario storage unit 110, a dialogue log storage unit (also called a dialogue information storage unit) 111, a dialogue state display unit 112, a scenario searching unit 113, and a scenario object database (DB) 114.
- Automatic response processing of the dialogue system 100 will briefly be explained first.
- The user communicates with the dialogue system 100 via a network using a terminal such as a mobile phone or a smartphone.
- the dialogue system 100 provides a service to the terminal via the network by the automatic response processing.
- The dialogue system 100 transmits, to the terminal, a response generated by the automatic response processing. The dialogue system 100 executes the automatic response processing as follows.
- the speech recognition unit 101 performs speech recognition for the utterance of the user, and generates a natural language text (to be simply referred to as a text hereinafter) corresponding to the utterance.
- the spoken language understanding unit 102 analyzes the text by referring to the dictionary storage unit 108 and the spoken language understanding model storage unit 109 so as to understand the intention of the utterance, and outputs the spoken language understanding result.
- The dialogue management unit 103 selects a scenario corresponding to the spoken language understanding result from the scenario object DB 114, and executes an action (for example, sending of a map) in accordance with the scenario.
- the response generation unit 104 generates a response sentence corresponding to the action executed by the dialogue management unit 103.
- The response sentence is converted into speech by a speech synthesis technology and output.
- the dialogue with the user may fail because, for example, a scenario meeting a request of the user does not exist in the scenario object DB 114.
- the dialogue management unit 103 transfers the connection with the user to an operator.
- The dialogue management unit 103 can also transfer the connection with the user to the operator when a predetermined condition has occurred during a response. A dialogue between the user and the operator thus starts.
- The dialogue system 100 analyzes the dialogue between the user and the operator.
- The scenario construction processing is performed using the speech recognition unit 101, the spoken language understanding unit 102, the dialogue extraction unit 105, and the scenario construction unit 106, which serve as a dialogue system construction support unit.
- the dialogue system construction support unit may be included in the dialogue system 100, as shown in FIG. 1, or provided outside the dialogue system 100.
- The speech recognition unit 101, the spoken language understanding unit 102, the dictionary storage unit 108, and the spoken language understanding model storage unit 109 can be shared by the automatic response processing and the scenario construction processing.
- the speech recognition unit 101 performs speech recognition for a plurality of utterances included in the dialogue between the user and the operator, and generates a plurality of texts corresponding to the plurality of utterances, respectively. That is, the speech recognition unit 101 converts the plurality of utterances into the plurality of texts by a speech recognition technology.
- Based on each text generated by the speech recognition unit 101, the spoken language understanding unit 102 understands the intention of the utterance corresponding to the text. More specifically, the spoken language understanding unit 102 performs morphological analysis of each text, thereby dividing the text into words on a morpheme basis. Next, referring to a dictionary stored in the dictionary storage unit 108, the spoken language understanding unit 102 assigns a semantic class to each word.
- a plurality of words are registered in the dictionary in association with semantic classes.
- the spoken language understanding unit 102 understands the intention of an utterance by referring to a spoken language understanding model stored in the spoken language understanding model storage unit 109 using features such as morphemes, the semantic classes of words, and notations of words, and outputs a spoken language understanding result.
- Spoken language understanding models are generated by learning, using semantic classes, words, and the like from a number of utterance samples as features.
- the spoken language understanding method is not limited to the example described here.
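As a rough illustration of the dictionary lookup described above, the following sketch assigns a semantic class to each word of an already-tokenized utterance. The dictionary entries and function name are hypothetical stand-ins; the embodiment leaves the concrete morphological analyzer and dictionary format open.

```python
# Sketch of semantic-class assignment: after a text is divided into words
# (the embodiment uses morphological analysis), each word is looked up in
# a dictionary that associates words with semantic classes.
SEMANTIC_DICTIONARY = {  # hypothetical entries
    "airport": "Location_STATION_AIR",
    "airline": "Organization_AIRLINE",
    "rental car": "Vehicle_RENTAL",
}

def assign_semantic_classes(words):
    """Return (word, semantic_class) pairs; words absent from the
    dictionary get no semantic class (None)."""
    return [(w, SEMANTIC_DICTIONARY.get(w)) for w in words]

pairs = assign_semantic_classes(["airport", "is", "rental car"])
print(pairs)
```

The (word, semantic class) pairs produced here are the form assumed by the later attribute-extraction steps.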
- The dialogue extraction unit 105 receives the spoken language understanding result from the spoken language understanding unit 102, and detects an operation performed for the dialogue system 100 by the operator during a response as the action of the operator. The action can be detected based on information received from a computer terminal operated by the operator. More specifically, the dialogue extraction unit 105 can receive, from the computer terminal, information representing the contents of an action executed by the operator. The dialogue extraction unit 105 records the analysis result of the dialogue between the user and the operator and the action of the operator in the dialogue log storage unit 111 in association with each other.
- The analysis result of the dialogue includes the speech recognition result and the spoken language understanding result concerning an utterance.
- the scenario construction unit 106 constructs a scenario by referring to the dialogue log storage unit 111, and stores the scenario in the scenario storage unit 110.
- the scenario updating unit 107 updates the scenario object DB 114 by referring to the scenario storage unit 110. More specifically, the scenario updating unit 107 converts a scenario stored in the scenario storage unit 110 into an object executable by the dialogue management unit 103, and adds it to the scenario object DB 114 at an arbitrary timing.
- a scenario stored in the scenario storage unit 110 is a text-based scenario
- a scenario stored in the scenario object DB 114 is an object-based scenario.
- a scenario stored in the scenario object DB 114 may be a text-based scenario.
- the scenario searching unit 113 extracts a scenario feature word from the dialogue between the user and the operator, and selects, as a similar scenario, a scenario associated with the scenario feature word from the scenario storage unit 110.
- the scenario feature word will be described later.
- the dialogue state display unit 112 displays the similar scenario.
- the dialogue state display unit 112 also displays the analysis result of the dialogue between the user and the operator.
- FIG. 2 schematically shows the procedure of dialogue log recording of the dialogue system 100.
- a detailed example will be explained using a dialogue shown in FIG. 3.
- In step S201, the dialogue extraction unit 105 records a dialogue start label representing the start of the dialogue in the dialogue log storage unit 111.
- In step S202, the user or the operator utters.
- the user first utters "Where can I pick up the rental car I have reserved earlier?"
- In step S203, the speech recognition unit 101 performs speech recognition for the utterance input in step S202.
- a text "Where can I pick up the rental car I have reserved earlier?" can be obtained as a speech recognition result.
- In step S204, the spoken language understanding unit 102 understands the intention of the utterance from the speech recognition result, and outputs a spoken language understanding result.
- The spoken language understanding result includes an utterance type, an intention tag, and a semantic class.
- The utterance type represents the role of the utterance in the dialogue. Examples of the utterance type are "request", "greeting", "question", "response", "proposal", "confirmation", and "answer", as shown in FIG. 4. The utterance type is output in a form understandable by the machine, for example, as an utterance type ID.
- The intention tag is information representing an intention such as "flight timetable display", "rental car search", "rental car location display", or "hotel rate display".
- the intention tag is output in a form understandable by the machine, for example, as an intention tag ID.
- In step S205, the dialogue extraction unit 105 extracts any one piece of information out of the intention tag, attribute, attribute value, and action contents from the utterance input in step S202, and records the speech recognition result, the spoken language understanding result, and the extracted information in the dialogue log storage unit 111 in association with each other.
- the process of step S205 will be described later.
- In step S206, it is determined whether the dialogue has ended. For example, when an utterance representing the end of the dialogue is detected or when the operator executes an action, it is determined that the dialogue has ended. If the dialogue continues, the process returns to step S202. When the process returns to step S202, the next utterance occurs.
- the operator utters "Location to pick up the rental car?”
- the processes of steps S203, S204, and S205 are executed for this utterance.
- The processes are similarly executed for the subsequent utterances, such as the operator's utterance "At which airport are you?" and the user's utterance "At OO airport".
- the dialogue extraction unit 105 detects the action of the operator based on the spoken language understanding result of the utterance "I will send the map of the location to pick up the rental car” .
- the dialogue extraction unit 105 acquires the contents of the action executed by the operator during the response, and records them in the dialogue log storage unit 111.
- Each action is associated with an action ID.
- step S207 the dialogue extraction unit 105 determines that the dialogue between the user and the operator has ended, and records a dialogue end label representing the end of the dialogue in the dialogue log storage unit 111.
- a log concerning one dialogue is recorded between a dialogue start label and a dialogue end label.
- The dialogue log concerning one dialogue includes the analysis result of the dialogue, scenario feature words, intention tags, attributes, attribute values, and action contents.
- The process of step S205 will be described in more detail.
- In step S205-1, if the type of the utterance input in step S202 is confirmation, the dialogue extraction unit 105 extracts a scenario feature word from this utterance and a counterpart utterance. More specifically, the dialogue extraction unit 105 extracts, as the scenario feature word, a word common to the utterance of confirmation of one party (for example, operator) and the immediately preceding utterance of the other party (for example, user).
- the utterance of confirmation is the operator's utterance "Location to pick up the rental car?”
- the utterance as the counterpart to this is the immediately preceding user's utterance "Where can I pick up the rental car I have reserved earlier?"
- the common words are "rental car” and "pick up”. Hence, "rental car” and "pick up” are extracted as the scenario feature words.
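The scenario feature word extraction in step S205-1 can be sketched as a set intersection over the words of the two utterances. The tokenized word lists below are hypothetical stand-ins for the morphological-analysis output.

```python
def extract_scenario_feature_words(confirmation_words, counterpart_words):
    """Words common to an utterance of confirmation and the immediately
    preceding counterpart utterance become scenario feature words."""
    return sorted(set(confirmation_words) & set(counterpart_words))

# Operator's confirmation vs. the user's immediately preceding utterance:
operator = ["location", "pick up", "rental car"]
user = ["where", "pick up", "rental car", "reserved"]
print(extract_scenario_feature_words(operator, user))  # ['pick up', 'rental car']
```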
- In step S205-2, the dialogue extraction unit 105 determines whether the utterance type is question. If the utterance type is question, the process advances to step S205-3. Otherwise, the process advances to step S205-4. In step S205-4, the dialogue extraction unit 105 determines whether the utterance type is answer. If the utterance type is answer, the process advances to step S205-5.
- In step S205-6, the dialogue extraction unit 105 determines whether the utterance is associated with the action of the operator. If the utterance is associated with the action, the process advances to step S205-8. Otherwise, the process advances to step S205-7.
- In step S205-5, the dialogue extraction unit 105 acquires the attribute and the attribute value.
- Semantic classes can be defined by hierarchically classifying meanings, as shown in FIG. 7. Note that the semantic classes need not always be expressed in the hierarchical structure.
- the attribute value is an argument used to attain the intention represented by the intention tag.
- the dialogue extraction unit 105 acquires, out of words having a semantic class common to the utterance of question and the utterance of answer, a word in the utterance of question as an attribute and a word in the utterance of answer as an attribute value .
- the user's answer to the operator's question "At which airport are you?" is "At OO airport”.
- a semantic class common to these utterances is "Location_STATION_AIR” .
- the word having the semantic class "Location_STATION_AIR” is "airport"
- "airport" is extracted as an attribute.
- In the user's utterance "At OO airport", the word having the semantic class "Location_STATION_AIR" is "OO airport". Hence, "OO airport" is extracted as an attribute value.
- Note that the dialogue extraction unit 105 does not necessarily extract the same word that appears in both an utterance of confirmation and an utterance as the counterpart to it.
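The attribute/attribute-value extraction of step S205-5 can be sketched as follows, assuming each utterance is available as (word, semantic class) pairs; for simplicity the sketch treats "common" as class equality.

```python
def extract_attribute_and_value(question, answer):
    """From a question/answer pair, each a list of (word, semantic_class)
    tuples, pick a word whose semantic class appears in both utterances:
    the question word becomes the attribute, the answer word the
    attribute value."""
    answer_by_class = {cls: w for w, cls in answer if cls is not None}
    for word, cls in question:
        if cls in answer_by_class:
            return word, answer_by_class[cls]
    return None

question = [("which", None), ("airport", "Location_STATION_AIR")]
answer = [("OO airport", "Location_STATION_AIR")]
print(extract_attribute_and_value(question, answer))
# ('airport', 'OO airport')
```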
- The dialogue extraction unit 105 acquires the action contents (step S205-8).
- the action contents include an operation that the operator actually executed for the system.
- FIG. 8 shows an example of action contents obtained when the operator operates an application in association with the dialogue example shown in FIG. 3.
- the action contents shown in FIG. 8 represent sending of a map illustrating the location to pick up the rental car.
- In step S205-7, the dialogue extraction unit 105 acquires an intention tag from an utterance that is neither an utterance of question nor an utterance of answer.
- This utterance is recorded in the dialogue log storage unit 111 as an utterance having an intention that does not contribute to attaining the purpose of the dialogue.
- FIG. 9 shows a dialogue log associated with the dialogue example shown in FIG. 3.
- "START OPERATOR” is the dialogue start label
- "END OPERATOR” is the dialogue end label.
- Pieces of information about utterances and actions are recorded between the dialogue start label and the dialogue end label .
- The log of an utterance is described using colon separation as utterance subject:utterance type:utterance contents:intention tag.
- The utterance contents include a speech recognition result, words, and their semantic classes. Each semantic class is described in parentheses immediately after a word.
- The log of an action is described using colon separation as action subject:action contents.
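Assuming the colon-separated formats above, a log entry could be parsed as in the following sketch; the field and key names are illustrative, not the embodiment's actual identifiers.

```python
def parse_log_line(line):
    """Parse one colon-separated dialogue-log entry. Utterance entries
    have four fields, action entries two (format assumed from the text)."""
    fields = line.split(":")
    if len(fields) == 4:
        subject, utt_type, contents, intention_tag = fields
        return {"kind": "utterance", "subject": subject,
                "type": utt_type, "contents": contents,
                "intention_tag": intention_tag}
    if len(fields) == 2:
        subject, action = fields
        return {"kind": "action", "subject": subject, "contents": action}
    raise ValueError(f"unrecognized log line {line!r}")

entry = parse_log_line("USER:question:At OO airport(Location_STATION_AIR):T1")
print(entry["kind"], entry["intention_tag"])  # utterance T1
```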
- FIG. 10 schematically shows the processing procedure of constructing a scenario from a dialogue log.
- In step S301, the scenario construction unit 106 loads a dialogue log from the dialogue log storage unit 111, and extracts a dialogue start label and a dialogue end label concerning a scenario construction target dialogue from the loaded dialogue log.
- In step S302, the scenario construction unit 106 starts constructing a scenario from the loaded dialogue log.
- FIGS. 11A and 11B show examples of a scenario constructed based on the dialogue log shown in FIG. 9.
- the scenario shown in FIG. 11A includes three states.
- the scenario shown in FIG. 11B includes one state.
- An input includes an intention tag and an attribute.
- An operation includes an operation tag.
- In step S304, the scenario construction unit 106 acquires a semantic class common to an utterance whose type is question and an utterance whose type is answer, and the word of the semantic class.
- Here, "common" is used as a term that means "same" or "being in an inclusion relation".
- the scenario construction unit 106 uses the acquired word or semantic class as the attribute of the input.
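The inclusion-relation sense of "common" can be sketched as a path-prefix check, under the hypothetical assumption that class names encode the hierarchy of FIG. 7 with underscores (e.g. "Location_STATION_AIR" under "Location_STATION"):

```python
def classes_common(a, b):
    """'Common' per the text: the semantic classes are the same or one
    includes the other. Inclusion is modeled as a path-prefix relation
    over underscore-separated class names; this encoding is an assumption,
    not the embodiment's actual representation."""
    pa, pb = a.split("_"), b.split("_")
    shorter, longer = sorted((pa, pb), key=len)
    return longer[: len(shorter)] == shorter

print(classes_common("Location_STATION_AIR", "Location_STATION_AIR"))  # True
print(classes_common("Location_STATION", "Location_STATION_AIR"))      # True
print(classes_common("Location_STATION_AIR", "Organization_AIRLINE"))  # False
```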
- In step S304-1, the scenario construction unit 106 acquires words from an utterance whose type is question as attribute candidates and stores them in a memory. If the type of the next utterance is answer, in step S304-2, the scenario construction unit 106 acquires words from the utterance as attribute candidates and holds them in the memory.
- In step S304-3, the semantic classes of the words acquired in steps S304-1 and S304-2 are compared, and an attribute is obtained from words having a common semantic class. For example, "airport" is acquired as an attribute from the pair of the operator's utterance "At which airport are you?" and the user's utterance "At OO airport". Note that the attribute acquisition method may be the same as that described concerning the process of step S205-3. Two attributes, "airport" and "airline", are obtained from the dialogue log of FIG. 9.
- When an attribute is obtained in step S304-3, the process advances to step S304-5.
- In step S304-5, the scenario construction unit 106 generates an input condition using the attribute obtained in step S304-3. More specifically, the scenario construction unit 106 registers the attribute in a scenario as an input attribute.
- In step S304-4, the scenario construction unit 106 determines whether the user has returned a question in response to the question of the operator. For example, in a dialogue example shown in FIG. 12, the user responds by "Um? I don't know" to the operator's question "At which terminal are you?" If the type of an utterance to a question is not answer, as described above, the process advances to step S304-6.
- In step S304-6, the scenario construction unit 106 waits for an utterance whose type is answer. Upon detecting an utterance whose type is answer, the scenario construction unit 106 acquires an attribute from the pair of the question and the answer.
- In step S304-7, the spoken language understanding unit 102 acquires an intention tag.
- In step S305, the scenario construction unit 106 ends loading the dialogue log.
- In step S306, the scenario construction unit 106 replaces the word included in the action contents with a semantic class serving as a variable.
- In step S307, the scenario construction unit 106 stores the constructed scenario in the scenario storage unit 110.
- the scenario is stored in association with a scenario feature word so as to enable a search by the scenario feature word.
- A scenario can be constructed so as to faithfully reproduce the dialogue between the user and the operator, as in the example of FIG. 11A, or so as to receive necessary attributes at once, as in the example of FIG. 11B.
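The two scenario shapes can be sketched as data structures; the state, input, and operation field names and the intention tag below are assumptions, not the embodiment's actual format.

```python
# Two shapes for a scenario built from the FIG. 9 dialogue log
# (field names and the intention tag are hypothetical).
faithful = {  # one state per question, as in FIG. 11A (three states)
    "states": [
        {"input": {"intention": "rental_car_location", "attributes": []},
         "operation": "ask airport"},
        {"input": {"attributes": ["airport"]}, "operation": "ask airline"},
        {"input": {"attributes": ["airline"]},
         "operation": "send pickup-location map"},
    ]
}
compact = {  # all attributes received at once, as in FIG. 11B (one state)
    "states": [
        {"input": {"intention": "rental_car_location",
                   "attributes": ["airport", "airline"]},
         "operation": "send pickup-location map"},
    ]
}
print(len(faithful["states"]), len(compact["states"]))  # 3 1
```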
- The scenario updating unit 107 converts the scenario stored in the scenario storage unit 110 into an object executable by the dialogue management unit 103 and adds it to the scenario object DB 114. As for the timing, the updating may be done automatically or based on an operation by an administrator. Similar scenarios may simultaneously be constructed for a plurality of operators. As shown in FIG. 13, the scenario storage unit 110 stores each scenario in association with a scenario feature word, the number of states, the number of response steps, and the number of response failures. The number of response failures represents how many times the dialogue system has failed in responding in accordance with the scenario.
- the scenario updating unit 107 can display the evaluation data together with the scenarios so that the administrator of the dialogue system 100 can select the scenarios to be added to the scenario object DB 114.
- FIG. 14 shows a procedure of presenting a candidate of an action to be executed to the operator during a response.
- In step S401, the scenario searching unit 113 extracts one or more scenario feature words from the dialogue between the user and the operator during the response of the operator. More specifically, the scenario searching unit 113 extracts, as the scenario feature words, words common to an utterance whose type is confirmation and an utterance as the counterpart to it.
- In step S402, the scenario searching unit 113 searches the scenario storage unit 110 using the scenario feature words as search keys. In step S403, the scenario searching unit 113 selects a scenario found by the search as a similar scenario.
- In step S404, the scenario searching unit 113 acquires action contents included in the similar scenario.
- The scenario searching unit 113 displays the acquired action contents as action candidates via the dialogue state display unit 112. The operator decides an action to be executed with reference to the displayed action candidates.
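The search-and-display flow of steps S401 to S404 can be sketched as follows; the stored-scenario layout is hypothetical.

```python
def find_action_candidates(scenario_store, feature_words):
    """Scenarios are stored with their scenario feature words; those
    sharing any feature word with the current dialogue count as 'similar',
    and their action contents become candidates shown to the operator."""
    candidates = []
    for scenario in scenario_store:
        if set(scenario["feature_words"]) & set(feature_words):
            candidates.extend(scenario["actions"])
    return candidates

store = [  # hypothetical stored scenarios
    {"feature_words": ["rental car", "pick up"],
     "actions": ["send map of rental-car pickup location"]},
    {"feature_words": ["hotel"], "actions": ["show hotel rates"]},
]
print(find_action_candidates(store, ["rental car"]))
# ['send map of rental-car pickup location']
```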
- FIG. 15 shows an example of contents displayed by the dialogue state display unit 112.
- the dialogue state display unit 112 includes a conversation monitor, a spoken language understanding monitor, and an operation monitor.
- the conversation monitor displays the speech recognition result for the dialogue between the user and the operator by the speech recognition unit 101.
- the spoken language understanding monitor displays the spoken language
- the operation monitor displays an action candidate acquired by the scenario searching unit 113. In the example of FIG. 15, three action candidates are displayed.
- The operator can visually confirm the request of the user. If there are inadequacies in the speech recognition result or the spoken language understanding result, they need to be corrected to construct a useful scenario.
- For example, spoken language understanding may fail because of a recognition error in speech recognition.
- a necessary scenario can easily be added to the dialogue system by constructing the scenario based on the dialogue log including the analysis result of the dialogue between the user and the operator and the action of the operator.
- the dialogue system 100 can also be implemented by, for example, using a general-purpose computer apparatus as basic hardware. That is, the speech recognition unit 101, the spoken language understanding unit 102, the dialogue management unit 103, the response generation unit 104, the dialogue extraction unit 105, the scenario construction unit 106, the scenario updating unit 107, the dialogue state display unit 112, and the scenario searching unit 113 can be implemented by causing a processor included in the computer apparatus to execute a program.
- the dialogue system can be implemented by installing the program in the computer apparatus in advance or by distributing the program stored in a storage medium such as a CD-ROM or via a network and installing the program in the computer apparatus as needed.
- The dialogue log storage unit, the scenario storage unit, the dictionary storage unit, and the spoken language understanding model storage unit can be implemented using an internal or external memory of the computer apparatus, a hard disk, or a storage medium such as a CD-R, CD-RW, DVD-RAM, or DVD-R as needed.
Abstract
According to an embodiment, a dialogue system construction support apparatus includes the following units. The speech recognition unit performs speech recognition for utterances included in a dialogue to generate texts. The spoken language understanding unit understands intentions of the utterances based on the texts and obtains a spoken language understanding result including types of the utterances, the intentions of the utterances, words included in the texts, and semantic classes of the words. The scenario construction unit acquires, as an attribute, a word having a semantic class common to an utterance of question and an utterance of answer and constructs a scenario using the attribute and an action executed by the operator concerning the dialogue.
Description
DESCRIPTION
DIALOGUE SYSTEM CONSTRUCTION SUPPORT APPARATUS AND METHOD
Cross-Reference to Related Applications This application is based upon and claims the benefit of priority from Japanese Patent Application
No. 2014-054491, filed March 18, 2014, the entire contents of which are incorporated herein by reference.
Field
Embodiments described herein relate generally to a dialogue system construction support apparatus and method.
Background
There exists a dialogue system such as an interactive voice response apparatus that automatically responds to an utterance of a user. Such a dialogue system responds in accordance with a scenario constructed in advance. The dialogue system may fail in responding because of a
scenario that does not meet a request of a user. In this case, an operator responds to the user. To respond well to a similar request received later, a new scenario needs to be added to the dialogue system.
It is necessary to construct a scenario from the dialogue between the user and the operator concerning the response failure of the dialogue system. JP-B 4901738 discloses an automated response system that performs learning using a conversation set between an agent and a
user. However, there is no technique of constructing a scenario from a dialogue between a user and an operator in consideration of an action the operator has taken during the response .
Brief Description of Drawings
FIG. 1 is a block diagram schematically showing a dialogue system according to an embodiment;
FIG. 2 is a flowchart showing an example of the procedure of dialogue log recording according to the embodiment;
FIG. 3 is a view showing an example of a dialogue between a user and an operator;
FIG. 4 is a view showing examples of an utterance type;
FIG. 5 is a view showing examples of an intention tag;
FIG. 6 is a view showing examples of an action;
FIG. 7 is a view showing examples of a semantic class;
FIG. 8 is a view showing an example of action contents;
FIG. 9 is a view showing a dialogue log concerning the dialogue shown in FIG. 3;
FIG. 10 is a flowchart showing an example of the procedure of scenario construction according to the embodiment;
FIG. 11A and FIG. 11B are views showing examples of a scenario constructed from the dialogue log shown in FIG. 9;
FIG. 12 is a view showing an example of a dialogue between the user and the operator, in which the user makes an utterance other than an answer in response to a question of the operator;
FIG. 13 is a view showing evaluation data used by a scenario construction unit shown in FIG. 1 to evaluate a scenario;
FIG. 14 is a flowchart showing an example of the procedure of action candidate display according to the embodiment; and
FIG. 15 is a view showing an example of contents displayed by a dialogue state display unit shown in FIG. 1.
Detailed Description
According to an embodiment, a dialogue system
construction support apparatus includes a speech
recognition unit, a spoken language understanding unit, a dialogue information storage unit, and a scenario
construction unit. The speech recognition unit is
configured to perform speech recognition for utterances included in a dialogue between a user and an operator and generate a speech recognition result including texts corresponding to the utterances. The spoken language understanding unit is configured to understand intentions of the utterances based on the texts and obtain a spoken language understanding result including types of the utterances, the intentions of the utterances, words
included in the texts, and semantic classes of the words. The dialogue information storage unit is configured to
store the speech recognition result, the spoken language understanding result, and an action executed by the
operator concerning the dialogue in association with each other. The scenario construction unit is configured to acquire, as an attribute, a word having a semantic class common to an utterance of question and an utterance of answer included in the dialogue and construct a scenario using the attribute and the action.
Hereinafter, embodiments will be described with reference to the accompanying drawings. The embodiments are directed to a dialogue system that automatically responds to an utterance of a user. This dialogue system is used in, for example, a contact center. The dialogue system selects a scenario meeting the utterance of the user from scenarios (dialogue scenarios) registered in advance and responds in accordance with the scenario. When the dialogue system has failed in responding, an operator responds via a dialogue with the user. The dialogue system can construct a new scenario based on the dialogue between the user and the operator and an action of the operator. As a result, the dialogue system can respond well to a similar request received later. In addition, the scenario construction cost can be reduced. It is also possible to decrease the necessary number of operators.
FIG. 1 schematically shows a dialogue system 100 according to an embodiment. As shown in FIG. 1, the dialogue system 100 includes a speech recognition unit 101,
a spoken language understanding unit 102, a dialogue management unit 103, a response generation unit 104, a dialogue extraction unit 105, a scenario construction unit 106, a scenario updating unit 107, a dictionary storage unit 108, a spoken language understanding model storage unit 109, a scenario storage unit 110, a dialogue log storage unit (also called a dialogue information storage unit) 111, a dialogue state display unit 112, a scenario searching unit 113, and a scenario object database (DB) 114.
Automatic response processing of the dialogue system 100 will briefly be explained first. For example, the user communicates with the dialogue system 100 via a network using a terminal such as a mobile phone or a smartphone. The dialogue system 100 provides a service to the terminal via the network by the automatic response processing. For example, the dialogue system 100 transmits, to the
terminal, data of a map illustrating the destination of the user, as in an example to be described later.
The dialogue system 100 executes the automatic
response processing using the speech recognition unit 101, the spoken language understanding unit 102, the dialogue management unit 103, the response generation unit 104, the dictionary storage unit 108, the spoken language
understanding model storage unit 109, and the scenario object DB 114. The speech recognition unit 101 performs speech recognition for the utterance of the user, and
generates a natural language text (to be simply referred to as a text hereinafter) corresponding to the utterance. The spoken language understanding unit 102 analyzes the text by referring to the dictionary storage unit 108 and the spoken language understanding model storage unit 109 so as to understand the intention of the utterance, and outputs the spoken language understanding result. The dialogue
management unit 103 selects a scenario corresponding to the spoken language understanding result from the scenario object DB 114, and executes an action (for example,
transmission of map data) defined in the selected scenario. The response generation unit 104 generates a response sentence corresponding to the action executed by the dialogue management unit 103. The response sentence is converted into speech by a speech synthesis technology and output.
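The automatic response flow just described (recognition, understanding, scenario selection, action execution) can be sketched as follows. The function names, intention-tag strings, and keyword matching are illustrative assumptions standing in for the real speech recognition and spoken language understanding components of the apparatus.

```python
# Minimal sketch of the automatic response flow. All names here are
# illustrative assumptions, not part of the disclosed apparatus.

def understand(text):
    # Toy spoken language understanding: map keywords to an intention tag.
    if "rental car" in text and "pick up" in text:
        return {"intention": "rental_car_location_display"}
    return {"intention": "unknown"}

def select_scenario(understanding, scenario_db):
    # Select a scenario matching the intention (scenario object DB lookup).
    return scenario_db.get(understanding["intention"])

def automatic_response(utterance_text, scenario_db):
    # The apparatus converts speech to text first; we assume text input.
    understanding = understand(utterance_text)
    scenario = select_scenario(understanding, scenario_db)
    if scenario is None:
        return "TRANSFER_TO_OPERATOR"  # dialogue failed: hand over
    return scenario["action"]          # e.g. transmit map data

scenario_db = {"rental_car_location_display": {"action": "send_map"}}
print(automatic_response("Where can I pick up the rental car?", scenario_db))
# prints "send_map"
```

When no scenario matches, the sketch returns a transfer marker, mirroring the hand-over to an operator described below.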
Scenario construction processing of the dialogue system 100 will be described next.
In the dialogue system 100, the dialogue with the user may fail because, for example, a scenario meeting a request of the user does not exist in the scenario object DB 114. When the dialogue with the user has failed, the dialogue management unit 103 transfers the connection with the user to an operator. The dialogue management unit 103 can also transfer the connection with the user to the operator when a predetermined condition has occurred during a response. A dialogue between the user and the operator thus starts.
The dialogue system 100 analyzes the dialogue between the user and the operator. The dialogue system 100
constructs a new scenario based on the analysis result so as to respond well to a similar request received later. The scenario construction processing is performed using the speech recognition unit 101, the spoken language
understanding unit 102, the dialogue extraction unit 105, the scenario construction unit 106, the scenario updating unit 107, the dictionary storage unit 108, the spoken language understanding model storage unit 109, the scenario storage unit 110, the dialogue log storage unit 111, the dialogue state display unit 112, and the scenario searching unit 113. A portion including these elements associated with the scenario construction processing will be referred to as a dialogue system construction support unit. The dialogue system construction support unit may be included in the dialogue system 100, as shown in FIG. 1, or provided outside the dialogue system 100. When the dialogue system construction support unit is included in the dialogue system 100, the speech recognition unit 101, the spoken language understanding unit 102, the dictionary storage unit 108, and the spoken language understanding model storage unit 109 can be shared by automatic response processing and the scenario construction processing.
The speech recognition unit 101 performs speech recognition for a plurality of utterances included in the dialogue between the user and the operator, and generates a
plurality of texts corresponding to the plurality of utterances, respectively. That is, the speech recognition unit 101 converts the plurality of utterances into the plurality of texts by a speech recognition technology.
Based on each text generated by the speech recognition unit 101, the spoken language understanding unit 102 understands the intention of the utterance corresponding to the text. More specifically, the spoken language
understanding unit 102 performs morphological analysis of each text, thereby dividing the text into words on a morpheme basis. Next, referring to a dictionary stored in the dictionary storage unit 108, the spoken language understanding unit 102 assigns a semantic class
representing the meaning of a word to each of nouns, proper nouns, verbs, and unknown words by a named entity
extraction technology. A plurality of words are registered in the dictionary in association with semantic classes.
The spoken language understanding unit 102 understands the intention of an utterance by referring to a spoken language understanding model stored in the spoken language understanding model storage unit 109 using features such as morphemes, the semantic classes of words, and notations of words, and outputs a spoken language understanding result. Spoken language understanding models are generated by learning using semantic classes, words, and the like from a number of utterance samples as features. The spoken language understanding method is not limited to the example
described here.
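The dictionary lookup that assigns a semantic class to each word might be sketched as follows. The dictionary entries and class names echo the examples of this embodiment but are assumptions for illustration, and a plain string stands in for a morpheme.

```python
# Toy semantic class assignment by dictionary lookup, mirroring the named
# entity extraction step. Dictionary contents are illustrative assumptions.

SEMANTIC_DICTIONARY = {
    "airport": "Location_STATION_AIR",
    "OO airport": "Location_STATION_AIR",
    "airline": "Organization_COMPANY_AIR",
    "xx airline": "Organization_COMPANY_AIR",
}

def assign_semantic_classes(words):
    """Return (word, semantic_class) pairs; None when the word is unknown."""
    return [(w, SEMANTIC_DICTIONARY.get(w)) for w in words]

print(assign_semantic_classes(["airport", "today"]))
# → [('airport', 'Location_STATION_AIR'), ('today', None)]
```

In the apparatus the lookup feeds the spoken language understanding model as a feature; here it simply annotates the words.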
The dialogue extraction unit 105 receives the spoken language understanding result from the spoken language understanding unit 102, and detects an operation performed for the dialogue system 100 by the operator during a response as the action of the operator. The action can be detected based on information received from a computer terminal operated by the operator. More specifically, the dialogue extraction unit 105 can receive, from the computer terminal, information representing the contents of an action executed by the operator. The dialogue extraction unit 105 records the analysis result of the dialogue between the user and the operator and the action of the operator in the dialogue log storage unit 111 in
association with each other. The analysis result of the dialogue includes the speech recognition result and the spoken language understanding result concerning an
utterance of the user and the speech recognition result and the spoken language understanding result concerning an utterance of the operator.
The scenario construction unit 106 constructs a scenario by referring to the dialogue log storage unit 111, and stores the scenario in the scenario storage unit 110. The scenario updating unit 107 updates the scenario object DB 114 by referring to the scenario storage unit 110. More specifically, the scenario updating unit 107 converts a scenario stored in the scenario storage unit 110 into an
object executable by the dialogue management unit 103, and adds it to the scenario object DB 114 at an arbitrary timing. For example, a scenario stored in the scenario storage unit 110 is a text-based scenario, and a scenario stored in the scenario object DB 114 is an object-based scenario. Note that a scenario stored in the scenario object DB 114 may be a text-based scenario.
The scenario searching unit 113 extracts a scenario feature word from the dialogue between the user and the operator, and selects, as a similar scenario, a scenario associated with the scenario feature word from the scenario storage unit 110. The scenario feature word will be described later. The dialogue state display unit 112 displays the similar scenario. The dialogue state display unit 112 also displays the analysis result of the dialogue between the user and the operator.
The operation of the dialogue system 100 will be described next.
FIG. 2 schematically shows the procedure of dialogue log recording of the dialogue system 100. Here, a detailed example will be explained using a dialogue shown in FIG. 3. In step S201 of FIG. 2, the dialogue between the user and the operator starts. At this time, the dialogue extraction unit 105 records a dialogue start label representing the start of the dialogue in the dialogue log storage unit 111.
In step S202, the user or operator utters. In the dialogue example of FIG. 3, the user first utters "Where
can I pick up the rental car I have reserved earlier?" In step S203, the speech recognition unit 101 performs speech recognition for the utterance input in step S202. In the dialogue example of FIG. 3, a text "Where can I pick up the rental car I have reserved earlier?" can be obtained as a speech recognition result.
In step S204, the spoken language understanding unit 102 understands the intention of the utterance from the speech recognition result, and outputs a spoken language understanding result. The spoken language understanding result includes an utterance type, an intention tag, and a semantic class. The utterance type represents the role of the utterance in the dialogue. Examples of the utterance type are "request", "greeting", "question", "response", "proposal", "confirmation", and "answer", as shown in
FIG. 4. The utterance type is output in a form
understandable by the machine, for example, as an utterance type ID. The intention tag is information representing an intention such as "flight timetable display", "rental car search", "rental car location display", "hotel rate
search", or "hotel reservation", as shown in FIG. 5. The intention tag is output in a form understandable by the machine, for example, as an intention tag ID.
In step S205, the dialogue extraction unit 105
extracts any one piece of information out of the intention tag, attribute, attribute value, and action contents from the utterance input in step S202, and records the speech
recognition result, the spoken language understanding result, and the extracted information in the dialogue log storage unit 111 in association with each other. The process of step S205 will be described later.
In step S206, it is determined whether the dialogue has ended. For example, when an utterance representing the end of the dialogue is detected or when the operator executes an action, it is determined that the dialogue has ended. If the dialogue continues, the process returns to step S202. When the process returns to step S202, the next utterance occurs. In the dialogue example of FIG. 3, the operator utters "Location to pick up the rental car?" The processes of steps S203, S204, and S205 are executed for this utterance. Similarly, the operator's utterance "At which airport are you?", the user's utterance "At OO
airport", the operator's utterance "Which airline did you use?", the user's utterance "xx airline", and the operator's utterance "I will send the map of the location to pick up the rental car" are sequentially processed. The operator utters "I will send the map of the location to pick up the rental car" and simultaneously transmits the map data to the terminal of the user by operating the computer terminal. The dialogue extraction unit 105 detects the action of the operator based on the spoken language understanding result of the utterance "I will send the map of the location to pick up the rental car". The dialogue extraction unit 105 acquires the contents of the
action executed by the operator during the response, and records them in the dialogue log storage unit 111.
Examples of the action are "transfer to car rental
operator", "flight timetable display", "airport facility information display", and "rental car search", as shown in FIG. 6. Each action is associated with an action ID.
When the dialogue has ended, the process advances to step S207. In step S207, the dialogue extraction unit 105 determines that the dialogue between the user and the operator has ended, and records a dialogue end label representing the end of the dialogue in the dialogue log storage unit 111. In the dialogue log storage unit 111, a log concerning one dialogue is recorded between a dialogue start label and a dialogue end label. The dialogue log concerning one dialogue includes the analysis result of the dialogue, scenario feature words, intention tags,
attributes and their semantic classes, attribute values and their semantic classes, and action contents.
The process of step S205 will be described in more detail.
In step S205-1, if the type of the utterance input in step S202 is confirmation, the dialogue extraction unit 105 extracts a scenario feature word from this utterance and a counterpart utterance. More specifically, the dialogue extraction unit 105 extracts, as the scenario feature word, a word common to the utterance of confirmation of one party (for example, operator) and the immediately preceding
utterance of the other party (for example, user). In the dialogue example of FIG. 3, the utterance of confirmation is the operator's utterance "Location to pick up the rental car?" The utterance as the counterpart to this is the immediately preceding user's utterance "Where can I pick up the rental car I have reserved earlier?" The common words are "rental car" and "pick up". Hence, "rental car" and "pick up" are extracted as the scenario feature words.
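A minimal sketch of the scenario feature word extraction of step S205-1, assuming whitespace-level tokenization and a small stopword list in place of the morphological analysis the apparatus actually performs:

```python
import re

# Illustrative stopword list; the apparatus relies on morphological
# analysis rather than a fixed list.
STOPWORDS = {"the", "a", "to", "i", "can", "where", "have"}

def tokenize(utterance):
    return set(re.findall(r"[a-z]+", utterance.lower())) - STOPWORDS

def extract_scenario_feature_words(confirmation, counterpart):
    # Step S205-1: words common to the utterance of confirmation and
    # the immediately preceding counterpart utterance.
    return sorted(tokenize(confirmation) & tokenize(counterpart))

words = extract_scenario_feature_words(
    "Location to pick up the rental car?",
    "Where can I pick up the rental car I have reserved earlier?")
print(words)  # → ['car', 'pick', 'rental', 'up']
```

Note the sketch yields single tokens, whereas the embodiment treats compounds such as "rental car" and "pick up" as single feature words.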
In step S205-2, the dialogue extraction unit 105 determines whether the utterance type is question. If the utterance type is question, the process advances to step S205-3. Otherwise, the process advances to step S205-4. In step S205-4, the dialogue extraction unit 105 determines whether the utterance type is answer. If the utterance type is answer, the process advances to step S205-5.
Otherwise, the process advances to step S205-6. In step S205-6, the dialogue extraction unit 105 determines whether the utterance is associated with the action of the
operator. If the utterance is associated with the action, the process advances to step S205-8. Otherwise, the process advances to step S205-7.
The dialogue extraction unit 105 acquires the
attribute from the utterance of question (step S205-3), and acquires an attribute value from the utterance of answer that is the counterpart of the utterance of question (step S205-5). Semantic classes can be defined by hierarchically classifying meanings, as shown in FIG. 7. Note that the
semantic classes need not always be expressed in the hierarchical structure. The attribute value is an argument used to attain the intention represented by the intention tag.
More specifically, the dialogue extraction unit 105 acquires, out of words having a semantic class common to the utterance of question and the utterance of answer, a word in the utterance of question as an attribute and a word in the utterance of answer as an attribute value. In the dialogue example of FIG. 3, the user's answer to the operator's question "At which airport are you?" is "At OO airport". A semantic class common to these utterances is "Location_STATION_AIR". In the operator's utterance "At which airport are you?", the word having the semantic class "Location_STATION_AIR" is "airport", and "airport" is extracted as an attribute. In the user's utterance "At OO airport", the word having the semantic class
"Location_STATION_AIR" is "OO airport", and "OO airport" is extracted as an attribute value. In addition, the user's answer to the operator's question "Which airline did you use?" is "xx airline". A semantic class common to these utterances is "Organization_COMPANY_AIR". In the operator's utterance "Which airline did you use?", the word having the semantic class "Organization_COMPANY_AIR" is "airline", and "airline" is extracted as an attribute. In the user's utterance "xx airline", the word having the semantic class "Organization_COMPANY_AIR" is "xx airline",
and "xx airline" is extracted as an attribute value. The set of attribute "airport", attribute value "OO airport", and semantic class "Location_STATION_AIR" and the set of attribute "airline", attribute value "xx airline", and semantic class "Organization_COMPANY_AIR" are obtained from the dialogue example shown in FIG. 3.
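The attribute/attribute-value extraction just described might be sketched as follows; the (word, semantic class) pairs stand in for the spoken language understanding result, and the function name is an assumption.

```python
# Sketch of steps S205-3 / S205-5: among words sharing a semantic class
# between a question and its answer, the question word becomes the
# attribute and the answer word the attribute value.

def extract_attribute(question_words, answer_words):
    """question_words / answer_words: lists of (word, semantic_class)."""
    answer_by_class = {cls: w for w, cls in answer_words}
    for word, cls in question_words:
        if cls in answer_by_class:
            return {"attribute": word,
                    "attribute_value": answer_by_class[cls],
                    "semantic_class": cls}
    return None  # no common semantic class: no attribute obtained

result = extract_attribute(
    [("airport", "Location_STATION_AIR")],     # "At which airport are you?"
    [("OO airport", "Location_STATION_AIR")])  # "At OO airport"
print(result)
```

Applied to the second question/answer pair of FIG. 3, the same routine would yield the attribute "airline" with value "xx airline".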
Note that the dialogue extraction unit 105 does not necessarily extract the same word that appears in both an utterance of confirmation and an utterance as the
counterpart to that utterance as a scenario feature word, as in the above-described example, and may extract the same word that appears in a pair of an operator's utterance and a user's utterance such as a pair of question and answer as a scenario feature word.
Upon detecting the action of the operator, the
dialogue extraction unit 105 acquires the action contents (step S205-8). The action contents include an operation that the operator actually executed for the system. FIG. 8 shows an example of action contents obtained when the operator operates an application in association with the dialogue example shown in FIG. 3. The action contents shown in FIG. 8 represent sending of a map illustrating the location to pick up the rental car.
The dialogue extraction unit 105 acquires an intention tag from an utterance that is associated with none of question, answer, and action (step S205-7). This utterance is recorded in the dialogue log storage unit 111 as an utterance having an intention that does not contribute to attaining the purpose of the dialogue.
FIG. 9 shows a dialogue log associated with the dialogue example shown in FIG. 3. Referring to FIG. 9, "START OPERATOR" is the dialogue start label, and "END OPERATOR" is the dialogue end label. Pieces of information about utterances and actions are recorded between the dialogue start label and the dialogue end label. In the example of FIG. 9, the log of an utterance is described using colon separation as utterance subject:utterance type:utterance contents:intention tag. The utterance contents include a speech recognition result, words, and their semantic classes. Each semantic class is described in parentheses immediately after a word. The log of an action is described using colon separation as action subject:action contents.
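Assuming the log fields are separated by bare colons and each semantic class follows its word in parentheses, a parser for one utterance entry of this kind might look like the following (the field names of the returned record are assumptions):

```python
import re

def parse_utterance_log(line):
    # Parse one colon-separated utterance entry:
    #   subject:type:contents:intention_tag
    # A semantic class is written in parentheses right after a word,
    # e.g. "airport(Location_STATION_AIR)".
    subject, utt_type, contents, intention = line.split(":", 3)
    classes = re.findall(r"(\w+)\((\w+)\)", contents)
    return {"subject": subject, "type": utt_type,
            "intention": intention,
            "semantic_classes": dict(classes)}

entry = parse_utterance_log(
    "operator:question:At which airport(Location_STATION_AIR) are you?:none")
print(entry["semantic_classes"])  # → {'airport': 'Location_STATION_AIR'}
```

A real log would also carry the START OPERATOR / END OPERATOR labels delimiting one dialogue; this sketch handles only a single utterance line.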
FIG. 10 schematically shows the processing procedure of constructing a scenario from a dialogue log. In step S301 of FIG. 10, the scenario construction unit 106 loads a dialogue log from the dialogue log storage unit 111, and extracts a dialogue start label and a dialogue end label concerning a scenario construction target dialogue from the loaded dialogue log. In step S302, the scenario
construction unit 106 loads the log between the dialogue start label and the dialogue end label. In step S303, the scenario construction unit 106 generates a set of "input", "operation", and "state" as a unit of the scenario.
FIGS. 11A and 11B show examples of a scenario constructed based on the dialogue log shown in FIG. 9. The scenario shown in FIG. 11A includes three states. The scenario shown in FIG. 11B includes one state. An input includes an intention tag and an attribute. An operation includes an operation tag.
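One such unit of "input", "operation", and "state" might be represented as a plain data structure; the representation and field names below are assumptions, with values following the rental-car example of FIG. 11B.

```python
# One scenario unit generated in step S303, as a set of "input",
# "operation", and "state". The concrete layout is an assumption.

scenario = [
    {"input": {"intention": "rental_car_location_display",
               "attributes": ["airport", "airline"]},
     "operation": {"tag": "send_map"},
     "state": "map_sent"},
]

def required_attributes(unit):
    """Attributes the dialogue system must collect before the
    operation of this unit can be executed."""
    return unit["input"]["attributes"]

print(required_attributes(scenario[0]))  # → ['airport', 'airline']
```

A scenario reproducing the dialogue faithfully (FIG. 11A) would simply hold three such units, each collecting one attribute.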
In step S304, the scenario construction unit 106 acquires a semantic class common to an utterance whose type is question and an utterance whose type is answer and the word of the semantic class. Here, "common" is used as a term that means "same" or "being in inclusion relation". The scenario construction unit 106 uses the acquired word or semantic class as the attribute of the input.
The process of step S304 will be described in more detail. In step S304-1, the scenario construction unit 106 acquires words from an utterance whose type is question as attribute candidates and stores them in a memory. If the type of the next utterance is answer, in step S304-2, the scenario construction unit 106 acquires words from the utterance as attribute candidates and holds them in the memory. In step S304-3, the semantic classes of the words acquired in steps S304-1 and S304-2 are compared, and an attribute is obtained from words having a common semantic class. For example, "airport" is acquired as an attribute from the pair of the operator's utterance "At which airport are you?" and the user's utterance "At OO airport". Note that the attribute acquisition method may be the same as
that described concerning the process of step S205-3. Two attributes, "airport" and "airline", are obtained from the dialogue log of FIG. 9.
When an attribute is obtained in step S304-3, the process advances to step S304-5. In step S304-5, the scenario construction unit 106 generates an input condition using the attribute obtained in step S304-3. More
specifically, the scenario construction unit 106 registers the attribute in a scenario as an input attribute
corresponding to the intention tag of the latest utterance whose type is request.
If no attribute is obtained, as in a case where the utterance next to the utterance of question is not answer, the process advances to step S304-4. In step S304-4, the scenario construction unit 106 determines whether the user has returned a question in response to the question of the operator. For example, in a dialogue example shown in FIG. 12, the user responds by "Um? I don't know" to the operator's question "At which terminal are you?" If the type of an utterance to a question is not answer, as described above, the scenario construction unit 106
determines it as a redundant response in the scenario (or may determine it as another response type), that is, determines that the efficiency is low, and sets the
evaluation of the scenario under construction low. In step S304-6, the scenario construction unit 106 waits for an utterance whose type is answer. Upon detecting an
utterance whose type is answer, the scenario construction unit 106 acquires an attribute from the pair of the
utterance of question and the utterance of answer and generates an input condition based on the attribute.
If the type of the utterance next to the utterance of question is not answer in step S304-4, the process advances to step S304-7. In step S304-7, the spoken language understanding unit 102 acquires an intention tag, and
"input" is generated based on the intention tag as well as the input condition generated in step S304-5.
In step S305, the scenario construction unit 106 finishes loading the dialogue log. In step S306, the scenario construction unit 106 replaces the word included in the action contents with a semantic class serving as a variable. In step S307, the scenario construction unit 106 stores the constructed scenario in the scenario storage unit 110. The scenario is stored in association with a scenario feature word so as to enable a search by the scenario feature word.
Note that the scenario can be constructed so as to faithfully reproduce the dialogue between the user and the operator, as in the example of FIG. 11A, or constructed so as to receive necessary attributes at once, as in the example of FIG. 11B.
The scenario updating unit 107 converts the scenario stored in the scenario storage unit 110 into an object executable by the dialogue management unit 103 and adds it to the scenario object DB 114. As for the timing, the
updating may be done automatically or based on an operation by an administrator. Similar scenarios may simultaneously be constructed for a plurality of operators. As shown in FIG. 13, the scenario storage unit 110 stores each scenario in association with a scenario feature word, the number of states, the number of response steps, and the number of response failures. The number of response failures
represents the number of failures in response as in a case where the user has made an utterance other than answer to a question of the operator. The number of states, the number of response steps, and the number of response failures are examples of evaluation data to evaluate a scenario, which are used when the administrator of the dialogue system 100 decides whether to add the scenario to the scenario object DB 114. The scenario updating unit 107 can display the evaluation data together with the scenarios so that the administrator of the dialogue system 100 can select the scenarios to be added to the scenario object DB 114.
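Ranking candidate scenarios by this evaluation data might be sketched as follows; the lexicographic ordering (failures first, then response steps, then states) is an illustrative assumption, since the embodiment leaves the final selection to the administrator.

```python
# Sketch of ranking candidate scenarios by the evaluation data of
# FIG. 13. The ordering criterion is an illustrative assumption.

def efficiency_key(scenario):
    # Fewer failures, steps, and states suggests a more efficient scenario.
    return (scenario["failures"], scenario["steps"], scenario["states"])

candidates = [
    {"name": "A", "states": 3, "steps": 5, "failures": 1},
    {"name": "B", "states": 1, "steps": 2, "failures": 0},
]
best = min(candidates, key=efficiency_key)
print(best["name"])  # → B
```

Such a ranking could be displayed alongside the evaluation data to help the administrator choose which scenarios to add to the scenario object DB.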
FIG. 14 shows a procedure of presenting a candidate of an action to be executed to the operator during a response. In step S401 of FIG. 14, the scenario searching unit 113 extracts one or more scenario feature words from the dialogue between the user and the operator during the response of the operator. More specifically, the scenario searching unit 113 extracts, as the scenario feature words, words common to an utterance whose type is confirmation and an utterance as the counterpart to it.
In step S402, the scenario searching unit 113 searches the scenario storage unit 110 using the scenario feature words as search keys. In step S403, it is determined whether there exists a similar scenario, that is, a scenario in which all or some of the scenario feature words match. If a similar scenario exists, the process advances to step S404. Otherwise, the processing ends.
In step S404, the scenario searching unit 113 acquires action contents included in the similar scenario. In step S405, the scenario searching unit 113 displays the acquired action contents as an action candidate via the dialogue state display unit 112. The operator decides an action to be executed with reference to the displayed action candidate.
It is possible to assist the operator by displaying the action candidate in this way.
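Steps S401 to S405 can be sketched as a simple overlap search over stored scenarios; the storage layout and action strings below are assumptions.

```python
# Sketch of the similar-scenario search: find stored scenarios whose
# feature words overlap those of the ongoing dialogue and surface
# their actions as candidates for the operator.

SCENARIO_STORE = [
    {"feature_words": {"rental", "car", "pick", "up"},
     "action": "send map of rental car pick-up location"},
    {"feature_words": {"hotel", "reservation"},
     "action": "open hotel booking form"},
]

def action_candidates(feature_words):
    """Return actions of scenarios sharing at least one feature word."""
    return [s["action"] for s in SCENARIO_STORE
            if s["feature_words"] & set(feature_words)]

print(action_candidates(["rental", "car"]))
```

In the apparatus the matched actions are shown on the operation monitor of the dialogue state display unit rather than printed.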
FIG. 15 shows an example of contents displayed by the dialogue state display unit 112. The dialogue state display unit 112 includes a conversation monitor, a spoken language understanding monitor, and an operation monitor. The conversation monitor displays the speech recognition result for the dialogue between the user and the operator by the speech recognition unit 101. The spoken language understanding monitor displays the spoken language
understanding result for the dialogue between the user and the operator by the spoken language understanding unit 102. The operation monitor displays an action candidate acquired
by the scenario searching unit 113. In the example of FIG. 15, three action candidates are displayed.
When the dialogue state display unit 112 is provided, the operator can visually confirm the request of the user. If there are inadequacies in the speech recognition result and the spoken language understanding result, the speech recognition result and the spoken language understanding result need to be corrected to construct a useful scenario. In the example of FIG. 15, spoken language understanding fails because of a recognition error in speech recognition. When the speech recognition result and the spoken language understanding result are presented to the operator during the response, the operator can correct the speech
recognition result and the spoken language understanding result.
As described above, according to this embodiment, a necessary scenario can easily be added to the dialogue system by constructing the scenario based on the dialogue log including the analysis result of the dialogue between the user and the operator and the action of the operator.
Note that the dialogue system 100 according to the embodiment can also be implemented by, for example, using a general-purpose computer apparatus as basic hardware. That is, the speech recognition unit 101, the spoken language understanding unit 102, the dialogue management unit 103, the response generation unit 104, the dialogue extraction unit 105, the scenario construction unit 106, the scenario
updating unit 107, the dialogue state display unit 112, and the scenario searching unit 113 can be implemented by causing a processor included in the computer apparatus to execute a program. At this time, the dialogue system can be implemented by installing the program in the computer apparatus in advance or by distributing the program stored in a storage medium such as a CD-ROM or via a network and installing the program in the computer apparatus as needed. The dialogue log storage unit, the scenario storage unit, the dictionary storage unit, and the spoken language understanding model storage unit can be implemented using an internal or external memory of the computer apparatus, a hard disk, or a storage medium such as a CD-R, CD-RW,
DVD-RAM, or DVD-R as needed.
While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions.
Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.
Claims
1. A dialogue system construction support apparatus comprising:
a speech recognition unit configured to perform speech recognition for utterances included in a dialogue between a user and an operator and generate a speech recognition result including texts corresponding to the utterances; a spoken language understanding unit configured to understand intentions of the utterances based on the texts and obtain a spoken language understanding result including types of the utterances, the intentions of the utterances, words included in the texts, and semantic classes of the words;
a dialogue information storage unit configured to store the speech recognition result, the spoken language understanding result, and an action executed by the
operator concerning the dialogue in association with each other; and
a scenario construction unit configured to acquire, as an attribute, a word having a semantic class common to an utterance of question and an utterance of answer included in the dialogue and construct a scenario using the
attribute and the action.
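As a rough illustration of claim 1's scenario construction, the sketch below models each utterance as a list of (word, semantic class) pairs produced by a spoken language understanding step, takes as an attribute any answer word whose semantic class also occurs in the question, and pairs the attributes with the operator's action. All function and variable names are hypothetical; none appear in the patent itself.

```python
# Hypothetical sketch of attribute acquisition and scenario construction
# per claim 1. Utterances are lists of (word, semantic_class) pairs.

def acquire_attributes(question, answer):
    """Return (semantic_class, word) pairs whose semantic class occurs
    in both the question utterance and the answer utterance."""
    question_classes = {sem_class for _word, sem_class in question}
    return [(sem_class, word)
            for word, sem_class in answer
            if sem_class in question_classes]

def construct_scenario(question, answer, action):
    """Build a minimal scenario: the shared-class attributes plus the
    action the operator executed for this dialogue."""
    return {"attributes": acquire_attributes(question, answer),
            "action": action}

# Example: the operator asks for a date, the user answers with one.
question = [("which", "INTERROGATIVE"), ("date", "DATE")]
answer = [("March", "DATE"), ("18th", "DATE")]
scenario = construct_scenario(question, answer, action="set_reminder")
```

The semantic-class intersection is what lets the attribute be found even when the question and answer share no surface word ("date" vs. "March").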
2. The apparatus according to claim 1, further comprising a dialogue extraction unit configured to
extract, as a scenario feature word, a common word that appears in a pair of an utterance of the operator and an
utterance of the user included in the dialogue.
3. The apparatus according to claim 2, wherein the dialogue extraction unit extracts, as the scenario feature word, a word common to an utterance of confirmation and an utterance as a counterpart to the utterance of
confirmation, which are included in the dialogue.
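Claims 2 and 3 can be illustrated with a small sketch: a scenario feature word is simply a word that appears in both members of an utterance pair (operator/user, or confirmation/counterpart). The names below and the naive whitespace tokenizer are assumptions standing in for real morphological analysis.

```python
def extract_feature_words(utterance_a, utterance_b, stopwords=frozenset()):
    """Words common to both utterances of a pair, minus stopwords.
    A naive whitespace tokenizer is used here for illustration."""
    words_a = set(utterance_a.lower().split())
    words_b = set(utterance_b.lower().split())
    return sorted((words_a & words_b) - stopwords)

# An utterance of confirmation and its counterpart share exactly the
# words that characterize the scenario.
feature = extract_feature_words(
    "so you would like to cancel the order",
    "yes please cancel my order",
    stopwords=frozenset({"the", "my", "to", "so", "yes"}))
# feature == ["cancel", "order"]
```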
4. The apparatus according to claim 2, further comprising:
a scenario storage unit configured to store scenarios in association with scenario feature words;
a scenario searching unit configured to search the scenario storage unit to acquire, as a similar scenario, a scenario associated with the scenario feature word
extracted by the dialogue extraction unit; and
a display unit configured to display an action included in the similar scenario.
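The similar-scenario search of claim 4 can be sketched as an inverted index from scenario feature words to stored scenarios, with scenarios that match more of the extracted feature words ranking higher. The index layout and the count-based ranking heuristic are assumptions, not taken from the patent.

```python
from collections import Counter

def find_similar_scenarios(index, feature_words):
    """index: dict mapping a feature word to a list of scenario ids.
    Returns scenario ids ordered by how many feature words they match."""
    hits = Counter()
    for word in feature_words:
        hits.update(index.get(word, []))
    return [sid for sid, _count in hits.most_common()]

index = {"cancel": ["s1", "s2"], "order": ["s1"], "refund": ["s3"]}
similar = find_similar_scenarios(index, ["cancel", "order"])
# "s1" matches both feature words, so it ranks first
```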
5. The apparatus according to claim 1, further comprising a display unit configured to display the speech recognition result and the spoken language understanding result.
6. The apparatus according to claim 1, further comprising a scenario updating unit configured to add the scenario to a database of a dialogue system.
7. The apparatus according to claim 6, wherein the scenario construction unit generates evaluation data to evaluate the scenario, and
the scenario updating unit displays the scenario together with the evaluation data so that whether to add the scenario to the database can be selected.
8. A dialogue system construction support method comprising:
performing speech recognition for utterances included in a dialogue between a user and an operator and generating a speech recognition result including texts corresponding to the utterances;
understanding intentions of the utterances based on the texts and obtaining a spoken language understanding result including types of the utterances, the intentions of the utterances, words included in the texts, and semantic classes of the words;
storing the speech recognition result, the spoken language understanding result, and an action executed by the operator concerning the dialogue in association with each other; and
acquiring, as an attribute, a word having a semantic class common to an utterance of question and an utterance of answer included in the dialogue and constructing a scenario using the attribute and the action.
9. A non-transitory computer readable medium
including computer executable instructions, wherein the instructions, when executed by a processor, cause the processor to perform a method comprising:
performing speech recognition for utterances included in a dialogue between a user and an operator and generating
a speech recognition result including texts corresponding to the utterances;
understanding intentions of the utterances based on the texts and obtaining a spoken language understanding result including types of the utterances, the intentions of the utterances, words included in the texts, and semantic classes of the words;
storing the speech recognition result, the spoken language understanding result, and an action executed by the operator concerning the dialogue in association with each other; and
acquiring, as an attribute, a word having a semantic class common to an utterance of question and an utterance of answer included in the dialogue and constructing a scenario using the attribute and the action.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2014054491A JP2015176099A (en) | 2014-03-18 | 2014-03-18 | Dialog system construction support apparatus, method, and program |
| JP2014-054491 | 2014-03-18 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2015141700A1 true WO2015141700A1 (en) | 2015-09-24 |
Family
ID=54144664
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2015/057970 Ceased WO2015141700A1 (en) | 2014-03-18 | 2015-03-11 | Dialogue system construction support apparatus and method |
Country Status (2)
| Country | Link |
|---|---|
| JP (1) | JP2015176099A (en) |
| WO (1) | WO2015141700A1 (en) |
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10574821B2 (en) | 2017-09-04 | 2020-02-25 | Toyota Jidosha Kabushiki Kaisha | Information providing method, information providing system, and information providing device |
| CN111048084A (en) * | 2019-12-18 | 2020-04-21 | 上海智勘科技有限公司 | Method and system for pushing information in intelligent voice interaction process |
| EP3663940A4 (en) * | 2017-08-04 | 2020-07-29 | Sony Corporation | INFORMATION PROCESSING DEVICE AND INFORMATION PROCESSING METHOD |
| CN112837684A (en) * | 2021-01-08 | 2021-05-25 | 北大方正集团有限公司 | Business processing method and system, business processing device and readable storage medium |
| RU2755781C1 (en) * | 2020-06-04 | 2021-09-21 | Публичное Акционерное Общество "Сбербанк России" (Пао Сбербанк) | Intelligent workstation of the operator and method for interaction thereof for interactive support of a customer service session |
| WO2022105115A1 (en) * | 2020-11-17 | 2022-05-27 | 平安科技(深圳)有限公司 | Question and answer pair matching method and apparatus, electronic device and storage medium |
| CN116467426A (en) * | 2023-05-17 | 2023-07-21 | 山东浪潮科学研究院有限公司 | A Chinese language model dialog system and method based on prompts and limited background |
Families Citing this family (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP6946406B2 (en) * | 2016-03-16 | 2021-10-06 | 株式会社東芝 | Concept dictionary creation device, method and program |
| JP2017167851A (en) * | 2016-03-16 | 2017-09-21 | 株式会社東芝 | Concept dictionary creation device, method and program |
| JP6899558B2 (en) * | 2016-08-26 | 2021-07-07 | 株式会社Nextremer | Dialogue control device, dialogue engine, management terminal, dialogue device, dialogue control method, and program |
| JP6615803B2 (en) * | 2017-02-08 | 2019-12-04 | 日本電信電話株式会社 | Business determination device, business determination method and program |
| JP2018159729A (en) * | 2017-03-22 | 2018-10-11 | 株式会社東芝 | Dialog system construction support apparatus, method, and program |
| JP6873805B2 (en) * | 2017-04-24 | 2021-05-19 | 株式会社日立製作所 | Dialogue support system, dialogue support method, and dialogue support program |
| US20210034678A1 (en) * | 2018-04-23 | 2021-02-04 | Ntt Docomo, Inc. | Dialogue server |
| CA3045132C (en) * | 2019-06-03 | 2023-07-25 | Eidos Interactive Corp. | Communication with augmented reality virtual agents |
| JP6755633B2 (en) * | 2019-07-19 | 2020-09-16 | 日本電信電話株式会社 | Message judgment device, message judgment method and program |
| CN118280372B (en) * | 2024-06-03 | 2024-08-06 | 中邮消费金融有限公司 | Dialogue assistance method, device, storage medium, and computer program product |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20050105712A1 (en) * | 2003-02-11 | 2005-05-19 | Williams David R. | Machine learning |
| JP2013225036A (en) * | 2012-04-23 | 2013-10-31 | Scsk Corp | Automatic interactive scenario creation support device and automatic interactive scenario creation support program |
- 2014-03-18: JP application JP2014054491A, published as JP2015176099A, status Pending
- 2015-03-11: PCT application PCT/JP2015/057970, published as WO2015141700A1, status Ceased
Cited By (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP3663940A4 (en) * | 2017-08-04 | 2020-07-29 | Sony Corporation | INFORMATION PROCESSING DEVICE AND INFORMATION PROCESSING METHOD |
| US11514903B2 (en) | 2017-08-04 | 2022-11-29 | Sony Corporation | Information processing device and information processing method |
| US10574821B2 (en) | 2017-09-04 | 2020-02-25 | Toyota Jidosha Kabushiki Kaisha | Information providing method, information providing system, and information providing device |
| US10992809B2 (en) | 2017-09-04 | 2021-04-27 | Toyota Jidosha Kabushiki Kaisha | Information providing method, information providing system, and information providing device |
| CN111048084A (en) * | 2019-12-18 | 2020-04-21 | 上海智勘科技有限公司 | Method and system for pushing information in intelligent voice interaction process |
| CN111048084B (en) * | 2019-12-18 | 2022-05-31 | 上海智勘科技有限公司 | Method and system for pushing information in intelligent voice interaction process |
| RU2755781C1 (en) * | 2020-06-04 | 2021-09-21 | Публичное Акционерное Общество "Сбербанк России" (Пао Сбербанк) | Intelligent workstation of the operator and method for interaction thereof for interactive support of a customer service session |
| WO2022105115A1 (en) * | 2020-11-17 | 2022-05-27 | 平安科技(深圳)有限公司 | Question and answer pair matching method and apparatus, electronic device and storage medium |
| CN112837684A (en) * | 2021-01-08 | 2021-05-25 | 北大方正集团有限公司 | Business processing method and system, business processing device and readable storage medium |
| CN116467426A (en) * | 2023-05-17 | 2023-07-21 | 山东浪潮科学研究院有限公司 | A Chinese language model dialog system and method based on prompts and limited background |
Also Published As
| Publication number | Publication date |
|---|---|
| JP2015176099A (en) | 2015-10-05 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2015141700A1 (en) | Dialogue system construction support apparatus and method | |
| CN116737908B (en) | Knowledge question-answering method, device, equipment and storage medium | |
| JP6791825B2 (en) | Information processing device, dialogue processing method and dialogue system | |
| US11568855B2 (en) | System and method for defining dialog intents and building zero-shot intent recognition models | |
| US11049493B2 (en) | Spoken dialog device, spoken dialog method, and recording medium | |
| US11915693B2 (en) | System and method for rule based modifications to variable slots based on context | |
| US10672391B2 (en) | Improving automatic speech recognition of multilingual named entities | |
| KR101583181B1 (en) | Method and computer program of recommending responsive sticker | |
| KR102469513B1 (en) | Methods for understanding incomplete natural language queries | |
| US11030400B2 (en) | System and method for identifying and replacing slots with variable slots | |
| US20220414463A1 (en) | Automated troubleshooter | |
| CN108369580B (en) | Language and domain independent model based approach to on-screen item selection | |
| US20190164540A1 (en) | Voice recognition system and voice recognition method for analyzing command having multiple intents | |
| CN109325091B (en) | Method, device, equipment and medium for updating attribute information of interest points | |
| US20160210279A1 (en) | Methods and systems for analyzing communication situation based on emotion information | |
| US20170199867A1 (en) | Dialogue control system and dialogue control method | |
| KR20160089152A (en) | Method and computer system of analyzing communication situation based on dialogue act information | |
| CN110415679A (en) | Speech error correction method, device, equipment and storage medium | |
| EP2887229A2 (en) | Communication support apparatus, communication support method and computer program product | |
| JP2017037588A (en) | Information processor and information processing program | |
| KR101763679B1 (en) | Method and computer system of analyzing communication situation based on dialogue act information | |
| KR102391447B1 (en) | Method and Apparatus for Providing Hybrid Intelligent Customer Consultation | |
| JP6675788B2 (en) | Search result display device, search result display method, and program | |
| US20250094480A1 (en) | Document processing and retrieval for knowledge-based question answering | |
| JP2011232619A (en) | Voice recognition device and voice recognition method |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 15764494; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 15764494; Country of ref document: EP; Kind code of ref document: A1 |