
US11521641B2 - Model learning device, estimating device, methods therefor, and program - Google Patents

Model learning device, estimating device, methods therefor, and program

Info

Publication number
US11521641B2
Authority
US
United States
Prior art keywords
satisfaction
state
change pattern
utterance
utterer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US16/484,053
Other versions
US20190392348A1 (en)
Inventor
Atsushi Ando
Hosana KAMIYAMA
Satoshi KOBASHIKAWA
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp
Assigned to NIPPON TELEGRAPH AND TELEPHONE CORPORATION. Assignment of assignors interest (see document for details). Assignors: ANDO, ATSUSHI; KAMIYAMA, Hosana; KOBASHIKAWA, Satoshi
Publication of US20190392348A1
Application granted
Publication of US11521641B2
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 - Computing arrangements based on specific mathematical models
    • G06N7/01 - Probabilistic graphical models, e.g. probabilistic networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 - Sound input; Sound output
    • G06F3/167 - Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • G06N7/005
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 - Administration; Management
    • G06Q10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 - Commerce
    • G06Q30/01 - Customer relationship services
    • G06Q30/015 - Providing customer assistance, e.g. assisting a customer within a business location or via helpdesk
    • G06Q30/016 - After-sales
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/10 - Speech classification or search using distance or distortion measures between unknown speech and reference templates
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Definitions

  • the state-of-satisfaction change pattern model learning unit 112 obtains, for each of the predetermined “change patterns” of the “state of satisfaction”, a “state-of-satisfaction change pattern model” including the “state-of-satisfaction change pattern model structure” and a set of transition weights of the states of satisfaction and outputs the “state-of-satisfaction change pattern model”.
  • for learning of the state-of-satisfaction change pattern model PMk of a change pattern Ck, state-of-satisfaction correct values corresponding to “utterances” given in a “conversation” whose “state-of-satisfaction change pattern correct value” is the change pattern Ck are used.
  • the state-of-satisfaction change pattern model learning unit 112 learns transition weights (for example, transition probabilities) between the states of satisfaction included in the “state-of-satisfaction change pattern model structure” by using, as learning data, “state-of-satisfaction correct values” corresponding to “utterances” included in a “conversation” whose “state-of-satisfaction change pattern correct value” is the change pattern Ck, and outputs the state-of-satisfaction change pattern model PMk including the “state-of-satisfaction change pattern model structure” and a set of the obtained transition weights.
  • in the case of the structure of FIG. 5, the state-of-satisfaction change pattern model learning unit 112 learns transition weights from S0 to S1, S2, and S3 in Stage I, self-transition weights of S1, S2, and S3 in Stage I, transition weights from S1, S2, and S3 in Stage I to S1, S2, and S3 in Stage II, self-transition weights of S1, S2, and S3 in Stage II, transition weights from S1, S2, and S3 in Stage II to S1, S2, and S3 in Stage III, self-transition weights of S1, S2, and S3 in Stage III, and transition weights from S1, S2, and S3 in Stage III to S4.
  • learning of the transition weights can be performed by the same procedure as that of HMM learning which is performed when a state sequence is known (see, for example, Reference Literature 2 (Kiyohiro Shikano, Katsunobu Ito, Tatsuya Kawahara, Kazuya Takeda, Mikio Yamamoto, "Speech Recognition System", Ohmsha, Ltd., pp. 27-29, 2001)); a counting-based sketch is given after the next item.
  • the state-of-satisfaction change pattern model learning unit 112 obtains, by using the same “state-of-satisfaction change pattern model structure” for all the change patterns C1, ..., CK, a state-of-satisfaction change pattern model PMk for each change pattern Ck, and the obtained models PM1, ..., PMK are stored in the state-of-satisfaction change pattern model storage 111f.
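Because the state sequence is known at training time (the state-of-satisfaction correct values give it), the transition weights can be estimated by simple counting, as in HMM learning with a known state sequence (Reference Literature 2). The sketch below is a minimal illustration under that assumption; the function name and the choice of raw transition probabilities as weights are ours, not the patent's.

```python
from collections import Counter, defaultdict

def learn_transition_weights(state_sequences):
    """Estimate transition probabilities from known state sequences.

    state_sequences: one sequence per conversation whose change-pattern
    correct value is the pattern Ck being learned; each sequence lists
    states of the model structure in order, e.g. ["S0", "S2@I", ...].
    """
    counts = defaultdict(Counter)
    for seq in state_sequences:
        for prev, cur in zip(seq, seq[1:]):   # count observed transitions
            counts[prev][cur] += 1
    # Normalize the counts out of each source state into probabilities.
    return {
        prev: {cur: c / sum(nxt.values()) for cur, c in nxt.items()}
        for prev, nxt in counts.items()
    }
```

One state-of-satisfaction change pattern model PMk is then the shared model structure plus one such weight set, learned separately for each change pattern Ck.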
  • to the voice activity detection unit 113, the “utterance for learning” read from the utterance-for-learning storage 111a is input.
  • the voice activity detection unit 113 detects one or more voice activities by applying voice activity detection to the input “utterance for learning”, and extracts an “utterance” of the “utterer” in the detected voice activities and outputs the “utterance”.
  • for the voice activity detection, a well-known voice activity detection technique such as a technique based on threshold processing of power or a technique based on the likelihood ratio of speech/non-speech models can be used.
  • to the utterance feature amount extraction unit 114, the “utterance (the utterance for learning)” of the “utterer” in the voice activity, which is output from the voice activity detection unit 113, is input.
  • the utterance feature amount extraction unit 114 extracts the “utterance-for-learning feature amount”, which is the feature amount considered to be related to the “state of satisfaction”, for each “utterance” of the “utterer”. For instance, the utterance feature amount extraction unit 114 extracts, as the “utterance-for-learning feature amount”, the feature amount including at least one or more of the prosodic feature, the dialogic feature, and the language feature of an “utterance”.
  • the utterance feature amount extraction unit 114 may divide an utterance into frames, obtain the fundamental frequency or power for each frame, and use the fundamental frequency or power of each frame as at least part of the feature amount.
  • the utterance feature amount extraction unit 114 may estimate a phoneme sequence in an utterance by using a well-known speech recognition technology and obtain the rate of utterance or the duration of a final phoneme.
  • for the dialogic feature, at least one or more of the following can be used: the time from the previous “utterance” given by an “utterer” such as a customer to the present “utterance”; the time from a dialogic utterance given by a dialogist, such as an operator, who made a conversation with an “utterer” such as a customer to an “utterance” given by the “utterer”; the time from an “utterance” given by an “utterer” such as a customer to the next dialogic utterance given by a dialogist such as an operator; the length of an “utterance” given by an “utterer” such as a customer; the length of a dialogic utterance given by a dialogist such as an operator before and after an “utterance” given by an “utterer”; the number of responses made by an “utterer” such as a customer during a dialogic utterance given by a dialogist such as an operator before and after it; and the number of responses made by a dialogist such as an operator during an “utterance” given by an “utterer” such as a customer.
  • the utterance feature amount extraction unit 114 may estimate a word which may be used in an utterance by using a well-known speech recognition technology and use the result thereof.
  • the number of words of appreciation (for example, “thank you” or “thanks”), which are manually selected, may be used as at least part of the feature amount.
  • Which of the features is used as the “utterance-for-learning feature amount” is determined in advance.
  • the utterance feature amount extraction unit 114 outputs the extracted “utterance-for-learning feature amount”.
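As a concrete illustration of the prosodic part of the feature amount described above, the following sketch splits one utterance into frames and summarizes per-frame log power; per-frame fundamental frequency, the rate of utterance, and the dialogic and language features would be appended to the same vector in practice. The frame sizes, the statistics, and the function name are our assumptions, not values from the patent.

```python
import numpy as np

def prosodic_features(samples, sr=8000, frame_ms=25, hop_ms=10):
    """Frame-level log-power statistics for one utterance (a hypothetical
    subset of the utterance feature amount)."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    x = np.asarray(samples, dtype=float)
    if len(x) < frame:                       # pad very short utterances
        x = np.pad(x, (0, frame - len(x)))
    powers = np.array([
        np.mean(x[i:i + frame] ** 2)         # mean power of each frame
        for i in range(0, len(x) - frame + 1, hop)
    ])
    log_p = np.log(powers + 1e-10)
    return np.array([log_p.mean(), log_p.std(), log_p.max(), log_p.min()])
```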
  • to the state-of-satisfaction estimation model learning unit 115, the “utterance-for-learning feature amount” output from the utterance feature amount extraction unit 114 and the correct value of the “state of satisfaction” read from the state-of-satisfaction correct value storage 111b are input.
  • the correct value of the “state of satisfaction” input to the state-of-satisfaction estimation model learning unit 115 is the correct value of the “state of satisfaction” of an “utterer” who gave an “utterance” corresponding to the “utterance-for-learning feature amount” which is input to the state-of-satisfaction estimation model learning unit 115 .
  • the “utterance-for-learning feature amount” and the correct value of the “state of satisfaction” of an “utterer” at the time of each “utterance” corresponding to the “utterance-for-learning feature amount” are input to the state-of-satisfaction estimation model learning unit 115 .
  • the state-of-satisfaction estimation model learning unit 115 performs learning processing by using a pair of the input “utterance-for-learning feature amount” and the correct value of the “state of satisfaction” of an “utterer” for each “utterance (utterance for learning)” corresponding to the “utterance-for-learning feature amount”, generates a “state-of-satisfaction estimation model” for obtaining the posteriori probability (the posteriori probability of an estimated value of the utterance feature amount) of the “utterance feature amount (the utterance feature amount of each utterance of the utterer)” given the “state of satisfaction of the utterer (the state of satisfaction when the utterer gave each utterance)”, and outputs the “state-of-satisfaction estimation model”.
  • as the “state-of-satisfaction estimation model”, a neural network or the like can be used, and, for its learning, error backpropagation, which is an existing neural network learning technique, for example, can be used.
  • models other than the neural network may be used as long as the posteriori probability of the “utterance feature amount” given the “state of satisfaction” of an “utterer” can be obtained, and a Gaussian mixture model (a normal mixture distribution model), for instance, may be used.
  • the posteriori probability of the utterance feature amount X(n) given the state of satisfaction S(n) of the utterer can be expressed as P(X(n)|S(n)).
  • the state-of-satisfaction estimation model learning unit 115 outputs the generated “state-of-satisfaction estimation model”, and the “state-of-satisfaction estimation model” is stored in the state-of-satisfaction estimation model storage 111 e.
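One hedged way to realize such a model with an off-the-shelf neural network: train a classifier on (feature amount, correct state) pairs to get the posterior P(S|X), then recover the quantity used here, the probability of the utterance feature amount given the state, via Bayes' rule, P(X|S) being proportional to P(S|X) / P(S). The use of scikit-learn's MLPClassifier and the division by empirical class priors are our illustrative choices, not steps stated in the patent.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier  # NN trained by backpropagation

STATES = ["satisfaction", "average", "dissatisfaction"]

def train_estimation_model(features, correct_states):
    """features: (N, D) utterance-for-learning feature amounts;
    correct_states: N state-of-satisfaction correct values."""
    model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000)
    model.fit(features, correct_states)
    # Empirical class priors P(S), used to turn the posterior P(S|X)
    # into a value proportional to the likelihood P(X|S).
    priors = {s: np.mean([c == s for c in correct_states]) for s in STATES}
    return model, priors

def likelihood(model, priors, x):
    """Return a dict S -> value proportional to P(X=x | S)."""
    post = dict(zip(model.classes_, model.predict_proba([x])[0]))
    return {s: post.get(s, 0.0) / max(priors[s], 1e-10) for s in STATES}
```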
  • the “input utterance” is time series data of the utterances given by an “utterer” in a “conversation”.
  • the “input utterance” is output to the voice activity detection unit 122 .
  • to the voice activity detection unit 122, the “input utterance” output from the input unit 121 is input.
  • the voice activity detection unit 122 detects one or more voice activities by applying voice activity detection to the input “input utterance”, extracts an “input utterance” of the “utterer” in the detected voice activities, and outputs the “input utterance”.
  • for the voice activity detection, a well-known voice activity detection technique such as a technique based on threshold processing of power or a technique based on the likelihood ratio of speech/non-speech models can be used.
  • to the utterance feature amount extraction unit 123, the “input utterance” of the “utterer” in the voice activity, which is output from the voice activity detection unit 122, is input.
  • the utterance feature amount extraction unit 123 extracts, for each “input utterance” of the “utterer”, the “input utterance feature amount” which is the feature amount considered to be related to the “state of satisfaction”.
  • the type of the feature amount which is extracted by the utterance feature amount extraction unit 123 is the same as the type of the feature amount which is extracted by the above-mentioned utterance feature amount extraction unit 114 .
  • the utterance feature amount extraction unit 123 outputs the extracted “input utterance feature amount”.
  • to the state estimation unit 124, the “input utterance feature amount” output from the utterance feature amount extraction unit 123, the “state-of-satisfaction estimation model” read from the state-of-satisfaction estimation model storage 111e of the model learning device 11 (FIG. 1), and the “state-of-satisfaction change pattern models” read from the state-of-satisfaction change pattern model storage 111f are input.
  • the state estimation unit 124 obtains an estimated value of the state of satisfaction of the “utterer” who gave the “utterance” corresponding to the “input utterance feature amount” by using the “input utterance feature amount”, the “state-of-satisfaction estimation model”, and the “state-of-satisfaction change pattern model” and outputs the estimated value. Based on the following formula, the state estimation unit 124 of the present embodiment obtains an estimated value of the state of satisfaction of the “utterer” at the time of the “utterance”.
  • S^(n) = arg max_{S(n)} P(X(n)|S(n)) · P(S(n)|S^(n-1), ..., S^(1), Ck)  (1)
  • S^(n) represents an estimated value of the “state of satisfaction” of the “utterer” at the time of an n-th (n-th in chronological order; n is an integer greater than or equal to 2) “utterance” in the “conversation”, S(n) represents the “state of satisfaction” of the “utterer” at the time of the n-th “utterance” in the “conversation”, X(n) represents the “input utterance feature amount” of the n-th “utterance” in the “conversation”, and Ck represents the k-th “change pattern” (k = 1, ..., K, where K is the number of “change patterns”).
  • the circumflex “^” in “S^(n)” is supposed to be written immediately above “S”, but, due to a restriction imposed by text notation, it is written to the upper right of “S”.
  • an initial value S^(1) of S^(n) may be a constant, or any estimated S^(n) from the first to the last conversation may be used as the initial value S^(1) this time.
  • P(α) represents the probability of an event α, and P(α|β) represents the probability of the event α given an event β.
  • the state estimation unit 124 obtains P(X(n)|S(n)) by applying the input utterance feature amount X(n) to the “state-of-satisfaction estimation model” and further obtains, for each change pattern Ck (where k = 1, ..., K), P(S(n)|S^(n-1), ..., S^(1), Ck) by applying the estimated values S^(n-1), ..., S^(1) obtained so far to the state-of-satisfaction change pattern model PMk.
  • the state estimation unit 124 selects the change pattern Ck that gives the greatest P(X(n)|S(n)) · P(S(n)|S^(n-1), ..., S^(1), Ck) and outputs, as the estimated value S^(n), the S(n) that maximizes formula (1); a sketch of this decoding step follows.
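A minimal sketch of the decoding step of formula (1), maximizing the product jointly over the candidate state S(n) and the change pattern Ck; the function names and the callback interfaces are hypothetical glue, not the patent's API.

```python
def estimate_state(x_n, history, likelihood_fn, transition_fn, patterns,
                   states=("satisfaction", "average", "dissatisfaction")):
    """Formula (1): S^(n) = argmax_{S(n)} P(X(n)|S(n)) *
    P(S(n)|S^(n-1), ..., S^(1), Ck).

    likelihood_fn(x, s): value proportional to P(X(n)=x | S(n)=s),
        from the state-of-satisfaction estimation model.
    transition_fn(s, history, k): P(S(n)=s | history, Ck), from the
        state-of-satisfaction change pattern model PMk.
    history: previously estimated values S^(n-1), ..., S^(1).
    """
    best_state, best_pattern, best_score = None, None, -1.0
    for s in states:
        for k in patterns:
            score = likelihood_fn(x_n, s) * transition_fn(s, history, k)
            if score > best_score:
                best_state, best_pattern, best_score = s, k, score
    return best_state, best_pattern
```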
  • the states of satisfaction of an “utterer” in a “conversation” have a time series correlation. For example, there is an extremely low possibility that the state of satisfaction of an “utterer” whose state of satisfaction at a given time in a “conversation” is “satisfaction” changes to “dissatisfaction” at the next time. Moreover, since an “utterer” whose state of satisfaction transitions from “dissatisfaction” to “average” and then to “satisfaction” has a strong feeling of satisfaction to the extent that “dissatisfaction” has changed to “satisfaction”, it can be expected that “satisfaction” will continue to some extent.
  • the state of satisfaction of an “utterer” has a strong correlation to the state of satisfaction up to a given “utterance”.
  • a “state-of-satisfaction change pattern model” and a “state-of-satisfaction estimation model” are learned and, by using them and the “input utterance feature amount”, an estimated value of the state of satisfaction of an utterer who gave an utterance corresponding to the “input utterance feature amount” is obtained. By doing so, it is possible to estimate the state of satisfaction with consideration given to changes in the state of satisfaction of an “utterer”.
  • model learning device 11 and the estimating device 12 may be one and the same device, the model learning device 11 may be configured with a plurality of devices, or the estimating device 12 may be configured with a plurality of devices.
  • in the above embodiment, the state estimation unit 124 selects the single change pattern Ck that gives the greatest P(X(n)|S(n)) · P(S(n)|S^(n-1), ..., S^(1), Ck); however, a plurality of change patterns Ck may be selected in descending order of this value, and the values S^(n) corresponding to the selected change patterns Ck may be used as estimated values of the state of satisfaction of the “utterer” at the time of the n-th “utterance” in the “conversation”.
  • the processing details of the functions supposed to be provided in each device are described by a program.
  • by executing this program on a computer, the above-described processing functions are implemented on the computer.
  • the program describing the processing details can be recorded on a computer-readable recording medium.
  • An example of the computer-readable recording medium is a non-transitory recording medium. Examples of such a recording medium include a magnetic recording device, an optical disk, a magneto-optical recording medium, and semiconductor memory.
  • the distribution of this program is performed by, for example, selling, transferring, or lending a portable recording medium such as a DVD or a CD-ROM on which the program is recorded. Furthermore, a configuration may be adopted in which this program is distributed by storing the program in a storage device of a server computer and transferring the program to other computers from the server computer via a network.
  • the computer that executes such a program first, for example, temporarily stores the program recorded on the portable recording medium or the program transferred from the server computer in a storage device thereof. At the time of execution of processing, the computer reads the program stored in the storage device thereof and executes the processing in accordance with the read program. As another mode of execution of this program, the computer may read the program directly from the portable recording medium and execute the processing in accordance with the program and, furthermore, every time the program is transferred to the computer from the server computer, the computer may sequentially execute the processing in accordance with the received program.
  • a configuration may be adopted in which the transfer of a program to the computer from the server computer is not performed and the above-described processing is executed by so-called application service provider (ASP)-type service by which the processing functions are implemented only by an instruction for execution thereof and result acquisition.
  • in the above-described embodiment, the processing functions of the present device are implemented as a result of a predetermined program being executed on the computer, but at least part of these processing functions may be implemented by hardware.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Economics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Acoustics & Sound (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Accounting & Taxation (AREA)
  • Medical Informatics (AREA)
  • Finance (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Human Resources & Organizations (AREA)
  • Algebra (AREA)
  • Signal Processing (AREA)
  • Child & Adolescent Psychology (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Quality & Reliability (AREA)

Abstract

State-of-satisfaction change pattern models each including a set of transition weights in state sequences of the states of satisfaction are obtained for predetermined change patterns of the states of satisfaction, and a state-of-satisfaction estimation model for obtaining the posteriori probability of the utterance feature amount given the state of satisfaction of an utterer is obtained by using the utterance-for-learning feature amount and a correct value of the state of satisfaction of an utterer who gave an utterance for learning corresponding to the utterance-for-learning feature amount. By using the input utterance feature amount and the state-of-satisfaction change pattern models and the state-of-satisfaction estimation model, an estimated value of the state of satisfaction of an utterer who gave an utterance corresponding to the input utterance feature amount is obtained.

Description

TECHNICAL FIELD
The present invention relates to a technology for estimating the state of satisfaction of an utterer.
BACKGROUND ART
In the management of a call center, a technology for estimating the state of satisfaction of a customer from a call is needed. Here, the state of satisfaction of a customer is a staged category indicating whether the customer expresses his or her satisfaction or dissatisfaction and refers to, for example, three stages: satisfaction, average, and dissatisfaction. This technology can be applied to, for instance, automating the evaluation of operators by counting the frequency of satisfaction of customers for each operator, or surveying the demands of customers by performing speech recognition and text analysis on the utterances indicating satisfaction. As technologies similar to the above, technologies for estimating satisfaction, dissatisfaction, or anger of a customer from a call are proposed in Non-patent Literatures 1 and 2. In Non-patent Literature 1, satisfaction/dissatisfaction of a customer at a given time is estimated by using features of the way the customer speaks, such as the rate of utterance, and linguistic features such as the presence or absence of a competitor's product name. In Non-patent Literature 2, the anger/non-anger state of a customer at a given time is estimated by using prosodic features such as the pitch or volume of the customer's voice and dialogic features such as the frequency of responses. In both technologies, the relationship between each feature amount and satisfaction/dissatisfaction or anger of a customer is learned from many calls by using machine learning and used for estimation.
PRIOR ART LITERATURE Non-Patent Literature
  • Non-patent Literature 1: Youngja Park, Stephen C. Gates, “Towards Real-Time Measurement of Customer Satisfaction Using Automatically Generated Call Transcripts,” in Proceedings of the 18th ACM conference on Information and knowledge management, pp. 1387-1396, 2009.
  • Non-patent Literature 2: Narichika Nomoto, Satoshi Kobashikawa, Masafumi Tamoto, Hirokazu Masataki, Osamu Yoshioka, Satoshi Takahashi, “Using nonverbal information and characteristic linguistic representations to detect anger emotion in dialog speech,” The transactions of the Institute of Electronics, Information and Communication Engineers, Vol. J96-D, No. 1, pp. 15-24, 2013.
SUMMARY OF THE INVENTION Problems to be Solved by the Invention
Both of the existing technologies estimate the state of satisfaction of a customer from the features of a call made by a given time or before and after that time. On the other hand, the states of satisfaction of a customer can be considered to have a time series correlation. However, no existing literature surveys how the state of satisfaction of a customer changes. This applies not only to the case where the state of satisfaction of a customer in a call is estimated but also, more generally, to the case where the state of satisfaction of an utterer in a conversation is estimated. An object of the present invention is to estimate the state of satisfaction with consideration given to changes in the state of satisfaction of an utterer.
Means to Solve the Problems
At the time of model learning, a state-of-satisfaction change pattern model including a set of transition weights in a state sequence (a state transition sequence) of the states of satisfaction is obtained for each of predetermined change patterns of the state of satisfaction by using a state-of-satisfaction change pattern correct value indicating a correct value of a change pattern of the state of satisfaction of an utterer in a conversation and state-of-satisfaction correct values, each indicating a correct value of the state of satisfaction of the utterer at the time of each utterance in the conversation, and the state-of-satisfaction change pattern model is output. Moreover, a state-of-satisfaction estimation model for obtaining the posteriori probability of the utterance feature amount given the state of satisfaction of an utterer is obtained by using the utterance-for-learning feature amount and a correct value of the state of satisfaction of an utterer who gave an utterance for learning corresponding to the utterance-for-learning feature amount, and the state-of-satisfaction estimation model is output.
At the time of estimation, by using the input utterance feature amount and the state-of-satisfaction change pattern model and the state-of-satisfaction estimation model, an estimated value of the state of satisfaction of an utterer who gave an utterance corresponding to the input utterance feature amount is obtained and output.
Effects of the Invention
This makes it possible to estimate the state of satisfaction with consideration given to changes in the state of satisfaction of an utterer.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram illustrating the functional configuration of a model learning device of an embodiment.
FIG. 2 is a block diagram illustrating the functional configuration of an estimating device of the embodiment.
FIG. 3 illustrates change patterns of the state of satisfaction.
FIG. 4 is a diagram illustrating temporal changes in the state of satisfaction.
FIG. 5 is a diagram illustrating a state-of-satisfaction change pattern model structure.
FIG. 6 is a diagram illustrating a state-of-satisfaction change pattern model structure.
DETAILED DESCRIPTION OF THE EMBODIMENTS
An embodiment of the present invention will be described.
[General Outline]
The general outline of the present embodiment will be described. In the present embodiment, change patterns of the state of satisfaction of an utterer in a conversation are classified into a predetermined number of expressions, and each change pattern is expressed in a probability model and used for estimation of the state of satisfaction. At the time of model learning, a state-of-satisfaction change pattern model including a set of transition weights in a state sequence (a state transition sequence) of the states of satisfaction is obtained for each of predetermined change patterns of the state of satisfaction by using a state-of-satisfaction change pattern correct value indicating a correct value of a change pattern of the state of satisfaction of an utterer in a conversation and state-of-satisfaction correct values, each indicating a correct value of the state of satisfaction of the utterer at the time of each utterance in the conversation, and a state-of-satisfaction estimation model for obtaining the posteriori probability of the utterance feature amount given the state of satisfaction of an utterer is obtained by using the utterance-for-learning feature amount and a correct value of the state of satisfaction of an utterer who gave an utterance for learning corresponding to the utterance-for-learning feature amount. At the time of estimation of the state of satisfaction, by using the input utterance feature amount, the state-of-satisfaction change pattern model, and the state-of-satisfaction estimation model which are obtained by the model learning, an estimated value of the state of satisfaction of an utterer who gave an utterance corresponding to the input utterance feature amount is obtained.
An example of the “conversation” is a call which is made between a customer and a call center; however, the present invention is not limited to this example. The “conversation” may be a call which is made through a telephone line, a call which is made through the Internet, or a call which is made through a local line. The “conversation” may be a conversation such as a dialogue, a talk, or a preliminary meeting which is made by two or more human beings face-to-face, not a call. The “conversation” may be made between human beings or between a human being and an automatic interaction device (such as an interaction device using artificial intelligence). The “utterer” means one particular person who gives an “utterance” in the “conversation”. For example, when the “conversation” is a call which is made between a customer and a call center, the customer is the “utterer”; when the “conversation” is a conversation which is made by two or more human beings face-to-face, one particular person taking part in the conversation is the “utterer”; when the “conversation” is a conversation which is made between a human being and an automatic interaction device, the human being who makes a conversation with the automatic interaction device is the “utterer”.
The “state of satisfaction” means the degree of satisfaction of the “utterer” (the extent to which the “utterer” is satisfied). The “state of satisfaction” may be what is divided into a plurality of classifications or what is converted into numbers. In the former case, the “state of satisfaction” may be what is divided into two classifications (for example, two classifications: satisfaction and dissatisfaction), what is divided into three classifications (for example, three classifications: satisfaction, average, and dissatisfaction), or what is divided into four or more classifications.
The “change pattern” is a pattern indicating how the “state of satisfaction” of the “utterer” in the “conversation” changes. In other words, the “change pattern” is a pattern indicating temporal changes in the “state of satisfaction” at a plurality of time points in the “conversation”. The types and number of the “change patterns” are determined in advance. By using the “change pattern”, it is possible to estimate, from the estimated transition of the “state of satisfaction”, to which of the “change patterns” the “state of satisfaction” of the “utterer” applies and to which “state of satisfaction” the “state of satisfaction” probably transitions next. There is no restriction on what types and numbers of “change patterns” may be chosen. The inventor listened to and analyzed many calls on the assumption that the “state of satisfaction” is any one of the states: “satisfaction”, “average”, and “dissatisfaction”, and found that the “change patterns” of the “state of satisfaction” of a customer (an utterer) in a call-center call can be classified into the following nine patterns (FIG. 3).
(1) Average→satisfaction: A pattern in which average changes to satisfaction
(2) Average→dissatisfaction→satisfaction: A pattern in which average changes to dissatisfaction and then changes to satisfaction
(3) Dissatisfaction→satisfaction: A pattern in which dissatisfaction changes to satisfaction
(4) Average→average: A pattern in which average continues
(5) Average→dissatisfaction→average: A pattern in which average changes to dissatisfaction and then changes to average
(6) Dissatisfaction→dissatisfaction: A pattern in which dissatisfaction continues
(7) Average→dissatisfaction: A pattern in which average changes to dissatisfaction
(8) Dissatisfaction→average: A pattern in which dissatisfaction changes to average
(9) Satisfaction→satisfaction: A pattern in which satisfaction continues
That is, when the “state of satisfaction” is any one of the states: “satisfaction”, “average”, and “dissatisfaction”, it is desirable that the “change pattern” is any one of the above-described patterns (1) to (9). It is to be noted that the state of satisfaction at the start of the “conversation” of (9) is “satisfaction”, the state of satisfaction at the start of the “conversation” of (1), (2), (4), (5), and (7) is “average”, and the state of satisfaction at the start of the “conversation” of (3), (6), and (8) is “dissatisfaction”. The state of satisfaction at the end of the “conversation” of (1), (2), (3), and (9) is “satisfaction”, the state of satisfaction at the end of the “conversation” of (4), (5), and (8) is “average”, and the state of satisfaction at the end of the “conversation” of (6) and (7) is “dissatisfaction”. As described above, when the state of satisfaction at the start of the “conversation” shows a high level of satisfaction (is “satisfaction” or “average”), the state of satisfaction at the end of the “conversation” also tends to show a high level of satisfaction. The number of cases where the state of satisfaction at the end of the “conversation” shows a higher level of satisfaction than the level at the start of the “conversation” is smaller than the number of other cases. It is assumed that the state of satisfaction “satisfaction” shows the highest level of satisfaction, the state of satisfaction “average” shows the second highest level of satisfaction, and the state of satisfaction “dissatisfaction” shows the lowest level of satisfaction.
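To make the nine patterns and the start/end groupings above concrete, here is a small self-contained encoding; the dictionary representation is our own illustration of FIG. 3, not a structure defined by the patent.

```python
# The nine change patterns (1)-(9) of FIG. 3, encoded as state sequences.
CHANGE_PATTERNS = {
    1: ["average", "satisfaction"],
    2: ["average", "dissatisfaction", "satisfaction"],
    3: ["dissatisfaction", "satisfaction"],
    4: ["average", "average"],
    5: ["average", "dissatisfaction", "average"],
    6: ["dissatisfaction", "dissatisfaction"],
    7: ["average", "dissatisfaction"],
    8: ["dissatisfaction", "average"],
    9: ["satisfaction", "satisfaction"],
}

# Reproduce the start/end groupings stated in the text.
for k, seq in CHANGE_PATTERNS.items():
    print(f"({k}) starts at {seq[0]!r}, ends at {seq[-1]!r}")
```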
Here, the “state of satisfaction” of the “utterer” in the “conversation” changes in various ways. As illustrated in FIG. 4, the same state of satisfaction sometimes continues over a plurality of time points (C1) and the state of satisfaction sometimes changes over a plurality of time points (C2). For this reason, changes in the state of satisfaction of the “utterer” in an actual “conversation” do not always fit into a predetermined “change pattern”. To express such complicated changes, for each “change pattern”, changes in the state of satisfaction are expressed in a probability model (a state-of-satisfaction change pattern model). That is, for each “change pattern”, a “state-of-satisfaction change pattern model” including a set of transition weights (for example, transition probabilities) in a state sequence of the “states of satisfaction” is generated. In other words, a model including a set of transition weights between the “states of satisfaction” in a state sequence is a “state-of-satisfaction change pattern model”. It is to be noted that a state sequence of the “states of satisfaction” means a sequence of the “states of satisfaction” to which the state of satisfaction can transition from the start to the end of the “conversation”. For modeling of the “change pattern”, a hidden Markov model (HMM) is used, for example (Reference Literature 1: Keiichi Tokuda, “State-of-the-art Technology of Speech Information Processing: Speech Recognition and Speech Synthesis based on Hidden Markov Models”, IPSJ Magazine, Vol. 45, No. 10, pp. 1005-1011, 2004). In order to properly model various changes in the state of satisfaction, it is desirable that a left-to-right HMM with branches, not a chain-like HMM, be used for modeling of the “change pattern”. FIG. 5 illustrates a state sequence of the “states of satisfaction” in a left-to-right HMM with branches. In this example, from the “state of satisfaction” S0 at the start of the “conversation”, the state of satisfaction transitions through the “states of satisfaction” at time points I, II, and III and reaches the “state of satisfaction” S4 at the end of the “conversation”. The “states of satisfaction” at time points I, II, and III each branch into three states: S1=satisfaction, S2=average, and S3=dissatisfaction. A state sequence of the “states of satisfaction” which is used for modeling of the “change pattern” is referred to as a “state-of-satisfaction change pattern model structure”. Although the “state-of-satisfaction change pattern model” is obtained for each “change pattern”, it is desirable that the same “state-of-satisfaction change pattern model structure” be used for all the “change patterns”. That is, it is desirable to use the same “state-of-satisfaction change pattern model structure” for all the “change patterns” and obtain the “state-of-satisfaction change pattern model” for each of the “change patterns”. The reason is as follows: if the “state-of-satisfaction change pattern model structure” were changed in accordance with the “change pattern”, the “state-of-satisfaction change pattern model” would reflect the tendency of its “state-of-satisfaction change pattern model structure”, which sometimes makes it impossible to model the “change pattern” properly. However, the same “state-of-satisfaction change pattern model structure” need not be used for all the “change patterns” as long as each “change pattern” can be properly modeled.
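The “state-of-satisfaction change pattern model structure” of FIG. 5 can be pictured as an allowed-transition table: the start state S0 fans out to the three branch states of time point I, each branch state has a self-loop and full connections to the next time point, and time point III feeds the end state S4. The encoding below (state names with stage tags, an adjacency dict) is a sketch under those assumptions, not a structure taken from the patent text.

```python
# Hypothetical encoding of the left-to-right HMM with branches (FIG. 5):
# S0 -> {S1, S2, S3} at time points I, II, III -> S4, where
# S1 = satisfaction, S2 = average, S3 = dissatisfaction.
BRANCH = ["S1", "S2", "S3"]
STAGES = ["I", "II", "III"]

def build_structure():
    """Return {state: [allowed successor states]}."""
    succ = {"S0": [f"{s}@I" for s in BRANCH]}          # start fans out
    for i, stage in enumerate(STAGES):
        nxt_states = ([f"{t}@{STAGES[i + 1]}" for t in BRANCH]
                      if i + 1 < len(STAGES) else ["S4"])
        for s in BRANCH:
            here = f"{s}@{stage}"
            succ[here] = [here] + nxt_states           # self-loop + forward only
    succ["S4"] = []
    return succ

print(build_structure()["S2@II"])  # ['S2@II', 'S1@III', 'S2@III', 'S3@III']
```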
Details of the Embodiment
Hereinafter, the present embodiment will be specifically described with reference to the drawings.
<Configuration>
As illustrated in FIG. 1, a model learning device 11 of the present embodiment includes an utterance-for-learning storage 111a, a state-of-satisfaction correct value storage 111b, a state-of-satisfaction change pattern correct value storage 111c, a state-of-satisfaction change pattern model structure storage 111d, a state-of-satisfaction estimation model storage 111e, a state-of-satisfaction change pattern model storage 111f, a state-of-satisfaction change pattern model learning unit 112, a voice activity detection unit 113, an utterance feature amount extraction unit 114, and a state-of-satisfaction estimation model learning unit 115. As illustrated in FIG. 2, an estimating device 12 of the present embodiment includes an input unit 121, a voice activity detection unit 122, an utterance feature amount extraction unit 123, and a state estimation unit 124. Each of the model learning device 11 and the estimating device 12 of the present embodiment is configured by, for example, a general-purpose or dedicated computer that includes a processor (a hardware processor) such as a central processing unit (CPU) and memory such as random-access memory (RAM) and read-only memory (ROM), and that executes a predetermined program. This computer may include a single processor and memory or a plurality of processors and memories. The program may be installed in the computer or may be recorded in the ROM or the like in advance. Moreover, part or all of the processing units may be configured not with an electronic circuit (circuitry) that, like a CPU, implements a functional configuration by reading a program, but with an electronic circuit that implements the processing functions without using a program. An electronic circuit constituting one device may include a plurality of CPUs.
<Model Learning Processing>
First, model learning processing which is performed by the model learning device 11 (FIG. 1) will be described.
<<Preprocessing>>
As preprocessing, an “utterance for learning” necessary for model learning is stored in the utterance-for-learning storage 111a of the model learning device 11 (FIG. 1), a “state-of-satisfaction change pattern correct value” is stored in the state-of-satisfaction change pattern correct value storage 111c, “state-of-satisfaction correct values” are stored in the state-of-satisfaction correct value storage 111b, and a “state-of-satisfaction change pattern model structure” is stored in the state-of-satisfaction change pattern model structure storage 111d. The “utterance for learning” is time series speech data of “utterances” given by an “utterer” in each of a plurality of “conversations”; it is obtained by recording the contents of the “utterances” of an “utterer” who is holding a “conversation”. The “state-of-satisfaction change pattern correct value” indicates a correct value of the “change pattern” of the state of satisfaction of an “utterer” in each of the “conversations”; it is set manually based on answers given by the “utterer” about the “change pattern” to which the changes in his or her state of satisfaction in the “conversation” apply. The “state of satisfaction” of the present embodiment is any one of the three states “satisfaction”, “average”, and “dissatisfaction”, and the “state-of-satisfaction change pattern correct value” is any one of the above-mentioned nine “change patterns” (1) to (9) (FIG. 3). Each of the “state-of-satisfaction correct values” indicates a correct value of the “state of satisfaction” of an “utterer” at the time of each utterance in these “conversations”; that is, it indicates a correct value of the “state of satisfaction” of the “utterer” at the time point at which the “utterer” gave each utterance. The “state-of-satisfaction correct values” are set manually based on answers given by the “utterer” about his or her “state of satisfaction” at the time points at which the “utterances” were given. The “state-of-satisfaction change pattern model structure” is a state sequence of the “states of satisfaction” which is used for modeling of the “change pattern”; an example is the state sequence illustrated in FIG. 5. In the present embodiment, the same “state-of-satisfaction change pattern model structure” is used for all the “change patterns”, but the present invention is not limited thereto. A label identifying the “conversation” and the “utterance” corresponding to each time point is correlated with the “utterance for learning”, a label identifying the “conversation” corresponding to each “state-of-satisfaction change pattern correct value” is correlated with that correct value, and a label identifying the “utterance” corresponding to each “state-of-satisfaction correct value” is correlated with that correct value. As a result, the “utterance for learning”, the “state-of-satisfaction change pattern correct values”, and the “state-of-satisfaction correct values” are correlated with one another.
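As an illustration of this correlation, the following is a minimal sketch of one way the learning data could be organized so that the three kinds of stored values stay linked through shared labels; the type and field names are assumptions, since the patent specifies only the correlation itself.

```python
# A sketch, under assumed names, of learning data kept correlated through
# shared conversation/utterance labels.
from dataclasses import dataclass, field
from typing import Dict, List

import numpy as np

@dataclass
class ConversationForLearning:
    conversation_id: str                         # label shared by the storages
    utterance_ids: List[str]                     # utterance labels, chronological
    change_pattern_correct_value: int            # one of change patterns (1)-(9)
    satisfaction_correct_values: Dict[str, str]  # utterance_id -> "satisfaction" / "average" / "dissatisfaction"
    speech: Dict[str, np.ndarray] = field(default_factory=dict)  # utterance_id -> waveform
```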
<<Processing which is Performed by the State-of-Satisfaction Change Pattern Model Learning Unit 112>>
To the state-of-satisfaction change pattern model learning unit 112, the “state-of-satisfaction change pattern correct value”, the “state-of-satisfaction correct values”, and the “state-of-satisfaction change pattern model structure”, which are read from the state-of-satisfaction change pattern correct value storage 111c, the state-of-satisfaction correct value storage 111b, and the state-of-satisfaction change pattern model structure storage 111d, respectively, are input. By using them, the state-of-satisfaction change pattern model learning unit 112 obtains, for each of the predetermined “change patterns” of the “state of satisfaction”, a “state-of-satisfaction change pattern model” including the “state-of-satisfaction change pattern model structure” and a set of transition weights of the states of satisfaction, and outputs the “state-of-satisfaction change pattern model”. When K types of change patterns C1, ..., CK are set (where K is the total number of change patterns, K≥2; K=9 in the example of FIG. 3), the state-of-satisfaction change pattern model learning unit 112 obtains and outputs, for each change pattern Ck (where k=1, ..., K), a state-of-satisfaction change pattern model PMk. For the generation of the state-of-satisfaction change pattern model PMk, the “state-of-satisfaction correct values” corresponding to the “utterances” given in a “conversation” whose “state-of-satisfaction change pattern correct value” is the change pattern Ck are used. In other words, the state-of-satisfaction change pattern model learning unit 112 learns the transition weights (for example, transition probabilities) between the states of satisfaction included in the “state-of-satisfaction change pattern model structure” by using, as learning data, the “state-of-satisfaction correct values” corresponding to the “utterances” included in a “conversation” whose “state-of-satisfaction change pattern correct value” is the change pattern Ck, and outputs the state-of-satisfaction change pattern model PMk including the “state-of-satisfaction change pattern model structure” and the set of obtained transition weights. In the case of the “state-of-satisfaction change pattern model structure” illustrated in FIG. 5, the state-of-satisfaction change pattern model learning unit 112 learns, by using such learning data, the transition weights from S0 to S1, S2, and S3 in Stage I; the self-transition weights of S1, S2, and S3 in Stage I; the transition weights from S1, S2, and S3 in Stage I to S1, S2, and S3 in Stage II; the self-transition weights of S1, S2, and S3 in Stage II; the transition weights from S1, S2, and S3 in Stage II to S1, S2, and S3 in Stage III; the self-transition weights of S1, S2, and S3 in Stage III; and the transition weights from S1, S2, and S3 in Stage III to S4, and outputs the state-of-satisfaction change pattern model PMk including the “state-of-satisfaction change pattern model structure” illustrated in FIG. 5 and the set of obtained transition weights. It is to be noted that, when the “state-of-satisfaction change pattern model structure” is known, information which does not include the “state-of-satisfaction change pattern model structure” and includes only the set of obtained transition weights may be used as the “state-of-satisfaction change pattern model”. In FIG. 6, of the transition weights corresponding to the change pattern “(1) Average→satisfaction: A pattern in which average changes to satisfaction” described above, state transitions with heavy transition weights are illustrated by thick arrows, and state transitions with light transition weights are illustrated by thin arrows. Learning of the transition weights can be performed by the same procedure as HMM learning performed when the state sequence is known (see, for example, Reference Literature 2 (Kiyohiro Shikano, Katsunobu Ito, Tatsuya Kawahara, Kazuya Takeda, Mikio Yamamoto, “Speech Recognition System”, Ohmsha, Ltd., pp. 27-29, 2001)). The state-of-satisfaction change pattern model PMk is obtained for each change pattern Ck (where k=1, ..., K). That is, the state-of-satisfaction change pattern model learning unit 112 obtains and outputs, by using the same “state-of-satisfaction change pattern model structure” for all the change patterns C1, ..., CK, the state-of-satisfaction change pattern model PMk for each change pattern Ck. The state-of-satisfaction change pattern model PMk obtained for each change pattern Ck is stored in the state-of-satisfaction change pattern model storage 111f.
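As a concrete illustration of this learning step, here is a minimal sketch; it assumes maximum-likelihood counting of transitions, as in HMM learning with a known state sequence (cf. Reference Literature 2), and it assumes the per-utterance correct values have already been mapped onto state indices of the branched structure, neither of which the patent spells out.

```python
# A sketch of estimating the transition weights of one change pattern C_k from
# known state paths (S0 ... S4) of the conversations whose state-of-satisfaction
# change pattern correct value is C_k.
import numpy as np

def learn_transition_weights(state_paths, n_states=11):
    """state_paths: list of state-index sequences, one per conversation."""
    counts = np.zeros((n_states, n_states))
    for path in state_paths:
        for s, t in zip(path[:-1], path[1:]):
            counts[s, t] += 1                     # count observed transitions
    row_sums = counts.sum(axis=1, keepdims=True)
    # Normalize rows into probabilities; rows that were never visited stay zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
```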
<<Processing which is Performed by the Voice Activity Detection Unit 113>>
To the voice activity detection unit 113, the “utterance for learning” read from the utterance-for-learning storage 111 a is input. The voice activity detection unit 113 detects one or more voice activities by applying voice activity detection to the input “utterance for learning”, and extracts an “utterance” of the “utterer” in the detected voice activities and outputs the “utterance”. For voice activity detection, a well-known voice activity detection technique such as a technique based on threshold processing of power or a technique based on the likelihood ratio of speech/non-speech models can be used.
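As one concrete possibility, the following is a minimal sketch of the power-thresholding variant of voice activity detection mentioned above; the frame length, frame shift, and threshold are assumptions rather than values from the patent.

```python
# A sketch of power-threshold voice activity detection: frames whose power
# exceeds a threshold are marked active, and runs of active frames are merged
# into (start_sample, end_sample) voice activities.
import numpy as np

def detect_voice_activity(waveform, sample_rate, frame_ms=25, shift_ms=10, threshold_db=-35.0):
    frame = int(sample_rate * frame_ms / 1000)
    shift = int(sample_rate * shift_ms / 1000)
    flags = []
    for start in range(0, max(len(waveform) - frame, 0) + 1, shift):
        seg = waveform[start:start + frame]
        power_db = 10 * np.log10(np.mean(seg ** 2) + 1e-12)
        flags.append(power_db > threshold_db)
    activities, run_start = [], None
    for i, active in enumerate(flags + [False]):   # sentinel closes a final run
        if active and run_start is None:
            run_start = i
        elif not active and run_start is not None:
            activities.append((run_start * shift, (i - 1) * shift + frame))
            run_start = None
    return activities
```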
<<Processing which is Performed by the Utterance Feature Amount Extraction Unit 114>>
To the utterance feature amount extraction unit 114, the “utterance (the utterance for learning)” of the “utterer” in each voice activity, which is output from the voice activity detection unit 113, is input. The utterance feature amount extraction unit 114 extracts, for each “utterance” of the “utterer”, an “utterance-for-learning feature amount”, which is a feature amount considered to be related to the “state of satisfaction”. For instance, the utterance feature amount extraction unit 114 extracts, as the “utterance-for-learning feature amount”, a feature amount including at least one or more of a prosodic feature, a dialogic feature, and a language feature of an “utterance”. As the prosodic feature, at least one or more of, for example, the fundamental frequency of an utterance; the mean, standard deviation, maximum value, and minimum value of power; the rate of utterance; and the duration of the final phoneme of the utterance can be used. When the fundamental frequency or the power of an utterance is used as at least part of the feature amount, the utterance feature amount extraction unit 114 may divide the utterance into frames, obtain the fundamental frequency or the power for each frame, and use the per-frame values as at least part of the feature amount. When the rate of utterance or the duration of the final phoneme is used as at least part of the feature amount, the utterance feature amount extraction unit 114 may estimate a phoneme sequence of the utterance by using a well-known speech recognition technology and obtain the rate of utterance or the duration of the final phoneme from it. As the dialogic feature, at least one or more of the following can be used: the time from the previous “utterance” given by the “utterer” (such as a customer) to the present “utterance”; the time from a dialogic utterance given by a dialogist (such as an operator) who is conversing with the “utterer” to an “utterance” given by the “utterer”; the time from an “utterance” given by the “utterer” to the next dialogic utterance given by the dialogist; the length of an “utterance” given by the “utterer”; the lengths of the dialogic utterances given by the dialogist before and after an “utterance” given by the “utterer”; the number of responses made by the “utterer” during the dialogic utterances given by the dialogist before and after that “utterance”; and the number of responses made by the dialogist during an utterance given by the “utterer”. As the language feature, at least one or more of the number of words in an utterance, the number of fillers in the utterance, and the number of words of appreciation in the utterance can be used. When the language feature is used as at least part of the feature amount, the utterance feature amount extraction unit 114 may estimate the words used in an utterance by means of a well-known speech recognition technology and use the result. The number of manually selected words of appreciation (for example, “thank you” or “thanks”) may be used as at least part of the feature amount. Which of these features are used as the “utterance-for-learning feature amount” is determined in advance. The utterance feature amount extraction unit 114 outputs the extracted “utterance-for-learning feature amount”.
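To illustrate, here is a minimal sketch that assembles a small per-utterance vector from some of the prosodic features named above; the use of librosa (and its pyin pitch tracker) is an assumption, and the dialogic and language features are omitted for brevity.

```python
# A sketch of extracting a few prosodic features of one utterance: the mean
# voiced fundamental frequency and the mean, standard deviation, maximum, and
# minimum of the frame-level power.
import numpy as np
import librosa

def prosodic_features(waveform, sample_rate):
    f0, _, _ = librosa.pyin(waveform, fmin=50, fmax=500, sr=sample_rate)
    f0 = f0[~np.isnan(f0)]                          # keep voiced frames only
    power_db = 20 * np.log10(librosa.feature.rms(y=waveform)[0] + 1e-12)
    return np.array([
        f0.mean() if f0.size else 0.0,              # fundamental frequency
        power_db.mean(),                            # mean power
        power_db.std(),                             # standard deviation of power
        power_db.max(),                             # maximum power
        power_db.min(),                             # minimum power
    ])
```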
<<Processing which is Performed by the State-of-Satisfaction Estimation Model Learning Unit 115>>
To the state-of-satisfaction estimation model learning unit 115, the “utterance-for-learning feature amount” output from the utterance feature amount extraction unit 114 and the correct value of the “state of satisfaction” read from the state-of-satisfaction correct value storage 111b are input. It is to be noted that the correct value of the “state of satisfaction” input to the state-of-satisfaction estimation model learning unit 115 is the correct value of the “state of satisfaction” of the “utterer” who gave the “utterance” corresponding to the input “utterance-for-learning feature amount”. That is, the “utterance-for-learning feature amount” and the correct value of the “state of satisfaction” of the “utterer” at the time of each “utterance” corresponding to that feature amount are input to the state-of-satisfaction estimation model learning unit 115. The state-of-satisfaction estimation model learning unit 115 performs learning processing by using, for each “utterance (utterance for learning)”, the pair of the input “utterance-for-learning feature amount” and the correct value of the “state of satisfaction” of the “utterer”; generates a “state-of-satisfaction estimation model” for obtaining the posteriori probability of the “utterance feature amount” (the utterance feature amount of each utterance of the utterer) given the “state of satisfaction” of the utterer (the state of satisfaction when the utterer gave each utterance); and outputs the “state-of-satisfaction estimation model”. For example, a neural network or the like can be used as the “state-of-satisfaction estimation model”, and error backpropagation, an existing neural network learning technique, can be used for its model learning. Models other than a neural network may be used as long as the posteriori probability of the “utterance feature amount” given the “state of satisfaction” of an “utterer” can be obtained; a normal mixture distribution model, for instance, may be used. If the “state of satisfaction” of an “utterer” at the time of the n-th “utterance” by the “utterer” in a “conversation” is denoted by S(n) and the “utterance feature amount” of the n-th “utterance” is denoted by X(n), the posteriori probability of the utterance feature amount X(n) given the state of satisfaction S(n) of the utterer can be expressed as P(X(n)|S(n)). It is assumed that the conditional distribution P(X(n)|S(n)) itself does not depend on n. The state-of-satisfaction estimation model learning unit 115 outputs the generated “state-of-satisfaction estimation model”, and the “state-of-satisfaction estimation model” is stored in the state-of-satisfaction estimation model storage 111e.
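As one concrete possibility, here is a minimal sketch of such a model trained by error backpropagation; the PyTorch framework, layer sizes, and training loop are assumptions, since the patent specifies only “a neural network or the like”. The sketch trains a three-state classifier; where the likelihood P(X(n)|S(n)) is needed, the softmax output P(S(n)|X(n)) can be divided by the class prior, which leaves the arg max of Formula (1) below unchanged when the prior is treated as constant.

```python
# A sketch of a state-of-satisfaction estimation model as a small feed-forward
# classifier over the three states, trained by backpropagation.
import torch
import torch.nn as nn

STATES = ["satisfaction", "average", "dissatisfaction"]

class SatisfactionEstimationModel(nn.Module):
    def __init__(self, feature_dim, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, len(STATES)),
        )

    def forward(self, x):          # x: (batch, feature_dim)
        return self.net(x)         # unnormalized scores over the three states

def train(model, features, labels, epochs=100, lr=1e-3):
    """features: (N, D) float tensor; labels: (N,) long tensor of state indices."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()                 # classification error to backpropagate
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(features), labels)
        loss.backward()                             # error backpropagation
        opt.step()
    return model
```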
<Estimation Processing>
Next, estimation processing which is performed by the estimating device 12 (FIG. 2) will be described.
<<Input to the Input Unit 121>>
An “input utterance”, which is an utterance based on which the state of satisfaction is to be estimated, is input to the input unit 121 of the estimating device 12. The “input utterance” is time series data of the utterances given by an “utterer” in a “conversation”. The “input utterance” is output to the voice activity detection unit 122.
<<Processing which is Performed by the Voice Activity Detection Unit 122>>
To the voice activity detection unit 122, the “input utterance” output from the input unit 121 is input. The voice activity detection unit 122 detects one or more voice activities by applying voice activity detection to the input “input utterance”, extracts an “input utterance” of the “utterer” in the detected voice activities, and outputs the “input utterance”. For voice activity detection, a well-known voice activity detection technique such as a technique based on threshold processing of power or a technique based on the likelihood ratio of speech/non-speech models can be used.
<<Processing which is Performed by the Utterance Feature Amount Extraction Unit 123>>
To the utterance feature amount extraction unit 123, the “input utterance” of the “utterer” in the voice activity, which is output from the voice activity detection unit 122, is input. The utterance feature amount extraction unit 123 extracts, for each “input utterance” of the “utterer”, the “input utterance feature amount” which is the feature amount considered to be related to the “state of satisfaction”. The type of the feature amount which is extracted by the utterance feature amount extraction unit 123 is the same as the type of the feature amount which is extracted by the above-mentioned utterance feature amount extraction unit 114. The utterance feature amount extraction unit 123 outputs the extracted “input utterance feature amount”.
<<Processing which is Performed by the State Estimation Unit 124>>
To the state estimation unit 124, the “input utterance feature amount” output from the utterance feature amount extraction unit 123, the “state-of-satisfaction estimation model” read from the state-of-satisfaction estimation model storage 111e of the model learning device 11 (FIG. 1), and the “state-of-satisfaction change pattern model” read from the state-of-satisfaction change pattern model storage 111f are input. The state estimation unit 124 obtains an estimated value of the state of satisfaction of the “utterer” who gave the “utterance” corresponding to the “input utterance feature amount” by using the “input utterance feature amount”, the “state-of-satisfaction estimation model”, and the “state-of-satisfaction change pattern model”, and outputs the estimated value. Based on the following formula, the state estimation unit 124 of the present embodiment obtains an estimated value of the state of satisfaction of the “utterer” at the time of the “utterance”.
S^(n) = arg max_{S(n)} P(S(n) | X(n)) P(S(n) | S^(n-1), ..., S^(1), Ck)   (1)
It is to be noted that S^(n) represents an estimated value of the “state of satisfaction” of the “utterer” at the time of the n-th (n-th in chronological order; n is an integer greater than or equal to 2) “utterance” in the “conversation”, S(n) represents the “state of satisfaction” of the “utterer” at the time of the n-th “utterance” in the “conversation”, X(n) represents the “input utterance feature amount” of the n-th “utterance” in the “conversation”, and Ck (where k=1, ..., K) represents the k-th of the above-mentioned K (for example, nine) change patterns. The symbol “^” in “S^(n)” should properly be written immediately above “S”, but, owing to a restriction of text notation, it is written at the upper right of “S”. Moreover, the initial value S^(1) of S^(n) may be a constant, or an S^(n) estimated over a previous conversation, from its start to its end, may be used as the initial value S^(1) for the current conversation. Furthermore, P(α) represents the probability of an event α, and
arg max_{S(n)} P(α)
means the S(n) that maximizes P(α). Moreover, Formula (1) is derived as follows.
S^(n) = arg max_{S(n)} P(X(n), S(n) | S^(n-1), ..., S^(1), Ck)
      = arg max_{S(n)} P(X(n) | S(n), S^(n-1), ..., S^(1), Ck) × P(S(n) | S^(n-1), ..., S^(1), Ck)
      = arg max_{S(n)} P(X(n) | S(n)) P(S(n) | S^(n-1), ..., S^(1), Ck)
      = arg max_{S(n)} P(S(n) | X(n)) P(S(n) | S^(n-1), ..., S^(1), Ck)

(The third equality holds because the utterance feature amount X(n) is assumed to depend only on the current state S(n); the last equality follows from Bayes' rule, since P(X(n)) does not depend on S(n) and the prior of S(n) is treated as constant under the arg max.)
More specifically, the state estimation unit 124 obtains P(X(n)|S(n)) by applying the input utterance feature amount X(n) to the “state-of-satisfaction estimation model” and further obtains, for each change pattern Ck (where k=1, ..., K), P(S(n)|S^(n-1), ..., S^(1), Ck) by using the “state-of-satisfaction change pattern model” and S^(n-1), ..., S^(1). The state estimation unit 124 then obtains, for each change pattern Ck (where k=1, ..., K; for example, K=9), the S(n) that maximizes the product P(X(n)|S(n))P(S(n)|S^(n-1), ..., S^(1), Ck) as S^(n). Furthermore, the state estimation unit 124 selects the change pattern Ck with the greatest P(X(n)|S(n))P(S(n)|S^(n-1), ..., S^(1), Ck) corresponding to the S^(n)=S(n) obtained for each change pattern Ck, and outputs the S^(n) corresponding to the selected change pattern Ck as the estimated value of the state of satisfaction of the “utterer” at the time of the n-th “utterance” in the “conversation”. If these quantities are calculated in a brute-force manner, the amount of calculation increases significantly; therefore, as when HMMs are used in speech recognition, only a maximum likelihood sequence may be used for the calculation by means of the Viterbi algorithm. The obtained S^(n) is used recursively in the calculation of the next, (n+1)-th estimate S^(n+1).
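Putting the pieces together, the following is a minimal sketch of this decision rule; emission_score and transition_prob are hypothetical callables standing in for the two learned models, and the brute-force double loop is the naive version of the search that, as noted above, the Viterbi algorithm can accelerate.

```python
# A sketch of Formula (1): for every change pattern C_k and every candidate
# state S(n), combine the estimation-model term with the change-pattern-model
# term and keep the best-scoring pair.
import numpy as np

STATES = ["satisfaction", "average", "dissatisfaction"]

def estimate_state(x_n, history, num_patterns, emission_score, transition_prob):
    """x_n: input utterance feature amount X(n).
    history: previously estimated states [S^(1), ..., S^(n-1)].
    emission_score(s, x): the estimation-model term for S(n)=s given X(n)=x.
    transition_prob(s, history, k): P(S(n)=s | S^(n-1), ..., S^(1), C_k)."""
    best_score, best_state, best_pattern = -np.inf, None, None
    for k in range(num_patterns):                  # change patterns C_1 ... C_K
        for s in STATES:                           # candidate values of S(n)
            score = emission_score(s, x_n) * transition_prob(s, history, k)
            if score > best_score:
                best_score, best_state, best_pattern = score, s, k
    return best_state, best_pattern                # S^(n) and the selected C_k
```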
Features of the Present Embodiment
It can be considered that the states of satisfaction of an “utterer” in a “conversation” have a time series correlation. For example, it is extremely unlikely that the state of satisfaction of an “utterer” whose state of satisfaction at a given time in a “conversation” is “satisfaction” changes to “dissatisfaction” at the next time. Moreover, since an “utterer” whose state of satisfaction transitions from “dissatisfaction” to “average” and then to “satisfaction” has a feeling of satisfaction strong enough to have turned “dissatisfaction” into “satisfaction”, it can be expected that “satisfaction” will continue for some time. As described above, the state of satisfaction of an “utterer” has a strong correlation with the states of satisfaction up to a given “utterance”. In the present embodiment, a “state-of-satisfaction change pattern model” and a “state-of-satisfaction estimation model” are learned and, by using them together with the “input utterance feature amount”, an estimated value of the state of satisfaction of the utterer who gave the utterance corresponding to the “input utterance feature amount” is obtained. By doing so, it is possible to estimate the state of satisfaction with consideration given to changes in the state of satisfaction of the “utterer”.
[Other Modifications Etc.]
It is to be noted that the present invention is not limited to the above-described embodiment. For instance, the model learning device 11 and the estimating device 12 may be one and the same device, the model learning device 11 may be configured with a plurality of devices, or the estimating device 12 may be configured with a plurality of devices.
In the above-described embodiment, the state estimation unit 124 selects the change pattern Ck with the greatest P(X(n)|S(n))P(S(n)|S^(n-1), ..., S^(1), Ck) corresponding to the S^(n)=S(n) obtained for each change pattern Ck (where k=1, ..., K; for example, K=9) and outputs the S^(n) corresponding to the selected change pattern Ck as the estimated value of the state of satisfaction of the “utterer” at the time of the n-th “utterance” in the “conversation”. Alternatively, a plurality of change patterns Ck may be selected in descending order of P(X(n)|S(n))P(S(n)|S^(n-1), ..., S^(1), Ck) corresponding to the S^(n)=S(n) obtained for each change pattern Ck, and the values S^(n) corresponding to the selected change patterns Ck may be used as estimated values of the state of satisfaction of the “utterer” at the time of the n-th “utterance” in the “conversation”. Moreover, the state estimation unit 124 may output, as estimated values of the state of satisfaction of the “utterer”, the S^(n)=S(n) obtained for each change pattern Ck together with the magnitude of the corresponding P(X(n)|S(n))P(S(n)|S^(n-1), ..., S^(1), Ck).
The various kinds of processing described above may be executed not only in chronological order in accordance with the descriptions but also in parallel or individually, depending on the processing power of the device that executes the processing or as needed. In addition, it goes without saying that changes may be made as appropriate without departing from the spirit of the present invention.
When the above-described configurations are implemented by a computer, the processing details of the functions supposed to be provided in each device are described by a program. As a result of this program being executed by the computer, the above-described processing functions are implemented on the computer. The program describing the processing details can be recorded on a computer-readable recording medium. An example of the computer-readable recording medium is a non-transitory recording medium. Examples of such a recording medium include a magnetic recording device, an optical disk, a magneto-optical recording medium, and semiconductor memory.
The distribution of this program is performed by, for example, selling, transferring, or lending a portable recording medium such as a DVD or a CD-ROM on which the program is recorded. Furthermore, a configuration may be adopted in which this program is distributed by storing the program in a storage device of a server computer and transferring the program to other computers from the server computer via a network.
The computer that executes such a program first, for example, temporarily stores the program recorded on the portable recording medium or the program transferred from the server computer in a storage device thereof. At the time of execution of processing, the computer reads the program stored in the storage device thereof and executes the processing in accordance with the read program. As another mode of execution of this program, the computer may read the program directly from the portable recording medium and execute the processing in accordance with the program and, furthermore, every time the program is transferred to the computer from the server computer, the computer may sequentially execute the processing in accordance with the received program. A configuration may be adopted in which the transfer of a program to the computer from the server computer is not performed and the above-described processing is executed by so-called application service provider (ASP)-type service by which the processing functions are implemented only by an instruction for execution thereof and result acquisition.
In the above-described embodiments, processing functions of the present device are implemented as a result of a predetermined program being executed on the computer, but at least part of these processing functions may be implemented by hardware.
DESCRIPTION OF REFERENCE NUMERALS
    • 11 model learning device
    • 12 estimating device

Claims (9)

What is claimed is:
1. A model learning device comprising processing circuitry configured to:
obtain, for a plurality of types of predetermined change patterns of a plurality of states of satisfaction, a plurality of state-of-satisfaction change pattern models each including a set of transition weights in a plurality of state sequences of the plurality of states of satisfaction of each of the predetermined change patterns, by using state-of-satisfaction change pattern correct values indicating correct values of change patterns of states of satisfaction of an utterer in a conversation and state-of-satisfaction correct values, each indicating a correct value of the state of satisfaction of the utterer at the time of each utterance in the conversation, and output the state-of-satisfaction change pattern models;
obtain, by using an utterance-for-learning feature amount and a correct value of a state of satisfaction of an utterer who gave an utterance for learning corresponding to the utterance-for-learning feature amount, a state-of-satisfaction estimation model for obtaining a posteriori probability of an utterance feature amount given a state of satisfaction of an utterer, and output the state-of-satisfaction estimation model; wherein
the state-of-satisfaction change pattern models and the state-of-satisfaction estimation model are input to an estimating device,
the estimating device
receives a plurality of utterances given by a particular utterer in a conversation,
detects one or more voice activities in each of the plurality of utterances received, and
by using an input utterance feature amount extracted, based on the detected one or more voice activities, and the state-of-satisfaction change pattern models and the state-of-satisfaction estimation model, obtains an estimated value of a state of satisfaction of the particular utterer who provided the plurality of utterances to the estimating device and outputs the estimated value.
2. The model learning device according to claim 1, wherein
the states of satisfaction include any one of the states: satisfaction, average, and dissatisfaction, and
the change patterns include any one of
(1) a pattern in which the state of satisfaction changes from average to satisfaction,
(2) a pattern in which the state of satisfaction changes from average to dissatisfaction and then changes to satisfaction,
(3) a pattern in which the state of satisfaction changes from dissatisfaction to satisfaction,
(4) a pattern in which average continues,
(5) a pattern in which the state of satisfaction changes from average to dissatisfaction and then changes to average,
(6) a pattern in which dissatisfaction continues,
(7) a pattern in which the state of satisfaction changes from average to dissatisfaction,
(8) a pattern in which the state of satisfaction changes from dissatisfaction to average, and
(9) a pattern in which satisfaction continues.
3. The model learning device according to claim 1, wherein
a state-of-satisfaction change pattern model structure is the state sequence of the states of satisfaction, and
the processing circuitry obtains, for the plurality types of the change patterns, the plurality of the state-of-satisfaction change pattern models by using the same state-of-satisfaction change pattern model structure for all the change patterns and outputs the state-of-satisfaction change pattern models.
4. An estimating device comprising processing circuitry configured to:
receive the state-of-satisfaction change pattern models and the state-of-satisfaction estimation model of any one of claims 1 to 3;
receive a plurality of utterances given by a particular utterer in a conversation;
detect one or more voice activities in each of the plurality of utterances; and
by using an input utterance feature amount, based on the detected one or more voice activities, and the state-of-satisfaction change pattern models and the state-of-satisfaction estimation model, obtain an estimated value of a state of satisfaction of an utterer who provided the plurality of utterances to the estimating device and output the estimated value.
5. A model learning method of a model learning device, the model learning method, executed by processing circuitry, comprising:
obtaining, for a plurality of types of predetermined change patterns of a plurality of states of satisfaction, a plurality of state-of-satisfaction change pattern models each including a set of transition weights in a plurality of state sequences of the plurality of states of satisfaction of each of the predetermined change patterns, by using state-of-satisfaction change pattern correct values indicating correct values of change patterns of states of satisfaction of an utterer in a conversation and state-of-satisfaction correct values, each indicating a correct value of the state of satisfaction of the utterer at the time of each utterance in the conversation, and outputting the state-of-satisfaction change pattern models; and
obtaining, by using an utterance-for-learning feature amount and a correct value of a state of satisfaction of an utterer who gave an utterance for learning corresponding to the utterance-for-learning feature amount, a state-of-satisfaction estimation model for obtaining a posteriori probability of an utterance feature amount given a state of satisfaction of an utterer, and outputting the state-of-satisfaction estimation model, wherein
the state-of-satisfaction change pattern models and the state-of-satisfaction estimation model are input to an estimating device, the estimating device receives a plurality of utterances given by a particular utterer in a conversation,
detects one or more voice activities in each of the plurality of utterances, and
by using an input utterance feature amount extracted, based on the detected one or more voice activities, and the state-of-satisfaction change pattern models and the state-of-satisfaction estimation model, obtains an estimated value of a state of satisfaction of the particular utterer who provided the plurality of utterances to the estimating device and outputs the estimated value.
6. The model learning method according to claim 5, wherein
a state-of-satisfaction change pattern model structure is the state sequence of the states of satisfaction, and
the state-of-satisfaction change pattern model learning step obtains, for the plurality types of the change patterns, the plurality of the state-of-satisfaction change pattern models by using the same state-of-satisfaction change pattern model structure for all the change patterns and outputs the state-of-satisfaction change pattern models.
7. An estimating method of an estimating device, the estimating method, executed by processing circuitry, comprising:
receiving the state-of-satisfaction change pattern models and the state-of-satisfaction estimation model of claim 5 or 6;
receiving a plurality of utterances given by a particular utterer in a conversation;
detecting one or more voice activities in each of the plurality of utterances; and
by using an input utterance feature amount, based on the detected one or more voice activities, and the state-of-satisfaction change pattern models and the state-of-satisfaction estimation model, obtaining an estimated value of a state of satisfaction of an utterer who provided the plurality of utterances to the estimating device and outputting the estimated value.
8. A non-transitory computer-readable recording medium storing a program for causing a computer to execute the model learning method according to claim 5 or 6.
9. A non-transitory computer-readable recording medium storing a program for causing a computer to execute the estimating method according to claim 7.
Non-Patent Citations

Engelbrecht, Klaus-Peter, et al., "Modeling user satisfaction with hidden Markov models," Proceedings of the SIGDIAL 2009 Conference, 2009.
International Search Report dated Apr. 17, 2018 in PCT/JP2018/003644, filed Feb. 2, 2018.
Nomoto, N. et al., "Using nonverbal information and characteristic linguistic representations to detect anger emotion in dialog speech," The Transactions of the Institute of Electronics, Information and Communication Engineers, Vol. J96-D, No. 1, 2013, pp. 15-24 (with partial English translation).
Park, Y. et al., "Towards Real-Time Measurement of Customer Satisfaction Using Automatically Generated Call Transcripts," Proceedings of the 18th ACM Conference on Information and Knowledge Management, RC24754 (W0902-116), Feb. 27, 2009, pp. 1387-1396.
Shikano, K. et al., "Speech Recognition System," Ohmsha, Ltd., 2001, pp. 27-29 (with partial English translation).
Tokuda, K., "State-of-the-art Technology of Speech Information Processing: Speech Recognition and Speech Synthesis based on Hidden Markov Models," IPSJ Magazine, Vol. 45, No. 10, 2004, pp. 1005-1011 (with partial English translation).
