US20210027772A1 - Unsupervised automated extraction of conversation structure from recorded conversations - Google Patents
- Publication number
- US20210027772A1 (U.S. application Ser. No. 16/520,374)
- Authority
- US
- United States
- Prior art keywords
- conversation
- conversations
- processor
- structure model
- given
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/19—Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
- G10L15/197—Probabilistic grammars, e.g. word n-grams
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
- G06F40/35—Discourse or dialogue representation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L21/10—Transforming into visible information
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
- G06N5/041—Abduction
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1815—Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/56—Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
Definitions
- the present invention relates generally to natural language processing, and particularly to techniques for analyzing the content of conversations.
- U.S. Pat. No. 8,214,242 describes signaling correspondence between a meeting agenda and a meeting discussion that includes: receiving a meeting agenda specifying one or more topics for a meeting; analyzing, for each topic, one or more documents to identify topic keywords for that topic; receiving meeting discussions among participants for the meeting; identifying a current topic for the meeting in dependence upon the meeting agenda; determining a correspondence indicator in dependence upon the meeting discussions and the topic keywords for the current topic, the correspondence indicator specifying the correspondence between the meeting agenda and the meeting discussion; and rendering the correspondence indicator to the participants of the meeting.
- PCT application WO/2019/016119 describes a method and a system for automatically discovering topics within temporally ordered text document collections, the method comprising the steps of: generating a bag-of-words vector for each text document collection using a predefined dictionary; iteratively calculating, on the basis of the generated bag-of-words vectors, a hidden topic vector for each text document collection, representing the topics of the respective text document collection using a calculated hidden state vector; and memorizing a hidden state of all previous text document collections.
- An embodiment of the present invention provides a method for information processing, including computing, over a corpus of conversations, a conversation structure model including: (i) a sequence of conversation parts having a defined order, and (ii) a probabilistic model defining each of the conversation parts. For a given conversation, a segmentation of the conversation is computed based on the computed conversation structure model. Action is taken on the given conversation according to the segmentation.
- computing the probabilistic model includes assigning a probability to an occurrence of each word.
- assigning the probability includes running a Gibbs sampling process.
- assigning the probability includes using a prior probability distribution for one or more of the conversation parts.
- computing the conversation structure model includes pre-specifying a fixed number of the conversation parts.
- computing the conversation structure model includes selecting a subset of the conversations based on one or more business rules.
- computing the segmentation of the conversation includes finding the segmentation that best matches the conversation structure model.
- the method further includes computing a coherence score, which quantifies an extent of fit between the given conversation and the conversation structure model.
- the method further includes, when the coherence score is below a given value, regarding the given conversation as not matching the conversation structure model.
- estimating the coherence score includes analyzing a likelihood of the segmentation of the conversation under the conversation structure model. In other embodiments, the method further includes deciding, based on one or more coherence scores computed between one or more respective conversations in the corpus and the conversation structure model, that the conversation structure model does not capture a valid conversation structure.
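In outline, the claimed method is a three-step pipeline: learn a structure model over a corpus, segment a given conversation with it, and act on the result. The following minimal Python sketch illustrates that flow only; the equal-split initialization, the greedy ordered assignment, and all function names are illustrative assumptions, not the patent's algorithm:

```python
from collections import Counter

K = 3  # pre-specified fixed number of conversation parts

def fit_structure_model(corpus, k=K):
    """Learn a per-part word-count profile by splitting each conversation's
    paragraphs into k equal contiguous blocks (a crude stand-in for the
    Gibbs-sampled posterior described later in this document)."""
    part_counts = [Counter() for _ in range(k)]
    for conv in corpus:                      # conv: list of paragraphs (word lists)
        for i, para in enumerate(conv):
            part = min(i * k // len(conv), k - 1)
            part_counts[part].update(para)
    return part_counts

def segment(conv, model):
    """Assign each paragraph to the part whose word counts overlap it most,
    enforcing the non-decreasing (ordered, non-recurring) part constraint."""
    result, prev = [], 0
    for para in conv:
        scores = [sum(model[p][w] for w in para) for p in range(len(model))]
        best = max(range(prev, len(model)), key=lambda p: scores[p])
        result.append(best)
        prev = best
    return result

corpus = [
    [["hello", "hi"], ["price", "discount"], ["bye", "thanks"]],
    [["hello", "morning"], ["price", "quote"], ["thanks", "bye"]],
]
model = fit_structure_model(corpus)
print(segment([["hello"], ["price"], ["bye"]], model))  # → [0, 1, 2]
```

Acting on the segmentation (the third step) would then be, for example, rendering the labels on a timeline display, as described for FIG. 5 below.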
- the conversations are transcribed from human conversations.
- the conversations are recorded conversations, conducted over a telephone, a conference system, or in a meeting.
- acting on the given conversation includes presenting a timeline that graphically illustrates the respective order and durations of the conversation parts during the given conversation.
- acting on the given conversation includes displaying conversation part duration to computer users.
- the method further includes searching for words within a conversation or within the corpus based on a conversation part to which the words are assigned. In other embodiments, the method further includes correlating the conversation parts of a given participant with participant metadata to identify conversation differences between participants.
- a system for information processing, including an interface and a processor.
- the interface is configured for accessing a corpus of recorded conversations.
- the processor is configured to: (a) compute, over a corpus of conversations, a conversation structure model, including (i) a sequence of conversation parts having a defined order, and (ii) a probabilistic model defining each of the conversation parts, (b) compute, for a given conversation, a segmentation of the conversation based on the computed conversation structure model, and (c) act on the given conversation according to the segmentation.
- a computer software product including a tangible non-transitory computer-readable medium in which program instructions are stored, which instructions, when read by a processor, cause the processor to: (a) compute, over a corpus of conversations, a conversation structure model including (i) a sequence of conversation parts having a defined order, and (ii) a probabilistic model defining each of the conversation parts, (b) compute, for a given conversation, a segmentation of the conversation based on the computed conversation structure model, and (c) act on the given conversation according to the segmentation.
- FIG. 1 is a schematic pictorial illustration of a teleconferencing system, in accordance with an embodiment of the invention.
- FIG. 2 is a plate diagram that schematically describes a Bayesian model for unsupervised extraction of a conversation structure model from a corpus of conversations, in accordance with an embodiment of the invention.
- FIG. 3 is a flow chart that schematically illustrates a method for unsupervised extraction of a conversation structure from a corpus of conversations, in accordance with an embodiment of the invention.
- FIG. 4 is a flow chart that schematically illustrates a method for analyzing and assigning a coherence score to a conversation, in accordance with an embodiment of the invention.
- FIG. 5 is a schematic representation of a computer screen, showing a graphical analysis of a recorded conversation, in accordance with an embodiment of the invention.
- Embodiments of the present invention that are described hereinafter provide methods and systems that are capable of autonomously analyzing an input corpus of recorded conversations between two or more speakers, such as telephone conversations, and identifying a common conversation structure across the conversations.
- the present embodiments are particularly useful in analyzing recorded teleconferences, with two or more speakers participating in each conversation.
- the principles of the present invention may similarly be applied to substantially any large corpus of text documents or recorded conversations. Any and all such items are regarded herein as “conversations.”
- conversation structure means a list of conversation intervals, typically comprising sentences uttered by a single speaker continuously, which have respective common characteristics and which appear in a specific order.
- the intervals are referred to herein as conversation parts.
- a conversation structure model may consist of an ordered set of conversation parts titled introduction, follow-up and summary.
- each conversation part typically consists of a single semantic topic, for the purpose of determining a structure model there is no need to specify the topic of each conversation part explicitly.
- the disclosed methods and systems can further estimate the extent to which that conversation matches (e.g., complies with) the common structure model that the system has identified. For example, the system may assess to what degree the order of topics was maintained, such as introduction, follow-up and summary. Assessing compliance with a common conversation model is useful in various real-life applications, such as training or evaluating salespersons and customer support representatives.
- the disclosed techniques also apply to identifying multiple common structures across a corpus of conversation and estimating the extent to which a given conversation matches (e.g., complies with) one or more of the common structures.
- a computerized conversation processing system comprises an interface for accessing a corpus of recorded conversations, and a processor.
- the processor computes, over the conversations in the corpus, a conversation structure model comprising (i) a sequence of parts having a defined order, and (ii) a probabilistic model defining each of the conversation parts. Then, for a given conversation, the processor computes a segmentation of the conversation based on the computed structure model. Subsequently, the processor acts on the given conversation according to the segmentation.
- the processor computes the conversation structure model by adapting a predefined set of a-priori occurrence probabilities to reflect actual occurrence probabilities computed over the corpus.
- the prior probability distribution is over a pre-specified fixed number of ordered and non-recurring conversation parts and over the word occurrences in each part.
- the processor computes a posterior probability distribution by: (a) dividing each recorded conversation into an ordered set of paragraphs, (b) computing, by the processor, respective frequencies of occurrence of multiple words in each of a plurality of paragraphs in each of the recorded conversations, (c) based on the frequency of occurrence of the words over the conversations in the corpus and the prior probabilities of the words and conversation parts, running a Gibbs sampling process, and (d) outputting the parameters of the posterior probability distribution obtained by the Gibbs sampler.
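Steps (a)-(d) above can be sketched as a toy Gibbs-style sampler. In this compressed illustration a segmentation into K=3 ordered parts is encoded by two boundary positions, and each sweep resamples one conversation's boundaries given smoothed word counts accumulated from the other conversations; K, the smoothing constant, and the boundary encoding are assumptions made for the sketch, not details taken from the patent:

```python
import math
import random
from collections import Counter
from itertools import combinations_with_replacement

K, SMOOTH = 3, 0.1
random.seed(7)

def labels(bounds, n):
    # (b1, b2) with 0 <= b1 <= b2 <= n: part 0 covers paragraphs [0, b1),
    # part 1 covers [b1, b2), part 2 covers [b2, n).
    b1, b2 = bounds
    return [0] * b1 + [1] * (b2 - b1) + [2] * (n - b2)

def log_lik(conv, labs, counts, totals, vocab_size):
    # Smoothed multinomial log-likelihood of the words under the labeling.
    ll = 0.0
    for para, z in zip(conv, labs):
        for w in para:
            ll += math.log((counts[z][w] + SMOOTH) / (totals[z] + SMOOTH * vocab_size))
    return ll

def gibbs_sweep(corpus, state, vocab_size):
    for d, conv in enumerate(corpus):
        n = len(conv)
        counts = [Counter() for _ in range(K)]   # word counts excluding conversation d
        for d2, conv2 in enumerate(corpus):
            if d2 != d:
                for para, z in zip(conv2, labels(state[d2], len(conv2))):
                    counts[z].update(para)
        totals = [sum(c.values()) for c in counts]
        cands = list(combinations_with_replacement(range(n + 1), 2))
        weights = [math.exp(log_lik(conv, labels(b, n), counts, totals, vocab_size))
                   for b in cands]
        state[d] = random.choices(cands, weights=weights)[0]
    return state

corpus = [[["hello"], ["price"], ["bye"]],
          [["hello"], ["price"], ["bye"]]]
state = [(1, 2), (1, 2)]                          # initial segmentations
for _ in range(5):
    state = gibbs_sweep(corpus, state, vocab_size=3)
print(state)
```

A production sampler would additionally carry the duration prior and run many more sweeps before reading off the posterior parameters.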
- Given the structure model, the processor then computes the segmentation of the conversation by finding the segmentation that has the best match to the computed model.
- the processor computes a coherence score between a given conversation and the structure model, which quantifies an extent of fit between the given conversation and the conversation structure model.
- the processor is further configured to, when the coherence score is below a given value, regard the given conversation as not matching the conversation structure model.
- the processor estimates the coherence score by analyzing likelihood of the segmentation of the conversation under the conversation structure model.
- a conversation with too low a coherence score, determined based on the disclosed technique, may be flagged or dropped from the display of analyzed agendas.
- Based on the coherence score for the conversation structure, a user of the system is able to understand how closely the conversation followed the most common (i.e., learned) structure.
- FIG. 1 is a schematic pictorial illustration of a teleconferencing system 20 , in accordance with an embodiment of the invention.
- a server 22 receives and records conversations via a network 24 .
- Server 22 may receive audio input from the conversations online in real time, or it may receive recordings made and stored by other means, such as by processors 26 , or even textual transcripts of conversations, created by speech-to-text programs running on other processors.
- server 22 may collect recordings of Web conferences using the methods described in U.S. Pat. No. 9,699,409, whose disclosure is incorporated herein by reference.
- server 22 collects and analyzes conversations made by people working in a given field, for example, help desk personnel or sales agents working for a given company.
- sales agents 30 using processors 26 , communicate with customers 32 who use audio devices 28 .
- These conversations may be carried out over substantially any sort of network, including both telephone and packet networks.
- server 22 may similarly apply the techniques described herein in analyzing conversations between three or more participants.
- a processor correlates the conversation parts of a given participant with participant metadata to identify conversation differences between participants.
- Server 22 comprises a computerized conversation processing system including a processor 36 , which may be a general-purpose computer, connected to network 24 by a network interface 34 .
- Server 22 receives and stores the corpus of recorded conversations in memory 38 for processing by processor 36 .
- Processor 36 autonomously derives an optimal conversation structure of K parts and, at the conclusion of this process, is able to present the conversation structure over the entire duration of the recorded conversations on a display 40 .
- Given a new conversation, processor 36 can extract the new conversation structure and, based on the previously learned conversation structure model, assign to the newly extracted structure a coherence score reflecting how well it fits this model. Processor 36 may then present the coherence score on display 40 .
- Processor 36 typically carries out the functions that are described herein under the control of program instructions in software.
- This software may be downloaded to server 22 in electronic form, for example over a network. Additionally or alternatively, the software may be provided and/or stored on tangible, non-transitory computer-readable media, such as optical, magnetic, or electronic memory media.
- processor 36 runs a dedicated algorithm as disclosed herein, including in FIG. 2 , that enables processor 36 to perform the disclosed steps, as further described below.
- a processor of the computerized conversation processing system first sets a conversation structure model comprising an ordered sequence of a pre-specified number K of conversation parts.
- each of the K implicit parts appears only once in the conversation (i.e., is non-recurring).
- the conversation parts are identified by a unique qualifier (e.g., a running index). Specifically, the conversation parts must appear in the same order (although not all K parts must appear in all conversations).
- the total number K of the conversation parts can be defined in advance to be any suitable target number, for example a chosen number between five and ten.
- Each part of the structure model is given as a prior distribution of its word occurrences and a prior distribution of its duration in a conversation.
- the system autonomously processes the contents of an input corpus of recorded conversations that are assumed to have a common content-structure made up of these K parts.
- the disclosed system converts the recorded conversations into text using any suitable methods and tools.
- the system breaks the entire conversation into an ordered collection of textual units, referred to hereinafter as paragraphs, each made of sentences uttered by the same speaker continuously.
- the processor then computes respective frequencies of occurrence of multiple words in each of a plurality of paragraphs in each of the recorded conversations.
- the system uses the prior probability of the conversation structure model, the frequency of word occurrences in the input corpus and a suitable machine learning algorithm, such as a Gibbs sampling process, to calculate a posterior probability distribution of the K parts across the entire corpus.
- the system outputs the parameters of the learned probability for further use, such as the analysis of specific recorded conversations, as described below.
- the conversation structure estimation proceeds autonomously in the same manner, without human supervision, to determine a segmentation of that conversation that has the highest likelihood under the learned probability distribution of the structure model, as described below.
- each conversation admits a segmentation given the words such a conversation contains.
- a specific conversation d can admit a segmentation in which conversation part K i of that conversation is any number of paragraphs long, t Ki,d .
- the likelihood of the segmentation decreases as the number of paragraphs t Ki,d deviates from the learned mean number of paragraphs η Ki , and when the words in this conversation part are very different from a learned multinomial word distribution, w ~ Multinomial(β), that is based on a given dictionary of words and its prior, β 0 , as described below.
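The two penalty terms described above (deviation of the part length from its learned mean, and word mismatch against the learned multinomial) can be written as a single log-likelihood. In this hypothetical sketch a Poisson stands in for the learned duration distribution, and eta and beta are invented learned parameters, not values from the patent:

```python
import math

def log_poisson(t, mean):
    # Log-probability of observing t paragraphs when the learned mean is `mean`.
    return t * math.log(mean) - mean - math.lgamma(t + 1)

def segmentation_log_lik(parts, eta, beta):
    """parts: list of K lists of paragraphs (each paragraph a word list).
    eta[i]: learned mean paragraph count of part i.
    beta[i]: learned word distribution (dict word -> probability) of part i."""
    ll = 0.0
    for i, paras in enumerate(parts):
        ll += log_poisson(len(paras), eta[i])          # t_Ki,d vs learned mean
        for para in paras:
            for w in para:
                ll += math.log(beta[i].get(w, 1e-6))   # word-mismatch penalty
    return ll

eta = [2.0, 3.0]
beta = [{"hello": 0.8, "price": 0.2}, {"price": 0.7, "bye": 0.3}]
good = [[["hello"], ["hello"]], [["price"], ["price"], ["bye"]]]
bad = [[["bye"], ["bye"]], [["hello"], ["hello"], ["hello"]]]
print(segmentation_log_lik(good, eta, beta) > segmentation_log_lik(bad, eta, beta))  # → True
```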
- FIG. 2 is a plate diagram 100 that schematically describes a Bayesian model for unsupervised estimation of a conversation structure model from a corpus of recorded conversations, in accordance with an embodiment of the invention.
- the method imposes a probability distribution of an ordered sequence of conversation parts, further described below, on the conversations that is determined (e.g., learned) by a generative process that the method associates with the corpus of conversations.
- the probability distribution of the ordered sequence of conversation parts (i.e., the learned probability) is a combination of the multinomial distribution over the conversation part assignments and a multinomial distribution over the words in each paragraph.
- a subsequent conversation part distribution of a specific conversation cannot be reduced to a single multinomial distribution.
- the corpus contains a number D of conversations.
- the disclosed model assumes any conversation structure is made of a number K of ordered different yet-unidentified conversation parts, where the order of conversation parts is maintained for all conversations. Some of the conversation parts, though, may be absent in a given conversation.
- Each conversation d of the D conversations is assumed by the model to be made of paragraphs, where the model imposes a probability distribution η ( 104 ), such as a multinomial distribution, on the count of paragraphs assigned to each of the K ordered different conversation parts.
- This probability distribution is either set in advance or itself drawn from a prior η 0 ( 102 ).
- In the shown embodiment, the language model is based on a given prior, β 0 ( 112 ), on the distribution of words.
- the disclosed model enforces conversation part coherence (i.e., uniqueness) for an entire paragraph.
- the model implicitly estimates the joint marginal distribution P(t|w) of the conversation part durations t given the conversation words w.
- Several methods exist to estimate this posterior probability, one of which is Gibbs sampling. Gibbs sampling is a Markov chain Monte Carlo (MCMC) algorithm for obtaining a sequence of observations drawn from a specified multivariate probability distribution when direct sampling is difficult. This sampled sequence is in turn used to approximate the joint and marginal distributions of the learned probability.
- the derivation of the learned probability comprises extracting the probability of the ordered sequence of conversation parts from the conversations by the processor without using a pre-classified training set.
- an optimal segmentation for each conversation is computed, in the sense that it maximizes the likelihood that the paragraphs are generated by the conversation structure model.
- w conversation words
- β 0 language model parameters of each of the model parts
- N p number of words in paragraph p
- N d is the number of paragraphs in the conversation (document) d.
- the model assigns a multinomial distribution for each paragraph p in conversation d, and for each word w in paragraph p:
- the disclosed technique can determine the agenda structure segmentation of K conversation parts of that conversation having the highest likelihood L(z d0 |w d0 ).
- the conversation parts structure that obtains the maximum likelihood has no closed form, but it can be found by applying, for example, a message passing algorithm.
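For K ordered, non-recurring parts, the maximum-likelihood assignment can be found exactly by a simple dynamic program, which is one concrete instance of the message-passing idea mentioned above. The per-part word probabilities `beta` below are hypothetical learned parameters, and the duration term is omitted for brevity:

```python
import math

def best_segmentation(paragraphs, beta):
    """Viterbi-style DP: dp[p][k] is the best log-likelihood of the first
    p+1 paragraphs with paragraph p assigned to part k, where part labels
    must be non-decreasing (ordered, non-recurring parts)."""
    K, P = len(beta), len(paragraphs)
    score = [[sum(math.log(beta[k].get(w, 1e-6)) for w in para) for k in range(K)]
             for para in paragraphs]
    dp = [score[0][:]]
    back = [[0] * K]
    for p in range(1, P):
        row, brow = [], []
        for k in range(K):
            j = max(range(k + 1), key=lambda j: dp[p - 1][j])  # parts never go back
            row.append(score[p][k] + dp[p - 1][j])
            brow.append(j)
        dp.append(row)
        back.append(brow)
    k = max(range(K), key=lambda k: dp[P - 1][k])
    result = [k]
    for p in range(P - 1, 0, -1):          # backtrack through the pointers
        k = back[p][k]
        result.append(k)
    return result[::-1]

beta = [{"hello": 0.9}, {"price": 0.9}, {"bye": 0.9}]
print(best_segmentation([["hello"], ["hello"], ["price"], ["bye"]], beta))  # → [0, 0, 1, 2]
```

The run time is O(P·K²) per conversation, which is negligible next to the model-training step.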
- FIG. 3 is a flow chart that schematically illustrates a method for unsupervised estimation of a conversation structure from a corpus of conversations, in accordance with an embodiment of the invention.
- server 22 records a corpus of conversations in memory 38 , at a recording step 50 .
- the conversations in the corpus are assumed to belong to a certain shared domain, such as sales conversations between agents 30 and customers 32 in the example shown in FIG. 1 , so that there will be a commonality of conversation parts among the conversations.
- processor 36 converts the oral conversation streams to text, at a speech-to-text conversion step 52 . Any suitable sort of speech processing program may be used for this purpose.
- processor 36 selects the recorded conversations based on business rules, e.g., processor 36 acts only on the first conversation of every sales opportunity, at conversations selection step 54 .
- processor 36 drops non-useful conversations, such as described below.
- processor 36 filters out the non-useful recorded conversations by testing language and/or syntax. This step can be important in the unsupervised learning process of FIG. 2 in eliminating spurious conversations that might otherwise impair the precision of classification. For example, processor 36 excludes from computation of the conversation structure any conversations in which the analyzed syntax does not match the target language, at a discard step 56 .
- processor 36 breaks the conversations into paragraphs, at a paragraph division step 58 .
- a paragraph is a continuous series of words of a selected length, or within a selected length range, uttered by a single speaker. The inventors have found that it is helpful to use a paragraph size on the order of three sentences, at step 58 . Other considerations may also be applied in choosing paragraph boundaries, such as pauses in the conversation. However, other definitions of a paragraph may be applied.
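Paragraph division at step 58 might look like the following sketch, which chunks each single-speaker turn into roughly three-sentence paragraphs; the regex sentence splitter and the fixed target length are simplifying assumptions:

```python
import re

TARGET_SENTENCES = 3  # the ~3-sentence paragraph size noted above

def split_paragraphs(turns):
    """turns: list of (speaker, text) pairs, one per continuous speaker turn.
    Returns (speaker, paragraph_text) pairs of up to TARGET_SENTENCES sentences."""
    paragraphs = []
    for speaker, text in turns:
        # naive sentence split on terminal punctuation followed by whitespace
        sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
        for i in range(0, len(sentences), TARGET_SENTENCES):
            paragraphs.append((speaker, " ".join(sentences[i:i + TARGET_SENTENCES])))
    return paragraphs

turns = [("Alex", "Hi. How are you? Great. Let's talk pricing."),
         ("Jabulani", "Sure.")]
print(split_paragraphs(turns))
```

A production splitter would also honor pauses and length ranges, as the paragraph definition above allows.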
- “Stop words” is a term used in natural language processing to denote words that have little or no semantic meaning. The inventors have found it useful in this regard to filter out roughly one hundred of the most common English words, including “a”, “able”, “about”, “across”, “after”, “all”, “almost”, etc. Because such stop words have a roughly equal chance of appearing in any conversation part, removing them from the paragraphs can be helpful in speeding up subsequent conversation part estimation.
- Processor 36 counts the number of occurrences of the remaining words in each of the paragraphs and in the corpus as a whole. Absent human supervision, words that appear only once or a few times (for example, less than four times) in the corpus cannot reliably be associated with a conversation part. Therefore, processor 36 eliminates these rare words, as well, at step 60 in order to speed up the conversation part estimation.
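The stop-word removal and the rare-word filtering of step 60 can be sketched as follows; the stop-word list is a tiny illustrative excerpt of the roughly one hundred words mentioned above, and MIN_COUNT = 4 follows the "less than four times" example:

```python
from collections import Counter

STOP_WORDS = {"a", "able", "about", "across", "after", "all", "almost", "the"}
MIN_COUNT = 4  # words seen fewer times than this across the corpus are dropped

def filter_and_count(paragraphs):
    """paragraphs: list of word lists. Returns a per-paragraph Counter of the
    words that survive both the stop-word and the rare-word filters."""
    corpus_counts = Counter(w for p in paragraphs for w in p if w not in STOP_WORDS)
    keep = {w for w, c in corpus_counts.items() if c >= MIN_COUNT}
    return [Counter(w for w in p if w in keep) for p in paragraphs]

paras = [["the", "price", "price"], ["price", "price", "about", "quote"]]
per_para = filter_and_count(paras)
print(per_para)  # "price" (4 occurrences) is kept; "quote" (1 occurrence) is dropped
```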
- Processor 36 sets in advance, or uploads from a memory, a prior probability of an ordered sequence of K conversation parts, at a prior probability uploading step 61 .
- Using a Bayesian model, such as the model described in FIG. 2 , and a Gibbs sampling process, processor 36 derives the posterior probability of the ordered sequence of conversation parts, i.e., of the duration of each of the K conversation parts as well as the distribution of words in each conversation part, at a probability distribution derivation step 62 .
- Step 62 can be carried out using any suitable Bayesian estimation tools.
- the processor stores the parameters of the probability distribution (i.e., of η and β) of the learned model.
- FIG. 4 is a flow chart that schematically illustrates a method for analyzing and assigning a coherence score to a conversation, in accordance with an embodiment of the invention.
- Processor 36 applies the model described in FIG. 2 to analyze the conversation, using the parameters of the learned model in step 62 .
- the model is used to find the maximal likelihood segmentation for the conversation (e.g., phone call) and this optimal segmentation is given a coherence score.
- processor 36 selects a conversation to analyze, at a conversation selection step 70 .
- the conversation may be selected from the same corpus as was used previously in learning the conversation parts, as described above, or it may be a newly collected conversation.
- processor 36 calculates the maximal likelihood segmentation (S d0 ) for the conversation d0 , at a given conversation structure derivation step 72 .
- the set of conversation parts count vector z d0 is the “optimal” vector in the sense that it maximizes the likelihood q(w d0 ) that the paragraphs and the words in each paragraph can be generated by the machine-learned conversation part structure.
- Processor 36 then calculates a coherence score, at a coherence scoring step 74 .
- the coherence score is calculated using a function that accepts as input the probability of each paragraph belonging to each of the conversation parts, as well as the optimal segmentation estimated by the algorithm. It outputs a score based on the ratio between the paragraph probability under the most probable conversation part and under the actually chosen conversation part. The score typically ranges between very poor and excellent, with an equivalent numerical score between zero and one hundred.
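The scoring function described above can be sketched as follows; averaging the per-paragraph probability ratios and scaling them to a 0 to 100 range is an assumed aggregation, since the exact formula is not given here:

```python
def coherence_score(paragraph_part_probs, segmentation):
    """paragraph_part_probs[p][k]: probability that paragraph p belongs to part k.
    segmentation[p]: the part actually chosen for paragraph p by the optimal
    segmentation. Each ratio compares the chosen part against the single
    most probable part for that paragraph."""
    ratios = [probs[z] / max(probs)
              for probs, z in zip(paragraph_part_probs, segmentation)]
    return 100.0 * sum(ratios) / len(ratios)

probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.3, 0.5]]
print(coherence_score(probs, [0, 1, 2]))  # every chosen part is the most probable → 100.0
print(round(coherence_score(probs, [0, 0, 2]), 1))  # second paragraph forced into part 0
```

A conversation whose chosen parts consistently disagree with the per-paragraph posteriors scores low and can then be flagged as not matching the structure model.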
- Processor 36 presents the coherence scoring of the conversation on display 40 , at an output step 76 .
- FIG. 5 is a schematic representation of a computer screen 80 , showing a graphical analysis of a recorded conversation, in accordance with an embodiment of the invention.
- Processor 36 presents the results of the analysis of the conversation on display 40, at an output step 76. For example, the display may show the segmentation of the conversation.
- This figure shows an example of a user interface screen, illustrating how a typical conversation has been segmented by conversation part at step 74 and presented at step 76 for subsequent review by the user.
- Horizontal bars 82 labeled “Jabulani” and “Alex” (an account executive and a customer, for example) show which of these two parties to the conversation was speaking at each given moment during the conversation.
- a “Conversation parts” bar 84 shows the conversation part at each corresponding moment during the conversation. The conversation parts are color-coded, according to the legend appearing at the bottom of screen 80 .
- the user who is viewing screen 80 can browse through the conversation using a cursor 86 .
- the user can move the cursor horizontally to one of the conversation parts labeled with the title "Introduction" and then listen to, or read, the text of the conversation in this conversation part.
- the user can also view a screenshot 88 of Jabulani's computer screen at each point in the conversation.
- the conversation structure estimation process may then output a coherence score 90 for the maximal likelihood segmentation, that measures the extent to which the derived conversation structure fits the constrained learned probability.
- the results of the sort of segmentation of conversations that is described above can be used in analyzing certain qualities of a conversation and possibly to predict its outcome.
- the location and distribution of conversation parts can be used to assess whether the conversation is following a certain desired agenda. Additionally or alternatively, the location and distribution of conversation parts can be used to predict whether the conversation is likely to result in a desired business outcome.
- processor 36 uses the conversation part location, distribution and related statistics, such as the duration of a given conversation part and the time of its occurrence in a conversation, to predict the expected likelihood that a conversation belongs to a certain group.
- useful groups of this sort are conversations resulting in a desired business outcome, conversations managed by top-performing sales representatives, conversations marked as good conversations by team members, or conversations following a desired pattern.
- Based on these predictions, processor 36 provides insights and actionable recommendations for improving the sales process, both for the entire sales organization and for specific sales people or teams.
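The conversation-part statistics mentioned above could be assembled into a feature vector for such group-membership prediction; the sketch below is illustrative only, with hypothetical names, and uses per-part duration and first-occurrence position normalized by conversation length:

```python
def part_features(segmentation, k):
    """Per-part duration (fraction of paragraphs) and first-occurrence
    position, both normalized by conversation length - the kind of
    location/duration statistics described above."""
    n = len(segmentation)
    feats = []
    for part in range(k):
        idxs = [i for i, z in enumerate(segmentation) if z == part]
        duration = len(idxs) / n
        first = idxs[0] / n if idxs else 1.0   # absent part: pushed to the end
        feats.extend([duration, first])
    return feats

# A five-paragraph conversation segmented into three ordered parts:
print(part_features([0, 0, 1, 1, 2], k=3))
# [0.4, 0.0, 0.4, 0.4, 0.2, 0.8]
```

A vector of this form could then be fed to any standard classifier trained on conversations labeled with the desired group (e.g., successful vs. unsuccessful outcomes).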
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Mathematical Analysis (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Optimization (AREA)
- Computational Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Algebra (AREA)
- Machine Translation (AREA)
- Databases & Information Systems (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
Abstract
Description
- The present invention relates generally to natural language processing, and particularly to techniques for analyzing the content of conversations.
- Vast amounts of information are exchanged among participants in teleconferences. In many organizations, teleconferences are recorded and available for subsequent review. Even when the teleconferences are transcribed to textual form, however, reviewing the records is so time-consuming that the vast majority of the information cannot be exploited.
- A number of methods have been proposed in the patent literature for automating the extraction of information from teleconferences. For example, U.S. Pat. No. 8,214,242 describes signaling correspondence between a meeting agenda and a meeting discussion that includes: receiving a meeting agenda specifying one or more topics for a meeting; analyzing, for each topic, one or more documents to identify topic keywords for that topic; receiving meeting discussions among participants for the meeting; identifying a current topic for the meeting in dependence upon the meeting agenda; determining a correspondence indicator in dependence upon the meeting discussions and the topic keywords for the current topic, the correspondence indicator specifying the correspondence between the meeting agenda and the meeting discussion; and rendering the correspondence indicator to the participants of the meeting.
- As another example, PCT application WO/2019/016119 describes a method and a system for performing automatically a discovery of topics within temporal ordered text document collections, the method comprising the steps of: generating a bag of words vector for each text document collection using a predefined dictionary, iteratively calculating on the basis of the generated bag of words vectors, for each text document collection, a hidden topic vector representing topics of the respective text document collection using a calculated hidden state vector, and memorizing a hidden state of all previous text document collections.
- An embodiment of the present invention provides a method for information processing, including computing, over a corpus of conversations, a conversation structure model including: (i) a sequence of conversation parts having a defined order, and (ii) a probabilistic model defining each of the conversation parts. For a given conversation, a segmentation of the conversation is computed based on the computed conversation structure model. Action is taken on the given conversation according to the segmentation.
- In some embodiments, computing the probabilistic model includes assigning a probability to an occurrence of each word.
- In some embodiments, assigning the probability includes running a Gibbs sampling process.
- In an embodiment, assigning the probability includes using a prior probability distribution for one or more of the conversation parts.
- In another embodiment, computing the conversation structure model includes pre-specifying a fixed number of the conversation parts.
- In some embodiments, computing the conversation structure model includes selecting a subset of the conversations based on one or more business rules.
- In some embodiments, computing the segmentation of the conversation includes finding the segmentation that best matches the conversation structure model.
- In an embodiment, the method further includes computing a coherence score, which quantifies an extent of fit between the given conversation and the conversation structure model.
- In another embodiment, the method further includes, when the coherence score is below a given value, regarding the given conversation as not matching the conversation structure model.
- In some embodiments, estimating the coherence score includes analyzing a likelihood of the segmentation of the conversation under the conversation structure model. In other embodiments, the method further includes deciding, based on one or more coherence scores computed between one or more respective conversations in the corpus and the conversation structure model, that the conversation structure model does not capture a valid conversation structure.
- In an embodiment, the method further includes, subsequent to computing the conversation structure model, merging one or more of the conversation parts into a single conversation part.
- In some embodiments, the conversations are transcribed from human conversations.
- In some embodiments, the conversations are recorded conversations, conducted over a telephone, a conference system, or in a meeting.
- In an embodiment, acting on the given conversation includes presenting a timeline that graphically illustrates the respective order and durations of the conversation parts during the given conversation. In another embodiment, acting on the given conversation includes displaying conversation part duration to computer users.
- In some embodiments, the method further includes searching for words within a conversation or within the corpus based on a conversation part to which the words are assigned. In other embodiments, the method further includes correlating the conversation parts of a given participant with participant metadata to identify conversation differences between participants.
- There is additionally provided, in accordance with an embodiment of the present invention, a system for information processing, including an interface and a processor. The interface is configured for accessing a corpus of recorded conversations. The processor is configured to: (a) compute, over a corpus of conversations, a conversation structure model, including (i) a sequence of conversation parts having a defined order, and (ii) a probabilistic model defining each of the conversation parts, (b) compute, for a given conversation, a segmentation of the conversation based on the computed conversation structure model, and (c) act on the given conversation according to the segmentation.
- There is further provided, in accordance with an embodiment of the present invention, a computer software product, the product including a tangible non-transitory computer-readable medium in which program instructions are stored, which instructions, when read by a processor, cause the processor to: (a) compute, over a corpus of conversations, a conversation structure model including (i) a sequence of conversation parts having a defined order, and (ii) a probabilistic model defining each of the conversation parts, (b) compute, for a given conversation, a segmentation of the conversation based on the computed conversation structure model, and (c) act on the given conversation according to the segmentation. The present invention will be more fully understood from the following detailed description of the embodiments thereof, taken together with the drawings in which:
-
FIG. 1 is schematic pictorial illustration of a teleconferencing system, in accordance with an embodiment of the invention; -
FIG. 2 is a plate diagram that schematically describes a Bayesian model for unsupervised extraction of a conversation structure model from a corpus of conversations, in accordance with an embodiment of the invention; -
FIG. 3 is a flow chart that schematically illustrates a method for unsupervised extraction. of a conversation structure from a corpus of conversations, in accordance with an embodiment of the invention; -
FIG. 4 is a flow chart that schematically illustrates a method for analyzing and assigning a coherence score to a conversation, in accordance with an embodiment of the invention; and -
FIG. 5 is a schematic representation of a computer screen, showing a graphical analysis of a recorded conversation, in accordance with an embodiment of the invention. - Embodiments of the present invention that are described hereinafter provide methods and systems that are capable of autonomously analyzing an input corpus of recorded conversations between two or more speakers, such as telephone conversations, and identifying a common conversation structure across the conversations. The present embodiments are particularly useful in analyzing recorded teleconferences, with two or more speakers participating in each conversation. However, the principles of the present invention may similarly be applied to substantially any large corpus of text documents or recorded conversations. Any and all such items are regarded herein as "conversations."
- In the context of the present disclosure and in the claims, the term “conversation structure” means a list of conversation intervals, typically comprising sentences uttered by a single speaker continuously, which have respective common characteristics and which appear in a specific order. The intervals are referred to herein as conversation parts. For example, a conversation structure model may consist of an ordered set of conversation parts titled introduction, follow-up and summary.
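The notion of an ordered list of conversation parts can be pictured as a simple data structure; the following Python sketch is illustrative only, and all names (ConversationPart, word_probs, mean_paragraphs) are hypothetical rather than taken from the patent:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ConversationPart:
    """One of the K ordered, non-recurring intervals of a conversation."""
    index: int                    # position in the fixed ordering
    word_probs: Dict[str, float]  # multinomial word distribution for this part
    mean_paragraphs: float        # expected duration, in paragraphs

@dataclass
class ConversationStructureModel:
    """An ordered sequence of conversation parts."""
    parts: List[ConversationPart]

# A hypothetical three-part model: introduction -> follow-up -> summary.
model = ConversationStructureModel(parts=[
    ConversationPart(0, {"hello": 0.5, "calling": 0.5}, 4.0),
    ConversationPart(1, {"pricing": 0.6, "demo": 0.4}, 10.0),
    ConversationPart(2, {"thanks": 0.7, "follow": 0.3}, 3.0),
])
```

Note that, as the next paragraph explains, the parts need no explicit topic labels; the titles here are only for readability.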
- Although each conversation part typically consists of a single semantic topic, for the purpose of determining a structure model there is no need to specify the topic of each conversation part explicitly.
- Given an input conversation (e.g., one of the conversations in the corpus or a new relevant conversation), the disclosed methods and systems can further estimate the extent to which that conversation matches (e.g., complies with) the common structure model that the system has identified. For example, the system may assess to what degree the order of topics was maintained, such as introduction, follow-up and summary. Assessing the compliance to a common conversation model is useful in various real-life applications, such as training or evaluation of sales persons and customer support representatives.
- While the following description refers mainly to a single common structure, the disclosed techniques also apply to identifying multiple common structures across a corpus of conversations and estimating the extent to which a given conversation matches (e.g., complies with) one or more of the common structures.
- In some disclosed embodiments, a computerized conversation processing system comprises an interface for accessing a corpus of recorded conversations, and a processor. The processor computes, over the conversations in the corpus, a conversation structure model comprising (i) a sequence of conversation parts having a defined order, and (ii) a probabilistic model defining each of the conversation parts. Then, for a given conversation, the processor computes a segmentation of the conversation based on the computed structure model. Subsequently, the processor acts on the given conversation according to the segmentation.
- In some embodiments, the processor computes the conversation structure model by adapting a predefined set of a-priori occurrence probabilities to reflect actual occurrence probabilities computed over the corpus. The prior probability distribution is over a pre-specified fixed number of ordered and non-recurring conversation parts and over the word occurrences in each part. In some embodiments, the processor computes a posterior probability distribution by: (a) dividing each recorded conversation into an ordered set of paragraphs, (b) computing, by the processor, respective frequencies of occurrence of multiple words in each of a plurality of paragraphs in each of the recorded conversations, (c) based on the frequency of occurrence of the words over the conversations in the corpus and the prior probabilities of the words and conversation parts, running a Gibbs sampling process, and (d) outputting the parameters of the posterior probability distribution obtained by the Gibbs sampler.
- Given the structure model, the processor then computes the segmentation of the conversation by finding the segmentation that has the best match to the computed model.
- In some embodiments, the processor computes a coherence score between a given conversation and the structure model, which quantifies an extent of fit between the given conversation and the conversation structure model. In an embodiment, the processor is further configured to, when the coherence score is below a given value, regard the given conversation as not matching the conversation structure model. The processor estimates the coherence score by analyzing the likelihood of the segmentation of the conversation under the conversation structure model.
- A conversation with too low a coherence score, determined based on the disclosed technique, may be flagged or dropped from a displaying process of the analyzed agendas. Based on the coherence score for the conversation structure, a user of the system is able to understand how closely the conversation pursued the most common (i.e., learned) structure.
-
FIG. 1 is a schematic pictorial illustration of a teleconferencing system 20, in accordance with an embodiment of the invention. - A
server 22 receives and records conversations via a network 24. Server 22 may receive audio input from the conversations online in real time, or it may receive recordings made and stored by other means, such as by processors 26, or even textual transcripts of conversations created by speech-to-text programs running on other processors. As one example, server 22 may collect recordings of Web conferences using the methods described in U.S. Pat. No. 9,699,409, whose disclosure is incorporated herein by reference. - In the pictured embodiment,
server 22 collects and analyzes conversations made by people working in a given field, for example, help desk personnel or sales agents working for a given company. In the disclosed example, sales agents 30, using processors 26, communicate with customers 32 who use audio devices 28. These conversations may be carried out over substantially any sort of network, including both telephone and packet networks. Although the conversations shown in FIG. 1 have two participants, server 22 may similarly apply the techniques described herein in analyzing conversations between three or more participants. In an embodiment, a processor correlates the conversation parts of a given participant with participant metadata to identify conversation differences between participants. -
Server 22 comprises a computerized conversation processing system including a processor 36, which may be a general-purpose computer, connected to network 24 by a network interface 34. Server 22 receives and stores the corpus of recorded conversations in memory 38 for processing by processor 36. Processor 36 autonomously derives an optimal conversation structure of K parts and, at the conclusion of this process, is able to present the conversation structure over the entire duration of the recorded conversations on a display 40. Given a new conversation, processor 36 can extract the new conversation's structure and, based on the previously learned conversation structure model, assign to the newly extracted structure a coherence score reflecting how well it fits this model. Processor 36 may then present the resulting coherence score on display 40. -
Processor 36 typically carries out the functions that are described herein under the control of program instructions in software. This software may be downloaded to server 22 in electronic form, for example over a network. Additionally or alternatively, the software may be provided and/or stored on tangible, non-transitory computer-readable media, such as optical, magnetic, or electronic memory media. In particular, processor 36 runs a dedicated algorithm as disclosed herein, including in FIG. 2, that enables processor 36 to perform the disclosed steps, as further described below. - Introduction
- In the disclosed embodiments, a processor of the computerized conversation processing system first sets a conversation structure model comprising an ordered sequence of a pre-specified number K of conversation parts. Each of the K implicit parts appears only once in the conversation (i.e., is non-recurring). Furthermore, the conversation parts are identified by a unique qualifier (e.g., a running index). Specifically, the conversation parts must appear in the same order (although not all K parts must appear in all conversations). The total number K of the conversation parts can be defined in advance to be any suitable target number, for example a chosen number between five and ten.
- Each part of the structure model is given as a prior distribution of its word occurrences and a prior distribution of its duration in a conversation.
- Subsequently, the system autonomously processes the contents of an input corpus of recorded conversations that are assumed to have a common content-structure made up of these K parts. In some embodiments, the disclosed system converts the recorded conversations into text using any suitable methods and tools. Following conversion to text, and optionally filtering out irrelevant conversations, the system breaks the entire conversation into an ordered collection of textual units, referred to hereinafter as paragraphs, made of sentences uttered by the same speaker continuously. The processor then computes respective frequencies of occurrence of multiple words in each of a plurality of paragraphs in each of the recorded conversations.
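The paragraph-splitting and word-counting steps can be sketched as follows. This is a simplified illustration (splitting sentences on periods, hypothetical helper names), not the patent's implementation:

```python
from collections import Counter

def split_into_paragraphs(utterances, max_sentences=3):
    """Group each speaker's continuous utterance into paragraphs of roughly
    `max_sentences` sentences - a simplification of the paragraph definition
    described in the text."""
    paragraphs = []
    for speaker, text in utterances:
        sentences = [s.strip() for s in text.split(".") if s.strip()]
        for i in range(0, len(sentences), max_sentences):
            paragraphs.append((speaker, " ".join(sentences[i:i + max_sentences])))
    return paragraphs

def word_frequencies(paragraphs):
    """Per-paragraph word occurrence counts."""
    return [Counter(text.lower().split()) for _, text in paragraphs]

utterances = [
    ("agent", "Hello. Thanks for joining. Let me introduce our product. It saves time."),
    ("customer", "Sounds interesting. What does it cost."),
]
paras = split_into_paragraphs(utterances)   # 2 agent paragraphs + 1 customer paragraph
freqs = word_frequencies(paras)
```

In practice the paragraph boundaries would also take pauses and utterance lengths into account, as noted later in the text.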
- The system uses the prior probability of the conversation structure model, the frequency of word occurrences in the input corpus and a suitable machine learning algorithm, such as a Gibbs sampling process, to calculate a posterior probability distribution of the K parts across the entire corpus. The system outputs the parameters of the learned probability for further use, such as the analysis of specific recorded conversations, as described below.
- Given one of the conversations in the corpus, or a new conversation as an input, the conversation structure estimation proceeds autonomously in the same manner, without human supervision, to determine a segmentation of that conversation that has the highest likelihood under the learned probability distribution of the structure model, as described below.
- In an embodiment, each conversation admits a segmentation given the words such a conversation contains. The structure model dictates that the length of any particular conversation part Ki of the K conversation parts, i=1, 2, . . . , K, is multinomially distributed with a mean length of ζKi paragraphs. In an embodiment, a specific conversation, d, can admit a segmentation where conversation part Ki of that conversation is any number of paragraphs long, tKi,d. The likelihood of the segmentation decreases as the number of paragraphs tKi,d differs from the learned mean number of paragraphs ζKi, and when the words in this conversation part are very different from a learned multinomial word distribution, w˜Multinomial(β), that is based on a given dictionary of words, β0, as described below.
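Under these assumptions, the log-likelihood of one candidate segmentation decomposes into a duration term and a word term per conversation part. The following is a simplified sketch (hypothetical function names; small probability floors stand in for proper smoothing):

```python
import math

def segmentation_log_likelihood(paragraph_words, lengths, length_probs, word_probs):
    """Simplified log-likelihood of one segmentation: each conversation part i
    contributes a duration term (probability of its length t_Ki,d) and a word
    term (probability of each word under the part's multinomial distribution).
    The floors 1e-9 and 1e-6 stand in for proper smoothing."""
    ll = 0.0
    start = 0
    for i, t in enumerate(lengths):
        ll += math.log(length_probs[i].get(t, 1e-9))        # duration term
        for para in paragraph_words[start:start + t]:       # word term
            for w in para:
                ll += math.log(word_probs[i].get(w, 1e-6))
        start += t
    return ll

# Toy data: three paragraphs, three parts; the matched segmentation [1, 1, 1]
# scores higher than one that assigns the wrong lengths and words.
paragraph_words = [["hello"], ["pricing"], ["thanks"]]
length_probs = [{1: 0.8, 2: 0.2}, {1: 0.9}, {1: 0.9}]
word_probs = [{"hello": 0.9}, {"pricing": 0.9}, {"thanks": 0.9}]
good = segmentation_log_likelihood(paragraph_words, [1, 1, 1], length_probs, word_probs)
bad = segmentation_log_likelihood(paragraph_words, [2, 0, 1], length_probs, word_probs)
```

The mismatched segmentation is penalized on both counts: its part lengths are improbable and its paragraphs fall under the wrong word distributions.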
-
FIG. 2 is a plate diagram 100 schematically describing a Bayesian model for unsupervised estimation of a conversation structure model from a corpus of recorded conversations, in accordance with an embodiment of the invention. - The method imposes a probability distribution of an ordered sequence of conversation parts, further described below, on the conversations that is determined (e.g., learned) by a generative process that the method associates with the corpus of conversations. In an embodiment, the probability distribution of the ordered sequence of conversation parts (i.e., the learned probability) is a combination of the multinomial distribution over the conversation part assignments and a multinomial distribution over the words in each paragraph. Typically, a subsequent conversation part distribution of a specific conversation cannot be reduced to a single multinomial distribution.
- In the model, the corpus contains a number D of conversations. The disclosed model assumes any conversation structure is made of a number K of ordered different yet-unidentified conversation parts, where the order of conversation parts is maintained for all conversations. Some of the conversation parts, though, may be absent in a given conversation.
- Each conversation d of the D conversations is assumed by the model to be made of paragraphs, where the model imposes a probability distribution θ (104), such as a multinomial distribution, on the count of paragraphs assigned to each of the K ordered different conversation parts. This probability distribution is either set in advance or itself drawn from a prior θ0 (102). In the shown embodiment
-
td˜Multinomial(θ) Eq. 1 - a vector td (106) gives paragraph counts in a conversation, where td={(np1, np2, . . . , npK)}, (np1, np2, . . . , npK) is multinomially distributed according to the parameters θ, and npi is the paragraph count for conversation part i, i=1, 2, . . . , K.
- The vector of paragraph assignments, zd (108), is given by the unpacked vector td ordered by the specified conversation part ordering, where paragraph j belongs to conversation part i, i=1, 2 . . . K.
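Unpacking the count vector td into the assignment vector zd is a mechanical step; a minimal sketch:

```python
def unpack_counts(t_d):
    """Unpack a paragraph-count vector t_d = (n_p1, ..., n_pK) into the
    paragraph-assignment vector z_d, respecting the fixed part ordering."""
    z_d = []
    for part_index, count in enumerate(t_d):
        z_d.extend([part_index] * count)
    return z_d

# 2 paragraphs in part 0, 3 in part 1, 1 in part 2:
print(unpack_counts([2, 3, 1]))  # [0, 0, 1, 1, 1, 2]
```

A zero count simply means that part is absent from the conversation, consistent with the ordering constraint described earlier.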
- According to the Bayesian model of
FIG. 2, the distribution of words in a paragraph p of a conversation d, wp,d, is drawn from a language model βi (114), where i=zd,p. The language model is based on a given prior, β0 (112), on the distribution of words. As noted above, the disclosed model enforces conversation part coherence (i.e., uniqueness) for an entire paragraph. - At training time the model implicitly estimates the joint marginal distribution P(t|w), the probability of the conversation part assignments given the document text. Several methods exist to estimate this posterior probability, one of which is Gibbs sampling. Gibbs sampling is a Markov chain Monte Carlo (MCMC) algorithm for obtaining a sequence of observations drawn from a specified multivariate probability distribution when direct sampling is difficult. This sampled sequence is in turn used to approximate the joint and marginal distributions of the learned probability. The derivation of the learned probability comprises extracting the probability of the ordered sequence of conversation parts from the conversations by the processor without using a pre-classified training set.
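To illustrate the general mechanics of Gibbs sampling (a toy case, not the patent's specific sampler over conversation parts), the example below samples a standard bivariate normal by repeatedly drawing each coordinate from its full conditional distribution:

```python
import random

def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Toy Gibbs sampler for a standard bivariate normal with correlation rho:
    the full conditionals are x|y ~ N(rho*y, 1-rho^2) and symmetrically for y.
    This illustrates the general MCMC mechanics only."""
    rng = random.Random(seed)
    sd = (1.0 - rho * rho) ** 0.5
    x = y = 0.0
    samples = []
    for step in range(burn_in + n_samples):
        x = rng.gauss(rho * y, sd)   # draw x from its full conditional
        y = rng.gauss(rho * x, sd)   # draw y from its full conditional
        if step >= burn_in:
            samples.append((x, y))
    return samples

samples = gibbs_bivariate_normal(rho=0.8, n_samples=5000)
mean_x = sum(s[0] for s in samples) / len(samples)          # should be near 0
mean_xy = sum(s[0] * s[1] for s in samples) / len(samples)  # approximates rho
```

In the patent's setting the variables being resampled are the conversation part assignment counts, with the word frequencies held as evidence, but the iterate-over-full-conditionals structure is the same.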
- At inference time an optimal segmentation for each conversation is computed, in the sense that it maximizes the likelihood that the paragraphs are generated by the conversation structure model.
- The expression for the joint marginal probability used in the Gibbs sampling is given in Eq. 2. The notation w−d signifies the word frequencies in all but the dth conversation; respectively, t−(d, i) is the vector of conversation part assignment counts in all but the ith paragraph of conversation d.
-
- in which w denotes the conversation words, β0 the language model parameters of each of the model parts, Np the number of words in paragraph p, and Nd the number of paragraphs in the conversation (document) d.
- The model assigns a multinomial distribution for each paragraph p in conversation d, and for each word w in paragraph p:
-
wd˜Multinomial(β[zd, p]) Eq. 3 - Given a relevant conversation d0, such as one belonging to the D conversations, and using Eq. 3, the disclosed technique can determine the agenda structure segmentation of K conversation parts of that conversation having the highest likelihood L(zd0|wd0). The conversation parts structure which obtains the maximum likelihood has no closed form but can be found by applying, for example, a message passing algorithm.
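Because the parts must appear in a fixed order, the maximum-likelihood assignment can also be found with a simple dynamic program over non-decreasing part indices. The sketch below is an illustrative alternative to the message-passing approach mentioned above; it scores per-paragraph word log-probabilities only (a duration term could be folded in the same way):

```python
def best_segmentation(paragraph_scores):
    """Maximum-likelihood assignment of N ordered paragraphs to K ordered
    conversation parts by dynamic programming: assignments must be
    non-decreasing in the part index. `paragraph_scores[p][k]` is the
    log-probability of paragraph p under part k."""
    n, k = len(paragraph_scores), len(paragraph_scores[0])
    best = [[float("-inf")] * k for _ in range(n)]
    back = [[0] * k for _ in range(n)]
    best[0] = list(paragraph_scores[0])
    for p in range(1, n):
        for j in range(k):
            # the previous paragraph may sit in any part j' <= j
            prev = max(range(j + 1), key=lambda jj: best[p - 1][jj])
            best[p][j] = best[p - 1][prev] + paragraph_scores[p][j]
            back[p][j] = prev
    j = max(range(k), key=lambda jj: best[n - 1][jj])
    z = [j]
    for p in range(n - 1, 0, -1):
        j = back[p][j]
        z.append(j)
    return z[::-1]

# Four paragraphs whose word scores favor part 0 first, then part 1:
scores = [[-1.0, -5.0], [-1.0, -5.0], [-5.0, -1.0], [-5.0, -1.0]]
print(best_segmentation(scores))  # [0, 0, 1, 1]
```

The ordering constraint is what makes this tractable: each paragraph's part index can only stay the same or increase, so the search space collapses from K^N assignments to an O(N·K²) table.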
-
FIG. 3 is a flow chart that schematically illustrates a method for unsupervised estimation of a conversation structure from a corpus of conversations, in accordance with an embodiment of the invention. -
server 22, but these methods may alternatively be implemented. in. any other suitable processing configurations. All such implementations are considered to be within the scope of the present invention. - To initiate the method of
FIG. 3, server 22 records a corpus of conversations in memory 38, at a recording step 50. The conversations in the corpus are assumed to belong to a certain shared domain, such as sales conversations between agents 30 and customers 32 in the example shown in FIG. 1, so that there will be a commonality of conversation parts among the conversations. If the recorded conversations are not already in textual form, processor 36 converts the oral conversation streams to text, at a speech-to-text conversion step 52. Any suitable sort of speech processing program may be used for this purpose. - In some embodiments,
processor 36 selects the recorded conversations based on business rules, e.g., processor 36 acts only on the first conversation of every sales opportunity, at a conversation selection step 54. Next, at a discard step 56, processor 36 drops non-useful conversations, as described below. - In some embodiments,
processor 36 filters out the non-useful recorded conversations by testing language and/or syntax. This step can be important in the unsupervised learning process of FIG. 2 in eliminating spurious conversations that might otherwise impair the precision of classification. For example, processor 36 excludes from computation of the conversation structure any conversations in which the analyzed syntax does not match the target language, at a discard step 56. - To begin the actual conversation structure estimation process,
processor 36 breaks the conversations into paragraphs, at a paragraph division step 58. A paragraph is a continuous series of words of a selected length, or within a selected length range, uttered by a single speaker. The inventors have found that it is helpful to use a paragraph size on the order of three sentences, at step 58. Other considerations may also be applied in choosing paragraph boundaries, such as pauses in the conversation. However, other definitions of a paragraph may be applied. - As another preliminary step, it is also useful for
processor 36 to filter out of the conversation transcripts certain types of words, such as stop words and rare words, at a word filtering step 60. "Stop words" is a term used in natural language processing to denote words that have little or no semantic meaning. The inventors have found it useful in this regard to filter out roughly one hundred of the most common English words, including "a", "able", "about", "across", "after", "all", "almost", etc. Because such stop words have a roughly equal chance of appearing in any conversation part, removing them from the paragraphs can be helpful in speeding up subsequent conversation part estimation. -
Processor 36 counts the number of occurrences of the remaining words in each of the paragraphs and in the corpus as a whole. Absent human supervision, words that appear only once or a few times (for example, less than four times) in the corpus cannot reliably be associated with a conversation part. Therefore, processor 36 eliminates these rare words, as well, at step 60 in order to speed up the conversation part estimation. -
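The stop-word and rare-word filtering of step 60 might look like the following sketch. The stop-word set is truncated for brevity, and the corpus-count threshold is parameterized (the text suggests a default of four occurrences):

```python
from collections import Counter

STOP_WORDS = {"a", "able", "about", "across", "after", "all", "almost"}  # truncated

def filter_words(paragraphs, min_corpus_count=4):
    """Remove stop words, then drop words occurring fewer than
    `min_corpus_count` times in the whole corpus (rare words)."""
    kept = [[w for w in para if w not in STOP_WORDS] for para in paragraphs]
    corpus_counts = Counter(w for para in kept for w in para)
    return [[w for w in para if corpus_counts[w] >= min_corpus_count]
            for para in kept]

paras = [["a", "pricing", "demo"], ["pricing", "all", "pricing"], ["pricing", "demo"]]
print(filter_words(paras, min_corpus_count=2))
# [['pricing', 'demo'], ['pricing', 'pricing'], ['pricing', 'demo']]
```

Note that rarity is judged against the whole corpus while removal is applied per paragraph, matching the two-level counting described above.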
Processor 36 sets in advance, or uploads from a memory, a prior probability of an ordered sequence of K conversation parts, at a prior probability uploading step 61. Using a Bayesian model, such as the model described in FIG. 2, and a Gibbs sampling process, processor 36 derives the posterior probability of the ordered sequence of conversation parts, i.e., of the duration of each of the K conversation parts as well as its distribution of words in each conversation part, at a probability distribution derivation step 62. In the inventors' experience, a number K of 5-10 conversation parts is a useful target for analysis of corpora containing hundreds of conversations in a particular domain. Step 62 can be carried out using any suitable Bayesian estimation tools. - Finally, at a
storage step 64, the processor stores the parameters of the probability distribution (i.e., of β and ζ) of the learned model. -
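As a toy illustration of the Gibbs sampling at step 62, the following sketch resamples the interior boundaries of an ordered K-part segmentation. For simplicity, the per-part word distributions are taken here as fixed inputs, whereas in the full Bayesian derivation they are inferred jointly with the segmentation; all names are illustrative, not taken from the patent.

```python
import math
import random

def gibbs_segment(paragraphs, betas, iters=100, seed=0):
    """Toy Gibbs sampler over the interior boundaries of an ordered
    K-part segmentation.  `betas` is one {word: prob} dict per part,
    assumed known here for simplicity."""
    rng = random.Random(seed)
    K, N = len(betas), len(paragraphs)

    def loglik(i, k):  # log-likelihood of paragraph i under part k
        return sum(math.log(betas[k].get(w, 1e-6)) for w in paragraphs[i])

    # bounds[k] = index of the first paragraph of part k; bounds[K] = N
    bounds = [round(k * N / K) for k in range(K)] + [N]
    for _ in range(iters):
        for k in range(1, K):  # resample each interior boundary in turn
            lo, hi = bounds[k - 1] + 1, bounds[k + 1]  # keep parts non-empty
            cands = list(range(lo, hi))
            # log-probability of each candidate split between parts k-1 and k
            logps = [sum(loglik(i, k - 1) for i in range(bounds[k - 1], b)) +
                     sum(loglik(i, k) for i in range(b, bounds[k + 1]))
                     for b in cands]
            m = max(logps)  # normalize before exponentiating
            weights = [math.exp(lp - m) for lp in logps]
            bounds[k] = rng.choices(cands, weights=weights)[0]
    return bounds
```

Given paragraphs whose vocabularies separate cleanly by part, the sampled boundaries concentrate on the true split within a few sweeps.
-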
FIG. 4 is a flow chart that schematically illustrates a method for analyzing and assigning a coherence score to a conversation, in accordance with an embodiment of the invention. Processor 36 applies the model described in FIG. 2 to analyze the conversation, using the parameters of the model learned in step 62. The model is used to find the maximal likelihood segmentation for the conversation (e.g., a phone call), and this optimal segmentation is given a coherence score. To initiate this method, processor 36 selects a conversation to analyze, at a conversation selection step 70. The conversation may be selected from the same corpus as was used previously in learning the conversation parts, as described above, or it may be a newly collected conversation. - Using the procedure described in
FIG. 3, processor 36 calculates the maximal likelihood segmentation (Sd0) for the conversation d0, at a given conversation structure derivation step 72. The conversation parts count vector, zd0, is the "optimal" vector in the sense that it maximizes the likelihood q(wd0) that the paragraphs, and the words in each paragraph, can be generated by the machine-learned conversation part structure. -
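The maximal likelihood segmentation of step 72 can be computed with a simple dynamic program, sketched here under the simplifying assumption that each paragraph's log-likelihood under each part follows from the learned word distributions; the function and variable names are illustrative, not taken from the patent.

```python
import math

def max_likelihood_segmentation(paragraphs, betas):
    """Assign N paragraphs to K ordered parts, keeping each part
    contiguous, so that the total log-likelihood is maximal.
    Returns (part index per paragraph, total log-likelihood)."""
    N, K = len(paragraphs), len(betas)
    # ll[i][k]: log-likelihood of paragraph i under part k
    ll = [[sum(math.log(betas[k].get(w, 1e-6)) for w in p) for k in range(K)]
          for p in paragraphs]
    NEG = float("-inf")
    best = [[NEG] * K for _ in range(N)]
    back = [[0] * K for _ in range(N)]
    best[0][0] = ll[0][0]  # the conversation starts in part 0
    for i in range(1, N):
        for k in range(K):
            stay = best[i - 1][k]                        # remain in part k
            move = best[i - 1][k - 1] if k > 0 else NEG  # advance from part k-1
            back[i][k] = k if stay >= move else k - 1
            best[i][k] = max(stay, move) + ll[i][k]
    k = max(range(K), key=lambda j: best[N - 1][j])
    score = best[N - 1][k]
    parts = [0] * N
    parts[N - 1] = k
    for i in range(N - 1, 0, -1):  # backtrack the best path
        k = back[i][k]
        parts[i - 1] = k
    return parts, score
```

The ordering constraint (a part can only follow the one before it) is what distinguishes this from free topic assignment, and it is what makes the O(N·K) dynamic program applicable.
-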
Processor 36 then calculates a coherence score, at a coherence scoring step 74. The coherence score is calculated using a function that accepts as input the probability of each paragraph belonging to each of the conversation parts, as well as the optimal segmentation estimated by the algorithm. It outputs a score based on the ratio between the paragraph probability under the most probable conversation part and under the actually chosen conversation part. The score typically ranges from very poor to excellent, and/or an equivalent numerical score between zero and one hundred. Processor 36 presents the coherence scoring of the conversation on display 40, at an output step 76. -
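The scoring function of step 74 might look like the following sketch; the averaging of per-paragraph probability ratios and the mapping to a 0-100 scale are illustrative choices, since the patent does not specify the exact aggregation.

```python
import math

def coherence_score(paragraph_logprobs, chosen_parts):
    """paragraph_logprobs[i][k]: log-probability of paragraph i under
    conversation part k; chosen_parts[i]: the part assigned to
    paragraph i by the optimal segmentation.  Averages the ratio
    p(paragraph | chosen part) / p(paragraph | most probable part)
    and maps it to a 0-100 scale (100 = fully coherent)."""
    ratios = [math.exp(lp[k] - max(lp))
              for lp, k in zip(paragraph_logprobs, chosen_parts)]
    return 100.0 * sum(ratios) / len(ratios)
```

When every paragraph sits in its own most probable part, every ratio is 1 and the score is 100; paragraphs forced by the ordering constraint into less probable parts pull the score down.
-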
FIG. 5 is a schematic representation of a computer screen 80, showing a graphical analysis of a recorded conversation, in accordance with an embodiment of the invention. -
Processor 36 presents the results of analysis of the conversation on display 40, at an output step 76. The display may show the segmentation of the conversation. - Using further analysis tools, such as those described in U.S. Patent Application Publication 2018/0239822,
processor 36 may present the results of such an analysis of the conversation on display 40. The display shows the segmentation of the conversation. - This figure shows an example of a user interface screen, illustrating how a typical conversation has been segmented by conversation part at
step 74 and presented at step 76 for subsequent review by the user. -
Horizontal bars 82, labeled "Jabulani" and "Alex" (an account executive and a customer, for example), show which of these two parties to the conversation was speaking at each given moment during the conversation. A "Conversation parts" bar 84 shows the conversation part at each corresponding moment during the conversation. The conversation parts are color-coded, according to the legend appearing at the bottom of screen 80. - The user who is viewing
screen 80 can browse through the conversation using a cursor 86. For example, to look into how pricing was negotiated between Jabulani and Alex, the user can move the cursor horizontally to one of the conversation parts labeled with the title "Introduction" and then listen to, or read, the text of the conversation in this conversation part. Optionally, the user can also view a screenshot 88 of Jabulani's computer screen at each point in the conversation. - The conversation structure estimation process may then output a
coherence score 90 for the maximal likelihood segmentation, which measures the extent to which the derived conversation structure fits the constrained, learned probability model. - The results of the sort of segmentation of conversations that is described above can be used in analyzing certain qualities of a conversation and possibly in predicting its outcome. For example, the location and distribution of conversation parts can be used to assess whether the conversation is following a certain desired agenda. Additionally or alternatively, the location and distribution of conversation parts can be used to predict whether the conversation is likely to result in a desired business outcome.
- For such purposes, processor 36 (or another processor, which receives the segmentation results) uses the conversation part location, distribution and related statistics, such as the duration of a given conversation part and the time of its occurrence in a conversation, to predict the expected likelihood that a conversation belongs to a certain group. Examples of useful groups of this sort are conversations resulting in a desired business outcome, conversations managed by top-performing sales representatives, conversations marked as good conversations by team members, or conversations following a desired pattern.
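For illustration, the conversation-part statistics mentioned above can be gathered into a feature vector such as the following; the specific features and their encoding are an assumption made for this sketch, not the patent's prescription.

```python
def part_statistics(parts, K):
    """Summarize a segmentation (part index per paragraph) as features:
    for each of the K parts, its fractional duration and fractional
    start position (duration 0.0 and start 1.0 if the part is absent)."""
    N = len(parts)
    feats = []
    for k in range(K):
        idx = [i for i, p in enumerate(parts) if p == k]
        feats.append(len(idx) / N)                 # fraction of the call
        feats.append(idx[0] / N if idx else 1.0)   # when the part begins
    return feats
```

Such vectors could then be fed to any standard classifier (logistic regression, gradient boosting, and so on) trained on conversations labeled with the group of interest.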
- Based on these predictions,
processor 36 provides insights and actionable recommendations for improving the sales process, both for the entire sales organization and for specific salespeople or teams. - It will be appreciated that the embodiments described above are cited by way of example, and that the present invention is not limited to what has been particularly shown and described hereinabove. Rather, the scope of the present invention includes both combinations and sub-combinations of the various features described hereinabove, as well as variations and modifications thereof which would occur to persons skilled in the art upon reading the foregoing description and which are not disclosed in the prior art.
Claims (37)
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US16/520,374 US20210027772A1 (en) | 2019-07-24 | 2019-07-24 | Unsupervised automated extraction of conversation structure from recorded conversations |
| EP20184576.5A EP3770795A1 (en) | 2019-07-24 | 2020-07-07 | Unsupervised automated extraction of conversation structure from recorded conversations |
| US18/310,558 US12183332B2 (en) | 2019-07-24 | 2023-05-02 | Unsupervised automated extraction of conversation structure from recorded conversations |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US16/520,374 US20210027772A1 (en) | 2019-07-24 | 2019-07-24 | Unsupervised automated extraction of conversation structure from recorded conversations |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US18/310,558 Continuation US12183332B2 (en) | 2019-07-24 | 2023-05-02 | Unsupervised automated extraction of conversation structure from recorded conversations |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| US20210027772A1 true US20210027772A1 (en) | 2021-01-28 |
Family
ID=71527586
Family Applications (2)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US16/520,374 Abandoned US20210027772A1 (en) | 2019-07-24 | 2019-07-24 | Unsupervised automated extraction of conversation structure from recorded conversations |
| US18/310,558 Active US12183332B2 (en) | 2019-07-24 | 2023-05-02 | Unsupervised automated extraction of conversation structure from recorded conversations |
Family Applications After (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US18/310,558 Active US12183332B2 (en) | 2019-07-24 | 2023-05-02 | Unsupervised automated extraction of conversation structure from recorded conversations |
Country Status (2)
| Country | Link |
|---|---|
| US (2) | US20210027772A1 (en) |
| EP (1) | EP3770795A1 (en) |
Cited By (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11232266B1 (en) | 2020-07-27 | 2022-01-25 | Verizon Patent And Licensing Inc. | Systems and methods for generating a summary of a multi-speaker conversation |
| US11272058B2 (en) * | 2020-07-27 | 2022-03-08 | Verizon Patent And Licensing Inc. | Method and apparatus for summarization of dialogs |
| WO2022170876A1 (en) * | 2021-02-10 | 2022-08-18 | 华为技术有限公司 | Method, device and system for processing dialogue data, and storage medium |
| US11663824B1 (en) | 2022-07-26 | 2023-05-30 | Seismic Software, Inc. | Document portion identification in a recorded video |
| US20240386190A1 (en) * | 2023-05-19 | 2024-11-21 | Optum, Inc. | Machine learning divide and conquer techniques for long dialog summarization |
| US12197484B2 (en) | 2022-03-28 | 2025-01-14 | Gong.Io Ltd. | System and method for generating a multi-label classifier for textual communications |
| US12386867B2 (en) | 2022-07-27 | 2025-08-12 | Gong.Io Ltd. | System and method for rapid initialization and transfer of topic models by a multi-stage approach |
| US12425517B2 (en) * | 2023-10-04 | 2025-09-23 | Genesys Cloud Services, Inc. | Technologies for leveraging artificial intelligence for post-call actions in contact center systems |
| US12549499B2 (en) | 2024-04-22 | 2026-02-10 | Gong.Io Ltd. | System and method for generating a chat response on sales deals using a large language model |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20160162804A1 (en) * | 2014-12-09 | 2016-06-09 | Xerox Corporation | Multi-task conditional random field models for sequence labeling |
Family Cites Families (21)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7487094B1 (en) * | 2003-06-20 | 2009-02-03 | Utopy, Inc. | System and method of call classification with context modeling based on composite words |
| US8214242B2 (en) | 2008-04-24 | 2012-07-03 | International Business Machines Corporation | Signaling correspondence between a meeting agenda and a meeting discussion |
| US8756233B2 (en) * | 2010-04-16 | 2014-06-17 | Video Semantics | Semantic segmentation and tagging engine |
| US10095692B2 (en) | 2012-11-29 | 2018-10-09 | Thomson Reuters Global Resources Unlimited Company | Template bootstrapping for domain-adaptable natural language generation |
| US20140214402A1 (en) * | 2013-01-25 | 2014-07-31 | Cisco Technology, Inc. | Implementation of unsupervised topic segmentation in a data communications environment |
| US9892745B2 (en) | 2013-08-23 | 2018-02-13 | At&T Intellectual Property I, L.P. | Augmented multi-tier classifier for multi-modal voice activity detection |
| US10304458B1 (en) | 2014-03-06 | 2019-05-28 | Board of Trustees of the University of Alabama and the University of Alabama in Huntsville | Systems and methods for transcribing videos using speaker identification |
| US9575952B2 (en) * | 2014-10-21 | 2017-02-21 | At&T Intellectual Property I, L.P. | Unsupervised topic modeling for short texts |
| US9875743B2 (en) | 2015-01-26 | 2018-01-23 | Verint Systems Ltd. | Acoustic signature building for a speaker from multiple sessions |
| WO2016176371A1 (en) * | 2015-04-27 | 2016-11-03 | TalkIQ, Inc. | Methods and systems for determining conversation quality |
| US9697833B2 (en) | 2015-08-25 | 2017-07-04 | Nuance Communications, Inc. | Audio-visual speech recognition with scattering operators |
| US10706873B2 (en) | 2015-09-18 | 2020-07-07 | Sri International | Real-time speaker state analytics platform |
| US9699409B1 (en) | 2016-02-17 | 2017-07-04 | Gong I.O Ltd. | Recording web conferences |
| US10515292B2 (en) | 2016-06-15 | 2019-12-24 | Massachusetts Institute Of Technology | Joint acoustic and visual processing |
| EP3509549A4 (en) | 2016-09-06 | 2020-04-01 | Neosensory, Inc. | METHOD AND SYSTEM FOR PROVIDING ADDITIONAL SENSORY INFORMATION TO A USER |
| US10497382B2 (en) | 2016-12-16 | 2019-12-03 | Google Llc | Associating faces with voices for speaker diarization within videos |
| US10642889B2 (en) | 2017-02-20 | 2020-05-05 | Gong I.O Ltd. | Unsupervised automated topic detection, segmentation and labeling of conversations |
| EP3432155A1 (en) | 2017-07-17 | 2019-01-23 | Siemens Aktiengesellschaft | Method and system for automatic discovery of topics and trends over time |
| US11004013B2 (en) * | 2017-12-05 | 2021-05-11 | discourse.ai, Inc. | Training of chatbots from corpus of human-to-human chats |
| CN108920644B (en) * | 2018-06-29 | 2021-10-08 | 北京百度网讯科技有限公司 | Method, apparatus, device and computer-readable medium for judging dialogue coherence |
| US10943070B2 (en) * | 2019-02-01 | 2021-03-09 | International Business Machines Corporation | Interactively building a topic model employing semantic similarity in a spoken dialog system |
-
2019
- 2019-07-24 US US16/520,374 patent/US20210027772A1/en not_active Abandoned
-
2020
- 2020-07-07 EP EP20184576.5A patent/EP3770795A1/en not_active Ceased
-
2023
- 2023-05-02 US US18/310,558 patent/US12183332B2/en active Active
Patent Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20160162804A1 (en) * | 2014-12-09 | 2016-06-09 | Xerox Corporation | Multi-task conditional random field models for sequence labeling |
Cited By (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11232266B1 (en) | 2020-07-27 | 2022-01-25 | Verizon Patent And Licensing Inc. | Systems and methods for generating a summary of a multi-speaker conversation |
| US11272058B2 (en) * | 2020-07-27 | 2022-03-08 | Verizon Patent And Licensing Inc. | Method and apparatus for summarization of dialogs |
| US11637928B2 (en) | 2020-07-27 | 2023-04-25 | Verizon Patent And Licensing Inc. | Method and apparatus for summarization of dialogs |
| WO2022170876A1 (en) * | 2021-02-10 | 2022-08-18 | 华为技术有限公司 | Method, device and system for processing dialogue data, and storage medium |
| US12197484B2 (en) | 2022-03-28 | 2025-01-14 | Gong.Io Ltd. | System and method for generating a multi-label classifier for textual communications |
| US11663824B1 (en) | 2022-07-26 | 2023-05-30 | Seismic Software, Inc. | Document portion identification in a recorded video |
| US12205372B2 (en) | 2022-07-26 | 2025-01-21 | Seismic Software, Inc. | Document portion identification in a recorded video |
| US12386867B2 (en) | 2022-07-27 | 2025-08-12 | Gong.Io Ltd. | System and method for rapid initialization and transfer of topic models by a multi-stage approach |
| US20240386190A1 (en) * | 2023-05-19 | 2024-11-21 | Optum, Inc. | Machine learning divide and conquer techniques for long dialog summarization |
| US12475305B2 (en) * | 2023-05-19 | 2025-11-18 | Optum, Inc. | Machine learning divide and conquer techniques for long dialog summarization |
| US12547822B2 (en) | 2023-05-19 | 2026-02-10 | Optum, Inc. | Machine learning divide and conquer techniques for long dialog summarization |
| US12425517B2 (en) * | 2023-10-04 | 2025-09-23 | Genesys Cloud Services, Inc. | Technologies for leveraging artificial intelligence for post-call actions in contact center systems |
| US12549499B2 (en) | 2024-04-22 | 2026-02-10 | Gong.Io Ltd. | System and method for generating a chat response on sales deals using a large language model |
Also Published As
| Publication number | Publication date |
|---|---|
| US20230267927A1 (en) | 2023-08-24 |
| EP3770795A1 (en) | 2021-01-27 |
| US12183332B2 (en) | 2024-12-31 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US12183332B2 (en) | Unsupervised automated extraction of conversation structure from recorded conversations | |
| US10642889B2 (en) | Unsupervised automated topic detection, segmentation and labeling of conversations | |
| US11004013B2 (en) | Training of chatbots from corpus of human-to-human chats | |
| US8798255B2 (en) | Methods and apparatus for deep interaction analysis | |
| US20100070276A1 (en) | Method and apparatus for interaction or discourse analytics | |
| CN113505606B (en) | Training information acquisition method and device, electronic equipment and storage medium | |
| CA3151051A1 (en) | Method for conversion and classification of data based on context | |
| CN117441165A (en) | Reduce bias in generative language models | |
| US8781880B2 (en) | System, method and apparatus for voice analytics of recorded audio | |
| WO2023235580A1 (en) | Video-based chapter generation for a communication session | |
| US10255346B2 (en) | Tagging relations with N-best | |
| US8762161B2 (en) | Method and apparatus for visualization of interaction categorization | |
| US12437159B2 (en) | Methods and systems for enhanced searching of conversation data and related analytics in a contact center | |
| US12323264B2 (en) | Dynamic communication session topic generation | |
| US12118316B2 (en) | Sentiment scoring for remote communication sessions | |
| US12518751B2 (en) | Extracting engaging questions from a communication session | |
| US20110197206A1 (en) | System, Method And Program Product For Analyses Based On Agent-Customer Interactions And Concurrent System Activity By Agents | |
| US12455914B2 (en) | Dynamic agenda item coverage prediction | |
| US20240428000A1 (en) | Communication Session Sentiment Scoring | |
| US12530535B2 (en) | Intelligent prediction of next step sentences from a communication session | |
| US12112748B2 (en) | Extracting filler words and phrases from a communication session | |
| US12475892B2 (en) | Talking speed analysis per topic segment in a communication session | |
| US20260045257A1 (en) | Talking Speed Analysis in a Communication Session | |
| US20260045255A1 (en) | Playback Of Processed Transcript From A Communication Session | |
| CN119626225A (en) | Conference audio data processing method, device, equipment, medium and program product |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| AS | Assignment |
Owner name: GONG I.O LTD., ISRAEL Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:HOREV, INBAL;RESHEF, EILON;ALLOUCHE, OMRI;AND OTHERS;REEL/FRAME:049841/0565 Effective date: 20190723 |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
| AS | Assignment |
Owner name: GONG.IO LTD., ISRAEL Free format text: CHANGE OF NAME;ASSIGNOR:GONG I.O LTD.;REEL/FRAME:058403/0142 Effective date: 20210303 |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER |
|
| STCV | Information on status: appeal procedure |
Free format text: EXAMINER'S ANSWER TO APPEAL BRIEF MAILED |
|
| STCV | Information on status: appeal procedure |
Free format text: ON APPEAL -- AWAITING DECISION BY THE BOARD OF APPEALS |
|
| STCV | Information on status: appeal procedure |
Free format text: BOARD OF APPEALS DECISION RENDERED |
|
| STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- AFTER EXAMINER'S ANSWER OR BOARD OF APPEALS DECISION |