
GB2345183A - Monitoring speech presentation - Google Patents


Info

Publication number
GB2345183A
GB2345183A (Application GB9828545A)
Authority
GB
United Kingdom
Prior art keywords
speech
user
signals
speaker
received
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB9828545A
Other versions
GB2345183B (en)
GB9828545D0 (en)
Inventor
Robert Alexander Keiller
Richard Anthony Kirk
De Veen Evelyn Van
Gerhardt Paul Otto
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Technology Europe Ltd
Original Assignee
Canon Research Centre Europe Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Research Centre Europe Ltd filed Critical Canon Research Centre Europe Ltd
Priority to GB0320871A priority Critical patent/GB2389220B/en
Priority to GB9828545A priority patent/GB2345183B/en
Publication of GB9828545D0 publication Critical patent/GB9828545D0/en
Publication of GB2345183A publication Critical patent/GB2345183A/en
Application granted granted Critical
Publication of GB2345183B publication Critical patent/GB2345183B/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B19/00 Teaching not covered by other main groups of this subclass
    • G09B19/04 Speaking
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • User Interface Of Digital Computer (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A system for monitoring speech, e.g. during a presentation to an audience, provides a feedback signal to the speaker and/or enables subsequent analysis. An audible, visible or vibrating alarm 11 may respond to characteristics such as talking too fast, leaving gaps or pauses, speaking in a monotone, or repeatedly using certain words or phrases; a data log 15 feeds a printer 17 to provide statistics for study afterwards. In other embodiments, a monitoring system (31) can assess the rate of speaking (and possibly pitch and style) against pre-set parameters and control the speed at which an interacting computer (35) operates, Fig 2 (not shown); and the monitor (51) can match the measured delivery with an autocue display (57) by comparing with a known text file (55), Fig 3 (not shown).

Description

SPEECH MONITORING SYSTEM The present invention relates to an apparatus for and a method of monitoring speech. The invention has particular, although not exclusive, relevance to the monitoring of various characteristics of a user's speech signal in order to provide control signals for controlling the way in which the user gives a presentation to an audience and/or for controlling an interaction between the user and a computer system.
Another aspect of the present invention concerns the monitoring of certain characteristics of a speech signal representative of a known text for controlling an autocue or the like.
An audience's interest in a speech or presentation often depends upon the presentation and communication skills of the speaker. For example, if the speaker speaks too quickly, the audience will not be able to keep up with the information being presented to them and will consequently lose interest in the remainder of the speech. Similarly, speakers who speak in a monotone are liable to send the audience to sleep, even though the content of the speech may be very interesting to them. Likewise, speakers who tend to leave large gaps or pauses within their presentations, or who repeatedly use certain words or phrases, such as "basically" or "to be honest", are likely to annoy or bore the audience.
According to a first aspect, the present invention provides a system for monitoring a user's speech and for providing a feedback signal to the user for controlling the user's presentation. This system can be used, for example, to monitor for predetermined characteristics, such as speaking too fast, speaking in a monotone, leaving large gaps or pauses etc. If a speech recognition unit is used as well, then the system can also monitor for the occurrence of preselected words within the user's speech.
In existing computer-user interactive systems, the rate of interaction is usually the same each time the user interacts with the computer. According to a second aspect, the present invention provides a system for monitoring a user's speech and for varying the interaction between the user and a computer system which the user is using in dependence upon the monitored speech. The system can, for example, try to infer the user's mood from the input speech and vary the interaction in dependence upon the inferred mood. In particular, if the user is talking fast and sounds bright and alert, then the computer system can increase its rate of interaction with the user by, for example, decreasing the response time of the software or by increasing the speaking rate of a speech synthesiser forming part of the computer system.
According to a third aspect, the present invention provides a system for monitoring the progress of a speaker as he/she delivers a known speech. The system can be used for identifying the approximate position within the known speech so as to control the automatic advancement of an autocue system.
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings in which: Figure 1 is a schematic block diagram illustrating a speech monitoring system according to a first aspect of the present invention; Figure 2 is a schematic block diagram illustrating a speech monitoring system according to a second aspect of the present invention; and Figure 3 is a schematic block diagram illustrating a speech monitoring system according to a third aspect of the present invention.
Figure 1 is a schematic block diagram illustrating a speech monitoring system according to a first aspect of the present invention. The purpose of the speech monitoring system shown in Figure 1 is to warn the user, while he/she is making a speech, that they are talking too fast or doing something else that impairs their communication. As shown, the monitoring system comprises a microphone 1, a speech processor and analysis unit 3, a speech recognition unit 5 and associated language and word models 7, a control unit 9, an alarm 11, a display 13, a data log 15 and a printer 17.
In operation, the microphone 1 converts an acoustic speech signal from the user into an equivalent electrical signal which is passed, via connector 21, to the speech processor and analysis unit 3. In this embodiment, the speech processor and analysis unit 3 is operable to convert the input speech signal from the microphone 1 into a sequence of parameter frames, each parameter frame representing a corresponding time frame of the input speech signal. The parameters in each parameter frame typically include cepstral coefficients and power/energy coefficients, which provide important information characteristic of the input speech signal. The sequence of parameter frames generated by the speech processor and analysis unit 3 is supplied, via conductor 23, to the speech recognition unit 5, which is operable to try to identify the presence of predetermined words and/or phrases within the user's speech for which there is a model in the language and word models 7. In this embodiment, the language and word models 7 are generated in advance by the user identifying words and/or phrases which he/she tends to repeat and which may cause annoyance to the audience. In the event that the speech recognition unit 5 identifies one of these words and/or phrases in the input speech, it outputs a corresponding signal to the control unit 9 via conductor 27.
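The patent gives no implementation of the parameter-frame extraction, but the framing step can be sketched as follows; the frame length, frame shift and log-energy feature are illustrative choices, not values from the specification:

```python
import math

def frame_signal(samples, frame_len=256, frame_shift=128):
    """Split a sampled speech signal into overlapping time frames."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, frame_shift):
        frames.append(samples[start:start + frame_len])
    return frames

def log_energy(frame):
    """One per-frame parameter of the kind mentioned above (a power/energy
    coefficient); a small constant avoids log(0) for silent frames."""
    return math.log(sum(s * s for s in frame) + 1e-12)
```

In a full implementation, cepstral coefficients would also be derived from each such frame before the parameter frames are passed on to the recognition unit.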
In this embodiment, the speech processor and analysis unit 3 is also arranged to process the input speech to derive (i) a value indicative of the rate at which the user is speaking; (ii) a value indicative of whether or not the user is speaking in a monotone; and (iii) a value indicative of the gaps or pauses within the input speech, which values are passed to the control unit 9 via conductor 25. The control unit 9 is operable to receive the output from the speech recognition unit 5 via conductor 27 and the above values output by the speech processor and analysis unit 3 via conductor 25, and to generate control signals for controlling the alarm 11 and the display 13. The alarm 11 may be, for example, an audible, visible or vibrating alarm. More specifically, the control unit 9 is operable to monitor the output from the speech recognition unit 5 and the speech processor and analysis unit 3 and to generate, if appropriate, a warning to the user, either by activating the alarm 11 or by displaying appropriate information on the display 13, in order to inform the user that, for example, he is speaking too quickly, speaking in a monotone, is leaving large gaps in the speech or is repeatedly saying one of the words and/or phrases which the speech recognition unit is designed to identify. In this embodiment, the speech monitoring system operates in real time, so that the speaker is given instantaneous feedback and can modify their presentation accordingly.
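As a rough illustration of the control logic described above, the following sketch maps the three derived values to warnings; the threshold values, function names and the pause-detection helper are assumptions for demonstration, not taken from the patent:

```python
def pause_fraction(frame_energies, threshold=0.1):
    """Fraction of low-energy frames, a simple proxy for gaps and pauses."""
    if not frame_energies:
        return 0.0
    return sum(1 for e in frame_energies if e < threshold) / len(frame_energies)

def check_speech(words_per_minute, pitch_range_hz, pauses):
    """Map measured values to the warnings the control unit (9) might raise."""
    warnings = []
    if words_per_minute > 180:   # illustrative "too fast" threshold
        warnings.append("too fast")
    if pitch_range_hz < 20:      # little pitch variation suggests a monotone
        warnings.append("monotone")
    if pauses > 0.4:             # too large a fraction of silent frames
        warnings.append("too many pauses")
    return warnings
```

The returned list could drive the alarm 11 directly (any non-empty result triggers it) or be rendered as text on the display 13.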
In addition to being used in real time during a presentation, the speech monitoring system shown in Figure 1 and described above can also be used to monitor the entire speech given by a speaker and to provide an analysis of the speech after it has ended. The analysis might include the number of repetitions of selected words and/or phrases, the average speaking rate, the variation of the speaking rate throughout the speech, the number of gaps or pauses within the speech, etc. This analysis is generated by the control unit and logged in the data log 15, which can be printed out to the printer 17 or displayed on the display 13, so that the speaker is given appropriate feedback and can improve their presentation skills.
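A post-speech summary of the kind the data log 15 might hold can be sketched as follows; the event kinds and statistic names are hypothetical, chosen only to mirror the analysis items listed above:

```python
def summarise(events):
    """Summarise a list of (kind, value) tuples logged during the speech.

    Hypothetical kinds: "rate" (words per minute samples), "pause"
    (pause durations), "repeated_phrase" (detected repetitions).
    """
    rates = [v for k, v in events if k == "rate"]
    return {
        "repetitions": sum(1 for k, _ in events if k == "repeated_phrase"),
        "pauses": sum(1 for k, _ in events if k == "pause"),
        "average_rate": sum(rates) / len(rates) if rates else 0.0,
    }
```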
The speech monitoring system illustrated in Figure 1 might be built into a separate portable computer device, or it might be implemented in computer software on a personal computer which may also be assisting the user in their presentation by, for example, generating slides for display on an overhead projector (not shown).
Figure 2 is a schematic block diagram of a speech monitoring system according to a second aspect of the present invention. The purpose of the speech monitoring system shown in Figure 2 is to monitor a user's speech and to vary an interaction between the user and a computer system in dependence upon the mood of the user which is inferred from the user's speech. As shown, the speech monitoring system comprises a microphone 1, a speech monitoring unit 31, a control unit 33 and a computer application 35, all of which, in this embodiment, are operated in a common computer system 37.
In operation, the microphone 1 converts an acoustic speech signal from the user 39 into an equivalent electrical signal which is passed, via connector 41, to the speech monitoring unit 31. In this embodiment, the speech monitoring unit identifies the rate at which the user 39 is speaking (and optionally the user's pitch and style of speaking) and passes this information to the control unit 33 via the conductor 43. In this embodiment, the rate at which the user is speaking is identified by monitoring the beginning of known words in the input speech and their duration. The rate at which the user is speaking is then determined by comparing the durations with prestored durations loaded in memory.
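The duration-comparison approach described above might be sketched as follows; the reference-duration table and the ratio convention (greater than 1 meaning faster than the stored reference) are illustrative assumptions, not details from the patent:

```python
# Hypothetical prestored reference durations, in milliseconds, for known words.
REFERENCE_MS = {"hello": 450, "computer": 700, "please": 400}

def speaking_rate(measured):
    """Estimate a rate ratio from (word, measured_duration_ms) pairs.

    Averages reference/measured per known word; > 1.0 means the user is
    speaking faster than the stored reference durations.
    """
    ratios = [REFERENCE_MS[w] / ms for w, ms in measured if w in REFERENCE_MS]
    return sum(ratios) / len(ratios) if ratios else 1.0
```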
From this information, the control unit 33 infers the speaker's mood and outputs a control signal on conductor 45 for controlling the computer application 35 accordingly. In particular, the control signal output by the control unit 33 changes the way in which the computer application 35 interacts (as represented by the double-headed arrow 47) with the user 39. For example, if the user 39 is talking quickly, the control unit 33 infers that the user is bright and alert, and accordingly causes the computer application 35 to react more quickly to the user 39. For example, where the computer application 35 includes a speech synthesiser, in response to the control unit 33 inferring that the user is bright and alert, the computer application 35 increases the speed of speech synthesis. Alternatively, the computer application 35 might vary the time resolution of the double-click detection of a mouse button (not shown).
Conversely, if the speaker is speaking slowly, then the control unit 33 infers that the user is not alert and therefore causes the computer application to react more slowly to the user 39.
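The rate-dependent control described in the last two paragraphs can be illustrated with the following sketch; the thresholds and parameter names (response delay, synthesis rate) are assumptions chosen to mirror the examples given, not values from the patent:

```python
def interaction_settings(rate_ratio):
    """Map an inferred speaking-rate ratio to interaction parameters.

    A ratio > 1 means the user is speaking faster than their reference
    rate (inferred alert); < 1 means slower (inferred not alert).
    """
    if rate_ratio > 1.2:   # fast speech: react quickly, speak faster
        return {"response_delay_ms": 100, "synthesis_rate": 1.25}
    if rate_ratio < 0.8:   # slow speech: react slowly, speak slower
        return {"response_delay_ms": 400, "synthesis_rate": 0.8}
    return {"response_delay_ms": 250, "synthesis_rate": 1.0}
```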
Figure 3 is a schematic block diagram of a speech monitoring system according to a third aspect of the present invention. The purpose of the speech monitoring system shown in Figure 3 is to track the position of a speaker as he/she delivers a known speech. As shown, the speech monitoring system comprises a microphone 1, a speech monitoring unit 51, a control unit 53, a text file 55 and a display 57, all of which, in this embodiment, form part of a single computer system 59.
In operation, the microphone 1 converts an acoustic speech signal of the user 39 into an equivalent electrical signal which is supplied, via conductor 61, to the speech monitoring unit 51. In this embodiment, the speech monitoring unit 51 identifies the words and/or syllables in the input speech signal and outputs a signal to the control unit via conductor 63 whenever a word and/or syllable is identified. In response, the control unit 53 counts the words and/or syllables and outputs a control signal to the text file 55 for identifying a part of the text file corresponding to the speech which the user 39 is about to speak. The identified part of the text file is then passed to the display 57, so that the user 39 can read the next part of the speech from the display 57.
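The word-counting autocue advance described above can be sketched as a simple position counter over the known text; the class interface and window size are illustrative assumptions, not taken from the patent:

```python
class Autocue:
    """Track position in a known speech by counting spoken words."""

    def __init__(self, text, window=5):
        self.words = text.split()
        self.window = window      # how many upcoming words to show
        self.position = 0

    def word_heard(self):
        """Called whenever the monitoring unit reports a spoken word."""
        self.position += 1

    def next_part(self):
        """The portion of the known speech the speaker is about to say."""
        return " ".join(self.words[self.position:self.position + self.window])
```

A counting approach like this keeps its place even if individual words are misheard, since only the running total matters.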
In addition to being used in an autocue system, the speech monitoring system illustrated in Figure 3 can be used in similar applications, such as in a play or in an opera, where the input speech will come from the stage whilst the displayed text will be shown to a stage manager who can provide oral prompts whenever necessary. The advantage of such an automatic autocue system is that even if the operator/stage manager is temporarily distracted, the autocue will not lose its place within the known text.
In an alternative embodiment, the input speech could be passed to a speech recognition unit which is operable to compare the input speech signal with the text file in order to identify the next part of the text file to be displayed on the display 57.
As those skilled in the art will appreciate, the above embodiments could be combined to provide, for example, a system which can be used to promote the presentational skills of a speaker and which interacts with the speaker in dependence upon the mood of the speaker which is inferred from the speaker's speech. The system may also track the speaker's progress through his speech so as to control an automatic autocue.
The present invention is not limited to the exemplary embodiments described above, and various other modifications and embodiments will be apparent to those skilled in the art.

Claims (24)

  1. A speech monitoring system for use in promoting the presentational skills of a user, the apparatus comprising: means for receiving speech signals from the user; means for processing the received speech signals and for generating signals indicative of the occurrence of one or more predetermined events within the received speech signals; and control means responsive to said generated signals for outputting a feedback signal to said user for warning the user of the occurrence of the predetermined events within the received speech signals.
  2. A speech monitoring system according to claim 1, further comprising a speech recognition unit operable for receiving the processed speech and for identifying one or more predetermined words and/or phrases within the received speech signals by comparing the processed speech with stored reference models, and wherein said control means is operable to generate said feedback signal in dependence upon the recognition result.
  3. A speech monitor according to claim 1 or 2, wherein said feedback signal is supplied to said user via at least one of an audible, visible or vibrating alarm.
  4. A speech monitoring system according to any of claims 1 to 3, which is operable to provide said feedback signal in real time, so that said user can try to reduce the occurrence of said predetermined event or events within subsequent speech signals output by the user.
  5. A speech monitor according to any preceding claim, further comprising means for generating a data log indicative of the occurrences of the predetermined event or events which are identified within the user's speech.
  6. A speech monitor according to any preceding claim, wherein said predetermined event or events comprise at least one of speaking too fast, leaving too many gaps or pauses within the speech, speaking in a monotone or the like.
  7. A computer system comprising: a computer application operable for interacting with a user; means for receiving speech signals from the user; means for processing the received speech signals and for deriving from the processed speech signals an indication of the mood of the user; and control means for controlling the interaction between the computer application and the user in dependence upon the derived indication of the mood of the user.
  8. A system according to claim 7, wherein said received speech signals are processed in order to extract at least one of the speed at which the user is speaking, the user's pitch and the style of speech employed by the user.
  9. A system according to claim 7 or 8, wherein said computer application comprises a speech synthesiser, and wherein said computer application is arranged to vary the speaking rate of the speech synthesiser in dependence upon the indication of the user's mood.
  10. A system according to any of claims 7 to 9, wherein said control means is operable to vary the response rate of the computer application in dependence upon the indication of the user's mood.
  11. A speech processing system comprising: means for storing signals representative of a known speech to be spoken by a speaker; means for receiving speech signals from the speaker as the speaker delivers the known speech; and means for determining from said stored signals and said received signals the position of the speaker within the known speech.
  12. An autocue for prompting a user with the next part of a known speech, comprising: means for storing signals representative of the known speech to be spoken by a speaker; means for receiving speech signals from the speaker as the speaker delivers the known speech; means for determining from said stored signals and said received signals the position of the speaker within the known speech; and means for informing the user of the next part of the known speech to be spoken by said speaker.
  13. A system according to claim 12, wherein said user and said speaker are different people.
  14. A system according to claim 12 or 13, wherein said next part of the known speech is displayed to said user on a display.
  15. A system according to any of claims 11 to 14, wherein said determining means comprises a counter for counting words and/or syllables within the received speech signals.
  16. A system according to any of claims 11 to 14, wherein said determining means comprises speech recognition means for comparing the received speech signals with the known speech signals.
  17. A computer system for use by a user during the presentation of a speech, the computer system comprising: a speech monitoring system according to any of claims 1 to 6 for promoting the presentational skills of the user; and a computer system according to any of claims 7 to 10 for varying an interaction between the computer system and the user.
  18. A computer system according to claim 17, further comprising a speech processing system according to claim 11 or an autocue according to claim 12.
  19. A method of promoting the presentational skills of a user, comprising the steps of: receiving speech signals from the user; processing the received speech signals and generating signals indicative of the occurrence of one or more predetermined events within the received speech; and in response to the generated signals, outputting a feedback signal to the user for warning the user of the occurrence of the predetermined events within the received speech signals.
  20. A method of varying the interaction between a computer application and a user, the method comprising the steps of: receiving speech signals from the user; processing the received speech signals and deriving from the processed speech signals an indication of the mood of the user; and controlling the interaction between the computer application and the user in dependence upon the derived mood of the user.
  21. A speech processing method comprising the steps of: storing signals representative of a known speech to be spoken by a speaker; receiving speech signals from the speaker as the speaker delivers the known speech; and determining from the stored signals and the received signals the position of the speaker within the known speech.
  22. A method of operating an autocue for prompting a user with the next part of a known speech, the method comprising the steps of: storing signals representative of the known speech to be spoken by a speaker; receiving speech signals from the speaker as the speaker delivers the known speech; determining from the stored signals and the received signals the position of the speaker within the known speech; and informing the user of the next part of the known speech to be spoken by the speaker.
  23. A data carrier programmed with instructions for carrying out the method according to any of claims 19 to 22 or for implementing the apparatus of any of claims 1 to 18.
  24. A speech monitoring system or method substantially as hereinbefore described with reference to or as shown in any of Figures 1 to 3.
GB9828545A 1998-12-23 1998-12-23 Speech monitoring system Expired - Fee Related GB2345183B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
GB0320871A GB2389220B (en) 1998-12-23 1998-12-23 Speech monitoring system
GB9828545A GB2345183B (en) 1998-12-23 1998-12-23 Speech monitoring system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB9828545A GB2345183B (en) 1998-12-23 1998-12-23 Speech monitoring system

Publications (3)

Publication Number Publication Date
GB9828545D0 GB9828545D0 (en) 1999-02-17
GB2345183A true GB2345183A (en) 2000-06-28
GB2345183B GB2345183B (en) 2003-11-05

Family

ID=10844974

Family Applications (2)

Application Number Title Priority Date Filing Date
GB0320871A Expired - Fee Related GB2389220B (en) 1998-12-23 1998-12-23 Speech monitoring system
GB9828545A Expired - Fee Related GB2345183B (en) 1998-12-23 1998-12-23 Speech monitoring system

Family Applications Before (1)

Application Number Title Priority Date Filing Date
GB0320871A Expired - Fee Related GB2389220B (en) 1998-12-23 1998-12-23 Speech monitoring system

Country Status (1)

Country Link
GB (2) GB2389220B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1089246A2 (en) * 1999-10-01 2001-04-04 Siemens Aktiengesellschaft Method and apparatus for speech impediment therapy
WO2003052742A1 (en) * 2001-12-18 2003-06-26 Intel Corporation Voice-bearing light
FR2847706A1 (en) * 2002-11-27 2004-05-28 Vocebella Sa Voice transformation/speech recognition system having modules transforming input/providing representative characteristic and module processing set providing quality level selected signal
US7894849B2 (en) 2006-07-10 2011-02-22 Accenture Global Services Limited Mobile personal services platform for providing feedback
US20150269929A1 (en) * 2014-03-21 2015-09-24 International Business Machines Corporation Dynamically providing to a person feedback pertaining to utterances spoken or sung by the person
CN109639935A (en) * 2019-01-25 2019-04-16 合肥学院 The automatic word extractor system and method for video record
GB2597975A (en) * 2020-08-13 2022-02-16 Vitec Group Plc Voice controlled studio apparatus

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
DE102020102468B3 (en) 2020-01-31 2021-08-05 Robidia GmbH Method for controlling a display device and display device for dynamic display of a predefined text

Citations (7)

Publication number Priority date Publication date Assignee Title
GB2049190A (en) * 1979-04-27 1980-12-17 Friedman E Voice fluency monitor
GB2102171A (en) * 1981-06-24 1983-01-26 John Graham Parkhouse Speech aiding apparatus and method
EP0360909A1 (en) * 1988-09-30 1990-04-04 Siemens Audiologische Technik GmbH Speech practising apparatus
US5015179A (en) * 1986-07-29 1991-05-14 Resnick Joseph A Speech monitor
WO1995012189A1 (en) * 1993-10-27 1995-05-04 Freeman, Jacqueline, Carol Language analysis instrument
JPH07334075A (en) * 1994-06-03 1995-12-22 Hitachi Ltd Presentation support device
GB2317341A (en) * 1995-06-30 1998-03-25 Ron Samuel Speech-based biofeedback method and system

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
GB903888A (en) * 1960-03-22 1962-08-22 Autocue Great Britain Ltd Improvements in or relating to prompting or cueing equipment
GB1578054A (en) * 1978-05-26 1980-10-29 Autocue Holdings Ltd Transmitter unit of a display system
EP0692128B1 (en) * 1993-04-02 1997-06-18 Pinewood Associates Limited Information display apparatus
US6272461B1 (en) * 1999-03-22 2001-08-07 Siemens Information And Communication Networks, Inc. Method and apparatus for an enhanced presentation aid

Patent Citations (7)

Publication number Priority date Publication date Assignee Title
GB2049190A (en) * 1979-04-27 1980-12-17 Friedman E Voice fluency monitor
GB2102171A (en) * 1981-06-24 1983-01-26 John Graham Parkhouse Speech aiding apparatus and method
US5015179A (en) * 1986-07-29 1991-05-14 Resnick Joseph A Speech monitor
EP0360909A1 (en) * 1988-09-30 1990-04-04 Siemens Audiologische Technik GmbH Speech practising apparatus
WO1995012189A1 (en) * 1993-10-27 1995-05-04 Freeman, Jacqueline, Carol Language analysis instrument
JPH07334075A (en) * 1994-06-03 1995-12-22 Hitachi Ltd Presentation support device
GB2317341A (en) * 1995-06-30 1998-03-25 Ron Samuel Speech-based biofeedback method and system

Cited By (19)

Publication number Priority date Publication date Assignee Title
EP1089246A2 (en) * 1999-10-01 2001-04-04 Siemens Aktiengesellschaft Method and apparatus for speech impediment therapy
EP1089246A3 (en) * 1999-10-01 2002-07-17 Siemens Aktiengesellschaft Method and apparatus for speech impediment therapy
WO2003052742A1 (en) * 2001-12-18 2003-06-26 Intel Corporation Voice-bearing light
GB2399979A (en) * 2001-12-18 2004-09-29 Intel Corp Voice-bearing light
GB2399979B (en) * 2001-12-18 2005-10-26 Intel Corp Apparatus and system for guiding a speaker to the sensitivity region of a microphone
FR2847706A1 (en) * 2002-11-27 2004-05-28 Vocebella Sa Voice transformation/speech recognition system having modules transforming input/providing representative characteristic and module processing set providing quality level selected signal
WO2004049303A1 (en) * 2002-11-27 2004-06-10 Vocebella Sa Analysis of the vocal signal quality according to quality criteria
EP2037798B1 (en) * 2006-07-10 2012-10-31 Accenture Global Services Limited Mobile personal services platform for providing feedback
US7894849B2 (en) 2006-07-10 2011-02-22 Accenture Global Services Limited Mobile personal services platform for providing feedback
US8442578B2 (en) 2006-07-10 2013-05-14 Accenture Global Services Limited Mobile personal services platform for providing feedback
US20150269929A1 (en) * 2014-03-21 2015-09-24 International Business Machines Corporation Dynamically providing to a person feedback pertaining to utterances spoken or sung by the person
US9344821B2 (en) * 2014-03-21 2016-05-17 International Business Machines Corporation Dynamically providing to a person feedback pertaining to utterances spoken or sung by the person
US9779761B2 (en) 2014-03-21 2017-10-03 International Business Machines Corporation Dynamically providing to a person feedback pertaining to utterances spoken or sung by the person
US10395671B2 (en) 2014-03-21 2019-08-27 International Business Machines Corporation Dynamically providing to a person feedback pertaining to utterances spoken or sung by the person
US11189301B2 (en) 2014-03-21 2021-11-30 International Business Machines Corporation Dynamically providing to a person feedback pertaining to utterances spoken or sung by the person
CN109639935A (en) * 2019-01-25 2019-04-16 合肥学院 The automatic word extractor system and method for video record
GB2597975A (en) * 2020-08-13 2022-02-16 Vitec Group Plc Voice controlled studio apparatus
WO2022034335A1 (en) 2020-08-13 2022-02-17 The Vitec Group Plc Voice controlled studio apparatus
GB2597975B (en) * 2020-08-13 2023-04-26 Videndum Plc Voice controlled studio apparatus

Also Published As

Publication number Publication date
GB2389220B (en) 2004-02-25
GB0320871D0 (en) 2003-10-08
GB2345183B (en) 2003-11-05
GB2389220A (en) 2003-12-03
GB9828545D0 (en) 1999-02-17

Similar Documents

Publication Publication Date Title
US6358054B1 (en) Method and apparatus for teaching prosodic features of speech
JP4972645B2 (en) System and method for synchronizing sound and manually transcribed text
US20020086269A1 (en) Spoken language teaching system based on language unit segmentation
US7050978B2 (en) System and method of providing evaluation feedback to a speaker while giving a real-time oral presentation
US7818179B2 (en) Devices and methods providing automated assistance for verbal communication
US20150161908A1 (en) Method and apparatus for providing sensory information related to music
JP2002503353A (en) Reading aloud and pronunciation guidance device
Astolfi et al. Duration of voicing and silence periods of continuous speech in different acoustic environments
US20110144993A1 (en) Disfluent-utterance tracking system and method
JP2003186379A (en) Program for voice visualization processing, program for voice visualization figure display and for voice and motion image reproduction processing, program for training result display, voice-speech training apparatus and computer system
JP2003177784A (en) Method and device for extracting sound turning point, method and device for sound reproducing, sound reproducing system, sound delivery system, information providing device, sound signal editing device, recording medium for sound turning point extraction method program, recording medium for sound reproducing method program, recording medium for sound signal editing method program, sound turning point extraction method program, sound reproducing method program, and sound signal editing method program
GB2345183A (en) Monitoring speech presentation
JPH08286693A (en) Information processing device
KR101877559B1 (en) Method for allowing user self-studying language by using mobile terminal, mobile terminal for executing the said method and record medium for storing application executing the said method
Möller et al. Evaluating the speech output component of a smart-home system
JP3588596B2 (en) Karaoke device with singing special training function
US10896689B2 (en) Voice tonal control system to change perceived cognitive state
JP2844817B2 (en) Speech synthesis method for utterance practice
WO2024079605A1 (en) Assisting a speaker during training or actual performance of a speech
CN111182409B (en) Screen control method based on intelligent sound box, intelligent sound box and storage medium
CN113257246B (en) Prompting method, device, equipment, system and storage medium
JP5092311B2 (en) Voice evaluation device
Nelson et al. Effects of facial paralysis and audiovisual information on stop place identification
WO2002047067A2 (en) Improved speech transformation system and apparatus
KR20020087709A (en) A language training system

Legal Events

Date Code Title Description
PCNP Patent ceased through non-payment of renewal fee

Effective date: 20061223