EP1344211A1

EP1344211A1 - Device and method for differentiated speech output

Info

Publication number: EP1344211A1
Application number: EP01991746A
Authority: EP
Inventors: Georg Obert; Klaus Bengler
Original assignee: Bayerische Motoren Werke AG
Current assignee: Bayerische Motoren Werke AG
Priority date: 2000-12-20
Filing date: 2001-11-21
Publication date: 2003-09-17
Anticipated expiration: 2021-11-21
Also published as: EP1344211B1; DE50115798D1; US20030225575A1; DE10063503A1; US7698139B2; WO2002050815A1; ES2357700T3; JP2004516515A

Abstract

The invention relates to a device and to a method for differentiated speech output. The systems available in a motor vehicle, such as an on-board computer, a navigation system and others, can be linked to a speech output device. The speech output of the different systems can be differentiated by voice characteristics.

Description

Device and method for differentiated speech output

The present invention relates to a device for differentiated speech output or speech generation and an associated method, systems for use with the speech output device and combinations of a speech output device with at least two systems, in particular for use in a vehicle.

Vehicles use individual systems that each have an acoustic human-machine interface for speech output. A speech output module is assigned directly to each of these systems. The speech generation methods used are mostly based on pulse code modulation (PCM), which may be followed by compression (e.g. MPEG). Other systems use speech synthesis methods that form words and sentences mainly by assembling syllable segments (phonemes) through signal manipulation.

These speech output methods are also speaker-dependent: the same human speaker must be used for the recordings whenever the vocabulary or text inventory is expanded. Furthermore, PCM methods, like high-quality phoneme synthesis through signal manipulation, require considerable storage space for the texts or syllable segments. With both methods, the storage requirement increases considerably if several national languages are to be output.

Methods are also known which are based on a full synthesis of speech. In particular, methods are known which model the human vocal tract as an electrical equivalent and work with a tone generator and several downstream filters (source-filter model). A device that works according to this principle is a so-called formant synthesizer (e.g. KLATTALK). Such a formant synthesizer has the advantage that the voice characteristics can be influenced.

The object of the invention is to provide a device and an associated method with which a differentiated speech output is possible, as well as systems for use with the speech output device and combinations of a speech output device with at least two systems, in particular for use in vehicles.

This object is achieved with the features of the claims.

The invention has the advantage that speech outputs for different systems are possible with a single speech output device or speech synthesis device, each system being identifiable by voice characteristic differences.

According to a preferred embodiment of the invention, a parameter set is assigned to each system and is used by the speech synthesis device for a speech output by that system. For example, a first parameter set can be provided for an on-board computer, a second parameter set for a navigation system, a third parameter set for traffic information, a fourth parameter set for a TTS system (text-to-speech system), such as e-mail, and one or more further parameter sets for additional systems.

Depending on the assigned parameter set, the speech synthesis device generates the speech output, for example with a soft female voice, e.g. for the speech output of a navigation system, or with a hard male bass voice, e.g. for the speech output of traffic reports.
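The assignment of one parameter set per system can be pictured as a simple lookup table. The following is a minimal sketch under stated assumptions, not the patented implementation: the class name ParameterSet, the system identifiers and all numeric values are hypothetical, since the description names only the voice types and gives no figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParameterSet:
    """Static voice parameters for one system (all values are illustrative)."""
    voice: str
    f0_hz: float          # generator fundamental frequency
    formants_hz: tuple    # fixed formant frequencies


# Hypothetical assignments: the description names voice types, not numbers.
PARAMETER_SETS = {
    "on_board_computer": ParameterSet("neutral", 150.0, (600.0, 1400.0, 2600.0)),
    "navigation": ParameterSet("soft female", 210.0, (650.0, 1500.0, 2800.0)),
    "traffic_information": ParameterSet("hard male bass", 95.0, (500.0, 1200.0, 2400.0)),
    "email_tts": ParameterSet("reader", 130.0, (550.0, 1300.0, 2500.0)),
}


def parameter_set_for(system_id: str) -> ParameterSet:
    """Return the parameter set assigned to a system, failing loudly if none exists."""
    try:
        return PARAMETER_SETS[system_id]
    except KeyError:
        raise KeyError(f"no parameter set assigned to system {system_id!r}") from None
```

A single synthesizer would call parameter_set_for with the identifier of the requesting system before each output, so that adding a further system only means adding one more table entry.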

According to a preferred embodiment of the invention, a method and a device for a full synthesis of speech are used, preferably a formant synthesizer. The control parameters for the synthesizer are divided into classes. A first class of dynamic parameters controls the articulation, i.e. the movement of the vocal tract when speaking. A second class of static parameters controls speaker-characteristic features, such as the generator fundamental frequency and fixed formants, which in a child, a woman or a man result from the different geometric dimensions of the vocal tract.
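The split into static and dynamic parameters can be illustrated with a very small source-filter sketch. It is an illustration only and makes several assumptions that are not in the description: the sample rate, the names StaticParams and DynamicFrame, the use of a plain impulse train as tone generator, and the use of a voicing gain as the only dynamic parameter. The filter is the standard two-pole digital resonator commonly used in cascade formant synthesizers; nothing here is claimed to reproduce KLATTALK.

```python
import math
from dataclasses import dataclass

SAMPLE_RATE = 16000  # Hz; assumed for this sketch


@dataclass(frozen=True)
class StaticParams:
    """Speaker-characteristic parameters (second class in the text)."""
    f0_hz: float      # generator fundamental frequency
    formants: tuple   # ((centre_hz, bandwidth_hz), ...), fixed per voice


@dataclass(frozen=True)
class DynamicFrame:
    """Articulation parameters for one time slice (hypothetical, reduced to a gain)."""
    duration_s: float
    voicing_gain: float


def resonator_coeffs(freq_hz, bw_hz, fs=SAMPLE_RATE):
    """Coefficients of y[n] = a*x[n] + b*y[n-1] + c*y[n-2] with unity gain at DC."""
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    return a, b, c


def excitation(static, frames, fs=SAMPLE_RATE):
    """Tone generator: impulse train at f0, scaled by each frame's voicing gain."""
    period = round(fs / static.f0_hz)
    source = []
    for frame in frames:
        for _ in range(round(frame.duration_s * fs)):
            pulse = len(source) % period == 0
            source.append(frame.voicing_gain if pulse else 0.0)
    return source


def resonate(signal, freq_hz, bw_hz):
    """Run the signal through one formant resonator."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize(static, frames):
    """Source followed by the downstream formant filters in cascade."""
    signal = excitation(static, frames)
    for freq_hz, bw_hz in static.formants:
        signal = resonate(signal, freq_hz, bw_hz)
    return signal
```

Calling synthesize with the same list of frames but two different StaticParams objects yields the same articulation in two different voices, which is the property the parameter classes are meant to provide.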

With an extended model of the formant synthesizer, a separate generation of voiced and unvoiced sounds is possible. Additional resonators or attenuators can be switched on by further parameters or the dynamic parameters for the articulation can be influenced.

The device according to the invention and the method according to the invention can be used in particular in systems of a vehicle. Each system has two ways of controlling the speech output. The first is a first output that sends control commands for the voice articulation, the sequence of control parameters for words, sentences and sentence sequences being stored in the system. The second is a second output that switches over the parameter set that determines the speaker characteristic.

As an alternative or in addition, the parameter set can be stored directly in the system and loaded into the speech synthesis device when a speech output is required.
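The two ways of setting the voice, by a selection signal that switches to a stored parameter set or by loading a set held in the calling system, can be sketched as two methods on one speech output unit. This is a hedged illustration: the class SpeechOutputUnit, its method names and the dictionary representation of a parameter set are assumptions, and speak only returns a record of what would be synthesized instead of producing audio.

```python
class SpeechOutputUnit:
    """Toy model of a speech output unit with a memory of parameter sets."""

    def __init__(self, stored_sets):
        self._stored_sets = dict(stored_sets)  # parameter sets held in the unit
        self._active = None

    def select(self, set_id):
        """Selection signal from a system: switch to a set stored in the unit."""
        self._active = dict(self._stored_sets[set_id])

    def load(self, parameter_set):
        """Alternative: the system transmits its own parameter set to the unit."""
        self._active = dict(parameter_set)

    def speak(self, articulation_sequence):
        """Accept the dynamic control sequence and pair it with the active voice."""
        if self._active is None:
            raise RuntimeError("no parameter set selected or loaded")
        return {"voice": self._active, "articulation": list(articulation_sequence)}


# Usage with hypothetical identifiers and values.
unit = SpeechOutputUnit({"navigation": {"f0_hz": 210.0}})
unit.select("navigation")
by_selection = unit.speak(["turn", "left"])

unit.load({"f0_hz": 95.0})
by_loading = unit.speak(["traffic", "jam"])
```

Keeping the sets in the unit means a system only has to send a short selection signal, while loading from the system lets a newly added system bring its own voice without changes to the unit's memory.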

According to a further preferred embodiment, which can be used as an alternative or in addition to the above embodiments, the generator and formant parameters are also changed dynamically in order to differentiate the sources of information, i.e. the systems that perform a speech output. This makes it possible to achieve audible differences in the prosody, such as the duration and/or emphasis of syllable segments and/or the sentence melody. In particular, prosodic modulation depending on, for example, the traffic conditions or a traffic situation can be used for the speech output of announcement texts. Finally, the urgency of a piece of information can be expressed by modulating the voice.
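One way to picture prosodic modulation is a function that maps an urgency value onto pitch, syllable duration and emphasis. The sketch below is purely illustrative: the description does not specify any mapping, so the Prosody fields, the urgency scale from 0 to 1 and the scaling factors are all assumptions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Prosody:
    f0_hz: float                # generator fundamental frequency
    syllable_duration_s: float  # nominal duration of a syllable segment
    emphasis_gain: float        # relative emphasis on stressed segments


def modulate(base: Prosody, urgency: float) -> Prosody:
    """Raise pitch, shorten syllables and strengthen emphasis as urgency grows.

    urgency is a hypothetical scale from 0.0 (routine) to 1.0 (critical);
    the factors 0.2, 0.25 and 0.5 are arbitrary illustration values.
    """
    if not 0.0 <= urgency <= 1.0:
        raise ValueError("urgency must lie between 0.0 and 1.0")
    return replace(
        base,
        f0_hz=base.f0_hz * (1.0 + 0.2 * urgency),
        syllable_duration_s=base.syllable_duration_s * (1.0 - 0.25 * urgency),
        emphasis_gain=base.emphasis_gain * (1.0 + 0.5 * urgency),
    )
```

With urgency 0.0 the base prosody is returned unchanged, so routine announcements keep the voice assigned to their system while critical ones become audibly more pressing.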

The invention has the advantage that, for example in a vehicle, only a single voice generator with a small parameter memory is needed, which can be controlled by multiple information sources. The information sources can be given different voice characteristics.

When a full synthesis device is used, e.g. a vocal tract synthesis device, the method is speaker-independent and no high-quality studio recordings are required.

With an extended formant synthesizer, emotional expression can also be conveyed in the voice according to the invention.

The voice characteristics can be changed very easily using pre-made parameter templates. The method is also suitable for converting free texts into speech (text to speech), e.g. for reading e-mail aloud.

The invention is explained below with reference to an embodiment and the drawing.

FIG. 1 shows a basic illustration of a preferred embodiment of the invention for differentiated speech output with a plurality of systems according to the invention.

The preferred embodiment of the invention shown in FIG. 1 has a speech output unit 1 with a speech synthesis device 10, which in the example is a vocal tract synthesis module and is based on full speech synthesis. For example, a formant synthesizer such as KLATTALK can be used. The speech synthesis device 10 is connected to an amplifier 12, the output 14 of which supplies an audio signal that is output as speech via a loudspeaker (not shown). The speech synthesis device 10 is assigned N parameter sets 21, 22 to 2N, which in the example shown are stored in a memory 20 of the speech output unit 1. Furthermore, N systems 31, 32 to 3N are shown, each of which is connected to the speech output unit 1 via a data connection, such as individual lines, a bus system or data channels. Each system can carry out a speech output via the speech output unit. Specifically, the following are shown: an on-board computer 31 with an associated parameter set 21 for the on-board computer, a navigation system 32 with an associated parameter set 22 for navigation, a traffic information system 33 with an associated parameter set 23 for traffic information, and an e-mail system as TTS system 34 with an associated parameter set 24 for e-mail. Additional systems 3N, each with an assigned parameter set 2N, can be provided. In the example shown, a single speech output unit 1 makes it possible to have the navigation system 32 speak, for example, with a soft female voice, which is determined by the parameter set 22 for the navigation system. A parameter set 23 can also be provided for traffic reports, for example, with which a hard male bass voice is used in the speech output.

Speech outputs can be made one after another in the order in which the systems' requests for speech output are received. Information with a higher priority, e.g. traffic information about dangerous situations such as wrong-way drivers, is output first. Information with the highest priority, e.g. information from the on-board computer about vehicle malfunctions or the onset of slippery road conditions, is output immediately, and an ongoing speech output can be interrupted for this purpose. The interrupted speech output can then be completed or repeated.
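The ordering rules described here (first come, first served; higher priority first; highest priority interrupts) can be sketched with a priority queue. This is a minimal sketch under assumptions: the three priority levels, the class OutputScheduler and its method names are hypothetical, and the sketch chooses to repeat an interrupted output, although the description equally allows completing it.

```python
import heapq
import itertools

NORMAL, HIGH, HIGHEST = 0, 1, 2  # hypothetical priority levels


class OutputScheduler:
    """Orders speech output requests by priority, then by order of receipt."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # order of receipt
        self.current = None                # (priority, seq, system, text) being spoken

    def _push(self, priority, seq, system, text):
        # heapq is a min-heap, so the priority is negated to pop the highest first.
        heapq.heappush(self._heap, (-priority, seq, system, text))

    def submit(self, system, text, priority=NORMAL):
        """Queue a request; return True if it interrupted the ongoing output."""
        seq = next(self._counter)
        ongoing = self.current
        if priority == HIGHEST and ongoing is not None and ongoing[0] < HIGHEST:
            self._push(*ongoing)  # requeue the interrupted output to repeat it later
            self.current = (priority, seq, system, text)
            return True
        self._push(priority, seq, system, text)
        return False

    def start_next(self):
        """Begin the next pending output; return (system, text) or None if idle."""
        if not self._heap:
            self.current = None
            return None
        neg_priority, seq, system, text = heapq.heappop(self._heap)
        self.current = (-neg_priority, seq, system, text)
        return system, text
```

Because the interrupted request is requeued with its original sequence number, it is repeated ahead of anything of the same priority that arrived after it.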

The invention has the advantage that systems with an acoustic display provide the driver with information from various systems without distracting him from the driving task, as visual displays do. Costs can be saved by using a single speech synthesis device that can be used by various on-board computers. Compared to the speech generation methods previously used in navigation systems, for example, the storage space requirement can be reduced.

The invention can be used particularly advantageously in motor vehicles.

Claims

1. A device for differentiated speech output (1), which can be connected to a first system (31) and at least one further system (32, 33 to 3N), the speech output of the first system (31) being assigned a first voice characteristic and the speech output of the further system (32, 33 to 3N) being assigned a further voice characteristic, which audibly differs from the first voice characteristic.

2. Device according to claim 1 with a speech synthesis device (10) which receives control parameters which have a first class of dynamic parameters and a second class of static parameters, the dynamic parameters controlling the articulation corresponding to the movement of a vocal tract and the static parameters controlling the characteristics of the voice.

3. The device according to claim 2, wherein the static parameters have a generator fundamental frequency and/or fixed formants, which preferably correspond to the different geometric dimensions of the vocal tract in a child, a woman or a man.

4. The device according to claim 3, wherein generator and/or formant parameters for the speech output of different systems can be changed, preferably producing audible differences in the prosody, such as the duration and/or emphasis of syllable segments and/or the sentence melody.

5. Device according to one of claims 2 to 4, wherein the speech synthesis device (10) is a formant synthesizer with which the voice characteristic properties can be influenced.

6. The device according to claim 5, wherein the formant synthesizer is suitable for generating voiced and unvoiced sounds separately, and in particular additional resonators or attenuators can be switched on by further parameters and/or the dynamic parameters for the articulation can be influenced.

7. Device according to one of claims 2 to 6, wherein the dynamic parameters corresponding to the sequence of words, sentences and sentence sequences are stored in each system.

8. Device according to one of claims 2 to 7, wherein the static parameters are stored as a parameter set in each system and this parameter set is transmitted to the speech synthesis device (10) when speech is required.

9. Device according to one of claims 2 to 7, wherein the static parameters for the systems are stored as assigned parameter sets in a memory (20) of the speech output device and, depending on a selection signal of a system, an assigned parameter set is used by the speech synthesis device (10) for the speech output.

10. Device according to one of claims 2 to 9, wherein the speech synthesis device (10) is connected to an amplifier (12) and a voice output takes place via an audio output (14) of the amplifier (12).

11. System for use with a device according to one of claims 1 to 10, with a first output for outputting dynamic parameters and a second output for outputting a selection signal for switching over a parameter set in the speech output device (10).

12. System for use with a device according to one of claims 1 to 10, with an output for output of dynamic parameters and static parameters, preferably as a parameter set to the speech output device (10).

13. Combination of a device according to one of claims 1 to 10 with at least a first and a further system, such as an on-board computer (31), a navigation system (32), a traffic information system (33), an e-mail system (34), or an information system (3N), preferably for use in a vehicle.

14. A method for differentiated speech output using a device according to one of claims 1 to 10.