DE1922170A1

DE1922170A1 - Speech synthesis system

Info

Publication number: DE1922170A1
Application number: DE19691922170
Authority: DE
Inventors: Shinichiro Hashimoto; Shuzo Saito; Hisashi Wakita
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1968-05-01
Filing date: 1969-04-30
Publication date: 1969-11-13
Also published as: SE355887B; GB1224137A; FR2007620A1; JPS4949241B1; DE1922170B2

Description

8 Munich 86

P.O. Box 860 820

Möhlstrasse 22, telephone 48 3921/22

NIPPON TELEGRAPH AND TELEPHONE PUBLIC CORPORATION, 1, 1-chome, Uchisaiwaicho, Chiyoda-ku, Tokyo, Japan

Speech synthesis system

The invention relates to a speech synthesis system that mechanically synthesizes speech closely resembling natural speech by receiving a command consisting of a series of phoneme symbols and using this command as its input.

Various synthesis methods have already been proposed for generating synthetic speech by combining stored data according to a speech synthesis rule in response to a command relating to a series of phonemes. In these methods the synthesis unit is a single phoneme, a syllable, or a combination of two phonemes. The stored data accordingly consist of basic parameters, e.g. formant frequencies and fundamental frequencies, for forming a phoneme, a syllable, or a combination of two phonemes.

909846/0759

In a known conventional speech synthesis system of this kind, which takes a phoneme as the unit and composes an utterance of words or sentences by combining such units, the rule to be followed for combining the phonemes is so complicated that it is very difficult to obtain a system with good speech quality. A system that uses a syllable as the unit likewise has the disadvantage that the rule for combining the syllables is very complicated. A system that uses a combination of any two phonemes as the unit, so that each phoneme is read out twice, is not only uneconomical with regard to the read-out time and the time required for composition; the rule to be followed for composition also becomes complicated. This applies in particular to the word "anden", to which the phoneme combination sequence an-nd-də-ən would correspond. A technical difficulty would arise in operation, namely in combining consonants such as d-d. As a consequence, no satisfactory synthetic speech is obtained. The main reasons are that one and the same phoneme changes under the influence of another phoneme with which it is combined, acting as if it were a different phoneme, and that such a change is influenced in a complicated way by stress and other prosodic features. A manner of combination as complicated as the one just described can at present hardly be regarded as controllable by a comparatively simple rule.

The invention is therefore based on the object of creating a new speech synthesis system that avoids the disadvantages of previous speech synthesis systems. The new speech synthesis system is to make use of stored data relating to units of vowel-consonant-vowel or vowel-vowel sequences. It is further to allow such units to be connected in such a way that the vowel parts occur at both ends. Furthermore, it is to manage with a particularly simple synthesis rule, since every unit is joined within vowel parts. Finally, the synthetic speech is to correspond largely to natural speech.

This object is achieved according to the invention by a speech synthesis system in which synthesis units formed by phoneme chains are used whose ends are vowels, in a vowel-consonant-vowel or vowel-vowel sequence; in which a data memory is provided that stores, as basic parameters, data relating to phonemic features of the synthesis units; and in which these data are modified according to a rule relating to a linguistic characteristic and are joined within the same vowel parts, so that automatically synthesized speech largely corresponding to natural speech is formed.

The invention is explained in more detail below with reference to the drawings.

Fig. 1 shows a block diagram of an embodiment of the speech synthesis system according to the invention. Figs. 2a and 2b illustrate, by way of example, stored data of phoneme chains to be used in the speech synthesis system according to the invention, and a diagram of the amplitude pattern of a phoneme chain. Fig. 3 illustrates an example of prosodic features contained in the stored data.


Fig. 4 illustrates in a diagram the relationships between the corresponding temporal patterns of the fundamental frequency and of the speech energy.

The number of chain units formed from the above-mentioned phonemes is about 800 and, including the special cases at the beginning of a sentence, more than 900. This number is about one order of magnitude higher than the number of phoneme units, syllable units, or units consisting of two phonemes each.

Since the acoustic properties of a vowel change relatively slowly in its middle region, speech synthesis is carried out by connecting the units in such regions. As a result, the rule to be followed for connecting these units is very simple, and the quality of the synthetic speech is good. A phoneme chain in which both ends are formed by vowels, and which is to be stored as memory data, is stored in the form of data relating, e.g., to the spectrum, the time and the excitation level. In addition, some control data are stored, such as a code designating the type of excitation source, namely in the case where the generator performing the speech synthesis is of the terminal-analog type. Instead of the data considered, data relating, e.g., to the cross-sectional area, the time and the excitation level can also be stored, together with some control data, in the case where the generator performing the speech synthesis is of the so-called pharynx and oral cavity (vocal tract) analog type.
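The joining of units in the slowly changing middle region of a shared vowel can be sketched as follows. This is only an illustration under stated assumptions: the class `ChainUnit`, its fields, the use of one parameter value per frame, and the hard cut at the exact vowel midpoint are all hypothetical and are not taken from the patent, which does not specify how the join is computed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChainUnit:
    """Hypothetical stored unit: one parameter value per frame."""
    name: str
    frames: List[float]   # e.g. a formant frequency per frame (assumed layout)
    head_vowel: int       # number of frames in the leading vowel
    tail_vowel: int       # number of frames in the trailing vowel


def join_at_vowel_midpoints(units: List[ChainUnit]) -> List[float]:
    """Splice units so that each shared vowel is cut once, near its middle."""
    out: List[float] = []
    last = len(units) - 1
    for k, u in enumerate(units):
        # Every unit except the first drops the first half of its leading vowel.
        start = u.head_vowel // 2 if k > 0 else 0
        # Every unit except the last keeps only the first half of its trailing vowel.
        end = len(u.frames) - (u.tail_vowel - u.tail_vowel // 2) if k < last else len(u.frames)
        out.extend(u.frames[start:end])
    return out


# Illustrative values only; they do not come from the patent.
a = ChainUnit("əmə", [1, 2, 3, 4, 5, 6], head_vowel=2, tail_vowel=2)
b = ChainUnit("əni", [10, 20, 30, 40, 50, 60], head_vowel=2, tail_vowel=2)
joined = join_at_vowel_midpoints([a, b])
```

A smoother variant could interpolate across the cut instead of splicing; the hard cut is used here only to keep the sketch short.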

The stored data are phoneme chains of a form in which both ends are formed by vowels. The beginning or the end of a word or sentence to be synthesized is, however, frequently formed by a consonant. To synthesize such speech, the stored data could include phoneme chains in which the first vowel is missing from a leading chain or the second vowel from a trailing chain, but the amount of stored data would then increase. To prevent this, stored data in which the first vowel and the second vowel are formed by one and the same phoneme are used in such a way that the first or the second vowel of such a phoneme chain can be removed. Speech in which a word begins or ends with a consonant can thus be synthesized. For example, to compose the word /məni/, the three phoneme chains (ə)mə - əni - i(i) are used. For /mə/ at the beginning of the word, a phoneme chain is used that is obtained by removing the first vowel /ə/ from the sequence /əmə/.

If a nasal sound occurs at the end of a syllable, like /n/ in /sentə/ (center), this nasal sound is treated according to the present invention as a kind of vowel, so that the number of units to be stored can be reduced.

To compose the word /sentə/ considered above, the four phoneme chains (e)se - en - ntə - ə(ə) are used; the phoneme chains /en/ and /ntə/ cover the nasal sound /n/, which is regarded as a vowel.
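The division of a word into vowel-bounded chain units, as in the two example words above, might be sketched as follows. The phoneme inventories, the function names, and the test used to detect a syllable-final nasal (preceded by a vowel and not followed by one) are assumptions made for this illustration; the patent does not state how syllable boundaries are determined. Vowels in parentheses mark the edge vowel that is removed from the stored unit.

```python
from typing import List

VOWELS = {"a", "e", "i", "o", "u", "ə"}   # assumed vowel inventory
NASALS = {"n", "m"}                        # assumed nasal inventory


def vowel_like(phonemes: List[str]) -> List[bool]:
    """Flag vowels, plus nasals assumed to be syllable-final."""
    flags = []
    for i, p in enumerate(phonemes):
        if p in VOWELS:
            flags.append(True)
        elif p in NASALS:
            after_vowel = i > 0 and phonemes[i - 1] in VOWELS
            before_vowel = i + 1 < len(phonemes) and phonemes[i + 1] in VOWELS
            flags.append(after_vowel and not before_vowel)
        else:
            flags.append(False)
    return flags


def split_into_chain_units(phonemes: List[str]) -> List[str]:
    """Split a word into units that begin and end on a vowel-like phoneme."""
    flags = vowel_like(phonemes)
    if not any(flags):
        raise ValueError("word contains no vowel-like phoneme")
    first = phonemes[flags.index(True)]
    last = phonemes[len(flags) - 1 - flags[::-1].index(True)]
    # Pad both word edges with a removable copy of the nearest vowel-like phoneme.
    symbols = [f"({first})"] + phonemes + [f"({last})"]
    flags = [True] + flags + [True]
    units, start = [], 0
    for i in range(1, len(symbols)):
        if flags[i]:
            units.append("".join(symbols[start:i + 1]))
            start = i   # the closing vowel also opens the next unit
    return units


print(split_into_chain_units(["m", "ə", "n", "i"]))       # ['(ə)mə', 'əni', 'i(i)']
print(split_into_chain_units(["s", "e", "n", "t", "ə"]))  # ['(e)se', 'en', 'ntə', 'ə(ə)']
```

Because adjacent units share their boundary vowel, each unit can later be joined to its neighbor inside that vowel.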

The data of the phoneme chain units stored according to the present invention can include changes of basic parameters relating to phoneme properties that are caused by stress and other prosodic properties. The parameters in question can further relate to rules concerning linguistic properties, such as the influence of phonemes on stress.

In a terminal-analog system such as the one used by the present invention, basic parameters are stored, for example, for producing a male voice with a mean fundamental frequency of 150 Hz. To obtain synthetic speech with a different mean fundamental frequency, e.g. a female voice with a fundamental frequency of 250 Hz, a rule is stored as one of the basic parameters according to which a formant frequency is converted into a formant frequency appropriate to the new fundamental frequency of 250 Hz. It goes without saying that data relating to stress and other prosodic properties can be stored in the same way as the phoneme chains.

In natural speech the temporal pattern of the fundamental frequency and the temporal pattern of the speech energy are so closely related that it suffices to specify two prosodic properties, namely the position of the stress in the phoneme series of the speech to be synthesized and the maximum value of the fundamental frequency occurring in the synthesized speech, in order to obtain by this rule synthetic speech that is close to natural speech.

In the present embodiment, for example, to specify prosodic properties according to which the maximum value of the fundamental frequency is 150 Hz and the first vowel of the input phoneme series is stressed, a rule is stored that defines the correlation between the fundamental frequency and the speech energy in accordance with the temporal pattern shown in Fig. 4.


In natural speech a particular vowel within a word, or a vowel occurring at the end of a sentence, is devocalized. When such devocalization occurs, the acoustic property of the vowel can be changed in the stored data that correspond to the vowel in question in the form of phoneme chain units, so that the sound of the synthetic speech approaches that of natural speech. Furthermore, the change of acoustic properties caused by devocalization can be made according to a rule that is applied in the same way as the other rules.

In the phoneme chain that serves as the synthesis unit in the present invention, both ends are formed by vowels. A plurality of consonants can lie between these vowels. If a sound structure in which several consonants occur in succession is treated as one independent consonant, the amount of stored data increases. The number of such clusters is relatively small, however, so that this effect is likewise small. To compose the word /straik/, for example, from the chains (a)stra - ai - ik(i), a phoneme chain unit /astra/ containing three consonants is used.

Fig. 1 shows a block diagram of an embodiment of the speech synthesis system according to the invention. In this embodiment, a command P consisting of a phoneme series and prosodic feature symbols is split, in a phoneme series segmenting device 1, into the phoneme series and the prosodic symbols. The phoneme series is then divided into a series of phoneme chain units in each of which both ends are formed by vowels. Data corresponding to these phoneme chain units, and prosodic data corresponding to the prosodic symbols, are then read out by a reading device 2 from a phoneme chain data storage device 3 that holds these data. These data are connected to one another, and the type of source exciting the generator provided for the synthesis, such as pulses or noise, is determined. This is done according to a synthesis rule L. The data held in a source data device 4 and the pharynx and oral cavity spectrum data held in a spectrum data device 5 are supplied to a generator 6 of the terminal-analog type that performs the speech synthesis. Finally, the synthetic speech is output by a loudspeaker 7. It is readily possible to use a device of the pharynx and oral cavity analog type as the generator 6 for speech synthesis.


Fig. 2a illustrates an example of stored data as used in the speech synthesis system according to the invention. FILE NAME denotes the name of a phoneme chain unit. AUX denotes auxiliary data used for control. TIME denotes the temporal structure of the phoneme chain, LEVEL the relative level relationship of each unit in the phoneme chain, and PKO the type of source exciting the respective unit in the phoneme chain. FORM denotes the frequencies of the first to n-th formants per unit of time, and BAND the bandwidth of each such formant. In the present exemplary embodiment, the concrete numerical values are given, for example, next to the above-mentioned words TIME and FORM. Fig. 2b schematically shows a level diagram of a phoneme chain. Fig. 3 illustrates an example of stored data relating to prosody, that is, to the fundamental frequency, the amplitude level and the time relationships, for composing a word that has its main stress on the first syllable.
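One way to picture a stored record of the kind described for Fig. 2a is the following sketch. The class name, the field types, the name `source` for the excitation-type entry, the reading of the time entries as segment durations in frames, and all numerical values are assumptions made for illustration; none of them are taken from the figure.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PhonemeChainRecord:
    """Hypothetical in-memory form of one stored phoneme chain unit."""
    file_name: str              # name of the phoneme chain unit
    aux: Dict[str, str]         # auxiliary control data
    time: List[int]             # duration of each segment, in frames (assumed)
    level: List[float]          # relative level of each segment
    source: List[str]           # excitation type per segment: "pulse" or "noise"
    form: List[List[float]]     # formant frequencies (Hz) for every frame
    band: List[List[float]]     # formant bandwidths (Hz) for every frame

    def check(self) -> None:
        """Raise ValueError if the fields disagree about segment or frame counts."""
        if not (len(self.time) == len(self.level) == len(self.source)):
            raise ValueError("time, level and source must have one entry per segment")
        if sum(self.time) != len(self.form) or len(self.form) != len(self.band):
            raise ValueError("form and band must have one row per frame")
        for f, b in zip(self.form, self.band):
            if len(f) != len(b):
                raise ValueError("each formant needs exactly one bandwidth")


# Illustrative numbers only; they are not the values shown in Fig. 2a.
example = PhonemeChainRecord(
    file_name="əmə",
    aux={},
    time=[2, 1, 2],
    level=[1.0, 0.6, 1.0],
    source=["pulse", "pulse", "pulse"],
    form=[[500.0, 1500.0]] * 5,
    band=[[60.0, 90.0]] * 5,
)
example.check()
```

Keeping the per-segment fields separate from the per-frame fields mirrors the description, in which timing and level are given per unit of the chain while formants are given per unit of time.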

The synthetic speech produced by the speech synthesis system according to the invention is considerably more natural than any synthetic speech produced by a conventional system. The range of application of the speech synthesized by the system according to the invention is accordingly wide. For example, the system is suitable for audio-frequency output devices of electronic computers, for telephone information systems, and for audio-frequency announcement devices for various announcement services using push-button telephones.

The stored data of phoneme chain units in which both ends are each formed by a vowel, as in the present invention, can be of any kind in which the speech synthesis units are designed to include at least part of the acoustic properties of the phoneme chain units and part of the other prosodic properties, so as to achieve synthetic speech of high quality.


Claims


1. Speech synthesis system, characterized in that synthesis units formed by phoneme chains are used in which the ends are vowels in a vowel-consonant-vowel or vowel-vowel sequence; that a data memory is provided in which data relating to phonemic features of the synthesis units are stored according to basic parameters; and that these data are modified according to a rule relating to a linguistic characteristic and are connected in the same vowel parts, so that automatically synthesized speech largely corresponding to natural speech is formed.

2. Speech synthesis system according to claim 1, characterized in that, when speech with consonants at the beginning and at the end of a word is synthesized, stored data are concatenated into phoneme chains such that vowels formed by one and the same vowel are present at both word ends; that, when a word beginning with a consonant is present, the first consonant is removed; that, when a word ending with a consonant is present, the second consonant is omitted; and that the phoneme chains formed in this way are used as data for speech synthesis.

3. Speech synthesis system according to claim 1 or 2, characterized in that, for devocalizing a vowel within a word or a vowel at the end of a word, stored data corresponding to the vowel in question are added to the data of the respective word, where appropriate according to a stored rule relating to devocalization.


4. ... characterized in that a nasal sound occurring at the end of a syllable is treated as a vowel, so that the number of phoneme chain units, each of which has both ends formed by a vowel, is reduced.

5. Speech synthesis system according to one of claims 1 to 4, characterized in that a sound in which consonants occur consecutively is treated as an independent consonant, a phoneme chain with a sequence of vowel, plurality of consonants, vowel being included in a synthesis unit.

6. Speech synthesis system according to claim 1, characterized in that a limited number of data relating to temporal patterns of a fundamental frequency and of speech energy are stored; that these data are selected in response to a command relating to prosodic features of the synthetic speech; and that the acoustic property is modified according to such a rule that improved speech quality results.
