
CN111968619A - Method and device for controlling voice synthesis pronunciation - Google Patents


Info

Publication number
CN111968619A
CN111968619A
Authority
CN
China
Prior art keywords
pronunciation
pronunciation dictionary
text
key value
synthesized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010873463.4A
Other languages
Chinese (zh)
Inventor
王昆
朱海
周琳珉
展华益
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Changhong Electric Co Ltd
Original Assignee
Sichuan Changhong Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Changhong Electric Co Ltd filed Critical Sichuan Changhong Electric Co Ltd
Priority to CN202010873463.4A priority Critical patent/CN111968619A/en
Publication of CN111968619A publication Critical patent/CN111968619A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00: Speech synthesis; Text to speech systems
    • G10L 13/02: Methods for producing synthetic speech; Speech synthesisers
    • G10L 13/04: Details of speech synthesis systems, e.g. synthesiser structure or memory management
    • G10L 13/047: Architecture of speech synthesisers
    • G10L 13/08: Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G10L 13/10: Prosody rules derived from text; Stress or intonation

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention discloses a method for controlling speech synthesis pronunciation, which comprises the following steps: creating a pronunciation dictionary; regularizing the text to be synthesized, performing prosody analysis, and converting the text into pinyin labels; reading the pronunciation dictionary, replacing the pinyin labels accordingly, and converting them into phonemes; converting the phonemes into acoustic features using a speech synthesis model; and converting the acoustic features into audio using a vocoder. The invention solves the problems of polyphone mispronunciation in a speech synthesis system and of adapting to users' accents.

Description

Method and device for controlling voice synthesis pronunciation
Technical Field
The invention relates to the technical field of speech processing, and in particular to a method and a device for controlling speech synthesis pronunciation.
Background
Speech synthesis is a technique for converting text information into speech information, i.e. turning arbitrary text into audible speech. It draws on several disciplines, including acoustics, linguistics and computer science; end-to-end modeling, chiefly represented by Tacotron, is currently the mainstream approach.
When using end-to-end speech synthesis, front-end processing is crucial: the linguistic information in the text must be fully exploited to obtain high-quality synthesis results. However, mispronunciation of polyphonic characters has long been a problem in end-to-end speech synthesis, and a synthesis system should also let users control pronunciation according to their own habits. For a Chinese speech synthesis system, the invention therefore performs text replacement through a user-built pronunciation dictionary, so that the user can correct polyphone errors and adapt the system to his or her pronunciation habits by entering pinyin directly.
Disclosure of Invention
The invention aims to solve the above problems in the prior art by providing a method and a device for controlling the pronunciation of a speech synthesis system.
To that end, the invention adopts the following technical solution: a method of controlling speech synthesis pronunciation, comprising the steps of:
S1, creating a pronunciation dictionary;
S2, regularizing the text to be synthesized, performing prosody analysis, and converting it into pinyin labels;
S3, reading the pronunciation dictionary, replacing the pinyin labels accordingly, and converting them into phonemes;
S4, converting the phonemes into acoustic features using a speech synthesis model;
S5, converting the acoustic features into audio using a vocoder.
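The five steps above can be sketched end to end. The sketch below is illustrative only: the toy character-level lexicon, `text_to_pinyin`, `apply_user_dict` and the `synthesize` stub are assumptions standing in for the real word segmentation, prosody prediction, acoustic model and vocoder, none of which the method specifies at the code level.

```python
# Illustrative sketch of steps S1-S5 with a toy per-character lexicon.

def create_pronunciation_dict():
    # S1: the user dictionary starts out empty
    return {}

def text_to_pinyin(text, lexicon):
    # S2 (simplified): default per-character readings from a lexicon
    return [lexicon[ch] for ch in text if ch in lexicon]

def apply_user_dict(text, pinyin, lexicon, user_dict):
    # S3 (simplified): overwrite default readings for words the user defined
    for word, reading in user_dict.items():
        if word in text:
            old = [lexicon[ch] for ch in word]
            for i in range(len(pinyin) - len(old) + 1):
                if pinyin[i:i + len(old)] == old:
                    pinyin[i:i + len(old)] = reading.split()
                    break
    return pinyin

def synthesize(pinyin):
    # S4 + S5 placeholder: pretend the joined labels are the rendered audio
    return "audio(" + " ".join(pinyin) + ")"

lexicon = {"阿": "a1", "胶": "jiao1"}          # default (wrong) reading of 阿胶
user_dict = create_pronunciation_dict()
user_dict["阿胶"] = "e1 jiao1"                 # the user's correction
pinyin = text_to_pinyin("阿胶", lexicon)       # ['a1', 'jiao1']
pinyin = apply_user_dict("阿胶", pinyin, lexicon, user_dict)
print(synthesize(pinyin))                      # audio(e1 jiao1)
```

The dictionary lookup happens on the pinyin labels before grapheme-to-phoneme conversion, which is why the user's correction survives into the phoneme input of the acoustic model.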
As a preferred embodiment, the step S1 is specifically as follows:
the keys of the pronunciation dictionary are Chinese words and the values are their pinyin; the dictionary is initialized empty before the user enters anything. When the user enters a key and a value, both are checked to ensure that they are legal. If the key does not exist in the pronunciation dictionary, the user's key and value are added to it; if the key already exists, the value corresponding to it is updated. The pronunciation dictionary supports viewing, modification and deletion by the user.
As another preferred embodiment, the step S2 is specifically as follows:
the text to be synthesized is regularized and illegal characters are filtered out; the legal input is segmented into words and tagged with parts of speech, the extracted linguistic features are fed into a prosody prediction model to obtain pause-level labels, and the Chinese characters are converted into pinyin labels.
As another preferred embodiment, the step S3 is specifically as follows:
when the pronunciation dictionary is read, nothing is done if it is empty. If it is not empty, the text to be synthesized is scanned for the dictionary's keys; whenever the text contains a key, the pinyin labels produced for that word in step S2 are replaced by the value stored under the key, and the rest remain unchanged.
In another preferred embodiment, in step S4, the speech synthesis model is Tacotron, Tacotron2 or Transformer TTS.
In another preferred embodiment, in step S5, the vocoder model adopts a network structure such as WaveNet, WaveRNN or MelGAN.
As another preferred embodiment, in steps S4 and S5, the acoustic features are mel-spectrogram features, linear spectrogram features, or other acoustic features related to the spectral envelope.
To solve the problems of polyphone mispronunciation and user-accent adaptation in a speech synthesis system, the invention also provides a device for controlling speech synthesis pronunciation, which comprises:
a pronunciation dictionary construction module, used for storing and reading the Chinese words and pronunciations entered by the user;
a text processing module, used for regularizing the text to be synthesized, performing prosody analysis, and converting the text into pinyin labels;
a replacement processing module, used for reading the pronunciation dictionary, replacing the pinyin labels and converting them into phonemes;
a synthesis module, used for converting the processed text to be synthesized into acoustic features;
a vocoder module, used for converting the input acoustic features into audio.
The invention has the following beneficial effects:
during use, the pinyin of Chinese words is replaced in the speech synthesis process through a pronunciation dictionary defined by the user, so that the synthesized speech matches the user's pronunciation habits and polyphone errors can be corrected.
Drawings
FIG. 1 is a block flow diagram of an embodiment of the present invention;
FIG. 2 is a diagram illustrating a pronunciation dictionary creation method according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating an alternative pronunciation processing method according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Examples
As shown in fig. 1, a method for controlling speech synthesis pronunciation includes the following steps:
s1, creating a pronunciation dictionary:
the pronunciation dictionary is created in a way shown in fig. 2, the key value of the pronunciation dictionary is a Chinese word, the value is pinyin, and the pronunciation dictionary is initialized to be empty before the user inputs the pinyin; when the user inputs the input key value and the value, checking the input key value and the input value to ensure that the input key value and the input value are legal; if the key value in the pronunciation dictionary does not exist, adding the key value and the value input by the user into the pronunciation dictionary; if the key value exists in the pronunciation dictionary, updating the value corresponding to the key value; the pronunciation dictionary supports user viewing and modification deletion.
For example, the pronunciation dictionary is first initialized to { }. Suppose the user's input text to be synthesized is "Jiuzhitang Ejiao blood-enriching granules" and the speech synthesis system mispronounces "Ejiao" (donkey-hide gelatin) as "a1 jiao1" (the trailing digit denotes the pinyin tone), whereas the correct pronunciation is "e1 jiao1". The user then enters "Ejiao | e1 jiao1" (this example uses the "|" symbol as a separator). If the key "Ejiao" does not exist in the pronunciation dictionary, the entry { "Ejiao": "e1 jiao1" } is added; if the key already exists, its value is replaced with "e1 jiao1".
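The add-or-update rule in this example can be sketched as follows. The tone-number pinyin check (lowercase syllables each ending in a tone digit 1-5) is an assumption — the patent only requires that inputs be validated — and the "|" separator follows the example above.

```python
import re

# One pinyin syllable is letters plus a tone digit; a reading is one or more
# such syllables separated by spaces. This validity rule is an assumption.
TONE_PINYIN = re.compile(r"^([a-zü]+[1-5])( [a-zü]+[1-5])*$")

def update_pronunciation_dict(pron_dict, entry, sep="|"):
    """Parse 'word | pinyin' and add or update the dictionary entry."""
    word, _, pinyin = (part.strip() for part in entry.partition(sep))
    if not word or not TONE_PINYIN.match(pinyin):
        raise ValueError("illegal word or pinyin: " + entry)
    pron_dict[word] = pinyin          # add if absent, overwrite if present
    return pron_dict

d = {}
update_pronunciation_dict(d, "阿胶 | e1 jiao1")
print(d)   # {'阿胶': 'e1 jiao1'}
```

Re-entering the same word simply overwrites its reading, which matches the update behavior described for existing keys.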
S2, carrying out regularization processing and prosody analysis on the text to be synthesized and converting it into pinyin labels:
the text to be synthesized is regularized and illegal characters are filtered out; the legal input is segmented into words and tagged with parts of speech, the extracted linguistic features are fed into a prosody prediction model to obtain pause-level labels, and the Chinese characters are converted into pinyin labels.
For example, take the text "Jiuzhitang Ejiao blood-enriching granules are sold at 180 yuan a box." The illegal characters are first filtered out and the Arabic numeral "180" is converted into the Chinese characters of its reading. The legal text is then fed into the prosody prediction model to obtain pause-level labels, and the Chinese characters are converted into pinyin labels: ['jiu3', 'zhi1', 'tang2', '#2', 'e1', 'jiao1', '#1', 'bu3', 'xue4', '#2', 'ke1', 'li4', '#1', 'shou4', 'jia4', '#2', 'yi1', 'bai3', 'ba1', 'shi2', 'yuan2', '#1', 'yi1', 'he2', '#4'], where '#' followed by a digit marks a prosodic pause and its level.
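The numeral rewriting mentioned above ("180" into the characters of its Chinese reading, yi1 bai3 ba1 shi2) can be sketched for integers below 10,000; a real text-normalization front end covers many more patterns (dates, phone numbers, units), so this is a minimal illustration, not the patent's implementation.

```python
# Convert a small integer to its Chinese-character reading, e.g. 180 -> 一百八十.
DIGITS = "零一二三四五六七八九"
UNITS = ["", "十", "百", "千"]

def number_to_hanzi(n):
    if n == 0:
        return "零"
    out, pending_zero = [], False
    s = str(n)
    for i, ch in enumerate(s):
        d = int(ch)
        unit = UNITS[len(s) - 1 - i]
        if d == 0:
            pending_zero = True        # interior zeros collapse to one 零
        else:
            if pending_zero and out:
                out.append("零")
            pending_zero = False
            out.append(DIGITS[d] + unit)
    return "".join(out)

print(number_to_hanzi(180))  # 一百八十
```

Trailing zeros are dropped (1000 reads 一千) while interior zeros keep a single 零 (105 reads 一百零五), matching conventional Chinese number reading.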
S3, reading the pronunciation dictionary, performing replacement on the pinyin labels, and converting them into phonemes:
the replacement processing is shown in fig. 3. If the pronunciation dictionary is empty, no replacement is performed and processing continues with the subsequent steps. If it is not empty, the text to be synthesized is scanned for the dictionary's keys; whenever the text contains a key, the pinyin labels produced for that word in step S2 are replaced by the value stored under the key, and the rest remain unchanged.
Specifically, suppose the user's input text to be synthesized is "Jiuzhitang Ejiao blood-enriching granules are sold at 180 yuan a box." If the user has never entered any words and pinyin, the pronunciation dictionary is empty and the subsequent processing and synthesis steps run directly to produce audio. If, following his own habit, the user wants the reading of "blood-enriching" to be "bu3 xie3" rather than "bu3 xue4", he enters "blood-enriching | bu3 xie3", making the pronunciation dictionary { "blood-enriching": "bu3 xie3" }. The text to be synthesized is then scanned with the dictionary's keys, the key "blood-enriching" is found, and "bu3 xue4" is replaced by the stored value "bu3 xie3", giving ['jiu3', 'zhi1', 'tang2', '#2', 'e1', 'jiao1', '#1', 'bu3', 'xie3', '#2', 'ke1', 'li4', '#1', 'shou4', 'jia4', '#2', 'yi1', 'bai3', 'ba1', 'shi2', 'yuan2', '#1', 'yi1', 'he2', '#4']. The pinyin labels are then converted into phoneme labels, yielding "j iou3 zh iii1 t ang2 #2 e1 j iao1 #1 b u3 x ie3 #2 k e1 l i4 #1 sh ou4 j ia4 #2 i1 b ai3 b a1 sh iii2 van2 #1 i1 h e2 #4" as the front-end input to the speech synthesis model.
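The replacement on the label sequence in this example can be sketched as follows. The one-label-per-character alignment with interleaved '#' pause marks follows the example above; a real implementation would instead track character offsets from word segmentation, so this is an assumption for illustration.

```python
def replace_reading(labels, old_reading, new_reading):
    """Replace one run of pinyin labels (skipping '#' pause marks) in place."""
    # indices of real pinyin labels, with pause marks filtered out
    idx = [i for i, lab in enumerate(labels) if not lab.startswith('#')]
    plain = [labels[i] for i in idx]
    for j in range(len(plain) - len(old_reading) + 1):
        if plain[j:j + len(old_reading)] == old_reading:
            for k, new in enumerate(new_reading):
                labels[idx[j + k]] = new   # write back at the original positions
            break
    return labels

labels = ['jiu3', 'zhi1', 'tang2', '#2', 'e1', 'jiao1', '#1', 'bu3', 'xue4', '#2']
replace_reading(labels, ['bu3', 'xue4'], ['bu3', 'xie3'])
print(labels)  # ['jiu3', 'zhi1', 'tang2', '#2', 'e1', 'jiao1', '#1', 'bu3', 'xie3', '#2']
```

Filtering out the pause marks before matching keeps the prosody labels untouched, so only the reading of the matched word changes.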
S4, converting the processing result into acoustic features using a speech synthesis model:
the speech synthesis models used to produce acoustic features include, but are not limited to, Tacotron, Tacotron2 and Transformer TTS. Optionally, the acoustic features include, but are not limited to, mel-spectrogram features, linear spectrogram features, or other acoustic features related to the spectral envelope.
S5, converting the acoustic features into audio using a vocoder:
the vocoder models used to convert acoustic features into audio adopt network structures including, but not limited to, WaveNet, WaveRNN and MelGAN. Optionally, the acoustic features include, but are not limited to, mel-spectrogram features, linear spectrogram features, or other acoustic features related to the spectral envelope.
With the above method for controlling synthesized pronunciation, the user-defined pronunciation dictionary is used to replace readings in the text to be synthesized, so that the synthesized speech matches the user's pronunciation habits while polyphone errors are corrected.
The embodiment further provides an apparatus for controlling pronunciation of speech synthesis, including:
the pronunciation dictionary construction module is used for storing and reading Chinese words and pronunciations input by a user;
the keys of the pronunciation dictionary are Chinese words and the values are their pinyin; before the user enters anything, the dictionary is initialized empty. When the user enters a key and a value, both are checked to ensure that they are legal. If the key does not exist in the pronunciation dictionary, the user's key and value are added to it; if the key already exists, the value corresponding to it is updated. The pronunciation dictionary supports viewing, modification and deletion by the user.
The text processing module is used for regularizing the text to be synthesized, performing prosody analysis, and converting the text into pinyin labels:
the text to be synthesized is regularized and illegal characters are filtered out; the legal input is segmented into words and tagged with parts of speech, the extracted linguistic features are fed into a prosody prediction model to obtain pause-level labels, and the Chinese characters are converted into pinyin labels.
The replacement processing module is used for reading the pronunciation dictionary, replacing the pinyin labels and converting them into phonemes:
the keys of the pronunciation dictionary are Chinese words and the values are the corresponding pinyin; before any user input, the dictionary is empty. During use, the words and pinyin entered by the user are first checked to ensure that the input is legal. If the key does not exist, the newly entered Chinese word is stored as a key with the entered pinyin as its value; if the key already exists, its value is updated with the newly entered pinyin. The pronunciation dictionary allows the user to view, modify and delete the existing keys and values.
The synthesis module is used for converting the processed text to be synthesized into acoustic features:
the speech synthesis models used to produce acoustic features include, but are not limited to, Tacotron, Tacotron2 and Transformer TTS. Optionally, the acoustic features include, but are not limited to, mel-spectrogram features, linear spectrogram features, or other acoustic features related to the spectral envelope.
A vocoder module for converting the input acoustic features into audio.
The vocoder models used to convert acoustic features into audio adopt network structures including, but not limited to, WaveNet, WaveRNN and MelGAN.
With the above device for controlling synthesized pronunciation, the user-defined pronunciation dictionary is used to replace readings in the text to be synthesized, so that the synthesized speech matches the user's pronunciation habits while polyphone errors are corrected.
The above embodiments merely express specific implementations of the present invention, and while their description is relatively specific and detailed, it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the inventive concept, all of which fall within the protection scope of the present invention.

Claims (8)

1. A method of controlling speech synthesis pronunciation, comprising the steps of:
s1, creating a pronunciation dictionary;
S2, regularizing the text to be synthesized, performing prosody analysis, and converting it into pinyin labels;
S3, reading the pronunciation dictionary, replacing the pinyin labels accordingly, and converting them into phonemes;
s4, converting the phoneme into acoustic features by using a speech synthesis model;
and S5, converting the acoustic features into audio by using a vocoder.
2. The method for controlling speech synthesis pronunciation according to claim 1, wherein the step S1 is as follows:
the keys of the pronunciation dictionary are Chinese words and the values are their pinyin; the dictionary is initialized empty before the user enters anything; when the user enters a key and a value, both are checked to ensure that they are legal; if the key does not exist in the pronunciation dictionary, the user's key and value are added to it; if the key already exists, the value corresponding to it is updated; and the pronunciation dictionary supports viewing, modification and deletion by the user.
3. The method for controlling speech synthesis pronunciation according to claim 2, wherein the step S2 is as follows:
the text to be synthesized is regularized and illegal characters are filtered out; the legal input is segmented into words and tagged with parts of speech; the extracted linguistic features are fed into a prosody prediction model to obtain pause-level labels; and the Chinese characters are converted into pinyin labels.
4. The method for controlling speech synthesis pronunciation according to claim 3, wherein the step S3 is as follows:
when the pronunciation dictionary is read, no processing is carried out if it is empty; if it is not empty, the text to be synthesized is scanned for the dictionary's keys, and whenever the text contains a key, the pinyin labels produced for that word in step S2 are replaced by the value stored under the key, the rest remaining unchanged.
5. The method for controlling speech synthesis pronunciation according to any one of claims 1-4, wherein the speech synthesis model in step S4 is Tacotron, Tacotron2 or Transformer TTS.
6. The method for controlling speech synthesis pronunciation according to claim 5, wherein the network structure adopted by the vocoder model in step S5 is WaveNet, WaveRNN or MelGAN.
7. The method for controlling speech synthesis pronunciation according to claim 1 or 6, wherein in steps S4 and S5 the acoustic features are mel-spectrogram features, linear spectrogram features, or other acoustic features related to the spectral envelope.
8. An apparatus for controlling speech synthesis pronunciation, comprising:
the pronunciation dictionary construction module is used for storing and reading Chinese words and pronunciations input by a user;
the text processing module is used for carrying out regularization processing and prosody analysis on the text to be synthesized and converting the text into pinyin labels;
the replacement processing module is used for reading the pronunciation dictionary, replacing the pinyin labels and converting them into phonemes;
the synthesis module is used for converting the input processed text to be synthesized into acoustic features;
a vocoder module for converting the input acoustic features into audio.
CN202010873463.4A 2020-08-26 2020-08-26 Method and device for controlling voice synthesis pronunciation Pending CN111968619A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010873463.4A CN111968619A (en) 2020-08-26 2020-08-26 Method and device for controlling voice synthesis pronunciation

Publications (1)

Publication Number Publication Date
CN111968619A true CN111968619A (en) 2020-11-20

Family

ID=73390608

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010873463.4A Pending CN111968619A (en) 2020-08-26 2020-08-26 Method and device for controlling voice synthesis pronunciation

Country Status (1)

Country Link
CN (1) CN111968619A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7139712B1 (en) * 1998-03-09 2006-11-21 Canon Kabushiki Kaisha Speech synthesis apparatus, control method therefor and computer-readable memory
US7292980B1 (en) * 1999-04-30 2007-11-06 Lucent Technologies Inc. Graphical user interface and method for modifying pronunciations in text-to-speech and speech recognition systems
CN101324884A (en) * 2008-07-29 2008-12-17 无敌科技(西安)有限公司 Method of polyphone pronunciation
CN106205600A (en) * 2016-07-26 2016-12-07 浪潮电子信息产业股份有限公司 Interactive Chinese text voice synthesis system and method
US20180190269A1 (en) * 2016-12-29 2018-07-05 Soundhound, Inc. Pronunciation guided by automatic speech recognition
CN108962217A (en) * 2018-07-28 2018-12-07 华为技术有限公司 Phoneme synthesizing method and relevant device
CN110534089A (en) * 2019-07-10 2019-12-03 西安交通大学 A kind of Chinese speech synthesis method based on phoneme and rhythm structure

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114694627A (en) * 2020-12-14 2022-07-01 马上消费金融股份有限公司 Speech synthesis related method, training method of speech flow-to-speech model and related device
CN114999438A (en) * 2021-05-08 2022-09-02 中移互联网有限公司 Audio playback method and device
CN114999438B (en) * 2021-05-08 2023-08-15 中移互联网有限公司 Audio playing method and device

Similar Documents

Publication Publication Date Title
CN112420016B (en) Method and device for aligning synthesized voice and text and computer storage medium
US9424833B2 (en) Method and apparatus for providing speech output for speech-enabled applications
US8015011B2 (en) Generating objectively evaluated sufficiently natural synthetic speech from text by using selective paraphrases
US8825486B2 (en) Method and apparatus for generating synthetic speech with contrastive stress
Olaszy et al. Profivox—a Hungarian text-to-speech system for telecommunications applications
US8914291B2 (en) Method and apparatus for generating synthetic speech with contrastive stress
EP1668628A1 (en) Method for synthesizing speech
JP2019109278A (en) Speech synthesis system, statistic model generation device, speech synthesis device, and speech synthesis method
CN111968619A (en) Method and device for controlling voice synthesis pronunciation
CN114708848A (en) Method and device for acquiring size of audio and video file
Akinwonmi Development of a prosodic read speech syllabic corpus of the Yoruba language
CN116229947A (en) Voice recognition method and voice recognition device
CN114822489A (en) Text transcription method and text transcription device
JP6197523B2 (en) Speech synthesizer, language dictionary correction method, and language dictionary correction computer program
Hendessi et al. A speech synthesizer for Persian text using a neural network with a smooth ergodic HMM
Khamdamov et al. Syllable-Based Reading Model for Uzbek Language Speech Synthesizers
JP3029403B2 (en) Sentence data speech conversion system
JP2580568B2 (en) Pronunciation dictionary update device
Quazza et al. The use of lexica in text-to-speech systems
Kato et al. Multilingualization of speech processing
Kaur et al. Building a Text-to-Speech System for Punjabi Language
CN119763547A (en) Speech synthesis method, speech synthesis model training method, electronic device and computer program product
CN118135995A (en) Speech synthesis method, device, equipment and storage medium
Toma et al. Automatic rule-based syllabication for Romanian
CN117711373A (en) Text phoneme label information generation method and device and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20201120)