
CN1111811C - Pronunciation Synthesis Method of Computer Speech Signal - Google Patents


Info

Publication number
CN1111811C
Authority
CN
China
Prior art keywords
diphones
pronunciation
word
syllable
voice
Legal status
Expired - Fee Related
Application number
CN 97110082
Other languages
Chinese (zh)
Other versions
CN1196531A (en)
Inventor
张景嵩
曹洪
张金玉
Current Assignee
Inventec Corp
Original Assignee
Inventec Corp
Priority date
1997-04-14
Filing date
1997-04-14
Publication date
2003-06-18
Application filed by Inventec Corp
Priority to CN 97110082
Publication of CN1196531A
Application granted
Publication of CN1111811C
Anticipated expiration
Expired - Fee Related (current legal status)

Landscapes

  • Electrically Operated Instructional Devices (AREA)

Abstract

The present invention is a pronunciation synthesis method for computer speech signals. As the diphone used to synthesize the pronunciation of an English word, it takes the transition portion between two adjacent syllables of the word, running from the middle of the preceding syllable to the middle of the following syllable. Unlike the conventionally used half-phones and single phones, these diphones are cut from the stable segments of the syllables, so they retain as much of the variation between syllables as possible. The invention can therefore synthesize computer speech for English words that sounds closer to a real human speaker.


Description

Pronunciation Synthesis Method of Computer Speech Signal

Because conventional computers were limited by the speed of their central processing units and the capacity of their storage devices (such as hard disks), the speech synthesis algorithms and basic synthesis units they used were relatively simple, and the synthesized speech sounded far from the original voice. Although some vendors have designed new algorithms to obtain speech closer to the original, none has fully solved the problem, and speech quality has not noticeably improved.

Rapid advances in computer hardware now give designers faster processors and larger storage. For speech synthesis, this means designers can use complex synthesis and compression algorithms, and the units used to synthesize speech can be larger, so that each unit carries more speech information. Modern computer technology thus provides an excellent design environment. Even so, current speech synthesis still produces distorted pronunciation, and this distortion is caused mainly by the synthesis and compression algorithms it uses.

Take the English word "HELLO" as an example. A conventional speech synthesizer first finds the word's International Phonetic Alphabet (IPA) transcription <halo>, then uses conventional segmentation to split it into the constituent phonemes <h>, <a>, <l> and <o>, locates their boundary points, and retrieves the corresponding sounds from a pronunciation database. In real speech, however, adjacent phonemes influence one another, so there is no sharp boundary between them, only a region where they overlap. Cutting phonemes apart at a single sample point therefore inevitably yields impure phonemes, and concatenating impure phonemes produces speech with low naturalness and clarity, high noise, a rough sound and an obviously mechanical quality.

The object of the present invention is therefore to provide a pronunciation synthesis method for computer speech signals that improves the accuracy of word synthesis, makes the pronunciation closer to natural human speech, and increases synthesis speed, thereby overcoming the shortcomings of the conventional methods described above when synthesizing English words.

The pronunciation synthesis method for computer speech signals of the present invention comprises:

First, a person's correct pronunciation of a word is input into a voice receiving device, and the word's speech signal is sampled by an analog-to-digital converter to produce digital speech data for the word;

Using a sound editor, one or more diphones are cut from the data according to the position of each vowel or consonant and its interaction with the preceding and following vowels or consonants, each diphone being the transition portion of two adjacent syllables from the middle of the preceding syllable to the middle of the following syllable;

For each diphone obtained, a sound-quality correction device adjusts the speech signals of the same diphone occurring in different words, and the diphone's speech signal is recorded into a pronunciation database, so that the collected diphones are better suited as basic units for synthesizing the speech of different words;

To synthesize the pronunciation of a word from diphones, the computer first reads in the word, analyzes it to obtain its IPA transcription, decomposes the transcription into diphones, and converts them into diphone serial numbers. Using these serial numbers, the computer retrieves the corresponding digital speech signals from the recorded pronunciation database and decompresses them with a decompression program to obtain the diphones' speech signals. Finally, the speech signals are concatenated and smoothed to produce the correct pronunciation of the word.

Description of drawings:

Fig. 1 is a flowchart of collecting diphone units in the present invention;

Fig. 2 is a schematic diagram illustrating the analysis and composition of diphone units in the present invention;

Fig. 3 is a flowchart of synthesizing the pronunciation of a word from diphone units in the present invention;

Figs. 4 and 5 show the waveform and the corresponding energy spectrum of the vowel "O";

Figs. 6 and 7 show the waveform and the corresponding energy spectrum of the vowel "O" after its pitch has been lowered.

A preferred embodiment of the present invention is described in detail below with reference to the accompanying drawings.

The present invention uses diphones as the basic units for synthesizing the pronunciation of English words. A diphone is the transition portion between two adjacent syllables of an English word, running from the middle of the preceding syllable to the middle of the following syllable. For example, the IPA transcription of the word "HELLO" is <halo>, and the transitions between its adjacent segments are shown below:

[Figure C9711008200041: the transition portions of <halo>]

The symbol * denotes silence. In IPA notation, the word "HELLO" is therefore composed of the diphones <*h>, <ha>, <al>, <lo> and <o*>.
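The decomposition of <halo> into diphones can be illustrated with a short Python sketch. It assumes the phonetic transcription is already available as a list of phoneme symbols; the function name `to_diphones` is hypothetical, and labels are formed by plain string concatenation to match the <ha> notation used here.

```python
def to_diphones(phonemes, silence="*"):
    """Split a phoneme sequence into diphone labels.

    Both ends are padded with a silence symbol, so a word with n phonemes
    yields n + 1 diphones.
    """
    padded = [silence] + list(phonemes) + [silence]
    # Each diphone joins one phoneme to the next one.
    return [left + right for left, right in zip(padded, padded[1:])]


print(to_diphones(["h", "a", "l", "o"]))  # ['*h', 'ha', 'al', 'lo', 'o*']
```

If multi-character IPA symbols were used, tuples of symbols would be a safer label than concatenated strings.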

The pronunciation of an English word is thus made up of diphone units. The method of collecting diphones is shown in Fig. 1. A person first speaks the word with correct pronunciation into a voice receiving device; the speech signal is sampled by an analog-to-digital converter to produce the word's digital speech data; and a sound editor then segments the data according to the method of the present invention, cutting out the diphones that make up the word's speech signal. Because the same diphone may still be pronounced somewhat differently in different words, a sound-quality correction device adjusts the speech signals of the same diphone across words, making the resulting diphones more suitable as basic units for synthesizing different words. Finally, the collected diphones are recorded into a pronunciation database using recording and compression techniques, and during synthesis the diphones in this database are used to produce the correct pronunciation of words.
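The description does not say what the sound-quality correction device actually adjusts. As one hedged illustration, assuming that part of the correction is loudness matching, the following Python sketch scales several recordings of the same diphone to a common RMS level; `rms` and `equalize_level` are hypothetical names, and using the mean level as the target is an assumption.

```python
import math


def rms(samples):
    """Root-mean-square level of a non-empty list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def equalize_level(recordings, target_rms=None):
    """Scale several recordings of the same diphone to one RMS level.

    If no target is given, the mean RMS of the recordings is used.
    Silent recordings (RMS of 0) are returned unchanged.
    """
    levels = [rms(r) for r in recordings]
    if target_rms is None:
        target_rms = sum(levels) / len(levels)
    return [
        [x * target_rms / level for x in r] if level > 0 else list(r)
        for r, level in zip(recordings, levels)
    ]


quiet = [0.1, -0.1, 0.1, -0.1]
loud = [0.4, -0.4, 0.4, -0.4]
for r in equalize_level([quiet, loud]):
    print(round(rms(r), 3))  # 0.25 for both
```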

Following this diphone principle, the present invention derives about 1,600 diphones from 80,000 English words and uses them to synthesize word pronunciations. How closely the synthesized speech approaches a real human voice therefore depends entirely on how these diphones are collected, and obtaining suitable diphones is the key factor in the sound quality of the diphone synthesis method. Accordingly, when the diphone pronunciation database is recorded, the speed (duration) and volume of each diphone must be properly controlled.
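Because many words share the same diphones, the inventory grows far more slowly than the vocabulary, which is how about 1,600 units can cover 80,000 words. A minimal sketch of collecting and numbering the distinct diphones of a lexicon, assuming a word-to-phonemes dictionary; the `build_inventory` function and the toy lexicon are hypothetical, and the sorted numbering is not the numbering used in the patent's database.

```python
def build_inventory(lexicon, silence="*"):
    """Collect the distinct diphones used by a lexicon and number them.

    `lexicon` maps each word to its list of phonemes; the result maps each
    diphone label to a serial number.
    """
    seen = set()
    for phonemes in lexicon.values():
        padded = [silence] + list(phonemes) + [silence]
        seen.update(a + b for a, b in zip(padded, padded[1:]))
    # Sorting makes the numbering reproducible from run to run.
    return {diphone: n for n, diphone in enumerate(sorted(seen))}


toy_lexicon = {"hello": ["h", "a", "l", "o"], "low": ["l", "o"]}
inventory = build_inventory(toy_lexicon)
print(len(inventory))  # 6: '*h', 'ha', 'al', 'lo', 'o*', '*l'
```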

The diphone units of the present invention are composed of the most basic vowels and consonants of the English IPA, in the combinations consonant-vowel, vowel-consonant, vowel-vowel and consonant-consonant. Vowels and consonants each have characteristic acoustic properties: vowels have larger amplitudes, more regular waveforms and more clearly defined periods, whereas consonants have smaller amplitudes, irregular waveforms and little periodicity.
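The contrast between periodic vowels and irregular consonants can be quantified. The sketch below uses the peak normalized autocorrelation, a common periodicity measure that the description itself does not name; the 8 kHz sampling rate, the lag range and the function name `periodicity` are assumptions, and a synthetic tone and uniform noise stand in for real vowel and consonant recordings.

```python
import math
import random


def periodicity(x, min_lag, max_lag):
    """Peak normalized autocorrelation over lags min_lag..max_lag.

    Close to 1 for a regular, clearly periodic waveform (vowel-like);
    close to 0 for an irregular, noise-like waveform.
    """
    energy = sum(v * v for v in x)
    if energy == 0:
        return 0.0
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        best = max(best, r / energy)
    return best


fs = 8000  # assumed sampling rate in Hz
vowel_like = [math.sin(2 * math.pi * 100 * n / fs) for n in range(800)]
rng = random.Random(0)
noise_like = [rng.uniform(-0.2, 0.2) for _ in range(800)]
print(periodicity(vowel_like, 40, 200) > periodicity(noise_like, 40, 200))  # True
```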

However, the amplitude of both consonants and vowels generally rises from low to high and then falls back. To ensure that the sampled diphones have a sufficient range of variation and correlation, the speech segments from which diphones are cut are selected in the following steps (see Fig. 2):

1) Prepare a large speech corpus and obtain its parameter information: phoneme label (PhonemeLabel), pitch level (PitchLevel) and energy level (PowerLevel).

2) Perform 16th-order LPC spectral analysis on the corpus.

3) For the speech segments sharing a phoneme label, compute the average spectral characteristic; the resulting average, AverageK, is a weighted sum of the spectral parameters.

4) Use the speech segment whose spectral characteristic is closest to AverageK as the synthesis unit data.

5) Once the speech segment is selected, cut the diphones.
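Steps 2 to 4 can be sketched in plain Python. The description gives the LPC order (16) but not the weighting used for AverageK or the distance measure, so this sketch assumes an unweighted mean of the LPC coefficient vectors and squared Euclidean distance, and omits pre-emphasis and windowing for brevity. The `library` layout (phoneme label to list of sample lists) and all function names are hypothetical.

```python
def autocorrelation(x, order):
    """r[k] = sum_n x[n] * x[n + k] for k = 0..order."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k)) for k in range(order + 1)]


def lpc(x, order=16):
    """LPC coefficients a[1..order] by the autocorrelation method (Levinson-Durbin).

    Convention: x[n] is predicted as -(a[1] x[n-1] + ... + a[order] x[n-order]).
    """
    r = autocorrelation(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 1e-12 * max(r[0], 1e-300):
            break  # signal already fully predicted; remaining coefficients stay 0
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1 - k * k
    return a[1:]


def select_units(library, order=16):
    """For each phoneme label, return the index of the segment whose LPC vector
    is closest (squared Euclidean distance) to the label's mean LPC vector."""
    chosen = {}
    for label, segments in library.items():
        vectors = [lpc(seg, order) for seg in segments]
        mean = [sum(col) / len(vectors) for col in zip(*vectors)]
        distances = [sum((v - m) ** 2 for v, m in zip(vec, mean)) for vec in vectors]
        chosen[label] = distances.index(min(distances))
    return chosen


# Toy check with first-order decays standing in for real segments.
segments = [[c ** n for n in range(200)] for c in (0.5, 0.7, 0.9)]
print(select_units({"o": segments}, order=1))  # {'o': 1}
```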

The following rules must be observed when cutting diphones:

1) Cut from the peak of the preceding syllable to the peak of the following syllable.

2) Because an English word is formed by concatenating several diphones, the amplitudes and durations of the diphones must closely match.

3) To keep waveform periods intact when diphones are concatenated, both the start and the end of each diphone must fall at the start of a waveform period; that is, both ends of the phonemes forming the diphone are period starting points, and the waveforms must be in the same phase where they join. Otherwise, if the preceding phoneme ends on a rising slope (positive rate of change) and the next phoneme immediately starts on a falling slope (negative rate of change), audible noise results.

4) The same syllable appearing in different diphones should have approximately the same period, so that the intonation is consistent when the diphones are concatenated.
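Rule 3 requires each cut to fall at the start of a waveform period so that joined segments meet in the same phase. A minimal sketch, assuming that an upward zero crossing marks the start of a period; the description does not say how period starts are located, and a production system would more likely rely on pitch marks. The function names are hypothetical.

```python
import math


def period_starts(x):
    """Indices where the waveform crosses zero going upward.

    Treated here as the starting points of waveform periods.
    """
    return [i for i in range(1, len(x)) if x[i - 1] < 0 <= x[i]]


def snap_to_period_start(x, target):
    """Move a desired cut position to the nearest period starting point."""
    starts = period_starts(x)
    if not starts:
        raise ValueError("no upward zero crossing found")
    return min(starts, key=lambda i: abs(i - target))


# An 80-sample period (100 Hz at an assumed 8 kHz rate); the half-sample
# phase offset keeps samples off exact zeros.
tone = [math.sin(2 * math.pi * (n + 0.5) / 80) for n in range(400)]
print(period_starts(tone))              # [80, 160, 240, 320]
print(snap_to_period_start(tone, 130))  # 160
```

Cutting both neighbouring diphones at upward crossings means the waveform is rising on both sides of the join, which avoids the positive-to-negative slope reversal the rule warns about.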

Compared with conventionally used half-phones and single phones, the diphones of the present invention are cut from the stable segments of the syllables of English words, so they preserve as much of the variation between syllables as possible. The invention can therefore synthesize computer speech for English words that sounds closer to a real human speaker.

Taking the English word "HELLO" as an example, diphone segmentation in the present invention proceeds as follows:

1) First, find the correct IPA transcription <halo> of the English word "HELLO";

2) Then, according to the position of each vowel or consonant in <halo> and its interaction with the preceding and following vowels or consonants, split the transcription by pronunciation rules into the segments <*h>, <ha>, <al>, <lo> and <o*>, where * denotes silence. These segments are what the present invention calls diphones.

Note in particular that each segment is cut at the midpoint of the stable portion of a pure phoneme. When the segments are concatenated, each join therefore connects two instances of the same phoneme, which makes the transition smooth.
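The description does not specify how the stable portion of a phoneme is detected. The following sketch assumes frame energy as a proxy: it finds the longest run of frames whose RMS stays within 5% of the loudest frame and returns the sample at the centre of that run. The 80-sample frame (10 ms at an assumed 8 kHz rate), the tolerance and the function names are assumptions.

```python
import math


def frame_rms(x, frame):
    """RMS level of consecutive non-overlapping frames."""
    return [
        math.sqrt(sum(v * v for v in x[s:s + frame]) / frame)
        for s in range(0, len(x) - frame + 1, frame)
    ]


def stable_midpoint(x, frame=80, tol=0.05):
    """Sample index at the centre of the longest run of frames whose RMS is
    within `tol` (as a fraction) of the loudest frame."""
    levels = frame_rms(x, frame)
    peak = max(levels)
    if peak == 0:
        raise ValueError("signal is silent")
    best_start, best_len, run_start = 0, 0, None
    for i, level in enumerate(levels + [0.0]):  # trailing 0.0 closes the last run
        if level >= (1 - tol) * peak:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = None
    return best_start * frame + best_len * frame // 2


# Rise over 400 samples, hold for 800, decay over 400 (80-sample-period tone).
env = [n / 400 for n in range(400)] + [1.0] * 800 + [1 - n / 400 for n in range(400)]
x = [e * math.sin(2 * math.pi * n / 80) for n, e in enumerate(env)]
print(stable_midpoint(x))  # 800, the centre of the held portion
```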

The processing steps for synthesizing word speech from diphones are shown in Fig. 3. The computer first reads in the word, analyzes it to obtain its IPA transcription, decomposes the transcription into diphones, and converts them into diphone serial numbers. It then uses the serial numbers to look up the corresponding digitally encoded speech signals in the pronunciation database recorded by the present invention. If a signal is found, it is extracted and decompressed with a decompression program to obtain the diphone's speech data. The speech data are then concatenated and smoothed to synthesize the correct pronunciation of the word.
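The lookup-and-concatenate flow of Fig. 3 can be sketched as follows. The description does not name its compression format at this point, so zlib-compressed 16-bit PCM is used purely as a stand-in for the decompression program; the dictionaries `serials` and `database` and all function names are hypothetical, and the smoothing step is left out.

```python
import array
import zlib


def encode_unit(samples):
    """Store one diphone as zlib-compressed 16-bit PCM (stand-in codec)."""
    return zlib.compress(array.array("h", samples).tobytes())


def decode_unit(blob):
    """Decompress a stored diphone back to a list of 16-bit samples."""
    pcm = array.array("h")
    pcm.frombytes(zlib.decompress(blob))
    return pcm.tolist()


def synthesize(phonemes, serials, database, silence="*"):
    """Phonemes -> diphones -> serial numbers -> decompressed PCM, concatenated.

    `serials` maps diphone labels to serial numbers and `database` maps serial
    numbers to compressed units. A missing diphone raises KeyError.
    """
    padded = [silence] + list(phonemes) + [silence]
    diphones = [a + b for a, b in zip(padded, padded[1:])]
    speech = []
    for d in diphones:
        speech.extend(decode_unit(database[serials[d]]))
    return speech


# Serial numbers taken from the "HELLO" example; the two-sample "recordings"
# are placeholders.
serials = {"*h": 12, "ha": 19, "al": 23, "lo": 33, "o*": 78}
database = {n: encode_unit([n, -n]) for n in serials.values()}
print(synthesize(["h", "a", "l", "o"], serials, database))
# [12, -12, 19, -19, 23, -23, 33, -33, 78, -78]
```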

For example, let S(i) denote the speech signal obtained by concatenating these data, and apply mean smoothing filtering to S(i) over three adjacent frames of the signal (one frame being one sampling period). The smoothed value of the current frame is S'(i) = A1·S(p) + A2·S(i) + A3·S(s), where:

A1, A2, A3 - weighting coefficients

S(p) - speech data of the previous frame

S(s) - speech data of the next frame
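Because a frame here is a single sampling period, the smoothing amounts to a three-tap weighted moving average. The weights are not given in the description; the sketch below assumes A1 = A3 = 0.25 and A2 = 0.5, which sum to 1 so that a constant signal passes through unchanged. `smooth3` is a hypothetical name.

```python
def smooth3(s, a1=0.25, a2=0.5, a3=0.25):
    """Three-point weighted smoothing: out[i] = a1*s[i-1] + a2*s[i] + a3*s[i+1].

    Every output value is computed from the unsmoothed input, and the first and
    last samples, which lack a neighbour, are copied unchanged.
    """
    out = list(s)
    for i in range(1, len(s) - 1):
        out[i] = a1 * s[i - 1] + a2 * s[i] + a3 * s[i + 1]
    return out


print(smooth3([0, 0, 4, 0, 0]))  # [0, 1.0, 2.0, 1.0, 0]
```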

Because the speech signal is stored with pitch synchronized differential coding (PSDC), which is based on pulse code modulation (PCM), pitch can easily be controlled during synthesis. To change the period of the speech signal from the original length Torg to the target length Ttar, a Hamming window W(i) of length T = 2Torg is used, and the transformed signal is S'(i) = W(i)S(i) + W(T/2 - i)S(i + a), where a = Ttar - Torg. To avoid degrading the synthesized speech, the target period is restricted to Torg/2 < Ttar < 2Torg.
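The pitch-modification formula can be transcribed literally for one period, as below. This is not a complete pitch-shifting routine: the description does not say how successive modified periods are laid out to reach the target period or how the window gain is normalized, and with the periodic Hamming window assumed here the two window terms sum to 1.08 when a = 0, so the formula as written also scales the signal. The function names are hypothetical.

```python
import math


def hamming(n):
    """Periodic Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / n) for i in range(n)]


def pitch_period(s, start, t_org, t_tar):
    """Evaluate S'(i) = W(i)*S(start+i) + W(T/2 - i)*S(start+i+a) for i < t_org.

    T = 2*t_org is the window length and a = t_tar - t_org, following the
    formula in the description; only one period is computed.
    """
    if not t_org / 2 < t_tar < 2 * t_org:
        raise ValueError("need Torg/2 < Ttar < 2*Torg")
    T = 2 * t_org
    a = t_tar - t_org
    if start + min(a, 0) < 0 or start + t_org + max(a, 0) > len(s):
        raise IndexError("not enough samples around this period")
    w = hamming(T)
    return [w[i] * s[start + i] + w[T // 2 - i] * s[start + i + a] for i in range(t_org)]


ramp = [float(k) for k in range(200)]
same = pitch_period(ramp, 50, 40, 40)  # a = 0
print(round(same[10] / ramp[60], 2))   # 1.08
```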

Figs. 4 and 5 show the waveform and the corresponding energy spectrum of the vowel "O".

Figs. 6 and 7 show the waveform and the corresponding energy spectrum of the vowel "O" after pitch lowering. Comparison with Figs. 4 and 5 shows that the transformed signal retains the speech characteristics of the original signal in all frequency bands, with very little distortion.

Again taking the word "HELLO", whose IPA transcription is <halo>, the present invention synthesizes its speech from diphones as follows:

1) Split the transcription <halo> into the diphones <*h>, <ha>, <al>, <lo> and <o*>;

2) Using the serial numbers of these diphones in the pronunciation database (12, 19, 23, 33 and 78, respectively), extract their digital speech signals from the database;

3) Decompress the extracted digital speech signals with a decompression program to obtain the diphones' speech signals, then concatenate and smooth them to synthesize the correct pronunciation of the word.

The above is only a preferred embodiment of the present invention, and the scope of the claims is not limited to it. Any modification or equivalent change made by a person skilled in the art based on the technical content disclosed herein falls within the protection scope of the present invention.

Claims (4)

1. A pronunciation synthesis method for a computer speech signal, comprising:
first, inputting a person's correct pronunciation of a word into a voice receiving device, the word's speech signal being sampled by an analog-to-digital converter to produce digital speech data of the word;
using a sound editor, cutting one or more diphones from the data according to the position of each vowel or consonant and its interaction with the preceding and following vowels or consonants, each diphone being the transition portion of two adjacent syllables from the middle of the preceding syllable to the middle of the following syllable, on the principle that the amplitudes of the diphones must be the same and their lengths must be equal;
for each diphone cut out, adjusting, with a sound-quality correction device, the speech signals of the same diphone in different words, and recording the speech signal of the diphone into a pronunciation database, so that the diphones collected in the database are better suited as basic units for synthesizing the speech of different words;
when synthesizing the pronunciation of a word from diphones, first reading in the word by a computer, analyzing the word to obtain its IPA transcription, decomposing the transcription into diphones and converting them into diphone serial numbers, whereupon the computer extracts the corresponding digital speech signals from the recorded pronunciation database according to the serial numbers and decompresses them with a decompression program to obtain the speech signals of the diphones, and then concatenates and smooths the obtained speech signals, thereby synthesizing the correct pronunciation of the word.
2. The pronunciation synthesis method for a computer speech signal according to claim 1, wherein the diphones are cut from the peak of the preceding syllable to the peak of the following syllable.
3. The pronunciation synthesis method for a computer speech signal according to claim 1, wherein both ends of the phonemes forming each diphone are starting points of a waveform period, and the waveforms are in the same phase where they join.
4. The pronunciation synthesis method for a computer speech signal according to claim 1, wherein the same syllable in different diphones has the same period.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN 97110082 | 1997-04-14 | 1997-04-14 | Pronunciation Synthesis Method of Computer Speech Signal

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN 97110082 | 1997-04-14 | 1997-04-14 | Pronunciation Synthesis Method of Computer Speech Signal

Publications (2)

Publication Number | Publication Date
CN1196531A | 1998-10-21
CN1111811C | 2003-06-18

Family

ID=5171305

Family Applications (1)

Application Number | Title | Priority Date | Filing Date | Status
CN 97110082 | Pronunciation Synthesis Method of Computer Speech Signal | 1997-04-14 | 1997-04-14 | Expired - Fee Related

Country Status (1)

Country Link
CN (1) CN1111811C (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
KR100686085B1 * | 1999-03-22 | 2007-02-23 | 엘지전자 주식회사 | Imaging device with learning function and control method
US7693715B2 * | 2004-03-10 | 2010-04-06 | Microsoft Corporation | Generating large units of graphonemes with mutual information criterion for letter to sound conversion
CN109389968B * | 2018-09-30 | 2023-08-18 | 平安科技(深圳)有限公司 | Waveform splicing method, device, equipment and storage medium based on double syllable mixing and lapping
CN112071299B * | 2020-09-09 | 2024-07-19 | 腾讯音乐娱乐科技(深圳)有限公司 | Neural network model training method, audio generation method and device and electronic equipment
CN112530404A * | 2020-11-30 | 2021-03-19 | 深圳市优必选科技股份有限公司 | Voice synthesis method, voice synthesis device and intelligent equipment

Also Published As

Publication number | Publication date
CN1196531A | 1998-10-21


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20030618

Termination date: 20110414