
JPS63121899A - Voice improvement - Google Patents

Voice improvement

Info

Publication number
JPS63121899A
JPS63121899A (application numbers JP61266704A, JP26670486A)
Authority
JP
Japan
Prior art keywords
hoarseness
pitch
analysis
normal
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP61266704A
Other languages
Japanese (ja)
Inventor
尚夫 桑原
徹 都木
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Japan Broadcasting Corp
Original Assignee
Nippon Hoso Kyokai NHK
Japan Broadcasting Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Hoso Kyokai NHK, Japan Broadcasting Corp filed Critical Nippon Hoso Kyokai NHK
Priority to JP61266704A priority Critical patent/JPS63121899A/en
Publication of JPS63121899A publication Critical patent/JPS63121899A/en
Pending legal-status Critical Current

Links

Abstract

(57) [Abstract] This publication contains application data filed before the electronic filing system was introduced, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

[Industrial Application Field] The present invention improves the voice of a speaker whose vocal cords do not function normally due to illness or other causes (the so-called hoarse voice). By adding pitch information, the voice is brought closer to normally phonated speech, enabling a qualitative transformation of the hoarse voice.

[Prior Art] In the medical field, speech-assisting devices such as artificial larynxes have long been developed for people who have lost their larynx for some reason. For people whose vocal cords are abnormal but have not been removed, however, there are few remedies other than vibration devices that apply mechanical vibration to the vocal cords from outside. Moreover, in such a device the vibration period (pitch) is fixed, so the intonation cannot be varied to match the content of the utterance.

[Problems to Be Solved by the Invention] Until now, there has been no attempt to improve a hoarse voice by purely engineering means, that is, by artificially processing the uttered speech without imposing any constraints on the speaker. The present invention makes such improvement possible to some extent, and aims to bring the voice of a speaker with abnormal vocal fold vibration closer to that of a normal speaker through computer processing.

[Means for Solving the Problems] To achieve this object, the hoarseness improvement method of the present invention extracts spectral information from the voiced portions of a hoarse speech signal to construct a vocal tract filter, and synthesizes speech by inputting into that filter pitch information extracted from a speech signal other than the hoarse speech signal.

[Function] According to the present invention, the pitch information that is deficient in a hoarse voice is extracted from normal speech or from an artificial vocal fold waveform and combined with the spectral information extracted from the hoarse voice, thereby improving the hoarse voice.

[Embodiment] The present invention will now be described in detail with reference to the drawings.

FIG. 1 shows a block diagram of the present system. The whole system consists of three parts, an analysis section 1, a feature extraction section 2, and a synthesis section 3, all executed by digital signal processing in a computer. The essence of the present invention is the improvement of hoarse speech, but since the system uses, in addition to the hoarse voice to be improved, information from normal speech with the same content, it in principle requires two inputs, the hoarse voice and the normal voice, as shown in the figure. If normal speech with the same content cannot be obtained, an artificial vocal fold waveform can be created and supplied to the synthesis section as a second-best measure.

The analysis section 1 performs analysis (1) on the hoarse voice and analysis (2) on the normal voice.

First, analysis (1), the analysis of the hoarse voice, will be explained.

The hoarse speech signal sampled by an A/D converter is divided into speech and silence intervals, and the speech intervals are further classified into unvoiced and voiced intervals. Unvoiced intervals are recorded as they are, while so-called linear prediction analysis is applied to voiced intervals to obtain the linear prediction coefficients, which carry the spectral information, and the residual signal, which carries the pitch information. In analysis (1), the analysis of the hoarse voice, the frame length (analysis window width) is fixed at 20 to 30 milliseconds and the frame period is half the frame length, as shown in FIG. 2.
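The fixed-frame linear prediction analysis described above can be sketched in Python. This is a minimal illustration, not the patent's implementation: the sampling rate, 25 ms frame length, LPC order 12, and the synthetic test signal are all illustrative assumptions.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc(frame, order=12):
    """Autocorrelation-method LPC: solve the normal equations R a = r."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz(r[:order], r[1:order + 1])   # symmetric Toeplitz solve
    return np.concatenate(([1.0], -a))              # inverse filter A(z)

fs = 8000
frame_len = int(0.025 * fs)          # fixed 25 ms analysis window
hop = frame_len // 2                 # frame period = half the frame length
rng = np.random.default_rng(0)
t = np.arange(fs) / fs
# Stand-in for a voiced interval: a 150 Hz tone plus a little noise.
x = np.sin(2 * np.pi * 150 * t) + 0.1 * rng.standard_normal(fs)

frames = [x[i:i + frame_len] for i in range(0, len(x) - frame_len, hop)]
coeffs = [lpc(f * np.hanning(frame_len)) for f in frames]

# Residual (pitch-bearing excitation) of the first frame: e = A(z) * x
residual = lfilter(coeffs[0], [1.0], frames[0])
print(len(frames), len(coeffs[0]))
```

The prediction coefficients carry the spectral envelope; inverse filtering with A(z) leaves the residual that the feature extraction stage discards for the hoarse voice.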

Normal speech consists of a fundamental and its harmonics and has clear pitch information, so analysis synchronized to the pitch is possible. In a hoarse voice, however, the pitch information is incomplete, so the frame length is fixed as described above and the analysis proceeds by shifting the window one frame period at a time.

The next stage, the feature extraction section, extracts only the spectral coefficients (prediction coefficients) obtained by linear prediction analysis with the frame length and frame period of FIG. 2, and discards the pitch information (residual signal).

Next, analysis (2), the analysis of normal speech with the same content as the hoarse voice, is performed. In this case, pitch-synchronous analysis is first applied to the voiced portions before the usual analysis. That is, as shown in FIG. 3, the pitch period is detected and that interval is taken as the frame length; in this case the frames do not overlap. The length of each voiced portion of the normal speech is then compared with that of the corresponding voiced portion of the hoarse voice, and the normal speech is stretched or compressed so that the lengths match to within one pitch period. FIG. 4 shows the details of the analysis (2) part of FIG. 1.
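The patent does not specify how the pitch period is detected; a common choice, sketched below under that assumption, is to pick the autocorrelation peak within a plausible lag range. The search band (60 to 400 Hz) and the test tone are illustrative.

```python
import numpy as np

def pitch_period(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate the pitch period (in samples) of a voiced frame by
    picking the autocorrelation peak inside the plausible lag range."""
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    return lo + int(np.argmax(ac[lo:hi]))

fs = 8000
t = np.arange(int(0.04 * fs)) / fs          # one 40 ms frame
frame = np.sin(2 * np.pi * 100 * t)         # 100 Hz voiced-sound stand-in
T = pitch_period(frame, fs)
print(T)                                    # approximately fs / 100 = 80 samples
```

Each detected period then becomes one non-overlapping analysis frame, as FIG. 3 describes.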

Adjusting the length after pitch-synchronous analysis has the advantage that stretching and compression can be done in units of one pitch period, and can be spread evenly over the whole voiced interval rather than concentrated in one part. That is, as shown in FIG. 5, when a voiced portion of the normal speech is shorter than that of the hoarse voice, the immediately preceding pitch interval is duplicated at several equally spaced points; when it is longer, several pitch intervals are thinned out at equal spacing to match the lengths.
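The equal-spacing insertion and deletion of pitch periods can be sketched as follows. The function name and the exact index arithmetic are illustrative assumptions; the patent only requires that whole pitch periods be duplicated (or dropped) at roughly even positions across the voiced interval.

```python
import numpy as np

def match_length(periods, target_count):
    """Stretch or shrink a voiced interval, given as a list of
    pitch-period segments, by duplicating the preceding segment
    (or dropping segments) at equally spaced positions."""
    periods = list(periods)                  # work on a copy
    diff = target_count - len(periods)
    if diff > 0:                             # too short: duplicate periods
        step = len(periods) / diff
        for k in range(diff):
            idx = int(k * step) + k          # +k accounts for earlier inserts
            periods.insert(idx + 1, periods[idx])
    elif diff < 0:                           # too long: thin out periods
        step = len(periods) / (-diff)
        for k in range(-diff):
            del periods[int(k * step) - k]   # -k accounts for earlier deletes
    return periods

segs = [np.full(80, i) for i in range(6)]    # six dummy pitch periods
out = match_length(segs, 8)                  # stretch: 6 -> 8 periods
out2 = match_length(segs, 4)                 # shrink: 6 -> 4 periods
print(len(out), len(out2))
```

Because whole periods are inserted or removed, the adjustment never cuts through a glottal cycle, which is the point of doing it after pitch-synchronous analysis.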

In this way the length of the normal speech is adjusted for each voiced interval so that it becomes nearly equal to the corresponding voiced interval of the hoarse voice. After that, the same analysis as for the hoarse voice (analysis (1) in FIG. 1) is performed to obtain spectral information (prediction coefficients) and pitch information (residual signal), but the spectral information is not needed and is discarded. In the synthesis section of FIG. 1, a vocal tract filter is constructed from the spectral information obtained from the hoarse voice, that is, the linear prediction coefficients, and synthesized speech is obtained by inputting into that filter the pitch information obtained from the normal speech, that is, the residual signal. Incidentally, in FIG. 1, if the pitch information is taken from the hoarse voice, the output restores the original hoarse voice unchanged; conversely, if the spectral information is taken from the normal speech, the original normal speech is restored. A hoarse voice has no pitch information to begin with, or at best extremely incomplete pitch information, but by substituting the residual signal of a normal speaker in this way at synthesis time, pitch information can be added and a voice with a timbre close to normal speech is obtained.

However, normal speech with the same content is not always available. In such a case, as shown in FIG. 6, an artificial vocal fold waveform modeled on a triangular wave is created, and the pitch information obtained from it is supplied to the vocal tract filter. Since the period T of the triangular wave provides the pitch information, T is set to an average value determined by the sex and age of the hoarse speaker. Rise and fall times (T1 and T2) of a few milliseconds each are appropriate for the triangular wave. In this case, however, because an artificial vocal fold wave is used, the resulting synthesized output has a mechanical quality and sounds considerably removed from a natural voice.
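A triangular artificial vocal fold waveform with period T, rise T1, and fall T2 can be generated as below. The specific values (8 ms period, roughly a male average pitch, and 3 to 4 ms rise and fall) are illustrative choices consistent with, but not taken from, the text.

```python
import numpy as np

def triangular_glottal_wave(fs, T, T1, T2, duration):
    """Artificial vocal fold waveform: a train of triangular pulses with
    period T, rise time T1, and fall time T2 (all in seconds)."""
    n_period = int(T * fs)
    n_rise, n_fall = int(T1 * fs), int(T2 * fs)
    pulse = np.zeros(n_period)                       # flat closed phase
    pulse[:n_rise] = np.linspace(0.0, 1.0, n_rise, endpoint=False)
    pulse[n_rise:n_rise + n_fall] = np.linspace(1.0, 0.0, n_fall,
                                                endpoint=False)
    return np.tile(pulse, int(duration / T))

fs = 8000
w = triangular_glottal_wave(fs, T=0.008, T1=0.003, T2=0.004, duration=0.5)
print(w.shape)
```

This waveform replaces the normal speaker's residual as the input to the vocal tract filter when no matching normal speech exists.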

FIG. 7 shows examples of the waveform of a hoarse voice and of voices improved by the two methods described above. Part (A) is the hoarse voice, (B) is the waveform improved using normal speech, and (C) is the waveform improved using a triangular wave. Note that because changing the vocal tract characteristics in this way deforms the waveform somewhat, discontinuities may arise at the frame boundaries during synthesis.

Therefore, as shown in FIG. 8, a triangular window of amplitude 1 is applied to the synthesized output of each frame, set so that the sum of the gains in the overlapping parts of adjacent frames is always 1, and the overlapping waveforms are added together.
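The triangular-window overlap-add step can be sketched as follows. With a hop of half the frame length, the triangular windows of adjacent frames sum to (very nearly) 1 throughout the overlap, which is the constant-gain condition the text describes; the frame values here are dummies.

```python
import numpy as np

def overlap_add(frames, hop):
    """Overlap-add per-frame synthesis outputs under a triangular window.
    With hop = frame_len // 2, adjacent windows sum to ~1 in the overlap,
    preserving waveform continuity at frame joints."""
    frame_len = len(frames[0])
    win = np.bartlett(frame_len)           # triangular window, peak 1
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    for i, f in enumerate(frames):
        out[i * hop:i * hop + frame_len] += win * f
    return out

frame_len, hop = 200, 100
frames = [np.ones(frame_len) for _ in range(5)]   # dummy per-frame outputs
y = overlap_add(frames, hop)
# In the fully overlapped interior the summed gain stays close to 1.
print(round(float(y[frame_len]), 3))
```

Crossfading this way hides the small discontinuities that the vocal-tract substitution introduces at frame joints.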

This operation preserves the continuity of the waveform and is effective for obtaining smooth synthesized speech.

[Effects of the Invention] As explained above, according to the present invention, the pitch information that is deficient in a hoarse voice is extracted from normal speech or from an artificial vocal fold waveform and combined with the spectral information extracted from the hoarse voice, thereby improving the hoarse voice.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of the present invention; FIG. 2 is a waveform diagram explaining the frame and analysis window width in the analysis of the hoarse voice; FIG. 3 is a waveform diagram explaining pitch-synchronous analysis of normal speech; FIG. 4 is a block diagram of the analysis of normal speech; FIG. 5 is a diagram showing the method of adjusting the length of the voiced intervals of normal speech; FIG. 6 is a diagram of the artificial vocal fold waveform; FIG. 7(A) is a waveform diagram of the hoarse voice, FIG. 7(B) a waveform diagram of the voice improved using normal speech, and FIG. 7(C) a waveform diagram of the voice improved using a triangular wave; FIG. 8 is a schematic diagram showing the method for preserving the continuity of the synthesized waveform. 1: analysis section; 2: feature extraction section; 3: synthesis section.

Claims (1)

[Claims]
1) A hoarseness improvement method characterized by extracting spectral information from the voiced portions of a hoarse speech signal to construct a vocal tract filter, and synthesizing speech by inputting into the vocal tract filter pitch information extracted from a speech signal other than the hoarse speech signal.
2) The hoarseness improvement method according to claim 1, characterized in that the pitch information is extracted from the voiced portions of a normal speech signal having the same content as the hoarse speech signal.
3) The hoarseness improvement method according to claim 2, characterized in that the pitch information is extracted after the length of the voiced portions of the normal speech signal has been matched to the length of the voiced portions of the hoarse speech signal.
4) The hoarseness improvement method according to claim 1, characterized in that the pitch information is extracted from a triangular wave simulating a vocal fold waveform.
5) The hoarseness improvement method according to claim 4, characterized in that the period of the triangular wave is set to an average pitch determined by the sex and age of the person producing the hoarse voice.
JP61266704A 1986-11-11 1986-11-11 Voice improvement Pending JPS63121899A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP61266704A JPS63121899A (en) 1986-11-11 1986-11-11 Voice improvement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP61266704A JPS63121899A (en) 1986-11-11 1986-11-11 Voice improvement

Publications (1)

Publication Number Publication Date
JPS63121899A true JPS63121899A (en) 1988-05-25

Family

ID=17434523

Family Applications (1)

Application Number Title Priority Date Filing Date
JP61266704A Pending JPS63121899A (en) 1986-11-11 1986-11-11 Voice improvement

Country Status (1)

Country Link
JP (1) JPS63121899A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2016002246A (en) * 2014-06-17 2016-01-12 株式会社電制 Electric type artificial larynx


Similar Documents

Publication Publication Date Title
JP3078205B2 (en) Speech synthesis method by connecting and partially overlapping waveforms
Schroeder Vocoders: Analysis and synthesis of speech
EP0982713A2 (en) Voice converter with extraction and modification of attribute data
WO2014046789A1 (en) System and method for voice transformation, speech synthesis, and speech recognition
Bonada et al. Sample-based singing voice synthesizer by spectral concatenation
RU2296377C2 (en) Method for analysis and synthesis of speech
JPH05307399A (en) Voice analysis system
US6594631B1 (en) Method for forming phoneme data and voice synthesizing apparatus utilizing a linear predictive coding distortion
Acero Source-filter models for time-scale pitch-scale modification of speech
JPS63121899A (en) Voice improvement
JP2841797B2 (en) Voice analysis and synthesis equipment
JP2612867B2 (en) Voice pitch conversion method
US10354671B1 (en) System and method for the analysis and synthesis of periodic and non-periodic components of speech signals
JP3035939B2 (en) Voice analysis and synthesis device
US7130799B1 (en) Speech synthesis method
JPS59501520A (en) Device for articulatory speech recognition
JP2000003200A (en) Voice signal processor and voice signal processing method
JP3294192B2 (en) Voice conversion device and voice conversion method
JPS62102294A (en) Voice coding system
JPH02293900A (en) Voice synthesizer
JP3949828B2 (en) Voice conversion device and voice conversion method
JP3083830B2 (en) Method and apparatus for controlling speech production time length
KR100359988B1 (en) real-time speaking rate conversion system
JPH0690638B2 (en) Speech analysis method
KR100322704B1 (en) Method for varying voice signal duration time