JPH04121794A - Speech recognizing method - Google Patents
Speech recognizing methodInfo
- Publication number
- JPH04121794A (application JP24341290A)
- Authority
- JP
- Japan
- Prior art keywords
- frames
- time series
- difference values
- series pattern
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
Description
DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to a speech recognition method suited to recognizing words from input speech at online terminals such as electric locks and IC-card devices.
[Prior Art]

A speech recognition method such as that described in Japanese Patent Application Laid-Open No. 1-260490 has been proposed. In this method, feature parameters of the input speech are computed in frame units of a predetermined length, difference values of the feature parameters are taken between frames, a time-series pattern of these difference values is created, and speech recognition is performed by computing the similarity between this difference-value time-series pattern and a standard pattern for each word using a statistical distance measure.
[Problems to be Solved by the Invention]

In the prior art, however, speech recognition uses the feature parameters of every frame of the input speech as they are, regardless of the power of those frames.
Yet the information in low-power frames is easily affected by transmission-system distortion and stationary noise, so the reliability of similarity determination for such frames is low.
Moreover, since the difference values involving low-power frames are taken in the frequency domain, they are treated on an equal footing with the difference values between high-power frames, and they therefore have a large effect on the recognition rate.
That is, in the prior art the information of low-power frames, for which similarity determination is unreliable, strongly influences the recognition rate, making it difficult to ensure a high recognition rate.
An object of the present invention is to provide a speech recognition method that is robust against stationary spectral distortion and can ensure a high recognition rate.
[Means for Solving the Problems]

The invention of claim 1 computes the feature parameters of the input speech in frame units of a predetermined length and, when the effective (RMS) value of a frame's power is smaller than a chosen threshold, excludes the feature parameters of that frame. Difference values of the feature parameters are then taken between the remaining frames, a time-series pattern of the difference values is created, the similarity between this difference-value time-series pattern and a standard pattern for each word is computed using a statistical distance measure, and speech recognition is thereby performed.
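The claim-1 processing up to the difference-value time series can be sketched as follows. This is a minimal illustration, not the patent's implementation: the choice of RMS over the frame's parameter vector, the threshold value, and the toy frame data are all assumptions, since the patent leaves the feature parameters and threshold unspecified.

```python
import numpy as np

def frame_rms(frame_params):
    # Effective (RMS) value of one frame's feature parameters
    return float(np.sqrt(np.mean(np.square(frame_params))))

def difference_pattern_with_exclusion(frames, theta):
    # Claim 1: drop frames whose RMS power falls below theta, then take
    # differences between consecutive surviving frames.
    kept = [f for f in frames if frame_rms(f) >= theta]
    return [b - a for a, b in zip(kept, kept[1:])]

# Toy input: four frames of three "channel" values; the third frame is weak.
frames = [np.array([4.0, 5.0, 6.0]),
          np.array([5.0, 6.0, 7.0]),
          np.array([0.1, 0.2, 0.1]),   # low power -> excluded
          np.array([6.0, 7.0, 8.0])]
pattern = difference_pattern_with_exclusion(frames, theta=1.0)
```

Because the weak third frame is removed before differencing, the noisy transition it would have introduced never enters the time-series pattern.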
The invention of claim 2 computes the feature parameters of the input speech in frame units of a predetermined length and, when the effective (RMS) value of a frame's power is smaller than a chosen threshold, weights the feature parameters of that frame so that their influence is reduced. Difference values of the feature parameters are then taken between frames, a time-series pattern of the difference values is created, the similarity between this difference-value time-series pattern and a standard pattern for each word is computed using a statistical distance measure, and speech recognition is thereby performed.
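The claim-2 variant can be sketched the same way, with weighting in place of exclusion. The scaling factor of 0.1 is an illustrative assumption; the patent specifies only that the influence of low-power frames be reduced, not how much.

```python
import numpy as np

def weighted_difference_pattern(frames, theta, weight=0.1):
    # Claim 2: instead of dropping low-power frames, scale them down so
    # they contribute less to the inter-frame differences.
    scaled = [f if np.sqrt(np.mean(f ** 2)) >= theta else weight * f
              for f in frames]
    return [b - a for a, b in zip(scaled, scaled[1:])]

frames = [np.array([4.0, 4.0]),
          np.array([0.2, 0.2]),   # low power -> down-weighted, not dropped
          np.array([4.0, 4.0])]
pattern = weighted_difference_pattern(frames, theta=1.0)
```

Unlike exclusion, weighting preserves the frame count of the time series, which can simplify alignment against the stored standard patterns.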
[Operation]

According to the present invention, the feature parameters of low-power frames, which are easily affected by transmission-system distortion and stationary noise and whose similarity determination is unreliable, are either excluded or weighted so that their influence is reduced. Difference values of the feature parameters are then taken between frames, and speech recognition is performed on the basis of these difference values.
That is, compared with methods that use the difference values between all frames regardless of power, speech recognition is performed with the effects of transmission-system distortion and stationary noise removed. A speech recognition method is thus obtained that is robust against stationary spectral distortion and can ensure a high recognition rate.
[Embodiment]

FIG. 1 is a schematic diagram showing a speech recognition system according to an embodiment of the present invention.
The speech recognition system 10 comprises a speech input unit 11, a feature extraction unit 12, a power determination unit 13, a difference value creation unit 14, a time-series pattern creation unit 15, a dictionary unit (standard pattern storage unit) 16, a similarity calculation unit 17, and a determination unit 18.
The dictionary creation procedure and the recognition procedure using the speech recognition system 10 are described below.
(A) A speech sample is taken in through the speech input unit 11.
In this embodiment, the recognition vocabulary consisted of the 47 Japanese prefecture names, and there was one specific speaker.
(B) Dictionary creation

(1) In the feature extraction unit 12, the known input speech waveform of each recognition word is passed through a 16-channel band-pass filter bank, and a frequency characteristic is obtained for each frame (12.8 msec).
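This frame-wise feature extraction could be sketched digitally as follows. The patent's embodiment uses a 16-channel band-pass filter bank; grouping FFT bins into 16 bands is a rough software stand-in for that, and the 16 kHz sampling rate is an assumption (the patent does not state one).

```python
import numpy as np

FS = 16000                       # assumed sampling rate (not in the patent)
FRAME_LEN = int(FS * 0.0128)     # 12.8 ms frames, as in the embodiment
N_CHANNELS = 16

def frame_features(signal):
    # Cut the signal into 12.8 ms frames and reduce each frame to a
    # 16-channel spectral envelope by averaging FFT magnitudes per band.
    n_frames = len(signal) // FRAME_LEN
    feats = []
    for i in range(n_frames):
        frame = signal[i * FRAME_LEN:(i + 1) * FRAME_LEN]
        spec = np.abs(np.fft.rfft(frame))
        bands = np.array_split(spec, N_CHANNELS)
        feats.append(np.array([b.mean() for b in bands]))
    return feats

tone = np.sin(2 * np.pi * 440 * np.arange(FS) / FS)  # 1 s of a 440 Hz tone
feats = frame_features(tone)
```

Each element of `feats` plays the role of one frame's "frequency characteristic" fed to the power determination unit 13 in the next step.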
(2) In the power determination unit 13, the experimentally determined threshold θ is compared with the effective (RMS) value of the power of each frame's frequency characteristic, and the feature parameters of frames whose RMS power is smaller than θ are excluded.
(3) The difference value creation unit 14 computes difference values of the feature parameters between frames, and the time-series pattern creation unit 15 creates a time-series pattern of these difference values. The time-series pattern of difference values created by the time-series pattern creation unit 15 is stored in the dictionary unit 16 and serves as the dictionary.
(C) Recognition

(1) The unknown input speech waveform of each recognition word, with stationary noise added, is passed through the 16-channel band-pass filter bank in the feature extraction unit 12, and a frequency characteristic is obtained for each frame (12.8 msec).
(2) In the power determination unit 13, the experimentally determined threshold θ is compared with the effective (RMS) value of the power of each frame's frequency characteristic, and the feature parameters of frames whose RMS power is smaller than θ are excluded.
(3) The difference value creation unit 14 computes difference values between frames, and the time-series pattern creation unit 15 creates a time-series pattern of the difference values.
(4) The similarity calculation unit 17 computes, using a statistical distance measure, the similarity between the time-series pattern of difference values created in step (3) and the standard pattern of each word stored in the dictionary unit 16.
(5) The determination unit 18 takes the word with the highest similarity from step (4) as the recognition result.
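Steps (4) and (5) can be sketched as follows. The patent does not name a specific statistical distance measure, so the negative squared Euclidean distance below is only a hypothetical stand-in (Mahalanobis-type distances would be another common choice), and the two-word dictionary and its patterns are invented for illustration.

```python
import numpy as np

def similarity(pattern, template):
    # Hypothetical "statistical distance measure": negative squared
    # Euclidean distance over the overlapping part of the two patterns.
    n = min(len(pattern), len(template))
    p = np.concatenate(pattern[:n])
    t = np.concatenate(template[:n])
    return -float(np.sum((p - t) ** 2))

def recognize(pattern, dictionary):
    # Step (5): pick the dictionary word with the highest similarity.
    return max(dictionary, key=lambda w: similarity(pattern, dictionary[w]))

# Hypothetical two-word dictionary of difference-value time series.
dictionary = {
    "tokyo": [np.array([1.0, 0.0]), np.array([0.0, 1.0])],
    "osaka": [np.array([0.0, 1.0]), np.array([1.0, 0.0])],
}
observed = [np.array([0.9, 0.1]), np.array([0.1, 0.9])]
result = recognize(observed, dictionary)
```

Here the observed pattern lies closest to the stored "tokyo" template, so that word is returned as the recognition result.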
Experimental results for the conventional method and for the method of the present invention using the speech recognition system 10 are now described.
(Conventional method)

Experiment: inter-frame difference values of the feature parameters (band-pass filter outputs) were used as input and evaluated with a statistical distance measure. There was one specific speaker, and the recognition vocabulary consisted of the 47 prefecture names.

Result: the recognition rate was 93.2%.
(Method of the present invention)

Experiment: low-power feature parameters (band-pass filter outputs) were excluded, inter-frame difference values were used as input, and recognition used a statistical distance measure. There was one specific speaker, and the recognition vocabulary consisted of the 47 prefecture names.

Result: the recognition rate was 95.3%.
In implementing the present invention, the dictionary creation stage and the recognition stage (steps (B)(2) and (C)(2) above) may, instead of excluding the feature parameters of low-power frames, weight those parameters so that their influence is reduced.
The speech recognition system 10 described above provides the following effects.
According to the above embodiment, the feature parameters of low-power frames, which are easily affected by transmission-system distortion and stationary noise and whose similarity determination is unreliable, are either excluded or weighted so that their influence is reduced; difference values of the feature parameters are then taken between frames, and speech recognition is performed on the basis of these difference values.
That is, compared with methods that use the difference values between all frames regardless of power, speech recognition is performed with the effects of transmission-system distortion and stationary noise removed. A speech recognition method is thus obtained that is robust against stationary spectral distortion and can ensure a high recognition rate.
[Effects of the Invention]

As described above, the present invention provides a speech recognition method that is robust against stationary spectral distortion and can ensure a high recognition rate.
[Brief Description of the Drawing]

FIG. 1 is a schematic diagram showing a speech recognition system according to an embodiment of the present invention.
10: speech recognition system; 11: speech input unit; 12: feature extraction unit; 13: power determination unit; 14: difference value creation unit; 15: time-series pattern creation unit; 16: dictionary unit; 17: similarity calculation unit; 18: determination unit.

Patent applicant: Sekisui Chemical Co., Ltd.

Representative: 廣 1) 馨
Claims (2)

(1) A speech recognition method in which feature parameters of input speech are computed in frame units of a predetermined length; when the effective (RMS) value of the power of a frame is smaller than an arbitrary threshold, the feature parameters of that frame are excluded; difference values of the feature parameters between frames are then computed; a time-series pattern of the difference values is created; and the similarity between this difference-value time-series pattern and a standard pattern of each word is computed using a statistical distance measure to perform speech recognition.
(2) A speech recognition method in which feature parameters of input speech are computed in frame units of a predetermined length; when the effective (RMS) value of the power of a frame is smaller than an arbitrary threshold, the feature parameters of that frame are weighted so that their influence is reduced; difference values of the feature parameters between frames are then computed; a time-series pattern of the difference values is created; and the similarity between this difference-value time-series pattern and a standard pattern of each word is computed using a statistical distance measure to perform speech recognition.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP24341290A JPH04121794A (en) | 1990-09-12 | 1990-09-12 | Speech recognizing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP24341290A JPH04121794A (en) | 1990-09-12 | 1990-09-12 | Speech recognizing method |
Publications (1)
Publication Number | Publication Date |
---|---|
JPH04121794A true JPH04121794A (en) | 1992-04-22 |
Family
ID=17103483
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP24341290A Pending JPH04121794A (en) | 1990-09-12 | 1990-09-12 | Speech recognizing method |
Country Status (1)
Country | Link |
---|---|
JP (1) | JPH04121794A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0675962A (en) * | 1992-05-01 | 1994-03-18 | Internatl Business Mach Corp <Ibm> | Method and device for automatic detection/processing for vacant multimedia data object |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112053695A (en) | Voiceprint recognition method and device, electronic equipment and storage medium | |
KR100463657B1 (en) | Apparatus and method of voice region detection | |
Singh et al. | Novel feature extraction algorithm using DWT and temporal statistical techniques for word dependent speaker’s recognition | |
JPH04121794A (en) | Speech recognizing method | |
CN113948088A (en) | Voice recognition method and device based on waveform simulation | |
CN115862636B (en) | Internet man-machine verification method based on voice recognition technology | |
JP2968976B2 (en) | Voice recognition device | |
Coy et al. | Soft harmonic masks for recognising speech in the presence of a competing speaker. | |
JPH04121799A (en) | Speech recognizing method | |
JPH04163600A (en) | Method of speaker recognition | |
Weißkirchen et al. | Utilizing computer vision algorithms to detect and describe local features in images for emotion recognition from speech | |
JPH03122699A (en) | Noise removing device and voice recognition device using same device | |
JPH0465399B2 (en) | ||
JPH03230200A (en) | Voice recognizing method | |
Zão et al. | Noise robust speaker verification based on the MFCC and pH features fusion and multicondition training | |
JPS62211698A (en) | Detection of voice section | |
JPH0415699A (en) | Speaker recognition system | |
CN113571054A (en) | Speech recognition signal preprocessing method, device, equipment and computer storage medium | |
JPH04163599A (en) | Method of speaker recognition | |
JPH02302799A (en) | Speech recognition system | |
JPS59181396A (en) | Rematching speech recognition method | |
JPH0558560B2 (en) | ||
JPS59124388A (en) | Word speech recognition processing method | |
JPS62293299A (en) | Voice recognition | |
JPH02273799A (en) | Speaker recognition system |