
JPH04121794A - Speech recognizing method - Google Patents

Speech recognizing method

Info

Publication number
JPH04121794A
JPH04121794A
Authority
JP
Japan
Prior art keywords
frames
time series
difference values
series pattern
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP24341290A
Other languages
Japanese (ja)
Inventor
Kazuhiko Okashita
和彦 岡下
Shingo Nishimura
新吾 西村
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sekisui Chemical Co Ltd
Original Assignee
Sekisui Chemical Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sekisui Chemical Co Ltd filed Critical Sekisui Chemical Co Ltd
Priority to JP24341290A priority Critical patent/JPH04121794A/en
Publication of JPH04121794A publication Critical patent/JPH04121794A/en
Pending legal-status Critical Current


Abstract

PURPOSE: To obtain tolerance to spectrum distortion and a high recognition rate by removing the feature parameters of frames whose power is below a threshold, then finding inter-frame difference values of the feature parameters, generating a time-series pattern of the difference values, and calculating the similarity between each speech item and a standard pattern with a statistical distance measure. CONSTITUTION: 1. A speech sample is input to a speech input part 11. 2. A feature extraction part 12 obtains frequency characteristics, frame by frame, through a band-pass filter. 3. A power decision part 13 compares the effective power of each frame's frequency characteristics with the threshold θ and removes the feature parameters of frames below θ. 4. A difference value generation part 14 finds the inter-frame difference values, and a time-series pattern generation part 15 generates the time-series pattern of the difference values. 5. A similarity calculation part 17 calculates, with a statistical distance measure, the similarity between the time-series pattern of the difference values and the standard patterns of the respective speech items stored in a dictionary part 16. 6. A decision part 18 selects the candidate with the highest similarity as the recognition result. Consequently, speech recognition free of the influence of distortion and noise is possible regardless of whether the frame power is large or small.
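The six numbered steps above can be sketched end to end as follows. Everything in this sketch is illustrative rather than taken from the patent: the FFT band-energy computation stands in for the 16-channel band-pass filter bank, the negated Euclidean distance stands in for the unspecified statistical distance measure, and all function names and array layouts are assumptions.

```python
import numpy as np

def diff_pattern(signal, sample_rate, threshold, num_channels=16, frame_ms=12.8):
    """Steps 1-4: frame-wise features, low-power frame removal, and
    inter-frame difference values. FFT band energies approximate the
    patent's 16-channel band-pass filter bank (an assumption)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n = len(signal) // frame_len
    frames = signal[:n * frame_len].reshape(n, frame_len)
    spectrum = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Group the spectrum bins into num_channels contiguous bands.
    bands = np.array_split(np.arange(spectrum.shape[1]), num_channels)
    features = np.stack([spectrum[:, b].sum(axis=1) for b in bands], axis=1)
    rms = np.sqrt(np.mean(features ** 2, axis=1))      # effective power per frame
    return np.diff(features[rms >= threshold], axis=0) # inter-frame differences

def recognize(pattern, dictionary):
    """Steps 5-6: score the pattern against each word's standard pattern
    (negated Euclidean distance as a stand-in similarity) and pick the best."""
    scores = {w: -np.linalg.norm(pattern - ref) for w, ref in dictionary.items()}
    return max(scores, key=scores.get)
```

In use, `diff_pattern` would be run once per vocabulary word on known speech to fill the dictionary, and again on unknown speech at recognition time.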

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application] The present invention relates to a speech recognition method suitable for recognizing words from input speech at online terminals such as electric locks and IC card terminals.

[Prior Art] A speech recognition method has previously been proposed, as described in Japanese Patent Application Laid-Open No. 1-260490. In that method, feature parameters of the input speech are computed frame by frame over frames of a predetermined length, difference values of the feature parameters are taken between frames, a time-series pattern of the difference values is created, and the similarity between this time-series pattern and the standard pattern of each speech item is calculated with a statistical distance measure to perform recognition.
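A minimal sketch of this prior-art baseline: difference values are taken between all frames, with no power check. The `(num_frames, num_channels)` array layout and the function name are illustrative assumptions, not from the cited publication.

```python
import numpy as np

def prior_art_pattern(features):
    """Inter-frame difference values over ALL frames, regardless of power.
    `features` is a (num_frames, num_channels) array of frame-wise
    feature parameters (an assumed layout)."""
    return np.diff(features, axis=0)  # row t holds features[t+1] - features[t]
```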

[Problems to Be Solved by the Invention] In the prior art, however, the feature parameters of every frame of the input speech are used as-is, regardless of the power of those frames.

Information in low-power frames, however, is easily affected by transmission-system distortion and stationary noise, so similarity judgments based on it are unreliable.

Furthermore, since the difference values are taken in the frequency domain, difference values between low-power frames are treated on a par with those between high-power frames, which strongly affects the recognition rate.

That is, in the prior art, information from low-power frames, for which similarity judgment is unreliable, exerts a large influence on the recognition rate, making it difficult to achieve a high recognition rate.

An object of the present invention is to provide a speech recognition method that is robust to stationary spectral distortion and achieves a high recognition rate.

[Means for Solving the Problem] The invention of claim 1 computes feature parameters of the input speech frame by frame over frames of a predetermined length; when the effective value of a frame's power is smaller than an arbitrary threshold, the feature parameters of that frame are excluded. Difference values of the feature parameters are then taken between the remaining frames, a time-series pattern of the difference values is created, and the similarity between this time-series pattern and the standard pattern of each speech item is calculated with a statistical distance measure to perform speech recognition.
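The claim-1 processing can be sketched as follows, assuming frame-wise features are held in a `(num_frames, num_channels)` NumPy array; the array layout and function name are illustrative:

```python
import numpy as np

def claim1_pattern(features, threshold):
    """Claim 1 sketch: drop frames whose effective (RMS) power is below
    `threshold`, then take inter-frame differences of the survivors."""
    rms = np.sqrt(np.mean(features ** 2, axis=1))   # effective power per frame
    return np.diff(features[rms >= threshold], axis=0)
```

Note that once a low-power frame is dropped, the difference is taken between its surviving neighbors, so the low-power frame contributes nothing to the pattern.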

The invention of claim 2 likewise computes feature parameters of the input speech frame by frame over frames of a predetermined length; when the effective value of a frame's power is smaller than an arbitrary threshold, the feature parameters of that frame are weighted so that their influence is reduced. Difference values of the feature parameters are then taken between frames, a time-series pattern of the difference values is created, and the similarity between this time-series pattern and the standard pattern of each speech item is calculated with a statistical distance measure to perform speech recognition.
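The claim-2 variant can be sketched in the same style: instead of excluding low-power frames, it down-weights them. The weight value 0.1 is an illustrative choice, not a value fixed by the patent.

```python
import numpy as np

def claim2_pattern(features, threshold, low_power_weight=0.1):
    """Claim 2 sketch: down-weight low-power frames so they contribute
    less to the inter-frame difference pattern (weight is an assumption)."""
    rms = np.sqrt(np.mean(features ** 2, axis=1))       # effective power per frame
    w = np.where(rms < threshold, low_power_weight, 1.0)
    return np.diff(features * w[:, None], axis=0)       # differences of weighted frames
```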

[Operation] According to the present invention, the feature parameters of low-power frames, which are easily affected by transmission-system distortion and stationary noise and yield unreliable similarity judgments, are excluded or weighted so that their influence is reduced; the inter-frame difference values of the feature parameters are then computed, and speech recognition is performed on the basis of these difference values.

That is, compared with a method that uses the difference values between all frames regardless of power, speech recognition is performed with the effects of transmission-system distortion and stationary noise removed. A speech recognition method that is robust to stationary spectral distortion and achieves a high recognition rate can therefore be obtained.

[Embodiment] Fig. 1 is a schematic diagram showing a speech recognition system according to one embodiment of the present invention.

The speech recognition system 10 comprises a speech input section 11, a feature extraction section 12, a power determination section 13, a difference value creation section 14, a time-series pattern creation section 15, a dictionary section (standard pattern storage section) 16, a similarity calculation section 17, and a decision section 18.

The dictionary creation procedure and the recognition procedure using the speech recognition system 10 are described below.

(A) A speech sample is taken in at the speech input section 11.

The recognition vocabulary was the names of the 47 prefectures, spoken by one specific speaker.

(B) Dictionary creation

(1) The known input speech waveform of each vocabulary word is passed through a 16-channel band-pass filter in the feature extraction section 12, and frequency characteristics are obtained for each frame (12.8 ms).

(2) The power determination section 13 compares the experimentally determined threshold θ with the effective value of the power of each frame's frequency characteristics, and excludes the feature parameters of frames whose effective power is smaller than θ.

(3) The difference value creation section 14 computes the difference values of the feature parameters between frames, and the time-series pattern creation section 15 creates a time-series pattern of those difference values. The time-series pattern so created is stored in the dictionary section 16 to form the dictionary.

(C) Recognition

(1) Stationary noise is added to the unknown input speech waveform of each vocabulary word, and the result is passed through the 16-channel band-pass filter in the feature extraction section 12 to obtain frequency characteristics for each frame (12.8 ms).

(2) The power determination section 13 compares the experimentally determined threshold θ with each frame's effective power and excludes the feature parameters of frames whose effective power is smaller than θ.

(3) The difference value creation section 14 computes the inter-frame difference values, and the time-series pattern creation section 15 creates their time-series pattern.

(4) The similarity calculation section 17 uses a statistical distance measure to calculate the similarity between the time-series pattern created in (3) and the standard pattern of each speech item stored in the dictionary section 16.

(5) The decision section 18 takes as the recognition result the word with the highest similarity in (4).
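Steps (4) and (5) can be sketched as follows. The patent only says "statistical distance measure" without naming one, so the diagonal-covariance Mahalanobis-style distance used here is an assumption, as are the function names and the `(mean, var)` form of the stored standard patterns.

```python
import numpy as np

def similarity(pattern, mean, var):
    """Negated diagonal-covariance Mahalanobis distance between a difference
    time-series pattern and a word's standard pattern (mean, var).
    Higher (less negative) means more similar."""
    d = (pattern - mean).ravel()
    return -float(np.sqrt(np.sum(d ** 2 / var.ravel())))

def decide(pattern, dictionary):
    """Pick the word whose standard pattern gives the highest similarity."""
    return max(dictionary, key=lambda w: similarity(pattern, *dictionary[w]))
```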

Experimental results for the conventional method and for the method of the present invention, as implemented in the speech recognition system 10 above, are described next.

(Conventional method) Experiment: the inter-frame difference values of the feature parameters (band-pass filter outputs) were used, and similarity was computed with a statistical distance measure.

One specific speaker was used, and the recognition vocabulary was the 47 prefecture names.

Result: the recognition rate was 93.2%.

(Method of the present invention) Experiment: the feature parameters (band-pass filter outputs) of low-power frames were excluded, the inter-frame difference values were used as input, and recognition was performed with the statistical distance measure.

One specific speaker was used, and the recognition vocabulary was the 47 prefecture names.

Result: the recognition rate was 95.3%.

In practicing the present invention, the dictionary creation and recognition stages (steps (2) of (B) and (2) of (C) above) may, instead of excluding the feature parameters of low-power frames, weight them so that their influence is reduced.

The speech recognition system 10 described above has the following effects.

According to the above embodiment, the feature parameters of low-power frames, which are easily affected by transmission-system distortion and stationary noise and yield unreliable similarity judgments, are excluded or weighted so that their influence is reduced; the inter-frame difference values of the feature parameters are then computed, and speech recognition is performed on the basis of these difference values.

That is, compared with a method that uses the difference values between all frames regardless of power, speech recognition is performed with the effects of transmission-system distortion and stationary noise removed. A speech recognition method that is robust to stationary spectral distortion and achieves a high recognition rate is thereby obtained.

[Effects of the Invention] As described above, the present invention provides a speech recognition method that is robust to stationary spectral distortion and achieves a high recognition rate.

[Brief Description of the Drawing]

Fig. 1 is a schematic diagram showing a speech recognition system according to one embodiment of the present invention. 10: speech recognition system; 11: speech input section; 12: feature extraction section; 13: power determination section; 14: difference value creation section; 15: time-series pattern creation section; 16: dictionary section; 17: similarity calculation section; 18: decision section. Patent applicant: Sekisui Chemical Co., Ltd. Representative: 廣 1) 馨

Claims (2)

[Claims]

(1) A speech recognition method in which feature parameters of input speech are computed frame by frame over frames of a predetermined length; when the effective value of a frame's power is smaller than an arbitrary threshold, the feature parameters of that frame are excluded; difference values of the feature parameters between frames are then computed; a time-series pattern of the difference values is created; and the similarity between this time-series pattern and the standard pattern of each speech item is calculated with a statistical distance measure to perform speech recognition.
(2) A speech recognition method in which feature parameters of input speech are computed frame by frame over frames of a predetermined length; when the effective value of a frame's power is smaller than an arbitrary threshold, the feature parameters of that frame are weighted so that their influence is reduced; difference values of the feature parameters between frames are then computed; a time-series pattern of the difference values is created; and the similarity between this time-series pattern and the standard pattern of each speech item is calculated with a statistical distance measure to perform speech recognition.
JP24341290A 1990-09-12 1990-09-12 Speech recognizing method Pending JPH04121794A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP24341290A JPH04121794A (en) 1990-09-12 1990-09-12 Speech recognizing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP24341290A JPH04121794A (en) 1990-09-12 1990-09-12 Speech recognizing method

Publications (1)

Publication Number Publication Date
JPH04121794A true JPH04121794A (en) 1992-04-22

Family

ID=17103483

Family Applications (1)

Application Number Title Priority Date Filing Date
JP24341290A Pending JPH04121794A (en) 1990-09-12 1990-09-12 Speech recognizing method

Country Status (1)

Country Link
JP (1) JPH04121794A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0675962A (en) * 1992-05-01 1994-03-18 Internatl Business Mach Corp <Ibm> Method and device for automatic detection/processing for vacant multimedia data object


Similar Documents

Publication Publication Date Title
CN112053695A (en) Voiceprint recognition method and device, electronic equipment and storage medium
KR100463657B1 (en) Apparatus and method of voice region detection
Singh et al. Novel feature extraction algorithm using DWT and temporal statistical techniques for word dependent speaker’s recognition
JPH04121794A (en) Speech recognizing method
CN113948088A (en) Voice recognition method and device based on waveform simulation
CN115862636B (en) Internet man-machine verification method based on voice recognition technology
JP2968976B2 (en) Voice recognition device
Coy et al. Soft harmonic masks for recognising speech in the presence of a competing speaker.
JPH04121799A (en) Speech recognizing method
JPH04163600A (en) Method of speaker recognition
Weißkirchen et al. Utilizing computer vision algorithms to detect and describe local features in images for emotion recognition from speech
JPH03122699A (en) Noise removing device and voice recognition device using same device
JPH0465399B2 (en)
JPH03230200A (en) Voice recognizing method
Zão et al. Noise robust speaker verification based on the MFCC and pH features fusion and multicondition training
JPS62211698A (en) Detection of voice section
JPH0415699A (en) Speaker recognition system
CN113571054A (en) Speech recognition signal preprocessing method, device, equipment and computer storage medium
JPH04163599A (en) Method of speaker recognition
JPH02302799A (en) Speech recognition system
JPS59181396A (en) Rematching speech recognition method
JPH0558560B2 (en)
JPS59124388A (en) Word speech recognition processing method
JPS62293299A (en) Voice recognition
JPH02273799A (en) Speaker recognition system