
JPS63298298A - Voice section detecting system for voice recognition equipment - Google Patents

Voice section detecting system for voice recognition equipment

Info

Publication number
JPS63298298A
Authority
JP
Japan
Prior art keywords
section
speech
voice
signal
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP62131679A
Other languages
Japanese (ja)
Inventor
松下 満次
勝美 高橋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oki Electric Industry Co Ltd
Original Assignee
Oki Electric Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oki Electric Industry Co Ltd
Priority to JP62131679A
Publication of JPS63298298A
Legal status: Pending


Abstract

(57) [Abstract] This publication contains application data filed before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION (Field of Industrial Application) The present invention relates to a voice section detection method for a voice recognition device.

(Prior Art) In recent years, with the development of voice recognition technology for unspecified speakers, voice recognition devices that handle catalog shopping, bank balance inquiries, and similar services by recognizing speech over the telephone have come into widespread use.

Voice section detection for telephone voice input uses a different method from voice section detection for microphone input, as in a voice-input word processor. The difference stems from the kinds of noise, other than the uttered speech, that each input carries.

In a voice-input word processor or similar device, speech is input through a microphone, which is usually wrapped in a windscreen made of sponge or mesh. This prevents wind noise and noise caused by the speaker's breath. A telephone handset, however, has no such protection, and its signal also includes noise produced by holding the transmitter directly in the hand. Voice section detection for telephone input therefore requires more advanced techniques than for microphone input.

Furthermore, in telephone voice recognition the speaker cannot check the recognition result directly by eye, so confirming the result by synthesized speech or similar means is essential. It would be unreasonable, however, to ask the speaker to confirm the recognition result of the confirmation utterance itself, so it goes without saying that voice input given for confirmation must be recognized with high reliability.

For this reason, the control words used for confirmation are generally chosen from words that are unlikely to be misrecognized, such as "hai" (yes) and "iie" (no).

(Problems to Be Solved by the Invention) In the voice section detection method of conventional telephone voice recognition devices, however, the same voice section detection is performed as in ordinary recognition even when a high recognition rate is required, for example when confirming a recognition result. False detections therefore occur just as often as with ordinary recognition target words, and highly reliable recognition cannot be achieved.

An object of the present invention is to eliminate the problem described above, namely that voice section detection cannot be adapted to the recognition target word, and to provide a voice section detection method that improves the recognition rate for recognition target words that require particularly reliable recognition.

(Means for Solving the Problems) To solve the above problem, the present invention provides a voice section detection method for a voice recognition device that receives voice signals of a plurality of recognition target words, prepared in advance for acceptance, in accordance with a predetermined order designation, detects the voice section of each voice signal, and recognizes the speech by matching the signal in that voice section against standard pattern signals. The recognition target words are divided into groups according to features that the words have in common, including features arising from differences in the number of voice blocks in their voice signals. The method comprises a voice section detection unit for each group, which receives the voice signals of the recognition target words in that group and detects their voice sections, and a voice section detection selection unit, which selects among the voice section detection units and distributes the input voice signal to them. The voice section detection selection unit receives its selection designation on the basis of the predetermined order designation.

(Operation) According to the present invention, the voice recognition device receives the voice signals in accordance with the predetermined order designation. Each time a voice signal is received, the voice section detection selection unit selects a voice section detection unit according to the selection designation based on that order designation. The voice signal is sent to the selected voice section detection unit, which detects its voice section.

(Embodiment) FIG. 1 is a circuit block diagram of a telephone voice recognition device showing an embodiment of the present invention.

In the figure, 1 is a telephone voice recognition device.

Reference numeral 2 denotes a voice input unit, which receives the speaker's voice signal from the telephone line 3.

Although not shown, a host device of the telephone voice recognition device 1 gives the speaker step-by-step guidance in synthesized speech for voice input. Following the guidance, the speaker utters a recognition target word that has been prepared in advance for acceptance, such as the digits of a personal identification number. The digits are then read back to the speaker in synthesized speech for confirmation, and the speaker answers with a word such as "hai" (yes) or "iie" (no).

Reference numerals 4, 5, and 6 denote the first, second, and third voice section detection units, respectively. For each voice signal received by the voice input unit 2, one or more of these units are chosen by the selection described later; the chosen units receive the voice signal and detect its voice section.

FIG. 2 is an explanatory diagram of voice section detection in the first, second, and third voice section detection units 4, 5, and 6. To detect the voice section of a voice signal A, a threshold LS for the start of a voice block and a threshold LE for the end of a voice block are set in advance for each of the voice section detection units 4, 5, and 6. If the signal stays above the threshold LS continuously for at least the start determination time TS, the point at which it first exceeded LS is taken as the start of the voice block. If the signal stays below the threshold LE continuously for at least the end determination time TE, the point at which it first fell below LE is taken as the end of the voice block. The voice section is then determined from the length of the voice block and similar properties.
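The block detection rule described for FIG. 2 can be sketched roughly as follows. This is a minimal illustration, not the patent's circuit: it assumes the signal has already been reduced to a list of per-frame levels, that the determination times are counted in frames, and that the start determination time is the counterpart of TE (written `ts` here). The function name `detect_voice_blocks` and the handling of a block still open at the end of the input are assumptions.

```python
from typing import List, Sequence, Tuple

Block = Tuple[int, int]  # (start frame, end frame), end exclusive


def detect_voice_blocks(levels: Sequence[float], ls: float, le: float,
                        ts: int, te: int) -> List[Block]:
    """Find voice blocks using start/end thresholds and determination times.

    ls, le: thresholds for block start and block end (LS, LE in FIG. 2).
    ts, te: start/end determination times, in frames (assumed unit).
    """
    blocks: List[Block] = []
    in_block = False
    block_start = 0
    run_start = None  # first frame of the current above-LS or below-LE run

    for i, level in enumerate(levels):
        if not in_block:
            if level > ls:
                if run_start is None:
                    run_start = i
                # Above LS for at least ts frames: the block starts where
                # the level first exceeded LS.
                if i - run_start + 1 >= ts:
                    in_block = True
                    block_start = run_start
                    run_start = None
            else:
                run_start = None  # run interrupted, e.g. a short noise spike
        else:
            if level < le:
                if run_start is None:
                    run_start = i
                # Below LE for at least te frames: the block ends where
                # the level first fell below LE.
                if i - run_start + 1 >= te:
                    blocks.append((block_start, run_start))
                    in_block = False
                    run_start = None
            else:
                run_start = None

    if in_block:
        # Assumption: a block still open at the end of input is closed there.
        blocks.append((block_start, len(levels)))
    return blocks
```

With made-up levels such as `[0, 0, 5, 5, 5, 5, 1, 1, 1, 0, 0, 6, 0, 0, 7, 7, 7, 0, 0, 0]` and `ls=4, le=2, ts=3, te=3`, the single-frame spike at index 11 never satisfies the start determination time and so does not open a block.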

A voice signal may consist of a single voice block or of several voice blocks, and it is generally accompanied by noise.

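FIG. 3 shows voice waveforms of several utterances: (1) is the waveform of the utterance "hai" (yes), (2) is the waveform of "iie" (no), (3) is the waveform of "iie" accompanied by noise, and (4) is the waveform of "ichi" (one). For voice recognition, waveforms (1) and (2) must each be detected as a voice section made up of one voice block. Waveform (3) must likewise be detected as a voice section made up of one voice block, after the voice block caused by noise has been removed. Waveform (4) must be detected as a voice section made up of two voice blocks.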
is a waveform diagram when the generated sound is "yes", (2) is a waveform diagram when the generated sound is "no", (3) is a waveform diagram when "no" is accompanied by noise, and (4) is a waveform diagram when the sound is "ichi". It is. In speech recognition, it is detected that the waveforms (1) and (2) in the same figure each consist of a speech section made up of one speech block, and the waveform (3) is detected as noise speech. After removing the block, it is necessary to similarly detect that a speech section is composed of one speech block, and for the waveform (4), it is necessary to detect that a speech section is composed of two speech blocks. There is.

The first voice section detection unit 4 detects the voice section of voice signals in, for example, the group of recognition target words that consist of exactly one voice block. The second voice section detection unit 5 does so for the group with one or two voice blocks, and the third voice section detection unit 6 for the group with no limit on the number of voice blocks.
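The grouping by number of voice blocks can be pictured as three detectors that differ only in how many blocks they accept. The sketch below is illustrative only: the class name `VoiceSectionDetector`, the choice to reject a signal with too many blocks, and the choice to span the voice section from the first block's start to the last block's end are all assumptions, since the text does not spell out these details for every unit.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Block = Tuple[int, int]  # (start frame, end frame), end exclusive


@dataclass(frozen=True)
class VoiceSectionDetector:
    name: str
    max_blocks: Optional[int]  # None means no limit on the number of blocks

    def section(self, blocks: List[Block]) -> Optional[Block]:
        """Return the voice section, or None if the signal is not accepted."""
        if not blocks:
            return None
        if self.max_blocks is not None and len(blocks) > self.max_blocks:
            # Assumption: too many blocks invalidates the signal.
            return None
        # Assumption: the section spans all accepted blocks.
        return (blocks[0][0], blocks[-1][1])


UNIT_4 = VoiceSectionDetector("first unit (4)", max_blocks=1)
UNIT_5 = VoiceSectionDetector("second unit (5)", max_blocks=2)
UNIT_6 = VoiceSectionDetector("third unit (6)", max_blocks=None)
```

Under these assumptions, a two-block utterance such as "ichi" would be rejected by `UNIT_4` but accepted by `UNIT_5` and `UNIT_6`.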

Reference numeral 7 denotes an analysis unit, which receives the voice signal from the voice input unit 2, analyzes it, and converts its features into parameters. Reference numeral 8 denotes a standard pattern unit, which stores the patterns of the voice signals prepared in advance for acceptance. Reference numeral 9 denotes a recognition matching unit, which receives the voice section detection signals from the voice section detection units 4, 5, and 6 and the parameterized signal from the analysis unit 7, compares the parameterized signal within the detected voice section against the patterns in the standard pattern unit 8, and outputs the pattern signal when they match. Reference numeral 10 denotes a control unit, which receives the matching result from the recognition matching unit 9, makes the recognition decision, and sends the recognition result to the host device.

Reference numeral 11 denotes a selection designation unit. As described above, the host device gives the speaker step-by-step guidance for voice input. The voices input in sequence can be classified, for each guidance prompt, into a group of recognition target words with one characteristic property or a group in which several properties are mixed. The selection designation unit 11 therefore outputs a selection designation signal so that the voice signal input to the voice input unit 2 is distributed to the voice section detection units 4, 5, and 6 suited to the corresponding guidance prompt.

Reference numeral 12 denotes a voice section detection selection unit, which selects one or more of the voice section detection units 4, 5, and 6 according to the selection designation signal from the selection designation unit 11 and sends an enable signal to each selected unit.
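The interplay of the selection designation unit 11 and the voice section detection selection unit 12 amounts to a lookup from the current guidance prompt to a set of enable signals. The sketch below shows one way to express that. The prompt names and the contents of `SELECTION_TABLE` are hypothetical; only the pairing of confirmation input with unit 4 comes from the text.

```python
from typing import Dict, FrozenSet

DETECTION_UNITS = (4, 5, 6)

# Hypothetical mapping from guidance prompt to the units to enable.
# Only "confirmation" -> unit 4 is stated in the text; the entry for
# "pin_digits" is an assumed example of a group with mixed properties.
SELECTION_TABLE: Dict[str, FrozenSet[int]] = {
    "confirmation": frozenset({4}),
    "pin_digits": frozenset({5, 6}),
}


def selection_designation(prompt: str) -> FrozenSet[int]:
    """Role of unit 11: choose the detection units for this guidance prompt."""
    return SELECTION_TABLE[prompt]


def enable_signals(selected: FrozenSet[int]) -> Dict[int, bool]:
    """Role of unit 12: turn the designation into per-unit enable signals."""
    return {unit: unit in selected for unit in DETECTION_UNITS}
```

For example, `enable_signals(selection_designation("confirmation"))` enables unit 4 only.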

When a recognition result is being confirmed, for example, the recognition target word is usually "hai" (yes) or "iie" (no). As FIG. 3 shows, once noise and the like are removed these words consist of a single voice block, so only the first voice section detection unit 4 needs to be selected for voice section detection.

Because the first voice section detection unit 4 limits the number of voice blocks to one, when two or more voice blocks are detected because of ambient noise, breath, or the like, it can either invalidate the voice signal or select the one block that most closely resembles a voice waveform.
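The two options named for the first detection unit, invalidating the signal or keeping the single most voice-like block, can be sketched as a small policy function. The text does not say how closeness to a voice waveform is measured, so the default score used here (block length) is purely a stand-in assumption, and the names `resolve_single_block` and `block_length` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Block = Tuple[int, int]  # (start frame, end frame), end exclusive


def block_length(block: Block) -> int:
    """Stand-in score: longer blocks are assumed to be more voice-like."""
    return block[1] - block[0]


def resolve_single_block(blocks: List[Block],
                         policy: str = "invalidate",
                         score: Callable[[Block], float] = block_length
                         ) -> Optional[Block]:
    """Reduce the detected blocks to at most one, as unit 4 requires."""
    if not blocks:
        return None
    if len(blocks) == 1:
        return blocks[0]
    if policy == "invalidate":
        return None  # treat the whole voice signal as invalid
    if policy == "select":
        return max(blocks, key=score)  # keep the most voice-like block
    raise ValueError(f"unknown policy: {policy}")
```

A real system would likely replace `block_length` with a score based on spectral or energy features; that choice is outside what the text specifies.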

Next, the operation of the circuit shown in FIG. 1 is described.

Each time the host device (not shown) gives the speaker guidance for a voice input, the selection designation unit 11 sends the voice section detection selection unit 12, before the recognition operation for that voice input begins, a selection designation signal that designates which of the voice section detection units 4, 5, and 6 to use. The voice section detection selection unit 12 sends a voice section detection enable signal to one or more of the voice section detection units 4, 5, and 6. The voice signal input from the telephone line 3 passes through the voice input unit 2 and enters each of the voice section detection units 4, 5, and 6. When the first voice section detection unit 4 has been selected, only its voice section detection result is input to the recognition matching unit 9. Meanwhile, the voice signal from the voice input unit 2 is analyzed by the analysis unit 7, and its features are parameterized and input to the recognition matching unit 9. The recognition matching unit 9 performs matching against the patterns in the standard pattern unit 8 on the basis of the voice section detection result obtained earlier and sends the matching result to the control unit 10. The control unit 10 makes the recognition decision from the matching result and sends the recognition result to the host device.
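One detail of this flow is that the voice signal reaches all three detection units, while only the results of the enabled units reach the recognition matching unit 9. A minimal sketch of that gating step follows; the function name `forward_to_matching` and the dictionary representation of results and enable signals are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

Section = Optional[Tuple[int, int]]  # detected voice section, or None


def forward_to_matching(results: Dict[int, Section],
                        enables: Dict[int, bool]) -> Dict[int, Section]:
    """Pass on only the detection results of units whose enable signal is set."""
    return {unit: section
            for unit, section in results.items()
            if enables.get(unit, False)}
```

With unit 4 enabled and units 5 and 6 disabled, only the entry for unit 4 is forwarded, which mirrors the confirmation case described above.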

(Effects of the Invention) As described above, according to the present invention the recognition target words are divided into groups by their features, and the voice section detection suited to each group is selected. By selecting appropriate voice section detection for recognition target words that demand particularly high reliability, such as those used for confirmation, an improvement in voice recognition accuracy can be expected.

[Brief Description of the Drawings]

FIG. 1 is a circuit block diagram of a telephone voice recognition device showing an embodiment of the present invention, FIG. 2 is an explanatory diagram of voice section detection, and FIG. 3 is a waveform diagram of each utterance.
1: telephone voice recognition device
4, 5, 6: voice section detection units
11: selection designation unit
12: voice section detection selection unit

Claims (1)

[Claims] A voice section detection method for a voice recognition device that receives voice signals of a plurality of recognition target words, prepared in advance for acceptance, in accordance with a predetermined order designation, detects the voice section of each voice signal, and recognizes the speech by matching the signal in the voice section against standard pattern signals, the method being characterized by comprising:
a voice section detection unit for each group, which receives the voice signals of recognition target words that have been divided into groups according to features common to the words, including features arising from differences in the number of voice blocks in their voice signals, and detects their voice sections; and
a voice section detection selection unit that selects among the voice section detection units and distributes the input voice signal to them,
wherein the voice section detection selection unit receives its selection designation on the basis of the predetermined order designation.
JP62131679A 1987-05-29 1987-05-29 Voice section detecting system for voice recognition equipment Pending JPS63298298A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP62131679A JPS63298298A (en) 1987-05-29 1987-05-29 Voice section detecting system for voice recognition equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP62131679A JPS63298298A (en) 1987-05-29 1987-05-29 Voice section detecting system for voice recognition equipment

Publications (1)

Publication Number Publication Date
JPS63298298A (en) 1988-12-06

Family

ID=15063686

Family Applications (1)

Application Number Title Priority Date Filing Date
JP62131679A Pending JPS63298298A (en) 1987-05-29 1987-05-29 Voice section detecting system for voice recognition equipment

Country Status (1)

Country Link
JP (1) JPS63298298A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4810044B2 (en) * 2000-01-27 2011-11-09 ニュアンス コミュニケーションズ オーストリア ゲーエムベーハー Voice detection device with two switch-off criteria

