JP2570395B2 - Sound recognition device

Info

Publication number: JP2570395B2
Application number: JP63173387A
Authority: JP
Inventors: 哲也中村; 雅之高見; 一郎赤堀
Original assignee: 日本電装株式会社
Priority date: 1988-07-12
Filing date: 1988-07-12
Publication date: 1997-01-08
Anticipated expiration: 2012-01-08
Also published as: JPH0222799A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Industrial applications]

The present invention relates to a sound recognition device that recognizes a specific sound among ambient sounds. In particular, it is used in a device that is mounted on an automobile or the like, recognizes the alarm sound of an emergency vehicle or the like among the ambient sounds outside the passenger compartment, and notifies the driver.

[Prior art]

Conventionally, the following sound recognition devices are known that are mounted on an automobile, recognize alarm sounds such as those of emergency vehicles and level-crossing alarms among the ambient sounds outside the passenger compartment, and notify the driver.

The first is a device that judges whether the spectrum of the detected sound contains reference frequencies characteristic of the alarm sound. Specifically, band-pass filters whose center frequencies are two or three reference frequencies characteristic of the alarm sound to be recognized extract those two or three frequency components from the acoustic signal, and when the magnitudes of the components exceed a predetermined threshold, an alarm sound is judged to be present in the detected sound (JP-A-58-221500).

The second measures the period of the detected sound, that is, the pitch, and recognizes the alarm sound of an emergency vehicle by matching the ratio of the pitches against the pitches of a predetermined alarm sound or by using the stability of the pitch (JP-A-60-219521).

The third uses a plurality of band-pass filters whose pass frequencies are the fixed peak frequencies of the spectrum of the alarm sound to be recognized. It extracts the plural peak frequency components of that alarm sound from the detected sound and recognizes the alarm sound by matching the temporal change shape of those components against the temporal change shape of the peak frequency components of an alarm sound registered in advance (JP-A-62-175238).
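The first prior-art approach can be illustrated with a small sketch. It is not taken from any of the cited publications: the Goertzel recurrence stands in for the band-pass filters, and the function names, the 8 kHz sampling rate, the 1000 Hz and 1500 Hz reference frequencies, and the threshold are all assumptions chosen for illustration.

```python
import math

def goertzel_power(samples, sample_rate, freq):
    """Power of `samples` at `freq`, computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

def fixed_band_detector(samples, sample_rate, ref_freqs, threshold):
    """Report a detection only if every fixed reference frequency exceeds the threshold."""
    return all(goertzel_power(samples, sample_rate, f) > threshold for f in ref_freqs)

def tones(freqs, sample_rate=8000, n=800):
    """Test signal: a sum of unit-amplitude sine waves (hypothetical alarm sound)."""
    return [sum(math.sin(2 * math.pi * f * i / sample_rate) for f in freqs)
            for i in range(n)]

# Alarm at its nominal frequencies, then the same alarm shifted upward by 6 %.
print(fixed_band_detector(tones([1000, 1500]), 8000, [1000, 1500], 1e4))
print(fixed_band_detector(tones([1060, 1590]), 8000, [1000, 1500], 1e4))
```

With these assumed values the first call reports a detection and the second does not, because the shifted components no longer fall on the fixed analysis frequencies. How much shift a real device tolerates depends on the width of its pass bands; the 10 Hz resolution implied here (800 samples at 8 kHz) was chosen only to make the effect visible.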

[Problems to be solved by the invention]

However, the first device makes its judgment from the absolute values of two or three fixed frequency components of the detected sound at each instant. It therefore fails to recognize the alarm sound when the spectral shape of the detected sound changes because of individual differences between sounding bodies, noise, or the Doppler effect.

In the second device, the pitch period is strongly affected when other sounds are mixed in, for example the running noise of the vehicle itself, wind noise, or the running noise of other vehicles, so recognizing the alarm sound is not easy.

The third device extracts the peak frequency components of the spectrum of the detected sound with a plurality of band-pass filters whose pass bands are fixed in advance for each alarm sound. When the detected sound contains an alarm sound whose frequency has been shifted for the reasons given above, the shifted peak frequency components are not detected. In addition, a large number of band-pass filters for detecting the peak frequencies are needed for each alarm sound, which complicates the device. Furthermore, because the temporal change shape of the extracted frequency components is matched directly against the temporal change shape of the peak frequencies of a pre-registered alarm sound, a large amount of reference data is required and the computation takes a long time, so recognition of the alarm sound is slow.

In particular, for a sound source whose peak frequency changes with time, such as the siren of a fire engine, the extraction frequencies of the above devices are fixed, so the change in peak frequency cannot be followed and accurate recognition is not possible.

Human hearing, by contrast, is insensitive to the absolute values of frequency and sound pressure but sensitive to their relative changes. Whereas the conventional devices identify a given sound from the absolute values of frequency and sound pressure, the present invention takes note of this property of human hearing and performs sound recognition by extracting, as feature quantities, the relative changes over time of the peak frequencies and their amplitudes.
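The size of the Doppler shift referred to above can be estimated with the standard formula for a source approaching a stationary listener. The numbers are illustrative assumptions, not values from this publication: a 960 Hz tone, a closing speed of 20 m/s, and a sound speed of 340 m/s.

```latex
\[
f' = f\,\frac{c}{c - v}
   = 960\ \text{Hz} \times \frac{340}{340 - 20}
   = 960\ \text{Hz} \times 1.0625
   = 1020\ \text{Hz}
\]
\[
\frac{f_2'}{f_1'} = \frac{f_2\,c/(c - v)}{f_1\,c/(c - v)} = \frac{f_2}{f_1}
\]
```

Under these assumptions the tone moves by 60 Hz, enough to leave a narrow fixed pass band, while the ratio between two tones of the same source is unchanged. This is consistent with the choice of relative rather than absolute quantities as features.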

[Means for Solving the Problems]

As shown in FIG. 1, the invention comprises an acoustic-electric transducer M1, frequency analysis means M2, peak extraction means M3, grouping means M4, feature extraction means M5, identification means M6, and reference feature storage means M7.

These means may be implemented in hardware or in software. The frequency analysis means M2 obtains the frequency characteristic, at each instant, of the acoustic signal (a function of time) output from the acoustic-electric transducer M1, such as a microphone. Specifically, the frequency components may be obtained by scanning the frequency with a digital band-pass filter that has a variable, sharply defined pass band, or a Fourier transformer that Fourier-transforms the acoustic signal may be used.

The peak extraction means M3 extracts peaks from the frequency characteristic. Specifically, it can be built from a differentiator that differentiates the frequency characteristic with respect to frequency by a difference operation.

The grouping means M4 judges the continuity in time of the peaks in the time series of peaks output by the peak extraction means M3, and sorts the peaks into groups of continuous peaks. Specifically, it groups the peaks according to whether the frequency bins in which peaks exist are continuous in time; as is well known, it can be built with means similar to a line-segment extractor in image processing.

The feature extraction means M5 extracts feature quantities of the characteristic shape that each peak group traces over time. Specifically, for each peak group and for each partial shape of each peak group, it extracts feature quantities such as the amount of increase or decrease in frequency, the amount of increase or decrease in amplitude, and the duration of the partial shape.

The reference feature storage means M7 stores, as reference feature quantities, the feature quantities of the predetermined sounds to be recognized, in a form corresponding to the feature quantities of the detected sound.

The identification means M6 identifies a predetermined sound on the basis of the feature quantities extracted by the feature extraction means M5 and the reference feature quantities stored in the reference feature storage means M7.
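The difference operation used by the peak extraction means M3 can be sketched as follows. This is a minimal illustration and not the implementation of the publication; the function name and the sample spectrum are assumptions.

```python
def extract_peaks(freqs, amps):
    """Return (frequency, amplitude) pairs at the local maxima of one spectrum.

    A bin is a peak when the difference to its lower neighbour is positive
    and the difference to its upper neighbour is zero or negative, which is
    the discrete counterpart of the derivative changing sign.
    """
    peaks = []
    for k in range(1, len(amps) - 1):
        rising = amps[k] - amps[k - 1] > 0
        falling = amps[k + 1] - amps[k] <= 0
        if rising and falling:
            peaks.append((freqs[k], amps[k]))
    return peaks

# Hypothetical frequency characteristic at one instant (Hz, relative amplitude).
freqs = [400, 500, 600, 700, 800, 900, 1000]
amps = [0.1, 0.8, 0.2, 0.1, 0.3, 0.9, 0.4]
print(extract_peaks(freqs, amps))
```

For this sample spectrum the function returns the two maxima at 500 Hz and 900 Hz. A practical device would probably also discard peaks below some noise floor; that refinement is omitted here.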

[Action]

The detected sound is converted by the acoustic-electric transducer M1 into an acoustic signal, which is an electrical signal. The acoustic signal is input to the frequency analysis means M2, which obtains the frequency characteristic at each instant. The frequency characteristic at each instant is input to the peak extraction means M3, which extracts the peaks of the frequency characteristic at that instant. The peak information is input to the grouping means M4, which sorts the peaks into groups of temporally continuous peaks. The information on the grouped peaks is input to the feature extraction means M5, which extracts the feature quantities of the characteristic shape of each continuous peak group. Finally, the feature information is input to the identification means M6, and the predetermined sound is identified on the basis of these feature quantities and the reference feature quantities of the acoustic signals to be recognized, which are stored in the reference feature storage means M7.
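The grouping of temporally continuous peaks by M4 can be pictured as linking each peak to the nearest peak of the preceding time step. The sketch below is one possible reading, not the method of the publication: the greedy nearest-neighbour rule, the function name, and the parameters `max_jump` and `max_gap` are assumptions.

```python
def group_peaks(frames, max_jump, max_gap=0):
    """Group per-frame peaks into time-continuous tracks.

    frames   -- list indexed by time step; each entry is a list of
                (frequency, amplitude) peaks found at that step
    max_jump -- largest frequency change allowed between linked peaks
    max_gap  -- number of time steps a track may go without a peak
    Returns a list of tracks, each a list of (t, frequency, amplitude).
    """
    tracks = []
    for t, peaks in enumerate(frames):
        used = set()  # tracks already extended or created at this time step
        for f, p in peaks:
            best = None
            for i, track in enumerate(tracks):
                last_t, last_f, _ = track[-1]
                if i in used or t - last_t > max_gap + 1:
                    continue
                jump = abs(f - last_f)
                if jump <= max_jump and (best is None or jump < best[0]):
                    best = (jump, i)
            if best is None:
                tracks.append([(t, f, p)])  # start of a new continuous line
                used.add(len(tracks) - 1)
            else:
                tracks[best[1]].append((t, f, p))  # extension of an existing line
                used.add(best[1])
    return tracks

# Hypothetical peak data: one line that moves in frequency, one that stays put.
frames = [
    [(800, 1.0), (1200, 0.9)],
    [(820, 1.0), (1200, 0.7)],
    [(840, 1.0), (1200, 0.5)],
    [(820, 1.0)],
]
for track in group_peaks(frames, max_jump=30):
    print(track)
```

With `max_gap` left at 0 a track ends as soon as one time step has no matching peak; a positive value lets a track survive short dropouts.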

[Example]

The present invention is described below on the basis of a specific embodiment. FIG. 2 is a block diagram showing the configuration of the device of the embodiment.

The vehicle alarm device 2 of this embodiment comprises: an acoustic signal input unit 6 that samples and stores, for a predetermined time, the acoustic signal from a microphone 4 that picks up sound outside the vehicle; a high-speed arithmetic processing unit 8 that performs predetermined arithmetic processing at high speed in order to analyze the sampled acoustic signal; an alarm sound identification unit 10 that passes the acoustic signal sampled by the acoustic signal input unit 6 to the high-speed arithmetic processing unit 8 for processing, identifies from the results the various alarm sounds (from emergency vehicles, level-crossing gates, and so on) in the external sound picked up by the microphone 4, and outputs an identification signal representing the result; an output unit 74 that outputs a control signal to an alarm 72 provided in the passenger compartment in accordance with the identification signal and notifies the driver of the identification result; and a transmission unit 78 that sends the identification result to a vehicle control device 76 so that vehicle control corresponding to the result is carried out.

In the acoustic signal input unit 6, the acoustic signal from the microphone 4 is first input to a preprocessing circuit 20, and the signal that has passed through it is A/D-converted by an A/D converter 22. The preprocessing circuit 20 conditions the acoustic signal so that the A/D converter 22 can convert it well, and includes an amplifier that amplifies the acoustic signal, an anti-aliasing filter, a sample-and-hold circuit, and the like. The A/D converter 22 is controlled by a control circuit 24; it automatically converts the acoustic signal at a predetermined sampling period and stores the result in RAMa 26 or RAMb 28. The control circuit 24 first connects the output of the A/D converter 22 to RAMa 26 through a switch circuit 30 so that the conversion results are stored sequentially in RAMa 26. When the storage area of RAMa 26 becomes full, it outputs a storage signal indicating this to the CPU 40 of the alarm sound identification unit 10 and at the same time switches the switch circuit 30 to connect the output of the A/D converter 22 to RAMb 28, after which the conversion data are stored sequentially in RAMb 28. In this way the A/D conversion results are stored alternately in RAMa 26 and RAMb 28.

This allows the alarm sound identification unit 10 to read the A/D conversion data without stopping the A/D converter 22.

The switch circuit 30 is in practice a TTL or CMOS logic circuit. RAMa 26 and RAMb 28 each have enough capacity to hold the A/D conversion results continuously for at least twice the period of the envelope of the alarm sound to be recognized.

The alarm sound identification unit 10 consists of a CPU 40, a ROM 42, and a RAM 44, and executes the alarm sound recognition processing described later in accordance with a control program stored in the ROM 42.

The alarm sound recognition processing controls the A/D converter 22 through the control circuit 24, outputs the A/D conversion data stored in RAMa 26 or RAMb 28 to the high-speed arithmetic processing unit 8 for the predetermined arithmetic processing, and on the basis of the results identifies, in the external sound picked up by the microphone 4, the alarm sound of an emergency vehicle, the alarm sound of a pedestrian crossing, the alarm sound of a level-crossing gate, the alarm sound of another automobile (that is, a horn), and so on. To carry out this processing, the ROM 42 stores in advance the reference feature quantities extracted from the characteristic shapes traced over time by the peaks of the frequency characteristics obtained by frequency analysis of the various alarm sounds to be identified.

The high-speed arithmetic processing unit 8 processes at high speed the acoustic signal input through the acoustic signal input unit 6, on the basis of the input data from the alarm sound identification unit 10, to obtain the frequency characteristic of the acoustic signal at each instant. It consists of a DSP (digital signal processor) 50 for high-speed processing of large numbers of numerical operations, a RAM 52 for storing the input data from the alarm sound identification unit 10 and the processed data, a ROM 54 in which the control program for the high-speed operations is stored in advance, and a control circuit 56 that starts or stops the DSP 50 in accordance with operation commands from the alarm sound identification unit 10.

The CPU 40 of the alarm sound identification unit 10 can thus have the desired processing performed when needed by halting the DSP 50 through the control circuit 56, transferring the data to be processed to the RAM 52, and then starting the DSP 50 through the control circuit 56.

The output unit 74 sends a signal to the alarm 72 when the identification result requires that the driver be warned; the alarm 72 notifies the driver of the presence and type of alarm through a speaker, a warning lamp, a display, or the like.

The transmission unit 78 transfers the recognition result to the vehicle control device 76 that controls the running of the vehicle, and the vehicle control device 76 uses this information as one input to its control. For example, if the vehicle control device 76 is an engine control device, it can decelerate the vehicle when the alarm sound of a level-crossing gate is recognized; if it is a steering control device in an automatically driven vehicle, it can pull the vehicle over to the road shoulder when the alarm sound of an emergency vehicle is recognized.

The above is the hardware configuration of the device. The acoustic-electric transducer M1 is the microphone 4. The frequency analysis means M2 consists mainly of the high-speed arithmetic processing unit 8, with the acoustic signal input unit 6 and the alarm sound identification unit 10 as auxiliary components that prepare its input data and control it. The peak extraction means M3, grouping means M4, feature extraction means M5, and identification means M6 are implemented by the alarm sound identification unit 10, and the reference feature storage means M7 is the ROM 42 of the alarm sound identification unit 10.
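The alternating use of RAMa 26 and RAMb 28 is a double-buffering scheme, which can be sketched in software as follows. The class and method names are hypothetical, and the embodiment realizes the switching in hardware (control circuit 24 and switch circuit 30), not in a program like this one.

```python
class PingPongBuffer:
    """Two fixed-size buffers filled alternately, in the manner of RAMa 26 and RAMb 28."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffers = [[], []]
        self.active = 0  # index of the buffer currently being written

    def write(self, sample):
        """Store one sample; return the index of a buffer that just filled, else None."""
        buf = self.buffers[self.active]
        buf.append(sample)
        if len(buf) < self.capacity:
            return None
        full = self.active
        self.active = 1 - self.active   # corresponds to flipping switch circuit 30
        self.buffers[self.active] = []  # the other buffer is overwritten from now on
        return full                     # plays the role of the storage signal

    def read(self, index):
        """Copy out the contents of one buffer for processing."""
        return list(self.buffers[index])

# Ten consecutive samples written into buffers of four samples each.
pp = PingPongBuffer(capacity=4)
for n in range(10):
    full = pp.write(n)
    if full is not None:
        print("buffer", full, "full:", pp.read(full))
```

The reader has to finish with a full buffer before the other one fills, since the writer then switches back and overwrites it. In the embodiment this is the time available to the CPU 40 for reading one block of data.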
The processing procedure of the CPU 40 of the alarm sound identification unit 10 is shown in FIGS. 3 and 4, and the data processing is illustrated in the explanatory diagrams of FIGS. 5 to 12; the operation of the vehicle alarm device 2 is described below with reference to them.

As shown in FIG. 3, when the alarm sound recognition processing starts, step 100 first initializes the memory and the peripheral devices, and processing moves to step 110.

In step 110, to start the acoustic signal input unit 6, a drive signal is output to the control circuit 24 and the A/D converter 22 starts converting. As described above, the acoustic signal input unit 6 then A/D-converts the acoustic signal from the microphone 4 at the predetermined sampling period and stores the data first in RAMa 26. When RAMa 26 is full, that is, when data for the time needed to recognize an alarm sound have been obtained, the control circuit 24 outputs a storage signal to the CPU 40, and the subsequent samples are stored in RAMb 28. The acoustic signal from the microphone 4 is thus stored in turn in RAMa 26 and RAMb 28, which act as buffer memories, and a storage signal is input to the CPU 40 each time either becomes full. In synchronization with this storage signal, the time variation of the frequency characteristic over a fixed time is obtained.

Accordingly, step 120 waits for the storage signal from the control circuit 24. When the storage signal arrives, indicating that RAMa 26 or RAMb 28 is full, processing moves to step 130, where the A/D conversion data are read from RAMa 26 or RAMb 28 and stored temporarily in the RAM 44 of the alarm sound identification unit 10.

Step 140 then performs frequency analysis on the stored A/D conversion data, as shown in FIG. 4.

In the frequency analysis processing of FIG. 4, step 300 first transfers the A/D conversion data stored in the RAM 44 to the RAM 52 of the high-speed arithmetic processing unit 8, and step 310 starts, through the control circuit 56, the filter processing, which is the frequency analysis program of the DSP 50.

Following the program stored in the ROM 54, the DSP 50 performs filter processing that computes, from the A/D conversion data for the fixed time stored in the RAM 52, the time variation of the amplitude (power) of one specific frequency component over that time. It stores the result in a free area of the RAM 52 and notifies the CPU 40 through the control circuit 56 that the program has finished.

Step 320 therefore waits for the program's end signal. When it arrives, processing moves to step 330, which reads the time-variation data of one frequency component from the RAM 52 and stores it in a free area of the RAM 44. Step 340 then checks whether the filter processing has been completed for all the preset extraction frequencies; if not, processing returns to step 310 and the filter program is started again.

Each time the filter processing is started, the DSP 50 changes the extraction frequency slightly, so that frequency analysis is carried out over the whole frequency range. As a result, the time-variation data of every frequency component over the same time interval are stored in the RAM 44.

For example, the acoustic signal shown in FIG. 5 is sampled by the acoustic signal input unit 6 at the predetermined period for the fixed time and frequency-analyzed by the DSP 50, giving the time variation of the frequency characteristic over the fixed time as shown in FIG. 6. This fixed time is the time needed to recognize an alarm sound, that is, about twice the period of the envelope of the alarm sound. The frequency analysis data output by the DSP 50 have a time interval equal to the sampling period, but the data shown in FIG. 6 have been averaged by the CPU 40 so that the mean over a fixed time width is used as the frequency analysis data for that time.

The CPU 40 next moves to step 150 of FIG. 3, extracts the peak information from data such as those of FIG. 6, and creates data such as those of FIG. 7. That is, if the frequency characteristic at a certain time is as shown in FIG. 11, a differentiation with respect to frequency (in practice a difference operation) is performed, and the local maxima, that is, the peaks, are extracted as frequency-amplitude pairs (f₀, P₀), (f₂, P₂). This is done for the frequency characteristic at each time t₁, t₂, t₃, …, tₙ, finally giving the peak data shown in FIG. 7.

The CPU 40 then moves to step 160 and judges the continuity of the peaks extracted as in FIG. 7. If the peak frequency f_i in the frequency characteristic at an arbitrary time t_i lies within a fixed width of the peak frequency f_{i-1} in the frequency characteristic at the preceding time t_{i-1}, the extracted peak f_i is grouped as an extension of the continuous line B. Conversely, if f_i does not lie within the fixed width of f_{i-1}, the peak f_i is grouped as the starting point of a new continuous line. Performing this processing for the peaks of the frequency characteristic at each time groups the extracted peaks into continuous line B, continuous line C, and so on, as shown in FIG. 8. In a real road environment, a peak that should have been extracted may be missed because of noise or a passing sound-blocking object. A silent gap then appears between the peaks before and after it, and the sound is no longer detected as continuous. The continuity test may therefore be relaxed so that a peak is also judged continuous when it continues the peak frequency from a fixed time earlier.

In steps 170 and 180, the grouped peaks are then divided into blocks, each sharing a common characteristic in the shape it traces over time. Step 170 looks at how the frequency of the peak data grouped into continuous lines B, C, and so on (FIG. 8) changes with time, and divides the data into blocks accordingly. As shown in FIG. 9, the peak sequence of continuous line B can be divided, from the viewpoint of frequency change over time, into block X, in which the frequency rises, and block Y, in which it falls. The peak sequence of continuous line C has a constant frequency throughout and is therefore judged to be a single block Z.

Step 180 then divides the blocks further according to how the amplitude changes with time. In FIG. 9, block X is judged to have constant amplitude, block Y constant amplitude, and block Z decaying amplitude, so no further division takes place. In an example such as FIG. 12, on the other hand, the division by frequency change in step 170 finds the frequency constant throughout and yields a single block, but step 180 looks at the amplitude change and divides it into two blocks, V and W, both of which are judged to have decaying amplitude.

In step 190, the feature quantities of the blocks X, Y, and Z obtained in this way are stored in the RAM 44 in the following format:

{block start time, block end time, form of amplitude change over time, block start frequency, block end frequency, form of frequency change over time}

For the data of FIG. 10, specifically:

X = {T₀, T₁, constant, f₀, f₁, rising}
Y = {T₁, T₂, constant, f₁, f₀, falling}
Z = {T₃, T₄, falling, f₂, f₂, constant}

In this way, feature quantities that capture the change over time of the frequency and of the amplitude of the sound input from the microphone 4 are extracted block by block.

The program then proceeds to step 200 and determines whether the feature quantities obtained in step 190 include any that match the reference feature quantities of a sound to be recognized. Each sound to be recognized is divided into reference blocks, one per reference feature quantity, in the same way as the blocks above. The names of the reference blocks that make up each sound to be recognized and the reference feature quantities of each reference block are stored in the ROM 42 in the same format as above.

For example, the alarm sound of a level-crossing gate is divided into two reference blocks, α and β, whose reference feature quantities are:

α = {0, t_e ± Δ₁, falling, f_j ± Δ₂, f_j ± Δ₃, constant}
β = {0 + Δ₄, t_f ± Δ₅, falling, ±Δ₆, ±Δ₇, constant}

In a real level-crossing alarm sound, α and β are related by a fixed relationship specific to the sounding body, so β is defined in terms of allowable deviations from α. That is, the second tone, corresponding to β, must start within Δ₄ after the end of the first tone, corresponding to α, and its frequency must lie within ±Δ₆ or ±Δ₇ of the frequency of the first tone. Because the relative frequency differences Δ₆ and Δ₇ between the first and second tones, the forms of the frequency and amplitude changes over time, and the durations t_e and t_f of the two tones are constrained in this way, the allowable ranges Δ₂ and Δ₃ of the absolute frequency f_j of the first and second tones can be made very large, or even infinite (that is, no constraint on the absolute frequency), without other sounds being mistaken for a level-crossing alarm. Recognition is therefore unaffected by frequency variation between individual sounding bodies or by frequency shifts due to the Doppler effect. An ambulance siren can likewise be recognized, as in the level-crossing case, by constraining the relationship between the two reference blocks corresponding to the "pee" and "poh" of its "pee-poh, pee-poh" sound.

The feature quantities of the detected sound extracted as above are compared with the reference feature quantities as follows.

First, candidate reference feature quantities are selected by testing whether both the form of amplitude change and the form of frequency change match between the extracted and the reference feature quantities. For the selected candidates, it is then tested whether the duration is within the allowable range, whether the amount of frequency change is within the allowable range, and whether the start and end frequencies are within the allowable ranges. The reference feature quantities that satisfy all the conditions are thereby selected.

Next, when one grouped peak group consists of several blocks, or when two blocks are close together, it is tested whether the relationship between those blocks equals the relationship between the reference blocks. When the two relationships are equal, the sound is finally recognized as the alarm sound made up of those reference blocks.

Specifically, the continuous line B of FIG. 9 is recognized as the siren of a fire engine, whose frequency changes with time, and the continuous line D of FIG. 12 is recognized as the alarm sound of a level-crossing gate, because its frequency is constant, its amplitude decays, and two such segments follow one another.

Processing then moves to step 210, where, on the basis of the recognition result of step 200, a recognition signal is output to the output unit 74 and the transmission unit 78 of FIG. 2, so that the type of alarm sound recognized is displayed or announced in the passenger compartment by speech synthesis.

One cycle of sound recognition processing is thus completed. Processing returns to step 120, and the next cycle is executed in synchronization with the next storage signal.
Are related by a certain relationship peculiar to the sounding body, so that the reference feature value β is defined as an allowable deviation from the reference feature value α. That is, the second sound corresponding to the reference feature β is
The first sound data corresponding to the reference feature alpha, start time and delta ₄ within After completion first ON, the frequency is in the allowable range range of ± delta ₆ or ± ₇ relative to the frequency of the first sound . Thus, the relative differences Δ ₆ and Δ ₇ between the frequencies of the first sound and the second sound,
Since the time-varying shape of the frequency and the amplitude and the sounding times t _e and t _f of the first and second sounds are limited, the allowable ranges Δ ₂ and Δ of the absolute values f _j of the frequencies of the first and second sounds are limited. ₃ has a very large value or infinity (that is, there is no limitation on the absolute value of the frequency), it does not erroneously recognize other sounds as a level crossing alarm sound, and there is a variation in frequency due to individual differences in sounding bodies, Recognition can be performed without being affected by the frequency shift due to the Doppler effect. Also, in the case of a siren of an ambulance, if the relative relationship between the two reference blocks corresponding to "Pee" and "Pee" of "Pee Pee Poo" is limited, recognition can be made in the same manner as in the case of a railroad crossing. The feature amount of the detected sound extracted as described above is compared with the reference feature amount as follows. A corresponding reference feature is selected by judging whether or not both the time variation of the amplitude and the time variation of the frequency match between the feature and the reference feature. Then, between the selected reference feature value and the feature value, whether or not the duration is within the allowable range, whether or not the frequency change amount is within the allowable range, the start frequency and the end frequency Is determined whether or not is within the allowable range. In this way, the reference feature values satisfying all the conditions are selected. Next, when one group of peaks is composed of a plurality of blocks or when two blocks are close to each other, it is determined whether the relationship between the blocks is equal to the relationship between the reference blocks. Is determined. 
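The comparison of step 200 can be sketched roughly as follows. This is only an illustration in Python, not the program of the embodiment: the names `Block`, `RefBlock`, `within`, `matches`, and `matches_pair`, the units (seconds and hertz), and all numeric values are assumptions introduced for the example. A tolerance of `None` models the case in which an allowable range is infinite, so that the absolute frequency is not restricted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    """Feature amount of one detected block, in the stored six-item format."""
    t_start: float
    t_end: float
    amp_form: str    # "constant", "rise" or "fall"
    f_start: float
    f_end: float
    freq_form: str   # "constant", "rise" or "fall"


@dataclass
class RefBlock:
    """Reference feature amount with allowable ranges (hypothetical layout)."""
    duration: float
    dur_tol: float
    amp_form: str
    freq_form: str
    f_change: float = 0.0
    f_change_tol: Optional[float] = None   # None: no limit
    f_start: Optional[float] = None        # None: absolute frequency unrestricted
    f_start_tol: Optional[float] = None
    f_end: Optional[float] = None
    f_end_tol: Optional[float] = None


def within(value, target, tol):
    """True if value lies within tol of target; a None target or tol means no limit."""
    if target is None or tol is None:
        return True
    return abs(value - target) <= tol


def matches(block, ref):
    """Compare one detected block with one reference block."""
    # First select by the forms of the amplitude and frequency changes.
    if (block.amp_form, block.freq_form) != (ref.amp_form, ref.freq_form):
        return False
    # Then check duration, frequency change amount, start and end frequency.
    return (within(block.t_end - block.t_start, ref.duration, ref.dur_tol)
            and within(block.f_end - block.f_start, ref.f_change, ref.f_change_tol)
            and within(block.f_start, ref.f_start, ref.f_start_tol)
            and within(block.f_end, ref.f_end, ref.f_end_tol))


def matches_pair(first, second, ref_a, ref_b, max_gap, max_rel_freq):
    """Check two blocks and the relation between them (e.g. alpha then beta)."""
    gap = second.t_start - first.t_end
    return (matches(first, ref_a) and matches(second, ref_b)
            and 0.0 <= gap <= max_gap
            and abs(second.f_start - first.f_start) <= max_rel_freq)


# Illustrative values only: two decaying, constant-frequency tones.
alpha = RefBlock(duration=0.5, dur_tol=0.1, amp_form="fall", freq_form="constant")
beta = RefBlock(duration=0.5, dur_tol=0.1, amp_form="fall", freq_form="constant")
v = Block(0.0, 0.5, "fall", 700.0, 700.0, "constant")
w = Block(0.55, 1.05, "fall", 750.0, 750.0, "constant")
print(matches_pair(v, w, alpha, beta, max_gap=0.1, max_rel_freq=100.0))
```

Because only the gap between the two blocks and their relative frequency difference are restricted in this sketch, shifting both tones by the same proportion leaves the result unchanged as long as the difference stays within `max_rel_freq`.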
When the relationship between the blocks is equal to the relationship between the reference blocks, the sound is finally recognized as the alarm sound made up of those reference blocks. Specifically, the continuous line segment B in FIG. 9 is recognized as the siren of a fire engine, whose frequency changes with time, and the continuous line segment D in FIG. 12 is recognized as the alarm sound of a level crossing barrier. Next, the program proceeds to step 210, where, based on the recognition result of step 200, a recognition signal is output to the output unit 74 and the transmission unit 78 of FIG. 2, so that the type of the recognized alarm sound is displayed and the alarm is sounded inside the vehicle. This completes one cycle of the sound recognition processing; the process returns to step 120, and the next cycle is executed in synchronization with the next storage signal.
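The peak extraction of step 150 and the continuity grouping of step 160 can be illustrated with a short Python sketch. The function names `extract_peaks` and `group_peaks`, the list-based data layout, and the sample spectra are assumptions made for this example. A peak is taken where the first difference changes from positive to non-positive, and a peak extends a line only if that line's last point lies at the immediately preceding time and within `width` in frequency. The sketch does not bridge missing peaks.

```python
def extract_peaks(freqs, amps):
    """Return (frequency, amplitude) pairs at the local maxima of one spectrum.

    A difference operation along the frequency axis is used: a bin is a peak
    when the difference before it is positive and the one after it is not.
    """
    peaks = []
    for k in range(1, len(amps) - 1):
        rising = amps[k] - amps[k - 1] > 0
        falling = amps[k + 1] - amps[k] <= 0
        if rising and falling:
            peaks.append((freqs[k], amps[k]))
    return peaks


def group_peaks(peaks_per_time, width):
    """Group peaks into continuous lines.

    peaks_per_time[i] holds the peaks at time index i. Each returned line is
    a list of (time_index, frequency, amplitude) tuples.
    """
    lines = []
    for i, peaks in enumerate(peaks_per_time):
        for f, p in peaks:
            # Lines whose last peak is at time i-1 and close enough in frequency.
            candidates = [line for line in lines
                          if line[-1][0] == i - 1 and abs(f - line[-1][1]) <= width]
            if candidates:
                nearest = min(candidates, key=lambda line: abs(f - line[-1][1]))
                nearest.append((i, f, p))
            else:
                lines.append([(i, f, p)])   # start of a new continuous line
    return lines


# Illustrative spectra: one rising tone and one constant tone.
freqs = [100, 200, 300, 400, 500, 600, 700]
spectra = [
    [0, 3, 0, 0, 0, 2, 0],
    [0, 0, 3, 0, 0, 2, 0],
    [0, 0, 0, 3, 0, 0, 0],
]
lines = group_peaks([extract_peaks(freqs, s) for s in spectra], width=100)
for line in lines:
    print(line)
```

With these sample spectra the rising tone forms one line spanning all three time indices, and the constant tone forms a second line that ends when its peak disappears.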

[Effects of the invention]

The sound recognition device of the present invention obtains the frequency characteristic of the acoustic signal at each time, extracts peaks from the frequency characteristic, determines the continuity of the peaks with respect to time, classifies the peaks into groups of continuous peaks, extracts the feature amount of the characteristic shape accompanying the temporal change of each peak group, and identifies the predetermined sound on the basis of that feature amount and the reference feature amount. Because sounds are thus recognized from the characteristic shape that accompanies the temporal change of the frequency characteristic, even sounds with a frequency shift, or sounds whose frequency changes with time, can be recognized accurately. In addition, the characteristic shape is not matched as a raw shape but compared by means of its feature amounts, so the matching computation is shortened and the recognition speed is improved.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the concept of the present invention. FIG. 2 is a block diagram showing the configuration of a device according to a specific embodiment of the present invention. FIG. 3 and FIG. 4 are flowcharts showing the processing procedure of the CPU of the device of the embodiment. FIG. 5 is a waveform diagram showing an acoustic signal. FIG. 6 is an explanatory diagram showing a frequency analysis result. FIG. 7 is an explanatory diagram showing the extraction of peaks. FIG. 8 is an explanatory diagram showing the grouping of peaks by continuous line segments. FIG. 9 is an explanatory diagram showing the division of the grouped peak groups into blocks by feature. FIG. 10 is an explanatory diagram showing the extraction of feature amounts. FIG. 11 is a characteristic diagram showing the frequency characteristic at one time. FIG. 12 is an explanatory diagram showing the division of a peak group into blocks.

2 ... Vehicle alarm device, 4 ... Microphone, 6 ... Acoustic signal input unit, 8 ... High-speed processing unit, 10 ... Alarm sound identification unit

Claims

(57) [Claims]

1. A sound recognition device for recognizing a predetermined sound from a sound detected by an acoustoelectric converter, comprising: frequency analysis means for obtaining a frequency characteristic at each time of an acoustic signal output from the acoustoelectric converter; peak extracting means for extracting peaks from the frequency characteristic at each time obtained by the frequency analysis means; grouping means for determining the continuity with respect to time of the peaks extracted by the peak extracting means and classifying the peaks into groups of continuous peaks; feature amount extracting means for extracting a feature amount of a characteristic shape associated with a temporal change of each peak group classified by the grouping means; reference feature amount storage means for storing, as a reference feature amount, a feature amount of the predetermined sound to be recognized, corresponding to the feature amount of the detected sound; and identification means for identifying the predetermined sound on the basis of the feature amount extracted by the feature amount extracting means and the reference feature amount stored in the reference feature amount storage means.