JP2008301236A

JP2008301236A - Audio signal processing apparatus, method, program, and computer-readable recording medium storing the program

Info

Publication number: JP2008301236A
Application number: JP2007145679A
Authority: JP
Inventors: Yuzuru Koga; 譲古賀
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2007-05-31
Filing date: 2007-05-31
Publication date: 2008-12-11

Abstract

[Problem] To reliably remove or greatly reduce, without complicated processing, sudden noise such as that produced when an object strikes a microphone or when breath blows directly onto it.
[Solution] The apparatus comprises an output control unit 42a that generates and outputs one output audio signal based on three or more audio signals input by three or more input units 10, 20, 30, and a noise detection unit 41a that compares the three or more audio signals and, when a difference arises among them, detects by majority vote the deviation of the minority audio signal from the majority audio signals as noise. When the noise detection unit 41a detects noise, the output control unit 42a reduces, or sets to zero, the contribution of the noise-containing audio signal to the generation of the output audio signal.
[Selected drawing] FIG. 1

Description

The present invention relates to a technique for converting sound such as conversation into an electrical signal with, for example, a microphone, and for amplifying that signal, recording it, or transmitting it to a remote location.

Conventionally, when a single microphone converts sound into an electrical signal, undesirable, excessive, and sudden noise that is not part of the conversation itself (for example, when the speaker bumps into the microphone or breath blows directly onto it) is also converted into the electrical signal. As a result, the excessive noise is carried in the electrical signal (audio signal) as is.
A technique such as ALC (Automatic Level Control) can automatically turn down the volume of the portion containing the excessive noise and so reduce the unpleasantness. In the portion where the volume is turned down, however, the volume of the intended signal (the conversation) is turned down as well, which can make the conversation hard to hear.

When the electrical signals from a plurality of microphones are mixed, the noise from the microphone that the speaker physically bumped is attenuated to one over the number of microphones, but it does not disappear, so noise in the audio signal has been unavoidable.
Known techniques include one that adjusts the combined audio signal of a plurality of audio signals according to the cross-correlation coefficients of the audio signals from a plurality of microphones in order to emphasize the target sound component (for example, Patent Document 1 below), and one that constantly monitors the ambient noise conditions at each of a plurality of microphones installed in a vehicle and selects the microphone to use according to the detected conditions (for example, Patent Document 2 below).
Patent Document 1: JP 2004-289762 A
Patent Document 2: JP 2006-33700 A

However, the technique of Patent Document 1 requires very complicated processing, such as calculating cross-correlation coefficients and adjusting components.
The technique of Patent Document 2 targets steady ambient noise, such as vehicle engine sound, air conditioner blower sound, or noise that depends on whether the vehicle windows are open or closed. It therefore cannot remove sudden noise such as that produced when an object (the speaker, for example) strikes a microphone or when breath blows directly onto it.

The present invention was devised in view of these problems. Its object is to reliably remove or greatly reduce, without complicated processing, sudden noise such as that produced when an object strikes a microphone or when breath blows directly onto it.

To achieve the above object, an audio signal processing apparatus of the present invention comprises: three or more input units that input audio signals; an output control unit that generates and outputs one output audio signal based on the three or more audio signals input by the three or more input units; and a noise detection unit that compares the three or more audio signals and, when a difference arises among them, detects by majority vote among the three or more audio signals the deviation of the minority audio signal from the majority audio signals as noise. When the noise detection unit detects noise, the output control unit reduces, or sets to zero, the contribution of the noise-containing audio signal to the generation of the output audio signal (claim 1).

Also to achieve the above object, an audio signal processing apparatus of the present invention comprises: a plurality of input units that input audio signals; an output control unit that generates and outputs one output audio signal based on the plurality of audio signals input by the plurality of input units; and a noise detection unit that detects, as noise, any portion of the plurality of audio signals in which the signal level exceeds a threshold. When the noise detection unit detects noise, the output control unit reduces, or sets to zero, the contribution of the noise-containing audio signal to the generation of the output audio signal (claim 2).
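The threshold-based detection of claim 2 can be illustrated with a minimal Python sketch. The function name `over_threshold` and the use of the absolute sample value as the "signal level" are assumptions made for illustration; the text does not specify how the level is measured.

```python
def over_threshold(signal, threshold):
    """Return one flag per sample: True where the level exceeds `threshold`.

    Assumption: the "signal level" is taken to be the absolute sample value.
    """
    return [abs(x) > threshold for x in signal]


# Only the middle sample is loud enough to be flagged as noise.
flags = over_threshold([0.1, -0.9, 0.3], threshold=0.5)
print(flags)
```

Unlike the majority vote of claim 1, this check looks at each signal on its own, so it works with any number of inputs but can only catch noise that is louder than the chosen threshold.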

Preferably, when the noise detection unit detects no noise, the output control unit generates and outputs the output audio signal by combining the audio signals at predetermined ratios, and when the noise detection unit detects noise, it reduces the ratio of the audio signal in which the noise was detected (claim 3).
Also preferably, when the noise detection unit detects no noise, the output control unit outputs the average of the plurality of audio signals as the output audio signal, and when the noise detection unit detects noise, it outputs as the output audio signal the average of the audio signals other than the one in which the noise was detected (claim 4).

Preferably, the apparatus further comprises a holding unit that temporarily holds the audio signals input by the input units, and the output control unit generates and outputs the output audio signal based on the audio signals held in the holding unit (claim 5).
Also to achieve the above object, an audio signal processing method of the present invention comprises: an input step of inputting three or more audio signals; an output control step of generating and outputting one output audio signal based on the three or more audio signals input in the input step; and a noise detection step of comparing the three or more audio signals input in the input step and, when a difference arises among them, detecting by majority vote among the three or more audio signals the portion in which the minority audio signal deviates from the majority audio signals as noise. When noise is detected in the noise detection step, the output control step reduces, or sets to zero, the contribution of the noise-containing audio signal to the generation of the output audio signal (claim 6).

Also to achieve the above object, an audio signal processing program of the present invention causes a computer to realize a function of outputting one output audio signal based on three or more audio signals input from three or more input units. The program causes the computer to function as an output control unit that generates and outputs the one output audio signal based on the three or more audio signals input by the three or more input units, and as a noise detection unit that compares the three or more audio signals and, when a difference arises among them, detects by majority vote among the three or more audio signals the portion in which the minority audio signal deviates from the majority audio signals as noise. The program further causes the computer to function so that, when the noise detection unit detects noise, the output control unit reduces, or sets to zero, the contribution of the noise-containing audio signal to the generation of the output audio signal (claim 7).

Also to achieve the above object, a computer-readable recording medium of the present invention stores the audio signal processing program described above (claim 8).

Thus, according to the present invention, the noise detection unit can reliably detect, without complicated processing, sudden noise such as that produced when an object (the speaker, for example) strikes a microphone or when the speaker's breath blows directly and strongly onto it, and the output control unit can reliably remove that noise from the output audio signal or greatly reduce it (claims 1 to 8).

Embodiments of the present invention are described below with reference to the drawings.
[1] First embodiment of the present invention
First, the configuration of an audio signal processing apparatus 1a as a first embodiment of the present invention (hereinafter, "the present audio signal processing apparatus") is described with reference to the block diagram in FIG. 1. The apparatus 1a comprises three or more input units (preferably an odd number of three or more; here, three) 10, 20, 30 (hereinafter denoted "10 to 30" when they need not be distinguished) and a DSP (Digital Signal Processor) 40a.

The input units 10 to 30 input external sound as audio signals. Each comprises one microphone 11, 21, 31 (hereinafter "11 to 31" when not distinguished), one microphone amplifier 12, 22, 32 (hereinafter "12 to 32" when not distinguished), and one ADC (Analog Digital Converter) 13, 23, 33 (hereinafter "13 to 33" when not distinguished).

In the following description, "sound" mainly means the voice of a speaker (user). In the present invention, however, sound is not limited to a speaker's voice and may be any sound, such as sound produced by a musical instrument.
FIG. 2 shows an example in which the apparatus 1a is applied to a mobile phone 2. In this case, the microphones 11 to 31 of the input units 10 to 30 are placed at predetermined intervals from one another. The apparatus 1a is configured, for example, to use the microphone 11 (input unit 10) for ordinary calls and to use the microphones 11 to 31 (input units 10 to 30) for videophone or hands-free calls. The DSP 40a switches the microphones in use.

In FIG. 2, reference numeral 3 denotes the operation buttons of the mobile phone 2 and reference numeral 4 denotes its display unit.
FIG. 3 shows an example in which the microphones 11 to 31 of the apparatus 1a are mounted on a headset 5. Here too, the microphones 11 to 31 are placed at predetermined intervals from one another. In FIG. 3, reference numeral 6 denotes the headphone part of the headset 5.

The microphones 11 to 31 convert the sound produced by the speaker into electrical signals and output them.
The microphone amplifiers 12 to 32 amplify the electrical signals output from the corresponding microphones 11 to 31 and output them to the ADCs 13 to 33 in the following stage.
The ADCs 13 to 33 convert the electrical signals (analog signals) output from the microphone amplifiers 12 to 32 into digital signals (PCM: Pulse Code Modulation) and input them as audio signals to the DSP 40a in the following stage.

The DSP 40a processes the audio signals input from the input units 10 to 30 and comprises a noise detection unit 41a and an output control unit 42a. The noise detection unit 41a and the output control unit 42a are realized by the DSP 40a executing a predetermined application program (for example, the audio signal processing program described later).

The noise detection unit 41a compares the three audio signals input by the input units 10 to 30 and, when a difference arises among them, detects by majority vote among the three audio signals the portion in which the minority audio signal deviates from the majority audio signals as noise.
Specifically, as shown in FIGS. 4(a) to 4(c), the noise detection unit 41a monitors the waveforms of the three input audio signals in real time and performs noise detection when a difference of at least a predetermined value arises among them.

FIG. 4(a) shows the waveform of the audio signal input from the input unit 10, FIG. 4(b) that of the audio signal input from the input unit 20, and FIG. 4(c) that of the audio signal input from the input unit 30. In FIGS. 4(a) to 4(c), and in FIG. 5 and FIGS. 8(a) and 8(b) described later, the horizontal axis represents time and the vertical axis represents the audio waveform level (for example, a voltage value).

The noise detection unit 41a compares the audio waveform levels of the three audio signals in real time and judges that a difference has arisen among them when their levels differ by at least a predetermined value. The noise detection unit 41a therefore does not judge that a difference has arisen as soon as the levels cease to be exactly identical. As long as the level difference between audio signals is within the predetermined value, it treats those signals as being at the same level. Only when a difference of at least the predetermined value, large enough to be regarded as noise, arises between audio signals does it judge that a difference has arisen.
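The tolerance-based comparison described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `levels_differ`, the parameter `tolerance` (standing in for the "predetermined value"), and the use of the spread between the highest and lowest level are not taken from the text.

```python
def levels_differ(levels, tolerance):
    """Return True when any two channel levels differ by at least `tolerance`.

    `levels` holds one audio waveform level per input unit at the same instant.
    """
    return max(levels) - min(levels) >= tolerance


# Small deviations are treated as "same level"; a large one counts as a difference.
print(levels_differ([0.50, 0.52, 0.49], tolerance=0.2))
print(levels_differ([0.50, 1.40, 0.49], tolerance=0.2))
```

Checking the spread between the largest and smallest level is equivalent to checking every pair of channels, which keeps the per-sample cost low.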

For example, as shown in FIGS. 4(a) to 4(c), from time t0, when the three audio signals begin to be input, until time t1, the noise detection unit 41a does not judge that a difference has arisen among the audio signals. At time t1, sudden noise appears in the audio signal of FIG. 4(b), a difference of at least the predetermined value arises between it and the other audio signals, and the noise detection unit 41a judges that a difference has arisen.

When the noise detection unit 41a judges that a difference has arisen among the audio signals, it applies majority logic: it sorts the three audio signals by audio waveform level into groups whose levels are the same, and detects the deviation of the minority group (here, the audio signal from the input unit 20) from the majority group (here, the audio signals from the input units 10 and 30) as noise.
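The grouping and majority vote can be sketched as below. This is a hedged illustration, not the patent's implementation: the function name `find_minority`, the `tolerance` parameter, the choice to compare each level with the first member of a group, and the handling of an inconclusive vote are all assumptions.

```python
def find_minority(levels, tolerance):
    """Group channels whose levels agree within `tolerance`; return minority indices.

    Returns an empty list when all channels agree or no strict majority exists.
    """
    groups = []  # each group is a list of channel indices
    for idx, level in enumerate(levels):
        for group in groups:
            # Compare with the group's first member, used as its reference level.
            if abs(level - levels[group[0]]) < tolerance:
                group.append(idx)
                break
        else:
            groups.append([idx])
    majority = max(groups, key=len)
    if len(majority) * 2 <= len(levels):
        return []  # no strict majority, so the vote is inconclusive
    return [i for i in range(len(levels)) if i not in majority]


# Channel 1 (the second input unit) deviates from the other two.
print(find_minority([0.50, 1.40, 0.49], tolerance=0.2))
```

With an odd number of inputs, which the description prefers, a single deviating channel always leaves a strict majority, so the vote cannot tie in that case.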

In other words, the noise detection unit 41a uses the majority vote both to identify the abnormal audio signal that contains noise and to identify the noise itself.
Because the microphones 11 to 31 are spaced at predetermined intervals as described above (see FIGS. 2 and 3), when an object (the speaker, for example) strikes one of the microphones 11 to 31, or breath blows directly onto it, noise is produced only in that one microphone. Put differently, if the microphones 11 to 31 are some distance apart, then when, for example, an object strikes a microphone, a clear difference appears between the audio signal from the struck, minority microphone and the audio signals from the other, majority microphones. As shown in FIGS. 4(a) and 4(b), this difference appears in the minority audio signal (waveform) as a noise waveform.

The noise detection unit 41a can therefore identify, by majority vote, the deviation of the minority audio signal (waveform) from the majority audio signals (waveforms) as noise.
Furthermore, as shown at time t2 in FIG. 4(b), when the difference between the audio waveform level of the signal from the input unit 20, which was judged to be an abnormal signal containing noise, and the levels of the signals from the majority input units 10 and 30 has been absent continuously for a predetermined time, the noise detection unit 41a judges that the noise in the signal from the input unit 20 has ended.

In this way, the noise detection unit 41a detects as noise the portion (from t1 to t2) in which the audio signal from the input unit 20 in FIG. 4(b) deviates from the audio signals from the input units 10 and 30 in FIGS. 4(a) and 4(c).
Similarly, the noise detection unit 41a detects as noise the deviating portion of the audio signal from the input unit 10, from the point at which a difference of at least the predetermined value arises between the signal from the input unit 10 in FIG. 4(a) and the signals from the input units 20 and 30 in FIGS. 4(b) and 4(c) (time t3 in FIG. 4(a)) until the difference has been absent continuously for the predetermined time (time t4 in FIG. 4(a)).
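The start and end conditions of a noise interval (start when the difference reaches the predetermined value, end only after the difference has been absent for a predetermined time) amount to a simple hold-off state machine. The sketch below is an assumption-laden illustration: `noise_flags`, `tolerance`, and `hold` (the predetermined time expressed in samples) are hypothetical names, and the input is assumed to be the per-sample absolute difference between one channel and the majority.

```python
def noise_flags(diffs, tolerance, hold):
    """Flag samples of one channel as noisy.

    Noise starts when a difference reaches `tolerance` and ends only after the
    difference has stayed below `tolerance` for `hold` consecutive samples.
    Samples inside the hold period are still flagged as noisy.
    """
    flags = []
    noisy = False
    quiet_run = 0  # consecutive below-tolerance samples seen while noisy
    for d in diffs:
        if d >= tolerance:
            noisy = True
            quiet_run = 0
        elif noisy:
            quiet_run += 1
            if quiet_run >= hold:
                noisy = False
                quiet_run = 0
        flags.append(noisy)
    return flags


# A brief dip below the tolerance (third sample) does not end the interval.
print(noise_flags([0.0, 0.9, 0.1, 0.8, 0.1, 0.1, 0.0], tolerance=0.2, hold=2))
```

Requiring the difference to stay absent for several samples keeps the detector from toggling at every zero crossing of an oscillating noise burst.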

The output control unit 42a generates one output audio signal based on the three audio signals input from the input units 10 to 30 and outputs it externally. When the noise detection unit 41a detects no noise, it either generates the output audio signal by combining (mixing) the audio signals from the input units 10 to 30 at predetermined ratios (for example, 50% for the audio signal of the input unit 10 and 25% each for the audio signals of the input units 20 and 30), or outputs the average of the three audio signals as the output audio signal.

When the noise detection unit 41a detects noise, on the other hand, the output control unit 42a reduces, or sets to zero, the ratio of the audio signal in which the noise was detected (its contribution to the generation of the output audio signal) and generates and outputs an output audio signal in which the noise is reduced or removed. For example, when the output audio signal is generated by combining the audio signals at predetermined ratios, the combining ratio of the audio signal in which noise was detected is reduced or set to zero so that as little of the noise as possible is included in the output audio signal.

Accordingly, in the case shown in FIGS. 4(a) to 4(c), during the period t1 to t2 in which the noise detection unit 41a detects that the audio signal from the input unit 20 in FIG. 4(b) contains noise, the output control unit 42a changes the combining ratio of the audio signal from the input unit 20 from 25% to 0% and sets the combining ratios of the audio signals from the input units 10 and 30 to 50% each. Likewise, during the period t3 to t4 in which the noise detection unit 41a detects that the audio signal in FIG. 4(a) contains noise, it changes the combining ratio of the audio signal from the input unit 10 from 50% to 0% and sets the combining ratios of the audio signals from the input units 20 and 30 to 50% each.
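The ratio change in this example can be sketched as follows. The sketch reads the example as "noisy channels get 0% and the remaining channels share the output equally", which reproduces the 50%/50% figures given in the text; a proportional rescaling of the remaining ratios would be another possible reading. The names `mixing_weights` and `mix`, and the choice to output silence when every channel is flagged, are assumptions.

```python
def mixing_weights(base_weights, noisy):
    """Return per-channel weights; noisy channels get 0 and the rest share equally.

    `noisy` is a set of channel indices in which noise is currently detected.
    """
    if not noisy:
        return list(base_weights)
    clean = [i for i in range(len(base_weights)) if i not in noisy]
    if not clean:
        return [0.0] * len(base_weights)  # assumption: every channel flagged, output silence
    share = 1.0 / len(clean)
    return [share if i in clean else 0.0 for i in range(len(base_weights))]


def mix(samples, weights):
    """Combine one sample per channel into a single output sample."""
    return sum(s * w for s, w in zip(samples, weights))


base = [0.5, 0.25, 0.25]               # input units 10, 20, 30
print(mixing_weights(base, set()))     # normal operation
print(mixing_weights(base, {1}))       # noise on input unit 20 (t1 to t2)
print(mixing_weights(base, {0}))       # noise on input unit 10 (t3 to t4)
```

Setting the weight to zero is the strongest form of the "reduce or set to zero" option; a smaller nonzero weight for the noisy channel would correspond to the "reduce" case.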

When the output control unit 42a is outputting the average of the three audio signals as the output audio signal, it instead outputs the average of the other audio signals, excluding the one in which noise was detected.
In the case shown in FIGS. 4(a) to 4(c), when the noise detection unit 41a detects that the audio signal from the input unit 20 in FIG. 4(b) contains noise during the period t1 to t2, the output control unit 42a excludes the audio signal from the input unit 20 during that period and outputs the average of the audio signals from the input units 10 and 30 as the output audio signal. Similarly, when the noise detection unit 41a detects that the audio signal from the input unit 10 contains noise during the period t3 to t4, the output control unit 42a outputs, during that period, the average of the audio signals from the input units 20 and 30, excluding the audio signal from the input unit 10.
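The averaging variant can be sketched in a few lines. The function name `clean_average` and the fallback of returning 0.0 when every channel is flagged are assumptions made for illustration.

```python
def clean_average(samples, noisy):
    """Average one sample per channel, skipping channels listed in `noisy`."""
    kept = [s for i, s in enumerate(samples) if i not in noisy]
    if not kept:
        return 0.0  # assumption: nothing usable, output silence
    return sum(kept) / len(kept)


print(clean_average([1.0, 2.0, 3.0], set()))  # all three channels averaged
print(clean_average([1.0, 9.0, 3.0], {1}))    # channel 1 excluded
```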

As a result, the output control unit 42a outputs an output audio signal that contains no noise, as shown in FIG. 5.
The operating procedure of the apparatus 1a (that is, the audio signal processing method of the present invention) is now described with reference to the flowchart (steps S1 to S5) in FIG. 6. When the input units 10 to 30 each input an audio signal (step S1; input step), the noise detection unit 41a of the DSP 40a compares the three audio signals input by the input units 10 to 30 as described above and judges whether a difference has arisen among them (step S2).

When the noise detection unit 41a judges that no difference has arisen among the three audio signals (No route of step S2), the output control unit 42a, as its normal operation (output control), outputs externally the combination of the three audio signals at the predetermined ratios, or their average, as the output audio signal (step S3; output control step), and the processing ends.
When the noise detection unit 41a judges that a difference has arisen among the three audio signals (Yes route of step S2), it detects by majority vote among the three audio signals the deviation of the minority audio signal from the majority audio signals as noise (step S4; noise detection step).

Next, as its output control when noise is detected, the output control unit 42a reduces, or sets to zero, the contribution of the noise-containing audio signal to the generation of the output audio signal, generates the output audio signal, and outputs it externally (step S5; output control step), and the processing ends.
Thus, according to the audio signal processing apparatus 1a (audio signal processing method) as the first embodiment of the present invention, the noise detection unit 41a detects noise by majority vote among the audio signals from the input units 10 to 30 (noise detection step), and when the noise detection unit 41a detects noise, the output control unit 42a reduces, or sets to zero, the contribution of the noise-containing audio signal to the generation of the output audio signal (output control step). Consequently, the noise detection unit 41a can reliably detect, without complicated processing, sudden noise such as that produced when an object (the speaker, for example) strikes one of the microphones 11 to 31 or when the speaker's breath blows directly and strongly onto it, and the output control unit 42a can reliably remove that noise from the output audio signal or greatly reduce it.
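Steps S2 to S5 of the flowchart can be put together in one small function. This is a sketch under assumptions, not the patent's implementation: `process_frame` and `tolerance` are hypothetical names, the normal output is taken to be the plain average, a channel counts as part of the majority when more than half of all channels lie within the tolerance of it, and an inconclusive vote falls back to averaging every channel.

```python
def process_frame(samples, tolerance):
    """One pass of steps S2 to S5 for a single set of per-channel levels."""
    # S2: has a difference of at least `tolerance` arisen between channels?
    if max(samples) - min(samples) < tolerance:
        # S3: normal output control (here: plain average of all channels)
        return sum(samples) / len(samples)
    # S4: majority vote. A channel belongs to the majority if more than half of
    # all channels (itself included) lie within `tolerance` of it.
    majority = [
        s for s in samples
        if 2 * sum(abs(s - other) < tolerance for other in samples) > len(samples)
    ]
    if not majority:
        majority = samples  # assumption: vote inconclusive, use all channels
    # S5: output built only from majority channels (minority contribution is zero)
    return sum(majority) / len(majority)


print(process_frame([0.5, 0.5, 0.5], tolerance=0.5))   # no difference: S3 path
print(process_frame([0.5, 2.0, 0.25], tolerance=0.5))  # channel 1 voted out: S5 path
```

The function handles one instant at a time, mirroring the real-time comparison in the description; it omits the hold time that decides when a noise interval ends.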

The noise that the apparatus 1a can remove or reduce is of course not limited to noise produced when an object strikes one of the microphones 11 to 31 or when breath blows strongly onto it. The same effect is obtained when electrical noise is added, such as spike noise from an electrical discharge or noise caused by a fault on a network line.
Moreover, because the noise detection unit 41a detects noise using majority logic, it detects not only loud noise. For example, when one of the microphones 11 to 31 fails, the sound drops out, and only low-level noise is input, that low-level noise can also be detected as noise, and the audio signal from the failed microphone can be kept out of the output audio signal.

In addition, when the output control unit 42a normally combines the audio signals at a predetermined ratio, it reduces or sets to zero the combining ratio of an audio signal in which the noise detection unit 41a has detected noise, so the noise can be reliably removed from the output audio signal or greatly reduced.
Further, when the output control unit 42a normally outputs the average value of the three audio signals as the output audio signal, it outputs, upon detection of noise by the noise detection unit 41a, the average value of the audio signals other than the one in which the noise was detected, so the noise can be reliably removed from the output audio signal.
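The combining behavior described above can be illustrated with a short sketch. The function `mix`, its parameters, and the renormalization by the sum of the effective weights are assumptions chosen for illustration; with equal weights and `noisy_gain=0.0` it reduces to averaging the audio signals other than the noisy one.

```python
from typing import Sequence


def mix(samples: Sequence[float], weights: Sequence[float],
        noisy: Sequence[bool], noisy_gain: float = 0.0) -> float:
    """Combine one sample per channel into one output sample.

    Channels flagged in `noisy` have their weight scaled by `noisy_gain`
    (0.0 removes them entirely; a value between 0 and 1 only reduces them).
    """
    effective = [w * (noisy_gain if flag else 1.0)
                 for w, flag in zip(weights, noisy)]
    total = sum(effective)
    if total == 0.0:
        return 0.0  # assumption: output silence if every channel is muted
    return sum(w * s for w, s in zip(effective, samples)) / total


# Channel 1 carries a noise spike; the output is the average of channels 0 and 2.
print(mix([0.25, 4.0, 0.75], [1.0, 1.0, 1.0], [False, True, False]))
```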

[2] Second Embodiment of the Present Invention
Next, an audio signal processing device 1b as a second embodiment of the present invention will be described with reference to the block diagram shown in FIG. 7. In FIG. 7, reference numerals identical to those already described denote identical or substantially identical parts.
The audio signal processing device 1b includes a plurality of (here, two) input units 10 and 20 and a DSP 40b.

Here, the input units 10 and 20 are the same as those of the first embodiment described above, and include microphones 11 and 21, microphone amplifiers 12 and 22, and ADCs 13 and 23, respectively.
For example, an audio signal as shown in FIG. 8(a) is input from the input unit 10, and an audio signal as shown in FIG. 8(b) is input from the input unit 20.

The DSP 40b processes the audio signals input from the plurality of input units 10 and 20, and includes a noise detection unit 41b and an output control unit 42b.
The noise detection unit 41b detects, as noise, a portion of the plurality of audio signals in which the signal level exceeds a threshold. For example, as shown in FIGS. 8(a) and 8(b), it detects as noise a portion in which the audio waveform level of the input signal is at or below a preset threshold s1 or at or above a preset threshold s2.

Specifically, as shown in FIG. 8(b), when the audio waveform level of the audio signal from the input unit 20 falls to or below the threshold s1 (see time t5), the noise detection unit 41b determines that the audio signal contains noise. It continues to determine that the audio signal from the input unit 20 contains noise until the audio waveform level has remained above the threshold s1 and below the threshold s2 continuously for a predetermined time (Δt) (see time t7).

In other words, after detecting noise at time t5, the noise detection unit 41b determines that the audio signal from the input unit 20 contains noise until time t7, at which the audio waveform level has remained above the threshold s1 and below the threshold s2 continuously for the predetermined time Δt (that is, until time t7, which is the predetermined time Δt after time t6 at which the audio waveform level rose above the threshold s1).

In FIG. 8(b), the audio waveform level is above the threshold s1 and below the threshold s2 for a period between time t5 and time t6, but because that period is shorter than the predetermined time Δt, the noise detection unit 41b does not determine here that the noise has ended.
Similarly, as shown in FIG. 8(a), the noise detection unit 41b detects the interval from time t8 to time t9 in the audio signal from the input unit 10 as noise.
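The threshold detection with the hold time Δt described above might be sketched as follows. The function name `detect_noise` and the expression of Δt as a count of samples (`hold`) are assumptions for this sketch; `s1` and `s2` correspond to the lower and upper thresholds.

```python
from typing import List


def detect_noise(signal: List[float], s1: float, s2: float, hold: int) -> List[bool]:
    """Return one flag per sample: True while the signal is judged to contain noise.

    Noise starts when a sample is at or below s1 or at or above s2, and ends
    only after `hold` consecutive samples lie strictly between s1 and s2.
    """
    flags = []
    noisy = False
    quiet_run = 0  # consecutive in-range samples since the last excursion
    for x in signal:
        if not (s1 < x < s2):
            noisy = True
            quiet_run = 0
        elif noisy:
            quiet_run += 1
            if quiet_run >= hold:
                noisy = False  # in range for the full hold time: noise has ended
        flags.append(noisy)
    return flags


# The short in-range gap between the two excursions does not end the noise period.
print(detect_noise([0, -2, 0, -2, 0, 0, 0, 0], s1=-1, s2=1, hold=3))
```

The brief return to the in-range level between the two excursions is shorter than `hold`, so the flag stays set, which mirrors the behavior described for the interval between times t5 and t6.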

When noise is detected by the noise detection unit 41b, the output control unit 42b reduces or sets to zero (here, zero) the contribution rate of the noise-containing audio signal to the generation of the output audio signal. This lowers the share of the noise-affected audio signal in the output audio signal and thereby reduces or eliminates the noise in the output audio signal.
Specifically, in normal operation (that is, when the noise detection unit 41b detects no noise), the output control unit 42b outputs the average value of the audio signals of the input units 10 and 20 as the output audio signal. When the noise detection unit 41b detects noise, it sets the contribution rate of the noise-containing audio signal to zero and outputs the audio signal that does not contain noise as the output audio signal.

Here, in order to switch the output audio signal to the audio signal that does not contain the noise when the noise detection unit 41b detects noise as described above, the output control unit 42b generates an adjustment signal (pan-pot signal), as shown in FIG. 9, for adjusting (pan-potting) by proportional distribution the audio signals to be included in the output audio signal, and controls the output of the audio signals based on this adjustment signal. In FIG. 9, the horizontal axis indicates time and the vertical axis indicates the value of the adjustment signal for the pan-pot (adjustment value).

In the example shown in FIG. 9, when the adjustment value is "0", the output control unit 42b outputs only the audio signal from the input unit 20 as the output audio signal; when the adjustment value is "0.5", it combines the audio signals from the input units 10 and 20 in equal proportion and outputs the result as the output audio signal; and when the adjustment value is "1", it outputs only the audio signal from the input unit 10 as the output audio signal.
During the period from time t5 to t7, in which the noise detection unit 41b detects noise in the audio signal from the input unit 20, the output control unit 42b sets the adjustment signal to "1" so that the audio signal from the input unit 10 is output as the output audio signal.

Likewise, during the period from time t8 to t9, in which the noise detection unit 41b detects noise in the audio signal from the input unit 10, the output control unit 42b sets the adjustment signal to "0" so that the audio signal from the input unit 20 is output as the output audio signal. As a result, the output control unit 42b outputs an output audio signal with reduced noise, as shown in FIG. 5 above.
The output control unit 42b does not change the adjustment signal abruptly from the adjustment value "0.5" to the adjustment value "1" or "0", but changes it gradually over a predetermined time. This is because switching the output audio signal directly from the state in which the audio signals of the input units 10 and 20 are combined half and half to one of the audio signals would itself cause noise.
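A minimal sketch of the adjustment (pan-pot) signal follows, assuming a linear ramp with a fixed per-sample step. The function names, the `max_step` parameter, and the choice to return to 0.5 when both inputs are flagged are assumptions; the specification states only that the change is gradual.

```python
def target_pan(noise_10: bool, noise_20: bool) -> float:
    """Target adjustment value: 1 = input unit 10 only, 0 = input unit 20 only."""
    if noise_20 and not noise_10:
        return 1.0
    if noise_10 and not noise_20:
        return 0.0
    return 0.5  # no noise; the both-noisy case is an assumption of this sketch


def step_pan(current: float, target: float, max_step: float) -> float:
    """Move the adjustment value toward the target by at most max_step per sample."""
    delta = target - current
    if abs(delta) <= max_step:
        return target
    return current + max_step if delta > 0 else current - max_step


def pan_mix(x10: float, x20: float, pan: float) -> float:
    """Proportionally combine the samples from input units 10 and 20."""
    return pan * x10 + (1.0 - pan) * x20


# Noise appears on input unit 20: the value ramps from 0.5 to 1 over two samples.
pan = 0.5
for _ in range(2):
    pan = step_pan(pan, target_pan(noise_10=False, noise_20=True), max_step=0.25)
    print(pan, pan_mix(1.0, 3.0, pan))
```

Limiting the change per sample avoids the abrupt level jump that a direct switch would introduce.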

As described above, according to the audio signal processing device 1b as the second embodiment of the present invention, the noise detection unit 41b detects, as noise, a portion of the plurality of audio signals from the input units 10 and 20 in which the signal level exceeds a threshold, and, when noise is detected by the noise detection unit 41b, the output control unit 42b sets to zero (or reduces) the contribution rate of the noise-containing audio signal to the generation of the output audio signal. Therefore, sudden noise that arises, for example, when an object collides with the microphones 11 and 21 or when the speaker's breath blows directly and strongly onto them can be reliably detected by the noise detection unit 41b without complicated processing, and the output control unit 42b can reliably remove the noise from the output audio signal or greatly reduce it.

[3] Others
The present invention is not limited to the embodiments described above, and can be implemented with various modifications without departing from the spirit of the present invention.
For example, the first embodiment has been described with an example in which the audio signal processing device 1a includes the input units 10 to 30 and the DSP 40a, but the present invention is not limited to this. For example, as shown in FIG. 10, a DAC (Digital Analog Converter) 50 may be provided, and the DAC 50 may convert the output audio signal output from the DSP 40c into an analog signal and output it to the outside.

FIG. 10 is a block diagram showing the configuration of an audio signal processing device 1c as a modification of the present invention. In FIG. 10, reference numerals identical to those already described denote identical or substantially identical parts.
The first embodiment has also been described with an example in which the audio signals input from the input units 10 to 30 are processed by the DSP 40a in real time and output to the outside, but the present invention is not limited to this. If a slight delay in the output audio signal is acceptable, then, as in the audio signal processing device 1c of FIG. 10, the DSP 40c may include a holding unit 43a that holds the audio signals input from the input units 10 to 30, and, when noise is detected by the noise detection unit 41a, the output control unit 42a may mix the audio signals using the audio signals held in the holding unit 43a from slightly (a predetermined time) before the detection timing.

That is, as shown in FIG. 4(b), when the noise detection unit 41a detects noise in the audio signal from the input unit 20 between times t1 and t2, it is desirable that the output control unit 42a, when using the audio signals held in the holding unit 43a, generate the output audio signal so that the audio signal from the input unit 20 is excluded from the output audio signal starting slightly before time t1. This reliably prevents the noise from entering the output audio signal during the short interval between the detection of the noise by the noise detection unit 41a and the start of output control by the output control unit 42a to exclude it, so that noise removal from the output audio signal is achieved more reliably.

This modification can naturally also be applied to the second embodiment described above. In that case, when the noise detection unit 41b detects the occurrence of noise at time t5 in the audio signal from the input unit 20, for example as shown in FIG. 8(b), the output control unit 42b uses the audio signals held in the holding unit to keep the audio signal from the input unit 20 out of the output audio signal from a predetermined time before time t5. This more reliably removes from the output audio signal both the noise present between its occurrence and its detection by the noise detection unit 41b, and the noise that would otherwise be included in the output audio signal until the output control unit 42b has completely switched the output audio signal to the audio signal from the input unit 10 using the adjustment signal (pan-pot signal; see FIG. 9).
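The effect of the holding unit can be sketched as extending each detected noise period backward in time. The function `extend_flags_backward` and the `lookback` parameter (the predetermined time expressed in samples) are assumptions for illustration; a real-time implementation would need to delay its output by at least `lookback` samples so that the earlier samples are still available when the noise is detected.

```python
from typing import List


def extend_flags_backward(flags: List[bool], lookback: int) -> List[bool]:
    """Mark a sample as excluded if noise is flagged at it or up to `lookback` samples later."""
    out = [False] * len(flags)
    for i, flagged in enumerate(flags):
        if flagged:
            # Exclude the flagged sample and the `lookback` samples before it.
            for j in range(max(0, i - lookback), i + 1):
                out[j] = True
    return out


# Noise is detected at samples 3 and 4; exclusion starts two samples earlier.
print(extend_flags_backward([False, False, False, True, True, False], lookback=2))
```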

Furthermore, the first embodiment explained that, because the noise detection unit 41a detects noise by majority logic, it can detect not only loud noise but also low-volume noise that is absent from the other input audio signals. Taking advantage of this, as shown in FIG. 10, the DSP 40c may include an abnormality detection unit 44a that, when the noise detection unit 41a detects noise in a specific audio signal steadily (that is, continuously for a predetermined time or longer), determines that the one of the input units 10 to 30 supplying that audio signal is abnormal. The DSP 40c may further include warning means 45 for warning the outside (the user) when the abnormality detection unit 44a detects an abnormality in the input units 10 to 30.

The warning means 45 is realized by, for example, LEDs (Light Emitting Diodes), each installed near one of the microphones 11 to 31. When the abnormality detection unit 44a detects an abnormality in one of the input units 10 to 30, it lights the LED corresponding to the microphone 11 to 31 of that input unit, thereby warning the speaker of the abnormality of the microphone (input unit).
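As a rough sketch of the abnormality detection described above, an input unit can be flagged when its noise flag stays set for a minimum number of consecutive samples. The function name `find_faulty_inputs` and the `min_run` parameter (the predetermined time in samples) are hypothetical; driving the LEDs is outside the scope of the sketch.

```python
from typing import List


def find_faulty_inputs(noise_flags: List[List[bool]], min_run: int) -> List[int]:
    """Return the indices of inputs whose noise flag is set for at least min_run consecutive samples."""
    faulty = []
    for index, flags in enumerate(noise_flags):
        run = 0
        for flagged in flags:
            run = run + 1 if flagged else 0
            if run >= min_run:
                faulty.append(index)  # steady noise: treat this input as abnormal
                break
    return faulty


flags_per_input = [
    [False, True, False, True],   # intermittent noise only
    [True, True, True, False],    # noise for three consecutive samples
    [False, False, False, False],
]
print(find_faulty_inputs(flags_per_input, min_run=3))
```

The returned indices identify the inputs whose corresponding warning LED would be lit.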

Thus, the speaker can learn which of the microphones 11 to 31 is not operating properly, and in some cases preventive maintenance becomes possible before the input units 10 to 30 fail completely.
In the embodiments described above, the number of input units 10, 20, and 30 is not limited.
Furthermore, the number of microphones 11 to 31 included in each of the input units 10 to 30 is not limited. That is, the embodiments have been described with an example in which each of the input units 10 to 30 includes one microphone 11 to 31, in other words, in which the input audio signal is monaural audio, but the present invention is not limited to this. For example, each of the input units 10 to 30 may include a plurality of microphones 11 to 31, and the input audio may be stereo audio (multi-channel audio).

The embodiments have also been described with an example in which the output control units 42a and 42b generate the output audio signal by mixing the input audio signals from the plurality of input units 10 to 30 or from the input units 10 and 20, but the present invention is not limited to this. The output control units 42a and 42b may be configured to output one of the plurality of input audio signals as the output audio signal. In this case, when noise is detected by the noise detection units 41a and 41b, the output control units 42a and 42b output to the outside, as the output audio signal, an input audio signal other than the one in which the noise was detected. This provides the same effects as the embodiments described above.

The embodiments have been described with an example in which the noise detection units 41a and 41b and the output control units 42a and 42b are realized by the DSPs 40a and 40b, but the present invention is not limited to this. The noise detection units 41a and 41b and the output control units 42a and 42b may be realized by hardware such as a gate array.
The noise detection units 41a and 41b and the output control units 42a and 42b may also be realized not by the DSPs 40a and 40b but by, for example, the CPU (Central Processing Unit) of a personal computer.

That is, with a recent personal computer of high processing capability, the noise detection units 41a and 41b and the output control units 42a and 42b can be realized without the dedicated DSPs 40a and 40b, by having the CPU execute a predetermined application program (the audio signal processing program described later).
The functions of the noise detection units 41a and 41b, the output control units 42a and 42b, and the abnormality detection unit 44a described above may be realized by a computer (including a CPU, an information processing device, and various terminals) executing a predetermined application program (audio signal processing program).

The program is provided in a form recorded on a computer-readable recording medium such as a flexible disk, a CD (CD-ROM, CD-R, CD-RW, etc.), or a DVD (DVD-ROM, DVD-RAM, DVD-R, DVD-RW, DVD+R, DVD+RW, etc.). In this case, the computer reads the audio signal processing program from the recording medium, transfers it to an internal or external storage device, and stores it for use. The program may also be recorded in a storage device (recording medium) such as a magnetic disk, an optical disk, or a magneto-optical disk, and provided from that storage device to the computer via a communication line.

Here, a computer is a concept that includes hardware and an OS (operating system), and means hardware that operates under the control of the OS. When no OS is required and the application program alone operates the hardware, the hardware itself corresponds to the computer. The hardware includes at least a microprocessor such as a CPU and means for reading a computer program recorded on a recording medium.

The application program serving as the audio signal processing program includes program code that causes such a computer to realize the functions of the noise detection units 41a and 41b, the output control units 42a and 42b, and the abnormality detection unit 44a. Some of these functions may be realized by the OS instead of the application program.
As the recording medium of this embodiment, in addition to the flexible disk, CD, DVD, magnetic disk, optical disk, and magneto-optical disk mentioned above, various computer-readable media can be used, such as an IC card, a ROM cartridge, a magnetic tape, a punch card, an internal storage device of a computer (memory such as RAM or ROM), an external storage device, or printed matter on which a code such as a barcode is printed.

FIG. 1 is a block diagram showing the configuration of the audio signal processing device as the first embodiment of the present invention.
FIG. 2 is a diagram for explaining an example in which the audio signal processing device as the first embodiment of the present invention is applied to a mobile phone.
FIG. 3 is a diagram for explaining an example in which the microphones of the input units of the audio signal processing device as the first embodiment of the present invention are mounted on a headset.
FIGS. 4(a) to 4(c) are diagrams showing the audio waveforms of the audio signals input from the three input units of the audio signal processing device as the first embodiment of the present invention.
FIG. 5 is a diagram showing the audio waveform of the audio signal output from the output control unit of the audio signal processing device as the first embodiment of the present invention.
FIG. 6 is a flowchart showing the operation procedure of the audio signal processing method as the first embodiment of the present invention.
FIG. 7 is a block diagram showing the configuration of the audio signal processing device as the second embodiment of the present invention.
FIGS. 8(a) and 8(b) are diagrams showing the audio waveforms of the audio signals input from the plurality of input units of the audio signal processing device as the second embodiment of the present invention.
FIG. 9 is a diagram for explaining the adjustment signal generated by the output control unit of the audio signal processing device as the second embodiment of the present invention.
FIG. 10 is a block diagram showing the configuration of an audio signal processing device as a modification of the present invention.

Explanation of symbols

1a to 1c Audio signal processing device
2 Mobile phone
3 Operation button
4 Display unit
5 Headset
6 Headphone unit
10, 20, 30 Input unit
11, 21, 31 Microphone
12, 22, 32 Microphone amplifier
13, 23, 33 ADC
40a to 40c DSP
41a, 41b Noise detection unit
42a, 42b Output control unit
43a Holding unit
44a Abnormality detection unit
45 Warning means
50 DAC
S1 Input step
S3, S5 Output control step
S4 Noise detection step

Claims

An audio signal processing apparatus comprising:
three or more input units for inputting audio signals;
an output control unit that generates and outputs one output audio signal based on the three or more audio signals input by the three or more input units; and
a noise detection unit that compares the three or more audio signals input by the three or more input units and, when a difference arises among the three or more audio signals, detects as noise, by majority vote among the three or more audio signals, the difference of the minority audio signal from the majority audio signals,
wherein, when noise is detected by the noise detection unit, the output control unit reduces or sets to zero the contribution rate of the noise-containing audio signal to the generation of the output audio signal.

An audio signal processing apparatus comprising:
a plurality of input units for inputting audio signals;
an output control unit that generates and outputs one output audio signal based on the plurality of audio signals input by the plurality of input units; and
a noise detection unit that detects, as noise, a portion of the plurality of audio signals in which the signal level exceeds a threshold,
wherein, when noise is detected by the noise detection unit, the output control unit reduces or sets to zero the contribution rate of the noise-containing audio signal to the generation of the output audio signal.

The audio signal processing apparatus according to claim 1, wherein, when no noise is detected by the noise detection unit, the output control unit generates and outputs the output audio signal by combining the audio signals at a predetermined ratio, and, when noise is detected by the noise detection unit, the output control unit reduces the combining ratio of the audio signal in which the noise is detected.

The audio signal processing apparatus according to claim 1, wherein, when no noise is detected by the noise detection unit, the output control unit outputs an average value of the plurality of audio signals as the output audio signal, and, when noise is detected by the noise detection unit, the output control unit outputs an average value of the audio signals other than the audio signal in which the noise is detected as the output audio signal.

The audio signal processing apparatus according to claim 1, further comprising a holding unit that temporarily holds the audio signal input by the input unit,
wherein the output control unit generates and outputs the output audio signal based on the audio signal held in the holding unit.

An audio signal processing method comprising:
an input step of inputting three or more audio signals;
an output control step of generating and outputting one output audio signal based on the three or more audio signals input in the input step; and
a noise detection step of comparing the three or more audio signals input in the input step and, when a difference arises among the three or more audio signals, detecting as noise, by majority vote among the three or more audio signals, the difference of the minority audio signal from the majority audio signals,
wherein, in the output control step, when noise is detected in the noise detection step, the contribution rate of the noise-containing audio signal to the generation of the output audio signal is reduced or set to zero.

An audio signal processing program for causing a computer to realize a function of outputting one output audio signal based on three or more audio signals input from three or more input units,
the program causing the computer to function as:
an output control unit that generates and outputs the one output audio signal based on the three or more audio signals input by the three or more input units; and
a noise detection unit that compares the three or more audio signals input by the three or more input units and, when a difference arises among the three or more audio signals, detects as noise, by majority vote among the three or more audio signals, the difference of the minority audio signal from the majority audio signals,
wherein the program causes the computer to function such that, when noise is detected by the noise detection unit, the output control unit reduces or sets to zero the contribution rate of the noise-containing audio signal to the generation of the output audio signal.

A computer-readable recording medium on which is recorded an audio signal processing program for causing a computer to realize a function of outputting one output audio signal based on three or more audio signals input from three or more input units,
the audio signal processing program causing the computer to function as:
an output control unit that generates and outputs the one output audio signal based on the three or more audio signals input by the three or more input units; and
a noise detection unit that compares the three or more audio signals input by the three or more input units and, when a difference arises among the three or more audio signals, detects as noise, by majority vote among the three or more audio signals, the difference of the minority audio signal from the majority audio signals,
wherein the program causes the computer to function such that, when noise is detected by the noise detection unit, the output control unit reduces or sets to zero the contribution rate of the noise-containing audio signal to the generation of the output audio signal.