JP2013047710A

JP2013047710A - Sound signal processing apparatus, imaging apparatus, sound signal processing method, program, and recording medium

Info

Publication number: JP2013047710A
Application number: JP2011185553A
Authority: JP
Inventors: Keiichi Osako (慶一大迫); Toshiyuki Sekiya (俊之関矢); Mototsugu Abe (素嗣安部)
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2011-08-29
Filing date: 2011-08-29
Publication date: 2013-03-07

Abstract

PROBLEM TO BE SOLVED: To reduce the buffer memory length required for estimating an interpolation signal from signals near a noise section, and to reduce the delay of the output sound relative to the input sound caused by interpolation processing. SOLUTION: A sound signal processing apparatus includes: a first buffer memory that temporarily stores an input sound signal for each prescribed section; a second buffer memory that temporarily stores the sound signal of the (n-1)th section, which immediately precedes the sound signal of the n-th section stored in the first buffer memory; an interpolation signal generation unit which, on detecting that noise is included in the sound signal of the n-th section, generates an interpolation signal from at least the sound signal of the (n-1)th section stored in the second buffer memory; and a signal interpolation unit which uses the interpolation signal to interpolate the sound signal of the n-th section that includes the noise.

Description

The present disclosure relates to an audio signal processing device, an imaging device, an audio signal processing method, a program, and a recording medium.

Video cameras, digital cameras with a moving image capturing function, IC recorders, and the like are known as audio recording devices that pick up and record external sound. When such a device operates, pulsed operating sounds generated by the device body may be mixed into the recorded sound.

For example, an imaging device having a moving image capturing function picks up external sound around the device with a microphone during moving image capture and records the sound together with the moving image. During such capture, pulsed operating sounds, such as zoom drive sounds, aperture drive sounds, autofocus drive sounds, and operation button press sounds, are generated inside the housing of the imaging device. In particular, when a drive device that drives the imaging optical system (zoom motor, aperture mechanism, focus motor, etc.) starts or ends its operation, a pulsed mechanical drive sound is generated, for example a "click" produced when the motor and a gear engage.

Such pulsed operating sounds are very unpleasant to the ear when they are mixed into, and recorded with, the external sound that the user wants to record. Audio recording devices therefore need quieting measures or noise removal measures to reduce pulsed operating sounds during recording.

Several methods have been proposed for reducing pulsed mechanical drive sounds. For example, Patent Document 1 proposes interpolating the input audio signal in a noise section, which contains noise generated by a lens drive unit, with the audio signals of the sections before and after that noise section.

Patent Document 1: JP-A-8-124299

The noise reduction method described in the above patent document estimates the audio signal of the background sound that the user wants to record from the audio signals of the sections before and after the noise section, and interpolates the noise section with the estimated signal to obtain an audio signal with reduced noise. However, because this method must use signals from fixed-length sections before and after the noise section, it requires a long buffer memory to hold the signals of all those sections. Furthermore, when the noise reduction processing is executed during recording, the output sound is greatly delayed relative to the input sound because the signals must be held in this long buffer memory. Video, control clocks, and the like must then be synchronized with the delayed audio, which raises concern about delay in the entire camera system, including the video recording unit and the control unit.

For example, FIG. 1 is an explanatory diagram showing a case in which, when recording an audio signal that contains a pulsed mechanical drive sound as noise, the noise section is interpolated using background sound data from the sections before and after it, according to the noise reduction method described in Patent Document 1. If the noise section length is N, a buffer memory of approximately 3*N is required to hold the audio signals of the noise section and of the sections before and after it. Moreover, if the interpolation signal is generated only after all of these audio signals have been stored in the buffer memory and is then used to interpolate the noise section, a delay of at least 2*N occurs between the input and the output of the audio signal.
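The 3*N buffer length and 2*N delay quoted for the conventional method are simple arithmetic on the noise section length N. The sketch below only restates that arithmetic; the function and class names are illustrative, and the 48 kHz sampling rate and 20 ms noise section are assumed example values that do not come from this document.

```python
from dataclasses import dataclass


@dataclass
class ConventionalCost:
    buffer_samples: int  # samples that must be held at the same time
    delay_samples: int   # minimum input-to-output delay in samples


def conventional_cost(noise_len: int) -> ConventionalCost:
    """Cost of interpolating from the sections before AND after the noise.

    noise_len is N, the noise section length in samples. The preceding
    section, the noise section and the following section are all held,
    giving about 3*N of buffer and at least 2*N of delay.
    """
    return ConventionalCost(buffer_samples=3 * noise_len,
                            delay_samples=2 * noise_len)


if __name__ == "__main__":
    fs = 48_000            # assumed sampling rate in Hz
    n = fs * 20 // 1000    # assumed 20 ms noise section
    cost = conventional_cost(n)
    delay_ms = cost.delay_samples * 1000 / fs
    print(cost.buffer_samples, cost.delay_samples, delay_ms)
```

Under these assumed values the delay alone is tens of milliseconds, which is the kind of audio lag that the rest of the camera system would have to be synchronized to.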

As described above, interpolation processing for noise reduction must estimate the interpolation signal from signals near the noise section. To estimate it accurately, conventional methods had to secure a buffer memory capable of holding reasonably long sections of audio before and after the noise. As a result, not only did the buffer memory required for estimating the interpolation signal grow, but the output sound was also greatly delayed relative to the input sound, which delayed the entire camera system, including video recording and control.

In view of the above, there has been a need for a noise reduction method that can reduce the buffer memory length required for estimating an interpolation signal from signals near the noise section and can also reduce the delay of the output sound relative to the input sound caused by the interpolation processing.

According to the present disclosure, there is provided an audio signal processing device including: a first buffer memory that temporarily stores an input audio signal for each predetermined section; a second buffer memory that temporarily stores the audio signal of the (n-1)th section, which immediately precedes the audio signal of the nth section stored in the first buffer memory; an interpolation signal generation unit that, when noise is detected in the audio signal of the nth section, generates an interpolation signal from at least the audio signal of the (n-1)th section stored in the second buffer memory; and a signal interpolation unit that uses the interpolation signal to interpolate the audio signal of the nth section containing the noise.

Further, according to the present disclosure, there is provided an imaging device including: a sound collection unit that converts external sound into an audio signal; a sound-generating unit that is provided in the same housing as the sound collection unit and generates noise; a first buffer memory that temporarily stores the audio signal input from the sound collection unit for each predetermined section; a second buffer memory that temporarily stores the audio signal of the (n-1)th section, which immediately precedes the audio signal of the nth section stored in the first buffer memory; an interpolation signal generation unit that, when noise is detected in the audio signal of the nth section, generates an interpolation signal from at least the audio signal of the (n-1)th section stored in the second buffer memory; and a signal interpolation unit that uses the interpolation signal to interpolate the audio signal of the nth section containing the noise.

Further, according to the present disclosure, there is provided an audio signal processing method including: temporarily storing, in a second buffer memory, the audio signal of the (n-1)th section stored in a first buffer memory; temporarily storing the input audio signal of the nth section in the first buffer memory; generating, when noise is detected in the audio signal of the nth section stored in the first buffer memory, an interpolation signal from at least the audio signal of the (n-1)th section stored in the second buffer memory; and interpolating the audio signal of the nth section containing the noise using the interpolation signal.

Further, according to the present disclosure, there is provided a program for causing a computer to execute: temporarily storing, in a second buffer memory, the audio signal of the (n-1)th section stored in a first buffer memory; temporarily storing the input audio signal of the nth section in the first buffer memory; generating, when noise is detected in the audio signal of the nth section stored in the first buffer memory, an interpolation signal from at least the audio signal of the (n-1)th section stored in the second buffer memory; and interpolating the audio signal of the nth section containing the noise using the interpolation signal.

Further, according to the present disclosure, there is provided a computer-readable recording medium on which is recorded a program for causing a computer to execute: temporarily storing, in a second buffer memory, the audio signal of the (n-1)th section stored in a first buffer memory; temporarily storing the input audio signal of the nth section in the first buffer memory; generating, when noise is detected in the audio signal of the nth section stored in the first buffer memory, an interpolation signal from at least the audio signal of the (n-1)th section stored in the second buffer memory; and interpolating the audio signal of the nth section containing the noise using the interpolation signal.

With the above configuration, as soon as the input audio signal of the nth section has been completely stored in the first buffer memory and noise is detected in that signal, an interpolation signal is generated from the audio signal of the (n-1)th section stored in the second buffer memory, the audio signal of the nth section is interpolated using the interpolation signal, and the interpolated audio signal of the nth section is output. Thus, with two buffer memories, section-by-section input/output processing of the audio signal and interpolation of the noise contained in the audio signal can both be achieved with a small delay.
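The two-buffer flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation in this disclosure: the class and function names are hypothetical, the noise flag is assumed to be supplied by an external detector, and the default interpolation (reusing the (n-1)th section unchanged) is only a placeholder for whatever the interpolation signal generation unit actually computes. Keeping the interpolated section in the first buffer, so that a following noisy section draws on clean audio, is likewise a design assumption of this sketch.

```python
from typing import Callable, List, Optional, Sequence

Frame = List[float]


def repeat_previous(prev: Frame) -> Frame:
    """Placeholder interpolation (assumption): reuse section n-1 as-is."""
    return list(prev)


class TwoBufferProcessor:
    """Section-by-section processing with two one-section buffers."""

    def __init__(self,
                 make_interp: Callable[[Frame], Frame] = repeat_previous):
        self.buf1: Optional[Frame] = None  # first buffer: section n
        self.buf2: Optional[Frame] = None  # second buffer: section n-1
        self.make_interp = make_interp

    def process(self, frame: Sequence[float], noisy: bool) -> Frame:
        # Move section n-1 from the first buffer to the second buffer.
        self.buf2 = self.buf1
        # Store the incoming section n in the first buffer.
        self.buf1 = list(frame)
        if noisy and self.buf2 is not None:
            # Build the interpolation signal from section n-1 only and
            # use it in place of the noisy section n.
            self.buf1 = self.make_interp(self.buf2)
        # Output section n (interpolated if it was noisy).
        return list(self.buf1)


if __name__ == "__main__":
    p = TwoBufferProcessor()
    print(p.process([1.0, 2.0], noisy=False))
    print(p.process([9.0, 9.0], noisy=True))
    print(p.process([3.0, 4.0], noisy=False))
```

Because only sections n and n-1 are ever held, the memory in this sketch is two sections long, and each section can be output as soon as it has been stored and checked.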

As described above, according to the present disclosure, the buffer memory length required for estimating an interpolation signal from signals near the noise section can be reduced, and the delay of the output sound relative to the input sound caused by the interpolation processing can also be reduced.

An explanatory diagram showing a case in which a noise section is interpolated using background sound data from the sections before and after the noise section, according to a noise reduction method of the related art of the present disclosure.
A block diagram showing the hardware configuration of a digital camera to which the audio signal processing device according to the first embodiment of the present disclosure is applied.
A block diagram showing the functional configuration of the audio signal processing device according to the same embodiment.
A conceptual diagram showing a method of generating an interpolation signal from the input audio signal preceding the noise section according to the same embodiment.
A conceptual diagram showing a method of generating an interpolation signal from the input audio signal preceding the noise section according to the same embodiment.
A schematic diagram showing the normal operation of the audio signal processing device according to the same embodiment.
A schematic diagram showing an operation example of the audio signal processing device according to the same embodiment when noise occurs.
A flowchart showing the audio signal processing method according to the same embodiment.
A block diagram showing the functional configuration of the audio signal processing device according to the second embodiment of the present disclosure.
A conceptual diagram showing another method of generating a temporary interpolation signal and an interpolation signal from the input audio signals before and after the noise section according to the same embodiment.
A schematic diagram showing the normal operation of the audio signal processing device according to the same embodiment.
A schematic diagram showing an operation example of the audio signal processing device according to the same embodiment when noise occurs.
A schematic diagram showing an operation example of the audio signal processing device according to the same embodiment when noise occurs.
A flowchart showing the audio signal processing method according to the same embodiment.
A block diagram showing the functional configuration of the audio signal processing device according to the third embodiment of the present disclosure.
An explanatory diagram showing the positional relationship between an audio signal containing noise and frames according to the same embodiment.
A schematic diagram showing a first operation example of the audio signal processing device according to the same embodiment when noise occurs.
A schematic diagram showing a first operation example of the audio signal processing device according to the same embodiment when noise occurs.
A schematic diagram showing a second operation example of the audio signal processing device according to the same embodiment when noise occurs.
A schematic diagram showing a second operation example of the audio signal processing device according to the same embodiment when noise occurs.
A flowchart showing the audio signal processing method according to the same embodiment.

Hereinafter, preferred embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. In this specification and the drawings, components having substantially the same functional configuration are denoted by the same reference numerals, and redundant description is omitted.

The description will be given in the following order.
1. First embodiment
1.1. Overview of the mechanical sound reduction method
1.2. Configuration of the audio signal processing device
1.2.1. Hardware configuration of the audio signal processing device
1.2.2. Functional configuration of the audio signal processing device
1.3. Operation of the audio signal processing device
1.3.1. Example of normal operation without noise
1.3.2. Example of operation when noise occurs
1.4. Audio signal processing method
1.5. Effects
2. Second embodiment
2.1. Overview of the mechanical sound reduction method
2.2. Functional configuration of the audio signal processing device
2.3. Operation of the audio signal processing device
2.3.1. Example of normal operation without noise
2.3.2. Example of operation when noise occurs
2.4. Audio signal processing method
2.5. Effects
3. Third embodiment
3.1. Overview of the mechanical sound reduction method
3.2. Functional configuration of the audio signal processing device
3.3. Operation of the audio signal processing device
3.3.1. First operation example when noise occurs
3.3.2. Second operation example when noise occurs
3.4. Audio signal processing method
3.5. Effects

<1. First Embodiment>
[1.1. Overview of the mechanical sound reduction method]
First, an overview of the mechanical sound reduction method using the audio signal processing device and method according to the first embodiment of the present disclosure will be given.

The audio signal processing device and method according to this embodiment relate to a technique for reducing, in an audio recording device or an audio reproduction device, pulsed operating sounds (noise) generated by, for example, a sound-generating unit (such as a drive device) installed in the housing of the device. In particular, this embodiment targets the pulsed mechanical drive sound that occurs in an imaging device having a moving image capturing function when a drive device built into the imaging device starts or ends its operation while ambient sound is being recorded during moving image capture.

Here, a drive device is a device built into the imaging device to perform imaging operations using the imaging optical system. Examples include a zoom motor that moves the zoom lens, a focus motor that moves the focus lens, and a drive mechanism that controls the aperture mechanism or the shutter. These drive devices are provided in the same housing as the sound collection unit of the imaging device. Pulsed mechanical drive sound (hereinafter referred to as "pulse mechanical sound") is momentary noise generated when one of these drive devices starts or ends its operation (for example, zoom motor drive sound, focus motor drive sound, aperture mechanism drive sound, shutter sound, or operation button press sound). An example is the "click" or "snap" produced when a motor and a gear engage at the start or end of operation of a zoom motor or the like.

The following describes an example in which the audio signal processing device is a digital camera having a moving image capturing function and the pulse mechanical sound to be removed is the zoom start sound generated when the digital camera starts an optical zoom operation. However, the audio signal processing device and the pulse mechanical sound of the present disclosure are not limited to this example. The noise targeted by the present disclosure is also not limited to pulsed operating sounds; the disclosure is applicable to noise of any type and characteristics that is mixed into the background sound to be recorded, among the sounds input to the audio signal processing device.

When the user performs a zoom operation while the digital camera is capturing images and recording sound, the zoom motor inside the camera starts up and engages with the gear that drives the zoom lens, producing a momentary, loud pulse mechanical sound (zoom start sound). The microphone of the digital camera then picks up not only the external sound around the camera that the user wants to record (any sound picked up by the microphone, such as environmental sound or human speech; hereinafter referred to as "desired sound" or "background sound") but also the pulse mechanical sound generated inside the camera. The sound is therefore recorded with the pulse mechanical sound mixed into the desired sound as noise, and when the recording is played back, the pulse mechanical sound is unpleasant for the user. For example, pulse mechanical sound is accompanied by housing vibration of 200 Hz or less and occurs near the microphone, so it is picked up at a higher volume than the desired sound. Because of this difference in volume, pulse mechanical sound mixed into the desired sound stands out when the recording is played back. There has therefore been a demand for a technique that can appropriately remove pulse mechanical sound, such as the zoom start sound, and record only the desired sound when recording or reproducing moving images and audio.

In conventional noise reduction techniques, as described in Patent Document 1, the section in which the mechanical drive sound occurs (noise section) is estimated from the transmission timing of the drive signal that controls the drive device, an interpolation signal is estimated from the signals of the sections before and after the noise section, and the noise is reduced by interpolating the signal of the noise section with the interpolation signal. However, as described above, because this method generates the interpolation signal from the sections before and after the noise section, holding the signals of all of these sections at the same time requires a buffer memory of about three times the noise section length N (see FIG. 1). Consequently, not only does the buffer memory required for noise reduction grow, but the output sound is also significantly delayed relative to the input sound by the time needed to hold the 3*N signal in the buffer memory (a delay of at least 2*N occurs).

This embodiment is therefore characterized by making good use of two buffer memories provided in the noise reduction processing circuit to control frame-by-frame processing of the audio signal and generate the interpolation signal. This reduces the buffer memory length required for generating the interpolation signal and also greatly reduces the delay of the output audio signal relative to the input audio signal.

This embodiment is further characterized in that the interpolation signal is generated using only the audio signal of the section preceding the noise section that contains the pulse mechanical sound, and the audio signal of the noise section is interpolated with that interpolation signal before being output. Even when the interpolation signal is generated from only the section preceding the noise section, the pulse mechanical sound can be reduced appropriately, for the following reasons.

The conventional techniques described in Patent Document 1 and elsewhere assume that the signal in the noise-free sections (the sections before and after the noise section) is a sound such as human speech. Over a short time span, such sound consists of periodic signals. To interpolate noise in a periodic signal, an interpolation signal with the same period as the signal before and after the noise must be generated, and the sections before and after the noise section must be joined without disturbing that period, because a signal whose period is disturbed by interpolation sounds unnatural. Those skilled in the art have therefore generally generated the interpolation signal from the signals both before and after the noise section, and generating it from only the signal before the noise section was thought to cause sound quality problems.

In an actual recording environment, however, periodic sound such as human speech is not always present; more often, various sounds are mixed together and the resulting sound is aperiodic. If the sound before and after the noise section is aperiodic, there is no need to match the period across the interpolated section, and unnatural sound is unlikely to occur. Consequently, even interpolation that uses only the audio preceding the noise can, in practice, remove the noise appropriately.

Cases in which the sound before and after the noise is periodic (such as human speech) can also occur, but in most such cases the speech is uttered near the camera and enters the microphone at a high volume. The externally input sound is then louder than the noise generated inside the camera (pulse mechanical sound, etc.), so the noise itself is often inaudible because of the masking phenomenon. In such cases the noise section does not need to be interpolated, so interpolation using only the audio preceding the noise can be said to have no adverse effect.
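The masking argument above implies a decision: when the ambient sound is sufficiently louder than the internal noise, interpolation can be skipped. One hypothetical way to express such a check is to compare the RMS level of the preceding section with an expected noise level. The document does not describe this test; the function names, the `noise_rms` parameter, and the 6 dB margin are all assumptions made only for illustration.

```python
import math
from typing import Sequence


def rms(x: Sequence[float]) -> float:
    """Root-mean-square level of a section (0.0 for an empty section)."""
    return math.sqrt(sum(v * v for v in x) / len(x)) if x else 0.0


def needs_interpolation(prev_section: Sequence[float],
                        noise_rms: float,
                        margin_db: float = 6.0) -> bool:
    """Return True when the ambient sound is NOT loud enough to mask the noise.

    prev_section: audio of the section preceding the noise section.
    noise_rms:    expected RMS level of the internal noise (assumed known).
    margin_db:    how far above the noise the ambient sound must be before
                  the noise is treated as masked (assumed value).
    """
    if noise_rms <= 0.0:
        return False  # nothing to mask
    ambient = rms(prev_section)
    if ambient == 0.0:
        return True   # silence cannot mask anything
    level_diff_db = 20.0 * math.log10(ambient / noise_rms)
    return level_diff_db < margin_db


if __name__ == "__main__":
    quiet_room = [0.01] * 100    # ambient well below the noise level
    nearby_speech = [1.0] * 100  # ambient well above the noise level
    print(needs_interpolation(quiet_room, noise_rms=0.1))
    print(needs_interpolation(nearby_speech, noise_rms=0.1))
```

A real device would need a perceptually motivated masking model rather than a single broadband level comparison; the sketch only makes the reasoning in the text concrete.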

そこで、以下に詳述する第１の実施形態では、入力される音声信号のうち、ｎ番目の区間が雑音を含む雑音区間である場合には、当該雑音区間の１つ前のｎ−１番目の区間の音声信号のみを用いて、雑音低減用の補間信号を生成する（ｎ：自然数）。かかる補間処理であっても、上記理由により、雑音を適切に低減することが可能である。以下に、第１の実施形態に係る音声信号処理装置及び方法について詳述する。 Therefore, in the first embodiment described in detail below, when the n-th section of the input audio signal is a noise section containing noise, an interpolation signal for noise reduction is generated using only the audio signal of the (n−1)-th section immediately preceding the noise section (n: natural number). Even with such interpolation, noise can be reduced appropriately for the reasons given above. The audio signal processing apparatus and method according to the first embodiment are described in detail below.

［１．２．音声信号処理装置の構成］ [1.2. Configuration of audio signal processing apparatus]
［１．２．１．音声信号処理装置のハードウェア構成］ [1.2.1. Hardware configuration of audio signal processing apparatus]
まず、図２を参照して、本実施形態に係る音声信号処理装置が適用されたデジタルカメラのハードウェア構成例について説明する。図２は、本実施形態に係る音声信号処理装置が適用されたデジタルカメラ１のハードウェア構成を示すブロック図である。 First, a hardware configuration example of a digital camera to which the audio signal processing apparatus according to the present embodiment is applied will be described with reference to FIG. 2. FIG. 2 is a block diagram illustrating the hardware configuration of the digital camera 1 to which the audio signal processing apparatus according to the present embodiment is applied.

本実施形態に係るデジタルカメラ１は、例えば、動画撮像中に動画と共に音声も記録可能な撮像装置である。このデジタルカメラ１は、被写体を撮像して、当該撮像により得られた撮像画像（静止画又は動画のいずれでもよい。）をデジタル方式の画像データに変換し、音声とともに記録媒体に記録する。 The digital camera 1 according to the present embodiment is, for example, an imaging device that can record audio together with moving images during moving image imaging. The digital camera 1 captures an image of a subject, converts a captured image (either a still image or a moving image) obtained by the imaging into digital image data, and records the image together with sound on a recording medium.

図２に示すように、本実施形態に係るデジタルカメラ１は、概略的には、撮像部１０と、画像信号処理部２０と、表示部３０と、記録媒体４０と、収音部５０と、音声信号処理部６０と、制御部７０と、操作部８０とを備える。 As shown in FIG. 2, the digital camera 1 according to the present embodiment schematically includes an imaging unit 10, an image signal processing unit 20, a display unit 30, a recording medium 40, a sound collection unit 50, an audio signal processing unit 60, a control unit 70, and an operation unit 80.

撮像部１０は、被写体を撮像して、撮像画像を表すアナログ画像信号を出力する。撮像部１０は、撮像光学系１１と、撮像素子１２と、タイミングジェネレータ１３と、駆動装置１４とを備える。 The imaging unit 10 images a subject and outputs an analog image signal representing the captured image. The imaging unit 10 includes an imaging optical system 11, an imaging element 12, a timing generator 13, and a driving device 14.

撮像光学系１１は、フォーカスレンズ、ズームレンズ、補正レンズ等の各種レンズや、不要な波長を除去する光学フィルタ、シャッター、絞り等の光学部品からなる。被写体から入射された光学像（被写体像）は、撮像光学系１１における各光学部品を介して、撮像素子１２の露光面に結像される。撮像素子１２（イメージセンサ）は、例えば、ＣＣＤ（ＣｈａｒｇｅＣｏｕｐｌｅｄＤｅｖｉｃｅ）又はＣＭＯＳ（ＣｏｍｐｌｅｍｅｎｔａｒｙＭｅｔａｌＯｘｉｄｅＳｅｍｉｃｏｎｄｕｃｔｏｒ）などの固体撮像素子で構成される。この撮像素子１２は、撮像光学系１１から導かれた光学像を光電変換し、撮像画像を表す電気信号（アナログ画像信号）を出力する。 The imaging optical system 11 includes various lenses such as a focus lens, a zoom lens, and a correction lens, and optical components such as an optical filter that removes unnecessary wavelengths, a shutter, and a diaphragm. An optical image (subject image) incident from a subject is imaged on the exposure surface of the image sensor 12 via each optical component in the imaging optical system 11. The image pickup device 12 (image sensor) is configured by a solid-state image pickup device such as a charge coupled device (CCD) or a complementary metal oxide semiconductor (CMOS), for example. The image pickup device 12 photoelectrically converts the optical image guided from the image pickup optical system 11 and outputs an electric signal (analog image signal) representing the picked-up image.

撮像光学系１１には、該撮像光学系１１の光学部品を駆動するための駆動装置１４が機械的に接続されている。この駆動装置１４は、例えば、ズームモータ１５、フォーカスモータ１６、絞り機構（図示せず。）などを含む。駆動装置１４は、後述する制御部７０の指示に従って、撮像光学系１１の光学部品を駆動させ、ズームレンズ、フォーカスレンズを移動させたり、絞りを調整したりする。例えば、ズームモータ１５は、ズームレンズをテレ／ワイド方向に移動させることで、画角を調整するズーム動作を行う。また、フォーカスモータ１６は、フォーカスレンズを移動させることで、被写体に焦点を合わせるフォーカス動作を行う。 A driving device 14 for driving the optical components of the imaging optical system 11 is mechanically connected to the imaging optical system 11. The driving device 14 includes, for example, a zoom motor 15, a focus motor 16, a diaphragm mechanism (not shown), and the like. The drive device 14 drives the optical components of the imaging optical system 11 according to an instruction from the control unit 70 described later, and moves the zoom lens and the focus lens or adjusts the diaphragm. For example, the zoom motor 15 performs a zoom operation for adjusting the angle of view by moving the zoom lens in the tele / wide direction. Further, the focus motor 16 performs a focus operation for focusing on the subject by moving the focus lens.

また、タイミングジェネレータ１３（以下、ＴＧ１３という。）は、制御部７０の指示に従って、撮像素子１２に必要な動作パルスを生成する。例えば、ＴＧ１３は、垂直転送のための４相パルス、フィールドシフトパルス、水平転送のための２相パルス、シャッタパルスなどの各種パルスを生成し、撮像素子１２に供給する。このＴＧ１３により撮像素子１２を駆動させることで、被写体像が撮像される。また、ＴＧ１３が、撮像素子１２のシャッタースピードを調整することで、撮像画像の露光量や露光期間が制御される（電子シャッター機能）。上記の撮像素子１２が出力した画像信号は画像信号処理部２０に入力される。 Further, the timing generator 13 (hereinafter referred to as TG 13) generates an operation pulse necessary for the image sensor 12 in accordance with an instruction from the control unit 70. For example, the TG 13 generates various pulses such as a four-phase pulse for vertical transfer, a field shift pulse, a two-phase pulse for horizontal transfer, and a shutter pulse, and supplies them to the image sensor 12. By driving the image sensor 12 by the TG 13, a subject image is captured. Further, the exposure amount and the exposure period of the captured image are controlled by the TG 13 adjusting the shutter speed of the image sensor 12 (electronic shutter function). The image signal output from the image sensor 12 is input to the image signal processing unit 20.

画像信号処理部２０は、マイクロコントローラなどの電子回路で構成され、撮像素子１２から出力される画像信号に対して所定の画像処理を施し、当該画像処理後の画像信号を表示部３０や制御部７０に出力する。画像信号処理部２０は、アナログ信号処理部２１、アナログ／デジタル（Ａ／Ｄ）変換部２２、デジタル信号処理部２３を備える。 The image signal processing unit 20 is configured by an electronic circuit such as a microcontroller, performs predetermined image processing on the image signal output from the image sensor 12, and outputs the processed image signal to the display unit 30 and the control unit 70. The image signal processing unit 20 includes an analog signal processing unit 21, an analog/digital (A/D) conversion unit 22, and a digital signal processing unit 23.

アナログ信号処理部２１は、画像信号を前処理する所謂アナログフロントエンドである。該アナログ信号処理部２１は、例えば、撮像素子１２から出力される画像信号に対して、ＣＤＳ（ｃｏｒｒｅｌａｔｅｄｄｏｕｂｌｅｓａｍｐｌｉｎｇ：相関２重サンプリング）処理、プログラマブルゲインアンプ（ＰＧＡ）によるゲイン処理などを行う。Ａ／Ｄ変換部２２は、アナログ信号処理部２１から入力されたアナログ画像信号をデジタル画像信号に変換して、デジタル信号処理部２３に出力する。デジタル信号処理部２３は、入力されたデジタル画像信号に対して、例えば、ノイズ除去、ホワイトバランス調整、色補正、エッジ強調、ガンマ補正等のデジタル信号処理を行って、表示部３０や制御部７０等に出力する。 The analog signal processing unit 21 is a so-called analog front end that preprocesses the image signal. The analog signal processing unit 21 performs, for example, CDS (correlated double sampling) processing and gain processing using a programmable gain amplifier (PGA) on the image signal output from the image sensor 12. The A/D conversion unit 22 converts the analog image signal input from the analog signal processing unit 21 into a digital image signal and outputs it to the digital signal processing unit 23. The digital signal processing unit 23 performs digital signal processing such as noise removal, white balance adjustment, color correction, edge enhancement, and gamma correction on the input digital image signal, and outputs the result to the display unit 30, the control unit 70, and the like.

表示部３０は、例えば、液晶ディスプレイ（ＬＣＤ：ＬｉｑｕｉｄＣｒｙｓｔａｌＤｉｓｐｌａｙ）、有機ＥＬディスプレイなどの表示装置で構成される。表示部３０は、制御部７０による制御に従って、入力された各種の画像データを表示する。例えば、表示部３０は、撮像中に画像信号処理部２０からリアルタイムで入力される撮像画像（スルー画像）を表示する。これにより、ユーザは、デジタルカメラ１で撮像中のスルー画像を見ながら、デジタルカメラ１を操作することができる。また、記録媒体４０に記録されている撮像画像を再生したときに、表示部３０は、当該再生画像を表示する。これにより、ユーザは、記録媒体４０に記録されている撮像画像の内容を確認することができる。 The display unit 30 includes, for example, a display device such as a liquid crystal display (LCD) or an organic EL display. The display unit 30 displays various input image data under the control of the control unit 70. For example, the display unit 30 displays a captured image (through image) input in real time from the image signal processing unit 20 during imaging. Accordingly, the user can operate the digital camera 1 while viewing the through image being captured by the digital camera 1. Further, when the captured image recorded on the recording medium 40 is reproduced, the display unit 30 displays the reproduced image. Thereby, the user can confirm the content of the captured image recorded on the recording medium 40.

記録媒体４０は、上記撮像画像のデータ、音声データ、それらのメタデータなどの各種のデータを記憶する。記録媒体４０は、例えば、メモリカード等の半導体メモリ、又は、光ディスク、ハードディスク等のディスク状記録媒体などを使用できる。なお、光ディスクは、例えば、ブルーレイディスク（Ｂｌｕ−ｒａｙＤｉｓｃ）、ＤＶＤ（ＤｉｇｉｔａｌＶｅｒｓａｔｉｌｅＤｉｓｃ）又はＣＤ（ＣｏｍｐａｃｔＤｉｓｃ）等を含む。なお、記録媒体４０は、デジタルカメラ１に内蔵されてもよいし、デジタルカメラ１に着脱可能なリムーバブルメディアであってもよい。 The recording medium 40 stores various types of data such as the captured image data, audio data, and metadata thereof. As the recording medium 40, for example, a semiconductor memory such as a memory card or a disk-shaped recording medium such as an optical disk or a hard disk can be used. The optical disc includes, for example, a Blu-ray Disc, a DVD (Digital Versatile Disc), a CD (Compact Disc), and the like. The recording medium 40 may be built in the digital camera 1 or a removable medium that can be attached to and detached from the digital camera 1.

収音部５０は、デジタルカメラ１周辺の外部音声を収音する。本実施形態に係る収音部５０は、１つの外部音声収録用のマイクロホン５１からなるモノラルマイクロホンであるが、２つのマイクロホンからなるステレオマイクロホンで構成されてもよい。マイクロホン５１は、外部音声を収音して得られた音声信号をそれぞれ出力する。かかる収音部５０により、動画撮像中に外部音声を収音して、動画と共に記録できるようになる。かかるマイクロホン５１は、外部音声（所望音）を収音するためにデジタルカメラ１の筐体に設けられているが、当該筐体内に設けられた発音部（上記駆動装置１４）の機械駆動音も雑音として収音してしまう。 The sound collection unit 50 collects external sound around the digital camera 1. The sound collection unit 50 according to the present embodiment is a monaural microphone composed of one external sound recording microphone 51, but may be composed of a stereo microphone composed of two microphones. The microphone 51 outputs a sound signal obtained by collecting external sound. The sound collecting unit 50 collects external sound during moving image capturing and can record it together with the moving image. The microphone 51 is provided in the housing of the digital camera 1 to pick up external sound (desired sound), but the mechanical drive sound of the sound generation unit (the driving device 14) provided in the housing is also included. Sound is picked up as noise.

音声信号処理部６０は、マイクロコントローラなどの電子回路で構成され、音声信号に対して所定の音声処理を施して、記録用の音声信号を出力する。この音声処理は、例えば、ＡＤ変換処理、雑音低減処理などを含む。本実施形態は、この音声信号処理部６０による雑音低減処理を特徴としているが、その詳細説明は後述する。 The audio signal processing unit 60 is configured by an electronic circuit such as a microcontroller, performs predetermined audio processing on the audio signal, and outputs an audio signal for recording. This voice processing includes, for example, AD conversion processing and noise reduction processing. The present embodiment is characterized by noise reduction processing by the audio signal processing unit 60, and the detailed description thereof will be described later.

制御部７０は、マイクロコントローラなどの電子回路で構成され、デジタルカメラ１の全体の動作を制御する。制御部７０は、例えば、ＣＰＵ７１、ＥＥＰＲＯＭ（ＥｌｅｃｔｒｉｃａｌｌｙＥｒａｓａｂｌｅＰｒｏｇｒａｍｍａｂｌｅＲＯＭ）７２、ＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）７３、ＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）７４を備える。かかる制御部７０は、デジタルカメラ１内の各部を制御する。例えば、制御部７０は、マイクロホン５１により収音された音声信号から、駆動装置１４で発生した機械音を雑音として低減するに、音声信号処理部６０の動作を制御する。 The control unit 70 is configured by an electronic circuit such as a microcontroller, and controls the entire operation of the digital camera 1. The control unit 70 includes, for example, a CPU 71, an EEPROM (Electrically Erasable Programmable ROM) 72, a ROM (Read Only Memory) 73, and a RAM (Random Access Memory) 74. The control unit 70 controls each unit in the digital camera 1. For example, the control unit 70 controls the operation of the audio signal processing unit 60 to reduce the mechanical sound generated by the driving device 14 as noise from the audio signal collected by the microphone 51.

制御部７０におけるＲＯＭ７３には、ＣＰＵ７１に各種の制御処理を実行させるためのプログラムが格納されている。ＣＰＵ７１は、該プログラムに基づいて動作して、ＲＡＭ７４を用いながら、上記各制御のための必要な演算・制御処理を実行する。該プログラムは、デジタルカメラ１に内蔵された記憶装置（例えば、ＥＥＰＲＯＭ７２、ＲＯＭ７３等）に予め格納しておくことができる。また、当該プログラムは、ディスク状記録媒体、メモリカードなどのリムーバブル記録媒体に格納されて、デジタルカメラ１に提供されてもよいし、ＬＡＮ、インターネット等のネットワークを介してデジタルカメラ１にダウンロードされてもよい。 The ROM 73 in the control unit 70 stores programs for causing the CPU 71 to execute various control processes. The CPU 71 operates based on these programs and, using the RAM 74, executes the calculation and control processing required for each control described above. The programs can be stored in advance in a storage device built into the digital camera 1 (for example, the EEPROM 72 or the ROM 73). The programs may also be stored on a removable recording medium such as a disk-shaped recording medium or a memory card and provided to the digital camera 1, or may be downloaded to the digital camera 1 via a network such as a LAN or the Internet.

ここで、制御部７０による制御の具体例について説明する。制御部７０は、上記撮像部１０のＴＧ１３や駆動装置１４を制御して、撮像部１０による撮像処理を制御する。例えば、制御部７０は、上記撮像光学系１１の絞りの調整、撮像素子１２の電子シャッタースピードの設定、アナログ信号処理部２１のＡＧＣのゲイン設定などにより、自動露光制御を行う（ＡＥ機能）。また、制御部７０は、上記撮像光学系１１のフォーカスレンズを移動させて、フォーカスポジションを変更することで、特定の被写体に対して撮像光学系１１の焦点を自動的に合わせるオートフォーカス制御を行う（ＡＦ機能）。また、制御部７０は、上記撮像光学系１１のズームレンズを移動させて、ズームポジションを変更することで、撮像画像の画角を調整する。また、制御部７０は、記録媒体４０に対して撮像画像、メタデータなどの各種のデータを記録し、また、記録媒体４０に記録されているデータを読み出して再生する。さらに、制御部７０は、表示部３０に表示するための各種の表示画像を生成し、表示部３０を制御して該表示画像を表示させる。 Here, a specific example of control by the control unit 70 will be described. The control unit 70 controls the TG 13 and the driving device 14 of the imaging unit 10 to control the imaging process by the imaging unit 10. For example, the control unit 70 performs automatic exposure control (AE function) by adjusting the aperture of the imaging optical system 11, setting the electronic shutter speed of the imaging device 12, setting the AGC gain of the analog signal processing unit 21, and the like. Further, the control unit 70 moves the focus lens of the imaging optical system 11 and changes the focus position, thereby performing autofocus control for automatically focusing the imaging optical system 11 on a specific subject. (AF function). The control unit 70 adjusts the angle of view of the captured image by moving the zoom lens of the imaging optical system 11 and changing the zoom position. In addition, the control unit 70 records various data such as captured images and metadata on the recording medium 40, and reads and reproduces data recorded on the recording medium 40. Further, the control unit 70 generates various display images to be displayed on the display unit 30 and controls the display unit 30 to display the display image.

操作部８０、表示部３０は、ユーザがデジタルカメラ１の動作を操作するためのユーザインターフェースとして機能する。操作部８０は、ボタン、レバー等の各種の操作キー、又はタッチパネル等で構成され、例えば、ズームボタン、シャッターボタン、電源ボタンなどを含む。操作部８０は、ユーザ操作に応じて、各種の撮像動作を指示するための指示情報を制御部７０に出力する。 The operation unit 80 and the display unit 30 function as a user interface for the user to operate the operation of the digital camera 1. The operation unit 80 includes various operation keys such as buttons and levers, or a touch panel, and includes, for example, a zoom button, a shutter button, and a power button. The operation unit 80 outputs instruction information for instructing various imaging operations to the control unit 70 in accordance with a user operation.

［１．２．２．音声信号処理装置の機能構成］ [1.2.2. Functional configuration of audio signal processing apparatus]
次に、図３を参照して、本実施形態に係るデジタルカメラ１に適用された音声信号処理装置の機能構成例について説明する。図３は、本実施形態に係る音声信号処理装置１００の機能構成を示すブロック図である。 Next, a functional configuration example of the audio signal processing apparatus applied to the digital camera 1 according to the present embodiment will be described with reference to FIG. 3. FIG. 3 is a block diagram showing the functional configuration of the audio signal processing apparatus 100 according to the present embodiment.

図３に示すように、音声信号処理装置１００は、信号入力部１１０と、入出力用バッファメモリ１２０（第１のバッファメモリ）と、補間用バッファメモリ１３０（第２のバッファメモリ）と、雑音検出部１４０と、雑音低減部１５０と、信号出力部１６０とを備える。信号入力部１１０は、上記図２のマイクロホン５１を備える。雑音低減部１５０は、補間信号生成部１５２と、信号補間部１５４とを備える。また、上記入出力用バッファメモリ１２０、補間用バッファメモリ１３０、雑音検出部１４０及び雑音低減部１５０は、上記図２の音声信号処理部６０を構成する。 As shown in FIG. 3, the audio signal processing apparatus 100 includes a signal input unit 110, an input/output buffer memory 120 (first buffer memory), an interpolation buffer memory 130 (second buffer memory), a noise detection unit 140, a noise reduction unit 150, and a signal output unit 160. The signal input unit 110 includes the microphone 51 of FIG. 2. The noise reduction unit 150 includes an interpolation signal generation unit 152 and a signal interpolation unit 154. The input/output buffer memory 120, the interpolation buffer memory 130, the noise detection unit 140, and the noise reduction unit 150 together constitute the audio signal processing unit 60 of FIG. 2.

これら音声信号処理装置１００の各部は、専用のハードウェアで構成されてもよいし、ソフトウェアで構成されてもよい。ソフトウェアを用いる場合、音声信号処理装置１００のプロセッサが、以下に説明する各機能部の機能を実現するためのプログラムを実行すればよい。当該プログラムは、コンピュータ読み取り可能な記録媒体（例えば、光ディスク、ハードディスク、半導体メモリ等）を介して音声信号処理装置１００に提供されてもよいし、又は各種の通信手段を介して提供されてもよい。以下に、音声信号処理装置１００の各部について説明する。 Each unit of the audio signal processing apparatus 100 may be implemented in dedicated hardware or in software. When software is used, the processor of the audio signal processing apparatus 100 executes a program that realizes the functions of the functional units described below. The program may be provided to the audio signal processing apparatus 100 via a computer-readable recording medium (for example, an optical disc, a hard disk, or a semiconductor memory), or via various communication means. Each unit of the audio signal processing apparatus 100 is described below.

信号入力部１１０は、デジタルカメラ１の筐体に設置されたマイクロホン５１、ＡＤ変換部（図示せず。）等で構成される。信号入力部１１０は、マイクロホン５１は、デジタルカメラ１の周囲の所望音（録音対象の音声）を収音し、当該外部音声を音声信号に変換して出力する。この音声信号には、所望音のみならず、デジタルカメラ１の駆動装置１４で発生するパルス機械音やその他の機械駆動音などの雑音が混入する。また、不図示のＡＤ変換部は、上記マイクロホン５１から出力されたアナログ音声信号を、デジタル音声信号に変換して、出力する。 The signal input unit 110 includes a microphone 51 installed in the housing of the digital camera 1, an AD conversion unit (not shown), and the like. In the signal input unit 110, the microphone 51 collects a desired sound (sound to be recorded) around the digital camera 1, converts the external sound into a sound signal, and outputs the sound signal. In this audio signal, not only the desired sound but also noise such as pulse mechanical sound generated by the driving device 14 of the digital camera 1 and other mechanical driving sound are mixed. An AD converter (not shown) converts the analog audio signal output from the microphone 51 into a digital audio signal and outputs the digital audio signal.

入出力用バッファメモリ１２０（第１のバッファメモリ）、補間用バッファメモリ１３０（第２のバッファメモリ）は、マイクロホンから入力された音声信号や、生成した補間信号を一時保存する信号保持部として機能する。このように、本実施形態に係る音声信号処理装置１００は、２つのバッファメモリを備えており、この２つのバッファメモリを用いて音声信号を所定区間ごとに（つまり、フレーム単位で）処理することで雑音を低減する。本実施形態では、入出力用バッファメモリ１２０と補間用バッファメモリ１３０は、信号入力部１１０と信号出力部１６０との間に並列に接続されており、これにより、２つの区間の音声信号を並列処理することができる。 The input/output buffer memory 120 (first buffer memory) and the interpolation buffer memory 130 (second buffer memory) function as signal holding units that temporarily store the audio signal input from the microphone and the generated interpolation signal. The audio signal processing apparatus 100 according to the present embodiment thus has two buffer memories and uses them to reduce noise by processing the audio signal section by section (that is, frame by frame). In the present embodiment, the input/output buffer memory 120 and the interpolation buffer memory 130 are connected in parallel between the signal input unit 110 and the signal output unit 160, so the audio signals of two sections can be processed in parallel.

音声信号処理装置１００が音声信号をフレーム単位で入出力及び処理するために、入出力用バッファメモリ１２０は、現在入力される音声信号の１フレーム分を一時保存する。補間用バッファメモリ１３０は、雑音区間を補間するために、１フレーム分過去に入力された音声信号を保持する。これら２つのバッファメモリのメモリ長は同一であり、例えば、それぞれのバッファメモリが、１フレーム分のデジタル音声信号（サンプルデータ数Ｎ）を保存可能である。従って、音声信号処理装置１００が備えるバッファメモリの長さは、２＊Ｎとなる。なお、入出力用バッファメモリ１２０及び補間用バッファメモリ１３０は、物理的に分離された２つのバッファメモリで構成されてもよいし、物理的に１つのバッファメモリの記憶領域を分離することで構成されてもよい。 So that the audio signal processing apparatus 100 can input, output, and process the audio signal frame by frame, the input/output buffer memory 120 temporarily stores one frame of the audio signal currently being input. The interpolation buffer memory 130 holds the audio signal that was input one frame earlier, for use in interpolating a noise section. The two buffer memories have the same length; for example, each can store one frame of the digital audio signal (N samples). The total buffer memory length of the audio signal processing apparatus 100 is therefore 2*N. The input/output buffer memory 120 and the interpolation buffer memory 130 may be two physically separate buffer memories, or they may be formed by partitioning the storage area of one physical buffer memory.

入出力用バッファメモリ１２０は、信号入力部１１０から入力された音声信号を、所定区間ごとに（例えば、１フレームずつ）一時保存する。この入出力用バッファメモリ１２０は、入力される音声信号の１フレーム分全てを保存完了した時点で、当該１フレームの音声信号を出力する。これにより、信号入力部１１０から入力された音声信号は、１フレームずつ順次、入出力用バッファメモリ１２０に保存された後に、信号出力部１６０に出力される。 The input / output buffer memory 120 temporarily stores the audio signal input from the signal input unit 110 for each predetermined section (for example, one frame at a time). The input / output buffer memory 120 outputs the audio signal of one frame when the storage of all the frames of the input audio signal is completed. Thus, the audio signal input from the signal input unit 110 is sequentially stored in the input / output buffer memory 120 frame by frame and then output to the signal output unit 160.

また、入出力用バッファメモリ１２０から出力された１フレームの音声信号は、補間用バッファメモリ１３０に一時保存される。つまり、補間用バッファメモリ１３０は、入出力用バッファメモリ１２０に保存されている現在のフレーム（ｎ番目のフレーム）の音声信号よりも１つ前の過去のフレーム（ｎ−１番目のフレーム）の音声信号を一時保存する。従って、信号入力部１１０から入力されるｎ番目のフレームの音声信号が、入出力用バッファメモリ１２０に蓄積されている最中には、補間用バッファメモリ１３０にｎ−１番目のフレームの音声信号が保存されていることになる。これら２つのバッファメモリにより、常時、２フレーム分の音声信号が音声信号処理装置１００内に保持される。 The one-frame audio signal output from the input/output buffer memory 120 is also temporarily stored in the interpolation buffer memory 130. In other words, the interpolation buffer memory 130 temporarily stores the audio signal of the frame (the (n−1)-th frame) immediately preceding the current frame (the n-th frame) stored in the input/output buffer memory 120. Therefore, while the audio signal of the n-th frame input from the signal input unit 110 is being accumulated in the input/output buffer memory 120, the interpolation buffer memory 130 holds the audio signal of the (n−1)-th frame. With these two buffer memories, two frames of the audio signal are held in the audio signal processing apparatus 100 at all times.
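The two-buffer flow described above can be sketched in Python as follows. This is only an illustrative sketch, not the apparatus's actual implementation: the function name `frame_pairs` and the use of Python lists as buffers are assumptions made for the example. It shows that, while frame n is in the input/output buffer, frame n−1 is available in the interpolation buffer, so at most two frames (2*N samples) are held at any time.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Frame = List[float]


def frame_pairs(frames: Iterable[Frame]) -> Iterator[Tuple[Frame, Optional[Frame]]]:
    """Yield (frame n, frame n-1) pairs; frame n-1 is None for the first frame.

    Hypothetical helper: the loop variable stands in for the input/output
    buffer memory 120 and `interpolation_buffer` for the interpolation
    buffer memory 130.
    """
    interpolation_buffer: Optional[Frame] = None
    for io_buffer in frames:
        # Frame n is complete; frame n-1 is available for interpolation.
        yield io_buffer, interpolation_buffer
        # After frame n has been output, a copy of it becomes the past frame.
        interpolation_buffer = list(io_buffer)
```

Because the copy is taken after the frame has been handed out, a frame that was modified in place by noise reduction would be stored in its processed form, which matches the description that the output of the input/output buffer is what gets saved.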

雑音検出部１４０は、信号入力部１１０から入力された音声信号のうち、パルス機械音等の雑音が含まれる区間（雑音区間）を検出する。雑音検出部１４０は、入出力用バッファメモリ１２０に保存されている所定区間の音声信号に雑音が含まれるか否かを検出し、雑音が含まれる場合は、当該区間が雑音区間であると判定する。雑音検出部１４０は、雑音区間を検出したときに、その区間を表す情報を雑音低減部１５０に通知する。 The noise detection unit 140 detects the sections (noise sections) of the audio signal input from the signal input unit 110 that contain noise such as pulse mechanical sound. The noise detection unit 140 checks whether the audio signal of the predetermined section stored in the input/output buffer memory 120 contains noise and, if it does, determines that the section is a noise section. When it detects a noise section, the noise detection unit 140 notifies the noise reduction unit 150 of information identifying that section.

例えば、雑音がパルス機械音である場合、雑音検出部１４０は、上記駆動装置１４が動作している区間を、雑音区間として検出する。雑音検出部１４０は、駆動装置１４の制御情報を取得することで、当該制御情報から駆動装置１４の動作期間（雑音区間）を検出可能である。 For example, when the noise is a pulse mechanical sound, the noise detection unit 140 detects a section in which the driving device 14 is operating as a noise section. The noise detection unit 140 can detect the operation period (noise interval) of the drive device 14 from the control information by acquiring the control information of the drive device 14.

また、雑音検出部１４０は、信号入力部１１０から入力された実際の音声信号を解析して雑音の特徴量を抽出することで、雑音の有無を判定し、雑音期間を検出してもよい。例えば、パルス機械音はパルス成分及び残響成分という特徴的な成分を含むため、これら２種類の成分を検出することができれば、パルス機械音の有無を正確に検出できる。そこで、雑音検出部１４０は、マイクロホン５１から出力された音声信号から、上記パルス機械音のパルス成分を表す特徴量（例えば、パルス成分の振幅最大値Ａ、パルス幅Ｗ）、パルス機械音の残響成分を表す特徴量（例えば、パルス機械音の残響成分を表す狭帯域信号のパワー値Ｐ、当該狭帯域信号の零交差点回数Ｍ）を抽出する。そして、雑音検出部１４０は、上記パルス機械音を表す特徴量（振幅最大値Ａ、パルス幅Ｗ、残響成分パワー値Ｐ等）に基づいて、音声信号にパルス機械音が含まれるか否かを判定する。例えば、雑音検出部１４０は、統計的識別法又はテーブル判定を用いた判定方法により、上記特徴量と所定の判定係数を用いて、音声信号におけるパルス機械音の有無を総合的に判定する。これにより、音声信号にパルス機械音が含まれているか否かを判定し、音声信号におけるパルス機械音が含まれている区間を特定することができる。 The noise detection unit 140 may also determine whether noise is present and detect the noise period by analyzing the actual audio signal input from the signal input unit 110 and extracting feature amounts of the noise. For example, a pulse mechanical sound contains two characteristic components, a pulse component and a reverberation component, so its presence can be detected accurately if both components can be detected. The noise detection unit 140 therefore extracts, from the audio signal output from the microphone 51, feature amounts representing the pulse component of the pulse mechanical sound (for example, the maximum amplitude A and the pulse width W of the pulse component) and feature amounts representing its reverberation component (for example, the power value P of a narrowband signal representing the reverberation component and the number of zero crossings M of that narrowband signal). Based on these feature amounts (maximum amplitude A, pulse width W, reverberation component power value P, and so on), the noise detection unit 140 determines whether the audio signal contains a pulse mechanical sound. For example, the noise detection unit 140 makes an overall determination of the presence or absence of a pulse mechanical sound from the feature amounts and predetermined determination coefficients, using a statistical classification method or a table-based determination method. In this way, it can determine whether the audio signal contains a pulse mechanical sound and identify the section in which it is contained.
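As a rough illustration of the feature amounts A, W, P, and M named above, the following Python sketch computes them for one frame. Several things here are assumptions and not taken from the source: the pulse width is defined as the run of samples around the peak that stay above half the peak amplitude, the power and zero-crossing count are computed on the full-band frame (the source computes them on a narrowband signal, and no band-pass filter is shown here), and the thresholds in `looks_like_pulse` are arbitrary placeholders standing in for the "predetermined determination coefficients".

```python
from typing import Dict, Sequence


def pulse_features(frame: Sequence[float]) -> Dict[str, float]:
    """Compute illustrative feature amounts A, W, P, M for one frame."""
    if not frame:
        raise ValueError("frame must contain at least one sample")
    peak_index = max(range(len(frame)), key=lambda i: abs(frame[i]))
    amplitude_max = abs(frame[peak_index])  # A: maximum amplitude

    # W: assumed definition, samples around the peak at >= half the peak.
    half = 0.5 * amplitude_max
    left = peak_index
    while left > 0 and abs(frame[left - 1]) >= half:
        left -= 1
    right = peak_index
    while right < len(frame) - 1 and abs(frame[right + 1]) >= half:
        right += 1
    pulse_width = right - left + 1

    # P: mean power (full band here; the source uses a narrowband signal).
    power = sum(x * x for x in frame) / len(frame)

    # M: number of sign changes between neighbouring samples.
    zero_crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))

    return {"A": amplitude_max, "W": pulse_width, "P": power, "M": zero_crossings}


def looks_like_pulse(features: Dict[str, float],
                     min_amplitude: float = 0.5,
                     max_width: int = 8) -> bool:
    """Hypothetical table-style decision: a tall, narrow peak counts as a pulse."""
    return features["A"] >= min_amplitude and features["W"] <= max_width
```

A real detector would combine all four features with trained or tabulated coefficients; the two-threshold rule above only shows where such a decision would plug in.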

雑音低減部１５０は、上記雑音検出部１４０による検出結果に応じて、音声信号に対して雑音低減処理を行い、音声信号からパルス機械音等の雑音を除去する。具体的には、入出力用バッファメモリ１２０に保存されている区間の音声信号にパルス機械音等の雑音が含まれると判定された場合に、雑音低減部１５０は、当該パルス機械音が含まれる区間の音声信号に対して雑音低減処理を行う。一方、パルス機械音が含まれていないと判定された場合に、雑音低減部１５０は、雑音低減処理を行わない。このように、パルス機械音が含まれる場合にのみ、当該パルス機械音が含まれる区間（雑音区間）の音声信号に対して雑音低減処理を行うことで、雑音低減処理の処理効率を向上し、無駄な処理負荷を軽減できる。 The noise reduction unit 150 performs noise reduction processing on the audio signal according to the detection result by the noise detection unit 140, and removes noise such as pulse mechanical sound from the audio signal. Specifically, when it is determined that noise such as pulse mechanical sound is included in the audio signal in the section stored in the input / output buffer memory 120, the noise reduction unit 150 includes the pulse mechanical sound. Noise reduction processing is performed on the audio signal in the section. On the other hand, when it is determined that no pulse mechanical sound is included, the noise reduction unit 150 does not perform noise reduction processing. Thus, only when pulse mechanical sound is included, by performing noise reduction processing on the audio signal in the section (noise section) including the pulse mechanical sound, the processing efficiency of the noise reduction processing is improved, Unnecessary processing load can be reduced.

雑音低減部１５０は、雑音低減方法として、雑音区間の前又は後の区間の信号から当該雑音区間の背景音の信号波形を推定し、推定した信号を用いて雑音区間の信号を補間する方法を使用する。この補間方法を実行するために、雑音低減部１５０は、補間信号生成部１５２と、信号補間部１５４とを備える。 As its noise reduction method, the noise reduction unit 150 estimates the signal waveform of the background sound in the noise section from the signal in the section before or after the noise section, and interpolates the signal in the noise section using the estimated signal. To carry out this interpolation method, the noise reduction unit 150 includes an interpolation signal generation unit 152 and a signal interpolation unit 154.

補間信号生成部１５２は、雑音区間の前の区間の信号を用いて、雑音区間を補間するための補間信号を生成する。この補間信号の生成処理は、入出力用バッファメモリ１２０に保存されている現在のフレーム（ｎ番目のフレーム）の音声信号に雑音が含まれることが検出されたときに、実行される。このとき、補間信号生成部１５２は、補間用バッファメモリ１３０に保存されている１フレーム過去（ｎ−１番目のフレーム）の音声信号を用いて、現在、入出力用バッファメモリ１２０に保存されている雑音区間の音声信号を補間するための補間信号を生成する。 The interpolation signal generation unit 152 generates an interpolation signal for interpolating the noise interval using the signal in the interval before the noise interval. This interpolation signal generation process is executed when it is detected that the audio signal of the current frame (n-th frame) stored in the input / output buffer memory 120 contains noise. At this time, the interpolation signal generation unit 152 is currently stored in the input / output buffer memory 120 using the audio signal of the previous frame (n−1 frame) stored in the interpolation buffer memory 130. An interpolated signal for interpolating the audio signal in the noise section is generated.

ここで、図４、図５を参照して、上記補間信号の生成方法の例について説明する。図４、図５は、本実施形態に係る雑音区間の前の入力音声信号から補間信号を生成する方法を示す概念図である。 Here, examples of the method of generating the interpolation signal will be described with reference to FIGS. 4 and 5. FIGS. 4 and 5 are conceptual diagrams showing methods of generating an interpolation signal from the input audio signal preceding the noise section according to the present embodiment.

（ａ）シンプルな生成方法 (a) Simple generation method
図４の上段に示すように、補間用バッファメモリ１３０に保存されている１フレームの音声信号をｓ（ｎ）＝{ｓ_０，ｓ_１，・・・，ｓ_Ｎ−１}と表現する。ここで、ｓ_０，ｓ_１，・・・，ｓ_Ｎ−１は、当該１フレーム中のＮ個のサンプルデータの値を示す。かかる音声信号ｓ（ｎ）から補間信号ｖ（ｎ）を生成する場合、例えば、図４の中段に示すように、音声信号ｓ（ｎ）を時間軸方向に反転させて、補間信号ｖ（ｎ）＝{ｓ_Ｎ−１，ｓ_Ｎ−２，・・・，ｓ_１，ｓ_０}を生成してもよい。また、図４の下段に示すように、音声信号ｓ（ｎ）を時間軸方向及び振幅方向に反転させて、補間信号ｖ（ｎ）＝{−ｓ_Ｎ−１，−ｓ_Ｎ−２，・・・，−ｓ_１，−ｓ_０}を生成してもよい。 As shown in the upper part of FIG. 4, the one-frame audio signal stored in the interpolation buffer memory 130 is expressed as $s(n) = \{s_0, s_1, \ldots, s_{N-1}\}$, where $s_0, s_1, \ldots, s_{N-1}$ are the values of the N samples in that frame. To generate the interpolation signal $v(n)$ from the audio signal $s(n)$, the audio signal may, for example, be reversed along the time axis, as shown in the middle part of FIG. 4, giving $v(n) = \{s_{N-1}, s_{N-2}, \ldots, s_1, s_0\}$. Alternatively, as shown in the lower part of FIG. 4, the audio signal may be reversed along both the time axis and the amplitude axis, giving $v(n) = \{-s_{N-1}, -s_{N-2}, \ldots, -s_1, -s_0\}$.
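The two simple generation methods of FIG. 4 can be written in a few lines of Python. This is a minimal sketch; the function names are chosen for illustration and do not appear in the source.

```python
from typing import List, Sequence


def reverse_time(s: Sequence[float]) -> List[float]:
    """v = {s_{N-1}, ..., s_1, s_0}: frame n-1 reversed along the time axis."""
    return list(reversed(s))


def reverse_time_and_amplitude(s: Sequence[float]) -> List[float]:
    """v = {-s_{N-1}, ..., -s_1, -s_0}: reversed in time and in sign."""
    return [-x for x in reversed(s)]
```

With time reversal alone, the first sample of the interpolation signal equals the last sample of the preceding frame, so the waveform is continuous in value at the frame boundary.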

（ｂ）窓を用いた生成方法 (b) Generation method using a window
また、図５は、別の補間信号生成方法を示す。図５に示すように、音声信号ｓ（ｎ）に適当な窓ｗ（ｎ）を乗算した信号ｐ（ｎ）とｑ（ｎ）を合成することで、より自然な補間信号ｖ（ｎ）を生成することもできる。ここで、窓ｗ（ｎ）としては、ハニング窓又はバートレット窓などを使用できる。より詳細には、図５に示すように、まず、音声信号ｓ（ｎ）＝{ｓ_０，ｓ_１，・・・，ｓ_Ｎ−１}に窓ｗ（ｎ）＝{ｗ_０，ｗ_１，・・・，ｗ_Ｎ−１}を乗算して、信号ｐ（ｎ）＝{ｓ_０ｗ_０，ｓ_１ｗ_１，・・・，ｓ_Ｎ−１ｗ_Ｎ−１}を生成する。次いで、信号ｐ（ｎ）を時間軸方向に反転させて、信号ｑ（ｎ）＝{ｓ_Ｎ−１ｗ_Ｎ−１，・・・，ｓ_１ｗ_１，ｓ_０ｗ_０}を生成する。そして、信号ｐ（ｎ）と信号ｑ（ｎ）を加算して、補間信号ｖ（ｎ）＝ｐ（ｎ）＋ｑ（ｎ）＝{ｓ_０ｗ_０＋ｓ_Ｎ−１ｗ_Ｎ−１，ｓ_１ｗ_１＋ｓ_Ｎ−２ｗ_Ｎ−２，・・・，ｓ_Ｎ−１ｗ_Ｎ−１＋ｓ_０ｗ_０}を生成する。或いは、信号ｐ（ｎ）から信号ｑ（ｎ）を減算して、補間信号ｖ（ｎ）＝ｐ（ｎ）−ｑ（ｎ）＝{ｓ_０ｗ_０−ｓ_Ｎ−１ｗ_Ｎ−１，ｓ_１ｗ_１−ｓ_Ｎ−２ｗ_Ｎ−２，・・・，ｓ_Ｎ−１ｗ_Ｎ−１−ｓ_０ｗ_０}を生成する。このようにして、音声信号ｓ（ｎ）から、より自然な補間信号ｖ（ｎ）を生成することも可能である。 FIG. 5 shows another interpolation signal generation method. As shown in FIG. 5, a more natural interpolation signal $v(n)$ can be generated by combining the signals $p(n)$ and $q(n)$ obtained by multiplying the audio signal $s(n)$ by a suitable window $w(n)$. A Hanning window or a Bartlett window, for example, can be used as the window $w(n)$. More specifically, as shown in FIG. 5, the audio signal $s(n) = \{s_0, s_1, \ldots, s_{N-1}\}$ is first multiplied by the window $w(n) = \{w_0, w_1, \ldots, w_{N-1}\}$ to generate the signal $p(n) = \{s_0 w_0, s_1 w_1, \ldots, s_{N-1} w_{N-1}\}$. Next, the signal $p(n)$ is reversed along the time axis to generate the signal $q(n) = \{s_{N-1} w_{N-1}, \ldots, s_1 w_1, s_0 w_0\}$. The signals $p(n)$ and $q(n)$ are then added to generate the interpolation signal $v(n) = p(n) + q(n) = \{s_0 w_0 + s_{N-1} w_{N-1},\ s_1 w_1 + s_{N-2} w_{N-2},\ \ldots,\ s_{N-1} w_{N-1} + s_0 w_0\}$. Alternatively, the signal $q(n)$ is subtracted from the signal $p(n)$ to generate the interpolation signal $v(n) = p(n) - q(n) = \{s_0 w_0 - s_{N-1} w_{N-1},\ s_1 w_1 - s_{N-2} w_{N-2},\ \ldots,\ s_{N-1} w_{N-1} - s_0 w_0\}$. In this way, a more natural interpolation signal $v(n)$ can also be generated from the audio signal $s(n)$.
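The window-based method of FIG. 5 can be sketched as below. The symmetric Hann (Hanning) window formula $w_k = 0.5\,(1 - \cos(2\pi k/(N-1)))$ is a standard definition chosen here as an assumption; the source only says that a Hanning or Bartlett window may be used and does not fix the exact form. The function names and the `subtract` flag are likewise illustrative.

```python
import math
from typing import List, Sequence


def hann_window(n: int) -> List[float]:
    """Symmetric Hann window of length n (assumed form of w)."""
    if n < 1:
        raise ValueError("window length must be positive")
    if n == 1:
        return [1.0]
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * k / (n - 1))) for k in range(n)]


def windowed_interpolation(s: Sequence[float], subtract: bool = False) -> List[float]:
    """Return v = p + q (or p - q), with p = s * w and q = p reversed in time."""
    w = hann_window(len(s))
    p = [x * wk for x, wk in zip(s, w)]   # p_k = s_k * w_k
    q = list(reversed(p))                 # q_k = s_{N-1-k} * w_{N-1-k}
    if subtract:
        return [a - b for a, b in zip(p, q)]
    return [a + b for a, b in zip(p, q)]
```

Because the window tapers to zero at both ends, the resulting interpolation signal also starts and ends near zero, which avoids an abrupt step at the frame edges. Note that p + q is symmetric in time and p − q is antisymmetric.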

Returning to FIG. 3, the description of each part of the audio signal processing apparatus 100 continues. As shown in FIG. 3, the signal interpolation unit 154 uses the interpolation signal generated by the interpolation signal generation unit 152 to interpolate the audio signal of the n-th frame (the audio signal of the noise section) stored in the input/output buffer memory 120.

For example, the signal interpolation unit 154 may perform the interpolation by setting all amplitude values (that is, the N samples) of the noise-section audio signal stored in the input/output buffer memory 120 to zero and then overwriting them with the interpolation signal as is. With this interpolation, the audio signal of the n-th section, which contains noise, is replaced by the interpolation signal and output. Alternatively, the signal interpolation unit 154 may perform the interpolation by combining the noise-section audio signal stored in the input/output buffer memory 120 with the interpolation signal at a suitable mixing ratio. With this interpolation, the audio signal of the noise section is output with its noise reduced.
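
The two interpolation options, replacement and mixing, can be sketched with a single function. This is a minimal illustration in Python. The parameter `mix` and its convention (1.0 means full replacement, smaller values keep part of the noisy frame) are assumptions made for this sketch; the disclosure only refers to "a suitable mixing ratio".

```python
def interpolate_frame(noisy_frame, interpolation, mix=1.0):
    """Blend the interpolation signal into the noisy frame.

    mix = 1.0 reproduces the replacement case (the noisy samples are
    discarded and overwritten); 0.0 < mix < 1.0 is the mixing case.
    """
    if len(noisy_frame) != len(interpolation):
        raise ValueError("both signals must have the same length")
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must lie in [0, 1]")
    return [(1.0 - mix) * s + mix * v
            for s, v in zip(noisy_frame, interpolation)]


# Replacement: the output is the interpolation signal itself.
replaced = interpolate_frame([10.0, -10.0], [1.0, 2.0])
# Mixing at an assumed ratio of 0.5: the noise is attenuated, not removed.
mixed = interpolate_frame([10.0, -10.0], [1.0, 2.0], mix=0.5)
```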

Such interpolation processing by the signal interpolation unit 154 outputs an audio signal interpolated with the interpolation signal in place of the input audio signal of the noise section, so the noise contained in that section can be reduced or removed.

The signal output unit 160 outputs the audio signal received from the input/output buffer memory 120 to the outside one frame at a time. When the noise reduction unit 150 has performed noise reduction, the signal output unit 160 outputs the noise-reduced audio signal. For example, the signal output unit 160 may output the audio signal to a signal recording unit (formed by the control unit 70 and the recording medium 40 of FIG. 2), or to an audio output unit (not shown) such as a speaker or headphones. When the audio signal is output to the signal recording unit, the noise-reduced audio signal is recorded on a recording medium (not shown). The recording medium may be any recording medium, for example a magnetic recording medium such as a hard disk or magnetic tape, an optical recording medium such as a DVD or Blu-ray disc, or a semiconductor memory such as a flash memory or USB memory.

[1.3. Operation of the audio signal processing apparatus]
Next, the operation of the audio signal processing apparatus 100 according to the present embodiment is described. An example of normal operation without noise and an example of operation when noise occurs are described in turn.

[1.3.1. Example of normal operation without noise]
First, the operation of the audio signal processing apparatus 100 in the normal, noise-free case is described with reference to FIG. 6. FIG. 6 is a schematic diagram showing the normal operation of the audio signal processing apparatus 100 according to the present embodiment.

As shown in FIG. 6, in the normal case where no noise occurs, the audio signal input from the microphone 51 is temporarily stored, frame by frame and in sequence, in the input/output buffer memory 120 and then in the interpolation buffer memory 130. The frame held in the interpolation buffer memory 130 is the frame immediately preceding the one being accumulated in the input/output buffer memory 120. For example, as shown in FIG. 6A, while the audio signal s(n) of the n-th frame is being newly input and accumulated in the input/output buffer memory 120, the audio signal s(n-1) of the (n-1)th frame, input one frame earlier, is held in the interpolation buffer memory 130.

Then, as soon as the whole of the audio signal s(n) of the n-th frame has been accumulated in the input/output buffer memory 120, the audio signal s(n) stored there is output to the outside, as shown in FIG. 6B, and the data in the input/output buffer memory 120 is erased. Since no noise has been detected, the audio signal s(n) of the n-th frame is output as is, without any special processing. Together with this output, the audio signal s(n) is copied to the interpolation buffer memory 130. This is so that, if noise is detected in the audio signal s(n+1) of the (n+1)th frame input next, the interpolation signal v(n+1) for the (n+1)th frame can be generated from the audio signal s(n) of the n-th frame held in the interpolation buffer memory 130.

[1.3.2. Example of operation when noise occurs]
Next, the operation of the audio signal processing apparatus 100 when noise occurs is described with reference to FIG. 7. FIG. 7 is a schematic diagram showing an example of the operation of the audio signal processing apparatus 100 according to the present embodiment when noise occurs.

As shown in FIG. 7, even when the input audio signal contains noise (for example, a pulsed mechanical sound), the audio signal input from the microphone 51 is temporarily stored, frame by frame and in sequence, in the input/output buffer memory 120 and then in the interpolation buffer memory 130. As shown in FIG. 7A, while the audio signal s(n) of the n-th frame, which contains noise, is being newly input and accumulated in the input/output buffer memory 120, the audio signal s(n-1) of the (n-1)th frame, one frame earlier, is temporarily held in the interpolation buffer memory 130.

When the whole of the audio signal s(n) of the n-th frame has been accumulated in the input/output buffer memory 120 and noise is detected in that audio signal s(n), the interpolation processing shown in FIG. 7B is executed immediately. That is, as shown in FIG. 7B, the interpolation signal generation unit 152 generates, from the audio signal s(n-1) of the (n-1)th frame stored in the interpolation buffer memory 130, an interpolation signal v(n) for interpolating the audio signal s(n) of the noise section (the n-th frame). The interpolation signal v(n) is generated as described above (see FIGS. 4 and 5). In the example of FIG. 7B, the interpolation signal v(n) is generated by reversing the audio signal s(n-1) of the (n-1)th frame along the time axis. The signal interpolation unit 154 then deletes the audio signal s(n) of the n-th frame stored in the input/output buffer memory 120 and stores the interpolation signal v(n) in the input/output buffer memory 120.

Next, as shown in FIG. 7C, the signal interpolation unit 154 outputs the interpolation signal v(n) stored in the input/output buffer memory 120 to the outside in place of the audio signal s(n) of the n-th frame actually input in FIG. 7A, and erases the data in the input/output buffer memory 120. Together with this output, the signal interpolation unit 154 copies the interpolation signal v(n) to the interpolation buffer memory 130. This is so that, if noise is detected in the audio signal s(n+1) of the (n+1)th frame input next, the interpolation signal v(n+1) for the (n+1)th frame can be generated from the interpolation signal v(n) held in the interpolation buffer memory 130.

As described above, when the audio signal s(n) of the n-th frame contains noise, the interpolation signal v(n) is generated from the audio signal s(n-1) of the (n-1)th frame and interpolation is performed in order to reduce that noise. As a result, the noise-free interpolation signal v(n) is output to the outside in place of the noisy input audio signal s(n) of the n-th frame, so the noise can be suitably removed.

Moreover, when audio signals are input, output, and interpolated frame by frame as described above, the input/output buffer memory 120 and the interpolation buffer memory 130 each need a memory length of only N, the number of samples in one frame. The buffer memory length required for the whole apparatus is therefore only 2*N. In addition, the interpolation signal v(n) can be generated and output to the outside as soon as the audio signal s(n) of the noise section has been fully accumulated in the input/output buffer memory 120, so the delay of the output audio relative to the input audio is zero.

[1.4. Audio signal processing method]
Next, an audio signal processing method (mechanical sound reduction method) using the audio signal processing apparatus 100 described above is described with reference to FIG. 8. FIG. 8 is a flowchart showing the audio signal processing method according to the present embodiment.

During imaging and recording by the digital camera 1 equipped with the audio signal processing apparatus 100 according to the present embodiment, the surrounding external sound is picked up by the microphone 51 and an audio signal is output. The audio signal processing apparatus 100 converts the analog audio signal input from the microphone 51 into a digital audio signal and processes the digital audio signal frame by frame. That is, the audio signal processing apparatus 100 stores the input audio signal in the input/output buffer memory 120 one frame at a time, and stores the audio signal of the frame immediately preceding the frame currently being input in the interpolation buffer memory 130. The audio signal processing apparatus 100 then detects the presence or absence of noise frame by frame and, when noise is detected, interpolates that frame using the signal of the preceding frame. FIG. 8 shows the detailed flow of this processing.

As shown in FIG. 8, the audio signal processing apparatus 100 first determines whether one frame of the audio signal input from the microphone 51 has been accumulated in the input/output buffer memory 120 (S100). The processing is described here for the case where the audio signal s(n) of the n-th frame is currently being input. When the determination in S100 shows that the audio signal s(n) of the n-th frame has been fully accumulated in the input/output buffer memory 120, the noise detection unit 140 immediately detects whether the audio signal s(n) contains noise (S102).

When noise is detected in the noise determination of S102, the interpolation processing (see FIG. 7) is executed immediately. That is, the interpolation signal generation unit 152 generates the interpolation signal v(n) using the audio signal s(n-1) of the (n-1)th frame (one frame earlier) stored in the interpolation buffer memory 130 (S104). The signal interpolation unit 154 then interpolates the noisy audio signal s(n) of the n-th frame using the interpolation signal v(n) generated in S104, and stores the interpolation signal v(n) in the input/output buffer memory 120 (S106). In the interpolation of S106, the noisy audio signal s(n) of the n-th frame may be replaced with the interpolation signal v(n), or the audio signal s(n) and the interpolation signal v(n) may be combined at a suitable mixing ratio. The replacement case is described below.

Next, the signal interpolation unit 154 copies the noise-reduced interpolation signal v(n) (corresponding to the n-th frame) stored in the input/output buffer memory 120 to the interpolation buffer memory 130 (S108), and outputs the interpolation signal v(n) to the signal output unit 160 (S110).

When, on the other hand, no noise is detected in the noise determination of S102, the interpolation processing of S104 and S106 is skipped and the input audio signal s(n) of the n-th frame is output as is. That is, the signal interpolation unit 154 copies the audio signal s(n) of the n-th frame stored in the input/output buffer memory 120 to the interpolation buffer memory 130 (S108), and outputs the audio signal s(n) as is from the input/output buffer memory 120 to the signal output unit 160 (S110).

Thereafter, the processing of S100 to S110 is repeated for the audio signal s(n+1) of the next frame of the input audio signal until the imaging and recording operation of the digital camera 1 ends (S112). In this way, noise detection is performed on the input audio signal frame by frame, interpolation (noise reduction) is applied where necessary, and a noise-free audio signal is output frame by frame.
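
The flow of FIG. 8 (S100 to S112) can be sketched as a loop over frames. This is a minimal illustration in Python, not the implementation of the apparatus. The noise detector and the interpolation generator are passed in as callables because their internals are outside this passage; passing a noisy first frame through unchanged when no earlier frame exists is an assumption of this sketch, and all names are hypothetical.

```python
def process_stream(frames, is_noisy, make_interpolation):
    """Frame-by-frame noise reduction with two one-frame buffers."""
    interpolation_buffer = None   # buffer 130: frame n-1 (or the previous v)
    outputs = []
    for frame in frames:
        io_buffer = list(frame)                                   # S100
        # S102: detect noise; with no earlier frame, pass through (assumption).
        if interpolation_buffer is not None and is_noisy(io_buffer):
            # S104, S106: generate v(n) from buffer 130 and replace s(n).
            io_buffer = make_interpolation(interpolation_buffer)
        interpolation_buffer = list(io_buffer)                    # S108
        outputs.append(io_buffer)                                 # S110
    return outputs                                                # S112


def peak_at_least_nine(frame):
    """Stand-in detector for the example only: flags large peaks."""
    return max(abs(x) for x in frame) >= 9


def time_reversed(frame):
    """Simple generation method: reverse the frame in time."""
    return frame[::-1]


result = process_stream([[1, 2, 3], [9, 9, 9], [9, 0, 0]],
                        peak_at_least_nine, time_reversed)
```

Because the output of S110 is always what is copied in S108, a second consecutive noisy frame is interpolated from the previous interpolation signal v(n) rather than from a raw input frame, as described for FIG. 7C.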

[1.5. Effects]
The configuration of the audio signal processing apparatus 100 according to the first embodiment of the present disclosure, and the audio signal processing method using it, have been described above. According to the present embodiment, as soon as noise is detected in the audio signal s(n) of the frame input from the microphone 51 and being accumulated in the input/output buffer memory 120, the interpolation signal v(n) is generated using only the audio signal s(n-1) of one frame earlier, which has been stored in the interpolation buffer memory 130 beforehand. The audio signal s(n) of the noise section is then interpolated using the interpolation signal v(n), and the interpolated audio signal is output.

Because the buffer memory used for audio signal input and output is thus also put to use for interpolation, the buffer memory length needed to estimate the interpolation signal can be shortened and the buffer memory required for the whole apparatus can be reduced. That is, the input/output buffer memory 120 and the interpolation buffer memory 130 each need a memory length of only N, the number of samples in one frame, so the buffer memory length required for the whole apparatus is only 2*N. The conventional interpolation method described above (see FIG. 1) interpolates using the signals before and after the noise section and therefore requires a buffer memory length of at least 3*N. In the present embodiment, by contrast, a buffer memory length of 2*N suffices, so the buffer memory needed for interpolation can be greatly reduced.

As noted above, when aperiodic sound in which various sounds are mixed is present before and after the noise section, there is no need to match the periodicity across the interpolated noise section, and unnatural-sounding artifacts are unlikely to occur. Consequently, even when interpolation uses only the audio signal of the frame preceding the noise section, suitable noise removal is achievable in practice.

Furthermore, according to the present embodiment, making effective use of the two buffer memories and suitably controlling the frame-by-frame processing of the audio signal achieves high-quality noise reduction with little delay. In the conventional interpolation method described above (see FIG. 1), a delay of one frame arises while the signal of the frame following the noise section is accumulated in the buffer memory, and a further delay of one frame arises in generating the interpolation signal afterwards, so a delay of at least 2*N (two frames) occurs.

In the interpolation processing according to the present embodiment, by contrast, the interpolation signal v(n) is generated using only the audio signal s(n-1) of the (n-1)th frame before the noise section, without using the audio signal s(n+1) of the (n+1)th frame after the noise section. The interpolation can therefore be executed, and the interpolated signal output, as soon as the audio signal s(n) of the n-th frame (the noise section) has been fully accumulated; unlike the conventional interpolation method, there is no need to hold back the interpolation until the signal following the noise section has been accumulated. The delay of the output audio relative to the input audio can thus be made zero, which greatly reduces the output delay caused by interpolation compared with the conventional method.
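
The buffer and delay figures stated above can be put side by side with a small calculation. This is a minimal sketch in Python; the frame length N = 1024 is an arbitrary illustrative value, the function name is hypothetical, and the conventional figures are the lower bounds ("at least") given in the text.

```python
def buffer_and_delay(frame_length, uses_following_frame):
    """Return (buffer length, added delay), both in samples.

    uses_following_frame=True  -> conventional method of FIG. 1
                                  (at least 3*N buffer, at least 2*N delay)
    uses_following_frame=False -> first embodiment (2*N buffer, zero delay)
    """
    if uses_following_frame:
        return 3 * frame_length, 2 * frame_length
    return 2 * frame_length, 0


N = 1024  # illustrative frame length, not taken from the disclosure
first_embodiment = buffer_and_delay(N, uses_following_frame=False)
conventional = buffer_and_delay(N, uses_following_frame=True)
saved_samples = conventional[0] - first_embodiment[0]   # N samples of buffer
```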

<2. Second Embodiment>
Next, an audio signal processing apparatus and an audio signal processing method according to a second embodiment of the present disclosure are described. The audio signal processing apparatus according to the second embodiment is characterized in that it generates the interpolation signal using the signals both before and after the noise section and performs interpolation with it. The other functional configuration of the second embodiment is substantially the same as that of the first embodiment, so its detailed description is omitted.

[2.1. Outline of the mechanical sound reduction method]
First, the mechanical sound reduction method according to the second embodiment is outlined. In the first embodiment described above, the interpolation signal is generated using only the audio signal of the section preceding the noise section (the (n-1)th frame). In the second embodiment, by contrast, the interpolation signal is generated, and interpolation performed, using not only the audio signal of the section preceding the noise section (the (n-1)th frame) but also the audio signal of the section following it (the (n+1)th frame).

Specifically, when noise is detected in the audio signal of the n-th frame, a first temporary interpolation signal (front temporary interpolation signal) is generated from the audio signal of the (n-1)th frame, and a second temporary interpolation signal (rear temporary interpolation signal) is generated from the audio signal of the (n+1)th frame. The first and second temporary interpolation signals are then combined to generate the interpolation signal, which is used to interpolate the audio signal of the n-th frame, that is, the noise section.
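
One possible way to combine the two temporary interpolation signals is sketched below. This passage does not state how the signals are combined, so two points are assumptions of this sketch: each temporary signal is taken to be the time-reversed neighbouring frame (the simple method of the first embodiment), and the combination is taken to be a linear crossfade from the front signal to the rear signal. All function names are hypothetical.

```python
def temporary_interpolation(frame):
    """Assumed temporary signal: the neighbouring frame reversed in time."""
    return frame[::-1]


def combine_linear(front, rear):
    """Assumed combination: fade linearly from front (start) to rear (end)."""
    if len(front) != len(rear):
        raise ValueError("both signals must have the same length")
    n = len(front)
    if n == 1:
        return [0.5 * (front[0] + rear[0])]
    combined = []
    for i, (a, b) in enumerate(zip(front, rear)):
        g = i / (n - 1)                      # 0.0 at frame start, 1.0 at end
        combined.append((1.0 - g) * a + g * b)
    return combined


def two_sided_interpolation(prev_frame, next_frame):
    """Interpolation signal for frame n from frames n-1 and n+1."""
    front = temporary_interpolation(prev_frame)   # first temporary signal
    rear = temporary_interpolation(next_frame)    # second temporary signal
    return combine_linear(front, rear)


v = two_sided_interpolation([1.0, 1.0, 1.0], [3.0, 3.0, 3.0])
```

Under these assumptions the interpolation signal leans on the preceding frame near the start of the noise section and on the following frame near its end.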

Compared with the first embodiment, this interpolation processing delays the output audio by one frame relative to the input audio, but generating the interpolation signal from the signals on both sides of the noise section allows it to be estimated with high accuracy. Higher-quality noise reduction can therefore be achieved. In addition, because the two buffer memories are each put to suitable use and the interpolation signal is generated efficiently, the delay of the output audio relative to the input audio is kept to a minimum, namely one frame. The audio signal processing apparatus and method according to the second embodiment are described in detail below.

[2.2. Functional configuration of the audio signal processing apparatus]
Next, the functional configuration of the audio signal processing apparatus 100 according to the second embodiment is described with reference to FIG. 9. FIG. 9 is a block diagram showing the functional configuration of the audio signal processing apparatus 100 according to the second embodiment.

As shown in FIG. 9, the audio signal processing apparatus 100 includes a signal input unit 110, an input buffer memory 122 (first buffer memory), an output buffer memory 132 (second buffer memory), a noise detection unit 140, a noise reduction unit 150, and a signal output unit 160. The input buffer memory 122, the output buffer memory 132, the noise detection unit 140, and the noise reduction unit 150 constitute the audio signal processing unit 60 of FIG. 2. The signal input unit 110, the noise detection unit 140, and the signal output unit 160 according to the second embodiment have substantially the same functional configuration as in the first embodiment, so their detailed description is omitted.

The audio signal processing apparatus 100 according to the second embodiment has two buffer memories, the input buffer memory 122 and the output buffer memory 132. These buffer memories function as signal holding units that temporarily store the audio signal input from the microphone and the generated interpolation signal. In the second embodiment, the input buffer memory 122 and the output buffer memory 132 are connected in series between the signal input unit 110 and the signal output unit 160.

So that the audio signal processing apparatus 100 can input, output, and process the audio signal frame by frame, the input buffer memory 122 temporarily stores one frame of the audio signal currently being input, and the output buffer memory 132 temporarily stores one frame of the audio signal input in the past. The two buffer memories have the same memory length; for example, each can store one frame of the digital audio signal (N samples). The total length of the buffer memory in the audio signal processing apparatus 100 is therefore 2*N. The input buffer memory 122 and the output buffer memory 132 may be two physically separate buffer memories, or may be formed by partitioning the storage area of a single physical buffer memory.

The input buffer memory 122 temporarily stores the audio signal input from the signal input unit 110 in units of a predetermined section (for example, one frame at a time). Once it has stored a complete frame of the input audio signal, the input buffer memory 122 outputs that frame of the audio signal.

The one-frame audio signal output from the input buffer memory 122 is temporarily stored in the output buffer memory 132. That is, the output buffer memory 132 temporarily holds the audio signal of the past frame (the (n-1)th frame) immediately preceding the audio signal of the current frame (the n-th frame) held in the input buffer memory 122. Accordingly, while the audio signal of the n-th frame input from the signal input unit 110 is being accumulated in the input buffer memory 122, the output buffer memory 132 holds the audio signal of the (n-1)th frame. Once it has stored a complete frame of the audio signal received from the input buffer memory 122, the output buffer memory 132 outputs that frame to the signal output unit 160.

In this way, the audio signal input from the signal input unit 110 is temporarily stored, one frame at a time and in sequence, in the input buffer memory 122 and then in the output buffer memory 132, before being output to the signal output unit 160. With these two buffer memories, two frames of the audio signal are held in the audio signal processing apparatus 100 at all times.

Next, the noise reduction unit 150 according to the second embodiment is described. The noise reduction unit 150 includes an interpolation signal generation unit 152, a signal interpolation unit 154, a first temporary interpolation signal generation unit 156, and a second temporary interpolation signal generation unit 157.

When the noise detection unit 140 detects noise in the audio signal of the n-th frame, the first temporary interpolation signal generation unit 156 generates the first temporary interpolation signal from the audio signal of the (n-1)th frame stored in the output buffer memory 132. The first temporary interpolation signal is a provisional interpolation signal generated from the input audio signal of the section preceding the noise section. In other words, immediately after the noise section (the n-th frame) has been stored in the input buffer memory 122, the first temporary interpolation signal generation unit 156 generates, from the audio signal of the preceding section (the (n-1)th frame), the first temporary interpolation signal for interpolating the noise section (the n-th frame).

Thereafter, when the audio signal of the (n+1)-th frame has been stored in the input buffer memory 122, the second temporary interpolation signal generation unit 157 generates a second temporary interpolation signal from the audio signal of the (n+1)-th frame stored in the input buffer memory 122. The second temporary interpolation signal is a provisional interpolation signal generated from the input audio signal in the section following the noise section. Thus, immediately after the noise section (the n-th frame) has been stored in the output buffer memory 132, the second temporary interpolation signal generation unit 157 generates, from the audio signal of the following section (the (n+1)-th frame), the second temporary interpolation signal for interpolating the noise section (the n-th frame).

The interpolation signal generation unit 152 then generates an interpolation signal from the first and second temporary interpolation signals. Using this interpolation signal, the signal interpolation unit 154 interpolates the audio signal of the n-th frame (the audio signal of the noise section) held in the output buffer memory 132.

For example, the signal interpolation unit 154 may perform the interpolation by setting all amplitude values of the noise-section audio signal stored in the output buffer memory 132 (that is, its N samples) to zero and then overwriting them with the interpolation signal as is. With this processing, the audio signal of the n-th section containing the noise is replaced by the interpolation signal and output. Alternatively, the signal interpolation unit 154 may perform the interpolation by combining the noise-section audio signal stored in the output buffer memory 132 with the interpolation signal at a suitable mixing ratio. In either case, an audio signal interpolated with the interpolation signal is output in place of the input noise-section audio signal, so the noise contained in that section can be reduced or removed.
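The two interpolation options just described (outright replacement, or mixing at a chosen ratio) can be sketched as follows. The function name `interpolate_frame` and the parameter `mix` are hypothetical; the text does not specify how the mixing ratio is chosen, so a single scalar weight is assumed here.

```python
def interpolate_frame(noisy, interp, mix=1.0):
    """Blend a noisy frame with an interpolation signal.

    mix=1.0 reproduces outright replacement; 0 < mix < 1 keeps part of the
    original frame (the mixing rule is an assumption of this sketch).
    """
    if len(noisy) != len(interp):
        raise ValueError("frames must have the same length")
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must lie in [0, 1]")
    return [(1.0 - mix) * s + mix * v for s, v in zip(noisy, interp)]


replaced = interpolate_frame([9.0, -9.0], [1.0, 2.0])           # full replacement
blended = interpolate_frame([9.0, -9.0], [1.0, 2.0], mix=0.5)   # equal mix
```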

Examples of how the temporary interpolation signals and the interpolation signal are generated are described below.

(a) Simple generation method
When the n-th frame is a noise section, the first temporary interpolation signal p(n) is generated from the input audio signal s(n−1) of the (n−1)-th frame, and the second temporary interpolation signal q(n) is generated from the input audio signal s(n+1) of the (n+1)-th frame, for example in the same manner as the interpolation signal generation method shown in FIG. 4 or FIG. 5. Then, as shown in the following equation (1), the first temporary interpolation signal p(n) and the second temporary interpolation signal q(n) are mixed using a predetermined mixing coefficient α (0 < α < 1) to generate the interpolation signal v(n).

v(n) = α·p(n) + (1 − α)·q(n)   ... (1)

For example, setting α = 0.5 mixes the first temporary interpolation signal p(n) and the second temporary interpolation signal q(n) equally to generate the interpolation signal v(n). To weight p(n) or q(n) more heavily, the value of α is adjusted accordingly. With this generation method, an interpolation signal v(n) with high interpolation accuracy can be generated from the audio signals in the sections before and after the noise section.
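Equation (1) can be illustrated with a minimal sketch. The function names are hypothetical. Time reversal of the neighboring frame is used to produce p(n) and q(n); the text gives time reversal as the example for p(n) in FIG. 12B, and applying the same rule to q(n) is an assumption made here.

```python
def temporary_signal(neighbor_frame):
    """Time-reverse a neighboring frame (assumed rule for both p(n) and q(n))."""
    return neighbor_frame[::-1]


def mix_temporary_signals(p, q, alpha=0.5):
    """Equation (1): v(n) = alpha * p(n) + (1 - alpha) * q(n), with 0 < alpha < 1."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(p, q)]


p = temporary_signal([1.0, 3.0])   # from s(n-1)
q = temporary_signal([5.0, 7.0])   # from s(n+1)
v_equal = mix_temporary_signals(p, q)                # alpha = 0.5
v_biased = mix_temporary_signals(p, q, alpha=0.25)   # weights q(n) more heavily
```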

(b) Generation method using windows
FIG. 10 is a conceptual diagram showing another method, according to the present embodiment, of generating the temporary interpolation signals and the interpolation signal from the input audio signals before and after the noise section.

In the same manner as the interpolation signal generation method shown in FIG. 4 or FIG. 5, the first temporary interpolation signal p(n) is generated from the input audio signal s(n−1) of the (n−1)-th frame, and the second temporary interpolation signal q(n) is generated from the input audio signal s(n+1) of the (n+1)-th frame. Then, as shown in FIG. 10, the first temporary interpolation signal p(n) and the second temporary interpolation signal q(n) are mixed using arbitrary windows w_1(n) and w_2(n), such as Hanning windows or Bartlett windows. Specifically, the temporary interpolation signals p(n) and q(n) are first multiplied by the windows w_1(n) and w_2(n), respectively, to generate the signals t(n) and u(n). The signals t(n) and u(n) are then combined to generate the interpolation signal v(n). For example, the signals may be added to give v(n) = t(n) + u(n), or one may be subtracted from the other to give v(n) = t(n) − u(n). With this method, a more natural interpolation signal v(n) can be generated from the temporary interpolation signals p(n) and q(n).
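A minimal sketch of the window-based mixing follows. The text leaves the windows arbitrary, so the choice here is an assumption: w_1 fades out and w_2 fades in, either linearly (Bartlett-shaped) or as a raised cosine (Hann-shaped), so that the result starts near p(n) and ends near q(n). The function names are hypothetical.

```python
import math


def crossfade_windows(n, shape="bartlett"):
    """Return (w1, w2): w1 fades out, w2 fades in; they sum to 1 up to rounding."""
    if n < 2:
        raise ValueError("need at least two samples")
    ramp = [i / (n - 1) for i in range(n)]
    if shape == "bartlett":
        w2 = ramp
    elif shape == "hann":
        w2 = [0.5 * (1.0 - math.cos(math.pi * r)) for r in ramp]
    else:
        raise ValueError("unknown window shape")
    w1 = [1.0 - w for w in w2]
    return w1, w2


def windowed_interpolation(p, q, shape="bartlett"):
    """t = p * w1, u = q * w2, v = t + u (the additive variant)."""
    w1, w2 = crossfade_windows(len(p), shape)
    t = [a * w for a, w in zip(p, w1)]
    u = [b * w for b, w in zip(q, w2)]
    return [x + y for x, y in zip(t, u)]


v = windowed_interpolation([1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
```

Complementary windows keep the overall level roughly constant across the frame; other window pairs are equally compatible with the description above.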

[2.3. Operation of the audio signal processing apparatus]
Next, the operation of the audio signal processing apparatus 100 according to the second embodiment will be described. An example of normal operation without noise and an example of operation when noise occurs are described in turn below.

[2.3.1. Example of normal operation without noise]
First, the normal operation of the audio signal processing apparatus 100 when there is no noise will be described with reference to FIG. 11. FIG. 11 is a schematic diagram showing the normal operation of the audio signal processing apparatus 100 according to the second embodiment.

As shown in FIG. 11, during normal operation with no noise, the audio signal input from the microphone 51 is temporarily stored, frame by frame, first in the input buffer memory 122 and then in the output buffer memory 132. The frame held in the output buffer memory 132 is the frame immediately preceding the one currently being accumulated in the input buffer memory 122. For example, as shown in FIG. 11A, while the audio signal of the n-th frame is being newly input and accumulated in the input buffer memory 122, the audio signal s(n−1) of the (n−1)-th frame, input one frame earlier, is held in the output buffer memory 132.

As soon as the entire audio signal s(n) of the n-th frame has been accumulated in the input buffer memory 122, the audio signal s(n−1) of the (n−1)-th frame held in the output buffer memory 132 is output to the outside, as shown in FIG. 11B. Because no noise has been detected, the audio signal s(n−1) of the (n−1)-th frame is output unchanged. Along with this output, the audio signal s(n) of the n-th frame held in the input buffer memory 122 is copied to the output buffer memory 132, and the data in the input buffer memory 122 is erased. This is so that, if noise is detected in the audio signal s(n+1) of the (n+1)-th frame input next, the interpolation signal v(n+1) for the (n+1)-th frame can be generated from the audio signal s(n) of the n-th frame in the output buffer memory 132.

Thereafter, the audio signal s(n+1) of the (n+1)-th frame is newly input, and as soon as all of it has been accumulated in the input buffer memory 122, the audio signal s(n) of the n-th frame is output from the output buffer memory 132. The output audio is therefore delayed by one frame relative to the input audio (a delay of N samples).

[2.3.2. Example of operation when noise occurs]
Next, the operation of the audio signal processing apparatus 100 when noise occurs will be described with reference to FIGS. 12 and 13. FIGS. 12 and 13 are schematic diagrams showing an example of the operation of the audio signal processing apparatus 100 according to the present embodiment when noise occurs.

As shown in FIG. 12, even when the input audio signal contains noise (for example, a pulse-like mechanical sound), the audio signal input from the microphone 51 is temporarily stored, frame by frame, first in the input buffer memory 122 and then in the output buffer memory 132. As shown in FIG. 12A, while the noise-containing audio signal s(n) of the n-th frame is being newly input and accumulated in the input buffer memory 122, the audio signal s(n−1) of the (n−1)-th frame, one frame earlier, is temporarily held in the output buffer memory 132.

When the entire audio signal s(n) of the n-th frame has been accumulated in the input buffer memory 122 and noise has been detected in it, the first temporary interpolation signal generation process shown in FIG. 12B is executed immediately. That is, as shown in FIG. 12B, the first temporary interpolation signal generation unit 156 generates, from the audio signal s(n−1) of the (n−1)-th frame held in the output buffer memory 132, the first temporary interpolation signal p(n) for interpolating the audio signal s(n) of the noise section (the n-th frame). In the example of FIG. 12B, the first temporary interpolation signal p(n) is generated by reversing the audio signal s(n−1) of the (n−1)-th frame along the time axis. The first temporary interpolation signal generation unit 156 then deletes the audio signal s(n) of the n-th frame held in the input buffer memory 122 and stores the first temporary interpolation signal p(n) in the input buffer memory 122 in its place.

Next, as shown in FIG. 12C, the signal interpolation unit 154 outputs the audio signal s(n−1) of the (n−1)-th frame held in the output buffer memory 132 to the outside. Along with this output, the signal interpolation unit 154 moves the first temporary interpolation signal p(n) held in the input buffer memory 122 to the output buffer memory 132. The first temporary interpolation signal p(n) is kept in the output buffer memory 132 so that, when the audio signal s(n+1) of the (n+1)-th frame is input next, the interpolation signal v(n) can be generated using p(n).

Next, as shown in FIG. 13A, while the audio signal s(n+1) of the next, (n+1)-th frame is being newly input and accumulated in the input buffer memory 122, the first temporary interpolation signal p(n) is temporarily held in the output buffer memory 132.

When the entire audio signal s(n+1) of the (n+1)-th frame has been accumulated in the input buffer memory 122, the second temporary interpolation signal generation process and the interpolation signal generation process shown in FIG. 13B are executed immediately. That is, as shown in FIG. 13B, the second temporary interpolation signal generation unit 157 generates, from the audio signal s(n+1) of the (n+1)-th frame held in the input buffer memory 122, the second temporary interpolation signal q(n) for interpolating the audio signal s(n) of the noise section (the n-th frame). The interpolation signal generation unit 152 then combines the generated second temporary interpolation signal q(n) with the first temporary interpolation signal p(n) held in the output buffer memory 132 to generate the interpolation signal v(n).

Next, as shown in FIG. 13C, immediately after the interpolation signal v(n) has been generated, the signal interpolation unit 154 outputs the interpolation signal v(n) to the outside in place of the audio signal s(n) of the n-th frame that was actually input in FIG. 12A. Along with this output, the signal interpolation unit 154 moves the audio signal s(n+1) of the (n+1)-th frame held in the input buffer memory 122 to the output buffer memory 132. This allows the audio signal s(n+1) to be output from the output buffer memory 132 once the audio signal s(n+2) of the (n+2)-th frame, input next, has been fully accumulated in the input buffer memory 122. In addition, if noise is detected in the audio signal s(n+2) of the (n+2)-th frame, the first temporary interpolation signal p(n+2) for the (n+2)-th frame can be generated from the audio signal s(n+1) in the output buffer memory 132.

As described above, according to the present embodiment, when the audio signal s(n) of the n-th frame contains noise, the interpolation signal v(n) is generated from the audio signals s(n−1) and s(n+1) of the (n−1)-th and (n+1)-th frames, and interpolation is performed to reduce the noise. Through this interpolation, the noise-free interpolation signal v(n) is output in place of the noise-containing input audio signal s(n) of the n-th frame, so the noise can be suitably removed. Moreover, because the audio signals on both sides of the noise section are used, the interpolation is more natural and more accurate, and high-quality noise reduction can be achieved.

When the audio signal is input, output, and interpolated frame by frame as described above, the input buffer memory 122 and the output buffer memory 132 each need a memory length of only N, the number of samples in one frame. Therefore, as in the first embodiment, the total buffer memory length required by the apparatus is only 2*N. Also, as soon as the audio signal s(n+1) of the next frame has been fully accumulated in the input buffer memory 122, the audio signal s(n) of the preceding frame is output to the outside, so the delay of the output audio relative to the input audio is only one frame.

[2.4. Audio signal processing method]
Next, an audio signal processing method (mechanical sound reduction method) using the audio signal processing apparatus 100 described above will be described with reference to FIG. 14. FIG. 14 is a flowchart showing the audio signal processing method according to the second embodiment.

As shown in FIG. 14, the audio signal processing apparatus 100 first determines whether one frame of the audio signal input from the microphone 51 has been accumulated in the input buffer memory 122 (S200). The processing is described here for the case in which the audio signal s(n) of the n-th frame is currently being input. As soon as S200 determines that the entire audio signal s(n) of the n-th frame has been accumulated in the input buffer memory 122, the noise detection unit 140 detects whether the audio signal s(n) contains noise (S202).

If noise is detected in S202, the first temporary interpolation signal generation process shown in FIG. 12 is executed immediately. That is, the first temporary interpolation signal generation unit 156 generates the first temporary interpolation signal p(n) using the audio signal s(n−1) of the (n−1)-th frame (one frame earlier) held in the output buffer memory 132 (S204). The first temporary interpolation signal generation unit 156 then outputs the audio signal s(n−1) of the (n−1)-th frame unchanged from the output buffer memory 132 to the signal output unit 160 and stores the first temporary interpolation signal p(n) in the output buffer memory 132 (S206).

Next, the newly input audio signal s(n+1) of the (n+1)-th frame is accumulated in the input buffer memory 122, and it is determined whether all of it has been accumulated (S210). As soon as the entire audio signal s(n+1) of the (n+1)-th frame has been accumulated in the input buffer memory 122, the second temporary interpolation signal generation process and the interpolation process shown in FIG. 13 are executed.

That is, the second temporary interpolation signal generation unit 157 generates the second temporary interpolation signal q(n) using the audio signal s(n+1) of the (n+1)-th frame held in the input buffer memory 122 (S214). The interpolation signal generation unit 152 then generates the interpolation signal v(n) from the first temporary interpolation signal p(n) held in the output buffer memory 132 and the second temporary interpolation signal q(n) generated in S214 (S216). Further, the signal interpolation unit 154 interpolates the noise-containing audio signal s(n) of the n-th frame using the interpolation signal v(n) generated in S216, and stores the interpolated signal in the output buffer memory 132 (S218). In the interpolation of S218, the noise-containing audio signal s(n) of the n-th frame may be replaced by the interpolation signal v(n), or the audio signal s(n) and the interpolation signal v(n) may be combined at a suitable mixing ratio. The replacement case is described below.

The signal interpolation unit 154 then outputs to the signal output unit 160, in place of the audio signal s(n) of the n-th frame, the interpolation signal v(n) stored in the output buffer memory 132 in S218 (S220). The audio signal s(n+1) of the (n+1)-th frame held in the input buffer memory 122 is then moved to the output buffer memory 132.

On the other hand, if no noise is detected in the audio signal s(n) of the n-th frame in S202, the interpolation described above is not performed and normal input/output processing is carried out. That is, as shown in FIG. 11, the audio signal s(n−1) of the (n−1)-th frame is output unchanged from the output buffer memory 132 to the signal output unit 160, and the audio signal s(n) of the n-th frame held in the input buffer memory 122 is moved to the output buffer memory 132 (S208). Then, when the entire audio signal s(n+1) of the next, (n+1)-th frame has been accumulated in the input buffer memory 122 (S210), the audio signal s(n) of the n-th frame is output unchanged from the output buffer memory 132 to the signal output unit 160 (S220), and the audio signal s(n+1) of the (n+1)-th frame held in the input buffer memory 122 is moved to the output buffer memory 132.

Thereafter, the processing of S200 to S220 is repeated for the audio signal s(n+2) of the next frame of the input audio signal until the imaging and recording operation of the digital camera 1 ends (S222). In this way, noise detection is performed on the input audio signal frame by frame, interpolation (noise reduction) is applied where needed, and a noise-free audio signal is output frame by frame.
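The flow of S200 to S220 can be traced with a frame-level sketch. It rests on several assumptions that are not in the text: the function name `process_stream` is hypothetical, a caller-supplied predicate `is_noisy` stands in for the noise detection unit 140, both temporary signals are produced by time reversal, equation (1) is used for mixing, and noisy frames are assumed to be isolated (the frame following a noisy one is not checked for noise).

```python
def process_stream(frames, is_noisy, alpha=0.5):
    """Return the frames emitted while `frames` are fed in one at a time."""
    emitted = []
    output_buf = None    # stands in for output buffer memory 132
    holds_p = False      # True while output_buf holds p(n) instead of audio
    for input_buf in frames:                  # S200/S210: a full frame is in buffer 122
        if holds_p:
            q = input_buf[::-1]               # S214: q(n) from s(n+1)
            v = [alpha * a + (1.0 - alpha) * b
                 for a, b in zip(output_buf, q)]          # S216: mix p(n) and q(n)
            emitted.append(v)                 # S218-S220: v(n) replaces s(n)
            output_buf, holds_p = list(input_buf), False
        elif output_buf is not None and is_noisy(input_buf):   # S202
            p = output_buf[::-1]              # S204: p(n) from s(n-1)
            emitted.append(output_buf)        # S206: s(n-1) is output unchanged
            output_buf, holds_p = p, True     # p(n) waits in buffer 132
        else:
            if output_buf is not None:
                emitted.append(output_buf)    # S208/S220: normal pass-through
            output_buf = list(input_buf)
    return emitted


frames = [[1.0, 2.0], [100.0, 100.0], [3.0, 4.0], [5.0, 6.0]]
out = process_stream(frames, is_noisy=lambda f: max(abs(x) for x in f) > 50.0)
```

In this trace the second frame is flagged as noisy and is replaced by a mix of the time-reversed first and third frames. The last input frame remains in the output buffer, reflecting the one-frame delay.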

[2.5. Effects]
The configuration of the audio signal processing apparatus 100 according to the second embodiment of the present disclosure and the audio signal processing method using it have been described above. According to the second embodiment, generating the interpolation signal from the audio signals s(n−1) and s(n+1) before and after the noise section makes it possible to estimate, with high accuracy, an interpolation signal that properly represents the background sound in the noise section (the external sound with the noise removed). The interpolation accuracy is thus increased, and the background sound can be reproduced accurately while the noise is reduced, so the accuracy of the noise reduction processing can be greatly improved.

As in the first embodiment, the buffer memories used for signal input and output are also put to use in the interpolation processing, which reduces the buffer memory length needed to estimate the interpolation signal and the buffer memory required by the apparatus as a whole. In the second embodiment as well, the buffer memory length needed to estimate the interpolation signal is only 2*N, whereas the conventional interpolation method (see FIG. 1) requires a buffer memory length of at least 3*N, so the buffer memory needed for interpolation can be greatly reduced.

Furthermore, according to the present embodiment, the two buffer memories are used effectively to control the frame-by-frame processing of the audio signal, so that high-quality noise reduction with little delay can be achieved. In the conventional interpolation method (see FIG. 1), interpolation using the signals before and after the noise section incurs a delay of at least 2*N (two frames), as described above. In the present embodiment, by contrast, although the interpolation signal v(n) is generated from the audio signals s(n−1) and s(n+1) before and after the noise section, v(n) can be generated and output as soon as the audio signal s(n+1) has been fully accumulated in the input buffer memory 122. The delay of the output audio relative to the input audio is thereby held to one frame (delay: N), which halves the delay caused by interpolation compared with the conventional method.
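The memory and delay figures above can be put into concrete numbers with a small calculation. The function name `buffer_budget` and the example values N = 1024 and a 48 kHz sampling rate are assumptions chosen for illustration; the text does not fix a sampling rate.

```python
def buffer_budget(n, sample_rate_hz):
    """Compare the 2*N / N figures of this embodiment with the 3*N / 2*N conventional ones."""
    ms_per_sample = 1000.0 / sample_rate_hz
    return {
        "proposed_memory_samples": 2 * n,
        "conventional_memory_samples": 3 * n,
        "proposed_delay_samples": n,
        "conventional_delay_samples": 2 * n,
        "proposed_delay_ms": n * ms_per_sample,
        "conventional_delay_ms": 2 * n * ms_per_sample,
    }


budget = buffer_budget(n=1024, sample_rate_hz=48000)
```

Under these assumed values the one-frame delay is about 21.3 ms, against about 42.7 ms for a two-frame delay.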

<3. Third Embodiment>
Next, an audio signal processing apparatus and an audio signal processing method according to the third embodiment of the present disclosure will be described. The audio signal processing apparatus according to the third embodiment detects the start point and end point of the noise, generates an interpolation signal using the signals before and after the noise, and interpolates the signal from the noise start point to the noise end point. The other functional configuration of the third embodiment is substantially the same as that of the second embodiment, and its detailed description is omitted.

[3.1. Outline of the mechanical sound reduction method]
First, an outline of the mechanical sound reduction method according to the third embodiment will be described.

In the first and second embodiments described above, interpolation is performed frame by frame on the assumption that noise such as a pulse-like mechanical sound fits within one frame of the audio signal, as shown in FIG. 7 and elsewhere. In practice, however, a single noise does not always fit within one frame, and it may extend across two frames as shown in FIG. 16. In such a case, it is difficult to reduce the noise adequately with the interpolation methods of the first and second embodiments.

In the third embodiment, therefore, a noise reference point detection unit detects noise reference points, so that even when the noise extends across two frames, the noise can be reduced effectively using the signals before and after it. The noise reference points indicate the position of the noise in the audio signal. As shown in FIG. 16, there are three of them: the noise start point P_S, the noise midpoint P_M, and the noise end point P_E. Detecting these reference points allows interpolation to be performed over an arbitrary section of the audio signal, not only in units of frames.

The relationship between the frame positions of the audio signal and the position of the noise is now described in more detail. If only audio signal processing is considered, the frame positions, that is, the number of samples N in one frame, can be chosen freely. Because an FFT (Fast Fourier Transform) is commonly used to convert the audio signal to the frequency domain, powers of two such as 256, 512, and 1024 are widely used for N. This does not apply when no frequency conversion is performed.

一方、デジタルカメラ、ビデオカメラ等においては、カメラ内部のシステム制御クロックや映像信号（動画）に対して音声信号の同期をとる必要があるため、音声信号処理のフレームのサンプルデータ数Ｎを自由に決定することは難しい。ここで、フレームを長くとる（即ち、Ｎを大きくする）と、カメラシステムの遅延増加につながるため、現実的には、サンプルデータ数Ｎを１００〜２０００程度とすることが多い。 In digital cameras, video cameras, and the like, on the other hand, the audio signal must be synchronized with the camera's internal system control clock and with the video signal (moving image), so it is difficult to choose the number N of sample data per audio processing frame freely. Because a longer frame (that is, a larger N) increases the delay of the camera system, N is in practice often set to about 100 to 2000.
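The link between the frame length N and delay can be made concrete with a small calculation. The sketch below is only an illustration: the 48 kHz sampling rate is an assumption (the text does not state one), and `frame_duration_ms` is a hypothetical helper name.

```python
def frame_duration_ms(n_samples: int, sample_rate_hz: int) -> float:
    """Time needed to accumulate one frame of n_samples, in milliseconds."""
    if n_samples <= 0 or sample_rate_hz <= 0:
        raise ValueError("n_samples and sample_rate_hz must be positive")
    return 1000.0 * n_samples / sample_rate_hz


# Assumed sampling rate; the document does not specify one.
ASSUMED_SAMPLE_RATE_HZ = 48_000

for n in (100, 256, 512, 1024, 2000):
    print(n, round(frame_duration_ms(n, ASSUMED_SAMPLE_RATE_HZ), 2), "ms")
```

Under this assumption, one frame corresponds to roughly 2 ms to 42 ms of audio across the stated range of N, which is the amount by which each additional buffered frame delays the output.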

上記の理由により、パルス機械音の時間長（全体の時間幅）に合わせて、音声信号のフレームのサンプルデータ数Ｎを任意に決定することは現実には困難である。 For the above reason, it is actually difficult to arbitrarily determine the number N of sample data of the frame of the audio signal in accordance with the time length (total time width) of the pulse mechanical sound.

ところで、一般に、パルス機械音は他の雑音と比べて時間長が短いことを特徴としている。このため、パルス機械音の時間長は、音声信号のフレームのサンプルデータ数Ｎと同程度、またはそれよりも短いとみなしても問題ない。従って、パルス機械音全体が１フレーム中に収まれば（図７等参照。）、第１、第２の実施形態のような補間処理を問題なく行うことができる。 By the way, in general, a pulse mechanical sound is characterized by a shorter time length than other noises. For this reason, there is no problem even if the time length of the pulse mechanical sound is regarded as being equal to or shorter than the number N of sample data of the frame of the audio signal. Therefore, if the entire pulse mechanical sound is within one frame (see FIG. 7 and the like), the interpolation processing as in the first and second embodiments can be performed without any problem.

しかし、現実には、パルス機械音がフレームの境界からずれて存在し、２つのフレームに跨って存在することの方が多い（図１６参照。）。従って、音声信号に設定されたフレームの境界とは別に、パルス機械音の区切り（基準点）を検知した上で、その雑音の区切り位置の前後の信号を用いて、パルス機械音を補間処理することが好ましい。 In reality, however, the pulse mechanical sound is more often offset from the frame boundaries and spans two frames (see FIG. 16). It is therefore preferable to detect the delimiters (reference points) of the pulse mechanical sound separately from the frame boundaries set for the audio signal, and to interpolate the pulse mechanical sound using the signals before and after those delimiter positions.

そこで第３の実施形態では、雑音（例えばパルス機械音）を含む音声信号のフレームが入力されたときに、当該雑音の基準点（雑音開始点Ｐ_Ｓ、雑音中間点Ｐ_Ｍ及び雑音終了点Ｐ_Ｅ）を検出し、フレームとは無関係に雑音区間を特定する。そして、雑音開始点Ｐ_Ｓよりも前の信号から、前部補間信号（第１の補間信号）を生成し、当該前部補間信号を用いて雑音の前半部分（雑音開始点Ｐ_Ｓから雑音中間点Ｐ_Ｍまでの区間）を補間する。さらに、次のフレームが入力したときに、雑音終了点Ｐ_Ｅよりも後の信号から後部補間信号（第２の補間信号）を生成し、当該後部補間信号を用いて雑音の後半部分（雑音中間点Ｐ_Ｍから雑音終了点Ｐ_Ｅまでの区間）を補間する。 Therefore, in the third embodiment, when a frame of the audio signal containing noise (for example, a pulse mechanical sound) is input, the reference points of the noise (noise start point P_S, noise intermediate point P_M, and noise end point P_E) are detected, and the noise section is identified independently of the frames. A front interpolation signal (first interpolation signal) is then generated from the signal before the noise start point P_S and used to interpolate the first half of the noise (the section from P_S to P_M). Further, when the next frame is input, a rear interpolation signal (second interpolation signal) is generated from the signal after the noise end point P_E and used to interpolate the second half of the noise (the section from P_M to P_E).

かかる補間処理により、雑音が音声信号の複数フレームに跨って存在する場合であっても、フレーム境界は関わらずに、当該雑音の前後の任意の区間の音声信号を用いて補間処理を行うことができるので、当該雑音を適切に低減することができる。以下に、第３の実施形態に係る音声信号処理装置及び方法について詳述する。 By such interpolation processing, even when noise exists over a plurality of frames of the audio signal, the interpolation processing can be performed using the audio signal in an arbitrary section before and after the noise regardless of the frame boundary. Therefore, the noise can be appropriately reduced. The audio signal processing apparatus and method according to the third embodiment will be described in detail below.

［３．２．音声信号処理装置の機能構成］ [3.2. Functional configuration of audio signal processing apparatus]
次に、図１５を参照して、第３の実施形態に係る音声信号処理装置１００の機能構成について説明する。図１５は、第３の実施形態に係る音声信号処理装置１００の機能構成を示すブロック図である。 Next, the functional configuration of the audio signal processing apparatus 100 according to the third embodiment will be described with reference to FIG. 15. FIG. 15 is a block diagram illustrating the functional configuration of the audio signal processing apparatus 100 according to the third embodiment.

図１５に示すように、音声信号処理装置１００は、信号入力部１１０と、入力用バッファメモリ１２２（第１のバッファメモリ）と、出力用バッファメモリ１３２（第２のバッファメモリ）と、雑音検出部１４０と、雑音基準点検出部１４２と、雑音低減部１５０と、信号出力部１６０とを備える。また、上記入力用バッファメモリ１２２、出力用バッファメモリ１３２、雑音検出部１４０、雑音基準点検出部１４２及び雑音低減部１５０は、上記図２の音声信号処理部６０を構成する。なお、第３の実施形態に係る信号入力部１１０、入力用バッファメモリ１２２、出力用バッファメモリ１３２、雑音検出部１４０及び信号出力部１６０は、上記第２の実施形態の場合と実質的に同一の機能構成を有するので、詳細説明は省略する。 As shown in FIG. 15, the audio signal processing apparatus 100 includes a signal input unit 110, an input buffer memory 122 (first buffer memory), an output buffer memory 132 (second buffer memory), and noise detection. Unit 140, noise reference point detection unit 142, noise reduction unit 150, and signal output unit 160. The input buffer memory 122, output buffer memory 132, noise detection unit 140, noise reference point detection unit 142, and noise reduction unit 150 constitute the audio signal processing unit 60 of FIG. The signal input unit 110, the input buffer memory 122, the output buffer memory 132, the noise detection unit 140, and the signal output unit 160 according to the third embodiment are substantially the same as those in the second embodiment. Detailed description will be omitted.

第３の実施形態に係る音声信号処理装置１００は、雑音基準点検出部１４２を更に備えることを特徴としている。雑音基準点検出部１４２は、音声信号に含まれる雑音の信号特性に基づいて、音声信号に含まれる雑音（パルス機械音）の基準点（雑音開始点Ｐ_Ｓ、雑音中間点Ｐ_Ｍ及び雑音終了点Ｐ_Ｅ）を検出する。図１６に示すように、雑音開始点Ｐ_Ｓは、音声信号においてパルス機械音が開始する位置である。また、雑音中間点Ｐ_Ｍは、音声信号においてパルス機械音の中間の位置（例えばパルス成分の振幅が最大となる位置）である。さらに、雑音終了点Ｐ_Ｅは、音声信号においてパルス機械音が終了する位置である。雑音基準点検出部１４２によるこれら基準点の検出方法は、例えば以下の通りである。 The audio signal processing apparatus 100 according to the third embodiment is characterized by further including a noise reference point detection unit 142. Based on the signal characteristics of the noise contained in the audio signal, the noise reference point detection unit 142 detects the reference points (noise start point P_S, noise intermediate point P_M, and noise end point P_E) of the noise (pulse mechanical sound) contained in the audio signal. As shown in FIG. 16, the noise start point P_S is the position at which the pulse mechanical sound starts in the audio signal. The noise intermediate point P_M is an intermediate position of the pulse mechanical sound in the audio signal (for example, the position at which the amplitude of the pulse component is largest). The noise end point P_E is the position at which the pulse mechanical sound ends in the audio signal. The noise reference point detection unit 142 detects these reference points, for example, as follows.

まず、雑音基準点検出部１４２は、雑音中間点Ｐ_Ｍを検出する。雑音中間点Ｐ_Ｍの検出方法としては、例えば以下の（ａ）〜（ｃ）が例示される。 First, the noise reference point detection unit 142 detects the noise intermediate point P_M. Examples of methods for detecting P_M include (a) to (c) below.

（ａ）振幅最大値を利用 (a) Using the maximum amplitude
パルス機械音の振幅の絶対値の最大値が存在する位置を、雑音中間点Ｐ_Ｍとしてもよい。図１６に示すように、パルス機械音は、パルス成分と残響成分を含み、パルス成分のパルスのピーク（振幅最大値）は概ねパルス機械音の中間点と一致する。従って、パルス機械音の振幅の絶対値が最大となる位置が、雑音中間点Ｐ_Ｍであると推定することができる。 The position at which the absolute value of the amplitude of the pulse mechanical sound is largest may be taken as the noise intermediate point P_M. As shown in FIG. 16, the pulse mechanical sound includes a pulse component and a reverberation component, and the peak (maximum amplitude) of the pulse component substantially coincides with the intermediate point of the pulse mechanical sound. The position at which the absolute value of the amplitude of the pulse mechanical sound is largest can therefore be estimated to be the noise intermediate point P_M.

（ｂ）雑音区間情報を利用 (b) Using noise section information
また、雑音検出部１４０から雑音区間情報を取得した時から一定時間が経過した時点の位置を雑音中間点Ｐ_Ｍとしてもよい。雑音検出部１４０は、雑音が含まれる区間を表す雑音区間情報を生成し、雑音基準点検出部１４２に出力することができる。この雑音区間情報は、上述した雑音検出処理により生成されてもよいし、又は、パルス機械音を発生する駆動装置１４の制御情報に基づいて生成されてもよい。 Alternatively, the position reached when a fixed time has elapsed after the noise section information is acquired from the noise detection unit 140 may be taken as the noise intermediate point P_M. The noise detection unit 140 can generate noise section information representing a section that contains noise and output it to the noise reference point detection unit 142. This noise section information may be generated by the noise detection processing described above, or may be generated based on control information of the driving device 14 that generates the pulse mechanical sound.

（ｃ）信号の傾きの変化値を利用 (c) Using the change in the slope of the signal
また、雑音検出部１４０からパルス機械音の信号の傾きが急峻に変化した直後の変化点の位置を雑音中間点Ｐ_Ｍとしてもよい。パルス機械音のパルス成分は振幅が急峻に変化するので、この急峻な変化の直後に、振幅の微分値がゼロとなる位置はパルス成分のピークを示す。従って、当該振幅の微分値の変化点の位置が、雑音中間点Ｐ_Ｍであると推定することができる。 Alternatively, the position of the change point immediately after the slope of the pulse mechanical sound signal from the noise detection unit 140 changes steeply may be taken as the noise intermediate point P_M. Because the amplitude of the pulse component of the pulse mechanical sound changes steeply, the position at which the derivative of the amplitude becomes zero immediately after this steep change indicates the peak of the pulse component. The position of this change point in the derivative of the amplitude can therefore be estimated to be the noise intermediate point P_M.
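Method (a) above can be sketched as a search for the sample with the largest absolute amplitude. This is a minimal illustration, not the apparatus's implementation: the function name `find_noise_midpoint` and the optional search range (standing in for the noise section reported by the noise detection unit) are assumptions.

```python
def find_noise_midpoint(samples, search_start=0, search_end=None):
    """Return the index of the largest |amplitude| in samples[search_start:search_end].

    The returned index plays the role of the noise intermediate point P_M.
    The search range is a hypothetical stand-in for the noise section.
    """
    if search_end is None:
        search_end = len(samples)
    if not 0 <= search_start < search_end <= len(samples):
        raise ValueError("invalid search range")
    # max() returns the first index when several samples share the maximum.
    return max(range(search_start, search_end), key=lambda i: abs(samples[i]))


signal = [0.01, -0.02, 0.03, 0.9, -1.0, 0.4, -0.1, 0.02]
p_m = find_noise_midpoint(signal)
```

Taking the absolute value matters because the peak of the pulse component may be negative, as in the sample data here.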

次に、雑音基準点検出部１４２は、雑音開始点Ｐ_Ｓを検出する。雑音開始点Ｐ_Ｓの検出方法としては、例えば以下の（ａ）、（ｂ）が例示される。 Next, the noise reference point detection unit 142 detects the noise start point P_S. Examples of methods for detecting P_S include (a) and (b) below.

（ａ）信号エネルギーを利用 (a) Using signal energy
雑音中間点Ｐ_Ｍよりも前の音声信号において信号エネルギーが閾値よりも低下する点を、雑音開始点Ｐ_Ｓとしてもよい。図１６に示すように、一般に、パルス機械音は背景音よりも振幅が大きいため、パルス機械音が存在する部分の信号エネルギーは、背景音のみが存在する部分の信号エネルギーよりも大きくなる。従って、上記検出された雑音中間点Ｐ_Ｍよりも時間的に前の音声信号において、信号エネルギーが所定の閾値以下となる点が、雑音開始点Ｐ_Ｓであると推定することができる。 The point at which the signal energy falls below a threshold in the audio signal before the noise intermediate point P_M may be taken as the noise start point P_S. As shown in FIG. 16, the pulse mechanical sound generally has a larger amplitude than the background sound, so the signal energy of the portion where the pulse mechanical sound exists is larger than that of the portion where only the background sound exists. Accordingly, in the audio signal temporally before the detected noise intermediate point P_M, the point at which the signal energy becomes equal to or lower than a predetermined threshold can be estimated to be the noise start point P_S.

（ｂ）予め設定されたサンプルデータ数を利用 (b) Using a preset number of sample data
また、予め設定されたサンプルデータ数だけ雑音中間点Ｐ_Ｍよりも前の点を、雑音開始点Ｐ_Ｓとしてもよい。事前にパルス機械音の時間幅を測定し、雑音中間点Ｐ_Ｍと雑音開始点Ｐ_Ｓとの差分を予め求めておくことで、当該差分を表すサンプルデータ数をパラメータとして設定しておけばよい。このパラメータを用いて、雑音中間点Ｐ_Ｍから雑音開始点Ｐ_Ｓを推定できる。 Alternatively, the point that precedes the noise intermediate point P_M by a preset number of sample data may be taken as the noise start point P_S. The time width of the pulse mechanical sound is measured in advance to obtain the difference between P_M and P_S, and the number of sample data representing this difference is set as a parameter. Using this parameter, the noise start point P_S can be estimated from the noise intermediate point P_M.

さらに、雑音基準点検出部１４２は、雑音終了点Ｐ_Ｅを検出する。雑音終了点Ｐ_Ｅの検出方法は、上記の雑音開始点Ｐ_Ｓの検出方法と同様である。ただし、雑音中間点Ｐ_Ｍよりも前の信号ではなく、雑音中間点Ｐ_Ｍよりも後の信号において雑音終了点Ｐ_Ｅが検出される。 Furthermore, the noise reference point detection unit 142 detects the noise end point P_E. The method for detecting P_E is the same as that for the noise start point P_S, except that P_E is detected in the signal after the noise intermediate point P_M rather than in the signal before it.
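The start-point and end-point detection described above might look as follows. This is a sketch under stated assumptions: the text only says "signal energy", so the short-time mean-square energy, the window size, the threshold value, and all function names here are illustrative choices rather than the patent's specification.

```python
def short_time_energy(samples, center, half_window=1):
    """Mean squared amplitude in a small window around `center` (assumed energy measure)."""
    lo = max(0, center - half_window)
    hi = min(len(samples), center + half_window + 1)
    return sum(x * x for x in samples[lo:hi]) / (hi - lo)


def find_noise_start(samples, p_m, threshold, half_window=1):
    """Method (a): scan backward from P_M until the energy is at or below the threshold."""
    for i in range(p_m - 1, -1, -1):
        if short_time_energy(samples, i, half_window) <= threshold:
            return i
    return 0  # fallback: no quiet point found before P_M


def find_noise_end(samples, p_m, threshold, half_window=1):
    """Same idea as find_noise_start, but scanning forward from P_M."""
    for i in range(p_m + 1, len(samples)):
        if short_time_energy(samples, i, half_window) <= threshold:
            return i
    return len(samples) - 1  # fallback: no quiet point found after P_M


def start_from_preset(p_m, preset_samples):
    """Method (b): P_S lies a pre-measured number of samples before P_M."""
    return max(0, p_m - preset_samples)


signal = [0.01, 0.01, 0.01, 0.5, 1.0, 0.5, 0.2, 0.01, 0.01, 0.01]
p_s = find_noise_start(signal, p_m=4, threshold=0.001)
p_e = find_noise_end(signal, p_m=4, threshold=0.001)
```

Because the energy is averaged over a window, the detected points land slightly outside the loud samples, which errs on the side of treating a little background sound as noise rather than leaving part of the noise uninterpolated.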

以上のようにして、雑音基準点検出部１４２は、入力音声信号の雑音区間における実際の雑音の基準点を検出する。この雑音の基準点のうち雑音開始点Ｐ_Ｓから雑音終了点Ｐ_Ｅまでが、実際の雑音の範囲を表す。雑音開始点Ｐ_Ｓ及び雑音終了点Ｐ_Ｅは、音声信号における雑音と背景音との区切り位置となる。 As described above, the noise reference point detection unit 142 detects the reference points of the actual noise in the noise section of the input audio signal. Among these reference points, the span from the noise start point P_S to the noise end point P_E represents the actual extent of the noise. The noise start point P_S and the noise end point P_E are the positions that separate the noise from the background sound in the audio signal.

次に、第３の実施形態に係る雑音低減部１５０について説明する。雑音低減部１５０は、補間信号生成部１５２と、信号補間部１５４を備える。そして、補間信号生成部１５２は、前部補間信号生成部１５８（第１の補間信号生成部）と、後部補間信号生成部１５９（第２の補間信号生成部）を備えることを特徴としている。 Next, the noise reduction unit 150 according to the third embodiment will be described. The noise reduction unit 150 includes an interpolation signal generation unit 152 and a signal interpolation unit 154. The interpolation signal generation unit 152 is characterized by including a front interpolation signal generation unit 158 (first interpolation signal generation unit) and a rear interpolation signal generation unit 159 (second interpolation signal generation unit).

雑音検出部１４０によりｎ番目のフレームの音声信号に雑音が検出された場合、前部補間信号生成部１５８は、上記雑音開始点Ｐ_Ｓよりも前の所定区間の音声信号を用いて、雑音の前半部分を補間するための前部補間信号（第１の補間信号）を生成する。例えば、前部補間信号生成部１５８は、ｎ−１番目、ｎ番目のフレームの音声信号のうち雑音開始点Ｐ_Ｓよりも前の音声信号において、雑音開始点Ｐ_Ｓと雑音中間点Ｐ_Ｍとの間の長さに相当する分だけ雑音開始点Ｐ_Ｓよりも前に位置する区間の音声信号から、前部補間信号を生成する。 When the noise detection unit 140 detects noise in the audio signal of the n-th frame, the front interpolation signal generation unit 158 generates a front interpolation signal (first interpolation signal) for interpolating the first half of the noise, using the audio signal in a predetermined section before the noise start point P_S. For example, within the portion of the audio signals of the (n-1)-th and n-th frames that precedes P_S, the front interpolation signal generation unit 158 generates the front interpolation signal from the section that extends back from P_S by a length equal to the distance between P_S and the noise intermediate point P_M.

その後、ｎ＋１番目のフレームの音声信号が入力用バッファメモリ１２２に保存されたときに、後部補間信号生成部１５９は、上記雑音終了点Ｐ_Ｅよりも後の所定区間の音声信号を用いて、雑音の後半部分を補間するための後部補間信号（第２の補間信号）を生成する。例えば、後部補間信号生成部１５９は、ｎ番目、ｎ＋１番目のフレームの音声信号のうち雑音終了点Ｐ_Ｅよりも後の音声信号において、雑音中間点Ｐ_Ｍと雑音終了点Ｐ_Ｅとの間の長さに相当する分だけ雑音終了点Ｐ_Ｅよりも後に位置する区間の音声信号から、後部補間信号を生成する。 Thereafter, when the audio signal of the (n+1)-th frame has been stored in the input buffer memory 122, the rear interpolation signal generation unit 159 generates a rear interpolation signal (second interpolation signal) for interpolating the second half of the noise, using the audio signal in a predetermined section after the noise end point P_E. For example, within the portion of the audio signals of the n-th and (n+1)-th frames that follows P_E, the rear interpolation signal generation unit 159 generates the rear interpolation signal from the section that extends forward from P_E by a length equal to the distance between the noise intermediate point P_M and P_E.

このように、第３の実施形態では、第２の実施形態のようにフレームを基準として補間信号を生成するのではなく、上記雑音基準点によって特定される区間を基準として前部補間信号及び後部補間信号を生成する。これら前部補間信号及び後部補間信号の生成方法の詳細は後述する。 As described above, in the third embodiment, the interpolation signal is not generated on the basis of the frame as in the second embodiment, but the front interpolation signal and the rear portion are based on the section specified by the noise reference point. Generate an interpolation signal. Details of the method of generating these front interpolation signal and rear interpolation signal will be described later.

そして、信号補間部１５４は、上記前部補間信号生成部１５８により生成された前部補間信号を用いて、ｎ−１番目及び／又はｎ番目のフレームの音声信号に含まれる雑音の前半部分を補間する。さらに、信号補間部１５４は、上記後部補間信号生成部１５９により生成された後部補間信号を用いて、ｎ番目及び／又はｎ＋１番目のフレームの音声信号に含まれる雑音の後半部分を補間する。 Then, the signal interpolation unit 154 uses the front interpolation signal generated by the front interpolation signal generation unit 158 to calculate the first half of noise included in the audio signal of the (n−1) th and / or nth frame. Interpolate. Further, the signal interpolation unit 154 interpolates the latter half of the noise included in the audio signal of the nth and / or n + 1th frame using the rear interpolation signal generated by the rear interpolation signal generation unit 159.

例えば、信号補間部１５４は、音声信号に含まれる雑音の前半部分を前部補間信号で置換し、当該雑音の後半部分を後部補間信号で置換してもよい。或いは、信号補間部１５４は、音声信号に含まれる雑音の前半部分と前部補間信号を適当な混合比で合成し、雑音の後半部分と後部補間信号を適当な混合比で合成することで、補間処理を実行してもよい。この補間処理により、入力音声信号における雑音部分が補間されて、雑音が低減された音声信号が出力されるので、当該雑音を低減・除去することができる。 For example, the signal interpolation unit 154 may replace the first half of the noise included in the audio signal with the front interpolation signal and replace the second half of the noise with the rear interpolation signal. Alternatively, the signal interpolation unit 154 combines the first half of the noise and the front interpolation signal included in the audio signal with an appropriate mixing ratio, and combines the second half of the noise and the rear interpolation signal with an appropriate mixing ratio, Interpolation processing may be executed. By this interpolation processing, the noise portion in the input speech signal is interpolated and the speech signal with reduced noise is output, so that the noise can be reduced / removed.
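The two options described above, replacement and mixing, can be expressed with a single blend parameter. The sketch below is an assumption-laden illustration: the text only says "an appropriate mixing ratio", so the linear blend and the names `interpolate_section` and `mix` are choices made for this example.

```python
def interpolate_section(signal, start, interp, mix=1.0):
    """Return a copy of `signal` whose section starting at `start` is blended with `interp`.

    mix = 1.0 replaces the noisy section outright; 0 < mix < 1 combines the
    noisy section and the interpolation signal (assumed linear mixing).
    """
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must be between 0 and 1")
    if start < 0 or start + len(interp) > len(signal):
        raise ValueError("interpolation signal does not fit in the signal")
    out = list(signal)
    for k, value in enumerate(interp):
        out[start + k] = (1.0 - mix) * out[start + k] + mix * value
    return out


noisy = [1.0, 1.0, 1.0, 1.0]
replaced = interpolate_section(noisy, 1, [0.0, 0.0])          # full replacement
blended = interpolate_section(noisy, 1, [0.0, 0.0], mix=0.5)  # half-and-half mix
```

Full replacement removes the noise in the section entirely, while a partial mix keeps some of the original signal at the cost of leaving residual noise.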

［３．３．音声信号処理装置の動作］ [3.3. Operation of audio signal processing apparatus]
次に、第３の実施形態に係る音声信号処理装置１００の動作について説明する。雑音がない通常時の動作は、第２の実施形態の場合（図１１参照。）と同様であるので詳細説明は省略する。以下では、第３の実施形態に係る雑音発生時の動作例について、雑音がｎ番目とｎ＋１番目のフレームに跨って存在する場合（第１動作例）と、雑音がｎ−１番目とｎ番目のフレームに跨って存在する場合（第２動作例）をそれぞれ説明する。なお、双方の場合とも、ｎ番目のフレームに雑音（パルス機械音）のパルス成分のピークが存在するため、ｎ番目のフレームの入力時に雑音が検出されるものとする。 Next, the operation of the audio signal processing apparatus 100 according to the third embodiment will be described. The normal operation without noise is the same as in the second embodiment (see FIG. 11), so a detailed description is omitted. The following describes two operation examples for the case where noise occurs: one in which the noise spans the n-th and (n+1)-th frames (first operation example), and one in which the noise spans the (n-1)-th and n-th frames (second operation example). In both cases, the peak of the pulse component of the noise (pulse mechanical sound) lies in the n-th frame, so the noise is assumed to be detected when the n-th frame is input.

［３．３．１．雑音発生時の第１動作例］ [3.3.1. First operation example when noise occurs]
まず、図１７、図１８を参照して、雑音がｎ番目とｎ＋１番目のフレームに跨って存在する場合の音声信号処理装置１００の第１動作例について説明する。図１７、図１８は、本実施形態に係る音声信号処理装置１００の雑音発生時の第１動作例を示す模式図である。 First, a first operation example of the audio signal processing apparatus 100 for the case where the noise spans the n-th and (n+1)-th frames will be described with reference to FIGS. 17 and 18. FIGS. 17 and 18 are schematic diagrams illustrating the first operation example of the audio signal processing apparatus 100 according to the present embodiment when noise occurs.

図１７Ａに示すように、ｎ番目のフレームの音声信号ｓ（ｎ）の全てが入力用バッファメモリ１２２に蓄積完了し、かつ、当該音声信号ｓ（ｎ）に雑音のピークが含まれることが検出されたときには、図１７Ａに示す雑音基準点の検出処理及び前部補間信号の生成処理と、図１７Ｂに示す前部補間処理が直ちに実行される。 As shown in FIG. 17A, it is detected that all of the audio signal s (n) of the nth frame has been accumulated in the input buffer memory 122 and that the audio signal s (n) includes a noise peak. When this is done, the noise reference point detection process and the front interpolation signal generation process shown in FIG. 17A and the front interpolation process shown in FIG. 17B are immediately executed.

詳細には、まず、前部補間信号生成部１５８は、図１７Ａに示すように、雑音開始点Ｐ_Ｓから雑音前部区間長Ｌ_Ｆだけ前の点Ｐ_Ａまでの区間Ｓ_Ａの信号から、雑音前部区間Ｓ_Ｆを補間するための前部補間信号ｔ（ｎ）を生成する。ここで、雑音前部区間Ｓ_Ｆは、雑音開始点Ｐ_Ｓから雑音中間点Ｐ_Ｍまでの区間であり、雑音前部区間長Ｌ_Ｆは、雑音開始点Ｐ_Ｓから雑音中間点Ｐ_Ｍまでの区間の長さである。 Specifically, as shown in FIG. 17A, the front interpolation signal generation unit 158 first generates a front interpolation signal t(n) for interpolating the noise front section S_F from the signal in the section S_A, which runs from the noise start point P_S back to the point P_A located the noise front section length L_F before it. Here, the noise front section S_F is the section from the noise start point P_S to the noise intermediate point P_M, and the noise front section length L_F is the length of that section.

区間Ｓ_Ａは、パルス機械音の雑音開始点Ｐ_Ｓよりも前に存在し、雑音を含まない区間である。本実施形態では、区間Ｓ_Ａの区間長は、雑音前部区間長Ｌ_Ｆと同一となるように設定される。しかし、区間Ｓ_Ａの区間長は、雑音前部区間長Ｌ_Ｆに応じて適宜設定されればよく、Ｌ_Ｆより短い、又は長くてもよい。かかる区間Ｓ_Ａは、少なくともｎ番目のフレームの前部の区間を含み、雑音前部区間長Ｌ_Ｆによってはｎ−１番目のフレームの後部の区間をも含む。図１７Ａの例では、区間Ｓ_Ａはｎ番目及びｎ−１番目のフレームの双方に跨って設定されている。 The section S_A lies before the noise start point P_S of the pulse mechanical sound and contains no noise. In the present embodiment, the length of the section S_A is set equal to the noise front section length L_F. However, the length of S_A only needs to be set appropriately according to L_F and may be shorter or longer than L_F. The section S_A includes at least a front part of the n-th frame and, depending on L_F, also a rear part of the (n-1)-th frame. In the example of FIG. 17A, the section S_A spans both the n-th and (n-1)-th frames.

前部補間信号生成部１５８は、ｎ番目及びｎ−１番目のフレームの音声信号ｓ（ｎ）、ｓ（ｎ−１）のうち上記区間Ｓ_Ａの信号を用いて、前部補間信号ｔ（ｎ）を生成する。この前部補間信号ｔ（ｎ）の生成方法は、前述した第１の実施形態に係る補間信号ｖ（ｎ）の生成方法と同様であり（図４、図５参照。）、例えば、区間Ｓ_Ａの信号を時間軸方向に反転させることで、前部補間信号ｔ（ｎ）が生成される。 The front interpolation signal generation unit 158 generates the front interpolation signal t(n) using the signal of the section S_A within the audio signals s(n) and s(n-1) of the n-th and (n-1)-th frames. The method of generating t(n) is the same as the method of generating the interpolation signal v(n) according to the first embodiment described above (see FIGS. 4 and 5); for example, t(n) is generated by reversing the signal of the section S_A in the time-axis direction.
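The time-reversal example for the front interpolation signal can be sketched as below. The sketch assumes that `samples` is the concatenation of frames n-1 and n, that indices are sample positions in that concatenation, and that S_A has the same length as S_F (the setting used in this embodiment); the function name is hypothetical.

```python
def front_interpolation_signal(samples, p_s, p_m):
    """Build t(n) by time-reversing the section S_A = [P_A, P_S) that precedes the noise.

    S_F = [P_S, P_M) is the noise front section and L_F its length;
    P_A = P_S - L_F, so S_A and S_F have equal length (assumed here).
    """
    l_f = p_m - p_s
    p_a = p_s - l_f
    if l_f <= 0 or p_a < 0:
        raise ValueError("need P_S < P_M and at least L_F clean samples before P_S")
    section_a = samples[p_a:p_s]
    return section_a[::-1]  # reversed in the time-axis direction


# Toy data: the sample values equal their indices so the mirroring is visible.
samples = list(range(10))
t = front_interpolation_signal(samples, p_s=4, p_m=7)  # S_A = [1, 2, 3]
```

Reversing S_A mirrors the clean signal about P_S, so the first interpolated sample equals the last clean sample and the waveform stays continuous at the noise start point.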

次いで、信号補間部１５４は、図１７Ｂに示すように、上記前部補間信号ｔ（ｎ）を用いて、ｎ番目のフレームの音声信号ｓ（ｎ）のうち、雑音前部区間Ｓ_Ｆの信号を補間する。図１７Ｂの前部補間処理の例では、入力用バッファメモリ１２２に保存されているｎ番目のフレームのうち雑音前部区間Ｓ_Ｆの音声信号ｓ（ｎ）が、前部補間信号ｔ（ｎ）に置換されている。かかる前部補間処理により、雑音前部区間Ｓ_Ｆの雑音が低減される。 Next, as shown in FIG. 17B, the signal interpolation unit 154 uses the front interpolation signal t(n) to interpolate the signal of the noise front section S_F within the audio signal s(n) of the n-th frame. In the example of the front interpolation processing in FIG. 17B, the audio signal s(n) in the noise front section S_F of the n-th frame stored in the input buffer memory 122 is replaced with the front interpolation signal t(n). This front interpolation processing reduces the noise in the noise front section S_F.

次いで、図１７Ｃに示すように、上記前部補間処理後に直ちに、信号補間部１５４は、出力用バッファメモリ１３２に保存されているｎ−１番目のフレームの音声信号ｓ（ｎ−１）を信号出力部１６０に出力する。さらに、信号補間部１５４は、上記音声信号ｓ（ｎ−１）の出力とともに、入力用バッファメモリ１２２に保存されている前部補間音声信号ｓ（ｎ）＋ｔ（ｎ）を、出力用バッファメモリ１３２に移動させる。ここで、前部補間音声信号ｓ（ｎ）＋ｔ（ｎ）とは、上記前部補間信号ｔ（ｎ）により雑音前部区間Ｓ_Ｆが補間されたｎ番目のフレームの音声信号ｓ（ｎ）である。このように、前部補間音声信号ｓ（ｎ）＋ｔ（ｎ）を出力用バッファメモリ１３２に移動させておくことで、次のｎ＋１番目のフレームの音声信号ｓ（ｎ＋１）が入力されたときに、前部補間音声信号ｓ（ｎ）＋ｔ（ｎ）のうちの後部雑音区間を補間することができる。 Next, as shown in FIG. 17C, immediately after the front interpolation processing, the signal interpolation unit 154 outputs the audio signal s(n-1) of the (n-1)-th frame stored in the output buffer memory 132 to the signal output unit 160. Along with outputting s(n-1), the signal interpolation unit 154 moves the front interpolated audio signal s(n) + t(n) stored in the input buffer memory 122 to the output buffer memory 132. Here, the front interpolated audio signal s(n) + t(n) is the audio signal s(n) of the n-th frame in which the noise front section S_F has been interpolated by the front interpolation signal t(n). Moving s(n) + t(n) to the output buffer memory 132 in this way makes it possible to interpolate the rear noise section of s(n) + t(n) when the audio signal s(n+1) of the next, (n+1)-th, frame is input.

次いで、図１８Ａに示すように、次のｎ＋１番目のフレームの音声信号ｓ（ｎ＋１）が新たに入力され、入力用バッファメモリ１２２に蓄積されているときには、上記前部補間音声信号ｓ（ｎ）＋ｔ（ｎ）が出力用バッファメモリ１３２に一時保存されている。 Next, as shown in FIG. 18A, when the audio signal s (n + 1) of the next n + 1-th frame is newly input and stored in the input buffer memory 122, the front interpolated audio signal s (n) + T (n) is temporarily stored in the output buffer memory 132.

そして、ｎ＋１番目のフレームの音声信号ｓ（ｎ＋１）の全てが入力用バッファメモリ１２２に蓄積完了したときには、図１８Ａに示す後部補間信号の生成処理と、図１８Ｂに示す後部補間処理が直ちに実行される。 Then, when all of the audio signal s(n+1) of the (n+1)-th frame has been accumulated in the input buffer memory 122, the rear interpolation signal generation processing shown in FIG. 18A and the rear interpolation processing shown in FIG. 18B are executed immediately.

詳細には、まず、後部補間信号生成部１５９は、図１８Ａに示すように、雑音終了点Ｐ_Ｅから雑音後部区間長Ｌ_Ｒだけ後の点Ｐ_Ｂまでの区間Ｓ_Ｂの信号から、雑音後部区間Ｓ_Ｒを補間するための後部補間信号ｕ（ｎ）を生成する。ここで、雑音後部区間Ｓ_Ｒは、雑音中間点Ｐ_Ｍから雑音終了点Ｐ_Ｅまでの区間であり、雑音後部区間長Ｌ_Ｒは、雑音中間点Ｐ_Ｍから雑音終了点Ｐ_Ｅまでの区間の長さである。 Specifically, as shown in FIG. 18A, the rear interpolation signal generation unit 159 first generates a rear interpolation signal u(n) for interpolating the noise rear section S_R from the signal in the section S_B, which runs from the noise end point P_E to the point P_B located the noise rear section length L_R after it. Here, the noise rear section S_R is the section from the noise intermediate point P_M to the noise end point P_E, and the noise rear section length L_R is the length of that section.

区間Ｓ_Ｂは、パルス機械音の雑音終了点Ｐ_Ｅよりも後に存在し、雑音を含まない区間である。本実施形態では、区間Ｓ_Ｂの区間長は、雑音後部区間長Ｌ_Ｒと同一となるように設定される。しかし、区間Ｓ_Ｂの区間長は、雑音後部区間長Ｌ_Ｒに応じて適宜設定されればよく、Ｌ_Ｒより短い、又は長くてもよい。かかる区間Ｓ_Ｂは、少なくともｎ番目のフレームの後部の区間、及びｎ＋１番目のフレームの前部の区間を含み、雑音後部区間長Ｌ_Ｒによっては、ｎ＋２番目のフレームの前部の区間をも含む。図１８Ａの例では、区間Ｓ_Ｂはｎ番目及びｎ＋１番目のフレームの双方に跨って設定されている。 The section S_B lies after the noise end point P_E of the pulse mechanical sound and contains no noise. In the present embodiment, the length of the section S_B is set equal to the noise rear section length L_R. However, the length of S_B only needs to be set appropriately according to L_R and may be shorter or longer than L_R. The section S_B includes at least a rear part of the n-th frame and a front part of the (n+1)-th frame and, depending on L_R, also a front part of the (n+2)-th frame. In the example of FIG. 18A, the section S_B spans both the n-th and (n+1)-th frames.

後部補間信号生成部１５９は、ｎ番目及びｎ＋１番目のフレームの音声信号ｓ（ｎ）、ｓ（ｎ＋１）のうち上記区間Ｓ_Ｂの信号を用いて、後部補間信号ｕ（ｎ）を生成する。この後部補間信号ｕ（ｎ）の生成方法は、前述した第１の実施形態に係る補間信号ｖ（ｎ）の生成方法と同様であり（図４、図５参照。）、例えば、区間Ｓ_Ｂの信号を時間軸方向に反転させることで、後部補間信号ｕ（ｎ）が生成される。 The rear interpolation signal generation unit 159 generates the rear interpolation signal u(n) using the signal of the section S_B within the audio signals s(n) and s(n+1) of the n-th and (n+1)-th frames. The method of generating u(n) is the same as the method of generating the interpolation signal v(n) according to the first embodiment described above (see FIGS. 4 and 5); for example, u(n) is generated by reversing the signal of the section S_B in the time-axis direction.

次いで、信号補間部１５４は、図１８Ｂに示すように、上記後部補間信号ｕ（ｎ）を用いて、ｎ番目及びｎ＋１番目のフレームの音声信号ｓ（ｎ）、ｓ（ｎ＋１）のうち、雑音後部区間Ｓ_Ｒの信号を補間する。図１８Ｂの後部補間処理の例では、出力用バッファメモリ１３２に保存されているｎ番目のフレームの前部補間音声信号ｓ（ｎ）＋ｔ（ｎ）、及び入力用バッファメモリ１２２に保存されているｎ＋１番目のフレームの音声信号ｓ（ｎ＋１）のうち、雑音後部区間Ｓ_Ｒの信号が、後部補間信号ｕ（ｎ）に置換されている。かかる後部補間処理により、雑音後部区間Ｓ_Ｒの雑音が低減される。 Next, as shown in FIG. 18B, the signal interpolation unit 154 uses the rear interpolation signal u(n) to interpolate the signal of the noise rear section S_R within the audio signals s(n) and s(n+1) of the n-th and (n+1)-th frames. In the example of the rear interpolation processing in FIG. 18B, the signal in the noise rear section S_R, within the front interpolated audio signal s(n) + t(n) of the n-th frame stored in the output buffer memory 132 and the audio signal s(n+1) of the (n+1)-th frame stored in the input buffer memory 122, is replaced with the rear interpolation signal u(n). This rear interpolation processing reduces the noise in the noise rear section S_R.

次いで、図１８Ｃに示すように、上記後部補間処理後に直ちに、信号補間部１５４は、図１７Ａで実際に入力されたｎ番目のフレームの音声信号ｓ（ｎ）に換えて、出力用バッファメモリ１３２に保存されている前後部補間信号ｓ（ｎ）＋ｔ（ｎ）＋ｕ（ｎ）を、信号出力部１６０に出力する。ここで、前後部補間音声信号ｓ（ｎ）＋ｔ（ｎ）＋ｕ（ｎ）とは、上記前部補間信号ｔ（ｎ）により雑音前部区間Ｓ_Ｆが補間され、かつ、上記後部補間信号ｕ（ｎ）により雑音後部区間Ｓ_Ｒが補間されたｎ番目のフレームの音声信号ｓ（ｎ）である。 Next, as shown in FIG. 18C, immediately after the rear interpolation processing, the signal interpolation unit 154 outputs to the signal output unit 160 the front/rear interpolated audio signal s(n) + t(n) + u(n) stored in the output buffer memory 132, in place of the audio signal s(n) of the n-th frame actually input in FIG. 17A. Here, the front/rear interpolated audio signal s(n) + t(n) + u(n) is the audio signal s(n) of the n-th frame in which the noise front section S_F has been interpolated by the front interpolation signal t(n) and the noise rear section S_R has been interpolated by the rear interpolation signal u(n).

さらに、図１８Ｃに示すように、信号補間部１５４は、上記前後部補間音声信号ｓ（ｎ）＋ｔ（ｎ）＋ｕ（ｎ）の出力とともに、入力用バッファメモリ１２２に保存されている、ｎ＋１番目のフレームの前部補間音声信号ｕ（ｎ）＋ｓ（ｎ＋１）を、出力用バッファメモリ１３２に移動させる。これにより、次にｎ＋２番目のフレームの音声信号ｓ（ｎ＋２）が入力されたときに、雑音が低減されたｎ＋１番目のフレームの前部補間音声信号ｕ（ｎ）＋ｓ（ｎ＋１）を出力することが可能となる。 Further, as shown in FIG. 18C, along with outputting the front/rear interpolated audio signal s(n) + t(n) + u(n), the signal interpolation unit 154 moves the front interpolated audio signal u(n) + s(n+1) of the (n+1)-th frame stored in the input buffer memory 122 to the output buffer memory 132. As a result, when the audio signal s(n+2) of the (n+2)-th frame is input next, the noise-reduced front interpolated audio signal u(n) + s(n+1) of the (n+1)-th frame can be output.

上記第１動作例のように、雑音がｎ番目とｎ＋１番目のフレームに跨って存在する場合には、ｎ番目とｎ−１番目のフレームのうち雑音開始点Ｐ_Ｓの直前の信号を用いて雑音前部区間Ｓ_Ｆが補間され、ｎ番目とｎ＋１番目のフレームのうち雑音終了点Ｐ_Ｅの直後の信号を用いて雑音後部区間Ｓ_Ｒが補間される。 As in the first operation example above, when the noise spans the n-th and (n+1)-th frames, the noise front section S_F is interpolated using the signal immediately before the noise start point P_S within the n-th and (n-1)-th frames, and the noise rear section S_R is interpolated using the signal immediately after the noise end point P_E within the n-th and (n+1)-th frames.
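The two-buffer flow of the first operation example can be simulated with a small class. This is a simplified sketch, not the apparatus itself: the class name `TwoFrameInterpolator`, the convention that reference points are given as indices into the window formed by the output buffer followed by the input buffer, and the use of plain replacement with time-reversed sections of equal length are all assumptions made for illustration.

```python
class TwoFrameInterpolator:
    """Holds frame n-1 (output buffer) while frame n arrives (input buffer)."""

    def __init__(self, frame_len):
        self.frame_len = frame_len
        self.out_buf = None        # previous frame, waiting to be released
        self.pending_rear = None   # (P_M, P_E) awaiting the next frame

    def push(self, frame, noise_points=None):
        """Feed one full frame; return the frame released to the output, or None at start.

        noise_points = (P_S, P_M, P_E), indexed from the start of the output buffer.
        P_E may lie beyond the current window (i.e. inside the next frame).
        """
        n = self.frame_len
        if len(frame) != n:
            raise ValueError("frame must contain frame_len samples")
        have_prev = self.out_buf is not None
        window = (self.out_buf if have_prev else [0.0] * n) + list(frame)

        if self.pending_rear is not None:
            # Rear interpolation: mirror S_B = [P_E, P_E + L_R) into S_R = [P_M, P_E).
            p_m, p_e = self.pending_rear
            l_r = p_e - p_m
            if p_e + l_r > 2 * n:
                raise ValueError("section S_B does not fit in the two buffers")
            window[p_m:p_e] = window[p_e:p_e + l_r][::-1]
            self.pending_rear = None

        if noise_points is not None:
            # Front interpolation: mirror S_A = [P_S - L_F, P_S) into S_F = [P_S, P_M).
            p_s, p_m, p_e = noise_points
            l_f = p_m - p_s
            if not (have_prev and p_s - l_f >= 0 and p_s < p_m < p_e and n <= p_m < 2 * n):
                raise ValueError("unsupported noise reference points")
            window[p_s:p_m] = window[p_s - l_f:p_s][::-1]
            self.pending_rear = (p_m - n, p_e - n)  # re-index for the next window

        self.out_buf = window[n:]
        return window[:n] if have_prev else None


proc = TwoFrameInterpolator(frame_len=4)
outputs = [
    proc.push([1, 2, 3, 4]),
    proc.push([5, 6, 9, 9], noise_points=(6, 7, 9)),  # noise occupies samples 6..8
    proc.push([9, 12, 13, 14]),
    proc.push([15, 16, 17, 18]),
]
```

In this toy run the three noisy samples (value 9) straddle the second and third frames; the front part is filled when the second frame arrives and the rear part one frame later, so the output lags the input by a single frame.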

［３．３．２．雑音発生時の第２動作例］ [3.3.2. Second operation example when noise occurs]
次に、図１９、図２０を参照して、雑音がｎ−１番目とｎ番目のフレームに跨って存在する場合の音声信号処理装置１００の第２動作例について説明する。図１９、図２０は、本実施形態に係る音声信号処理装置１００の雑音発生時の第２動作例を示す模式図である。 Next, a second operation example of the audio signal processing apparatus 100 for the case where the noise spans the (n-1)-th and n-th frames will be described with reference to FIGS. 19 and 20. FIGS. 19 and 20 are schematic diagrams illustrating the second operation example of the audio signal processing apparatus 100 according to the present embodiment when noise occurs.

図１９Ａに示すように、ｎ番目のフレームの音声信号ｓ（ｎ）の全てが入力用バッファメモリ１２２に蓄積完了し、かつ、当該音声信号ｓ（ｎ）に雑音のピークが含まれることが検出されたときには、図１９Ａに示す雑音基準点の検出処理及び前部補間信号の生成処理と、図１９Ｂに示す前部補間処理が直ちに実行される。 As shown in FIG. 19A, it is detected that all of the audio signal s (n) of the nth frame has been accumulated in the input buffer memory 122 and that the audio signal s (n) includes a noise peak. If so, the noise reference point detection process and the front interpolation signal generation process shown in FIG. 19A and the front interpolation process shown in FIG. 19B are immediately executed.

詳細には、まず、前部補間信号生成部１５８は、図１９Ａに示すように、雑音開始点Ｐ_Ｓから雑音前部区間長Ｌ_Ｆだけ前の点Ｐ_Ａまでの区間Ｓ_Ａの信号から、雑音前部区間Ｓ_Ｆを補間するための前部補間信号ｔ（ｎ）を生成する。雑音前部区間Ｓ_Ｆ及び区間Ｓ_Ａ等の定義は、前述の第１動作例と同様である。ただし、第２動作例では、雑音開始点Ｐ_Ｓがｎ−１番目のフレームに存在するため、雑音前部区間Ｓ_Ｆは、ｎ−１番目及びｎ番目のフレームに跨って存在する。また、区間Ｓ_Ａは、少なくともｎ−１番目のフレームの一部の区間を含み、雑音前部区間長Ｌ_Ｆによってはｎ−２番目のフレームの後部の区間をも含む。図１９Ａの例では、区間Ｓ_Ａはｎ−１番目のフレーム内に設定されている。 Specifically, as shown in FIG. 19A, the front interpolation signal generation unit 158 first generates a front interpolation signal t(n) for interpolating the noise front section S_F from the signal in the section S_A, which runs from the noise start point P_S back to the point P_A located the noise front section length L_F before it. The definitions of the noise front section S_F, the section S_A, and so on are the same as in the first operation example. In the second operation example, however, the noise start point P_S lies in the (n-1)-th frame, so the noise front section S_F spans the (n-1)-th and n-th frames. The section S_A includes at least part of the (n-1)-th frame and, depending on L_F, also a rear part of the (n-2)-th frame. In the example of FIG. 19A, the section S_A is set within the (n-1)-th frame.

The front interpolation signal generation unit 158 generates the front interpolation signal t(n) using the signal in the section S_A of the (n-1)th-frame audio signal s(n-1). The method of generating the front interpolation signal t(n) is the same as in the first operation example.

Next, as shown in FIG. 19B, the signal interpolation unit 154 uses the front interpolation signal t(n) to interpolate the signal in the noise front section S_F within the audio signals s(n-1) and s(n) of the (n-1)th and nth frames. In the example of the front interpolation process of FIG. 19B, the audio signals s(n-1) and s(n) in the noise front section S_F, within the (n-1)th frame stored in the output buffer memory 132 and the nth frame stored in the input buffer memory 122, are replaced with the front interpolation signal t(n). This front interpolation process reduces the noise in the noise front section S_F.
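The replacement across the frame boundary can be sketched as follows. This is a minimal illustration, not the method of the disclosure: it assumes that t(n) is simply a copy of the section S_A (the text defers the actual generation method to the first operation example), that `p_s` is an index into the two buffered frames viewed as one timeline, and that S_A lies entirely within the buffered frames as in FIG. 19A. The function and parameter names are hypothetical.

```python
def front_interpolate(out_buf, in_buf, p_s, l_f):
    """Replace the noise front section S_F with a signal taken from S_A.

    out_buf: frame n-1 (stands in for output buffer memory 132), a list.
    in_buf:  frame n   (stands in for input buffer memory 122), a list.
    p_s:     noise start point P_S, indexed over out_buf + in_buf.
    l_f:     noise front section length L_F.
    Assumption: t(n) is a plain copy of S_A; the real generator may differ.
    """
    n = len(out_buf)
    joined = out_buf + in_buf                  # both frames as one timeline
    if p_s - l_f < 0 or p_s + l_f > len(joined):
        raise ValueError("S_A or S_F falls outside the two buffered frames")
    t = joined[p_s - l_f:p_s]                  # S_A = [P_A, P_S), P_A = P_S - L_F
    joined[p_s:p_s + l_f] = t                  # S_F = [P_S, P_S + L_F) <- t(n)
    out_buf[:] = joined[:n]                    # frame n-1 with its rear part replaced
    in_buf[:] = joined[n:]                     # frame n with its front part replaced
    return t
```

Because both buffers are modified in place, the older frame can be output right away and the newer frame moved to the output buffer, as the following paragraphs describe.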

Next, as shown in FIG. 19C, immediately after the front interpolation process, the signal interpolation unit 154 outputs the rear interpolated audio signal s(n-1) + t(n) stored in the output buffer memory 132 to the signal output unit 160. Here, the rear interpolated audio signal s(n-1) + t(n) is the (n-1)th-frame audio signal s(n-1) in which the noise front section S_F has been interpolated by the front interpolation signal t(n).

Furthermore, as shown in FIG. 19C, together with outputting the rear interpolated audio signal s(n-1) + t(n), the signal interpolation unit 154 moves the front interpolated audio signal t(n) + s(n) stored in the input buffer memory 122 to the output buffer memory 132. Here, the front interpolated audio signal t(n) + s(n) is the nth-frame audio signal s(n) in which the noise front section S_F has been interpolated by the front interpolation signal t(n). Moving the front interpolated audio signal t(n) + s(n) to the output buffer memory 132 in this way makes it possible to interpolate the noise rear section of that signal when the audio signal s(n+1) of the next (n+1)th frame is input.

Next, as shown in FIG. 20A, while the audio signal s(n+1) of the next (n+1)th frame is being newly input and accumulated in the input buffer memory 122, the front interpolated audio signal t(n) + s(n) is temporarily held in the output buffer memory 132.

Then, when all of the audio signal s(n+1) of the (n+1)th frame has been accumulated in the input buffer memory 122, the rear interpolation signal generation process shown in FIG. 20A and the rear interpolation process shown in FIG. 20B are executed immediately.

Specifically, as shown in FIG. 20A, the rear interpolation signal generation unit 159 first generates a rear interpolation signal u(n) for interpolating the noise rear section S_R. It generates u(n) from the signal in the section S_B, which extends from the noise end point P_E to the point P_B located the noise rear section length L_R later. The definitions of the noise rear section S_R, the section S_B, and so on are the same as in the first operation example described above. In the second operation example, however, the noise end point P_E lies in the nth frame, so the noise rear section S_R lies within the nth frame. The section S_B includes at least part of the nth frame and, depending on the noise rear section length L_R, also includes the front part of the (n+1)th frame. In the example of FIG. 20A, the section S_B is set across both the nth and (n+1)th frames.

The rear interpolation signal generation unit 159 generates the rear interpolation signal u(n) using the signal in the section S_B of the audio signals s(n) and s(n+1) of the nth and (n+1)th frames. The method of generating the rear interpolation signal u(n) is the same as in the first operation example.

Next, as shown in FIG. 20B, the signal interpolation unit 154 uses the rear interpolation signal u(n) to interpolate the signal in the noise rear section S_R within the nth-frame audio signal s(n). In the example of the rear interpolation process of FIG. 20B, the signal in the noise rear section S_R of the nth-frame front interpolated audio signal t(n) + s(n) stored in the output buffer memory 132 is replaced with the rear interpolation signal u(n). This rear interpolation process reduces the noise in the noise rear section S_R.

Next, as shown in FIG. 20C, immediately after the rear interpolation process, the signal interpolation unit 154 outputs the front-and-rear interpolated audio signal t(n) + u(n) + s(n) stored in the output buffer memory 132 to the signal output unit 160, in place of the nth-frame audio signal s(n) actually input in FIG. 19A. Here, the front-and-rear interpolated audio signal t(n) + u(n) + s(n) is the nth-frame audio signal s(n) in which the noise front section S_F has been interpolated by the front interpolation signal t(n) and the noise rear section S_R has been interpolated by the rear interpolation signal u(n).

Furthermore, as shown in FIG. 20C, together with outputting the front-and-rear interpolated audio signal t(n) + u(n) + s(n), the signal interpolation unit 154 moves the (n+1)th-frame audio signal s(n+1) stored in the input buffer memory 122 to the output buffer memory 132. As a result, when the audio signal s(n+2) of the (n+2)th frame is input next, the (n+1)th-frame audio signal s(n+1) can be output.

As in the second operation example above, when noise is present across the (n-1)th and nth frames, the noise front section S_F is interpolated using the signal immediately before the noise start point P_S in the (n-1)th frame, and the noise rear section S_R is interpolated using the signal immediately after the noise end point P_E in the nth and (n+1)th frames.
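How the four sections fall across frames can be checked with a small index calculation. The sketch below assumes half-open sample ranges and takes L_F = P_M - P_S and L_R = P_E - P_M, which matches the section lengths stated in configuration (3) of this document; the names `Sections`, `sections_from_reference_points`, and `frames_touched` are hypothetical.

```python
from typing import NamedTuple


class Sections(NamedTuple):
    s_a: range  # source section before the noise
    s_f: range  # noise front section
    s_r: range  # noise rear section
    s_b: range  # source section after the noise


def sections_from_reference_points(p_s, p_m, p_e):
    """Derive S_A, S_F, S_R, S_B from the noise reference points.

    Assumption: L_F = P_M - P_S and L_R = P_E - P_M, with half-open ranges.
    """
    if not p_s < p_m < p_e:
        raise ValueError("expected P_S < P_M < P_E")
    l_f = p_m - p_s
    l_r = p_e - p_m
    return Sections(
        s_a=range(p_s - l_f, p_s),
        s_f=range(p_s, p_m),
        s_r=range(p_m, p_e),
        s_b=range(p_e, p_e + l_r),
    )


def frames_touched(section, frame_len):
    """Return the indices of the frames that a section overlaps."""
    first = section.start // frame_len
    last = (section.stop - 1) // frame_len
    return list(range(first, last + 1))
```

With a frame length of 10 samples and frames n-1, n, n+1 numbered 0, 1, 2, the points P_S = 7, P_M = 12, P_E = 18 give S_A inside frame 0, S_F across frames 0 and 1, S_R inside frame 1, and S_B across frames 1 and 2, which is the layout of the second operation example.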

As described above, according to the present embodiment, when noise is present across two frames, the reference points of the noise are detected, the noise front section S_F is interpolated using the signal in the section S_A before the noise start point P_S, and the noise rear section S_R is interpolated using the signal in the section S_B after the noise end point P_E. This removes the need to perform interpolation on a frame-by-frame basis and allows interpolation using the signal of a freely chosen section nearest the noise. The interpolation is therefore more natural and more accurate, and high-quality noise reduction can be achieved.

Moreover, even when the noise reference points are detected and interpolation is performed as described above, the memory length of each of the input buffer memory 122 and the output buffer memory 132 need only be N, the number of samples in one frame. Accordingly, as in the first and second embodiments, the buffer memory length required for the entire apparatus is only 2*N. In addition, as soon as the audio signal s(n+1) of the next frame has been fully accumulated in the input buffer memory 122, the audio signal s(n) of the preceding frame is output externally, so the delay of the output audio relative to the input audio is only one frame.
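The one-frame delay with two frame-sized buffers can be illustrated with a small generator. This is a sketch of the buffering pattern only (no noise handling); the name `stream_with_two_buffers` and the final flush of the last frame at end of input are assumptions made for the illustration.

```python
def stream_with_two_buffers(frames):
    """Pass frames through while holding at most two frames (2 * N samples)."""
    output_buffer = None                  # stands in for output buffer memory 132
    for input_buffer in frames:           # stands in for input buffer memory 122
        if output_buffer is not None:
            yield output_buffer           # frame n-1 leaves once frame n is complete
        output_buffer = input_buffer      # move frame n to the output buffer
    if output_buffer is not None:
        yield output_buffer               # assumed flush of the last frame
```

The first output frame becomes available only after the second input frame has been fully read, which corresponds to a delay of N samples.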

[3.4. Audio signal processing method]
Next, an audio signal processing method (mechanical sound reduction method) using the audio signal processing apparatus 100 described above will be described with reference to FIG. 21. FIG. 21 is a flowchart showing the audio signal processing method according to the third embodiment.

As shown in FIG. 21, the audio signal processing apparatus 100 first determines whether one frame of the audio signal input from the microphone 51 has been accumulated in the input buffer memory 122 (S300). The processing described here is for the case where the audio signal s(n) of the nth frame is currently being input. When the determination in S300 finds that all of the nth-frame audio signal s(n) has been accumulated in the input buffer memory 122, the noise detection unit 140 immediately detects whether the audio signal s(n) includes noise (S302).

If noise is detected as a result of the determination in S302, the noise reference point detection process (S304), the front interpolation signal generation process (S306), and the front interpolation process (S308) are executed immediately.

Specifically, as described above, the noise reference point detection unit 142 first calculates the noise start point P_S, the noise intermediate point P_M, and the noise end point P_E on the basis of the characteristics of the noise included in the audio signal s(n) (S304). Next, as shown in FIGS. 17 and 19, the front interpolation signal generation unit 158 uses the audio signals s(n-1) and s(n) in the predetermined section S_A before the noise start point P_S to generate the front interpolation signal t(n) for interpolating the noise front section S_F (S306).

Furthermore, the signal interpolation unit 154 uses the front interpolation signal t(n) generated in S306 to interpolate the signal in the noise front section S_F within the audio signals s(n-1) and s(n) (S308). In the front interpolation process of S308, the signal in the noise front section S_F may be replaced with the front interpolation signal t(n), or the signal in the noise front section S_F and the front interpolation signal t(n) may be combined at an appropriate mixing ratio. The example of replacement is described below.
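The two options, replacement and mixing, can be expressed with a single parameter. The sketch below assumes that "mixing" means a per-sample weighted sum, which the text does not specify; `blend_section` and its `mix` parameter are hypothetical names.

```python
def blend_section(signal, start, interp, mix=1.0):
    """Interpolate signal[start:start + len(interp)] in place.

    mix = 1.0 replaces the section with the interpolation signal;
    0.0 <= mix < 1.0 combines the two as a weighted sum (an assumed rule).
    """
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must be between 0.0 and 1.0")
    end = start + len(interp)
    if start < 0 or end > len(signal):
        raise ValueError("section falls outside the signal")
    for i, t in enumerate(interp):
        signal[start + i] = mix * t + (1.0 - mix) * signal[start + i]
    return signal
```

For example, `blend_section(frame, p_s, t, mix=1.0)` corresponds to the replacement case described in the text, while a smaller `mix` keeps part of the original signal.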

Thereafter, the signal interpolation unit 154 moves the front interpolated audio signal s(n) + t(n) in the input buffer memory 122 to the output buffer memory 132 (S310).

Next, the newly input audio signal s(n+1) of the (n+1)th frame is accumulated in the input buffer memory 122, and it is determined whether all of the audio signal s(n+1) has been accumulated in the input buffer memory 122 (S312). When all of the (n+1)th-frame audio signal s(n+1) has been accumulated in the input buffer memory 122, the rear interpolation signal generation process (S316) and the rear interpolation process (S318) shown in FIGS. 18 and 20 are executed immediately.

Specifically, as shown in FIGS. 18 and 20, the rear interpolation signal generation unit 159 first uses the audio signals s(n) and s(n+1) in the predetermined section S_B after the noise end point P_E to generate the rear interpolation signal u(n) for interpolating the noise rear section S_R (S316).

Next, the signal interpolation unit 154 uses the rear interpolation signal u(n) generated in S316 to interpolate the signal in the noise rear section S_R within the audio signals s(n) and s(n+1) (S318). In the rear interpolation process of S318, the signal in the noise rear section S_R may be replaced with the rear interpolation signal u(n), or the signal in the noise rear section S_R and the rear interpolation signal u(n) may be combined at an appropriate mixing ratio. The example of replacement is described below.

Thereafter, in place of the nth-frame audio signal s(n) as actually input, the signal interpolation unit 154 outputs to the signal output unit 160 the front-and-rear interpolated audio signal t(n) + u(n) + s(n), which was interpolated with the front interpolation signal t(n) and the rear interpolation signal u(n) in S308 and S318 (S320). The (n+1)th-frame audio signal s(n+1) stored in the input buffer memory 122 is then moved to the output buffer memory 132.

On the other hand, if the noise determination in S302 finds no noise in the nth-frame audio signal s(n), the interpolation processing described above is skipped and normal input/output processing is performed. That is, as shown in FIG. 11, the (n-1)th-frame audio signal s(n-1) is output unchanged from the output buffer memory 132 to the signal output unit 160, and the nth-frame audio signal s(n) stored in the input buffer memory 122 is moved to the output buffer memory 132 (S310). Then, when all of the audio signal s(n+1) of the next (n+1)th frame has been accumulated in the input buffer memory 122 (S312), the nth-frame audio signal s(n) is output unchanged from the output buffer memory 132 to the signal output unit 160 (S320), and the (n+1)th-frame audio signal s(n+1) stored in the input buffer memory 122 is moved to the output buffer memory 132.

Thereafter, the processing of S300 to S320 is repeated for the audio signal s(n+2) of the next frame of the input audio signal until the imaging and recording operation of the digital camera 1 ends (S322). In this way, noise detection is performed on the input audio signal frame by frame, interpolation (noise reduction) is applied where necessary, and an audio signal free of the noise is output in units of frames.
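The overall control flow can be sketched as a single loop over incoming frames. This is a rolled-up illustration, not a step-for-step transcription of FIG. 21: the noise detector and the two interpolation routines are passed in as hypothetical callables, the reference points are handed through unchanged (a real implementation would have to account for the buffers having advanced by one frame between the front and rear steps), and the flush of the last frame is an assumption.

```python
def process_stream(frames, detect, front_interp, rear_interp):
    """Frame loop with deferred rear interpolation.

    detect(in_buf) returns reference points or None (roughly S302/S304).
    front_interp(out_buf, in_buf, points) edits buffers in place (S306/S308).
    rear_interp(out_buf, in_buf, points) edits buffers in place (S316/S318).
    All three callables are hypothetical placeholders.
    """
    out_buf = None      # stands in for output buffer memory 132
    pending = None      # noise still waiting for its rear interpolation
    for in_buf in frames:                          # a full frame has arrived
        if pending is not None:
            rear_interp(out_buf, in_buf, pending)  # needs the newer frame
            pending = None
        points = detect(in_buf)
        if points is not None and out_buf is not None:
            front_interp(out_buf, in_buf, points)  # uses the older frame
            pending = points
        if out_buf is not None:
            yield out_buf                          # output the older frame
        out_buf = in_buf                           # move the newer frame over
    if out_buf is not None:
        yield out_buf                              # assumed final flush
```

Each frame is output exactly one frame after it arrives, and a noisy frame is output only after both its front and rear interpolation steps have run.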

[3.5. Effects]
The configuration of the audio signal processing apparatus 100 according to the third embodiment of the present disclosure, and the audio signal processing method using it, have been described above. In addition to the effects of the second embodiment described above, the third embodiment provides the following effects.

According to the third embodiment, detecting the noise reference points (the noise start point P_S, the noise intermediate point P_M, and the noise end point P_E) makes it possible to freely select any section of the audio signal before and after the noise on the basis of those reference points, independently of the frame boundaries of the audio signal, and to perform interpolation with it. That is, the front interpolation signal t(n) is generated from the signal in the section S_A immediately before the noise start point P_S and used to interpolate the noise front section S_F, and the rear interpolation signal u(n) is generated from the signal in the section S_B immediately after the noise end point P_E and used to interpolate the noise rear section S_R. Therefore, even when noise spans a plurality of frames, the interpolation can be carried out appropriately using the signals in the sections immediately before and after the noise section.

Furthermore, as in the second embodiment, interpolation is performed using the signals both before and after the noise section. This raises the accuracy of the interpolation, so the background sound can be reproduced with high accuracy while the noise is reduced, and the accuracy of the noise reduction processing is greatly improved.

In addition, as in the first and second embodiments, the buffer memory length required for estimating the interpolation signal in the third embodiment is only 2*N. Compared with the conventional interpolation method described above (see FIG. 1), which requires a buffer memory length of at least 3*N, the buffer memory required for interpolation can be greatly reduced.

Furthermore, as in the second embodiment, the delay of the output audio relative to the input audio can be held to one frame (delay amount: N), so the output delay caused by interpolation is reduced to half that of the conventional interpolation method.

The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings, but the present technology is not limited to these examples. It is clear that a person having ordinary knowledge in the technical field of the present disclosure could conceive of various changes or modifications within the scope of the technical idea described in the claims, and these are naturally understood to belong to the technical scope of the present disclosure.

For example, the above embodiments mainly use the digital camera 1 as an example of the audio signal processing apparatus and describe reducing mechanical sound when audio is recorded together with moving image capture, but the present technology is not limited to this example. The audio signal processing apparatus of the present technology can be applied to electronic devices such as various audio signal recording apparatuses and audio signal reproduction apparatuses. For example, it can be applied to any electronic device, including recording/reproducing apparatuses (for example, Blu-ray Disc/DVD recorders), television receivers, system stereo apparatuses, imaging apparatuses (for example, digital cameras and digital video cameras), portable terminals (for example, portable music/video players, portable game machines, and IC recorders), personal computers, game machines, car navigation devices, digital photo frames, home appliances, vending machines, ATMs, and kiosk terminals.

The above embodiments also describe an example in which the noise reduction processing is executed while the digital camera 1 records an audio signal. The present technology is not limited to this example, however. If the audio signal processing apparatus of the present technology is applied to an audio signal reproduction apparatus, noise included in the audio signal being reproduced can likewise be reduced appropriately when a recorded audio signal is played back.

The present technology may also be configured as follows.
(1) An audio signal processing apparatus including:
a first buffer memory that temporarily stores an input audio signal for each predetermined section;
a second buffer memory that temporarily stores the audio signal of the (n-1)th section, which immediately precedes the audio signal of the nth section stored in the first buffer memory;
an interpolation signal generation unit that, when noise is detected in the audio signal of the nth section, generates an interpolation signal from at least the audio signal of the (n-1)th section stored in the second buffer memory; and
a signal interpolation unit that uses the interpolation signal to interpolate the audio signal of the nth section that includes the noise.

(2) The audio signal processing apparatus according to (1), further including a noise reference point detection unit that detects a start point and an end point of the noise included in the audio signal,
wherein the interpolation signal generation unit includes
a first interpolation signal generation unit that generates a first interpolation signal from the audio signal of one or both of the (n-1)th section and the nth section, and
a second interpolation signal generation unit that generates a second interpolation signal from the audio signal of one or both of the (n+1)th section and the nth section,
wherein, when the audio signal of the nth section has been temporarily stored in the first buffer memory and noise is detected in at least the audio signal of the nth section, the first interpolation signal generation unit generates the first interpolation signal from a signal in a predetermined section before the start point, within the audio signal of the (n-1)th section stored in the second buffer memory and the audio signal of the nth section stored in the first buffer memory, and the signal interpolation unit uses the first interpolation signal to interpolate the signal of the front part of the noise within the audio signals of the (n-1)th and nth sections, and temporarily stores the audio signal of the nth section after interpolation by the first interpolation signal in the second buffer memory, and
wherein, when the audio signal of the (n+1)th section has been temporarily stored in the first buffer memory, the second interpolation signal generation unit generates the second interpolation signal from a signal in a predetermined section after the end point, within the audio signal of the nth section after interpolation by the first interpolation signal, which is stored in the second buffer memory, and the audio signal of the (n+1)th section stored in the first buffer memory, and the signal interpolation unit uses the second interpolation signal to interpolate the signal of the rear part of the noise within the audio signal of the nth section after interpolation by the first interpolation signal and the audio signal of the (n+1)th section, and outputs from the second buffer memory the audio signal of the nth section after interpolation by the first interpolation signal and the second interpolation signal.

(3) The audio signal processing apparatus according to (2),
wherein the noise reference point detection unit detects an intermediate point of the noise and detects the start point and the end point on the basis of the intermediate point,
wherein, when the audio signal of the nth section has been temporarily stored in the first buffer memory and noise is detected in at least the audio signal of the nth section, the first interpolation signal generation unit generates the first interpolation signal from a signal in a section that is located before the start point and whose length corresponds to the length between the start point and the intermediate point of the noise, within the audio signal of the (n-1)th section stored in the second buffer memory and the audio signal of the nth section stored in the first buffer memory, and the signal interpolation unit replaces the signal in the section from the start point to the intermediate point of the noise, within the audio signals of the (n-1)th and nth sections, with the first interpolation signal, and temporarily stores the audio signal of the nth section after replacement by the first interpolation signal in the second buffer memory, and
wherein, when the audio signal of the (n+1)th section has been temporarily stored in the first buffer memory, the second interpolation signal generation unit generates the second interpolation signal from a signal in a section that is located after the end point and whose length corresponds to the length between the intermediate point and the end point of the noise, within the audio signal of the nth section after replacement by the first interpolation signal, which is stored in the second buffer memory, and the audio signal of the (n+1)th section stored in the first buffer memory, and the signal interpolation unit replaces the signal in the section from the intermediate point to the end point of the noise, within the audio signal of the nth section after replacement by the first interpolation signal and the audio signal of the (n+1)th section, with the second interpolation signal, and outputs from the second buffer memory the audio signal of the nth section after replacement by the first interpolation signal and the second interpolation signal.

（４）前記補間信号生成部は、
前記ｎ−１番目の区間の音声信号から第１の仮補間信号を生成する第１の仮補間信号生成部と、
前記ｎ＋１番目の区間の音声信号から第２の仮補間信号を生成する第２の仮補間信号生成部と、
を備え、
前記ｎ番目の区間の音声信号が前記第１のバッファメモリに一時保存され、かつ、前記ｎ番目の区間の音声信号に雑音が含まれることが検出されたときに、前記第１の仮補間信号生成部は、前記第２のバッファメモリに保存されている前記ｎ−１番目の区間の音声信号から前記第１の仮補間信号を生成し、前記第１の仮補間信号を前記第２のバッファメモリに一時保存し、
前記ｎ＋１番目の区間の音声信号が前記第１のバッファメモリに一時保存されたときに、前記第２の仮補間信号生成部は、前記第１のバッファメモリに保存されている前記ｎ＋１番目の区間の音声信号から第２の仮補間信号を生成し、前記信号補間部は、前記第２の仮補間信号、及び前記第２のバッファメモリに保存されている前記第１の仮補間信号から前記補間信号を生成し、前記ｎ番目の区間の音声信号に換えて前記補間信号を前記第２のバッファメモリから出力する、前記（１）に記載の音声信号処理装置。 (4) The audio signal processing device according to (1), wherein the interpolation signal generation unit includes:
a first temporary interpolation signal generation unit that generates a first temporary interpolation signal from the audio signal of the (n-1)th section; and
a second temporary interpolation signal generation unit that generates a second temporary interpolation signal from the audio signal of the (n+1)th section,
wherein,
when the audio signal of the nth section is temporarily stored in the first buffer memory and it is detected that the audio signal of the nth section includes noise, the first temporary interpolation signal generation unit generates the first temporary interpolation signal from the audio signal of the (n-1)th section stored in the second buffer memory, and temporarily stores the first temporary interpolation signal in the second buffer memory, and
when the audio signal of the (n+1)th section is temporarily stored in the first buffer memory, the second temporary interpolation signal generation unit generates the second temporary interpolation signal from the audio signal of the (n+1)th section stored in the first buffer memory, and the signal interpolation unit generates the interpolation signal from the second temporary interpolation signal and the first temporary interpolation signal stored in the second buffer memory, and outputs the interpolation signal from the second buffer memory in place of the audio signal of the nth section.
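As a rough sketch of the configuration in (4), the first temporary interpolation signal p is built from section n-1 and held, the second temporary interpolation signal q is built from section n+1 when it arrives, and the two are merged into the interpolation signal v. The helper names, the time-reversed copy used to build p and q, and the linear crossfade used to merge them are all assumptions made for illustration; the text above states only that v is generated from p and q.

```python
from typing import List


def temporary_signal(section: List[float]) -> List[float]:
    """Hypothetical stand-in: a time-reversed copy of the neighbouring section."""
    return section[::-1]


def combine(p: List[float], q: List[float]) -> List[float]:
    """Linear crossfade from p to q (an assumed way of merging the two)."""
    if len(p) != len(q) or not p:
        raise ValueError("p and q must be non-empty and equally long")
    n = len(p)
    if n == 1:
        return [0.5 * (p[0] + q[0])]
    # Weight of p falls from 1 to 0 while the weight of q rises from 0 to 1.
    return [((n - 1 - i) * a + i * b) / (n - 1)
            for i, (a, b) in enumerate(zip(p, q))]


# Section n arrives with noise: p is built from section n-1 and held.
p = temporary_signal([1.0, 1.0, 1.0])
# Section n+1 arrives: q is built from it and merged with the stored p.
q = temporary_signal([3.0, 3.0, 3.0])
v = combine(p, q)
print(v)  # [1.0, 2.0, 3.0]
```

Because v depends on section n+1, the output for section n is delayed by one section in this arrangement.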

（５）前記ｎ番目の区間の音声信号が前記第１のバッファメモリに一時保存され、かつ、前記ｎ番目の区間の音声信号に雑音が含まれることが検出されたときに、前記補間信号生成部は、前記第２のバッファメモリに保存されている前記ｎ−１番目の区間の音声信号から前記補間信号を生成し、前記信号補間部は、前記第１のバッファメモリに保存されている前記ｎ番目の区間の音声信号に換えて前記補間信号を前記第１のバッファメモリから出力する、前記（１）に記載の音声信号処理装置。 (5) The audio signal processing device according to (1), wherein, when the audio signal of the nth section is temporarily stored in the first buffer memory and it is detected that the audio signal of the nth section includes noise, the interpolation signal generation unit generates the interpolation signal from the audio signal of the (n-1)th section stored in the second buffer memory, and the signal interpolation unit outputs the interpolation signal from the first buffer memory in place of the audio signal of the nth section stored in the first buffer memory.
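The flow in (5) can be sketched as a small streaming loop with two buffers. The sketch assumes equal-length sections held as Python lists, a caller-supplied `has_noise` detector, and a time-reversed copy of section n-1 as the interpolation signal; these names and the mirroring rule are hypothetical and stand in for the detection and generation units described above.

```python
from typing import Callable, Iterable, Iterator, List, Optional

Section = List[float]


def interpolation_from(prev: Section) -> Section:
    """Hypothetical generator: reuse section n-1, time-reversed."""
    return prev[::-1]


def process(sections: Iterable[Section],
            has_noise: Callable[[Section], bool]) -> Iterator[Section]:
    first: Optional[Section] = None    # first buffer: section n
    second: Optional[Section] = None   # second buffer: section n-1
    for incoming in sections:
        second = first                 # move section n-1 to the second buffer
        first = list(incoming)         # store section n in the first buffer
        # A noisy first section passes through: there is no section n-1 yet.
        if second is not None and has_noise(first):
            first = interpolation_from(second)
        yield first                    # output directly from the first buffer


# Assumed detector: any sample whose magnitude exceeds 10 counts as noise.
noisy = lambda s: max(abs(x) for x in s) > 10
out = list(process([[1.0, 2.0], [50.0, -50.0], [3.0, 4.0]], noisy))
print(out)  # [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0]]
```

Since only section n-1 is consulted, each section can be output as soon as it has been checked, without waiting for section n+1.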

（６）前記雑音は、前記音声信号を出力する収音部と同一の筐体に設けられた発音部から発生するパルス状の作動音である、前記（１）〜（５）のいずれか一項に記載の音声信号処理装置。 (6) The audio signal processing device according to any one of (1) to (5), wherein the noise is a pulsed operating sound generated by a sound generation unit provided in the same housing as a sound collection unit that outputs the audio signal.

（７）前記発音部は、前記収音部と同一の筐体に設けられた駆動装置であり、
前記作動音は、前記駆動装置の動作開始時又は動作終了時に発生するパルス状の機械駆動音である、前記（６）に記載の音声信号処理装置。 (7) The audio signal processing device according to (6), wherein the sound generation unit is a drive device provided in the same housing as the sound collection unit, and
the operating sound is a pulsed mechanical drive sound generated when the drive device starts or stops operating.

（８）前記音声信号の処理単位である前記所定区間の時間長は、前記パルス状の機械駆動音の時間長よりも長い、前記（１）〜（７）のいずれか一項に記載の音声信号処理装置。 (8) The audio signal processing device according to any one of (1) to (7), wherein the time length of the predetermined section, which is the processing unit of the audio signal, is longer than the time length of the pulsed mechanical drive sound.
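One plausible reading of the condition in (8) is that a pulse shorter than one section can overlap at most two consecutive sections, so the neighbouring sections n-1 and n+1 are enough to cover it. The short check below counts how many sections a pulse touches; the function name and the sample counts (a 1024-sample section and 480- or 1500-sample pulses) are hypothetical values chosen only for illustration.

```python
def sections_touched(start: int, pulse_len: int, section_len: int) -> int:
    """Number of consecutive sections overlapped by a pulse of pulse_len samples."""
    first = start // section_len
    last = (start + pulse_len - 1) // section_len
    return last - first + 1


SECTION = 1024  # assumed section length in samples

# Pulse shorter than a section: never more than two sections, whatever the offset.
print(max(sections_touched(s, 480, SECTION) for s in range(SECTION)))   # 2

# Pulse longer than a section: it can spill into a third section.
print(max(sections_touched(s, 1500, SECTION) for s in range(SECTION)))  # 3
```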

（９）外部音声を音声信号に変換する収音部と、
前記収音部と同一の筐体に設けられ、雑音を発生させる発音部と、
前記収音部から入力された前記音声信号を所定区間ごとに一時保存する第１のバッファメモリと、
前記第１のバッファメモリに保存されているｎ番目の区間の音声信号よりも１つ前のｎ−１番目の区間の音声信号を一時保存する第２のバッファメモリと、
前記ｎ番目の区間の音声信号に雑音が含まれることが検出されたときに、少なくとも前記第２のバッファメモリに保存されている前記ｎ−１番目の区間の音声信号から補間信号を生成する補間信号生成部と、
前記補間信号を用いて、前記雑音を含む前記ｎ番目の区間の音声信号を補間する信号補間部と、
を備える、撮像装置。 (9) An imaging apparatus including:
a sound collection unit that converts external sound into an audio signal;
a sound generation unit that is provided in the same housing as the sound collection unit and generates noise;
a first buffer memory that temporarily stores, for each predetermined section, the audio signal input from the sound collection unit;
a second buffer memory that temporarily stores the audio signal of the (n-1)th section, which immediately precedes the audio signal of the nth section stored in the first buffer memory;
an interpolation signal generation unit that, when it is detected that the audio signal of the nth section includes noise, generates an interpolation signal from at least the audio signal of the (n-1)th section stored in the second buffer memory; and
a signal interpolation unit that uses the interpolation signal to interpolate the audio signal of the nth section including the noise.

（１０）第１のバッファメモリに保存されているｎ−１番目の区間の音声信号を第２のバッファメモリに一時保存することと、
入力されるｎ番目の区間の音声信号を前記第１のバッファメモリに一時保存することと、
前記第１のバッファメモリに保存されている前記ｎ番目の区間の音声信号に雑音が含まれることが検出されたときに、少なくとも前記第２のバッファメモリに保存されている前記ｎ−１番目の区間の音声信号から補間信号を生成することと、
前記補間信号を用いて、前記雑音を含む前記ｎ番目の区間の音声信号を補間することと、
を含む、音声信号処理方法。 (10) An audio signal processing method including:
temporarily storing, in a second buffer memory, the audio signal of the (n-1)th section stored in a first buffer memory;
temporarily storing the input audio signal of the nth section in the first buffer memory;
when it is detected that the audio signal of the nth section stored in the first buffer memory includes noise, generating an interpolation signal from at least the audio signal of the (n-1)th section stored in the second buffer memory; and
interpolating the audio signal of the nth section including the noise using the interpolation signal.

（１１）第１のバッファメモリに保存されているｎ−１番目の区間の音声信号を第２のバッファメモリに一時保存することと、
入力されるｎ番目の区間の音声信号を前記第１のバッファメモリに一時保存することと、
前記第１のバッファメモリに保存されている前記ｎ番目の区間の音声信号に雑音が含まれることが検出されたときに、少なくとも前記第２のバッファメモリに保存されている前記ｎ−１番目の区間の音声信号から補間信号を生成することと、
前記補間信号を用いて、前記雑音を含む前記ｎ番目の区間の音声信号を補間することと、
をコンピュータに実行させるためのプログラム。 (11) A program that causes a computer to execute:
temporarily storing, in a second buffer memory, the audio signal of the (n-1)th section stored in a first buffer memory;
temporarily storing the input audio signal of the nth section in the first buffer memory;
when it is detected that the audio signal of the nth section stored in the first buffer memory includes noise, generating an interpolation signal from at least the audio signal of the (n-1)th section stored in the second buffer memory; and
interpolating the audio signal of the nth section including the noise using the interpolation signal.

（１２）第１のバッファメモリに保存されているｎ−１番目の区間の音声信号を第２のバッファメモリに一時保存することと、
入力されるｎ番目の区間の音声信号を前記第１のバッファメモリに一時保存することと、
前記第１のバッファメモリに保存されている前記ｎ番目の区間の音声信号に雑音が含まれることが検出されたときに、少なくとも前記第２のバッファメモリに保存されている前記ｎ−１番目の区間の音声信号から補間信号を生成することと、
前記補間信号を用いて、前記雑音を含む前記ｎ番目の区間の音声信号を補間することと、
をコンピュータに実行させるためのプログラムが記録された、コンピュータ読み取り可能な記録媒体。 (12) A computer-readable recording medium on which is recorded a program that causes a computer to execute:
temporarily storing, in a second buffer memory, the audio signal of the (n-1)th section stored in a first buffer memory;
temporarily storing the input audio signal of the nth section in the first buffer memory;
when it is detected that the audio signal of the nth section stored in the first buffer memory includes noise, generating an interpolation signal from at least the audio signal of the (n-1)th section stored in the second buffer memory; and
interpolating the audio signal of the nth section including the noise using the interpolation signal.

１デジタルカメラ
１０撮像部
１４駆動装置
１５ズームモータ
１６フォーカスモータ
５１マイクロホン
６０音声信号処理部
７０制御部
１００音声信号処理装置
１１０信号入力部
１２０入出力用バッファメモリ
１２２入力用バッファメモリ
１３０補間用バッファメモリ
１３２出力用バッファメモリ
１４０雑音検出部
１４２雑音基準点検出部
１５０雑音低減部
１５２補間信号生成部
１５４信号補間部
１５６第１の仮補間信号生成部
１５７第２の仮補間信号生成部
１５８前部補間信号生成部
１５９後部補間信号生成部
１６０信号出力部
ｓ音声信号
ｖ補間信号
ｐ第１の仮補間信号
ｑ第２の仮補間信号
ｔ前部補間信号
ｕ後部補間信号
Ｐ_Ｓ雑音開始点
Ｐ_Ｍ雑音中間点
Ｐ_Ｅ雑音終了点
Ｓ_Ｆ雑音前部区間
Ｓ_Ｒ雑音後部区間
Ｌ_Ｆ雑音前部区間長
Ｌ_Ｒ雑音後部区間長
DESCRIPTION OF SYMBOLS
1 Digital camera
10 Imaging unit
14 Drive device
15 Zoom motor
16 Focus motor
51 Microphone
60 Audio signal processing unit
70 Control unit
100 Audio signal processing device
110 Signal input unit
120 Input/output buffer memory
122 Input buffer memory
130 Interpolation buffer memory
132 Output buffer memory
140 Noise detection unit
142 Noise reference point detection unit
150 Noise reduction unit
152 Interpolation signal generation unit
154 Signal interpolation unit
156 First temporary interpolation signal generation unit
157 Second temporary interpolation signal generation unit
158 Front interpolation signal generation unit
159 Rear interpolation signal generation unit
160 Signal output unit
s Audio signal
v Interpolation signal
p First temporary interpolation signal
q Second temporary interpolation signal
t Front interpolation signal
u Rear interpolation signal
P_S Noise start point
P_M Noise intermediate point
P_E Noise end point
S_F Noise front section
S_R Noise rear section
L_F Noise front section length
L_R Noise rear section length

Claims

An audio signal processing apparatus comprising:
a first buffer memory that temporarily stores an input audio signal for each predetermined section;
a second buffer memory that temporarily stores the audio signal of the (n-1)th section, which immediately precedes the audio signal of the nth section stored in the first buffer memory;
an interpolation signal generation unit that, when it is detected that the audio signal of the nth section includes noise, generates an interpolation signal from at least the audio signal of the (n-1)th section stored in the second buffer memory; and
a signal interpolation unit that uses the interpolation signal to interpolate the audio signal of the nth section including the noise.

The audio signal processing apparatus according to claim 1, further comprising a noise reference point detection unit that detects a start point and an end point of the noise included in the audio signal, wherein
the interpolation signal generation unit includes:
a first interpolation signal generation unit that generates a first interpolation signal from one or both of the audio signals of the (n-1)th and nth sections; and
a second interpolation signal generation unit that generates a second interpolation signal from one or both of the audio signals of the nth and (n+1)th sections,
when the audio signal of the nth section is temporarily stored in the first buffer memory and it is detected that at least the audio signal of the nth section includes noise, the first interpolation signal generation unit generates the first interpolation signal from a signal in a predetermined section before the start point, taken from the audio signal of the (n-1)th section stored in the second buffer memory and the audio signal of the nth section stored in the first buffer memory, and the signal interpolation unit interpolates the signal of the front part of the noise in the audio signals of the (n-1)th and nth sections using the first interpolation signal, and temporarily stores the audio signal of the nth section after interpolation by the first interpolation signal in the second buffer memory, and
when the audio signal of the (n+1)th section is temporarily stored in the first buffer memory, the second interpolation signal generation unit generates the second interpolation signal from a signal in a predetermined section after the end point, taken from the audio signal of the nth section after interpolation by the first interpolation signal, stored in the second buffer memory, and the audio signal of the (n+1)th section stored in the first buffer memory, and the signal interpolation unit interpolates the signal of the rear part of the noise in the audio signal of the nth section after interpolation by the first interpolation signal and the audio signal of the (n+1)th section using the second interpolation signal, and outputs, from the second buffer memory, the audio signal of the nth section after interpolation by the first interpolation signal and the second interpolation signal.

The audio signal processing apparatus according to claim 2, wherein
the noise reference point detection unit detects an intermediate point of the noise and detects the start point and the end point based on the intermediate point,
when the audio signal of the nth section is temporarily stored in the first buffer memory and it is detected that at least the audio signal of the nth section includes noise, the first interpolation signal generation unit generates the first interpolation signal from a signal in a section that is located before the start point and has a length corresponding to the length between the start point and the intermediate point of the noise, taken from the audio signal of the (n-1)th section stored in the second buffer memory and the audio signal of the nth section stored in the first buffer memory, and the signal interpolation unit replaces, in the audio signals of the (n-1)th and nth sections, the signal in the section from the start point to the intermediate point of the noise with the first interpolation signal, and temporarily stores the audio signal of the nth section after replacement by the first interpolation signal in the second buffer memory, and
when the audio signal of the (n+1)th section is temporarily stored in the first buffer memory, the second interpolation signal generation unit generates the second interpolation signal from a signal in a section that is located after the end point and has a length corresponding to the length between the intermediate point and the end point of the noise, taken from the audio signal of the nth section after replacement by the first interpolation signal, stored in the second buffer memory, and the audio signal of the (n+1)th section stored in the first buffer memory, and the signal interpolation unit replaces, in the audio signal of the nth section after replacement by the first interpolation signal and the audio signal of the (n+1)th section, the signal in the section from the intermediate point to the end point of the noise with the second interpolation signal, and outputs, from the second buffer memory, the audio signal of the nth section after replacement by the first interpolation signal and the second interpolation signal.

The audio signal processing apparatus according to claim 1, wherein
the interpolation signal generation unit includes:
a first temporary interpolation signal generation unit that generates a first temporary interpolation signal from the audio signal of the (n-1)th section; and
a second temporary interpolation signal generation unit that generates a second temporary interpolation signal from the audio signal of the (n+1)th section,
when the audio signal of the nth section is temporarily stored in the first buffer memory and it is detected that the audio signal of the nth section includes noise, the first temporary interpolation signal generation unit generates the first temporary interpolation signal from the audio signal of the (n-1)th section stored in the second buffer memory, and temporarily stores the first temporary interpolation signal in the second buffer memory, and
when the audio signal of the (n+1)th section is temporarily stored in the first buffer memory, the second temporary interpolation signal generation unit generates the second temporary interpolation signal from the audio signal of the (n+1)th section stored in the first buffer memory, and the signal interpolation unit generates the interpolation signal from the second temporary interpolation signal and the first temporary interpolation signal stored in the second buffer memory, and outputs the interpolation signal from the second buffer memory in place of the audio signal of the nth section.

The audio signal processing apparatus according to claim 1, wherein, when the audio signal of the nth section is temporarily stored in the first buffer memory and it is detected that the audio signal of the nth section includes noise, the interpolation signal generation unit generates the interpolation signal from the audio signal of the (n-1)th section stored in the second buffer memory, and the signal interpolation unit outputs the interpolation signal from the first buffer memory in place of the audio signal of the nth section stored in the first buffer memory.

The audio signal processing apparatus according to claim 1, wherein the noise is a pulsed operation sound generated from a sound generation unit provided in the same housing as the sound collection unit that outputs the audio signal.

The audio signal processing apparatus according to claim 6, wherein the sound generation unit is a drive device provided in the same housing as the sound collection unit, and
the operating sound is a pulsed mechanical drive sound generated when the drive device starts or stops operating.

The audio signal processing device according to claim 1, wherein a time length of the predetermined section, which is a processing unit of the audio signal, is longer than a time length of the pulse-like mechanical drive sound.

An imaging apparatus comprising:
a sound collection unit that converts external sound into an audio signal;
a sound generation unit that is provided in the same housing as the sound collection unit and generates noise;
a first buffer memory that temporarily stores, for each predetermined section, the audio signal input from the sound collection unit;
a second buffer memory that temporarily stores the audio signal of the (n-1)th section, which immediately precedes the audio signal of the nth section stored in the first buffer memory;
an interpolation signal generation unit that, when it is detected that the audio signal of the nth section includes noise, generates an interpolation signal from at least the audio signal of the (n-1)th section stored in the second buffer memory; and
a signal interpolation unit that uses the interpolation signal to interpolate the audio signal of the nth section including the noise.

An audio signal processing method comprising:
temporarily storing, in a second buffer memory, the audio signal of the (n-1)th section stored in a first buffer memory;
temporarily storing the input audio signal of the nth section in the first buffer memory;
when it is detected that the audio signal of the nth section stored in the first buffer memory includes noise, generating an interpolation signal from at least the audio signal of the (n-1)th section stored in the second buffer memory; and
interpolating the audio signal of the nth section including the noise using the interpolation signal.

A program that causes a computer to execute:
temporarily storing, in a second buffer memory, the audio signal of the (n-1)th section stored in a first buffer memory;
temporarily storing the input audio signal of the nth section in the first buffer memory;
when it is detected that the audio signal of the nth section stored in the first buffer memory includes noise, generating an interpolation signal from at least the audio signal of the (n-1)th section stored in the second buffer memory; and
interpolating the audio signal of the nth section including the noise using the interpolation signal.

A computer-readable recording medium on which is recorded a program that causes a computer to execute:
temporarily storing, in a second buffer memory, the audio signal of the (n-1)th section stored in a first buffer memory;
temporarily storing the input audio signal of the nth section in the first buffer memory;
when it is detected that the audio signal of the nth section stored in the first buffer memory includes noise, generating an interpolation signal from at least the audio signal of the (n-1)th section stored in the second buffer memory; and
interpolating the audio signal of the nth section including the noise using the interpolation signal.