KR100612616B1

KR100612616B1 - Signal-to-Noise Ratio Estimation Method Using Zero Crossing Point and Sound Source Direction Detection Method

Info

Publication number: KR100612616B1
Application number: KR1020040035611A
Authority: KR
Inventors: 길이만; 김영익
Original assignee: 한국과학기술원
Priority date: 2004-05-19
Filing date: 2004-05-19
Publication date: 2006-08-17
Also published as: KR20050110790A

Abstract

The present invention relates to a method for estimating the signal-to-noise ratio and detecting the direction of a sound source using the variance of the zero-crossing time differences of speech signals received by two sensors.

The sound source direction detection method using zero crossings according to the present invention comprises: a step of receiving a signal output from the same sound source using the two sensors and separating the received signals into frequency channels; a ZCPA coding step of ZCPA-coding each frequency-separated channel signal to obtain zero crossings and peak values; an ITD calculation step of obtaining ITD values from the per-channel zero-crossing and peak information obtained in the ZCPA coding step; a variance calculation step of dividing each channel signal into a plurality of windows containing the same number of zero crossings and obtaining the variance of the ITD values for each channel and window; a signal-to-noise ratio calculation step of calculating the signal-to-noise ratio for each channel and window from the variance of the ITD values and the center frequency of the channel; a reliability enhancement step of extracting the ITD values whose signal-to-noise ratio exceeds a threshold; and a direction estimation step of building an azimuth (horizontal-angle) histogram of the ITD values weighted by the signal-to-noise ratio and taking the peak of the azimuth histogram as the spatial position of the sound source.

Sound source direction detection, zero crossing, ZCPA coding, time delay, signal-to-noise ratio estimation

Description

Signal-to-noise ratio estimation method and sound source localization method based on zero crossings

FIG. 1 is a block diagram illustrating a signal-to-noise ratio estimation method and a sound source direction detection method using zero crossings according to an embodiment of the present invention.

FIG. 2 is a conceptual diagram of ZCPA coding for the i-th channel signal.

FIG. 3 illustrates how a zero crossing shifts when noise is mixed in.

FIG. 4 summarizes the direction estimates of the cross-correlation-based direction detection method and of the direction detection method according to the present invention when the sound source is located at 20 degrees and 40 degrees in various background noises.

FIG. 5 compares the azimuth histograms of the cross-correlation-based direction detection method and of the direction detection method according to the present invention when the sound source is located at -10, 0, 10, and 40 degrees in various background noises.

The present invention relates to a signal-to-noise ratio estimation method and a sound source direction detection method using zero crossings and, more specifically, to a method of estimating the signal-to-noise ratio and detecting the direction of a sound source using the variance of the zero-crossing time differences of the speech signals arriving at two sensors.

Sound source direction detection refers to finding the direction of a sound source from the time delay and intensity difference of the speech signals arriving at two or more sensors, both of which depend on the spatial position of the source. The technique is applied to binaural automatic speech recognition systems, where it is used to focus attention on a source in a particular direction in noise, or to separate individual sources by their direction information when several sources are mixed. It is also applied to adaptive beamforming and directional filtering, where it is used to actively remove directional noise.

Most conventional direction detection techniques feed the filter-bank signals from two sensors into time delay lines, compute the cross-correlation for every time delay, and find the delay with the highest cross-correlation. This approach is difficult to implement, however, because cross-correlating the speech signals over all time delays requires a large amount of computation. In addition, a long time window favors an accurate estimate of the time delay of a source, whereas a short time window favors detecting the direction of a local source; because the conventional method uses a fixed-size time window to compute the cross-correlation, choosing the window size is difficult. As a result, cross-correlation-based direction detection generally loses accuracy when noise is mixed in and suffers from severe interference between sources when several sources are present.

Most binaural direction detection techniques are based on cross-correlation with time delay lines, as proposed by L. Jeffress in [A place theory of sound localization, J. Comp. Physiology and Psychology, 41: 35-39, 1948].

R. Stern and H. Colburn, in [Theory of binaural interaction based on auditory-nerve data: IV. A model of subjective lateral position, J. Acoust. Soc. Amer., 64(1): 127-140, 1978], quantitatively derive the ITD (interaural time difference) and the IID (interaural intensity difference), the representative directional cues of a sound source, on the basis of the cross-correlation proposed by Jeffress.

S. A. Shamma, N. Shen, and P. Gopalaswamy, in [Stereausis: binaural processing without neural delays, J. Acoust. Soc. Amer., 86(3): 989-1006, 1989], propose a technique that obtains the direction information of a sound source from the time delays arising in analog electronic circuits, without using artificial time delay lines.

J. Breebaart, S. Par, and A. Kohlrausch, in [Binaural processing model based on contralateral inhibition. I. Model structure, J. Acoust. Soc. Amer., 110(2): 1074-1088, 2001], depart from the cross-correlation-based approach and explain lateralization by applying the two principles of inhibition and excitation to the signals at the two ears.

Beamforming delays the signals of several antennas by different amounts of time and then adds them with weights, so that only the signal arriving from the desired direction is obtained. L. J. Griffiths, in [A simple adaptive algorithm for real-time processing in antenna arrays, Proc. IEEE, 57: 1696-1704, 1969], obtains the weights from the least mean square value of the error signal, without the aid of a pilot signal, when the cross-correlation between the desired signal and the weights is known in advance.

As a directional filtering technique, T. Wittkop, in [Two-channel noise reduction algorithms motivated by models of binaural interaction, Ph.D. thesis, Univ. Oldenburg, 2001], separates the sensor signals corresponding to the two ears into frequency bands with the STFT (short-time Fourier transform), cross-correlates them to obtain direction information, weights the signal differently according to direction, and then restores the sound source of a particular direction with the I-STFT (inverse STFT). The technique is used in hearing aids for hearing-impaired people with good results, but accurate detection of the source direction is difficult, and the parameter values needed to build the direction-dependent weighting function are hard to determine experimentally.

The present invention has been devised to solve the above problems of the prior art, and its object is to provide a method of estimating the signal-to-noise ratio and detecting the direction of a sound source using the zero crossings and peak amplitudes (ZCPA) of the output signals of a filter bank.

To achieve the above object, the signal-to-noise ratio estimation method using zero crossings according to the present invention is performed in an apparatus comprising two sensors corresponding to the two ears of a person, a filter bank that separates the signals received from the two sensors into frequency channels, and a processor that calculates the signal-to-noise ratio contained in each channel signal frequency-separated by the filter bank, and is characterized by comprising:

a step of receiving a signal output from the same sound source using the two sensors and separating the received signals into frequency channels;

a ZCPA coding step of ZCPA (zero-crossing and peak amplitudes) coding each frequency-separated channel signal to obtain zero crossings and peak values;

an ITD calculation step of obtaining ITD (interaural time difference) values from the per-channel zero-crossing and peak information obtained in the ZCPA coding step;

a variance calculation step of dividing each channel signal into a plurality of windows containing the same number of zero crossings and obtaining the variance of the ITD values for each channel and window; and

a signal-to-noise ratio calculation step of calculating the signal-to-noise ratio for each channel and window from the variance of the ITD values and the center frequency of the channel.

To achieve the above object, the sound source direction detection method using zero crossings according to the present invention is performed in an apparatus comprising two sensors corresponding to the two ears of a person that receive a signal output from the same sound source, a filter bank that separates the signals received from the two sensors into frequency channels, and a processor that detects the direction of the sound source using each channel signal frequency-separated by the filter bank, and is characterized by comprising:

a step of receiving a signal output from the same sound source using the two sensors and separating the received signals into frequency channels;

a signal-to-noise ratio calculation step of calculating the signal-to-noise ratio for each channel and window from the variance of the ITD values and the center frequency of the channel;

a reliability enhancement step of extracting the ITD values whose signal-to-noise ratio exceeds a threshold; and

a direction estimation step of building an azimuth histogram of the ITD values weighted by the signal-to-noise ratio and taking the peak of the azimuth histogram as the spatial position of the sound source.

According to the present invention, there is provided a computer-readable recording medium storing a program for executing a signal-to-noise ratio estimation method using zero crossings in a system that receives a signal output from the same sound source with two sensors corresponding to the two ears of a person, separates it into frequency channels, and calculates the signal-to-noise ratio contained in each channel signal,
the signal-to-noise ratio estimation method using zero crossings comprising:
a ZCPA coding step of ZCPA (zero-crossing and peak amplitudes) coding each frequency-separated channel signal to obtain zero crossings and peak values;
an ITD calculation step of obtaining ITD (interaural time difference) values from the per-channel zero-crossing and peak information obtained in the ZCPA coding step;
a variance calculation step of dividing each channel signal into a plurality of windows containing the same number of zero crossings and obtaining the variance of the ITD values for each channel and window; and
a signal-to-noise ratio calculation step of calculating the signal-to-noise ratio for each channel and window from the variance of the ITD values and the center frequency of the channel.

Further, according to the present invention, there is provided a computer-readable recording medium storing a program for executing a sound source direction detection method using zero crossings in a system that receives a signal output from the same sound source with two sensors corresponding to the two ears of a person, separates it into frequency channels, and detects the direction of the sound source using each channel signal,
the sound source direction detection method using zero crossings comprising:
a ZCPA coding step of ZCPA (zero-crossing and peak amplitudes) coding each frequency-separated channel signal to obtain zero crossings and peak values;
an ITD calculation step of obtaining ITD (interaural time difference) values from the per-channel zero-crossing and peak information obtained in the ZCPA coding step;
a variance calculation step of dividing each channel signal into a plurality of windows containing the same number of zero crossings and obtaining the variance of the ITD values for each channel and window;
a signal-to-noise ratio calculation step of calculating the signal-to-noise ratio for each channel and window from the variance of the ITD values and the center frequency of the channel;
a reliability enhancement step of extracting the ITD values whose signal-to-noise ratio exceeds a threshold; and
a direction estimation step of building an azimuth histogram of the ITD values weighted by the signal-to-noise ratio and taking the peak of the azimuth histogram as the spatial position of the sound source.

Hereinafter, the signal-to-noise ratio estimation method and the sound source direction detection method using zero crossings according to the present invention are described in more detail with reference to the accompanying drawings.

The present invention uses a speech signal coding method based on ZCPA (zero-crossing and peak amplitudes) to estimate the signal-to-noise ratio contained in a sound source and to detect the direction of the source. The inventors (D. S. Kim, S. Y. Lee, and R. M. Kil) proposed the ZCPA speech coding method in [Auditory processing of speech signals for robust speech recognition in real-world noisy environments, IEEE Trans. Speech and Audio Proc., 7(1): 55-69, 1999]. The ZCPA coding method is inspired by the way the auditory nervous system codes the firing pattern of a sound, and it yields better recognition results in noisy environments than the established speech feature extraction methods LPCC (Linear Prediction Cepstrum Coefficient) and MFCC (Mel-Frequency Cepstrum Coefficient).

FIG. 1 is a block diagram illustrating the signal-to-noise ratio estimation method and the sound source direction detection method using zero crossings according to the present invention.

To detect the direction of the sound source (θ_S), signals are received from two sensors (not shown) corresponding to the two ears of a person, and two filter banks (111, 112) separate the signals received from the sensors into frequency channels. Each channel signal is then ZCPA coded (121, 122), and the ITD (interaural time difference) and IID (interaural intensity difference) values, the representative directional cues, are calculated from the ZCPA coding results (131). The calculated ITD values are used to calculate the signal-to-noise ratio (SNR) of the signal (141). The ITD values whose signal-to-noise ratio exceeds a threshold are selected (151), a histogram is built from the selected ITD values with weights applied (161), and the direction of the sound source is found by locating the peak of the histogram (171).
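The last three stages of FIG. 1 (selection 151, weighted histogram 161, peak search 171) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name, the far-field conversion sin(θ) = c·ITD/d from ITD to azimuth, the sensor spacing, the bin width, and the use of linear-scale SNR as the histogram weight are all assumptions, since the text here does not specify them.

```python
import numpy as np

def estimate_azimuth_deg(itd_s, snr_db, snr_threshold_db=0.0,
                         sensor_distance_m=0.17, speed_of_sound=343.0,
                         bin_width_deg=5.0):
    """Return the center of the peak bin of an SNR-weighted azimuth histogram.

    itd_s  : ITD values in seconds (all channels and windows pooled)
    snr_db : estimated SNR in dB attached to each ITD value
    """
    itd_s = np.asarray(itd_s, dtype=float)
    snr_db = np.asarray(snr_db, dtype=float)

    # Reliability step (151): keep only ITD values with SNR above the threshold.
    keep = snr_db > snr_threshold_db
    if not np.any(keep):
        raise ValueError("no ITD value passes the SNR threshold")

    # ASSUMPTION: far-field, free-field model sin(theta) = c * ITD / d.
    sin_theta = np.clip(speed_of_sound * itd_s[keep] / sensor_distance_m, -1.0, 1.0)
    theta_deg = np.degrees(np.arcsin(sin_theta))

    # ASSUMPTION: linear-scale SNR is used as the weight (always positive).
    weights = 10.0 ** (snr_db[keep] / 10.0)

    # Weighted histogram (161) and peak search (171).
    edges = np.arange(-90.0, 90.0 + bin_width_deg, bin_width_deg)
    hist, _ = np.histogram(theta_deg, bins=edges, weights=weights)
    peak = int(np.argmax(hist))
    return 0.5 * (edges[peak] + edges[peak + 1])
```

With these choices, low-SNR ITD values are discarded before the histogram is built, and the remaining values contribute in proportion to their estimated reliability, so a few clean windows can outweigh many noisy ones.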

Each step is described in detail below.

1. ZCPA coding

First, signals are received from the two sensors; each received signal consists of the source signal plus interference noise, background noise, and the like. The signals received by the sensors are separated into frequency channels by the two filter banks (111, 112).

Each frequency-separated channel signal is ZCPA coded. As mentioned above, the ZCPA coding method is described in detail in the paper by the inventors (D. S. Kim, S. Y. Lee, and R. M. Kil) [Auditory processing of speech signals for robust speech recognition in real-world noisy environments, IEEE Trans. Speech and Audio Proc., 7(1): 55-69, 1999], so only a brief description is given in this specification.

ZCPA coding of a speech signal represents a channel signal that has passed through the filter bank by its upward zero crossings and by the peak value of the signal between adjacent zero crossings. Expressed formally, if the zero crossings of the i-th channel signal x_i(t) of the filter bank are t_1, t_2, ..., t_N, the ZCPA code is given by Equation 1.

FIG. 2 is a conceptual diagram of ZCPA coding for the i-th channel signal. That is, at the n-th zero crossing the ZCPA code outputs the maximum of x_i(t) over the interval from t_{n-1} to t_n, and it outputs 0 everywhere else.
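The description of FIG. 2 can be sketched for a sampled channel signal as follows. This is a minimal sketch under assumptions: the function name `zcpa_code`, the sampling rate `fs`, and the linear interpolation used to place crossings between samples are illustrative choices, not details given in the text.

```python
import numpy as np

def zcpa_code(x, fs):
    """ZCPA-style coding of one band-pass channel signal.

    Returns (times, peaks): times[j] is an upward zero-crossing time in
    seconds, and peaks[j] is the maximum of x between the previous upward
    zero crossing and this one.
    """
    x = np.asarray(x, dtype=float)

    # Indices k where the signal goes from negative to non-negative
    # between samples k and k + 1 (an upward zero crossing).
    idx = np.flatnonzero((x[:-1] < 0) & (x[1:] >= 0))

    # ASSUMPTION: sub-sample crossing time by linear interpolation.
    frac = -x[idx] / (x[idx + 1] - x[idx])
    times = (idx + frac) / fs

    # Peak of the signal in each interval between consecutive crossings,
    # reported at the crossing that closes the interval.
    peaks = np.array([x[a + 1:b + 1].max() for a, b in zip(idx[:-1], idx[1:])])

    # The first crossing has no preceding interval, so it carries no peak.
    return times[1:], peaks
```

For a pure tone, the returned crossing times are spaced by one period and every peak is close to the tone amplitude; everything between crossings is implicitly coded as 0.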

Consider the ZCPA-coded signals of the i-th channel at the left and right ears of a person, where n (= 1, 2, ..., N) and m (= 1, 2, ..., M) index the zero crossings occurring in the i-th channel of the left and right ears, respectively.

2. Calculation of ITD and IID values

Once the ZCPA codes of the channels at the two ears are obtained, the zero crossings of the two channels are applied to Equation 2 to compute the ITD value. That is, for the n-th zero crossing of the left channel, the closest zero crossing of the right channel (the k-th) is found, and the difference between the two zero-crossing times is taken as the ITD value.

Here, the right-channel zero crossing k closest to the n-th zero crossing of the left channel is obtained as in Equation 3.
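The nearest-crossing rule described above can be sketched as follows. The function name and the sign convention (left time minus right time) are assumptions; the text only states that the ITD is the difference between the two zero-crossing times.

```python
import numpy as np

def nearest_crossing_itd(t_left, t_right):
    """For each left-channel zero crossing, find the nearest right-channel
    zero crossing and return (itd, k), where k holds the matched indices.

    t_right must be sorted in ascending order.
    ASSUMPTION: ITD is defined as t_left[n] - t_right[k].
    """
    t_left = np.asarray(t_left, dtype=float)
    t_right = np.asarray(t_right, dtype=float)

    # Position where each left crossing would be inserted into t_right.
    pos = np.searchsorted(t_right, t_left)

    # The nearest neighbor is either the crossing just before or just after.
    lo = np.clip(pos - 1, 0, len(t_right) - 1)
    hi = np.clip(pos, 0, len(t_right) - 1)
    pick_hi = np.abs(t_right[hi] - t_left) < np.abs(t_right[lo] - t_left)
    k = np.where(pick_hi, hi, lo)

    return t_left - t_right[k], k
```

Because both crossing lists are already time-ordered, the search costs a binary search per crossing rather than a correlation over all candidate delays.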

The ITD value obtained in this way corresponds to the i-th frequency channel and the n-th zero-crossing time of the speech signal in the frequency-time domain. The corresponding IID value is obtained from the peak values as in Equation 4.

Equations 2 and 3 obtain the ITD value from the right-channel zero crossing closest to the n-th left-channel zero crossing. For a high-frequency signal, however, one period (one wavelength of the signal) is shorter than the time delay between the left and right channels, so an ITD value taken from the nearest zero crossing alone does not give the correct time delay between the two channels. Therefore, the number of ITD values computed takes into account both the maximum time delay that can arise from the spatial position of the sound source and the period of the channel's center frequency. That is, when the maximum time delay is a times the period of the channel's center frequency, the 2a right-channel zero crossings adjacent to a given left-channel zero crossing are found, and 2a ITD values are computed from the differences.
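The 2a-candidate rule can be sketched as follows. Several details are assumptions made for illustration: the function and parameter names, rounding a up to an integer of at least 1, choosing the 2a right-channel crossings by absolute time distance, and the left-minus-right sign convention.

```python
import math
import numpy as np

def candidate_itds(t_left_n, t_right, max_delay_s, center_freq_hz):
    """Return the ITD candidates for one left-channel zero crossing.

    a is the maximum delay expressed in periods of the center frequency;
    the 2a right-channel crossings nearest to t_left_n are used.
    """
    # ASSUMPTION: a is rounded up to an integer and is at least 1.
    a = max(1, math.ceil(max_delay_s * center_freq_hz))

    t_right = np.asarray(t_right, dtype=float)

    # Indices of the 2a right-channel crossings closest in time.
    nearest = np.argsort(np.abs(t_right - t_left_n))[: 2 * a]
    k = np.sort(nearest)  # restore time order

    # ASSUMPTION: ITD is left time minus right time.
    return t_left_n - t_right[k]
```

For example, with a 2 kHz channel and a maximum delay of 0.75 ms, a rounds up to 2, so each left-channel crossing yields four candidate ITD values spaced one period (0.5 ms) apart.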

3. Signal-to-noise ratio calculation

From the ITD values obtained in this way, s zero crossings are used as a time window, and the mean and variance of the ITD values in each window are computed. In the present invention, 21 zero crossings are chosen as the time window in consideration of human auditory perception. Defining the time window by the number of zero crossings makes the window wider in low-frequency channels and narrower in high-frequency channels, which matches the characteristics of human auditory perception.
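A minimal sketch of the per-window statistics follows. The function name, the use of non-overlapping windows, the dropping of a trailing incomplete window, and the unbiased variance estimator (`ddof=1`) are assumptions; the text only fixes the window length at s = 21 zero crossings.

```python
import numpy as np

def windowed_itd_stats(itd, s=21):
    """Split the ITD sequence of one channel into windows of s zero
    crossings and return (means, variances), one entry per window.

    ASSUMPTION: windows do not overlap, and a trailing incomplete
    window is discarded.
    """
    itd = np.asarray(itd, dtype=float)
    n_win = len(itd) // s
    blocks = itd[: n_win * s].reshape(n_win, s)

    # ASSUMPTION: unbiased sample variance (ddof=1).
    return blocks.mean(axis=1), blocks.var(axis=1, ddof=1)
```

If a channel produces roughly one upward zero crossing per period of its center frequency, a 21-crossing window spans about 0.21 s at 100 Hz but only about 5 ms at 4 kHz, which is the frequency-dependent window width the text describes.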

First, consider how the mean and variance are obtained from the ITD values. Because the ITD value is distorted by noise, it can be used to estimate the ratio in which the speech signal and the noise are mixed, that is, the signal-to-noise ratio. In general, the way this value is distorted by noise such as environmental factors or measurement error is expressed as in Equation 5.

Here, the two undistorted terms are the zero crossings in the absence of noise, and the two error terms are the distortions of the zero crossings caused by noise, which are assumed to be independent and identically distributed with zero mean and a common variance. If Δ denotes the time delay between the signals at the two ears that results from the spatial position of the sound source, the noise-free zero crossing of the left channel is expressed as in Equation 6.

Assuming that the noise-induced zero-crossing distortions of the two channels are independent of each other and have zero mean, the mean of the ITD value is Δ and its variance is twice the variance of a single zero-crossing distortion.
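The step from the noise model to this mean and variance can be written out as follows. The notation is assumed for illustration (the original symbols did not survive in this text): t_{L,n} and t_{R,k} are the measured zero crossings, the barred quantities are their noise-free values, and ε denotes the noise-induced shifts with variance σ_ε².

```latex
% Assumed notation, not taken from the source.
\begin{aligned}
t_{L,n} &= \bar t_{L,n} + \epsilon_{L,n}, \qquad
t_{R,k} = \bar t_{R,k} + \epsilon_{R,k}, \qquad
\bar t_{L,n} = \bar t_{R,k} + \Delta, \\
\mathrm{ITD}_n &= t_{L,n} - t_{R,k} = \Delta + \epsilon_{L,n} - \epsilon_{R,k}, \\
\mathrm{E}[\mathrm{ITD}_n] &= \Delta + \mathrm{E}[\epsilon_{L,n}] - \mathrm{E}[\epsilon_{R,k}] = \Delta, \\
\mathrm{Var}[\mathrm{ITD}_n] &= \mathrm{Var}[\epsilon_{L,n}] + \mathrm{Var}[\epsilon_{R,k}] = 2\sigma_\epsilon^2 .
\end{aligned}
```

The variance line uses the independence of the two shifts, so no covariance term appears.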

Next, consider the variance of the ITD values in a noisy channel.

The channel under consideration is assumed to be an ideal band-pass filter with center frequency ω_c and bandwidth 2b. When the filter bandwidth is sufficiently small, the i-th channels of the left and right ears carry the signals given in Equation 7.

Here, the two noise terms are white noise with mean 0 and variance 1, and A and B correspond to the strengths of the signal and the noise, respectively. ZCPA coding the two signals of Equation 7 yields the zero crossings of the two channels, and the mean and variance of the resulting ITD values are again Δ and twice the variance of a single zero-crossing distortion. At a zero crossing, the signals satisfy the condition of Equation 8, and FIG. 3 shows how a zero crossing shifts because of noise.

Here, the random variables x_n and y_n are defined as in Equation 9.

Since the random variable [...] can be regarded as a function of [...], the relations [...] and [...] hold. The case [...] is excluded from consideration, because a zero crossing may then not occur in the interval -π/2 ≤ x_n ≤ +π/2.

From the nonlinear relationship between the random variables x_n and y_n, the probability distribution function of y_n is obtained as in Equation 10.

If r_n follows a normal distribution, x_n also follows a normal distribution, and the probability distribution function then takes the form of Equation 11.

The variance of the random variable x_n is then obtained as in Equation 12.

From the condition on the range of the random variable y_n, the mean of y_n is obtained as in Equation 13.

Substituting y = sin θ, the mean of y_n in Equation 13 can be written as Equation 14.

The mean function of y_n given above is an odd function, and the variance of y_n is then obtained as in Equation 15.

This expression uses the fact that, in general, σ_x ≪ π/2. Using the assumption that v(t_n) is white noise, Equation 16 is then derived.

The signal-to-noise ratio (SNR) of the signal that has passed through the i-th channel is given by Equation 17.

Therefore, the variance of the random variable y_n satisfies Equation 18.

Accordingly, the variance of the random variable x_n is given by Equation 19.

Now, using the relationship of Equation 20 between the variance of the random variable x_n and the variance of the ITD value, the variance of the ITD value is obtained as in Equation 21 below.

Finally, the relationship between the ITD value and the signal-to-noise ratio (SNR) is given by Equation 22 below.
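The chain from Equation 7 to Equation 22 can be checked numerically. The sketch below rests on an assumed reading of those equations, which are not reproduced in this text: each channel carries A·sin(ω_c·t) + B·v(t) with v a unit-variance Gaussian noise, the SNR is taken as A²/(2B²), and the resulting relation is assumed to be Var[ITD] ≈ 1/(ω_c²·SNR). The function names, the Gaussian choice for v, and the numerical values are illustrative only.

```python
import math
import random


def simulate_itds(a, b, omega_c, delta, n_crossings, seed=0):
    """Draw ITD values under the assumed small-noise zero-crossing model.

    Assumption: a zero crossing of a*sin(omega_c*t) + b*v near t = 0 moves to
    t_n = -asin((b/a)*v) / omega_c, with v ~ N(0, 1) independent per channel.
    """
    rng = random.Random(seed)
    ratio = b / a
    itds = []
    while len(itds) < n_crossings:
        y_left = ratio * rng.gauss(0.0, 1.0)
        y_right = ratio * rng.gauss(0.0, 1.0)
        # Draws with |y| > 1 have no zero crossing in [-pi/2, +pi/2]; skip them.
        if abs(y_left) > 1.0 or abs(y_right) > 1.0:
            continue
        t_left = -math.asin(y_left) / omega_c
        t_right = -math.asin(y_right) / omega_c
        itds.append(delta + t_left - t_right)
    return itds


a, b = 1.0, 0.05                      # signal and noise strengths (illustrative)
omega_c = 2.0 * math.pi * 500.0       # 500 Hz channel (illustrative)
delta = 2.0e-4                        # true ITD in seconds (illustrative)

itds = simulate_itds(a, b, omega_c, delta, n_crossings=100_000)
mean_itd = sum(itds) / len(itds)
var_itd = sum((x - mean_itd) ** 2 for x in itds) / len(itds)

true_snr_db = 10.0 * math.log10(a ** 2 / (2.0 * b ** 2))
# Assumed closed form: SNR = 1 / (omega_c^2 * Var[ITD]).
est_snr_db = 10.0 * math.log10(1.0 / (omega_c ** 2 * var_itd))
print(f"true SNR {true_snr_db:.2f} dB, estimated from ITD variance {est_snr_db:.2f} dB")
```

Under these assumptions the sample mean of the ITD values stays near Δ while their variance alone carries the SNR information, which is the property the later steps rely on.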

In the present invention, using the expressions derived above, the mean and variance of the local ITD values are computed over a time window of s zero crossing points. This mean and variance are given by Equation 23 below.

Here, i (= 1, ..., P) is the channel number of the filter bank, and j (= 1, ..., N-s+1) is the zero crossing point number. The signal-to-noise ratio SNR_i(j) at the j-th zero crossing point of the i-th channel is given by Equation 24 below.

Here, ω_C is the center frequency of the channel.

The ITD value and IID value located at the center of each time window are taken as the ITD value and IID value of that window, as expressed by Equations 25 and 26. The number s of zero crossing points in each time window is assumed to be odd.
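The windowing described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the variance is normalised by s (the normalisation in Equation 23 is not visible in this text), and the SNR expression 10·log10(1/(ω_C²·variance)) is an assumed form of Equation 24.

```python
import math


def local_itd_statistics(itds, s):
    """Slide a window of s consecutive ITD values (s odd) over one channel.

    Returns one (center_itd, mean, variance) tuple per window, that is
    len(itds) - s + 1 tuples, matching j = 1, ..., N-s+1 in the text.
    """
    if s < 1 or s % 2 == 0:
        raise ValueError("s must be a positive odd number")
    half = s // 2
    stats = []
    for j in range(len(itds) - s + 1):
        window = itds[j:j + s]
        mean = sum(window) / s
        # Population variance (divide by s): an assumption, see the lead-in.
        variance = sum((x - mean) ** 2 for x in window) / s
        # The ITD at the window center represents the whole window.
        stats.append((window[half], mean, variance))
    return stats


def snr_db(variance, omega_c):
    """Assumed form of the SNR estimate, in dB."""
    if variance <= 0.0:
        return math.inf
    return 10.0 * math.log10(1.0 / (omega_c ** 2 * variance))


itds = [1.0e-4, 2.0e-4, 3.0e-4, 2.0e-4, 1.0e-4]   # seconds, illustrative
omega_c = 2.0 * math.pi * 500.0                    # rad/s, illustrative
stats = local_itd_statistics(itds, s=3)
snrs = [snr_db(var, omega_c) for _, _, var in stats]
```

Each window thus yields one representative ITD value and one SNR estimate, which are the inputs to the weighted histogram of the next section.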

4. Weighted histogram of ITD and IID values

To select reliable data among the ITD and IID values obtained through Equation 24, only the ITD and IID values whose signal-to-noise ratio (SNR) estimate exceeds a threshold (τ) are retained. Using this SNR estimate as the weight, a histogram of the ITD values, or of the ITD and IID values together, is generated.

5. Direction detection

The peak of the histogram of ITD and IID values is taken as the ITD and IID values produced by the actual spatial position of the sound source.
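Sections 4 and 5 can be sketched for the ITD-only case as below. The function names, the bin layout, and the use of the SNR value itself as the weight (with a non-negative threshold, so that all weights are positive) are assumptions for illustration. Converting the peak ITD to a horizontal angle requires the per-channel ITD-to-angle relationship, which is not covered here.

```python
import bisect


def weighted_itd_histogram(itds, snrs, threshold, bin_edges):
    """Accumulate SNR-weighted counts of the ITD values whose SNR exceeds threshold.

    bin_edges must be sorted ascending; bin k covers [bin_edges[k], bin_edges[k+1]).
    """
    counts = [0.0] * (len(bin_edges) - 1)
    for itd, snr in zip(itds, snrs):
        if snr <= threshold:
            continue  # reliability step: discard low-SNR estimates
        k = bisect.bisect_right(bin_edges, itd) - 1
        if 0 <= k < len(counts):
            counts[k] += snr  # the SNR estimate acts as the weight
    return counts


def histogram_peak(counts, bin_edges):
    """Return the center of the bin with the largest weighted count."""
    k = max(range(len(counts)), key=counts.__getitem__)
    return 0.5 * (bin_edges[k] + bin_edges[k + 1])


# Illustrative values: ITDs in milliseconds, SNR estimates in dB.
itds = [0.10, 0.11, 0.12, 0.30, 0.31, -0.20]
snrs = [10.0, 12.0, 3.0, 20.0, 2.0, 15.0]
bin_edges = [-0.5, -0.25, 0.0, 0.25, 0.5]

counts = weighted_itd_histogram(itds, snrs, threshold=5.0, bin_edges=bin_edges)
peak_itd = histogram_peak(counts, bin_edges)
```

A joint ITD and IID histogram would use the same accumulation over a two-dimensional grid of bins.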

[Experimental results]

To demonstrate the effect of the present invention, the following experimental environment is set up. First, an HRTF (Head-Related Transfer Function) database is applied to a sound source to generate the sound signals that reach the two ears from a specific position in space. As the cochlear filter, a gammatone filter bank with six channels between 0.2 kHz and 1.2 kHz is constructed. Because the ITD value corresponding to a given horizontal angle (azimuth angle) of the sound source differs from channel to channel, the functional relationship between horizontal angle and ITD value is determined for each channel from the HRTF database, using tone pulses at the center frequency of the channel.

Under this experimental environment, the conventional cross-correlation-based direction detection method is compared with the direction detection method according to the present invention.

In the cross-correlation-based direction detection method, the time window for computing the cross-correlation is fixed at 20 msec in every channel, and an ITD value is computed every 10 msec. The final direction of the sound source is determined from the peak values of the horizontal angle histogram built from the computed ITD values. In the direction detection method according to the present invention, the ITD values are likewise computed per channel, from the zero crossing points, and a time window of 21 zero crossing points is used for SNR estimation. This window size is related to human perception: high-frequency channels use only short spans of 5 to 20 msec, whereas low-frequency channels use longer spans of 50 to 140 msec. In other words, the cross-correlation-based method uses time windows of a fixed size, whereas in the method according to the present invention the duration of the time window varies with frequency in accordance with human auditory perception characteristics.
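The frequency dependence of the window duration follows from counting zero crossings instead of samples. The short calculation below illustrates this under two assumptions that are not stated in the text: only one zero crossing per period is counted (for example, upward crossings only), and the channel signal oscillates at its center frequency. The function name is hypothetical.

```python
def window_duration_ms(num_crossings, center_freq_hz, crossings_per_period=1):
    """Approximate time spanned by num_crossings zero crossings, in milliseconds."""
    intervals = num_crossings - 1  # s crossings bound s - 1 intervals
    return 1000.0 * intervals / (crossings_per_period * center_freq_hz)


low_channel_ms = window_duration_ms(21, 200.0)    # lowest channel edge, 0.2 kHz
high_channel_ms = window_duration_ms(21, 1200.0)  # highest channel edge, 1.2 kHz
print(f"0.2 kHz: {low_channel_ms:.1f} msec, 1.2 kHz: {high_channel_ms:.1f} msec")
```

With 21 zero crossing points, this gives 100 msec at 0.2 kHz and about 16.7 msec at 1.2 kHz, both inside the ranges quoted above.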

First, to evaluate the accuracy of the conventional cross-correlation-based method and the method according to the present invention in background noise, the following experiment is set up. Spoken digits from the Ti20 database are used as the sound source and are placed at a horizontal angle of 20 or 40 degrees using the HRTF. White noise, together with crowd noise (speech noise) and car noise from the NOISEX database, is used as background noise. To measure the accuracy of the two methods, the experiment is repeated 90 times with varying background noise, and the mean and standard deviation of the results are measured.

FIG. 4 is a table summarizing the experimental results for the conventional cross-correlation-based method and the method according to the present invention: (a) and (b) in white noise, (c) and (d) in crowd noise, and (e) and (f) in car noise. As the results show, the method according to the present invention is more accurate than the cross-correlation-based method and has a lower standard deviation even when noise is mixed in, which indicates that it is more robust to noise.

In the next experiment, to compare the performance of the two methods when several sound sources are present simultaneously, four sound sources (two female speakers and two male speakers) are selected from the TI-DIGIT database and tested against various background noises. All four sources are given the same intensity and are placed at horizontal angles of -10, 0, 10, and 40 degrees, respectively.

FIG. 5 shows the experimental results, comparing the horizontal angle histograms of the conventional direction detection method and the direction detection method according to the present invention. FIGS. 5a and 5b use white noise, FIGS. 5c and 5d use crowd noise, and FIGS. 5e and 5f use car noise as the background noise. The background noise is set to a level of 5 dB in all cases. These are weighted histograms based on the ITD values only. If the weighted histogram is generated from both the ITD and IID values, a three-dimensional histogram is obtained, from which the sound source direction can be detected more accurately. In the histograms, the method according to the present invention clearly separates the four sound sources, whereas the conventional cross-correlation-based method produces considerable noise between the sources and can distinguish only one or two of them clearly. This occurs because the cross-correlation-based method computes the ITD over a time window of fixed size: when several sound sources are mixed, they interfere with one another, which makes it difficult to detect their directions accurately. In contrast, because the size of the time window varies in the method according to the present invention, the interference between sound sources is low and their directions can be detected accurately.

The technical idea of the present invention has been described above with reference to the accompanying drawings, but this description illustrates the preferred embodiment by way of example and does not limit the present invention. It is also evident that anyone with ordinary skill in the art can make various modifications and imitations without departing from the scope of the technical idea of the present invention.

The sound source direction detection method using zero crossing points according to the present invention has the advantage of requiring less computation than the conventional cross-correlation-based method. For N speech samples in a channel, with a maximum time-delay window of T [sec] for the ITD calculation and a sampling frequency of f_s [Hz], the conventional cross-correlation-based method has a computational complexity of O(NTf_s), whereas the method using zero crossing points according to the present invention has a complexity of O(Nsf_i/f_s). Here, s is the number of zero crossing points used for the signal-to-noise ratio calculation, and f_i [Hz] is the center frequency of the channel. Comparing the two, the computation of the method according to the present invention relative to the prior art is given by the ratio of these expressions, sf_i/(Tf_s²). For example, with N = 2.5×10⁴, T = 1.6×10⁻³ [sec], f_s = 1.0×10⁵ [Hz], and s = 21, the conventional method requires 4×10⁶ operations, whereas the method according to the present invention requires 4×10³ operations. That is, the present invention reduces the computation to 1/1000 of that of the conventional method.
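The arithmetic of this example can be reproduced directly from the two complexity expressions. The text does not give a usable value for the channel center frequency f_i, so the sketch below does not assume one; instead it solves for the f_i that the quoted count of 4×10³ operations implies. Variable and function names are illustrative.

```python
import math

N = 2.5e4      # number of speech samples in the channel
T = 1.6e-3     # maximum time-delay window [sec]
f_s = 1.0e5    # sampling frequency [Hz]
s = 21         # zero crossings per SNR window

# Conventional cross-correlation: order N * T * f_s operations.
conventional_ops = N * T * f_s


def zero_crossing_ops(n, s, f_i, f_s):
    """Operation count of order N * s * f_i / f_s for the zero-crossing method."""
    return n * s * f_i / f_s


# Center frequency implied by the quoted 4e3 operations (derived, not from the source).
quoted_ops = 4.0e3
implied_f_i = quoted_ops * f_s / (N * s)

print(f"conventional: {conventional_ops:.3g} operations")
print(f"implied f_i: {implied_f_i:.1f} Hz")
print(f"ratio: {zero_crossing_ops(N, s, implied_f_i, f_s) / conventional_ops:.1e}")
```

The implied center frequency is roughly 760 Hz, which lies inside the 0.2 kHz to 1.2 kHz range of the filter bank used in the experiments.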

In addition, compared with the conventional cross-correlation-based direction detection method, the sound source direction detection method according to the present invention suffers less interference between sound sources in general environments where several sound sources or various kinds of noise are present at the same time, so it can detect direction more accurately and separate the sound sources accurately.

Claims

A method for estimating a signal-to-noise ratio using zero crossing points in a device comprising two sensors corresponding to the two ears of a person, a filter bank that separates the signals received from the two sensors into frequency channels, and a processor that calculates the signal-to-noise ratio contained in each channel signal output by the filter bank, the method comprising:

a step in which the filter bank receives, using the two sensors, a signal output from the same sound source and separates the received signal into frequency channels;

a ZCPA coding step in which the processor applies zero-crossing and peak amplitudes (ZCPA) coding to each channel signal to obtain zero crossing points and peak values;

an ITD calculation step in which the processor obtains an interaural time difference (ITD) value using the zero crossing point and peak value information of each channel obtained in the ZCPA coding step;

a variance calculation step in which the processor divides each channel signal into a plurality of windows containing the same number of zero crossing points and obtains the variance of the ITD values for each channel and each window; and

a signal-to-noise ratio calculation step in which the processor calculates the signal-to-noise ratio for each channel and each window using the variance of the ITD values and the center frequency of the channel.

The method of claim 1, wherein, in the ITD calculation step,

the zero crossing point of the right channel closest to an arbitrary zero crossing point of the left channel is obtained, and the difference between the two zero crossing points is set as the ITD value.
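The nearest-crossing pairing recited in this claim can be illustrated with a short sketch. It is a hypothetical illustration only: the function name, the use of a binary search over sorted crossing times, and the sign convention (left minus right) are assumptions, not details taken from the claims.

```python
import bisect


def nearest_crossing_itds(left_crossings, right_crossings):
    """Pair each left-channel zero crossing with the closest right-channel one.

    Both inputs are zero-crossing times in seconds; right_crossings must be
    sorted ascending. Returns one ITD (left minus right) per left crossing.
    """
    itds = []
    for t_left in left_crossings:
        k = bisect.bisect_left(right_crossings, t_left)
        # Only the neighbours just before and just after t_left can be nearest.
        candidates = right_crossings[max(k - 1, 0):k + 1]
        if not candidates:
            continue  # no right-channel crossings available
        t_right = min(candidates, key=lambda t: abs(t - t_left))
        itds.append(t_left - t_right)
    return itds


left = [1.0e-3, 3.0e-3, 5.0e-3]     # illustrative crossing times
right = [1.2e-3, 2.9e-3, 5.3e-3]
itds = nearest_crossing_itds(left, right)
```

The pairing is only unambiguous while the true delay stays below half a period of the channel center frequency. For larger delays, several adjacent right-channel crossings have to be considered as candidates.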

The method of claim 1, wherein, in the ITD calculation step,

when the maximum time delay between the left channel and the right channel is less than k times the length of one period of the center frequency of the channel, the 2k zero crossing points of the right channel adjacent to an arbitrary zero crossing point of the left channel are obtained, and the differences between the zero crossing point of the left channel and the 2k zero crossing points of the right channel are set as ITD values.

The method of claim 1, wherein each window of the variance calculation step

contains a number of zero crossing points chosen according to human auditory perception characteristics, such that a high-frequency channel is divided into time windows of shorter duration than a low-frequency channel.

The method of claim 1, wherein, in the signal-to-noise ratio calculation step,

the signal-to-noise ratio in the j-th zero crossing point window of the i-th channel is calculated by applying the center frequency (ω_C) of the channel and the variance (S_i²(j)) of the ITD values to the following equation.

[Equation]

A method for detecting a sound source direction using zero crossing points in a device comprising two sensors that correspond to the two ears of a person and receive a signal output from the same sound source, a filter bank that separates the signals received from the two sensors into frequency channels, and a processor that detects the direction of the sound source using each channel signal output by the filter bank, the method comprising:

a step in which the filter bank receives, using the two sensors, a signal output from the same sound source and separates the received signal into frequency channels;

a signal-to-noise ratio calculation step in which the processor calculates the signal-to-noise ratio for each channel and each window using the variance of the ITD values and the center frequency of the channel;

a reliability enhancement step in which the processor extracts the ITD values whose signal-to-noise ratio is greater than a threshold; and

a direction estimation step in which the processor obtains a horizontal angle histogram of the ITD values weighted by the signal-to-noise ratio and determines the peak of the horizontal angle histogram as the spatial position of the sound source.

The method of claim 6, wherein, in the ITD calculation step,

the zero crossing point of the right channel closest to an arbitrary zero crossing point of the left channel is obtained, and the difference between the two zero crossing points is set as the ITD value.

The method of claim 6, wherein, in the ITD calculation step,

when the maximum time delay between the left channel and the right channel is less than k times the length of one period of the center frequency of the channel, the 2k zero crossing points of the right channel adjacent to an arbitrary zero crossing point of the left channel are obtained, and the differences between the zero crossing point of the left channel and the 2k zero crossing points of the right channel are set as ITD values.

The method of claim 6, wherein each window of the variance calculation step

contains a number of zero crossing points chosen according to human auditory perception characteristics, such that a high-frequency channel is divided into time windows of shorter duration than a low-frequency channel.

The method of claim 6, wherein, in the signal-to-noise ratio calculation step,

the signal-to-noise ratio in the j-th zero crossing point window of the i-th channel is calculated by applying the center frequency (ω_C) of the channel and the variance (S_i²(j)) of the ITD values to the following equation.

[Equation]

The method of claim 6,

further comprising a step of obtaining an interaural intensity difference (IID) value by applying the zero crossing point and peak value information of each channel obtained in the ZCPA coding step to the following equation.

[Equation]

Here, the equation relates the IID value to the ZCPA coding result signal of the i-th channel received from the sensor corresponding to the left ear and the ZCPA coding result signal of the i-th channel received from the sensor corresponding to the right ear.

12. The method of claim 11, wherein the reliability enhancement step extracts the ITD values and IID values whose signal-to-noise ratio is greater than a threshold, and the direction estimation step obtains a horizontal angle histogram of the ITD values and IID values weighted by the signal-to-noise ratio and determines the peak of the horizontal angle histogram as the spatial position of the sound source.

In a system that receives a signal output from the same sound source with two sensors corresponding to the two ears of a person, separates the signal into frequency channels, and calculates the signal-to-noise ratio contained in each channel signal,

a computer-readable recording medium on which is recorded a program for executing a signal-to-noise ratio estimation method using zero crossing points, the method comprising:

a ZCPA coding step of applying zero-crossing and peak amplitudes (ZCPA) coding to each channel signal to obtain zero crossing points and peak values;

an ITD calculation step of obtaining an interaural time difference (ITD) value using the zero crossing point and peak value information of each channel obtained in the ZCPA coding step;

a variance calculation step of dividing each channel signal into a plurality of windows containing the same number of zero crossing points and obtaining the variance of the ITD values for each channel and each window; and

a signal-to-noise ratio calculation step of calculating the signal-to-noise ratio for each channel and each window using the variance of the ITD values and the center frequency of the channel.

In a system that receives a signal output from the same sound source with two sensors corresponding to the two ears of a person, separates the signal into frequency channels, and detects the direction of the sound source using each channel signal,

a computer-readable recording medium on which is recorded a program for executing a sound source direction detection method using zero crossing points, the method comprising:

a signal-to-noise ratio calculation step of calculating the signal-to-noise ratio for each channel and each window using the variance of the ITD values and the center frequency of the channel;

a reliability enhancement step of extracting the ITD values whose signal-to-noise ratio is greater than a threshold; and

a direction estimation step of obtaining a horizontal angle histogram of the ITD values weighted by the signal-to-noise ratio and determining the peak of the horizontal angle histogram as the spatial position of the sound source.