CN103235287B

CN103235287B - Sound source localization camera shooting tracking device

Info

Publication number: CN103235287B
Application number: CN201310133558.2A
Authority: CN
Inventors: 张东阳; 张培华; 翟俊义; 任建文; 翟学明; 林陈伟; 贾晓霞; 袁思远; 魏子辉
Original assignee: North China Electric Power University
Current assignee: North China Electric Power University
Priority date: 2013-04-17
Filing date: 2013-04-17
Publication date: 2015-05-20
Anticipated expiration: 2033-04-17
Also published as: CN103235287A

Abstract

A sound source localization camera tracking device comprises a pan/tilt head carrying a camera, a single-chip microcomputer and four pickups. The first, second and third pickups are arranged in sequence along a horizontal straight line, and the fourth pickup is located directly above the second pickup. The outputs of the four pickups are connected to four input ports of the single-chip microcomputer through four filters, and the horizontal angle motor and the elevation angle motor of the pan/tilt head are connected to output ports of the single-chip microcomputer. From the sound signals picked up by the four pickups, the invention determines the position of the monitored target with a localization method based on time delay estimation. This achieves dynamic tracking of a specific target with a small computational load, simple implementation and high localization accuracy. Because a single camera provides multi-angle coverage, the invention improves monitoring capability while reducing monitoring cost.

Description

A sound source localization camera tracking device

Technical Field

The invention relates to a sound source localization camera tracking device based on a generalized cross-correlation algorithm, which enables a camera to track a monitored target automatically. It belongs to the field of measurement technology.

Background

At present, video surveillance devices at home and abroad fall roughly into two types: static and dynamic. A static device has a fixed camera that can only shoot one scene; it cannot track a monitored target dynamically, and its coverage is small. Dynamic devices come in many varieties. Some rotate the camera at a constant speed to achieve multi-angle monitoring, but they cannot track a target dynamically, so the monitoring effect is unsatisfactory. Others can track a specific target, but they capture moving targets mainly by image processing; their drawbacks are that moving targets are hard to detect and the image-processing equipment is expensive.

Chinese patent application No. 201010204772.9, entitled "A sound source localization method", proposes the following steps: building a mixing model of the source signals; using a three-dimensional acoustic measurement array to collect the mixed acoustic signals of the measured sources in the X, Y and Z directions; denoising to obtain clean observation signals; estimating the separation matrix of the measured sources; obtaining the frequency response matrix; applying an overall direction-of-arrival estimation strategy based on peak detection to obtain accurate direction-of-arrival estimates for all source signals at once; and computing spatial angles from spatial geometry to localize the sources. It overcomes the inability of existing methods to localize multiple mixed sources in three-dimensional space, as well as their inflexibility, unstable estimates, low precision and limited practicality, but it remains inconvenient to apply to video surveillance devices.

In short, traditional video surveillance devices have a limited monitoring range and cannot provide multi-angle coverage. All-round monitoring therefore requires more monitoring equipment, which raises equipment cost.

Summary of the Invention

The object of the invention is to overcome these drawbacks of the prior art by providing a sound source localization camera tracking device that achieves multi-angle coverage monitoring while keeping the manufacturing cost of the monitoring equipment as low as possible.

The problem is solved by the following technical solution:

A sound source localization camera tracking device comprises a pan/tilt head carrying a camera, a single-chip microcomputer and four pickups. The first, second and third pickups are arranged in sequence along a horizontal straight line, and the fourth pickup is located directly above the second pickup. The outputs of the four pickups are connected to four input ports of the single-chip microcomputer through four filters, and the horizontal angle motor and the elevation angle motor of the pan/tilt head are connected to output ports of the single-chip microcomputer. The device operates as follows:

A. Establish a three-dimensional rectangular coordinate system with the second pickup at the origin, the third pickup on the positive X semi-axis and the fourth pickup on the positive Y semi-axis (the first pickup therefore lies on the negative X semi-axis). The second pickup is at distance a from each of the other three pickups, and the camera pan/tilt head is placed at the origin;

B. Use the four pickups to collect the sound emitted by the tracked object in real time, and use a time delay estimation algorithm to obtain the distance differences d12, d13, d32 and d42, where dij = ri − rj and ri is the distance from the sound source to the i-th pickup;

C. The horizontal angle θ of the sound source, measured at the origin from the X axis, is obtained from:

θ = acos(d13/(2a)),

The elevation angle φ of the sound source, measured at the origin from the Y axis, is obtained from:

φ = acos(d24/a), where d24 = −d42 is the distance from the source to the second pickup minus its distance to the fourth pickup;

D. The single-chip microcomputer drives the horizontal angle motor and the elevation angle motor of the pan/tilt head to set the camera's horizontal angle to θ and its elevation angle to φ, thereby tracking the monitored target dynamically.
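Steps C and D can be sketched in Python as below. This is a minimal illustration, not the inventors' firmware: the function name `source_angles` is hypothetical, angles are returned in degrees, and the cosines are clamped to [-1, 1] on the assumption that noisy delay estimates can push them slightly out of range. It takes the measured d42 and uses d24 = −d42, following the sign convention d12 = r1 − r2 of the range equations in the detailed description.

```python
import math


def source_angles(d13, d42, a):
    """Return (theta, phi) in degrees from measured distance differences.

    theta = acos(d13 / (2a)): angle from the +X axis (MIC1-MIC3 line).
    phi   = acos(d24 / a):    angle from the +Y axis (MIC2-MIC4 line),
    where d24 = -d42 because d42 = r4 - r2.
    """
    d24 = -d42
    # Clamp: noisy delay estimates can push the cosines slightly past +/-1.
    cos_theta = max(-1.0, min(1.0, d13 / (2.0 * a)))
    cos_phi = max(-1.0, min(1.0, d24 / a))
    return math.degrees(math.acos(cos_theta)), math.degrees(math.acos(cos_phi))
```

For example, a distant source whose direction makes 60° with both the X and Y axes gives d13 ≈ a and d42 ≈ −a/2, and the function returns angles close to (60, 60).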

In the device above, the distance difference from the sound source to two pickups is obtained with the time delay estimation algorithm as follows:

① Compute the cross-correlation function between the two signals

Let the output signals of the two pickups be $x_1(t)$ and $x_2(t)$, with Fourier transforms $X_1(\omega)$ and $X_2(\omega)$. The cross-power spectrum $G_{12}(\omega)$ of the two signals is:

$G_{12}(\omega) = X_1(\omega)\,X_2^{*}(\omega)$,

where $^{*}$ denotes complex conjugation;

Since the cross-power spectrum is the Fourier transform of the cross-correlation function $R_{12}(\tau)$, we have:

$G_{12}(\omega) = \int_{-\infty}^{\infty} R_{12}(\tau)\, e^{-j\omega\tau}\, d\tau$,

After weighting the signal in the frequency domain, transforming $\psi_{12}(\omega)G_{12}(\omega)$ back to the time domain with this transform pair gives the generalized cross-correlation function $R^{(g)}_{12}(\tau)$ of the two signals:

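$R^{(g)}_{12}(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \psi_{12}(\omega)\, G_{12}(\omega)\, e^{j\omega\tau}\, d\omega$,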

where $\psi_{12}(\omega)$ is the weighting function;

② Compute the distance difference from the sound source to the two pickups from the cross-correlation function $R^{(g)}_{12}(\tau)$

Locate the maximum of $R^{(g)}_{12}(\tau)$; its position is the time delay between the two signals. Multiplying this delay by the speed of sound gives the distance difference from the sound source to the two pickups.
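Steps ① and ② can be illustrated with the pure-Python sketch below (no third-party libraries, so the FFT is a small radix-2 implementation). The names `fft`, `ifft`, `phat`, `gcc_delay` and `distance_difference` are hypothetical. The default unit weighting reduces GCC to plain cross-correlation, and `phat` is only a commonly used weighting offered as an example, not necessarily the one the inventors used. The speed of sound c = 343 m/s is an assumed room-temperature value, and delays are resolved to one sample period.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(spectrum):
    """Inverse FFT via the identity ifft(X) = conj(fft(conj(X))) / n."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([c.conjugate() for c in spectrum])]


def phat(g):
    """Phase-transform weight 1/|G12|: an example weighting, not necessarily the patent's."""
    m = abs(g)
    return 1.0 / m if m > 1e-12 else 0.0


def gcc_delay(x1, x2, fs, weighting=None):
    """Delay (s) of x1 relative to x2 from the peak of the generalized cross-correlation.

    A positive result means the sound reached pickup 1 later than pickup 2.
    weighting(g) returns a real weight for one cross-power-spectrum bin;
    None gives unit weight, i.e. plain cross-correlation.
    """
    n = 1
    while n < len(x1) + len(x2):  # zero-pad so the circular correlation cannot wrap
        n *= 2
    X1 = fft(list(x1) + [0.0] * (n - len(x1)))
    X2 = fft(list(x2) + [0.0] * (n - len(x2)))
    G12 = [u * v.conjugate() for u, v in zip(X1, X2)]  # cross-power spectrum
    if weighting is not None:
        G12 = [weighting(g) * g for g in G12]  # frequency-domain weighting
    r = [z.real for z in ifft(G12)]  # (generalized) cross-correlation
    k = max(range(n), key=lambda i: r[i])  # index of the peak
    lag = k if k < n // 2 else k - n  # indices >= n/2 are negative lags
    return lag / fs


def distance_difference(x1, x2, fs, c=343.0, weighting=None):
    """Distance difference r1 - r2 = delay * speed of sound (c assumed 343 m/s)."""
    return c * gcc_delay(x1, x2, fs, weighting)
```

Zero-padding to at least the combined length keeps the circular correlation from wrapping. In practice one would also restrict the peak search to the lags that are physically possible for the pickup spacing.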

While the device tracks the target by sound source localization, it also refines the estimated source position with camera-based live-body recognition, as follows:

1. Compute a gray value as B*0.11 + G*0.59 + R*0.3 and use it to compensate the pixels with lower gray values in the image;

where R, G and B denote the red, green and blue components;

0.11, 0.59 and 0.3 are the weights of blue, green and red respectively;

2. Compute the per-channel differences Rvd, Gvd and Bvd between corresponding pixels, where Rv1, Gv1, Bv1 are the red, green and blue components of the source image pixel and Rv2, Gv2, Bv2 those of the target image pixel. Average Rvd, Gvd and Bvd and compare the mean with the maximum allowed difference: if it is larger, replace the target pixel with the source pixel; otherwise leave it unchanged. The result is a difference map.

In the device above, the weighting function $\psi_{12}(\omega)$ is: [formula not reproduced].

In the device above, the pickups are microphones.

Using the sound signals picked up by the four pickups, the invention locates the monitored target with a method based on time delay estimation. It tracks a specific target dynamically, requires little computation, is easy to implement and localizes accurately. Because a single camera provides multi-angle coverage, the invention improves monitoring capability while saving monitoring cost.

The device can automatically find and observe the location of a sound, improving security. In video conferences or large meeting venues it enables intelligent video capture, making the camera function more user-friendly.

Description of Drawings

The invention is further described below with reference to the accompanying drawings.

Fig. 1 is the electrical schematic of the camera servo tracking system;

Fig. 2 shows the mounting positions of the four pickups;

Fig. 3 is a block diagram of the illumination-compensated difference image recognition algorithm.

Reference labels in the figures: MIC1 to MIC4, first to fourth pickups; U1 to U4, first to fourth filters; U5, single-chip microcomputer; M1, horizontal angle motor; M2, elevation angle motor.

Symbols used in the text: d12, distance difference from the sound source to the first and second pickups; d13, distance difference from the sound source to the first and third pickups; d32, distance difference from the sound source to the third and second pickups; d42, distance difference from the sound source to the fourth and second pickups; r, distance from the sound source to the origin; θ, horizontal angle of the source measured at the origin from the X axis; φ, elevation angle of the source measured at the origin from the Y axis; a, distance from the second pickup to each of the other three pickups; $x_1(t)$, $x_2(t)$, output signals of two pickups; $X_1(\omega)$, $X_2(\omega)$, their Fourier transforms; $G_{12}(\omega)$, cross-power spectrum of the two signals; $R^{(g)}_{12}(\tau)$, generalized cross-correlation function of the two signals; $\psi_{12}(\omega)$, weighting function.

Detailed Description

The system picks up speech with a pickup (microphone) array, determines the direction of the sound from its frequency-domain and time-domain characteristics, and localizes the source. The camera then turns automatically toward the sound to monitor the source position, and camera-based live-body recognition further refines that position. A single device thus provides all-round monitoring, improving monitoring capability while reducing cost.

The camera pan/tilt head used in the invention is a Putianshi PTS-303. Pins PB1 and PB2 of the single-chip microcomputer control the horizontal angle motor M1, which rotates the camera left and right; pins PB3 and PB4 control the elevation angle motor M2, which rotates the camera up and down.

The microphone layout is shown in Fig. 2. The second pickup MIC2 is at the origin, at distance a from each of the other pickups, and the distance differences from the sound source to the pickup pairs are d12, d13, d32 and d42 (each equal to the corresponding time delay multiplied by the speed of sound). With the coordinate system shown in the figure, a geometric (far-field) approximation gives the horizontal angle θ of the source, measured at the origin from the X axis, as:

θ = acos(d13/(2a));

The elevation angle φ of the source, measured at the origin from the MIC2-MIC4 line (the Y axis), is:

φ = acos(d24/a), with d24 = −d42;
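The geometric approximation behind θ and φ can be reconstructed as a first-order far-field expansion. This derivation is a sketch assuming r ≫ a and the pickup positions MIC1 = (−a, 0, 0), MIC2 = (0, 0, 0), MIC3 = (a, 0, 0), MIC4 = (0, a, 0), with $r_i$ the distance from the source S = (x, y, z) to pickup i, cos θ = x/r, cos φ = y/r, and terms of order $a^2/r$ dropped:

```latex
\begin{aligned}
r_1 &= \sqrt{r^2 + 2ax + a^2} \approx r + a\cos\theta, \qquad
r_3 = \sqrt{r^2 - 2ax + a^2} \approx r - a\cos\theta, \\
d_{13} &= r_1 - r_3 \approx 2a\cos\theta
\;\Rightarrow\; \theta = \arccos\!\left(\frac{d_{13}}{2a}\right), \\
r_4 &= \sqrt{r^2 - 2ay + a^2} \approx r - a\cos\varphi, \\
d_{24} &= r_2 - r_4 = -d_{42} \approx a\cos\varphi
\;\Rightarrow\; \varphi = \arccos\!\left(\frac{d_{24}}{a}\right).
\end{aligned}
```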

Let the source coordinates be (x, y, z). The distance r from the source to the origin follows from:

sqrt((x+a)*(x+a)+(y*y)+(z*z))-sqrt((x*x)+(y*y)+(z*z))=d12;

sqrt((x-a)*(x-a)+(y*y)+(z*z))-sqrt((x*x)+(y*y)+(z*z))=d32;

r=sqrt((x*x)+(y*y)+(z*z));

where sqrt() denotes the square root.

Squaring the first two equations and adding them eliminates x and gives: r=((2*a*a)-((d12*d12)+(d32*d32)))/(2*(d12+d32));
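Adding the squares of r1 = r + d12 and r3 = r + d32, where r1² = r² + 2ax + a² and r3² = r² − 2ax + a² (reading the MIC3 term as (x − a)²), gives 2r(d12 + d32) = 2a² − d12² − d32². The Python check below, with the hypothetical function name `source_range`, compares this closed form against a synthetic source position.

```python
import math


def source_range(d12, d32, a):
    """Source-to-origin distance from d12 = r1 - r2 and d32 = r3 - r2.

    From (r + d12)^2 + (r + d32)^2 = 2 r^2 + 2 a^2, which holds exactly for
    MIC1 = (-a, 0, 0), MIC2 = origin, MIC3 = (a, 0, 0).
    """
    return (2 * a * a - d12 * d12 - d32 * d32) / (2 * (d12 + d32))


if __name__ == "__main__":
    a = 0.1
    src = (0.3, 0.8, 1.2)  # hypothetical source position in metres
    r = math.dist(src, (0.0, 0.0, 0.0))
    d12 = math.dist(src, (-a, 0.0, 0.0)) - r
    d32 = math.dist(src, (a, 0.0, 0.0)) - r
    print(r, source_range(d12, d32, a))  # the two values agree
```

Because d12 + d32 shrinks roughly like a² sin²θ / r, small delay errors produce large range errors for distant sources, which may be why the camera is steered by θ and φ alone.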

In practice, the camera is simply placed at the origin and turned first to the horizontal angle θ and then to the elevation angle φ.

For sound source localization, the system uses a technique based on time delay estimation because it can run in real time. Speech is a non-stationary signal that is easily corrupted by noise and reverberation, but it is short-time stationary: within about 30 ms it can be treated as periodic, so the FFT can be used for frequency-domain analysis. For delay estimation, the system uses the generalized cross-correlation (GCC) method, which offers some robustness to noise and reverberation. GCC computes the cross-power spectrum of two signals, applies a frequency-domain weighting, and transforms the result back to the time domain to obtain the cross-correlation function of the two signals; the position of its maximum is the time delay between them. The delay gives the distance difference from the source to the two microphones, and the distance differences to the four points determine the source position. The four pickups thus collect the sound signals from which the source direction is determined.
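The framing described above can be sketched as follows. The 30 ms frame length comes from the text; the 16 kHz sample rate in the usage note, the 50 % hop, c = 343 m/s and the helper names `split_frames` and `max_lag_samples` are assumptions for illustration. The second helper bounds the delay search, since two pickups a distance D apart cannot differ in arrival time by more than D/c.

```python
import math


def split_frames(samples, fs, frame_ms=30.0, hop_ms=15.0):
    """Cut a signal into short frames treated as stationary (30 ms per the text).

    The 15 ms hop (50 % overlap) is an assumed choice; a trailing partial
    frame is dropped.
    """
    frame_len = int(round(fs * frame_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def max_lag_samples(spacing, fs, c=343.0):
    """Largest delay, in whole samples (rounded up), between pickups `spacing` metres apart."""
    return int(math.ceil(spacing / c * fs))
```

With fs = 16000 Hz a 30 ms frame holds 480 samples. For MIC1 and MIC3, which are 2a apart, a hypothetical a = 0.1 m gives max_lag_samples(0.2, 16000) = 10, so only lags from −10 to +10 samples need to be searched.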

The basic principle of estimating the time delay indirectly with GCC is: compute the cross-power spectrum of the two signals, apply a frequency-domain weighting, and transform back to the time domain to obtain their cross-correlation function. The position of the maximum of this function is the time delay between the two signals.

With $X_1(\omega)$ and $X_2(\omega)$ the Fourier transforms of the two signals $x_1(t)$ and $x_2(t)$, the cross-power spectrum $G_{12}(\omega)$ is as follows, where $^{*}$ denotes complex conjugation:

$G_{12}(\omega) = X_1(\omega)\,X_2^{*}(\omega)$,

Because the cross-correlation function and the cross-power spectrum of two signals form a Fourier transform pair, we have:

$G_{12}(\omega) = \int_{-\infty}^{\infty} R_{12}(\tau)\, e^{-j\omega\tau}\, d\tau$,

Using this relation, transforming $G_{12}(\omega)$ back to the time domain gives:

$R_{12}(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} G_{12}(\omega)\, e^{j\omega\tau}\, d\omega$,

In practice, reverberation adds cross-power components of reflected sound to $G_{12}(\omega)$, which shifts the peak of the cross-correlation function; at low signal-to-noise ratio the peak is also weakened, making it hard to detect. To suppress noise and reverberation, GCC weights the signal in the frequency domain, whitening signal and noise so that frequency components with a high signal-to-noise ratio are emphasized and noise is suppressed, and then transforms back to the time domain to obtain the generalized cross-correlation function of the two signals:

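$R^{(g)}_{12}(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \psi_{12}(\omega)\, G_{12}(\omega)\, e^{j\omega\tau}\, d\omega$,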

where $\psi_{12}(\omega)$ is the weighting function. Experiments showed that the best results were obtained with $\psi_{12}(\omega) =$ [formula not reproduced], so this is the weighting function used.
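The expression for the chosen weighting is not given here. For orientation only, the phase transform (PHAT) is one widely used GCC weighting; it is shown as an illustration, not as the inventors' choice. It keeps only the phase of the cross-power spectrum, so for an ideal pure delay $x_1(t) = x_2(t - D)$ the generalized cross-correlation collapses to a single peak at τ = D:

```latex
\begin{aligned}
\psi^{\mathrm{PHAT}}_{12}(\omega) &= \frac{1}{\lvert G_{12}(\omega)\rvert}, \\
x_1(t) = x_2(t - D) \;&\Rightarrow\; G_{12}(\omega) = \lvert X_2(\omega)\rvert^{2} e^{-j\omega D}, \\
R^{(g)}_{12}(\tau) &= \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-j\omega D}\, e^{j\omega\tau}\, d\omega = \delta(\tau - D).
\end{aligned}
```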

Live-Body Recognition

Purpose of the illumination-compensated difference image recognition algorithm:

Two consecutive images are processed with the difference algorithm to form a difference map. Points in the difference map that truly differ from the background are written to the cumulative difference map, while a count is accumulated for the remaining points. Difference regions that have disappeared are removed from the cumulative difference map. Whether a target object is moving is decided from the size of the difference region.

Algorithm implementation:

1. For each non-difference point in the difference map, increment its count; when the count reaches the background-update count, update the background with the point from the previous image. If the point is new and has been stationary twice, set the corresponding point in the cumulative difference map high.

2. For each difference point in the difference map, compute its difference degree between the new image and the background image with the illumination compensation difference algorithm, and proceed according to the result:

2.1. No difference: refresh the corresponding pixels of the stationary-count map and the cumulative difference map.

2.2. Difference: refresh the corresponding pixel of the stationary-count map, set the corresponding pixel of the cumulative difference map high, and increment the recorded count of differing pixels.

2.3. For pixels in the cumulative difference map that were once new difference points, compute the difference degree between the corresponding pixel of the new image and the background pixel with the illumination compensation difference algorithm; if they differ, refresh the corresponding pixels of the stationary-count map and the cumulative difference map.

2.4. Estimate the movement of the target object from the number of differing pixels computed above. Within a fixed pixel window, compute the percentage of differing pixels; when it exceeds a preset value, a moving target is declared and the window center is moved to the region where the differing pixels are concentrated, achieving tracking.
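Step 2.4 can be sketched as follows. The text does not specify the window size, the preset percentage or how the concentrated region is located, so the 20 % threshold, the use of the centroid of all differing pixels, and the function name `detect_and_recenter` are hypothetical choices. The input is a Boolean difference map such as the one produced by steps 2.1 to 2.3.

```python
def detect_and_recenter(diff_map, top, left, win_h, win_w, ratio_threshold=0.2):
    """Return (moving, new_top, new_left) for a fixed-size pixel window.

    moving is True when the fraction of differing pixels inside the window
    exceeds ratio_threshold (a hypothetical preset). The window is then
    re-centred on the centroid of all differing pixels and clamped to the image.
    """
    h, w = len(diff_map), len(diff_map[0])
    inside = sum(
        1
        for y in range(top, min(top + win_h, h))
        for x in range(left, min(left + win_w, w))
        if diff_map[y][x]
    )
    if inside / float(win_h * win_w) <= ratio_threshold:
        return False, top, left
    pts = [(y, x) for y in range(h) for x in range(w) if diff_map[y][x]]
    cy = sum(p[0] for p in pts) / len(pts)
    cx = sum(p[1] for p in pts) / len(pts)
    new_top = min(max(int(round(cy - win_h / 2)), 0), max(h - win_h, 0))
    new_left = min(max(int(round(cx - win_w / 2)), 0), max(w - win_w, 0))
    return True, new_top, new_left
```

Using the centroid of all differing pixels, rather than only those inside the window, lets the window follow a target that is partly leaving it.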

On the camera side, live-body recognition is applied to the captured images using the illumination compensation difference algorithm. The image data are first illumination-compensated and then processed by the difference algorithm. Illumination compensation: compute a gray value as B*0.11 + G*0.59 + R*0.3 and use it to compensate the pixel with the lower gray value. Difference algorithm: compute vd for v = r, g, b in turn (subscripts 1 and 2 denote the source and target pixels), average the three vd values, and return the result of comparing the mean with the maximum allowed difference. Implementation refinements: no compensation is applied when the gray-level difference exceeds a certain value, and for efficiency the compensation and difference steps are fused rather than compensating first and differencing afterwards. The result is dynamic recognition of live targets in the image.

For data transmission, image data are sent to the server over a 3G wireless network, making video monitoring and image acquisition more automated and intelligent.

The invention monitors a specific target automatically, reducing the workload of monitoring staff, improving monitoring efficiency and saving labor. In on-site monitoring the camera can observe the exact location of a sound, improving security. In video conferences or large meeting venues this function enables more intelligent video capture, making the camera function more user-friendly.

Claims

1. A sound source localization camera tracking method, characterized in that it uses a sound source localization camera tracking device comprising a pan/tilt head carrying a video camera, a single-chip microcomputer (U5) and four pickups, wherein the first pickup (MIC1), the second pickup (MIC2) and the third pickup (MIC3) are arranged in sequence along a horizontal line, the fourth pickup (MIC4) is located directly above the second pickup (MIC2), the outputs of the four pickups are connected to four input ports of the single-chip microcomputer (U5) through four filters respectively, the horizontal angle motor (M1) and the elevation angle motor (M2) of the pan/tilt head are connected to output ports of the single-chip microcomputer (U5), and the sound source localization camera tracking device operates as follows:

A. Establish a three-dimensional rectangular coordinate system with the second pickup (MIC2) as the origin, the third pickup (MIC3) on the positive X semi-axis and the fourth pickup (MIC4) on the positive Y semi-axis; the distance from the second pickup (MIC2) to each of the other three pickups is a, and the camera pan/tilt head is placed at the origin;

B. Use the four pickups to collect the sound signal emitted by the tracked object in real time, and use a time delay estimation algorithm to obtain the distance difference d12 from the sound source to the first pickup (MIC1) and the second pickup (MIC2), the distance difference d13 from the sound source to the first pickup (MIC1) and the third pickup (MIC3), the distance difference d32 from the sound source to the third pickup (MIC3) and the second pickup (MIC2), and the distance difference d42 from the sound source to the fourth pickup (MIC4) and the second pickup (MIC2);

C. The horizontal angle θ of the sound source relative to the origin and the X-axis is obtained according to the following formula:

θ=acos(d13/(2a)),

The elevation angle φ of the sound source relative to the origin and the Y axis is obtained by the following formula:

φ=acos(d24/a), where d24 = −d42,

D. The single-chip microcomputer (U5) drives the horizontal angle motor (M1) and the elevation angle motor (M2) of the pan/tilt head to adjust the camera's horizontal angle to θ and its elevation angle to φ, thereby tracking the monitored target dynamically;

The specific steps for obtaining the distance difference from the sound source to two pickups with the time delay estimation algorithm are as follows:

① Find the cross-correlation function between two signals

Let the output signals of the two pickups be $x_1(t)$ and $x_2(t)$, with Fourier transforms $X_1(\omega)$ and $X_2(\omega)$; the cross-power spectrum $G_{12}(\omega)$ of the two signals is:

$G_{12}(\omega) = X_1(\omega)\,X_2^{*}(\omega)$,

where $^{*}$ denotes complex conjugation;

Since the cross-power spectrum is the Fourier transform of the cross-correlation function $R_{12}(\tau)$, we have:

$G_{12}(\omega) = \int_{-\infty}^{\infty} R_{12}(\tau)\, e^{-j\omega\tau}\, d\tau$,

After weighting the signal in the frequency domain, transform $\psi_{12}(\omega)G_{12}(\omega)$ back to the time domain using this transform pair to obtain the generalized cross-correlation function $R^{(g)}_{12}(\tau)$ of the two signals:

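$R^{(g)}_{12}(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \psi_{12}(\omega)\, G_{12}(\omega)\, e^{j\omega\tau}\, d\omega$,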

where $\psi_{12}(\omega)$ is the weighting function;

② Obtain the distance difference from the sound source to the two pickups from the cross-correlation function $R^{(g)}_{12}(\tau)$

Find the maximum of the cross-correlation function $R^{(g)}_{12}(\tau)$ to obtain the time delay between the two signals, and multiply the delay by the speed of sound to obtain the distance difference from the sound source to the two pickups.

2. The sound source localization camera tracking method according to claim 1, characterized in that, while the target is tracked by sound source localization, the accuracy of the sound source position is also improved by camera-based live-body recognition, with the following specific steps:

Compute the gray value as B*0.11 + G*0.59 + R*0.3 and use it to compensate the pixels with lower gray values in the image;

where R, G and B denote the red, green and blue components;

0.11, 0.59, and 0.3 are the weights of the three primary colors of blue, green, and red respectively;

Compute the per-channel differences Rvd, Gvd and Bvd, where Rv1, Gv1, Bv1 are the red, green and blue components of a source image pixel and Rv2, Gv2, Bv2 those of the corresponding target image pixel; average Rvd, Gvd and Bvd and compare the mean with the maximum allowed difference; if it is larger, update the target image pixel with the source image pixel, otherwise leave it unchanged; a difference map is finally formed.

3. The sound source localization camera tracking method according to claim 2, characterized in that the weighting function is: [formula not reproduced].

4. The sound source localization camera tracking method according to claim 3, wherein the pickups are microphones.