JPH0449756A

JPH0449756A - Conference speech device

Info

Publication number: JPH0449756A
Application number: JP2160490A
Authority: JP
Inventors: Masaharu Shimada; 正治島田; Shinji Hayashi; 伸二林
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1990-06-18
Filing date: 1990-06-18
Publication date: 1992-02-19

Abstract

PURPOSE: To enable sound image localization at the listening side by processing the audio signals from a plurality of microphones and identifying the position of the microphone that corresponds to the talking speaker. CONSTITUTION: The sound pressure levels from microphones M1-Mn are detected by voice detecting means 3a-3d, and a speaker number detecting and storing means detects the speaker positions in order, starting from the first speaker, and stores their speaker numbers. Next, a cross-correlation function calculating means finds the cross-correlation function of the audio signals received from the first and n-th microphones, and a maximum value detecting means finds the time difference at which that function is maximum. The time difference so found is then checked against the speaker position stored in the speaker number storing means, and a speaker code signal 18 for that speaker number is sent out. Assigning a number to each microphone thus makes sound image localization possible at the listening side.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Industrial Field of Application]

The present invention is used in a conference call device with which a plurality of parties can converse over a communication line.

In particular, the present invention relates to a conference communication system among many speakers using a plurality of microphone inputs, and to a conference call device that detects the positions of simultaneous speakers so that, by knowing the position of the talking speaker, it is clear who is speaking.

[Conventional technology]

Conventionally, stereo transmission has been used to reproduce a sound image when a plurality of speakers are distributed between two sites. One example is described in the literature (Minami, "Teleconference audio system using pseudo-stereo audio", Institute of Electronics and Communication Engineers, Circuits and Systems Study Group Material CAS86-29, 1986). As is well known, the transmitting side computes the correlation between two microphones and sends this correlation value over a separate line, and the listening side attempts to reproduce the original sound field from two loudspeakers on the basis of that correlation value.

However, with this method, when speakers talk simultaneously the correlation value becomes indeterminate and cannot be obtained accurately, so the sound field is not reproduced realistically. Moreover, even with a single speaker, the sound image is not localized accurately, depending on the position of the pickup microphone and on the acoustic conditions, especially the reverberation, at the transmitting and listening sides. This matters because in a conference call each speaker is a point sound source, which is essentially different from the loose sound image localization of conventional stereo music broadcasting, where, as with an orchestra, the piano is heard from the right and the violin from the left.

As an illustration, NHK (Japan Broadcasting Corporation) is currently preparing to start a high-definition television (HDTV) service that uses five loudspeakers, three of them placed at the front to prevent the sound image from blurring at the center; this too stems from the desire for more accurate sound image localization. As noted above, however, in a remote conference the position of each speaker is an important source of information, and since each speaker is a point sound source, it goes without saying that still more accurate sound image localization is required.

One method that overcomes these drawbacks is described in the literature (Shimada and Suzuki, "Site-identifying sound image generation method for multi-site audio conference communication systems", Journal of the Institute of Electronics and Communication Engineers, Vol. J70-B, No. 9, 1987).

That paper proposes placing terminals at a plurality of sites and generating the sound image for each terminal at a different position. The method can also be extended to communication among multiple speakers in two facing conference rooms. In one such application, each speaker is assigned a number in advance, and the number of whoever is talking is transmitted together with that speaker's audio signal. At the listening side each loudspeaker is preset with a speaker number, so if the audio signal is output from the loudspeaker with the matching number, the listener hears the voice from the place where that loudspeaker is installed.

[Problem to be solved by the invention]

However, in a conference call system in which several speakers are present in one conference room and sound is picked up by several microphones, a given microphone does not necessarily pick up only its own speaker's voice. Even if a microphone is assigned to each speaker, the other microphones are also affected by that speaker's voice, depending on the acoustic conditions of the room and the arrangement of microphones and speakers.

Consequently, conference call devices for remote conferencing, such as conventional audio conferences and video conferences, have had the following drawbacks.

(1) Given room acoustics, it is difficult to pick up a speaker's voice with only the corresponding microphone, so even when a microphone is installed for each speaker, the sound entering a microphone cannot simply be attributed to one particular speaker number.

(2) A super-directional microphone could be used to avoid drawback (1), but such microphones are very expensive, and if the speaker moves while talking the volume level changes or the sound pressure may no longer be detected at all.

An object of the present invention is to eliminate these drawbacks and provide a conference call device that, whether several speakers in one conference room talk at the same time or only one speaker talks, processes the audio signals from the plurality of microphones and identifies the position of the microphone corresponding to each talking speaker. By assigning a number to that microphone, the device not only makes sound image localization possible at the listening side but also allows only the corresponding microphones to be operated, increasing the signal-to-noise ratio of the call path.

[Means to solve the problem]

The present invention is a conference call device comprising: a plurality n of microphones, one for each speaker, installed substantially in a line and each at a fixed distance from its speaker; digitizing means for converting the audio signal of each speaker input from each microphone into a digital audio signal and outputting it; and speaker code signal generating means for generating and outputting a speaker code signal that identifies the speaker corresponding to the digital audio signal output from the digitizing means. The device is characterized in that the speaker code signal generating means includes: a plurality n of voice detecting means for detecting the sound pressure level from each microphone; speaker number detecting and storing means for detecting the order of the output signals from the n voice detecting means and storing the corresponding speaker numbers; cross-correlation function calculating means for obtaining the cross-correlation function of the audio signals received from two microphones, the first and the n-th in the arrangement; maximum value detecting means for obtaining, from the cross-correlation function calculated by the cross-correlation function calculating means, the time difference at which the cross-correlation function is maximum; and confirming means for confirming that the time difference obtained by the maximum value detecting means agrees with the position of the speaker for the speaker number stored in the speaker number detecting and storing means, and for sending out the speaker code signal for that speaker number.

In the present invention, the cross-correlation function calculating means may also be means that divides all n microphones into small sections of m microphones each and obtains a cross-correlation function for each section.

[Effect]

In the conference call device of the present invention, the positions of multiple simultaneous speakers are detected on the basis of the fact that the time difference at which the cross-correlation function of the audio signals received from two microphones is maximum corresponds to the difference between the propagation times from the sound source, that is, the speaker, to each of the two microphones.
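As a minimal sketch of this principle (not the patent's circuit), the following Python fragment delays one white-noise source by different amounts at two hypothetical microphones and recovers the difference from the peak of the cross-correlation. The function name `best_lag`, the sample delays, and the use of white noise in place of speech are all assumptions made for illustration.

```python
import random

def best_lag(x, y, max_lag):
    """Return the integer lag tau that maximizes sum_t x[t + tau] * y[t]."""
    best_tau, best_value = 0, float("-inf")
    for tau in range(-max_lag, max_lag + 1):
        total = 0.0
        for t in range(len(y)):
            u = t + tau
            if 0 <= u < len(x):
                total += x[u] * y[t]
        if total > best_value:
            best_tau, best_value = tau, total
    return best_tau

random.seed(0)
source = [random.uniform(-1.0, 1.0) for _ in range(400)]

# Hypothetical arrival delays, in samples, at the first and n-th microphones.
delay_1, delay_n = 3, 10
mic_1 = [0.0] * delay_1 + source
mic_n = [0.0] * delay_n + source

# With this sign convention the peak lies at delay_1 - delay_n.
tau = best_lag(mic_1, mic_n, max_lag=20)
```

With this sign convention the peak lag equals the arrival delay at the first microphone minus the arrival delay at the n-th microphone, so a negative value means the source is nearer the first microphone.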

That is, the voice detecting means detect the sound pressure level from each microphone, and the speaker number detecting and storing means detects the speaker positions in order, starting from the first speaker, and stores their speaker numbers. The cross-correlation function calculating means then obtains the cross-correlation function of the audio signals received from the first and n-th microphones (or, when the n microphones are divided into subsections of m microphones each, from the microphones at both ends of each subsection), and the maximum value detecting means obtains the time difference at which the cross-correlation function is maximum. The confirming means confirms that this time difference agrees with the speaker position stored in the speaker number detecting and storing means and sends out the speaker code signal for that speaker number.

Assigning numbers to the microphones therefore makes sound image localization possible at the listening side. Furthermore, dividing the microphones into sections of m makes it possible to operate only the corresponding microphones and so increase the signal-to-noise ratio of the call path.

[Embodiments]

Embodiments of the present invention are described below with reference to the drawings.

The basic principle of the invention is explained first, followed by the specific embodiments.

FIG. 1 is an explanatory diagram showing the main part of the first embodiment of the present invention; it shows the relationship between the microphone arrangement and the speakers' seating positions.

Here S1, ..., Si, ..., Sn are the speakers and M1, ..., Mi, ..., Mn are the corresponding microphones. The distance between a speaker and the corresponding microphone is r, and the distance between adjacent microphones is d.

The basic principle of the invention is explained below with reference to FIG. 1. Assuming for now a free sound field, the single-speaker case and the multiple-speaker case are considered in turn.

FIG. 2 shows an example of the cross-correlation function between microphone M1 and microphone Mn.

(1) Single speaker (single sound source)

The sound pressure waveforms $R_1(t)$ and $R_n(t)$ that reach microphones M1 and Mn from the sound source of speaker Si are in general given by [equation not reproduced in the source text], where $A(\omega)$ denotes the amplitude component and $\varphi(\omega)$ the phase component.

The cross-correlation function $C_{cr}(t, \tau)$ is given by [equation (1), not reproduced in the source text].

The time difference $\tau$ at which the cross-correlation function $C_{cr}(t, \tau)$ of microphones M1 and Mn is maximum for a given speaker is $\tau = T_{i1} - T_{in}$. In FIG. 1, when $id = nd/2$, in other words when $L_{i1} = L_{in}$, we have $T_{i1} = T_{in}$ and hence $\tau = 0$, and a cross-correlation function like the dotted line in FIG. 2 is obtained. The sound source of speaker Si is thus found to lie midway between microphones M1 and Mn, and equation (1) then takes its maximum value (when there is a single local maximum).
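The waveform and correlation formulas themselves did not survive in this text. Purely as an illustration, one form that is consistent with the symbols defined above is sketched below. It is an assumption, not the patent's own expression: it ignores the difference in attenuation between the two paths and uses a finite integration window $T$.

```latex
R_1(t) = \int A(\omega)\,\cos\!\big[\omega\,(t - T_{i1}) + \varphi(\omega)\big]\,d\omega ,
\qquad
R_n(t) = \int A(\omega)\,\cos\!\big[\omega\,(t - T_{in}) + \varphi(\omega)\big]\,d\omega

C_{cr}(t, \tau) = \int_{t-T}^{t} R_1(u + \tau)\,R_n(u)\,du
```

Under these assumptions $R_1(u + \tau) = R_n(u)$ exactly when $\tau = T_{i1} - T_{in}$, so the integrand becomes $R_n(u)^2 \ge 0$ and the correlation peaks at that time difference, in agreement with the text.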

(2) Multiple sound sources (in particular, two sound sources)

The sound pressure waveforms $R_1(t)$ and $R_n(t)$ that reach microphones M1 and Mn from the sound sources of speakers Si and Sj are given by [equation not reproduced in the source text].

The cross-correlation function $C_{cr}(t, \tau)$ of the sound pressure waveforms $R_1(t)$ and $R_n(t)$ is given by [equation (2), not reproduced in the source text].

For the sound sources of speakers Si and Sj, the cross-correlation function $C_{cr}(t, \tau)$ of microphones M1 and Mn has its first-term maximum at the time difference $\tau = T_{i1} - T_{in}$ and its second-term maximum at $\tau = T_{j1} - T_{jn}$.

If the speed of sound is $c$ and the distance between speaker Si and microphone M1 is $L_{i1}$, the delay time $T_{i1}$ for sound from speaker Si to reach microphone M1 is given by $T_{i1} = L_{i1} / c$, where $L_{i1}$ follows from $r$, $d$ and the seat position in FIG. 1 (the closed-form expression is illegible in the source text).
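The relation between seat position and time difference can be sketched numerically. The fragment below is an illustration only: it assumes the microphones lie on a straight line, that each speaker sits at a perpendicular distance r from that line, and a speed of sound of 343 m/s. The function names and the 3 m layout are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value

def delay(speaker_x, mic_x, r):
    """Propagation time from a speaker to a microphone.

    The speaker sits at lateral position speaker_x, a perpendicular
    distance r from the microphone line; the microphone is at mic_x.
    """
    return math.hypot(r, speaker_x - mic_x) / SPEED_OF_SOUND

def time_difference(speaker_x, first_mic_x, last_mic_x, r):
    """tau = T_i1 - T_in for the two end microphones."""
    return delay(speaker_x, first_mic_x, r) - delay(speaker_x, last_mic_x, r)

# Hypothetical layout: end microphones 3 m apart, speakers 0.5 m from the line.
tau_center = time_difference(1.5, 0.0, 3.0, 0.5)  # midway between the ends
tau_left = time_difference(0.6, 0.0, 3.0, 0.5)    # nearer the first microphone
tau_right = time_difference(2.4, 0.0, 3.0, 0.5)   # nearer the last microphone
```

A speaker midway between the end microphones gives a zero time difference, and two speakers seated symmetrically about the center give time differences of equal magnitude and opposite sign.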

When there are sound sources from several speakers, as equation (2) likewise shows, the time differences at the several maxima, that is, the local maxima, correspond to the differences in propagation time from each sound source to the two microphones, so the position of each sound source can be determined from its time difference. The solid line in FIG. 2 shows an example in which speakers Si and Sj are seated symmetrically about the center and talk at the same time, that is, the case $(L_{i1} - L_{in}) = -(L_{j1} - L_{jn})$.

The above outlines operation in a free sound field, that is, a room with no reverberation. Even when there is reverberation, however, it affects only the later part of the impulse response and not the initial part, so it has no essential effect on the method of the invention. Diffuse room noise, moreover, is uncorrelated in time, so equation (2) still holds and the method is also robust against noise.

FIG. 3 is a detailed block diagram of the first embodiment. The first embodiment is a conference call device comprising: a plurality n of microphones 1a to 1d, one for each of the speakers S1 to Sn, installed substantially in a line and each at a fixed distance from its speaker; an analog-to-digital converter 2, delay circuits 14a to 14d, and AND circuits 15a to 15d, which serve as digitizing means for converting the audio signals of the speakers S1 to Sn input from the microphones 1a to 1d into a digital audio signal 17 and outputting it; and speaker code signal generating means for generating and outputting a speaker code signal 18 that identifies the speaker corresponding to the digital audio signal 17 output from the digitizing means.

As the feature of the invention, the speaker code signal generating means includes: voice detection circuits 3a to 3d, serving as the plurality n of voice detecting means that detect the sound pressure level from each of the microphones 1a to 1d; a first speaker detection circuit 4 and a number storage circuit 5, serving as the speaker number detecting and storing means that detects the first speaker from the output signals of the voice detection circuits 3a to 3d and stores the corresponding speaker number; shift register circuits 6a and 6b, a delay time sweep circuit 7, variable delay circuits 8a and 8b, and a multiplication circuit 9, serving as the cross-correlation function calculating means that obtains the cross-correlation function of the audio signals received from the first and n-th microphones 1a and 1d; an integration circuit 10 and a maximum value detection circuit 11, serving as the maximum value detecting means that obtains, from that cross-correlation function, the time difference at which it is maximum; and a position detection circuit 12 and a matching circuit 13, serving as the confirming means that confirms that the time difference obtained by the maximum value detection circuit 11 agrees with the position of the speaker for the speaker number stored in the number storage circuit 5 and sends out the speaker code signal 18 for that speaker number.

Next, the operation of the first embodiment is described.

First, to simplify the explanation, suppose that in FIG. 1 the speakers are Si and Sj, that speaker Si starts talking first, and that speaker Sj is the second to talk. As is clear from FIG. 1, the audio signal reaches microphone Mi earlier than any other microphone. Consequently, among the voice detection circuits 3a to 3d, the one that detects the sound pressure level earliest is voice detection circuit 3b in FIG. 3, and the first speaker detection circuit 4 detects this as the first speaker. The microphone number is then written to the lowest-numbered location of the number storage circuit 5.

Next, when speaker Sj starts talking, voice detection circuit 3c likewise detects the sound pressure level, and this is stored in the number storage circuit 5 as the second speaker. Naturally, if speaker Si then becomes a listener, the sound pressure level at voice detection circuit 3b falls, so speaker Sj becomes the first speaker and is written to that location of the number storage circuit 5. The first speaker therefore always occupies the lowest-numbered entry of the number storage circuit 5, and when everyone is listening, no microphone number is written in the number storage circuit 5 and it is in the cleared state. Examining the contents of the number storage circuit 5 thus always shows the first speaker, second speaker, and so on detected by the voice detection circuits 3a to 3d, registered in order from the lowest-numbered entry.
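The bookkeeping described for the number storage circuit can be sketched in software. The class below is a hypothetical model, not the patent's circuit: it assumes each voice detection circuit reduces to a yes/no decision per microphone, and it breaks ties between microphones that become active in the same update by microphone number, which the source does not specify.

```python
class NumberStore:
    """Keeps the active microphone numbers in the order they started talking."""

    def __init__(self):
        self.order = []

    def update(self, active):
        """Update with the set of microphone numbers currently above threshold."""
        # Speakers who have fallen silent are dropped; later speakers move up.
        self.order = [m for m in self.order if m in active]
        # Newly active microphones are appended after the existing ones.
        for m in sorted(active):
            if m not in self.order:
                self.order.append(m)
        return list(self.order)

store = NumberStore()
history = [
    store.update({2}),     # microphone 2 starts: it is the first speaker
    store.update({2, 3}),  # microphone 3 joins as the second speaker
    store.update({3}),     # microphone 2 stops: microphone 3 moves to first place
    store.update(set()),   # everyone is listening: the store is cleared
]
```

The list plays the role of the storage locations, with index 0 standing for the lowest-numbered entry.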

Meanwhile, to correlate the audio signals from microphones M1 and Mn, the audio signals are input to shift register circuits 6a and 6b, which store the audio signal for a certain time interval. The maximum delay time $\tau_{max}$ of the variable delay circuits 8a and 8b, shown in FIG. 2, corresponds to the speaker-to-microphone distance difference that gives the largest delay time difference: it is the difference between the distance from speaker S1 to microphone M1 and the distance from speaker S1 to microphone Mn, divided by the speed of sound. The variable delay circuits 8a and 8b are interlocked, and when both delay times are 0, $\tau$ in the cross-correlation function of equation (2) is 0.

While variable delay circuit 8a is sweeping its delay time, the delay time of variable delay circuit 8b is held at 0; conversely, while variable delay circuit 8b is sweeping, the delay time of variable delay circuit 8a is held at 0. That is, the delay time sweep circuit 7 commands the sweep so that at $-\tau_{max}$ the delay of variable delay circuit 8a is at its maximum, the variable delay $\tau$ then decreases to 0, and after that the delay of variable delay circuit 8b increases until the maximum delay time $+\tau_{max}$ is reached.

The output audio signals of the two variable delay circuits 8a and 8b are multiplied in the multiplication circuit 9 and passed through the integration circuit 10, which integrates over a time interval T. The maximum value detection circuit 11, which has a comparison function, monitors the integrated value, and when the value falls below the previous value, the delay time of the delay time sweep circuit 7 at that moment is passed to the position detection circuit 12, which stores the microphone position corresponding to each delay time. The matching circuit 13 then checks that microphone position information against the output signal of the number storage circuit 5.
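The sweep, multiply, integrate, and peak-detect chain can be modelled in a few lines. The sketch below is illustrative and rests on several assumptions that are not in the source: white noise stands in for speech, delays are whole samples, the microphone M1 channel is taken to feed the delay that is swept for negative tau, and a peak is accepted only above half the largest integral (the source describes only the comparison with the previous value).

```python
import random

def delayed(signal, k):
    """Delay a list of samples by k >= 0 samples, keeping its length."""
    return [0.0] * k + signal[:len(signal) - k]

def sweep(x, y, max_delay):
    """Multiply-and-integrate for tau = -max_delay .. +max_delay.

    For negative tau only x is delayed (modelling circuit 8a); for
    positive tau only y is delayed (modelling circuit 8b).
    """
    curve = []
    for tau in range(-max_delay, max_delay + 1):
        a = delayed(x, -tau) if tau < 0 else x
        b = delayed(y, tau) if tau > 0 else y
        curve.append((tau, sum(p * q for p, q in zip(a, b))))
    return curve

def local_peaks(curve, threshold):
    """Delays where the integral exceeds threshold and then stops rising."""
    peaks = []
    for (_, prev), (tau, cur), (_, nxt) in zip(curve, curve[1:], curve[2:]):
        if cur > threshold and cur > prev and cur >= nxt:
            peaks.append(tau)
    return peaks

random.seed(1)
n = 2000
s_i = [random.uniform(-1.0, 1.0) for _ in range(n)]
s_j = [random.uniform(-1.0, 1.0) for _ in range(n)]

# Hypothetical delays in samples: S_i is nearer the first microphone,
# S_j is nearer the last one, and both talk at once.
mic_1 = [a + b for a, b in zip(delayed(s_i, 2), delayed(s_j, 9))]
mic_n = [a + b for a, b in zip(delayed(s_i, 9), delayed(s_j, 2))]

curve = sweep(mic_1, mic_n, max_delay=12)
threshold = 0.5 * max(value for _, value in curve)
peaks = local_peaks(curve, threshold)  # one peak per simultaneous speaker
```

Each simultaneous speaker contributes its own peak, at its arrival delay at the first microphone minus its arrival delay at the last, which is the information the position detection stage would map to a microphone position.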

Furthermore, the audio signals output from the microphones M1, ..., Mi, ..., Mj, ..., Mn pass through the delay circuits 14a to 14d, whose delay equals the processing delay needed for voice detection plus the processing time needed to detect the local maxima of the cross-correlation function. Using the microphone position information obtained from the output of the matching circuit 13, the AND circuits 15a to 15d pass only the audio signals of the relevant microphones; when several speakers are talking, these are summed by the addition circuit 16 to give the digital audio signal 17. The signal identifying the microphone M that corresponds to the talking speaker S is also output as the speaker code signal 18.
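The gating and summing stage can be sketched as follows. This is a simplified model under stated assumptions: the per-microphone sample lists are taken to be already delay-compensated, and `mix_active` and the dictionary layout are hypothetical names chosen for the example.

```python
def mix_active(frames, active_mics):
    """Pass only the confirmed microphones and sum them.

    frames maps a microphone number to a list of samples of equal length;
    active_mics is the set of microphone numbers confirmed as talking.
    """
    length = len(next(iter(frames.values())))
    mixed = [0.0] * length
    for mic, samples in frames.items():
        if mic in active_mics:          # gate: blocks unconfirmed microphones
            for t, value in enumerate(samples):
                mixed[t] += value       # adder: combines simultaneous speakers
    return mixed

frames = {1: [1.0, 1.0], 2: [2.0, 2.0], 3: [4.0, 4.0]}
mixed = mix_active(frames, {1, 3})      # microphone 2 is blocked
silent = mix_active(frames, set())      # nobody confirmed: output stays at zero
```

Blocking the unconfirmed microphones keeps their room noise out of the mixed signal, which is the signal-to-noise benefit the description refers to.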

Incidentally, if a speaker is so far from an end microphone that the sound pressure cannot be detected, for example speaker Si and the end microphone M1, the input to that microphone may not reach a sound pressure level sufficient for an audio signal.

In the second embodiment, shown in FIG. 4, the relationship between microphone positions and speaker positions is arranged so as to increase the signal-to-noise ratio.

That is, the cross-correlation function is computed independently for each section of microphones (the section ending at microphone Mb, the next section, and the section beginning at microphone Md), which gives a larger signal-to-noise ratio and therefore more reliable speaker position detection.
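Choosing which microphone pairs to correlate when the row is divided into sections can be sketched as below. The function name is hypothetical, and the sketch assumes non-overlapping sections numbered from 1; the source does not state whether adjacent sections share an end microphone or how a leftover microphone is handled.

```python
def section_end_pairs(n, m):
    """Return (first, last) microphone numbers for each section of m microphones."""
    if n < 2 or m < 2:
        raise ValueError("need at least two microphones per section")
    pairs = []
    for start in range(1, n + 1, m):
        end = min(start + m - 1, n)
        if end > start:  # a single leftover microphone forms no pair
            pairs.append((start, end))
    return pairs

pairs_of_two = section_end_pairs(6, 2)
pairs_of_four = section_end_pairs(8, 4)
```

Each pair would then get its own correlator, so a speaker is always compared using two microphones that are both reasonably close.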

In terms of the block diagram, the only difference from FIG. 3 is that the shift register circuits 6a and 6b, the delay time sweep circuit 7, the variable delay circuits 8a and 8b, the multiplication circuit 9, the integration circuit 10, the maximum value detection circuit 11, and the position detection circuit 12 are provided in multiple sets.

FIG. 5 is an explanatory diagram showing the main part of a third embodiment of the present invention; it shows the relationship between microphone positions and speaker positions when the invention is applied to an arc-shaped desk used for video conferencing. The microphone row has so far been described as a straight line, but the essential operation is the same when the microphones are arranged in an arc.

[Effect of the invention]

As described above, the present invention detects the first speaker from the inputs to a plurality of microphones and determines the speaker's position by finding the time difference at which the cross-correlation function of the input audio signals of two microphones is maximum. By checking this against the voice detecting means, it can detect the position of the microphone of each talking speaker even in a conversation in which several speakers talk at once. Therefore, whether several speakers in a conference room talk simultaneously or only one speaker talks, assigning a number to the microphone not only makes sound image localization possible at the listening side but also allows only the corresponding microphones to be operated, which increases the signal-to-noise ratio of the call path.

[Brief explanation of the drawing]

FIG. 1 is an explanatory diagram showing the main part of the first embodiment of the present invention.
FIG. 2 is a characteristic diagram showing an example of the cross-correlation function between its two microphones.
FIG. 3 is its detailed block diagram.
FIG. 4 is an explanatory diagram showing the main part of the second embodiment of the present invention.
FIG. 5 is an explanatory diagram showing the main part of the third embodiment of the present invention.

1a to 1d: microphones; 2: analog-to-digital converter; 3a to 3d: voice detection circuits; 4: first speaker detection circuit; 5: number storage circuit; 6a, 6b: shift register circuits; 7: delay time sweep circuit; 8a, 8b: variable delay circuits; 9: multiplication circuit; 10: integration circuit; 11: maximum value detection circuit; 12: position detection circuit; 13: matching circuit; 14a to 14d: delay circuits; 15a to 15d: AND circuits; 16: addition circuit; 17: digital audio signal; 18: speaker code signal; M1 to Mn: microphones; S1 to Sn: speakers.

Patent applicant: Nippon Telegraph and Telephone Corporation
Agent: Patent Attorney Naotaka Ide

Claims

[Scope of Claims]

1. A conference call device comprising: a plurality n of microphones, one for each speaker, installed substantially in a line and each at a fixed distance from its speaker; digitizing means for converting the audio signal of each speaker input from each microphone into a digital audio signal and outputting it; and speaker code signal generating means for generating and outputting a speaker code signal that identifies the speaker corresponding to the digital audio signal output from the digitizing means, characterized in that the speaker code signal generating means includes: a plurality n of voice detecting means for detecting the sound pressure level from each microphone; speaker number detecting and storing means for detecting the order of the output signals from the n voice detecting means and storing the corresponding speaker numbers; cross-correlation function calculating means for obtaining the cross-correlation function of the audio signals received from two microphones, the first and the n-th in the arrangement; maximum value detecting means for obtaining, from the cross-correlation function calculated by the cross-correlation function calculating means, the time difference at which the cross-correlation function is maximum; and confirming means for confirming that the time difference obtained by the maximum value detecting means agrees with the position of the speaker for the speaker number stored in the speaker number detecting and storing means, and for sending out the speaker code signal for that speaker number.

2. The conference call device according to claim 1, wherein the cross-correlation function calculating means divides all n microphones into small sections of m microphones each and obtains a cross-correlation function for each section.