JP4804014B2 - Audio conferencing equipment

Info

Publication number: JP4804014B2
Application number: JP2005047369A
Authority: JP
Inventors: 弘美青柳; 稔智稲葉
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2005-02-23
Filing date: 2005-02-23
Publication date: 2011-10-26
Anticipated expiration: 2025-02-23
Also published as: JP2006237839A

Description

The present invention relates to an audio conference apparatus and can be applied, for example, to an audio conference apparatus and system that exchanges stereo audio signals between two sites, each with multiple participants.

With the spread of broadband IP lines such as corporate LANs and ADSL, video conference systems that connect two sites (two groups in separate locations), each with multiple participants, have become widespread. As an audio conference system for use with such video conference systems, a VoIP-compatible system such as the one described in Non-Patent Document 1 has been proposed. This audio conference system uses a terminal having a speaker and a microphone.
NEC Technical Report, vol. 56, No. 12/2003

However, the audio conference system of Non-Patent Document 1 has the problem that it is difficult to identify the speaker (that is, who is talking).

To solve this problem, one could install a pair of speakers and a pair of microphones to support stereo. However, in an audio conference system the speaker volume is raised so that many people can listen, so echo is already a serious problem even with a single speaker and a single microphone. With stereo speakers and microphones the echo problem becomes even worse, and the echo canceller becomes complicated and expensive.

Therefore, an audio conference apparatus that can solve both the speaker identification problem and the echo problem is desired.

A first aspect of the present invention is an audio conference apparatus that is provided at each of the two sites involved in a conference and can accommodate up to N participants (N is 3 or more), comprising: (1) a directional monaural microphone provided near the mouth of each participant; (2) monaural/stereo conversion means for converting the monaural audio signal from the corresponding monaural microphone into a stereo audio signal consisting of an R channel audio signal and an L channel audio signal to which a directionality determined by that monaural/stereo conversion means has been added; (3) stereo audio transmission means for combining the R channel audio signals from the monaural/stereo conversion means to obtain a synthesized R channel audio signal, combining the L channel audio signals from the monaural/stereo conversion means to obtain a synthesized L channel audio signal, and transmitting the synthesized R channel audio signal and the synthesized L channel audio signal to the opposing audio conference apparatus; (4) an R channel speaker and an L channel speaker; and (5) reception distribution means for supplying the synthesized R channel audio signal from the opposing audio conference apparatus to the R channel speaker and supplying the synthesized L channel audio signal from the opposing audio conference apparatus to the L channel speaker.

According to the audio conference apparatus of the present invention, when stereo audio signals are transmitted and received, using headphones for each participant and/or a directional monaural microphone for each participant makes it possible to identify the speaker while suppressing echo.

(A) First Embodiment
Hereinafter, a first embodiment in which the audio conference apparatus of the present invention is applied to a multi-party, two-site audio conference system will be described with reference to the drawings.

FIG. 1 shows a schematic configuration of the audio conference apparatus of the first embodiment. The audio conference system of the first embodiment is formed by providing an audio conference apparatus 10 with this configuration at each of the two opposing sites.

The audio conference apparatus 10 consists of a transmission configuration that transmits the stereo audio signal from its own site and a reception configuration that receives the stereo audio signal captured at the other site.

The transmission configuration of the audio conference apparatus 10 includes a pair of microphones 21R and 21L and an audio multiplex transmission unit 22.

The pair of microphones 21R and 21L is installed so that, even when the maximum allowable number (N) of conference participants 1-1 to 1-N (N is 4 in the example of FIG. 1) are positioned along a predetermined straight line or curve (for example, an arc), the voice of every participant is captured adequately and the differences in position are reproduced in the stereo sound at the opposing site. The installation need not be fixed, but the distance between the microphones 21R and 21L is preferably about the average distance between a person's left and right ears.

The audio multiplex transmission unit 22 multiplexes the R channel audio signal from the microphone 21R and the L channel audio signal from the microphone 21L and transmits the resulting stereo audio signal to the opposing audio conference apparatus. Any existing multiplexing and transmission method may be used. For example, the audio multiplex transmission unit 22 may convert the R channel and L channel audio signals into digital signals, encode them, and then multiplex them. It may also apply channel modulation or packetize the signal, depending on the network used.

The reception configuration of the audio conference apparatus 10 includes a reception distribution unit 31 and headphones 32-1 to 32-N, one for each participant.

The reception distribution unit 31 separates the stereo audio signal transmitted by the opposing audio conference apparatus into an R channel audio signal and an L channel audio signal, and distributes the resulting pair of signals to the headphones 32-1, ..., 32-N of the participants. Apart from this distribution function, the reception distribution unit 31 performs the inverse of the processing performed by the audio multiplex transmission unit 22. For example, if the audio multiplex transmission unit 22 applies channel modulation, the reception distribution unit 31 performs channel demodulation; if the audio multiplex transmission unit 22 assembles packets, the reception distribution unit 31 disassembles the received packets; and if the audio multiplex transmission unit 22 encodes the audio signals, the reception distribution unit 31 decodes them.
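The multiplex/demultiplex relationship between units 22 and 31 can be sketched as follows. This is only an illustration: the description allows any existing multiplexing method, so the sample interleaving used here, and the function names `multiplex`, `demultiplex`, and `distribute`, are assumptions rather than part of the disclosure.

```python
from typing import List, Tuple


def multiplex(right: List[int], left: List[int]) -> List[int]:
    """Interleave R and L samples as R0, L0, R1, L1, ... (assumed format)."""
    if len(right) != len(left):
        raise ValueError("R and L channels must have the same length")
    frame: List[int] = []
    for r, l in zip(right, left):
        frame.extend((r, l))
    return frame


def demultiplex(frame: List[int]) -> Tuple[List[int], List[int]]:
    """Inverse of multiplex: even positions are R, odd positions are L."""
    return frame[0::2], frame[1::2]


def distribute(frame: List[int], n_headphones: int) -> List[Tuple[List[int], List[int]]]:
    """Separate the received frame and hand one (R, L) pair to each headphone."""
    right, left = demultiplex(frame)
    return [(list(right), list(left)) for _ in range(n_headphones)]


if __name__ == "__main__":
    sent = multiplex([1, 2], [10, 20])
    print(sent)                 # interleaved stereo frame
    print(distribute(sent, 4))  # one (R, L) pair per participant
```

Encoding, channel modulation, and packetization are left out; each would be paired with its inverse on the receiving side in the same way.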

The reception distribution unit 31 may have headphone jacks and may refrain from distributing the pair of R channel and L channel audio signals to any jack into which no headphone plug is inserted.
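A minimal sketch of this optional behavior, assuming each jack reports a boolean "plug inserted" state (the `plugged` list and the function name `distribute_to_jacks` are hypothetical):

```python
from typing import List, Optional, Tuple

StereoPair = Tuple[List[float], List[float]]  # (R channel, L channel)


def distribute_to_jacks(right: List[float], left: List[float],
                        plugged: List[bool]) -> List[Optional[StereoPair]]:
    """Return one (R, L) pair per occupied jack and None for empty jacks."""
    return [(list(right), list(left)) if is_plugged else None
            for is_plugged in plugged]


if __name__ == "__main__":
    # Jacks 1 and 3 are in use, jack 2 is empty.
    print(distribute_to_jacks([0.1, 0.2], [0.3, 0.4], [True, False, True]))
```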

Some of the functions described above may be performed by the headphones 32-1, ..., 32-N instead of the reception distribution unit 31.

Each of the headphones 32-1, ..., 32-N outputs the sound of the R channel audio signal and the L channel audio signal supplied from the reception distribution unit 31 from the earpiece for the corresponding ear.

The headphones 32-1, ..., 32-N preferably have a structure that lets ambient sound pass through, so that the wearer can also hear the voices of the other participants at the same site.

The following describes the operation when the participants at one site B listen to what a participant at the other site A says. In this description, the suffix "A" is appended to the reference numerals of components at site A, and the suffix "B" to those of components at site B.

When a participant 1-nA at site A speaks, the voice is captured by the pair of microphones 21RA and 21LA of the audio conference apparatus 10A, converted into an R channel audio signal and an L channel audio signal, and supplied to the audio multiplex transmission unit 22A. The audio multiplex transmission unit 22A multiplexes the R channel and L channel audio signals, and the resulting stereo audio signal is transmitted to the opposing audio conference apparatus 10B.

The reception distribution unit 31B of the audio conference apparatus 10B receives this transmitted signal. It separates the stereo audio signal transmitted by the opposing audio conference apparatus 10A into an R channel audio signal and an L channel audio signal, and distributes the resulting pair of signals to the headphones 32-1B, ..., 32-NB of the participants.

Each of the headphones 32-1B, ..., 32-NB outputs the sound corresponding to the R channel and L channel audio signals supplied from the reception distribution unit 31B from the earpiece for the corresponding ear.

Because the participants 1-1B to 1-NB wearing the headphones 32-1B, ..., 32-NB hear the voice of participant 1-nA in stereo, they can distinguish the position of participant 1-nA from those of the other participants 1-1A to 1-NA (excluding 1-nA) and, even though they are at the other site, can listen with a sense of presence.

According to the first embodiment, even though the stereo audio signal captured by the pair of microphones is transmitted to the other site, the sound is output through headphones, so echo is not a problem. In addition, because listening is in stereo, listeners can easily recognize the speaker and the speaker's position. The configuration that achieves these effects can be made simple and inexpensive.

(B) Second Embodiment
Next, a second embodiment in which the audio conference apparatus of the present invention is applied to a multi-party, two-site audio conference system will be described with reference to the drawings.

FIG. 2 shows a schematic configuration of the audio conference apparatus of the second embodiment. The audio conference system of the second embodiment is formed by providing an audio conference apparatus 11 with this configuration at each of the two opposing sites.

The audio conference apparatus 11 of the second embodiment likewise consists of a transmission configuration that transmits the stereo audio signal from its own site and a reception configuration that receives the stereo audio signal captured at the other site.

The transmission configuration of the audio conference apparatus 11 of the second embodiment includes microphones 21-1 to 21-N, one for each participant, localization processing units 41-1 to 41-N, one for each participant, and an audio multiplex transmission unit 22.

The microphones 21-1 to 21-N, one per participant, are monaural-output microphones with high directivity, each placed near the mouth of the corresponding participant. A microphone may be placed near the participant's mouth by means of a fixture such as a headset; alternatively, a pin microphone may be attached to the participant's clothing, or a stand placed on a table or the like may be used. The monaural audio signal from each of the microphones 21-1 to 21-N is supplied to the corresponding localization processing unit 41-1 to 41-N.

Each of the localization processing units 41-1 to 41-N applies existing localization processing using an HRTF (head-related transfer function) to the monaural audio signal from the corresponding microphone 21-1 to 21-N and forms a stereo audio signal (an R channel audio signal and an L channel audio signal) to which directionality has been added. The directionality is fixed in advance for each participant. For the existing localization processing, the methods described in, for example, JP 2002-209300 A and JP 2003-102099 A can be applied. Each of the localization processing units 41-1 to 41-N supplies the stereo audio signal it has formed to the audio multiplex transmission unit 22.
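One common way to realize HRTF-based localization is to convolve the monaural signal with a pair of head-related impulse responses (the time-domain form of the HRTF), one per ear. The sketch below illustrates that general technique only; it is not the method of the cited publications, and the impulse-response values are made-up placeholders, not measured data.

```python
from typing import List, Sequence, Tuple


def convolve(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Direct-form linear convolution of two finite sequences."""
    if not signal or not kernel:
        return []
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def localize(mono: Sequence[float],
             hrir_right: Sequence[float],
             hrir_left: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Turn a mono signal into (R, L) by filtering with one impulse response per ear."""
    return convolve(mono, hrir_right), convolve(mono, hrir_left)


# Placeholder impulse responses for a source to the listener's left (assumed values):
# the left ear receives the sound earlier and louder than the right ear.
HRIR_LEFT = [0.9, 0.1]
HRIR_RIGHT = [0.0, 0.5, 0.1]

if __name__ == "__main__":
    r, l = localize([1.0, 0.5, 0.25], HRIR_RIGHT, HRIR_LEFT)
    print("R:", r)
    print("L:", l)
```

Because the directionality is fixed per participant, each unit 41-n would hold its own impulse-response pair under this assumption.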

The audio multiplex transmission unit 22 of the second embodiment combines the stereo audio signals from the localization processing units 41-1 to 41-N separately for the R channel and the L channel, multiplexes the synthesized R channel audio signal and the synthesized L channel audio signal, and transmits the resulting synthesized stereo audio signal to the opposing audio conference apparatus.
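The per-channel combination can be sketched as a sample-by-sample sum, which is an assumption since the description does not specify how the signals are combined. The function name `mix_channels` is hypothetical, and level normalization or clipping protection is omitted.

```python
from typing import List, Sequence, Tuple

Stereo = Tuple[Sequence[float], Sequence[float]]  # (R channel, L channel)


def mix_channels(stereo_signals: Sequence[Stereo]) -> Tuple[List[float], List[float]]:
    """Sum all R channels into one synthesized R and all L channels into one synthesized L."""
    if not stereo_signals:
        return [], []
    length = len(stereo_signals[0][0])
    mixed_r = [0.0] * length
    mixed_l = [0.0] * length
    for right, left in stereo_signals:
        if len(right) != length or len(left) != length:
            raise ValueError("all channels must have the same length")
        for i in range(length):
            mixed_r[i] += right[i]
            mixed_l[i] += left[i]
    return mixed_r, mixed_l


if __name__ == "__main__":
    participant_1 = ([1.0, 2.0], [0.0, 1.0])
    participant_2 = ([0.5, 0.5], [1.0, 1.0])
    print(mix_channels([participant_1, participant_2]))
```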

The reception configuration of the audio conference apparatus 11 of the second embodiment includes a reception distribution unit 31 and speakers 32R and 32L for the R channel and the L channel.

The reception distribution unit 31 separates the synthesized stereo audio signal transmitted by the opposing audio conference apparatus into a synthesized R channel audio signal and a synthesized L channel audio signal, and supplies them to the R channel speaker 32R and the L channel speaker 32L, respectively.

The R channel speaker 32R and the L channel speaker 32L output the sound of the synthesized R channel audio signal and the synthesized L channel audio signal, respectively, supplied from the reception distribution unit 31.

When a participant 1-nA at site A speaks, the voice is captured by the monaural microphone 21-nA assigned to that participant in the audio conference apparatus 11A, and the monaural audio signal is supplied to the localization processing unit 41-nA. The localization processing unit 41-nA converts the monaural audio signal into a stereo audio signal (an R channel audio signal and an L channel audio signal) to which directionality has been added. The audio multiplex transmission unit 22 combines this stereo audio signal, channel by channel, with the stereo audio signals from the other localization processing units 41-1A to 41-NA (excluding 41-nA), multiplexes the channels, and transmits the result to the opposing audio conference apparatus 11B.

The reception distribution unit 31B of the audio conference apparatus 11B receives this transmitted signal. It separates the synthesized stereo audio signal transmitted by the opposing audio conference apparatus 11A into a synthesized R channel audio signal and a synthesized L channel audio signal, and supplies them to the R channel speaker 32R and the L channel speaker 32L for output. The participants 1-1B to 1-NB on the receiving side thus hear the localized synthesized R channel sound and synthesized L channel sound.

If no participant other than participant 1-nA is speaking at site A, the synthesized R channel and L channel sound heard by the participants 1-1B to 1-NB on the receiving side consists only of the localized monaural voice of participant 1-nA, so each of the participants 1-1B to 1-NB can recognize who the speaking participant 1-nA is, where that participant is located, and so on.

According to the second embodiment, the monaural audio signal captured by the monaural microphone near the speaker's mouth is localized, converted into a stereo audio signal with directionality, and transmitted to the other site. As a result, echo is not a problem even though the sound is output through stereo speakers, and because listening is in stereo, listeners can easily recognize the speaker and the speaker's position.

Moreover, because localization processing is used, the directionality settings can make the opposing group perceive a virtual seating order that differs from the actual one, which enables more varied audio conferences than the first embodiment.

In a modification of the second embodiment, the directionality added by each of the localization processing units 41-1 to 41-N is not fixed but can be set variably. That is, although not illustrated, angle information D1 to DN can be input from outside to the localization processing units 41-1 to 41-N, and each unit adds the directionality specified by its angle information.

This modification gives more freedom in controlling the perceived direction of each speaker and makes a more realistic audio conference possible.

(C) Third Embodiment
Next, a third embodiment in which the audio conference apparatus of the present invention is applied to a multi-party, two-site audio conference system will be described with reference to the drawings.

FIG. 3 shows a schematic configuration of the audio conference apparatus of the third embodiment; parts identical or corresponding to those in FIG. 2 of the second embodiment carry the same reference numerals. The audio conference system of the third embodiment is formed by providing an audio conference apparatus 12 with this configuration at each of the two opposing sites.

The audio conference apparatus 12 of the third embodiment differs from the second embodiment in that the localization processing units 41-1 to 41-N, which generate stereo audio signals from monaural audio signals, are replaced by balance processing units (transmitting-side balance processing units) 51-1 to 51-N. In all other respects it is the same as the second embodiment.

The n-th balance processing unit 51-n (n is 1 to N) generates an L channel audio signal STn(L) and an R channel audio signal STn(R) from the input monaural audio signal Sn according to the following formulas.

STn(L) = Sn * (1 - An)
STn(R) = Sn * An
where An = (n - 1) / (N - 1)

According to the third embodiment, the same effects as those of the second embodiment can be obtained. In addition, directionality can be added with simpler processing than in the second embodiment.
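The balance formulas of the third embodiment amount to a linear pan: participant 1 is placed fully on the L channel (A1 = 0), participant N fully on the R channel (AN = 1), and the others are spaced evenly in between. A minimal sketch, with the function names `balance_gain` and `balance` chosen here for illustration:

```python
from typing import List, Sequence, Tuple


def balance_gain(n: int, total: int) -> float:
    """An = (n - 1) / (N - 1) for participant n of N (1-based)."""
    if total < 2 or not 1 <= n <= total:
        raise ValueError("need N >= 2 and 1 <= n <= N")
    return (n - 1) / (total - 1)


def balance(mono: Sequence[float], n: int, total: int) -> Tuple[List[float], List[float]]:
    """Return (STn(L), STn(R)) = (Sn * (1 - An), Sn * An)."""
    a = balance_gain(n, total)
    left = [s * (1.0 - a) for s in mono]
    right = [s * a for s in mono]
    return left, right


if __name__ == "__main__":
    # With N = 4 the gains An are 0, 1/3, 2/3 and 1 for participants 1 to 4.
    for n in range(1, 5):
        print(n, balance([1.0], n, 4))
```

Note that the two gains always sum to 1, so the combined amplitude of the L and R channels equals that of the monaural input.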

In a modification of the third embodiment, the directionality added by each of the balance processing units 51-1 to 51-N is not fixed but can be set variably. That is, although not illustrated, directionality information A1 to AN can be input from outside to the balance processing units 51-1 to 51-N, and each unit adds the directionality specified by its directionality information.

(D) Fourth Embodiment
Next, a fourth embodiment in which the audio conference apparatus of the present invention is applied to a multi-party, two-site audio conference system will be described with reference to the drawings.

FIG. 4 shows a schematic configuration of the audio conference apparatus of the fourth embodiment; parts identical or corresponding to those in FIG. 1 of the first embodiment carry the same reference numerals. The audio conference system of the fourth embodiment is formed by providing an audio conference apparatus 13 with this configuration at each of the two opposing sites.

The audio conference apparatus 13 of the fourth embodiment differs from the first embodiment in two respects: balance processing units (receiving-side balance processing units) 61-1 to 61-N are provided between the reception distribution unit 31 and the headphones 32-1 to 32-N, and each of the headphones 32-1 to 32-N has a function of transmitting distance measurement signals (radio, infrared, or the like) to the corresponding balance processing unit 61-1 to 61-N.

Each of the headphones 32-1 to 32-N transmits one distance measurement signal from its left earpiece and another from its right earpiece.

Each of the balance processing units 61-1 to 61-N receives the distance measurement signals transmitted from the left and right earpieces of the corresponding headphones 32-1 to 32-N and obtains the distances r1 and r2 to the left earpiece and the right earpiece, respectively. It then corrects the stereo audio signal (R channel audio signal R, L channel audio signal L) from the reception distribution unit 31 as follows and supplies the corrected stereo audio signal (R channel audio signal RM, L channel audio signal LM) to the corresponding headphones.

LM = L * r2 / (r1 + r2)
RM = R * r1 / (r1 + r2)

According to the fourth embodiment, the same effects as those of the first embodiment can be obtained. In addition, the speaker seems to stay in the same position regardless of which way the listener is facing, so the audio conference has a greater sense of presence than in the first embodiment.
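The receiving-side correction can be sketched directly from the two formulas, with r1 the distance to the left earpiece and r2 the distance to the right earpiece. The function name `correct_for_head_direction` is hypothetical, and the distance measurement itself is outside the scope of this sketch.

```python
from typing import List, Sequence, Tuple


def correct_for_head_direction(right: Sequence[float], left: Sequence[float],
                               r1: float, r2: float) -> Tuple[List[float], List[float]]:
    """Return (RM, LM) with RM = R * r1 / (r1 + r2) and LM = L * r2 / (r1 + r2)."""
    total = r1 + r2
    if total <= 0:
        raise ValueError("r1 + r2 must be positive")
    rm = [s * r1 / total for s in right]
    lm = [s * r2 / total for s in left]
    return rm, lm


if __name__ == "__main__":
    # Listener facing the unit: both earpieces are equally far away.
    print(correct_for_head_direction([1.0], [1.0], r1=2.0, r2=2.0))
    # Listener turned to the left: the right earpiece is closer (r2 < r1),
    # so the R channel is emphasized and the L channel attenuated.
    print(correct_for_head_direction([1.0], [1.0], r1=3.0, r2=1.0))
```

When the listener faces the balance processing unit, r1 equals r2 and both channels are scaled by one half, so the stereo balance is unchanged.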

For example, whether the listener 1-n faces the balance processing unit 61-n as shown in FIG. 5(A) or faces to the left of the balance processing unit 61-n as shown in FIG. 5(B), the correction performed by the balance processing unit 61-n makes the speaker seem to be in the same position. In the first embodiment, by contrast, the position where the speaker seems to be changes with the direction in which the listener 1-n faces.

The fourth embodiment obtains the orientation of the listener's head through communication between the headphones and the balance processing unit, but the method of detecting head orientation is of course not limited to this. Likewise, the method of correcting the stereo audio signal according to head orientation is not limited to the formulas above and may be chosen to suit the detection method.

(E) Other Embodiments
The transmission configuration of an i-th embodiment (including its modifications) and the reception configuration of a j-th embodiment (j differs from i; including its modifications) may be combined, where such a combination is possible, to realize an audio conference apparatus. For example, an audio conference apparatus may have the transmission configuration of the third embodiment and the reception configuration of the fourth embodiment.

The audio conference apparatus and audio conference system of the present invention may be implemented as the audio processing part of a video conference system or the like, or as a stand-alone audio conference apparatus.

FIG. 1 is a block diagram showing the schematic configuration of the audio conference apparatus of the first embodiment.
FIG. 2 is a block diagram showing the schematic configuration of the audio conference apparatus of the second embodiment.
FIG. 3 is a block diagram showing the schematic configuration of the audio conference apparatus of the third embodiment.
FIG. 4 is a block diagram showing the schematic configuration of the audio conference apparatus of the fourth embodiment.
FIG. 5 is an explanatory diagram of the effect of the audio conference apparatus of the fourth embodiment.

Explanation of symbols

10 to 13: audio conference apparatus; 21R, 21L, 21-1 to 21-N: microphones; 22: audio multiplex transmission unit; 31: reception distribution unit; 32-1 to 32-N: headphones; 32R, 32L: speakers; 41-1 to 41-N: localization processing units; 51-1 to 51-N: transmitting-side balance processing units; 61-1 to 61-N: receiving-side balance processing units.

Claims

An audio conference apparatus provided at each of the two sites involved in a conference and able to accommodate up to N participants (N is 3 or more), comprising:
a directional monaural microphone provided near the mouth of each participant;
monaural/stereo conversion means for converting the monaural audio signal from the corresponding monaural microphone into a stereo audio signal consisting of an R channel audio signal and an L channel audio signal to which a directionality determined by that monaural/stereo conversion means has been added;
stereo audio transmission means for combining the R channel audio signals from the monaural/stereo conversion means to obtain a synthesized R channel audio signal, combining the L channel audio signals from the monaural/stereo conversion means to obtain a synthesized L channel audio signal, and transmitting the synthesized R channel audio signal and the synthesized L channel audio signal to the opposing audio conference apparatus;
an R channel speaker and an L channel speaker; and
reception distribution means for supplying the synthesized R channel audio signal from the opposing audio conference apparatus to the R channel speaker and supplying the synthesized L channel audio signal from the opposing audio conference apparatus to the L channel speaker.

The audio conference apparatus according to claim 1, wherein each monaural/stereo conversion means has a variable configuration that converts the monaural audio signal into the stereo audio signal in accordance with an externally specified parameter defining the directionality.

The audio conference apparatus according to claim 1 or 2, wherein each monaural/stereo conversion means converts the monaural audio signal into the stereo audio signal by localization processing using an HRTF, based on localization data.

The audio conference apparatus according to claim 1, wherein each monaural/stereo conversion means converts the monaural audio signal into the stereo audio signal by weighting processing based on weighting information.