JP2007329753A

JP2007329753A - Voice communication device and voice communication system

Info

Publication number: JP2007329753A
Application number: JP2006160002A
Authority: JP
Inventors: Noriyuki Hata; 紀行畑
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2006-06-08
Filing date: 2006-06-08
Publication date: 2007-12-20

Abstract

PROBLEM TO BE SOLVED: To provide a voice communication device with a relatively simple configuration that lets each listener flexibly adjust the output sound for the voice of a specific speaker. SOLUTION: When the voice of conference member J at spot (b) is hard to hear and conference members A and G at spot (a) each adjust the output sound, voice conference device 111A adjusts the relationship among the signals output from the individual speakers of its speaker array so that only the output sound in directions Dir11 and Dir18, which correspond to members A and G, changes according to each member's adjustment. The adjustment contents are also sent to a network server 101. When the number of adjustments directed at member J reaches a predetermined value, the network server 101 supplies voice conference device 111B with sound collection correction data for correcting the sound collection signal from direction Dir24, which corresponds to member J. Voice conference device 111B corrects the sound collection signal from direction Dir24 based on this data and sends the corrected signal to voice conference devices 111A and 111C. COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a voice communication system in which voice signals are exchanged over a network for remote conferences and the like, and to a voice communication device used in that system.

Various voice communication systems have been devised that connect multiple sites over a network for voice conferences and chats.

For example, Patent Document 1 discloses a system in which each conference participant individually connects a personal computer, which serves as a voice communication device, to a network, and the participants meet in a virtual conference room.

In Patent Document 1, each participant operates his or her own voice communication device to adjust the sound quality, volume, and acoustics of the received voice signal before it is emitted, so that the conference has a sense of presence on a per-speaker basis.
Patent Document 1: JP-A-8-125761

However, the voice communication system of Patent Document 1 requires a voice communication device for every participant, so the system grows substantially as the conference grows.

Further, in the voice communication system of Patent Document 1, the sound emission characteristics are set by the positional relationship of the participants in the virtual conference room. When a specific speaker's voice is quiet and hard to hear, for example, the sound emitted for that speaker alone cannot be flexibly adjusted.

Furthermore, even when several participants share one voice communication device, sound emission cannot be controlled by direction, so all of them can only be given the same sound.

Accordingly, an object of the present invention is to provide a voice communication system that, with a relatively simple configuration little affected by the number of participants, lets each listener flexibly adjust the sound emitted for a specific speaker's voice, and to provide a voice communication device used in that system.

(1) The voice communication device of the present invention comprises: a speaker array in which a plurality of speakers are arranged in a predetermined pattern; operation receiving means for receiving an operation that adjusts sound emission characteristics; and sound emission control means that applies delay and amplitude processing to an input voice communication signal and supplies the result to the plurality of speakers, so as to form sound emission beams only in a predetermined plurality of directions and to adjust the sound emission beam for a designated direction based on the sound emission characteristics received by the operation receiving means.

In this configuration, the sound emission control means determines the sound emitted from each speaker of the speaker array by applying delay and amplitude processing to the input voice communication signal. The processing is set so that sound is emitted only in a predetermined plurality of directions, determined for example by where each conference participant is seated, and so that each sound emission beam has strong directivity toward its direction. When the sound emission control means receives, through the operation receiving means, an operation by a participant that sets the sound emission characteristics, it adjusts the sound emission beam for the direction corresponding to that participant based on the received characteristics. Sound is thus emitted toward every participant, while a participant who wants different characteristics receives sound with those characteristics.
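The delay and amplitude processing described here can be illustrated with a conventional delay-and-sum sketch for a linear speaker array. This is only an illustration under stated assumptions: the patent does not give the formula, and the function names, the far-field steering model, the 343 m/s sound speed, the 0.05 m spacing, and the division by the speaker count are all hypothetical choices.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def steering_delays(num_speakers, spacing_m, angle_deg):
    """Per-speaker delays (seconds) steering a linear array toward angle_deg.

    angle_deg is measured from broadside (0 = straight out from the array).
    Delays are shifted so that the smallest one is zero.
    """
    sin_a = math.sin(math.radians(angle_deg))
    raw = [i * spacing_m * sin_a / SPEED_OF_SOUND for i in range(num_speakers)]
    base = min(raw)
    return [d - base for d in raw]


def speaker_feeds(signal, sample_rate, beams, num_speakers=16, spacing_m=0.05):
    """Build one feed per speaker by summing a delayed, scaled copy per beam.

    beams is a list of (angle_deg, gain) pairs, one per listener direction.
    Changing the gain of one pair changes only the beam for that direction.
    """
    max_delay = (num_speakers - 1) * spacing_m / SPEED_OF_SOUND
    pad = int(math.ceil(max_delay * sample_rate)) + 1
    feeds = [[0.0] * (len(signal) + pad) for _ in range(num_speakers)]
    for angle_deg, gain in beams:
        delays = steering_delays(num_speakers, spacing_m, angle_deg)
        for spk, delay in enumerate(delays):
            shift = int(round(delay * sample_rate))  # delay in whole samples
            for n, x in enumerate(signal):
                feeds[spk][n + shift] += gain * x / num_speakers
    return feeds
```

In this sketch a listener's volume adjustment maps to the gain of that listener's `(angle_deg, gain)` pair, so the other beams are left unchanged.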

(2) The voice communication device of the present invention is further characterized in that the sound emission characteristics are set by volume, sound quality, voice feature quantity, or any combination of these.

In this configuration, operating the volume, sound quality, and voice feature quantity as appropriate changes the sound emission characteristics, so that each participant (listener) is given sound that suits him or her.

(3) The voice communication device of the present invention is further characterized by comprising sound collection means that forms sound collection beams for a predetermined plurality of directions, identifies the speaker direction by comparing the intensities of the sound collection beams, and outputs the speaker direction together with a voice communication signal based on the sound collection beam for that direction.

In this configuration, when a speaker's voice is output to the network, the voice communication signal is transmitted in association with the speaker direction.
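The comparison of sound collection beam intensities in (3) can be sketched as follows. The text does not say how intensity is measured, so mean power over a frame is used here as an assumption, and the function name and the dictionary layout are hypothetical.

```python
def identify_speaker_direction(beam_signals):
    """Pick the direction whose sound collection beam has the highest mean power.

    beam_signals maps a direction label (for example "Dir24") to the list of
    samples of the beam formed toward that direction. Returns the pair
    (direction, samples), so that the direction travels with the signal.
    """
    def mean_power(samples):
        return sum(x * x for x in samples) / len(samples) if samples else 0.0

    direction = max(beam_signals, key=lambda d: mean_power(beam_signals[d]))
    return direction, beam_signals[direction]
```

Returning the direction together with the samples mirrors the association between speaker direction and voice communication signal described above.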

(4) The voice communication system of the present invention connects a plurality of the voice communication devices to a network and includes a network server that manages communication on the network. Each voice communication device sends the received adjustment contents of the sound emission characteristics to the network server. When a plurality of adjustment contents received from the voice communication devices concern the same speaker direction, show the same tendency, and number at least a predetermined number, the network server sets a sound collection correction characteristic for that speaker direction and supplies it to the corresponding voice communication device. The voice communication device supplied with the sound collection correction characteristic corrects the voice communication signal from that direction with the characteristic and outputs it.

In this configuration, when a plurality of listeners adjust the sound emission characteristics, the voice communication devices at the locations where those listeners are seated receive the adjustment operations. Each device sends the adjustment contents to the network server. The network server compares the received adjustment contents and determines whether the number of adjustments showing the same tendency (for example, all indicating an increase in volume) is at least a predetermined value. The predetermined value is set, for example, to a majority of the conference participants (speakers) currently connected to the network and taking part in the conference (voice communication). When the count is at least the predetermined value, the network server sends a sound collection correction characteristic, based on the group of adjustment contents set for the speaker direction in question, to the corresponding voice communication device. When the sound collection beam from that direction (the speaker direction) is selected and converted into a voice communication signal, the device that received the characteristic corrects the voice communication signal with it and outputs the result to the network. Thus, when at least the predetermined number of listeners adjust the sound from a specific speaker, the sound can be adjusted once, at the point where the speaker's voice is collected.
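The server-side decision can be sketched as below. Several points are assumptions rather than statements from the text: the tuple layout, keeping only the latest adjustment per listener, using a strict majority as the predetermined value, and taking the mean of the agreeing adjustments as the correction (the text only says the correction is based on the group of adjustment contents).

```python
from collections import defaultdict


def decide_pickup_correction(adjustments, num_participants):
    """Return {(device, direction): correction} for speakers needing correction.

    adjustments is a list of (listener, device, direction, volume_delta)
    tuples, where device and direction identify the speaker being adjusted.
    """
    threshold = num_participants // 2 + 1  # assumed: strict majority

    # Keep only the most recent adjustment of each listener for each speaker.
    latest = {}
    for listener, device, direction, delta in adjustments:
        latest[(listener, device, direction)] = delta

    # Group adjustments by target speaker and by tendency (raise or lower).
    groups = defaultdict(list)
    for (_, device, direction), delta in latest.items():
        if delta != 0:
            trend = "up" if delta > 0 else "down"
            groups[(device, direction, trend)].append(delta)

    corrections = {}
    for (device, direction, _), deltas in groups.items():
        if len(deltas) >= threshold:
            corrections[(device, direction)] = sum(deltas) / len(deltas)
    return corrections
```

Because the threshold is a strict majority and each listener counts once, at most one tendency per speaker direction can reach it.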

(5) In the voice communication system of the present invention, the network server further supplies the sound collection correction characteristic to all voice communication devices connected to the network, and the sound emission control means of each voice communication device adjusts the sound emission beam based on the difference between the supplied sound collection correction characteristic and the received sound emission characteristic.

In this configuration, once the correction at sound collection described above is applied, the voice communication signal from the specific speaker reaches each of the other voice communication devices already corrected by the sound collection correction characteristic. The network server supplies the same sound collection correction characteristic to every device. Each device takes the difference between this characteristic and the adjustment contents entered on the device itself, and forms a sound emission beam based on that difference toward the direction for which the adjustment was specified. As a result, a listener who adjusted the sound emission characteristics does not receive a beam in which the sound collection correction and the listener's own adjustment are stacked; the beam reflects the adjustment the listener originally requested.

(6) In the voice communication system of the present invention, the sound emission control means of each voice communication device further applies, to sound emission beams for directions in which no adjustment operation has been performed, an adjustment that cancels the sound collection correction characteristic.

In this configuration, each voice communication device other than the one where the speaker in question is seated generates, on obtaining the sound collection correction characteristic, a characteristic that cancels it. When a voice communication signal corrected by the sound collection correction characteristic arrives, the device applies the cancelling characteristic to the sound emission beams for directions in which no adjustment operation has been performed. A listener who made no adjustment thus receives a sound emission beam based on the speaker's voice as it was before the sound collection correction.
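The difference in (5) and the cancellation in (6) can be combined into one per-direction rule on the listening side. The sketch below assumes that adjustments and corrections are additive volume steps on a common scale, which the text suggests but does not state; the function and parameter names are hypothetical.

```python
def emission_control(local_adjustments, pickup_correction, directions):
    """Return the net local volume change to apply to each output direction.

    local_adjustments: {direction: step} entered by listeners on this device.
    pickup_correction: step already applied at the speaker's device, or None
                       when the server has issued no correction.
    directions:        every output direction served by this device.
    """
    control = {}
    for d in directions:
        requested = local_adjustments.get(d)
        if pickup_correction is None:
            # No correction at the source: apply the listener's request as is.
            control[d] = requested if requested is not None else 0
        elif requested is not None:
            # (5): apply only what the source correction has not yet covered.
            control[d] = requested - pickup_correction
        else:
            # (6): undo the source correction for listeners who asked nothing.
            control[d] = -pickup_correction
    return control
```

For example, with local requests of +3 for Dir11 and +1 for Dir18 and a source correction of +2, the local changes are +1, -1, and (for an unadjusted Dir12) -2, so each listener hears the level he or she asked for.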

According to the present invention, there is no need to assign one voice communication device to each speaker (listener). Even when several speakers (listeners) share one device, the speaker's voice can be emitted with the sound each listener prefers.

Further, according to the present invention, when at least a predetermined number of listeners adjust the sound from a specific speaker, correcting that sound once at collection adjusts it for everyone at the same time. This also gives easier-to-hear sound to people who find the speaker hard to hear but do not know how to make the adjustment, or who choose not to make it.

Further, according to the present invention, the voice of the specific speaker can be emitted unchanged to listeners who have not adjusted the sound. If those who have not adjusted are assumed to hear without difficulty, corrected sound goes to those who want an adjustment and the original sound goes to those who do not.

The following embodiment describes an audio conference system, with reference to the drawings, as a concrete example of the voice communication system.

FIG. 1 is a configuration diagram of the audio conference system of the present embodiment.
FIG. 2(A) shows the configuration of point a in the audio conference system of FIG. 1, and FIG. 2(B) is a plan view of the remote controller 120 (120A to 120G) shown in FIG. 2(A).
FIG. 3 shows both side views and a bottom view of the audio conference device 111 (111A to 111C) of the present embodiment.
FIG. 4 is a block diagram showing the main configuration of the audio conference device shown in FIG. 3.
FIG. 5 is a simplified diagram explaining the processing of the main control unit 10 during sound emission and sound collection, for the case where the main control unit 10A of audio conference device 111A performs sound emission control and the main control unit 10B of audio conference device 111B performs sound collection control.
FIG. 6 is a block diagram showing the main configuration of the network server 101 of the present embodiment.

The audio conference system of the present embodiment includes audio conference devices 111A to 111C connected to a network 100, and a network server 101.

The audio conference devices 111A to 111C are located at separate points a to c: device 111A at point a, device 111B at point b, and device 111C at point c.

At point a, seven conference participants A to G are seated around audio conference device 111A, in directions Dir11 to Dir16 and Dir18 relative to the device. At point b, five participants H to L are seated around audio conference device 111B, in directions Dir21, Dir22, Dir24, Dir26, and Dir28 relative to the device. At point c, participants M, N, P, and Q are seated around audio conference device 111C, in directions Dir31, Dir34, Dir36, and Dir38 relative to the device.

Each participant sits around the audio conference device and has a remote controller 120 for sound emission adjustment at hand. At point a, for example, participants A to G are seated around audio conference device 111A as shown in FIG. 2, and each holds one of the remote controllers 120A to 120G.

As shown in FIG. 2(B), for example, the remote controller 120 includes a display unit 121, a selection button 122, an execution button 123, an adjustment key 124, and a remote control signal transmission unit 125. The display unit 121 shows the current settings for "volume", "sound quality", and "voice quality". "Sound quality" is further divided into "HI" (high range), "MID" (mid range), and "LOW" (low range).

A participant selects the sound emission characteristic to adjust ("volume", "sound quality", or "voice quality") with the selection button 122 and sets the desired amount or voice quality with the adjustment key 124. "Volume" and "sound quality" are set as values relative to the current value, such as "+1" or "-3". "Voice quality" is set by mode, for example a mode that leaves the voice as it is or a mode that applies the formants of a particular person such as an announcer. When the participant confirms the adjustment with the execution button 123, the remote control signal transmission unit 125 sends a remote control signal, by infrared or similar, to the remote control transmission/reception unit 20 of the audio conference device 111. Based on this signal, the audio conference devices 111A to 111C set the emitted sound for each participant according to either "individual processing in each audio conference device" or "batch processing by the network server", both described later.

As shown in FIG. 3, the audio conference device 111 of the present embodiment mechanically comprises a housing 112, legs 113, and an operation unit 114.
The housing 112 is a substantially rectangular parallelepiped elongated in one direction. Legs 113 of a predetermined height, which hold the lower surface of the housing 112 a predetermined distance above the installation surface, are attached at both ends of the long sides (faces) of the housing 112. In the following description, of the four side faces of the housing 112, the longer ones are called the long faces and the shorter ones the short faces.

An operation unit 114 consisting of a plurality of buttons and a display screen is installed at one longitudinal end of the upper surface of the housing 112. The operation unit 114 is connected to the main control unit 10 inside the housing 112. It receives operation inputs from participants and passes them to the main control unit 10, and it shows the operation contents, execution mode, and so on, on the display screen.

Although not shown, various input/output interface terminals, such as a network connection terminal, are provided on the short face of the housing 112 on the side where the operation unit 114 is installed.

Speakers SP1 to SP16, all of the same shape, are installed on the lower surface of the housing 112 in a straight line at regular intervals along the longitudinal direction, forming a speaker array. Microphones MIC101 to MIC116, all of the same shape, are installed on one long face of the housing 112 in a straight line at regular intervals along the longitudinal direction, forming a microphone array. Microphones MIC201 to MIC216, also of the same shape, are installed in the same way on the other long face, forming a second microphone array. A punched-mesh lower grille (not shown), shaped to cover the speaker array and the microphone arrays, is installed on the lower side of the housing 112. In this embodiment the speaker array has 16 speakers and each microphone array has 16 microphones, but the numbers are not limited to these and may be set according to the specification.

Functionally, as shown in FIG. 4, each of the audio conference devices 111A to 111C includes a main control unit 10, a communication control unit 11, a sound emission control unit 12, D/A converters 13, sound emission amplifiers (AMP) 14, sound collection amplifiers (AMP) 15, A/D converters 16, a sound collection control unit 17, an echo cancellation unit 18, an audio signal correction unit 19, a remote control transmission/reception unit 20, the operation unit 114, the speakers SP1 to SP16, and the microphones MIC101 to MIC116 and MIC201 to MIC216.

The main control unit 10 performs overall control of the audio conference device, including control of power on/off and similar inputs from the operation unit 114 and various controls of the signal processing system.

For clarity, the following description assumes that audio conference device 111A performs sound emission control and audio conference device 111B performs sound collection control. In FIG. 5, when the main control unit 10A on the sound emission side receives sound emission adjustment data Dcd from a remote controller 120 via the remote control transmission/reception unit 20A, it sends the data Dcd to the network server 101 via the communication control unit 11A and the network 100.

At the same time, the main control unit 10B on the sound collection side sends the speaker direction data Ddw supplied by the sound collection control unit 17B to the network server 101 via the communication control unit 11B and the network 100. When the network server 101 determines, by the method described later, that sound collection correction is necessary, it sends sound collection correction data DsB for audio conference device 111B to both audio conference devices 111A and 111B, and the main control units 10B and 10A receive this data.

When the main control unit 10B on the sound collection side detects that the sound collection correction data DsB is addressed to its own device and that the current speaker direction data Ddw is the target of the correction, it supplies the audio signal correction unit 19B with sound collection control data Dr (= sound collection correction data DsB + speaker direction data Ddw) for correcting the sound collection beam signal. The audio signal correction unit 19B outputs a voice communication signal obtained by correcting the sound collection beam signal with the sound collection correction data DsB.
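The two checks made on the sound collection side (is the correction addressed to this device, and is the current speaker direction the corrected one) can be sketched as a simple gate in front of the correction. The dictionary layout, the field names, and the use of a gain in decibels are assumptions for illustration; the text does not define the format of DsB.

```python
def apply_pickup_correction(samples, own_device, current_direction, correction):
    """Correct the sound collection beam signal only when the correction applies.

    correction is None or a dict such as
    {"device": "111B", "direction": "Dir24", "gain_db": 6.0} (hypothetical).
    Returns a new list; the input samples are never modified.
    """
    applies = (
        correction is not None
        and correction["device"] == own_device
        and correction["direction"] == current_direction
    )
    if not applies:
        return list(samples)
    gain = 10 ** (correction["gain_db"] / 20)  # decibels to linear amplitude
    return [x * gain for x in samples]
```

When another participant at the same device starts speaking, the direction no longer matches and the signal passes through uncorrected.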

The main control unit 10A on the sound emission side, on the other hand, detects that the sound collection correction data DsB is not addressed to its own device and that the speaker direction data Ddw of the voice communication signal being received is the target of the correction. It then generates sound emission control data DcA (= Dcd − DsB) by subtracting the sound collection correction data DsB for the other device from the sound emission adjustment data Dcd received on its own device, and supplies DcA to the sound emission control unit 12A. The sound emission control unit 12A controls emission of the input voice signal received by the communication control unit 11A based on DcA.

If the main control unit 10A on the sound emission side has not received sound collection correction data DsB from the network server 101, it generates sound emission control data DcA' (= Dcd), determined solely by the sound emission adjustment data Dcd received on its own device, and supplies it to the sound emission control unit 12A.

The communication control unit 11 is connected to the network 100. It converts audio files received from other devices via the network 100 from network-format data into ordinary audio signals and outputs them to the sound emission control unit 12 via the echo cancellation unit 18. In doing so, it identifies the source audio conference device from the device data and speaker direction data attached to each received audio file, and outputs a separate audio signal for each source device. In audio conference device 111A of the present embodiment, for example, it outputs audio signal S1 from audio conference device 111B and audio signal S2 from audio conference device 111C to the sound emission control unit 12.

また、通信制御部１１は、リモコン１２０により入力された放音調整データがメイン制御部１０から与えられると、当該放音調整データをネットワークサーバ１０１に送信する。 Further, when the sound emission adjustment data input from the remote controller 120 is given from the main control unit 10, the communication control unit 11 transmits the sound emission adjustment data to the network server 101.

また、通信制御部１１は、音声信号補正部１９からの音声通信信号に対して、メイン制御部１０からの話者方位データと、装置の認識データとなる装置データとを添付して、ネットワーク通信形式に変換し、ネットワーク１００に送信する。なお、装置データとは、ネットワーク１００に接続する各音声会議装置に対して設定される個体識別ＩＤのようなものであり、各音声会議装置がネットワーク１００に接続した時点に自動的に割り当てられる。 In addition, the communication control unit 11 attaches the speaker orientation data from the main control unit 10 and the device data, which serves to identify the device, to the voice communication signal from the audio signal correction unit 19, converts the result into the network communication format, and transmits it to the network 100. The device data is something like an individual identification ID set for each audio conference apparatus connected to the network 100, and is assigned automatically when each audio conference apparatus connects to the network 100.

放音制御部１２は、メイン制御部１０からの放音制御データに基づいて、入力された音声信号Ｓ１または音声信号Ｓ２に対して遅延処理や振幅処理等を行って、音声会議装置の周りに在席する各会議者へ個別の特性で放音ビームを形成するように、各スピーカＳＰ１〜ＳＰ１６に対応する放音信号を生成する。 Based on the sound emission control data from the main control unit 10, the sound emission control unit 12 applies delay processing, amplitude processing, and the like to the input audio signal S1 or S2, and generates a sound emission signal for each of the speakers SP1 to SP16 so that a sound emission beam with individual characteristics is formed toward each conference participant seated around the audio conference apparatus.

各Ｄ／Ａコンバータ１３は、入力された放音信号をディジタル−アナログ変換して、各放音アンプ１４に与え、各放音アンプ１４はアナログ化された放音信号を増幅して、各スピーカＳＰ１〜ＳＰ１６に与える。各スピーカＳＰ１〜ＳＰ１６は、入力された電気的な音声信号を音声に変換して放音する。 Each D/A converter 13 performs digital-to-analog conversion on the input sound emission signal and supplies it to the corresponding sound emission amplifier 14. Each sound emission amplifier 14 amplifies the analog sound emission signal and supplies it to the corresponding speaker SP1 to SP16. Each of the speakers SP1 to SP16 converts the input electrical audio signal into sound and emits it.

この際、前述の放音制御を行っていることで、各会議者へ同時に且つ個別に、自装置で受け付けた放音調整データや、他装置に対して設定された収音補正データに対応する放音音声を提供することができる。すなわち、各会議者に対して、それぞれに適切な音量、音質や声質で音声を放音することができる。 At this time, because the sound emission control described above is performed, each conference participant can be provided, simultaneously and individually, with emitted sound that reflects the sound emission adjustment data received by the own device and the sound collection correction data set for another device. That is, sound can be emitted to each conference participant with the volume, sound quality, and voice quality appropriate for that participant.

マイクＭＩＣ１０１〜ＭＩＣ１１６、ＭＩＣ２０１〜ＭＩＣ２１６は、自装置の周囲に在席する話者からの発声音を含む周囲の音を収音して電気的な収音信号に変換し、収音アンプ１５に与える。収音アンプ１５は収音信号を増幅してＡ／Ｄコンバータ１６に与え、Ａ／Ｄコンバータ１６は、アナログ形式の収音信号をディジタル変換して、収音制御部１７に出力する。 The microphones MIC101 to MIC116 and MIC201 to MIC216 collect ambient sounds including utterances from speakers present around the device, convert them into electrical sound collection signals, and provide them to the sound collection amplifier 15 . The sound collection amplifier 15 amplifies the sound collection signal and applies it to the A / D converter 16, and the A / D converter 16 converts the analog sound collection signal into a digital signal and outputs it to the sound collection control unit 17.

収音制御部１７は、各マイクＭＩＣ１０１〜ＭＩＣ１１６，ＭＩＣ２０１〜ＭＩＣ２１６の収音信号に対して遅延処理等を行い、各会議者の方位を含む所定方位に強い指向性を有する収音ビーム信号を生成する。例えば、図１の音声会議装置１１１Ａであれば、会議者Ａの方位に対応する収音方位Ｄｉｒ１１、会議者Ｂの方位に対応する収音方位Ｄｉｒ１２、会議者Ｃの方位に対応する収音方位Ｄｉｒ１３、会議者Ｄの方位に対応する収音方位Ｄｉｒ１４、会議者Ｅの方位に対応する収音方位Ｄｉｒ１５、会議者Ｆの方位に対応する収音方位Ｄｉｒ１６、会議者Ｇの方位に対応する収音方位Ｄｉｒ１８を含む、所定の収音方位Ｄｉｒ１１〜Ｄｉｒ１８のそれぞれに強い指向性を有する収音ビーム信号を生成する。収音制御部１７は、生成した各方位の収音ビーム信号の振幅を比較し、最も振幅の大きい収音ビーム信号を選択して、エコーキャンセル部１８に出力する。また、収音制御部１７は、選択した収音ビーム信号に対応する収音方位Ｄｉｒを抽出して、前記話者方位データとしてメイン制御部１０に与える。 The sound collection control unit 17 applies delay processing and the like to the sound collection signals of the microphones MIC101 to MIC116 and MIC201 to MIC216, and generates sound collection beam signals that each have strong directivity in a predetermined direction, including the direction of each conference participant. For example, the audio conference apparatus 111A of FIG. 1 generates sound collection beam signals having strong directivity in each of the predetermined sound collection directions Dir11 to Dir18, which include the sound collection direction Dir11 corresponding to the direction of participant A, Dir12 for participant B, Dir13 for participant C, Dir14 for participant D, Dir15 for participant E, Dir16 for participant F, and Dir18 for participant G. The sound collection control unit 17 compares the amplitudes of the generated sound collection beam signals for the respective directions, selects the beam signal with the largest amplitude, and outputs it to the echo cancellation unit 18. Further, the sound collection control unit 17 extracts the sound collection direction Dir corresponding to the selected sound collection beam signal and supplies it to the main control unit 10 as the speaker orientation data.
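The beam selection step can be illustrated with a minimal sketch. The description only says that amplitudes are compared; using the RMS of a short frame as the amplitude measure, and the names `rms` and `select_beam`, are assumptions made here for illustration.

```python
import math


def rms(samples):
    """Root-mean-square level of one frame (assumed amplitude measure)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def select_beam(beams):
    """Pick the sound collection beam with the largest amplitude.

    beams maps a direction label (e.g. "Dir11") to a frame of samples.
    Returns (direction, frame); the direction is what would be reported
    to the main control unit as speaker orientation data.
    """
    direction = max(beams, key=lambda d: rms(beams[d]))
    return direction, beams[direction]


frames = {
    "Dir11": [0.01, -0.02, 0.01],
    "Dir14": [0.40, -0.55, 0.48],   # the active talker
    "Dir18": [0.05, 0.03, -0.04],
}
print(select_beam(frames)[0])  # Dir14
```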

エコーキャンセル部１８は、二つのエコーキャンセラ１８１，１８２からなり、各エコーキャンセラ１８１，１８２はそれぞれ適応型フィルタとポストプロセッサとを備える。エコーキャンセラ１８１は、適応型フィルタで音声信号Ｓ１に基づく擬似回帰音信号を生成して、ポストプロセッサで収音制御部１７から出力された収音ビーム信号から、音声信号Ｓ１の擬似回帰音信号を減算して、エコーキャンセラ１８２のポストプロセッサに出力する。エコーキャンセラ１８２は、適応型フィルタで音声信号Ｓ２に基づく擬似回帰音信号を生成して、エコーキャンセラ１８１のポストプロセッサで減算された収音ビーム信号から、音声信号Ｓ２の擬似回帰音信号を減算して、音声信号補正部１９に出力する。これにより、スピーカＳＰからマイクＭＩＣへの回り込み音を抑圧する。 The echo cancellation unit 18 consists of two echo cancellers 181 and 182, each of which includes an adaptive filter and a post processor. The echo canceller 181 generates a pseudo echo signal based on the audio signal S1 with its adaptive filter, and its post processor subtracts this pseudo echo signal from the sound collection beam signal output by the sound collection control unit 17 and passes the result to the post processor of the echo canceller 182. The echo canceller 182 generates a pseudo echo signal based on the audio signal S2 with its adaptive filter, subtracts it from the beam signal already processed by the echo canceller 181, and outputs the result to the audio signal correction unit 19. This suppresses the sound that wraps around from the speakers SP to the microphones MIC.
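A rough sketch of the two-stage cascade is given below. The description does not name the adaptation algorithm; normalized LMS (NLMS) is swapped in here as a common choice, and the tap count, step size, and class and function names are assumptions.

```python
class NlmsCanceller:
    """One adaptive filter plus subtraction stage (NLMS is an assumed algorithm)."""

    def __init__(self, taps=4, mu=0.1, eps=1e-9):
        self.w = [0.0] * taps        # adaptive filter coefficients
        self.hist = [0.0] * taps     # most recent reference samples
        self.mu = mu
        self.eps = eps

    def process(self, reference, signal):
        # Shift the newest reference sample (S1 or S2) into the history.
        self.hist = [reference] + self.hist[:-1]
        pseudo_echo = sum(w * x for w, x in zip(self.w, self.hist))
        residual = signal - pseudo_echo          # subtraction stage
        power = sum(x * x for x in self.hist) + self.eps
        step = self.mu * residual / power
        self.w = [w + step * x for w, x in zip(self.w, self.hist)]
        return residual


def cancel_cascade(s1, s2, beam):
    """Remove the echo of S1 first, then the echo of S2, sample by sample."""
    ec181, ec182 = NlmsCanceller(), NlmsCanceller()
    out = []
    for a, b, m in zip(s1, s2, beam):
        after_s1 = ec181.process(a, m)
        out.append(ec182.process(b, after_s1))
    return out
```

Running the two stages in series means each canceller only has to model the echo path of one far-end signal.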

音声信号補正部１９は、メイン制御部１０からの収音制御データに基づいて、指定された特定話者に対応するエコーキャンセル後の収音ビーム信号に、振幅処理、イコライジング、さらには必要に応じて声質変換処理等を行うことで音声通信信号を生成する。音声信号補正部１９は、この音声通信信号を通信制御部１１に出力する。なお、前述のように、ネットワークサーバ１０１から与えられる収音補正データが自装置を対象とするものではなく、ネットワーク１００に接続する他装置を対象とするものであれば、メイン制御部１０は収音制御データを生成せず、音声信号補正部１９に与えない。したがって、音声信号補正部１９は、入力された収音ビーム信号をそのまま音声通信信号として出力する。このような構成を用いることで、他の各音声会議装置で個別に放音調整せずに、収音側の音声会議装置で音声を補正して送信することができる。すなわち、収音側の音声会議装置で、他の各音声会議装置の会議者に対して一括して音の補正（調整）を行うことができる。 Based on the sound collection control data from the main control unit 10, the audio signal correction unit 19 generates a voice communication signal by applying amplitude processing, equalizing, and, where necessary, voice quality conversion to the echo-cancelled sound collection beam signal corresponding to the designated specific speaker. The audio signal correction unit 19 outputs this voice communication signal to the communication control unit 11. As described above, if the sound collection correction data supplied from the network server 101 targets not the own device but another device connected to the network 100, the main control unit 10 does not generate sound collection control data and supplies nothing to the audio signal correction unit 19. In that case, the audio signal correction unit 19 outputs the input sound collection beam signal unchanged as the voice communication signal. With this configuration, the audio conference apparatus on the sound collection side can correct the sound before transmission, without each of the other audio conference apparatuses adjusting the sound emission individually. That is, the audio conference apparatus on the sound collection side can correct (adjust) the sound collectively for the conference participants at all the other audio conference apparatuses.

ネットワークサーバ１０１は、ネットワーク制御部１０２と会議情報記憶部１０３とを備える。 The network server 101 includes a network control unit 102 and a conference information storage unit 103.

ネットワーク制御部１０２はネットワーク１００全体の制御を行う。会議情報記憶部１０３は、現在会議に参加している会議者数、放音調整データに基づく調整内容ＤＢ、および、収音補正データの生成履歴である補正履歴等を記憶する。ネットワーク制御部１０２は、会議情報記憶部１０３に記憶された各情報に基づき、特定話者に対する放音調整の数が所定閾値以上であれば、収音補正データを生成して、各音声会議装置に送信する。この際、収音補正データには、収音補正対象となる音声会議装置を示す装置データと対象の話者方位データとが添付される。 The network control unit 102 controls the entire network 100. The conference information storage unit 103 stores the number of participants currently taking part in the conference, an adjustment content DB based on the sound emission adjustment data, a correction history that records the generation of sound collection correction data, and the like. Based on the information stored in the conference information storage unit 103, the network control unit 102 generates sound collection correction data and transmits it to each audio conference apparatus when the number of sound emission adjustments for a specific speaker is equal to or greater than a predetermined threshold. The device data indicating the audio conference apparatus to be corrected and the speaker orientation data of the target speaker are attached to the sound collection correction data.

図７はネットワークサーバ１０１の収音補正設定フローを示すフローチャートである。 FIG. 7 is a flowchart showing a sound collection correction setting flow of the network server 101.

ネットワーク制御部１０２は、ネットワーク１００を介して各音声会議装置から放音調整データを順次受信する（Ｓ２０１）。また、同時に、ネットワーク制御部１０２は、それぞれの放音調整データに対応する話者方位データ（装置データを含む）を検出する（Ｓ２０２）。ここで、話者方位データとは、送信元の音声会議装置から送信される音声ファイルに添付された特定話者を指定する方位データであり、放音調整データを取得した時点で、ネットワーク１００にて送受信される音声ファイルから取得する。 The network control unit 102 sequentially receives sound emission adjustment data from each audio conference apparatus via the network 100 (S201). At the same time, the network control unit 102 detects the speaker orientation data (including device data) corresponding to each item of sound emission adjustment data (S202). Here, the speaker orientation data is orientation data that designates a specific speaker and is attached to the audio file transmitted from the source audio conference apparatus. It is obtained from the audio file being exchanged over the network 100 at the moment the sound emission adjustment data is acquired.

ネットワーク制御部１０２は、各放音調整データを解析して、放音調整内容を取得して、話者方位データに関連付けして調整内容ＤＢに記憶する（Ｓ２０３）。ここで、放音調整内容とは、発信元方位データ、音量設定量、高音（ＨＩ）音質設定量、中音（ＭＩＤ）音質設定量、低音（ＬＯＷ）音質設定量、声質設定内容で表され、音量設定量と各音質設定量は、現在値に対する大小により設定される。なお、発信元方位データとは、放音調整データが発信された聴者の方位を特定する方位データであり、各音声会議装置からの放音調整データに関連付けして送信されるものである。 The network control unit 102 analyzes each item of sound emission adjustment data, obtains the sound emission adjustment content, and stores it in the adjustment content DB in association with the speaker orientation data (S203). Here, the sound emission adjustment content consists of source orientation data, a volume setting amount, a high-range (HI) sound quality setting amount, a mid-range (MID) sound quality setting amount, a low-range (LOW) sound quality setting amount, and a voice quality setting. The volume setting amount and each sound quality setting amount are expressed as an increase or decrease relative to the current value. The source orientation data is orientation data that identifies the direction of the listener who issued the sound emission adjustment data, and is transmitted in association with the sound emission adjustment data from each audio conference apparatus.

ネットワーク制御部１０２は、話者方位データ毎に発信元方位データ数をカウントして、同じ話者方位データに対する発信元方位データが所定閾値以上であることを検出すると（Ｓ２０４）、該当する話者方位データに対応する方位からの音声を収音時に補正する収音補正データを生成する（Ｓ２０５）。この収音補正データは、装置データを含む補正対象方位データ、「音量」、「音質」、「声質」を備え、「音量」と「音質」とは、放音調整データと同様に現在値に対する相対値で設定される。なお、本説明では特定の話者方位データに対する発信元方位データ数が所定閾値以上になる場合に収音補正データを生成する例を示したが、予め記憶している会議者数に基づき、発信元方位データ数が会議者数の過半数に達した場合に収音補正データを生成するようにしてもよい。なお、ネットワーク制御部１０２は、収音補正データを生成すると、会議情報記憶部１０３に記録する。 The network control unit 102 counts the number of source orientation data items for each speaker orientation data item. When it detects that the number of source orientation data items for the same speaker orientation data is equal to or greater than a predetermined threshold (S204), it generates sound collection correction data for correcting, at the time of sound collection, the sound from the direction corresponding to that speaker orientation data (S205). This sound collection correction data comprises correction target orientation data including device data, "volume", "sound quality", and "voice quality". As with the sound emission adjustment data, "volume" and "sound quality" are set as values relative to the current value. Although this description shows an example in which sound collection correction data is generated when the number of source orientation data items for a specific speaker orientation data item reaches a predetermined threshold, the data may instead be generated when that number reaches a majority of the number of conference participants stored in advance. When the network control unit 102 generates sound collection correction data, it records the data in the conference information storage unit 103.
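The counting in S203 to S205 might look like the sketch below. The class and function names are hypothetical, the `(device, direction)` tuples are an assumed encoding of orientation data with device data, and counting each listener only once (by using a set) is an assumption that the description does not state.

```python
from collections import defaultdict


class AdjustmentCounter:
    """Counts, per speaker orientation, how many listeners requested an adjustment."""

    def __init__(self, threshold):
        self.threshold = threshold
        # speaker (device, direction) -> set of listener (device, direction)
        self.listeners = defaultdict(set)

    def record(self, speaker, listener):
        """Store one adjustment (S203) and report whether S204 is satisfied."""
        self.listeners[speaker].add(listener)
        return len(self.listeners[speaker]) >= self.threshold


def majority_threshold(participants):
    """Smallest count that is more than half of the participants (alternative rule)."""
    return participants // 2 + 1


counter = AdjustmentCounter(threshold=3)
speaker_j = ("111B", "Dir24")
print(counter.record(speaker_j, ("111A", "Dir11")))  # False
print(counter.record(speaker_j, ("111A", "Dir18")))  # False
print(counter.record(speaker_j, ("111C", "Dir31")))  # True
```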

ネットワーク制御部１０２は、ネットワーク１００を介して、収音補正データを各音声会議装置１１１Ａ〜１１１Ｃに送信する（Ｓ２０６）。 The network control unit 102 transmits the sound collection correction data to each of the audio conference apparatuses 111A to 111C via the network 100 (S206).

次に、音声会議装置における放音調整および収音補正のより具体的な方法について図を参照して説明する。 Next, a more specific method of sound emission adjustment and sound collection correction in the audio conference apparatus will be described with reference to the drawings.
図８は音声会議装置の放収音処理を示すフローチャートである。 FIG. 8 is a flowchart showing the sound emission and collection processing of the audio conference apparatus.
各音声会議装置１１１は、通信制御部１１での音声ファイルの受信状況、および、収音制御部１７での収音状況に基づいて、自装置が収音状態、放音状態、待受状態のいずれの状態であるかを判断する（Ｓ１）。ここで、放音状態であれば以下に示す放音処理を行い、収音状態であれば以下に示す収音処理を行い、待受状態であれば放音状態または収音状態になるまで状態検出を繰り返す。 Each audio conference apparatus 111 determines whether it is in the sound collection state, the sound emission state, or the standby state, based on the reception status of audio files in the communication control unit 11 and the sound collection status in the sound collection control unit 17 (S1). In the sound emission state it performs the sound emission processing described below, in the sound collection state it performs the sound collection processing described below, and in the standby state it repeats the state detection until it enters the sound emission state or the sound collection state.

このような放音、収音、待受処理の状態で、ネットワークサーバ１０１から収音補正データを受信したり、会議者（リモコン）から放音制御の操作入力が行われると、音声会議装置は、図９に示す割込処理を実行する。 When sound collection correction data is received from the network server 101, or a sound emission control operation is input by a conference participant (remote controller), while the apparatus is in the sound emission, sound collection, or standby state, the audio conference apparatus executes the interrupt processing shown in FIG. 9.
図９は音声会議装置の放音調整変更、収音補正変更の割込処理を示すフローチャートである。 FIG. 9 is a flowchart showing the interrupt processing for changing the sound emission adjustment and the sound collection correction of the audio conference apparatus.
音声会議装置１１１は、電源ＯＮ状態であれば、放音、収音、待受のいずれの状態であっても、随時ネットワークサーバ１０１およびリモコン１２０からの割り込み処理を受け付けられる状態で動作する。そして、音声会議装置１１１は割込を検出すると（Ｓ１０１）、当該割込処理の種別を判別する（Ｓ１０２）。 As long as the power is on, the audio conference apparatus 111 operates so that it can accept interrupts from the network server 101 and the remote controller 120 at any time, whether it is in the sound emission, sound collection, or standby state. When the audio conference apparatus 111 detects an interrupt (S101), it determines the type of the interrupt (S102).

具体的には、リモコン１２０からのリモコン通信信号を検出すると、音声会議装置１１１はユーザ割込であることを検出する。そして、音声会議装置１１１は、リモコン１２０により設定された放音調整内容を受け付ける（Ｓ１０３）。この際、音声会議装置１１１は、装置周囲に配置されたいずれのリモコン１２０からのリモコン通信信号であるかを同時に検出する。 Specifically, when a remote control communication signal from a remote controller 120 is detected, the audio conference apparatus 111 recognizes it as a user interrupt. The audio conference apparatus 111 then accepts the sound emission adjustment content set with the remote controller 120 (S103). At the same time, the audio conference apparatus 111 detects which of the remote controllers 120 placed around the apparatus sent the remote control communication signal.

音声会議装置１１１は、放音を行う各方位（会議者方位）に対してそれぞれ放音調整フラグを備えている。音声会議装置１１１は、送信元のリモコン１２０に対応する方位に対して、放音調整フラグをＯＮ状態にする（Ｓ１０４）。 The audio conference apparatus 111 includes a sound emission adjustment flag for each direction (conference person's direction) that emits sound. The audio conference apparatus 111 sets the sound emission adjustment flag to the ON state for the direction corresponding to the remote controller 120 that is the transmission source (S104).

そして、音声会議装置１１１は、受け付けた放音調整内容から放音調整データを生成して記憶し（Ｓ１０５）、放音調整データと発信元の方位データとを関連付けして、通信制御部１１を介してネットワークサーバ１０１に送信する（Ｓ１０６）。 Then, the audio conference apparatus 111 generates and stores sound emission adjustment data from the accepted sound emission adjustment content (S105), associates the sound emission adjustment data with the orientation data of its source, and transmits them to the network server 101 via the communication control unit 11 (S106).

一方、通信制御部１１にてネットワークサーバ１０１からの収音補正データを検出すると、音声会議装置１１１はサーバ割込であることを検出し、受信した収音補正データを受け付ける（Ｓ１０７）。音声会議装置１１１は、収音補正データを解析して、装置データから自装置を対象とする収音補正データであるかどうかを検出する（Ｓ１０８）。 On the other hand, when the communication control unit 11 detects sound collection correction data from the network server 101, the audio conference apparatus 111 recognizes it as a server interrupt and accepts the received sound collection correction data (S107). The audio conference apparatus 111 analyzes the sound collection correction data and determines from the device data whether the data targets its own device (S108).

音声会議装置１１１は、自装置を対象とする収音補正データであれば、収音補正データから話者方位データを取得する。音声会議装置１１１は、各方位に対してそれぞれ収音補正フラグを備えており、取得した話者方位データに対応する方位に対して収音補正フラグをＯＮ状態にする（Ｓ１０９）。そして、音声会議装置１１１は収音補正データを記憶する（Ｓ１１０）。 If the sound collection correction data targets its own device, the audio conference apparatus 111 obtains the speaker orientation data from the sound collection correction data. The audio conference apparatus 111 has a sound collection correction flag for each direction, and sets the flag to ON for the direction corresponding to the obtained speaker orientation data (S109). The audio conference apparatus 111 then stores the sound collection correction data (S110).

このように、音声会議装置１１１は、放音時には放音調整内容に基づいて放音調整フラグを設定し、収音時には自装置が補正対象であれば収音補正フラグを設定する。 In this way, the audio conference apparatus 111 sets the sound emission adjustment flag based on the sound emission adjustment content when sound is output, and sets the sound collection correction flag when the own apparatus is a correction target during sound collection.
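The interrupt handling of FIG. 9 (S101 to S110) can be sketched as a small dispatcher. All names here (`ConferenceDevice`, `on_interrupt`, the payload keys) are hypothetical. Storing correction data that targets another device is also an assumption, made because the sound emission side later refers to such data.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConferenceDevice:
    device_id: str
    emission_flags: dict = field(default_factory=dict)    # listener direction -> ON
    collection_flags: dict = field(default_factory=dict)  # speaker direction -> ON
    adjustments: dict = field(default_factory=dict)       # listener direction -> data
    correction: Optional[dict] = None
    sent_to_server: list = field(default_factory=list)

    def on_interrupt(self, kind, payload):
        """S101/S102: dispatch on the interrupt type."""
        if kind == "user":
            # S103-S106: accept the adjustment, flag the listener direction,
            # store the data and forward it with its source direction.
            direction = payload["direction"]
            self.emission_flags[direction] = True
            self.adjustments[direction] = payload["adjustment"]
            self.sent_to_server.append((direction, payload["adjustment"]))
        elif kind == "server":
            # S107-S110: flag the speaker direction only if this device is the target.
            if payload["device_id"] == self.device_id:
                self.collection_flags[payload["direction"]] = True
            self.correction = payload   # assumed to be stored in both cases
        else:
            raise ValueError(f"unknown interrupt type: {kind}")


dev_b = ConferenceDevice("111B")
dev_b.on_interrupt("server", {"device_id": "111B", "direction": "Dir24", "volume": 4})
print(dev_b.collection_flags)  # {'Dir24': True}
```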

図８に示すフローに戻り、自装置が放音状態であることを検出すると、音声会議装置１１１のメイン制御部１０は、ネットワークサーバ１０１から収音補正データを取得しているかどうかを検出する（Ｓ２）。メイン制御部１０は、収音補正データを取得して記憶していれば、放音を行う各方位に対して放音調整データを受け付けているかどうかを検出する（Ｓ４）。メイン制御部１０は、放音調整データを受け付けていなければ、すなわち全ての方位に対して放音調整フラグがＯＦＦ状態であることを確認すれば、収音補正データに基づいて、放音をする全方位に対して同等の放音調整量からなる放音制御データを生成し、放音制御部１２に与える（Ｓ６）。 Returning to the flow shown in FIG. 8, when it is detected that the own apparatus is in the sound emission state, the main control unit 10 of the audio conference apparatus 111 detects whether or not sound collection correction data has been acquired from the network server 101 ( S2). If the sound collection correction data is acquired and stored, the main control unit 10 detects whether sound emission adjustment data is received for each direction in which sound emission is performed (S4). If the main control unit 10 has not received the sound emission adjustment data, that is, if it is confirmed that the sound emission adjustment flag is OFF for all directions, the main control unit 10 emits sound based on the sound collection correction data. Sound emission control data having the same sound emission adjustment amount for all directions is generated and given to the sound emission control unit 12 (S6).

また、メイン制御部１０は、放音調整データを受け付けていれば、収音補正データによる放音調整量を基準量として、該基準量から放音調整データに基づく放音調整量を差分した差分量を、放音調整フラグがＯＮ状態にある方位毎に設定することで放音制御データを生成し、放音制御部１２に与える（Ｓ７）。すなわち、放音調整データを受け付けた方位（放音調整フラグがＯＮ状態の方位）には、差分量に基づく放音調整を行い、放音調整データを受け付けていない方位（放音調整フラグがＯＦＦ状態の方位）には、収音補正データに基づく放音調整を行う放音制御データを与える。 If the main control unit 10 has received sound emission adjustment data, it takes the sound emission adjustment amount given by the sound collection correction data as a reference amount, and generates sound emission control data by setting, for each direction whose sound emission adjustment flag is ON, the difference between that reference amount and the sound emission adjustment amount based on the sound emission adjustment data. It supplies this data to the sound emission control unit 12 (S7). That is, for a direction where sound emission adjustment data has been received (flag ON), sound emission is adjusted according to the difference amount, and for a direction where no sound emission adjustment data has been received (flag OFF), the sound emission control data specifies sound emission adjustment based on the sound collection correction data.

また、メイン制御部１０は、収音補正データがない場合にも、放音を行う各方位に対して放音調整データを受け付けているかどうかを検出する（Ｓ５）。メイン制御部１０は、放音調整データを受け付けていなければ、すなわち、全方位に対して放音調整フラグがＯＦＦ状態であれば、全方位に対して受信した音声通信信号をそのまま放音する放音制御データを生成し、放音制御部１２に与える。なお、この場合、特に放音制御データを与えなくても良い。 Even when there is no sound collection correction data, the main control unit 10 detects whether sound emission adjustment data has been received for each direction in which sound is emitted (S5). If no sound emission adjustment data has been received, that is, if the sound emission adjustment flag is OFF for all directions, the main control unit 10 generates sound emission control data that emits the received voice communication signal unchanged in all directions and supplies it to the sound emission control unit 12. In this case, sound emission control data need not be supplied at all.

また、メイン制御部１０は、収音補正データが無い場合で、放音調整データを受け付けている場合には、放音調整フラグがＯＮ状態である各方位の放音調整量を設定した放音制御データを生成して、放音制御部１２に与える（Ｓ８）。すなわち、放音調整データを受け付けた方位（放音調整フラグがＯＮ状態の方位）には、放音調整データに基づく放音調整を行い、放音調整データを受け付けていない方位（放音調整フラグがＯＦＦ状態の方位）には、そのまま放音する放音制御データを与える。 When there is no sound collection correction data but sound emission adjustment data has been received, the main control unit 10 generates sound emission control data in which the sound emission adjustment amount is set for each direction whose sound emission adjustment flag is ON, and supplies it to the sound emission control unit 12 (S8). That is, for a direction where sound emission adjustment data has been received (flag ON), sound emission is adjusted according to the sound emission adjustment data, and for a direction where none has been received (flag OFF), the sound emission control data specifies that the sound is emitted unchanged.
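The four sound emission branches of FIG. 8 (S2, S4 to S8) reduce to a decision on two inputs: whether correction data is stored, and which directions have their sound emission adjustment flag ON. The sketch below only selects the branch and a per-direction mode. The function name and mode labels are hypothetical, and the label "pass_through" for the unnumbered branch is an assumption.

```python
def plan_emission(has_correction, emission_flags):
    """Return (step, {direction: mode}) for the sound emission state.

    emission_flags maps each listener direction to True (flag ON) or False.
    """
    any_adjusted = any(emission_flags.values())
    if has_correction and not any_adjusted:
        # S6: same correction-based amount for every direction.
        return "S6", {d: "correction_based" for d in emission_flags}
    if has_correction:
        # S7: difference amount where the flag is ON, correction-based elsewhere.
        return "S7", {d: "difference" if on else "correction_based"
                      for d, on in emission_flags.items()}
    if any_adjusted:
        # S8: listener's own adjustment where the flag is ON, unchanged elsewhere.
        return "S8", {d: "adjustment" if on else "as_is"
                      for d, on in emission_flags.items()}
    # No correction data and no adjustment: emit the received signal unchanged.
    return "pass_through", {d: "as_is" for d in emission_flags}


flags = {"Dir11": True, "Dir12": False, "Dir18": True}
print(plan_emission(True, flags))
# ('S7', {'Dir11': 'difference', 'Dir12': 'correction_based', 'Dir18': 'difference'})
```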

放音制御部１２は、与えられた放音制御データに基づいて、各方位へ所望の放音ビームが形成されるように、各スピーカＳＰ１〜ＳＰ１６に与える放音信号を生成して出力する。 The sound emission control unit 12 generates and outputs sound emission signals to be given to the speakers SP1 to SP16 based on the given sound emission control data so that a desired sound emission beam is formed in each direction.

一方、自装置が収音状態であることを検出すると、音声会議装置１１１のメイン制御部１０は、ネットワークサーバ１０１から収音補正データを取得しているかどうかを検出する（Ｓ３）。収音補正データを受け付けており、自装置に対する収音補正データであることを検出すると、すなわちいずれかの方位に対して収音補正フラグがＯＮ状態であることを検出すると、メイン制御部１０は、収音補正データに基づく収音制御データを音声信号補正部１９に与える（Ｓ９）。 On the other hand, when it is detected that the own device is in the sound collecting state, the main control unit 10 of the audio conference apparatus 111 detects whether or not sound collecting correction data is acquired from the network server 101 (S3). When receiving the sound collection correction data and detecting that it is the sound collection correction data for the own device, that is, detecting that the sound collection correction flag is in the ON state for any one direction, the main control unit 10 The sound collection control data based on the sound collection correction data is given to the sound signal correction unit 19 (S9).

また、音声会議装置１１１のメイン制御部１０は収音補正データを取得していなければ、音声信号補正部１９に対して特に制御を行わない。 The main control unit 10 of the audio conference apparatus 111 does not particularly control the audio signal correction unit 19 unless the sound collection correction data is acquired.

音声信号補正部１９は、収音制御データが与えられていれば、メイン制御部１０から与えられる話者方位データと収音制御データとに基づいて、収音補正フラグがＯＮ状態である方位からの収音ビーム信号を補正して、音声通信信号を生成する。通信制御部１１は、この音声通信信号に話者方位データおよび装置データを添付してネットワーク１００に送信する（Ｓ１０）。 If the sound collection control data is given, the voice signal correction unit 19 starts from the direction in which the sound collection correction flag is ON based on the speaker orientation data and the sound collection control data given from the main control unit 10. A sound communication signal is generated by correcting the collected sound beam signal. The communication control unit 11 attaches the speaker orientation data and the device data to the voice communication signal and transmits it to the network 100 (S10).

次に、このような構成を用いた場合の実際の放収音の状況を、図１，図１０〜図１３を参照して説明する。 Next, the actual sound emission and collection behavior obtained with this configuration will be described with reference to FIG. 1 and FIGS. 10 to 13.
なお、以下の説明では、地点ｂの会議者Ｊの声が聴き取り難い状況を例に示したものである。 The following description takes as an example a situation in which the voice of conference participant J at point b is difficult to hear.

（１）放音調整個別対応 (1) Individual handling by sound emission adjustment
図１０は放音調整個別対応の場合の放収音状況を示した図である。 FIG. 10 is a diagram showing the sound emission and collection situation in the case of individual handling by sound emission adjustment.
図１０に示すように、地点ｂの会議者Ｊが発言中に、地点ａの会議者Ａと会議者Ｇとがリモコン１２０を操作して放音調整を行った場合、地点ａの音声会議装置１１１Ａは、各リモコン１２０で操作された放音調整内容を取得する。この場合、会議者Ａに対して、音量を「＋７」に、音質ＨＩを「＋５」に、音質ＬＯＷを「−２」にする放音調整内容と、会議者Ｇに対して、音量を「＋５」に、音質ＨＩを「＋４」にする放音調整内容とを取得する。音声会議装置１１１Ａは、これら放音調整内容を放音調整データとして、ネットワークサーバ１０１に送信するとともに、会議者Ａ，Ｇのそれぞれに該当する方位Ｄｉｒ１１，Ｄｉｒ１８に対して放音調整フラグをＯＮに設定する。そして、音声会議装置１１１Ａは、受信した音声通信信号から話者データを取得して、会議者Ｊの声であることを検出すると、方位Ｄｉｒ１１，Ｄｉｒ１８への放音音声を、それぞれの放音調整内容に従って調整して放音する。これにより、会議者Ａ，Ｇには、会議者Ｊの声が、指定した放音調整内容に従って調整された状態で聴ける。すなわち、会議者Ａには、音量が「７」大きく、高音が「５」大きく、低音が「２」小さくなった会議者Ｊの声が聞こえ、会議者Ｇには、音量が「５」大きく、高音が「４」大きくなった会議者Ｊの声が聞こえ、他の会議者（地点ａの会議者Ｂ〜会議者Ｆ、地点ｃの会議者Ｍ〜会議者Ｑ）には、会議者Ｊの声が調整されることなく、そのまま聞こえる。 As shown in FIG. 10, when participants A and G at point a operate their remote controllers 120 to adjust the sound emission while participant J at point b is speaking, the audio conference apparatus 111A at point a acquires the sound emission adjustment content entered on each remote controller 120. In this case, it acquires, for participant A, adjustment content that sets the volume to "+7", the sound quality HI to "+5", and the sound quality LOW to "−2", and, for participant G, adjustment content that sets the volume to "+5" and the sound quality HI to "+4". The audio conference apparatus 111A transmits this adjustment content to the network server 101 as sound emission adjustment data, and sets the sound emission adjustment flag to ON for the directions Dir11 and Dir18 corresponding to participants A and G, respectively. When the audio conference apparatus 111A obtains the speaker data from the received voice communication signal and detects that the voice is that of participant J, it adjusts the sound emitted toward Dir11 and Dir18 according to the respective adjustment content. As a result, participants A and G hear the voice of participant J adjusted as they specified. That is, participant A hears the voice of participant J with the volume raised by "7", the treble raised by "5", and the bass lowered by "2"; participant G hears it with the volume raised by "5" and the treble raised by "4"; and the other participants (B to F at point a, M to Q at point c) hear the voice of participant J as it is, without adjustment.

この場合、放音調整を行った会議者が、全体の会議者に対して少数派であるので、ネットワークサーバ１０１は、会議者Ｊの音声を収音時に一括して補正する制御を行わない。 In this case, since the conferee who performed the sound emission adjustment is a minority group with respect to the entire conferencing party, the network server 101 does not perform control to collectively correct the voice of the conferee J at the time of sound collection.

このように、特定会議者（話者）に対して放音調整を行う会議者（聴者）数が極少ない場合には、それぞれの聴者がいる音声会議装置で聴者毎に放音調整を行う。これにより、放音調整したい聴者にのみ調整内容に応じた放音を行うことができる。 In this way, when the number of conferences (listeners) that perform sound emission adjustment for a specific conference (speaker) is extremely small, sound output adjustment is performed for each listener in the audio conference device in which each listener is present. Thereby, only the listener who wants to adjust the sound emission can emit sound according to the adjustment contents.

（２）収音補正一括対応 (2) Collective handling by sound collection correction
図１１、図１２は、収音補正一括対応の場合の放収音状況を示した図であり、図１１が一括補正前、図１２が一括補正後の状況を示す。 FIGS. 11 and 12 are diagrams showing the sound emission and collection situation in the case of collective handling by sound collection correction. FIG. 11 shows the situation before the collective correction and FIG. 12 shows the situation after it.

図１１に示すように、地点ｂの会議者Ｊが発言中に、地点ａの会議者Ａと会議者Ｇとがリモコン１２０を操作して放音調整を行った場合、地点ａの音声会議装置１１１Ａは、各リモコン１２０で操作された放音調整内容を取得する。この場合、会議者Ａに対して、音量を「＋７」に、音質ＨＩを「＋５」に、音質ＬＯＷを「−２」にする放音調整内容を取得し、会議者Ｇに対して、音量を「＋５」に、音質ＨＩを「＋４」にする放音調整内容を取得する。音声会議装置１１１Ａは、これら放音調整内容を放音調整データとして、ネットワークサーバ１０１に送信するとともに、会議者Ａ，Ｇのそれぞれに該当する方位Ｄｉｒ１１，Ｄｉｒ１８に対して放音調整フラグをＯＮに設定する。 As shown in FIG. 11, when the conference person A and the conference person G at the point a perform sound emission adjustment by operating the remote controller 120 while the conference person J at the point b speaks, the audio conference apparatus at the point a 111A acquires the sound emission adjustment contents operated by each remote controller 120. In this case, for the conference A, the sound emission adjustment content for obtaining the volume “+7”, the sound quality HI “+5”, and the sound quality LOW “−2” is acquired, and the volume of the conference G is obtained. Is set to “+5”, and the sound emission adjustment contents for setting the sound quality HI to “+4” are acquired. The audio conference apparatus 111A transmits these sound emission adjustment contents as sound emission adjustment data to the network server 101, and sets the sound emission adjustment flag to ON for the directions Dir11 and Dir18 corresponding to the conference participants A and G, respectively. Set.

同様に、地点ｃの会議者Ｍと会議者Ｎと会議者Ｑとがリモコン１２０を操作して放音調整を行った場合、地点ｃの音声会議装置１１１Ｃは、各リモコン１２０で操作された放音調整内容を取得する。この場合、会議者Ｍに対して、音量を「＋３」に、音質ＨＩを「＋２」にする放音調整内容を取得し、会議者Ｎに対して、音量を「＋４」に、音質ＨＩを「＋２」に、音質ＬＯＷを「−１」にする放音調整内容を取得し、会議者Ｑに対して、音量を「＋２」に、音質ＨＩを「＋１」にする放音調整内容を取得する。音声会議装置１１１Ｃは、これら放音調整内容を放音調整データとして、ネットワークサーバ１０１に送信するとともに、会議者Ｍ，Ｎ，Ｑにそれぞれ該当する方位Ｄｉｒ３１，Ｄｉｒ３４，Ｄｉｒ３８に対して放音調整フラグをＯＮに設定する。そして、音声会議装置１１１Ａ，１１１Ｃは、受信した音声通信信号から話者データを取得して、会議者Ｊの声であることを検出すると、方位Ｄｉｒ１１，Ｄｉｒ１８，Ｄｉｒ３１，Ｄｉｒ３４，Ｄｉｒ３８への放音音声を、それぞれの放音調整内容に従って調整して放音する。これにより、会議者Ａ，Ｇ，Ｍ，Ｎ，Ｑには、会議者Ｊの声が、指定した放音調整内容に従って調整された状態で聴ける。すなわち、会議者Ａには、音量が「７」大きく、高音が「５」大きく、低音が「２」小さくなった会議者Ｊの声が聞こえ、会議者Ｇには、音量が「５」大きく、高音が「４」大きくなった会議者Ｊの声が聞こえる。また、会議者Ｍには、音量が「３」大きく、高音が「２」大きくなった会議者Ｊの声が聞こえ、会議者Ｎには、音量が「４」大きく、高音が「２」大きく、低音が「１」小さくなった会議者Ｊの声が聞こえ、会議者Ｍには、音量が「２」大きく、高音が「１」大きくなった会議者Ｊの声が聞こえる。 Similarly, when participants M, N, and Q at point c operate their remote controllers 120 to adjust the sound emission, the audio conference apparatus 111C at point c acquires the sound emission adjustment content entered on each remote controller 120. In this case, it acquires, for participant M, adjustment content that sets the volume to "+3" and the sound quality HI to "+2"; for participant N, adjustment content that sets the volume to "+4", the sound quality HI to "+2", and the sound quality LOW to "−1"; and for participant Q, adjustment content that sets the volume to "+2" and the sound quality HI to "+1". The audio conference apparatus 111C transmits this adjustment content to the network server 101 as sound emission adjustment data, and sets the sound emission adjustment flag to ON for the directions Dir31, Dir34, and Dir38 corresponding to participants M, N, and Q, respectively. When the audio conference apparatuses 111A and 111C obtain the speaker data from the received voice communication signal and detect that the voice is that of participant J, they adjust the sound emitted toward Dir11, Dir18, Dir31, Dir34, and Dir38 according to the respective adjustment content. As a result, participants A, G, M, N, and Q hear the voice of participant J adjusted as they specified. That is, participant A hears the voice of participant J with the volume raised by "7", the treble raised by "5", and the bass lowered by "2", and participant G hears it with the volume raised by "5" and the treble raised by "4". Participant M hears it with the volume raised by "3" and the treble raised by "2", participant N hears it with the volume raised by "4", the treble raised by "2", and the bass lowered by "1", and participant Q hears it with the volume raised by "2" and the treble raised by "1".

When the network server 101 detects that the number of sound emission adjustment data items for participant J exceeds half the number of conference participants, it acquires the adjustment amounts in these data items and averages them. In the example of FIG. 11, the result is volume “+4”, sound quality HI “+3”, and sound quality LOW “−1”. The network server 101 generates sound collection correction data from the adjustment amounts calculated in this way, attaches the speaker data (direction data) identifying the correction target, and supplies the result to each of the audio conference apparatuses 111A to 111C.
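The averaging step can be illustrated with a minimal sketch, which is not the patented implementation. It assumes that an adjustment a listener did not specify counts as 0 and that averages are rounded to the nearest integer step, which reproduces the FIG. 11 figures. The names `REPORTS`, `round_half_up`, `sound_collection_correction`, and `min_reports` are hypothetical, and the threshold value used in the call is illustrative only.

```python
import math

# Sound emission adjustment data reported for speaker J (FIG. 11 example).
# Assumption: an unspecified item (e.g. LOW for G) counts as 0.
REPORTS = {
    "A": {"volume": 7, "hi": 5, "low": -2},
    "G": {"volume": 5, "hi": 4},
    "M": {"volume": 3, "hi": 2},
    "N": {"volume": 4, "hi": 2, "low": -1},
    "Q": {"volume": 2, "hi": 1},
}
KEYS = ("volume", "hi", "low")


def round_half_up(x):
    """Round to the nearest integer step (assumed rounding rule)."""
    return math.floor(x + 0.5)


def sound_collection_correction(reports, min_reports):
    """Return averaged correction amounts, or None if too few reports."""
    if len(reports) < min_reports:
        return None
    n = len(reports)
    return {
        key: round_half_up(sum(r.get(key, 0) for r in reports.values()) / n)
        for key in KEYS
    }


# min_reports=5 is an illustrative threshold, not a value from the source.
correction = sound_collection_correction(REPORTS, min_reports=5)
print(correction)
```

With the five reports above, the averages are 21/5, 14/5, and −3/5, which round to the “+4”, “+3”, and “−1” given in the text.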

The audio conference apparatus 111B, where the correction-target participant J is present, corrects the sound collection beam signal picked up from participant J based on the received sound collection correction data and transmits it to the network 100. In this example, the sound collection beam signal of participant J is corrected by volume “+4”, sound quality HI “+3”, and sound quality LOW “−1” before transmission.

If nothing more were done, the directions for which the audio conference apparatuses 111A and 111C have already applied sound emission adjustment would receive sound with the adjustment contents and the correction contents added together, making it louder than necessary.

Therefore, the audio conference apparatus 111A takes the difference between each adjustment amount in the received sound collection correction data and the corresponding adjustment amount in the sound emission adjustment data it has already stored, and performs sound emission adjustment using the resulting amounts. Specifically, for participant A the adjustment amounts are changed to volume 7 − 4 = “+3”, sound quality HI 5 − 3 = “+2”, and sound quality LOW −2 − (−1) = “−1”. For participant G they are changed to volume 5 − 4 = “+1”, sound quality HI 4 − 3 = “+1”, and sound quality LOW 0 − (−1) = “+1”. The apparatus then applies these changed adjustment amounts to the voice communication signal, which arrives already corrected (volume “+4”, sound quality HI “+3”, sound quality LOW “−1”). Participants A and G thus hear participant J's voice as they themselves adjusted it.

Like the audio conference apparatus 111A, the audio conference apparatus 111C performs sound emission adjustment using the adjustment amounts given by the difference values. Specifically, for participant M the adjustment amounts are changed to volume 3 − 4 = “−1”, sound quality HI 2 − 3 = “−1”, and sound quality LOW 0 − (−1) = “+1”. For participant N they are changed to volume 4 − 4 = “0”, sound quality HI 2 − 3 = “−1”, and sound quality LOW −1 − (−1) = “0”. For participant Q they are changed to volume 2 − 4 = “−2”, sound quality HI 1 − 3 = “−2”, and sound quality LOW 0 − (−1) = “+1”. The apparatus then applies these changed adjustment amounts to the voice communication signal, which arrives already corrected (volume “+4”, sound quality HI “+3”, sound quality LOW “−1”). Participants M, N, and Q thus also hear participant J's voice as they themselves adjusted it.
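The difference calculation performed by apparatuses 111A and 111C can be checked with a short sketch. It assumes that adjustment amounts combine additively, as the arithmetic in the text suggests, and that an unspecified item counts as 0. The names `CORRECTION`, `REQUESTED`, and `residual_adjustment` are hypothetical.

```python
# Sound collection correction applied at the source (apparatus 111B).
CORRECTION = {"volume": 4, "hi": 3, "low": -1}

# Adjustment amounts each listener originally requested for speaker J.
# Assumption: an unspecified item counts as 0.
REQUESTED = {
    "A": {"volume": 7, "hi": 5, "low": -2},
    "G": {"volume": 5, "hi": 4},
    "M": {"volume": 3, "hi": 2},
    "N": {"volume": 4, "hi": 2, "low": -1},
    "Q": {"volume": 2, "hi": 1},
}


def residual_adjustment(requested, correction):
    """Amount still to apply on the emitting side after source correction."""
    return {key: requested.get(key, 0) - correction[key] for key in correction}


residuals = {
    name: residual_adjustment(req, CORRECTION) for name, req in REQUESTED.items()
}
for name, amounts in residuals.items():
    print(name, amounts)
```

Adding each residual back to the correction gives the listener's original request, which is why the listener hears the same result as before the source-side correction was introduced.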

The other participants (participants B to F at point a and participant P at point c) hear participant J's voice as corrected on the sound collection side.

As a result, each participant (listener) who has adjusted sound emission hears the speaker's voice with the sound that he or she set, and participants (listeners) who have not adjusted sound emission hear the speaker's voice with a corrected sound that is likely to be easy to hear.

Note that participants who have not adjusted sound emission may not find participant J's voice hard to hear in the first place.

In that case, as shown in FIG. 13, reverse correction may be applied for participants who have not adjusted sound emission.

FIG. 13 is a diagram showing the sound emission and collection state for the same case as FIGS. 11 and 12 when reverse correction is performed. The method of adjusting sound emission for participants who have adjusted it is the same as in FIG. 12, so its description is omitted.

When the audio conference apparatuses 111A and 111C on the receiving side of the voice communication signal acquire the sound collection correction data from the network server 101, they generate reverse-correction sound emission adjustment data that cancels each adjustment amount in the sound collection correction data. In the example of FIG. 13, for the correction amounts volume “+4”, sound quality HI “+3”, and sound quality LOW “−1”, the reverse-correction amounts are set to volume “−4”, sound quality HI “−3”, and sound quality LOW “+1”.

Unlike the case of FIG. 12, the audio conference apparatus 111A sets the sound emission adjustment flag to ON for the directions Dir11 to Dir16 and Dir18 corresponding to all participants A to G, and applies the reverse-correction sound emission adjustment data for participants B to F, for whom no sound emission adjustment has been specified. Participants B to F thus hear participant J's raw voice as it was before correction. Similarly, the audio conference apparatus 111C applies the reverse-correction sound emission adjustment data for participant P, for whom no sound emission adjustment has been specified, so participant P also hears participant J's raw voice as it was before correction.
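The per-direction choice between the difference-based adjustment and the reverse correction can be sketched as follows for apparatus 111A. This is an illustration under the assumption that adjustment amounts combine additively. The names `adjustment_for`, `inverse_correction`, and the dictionary layout are hypothetical, and only Dir11 (participant A) and Dir18 (participant G) are tied to specific participants, as in the text.

```python
CORRECTION = {"volume": 4, "hi": 3, "low": -1}

# Directions handled by apparatus 111A (participants A to G).
DIRECTIONS_111A = ["Dir11", "Dir12", "Dir13", "Dir14", "Dir15", "Dir16", "Dir18"]

# Only the directions of participants A and G have listener-specified settings.
USER_ADJUSTMENTS = {
    "Dir11": {"volume": 7, "hi": 5, "low": -2},  # participant A
    "Dir18": {"volume": 5, "hi": 4},             # participant G
}


def inverse_correction(correction):
    """Reverse-correction amounts that cancel the source-side correction."""
    return {key: -value for key, value in correction.items()}


def adjustment_for(direction, user_adjustments, correction):
    """Pick the emitting-side adjustment for one sound emission direction."""
    user = user_adjustments.get(direction)
    if user is None:
        # No listener setting: cancel the correction so the raw voice is heard.
        return inverse_correction(correction)
    # Listener setting exists: apply only what the correction did not cover.
    return {key: user.get(key, 0) - correction[key] for key in correction}


per_direction = {
    d: adjustment_for(d, USER_ADJUSTMENTS, CORRECTION) for d in DIRECTIONS_111A
}
for direction, amounts in per_direction.items():
    print(direction, amounts)
```

For every direction without a listener setting, the correction and the reverse correction sum to zero, which corresponds to hearing the uncorrected voice.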

In this way, on the assumption that people who have not adjusted sound emission do not find participant J's voice hard to hear, the voice can be emitted to them as is.

If the remote controller 120 is provided in advance with a button or command indicating that no adjustment is needed, whether adjustment is unnecessary can be determined more clearly.

As described above, by using the configuration and processing of this embodiment, when a conference is held between remote locations, a relatively simple system can let each listener hear a specific speaker's voice at a different volume and sound quality.

Although the above description gave no example of adjustment or correction of voice quality, voice qualities that are easy to hear may be stored in advance so that, by selecting one as appropriate, the speaker's voice is emitted with the selected voice quality. For example, formant information of a television announcer may be stored, and when this voice quality is selected, the specific speaker's voice may be formant-converted and emitted.

In the above description, the network server 101 may also store the sound collection correction data and sound emission adjustment data, together with the corresponding speaker direction data, in the conference information storage unit 103. When a conference is subsequently held with the same members, the network server 101 reads out the speaker direction data, sound collection correction data, and sound emission adjustment data and transmits them to the audio conference apparatuses 111A to 111C. Each of the audio conference apparatuses 111A to 111C collects and emits sound based on the acquired sound collection correction data and sound emission adjustment data. From the next conference onward, each participant can therefore hear the speaker with his or her preferred sound from the start of the conference.

FIG. 1 is a configuration diagram of the audio conference system according to an embodiment of the present invention.
FIG. 2 is a diagram showing the configuration of point a in the audio conference system shown in FIG. 1, together with a plan view of the remote controller 120 (120A to 120G).
FIG. 3 shows both side views and a bottom view of the audio conference apparatus 111 (111A to 111C) according to the embodiment of the present invention.
FIG. 4 is a block diagram showing the main configuration of the audio conference apparatus shown in FIG. 3.
FIG. 5 is a simplified diagram for explaining the processing of the main control unit 10 during sound emission and sound collection.
FIG. 6 is a block diagram showing the main configuration of the network server 101 according to the embodiment of the present invention.
FIG. 7 is a flowchart showing the sound collection correction setting flow of the network server.
FIG. 8 is a flowchart showing the sound emission and collection processing of the audio conference apparatus.
FIG. 9 is a flowchart showing the interrupt processing for a sound emission adjustment change and a sound collection correction change in the audio conference apparatus.
FIG. 10 is a diagram showing the sound emission and collection state when sound emission adjustment is handled individually.
FIG. 11 is a diagram showing the sound emission and collection state before collective correction when sound collection correction is handled collectively.
FIG. 12 is a diagram showing the sound emission and collection state after collective correction when sound collection correction is handled collectively.
FIG. 13 is a diagram showing the sound emission and collection state for the same case as FIGS. 11 and 12 when reverse correction is performed.

Explanation of symbols

100: network, 101: network server, 102: network control unit, 103: conference information storage unit, 111 (111A to 111C): audio conference apparatus, 120: remote controller, 112: housing of the audio conference apparatus, 113: legs of the audio conference apparatus, 114: operation unit, 10: main control unit, 11: communication control unit, 12: sound emission control unit, 13: D/A converter, 14: sound emission amplifier, 15: sound collection amplifier, 16: A/D converter, 17: sound collection control unit, 18: echo cancellation unit, 19: audio signal correction unit, 20: remote controller transmission/reception unit

Claims

1. A voice communication apparatus comprising:
a speaker array in which a plurality of speakers are arranged in a predetermined pattern;
operation accepting means for accepting a sound emission characteristic adjustment operation; and
sound emission control means for applying delay and amplification processing to an input voice communication signal, so that sound emission beams are formed only in a plurality of predetermined directions and the sound emission beam in a designated direction is adjusted based on the sound emission characteristic accepted by the operation accepting means, and supplying the processed signals to the plurality of speakers.

2. The voice communication apparatus according to claim 1, wherein the sound emission characteristic is set by any one of volume, sound quality, and voice feature amount, or a combination thereof.

3. The voice communication apparatus according to claim 1, further comprising sound collection means for forming sound collection beams for the plurality of predetermined directions, identifying a speaker direction by comparing the intensities of the sound collection beams, and outputting, together with the speaker direction, a voice communication signal based on the sound collection beam of the speaker direction.

4. A voice communication system comprising a plurality of the voice communication apparatuses according to claim 3 connected to a network, and a network server that manages communication over the network, wherein
each voice communication apparatus gives the accepted adjustment contents of the sound emission characteristic to the network server,
the network server, when a plurality of adjustment contents received from the voice communication apparatuses correspond to the same speaker direction and show the same tendency, and the number of adjustment contents received is equal to or greater than a predetermined number, sets a sound collection correction characteristic for that speaker direction and gives it to the corresponding voice communication apparatus, and
the voice communication apparatus given the sound collection correction characteristic corrects the voice communication signal obtained from the corresponding direction with the sound collection correction characteristic and outputs it.

5. The voice communication system according to claim 4, wherein
the network server gives the sound collection correction characteristic to all voice communication apparatuses connected to the network, and
the sound emission control means of each voice communication apparatus adjusts the sound emission beam based on the difference between the given sound collection correction characteristic and the accepted sound emission characteristic.

6. The voice communication system according to claim 5, wherein the sound emission control means of each voice communication apparatus performs an adjustment that cancels the sound collection correction characteristic for a sound emission beam in a direction for which no sound emission characteristic adjustment operation has been performed.