JP2588970B2

JP2588970B2 - Multipoint conference method

Info

Publication number: JP2588970B2
Application number: JP1150173A
Authority: JP
Inventors: 裕明名取
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1989-06-13
Filing date: 1989-06-13
Publication date: 1997-03-12
Anticipated expiration: 2012-03-12
Also published as: JPH0314386A

Description

DETAILED DESCRIPTION OF THE INVENTION [Overview] The invention relates to a multipoint conference system in which three or more conference room terminals hold a video conference simultaneously via a center device, and its object is to switch the conference room terminal images appropriately with as simple a configuration as possible. Each conference room terminal has a speaker detection unit that detects the speaker from a group of microphones provided for the participants, steers a person camera toward the detected speaker, and sends the person camera image to the center device together with the speaker's voice and the speaker detection signal. Each time the center device receives a speaker detection signal, it treats the conference room terminal that sent the signal as the central conference room terminal, holds that terminal's output image for a fixed time by means of a timer, and distributes the image to the monitors of the other conference room terminals.

[Industrial applications]

The present invention relates to a multipoint conference system, and more particularly to a multipoint conference system in which three or more conference room terminals hold a video conference simultaneously via a center device.

Video conferences are used between conference room terminals at remote locations in order to convey information quickly and to save the time and expense of moving people and goods (business trips).

When three or more conference room terminals are connected for a simultaneous video conference, distributing the image of one conference room terminal to the other conference room terminals is the best method in terms of the amount of data transmitted. In this case, the conference room terminal at the center of the conference should normally be the one where the speaker is located, and this selection should preferably be made automatically.

[Conventional technology]

FIG. 8 shows a conventional example of a multipoint conference system in which three or more conference room terminals hold a video conference simultaneously via a center device. Each conference room terminal 1 comprises a microphone group 11 made up of a plurality of microphones facing the conference participants; a microphone mixer 12 that combines the outputs of the microphone group 11; a speaker 13 that outputs audio from the other conference room terminals; an audio control unit 14 that performs input/output audio switching, echo cancellation, or the like so that the output of the speaker 13 does not feed back into the microphone group 11; an encoding/decoding unit (hereinafter, CODEC) 15 for the audio passing from or to the audio control unit 14; a camera 16 for capturing an overall view of the conference room terminal 1, a person, or a document; a monitor 17 for displaying images from the other conference room terminals; a CODEC 18 for the images of the camera 16 and the monitor 17; and a multiplexing/demultiplexing unit (MUX/DMUX) 19 for the input and output signals of the audio CODEC 15 and the image CODEC 18.

Each conference room terminal 1 configured in this way is connected to the center device 2. The center device 2 comprises a multiplexing/demultiplexing unit 21 for the audio and images from or to each conference room terminal 1; audio CODECs 22 connected to the multiplexing/demultiplexing unit 21; an audio level comparator 23 that compares the levels of the audio signals from the audio CODECs 22; a mixer 24 that combines the audio signals after comparison by the comparator 23 and sends them to the speaker 13 of each conference room terminal 1 via the audio CODECs 22 and the multiplexing/demultiplexing unit 21; an image data selector 25 that selects among the image data separated by the multiplexing/demultiplexing unit 21; a control unit 26 that receives the comparison result from the audio level comparator 23 and supplies a selection signal to the selector 25; and a distribution unit 27 that sends the image data selected by the selector 25 to the monitor 17 of each conference room terminal 1 via the multiplexing/demultiplexing unit 21.

In the operation of this conventional example, the audio level comparator 23 in the center device 2 compares the levels of the mixed audio from the microphone group 11 of each conference room terminal 1. The conference room terminal that output the highest audio level is determined to be the central conference room terminal, the selector 25 selects that terminal's image, and the distribution unit 27 distributes it to the other conference room terminals.
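The conventional selection rule can be sketched in a few lines. This is only an illustration of "pick the terminal with the highest mixed audio level", not the circuit of FIG. 8; the function name `pick_central_terminal`, the terminal labels, and the numeric levels are assumptions made for the example.

```python
def pick_central_terminal(mixed_levels):
    """Return the terminal whose mixed microphone level is highest.

    mixed_levels maps a terminal label to one level per terminal,
    i.e. the mixer output, not the individual microphones.
    """
    return max(mixed_levels, key=mixed_levels.get)


# Hypothetical levels for three conference rooms.
levels = {"A": 0.12, "B": 0.47, "C": 0.30}
central = pick_central_terminal(levels)
receivers = [t for t in levels if t != central]  # terminals that get the image
```

Because the rule sees only one mixed level per terminal, any loud sound in a room raises that room's level, whether or not it is speech.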

Such a multipoint conference system is disclosed in, for example, Japanese Patent Application Laid-Open No. 61-45689.

[Problems to be solved by the invention]

However, because the conventional multipoint conference system shown in FIG. 8 uses the mixed audio of each conference room terminal, noise, background murmur, and other sounds that are not speech can be recognized as speech. As a result, an unnecessary image from a conference room terminal where no one is speaking is distributed to the conference room terminals and the image of the central conference room can no longer be distributed, which causes operational problems.

Accordingly, an object of the present invention is to enable appropriate switching of conference room terminal images, with as simple a configuration as possible, in a multipoint conference system in which three or more conference room terminals hold a video conference simultaneously via a center device.

[Means and actions for solving the problem]

FIG. 1 shows the basic configuration of a multipoint conference system according to the present invention for achieving the above object. As shown, in each conference room terminal 1 the speaker detection unit 3 detects the speaker from the microphone group 11 provided for the participants and steers the person camera 4 toward that speaker, and the image of the person camera 4 is sent to the center device 2 together with the speaker's voice and the speaker detection signal.

As shown in FIG. 2, each time the center device 2 receives a speaker detection signal, it treats the conference room terminal that sent the signal as the central conference room terminal, holds that terminal's output image for a fixed time by means of the timer 6, and distributes the image to the monitors 17 of the other conference room terminals 1. Accordingly, if the speaker detection signal continues to be received after the holding time has elapsed, the holding time is renewed; if the same speaker detection signal stops arriving within the holding time, the next speaker detection signal starts a new holding time of the timer 6.

In this way, the conference room terminal where the speaker is located becomes the central conference room terminal in turn, its person image is distributed to the other conference room terminals, and the conference proceeds.

In addition, in the present invention the speaker's audio level is detected in two stages, by the conference room terminal 1 and by the center device 2, and the speaker is detected for each microphone input, so whispered side conversations and the like are not mistaken for speech.

Also in the present invention, when a conference room terminal is outputting a material image from the material camera 5, the center device 2 treats that terminal as the central conference room terminal and distributes the material image to the other conference rooms. Even if another conference room terminal sends a speaker detection signal at this time, the terminal outputting the material image remains the central conference room terminal and the material image currently being distributed is not switched.

This allows the conference to proceed with the conference room terminal that outputs the material image always serving as the central conference room terminal.

Furthermore, in the present invention the center device 2 can send a composite of a person image and a material image to each conference room terminal 1. In this case, as shown in FIG. 3, the display at the central conference room terminal that is outputting the material image is switched so as to show a composite of its own material image and the person image of the conference room terminal that generated the speaker detection signal. The material image thus keeps priority, while the person image of a participant who speaks while viewing the material image can be seen at the central conference room.

[Example]

FIG. 4 shows an embodiment of the multipoint conference system according to the present invention. This embodiment differs from the conventional example of FIG. 8 in that each conference room 1 is provided with a speaker detection unit 3 and a data transmission unit 8 for including the speaker detection signal from the speaker detection unit 3 in the transmitted data, in that a data transmission unit 28 for extracting the speaker detection signal in the center device 2 is connected to the multiplexing/demultiplexing unit 21, and in that the audio level comparator is omitted. A panoramic camera 7 is shown in the conference room in addition to the person camera 4 and the material camera 5, but it need not be used in the present invention.

In the operation of this embodiment, the audio outputs of the microphone group 11, in which one microphone is provided for each attendee, are sent to the speaker detection unit 3 via the microphone mixer 12. The speaker detection unit 3 determines whether there has been audio input to a microphone and sends a speaker detection signal (the presence or absence of a detected speaker and the conference room number) to the data transmission unit 8. It also sends a speaker identification number signal to the person camera 4, which is panned so as to capture an image of the person corresponding to the detected speaker.

This speaker detection operation has already been disclosed by the present applicant in, for example, Japanese Patent Application No. 62-321846, but it is briefly described here.

FIG. 5 shows a specific configuration of the speaker detection unit 3. The output signals from the mixer 12 corresponding to the individual microphones (only the mixed output signal is sent to the audio control unit 14) are sampled by a sampling circuit 30. The sampling circuit 30 converts each microphone output into a binary digital signal that is on when the output is at or above a predetermined threshold level and off otherwise. The outputs of the sampling circuit 30 are distributed via an input buffer 31 to accumulation buffers 32, one provided for each microphone, and stored there.

The number of bits in each accumulation buffer 32 corresponds to the number of samples in a predetermined period, for example 4 seconds, so the number of bits set in an accumulation buffer 32 gives the cumulative time of audio output from the corresponding microphone.

The cumulative time indicated by this bit count is input to a processing circuit 33. When an accumulation buffer has a number of set bits corresponding to a cumulative time of about 2 seconds, that is, when roughly half of its bits are set, the person at the corresponding microphone is detected as the speaker.

If more than one speaker is detected in this way, the person at the microphone whose accumulation buffer 32 has the most bits set, that is, the longest cumulative time, is recognized as the speaker, and a speaker detection signal (speaker position number signal) is output.
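The per-microphone detection described for FIG. 5 can be sketched as below. It is a rough software model under stated assumptions, not the hardware of the sampling circuit 30 or the processing circuit 33: the sample rate (10 samples per second, giving 40 bits for the 4-second window), the class name `SpeakerDetector`, the use of "at least half the bits" as the detection rule, and the tie-break (lowest microphone index wins) are all choices made for illustration.

```python
from collections import deque

WINDOW_SAMPLES = 40                   # assumed: 4-second window at 10 samples/s
DETECT_SAMPLES = WINDOW_SAMPLES // 2  # about 2 seconds of "on" samples


class SpeakerDetector:
    """One fixed-length buffer of binarized samples per microphone."""

    def __init__(self, num_mics, threshold):
        self.threshold = threshold
        self.buffers = [deque([0] * WINDOW_SAMPLES, maxlen=WINDOW_SAMPLES)
                        for _ in range(num_mics)]

    def push(self, levels):
        """Binarize one sample per microphone and store it in that buffer."""
        for buf, level in zip(self.buffers, levels):
            buf.append(1 if level >= self.threshold else 0)

    def detect(self):
        """Return the microphone index of the detected speaker, or None."""
        totals = [sum(buf) for buf in self.buffers]
        best = max(range(len(totals)), key=lambda i: totals[i])
        return best if totals[best] >= DETECT_SAMPLES else None


detector = SpeakerDetector(num_mics=3, threshold=0.5)
for _ in range(25):
    detector.push([0.9, 0.1, 0.6])   # microphones 0 and 2 both above threshold
for _ in range(15):
    detector.push([0.9, 0.1, 0.1])   # only microphone 0 keeps talking
speaker = detector.detect()          # longest cumulative time wins
```

In this run microphones 0 and 2 both exceed the half-window count, and microphone 0 is chosen because its buffer has more bits set. A short burst, or sound that stays below the threshold, never accumulates enough bits to be detected.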

In this way, the mixed audio data of the microphone group 11, the image data from the person camera 4, and the speaker detection signal from each conference room 1 are multiplexed by the multiplexing/demultiplexing unit 19 and sent to the center device 2 over a digital line. The speaker detection signal from the data transmission unit 8 continues to be sent cyclically for as long as a speaker is present.

In the center device 2, the speaker detection signal (speaker presence signal and conference room number) is extracted via the multiplexing/demultiplexing unit 21 and the data transmission unit 28 and supplied to the control unit 26. The control unit 26 switches the selector 25 so as to select the output image of that conference room, and the distribution unit 27 distributes the same image to the conference rooms other than the one in which the speaker was detected. The audio passes through the multiplexing/demultiplexing unit 21 and the audio CODECs 22, is combined by the mixer 24, and is transferred to each conference room 1.

The operation of the control unit 26 when it receives a plurality of speaker detection signals is described below with reference to the flowcharts of FIGS. 6 and 7.

First, when some signal is received from a conference room terminal, an interrupt is raised to the control unit 26. As shown in FIG. 6(a), the control unit scans the conference room terminals in turn and, when a speaker detection (speaker present) signal is being input, performs the speaker detection control shown in FIG. 7.

In the speaker detection control shown in FIG. 7, it is first determined whether a speaker is currently in the detected state (step S1). If so, it is determined whether the source of the speaker presence signal (the input conference room terminal) is the same as the station recognized as the speaker up to now (step S2). If they are the same, the speaker detection holding timer shown conceptually in FIG. 1 is renewed (step S3); otherwise the signal is ignored.

As shown in FIG. 6(b), this timer operation is invoked by the interrupt from step S3. The timer is set to its initial value when a speaker presence signal is received from one conference room terminal, and is decremented by 1 in each program cycle while no conference room terminal has a speaker.

The holding timer measures a minimum holding time, provided so that the displayed conference room terminal does not change every time speaker presence signals arrive frequently from the conference room terminals (the initial value is currently about 4 seconds). While the speaker detected state is set (holding timer ≠ 0), a speaker presence signal received from another conference room terminal is ignored, whereas a speaker presence signal from the conference room terminal currently recognized as the speaker renews the holding timer (resets it to the initial value) (see FIG. 2).
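The holding behaviour can be modelled as a small countdown. The sketch below is an assumption-laden illustration rather than the control program of FIG. 6(b): it assumes one program cycle per second so that the initial value is 4, the names `HoldTimer`, `on_speaker_present`, and `tick` are invented for the example, and it deliberately leaves out the material image rule.

```python
HOLD_INITIAL = 4  # assumed: one program cycle per second, so about 4 s


class HoldTimer:
    """Countdown that keeps the current central terminal for a minimum time."""

    def __init__(self):
        self.remaining = 0   # 0 means the speaker detected state is not set
        self.central = None  # terminal currently recognized as the speaker

    def on_speaker_present(self, terminal):
        """Handle a speaker presence signal from `terminal`."""
        if self.remaining == 0:
            self.central = terminal        # nobody is held: accept this terminal
            self.remaining = HOLD_INITIAL
        elif terminal == self.central:
            self.remaining = HOLD_INITIAL  # same terminal: renew the timer
        # A signal from any other terminal is ignored while the timer is non-zero.

    def tick(self):
        """One program cycle in which no terminal reports a speaker."""
        if self.remaining > 0:
            self.remaining -= 1


timer = HoldTimer()
timer.on_speaker_present("A")  # A becomes the central terminal
timer.on_speaker_present("B")  # ignored: A is still being held
timer.tick()
timer.tick()
timer.on_speaker_present("A")  # A speaks again: timer back to the initial value
```

After this sequence terminal A is still central with a full holding time; terminal B can take over only once the countdown has reached zero.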

If the speaker detected state is not set in step S1, it is determined whether the image selector 25 should be switched (step S4). If the source of the speaker detection signal has already been sending speaker detection signals and is already the central conference room terminal, the image selector 25 need not be switched and the timer is renewed (step S3).

If the source is not the central conference room terminal, the monitor screen state of the central conference room terminal is checked (step S5).

If the central conference room terminal is sending a material image from the material camera 5, all conference room terminals are paying attention to that material image, so the central conference room terminal is not switched automatically.

However, when a person image from some conference room terminal is shown on the monitor of the central conference room terminal as a composite with the material image, only that person image is switched to the speaker's person image, as shown in FIG. 3. To do this, the sub conference room terminal entry (a term for a terminal other than the central conference room terminal) in the image selector table is updated (step S6).

When the central conference room terminal is sending an image other than a material image, automatic switching by speaker detection is performed: the central conference room terminal and sub conference room terminal entries in the image selector table are each updated (steps S6 and S7) and a selection signal is output to the image selector 25 (step S9).

The person image of the conference room terminal that was central until immediately beforehand can be used as the image for the new central conference room terminal selected by speaker detection.
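The decision flow of FIG. 7 can be summarized as one function over the image selector table. This is a sketch of one reading of the text, not the actual control program: the names `SelectorTable` and `on_speaker_detected`, the returned action strings, and the `holding` flag (true while the holding timer is non-zero) are hypothetical, and the assumption that the previous central terminal becomes the sub terminal on a switch is an interpretation of the remark about the previous central terminal's person image.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SelectorTable:
    central: str                      # terminal whose image is distributed
    sub: Optional[str] = None         # terminal whose person image the central one sees
    central_sends_material: bool = False
    composite_display: bool = False   # material image plus person image at the central


def on_speaker_detected(table, sender, holding):
    """Apply the flow to a speaker presence signal and name the action taken."""
    if holding:                               # S1: a speaker is already held
        if sender == table.central:           # S2
            return "renew_timer"              # S3
        return "ignore"
    if sender == table.central:               # S4: no switch needed
        return "renew_timer"                  # S3
    if table.central_sends_material:          # S5: material image has priority
        if table.composite_display:
            table.sub = sender                # S6: only the person image changes
            return "update_sub"
        return "keep_material"
    table.sub = table.central                 # assumed: old central becomes the sub
    table.central = sender
    return "switch_central"                   # selection signal to the selector (S9)


table = SelectorTable(central="A", sub="B")
first = on_speaker_detected(table, "C", holding=True)    # A is being held
second = on_speaker_detected(table, "C", holding=False)  # C takes over
```

Keeping the material check ahead of the switch is what makes the material image sticky: once `central_sends_material` is set, a speaker elsewhere can change at most the sub terminal entry.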

[Effect of the invention]

As described above, in the multipoint conference system according to the present invention, each conference room terminal performs speaker detection and outputs an image of the speaker, and each time the center device receives a speaker detection signal it treats the conference room terminal that sent the signal as the central conference room terminal, holds it for a fixed time with a timer, and distributes its output image to the other conference rooms. The conference can therefore proceed appropriately and smoothly without a wrong person image being distributed because of noise or the like. In addition, using a timer to recognize the speaker state gives priority to the continuity of speech and keeps conference room terminal switching from becoming too frequent.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the basic configuration of a multipoint conference system according to the present invention; FIGS. 2 and 3 are diagrams for explaining the system of the present invention; FIG. 4 is a block diagram showing an embodiment of the system; FIG. 5 is a diagram showing an embodiment of the speaker detection unit used in the system; FIGS. 6 and 7 are flowcharts showing the control algorithm of the control unit used in the center device of the system; and FIG. 8 is a block diagram for explaining a conventional example. In FIG. 1: 1, conference room terminal; 2, center device; 3, speaker detection unit; 4, person camera; 5, material camera; 6, timer; 11, microphone group; 17, monitor. In the drawings, the same reference numerals denote the same or corresponding parts.

Claims

(57) [Claims]

1. A multipoint conference system in which three or more conference room terminals (1) hold a video conference simultaneously via a center device (2), characterized in that each conference room terminal (1) has a speaker detection unit (3) that detects a speaker for each microphone of a microphone group (11) provided for the participants, steers a person camera (4) toward the detected speaker, and sends the image of the person camera (4) to the center device (2) together with the speaker's voice and the speaker detection signal, and in that, each time the center device (2) receives the speaker detection signal, it treats the conference room terminal (1) to which the speaker detection signal relates as a central conference room terminal, holds its output image for a fixed time by means of a timer (6), and distributes it to the monitors (17) of the other conference room terminals.

2. The multipoint conference system according to claim 1, wherein, when the center device (2) receives a material image from the material camera (5), it treats the conference room terminal that output the material image as the central conference room terminal and distributes the material image to the other conference rooms, and does not switch the material image even when it receives the speaker detection signal from another conference room terminal.

3. The multipoint conference system according to claim 2, wherein the center device (2) switches the display at the central conference room terminal (1) that output the material image from the material camera (5) so as to show a composite of that terminal's own material image and the person image of the conference room terminal that generated the speaker detection signal.