CN1618233A

CN1618233A - Video conferencing system and method of operation

Info

Publication number: CN1618233A
Application number: CNA028277430A
Authority: CN
Inventors: 阿瑟·拉莱
Original assignee: Motorola Inc
Current assignee: Motorola Solutions Inc
Priority date: 2002-01-30
Filing date: 2002-12-16
Publication date: 2005-05-18
Also published as: FI20041039L; GB0202101D0; HK1058450A1; KR20040079973A; GB2384932B; GB2384932A; WO2003065720A1; JP2005516557A

Abstract

A method of relaying video images in a multimedia videoconference between a plurality of multimedia user equipment (550, 560, 570, 580) includes the step of transmitting layered video images by a number of said plurality of user equipment, wherein said layered video images include a base layer (552, 562, 572, 582) and one or more enhancement layers (555, 565, 575, 585). The transmitted layered video images are received at a multipoint control unit (520), where a number of base layer video images of a number of active speakers (535) and one or more enhancement layers (540) of a most active speaker are selected. The multipoint control unit (520) transmits the base layer video images, and the one or more enhancement layers (540) of the most active speaker, to one or more of the plurality of multimedia user equipment (550, 560, 570, 580). Identification of the speakers is much improved compared to traditional videoconference systems, as the available bandwidth is shared to allow one enhancement layer and several base layers to be sent, instead of only one full quality video stream.

Description

Video conferencing system and method of operation

Technical field

The present invention relates to video conferencing. The invention is applicable to, but not limited to, a video switching mechanism that uses layered video coding in H.323-based and/or SIP-based centralized video conferences.

Background art

As the pace of business quickens and business relationships extend worldwide, the need to span communication distances quickly and economically has become a major challenge. Bringing customers and staff together effectively is key to succeeding in an increasingly competitive marketplace. Businesses are looking for flexible solutions that support real-time information sharing across countries and continents using various communication methods, such as voice, video, image data and any combination thereof.

In particular, multinational organizations increasingly wish to eliminate costly travel and to link multiple locations so that groups within the organization can communicate more effectively. Multipoint conferencing systems operating over Internet Protocol (IP) networks attempt to address this need. In the field of this invention, it is known that terminals in a multipoint video conference can exchange audio and video streams in real time. The existing method of setting up a multipoint conference over an IP network is to use a multipoint control unit (MCU). An MCU is an endpoint on the network that provides the capability for three or more terminals and/or communication gateways to participate in a multipoint conference. An MCU may also connect two terminals in a point-to-point conference so that the conference can later develop into a multipoint conference.

Referring first to FIG. 1, a known centralized conferencing model 100 is shown. A centralized conference uses an MCU-based conference bridge. All terminals (endpoints) 120, 122, 125 send and receive media information, in the form of audio, video and/or data signals, and control information streams 140 to/from the MCU 110. These transmissions are made in a point-to-point manner, as shown in FIG. 1.

The MCU 110 consists of a multipoint controller (MC) and zero or more multipoint processors (MPs). The MC handles call set-up and call signalling negotiation between all terminals, in order to determine common capabilities for audio and video processing. The MC 110 does not deal directly with any of the media streams. This is left to the MP, which mixes, switches and processes the audio, video and/or data bits.

In this way, MCUs provide the capability to hold multi-site meetings, sales meetings, group meetings and other 'face-to-face' communications. Multipoint conferencing is known to be used in a variety of applications, for example:

(i) Executives and managers at multiple locations can meet 'face-to-face', share real-time information and make decisions more quickly, without any loss of time, the expense or the need to travel;

(ii) Project teams and knowledge workers can coordinate their individual tasks in real time, and view and revise shared documents, presentations, designs and files; and

(iii) Students, trainees and employees at remote locations can access shared educational/training resources across any distance or time zone.

It is therefore envisaged that MCU-based systems will play an important role in future multimedia communications over IP-based networks.

Such multimedia communications typically employ video transmission. In such transmission, a sequence of images, usually referred to as frames, is transferred between the sending and receiving units. Multipoint multimedia conferencing systems can be set up using various methods, for example those specified by the H.323 and SIP session layer protocol standards. References on SIP can be found at:

http://www.ietf.org/rfc/rfc2534.txt, and

http://www.cs.columbia.edu/~hgs/sip.

Furthermore, in systems that use, for example, ITU H.263 video compression [ITU-T Recommendation H.263, 'Video Coding for Low Bit Rate Communication'], the first frame of a video sequence contains a relatively large amount of complete image data, generally referred to as intra-coded information. Because the intra-coded frame is the first frame, it provides a substantial portion of the image to be displayed. The intra-coded frame is followed by inter-coded (predicted) information, which generally contains data relating to changes in the image being transmitted. Predicted inter-coded information therefore contains much less data than intra-coded information.

In conventional multimedia conferencing systems, users need to identify themselves when they speak, so that the receiving terminals know who is speaking. Clearly, if the sending party does not identify itself, the listening users have to guess who is speaking.

One known technique solves this problem by analyzing the audio streams and forwarding the name and video stream of the active speaker to all participants. In a centralized conferencing system, the MCU usually performs this function. The MCU can then send the speaker's name and the corresponding video and audio streams to all participants by switching the appropriate input multimedia stream to the output ports/paths.

Video switching is a well-known technique that aims to deliver a single video stream to each endpoint, which is equivalent to arranging multiple point-to-point sessions. Video switching can be:

(i) Voice-activated switching, where the MCU sends the video of the active speaker;

(ii) Timed-activated switching, where the video of each participant is sent in turn at predetermined time intervals; or

(iii) Individual video selection switching, where each endpoint can request the participant video stream that he/she wishes to receive.
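The three switching modes listed above can be illustrated with a minimal sketch. The names `SwitchMode` and `select_stream`, and the parameters `elapsed_s`, `interval_s` and `requested`, are hypothetical and do not come from this document; the sketch only shows how a single stream might be chosen per mode.

```python
from enum import Enum


class SwitchMode(Enum):
    VOICE_ACTIVATED = 1   # (i) send the active speaker's video
    TIMED = 2             # (ii) rotate through participants at fixed intervals
    PERSONAL = 3          # (iii) each endpoint requests the stream it wants


def select_stream(mode, participants, *, active_speaker=None,
                  elapsed_s=0.0, interval_s=10.0, requested=None):
    """Return the participant whose video stream is forwarded.

    All parameters are illustrative assumptions, not part of the
    described system.
    """
    if mode is SwitchMode.VOICE_ACTIVATED:
        return active_speaker
    if mode is SwitchMode.TIMED:
        # Each participant gets one interval in turn, then the cycle repeats.
        slot = int(elapsed_s // interval_s)
        return participants[slot % len(participants)]
    return requested
```

In each mode only one stream reaches a given endpoint at a time, which is the limitation the remainder of the description addresses.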

Referring now to FIG. 2, a functional block diagram of a conventional video switching mechanism 200 is shown. In a conventional centralized conferencing system, video switching is performed as follows. An MCU 220, for example one located within an Internet Protocol (IP) based network 210, contains a switch 230. The MCU 220 receives the video streams 255, 265, 275, 285 of all participants (user equipment) 250, 260, 270, 280. The MCU may also separately receive a combined (multiplexed) audio stream 290 from the participants who are speaking. The MCU 220 then selects one video stream and sends that video stream 240 to all participants 250, 260, 270, 280.

A disadvantage of such conventional systems is that they send only the video stream of the active speaker. Users may also find it difficult to identify the speaker in the video stream if several speakers talk at the same time or if the active speaker changes constantly. This is a particular problem in large video conferences.

Alternatively, the video of every participant may be sent to all participants. However, in wireless-based conferences this approach suffers from bandwidth limitations.

In the field of video technology, it is known that video is sent as a series of still images/pictures. Because the quality of a video signal can be degraded during encoding and compression, it is known to include additional 'layers' of information that are based on the difference between the video signal and the encoded video bitstream. Including these additional layers allows the quality of the received signal to be enhanced on decoding and/or decompression. A layered video bitstream is therefore produced using a hierarchy of pictures and enhancement pictures divided into one or more layers.

In a layered (scalable) video bitstream, the video signal can be enhanced beyond the base layer in one of the following ways:

(i) Increasing the resolution of the picture (spatial scalability);

(ii) Including error information to improve the signal-to-noise ratio of the picture (SNR scalability); or

(iii) Including extra pictures to increase the frame rate (temporal scalability).

Such enhancements can be applied to the whole picture, or to an arbitrarily shaped object within the picture, which is known as object-based scalability. To preserve the disposable nature of the temporal enhancement layer, the H.263+ standard specifies that pictures included in the temporal scalability mode should be bi-directionally predicted (B) pictures, as shown in the video stream of FIG. 3.

FIG. 3 shows a schematic illustration of a scalable video arrangement 300, illustrating B-picture prediction dependencies that are well known in the field of video coding. An initial intra-coded frame (I₁) 310 is followed by a bi-directionally predicted frame (B₂) 320. This is followed by a (uni-directionally) predicted frame (P₃) 330, and then by a second bi-directionally predicted frame (B₄) 340. This in turn is followed by a (uni-directionally) predicted frame (P₅) 350, and so on.
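Why B pictures keep the temporal enhancement layer disposable can be shown with a small sketch of the FIG. 3 sequence. The dictionary below is an assumed reading of the figure (each B picture references the anchor pictures on either side, each P picture references the preceding anchor), and `is_disposable` is a hypothetical helper name.

```python
# Assumed prediction references for the I1, B2, P3, B4, P5 sequence of FIG. 3.
REFERENCES = {
    "I1": [],
    "B2": ["I1", "P3"],
    "P3": ["I1"],
    "B4": ["P3", "P5"],
    "P5": ["P3"],
}


def is_disposable(picture):
    """A picture is disposable if no other picture is predicted from it."""
    return all(picture not in refs for refs in REFERENCES.values())


# Pictures that can be dropped without affecting any other picture.
disposable = sorted(p for p in REFERENCES if is_disposable(p))
```

Under these assumptions only the B pictures are disposable, so dropping them lowers the frame rate without breaking the prediction chain of the remaining pictures.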

FIG. 4 is a schematic illustration of a layered video arrangement known in the field of video coding. A layered video bitstream includes a base layer 405 and one or more enhancement layers 435.

The base layer (layer 1) includes one or more intra-coded pictures (I pictures) 410 that are sampled, encoded and/or compressed from the original video signal pictures. In addition, the base layer includes a number of predicted inter-coded pictures (P pictures) 420, 430 that are predicted from the intra-coded picture 410.

In the enhancement layers (layer 2, 3 or higher) 435, three types of picture can be used:

(i) Bi-directionally predicted (B) pictures (not shown);

(ii) Enhanced intra (EI) pictures 440, based on the intra-coded picture 410 of the base layer 405; and

(iii) Enhanced predicted (EP) pictures 450, 460, based on the inter-coded pictures 420, 430 of the base layer.

The vertical arrows from the lower layer illustrate that a picture in an enhancement layer is predicted from a reconstructed approximation of the picture in the reference (lower) layer.
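The layer dependencies of FIG. 4 can be sketched as a small reference table. This is an illustration only: the picture labels reuse the reference numerals, the chaining of P picture 430 to P picture 420 is an assumption about the base layer, and `decodable` is a hypothetical helper.

```python
# Picture -> pictures it is predicted from (labels follow FIG. 4 numerals).
REFERENCES = {
    "I410": [],
    "P420": ["I410"],
    "P430": ["P420"],   # assumption: P pictures are chained in the base layer
    "EI440": ["I410"],  # enhancement picture predicted from the base I picture
    "EP450": ["P420"],  # enhancement pictures predicted from base P pictures
    "EP460": ["P430"],
}


def decodable(received):
    """Return the received pictures whose whole reference chain was received."""
    ok = set()
    changed = True
    while changed:
        changed = False
        for pic in received:
            if pic not in ok and all(ref in ok for ref in REFERENCES[pic]):
                ok.add(pic)
                changed = True
    return ok
```

The base layer decodes on its own, whereas enhancement pictures are useless without the base pictures beneath them, which is why a receiver can be sent a base layer alone but never an enhancement layer alone.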

In summary, scalable video coding has been used in multicast multimedia conferences, and only in the context of point-to-point or multicast video communication. However, wireless networks do not currently support multicasting. Furthermore, with multicasting, each layer is sent in a separate multicast session, and the receiver decides for itself whether to register to one or more of the sessions.

A need therefore exists for an improved video conferencing arrangement and method of operation in which the above-mentioned disadvantages are alleviated.

Summary of the invention

In accordance with the present invention, there is provided: a method of relaying video images in a multimedia videoconference, as claimed in claim 1; a video conferencing arrangement for relaying video images, as claimed in claim 7; a wireless device for participating in a video conference, as claimed in claim 11; a multipoint processor, as claimed in claim 12; a video communication system, as claimed in claim 16; a media resource function, as claimed in claim 18; a video communication unit, as claimed in claim 19 or claim 20; and a storage medium, as claimed in claim 23. Further aspects of the invention are as claimed in the dependent claims.

In summary, the inventive concept of the present invention addresses the disadvantages of prior art arrangements by providing a video switching method that improves the identification of participants and speakers in a video conference. The invention makes use of layered video coding to make better use of the bandwidth available to each user.

Brief description of the drawings

FIG. 1 shows a known centralized conferencing model.

FIG. 2 shows a functional block diagram of a conventional video switching mechanism.

FIG. 3 is a schematic illustration of a video arrangement, showing picture prediction dependencies known in the field of video coding.

FIG. 4 is a schematic illustration of a layered video arrangement known in the field of video coding.

Exemplary embodiments of the invention will now be described with reference to the accompanying drawings, in which:

FIG. 5 shows a functional block diagram of a video switching mechanism in accordance with a preferred embodiment of the present invention.

FIG. 6 shows a functional block diagram/flowchart of a multipoint processing unit in accordance with a preferred embodiment of the present invention.

FIG. 7 shows a video display of a wireless device participating in a video conference using the preferred embodiment of the present invention.

FIG. 8 shows a UMTS (3GPP) communication system adapted in accordance with a preferred embodiment of the present invention.

Detailed description

In summary, the preferred embodiment of the present invention proposes a new video switching mechanism for multimedia conferences that makes use of layered video coding. Previously, layered video coding has been used only to split one video bitstream into more than one layer: a base layer and one or several enhancement layers, as described above with reference to FIG. 4. These known techniques for scalable video communication are described in detail in standards such as H.263 and MPEG-4.

However, the inventors of the present invention have recognized that benefits can be gained by adapting the principles of layered video coding and applying them to multimedia videoconferencing applications. In this way, the present invention defines a type of scalable video coding for multimedia conferencing that differs from that used in point-to-point or multicast video communication.

Referring now to FIG. 5, a functional block diagram 500 of a video switching mechanism in accordance with a preferred embodiment of the present invention is shown. In contrast to a conventional centralized conferencing system, video switching is performed as follows. An MCU 520, for example one located within an Internet Protocol (IP) based network 510, contains a switch 530.

Notably, the MCU 520 receives 'layered' video streams, comprising a base layer 552, 562, 572, 582 and one or more enhancement layer streams 555, 565, 575, 585, from all participants (user equipment) 550, 560, 570, 580. For clarity, only one enhancement layer video stream is shown for each participant.

The MCU 520 may also separately receive a combined (multiplexed) audio stream 590 from the participants. The MCU 520 then uses the switch 530 to select the base layer video streams of a number of active speakers 535 and the enhancement layer 540 of the most active speaker. The MCU 520 then sends these video streams 535, 540 to all participants 550, 560, 570, 580.

Preferably, the selection process that determines the most active speaker is performed by the MCU 520 analyzing the audio streams 590, in order first to determine who all of the active speakers are. The most active speaker is then preferably determined in the multipoint processor unit, as described with reference to FIG. 6. Preferably, one or more base layers and one enhancement layer are sent to the participants according to a priority that is based on the activity of each participant.

To implement the improved, but more complex, video switching mechanism of FIG. 5, a multipoint processing unit (MP) 600 has been adapted to facilitate the new video switching mechanism, in accordance with the preferred embodiment of the present invention and as shown in FIG. 6.

The MP 600 still receives the audio streams 590 from the participants' video/multimedia communication units via a packet filtering module 610 and routes them to a packet routing module 630. However, the audio streams are now also routed to a speaker identification module 620, which analyzes the audio streams 590 to determine who the active speakers are. The speaker identification module 620 assigns a priority based on the activity of each participant and determines:

(i) the most active speaker 622,

(ii) any other active speakers 625 and, by default,

(iii) any remaining inactive speakers.
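The three-way classification above can be sketched as follows. The document does not say how activity is measured, so the per-participant `activity` score and the `active_threshold` value are purely illustrative assumptions, as is the function name `classify_speakers`.

```python
def classify_speakers(activity, active_threshold=0.1):
    """Split participants into most active, other active and inactive.

    activity: dict mapping participant id -> activity score (hypothetical
    measure, e.g. a smoothed fraction of recent audio frames with speech).
    """
    # Highest activity first; this ordering doubles as the priority order.
    ranked = sorted(activity, key=lambda p: activity[p], reverse=True)
    active = [p for p in ranked if activity[p] >= active_threshold]
    return {
        "most_active": active[0] if active else None,
        "other_active": active[1:],
        # Everyone not classed as active is inactive by default.
        "inactive": [p for p in ranked if activity[p] < active_threshold],
    }
```

For example, with scores of 0.7, 0.3, 0.0 and 0.05 for participants 550, 560, 570 and 580, participant 550 would be the most active speaker, 560 the only other active speaker, and 570 and 580 inactive.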

In accordance with the preferred embodiment of the present invention, the speaker identification module 620 then forwards the priority information to a switching module 640, which has been adapted to handle speaker priorities. In addition, the switching module 640 has been adapted to receive, via the packet filtering module 610, layered video streams from the participants' video communication units, comprising video base layer streams 552, 562, 572 and 582 and video enhancement layer streams 555, 565, 575 and 585. The switching module 640 uses the speaker information to send the video base layers of the secondary active speakers and of the most active speaker, together with the video enhancement layer of the most active speaker, to all participants via the packet routing module 630.

Hence, one or more receive ports of the multipoint processor are adapted to receive layered video streams from a number of user equipment 550, 560, 570 and 580, the layered video streams comprising base layer video streams 552, 562, 572 and 582 and enhancement layer video streams 555, 565, 575 and 585. It is within the contemplation of the invention that, if only one active speaker is determined, the switching module 640 may select only one base layer video image and the corresponding one or more enhancement layers. That speaker is then automatically designated as the most active speaker, for transmission to one or a number of the user equipment 550, 560, 570 and 580.
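The selection performed by the switching module, including the single-speaker case, might look like the sketch below. `select_streams` and its tuple output format are hypothetical; a single enhancement layer per speaker is assumed for simplicity.

```python
def select_streams(most_active, other_active):
    """Return (participant, layer) pairs to forward to the participants.

    The most active speaker contributes a base and an enhancement layer;
    every other active speaker contributes a base layer only; inactive
    speakers contribute nothing.
    """
    if most_active is None:
        return []
    selected = [(most_active, "base"), (most_active, "enhancement")]
    selected += [(p, "base") for p in other_active]
    return selected
```

With a single active speaker the second argument is empty, so only that speaker's base and enhancement layers are forwarded, matching the case described above.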

When the most active speaker changes frequently, as happens in a video conference, the enhancement layer is switched constantly. The inventors of the present invention have recognized a potential problem with such frequent and rapid switching. In this case, if the first frame is in fact a predicted frame (EP) from a speaker who was previously only a secondary active speaker, that frame needs to be converted into an intra frame (EI).

To address this potential problem, the video base layer streams 552, 562, 572 and 582 and the video enhancement layer streams 555, 565, 575 and 585 from the packet filtering module 610 are preferably input to a de-packetization function 680. The de-packetization function 680 demultiplexes the video streams and provides the demultiplexed video streams to a video decoder and buffer function 670.

To synchronize and coordinate the video decoding, the video decoder and buffer function 670 receives an indication of the most active speaker 622. After extracting the video stream information of the most active speaker, the video decoder and buffer function 670 provides the bi-directionally predicted (BP) 675 and/or predicted (EP) video stream data of the most active speaker 622 to an 'EP frame to EI frame transcoding module' 660. The 'EP frame to EI frame transcoding module' 660 processes the input video streams to provide the enhancement layer video stream of the primary (most active) speaker as intra-coded (EI) frames.

The primary speaker's enhancement layer video stream is then input to a packetization function 650, where it is packetized and input to the switching module 640. The switching module 640 then combines the primary speaker's enhancement layer video stream with the video base layer streams 552, 562, 572 and 582 of the secondary active speakers and routes the combined multimedia stream to the packet routing module 630. The packet routing module then routes this information to the participants in accordance with the method of FIG. 5.

In the preferred embodiment of the present invention, the video switching module 640 uses the output of the 'EP frame to EI frame transcoding module' 660 when it is determined that the primary speaker has changed.
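The rule that the first enhancement frame after a change of most active speaker must be an EI frame can be sketched as below. The function name and the `(speaker, frame_type)` representation are hypothetical, and relabelling the frame as "EI" merely stands in for the output of module 660; no real transcoding is performed.

```python
def enhancement_frames_to_send(incoming):
    """Decide which enhancement frame type is forwarded at each step.

    incoming: sequence of (most_active_speaker, frame_type) pairs, where
    frame_type is "EI" or "EP" as produced by that speaker's encoder.
    """
    out = []
    previous = None
    for speaker, frame_type in incoming:
        if speaker != previous and frame_type == "EP":
            # Receivers hold no enhancement reference for the new speaker,
            # so the converted (intra) version of the frame is used instead.
            frame_type = "EI"
        out.append((speaker, frame_type))
        previous = speaker
    return out
```

For the input sequence A:EI, A:EP, B:EP, B:EP, A:EP, the frames forwarded would be A:EI, A:EP, B:EI, B:EP, A:EI, since each change of speaker forces an intra frame.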

It is within the contemplation of the invention that one or more modules similar to module 660 may also be included in the MP 600, in order to perform the same function for the secondary speakers when they are deemed to have changed. Otherwise, in an embodiment that uses a single 'EP frame to EI frame transcoding module' 660 to transcode the primary speaker's video stream, the speaker identification module 620 (or the switching module 640) may request a new intra frame when an inactive speaker is deemed to have become a secondary active speaker. Alternatively, the switching module 640 may wait for a new intra frame from the new secondary active speaker before sending the corresponding video base layer stream to all participants.

Beyond the preferred embodiment of the present invention, it is also within the contemplation of the invention to use multiple classes of speaker in cases where more than one enhancement layer can be used. Using multiple classes of speaker yields finer scalability of the multimedia information, owing to the improved speaker identification, particularly for large video conferences.

It is also within the contemplation of the invention to add the conversion of predicted frames into intra frames for one or more of the base layer streams. In this way, the switching module 640 can switch quickly between base layers without waiting for a new intra frame.

FIG. 7 shows a video display 710 of a wireless device 700 participating in a video conference using the preferred embodiment of the present invention. Implementing the inventive concepts described above yields improved video communication. Specifically, for a given bandwidth, a participant can now receive better video quality for the most active speaker 720, because the video quality of the secondary active speakers 730 is reduced and no video is provided for inactive speakers. To provide this improved video conferencing, the video communication device receives the enhancement layer and the base layer of the most active speaker 720 and the base layer of the secondary active speaker 730, and receives no video from inactive speakers.
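The bandwidth trade-off can be made concrete with a small calculation. The bit rates used here are hypothetical figures chosen only for illustration; the document gives no numbers, and `speakers_that_fit` is not a function of the described system.

```python
def speakers_that_fit(budget_kbps, base_kbps, enhancement_kbps):
    """Number of speakers whose video fits in a receiver's bandwidth budget.

    The most active speaker takes a base plus an enhancement layer; each
    additional (secondary) speaker takes one base layer. Returns 0 if even
    the most active speaker's two layers do not fit.
    """
    remaining = budget_kbps - (base_kbps + enhancement_kbps)
    if remaining < 0:
        return 0
    return 1 + remaining // base_kbps


# Hypothetical figures: 128 kbit/s budget, 24 kbit/s base, 56 kbit/s enhancement.
shown = speakers_that_fit(128, 24, 56)
```

With these assumed rates, the same 128 kbit/s that would otherwise carry a single full-quality stream carries the most active speaker at enhanced quality plus two secondary speakers at base quality.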

In this way, the video communication unit can present a constantly updated video image of the most active speaker on a larger, higher-resolution display, while smaller displays show the secondary active speakers.

Preferably, the wireless device 700 has a primary video display 710 for displaying the higher-quality video image of the most active speaker, and one or more second, distinct displays for displaying the respective secondary active speakers. The routing of the respective video images to the respective displays is preferably performed by a processor (not shown) operably coupled to the video displays. The processor receives an indication of the most active speaker 720 and of the secondary active speakers, and determines which received video image should be displayed on the first display and which image received from a secondary active speaker 730 should be displayed on a second display. Advantageously, the second display can be arranged to provide a lower-quality video image of the secondary active speaker, thereby saving cost.

It is envisaged that, in the future, MCU-based systems will facilitate multimedia communication over IP-based networks. The inventors of the present invention therefore envisage that the techniques described herein can be incorporated into any H.323/SIP-based multipoint multimedia conference or system that uses an MCU.

A preferred application of the foregoing is in the Third Generation Partnership Project (3GPP) specifications for the wideband code division multiple access (WCDMA) standard. In particular, the invention can be applied to the IP multimedia domain (described in the 3G TS25.xxx series of specifications), which plans to incorporate an H.323/SIP MCU into the 3GPP network. As shown in FIG. 8, the MCU will be hosted by the media resource function (MRF) 890A.

FIG. 8 shows, in hierarchical form, a 3GPP (UMTS) communication system/network 800 that can be adapted in accordance with the preferred embodiment of the present invention. The communication system 800 is compliant with, and contains network elements capable of operating over, a UMTS and/or a GPRS air interface.

The network is generally regarded as comprising:

(i) a user equipment domain 810, made up of:

(a) a user SIM (USIM) domain 820, and

(b) a mobile equipment domain 830; and

(ii) an infrastructure domain 840, made up of:

(c) an access network domain 850, and

(d) a core network domain 860, which is in turn made up of (at least):

(di) a serving network domain 870, and

(dii) a transit network domain 880, and

(diii) an IP multimedia domain 890, with multimedia provided by SIP (IETF RFC 2543).

In the mobile equipment domain 830, UE 830A receives data from the user SIM 820A in the USIM domain 820 via the wired Cu interface. UE 830A communicates data with Node B 850A in the network access domain 850 via the wireless Uu interface. Within the network access domain 850, Node B 850A contains one or more transceiver units and communicates with the rest of the cell-based system infrastructure, for example RNC 850B, via the Iub interface defined in the UMTS specification.

RNC 850B communicates with other RNCs (not shown) via the Iur interface, and with SGSN 870A in the serving network domain 870 via the Iu interface. Within the serving network domain 870, SGSN 870A communicates with GGSN 870B via the Gn interface and with VLR server 870C via the Gs interface. In accordance with the preferred embodiment of the present invention, SGSN 870A communicates with an MCU (not shown) located within the Media Resource Function (MRF) 890A of the IP multimedia domain 890. This communication takes place over the Gi interface.
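
The chain of elements and interfaces described above can be captured as a small graph. The sketch below is illustrative only: the link table restates the interfaces named in the text (Cu, Uu, Iub, Iu, Gn, Gs, Gi), while the function names `build_graph` and `interface_path` and the string labels are assumptions introduced for the example.

```python
from collections import deque

# (element, element, interface) links as named in the description of Figure 8.
LINKS = [
    ("USIM 820A", "UE 830A", "Cu"),
    ("UE 830A", "Node B 850A", "Uu"),
    ("Node B 850A", "RNC 850B", "Iub"),
    ("RNC 850B", "SGSN 870A", "Iu"),
    ("SGSN 870A", "GGSN 870B", "Gn"),
    ("SGSN 870A", "VLR 870C", "Gs"),
    ("SGSN 870A", "MRF 890A", "Gi"),  # the MCU sits inside the MRF
]


def build_graph(links):
    """Build an undirected adjacency map: element -> [(neighbour, interface)]."""
    graph = {}
    for a, b, interface in links:
        graph.setdefault(a, []).append((b, interface))
        graph.setdefault(b, []).append((a, interface))
    return graph


def interface_path(graph, source, target):
    """Breadth-first search returning the interfaces crossed from source to target,
    or None if the target cannot be reached."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for neighbour, interface in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [interface]))
    return None
```

For example, `interface_path(build_graph(LINKS), "UE 830A", "MRF 890A")` lists the interfaces that conference media from a handset would cross on its way to the MCU.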

GGSN 870B (and/or the SGSN) is responsible for interfacing UMTS (or GPRS) with a Public Switched Data Network (PSDN) 880A such as the Internet or the Public Switched Telephone Network (PSTN). SGSN 870A performs the routing and tunneling functions for traffic within the UMTS core network, while GGSN 870B connects to external packet networks, in this case any network accessing the UMTS mode of the system.

RNC 850B is the UTRAN element responsible for the control and allocation of resources for numerous Node Bs; typically, one RNC 850B may control 50 to 100 Node Bs. RNC 850B also provides reliable delivery of user traffic over the air interface. RNCs communicate with each other (via the Iur interface) to support handover and macrodiversity.

SGSN 870A is the UMTS core network element responsible for session control and for the interface to the location registers (HLR and VLR). The SGSN is a large centralized controller for many RNCs.

GGSN 870B is the UMTS core network element responsible for concentrating and tunneling user data within the core packet network to its ultimate destination (for example, an Internet service provider (ISP)). Such user data includes multimedia and related signaling data to and from the IP multimedia domain 890. In the IP multimedia domain 890, the MRF is split into a Multimedia Resource Function Controller (MRFC) 892A and a Multimedia Resource Function Processor (MRFP) 891A. As described above, MRFC 892A provides the multipoint controller (MC) functionality, while MRFP 891A provides the multipoint processor (MP) functionality.

The protocol used across the Mr reference point/interface 893A is SIP (as defined in RFC 2543). The Call State Control Function (CSCF) 895A acts as a call server and handles multimedia call signaling.

Thus, in accordance with the preferred embodiment of the present invention, the elements SGSN 870A, GGSN 870B and the parts of MRF 890A are all adapted to facilitate multimedia messaging as described above. Furthermore, UE 830A, Node B 850A and RNC 850B are also adapted to facilitate the improved multimedia messaging as described above.

More generally, this adaptation may be implemented in the respective communication units in any suitable manner. For example, new apparatus may be added to an existing communication unit, or alternatively existing parts of an existing communication unit may be adapted, for example by reprogramming one or more processors therein. The required adaptation may thus be implemented in the form of processor-implementable instructions stored on a storage medium, such as a floppy disk, hard disk, PROM, RAM, or any combination of these or other storage media.

Alternatively, such adaptation of multimedia messaging may be controlled, or implemented in full or in part, by adapting any other part of the communication system 800; this too is within the contemplation of the invention.

Although the above elements are typically provided as discrete units (on their own respective software/hardware platforms), divided among the mobile equipment domain 830, the access network domain 850 and the serving network domain 870, it is envisaged that other configurations may also be used.

Further, in the case of other network infrastructures, such as a GSM network, the processing operations may be implemented at any appropriate node, such as any other appropriate type of base station, base station controller, mobile switching center, or operations and management controller. Alternatively, the aforementioned steps may be carried out by various components distributed at different locations or entities within any suitable network.

As described above, the video conferencing method using layered video coding, preferably when applied to centralized video conferencing, can provide the following advantages:

(i) Identification of the speakers is much improved compared with conventional systems, because the available bandwidth is shared so that one or more enhancement layers and several base layers can be sent instead of only one full-quality video stream.

(ii) Video switching using the inventive concepts described herein is smoother when the active speaker changes, because several states are defined: active speaker, second most active speaker, and inactive speaker.

(iii) The video quality of the most active speaker is improved.

(iv) An improved video communication unit can display a variety of speakers, with each displayed image depending on the priority associated with the transmission from the corresponding video communication unit.
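
The speaker states mentioned in advantage (ii) can be illustrated with a small classifier. This is a sketch under stated assumptions: the patent does not specify how activity is scored, so the per-participant `activity` score, the `threshold` value and the names `SpeakerState` and `classify_speakers` are all hypothetical.

```python
from enum import Enum


class SpeakerState(Enum):
    MOST_ACTIVE = "most active"
    ACTIVE = "active"
    INACTIVE = "inactive"


def classify_speakers(activity, threshold=0.1):
    """Assign one of three states to every participant.

    activity  -- dict mapping participant id -> audio activity score (assumed)
    threshold -- minimum score for a participant to count as active (assumed)
    """
    # Everyone starts as inactive; active participants are promoted below.
    states = {participant: SpeakerState.INACTIVE for participant in activity}
    active = [p for p, score in activity.items() if score >= threshold]
    if active:
        for participant in active:
            states[participant] = SpeakerState.ACTIVE
        # The highest-scoring active participant becomes the most active speaker.
        top = max(active, key=lambda p: activity[p])
        states[top] = SpeakerState.MOST_ACTIVE
    return states
```

Because a participant moves between three states rather than two, a speaker who is overtaken drops to the active state and keeps a base layer on screen instead of disappearing at once.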

A method of relaying video images in a multimedia videoconference between a plurality of multimedia user equipment has been described. The method comprises the steps of transmitting layered video images by a number of the plurality of user equipment, wherein the layered video images include a base layer and one or more enhancement layers, and receiving the transmitted layered video images at a multipoint control unit. A number of base layer video images of a number of active speakers, and one or more enhancement layers of the most active speaker, are selected. The multipoint control unit transmits the base layer video images of the active speakers, and the one or more enhancement layers of the most active speaker, to one or more of the plurality of multimedia user equipment.
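
The selection step summarized above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the multipoint control unit already knows which participant is most active and which others are active, and it represents each received layered stream as a dictionary with a `"base"` entry and an `"enhancement"` list. The function name `select_layers` and this data layout are assumptions.

```python
def select_layers(received, most_active, active):
    """Pick the layers the multipoint control unit forwards.

    received    -- dict: participant id -> {"base": layer, "enhancement": [layers]}
    most_active -- id of the most active speaker
    active      -- ids of the other active speakers
    Returns a list of (participant id, layer kind, layer) tuples.
    """
    selected = []
    # Base layers: the most active speaker first, then every other active speaker.
    ordered = [most_active] + [p for p in active if p != most_active]
    for participant in ordered:
        stream = received.get(participant)
        if stream is not None:
            selected.append((participant, "base", stream["base"]))
    # Enhancement layers: only those of the most active speaker are forwarded.
    top = received.get(most_active)
    if top is not None:
        for layer in top["enhancement"]:
            selected.append((most_active, "enhancement", layer))
    return selected
```

Enhancement layers received from the other participants are dropped, which is how the available bandwidth ends up shared between one enhanced stream and several base layer streams.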

Furthermore, a video conferencing device for relaying video images between a plurality of user equipment has been described. In addition, a wireless device for participating in a video conference, in which a number of participants transmit video images, has been described.

Claims

1. A method for relaying video images in a multimedia conference between a plurality of multimedia user equipment (550, 560, 570, 580), the method comprising the steps of:

transmitting layered video images by a number of said plurality of user equipment, wherein said layered video images include a base layer (552, 562, 572, 582) and one or more enhancement layers (555, 565, 575, 585);

receiving the transmitted layered video images at a multipoint control unit (520);

selecting a number of base layer video images of a number of active speakers (535) and one or more enhancement layers (540) of a most active speaker; and

sending, by the multipoint control unit (520), the number of base layer video images of the number of active speakers (535) and the one or more enhancement layers (540) of the most active speaker to one or more of the plurality of multimedia user equipment (550, 560, 570, 580).

2. The method of relaying video images in a multimedia conference according to claim 1, wherein the selecting step further comprises the step of:

analyzing a number of audio data streams (590) transmitted by said plurality of multimedia user equipment (550, 560, 570, 580) to determine said number of active speakers and/or said most active speaker.

3. The method for relaying video images in a multimedia conference according to claim 1 or 2, wherein the method further comprises the steps of:

assigning a priority to each layered video image and/or said audio data stream sent by a respective user equipment; and

selecting, based on the assigned priorities, a number of base layer video images (535) and one or more enhancement layers (540) for transmission to one or more of the plurality of multimedia user equipment (550, 560, 570, 580).

4. A method of relaying video images in a multimedia conference according to any preceding claim, wherein the method further comprises the step of:

converting (660) the first predicted frame of the video image of the most active speaker into an intra-coded frame, to enhance the video quality of the most active speaker.

5. A method of relaying video images in a multimedia conference according to any preceding claim, wherein the method further comprises the step of:

receiving, by the multipoint control unit (520), an indication of the classification of the one or more speakers together with the transmission of each layered video image when more than one enhancement layer is available, so as to provide finer scalability of the video images.

6. A method of relaying video images in a multimedia conference according to any preceding claim, wherein the method further comprises the step of:

converting predicted frames into intra-coded frames for one or more base layer video streams.

7. A video conferencing device for relaying video images between a plurality of user equipment (550, 560, 570, 580), the video conferencing device comprising:

a multipoint control unit (520) adapted to receive a number of layered video images transmitted by said plurality of user equipment, wherein said layered video images include a base layer (552, 562, 572, 582) and one or more enhancement layers (555, 565, 575, 585); and

a video exchange module (530), operatively coupled to said multipoint control unit (520) and adapted to select a number of base layer video images of a number of active speakers (535) and one or more enhancement layers (540) of a most active speaker; wherein

the multipoint control unit (520) is further adapted to send the number of base layer video images of the number of active speakers (535) and the one or more enhancement layers (540) of the most active speaker to one or more of the plurality of multimedia user equipment (550, 560, 570, 580).

8. The video conferencing device of claim 7, further comprising:

a predicted frame to intra-coded frame decoding module (660), operatively coupled to said video exchange module (530), which provides the enhancement layer video stream of the most active speaker as an intra-coded frame if said frame was initially received by said multipoint control unit (520) as a predicted frame.

9. A video conferencing device according to claim 7 or 8, further comprising:

a speaker identification module (620) for analyzing a number of audio streams (590) to determine a number of active speakers and/or said most active speaker.

10. The video conferencing device of claim 9, wherein said speaker identification module (620) assigns priorities based on the determined activity of each participant to determine one or more of the following: most active speaker (622), any other active speakers (625), and any inactive speakers.

11. A wireless device (700) for participating in a video conference, in which a plurality of participants send video images, the wireless device (700) comprising:

a video display (710) having a first display and one or more second, different displays for displaying respective participants (720, 730) from the plurality of participants; and

a processor, operatively coupled to said video display, for receiving an indication of a most active speaker (720) and a number of less active speakers (730), and for determining that the video image received from said most active speaker (720) is displayed on the first display, which provides a higher quality video image, and that the video images received from the number of less active speakers (730) are displayed, at lower quality, on said second displays.

12. A multipoint processor comprising:

one or more receiving ports adapted to receive layered video images from a plurality of user equipment (550, 560, 570, 580), the layered video images comprising base layer video streams (552, 562, 572, 582) and enhancement layer video streams (555, 565, 575, 585); and

a switching module (640), operatively coupled to the one or more receiving ports, for selecting a number of base layer video images of a number of active speakers (535) and one or more enhancement layers (540) of a most active speaker, for transmission to one or more user equipment (550, 560, 570, 580).

13. The multipoint processor according to claim 12, further comprising:

a speaker identification module (620), operatively coupled to said one or more receiving ports, for analyzing a number of audio streams (590) received from a number of said plurality of user equipment to determine a number of active speakers and/or said most active speaker.

14. The multipoint processor according to claim 12 or 13, wherein said speaker identification module (620) assigns priorities based on the determined activity of a number of participants to determine one or more of the following: most active speaker (622), any other active speakers (625), and any inactive speakers.

15. A multipoint processor according to any one of claims 12 to 14, further comprising:

a predicted frame to intra-coded frame decoding module (660), operatively coupled to said switching module (640), which converts the enhancement layer video stream of said most active speaker received at a corresponding port into an intra-coded frame if it is a predicted frame.

16. A video communication system adapted to perform the method steps of any one of claims 1 to 6, or adapted to incorporate the video conferencing device of any one of claims 7 to 10, or adapted to incorporate a multipoint processor.

17. The video communication system according to claim 16, wherein the video communication system is compatible with the UMTS communication standard (800) having an Internet Protocol multimedia domain (890), to facilitate video conference communication.

18. A media resource function element (890A) adapted to perform the method steps of any one of claims 1 to 6, or adapted to incorporate the video conferencing device of any one of claims 7 to 10, or adapted to incorporate the multipoint processor of any one of claims 12 to 15.

19. A video communication unit (700) adapted to receive a layered video conference image produced according to the method of claims 1 to 6.

20. A video communication unit adapted to generate a layered video conference image for use in the method of claims 1 to 6, or to transmit a layered video conference image generated according to the method of claims 1 to 6.

21. The video communication unit of claim 19, wherein the video communication unit is one of:

Node B (850A), RNC (850B), SGSN (870A), GGSN (870B), MRF (890A).

22. A method of relaying video images in a multimedia video conference according to claims 1 to 6, or a video conferencing device according to any one of claims 7 to 10, or a multipoint processor according to any one of claims 12 to 15, or a video communication system according to claims 16 to 17, or a media resource function element (890A) according to claim 18, or a video communication unit according to claims 19, 20 or 21, adapted to facilitate standard video conference images.

23. A storage medium storing processor implementable instructions for controlling a processor to perform the method of any one of claims 1 to 6.