CN104782122B

CN104782122B - Controlled three-dimensional communication endpoint system and method

Info

Publication number: CN104782122B
Application number: CN201380053160.6A
Authority: CN
Inventors: Y·C·史密斯; E·G·朗; C·F·惠特玛; Z·张
Original assignee: Microsoft Corp; Microsoft Technology Licensing LLC
Current assignee: Microsoft Technology Licensing LLC
Priority date: 2012-10-10
Filing date: 2013-10-09
Publication date: 2018-01-19
Anticipated expiration: 2033-10-09
Also published as: EP2907302A2; CN104782122A; US8976224B2; JP2016500954A; WO2014058931A3; KR20150067194A; KR102108596B1; US9332222B2; EP3651454A1; WO2014058931A2; US20140098183A1; US20150163454A1; JP6285941B2

Abstract

A controlled three-dimensional (3D) communication endpoint system and method for emulating in-person communication between participants in an online meeting or conference and providing easy scalability of the virtual environment when additional participants join. This gives the participant the illusion that the other participants are in the same room and sitting around the same table as the viewer. The controlled communication endpoint includes a plurality of camera clusters that capture video of the participants 360 degrees around the participants. The controlled communication endpoint also includes a display device configuration including a display device positioned at least 180 degrees around the participant and displaying a virtual environment including geometric proxies of other participants. Scalability is easily achieved by placing participants at the virtual round table and increasing the diameter of the virtual table when additional participants are added.

Description

Controlled three-dimensional communication endpoint system and method

背景background

当前视频会议技术通常使用单个相机来捕捉本地场景的RGB数据(来自红、蓝以及绿(RGB)色彩模型)。这一本地场景通常包括参与视频会议的人，称为会议参与者。数据随后被实时传送到远程位置并随后显示给处于与其他会议参与者不同的位置处的另一会议参与者。Current videoconferencing technologies typically use a single camera to capture RGB data (from the red, blue, and green (RGB) color model) of the local scene. This local scene typically includes people participating in a video conference, known as conference participants. The data is then transmitted in real time to the remote location and then displayed to another meeting participant at a different location than the other meeting participants.

尽管在视频会议技术中已取得了帮助提供更高分辨率捕捉、压缩、以及传输的进步，但该体验通常达不到重建亲临会议的面对面体验。这一点的一个原因是典型的视频会议体验缺少眼睛注视以及其他纠正对话几何。例如，通常，被远程捕捉的人没有像在面对面对话中体验到的那样看向你的眼睛。此外，缺少像运动视差和图像深度以及改变场景中的视角的自由度等三维(3D)元素，因为只存在捕捉该场景以及会议参与者的单个固定摄像机。Although advances have been made in videoconferencing technology to help provide higher resolution capture, compression, and transmission, the experience often falls short of recreating the face-to-face experience of an in-person meeting. One reason for this is the lack of eye gaze and other corrective conversation geometry that is typical of the video conferencing experience. For example, often, the person being captured remotely does not meet your eyes the way it is experienced in a face-to-face conversation. Furthermore, three-dimensional (3D) elements like motion parallax and image depth and freedom to change the perspective in the scene are lacking since there is only a single fixed camera capturing the scene and the conference participants.

发明内容Contents of the invention

提供本概述以便以简化的形式介绍将在以下详细描述中进一步描述的一些概念。本概述并不旨在标识所要求保护主题的关键特征或必要特征，也不旨在用于限制所要求保护主题的范围。This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

受控三维(3D)通信端点系统和方法的各实施例对在线会面或会议中的参与者之间的亲临通信进行仿真。另外，受控3D通信端点系统和方法的各实施例允许容易地伸缩包含参与者的虚拟环境，使得附加参与者可通过仅仅增加虚拟环境中包含的虚拟桌的大小而被添加。此外，受控端点允许观看者感觉如同其他参与者与他在同一房间中一样。Embodiments of the controlled three-dimensional (3D) communication endpoint systems and methods simulate in-person communication between participants in an online meeting or conference. Additionally, embodiments of the controlled 3D communication endpoint system and method allow for easy scaling of a virtual environment containing participants such that additional participants can be added by merely increasing the size of a virtual table contained in the virtual environment. Furthermore, controlled endpoints allow the viewer to feel as if other participants are in the same room as him.

具体而言，受控3D通信端点系统和方法的各实施例使用端点处的多个相机群来捕捉参与者的3D视频图像。受控端点中的多个相机群被安排成使得它们允许绕参与者360度来捕捉该参与者。根据视频所捕捉的数据，为参与者创建几何代理。使用来自捕捉的视频的RGB数据和深度信息，为每一参与者创建几何代理。Specifically, embodiments of the controlled 3D communication endpoint systems and methods use multiple camera clusters at the endpoint to capture 3D video images of participants. The camera clusters in the controlled endpoints are arranged such that they allow the participant to be captured 360 degrees around the participant. Based on the data captured by the video, geometric proxies are created for the participants. Using RGB data and depth information from captured video, geometric proxies are created for each participant.

场景几何由该系统和方法的各实施例根据亲临通信中会存在的眼睛注视和对话几何来创建。场景几何的一般概念是创建参与者之间的相对几何。场景被实际上对齐以模拟现实生活场景，如同参与者在同一物理位置且参与亲临通信。Scene geometry is created by embodiments of the systems and methods from eye gaze and dialogue geometry that would exist in presence communications. The general concept of scene geometry is to create relative geometry between actors. The scenes are virtually aligned to simulate real life scenarios as if the participants were in the same physical location and participating in the in-person communication.

场景几何使用虚拟框来保持参与者之间的相对、一致的几何。具有两个参与者的会议(或一对一(1:1)场景几何)包括占据两个参与者的相应监视器(未示出)前方的空间的两个框。在存在三个参与者时，场景几何包括按等距的方式绕虚拟圆桌放置的三个虚拟框。Scene geometry uses virtual boxes to maintain relative, consistent geometry between participants. A conference with two participants (or one-to-one (1:1) scene geometry) includes two boxes occupying the space in front of the respective monitors (not shown) of the two participants. When there are three participants, the scene geometry includes three virtual boxes placed equidistantly around the virtual round table.

场景几何还包括虚拟相机。虚拟相机是来自多个相机群中的两者或更多者的图像的合成，以获得没有被任何一个相机群单独捕捉的相机视图。这允许该系统和方法的各实施例获得人们之间的自然眼睛注视以及联系。脸部跟踪技术可被用来通过帮助虚拟相机保持与观看者的眼睛注视对齐来改进性能。这意味着虚拟相机保持水平且在垂直和水平两个方向上与观看者的眼睛对齐。虚拟相机与脸部跟踪交互，以创建具有用户看向该用户的眼睛正在看向的位置的虚拟视点。因而，如果用户正看向远方，则虚拟视点始自用户看向远方的角度。如果用户正在看另一参与者，则虚拟视点始自用户正在看该另一参与者的角度。这不是通过人工使其看起来像用户正在看另一参与者来完成的，而是通过创建正确地表示用户正在看何处的虚拟几何来完成的。Scene geometry also includes virtual cameras. A virtual camera is a composite of images from two or more of multiple camera groups to obtain a camera view that was not captured by any one camera group alone. This allows embodiments of the systems and methods to achieve natural eye gaze and connection between people. Face tracking technology can be used to improve performance by helping the virtual camera stay in alignment with the viewer's eye gaze. This means that the virtual camera remains horizontal and aligned with the viewer's eyes both vertically and horizontally. The virtual camera interacts with face tracking to create a virtual point of view with the user looking where the user's eyes are looking. Thus, if the user is looking into the distance, the virtual viewpoint starts from the angle from which the user is looking into the distance. If the user is looking at another participant, the virtual point of view is from the angle from which the user is looking at the other participant. This is done not by artificially making it look like the user is looking at another participant, but by creating virtual geometry that correctly represents where the user is looking.

几何代理被相对于彼此渲染且与场景几何一起被放置到虚拟环境中。所渲染的几何代理和场景几何被传送给参与者中的每一个。虚拟环境被显示给端点的受控环境中的观看者(他也是参与者之一)。具体而言，每一端点包含使用虚拟视点向观看者显示虚拟环境的显示设备配置。虚拟视点依赖于观看者的眼睛的位置和定向。取决于眼睛的位置和定向，观看者看到会议中的其他参与者的不同角度以及虚拟环境的其他方面。Geometry proxies are rendered relative to each other and placed into the virtual environment along with the scene geometry. The rendered geometry proxy and scene geometry are communicated to each of the participants. The virtual environment is displayed to the viewer (who is also one of the participants) in the controlled environment of the endpoint. Specifically, each endpoint contains a display device configuration that displays a virtual environment to a viewer using a virtual viewpoint. The virtual point of view depends on the position and orientation of the viewer's eyes. Depending on the position and orientation of the eyes, the viewer sees different angles of other participants in the meeting and other aspects of the virtual environment.

现实空间与虚拟空间的配准确保所显示的图像是观看者在她正在察看虚拟环境中其他参与者的情况下会看到的。另外，脸部跟踪技术可被用来跟踪观看者的眼睛以知晓虚拟视点应当显示什么。为了以高效的方式大规模地为参与者创建真实几何且为了帮助维护参与者全部都在一个物理位置的假象，控制端点的大小和布局使得更易于构建解决方案。The registration of the real space to the virtual space ensures that the displayed image is what the viewer would see if she were looking at other participants in the virtual environment. Additionally, face tracking technology can be used to track the viewer's eyes to know what the virtual viewpoint should display. Controlling the size and placement of endpoints makes it easier to build solutions in order to create real geometry for participants at scale in an efficient manner and to help maintain the illusion that participants are all in one physical location.

显示设备配置包含多个显示设备(如监视器或屏幕)。显示设备配置控制端点环境，使得显示设备被安排成绕观看者至少180度。这确保观看者具有沉浸式体验并且感觉如同他实际上与其他参与者在同一物理空间。A display device configuration contains multiple display devices (such as monitors or screens). The display device configuration controls the endpoint environment such that the display devices are arranged at least 180 degrees around the viewer. This ensures that the viewer has an immersive experience and feels as if he is actually in the same physical space as the other participants.

该系统和方法的各实施例还允许容易的可扩展性。具体而言，在一些实施例中，虚拟桌是具有第一直径的圆形(或环形)虚拟桌。参与者中的每一个的几何代理被绕该虚拟桌置于虚拟环境中。这确保观看者可以看到围绕虚拟桌的参与者中的每一个。如果更多参与者被添加到在线会议中，则虚拟圆桌的大小被扩展到大于第一直径的第二直径。第二直径可以是大于第一直径的任何直径。这一扩展将参与者中的每一个仍然保持在视野中以供观看，且给出了与其他参与者一起绕桌处于同一房间的假象。Embodiments of the system and method also allow for easy scalability. Specifically, in some embodiments, the virtual table is a circular (or annular) virtual table having a first diameter. A geometric proxy for each of the participants is placed in the virtual environment around the virtual table. This ensures that viewers can see every one of the participants surrounding the virtual table. If more participants are added to the online meeting, the size of the virtual round table is expanded to a second diameter that is larger than the first diameter. The second diameter may be any diameter greater than the first diameter. This extension keeps each of the participants still in view for viewing and gives the illusion of being in the same room with the other participants around the table.

该系统和方法的各实施例还包括促进在单个端点处的多个参与者。在一些实施例中，脸部跟踪技术跟踪两个不同的脸并随后向不同的观看者提供不同视图。在其他实施例中，端点处的多个参与者中的每一个佩戴眼镜，并且在一些实施例中，该眼镜上具有快门，快门向每一佩戴者示出由监视器显示的被调谐到每一对眼镜的交替帧。其他实施例使用具有多个观看角度的监视器，使得正在从右侧观看监视器的观看者看到一个场景且正在从左侧观看监视器的另一观看者看到不同场景。Embodiments of the systems and methods also include facilitating multiple participants at a single endpoint. In some embodiments, face tracking technology tracks two different faces and then provides different views to different viewers. In other embodiments, each of the plurality of participants at the endpoint wears glasses, and in some embodiments, has a shutter on the glasses that shows each wearer the information displayed by the monitor that is tuned to each Alternate frames of a pair of glasses. Other embodiments use monitors with multiple viewing angles such that a viewer who is looking at the monitor from the right sees one scene and another viewer who is looking at the monitor from the left sees a different scene.

应当注意，替换实施例也是可能的，并且此处所讨论的步骤和元素可取决于特定实施例而改变、添加或消除。这些替换实施例包括可使用的替换步骤和替换元素，以及可做出的结构上的改变，而不脱离本发明的范围。It should be noted that alternative embodiments are possible and that steps and elements discussed herein may be changed, added or eliminated depending on the particular embodiment. These alternative embodiments include alternative steps and elements that may be used, and structural changes that may be made without departing from the scope of the present invention.

附图简述Brief description of the drawings

现在参考附图，在全部附图中，相同的附图标记表示相应的部分：Referring now to the drawings, like reference numerals designate corresponding parts throughout:

图1是示出在计算环境中实现的受控三维(3D)通信端点系统和方法的各实施例的一般概览的框图。1 is a block diagram illustrating a general overview of embodiments of a controlled three-dimensional (3D) communication endpoint system and method implemented in a computing environment.

图2是示出图1所示的3D通信处理系统的系统细节的框图。FIG. 2 is a block diagram showing system details of the 3D communication processing system shown in FIG. 1 .

图3是示出图1所示的受控3D通信端点和方法的各实施例的相机群的示例性实施例的细节的框图。FIG. 3 is a block diagram showing details of an exemplary embodiment of a camera cluster of embodiments of the controlled 3D communication endpoint and method shown in FIG. 1 .

图4示出使用四个相机群的相机群布局(诸如图2所示)的示例性实施例。FIG. 4 illustrates an exemplary embodiment of a camera cluster layout such as that shown in FIG. 2 using four camera clusters.

图5示出使用三个显示设备的显示设备配置(诸如图1所示)的示例性实施例。FIG. 5 shows an exemplary embodiment of a display device configuration such as that shown in FIG. 1 using three display devices.

图6示出其上可实现此处描述且在图1-5和7-15中所示的3D通信窗口系统和方法的各实施例和元素的通用计算机系统的简化示例。Figure 6 illustrates a simplified example of a general-purpose computer system on which various embodiments and elements of the 3D communication window systems and methods described herein and shown in Figures 1-5 and 7-15 may be implemented.

图7是示出图1所示的受控3D通信端点系统的总体操作的流程图。FIG. 7 is a flowchart showing the overall operation of the controlled 3D communication endpoint system shown in FIG. 1 .

图8是示出图1所示的3D通信处理系统的总体操作的流程图。FIG. 8 is a flowchart showing the overall operation of the 3D communication processing system shown in FIG. 1 .

图9示出扩展该系统和方法的各实施例以容纳附加端点的示例性实施例。Figure 9 illustrates an exemplary embodiment of extending the embodiments of the systems and methods to accommodate additional endpoints.

图10示出创建单个会议参与者的几何代理的示例性概览。Figure 10 shows an exemplary overview of creating a geometric proxy for a single conference participant.

图11示出当在线会议中存在两个参与者(在两个不同的端点处)时参与者之间的场景几何的示例性实施例。Figure 11 shows an exemplary embodiment of scene geometry between participants when there are two participants (at two different endpoints) in an online meeting.

图12示出当在线会议中存在处于三个不同端点处的三个参与者时参与者之间的场景几何的示例性实施例。Figure 12 shows an exemplary embodiment of scene geometry between participants when there are three participants at three different endpoints in an online meeting.

图13示出基于参与者正在看的位置的虚拟相机的示例性实施例。Figure 13 illustrates an exemplary embodiment of a virtual camera based on where a participant is looking.

图14示出通过基于观看者所面向的位置的运动视差来提供深度的示例性实施例。FIG. 14 illustrates an exemplary embodiment in which depth is provided by motion parallax based on where the viewer is facing.

图15示出使用具有多个观看角度的监视器处理单个端点处的多个参与者的技术的示例性实施例。15 illustrates an exemplary embodiment of a technique for handling multiple participants at a single endpoint using monitors with multiple viewing angles.

详细描述A detailed description

在以下对受控三维(3D)通信端点系统和方法的描述中，对附图进行了参考，附图形成了该描述的一部分，并且其中作为说明示出了可实践3D通信端点系统和方法的各实施例的一个具体示例。可以理解，可以利用其他实施例，并且可以作出结构上的改变而不背离所要求保护的主题的范围。In the following description of the controlled three-dimensional (3D) communication endpoint system and method, reference is made to the accompanying drawings, which form a part hereof, and in which are shown by way of illustration an aspect of the practicable 3D communication endpoint system and method A specific example of each embodiment. It is to be understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the claimed subject matter.

I.系统概览 I. System overview

受控3D通信端点系统和方法的各实施例创建用于沉浸式在线会议和会面的受控捕捉和观看空间。该系统和方法的各实施例确保在参与者加入在线会议或会面时各端点处的一致性。各端点在在线会议期间是完全受控的，包括光照、房间设计、以及几何形状。此外，端点包括用于捕捉和观看3D沉浸式会议的装备，使得对观看者而言，其他参与者看起来实际上与该参与者处于同一房间(或同一物理空间)。Embodiments of the controlled 3D communication endpoint system and method create a controlled capture and viewing space for immersive online meetings and meetings. Embodiments of the system and method ensure consistency at endpoints when participants join an online meeting or meeting. Each endpoint is fully controlled during the online meeting, including lighting, room design, and geometry. In addition, endpoints include gear for capturing and viewing immersive meetings in 3D, such that to the viewer it appears that other participants are actually in the same room (or same physical space) as the participant.

端点是包含在线会议或会面的各参与者中的至少一者的物理位置，如房间或其他类型的环境。每一在线会议具有至少两个端点，其中每一端点具有至少一个参与者。每一端点可具有两个或更多个参与者。下文详细讨论处理具有两个或更多个参与者的端点的方式。An endpoint is a physical location, such as a room or other type of environment, that contains at least one of the participants of an online meeting or meeting. Each online meeting has at least two endpoints, where each endpoint has at least one participant. Each endpoint can have two or more participants. The manner in which endpoints with two or more participants are handled is discussed in detail below.

图1是示出在计算环境中实现的受控三维(3D)通信端点系统100和方法的各实施例的一般概览的框图。系统100和方法的各实施例包括一起工作来为在线会面或会议的参与者创建沉浸式体验的各种组件和系统。1 is a block diagram illustrating a general overview of embodiments of a controlled three-dimensional (3D) communication endpoint system 100 and method implemented in a computing environment. Embodiments of the system 100 and method include various components and systems that work together to create an immersive experience for participants in an online meeting or conference.

如图1所示，系统100和方法包括促进参与者的沉浸式体验的3D通信处理系统105。3D通信处理系统105被实现在计算设备110上。这一计算设备可以是单个计算设备或可以分布在多个设备上。此外，计算设备110实际上可以是具有处理器的任何设备，包括台式计算机、平板计算设备以及嵌入式计算设备。As shown in FIG. 1 , the system 100 and method includes a 3D communication processing system 105 that facilitates a participant's immersive experience. The 3D communication processing system 105 is implemented on a computing device 110 . This computing device can be a single computing device or can be distributed across multiple devices. Moreover, computing device 110 may be virtually any device having a processor, including desktop computers, tablet computing devices, and embedded computing devices.

系统100和方法的各实施例包括至少两个端点。出于教学和易于解释的目的，图1只示出了两个端点。然而，应当注意，系统100和方法的各实施例可包括若干更多端点。此外，虽然图1中的每一端点仅示出了单个参与者，但应当注意，任何数量的参与者可被包括在任何端点处。Various embodiments of the system 100 and method include at least two endpoints. For pedagogical and ease of explanation purposes, Figure 1 shows only two endpoints. It should be noted, however, that various embodiments of the system 100 and method may include several more endpoints. Furthermore, while only a single participant is shown for each endpoint in Figure 1, it should be noted that any number of participants may be included at any endpoint.

系统100和方法的各实施例包括第一端点115和第二端点120。在图1中，第一端点115和第二端点120被示出在平面图中。换言之，如果第一和第二端点115、120是房间，则图1是房间的平面图。Various embodiments of the system 100 and method include a first endpoint 115 and a second endpoint 120 . In FIG. 1 , the first end point 115 and the second end point 120 are shown in plan view. In other words, if the first and second endpoints 115, 120 are rooms, then Figure 1 is a plan view of the room.

第一端点115包括其中包含的第一参与者125。第一端点115还包含多个捕捉和观看设备。第一端点115处的观看设备包括第一监视器130、第二监视器135、以及第三监视器140。观看设备向第一参与者125提供在线会议中的沉浸式体验，使得第一参与者125感觉好像他与其他参与者一起在该房间中。The first endpoint 115 includes a first participant 125 contained therein. The first endpoint 115 also includes a number of capture and viewing devices. The viewing devices at the first endpoint 115 include a first monitor 130 , a second monitor 135 , and a third monitor 140 . The viewing device provides the first participant 125 with an immersive experience in the online meeting such that the first participant 125 feels as if he is in the room with the other participants.

系统100和方法的各实施例包括具有被安排成使得它们至少绕参与者180度的监视器或屏幕的监视器配置。监视器的配置可实际上是任何安排，只要它们绕参与者至少180度来放置。如下文详细解释的，这确保参与者的体验是完全沉浸式的且使得能够取决于在线会议参与者的数量而伸缩。Embodiments of the system 100 and method include monitor configurations having monitors or screens arranged such that they circle at least 180 degrees around a participant. The configuration of monitors can be virtually any arrangement as long as they are placed at least 180 degrees around the participant. As explained in detail below, this ensures that the participants' experience is fully immersive and enables scaling depending on the number of online meeting participants.

图1中的监视器配置示出了第一端点115中的与第一监视器130成直角的第二和第三监视器135、140。此外，第一端点115中的监视器130、135、140绕第一参与者125至少180度。在替换实施例中，监视器配置可以是曲线形的，诸如半圆、或彼此之间的角度可以小于直角。The monitor configuration in FIG. 1 shows second and third monitors 135 , 140 in the first end point 115 at right angles to the first monitor 130 . Additionally, the monitors 130 , 135 , 140 in the first endpoint 115 circle the first participant 125 at least 180 degrees. In alternative embodiments, the monitor configurations may be curved, such as semicircles, or may be at less than right angles to each other.

系统100和方法的各实施例还包括用于捕捉第一端点115内的第一参与者125的至少一部分的捕捉设备。系统100和方法的各实施例使用多个相机群作为捕捉设备。应当注意，虽然图1中示出了六个相机群，但可以使用更少或更多相机群。Embodiments of the system 100 and method also include a capture device for capturing at least a portion of the first participant 125 within the first endpoint 115 . Embodiments of the system 100 and method use multiple camera clusters as capture devices. It should be noted that while six camera clusters are shown in Figure 1, fewer or more camera clusters may be used.

如图1所示，第一端点115包括位于第一参与者125前方的第一多个相机群145以及位于第一参与者125后方的第二多个相机群150。每一相机群的细节在下文详细解释。图1示出了附连到第一监视器130的第一多个相机群145以及附连到第一端点115的支撑结构(如房间中的墙或房间的地板上)的第二多个相机群150。然而，应当注意，在替换实施例中，第一和第二多个相机群145、150可被安装在某一其他结构上，或者一些相机群可被安装在第一监视器130上且其他相机群被安装在其他结构上。As shown in FIG. 1 , the first endpoint 115 includes a first plurality of camera clusters 145 located in front of the first participant 125 and a second plurality of camera clusters 150 located behind the first participant 125 . The details of each camera group are explained in detail below. FIG. 1 shows a first plurality of camera clusters 145 attached to a first monitor 130 and a second plurality of cameras attached to a support structure at a first end point 115, such as a wall in a room or the floor of a room. Camera group 150. It should be noted, however, that in alternative embodiments, the first and second plurality of camera groups 145, 150 may be mounted on some other structure, or some camera groups may be mounted on the first monitor 130 and other cameras Swarms are mounted on other structures.

第二端点120包括其中包含的第二参与者155。与第一端点115类似，第二端点120也包含多个捕捉和观看设备。第二端点120处的观看设备包括第四监视器160、第五监视器165、以及第六监视器170。这些监视器160、165、170向第二参与者155提供在线会议中的沉浸式体验，使得第一参与者125感觉好像他与其他参与者一起在该房间中。The second endpoint 120 includes a second participant 155 contained therein. Similar to the first endpoint 115, the second endpoint 120 also includes multiple capture and viewing devices. The viewing devices at the second endpoint 120 include a fourth monitor 160 , a fifth monitor 165 , and a sixth monitor 170 . These monitors 160, 165, 170 provide the second participant 155 with an immersive experience in the online meeting, making the first participant 125 feel as if he is in the room with the other participants.

图1中的监视器配置示出了第二端点120中的与第四监视器160之间的角度小于90度的第五和第六监视器165、170。此外，第二端点120中的监视器160、165、170绕第二参与者155至少180度。在替换实施例中，监视器配置也可以是曲线形的，如半圆。The monitor configuration in FIG. 1 shows fifth and sixth monitors 165 , 170 in the second end point 120 having an angle of less than 90 degrees from the fourth monitor 160 . Additionally, the monitors 160 , 165 , 170 in the second endpoint 120 circle the second participant 155 at least 180 degrees. In alternative embodiments, the monitor configuration may also be curved, such as a semicircle.

系统100和方法的各实施例还包括用于捕捉第二端点120内的第二参与者155的至少一部分的捕捉设备。系统100和方法的各实施例使用多个相机群作为捕捉设备。应当注意，虽然在图1中的第二端点120中示出了十个相机群，但可以使用更少或更多相机群。Embodiments of the system 100 and method also include a capture device for capturing at least a portion of the second participant 155 within the second endpoint 120 . Embodiments of the system 100 and method use multiple camera clusters as capture devices. It should be noted that although ten camera clusters are shown in the second endpoint 120 in FIG. 1 , fewer or more camera clusters may be used.

如图1所示，第二端点120包括位于第二参与者155前方的第三多个相机群175以及位于第二参与者155后方的第四多个相机群180。每一相机群的细节在下文详细解释。此外，第五多个相机群185位于第二参与者155左侧且第六多个相机群190位于第二参与者155右侧。As shown in FIG. 1 , the second endpoint 120 includes a third plurality of camera clusters 175 located in front of the second participant 155 and a fourth plurality of camera clusters 180 located behind the second participant 155 . The details of each camera group are explained in detail below. Additionally, the fifth plurality of camera groups 185 is located to the left of the second participant 155 and the sixth plurality of camera groups 190 is located to the right of the second participant 155 .

图1示出了第三多个相机群175附连到第四监视器160，第五多个相机群185附连到第五监视器165，且第六多个相机群190附连到第六监视器170。第四多个相机群180附连到第二端点120的支撑结构(如房间中的墙或房间的地板上)。然而，应当注意，在替换实施例中，第三、第四、第五、以及第六多个相机群175、180、185、190可被安装在某一其他结构上，或者一些相机群安装在第二端点120内的其他结构上。1 shows that the third plurality of camera groups 175 is attached to the fourth monitor 160, the fifth plurality of camera groups 185 is attached to the fifth monitor 165, and the sixth plurality of camera groups 190 is attached to the sixth Monitor 170. A fourth plurality of camera groups 180 is attached to a supporting structure of the second end point 120 (eg, a wall in a room or the floor of a room). However, it should be noted that in alternative embodiments, the third, fourth, fifth, and sixth plurality of camera groups 175, 180, 185, 190 may be mounted on some other structure, or some of the camera groups may be mounted on On other structures within the second endpoint 120.

第一参与者125被第一端点115中的相机群捕捉，且第二参与者被第二端点120中的相机群捕捉。这一捕捉到的信息随后被传送给3D通信处理系统105的各实施例，如下文详细解释的。第一端点115的捕捉设备通过网络195与3D 通信处理系统105通信。网络195与第一端点115之间的通信是使用第一通信链路来促进的。类似地，网络195与第二端点120之间的通信由第二通信链路198来促进。在图1中，3D通信处理系统105的各实施例被示为驻留在网络195上。然而，应当注意，这只是3D通信处理系统105可被实现在系统100和方法的各实施例内的一种方式。The first participant 125 is captured by the camera cluster in the first endpoint 115 and the second participant is captured by the camera cluster in the second endpoint 120 . This captured information is then communicated to various embodiments of the 3D communication processing system 105, as explained in detail below. The capture device of the first endpoint 115 communicates with the 3D communication processing system 105 through the network 195 . Communication between network 195 and first endpoint 115 is facilitated using a first communication link. Similarly, communication between network 195 and second endpoint 120 is facilitated by second communication link 198 . In FIG. 1 , various embodiments of 3D communication processing system 105 are shown as residing on network 195 . It should be noted, however, that this is only one way in which the 3D communication processing system 105 may be implemented within various embodiments of the system 100 and method.

捕捉到的信息被处理且发送给各端点以供在监视器上观看。系统100和方法的各实施例向各端点处的每一参与者提供虚拟视点。如下文详细解释的，虚拟视点允许观看者从依赖于观看者的脸的位置和定向的可变视角来观看在线会议。在一些实施例中，脸部跟踪被用来跟踪观看者的眼睛注视并确定经处理的信息应当如何被呈现给观看者。The captured information is processed and sent to endpoints for viewing on monitors. Embodiments of the system 100 and method provide virtual viewpoints to each participant at each endpoint. As explained in detail below, virtual viewpoints allow viewers to view an online meeting from variable perspectives that depend on the position and orientation of the viewer's face. In some embodiments, face tracking is used to track the viewer's eye gaze and determine how the processed information should be presented to the viewer.

II.系统细节II. System Details

系统100和方法的各实施例包括被一起用来向参与者提供在线会议中的沉浸式体验的各种组件和设备。现在将讨论这些组件和设备。应当注意，其它实施例是可能的，并且其它设备可被用于或替换来实现所讨论的组件和设备的目的和功能。Embodiments of the system 100 and method include various components and devices that are used together to provide participants with an immersive experience in an online meeting. These components and devices will now be discussed. It should be noted that other embodiments are possible and other devices may be used or substituted to achieve the purpose and function of the components and devices discussed.

系统100和方法的各实施例包括一起工作以创建“亲临”通信体验的三个主要组件。第一组件是捕捉并创建参与会议的每一个人的3D视频图像。第二组件是基于会议中的参与者的数量来创建相关场景几何。并且，第三组件是渲染并提供虚拟视图，如同相机被放置在观看者正在看的位置的视角，从而重新创建参与者在亲临交谈时将具有的相同场景几何。Embodiments of the system 100 and method include three main components that work together to create an "in-person" communication experience. The first component is to capture and create a 3D video image of everyone participating in the meeting. The second component is to create relevant scene geometry based on the number of participants in the conference. And, the third component is to render and provide a virtual view as if the camera were placed where the viewer is looking, thus recreating the same scene geometry that the participants would have if they were talking in person.

II.A. 3D通信处理系统II.A. 3D communication processing system

图2是示出图1所示的3D通信处理系统105的系统细节的框图。如图2所示，3D通信处理系统105包括捕捉和创建组件200、场景几何组件210、以及虚拟视点组件220。捕捉和创建组件200被用于捕捉和创建端点处的参与者的3D视频图像。FIG. 2 is a block diagram showing system details of the 3D communication processing system 105 shown in FIG. 1 . As shown in FIG. 2 , the 3D communication processing system 105 includes a capture and creation component 200 , a scene geometry component 210 , and a virtual viewpoint component 220 . Capture and creation component 200 is used to capture and create 3D video images of participants at endpoints.

具体而言，捕捉和创建组件200包括包含多个相机群的相机群布局230。相机群布局230被用来从多个视角来捕捉参与者。计算机视觉方法被用来创建每一会议参与者的高保真度几何代理。如下文详细解释的，这通过取得从RGB数据收集模块235获得的RGB数据以及深度信息计算模块240获得并计算得到的深度信息来实现。根据这一信息，几何代理创建模块245创建每一参与者的几何代理250。基于图像的渲染方法被用来创建几何代理250的逼真纹理，诸如与视点相关纹理映射一样。Specifically, capture and creation component 200 includes a camera cluster layout 230 that includes multiple camera clusters. Camera group layout 230 is used to capture participants from multiple perspectives. Computer vision methods were used to create high-fidelity geometric proxies of each meeting participant. As explained in detail below, this is achieved by taking the RGB data obtained from the RGB data collection module 235 and the depth information obtained and calculated by the depth information calculation module 240 . From this information, geometric proxy creation module 245 creates a geometric proxy 250 for each participant. Image-based rendering methods are used to create realistic textures for geometry proxy 250, such as with view-dependent texture mapping.

场景几何组件210被用来创建正确场景几何以模拟各参与者在一起进行真实对话。这一场景几何依赖于会议中参与者的数量。3D配准模块260被用来获得显示设备或监视器与相机群的精确配准。此外，空间对齐模块265将相机群的定向与真实世界对齐。对于1:1会议(具有两个端点)，这是在虚拟环境中将这两个物理空间简单地彼此面对面排成列。为每一参与者重新创建的捕捉区域是监视器前方的区域。The scene geometry component 210 is used to create the correct scene geometry to simulate the participants having a real conversation together. This scenario geometry depends on the number of participants in the conference. The 3D registration module 260 is used to obtain precise registration of the display device or monitor with the camera cluster. Additionally, the spatial alignment module 265 aligns the orientation of the camera population to the real world. For a 1:1 meeting (with two endpoints), this is simply lining up the two physical spaces facing each other in a virtual environment. The capture area recreated for each participant is the area in front of the monitor.

一旦为每一会议参与者创建了纹理化的几何代理250并且参与者被表示在与会议中的其他参与者相关的3D虚拟空间中，几何代理就按与对话几何相一致的方式来向彼此渲染。此外，这一渲染是基于会议中的参与者的数量来完成的。Once the textured geometric proxies 250 are created for each conference participant and the participants are represented in a 3D virtual space in relation to the other participants in the conference, the geometric proxies are rendered to each other in a manner consistent with the conversation geometry . Also, this rendering is done based on the number of participants in the meeting.

几何代理(且在一些情况下是配准和对齐信息)被传输模块270传送给远程参与者。虚拟视点组件220被用来增强渲染给远程参与者的虚拟视点。‘在现场’的体验是通过使用将运动视差和深度添加到参与者身后的场景的运动视差模块280来增强的。任一参与者的水平和横向移动改变他们本地显示器上示出的视点，且参与者从不同的视角看到他们正在看的场景以及其中的人。这极大地增强了会议参与者的体验。Geometric proxies (and in some cases registration and alignment information) are communicated by transmission module 270 to remote participants. The virtual viewpoint component 220 is used to enhance the virtual viewpoint rendered to the remote participant. The 'on-set' experience is enhanced by using a motion parallax module 280 that adds motion parallax and depth to the scene behind the participant. Horizontal and lateral movement of either participant changes the point of view shown on their local display, and the participant sees the scene they are looking at and the people in it from a different perspective. This greatly enhances the experience of meeting participants.

II.B.相机群II.B. Camera group

如上所述，系统100和方法的捕捉和创建组件200包括被用来捕捉端点中的参与者和场景的多个相机群。每一相机群具有多个传感器。图3是示出图1所示的受控3D通信端点系统100和方法的各实施例的相机群300的示例性实施例的细节的框图。如图1所示，系统100和方法的各实施例通常包括一个以上相机群300。然而，仅出于教学目的，将描述单个相机群。此外，应当注意，多个相机群不一定必须包括相同传感器。系统100和方法的一些实施例可包括包含彼此不同的传感器的多个相机群。As noted above, the capture and creation component 200 of the system 100 and method includes multiple camera clusters that are used to capture participants and scenes in endpoints. Each camera group has multiple sensors. FIG. 3 is a block diagram showing details of an exemplary embodiment of a camera cluster 300 of embodiments of the controlled 3D communication endpoint system 100 and method shown in FIG. 1 . As shown in FIG. 1 , various embodiments of the system 100 and method generally include more than one camera cluster 300 . However, for educational purposes only, a single camera cluster will be described. Furthermore, it should be noted that multiple camera clusters do not necessarily have to include the same sensor. Some embodiments of the system 100 and method may include multiple camera clusters that include different sensors from each other.

如图3所示，相机群300包括多个相机传感器。这些传感器包括立体传感器红外(IR)相机310、RGB相机320、以及IR发射器330。为了捕捉参与者和端点的3D图像，相机群300捕捉RGB数据和深度坐标，以计算深度图。图3示出了IR立体IR相机310和IR发射器330被用来捕捉深度计算。RGB相机320被用于纹理获取以及使用深度分割来加强深度线索。在计算机视觉领域中公知的深度分割寻求使用背景移除来将图像中的对象与背景分开。As shown in FIG. 3 , camera group 300 includes a plurality of camera sensors. These sensors include a stereo sensor infrared (IR) camera 310 , an RGB camera 320 , and an IR emitter 330 . To capture 3D images of participants and endpoints, camera cluster 300 captures RGB data and depth coordinates to compute a depth map. Figure 3 shows an IR stereo IR camera 310 and IR emitter 330 are used to capture depth calculations. RGB camera 320 is used for texture acquisition and depth cues are enhanced using depth segmentation. Depth segmentation, well known in the field of computer vision, seeks to separate objects in an image from the background using background removal.

在替换实施例中，代替IR结构光方法，相机群300使用飞行时间传感器或超声来实现立体感测。飞行时间相机是基于光速并通过测量光信号在相机与对象之间的飞行时间来计算图像中每一点的距离的距离成像相机系统。超声技术可被用来通过生成某一方向上的超声脉冲来计算距离。如果脉冲的路径中存在对象，则该脉冲的一部分或全部将作为回声被反射回发射机。可通过测量发射的脉冲与接收到的回声之差来得出距离。在其他实施例中，距离可通过使用RGB相机的立体对执行RGB深度计算来得出。In an alternative embodiment, instead of the IR structured light approach, the camera cluster 300 uses time-of-flight sensors or ultrasound to achieve stereo sensing. A time-of-flight camera is a range-imaging camera system that calculates the distance of each point in an image based on the speed of light and by measuring the time-of-flight of the light signal between the camera and the object. Ultrasound techniques can be used to calculate distance by generating ultrasound pulses in a certain direction. If there is an object in the path of the pulse, some or all of that pulse will be reflected back to the transmitter as an echo. Distance can be derived by measuring the difference between the transmitted pulse and the received echo. In other embodiments, the distance may be derived by performing RGB depth calculations using a stereo pair of RGB cameras.

II.C.相机群布局II.C. Camera Group Layout

一个或多个相机群被配置成特定布局，以捕捉端点的包括参与者中的一者或多者的3D图像。相机群的数量直接影响捕捉到的图像的质量以及遮挡的数量。随着相机群的数量增加，有更多的RGB数据可用且这改进了图像质量。此外，遮挡的数量将随着相机群数量的增加而减少。The one or more camera clusters are configured in a particular layout to capture 3D images of the endpoint including one or more of the participants. The number of camera clusters directly affects the quality of captured images as well as the amount of occlusions. As the number of camera clusters increases, more RGB data is available and this improves image quality. Furthermore, the number of occlusions will decrease as the number of camera clusters increases.

如图1所示，第一端点115包含6个相机群且第二端点120包含10个相机群。在替换实施例中，可以使用任何数量的相机。事实上，可存在使用单个相机群的较低端版本。例如，单个相机群可被安装在监视器顶部并使用图像失真校正技术来校正任何成像误差。标准是相机群布局应当具有足够相机群以提供包含参与者的端点的3D视图。As shown in FIG. 1 , the first endpoint 115 includes 6 camera clusters and the second endpoint 120 includes 10 camera clusters. In alternative embodiments, any number of cameras may be used. In fact, there may be lower end versions that use a single camera cluster. For example, a single camera cluster could be mounted on top of a monitor and use image distortion correction techniques to correct any imaging errors. The criterion is that the camera cluster layout should have enough camera clusters to provide a 3D view of the endpoint containing the participants.

图4示出使用四个相机群的相机群布局(诸如图2所示那样)的示例性实施例。如图4所示，四个相机群300被嵌入在监视器400的边框中。监视器400实际上可以是任何大小，但较大监视器提供更多与实物大小一样的重新投影。这通常向用户提供更真实的体验。显示在监视器400上的是参与在线会议或会面的远程参与者410。FIG. 4 illustrates an exemplary embodiment of a camera cluster layout such as that shown in FIG. 2 using four camera clusters. As shown in FIG. 4 , four camera groups 300 are embedded in a frame of a monitor 400 . Monitor 400 can be virtually any size, but larger monitors provide more life-size re-projections. This generally provides a more realistic experience to the user. Displayed on monitor 400 are remote participants 410 participating in an online meeting or meeting.

如图4所示，四个相机群300被安排成菱形配置。这允许系统100和方法的各实施例从上至下以及从一侧到另一侧来捕捉用户。此外，两个中间顶部和底部相机群可被用来无缝地获得用户的脸上的真实纹理。注意，角落中的相机将通常造成接缝问题。在其他实施例中，四个相机群300的实际上任何配置和安排可被使用且可被安装在监视器400上的任何位置。在又一些其他实施例中，四个相机群300中的一者或多者被安装在监视器400以外的位置。As shown in FIG. 4, four camera clusters 300 are arranged in a diamond configuration. This allows embodiments of the system 100 and method to capture users from top to bottom and side to side. Furthermore, two middle top and bottom camera clusters can be used to seamlessly obtain the real texture of the user's face. Note that cameras in corners will often cause seam problems. In other embodiments, virtually any configuration and arrangement of the four camera clusters 300 may be used and mounted anywhere on the monitor 400 . In still other embodiments, one or more of the four camera clusters 300 are mounted at locations other than the monitor 400 .

在替换实施例中，三个相机群被使用且位于监视器400顶部或底部。一些实施例使用位于监视器400的顶部和底部角落的两个相机群。在又一些其他实施例中，N个相机群被使用，其中N大于四(N>4)。在这一实施例中，N个相机群被围绕监视器400的外边缘来放置。在又一些实施例中，有多个相机群位于监视器400之后，以捕捉包含参与者的端点的3D场景。In an alternate embodiment, three camera clusters are used and are located on the top or bottom of the monitor 400 . Some embodiments use two camera clusters located at the top and bottom corners of monitor 400 . In still other embodiments, N camera groups are used, where N is greater than four (N>4). In this embodiment, N camera clusters are placed around the outer edge of the monitor 400. In yet other embodiments, multiple camera clusters are located behind the monitor 400 to capture the 3D scene including the participants' endpoints.

II.D.显示设备配置II.D. Display device configuration

若干显示设备(诸如监视器和屏幕)被配置成特定布局，以向每一参与者显示并呈现其他参与者中的至少一些的捕捉到的图像。系统100和方法的各实施例将显示设备配置成使得该安排绕端点中的参与者至少180度。这确保系统100和方法的各实施例可以伸缩并向参与者提供沉浸式体验。换言之，向端点中的参与者提供至少180度显示设备使他们能够同时看到虚拟桌处的每一个人。使用至少180度显示设备，在观看者绕虚拟圆桌向右看和向左看时，她将能够看到该桌处的每一人。Several display devices, such as monitors and screens, are configured in a particular layout to display and present to each participant the captured images of at least some of the other participants. Embodiments of the system 100 and method configure the display devices such that the arrangement is at least 180 degrees around the participants in the endpoints. This ensures that embodiments of the system 100 and method can scale and provide participants with an immersive experience. In other words, providing at least a 180 degree display device to the participants in the endpoint enables them to see everyone at the virtual table at the same time. Using at least a 180 degree display device, as the viewer looks right and left around the virtual round table, she will be able to see everyone at the table.

图5示出使用三个显示设备的显示设备配置(诸如图1所示那样)的示例性实施例。如图5所示，显示设备配置500被部署在端点环境510中。显示设备配置500包括被定位成使得它处于端点环境510中的参与者(未示出)前方的监视器#1520。显示设备配置还包括位于监视器#1520两侧的监视器#2530和监视器#3540。如图5所示，监视器#2530和监视器#3540各自以45度角连接到监视器#1520或与其接触。FIG. 5 shows an exemplary embodiment of a display device configuration such as that shown in FIG. 1 using three display devices. As shown in FIG. 5 , display device configuration 500 is deployed in endpoint environment 510 . The display device configuration 500 includes a monitor #1 520 positioned such that it is in front of a participant (not shown) in the endpoint environment 510 . The display device configuration also includes monitor #2530 and monitor #3540 located on both sides of monitor #1520. As shown in Figure 5, monitor #2 530 and monitor #3 540 are each attached to or in contact with monitor #1 520 at a 45 degree angle.

系统100和方法的各实施例使用端点环境510来进行捕捉和显示。在一些实施例中，显示设备配置500可以是360度配置。换言之，可存在完全围绕端点环境510中的参与者的显示设备。在其他实施例中，显示设备可包括被安排成围绕端点环境510的、范围从包括180度到360度的任何度数的显示设备。在又一些其他实施例中，显示设备配置500，其中端点环境510的所有墙和天花板都是显示设备。这一类型的显示设备配置可使参与者完全沉浸在纯虚拟环境中。Embodiments of the system 100 and method use the endpoint environment 510 for capture and display. In some embodiments, display device configuration 500 may be a 360-degree configuration. In other words, there may be a display device that completely surrounds a participant in endpoint environment 510 . In other embodiments, the display device may comprise a display device arranged to surround the endpoint environment 510 in any degree ranging from 180 degrees to 360 degrees inclusive. In yet other embodiments, the display device configuration 500, wherein all walls and ceilings of the endpoint environment 510 are display devices. This type of display configuration allows participants to fully immerse themselves in a purely virtual environment.

III.示例性操作环境III. Exemplary Operating Environment

在进一步继续受控3D通信端点系统100和方法的各实施例的操作概览和细节之前，现在将给出该受控3D通信端点系统100和方法的各实施例可在其中操作的示例性操作环境的讨论。受控3D通信端点系统100和方法的各实施例可在众多类型的通用或专用计算系统环境或配置内操作。Before continuing further with the operational overview and details of embodiments of the controlled 3D communication endpoint system 100 and method, an exemplary operating environment in which embodiments of the controlled 3D communication endpoint system 100 and method may operate will now be presented discussion. Embodiments of the controlled 3D communication endpoint system 100 and method are operable within numerous types of general purpose or special purpose computing system environments or configurations.

图6示出其上可实现此处描述且在图1-5和7-15中所示的3D通信端点系统100和方法的各实施例和元素的通用计算机系统的简化示例。应当注意，图6中由折线或虚线所表示的任何框表示简化计算设备的替换实施方式，并且以下描述的这些替换实施方式中的任一个或全部可以结合贯穿本文所描述的其他替换实施方式来使用。FIG. 6 shows a simplified example of a general-purpose computer system on which various embodiments and elements of the 3D communication endpoint system 100 and method described herein and shown in FIGS. 1-5 and 7-15 may be implemented. It should be noted that any blocks represented by broken or dashed lines in FIG. 6 represent alternative implementations of a simplified computing device, and that any or all of these alternative implementations described below may be combined with other alternative implementations described throughout this document. use.

例如，图6示出了总系统图，其示出简化计算设备10。该简化计算设备10可以是图1中所示的计算设备110的简化版本。这样的计算设备通常可以在具有至少一些最小计算能力的设备中找到，这些设备包括但不限于个人计算机、服务器计算机、手持式计算设备、膝上型或移动计算机、诸如蜂窝电话和PDA等通信设备、多处理器系统、基于微处理器的系统、机顶盒、可编程消费电子产品、网络PC、小型计算机、大型计算机、音频或视频媒体播放器等。For example, FIG. 6 shows a general system diagram showing a simplified computing device 10 . The simplified computing device 10 may be a simplified version of the computing device 110 shown in FIG. 1 . Such computing devices are typically found in devices with at least some minimal computing capabilities, including but not limited to personal computers, server computers, handheld computing devices, laptop or mobile computers, communication devices such as cell phones and PDAs , multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, audio or video media players, etc.

为允许设备实现此处描述的受控3D通信端点系统100和方法的各实施例，该设备应具有足够的计算能力和系统存储器以启用基本的计算操作。具体而言，如图6所示，计算能力一般由一个或多个处理单元12示出，并且还可包括一个或多个GPU 14，处理单元与GPU中的任一者或两者均与系统存储器16通信。注意，通用计算设备的处理单元12可以是专用微处理器，如DSP、VLIW、或其他微控制器，或可以是具有一个或多个处理核的常规CPU，包括多核CPU中的基于GPU的专用核。To allow a device to implement embodiments of the controlled 3D communication endpoint system 100 and method described herein, the device should have sufficient computing power and system memory to enable basic computing operations. Specifically, as shown in FIG. 6 , computing power is typically represented by one or more processing units 12, and may also include one or more GPUs 14, either or both of which are associated with the system The memory 16 communicates. Note that the processing unit 12 of a general-purpose computing device may be a special-purpose microprocessor, such as a DSP, VLIW, or other microcontroller, or may be a conventional CPU with one or more processing cores, including GPU-based special-purpose CPUs in multi-core CPUs. nuclear.

另外，图6的简化计算设备10还可包括其他组件，诸如通信接口18等。图6的简化计算设备10还可包括一个或多个常规计算机输入设备20(诸如，指示笔、定点设备、键盘、音频输入设备、视频输入设备、触觉输入设备、用于接收有线或无线数据传输的设备等)。图6的简化计算设备10还可包括其他可任选组件，诸如例如一个或多个常规计算机输出设备22(例如，显示设备24、音频输出设备、视频输出设备、用于传送有线或无线数据传输的设备等)。注意，通用计算机的典型的通信接口18、输入设备20、输出设备22、以及存储设备26对本领域技术人员而言是公知的，并且在此不会详细描述。Additionally, the simplified computing device 10 of FIG. 6 may also include other components, such as a communication interface 18 and the like. The simplified computing device 10 of FIG. 6 may also include one or more conventional computer input devices 20, such as a stylus, pointing device, keyboard, audio input device, video input device, tactile input device, for receiving wired or wireless data transmissions. equipment, etc.). The simplified computing device 10 of FIG. 6 may also include other optional components, such as, for example, one or more conventional computer output devices 22 (e.g., a display device 24, an audio output device, a video output device, for wired or wireless data transmission) equipment, etc.). Note that typical communication interfaces 18, input devices 20, output devices 22, and storage devices 26 of a general purpose computer are well known to those skilled in the art and will not be described in detail here.

图6的简化计算设备10还可包括各种计算机可读介质。计算机可读介质可以是可由简化计算设备10经由存储设备26访问的任何可用介质，并且包括是可移动28和/或不可移动30的易失性和非易失性介质，用于存储诸如计算机可读或计算机可执行指令、数据结构、程序模块或其他数据等信息。作为示例而非限制，计算机可读介质可包括计算机存储介质和通信介质。计算机存储介质包括但不限于：计算机或机器可读介质或存储设备，诸如DVD、CD、软盘、磁带驱动器、硬盘驱动器、光盘驱动器、固态存储器设备、RAM、ROM、EEPROM、闪存或其他存储器技术、磁带盒、磁带、磁盘存储或其他磁存储设备、或可用于存储所需信息并且可由一个或多个计算设备访问的任何其他设备。The simplified computing device 10 of FIG. 6 may also include various computer-readable media. Computer-readable media can be any available media that can be accessed by simplified computing device 10 via storage device 26, and includes both volatile and non-volatile media that are removable 28 and/or non-removable 30 for storing information such as computer-readable Information such as readable or computer-executable instructions, data structures, program modules, or other data. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes, but is not limited to: computer or machine readable media or storage devices such as DVDs, CDs, floppy disks, tape drives, hard drives, optical drives, solid state memory devices, RAM, ROM, EEPROM, flash memory or other memory technologies, Magnetic tape cartridges, tapes, disk storage or other magnetic storage devices, or any other device that can be used to store desired information and can be accessed by one or more computing devices.

诸如计算机可读或计算机可执行指令、数据结构、程序模块等信息的保留还可通过使用各种上述通信介质中的任一种来编码一个或多个已调制数据信号或载波或其他传输机制或通信协议来实现，并且包括任何有线或无线信息传递机制。注意，术语“已调制数据信号”或“载波”一般指以对信号中的信息进行编码的方式设置或改变其一个或多个特征的信号。例如，通信介质包括诸如有线网络或直接线连接等携带一个或多个已调制数据信号的有线介质，以及诸如声学、RF、红外线、激光和其他无线介质等用于传送和/或接收一个或多个已调制数据信号或载波的无线介质。上述通信介质的任一组合也应包括在通信介质的范围之内。Information such as computer-readable or computer-executable instructions, data structures, program modules, etc. may also be retained by encoding one or more modulated data signals or carrier waves or other transport mechanisms using any of the various aforementioned communication media or communication protocol, and includes any wired or wireless information transfer mechanism. Note that the terms "modulated data signal" or "carrier carrier" generally refer to a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. For example, communication media includes wired media such as a wired network or direct-wire connection, which carry one or more modulated data signals, and wireless media such as acoustic, RF, infrared, laser and other wireless media for transmitting and/or receiving one or more A wireless medium with a modulated data signal or carrier wave. Any combination of the above communication media should also be included within the scope of communication media.

此外，可以按计算机可执行指令或其他数据结构的形式存储、接收、传送或者从计算机或机器可读介质或存储设备和通信介质的任何所需组合中读取具体化此处所描述的受控3D通信端点系统100和方法的各种实施方式中的部分或全部的软件、程序和/或计算机程序产品或其各部分。In addition, the controlled 3D software embodying the controls described herein may be stored, received, transmitted, or read from computer- or machine-readable media or any desired combination of storage devices and communication media in the form of computer-executable instructions or other data structures. Some or all of the software, programs and/or computer program products or portions thereof in various embodiments of the communication endpoint system 100 and method.

最终，此处所描述的受控3D通信端点系统100和方法的各实施例还可在由计算设备执行的诸如程序模块等计算机可执行指令的一般上下文中描述。一般而言，程序模块包括执行特定任务或实现特定抽象数据类型的例程、程序、对象、组件、数据结构等。本文描述的各实施例还可以在其中任务由通过一个或多个通信网络链接的一个或多个远程处理设备执行或者在该一个或多个设备的云中执行的分布式计算环境中实现。在分布式计算环境中，程序模块可以位于包括媒体存储设备在内的本地和远程计算机存储介质中。此外，上述指令可以部分地或整体地作为可以包括或不包括处理器的硬件逻辑电路来实现。Finally, embodiments of the controlled 3D communication endpoint system 100 and methods described herein may also be described in the general context of computer-executable instructions, such as program modules, being executed by a computing device. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Embodiments described herein can also be practiced in distributed computing environments where tasks are performed by one or more remote processing devices that are linked through one or more communications networks, or in a cloud of one or more devices. In a distributed computing environment, program modules may be located in both local and remote computer storage media including media storage devices. Furthermore, the above-described instructions may be implemented in part or in whole as hardware logic circuits that may or may not include a processor.

IV.操作概览IV. OPERATIONAL OVERVIEW

图7是示出图1所示的受控3D通信端点系统100的总体操作的流程图。如图7所示，系统100的操作通过捕捉本地端点处的本地参与者的3D视频来开始(框700)。作为示例，本地端点可以是办公建筑中的房间。捕捉到的视频是使用捕捉RGB数据和深度信息两者的多个相机群来获得的(框705)。该多个相机群被定位成绕本地参与者360度。换言之，捕捉到的视频包含一直围绕本地参与者的视图。FIG. 7 is a flowchart showing the overall operation of the controlled 3D communication endpoint system 100 shown in FIG. 1 . As shown in FIG. 7, operation of the system 100 begins by capturing 3D video of a local participant at a local endpoint (block 700). As an example, a local endpoint may be a room in an office building. Captured video is obtained using multiple camera clusters that capture both RGB data and depth information (block 705). The plurality of camera clusters are positioned 360 degrees around the local participant. In other words, the captured video contains views all the way around the local participant.

该方法的各实施例随后使用捕捉到的3D视频来创建本地参与者的本地几何代理(框710)。接着，该方法生成与亲临通信相一致的场景几何(框715)。一般概念是创建模仿亲临通信的动态性的虚拟环境。该方法随后将本地几何代理置于场景几何中以创建虚拟环境(框720)。本地几何代理和场景几何被传送给远程端点处的远程参与者(框725)。Embodiments of the method then use the captured 3D video to create a local geometric proxy of the local participant (block 710). Next, the method generates scene geometry consistent with in-person communications (block 715). The general concept is to create a virtual environment that mimics the dynamics of in-person communication. The method then places native geometry proxies into the scene geometry to create the virtual environment (block 720). The local geometry proxy and scene geometry are communicated to the remote participant at the remote endpoint (block 725).

类似地，使用多个相机群来捕捉参与在线会议或会面的远程参与者和任何其他参与者且为他们中的每一个创建几何代理。这些几何代理中的每一个被渲染并置于虚拟环境的场景几何中。这些经渲染的几何代理和场景几何随后被传送给其他参与者。Similarly, multiple camera fleets are used to capture remote participants and any other participants participating in an online conference or meeting and create geometric proxies for each of them. Each of these geometric proxies is rendered and placed within the scene geometry of the virtual environment. These rendered geometric proxies and scene geometry are then communicated to other participants.

接收到的虚拟环境在端点中的占据绕远程参与者至少180度的空间的显示设备上被显示给观看者(如远程参与者)(框730)。这向远程参与者提供了进入虚拟环境的虚拟视点。如下文详细解释的，在观看者看虚拟视点时看到的事物部分依赖于观看者的头部的位置和定向。The received virtual environment is displayed to a viewer (eg, the remote participant) on a display device in the endpoint occupying a space of at least 180 degrees around the remote participant (block 730). This provides remote participants with a virtual viewpoint into the virtual environment. As explained in detail below, what the viewer sees when looking at the virtual viewpoint depends in part on the position and orientation of the viewer's head.

该方法的各实施例定义虚拟环境内的虚拟桌。所渲染的参与者中的每一个随后被围绕虚拟环境中的虚拟桌放置。在一些实施例中，虚拟桌具有带第一直径的圆形(框735)。这允许容易地发生伸缩。具体而言，虚拟环境可通过将参与者的数量增加到超过当前两个参与者(本地参与者和远程参与者)来扩展(框740)。为了容纳参与者的这一增加，该方法随后将虚拟桌的大小从第一直径增加到第二直径，其中第二直径大于第一直径(框745)。参与者的几何代理被放置在具有增加的大小的虚拟桌处，使得远程参与者可以看到虚拟环境中虚拟桌处的每一参与者(框750)。Embodiments of the method define virtual tables within a virtual environment. Each of the rendered participants is then placed around a virtual table in the virtual environment. In some embodiments, the virtual table has a circular shape with a first diameter (block 735). This allows scaling to occur easily. Specifically, the virtual environment may be expanded by increasing the number of participants beyond the current two participants (local participant and remote participant) (block 740). To accommodate this increase in participants, the method then increases the size of the virtual table from the first diameter to a second diameter, wherein the second diameter is larger than the first diameter (block 745). The participants' geometric proxies are placed at the virtual table with an increased size so that remote participants can see each participant at the virtual table in the virtual environment (block 750).

系统100和方法的各实施例包括3D通信处理系统105。图8是示出图1所示的3D通信处理系统105的总体操作的流程图。如图8所示，3D通信处理系统105的操作通过捕捉在线会议或会面中的各参与者中的每一者的图像来开始(框800)。参与者中的至少一者是远程参与者，这意味着该远程参与者与其他参与者不处于同一物理位置或端点。对每一参与者的捕捉是通过使用相机群来实现的。Various embodiments of the system 100 and method include a 3D communication processing system 105 . FIG. 8 is a flowchart showing the overall operation of the 3D communication processing system 105 shown in FIG. 1 . As shown in FIG. 8 , operation of the 3D communication processing system 105 begins by capturing images of each of the participants in an online meeting or meeting (block 800 ). At least one of the participants is a remote participant, meaning that the remote participant is not at the same physical location or endpoint as the other participants. The capture of each participant is achieved using a camera cluster.

接着，该方法的各实施例使用来自捕捉到的图像的数据来为每一参与者创建几何代理(框810)。参与者的数量随后被确定(框820)。这一确定可以脱序执行，使得参与者的数量被预先确定或已知。该方法的各实施例随后基于在线会议中参与者的数量来生成场景几何(框830)。这一场景几何生成帮助模拟与远程参与者进行亲临对话或会议的体验。Embodiments of the method then use data from the captured images to create geometric proxies for each participant (block 810). The number of participants is then determined (block 820). This determination can be performed out of sequence such that the number of participants is predetermined or known. Embodiments of the method then generate scene geometry based on the number of participants in the online meeting (block 830). This scene geometry generation helps simulate the experience of having an in-person conversation or meeting with remote participants.

特定参与者的每一几何代理随后被渲染给场景几何内的其他参与者的其他几何代理(框840)。这一渲染被执行，使得几何代理被按与亲临对话一致的方式来安排。这些经渲染的几何代理和场景几何随后被传送给各参与者(框850)。变化的虚拟视点被显示给参与者中的每一个，使得虚拟视点依赖于观看者的脸部的位置和定向(框860)。出于附加的真实性，运动视差和深度被添加，以增强参与者的观看体验(框870)。如下文详细解释的，运动视差和深度依赖于观看者相对于该观看者在其上观看会议或会面的显示设备或监视器的眼睛注视。Each geometric proxy for a particular participant is then rendered to other geometric proxies of other participants within the scene geometry (block 840). This rendering is performed such that the geometry agents are arranged in a manner consistent with the presence dialog. These rendered geometry proxies and scene geometry are then communicated to the participants (block 850). A varying virtual viewpoint is displayed to each of the participants such that the virtual viewpoint is dependent on the position and orientation of the viewer's face (block 860). For additional realism, motion parallax and depth are added to enhance the participant's viewing experience (block 870). As explained in detail below, motion parallax and depth are dependent on the eye gaze of the viewer relative to the display device or monitor on which the viewer is viewing the meeting or meeting.

V.操作细节V. Operating Details

现在将讨论受控3D通信端点系统100和方法的各实施例的操作细节。这包括系统100的可伸缩性、几何代理创建、以及场景几何的创建的细节。此外，还讨论了虚拟相机的概念、将运动视差和深度添加到几何代理和场景几何、以及处理同一环境中的一个以上参与者并观看同一显示设备或监视器。Operational details of various embodiments of the controlled 3D communication endpoint system 100 and method will now be discussed. This includes details of scalability of the system 100, geometry proxy creation, and creation of scene geometry. Additionally, the concept of virtual cameras, adding motion parallax and depth to geometric proxies and scene geometry, and handling more than one actor in the same environment and viewing the same display device or monitor are discussed.

V.A.可伸缩性V.A. Scalability

受控3D通信端点系统100和方法的各实施例是可伸缩的。这意味着每当附加端点被添加到在线会议时，系统100和方法的各实施例可容易地扩展以容纳该附加端点。图9示出扩展系统100和方法的各实施例以容纳附加端点的示例性实施例。Embodiments of the controlled 3D communication endpoint system 100 and method are scalable. This means that embodiments of the system 100 and method can be easily extended to accommodate additional endpoints whenever they are added to the online meeting. FIG. 9 illustrates an exemplary embodiment of extending various embodiments of the system 100 and method to accommodate additional endpoints.

因为至少180度的显示设备配置，可伸缩性被增强。例如，如果单个平面屏幕在墙上且存在两个端点，每一端点具有一参与者，则这两个参与者可被置于虚拟环境中的圆桌处。每一参与者将能够看到另一参与者。如果这被扩展且如果10个端点处的10个参与者尝试加入在线会议，则观看者可从他跨桌看到人，但每一其他人将迷失在人群中。然而，使用至少180度显示设备配置，只要屏幕上的参与者在虚拟环境中围成一圈，则该圈可被做得如需要的那样大，并且观看者将仍然能够看到每一参与者。Scalability is enhanced because of a display device configuration of at least 180 degrees. For example, if a single flat screen is on a wall and there are two endpoints, each with a participant, the two participants can be placed at a round table in the virtual environment. Each participant will be able to see the other participant. If this is extended and if 10 participants at 10 endpoints try to join an online meeting, the viewer can see people from him across the table, but every other person will get lost in the crowd. However, with at least a 180-degree display configuration, as long as the on-screen participants form a circle in the virtual environment, the circle can be made as large as needed and the viewer will still be able to see each participant .

当然，这意味着越多参与者被添加，虚拟桌就必须越大。在某一时刻，参与者的数量变得如此之大，以致于该桌的最远端的参与者太小，使得观看者不能识别他们。此外，尽管虚拟桌不需要是圆形的，但使用其他形状会存在遮挡且人们开始彼此阻挡。Of course, this means that the more participants are added, the larger the virtual table must be. At some point, the number of participants becomes so large that the farthest participants at the table are too small for the viewer to identify them. Also, while the virtual table doesn't need to be round, with other shapes there is occlusion and people start blocking each other.

如图9所示，虚拟环境900示出了系统100和方法的各实施例如何将参与者几何代理相对于彼此进行安排。在图9的左侧，三个参与者905、906、907 被安排成围绕第一虚拟圆桌910。在这一虚拟环境中，参与者905、906、907中的每一个通过虚拟窗口观看在线会议。具体而言，虚拟窗口920、925、930被分别定位在三个参与者905、906、907中的每一个的前方。这些虚拟窗口920、925、930向三个参与者905、906、907给出了绕第一虚拟圆桌910的虚拟视点。这允许每一参与者感觉好像他实际上与其他参与者一起在房间中。As shown in FIG. 9, a virtual environment 900 illustrates how various embodiments of the system 100 and method arrange participant geometric agents relative to each other. On the left side of FIG. 9 , three participants 905 , 906 , 907 are arranged around a first virtual round table 910 . In this virtual environment, each of the participants 905, 906, 907 views the online meeting through a virtual window. Specifically, virtual windows 920, 925, 930 are positioned in front of each of the three participants 905, 906, 907, respectively. These virtual windows 920 , 925 , 930 give the three participants 905 , 906 , 907 virtual viewpoints around the first virtual round table 910 . This allows each participant to feel as if he is actually in the room with the other participants.

箭头935指示附加端点已被添加到虚拟环境900。在添加了附加参与者的情况下，第一虚拟圆桌910已被扩张成第二虚拟圆桌940。八个参与者950、951、952、953、954、955、956、957被安排成围绕第二虚拟圆桌940。此外，多个虚拟窗口960被定位在八个参与者950、951、952、953、954、955、956、957中的每一个的前方。多个虚拟窗口960中的每一个向参与者950、951、952、953、954、955、956、957给出绕第二虚拟圆桌940的虚拟视点。这向每一参与者给出参与者中的每一个一起在一个大虚拟房间中的假象。Arrow 935 indicates that additional endpoints have been added to virtual environment 900 . With the addition of additional participants, the first virtual round table 910 has been expanded into a second virtual round table 940 . Eight participants 950 , 951 , 952 , 953 , 954 , 955 , 956 , 957 are arranged around the second virtual round table 940 . Additionally, a plurality of virtual windows 960 are positioned in front of each of the eight participants 950 , 951 , 952 , 953 , 954 , 955 , 956 , 957 . Each of the plurality of virtual windows 960 gives participants 950 , 951 , 952 , 953 , 954 , 955 , 956 , 957 a virtual viewpoint around the second virtual round table 940 . This gives each participant the illusion that each of the participants is together in one large virtual room.

V.B.几何代理创建V.B. Geometry Proxy Creation

捕捉和创建组件200的另一部分是几何代理创建模块245。模块245为会议或会面中的各参与者中的每一个创建几何代理。根据相机群300捕捉到的距离数据来计算深度信息。一旦获得了深度信息，就根据捕捉到的深度信息中包含的深度点来创建稀疏点云。随后使用已知方法和捕捉到的深度信息来生成密集深度点云。在一些实施例中，根据密集点云来构建网格并根据该网格生成几何代理。在替换实施例中，密集点云被纹理化以生成几何代理。Another part of the capture and creation component 200 is the geometry proxy creation module 245 . Module 245 creates geometric proxies for each of the participants in the conference or meeting. Depth information is calculated from distance data captured by the camera group 300 . Once the depth information is obtained, a sparse point cloud is created from the depth points contained in the captured depth information. A dense depth point cloud is then generated using known methods and captured depth information. In some embodiments, a mesh is constructed from the dense point cloud and a geometric proxy is generated from the mesh. In an alternative embodiment, dense point clouds are textured to generate geometric proxies.

图10示出创建单个会议参与者的几何代理的示例性概览。如图10所示，从相机群300的RGB相机捕捉RGB数据1000。另外，根据相机群300获得的深度数据来计算深度信息1010。RGB数据1000和深度信息1010被加在一起以创建单个会议参与者的几何代理250。这一几何代理创建是针对参与者中的每一个来执行的，使得每一参与者具有对应的几何代理。Figure 10 shows an exemplary overview of creating a geometric proxy for a single conference participant. As shown in FIG. 10 , RGB data 1000 is captured from RGB cameras of camera group 300 . In addition, depth information 1010 is calculated from depth data obtained by the camera group 300 . The RGB data 1000 and depth information 1010 are added together to create a geometric proxy 250 of an individual conference participant. This geometric proxy creation is performed for each of the participants such that each participant has a corresponding geometric proxy.

V.C. 3D体积的配准以及3D空间的对齐V.C. Registration of 3D volumes and alignment in 3D space

受控3D通信端点系统100和方法的各实施例的第二组件是场景几何组件210。这包括相机群300捕捉的3D体积的配准和3D空间的对齐两者。场景几何组件210的一般概念是创建会议参与者之间的相对几何。需要将场景确切地对齐，如同参与者在同一物理位置且参与亲临通信一样。A second component of embodiments of the controlled 3D communication endpoint system 100 and method is the scene geometry component 210 . This includes both the registration of the 3D volume captured by the camera cluster 300 and the alignment of the 3D space. The general concept of the scene geometry component 210 is to create relative geometry between conference participants. The scenes need to be aligned exactly as if the participants were in the same physical location and participating in the in-person communication.

系统100和方法的各实施例创建作为锚定在端点(或捕捉环境)处的3D场景的场景几何。为了实现这一点，具有对包含参与者中的每一个的环境的精确估计是合乎需要的。一旦获得了这一点，则系统100和方法的各实施例计算显示设备(或监视器)与相机的精确配准。这产生了与真实世界对齐的虚拟空间中的定向。换言之，虚拟空间与真实空间对齐。这一配准和对齐是使用已知方法来实现的。在系统100和方法的优选实施例中，在制造时执行校准。在其他实施例中，使用环境中的参考对象来执行校准。Embodiments of the system 100 and method create scene geometry that is a 3D scene anchored at endpoints (or capture environments). In order to achieve this, it is desirable to have an accurate estimate of the environment containing each of the participants. Once this is obtained, various embodiments of the system 100 and method calculate an exact registration of the display device (or monitor) and the camera. This produces an orientation in virtual space that aligns with the real world. In other words, the virtual space is aligned with the real space. This registration and alignment is achieved using known methods. In a preferred embodiment of the system 100 and method, calibration is performed at the time of manufacture. In other embodiments, calibration is performed using reference objects in the environment.

场景几何寻求创建本地参与者与远程参与者之间的相对几何。这包括创建眼睛注视和对话几何，如同参与者处于亲临会议中一样。使眼睛注视和对话几何正确的一种方式是具有参与者之间的相对、一致的几何。在一些实施例中，这通过使用虚拟框来实现。具体而言，如果在参与者一起处于房间中时，围绕真实空间中的参与者来绘制框，则这些虚拟框按照虚拟布局被重新创建来创建场景几何。几何的形状不像它在参与者之间的一致性那样要紧。Scene geometry seeks to create the relative geometry between local and remote actors. This includes creating eye gaze and conversation geometry as if participants were in an in-person meeting. One way to get eye gaze and dialogue geometry correct is to have relative, consistent geometry between the participants. In some embodiments, this is accomplished through the use of virtual boxes. Specifically, if boxes are drawn around participants in real space while the participants are in a room together, these virtual boxes are recreated in a virtual layout to create the scene geometry. The shape of the geometry matters less than its consistency among participants.

某些输入形状因子(像单个监视器或多个监视器)将影响最优布局以及解决方案的可伸缩性。场景几何还依赖于参与者的数量。具有两个参与者(本地参与者和远程参与者)的会议是与存在三个或更多个参与者的情况下的场景几何不同的一对一(1:1)场景几何。此外，如将从以下示例看到的，场景几何包括参与者之间的眼睛注视。Certain input form factors like single monitor or multiple monitors will affect the optimal layout and scalability of the solution. Scene geometry also depends on the number of participants. A conference with two participants (a local participant and a remote participant) is a different one-to-one (1:1) scene geometry than where there are three or more participants. Furthermore, as will be seen from the examples below, the scene geometry includes eye gaze between participants.

图11示出当在线会议中存在两个参与者(在两个不同的端点处)时参与者之间的场景几何的示例性实施例。如图11所示，1:1会议的这一场景几何1100包括第三参与者1110以及第四参与者1120。这些参与者不在同一物理位置。换言之，他们在不同的端点处。Figure 11 shows an exemplary embodiment of scene geometry between participants when there are two participants (at two different endpoints) in an online meeting. As shown in FIG. 11 , this scene geometry 1100 for a 1:1 conference includes a third participant 1110 and a fourth participant 1120 . These participants are not in the same physical location. In other words, they are at different endpoints.

在1:1会议的这一场景几何1100中，该几何包括占据参与者1100、1120的相应显示设备或监视器(未示出)的前方空间的两个框。第一虚拟框1130被绘制成围绕第三参与者1110且第二虚拟框1140被绘制成围绕第四参与者1120。假定同样大小的监视器和一致的设置允许系统100和方法的各实施例知晓场景几何是正确的，而无需对捕捉到的数据的任何操纵。In this scene geometry 1100 for a 1:1 conference, the geometry includes two boxes occupying the space in front of the participants' 1100, 1120 respective display devices or monitors (not shown). A first virtual frame 1130 is drawn around the third participant 1110 and a second virtual frame 1140 is drawn around the fourth participant 1120 . Assuming equally sized monitors and consistent settings allows embodiments of the system 100 and method to know that the scene geometry is correct without requiring any manipulation of the captured data.

在系统100和方法的替换实施例中，存在多个远程参与者，且几何与1:1会议的场景几何1100不同。图12示出当在线会议中存在处于三个不同端点处的三个参与者时参与者之间的场景几何的示例性实施例。这是3端点会议的场景几何1200。如上所述，端点是包含会议或会面的参与者的环境。在3端点会议中，存在处于三个不同物理位置的参与者。In an alternative embodiment of the system 100 and method, there are multiple remote participants and the geometry is different from the scene geometry 1100 of a 1:1 conference. Figure 12 shows an exemplary embodiment of scene geometry between participants when there are three participants at three different endpoints in an online meeting. This is the scene geometry 1200 for the 3 endpoint session. As mentioned above, an endpoint is the environment containing the participants of a meeting or meeting. In a 3-endpoint conference, there are participants at three different physical locations.

在图12中，3端点会议的场景几何1200包括围绕虚拟圆桌1235的参与者 #11210、参与者#21220、以及参与者#31230。虚拟框#11240被绘制成围绕参与者#11210，虚拟框#21250被绘制成围绕参与者#21220，且虚拟框#31260被绘制成围绕参与者#31230。虚拟框1240、1250、1260中的每一个被放置成以等距的方式围绕虚拟圆桌1235。这创建了3端点会议的场景几何1200。注意，这一场景几何可针对附加端点来被扩展，如以上相关于可伸缩性来讨论的。In Figure 12, the scene geometry 1200 for a 3-endpoint conference includes Participant #1 1210, Participant #2 1220, and Participant #3 1230 surrounding a virtual round table 1235. Virtual frame #1 1240 is drawn around participant #1 1210, virtual frame #2 1250 is drawn around participant #2 1220, and virtual frame #3 1260 is drawn around participant #3 1230. Each of the virtual boxes 1240 , 1250 , 1260 is placed to surround the virtual round table 1235 in an equidistant manner. This creates a scene geometry of 1200 for the 3-endpoint meeting. Note that this scene geometry can be extended for additional endpoints, as discussed above with respect to scalability.

V.D.虚拟相机V.D. Virtual Camera

场景几何组件210还包括虚拟相机。虚拟相机定义透视投影，根据该透视投影，将渲染3D几何代理的新颖视图。这允许系统100和方法的各实施例获得人们之间的自然眼睛注视以及联系。当前视频会议中的一个故障因人们没有看向相机所处的位置而发生，使得该会议中的远程参与者感觉如同另一人没有看他们。这是不自然的并且通常不会发生在亲临对话中。The scene geometry component 210 also includes a virtual camera. The virtual camera defines a perspective projection from which a novel view of the 3D geometric proxy will be rendered. This allows embodiments of the system 100 and method to achieve natural eye gaze and connection between people. A glitch in current video conferencing occurs when people are not looking where the camera is located, making the remote participant in the conference feel as if the other person is not looking at them. This is unnatural and doesn't usually happen in an in-person conversation.

系统100和方法的各实施例中的虚拟相机是使用来自场景几何的虚拟空间和每一参与者的3D几何代理(具有详细纹理信息)创建的。这一虚拟相机没有绑定到被用于捕捉图像的真实相机群的位置。此外，系统100和方法的一些实施例使用脸部跟踪(包括眼睛注视跟踪)来确定各参与者位于何处以及他们在他们的虚拟空间中看向何处。这允许基于参与者在看向场景中的何处来创建虚拟相机。这用于准确地传达参与者对其他参与者的正确注视并向他们提供正确视图。因而，虚拟相机促进了会议参与者之间的交互的自然眼睛注视和对话几何。The virtual cameras in embodiments of the system 100 and method are created using the virtual space from the scene geometry and each participant's 3D geometry proxy (with detailed texture information). This virtual camera is not bound to the location of the real camera cluster that was used to capture the image. Additionally, some embodiments of the system 100 and method use face tracking (including eye gaze tracking) to determine where various participants are located and where they are looking in their virtual space. This allows virtual cameras to be created based on where in the scene the participant is looking. This is used to accurately communicate the participant's correct gaze to other participants and provide them with the correct view. Thus, the virtual camera facilitates natural eye gaze and conversational geometry of interactions between meeting participants.

这些虚拟相机是通过创建场景几何并在该几何中放入额外的人或物来创建的。根据相机群获得的多个视角，虚拟相机能够在场景几何中四处移动。例如，如果头部被认为是气球，则该气球的前方将被该气球前方的相机群捕捉，且该气球的一侧将被该气球的该侧上的相机群捕捉。通过合成来自这两个相机群的图像，虚拟相机可被创建在正前方和该侧之间的任何位置。换言之，虚拟相机视图被创建成来自覆盖特定空间的不同相机的图像的合成。These virtual cameras are created by creating scene geometry and placing additional people or things in that geometry. The virtual camera is able to move around the scene geometry based on the multiple viewpoints obtained by the camera swarm. For example, if the head is considered to be a balloon, the front of the balloon will be captured by the cluster of cameras in front of the balloon, and one side of the balloon will be captured by the cluster of cameras on that side of the balloon. By compositing images from these two camera clusters, a virtual camera can be created anywhere between directly in front and to the side. In other words, a virtual camera view is created as a composite of images from different cameras covering a particular space.

图13示出基于参与者正在看的位置的虚拟相机的示例性实施例。这也可被认为是使用虚拟注视来获得自然眼睛注视。如图13所示，监视器400将远程参与者410显示给本地参与者1300。监视器400包括四个相机群300。虚拟眼睛注视框1310被绘制成围绕远程参与者1320的眼睛和本地参与者1330的眼睛。虚拟眼睛注视框1310是水平的，使得在虚拟空间中，远程参与者1320的眼睛和本地参与者1330的眼睛正在互相看。Figure 13 illustrates an exemplary embodiment of a virtual camera based on where a participant is looking. This can also be thought of as using virtual gaze to obtain natural eye gaze. As shown in FIG. 13 , monitor 400 displays remote participant 410 to local participant 1300 . The monitor 400 includes four camera groups 300 . A virtual eye gaze box 1310 is drawn around the remote participant's 1320 eyes and the local participant's 1330 eyes. The virtual eye gaze frame 1310 is horizontal so that in the virtual space, the eyes of the remote participant 1320 and the eyes of the local participant 1330 are looking at each other.

虚拟相机的一些实施例使用脸部跟踪来提高性能。脸部跟踪帮助系统100和方法的各实施例改变视角，使得各参与者面向彼此。脸部跟踪帮助虚拟相机与观看者的眼睛注视保持水平。这模仿人眼在亲临对话期间的工作方式。虚拟相机与脸部跟踪交互，以创建用户直接看向另一参与者的情况下的虚拟视点。换言之，脸部跟踪被用来改变虚拟相机的虚拟视点。Some embodiments of the virtual camera use face tracking to improve performance. Face tracking helps embodiments of the system 100 and method change the perspective so that the participants face each other. Face tracking helps the virtual camera stay level with the viewer's eye gaze. This mimics how the human eye works during an in-person conversation. The virtual camera interacts with face tracking to create a virtual point of view where the user is looking directly at another participant. In other words, face tracking is used to change the virtual viewpoint of the virtual camera.

V.E.通过运动视差的深度V.E. Depth via Motion Parallax

系统100和方法的第三组件是虚拟视点组件220。一旦经渲染的几何代理和场景几何被传送给各参与者，它被渲染在参与者的监视器上。为了增加显示在监视器上的场景的真实性，使用运动视差的深度被添加，以提供在观看某事物的某人的位置改变时发生的、视图中的有细微差别的改变。A third component of the system 100 and method is the virtual viewpoint component 220 . Once the rendered geometry proxy and scene geometry is delivered to each participant, it is rendered on the participant's monitor. To increase the realism of the scene displayed on the monitor, depth is added using motion parallax to provide the nuanced changes in view that occur when the position of someone viewing something changes.

运动视差是使用在观看者的头部移动时使相机视图改变的高速头部跟踪来添加的。这创建了深度的假象。图14示出通过基于观看者所面向的位置的运动视差来提供深度的示例性实施例。如图14所示，具有四个相机群300的监视器400显示远程参与者410的图像。注意到，在图14中，远程参与者410被示为虚线人物1400和实现人物1410。虚线人物1410示出了远程参与者410正看向其左侧并且因而具有包括虚线参与者1430的第一视野1420。实线人物1410示出了远程参与者410正看向其右侧并且因而具有包括实线参与者1450的第二视野1440。Motion parallax is added using high speed head tracking that causes the camera view to change as the viewer's head moves. This creates the illusion of depth. FIG. 14 illustrates an exemplary embodiment in which depth is provided by motion parallax based on where the viewer is facing. As shown in FIG. 14 , a monitor 400 with four camera clusters 300 displays an image of a remote participant 410 . Note that in FIG. 14 , remote participants 410 are shown as dashed character 1400 and implementing character 1410 . Dashed figure 1410 shows remote participant 410 looking to his left and thus has a first field of view 1420 that includes dashed participant 1430 . Solid line figure 1410 shows that remote participant 410 is looking to its right and thus has a second field of view 1440 that includes solid line participant 1450 .

在远程参与者410的视点从一侧移至另一侧时，他对其他空间的视角变化。这向远程参与者410给出了其他参与者和其他参与者所处的房间(或环境)的不同视图。因而，如果远程参与者向左、右、上、或下移动，则他将看到该远程参与者410正与其交互的参与者的稍微不同的视图，并且该人之后的背景也改变。这向场景给出了深度的感觉，并且向场景中的人给出了在他们亲身与某人交谈时得到的体积的感觉。远程参与者的视点是使用头部跟踪或低等待时间脸部跟踪技术来跟踪的。通过运动视差的深度动态地增强了体积感，同时提供移动的完全自由度，因为观看者不被锁定到一个相机视角。As the viewpoint of the remote participant 410 moves from side to side, his perspective of other spaces changes. This gives the remote participant 410 a different view of the other participants and the room (or environment) in which the other participants are located. Thus, if the remote participant moves left, right, up, or down, he will see a slightly different view of the participant the remote participant 410 is interacting with, and the background behind that person also changes. This gives the scene a sense of depth, and the people in the scene the sense of volume they get when they talk to someone in person. The viewpoint of the remote participant is tracked using head tracking or low latency face tracking techniques. The sense of volume is dynamically enhanced through the depth of motion parallax, while providing complete freedom of movement, as the viewer is not locked to one camera perspective.

V.F.单个端点处的多个参与者V.F. Multiple Participants at a Single Endpoint

系统100和方法的各实施例还包括一端点处存在一个以上参与者的情况。用于通过运动视差的深度的以上技术对单个观看者而言由于跟踪该观看者以及基于他们的观看角度和位置在监视器上提供适当视图的能力而工作良好。然而，在同一端点处存在第二人并观看同一监视器的情况下，这不能工作，因为监视器一次只能提供一个场景并且它将锁定到一个人。这使得视图脱离了没有被跟踪的另一观看者。Embodiments of the system 100 and method also include situations where there is more than one participant at an endpoint. The above technique for depth through motion parallax works well for a single viewer due to the ability to track that viewer and provide the appropriate view on the monitor based on their viewing angle and position. However, where there is a second person at the same endpoint and looking at the same monitor, this will not work because the monitor can only provide one scene at a time and it will lock to one person. This takes the view away from another viewer who is not being tracked.

系统100和方法的各实施例有若干方式来解决这一问题。在一些实施例中，使用向不同观看者提供不同图像的监视器。在这些实施例中，脸部跟踪技术跟踪两个不同的脸并随后向不同的观看者提供不同视图。在其他实施例中，运动视差被移除且固定虚拟相机被锁定在监视器的中心。在一个以上参与者处于一端点处时，这创建了未达标准的体验。在又一些其他实施例中，在端点处的多个参与者中的每一个都佩戴眼镜。每一副眼镜被用来提供不同视图。在又一些其他实施例中，眼镜上具有向每一佩戴者示出与监视器不同的帧的快门。监视器所显示的交替帧被调谐到每一副眼镜，并且基于每一观看者的位置向该观看者提供正确图像。Embodiments of the system 100 and method have several ways to address this issue. In some embodiments, monitors are used that provide different images to different viewers. In these embodiments, the face tracking technology tracks two different faces and then provides different views to different viewers. In other embodiments, motion parallax is removed and a fixed virtual camera is locked to the center of the monitor. This creates a substandard experience when more than one participant is at an endpoint. In still other embodiments, each of the plurality of participants at the endpoint wears glasses. Each pair of glasses is used to provide a different view. In still other embodiments, the glasses have shutters on them that show each wearer a different frame than the monitor. Alternate frames displayed by the monitor are tuned to each pair of glasses and provide the correct image to each viewer based on that viewer's position.

另一实施例使用具有多个观看角度的监视器。图15示出使用具有多个观看角度的监视器处理单个端点处的多个参与者的技术的示例性实施例。这向监视器前方的每一观看者提供了远程参与者410以及远程参与者410之后的房间的不同视图。Another embodiment uses a monitor with multiple viewing angles. 15 illustrates an exemplary embodiment of a technique for handling multiple participants at a single endpoint using monitors with multiple viewing angles. This provides each viewer in front of the monitor with a different view of the remote participant 410 and the room behind the remote participant 410 .

如图15所示，具有透镜显示器(这允许多个观看角度)且具有四个相机群300的监视器1500正在显示远程参与者410。第一观看者1510正在从监视器1500的左侧看监视器1500。第一观看者1520的眼睛正在从左侧看监视器1500且具有监视器1500的左视野1530。第二观看者1540正在从监视器1500的右侧看监视器1500。第二观看者1550的眼睛正在从右侧看监视器1500且具有右视野1560。因为监视器1500上的透镜显示，左视野1530和右视野1560是不同的。换言之，向第一观看者1510和第二观看者1540提供了远程参与者410以及远程参与者410之后的房间的不同视图。因而，即使第一观看者1510和第二观看者1541并排，他们也会基于他们的视点而在监视器1500上看到不同事物。As shown in FIG. 15 , a monitor 1500 with a lenticular display (which allows for multiple viewing angles) and with four camera clusters 300 is displaying a remote participant 410 . A first viewer 1510 is looking at the monitor 1500 from the left side of the monitor 1500 . The eyes of the first viewer 1520 are looking at the monitor 1500 from the left side and have a left field of view 1530 of the monitor 1500 . A second viewer 1540 is looking at the monitor 1500 from the right side of the monitor 1500 . The eyes of the second viewer 1550 are looking at the monitor 1500 from the right side and have a right field of view 1560 . Because of the lens display on the monitor 1500, the left field of view 1530 and the right field of view 1560 are different. In other words, the first viewer 1510 and the second viewer 1540 are provided with different views of the remote participant 410 and the room behind the remote participant 410 . Thus, even though the first viewer 1510 and the second viewer 1541 are side by side, they will see different things on the monitor 1500 based on their point of view.

此外，尽管用专门描述结构特征和/或方法动作的语言描述了主题，但是应当理解，在后附权利要求书中限定的主题并不一定局限于上述特定的特征或动作。相反，上述具体特征和动作是作为实现权利要求的示例形式公开的。Furthermore, although the subject matter has been described in language specifically describing structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims

1. A method for simulating in-person communication, comprising:

capturing 3D video of the local participant at the local endpoint;

creating a local geometric proxy for the local participant using the captured 3D video, further comprising adding the captured RGB data and captured depth information from the 3D video to create the local geometric proxy;

Generate scene geometry and dialogue geometry with correct eye gaze consistent with in-person communications;

placing the native geometry proxy in the scene geometry to create a virtual environment; and

communicating the local geometry proxy and the scene geometry to a remote participant at a remote endpoint, wherein the local endpoint and the remote endpoint are at different physical locations to simulate a relationship between the local participant and the remote participant in-person communication.

2. The method of claim 1, further comprising capturing the 3D video at the local endpoint in a controlled manner using a plurality of camera clusters surrounding the local participant The latter captures both RGB data and depth information about the local participant in 360 degrees.

3. The method of claim 1, further comprising:

capturing three-dimensional video of the remote participant at the remote endpoint;

creating a remote geometry proxy for said remote participant;

placing the remote geometry proxy in the scene geometry and virtual environment; and

Both the local geometry proxy and the remote geometry proxy are rendered towards each other in the scene geometry and virtual environment.

4. The method of claim 3, further comprising communicating a rendered local geometry proxy, a rendered remote geometry proxy, and the scene geometry to the local endpoint and the remote endpoint.

5. The method of claim 1 , further comprising displaying the virtual environment to the remote participant at the remote endpoint on a display device occupying a space of at least 180 degrees around the remote participant. A participant to provide the remote participant with a virtual viewpoint of the virtual environment.

6. The method of claim 5, wherein the display devices comprise a first display device disposed in front of the remote participant, a second display device to one side of the first display device, and a third display device on the other side of the first display device.

7. The method of claim 6, further comprising:

positioning the second display device at right angles to the first display device; and

The third display device is positioned at right angles to the first display device.

8. The method of claim 6, further comprising:

positioning the second display device at a first angle less than 90 degrees to the first display device; and

The third display device is positioned at a second angle less than 90 degrees to the first display device.

9. The method of claim 8, further comprising setting the first angle and the second angle equal to each other.

10. The method of claim 3, further comprising:

defining virtual tables in said virtual environment; and

The local geometry agent and the remote geometry agent are placed around the virtual table to simulate the presence communication in the virtual environment.

11. The method of claim 10, further comprising defining the virtual table to have a circle with a first diameter.

12. The method of claim 11, further comprising:

Scale the virtual environment by increasing the number of participants from two to more than two participants;

increasing the size of the virtual table from a first diameter to a second diameter, wherein the second diameter is larger than the first diameter; and

A geometric proxy for each of the participants is placed at the virtual table.

13. The method of claim 5, further comprising adding depth to the virtual viewpoint using motion parallax.

14. The method of claim 13, further comprising:

tracking the remote participant's head; and

What is displayed to the remote participant through the virtual viewpoint is changed based on the position and orientation of the remote participant's head.

15. A controlled three-dimensional 3D endpoint system comprising:

a plurality of camera clusters arranged around a first endpoint to capture 3D video of a participant at said first endpoint such that 360 degrees around said participant is captured by said plurality of camera clusters ;

a geometric proxy of the participant obtained by adding captured RGB data and captured depth information from the 3D video;

Scene geometry and dialog geometry with correct eye gaze for creating virtual environments consistent with in-person communications; and

a display device configuration having a plurality of display devices located at a second end point such that the display devices are positioned at least 180 degrees around a viewer at the second end point to enable the viewer to The participant is viewed through a virtual viewpoint, wherein the viewer's perspective of the participant in the virtual environment varies based on the position and orientation of the viewer's head.

16. The controlled three-dimensional 3D endpoint system of claim 15, further comprising:

a virtual round table located in said virtual environment; and

A rendered geometric proxy of the participant placed around the virtual round table, along with other participants at other endpoints of the online meeting.

17. A method for scaling the number of participants in an online meeting comprising:

organizing a controlled capture environment at endpoints, the endpoints having a plurality of camera clusters disposed around each of the endpoints;

capturing three-dimensional video of each participant at each endpoint using the plurality of camera clusters;

creating a geometric proxy for each of the participants, further comprising adding captured RGB data and captured depth information from the 3D video to create the geometric proxy;

generating scene geometry including virtual tables based on the number of participants;

rendering each of the geometric agents toward each other in the scene geometry consistent with presence communication;

placing a rendered geometry proxy in the scene geometry around the virtual table to create a virtual environment;

organizing a controlled viewing environment at the endpoint having a display device that wraps at least 180 degrees around a participant at the endpoint;

displaying the virtual environment to the participants in the controlled viewing environment using the display device;

changing the virtual point of view of participants viewing the display device based on the position and orientation of each participant's head;

increase the number of participants such that additional participants are added; and

The size of the virtual table is increased to accommodate the additional participants.

18. The method of claim 17, further comprising defining the virtual table as a virtual round table having a diameter.

19. The method of claim 18, wherein increasing the size of the virtual table further comprises increasing a diameter of the virtual round table to accommodate the additional participants.