
CN111937041A - Video encoding by providing geometry proxies - Google Patents

Video encoding by providing geometry proxies

Info

Publication number
CN111937041A
CN111937041A (application CN201980024478.9A)
Authority
CN
China
Prior art keywords: stored, proxy, video, color, frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201980024478.9A
Other languages
Chinese (zh)
Other versions
CN111937041B (en)
Inventor
Michael Hemmer
Ameesh Makadia
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Google LLC
Original Assignee
Google LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Google LLC
Publication of CN111937041A
Application granted
Publication of CN111937041B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G06T9/001 Model-based coding, e.g. wire frame
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G06T9/002 Image coding using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G06T9/004 Predictors, e.g. intraframe, interframe coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/186 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being a colour or a chrominance component
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/20 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/593 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

Compressing a video frame includes receiving a video frame, identifying a three-dimensional (3D) object in the frame, matching the 3D object to a stored 3D object, compressing the video frame using a color prediction scheme based on the 3D object and the stored 3D object, and storing the compressed frame with metadata that identifies the 3D object, indicates a location of the 3D object in the video frame, and indicates an orientation of the 3D object in the video frame.

Description

Video encoding by providing geometry proxies

Related Applications

This application is a continuation of, and claims the benefit of, U.S. Non-Provisional Application No. 16/143,165, filed September 26, 2018, the entire contents of which are incorporated herein by reference.

Technical Field

Embodiments relate to the compression and decompression of three-dimensional (3D) video data.

Background

Techniques for video compression are all related by a common approach. Typically, frames of video are temporally compressed by defining blocks of a frame as residuals (e.g., in terms of displacements from previous or future frames). For objects within a frame whose residuals can be characterized by in-plane rigid transformations (e.g., objects that shift and rotate in the image plane over time), this compression technique is generally acceptable (e.g., it produces minimal artifacts or errors on decompression). Although this model captures many sources of video dynamics (e.g., camera or scene translation), there are common scenarios for which it is not the best model (it is inefficient or introduces excessive artifacts or errors on decompression).

In other words, typical prediction schemes can reliably predict pixels/blocks/patches in previous and/or future frames (e.g., keyframes) for use in computing residuals when objects move mostly linearly and/or have predictable motion from frame to frame. However, when objects have dynamic, non-linear motion from frame to frame, typical prediction schemes may not be able to reliably predict the pixels/blocks/patches in previous and/or future frames (e.g., keyframes) needed to compute residuals. Therefore, when objects have dynamic, non-linear motion from frame to frame, using a displacement prediction model will likely result in very little compression.
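The displacement-based prediction described above can be sketched minimally as follows. This is a hypothetical illustration using one-dimensional "frames" of pixel intensities; a real codec searches two-dimensional blocks with sub-pixel precision and entropy-codes the result:

```python
def best_displacement(prev_frame, block, search_range=4):
    """Exhaustive search: find the shift of `block` within `prev_frame`
    that minimizes the sum of absolute differences (SAD)."""
    best_d, best_sad = 0, float("inf")
    n = len(block)
    for d in range(-search_range, search_range + 1):
        if d < 0 or d + n > len(prev_frame):
            continue  # shifted block would fall outside the frame
        sad = sum(abs(prev_frame[d + i] - block[i]) for i in range(n))
        if sad < best_sad:
            best_d, best_sad = d, sad
    return best_d

def encode_block(prev_frame, block):
    """Represent the block as (displacement, residual) against prev_frame."""
    d = best_displacement(prev_frame, block)
    residual = [block[i] - prev_frame[d + i] for i in range(len(block))]
    return d, residual

def decode_block(prev_frame, d, residual):
    """Invert encode_block: shifted reference plus residual."""
    return [prev_frame[d + i] + residual[i] for i in range(len(residual))]

# A block whose content merely shifted: the residual is all zeros,
# which is exactly the case where displacement prediction compresses well.
prev = [0, 0, 10, 20, 30, 0, 0, 0]
cur_block = [10, 20, 30]
d, res = encode_block(prev, cur_block)
```

For the non-linear motion the text describes, no displacement yields a small residual, and the scheme degrades toward storing the block outright.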

Summary

Example embodiments describe systems and methods for compressing frames of video using color prediction through a geometry proxy.

In a general aspect, a method and a non-transitory computer-readable storage medium having computer-executable program code stored thereon are provided; the program code, when executed on a computer system, causes the computer system to perform steps. The steps include receiving a frame of a video, identifying a three-dimensional (3D) object in the frame, matching the 3D object to a stored 3D object, compressing the video frame using a color prediction scheme based on the 3D object and the stored 3D object, and storing the compressed frame with metadata that identifies the 3D object, indicates a location of the 3D object in the video frame, and indicates an orientation of the 3D object in the video frame.
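The steps above can be outlined in a short sketch. All names here (`STORED_OBJECTS`, `compress_frame`, the matching-by-label shortcut) are illustrative assumptions, not the patent's implementation; in particular, real matching would use geometric descriptors rather than labels:

```python
import math

# Hypothetical library of stored 3D objects, keyed by id.
STORED_OBJECTS = {"person_model": {"vertices": [(0, 0, 0), (1, 0, 0), (0, 1, 0)]}}

def match_stored_object(detected, library):
    """Match a detected 3D object to a stored one (here, trivially by label)."""
    return detected["label"] if detected["label"] in library else None

def compress_frame(frame, detected_object):
    """Sketch of the claimed steps: match the object, then store the
    compressed frame with metadata identifying the object plus its
    location and orientation in the frame."""
    obj_id = match_stored_object(detected_object, STORED_OBJECTS)
    metadata = {
        "object_id": obj_id,
        "location": detected_object["location"],        # (x, y, z) in the frame
        "orientation": detected_object["orientation"],  # e.g., Euler angles
    }
    # The color-prediction residuals would go in the payload; omitted here.
    return {"payload": frame, "metadata": metadata}

frame = {"index": 0, "pixels": [0, 0, 0, 0]}
detected = {"label": "person_model", "location": (2.0, 1.0, 5.0),
            "orientation": (0.0, math.pi / 2, 0.0)}
out = compress_frame(frame, detected)
```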

Implementations may include one or more of the following features. For example, compressing the frame of the video using a color prediction scheme based on the 3D object and the stored 3D object may include generating a first 3D object proxy based on the stored 3D object, transforming the first 3D object proxy based on the 3D object identified in the frame, generating a second 3D object proxy based on the stored 3D object, identifying the 3D object in a keyframe of the video, transforming the second 3D object proxy based on the 3D object identified in the keyframe, mapping color attributes from the 3D object to the transformed first 3D object proxy, mapping color attributes from the 3D object identified in the keyframe to the transformed second 3D object proxy, and generating a residual for the 3D object based on the color attributes of the transformed first 3D object proxy and the color attributes of the transformed second 3D object proxy.
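As a rough illustration of the residual step just described, the sketch below poses two instances of a stored proxy (one per the current frame, one per the keyframe), maps a per-vertex "color" onto each, and differences them. The rigid-translation transform and scalar grayscale colors are simplifying assumptions; the actual transform and attribute set would be richer:

```python
def transform_proxy(proxy_vertices, translation):
    """Pose a proxy instance by a rigid translation (a real system would
    apply the full transform recovered from the identified object)."""
    tx, ty, tz = translation
    return [(x + tx, y + ty, z + tz) for (x, y, z) in proxy_vertices]

def map_colors(vertices, colors):
    """Attach a color attribute to each posed proxy vertex."""
    return dict(zip(vertices, colors))

def color_residual(frame_colors, keyframe_colors):
    """Per-vertex residual between the frame proxy and the keyframe proxy."""
    return [f - k for f, k in zip(frame_colors, keyframe_colors)]

stored = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]        # stored 3D object (proxy mesh)
frame_proxy = transform_proxy(stored, (2, 0, 0))  # posed per the current frame
key_proxy = transform_proxy(stored, (1, 0, 0))    # posed per the keyframe
frame_cols = [100, 120, 140]  # colors mapped from the current frame
key_cols = [98, 121, 139]     # colors mapped from the keyframe
colored = map_colors(frame_proxy, frame_cols)
res = color_residual(frame_cols, key_cols)
```

Because the proxy follows the object through its 3D motion, the per-vertex color residual stays small even when the pixel-space motion is highly non-linear.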

For example, compressing the frame of the video using a color prediction scheme based on the 3D object and the stored 3D object may include generating a first 3D object proxy based on the stored 3D object, transforming the first 3D object proxy based on the 3D object identified in the frame, generating a second 3D object proxy based on the stored 3D object, identifying the 3D object in a keyframe of the video, transforming the second 3D object proxy based on the 3D object identified in the keyframe, mapping color attributes from the 3D object to the transformed first 3D object proxy, and generating a residual for the 3D object based on the color attributes of the transformed first 3D object proxy and the default color attributes of the transformed second 3D object proxy.

For example, compressing the frame of the video using a color prediction scheme based on the 3D object and the stored 3D object may include generating a first 3D object proxy based on the stored 3D object, encoding the first 3D object proxy using an autoencoder, transforming the encoded first 3D object proxy based on the 3D object identified in the frame, generating a second 3D object proxy based on the stored 3D object, encoding the second 3D object proxy using an autoencoder, identifying the 3D object in a keyframe of the video, transforming the encoded second 3D object proxy based on the 3D object identified in the keyframe, mapping color attributes from the 3D object to the transformed first 3D object proxy, mapping color attributes from the 3D object identified in the keyframe to the transformed second 3D object proxy, and generating a residual for the 3D object based on the color attributes of the transformed first 3D object proxy and the color attributes of the transformed second 3D object proxy.

For example, compressing the frame of the video using a color prediction scheme based on the 3D object and the stored 3D object may include generating a first 3D object proxy based on the stored 3D object, encoding the first 3D object proxy using an autoencoder, transforming the encoded first 3D object proxy based on the 3D object identified in the frame, generating a second 3D object proxy based on the stored 3D object, encoding the second 3D object proxy using an autoencoder, identifying the 3D object in a keyframe of the video, transforming the encoded second 3D object proxy based on the 3D object identified in the keyframe, mapping color attributes from the 3D object to the transformed first 3D object proxy, and generating a residual for the 3D object based on the color attributes of the transformed first 3D object proxy and the default color attributes of the transformed second 3D object proxy.

For example, prior to storing the stored 3D object, the steps may further include: identifying at least one 3D object of interest associated with the video, determining a plurality of mesh attributes associated with the 3D object of interest, determining a location associated with the 3D object of interest, determining an orientation associated with the 3D object of interest, determining a plurality of color attributes associated with the 3D object of interest, and using an autoencoder to reduce the number of variables associated with the mesh attributes of the 3D object of interest. Compressing the frame of the video may include determining location coordinates of the 3D object relative to origin coordinates of a background 3D object in the keyframe. The stored 3D object may include default color attributes, and the color prediction scheme may use the default color attributes. The steps may further include: identifying at least one 3D object of interest associated with the video; generating at least one stored 3D object based on the at least one 3D object of interest, each of the at least one stored 3D objects being defined by a mesh comprising a collection of points connected by faces, each point storing at least one attribute including the location coordinates of the corresponding point; and storing the at least one stored 3D object in association with the video.
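The preprocessing just enumerated (attribute extraction plus autoencoder-based variable reduction) can be caricatured as follows. The `toy_encode`/`toy_decode` pair is a stand-in for a trained autoencoder: it merely drops a coordinate to halve the variable count, whereas a trained network would learn a much better low-dimensional latent. All names are illustrative:

```python
def extract_mesh_attributes(mesh):
    """Collect the attributes the steps above enumerate for a 3D object
    of interest: mesh geometry, location, orientation, per-vertex color."""
    return {
        "vertices": mesh["vertices"],
        "location": mesh["location"],
        "orientation": mesh["orientation"],
        "colors": mesh["colors"],
    }

def toy_encode(vertices):
    """Stand-in for the autoencoder's encoder: drop the z coordinate,
    reducing the number of variables per vertex from 3 to 2."""
    return [(x, y) for (x, y, _z) in vertices]

def toy_decode(latent):
    """Stand-in decoder: reconstruct with z = 0."""
    return [(x, y, 0.0) for (x, y) in latent]

mesh = {"vertices": [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)],
        "location": (0, 0, 0), "orientation": (0, 0, 0),
        "colors": [(255, 0, 0), (0, 255, 0)]}
attrs = extract_mesh_attributes(mesh)
latent = toy_encode(attrs["vertices"])
recon = toy_decode(latent)  # what the decoder regenerates from the latent
```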

In another general aspect, a method and a non-transitory computer-readable storage medium having computer-executable program code stored thereon are provided; the program code, when executed on a computer system, causes the computer system to perform steps. The steps include receiving a frame of a video, identifying a three-dimensional (3D) object in the frame, matching the 3D object to a stored 3D object, decompressing the frame of the video using a color prediction scheme based on the 3D object and the stored 3D object, and rendering the frame of the video.

Implementations may include one or more of the following features. For example, decompressing the frame of the video using a color prediction scheme based on the 3D object and the stored 3D object may include generating a first 3D object proxy based on the stored 3D object, transforming the first 3D object proxy based on the 3D object identified in the frame, identifying the 3D object in a keyframe of the video, transforming a second 3D object proxy based on the 3D object identified in the keyframe, mapping color attributes from the 3D object to the transformed first 3D object proxy, mapping color attributes from the 3D object identified in the keyframe to the transformed second 3D object proxy, and generating color attributes of the 3D object based on the color attributes of the transformed first 3D object proxy and the color attributes of the transformed second 3D object proxy.

For example, decompressing the frame of the video using a color prediction scheme based on the 3D object and the stored 3D object may include generating a first 3D object proxy based on the stored 3D object, transforming the first 3D object proxy based on the 3D object identified in the frame, identifying the 3D object in a keyframe of the video, transforming a second 3D object proxy based on the 3D object identified in the keyframe, mapping color attributes from the 3D object to the transformed first 3D object proxy, and generating color attributes of the 3D object based on the color attributes of the transformed first 3D object proxy and the default color attributes of the transformed second 3D object proxy.

For example, decompressing the frame of the video using a color prediction scheme based on the 3D object and the stored 3D object may include generating a first 3D object proxy based on the stored 3D object, decoding the first 3D object proxy using an autoencoder, transforming the decoded first 3D object proxy based on metadata associated with the 3D object, generating a second 3D object proxy based on the stored 3D object, decoding the second 3D object proxy using an autoencoder, identifying the 3D object in a keyframe of the video, transforming the decoded second 3D object proxy based on metadata associated with the 3D object identified in the keyframe, mapping color attributes from the 3D object to the transformed first 3D object proxy, mapping color attributes from the 3D object identified in the keyframe to the transformed second 3D object proxy, and generating color attributes of the 3D object based on the color attributes of the transformed first 3D object proxy and the color attributes of the transformed second 3D object proxy.

For example, decompressing the frame of the video using a color prediction scheme based on the 3D object and the stored 3D object may include generating a first 3D object proxy based on the stored 3D object, decoding the first 3D object proxy using an autoencoder, transforming the decoded first 3D object proxy based on metadata associated with the 3D object, generating a second 3D object proxy based on the stored 3D object, decoding the second 3D object proxy using an autoencoder, identifying the 3D object in a keyframe of the video, transforming the decoded second 3D object proxy based on metadata associated with the 3D object identified in the keyframe, mapping color attributes from the 3D object to the transformed first 3D object proxy, and generating color attributes of the 3D object based on the color attributes of the transformed first 3D object proxy and the default attributes of the transformed second 3D object proxy.
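Common to the decompression variants above is the final color reconstruction: the keyframe proxy's colors plus the transmitted residual yield the frame's colors, which are then paired with the posed proxy for rendering. A minimal sketch, with illustrative names and scalar colors:

```python
def reconstruct_colors(keyframe_colors, residual):
    """Invert the encoder's residual step: frame color equals the
    keyframe proxy color plus the residual, per proxy vertex."""
    return [k + r for k, r in zip(keyframe_colors, residual)]

def render_object(proxy_vertices, colors):
    """Pair each posed proxy vertex with its reconstructed color,
    ready for rasterization into the output frame."""
    return list(zip(proxy_vertices, colors))

key_cols = [98, 121, 139]   # colors mapped onto the keyframe proxy
residual = [2, -1, 1]       # decoded from the compressed frame
frame_cols = reconstruct_colors(key_cols, residual)
posed = [(2, 0, 0), (3, 0, 0), (2, 1, 0)]  # proxy posed per the frame metadata
rendered = render_object(posed, frame_cols)
```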

For example, the steps may further include receiving at least one latent representation of a 3D shape using a machine-trained generative modeling technique to: determine a plurality of mesh attributes associated with the 3D shape; determine a location associated with the 3D shape; determine an orientation associated with the 3D shape; and determine a plurality of color attributes associated with the 3D shape, and storing the 3D shape as the stored 3D object. Rendering the frame of the video may include receiving location coordinates of the 3D object relative to origin coordinates of a background 3D object in the keyframe, and placing the 3D object in the frame using the location coordinates. The steps may further include: receiving a neural network used by the encoder of an autoencoder to reduce the number of variables associated with the mesh attributes, location, orientation, and color attributes of at least one 3D object of interest; using the neural network in the decoder of the autoencoder to regenerate the points associated with the mesh of the at least one 3D object of interest, the regenerated points including regenerated location attributes, orientation attributes, and color attributes; and storing the at least one 3D object of interest as the stored 3D object.

In yet another general aspect, a method and a non-transitory computer-readable storage medium having computer-executable program code stored thereon are provided; the program code, when executed on a computer system, causes the computer system to perform steps for predicting color variance using proxies. The steps include: generating a first 3D object proxy based on a stored 3D object; generating a second 3D object proxy based on the stored 3D object; transforming the first 3D object proxy based on a 3D object identified in a frame of a video; transforming the second 3D object proxy based on the 3D object identified in a keyframe of the video; mapping color attributes from the 3D object identified in the video frame to the transformed first 3D object proxy; mapping color attributes from the 3D object identified in the keyframe to the transformed second 3D object proxy; and generating color data for the 3D object based on the color attributes of the transformed first 3D object proxy and the color attributes of the transformed second 3D object proxy.

Implementations may include one or more of the following features. For example, the steps may further include encoding the first 3D object proxy using an autoencoder before transforming the first 3D object proxy, and encoding the second 3D object proxy using an autoencoder before transforming the second 3D object proxy. The steps may further include decoding the first 3D object proxy using an autoencoder after transforming the first 3D object proxy, and decoding the second 3D object proxy using an autoencoder after transforming the second 3D object proxy. Generating the color data for the 3D object may include subtracting the color attributes of the transformed first 3D object proxy from the color attributes of the transformed second 3D object proxy. Generating the color data for the 3D object may include adding the color attributes of the transformed first 3D object proxy to the color attributes of the transformed second 3D object proxy.
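The subtraction and addition options named above are inverses of one another: subtraction produces the residual at the encoder, and addition reconstructs the colors at the decoder, so the residual path itself is lossless. A toy round trip (scalar colors are a simplifying assumption):

```python
def residual_encode(frame_cols, key_cols):
    """Encoder side: color data = frame proxy colors minus keyframe proxy colors."""
    return [f - k for f, k in zip(frame_cols, key_cols)]

def residual_decode(residual, key_cols):
    """Decoder side: color data = residual plus keyframe proxy colors."""
    return [r + k for r, k in zip(residual, key_cols)]

frame_cols = [100, 120, 140]
key_cols = [98, 121, 139]
round_trip = residual_decode(residual_encode(frame_cols, key_cols), key_cols)
```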

Brief Description of the Drawings

Example embodiments will become more fully understood from the detailed description given herein below and the accompanying drawings, wherein like elements are represented by like reference numerals, which are given by way of illustration only and thus do not limit the example embodiments, and wherein:

FIG. 1 illustrates a block diagram of a signal flow for compressing video, according to an example embodiment.

FIG. 2 illustrates a block diagram of a signal flow for storing compressed video, according to an example embodiment.

FIG. 3A illustrates a block diagram of an encoder prediction module, according to an example embodiment.

FIG. 3B illustrates a block diagram of another encoder prediction module, according to an example embodiment.

FIG. 4A illustrates a block diagram of a decoder prediction module, according to an example embodiment.

FIG. 4B illustrates a block diagram of another decoder prediction module, according to an example embodiment.

FIG. 5A illustrates a block diagram of a signal flow for encoding a 3D object, according to an example embodiment.

FIG. 5B illustrates a block diagram of a signal flow for decoding a 3D object, according to an example embodiment.

FIG. 6A illustrates a block diagram of a signal flow for streaming video and rendering the video on a client device.

FIG. 6B illustrates a block diagram of another signal flow for streaming video and rendering the video on a client device.

FIG. 7 illustrates a block diagram of a method for compressing a frame of video, according to at least one example embodiment.

FIG. 8 illustrates a block diagram of another method for compressing a frame of video, according to at least one example embodiment.

FIG. 9 illustrates a block diagram of a method for decompressing and rendering a frame of video, according to at least one example embodiment.

FIG. 10 illustrates a block diagram of a method for compressing a 3D object, according to at least one example embodiment.

FIG. 11 illustrates a block diagram of a method for decompressing a 3D object, according to at least one example embodiment.

FIG. 12 illustrates a video encoder system, according to at least one example embodiment.

FIG. 13 illustrates a video decoder system, according to at least one example embodiment.

FIG. 14 illustrates an example of a computer device and a mobile computer device, according to at least one example embodiment.

It should be noted that these figures are intended to illustrate the general characteristics of the methods and/or structures utilized in certain example embodiments and to supplement the written description provided below. These drawings are not, however, to scale and may not precisely reflect the structural or performance characteristics of any given embodiment, and should not be interpreted as defining or limiting the range of values or properties encompassed by the example embodiments. The use of similar or identical reference numbers in the various figures is intended to indicate the presence of a similar or identical element or feature.

Detailed Description

While example embodiments may include various modifications and alternative forms, embodiments thereof are shown by way of example in the drawings and will be described herein in detail. It should be understood, however, that there is no intent to limit example embodiments to the particular forms disclosed; on the contrary, example embodiments are to cover all modifications, equivalents, and alternatives falling within the scope of the claims. Like numbers refer to like elements throughout the description of the figures.

In a dynamic 3D environment, 3D objects undergoing complex 3D transformations may not admit simple rigid transformations based on previous and/or future frames, previously compressed images or frames, and the like. For example, a human actor undergoing a transformation composed of rigid, articulated, and deformation components will produce a highly non-linear transformation in pixel space. Consequently, there may be no suitable corresponding blocks in nearby keyframes. These 3D objects are referred to herein as dynamic 3D objects.

Non-dynamic 3D objects can include 3D objects that appear to move from frame to frame due to camera or scene translation, and these can be encoded/decoded using color prediction through geometry proxies. In one example, a stationary 3D object can appear to be moving from frame to frame because the camera capturing the scene is moving (e.g., in a predictable manner and/or direction). In another example, a 3D object (e.g., a vehicle such as a train, car, or airplane) can move from frame to frame in a predictable manner (e.g., at a constant speed and/or in a constant direction). These objects are sometimes referred to herein as translational 3D objects. Another example of a non-dynamic 3D object is one that does not appear to move within the scene. In this example, a stationary or fixed 3D object (e.g., the background of the scene, furniture at a fixed position within the scene, or a slowly moving object in the distance) may appear stationary from frame to frame (e.g., absent any camera or scene translation). These objects are sometimes referred to herein as fixed or background 3D objects.

Example implementations use stored 3D objects as geometry proxies for objects that appear in a video. A stored 3D object may be a deformable 3D shape model (e.g., a mesh with given attributes that can be manipulated as necessary) that can be used in a prediction scheme. For example, the positions of pixels associated with a dynamic 3D object in a frame of the video may be predicted based on the stored 3D object. Similarly, the prediction scheme may be used to compress a background image in a frame of the video, a 3D object that moves (e.g., moves predictably) within the image plane of the frames of the video, a portion of a 3D object, a layer of a video frame that includes a 3D object (e.g., as part of a Z-ordered image), a container that includes a 3D object (e.g., a standard mesh object), and so forth.

One or more implementations may include using a stored 3D object to locate pixels, blocks, and/or patches in a keyframe. One or more implementations may include matching pixels, blocks, and/or patches in a frame of the video to pixels, blocks, and/or patches of a stored 3D object. One or more implementations may include compressing a stored 3D object before storing it and decompressing the stored 3D object during the prediction process. One or more implementations may include compressing a stored 3D object before storing it, compressing a 3D object associated with a frame of the video, and using the compressed 3D objects in the prediction process.

FIG. 1 illustrates a block diagram of a signal flow for compressing video according to an example implementation. As shown in FIG. 1, the encoder 105 includes a frame 110, a prediction module 115, and an encoding module 120. In addition, the video storage 125 includes metadata 130, stored 3D objects 135, and compressed frames 140.

Video data 5 is input to the encoder 105, where the frame 110 is selected from a plurality of frames included in the video data 5. The video data 5 may correspond to 3D video (e.g., a monocular view), 2D video, a portion of a video (e.g., fewer than all frames of the video), and so forth. Accordingly, the frame 110 may include data corresponding to a frame of the video. The encoder 105 may be configured to use a color prediction scheme based on using 3D objects as geometry proxies, and may compress the frame 110 using that scheme. Compressing the frame 110 can reduce the amount of data used to store and/or communicate the frame 110. Compressing the frame 110 may include a prediction step, a quantization step, a transform step, and an entropy encoding step.

Inter prediction can exploit temporal redundancy (e.g., correlations between the pixels of different frames) by computing delta values expressed in terms of one or more neighboring frames. Delta coding can include locating the relevant pixels/blocks/patches in a keyframe (e.g., a previous neighboring keyframe or an upcoming neighboring keyframe) and then computing delta values for the pixels in the frame being encoded. The delta values may be referred to as residuals. Thus, inter prediction can produce residuals for the pixels/blocks/patches (e.g., 3D objects) in a frame. In example implementations, the delta values may be expressed in terms of an explicit texture (color), a default texture, a predefined texture, a texture of an identified 3D object, and so forth.
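The delta-coding step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function names and the assumption that blocks are flat lists of 8-bit color samples are for illustration only.

```python
# Delta (residual) coding between a block in the frame being encoded and the
# matched block in a nearby keyframe. Blocks are flat lists of color samples.

def compute_residual(block, keyframe_block):
    """Encoder side: delta values of the current block relative to the keyframe."""
    return [cur - ref for cur, ref in zip(block, keyframe_block)]

def reconstruct_block(residual, keyframe_block):
    """Decoder side: add the residual back onto the keyframe prediction."""
    return [ref + delta for delta, ref in zip(residual, keyframe_block)]

keyframe_block = [120, 121, 119, 118]
current_block = [122, 121, 117, 119]
residual = compute_residual(current_block, keyframe_block)
# residual == [2, 0, -2, 1]; small deltas compress better than raw samples
```

Reconstruction is exact here because nothing is quantized between the two steps; in a real codec the residual passes through the lossy transform/quantization stages first.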

The prediction module 115 may be configured to locate 3D objects in the frame 110 and/or in keyframes. For example, machine vision, computer vision, and/or computer image recognition techniques may be used to identify and locate a 3D object. Once a 3D object has been identified, the 3D object can be located using a coordinate system (e.g., 2D Cartesian, 3D Cartesian, polar, etc.) as an attribute of the points associated with the mesh of the identified 3D object.

In an example implementation, a computer image recognition technique based on a convolutional neural network trained (via machine learning) using a number of known images may be used to identify and locate 3D objects. For example, a block, a plurality of blocks, and/or a patch is selected and/or identified from the selected frame. The trained convolutional neural network can operate on the selected block, blocks, and/or patch. The result can be tested (e.g., an error test, a loss test, a divergence test, etc.). If the test yields a value below a threshold (or, alternatively, above a threshold, depending on the type of test), the selected block, blocks, and/or patch can be identified as a 3D object.

In an example implementation, a frame of the video may include a tag indicating that a previously identified 3D object of interest is included in the frame. The tag may include the identity and the location of the 3D object. For example, the video may be generated using a computer-generated imagery (CGI) tool (e.g., for a computer-animated film). Computer-generated characters can be identified and tagged in each frame. Further, a model for each identified 3D object of interest (e.g., each identified character) may be stored as one of the stored 3D objects 135.

In an example implementation, the stored 3D objects 135 and the 3D objects may be defined by triangular meshes. A triangular mesh may be a collection of points connected by triangular faces. Each point can store various attributes. For example, the attributes can include each point's position, color, texture coordinates, and so forth. The attributes may include and/or indicate (e.g., a plurality of attributes may indicate) the orientation of the corresponding 3D object and/or the position of the corresponding 3D object in a frame (e.g., the frame 110) and/or image of the video.
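A triangular mesh of the kind described above can be sketched as a small data structure. The field names (`position`, `color`, `uv`) are illustrative assumptions, not attribute names taken from the patent.

```python
# A triangular mesh: points carrying per-point attributes, connected by
# triangular faces that reference the points by index.
from dataclasses import dataclass, field

@dataclass
class MeshPoint:
    position: tuple            # (x, y, z) in the frame's coordinate system
    color: tuple = (0, 0, 0)   # RGB attribute
    uv: tuple = (0.0, 0.0)     # texture coordinates

@dataclass
class TriangleMesh:
    points: list = field(default_factory=list)
    faces: list = field(default_factory=list)  # each face: 3 point indices

mesh = TriangleMesh(
    points=[MeshPoint((0, 0, 0)), MeshPoint((1, 0, 0)), MeshPoint((0, 1, 0))],
    faces=[(0, 1, 2)],  # one triangle connecting the three points
)
```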

Thus, in example implementations, the mesh attributes of a 3D object may be sufficient to identify and locate the 3D object. Models including mesh attributes for a plurality of 3D objects of interest may be stored as the stored 3D objects 135. The models can be normalized. For example, a model for a man, a woman, a teenager, a child, or, more generally, a human or a portion of a human (e.g., a body, a head, a hand, etc.) may be stored as one of the stored 3D objects 135. As another example, a model for a dog, a cat, a deer, or, more generally, a quadruped or a portion of a quadruped (e.g., a body, a head, legs, etc.) may be stored as one of the stored 3D objects 135. The attributes of a model can then be used to search a frame for 3D objects having similar attributes.

A 3D object located in the frame 110 (hereinafter, the 3D object) can be matched to one of the stored 3D objects 135. In an example implementation, the identity of the 3D object generated by the computer image recognition technique can be used to search the stored 3D objects 135. In an example implementation, a tag of a 3D object found in the frame can be matched to a tag of one of the stored 3D objects 135. In an example implementation, one of the stored 3D objects 135 whose model has attributes similar to those of the 3D object can be identified as a match.

The matched one of the stored 3D objects 135 (hereinafter, the stored 3D object) can then be used in the color prediction scheme for the frame 110. For example, the mesh corresponding to the stored 3D object can be translated and oriented to align with the orientation of the 3D object. The points corresponding to the translated and oriented stored 3D object can then be matched to points of the corresponding (e.g., the same) 3D object located in a (temporally) nearby, previously encoded keyframe. The prediction module 115 can then use the matched points of the corresponding 3D object to select or predict the pixels/blocks/patches in the keyframe to be used in computing the residual (e.g., a color displacement relative to the keyframe) for the 3D object in the frame 110.
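The translate-and-orient step above can be sketched as a rigid transform of the proxy's points. For brevity this sketch is 2D (rotation plus translation); the pose values are illustrative assumptions, and a real implementation would use a full 3D rotation.

```python
# Align a stored proxy mesh's points with the object's pose in the frame:
# rotate about the origin, then translate to the object's location.
import math

def align_points(points, angle_rad, translation):
    """Rotate each (x, y) point by angle_rad, then translate it."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    tx, ty = translation
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

proxy_points = [(1.0, 0.0), (0.0, 1.0)]
aligned = align_points(proxy_points, math.pi / 2, (10.0, 20.0))
# (1, 0) rotates to (0, 1) and lands at (10, 21); (0, 1) rotates to (-1, 0) -> (9, 20)
```

Once aligned, each proxy point can be matched against the corresponding object points in the keyframe to pick the prediction blocks.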

Additionally, the prediction module 115 can generate metadata 130 associated with the frame 110. The metadata 130 can include data associated with at least one 3D object, located in the frame 110, that has been predicted using one of the stored 3D objects 135. The metadata 130 can include attributes (e.g., mesh point attributes) associated with the location and/or the orientation of the 3D object in the frame 110. The metadata 130 is stored in relation to (corresponding to) one of the compressed frames 140 of the video data 5.

The encoding module 120 may be configured to perform a series of encoding processes on the residual. For example, the data corresponding to the residual can be transformed, quantized, and entropy encoded.

Transforming the residual can include converting the data (e.g., pixel values) from the spatial domain into transform coefficients in a transform domain. The transform coefficients may correspond to a two-dimensional matrix of coefficients that is initially the same size as the original block and/or patch in the frame 110. In other words, there may be as many transform coefficients as there are data points (e.g., pixels) in the original block and/or patch in the frame 110. However, due to the transform, a portion of the transform coefficients may have values equal to zero. Typical transforms include the Karhunen-Loève transform (KLT), the discrete cosine transform (DCT), the singular value decomposition transform (SVD), and the asymmetric discrete sine transform (ADST).
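The transform step can be illustrated with a 1-D DCT-II written from its textbook definition. This is only a sketch of residual data mapping to transform coefficients; production codecs use fast 2-D integer transforms, and the unnormalized form below is an assumption for clarity.

```python
# 1-D DCT-II (unnormalized): X[k] = sum_j x[j] * cos(pi * k * (2j + 1) / (2n))
import math

def dct_ii(x):
    n = len(x)
    return [sum(x[j] * math.cos(math.pi * k * (2 * j + 1) / (2 * n))
                for j in range(n))
            for k in range(n)]

residual = [5, 5, 5, 5]     # a constant residual block
coeffs = dct_ii(residual)
# The energy collapses into the first (DC) coefficient; the rest are ~0,
# which is why smooth residuals compress well after the transform.
```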

Vector coordinates (e.g., representing the transform coefficients) are typically given in float or double precision. However, such a representation is often more precise than is actually needed. For example, the video data 5 may originate from a video capture device (e.g., a camera, a scanner, or a computer-generated imagery program) having some measurement error. Accordingly, a relatively large number of the lower-order bits may be noise. Quantization converts a given floating-point number into an integer representation that is b bits long. Quantization can therefore reduce the data in each transform coefficient. Quantization may involve mapping values in a relatively large range to values in a relatively small range, thereby reducing the amount of data needed to represent the quantized transform coefficients. Quantization can convert the transform coefficients into discrete quantum values, which are referred to as quantized transform coefficients or quantization levels. For example, quantization may add zeros to the data associated with the transform coefficients. For example, a coding standard may define 128 quantization levels in a scalar quantization process.
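Scalar quantization of the kind described above can be sketched in a few lines. The step size is an illustrative assumption; codecs derive it from the quantization level.

```python
# Scalar quantization: map a large range of coefficient values onto a small
# set of integer levels, discarding low-order precision that is mostly noise.

def quantize(value, step):
    """Encoder side: map a coefficient to an integer quantization level."""
    return round(value / step)

def dequantize(level, step):
    """Decoder side: recover an approximation of the original coefficient."""
    return level * step

step = 8.0
original = 101.7
level = quantize(original, step)    # 13 -- a small integer, cheap to code
approx = dequantize(level, step)    # 104.0 -- lossy, but close to 101.7
```

The round trip is lossy (`104.0` vs `101.7`); the error is bounded by half the step size.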

Entropy relates to the expected value of the information contained in the quantized transform coefficients (e.g., a data set of discrete quantum values). The more unique values a data set contains, the higher its entropy. Repeated values reduce the entropy and yield better compression. Accordingly, the quantized transform coefficients can be entropy encoded. Entropy coding is performed using one of a set of techniques for compressing a data set based on the entropy of the data. For example, entropy coding techniques may include Huffman coding, arithmetic coding, or the use of asymmetric numeral systems.
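The relationship between repeated values and entropy can be made concrete with Shannon entropy. This sketch is illustrative background, not part of the patent's encoder; the example data sets are assumptions.

```python
# Shannon entropy of a sequence of discrete values, in bits per symbol.
# Repeated values lower the entropy, so they are cheaper to entropy code.
import math
from collections import Counter

def shannon_entropy(data):
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

many_repeats = [0, 0, 0, 0, 0, 0, 0, 1]   # mostly zeros, like quantized coefficients
all_unique = [0, 1, 2, 3, 4, 5, 6, 7]
low = shannon_entropy(many_repeats)    # ~0.544 bits/symbol
high = shannon_entropy(all_unique)     # 3.0 bits/symbol
```

This is why quantization (which produces runs of zeros) and entropy coding work well together.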

The entropy-encoded coefficients (for all of the pixels/blocks/patches in the frame 110), together with the information needed to decode the frame 110 (e.g., the type of prediction used, motion vectors, and quantization values), are then output as one of the compressed frames 140 and stored together with and/or in association with the other compressed frames of the video data 5.

Typically, the encoder 105 and the video storage 125 are elements in the cloud or on the World Wide Web. For example, the encoder 105 may be one of a plurality of encoders, implemented as computer hardware and/or computer software in a cloud computing device, configured to compress video data (e.g., the video data 5) using a compression scheme defined by one or more standards (e.g., H.264, H.265, HEVC, VP8, VP9, VP10, AV1, etc.). For example, the video storage 125 may be implemented as at least one non-volatile memory, non-transitory computer-readable medium, or the like located in a cloud computing device (e.g., a streaming server). In at least one implementation, the encoder 105 compresses the video data 5 into a video and stores the compressed video data for future (e.g., later in time) playback of the video when streamed from the cloud computing device to a client device.

In this implementation, the client device includes a decoder (e.g., the decoder 145). In some implementations, the metadata 130 and/or the stored 3D objects 135 can be communicated to the client device, on demand and/or as part of an initialization process. The metadata 130 and/or the stored 3D objects 135 can be compressed when stored in the video storage 125 and/or compressed on demand before being communicated to the client device. In an example implementation, the stored 3D objects 135 can be compressed using a machine-trained generative modeling technique to generate a reduced number of variables associated with the attributes and positions of the mesh of a stored 3D object (referred to herein as a latent representation or a reduced latent representation), as described in more detail below.

Accordingly, as shown in FIG. 1, the decoder 145 includes a frame reconstruction module 150, a prediction module 155, and a decoding module 160. The decoding module 160 may be configured to perform the inverse of the operations of the encoding module 120. The decoding module 160 can receive the compressed frames 140 (e.g., representing a 3D movie selected for streaming or download by a user of the client device that includes the decoder 145). The compressed frames 140 can be received one at a time, several frames or blocks of frames at a time, or as a complete 3D movie. The decoder 145 can select one of the compressed frames 140, and the decoding module 160 can entropy decode, dequantize, and inverse transform the selected compressed frame.

In an example implementation, the decoding module 160 uses the data elements within the selected compressed frame and decompresses the data through entropy decoding (e.g., the inverse of Huffman coding, arithmetic coding, or asymmetric numeral system coding) to produce a set of quantized transform coefficients. The quantized transform coefficients are dequantized, and the dequantized transform coefficients are inverse transformed (e.g., using the inverse of the KLT, DCT, SVD, or ADST) to generate a derivative residual that can be identical (or approximately identical) to the residual generated by the prediction module 115.

The prediction module 155 may be configured to determine whether the selected compressed frame includes one of the stored 3D objects 135. In an example implementation, the metadata 130 can be used to determine whether the selected compressed frame includes one of the stored 3D objects. For example, the prediction module 155 can query the metadata 130 for the selected compressed frame and, if metadata is returned, determine that the selected compressed frame includes one of the stored 3D objects 135.

The prediction module 155 may be configured to identify, for the selected compressed frame, the 3D object, the location (e.g., position) of the 3D object, and the orientation of the 3D object based on the returned metadata. The prediction module 155 may be configured to select one of the stored 3D objects 135 (hereinafter, the stored 3D object) and use the stored 3D object to decompress the selected frame.

In an example implementation, the mesh corresponding to the stored 3D object can be translated and oriented based on the location (e.g., position) of the identified 3D object and the orientation of the identified 3D object. The points corresponding to the translated and oriented stored 3D object can then be matched to points of the corresponding (e.g., the same) 3D object located in a (temporally) nearby, previously decoded keyframe. The prediction module 155 can then use the matched points of the corresponding 3D object to select or predict the pixels/blocks/patches in the keyframe to be used to regenerate (e.g., compute) the color values and/or color attributes of the translated and oriented stored 3D object based on the residual at the identified position in the selected frame.

The prediction module 155 may be further configured to regenerate (e.g., compute) color values and/or color attributes for the remainder of the selected frame based on the residual for the remainder of the selected frame and the corresponding pixels/blocks/patches of the keyframe. The frame reconstruction module 150 may be configured to reconstruct the selected frame based on the regenerated color values and/or color attributes for the translated and oriented stored 3D object and the regenerated color values and/or color attributes for the remainder of the selected frame. In an example implementation, the frame reconstruction module 150 may be configured to stitch, based on the position of the identified 3D object, the regenerated color values and/or color attributes for the translated and oriented stored 3D object together with the regenerated color values and/or color attributes for the remainder of the selected frame.
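The stitching step above can be sketched as compositing the regenerated object colors into the regenerated remainder of the frame at the identified position. The frame and object sizes, and the assumption of a rectangular object region, are illustrative only.

```python
# Stitch regenerated object colors into the regenerated remainder of a frame.
# Frames are rows of color values; the object is placed at (top, left).

def stitch(frame_rows, object_rows, top, left):
    """Return a copy of the frame with the object region overwritten."""
    out = [row[:] for row in frame_rows]
    for r, obj_row in enumerate(object_rows):
        for c, color in enumerate(obj_row):
            out[top + r][left + c] = color
    return out

background = [[0] * 4 for _ in range(4)]   # regenerated remainder of the frame
object_colors = [[9, 9], [9, 9]]           # regenerated proxy-object colors
reconstructed = stitch(background, object_colors, top=1, left=2)
# reconstructed[1][2] == 9; pixels outside the object region remain 0
```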

Example implementations can include identifying two or more 3D objects, regenerating color values and/or color attributes for each of the two or more 3D objects, and reconstructing the selected frame using each of the two or more 3D objects. The frame reconstruction module 150 may be configured to regenerate the video data 5 based on a plurality of reconstructed frames. The video data 5 (e.g., texture data and color data) can be rendered and color corrected for display on a display of the client device.

Compressing a video (or a frame of video) using an encoder configured to use the color prediction scheme based on using 3D objects as geometry proxies (e.g., the encoder 105) may not always yield the highest compression ratio (e.g., the smallest data size) for the video, a frame of the video, or a plurality of frames of the video. Accordingly, example implementations can include compressing a video, a frame of video, or a plurality of frames of video using two or more encoders, each encoder capable of compressing the video data using an identified color prediction scheme.

FIG. 2 illustrates a block diagram of a signal flow for storing compressed video (and/or compressed frames of video) according to an example implementation. As shown in FIG. 2, the first encoder 205 is an encoder that uses the color prediction scheme based on using 3D objects as geometry proxies (e.g., the encoder 105). In addition, at least one second encoder 210-1 is an encoder that uses color prediction scheme 1, at least one second encoder 210-2 is an encoder that uses color prediction scheme 2, and at least one second encoder 210-i is an encoder that uses color prediction scheme i. Each of the first encoder 205 and the at least one second encoder 210-1, 210-2, ..., 210-i may be configured to generate n frames totaling x bits and to communicate them to the compressed size comparator 215. The color prediction schemes 1, 2, ..., i may be a default prediction scheme of a coding standard, a configurable prediction scheme of a coding standard, a custom prediction scheme based on temporal displacement, an alternative prediction scheme based on using 3D objects as geometry proxies, and so forth.

The compressed size comparator 215 may be configured to select one of the outputs of the first encoder 205 and the at least one second encoder 210-1, 210-2, ..., 210-i to save in the video storage 125 (e.g., for later streaming to a client device). In an example implementation, the encoder output can be selected based on compression efficiency. For example, the compressed video (and/or compressed frames of video) having the fewest bits (e.g., the smallest value of x) can be saved.

In another example implementation, the color prediction scheme may be a preferred prediction scheme. In this implementation, the compressed video (and/or compressed frames of video) produced with the preferred color prediction scheme is saved unless some condition exists. For example, the condition may be based on compression efficiency. Unless the output of one of the at least one second encoder 210-1, 210-2, ..., 210-i is at least a threshold percentage (e.g., 10%, 15%, 20%, etc.) more efficient than the output of the first encoder 205, the output of the first encoder 205 can be selected. If more than one of the at least one second encoder 210-1, 210-2, ..., 210-i is at least the threshold percentage (e.g., 10%, 15%, 20%, etc.) more efficient than the output of the first encoder 205, the most efficient output (e.g., the smallest value of x) of the at least one second encoder 210-1, 210-2, ..., 210-i can be saved.
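The conditional selection logic above can be sketched as follows. The function name, the bit counts, and the 10% default threshold are illustrative assumptions, not values from the patent.

```python
# Comparator sketch: prefer the geometry-proxy encoder's output unless some
# alternative encoder's output is at least a threshold fraction smaller.

def select_output(preferred_bits, other_bits, threshold=0.10):
    """Return ('preferred', bits), or the label and size of a sufficiently
    smaller alternative encoder output."""
    best = min(range(len(other_bits)), key=lambda i: other_bits[i])
    if other_bits[best] <= preferred_bits * (1.0 - threshold):
        return ('encoder-%d' % (best + 1), other_bits[best])
    return ('preferred', preferred_bits)

# One alternative is 20% smaller than the preferred output, so it wins:
choice = select_output(1000, [950, 800, 990], threshold=0.10)
# choice == ('encoder-2', 800)
# No alternative clears the 10% threshold, so the preferred output is kept:
kept = select_output(1000, [950, 990], threshold=0.10)
# kept == ('preferred', 1000)
```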

In an example implementation, the signal flow for storing compressed video (and/or compressed frames of video) illustrated in FIG. 2 can be performed on the video frame by frame and/or on sets of frames. For example, each frame of the video can be encoded as described above. The compressed size comparator 215 can then select an encoder output based on efficiency or based on a conditionally preferred color prediction scheme. In an example implementation, keyframes may be used to select between frame-by-frame and frame-set compression, for example.

In an example implementation, one of the encoder outputs may always be saved. For example, the output of a default color prediction scheme can be saved to ensure backward compatibility. In FIG. 2, this implementation is illustrated by the dashed line from the at least one second encoder 210-i.

The signal flow for storing compressed video (and/or compressed frames of video) illustrated in FIG. 2 can be performed multiple times using different coding standards. For example, the signal flow illustrated in FIG. 2 can be performed using two or more of the H.264, H.265, HEVC, VP8, VP9, VP10, AV1, etc. coding standards. Accordingly, the color prediction scheme based on using 3D objects as geometry proxies (e.g., the encoder 105) can be implemented in two or more coding standards and on the same video (e.g., the video data 5). Thus, the video storage 125 can store multiple instances of the video (and/or compressed frames of the video), each instance having been compressed using a different standard and/or a different configuration of a standard. As a result, a streaming server can serve the video based on the capabilities of the requesting client device and/or of the network over which the video is to be communicated.

FIG. 3A illustrates a block diagram of an encoder prediction module according to an example implementation. As shown in FIG. 3A, the prediction module 115 includes the frame 110, a 3D object locator module 305, a 3D object matching module 310, the stored 3D objects 135, a stored 3D object translation module 315, keyframes 320, a block matching module 325, and a residual generation module 330.

In an example implementation, the stored 3D objects 135 and the 3D objects in the frame 110 may be defined by triangular meshes. A triangular mesh may be a collection of points connected by triangular faces. Each point can store various attributes. For example, the attributes can include each point's position, color, texture coordinates, and so forth. The attributes may include and/or indicate (e.g., a plurality of attributes may indicate) the orientation of the corresponding 3D object and/or the position of the corresponding 3D object in the frame 110 and/or an image of the video.

The 3D object locator module 305 may be configured to identify and locate 3D objects (hereinafter, 3D objects) in the frame 110 and in the keyframes 320. For example, machine vision techniques, computer vision techniques, computer image recognition techniques, and the like can be used to identify and locate 3D objects in the frame 110 and the keyframes 320. Once a 3D object has been identified, the 3D object can be located using a coordinate system (e.g., 2D Cartesian, 3D Cartesian, polar, etc.) as an attribute of the points associated with the mesh of the identified 3D object.

In an example implementation, a computer image recognition technique based on a convolutional neural network trained (via machine learning) using a number of known images can be used to identify 3D objects. For example, a block, a plurality of blocks, and/or a patch is selected and/or identified from the selected frame. The trained convolutional neural network can operate on the selected block, blocks, and/or patch. The result can be tested (e.g., an error test, a loss test, a divergence test, etc.). If the test yields a value below a threshold (or, alternatively, above a threshold, depending on the type of test), the selected block, blocks, and/or patch can be identified as a 3D object.

In an example implementation, a video frame that includes a previously identified 3D object of interest can also include (e.g., in a header, in frame metadata, etc.) a tag indicating that the previously identified 3D object of interest is included in the frame. The tag can include the identity and the location (e.g., the coordinate attributes of the points associated with the mesh) of the 3D object. For example, the video may be generated using a computer-generated imagery (CGI) tool (e.g., for a computer-animated film). Computer-generated characters can be identified and tagged in each frame. Further, a model (e.g., defined by a triangular mesh) for each identified 3D object of interest (e.g., a character in an animated film) can be stored as one of the stored 3D objects 135.

In an example embodiment, the mesh attributes of a 3D object may be sufficient to identify and locate the 3D object. Models including mesh attributes for a number of generic 3D objects of interest may be stored in (or in association with) the 3D object locator module. These models can be standardized. For example, models for a man, a woman, a teenager, a child, or more generally a human or a portion of a human (e.g., a body, a head, a hand, etc.) may be stored. Likewise, models for a dog, a cat, a deer, or more generally a quadruped or a portion of a quadruped (e.g., a body, a head, a leg, etc.) may be stored. The attributes of a model can then be used to search the frame for 3D objects with similar attributes.

The 3D object matching module 310 may be configured to match a 3D object located in the frame 110 (hereinafter referred to as the 3D object) with one of the stored 3D objects 135. In an example embodiment, computer image recognition techniques are used by the 3D object locator module 305 to identify the 3D object. Identifying the 3D object may also include assigning a unique ID from an ID database to the 3D object. The unique ID can then be used to search the stored 3D objects 135. If the unique ID is found among the stored 3D objects 135, the corresponding one of the stored 3D objects is a match for the 3D object.
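The unique-ID search can be pictured as a simple keyed lookup over the stored 3D objects. The IDs, labels, and record structure below are hypothetical, chosen only to illustrate the match/no-match outcomes:

```python
# Hypothetical store of 3D objects indexed by unique ID.
stored_3d_objects = {
    "char-001": {"label": "protagonist", "mesh_points": 1024},
    "char-002": {"label": "sidekick", "mesh_points": 768},
}

def match_by_id(unique_id):
    # Returns the stored object if the ID is known, otherwise None;
    # a None result corresponds to the fallback to standard prediction.
    return stored_3d_objects.get(unique_id)

print(match_by_id("char-002"))  # matching stored object
print(match_by_id("char-999"))  # no match -> None
```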

In an example embodiment, the tag of a 3D object found in a frame may be unique to that 3D object (e.g., there is a one-to-one relationship between tags and 3D objects). The tag can be used to search the stored 3D objects 135. If the tag is found among the stored 3D objects 135, the corresponding one of the stored 3D objects is a match for the 3D object.

In an example embodiment, one of the stored 3D objects 135 whose model has attributes similar to those of the 3D object may be identified as a partial match. The stored 3D objects 135 can then be filtered based on the partial matches. One or more attributes, or a combination of attributes, of the 3D object can then be used to search the stored 3D objects 135. If the one or more attributes or combination of attributes is found among the stored 3D objects 135, the corresponding one of the stored 3D objects is a match for the 3D object. The one or more attributes, or combination of attributes, can uniquely identify the 3D object relative to the stored 3D objects 135 with relatively high certainty. For example, the shape or relative position of body parts (e.g., face shape, nose shape, the relative positions of the eyes, nose, and mouth, etc.), the hair or skin color of a cartoon character, the shape and relative position of objects worn by a character (e.g., jewelry), a vehicle type (e.g., a car, a tractor, etc.), and so on can uniquely identify the 3D object relative to the stored 3D objects 135 (e.g., if the stored 3D objects 135 correspond to the set of 3D objects of interest for the video).
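A minimal sketch of this two-stage match, assuming an illustrative toy store of objects and invented attribute names (kind, hair, wears), might look as follows:

```python
# Hypothetical stored objects with coarse and fine attributes.
stored = [
    {"id": 1, "kind": "human", "hair": "red", "wears": "necklace"},
    {"id": 2, "kind": "human", "hair": "black", "wears": "hat"},
    {"id": 3, "kind": "quadruped", "hair": "brown", "wears": None},
]

def match_by_attributes(kind, **attrs):
    # Stage 1: partial match on the generic model kind (e.g., human).
    candidates = [o for o in stored if o["kind"] == kind]
    # Stage 2: require every distinguishing attribute to agree.
    for o in candidates:
        if all(o.get(k) == v for k, v in attrs.items()):
            return o["id"]
    return None

print(match_by_attributes("human", hair="red", wears="necklace"))  # 1
print(match_by_attributes("human", hair="blond"))  # no match -> None
```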

If the 3D object matching module 310 does not find a match for the 3D object among the stored 3D objects, the prediction module 115 may fall back to the standard prediction techniques defined by the coding standard. In other words, the color prediction scheme based on using a 3D object as a geometry proxy may not be used for that 3D object.

If the 3D object matching module 310 does not find a match for the 3D object among the stored 3D objects 135, the 3D object matching module 310 may instead be configured to add the 3D object to the stored 3D objects 135. For example, the 3D object matching module 310 may be configured to assign a unique ID or unique tag to the 3D object and to define a model (e.g., a triangular mesh including points and corresponding attributes) for the 3D object. The model can then be stored as one of the stored 3D objects 135, identified by the unique ID or unique tag.

In an example embodiment, the stored 3D objects 135 may have a predetermined mesh representation and/or data structure. Accordingly, a model may have a predefined size (e.g., number of points, number of faces, number of vertices, etc.), number of attributes, attribute types, and so on, based on the design of the stored 3D objects 135. Further, matching the 3D object to one of the stored 3D objects 135 may include the 3D object matching module 310 redefining the 3D object based on the design of the stored 3D objects 135 before searching the stored 3D objects 135 for a match.

The stored 3D object translation module 315 may be configured to translate (or transform) the matched one of the stored 3D objects 135 (hereinafter referred to as the stored 3D object). For example, the mesh corresponding to the stored 3D object can be translated and oriented to align with the orientation of the 3D object. Together, the translation and orientation may be referred to as a transformation. The stored 3D object translation module 315 may be configured to transform the matched one of the stored 3D objects 135 in association with both the frame 110 and the key frame 320. Accordingly, the stored 3D object translation module 315 may be configured to generate and transform a first 3D object proxy based on the stored 3D object for the 3D object as associated with the frame 110, and to generate and transform a second 3D object proxy based on the stored 3D object for the 3D object as associated with the key frame 320.
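Aligning a stored mesh with the pose of the object in a frame can be sketched as a rigid transform (here a rotation about the Z axis followed by a translation) applied to every mesh point. The axis, angle, and coordinates below are illustrative, not taken from the patent:

```python
import math

def transform_mesh(points, angle_deg, translation):
    # Rotate each (x, y, z) point about the Z axis by angle_deg degrees,
    # then translate it by the (tx, ty, tz) vector.
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    tx, ty, tz = translation
    out = []
    for x, y, z in points:
        out.append((x * cos_a - y * sin_a + tx,
                    x * sin_a + y * cos_a + ty,
                    z + tz))
    return out

mesh = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
print(transform_mesh(mesh, 90, (2.0, 0.0, 0.0)))
```

The parameters of such a transform (angle and translation vector here) are the kind of information that the metadata 20 would carry so that the decoder can repeat the same alignment.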

Transforming a 3D proxy may include articulating the mesh points (e.g., like a skeleton), using an autoencoder to generate a latent representation (described below) and locating the values associated with the latent representation, or sending the control points and values for a Bézier surface, a NURBS surface, or a subdivision surface. Each mesh may have a predetermined connectivity. The predetermined connectivity can allow a direct correspondence between the points of two meshes in different poses; therefore, the same texture parameterization can be used for both meshes.

The stored 3D object translation module 315 may be configured to generate the metadata 20. The metadata 20 includes information identifying the stored 3D object and information related to the translation and orientation of the stored 3D object. The information related to the translation and orientation of the stored 3D object can be used to perform the same translation and orientation of the stored 3D object at a later time (e.g., by the decoder 145). In an example embodiment, metadata is generated and stored for both the first 3D object proxy and the second 3D object proxy.

The block matching module 325 may be configured to match the points of the translated and oriented stored 3D object of the frame 110 with the corresponding points of the translated and oriented stored 3D object of the key frame 320. In an example embodiment, the block matching module 325 may be configured to wrap the mesh (e.g., a 3D mesh) representing the translated and oriented stored 3D object with color and/or texture attributes. For example, the block matching module 325 may be configured to map color attributes from the 3D object identified in the frame 110 onto the transformed first 3D object proxy, and to map color attributes from the 3D object identified in the key frame 320 onto the transformed second 3D object proxy. Mapping color attributes from the 3D object identified in the frame 110 onto the transformed first 3D object proxy may include converting the 3D object from 3D space (e.g., XYZ space) to 2D space (e.g., UV space) and/or converting the 3D object identified in the frame 110 from 2D space (e.g., UV space) to 3D space (e.g., XYZ space).

Each mesh may have a predetermined connectivity. The predetermined connectivity can allow a direct correspondence between the points of two meshes in different poses; therefore, the same texture parameterization can be used for both meshes. Accordingly, mapping color attributes may include identifying a pixel in a frame (e.g., the frame 110 and/or the key frame 320) based on its pixel coordinates, and then setting the color attribute of the point in the mesh representation of the transformed 3D object proxy that has the same coordinates to the color value of the identified pixel. In an example embodiment, more than one frame may be used to generate the pixel attributes. Further, pixel attributes from a regenerated frame (e.g., from the reconstruction loop in the encoder) may be used to generate the pixel attributes.
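The pixel-to-mesh-point color mapping can be sketched as a coordinate lookup: each mesh point carries 2D (u, v) coordinates, and its color attribute is set from the frame pixel at those coordinates. Points with no corresponding pixel keep a default color, matching the fallback described below. The pixel data and coordinates are illustrative:

```python
# Hypothetical frame pixels keyed by (u, v) coordinates.
frame_pixels = {
    (0, 0): (255, 0, 0),
    (1, 0): (0, 255, 0),
    (1, 1): (0, 0, 255),
}

def map_colors(mesh_points, pixels, default=(128, 128, 128)):
    # Points without a corresponding pixel keep a default color attribute,
    # so any residual computed later is based on the default.
    colored = []
    for point in mesh_points:
        colored.append({"uv": point, "color": pixels.get(point, default)})
    return colored

print(map_colors([(0, 0), (1, 1), (5, 5)], frame_pixels))
```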

In an example embodiment, the pixel attributes may be blended. Blending pixel attributes (or textures) can be done using a per-texel average of the corresponding texels from those textures that have information about the texel. Texels from different textures may carry confidence values (e.g., based on the temporal distance between the current frame and the frame from which the texture was generated). Unobserved texels in a particular texture can also be predicted (with their confidence lowered). For the current frame, the new pose of the mesh and its textures can then be used to obtain a prediction for (a portion of) the current frame, and so on.
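A hedged sketch of the confidence-weighted blending: each texture contributes its texel value weighted by a confidence (e.g., higher for textures generated from temporally closer frames). The weights and values are illustrative:

```python
def blend_texel(samples):
    # samples: list of (value, confidence) pairs for one texel, drawn
    # from the textures that have information about that texel.
    total_weight = sum(c for _, c in samples)
    if total_weight == 0:
        return None  # texel unobserved in every texture
    return sum(v * c for v, c in samples) / total_weight

# A texel seen in two textures: a recent one (confidence 3.0) and an
# older one (confidence 1.0).
print(blend_texel([(100.0, 3.0), (200.0, 1.0)]))  # 125.0
```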

In some cases, the frame 110 and/or the key frame 320 may not have a pixel corresponding to a point on the mesh representation of the transformed 3D object proxy. In that case, the mesh representation of the transformed 3D object proxy can retain the default color attribute for those points, and any residual color calculation can be based on the default color attribute.

The block matching module 325 may then use the matched points of the corresponding 3D objects to select or predict the pixels/blocks/patches in the key frame 320 to be used for generating the residual 15 for the 3D object in the frame 110. In an example embodiment, the stored 3D object and/or the 3D object proxies may be encoded using an autoencoder (described in more detail below) before translation and matching. Encoding the 3D object, the stored 3D object, and/or a 3D object proxy converts it into a latent representation. The latent representation includes fewer values (e.g., points) than the mesh representing the 3D object, the stored 3D object, and/or the 3D object proxy. Therefore, translating the 3D object, the stored 3D object, and/or the 3D object proxy in its latent representation involves manipulating fewer points. Likewise, mapping color attributes involves mapping fewer points when the 3D object, the stored 3D object, and/or the 3D object proxy is encoded as a latent representation.
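The size benefit of the latent representation can be illustrated with a fixed stand-in for the autoencoder: an "encoder" that pools groups of mesh values into one latent value and a "decoder" that expands the latent vector back. A real implementation would use a learned autoencoder; this only demonstrates that the latent representation carries fewer values than the mesh:

```python
def encode(values, factor=4):
    # Average-pool groups of `factor` values into one latent value.
    return [sum(values[i:i + factor]) / factor
            for i in range(0, len(values), factor)]

def decode(latent, factor=4):
    # Expand each latent value back into `factor` approximate values.
    return [v for v in latent for _ in range(factor)]

mesh_values = list(range(16))            # 16 mesh values
latent = encode(mesh_values)             # 4 latent values
print(len(latent), len(decode(latent)))  # 4 16
```

Operating on the 4-value latent vector instead of the 16-value mesh is the point: translation and color mapping touch fewer values.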

The residual generation module 330 may be configured to generate (or calculate) the residual 15 (e.g., a color displacement relative to the key frame 320) for the 3D object in the frame 110. For example, the residual generation module 330 may generate the residual 15 by subtracting the pixel attribute value of each point in the triangular mesh for the 3D object from the pixel attribute value of each matched point of the predicted pixel/block/patch in the key frame 320. In an example embodiment, the color attribute of a point in the mesh representation of the first 3D object proxy may be subtracted from the color attribute of the corresponding point (e.g., having the same point identity or the same position in the mesh sequence) in the mesh representation of the second 3D object proxy.
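Because the two proxies share the predetermined connectivity, point i in one mesh corresponds to point i in the other, and the residual is a per-point difference of color attributes. The sketch below follows the sign convention stated in the text (the frame proxy's color is subtracted from the key-frame proxy's color); the single-channel color values are illustrative:

```python
def color_residual(keyframe_colors, frame_colors):
    # Per-point color displacement: frame proxy color subtracted from
    # the corresponding key-frame proxy color.
    return [k - f for k, f in zip(keyframe_colors, frame_colors)]

keyframe_proxy = [118, 131, 140]  # per-point color attribute (one channel)
frame_proxy = [120, 130, 140]
print(color_residual(keyframe_proxy, frame_proxy))  # [-2, 1, 0]
```

Where the proxies agree, the residual is zero, which is what makes this representation compress well.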

In an example embodiment, the encoder 105 includes a reconstruction path (not shown). The reconstruction path includes several components or software implementations that together decode a frame using at least one inverse of the encoding processes described above. For example, the reconstruction path may include at least an inverse quantization process and an inverse transform process. When encoding the next consecutive frame, the reconstructed frame generated in the reconstruction path can be used in place of the key frame 320. Encoding the original frame incurs some loss; therefore, the reconstructed frame generated in the reconstruction path includes some artifacts (e.g., errors) compared to the original frame. These artifacts may or may not be corrected in the prediction module 115. In some implementations, artifacts resulting from the use of the reconstructed frame may be corrected by the color correction module 635 described below.

FIG. 3B shows a block diagram of another encoder prediction module according to an example embodiment. As shown in FIG. 3B, the prediction module 115 includes the frames 110, the 3D object locator module 305, the 3D object matching module 310, the stored 3D objects 135, the stored 3D object translation module 315, the block matching module 325, and the residual generation module 330.

The implementation of the prediction module 115 shown in FIG. 3B is substantially the same as that shown in FIG. 3A. However, the prediction module 115 shown in FIG. 3B does not generate the residual 15 based on a key frame. Instead, the residual generation module 330 of the prediction module 115 shown in FIG. 3B generates the residual 15 using the 3D object and the translated and oriented stored 3D object.

In an example embodiment, the translated and oriented stored 3D object has a direct point-to-point relationship with the 3D object. In other words, a point in the triangular mesh defining the translated and oriented stored 3D object (e.g., with position attributes x1, y1, z1) has a corresponding point in the triangular mesh defining the 3D object (e.g., with position attributes x1, y1, z1). Accordingly, the color attributes can be used to determine the relative color displacement between the 3D object and the stored 3D object. The residual generation module 330 may thus be configured to generate (or calculate) the residual 15 (e.g., a color displacement) for the 3D object in the frame 110 based on the stored object. For example, the residual generation module 330 may generate the residual 15 by subtracting the pixel attribute value of each point in the triangular mesh for the 3D object from the pixel attribute value of the corresponding point in the triangular mesh of the translated and oriented stored 3D object.

In an example embodiment, the residual may be generated using a translated 3D object proxy (e.g., the first 3D object proxy described above), which is based on the stored 3D object for the 3D object as associated with the frame 110 and has had the colors of the frame 110 mapped onto it, together with the stored 3D object with its default color attributes. Accordingly, the color attribute of a point in the mesh representation of the first 3D object proxy may be subtracted from the color attribute of the corresponding point (e.g., having the same point identity or the same position in the mesh sequence) in the mesh representation of the stored 3D object.

FIG. 4A shows a block diagram of a decoder prediction module according to an example embodiment. As shown in FIG. 4A, the prediction module 155 includes the frame 405, the metadata module 410, the 3D object matching module 415, the stored 3D objects 135, the stored 3D object translation module 420, the key frame 320, the block matching module 325, and the pixel regeneration module 425.

The frame 405 is a compressed frame selected from the compressed frames 140 as the frame to be decompressed. The key frame 320 is a decompressed key frame associated with the frame 405 (nearby, earlier in time, and/or later in time). The frame 405 has already been entropy decoded, inverse quantized, and inverse transformed by the decoding module 160. Accordingly, the frame 405 may include a derivative residual, which may be the same as (or approximately the same as) the residual generated by the prediction module 115.

The metadata module 410 may be configured to determine whether the frame 405 includes metadata and/or has associated metadata (e.g., the metadata 130). The metadata module 410 may be configured to read the metadata from a header associated with the frame 405. The metadata may include the metadata 20. The metadata may include data associated with at least one 3D object located in the frame 405 that was predicted using one of the stored 3D objects 135. The metadata may include attributes (e.g., mesh point attributes) associated with the position and/or orientation of the 3D object in the frame 405. The metadata may include information identifying the stored 3D object and information related to the translation and orientation of the stored 3D object. The information related to the translation and orientation of the stored 3D object can be used to perform the same translation and orientation of the stored 3D object as performed by the stored 3D object translation module 315.

The 3D object matching module 415 may be configured to match a 3D object located in the frame 405 (hereinafter referred to as the 3D object) with one of the stored 3D objects 135. In an example embodiment, the metadata module 410 outputs a unique ID or tag that can be used to search the stored 3D objects 135. If the unique ID or tag is found among the stored 3D objects 135, the corresponding one of the stored 3D objects 135 (hereinafter referred to as the stored 3D object) is a match for the 3D object.

The stored 3D object translation module 420 may be configured to translate the stored 3D object. For example, the mesh corresponding to the stored 3D object can be translated and oriented to align with the orientation of the 3D object as it was oriented in the frame 405 before the frame 405 was compressed; in other words, as it was oriented in the frame 110. The stored 3D object translation module 420 may be configured to translate the stored 3D object based on the information related to the translation and orientation of the stored 3D object included in the metadata.

The stored 3D object translation module 420 may be configured to translate the matched one of the stored 3D objects 135 in association with both the frame 405 and the key frame 320. Accordingly, the stored 3D object translation module 420 may be configured to generate a first 3D object proxy based on the stored 3D object for the 3D object as associated with the frame 405, and to generate a second 3D object proxy based on the stored 3D object for the 3D object as associated with the key frame 320. In an example embodiment, the second 3D object proxy may be generated once per key frame 320 and then used for each associated frame 405.

The block matching module 325 may be configured to match the points of the translated and oriented stored 3D object of the frame 405 with the corresponding points of the translated and oriented stored 3D object of the key frame 320. In an example embodiment, the block matching module 325 may be configured to map color attributes from the 3D object identified in the frame 405 onto the transformed first 3D object proxy, and to map color attributes from the 3D object identified in the key frame 320 onto the transformed second 3D object proxy. Mapping color attributes from the 3D object identified in the frame 405 onto the transformed first 3D object proxy may include converting the 3D object from 3D space (e.g., XYZ space) to 2D space (e.g., UV space) and/or converting the 3D object identified in the frame 405 from 2D space (e.g., UV space) to 3D space (e.g., XYZ space).

In an example embodiment, mapping color attributes may include identifying a pixel in a frame (e.g., the frame 405 and/or the key frame 320) based on its pixel coordinates, and then setting the color attribute of the point in the mesh representation of the transformed 3D object proxy that has the same coordinates to the color value of the identified pixel. In some cases, the frame 405 and/or the key frame 320 may not have a pixel corresponding to a point on the mesh representation of the transformed 3D object proxy. In that case, the mesh representation of the transformed 3D object proxy can retain the default color attribute for those points, and any residual color calculation can be based on the default color.

The block matching module 325 may then use the matched points of the corresponding 3D objects to select or predict the pixels/blocks/patches in the key frame 320 to be used for regenerating the pixels of the 3D object in the frame 405. The pixel regeneration module 425 may be configured to generate (or calculate) the pixel values of the 3D object 25 in the frame 405. For example, based on the residual at the identified position in the selected frame, the pixel regeneration module 425 may generate the pixel values by adding the pixel attribute value of each matched point of the predicted pixel/block/patch in the key frame 320 to the color value and/or color attribute of the corresponding point of the translated and oriented stored 3D object.
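Decoder-side pixel regeneration can be sketched as the inverse of residual generation: the residual decoded from the frame is combined with the color predicted from the key-frame proxy to recover the color of each mesh point. This assumes the residual is stored with a sign convention such that adding it to the predicted color recovers the frame color; the values are illustrative:

```python
def regenerate_colors(residuals, predicted_colors):
    # One residual per mesh point, added back to the color predicted
    # from the translated key-frame proxy.
    return [r + p for r, p in zip(residuals, predicted_colors)]

residuals = [2, -1, 0]           # decoded from the compressed frame
predicted = [118, 131, 140]      # from the translated key-frame proxy
print(regenerate_colors(residuals, predicted))  # [120, 130, 140]
```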

In an example embodiment, the 3D object, the stored 3D object, and/or the 3D object proxies have been encoded using an autoencoder. Accordingly, before translation and matching, the 3D object, the stored 3D object, and/or the 3D object proxies can be decoded using the autoencoder (described in more detail below). Decoding the stored 3D object or a 3D object proxy converts the latent representation of the 3D object, the stored 3D object, and/or the 3D object proxy into a regenerated mesh representation.

The pixel regeneration module 425 may further be configured to regenerate (e.g., calculate) the color values and/or color attributes of the remainder of the selected frame based on the residual, or remainder, of the selected frame and the corresponding pixels/blocks/patches of the key frame.

FIG. 4B shows a block diagram of another decoder prediction module according to an example embodiment. As shown in FIG. 4B, the prediction module 155 includes the frame 405, the metadata module 410, the 3D object matching module 415, the stored 3D objects 135, the stored 3D object translation module 420, the key frame 320, the block matching module 325, and the pixel regeneration module 425.

The implementation of the prediction module 155 shown in FIG. 4B is substantially the same as that shown in FIG. 4A. However, the prediction module 155 shown in FIG. 4B does not regenerate the color values of the 3D object based on a key frame. Instead, the pixel regeneration module 425 of the prediction module 155 shown in FIG. 4B regenerates the color values of the 3D object using the 3D object and the translated and oriented stored 3D object.

In an example embodiment, the translated and oriented stored 3D object has a direct point-to-point relationship with the 3D object. In other words, a point in the triangular mesh defining the translated and oriented stored 3D object (e.g., with position attributes x1, y1, z1) has a corresponding point in the triangular mesh defining the 3D object (e.g., with position attributes x1, y1, z1). Accordingly, the color attributes of the translated and oriented stored 3D object can be used to determine the color values of the 3D object.

Accordingly, the pixel regeneration module 425 may be configured to generate (or calculate) the color values for the 3D object in the frame 405 based on the stored object. For example, the pixel regeneration module 425 may regenerate the color values by adding the pixel value of each point in the triangular mesh for the 3D object, as read from the frame 405, to the color attribute value of the corresponding point in the triangular mesh of the translated and oriented stored 3D object. The translated and oriented stored 3D object with the calculated color attribute values is then output as the 3D object 25, the regenerated 3D object. The remainder of the frame 30 is regenerated as described above with reference to FIG. 4A.

As described above, each of the stored 3D objects 135 may be defined by a triangular mesh. A triangular mesh can be a collection of points connected by triangular faces. Each point can store various attributes; for example, the attributes may include the position, color, texture coordinates, and so on, of each point. Although a triangular mesh structure is referenced above, other polygons may also be used to define the stored 3D objects 135. Further, the attributes of each point may include other attributes that may or may not be useful in the context of this disclosure.

In an example embodiment, the stored 3D objects 135 may include the 3D objects of interest for a CGI movie. Accordingly, the stored 3D objects 135 may include 3D model or mesh data and corresponding attribute values for a large number of 3D characters. Typically, the 3D characters may include CGI actors (e.g., main characters, supporting characters, and other characters), CGI pets, CGI creatures, CGI monsters, and the like. As described above, each of these 3D characters may be a 3D object of interest stored as a 3D model or as mesh data among the stored 3D objects 135. Each 3D character may include a unique ID or tag.

In some implementations, a portion of a 3D object of interest may be stored as a 3D model or as mesh data in the stored 3D objects 135. Each of the portions of a 3D character may include a unique ID or tag. For example, at least one 3D character (e.g., a main character or a supporting character) may have associated 3D model or mesh data representing the character's head or face, and other associated 3D model or mesh data representing the character's body (note that the head, face, or body may also be divided into smaller portions). By dividing a character into multiple portions, a relatively high level of detail can be stored for the more dynamic portions of the character (e.g., the head, arms, and/or legs) as compared to the relatively less dynamic portions (e.g., the torso and/or shoulders).

Additionally, the mesh data may be configured to fit standard 3D models. For example, a first cubic model may be a standard size intended to fit a head model, while a second cubic model may be a standard size intended to fit a hand model. Other standard-shaped 3D models may include spheres, right prisms, rectangular prisms, pyramids, cylinders, and the like. For example, an arm or a leg may use a right-prism 3D model or a cylindrical 3D model. The standard 3D models may also place limits (maximum and/or minimum) on the number of points (e.g., vertices) used to define the 3D object.

Dividing a character into portions and storing each portion as a unique 3D object of interest in the stored 3D objects 135 could cause the size of the stored 3D objects 135 to grow exponentially. However, 3D model or mesh data representing a portion of one character can be reused for many characters. For example, a torso model representing the torso of character A may also be used for character B. The stored 3D objects 135 may also include metadata identifying the 3D model or mesh data that represents a standard portion of a character, together with deformation information corresponding to that standard portion for the particular character. In other words, the 3D model or mesh data represents a standard portion. For example, a single 3D model representing a torso may be used both as the 3D model representing the torso of character A, a tall thin man, and as the 3D model representing the torso of character B, a short stocky man.

As the number of stored 3D objects 135 increases (e.g., for a full-length CGI animated movie with many characters that uses a large number of standard 3D models), the amount of resources (e.g., memory) required to store the stored 3D objects 135 increases as well. For example, a streaming server may store 100,000, 1 million, or many millions of videos. As demand for video that includes 3D video increases (e.g., in virtual reality, augmented reality, 3D CGI movies, and the like), the percentage of videos stored at the streaming server as 3D video (typically stored as left-eye and right-eye 2D videos) will certainly increase as well. Therefore, when the techniques described herein are implemented for use with this growing number of videos, the number of stored 3D objects 135 necessarily increases, and thus the amount of resources (e.g., memory) required to store the stored 3D objects 135 will also increase. Furthermore, transferring the stored 3D objects 135 from the streaming server to a client device during a streaming event may require significant bandwidth. Therefore, for streaming operations, it may become desirable to efficiently encode and decode the stored 3D objects 135.

FIG. 5A shows a block diagram of a signal flow for encoding a 3D object, according to an example implementation. As shown in FIG. 5A, encoding a 3D object may involve at least the stored 3D objects 135, a neural network encoder 505, and a latent representation 510 of the 3D object.

As described above, each stored 3D object 135 may be defined by a triangular mesh. A triangular mesh may be a collection of points or vertices connected by triangular faces. Each point can store various attributes. In an example implementation, the number of points or vertices may be limited to a certain value.

The neural network encoder 505 may use generative modeling techniques to compress 3D objects. Example implementations of generative modeling techniques may include variational and/or convolutional autoencoders (VAEs), generative adversarial networks (GANs), and/or combined VAE-GANs. Although these generative modeling techniques are mentioned and/or discussed, example implementations are not limited thereto. Furthermore, although a neural-network-type encoder is discussed as an implementation for compressing 3D objects, other implementations for encoding 3D objects (including triangular meshes) are also within the scope of the present disclosure.

The neural network encoder 505 can compress a 3D object defined by a mesh with a fixed number of vertices and fixed connectivity into a relatively small number of variables, sometimes referred to as a latent space. For example, a VAE may be configured to learn a compact latent space for 3D shapes (e.g., the 3D shapes on which the VAE has been trained). Both the neural network of the neural network encoder 505 and the neural network of the neural network decoder 515 may be machine-trained neural networks and/or machine-trained convolutional neural networks.

The neural network of the neural network encoder 505 may include coefficients associated with one or more convolutions and/or filters that are applied to the mesh in order to encode the mesh. Each convolution may have C filters, a KxK mask (where K denotes the convolution kernel size), and a stride factor. The coefficients may correspond to one or more of the C filters, the KxK mask, and the stride factor. For example, the KxK mask may be a 3x3 mask. A 3x3 mask contains nine (9) variables used to perform a convolution on the mesh. In other words, a 3x3 mask has nine (9) cells, each of which contains a variable. Each variable may be one of the coefficients.

The neural network of the neural network encoder 505 may include layers with different numbers of neurons. The KxK spatial extent may include K columns and K (or L) rows. The KxK spatial extent may be 2x2, 3x3, 4x4, 5x5, or (as KxL) 2x4, and so on. The convolution includes centering the KxK spatial extent on a mesh point, convolving over all mesh points within the spatial extent, and generating a new value for that mesh point based on all of the convolutions (e.g., a sum) over the mesh points in the spatial extent. Then, based on the stride, the spatial extent is moved to a new mesh point, and the convolution is repeated for the new mesh point. The stride may be, for example, one (1) or two (2), where a stride of 1 moves to the next mesh point, while a stride of 2 skips a mesh point.
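The center-sum-stride behavior described above can be illustrated with a one-dimensional analogue in plain Python (a 1D mask sliding over a row of grid values; the actual encoder would operate over mesh neighborhoods, and the mask values shown are arbitrary):

```python
def conv1d(values, mask, stride):
    """Center the mask on successive grid points (stride apart), sum the
    masked neighborhood, and emit the new value for each visited point."""
    half = len(mask) // 2
    out = []
    for center in range(half, len(values) - half, stride):
        window = values[center - half:center + half + 1]
        out.append(sum(w * m for w, m in zip(window, mask)))
    return out

values = [1, 2, 3, 4, 5, 6]
mask = [1, 0, -1]  # a simple 3-tap mask standing in for one row of a KxK mask
print(conv1d(values, mask, stride=1))  # [-2, -2, -2, -2]
print(conv1d(values, mask, stride=2))  # [-2, -2] -- stride 2 skips points
```

With stride 2 the output has roughly half as many points as the input, which is how stacked strided convolutions shrink the mesh data toward the small latent space.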

The VAE may take as input to the neural network encoder 505 the position coordinates of each point of the mesh (e.g., a relatively large amount of data for most 3D shapes of interest, or for shapes with a visually significant amount of detail) and, by convolving the neural network with the position coordinates of each point of the mesh, generate a reduced (preferably relatively small) number of variables (e.g., from 64 to 128 variables). The configuration of the C filters, the KxK mask, and the stride factor of the neural network may determine the number of variables in the latent space used for the latent representation 510 of the 3D object.

FIG. 5B shows a block diagram of a signal flow for decoding a 3D object, according to an example implementation. As shown in FIG. 5B, decoding a 3D object may involve at least a latent representation 510 of the 3D object, a neural network decoder 515, and the stored 3D objects 135. The latent representation 510 of the 3D object may be the latent representation 510 generated by encoding a stored 3D object 135 using the neural network encoder 505.

The neural network of the neural network decoder 515 may include coefficients associated with one or more convolutions and/or filters that are applied in order to decode the mesh. Each convolution may have C filters, a KxK mask (where K denotes the convolution kernel size), and a stride factor. The coefficients may correspond to one or more of the C filters, the KxK mask, and the stride factor. For example, the KxK mask may be a 3x3 mask. A 3x3 mask contains nine (9) variables used to perform a convolution on the mesh. In other words, a 3x3 mask has nine (9) cells, each of which contains a variable. Each variable may be one of the coefficients.

The neural network of the neural network decoder 515 may include layers with different numbers of neurons. The KxK spatial extent may include K columns and K (or L) rows. The KxK spatial extent may be 2x2, 3x3, 4x4, 5x5, or (as KxL) 2x4, and so on. The convolution includes centering the KxK spatial extent on a mesh point, convolving over all mesh points within the spatial extent, and generating a new value for that mesh point based on all of the convolutions (e.g., a sum) over the mesh points in the spatial extent. Then, based on upsampling, the spatial extent is moved to a new mesh point, and the convolution is repeated for the new mesh point. The upsampling factor may be, for example, one (1) or two (2), where a factor of 1 moves to the next mesh point, while a factor of 2 skips a mesh point.

The convolution may include one or more zero-padded convolution operations and coefficient reorganization. In an example implementation, zero-padding includes inserting zeros between the non-zero pixels, and coefficient reorganization may include a convolution, centered on the zero-padded pixels, with the KxK mask rotated 180 degrees.
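The zero-padding step can be sketched in one dimension for clarity: zeros are inserted between the non-zero samples before the convolution with the rotated mask is applied, and for a 1D mask, a 180-degree rotation is simply a reversal (these helpers are illustrative, not from the disclosure):

```python
def zero_pad(values, factor=2):
    """Insert (factor - 1) zeros between consecutive samples, a common
    way to upsample a signal before a smoothing convolution."""
    out = []
    for v in values:
        out.append(v)
        out.extend([0] * (factor - 1))
    # Drop the trailing zeros after the last real sample.
    return out[:-(factor - 1)] if factor > 1 else out

def rotate_180(mask):
    """1D analogue of rotating a KxK mask by 180 degrees."""
    return mask[::-1]

print(zero_pad([1, 2, 3]))    # [1, 0, 2, 0, 3]
print(rotate_180([1, 2, 3]))  # [3, 2, 1]
```

Convolving the zero-padded signal with the rotated mask is one standard way to realize a transposed (upsampling) convolution, which matches the decoder's role of growing the latent representation back toward the full mesh.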

The VAE may take the variables in the latent space of the latent representation 510 of the 3D object as input to the neural network decoder 515 and regenerate the position coordinates for each point of the mesh. The configuration of the C filters, the KxK mask, and the stride factor of the neural network may determine the number of regenerated mesh points.

Accordingly, the neural network decoder 515 may be configured to use the variables of the corresponding one of the latent representations 510 to reproduce an approximation of the shape and/or model of the 3D object as it was before being compressed by the neural network encoder 505. The neural network decoder 515 may be configured to generate an approximation of the 3D model corresponding to one of the stored 3D objects 135, including generating position coordinates for the points of a mesh having the same number of vertices and the same connectivity as the stored 3D object 135 had before compression by the neural network encoder 505.

According to an example implementation, each stored 3D object 135 may be defined by a mesh with the same number of points, each point having position coordinates. Furthermore, the VAE may include the neural network encoder 505 and the neural network decoder 515, each including a neural network with the same configuration of C filters, KxK mask, and stride factor. As a result, the number of variables in the latent space for each of the latent representations 510 generated by the neural network encoder 505 is the same. Likewise, the number of points in the mesh regenerated by the neural network decoder 515 is the same.

In the example implementations described above, the stored 3D objects 135 may include the 3D objects of interest for a CGI movie. Thus, the stored 3D objects 135 are likely to include 3D model or mesh data and corresponding attribute values for a large number of 3D characters. As described above, a 3D model may represent an entire 3D object of interest and/or a portion of a 3D object of interest. A standard 3D model may be used to represent the entire 3D object of interest and/or a portion of the 3D object of interest.

In this example implementation, the stored 3D objects 135 correspond to the 3D objects of interest for the CGI movie (e.g., as video data 5) and may not include any other 3D objects not contained in the CGI movie. Typically, supervised machine learning methods learn one or more rules or functions that map between example inputs and desired outputs, as predetermined by an operator. Supervised and semi-supervised machine learning methods can use labeled data sets. Thus, machine training (or training) of the neural network encoder 505 and the neural network decoder 515 may be accomplished using a supervised or semi-supervised machine learning method that uses the stored 3D objects 135 (e.g., with each 3D object labeled with a unique ID and/or tag) as both the input data and the comparison data.

The many training iterations used during the training process may yield an approximately logarithmic gain in the accuracy of the reconstructed mesh. Therefore, in a semi-supervised learning method, a threshold based on time, error (e.g., loss), and/or number of iterations may be used to stop further training. For example, the threshold may be set as a reconstruction error or loss based on the number of points and the position coordinates of the reconstructed mesh. In an example implementation, the reconstruction error may be a loss based on a shape reconstruction loss (Lr) and a regularization prior loss (Lp).

The shape reconstruction loss (Lr) may be based on obtaining a near-identity transformation between the neural network encoder 505 and the neural network decoder 515. The shape reconstruction loss (Lr) may be computed based on the point-wise distance between the stored 3D object 135 generated in FIG. 5B and the stored 3D object 135 of FIG. 5A. The regularization prior loss (Lp) may be computed on the latent variables using a divergence algorithm based on the variational distribution of the vector derived from the stored 3D object 135 of FIG. 5A, or on the distribution of a previous iteration, the divergence algorithm being further based on a prior distribution over the latent variables. The VAE loss, or total loss, becomes L = Lr + λLp, where λ ≥ 0 and controls the similarity of the variational distribution to the prior distribution.
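The total loss can be sketched numerically. In this sketch the shape reconstruction loss is taken as the mean point-wise squared distance between the original and reconstructed meshes, and `prior_loss` is a hypothetical scalar standing in for the divergence term, which is not spelled out here:

```python
def shape_reconstruction_loss(original, reconstructed):
    """Mean point-wise squared distance between corresponding mesh points."""
    assert len(original) == len(reconstructed)
    total = 0.0
    for p, q in zip(original, reconstructed):
        total += sum((a - b) ** 2 for a, b in zip(p, q))
    return total / len(original)

def vae_loss(original, reconstructed, prior_loss, lam):
    """Total loss L = Lr + lambda * Lp, with lambda >= 0."""
    assert lam >= 0
    return shape_reconstruction_loss(original, reconstructed) + lam * prior_loss

orig = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
recon = [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]  # one point is off by 1 in z
print(vae_loss(orig, recon, prior_loss=0.25, lam=2.0))  # 0.5 + 2.0*0.25 = 1.0
```

Setting λ = 0 reduces the total loss to the pure reconstruction term; larger λ pushes the latent variables toward the prior at some cost in reconstruction fidelity.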

In an example implementation, the training process includes training the neural network encoder 505 and the neural network decoder 515 by iterating sequentially through the signal flow for encoding a 3D object shown in FIG. 5A and the signal flow for decoding a 3D object shown in FIG. 5B. Continuing the example above, the stored 3D objects 135 correspond to the 3D objects of interest for a CGI movie. After each iteration (e.g., encoding and decoding each stored 3D object 135), the stored 3D object 135 generated in FIG. 5B (e.g., the reconstructed 3D object) is compared to the stored 3D object 135 of FIG. 5A. In an example implementation, the reconstruction error may be computed as a loss based on the shape reconstruction loss (Lr) and the regularization prior loss (Lp). If the reconstruction error is above the threshold (as described above), the variables for the coefficients, which correspond to one or more of the C filters, the KxK mask, and the stride factor for at least one of the neural network encoder 505 and the neural network decoder 515, may be modified. In addition, the variable λ associated with computing the VAE loss may be modified. In an example implementation, the variables may be modified based on the variable values and results of the previous iteration as well as the variable values and results of the current modification. After the variables are modified, the next training iteration can begin.
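The iterate-compare-modify cycle can be sketched abstractly. The `encode`, `decode`, `loss_fn`, and `adjust` callables below are toy placeholders, since the actual networks and update rule are not specified here:

```python
def train(objects, encode, decode, loss_fn, adjust, threshold, max_iters):
    """Repeat encode/decode over all stored objects, compare each
    reconstruction to its original, and modify coefficients until the
    reconstruction error falls below the threshold (or iterations run out)."""
    for iteration in range(max_iters):
        worst = max(loss_fn(obj, decode(encode(obj))) for obj in objects)
        if worst <= threshold:
            return iteration  # converged: stop further training
        adjust(worst)         # modify filter/mask/stride coefficients
    return max_iters

# Toy stand-ins: "encoding" halves each value, and the adjustable "decoder"
# multiplies back by a learned gain that each adjust step nudges toward 2.
state = {"gain": 1.0}
encode = lambda xs: [x / 2 for x in xs]
decode = lambda zs: [z * state["gain"] for z in zs]
loss_fn = lambda a, b: max(abs(x - y) for x, y in zip(a, b))
adjust = lambda err: state.__setitem__("gain", state["gain"] + 0.5)
iters = train([[2.0, 4.0]], encode, decode, loss_fn, adjust, 1e-9, 10)
print(iters)  # 2 -- the gain reaches 2.0 after two adjustments
```

The threshold check mirrors the stopping criterion above: once the reconstructed mesh is close enough to the original, further iterations yield diminishing (approximately logarithmic) returns.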

In an example implementation, streaming and/or downloading a video may include transmitting initialization data for use by the requesting client device (e.g., for decompressing the compressed frames) before transmitting or streaming the compressed frames of the video. FIG. 6A shows a block diagram of a signal flow for streaming video and rendering the video on a client device. As shown in FIG. 6A, the streaming server 605-1 includes the video storage 125, an active streaming module 610, a streaming initialization module 615, and a transceiver 620. The video storage 125 includes the compressed frames 140, the metadata 130, and the latent representations 510 of the 3D objects. The streaming server 605-1 may be communicatively coupled to a client device 650 via a transceiver 625. As shown in FIG. 6A, the client device 650 includes the transceiver 625, the decoder 145, a renderer 630, a color correction module 635, and a display 640.

The active streaming module 610 may be configured to stream the frames of a selected video to a requesting client device (e.g., the client device 650). The active streaming module 610 may receive a request for a next frame from the client device 650. The active streaming module 610 may then select the next frame from the compressed frames 140 and select the corresponding metadata from the metadata 130. In an example implementation, the selected next frame may be a plurality of compressed frames, and the selected metadata may be a plurality of corresponding metadata elements. For example, the plurality of compressed frames may be a plurality of frames bounded by (and possibly including) a previous (e.g., temporally before the frame) key frame and a future (e.g., temporally after the frame) key frame.
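Selecting the run of frames bounded by the surrounding key frames can be sketched as follows; the frame labels and `is_key` flags are illustrative stand-ins for the compressed frames 140 and their key-frame markers:

```python
def frames_bounded_by_keyframes(frames, is_key, requested):
    """Return the slice of frames from the previous key frame through the
    next key frame (inclusive) around the requested frame index."""
    start = requested
    while start > 0 and not is_key[start]:
        start -= 1          # walk back to the previous key frame
    end = requested
    while end < len(frames) - 1 and not is_key[end + 1]:
        end += 1            # walk forward to just before the next key frame
    if end < len(frames) - 1:
        end += 1            # include the future key frame itself
    return frames[start:end + 1]

frames = ["f0", "f1", "f2", "f3", "f4", "f5"]
is_key = [True, False, False, True, False, True]
print(frames_bounded_by_keyframes(frames, is_key, 1))  # ['f0', 'f1', 'f2', 'f3']
```

Sending the whole key-frame-bounded run at once lets the client decode the intermediate (delta-coded) frames without further round trips.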

The active streaming module 610 may then pass the selected next frame (or frames) and the selected metadata (or metadata elements) to the transceiver 620. The transceiver 620 may then construct one or more data packets including the selected next frame and the selected metadata, assign the address of the client device 650 to the data packets, and transmit the data packets to the client device 650.

The streaming initialization module 615 may be configured to select data associated with a streamed video in response to a first request for the video by the client device 650. For example, a user of the client device 650 may download the video for future playback and/or begin streaming playback. The streaming initialization module 615 may select the stored 3D objects 135 for the video as a set of latent representations 510 of 3D objects and select the metadata corresponding to the video from the metadata 130. The streaming initialization module 615 may then pass the selected stored 3D objects 135 and the selected metadata to the transceiver 620. The transceiver 620 may then construct a data packet including the selected stored 3D objects 135 and the selected metadata, assign the address of the client device 650 to the data packet, and transmit the data packet to the client device 650. In an example implementation, the transceiver 620 may construct more than one data packet.

The client device 650 receives, via the transceiver 625, the data packet including the stored 3D objects 135 and the metadata. The transceiver 625 passes the stored 3D objects 135 and the metadata to the decoder 145. The decoder 145 may be configured to store the stored 3D objects 135 and the metadata in association with the requested video. The client device 650 then receives, via the transceiver 625, the data packet including the selected next frame and the selected metadata. On initial playback, the selected next frame may be the first frame. The transceiver 625 passes the selected next frame and the selected metadata to the decoder 145.

The decoder 145 then decodes the next frame (as described in more detail above) using the stored 3D objects 135 and the metadata associated with the next frame. In an example implementation, the decoder 145 may be implemented as, or in, a graphics card and/or a chip (e.g., an ASIC on a computer motherboard) that includes a graphics processing unit (GPU), configured to offload the burden from the central processing unit (CPU) of the device 650. The GPU may be configured to process large blocks of video data in parallel. The GPU may be configured to process (e.g., decompress) the mesh data and generate pixel data from the mesh data. The decoder passes texture data and color data to the renderer 630.

3D rendering systems typically use a Cartesian coordinate system (x for left/right, y for up/down, and z for near/far). The Cartesian coordinate system provides a precise mathematical way to locate and represent objects in space. For simplicity, the image of a frame can be viewed as the space in which the objects will reside. The space can be oriented based on the camera position. For example, the camera may be placed at the origin, looking directly down the z-axis. Thus, translational motion (relative to the camera position) is z forward/backward, y up/down, and x left/right. Objects are then projected into the space based on their coordinates relative to the origin of the space and any repositioning of the camera. Note that objects and/or the camera can move from frame to frame.
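A minimal sketch of this convention: the camera sits at the origin looking down the z-axis, a world-space point is translated relative to the camera, and a simple pinhole perspective divide by depth yields the 2D image coordinates (`focal` is an assumed parameter, not from the disclosure):

```python
def project(point, camera_pos=(0.0, 0.0, 0.0), focal=1.0):
    """Translate a world-space point relative to the camera at the origin,
    then perspective-divide by depth (z) to get 2D image coordinates."""
    x = point[0] - camera_pos[0]
    y = point[1] - camera_pos[1]
    z = point[2] - camera_pos[2]
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (focal * x / z, focal * y / z)

# A point 4 units down the z-axis, offset 2 right and 1 up:
print(project((2.0, 1.0, 4.0)))                   # (0.5, 0.25)
# Moving the camera 2 units forward makes the same point appear larger:
print(project((2.0, 1.0, 4.0), (0.0, 0.0, 2.0)))  # (1.0, 0.5)
```

The second call shows why camera motion alone makes a stationary object appear to move from frame to frame, which is the case the later frame-to-frame prediction discussion relies on.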

The renderer 630 may be configured to render the texture data and the color data for display. Rendering or drawing can be implemented in a graphics pipeline with a geometry stage and a rendering stage. The GPU may be configured to handle both the geometry stage and the rendering stage. The geometry stage, executed by the CPU or the GPU, is configured to handle all polygon activity and convert 3D spatial data into pixels. Geometry stage processes may include, but are not limited to, scene (e.g., background) geometry generation, object motion based on camera motion, object motion, object transformations (e.g., rotation, translation, and/or scaling), object visibility (e.g., occlusion and culling), and polygon (e.g., triangle) generation.

The rendering stage, executed by the GPU's 3D hardware accelerator, is configured to manage memory and pixel activity and to process the pixels to be drawn to the display 640. Rendering stage processes may include, but are not limited to, shading, texturing, depth buffering, and display.

Rendering the texture data and the color data for display may include using 3D shaders (e.g., vertex shaders, geometry shaders, and the like) to draw the meshes associated with the frame. The shaders may be configured to generate primitives. The shaders may be configured to transform the 3D position and texture of each vertex in a mesh into 2D coordinates (e.g., primitives) that appear on a display (e.g., the display 640). Rendering the texture data and the color data for display may also include performing rasterization. Rasterization may include assigning pixel (e.g., color) values to the primitives based on the texture data and the color data.

As described above, the geometry stage includes building the 3D mesh in a 2D coordinate system, while the rendering stage includes adding color and texture to the mesh. Thus, the output of the decoder 145 may include object position data and color/texture data. Accordingly, the compressed frames 140 may include compressed object position data and compressed color/texture data. The compressed object position data and the compressed color/texture data may be based on displacement relative to a previous or next frame, on camera translational motion data, on absolute data (e.g., x, y, z coordinates, RGB values, and/or the like), and/or on combinations or variations thereof. Thus, the encoder 105 may encode the scene of a frame to generate a compressed frame 140 that includes the compressed object position data and the compressed color/texture data.

The data associated with the rendered next frame may be passed to the color correction module 635. The color correction module 635 may be configured to color correct the data associated with the rendered next frame. Color correction may include compensating for color differences between frames, compensating for color differences between multiple views of the same scene, correcting object deformation (warping), correcting object boundary deformation, and the like.

The color correction module 635 may pass the color-corrected frame to the display 640. The display 640 may be configured to display the frames as a sequence of frames representing the requested video. In an example implementation, the display 640 includes and/or has an associated buffer and/or queue. The buffer and/or queue may be a first-in/first-out buffer and/or queue. Accordingly, the color correction module 635 may pass the color-corrected frames to the buffer and/or queue. The display 640 includes a component configured to select the next frame from the buffer and/or queue.
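The first-in/first-out hand-off between the color correction module and the display component can be sketched with a `collections.deque` (the class and method names are illustrative):

```python
from collections import deque

class FrameQueue:
    """FIFO buffer: color-corrected frames go in at the tail; the display
    component takes the next frame to show from the head."""
    def __init__(self):
        self._frames = deque()

    def push(self, frame):
        """Called by the color correction module as each frame is finished."""
        self._frames.append(frame)

    def next_frame(self):
        """Called by the display component; None if no frame is buffered."""
        return self._frames.popleft() if self._frames else None

q = FrameQueue()
for name in ("frame_1", "frame_2", "frame_3"):
    q.push(name)
print(q.next_frame())  # frame_1 -- frames come out in arrival order
print(q.next_frame())  # frame_2
```

The FIFO ordering is what preserves the temporal sequence of the video: the display always shows frames in the order the color correction module produced them, and the buffer absorbs jitter between decode rate and display rate.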

In an example implementation, streaming and/or downloading a video may include transmitting data for use by the requesting client device (e.g., for decompressing the compressed frames) in parallel with (e.g., nearly simultaneously with) transmitting or streaming the compressed frames of the video. FIG. 6B shows a block diagram of another signal flow for streaming video and rendering the video on a client device. As shown in FIG. 6B, the streaming server 605-2 includes the video storage 125, the active streaming module 610, and the transceiver 620.

In the example implementation shown in FIG. 6B, the active streaming module 610 is further configured to select, from the latent representations 510 of 3D objects, at least one 3D object of interest used in compressing the selected next frame as at least one of the stored 3D objects 135. The active streaming module 610 passes the at least one of the stored 3D objects 135, together with the selected next frame and the selected metadata, to the transceiver 620. The transceiver 620 may then construct a data packet including the selected next frame, the selected metadata, and the at least one of the stored 3D objects 135, assign the address of the client device 650 to the data packet, and transmit the data packet to the client device 650.

The client device 650 then receives, via the transceiver 625, the data packet including the selected next frame, the selected metadata, and the at least one of the stored 3D objects 135. The transceiver 625 transmits the selected next frame, the selected metadata, and the at least one of the stored 3D objects 135 to the decoder 145. The decoder 145 may be further configured to add the at least one of the stored 3D objects 135 to the stored 3D objects 135 associated with the requested video and/or to use the at least one of the stored 3D objects 135 to initialize the stored 3D objects 135 for the requested video.

Furthermore, although color prediction through the geometry-proxy techniques described herein may be most advantageous when encoding/decoding 3D objects in a video that exhibit dynamic, non-linear, and/or random motion from frame to frame (sometimes referred to herein as dynamic 3D objects), other 3D objects, or non-dynamic 3D objects, can also be advantageously encoded/decoded using color prediction through the geometry-proxy techniques.

For example, color prediction through a geometry proxy can be used to encode/decode 3D objects that appear to move from frame to frame because of camera or scene translation. In this example, a stationary 3D object may appear to be moving from frame to frame because the camera capturing the scene is moving (e.g., in a predictable manner and/or direction). As another example, color prediction through a geometry proxy can be used to encode/decode 3D objects that appear to move in a predictable manner within the scene. In this example, a 3D object (e.g., a vehicle such as a train, a car, or an airplane) may move from frame to frame in a predictable manner (e.g., at a constant speed and/or in a constant direction). These objects are sometimes referred to herein as translating 3D objects. As yet another example, color prediction through a geometry proxy can be used to encode/decode 3D objects that do not appear to move within the scene. In this example, a stationary or fixed 3D object (e.g., the background of the scene, furniture at a fixed location within the scene, or a slowly moving object in the distance) may appear stationary from frame to frame (e.g., absent any camera or scene translation). These objects are sometimes referred to herein as fixed or background 3D objects.

The other types of 3D objects exemplified above can be advantageously encoded/decoded using color prediction through geometry-proxy techniques for a number of reasons. For example (this is not intended to be an exhaustive list): at least one position (e.g., a position in a frame) of a translating 3D object, a fixed 3D object, and/or a background 3D object can be transmitted from the streaming server (e.g., the streaming server 605-1, 605-2) to the client device (e.g., the client device 650) for use in a renderer and/or in rendering operations; the number of frames between keyframes can be increased because geometry proxies (e.g., the stored 3D objects 135) are available; Z-order layering techniques can be used for both dynamic 3D objects and non-dynamic 3D objects; temporally earlier and temporally later frames (e.g., stored in a queue before being rendered) and the geometry proxies can be used to re-create lost (e.g., non-retransmitted) frames; geometry proxies can be used to encode/decode 3D objects that appear and disappear between keyframes; and/or geometry proxies can be used to encode/decode out-of-frame background camera or scene translation.

Accordingly, the metadata 20 described above can be used in the renderer 630 to render translating 3D objects, fixed 3D objects, and/or background 3D objects more efficiently. In an example implementation, a first object may be identified as a background 3D object. The metadata 20 may identify the first object as, for example, one of the stored 3D objects 135. The metadata 20 may also identify the origin coordinates (e.g., x0, y0, z0) of the first object (e.g., as the background of the frame) as a position attribute of one of the points in the mesh representing the first object.

In addition, at least one second object may be identified as a translating 3D object and/or a fixed 3D object. The metadata 20 may identify the at least one second object as, for example, one of the stored 3D objects 135. The metadata 20 may also identify the position (e.g., xn, yn, zn) of the at least one second object as a position attribute of one of the points in the mesh representing the at least one second object. The identified position of the at least one second object may be relative to another object. The relative position of the at least one second object may change from frame to frame (e.g., for a translating 3D object). The relative position of the at least one second object may be fixed from frame to frame (e.g., for a fixed 3D object).

For example, the identified position of the at least one second object may be relative to the identified origin coordinates of the background (e.g., the first object) for the frame. Using a relative position for the at least one second object can allow six degrees of freedom when positioning the at least one second object relative to the background of the frame. In other words, when positioned relative to the background of the frame, the at least one second object can have translational motion (e.g., forward/backward, up/down, and/or left/right) as well as rotational motion (e.g., pitch, roll, and/or yaw).
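As a sketch of the six degrees of freedom described above, the snippet below poses a point relative to a background origin using a translation plus one rotational component (yaw); a full implementation would also apply pitch and roll, for example via a 3x3 rotation matrix or a quaternion.

```python
import math

def pose_point(point, translation, yaw):
    """Apply a simplified pose (translation plus yaw only, for brevity)
    to a point given relative to the background origin."""
    x, y, z = point
    c, s = math.cos(yaw), math.sin(yaw)
    # Rotate about the vertical (y) axis, then translate.
    xr, zr = c * x + s * z, -s * x + c * z
    tx, ty, tz = translation
    return (xr + tx, y + ty, zr + tz)

# A point one unit in front of the object, yawed 90 degrees and moved +2 in x:
moved = pose_point((0.0, 0.0, 1.0), (2.0, 0.0, 0.0), math.pi / 2)
assert all(abs(a - b) < 1e-9 for a, b in zip(moved, (3.0, 0.0, 0.0)))
```

Extending this with pitch and roll rotations (and forward/backward, up/down, left/right translation) yields the full six degrees of freedom the text mentions.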

As described above, the first object may be identified as a background 3D object and may be one of the stored 3D objects 135. In addition, the at least one second object may be identified as a translating 3D object and/or a fixed 3D object, which may also be one of the stored 3D objects 135. Accordingly, the first object and/or the at least one second object may be encoded by an autoencoder (implementing a neural network) into a latent representation of the corresponding stored 3D object 135 (as described in more detail above). The latent representations (e.g., as one or more of the latent representations 510 for 3D objects) and information (e.g., metadata) related to the structure of the neural network can be communicated from the streaming server 605-1, 605-2 to the client 650. The client 650 can use the autoencoder's decoder (implementing a neural network) to reconstruct the one or more corresponding stored 3D objects 135 (as discussed in more detail above).

FIGS. 7 to 11 are flowcharts of methods according to example embodiments. The steps described with reference to FIGS. 7 to 11 may be performed as a result of executing software code stored in a memory (e.g., at least one memory 1210, 1310) associated with an apparatus (e.g., as shown in FIGS. 12 and 13) and executed by at least one processor (e.g., at least one processor 1205, 1305) associated with the apparatus.

However, alternative embodiments are contemplated, such as a system embodied as a special-purpose processor. The special-purpose processor may be a graphics processing unit (GPU). The GPU may be a component of a graphics card. The graphics card may also include video memory, a random-access-memory digital-to-analog converter (RAMDAC), and driver software. The video memory may be a frame buffer that stores digital data representing the scene of an image or frame. The RAMDAC may be configured to read the contents of the video memory, convert the contents into an analog RGB signal, and send the analog signal to a display or monitor. The driver software may be software code stored in the memory described above (e.g., the at least one memory 1210, 1310). The software code may be configured to implement the steps described below (and/or the components, modules, and signal flows described above).

Although the steps described below are described as being performed by a processor and/or a special-purpose processor, the steps are not necessarily performed by the same processor. In other words, at least one processor and/or at least one special-purpose processor may perform the steps described below with reference to FIGS. 7 to 11.

FIG. 7 shows a block diagram of a method for compressing a frame of a video according to at least one example embodiment. As shown in FIG. 7, in step S705 a file including a plurality of frames of a video is received. For example, the file may be saved to or transmitted to a server (e.g., a streaming server). The file can include the video. The video can be a CGI 3D movie. The file can include a plurality of 3D objects of interest (e.g., characters in the 3D movie).

In an example implementation, each of the plurality of 3D objects of interest may be defined by a triangle mesh. A triangle mesh may be a collection of points connected by triangular faces. Each point can store various attributes. For example, the attributes can include each point's position, color, texture coordinates, and the like. The mesh for each of the plurality of 3D objects of interest may have the same number of points, with each point having the same attributes. Accordingly, when stored in memory, the mesh for each of the plurality of 3D objects of interest can be of approximately the same size (e.g., number of bits).
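The triangle-mesh representation described above can be sketched as a list of points, each carrying the named attributes (position, color, texture coordinates), plus triangular faces indexing into those points; the class names are illustrative, not taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class MeshPoint:
    """One vertex of a triangle mesh; the attribute set mirrors
    the ones named in the text."""
    position: tuple  # (x, y, z)
    color: tuple     # (r, g, b)
    uv: tuple        # texture coordinates (u, v)

@dataclass
class TriangleMesh:
    points: list     # list of MeshPoint
    faces: list      # list of (i, j, k) index triples into `points`

mesh_a = TriangleMesh(
    points=[MeshPoint((0, 0, 0), (255, 0, 0), (0.0, 0.0)),
            MeshPoint((1, 0, 0), (0, 255, 0), (1.0, 0.0)),
            MeshPoint((0, 1, 0), (0, 0, 255), (0.0, 1.0))],
    faces=[(0, 1, 2)],
)
mesh_b = TriangleMesh(points=list(mesh_a.points), faces=[(0, 1, 2)])
# Meshes with the same point count and the same per-point attribute set
# occupy approximately the same storage, as the text notes.
assert len(mesh_a.points) == len(mesh_b.points)
```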

In step S710, one of the plurality of frames is selected. For example, each of the plurality of frames can be a compression target. The plurality of frames can be compressed in time order. Accordingly, in an initial step, the temporally first frame is selected. The next frame can then be selected sequentially.

In step S715, a 3D object is identified in the selected frame and in a keyframe. The identified 3D object may be a dynamic 3D object, a non-dynamic 3D object, a fixed 3D object, a background 3D object, or the like. For example, machine-vision, computer-vision, and/or computer-image-recognition techniques can be used to identify and locate the 3D object.

In an example implementation, computer-image-recognition techniques based on a convolutional neural network trained (via machine learning) using a plurality of known images can be used to identify the 3D object. For example, a block, a plurality of blocks, and/or a patch is selected and/or identified from the selected frame. The trained convolutional neural network can operate on the selected block, blocks, and/or patch. The result can be tested (e.g., an error test, a loss test, a divergence test, etc.). If the test produces a value below a threshold (or, alternatively, above a threshold, depending on the type of test), the selected block, blocks, and/or patch can be identified as a 3D object.
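The threshold test described above can be sketched as follows, with a stub scoring function standing in for the trained convolutional neural network's error/loss output; the threshold value is an arbitrary placeholder.

```python
def identify_objects(patches, score_fn, threshold=0.25):
    """Flag patches whose recognition error falls below a threshold.
    `score_fn` is a stand-in for a trained CNN's error/loss output;
    the threshold is a placeholder, not a value from the patent."""
    return [patch for patch in patches if score_fn(patch) < threshold]

# Stub scorer: a patch containing a known character yields a low error.
scores = {"character": 0.1, "sky": 0.9, "wall": 0.7}
found = identify_objects(list(scores), scores.get)
assert found == ["character"]
```

For a test where a higher score indicates a better match (as the text notes for some test types), the comparison would simply be inverted.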

In an example implementation, a frame of the video may include a tag indicating that a previously identified 3D object of interest is included in the frame. The tag can include the identity and location of the 3D object. For example, the video may be generated using computer-generated imagery (CGI) tools (e.g., a computer-animated movie). The computer-generated characters can be identified and tagged in each frame. In addition, a model of each of the identified 3D objects of interest (e.g., the identified characters) can be stored as the stored 3D objects 135.

In an example implementation, the 3D object may be defined by a triangle mesh. A triangle mesh may be a collection of points connected by triangular faces. Each point can store various attributes. For example, the attributes can include each point's position, color, texture coordinates, and the like. The attributes can include and/or indicate (e.g., a plurality of attributes can indicate) the orientation of the corresponding 3D object and/or the position of the corresponding 3D object in the selected frame.

Accordingly, in an example implementation, the mesh attributes of the 3D object can be sufficient to identify and locate the 3D object. A model including mesh attributes for a plurality of 3D objects of interest can be stored as the stored 3D objects 135. The models can be standardized. For example, a model for a man, a woman, a teenager, a child, or, more generally, a human or a portion of a human (e.g., a body, a head, a hand, etc.) can be stored as a stored 3D object 135. As another example, a model for a dog, a cat, a deer, or, more generally, a quadruped or a portion of a quadruped (e.g., a body, a head, legs, etc.) can be stored as a stored 3D object 135. The attributes of the models can then be used to search the frame for 3D objects having similar attributes.

In step S720, the position and orientation of the 3D object within the selected frame and the keyframe are determined. For example, the 3D object may be defined by a mesh having a plurality of points, each point having at least one attribute. The position of the 3D object may be the position of the 3D object within the frame. Accordingly, the position of the 3D object within the frame may be based on a 3D Cartesian coordinate system for the frame. For example, at least one point of the mesh may be located at an x, y, z position within the frame. The orientation of the 3D object may be based on the position-coordinate attribute of each point in the mesh defining the 3D object. If the 3D object is a character in motion, the orientation of the 3D object may be the pose of the character in the frame and/or the keyframe.

In step S725, the 3D object is matched to a stored 3D object. For example, the 3D object may be matched to one of the 3D objects of interest (e.g., the stored 3D objects 135). In an example implementation, the identity of the 3D object, as generated by the computer-image-recognition technique, can be used to search the stored 3D objects 135. In an example implementation, the tag of the 3D object found in the frame can be matched to the tag of one of the stored 3D objects 135. In an example implementation, a model among the stored 3D objects 135 having attributes similar to those of the 3D object can be identified as a match. The matched stored 3D object can be used as a geometry proxy.

In step S730, the stored 3D object is transformed based on the 3D object. For example, the stored 3D object can be deformed based on the 3D object. The stored 3D object can be resized based on the 3D object. The stored 3D object can be oriented based on the 3D object. The stored 3D object can be rotated based on the 3D object. The stored 3D object can be translated based on the 3D object. Transforming the stored 3D object can cause the stored 3D object (if rendered on a display side-by-side with the 3D object) to appear substantially similar (e.g., in pose) to the 3D object. Note that the stored 3D object may be defined by a different mesh and may have different color attributes than the 3D object.
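A minimal sketch of the transformation step, assuming only resizing and translation (deformation and rotation are omitted for brevity): the stored proxy's points are scaled and shifted so that their centroid and overall extent match those of the 3D object found in the frame.

```python
def transform_proxy(proxy_points, target_points):
    """Scale and translate a stored proxy mesh so its centroid and
    overall extent match the 3D object found in the frame (a sketch;
    rotation and deformation are omitted)."""
    def centroid(pts):
        n = len(pts)
        return tuple(sum(p[i] for p in pts) / n for i in range(3))

    def extent(pts, c):
        # Largest axis-aligned distance from the centroid; fall back to 1
        # for degenerate (single-point) meshes to avoid division by zero.
        return max(max(abs(p[i] - c[i]) for i in range(3)) for p in pts) or 1.0

    cp, ct = centroid(proxy_points), centroid(target_points)
    scale = extent(target_points, ct) / extent(proxy_points, cp)
    return [tuple(scale * (p[i] - cp[i]) + ct[i] for i in range(3))
            for p in proxy_points]

proxy = [(-1, 0, 0), (1, 0, 0)]
target = [(8, 5, 5), (12, 5, 5)]   # twice as wide, centered at (10, 5, 5)
assert transform_proxy(proxy, target) == [(8.0, 5.0, 5.0), (12.0, 5.0, 5.0)]
```

A full implementation would also solve for the best-fit rotation (and possibly a non-rigid deformation) so the proxy matches the object's pose, as the text describes.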

In an example implementation, the mesh corresponding to the stored 3D object can be rotated about, and translated along, the x-axis, the y-axis, and the z-axis to align with the orientation of the 3D object. The matched one of the stored 3D objects 135 associated with the frame and the keyframe can be transformed. Accordingly, the stored 3D object can be used to generate a first 3D object proxy based on the stored 3D object for the 3D object associated with the frame. In addition, the stored 3D object can be used to generate a second 3D object proxy based on the stored 3D object for the 3D object associated with the keyframe. Thus, the first 3D object proxy can be transformed based on the 3D object identified in the frame, and the second 3D object proxy can be transformed based on the 3D object identified in the keyframe.

In step S735, the selected frame is compressed by a color-prediction scheme using the transformed stored 3D object. For example, the points in the mesh defining the transformed stored 3D object can be matched to the points of the corresponding (e.g., the same) 3D object in a (temporally) nearby, previously encoded keyframe. A prediction technique can then use the matched points of the corresponding 3D object to select or predict the pixels/blocks/patches in the keyframe to be used in computing a residual (e.g., a color displacement relative to the keyframe) for the 3D object in the selected frame.

In an example implementation, the transformed first 3D object proxy can be wrapped (with color and/or texture) corresponding to the 3D object identified in the frame, and the transformed second 3D object proxy can be wrapped (with color and/or texture) corresponding to the 3D object identified in the keyframe. Mapping the color attributes from the 3D object identified in the frame 110 onto the transformed first 3D object proxy can include converting the 3D object from 3D space (e.g., XYZ space) to 2D space (e.g., UV space) and/or converting the 3D object identified in the frame 110 from 2D space (e.g., UV space) to 3D space (e.g., XYZ space). The residual for the 3D object can be generated by subtracting the color attribute of each point in the mesh representation of the first 3D object proxy from the color attribute of the corresponding point (e.g., the point having the same point identity or the same position in the mesh sequence) in the mesh representation of the second 3D object proxy.
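The residual computation described above can be sketched as a per-point subtraction between the two wrapped proxies, assuming the meshes' points correspond one-to-one by position in the point sequence (the sign convention is illustrative).

```python
def color_residual(frame_proxy_colors, keyframe_proxy_colors):
    """Per-point color residual between two wrapped proxies: subtract
    the keyframe proxy's color at each mesh point from the current
    frame proxy's color at the corresponding point (points are assumed
    to correspond by sequence position)."""
    return [tuple(a - b for a, b in zip(ca, cb))
            for ca, cb in zip(frame_proxy_colors, keyframe_proxy_colors)]

key = [(100, 100, 100), (50, 60, 70)]   # keyframe proxy per-point RGB
cur = [(110, 100, 90), (50, 65, 70)]    # current-frame proxy per-point RGB
assert color_residual(cur, key) == [(10, 0, -10), (0, 5, 0)]
```

Because the two proxies are instances of the same stored mesh, the point-for-point correspondence the subtraction relies on holds by construction; the residuals are typically small when the object's appearance changes little between the keyframe and the current frame.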

In an example implementation, two or more 3D objects are identified in the selected frame. Each identified 3D object is predicted using the prediction technique described above. The remainder of the frame is predicted using standard prediction techniques to generate residuals. Compressing the selected frame can include performing a series of encoding processes on the residuals. For example, the residuals can be transformed, quantized, and entropy encoded.
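Of the encoding passes named above, the quantization step can be sketched as uniform scalar quantization of the residual values; the transform and entropy-coding passes are omitted, and the step size is an arbitrary placeholder.

```python
def quantize(residuals, step=4):
    """Uniform scalar quantization of residual values (a sketch of one
    of the encoding passes; the step size is a placeholder)."""
    return [round(r / step) for r in residuals]

def dequantize(levels, step=4):
    """Inverse mapping used by the decoder to reconstruct residuals."""
    return [level * step for level in levels]

res = [0, 3, -5, 17]
recon = dequantize(quantize(res))
# Quantization is lossy: each value is recovered only to within step/2.
assert all(abs(a - b) <= 2 for a, b in zip(res, recon))
```

In a full encoder, the quantized levels would then be entropy coded (e.g., with an arithmetic or variable-length code) before being written to the bitstream.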

In addition, the prediction scheme can generate metadata associated with the selected frame. The metadata can include data associated with at least one 3D object located in the selected frame that has been predicted using one of the 3D objects of interest (e.g., the stored 3D objects 135). The metadata can include attributes (e.g., mesh-point attributes) associated with the position and/or orientation of the 3D object in the selected frame and the keyframe.

In step S740, the compressed selected frame is stored together with metadata identifying the 3D object and the position and orientation of the 3D object. For example, the compressed selected frame and the metadata can be stored in a memory associated with the streaming server. The compressed selected frame and the metadata can be stored in association with a plurality of compressed frames corresponding to the video (or a portion thereof).

FIG. 8 shows a block diagram of another method for compressing a frame of a video according to at least one example embodiment. As shown in FIG. 8, in step S805 a file including a plurality of frames of a video is received. For example, the file may be saved to or transmitted to a server (e.g., a streaming server). The file can include the video. The video can be a CGI 3D movie. The file can include a plurality of 3D objects of interest (e.g., characters in the 3D movie).

In an example implementation, each of the plurality of 3D objects of interest may be defined by a triangle mesh. A triangle mesh may be a collection of points connected by triangular faces. Each point can store various attributes. For example, the attributes can include each point's position, color, texture coordinates, and the like. The mesh for each of the plurality of 3D objects of interest may have the same number of points, with each point having the same attributes. Accordingly, when stored in memory, the mesh for each of the plurality of 3D objects of interest can have approximately the same size (e.g., number of bits).

In an example implementation, the mesh for each of the plurality of 3D objects of interest can be compressed (e.g., using the techniques described above with reference to FIG. 5A). For example, the mesh for each of the plurality of 3D objects of interest can be compressed using a generative modeling technique (e.g., using a neural network, a convolutional neural network, a VAE, etc.). The mesh for each of the plurality of 3D objects of interest can be compressed using a neural-network encoder with a convolutional neural network whose elements are selected and trained based on a machine-trained generative modeling technique configured to generate a reduced number of variables associated with the mesh attributes and position for each 3D object. The generated, reduced number of variables associated with the mesh attributes and position for a 3D object is sometimes referred to as a compact latent representation for the 3D object.
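As a stand-in for the trained neural-network encoder described above, the sketch below reduces a flattened mesh-attribute vector to a compact latent representation by projecting it onto a small basis; a learned encoder/decoder would replace the hand-picked orthonormal basis used here.

```python
def encode_latent(attributes, basis):
    """Project a flattened mesh-attribute vector onto a small basis to
    obtain a compact latent representation (a linear stand-in for the
    trained neural encoder described in the text)."""
    return [sum(a * b for a, b in zip(attributes, row)) for row in basis]

def decode_latent(latent, basis):
    """Map the latent vector back to the attribute space. Exact only
    when the basis is orthonormal and spans the input."""
    dim = len(basis[0])
    return [sum(latent[r] * basis[r][i] for r in range(len(basis)))
            for i in range(dim)]

# A toy orthonormal basis keeping two of four attribute dimensions.
basis = [[1, 0, 0, 0],
         [0, 1, 0, 0]]
latent = encode_latent([3.0, 4.0, 0.0, 0.0], basis)
assert len(latent) == 2                         # fewer variables than the input
assert decode_latent(latent, basis) == [3.0, 4.0, 0.0, 0.0]
```

The essential property mirrored here is that the latent vector has far fewer variables than the full mesh-attribute vector, which is what makes transmitting latent representations (rather than full meshes) attractive for streaming.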

In step S810, one of the plurality of frames is selected. For example, each of the plurality of frames can be a compression target. The plurality of frames can be compressed in time order. Accordingly, in an initial step, the temporally first frame is selected. The next frame can then be selected sequentially.

In step S815, a 3D object is identified in the selected frame and in a keyframe. The identified 3D object may be a dynamic 3D object, a non-dynamic 3D object, a stationary 3D object, a background 3D object, or the like. For example, machine-vision, computer-vision, and/or computer-image-recognition techniques can be used to identify and locate the 3D object.

In an example implementation, computer-image-recognition techniques based on a convolutional neural network trained (via machine learning) using a plurality of known images can be used to identify the 3D object. For example, a block, a plurality of blocks, and/or a patch is selected and/or identified from the selected frame. The trained convolutional neural network can operate on the selected block, blocks, and/or patch. The result can be tested (e.g., an error test, a loss test, a divergence test, etc.). If the test produces a value below a threshold (or, alternatively, above a threshold, depending on the type of test), the selected block, blocks, and/or patch can be identified as a 3D object.

In an example implementation, a frame of the video may include a tag indicating that a previously identified 3D object of interest is included in the frame. The tag can include the identity and location of the 3D object. For example, the video may be generated using computer-generated imagery (CGI) tools (e.g., a computer-animated movie). The computer-generated characters can be identified and tagged in each frame. In addition, a model for each identified 3D object of interest (e.g., each identified character) can be stored as a stored 3D object 135.

In an example implementation, the 3D object may be defined by a triangle mesh. A triangle mesh may be a collection of points connected by triangular faces. Each point can store various attributes. For example, the attributes can include each point's position, color, texture coordinates, and the like. The attributes can include and/or indicate (e.g., a plurality of attributes can indicate) the orientation of the corresponding 3D object and/or the position of the corresponding 3D object in the selected frame.

Accordingly, in an example implementation, the mesh attributes of the 3D object can be sufficient to identify and locate the 3D object. A model including mesh attributes for a plurality of 3D objects of interest can be stored as the stored 3D objects 135. The models can be standardized. For example, a model for a man, a woman, a teenager, a child, or, more generally, a human or a portion of a human (e.g., a body, a head, a hand, etc.) can be stored as a stored 3D object 135. As another example, a model for a dog, a cat, a deer, or, more generally, a quadruped or a portion of a quadruped (e.g., a body, a head, legs, etc.) can be stored as a stored 3D object 135. The attributes of the models can then be used to search the frame for 3D objects having similar attributes.

In step S820, the position and orientation of the 3D object within the selected frame and the keyframe are determined. For example, the 3D object may be defined by a mesh having a plurality of points, each point having at least one attribute. The position of the 3D object may be the position of the 3D object within the frame. Accordingly, the position of the 3D object within the frame may be based on a 3D Cartesian coordinate system for the frame. For example, at least one point of the mesh may be located at an x, y, z position within the frame. The orientation of the 3D object may be based on the position-coordinate attribute of each point in the mesh defining the 3D object. If the 3D object is a character in motion, the orientation of the 3D object may be the pose of the character in the frame and/or the keyframe.

In step S825, the 3D object is matched to a latent representation for a 3D object. For example, the 3D object can be matched to one of the 3D objects of interest (e.g., the stored 3D objects 135). The matched 3D object can then be encoded into a latent representation for the 3D object, as described above. For example, the 3D object can be matched to one of the latent representations for the 3D objects of interest (e.g., the latent representations 510 for the stored 3D objects). In an example implementation, the identity of the 3D object, as generated by the computer-image-recognition technique, can be used to search the stored latent representations 510 for 3D objects. In an example implementation, the tag of the 3D object found in the frame can be matched to the tag of one of the stored latent representations 510 for 3D objects. In an example implementation, one of the stored latent representations 510 for 3D objects whose model has attributes similar to the compressed mesh attributes of the 3D object can be identified as a match.

In step S830, the latent representation is transformed based on the 3D object. For example, the stored 3D object can be deformed based on the 3D object. The stored 3D object can be resized based on the 3D object. The stored 3D object can be oriented based on the 3D object. The stored 3D object can be rotated based on the 3D object. The stored 3D object can be translated based on the 3D object. Transforming the stored 3D object can cause the stored 3D object (if rendered side by side with the 3D object on a display) to appear substantially similar to the 3D object. Note that the stored 3D object may be defined by a different mesh and may have different color attributes than the 3D object.

In an example implementation, the coordinates of each point of the latent representation of the 3D object can be rotated about, and translated along, the x, y and z axes to align with the orientation of the 3D object. The latent representations of the matched 3D objects associated with the frame and the keyframe can be transformed. Accordingly, a latent representation of the 3D object can be generated as a first 3D object proxy based on the stored 3D object for the 3D object associated with the frame. Further, a latent representation of the 3D object can be generated as a second 3D object proxy based on the stored 3D object for the 3D object associated with the keyframe. Thus, the first 3D object proxy can be transformed based on the 3D object identified in the frame, and the second 3D object proxy can be transformed based on the 3D object identified in the keyframe.
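The per-point rotation and translation described above can be sketched as follows. This is a hedged illustration: the single z-axis rotation, the angle and the offset are invented for the example, while an actual implementation would align about and along all three axes.

```python
import math

def rotate_z(p, theta):
    """Rotate a point (x, y, z) about the z axis by theta radians."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z)

def transform_points(points, theta, offset):
    """Rotate each proxy point, then translate it to align with the target pose."""
    ox, oy, oz = offset
    return [tuple(a + b for a, b in zip(rotate_z(p, theta), (ox, oy, oz)))
            for p in points]

proxy = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
aligned = transform_points(proxy, math.pi / 2, (2.0, 0.0, 0.0))
print(aligned)  # first point rotates to (0, 1, 0), then shifts to (2, 1, 0)
```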

In an example implementation, the stored 3D object is stored as a compact latent representation based on the 3D object. Accordingly, before transforming the stored 3D object, the stored 3D object can be regenerated (e.g., decompressed using the technique described above with reference to FIG. 5B) from the compact latent representation based on the 3D object. For example, the variables in the compact latent space for the stored 3D object can be input to a neural-network decoder to regenerate the points of the mesh defining the stored 3D object, together with the position coordinates for each point of the mesh.
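A sketch of the latent-to-mesh decoding step described above. A real system would use a trained neural-network decoder; this stand-in uses a single linear layer with invented weights, purely to show the shape of the computation (latent variables in, mesh point coordinates out).

```python
def decode_latent(latent, weights, biases):
    """Map a compact latent vector to mesh points; each output is one (x, y, z)."""
    points = []
    for w_row, b in zip(weights, biases):
        coord = [sum(w * z for w, z in zip(w_tri, latent)) + b_i
                 for w_tri, b_i in zip(w_row, b)]
        points.append(tuple(coord))
    return points

latent = [0.5, -1.0]                       # compact latent representation
weights = [                                # one 3x2 weight block per mesh point
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0], [0.5, 0.5]],
]
biases = [[0.0, 0.0, 0.0], [0.1, 0.1, 0.1]]
print(decode_latent(latent, weights, biases))
```

Note that the number of regenerated points is fixed by the decoder's structure, not by the input, which mirrors the fixed-size property described later for the VAE.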

In step S835, the selected frame is compressed with a color-prediction scheme using the transformed stored 3D object having the compact latent representation for the 3D object. As described above, the transformed stored 3D object having the compact latent representation for the 3D object can be a transformed, regenerated stored 3D object. For example, points in the mesh defining the transformed, regenerated stored 3D object can be matched to points of the corresponding (e.g., same) 3D object located in a nearby (in time), previously encoded keyframe. A prediction technique can then use the matched points of the corresponding 3D object to select or predict pixels/blocks/patches in the keyframe for use in computing a residual (e.g., a color displacement relative to the keyframe) for the 3D object in the selected frame.

In an example implementation, the transformed first 3D object proxy can be wrapped (with color and/or texture) corresponding to the 3D object identified in the frame, and the transformed second 3D object proxy can be wrapped (with color and/or texture) corresponding to the 3D object identified in the keyframe. Mapping the color attributes from the 3D object identified in frame 110 to the transformed first 3D object proxy can include converting the 3D object from 3D space (e.g., XYZ space) to 2D space (e.g., UV space) and/or converting the 3D object identified in frame 110 from 2D space (e.g., UV space) to 3D space (e.g., XYZ space). In an example implementation, the residual for the 3D object can be generated by subtracting the color attribute of each point in the latent representation of the first 3D object proxy from the color attribute of the corresponding point (e.g., the point having the same coordinates or at the same position) in the latent representation of the second 3D object proxy.

In an example implementation, the latent representation of the transformed, wrapped first 3D object proxy and the latent representation of the transformed, wrapped second 3D object proxy can be decoded. As described above, decoding a latent representation of a 3D object can regenerate a mesh representation that includes the color attributes of the 3D object. The residual for the 3D object can be generated by subtracting the color attribute of each point in the regenerated mesh representation of the first 3D object proxy from the color attribute of the corresponding point (e.g., the point having the same point identification or at the same position in the mesh sequence) in the regenerated mesh representation of the second 3D object proxy.
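The point-by-point color subtraction described above can be sketched as follows. This follows the subtraction direction stated in the text (second proxy minus first proxy), with point correspondence taken simply by index in the mesh sequence; the color values are invented for the example.

```python
def color_residual(first_proxy_colors, second_proxy_colors):
    """Per-point residual: second proxy's color minus first proxy's color."""
    return [tuple(b - a for a, b in zip(fst, snd))
            for fst, snd in zip(first_proxy_colors, second_proxy_colors)]

first_proxy_colors = [(100, 50, 25), (200, 120, 80)]    # frame-side proxy
second_proxy_colors = [(110, 55, 20), (195, 125, 85)]   # keyframe-side proxy
residual = color_residual(first_proxy_colors, second_proxy_colors)
print(residual)  # small per-point color displacements
```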

In an example implementation, two or more 3D objects are identified in the selected frame. Each identified 3D object is predicted using the prediction technique described above. The remainder of the frame uses standard prediction techniques to generate residuals. Compressing the selected frame can include performing a series of encoding processes on the residuals. For example, the residuals can be transformed, quantized and entropy encoded.
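The transform/quantize/entropy-encode chain mentioned above can be sketched as follows. This is a toy illustration: the transform is omitted (identity), quantization is uniform, and zlib stands in for a real entropy coder; a production codec would use a DCT-like transform and a context-adaptive arithmetic coder.

```python
import struct
import zlib

def encode_residual(residual, step=4):
    quantized = [round(v / step) for v in residual]       # uniform quantization
    raw = struct.pack(f"{len(quantized)}b", *quantized)   # serialize as signed bytes
    return zlib.compress(raw)                             # stand-in "entropy coding"

def decode_residual(bitstream, step=4):
    raw = zlib.decompress(bitstream)
    quantized = struct.unpack(f"{len(raw)}b", raw)
    return [q * step for q in quantized]                  # dequantize

residual = [10, 5, -5, -5, 5, 5]
bits = encode_residual(residual)
print(decode_residual(bits))  # values recovered up to quantization error
```

Quantization makes the chain lossy; the decoded values differ from the originals by at most half the quantization step.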

In addition, the prediction scheme can generate metadata associated with the selected frame. The metadata can include data associated with at least one 3D object located in the selected frame that has been predicted using one of the 3D objects of interest (e.g., the stored 3D objects 135). The metadata can include attributes (e.g., mesh-point attributes) associated with the position and/or orientation of the 3D object in the selected frame and the keyframe. The metadata can include information associated with the neural network of the autoencoder used to generate the latent representation of the 3D object (e.g., for encoding) and to regenerate the mesh representation including the color attributes of the 3D object (e.g., for decoding).

In step S840, the compressed selected frame is stored together with metadata identifying the 3D object and the position and orientation of the 3D object. For example, the compressed selected frame and the metadata can be stored in a memory associated with a streaming server. The compressed selected frame and the metadata can be stored in relation to a plurality of compressed frames corresponding to the video (or a portion thereof).

FIG. 9 illustrates a block diagram of a method for decompressing and rendering a frame of a video, according to at least one example embodiment. As shown in FIG. 9, in step S905, a data packet including at least one compressed frame of the video is received. For example, a client device can request the next frame of the video from a streaming server. In response to the request, the streaming server can select the next frame (or frames), determine whether the selected next frame (or frames) has associated metadata, and select the associated metadata. The streaming server can generate a data packet (or packets) including the selected next frame (or frames) and the selected associated metadata, and transmit the data packet (or packets) to the requesting client device.

In step S910, at least one frame is selected for decompression. For example, a frame can be selected from the received data packet (or packets). In step S915, it is determined whether the frame includes metadata. In response to determining that the frame includes metadata, processing continues to step S925. Otherwise, processing continues to step S920, and the selected frame is decoded by some other prediction scheme (e.g., a prediction scheme not based on using 3D objects as geometry proxies).

For example, when the data packet is transmitted from the streaming server to the client, a header associated with the data packet for the selected frame can be configured to contain the metadata. Accordingly, the header associated with the data packet for the selected frame can be read (e.g., via transceiver 625) and communicated, together with the selected frame, to a decoder (e.g., decoder 145). Therefore, determining that the selected frame includes metadata can include determining that metadata has been transmitted with the selected frame. Alternatively, determining that the selected frame includes metadata can include determining that metadata is stored in association with the selected frame (e.g., stored in the decoder).
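One way the header-carried metadata described above could be laid out is sketched below. The length-prefixed JSON wire format is entirely an assumption for illustration; the patent does not specify a packet layout.

```python
import json
import struct

def pack_frame(metadata, frame_bytes):
    """Prefix the compressed frame bytes with a length-prefixed metadata header."""
    header = json.dumps(metadata).encode("utf-8")
    return struct.pack(">I", len(header)) + header + frame_bytes

def unpack_frame(packet):
    """Read the metadata header back and return (metadata, frame_bytes)."""
    (hlen,) = struct.unpack(">I", packet[:4])
    metadata = json.loads(packet[4:4 + hlen])
    return metadata, packet[4 + hlen:]

pkt = pack_frame({"object_id": "obj-042", "keyframe": 12}, b"\x00\x01\x02")
meta, frame = unpack_frame(pkt)
print(meta, frame)
```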

In step S925, a 3D object is identified in the selected frame based on the metadata. For example, the metadata can include data associated with at least one 3D object (e.g., at least one of the stored 3D objects 135) located in the selected frame that has been predicted using a 3D object as a geometry proxy. The metadata can include information identifying the 3D object used as the geometry proxy.

In step S930, the position and orientation of the 3D object within the selected frame and the keyframe are determined based on the metadata. For example, the metadata can include attributes (e.g., mesh-point attributes) associated with the position and/or orientation of the 3D object in the selected frame and the keyframe.

In step S935, the 3D object is matched to a stored 3D object. For example, the metadata can include information identifying that the 3D object has a corresponding stored 3D object. In an example implementation, the metadata includes a unique ID or label that can be used to search the stored 3D objects 135 for the 3D object. If the unique ID or label is found among the stored 3D objects 135, the corresponding one of the stored 3D objects 135 is a match for the 3D object.
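The unique-ID/label lookup described above can be sketched as follows. The library contents and field names are invented for the example.

```python
# Hypothetical library of stored 3D objects, keyed by unique ID.
stored_3d_objects = {
    "obj-017": {"label": "main_character", "mesh": "..."},
    "obj-042": {"label": "vehicle", "mesh": "..."},
}

def match_stored_object(metadata, library):
    """Return the stored object matching the metadata's unique ID, else None."""
    return library.get(metadata.get("object_id"))

meta = {"object_id": "obj-042", "position": (3.0, 1.0, 0.0)}
match = match_stored_object(meta, stored_3d_objects)
print(match["label"] if match else "no match")
```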

In step S940, the stored 3D object is transformed based on the metadata. For example, the metadata can include information identifying that the 3D object has a corresponding stored 3D object, together with information about the transformations performed on the stored 3D object during the encoding process. The information about the transformations performed on the stored 3D object during the encoding process can be used to perform, on the stored 3D object, the same transformations that were performed on the stored 3D object to encode the selected frame.

The stored 3D object can be deformed based on the information about the transformations performed on the stored 3D object during the encoding process. The stored 3D object can be resized based on the information about the transformations performed on the stored 3D object during the encoding process. The stored 3D object can be oriented based on the information about the transformations performed on the stored 3D object during the encoding process. The stored 3D object can be rotated based on the information about the transformations performed on the stored 3D object during the encoding process. The stored 3D object can be translated based on the information about the transformations performed on the stored 3D object during the encoding process. Transforming the stored 3D object can cause the stored 3D object (if rendered side by side with the 3D object on a display) to appear substantially similar to the 3D object as the 3D object appeared in the selected frame before the selected frame was compressed.

In an example implementation, the matched one of the stored 3D objects associated with the frame and the keyframe can be transformed. Accordingly, the stored 3D object can be used to generate a first 3D object proxy based on the stored 3D object for the 3D object associated with the frame. Further, the stored 3D object can be used to generate a second 3D object proxy based on the stored 3D object for the 3D object associated with the keyframe. Thus, the first 3D object proxy can be transformed based on the metadata identified in the frame for the 3D object, and the second 3D object proxy can be transformed based on the metadata identified in the keyframe for the 3D object.

In an example implementation, the first 3D object proxy and the second 3D object proxy are latent representations of the 3D object. Accordingly, the first 3D object proxy can be used to regenerate a mesh representation for the first 3D object proxy, and the second 3D object proxy can be used to regenerate a mesh representation for the second 3D object proxy. As described above, decoding a latent representation for a 3D object using an autoencoder can regenerate a mesh representation that includes the color attributes for the 3D object. The autoencoder can use a neural network having a structure read from the metadata.

In step S945, the selected frame is decompressed with a color-prediction scheme using the stored 3D object for the 3D object (as a geometry proxy). Initially, the selected frame can be entropy decoded, inverse quantized and inverse transformed to generate derived residuals that are the same as (or approximately the same as) the residuals generated by the encoder when encoding the selected frame, before the residuals were transformed, quantized and entropy encoded.

In an example implementation, the prediction scheme includes matching points corresponding to the transformed (e.g., translated and oriented) stored 3D object to points of the keyframe. The matched points are then used to select or predict pixels/blocks/patches in the keyframe for use in regenerating the color values of the pixels of the 3D object in the selected frame.

Regenerating the color values of the pixels can include adding, for each point in the triangular mesh of the 3D object, the pixel attribute values from each matched point of the predicted pixels/blocks/patches in the keyframe to the color values and/or color attributes of the translated and oriented stored 3D object, based on the residual at the identified position in the selected frame. The color values and/or color attributes for the remainder of the selected frame can be regenerated based on the residual, or remainder, of the selected frame and the corresponding pixels/blocks/patches of the keyframe.
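The prediction-plus-residual reconstruction described above can be sketched as follows. The predicted values stand for colors taken from the matched keyframe points; the numbers are invented for the example.

```python
def reconstruct_colors(predicted, residual):
    """Regenerate per-point colors by adding the decoded residual to the prediction."""
    return [tuple(p + r for p, r in zip(pt, res))
            for pt, res in zip(predicted, residual)]

predicted_from_keyframe = [(110, 55, 20), (195, 125, 85)]  # matched keyframe points
decoded_residual = [(-10, -5, 5), (5, -5, -5)]             # per-point color displacement
print(reconstruct_colors(predicted_from_keyframe, decoded_residual))
```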

In an example implementation, the transformed first 3D object proxy can be wrapped (with color and/or texture) corresponding to the 3D object identified in the frame. In other words, the transformed first 3D object proxy can be wrapped with the residual corresponding to the 3D object identified in the frame. Further, the transformed second 3D object proxy can be wrapped (with color and/or texture) corresponding to the 3D object identified in the keyframe. The color values of the pixels for the 3D object can be regenerated by adding the color attribute of each point in the mesh representation of the first 3D object proxy to the color attribute of the corresponding point (e.g., the point having the same point identification or at the same position in the mesh sequence) in the mesh representation of the second 3D object proxy. Mapping the color attributes from the 3D object identified in the frame to the transformed first 3D object proxy can include converting the 3D object from 3D space (e.g., XYZ space) to 2D space (e.g., UV space) and/or converting the 3D object identified in frame 110 from 2D space (e.g., UV space) to 3D space (e.g., XYZ space).

In an example implementation, the first 3D object proxy and the second 3D object proxy are latent representations of the 3D object. Accordingly, the first 3D object proxy can be used to regenerate a mesh representation that includes the color attributes for the first 3D object proxy, and the second 3D object proxy can be used to regenerate a mesh representation that includes the color attributes for the second 3D object proxy. As described above, decoding a latent representation of a 3D object using an autoencoder can regenerate a mesh representation that includes the color attributes for the 3D object. The autoencoder can use a neural network having a structure read from the metadata.

In step S950, the transformed stored 3D object is stitched into the decompressed frame. The selected frame can be reconstructed based on the regenerated color values and/or color attributes of the translated and oriented stored 3D object and the regenerated color values and/or color attributes of the remainder of the selected frame. In an example implementation, the selected frame can be reconstructed by stitching the regenerated color values and/or color attributes of the translated and oriented stored 3D object into the regenerated color values and/or color attributes of the remainder of the selected frame, based on the identified position of the 3D object (e.g., as read from the metadata).

In step S955, the frame including the 3D object is rendered. For example, the regenerated texture data and regenerated color data can be rendered for display. Rendering the texture data and color data for display can include using 3D shaders (e.g., vertex shaders, geometry shaders, and the like) to draw the mesh associated with the frame. The shaders can be configured to generate primitives. The shaders can be configured to transform the 3D position and texture of each vertex in the mesh into 2D coordinates (e.g., primitives) as they appear on a display (e.g., display 640). Rendering the texture data and color data for display can also include performing rasterization. Rasterization can include assigning pixel (e.g., color) values to the primitives based on the texture data and color data.
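The vertex-transform stage described above can be sketched as follows. A real renderer would apply full model-view-projection matrices and then rasterize the resulting primitives; this example uses a simple pinhole projection (perspective divide) with an assumed focal length and display center.

```python
def project_vertex(v, focal=100.0, cx=320.0, cy=240.0):
    """Project a 3D mesh vertex to 2D display coordinates (pinhole model)."""
    x, y, z = v
    return (cx + focal * x / z, cy + focal * y / z)

mesh_vertices = [(1.0, 0.5, 2.0), (-1.0, 0.0, 4.0)]
print([project_vertex(v) for v in mesh_vertices])
```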

In step S960, color correction is performed on the rendered frame. For example, color correction can include compensating for color differences between frames, compensating for color differences between multiple views of the same scene, correcting object deformation (warping), correcting object-boundary deformation, and the like.

An example implementation can include identifying two or more 3D objects; regenerating the color values and/or color attributes of each of the two or more 3D objects; and reconstructing the selected frame using each of the two or more 3D objects. The video data 5 can be regenerated based on a plurality of reconstructed frames. The video data 5 (e.g., texture data and color data) can be rendered and color corrected for display on a display of the client device.

As mentioned above, as the techniques described herein (e.g., the methods described above) are implemented for use with an increasing number of videos, the number of stored 3D objects of interest (e.g., the stored 3D objects 135) necessarily increases, and the amount of resources (e.g., memory) required to store the 3D objects of interest increases as well. Further, during a streaming operation, transferring the 3D objects of interest from the streaming server to the client device may require substantial bandwidth. Therefore, for streaming operations, it may become desirable to encode and decode the 3D objects of interest efficiently.

FIG. 10 illustrates a block diagram of a method for compressing a 3D object according to at least one example embodiment. As shown in FIG. 10, in step S1005, at least one 3D object of interest of a video is identified. For example, objects of interest can include the 3D characters of a CGI movie, including CGI actors (e.g., lead characters, supporting characters and extras), CGI pets, CGI creatures, CGI monsters and the like. Objects of interest can include vehicles (e.g., trains, cars or airplanes) or objects that can move from frame to frame in a predictable manner (e.g., at a constant speed and/or in a constant direction). Objects of interest can include stationary or fixed 3D objects (e.g., the background of a scene, furniture at a fixed location within a scene, or slowly moving objects in the distance) that can appear stationary from frame to frame (e.g., in the absence of any camera or scene panning).

The 3D objects of interest can be predetermined and stored in association with the video. The 3D objects of interest can be determined as needed (e.g., when a keyframe is selected, as part of an initialization operation) and stored in association with the video or added to previously stored 3D objects of interest.

In step S1010, the mesh attributes (e.g., vertices and connectivity) and positions of the 3D objects of interest are determined. According to an example implementation, each 3D object of interest can be defined by a mesh having the same number of points, each point having position coordinates. Other attributes of the points can be added as needed. Accordingly, the mesh attributes for each 3D object of interest can include the same number of vertices with varying connectivity.

In step S1015, a machine-trained generative-modeling technique is used to generate a reduced number of variables associated with the mesh attributes and positions of the 3D objects of interest. For example, a VAE can be used to generate the reduced number of variables. This reduced number of variables is sometimes called a latent representation for the 3D object, or a reduced latent representation for the 3D object. The VAE can include a neural-network encoder and a neural-network decoder, each including a neural network with the same configuration of C filters, K×K masks and stride factors for the neural network. Accordingly, the number of variables in the latent space generated by the neural-network encoder for each latent representation of a 3D object of interest is the same for every 3D object of interest. Further, the number of points in the mesh regenerated by the neural-network decoder is the same for every 3D object of interest.
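The fixed-length latent property described above (the same number of latent variables for every object of interest) can be sketched as follows. The "encoder" here is a made-up deterministic pooling, standing in for a trained VAE encoder with C filters, K×K masks and a stride factor; only the fixed output size mirrors the text.

```python
LATENT_DIM = 4  # assumed latent size, identical for every object of interest

def encode_mesh(points):
    """Collapse N points (x, y, z) into a latent vector of fixed length LATENT_DIM."""
    n = len(points)
    sums = [sum(p[i] for p in points) / n for i in range(3)]   # mean x, y, z
    spread = sum(abs(c) for p in points for c in p) / n        # crude scale measure
    return sums + [spread]                                     # always LATENT_DIM values

obj_a = [(0.0, 0.0, 0.0), (2.0, 2.0, 2.0)]
obj_b = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
assert len(encode_mesh(obj_a)) == len(encode_mesh(obj_b)) == LATENT_DIM
print(encode_mesh(obj_a))
```

Meshes with different point counts still map to the same latent size, which is what lets a single decoder regenerate a fixed number of points for every stored object.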

In an example implementation, the mesh for each of the plurality of 3D objects of interest can be compressed (e.g., using the technique described above with reference to FIG. 5A). For example, the mesh for each of the plurality of 3D objects of interest can be compressed using a generative-modeling technique (e.g., using a neural network, a convolutional neural network, a VAE, or the like). The mesh for each of the plurality of 3D objects of interest can be compressed using a neural-network encoder having a convolutional neural network whose elements are selected and trained based on a machine-trained generative-modeling technique configured to generate a reduced number of variables associated with the mesh attributes and position for each 3D object. The generated reduced number of variables associated with the mesh attributes and position for a 3D object is sometimes called a compact latent representation for the 3D object.

In step S1020, the variables are stored as a compact decodable representation (or compact latent representation) of the 3D object of interest, associated with the 3D object of interest and with the video. For example, the variables can be stored as the latent representations 510 for 3D objects. The latent representations 510 for 3D objects can be stored on a streaming server and/or on a device including at least one encoder (e.g., encoder 105).

FIG. 11 illustrates a block diagram of a method for decompressing a 3D object according to at least one example embodiment. As shown in FIG. 11, in step S1105, a compact decodable representation of a 3D object associated with a video is received. For example, the variables stored as the latent representations 510 for 3D objects can be stored on a streaming server and/or on a device including at least one encoder (e.g., encoder 105). During a video streaming operation, at least one set of variables stored as a latent representation of a 3D object can be received from the streaming server.

In step S1110, a machine-trained generative-modeling technique can be used to generate the mesh attributes (e.g., vertices and connectivity) and position of the 3D object. For example, a VAE can use the variables in the latent space of the latent representation of the 3D object as input to a neural-network decoder, and the VAE can regenerate the position coordinates of each point of the mesh. The configuration of the C filters, K×K masks and stride factors for the neural network can determine the number of points of the regenerated mesh. In regenerating the points of the mesh, the neural-network decoder can regenerate the mesh attributes and position of the 3D object.

In step S1115, the mesh attributes, position, orientation and color attributes of the 3D object are stored, in association with the video, as a stored 3D object. For example, when the 3D object is regenerated as a mesh including a collection of points connected by faces, the 3D object can be stored as one of the stored 3D objects 135. Each regenerated point can store various attributes. For example, the attributes can include the position, color, texture coordinates and the like of each point. Accordingly, the regenerated 3D objects included in the stored 3D objects 135 can be used in a color-prediction scheme based on using 3D objects as geometry proxies, as implemented in a 3D encoder and/or a 3D decoder.

FIG. 12 illustrates a video encoder system 1200 according to at least one example embodiment. As shown in FIG. 12, the video encoder system 1200 includes at least one processor 1205, at least one memory 1210, a controller 1220 and the video encoder 105. The at least one processor 1205, the at least one memory 1210, the controller 1220 and the video encoder 105 are communicatively coupled via a bus 1215.

In the example of FIG. 12, the video encoder system 1200 may be or include at least one computing device, and should be understood to represent virtually any computing device configured to perform the methods described herein. As such, the video encoder system 1200 may be understood to include various components that may be used to implement the techniques described herein, or different or future versions thereof. By way of example, the video encoder system 1200 is illustrated as including at least one processor 1205 and at least one memory 1210 (e.g., a non-transitory computer-readable storage medium).

It will be appreciated that the at least one processor 1205 may be utilized to execute instructions stored on the at least one memory 1210, so as to thereby implement the various features and functions described herein, or additional or alternative features and functions. Of course, the at least one processor 1205 and the at least one memory 1210 may be utilized for various other purposes. In particular, it will be appreciated that the at least one memory 1210 may be understood to represent an example of various types of memory, and of related hardware and software, that may be used to implement any of the modules described herein.

The at least one memory 1210 may be configured to store data and/or information associated with the video encoder system 1200. The at least one memory 1210 may be a shared resource. For example, the video encoder system 1200 may be an element of a larger system (e.g., a server, a personal computer, a mobile device, and the like). Accordingly, the at least one memory 1210 may be configured to store data and/or information associated with other elements within the larger system (e.g., image/video serving, web browsing, or wired/wireless communication).

The controller 1220 may be configured to generate various control signals and communicate the control signals to the various blocks in the video encoder system 1200. The controller 1220 may be configured to generate the control signals to implement the techniques described above. According to an example embodiment, the controller 1220 may be configured to control the video encoder 105 to encode an image, a sequence of images, a video frame, a video sequence, and the like. For example, the controller 1220 may generate control signals corresponding to video quality.

The video encoder 105 may be configured to receive a video stream input 5 and to output compressed (e.g., encoded) video bits 10. The video encoder 105 may convert the video stream input 5 into discrete video frames. The video stream input 5 may also be an image; accordingly, the compressed (e.g., encoded) video bits 10 may also be compressed image bits. The video encoder 105 may further convert each discrete video frame (or image) into a matrix of blocks (hereinafter referred to as blocks). For example, a video frame (or image) may be converted into a 16×16, a 16×8, an 8×8, a 4×4, or a 2×2 matrix of blocks, each block having a number of pixels. Although five example matrices are listed, example embodiments are not limited thereto.
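The conversion of a frame into a matrix of pixel blocks can be sketched as follows. The 4×4 block size, the row-major layout, and the assumption that the frame dimensions divide evenly are illustrative choices, not requirements of the patent:

```python
def frame_to_blocks(frame, block_h, block_w):
    """Split a frame (a 2-D list of pixel values) into a matrix of
    block_h x block_w blocks.  Assumes the frame dimensions are
    evenly divisible by the block dimensions."""
    rows, cols = len(frame), len(frame[0])
    blocks = []
    for by in range(0, rows, block_h):
        row_of_blocks = []
        for bx in range(0, cols, block_w):
            block = [r[bx:bx + block_w] for r in frame[by:by + block_h]]
            row_of_blocks.append(block)
        blocks.append(row_of_blocks)
    return blocks

# An 8x8 frame of pixel intensities becomes a 2x2 matrix of 4x4 blocks.
frame = [[y * 8 + x for x in range(8)] for y in range(8)]
blocks = frame_to_blocks(frame, 4, 4)
```

A production encoder would pad frames whose dimensions are not multiples of the block size; that handling is omitted here for brevity.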

The compressed video bits 10 can represent the output of the video encoder system 1200. For example, the compressed video bits 10 can represent an encoded video frame (or an encoded image). For example, the compressed video bits 10 can be ready for transmission to a receiving device (not shown). For example, the video bits can be transmitted to a system transceiver (not shown) for transmission to the receiving device.

The at least one processor 1205 may be configured to execute computer instructions associated with the controller 1220 and/or the video encoder 105. The at least one processor 1205 may be a shared resource. For example, the video encoder system 1200 may be an element of a larger system (e.g., a mobile device, a server, a streaming server, and the like). Accordingly, the at least one processor 1205 may be configured to execute computer instructions associated with other elements within the larger system (e.g., image/video serving, web browsing, or wired/wireless communication).

In an example implementation, the video encoder system 1200 may be implemented as, or implemented in, a graphics card and/or a chip (e.g., an ASIC on a computer motherboard) including a graphics processing unit (GPU) configured to offload work from a central processing unit (CPU). The at least one processor 1205 may be implemented as a GPU configured to process large blocks of video data in parallel. The GPU may be configured to process (e.g., compress) mesh data and to generate pixel data from the mesh data. The at least one memory 1210 may include video memory and driver software. The video memory may be a frame buffer that stores digital data representing a frame image or a scene. The video memory may store the digital data before and after GPU processing. The driver software may include a codec configured to compress and/or decompress the video data. The codec may implement a color prediction scheme based on using 3D objects as geometry proxies, as described herein.

FIG. 13 illustrates a video decoder system 1300 according to at least one example embodiment. As shown in FIG. 13, the video decoder system 1300 includes at least one processor 1305, at least one memory 1310, a controller 1320, and the video decoder 145. The at least one processor 1305, the at least one memory 1310, the controller 1320, and the video decoder 145 are communicatively coupled via a bus 1315.

In the example of FIG. 13, the video decoder system 1300 may be at least one computing device, and should be understood to represent virtually any computing device configured to perform the methods described herein. As such, the video decoder system 1300 may be understood to include various components that may be used to implement the techniques described herein, or different or future versions thereof. By way of example, the video decoder system 1300 is illustrated as including at least one processor 1305 and at least one memory 1310 (e.g., a computer-readable storage medium).

Thus, it will be appreciated that the at least one processor 1305 may be utilized to execute instructions stored on the at least one memory 1310, so as to implement the various features and functions described herein, or additional or alternative features and functions. Of course, the at least one processor 1305 and the at least one memory 1310 may be utilized for various other purposes. In particular, it will be appreciated that the at least one memory 1310 may be understood to represent an example of various types of memory, and of related hardware and software, that may be used to implement any of the modules described herein. According to an example embodiment, the video encoder system 1200 and the video decoder system 1300 may be included in a same larger system (e.g., a personal computer, a mobile device, and the like).

The at least one memory 1310 may be configured to store data and/or information associated with the video decoder system 1300. The at least one memory 1310 may be a shared resource. For example, the video decoder system 1300 may be an element of a larger system (e.g., a personal computer, a mobile device, and the like). Accordingly, the at least one memory 1310 may be configured to store data and/or information associated with other elements within the larger system (e.g., web browsing or wireless communication).

The controller 1320 may be configured to generate various control signals and communicate the control signals to the various blocks in the video decoder system 1300. The controller 1320 may be configured to generate the control signals to implement the video encoding/decoding techniques described herein. According to an example embodiment, the controller 1320 may be configured to control the video decoder 145 to decode a video frame.

The video decoder 145 may be configured to receive compressed (e.g., encoded) video bits 10 as input and to output a video stream 5. The video decoder 145 may convert discrete video frames of the compressed video bits 10 into the video stream 5. The compressed (e.g., encoded) video bits 10 may also be compressed image bits; accordingly, the video stream 5 may also be an image.

The at least one processor 1305 may be configured to execute computer instructions associated with the controller 1320 and/or the video decoder 145. The at least one processor 1305 may be a shared resource. For example, the video decoder system 1300 may be an element of a larger system (e.g., a personal computer, a mobile device, and the like). Accordingly, the at least one processor 1305 may be configured to execute computer instructions associated with other elements within the larger system (e.g., web browsing or wireless communication).

In an example implementation, the video decoder system 1300 may be implemented as, or implemented in, a graphics card and/or a chip (e.g., an ASIC on a computer motherboard) including a graphics processing unit (GPU) configured to offload work from a central processing unit (CPU). The at least one processor 1305 may be implemented as a GPU configured to process large blocks of video data in parallel. The GPU may be configured to process (e.g., decompress) mesh data and to generate pixel data from the mesh data. The at least one memory 1310 may include video memory and driver software. The video memory may be a frame buffer that stores digital data representing a frame image or a scene. The video memory may store the digital data before and after GPU processing. The driver software may include a codec configured to decompress the video data. The codec may implement a color prediction scheme based on using 3D objects as geometry proxies, as described herein.

FIG. 14 shows an example of a computer device 1400 and a mobile computer device 1450 that may be used with the techniques described here. Computing device 1400 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Computing device 1450 is intended to represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, and other similar computing devices. The components shown here, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed in this document.

Computing device 1400 includes a processor 1402, a memory 1404, a storage device 1406, a high-speed interface 1408 connecting to the memory 1404 and high-speed expansion ports 1410, and a low-speed interface 1412 connecting to a low-speed bus 1414 and the storage device 1406. Each of the components 1402, 1404, 1406, 1408, 1410, and 1412 is interconnected using various buses, and may be mounted on a common motherboard or in other manners as appropriate. The processor 1402 can process instructions for execution within the computing device 1400, including instructions stored in the memory 1404 or on the storage device 1406, to display graphical information for a GUI on an external input or output device, such as a display 1416 coupled to the high-speed interface 1408. In other implementations, multiple processors and/or multiple buses may be used, as appropriate, along with multiple memories and types of memory. Also, multiple computing devices 1400 may be connected, with each device providing portions of the necessary operations (e.g., as a server bank, a group of blade servers, or a multi-processor system).

The memory 1404 stores information within the computing device 1400. In one implementation, the memory 1404 is a volatile memory unit or units. In another implementation, the memory 1404 is a non-volatile memory unit or units. The memory 1404 may also be another form of computer-readable medium, such as a magnetic or optical disk.

The storage device 1406 is capable of providing mass storage for the computing device 1400. In one implementation, the storage device 1406 may be or contain a computer-readable medium, such as a floppy disk device, a hard disk device, an optical disk device, a tape device, a flash memory or other similar solid state memory device, or an array of devices, including devices in a storage area network or other configurations. A computer program product can be tangibly embodied in an information carrier. The computer program product may also contain instructions that, when executed, perform one or more methods, such as those described above. The information carrier is a computer-readable or machine-readable medium, such as the memory 1404, the storage device 1406, or memory on the processor 1402.

The high-speed controller 1408 manages bandwidth-intensive operations for the computing device 1400, while the low-speed controller 1412 manages lower bandwidth-intensive operations. Such allocation of functions is exemplary only. In one implementation, the high-speed controller 1408 is coupled to the memory 1404, the display 1416 (e.g., through a graphics processor or accelerator), and to the high-speed expansion ports 1410, which may accept various expansion cards (not shown). In this implementation, the low-speed controller 1412 is coupled to the storage device 1406 and the low-speed expansion port 1414. The low-speed expansion port, which may include various communication ports (e.g., USB, Bluetooth, Ethernet, wireless Ethernet), may be coupled to one or more input or output devices, such as a keyboard, a pointing device, a scanner, or a networking device such as a switch or router, e.g., through a network adapter.

The computing device 1400 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a standard server 1420, or multiple times in a group of such servers. It may also be implemented as part of a rack server system 1424. In addition, it may be implemented in a personal computer such as a laptop computer 1422. Alternatively, components from the computing device 1400 may be combined with other components in a mobile device (not shown), such as the device 1450. Each of such devices may contain one or more of the computing devices 1400, 1450, and an entire system may be made up of multiple computing devices 1400, 1450 communicating with each other.

Computing device 1450 includes a processor 1452, a memory 1464, an input or output device such as a display 1454, a communication interface 1466, and a transceiver 1468, among other components. The device 1450 may also be provided with a storage device, such as a microdrive or other device, to provide additional storage. Each of the components 1450, 1452, 1464, 1454, 1466, and 1468 is interconnected using various buses, and several of the components may be mounted on a common motherboard or in other manners as appropriate.

The processor 1452 can execute instructions within the computing device 1450, including instructions stored in the memory 1464. The processor may be implemented as a chipset of chips that include separate and multiple analog and digital processors. The processor may provide, for example, for coordination of the other components of the device 1450, such as control of user interfaces, applications run by the device 1450, and wireless communication by the device 1450.

The processor 1452 may communicate with a user through a control interface 1458 and a display interface 1456 coupled to the display 1454. The display 1454 may be, for example, a TFT LCD (Thin-Film-Transistor Liquid Crystal Display) or an OLED (Organic Light Emitting Diode) display, or other appropriate display technology. The display interface 1456 may comprise appropriate circuitry for driving the display 1454 to present graphical and other information to a user. The control interface 1458 may receive commands from a user and convert them for submission to the processor 1452. In addition, an external interface 1462 may be provided in communication with the processor 1452, so as to enable near area communication of the device 1450 with other devices. The external interface 1462 may provide, for example, for wired communication in some implementations, or for wireless communication in other implementations, and multiple interfaces may also be used.

The memory 1464 stores information within the computing device 1450. The memory 1464 can be implemented as one or more of a computer-readable medium or media, a volatile memory unit or units, or a non-volatile memory unit or units. An expansion memory 1474 may also be provided and connected to the device 1450 through an expansion interface 1472, which may include, for example, a SIMM (Single In Line Memory Module) card interface. Such expansion memory 1474 may provide extra storage space for the device 1450, or may also store applications or other information for the device 1450. Specifically, the expansion memory 1474 may include instructions to carry out or supplement the processes described above, and may include secure information also. Thus, for example, the expansion memory 1474 may be provided as a security module for the device 1450, and may be programmed with instructions that permit secure use of the device 1450. In addition, secure applications may be provided via the SIMM cards, along with additional information, such as placing identifying information on the SIMM card in a non-hackable manner.

The memory may include, for example, flash memory and/or NVRAM memory, as discussed below. In one implementation, a computer program product is tangibly embodied in an information carrier. The computer program product contains instructions that, when executed, perform one or more methods, such as those described above. The information carrier is a computer-readable or machine-readable medium, such as the memory 1464, the expansion memory 1474, or memory on the processor 1452, that may be received, for example, over the transceiver 1468 or the external interface 1462.

The device 1450 may communicate wirelessly through the communication interface 1466, which may include digital signal processing circuitry where necessary. The communication interface 1466 may provide for communications under various modes or protocols, such as GSM voice calls, SMS, EMS, or MMS messaging, CDMA, TDMA, PDC, WCDMA, CDMA2000, or GPRS, among others. Such communication may occur, for example, through the radio-frequency transceiver 1468. In addition, short-range communication may occur, such as using a Bluetooth, WiFi, or other such transceiver (not shown). In addition, a GPS (Global Positioning System) receiver module 1470 may provide additional navigation- and location-related wireless data to the device 1450, which may be used as appropriate by applications running on the device 1450.

The device 1450 may also communicate audibly using an audio codec 1460, which may receive spoken information from a user and convert it to usable digital information. The audio codec 1460 may likewise generate audible sound for a user, such as through a speaker, e.g., in a headset of the device 1450. Such sound may include sound from voice telephone calls, may include recorded sound (e.g., voice messages, music files, etc.), and may also include sound generated by applications operating on the device 1450.

The computing device 1450 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a cellular telephone 1480. It may also be implemented as part of a smart phone 1482, a personal digital assistant, or another similar mobile device.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device. Various implementations of the systems and techniques described here can be realized as, and/or are generally referred to herein as, circuits, modules, blocks, or systems that can combine software and hardware aspects. For example, a module may include the functions, acts, or computer program instructions executed on a processor (e.g., a processor formed on a silicon substrate, a GaAs substrate, and the like) or some other programmable data processing apparatus.

Some of the above example embodiments are described as processes or methods depicted as flowcharts. Although the flowcharts describe the operations as sequential processes, many of the operations may be performed in parallel, concurrently, or simultaneously. In addition, the order of operations may be re-arranged. The processes may be terminated when their operations are completed, but may also have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, subprograms, and the like.

The methods discussed above, some of which are illustrated by the flowcharts, may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware, or microcode, the program code or code segments to perform the necessary tasks may be stored in a machine- or computer-readable medium, such as a storage medium. A processor may perform the necessary tasks.

Specific structural and functional details disclosed herein are merely representative for purposes of describing example embodiments. Example embodiments, however, may be embodied in many alternative forms and should not be construed as limited to only the embodiments set forth herein.

It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of example embodiments. As used herein, the term and/or includes any and all combinations of one or more of the associated listed items.

It will be understood that when an element is referred to as being connected or coupled to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. In contrast, when an element is referred to as being directly connected or directly coupled to another element, there are no intervening elements present. Other words used to describe the relationship between elements should be interpreted in a like fashion (e.g., between versus directly between, adjacent versus directly adjacent, etc.).

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments. As used herein, the singular forms a, an, and the are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms comprises, comprising, includes, and/or including, when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It should also be noted that, in some alternative implementations, the functions or acts noted may occur out of the order noted in the figures. For example, two figures shown in succession may in fact be executed substantially concurrently, or may sometimes be executed in the reverse order, depending upon the functionality or acts involved.

Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which example embodiments belong. It will be further understood that terms, e.g., those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

Portions of the above example embodiments and corresponding detailed description are presented in terms of software, or algorithms and symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the ones by which those of ordinary skill in the art effectively convey the substance of their work to others of ordinary skill in the art. An algorithm, as the term is used here, and as it is used generally, is conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of optical, electrical, or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.

In the above illustrative embodiments, references to acts and symbolic representations of operations (e.g., in the form of flowcharts) that may be implemented as program modules or functional processes include routines, programs, objects, components, data structures, etc., that perform particular tasks or implement particular abstract data types and may be described and/or implemented using existing hardware at existing structural elements. Such existing hardware may include one or more central processing units (CPUs), digital signal processors (DSPs), application-specific integrated circuits, field programmable gate array (FPGA) computers, and the like.

It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise, or as is apparent from the discussion, terms such as processing, computing, calculating, determining, or displaying refer to the actions and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical, electronic quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system's memories or registers or other such information storage, transmission, or display devices.

Note also that the software-implemented aspects of the example embodiments are typically encoded on some form of non-transitory program storage medium or implemented over some type of transmission medium. The program storage medium may be magnetic (e.g., a floppy disk or a hard drive) or optical (e.g., a compact disk read-only memory, or CD ROM), and may be read-only or random access. Similarly, the transmission medium may be twisted wire pairs, coaxial cable, optical fiber, or some other suitable transmission medium known in the art. The example embodiments are not limited by these aspects of any given implementation.

Finally, it should also be noted that while the appended claims set out particular combinations of features described herein, the scope of the present disclosure is not limited to the particular combinations hereafter claimed, but instead extends to encompass any combination of the features or embodiments herein disclosed, irrespective of whether or not that particular combination has been specifically enumerated in the appended claims at this time.

Claims (22)

1. A method, comprising:
receiving a frame of a video;
identifying a three-dimensional (3D) object in the frame;
matching the 3D object with a stored 3D object;
compressing frames of the video using a color prediction scheme based on the 3D object and the stored 3D object; and
storing compressed frames having metadata identifying the 3D object, indicating a location of the 3D object in the frame of the video, and indicating an orientation of the 3D object in the frame of the video.
2. The method of claim 1, wherein compressing frames of the video using the color prediction scheme based on the 3D object and the stored 3D object comprises:
generating a first 3D object proxy based on the stored 3D object;
transforming the first 3D object proxy based on the 3D object identified in the frame;
generating a second 3D object proxy based on the stored 3D object;
identifying the 3D object in a key frame of the video;
transforming the second 3D object proxy based on the 3D object identified in the key frame;
mapping color attributes from the 3D object to a transformed first 3D object proxy;
mapping color attributes from the 3D object identified in the keyframe to a transformed second 3D object proxy; and
generating a residual of the 3D object based on the color attributes of the transformed first 3D object proxy and the color attributes of the transformed second 3D object proxy.
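As an illustration only (not part of the claim language), the proxy-based residual generation recited in claim 2 can be sketched with toy data. This is a minimal sketch, not the patented implementation: the rigid-transform model, the use of NumPy, the vertex count, and the color values are all assumptions introduced for the example.

```python
import numpy as np

def transform_proxy(proxy_pts, rotation, translation):
    # Rigidly transform proxy vertex positions (N x 3) to an observed object pose.
    return proxy_pts @ rotation.T + translation

def color_residual(frame_colors, keyframe_colors):
    # Residual between colors mapped onto the frame proxy and the keyframe proxy.
    return frame_colors - keyframe_colors

# Toy stored 3D object proxy with 4 vertices (positions invented for the sketch).
stored_proxy = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
identity = np.eye(3)

# First proxy posed to match the object in the current frame, second proxy posed
# to match the object in the key frame (here, only translations differ).
first_proxy = transform_proxy(stored_proxy, identity, np.array([2.0, 0.0, 0.0]))
second_proxy = transform_proxy(stored_proxy, identity, np.array([1.9, 0.0, 0.0]))

# Per-vertex RGB colors "mapped" from each frame onto the transformed proxies.
frame_colors = np.array([[0.8, 0.2, 0.2]] * 4)
keyframe_colors = np.array([[0.7, 0.2, 0.2]] * 4)

# Small residual values are what make this prediction scheme compress well.
residual = color_residual(frame_colors, keyframe_colors)
```

When the object's appearance changes little between the key frame and the current frame, the residual is close to zero, which is the property a downstream entropy coder exploits.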
3. The method of claim 1, wherein compressing frames of the video using the color prediction scheme based on the 3D object and the stored 3D object comprises:
generating a first 3D object proxy based on the stored 3D object;
transforming the first 3D object proxy based on the 3D object identified in the frame;
generating a second 3D object proxy based on the stored 3D object;
identifying the 3D object in a key frame of the video;
transforming the second 3D object proxy based on the 3D object identified in the key frame;
mapping color attributes from the 3D object to a transformed first 3D object proxy; and
generating a residual of the 3D object based on the color attribute of the transformed first 3D object proxy and a default color attribute of the transformed second 3D object proxy.
4. The method of claim 1, wherein compressing frames of the video using the color prediction scheme based on the 3D object and the stored 3D object comprises:
generating a first 3D object proxy based on the stored 3D object;
encoding the first 3D object proxy using an auto-encoder;
transforming the encoded first 3D object proxy based on the 3D object identified in the frame;
generating a second 3D object proxy based on the stored 3D object;
encoding the second 3D object proxy using an auto-encoder;
identifying the 3D object in a key frame of the video;
transforming the encoded second 3D object proxy based on the 3D object identified in the key frame;
mapping color attributes from the 3D object to a transformed first 3D object proxy;
mapping color attributes from the 3D object identified in the keyframe to a transformed second 3D object proxy; and
generating a residual of the 3D object based on the color attributes of the transformed first 3D object proxy and the color attributes of the transformed second 3D object proxy.
5. The method of claim 1, wherein compressing frames of the video using the color prediction scheme based on the 3D object and the stored 3D object comprises:
generating a first 3D object proxy based on the stored 3D object;
encoding the first 3D object proxy using an auto-encoder;
transforming the encoded first 3D object proxy based on the 3D object identified in the frame;
generating a second 3D object proxy based on the stored 3D object;
encoding the second 3D object proxy using an auto-encoder;
identifying the 3D object in a key frame of the video;
transforming the encoded second 3D object proxy based on the 3D object identified in the key frame;
mapping color attributes from the 3D object to a transformed first 3D object proxy; and
generating a residual of the 3D object based on the color attribute of the transformed first 3D object proxy and a default color attribute of the transformed second 3D object proxy.
6. The method of claim 1, further comprising:
prior to storing the 3D object:
identifying at least one 3D object of interest associated with the video;
determining a plurality of mesh attributes associated with the 3D object of interest;
determining a location associated with the 3D object of interest;
determining an orientation associated with the 3D object of interest;
determining a plurality of color attributes associated with the 3D object of interest; and
reducing a number of variables associated with a mesh attribute of the 3D object of interest using an auto-encoder.
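The variable reduction recited in claim 6 can be illustrated with the simplest stand-in for an auto-encoder: PCA via SVD, which is the optimal *linear* auto-encoder. The dataset, attribute count, and latent size below are invented for the sketch; a real system would use a learned, typically nonlinear, network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset: 100 example meshes, each flattened to 30 attribute values.
meshes = rng.normal(size=(100, 30))
mean = meshes.mean(axis=0)

# SVD of the centered data gives the principal directions (rows of vt).
_, _, vt = np.linalg.svd(meshes - mean, full_matrices=False)

k = 8  # latent size: 30 variables reduced to 8

def encode(x):
    # Project mesh attributes onto the top-k principal directions.
    return (x - mean) @ vt[:k].T

def decode(z):
    # Reconstruct approximate mesh attributes from the latent variables.
    return z @ vt[:k] + mean

latent = encode(meshes[0])  # 8 numbers now stand in for 30
recon = decode(latent)      # approximate mesh attributes recovered
```

The same encode/decode split mirrors claims 4, 5, 13, and 14, where proxies are encoded before transformation on the encoder side and decoded on the decoder side.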
7. The method of claim 1, wherein compressing frames of the video comprises determining location coordinates of the 3D object relative to origin coordinates of a background 3D object in a keyframe.
8. The method of claim 1, wherein:
the stored 3D object includes default color attributes, and
the color prediction scheme uses the default color attribute.
9. The method of claim 1, further comprising:
identifying at least one 3D object of interest associated with the video;
generating at least one stored 3D object based on the at least one 3D object of interest, each of the at least one stored 3D object being defined by a mesh comprising a set of points connected by a face, each point storing at least one attribute comprising location coordinates of the respective point; and
storing the at least one stored 3D object in association with the video.
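The mesh definition in claim 9 (a set of points connected by faces, each point storing at least its location coordinates) maps naturally onto a small container type. The field names below are illustrative, not taken from the patent:

```python
from dataclasses import dataclass, field

@dataclass
class MeshPoint:
    # (x, y, z) location coordinates of the point, as required by claim 9.
    location: tuple
    # Optional further attributes, e.g. {"color": (r, g, b)}.
    attributes: dict = field(default_factory=dict)

@dataclass
class StoredObject:
    points: list  # list of MeshPoint
    faces: list   # each face is a tuple of indices into `points`

# A single triangle as a minimal stored 3D object.
triangle = StoredObject(
    points=[MeshPoint((0, 0, 0)), MeshPoint((1, 0, 0)), MeshPoint((0, 1, 0))],
    faces=[(0, 1, 2)],
)
```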
10. A method, comprising:
receiving a frame of a video;
identifying a three-dimensional (3D) object in the frame;
matching the 3D object with a stored 3D object;
decompressing frames of the video using a color prediction scheme based on the 3D object and the stored 3D object; and
rendering frames of the video.
11. The method of claim 10, wherein decompressing frames of the video using the color prediction scheme based on the 3D object and the stored 3D object comprises:
generating a first 3D object proxy based on the stored 3D object;
transforming the first 3D object proxy based on the 3D object identified in the frame;
generating a second 3D object proxy based on the stored 3D object;
identifying the 3D object in a key frame of the video;
transforming the second 3D object proxy based on the 3D object identified in the key frame;
mapping color attributes from the 3D object to a transformed first 3D object proxy;
mapping color attributes from the 3D object identified in the keyframe to a transformed second 3D object proxy; and
generating color attributes of the 3D object based on the color attributes of the transformed first 3D object proxy and the color attributes of the transformed second 3D object proxy.
12. The method of claim 10, wherein decompressing frames of the video using the color prediction scheme based on the 3D object and the stored 3D object comprises:
generating a first 3D object proxy based on the stored 3D object;
transforming the first 3D object proxy based on the 3D object identified in the frame;
generating a second 3D object proxy based on the stored 3D object;
identifying the 3D object in a key frame of the video;
transforming the second 3D object proxy based on the 3D object identified in the key frame;
mapping color attributes from the 3D object to a transformed first 3D object proxy; and
generating color attributes of the 3D object based on the color attributes of the transformed first 3D object proxy and the default color attributes of the transformed second 3D object proxy.
13. The method of claim 10, wherein decompressing frames of the video using the color prediction scheme based on the 3D object and the stored 3D object comprises:
generating a first 3D object proxy based on the stored 3D object;
decoding the first 3D object proxy using an auto-encoder;
transforming the decoded first 3D object proxy based on metadata associated with the 3D object;
generating a second 3D object proxy based on the stored 3D object;
decoding the second 3D object proxy using an auto-encoder;
identifying the 3D object in a key frame of the video;
transforming the decoded second 3D object proxy based on metadata associated with the 3D object identified in the key frame;
mapping color attributes from the 3D object to the transformed first 3D object proxy;
mapping color attributes from the 3D object identified in the keyframe to the transformed second 3D object proxy; and
generating color attributes of the 3D object based on the color attributes of the transformed first 3D object proxy and the color attributes of the transformed second 3D object proxy.
14. The method of claim 10, wherein decompressing frames of the video using the color prediction scheme based on the 3D object and the stored 3D object comprises:
generating a first 3D object proxy based on the stored 3D object;
decoding the first 3D object proxy using an auto-encoder;
transforming the decoded first 3D object proxy based on metadata associated with the 3D object;
generating a second 3D object proxy based on the stored 3D object;
decoding the second 3D object proxy using an auto-encoder;
identifying the 3D object in a key frame of the video;
transforming the decoded second 3D object proxy based on metadata associated with the 3D object identified in the key frame;
mapping color attributes from the 3D object to a transformed first 3D object proxy; and
generating color attributes of the 3D object based on the color attributes of the transformed first 3D object proxy and default attributes of the transformed second 3D object proxy.
15. The method of claim 10, further comprising:
receiving at least one latent representation of a 3D shape;
using machine-trained generative modeling techniques to:
determining a plurality of mesh attributes associated with the 3D shape;
determining a location associated with the 3D shape;
determining an orientation associated with the 3D shape; and
determining a plurality of color attributes associated with the 3D shape; and
storing the 3D shape as a stored 3D object.
16. The method of claim 10, wherein rendering the frame of the video comprises:
receiving location coordinates of the 3D object relative to origin coordinates of a background 3D object in a keyframe; and
placing the 3D object in the frame using the location coordinates.
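Claim 16's placement step is plain vector arithmetic once the relative location coordinates and the background object's origin are known. A minimal sketch, with invented coordinates:

```python
import numpy as np

def place_object(relative_coords, background_origin):
    # Object coordinates are stored relative to the background 3D object's
    # origin in the key frame, so placement is a translation into scene space.
    return np.asarray(relative_coords) + np.asarray(background_origin)

world_pos = place_object([1.0, 2.0, 0.5], background_origin=[10.0, 0.0, -3.0])
```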
17. The method of claim 10, further comprising:
receiving a neural network used by an encoder of an auto-encoder to reduce a number of variables associated with mesh properties, position, orientation, and color properties of at least one 3D object of interest;
regenerating, in a decoder of the auto-encoder, points associated with a mesh of at least one 3D object of interest using the neural network, the regenerating the points including regenerating a position attribute, an orientation attribute, and a color attribute; and
storing the at least one 3D object of interest as a stored 3D object.
18. A method for predicting color changes using a proxy, the method comprising:
generating a first 3D object proxy based on the stored 3D object;
generating a second 3D object proxy based on the stored 3D object;
transforming the first 3D object proxy based on 3D objects identified in frames of a video;
transforming the second 3D object proxy based on the 3D objects identified in key frames of the video;
mapping color attributes from the 3D object identified in the frame of the video to a transformed first 3D object proxy;
mapping color attributes from the 3D object identified in the keyframe to a transformed second 3D object proxy; and
generating color data of the 3D object based on the color attributes of the transformed first 3D object proxy and the color attributes of the transformed second 3D object proxy.
19. The method of claim 18, further comprising:
encoding the first 3D object proxy using an auto-encoder prior to transforming the first 3D object proxy; and
encoding the second 3D object proxy using the auto-encoder prior to transforming the second 3D object proxy.
20. The method of claim 18, further comprising:
decoding the first 3D object proxy using an auto-encoder after transforming the first 3D object proxy; and
decoding the second 3D object proxy using the auto-encoder after transforming the second 3D object proxy.
21. The method of claim 18, wherein generating color data for the 3D object comprises subtracting the color attribute of the transformed first 3D object proxy from the color attribute of the transformed second 3D object proxy.
22. The method of claim 18, wherein generating color data for the 3D object comprises adding color attributes of the transformed first 3D object proxy to color attributes of the transformed second 3D object proxy.
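Claims 21 and 22 describe the two halves of standard residual coding applied to proxy colors: subtract at the encoder, add back at the decoder. A toy round trip with invented RGB values:

```python
import numpy as np

# Per-vertex colors mapped onto the keyframe proxy and the current-frame proxy.
keyframe_colors = np.array([[0.5, 0.5, 0.5], [0.2, 0.4, 0.6]])
frame_colors = np.array([[0.6, 0.5, 0.4], [0.2, 0.5, 0.6]])

residual = frame_colors - keyframe_colors   # claim 21: subtraction at encode time
recovered = keyframe_colors + residual      # claim 22: addition at decode time
```

Because the two operations are exact inverses, the decoder reconstructs the frame colors from the keyframe colors plus the transmitted residual.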
CN201980024478.9A 2018-09-26 2019-09-17 Video encoding by providing geometric proxy Active CN111937041B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US16/143,165 US11109065B2 (en) 2018-09-26 2018-09-26 Video encoding by providing geometric proxies
US16/143,165 2018-09-26
PCT/US2019/051566 WO2020068492A1 (en) 2018-09-26 2019-09-17 Video encoding by providing geometric proxies

Publications (2)

Publication Number Publication Date
CN111937041A true CN111937041A (en) 2020-11-13
CN111937041B CN111937041B (en) 2025-04-08

Family

ID=68104764

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980024478.9A Active CN111937041B (en) 2018-09-26 2019-09-17 Video encoding by providing geometric proxy

Country Status (4)

Country Link
US (3) US11109065B2 (en)
EP (1) EP3752985A1 (en)
CN (1) CN111937041B (en)
WO (1) WO2020068492A1 (en)

Families Citing this family (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201604304D0 (en) 2016-03-14 2016-04-27 Tigenix S A U Adipose tissue-derived stromal stem cells for use in treating refractory complex perianal fistulas in crohn's disease
US11146608B2 (en) * 2017-07-20 2021-10-12 Disney Enterprises, Inc. Frame-accurate video seeking via web browsers
US11301752B2 (en) * 2017-10-24 2022-04-12 International Business Machines Corporation Memory configuration for implementing a neural network
WO2020075862A1 (en) * 2018-10-12 2020-04-16 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ Three-dimensional data encoding method, three-dimensional data decoding method, three-dimensional data encoding device, and three-dimensional data decoding device
KR20210096285A (en) * 2018-12-13 2021-08-04 삼성전자주식회사 Method, apparatus and computer readable recording medium for compressing 3D mesh content
US11995854B2 (en) * 2018-12-19 2024-05-28 Nvidia Corporation Mesh reconstruction using data-driven priors
EP3671660B1 (en) * 2018-12-20 2025-06-18 Dassault Systèmes Designing a 3d modeled object via user-interaction
US10999606B2 (en) * 2019-01-08 2021-05-04 Intel Corporation Method and system of neural network loop filtering for video coding
US11080835B2 (en) * 2019-01-09 2021-08-03 Disney Enterprises, Inc. Pixel error detection system
US11625806B2 (en) * 2019-01-23 2023-04-11 Qualcomm Incorporated Methods and apparatus for standardized APIs for split rendering
JP2020150516A (en) * 2019-03-15 2020-09-17 シャープ株式会社 Image decoding device and image encoding device
US11217011B2 (en) * 2019-04-19 2022-01-04 Facebook Technologies, Llc. Providing semantic-augmented artificial-reality experience
US10504005B1 (en) * 2019-05-10 2019-12-10 Capital One Services, Llc Techniques to embed a data object into a multidimensional frame
EP3742349A1 (en) 2019-05-24 2020-11-25 Samsung Electronics Co., Ltd. Decompression apparatus and control method thereof
US12020442B2 (en) 2019-06-25 2024-06-25 Arm Limited Graphics processing systems
US11270492B2 (en) * 2019-06-25 2022-03-08 Arm Limited Graphics processing systems
US11132827B2 (en) 2019-09-19 2021-09-28 Facebook Technologies, Llc Artificial reality system architecture for concurrent application execution and collaborative 3D scene rendering
US10867190B1 (en) * 2019-11-27 2020-12-15 Aimotive Kft. Method and system for lane detection
US20210342686A1 (en) * 2020-04-30 2021-11-04 Nvidia Corporation Content management using one or more neural networks
DE102020205581A1 (en) * 2020-05-04 2021-11-04 Robert Bosch Gesellschaft mit beschränkter Haftung Method for training an artificial neural network
US11895308B2 (en) * 2020-06-02 2024-02-06 Portly, Inc. Video encoding and decoding system using contextual video learning
US11343531B2 (en) * 2020-06-17 2022-05-24 Western Digital Technologies, Inc. Storage system and method for object monitoring
EP4162450A1 (en) * 2020-08-31 2023-04-12 Google LLC Systems and methods for generating splat-based differentiable two-dimensional renderings
CN112183221B (en) * 2020-09-04 2024-05-03 北京科技大学 Semantic-based dynamic object self-adaptive track prediction method
US12249092B2 (en) 2020-09-15 2025-03-11 Meta Platforms Technologies, Llc Visual inertial odometry localization using sparse sensors
US20220130079A1 (en) * 2020-10-23 2022-04-28 Siemens Medical Solutions Usa, Inc. Systems and methods for simultaneous attenuation correction, scatter correction, and de-noising of low-dose pet images with a neural network
US12131503B1 (en) * 2020-12-21 2024-10-29 Aurora Operations, Inc Virtual creation and visualization of physically based virtual materials
US12008806B2 (en) 2021-08-19 2024-06-11 Meta Platforms Technologies, Llc Methods and systems to allow three-dimensional map sharing between heterogeneous computing systems, cross-localization, and sharing content in three-dimensional space
US12277642B2 (en) 2021-10-26 2025-04-15 Meta Platforms Technologies, Llc Localization failure handling on artificial reality systems
US12299918B2 (en) 2022-01-26 2025-05-13 Meta Platforms Technologies, Llc Methods and systems to facilitate passive relocalization using three-dimensional maps
US12323634B2 (en) 2022-02-11 2025-06-03 Qualcomm Incorporated Entropy coding for neural-based media compression
KR20240149891A (en) * 2022-02-11 2024-10-15 퀄컴 인코포레이티드 Neural network media compression using quantized entropy coding distribution parameters
US20230401755A1 (en) * 2022-04-20 2023-12-14 Apple Inc. Mesh Compression Using Coding Units with Different Encoding Parameters
US12190520B2 (en) * 2022-07-05 2025-01-07 Alibaba (China) Co., Ltd. Pyramid architecture for multi-scale processing in point cloud segmentation
US12323621B2 (en) * 2022-08-02 2025-06-03 Tencent America LLC Duplicate vertices based position compression
US12299835B1 (en) 2022-10-24 2025-05-13 Meta Platforms Technologies, Llc Shared scene co-location for artificial reality devices
WO2024225702A1 (en) * 2023-04-25 2024-10-31 엘지전자 주식회사 Mesh data transmission device, mesh data transmission method, mesh data reception device, and mesh data reception method

Citations (3)

Publication number Priority date Publication date Assignee Title
CN101622874A (en) * 2007-01-23 2010-01-06 欧几里得发现有限责任公司 Object archival systems and methods
US20110058609A1 (en) * 2009-09-04 2011-03-10 Stmicroelectronics Pvt. Ltd. System and method for object based parametric video coding
WO2015177162A1 (en) * 2014-05-23 2015-11-26 Alcatel Lucent Method for coding video information

Family Cites Families (19)

Publication number Priority date Publication date Assignee Title
US5880737A (en) 1995-08-04 1999-03-09 Microsoft Corporation Method and system for accessing texture data in environments with high latency in a graphics rendering system
EP1434172A3 (en) 1995-08-04 2004-09-15 Microsoft Corporation Method and system for generating a display image using Gsprites.
US5818463A (en) 1997-02-13 1998-10-06 Rockwell Science Center, Inc. Data compression for animated three dimensional objects
WO1999056249A1 (en) 1998-04-27 1999-11-04 Interactive Silicon, Inc. Graphics system and method for rendering independent 2d and 3d objects
US6593925B1 (en) 2000-06-22 2003-07-15 Microsoft Corporation Parameterized animation compression methods and arrangements
US20050063596A1 (en) 2001-11-23 2005-03-24 Yosef Yomdin Encoding of geometric modeled images
TWI255429B (en) * 2003-12-29 2006-05-21 Ind Tech Res Inst Method for adjusting image acquisition parameters to optimize objection extraction
US8942283B2 (en) 2005-03-31 2015-01-27 Euclid Discoveries, Llc Feature-based hybrid video codec comparing compression efficiency of encodings
JP2007206920A (en) * 2006-02-01 2007-08-16 Sony Corp Image processor and image processing method, retrieving device and method, program and recording medium
US20100303150A1 (en) 2006-08-08 2010-12-02 Ping-Kang Hsiung System and method for cartoon compression
WO2009122760A1 (en) * 2008-04-04 2009-10-08 富士フイルム株式会社 Image processing device, image processing method, and computer-readable medium
KR101633459B1 (en) * 2009-08-10 2016-06-24 삼성전자주식회사 Apparatus and method for encoding and decoding image data using correlation between colors
MX337257B (en) * 2010-06-01 2014-09-23 Milenium Espacio Soft S A Method for recognizing objects.
EP2761878B1 (en) 2011-09-29 2020-04-08 Dolby Laboratories Licensing Corporation Representation and coding of multi-view images using tapestry encoding
JP6528723B2 (en) * 2016-05-25 2019-06-12 トヨタ自動車株式会社 Object recognition apparatus, object recognition method and program
EP3340083A1 (en) * 2016-12-20 2018-06-27 Hewlett-Packard Development Company L.P. Transforming object model data
WO2019009449A1 (en) * 2017-07-06 2019-01-10 삼성전자 주식회사 Method and device for encoding/decoding image
US10192115B1 (en) * 2017-12-13 2019-01-29 Lowe's Companies, Inc. Virtualizing objects using object models and object position data
US10657712B2 (en) * 2018-05-25 2020-05-19 Lowe's Companies, Inc. System and techniques for automated mesh retopology

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
CN101622874A (en) * 2007-01-23 2010-01-06 欧几里得发现有限责任公司 Object archival systems and methods
US20110058609A1 (en) * 2009-09-04 2011-03-10 Stmicroelectronics Pvt. Ltd. System and method for object based parametric video coding
WO2015177162A1 (en) * 2014-05-23 2015-11-26 Alcatel Lucent Method for coding video information

Non-Patent Citations (2)

Title
R. VENKATESH BABU ET AL.: "Object-based Surveillance Video Compression using Foreground Motion Compensation", 2006 9TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION, ROBOTICS AND VISION, 16 July 2007 (2007-07-16), pages 1 - 6 *
HAN CHUANBAO: "Research on Adaptive Extraction of Video Objects and Video Compression Coding", China Master's Theses Full-text Database (Information Science and Technology), no. 06, 15 June 2005 (2005-06-15), pages 136 - 140 *

Also Published As

Publication number Publication date
CN111937041B (en) 2025-04-08
US20210360286A1 (en) 2021-11-18
US20200099954A1 (en) 2020-03-26
EP3752985A1 (en) 2020-12-23
WO2020068492A1 (en) 2020-04-02
US12192518B2 (en) 2025-01-07
US20250142123A1 (en) 2025-05-01
US11109065B2 (en) 2021-08-31

Similar Documents

Publication Publication Date Title
CN111937041B (en) Video encoding by providing geometric proxy
CN111512342B (en) Method and device for processing duplicate points in point cloud compression
US10853447B2 (en) Bezier volume representation of point cloud attributes
US10650570B2 (en) Dynamic local temporal-consistent textured mesh compression
US20180286107A1 (en) Implicit view-dependent quantization
US20220321912A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
EP4325853A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
JP2025504783A (en) An extended framework for point cloud compression
US11861788B1 (en) Resolution budgeting by area for immersive video rendering
WO2023036316A1 (en) Method, apparatus, and medium for point cloud coding
EP4325851A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
US11915373B1 (en) Attribute value compression for a three-dimensional mesh using geometry information to guide prediction
WO2023278829A1 (en) Attribute coding in geometry point cloud coding
KR20200144401A (en) Method and apparatus for compressing 3 dimensional data and method and apparatus for reconstructing 3 dimensional data
CN118575194A (en) Scalable framework for point cloud compression
JP2024516550A (en) Learning-Based Point Cloud Compression with Tearing Transform
Mamou et al. Multi-chart geometry video: A compact representation for 3D animations
EP4550783A1 (en) Apparatus and method for image encoding and decoding
US12299805B1 (en) Immersive media content encoding and rendering
WO2025108070A1 (en) System and method for geometry point cloud coding
US20250150640A1 (en) Apparatus and method for image encoding and decoding
WO2025123959A1 (en) Dynamic mesh base geometry position prediction coding
WO2024217510A1 (en) Method, apparatus, and medium for point cloud processing
KR20240090681A (en) Methods, devices and media for point cloud coding
JP2025500437A (en) Method, apparatus, and medium for point cloud coding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant