The present application claims priority from U.S. provisional patent application No. 63/389,316 entitled "Reflection Symmetry-Based Mesh Coding" filed on 7.7.14, and is a continuation of and claims priority from U.S. patent application No. 18/208,111 entitled "Systems and Methods for Reflection Symmetry-Based Mesh Coding" filed on 9.6.2023, all of which are incorporated herein by reference in their entirety.
Detailed Description
The present disclosure describes lossless and lossy mesh codec techniques based on the symmetric nature of the mesh content. More particularly, the disclosed methods and systems relate to reflectively symmetric partitioning, prediction, and encoding of mesh content.
Example Systems and Apparatus
Fig. 1 is a block diagram illustrating a communication system 100 according to some embodiments. The communication system 100 includes a source device 102 and a plurality of electronic devices 120 (e.g., electronic device 120-1 through electronic device 120-m) communicatively coupled to each other via one or more networks. In some implementations, the communication system 100 is a streaming system, for example, for use with video-enabled applications (e.g., video conferencing applications, digital TV applications, and media storage and/or distribution applications).
The source device 102 includes a video source 104 (e.g., a camera component or media storage) and an encoder component 106. In some implementations, the video source 104 is a digital video camera (e.g., configured to create an uncompressed video sample stream). The encoder component 106 generates one or more encoded video bitstreams from the video stream. The video stream from the video source 104 may be of a high data volume compared to the encoded video bitstream 108 generated by the encoder component 106. Because the encoded video bitstream 108 comprises less data than the video stream from the video source 104, the encoded video bitstream 108 requires less bandwidth to transmit and less storage space to store. In some implementations, the source device 102 does not include the encoder component 106 (e.g., is configured to transmit uncompressed video data to the network 110).
One or more networks 110 represent any number of networks that communicate information between the source device 102, the server system 112, and/or the electronic devices 120, including, for example, wired and/or wireless communication networks. The one or more networks 110 may exchange data over circuit-switched and/or packet-switched channels. Representative networks include telecommunication networks, local area networks, wide area networks, and/or the Internet.
One or more networks 110 include a server system 112 (e.g., a distributed/cloud computing system). In some implementations, the server system 112 is or includes a streaming server (e.g., configured to store and/or distribute video content such as an encoded video stream from the source device 102). The server system 112 includes a codec component 114 (e.g., configured to encode and/or decode video data). In some implementations, the codec component 114 includes an encoder component and/or a decoder component. In various embodiments, the codec component 114 is instantiated as hardware, software, or a combination of hardware and software. In some implementations, the codec component 114 is configured to decode the encoded video bitstream 108 and re-encode the video data using different encoding standards and/or methods to generate encoded video data 116. In some implementations, the server system 112 is configured to generate a plurality of video formats and/or encodings from the encoded video bitstream 108.
In some implementations, the server system 112 serves as a media-aware network element (MANE). For example, the server system 112 may be configured to prune the encoded video bitstream 108 to tailor potentially different bitstreams for one or more of the electronic devices 120. In some embodiments, the MANE is provided separately from the server system 112.
The electronic device 120-1 includes a decoder component 122 and a display 124. In some implementations, the decoder component 122 is configured to decode the encoded video data 116 to generate an output video stream that may be presented on a display or other type of presentation device. In some implementations, one or more of the electronic devices 120 do not include a display component (e.g., are communicatively coupled to an external display device and/or include media memory). In some implementations, the electronic device 120 is a streaming client. In some implementations, the electronic device 120 is configured to access the server system 112 to obtain the encoded video data 116.
The source device and/or the plurality of electronic devices 120 are sometimes referred to as "terminal devices" or "user devices". In some implementations, one or more of the electronic devices 120 and/or the source device 102 are examples of server systems, personal computers, portable devices (e.g., smartphones, tablets, or laptops), wearable devices, video conferencing devices, and/or other types of electronic devices.
In an example operation of the communication system 100, the source device 102 transmits the encoded video bitstream 108 to the server system 112. For example, the source device 102 may encode a stream of pictures captured by the source device. The server system 112 receives the encoded video bitstream 108 and may decode and/or re-encode the encoded video bitstream 108 using the codec component 114. For example, the server system 112 may apply an encoding to the video data that is more optimized for network transmission and/or storage. The server system 112 may transmit the encoded video data 116 (e.g., one or more encoded video bitstreams) to one or more of the electronic devices 120. Each electronic device 120 may decode the encoded video data 116 to recover and optionally display the video pictures.
In some embodiments, the transmission discussed above is a unidirectional data transmission. Unidirectional data transmission is sometimes used for media serving applications and the like. In some embodiments, the transmission discussed above is a bi-directional data transmission. Bi-directional data transmission is sometimes used for video conferencing applications and the like. In some implementations, the encoded video bitstream 108 and/or the encoded video data 116 are encoded and/or decoded according to any of the video coding/compression standards described herein, such as HEVC, VVC, and/or AV1.
Fig. 2A is a block diagram illustrating example elements of the encoder component 106, in accordance with some embodiments. The encoder component 106 receives a source video sequence from the video source 104. In some implementations, the encoder component includes a receiver (e.g., transceiver) component configured to receive the source video sequence. In some implementations, the encoder component 106 receives the video sequence from a remote video source (e.g., a video source that is a component of a different device than the encoder component 106). The video source 104 may provide the source video sequence in the form of a stream of digital video samples, which may have any suitable bit depth (e.g., 8 bits, 10 bits, or 12 bits), any color space (e.g., BT.601 YCrCb or RGB), and any suitable sampling structure (e.g., YCrCb 4:2:0 or YCrCb 4:4:4). In some implementations, the video source 104 is a storage device that stores previously captured/prepared video. In some implementations, the video source 104 is a camera that captures local image information as a video sequence. The video data may be provided as a plurality of individual pictures that impart motion when viewed in sequence. The pictures themselves may be organized as a spatial array of pixels, where each pixel may include one or more samples, depending on the sampling structure, color space, etc. in use. The relationship between pixels and samples can be readily understood by one of ordinary skill in the art. The following description focuses on samples.
The encoder component 106 is configured to code and/or compress the pictures of the source video sequence into a coded video sequence 216 in real time or under other temporal constraints required by the application. Enforcing the appropriate coding speed is one function of the controller 204. In some implementations, the controller 204 controls and is functionally coupled to other functional units as described below. The parameters set by the controller 204 may include rate-control-related parameters (e.g., picture skip, quantizer, and/or lambda values for rate distortion optimization techniques), picture size, group of pictures (GOP) layout, maximum motion vector search range, and the like. Other functions of the controller 204 may be readily identified by one of ordinary skill in the art, as they may pertain to the encoder component 106 being optimized for a particular system design.
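The rate-control role of the controller described above can be illustrated with a toy feedback loop: the quantizer parameter is nudged up or down so that produced bits track a target budget. The gains, limits, and function names below are illustrative assumptions, not the disclosed implementation.

```python
# Toy rate-control sketch: the controller raises the quantizer parameter
# (coarser quantization, fewer bits) when the bit budget is exceeded and
# lowers it (finer quantization) when bits remain, clamped to a valid range.
def update_qp(qp, bits_produced, bits_target, qp_min=0, qp_max=51):
    if bits_produced > bits_target:
        qp += 1   # too many bits: quantize more coarsely
    elif bits_produced < bits_target:
        qp -= 1   # bit budget left: quantize more finely
    return max(qp_min, min(qp_max, qp))

assert update_qp(30, bits_produced=1200, bits_target=1000) == 31
assert update_qp(30, bits_produced=800, bits_target=1000) == 29
```

A real controller would also account for GOP layout and picture type, but the same negative-feedback principle applies.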
In some implementations, the encoder component 106 is configured to operate in a coding loop. In a simplified example, the coding loop includes a source codec 202 (e.g., responsible for creating symbols, such as a symbol stream, based on the input pictures to be coded and reference pictures) and a (local) decoder 210. The decoder 210 reconstructs the symbols to create sample data in a manner similar to a (remote) decoder (when compression between the symbols and the coded video bitstream is lossless). The reconstructed sample stream (sample data) is input to the reference picture memory 208. Because the decoding of a symbol stream produces bit-exact results independent of decoder location (local or remote), the content of the reference picture memory 208 is also bit-exact between the local encoder and a remote decoder. In this way, the prediction part of the encoder "sees" as reference picture samples exactly the same sample values as a decoder would "see" when using prediction during decoding. The principle of reference picture synchronicity (and the drift that results if synchronicity cannot be maintained, for example, due to channel errors) is well known to those of ordinary skill in the art.
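The coding loop above can be sketched in a few lines: the encoder performs its lossy step and then locally decodes the result, so its stored reference matches the remote decoder's reference bit-for-bit. The scalar "frame" model, the QSTEP constant, and all function names below are illustrative assumptions for exposition only.

```python
# Sketch of an encoder-side coding loop with an embedded (local) decoder.
# Both sides reconstruct from the same symbols, keeping reference pictures
# bit-exact between encoder and remote decoder.
QSTEP = 4  # assumed quantizer step size

def quantize(residual):
    return round(residual / QSTEP)

def dequantize(level):
    return level * QSTEP

def encode_frame(frame, reference):
    """Return the symbol (quantized residual) and the locally
    reconstructed frame that must match the remote decoder's output."""
    level = quantize(frame - reference)            # lossy source coding
    reconstructed = reference + dequantize(level)  # local in-loop decoding
    return level, reconstructed

def decode_frame(level, reference):
    return reference + dequantize(level)

ref = 100
level, enc_ref = encode_frame(frame=117, reference=ref)
dec_ref = decode_frame(level, reference=ref)
assert enc_ref == dec_ref  # bit-exact reference picture synchronicity
```

If the encoder instead predicted from the original (unquantized) frames, its references would drift away from the decoder's, which is exactly the failure mode the local decoder 210 prevents.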
The operation of the decoder 210 may be the same as the operation of a remote decoder, such as the decoder component 122 described in detail below in connection with fig. 2B. However, referring briefly to fig. 2B, because symbols are available and the encoding of the symbols into a coded video sequence by the entropy codec 214 and the decoding of the symbols by the parser 254 may be lossless, the entropy-decoding portions of the decoder component 122, including the buffer memory 252 and the parser 254, may not be fully implemented in the local decoder 210.
An observation that can be made at this point is that any decoder technology other than the parsing/entropy decoding present in a decoder also necessarily needs to be present, in substantially identical functional form, in the corresponding encoder. For this reason, the disclosed subject matter focuses on decoder operation. The description of encoder technologies may be abbreviated because they are the inverse of the comprehensively described decoder technologies. Only in certain areas is a more detailed description required and provided below.
As part of its operation, the source codec 202 may perform motion-compensated predictive coding, which codes an input frame predictively with reference to one or more previously coded frames from the video sequence that are designated as reference frames. In this way, the codec engine 212 encodes differences between pixel blocks of an input frame and pixel blocks of a reference frame that may be selected as a prediction reference for the input frame. The controller 204 may manage the coding operations of the source codec 202, including, for example, the setting of parameters and subgroup parameters used for encoding the video data.
The decoder 210 decodes the coded video data of frames that may be designated as reference frames, based on the symbols created by the source codec 202. The operations of the codec engine 212 may advantageously be lossy processes. When the coded video data is decoded at a video decoder (not shown in fig. 2A), the reconstructed video sequence may be a replica of the source video sequence with some errors. The decoder 210 replicates the decoding processes that may be performed on reference frames by a remote video decoder and may cause reconstructed reference frames to be stored in the reference picture memory 208. In this manner, the encoder component 106 locally stores copies of reconstructed reference frames that have common content (absent transmission errors) with the reconstructed reference frames to be obtained by the remote video decoder.
The predictor 206 may perform prediction searches for the codec engine 212. That is, for a new frame to be coded, the predictor 206 may search the reference picture memory 208 for sample data (as candidate reference pixel blocks) or certain metadata, such as reference picture motion vectors, block shapes, and so on, that may serve as an appropriate prediction reference for the new picture. The predictor 206 may operate on a sample-block-by-pixel-block basis to find appropriate prediction references. In some cases, an input picture may have prediction references drawn from multiple reference pictures stored in the reference picture memory 208, as determined by search results obtained by the predictor 206.
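A prediction search of the kind just described is often realized as block matching over a search range, minimizing a distortion metric such as the sum of absolute differences (SAD). The following sketch uses a 1-D sample model and assumed names purely for illustration; a real predictor would operate on 2-D pixel blocks and may use metadata to prune the search.

```python
# Illustrative full-search block matching: scan offsets within a search
# range around the block's position in the reference samples and return
# the offset minimizing the sum of absolute differences (SAD).
def sad(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def best_match(block, reference, pos, search_range):
    """Return (offset, cost): the displacement whose reference window
    best predicts `block`, i.e., a 1-D 'motion vector'."""
    best_off, best_cost = 0, float("inf")
    for off in range(-search_range, search_range + 1):
        start = pos + off
        if start < 0 or start + len(block) > len(reference):
            continue  # window falls outside the reference picture
        cost = sad(block, reference[start:start + len(block)])
        if cost < best_cost:
            best_off, best_cost = off, cost
    return best_off, best_cost

ref = [0, 0, 5, 9, 5, 0, 0, 0]
mv, cost = best_match([5, 9, 5], ref, pos=1, search_range=2)
assert (mv, cost) == (1, 0)  # exact match one sample to the right
```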
The outputs of all the above mentioned functional units may be subjected to entropy coding in the entropy codec 214. The entropy codec 214 converts the symbols generated by the various functional units into a coded video sequence by lossless compression of the symbols according to techniques well known to those of ordinary skill in the art (e.g., huffman coding, variable length coding, and/or arithmetic coding).
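The lossless entropy-coding step can be illustrated with a minimal Huffman coder: frequent symbols receive shorter codes, and decoding recovers the symbol stream exactly. This is a generic textbook construction offered only as an example of the class of techniques named above, not the entropy codec 214 itself.

```python
# Minimal Huffman coding sketch: build prefix-free codes from symbol
# frequencies, then encode and losslessly decode a symbol stream.
import heapq

def huffman_codes(freqs):
    # Heap entries carry a unique tiebreaker so dicts are never compared.
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    tick = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tick, merged))
        tick += 1
    return heap[0][2]

def encode(symbols, codes):
    return "".join(codes[s] for s in symbols)

def decode(bits, codes):
    inv, out, cur = {v: k for k, v in codes.items()}, [], ""
    for b in bits:
        cur += b
        if cur in inv:
            out.append(inv[cur])
            cur = ""
    return out

syms = list("aaaabbc")
codes = huffman_codes({"a": 4, "b": 2, "c": 1})
bits = encode(syms, codes)
assert decode(bits, codes) == syms        # lossless round trip
assert len(codes["a"]) < len(codes["c"])  # frequent symbol, shorter code
```

Arithmetic coding, also named above, achieves the same lossless property while approaching the entropy bound more closely at the cost of more complex state.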
In some implementations, the output of the entropy codec 214 is coupled to a transmitter. The transmitter may be configured to buffer the encoded video sequence created by the entropy codec 214 in preparation for transmission via the communication channel 218, which communication channel 218 may be a hardware/software link to a storage device that is to store encoded video data. The transmitter may be configured to combine the encoded video data from the source codec 202 with other data to be transmitted, such as encoded audio data and/or an auxiliary data stream (source not shown). In some embodiments, the transmitter may transmit additional data along with the encoded video. The source codec 202 may include such data as part of a coded video sequence. The additional data may include temporal/spatial/SNR enhancement layers, other forms of redundant data such as redundant pictures and slices, supplemental enhancement information (Supplementary Enhancement Information, SEI) messages, visual availability information (Visual Usability Information, VUI) parameter set slices, and the like.
The controller 204 may manage the operation of the encoder component 106. During coding, the controller 204 may assign to each coded picture a certain coded picture type, which may affect the coding techniques applied to the respective picture. For example, a picture may be assigned as an intra picture (I picture), a predictive picture (P picture), or a bi-directionally predictive picture (B picture). An intra picture may be coded and decoded without using any other frame in the sequence as a source of prediction. Some video codecs allow different types of intra pictures, including, for example, independent decoder refresh (IDR) pictures. Those variants of I pictures, and their respective applications and features, are known to those of ordinary skill in the art and are therefore not repeated here. A predictive picture may be coded and decoded using inter prediction or intra prediction that predicts the sample values of each block using at most one motion vector and reference index. A bi-directionally predictive picture may be coded and decoded using inter prediction or intra prediction that predicts the sample values of each block using at most two motion vectors and reference indices. Similarly, multiple-predictive pictures may use more than two reference pictures and associated metadata for the reconstruction of a single block.
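A controller's picture-type assignment can be sketched as a fixed GOP layout. The alternating I/P/B pattern and GOP size below are hypothetical choices for illustration; a real controller may adapt the layout to content and rate constraints.

```python
# Hypothetical GOP layout: the first picture of each GOP is an I picture
# (no prediction from other frames); remaining pictures alternate P
# (at most one reference) and B (at most two references).
def assign_types(num_pictures, gop_size=8):
    types = []
    for i in range(num_pictures):
        if i % gop_size == 0:
            types.append("I")
        elif i % 2 == 0:
            types.append("P")
        else:
            types.append("B")
    return types

assert assign_types(8) == ["I", "B", "P", "B", "P", "B", "P", "B"]
```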
A source picture may commonly be subdivided spatially into blocks of samples (e.g., blocks of 4 x 4, 8 x 8, 4 x 8, or 16 x 16 samples) and coded on a block-by-block basis. The blocks may be coded predictively with reference to other (already coded) blocks, as determined by the coding assignment applied to the blocks' respective pictures. For example, blocks of I pictures may be coded non-predictively, or they may be coded predictively with reference to already coded blocks of the same picture (spatial prediction or intra prediction). Pixel blocks of P pictures may be coded non-predictively, via spatial prediction, or via temporal prediction with reference to one previously coded reference picture. Blocks of B pictures may be coded non-predictively, via spatial prediction, or via temporal prediction with reference to one or two previously coded reference pictures.
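The spatial subdivision above amounts to tiling the picture with sample blocks, with edge blocks clipped to the picture boundary. The tuple layout and block size below are illustrative assumptions.

```python
# Partition a picture into (x, y, width, height) sample blocks, clipping
# blocks that overhang the right or bottom picture boundary.
def partition(width, height, block=8):
    return [(x, y, min(block, width - x), min(block, height - y))
            for y in range(0, height, block)
            for x in range(0, width, block)]

assert partition(16, 8) == [(0, 0, 8, 8), (8, 0, 8, 8)]
assert partition(10, 8)[-1] == (8, 0, 2, 8)  # clipped edge block
```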
The video may be captured as a plurality of source pictures (video pictures) in a temporal sequence. Intra-picture prediction (often abbreviated to intra prediction) exploits spatial correlation in a given picture, while inter-picture prediction exploits (temporal or other) correlation between pictures. In an example, a specific picture under encoding/decoding, which is referred to as a current picture, is partitioned into blocks. In the case where a block in the current picture is similar to a reference block in a previously coded and still buffered reference picture in the video, the block in the current picture may be coded by a vector that is referred to as a motion vector. The motion vector points to the reference block in the reference picture, and, in the case where multiple reference pictures are in use, the motion vector may have a third dimension that identifies the reference picture.
The encoder component 106 can perform the codec operations according to any predetermined video codec technique or standard such as described herein. In operation of the encoder component 106, the encoder component 106 can perform various compression operations, including predictive codec operations that exploit temporal redundancy and spatial redundancy in an input video sequence. Thus, the encoded video data may conform to the syntax specified by the video encoding and decoding technique or standard used.
Fig. 2B is a block diagram illustrating example elements of decoder component 122 according to some embodiments. Decoder element 122 in fig. 2B is coupled to channel 218 and display 124. In some implementations, the decoder component 122 includes a transmitter coupled to the loop filter 256 and configured to transmit data to the display 124 (e.g., via a wired or wireless connection).
In some implementations, the decoder component 122 includes a receiver coupled to the channel 218 and configured to receive data from the channel 218 (e.g., via a wired or wireless connection). The receiver may be configured to receive one or more encoded video sequences decoded by decoder component 122. In some embodiments, the decoding of each encoded video sequence is independent of the other encoded video sequences. Each encoded video sequence may be received from a channel 218, which channel 218 may be a hardware/software link to a storage device storing encoded video data. The receiver may receive encoded video data as well as other data, such as encoded audio data and/or auxiliary data streams, which may be forwarded to their respective use entities (not depicted). The receiver may separate the encoded video sequence from other data. In some embodiments, the receiver receives additional (redundant) data as well as the encoded video. Additional data may be included as part of the encoded video sequence. The decoder component 122 can use the additional data to properly decode the data and/or more accurately reconstruct the original video data. The additional data may be in the form of, for example, temporal, spatial or SNR enhancement layers, redundant slices, redundant pictures, forward error correction codes, and the like.
According to some implementations, the decoder component 122 includes a buffer memory 252, a parser 254 (also sometimes referred to as an entropy decoder), a scaler/inverse transform unit 258, an intra picture prediction unit 262, a motion compensated prediction unit 260, an aggregator 268, a loop filter unit 256, a reference picture memory 266, and a current picture memory 264. In some implementations, the decoder component 122 is implemented as an integrated circuit, a series of integrated circuits, and/or other electronic circuitry. In some implementations, the decoder component 122 is implemented at least in part in software.
The buffer memory 252 is coupled between the channel 218 and the parser 254 (e.g., to combat network jitter). In some embodiments, the buffer memory 252 is separate from the decoder component 122. In some embodiments, a separate buffer memory is provided between the output of the channel 218 and the decoder component 122. In some implementations, a separate buffer memory (e.g., to combat network jitter) is provided external to the decoder component 122 in addition to the buffer memory 252 internal to the decoder component 122 (e.g., where the buffer memory 252 is configured to handle playout timing). The buffer memory 252 may not be needed, or the buffer memory 252 may be small, when receiving data from a store/forward device of sufficient bandwidth and controllability, or from an isosynchronous network. For use on best-effort packet networks such as the Internet, the buffer memory 252 may be required; the buffer memory 252 may be comparatively large, may advantageously be of adaptive size, and may be implemented at least partially in an operating system or similar element (not depicted) external to the decoder component 122.
The parser 254 is configured to reconstruct symbols 270 from the encoded video sequence. The symbols may include, for example, information used to manage the operation of the decoder component 122 and/or information to control a presentation device such as the display 124. The control information for the presentation device(s) may be in the form of, for example, supplemental enhancement information (SEI) messages or video usability information (VUI) parameter set fragments (not depicted). The parser 254 parses (entropy-decodes) the encoded video sequence. The coding of the encoded video sequence may be in accordance with a video coding technology or standard, and may follow principles well known to a person skilled in the art, including variable length coding, Huffman coding, arithmetic coding with or without context sensitivity, and so forth. The parser 254 may extract from the encoded video sequence a set of subgroup parameters for at least one of the subgroups of pixels in the video decoder, based upon at least one parameter corresponding to the group. The subgroups may include groups of pictures (GOPs), pictures, tiles, slices, macroblocks, coding units (CUs), blocks, transform units (TUs), prediction units (PUs), and so forth. The parser 254 may also extract from the encoded video sequence information such as transform coefficients, quantizer parameter values, motion vectors, and so forth.
Reconstruction of the symbols 270 may involve multiple different units depending on the type of the coded video picture or parts thereof (such as inter and intra pictures, inter and intra blocks), and other factors. Which units are involved, and how they are involved, may be controlled by the subgroup control information that was parsed from the coded video sequence by the parser 254. The flow of such subgroup control information between the parser 254 and the multiple units described below is not depicted for simplicity.
Beyond the functional blocks already mentioned, the decoder component 122 can be conceptually subdivided into a number of functional units as described below. In a practical implementation operating under commercial constraints, many of these units interact closely with each other and can, at least partly, be integrated with each other. However, for the purpose of describing the disclosed subject matter, the conceptual subdivision into the functional units below is maintained.
The scaler/inverse transform unit 258 receives quantized transform coefficients as well as control information (e.g., which transform to use, block size, quantization factor, and/or quantization scaling matrices) from the parser 254 as symbol(s) 270. The scaler/inverse transform unit 258 may output blocks comprising sample values that may be input into the aggregator 268.
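The scale-then-invert step can be sketched concretely. A 2-point butterfly stands in for a real inverse transform (such as an inverse DCT), and the QSTEP constant and function names are illustrative assumptions.

```python
# Sketch of the scaler/inverse-transform step: quantized levels are scaled
# by the quantizer step (dequantized), then run through an inverse
# transform to yield a block of residual sample values.
QSTEP = 2

def inverse_transform(pair):
    s, d = pair                       # (sum, difference) coefficients
    return [(s + d) // 2, (s - d) // 2]

def scale_and_invert(levels):
    coeffs = [l * QSTEP for l in levels]  # dequantize (scale)
    return inverse_transform(coeffs)

# Forward direction (for reference): samples [7, 3] -> coefficients
# (sum=10, diff=4) -> levels [5, 2] at QSTEP=2.
assert scale_and_invert([5, 2]) == [7, 3]
```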
In some cases, the output samples of the scaler/inverse transform unit 258 pertain to an intra-coded block; that is: a block that does not use prediction information from previously reconstructed pictures, but that can use prediction information from previously reconstructed parts of the current picture. Such prediction information can be provided by the intra picture prediction unit 262. The intra picture prediction unit 262 may use surrounding, already-reconstructed information fetched from the current (partially reconstructed) picture in the current picture memory 264 to generate a block of the same size and shape as the block under reconstruction. The aggregator 268 may add, on a per-sample basis, the prediction information that the intra picture prediction unit 262 has generated to the output sample information as provided by the scaler/inverse transform unit 258.
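One common form of such intra prediction is a DC-style mode, sketched below: the block is filled with the mean of already-reconstructed neighboring samples. The 2-D list model, the choice of DC mode, and the names are illustrative assumptions.

```python
# DC-style intra prediction: predict every sample of a size x size block
# as the rounded mean of already-reconstructed top and left neighbors
# fetched from the (partially reconstructed) current picture.
def dc_predict(top, left, size):
    neighbors = top + left
    dc = round(sum(neighbors) / len(neighbors))
    return [[dc] * size for _ in range(size)]

block = dc_predict(top=[10, 10, 12, 12], left=[11, 11, 11, 11], size=4)
assert block[0][0] == 11 and len(block) == 4
```

The aggregator would then add the decoded residual to this prediction, sample by sample, to finish reconstructing the block.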
In other cases, the output samples of the scaler/inverse transform unit 258 belong to inter-frame encoded and possibly motion compensated blocks. In such a case, the motion compensated prediction unit 260 may access the reference picture memory 266 to obtain samples for prediction. After motion compensation of the acquired samples according to the symbols 270 belonging to the block, these samples may be added by an aggregator 268 to the output of the scaler/inverse transform unit 258 (in this case referred to as residual samples or residual signals) to generate output sample information. The address in the reference picture memory 266 from which the motion compensated prediction unit 260 obtains the prediction samples may be controlled by a motion vector. The motion vectors may be available to the motion compensated prediction unit 260 in the form of symbols 270, which symbols 270 may have, for example, an X component, a Y component, and a reference picture component. The motion compensation may also include interpolation of sample values, such as obtained from the reference picture memory 266, motion vector prediction mechanisms, etc. when sub-sample accurate motion vectors are used.
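The motion-compensated path just described reduces to: use the motion vector to address a block in the reference picture, then add the residual samples per sample in the aggregator. The 2-D list "picture" model and the integer motion vector below are illustrative assumptions (real codecs also support sub-sample vectors via interpolation, as noted above).

```python
# Sketch of motion-compensated prediction plus the aggregator: a motion
# vector (mv) displaces the block position (pos) into the reference
# picture; the fetched prediction is added per sample to the residual.
def motion_compensate(reference, mv, pos, size):
    x0, y0 = pos[0] + mv[0], pos[1] + mv[1]
    return [row[x0:x0 + size] for row in reference[y0:y0 + size]]

def reconstruct(prediction, residual):
    return [[p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, residual)]

# A tiny 4x4 reference picture with sample value x + 10*y.
ref = [[x + 10 * y for x in range(4)] for y in range(4)]
pred = motion_compensate(ref, mv=(1, 1), pos=(0, 0), size=2)
out = reconstruct(pred, [[1, 1], [1, 1]])
assert pred == [[11, 12], [21, 22]]
assert out == [[12, 13], [22, 23]]
```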
The output samples of the aggregator 268 may be subjected to various loop filtering techniques in the loop filter unit 256. The video compression techniques may include in-loop filter techniques controlled by parameters included in the encoded video bitstream and available to the loop filter unit 256 as symbols 270 from the parser 254, but may also be responsive to meta-information obtained during decoding of previous (in decoding order) portions of the encoded pictures or encoded video sequences, and to sample values of previous reconstructions and loop filtering.
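A minimal deblocking-style filter illustrates the in-loop filtering idea: samples adjacent to a block edge are smoothed only when the discontinuity is mild (likely a quantization artifact rather than a true edge). The 1-D edge model and threshold are illustrative assumptions, not the parameters signaled in an actual bitstream.

```python
# Toy deblocking filter: average the two samples straddling a block edge
# when their difference is below a threshold; leave genuine edges intact.
def deblock(samples, edge, threshold=8):
    p, q = samples[edge - 1], samples[edge]
    if abs(p - q) < threshold:  # filter only mild discontinuities
        avg = (p + q) // 2
        samples = samples[:edge - 1] + [avg, avg] + samples[edge + 1:]
    return samples

assert deblock([10, 10, 14, 14], edge=2) == [10, 12, 12, 14]   # smoothed
assert deblock([10, 10, 30, 30], edge=2) == [10, 10, 30, 30]   # kept
```

In an actual codec the filter strength would be driven by parameters carried in the bitstream (the symbols 270) and by meta-information from previously decoded portions, as described above.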
The output of loop filter unit 256 may be a sample stream that may be output to a rendering device, such as display 124, and stored in reference picture memory 266 for future inter picture prediction.
Once fully reconstructed, some of the encoded pictures may be used as reference pictures for future prediction. Once a coded picture has been fully reconstructed and that coded picture has been identified (by, for example, the parser 254) as a reference picture, the current reference picture may become part of the reference picture memory 266 and a new current picture memory may be reallocated before starting to reconstruct a subsequent coded picture.
The decoder component 122 can perform decoding operations according to a predetermined video compression technology that may be documented in a standard (e.g., any of the standards described herein). The coded video may conform to the syntax specified by the video compression technology or standard in use, in the sense that the coded video sequence adheres to both the syntax of the video compression technology or standard and the profiles documented in the video compression technology or standard. Additionally, for conformance with some video compression technologies or standards, the complexity of the coded video sequence may be within bounds as defined by the level of the video compression technology or standard. In some cases, levels restrict the maximum picture size, maximum frame rate, maximum reconstruction sample rate (measured in, for example, megasamples per second), maximum reference picture size, and so on. Limits set by levels can, in some cases, be further restricted through Hypothetical Reference Decoder (HRD) specifications and metadata for HRD buffer management signaled in the coded video sequence.
Fig. 3 is a block diagram illustrating the server system 112 according to some embodiments. The server system 112 includes control circuitry 302, one or more network interfaces 304, memory 314, a user interface 306, and one or more communication buses 312 for interconnecting these components. In some implementations, the control circuitry 302 includes one or more processors (e.g., a CPU, GPU, and/or DPU). In some implementations, the control circuitry includes one or more field-programmable gate arrays (FPGAs), hardware accelerators, and/or one or more integrated circuits (e.g., application-specific integrated circuits).
The network interface 304 may be configured to interface with one or more communication networks (e.g., wireless, wired, and/or optical networks). The communication networks may be local, wide-area, metropolitan, vehicular and industrial, real-time, delay-tolerant, and so on. Examples of communication networks include: local area networks such as Ethernet and wireless LANs; cellular networks including GSM, 3G, 4G, 5G, and LTE; TV wired or wireless wide-area digital networks including cable TV, satellite TV, and terrestrial broadcast TV; and vehicular and industrial networks including CANBus. Such communications may be uni-directional receive-only (e.g., broadcast TV), uni-directional transmit-only (e.g., CANBus to certain CANBus devices), or bi-directional (e.g., to other computer systems using local-area or wide-area digital networks). Such communications may include communications to one or more cloud computing networks.
The user interface 306 includes one or more output devices 308 and/or one or more input devices 310. The input device 310 may include one or more of the following: a keyboard, a mouse, a touch pad, a touch screen, a data glove, a joystick, a microphone, a scanner, a camera device, and the like. The output device 308 may include one or more of the following: an audio output device (e.g., a speaker), a visual output device (e.g., a display or screen), etc.
Memory 314 may include high-speed random access memory (e.g., DRAM, SRAM, DDR RAM, and/or other random access solid state memory devices) and/or non-volatile memory (e.g., one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, and/or other non-volatile solid state memory devices). Memory 314 optionally includes one or more storage devices remote from control circuitry 302. Memory 314, or alternatively, a non-volatile solid state memory device within memory 314, includes a non-transitory computer readable storage medium. In some implementations, the memory 314 or a non-transitory computer readable storage medium of the memory 314 stores the following programs, modules, instructions, and data structures, or a subset or superset thereof:
an operating system 316, the operating system 316 including processes for handling various basic system services and for performing hardware-related tasks;
A network communication module 318, the network communication module 318 for connecting the server system 112 to other computing devices via one or more network interfaces 304 (e.g., via wired and/or wireless connections);
A codec module 320, the codec module 320 being for performing various functions with respect to encoding and/or decoding data, such as video data. In some implementations, the codec module 320 is an example of the codec component 114. The codec module 320 includes, but is not limited to, one or more of the following:
A decoding module 322, the decoding module 322 for performing various functions related to decoding encoded data, such as those previously described with respect to the decoder component 122; and
An encoding module 340, the encoding module 340 for performing various functions related to encoding data, such as those previously described with respect to the encoder component 106; and
A picture memory 352, e.g., for use with the codec module 320, for storing pictures and picture data. In some implementations, the picture memory 352 includes one or more of the following: reference picture memory 208, buffer memory 252, current picture memory 264, and reference picture memory 266.
In some implementations, decoding module 322 includes parsing module 324 (e.g., configured to perform the various functions previously described with respect to parser 254), transform module 326 (e.g., configured to perform the various functions previously described with respect to scaler/inverse transform unit 258), prediction module 328 (e.g., configured to perform the various functions previously described with respect to motion compensated prediction unit 260 and/or intra picture prediction unit 262), and filtering module 330 (e.g., configured to perform the various functions previously described with respect to loop filter 256).
In some implementations, the encoding module 340 includes a code module 342 (e.g., configured to perform various functions previously described with respect to the source codec 202 and/or the codec engine 212) and a prediction module 344 (e.g., configured to perform various functions previously described with respect to the predictor 206). In some implementations, the decoding module 322 and/or the encoding module 340 include a subset of the modules shown in fig. 3. For example, a shared prediction module is used by both the decoding module 322 and the encoding module 340.
Each of the above identified modules stored in memory 314 corresponds to a set of instructions for performing the functions described herein. The above identified modules (e.g., sets of instructions) need not be implemented as separate software programs, procedures or modules, and thus various subsets of these modules may be combined or otherwise rearranged in various embodiments. For example, the codec module 320 optionally does not include separate decoding and encoding modules, but rather uses the same set of modules to perform both sets of functions. In some implementations, the memory 314 stores a subset of the modules and data structures identified above. In some implementations, the memory 314 stores additional modules and data structures not described above, such as an audio processing module.
In some implementations, the server system 112 includes web or Hypertext Transfer Protocol (HTTP) servers, File Transfer Protocol (FTP) servers, and web pages and applications implemented using Common Gateway Interface (CGI) scripts, PHP Hypertext Preprocessor (PHP), Active Server Pages (ASP), Hypertext Markup Language (HTML), Extensible Markup Language (XML), Java, JavaScript, Asynchronous JavaScript and XML (AJAX), XHP, Javelin, Wireless Universal Resource File (WURFL), and the like.
Although fig. 3 illustrates a server system 112 according to some embodiments, fig. 3 is intended more as a functional description of various features that may be present in one or more server systems rather than a structural schematic of the embodiments described herein. In practice, and as recognized by one of ordinary skill in the art, items shown separately may be combined and some items may be separated. For example, some of the items shown separately in fig. 3 may be implemented on a single server, and a single item may be implemented by one or more servers. The actual number of servers used to implement server system 112 and how features are allocated between them will vary depending on the implementation, and optionally will depend in part on the amount of data traffic that the server system handles during peak usage periods as well as during average usage periods.
Example codec method
V-Mesh (VMesh) is an evolving MPEG standard for compressing static and dynamic meshes. The current V-Mesh reference software divides the input mesh into a simplified base mesh and displacement vectors that are independently encoded and decoded. Symmetry is a property of a geometric object whereby an operation maps the object onto itself. In the Euclidean metric, reflections, translations, rotations, and combinations thereof form a set of symmetry transformations, or operations, called Euclidean isometries. Reflection symmetry, or bilateral symmetry, is the most common symmetry in both the living and non-living world. In some implementations, all points and edges of a reflection-symmetric mesh have a one-to-one correspondence via the plane of symmetry. In addition, many man-made objects are designed to have reflection symmetry.
In some embodiments, one or more reflection-symmetry planes of the mesh are detected by a simple method using principal component analysis (PCA). In some implementations, more advanced techniques using deep learning are utilized to detect the one or more reflection-symmetry planes of the mesh.
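As one illustrative sketch of such a PCA-based detector (not the method of any standard; the function names and the candidate-selection heuristic are assumptions), the principal axes of the vertex covariance can serve as candidate plane normals through the centroid, keeping the axis whose reflection best maps the vertex set onto itself:

```python
import numpy as np

def chamfer(a, b):
    """Symmetric nearest-neighbor distance between two small point sets."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def pca_symmetry_plane(vertices):
    """Estimate a candidate reflection-symmetry plane of a vertex set.

    Returns (point_on_plane, unit_normal): the plane passes through the
    centroid, and its normal is the principal axis whose reflection maps
    the vertices closest onto themselves.
    """
    centroid = vertices.mean(axis=0)
    centered = vertices - centroid
    _, eigvecs = np.linalg.eigh(np.cov(centered.T))  # columns = principal axes

    def reflection_error(n):
        reflected = centered - 2.0 * np.outer(centered @ n, n)
        return chamfer(reflected, centered)

    normal = min((eigvecs[:, k] for k in range(3)), key=reflection_error)
    return centroid, normal
```

For a vertex set that is mirror-symmetric about the x = 0 plane, the returned normal is (up to sign) the x-axis and the returned point is the centroid.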
Reflection symmetry can be used to create a folded mesh, in which the mesh is divided into mutually exclusive regions and a folding tree structure is used to represent mesh data with symmetric and asymmetric parts. Although a folded mesh has bit-saving potential for a symmetric mesh, a folded mesh: (1) may not handle vertex misalignment and thus may not be suitable for lossless compression, (2) may ignore the efficiency of other compression tools when the bit savings from reflection are smaller than those of a lossless codec, and (3) may be inefficient in signaling symmetry information.
The methods and systems described herein include operations that may be used alone or combined in any order. The methods and systems described herein are applicable to any polygonal mesh. In some implementations, the mesh includes a plurality of 3D polygons. The proposed methods can be used for lossy and lossless mesh compression. In this proposal, reflection symmetry is used, where the symmetric structure is a plane in 3D and a line in 2D.
In some embodiments, the symmetric mesh is recursively segmented and encoded. For example, the sub-mesh M_i at the i-th iteration is divided into a symmetric sub-mesh M_i^s and an asymmetric sub-mesh M_i^a. The two sub-meshes are mutually exclusive: M_i = M_i^s ∪ M_i^a and M_i^s ∩ M_i^a = ∅. Under a completely symmetric condition there is no asymmetric part, and therefore M_i^a = ∅.
The symmetric sub-mesh M_i^s can be represented by a half mesh M_i^h and a first piece of symmetry information p_i; under reflection symmetry, p_i is a plane of symmetry. The other half of the mesh, M_i^h′, can be derived from M_i^h and p_i by simple symmetric prediction as:

M_i^h′ = R(M_i^h, p_i),   (1)

where R(·, p_i) denotes reflection across p_i.
Under perfectly symmetric conditions, the symmetric prediction is exact, with no misalignment, so M_i^s = M_i^h ∪ R(M_i^h, p_i). The process is repeated recursively for each new sub-mesh M_{i+1} at the (i+1)-th iteration. The symmetry plane and the asymmetric sub-mesh are encoded. The overall framework is depicted in fig. 4. After the segmentation is completed, the remainder of the input mesh is also encoded.
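The reflection operator R(·, p_i) of equation (1) can be sketched as follows (a minimal illustration, assuming the plane is given as a point and a normal vector; the function name is not from any standard):

```python
import numpy as np

def reflect_across_plane(vertices, point_on_plane, normal):
    """Symmetric prediction of equation (1): R(M_h, p) reflects each
    vertex of the half mesh across the plane p given by a point and a
    normal vector."""
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    signed_dist = (vertices - point_on_plane) @ n  # distance to the plane
    return vertices - 2.0 * signed_dist[:, None] * n
```

Applying the operator twice returns the original vertices, which is why the prediction is lossless under perfect symmetry.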
The symmetric region detection method is used for mesh segmentation (P1). In mesh segmentation, unconnected parts of the mesh are separated into individual sub-meshes. For example, a mesh depicting the head of an avatar wearing eyeglasses may be separated into a sub-mesh for the eyeglasses and a sub-mesh for the avatar's head. The separated sub-meshes are then partitioned into one or more symmetric meshes (e.g., M_i^s) and/or one or more asymmetric meshes (e.g., M_i^a). The one or more asymmetric meshes are encoded, as indicated by arrow 402 in fig. 4.
At step 404, the method determines whether to continue symmetric segmentation. For example, a cost function may be calculated to determine whether symmetric partitioning justifies the computational resources, e.g., whether the estimated bits for encoding M_i^h and p_i are smaller than those for conventional coding (e.g., the estimated bits for encoding M_i^s directly), or whether other thresholds for continuing the symmetric segmentation are met (e.g., the number of symmetric points and/or edges is greater than a given threshold). The process stops when no significant symmetry remains (e.g., the number of symmetric points and edges falls below the given threshold), or when the cost of the symmetry-based codec is greater than that of the conventional codec.
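The step-404 decision can be sketched as the following rule, under the assumption that bit costs are already available as estimates (the estimator itself is out of scope; the names and the vertex-count threshold are illustrative):

```python
def continue_symmetric_split(bits_half_and_plane, bits_conventional,
                             num_symmetric_vertices,
                             min_symmetric_vertices=16):
    """Return True if another symmetric partition is worthwhile (step 404).

    bits_half_and_plane: estimated bits to code the half mesh M_h plus p_i.
    bits_conventional:   estimated bits to code the sub-mesh directly.
    """
    if num_symmetric_vertices < min_symmetric_vertices:
        return False  # no significant symmetry remains
    return bits_half_and_plane < bits_conventional
```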
For symmetric partitioning (P2), either half of the symmetric mesh may be retained (e.g., for further processing and/or to be encoded). In some embodiments, the half of the symmetric mesh (e.g., the "symmetric portion") with more vertices and faces is selected for further processing and/or encoding. Alternatively, a left symmetric portion or an upper symmetric portion may be selected.
Example encoding operations
Fig. 5 illustrates an example of fully recursive reflection-symmetric partitioning, such as example encoding process 500. In some implementations, for each potential symmetric partition, a flag is signaled in the bitstream to indicate whether the corresponding partition is used. In other implementations, no flag is signaled in the bitstream; instead, whether to use the corresponding partition is determined implicitly. For each partition that is used, a description of the partition plane associated with that partition (e.g., relevant parameters) is signaled or derived. As shown in fig. 5, the last split at step 3 is not necessary (x1 and x7 have already been encoded via symmetry lines p1 and p2), so the symmetric splitting stops at step 2.
A two-dimensional (2D) original mesh comprising eight points x1, x2, x3, x4, x5, x6, x7, and x8 is shown at step 1 in fig. 5. Instead of encoding all eight points, in step 1 only x1 and x5 are encoded to represent the symmetry line p1, which passes through x1 and x5. In an embodiment where only the left portion of the mesh is retained, step 2 of fig. 5 shows retaining the left half of the mesh. In step 2, x7 and x3 are encoded to represent a symmetry line p2 passing through the two points x7 and x3. In some embodiments, as shown in step 3, only the upper half of the mesh is retained. Step 3 would involve encoding x8 and x4 to represent symmetry line p3, but step 3 is not required, because line p3 maps x1 onto x7, and x1 and x7 have already been encoded in steps 1 and 2, respectively. Instead of encoding x8 and x4 to represent symmetry line p3, only x8 is encoded. In addition to the encoded vertices/points, the connection between x7 and x8 and the connection between x8 and x1 are also encoded. For this mesh, the reflection-based symmetric segmentation ends at step 2.
With respect to the example shown in fig. 5, for an original mesh with eight vertices, three vertices (e.g., x2, x4, and x6) may be derived without encoding (e.g., only the five vertices x1, x3, x5, x7, and x8 are encoded). The eight-vertex mesh in fig. 5 also includes eight connections, six of which may be derived (e.g., only the connection between x7 and x8 and the connection between x8 and x1 are encoded).
Example decoding operations
In some implementations, reflection-symmetric predictive coding is used to decode a fully reflection-symmetric mesh. In a fully reflection-symmetric mesh, all vertices and edges have correspondences with respect to the respective reflection planes. In some embodiments, given one or more already-decoded coordinates and connections, the remaining coordinates and connections are predicted using the distance of each decoded vertex from the symmetry line or plane: the corresponding vertex is decoded at the same distance from the symmetry line or plane, along its normal direction.
Using the encoded information from the example described with respect to fig. 5 (e.g., the vertices x1, x3, x5, x7, and x8, the connection between x7 and x8, and the connection between x8 and x1), the original mesh can be decoded as shown in fig. 6.
Fig. 6 illustrates an example decoding process 600. In step 1 of the decoding process, symmetry line p2 is used to derive or predict x6 by locating x6 from x8 along the normal direction of p2, where x6 is at the same distance from p2 as x8. That is, distance d8 is the shortest distance between x8 and symmetry line p2, and d6 is the predicted/derived shortest distance between x6 and symmetry line p2, such that d6 and d8 are the same (e.g., d8 = d6). The connection between x7 and x8 is similarly reflected to predict the connection between x7 and x6, and the connection between x8 and x1 is reflected to predict the connection between x6 and x5.
In step 2, after x6 is obtained, symmetry line p1 is used to derive or predict x4 by locating x4 along the normal direction of p1, where x4 is at the same distance from p1 as x6. Similarly, symmetry line p1 is used to derive or predict x2 by locating x2 along the normal direction of p1, where x2 is at the same distance from p1 as x8. The connection between x7 and x8 is similarly reflected to predict the connection between x2 and x3, and the connection between x8 and x1 is reflected to predict the connection between x2 and x1. The connection between x7 and x6 is similarly reflected to predict the connection between x3 and x4, and the connection between x6 and x5 is reflected to predict the connection between x4 and x5.
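The two decoding steps above can be checked numerically with illustrative coordinates (a regular-octagon layout is assumed here purely for concreteness; it is not mandated by fig. 6):

```python
import numpy as np

def reflect_across_line(q, a, b):
    """Reflect 2D point q across the symmetry line through points a and b."""
    d = (b - a) / np.linalg.norm(b - a)   # unit direction of the line
    t = q - a
    return a + 2.0 * (t @ d) * d - t      # keep the parallel part, mirror the rest

s = np.sqrt(0.5)
# The five encoded points of the fig. 5/6 example (illustrative coordinates).
x1, x3, x5 = np.array([0.0, 1.0]), np.array([-1.0, 0.0]), np.array([0.0, -1.0])
x7, x8 = np.array([1.0, 0.0]), np.array([s, s])
# p1 passes through x1 and x5; p2 passes through x7 and x3.
x6 = reflect_across_line(x8, x7, x3)  # decoding step 1: d6 = d8
x4 = reflect_across_line(x6, x1, x5)  # decoding step 2
x2 = reflect_across_line(x8, x1, x5)
```

The three derived points land exactly on the mirrored positions, matching the observation that only five of the eight vertices need to be coded.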
Examples of encoding and decoding a plane of symmetry
In some embodiments, instead of using three 3D points to represent the plane of symmetry p, only the two 3D points of a symmetric pair (x1, x2) are used, as shown in fig. 7, where x2 is the reflection of x1 across the plane of symmetry p. At the decoder, p can be derived as the unique 3D plane 700 that passes through the midpoint of the line e connecting the two points x1 and x2 and is perpendicular to e. The 3D plane 700 is equivalent to the plane passing through that midpoint with a normal vector defined by e. x1 may be encoded in the mesh data or in the plane signaling. In such a scenario, only a single 3D point (x, y, z) needs to be signaled.
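A sketch of this decoder-side derivation (function names are illustrative), including the equivalent (a, b, c, d) coefficient form with a·x + b·y + c·z + d = 0:

```python
import numpy as np

def plane_from_symmetric_pair(x1, x2):
    """Derive the symmetry plane p from one symmetric pair (x1, x2):
    the unique plane through the midpoint of segment e = x1x2 that is
    perpendicular to e. Returns (midpoint, unit_normal)."""
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    midpoint = 0.5 * (x1 + x2)
    normal = (x2 - x1) / np.linalg.norm(x2 - x1)
    return midpoint, normal

def plane_coefficients(midpoint, normal):
    """Equivalent a*x + b*y + c*z + d = 0 representation of the plane."""
    a, b, c = normal
    return float(a), float(b), float(c), float(-(normal @ midpoint))
```

For the pair (1, 2, 3) and (3, 2, 3), the derived plane passes through (2, 2, 3) with normal (1, 0, 0), i.e., the plane x = 2.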
In some embodiments, the plane of symmetry is encoded by modeling the plane of symmetry using the equation ax+by+cz+d=0, wherein the values of (a, b, c, d) are signaled.
In some embodiments, an almost symmetric mesh is recursively segmented and encoded. As shown in fig. 8, an almost symmetric mesh is a mesh that includes a one-to-one vertex correspondence via a plane of symmetry, together with one or more additional displacements of the respective vertices from their fully symmetric reflection predictions.
In mesh 800, symmetry line 814 passes through vertex 812. Reflecting vertex 802 about symmetry line 814 would yield point 802', but mesh 800 instead includes vertex 816, which is displaced from point 802' by distance d1. Mesh 800 also includes vertex 818, displaced from point 804' by distance d2; vertex 820, displaced from point 806' by distance d3; vertex 822, displaced from point 808' by distance d4; and vertex 824, displaced from point 810' by distance d5. In mesh 800, all of the displacements d1, d2, d3, d4, and d5 place the vertices closer to symmetry line 814 than the corresponding points obtained by reflection about symmetry line 814. In general, a displacement may be to the left or right of the reflected point. Further, a displacement may extend in more than one direction (e.g., an up/down displacement in addition to a left/right displacement).
Fig. 9 shows an example encoding framework for an almost symmetric mesh. The coding framework 900 of fig. 9 differs from the coding framework 400 by including a symmetric prediction coding section 902. The symmetric prediction encoding portion 902 includes circuitry for implementing step 904, which compares the two halves of the mesh (e.g., M_i^h and M_i^h′) for symmetry. One half of the mesh, M_i^h′ (e.g., the discarded, uncoded portion), can be predicted by simple reflection from M_i^h (e.g., the encoded portion) and p_i, derived as:

M_i^h′ = R(M_i^h, p_i) + D_i,   (2)

where D_i represents the displacement vector. Equation (1) is a special case of equation (2) in which all displacements are zero. For an almost symmetric mesh, the encoder encodes the asymmetric mesh, the plane of symmetry, and the displacement vector.
The output of the comparison at step 904 is the displacement vector D_i, containing the differences between the actual position of each vertex and its symmetry-based predicted position. Information about the displacement vector D_i is compressed into the bitstream.
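Assuming the vertex correspondence between the two halves is already established (the array ordering below is an assumption, as is the function name), the step-904 comparison and the displacement vector D_i of equation (2) can be sketched as:

```python
import numpy as np

def displacement_vectors(actual_half, coded_half, point_on_plane, normal):
    """Step-904 sketch: predict the uncoded half by reflecting the coded
    half across p_i, then return D_i = actual - R(coded, p_i) per vertex
    (equation (2)). D_i is all zeros for a fully symmetric mesh."""
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    signed_dist = (coded_half - point_on_plane) @ n
    predicted = coded_half - 2.0 * signed_dist[:, None] * n  # R(M_h, p_i)
    return actual_half - predicted
```

The encoder would then entropy-code these (typically small) residuals alongside the symmetry plane.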
Fig. 10 is a flow chart illustrating a method 1000 of encoding video according to some embodiments. Method 1000 may be performed at a computing system (e.g., server system 112, source device 102, or electronic device 120) having control circuitry and memory storing instructions for execution by the control circuitry. In some implementations, the method 1000 is performed by executing instructions stored in a memory of a computing system (e.g., memory 314).
The system receives (1002) a mesh having polygons representing surfaces of objects. The system detects (1004) a first symmetric region in the grid that includes a first line of symmetry to divide the first symmetric region into a first partition and a second partition. The system recursively determines (1006) whether one of the first partition or the second partition includes a second symmetric region until no symmetric region is detected in both the first partition and the second partition. In response to detecting the second symmetric region within one of the first partition or the second partition: the system determines (1008) a second symmetry line within the second symmetry region to divide the first partition or the second partition into a third sub-partition and a fourth sub-partition; and the system compresses (1010) the information of the third sub-partition, the second symmetry line, and the first symmetry line into a bitstream.
In some embodiments, the first line of symmetry is a plane of reflective symmetry. In some implementations, the system compresses a first set of information about the third sub-partition into the bitstream. In some embodiments, prior to determining the plane of symmetry of the mesh, the system partitions the mesh into a first symmetric portion and a second asymmetric portion, and wherein determining the plane of symmetry of the mesh comprises determining the plane of symmetry of the first symmetric portion, and compressing information of the second asymmetric portion into the bitstream.
In some embodiments, dividing the mesh into a first symmetric portion and a second asymmetric portion comprises: separating the mesh into corresponding unconnected mesh components based on a determination that the mesh includes one or more unconnected mesh components. In some implementations, compressing the information about the first symmetry line into the bitstream includes encoding a set of vertices of the first partition. In some implementations, the system reconstructs the mesh using the first line of symmetry, the normal direction to the first line of symmetry, and the distances of the encoded vertices from the first line of symmetry.
In some implementations, determining the first line of symmetry includes determining a first plane of symmetry that includes the first line of symmetry, and compressing information about the first line of symmetry into the bitstream includes encoding the first plane of symmetry. In some implementations, the first plane of symmetry is encoded using a pair of symmetry vertices including the first vertex and the second vertex such that reflection of the first vertex with respect to the first plane of symmetry provides the second vertex. In some implementations, a first vertex of the pair of symmetric vertices is encoded in the mesh data or signaled with a first plane of symmetry. In some embodiments, the first plane of symmetry is modeled using a linear equation having four parameters, which are signaled.
In some implementations, the reflected vertex is a vertex of a first partition reflected about a first symmetry line, and the reflected vertex and a corresponding vertex from the second portion have a first displacement, and the system compresses information about the first displacement into the bitstream. In some implementations, the system provides a displacement vector including respective displacements between a set of vertices of the first partition and a corresponding set of vertices of the second portion, and the system compresses information about the displacement vector into the bitstream. In some implementations, the first partition and the second partition include an almost symmetrical grid.
Although fig. 10 shows some logic stages in a particular order, stages that are not order dependent may be reordered and other stages may be combined or split. Some reordering or other groupings not specifically mentioned will be apparent to those of ordinary skill in the art, and thus the ordering and groupings presented herein are not exhaustive. Furthermore, it should be appreciated that the stages may be implemented in hardware, firmware, software, or any combination thereof.
Turning now to some example embodiments.
(A1) In one aspect, some implementations include a method of video encoding and decoding (e.g., method 1000). In some implementations, the method is performed at a computing system (e.g., server system 112) having memory and control circuitry. In some implementations, the method is performed at a codec module (e.g., codec module 320). In some implementations, the method is performed at an entropy codec (e.g., entropy codec 214). The method comprises the following steps: (i) Receiving a mesh having polygons representing surfaces of objects; (ii) A first symmetric region of the grid including a first line of symmetry is detected to divide the first symmetric region into a first partition and a second partition. The method includes (iii) recursively determining whether one of the first partition or the second partition includes a second symmetric region until no symmetric region is detected in both the first partition and the second partition. The method includes, in response to detecting the second symmetric region within one of the first partition or the second partition: (iv) Determining a second symmetry line within the second symmetry region to divide the first partition or the second partition into a third sub-partition and a fourth sub-partition; and (v) compressing information of the third sub-partition, the second symmetry line, and the first symmetry line into the bitstream. 
In some implementations, a method includes (i) receiving a mesh (e.g., a 3D mesh having a plurality of 2D mesh segments) having a polygon (e.g., a triangle mesh) representing a surface of an object; (ii) determining a first line of symmetry of the mesh (e.g., the first line of symmetry is in a first plane of symmetry) to divide the mesh into a first partition and a second partition (e.g., the first partition is the same as the second partition when full symmetry exists; the first partition differs from the second partition by an amount less than a threshold amount when the first partition and the second partition are nearly symmetrical); (iii) in accordance with a determination that the first partition meets a first set of one or more criteria (e.g., the first set of one or more criteria includes a cost function, and the cost of further partitioning the sub-mesh remains below a threshold): determining a second line of symmetry of the first partition (e.g., the second line of symmetry is a line in a second plane of symmetry) to divide the first partition into a third portion and a fourth portion; and (iv) compressing information of the third portion of the mesh, the second symmetry line, and the first symmetry line into the bitstream.
(A2) In some implementations of A1, the method includes compressing a first set of information about the third sub-partition (e.g., the first set of information includes connectivity information about the encoded points) into the bitstream.
(A3) In some embodiments of A1 or A2, the method comprises: dividing the grid into a first symmetric portion and a second asymmetric portion before detecting the first symmetric region, wherein detecting the first symmetric region in the grid includes determining a first line of symmetry and compressing information of the second asymmetric portion into the bitstream.
(A4) In some embodiments of A3, wherein dividing the mesh into a first symmetric portion and a second asymmetric portion comprises: the grid is separated into corresponding non-connected grid components based on a determination that the grid includes one or more non-connected grid components.
(A5) In some embodiments of any of A1-A4, wherein compressing the information about the first line of symmetry into the bitstream comprises encoding a set of vertices (e.g., the set of vertices represent the line of symmetry or the plane of symmetry) of the first partition.
(A6) In some embodiments of any of A1-A5, the method further comprises reconstructing the mesh (e.g., determining the locations of one or more uncoded vertices) using the first line of symmetry, the normal direction to the first line of symmetry, and the distances of the coded vertices from the first line of symmetry (e.g., a vertex is decoded using the second line of symmetry before a vertex is decoded using the first line of symmetry; the order of decoding is opposite to the order used for encoding).
(A7) In some embodiments of any of A1-A6, determining the first line of symmetry comprises determining a first plane of symmetry that contains the first line of symmetry, and compressing information about the first line of symmetry into the bitstream comprises encoding the first plane of symmetry.
(A8) In some implementations of A7, the first plane of symmetry is encoded using a pair of vertices including the first vertex and the second vertex such that reflection of the first vertex with respect to the first plane of symmetry provides the second vertex (e.g., the first plane of symmetry bisects a line connecting the first vertex and the second vertex).
(A9) In some implementations of A8, wherein a first vertex of the pair of symmetric vertices is encoded in the mesh data or signaled with a first plane of symmetry.
(A10) In some embodiments of A7, wherein the first plane of symmetry is modeled using a linear equation having four parameters, the four parameters are signaled.
(A11) In some embodiments of any of A1-a 10, wherein the reflected vertex is a vertex of a first partition reflected about a first line of symmetry, and the reflected vertex and a corresponding vertex from the second portion (e.g., the corresponding vertex is a specular reflection of the first partition's vertex about the first line of symmetry) have a first displacement, and the method includes compressing information about the first displacement into the bitstream.
(A12) In some implementations of a11, the method includes providing a displacement vector including respective displacements between a set of vertices of the first partition and a corresponding set of vertices of the second portion, and the method includes compressing information about the displacement vector into the bitstream.
(A13) In some implementations of a11, wherein the first partition and the second portion comprise an almost symmetrical mesh (e.g., an almost symmetrical mesh comprises a one-to-one vertex correspondence via a plane of symmetry and a corresponding displacement).
(A14) In some embodiments of any one of A1 to a13, wherein the first line of symmetry is a plane of reflective symmetry.
The methods described herein may be used alone or in combination in any order. Each of the methods may be implemented by processing circuitry (e.g., one or more processors or one or more integrated circuits). In some implementations, the processing circuitry executes programs stored in a non-transitory computer readable medium.
In another aspect, some implementations include a computing system (e.g., server system 112) including control circuitry (e.g., control circuitry 302) and memory (e.g., memory 314) coupled to the control circuitry, the memory storing one or more sets of instructions configured to be executed by the control circuitry, the one or more sets of instructions including instructions for performing any of the methods described herein (e.g., A1-a 14 above).
In yet another aspect, some implementations include a non-transitory computer-readable storage medium storing one or more sets of instructions for execution by control circuitry of a computing system, the one or more sets of instructions including instructions for performing any of the methods described herein (e.g., A1-a 14 above).
It will be understood that, although the terms "first," "second," etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another element.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the claims. As used in the description of the embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
As used herein, the term "if" may be interpreted in context as "when the prerequisite is true" or "after the prerequisite is true" or "in response to determining that the prerequisite is true" or "in accordance with determining that the prerequisite is true" or "in response to detecting that the prerequisite is true". Similarly, the phrase "if it is determined that the prerequisite is true" or "if the prerequisite is true" or "when the prerequisite is true" may be interpreted in context as "after determining that the prerequisite is true" or "in response to determining that the prerequisite is true" or "in accordance with determining that the prerequisite is true" or "after detecting that the prerequisite is true" or "in response to detecting that the prerequisite is true".
The foregoing description, for purposes of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the claims to the precise forms disclosed. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of operation and the practical application, thereby enabling others skilled in the art to practice the invention.