
CN113596451B - Video encoding method, video decoding method and related devices


Info

Publication number
CN113596451B
Authority
CN
China
Prior art keywords
block
pixel
plane
horizon
blocks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110723429.3A
Other languages
Chinese (zh)
Other versions
CN113596451A (en)
Inventor
季渊
宋远胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Tanggu Semiconductor Co ltd
Original Assignee
Wuxi Tanggu Semiconductor Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Tanggu Semiconductor Co ltd filed Critical Wuxi Tanggu Semiconductor Co ltd
Priority to CN202110723429.3A priority Critical patent/CN113596451B/en
Publication of CN113596451A publication Critical patent/CN113596451A/en
Application granted granted Critical
Publication of CN113596451B publication Critical patent/CN113596451B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Using adaptive coding
    • H04N 19/149 Data rate or code amount at the encoder output by estimating the code amount by means of a model, e.g. mathematical model or statistical model
    • H04N 19/137 Motion inside a coding unit, e.g. average field, frame or block difference
    • H04N 19/154 Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • H04N 19/176 The coding unit being an image region, the region being a block, e.g. a macroblock
    • H04N 19/182 The coding unit being a pixel

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Algebra (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application discloses a video encoding method, a video decoding method, and related devices. The video encoding method includes: acquiring a first video frame in a video stream, the first video frame comprising a plurality of first pixel blocks; determining a bit plane VJND threshold block for each first pixel block according to a viewpoint just noticeable distortion (VJND) model; for each first pixel block, encoding the information of its target matching block according to a preset encoding format to obtain first encoded data of the first pixel block, where the information of the target matching block is determined from the bit plane VJND threshold block of the first pixel block; and determining the encoded data of the M bit planes of the first video frame based on the first encoded data of the plurality of first pixel blocks. With the video encoding method, the video decoding method, and the related devices, the amount of encoded data transmitted for the first video frame can be reduced, thereby reducing the required data transmission bandwidth.

Description

Video encoding method, video decoding method and related devices
Technical Field
The application belongs to the field of image processing, and particularly relates to a video encoding method, a video decoding method and related devices.
Background
A near-eye display is a novel display that is positioned near the human eye and whose image is magnified by an optical system to fill the field of view. It can present scenes through wearable devices, such as the helmets and glasses used in the Virtual Reality (VR) and Augmented Reality (AR) fields. To further improve user experience, a near-eye display should offer high resolution and a high refresh rate while remaining miniaturized, lightweight, and comfortable to wear. High resolution and a high refresh rate, however, mean that massive amounts of data must be transferred and stored, which not only multiplies the transmission bandwidth pressure but also wastes hardware resources. To cope with this problem, schemes that compress video data using motion estimation techniques are widely adopted. However, because video compression based on motion estimation requires buffering a full frame of video data at a time, it increases the memory area overhead of the near-eye display, which runs counter to the demand for device miniaturization.
Disclosure of Invention
The embodiment of the application provides a video encoding method, a video decoding method and a related device, which can compress and transmit bit planes of video frames on the premise of ensuring high refresh rate and high resolution, and reduce the memory area overhead of a near-eye display.
According to a first aspect of embodiments of the present application, there is provided a video encoding method, including:
acquiring a first video frame in a video stream, wherein the first video frame comprises a plurality of first pixel blocks;
determining a bit plane VJND threshold block for each of the first pixel blocks according to a viewpoint just noticeable distortion (VJND) model; the VJND model is determined according to a just noticeable distortion threshold and a viewpoint factor quantization threshold; the viewpoint factor quantization threshold is determined according to the retinal eccentricity corresponding to the first pixel block; the bit plane VJND threshold block comprises visual sensitivity values of the M bit planes corresponding to each pixel point in the first pixel block; the M bit planes are obtained by splitting the first video frame;
for each first pixel block, encoding the information of the target matching block according to a preset encoding format to obtain first encoded data of the first pixel block; the first encoded data comprises encoded data of each of the L bit planes corresponding to the first pixel block; the M bit planes include the L bit planes; the information of the target matching block is determined according to the bit plane VJND threshold block of the first pixel block; the information of the target matching block comprises information of the target matching block of each of the L bit planes;
and determining encoded data of the M bit planes of the first video frame based on the first encoded data of the plurality of first pixel blocks.
In some embodiments, the determining the bit plane VJND threshold block for each of the first pixel blocks according to the viewpoint just noticeable distortion (VJND) model includes:
for each pixel point in each first pixel block, calculating a VJND threshold of the pixel point according to the VJND model;
determining the bit plane VJND threshold block corresponding to the VJND threshold of the pixel point according to the correspondence between VJND thresholds and bit plane VJND threshold blocks;
and determining the bit plane VJND threshold block of each first pixel block based on the bit plane VJND threshold blocks corresponding to all pixel points in the first pixel block.
In some embodiments, before the encoding, for each first pixel block, the information of the target matching block according to the preset encoding format and determining the encoded data of the M bit planes of the first video frame, the method further includes:
determining, for each of the L bit planes, a plurality of candidate matching blocks corresponding to each first pixel block; the plurality of candidate matching blocks comprise candidate matching blocks of any one of the L bit planes;
for each of the first pixel blocks, determining a target matching block from the plurality of candidate matching blocks based on the bit plane VJND threshold block of the first pixel block; the target matching block is the candidate matching block that matches the first pixel block;
and determining the information of the target matching block of each bit plane based on the target matching blocks of each bit plane corresponding to the first pixel blocks.
In some embodiments, the determining, for each of the L bit planes, a plurality of candidate matching blocks corresponding to each of the first pixel blocks includes:
searching, for each of the L bit planes, the corresponding bit plane of a second video frame and/or the previous bit plane of the current bit plane in the first video frame according to a preset search range, and determining a plurality of first candidate matching blocks of each first pixel block; the second video frame is a frame prior to the first video frame;
the determining, for each of the first pixel blocks, a target matching block from the plurality of candidate matching blocks based on the bit plane VJND threshold block of the first pixel block then includes:
for each of the first pixel blocks, determining a target matching block from the plurality of first candidate matching blocks based on the bit plane VJND threshold block of the first pixel block.
In some embodiments, the preset search range is a diamond-shaped range centered on the position of the first pixel block in the bit plane.
In some embodiments, the determining, for each of the first pixel blocks, a target matching block from a plurality of candidate matching blocks based on the bit plane VJND threshold block of the first pixel block comprises:
for each first pixel block, determining a target matching block from the plurality of first candidate matching blocks and a preset supplementary matching block based on the bit plane VJND threshold block of the first pixel block.
In some embodiments, the determining, for each of the first pixel blocks, a target matching block from a plurality of candidate matching blocks based on the bit plane VJND threshold block of the first pixel block comprises:
determining, for each of the first pixel blocks, a number of matching points between the first pixel block and each candidate matching block based on the first pixel block, the plurality of candidate matching blocks, and the bit plane VJND threshold block; the number of matching points characterizes the similarity between the first pixel block and the candidate matching block;
and determining the target matching block that matches the first pixel block according to the numbers of matching points.
In some embodiments, the determining, based on the first pixel block, the plurality of candidate matching blocks, and the bit plane VJND threshold block, the number of matching points between the first pixel block and each candidate matching block includes:
determining, from the bit plane VJND threshold block, the target visual sensitivity values of the bit plane corresponding to the first pixel block according to the layer number of the bit plane;
for each candidate matching block, calculating the first matching point number of the (i, j)th pixel point of the first pixel block and the candidate matching block according to the gray value of the (i, j)th pixel point of the first pixel block, the gray value of the (i, j)th pixel point of the candidate matching block, and the target visual sensitivity value of the bit plane at the (i, j)th pixel point of the first pixel block; i and j are integers not less than 0 and not more than N;
and accumulating the first matching point numbers of all pixel points of the first pixel block and the candidate matching block to obtain the number of matching points between the first pixel block and the candidate matching block.
In some embodiments, the calculating the first matching point number of the (i, j)th pixel point of the first pixel block and the candidate matching block according to the gray value of the (i, j)th pixel point of the first pixel block, the gray value of the (i, j)th pixel point of the candidate matching block, and the target visual sensitivity value of the bit plane at the (i, j)th pixel point of the first pixel block includes:
determining that the (i, j)th pixel point of the first pixel block matches the (i, j)th pixel point of the candidate matching block in the case that the target visual sensitivity value of the bit plane at the (i, j)th pixel point is a first preset value,
and/or in the case that the gray value of the (i, j)th pixel point of the first pixel block is the same as the gray value of the (i, j)th pixel point of the candidate matching block;
and determining the first matching point number of the (i, j)th pixel points of the first pixel block and the candidate matching block as a second preset value.
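To illustrate the matching-point computation above, here is a minimal Python sketch under stated assumptions: the first preset value (eye insensitive) is taken as a sensitivity value of 0, and each per-pixel match contributes a second preset value of 1. All names are illustrative.

```python
import numpy as np

def matching_points(block, candidate, sensitivity):
    """Number of matching points between a first pixel block and a candidate.

    block, candidate: N x N binary arrays taken from one bit plane.
    sensitivity: N x N target visual sensitivity values for that plane;
    0 is assumed to be the first preset value ("insensitive").
    A pixel matches if the eye is insensitive to it or the bit values
    agree; each match adds 1 (assumed second preset value).
    """
    insensitive = (sensitivity == 0)
    equal = (block == candidate)
    return int(np.count_nonzero(insensitive | equal))

def pick_target_block(block, candidates, sensitivity):
    """Index of the candidate with the most matching points."""
    scores = [matching_points(block, c, sensitivity) for c in candidates]
    return int(np.argmax(scores))
```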
In some embodiments, where M is greater than L, the determining the encoded data of the M bit planes of the first video frame based on the first encoded data of the plurality of first pixel blocks comprises:
determining the encoded data of the M bit planes of the first video frame according to the first encoded data of the plurality of first pixel blocks and the gray values corresponding to the bit planes other than the L bit planes among the M bit planes.
According to a second aspect of embodiments of the present application, there is provided a video decoding method, including:
acquiring encoded data of the M bit planes of a first video frame; the M bit planes are obtained by splitting the first video frame; the first video frame includes a plurality of first pixel blocks;
decoding the encoded data of the M bit planes of the first video frame according to a preset decoding format to obtain information of target matching blocks corresponding to the plurality of first pixel blocks; the information of the target matching block comprises information of the target matching block of each of the L bit planes; the M bit planes include the L bit planes;
for each first pixel block, determining the gray value of each of the L bit planes corresponding to the first pixel block according to the information of the target matching block corresponding to the first pixel block;
and restoring the M bit planes of the first video frame based on the gray values of each of the L bit planes corresponding to the plurality of first pixel blocks.
In some embodiments, the determining, according to the information of the target matching blocks corresponding to the plurality of first pixel blocks, the gray value of each of the L bit planes corresponding to the plurality of first pixel blocks includes:
determining, for each of the L bit planes, a plurality of candidate matching blocks for each of the first pixel blocks; the plurality of candidate matching blocks comprise candidate matching blocks of any one of the L bit planes;
for each first pixel block, determining the target matching block from the plurality of candidate matching blocks of the corresponding bit plane according to the information of the target matching block corresponding to the first pixel block; the target matching block is the candidate matching block that matches the first pixel block;
and determining the gray value of each bit plane corresponding to the first pixel block according to the target matching block of each bit plane corresponding to the first pixel block.
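A decoding-side sketch under the same assumptions as the encoding sketches in this application: the decoder regenerates the identical candidate list for each first pixel block, selects the block named by the decoded index, and writes its bits back as the restored bit plane values. Names are illustrative.

```python
import numpy as np

def restore_bit_plane(plane_shape, target_indices, candidate_lists, n):
    """Restore one bit plane of the first video frame.

    target_indices: {(row, col): decoded target matching block index};
    candidate_lists: {(row, col): [candidate blocks]} regenerated exactly
    as on the encoding side (the ordering must match for the index to
    be meaningful). n is the first pixel block size.
    """
    plane = np.zeros(plane_shape, dtype=np.uint8)
    for (r, c), idx in target_indices.items():
        plane[r:r + n, c:c + n] = candidate_lists[(r, c)][idx]
    return plane
```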
In some embodiments, the determining, for each of the L bit planes, a plurality of candidate matching blocks for each of the first pixel blocks includes:
searching, for each of the L bit planes, the corresponding bit plane of a second video frame and/or the previous bit plane of the current bit plane in the first video frame according to a preset search range, and determining a plurality of first candidate matching blocks of each first pixel block; the second video frame is a frame prior to the first video frame;
and determining the plurality of candidate matching blocks of each first pixel block based on the plurality of first candidate matching blocks of each first pixel block.
In some embodiments, the preset search range is a diamond-shaped range centered on the position of the first pixel block in the bit plane.
In some embodiments, the determining the plurality of candidate matching blocks of each of the first pixel blocks based on the plurality of first candidate matching blocks of each of the first pixel blocks comprises:
determining the plurality of candidate matching blocks of the first pixel block based on the plurality of first candidate matching blocks of each first pixel block and a preset supplementary matching block.
In some embodiments, the encoded data of the M bit planes of the first video frame includes gray values corresponding to the bit planes other than the L bit planes among the M bit planes;
the restoring the M bit planes of the first video frame based on the gray values of each of the L bit planes corresponding to the plurality of first pixel blocks then includes:
restoring the M bit planes of the first video frame based on the gray values of each of the L bit planes corresponding to the plurality of first pixel blocks and the gray values corresponding to the bit planes other than the L bit planes among the M bit planes.
According to a third aspect of embodiments of the present application, there is provided a video encoding apparatus, for use in a controller of a near-eye display, the video encoding apparatus comprising:
a first acquiring module configured to acquire a first video frame in a video stream, the first video frame comprising a plurality of first pixel blocks;
a threshold block determining module configured to determine a bit plane VJND threshold block for each of the first pixel blocks according to a viewpoint just noticeable distortion (VJND) model; the VJND model is determined according to a just noticeable distortion threshold and a viewpoint factor quantization threshold; the viewpoint factor quantization threshold is determined according to the retinal eccentricity corresponding to the first pixel block; the bit plane VJND threshold block comprises visual sensitivity values of the M bit planes corresponding to each pixel point in the first pixel block; the M bit planes are obtained by splitting the first video frame;
an encoding module configured to encode, for each first pixel block, the information of the target matching block according to a preset encoding format to obtain first encoded data of the first pixel block; the first encoded data comprises encoded data of each of the L bit planes corresponding to the first pixel block; the M bit planes include the L bit planes; the information of the target matching block is determined according to the bit plane VJND threshold block of the first pixel block; the information of the target matching block comprises information of the target matching block of each of the L bit planes;
and an encoded data determining module configured to determine encoded data of the M bit planes of the first video frame based on the first encoded data of the plurality of first pixel blocks.
According to a fourth aspect of embodiments of the present application, there is provided a video decoding apparatus, applied to a controller of a near-eye display, the video decoding apparatus comprising:
a second acquiring module configured to acquire encoded data of the M bit planes of a first video frame; the M bit planes are obtained by splitting the first video frame; the first video frame includes a plurality of first pixel blocks;
a decoding module configured to decode the encoded data of the M bit planes of the first video frame according to a preset decoding format to obtain information of target matching blocks corresponding to the plurality of first pixel blocks; the information of the target matching block comprises information of the target matching block of each of the L bit planes; the M bit planes include the L bit planes;
a gray value determining module configured to determine, for each first pixel block, the gray value of each of the L bit planes corresponding to the first pixel block according to the information of the target matching block corresponding to the first pixel block;
and a restoring module configured to restore the M bit planes of the first video frame based on the gray values of each of the L bit planes corresponding to the plurality of first pixel blocks.
According to a fifth aspect of embodiments of the present application, there is provided an electronic device, including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement any of the video encoding methods as provided in the first aspect or any of the video decoding methods as provided in the second aspect.
According to a sixth aspect of embodiments of the present application, there is provided a storage medium storing instructions which, when executed by a processor of an electronic device, enable the electronic device to perform any one of the video encoding methods as provided in the first aspect, or any one of the video decoding methods as provided in the second aspect.
According to a seventh aspect of embodiments of the present application, there is provided a computer program product comprising a computer program or instructions, characterized in that the computer program or instructions, when executed by a processor, implement the video encoding method as claimed in any one of the first aspects or the video decoding method as claimed in any one of the second aspects.
The technical scheme provided by the embodiment of the application at least brings the following beneficial effects:
On the one hand, the relation between the critical frequency of the human eye and the retinal eccentricity can be quantized to establish a VJND model, and the VJND model is used to determine the target matching block, which improves the matching accuracy of bit plane motion estimation and thus the quality of the video frame restored from the bit planes. On the other hand, encoding the information of the target matching blocks of the first pixel blocks in the first video frame yields the encoded data of the L bit planes of the first video frame, thereby realizing bit plane compression encoding of the first video frame. This reduces the amount of encoded data transmitted for the first video frame and hence the data transmission bandwidth, further reduces the memory area overhead of the near-eye display, improves the transmission efficiency of the first video frame, and ensures the high refresh rate and high resolution of the near-eye display.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments of the present application will be briefly described, and it is possible for a person skilled in the art to obtain other drawings according to these drawings without inventive effort.
FIG. 1 is a flow diagram of a block matching algorithm according to the prior art;
FIG. 2 is a flow chart of a near-eye display method according to the prior art;
FIG. 3 is a flow chart illustrating a near-eye display method according to an exemplary embodiment;
FIG. 4 is a flow chart illustrating a video encoding method according to an exemplary embodiment;
FIG. 5 is a schematic diagram illustrating video frame region partitioning according to an exemplary embodiment;
FIG. 6 is a graph illustrating PSNR results after single-bit-plane coding and restoration, according to an exemplary embodiment;
FIG. 7 is a graph illustrating PSNR results after multi-layer bit-plane coding and restoration, according to an exemplary embodiment;
FIG. 8 is a schematic diagram illustrating SSIM results after single-bit-plane coding and restoration, according to an exemplary embodiment;
FIG. 9 is a schematic diagram illustrating SSIM results after multi-layer bit-plane coding and restoration, according to an exemplary embodiment;
FIG. 10 is a schematic diagram of a preset supplementary matching block, according to an exemplary embodiment;
FIG. 11 is a flow chart illustrating a video decoding method according to an exemplary embodiment;
FIG. 12 is a block diagram illustrating a video encoding apparatus according to an exemplary embodiment;
FIG. 13 is a block diagram illustrating a video decoding apparatus according to an exemplary embodiment;
FIG. 14 is a block diagram illustrating a controller of a near-eye display according to an exemplary embodiment;
FIG. 15 is a hardware configuration diagram of an electronic device according to an exemplary embodiment.
Detailed Description
Features and exemplary embodiments of various aspects of the present application are described in detail below to make the objects, technical solutions and advantages of the present application more apparent, and to further describe the present application in conjunction with the accompanying drawings and the detailed embodiments. It should be understood that the specific embodiments described herein are intended to be illustrative of the application and are not intended to be limiting. It will be apparent to one skilled in the art that the present application may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present application by showing examples of the present application.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Before providing a detailed description of a video encoding method in an embodiment of the present application, a brief description of a technology related to the present application is first provided.
Motion estimation is an important technology in the field of video compression and an important link in a video coding system. According to the domain in which it operates, it can be divided into two main categories: the spatial domain and the frequency domain. Spatial-domain motion estimation can be roughly divided into block matching, pixel recursion, optical flow, and Bayesian methods. Among these, the block matching algorithm is widely used because it is easy to implement in both software and hardware.
Fig. 1 is a flow diagram of a block matching algorithm according to the prior art. As shown in fig. 1, the block matching algorithm works as follows: the current frame image is divided into a number of mutually disjoint matching blocks of m×m pixels each. Let the matching block in the current frame be A, the co-located block in the reference frame be B, and the search range centered on B in the reference frame be C. The block matching process finds, within the search range C and according to a matching criterion, the candidate matching block D that is most similar to A, together with the coordinate offset V from A to D, namely the motion vector.
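To make this flow concrete, the following is a minimal Python sketch of an exhaustive block matching search. All names are illustrative, and the matching criterion used here is a plain sum of absolute differences (SAD) rather than the VJND-based criterion introduced later in this application.

```python
import numpy as np

def block_match(ref_frame, block, top_left, radius):
    """Find the block in ref_frame most similar to `block` (SAD criterion).

    block: m x m array A from the current frame; top_left: its (row, col);
    radius: half-width of the square search range C around the co-located
    block B. Returns the motion vector V = (dy, dx) from A to the best D.
    """
    m = block.shape[0]
    h, w = ref_frame.shape
    best_cost, best_mv = None, (0, 0)
    r0, c0 = top_left
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            r, c = r0 + dy, c0 + dx
            if r < 0 or c < 0 or r + m > h or c + m > w:
                continue  # candidate falls outside the reference frame
            cand = ref_frame[r:r + m, c:c + m].astype(np.int32)
            cost = np.abs(cand - block.astype(np.int32)).sum()
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    return best_mv
```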
Consider the viewing geometry of the human eye focusing on a gaze point in a frame of image. Let the pixel width of the observed image be N, with the line from the fovea to the gaze point perpendicular to the image plane. The viewing distance L is the distance from the human eye to the observation plane, and d denotes the distance in the image plane between the current pixel point and the gaze point. With the gaze point pixel at X0 = (x0, y0) and the current pixel point at X1 = (x1, y1), d is calculated as:
d = sqrt((x1 − x0)^2 + (y1 − y0)^2)   (1)
The retinal eccentricity e corresponding to the current pixel point is then:
e = arctan(d / L)   (2)
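As a sketch, equations (1) and (2) can be computed directly. The function and variable names below are illustrative, and d and L are assumed to be expressed in the same units.

```python
import math

def retinal_eccentricity(gaze, pixel, L):
    """Compute d (eq. 1) and the retinal eccentricity e in degrees (eq. 2)."""
    x0, y0 = gaze
    x1, y1 = pixel
    d = math.hypot(x1 - x0, y1 - y0)    # eq. (1): in-plane distance
    e = math.degrees(math.atan2(d, L))  # eq. (2): angle off the gaze line
    return d, e
```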
a just noticeable distortion (Just Noticeable Difference, JND) threshold represents the minimum distortion of the image that can be perceived by the human eye. The main characteristics of the human visual system can be simulated through the JND model, a JND threshold value is obtained, and when the change value is lower than the threshold value, human eyes cannot perceive the change.
A bit plane is obtained by representing the gray value of each pixel point in an image as a multi-bit binary code and extracting, from every pixel, the bit at the same position to form a plane. Each bit plane is a binary image; that is, the value of each pixel point in a bit plane is 0 or 1.
The digital driving method generates gray scale by modulating the lengths of the bright and dark periods; it offers low image noise, high gray scale depth, and rich colors, and is therefore widely used in the near-eye display field. Digitally driven near-eye displays typically employ a subfield scan method, i.e., the display is scanned bit plane by bit plane.
At present, video coding standards such as H.264, H.265, and H.266 have been released based on traditional video compression methods. Fig. 2 is a flow chart of a near-eye display method according to the prior art. As shown in fig. 2, when a conventional video compression method is applied to near-eye display, the transmitting end encodes on a per-frame basis; the receiving end decodes a complete frame of data and splits the video frame into multiple bit planes; and the scanning display end scans and displays the bit planes layer by layer. A near-eye display employing this conventional video compression method therefore needs to buffer one frame of video compression data at a time, which increases the memory area overhead of the near-eye display, contrary to the requirement of device miniaturization.
In view of this, the present application provides a near-eye display method based on the technical concept of applying bit-plane compression to video frames. Fig. 3 is a flow chart illustrating a near-eye display method according to an exemplary embodiment. As shown in fig. 3, the near-eye display method includes: the transmitting end splits the video frame into multiple bit planes and then performs motion estimation encoding on a per-bit-plane basis; the receiving end restores the bit planes layer by layer in sequence; and the scanning display end can directly scan and display the restored bit planes. In this way, the memory area overhead of the near-eye display can be reduced. On the basis of this near-eye display method, the present application further provides a video encoding method, a video decoding method, an apparatus, an electronic device, a storage medium, and a computer program product.
A video encoding method provided in the application embodiment is first described below.
Fig. 4 is a flow chart illustrating a video encoding method according to an exemplary embodiment. As shown in fig. 4, the video encoding method may be applied to a controller of a near-eye display and includes the following steps:
step S110, a first video frame in the video stream is acquired.
Here, the first video frame is the video frame to be encoded. The first video frame may include a plurality of first pixel blocks, which are mutually disjoint and distributed at different positions of the first video frame. Each first pixel block may include a plurality of pixel points; for example, a first pixel block may include N×N pixel points, N being a positive integer. Alternatively, N may be 3.
Step S120, determining a bit plane VJND threshold block of each first pixel block according to the VJND model.
Here, the VJND model is obtained by combining viewpoint factors with a conventional JND model. The VJND model may be determined based on a just noticeable distortion threshold and a viewpoint factor quantization threshold. The viewpoint factor quantization threshold may be determined according to the retinal eccentricity corresponding to the first pixel block.
The bit plane VJND threshold block may include the visual sensitivity values of the M bit planes corresponding to each pixel point in the first pixel block. A visual sensitivity value characterizes whether the human eye is sensitive to the pixel at a given position: if sensitive, the human eye can perceive a change of the pixel point; if not, it cannot.
Here, the first video frame is split to obtain M bit planes. Specifically, the gray value of each pixel point of the first video frame is encoded as an M-bit binary number, and the bits at the same position of all pixel points are taken to form one bit plane. For example, the first bit of the M-bit binary code of each pixel point is taken to form the first bit plane, ..., and the Mth bit is taken to form the Mth bit plane. Alternatively, M may be 8.
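A minimal numpy sketch of this splitting, assuming 8-bit gray values (M = 8). The indexing convention below (plane 0 holds the lowest bit) is illustrative; the text numbers planes from 1 to M.

```python
import numpy as np

def split_bit_planes(frame, M=8):
    """Split a gray image into M binary bit planes.

    frame: 2-D uint8 array of gray values. Returns a list of M binary
    arrays; planes[m] holds bit m of every pixel point.
    """
    return [((frame >> m) & 1).astype(np.uint8) for m in range(M)]

def merge_bit_planes(planes):
    """Inverse operation: recombine the bit planes into gray values."""
    frame = np.zeros(planes[0].shape, dtype=np.uint16)
    for m, p in enumerate(planes):
        frame |= p.astype(np.uint16) << m
    return frame.astype(np.uint8)
```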
Step S130, for each first pixel block, encoding the information of the target matching block according to a preset encoding format to obtain first encoded data of the first pixel block.
Since the first video frame can be split into M bit planes, each first pixel block of the first video frame maps onto the same position of every bit plane, so that each first pixel block corresponds to a pixel block of the same size in each bit plane.
Here, the first encoded data includes the encoded data of each of the L bit planes corresponding to the first pixel block. The video encoding device compression-encodes the bit planes of the first video frame. Specifically, the video encoding device determines, by a block matching method, the target matching block of each first pixel block of the first video frame in each of the L bit planes, and encodes the information of the target matching block to obtain the encoded data of the first pixel block for each of the L bit planes.
The information of the target matching block is determined from the bit plane VJND threshold block of the first pixel block, and includes the information of the target matching block of each of the L bit planes. In this way, by encoding only the information of the target matching block, the video encoding device reduces the amount of encoded data of each pixel block in each bit plane, and thus the amount of encoded data of the L bit planes of the first video frame.
In some embodiments, the M bit planes include the L bit planes, i.e. the L bit planes belong to the M bit planes. The L bit planes may be a continuous run of bit planes, any single one of the M bit planes, or a discontinuous set of bit planes.
In some embodiments, the information of the target matching block includes index information of the target matching block, and the preset encoding format may be binary encoding. For example, if the first pixel block includes 3×3 pixel points and the index information of the target matching block is 5, the video encoding device binary-encodes the index information to obtain the encoded data 101. Thus, compared with the 9 bits of data that would have to be transmitted for the uncompressed first pixel block, the encoded data of the first pixel block after compression encoding is only three bits, reducing the amount of data transmitted for the first pixel block.
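A sketch of this fixed-width index encoding; the helper name and the choice of index width (just wide enough for the number of candidate blocks) are assumptions for illustration.

```python
def encode_index(index, num_candidates):
    """Encode a target-matching-block index as a fixed-width bit string."""
    width = max(1, (num_candidates - 1).bit_length())
    return format(index, "0{}b".format(width))

# With 6 candidate blocks, index 5 encodes to '101': three bits instead
# of the 9 raw bits of an uncompressed 3x3 binary pixel block.
print(encode_index(5, 6))  # -> 101
```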
Step S140, the encoded data of the M bit planes of the first video frame is determined based on the first encoded data of the plurality of first pixel blocks.
Here, the video encoding apparatus determines the encoded data of the M bit planes of the first video frame based on the encoded data of each of the L bit planes corresponding to all the first pixel blocks of the first video frame.
In the embodiment of the application, on the one hand, the relation between the critical frequency of the human eye and the retinal eccentricity can be quantized, and the VJND model is used in the matching criterion of bit plane motion estimation, which improves the matching accuracy of bit plane motion estimation and thus the quality of the video frame restored from the bit planes. On the other hand, encoding the information of the target matching blocks of the first pixel blocks in the first video frame yields the encoded data of the L bit planes of the first video frame, thereby realizing bit plane compression encoding of the first video frame, reducing the amount of encoded data of the first video frame and the data transmission bandwidth, further reducing the memory area overhead of the near-eye display, improving the transmission efficiency of the first video frame, and ensuring the high refresh rate and high resolution of the near-eye display.
A specific implementation of each of the above steps is described below.
In some embodiments, in step S120, determining the bit plane VJND threshold block for each first pixel block according to the VJND model may include:
step S121, for each pixel point in each first pixel block, calculates a VJND threshold for each pixel point according to the VJND model.
Here, the VJND model is obtained by combining viewpoint factors with a conventional JND model. The VJND threshold may be determined based on the just noticeable distortion threshold and the viewpoint factor quantization threshold.
In some embodiments, the VJND model may be:
VJND = JND_fb + JND_v   (3)
where JND_fb is the threshold obtained from the conventional JND model and JND_v is the viewpoint factor quantization threshold.
In some embodiments, the viewpoint factor quantization threshold corresponding to each pixel point may be determined according to the retinal eccentricity corresponding to each pixel point.
Here, the VJND threshold of a pixel point is determined according to the correspondence between the retinal eccentricity and the viewpoint factor quantization threshold, where the viewpoint factor quantization threshold may be graded according to the magnitude of the retinal eccentricity.
In practice, the critical frequency of the human eye is greatest at the center of the viewpoint and decays sharply toward both sides as the eccentricity increases, so the high-frequency information of non-viewpoint areas is hardly perceived by the human eye. Generally, the field of view of the human eye spans about 124 degrees, while the region of focused attention covers only about one fifth of that, namely about 25 degrees; the visual area beyond 25 degrees is the non-viewpoint area, and the visual area within 25 degrees is the viewpoint area.
To obtain the correspondence between the retinal eccentricity and the viewpoint factor quantization threshold, the applicant centered the viewpoint at the center of the video frame and divided the video frame into a plurality of areas according to the retinal eccentricity e, following the curve of critical frequency versus eccentricity of the human eye. Fig. 5 is a schematic diagram illustrating video frame region division according to an exemplary embodiment. As shown in fig. 5, the video frame is divided into five regions according to the retinal eccentricity e. Noise of different magnitudes was then added to the video frames, and the perception of this noise by human eyes in the different regions was recorded, yielding the relation between the retinal eccentricity e and the viewpoint factor quantization threshold.
For example, the correspondence between the retinal eccentricity e and the viewpoint factor quantization threshold JND_v may be as follows:
step S122, determining a bit plane VJND threshold block corresponding to the VJND threshold of each pixel according to the correspondence between the VJND threshold and the bit plane VJND threshold block.
Here, the bit plane VJND threshold block may include the visual sensitivity values of the M bit planes corresponding to the pixel point. Since different bit planes mainly contain different frequency components, the sensitivity of the human eye to pixels at the same position differs across bit planes.
For example, the high bit planes mainly carry low-frequency components, while the low bit planes mainly carry high-frequency components. When a non-viewpoint region changes, the human eye generally does not notice changes in the low-bit-plane data of that region, but may well notice changes in its high-bit-plane data.
In some embodiments, according to the correspondence between VJND thresholds and bit plane VJND threshold blocks, the VJND threshold corresponding to a pixel point is mapped to each bit plane to obtain the sensitivity value of each bit plane corresponding to that pixel point.
For example, the correspondence between the VJND threshold and the bit plane VJND threshold block may be as shown in equation (5), where the bit plane VJND threshold block is denoted by C.
Step S123, determining the bit plane VJND threshold block of each first pixel block based on the bit plane VJND threshold blocks corresponding to all the pixel points in the first pixel block.
Here, a first pixel block may include a plurality of pixel points. For each first pixel block, its bit plane VJND threshold block is determined from the bit plane VJND threshold blocks corresponding to the individual pixel points in the block.
In the embodiment of the application, the relation between the critical frequency of the human eye and the retinal eccentricity is quantized, the VJND model is established, and the VJND threshold is mapped onto the bit planes, which improves the accuracy of judging whether each pixel point is visually sensitive in each bit plane, and in turn the matching accuracy of the subsequent bit plane motion estimation.
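Since equation (5) itself is not reproduced above, the following Python sketch uses a purely illustrative mapping rule: a bit plane is marked insensitive wherever flipping that bit would stay below the pixel's VJND threshold. The cutoff rule and all names are assumptions, not the patent's formula.

```python
import numpy as np

def bit_plane_vjnd_block(vjnd, M=8):
    """Map an N x N block of VJND thresholds to M per-plane sensitivity masks.

    Illustrative rule (assumed): bit plane m has weight 2**m; it is marked
    insensitive (0) at a pixel when 2**m < vjnd there, i.e. a change of
    that bit falls below the just noticeable distortion, and sensitive (1)
    otherwise. Returns an M x N x N array C of visual sensitivity values.
    """
    vjnd = np.asarray(vjnd)
    C = np.empty((M,) + vjnd.shape, dtype=np.uint8)
    for m in range(M):
        C[m] = ((1 << m) >= vjnd).astype(np.uint8)
    return C
```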
In some embodiments, in step S140, in the case where M equals L, the video encoding apparatus determines the encoded data of each bit plane of the first video frame from the encoded data of each bit plane of all the first pixel blocks, and thereby the encoded data of the M bit planes of the first video frame.
Here, the video encoding apparatus compression-encodes the data of every bit plane corresponding to all the first pixel blocks, and obtains the encoded data of each bit plane of the first video frame from the encoded data of that bit plane across all the first pixel blocks, so that the encoded data of all bit planes of the first video frame can be determined. Compression-encoding all bit planes of the first video frame maximizes the amount of compression, reducing the bit plane data transmission amount of the first video frame and the transmission bandwidth.
In some embodiments, to ensure the restored image quality, in step S140, in the case where M is greater than L, the video encoding apparatus determines the encoded data of the M bit planes of the first video frame from the first encoded data of the plurality of first pixel blocks together with the gray values of the bit planes other than the L bit planes.
Here, the video encoding device compression-encodes the data of the L bit planes among the M bit planes corresponding to all the first pixel blocks, obtaining the encoded data of each of the L bit planes; from these it determines the compressed encoded data of the L bit planes of the first video frame. The video encoding device then determines the encoded data of the first video frame from the encoded data of the L bit planes together with the gray values of the remaining bit planes. In this way, some bit planes of the first video frame are compression-encoded while the data of the other bit planes is kept unchanged, which improves the image quality of the restored video frame while still reducing the bit plane data transmission amount.
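A sketch of the M > L case, assuming (consistently with the PSNR/SSIM observations below) that the L low bit planes are the compressed ones while the remaining planes are passed through unchanged; the packing format and names are illustrative.

```python
def encode_frame_planes(planes, L, encode_plane):
    """Encoded data of M bit planes: compress the L low planes, keep the rest.

    planes: list of M binary arrays (see split_bit_planes above);
    encode_plane: a bit-plane encoder, e.g. block matching plus index
    encoding. Which planes form the compressed set is an assumption.
    """
    payload = []
    for m, plane in enumerate(planes):
        if m < L:
            payload.append(encode_plane(plane))              # compressed
        else:
            payload.append(plane.astype("uint8").tobytes())  # raw bits
    return payload
```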
In practical applications, the applicant chose the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) as objective image quality criteria. Fig. 6 is a diagram illustrating PSNR results after single-bit-plane coding and restoration according to an exemplary embodiment. Fig. 7 is a diagram illustrating PSNR results after multi-layer bit-plane coding and restoration according to an exemplary embodiment. Fig. 8 is a schematic diagram illustrating SSIM results after single-bit-plane coding and restoration according to an exemplary embodiment. Fig. 9 is a schematic diagram illustrating SSIM results after multi-layer bit-plane coding and restoration according to an exemplary embodiment.
As shown in fig. 6 and fig. 7, from the results of coding and restoring one bit plane while keeping the other bit planes unchanged, it can be determined that the restored video frame quality is better when a low bit plane is compressed. As shown in fig. 8 and fig. 9, from the results of coding and restoring multiple bit planes starting from the lowest bit plane, it can be determined that the restored video frame quality is better when a certain number of low bit planes are compressed. Taken together, figs. 6 to 9 show that the quality of the video frame restored after coding the (G−1)th bit plane is better than that after coding the Gth bit plane, which in turn is better than that after coding the (G+1)th bit plane.
Optionally, to balance video frame quality and compression, the L bit planes may include the 5th bit plane and the bit planes below the fifth level.
In some embodiments, before the information of the target matching block is encoded according to the preset encoding format for each first pixel block and the encoded data of the M bit planes of the first video frame is determined in step S130, the method may further include:
Step S150, for each of the L bit planes, determining a plurality of candidate matching blocks corresponding to each first pixel block.
Here, the M bit planes obtained by splitting the first video frame may include the L bit planes. The plurality of candidate matching blocks may include candidate matching blocks of any one of the L bit planes. For each of the L bit planes, the video encoding apparatus may determine the plurality of candidate matching blocks corresponding to a first pixel block in that bit plane according to the position of the first pixel block in the bit plane.
Step S160, for each first pixel block, determining a target matching block from the plurality of candidate matching blocks based on the bit plane VJND threshold block of the first pixel block.
Here, the target matching block is the candidate matching block that matches the first pixel block. For each first pixel block, the video encoding device determines, based on the bit plane VJND threshold block of the first pixel block, the candidate matching block that matches it from among the plurality of candidate matching blocks of each bit plane, so that the target matching block corresponding to the first pixel block in each bit plane can be determined.
Step S170, determining the information of the target matching block of each bit plane based on the target matching blocks of each bit plane corresponding to the first pixel blocks.
Here, the information of the target matching block may include index information of the target matching block. Optionally, the index information may be the position of the target matching block in the ordering of the candidate matching blocks. For example, if the video encoding device determines that the 6th candidate matching block is the target matching block, it may set the index information of the target match to 6.
In addition, for each bit plane, the video encoding device determines the information of the target matching blocks of that bit plane for the first video frame from the target matching blocks of the plurality of first pixel blocks in the first video frame.
In the above embodiment, determining the target matching block from the plurality of candidate matching blocks based on the bit plane VJND threshold block of the first pixel block allows viewpoint factors to be taken into account in bit plane motion estimation, which helps improve the accuracy of motion estimation matching and, in turn, the quality of the restored video frame.
To reduce the search range while preserving matching accuracy, the applicant built a test library from the high-frame-rate video sequences of common video test sets and extracted several video clips from it to statistically evaluate the temporal, spatial, and gray-scale correlation of matching blocks. The conclusions were as follows: the higher the bit plane layer, the higher the correlation; spatial correlation is greater than temporal correlation, temporal correlation is greater than gray-scale correlation, and the gray-scale correlation is itself still high; and the correlation of the matching block gradually decreases with distance from the center to the periphery, with a high degree of symmetry. Accordingly, in bit plane motion estimation the search range may include the corresponding bit plane of the second video frame and/or the previous bit plane of the first video frame.
In some embodiments, in step S150, the determining, for each of the L bit planes, a plurality of candidate matching blocks corresponding to each of the first pixel blocks may include:
searching, for each of the L bit planes, the corresponding bit plane of the second video frame according to a preset search range, and determining a plurality of first candidate matching blocks of each first pixel block.
Here, the candidate matching blocks may include the first candidate matching blocks. The second video frame may be a frame preceding the first video frame. The preset search range may be determined according to the search method used for block matching, which may include the diamond search method, the hexagonal search method, the new three-step method, the new four-step method, and other existing search methods. For each of the L bit planes, the video encoding apparatus searches the corresponding bit plane of the second video frame within the preset search range, thereby determining the plurality of first candidate matching blocks of each first pixel block in each bit plane.
In some embodiments, the preset search range includes a diamond-shaped range centered on a location of a bit plane corresponding to the first pixel block.
Here, the diamond-shaped range centered on the position of the bit plane corresponding to the first pixel block may include an upper position, a lower position, a left position, a right position, and a center position where the first pixel block is located.
In practical application, the applicant found through experiments that at a refresh rate of 30 Hz, a circle S moves a distance l between two adjacent frames. At 60 Hz the distance drops to l/2, at 90 Hz to l/3, and at 120 Hz to only l/4. Thus, as the refresh rate increases, the range of movement of the circle S between two adjacent frames shrinks substantially.
Based on the above conclusion, and considering the ultra-high refresh rate characteristic of near-eye displays, in the embodiment of the present application the step size of the preset search range is 1.
For example, for a first pixel block at position (x_n, y_n) in the 2nd horizon plane of the first video frame, the video encoding device searches the 2nd horizon plane of the second video frame according to the preset search range and can determine a plurality of first candidate matching blocks for that first pixel block. The plurality of first candidate matching blocks may comprise 5 second pixel blocks, specifically the pixel blocks of the 2nd horizon plane of the second video frame at positions (x_n, y_n-1), (x_n, y_n+1), (x_n-1, y_n), (x_n+1, y_n) and (x_n, y_n). Each such pixel block is equal in size to the first pixel block.
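A minimal sketch of how these five candidate positions could be enumerated, assuming a Python implementation where positions are (x, y) pairs and out-of-range positions at the frame edge are simply dropped (the function name and the frame-size parameters are illustrative assumptions, not part of the claimed method):

    def diamond_candidates(x, y, width, height):
        # Diamond-shaped range with step size 1: upper, lower, left,
        # right and center positions around (x, y).
        offsets = [(0, -1), (0, 1), (-1, 0), (1, 0), (0, 0)]
        return [(x + dx, y + dy)
                for dx, dy in offsets
                if 0 <= x + dx < width and 0 <= y + dy < height]

For interior blocks this yields exactly the five positions listed above; for blocks at the frame edge the number of candidates is reduced accordingly, consistent with the edge adjustment described later.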
In the above embodiment, searching the corresponding bit plane of the previous frame to determine the candidate matching blocks optimizes the bit plane search range into a preset search range along the time dimension, which helps to reduce the search range, simplify the search flow, and facilitate real-time hardware design.
In the embodiment of the application, since the encoded video frame data is transmitted bit plane by bit plane, the current horizon plane can also be restored from the previous horizon plane of the same video frame. Restoring the current horizon plane from the previous horizon plane can improve matching accuracy and thereby the image quality of the restored video frame.
In some embodiments, in step S150, determining, for each of the L horizon planes, a plurality of candidate matching blocks corresponding to each of the first pixel blocks may include:
searching, for each horizon plane of the L horizon planes, the previous horizon plane of the first video frame according to a preset search range, and determining a plurality of first candidate matching blocks of each first pixel block.
Here, the candidate matching block may include a first candidate matching block. The preset search range may be determined according to a block-matching search method, which may include the diamond search method, the hexagonal search method, the new three-step method, the new four-step method, and other existing search methods. For each horizon plane of the L horizon planes, the video encoding apparatus searches the previous horizon plane of the first video frame according to the preset search range, so as to determine a plurality of first candidate matching blocks of each first pixel block in the corresponding horizon plane.
In the above embodiment, searching the previous horizon plane to determine the candidate matching blocks optimizes the bit plane search range into a preset search range along the gray-scale dimension, which helps to reduce the search range, simplify the search flow, and facilitate real-time hardware design. Searching along the gray-scale dimension also helps to improve matching accuracy.
In some embodiments, the preset search range includes a diamond-shaped range centered on the location of the bit plane corresponding to the first pixel block, taking into account the characteristics of the near-eye display at the ultra-high refresh rate.
Here, the diamond-shaped range centered on the position of the bit plane corresponding to the first pixel block may include an upper position, a lower position, a left position, a right position, and a center position where the first pixel block is located. Optionally, the step size of the preset search range is 1.
For example, for a first pixel block at position (x_n, y_n) in the 2nd horizon plane of the first video frame, the video encoding device searches the 3rd horizon plane of the first video frame according to the preset search range and can determine a plurality of first candidate matching blocks for that first pixel block. The plurality of first candidate matching blocks may comprise 5 pixel blocks, specifically the pixel blocks of the 3rd horizon plane of the first video frame at positions (x_n, y_n-1), (x_n, y_n+1), (x_n-1, y_n), (x_n+1, y_n) and (x_n, y_n). Each such pixel block is equal in size to the first pixel block.
In some embodiments, in step S150, determining, for each of the L horizon planes, a plurality of candidate matching blocks corresponding to each of the first pixel blocks may include:
searching, for each horizon plane of the L horizon planes, the corresponding horizon plane of the second video frame and the previous horizon plane of that bit plane in the first video frame according to a preset search range, and determining a plurality of first candidate matching blocks of each first pixel block.
Here, the candidate matching block may include a first candidate matching block. The second video frame may include a frame preceding the first video frame. The preset search range may be determined according to a block-matching search method, which may include the diamond search method, the hexagonal search method, the new three-step method, the new four-step method, and other existing search methods. For each horizon plane of the L horizon planes, the video encoding device searches the previous horizon plane of the first video frame and the corresponding horizon plane of the second video frame according to the preset search range, so as to determine a plurality of first candidate matching blocks of each first pixel block in the corresponding horizon plane.
In some embodiments, the preset search range includes a diamond-shaped range centered on a location of a bit plane corresponding to the first pixel block.
Here, the diamond-shaped range centered on the position of the bit plane corresponding to the first pixel block may include an upper position, a lower position, a left position, a right position, and a center position where the first pixel block is located. Optionally, the step size of the preset search range is 1.
For example, for a first pixel block at position (x_n, y_n) in the 2nd horizon plane of the first video frame, the video encoding device searches the 3rd horizon plane of the first video frame and the 2nd horizon plane of the second video frame according to the preset search range and can determine a plurality of first candidate matching blocks for that first pixel block. The plurality of first candidate matching blocks may comprise 10 pixel blocks, specifically the pixel blocks of the 3rd horizon plane of the first video frame at positions (x_n, y_n-1), (x_n, y_n+1), (x_n-1, y_n), (x_n+1, y_n) and (x_n, y_n), and the pixel blocks of the 2nd horizon plane of the second video frame at the same five positions. Each such pixel block is equal in size to the first pixel block.
It should be noted that, in the embodiment of the present application, for the first pixel block located at the edge of the first video frame, the number of the first candidate matching blocks may be adjusted accordingly. In addition, for the highest horizon plane, the video encoding device may search in a corresponding bit plane of the second video frame according to a preset search range, and determine a plurality of first candidate matching blocks of the highest horizon plane corresponding to the first pixel block.
In the above embodiment, searching the previous frame and the previous horizon plane to determine the candidate matching blocks optimizes the bit plane search range into preset search ranges along the time dimension and the gray-scale dimension, which helps to reduce the search range, simplify the search flow, and facilitate real-time hardware design. Searching along the gray-scale dimension also helps to improve matching accuracy.
In some embodiments, in step S160, for each first pixel block, determining a target matching block from a plurality of candidate matching blocks based on the bit-plane VJND threshold block of the first pixel block may include:
for each first pixel block, a target matching block is determined from a plurality of first candidate matching blocks based on a bit-plane VJND threshold block of the first pixel block.
Here, for each first pixel block, the video encoding device determines, based on the bit plane VJND threshold block of the first pixel block, a target match block for the corresponding horizon plane from among a plurality of first candidate match blocks for each horizon plane for which the first pixel block corresponds.
For example, for a certain horizon plane, the first candidate matching block includes 5 pixel blocks of the horizon plane of the second video frame and 5 pixel blocks of a horizon plane that is immediately above the horizon plane of the first video frame, and the video encoding device determines a target matching block for the first pixel block in the horizon plane from the 10 pixel blocks based on a bit plane VJND threshold block for the first pixel block.
In some embodiments, in order to make the residual value of the first pixel block and the target matching block as small as possible, in step S160, determining, for each of the first pixel blocks, the target matching block from the plurality of candidate matching blocks based on the bit plane VJND threshold block of the first pixel block may further include:
for each first pixel block, determining a target matching block from a plurality of first candidate matching blocks and a preset complementary matching block based on a bit plane VJND threshold block of the first pixel block.
Fig. 10 is a schematic diagram of preset complementary matching blocks shown according to an exemplary embodiment. In practical application, the applicant extracted a plurality of video frames and performed statistics on the various cases of the first pixel block for N = 3; referring to fig. 10, the number of preset complementary matching blocks may be 22.
In the above embodiment, residual data is replaced by the preset complementary matching blocks, so that the residual value between the first pixel block and the chosen matching block stays below the JND threshold as far as possible. Residual values then need not be transmitted, which reduces the amount of data transmitted after compression encoding of the video frame and keeps that amount constant, facilitating hardware design.
In some embodiments, in step S160, determining, for each of the first pixel blocks, a target matching block from a plurality of candidate matching blocks based on the bit-plane VJND threshold block of the first pixel block may include:
Step S161, for each first pixel block, determining the number of matching points of the first pixel block with each candidate matching block, based on the first pixel block, the plurality of candidate matching blocks, and the bit plane VJND threshold block.
Here, the number of matching points is used to characterize the similarity of the first pixel block to the candidate matching block. The video encoding device may calculate, according to a preset matching rule, a number of matching points corresponding to the first pixel block and each candidate matching block based on the first pixel block, the plurality of candidate matching blocks, and the bit plane VJND threshold block.
In some embodiments, the number of matching points may be calculated by the following matching rule (referred to later as formula 6):

    P_k(A, B) = Σ_{i,j} [ (A_k(i, j) ⊙ B_k(i, j)) | C_k(i, j) ]

where k represents the layer number of the bit plane, A represents the first pixel block, B represents the candidate matching block, C represents the bit plane VJND threshold block, "⊙" represents "exclusive nor", and "|" represents "or"; the sum runs over all N×N pixel positions (i, j).
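A minimal Python sketch of this rule, assuming the k-th horizon plane of each block is given as an N×N array of 0/1 values and that C_k holds the target visual sensitivity bits of that plane (the names and data layout are assumptions of this sketch, not part of the claimed method):

    def matching_points(A_k, B_k, C_k):
        # A_k: k-th bit plane of the first pixel block (N x N, 0/1 values)
        # B_k: k-th bit plane of the candidate matching block
        # C_k: target visual sensitivity values of that bit plane; 1 means
        #      the human eye is insensitive to the pixel at that position
        n = len(A_k)
        points = 0
        for i in range(n):
            for j in range(n):
                xnor = 1 if A_k[i][j] == B_k[i][j] else 0  # "exclusive nor"
                if xnor | C_k[i][j]:  # equal gray values, or insensitive pixel
                    points += 1       # first matching point number is 1
        return points

A candidate scoring the full N×N points matches the first pixel block exactly, up to differences the eye cannot perceive.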
In some embodiments, step S161 may include:
Step S1611, determining the target visual sensitivity value of the bit plane corresponding to the first pixel block from the bit plane VJND threshold block according to the layer number of the bit plane.
Here, for each first pixel block, the video encoding apparatus determines, according to the number of layers of the bit plane to be encoded, a target visual sensitivity value of each pixel point in the first pixel block in the horizon plane from a bit plane VJND threshold block corresponding to each pixel point in each first pixel block.
For example, the number of layers of the bit plane to be encoded is 3, the bit plane VJND threshold block corresponding to a certain pixel point in the first pixel block is 00111111, and the video encoding device may determine that the target visual sensitivity value of the pixel point in the bit plane is 1. Similarly, the target visual sensitivity value of other pixels in the first pixel block at the horizon plane may be determined.
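A minimal sketch of this lookup, assuming the bit plane VJND threshold block stores one 8-bit value per pixel and that layer 1 corresponds to the least significant bit (this numbering is an assumption of the sketch; the example above is consistent with it):

    def target_sensitivity(vjnd_bits, k):
        # vjnd_bits: 8-bit bit plane VJND threshold value of one pixel
        # k: layer number of the bit plane to be encoded (1 = LSB, assumed)
        return (vjnd_bits >> (k - 1)) & 1

    print(target_sensitivity(0b00111111, 3))  # prints 1, as in the example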
Step S1612, for each candidate matching block, calculating a first matching point number of the pixel point (i, j) in the first pixel block with respect to the candidate matching block, according to the gray value of the pixel point (i, j) of the first pixel block, the gray value of the pixel point (i, j) of the candidate matching block, and the target visual sensitivity value of the bit plane at the pixel point (i, j) of the first pixel block.
Here, i and j are integers not less than 0 and not more than N. The first matching point number characterizes the degree of similarity between the pixel points at the same position in the first pixel block and the first candidate matching block.
In some embodiments, in the case that the target visual sensitivity value of the bit plane at the pixel point (i, j) is the first preset value, the video encoding apparatus may determine that the pixel point (i, j) of the first pixel block matches the pixel point (i, j) of the candidate matching block, and may therefore set the first matching point number of the pixel point (i, j) to the second preset value.
Here, a target visual sensitivity value equal to the first preset value indicates that the human eye is insensitive to the pixel point (i, j) in the bit plane to be encoded. Since the human eye is insensitive to that pixel point, the pixel point (i, j) of the first pixel block can be deemed to match the pixel point (i, j) of the candidate matching block even when their gray values differ.
In some embodiments, if the video encoding apparatus determines that the gray value of the pixel point (i, j) of the first pixel block is the same as the gray value of the pixel point (i, j) of the candidate matching block, it may likewise set the first matching point number of the pixel point (i, j) to the second preset value.
In some embodiments, when the target visual sensitivity value of the bit plane at the pixel point (i, j) is the first preset value and the gray value of the pixel point (i, j) of the first pixel block is the same as that of the candidate matching block, the video encoding apparatus may also set the first matching point number of the pixel point (i, j) to the second preset value.
Alternatively, the first preset value may be 1, and the second preset value may be 1.
Step S1613, accumulating the first matching points of the first pixel block and the pixel points in the candidate matching block to obtain the matching points of the first pixel block and the candidate matching block.
Here, the first pixel block may include N×N pixel points, and the candidate matching block may also include N×N pixel points. The video encoding device calculates the first matching point numbers of all pixel points of the first pixel block against the candidate matching block according to step S1612, and accumulates the first matching point number of each pixel point to obtain the number of matching points between the first pixel block and the candidate matching block.
Step S162, determining a target matching block matched with the first pixel block according to the number of matching points.
Here, the video encoding device calculates the number of matching points between the first pixel block and each candidate matching block, and screens out the target matching block from the candidate matching blocks according to a preset rule.
Alternatively, the preset rule may include a matching point maximum rule. For example, the video encoding device sets a candidate matching block having the largest number of matching points as a target matching block. The preset rule may also include an index preference rule of the candidate matching block when the number of matching points is the same. For example, in the case where the number of matching points of the first pixel block and the two candidate matching blocks is the same, the video encoding device takes the candidate matching block with the smallest index as the target matching block.
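A minimal sketch combining the matching point maximum rule with the index preference rule, assuming the matching point numbers are collected in a list ordered by candidate index (names are illustrative assumptions):

    def select_target(points):
        # points[i]: number of matching points of candidate i.
        # A strict ">" comparison keeps the smallest index on ties,
        # implementing the index preference rule.
        best_idx, best_pts = 0, points[0]
        for idx in range(1, len(points)):
            if points[idx] > best_pts:
                best_idx, best_pts = idx, points[idx]
        return best_idx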
In the above embodiment, the relationship between the critical frequency of human eyes and the eccentricity of retina is quantized, a VJND model is established, and a VJND threshold is mapped to a bit plane, which is used in a matching criterion of bit plane motion estimation, so as to help to improve the accuracy of matching of bit plane motion estimation, and further improve the quality of restored video frames.
Fig. 11 is a flowchart illustrating a video decoding method according to an exemplary embodiment, referring to fig. 11, the video decoding method may be applied to a controller of a near-eye display, and includes the following steps:
Step S210, obtaining the encoded data of the M horizon planes of the first video frame.
Here, the M-level plane is obtained by splitting the first video frame. The first video frame includes a plurality of first pixel blocks. The encoded data of the M-level plane of the first video frame may be obtained according to the video encoding method in the foregoing embodiment.
Step S220, decoding the encoded data of the M horizon planes of the first video frame according to a preset decoding format to obtain information of the target matching blocks corresponding to the plurality of first pixel blocks.
Here, the information of the target matching block includes information of the target matching block of each of the L-level planes. The M horizon planes include L horizon planes. The preset decoding format corresponds to the preset encoding format. For example, the preset encoding format is binary encoding, and the preset decoding format is binary decoding.
Step S230, for each first pixel block, determining a gray value of each horizon plane in the L horizon planes corresponding to the first pixel block according to the information of the target matching block corresponding to the first pixel block.
Here, for each first pixel block, the video decoding apparatus determines a target matching block for each horizon plane corresponding to the first pixel block from information of the target matching block for each horizon plane in the L horizon planes corresponding to the first pixel block, wherein the target matching block matches the first pixel block. And restoring the first pixel block according to the target matching block, so that the gray value of each horizon plane in the L horizon planes corresponding to the first pixel block can be determined.
In some embodiments, since in video encoding, the residual value of the target matching block and the first pixel block is less than the JND threshold, the restored first pixel block may be the target matching block.
Step S240, restoring the M horizon planes of the first video frame based on the gray value of each horizon plane in the L horizon planes corresponding to the plurality of first pixel blocks.
In the above embodiment, receiving the compression-encoded data of the first video frame reduces the storage area overhead of the near-eye display, which benefits the miniaturization of near-eye display devices. Meanwhile, the information of the target matching blocks is obtained by decoding the encoded data of the M horizon planes of the first video frame, so that the bit planes can be restored quickly, preserving the high refresh rate and high resolution of the near-eye display.
A specific implementation of each of the above steps is described below.
In some embodiments, in step S230, determining the gray value of each of the L horizon planes corresponding to the plurality of first pixel blocks according to the information of the target matching block corresponding to the plurality of first pixel blocks includes:
step S231, for each horizon plane of the L horizon planes, determining a plurality of candidate matching blocks for each first pixel block.
Here, the plurality of candidate matching blocks includes candidate matching blocks of any one of the L-level planes.
Step S232, for each first pixel block, determining a target matching block from a plurality of candidate matching blocks corresponding to the horizon plane according to the information of the target matching block corresponding to the first pixel block.
Step S233, determining the gray value of each horizon plane corresponding to the first pixel block according to the target matching block of each horizon plane corresponding to the first pixel block.
In some embodiments, in step S231, determining a plurality of candidate matching blocks for each first pixel block for each of the L horizon planes may include:
Step S2311, for each horizon plane of the L horizon planes, searching the corresponding horizon plane of the second video frame and/or the previous horizon plane of the bit plane of the first video frame according to the preset search range, so as to determine a plurality of first candidate matching blocks of each first pixel block.
Here, the second video frame includes a frame previous to the first video frame. The video decoding device may search the corresponding horizon plane of the second video frame according to the preset search range to determine a plurality of first candidate matching blocks of each first pixel block; it may instead search the previous horizon plane of the bit plane according to the preset search range; or it may search both the corresponding horizon plane of the second video frame and the previous horizon plane of the bit plane of the first video frame according to the preset search range to determine the plurality of first candidate matching blocks of each first pixel block.
In some embodiments, the preset search range includes a diamond-shaped range centered on a location of the bit plane corresponding to the first pixel block. Optionally, the search step size is 1.
Here, the diamond-shaped range centered on the position of the bit plane corresponding to the first pixel block may include an upper position, a lower position, a left position, a right position, and a center position where the first pixel block is located.
Step S2312 determines a plurality of candidate matching blocks for each first pixel block based on the plurality of first candidate matching blocks for each first pixel block.
In some embodiments, to ensure the restored video frame quality, in step S2312, determining the plurality of candidate matching blocks for each of the first pixel blocks may include:
a plurality of candidate matching blocks of the first pixel block are determined based on the plurality of first candidate matching blocks of each first pixel block and a preset supplemental matching block.
In some embodiments, the encoded data of the M horizon planes of the first video frame includes the gray values corresponding to the bit planes of the M horizon planes other than the L horizon planes.
In step S240, restoring the M-level plane of the first video frame based on the gray value of each level plane in the L-level planes corresponding to the plurality of first pixel blocks may include:
and restoring the M horizon planes of the first video frame based on the gray values of each horizon plane in the L horizon planes corresponding to the plurality of first pixel blocks and the gray values corresponding to other bit planes except the L horizon planes in the M horizon planes.
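A minimal sketch of this recombination, assuming all M horizon planes of each pixel have been restored as 0/1 values and that plane 0 is the least significant bit (the indexing convention and names are assumptions of this sketch):

    def restore_gray(planes):
        # planes[k][i][j]: restored bit of plane k at pixel (i, j);
        # the L decoded planes and the directly transmitted planes are
        # assumed to have been merged into this single list already.
        rows, cols = len(planes[0]), len(planes[0][0])
        gray = [[0] * cols for _ in range(rows)]
        for k, plane in enumerate(planes):
            for i in range(rows):
                for j in range(cols):
                    gray[i][j] |= plane[i][j] << k
        return gray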
Fig. 12 is a block diagram illustrating a video encoding apparatus according to an exemplary embodiment. As shown in fig. 12, the video encoding apparatus 300 includes: a first acquisition module 310, a threshold block determination module 320, an encoding module 330, and an encoded data determination module 340.
The first acquisition module 310 is configured to perform acquisition of a first video frame in the video stream, the first video frame comprising a plurality of first pixel blocks.
The threshold block determining module 320 is configured to determine a bit plane VJND threshold block for each of the first pixel blocks according to a view just noticeable distortion VJND model. The VJND model is determined according to the just-noticeable distortion threshold and the viewpoint factor quantization threshold; the viewpoint factor quantization threshold is determined according to the retinal eccentricity corresponding to the first pixel block; the bit plane VJND threshold value block comprises visual sensitivity values of M bit planes corresponding to all pixel points in the first pixel block; the M-level plane is obtained by splitting the first video frame.
The encoding module 330 is configured to perform encoding, for each first pixel block, the information of the target matching block according to a preset encoding format, so as to obtain first encoded data of the first pixel block. The first coded data comprises coded data of each of the L horizon planes corresponding to the first pixel block; the M horizon plane comprises the L horizon plane; the information of the target matching block is determined according to a bit plane VJND threshold block of the first pixel block; the information of the target matching block includes information of the target matching block of each of the L-level planes.
The encoded data determining module 340 is configured to perform determining encoded data of an M-level plane of the first video frame based on the first encoded data of the plurality of first pixel blocks.
In the above embodiment, on one hand, the relationship between the critical frequency of human eyes and the retinal eccentricity can be quantized, a VJND model is established, and the VJND model is used for determining the target matching block, so that the matching accuracy in the bit plane motion estimation is improved, and the video frame quality of the restored bit plane is further improved. On the other hand, the information of the target matching block of the first pixel block in the first video frame is encoded to obtain the encoded data of the L-layer level of the first video frame, so that the bit-plane compression encoding of the first video frame is realized, the transmission data quantity of the encoded data of the first video frame is reduced, the data transmission bandwidth is reduced, the storage area overhead of the near-eye display is further reduced, the transmission efficiency of the first video frame is improved, and the high refresh rate and the high resolution of the near-eye display are ensured.
In some embodiments, the threshold block determination module 320 may include a threshold calculation sub-module, a first determination sub-module, and a second determination sub-module.
And a threshold value calculating sub-module configured to perform, for each pixel point in each first pixel block, calculation of a VJND threshold value of the pixel point according to the Viewpoint Just Noticeable Distortion (VJND) model.
And the first determining submodule is configured to determine a bit plane VJND threshold block corresponding to the VJND threshold of the pixel according to the corresponding relation between the VJND threshold and the bit plane VJND threshold block.
And a second determining sub-module configured to perform determining a bit-plane threshold block of each first pixel block based on bit-plane VJND threshold blocks corresponding to all pixel points in each first pixel block.
In some embodiments, the video encoding apparatus further comprises a candidate determination module, a target determination module, and a target information determination module.
And a candidate determining module configured to determine, for each of the L horizon planes, a plurality of candidate matching blocks corresponding to each of the first pixel blocks.
Here, the plurality of candidate matching blocks includes candidate matching blocks of any one of the L-level planes.
A target determination module configured to perform, for each of the first pixel blocks, a determination of a target matching block from a plurality of candidate matching blocks based on a bit-plane VJND threshold block of the first pixel block; the target matching block comprises a candidate matching block matched with the first pixel block;
And a target information determination module configured to perform determining the information of the target matching block of each horizon plane based on the target matching blocks of each horizon plane corresponding to the first pixel block.
In some embodiments, the candidate determining module is specifically configured to perform searching at a corresponding horizon plane of the second video frame and/or a previous horizon plane of the bit plane of the first video frame according to a preset search range for each horizon plane of the L horizon planes, and determine a plurality of first candidate matching blocks for each first pixel block.
Here, the second video frame includes a frame previous to the first video frame.
In some embodiments, the preset search range includes a diamond-shaped range centered on a location of a bit plane corresponding to the first pixel block.
In some embodiments, the target determination module is specifically configured to perform, for each of said first pixel blocks, determining a target matching block from a plurality of first candidate matching blocks based on a bit-plane VJND threshold block of the first pixel block.
In some embodiments, the target determination module is specifically further configured to perform, for each first pixel block, determining a target matching block from a plurality of first candidate matching blocks and a preset complementary matching block based on a bit-plane VJND threshold block of the first pixel block.
In some embodiments, the targeting module may include a points determination sub-module and a first targeting sub-module.
The points determination submodule is configured to perform determining, for the first pixel block, the number of matching points corresponding to each candidate matching block based on the first pixel block, the plurality of candidate matching blocks, and the bit plane VJND threshold block.
Here, the number of matching points is used to characterize the similarity of the first pixel block to the candidate matching block.
A first object determination sub-module configured to perform determining an object matching block matching the first pixel block based on the number of matching points.
In some embodiments, the point determination submodule includes a sensitivity value determination unit, a point calculation unit, and an accumulation unit.
And the sensitive value determining unit is configured to determine the target visual sensitive value of the bit plane corresponding to the first pixel block from the bit plane VJND threshold block according to the layer number of the bit plane.
A point calculation unit configured to perform, for each candidate matching block, calculating a first matching point number of the pixel point (i, j) in the first pixel block with respect to the candidate matching block, according to the gray value of the pixel point (i, j) of the first pixel block, the gray value of the pixel point (i, j) of the candidate matching block, and the target visual sensitivity value of the bit plane at the pixel point (i, j) of the first pixel block; i and j are integers not less than 0 and not more than N.
And the accumulation unit is configured to accumulate the first matching points of the pixel points in the first pixel block and the candidate matching block to obtain the matching points of the first pixel block and the candidate matching block.
In some embodiments, the point calculation unit is specifically configured to determine that the pixel point (i, j) of the first pixel block matches the pixel point (i, j) of the candidate matching block if the target visual sensitivity value of the bit plane at the pixel point (i, j) is the first preset value, and/or if the gray value of the pixel point (i, j) of the first pixel block is the same as the gray value of the pixel point (i, j) of the candidate matching block; and to determine the first matching point number of the pixel point (i, j) of the first pixel block with the candidate matching block as the second preset value.
In some embodiments, where M is greater than L, the encoded data determining module 340 is further configured to perform determining the encoded data of the M-bit-plane of the first video frame from the first encoded data of the plurality of first pixel blocks and gray values corresponding to bit-planes of the M-bit-planes other than the L-bit-plane.
Fig. 13 is a block diagram illustrating a video decoding apparatus according to an exemplary embodiment. As shown in fig. 13, the video decoding apparatus 400 applied to a controller of a near-eye display may include: the second acquisition module 410, the decoding module 420, the gray determination module 430, and the restoration module 440.
A second acquisition module 410 configured to perform acquisition of encoded data of the M-level plane of the first video frame; the M horizon plane is obtained by splitting a first video frame; the first video frame includes a plurality of first blocks of pixels;
the decoding module 420 is configured to perform decoding on the encoded data of the M-level plane of the first video frame according to a preset decoding format, so as to obtain information of target matching blocks corresponding to the plurality of first pixel blocks. The information of the target matching block includes information of the target matching block of each horizon plane of an L-horizon plane, and the M-horizon plane includes the L-horizon plane.
The gray determining module 430 is configured to perform, for each first pixel block, determining a gray value of each of the L-level planes corresponding to the first pixel block according to information of the target matching block corresponding to the first pixel block.
The restoration module 440 is configured to perform restoration of the M-level planes of the first video frame based on the gray value of each of the L-level planes corresponding to the plurality of first pixel blocks.
In the above embodiment, the storage area overhead of the near-eye display can be reduced by receiving the compression-encoded data of the first video frame, which is beneficial to the miniaturization requirement of the near-eye display device. Meanwhile, the information of the target matching block is obtained by decoding the coded data of the M-layer level of the first video frame, so that the bit level can be quickly restored, and the high refresh rate and the high resolution of the near-eye display are ensured.
In some embodiments, the gray scale determination module 430 includes a candidate determination submodule, a second target determination submodule, and a gray scale determination submodule.
A candidate determination submodule configured to perform, for each of the L horizon planes, a determination of a plurality of candidate matching blocks for each first pixel block.
Here, the plurality of candidate matching blocks includes candidate matching blocks of any one of the L-level planes.
A second target determination sub-module configured to perform, for each first pixel block, a determination of a target matching block from a plurality of candidate matching blocks of the corresponding horizon plane according to information of the target matching block corresponding to the first pixel block.
Here, the target matching block includes a candidate matching block that matches the first pixel block.
And the gray level determining sub-module is configured to determine a gray level value of each horizon plane corresponding to the first pixel block according to the target matching block of each horizon plane corresponding to the first pixel block.
In some embodiments, the candidate determination submodule includes a search subunit and a candidate determination subunit.
And the searching subunit is configured to perform searching on each horizon plane of the L horizon planes according to a preset searching range in a corresponding horizon plane of the second video frame and/or a previous horizon plane of the bit plane of the first video frame, and determine a plurality of first candidate matching blocks of each first pixel block.
Here, the second video frame includes a frame previous to the first video frame;
and a candidate determination subunit configured to perform determination of the plurality of candidate matching blocks for each first pixel block based on the plurality of first candidate matching blocks for each first pixel block.
In some embodiments, the preset search range includes a diamond-shaped range centered on a location of the bit plane corresponding to the first pixel block.
In some embodiments, the candidate determining subunit is further configured to perform determining the plurality of candidate matching blocks for the first pixel block based on the plurality of first candidate matching blocks for each first pixel block and a preset supplemental matching block.
In some embodiments, the restoration module 440 is configured to restore the M-level planes of the first video frame based on the gray values of each of the L-level planes corresponding to the plurality of first pixel blocks and the gray values of the other bit planes except for the L-level plane in the case where the encoded data of the M-level planes of the first video frame includes the gray values of the other bit planes except for the L-level plane in the M-level planes.
Fig. 14 is a block diagram illustrating a controller of a near-eye display according to an exemplary embodiment. As shown in fig. 14, the controller 500 of the near-eye display includes a frame buffer module 510, a preprocessing module 520, a transmitting module 530, a receiving module 540, and a scanning display module 550.
The frame buffer module 510 is configured to buffer a current frame (i.e. a first video frame), a previous frame (i.e. a second video frame) and a JND frame if the timing requirement is met.
Here, the JND frame may include a VJND threshold mapped to the bit plane according to the VJND model. The VJND model is the same as that of the foregoing embodiment, and for brevity, will not be described here.
The preprocessing module 520 is configured to split the video frames output by the frame buffer module 510 according to bit planes to obtain M horizon planes of data, and to feed the split bit plane data into a plurality of N-layer first-in first-out queues (First In First Out, FIFO), so as to obtain a plurality of N×N pixel blocks corresponding to each bit plane of the current frame (i.e., the first pixel blocks of each bit plane) and a plurality of N×N pixel blocks corresponding to each bit plane of the previous frame.
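A minimal sketch of the bit plane split performed here, assuming the frame is a 2-D list of 8-bit gray values and that plane 0 is the least significant bit (FIFO buffering and block formation are omitted; the names are illustrative assumptions):

    def split_bit_planes(frame, m=8):
        # Returns m horizon planes; plane k holds bit k of every pixel.
        return [[[(pix >> k) & 1 for pix in row] for row in frame]
                for k in range(m)]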
A sending module 530 configured to read the pixel blocks buffered in the plurality of FIFOs of the preprocessing module 520 and determine the first candidate matching blocks corresponding to each pixel block of each horizon plane of the current frame; to match, in parallel and according to a preset matching rule, the pixel block against the first candidate matching blocks and the preset complementary matching blocks (i.e., the candidate matching blocks); to compare the matching results to determine the target matching block that matches the pixel block; and to encode the index information of the target matching block to obtain the encoded data of the pixel block.
Here, the preset matching rule is as in formula 6 in the foregoing embodiment. The preset supplementary matching block is stored in a lookup table.
In some embodiments, the sending module 530 is configured to search, according to the preset search range, the previous horizon plane of the current frame and the same horizon plane of the previous frame, so as to determine the plurality of first candidate matching blocks of each horizon plane corresponding to each first pixel block.
In some embodiments, the preset search range may include an upper position, a lower position, a left position, a right position of the pixel block, and a center position of the pixel block. Optionally, the search step size is 1. The plurality of first candidate matching blocks may include 5 pixel blocks of a previous horizon plane of the current frame and 5 pixel blocks of a same horizon plane of a previous frame.
Alternatively, N may be 3, and the preset complementary matching blocks may be 22 pixel blocks, as shown in fig. 10. The sending module 530 is specifically configured to match the pixel block against the 10 first candidate matching blocks and the 22 preset complementary matching blocks according to the preset matching rule, obtain the 32 matching results, and determine the target matching block by comparison; the information of the target matching block is then encoded into 5-bit data. Each 9-bit block of every bit plane is thus represented by 5 bits, a constant 9:5 compression ratio that facilitates hardware design.
It should be noted that, since the highest horizon plane has no previous horizon plane, the first candidate matching blocks corresponding to a pixel block of the highest horizon plane may include the 5 pixel blocks of the highest horizon plane of the previous frame.
A receiving module 540 configured to read the pixel blocks buffered in the plurality of FIFOs of the preprocessing module 520 and determine the first candidate matching blocks corresponding to each pixel block of each horizon plane of the current frame; to decode the encoded data output by the sending module 530 to determine the index information of the target matching block of each horizon plane for the pixel blocks of the current frame; to restore each horizon plane of the current frame from the first candidate matching blocks and the preset complementary matching blocks according to the index information of the target matching block; and to send the restored bit planes to the scan display module 550.
In some embodiments, the receiving module 540 is further configured to perform reading of the uncoded bit-planes from the preprocessing module 520; the uncoded bit planes are sent to the scan display module 550.
Here, the sending module 530 does not compression-encode all bit planes of the current frame. Optionally, the lower 5 horizon planes are encoded while the 6th through the highest horizon planes are transmitted directly. The compression ratio is then constant at 1.385, the amount of transmitted data is only 72.22% of the original, the average PSNR can reach 37.658, and the average SSIM can reach 0.975. In a practical application environment the restored video frames show no obvious difference from the original video frames, the rendering effect is comparatively good, and the result matches the visual perception of the human eye.
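These figures can be checked directly under the assumptions above (N = 3, 8 horizon planes, the lower 5 planes encoded at 9:5): each group of 8 × 9 = 72 source bits is transmitted as 5 × 5 + 3 × 9 = 52 bits, giving a compression ratio of 72 / 52 ≈ 1.385 and a transmitted fraction of 52 / 72 ≈ 72.22%.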
The scan display module 550 is configured to display video frames by means of subfield scanning according to the bit plane data sent by the receiving module 540.
In the above embodiments, for the working mode of a digitally driven near-eye display, video data is compression-encoded and decoded based on bit plane motion estimation. On one hand, the relationship between the critical frequency of the human eye and the retinal eccentricity is quantized, a VJND model taking viewpoint factors into account is established, and the VJND threshold is mapped to the bit planes, so that it can be applied in the matching criterion of bit plane motion estimation, improving the accuracy of motion estimation matching and hence the quality of the restored video frames. On the other hand, the previous frame and the previous horizon plane are searched to determine the candidate matching blocks, i.e., the bit plane search range is optimized into the time dimension and the gray-scale dimension, which reduces the search range, simplifies the search flow, and facilitates real-time hardware design. Searching along the gray-scale dimension also helps to improve matching accuracy.
In some embodiments, the controller 500 of the near-eye display may also include a memory module 560.
Here, the storage module 560 may include a random access memory (Random Access Memory, RAM).
The transmitting module 530 is further configured to perform transmitting the encoded data to the storage module 560.
A storage module 560 configured to store the encoded bit planes transmitted by the sending module 530, and to store the uncoded bit planes output by the preprocessing module 520.
A receiving module 540, further configured to perform reading of the encoded data from the storage module 560; the uncoded bit-plane data is read from the memory module 560.
Here, the uncoded plane data may be used for scan display, as well as decoding of other bit planes.
Fig. 15 shows a hardware configuration diagram of the electronic device shown according to an exemplary embodiment.
The electronic device may include a processor 601 and a memory 602 storing computer program instructions.
In particular, the processor 601 may include a Central Processing Unit (CPU), or an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or may be configured to implement one or more integrated circuits of embodiments of the present application.
Memory 602 may include mass storage for data or instructions. By way of example, and not limitation, memory 602 may comprise a hard disk drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a universal serial bus (Universal Serial Bus, USB) drive, or a combination of two or more of these. The memory 602 may include removable or non-removable (or fixed) media, where appropriate. In a particular embodiment, the memory 602 is a non-volatile solid state memory.
The memory may include Read Only Memory (ROM), random Access Memory (RAM), magnetic disk storage media devices, optical storage media devices, flash memory devices, electrical, optical, or other physical/tangible memory storage devices. Thus, in general, the memory includes one or more tangible (non-transitory) computer-readable storage media (e.g., memory devices) encoded with software comprising computer-executable instructions and when the software is executed (e.g., by one or more processors) it is operable to perform the operations described with reference to a method according to an aspect of the present application.
The processor 601 implements any one of the video encoding methods or any one of the video decoding methods of the above embodiments by reading and executing computer program instructions stored in the memory 602.
In one example, the electronic device may also include a communication interface 603 and a bus 604. As shown in fig. 15, the processor 601, the memory 602, and the communication interface 603 are connected to one another through the bus 604 and communicate with one another.
The communication interface 603 is mainly configured to implement communication between each module, apparatus, unit and/or device in the embodiments of the present application.
Bus 604 includes hardware, software, or both, that couple components of the electronic device to one another. By way of example, and not limitation, the buses may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCI-X) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local (VLB) bus, or other suitable bus, or a combination of two or more of the above. Bus 604 may include one or more buses, where appropriate. Although embodiments of the present application describe and illustrate a particular bus, the present application contemplates any suitable bus or interconnect.
The electronic device may execute the video encoding method in the embodiment of the present application based on the VJND model, thereby implementing the video encoding method and apparatus described in connection with fig. 1 and 12.
The electronic device may perform the video decoding method in the embodiment of the present application based on the encoded data of the bit plane, and implement the video decoding method and apparatus described in connection with fig. 11 and 13.
In addition, in combination with the video encoding method or the video decoding method in the above embodiments, the embodiments of the present application may be implemented by providing a computer storage medium. The computer storage medium has stored thereon computer program instructions; the computer program instructions, when executed by a processor, implement any of the video encoding methods or video decoding methods of the above embodiments.
Finally, in combination with the video encoding method or the video decoding method in the above embodiments, the embodiments of the present application may provide a computer program product, including a computer program or instructions, which when executed by a processor, implement any one of the video encoding method or the video decoding method in the above embodiments.
It should be clear that the present application is not limited to the particular arrangements and processes described above and illustrated in the drawings. For the sake of brevity, a detailed description of known methods is omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications, and additions, or change the order between steps, after appreciating the spirit of the present application.
The functional blocks shown in the above-described structural block diagrams may be implemented in hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the present application are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine readable medium or transmitted over transmission media or communication links by a data signal carried in a carrier wave. A "machine-readable medium" may include any medium that can store or transfer information. Examples of machine-readable media include electronic circuitry, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, radio Frequency (RF) links, and the like. The code segments may be downloaded via computer networks such as the internet, intranets, etc.
It should also be noted that the exemplary embodiments mentioned in this application describe some methods or systems based on a series of steps or devices. However, the present application is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be different from the order in the embodiments, or several steps may be performed simultaneously.
Aspects of the present application are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to being, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware which performs the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In the foregoing, only the specific embodiments of the present application are described, and it will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, modules and units described above may refer to the corresponding processes in the foregoing method embodiments, which are not repeated herein. It should be understood that the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present application, which are intended to be included in the scope of the present application.

Claims (15)

1. A video encoding method, applied to a controller of a near-eye display, the method comprising:
acquiring a first video frame in a video stream, wherein the first video frame comprises a plurality of first pixel blocks;
determining a bit plane VJND threshold block of each first pixel block according to a viewpoint just-noticeable-distortion VJND model; the VJND model is determined according to the just-noticeable distortion threshold and the viewpoint factor quantization threshold; the viewpoint factor quantization threshold is determined according to the retinal eccentricity corresponding to the first pixel block; the bit plane VJND threshold block comprises visual sensitivity values of M horizon planes corresponding to all pixel points in the first pixel block; the M horizon planes are obtained by splitting the first video frame;
for each first pixel block, encoding the information of the target matching block according to a preset encoding format to obtain first encoded data of the first pixel block; the first coding data comprise coding data of each horizon plane in the L horizon planes corresponding to the first pixel blocks; the M horizon planes include the L horizon planes; the information of the target matching block is determined according to a bit plane VJND threshold block of the first pixel block; the information of the target matching block comprises information of the target matching block of each horizon plane of the L horizon planes;
Code data for an M-level plane of a first video frame is determined based on first code data for the plurality of first pixel blocks.
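As an editorial illustration of the bit-plane split recited in claim 1, here is a minimal Python sketch assuming an 8-bit grayscale frame (so M = 8) and N x N pixel blocks; the function names and the NumPy representation are assumptions, not the patent's:

```python
import numpy as np

def split_bit_planes(frame: np.ndarray, m: int = 8) -> np.ndarray:
    """Split an 8-bit grayscale frame into m binary bit planes.

    Plane k holds bit k of every pixel, so the original gray value is
    sum(plane[k] << k for k in range(m)).
    """
    return np.stack([(frame >> k) & 1 for k in range(m)])  # (m, H, W)

def tile_blocks(plane: np.ndarray, n: int) -> np.ndarray:
    """Tile one bit plane into non-overlapping n x n pixel blocks."""
    h, w = plane.shape
    return (plane[:h - h % n, :w - w % n]
            .reshape(h // n, n, w // n, n)
            .swapaxes(1, 2))  # (H // n, W // n, n, n)

# Example: split a frame and tile its most significant plane into 8x8 blocks.
frame = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
planes = split_bit_planes(frame)
blocks = tile_blocks(planes[7], 8)
```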
2. The video encoding method of claim 1, wherein the determining a bit-plane VJND threshold block of each first pixel block according to the viewpoint just-noticeable distortion (VJND) model comprises:
for each pixel point in each first pixel block, calculating a VJND threshold of the pixel point according to the VJND model;
determining the bit-plane VJND threshold block corresponding to the VJND threshold of the pixel point according to a correspondence between VJND thresholds and bit-plane VJND threshold blocks; and
determining the bit-plane VJND threshold block of each first pixel block based on the bit-plane VJND threshold blocks corresponding to all pixel points in the first pixel block.
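One plausible realization of the threshold-to-block correspondence in claim 2 is sketched below, under an assumption of this edit rather than of the claim: bit plane k of a pixel is flagged visually insensitive when its bit weight 2**k falls below that pixel's VJND threshold.

```python
import numpy as np

def bit_plane_vjnd_block(vjnd: np.ndarray, m: int = 8) -> np.ndarray:
    """Map per-pixel VJND thresholds of one block to per-plane visual
    sensitivity values.

    vjnd is an (n, n) array of VJND thresholds; the result is (m, n, n),
    where entry [k, i, j] == 1 flags bit plane k of pixel (i, j) as
    visually insensitive (its distortion stays below threshold) and 0
    flags it as sensitive.  The 2**k < vjnd criterion is an assumption.
    """
    weights = (1 << np.arange(m)).reshape(m, 1, 1)
    return (weights < vjnd[np.newaxis, :, :]).astype(np.uint8)

# Example: thresholds growing with retinal eccentricity, so peripheral
# pixels (right half) tolerate distortion in more low-order planes.
vjnd = np.array([[4, 4, 8, 8]] * 4)
threshold_block = bit_plane_vjnd_block(vjnd)  # shape (8, 4, 4)
```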
3. The video encoding method of claim 1, wherein before the information of the target matching block is encoded, for each first pixel block, according to the preset encoding format, the method further comprises:
for each of the L bit planes, determining a plurality of candidate matching blocks corresponding to each first pixel block, wherein the plurality of candidate matching blocks comprise candidate matching blocks of any one of the L bit planes;
for each first pixel block, determining a target matching block from the plurality of candidate matching blocks based on the bit-plane VJND threshold block of the first pixel block, wherein the target matching block comprises a candidate matching block that matches the first pixel block; and
determining the information of the target matching block of each bit plane based on the target matching blocks of each bit plane corresponding to the plurality of first pixel blocks.
4. The video encoding method of claim 3, wherein the determining, for each of the L bit planes, a plurality of candidate matching blocks corresponding to each first pixel block comprises:
for each of the L bit planes, searching, according to a preset search range, the corresponding bit plane of a second video frame and/or a preceding bit plane of that bit plane within the first video frame, and determining a plurality of first candidate matching blocks of each first pixel block, wherein the second video frame comprises a frame prior to the first video frame; and
the determining, for each first pixel block, a target matching block from the plurality of candidate matching blocks based on the bit-plane VJND threshold block of the first pixel block comprises:
for each first pixel block, determining the target matching block from the plurality of first candidate matching blocks based on the bit-plane VJND threshold block of the first pixel block.
5. The video encoding method of claim 4, wherein the preset search range comprises a diamond-shaped range centered on the position, in the bit plane, that corresponds to the first pixel block.
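The candidate search of claims 4 and 5 can be sketched as below, assuming block-granular steps across a diamond of a chosen radius over one binary reference plane (the co-located plane of the previous frame, or a preceding plane of the current frame); the radius, step size, and helper names are assumptions:

```python
import numpy as np

def diamond_offsets(radius: int):
    """Offsets forming the diamond-shaped (Manhattan-ball) search range
    of claim 5, centered on the co-located block position."""
    return [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if abs(dy) + abs(dx) <= radius]

def candidate_blocks(ref_plane: np.ndarray, y: int, x: int,
                     n: int, radius: int):
    """Collect first candidate matching blocks (claim 4) by sampling an
    n x n window at each in-bounds diamond offset around (y, x)."""
    h, w = ref_plane.shape
    cands = []
    for dy, dx in diamond_offsets(radius):
        cy, cx = y + dy * n, x + dx * n  # block-granular steps (assumed)
        if 0 <= cy <= h - n and 0 <= cx <= w - n:
            cands.append(ref_plane[cy:cy + n, cx:cx + n])
    return cands

# A radius-2 diamond visits 13 positions (1 + 4 + 8), center included.
assert len(diamond_offsets(2)) == 13
```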
6. The video encoding method of claim 4, wherein the determining, for each first pixel block, a target matching block from the plurality of candidate matching blocks based on the bit-plane VJND threshold block of the first pixel block comprises:
for each first pixel block, determining the target matching block from the plurality of first candidate matching blocks and a preset complementary matching block based on the bit-plane VJND threshold block of the first pixel block.
7. The video encoding method of claim 4, wherein the determining, for each first pixel block, a target matching block from the plurality of candidate matching blocks based on the bit-plane VJND threshold block of the first pixel block comprises:
for each first pixel block, determining a number of matching points of the first pixel block with respect to each candidate matching block based on the first pixel block, the plurality of candidate matching blocks, and the bit-plane VJND threshold block, wherein the number of matching points characterizes the similarity between the first pixel block and the candidate matching block; and
determining the target matching block that matches the first pixel block according to the numbers of matching points.
8. The video encoding method of claim 7, wherein the determining a number of matching points of the first pixel block with respect to each candidate matching block based on the first pixel block, the plurality of candidate matching blocks, and the bit-plane VJND threshold block comprises:
determining, from the bit-plane VJND threshold block, a target visual sensitivity value of the bit plane corresponding to the first pixel block according to the layer number of the bit plane;
for each candidate matching block, calculating a first matching score of the pixel point (i, j) of the first pixel block against the pixel point (i, j) of the candidate matching block according to the gray value of the pixel point (i, j) of the first pixel block, the gray value of the pixel point (i, j) of the candidate matching block, and the target visual sensitivity value of the bit plane at the pixel point (i, j) of the first pixel block, wherein i and j are integers not less than 0 and not more than N; and
accumulating the first matching scores of the pixel points of the first pixel block and the candidate matching block to obtain the number of matching points of the first pixel block and the candidate matching block.
9. The video encoding method of claim 8, wherein the calculating a first matching score of the pixel point (i, j) of the first pixel block against the pixel point (i, j) of the candidate matching block comprises:
determining that the pixel point (i, j) of the first pixel block matches the pixel point (i, j) of the candidate matching block in the case that the target visual sensitivity value of the bit plane at the pixel point (i, j) is a first preset value, and/or in the case that the gray value of the pixel point (i, j) of the first pixel block is the same as the gray value of the pixel point (i, j) of the candidate matching block; and
determining the first matching score of the pixel point (i, j) of the first pixel block and the pixel point (i, j) of the candidate matching block to be a second preset value.
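A sketch of the matching-point count of claims 7 through 9, taking 1 for both the unspecified "first preset value" (the insensitive flag) and "second preset value" (the per-pixel score); these concrete values are assumptions, not recitations:

```python
import numpy as np

def match_score(block: np.ndarray, cand: np.ndarray,
                insensitive: np.ndarray) -> int:
    """Number of matching points between a first pixel block and one
    candidate on a given bit plane: pixel (i, j) counts as matched when
    its plane is flagged visually insensitive and/or the two binary
    gray values agree (claims 8-9)."""
    matched = (insensitive == 1) | (block == cand)
    return int(matched.sum())

def target_matching_block(block, candidates, insensitive):
    """Pick the candidate with the most matching points (claim 7);
    ties fall to the earlier candidate."""
    scores = [match_score(block, c, insensitive) for c in candidates]
    best = int(np.argmax(scores))
    return best, scores[best]
```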
10. The video encoding method of claim 1, wherein, in the case that M is greater than L, the determining encoded data of the M bit planes of the first video frame based on the first encoded data of the plurality of first pixel blocks comprises:
determining the encoded data of the M bit planes of the first video frame according to the first encoded data of the plurality of first pixel blocks and the gray values corresponding to the bit planes, among the M bit planes, other than the L bit planes.
11. A video decoding method, applied to a controller of a near-eye display, comprising:
acquiring encoded data of M bit planes of a first video frame, wherein the M bit planes are obtained by splitting the first video frame, and the first video frame comprises a plurality of first pixel blocks;
decoding the encoded data of the M bit planes of the first video frame according to a preset decoding format to obtain information of target matching blocks corresponding to the plurality of first pixel blocks, wherein the information of the target matching block comprises information of a target matching block for each of L bit planes, and the M bit planes include the L bit planes;
for each first pixel block, determining a gray value of each of the L bit planes corresponding to the first pixel block according to the information of the target matching block corresponding to the first pixel block; and
restoring the M bit planes of the first video frame based on the gray values of each of the L bit planes corresponding to the plurality of first pixel blocks,
wherein the target matching block is determined during encoding according to a bit-plane VJND threshold block of the first pixel block; the bit-plane VJND threshold block comprises visual sensitivity values of the M bit planes for each pixel point in the first pixel block; the bit-plane VJND threshold block of each first pixel block is determined during encoding according to a viewpoint just-noticeable distortion (VJND) model; the VJND model is determined according to a just-noticeable distortion threshold and a viewpoint factor quantization threshold; and the viewpoint factor quantization threshold is determined according to the retinal eccentricity corresponding to the first pixel block.
12. A video encoding device, applied to a controller of a near-eye display, comprising:
a first acquisition module configured to acquire a first video frame in a video stream, the first video frame comprising a plurality of first pixel blocks;
a threshold block determining module configured to determine a bit-plane VJND threshold block of each first pixel block according to a viewpoint just-noticeable distortion (VJND) model, wherein the VJND model is determined according to a just-noticeable distortion threshold and a viewpoint factor quantization threshold; the viewpoint factor quantization threshold is determined according to the retinal eccentricity corresponding to the first pixel block; the bit-plane VJND threshold block comprises visual sensitivity values of M bit planes for each pixel point in the first pixel block; and the M bit planes are obtained by splitting the first video frame;
an encoding module configured to encode, for each first pixel block, information of a target matching block according to a preset encoding format to obtain first encoded data of the first pixel block, wherein the first encoded data comprises encoded data of each of L bit planes corresponding to the first pixel block; the M bit planes include the L bit planes; the information of the target matching block is determined according to the bit-plane VJND threshold block of the first pixel block; and the information of the target matching block comprises information of a target matching block for each of the L bit planes; and
an encoded data determination module configured to determine encoded data of the M bit planes of the first video frame based on the first encoded data of the plurality of first pixel blocks.
13. A video decoding device, applied to a controller of a near-eye display, comprising:
a second acquisition module configured to acquire encoded data of M bit planes of a first video frame, wherein the M bit planes are obtained by splitting the first video frame, and the first video frame comprises a plurality of first pixel blocks;
a decoding module configured to decode the encoded data of the M bit planes of the first video frame according to a preset decoding format to obtain information of target matching blocks corresponding to the plurality of first pixel blocks, wherein the information of the target matching block comprises information of a target matching block for each of L bit planes, and the M bit planes include the L bit planes;
a gray value determining module configured to determine, for each first pixel block, a gray value of each of the L bit planes corresponding to the first pixel block according to the information of the target matching block corresponding to the first pixel block; and
a restoration module configured to restore the M bit planes of the first video frame based on the gray values of each of the L bit planes corresponding to the plurality of first pixel blocks,
wherein the target matching block is determined during encoding according to a bit-plane VJND threshold block of the first pixel block; the bit-plane VJND threshold block comprises visual sensitivity values of the M bit planes for each pixel point in the first pixel block; the bit-plane VJND threshold block of each first pixel block is determined during encoding according to a viewpoint just-noticeable distortion (VJND) model; the VJND model is determined according to a just-noticeable distortion threshold and a viewpoint factor quantization threshold; and the viewpoint factor quantization threshold is determined according to the retinal eccentricity corresponding to the first pixel block.
14. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the video encoding method of any one of claims 1 to 10 or the video decoding method of claim 11.
15. A storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the video encoding method of any one of claims 1 to 10, or the video decoding method of claim 11.
CN202110723429.3A 2021-06-28 2021-06-28 Video encoding method, video decoding method and related devices Active CN113596451B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110723429.3A CN113596451B (en) 2021-06-28 2021-06-28 Video encoding method, video decoding method and related devices

Publications (2)

Publication Number Publication Date
CN113596451A CN113596451A (en) 2021-11-02
CN113596451B true CN113596451B (en) 2024-01-26

Family

ID=78245033

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110723429.3A Active CN113596451B (en) 2021-06-28 2021-06-28 Video encoding method, video decoding method and related devices

Country Status (1)

Country Link
CN (1) CN113596451B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6366705B1 (en) * 1999-01-28 2002-04-02 Lucent Technologies Inc. Perceptual preprocessing techniques to reduce complexity of video coders
US6690833B1 (en) * 1997-07-14 2004-02-10 Sarnoff Corporation Apparatus and method for macroblock based rate control in a coding system
CN101924874A (en) * 2010-08-20 2010-12-22 北京航空航天大学 A Real-Time Electronic Image Stabilization Method Based on Matching Block Grading
CN103002280A (en) * 2012-10-08 2013-03-27 中国矿业大学 Distributed encoding and decoding method and system based on HVS&ROI
CN112261407A (en) * 2020-09-21 2021-01-22 苏州唐古光电科技有限公司 Image compression method, device and equipment and computer storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8547382B2 (en) * 2008-05-30 2013-10-01 Advanced Micro Devices, Inc. Video graphics system and method of pixel data compression

Similar Documents

Publication Publication Date Title
US11151749B2 (en) Image compression method and apparatus
US20250056010A1 (en) Video encoding rate control for intra and scene change frames using machine learning
US10887614B2 (en) Adaptive thresholding for computer vision on low bitrate compressed video streams
AU2019253866B2 (en) Image compression method and apparatus
WO2015167704A1 (en) Constant quality video coding
WO2017127167A1 (en) Long term reference picture coding
EP3198868A1 (en) Video coding rate control including target bitrate and quality control
CN112261407B (en) Image compression method, device and equipment and computer storage medium
JP2003018599A (en) Method and apparatus for encoding image
US10560702B2 (en) Transform unit size determination for video coding
MX2012004747A (en) Embedded graphics coding: reordered bitstream for parallel decoding.
EP4046382A1 (en) Method and apparatus in video coding for machines
CN113596451B (en) Video encoding method, video decoding method and related devices
CN113452996B (en) Video coding and decoding method and device
US8879622B2 (en) Interactive system and method for transmitting key images selected from a video stream over a low bandwidth network
US12095981B2 (en) Visual lossless image/video fixed-rate compression
CN108668170B (en) Image information processing method and device, and storage medium
CN113810692B (en) Method for framing changes and movements, image processing device and program product
Lima et al. Fast low bit-rate 3D searchless fractal video encoding
CN119299716B (en) An IoT terminal with efficient video data storage function
US11233999B2 (en) Transmission of a reverse video feed
Sasazaki et al. Fuzzy vector quantization of images based on local fractal dimensions
Rajankar et al. Effect of Single and Multiple ROI Coding on JPEG2000 Performance
CN118381931A (en) Gray image compression method in weak network environment
Ham et al. A Consistent Quality Bit Rate Control for the Line-Based Compression

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20231023

Address after: Room 702, Block C, Swan Tower, No. 111 Linghu Avenue, Xinwu District, Wuxi City, Jiangsu Province, 214028

Applicant after: Wuxi Tanggu Semiconductor Co.,Ltd.

Address before: 215128 unit 4-a404, creative industry park, 328 Xinghu street, Suzhou Industrial Park, Suzhou City, Jiangsu Province

Applicant before: Suzhou Tanggu Photoelectric Technology Co.,Ltd.

GR01 Patent grant