Disclosure of Invention
The invention provides a video intra-frame coding method, a coder and a medium, which address the problem in the prior art of how to weaken splicing marks left after panoramic video mapping without affecting coding efficiency.
In a first aspect, an embodiment of the present invention provides a video intra-frame encoding method, applied to panoramic video, including:
The method comprises the steps that an encoder obtains a plurality of first CUs (Coding Units) in a current video frame based on a first preset rule, CTUs (Coding Tree Units), and position information of splicing boundaries in the current video frame;
the encoder divides the plurality of first CUs according to a preset division mode to obtain a plurality of first PUs (Prediction Units);
the encoder determines first target PUs adjacent to the splicing boundary in a preset encoding direction, and determines at least two first target areas and a first reference pixel set corresponding to each first target area according to a first mapping relation and the plurality of first target PUs;
For each first target area, the encoder determines a first MPM (Most Probable Mode) reference block corresponding to the first target area according to the preset encoding direction and the first mapping relationship;
The encoder encodes the first target PU according to the preset encoding direction, the first reference pixel set, and the first MPM reference block.
According to the video intra-frame coding method provided by the embodiment of the invention, the encoder divides a current video frame into a plurality of first CUs and divides the plurality of first CUs into a plurality of first PUs. On this basis, the encoder determines first target PUs adjacent to a splicing boundary in a preset coding direction, determines at least two first target areas and a first reference pixel set corresponding to each first target area according to a first mapping relation and the plurality of first target PUs, determines a first MPM reference block corresponding to each first target area, and codes the first target PU according to the preset coding direction, the first reference pixel set and the first MPM reference block. Through the first mapping relation between the current video frame of the panoramic spherical video and the planar video frame corresponding to it, the spherical pixels adjacent to each first target area are taken as the first reference pixel set corresponding to that first target area, and the spherical MPM reference block adjacent to each first target area is taken as its first MPM reference block. Intra-frame predictive coding of the current video frame based on these first reference pixel sets weakens the visible splicing boundary on the current video frame caused by the large pixel difference at the splice while leaving the coding efficiency unaffected, thereby improving the video quality.
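The claimed flow can be sketched as a pipeline in which every step is an injected callable. All names below are illustrative placeholders for the claim elements, not the patent's actual implementation:

```python
def intra_encode_frame(frame, splice_boundaries, steps):
    """Hypothetical orchestration of the claimed steps; each entry in
    `steps` stands in for one claim element and is supplied by the caller."""
    cus = steps["partition_cus"](frame, splice_boundaries)      # first preset rule + CTU
    pus = steps["split_pus"](cus)                               # preset division mode
    targets = steps["find_target_pus"](pus, splice_boundaries)  # PUs adjacent to the splice
    # first mapping relation -> {area_id: (boundary_pixels, first_reference_pixel_set)}
    areas = steps["map_target_areas"](targets)
    encoded = {}
    for area_id, (pixels, ref_set) in areas.items():
        mpm = steps["find_mpm_block"](area_id)                  # sphere-adjacent MPM block
        encoded[area_id] = steps["encode"](pixels, ref_set, mpm)
    return encoded
```

This merely fixes the ordering of the claimed steps; each callable's internals are described in the later embodiments.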
In an alternative embodiment, the encoder determines at least two first target areas and a first reference pixel set corresponding to each first target area according to a first mapping relation and a plurality of first target PUs, and includes:
the encoder determines a reference area adjacent to each first target PU according to the first mapping relation;
the encoder determines the category of each reference area according to the position information of each reference area;
The encoder groups the target boundary pixel points in the first target PUs corresponding to reference areas of the same category into a first target area in the current video frame, and takes the target pixel points in the reference areas of the same category as the first reference pixel set.
According to the method, the encoder determines the reference area adjacent to each first target PU according to the first mapping relation between the spherical video frame and the planar video frame; that is, each first target PU and its corresponding reference area are adjacent on the spherical video frame. The encoder then determines the category of each reference area according to its position information and, in the current video frame, groups the target boundary pixel points in the first target PUs corresponding to reference areas of the same category into a first target area, with the target pixel points in those reference areas serving as the first reference pixel set. In this way, the first target area whose reference pixels need to be modified is identified in the current video frame, together with the first reference pixel set corresponding to it: the reference pixels of the first target area become the set of pixels adjacent to the first target area on the spherical video frame, namely the first reference pixel set, rather than the set of pixels adjacent to it on the planar video frame. Since the target boundary pixel points in the first target area are strongly correlated with the target pixel points referenced in predictive coding, the visual impact of large pixel value differences can be reduced.
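The grouping step described above can be sketched as follows. The helper callables `reference_area_of` (the sphere-adjacency lookup via the first mapping relation) and `category_of` (the position-based classification) are assumptions standing in for details the patent defers to later embodiments:

```python
from collections import defaultdict

def group_by_reference_category(target_pus, reference_area_of, category_of):
    """Group target boundary pixels of first target PUs whose sphere-adjacent
    reference areas share a category; the pooled reference pixels of each
    category form that category's first reference pixel set.
    Illustrative sketch only — data shapes are assumed, not from the patent."""
    target_areas = defaultdict(list)   # category -> first target area pixels
    ref_sets = defaultdict(list)       # category -> first reference pixel set
    for pu in target_pus:
        ref = reference_area_of(pu)    # adjacent on the sphere via the mapping
        cat = category_of(ref)         # category from the reference area's position
        target_areas[cat].extend(pu["boundary_pixels"])
        ref_sets[cat].extend(ref["pixels"])
    return dict(target_areas), dict(ref_sets)
```

Grouping by category is what lets several first target PUs along one splice edge share a single first target area and a single first reference pixel set.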
In an alternative embodiment, after the encoder takes the target pixel points in the reference areas of the same category as the first reference pixel set, the method further includes:
The encoder compares the number of target pixel points in the first reference pixel set with the number of target boundary pixel points in the first target area corresponding to the first reference pixel set, and if the two numbers differ, determines a second reference pixel set according to the related information of each target pixel point in the first reference pixel set and the related information of each target boundary pixel point in the first target area;
wherein the related information includes position information and pixel values.
According to the method, if the number of target pixel points in the first reference pixel set differs from the number of target boundary pixel points in the corresponding first target area, the first reference pixel set has no direct reference value for the first target area, and a new reference pixel set, namely the second reference pixel set, needs to be constructed by interpolation according to the related information of the target pixel points and the related information of the target boundary pixel points. Each target boundary pixel point in the first target area is then predictively coded according to the second reference pixel set, which improves the accuracy of the predictive coding result.
In an alternative embodiment, the encoder determines a second reference pixel set according to the related information of each target pixel point in the first reference pixel set and the related information of each target boundary pixel point in the first target area, and this includes:
for each target boundary pixel point in the first target area, the encoder determines a target position in a reference area corresponding to the first target area according to the position information of the target boundary pixel point in the first target area;
The encoder determines a first weight and a second weight according to the position information of the target position, the position information of a first target pixel point adjacent to the target position and the position information of a second target pixel point adjacent to the target position, wherein the first weight is used for representing the distance between the target position and the first target pixel point, and the second weight is used for representing the distance between the target position and the second target pixel point;
The encoder determines a pixel value corresponding to the target position according to the first weight, the second weight, the pixel value of the first target pixel point and the pixel value of the second target pixel point;
The encoder takes the determined pixel values corresponding to all the target positions as the second reference pixel set.
The above describes the specific process by which the encoder determines the second reference pixel set. For each target boundary pixel point in the first target area, the encoder determines a target position in the corresponding reference area according to the position information of that target boundary pixel point. It then determines a first weight from the position information of the target position and of the first target pixel point adjacent to it, and a second weight from the position information of the target position and of the second target pixel point adjacent to it, the first weight representing the distance between the target position and the first target pixel point and the second weight representing the distance between the target position and the second target pixel point. Finally, the encoder determines the pixel value corresponding to the target position according to the first weight, the second weight, the pixel value of the first target pixel point and the pixel value of the second target pixel point; the pixel values corresponding to all the target positions form the second reference pixel set. Since the reference pixels within the second reference pixel set are obtained by interpolation, the first target area can be predictively coded accordingly.
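A minimal sketch of the described weighted interpolation, assuming a one-dimensional row (or column) of reference pixels and simple linear distance weights — the patent does not fix the exact weight formula, so this is one plausible reading:

```python
def build_second_reference_set(n_boundary, reference_pixels):
    """Resample a reference area onto n_boundary target positions, each value
    interpolated between the two adjacent target pixels with weights derived
    from distance. Assumed 1-D geometry; not the patent's exact formula."""
    n_ref = len(reference_pixels)
    if n_boundary == n_ref:
        return list(reference_pixels)  # counts match: the first set is reusable
    second_set = []
    for i in range(n_boundary):
        # target position inside the reference area for boundary pixel i
        pos = i * (n_ref - 1) / (n_boundary - 1) if n_boundary > 1 else 0.0
        lo = int(pos)                    # first adjacent target pixel
        hi = min(lo + 1, n_ref - 1)      # second adjacent target pixel
        w_hi = pos - lo                  # grows with distance from pixel `lo`
        w_lo = 1.0 - w_hi                # grows with distance from pixel `hi`
        second_set.append(w_lo * reference_pixels[lo] + w_hi * reference_pixels[hi])
    return second_set
```

Each weight is larger the closer the target position is to the opposite pixel, which is the standard way a "weight representing the distance" is applied in linear interpolation.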
In an optional implementation manner, after the encoder determines at least two first target areas and a first reference pixel set corresponding to each first target area, and before determining, for each first target area, the first MPM reference block corresponding to the first target area according to the preset encoding direction and the first mapping relationship, the method further includes:
for each first target area, the encoder determines that the reference area has a reference value according to the position information of the first target area and the position information of the reference area corresponding to the first target area.
According to the method, the encoder determines that the reference area has reference value according to the position information of the first target area and the position information of the reference area corresponding to the first target area, and then determines the first MPM reference block corresponding to the first target area according to the preset coding direction and the first mapping relation so as to refer to the prediction mode in the first MPM reference block, and performs prediction coding on the first target area according to the first reference pixel set, so that the coding effect is improved.
In an alternative embodiment, the video intra-coding method further includes:
the encoder determines that the reference area has no reference value, and rotates the video frame by a preset angle according to a preset selection direction to obtain a rotated video frame;
the encoder obtains a plurality of second CUs in the rotated video frame based on the first preset rule, the CTU, and position information of the splicing boundary in the rotated video frame;
The encoder divides the plurality of second CUs according to a preset division mode to obtain a plurality of second PUs;
The encoder determines second target PUs adjacent to the splicing boundary in the preset encoding direction, and determines at least two second target areas and a third reference pixel set corresponding to each second target area according to a second mapping relation and the plurality of second target PUs;
for each second target area, the encoder determines a second MPM reference block corresponding to the second target area according to the preset encoding direction and the second mapping relation;
the encoder encodes the second target PU according to the preset encoding direction, the third reference pixel set, and the second MPM reference block.
According to the method, when the encoder determines, from the position information of the first target area and the position information of its corresponding reference area, that the reference area has no reference value, it rotates the video frame by a preset angle according to a preset selection direction to obtain a rotated video frame. The rotated video frame is divided to obtain second CUs, the second CUs are divided into second PUs, and the second target areas, the third reference pixel set corresponding to each second target area and the second MPM reference blocks are determined, after which predictive coding is performed. By rotating a video frame whose reference areas lack reference value by the preset angle in the preset selection direction, all the reference areas corresponding to the second target areas gain reference value, so that when the rotated video frame is predictively coded, each second target area can undergo a proper predictive coding operation. This greatly improves the video coding effect and makes the panoramic video visually better to watch after inverse mapping.
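The rotation fallback can be sketched as below. The patent leaves the "preset angle" and "preset selection direction" open; a 90° clockwise turn is only one plausible choice used here for illustration:

```python
def rotate_frame(frame, quarter_turns=1):
    """Rotate a frame (2-D list of pixel rows) clockwise by 90 degrees per
    turn. Stands in for the patent's unspecified preset angle/direction."""
    for _ in range(quarter_turns % 4):
        # reverse the rows, then transpose: a 90-degree clockwise rotation
        frame = [list(row) for row in zip(*frame[::-1])]
    return frame
```

After rotation the frame is re-partitioned from scratch (second CUs, second PUs, second mapping relation), since splice boundaries move with the pixels.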
In an alternative embodiment, the encoder obtains the plurality of first coding units CUs in the current video frame based on the first preset rule, the coding tree unit CTU, and the position information of the splicing boundary in the current video frame, and this includes:
The encoder divides the CTU based on the first preset rule to obtain a plurality of divided units;
for each divided unit, the encoder judges whether the divided unit includes the splicing boundary according to the position information of the divided unit and the position information of the splicing boundary; if so, the divided unit is further divided to obtain first CUs, otherwise the divided unit is taken as a first CU;
Wherein, the first preset rule includes: the rate-distortion cost of the divided units is less than the rate-distortion cost of the CTU.
According to the method, the encoder calculates, based on a rate-distortion optimization algorithm, the rate-distortion cost after each unit is divided and the rate-distortion cost before the division, selects the division with the minimum rate-distortion cost as the most reasonable division, and performs CU division on the current video frame accordingly. Because the encoder performs the encoding operation in units of CUs, the encoding efficiency is greatly improved.
In a second aspect, an embodiment of the present invention provides an encoder applied to panoramic video, including:
The CU dividing module is used for obtaining a plurality of first coding units CUs in the current video frame based on a first preset rule, the coding tree units CTU, and the position information of the splicing boundary in the current video frame;
the PU dividing module is used for dividing the plurality of first CUs according to a preset dividing mode to obtain a plurality of first prediction units PU;
The reference pixel set determining module is used for determining first target PUs adjacent to the splicing boundary in a preset encoding direction, and determining at least two first target areas and a first reference pixel set corresponding to each first target area according to a first mapping relation and the plurality of first target PUs;
The MPM reference block determining module is used for determining a first MPM reference block corresponding to each first target area according to the preset coding direction and the first mapping relation;
and the prediction coding module is used for coding the first target PU according to the preset coding direction, the first reference pixel set and the first MPM reference block.
In an alternative embodiment, the reference pixel set determining module is specifically configured to:
Determining a reference area adjacent to each first target PU according to the first mapping relation;
determining the category of each reference area according to the position information of each reference area;
And in the current video frame, forming a first target area by target boundary pixel points in a first target PU corresponding to the reference area of the same category, and taking the target pixel points in the reference area of the same category as the first reference pixel set.
In an alternative embodiment, the reference pixel set determining module is further configured to:
comparing the number of the target pixel points in the first reference pixel set with the number of the target boundary pixel points in the first target area corresponding to the first reference pixel set, and if the comparison results are different, determining a second reference pixel set according to the related information of each target pixel point in the first reference pixel set and the related information of each target boundary pixel point in the first target area;
wherein the related information includes position information and pixel values.
In an alternative embodiment, the reference pixel set determining module is further configured to:
For each target boundary pixel point in the first target area, determining a target position in a reference area corresponding to the first target area according to the position information of the target boundary pixel point in the first target area;
Determining a first weight and a second weight according to the position information of the target position, the position information of a first target pixel point adjacent to the target position and the position information of a second target pixel point adjacent to the target position, wherein the first weight is used for representing the distance between the target position and the first target pixel point, and the second weight is used for representing the distance between the target position and the second target pixel point;
Determining a pixel value corresponding to the target position according to the first weight, the second weight, the pixel value of the first target pixel point and the pixel value of the second target pixel point;
and taking the determined pixel values corresponding to all the target positions as the second reference pixel set.
In an alternative embodiment, the reference pixel set determining module is further configured to:
For each first target area, determining that the reference area has a reference value according to the position information of the first target area and the position information of the reference area corresponding to the first target area.
In an alternative embodiment, the encoder further comprises:
The video rotation module is used for determining that the reference area has no reference value, and rotating the video frame by a preset angle according to a preset selection direction to obtain a rotated video frame;
the CU dividing module is used for obtaining a plurality of second CUs in the rotated video frame based on the first preset rule, the CTU, and position information of the splicing boundary in the rotated video frame;
the PU dividing module is used for dividing the plurality of second CUs according to a preset dividing mode to obtain a plurality of second PUs;
The reference pixel set determining module is used for determining second target PUs adjacent to the splicing boundary in the preset encoding direction, and determining at least two second target areas and a third reference pixel set corresponding to each second target area according to a second mapping relation and the plurality of second target PUs;
The MPM reference block determining module is configured to determine, for each second target area, a second MPM reference block corresponding to the second target area according to the preset encoding direction and the second mapping relationship;
The predictive coding module is configured to code the second target PU according to the preset coding direction, the third reference pixel set, and the second MPM reference block.
In an alternative embodiment, the CU partitioning module is specifically configured to:
dividing the CTU based on the first preset rule to obtain a plurality of divided units;
Judging, for each divided unit, whether the divided unit includes the splicing boundary according to the position information of the divided unit and the position information of the splicing boundary; if so, dividing the divided unit to obtain the first CU, otherwise taking the divided unit as the first CU;
Wherein, the first preset rule includes: the rate-distortion cost of the divided units is less than the rate-distortion cost of the CTU.
In a third aspect, an embodiment of the present invention provides an encoder applied to panoramic video, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the video intra-frame encoding method according to any one of the embodiments above when executing the computer program.
In a fourth aspect, embodiments of the present invention provide a computer storage medium storing computer instructions that, when executed on a computer, cause the computer to perform the steps of the video intra-frame encoding method of any one of the embodiments above.
For the technical effects that may be achieved by the encoder disclosed in the second aspect, the encoder disclosed in the third aspect, and the computer storage medium disclosed in the fourth aspect, reference may be made to the technical effects that may be achieved by the method disclosed in the first aspect or its various possible implementations, and the detailed description is not repeated here.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. The implementations described in the following exemplary examples do not represent all implementations consistent with the invention. Rather, they are merely examples of apparatus and methods consistent with aspects of the invention as detailed in the accompanying claims.
When panoramic video is stored, the panoramic video frames must first be mapped into planar video frames, and compression encoding, decoding and inverse-mapping processing are then carried out on the mapped planar video frames so that the panoramic video can be stored in a computer. At present, panoramic video frames are usually mapped into planar video frames using a mapping method based on sphere equal-area expansion, but the planar video obtained with this mapping method contains considerable deformation and splicing marks, and after subsequent encoding, decoding and inverse mapping, visible splicing marks appear and affect viewing.
To weaken the splicing marks left after panoramic video frame mapping without affecting encoding efficiency, embodiments of the invention provide a video intra-frame encoding method, an encoder and a medium.
Example 1
The following describes, by way of specific embodiments, a video intra-frame encoding method provided by the present invention, where the method is applied to panoramic video, as shown in fig. 1, and includes:
step 101: the encoder obtains a plurality of first coding units CU in the current video frame based on a first preset rule, the coding tree units CTU and the position information of the splicing boundary in the current video frame;
step 102: the encoder divides the plurality of first CUs according to a preset division mode to obtain a plurality of first Prediction Units (PU);
step 103: the encoder determines a first target PU adjacent to the splicing boundary in a preset encoding direction, and determines at least two first target areas and a first reference pixel set corresponding to each first target area according to a first mapping relation and a plurality of first target PUs;
step 104: for each first target area, the encoder determines a first MPM reference block corresponding to the first target area according to a preset encoding direction and a first mapping relation;
Step 105: the encoder encodes the first target PU according to a preset encoding direction, a first reference pixel set, and a first MPM reference block.
According to the video intra-frame coding method provided by the embodiment of the invention, the encoder first obtains a plurality of first CUs in the current video frame based on the first preset rule, the CTU, and the position information of the splicing boundary in the current video frame; it then divides the plurality of first CUs according to a preset dividing mode to obtain a plurality of first PUs, determines first target PUs adjacent to the splicing boundary in a preset coding direction, determines at least two first target areas and a first reference pixel set corresponding to each first target area according to a first mapping relation and the plurality of first target PUs, determines the first MPM reference block corresponding to each first target area, and finally codes the first target PU according to the preset coding direction, the first reference pixel set and the first MPM reference block. With this method, when the first target PU is coded, the reference pixel points are not the pixel points adjacent to the first target PU on the plane but are determined according to the first mapping relation and the plurality of first target PUs, so the visible splicing boundary on the current video frame caused by the large pixel difference at the splice can be weakened and the video quality improved.
The embodiment of the invention is applied to panoramic video, the video frames in the embodiment of the invention are planar video frames corresponding to the panoramic video, and the planar video frames are obtained by mapping panoramic spherical video frames.
In one embodiment, when the encoder obtains the plurality of first coding units CUs in the current video frame based on the first preset rule, the coding tree unit CTU and the position information of the splicing boundary in the current video frame, it divides the CTU based on the first preset rule to obtain a plurality of divided units, and then determines, for each divided unit, whether the divided unit includes the splicing boundary according to the position information of the divided unit and the position information of the splicing boundary; if so, the divided unit is further divided to obtain first CUs, otherwise the divided unit is taken as a first CU. The first preset rule can be that the rate-distortion cost of the divided units is smaller than the rate-distortion cost of the CTU.
In the embodiment of the present invention, the encoder may divide the current video frame according to slices (slices) or divide the current video frame according to slices (tiles) when encoding the current video frame, which is not limited in the embodiment of the present invention.
In practice, fig. 2 shows the structure of a current video frame with splicing boundaries: the current video frame 21 includes four splicing boundaries 22 and two Slice boundaries 23. The encoder divides the current video frame into Slices (it could equally be divided into Tiles, which are not shown in fig. 2), yielding the two Slice boundaries 23 and dividing the current video frame into three video frame areas; because each area is independent, multiple areas can be processed in parallel, improving the encoding efficiency of the encoder. Since the current video frame is mapped from the panoramic spherical video by the mapping method based on sphere equal-area expansion, splicing boundaries 22 exist in the current video frame, and a splicing boundary 22 separates the two video frame areas S1 and S5. Based on the Slice boundaries 23 and the splicing boundaries 22 together, the current video frame is divided into five video frame areas S1, S2, S3, S4 and S5. The splicing boundary in the embodiment of the invention refers to boundary 22.
Specifically, fig. 3a is a schematic diagram illustrating the division of CUs in the current video frame. The current video frame is divided into a plurality of CTUs 31, where a CTU 31 is the basic unit on which the encoder performs prediction, transformation, quantization and entropy coding operations for the current video frame, and may have a size of 64×64.
Specifically, the encoder may quadtree-divide each CTU 31 according to a preset encoding order, using a rate-distortion optimization algorithm driven by the complexity of the current video frame content, to obtain a plurality of CUs 32; the CUs 32 come in four sizes, in order: 64×64, 32×32, 16×16 and 8×8. The encoder calculates a first rate-distortion cost of the divided units and a second rate-distortion cost of the unit before division using the rate-distortion optimization algorithm: if the first rate-distortion cost is smaller than the second rate-distortion cost, the current unit is quadtree-divided; if the first rate-distortion cost is larger than the second, the current unit is not quadtree-divided. The final CU division result of the current video frame is obtained through this repeated recursive division process. For each CU located at the splicing boundary, the encoder judges whether the current CU includes the splicing boundary; if so, the current CU is forced to undergo further quadtree division, and if not, the current CU is retained.
For example, as shown in fig. 3a, CU 33 includes a splicing boundary but is not partitioned along it; the encoder therefore forces CU 33 to be further quadtree-partitioned, with the result shown in fig. 3b.
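The recursive split decision described above — split when the four children are cheaper in rate-distortion cost, and force a split whenever a unit still straddles the splicing boundary — can be sketched as follows. The `Unit` class, the constant minimum size of 8 and the callables `rd_cost` and `crosses_splice` are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class Unit:
    x: int
    y: int
    size: int

    def quad_split(self):
        """Four equal sub-units of half the side length."""
        h = self.size // 2
        return [Unit(self.x, self.y, h), Unit(self.x + h, self.y, h),
                Unit(self.x, self.y + h, h), Unit(self.x + h, self.y + h, h)]

def partition(unit, rd_cost, crosses_splice, min_size=8):
    """Quadtree CU decision sketch: split when the children's summed
    rate-distortion cost is lower, or force a split when the unit still
    includes the splicing boundary; stop at the minimum CU size."""
    if unit.size <= min_size:
        return [unit]
    children = unit.quad_split()
    if crosses_splice(unit) or sum(rd_cost(c) for c in children) < rd_cost(unit):
        cus = []
        for c in children:
            cus.extend(partition(c, rd_cost, crosses_splice, min_size))
        return cus
    return [unit]
```

With a vertical splice at some x-coordinate, a CTU straddling it is always split until every resulting CU lies entirely on one side of the boundary, matching the forced division of CU 33.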
It should be noted that a Slice or a Tile sits above the CTUs in the image structure: a Slice is composed of an integer number of consecutively arranged CTUs, and a Tile is likewise composed of an integer number of consecutively arranged CTUs.
In the embodiment of the invention, the encoder calculates the rate-distortion cost after each CU is divided and the rate-distortion cost before the division based on its rate-distortion optimization algorithm, selects the division method with the minimum rate-distortion cost as the most reasonable division method, and performs CU division on the current video frame according to that method. The encoder performs the encoding operation in units of CUs, so the encoding efficiency is greatly improved.
In an implementation, the encoder further divides each CU obtained from the current video frame once more to obtain a plurality of PUs.
In one embodiment, the encoder determines at least two first target areas and a first reference pixel set corresponding to each first target area according to a first mapping relation and a plurality of first target PUs, may determine a reference area adjacent to each first target PU according to the first mapping relation, determine a category of each reference area according to position information of each reference area, form target boundary pixel points in the first target PU corresponding to the reference area of the same category into the first target area in the current video frame, and use target pixel points in the reference area of the same category as the first reference pixel set.
It should be noted that in the embodiment of the present invention, the first target area is a row or a column of pixels, the reference area is also a row or a column of pixels, and the first target PU allows for reference across a Slice or Tile boundary.
In a specific implementation, fig. 4 is a schematic structural diagram of the first target areas of a current video frame 21 and the reference areas corresponding to them. The encoder may encode in a "Z" coding order, determine a plurality of first target PUs adjacent to the 4 splicing boundaries 22 in the encoding direction, and determine 5 first target areas according to the first mapping relationship, the position information of the reference areas, and the plurality of first target PUs, namely: the A target area, B target area, C target area, D target area, and E target area, together with the 5 corresponding reference areas, namely: the a reference area, b reference area, c reference area, d reference area, and e reference area. In the current video frame 21, the target boundary pixel points of the first target PUs corresponding to reference areas of the same class form one first target area, and the target pixel points in the reference areas of that class serve as its first reference pixel set.
In the embodiment of the invention, the encoder determines the reference area adjacent to each first target PU according to the first mapping relation between the spherical video and the planar video, that is, a first target PU and its corresponding reference area are adjacent in the spherical video frame; determines the category of each reference area according to its position information; forms, in the current video frame, the target boundary pixel points of the first target PUs corresponding to reference areas of the same category into a first target area; and takes the target pixel points in the reference areas of the same category as the first reference pixel set.
In the embodiment of the invention, the first target areas and the first reference pixel set corresponding to each first target area are determined in the current video frame, and the reference pixels of each first target area are changed to the set of pixels adjacent to it on the spherical video frame, namely the first reference pixel set, rather than the set of pixels adjacent to it on the planar video frame. Because the correlation between the target boundary pixel points in the first target area and the target pixel points referenced in predictive coding is then strong, the visual impact caused by large pixel value differences can be reduced.
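The grouping described above can be sketched as follows. This is a minimal illustration under assumed data structures: `ref_area_of` is a hypothetical stand-in for the combination of the first mapping relation and the position-based category decision.

```python
# Illustrative sketch: boundary pixels of first target PUs whose sphere-
# adjacent reference areas share a category are merged into one first target
# area, and the pixels of those reference areas form its first reference
# pixel set. The data layout here is an assumption for illustration.

def build_target_areas(target_pus, ref_area_of):
    """target_pus: list of (pu_id, target_boundary_pixels);
    ref_area_of: pu_id -> (category, reference_pixels)."""
    areas, ref_sets = {}, {}
    for pu_id, pixels in target_pus:
        category, ref_pixels = ref_area_of[pu_id]
        areas.setdefault(category, []).extend(pixels)         # first target area
        ref_sets.setdefault(category, []).extend(ref_pixels)  # first reference pixel set
    return areas, ref_sets
```

Two PUs whose reference areas share category "A" thus contribute their boundary pixels to one target area, mirroring how fig. 4 merges pixels along a splicing boundary.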
Specifically, after determining at least two first target areas and the first reference pixel set corresponding to each first target area, and before determining the MPM reference block corresponding to each first target area according to the preset encoding direction and the first mapping relation, the encoder may determine, for each first target area, whether the reference area has reference value according to the position information of the first target area and the position information of its corresponding reference area.
For example, as shown in fig. 4, since the encoder encodes in the "Z" coding order, when the A target region needs to be encoded, the pixels in the corresponding a reference region have all been encoded and can serve as reference pixels of the A target region in the predictive encoding process; the a reference region corresponding to the A target region therefore indeed has reference value.
It should be noted that, in the embodiment of the present invention, the coding sequence is the coding direction.
In one embodiment, after taking the target pixel points in the reference areas of the same class as the first reference pixel set, the encoder may further compare the number of target pixel points in the first reference pixel set with the number of target boundary pixel points in the corresponding first target area; if the two numbers differ, a second reference pixel set is determined according to the related information of each target pixel point in the first reference pixel set and the related information of each target boundary pixel point in the first target area, where the related information includes position information and pixel values.
Specifically, the encoder needs to compare the number of target pixel points in the first reference pixel set with the number of target boundary pixel points in the corresponding first target area. If the two numbers are the same, a direct reference method is adopted, that is, the first reference pixel set is used as the reference pixel set for the target boundary pixel points of the corresponding first target area; if the two numbers differ, an interpolation method is adopted, that is, a second reference pixel set is determined according to the related information of each target pixel point in the first reference pixel set and the related information of each target boundary pixel point in the first target area, and this second reference pixel set is used as the reference pixel set for the target boundary pixel points of the corresponding first target area.
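The count comparison that selects between the two methods might be sketched as follows. The resampling used for the interpolation branch here is a simple stand-in (nearest position), not the weighted blend of the embodiment, which the text details separately.

```python
# Sketch of the dispatch between direct reference and interpolation. When
# the counts match, the first reference pixel set is used as-is; otherwise
# a stand-in nearest-position resampling produces one value per boundary pixel.

def select_reference_set(first_ref, n_boundary):
    if len(first_ref) == n_boundary:
        return first_ref  # direct reference: counts match
    # interpolation class: counts differ, resample to n_boundary pixels
    n = len(first_ref)
    return [first_ref[min(n - 1, round((i + 0.5) * n / n_boundary - 0.5))]
            for i in range(n_boundary)]
```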
For example, as shown in fig. 4, the A target area, the D target area, and the E target area belong to the direct reference class, which requires no interpolation processing, while the B target area and the C target area belong to the interpolation processing class.
In the embodiment of the invention, if the number of target pixel points in the first reference pixel set differs from the number of target boundary pixel points in the corresponding first target area, the first reference pixel set cannot be used directly as reference for the first target area, and a new reference pixel set, namely the second reference pixel set, needs to be rebuilt by the interpolation method according to the related information of the target pixel points and the related information of the target boundary pixel points. Each target boundary pixel point in the first target area is then predictively encoded according to the second reference pixel set, which can improve the accuracy of the predictive coding result.
In one embodiment, when the encoder determines the second reference pixel set according to the related information of each target pixel point in the first reference pixel set and the related information of each target boundary pixel point in the first target area, it may, for each target boundary pixel point in the first target area, determine a target position in the reference area corresponding to the first target area according to the position information of that target boundary pixel point; then determine a first weight and a second weight according to the position information of the target position, the position information of a first target pixel point adjacent to the target position, and the position information of a second target pixel point adjacent to the target position, where the first weight characterizes the distance between the target position and the first target pixel point, and the second weight characterizes the distance between the target position and the second target pixel point; next determine the pixel value corresponding to the target position according to the first weight, the second weight, the pixel value of the first target pixel point, and the pixel value of the second target pixel point; and finally take the pixel values corresponding to all the target positions as the second reference pixel set.
Specifically, for a first target area of the interpolation processing class, taking the B target area as an example, the vertical resolution of the B target area differs from that of the b reference area, that is, the number of target boundary pixel points in the B target area differs from the number of target pixel points in the b reference area, and the required pixel values must be obtained by linear interpolation from the surrounding pixels in the b reference area.
Fig. 5 is a weighted interpolation diagram for a pixel point to be interpolated: the distances to the adjacent pixel points on the two sides of the point to be interpolated are used as weights, and the interpolated value is obtained by weighted mixing. Ri,0 and Ri+1,0 respectively denote the reference pixel values in the first reference pixel set adjacent to the two sides of the point Ri to be interpolated, and wi is the distance from Ri to Ri,0; the larger the distance, the smaller the assigned weight, and vice versa. Finally, the interpolated Ri is taken as the reference pixel, calculated as:
Ri = (1 - wi) × Ri,0 + wi × Ri+1,0
For example, suppose the B target area contains 4 target boundary pixel points and the b reference area contains 7 target pixel points, and the encoder is predictively encoding the 3rd target boundary pixel point in the B target area. The 3rd target boundary pixel point lies at the 3/4 position of the B target area, which corresponds to the 7 × (3/4) = 5.25 position in the b reference area; this position is taken as the target position. The target pixel points adjacent to the target position are then determined: the 5th target pixel point in the b reference area is the first target pixel point, with pixel value Ri,0, and the 6th target pixel point is the second target pixel point, with pixel value Ri+1,0. The distance wi between the target position and the 5th target pixel point is determined and taken as the first weight, and the distance (1 - wi) between the target position and the 6th target pixel point is determined and taken as the second weight.
The pixel value of the point to be interpolated is then obtained with the above formula. All target boundary pixel points in the B target area are traversed and interpolated in turn, and the resulting pixel values of all the points to be interpolated are taken as the second reference pixel set.
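The weighted interpolation just described can be sketched as follows. The index convention — mapping the i-th boundary pixel to the fractional position n_ref·i/n_target and taking its floor/ceiling neighbors — is an assumption for illustration; the blending formula itself is the one given above.

```python
import math

def interpolate_reference(ref, n_target):
    """Build a second reference pixel set of n_target values from a
    reference row/column `ref` of a different length, using the weighted
    blend Ri = (1 - wi) * Ri,0 + wi * Ri+1,0."""
    n_ref = len(ref)
    out = []
    for i in range(1, n_target + 1):
        pos = n_ref * i / n_target      # fractional target position in ref
        lo = max(1, math.floor(pos))    # first target pixel (1-based index)
        hi = min(n_ref, lo + 1)         # second target pixel
        w = pos - lo                    # distance to the first target pixel
        out.append((1 - w) * ref[lo - 1] + w * ref[hi - 1])
    return out
```

With 7 reference pixels and 4 boundary pixels, this maps the 3rd boundary pixel to position 7 × (3/4) = 5.25 and blends the 5th and 6th reference pixels with weights 0.75 and 0.25.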
The embodiment of the invention describes the specific process by which the encoder determines the second reference pixel set: the encoder first determines the target position in the corresponding reference area according to the position information of a target boundary pixel point in the first target area; then determines a first weight, which characterizes the distance between the target position and the first target pixel point, according to the position information of the target position and of the first target pixel point adjacent to it, and a second weight, which characterizes the distance between the target position and the second target pixel point, according to the position information of the target position and of the second target pixel point adjacent to it; and finally determines the pixel value corresponding to the target position according to the first weight, the second weight, the pixel value of the first target pixel point, and the pixel value of the second target pixel point. The pixel values corresponding to all the target positions form the second reference pixel set. The reference pixels in the second reference pixel set are determined by interpolation, so that the first target area can be predictively encoded accordingly.
In one embodiment, if the encoder determines that a reference area has no reference value, the encoder rotates the current video frame by a preset angle in a preset rotation direction to obtain a rotated video frame; obtains the CTUs and the position information of the splicing boundaries in the rotated video frame based on the first preset rule, and obtains a plurality of second CUs in the rotated video frame; divides the plurality of second CUs according to the preset division mode to obtain a plurality of second PUs; determines the second target PUs adjacent to the splicing boundaries in the preset encoding direction, and determines at least two second target areas and a third reference pixel set corresponding to each second target area according to a second mapping relation and the plurality of second target PUs; determines, for each second target area, a second MPM reference block corresponding to the second target area according to the preset encoding direction and the second mapping relation; and encodes the second target PUs according to the preset encoding direction, the third reference pixel set, and the second MPM reference block.
Specifically, as shown in fig. 4, since the encoder encodes in the "Z" coding order, when the D target region needs to be encoded, the pixels in the corresponding d reference region have not yet been encoded and cannot serve as reference pixels of the D target region in the predictive encoding process; the d reference region corresponding to the D target region therefore has no reference value, and similarly the e reference region corresponding to the E target region has no reference value.
In a specific implementation, if the reference area corresponding to a first target area has no reference value, the original video frame may be rotated 90° clockwise and then encoded; after decoding, the video frame is rotated 90° counterclockwise to restore its original orientation.
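The rotate-before-encoding idea can be illustrated with a round trip on a row-major pixel array (a pure-Python stand-in; a real encoder would rotate the picture buffer):

```python
def rot90_cw(frame):
    # rotate a list-of-rows frame 90 degrees clockwise
    return [list(row) for row in zip(*frame[::-1])]

def rot90_ccw(frame):
    # rotate 90 degrees counterclockwise (inverse of rot90_cw)
    return [list(row) for row in zip(*frame)][::-1]
```

Encoding operates on `rot90_cw(frame)`; after decoding, `rot90_ccw` restores the original orientation, so `rot90_ccw(rot90_cw(frame)) == frame`.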
Specifically, after dividing the rotated video frame into the second CUs and the second PUs, the encoder determines the second target PUs adjacent to the splicing boundaries in the preset encoding direction, and determines at least two second target areas and a third reference pixel set corresponding to each second target area according to the second mapping relationship and the plurality of second target PUs.
For example, fig. 6 shows the mapping relationship between the second target areas and the third reference pixel sets in the rotated video frame. In the rotated video frame, according to the second mapping relationship, which is obtained by rotating the first mapping relationship, the plurality of second target PUs at the 4 splicing boundaries 22 are divided into 5 second target areas, namely the B target area, C target area, D target area, F target area, and G target area, with 5 corresponding reference areas, namely the b reference area, c reference area, d reference area, f reference area, and g reference area. The part where the F target area and the G target area intersect has a left area that references the f reference area and an upper area that references the g reference area.
According to the second mapping relationship, the encoder further determines two other reference areas with reference value, namely the a reference area and the e reference area, so that in addition to processing the pixel points of the 5 second target areas, the pixel points in the A target area and the E target area can also be processed. In this way, the encoder can process more target areas, improving the accuracy of predictive coding.
In the embodiment of the invention, when the encoder determines, according to the position information of a first target area and the position information of its corresponding reference area, that the reference area has no reference value, the encoder rotates the video frame by a preset angle in a preset rotation direction to obtain a rotated video frame, divides the rotated video frame into second CUs and second PUs, determines the second target areas, the third reference pixel set corresponding to each second target area, and the second MPM reference blocks, and performs predictive coding. By rotating a video frame whose reference areas lack reference value by the preset angle in the preset rotation direction, all the reference areas corresponding to the second target areas acquire reference value; predictive coding of the rotated video frame then allows each second target area to undergo a proper predictive coding operation, which greatly improves the video coding effect and gives the panoramic video a better visual viewing effect after inverse mapping.
In one embodiment, for each first target area, the encoder determines the first MPM reference block corresponding to the first target area according to the preset encoding direction and the first mapping relationship. Since a PU has 53 prediction modes, the encoder needs to select the most probable mode and perform predictive encoding accordingly; the encoder determines, according to the first mapping relationship, the first MPM reference block adjacent to the current first target area on the spherical video corresponding to the current video frame, and the prediction mode of each first target area refers to the prediction mode of its corresponding first MPM reference block.
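As a toy illustration of this MPM selection (all names and values here are hypothetical; the dictionaries stand in for the first mapping relation and for the modes already chosen for encoded blocks):

```python
def mpm_for_area(area, sphere_neighbor, block_mode):
    """Return the spherically adjacent MPM reference block of a first
    target area and that block's prediction mode, which the area's
    prediction then refers to."""
    ref_block = sphere_neighbor[area]  # adjacency on the sphere, per the
                                       # first mapping relation (assumed)
    return ref_block, block_mode[ref_block]
```

For instance, if the A target area borders a hypothetical block "blk_17" on the sphere and that block was coded with mode 26, the A target area's prediction mode refers to mode 26.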
Example 2
Based on the same conception, the embodiment of the invention further provides an encoder applied to panoramic video. Since this encoder is the encoder of the method in the embodiment of the invention and solves the problem on a principle similar to that of the method, its implementation may refer to the implementation of the method, and repeated description is omitted.
An encoder 70 according to this embodiment of the present invention is described below with reference to fig. 7. The encoder 70 shown in fig. 7 is merely an example and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.
As shown in fig. 7, the encoder 70 may be embodied in the form of a general purpose computing device, which may be, for example, a terminal device. Components of the encoder 70 may include, but are not limited to: at least one processor 71, at least one memory 72 storing instructions executable by the processor 71, and a bus 73 connecting the different system components (including the memory 72 and the processor 71), the processor 71 being a processor of a smart device.
The processor 71 executes executable instructions to implement the steps of:
Based on a first preset rule, a coding tree unit CTU and position information of a splicing boundary in a current video frame, obtaining a plurality of first coding units CU in the current video frame;
dividing the plurality of first CUs according to a preset dividing mode to obtain a plurality of first Prediction Units (PU);
determining first target PU adjacent to the splicing boundary in a preset encoding direction, and determining at least two first target areas and a first reference pixel set corresponding to each first target area according to a first mapping relation and a plurality of first target PU;
For each first target area, determining a first MPM reference block corresponding to the first target area according to the preset coding direction and the first mapping relation;
and encoding the first target PU according to the preset encoding direction, the first reference pixel set and the first MPM reference block.
In one embodiment, the processor 71 is specifically configured to:
Determining a reference area adjacent to each first target PU according to the first mapping relation;
determining the category of each reference area according to the position information of each reference area;
And in the current video frame, forming a first target area by target boundary pixel points in a first target PU corresponding to the reference area of the same category, and taking the target pixel points in the reference area of the same category as the first reference pixel set.
In one embodiment, the processor 71 is specifically configured to:
comparing the number of the target pixel points in the first reference pixel set with the number of the target boundary pixel points in the first target area corresponding to the first reference pixel set, and if the comparison results are different, determining a second reference pixel set according to the related information of each target pixel point in the first reference pixel set and the related information of each target boundary pixel point in the first target area;
wherein the related information includes position information and pixel values.
In one embodiment, the processor 71 is specifically configured to:
For each target boundary pixel point in the first target area, determining a target position in a reference area corresponding to the first target area according to the position information of the target boundary pixel point in the first target area;
Determining a first weight and a second weight according to the position information of the target position, the position information of a first target pixel point adjacent to the target position and the position information of a second target pixel point adjacent to the target position, wherein the first weight is used for representing the distance between the target position and the first target pixel point, and the second weight is used for representing the distance between the target position and the second target pixel point;
Determining a pixel value corresponding to the target position according to the first weight, the second weight, the pixel value of the first target pixel point and the pixel value of the second target pixel point;
and taking the determined pixel values corresponding to all the target positions as the second reference pixel set.
In one embodiment, the processor 71 is specifically configured to:
For each first target area, determining that the reference area has a reference value according to the position information of the first target area and the position information of the reference area corresponding to the first target area.
In one embodiment, the processor 71 is specifically configured to:
rotating the video frame by a preset angle in a preset rotation direction to obtain a rotated video frame if the reference area has no reference value;
Based on a first preset rule, obtaining CTU and position information of a splicing boundary in the rotated video frame, and obtaining a plurality of second CUs in the rotated video frame;
dividing the plurality of second CUs according to a preset dividing mode to obtain a plurality of second PUs;
Determining second target PU adjacent to the splicing boundary in a preset encoding direction, and determining at least two second target areas and a third reference pixel set corresponding to each second target area according to a second mapping relation and a plurality of second target PU;
for each second target area, determining a second MPM reference block corresponding to the second target area according to the preset coding direction and the second mapping relation;
and encoding the second target PU according to the preset encoding direction, the third reference pixel set and the second MPM reference block.
In one embodiment, the processor 71 is specifically configured to:
dividing the CTU based on the first preset rule to obtain a plurality of divided units;
Judging, for each divided unit, whether the divided unit comprises the splicing boundary according to the position information of the divided unit and the position information of the splicing boundary; if so, dividing the divided unit again to obtain the first CUs, and otherwise taking the divided unit as a first CU;
Wherein, the first preset rule includes: the rate-distortion cost of the divided units is less than the rate-distortion cost of the CTU.
Bus 73 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, and a processor or local bus using any of a variety of bus architectures.
Memory 72 may include readable media in the form of volatile memory such as Random Access Memory (RAM) 721 and/or cache memory 722, and may further include Read Only Memory (ROM) 723.
Memory 72 may also include a program/utility 725 having a set (at least one) of program modules 724, such program modules 724 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The encoder 70 may also communicate with one or more external devices 74 (e.g., keyboard, pointing device, etc.), with one or more devices that enable users to interact with the encoder 70, and/or with any devices (e.g., routers, modems, etc.) that enable the encoder 70 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 75. Moreover, the encoder 70 may also communicate with one or more networks, such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the internet, via a network adapter 76. As shown, the network adapter 76 communicates with the other modules of the encoder 70 over the bus 73. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with the encoder 70, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
Example 3
Based on the same conception, the embodiment of the invention further provides an encoder applied to panoramic video. Since this device is the device of the method in the embodiment of the invention and solves the problem on a principle similar to that of the method, its implementation may refer to the implementation of the method, and repeated description is omitted.
As shown in fig. 8, the encoder includes the following modules:
a CU partitioning module 801, configured to obtain a plurality of first coding units CUs in a current video frame based on a first preset rule, a coding tree unit CTU, and position information of a splicing boundary in the current video frame;
The PU dividing module 802 is configured to divide the plurality of first CUs according to a preset division manner to obtain a plurality of first prediction units PU;
A reference pixel set determining module 803, configured to determine, in a preset encoding direction, a first target PU adjacent to the stitching boundary, and determine, according to a first mapping relationship and a plurality of first target PUs, at least two first target areas and a first reference pixel set corresponding to each first target area;
an MPM reference block determining module 804, configured to determine, for each first target area, a first MPM reference block corresponding to the first target area according to the preset encoding direction and the first mapping relationship;
The prediction encoding module 805 is configured to encode the first target PU according to the preset encoding direction, the first reference pixel set, and the first MPM reference block.
In an alternative embodiment, the reference pixel set determining module 803 is specifically configured to:
Determining a reference area adjacent to each first target PU according to the first mapping relation;
determining the category of each reference area according to the position information of each reference area;
And in the current video frame, forming a first target area by target boundary pixel points in a first target PU corresponding to the reference area of the same category, and taking the target pixel points in the reference area of the same category as the first reference pixel set.
In an alternative embodiment, the reference pixel set determining module 803 is further configured to:
comparing the number of the target pixel points in the first reference pixel set with the number of the target boundary pixel points in the first target area corresponding to the first reference pixel set, and if the comparison results are different, determining a second reference pixel set according to the related information of each target pixel point in the first reference pixel set and the related information of each target boundary pixel point in the first target area;
wherein the related information includes position information and pixel values.
In an alternative embodiment, the reference pixel set determining module 803 is specifically configured to:
For each target boundary pixel point in the first target area, determining a target position in a reference area corresponding to the first target area according to the position information of the target boundary pixel point in the first target area;
Determining a first weight and a second weight according to the position information of the target position, the position information of a first target pixel point adjacent to the target position and the position information of a second target pixel point adjacent to the target position, wherein the first weight is used for representing the distance between the target position and the first target pixel point, and the second weight is used for representing the distance between the target position and the second target pixel point;
Determining a pixel value corresponding to the target position according to the first weight, the second weight, the pixel value of the first target pixel point and the pixel value of the second target pixel point;
and taking the determined pixel values corresponding to all the target positions as the second reference pixel set.
In an alternative embodiment, the reference pixel set determining module 803 is further configured to:
For each first target area, determining that the reference area has a reference value according to the position information of the first target area and the position information of the reference area corresponding to the first target area.
In an alternative embodiment, as shown in fig. 9, the encoder further includes:
The video rotation module 901 is configured to rotate the video frame by a preset angle in a preset rotation direction to obtain a rotated video frame if the reference area has no reference value;
the CU partitioning module 801 is configured to obtain, based on a first preset rule, CTUs and position information of a splicing boundary in the rotated video frame, and obtain a plurality of second CUs in the rotated video frame;
The PU dividing module 802 is configured to divide the plurality of second CUs according to a preset division manner to obtain a plurality of second PUs;
The reference pixel set determining module 803 is configured to determine, in a preset encoding direction, a second target PU adjacent to the stitching boundary, and determine, according to a second mapping relationship and a plurality of second target PUs, at least two second target areas and a third reference pixel set corresponding to each second target area;
The MPM reference block determining module 804 is configured to determine, for each second target area, a second MPM reference block corresponding to the second target area according to the preset encoding direction and the second mapping relationship;
The predictive coding module 805 is configured to code the second target PU according to the preset coding direction, the third reference pixel set, and the second MPM reference block.
In an alternative embodiment, the CU partitioning module 801 is specifically configured to:
dividing the CTU based on the first preset rule to obtain a plurality of divided units;
for each divided unit, judging, according to the position information of the divided unit and the position information of the splicing boundary, whether the divided unit contains the splicing boundary; if so, further dividing the divided unit to obtain the first CUs; otherwise, taking the divided unit as a first CU;
wherein the first preset rule includes: the rate-distortion cost of the divided units is less than the rate-distortion cost of the CTU.
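The two-stage partitioning described above can be sketched as a recursive quadtree split. This is an assumption-laden illustration: the quadtree shape, the minimum CU size, and the `rd_cost`/`contains_boundary` callbacks are all hypothetical stand-ins for what the patent leaves to the encoder.

```python
def partition_ctu(unit, rd_cost, contains_boundary, min_size=8):
    """Recursively split a square unit (x, y, size) into first CUs.

    Assumed rules, mirroring the text:
    - split when the summed rate-distortion cost of the four
      sub-units is lower than the unit's own cost (the first
      preset rule), and
    - additionally split any unit that contains the splicing
      boundary, down to the minimum CU size.
    """
    x, y, size = unit
    if size <= min_size:
        return [unit]
    half = size // 2
    children = [(x, y, half), (x + half, y, half),
                (x, y + half, half), (x + half, y + half, half)]
    cheaper = sum(rd_cost(c) for c in children) < rd_cost(unit)
    if cheaper or contains_boundary(unit):
        cus = []
        for c in children:
            cus.extend(partition_ctu(c, rd_cost, contains_boundary, min_size))
        return cus
    return [unit]
```

With a cost model under which splitting is never cheaper, a unit away from the boundary stays whole, while a unit straddling the boundary is still divided into its four sub-units.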
Example 4
In some possible embodiments, aspects of the present invention may also be implemented in the form of a program product comprising program code which, when the program product is run on a terminal device, causes the terminal device to carry out the steps of the respective modules in an encoder according to the various exemplary embodiments of the present disclosure described in the "exemplary method" section of the present specification. For example, the encoder may be adapted to: obtain a plurality of first coding units (CUs) in a current video frame based on a first preset rule, coding tree units (CTUs), and position information of a splicing boundary in the current video frame; divide the plurality of first CUs according to a preset division mode to obtain a plurality of first prediction units (PUs); determine a first target PU adjacent to the splicing boundary in a preset encoding direction, and determine at least two first target areas and a first reference pixel set corresponding to each first target area according to a first mapping relation and a plurality of first target PUs; for each first target area, determine a first MPM reference block corresponding to the first target area according to the preset encoding direction and the first mapping relation; and encode the first target PU according to the preset encoding direction, the first reference pixel set, and the first MPM reference block.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
As shown in fig. 10, a program product 100 for an encoder according to an embodiment of the present invention is described, which may employ a portable compact disc read-only memory (CD-ROM) and comprise program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
It should be noted that while several modules or sub-modules of the system are mentioned in the detailed description above, such partitioning is merely exemplary and not mandatory. Indeed, the features and functions of two or more modules described above may be embodied in one module in accordance with embodiments of the present invention. Conversely, the features and functions of one module described above may be further divided into a plurality of modules to be embodied.
The present application is described above with reference to block diagrams and/or flowchart illustrations of methods, apparatus (systems) and/or computer program products according to embodiments of the application. It will be understood that one block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, and/or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer and/or other programmable data processing apparatus, create means for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks.
Accordingly, the present application may be embodied in hardware and/or in software (including firmware, resident software, micro-code, etc.). Still further, the present application may take the form of a computer program product on a computer-usable or computer-readable storage medium having computer-usable or computer-readable program code embodied in the medium for use by or in connection with an instruction execution system. In the context of the present application, a computer-usable or computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.