CN114567776B - Video low-complexity coding method based on panoramic visual perception characteristics - Google Patents
- Publication number: CN114567776B
- Application number: CN202210157533.5A
- Authority: CN (China)
- Prior art keywords: current frame, pixel point, pixel, coding unit
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- H04N19/172 — Adaptive coding characterised by the coding unit, the unit being a picture, frame or field
- H04N19/124 — Quantisation
- H04N19/14 — Coding unit complexity, e.g. amount of activity or edge presence estimation
- H04N19/182 — Adaptive coding characterised by the coding unit, the unit being a pixel
- H04N19/42 — Methods or arrangements characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
Description
Technical Field
The present invention relates to video coding, and in particular to a low-complexity video coding method based on panoramic visual-perception characteristics.
Background Art
In recent years, panoramic video systems have become widely popular for their immersive viewing experience and have broad application prospects in fields such as virtual reality and driving simulation. However, current panoramic video systems still suffer from excessively high coding complexity, which poses a major challenge to their deployment. Reducing coding complexity has therefore become a pressing technical problem in this field.

Existing low-complexity coding algorithms for panoramic video do not fully exploit the perceptual characteristics of the human visual system (Human Visual System, HVS) or the specific properties of panoramic video, and thus struggle to reach optimal coding performance. The main goal of video coding is to minimize the bit rate while maintaining a given video quality, or, when the bit rate is constrained, to encode with the least possible distortion. Consequently, combining the perceptual characteristics of the human visual system with the properties of panoramic video to guide the selection of coding parameters has become an important research direction for reducing coding complexity.
Summary of the Invention
The technical problem to be solved by the present invention is to provide a low-complexity video coding method based on panoramic visual-perception characteristics that can effectively save coding bit rate and thereby effectively reduce coding complexity.

The technical solution adopted by the present invention to solve the above problem is a low-complexity video coding method based on panoramic visual-perception characteristics, characterized by comprising the following steps:
Step 1: define the video frame currently to be encoded in the ERP-projection panoramic video as the current frame, where each video frame of the ERP-projection panoramic video has width W and height H.
Step 2: determine whether the current frame is the first video frame; if so, encode the current frame with the original algorithm of the HEVC video encoder and then go to step 10; otherwise, go to step 3.
Step 3: compute the spatial JND threshold of every pixel in the current frame to obtain the panoramic spatial JND threshold map of the current frame, denoted G1; the value of each pixel in G1 is the spatial JND threshold of the corresponding pixel in the current frame. Also compute the weighted gradient of every pixel in the current frame to obtain the weighted gradient map of the current frame, denoted G2; the value of each pixel in G2 is the weighted gradient value of the corresponding pixel in the current frame.
Step 4: compute the spatial perception factor of every pixel in the current frame, denoting the factor of the pixel at (x, y) as δA(x, y), with δA(x, y) = G1(x, y). Compute the motion perception factor of every pixel, denoting the factor of the pixel at (x, y) as δT(x, y); δT(x, y) is obtained from the weighted gradient value G2(x, y), the frame average SF, and the motion perception constant ε. Then compute the spatiotemporal weighted perception factor of every pixel, δ(x, y) = δA(x, y) × δT(x, y), and the average of the spatiotemporal weighted perception factors over all pixels of the current frame, denoted Sδ. Finally, compute the dimension weight of every pixel, denoting the weight of the pixel at (x, y) as wERP(x, y), obtained from the pixel's latitude via the cosine function. Here 0 ≤ x ≤ W−1 and 0 ≤ y ≤ H−1; G1(x, y) is the value of the pixel at (x, y) in G1, i.e. the spatial JND threshold of that pixel in the current frame; G2(x, y) is the value of the pixel at (x, y) in G2, i.e. the weighted gradient value of that pixel in the current frame; SF is the average of the pixel values of G2, i.e. the average weighted gradient value of the current frame; ε is the motion perception constant, ε ∈ [1, 2]; cos() is the cosine function.
Step 5: define the largest coding unit currently to be processed in the current frame as the current largest coding unit.
Step 6: compute the average of the spatiotemporal weighted perception factors over all pixels of the current largest coding unit, denoted Sδ_LCU; then compute the Lagrange-multiplier adjustment factor of the current largest coding unit based on the spatiotemporal weighted perception factor, denoted ΨLCU, from Sδ_LCU and the adjustment parameters KLCU and BLCU; then compute the quantization-parameter change of the current largest coding unit based on the spatiotemporal weighted perception factor, denoted ΔQP1, with ΔQP1 = 3log2(ΨLCU), where KLCU ∈ (0, 1) and BLCU ∈ (0, 1).
Step 7: compute the average of the dimension weights over all pixels of the current largest coding unit, denoted SwERP_LCU; then compute the quantization-parameter change of the current largest coding unit based on the dimension weight, denoted ΔQP2, from SwERP_LCU and the adjustment parameters a and b, where a ∈ (0, 1), b ∈ (0, 1), and b < a.
Step 8: compute the new coding quantization parameter of the current largest coding unit, denoted QPnew, from the original coding quantization parameter QPorg and the two changes ΔQP1 and ΔQP2, applying floor rounding; then update the coding quantization parameter of the current largest coding unit to QPnew, and encode the current largest coding unit.
Step 9: take the next largest coding unit to be processed in the current frame as the current largest coding unit and return to step 6, until all largest coding units in the current frame have been processed; then go to step 10.
Step 10: take the next video frame to be encoded in the ERP-projection panoramic video as the current frame and return to step 2, until all video frames of the ERP-projection panoramic video have been encoded.
In step 3, G1 is obtained by applying a spatial just-noticeable-distortion model to compute the spatial JND threshold of every pixel in the current frame.
In step 3, G2 is obtained as follows: the value of the pixel at (x, y) in G2, denoted G2(x, y), is a weighted combination of the horizontal, vertical, and temporal gradient values of the pixel at (x, y) in the current frame, where 0 ≤ x ≤ W−1 and 0 ≤ y ≤ H−1, and G2(x, y) is also the weighted gradient value of the pixel at (x, y) in the current frame. The directional gradient values are computed with the 3D-Sobel operator; α, β, and γ are the gradient adjustment factors of the horizontal, vertical, and temporal directions, respectively, with α + β + γ = 1.
Compared with the prior art, the present invention has the following advantages:
The method of the present invention fully accounts for the perceptual characteristics of the human visual system and the properties of panoramic video. It uses the spatial JND threshold (visual-perception information) as the spatial perception factor, derives the motion perception factor from the weighted gradient value (visual-perception information), and from these computes the average spatiotemporal weighted perception factor over all pixels of each largest coding unit. Based on rate-distortion optimization theory, it computes the Lagrange-multiplier adjustment factor of the largest coding unit from the spatiotemporal weighted perception factor, and from that the corresponding quantization-parameter change. At the same time, the method accounts for the dimension-weight characteristics of ERP-projection panoramic video and computes a second quantization-parameter change based on the dimension weight. A new coding quantization parameter is then computed from the two changes and applied during encoding. The method thus adaptively adjusts the coding quantization parameter according to the spatiotemporal and latitude characteristics of each largest coding unit. Experimental tests show that it effectively reduces the coding bit rate while preserving coding quality, thereby effectively reducing coding complexity, and significantly improves rate-distortion performance; the gain is especially pronounced when the initial coding quantization parameter is small.
Brief Description of the Drawings
FIG. 1 is the overall implementation block diagram of the method of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the embodiment shown in the accompanying drawing.
The overall implementation block diagram of the proposed low-complexity video coding method based on panoramic visual-perception characteristics is shown in FIG. 1. The method comprises the following steps:
Step 1: define the video frame currently to be encoded in the ERP (Equirectangular Projection) panoramic video as the current frame, where each video frame of the ERP-projection panoramic video has width W and height H.
Step 2: determine whether the current frame is the first video frame; if so, encode the current frame with the original algorithm of the HEVC video encoder and then go to step 10; otherwise, go to step 3.
Step 3: compute the spatial JND (Just Noticeable Distortion) threshold of every pixel in the current frame to obtain the panoramic spatial JND threshold map G1, in which each pixel value is the spatial JND threshold of the corresponding pixel of the current frame; also compute the weighted gradient of every pixel to obtain the weighted gradient map G2, in which each pixel value is the weighted gradient value of the corresponding pixel of the current frame. A larger spatial JND threshold indicates a larger just-noticeable distortion, i.e. stronger spatial masking in the corresponding region; conversely, a smaller spatial JND threshold indicates weaker spatial masking.
In this embodiment, G1 is obtained by applying an existing classic spatial just-noticeable-distortion model to compute the spatial JND threshold of every pixel in the current frame.
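As a concrete illustration, the luminance-adaptation component of a classic spatial JND model (Chou and Li style) can be sketched as follows. The patent text only states that an existing classic spatial JND model is used, so the specific model, the 5×5 background window, and the piecewise thresholds below are assumptions for illustration, and a full model would add a texture-masking term:

```python
import numpy as np

def luminance_adaptation_jnd(frame):
    """Hedged sketch: background-luminance component of a classic spatial
    JND model. The exact model used by the patent is not specified here;
    this piecewise Chou-Li-style threshold is an illustrative assumption."""
    H, W = frame.shape
    # Background luminance: mean over a 5x5 neighbourhood (simple box filter).
    pad = np.pad(frame.astype(np.float64), 2, mode='edge')
    bg = np.zeros((H, W), dtype=np.float64)
    for dy in range(5):
        for dx in range(5):
            bg += pad[dy:dy + H, dx:dx + W]
    bg /= 25.0
    # Piecewise luminance-adaptation threshold: dark regions tolerate more
    # distortion; above mid-grey the threshold grows slowly and linearly.
    low = bg <= 127
    jnd = np.empty_like(bg)
    jnd[low] = 17.0 * (1.0 - np.sqrt(bg[low] / 127.0)) + 3.0
    jnd[~low] = 3.0 / 128.0 * (bg[~low] - 127.0) + 3.0
    return jnd
```

For a uniform mid-grey frame (value 127) every threshold collapses to the base value 3, while a black frame yields the maximum threshold 20, matching the intuition that dark regions mask distortion more strongly.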
In this embodiment, G2 is obtained as follows: the value of the pixel at (x, y) in G2, denoted G2(x, y), is a weighted combination of the horizontal, vertical, and temporal gradient values of the pixel at (x, y) in the current frame, where 0 ≤ x ≤ W−1 and 0 ≤ y ≤ H−1, and G2(x, y) is also the weighted gradient value of the pixel at (x, y) in the current frame. The temporal gradient value is the gradient, along the temporal direction, between the pixel at (x, y) in the current frame and the pixel at (x, y) in the previous video frame. The directional gradient values are computed with the existing 3D-Sobel operator; α, β, and γ are the gradient adjustment factors of the horizontal, vertical, and temporal directions, with α + β + γ = 1. In this embodiment α = 0.25, β = 0.25, and γ = 0.5.
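The weighted gradient map can be sketched as below. Since the exact 3D-Sobel formulation is not reproduced in this text, the sketch approximates the horizontal and vertical gradients with plain 2D Sobel responses and the temporal gradient with a frame difference; only the weighted combination with α + β + γ = 1 (here 0.25, 0.25, 0.5) follows the text directly:

```python
import numpy as np

def weighted_gradient_map(cur, prev, alpha=0.25, beta=0.25, gamma=0.5):
    """Hedged sketch of G2. The patent uses a 3D-Sobel operator; here the
    horizontal/vertical gradients are 2D Sobel responses and the temporal
    gradient is a plain frame difference, an approximation rather than
    the patented operator."""
    assert abs(alpha + beta + gamma - 1.0) < 1e-9  # alpha + beta + gamma = 1
    f = cur.astype(np.float64)
    # 3x3 Sobel kernels for the horizontal and vertical directions.
    kh = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    kv = kh.T
    H, W = f.shape
    pad = np.pad(f, 1, mode='edge')
    gh = np.zeros_like(f)
    gv = np.zeros_like(f)
    for dy in range(3):
        for dx in range(3):
            win = pad[dy:dy + H, dx:dx + W]
            gh += kh[dy, dx] * win
            gv += kv[dy, dx] * win
    gt = f - prev.astype(np.float64)  # temporal direction (frame difference)
    # Weighted combination of the absolute directional gradient values.
    return alpha * np.abs(gh) + beta * np.abs(gv) + gamma * np.abs(gt)
```

On a flat frame the spatial Sobel responses vanish, so the map reduces to γ times the absolute frame difference, which is the behaviour one would expect from the γ = 0.5 temporal emphasis.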
Step 4: compute the spatial perception factor of every pixel in the current frame, δA(x, y) = G1(x, y); compute the motion perception factor δT(x, y) of every pixel from the weighted gradient value G2(x, y), the frame average SF, and the motion perception constant ε; then compute the spatiotemporal weighted perception factor δ(x, y) = δA(x, y) × δT(x, y) and its average over all pixels of the current frame, denoted Sδ; finally, compute the dimension weight wERP(x, y) of every pixel from its latitude via the cosine function. Here 0 ≤ x ≤ W−1 and 0 ≤ y ≤ H−1; G1(x, y) is the value of the pixel at (x, y) in G1, i.e. the spatial JND threshold of that pixel in the current frame; G2(x, y) is the value of the pixel at (x, y) in G2, i.e. the weighted gradient value of that pixel in the current frame; SF is the average pixel value of G2, i.e. the average weighted gradient value of the current frame; ε is the motion perception constant, ε ∈ [1, 2], and ε = 1 in this embodiment; cos() is the cosine function; π = 3.14….
In this embodiment, because the ERP projection samples each latitude with a different pixel density, different latitudes of the projected plane carry different amounts of pixel redundancy, and the extreme stretching near the two poles is the most pronounced. After the sphere is projected into the ERP format, with the sphere center as the base point, the longitude θ of the ERP projection corresponds to the longitude of the sphere's surface, θ ∈ [−π, π], and the latitude of the ERP projection corresponds to the latitude of the sphere's surface. Taking the characteristics of panoramic latitude into account, the dimension weight parameter wERP(x, y) of the ERP projection format is introduced.
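A minimal sketch of step 4, under stated assumptions: δA = G1 follows the text, but the exact δT formula and the exact wERP formula are equation images that did not survive extraction, so the normalized-gradient form (G2/SF)^ε and the WS-PSNR-style cosine-latitude weight used below are assumptions:

```python
import math
import numpy as np

def perception_factors(g1, g2, eps=1.0):
    """Hedged sketch of step 4. delta_A = G1 is given by the text; the
    exact delta_T formula is not reproduced here, so the normalised-
    gradient form (G2 / S_F)**eps is an assumption."""
    s_f = g2.mean()                      # S_F: frame-average weighted gradient
    delta_a = g1.astype(np.float64)      # spatial perception factor
    delta_t = (g2 / s_f) ** eps          # motion perception factor (assumed form)
    delta = delta_a * delta_t            # spatiotemporal weighted factor
    return delta, delta.mean()           # (delta map, S_delta)

def erp_dimension_weights(h, w):
    """Assumed cosine-latitude weight for ERP frames (the WS-PSNR-style
    weight); the patent's exact w_ERP formula is not reproduced in this
    text, so this form is an assumption."""
    y = np.arange(h, dtype=np.float64)
    row_w = np.cos((y + 0.5 - h / 2.0) * math.pi / h)  # per-row latitude weight
    return np.repeat(row_w[:, None], w, axis=1)
```

With the cosine weight, rows near the equator (image center) get weights near 1 and rows near the poles get weights near 0, mirroring the pixel redundancy the text describes.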
Step 5: define the largest coding unit (Largest Coding Unit, LCU) currently to be processed in the current frame as the current largest coding unit.
Step 6: compute the average of the spatiotemporal weighted perception factors over all pixels of the current largest coding unit, denoted Sδ_LCU; then compute the Lagrange-multiplier adjustment factor ΨLCU of the current largest coding unit based on the spatiotemporal weighted perception factor, from Sδ_LCU and the adjustment parameters KLCU and BLCU; then compute the corresponding quantization-parameter change ΔQP1 = 3log2(ΨLCU). Here 0 ≤ i ≤ 63 and 0 ≤ j ≤ 63; δLCU(i, j) is the spatiotemporal weighted perception factor of the pixel at in-block position (i, j) of the current largest coding unit; KLCU ∈ (0, 1) and BLCU ∈ (0, 1), and in this embodiment extensive experiments determined KLCU = BLCU = 0.5.
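A hedged sketch of step 6. The exact ΨLCU formula is not reproduced in this text; the linear normalization KLCU·(Sδ_LCU/Sδ) + BLCU below is an assumption, chosen so that with KLCU = BLCU = 0.5 an LCU of frame-average perceptual activity gets ΨLCU = 1 and ΔQP1 = 0. Only the relation ΔQP1 = 3log2(ΨLCU) is taken directly from the text:

```python
import math

def delta_qp1(s_delta_lcu, s_delta_frame, k_lcu=0.5, b_lcu=0.5):
    """Hedged sketch of step 6. Psi_LCU's exact formula is an assumption:
    a linear normalisation of the LCU's mean perceptual factor against the
    frame mean, with K_LCU, B_LCU in (0, 1) as the text requires."""
    psi_lcu = k_lcu * (s_delta_lcu / s_delta_frame) + b_lcu
    return 3.0 * math.log2(psi_lcu)  # dQP1 = 3 * log2(Psi_LCU), per the text
```

Under this assumed form, an LCU whose mean perceptual factor is three times the frame average gets ΨLCU = 2 and ΔQP1 = 3, i.e. a coarser quantizer where distortion is perceptually well masked.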
Step 7: compute the average of the dimension weights over all pixels of the current largest coding unit, denoted SwERP_LCU; then compute the dimension-weight-based quantization-parameter change ΔQP2 from SwERP_LCU and the adjustment parameters a and b. Here wERP_LCU(i, j) is the dimension weight of the pixel at in-block position (i, j) of the current largest coding unit; a ∈ (0, 1), b ∈ (0, 1), and b < a, and in this embodiment extensive experiments determined a = 0.85 and b = 0.3.
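A hedged sketch of step 7. The exact ΔQP2 formula is likewise missing from this text; the piecewise rule below, which treats a = 0.85 and b = 0.3 as thresholds on the mean ERP weight (both in (0, 1) with b < a, as the text requires), is an assumption: equatorial LCUs keep their QP while the heavily stretched polar LCUs get a coarser quantizer:

```python
def delta_qp2(s_werp_lcu, a=0.85, b=0.3):
    """Hedged sketch of step 7. The exact Delta_QP2 formula is not
    reproduced in this text; interpreting a and b as thresholds on the
    mean ERP dimension weight is an assumption for illustration."""
    if s_werp_lcu >= a:
        return 0  # near the equator: keep the original QP
    elif s_werp_lcu >= b:
        return 1  # mid latitudes: slightly coarser quantiser
    else:
        return 2  # near the poles (most redundancy): coarsest quantiser
```

The monotone direction of the rule (lower cosine weight, larger QP increase) follows the text's observation that polar regions of an ERP frame are the most redundant.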
Step 8: compute the new coding quantization parameter QPnew of the current largest coding unit from the original coding quantization parameter QPorg and the two changes ΔQP1 and ΔQP2, applying floor rounding; then update the coding quantization parameter of the current largest coding unit to QPnew, and encode the current largest coding unit with the HEVC video encoder. QPorg can be read from the encoder's initialization parameter list.
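Step 8 can be sketched as follows. The floor rounding is stated in the text; the additive combination of QPorg, ΔQP1, and ΔQP2 and the clamping to the HEVC QP range [0, 51] are assumptions:

```python
import math

def qp_new(qp_org, dqp1, dqp2, qp_min=0, qp_max=51):
    """Hedged sketch of step 8. QP_new is derived from QP_org and the two
    deltas with floor rounding, per the text; the additive form and the
    clamp to the HEVC 8-bit QP range [0, 51] are assumptions."""
    qp = math.floor(qp_org + dqp1 + dqp2)   # floor rounding, as stated
    return max(qp_min, min(qp_max, qp))     # keep QP inside the legal range
```

For example, QPorg = 32 with ΔQP1 = 1.2 and ΔQP2 = 1 yields floor(34.2) = 34 under this assumed combination.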
Step 9: take the next largest coding unit to be processed in the current frame as the current largest coding unit and return to step 6, until all largest coding units in the current frame have been processed; then go to step 10.
Step 10: take the next video frame to be encoded in the ERP-projection panoramic video as the current frame and return to step 2, until all video frames of the ERP-projection panoramic video have been encoded.
To further demonstrate its performance, the method of the present invention was tested.
The HEVC standard reference software HM16.14 was used as the experimental test platform, running on an Intel(R) Core(TM) i7-10700 CPU at 2.9 GHz with 32 GB of RAM under 64-bit Windows 10; VS2013 was used as the development tool. Four panoramic video sequences were selected as standard test sequences: two 4K sequences, "AerialCity" and "DrivingInCity", and two 6K sequences, "BranCastle2" and "Landing2". For each sequence, 100 frames were encoded with intra coding, SearchRange was set to 64, MaxPartitionDepth was set to 4, and the initial coding quantization parameter QP (i.e. the original coding quantization parameter QPorg) was set to 22, 27, 32, and 37, respectively.
Table 1 lists the parameters of the four panoramic video sequences "AerialCity", "DrivingInCity", "BranCastle2", and "Landing2".

Table 1. Parameters of the panoramic video sequences
表2列出了采用本发明方法对表1列出的全景视频序列进行编码,与采用HM16.14原始平台方法相比,编码码率的节省情况。定义采用本发明方法进行编码相比于采用HM16.14原始平台方法进行编码的编码码率节省率为ΔRPRO,ΔRPRO=(RORG-RPRO)/RORG×100(%),其中,RPRO表示采用本发明方法进行编码的编码码率,RORG表示采用HM16.14原始平台方法进行编码的编码码率。Table 2 lists the encoding bit rate savings of the panoramic video sequences listed in Table 1 by using the method of the present invention to encode the panoramic video sequences listed in Table 1, compared with using the HM16.14 original platform method. The encoding bit rate savings of the encoding method of the present invention compared with using the HM16.14 original platform method is defined as ΔR PRO , ΔR PRO =(R ORG -R PRO )/R ORG ×100(%), where R PRO represents the encoding bit rate of the encoding method of the present invention, and R ORG represents the encoding bit rate of the encoding method of the HM16.14 original platform method.
Table 2. Bit rate savings of the method of the present invention compared with the original HM16.14 platform
As can be seen from Table 2, the method of the present invention saves 12.9% of the bit rate on average. For all four panoramic video sequences, covering different scenes and motion characteristics, the method effectively reduces the bit rate; the gain is largest when the initial coding quantization parameter QP (i.e., the original coding quantization parameter QPorg) is small.
Table 3 lists the rate-distortion performance obtained when encoding the panoramic video sequences of Table 1 with the method of the present invention. The classical subjective quality assessment method MOS (Mean Opinion Score) was adopted as the quality index, and the rate-distortion performance index BDBRMOS of each panoramic video sequence under MOS was calculated to comprehensively evaluate the performance of the method of the present invention.
Table 3. Rate-distortion performance of encoding with the method of the present invention
As can be seen from Table 3, the BDBRMOS rate-distortion index shows that, at the same subjective quality, the method of the present invention saves about 7.4% of the bit rate on average (BDBRMOS ≈ −7.4%) compared with the original HM16.14 platform. Across the different scenes and motion conditions of the panoramic video sequences, the method effectively reduces the bit rate and significantly improves the rate-distortion performance.
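BDBRMOS follows the standard Bjøntegaard delta-bit-rate calculation, with MOS substituted for PSNR as the quality axis. A hedged sketch of that conventional cubic-fit procedure (this is not code from the patent, and the rate/MOS points in the usage example are made up):

```python
import numpy as np

def bd_rate(rates_anchor, q_anchor, rates_test, q_test):
    """Bjontegaard delta bit rate: average bit rate difference (%) between two
    encoders at equal quality. Here the quality axis is assumed to be MOS,
    following the BDBR_MOS index used in the text."""
    # Fit log10(rate) as a cubic polynomial of quality for each encoder.
    p_anchor = np.polyfit(np.asarray(q_anchor, float), np.log10(rates_anchor), 3)
    p_test = np.polyfit(np.asarray(q_test, float), np.log10(rates_test), 3)
    lo = max(min(q_anchor), min(q_test))
    hi = min(max(q_anchor), max(q_test))
    # Integrate both fitted curves over the overlapping quality interval.
    int_anchor = np.polyval(np.polyint(p_anchor), hi) - np.polyval(np.polyint(p_anchor), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_log_diff = (int_test - int_anchor) / (hi - lo)
    # Back from the log domain to a percentage bit rate difference.
    return (10.0 ** avg_log_diff - 1.0) * 100.0

# Made-up rate/MOS points: the test encoder spends 10% fewer bits at every
# quality level, so the BD-rate comes out at -10%.
mos = [3.0, 3.5, 4.0, 4.5]
anchor_rates = [1000.0, 2000.0, 4000.0, 8000.0]
test_rates = [r * 0.9 for r in anchor_rates]
print(round(bd_rate(anchor_rates, mos, test_rates, mos), 2))  # -10.0
```

A negative BD-rate means the test encoder needs fewer bits than the anchor for the same subjective quality, which is the sense in which the −7.4% figure above is reported.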
Claims (3)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210157533.5A CN114567776B (en) | 2022-02-21 | 2022-02-21 | Video low-complexity coding method based on panoramic visual perception characteristics |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114567776A CN114567776A (en) | 2022-05-31 |
CN114567776B (en) | 2023-05-05
Family
ID=81714022
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116723330B (en) * | 2023-03-28 | 2024-02-23 | 成都师范学院 | Panoramic video coding method for self-adapting spherical domain distortion propagation chain length |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6366705B1 (en) * | 1999-01-28 | 2002-04-02 | Lucent Technologies Inc. | Perceptual preprocessing techniques to reduce complexity of video coders |
CN103096079A (en) * | 2013-01-08 | 2013-05-08 | 宁波大学 | Multi-view video rate control method based on exactly perceptible distortion |
CN104954778A (en) * | 2015-06-04 | 2015-09-30 | 宁波大学 | Objective stereo image quality assessment method based on perception feature set |
CN107147912A (en) * | 2017-05-04 | 2017-09-08 | 浙江大华技术股份有限公司 | A kind of method for video coding and device |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100086063A1 (en) * | 2008-10-02 | 2010-04-08 | Apple Inc. | Quality metrics for coded video using just noticeable difference models |
US9237343B2 (en) * | 2012-12-13 | 2016-01-12 | Mitsubishi Electric Research Laboratories, Inc. | Perceptually coding images and videos |
Non-Patent Citations (2)
Title |
---|
Yafen Xing et al. Spatiotemporal just noticeable difference modeling with heterogeneous temporal visual features. Displays, 2021. |
Du Baozhen. Fast stereoscopic video coding algorithm based on perceptual thresholds. 信息与电脑(理论版), 2020. |
Legal Events

Date | Code | Title
---|---|---
| PB01 | Publication
| SE01 | Entry into force of request for substantive examination
| GR01 | Patent grant
2024-01-18 | TR01 | Transfer of patent right

TR01 details — Effective date of registration: 2024-01-18. Patentee after: Zhejiang Chuanzhi Electronic Technology Co., Ltd., Room 166, Building 1, No. 8 Xingye Avenue, Ningbo Free Trade Zone, Zhejiang Province, 315800. Patentee before: Ningbo Polytechnic, No. 388 Lushan East Road, Ningbo Economic and Technological Development Zone, Zhejiang Province, 315800.