Method for evaluating immersive video transcoding scheme
Technical Field
The invention relates to the field of computer vision, and in particular to a method for evaluating an immersive video transcoding scheme.
Background
During video acquisition, transmission and storage, video must be compressed and encoded to meet storage and transmission requirements. Three encoding parameters have a large influence on the subjective quality of the encoded video: resolution, frame rate and quantization step size. Past research has found that the influence of each of these parameters on subjective quality depends on the video content. For example, for a scene containing fast-moving objects, a change in frame rate affects subjective quality more noticeably than it does for a slowly changing natural landscape. In addition, different encoding parameters can give the same video content very different subjective quality at the same bit rate, which easily wastes transmission and storage resources. How to evaluate the quality of a coding scheme for different video contents during compression has therefore become a critical problem, and one that has been explored throughout the development of video quality evaluation technology.
Video quality evaluation aims to assess the quality of video after lossy processing such as compression and transmission. Existing video quality evaluation algorithms can be divided by method into subjective evaluation and objective evaluation. Objective quality assessment falls into three categories according to the amount of information taken from the reference video: full-reference methods, semi-reference methods and no-reference methods.
The full-reference method requires both the original (lossless) video and the lossy video. It is easy to implement and apply, and mainly compares the information content, or the similarity of certain features, of two videos with the same content. The no-reference method relies only on the lossy video and is difficult to implement; common approaches use dedicated algorithms to detect specific types of distortion, such as blur detection, noise detection and edge analysis. The semi-reference method needs only partial information, or a set of extracted features, from the original or reference video as the reference for evaluating the lossy video. It offers a solution when the reference video cannot be obtained in full, gives more stable and accurate results in a practical system than the no-reference method, and avoids the unnecessary storage and transmission cost of the full-reference method. It follows that, when coding schemes are to be evaluated and an original high-quality reference is available, encoding first to obtain the lossy video and then evaluating it wastes computing resources, takes more processing time and cannot achieve an ultra-low-delay response. Establishing the link between coding parameters and video quality on the basis of semi-reference video quality evaluation is therefore a reasonable way to solve the aforementioned problems.
With the development of software and hardware technologies, immersive media content such as VR (virtual reality) and AR (augmented reality) is gradually entering the consumer market and playing an increasingly important role in fields such as education, medical treatment and entertainment. Immersive interaction differs greatly from conventional flat video interaction in both viewing environment and user freedom. When a user watches a traditional flat video, the screen of the playback device covers only a local area in the center of the user's visual field, and the user has no freedom in choosing what to watch. In an immersive viewing environment, the video content generally covers the user's entire field of view and isolates most unnecessary external visual interference. At the same time, 360-degree video content gives the user a higher degree of freedom: at any moment the user's field of view covers only the part of the video content that the user has selected, the user can change direction and position at will during viewing, and the user's attention is usually focused on the central part of the current field of view.
Because of these changes, conventional quality evaluation methods for ordinary flat video cannot meet the requirements of immersive viewing. On one hand, the change of viewing environment changes the user's subjective quality perception, so quality evaluation models designed for traditional flat video can produce large errors; on the other hand, directly evaluating the quality of the complete panoramic video cannot reflect the fact that the user attends only to a local region in an immersive viewing environment.
However, no highly practical immersive video quality evaluation model designed specifically for these changes has yet been proposed. How to optimize the design so that video quality evaluation adapts to the changes brought by the immersive viewing environment, and how to link video coding parameters directly to subjective video quality so as to evaluate coding schemes, have become very important subjects.
Disclosure of Invention
In view of the above changes and the characteristics of the prior art, it is an object of the present invention to propose a method for evaluating an immersive video transcoding scheme.
The invention uses a semi-reference quality evaluation model that takes the coding parameters and a small number of features of the original video as model input and, through simple calculation, outputs the quality loss that the coding parameters cause relative to the original high-quality video. The technical scheme adopted is specifically as follows:
a method of evaluating an immersive video transcoding scheme, comprising the steps of:
step 1, dividing a complete high-bit-rate panoramic video into a plurality of regular rectangles, wherein the size of each rectangle in the transverse and longitudinal directions is smaller than one half of the size of the corresponding user field of view in that direction;
step 2, applying sobel filtering to each block of each frame of the partitioned video to extract edge feature information and obtain the corresponding gradient-domain information, then taking the difference between corresponding blocks of each pair of consecutive frames, and taking the mean σ_mean and the standard deviation σ_std of the obtained residual;
Step 3, after the position information of the user view field is obtained, calculating the parameters of the quality evaluation model according to the coverage condition of the view field to each block video:
wherein α_s is a coefficient describing the change of video quality as the video resolution decreases, α_q is a coefficient describing the change of video quality as the compression quantization step size increases, and α_t is a coefficient describing the change of video quality as the frame rate decreases; n denotes the number of blocks covered by the current field of view, s_k represents the area of each block of video covered by the field of view, and s_FoV represents the area covered by the field of view; the per-block terms represent the feature results of each block of video, and the summation is taken over the features of all the block videos covered by the field of view;
step 4, calculating a quality evaluation model, wherein a specific formula is as follows:
wherein Q(s, q, t) is the predicted video quality after encoding with the given parameters; s, t and q correspond to the resolution, frame rate and quantization step size respectively; S_tar, T_tar and Q_tar are the three coding parameters representing the actual resolution, frame rate and quantization step size used in transcoding; S_ori, T_ori and Q_ori represent the resolution, frame rate and quantization step size of the original high-quality video; and α_s is a value that depends on the input Q_tar;
step 5, evaluating each coding scheme by the methods of steps 1 to 4 to obtain the corresponding quality evaluation prediction result, and selecting the combination of resolution, frame rate and quantization step size for which the value of Q(s, q, t) is largest as the final coding scheme.
The invention provides a semi-reference video quality evaluation method that adapts to an immersive viewing environment. It rests on two key points: the model parameters depend on features of the video, and the coding parameters are taken as input. Together these establish a direct mapping between coding parameters and video quality. In addition, a block feature prediction module, designed around user behavior in an immersive viewing environment, is added to improve the response speed of the actual system during deployment and operation. The method thus establishes a direct relation between the compression coding parameters and the video quality and, because the model parameters depend on features, adapts to the video content. The result is a practical and accurate immersive video quality evaluation method that can be used to evaluate different coding schemes and to select the optimized coding scheme according to the evaluation results.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
fig. 2 is a schematic view of video blocking.
Detailed Description
Referring to fig. 1, the method for the server to evaluate the immersive video transcoding scheme of the invention specifically comprises the following steps:
Step 1, as shown in fig. 2, during viewing of a panoramic video only the image within the field of view is visible to the user, and the position of the field of view is chosen unpredictably by the user. To accelerate subsequent processing, the complete high-bit-rate panoramic video is first partitioned so that, together with the subsequent block feature calculation, a user field of view appearing at any position can be handled. One complete panoramic video is divided into a plurality of regular rectangles, and the size of each rectangle in the transverse and longitudinal directions is smaller than one half of the size of the corresponding field of view in that direction, which ensures the accuracy of the subsequent calculation of the content features of the field of view. A suitable block feature calculation strategy is introduced to adapt to the randomly appearing position of the user's field of view, so that calculation speed is optimized while accuracy is preserved.
Step 2, after the division is finished, features are extracted from each block of video content. To ensure accurate results while keeping the amount of calculation small, the following features are required:
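The partitioning rule of step 1 can be illustrated with a minimal sketch. It is not taken from the original text: the function names, the use of pixel units and the example frame and field-of-view sizes are all assumptions chosen for illustration. The sketch picks the smallest number of blocks per axis for which every block is strictly smaller than half the field of view.

```python
import math


def blocks_per_axis(frame_len, fov_len):
    """Smallest block count along one axis such that every block is
    strictly smaller than half the field-of-view length (in pixels)."""
    if fov_len <= 2:
        raise ValueError("field of view must be larger than 2 pixels")
    n = 1
    # ceil(frame_len / n) is the largest block length produced below.
    while 2 * math.ceil(frame_len / n) >= fov_len:
        n += 1
    return n


def split_into_blocks(frame_w, frame_h, fov_w, fov_h):
    """Return rectangles (x0, y0, x1, y1), row by row, covering the frame."""
    cols = blocks_per_axis(frame_w, fov_w)
    rows = blocks_per_axis(frame_h, fov_h)
    xs = [i * frame_w // cols for i in range(cols + 1)]
    ys = [j * frame_h // rows for j in range(rows + 1)]
    return [(xs[i], ys[j], xs[i + 1], ys[j + 1])
            for j in range(rows) for i in range(cols)]


# Hypothetical sizes: a 3840x1920 panorama and a 1080x1080 field of view.
blocks = split_into_blocks(3840, 1920, 1080, 1080)
```

With these example sizes the frame is split into an 8 by 4 grid of 480 by 480 blocks. A coarser grid would violate the half-field-of-view bound, while a finer one would only add feature computations.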
Feature one: after sobel filtering is applied to each frame in the video to extract edge features, the difference between each pair of consecutive frames is taken and the residual is averaged, so that N frames of video content yield N-1 values; the mean of these N-1 values is recorded as σ_mean for block n, where n is the corresponding block number.
Feature two: after sobel filtering is applied to each frame in the video to extract edge features, the difference between each pair of consecutive frames is taken and the standard deviation of the residual is computed, so that N frames of video content yield N-1 values; the mean of these N-1 values is recorded as σ_std for block n, where n is the corresponding block number.
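The two features can be sketched for a single block as follows. This is a minimal illustration under stated assumptions, not the original implementation: frames are taken to be small grayscale arrays given as lists of rows, the sobel output is taken as the gradient magnitude with border pixels skipped, and the residual is taken as the absolute difference between consecutive gradient maps. The text fixes none of these details, and the function names are hypothetical. A practical system would use an array library instead of pure Python loops.

```python
import math
from statistics import fmean, pstdev


def sobel_magnitude(frame):
    """Sobel gradient magnitude of a 2D grayscale block (list of rows).
    Border pixels are skipped, so the result is (h-2) x (w-2)."""
    h, w = len(frame), len(frame[0])
    out = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            gx = (frame[y - 1][x + 1] + 2 * frame[y][x + 1] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y][x - 1] - frame[y + 1][x - 1])
            gy = (frame[y + 1][x - 1] + 2 * frame[y + 1][x] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y - 1][x] - frame[y - 1][x + 1])
            row.append(math.hypot(gx, gy))
        out.append(row)
    return out


def block_features(frames):
    """Return (sigma_mean, sigma_std) for one block from its N >= 2 frames."""
    grads = [sobel_magnitude(f) for f in frames]
    means, stds = [], []
    for prev, cur in zip(grads, grads[1:]):
        # Assumption: the residual is the absolute gradient difference.
        residual = [abs(c - p)
                    for row_p, row_c in zip(prev, cur)
                    for p, c in zip(row_p, row_c)]
        means.append(fmean(residual))   # one value per frame pair
        stds.append(pstdev(residual))   # one value per frame pair
    # N frames give N-1 values each; average them as the text describes.
    return fmean(means), fmean(stds)


# Tiny example: a blank 4x4 frame followed by one with a vertical edge.
blank = [[0, 0, 0, 0] for _ in range(4)]
edge = [[0, 0, 10, 10] for _ in range(4)]
features = block_features([blank, edge])
```

A static block gives zero for both features, and a block whose edges change between frames gives larger values. This matches the role the features play as indicators of content motion and detail.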
Step 3, after the position information of the user view field is obtained, calculating the parameters of the quality evaluation model according to the coverage condition of the view field to each block video:
wherein σ_mean and σ_std are the video gradient-domain features calculated in the previous steps; α_s is a coefficient describing the change of video quality as the video resolution decreases, α_q is a coefficient describing the change of video quality as the compression quantization step size increases, and α_t is a coefficient describing the change of video quality as the frame rate decreases; n denotes the number of blocks covered by the current field of view, s_k represents the area of each block of video covered by the field of view, and s_FoV represents the area covered by the field of view; the per-block terms represent the feature results of each block of video, and the summation is taken over the features of all the block videos covered by the field of view.
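The aggregation over the field of view can be sketched as below. The formula itself is not reproduced in the text, so the sketch assumes the reading that the symbol definitions most directly suggest: each covered block contributes its features weighted by s_k / s_FoV. This weighting, the rectangle representation and the function names are assumptions, and wrap-around of the field of view at the panorama seam is ignored. The mapping from the aggregated features to α_s, α_q and α_t is not given in the text and is not sketched.

```python
def overlap_area(block, fov):
    """Intersection area of two rectangles given as (x0, y0, x1, y1)."""
    w = min(block[2], fov[2]) - max(block[0], fov[0])
    h = min(block[3], fov[3]) - max(block[1], fov[1])
    return max(w, 0) * max(h, 0)


def fov_features(blocks, features, fov):
    """Aggregate per-block (sigma_mean, sigma_std) over the field of view.

    Returns (n, sigma_mean, sigma_std), where n is the number of blocks
    the field of view covers. Assumption: weights are s_k / s_FoV.
    """
    s_fov = (fov[2] - fov[0]) * (fov[3] - fov[1])
    n = 0
    sigma_mean = 0.0
    sigma_std = 0.0
    for rect, (block_mean, block_std) in zip(blocks, features):
        s_k = overlap_area(rect, fov)
        if s_k == 0:
            continue  # block lies outside the field of view
        n += 1
        sigma_mean += s_k / s_fov * block_mean
        sigma_std += s_k / s_fov * block_std
    return n, sigma_mean, sigma_std


# Three 10x10 blocks in a row; the field of view straddles the first two.
example_blocks = [(0, 0, 10, 10), (10, 0, 20, 10), (20, 0, 30, 10)]
example_features = [(2.0, 1.0), (4.0, 3.0), (100.0, 100.0)]
result = fov_features(example_blocks, example_features, (5, 0, 15, 10))
```

Because the per-block features are computed once in advance, only this lightweight weighted sum has to be evaluated when the user's field of view moves.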
Step 4, quality evaluation is performed on the alternative coding schemes that meet the requirements. When several different combinations of resolution, frame rate and quantization step size can satisfy the corresponding storage or transmission requirements, the specific parameters of each combination are substituted into the calculation to obtain the quality evaluation result corresponding to each coding scheme, wherein the specific formula is as follows:
wherein α_s is a value that depends on the input Q_tar; Q(s, q, t) is the predicted video quality after encoding with the given parameters; s, t and q correspond to the resolution, frame rate and quantization step size respectively; S_tar, T_tar and Q_tar are the three coding parameters representing the actual resolution, frame rate and quantization step size used in transcoding; and S_ori, T_ori and Q_ori represent the corresponding parameters of the original high-quality video.
Step 5, after each candidate coding scheme has been evaluated and its quality evaluation prediction result obtained, the combination of resolution, frame rate and quantization step size for which the value of Q(s, q, t) is largest is selected as the final coding scheme.
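The selection in step 5 amounts to taking the maximum over the candidate schemes, and a minimal sketch follows. The type CodingScheme, the function select_scheme and the candidate values are hypothetical. The quality model is passed in as a function because its formula is not reproduced in the text. The toy_quality function is a placeholder used only to make the example runnable and is not the model described here.

```python
from typing import Callable, Iterable, NamedTuple, Tuple


class CodingScheme(NamedTuple):
    resolution: int     # S_tar, here taken as pixels per frame
    frame_rate: float   # T_tar, frames per second
    qstep: float        # Q_tar, quantization step size


def select_scheme(
    candidates: Iterable[CodingScheme],
    predict_quality: Callable[[CodingScheme], float],
) -> Tuple[CodingScheme, float]:
    """Return the candidate with the highest predicted quality and its score."""
    scored = [(predict_quality(c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


def toy_quality(scheme: CodingScheme) -> float:
    # Placeholder only: prefers a smaller quantization step size.
    return -scheme.qstep


# Hypothetical candidates that all satisfy the same bit-rate budget.
candidates = [
    CodingScheme(1920 * 1080, 30.0, 20.0),
    CodingScheme(3840 * 1920, 30.0, 40.0),
    CodingScheme(3840 * 1920, 15.0, 16.0),
]
best, score = select_scheme(candidates, toy_quality)
```

Keeping the predictor as a parameter means the same selection code can be reused once the feature-dependent model of steps 3 and 4 is available.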
Through the above steps, the quality evaluation result for every coding scheme combination that meets the transcoding requirements can be obtained, and the coding scheme with the best quality can then be selected. Without performing any encoding, the method evaluates the quality loss that the coding parameters cause to the original video using only the coding parameters and the features of the original high-quality video. In addition, the block-based feature extraction and calculation strategy allows fast calculation for user fields of view that appear at random positions in practical applications, so the method is better suited to immersive video viewing with a high degree of freedom and to high-quality panoramic video transmission.