Background
Image Quality Assessment (IQA) is a fundamental problem in image processing and is key to many applications, such as perceptually optimal encoder design, image communication, and image restoration. IQA methods fall into three categories according to the availability of the original image: full-reference IQA (FR-IQA), no-reference IQA (NR-IQA), and reduced-reference IQA (RR-IQA, also called semi-reference IQA). FR-IQA requires the entire original image, NR-IQA requires no information from the original image, and RR-IQA requires partial information from the original image. For RR-IQA, the core problem is feature extraction. Image features may be local or global. Global features describe the visual characteristics of the entire image. For example, in Wang, Z., Wu, G., Sheikh, H.R., Simoncelli, E.P., et al.: 'Quality-aware images', IEEE Trans. Image Process., 2006, 15, (6), pp. 1680-1689, the WNISM method models the distribution of steerable pyramid coefficients with a generalized Gaussian density function and uses the estimated distribution parameters as image features. To further improve on WNISM, Li, Q., Wang, Z.: 'Reduced-reference image quality assessment using divisive normalization-based image representation', IEEE Journal of Selected Topics in Signal Processing, 2009, 3, (2), pp. 202-211, introduces a divisive normalization transform to remove the correlation between coefficients. Similar methods are used in other references. Because the human visual system is multi-channel in nature, it has also been suggested to use normalized histograms based on geometric coefficients of the image representation.
Local features, on the other hand, describe the visual characteristics of small patches of the image. For example, drawing on information theory, RRED, proposed by Soundararajan, computes the local entropy of the wavelet coefficients in each 3 × 3 block as an image feature for measuring image quality. In Shnayderman, A., Gusev, A., Eskicioglu, A.: 'An SVD-based gray-scale image quality measure for local and global assessment', IEEE Trans. Image Process., 2006, 15, (2), pp. 422-429, singular value decomposition (SVD) is used to extract the singular values of 8 × 8 pixel blocks of an image as image features for quantifying image quality. From the viewpoint of fractal analysis, SPCRM has also been proposed, which measures the difference in local regularity of phase congruency between the reference and distorted images. In Xu, Y., Liu, D., Quan, Y., et al.: 'Fractal analysis for reduced reference image quality assessment', IEEE Trans. Image Process., 2015, 24, (7), pp. 2098-2109, the distortion in each frequency band is taken to cause its own quality degradation, depending on the sensitivity of the human eye to that band, and the local fractal dimension is used to calculate the visual information degradation on each frequency band separately. RR-IQA aims to use as little reference data as possible while achieving high prediction accuracy, so these methods have the disadvantage of requiring a relatively large amount of reference information.
Although the global image features described above perform reasonably well for RR-IQA, some problems remain. One problem is that some features used by RR-IQA metrics, such as those of the SRRM method, are extracted directly from the image without taking into account that not all distortions are perceptible to the human visual system. Another problem is that some image features used by RR-IQA metrics are not robust across distortion types: metrics such as WNISM perform well on images sharing the same distortion type but poorly when multiple distortion types are involved. It is therefore necessary to provide an image evaluation method that is consistent across different distortion types.
Disclosure of Invention
The invention overcomes the defects of the prior art and solves the following technical problem: providing a semi-reference image quality evaluation method based on the spectral residual that improves the consistency of semi-reference image quality evaluation.
To solve the above technical problem, the invention adopts the following technical scheme: a semi-reference image quality evaluation method based on the spectral residual, comprising the following steps: S101, calculating the residual spectrum regularity value RSR(Id) of an image Id to be evaluated; S102, calculating the residual spectrum regularity value RSR(Ir) of a reference image Ir; S103, calculating the RSRM value of the image to be evaluated and the reference image by the formula $RSRM(I_r, I_d) = \| RSR(I_r) - RSR(I_d) \|_1$; S104, evaluating the image quality according to the RSRM value, wherein a smaller RSRM value indicates better image quality. The residual spectrum regularity value RSR of the image to be evaluated and of the reference image is calculated as follows: calculating the gradient magnitude G(I) of the image I; calculating the wavelet coefficients {DWT(I), DWT(G(I))} of the images I and G(I); calculating the spectral residual SR of each component of the wavelet coefficients {DWT(I), DWT(G(I))}; calculating the saliency value SM of each component of the wavelet coefficients {DWT(I), DWT(G(I))}; and encoding the saliency value SM of each component by its fractal dimension to obtain the residual spectrum regularity value RSR, by the formula $RSR(I) = \{ FD[ SM( DWT(I), DWT(G(I)) ) ] \}$, where DWT denotes the wavelet coefficients and FD denotes the fractal dimension.
The spectral residual SR of each component of the wavelet coefficients {DWT(I), DWT(G(I))} is calculated by the formula:
$$SR(f) = L(f) - h_n(f) * L(f)$$
where $L(f) = \log(A(f))$ denotes the log spectrum, $A(f) = \mathrm{abs}(F(I(x)))$ denotes the amplitude spectrum of the image I, $h_n$ denotes an averaging filter of size n in the frequency domain, $*$ denotes the convolution operation, F denotes the Fourier transform, x denotes the spatial domain of the image, and f denotes the frequency domain of the image.
In particular, $h_n$ is an averaging filter of size 3 in the frequency domain.
The saliency value SM of each component of the wavelet coefficients {DWT(I), DWT(G(I))} is calculated by the formula:
$$SM(x) = g(x) * \left| F^{-1}\left[ \exp\left( SR(f) + i\,P(f) \right) \right] \right|^2$$
where g(x) denotes a two-dimensional Gaussian filter in the spatial domain, $F^{-1}$ denotes the inverse Fourier transform, $P(f) = \mathrm{angle}(F(I(x)))$ denotes the phase spectrum of the image I, and $*$ denotes the convolution operation.
The gradient magnitude G(I) is calculated by convolving the image I with gradient operators.
The wavelet coefficients are calculated by a wavelet transform in which I denotes the image, x and y denote the image coordinates, the wavelet function is the "db2" wavelet of the Daubechies wavelet family, m and n denote the wavelet scales, i and j denote the pixel locations, and M and N denote the size of the image.
Compared with the prior art, the invention has the following beneficial effects. The invention evaluates image quality using wavelets, spectral residuals and fractal dimensions. Wavelets mimic the multi-channel structure of the human visual system, the spectral residual (SR) indicates the saliency of an image, and fractal analysis encodes the SR as an image feature. First, image sub-bands are extracted by the wavelet transform; then SR and SM are computed to represent the saliency of each sub-band; then the fractal dimension is used to measure the irregularity of each SM; finally, all the computed fractal dimensions are concatenated into a feature vector, and the L1 norm of the difference between the feature vector of the image to be evaluated and that of the reference image gives the RSRM value of the image to be evaluated. The method is also more consistent across different distortion types.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments; all other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a semi-reference image quality evaluation method based on the spectral residual, which comprises the following steps:
S101, calculating the residual spectrum regularity value RSR(Id) of the image Id to be evaluated;
S102, calculating the residual spectrum regularity value RSR(Ir) of the reference image Ir;
S103, calculating the RSRM (residual spectrum regularity similarity) value of the image to be evaluated and the reference image by the formula:
$$RSRM(I_r, I_d) = \| RSR(I_r) - RSR(I_d) \|_1$$
S104, evaluating the image quality according to the RSRM value, wherein a smaller RSRM value indicates better image quality.
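Step S103 is an L1 distance between two feature vectors, which can be sketched in a few lines. The function name `rsrm` and the sample vectors below are illustrative assumptions, not values from the embodiment.

```python
def rsrm(rsr_ref, rsr_dist):
    """L1 norm of the difference between two RSR feature vectors.

    A smaller value means the evaluated image is closer to the reference.
    """
    if len(rsr_ref) != len(rsr_dist):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(r - d) for r, d in zip(rsr_ref, rsr_dist))


# Hypothetical 4-dimensional feature vectors, for illustration only.
reference = [2.31, 2.40, 2.52, 2.18]
mild = [2.30, 2.42, 2.50, 2.20]
severe = [2.10, 2.65, 2.80, 1.95]

print(rsrm(reference, mild), rsrm(reference, severe))
```

Only the reference feature vector, not the reference image itself, has to be available where the distorted image is scored.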
Taking an image I as an example, its RSR (regularity of residual spectrum) value is calculated as follows:
(1) calculating the gradient magnitude G(I) of the image I;
wherein the gradient magnitude G(I) is calculated by formula (1-1), in which the image I is convolved with gradient operators.
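As a minimal sketch of step (1): the text does not preserve which gradient operators formula (1-1) uses, so the Sobel kernels below are an assumption, as are the helper names and the edge-replication padding.

```python
import numpy as np

# Assumption: Sobel kernels. The text does not say which operators formula (1-1) uses.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T


def convolve2d_same(image, kernel):
    """2-D convolution with edge replication; output has the same shape as the input."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    flipped = kernel[::-1, ::-1]  # true convolution flips the kernel
    rows, cols = image.shape
    out = np.zeros(image.shape, dtype=float)
    for di in range(kh):
        for dj in range(kw):
            out += flipped[di, dj] * padded[di:di + rows, dj:dj + cols]
    return out


def gradient_magnitude(image):
    """G(I) = sqrt(gx^2 + gy^2) from horizontal and vertical derivative responses."""
    image = np.asarray(image, dtype=float)
    gx = convolve2d_same(image, SOBEL_X)
    gy = convolve2d_same(image, SOBEL_Y)
    return np.sqrt(gx ** 2 + gy ** 2)
```

Any other derivative kernel pair can be substituted without changing the rest of the pipeline.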
(2) calculating the wavelet coefficients {DWT(I), DWT(G(I))} of the images I and G(I);
the wavelet coefficients DWT are calculated by formula (2-1), in which I denotes the image, x and y denote the image coordinates, the wavelet function is the "db2" wavelet of the Daubechies wavelet family, m and n denote the wavelet scales, i and j denote the pixel locations, and M and N denote the size of the image.
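One way to sketch step (2) is a separable 2-D decomposition with the four-tap db2 filters. The periodic border extension, the function names and the sub-band ordering are assumptions made for this illustration; the embodiment does not state how borders are handled, and a wavelet library with a different extension mode would give different border coefficients.

```python
import numpy as np

SQ3 = np.sqrt(3.0)
# db2 low-pass (scaling) filter and the matching high-pass (wavelet) filter.
LO = np.array([1 + SQ3, 3 + SQ3, 3 - SQ3, 1 - SQ3]) / (4 * np.sqrt(2.0))
HI = np.array([LO[3], -LO[2], LO[1], -LO[0]])


def analyze_axis(data, taps, axis):
    """Filter along one axis with periodic extension, then keep every second sample."""
    out = sum(t * np.roll(data, -n, axis=axis) for n, t in enumerate(taps))
    index = [slice(None)] * data.ndim
    index[axis] = slice(0, None, 2)
    return out[tuple(index)]


def dwt2_level(image):
    """One decomposition level: a low-pass band and three high-pass bands."""
    lo_rows = analyze_axis(image, LO, axis=1)
    hi_rows = analyze_axis(image, HI, axis=1)
    ll = analyze_axis(lo_rows, LO, axis=0)
    lh = analyze_axis(lo_rows, HI, axis=0)
    hl = analyze_axis(hi_rows, LO, axis=0)
    hh = analyze_axis(hi_rows, HI, axis=0)
    return ll, (lh, hl, hh)


def dwt2(image, levels):
    """Return the coarsest low-pass band and a list of 3 high-pass bands per level."""
    approx = np.asarray(image, dtype=float)
    details = []
    for _ in range(levels):
        approx, bands = dwt2_level(approx)
        details.append(bands)
    return approx, details
```

With periodic extension the side lengths must stay even at every level, so a 64 × 64 input supports the five levels used in the embodiment.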
(3) calculating the spectral residual SR of each component of the wavelet coefficients {DWT(I), DWT(G(I))};
the spectral residual SR of each component of the wavelet coefficients {DWT(I), DWT(G(I))} is calculated by the formula:
$$SR(f) = L(f) - h_n(f) * L(f) \qquad (3\text{-}1)$$
In formula (3-1), $L(f) = \log(A(f))$ denotes the log spectrum, $A(f) = \mathrm{abs}(F(I(x)))$ denotes the amplitude spectrum of the image I, $h_n$ denotes an averaging filter of size n in the frequency domain, $*$ denotes the convolution operation, F denotes the Fourier transform, x denotes the spatial domain of the image, and f denotes the frequency domain of the image. The Fourier transform converts the spatial-domain representation of the image into a frequency-domain representation.
(4) calculating the saliency value SM of each component of the wavelet coefficients {DWT(I), DWT(G(I))};
the saliency value SM of each component of the wavelet coefficients {DWT(I), DWT(G(I))} is calculated by the formula:
$$SM(x) = g(x) * \left| F^{-1}\left[ \exp\left( SR(f) + i\,P(f) \right) \right] \right|^2 \qquad (4\text{-}1)$$
In formula (4-1), g(x) denotes a two-dimensional Gaussian filter in the spatial domain, $F^{-1}$ denotes the inverse Fourier transform, $P(f) = \mathrm{angle}(F(I(x)))$ denotes the phase spectrum of the image I, and $*$ denotes the convolution operation.
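Steps (3) and (4) can be sketched together for one sub-band. The sketch assumes the usual spectral-residual form, SR(f) = L(f) − h_n(f) * L(f) with a 3 × 3 averaging filter, followed by SM(x) = g(x) * |F⁻¹[exp(SR(f) + i·P(f))]|². The wrap-around borders, the Gaussian width `sigma`, the small `eps` inside the logarithm and all function names are assumptions not given in the text.

```python
import numpy as np


def box_filter_3x3(a):
    """3x3 averaging filter (h_n with n = 3), wrapping around at the borders."""
    return sum(np.roll(np.roll(a, di, axis=0), dj, axis=1)
               for di in (-1, 0, 1) for dj in (-1, 0, 1)) / 9.0


def gaussian_kernel_fft(shape, sigma):
    """Frequency response of a circular 2-D Gaussian g(x) centred at the origin."""
    ys = np.fft.fftfreq(shape[0]) * shape[0]  # signed pixel offsets 0, 1, ..., -1
    xs = np.fft.fftfreq(shape[1]) * shape[1]
    g = np.exp(-(ys[:, None] ** 2 + xs[None, :] ** 2) / (2.0 * sigma ** 2))
    return np.fft.fft2(g / g.sum())


def spectral_residual_saliency(band, sigma=2.5, eps=1e-8):
    """Return (SR, SM) for one wavelet sub-band; sigma and eps are assumed values."""
    band = np.asarray(band, dtype=float)
    spectrum = np.fft.fft2(band)
    log_amplitude = np.log(np.abs(spectrum) + eps)             # L(f); eps avoids log(0)
    phase = np.angle(spectrum)                                 # P(f)
    residual = log_amplitude - box_filter_3x3(log_amplitude)   # SR(f)
    recon = np.fft.ifft2(np.exp(residual + 1j * phase))
    saliency = np.abs(recon) ** 2
    # Circular convolution with the Gaussian, carried out in the frequency domain.
    smoothed = np.fft.ifft2(np.fft.fft2(saliency) * gaussian_kernel_fft(band.shape, sigma))
    return residual, smoothed.real
```

The function would be applied to each of the sub-bands of DWT(I) and DWT(G(I)) in turn.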
(5) encoding the saliency value SM of each component by its fractal dimension to obtain the residual spectrum regularity value RSR, by the formula:
$$RSR = \{ FD[ SM( DWT(I), DWT(G(I)) ) ] \} \qquad (5\text{-}1)$$
In formula (5-1), FD denotes the fractal dimension. In this embodiment the fractal dimension is calculated with the differential box-counting (DBC) algorithm, which, compared with other algorithms, is accurate, fast and efficient. DBC treats an image I(x, y) of size M × M as a set of 3D points {(x, y, z) | z = I(x, y)}, where (x, y) denotes the position in the plane and z denotes the gray value. The xy plane is divided into grids of size s × s, and on each grid stands a column of boxes of size s × s × h, where h denotes the height of each box and satisfies G/h = M/s, with G the maximum gray value of the image. With the scale r = s/M, if the minimum and maximum gray values of the (i, j)-th grid fall into the k-th and l-th boxes respectively, the number of boxes needed to cover that grid is
$$n_r(i, j) = l - k + 1$$
and the number of boxes required to cover the entire image is
$$N_r = \sum_{i,j} n_r(i, j)$$
The theoretical value of the fractal dimension is
$$FD = \lim_{r \to 0} \frac{\log N_r}{\log(1/r)}$$
In practice, however, FD is generally estimated as the slope of a linear fit of $\log N_r$ against $\log(1/r)$ over several values of r.
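The slope-fitting estimate can be sketched as follows. The set of grid sizes, the number of gray levels and the use of `floor` to index boxes are assumptions for illustration; the embodiment does not list them, and a saliency map would first have to be scaled to the assumed gray range.

```python
import numpy as np


def dbc_fractal_dimension(image, gray_levels=256, sizes=(2, 4, 8, 16)):
    """Differential box-counting estimate of the fractal dimension of an M x M image.

    Assumes gray values in [0, gray_levels) and grid sizes that divide M.
    """
    image = np.asarray(image, dtype=float)
    m = image.shape[0]
    if image.shape != (m, m):
        raise ValueError("expects a square M x M image")
    log_inv_r, log_n = [], []
    for s in sizes:
        if m % s != 0:
            raise ValueError("each grid size must divide the image size")
        h = s * gray_levels / m                      # box height from G/h = M/s
        blocks = image.reshape(m // s, s, m // s, s)
        k = np.floor(blocks.min(axis=(1, 3)) / h)    # box holding the minimum gray value
        l = np.floor(blocks.max(axis=(1, 3)) / h)    # box holding the maximum gray value
        n_total = (l - k + 1).sum()                  # N_r = sum of n_r(i, j)
        log_inv_r.append(np.log(m / s))              # log(1/r) with r = s/M
        log_n.append(np.log(n_total))
    slope, _ = np.polyfit(log_inv_r, log_n, 1)       # linear fit; the slope estimates FD
    return float(slope)
```

A perfectly flat image gives a slope of 2, and rougher gray-level surfaces move the estimate toward 3.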
By applying RSRM to the four largest databases, we found that the best results are obtained when the number of wavelet scales is about 5; Fig. 1 shows the SROCC as a function of the number of scales for the four databases. Therefore, the invention sets the number of scales to 5, so that the RSR feature has 32 dimensions. More specifically, one fractal dimension feature is obtained from each SR. Since the wavelet decomposition has 5 scales (i.e., J = 5), there are 32 wavelet coefficient matrices, comprising 2 low-pass components and 30 high-pass components. Each coefficient matrix yields one SR, so the image feature has 32 × 1 = 32 dimensions. The encoded RSR is therefore a 32-dimensional feature vector.
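The count of 32 can be checked with a short calculation, assuming a standard separable 2-D decomposition that yields one low-pass sub-band plus three high-pass sub-bands per scale, applied to both I and G(I):

```latex
\begin{aligned}
\text{sub-bands per image} &= 1 + 3J = 1 + 3 \cdot 5 = 16,\\
\text{low-pass components} &= 2 \times 1 = 2,\\
\text{high-pass components} &= 2 \times 3J = 2 \times 15 = 30,\\
\dim(RSR) &= 2 \times 16 = 32 .
\end{aligned}
```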
In practice, it is generally the image on the client side that is evaluated: the image to be evaluated is the client's copy of the source material, and the copy of the source material on the server is used as the reference image.
In step S103, the calculated RSRM value is a specific number, and the smaller the RSRM value is, the better the image quality is.
To verify the effectiveness of the RSRM value for image evaluation, the average RSRM value over all image samples in the LIVE database was calculated. As shown in Fig. 2, the predicted value increases as the degree of distortion increases, which indicates that the evaluation method of the invention tracks image quality.
Furthermore, the inventors used the TID2013, TID2008, CSIQ and LIVE image databases to evaluate the proposed RSRM metric. Table 1 summarizes the four databases. In total there are 6445 distorted images, each with a subjective rating given as a mean opinion score (MOS) or a difference mean opinion score (DMOS).
TABLE 1 four image databases
To evaluate prediction performance, the inventors calculated five common performance indicators for the RSRM metric: the Pearson linear correlation coefficient (PLCC), the Spearman rank-order correlation coefficient (SROCC), the Kendall rank-order correlation coefficient (KROCC), the root mean square error (RMSE), and the mean absolute error (MAE). Higher PLCC, SROCC and KROCC values and lower RMSE and MAE values indicate a better IQA metric. These five indicators were compared with those of other representative RR-IQA metrics, namely SSRM (spatial regularity similarity), SRRM (Radon projection regularity similarity), WNISM (wavelet-domain natural image statistic model), RR-SSIM, SPCRM (phase regularity similarity) and SVD (singular value decomposition), and with those of two FR-IQA (full-reference) metrics: SSIM (structural similarity) and PSNR (peak signal-to-noise ratio).
The objective values produced by different image quality evaluation models have different ranges (for example, SSIM lies in (0, 1), whereas the subjective scores of the standard image databases lie in (0, 100)), so the two are clearly not on the same scale. For comparison, the objective values are mapped to the subjective scores by a nonlinear function, formula (6-1), to obtain predicted values.
In formula (6-1), x denotes the objective value, f(x) denotes the predicted value, and the fitting parameters β1, β2, …, β5 are determined by the fitting process.
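The mapping function itself is not reproduced in this text. A five-parameter logistic that is widely used for this purpose in IQA studies is sketched below as an assumption; it may differ from the function the embodiment uses. The name `logistic5` and the sample parameter values are illustrative only.

```python
import math


def logistic5(x, b1, b2, b3, b4, b5):
    """Assumed five-parameter logistic: a sigmoid centred at b3 plus a linear term."""
    return b1 * (0.5 - 1.0 / (1.0 + math.exp(b2 * (x - b3)))) + b4 * x + b5


# Hypothetical parameters, for illustration only; real values come from fitting
# the objective scores to the subjective scores of a database.
params = (40.0, 2.0, 1.5, 0.0, 50.0)
predicted = [logistic5(x, *params) for x in (0.5, 1.5, 2.5)]
print(predicted)
```

In practice the five parameters would be found with a nonlinear least-squares routine applied to the (objective value, subjective score) pairs.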
Then the five performance indicators are calculated for each metric. RMSE denotes the root mean square error: the squared difference between the predicted value and the subjective value of each image is computed, the squares are averaged, and the square root is taken:
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (X_i - Y_i)^2}$$
where $X_i$ denotes the predicted value of the i-th image, $Y_i$ denotes its subjective value, and n denotes the total number of test images.
MAE denotes the mean absolute error: the absolute difference between the predicted value and the subjective value of each image is computed, and the absolute differences are summed and averaged:
$$MAE = \frac{1}{n} \sum_{i=1}^{n} | X_i - Y_i |$$
PLCC denotes the Pearson linear correlation coefficient, calculated as:
$$PLCC = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$
where $X_i$ denotes the predicted value of the i-th image, $Y_i$ denotes its subjective value, $\bar{X}$ denotes the average of the predicted values of all images, and $\bar{Y}$ denotes the average of the subjective values of all images.
SROCC denotes the Spearman rank-order correlation coefficient, calculated as:
$$SROCC = 1 - \frac{6 \sum_{i=1}^{n} (RX_i - RY_i)^2}{n (n^2 - 1)} \qquad (10\text{-}1)$$
In formula (10-1), the objective values and the subjective values of the images are each sorted in the same order (both descending or both ascending), $RX_i$ and $RY_i$ denote the ranks of the i-th objective value and the i-th subjective value in their respective sequences, and n denotes the total number of test images. KROCC is similar to SROCC.
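The five indicators can be sketched in plain Python from their textbook definitions. The KROCC variant shown (Kendall tau-a, concordant minus discordant pairs over all pairs) and the no-ties assumption in the rank helpers are choices made for this illustration; the text does not give a KROCC formula.

```python
import math


def rmse(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def mae(x, y):
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def plcc(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x)) * math.sqrt(sum((b - my) ** 2 for b in y))
    return num / den


def ranks(values):
    """Rank positions 1..n in ascending order (assumes no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def srocc(x, y):
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def krocc(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs (assumes no ties)."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)
```

Here `x` holds the predicted values after the nonlinear mapping and `y` the subjective scores; the rank-based indicators are unaffected by any monotonic mapping.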
Table 2 lists the experimental results of RSRM on the four databases above together with those of the other metrics; the best RR-IQA results are underlined. As can be seen from Table 2, RSRM performs best on all four databases among the RR-IQA metrics. Compared with the FR-IQA metrics, RSRM performs far better than PSNR on all four databases, while on the TID2013 and LIVE databases it is slightly inferior to SSIM. Table 2 also provides the average PLCC, SROCC and KROCC results over the four databases. Two averaging schemes are used: in the first, the performance indicators are averaged directly; in the second, each indicator is weighted by the size of its database. On average, RSRM performs best among both the RR-IQA and the FR-IQA metrics.
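The two averaging schemes differ only in their weights, as the following sketch shows. The function names and the sample scores and database sizes are hypothetical and are not taken from Table 2.

```python
def direct_average(scores):
    """Scheme 1: every database counts equally."""
    return sum(scores) / len(scores)


def size_weighted_average(scores, sizes):
    """Scheme 2: each database is weighted by its number of distorted images."""
    return sum(s * n for s, n in zip(scores, sizes)) / sum(sizes)


# Hypothetical SROCC values and database sizes, for illustration only.
scores = [0.80, 0.90]
sizes = [100, 300]
print(direct_average(scores), size_weighted_average(scores, sizes))
```

The weighted scheme keeps a small database from having the same influence on the average as a much larger one.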
TABLE 2 composite index of individual image quality evaluation metrics in four image databases
The superior performance of RSRM is confirmed not only by the typical performance indicators but also by scatter plots of subjective against objective scores. Figs. 3 to 10 show scatter plots of the subjective scores against the objective scores obtained by the IQA metrics (RSRM, SRRM, WNISM, SPCRM, SSRM, SVD, SSIM and PSNR) on the CSIQ database. In each plot, the horizontal axis represents the objective value of the metric, the vertical axis represents the subjective DMOS value of the image, each point represents one distorted image, and the curve is obtained from formula (6-1). As can be seen from Figs. 3 to 10, the RSRM scatter plot is more compact than those of the other IQA metrics, which means that the objective scores obtained by the proposed RSRM metric correlate more strongly with the subjective scores. In addition, Table 2 gives the length of the image feature used by each IQA metric. The RSRM feature is longer than those of WNISM, SRRM and SSRM, but shorter than those of SVD, SSIM and PSNR.
In addition, the four databases together contain 52 distortion-type groups, and experiments were performed on each group. Table 3 lists the SROCC results, with the best results underlined. Table 3 also gives the number of distortion types on which each metric performs best. As can be seen from Table 3, SVD is better than the other RR-IQA metrics on individual distortion types. However, as noted above, RSRM is much better than SVD on the four databases as a whole. To visualize the comparison between SVD and RSRM, Fig. 11 shows scatter plots of SVD and RSRM for the different distortions in the TID2013 database. For SVD, the points belonging to a single distortion type lie close to one another, whereas for RSRM the points of every distortion type lie close to the common fitted curve. That is, the behavior of SVD varies with the distortion type, while RSRM is highly consistent across distortion types.
TABLE 3 comparison of SROCC values for RR-IQA metrics at different distortion types
In summary, the invention provides an RSR feature extraction framework for RR-IQA based on wavelets, the spectral residual and the fractal dimension, and, building on RSR feature extraction, a semi-reference image quality evaluation method based on the spectral residual. The wavelets simulate the multi-channel structure of the human visual system, the spectral residual represents the importance of a local region to the human visual system, and the fractal dimension encodes the spectral residual. RSRM was evaluated on the four largest image databases (TID2013, TID2008, CSIQ and LIVE), and the experimental results show that RSR is not only related to the human visual system but also robust to various image distortions. Compared with the other metrics, the semi-reference image quality evaluation method provided by the invention therefore achieves better performance indicators and better consistency across different distortion types, so it can be widely applied in the field of image processing.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.