This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
IEEE SYSTEMS JOURNAL
1
Stereoscopic Image Quality Assessment Based on
Depth and Texture Information
Xingang Liu, Member, IEEE, Kai Kang, and Yinbo Liu
Abstract—With the rapid growth of multimedia applications over networks in recent years, users have put forward much higher requirements for multimedia signals, particularly for the quality of experience (QoE) of image/video signals, than before. Objective image quality assessment (IQA) plays an important role in the development of compression standards and various multimedia applications. However, the quality assessment of stereoscopic (3-D) images faces new challenges, such as depth perception, virtual view synthesis, and asymmetric stereo compression. In this paper, we propose a new full-reference (FR) 3-D IQA method to measure the quality of distorted images. The properties of the depth, structure, and gradient components are taken into account to establish the proposed metric. The experimental results show that the proposed metric is highly consistent with subjective test scores compared with the existing related metrics. In addition, the main significance of the proposed metric is that it not only effectively evaluates the quality of 3-D images but is also effective for measuring the quality of 2-D images.
Index Terms—Depth map, full-reference (FR), image quality
assessment (IQA), quality of experience (QoE), three-dimensional
(3-D) image.
I. INTRODUCTION
In recent years, with the high-speed development of communication systems, multimedia applications over networks have grown rapidly in many fields, such as live broadcast, video on demand, real-time conferencing, and so on [1], [2]. People can access the network on any device, anywhere, and at any time. Consequently, users have put forward much higher requirements for the quality of multimedia signals than before; in particular, the video/image quality at the terminal is becoming the representative measure among the abundant emerging aspects of multimedia quality of experience (QoE). Currently, 3-D media services are widely used
Manuscript received November 13, 2014; revised February 23, 2015;
accepted March 26, 2015. The work was supported by the Natural Science
Foundation of China under Grant G0501020161301268, and also supported
by the Fundamental Research Funds for the Central Universities under Grant
ZYGX2015Z009.
X. Liu is with the School of Electronic Engineering, University of Electronic
Science and Technology of China, Chengdu 610051, China (e-mail: hanksliu@
uestc.edu.cn).
K. Kang was with the School of Electronic Engineering, University of
Electronic Science and Technology of China, Chengdu 610051, China. He is
now with the Huawei Research Center, Chengdu 611731, China.
Y. Liu was with the School of Electronic Engineering, University of Electronic Science and Technology of China, Chengdu 610051, China. He is now
with the Audio and Video Technology Platform Department, ZTE Corporation,
Shenzhen 518057, China.
Digital Object Identifier 10.1109/JSYST.2015.2478119
in areas such as homes, workplaces, and public spaces. Considered the most popular 3-D media services, 3DTV and free-viewpoint TV are expected to succeed high-definition TV (HDTV), bringing visually lifelike graphical effects and an extended visual sensation to end users [3], [4]. The delivery
chain of the whole 3DTV transmission system includes five stages: content production, encoding, transmission, decoding at the receiver side, and display. However, any stage may reduce the quality of the 3-D image signal [5]. Moreover, an error generated at a certain stage may propagate through the subsequent stages of the delivery chain [6]. In addition, the compression technologies for 3-D images are imperfect at present. Therefore, 3-D image quality assessment (IQA) is an extremely important step in the design and optimization of 3-D image processing systems over wireless networks. IQA, an evaluation technology for 2-D/3-D images that pass through a transmission or processing system, helps maintain the quality of media services by monitoring the quality of 2-D/3-D image signals [7].
Over the past decades, IQA for digital image signals has been one of the basic and challenging problems in the field of image and video processing. Subjective IQA (SIQA) is known as the most accurate method to evaluate the quality of 2-D/3-D images, since it is psychologically based, using structured experimental designs and human participants to rate image quality [8], [9]. However, the realization of SIQA is complex and impractical for real systems. Therefore, a great deal of attention has focused on objective IQA (OIQA), a technology that establishes an objective function considering both physical aspects and psychological issues to evaluate the quality of digital image signals [10]. Compared with SIQA, OIQA is simple and convenient, and it can automatically predict visual quality in real time [11]–[13]. Therefore, many efforts have been made to evaluate 2-D video/image quality over the last few decades, and a great number of corresponding metrics have been proposed; some of them have been commercialized. However, OIQA for 3-D images is still at a preliminary stage [14].
Among the multiple 3-D image formats, such as stereoscopic image, multiview image, depth-image-based rendering (DIBR), and so on, the stereoscopic image is the simplest one. It consists of two views (commonly named the left and right views) captured by two closely located cameras, where the slight difference between the two images is regarded as the disparity. By making use of the disparity information, human
beings can perceive the sensation of depth in a stereoscopic image [15]–[17]. Nowadays, it is the most commercialized 3-D image format and is widely used in 3-D movie theaters, glass-type 3DTVs, and so on [18]–[20]. For this reason, our research is carried out on the basis of the stereoscopic image.
According to the availability of the reference image, OIQA metrics can be classified into full-reference (FR), no-reference (NR), and reduced-reference (RR) methods [21], [22]. FR methods compute the quality difference by comparing the pixel information between the original and distorted images [3], [23], [24]. RR methods use only some features of the image, and NR methods predict the test image quality without any reference information from the original image. In particular, FR OIQA techniques are based on the direct comparison of the test image with its distorted version. As the whole original information is available, FR metrics often return the most accurate and robust quality measurements. This paper focuses on FR IQA metrics.
According to the existing methods, we can conclude that block artifacts and other factors related to the human visual system (HVS), such as contrast, structure, blur, and so on, are the most important factors in designing and optimizing the performance of an OIQA metric [25], [26]. In this paper, a novel FR OIQA metric based on depth and texture information is proposed. First, the contrast, structure, luminance, and depth properties are taken into consideration; then, the distortion of the stereoscopic image caused by blocking artifacts is measured by using the original and gradient maps. Combining the two aspects, the final 3-D OIQA metric for stereoscopic images is established. Extensive simulations on varied test databases are carried out in our experimental part to confirm the correctness and effectiveness of our proposal.
The remainder of this paper is organized as follows:
Section II discusses the recent related research works of the
OIQA. The proposed metric is described in Section III, and
Section IV presents the simulation results and performance
analysis. Finally, the conclusions and future work are given in
Section V.
II. RELATED WORKS
In the past few years, various 2-D OIQA models for different evaluation purposes have emerged rapidly and achieved good performance, owing to their capability of automatically predicting image quality in real time. These IQA methods can be grouped into three major categories [1], described as follows.
A. Pixel-Based Fidelity Method
This kind of method mainly measures the distortion or impairment in video/images by physically comparing the pixel values between the distorted and original image signals, ignoring human visual perception. Mean squared error (MSE) and peak signal-to-noise ratio (PSNR) are the representative quality assessments in this category, since they are mathematically easy to understand and simple to calculate in a pixel-based manner. However, without considering the perceptual information of human vision, these two quality assessments have some limitations in practical applications.
B. Psychophysical-Based Method
The properties and mechanisms of the HVS, such as luminance, contrast sensitivity, luminance masking, and color perception, are mainly used in this kind of method. These properties and mechanisms are obtained from research in physiology and psychology. Using them, each part of the mathematical model is established separately. The video quality metric (VQM) [27] and the perceptive VQM (PVQM) [3] are popular metrics of this kind.
C. Engineering-Based Method
The main idea of the engineering method is to extract and analyze certain features or artifacts in the image signal. Compared with the psychophysical method, this method builds the model at a holistic level to obtain the final IQA algorithm. In addition, it is much simpler and more effective than the psychophysical method. Examples of this kind of IQA method include structural similarity (SSIM) [13], the content-based metric (CBM) [1], and VQM [3].
It is obvious that the most reliable IQA metric should be built to mimic the HVS, because human beings are the ultimate receivers of visual perception in practical applications. Therefore, compared with the pixel-based fidelity method, the psychophysical and engineering methods are considered better for evaluating the quality of 2-D/3-D images. Combining the strengths of both methods, a new FR quality assessment metric is proposed in this paper. In the following paragraphs, we describe some well-known IQA metrics in detail.
The distinguished quality metric PSNR, which is the most commonly used objective metric in applications worldwide, is presented first. It is worth noting that PSNR is derived from the MSE, which is calculated from the summed differences of the luminance values between the original and distorted images. The working functions of PSNR are presented in (1) and (2), where X(i, j) and Y(i, j) are the corresponding pixel values in the original and distorted frames, respectively; M and N are the total numbers of pixels in the horizontal and vertical directions of a frame, respectively; and k is the number of bits used to describe one pixel. Thus
MSE = (1 / (M × N)) Σ_{i=1}^{M} Σ_{j=1}^{N} (X(i, j) − Y(i, j))²    (1)

PSNR = 10 log10 ((2^k − 1)² / MSE).    (2)
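For concreteness, (1) and (2) translate directly into a few lines of NumPy. This is our own sketch, not code from the paper; the function names and the default bit depth k = 8 are our assumptions.

```python
import numpy as np

def mse(x, y):
    """Mean squared error between two equally sized frames, per (1)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    return float(np.mean((x - y) ** 2))

def psnr(x, y, k=8):
    """PSNR in decibels, per (2); k is the number of bits per pixel."""
    m = mse(x, y)
    if m == 0:
        return float("inf")  # identical frames
    peak = (2 ** k - 1) ** 2
    return 10.0 * np.log10(peak / m)
```

For 8-bit images the peak value is 255, so two frames differing everywhere by the full dynamic range yield a PSNR of 0 dB.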
Unfortunately, PSNR merely describes the difference in pixel values at the same position between the original and distorted frames. As a consequence, the results obtained by the PSNR method deviate from the results of subjective video quality assessment (SVQA), as verified in many works [8], [11]. In addition, without taking human visual perception into consideration, PSNR has a weak correlation with subjective quality assessment (SQA) scores for evaluating the quality of 2-D/3-D images. Thus, objective quality
assessment (OQA) that is highly correlated with the SQA has
been widely studied.
A notable quality assessment method based on human visual perception, called SSIM, was proposed by Wang et al. [12]; it has become one of the key methods for image/video quality assessment owing to its accurate results and wide application fields. It is noteworthy that the HVS is highly specialized in extracting structural information from the viewing field rather than simple pixel errors. It is well known that the statistical properties of the natural visual environment have a vitally important effect on the cognition process of the HVS, which includes evolution, development, and adaptation. One property of natural visual signals is that they are highly structured: the pixels exhibit strong interdependencies, particularly when they are spatially close, and these dependencies carry vitally important information about the structure of the objects in the visual scene. SSIM is formed on the basis of this observation. In this algorithm, the structural information is independent of the luminance and contrast of an image. Hence, the image/video quality assessment can be separated into measurements of luminance distortion, contrast distortion, and structural distortion.
As shown in (3), X and Y are the original and distorted image signals, represented as X = {x_i | i = 1, 2, . . . , M} and Y = {y_i | i = 1, 2, . . . , M}, respectively, where i is the pixel index and M is the number of pixels in the image. Thus
SSIM(x, y) = [(2µ_x µ_y + c1)(2σ_xy + c2)] / [(µ_x² + µ_y² + c1)(σ_x² + σ_y² + c2)].    (3)
Equation (3) gives the function of SSIM, where µ_x and µ_y refer to the means of the signals X and Y, respectively; σ_x² and σ_y² denote the variances of X and Y, respectively; and σ_xy denotes the covariance between X and Y. c1 and c2 are small constants that take effect only when (µ_x² + µ_y²) or (σ_x² + σ_y²) is small. SSIM satisfies the following conditions:
• SSIM(x, y) = SSIM(y, x);
• SSIM(x, y) ≤ 1;
• SSIM(x, y) = 1 if and only if the distorted image pixel values are identical to the original ones.
Afterward, an improved algorithm named mean structural similarity (MSSIM) was published by Wang et al. in [12]. It should be noted that MSSIM is a block-based metric, as shown in (4), where x_j and y_j are the block contents at the jth block position of the original and distorted frames, respectively, and M is the number of blocks in the frame. Using this modified algorithm, MSSIM obtains experimental results much better correlated with subjective video quality scores than the previous metric in [9]. That is

MSSIM = (1 / M) Σ_{j=1}^{M} SSIM(x_j, y_j).    (4)
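To make (3) and (4) concrete, the following NumPy sketch computes SSIM per block and averages it over non-overlapping 8 × 8 blocks. The sketch is ours; the constants c1 and c2 follow the common choices (0.01 · 255)² and (0.03 · 255)², which the text does not specify.

```python
import numpy as np

def ssim_block(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """SSIM between two blocks, per (3)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def mssim(X, Y, bs=8):
    """Mean SSIM over non-overlapping bs x bs blocks, per (4)."""
    H, W = X.shape
    scores = [ssim_block(X[i:i + bs, j:j + bs], Y[i:i + bs, j:j + bs])
              for i in range(0, H - bs + 1, bs)
              for j in range(0, W - bs + 1, bs)]
    return float(np.mean(scores))
```

Note that the symmetry and upper-bound properties listed above hold by construction: every factor in (3) is symmetric in x and y, and numerator equals denominator exactly when the two blocks coincide.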
SSIM and MSSIM perform very well in quality evaluation, particularly for still images or slightly changing video frames. There are two important reasons for this performance. First, they consider not only the physical differences of the pixels but also the contrast and structure distortions. Second, the luminance, contrast, and structure distortions are mutually independent, avoiding interactions with each other. Hence, the metric has had a very important impact on the improvement of IQA metrics. However, its application to general image signals has obvious limitations. First of all, the basic coding unit of general video coding standards is the 8 × 8 subblock, so blocking effects always arise at the corresponding boundaries due to the lossy quantization operation; this blocking artifact is not detected well by SSIM, since its basic operation unit is also 8 × 8. Second, a fundamental problem of SSIM is that it is mainly designed for image signals, and the characteristics of video signals, such as motion vectors, the correlation of neighboring frames, etc., are not fully considered. Hence, it cannot give good enough quality evaluations, particularly for fast-changing image signals [28]. For these reasons, it is not suitable for the quality assessment of the more complex stereoscopic image.
With the increasing requirements for video/images in the last decade, quality assessment metrics for 3-D video/images have constantly emerged. PQM is a representative one for evaluating the quality of 3-D video [4]. It is a perceptual quality metric that can be applied to measure the distortion of 2-D/3-D video/images. It quantifies the distortions of luminance and contrast, which are calculated from the luminance values, means, and variances of the original and distorted views and the covariance between them. These two distortions are weighted by the mean of each block of the original view to obtain the final distortion of the entire frame. In [14], the authors proposed a perceptual FR quality assessment metric in which the binocular vision phenomenon that fuses and perceives the images from the two eyes is treated as a single perception for stereoscopic images. By analyzing the relationship between the MSE and the SSIM, an MSE-SSIM IQA metric was proposed by Tan et al. in [15], making full use of the variance of the original image and the MSE between the original and distorted images. The distortions of the contrast coefficient at the block level in the frame are taken into account. As shown in (5), the distortion is calculated as follows:
MSE-SSIM(O_i, D_i) = (2σ_{Oi}² + k1) / (2σ_{Oi}² + MSE(O_i, D_i) + k1)    (5)
where σ_{Oi}² is the block variance of the original frame, and MSE(O_i, D_i) is calculated from the summed differences of the luminance values between the original and distorted blocks. To avoid a mathematical logic error (a vanishing denominator), a constant k1 is inserted in (5). In fact, the variance factor is roughly considered a judge of the contrast of the frame signal, so the metric is effectively applied to evaluate the distortion levels of the contrast of a frame. Notice that the MSE-SSIM algorithm is content dependent in the sense that it relies on the statistics of the original frame and on the change of the luminance component between the original and distorted frames.
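A minimal per-block sketch of (5) follows. The sketch is ours; the value of k1 is an assumption, as the text only requires it to be a small positive constant.

```python
import numpy as np

def mse_ssim_block(o, d, k1=1e-4):
    """MSE-SSIM for one block, per (5): blocks with high variance
    (strong contrast) tolerate a larger MSE before the score drops."""
    o = np.asarray(o, dtype=np.float64)
    d = np.asarray(d, dtype=np.float64)
    num = 2.0 * o.var() + k1
    den = 2.0 * o.var() + np.mean((o - d) ** 2) + k1
    return num / den
```

The score is 1 for identical blocks and decreases toward 0 as the block MSE grows relative to the original block's variance.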
Fig. 1. Schematic overview of the proposed metric.
MSE-SSIM is one of the representative algorithms for 2-D IQA and shows good quality evaluation results. A characteristic of this metric is that it is sensitive to slight changes in image degradation, and the error quantification extends from the pixel level right up to the sequence level.
III. DESCRIPTION OF THE PROPOSED QUALITY METRIC
Generally, a stereoscopic image consists of two views, i.e., the left and right views. The quality of each view plays an extremely important role in stereoscopic IQA, since users perceive the stereoscopic image based on each view. We treat binocular vision as the fusion of the images perceived by the two eyes into a single perception. The proposed IQA mainly quantifies the following two distortions: the distortion of the block content and the distortion of the block boundary information. The distortion of block content covers the luminance and contrast features, which are very important to the perception of image quality in the HVS. As for the distortion of block boundary information, the processes of motion estimation, transformation, and quantization in units of blocks deteriorate block boundaries, and the HVS is very sensitive to the boundary information of image signals. To obtain the degree of the distortions in the frame, we adopt the 8 × 8 block as the basic operation unit in this paper. The algorithm can be effectively applied to measure the distortion of stereoscopic images; moreover, it can also be used to evaluate the quality of 2-D images. The schematic overview of the proposed metric is illustrated in Fig. 1. The detailed procedure of the quality assessment is described in the following parts.
A. Image Segmentation
Recent studies have shown that, although the depth map contains less information about the image signal than the image view, it can effectively represent the key image features, such as boundaries, edges, and so on [17], [29]. Therefore, it is used in this paper as important edge information to enhance and optimize the proposed metric for 3-D image signals. The HVS has different sensations for different depth distances. In addition, the HVS is highly efficient at extracting structural information when we perceive depth in a 3-D image signal. Under these considerations, the image views are segmented into multiple depth planes.
Image segmentation plays a crucial role in many applications, such as image analysis and comprehension, computer vision, image coding, pattern recognition, and medical image analysis. Image segmentation methods are generally divided into two sorts: threshold methods based on the gray-level histogram and region-growing methods. Threshold methods have become the most popular for their low computation, simplicity, and stability. In classical threshold image segmentation, an image is usually segmented and simply sorted into object and background by setting a threshold. In this paper, we segment the image into different regions with the Otsu algorithm [17]. The steps of Otsu's method are described as follows.
First, the probability distribution of the image is obtained from its histogram as

p_i = x_i / X,  p_i ≥ 0,  Σ_{i=1}^{L} p_i = 1    (6)

where i represents the gray level, x_i is the number of pixels whose gray value equals i, X is the total number of pixels in the image, and p_i is the probability distribution in grayscale.
Then, the pixels are divided into two classes, object (O) and background (B); Otsu's method finds the threshold that minimizes the intraclass variance (equivalently, maximizes the interclass variance).
The class probabilities and class means are calculated by

ω_O = P(O) = Σ_{i=1}^{t} p_i = ω(t)    (7)

ω_B = P(B) = Σ_{i=t+1}^{L} p_i = 1 − ω(t)    (8)

µ_O = Σ_{i=1}^{t} i P(i|O) = Σ_{i=1}^{t} i p_i / ω_O = µ(t) / ω(t)    (9)

µ_B = Σ_{i=t+1}^{L} i P(i|B) = Σ_{i=t+1}^{L} i p_i / ω_B = (µ(L) − µ(t)) / (1 − ω(t))    (10)

where µ(t) = Σ_{i=1}^{t} i p_i and µ(L) = Σ_{i=1}^{L} i p_i; ω_O and ω_B represent the class probabilities of the object and background, respectively; L is the maximum gray level of the image, which equals 255 in optical imagery; and t is the threshold separating the background and object.
µ(L) is the total mean level of the original image, and µ(t) is the first-order cumulative moment of the histogram up to the tth level. It can easily be concluded that, for any choice of t, the following formulas hold:

ω_O µ_O + ω_B µ_B = µ_T    (11)

ω_O + ω_B = 1.    (12)
The variances of classes O and B are defined as σ_O² and σ_B². That is

σ_O² = Σ_{i=1}^{t} (i − µ_O)² P(i|O) = Σ_{i=1}^{t} (i − µ_O)² p_i / ω_O    (13)

σ_B² = Σ_{i=t+1}^{L} (i − µ_B)² P(i|B) = Σ_{i=t+1}^{L} (i − µ_B)² p_i / ω_B.    (14)

Fig. 2. (a) Original image "im17_l." (b) Identified depth planes of "im17_l."
The intraclass variance σ_1², the interclass variance σ_2², and the total variance of levels are calculated by

σ_1² = ω_O σ_O² + ω_B σ_B²    (15)

σ_2² = ω_O (µ_O − µ_T)² + ω_B (µ_B − µ_T)²    (16)

σ_T² = σ_1² + σ_2².    (17)
The parameter η(t) is introduced and defined as

η(t) = σ_2² / σ_T².    (18)

The optimization task is then transferred to

T = arg max_t η(t).    (19)
With the threshold t, the pixels in the range [1, . . . , t] constitute the object region, and the rest belong to the background.
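The whole procedure (6)–(19) reduces to a small histogram computation. The sketch below is our own NumPy rendering; it uses the standard closed form of the interclass variance, σ_2²(t) = (µ_T ω(t) − µ(t))² / (ω(t)(1 − ω(t))), which is algebraically equivalent to (16).

```python
import numpy as np

def otsu_threshold(image, levels=256):
    """Otsu's method: the threshold maximizing the interclass
    variance, per (6)-(19)."""
    hist = np.bincount(np.asarray(image, dtype=np.intp).ravel(),
                       minlength=levels)
    p = hist / hist.sum()                  # probability distribution, (6)
    omega = np.cumsum(p)                   # class probability omega(t), (7)
    mu = np.cumsum(np.arange(levels) * p)  # cumulative moment mu(t)
    mu_T = mu[-1]                          # total mean mu(L)
    denom = omega * (1.0 - omega)
    denom[denom == 0] = np.inf             # skip degenerate thresholds
    sigma2 = (mu_T * omega - mu) ** 2 / denom  # interclass variance, cf. (16)
    return int(np.argmax(sigma2))          # optimal threshold, (19)
```

On a bimodal image the returned threshold falls between the two modes, which is exactly the separation of object and background described above.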
Then, we segment the object and background area images into multiple depth planes. There are several ways to statistically analyze the image planes of the depth map; the histogram is utilized in this paper because it is very effective for analyzing and processing digital image signals. The range of the pixel values in the depth map is between 0 and 255. The generation of image planes based on the depth map is illustrated in Fig. 3.
Fig. 3. Perceptive vision of different depth planes of "im17_l." (a)–(c) Different planes of the original image. (d) Overall visual depth image of "im17_l."
According to Fig. 2(b), it is clear that the peak frequencies occur around several depth distances, such as 25, 160, and so on. The depth map can be segmented at these peaks. In this case, the image planes are defined as shown in Fig. 3(a)–(c), in increasing order of the depth values. The highlighted blocks in Fig. 3(d) are closer to the observer than the dark blocks. In addition, our statistical analysis shows that the
image plane closer to the viewer has a larger effect on the perception of the HVS.
Based on the characteristics of the histogram, the image planes are established. The number of image planes, obtained from the number of histogram peaks, varies with the image view. In general, the number of image planes ranges from 2 to 5 in our experiments.
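As an illustrative sketch of the plane construction (ours; the text does not give an explicit peak-picking rule, so the separation and relative-height parameters below are assumptions), each pixel of the depth map can be assigned to the nearest dominant histogram peak:

```python
import numpy as np

def segment_depth_planes(depth, min_sep=20, rel_height=0.2):
    """Split a depth map (values 0-255) into planes around its dominant
    histogram peaks; min_sep and rel_height are illustrative parameters."""
    hist = np.bincount(np.asarray(depth, dtype=np.intp).ravel(),
                       minlength=256)
    order = np.argsort(hist)[::-1]  # depth values by decreasing frequency
    peaks = []
    for g in order:
        if hist[g] < rel_height * hist[order[0]]:
            break                   # remaining depth values are too rare
        if all(abs(int(g) - p) >= min_sep for p in peaks):
            peaks.append(int(g))    # keep well-separated peaks only
    peaks.sort()
    # label each pixel with the index of its nearest peak
    d = np.asarray(depth, dtype=np.float64)
    dist = np.abs(d[..., None] - np.asarray(peaks, dtype=np.float64))
    return np.argmin(dist, axis=-1), peaks
```

For a histogram like that of Fig. 2(b), peaks near depth values 25 and 160 would yield two planes, with every pixel labeled by its nearest peak.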
B. Detection of the Image Gradient Magnitude
SSIM uses structural information instead of an error-sensitivity-based measurement for quality assessment, since the main function of the HVS is to extract structural information from the viewing field, and SSIM is highly adapted for this purpose. SSIM is widely used because of its accurate quality assessment ability and general applicability. However, it only considers the luminance, contrast, and structure distortions of the block content, which is unsuitable for assessing the distortion of block boundaries.
Generally, to reduce the data load on network services, the video signal is compressed by video coding standards before transmission. Almost all current video coding standards are lossy codecs, particularly because of the quantization operation [18]. Therefore, blocking artifacts always appear and can affect the judgment of the HVS on image quality. Although the blocking effect generally happens along the boundaries of the blocks, it extends into the inner block, since the pixel values have also been averaged by the lossy compression operation. In addition, when a sudden scene change happens, the distortions between the original and distorted blocks also have a significant effect on evaluating the quality of stereoscopic/2-D images. The blocking artifact occurs not only in 2-D video/images but also in stereoscopic video/images, because a stereoscopic video/image generally consists of left and right views, i.e., each view is a 2-D video/image. The blocking artifact is caused by the distortion of the boundary information. Therefore, it is very important to consider the boundary information in IQA.
Image gradients can be used to extract edge information from images. Gradient images are created from the original image (generally by convolving with a filter, one of the simplest being the Sobel filter) for this purpose. Each pixel of a gradient image represents the change in intensity at the corresponding point in the original image. After the gradient image is computed, pixels with large gradient values are considered edge pixels because of their dramatic changes in the local region. Image gradient computation is a traditional topic in image processing. Gradient operators expressed as convolution masks are efficient for representing edge information. The three most commonly used gradient operators are the Sobel, Prewitt, and Scharr operators. In our proposed IQA metric, the Scharr operator, chosen by experiment, is used to calculate the gradient in this paper. The partial derivatives G_x(X) and G_y(X) of the image f(X) along the horizontal and vertical directions are computed using the Scharr operator listed in Table I. The gradient magnitude of f(X) is then defined as

G = sqrt(G_x² + G_y²).    (20)
TABLE I
PARTIAL DERIVATIVES OF f(X) USING SCHARR GRADIENT OPERATORS
According to the gradient map, we take a measure similar to MSE-SSIM to calculate the block content distortion MSE(G_Oi, G_Di), as shown in

MSE(G_Oi, G_Di) = (1 / (8 × 8)) Σ_{a=1}^{8} Σ_{b=1}^{8} (G_Oi(a, b) − G_Di(a, b))²    (21)

where G_Oi(a, b) and G_Di(a, b) are the gradient maps on the ith block of the original and distorted images, respectively.
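The Scharr masks and the gradient block MSE of (20) and (21) can be sketched as follows. This is our own rendering; a zero-padded 3 × 3 correlation stands in for the filtering routine of an image-processing library.

```python
import numpy as np

SCHARR_X = np.array([[ 3, 0,  -3],
                     [10, 0, -10],
                     [ 3, 0,  -3]], dtype=np.float64)
SCHARR_Y = SCHARR_X.T  # vertical-derivative mask

def _conv2_same(img, kernel):
    """3x3 'same'-size correlation with zero padding."""
    img = np.asarray(img, dtype=np.float64)
    pad = np.pad(img, 1)
    out = np.zeros_like(img)
    for di in range(3):
        for dj in range(3):
            out += kernel[di, dj] * pad[di:di + img.shape[0],
                                        dj:dj + img.shape[1]]
    return out

def gradient_magnitude(img):
    """Per-pixel gradient magnitude G = sqrt(Gx^2 + Gy^2), per (20)."""
    gx = _conv2_same(img, SCHARR_X)
    gy = _conv2_same(img, SCHARR_Y)
    return np.sqrt(gx ** 2 + gy ** 2)

def block_gradient_mse(g_o, g_d):
    """MSE between two 8x8 gradient blocks, per (21)."""
    g_o = np.asarray(g_o, dtype=np.float64)
    g_d = np.asarray(g_d, dtype=np.float64)
    return float(np.mean((g_o - g_d) ** 2))
```

A flat region produces zero gradient magnitude in its interior, while an intensity step produces a strong response along the step, which is exactly the boundary sensitivity the metric relies on.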
C. Distortion of the Entire Image
The distortion of the contrast and structure coefficients in each block, GM(O_i, D_i), is taken into account. As shown in (22), the two distortions are jointly calculated. Thus

GM(O_i, D_i) = (2σ_{Oi}² + 2σ_{G_Oi}² + k1) / (2σ_{Oi}² + 2σ_{G_Oi}² + MSE(O_i, D_i) + MSE(G_Oi, G_Di) + k1)    (22)

where σ_{Oi}² and σ_{G_Oi}² are the variances on the ith block of the original image and of the gradient map, respectively. To avoid a mathematical logic error, a constant k1 is inserted to guarantee that the denominator is meaningful.
According to the analysis in the previous section, the quality score for the jth 3-D image plane (AGM) is calculated as follows:

AGM_j = Σ_{i∈B_j} GM(O_i, D_i) / N_j    (23)

where B_j is the set of blocks belonging to the jth image plane, and N_j is the number of blocks in B_j.
After the AGM quality scores of all regions have been obtained, the quality of each image plane is calculated. We use linear weighting to combine the quality scores of the different image planes into the quality of the 3-D image (IAGM), as shown in

IAGM = Σ_{j=1}^{p} w_j × AGM_j    (24)

where w_j is the weight assigned to the jth image plane, with the constraint Σ_{j=1}^{p} w_j = 1, quantifying the importance of each image plane in binocular vision.
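Equations (22)–(24) combine into a short scoring pipeline. The sketch below is ours, with an assumed k1; it takes precomputed per-block image and gradient data and the plane weights as inputs.

```python
import numpy as np

def gm_block(o, d, g_o, g_d, k1=1e-4):
    """Block score GM(Oi, Di), per (22), combining the intensity-domain
    and gradient-domain variances and MSEs."""
    o, d = np.asarray(o, np.float64), np.asarray(d, np.float64)
    g_o, g_d = np.asarray(g_o, np.float64), np.asarray(g_d, np.float64)
    num = 2 * o.var() + 2 * g_o.var() + k1
    den = (2 * o.var() + 2 * g_o.var()
           + np.mean((o - d) ** 2) + np.mean((g_o - g_d) ** 2) + k1)
    return num / den

def agm(plane_blocks):
    """Average GM over one image plane, per (23).
    plane_blocks: iterable of (Oi, Di, GOi, GDi) tuples."""
    return float(np.mean([gm_block(*b) for b in plane_blocks]))

def iagm(agm_scores, weights):
    """Weighted combination over the p image planes, per (24)."""
    w = np.asarray(weights, dtype=np.float64)
    assert np.isclose(w.sum(), 1.0)  # weights must sum to 1
    return float(np.dot(w, np.asarray(agm_scores, dtype=np.float64)))
```

An undistorted image yields GM = AGM = IAGM = 1, and every distortion term in (22) pushes the score below 1.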
Fig. 4. Plots of DMOS versus IAGM with different distortion types. (a)–(e) Plot of JP2K, BLUR, JPEG, WN, and FF distortion, respectively.
IV. EXPERIMENTAL RESULTS AND ANALYSIS
To evaluate the performance of the proposed algorithm, the
IAGM and other state-of-the-art IQA metrics are implemented
in the experimental phase.
A. Subjective Quality Assessment for 3-D Image Signals
The main target of research on IQA is to find a superior OIQA metric that is as consistent with subjective assessment results as possible. Therefore, subjective experiments are executed first to provide comparable data for the design of OIQA functions. SIQA offers the most precise measures of perceptual quality, since it is generated by the HVS directly. In this kind of experiment, people are involved to evaluate the image quality under controlled test environments. In our experiments, we use the LIVE database presented in [24], developed at The University of Texas at Austin. It consists of 20 reference images and a total of 365 distorted images covering five types of distortion at different distortion levels. To evaluate the performance, our proposed metric is compared with various well-known metrics on the five distortion types, namely, JPEG compression, JPEG2000 compression, additive white Gaussian noise, Gaussian blur, and a fast-fading model based on the Rayleigh fading channel, abbreviated as JPEG, JP2K, WN, Blur, and FF, respectively.
B. OQA Metrics Used in Our Paper

With the development of wireless networking, many relevant intelligent devices have been produced and are widely used. The IQA metric is regarded as an efficient method for measuring the finally received image quality. Moreover, OIQA can become part of a real-time feedback mechanism and is often applied to optimize the transmission system. Four evaluation indexes, namely the Pearson linear correlation coefficient (PLCC), Spearman's rank order correlation coefficient (SROCC), root mean square error (RMSE), and coefficient of determination (R-square), are used to compare the performance of the mentioned metrics. It should be noted that SROCC and R-square are employed to assess prediction monotonicity, whereas PLCC and RMSE are used to evaluate prediction accuracy.
In the ideal situation where the objective scores after the nonlinear regression [21] of (25) and the subjective scores are a perfect match, PLCC = SROCC = R-square = 1 and RMSE = 0. The regression is

DMOS = \alpha_1 \left( \frac{1}{2} - \frac{1}{1 + \exp(\alpha_2 (x - \alpha_3))} \right) + \alpha_4 x + \alpha_5        (25)

where DMOS denotes the subjective score, x is the objective score, and \alpha_1, \alpha_2, \alpha_3, \alpha_4, and \alpha_5 are determined by fitting the objective scores to the subjective scores.
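The fitting of (25) and the computation of the four indexes can be sketched as follows; the initial guesses in `p0` are illustrative assumptions, not values from this paper:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import pearsonr, spearmanr

def logistic5(x, a1, a2, a3, a4, a5):
    # Five-parameter logistic mapping of (25): objective score x -> DMOS.
    return a1 * (0.5 - 1.0 / (1.0 + np.exp(a2 * (x - a3)))) + a4 * x + a5

def evaluate(objective, dmos):
    """Fit (25), then report PLCC, SROCC, RMSE, and R-square."""
    objective = np.asarray(objective, dtype=float)
    dmos = np.asarray(dmos, dtype=float)
    p0 = [np.ptp(dmos), 1.0, np.mean(objective), 1.0, np.mean(dmos)]
    params, _ = curve_fit(logistic5, objective, dmos, p0=p0, maxfev=10000)
    pred = logistic5(objective, *params)
    plcc = pearsonr(pred, dmos)[0]         # accuracy, after regression
    srocc = spearmanr(objective, dmos)[0]  # monotonicity, rank based
    rmse = float(np.sqrt(np.mean((pred - dmos) ** 2)))
    r_square = 1.0 - np.sum((dmos - pred) ** 2) / np.sum((dmos - dmos.mean()) ** 2)
    return plcc, srocc, rmse, r_square
```

Note that SROCC is computed on the raw objective scores, since rank correlation is invariant to the monotonic mapping of (25).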
To validate the proposed IAGM algorithm, performance is examined separately for the different distortion types, as shown in Fig. 4, where each point in the plots represents one test image, and the vertical and horizontal axes represent DMOS and IAGM, respectively. If the prediction were perfect, every point would lie on the diagonal line. In this paper, some 3-D IQA metrics are chosen based on their reported
Fig. 5. Ensemble performance tested on the LIVE 3-D-image database. (a)–(f) Scatter plots of PSNR, MSSIM, PQM, M-SVD, MSE-SSIM, and the proposed IAGM versus DMOS, in sequence.
TABLE II
PLCC Compared With Other Methods

TABLE III
SROCC Compared With Other Methods
performance and availability, to verify the universality of our proposed algorithm. In addition, in order to measure how well popular 2-D quality assessment metrics evaluate 3-D visual quality, four other metrics for 2-D images are employed in our experiments. It should be noted that the 2-D IQA metrics are applied to the left and right views separately; a single quality score for the 3-D image is then produced by averaging the 2-D IQA scores of the two views. The performance comparison between our proposed metric and the six other latest OQA algorithms is demonstrated by the scatter plots in Fig. 5. The vertical axis shows the subjective scores, represented as DMOS, and the horizontal axis denotes the objective quality scores after the nonlinear mapping. From Fig. 5, we can conclude that the proposed IAGM outperforms the other algorithms.
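The per-view averaging scheme described above can be sketched as follows; plain PSNR is used only as a stand-in 2-D FR metric, and any of the 2-D baselines could be substituted for it:

```python
import numpy as np

def psnr(ref, dis, peak=255.0):
    """Plain PSNR, used here only as an example 2-D FR metric."""
    mse = np.mean((np.asarray(ref, float) - np.asarray(dis, float)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def stereo_score_from_2d(metric, left_ref, left_dis, right_ref, right_dis):
    """Apply a 2-D FR metric to the left and right views separately and
    average the two scores, as done for the 2-D baselines in this paper."""
    return 0.5 * (metric(left_ref, left_dis) + metric(right_ref, right_dis))
```

This adaptation ignores binocular interaction between the views, which is precisely the limitation the proposed IAGM metric is designed to address.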
The values of the four indexes for each distortion type in the database are listed in Tables II–V. Across the tests, the proposed metric outperforms the other 2-D and 3-D IQA algorithms. Since the PSNR, MSSIM, M-SVD, and MSE-SSIM metrics are directly extended from the 2-D case and do not take binocular visual characteristics into account, their overall performance is generally worse than that of our proposed metric, even though they may be effective for some specific distortion types. For example, it is well accepted that MSSIM measures image distortion well. However, this metric has a shortcoming that can be effectively remedied by taking the properties of human visual perception into account, and our experimental results confirm this hypothesis. According to Tables II–V, the IAGM algorithm performs better than the other metrics for all distortions except WN; for WN distortion, MSSIM performs best.
TABLE IV
RMSE Compared With Other Methods

TABLE V
R-Square Compared With Other Methods

Both IAGM and MSE-SSIM have higher ranking fractions for JP2K and BLUR than for the other distortions. The results also show that the proposed algorithm is robust to small shifts. As shown in Table II, the metric achieves good performance for certain distortion types of 3-D images; however, it is still lower than some 2-D metrics. The reason is that the quality of stereo video depends, to some degree, on the stereo matching algorithm, so that the disparity measurement does not coincide with the human perception of disparity. As future work, we will focus on developing a new algorithm that uses the disparity map to compute the quality of images/videos at different resolutions.

V. CONCLUSION

The OQA of stereo images plays a key role in the development of 3-D image compression standards and in the success of various 3-D visual applications. Recently, intelligent Internet-of-Things technologies have developed rapidly; in particular, for 3-D images, many immersive scenes can now be shown to people, making the perceptual experience more vivid. In this paper, an efficient IQA metric for 3-D image signals has been proposed by considering local image properties and the psychological properties of the HVS, and by treating the depth map as an important factor for 3-D IQA. The advantages of our proposed metric are as follows: 1) the calculation of the local pixel-based distortion, contrast distortion, and structural distortion can properly describe the image features of the image view; 2) the local stimuli content can be automatically scaled by using the histogram derived from the depth map; and 3) the gradient magnitude similarity is also used as a weighting function to enhance the model in assessing image quality. Compared with state-of-the-art IQA metrics, the proposed IAGM metric performs better in terms of both accuracy and efficiency, making IAGM an ideal choice for high-performance IQA applications.

REFERENCES
[1] H. Wei, H. Y. Zhao, C. Lin, and Y. Yang, “Effective load balancing for
cloud-based multimedia system,” in Proc. Int. Conf. Electron. Mech. Eng.
Inf. Technol., 2011, pp. 165–168.
[2] W.-M. Chen, C.-J. Lai, H.-C. Wang, H.-C. Chao, and C.-H. Lo, “H.264
video watermarking with secret image sharing,” IET Image Process.,
vol. 5, no. 4, pp. 349–354, Jun. 2011.
[3] W. Lin and C.-C. Jay Kuo, “Perceptual visual quality metrics: A survey,”
J. Vis. Commun. Image Representation, vol. 22, no. 4, pp. 297–312,
May 2011.
[4] P. Joveluro, H. Malekmohamadi, W. A. C. Fernando, and A. M. Kondoz,
“Perceptual video quality metric for 3D video quality assessment,” in
Proc. 3DTV Conf., Jul. 2010, pp. 1–4.
[5] T.-Y. Wu, C.-Y. Chen, L.-S. Kuo, W.-T. Lee, and H.-C. Chao, “Cloud-based image processing system with priority-based data distribution mechanism,” Comput. Commun., vol. 35, no. 15, pp. 1809–1818, Sep. 2012.
[6] Z. M. P. Sazzad, S. Yamanaka, Y. Kawayoke, and Y. Horita, “Stereoscopic
image quality prediction,” in Proc. IEEE QoMEX, Jul. 2009, pp. 180–185.
[7] F. Qi, T. Jiang, S. Ma, and D. Zhao, “Quality of experience assessment for stereoscopic images,” in Proc. IEEE Int. Conf. Circuits Syst., May 2012, pp. 1712–1715.
[8] K. Ha and M. Kim, “A perceptual quality assessment metric using temporal complexity and disparity information for stereoscopic video,” in Proc.
IEEE Int. Conf. Image Process., Sep. 2011, pp. 2525–2528.
[9] H.-W. Chang, H. Yang, Y. Gan, and M.-H. Wang, “Sparse feature fidelity
for perceptual image quality assessment,” IEEE Trans. Image Process.,
vol. 22, no. 10, pp. 4007–4018, Oct. 2013.
[10] M. Carnec, P. L. Callet, and D. Barba, “Objective quality assessment of color images based on a generic perceptual reduced reference,” J. Signal Process.: Image Commun., vol. 23, no. 4, pp. 239–256, Apr. 2008.
[11] T. M. Kusuma and H.-J. Zepernick, “A reduced-reference perceptual
quality metric for in-service image quality assessment,” in Proc. IEEE 1st
Workshop Mobile Future Symp. Trends Commun., Oct. 2003, pp. 71–74.
[12] Z. Wang, L. Lu, and A. C. Bovik, “Video quality assessment based on structural distortion measurement,” Signal Process.: Image Commun., vol. 19, no. 2, pp. 121–132, Feb. 2004.
[13] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality
assessment: From error visibility to structural similarity,” IEEE Trans.
Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[14] F. Shao, W. Lin, S. Gu, G. Jiang, and T. Srikanthan, “Perceptual
full-reference quality assessment of stereoscopic images by considering
binocular visual characteristics,” IEEE Trans. Image Process., vol. 22,
no. 5, pp. 1940–1953, May 2013.
[15] H. Tan, Z. Li, Y. H. Tan, S. Rahardja, and C. Yeo, “A perceptually relevant
MSE-based image quality metric,” IEEE Trans. Image Process., vol. 22,
no. 11, pp. 4447–4459, Nov. 2013.
[16] Y.-H. Lin and J.-L. Wu, “Quality assessment of stereoscopic 3D image
compression by binocular integration behaviors,” IEEE Trans. Image
Process., vol. 23, no. 4, pp. 1527–1542, Apr. 2014.
[17] S. Zhou, P. Yang, and W. Xie, “Infrared image segmentation based on Otsu
and genetic algorithm,” in Proc. Int. Conf. Multimedia Technol., 2011,
pp. 5421–5424.
[18] S. L. P. Yasakethu, S. T. Worrall, D. V. S. X. De Silva, W. A. C. Fernando,
and A. M. Kondoz, “A compound depth and image quality metric for
measuring the effects of packet loss on 3D video,” in Proc. Int. Conf.
Digital Signal Process., Corfu, Greece, Jul. 2011, pp. 1–7.
[19] A. K. Moorthy, C.-C. Su, A. Mittal, and A. C. Bovik, “Subjective evaluation of stereoscopic image quality,” Signal Process.: Image Commun.,
vol. 28, no. 8, pp. 870–883, Sep. 2013.
[20] K. Seshadrinathan, R. Soundararajan, A. C. Bovik, and L. K. Cormack,
“Study of the subjective and objective quality assessment of video,” IEEE
Trans. Image Process., vol. 19, no. 6, pp. 1427–1441, Jun. 2010.
[21] P. G. Gottschalk and J. R. Dunn, “The five-parameter logistic: A characterization and comparison with the four-parameter logistic,” Anal. Biochem.,
vol. 343, no. 1, pp. 54–65, Aug. 2005.
[22] L. Jin, A. Boev, A. Gotchev, and K. Egiazarian, “3D-DCT based perceptual quality assessment of stereo video,” in Proc. IEEE Int. Conf. Image
Process., Brussels, Belgium, Sep. 2011, pp. 2521–2524.
[23] X. Liu, M. Chen, W. Tang, and C. Yu, “Hybrid no-reference video quality
assessment focusing on codec effects,” KSII Trans. Internet Inf. Syst.,
vol. 3, no. 2, pp. 592–606, Jan. 2011.
[24] L. Zhang, L. Zhang, X. Mou, and D. Zhang, “FSIM: A feature similarity index for image quality assessment,” IEEE Trans. Image Process., vol. 20, no. 8, pp. 2378–2386, Aug. 2011.
[25] S.-W. Jung, “A modified model of the just noticeable depth difference
and its application to depth sensation enhancement,” IEEE Trans. Image
Process., vol. 22, no. 10, pp. 3892–3903, Oct. 2013.
[26] A. M. Demirtas, A. R. Reibman, and H. Jafarkhani, “Full-reference quality estimation for images with different spatial resolutions,” IEEE Trans.
Image Process., vol. 23, no. 5, pp. 2069–2080, May 2014.
[27] M. H. Pinson and S. Wolf, “A new standardized method for objectively measuring video quality,” IEEE Trans. Broadcast., vol. 50, no. 3,
pp. 312–322, Sep. 2004.
[28] X. Liu, L. T. Yang, and K. Sohn, “High-speed inter-view frame mode decision procedure for multi-view video coding,” J. Future Gener. Comput.
Syst. (Elsevier), vol. 28, no. 6, pp. 947–956, Jun. 2011.
[29] A. Liu, W. Lin, and M. Narwaria, “Image quality assessment based
on gradient similarity,” IEEE Trans. Image Process., vol. 21, no. 4,
pp. 1500–1512, Apr. 2012.
Xingang Liu (M’10) received the B.Eng. degree from the School of Electronic Engineering (EE), University of Electronic Science and Technology of China (UESTC), Chengdu, China, in 2000, and the M.Eng. and Ph.D. degrees from Yeungnam University, Gyeongsan, Korea, in 2005 and 2009, respectively.
He is a Full Professor with the School of EE, UESTC, Chengdu, China, where he was a faculty member in EE during 2000–2003. He was a BK21 Research Fellow with the School of Electrical and Electronic Engineering, Yonsei University, Seoul, Korea, from 2010 to 2011. He was also an Adjunct Professor with Dongguk University, Seoul, Korea. His research interests include multimedia signal communication-related topics, such as heterogeneous/homogeneous video transcoding, video quality measurement, video signal error concealment, mode decision algorithms, 2-D/3-D video codecs, and so on.
Dr. Liu is a member of the Korea Information and Communications Society (KICS) and the Korea Society for Internet Information (KSII).
Kai Kang received the M.S. degree in electronic engineering from the University of Electronic Science and Technology of China (UESTC), Chengdu, China, in 2015.
He is currently an Engineer with the Huawei Research Center, Chengdu, China. His research interests include 2-D/3-D video/image quality assessment, cloud computing systems, and so on.
Yinbo Liu received the M.S. degree in electronic engineering from the University of Electronic Science and Technology of China (UESTC), Chengdu, China, in 2015.
He is currently a Video Algorithm Engineer with the Audio and Video Technology Platform Department, ZTE Corporation, Shenzhen, China. His research interests include video compression, mode decision, and so on.