
1 Introduction

Video quality evaluation (VQE) is a promising research area due to its wide range of applications in the development of various video coding algorithms [1]. The technical coding areas involved with free viewpoint video (FVV) are characterized by view generation using multiview video coding (MVC) and view synthesis. This process first goes through image warping and then a hole-filling technique, e.g. inverse mapping or spatial/temporal correlation as simple post-processing filtering [2, 3]. Since the synthesized view is generated at a virtual position between the left and right views, there is no reference frame available for quality estimation of FVV [4]. Quality estimation is usually performed in two ways, objective and subjective, of which the former is more widely used due to its simplicity, ease of use, and suitability for real-time applications. Thus, a good number of citable studies have been conducted on objective image quality estimation [5,6,7]. Quality estimation can be further categorized into full-reference (i.e. original videos as reference), reduced-reference (i.e. partial signals available as reference), and no-reference schemes. Among them, full-reference metrics such as SSIM or PSNR are restricted to reference-based situations and lose their suitability for estimating the quality of FVV, where the reference frame is not available. To address the limitations of full-reference metrics, a number of no-reference works have recently come to light for quality evaluation [8,9,10]. The introduced statistical metrics may not be suitable for some high quality ranges, since quality perception in these ranges is driven mostly by perceptual human visual system (HVS) features rather than by the statistics of the image [11]. However, different features of the HVS are not actively studied in the existing schemes. The authors in [12] performed human cognition based quality assessment using eye-tracking and evolved a more realistic ground-truth visual saliency model to improve their algorithm. In fact, eye-tracking has become a non-intrusive, affordable, and easy-to-use tool in human behaviour research today. With very few exceptions, anything with a visual component can be eye tracked by simply employing a software based eye-tracking simulator [13]. Unlike objective quality evaluation, subjective studies can yield valuable data to evaluate the performance of objective methods towards the ultimate goal of matching human perception [14]. Thus, a number of quality assessment algorithms have been proposed that are closely related to studies of human visual attention and cognition. The study in [15] introduced a no-reference framework using blur and blockiness metrics to improve the performance of objective metrics using eye-tracker data. The authors in [16] introduced a model to judge video quality on the basis of physiological measures including pupil dilation and electroencephalogram signals. Exploiting eye-gaze data, Albanesi and Amadeo [17] generated a voting algorithm to develop a no-reference method. Using the scan path of eye movements, Tsai et al. [18] subjectively assessed the perceived image and its colour quality.
Conversely, the widely used subjective testing scheme, the mean opinion score (MOS) [19, 20], is often biased by a number of factors such as the viewer's mood, domain knowledge, and testing environment, which may actively influence the effectiveness of the quality assessment process. Podder et al. [21] first introduced the subjective metric QMET; however, their initial work is based on single-view video where the viewing angle is fixed for users. Moreover, their approach depends heavily on threshold selection for each feature and lacks a proper correlation setting among features. Most importantly, their metric does not perform well across different video contents and resolutions. The proposed method is a significantly extended version of their work, where the major amendments include the employment of FVV (i.e. the no-reference scenario), an increased number of features, better correlation analysis of the features, content and resolution invariant operations on the features, synthesizing them by an adaptive weighted function, comparing the new metric with PSNR, SSIM, and MOS, and eventually employing two widely used estimators, the Pearson Linear Correlation Coefficient (PLCC) and the Spearman Rank-Order Correlation Coefficient (SRCC), to justify the effectiveness of the proposed QMET for a range of FVV sequences.

Fig. 1. A more concentrated eye-traversing pattern is perceived for relatively better quality contents (e.g. the Newspaper sequence image in (b)). The opposite is noticed in (e), for which the pupil-size sharply increases in (c), while the gaze event duration notably decreases in (f).

Let us first concentrate on Fig. 1, in which (a) and (d) represent a multiview video sequence, namely Newspaper, encoded at good and poor quality respectively, while (b) and (e) demonstrate the eye-traversing pattern of a viewer for the good and poor quality contents respectively. The tracked gaze plots indicate more concentrated eye-traversal for relatively better quality contents. If we determine the Length (L) and Angle (A) features of the gaze plots, they can explicitly describe the viewer's browsing pattern (i.e. smooth or random, as depicted in Fig. 1(b) and (e)). We further observe that quality variation affects both Pupil-size (P) and Gaze-duration (T), as presented in Fig. 1(c) and (f); thus, we calculate four cardinal features, L, A, P, and T, for each potential gaze plot (PGP) in the gaze trajectory. The PGPs in this test are defined by fixations (i.e. visual gaze on a single location) and saccades (i.e. quick movements of the eyes between two or more phases of fixation). Content and resolution invariant operations are then performed on the features, which are adaptively synthesized using a weighted function to develop the proposed QMET. A higher QMET score promises good quality video, as the viewers could better capture its content information with smooth global browsing. Experimental results reveal that the quality evaluation carried out by the QMET could perform better than the objective metric SSIM and the subjective estimator MOS. Since eye-tracker data can easily be captured today by directly employing a software based eye-tracking simulator (i.e. the device itself is no longer required), the utility of the QMET could be made even more flexible using such simple simulator-generated data sets.

2 Proposed Method

First, by employing the HEVC [22] reference software HM15.0 [23], different video quality segments were generated and then watched by a group of participants. The processed eye-tracker data were analyzed to extract four quality-correlated features, i.e. L, A, P, and T. Content and resolution invariant operations were carried out on the features, which were then synthesized by an adaptive weighted function to develop the new metric QMET. A diagram of the entire process is presented in Fig. 2, while the key steps are described in the succeeding sections.

Fig. 2. Process diagram of the proposed QMET development.

2.1 Data Capture and Pre-processing

The participants (both male and female), who were recruited from the University, had normal or corrected-to-normal vision and did not suffer from any medical condition that could adversely influence our project [ethical approval no. 2015/124]. They fall within the 20–45 age band and are undergraduate/postgraduate students, PhD students, and lecturers of the University. The multiview sequences used in this test have resolutions of \(1920\times 1088\) and \(1024\times 768\) (details can be found in [24]). To avoid bias, we initially use the grey-scale components only and randomly vary the display order of the quality segments presented to the participants. We generate three different quality versions of each video: Excellent (using quantization parameter \(QP=5\)), Fair (\(QP=25\)), and Very-poor (\(QP=50\)). Calibration and a trial run were performed so that the participants felt comfortable with the whole process. Upon their satisfaction, the Tobii eye tracker [25] was employed to record their eye movements. As the device recorded data at 60 Hz and the allocated frame rate was 30 fps, each frame could accommodate two gaze points, and a single whole video covered 9000 gaze plots, 1800 for each quality segment.
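As a simple illustration of this sampling arrangement, a minimal sketch (in Python) of assigning 60 Hz gaze samples to 30 fps frame indices is given below; the function and field names are illustrative assumptions rather than the authors' processing code.

SAMPLE_RATE_HZ = 60
FRAME_RATE_FPS = 30
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ // FRAME_RATE_FPS  # = 2 gaze points per frame

def frame_index(timestamp_ms, recording_start_ms):
    # Map an eye-tracker sample timestamp (in ms) to the video frame it falls in.
    return int((timestamp_ms - recording_start_ms) * FRAME_RATE_FPS / 1000.0)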

Fig. 3. The Length (L), Angle (A), and Pupil-size (P) features have a proportionate correlation, while the Gaze-duration (T) feature has an inversely proportionate correlation with quality degradation.

2.2 Features Correlation Analysis with Quality

The Length (L, in pixels) of the ith potential gaze plot is calculated as the Euclidean distance to the (i+1)th gaze plot, while the Angle (A, in degrees) of the ith plot is calculated using both its (i−1)th and (i+1)th neighbours as reference (where i = 1, 2, ..., n, and the values of L and A are not calculated for the 1st and nth plots). The Pupil-size (P, in mm) and Gaze-duration (T, in ms) of each ith plot are determined by averaging the left and right pupil sizes and from the eye-tracker recorded timestamp data respectively, using MATLAB R2012a (MathWorks Inc., Massachusetts, USA). The overall results indicate that the L, A, and P features have a proportionate correlation, while the feature T has an inversely proportionate correlation, with video quality degradation, as demonstrated in Fig. 3.
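A compact sketch of this per-plot feature extraction is given below. It assumes the Angle is the turning angle at plot i between the incoming and outgoing segments, and that the Gaze-duration is obtained from consecutive timestamp differences; both are plausible readings of the text rather than the authors' exact implementation, and the function name is illustrative.

import math

def gaze_features(xs, ys, pupil_l, pupil_r, t_ms):
    # Per-plot Length (pixels), Angle (degrees), Pupil-size (mm), Gaze-duration (ms).
    n = len(xs)
    L = [None] * n
    A = [None] * n
    P = [(pl + pr) / 2.0 for pl, pr in zip(pupil_l, pupil_r)]  # mean of left/right pupils
    T = [t_ms[i + 1] - t_ms[i] if i + 1 < n else None for i in range(n)]
    for i in range(1, n - 1):  # L and A are not defined for the 1st and nth plots
        L[i] = math.hypot(xs[i + 1] - xs[i], ys[i + 1] - ys[i])  # Euclidean distance
        v1 = (xs[i] - xs[i - 1], ys[i] - ys[i - 1])  # incoming segment
        v2 = (xs[i + 1] - xs[i], ys[i + 1] - ys[i])  # outgoing segment
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        norm = math.hypot(*v1) * math.hypot(*v2)
        A[i] = math.degrees(math.acos(max(-1.0, min(1.0, dot / norm)))) if norm else 0.0
    return L, A, P, T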

Next, the contribution of each feature is estimated in the context of segregating different quality contents. It is observed that no single feature could solely be the best representative in distinguishing different qualities. The individual \(Q\)-score (i.e. the calculated pseudo score of the proposed QMET) of each feature is determined using Eqs. (1)–(4), where \(Q_1\), \(Q_2\), \(Q_3\), and \(Q_4\) indicate the \(Q\)-score of the individual features L, A, P, and T respectively.

$$\begin{aligned} Q_1=L^{\delta L} \end{aligned}$$
(1)
$$\begin{aligned} Q_2=A^{\varphi A} \end{aligned}$$
(2)
$$\begin{aligned} Q_3=(P/2)^{\gamma P} \end{aligned}$$
(3)
$$\begin{aligned} Q_4= \sqrt{2T}^{(\eta /\sqrt{2T})} \end{aligned}$$
(4)

Here, \(\delta \), \(\varphi \), \(\gamma \), and \(\eta \) are the weighting factors of the L, A, P, and T features respectively. Let us briefly discuss the formation of these equations, which produce the different \(Q\)-scores using the power law, whereby a relative change in one quantity results in a proportional change in another, i.e. one quantity varies as a power of another [26]. In our case, the relative value change of the features is unknown, and the corresponding reproduced \(Q\)-score is unknown as well; however, whether they have a proportionate or inversely proportionate relation is known. For example, a lower L indicates higher quality and a correspondingly higher \(Q\)-score, but we still do not know by how much. Since the value change of L between quality segments is not large (e.g. 0.08 for Excellent and 0.12 for Fair, and the maximum average does not exceed 0.50), it is best represented by its power form, as a smaller power with a smaller base produces a higher score. This eventually produces a clear score difference between quality segments. The features L, A, and P work with power-weight multiplication, whereas power-weight division is used for T, which works similarly since T has an inversely proportionate relation with the \(Q\)-score. These relationships are presented in Eqs. (1)–(4), and a small numerical sketch follows below. The rationale for using the \(Q\)-score is to give a clearer picture of how the QMET changes for various changes of L, A, P, and T within a bounded range from 0 to 1. Since the L, A, P, and T features can jointly indicate how far, how much, how large, and how long respectively in the spatio-temporal domain, the features are synthesized by an adaptive weighted function, \(Q=L^{\delta L}\times A^{\varphi A}\times (P/2)^{\gamma P}\times \sqrt{2T}^{(\eta /\sqrt{2T})}\). The purpose of this multiplication is to keep a persistent relation between the L, A, P, and T features and the reproduced \(Q\)-score. As the normalized values of the features vary within the range 0 to 1 in Eqs. (1)–(4), their multiplication reproduces the ultimate score within the same predefined limit. Note that the weights \(\delta \), \(\varphi \), \(\gamma \), and \(\eta \) in Eqs. (1)–(4) are fixed at 0.5 in this test. This is because we further calculate the slope at each quality-change point (i.e. Excellent, Fair, and so on) and determine its average for a number of candidate weights. Since the average calculated with weight 0.5 outperforms the other weight combinations in distinguishing different quality segments, as demonstrated in Fig. 4, we fix it for the entire experiment. Other weight combinations might work better; however, the tested results demonstrate a good correlation of QMET with the other metrics.
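The following short sketch evaluates Eqs. (1)–(4) with all weights fixed at 0.5; the example L values 0.08 and 0.12 are taken from the text above, while the function and argument names are illustrative.

import math

def q_scores(L, A, P, T, delta=0.5, phi=0.5, gamma=0.5, eta=0.5):
    # Per-feature pseudo scores following Eqs. (1)-(4); L, A, P, and T are assumed
    # to be normalized to [0, 1], with P and T never exactly zero.
    q1 = L ** (delta * L)                                   # Eq. (1)
    q2 = A ** (phi * A)                                     # Eq. (2)
    q3 = (P / 2.0) ** (gamma * P)                           # Eq. (3)
    q4 = math.sqrt(2.0 * T) ** (eta / math.sqrt(2.0 * T))   # Eq. (4), inverse relation
    return q1, q2, q3, q4

# Example from the text: L = 0.08 (Excellent) gives 0.08**0.04, about 0.90, whereas
# L = 0.12 (Fair) gives 0.12**0.06, about 0.88, so even small feature differences
# map to distinguishable scores.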

Fig. 4. The synthesizing operation using the Length, Angle, Pupil-size, and Gaze-duration features could better distinguish different quality segments.

2.3 Content and Resolution Invariant Operation on Features

Let us first consider the content-based (left of Fig. 5) and resolution-based (right of Fig. 5) unprocessed L of two example sequences, Poznan_Street and Newspaper, presented in Fig. 5. The calculated variations between the highest and lowest values are 41.72% and 28.63% with respect to content and resolution respectively. The content invariant operation follows a number of steps, sketched in code after this paragraph. First, we compute the L of the PGPs as mentioned in Sect. 2.2; second, we calculate the average of the potential gaze plot x and y coordinates and denote it the centre coordinate C(x, y); third, with respect to C(x, y), we calculate the Euclidean distance of all PGPs and sort the distances from lowest to highest. The rationale for this ordering is to prioritize the foveal concentration on central pixels while partially discarding the long surrounding parafoveal or perifoveal fixations [27] that may occur even during attentive eye browsing. Fourth, to determine the object motion area, we take the average of the first \(\mu \) sorted values (\(\mu = 75\%\) in this test, since it helps the QMET obtain the highest score), which is the foreseen radius of the captured affective region; fifth, this radius is then employed as a divisor of the lengths calculated for each potential gaze plot in the first step.
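A minimal sketch of these five steps follows, assuming the gaze-plot coordinates and the per-plot lengths from the first step are already available; the helper name and the handling of undefined boundary plots are assumptions of the sketch.

import math

def content_invariant_lengths(xs, ys, lengths, mu=0.75):
    # Step 2: centre coordinate C(x, y) as the mean of the potential gaze plots.
    cx = sum(xs) / len(xs)
    cy = sum(ys) / len(ys)
    # Step 3: Euclidean distance of every PGP from C, sorted lowest to highest.
    dists = sorted(math.hypot(x - cx, y - cy) for x, y in zip(xs, ys))
    # Step 4: radius of the captured affective region = mean of the first mu portion.
    k = max(1, int(mu * len(dists)))
    radius = sum(dists[:k]) / k
    # Step 5: divide each per-plot length from Step 1 by this radius.
    return [l / radius if l is not None else None for l in lengths]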

Fig. 5. The video content based (left) and resolution based (right) unprocessed Length.

Similar to the content-based lengths, we also observe a substantial variation of 28.63% among the different resolution-based lengths in Fig. 5 (right). We therefore exploit a number of multiplication factors, which passively act as compensators, to neutralize the impact of the various video resolutions displayed on the screen. For example, taking a \(1024\times 768\) resolution sequence as the reference, the unprocessed lengths of its higher and lower resolution counterparts are multiplied by 0.75 and 1.25 respectively; a brief sketch is given below. Since the eye-tracker recorded data demonstrate a good correlation from the highest to the lowest resolution videos for almost all sequences, these multipliers perform well in the resolution invariant operation. The outcomes are then normalized to values ranging within 0 to 1. The resultant effect of the content plus resolution invariant operation on L, which is used for the final QMET scoring, is revealed in the top-left of Fig. 6. Once similar operations are performed on the features A, P, and T, the variation effects are significantly minimized, as illustrated in the top-right, bottom-left, and bottom-right of Fig. 6 respectively.
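The sketch below applies these compensation factors and the final 0-to-1 normalization; the max-based normalization and the function name are assumptions of the sketch rather than the authors' stated procedure.

def resolution_compensated_lengths(lengths, width, height):
    # Reference resolution is 1024x768; higher-resolution lengths are multiplied
    # by 0.75 and lower-resolution ones by 1.25 (factors taken from the text).
    ref_pixels = 1024 * 768
    pixels = width * height
    factor = 0.75 if pixels > ref_pixels else 1.25 if pixels < ref_pixels else 1.0
    scaled = [l * factor if l is not None else None for l in lengths]
    # Normalize the outcome to the range 0..1 (here simply by the maximum value).
    peak = max(v for v in scaled if v is not None)
    return [v / peak if v is not None else None for v in scaled]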

Fig. 6. The obtained values of L, A, P, and T (normalized) after performing the content and resolution invariant operations.

2.4 The Development of QMET

If relatively lower values of L, A, and P and a higher value of T belong to a potential gaze plot, the QMET should produce a relatively higher score. Thus, the QMET score is calculated for all PGPs of each Excellent, Fair, and Very-poor quality segment of the sequences by adaptively synthesizing the features as follows:

$$\begin{aligned} Q_{MET}=L^{\delta L}\times A^{\varphi A}\times (P/2)^{\gamma P}\times \sqrt{2T}^{(\eta /\sqrt{2T})} \end{aligned}$$
(5)

where the weights \(\delta \), \(\varphi \), \(\gamma \), and \(\eta \) are fixed at 0.5 in this experiment, as stated earlier. In an unusual case, if the normalized values of L and A remain 0 for 30 consecutive frames (the frame rate being 30 fps in this test), a mimicking operation is performed; a sketch follows below. The rationale for this operation is to handle the consecutive zeros that may arise from a participant's intentional eye fixation on a certain PGP. Thus, the user data that have become stuck over these frames are forcefully penalized by arbitrarily setting L = 0.1 and A = 0.1. This operation applies only to the features L and A, since P and T remain non-zero in that case. Note that during this test we did not experience any such situation and carried out no such operation.
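The sketch below combines Eq. (5) with this penalty rule; since the paper states the operation was never actually triggered, the windowing details and the function names are assumptions of the sketch.

import math

def qmet_score(L, A, P, T, delta=0.5, phi=0.5, gamma=0.5, eta=0.5):
    # Eq. (5) for one potential gaze plot; the features are assumed to be the
    # normalized, content- and resolution-invariant values.
    return (L ** (delta * L)) * (A ** (phi * A)) * ((P / 2.0) ** (gamma * P)) \
           * (math.sqrt(2.0 * T) ** (eta / math.sqrt(2.0 * T)))

def apply_fixation_penalty(L_seq, A_seq, window=30, penalty=0.1):
    # Mimicking operation: if L and A stay at 0 for 30 consecutive frames (one
    # second at 30 fps), both are reset to 0.1 so that an intentional fixation
    # does not inflate the score; P and T are left untouched.
    L_out, A_out = list(L_seq), list(A_seq)
    run = 0
    for i in range(len(L_seq)):
        run = run + 1 if (L_seq[i] == 0 and A_seq[i] == 0) else 0
        if run >= window:
            for j in range(i - window + 1, i + 1):
                L_out[j], A_out[j] = penalty, penalty
    return L_out, A_out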

3 Experimental Outcomes

The maximum and minimum QMET scores for each quality segment of two example sequences are presented in Fig. 7(a). For both sequences, the obtained score is highest for the Excellent quality segment, gradually decreases with quality degradation, and reaches its lowest for the Very-poor segment. Compared to Newspaper, the QMET score decreases more sharply for the Poznan_Street sequence. This is because, compared to its Excellent quality segment, the recorded gaze data supporting the Very-poor quality incur recurrent unsuitable feature values and produce a lower QMET score. Once we calculate the average of the maximum and minimum scores for each individual quality segment, we notice that the average recognized variation between the best and worst quality becomes 72.35%, which indicates a clear quality distinguishing capability of the QMET.

Fig. 7. Different scoring orientations of QMET for a wide range of qualities (on both a participant and a video basis).

Figure 7(b) reveals the participant-wise and video-wise average QMET scores for the three quality segments. The QMET obtains its highest scores, 0.78 and 0.71, for the Excellent quality segment on the video and participant bases respectively, as the participants could better capture information from the best quality contents with smooth global browsing. Conversely, for the lowest scores, 0.25 and 0.21 for the Very-poor segment, participants in most cases do not succeed in capturing content information due to its unpleasant quality and immediately move to the next location, which is often still erroneous. As the number of such hit-and-miss browsing events sharply increases with time, the quality score decreases, since plenty of inappropriate feature values enter the scoring process. Therefore, for a sequence of genuinely Poor to Very-poor quality, it is very unlikely to acquire a high quality score using the proposed QMET. Next, to better justify the performance of QMET against the PSNR, SSIM, and MOS on FVV, two different quality segments (Excellent and Very-poor) are taken into account. The calculated average scores of the four metrics for these segments are reported in Fig. 8(a)–(d). The obtained percentages of variation between the highest score (Excellent segment) and the lowest score (Very-poor segment) using PSNR, SSIM, QMET, and MOS are 57.39, 32.49, 78.51, and 69.71, as represented in Fig. 8(e). These outcomes indicate that the QMET-estimated average quality segregation score outperforms the rest of the metrics. This is because viewers could better capture good quality synthesized video content with smooth global browsing. Conversely, the poorly reconstructed synthesized views suffer from localized edge reconstruction and crack-like artifacts. Thus, the recorded gaze data of poor contents indicate the participants' haphazard browsing (being affected by unsuccessful attempts due to the unpleasant quality), which does not meet the balanced feature correlation criteria and generates a lower QMET score. Figure 8(f) indicates the maximum achievable difference (e.g. the difference between the highest score of the Excellent quality segment and the lowest score of the Very-poor quality segment) picked out by the four metrics, where the MOS outperforms the other metrics. The Very-poor quality segment of some synthesized videos (e.g. Newspaper) incurs an arbitrarily nominated low score such as 0.05 (out of 1.0), which leads to such stunning variations. The calculated results for free viewpoint videos in Fig. 8 indicate that the subjective assessment MOS performs better than the objective metrics PSNR and SSIM. This is mostly because PSNR and SSIM do not have an available reference image with which to calculate the score in this case. However, according to Fig. 8(e), the human visual perception based QMET demonstrates relatively improved performance compared to the MOS in terms of segregating different aspects of coded video quality.

Fig. 8. (a–d) reveal the average quality variation identification carried out by the PSNR, SSIM, QMET, and MOS for the Excellent and Very-poor quality segments of free viewpoint videos, which is more explicitly presented in (e), while (f) indicates the maximum achievable difference (e.g. the difference between the highest score of the Excellent quality segment and the lowest score of the Very-poor quality segment) obtained by the four metrics.

Fig. 9. Performance comparison of the PSNR, SSIM, QMET, and MOS metrics on the Excellent, Fair, and Very-poor quality segments using FVV. The lower the calculated variation for a segment, the better the metric performance is presumed to be.

Now, two remarkable observations. First, if different videos are coded at the same quality (e.g. QP = 5 for Excellent), the reproduced scores should show no stunning variations. Surprisingly, the PSNR discards this trend and, for almost all quality segments, its variation reaches the highest, as illustrated in Fig. 9. Thus, it might lose its suitability for a wide range of free viewpoint video sequences. On the other hand, for the Very-poor quality segment, the participants perhaps give somewhat arbitrary scores, for which the MOS variation reaches its apex and its proficiency drops in this regard. This example also motivates the development of a subjective metric other than MOS for relatively fairer scoring. Although the QMET performs better than PSNR and MOS here, the SSIM appears the most stable across all segments. This is because the SSIM is a perception-based model that considers degradation in an image mainly by recognizing changes in structural information. The second observation, i.e. that even when the same sequence is coded at a range of qualities the recognition of quality variation should be prominent, has been verified by employing two ranges of variation (Excellent to Fair and Fair to Very-poor), reported in Fig. 10. For the first range of segments, although all the metrics perform in a similar manner on free viewpoint video, the QMET appears the most responsive in differentiating the range of qualities, while the SSIM tends to be the least responsive in this regard. For the second range of segments (i.e. Fair to Very-poor), the QMET and the MOS reach their apex, indicating their best performance in the context of quality segregation. Interestingly, for both ranges of segments, the subjective estimators perform relatively better than the objective ones.

Fig. 10. The percentage of quality variation recognized by the PSNR, SSIM, QMET, and MOS metrics for a range of quality segment differences. The higher the calculated percentage of variation detected between segments [X–Y], the better the metric performance is presumed to be.

For further performance estimation of the four metrics, the calculated results for all videos used in this test are reported in Table 1, implementing both the PLCC and SRCC evaluation criteria. A good quality metric is expected to achieve high values in both PLCC and SRCC [8]. According to both the PLCC and SRCC judgements, the QMET reveals similar performance to the PSNR; however, it obtains relatively higher scores than the SSIM and MOS. In fact, the results obtained by the proposed metric are promising given that no information about the reference image is available to the QMET for evaluating quality. Since the scoring patterns of the four metrics are approximately similar in terms of distinguishing different quality contents, as illustrated in Figs. 9 and 10 and Table 1, the proposed QMET could be well represented as a new member of the quality metric family and successfully employed as an impressive alternative to the subjective estimator MOS. It could also be employed to evaluate the effectiveness of using the objective metrics PSNR and SSIM, since the QMET does not require any ground-truth reference for quality estimation.
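For reference, PLCC and SRCC can be computed with standard statistical routines, as in the brief sketch below; the variable names and the choice of comparison scores are illustrative, since the paper does not publish its evaluation script.

from scipy.stats import pearsonr, spearmanr

def correlation_scores(metric_scores, comparison_scores):
    # Pearson Linear Correlation Coefficient and Spearman Rank-Order Correlation
    # Coefficient between a metric's per-sequence scores and the comparison scores.
    plcc, _ = pearsonr(metric_scores, comparison_scores)
    srcc, _ = spearmanr(metric_scores, comparison_scores)
    return plcc, srcc

# Example: plcc, srcc = correlation_scores(qmet_per_sequence, mos_per_sequence)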

Table 1. Average performance of four metrics according to both PLCC and SRCC’s evaluation criteria.

A potential application of the QMET is the evaluation of synthesized views reproduced by different FVV generation algorithms. A good number of contributions in the literature claim image quality improvement mostly on the basis of the objective metrics PSNR and SSIM or the subjective estimator MOS. However, as presented earlier, the subjective estimator MOS performs better than the objective metrics in most cases when evaluating FVV quality. Since the proposed QMET is closely correlated with human cognition, its assessment process is presumed to be more neutral than the MOS. Moreover, since view synthesis algorithms go through post-processing phases such as inverse mapping or inpainting for crack filling, algorithms that successfully overcome crack-filling artifacts are highly anticipated to obtain higher quality evaluation scores with the QMET.

4 Conclusion

In this work, a no-reference video quality assessment metric has been developed for free viewpoint video. The newly developed metric QMET could be an impressive substitute for the popularly used subjective estimator MOS for quality evaluation and comparison. In the metric generation process, the human perceptual eye-traversing nature on videos is exploited to discover the patterns of the Length, Angle, Pupil-size, and Gaze-duration features from gaze trajectories recorded for varied video qualities. Content and resolution invariant operations are carried out on the features prior to synthesizing them using an adaptive weighted function to develop the QMET. The experimental analysis reveals that the quality evaluation carried out by the QMET is largely similar to that of the MOS and the reference-based PSNR and SSIM in terms of assessing different aspects of quality contents. Eventually, the outcomes of the four metrics have further been tested using the Pearson Linear Correlation Coefficient (PLCC) and Spearman Rank-Order Correlation Coefficient (SRCC) evaluation criteria, which indicate that the QMET performs relatively better than the MOS and the SSIM for a wide range of free viewpoint video contents. Since eye-tracker data can easily be captured nowadays by directly employing a software based eye-tracking simulator (i.e. the device itself is no longer required), the utility of the QMET could be made even more flexible using such simple simulator-generated data sets.