CN104506852B - Objective quality assessment method for video conference coding - Google Patents
Objective quality assessment method for video conference coding Download PDF
- Publication number
- CN104506852B (application CN201410826849A)
- Authority
- CN
- China
- Prior art keywords
- face
- eye
- mouth
- area
- nose
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The invention discloses an objective quality assessment method for video conference coding, comprising a training part and an evaluation part. The training part includes: Step 1, extracting the face and facial regions; Step 2, obtaining the degree of attention received by each pixel; Step 3, calibrating and normalizing the face region; Step 4, obtaining a Gaussian mixture model. The evaluation part includes: Step 1, for a set of videos, automatically extracting the number of pixels in the background, face, left-eye, right-eye, mouth and nose regions; Step 2, calibrating and normalizing the face region; Step 3, obtaining the weight map; Step 4, computing the peak signal-to-noise ratio based on the Gaussian mixture model and evaluating the image quality of the encoded video conference output. The invention avoids the shortcoming of traditional methods that ignore video content: by assigning more weight to the face in the video image, it improves the accuracy of image quality assessment and better reflects the results of subjective quality assessment.
Description
Technical Field
The invention relates to an objective quality assessment method for video conference coding, and belongs to the technical field of perceptual visual quality assessment of video conference coding.
Background
A visual quality metric is essential when evaluating the efficiency of different video coding schemes. Visual quality assessment for perceptual video coding falls into two categories: subjective assessment and objective assessment. Since humans are the ultimate recipients of video, subjective visual quality assessment is the most accurate and reliable way to evaluate video coding, but its low efficiency and high cost have driven the development of objective visual quality metrics. The purpose of an objective metric is to correlate well with subjective visual quality so that visual quality can be measured accurately. The most widely used objective metrics include peak signal-to-noise ratio (PSNR), structural similarity (SSIM), visual signal-to-noise ratio (VSNR), video quality metrics (VQM), and MOtion-based Video Integrity Evaluation (MOVIE).
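For reference, the following is a minimal sketch of the conventional PSNR mentioned above, assuming 8-bit frames stored as NumPy arrays; the weighted variant proposed later in this document is built on the same structure.

```python
import numpy as np

def psnr(ref, dist, bit_depth=8):
    """Conventional PSNR (in dB) between a reference frame and a distorted frame."""
    ref = ref.astype(np.float64)
    dist = dist.astype(np.float64)
    mse = np.mean((ref - dist) ** 2)          # unweighted mean squared error
    if mse == 0:
        return float("inf")                   # identical frames
    peak = (2 ** bit_depth) - 1               # 255 for 8-bit video
    return 10.0 * np.log10(peak ** 2 / mse)
```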
Perceptual video coding for video conferencing has been studied extensively, because the face is a region of interest (ROI) in video conferencing. However, no objective visual quality assessment method has been developed specifically for video conferencing.
Summary of the Invention
The purpose of the present invention is to overcome the shortcomings of existing objective video quality assessment methods by providing an objective metric for video conference coding that aims to improve the correlation with the viewer's subjective perceived quality.
An objective quality assessment method for video conference coding comprises a training part and an evaluation part.
The training part includes the following steps:
Step 1: extract the face and facial regions;
Step 2: run an eye-tracking experiment to obtain the coordinates of the subjects' fixation points on each video frame, and from them the degree of attention received by each pixel;
Step 3: calibrate and normalize the face region;
Step 4: obtain a Gaussian mixture model.
The evaluation part includes the following steps:
Step 1: for a set of videos, repeat Step 1 of the training part to automatically extract the number of pixels in the background, face, left-eye, right-eye, mouth and nose regions;
Step 2: repeat Step 3 of the training part to calibrate and normalize the face region;
Step 3: based on the Gaussian mixture model obtained during training, compute the weights of the right eye, left eye, mouth, nose, remaining face area and background, as well as the Gaussian distribution weights around these regions, to obtain the weight map;
Step 4: on the basis of the weight map, compute the peak signal-to-noise ratio based on the Gaussian mixture model and evaluate the image quality of the encoded video conference output.
The advantages of the present invention are:
(1) As an image quality assessment method for encoded video conferencing, the invention avoids the shortcoming of traditional methods that ignore video content; by assigning more weight to the face in the video image, it improves the accuracy of image quality assessment and better reflects the results of subjective quality assessment.
(2) Building on the extraction of facial regions (such as the nose and mouth), the invention assigns greater weight to key regions of the face, which matches the trend of ever-increasing resolution and display size in current and future video conferencing systems.
(3) By introducing eye-tracking experimental data and combining it with statistical learning tools, the invention uncovers regularities of human visual attention during video conferencing and applies them to the quality assessment of encoded video conference images, greatly improving the correlation with subjective quality assessment.
Brief Description of the Drawings
Figure 1 is a flowchart of the method of the present invention;
Figure 2: automatic facial feature alignment algorithm;
Figure 3: automatic extraction of key facial regions;
Figure 4: calibration and normalization method;
Figure 5: construction of the weight map;
Figure 6: schematic of the GMM-PSNR computation.
Detailed Description
The present invention is described in further detail below with reference to the accompanying drawings and embodiments.
The present invention adopts a real-time automatic facial feature alignment method to track the key feature points of the face. After face detection, a point distribution model (PDM) of the key features is fitted on each video frame by combining local detection (texture information) and global optimization (facial structure). In the present invention, a 66-point PDM is used to extract the face and its contour. The 66-point PDM samples the key points of the face and the facial features well, so these points can be connected to accurately extract the contours and regions of the face and facial features. Finally, the face and its key regions are extracted according to their contours.
Experiments on videos of conversational scenes show that the face attracts the vast majority of an observer's attention. Therefore, the unequal importance of the background, the face and the facial features is quantified according to the observer's attention, which improves the accuracy of objective quality assessment for video conferencing. To obtain these unequal importance values, several eye-tracking experiments were carried out on conference-related videos.
In the experiments, an eye tracker recorded the gaze points falling on the video frames while observers watched the videos. Gaze points represent the observer's focus of attention, so the eye-tracking results can be used to build a subjective attention model. After the eye-tracking experiment, the number of gaze points belonging to the right eye, left eye, mouth, nose, remaining face area and background was recorded. Based on the number of gaze points falling in the different regions, a new quantity, eye fixation points per pixel (EFP/P), is introduced to reflect the pixel-level attention received by each region.
Once the results of the eye-tracking experiments are obtained, they are used to train a GMM that produces an importance weight map for each video frame; GMM-PSNR can then be computed by incorporating the corresponding weight map. Before training the GMM, the gaze points obtained in the previous section are calibrated and normalized as a preprocessing step. The GMM is then trained on the calibrated and normalized gaze points with the expectation-maximization (EM) algorithm, running EM iterations until convergence. Given the parameters of the resulting GMM, the weight map can be computed to build the objective metric GMM-PSNR.
The present invention is an objective quality assessment method for video conference coding. The workflow is shown in Figure 1 and consists of a training part and an evaluation part.
The training part includes the following steps:
Step 1: extract the face and facial regions.
An automatic facial feature alignment algorithm is used to automatically extract the number of pixels in the background, face, left-eye, right-eye, mouth and nose regions of a given video conference sequence.
Specifically: first, the key points of the face region in each frame of the video conference sequence are obtained by the automatic facial feature alignment algorithm; second, mean shift is used to locally search the extracted face region for the key points of the left-eye, right-eye, mouth and nose regions, and these key points are matched against the point distribution model (PDM) in the database to optimize them; third, the optimized key points of the face, left eye, right eye, mouth and nose are obtained for each frame, 66 key points in total, as shown in Figure 2; fourth, the key points of the face, left eye, right eye, mouth and nose are connected to obtain the corresponding contours, as shown in Figure 3; fifth, the number of pixels inside the face, left-eye, right-eye, mouth and nose regions is counted, and the number of background pixels is obtained by subtracting the number of face pixels from the total number of image pixels, completing the automatic extraction of the key facial regions.
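The following sketch illustrates region extraction of this kind. It is not the patent's 66-point PDM tracker: it substitutes dlib's commonly distributed 68-point landmark model (the file name `shape_predictor_68_face_landmarks.dat` and the landmark index ranges follow the usual 68-point annotation) and approximates the face contour by the convex hull of all landmarks.

```python
import cv2
import dlib
import numpy as np

REGIONS = {                      # index ranges in the common 68-point annotation
    "right_eye": range(36, 42),
    "left_eye":  range(42, 48),
    "nose":      range(27, 36),
    "mouth":     range(48, 60),  # outer lip contour
}

def region_pixel_counts(frame_bgr, predictor_path="shape_predictor_68_face_landmarks.dat"):
    """Count the pixels in the background, face and facial-feature regions of one frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor(predictor_path)
    faces = detector(gray, 1)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    pts = np.array([(shape.part(i).x, shape.part(i).y) for i in range(68)], dtype=np.int32)

    h, w = gray.shape
    masks = {}
    # Face region: convex hull of all landmarks stands in for the tracked face contour.
    face_mask = np.zeros((h, w), dtype=np.uint8)
    cv2.fillConvexPoly(face_mask, cv2.convexHull(pts), 255)
    masks["face"] = face_mask
    for name, idx in REGIONS.items():
        m = np.zeros((h, w), dtype=np.uint8)
        cv2.fillPoly(m, [pts[list(idx)]], 255)   # fill the connected contour of the region
        masks[name] = m

    counts = {name: int(np.count_nonzero(m)) for name, m in masks.items()}
    counts["background"] = h * w - counts["face"]  # background = frame minus face
    return counts, masks
```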
The point distribution model is trained on a set of standard test images using the mean-shift technique.
The key points of the face, left-eye, right-eye, mouth and nose regions can be extracted from different face images.
Step 2: run an eye-tracking experiment to obtain the coordinates of the subjects' fixation points on each video frame, and from them the degree of attention received by each pixel.
The degree of attention of a single region (left eye, right eye, mouth, nose, remaining face area, background) is defined as the number of eye fixation points divided by the number of pixels in that region (efp/p):

c_r = f_r / p_r, c_l = f_l / p_l, c_m = f_m / p_m, c_n = f_n / p_n, c_o = f_o / p_o, c_b = f_b / p_b

where c_r, c_l, c_m, c_n, c_o, c_b denote the per-pixel attention of the right eye, left eye, mouth, nose, remaining face area and background, respectively; f_r, f_l, f_m, f_n, f_o, f_b denote the number of the subjects' eye fixation points falling on the right eye, left eye, mouth, nose, remaining face area and background in the eye-tracking experiment; and p_r, p_l, p_m, p_n, p_o, p_b denote the number of pixels in the right eye, left eye, mouth, nose, remaining face area and background, respectively.
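As a direct transcription of the efp/p definition above, the per-region attention values can be computed as below; the region names and counts in the commented example are hypothetical placeholders.

```python
def efp_per_pixel(fixation_counts, pixel_counts):
    """Eye fixation points per pixel (efp/p) for each region: c = f / p."""
    return {region: fixation_counts[region] / pixel_counts[region]
            for region in fixation_counts if pixel_counts.get(region, 0) > 0}

# Hypothetical example:
# efp_per_pixel({"right_eye": 310, "background": 95},
#               {"right_eye": 800, "background": 250000})
# -> {"right_eye": 0.3875, "background": 0.00038}
```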
Step 3: calibrate and normalize the face region.
Calibration removes the uncertainty caused by the face appearing at different positions in the image, and normalization makes the invention applicable when the number of face pixels differs between video conference frames.
The specific procedure is as follows:
As shown in Figure 4(a), one frame is selected at random and the leftmost of its face-region key points is taken as the calibration origin B. For every other frame, the leftmost face-region key point A is found, the coordinate transformation between A and B is computed, and the fixation points of that frame are transformed accordingly, completing the calibration.
As shown in Figure 4(b), one frame is selected at random and the horizontal extent of the subject's right eye (the distance between the points on the right and left sides of the right eye among the 66 points) is taken as the normalization unit; the fixation points of the other frames are then normalized by this unit.
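A minimal sketch of the calibration and normalization described for Figure 4, assuming per-frame landmark and fixation arrays; the reference origin and eye-width unit follow the text, while details such as applying the shift before the scaling are assumptions.

```python
import numpy as np

def calibrate_and_normalize(fixations, landmarks, ref_origin, eye_width):
    """
    fixations:  (N, 2) array of (x, y) gaze points for one frame
    landmarks:  (66, 2) array of face key points for the same frame
    ref_origin: leftmost face key point (B) of the randomly chosen reference frame
    eye_width:  horizontal extent of the right eye in the reference frame (pixels)
    """
    fixations = np.asarray(fixations, dtype=np.float64)
    origin = landmarks[np.argmin(landmarks[:, 0])]             # leftmost key point (A) of this frame
    shifted = fixations - origin + np.asarray(ref_origin, dtype=np.float64)  # align A with B
    return shifted / float(eye_width)                          # one unit = reference right-eye width
```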
Step 4: obtain the Gaussian mixture model.
Assuming that the eye fixation points follow a Gaussian mixture model, on the basis of the normalized and calibrated eye-tracking data the model is written as a linear superposition of Gaussian components:

p(x*) = Σ_{k=1}^{K} π_k N(x* | μ_k, Σ_k)

where N(x* | μ_k, Σ_k) denotes a Gaussian component; π_k, μ_k and Σ_k are the mixing coefficient, mean and covariance of the k-th Gaussian component; x* denotes a two-dimensional calibrated and normalized eye fixation point; and K is the number of Gaussian components in the GMM. Since the nose receives far fewer fixation points than the eyes and the mouth, the number of Gaussian components K is set to 3 here, corresponding to the right eye, the left eye and the mouth. Meanwhile, μ_k is set to the normalized centroid of each facial feature.
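A sketch of this training step using scikit-learn's EM implementation. Note one deviation: the text fixes μ_k to the normalized facial-feature centroids, whereas `GaussianMixture` only uses `means_init` as an initialization and lets EM update the means, so this is an approximation of the described procedure.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_fixation_gmm(points, feature_centroids):
    """
    Fit a 3-component GMM (right eye, left eye, mouth) to calibrated and
    normalized fixation points with the EM algorithm.
    points:            (N, 2) fixation coordinates
    feature_centroids: (3, 2) normalized centroids of the three facial features
    """
    gmm = GaussianMixture(n_components=3, covariance_type="full",
                          means_init=np.asarray(feature_centroids), max_iter=200)
    gmm.fit(np.asarray(points, dtype=np.float64))
    # gmm.weights_, gmm.means_, gmm.covariances_ correspond to pi_k, mu_k, Sigma_k
    return gmm
```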
In an offline stage, the above steps are applied to a set of training videos; by designing the eye-tracking experiment and analyzing its data, the Gaussian mixture model used to assess the objective quality of the video conferencing system is obtained.
(2) The evaluation part includes the following steps:
Step 1: as in Step 1 of the training part, automatically extract the background, face, left-eye, right-eye, mouth and nose regions.
Step 2: as in Step 3 of the training part, calibrate and normalize the face region of the video. See Figure 4 for details.
Step 3: based on the Gaussian mixture model obtained during training, compute the weights of the right eye, left eye, mouth, nose, remaining face area and background, as well as the Gaussian distribution weights around these regions. See Figure 5 for details.
Figure 5 illustrates how the weight map is constructed. In this embodiment, the weight map quantifies the importance of each pixel of the face and the background in the video conferencing system. The input is one frame of a video conference. First, the face and its key regions are automatically extracted as in Figure 3. Second, the key points in the video are calibrated and normalized as in Figure 4. Finally, using the parameters obtained from the GMM training in Steps 2 and 4 of the training part in Figure 1, the weight of each pixel is computed according to the region it belongs to (background, face, left eye, right eye, nose or mouth), and the weight map of the video conference image is output; the weight values set the importance of each image pixel in the quality assessment.
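The exact per-pixel weighting formula used by the patent is not reproduced in this text, so the following is only one plausible stand-in: evaluate the trained GMM density at every pixel position (after mapping it into the calibrated, normalized coordinate frame) and rescale so the average weight is 1.

```python
import numpy as np

def gmm_weight_map(gmm, height, width, to_normalized):
    """
    Illustrative weight map: GMM density per pixel, rescaled to mean 1.
    to_normalized: caller-supplied function mapping an (N, 2) array of pixel
                   coordinates into the calibrated, normalized frame of the GMM.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    coords = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float64)
    norm_coords = to_normalized(coords)                  # per-frame calibration/normalization
    density = np.exp(gmm.score_samples(norm_coords))     # GMM density at each pixel
    wmap = density.reshape(height, width)
    return wmap / wmap.mean()                            # average weight of 1
```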
The present invention is not limited to this particular way of setting the pixel weights.
Step 4: on the basis of the weight map, compute the peak signal-to-noise ratio based on the Gaussian mixture model (GMM-PSNR) to evaluate the image quality of the encoded video conference output. See Figure 6 for details.
Figure 6 illustrates the GMM-PSNR computation. In this embodiment, the computation outputs a GMM-PSNR value that measures the image quality of the encoded video conference output. First, as in traditional metrics such as PSNR, the residual between the original video image and the image under evaluation is obtained by computing their squared error. Next, the squared error is weighted by the weight map to obtain GMM-MSE. Finally, GMM-PSNR is computed by taking the logarithm. The specific computation and its formula are given in the description of Figure 1. The present invention is not limited to improving traditional PSNR; other metrics (such as structural similarity, SSIM) can also be improved by multiplying them with the weights in the weight map.
The specific formulas are as follows:

GMM-MSE = (1 / (M × N)) Σ_x w_x (I'_x − I_x)²
GMM-PSNR = 10 log₁₀( (2ⁿ − 1)² / GMM-MSE )

where I'_x and I_x are the values of pixel x in the processed and original video frames, respectively; w_x is the weight assigned to pixel x by the weight map; M and N are the numbers of pixels along the vertical and horizontal directions, respectively; and n (= 8) is the bit depth.
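A sketch of the weighted metric, following the GMM-MSE and GMM-PSNR formulas reconstructed above; the original frame, processed frame and weight map are assumed to be NumPy arrays of the same shape.

```python
import numpy as np

def gmm_psnr(ref, dist, weight_map, bit_depth=8):
    """GMM-PSNR (dB): PSNR computed from the weight-map-weighted MSE."""
    ref = ref.astype(np.float64)
    dist = dist.astype(np.float64)
    gmm_mse = np.mean(weight_map * (dist - ref) ** 2)   # GMM-MSE
    if gmm_mse == 0:
        return float("inf")
    peak = (2 ** bit_depth) - 1
    return 10.0 * np.log10(peak ** 2 / gmm_mse)
```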
Finally, the present invention outputs the Gaussian-mixture-model-based peak signal-to-noise ratio (GMM-PSNR) of the encoded video conference output, which measures the degradation of image quality caused by video coding. Like the traditional peak signal-to-noise ratio (PSNR), GMM-PSNR is measured in dB. However, because viewers pay different amounts of attention to different regions of the image, GMM-PSNR assigns different weights to face regions of differing importance in the video conferencing system, which greatly improves its correlation with subjective quality assessment.
The present invention provides a more effective way to evaluate the quality of video transmission in video conferencing. Tests show that, compared with traditional objective video assessment methods such as VQM, MOVIE and PSNR, GMM-PSNR significantly improves the correlation with subjective measures such as MOS and DMOS, demonstrating that GMM-PSNR is a more effective objective metric for video conference coding. This benefits the video processing, compression and communication of video conferencing: it can monitor the performance of a video system and provide feedback for adjusting codec or channel parameters so that video quality stays within an acceptable range. The quality assessment criterion can also be used in the design, evaluation and optimization of codec performance, as well as in designing and optimizing digital video systems that conform to the human visual model.
The present invention relates to an objective quality assessment method for video sequences, used for perceptual visual quality assessment of video conference coding. It employs eye-tracking experiments and real-time techniques for extracting the face and facial features. In the experiments, the importance of the background, the face and the facial feature regions is determined from the observer's attention to each part. Using the gaze points collected by the eye tracker and assuming that they follow a Gaussian mixture distribution, an importance weight map is generated that captures the observer's attention to each region of the conference video. Based on this weight map, each pixel of a video frame can be assigned a different weight, thereby improving existing objective video quality assessment methods. More specifically, the present invention relates to perceptual video quality assessment of video conference coding built on top of existing video quality assessment methods.
Claims (3)
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201410826849.4A (CN104506852B) | 2014-12-25 | 2014-12-25 | Objective quality assessment method for video conference coding |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN104506852A | 2015-04-08 |
| CN104506852B | 2016-08-24 |
Family
ID=52948564
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201410826849.4A | Objective quality assessment method for video conference coding | 2014-12-25 | 2014-12-25 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN104506852B (en) |
Legal Events

| Date | Code | Title |
|---|---|---|
| | C06 | Publication |
| | PB01 | Publication |
| | C10 | Entry into substantive examination |
| | SE01 | Entry into force of request for substantive examination |
| | C14 | Grant of patent or utility model |
| | GR01 | Patent grant |