CN107241643A

CN107241643A - Multimedia volume adjusting method and system

Info

Publication number: CN107241643A
Application number: CN201710656251.9A
Authority: CN
Inventors: 李孟歆; 林佰凤; 张锐; 张颖; 侯静
Original assignee: Shenyang Jianzhu University
Current assignee: Shenyang Jianzhu University
Priority date: 2017-08-03
Filing date: 2017-08-03
Publication date: 2017-10-10

Abstract

The invention discloses a multimedia volume adjustment method and system. The method includes: acquiring a depth image of a user gesture; performing image segmentation of the hand target area on the depth image to obtain a segmented target area image; performing gesture edge contour detection on the target area image using a Sobel operator and extracting edge parameter features; obtaining a gesture classification result from the edge parameter features using a DAG-SVMs classifier; and adjusting the volume level of the multimedia according to the gesture classification result, with different gesture classes corresponding to different volume levels. With the method or system of the present invention, the multimedia device only needs to be connected to one peripheral that acquires depth images, after which the volume of the multimedia itself can be controlled through gestures. The control process uses number gestures that are common in daily life, so the external equipment is simple, operability is strong, and convenience is high.

Description

Method and system for adjusting multimedia volume

Technical field

The invention relates to the field of intelligent control, and in particular to a multimedia volume adjustment method and system.

Background art

Multimedia is a combination of various media, generally including text, sound, images and other media forms. At present there are two common ways to control multimedia volume: manual adjustment with the volume buttons of the multimedia device, and remote adjustment with a remote control. Remote adjustment is somewhat more convenient than manual adjustment. However, different multimedia devices require their own specific remote controls, and not all multimedia devices support remote volume adjustment. For example, the volume of a computer can only be adjusted through mechanical interaction, by physically operating the mouse or keyboard; and in a classroom PowerPoint presentation, a remote control can be used to turn pages, but the volume of a video or other embedded media can still only be adjusted with the mouse. Existing multimedia volume adjustment methods are therefore inconvenient.

Summary of the invention

The object of the present invention is to provide a multimedia volume adjustment method and system that make multimedia volume adjustment more convenient.

To achieve the above object, the present invention provides the following scheme:

A multimedia volume adjustment method, the method comprising:

Obtaining a depth image of a user gesture, the depth image including the spatial coordinates of the hand joints in the user gesture;

Performing image segmentation of the hand target area on the depth image to obtain a segmented target area image;

Performing gesture edge contour detection on the target area image using a Sobel operator, and extracting edge parameter features;

Obtaining a gesture classification result from the edge parameter features using a DAG-SVMs classifier;

Adjusting the volume level of the multimedia according to the gesture classification result, where different gesture classes correspond to different volume levels.

Optionally, acquiring the depth image of the user gesture specifically includes:

Using a Kinect camera to obtain a depth image of the user's gesture.

Optionally, performing image segmentation of the hand target area on the depth image specifically includes:

Drawing an image depth histogram from the depth image;

Determining a plurality of candidate threshold intervals from the depth-value valleys in the histogram;

Determining the final threshold interval using Kinect skeleton tracking;

Performing image segmentation of the hand target area according to the final threshold interval to obtain the segmented target area image.

Optionally, before the image segmentation of the hand target area is performed on the depth image, the method further includes:

Binarizing the depth image to obtain a processed depth image.

Optionally, the edge parameter features include the Hu moment edge feature and the contour-edge length moment feature, wherein

The Hu moments and length moments of the gesture edge are used as the features of the number gesture image, and the Hu moment edge feature and the contour-edge length moment feature are extracted.
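As a rough illustration of the Hu moment feature, the following pure-Python sketch computes Hu's seven moment invariants, which are built from normalized central moments and are invariant to translation, scaling and rotation, for a binary hand mask given as a list of rows. The function names and the 0/1 mask input are assumptions for illustration; the document does not define how its "length moment" is computed, so that feature is not attempted here.

```python
def raw_moment(img, p, q):
    """Raw image moment m_pq = sum of x^p * y^q * I(x, y) over all pixels."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img)
               for x, v in enumerate(row) if v)


def hu_moments(img):
    """Return Hu's seven moment invariants of a 0/1 mask (list of rows)."""
    m00 = raw_moment(img, 0, 0)
    if m00 == 0:
        raise ValueError("mask contains no foreground pixels")
    xc = raw_moment(img, 1, 0) / m00  # centroid x
    yc = raw_moment(img, 0, 1) / m00  # centroid y

    def mu(p, q):
        # Central moment: taken about the centroid, hence translation invariant.
        return sum(((x - xc) ** p) * ((y - yc) ** q) * v
                   for y, row in enumerate(img)
                   for x, v in enumerate(row) if v)

    def eta(p, q):
        # Normalized central moment: dividing by m00 makes it scale invariant.
        return mu(p, q) / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        a ** 2 + b ** 2,
        (n30 - 3 * n12) * a * (a ** 2 - 3 * b ** 2)
        + (3 * n21 - n03) * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        (3 * n21 - n03) * a * (a ** 2 - 3 * b ** 2)
        - (n30 - 3 * n12) * b * (3 * a ** 2 - b ** 2),
    ]
```

Because the invariants do not depend on where the hand appears in the frame, the same gesture shifted across the image yields the same feature vector, which is what makes them usable as classifier inputs.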

A multimedia volume adjustment system, the system comprising:

A depth image acquisition module, configured to acquire a depth image of a user gesture, the depth image including the spatial coordinates of the hand joints in the user gesture;

An image segmentation module, configured to perform image segmentation of the hand target area on the depth image to obtain a segmented target area image;

An edge parameter feature extraction module, configured to perform gesture edge contour detection on the target area image using a Sobel operator and to extract edge parameter features;

A gesture classification result acquisition module, configured to obtain a gesture classification result from the edge parameter features using a DAG-SVMs classifier;

A volume adjustment module, configured to adjust the volume level of the multimedia according to the gesture classification result, where different gesture classes correspond to different volume levels.

Optionally, the depth image acquisition module is a Kinect camera, configured to acquire the depth image of the user gesture made in front of the camera.

Optionally, the image segmentation module specifically includes:

A histogram drawing unit, configured to draw an image depth histogram from the depth image;

A candidate threshold interval determination unit, configured to determine a plurality of candidate threshold intervals from the depth-value valleys in the histogram;

A final threshold interval determination unit, configured to determine the final threshold interval using Kinect skeleton tracking;

A graphic segmentation unit, configured to perform image segmentation of the hand target area according to the final threshold interval to obtain the segmented target area image.

Optionally, the system further includes:

A binarization processing module, configured to binarize the depth image before the image segmentation of the hand target area is performed, to obtain a processed depth image.

Optionally, the edge parameter features include the Hu moment edge feature and the contour-edge length moment feature, and the edge parameter feature extraction module uses the Hu moments and length moments of the gesture edge as the features of the number gesture image and extracts both.

According to the specific embodiments provided, the present invention has the following technical effects:

The multimedia device only needs to be connected to one peripheral that collects depth images, after which the volume of the multimedia itself can be controlled through gestures. Because the control process uses number gestures that are common in daily life, the external equipment is simple, operability is strong and convenience is high. It also replaces the earlier mechanical interaction mode, in which multimedia volume could only be adjusted by physically operating the mouse and keyboard, with contactless control of multimedia by hand gestures at medium and long range.

In a specific implementation, gesture recognition is based on the depth and skeleton information provided by Kinect, which avoids the interference that light intensity causes in recognition based on color information. Gestures can therefore be recognized under low brightness and even in darkness, reducing the environmental constraints on gesture recognition. In the number-gesture algorithm, the strategy structure of the DAG-SVMs classifier is improved to optimize the recognition algorithm, which raises recognition accuracy and improves the recognition stability of the whole interactive system.

Description of drawings

To illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flow chart of Embodiment 1 of the multimedia volume adjustment method of the present invention;

Fig. 2 is the depth image acquired in Embodiment 1 of the multimedia volume adjustment method of the present invention;

Fig. 3 is the histogram drawn in Embodiment 1 of the multimedia volume adjustment method of the present invention;

Fig. 4 is the segmented target area image in Embodiment 1 of the multimedia volume adjustment method of the present invention;

Figs. 5a-5b compare the images before and after Sobel detection in Embodiment 1 of the multimedia volume adjustment method of the present invention; Fig. 5a is the image before detection and Fig. 5b is the image after detection;

Fig. 6 shows the gesture classification result in Embodiment 1 of the multimedia volume adjustment method of the present invention;

Figs. 7a-7e are the depth images of all gestures in Embodiment 1 of the multimedia volume adjustment method of the present invention;

Figs. 8a-8e are the segmentation images corresponding to the depth images of all gestures in Embodiment 1 of the multimedia volume adjustment method of the present invention;

Fig. 9 shows the edge parameter features of all gestures in Embodiment 1 of the multimedia volume adjustment method of the present invention;

Figs. 10a-10e show the classification results for all gestures in Embodiment 1 of the multimedia volume adjustment method of the present invention;

Fig. 11 shows the correspondence between volume levels and gesture classes in Embodiment 1 of the multimedia volume adjustment method of the present invention;

Fig. 12 is a flow chart of Embodiment 2 of the multimedia volume adjustment method of the present invention;

Fig. 13 shows the classification result of Embodiment 2 of the multimedia volume adjustment method of the present invention;

Fig. 14 is a schematic diagram of volume adjustment in Embodiment 2 of the multimedia volume adjustment method of the present invention;

Fig. 15 is a structural diagram of the multimedia volume adjustment system of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

To make the above objects, features and advantages of the present invention clearer and easier to understand, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

Embodiment 1:

Fig. 1 is a flow chart of Embodiment 1 of the multimedia volume adjustment method of the present invention. As shown in Fig. 1, the method includes:

Step 101: Acquire a depth image of the user gesture. The depth image includes the spatial coordinates of the hand joints in the user gesture; Fig. 2 shows the depth image acquired in Embodiment 1.

For example, a Kinect camera may be used to acquire the depth image of the user gesture. Kinect is a 3D motion-sensing (somatosensory) camera, an Xbox 360 peripheral that Microsoft officially announced at the E3 exhibition on June 2, 2009 (development code name "Project Natal"); it provides real-time motion capture, image recognition, microphone input, speech recognition and social interaction functions. Using the Kinect camera as the acquisition device for depth gesture images allows depth number-gesture images to be recognized effectively and supports depth-based gesture segmentation even under poor lighting, overcoming the environmental limits that ordinary RGB (red, green, blue) color cameras impose on gesture recognition. During image acquisition, the hand plane is required to stay parallel to the sensor (camera) plane and to be the frontmost part of the body; depth images captured under these conditions are treated as valid.

Step 102: Segment the hand target area in the depth image. Image segmentation of the hand target area is performed on the depth image to obtain the segmented target area image, as follows:

Draw an image depth histogram from the depth image. The image depth histogram shows the distribution of depth pixel values in the depth image, including its peaks and valleys; Fig. 3 shows the histogram drawn in Embodiment 1.

Determine a plurality of candidate threshold intervals from the valleys of depth pixel values in the histogram.

Determine the final threshold interval using Kinect skeleton tracking; the final threshold interval is the ideal depth threshold interval that most closely encloses the target (the hand).

Perform image segmentation of the hand target area according to the final threshold interval to obtain the segmented target area image; Fig. 4 shows the segmented target area image in Embodiment 1.
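The segmentation steps above can be sketched in pure Python as follows, assuming the depth image is a list of rows of depth values in millimetres (0 meaning no reading), a 50 mm histogram bin width, and a hand-joint depth `hand_depth_mm` supplied by skeleton tracking. The helper names, the bin width and the simple valley rule (a bin no higher than either neighbour and strictly lower than one of them) are hypothetical choices, not details taken from the document.

```python
def depth_histogram(depth, bin_mm=50):
    """Return (index of first non-empty bin, list of bin counts)."""
    counts = {}
    for row in depth:
        for d in row:
            if d > 0:  # 0 marks pixels with no depth reading
                b = d // bin_mm
                counts[b] = counts.get(b, 0) + 1
    first, last = min(counts), max(counts)
    return first, [counts.get(b, 0) for b in range(first, last + 1)]


def candidate_intervals(depth, bin_mm=50):
    """Split the depth range at histogram valleys into half-open [lo, hi) mm intervals."""
    first, hist = depth_histogram(depth, bin_mm)
    cuts = [i for i in range(1, len(hist) - 1)
            if hist[i] <= hist[i - 1] and hist[i] <= hist[i + 1]
            and (hist[i] < hist[i - 1] or hist[i] < hist[i + 1])]
    edges = [0] + cuts + [len(hist)]
    return [((first + a) * bin_mm, (first + b) * bin_mm)
            for a, b in zip(edges, edges[1:])]


def final_interval(intervals, hand_depth_mm):
    """Pick the candidate interval that contains the tracked hand-joint depth."""
    for lo, hi in intervals:
        if lo <= hand_depth_mm < hi:
            return lo, hi
    raise ValueError("hand depth lies outside every candidate interval")


def segment_hand(depth, interval):
    """Keep (as 1) only pixels whose depth falls inside the chosen interval."""
    lo, hi = interval
    return [[1 if lo <= d < hi else 0 for d in row] for row in depth]
```

The skeleton-tracked joint depth is what disambiguates the candidates: the histogram alone yields several plausible layers (hand, body, background), and only one of them contains the hand joint.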

Before image segmentation, the depth image may also be binarized to obtain a processed depth image, with the pixels of the hand target area binarized. Because the whole segmentation process is based on depth information, it effectively avoids interference from light intensity and ensures that gesture images can be recognized even when the ambient brightness is low.

Step 103: Extract edge parameter features. Gesture edge contour detection is performed on the segmented target area image using the Sobel operator, and edge parameter features are extracted. The Sobel operator is an important processing method in computer vision; it is mainly used to compute the first-order gradient of a digital image and is commonly applied to edge detection. Here the Sobel operator extracts the entire gesture contour, and the typical edge parameter features, the Hu moments of the edge and the length moments of the contour edge, are used as the recognition features of the number gestures. Figs. 5a-5b compare the images before and after Sobel detection in Embodiment 1: Fig. 5a is the image before detection and Fig. 5b is the image after detection. Hu moments describe the whole image with a small set of numbers (image descriptors) and serve as a set of parameter features for recognizing the image.
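To make the edge-detection step concrete, here is a pure-Python sketch that applies the standard 3x3 Sobel kernels to a binary hand mask and marks pixels whose gradient magnitude reaches a threshold. The `sobel_edges` name and the default threshold of 1.0 (suited to 0/1 masks; a 0/255 image would need a proportionally larger value) are assumptions, and border pixels are simply left at 0.

```python
import math

# Standard 3x3 Sobel kernels for horizontal (KX) and vertical (KY) gradients.
KX = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
KY = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_edges(img, thresh=1.0):
    """Return a 0/1 edge map: 1 where the Sobel gradient magnitude >= thresh.

    img is a list of equal-length rows; the one-pixel border is left at 0.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    v = img[y + j - 1][x + i - 1]
                    gx += KX[j][i] * v
                    gy += KY[j][i] * v
            if math.hypot(gx, gy) >= thresh:
                out[y][x] = 1
    return out
```

Pixels deep inside the hand or in the background have zero gradient, so only the band around the silhouette survives. Counting those edge pixels would be one crude contour-length measure, though the document does not say that is how its length moment is defined.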

Step 104: Obtain the gesture classification result. The gesture classification result is obtained from the extracted edge parameter features using a DAG-SVMs (directed acyclic graph support vector machines) classifier; Fig. 6 shows the gesture classification result in Embodiment 1. A DAG-SVM combines binary support vector machines in a decision-oriented graph that is directed and contains no closed loops: each node makes one pairwise decision that leads toward a final class.
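The decision procedure of a DAG-SVM can be sketched as follows: a test sample starts with the full ordered list of K classes, the binary classifier for the first and last remaining classes removes the losing class, and after K-1 evaluations one class is left. So that the example runs without an SVM library, the binary SVMs are replaced here by hypothetical nearest-mean linear discriminants; in a real system each `pairwise[(a, b)]` would be the decision function of a trained binary SVM. All names and the 2-D toy centroids are illustrative only.

```python
def ddag_predict(x, classes, pairwise):
    """Decision-DAG evaluation over a sorted list of class labels.

    pairwise[(a, b)] with a < b is a binary decision function that returns
    > 0 when x is judged to belong to class a rather than class b.
    Exactly len(classes) - 1 pairwise decisions are evaluated.
    """
    remaining = list(classes)
    while len(remaining) > 1:
        a, b = remaining[0], remaining[-1]
        if pairwise[(a, b)](x) > 0:
            remaining.pop()   # x judged not to be class b: drop b
        else:
            remaining.pop(0)  # x judged not to be class a: drop a
    return remaining[0]


def nearest_mean_discriminant(mu_a, mu_b):
    """Stand-in for a trained binary SVM: positive when x is closer to mu_a."""
    def decide(x):
        da = sum((xi - ai) ** 2 for xi, ai in zip(x, mu_a))
        db = sum((xi - bi) ** 2 for xi, bi in zip(x, mu_b))
        return db - da  # linear in x: the quadratic terms cancel
    return decide


def build_pairwise(centroids):
    """One decision function per class pair (a, b) with a < b."""
    labels = sorted(centroids)
    return {(a, b): nearest_mean_discriminant(centroids[a], centroids[b])
            for i, a in enumerate(labels) for b in labels[i + 1:]}
```

For five gesture classes this evaluates only 4 of the 10 pairwise classifiers per sample, and the true class is never eliminated as long as every classifier involving it decides correctly.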

To classify with DAG-SVMs, a classifier covering all the gesture classes must first be constructed, as follows:

Obtain the depth images of all gestures; Figs. 7a-7e are the depth images of all gestures in Embodiment 1.

Obtain the target area segmentation images of all gestures; Figs. 8a-8e are the segmentation images corresponding to the depth images of all gestures in Embodiment 1, where Figs. 8a, 8b, 8c, 8d and 8e correspond to Figs. 7a, 7b, 7c, 7d and 7e respectively.

Obtain the edge parameter features of all gestures with the Sobel operator; Fig. 9 shows the edge parameter features of all gestures in Embodiment 1. The Sobel operator extracts the entire gesture contour, which serves as the template for classifier training. The typical Hu moment edge features and the length moment parameters of the contour edge are used as the recognition features of the number gestures to train the DAGSVM (hierarchical classifier), yielding a classifier that distinguishes the five gesture classes. Figs. 10a-10e show the classification results for all gestures in Embodiment 1.
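Training a DAG-SVM for the five gesture classes follows the one-versus-one scheme: each of the K(K-1)/2 = 10 binary SVMs is trained only on samples of its own two classes. The sketch below assembles those pairwise training sets from labelled feature vectors (for example Hu-moment vectors); the function name and the convention of labelling the first class of each pair +1 and the second -1 are assumptions chosen for illustration.

```python
from itertools import combinations


def pairwise_training_sets(samples):
    """Split labelled samples into one-vs-one binary training sets.

    samples: list of (feature_vector, class_label) pairs.
    Returns {(a, b): [(feature_vector, +1 or -1), ...]} for every a < b,
    where +1 marks samples of class a and -1 marks samples of class b.
    """
    labels = sorted({label for _, label in samples})
    return {
        (a, b): [(x, 1 if label == a else -1)
                 for x, label in samples if label in (a, b)]
        for a, b in combinations(labels, 2)
    }
```

Each resulting set would then be passed to a binary SVM trainer of choice; the document does not specify the kernel or training parameters.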

Step 105: Adjust the multimedia volume level. The volume level of the multimedia is adjusted according to the gesture classification result, with different gesture classes corresponding to different volume levels; Fig. 11 shows this correspondence in Embodiment 1. Each number gesture corresponds to a volume level, each volume level corresponds to a specific volume value, and the volume value for each level can be reset according to actual needs.
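The mapping from gesture class to volume level to volume value is essentially a configurable lookup table. In the sketch below only the level-1 value of 20 comes from the document (Embodiment 2); the other values, the 0-100 scale, the one-to-one class-to-level assumption and the function name are hypothetical placeholders that a user could reset, as the text allows.

```python
# Hypothetical defaults: only level 1 -> 20 is taken from Embodiment 2.
DEFAULT_VOLUME_BY_LEVEL = {1: 20, 2: 40, 3: 60, 4: 80, 5: 100}


def volume_for_gesture(gesture_class, volume_by_level=None):
    """Return the volume value for a recognised gesture class.

    Each gesture class is treated as its own volume level (class 1 -> level 1);
    volume_by_level lets the user override the per-level volume values.
    """
    table = DEFAULT_VOLUME_BY_LEVEL if volume_by_level is None else volume_by_level
    if gesture_class not in table:
        raise ValueError(f"no volume level configured for gesture class {gesture_class}")
    return table[gesture_class]
```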

Gesture interaction is a very important mode of human-computer interaction: a computer detects gestures in video images, tracks and recognizes them, and thereby understands the user's intention. Gesture recognition is the accurate interpretation of human gestures by a computer. A large part of the reason current gesture recognition systems are not widespread is that the real-time performance and robustness to interference of their recognition algorithms cannot be guaranteed. Gesture recognition systems based on computer-vision image processing are constrained in practice by illumination, occlusion, shadows and similar factors, and a change in any of them affects the final recognition accuracy.

With the popularity of motion-sensing games, gesture recognition based on the Kinect 3D motion-sensing camera has become familiar to the public, and multimedia has become a very important means of presentation in reports, conferences, teaching and similar activities. Using Kinect to control multimedia volume, including the volume of video content embedded in PPT slides, can greatly simplify multimedia presentation, make the whole control process more convenient, and help presenters convey knowledge more efficiently.

By studying gesture recognition based on depth data, different gestures can be distinguished accurately while the influence of light intensity on recognition accuracy is reduced. An optimized algorithm improves gesture recognition accuracy and keeps recognition real-time, so that number gestures can precisely control the volume of a multimedia player at short range.

Embodiment 2:

Fig. 12 is a flow chart of Embodiment 2 of the multimedia volume adjustment method of the present invention, showing the concrete flow executed according to the method. The output classification result is shown in Fig. 13, the classification result of Embodiment 2: the gesture is classified as class 1, so the multimedia volume is adjusted to level 1, which in this embodiment corresponds to a volume of 20. Fig. 14 is a schematic diagram of the volume adjustment in Embodiment 2.

Fig. 15 is a structural diagram of the multimedia volume adjustment system of the present invention. As shown in Fig. 15, the system includes:

A depth image acquisition module 1501, configured to acquire a depth image of the user gesture, the depth image including the spatial coordinates of the hand joints in the user gesture; specifically, the depth image acquisition module 1501 may be a Kinect camera that acquires the depth image of the user gesture made in front of the camera.

An image segmentation module 1502, configured to perform image segmentation of the hand target area on the depth image to obtain the segmented target area image; the image segmentation module 1502 specifically includes a histogram drawing unit, a candidate threshold interval determination unit, a final threshold interval determination unit and a graphic segmentation unit.

An edge parameter feature extraction module 1503, configured to perform gesture edge contour detection on the target area image using the Sobel operator and to extract edge parameter features. The edge parameter features include the Hu moments of the edge and the length moments of the contour edge; the edge parameter feature extraction module 1503 uses the Hu moments and length moments of the gesture edge as the features of the number gesture image and extracts both.

A gesture classification result acquisition module 1504, configured to obtain the gesture classification result from the edge parameter features using the DAG-SVMs classifier;

A volume adjustment module 1505, configured to adjust the volume level of the multimedia according to the gesture classification result, where different gesture classes correspond to different volume levels.
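The module structure of Fig. 15 can be mirrored in software by chaining one callable per module. This is a hypothetical sketch: the class and field names are invented for illustration, each callable stands in for the corresponding module (1501 to 1505), and the optional binarization module runs before segmentation, as the text describes.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class VolumeAdjustmentPipeline:
    acquire: Callable[[], Any]           # depth image acquisition module 1501
    segment: Callable[[Any], Any]        # image segmentation module 1502
    extract: Callable[[Any], Any]        # edge parameter feature extraction module 1503
    classify: Callable[[Any], int]       # gesture classification result acquisition module 1504
    adjust_volume: Callable[[int], Any]  # volume adjustment module 1505
    binarize: Optional[Callable[[Any], Any]] = None  # optional binarization module

    def run_once(self):
        """Process one depth frame and return whatever the volume module returns."""
        depth = self.acquire()
        if self.binarize is not None:
            depth = self.binarize(depth)  # binarization precedes segmentation
        region = self.segment(depth)
        features = self.extract(region)
        gesture_class = self.classify(features)
        return self.adjust_volume(gesture_class)
```

Keeping each module behind a plain callable makes it easy to swap, for example, a different depth sensor or classifier without touching the rest of the chain.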

The system may further include a binarization processing module, configured to binarize the depth image before the image segmentation of the hand target area is performed, to obtain a processed depth image.

The embodiments in this specification are described progressively; each focuses on its differences from the others, and for identical or similar parts the embodiments can be referred to one another. Because the system disclosed in the embodiments corresponds to the disclosed method, its description is relatively brief; for related details, see the description of the method.

Specific examples are used herein to explain the principle and implementation of the present invention; the description of the above embodiments is only intended to help understand the method of the present invention and its core idea. Meanwhile, a person of ordinary skill in the art may, following the idea of the present invention, make changes to the specific implementation and scope of application. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims

1. A multimedia volume adjustment method, characterized in that the method comprises:

Obtaining a depth image of the user gesture, the depth image including the spatial coordinates of the hand joints in the user gesture;

Carrying out image segmentation of the hand target area on the depth image to obtain a segmented target area image;

Performing gesture edge contour detection on the target area image using a Sobel operator, and extracting edge parameter features;

Obtaining a gesture classification result from the edge parameter features using a DAG-SVMs classifier;

Adjusting the volume level of the multimedia according to the gesture classification result, wherein different gesture classes correspond to different volume levels.

2. The method according to claim 1, wherein acquiring the depth image of the user gesture specifically comprises:

Acquiring the depth image of the user gesture with a Kinect camera.

3. The method according to claim 1, wherein performing the image segmentation of the hand target region on the depth image specifically comprises:

Drawing an image depth histogram according to the depth image;

Determining a plurality of candidate threshold intervals according to the troughs of the depth values in the histogram;

Determining the final threshold interval using Kinect skeleton tracking; and

Performing the image segmentation of the hand target region according to the final threshold interval to obtain the segmented target area image.

4. The method according to claim 1, further comprising, before performing the image segmentation of the hand target region on the depth image:

Performing binarization on the depth image to obtain a processed depth image.

5. The method according to claim 1, characterized in that the edge parameter features comprise Hu moments as an edge parameter feature and a length moment as a contour edge parameter feature, wherein:

The Hu moments and the length moment of the gesture edge are used as the features of the digital gesture image, and the Hu moments and the length moment are extracted as the edge parameter features.

6. A multimedia volume adjustment system, characterized in that the system comprises:

A depth image acquisition module, configured to acquire a depth image of a user gesture, the depth image including the spatial coordinates of the hand joints in the user gesture;

An image segmentation module, configured to perform image segmentation of the hand target area on the depth image to obtain a segmented target area image;

An edge parameter feature extraction module, configured to perform gesture edge contour detection on the target area image using a Sobel operator and to extract edge parameter features;

A gesture classification result acquisition module, configured to obtain a gesture classification result from the edge parameter features using a DAG-SVMs classifier; and

A volume adjustment module, configured to adjust the volume level of the multimedia according to the gesture classification result, wherein different gesture classifications correspond to different volume levels.

7. The system according to claim 6, wherein the depth image acquisition module is a Kinect camera configured to acquire the depth image of the user gesture made in front of its lens.

8. The system according to claim 6, wherein the image segmentation module specifically comprises:

A histogram drawing unit, configured to draw an image depth histogram according to the depth image;

A candidate threshold interval determination unit, configured to determine a plurality of candidate threshold intervals according to the troughs of the depth values in the histogram;

A final threshold interval determination unit, configured to determine the final threshold interval using Kinect skeleton tracking; and

An image segmentation unit, configured to perform the image segmentation of the hand target region according to the final threshold interval to obtain the segmented target area image.

9. The system according to claim 6, further comprising:

A binarization processing module, configured to binarize the depth image before the image segmentation of the hand target region is performed on it, thereby obtaining a processed depth image.

10. The system according to claim 6, characterized in that the edge parameter features comprise Hu moments as an edge parameter feature and a length moment as a contour edge parameter feature. The edge parameter feature extraction module is configured to use the Hu moments and the length moment of the gesture edge as the features of the digital gesture image, and to extract the Hu moments and the length moment as the edge parameter features.
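The threshold selection recited in claims 3 and 8 can be illustrated with the following minimal Python sketch. It relies on these assumptions:

- Depths are in millimetres, with 0 for missing readings.
- A trough is a histogram bin lower than its left neighbour and no higher than its right neighbour.
- Kinect skeleton tracking is stood in for by a single hand-joint depth value supplied by the caller.

The bin width, all function names, and the hand_depth_mm input are hypothetical, since the patent does not specify them.

```python
from typing import List, Sequence, Tuple

DepthImage = List[List[int]]  # millimetres; 0 = no valid reading (assumed)
Interval = Tuple[int, int]    # [low, high) in millimetres


def depth_histogram(depth: DepthImage, bin_mm: int, max_mm: int) -> List[int]:
    """Count valid pixels in depth bins of width bin_mm, up to max_mm."""
    hist = [0] * (max_mm // bin_mm + 1)
    for row in depth:
        for d in row:
            if 0 < d <= max_mm:
                hist[d // bin_mm] += 1
    return hist


def candidate_intervals(hist: Sequence[int], bin_mm: int) -> List[Interval]:
    """Cut the depth axis at histogram troughs into candidate intervals."""
    troughs = [
        i for i in range(1, len(hist) - 1)
        if hist[i] < hist[i - 1] and hist[i] <= hist[i + 1]
    ]
    edges = [0] + troughs + [len(hist)]
    return [(lo * bin_mm, hi * bin_mm) for lo, hi in zip(edges, edges[1:])]


def select_interval(intervals: Sequence[Interval], hand_depth_mm: int) -> Interval:
    """Keep the candidate interval containing the tracked hand-joint depth
    (hypothetical stand-in for the skeleton-tracking step)."""
    for lo, hi in intervals:
        if lo <= hand_depth_mm < hi:
            return (lo, hi)
    raise ValueError("hand depth lies outside every candidate interval")


def segment_hand(depth: DepthImage, interval: Interval) -> List[List[int]]:
    """Binary mask of valid pixels whose depth falls in the final interval."""
    lo, hi = interval
    return [[1 if d > 0 and lo <= d < hi else 0 for d in row] for row in depth]


if __name__ == "__main__":
    frame = [[0, 2500, 2600], [800, 850, 2550], [820, 0, 2700]]
    hist = depth_histogram(frame, bin_mm=500, max_mm=3000)
    intervals = candidate_intervals(hist, bin_mm=500)
    final = select_interval(intervals, hand_depth_mm=830)
    print(intervals, final, segment_hand(frame, final))
```

Choosing the candidate interval that contains the tracked hand joint is one plausible reading of how skeleton tracking fixes the final threshold interval. The claims do not rule out other selection rules.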
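The edge feature step of claims 1 and 5 can be sketched in plain Python in two stages: a 3x3 Sobel operator, then Hu's seven moment invariants computed over the resulting edge map. This is a minimal sketch under the following assumptions:

- The input is the binary hand mask produced by segmentation.
- Any nonzero gradient magnitude counts as an edge.
- Border pixels are skipped.

The length moment is not reproduced because the source does not define it, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Image = List[List[int]]

# 3x3 Sobel kernels for horizontal (KX) and vertical (KY) gradients.
KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_edges(img: Sequence[Sequence[float]], thresh: float = 0.0) -> Image:
    """Binary edge map: 1 where the Sobel gradient magnitude exceeds thresh.
    Border pixels are left at 0 to keep the sketch short."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for j in range(3):
                for i in range(3):
                    v = img[y + j - 1][x + i - 1]
                    gx += KX[j][i] * v
                    gy += KY[j][i] * v
            if math.hypot(gx, gy) > thresh:
                out[y][x] = 1
    return out


def hu_moments(img: Sequence[Sequence[float]]) -> List[float]:
    """Hu's seven moment invariants of a (binary or grey-level) image."""
    pts = [(x, y, v) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    m00 = float(sum(v for _, _, v in pts))
    if m00 == 0:
        raise ValueError("image has no foreground pixels")
    xc = sum(x * v for x, _, v in pts) / m00
    yc = sum(y * v for _, y, v in pts) / m00

    def eta(p: int, q: int) -> float:
        # Normalized central moment: mu_pq / mu_00^(1 + (p + q) / 2).
        mu = sum((x - xc) ** p * (y - yc) ** q * v for x, y, v in pts)
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    c, d = n30 - 3 * n12, 3 * n21 - n03
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ]


if __name__ == "__main__":
    mask = [[1 if 2 <= y < 5 and 2 <= x < 6 else 0 for x in range(10)] for y in range(10)]
    print(hu_moments(sobel_edges(mask)))
```

Hu moments are designed to be invariant to translation, rotation and scaling. A digit gesture shown at a different position or orientation in the frame should therefore yield nearly the same feature vector, which is what makes these moments usable as classifier inputs.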
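The DAG-SVMs classifier named in claims 1 and 6 combines one binary SVM per pair of gesture classes and evaluates them along a decision directed acyclic graph. Each test eliminates one class until a single class remains, so K classes need only K-1 evaluations. The sketch below shows only that decision procedure. It assumes the pairwise decision functions are already trained and, for illustration, uses hand-picked linear functions on a one-dimensional feature. The table that maps each gesture class to a volume level is likewise hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A pairwise decision function returns a value > 0 to keep the first class
# of its pair and <= 0 to keep the second (sign convention assumed here).
Decision = Callable[[Sequence[float]], float]


def linear_decision(w: Sequence[float], b: float) -> Decision:
    """Stand-in for a trained linear binary SVM: f(x) = w . x + b."""
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b


def dag_svm_predict(
    x: Sequence[float],
    classes: Sequence[int],
    pairwise: Dict[Tuple[int, int], Decision],
) -> int:
    """Walk the decision DAG: test the first remaining class against the last
    and drop the loser until one class is left (len(classes) - 1 tests)."""
    remaining: List[int] = list(classes)
    while len(remaining) > 1:
        first, last = remaining[0], remaining[-1]
        if pairwise[(first, last)](x) > 0:
            remaining.pop()      # 'last' is eliminated
        else:
            remaining.pop(0)     # 'first' is eliminated
    return remaining[0]


# Hypothetical toy setup: three digit gestures separated on one feature.
CLASSES = [1, 2, 3]
PAIRWISE = {
    (1, 2): linear_decision([-1.0], 1.5),
    (1, 3): linear_decision([-1.0], 2.0),
    (2, 3): linear_decision([-1.0], 2.5),
}
# Hypothetical mapping from gesture class to volume level (percent).
VOLUME_LEVELS = {1: 20, 2: 40, 3: 60}

if __name__ == "__main__":
    for feature in ([0.9], [2.1], [3.2]):
        label = dag_svm_predict(feature, CLASSES, PAIRWISE)
        print(feature, "-> gesture", label, "-> volume", VOLUME_LEVELS[label])
```

With these toy functions, the features [0.9], [2.1] and [3.2] are assigned to gestures 1, 2 and 3. Each gesture is then looked up in the volume table, which mirrors the claimed correspondence between gesture classes and volume levels.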