
CN111158489B - Gesture interaction method and gesture interaction system based on camera - Google Patents


Info

Publication number
CN111158489B
Authority
CN
China
Prior art keywords
depth
hand
human body
value
gesture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911417209.7A
Other languages
Chinese (zh)
Other versions
CN111158489A (en)
Inventor
林树宏
张三顺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Youjiu Health Technology Co., Ltd.
Original Assignee
Shanghai Youjiu Health Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Youjiu Health Technology Co., Ltd.
Priority to CN201911417209.7A
Publication of CN111158489A
Application granted
Publication of CN111158489B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 - Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G06V40/28 - Recognition of hand or arm movements, e.g. recognition of deaf sign language

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of cameras, and in particular to a camera-based gesture interaction method and gesture interaction system. The method comprises the following steps: step S1, acquiring depth data through a depth camera; step S2, intercepting the depth data within a rectangular range and calculating a depth average value; step S3, judging from the depth average value whether a human body is present and, if so, taking the depth average value as the distance from the human body to the device; step S4, obtaining a preset depth value range; step S5, acquiring the human body contour from the depth values within the preset depth value range; step S6, determining the hand position point; step S7, determining the hand movable area; step S8, performing a proportional calculation between the size of the hand movable area and the screen resolution of the device to determine the screen pointer point; step S9, recognizing the gesture state and determining the gesture instruction that controls the device. The beneficial effects of the invention are: people can control the device from a distance, and the method places low demands on device resolution, responds quickly, and uses simple gestures that are easy to learn.

Description

Gesture interaction method and gesture interaction system based on camera
Technical Field
The invention relates to the technical field of cameras, and in particular to a camera-based gesture interaction method and gesture interaction system.
Background
With the development of computer vision and the shift from two dimensions to three dimensions, depth cameras have become widely available on the market. Devices such as cameras, body-measurement instruments, intelligent body-measurement mirrors and somatosensory game devices can acquire the three-dimensional characteristics of a human body through a depth camera, recognize various human postures from the depth data the camera provides, and thereby build a three-dimensional model of the human body.
However, in the prior art these devices all require the human body to stay at a certain distance from the device during use. When users need to operate the device, they must temporarily interrupt their use and walk up to the device before operating it, which makes real-time control inconvenient; operating the device through the buttons of a remote controller is likewise clumsy and cumbersome. This is a problem that those skilled in the art need to solve.
Disclosure of Invention
In view of the problems in the prior art, a camera-based gesture interaction method and gesture interaction system are provided.
The specific technical scheme is as follows:
The invention provides a camera-based gesture interaction method, wherein the camera is arranged on a device and the device comprises a depth camera and a color camera. The gesture interaction method specifically comprises the following steps:
step S1, acquiring depth data within the shooting range of the device through the depth camera;
step S2, intercepting the depth data within a rectangular range, taking the central position of the device as a reference, and calculating a depth average value of the depth data within the rectangular range;
step S3, judging from the depth average value whether a human body exists within the shooting range of the device,
if yes, taking the depth average value as the distance from the human body to the device, and proceeding to step S4;
if not, returning to step S2;
step S4, obtaining a preset depth value range for the standing position of the human body according to the depth average value;
step S5, judging, for each depth value, whether it falls below or above the preset depth value range,
if yes, removing the first pixel point corresponding to that depth value;
if not, retaining the second pixel point corresponding to that depth value, so as to acquire the human body contour;
step S6, determining the hand position point of the human body from the position point in the human body contour closest to the device;
step S7, obtaining a preset hand movable area according to the hand position point;
step S8, performing a proportional calculation between the size of the hand movable area and the screen resolution of the device to obtain a proportional relationship, thereby determining the screen pointer point of the device corresponding to the hand position point;
step S9, recognizing the gesture state at the screen pointer point against the gesture actions pre-stored for the color camera, so as to determine the gesture instruction issued by the corresponding hand position point,
wherein, when the gesture state is recognized as changing from a clenched-fist state to an open state, the screen pointer point executes a click operation according to the corresponding gesture instruction;
and when the gesture state is recognized as changing from the clenched-fist state, moving, and then changing to the open state, the screen pointer point executes a drag operation according to the corresponding gesture instruction.
Preferably, in step S6, a standard hand depth value for the human body standing naturally is calculated, and a current hand depth value within the moving human body contour is selected, to judge whether the difference between the current hand depth value and the standard hand depth value is greater than a threshold value,
if not, the current hand depth value is re-selected within the human body contour;
if so, the hand in the human body contour is determined to be in a forward-extended state, and the position point in the human body contour closest to the device is determined, thereby determining the hand position point.
Preferably, step S7 comprises:
step S70, determining the lowest height value and the highest height value for raising the hand position point, and the maximum distance of its left-right movement, by calculating the height value of the human body contour;
and step S71, determining the hand movable area according to the lowest height value, the highest height value and the maximum distance of the left-right movement of the hand position point.
Preferably, in step S8, the human body contour is mirrored, so as to ensure that the left or right side of the human body contour on the screen of the device corresponds to the left or right side of the human body contour in reality.
Preferably, in step S9, the color camera is provided with an image processing library, and the pre-stored gesture actions are extracted through the image processing library.
The invention also provides a camera-based gesture interaction system, wherein the camera is arranged on a device comprising a depth camera and a color camera, and the above gesture interaction method is adopted. The gesture interaction system comprises:
an acquisition module, for acquiring depth data within the shooting range of the device through the depth camera;
an intercepting module, connected with the acquisition module, for intercepting the depth data within a rectangular range, taking the central position of the device as a reference, and calculating a depth average value of the depth data within the rectangular range;
a first judging module, connected with the intercepting module, for judging from the depth average value whether a human body exists within the shooting range of the device and, if so, taking the depth average value as the distance from the human body to the device;
a first acquisition module, connected with the first judging module, for obtaining a preset depth value range for the standing position of the human body according to the depth average value;
a second judging module, connected with the first acquisition module, for judging whether each depth value falls below or above the preset depth value range and, if so, removing the first pixel point corresponding to that depth value; otherwise, retaining the second pixel point corresponding to that depth value, so as to obtain the human body contour;
a first determining module, connected with the second judging module, for determining the hand position point of the human body from the position point in the human body contour closest to the device;
a second acquisition module, connected with the first determining module, for obtaining a preset hand movable area according to the hand position point;
a second determining module, connected with the second acquisition module, for performing a proportional calculation between the size of the hand movable area and the screen resolution of the device to obtain a proportional relationship, thereby determining the screen pointer point of the device corresponding to the hand position point;
and a recognition module, connected with the second determining module, for recognizing the gesture state at the screen pointer point against the gesture actions pre-stored for the color camera, so as to determine the gesture instruction issued by the corresponding hand position point,
wherein, when the gesture state is recognized as changing from a clenched-fist state to an open state, the screen pointer point executes a click operation according to the corresponding gesture instruction;
and when the gesture state is recognized as changing from the clenched-fist state, moving, and then changing to the open state, the screen pointer point executes a drag operation according to the corresponding gesture instruction.
Preferably, the first determining module comprises:
a judging unit, for calculating a standard hand depth value for the human body standing naturally and selecting a current hand depth value within the moving human body contour, so as to judge whether the difference between the current hand depth value and the standard hand depth value is greater than a threshold value; if the difference is less than or equal to the threshold value, the current hand depth value is re-selected within the human body contour; and if the difference is greater than the threshold value, the hand in the human body contour is determined to be in a forward-extended state, and the position point in the human body contour closest to the device is determined, thereby determining the hand position point.
Preferably, the second acquisition module comprises:
a first determining unit, for determining the lowest height value and the highest height value for raising the hand position point, and the maximum distance of its left-right movement, by calculating the height value of the human body contour;
and a second determining unit, connected with the first determining unit, for determining the hand movable area according to the lowest height value, the highest height value and the maximum distance of the left-right movement of the hand position point.
The technical scheme of the invention has the following beneficial effects: the distance from the human body to the device, the human body contour and the hand position point are captured through the depth camera; the hand position point is synchronized to the corresponding screen pointer point of the device, so that movement of the hand position point controls movement of the screen pointer point; the gesture state at the screen pointer point is recognized through the color camera; when the gesture state is recognized as changing from the clenched-fist state to the open state, the screen pointer point executes a click operation according to the corresponding gesture instruction; and when the gesture state is recognized as changing from the clenched-fist state, moving, and then changing to the open state, the screen pointer point executes a drag operation according to the corresponding gesture instruction. Simple and practical gesture interaction is thereby realized: people can control the device from a distance, and the method places low demands on device resolution, responds quickly, and uses simple gestures that are easy to learn.
Drawings
Embodiments of the present invention will now be described more fully with reference to the accompanying drawings. The drawings, however, are for illustration and description only and are not intended as a definition of the limits of the invention.
FIG. 1 is a step diagram of a gesture interaction method according to an embodiment of the present invention;
FIG. 2 is a step S6 diagram of a gesture interaction method according to an embodiment of the present invention;
FIG. 3 is a step S7 diagram of a gesture interaction method according to an embodiment of the present invention;
FIG. 4 is a block diagram of a gesture interaction system of an embodiment of the present invention;
FIG. 5 is a first determination module block diagram of a gesture interaction system of an embodiment of the present invention;
FIG. 6 is a second acquisition module block diagram of a gesture interaction system of an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.
The invention is further described below with reference to the drawings and specific examples, which are not intended to be limiting.
The invention provides a camera-based gesture interaction method, wherein the camera is arranged on a device and the device comprises a depth camera and a color camera. The gesture interaction method specifically comprises the following steps:
step S1, acquiring depth data within the shooting range of the device through the depth camera;
step S2, intercepting the depth data within a rectangular range, taking the central position of the device as a reference, and calculating a depth average value of the depth data within the rectangular range;
step S3, judging from the depth average value whether a human body exists within the shooting range of the device,
if yes, taking the depth average value as the distance from the human body to the device, and proceeding to step S4;
if not, returning to step S2;
step S4, obtaining a preset depth value range for the standing position of the human body according to the depth average value;
step S5, judging, for each depth value, whether it falls below or above the preset depth value range,
if yes, removing the first pixel point corresponding to that depth value;
if not, retaining the second pixel point corresponding to that depth value, so as to acquire the human body contour;
step S6, determining the hand position point of the human body from the position point in the human body contour closest to the device;
step S7, obtaining a preset hand movable area according to the hand position point;
step S8, performing a proportional calculation between the size of the hand movable area and the screen resolution of the device to obtain a proportional relationship, thereby determining the screen pointer point of the device corresponding to the hand position point;
step S9, recognizing the gesture state at the screen pointer point against the gesture actions pre-stored for the color camera, so as to determine the gesture instruction issued by the corresponding hand position point,
wherein, when the gesture state is recognized as changing from a clenched-fist state to an open state, the screen pointer point executes a click operation according to the corresponding gesture instruction;
and when the gesture state is recognized as changing from the clenched-fist state, moving, and then changing to the open state, the screen pointer point executes a drag operation according to the corresponding gesture instruction.
According to the gesture interaction method provided above, as shown in FIG. 1, depth data within the shooting range of the device is first collected by the device's depth camera. The device may be a camera, a body-measurement instrument, an intelligent body-measurement mirror, a somatosensory game device, or the like. When no human body is within the shooting range, the depth data represents the distance from the background to the device; the background generally consists mainly of the ground and walls, and to compute the distance from the human body or the wall to the device from this data, the influence of the ground must be excluded.
Therefore, taking the central position of the device as a reference, the depth data within a rectangular range is intercepted, and the depth average value of the depth data within that rectangle is calculated; because the rectangle is centered on the device, the ground is excluded, and the average is the distance from the human body or the wall to the device.
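For illustration only, the following is a minimal sketch of this averaging step in Python with NumPy; the 100 x 100 pixel rectangle, the millimeter units and the treatment of zero as an invalid reading are assumptions of the sketch, since the method leaves the rectangle size as a preset value.

    import numpy as np

    def mean_center_depth(depth_frame, rect_w=100, rect_h=100):
        # depth_frame: 2-D array of per-pixel depth values (assumed millimeters);
        # many depth cameras report 0 for pixels with no valid measurement.
        h, w = depth_frame.shape
        top, left = (h - rect_h) // 2, (w - rect_w) // 2
        roi = depth_frame[top:top + rect_h, left:left + rect_w]
        valid = roi[roi > 0]  # drop invalid (zero) readings
        return float(valid.mean()) if valid.size else 0.0

If the returned average falls within the distance range expected during normal use, a human body is judged to be present (step S3) and the average is taken as the distance from the human body to the device.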
Further, in order to recognize the posture of the human body, the human body must be separated from the background, i.e., the human body contour must be matted out. A rough preset depth value range for the standing position of the human body can be obtained from the depth average value computed above. For each pixel, it is judged whether its depth value falls below or above this preset range: any first pixel point whose depth value lies outside the preset depth value range is regarded as a background point and deleted, while the second pixel points corresponding to the other depth values are retained, yielding the human body contour.
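A sketch of this background removal follows, under the assumption that the preset depth value range is a symmetric margin around the depth average value; the 300 mm margin is an illustrative choice, not a value fixed by the method.

    def body_mask(depth_frame, depth_mean, margin=300.0):
        # Pixels whose depth lies outside [depth_mean - margin, depth_mean + margin]
        # are background (first pixel points) and are removed; the remaining
        # (second) pixel points approximate the human body contour.
        lower, upper = depth_mean - margin, depth_mean + margin
        return (depth_frame > lower) & (depth_frame < upper)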
Further, in order for movement of the hand position point to control movement of the device's screen pointer point, the hand must be raised and extended forward a certain distance and then translated within a specific area. At that moment, the point of the human body closest to the device (i.e., the point with the smallest depth value) is the hand position point.
To determine the hand position point, it must first be judged whether the hand is in the forward-extended state. A standard hand depth value for the human body standing naturally is calculated first; when the difference between the current hand depth value moving within the selected human body contour and the standard depth value is greater than a threshold value, the hand is considered to be extended forward, and the hand position point closest to the device can then be determined. The standard hand depth value is calculated as follows: the numbers of pixel points in the transverse and longitudinal directions of the human body are obtained from the human body contour, which determines the position of the body's center of gravity. When the human body stands naturally, the hands and the center of gravity lie essentially in the same plane and are essentially equidistant from the device, so the depth value of the center-of-gravity point is used as the standard hand depth value.
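A sketch of this hand detection follows; taking the depth at the silhouette centroid as the standard hand depth value and using a 150 mm threshold are illustrative assumptions.

    def hand_point(depth_frame, mask, threshold=150.0):
        ys, xs = np.nonzero(mask)
        cy, cx = int(ys.mean()), int(xs.mean())      # center of gravity of the silhouette
        standard_depth = float(depth_frame[cy, cx])  # standard hand depth value
        masked = np.where(mask, depth_frame, np.inf) # non-body pixels count as infinitely far
        y, x = np.unravel_index(np.argmin(masked), masked.shape)
        if standard_depth - float(depth_frame[y, x]) > threshold:
            return (x, y)  # hand extended forward: this is the hand position point
        return None        # hand not extended; keep sampling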
Further, to obtain the hand movable area, the height value of the human body contour is first calculated, and the lowest height value for raising the hand position point is determined as a proportion of that height. If the hand position point is above the lowest height value, the hand is considered raised; if it is above the top of the head, the hand is considered raised too high. This determines the hand movable area in the longitudinal direction. The hand movable area in the transverse direction is then determined by the left-right movement of the hand position point relative to the human body contour, and the complete hand movable area is finally obtained from the longitudinal and transverse ranges.
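A sketch of deriving the movable area from the silhouette; the raise-height ratio of 0.6 is an assumed value, since the method only states that the lowest height is a proportion of the contour height.

    def hand_movable_area(mask, raise_ratio=0.6):
        ys, xs = np.nonzero(mask)
        top, bottom = int(ys.min()), int(ys.max())             # head-top and feet rows
        y_lowest = bottom - int(raise_ratio * (bottom - top))  # lowest accepted raise height
        x_left, x_right = int(xs.min()), int(xs.max())         # maximum left-right travel
        w1, h1 = x_right - x_left, y_lowest - top              # area W1 x H1, bounded above by the head
        return x_left, top, w1, h1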
Further, the human body contour is mirrored to ensure that the left or right side of the contour on the device's screen corresponds to the left or right side of the human body in reality. The hand position point determined in this way is represented by the coordinates (x1, y1), whose coordinate system is based on the hand movable area (W1 × H1).
Further, to obtain the screen pointer point corresponding to the hand position point, the screen resolution (W2 × H2) of the device is needed; the coordinate point (x2, y2) of the screen pointer point corresponding to the hand position point is determined from the proportional relationship between the screen resolution and the size of the hand movable area, so that the screen pointer point follows the movement of the hand position point in real time. The mapping relation is:
x2 = (W2 / W1) * x1;  y2 = (H2 / H1) * y1.
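In code, the mirroring of step S8 and this mapping reduce to a few lines; the function name and the mirror flag are illustrative.

    def to_screen(x1, y1, w1, h1, w2, h2, mirror=True):
        if mirror:              # mirror so on-screen left matches the user's left
            x1 = w1 - x1
        x2 = int(w2 / w1 * x1)  # x2 = (W2 / W1) * x1
        y2 = int(h2 / h1 * y1)  # y2 = (H2 / H1) * y1
        return x2, y2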
Further, the gesture state at the screen pointer point is recognized against the gesture actions pre-stored for the color camera, so as to determine the gesture instruction issued by the corresponding hand position point:
when the gesture state is recognized as changing from the clenched-fist state to the open state, the screen pointer point executes a click operation according to the corresponding gesture instruction;
when the gesture state is recognized as changing from the clenched-fist state, moving, and then changing to the open state, the screen pointer point executes a drag operation according to the corresponding gesture instruction. Simple and practical gesture interaction is thereby realized: people can control the device from a distance, and the method places low demands on device resolution, responds quickly, and uses simple gestures that are easy to learn.
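The click/drag distinction amounts to a small state machine over the recognized fist/open states: fist then open with negligible movement yields a click, while fist, movement, then open yields a drag ending at the release point. A sketch, with an assumed pixel threshold for deciding whether the pointer moved while the fist was closed:

    class GestureStateMachine:
        def __init__(self, move_threshold=10):
            self.move_threshold = move_threshold
            self.fist_down_at = None  # pointer position when the fist closed

        def update(self, state, pointer):
            # state: "fist" or "open", as recognized from the color camera;
            # pointer: current screen pointer point (x2, y2).
            if state == "fist":
                if self.fist_down_at is None:
                    self.fist_down_at = pointer
                return None
            if self.fist_down_at is None:  # open hand with no preceding fist
                return None
            x0, y0 = self.fist_down_at
            self.fist_down_at = None
            moved = (abs(pointer[0] - x0) > self.move_threshold
                     or abs(pointer[1] - y0) > self.move_threshold)
            return "drag" if moved else "click"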
In a preferred embodiment, as shown in FIG. 2, in step S6 a standard hand depth value for the human body standing naturally is calculated, and a current hand depth value within the moving human body contour is selected, to judge whether the difference between the current hand depth value and the standard hand depth value is greater than the threshold value;
if not, the current hand depth value is re-selected within the human body contour;
if so, the hand in the human body contour is determined to be in the forward-extended state, the position point in the human body contour closest to the device is determined, and the hand position point is thereby determined.
In a preferred embodiment, step S7 comprises:
step S70, determining the lowest height value and the highest height value for raising the hand position point, and the maximum distance of its left-right movement, by calculating the height value of the human body contour;
and step S71, determining the hand movable area according to the lowest height value, the highest height value and the maximum distance of the left-right movement of the hand position point.
Specifically, as shown in FIG. 3, the height value of the human body contour is calculated and the lowest height value for raising the hand position point is determined as a proportion of that height. If the hand position point is above the lowest height value, the hand is considered raised; if it is above the top of the head, the hand is considered raised too high. This determines the hand movable area in the longitudinal direction. The hand movable area in the transverse direction is then determined by the left-right movement of the hand position point relative to the human body contour, and the complete hand movable area is finally obtained from the longitudinal and transverse ranges.
In a preferred embodiment, in step S8 the human body contour is mirrored to ensure that the left or right side of the human body contour on the screen of the device corresponds to the left or right side of the human body contour in reality.
In a preferred embodiment, in step S9 the color camera is provided with an image processing library, and the pre-stored gesture actions are extracted through the image processing library.
The invention also provides a camera-based gesture interaction system, wherein the camera is arranged on a device comprising a depth camera and a color camera, and the above gesture interaction method is adopted. The gesture interaction system comprises:
the acquisition module 1 acquires depth data in the shooting range of the equipment through the depth camera;
the intercepting module 2 is connected with the acquisition module 1, intercepts depth data in a rectangular range by taking the center position of the equipment as a reference, and calculates a depth average value of the depth data in the rectangular range according to the depth data;
the first judging module 3 is connected with the intercepting module 2 and judges whether a human body exists in the shooting range of the equipment through the depth average value, and if the human body exists, the depth average value is judged to be the distance from the human body to the equipment;
the first acquisition module 4 is connected with the first judgment module 3 and acquires a range of a preset depth value of the standing position of the human body according to the depth average value;
a second judging module 5, connected to the first obtaining module 4, for judging whether the depth value in the range of the pre-depth value is smaller than or larger than the depth average value according to the range of the pre-depth value, if the depth value in the range of the pre-depth value is smaller than or larger than the depth average value, removing the first pixel point corresponding to the depth value; otherwise, reserving a second pixel point corresponding to the depth value to obtain a human outline;
a first determining module 6 connected to the second judging module 5 for determining a position point of a hand of the human body according to a position point closest to the device in the human body contour;
a second obtaining module 7, connected to the first determining module 6, for obtaining a preset hand movable area according to the hand position points;
a second determining module 8, connected to the second obtaining module 7, for performing a proportional calculation on the size of the hand movable area and the screen resolution of the device, so as to obtain a proportional relationship, so as to determine that the hand position point corresponds to the screen pointer point of the device;
a recognition module 9 connected with the second determination module 8 for recognizing the gesture state of the screen pointer point according to the pre-stored gesture action in the color camera, thereby determining the gesture instruction sent by the corresponding hand position point,
when the gesture state is recognized to be converted from the fist-making state to the open state, the screen pointer point executes clicking operation according to the corresponding gesture instruction;
when the gesture state is recognized to be changed into the open state after the gesture state is moved from the fist-holding state, the screen pointer point executes the dragging operation according to the corresponding gesture instruction.
Specifically, as shown in FIG. 4, the acquisition module 1 acquires depth data within the shooting range of the device (such as a camera, a body-measurement instrument, an intelligent body-measurement mirror or a somatosensory game device) through the device's depth camera. When no human body is within the shooting range, the depth data represents the distance from the background to the device; the background generally consists mainly of the ground and walls, and to compute the distance from the human body or the wall to the device from this data, the influence of the ground must be excluded.
Therefore, the intercepting module 2 intercepts the depth data within a rectangular range, taking the central position of the device as a reference, and calculates the depth average value of the depth data within that rectangle, i.e., the distance from the human body or the wall to the device. Since using the device normally requires the human body to stand within a certain distance range, the first judging module 3 can judge from the depth average value whether a person is within the shooting range of the device; if so, the depth average value is the distance from the human body to the device.
Further, in order to recognize the posture of the human body, the human body must be separated from the background, i.e., the human body contour must be matted out. The first acquisition module 4 obtains a rough preset depth value range for the standing position of the human body, and the second judging module 5 judges, for each pixel, whether its depth value falls below or above that preset range: any first pixel point whose depth value lies outside the preset depth value range is regarded as a background point and deleted, while the second pixel points corresponding to the other depth values are retained, yielding the human body contour.
Further, in order for movement of the hand position point to control movement of the device's screen pointer point, the hand must be raised and extended forward a certain distance and then translated within a specific area. At that moment, the point of the human body closest to the device (i.e., the point with the smallest depth value) is the hand position point.
To determine the hand position point, it must first be judged whether the hand is in the forward-extended state. A standard hand depth value for the human body standing naturally is calculated first; when the difference between the current hand depth value moving within the selected human body contour and the standard depth value is greater than the threshold value, the hand is considered to be extended forward, so the first determining module 6 can determine the hand position point closest to the device. The standard hand depth value is calculated as follows: the numbers of pixel points in the transverse and longitudinal directions of the human body are obtained from the human body contour, which determines the position of the body's center of gravity. When the human body stands naturally, the hands and the center of gravity lie essentially in the same plane and are essentially equidistant from the device, so the depth value of the center-of-gravity point is used as the standard hand depth value.
Further, to obtain the hand movable area, the second acquisition module 7 calculates the height value of the human body contour and determines the lowest height value for raising the hand position point as a proportion of that height. If the hand position point is above the lowest height value, the hand is considered raised; if it is above the top of the head, the hand is considered raised too high. This determines the hand movable area in the longitudinal direction. The hand movable area in the transverse direction is then determined by the left-right movement of the hand position point relative to the human body contour, and the complete hand movable area is finally obtained from the longitudinal and transverse ranges.
Further, in the second determining module 8, the human body contour is first mirrored to ensure that the left or right side of the contour on the device's screen corresponds to the left or right side of the human body in reality. The hand position point determined in this way is represented by the coordinates (x1, y1), whose coordinate system is based on the hand movable area (W1 × H1).
Further, to obtain the screen pointer point corresponding to the hand position point, the screen resolution (W2 × H2) of the device is needed; the coordinate point (x2, y2) of the screen pointer point corresponding to the hand position point is determined from the proportional relationship between the screen resolution and the size of the hand movable area, so that the screen pointer point follows the movement of the hand position point in real time. The mapping relation is:
x2 = (W2 / W1) * x1;  y2 = (H2 / H1) * y1.
Further, the recognition module 9 recognizes the gesture state at the screen pointer point against the gesture actions pre-stored for the color camera, so as to determine the gesture instruction issued by the corresponding hand position point:
when the gesture state is recognized as changing from the clenched-fist state to the open state, the screen pointer point executes a click operation according to the corresponding gesture instruction;
when the gesture state is recognized as changing from the clenched-fist state, moving, and then changing to the open state, the screen pointer point executes a drag operation according to the corresponding gesture instruction. Simple and practical gesture interaction is thereby realized: people can control the device from a distance, and the method places low demands on device resolution, responds quickly, and uses simple gestures that are easy to learn.
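For completeness, a sketch of how the illustrative pieces above would be wired into a per-frame loop; the 0.5 m to 3 m operating range and all function names come from the earlier sketches, not from the invention itself.

    def process_frame(depth_frame, gesture_state, sm, screen_w, screen_h):
        # One pass of the pipeline: distance check -> contour -> hand point ->
        # pointer mapping -> gesture command. Returns ((x2, y2), command) or None.
        d = mean_center_depth(depth_frame)
        if not 500.0 < d < 3000.0:  # assumed operating range, in mm
            return None             # no human body in front of the device
        mask = body_mask(depth_frame, d)
        hand = hand_point(depth_frame, mask)
        if hand is None:
            return None             # hand not extended forward
        x_left, y_top, w1, h1 = hand_movable_area(mask)
        x2, y2 = to_screen(hand[0] - x_left, hand[1] - y_top, w1, h1, screen_w, screen_h)
        command = sm.update(gesture_state, (x2, y2))  # "click", "drag", or None
        return (x2, y2), command

Here sm is a GestureStateMachine instance kept across frames, and gesture_state is the fist/open label recognized from the color camera image.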
In a preferred embodiment, as shown in FIG. 5, the first determining module 6 comprises:
a judging unit 60, for calculating a standard hand depth value for the human body standing naturally and selecting a current hand depth value within the moving human body contour, so as to judge whether the difference between the current hand depth value and the standard hand depth value is greater than the threshold value; if the difference is less than or equal to the threshold value, the current hand depth value is re-selected within the human body contour; and if the difference is greater than the threshold value, the hand in the human body contour is determined to be in the forward-extended state, and the position point in the human body contour closest to the device is determined, thereby determining the hand position point.
In a preferred embodiment, as shown in FIG. 6, the second acquisition module 7 comprises:
a first determining unit 70, for determining the lowest height value and the highest height value for raising the hand position point, and the maximum distance of its left-right movement, by calculating the height value of the human body contour;
and a second determining unit 71, connected with the first determining unit 70, for determining the hand movable area according to the lowest height value, the highest height value and the maximum distance of the left-right movement of the hand position point.
The technical scheme of the invention has the following beneficial effects: the distance from the human body to the device, the human body contour and the hand position point are captured through the depth camera; the hand position point is synchronized to the corresponding screen pointer point of the device, so that movement of the hand position point controls movement of the screen pointer point; the gesture state at the screen pointer point is recognized through the color camera; when the gesture state is recognized as changing from the clenched-fist state to the open state, the screen pointer point executes a click operation according to the corresponding gesture instruction; and when the gesture state is recognized as changing from the clenched-fist state, moving, and then changing to the open state, the screen pointer point executes a drag operation according to the corresponding gesture instruction. Simple and practical gesture interaction is thereby realized: people can control the device from a distance, and the method places low demands on device resolution, responds quickly, and uses simple gestures that are easy to learn.
The foregoing description is only illustrative of the preferred embodiments of the present invention and is not to be construed as limiting the scope of the invention, and it will be appreciated by those skilled in the art that equivalent substitutions and obvious variations may be made using the description and illustrations of the present invention, and are intended to be included within the scope of the present invention.

Claims (8)

1. A camera-based gesture interaction method, the camera being arranged on a device comprising a depth camera and a color camera, characterized in that the gesture interaction method comprises the following steps:
step S1, acquiring depth data within the shooting range of the device through the depth camera;
step S2, intercepting the depth data within a rectangular range, taking the central position of the device as a reference, and calculating a depth average value of the depth data within the rectangular range;
step S3, judging whether a human body exists within the shooting range of the device according to the depth average value and the pre-required distance range between the human body and the device,
if yes, taking the depth average value as the distance from the human body to the device, and proceeding to step S4;
if not, returning to step S2;
step S4, obtaining a preset depth value range for the standing position of the human body according to the depth average value;
step S5, removing the first pixel points corresponding to depth values above or below the preset depth value range, and retaining the second pixel points corresponding to the other depth values as the human body contour;
step S6, determining the hand position point of the human body from the position point in the human body contour closest to the device;
step S7, obtaining a preset hand movable area according to the hand position point;
step S8, performing a proportional calculation between the size of the hand movable area and the screen resolution of the device to obtain a proportional relationship, thereby determining the screen pointer point of the device corresponding to the hand position point;
step S9, recognizing the gesture state at the screen pointer point against the gesture actions pre-stored for the color camera, so as to determine the gesture instruction issued by the corresponding hand position point,
wherein, when the gesture state is recognized as changing from a clenched-fist state to an open state, the screen pointer point executes a click operation according to the corresponding gesture instruction;
and when the gesture state is recognized as changing from the clenched-fist state, moving, and then changing to the open state, the screen pointer point executes a drag operation according to the corresponding gesture instruction.
2. The camera-based gesture interaction method according to claim 1, characterized in that in step S6 a standard hand depth value for the human body standing naturally is calculated, and a current hand depth value within the moving human body contour is selected, to judge whether the difference between the current hand depth value and the standard hand depth value is greater than a threshold value,
if not, the current hand depth value is re-selected within the human body contour;
if so, the hand in the human body contour is determined to be in a forward-extended state, and the position point in the human body contour closest to the device is determined, thereby determining the hand position point.
3. The camera-based gesture interaction method according to claim 1, characterized in that step S7 comprises:
step S70, determining the lowest height value and the highest height value for raising the hand position point, and the maximum distance of its left-right movement, by calculating the height value of the human body contour;
and step S71, determining the hand movable area according to the lowest height value, the highest height value and the maximum distance of the left-right movement of the hand position point.
4. The camera-based gesture interaction method according to claim 1, characterized in that in step S8 the human body contour is mirrored to ensure that the left or right side of the human body contour on the screen of the device corresponds to the left or right side of the human body contour in reality.
5. The camera-based gesture interaction method according to claim 1, characterized in that in step S9 the color camera is provided with an image processing library, and the pre-stored gesture actions are extracted through the image processing library.
6. A camera-based gesture interaction system, the camera being arranged on a device comprising a depth camera and a color camera, the gesture interaction system comprising:
an acquisition module, for acquiring depth data within the shooting range of the device through the depth camera;
an intercepting module, connected with the acquisition module, for intercepting the depth data within a rectangular range, taking the central position of the device as a reference, and calculating a depth average value of the depth data within the rectangular range;
a first judging module, connected with the intercepting module, for judging from the depth average value whether a human body exists within the shooting range of the device and, if so, taking the depth average value as the distance from the human body to the device;
a first acquisition module, connected with the first judging module, for obtaining a preset depth value range for the standing position of the human body according to the depth average value;
a second judging module, connected with the first acquisition module, for judging whether each depth value falls below or above the preset depth value range and, if so, removing the first pixel point corresponding to that depth value; otherwise, retaining the second pixel point corresponding to that depth value, so as to obtain the human body contour;
a first determining module, connected with the second judging module, for determining the hand position point of the human body from the position point in the human body contour closest to the device;
a second acquisition module, connected with the first determining module, for obtaining a preset hand movable area according to the hand position point;
a second determining module, connected with the second acquisition module, for performing a proportional calculation between the size of the hand movable area and the screen resolution of the device to obtain a proportional relationship, thereby determining the screen pointer point of the device corresponding to the hand position point;
and a recognition module, connected with the second determining module, for recognizing the gesture state at the screen pointer point against the gesture actions pre-stored for the color camera, so as to determine the gesture instruction issued by the corresponding hand position point,
wherein, when the gesture state is recognized as changing from a clenched-fist state to an open state, the screen pointer point executes a click operation according to the corresponding gesture instruction;
and when the gesture state is recognized as changing from the clenched-fist state, moving, and then changing to the open state, the screen pointer point executes a drag operation according to the corresponding gesture instruction.
7. The camera-based gesture interaction system according to claim 6, characterized in that the first determining module comprises:
a judging unit, for calculating a standard hand depth value for the human body standing naturally and selecting a current hand depth value within the moving human body contour, so as to judge whether the difference between the current hand depth value and the standard hand depth value is greater than a threshold value; if the difference is less than or equal to the threshold value, the current hand depth value is re-selected within the human body contour; and if the difference is greater than the threshold value, the hand in the human body contour is determined to be in a forward-extended state, and the position point in the human body contour closest to the device is determined, thereby determining the hand position point.
8. The camera-based gesture interaction system according to claim 6, characterized in that the second acquisition module comprises:
a first determining unit, for determining the lowest height value and the highest height value for raising the hand position point, and the maximum distance of its left-right movement, by calculating the height value of the human body contour;
and a second determining unit, connected with the first determining unit, for determining the hand movable area according to the lowest height value, the highest height value and the maximum distance of the left-right movement of the hand position point.
CN201911417209.7A 2019-12-31 2019-12-31 Gesture interaction method and gesture interaction system based on camera Active CN111158489B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911417209.7A CN111158489B (en) 2019-12-31 2019-12-31 Gesture interaction method and gesture interaction system based on camera

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911417209.7A CN111158489B (en) 2019-12-31 2019-12-31 Gesture interaction method and gesture interaction system based on camera

Publications (2)

Publication Number Publication Date
CN111158489A (en) 2020-05-15
CN111158489B (en) 2023-08-08

Family

ID=70560293

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911417209.7A Active CN111158489B (en) 2019-12-31 2019-12-31 Gesture interaction method and gesture interaction system based on camera

Country Status (1)

Country Link
CN (1) CN111158489B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114138121B (en) * 2022-02-07 2022-04-22 北京深光科技有限公司 User gesture recognition method, device and system, storage medium and computing equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102982557A (en) * 2012-11-06 2013-03-20 桂林电子科技大学 Method for processing space hand signal gesture command based on depth camera
CN104463146A (en) * 2014-12-30 2015-03-25 华南师范大学 Posture identification method and device based on near-infrared TOF camera depth information
EP3059662A1 (en) * 2015-02-23 2016-08-24 Samsung Electronics Polska Sp. z o.o. A method for interacting with volumetric images by gestures and a system for interacting with volumetric images by gestures
CN106250867A (en) * 2016-08-12 2016-12-21 南京华捷艾米软件科技有限公司 A kind of skeleton based on depth data follows the tracks of the implementation method of system
CN108256421A (en) * 2017-12-05 2018-07-06 盈盛资讯科技有限公司 Dynamic gesture sequence real-time identification method, system and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110219340A1 (en) * 2010-03-03 2011-09-08 Pathangay Vinod System and method for point, select and transfer hand gesture based user interface
WO2013063767A1 (en) * 2011-11-01 2013-05-10 Intel Corporation Dynamic gesture based short-range human-machine interaction
TWI496090B (en) * 2012-09-05 2015-08-11 Ind Tech Res Inst Method and apparatus for object positioning by using depth images
WO2018023727A1 (en) * 2016-08-05 2018-02-08 SZ DJI Technology Co., Ltd. Methods and associated systems for communicating with/controlling moveable devices by gestures

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102982557A (en) * 2012-11-06 2013-03-20 桂林电子科技大学 Method for processing space hand signal gesture command based on depth camera
CN104463146A (en) * 2014-12-30 2015-03-25 华南师范大学 Posture identification method and device based on near-infrared TOF camera depth information
EP3059662A1 (en) * 2015-02-23 2016-08-24 Samsung Electronics Polska Sp. z o.o. A method for interacting with volumetric images by gestures and a system for interacting with volumetric images by gestures
CN106250867A (en) * 2016-08-12 2016-12-21 南京华捷艾米软件科技有限公司 A kind of skeleton based on depth data follows the tracks of the implementation method of system
CN108256421A (en) * 2017-12-05 2018-07-06 盈盛资讯科技有限公司 Dynamic gesture sequence real-time identification method, system and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Gesture segmentation and localization based on improved depth information; Lin Haibo et al.; Journal of Computer Applications (《计算机应用》); 2017-01-10; full text *

Also Published As

Publication number Publication date
CN111158489A (en) 2020-05-15

Similar Documents

Publication Publication Date Title
JP6723061B2 (en) Information processing apparatus, information processing apparatus control method, and program
US10762386B2 (en) Method of determining a similarity transformation between first and second coordinates of 3D features
JP4355341B2 (en) Visual tracking using depth data
KR20080029548A (en) Due diligence-based mobile device control method and device
CN107357428A (en) Man-machine interaction method and device based on gesture identification, system
US20150206003A1 (en) Method for the Real-Time-Capable, Computer-Assisted Analysis of an Image Sequence Containing a Variable Pose
KR101082829B1 (en) The user interface apparatus and method for 3D space-touch using multiple imaging sensors
TWI471815B (en) Gesture recognition device and method
KR20120014925A (en) How to analyze images containing variable posture in real time using a computer
US9008442B2 (en) Information processing apparatus, information processing method, and computer program
KR20120045667A (en) Apparatus and method for generating screen for transmitting call using collage
US11373411B1 (en) Three-dimensional object estimation using two-dimensional annotations
WO2013008236A1 (en) System and method for computer vision based hand gesture identification
WO2018157286A1 (en) Recognition method and device, and movable platform
CN102270035A (en) Apparatus and method for selecting and operating object in non-touch mode
KR100862349B1 (en) Transflective Mirror-based User Interface System Using Gesture Recognition
JP2009265809A (en) Information terminal device
Tara et al. Hand segmentation from depth image using anthropometric approach in natural interface development
JP2021520577A (en) Image processing methods and devices, electronic devices and storage media
CN107030692A (en) A method and system for manipulator teleoperation based on perception enhancement
JP2018119833A (en) Information processing device, system, estimation method, computer program, and storage medium
JP6141108B2 (en) Information processing apparatus and method
EP3127586B1 (en) Interactive system, remote controller and operating method thereof
JP3144400B2 (en) Gesture recognition device and method
CN103761011B (en) A kind of method of virtual touch screen, system and the equipment of calculating

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant