CN113703564B - Human-computer interaction equipment and system based on facial features - Google Patents
Human-computer interaction equipment and system based on facial features
- Publication number
- CN113703564B CN202010436213.4A
- Authority
- CN
- China
- Prior art keywords
- user
- feature points
- processor
- face image
- angle
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2203/00—Indexing scheme relating to G06F3/00 - G06F3/048
- G06F2203/01—Indexing scheme relating to G06F3/01
- G06F2203/011—Emotion or mood input determined on the basis of sensed human body parameters such as pulse, heart rate or beat, temperature of skin, facial expressions, iris, voice pitch, brain activity patterns
Abstract
The invention provides a human-computer interaction device and system based on facial features. The device comprises an image acquisition device, a processor and a display screen. The image acquisition device is used for continuously acquiring user images. The processor is used for sequentially extracting a user face image from each user image; extracting facial feature information from each user face image; determining pose change information of the user; generating a corresponding control instruction according to the pose change information, executing the corresponding control operation on a preset video according to the control instruction, and sending the preset video after the control operation to the display screen in time. The display screen is also used for playing the preset video after the control operation. According to the human-computer interaction device and system based on facial features, the current motion state of the user's face is determined by detecting the biological features of the human face and their motion gesture, so that control signals such as pause or play can be issued, and video playback can be controlled through face motion.
Description
Technical Field
The invention relates to the technical field of human-computer interaction, and in particular to a human-computer interaction device and system based on facial features.
Background
In the course of calligraphy (handwriting) education, physical states include the pen-holding gesture, hand movements, body posture and the like, while mental-state changes include shifts of attention, fluctuation of learning interest, interruption of the learning rhythm and the like. In the traditional informatized handwriting teaching process, a learner watches a teaching video while practicing handwriting; when watching the teaching video, the learner may need to operate the device that plays the video, so the learner's physical or psychological state may change, and such physical and psychological changes form a significant obstacle in the learning process.
The existing methods for controlling the playing device include manual clicking and voice control. Manual clicking refers to tapping buttons on the mobile terminal interface with a finger to control play, pause, fast forward, rewind and the like. Voice control uses a speech recognition system to convert the voice signal into a control signal that controls the video to play, pause, fast forward, rewind and the like.
The manual click control method requires the learner to put down the pen, which changes the writing gesture and causes fluctuation of the writing mental state. Controlling play and pause by voice requires the learner to change from a quiet state to a speaking state, which affects the psychological state; moreover, speech recognition is prone to ambiguity and is not suitable for scenarios where multiple people learn at the same time.
Disclosure of Invention
In order to solve the above problems, an object of an embodiment of the present invention is to provide a human-computer interaction device and system based on facial features.
In a first aspect, an embodiment of the present invention provides a human-computer interaction device based on facial features, including an image acquisition device, a processor and a display screen, wherein the image acquisition device and the display screen are each connected to the processor;
The display screen is used for playing a preset video;
the image acquisition device is used for continuously acquiring user images when the display screen plays the preset video and sending the user images to the processor;
The processor is used for sequentially extracting a user face image from each user image; extracting facial feature information from each user face image; determining pose change information of the user according to change values of the facial feature information across a plurality of user face images and the time intervals between the acquisition of the user face images, wherein the pose change information comprises an angle change value and an angular velocity change value; generating a corresponding control instruction according to the pose change information, executing the corresponding control operation on the preset video according to the control instruction, and sending the preset video after the control operation to the display screen in time;
The display screen is also used for playing the preset video after the control operation.
In a second aspect, an embodiment of the present invention further provides a human-computer interaction system based on facial features, including: a plurality of human-machine interaction devices as described above.
In the solution provided in the first aspect of the embodiment of the present application, the current motion state of the user's face is determined by detecting the biological features of the human face and their motion gesture, so that control signals such as pause or play can be issued and video playback can be controlled through face motion. This interaction mode is simple and responsive, and does not require the user to move the body greatly, so the damage that traditional methods cause to the writing state can be avoided and students can easily keep the continuity of their learning mental state. The interaction mode provided by the application gives a fundamental boost to the informatization of calligraphy education and has important practical value.
In order to make the above objects, features and advantages of the present invention more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 shows a first structural schematic diagram of a human-computer interaction device based on facial features according to an embodiment of the present invention;
fig. 2 is a schematic diagram of an extracted face feature point according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a second structure of a human-computer interaction device based on facial features according to an embodiment of the present invention;
Fig. 4 is a schematic front view of a face feature point according to an embodiment of the present invention;
fig. 5 is a schematic top view of a face feature point according to an embodiment of the present invention;
FIG. 6 illustrates a schematic view of a head motion gesture provided by an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a human-computer interaction system based on facial features according to an embodiment of the present invention.
Detailed Description
In the description of the present invention, it should be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", etc. indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings are merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present invention, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
In the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "connected," "secured," and the like are to be construed broadly and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
The human-computer interaction device based on facial features provided by the embodiment of the invention, as shown in fig. 1, comprises: the image acquisition device 10, the processor 20 and the display screen 30, wherein the image acquisition device 10 and the display screen 30 are respectively connected with the processor 20.
Wherein the display screen 30 is used for playing a preset video; the image capturing device 10 is configured to continuously capture images of a user while a preset video is being displayed on a display screen, and send the images of the user to the processor 20.
The processor 20 is configured to sequentially extract a user face image from each user image; extract facial feature information from each user face image; determine pose change information of the user according to change values of the facial feature information across a plurality of user face images and the time intervals between the acquisition of the user face images, wherein the pose change information comprises an angle change value and an angular velocity change value; generate a corresponding control instruction according to the pose change information, execute the corresponding control operation on the preset video according to the control instruction, and send the preset video after the control operation to the display screen 30 in time;
The display screen 30 is also used for playing a preset video after the control operation.
In the embodiment of the invention, the preset video refers to the video which the user currently needs to watch, such as a teaching video the user watches when practicing handwriting. When the preset video is played through a certain device, a user image of the user in front of the device can be acquired through a camera or a similar component. The device may be a smart phone, a tablet computer, a computer or the like, which is not limited in this embodiment. After the user image is obtained, face detection technology is adopted to extract the user face image from the user image.
Generally, due to illumination, shooting angle and other factors, the user image may suffer from overexposure, unreasonable contrast, distortion and the like; meanwhile, the image background obtained by the camera may contain serious noise, so various adverse factors such as distortion, contamination, breakage and blurring affect the accuracy of subsequent feature extraction. Optionally, this embodiment further includes preprocessing of the user image; background interference and noise are removed through preprocessing to obtain a good recognition rate, so the accuracy of subsequent feature extraction can be improved. The preprocessing of the user image mainly involves the following steps: grayscale enhancement, filtering, morphological processing, adaptive binarization and the like. After the user image is preprocessed, the user face image is extracted from the preprocessed user image.
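As a minimal sketch of such a preprocessing chain (not taken from the patent itself; the OpenCV calls and parameter values are illustrative assumptions), one could write:

```python
import cv2

def preprocess_user_image(bgr_image):
    """Sketch of the preprocessing chain: grayscale enhancement, filtering,
    morphological processing and adaptive binarization. Kernel sizes and
    threshold parameters are illustrative assumptions."""
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    enhanced = cv2.equalizeHist(gray)                   # grayscale enhancement
    denoised = cv2.GaussianBlur(enhanced, (5, 5), 0)    # filtering
    binary = cv2.adaptiveThreshold(                     # adaptive binarization
        denoised, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
        cv2.THRESH_BINARY, 31, 5)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    cleaned = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)  # morphological processing
    return cleaned
```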
In addition, in this embodiment, the face detection network may be used to extract a face image of the user from the user image. Specifically, the face detection network is a convolutional neural network with 16 total layers, and consists of 1 convolutional layer, 12 Inception layers, 3 pooling layers and 1 full-connection layer. The network inputs a user image with 256×256 pixel size, and the output is a 256-dimensional feature vector containing extracted face feature information.
The front part of the network consists of a convolution layer and a pooling layer; this structure extracts the most basic low-level features of the user image, such as points, lines and intersections. The main part of the network consists of 12 Inception layers and 2 pooling layers. From the perspective of network design, this 14-layer structure arranges and combines the front-end inputs from simple to complex, learns during network training the structural features that can describe differences between faces, and finally compresses these structural features into a 1024-dimensional feature vector. The output end of the network is a fully connected layer, which compresses the input 1024-dimensional feature vector to 256 dimensions for output. The design randomly drops connections between the 1024-dimensional and the 256-dimensional vectors, which alleviates the overfitting that arises during network training; the user face image is finally extracted on this basis.
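For orientation only, a rough sketch of such a network is given below in PyTorch; the Inception-branch widths, channel counts and pooling positions are assumptions for illustration, since the patent does not specify them.

```python
import torch
import torch.nn as nn

class MiniInception(nn.Module):
    """Simplified Inception block; branch widths are illustrative assumptions."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        b = out_ch // 4
        self.b1 = nn.Conv2d(in_ch, b, 1)
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, b, 1), nn.Conv2d(b, b, 3, padding=1))
        self.b5 = nn.Sequential(nn.Conv2d(in_ch, b, 1), nn.Conv2d(b, b, 5, padding=2))
        self.bp = nn.Sequential(nn.MaxPool2d(3, 1, 1), nn.Conv2d(in_ch, b, 1))

    def forward(self, x):
        return torch.relu(torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], 1))

class FaceFeatureNet(nn.Module):
    """Sketch of the described network: conv + pooling front end, a stack of
    12 Inception layers with interleaved pooling, and a fully connected head
    that compresses a 1024-d vector to 256 dimensions with dropout."""
    def __init__(self):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(3, 64, 7, stride=2, padding=3),
                                  nn.ReLU(), nn.MaxPool2d(2))
        blocks, ch = [], 64
        for i in range(12):                      # 12 Inception layers
            blocks.append(MiniInception(ch, 128))
            ch = 128
            if i in (3, 7):                      # assumed pooling positions
                blocks.append(nn.MaxPool2d(2))
        self.body = nn.Sequential(*blocks)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.proj = nn.Linear(128, 1024)         # 1024-d structural features
        self.head = nn.Sequential(nn.Dropout(0.5), nn.Linear(1024, 256))

    def forward(self, x):                        # x: (N, 3, 256, 256)
        f = self.pool(self.body(self.stem(x))).flatten(1)
        return self.head(torch.relu(self.proj(f)))
```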
In the embodiment of the invention, the facial feature information comprises the feature points extracted from the user face image, or information related to those feature points, such as the coordinate values of the feature points. The feature points of a human face refer to key points on the face outline and near the facial organs, including the outline of the eyebrows, the upper and lower outlines of the eyes, the midline of the nose, the upper and lower outlines of the lips and the like, as shown in fig. 2. A face feature recognition algorithm is used to locate a plurality of feature points in the user face image, and the coordinates of each feature point can be determined; these coordinates contain the pose information of the whole face. In fig. 2, 68 feature points (numbered 0 to 67) are illustrated.
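For orientation, a sketch of 68-point landmark extraction with the dlib library is shown below; this is an illustrative stand-in for the patent's own feature recognition algorithm, and the predictor file name refers to dlib's publicly available model, not something specified by the patent.

```python
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def extract_feature_points(bgr_image):
    """Return the (x, y) coordinates of the 68 face feature points
    (points 0-67 as in fig. 2) for the first detected face, or None."""
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 1)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    return [(shape.part(k).x, shape.part(k).y) for k in range(68)]
```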
In the embodiment of the invention, a plurality of user images are acquired continuously, so a plurality of user face images can be obtained continuously, and the change values of the facial feature information can be determined from the different facial feature information of different user face images; meanwhile, a time interval exists between the acquisition of different user images, and the velocity change value can be determined based on the change value and the time interval. In this embodiment, angle-related information is used as the pose change information, which describes the rotation of the user's head.
In the embodiment of the invention, the pose change information represents the rotation of the user's head, so the control instruction corresponding to the pose change information can be generated. For example, turning the head to the left generates a rewind instruction, turning the head to the right generates a fast-forward instruction, turning the head upward generates a pause instruction, and so on. After the control instruction is generated, the playback of the preset video can be controlled accordingly, for example fast-forward playback.
According to the human-computer interaction device based on facial features, the current motion state of the user's face is determined by detecting the biological features of the human face and their motion gesture, so that control signals such as pause or play can be issued and video playback can be controlled through face motion. This interaction mode is simple and responsive, and does not require the user to move the body greatly, so the damage that traditional methods cause to the writing state can be avoided and students can easily keep the continuity of their learning mental state. The interaction mode provided by the application gives a fundamental boost to the informatization of calligraphy education and has important practical value.
On the basis of the above embodiment, referring to fig. 3, the human-computer interaction device based on facial features further includes: a memory 40, the memory 40 being coupled to the processor 20; the memory 40 is used for storing the preset video, and sending the preset video to the processor 20 when the preset video needs to be played, and the processor 20 instructs the display screen 30 to play the preset video.
On the basis of the above embodiment, referring to fig. 3, the human-computer interaction device based on facial features further includes: a communication device 50, the communication device 50 being connected to the processor 20; the communication device 50 is configured to acquire the preset video and send it to the processor 20, and the processor 20 instructs the display screen 30 to play the preset video.
In the embodiment of the present invention, the preset video may be stored in the memory in advance, or may be obtained remotely through the communication device, which is not limited in this embodiment.
On the basis of the above embodiment, the processor 20 extracting the facial feature information in each user face image includes: extracting feature points from the user face image, and sequentially performing similarity detection between a picture of preset size in the neighborhood of each feature point and a trained facial organ filter; setting a unified coordinate system based on the camera intrinsic parameters, and determining, in this coordinate system, the coordinate values of the feature points that pass the similarity detection, the coordinate values being one item of information in the facial feature information.
In the embodiment of the invention, a labeled sample set of facial organs is used for training in advance; after training, each group of parameters represents a specific small filter, i.e. a facial organ filter such as an eye filter or a mouth filter. With a trained facial organ filter, similarity detection can be carried out on a small image patch in the coordinate neighborhood of a given feature point. For example, a patch sampled near an eye feature point is checked with the eye patch model, and a patch sampled near a mouth feature point is checked with the mouth patch model. The ASM (Active Shape Model) algorithm can be adopted to obtain the feature points in the user face image.
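The organ-filter similarity check could be approximated, purely for illustration, by normalized template matching; the patch size, template and threshold below are assumptions, not values given in the patent.

```python
import cv2

def patch_passes_similarity(gray_face, point, template, threshold=0.6, size=24):
    """Compare the patch around `point` (x, y) with a trained organ template
    and report whether it passes the similarity check; the matching score
    and threshold are illustrative stand-ins for the patent's own filter."""
    x, y, half = point[0], point[1], size // 2
    patch = gray_face[max(y - half, 0):y + half, max(x - half, 0):x + half]
    if patch.shape[0] < template.shape[0] or patch.shape[1] < template.shape[1]:
        return False
    score = cv2.matchTemplate(patch, template, cv2.TM_CCOEFF_NORMED).max()
    return float(score) >= threshold
```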
Meanwhile, the camera that collects the user images has specific intrinsic parameters, which are fixed, and a coordinate system, such as a world coordinate system, can be set based on these intrinsic parameters. When a feature point passes the similarity detection, it is an effective feature point and can be used as a reference in subsequent processing; its coordinate values can then be determined, which facilitates the subsequent calculation of the pose parameters.
On the basis of the above embodiment, the image capturing device 10 may be a camera. If the camera can capture three-dimensional face images, the rotation of the head can be conveniently determined from the change of the three-dimensional coordinates of the face feature points. However, the camera of a typical device can only collect two-dimensional planar images, i.e. the user face image collected by the camera has no depth information, and a large amount of processing is required to determine head rotation from two-dimensional planar images, which reduces processing efficiency. In this embodiment, the rotation of the user's head is determined by selecting a subset of the feature points. Specifically, the facial feature information includes the coordinate values of the feature points, and the processor 20 determining the pose change information of the user includes:
Step A1: at least four standard feature points S_a, S_b, S_c, S_d are selected from the user's feature points in advance; in a standard face image, the difference between 90 degrees and the included angle between the line segment connecting standard feature points S_a and S_b and the line segment connecting standard feature points S_c and S_d is smaller than a preset value.
In the embodiment of the invention, at least four feature points, i.e. the four standard feature points, are determined in advance; the included angle between the line segment connecting S_a and S_b and the line segment connecting S_c and S_d differs from 90 degrees by less than the preset value, i.e. the angle between the two line segments is approximately 90 degrees.
Specifically, the positions of certain feature points in the face are essentially fixed, such as the positions of the eyes. Meanwhile, if two mutually perpendicular line segments lie in a plane, then after the two segments rotate around certain specific axes in three-dimensional space, their projections onto that plane remain perpendicular. The user face image is a two-dimensional image, and by selecting four such perpendicular feature points, the pose change of the user's head rotating around those specific axes can be correctly identified. For example, as shown in fig. 2, the four standard feature points are, in order, the leftmost feature point 36 of the left eye, the rightmost feature point 45 of the right eye, the nose-tip feature point 33 and the chin feature point 8. Fig. 4 shows the positions of the four standard feature points: S_a, S_b, S_c, S_d correspond to A, B, C, D in fig. 4, and the included angle θ between AB and CD is about 90 degrees.
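Purely as an illustration of this choice of standard points (indices 36, 45, 33 and 8 in the 68-point scheme of fig. 2), the near-perpendicularity of AB and CD could be checked as follows; the 10-degree tolerance is an assumed preset value.

```python
import numpy as np

# Standard feature point indices from fig. 2: left-eye outer corner,
# right-eye outer corner, nose tip, chin.
S_A, S_B, S_C, S_D = 36, 45, 33, 8

def is_valid_standard_quad(points, tolerance_deg=10.0):
    """Check that segment Sa-Sb and segment Sc-Sd are roughly perpendicular
    in a (near-)frontal face image; `points` is a 68x2 landmark array and
    the tolerance is an illustrative assumption."""
    pts = np.asarray(points, dtype=float)
    ab = pts[S_B] - pts[S_A]
    cd = pts[S_D] - pts[S_C]
    cos_theta = np.dot(ab, cd) / (np.linalg.norm(ab) * np.linalg.norm(cd))
    theta = np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))
    return abs(theta - 90.0) < tolerance_deg
```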
It should be noted that, the process of determining the four standard feature points in the step A1 is a pre-performed process, that is, the four standard feature points do not need to be determined after the face image of the user is acquired.
Step A2: determining the four feature points f_a, f_b, f_c, f_d corresponding to the four standard feature points in the facial feature information; sequentially determining, in each user face image, the distance between feature points f_a and f_b and the distance between feature points f_c and f_d; and determining the maximum distance value d_max^ab between feature points f_a and f_b and the maximum distance value d_max^cd between feature points f_c and f_d.
In the embodiment of the invention, in order to have a uniform reference for the angles, the maximum distance value between the feature points is used as the reference. Specifically, leaving the distance between the user's face and the camera aside, the distance between the feature points is largest when the plane of the user's face is parallel to the imaging plane of the camera; when the user's face turns away, the distance between the feature points decreases, so an angle representing the head pose can be determined from the distance between the feature points and the maximum distance value.
Step A3: determining the first angle and the second angle corresponding to each user face image:
Y_i = arccos(d_i^ab / d_max^ab), P_i = arccos(d_i^cd / d_max^cd);
wherein i ∈ [1, n], n is the number of user face images, Y_i represents the first angle of the i-th user face image, P_i represents the second angle of the i-th user face image, d_i^ab represents the distance between feature points f_a and f_b in the i-th user face image, and d_i^cd represents the distance between feature points f_c and f_d in the i-th user face image.
In the embodiment of the invention, the line segment between two feature points in the user face image can be regarded as the projection of the corresponding maximum-distance line segment, so the first angle is Y_i = arccos(d_i^ab / d_max^ab). Specifically, for convenience of explanation, fig. 4 shows a front view of a face; the four feature points A, B, C, D in fig. 4 (corresponding to the four feature points f_a, f_b, f_c, f_d in the user face image) lie in the same plane, and the distances between the corresponding feature points of the face in fig. 4 are the largest, i.e. the distance AB in fig. 4 is d_max^ab and the distance CD is d_max^cd. Referring further to fig. 5, which shows a top view of the face, the line segment AB in fig. 5 represents the plane in which the four feature points A, B, C, D of fig. 4 are located, and the distance AB in fig. 5 is still d_max^ab. In a practical situation, taking the camera as reference, if the user's head deflects in the left-right direction (i.e. the user turns left or right), then in the real-world coordinate system the feature points A and B of the user's head move to positions A1 and B1, and as long as the head does not also rotate up and down (i.e. the user does not nod or raise the head), the length of segment A1B1 in fig. 5 is still d_max^ab. However, since the camera can only collect two-dimensional images, in the user face image collected by the camera the feature points A and B are mapped to Ai and Bi; for the i-th collected user face image, Ai and Bi are the positions of the feature points in that image, i.e. the distance between Ai and Bi is the distance d_i^ab between feature points f_a and f_b. The deflection angle of the user in the left-right direction is therefore Y_i = arccos(d_i^ab / d_max^ab). Similarly, the deflection angle of the user in the up-down direction can be determined as P_i = arccos(d_i^cd / d_max^cd).
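As a small numeric illustration of this projection relation (a sketch, assuming pixel-space distances and a previously observed frontal maximum):

```python
import numpy as np

def projected_angle(current_distance, max_distance):
    """Recover the deflection angle (degrees) from the ratio between the
    current projected distance and the maximum (frontal) distance,
    following Y_i = arccos(d_i / d_max)."""
    ratio = np.clip(current_distance / max_distance, -1.0, 1.0)
    return float(np.degrees(np.arccos(ratio)))

# Example: the eye-corner distance shrinks from 120 px (frontal) to 85 px.
yaw_deg = projected_angle(85.0, 120.0)     # first angle Y_i, about 45 degrees
pitch_deg = projected_angle(118.0, 120.0)  # second angle P_i, about 10 degrees
```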
Step A4: determining the first angle change value ΔY, the second angle change value ΔP, the first angular velocity change value Δω_Y and the second angular velocity change value Δω_P between the i-th and j-th user face images:
ΔY = Y_i − Y_j, ΔP = P_i − P_j, Δω_Y = ΔY / Δt_ij, Δω_P = ΔP / Δt_ij;
wherein Δt_ij represents the time interval between the acquisition of the i-th user face image and the j-th user face image.
In the embodiment of the invention, the angles of each user face image can be determined, i.e. the first angle of the j-th user face image is Y_j and its second angle is P_j; the angle change values are determined from the angle values of two user face images (for example, two adjacent frames), and the angular velocity change values can then be determined from the time interval.
Step A5: determining the first angle change direction D_Y and the second angle change direction D_P between the i-th and j-th user face images;
wherein the coordinates of the four feature points f_a, f_b, f_c, f_d in the i-th user face image are (x_i^a, y_i^a), (x_i^b, y_i^b), (x_i^c, y_i^c), (x_i^d, y_i^d), and the coordinates of the feature points f_a, f_b, f_c, f_d in the j-th user face image are respectively (x_j^a, y_j^a), (x_j^b, y_j^b), (x_j^c, y_j^c), (x_j^d, y_j^d).
In the embodiment of the invention, the angle change values can be determined in step A4, i.e. it can be determined that the user's head has rotated left-right or up-down, but it is not yet possible to distinguish whether the user turned left or right; in this embodiment the rotation direction is determined from the change of the coordinate values of the feature points. Specifically, as shown in fig. 6, when the human head rotates, the head movement gesture can be regarded as movement of the head in three-dimensional space in six directions — up, down, left, right, forward and backward — and various combined gestures between these directions. Euler angles describe the three angular orientations of rigid-body motion in a fixed coordinate system; any orientation can be expressed by a combination of the yaw angle (Yaw), pitch angle (Pitch) and roll angle (Roll), which can accurately express the head rotation angle. As shown in fig. 6, the yaw angle is the angle produced by left-right rotation of the head; the pitch angle is the angle produced by up-down rotation of the head; the roll angle is the angle produced by rotation within the image plane. Based on the characteristics of the human body, head deflection is driven by the neck, i.e. the rotation axis of the head is located at the neck, so all facial feature points are also displaced when the head rotates. For example, when the head turns to the right under the action of the neck, the four feature points f_a, f_b, f_c, f_d in the user face image also move to the right; in this embodiment the rotation direction of the head is determined based on this characteristic.
Specifically, the j-th user face image is collected first and the i-th user face image is collected afterwards, i.e. the j-th user face image is acquired earlier than the i-th. When the first angle change direction is determined, the coordinates of the feature points f_a and f_b in the j-th user face image are (x_j^a, y_j^a) and (x_j^b, y_j^b) respectively; afterwards, in the i-th user face image, the coordinates of the feature points f_a and f_b are (x_i^a, y_i^a) and (x_i^b, y_i^b) respectively, i.e. feature point f_a has moved from (x_j^a, y_j^a) to (x_i^a, y_i^a) with movement vector (x_i^a − x_j^a, y_i^a − y_j^a), and similarly feature point f_b has moved from (x_j^b, y_j^b) to (x_i^b, y_i^b) with movement vector (x_i^b − x_j^b, y_i^b − y_j^b). The overall motion vector D_Y of feature points f_a and f_b is the sum of the two, namely:
D_Y = (x_i^a − x_j^a + x_i^b − x_j^b, y_i^a − y_j^a + y_i^b − y_j^b)
Similarly, when rotation in the up-down direction occurs, the second angle change direction D_P with respect to feature points f_c and f_d is:
D_P = (x_i^c − x_j^c + x_i^d − x_j^d, y_i^c − y_j^c + y_i^d − y_j^d)
In the embodiment of the invention, based on the characteristic of head rotation, the angle change value, the angular velocity change value, the angular change direction and the like of the two user face images can be determined by utilizing the two-dimensional user face images, so that the rotation displacement, the rotation speed and the rotation direction of the head of the user can be determined, and further, the subsequent generation of a control instruction consistent with the rotation gesture of the user is facilitated.
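Tying steps A2 to A5 together, a minimal sketch (using the notation above; variable names and the landmark indices passed in are illustrative) might look like this:

```python
import numpy as np

def pose_change(points_i, points_j, dt, d_ab_max, d_cd_max, idx=(36, 45, 33, 8)):
    """Compute (dY, dP, dOmegaY, dOmegaP, D_Y, D_P) between two user face
    images, following steps A2-A5. `points_i` / `points_j` are 68x2 landmark
    arrays for the later (i-th) and earlier (j-th) images, `dt` is the
    acquisition interval in seconds, and the maxima are tracked elsewhere."""
    a, b, c, d = idx
    pi, pj = np.asarray(points_i, float), np.asarray(points_j, float)

    def angles(p):
        d_ab = np.linalg.norm(p[b] - p[a])
        d_cd = np.linalg.norm(p[d] - p[c])
        Y = np.degrees(np.arccos(np.clip(d_ab / d_ab_max, -1.0, 1.0)))
        P = np.degrees(np.arccos(np.clip(d_cd / d_cd_max, -1.0, 1.0)))
        return Y, P

    Yi, Pi = angles(pi)
    Yj, Pj = angles(pj)
    dY, dP = Yi - Yj, Pi - Pj                    # step A4: angle change values
    dOmegaY, dOmegaP = dY / dt, dP / dt          # step A4: angular velocity changes
    D_Y = (pi[a] - pj[a]) + (pi[b] - pj[b])      # step A5: first direction vector
    D_P = (pi[c] - pj[c]) + (pi[d] - pj[d])      # step A5: second direction vector
    return dY, dP, dOmegaY, dOmegaP, D_Y, D_P
```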
Based on the above embodiments, the above process of determining pose change information needs to keep the distance between the face of the user and the camera substantially consistent, and if the distance between the face of the user and the camera changes, the determined pose change information may be inaccurate. In this embodiment, the processor 20 generates the corresponding control command according to the pose change information, including:
Step B1: when the second angle change value ΔP is smaller than a first preset threshold and the first angle change value ΔY is larger than a second preset threshold, a corresponding control instruction is generated according to the first angle change value ΔY, the first angular velocity change value Δω_Y and the first angle change direction D_Y.
Step B2: when the first angle change value ΔY is smaller than the first preset threshold and the second angle change value ΔP is larger than the second preset threshold, a corresponding control instruction is generated according to the second angle change value ΔP, the second angular velocity change value Δω_P and the second angle change direction D_P.
In the embodiment of the invention, when the second angle change value ΔP is smaller than the first preset threshold, it indicates that, relative to the maximum distance value d_max^cd, the distance between feature points f_c and f_d does not change much between the two user face images in the current time period, so it can essentially be considered that the distance between the user's face and the camera remains consistent and that the user's head hardly rotates in the direction associated with f_c and f_d (e.g. the up-down rotation direction). Meanwhile, if the first angle change value ΔY is larger than the second preset threshold, it indicates that the user's head has rotated considerably in the direction associated with f_a and f_b (e.g. the left-right rotation direction); it can then be determined that the user's head has rotated in the f_a/f_b direction, and a corresponding control instruction can be generated according to the first angle change value ΔY, the first angular velocity change value Δω_Y and the first angle change direction D_Y. For example, from the first angle change direction D_Y it may be determined that the user's head turned to the right, and from the magnitudes of the first angle change value ΔY and the first angular velocity change value Δω_Y it may be determined that the rotation was large enough; a control instruction corresponding to turning the head to the right, such as a fast-forward instruction, is then generated.
Similarly, in step B2, when the first angle change value ΔY is smaller than the first preset threshold and the second angle change value ΔP is larger than the second preset threshold, other corresponding control instructions may be generated, for example a play instruction when the user turns the head downward (i.e. nods).
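A minimal decision sketch for this mapping (the thresholds and the exact gesture-to-instruction assignment beyond the examples given above are assumptions) could be:

```python
def decide_instruction(dY, dP, D_Y, D_P, thr_small=3.0, thr_large=10.0):
    """Map pose change values (degrees) and direction vectors to a playback
    instruction; threshold values and the direction conventions are
    illustrative assumptions, not fixed by the patent."""
    if abs(dP) < thr_small and abs(dY) > thr_large:   # left-right head turn
        return "fast_forward" if D_Y[0] > 0 else "rewind"
    if abs(dY) < thr_small and abs(dP) > thr_large:   # up-down head turn
        return "play" if D_P[1] > 0 else "pause"      # image y grows downward
    return None  # no clear gesture: leave playback unchanged
```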
According to the human-computer interaction device based on facial features, the current motion state of the user's face is determined by detecting the biological features of the human face and their motion gesture, so that control signals such as pause or play can be issued and video playback can be controlled through face motion. This interaction mode is simple and responsive, and does not require the user to move the body greatly, so the damage that traditional methods cause to the writing state can be avoided and students can easily keep the continuity of their learning mental state. The interaction mode provided by the application gives a fundamental boost to the informatization of calligraphy education and has important practical value. Meanwhile, the angle change value, the angular velocity change value, the angle change direction and the like between two user face images are determined from two-dimensional user face images, so the rotation displacement, rotation speed and rotation direction of the user's head can be determined, which facilitates the subsequent generation of a control instruction consistent with the user's rotation gesture.
Based on the same inventive concept, an embodiment of the present invention further provides a human-computer interaction system based on facial features. As shown in fig. 7, the system comprises a plurality of separately arranged human-computer interaction devices 1 as described above; fig. 7 illustrates, by way of example, a system comprising three human-computer interaction devices 1.
In the embodiment of the invention, the human-computer interaction device 1 may be a device used in an informatized education environment, such as a smart device used in a classroom (e.g. a smart phone or a tablet computer), or a device fixedly installed in advance at a preset position. A user can watch the video played by the device 1 by sitting in front of it, and can conveniently control the video played by the human-computer interaction device 1 through head movement.
Optionally, referring to fig. 7, the system further includes: an upper computer 2; the human-computer interaction device 1 further comprises communication means 50.
The upper computer 2 is connected with the man-machine interaction device 1 through the communication device 50, and is configured to send a preset video to the man-machine interaction device 1, and instruct the man-machine interaction device 1 to play the preset video.
According to the human-computer interaction system based on facial features, the current motion state of the user's face is determined by detecting the biological features of the human face and their motion gesture, so that control signals such as pause or play can be issued and video playback can be controlled through face motion. This interaction mode is simple and responsive, and does not require the user to move the body greatly, so the damage that traditional methods cause to the writing state can be avoided and students can easily keep the continuity of their learning mental state. The interaction mode provided by the application gives a fundamental boost to the informatization of calligraphy education and has important practical value. Meanwhile, the angle change value, the angular velocity change value, the angle change direction and the like between two user face images are determined from two-dimensional user face images, so the rotation displacement, rotation speed and rotation direction of the user's head can be determined, which facilitates the subsequent generation of a control instruction consistent with the user's rotation gesture.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (7)
1. A facial feature-based human-machine interaction device, comprising: an image acquisition device, a processor and a display screen, wherein the image acquisition device and the display screen are each connected to the processor;
The display screen is used for playing a preset video;
the image acquisition device is used for continuously acquiring user images when the display screen plays the preset video and sending the user images to the processor;
The processor is used for sequentially extracting a user face image from each user image; extracting facial feature information from each user face image; determining pose change information of the user according to change values of the facial feature information across a plurality of user face images and the time intervals between the acquisition of the user face images, wherein the pose change information comprises an angle change value and an angular velocity change value; generating a corresponding control instruction according to the pose change information, executing the corresponding control operation on the preset video according to the control instruction, and sending the preset video after the control operation to the display screen in time;
the display screen is also used for playing the preset video after the control operation;
Wherein the facial feature information includes feature points extracted from a user face image;
the feature points of the human face refer to key points on the face outline and near the facial organs;
wherein the control instruction includes: a fast-rewinding instruction, a fast-forwarding instruction, a pause instruction and a play instruction;
the processor extracts facial feature information in each of the user face images, including:
extracting feature points from the user face image, and sequentially performing similarity detection between a picture of preset size in the neighborhood of each feature point and a trained facial organ filter;
setting a unified coordinate system based on camera intrinsic parameters, and determining, in this coordinate system, the coordinate values of the feature points that pass the similarity detection, the coordinate values being one item of information in the facial feature information;
The facial feature information includes coordinate values of feature points, and the processor determines pose change information of the user includes:
at least four standard feature points S_a, S_b, S_c, S_d are selected from the user's feature points in advance; in a standard face image, the difference between the included angle between the line segment connecting standard feature points S_a and S_b and the line segment connecting standard feature points S_c and S_d and 90 degrees is smaller than a preset value;
determining the four feature points f_a, f_b, f_c, f_d corresponding to the four standard feature points in the facial feature information, sequentially determining, in each user face image, the distance between feature points f_a and f_b and the distance between feature points f_c and f_d, and determining the maximum distance value d_max^ab between feature points f_a and f_b and the maximum distance value d_max^cd between feature points f_c and f_d;
determining the first angle and the second angle corresponding to each user face image:
Y_i = arccos(d_i^ab / d_max^ab), P_i = arccos(d_i^cd / d_max^cd);
wherein i ∈ [1, n], n is the number of user face images, Y_i represents the first angle of the i-th user face image, P_i represents the second angle of the i-th user face image, d_i^ab represents the distance between feature points f_a and f_b in the i-th user face image, and d_i^cd represents the distance between feature points f_c and f_d in the i-th user face image;
determining the first angle change value ΔY, the second angle change value ΔP, the first angular velocity change value Δω_Y and the second angular velocity change value Δω_P between the i-th and j-th user face images:
ΔY = Y_i − Y_j, ΔP = P_i − P_j, Δω_Y = ΔY / Δt_ij, Δω_P = ΔP / Δt_ij;
wherein Δt_ij represents the time interval between the acquisition of the i-th user face image and the j-th user face image;
determining the first angle change direction D_Y and the second angle change direction D_P between the i-th and j-th user face images:
D_Y = (x_i^a − x_j^a + x_i^b − x_j^b, y_i^a − y_j^a + y_i^b − y_j^b), D_P = (x_i^c − x_j^c + x_i^d − x_j^d, y_i^c − y_j^c + y_i^d − y_j^d);
wherein the coordinates of the four feature points f_a, f_b, f_c, f_d in the i-th user face image are (x_i^a, y_i^a), (x_i^b, y_i^b), (x_i^c, y_i^c), (x_i^d, y_i^d), and the coordinates of the feature points f_a, f_b, f_c, f_d in the j-th user face image are respectively (x_j^a, y_j^a), (x_j^b, y_j^b), (x_j^c, y_j^c), (x_j^d, y_j^d).
2. The apparatus of claim 1, wherein the processor generating the corresponding control instructions from the pose change information comprises:
when the second angle change value ΔP is smaller than a first preset threshold and the first angle change value ΔY is larger than a second preset threshold, generating a corresponding control instruction according to the first angle change value ΔY, the first angular velocity change value Δω_Y and the first angle change direction D_Y;
when the first angle change value ΔY is smaller than the first preset threshold and the second angle change value ΔP is larger than the second preset threshold, generating a corresponding control instruction according to the second angle change value ΔP, the second angular velocity change value Δω_P and the second angle change direction D_P.
3. The apparatus of any of claims 1-2, wherein the processor extracting a user face image in each of the user images comprises:
preprocessing the user image, wherein the preprocessing comprises one or more of gray scale enhancement, filtering and binarization;
and extracting a user face image in each preprocessed user image.
4. The apparatus as recited in claim 1, further comprising: the memory is connected with the processor;
the memory is used for storing the preset video, and sending the preset video to the processor when the preset video needs to be played, and the processor instructs the display screen to play the preset video.
5. The apparatus as recited in claim 1, further comprising: the communication device is connected with the processor;
The communication device is used for acquiring a preset video and sending the preset video to the processor, and the processor instructs the display screen to play the preset video.
6. A facial feature-based human-machine interaction system, comprising: a plurality of separately provided human-machine interaction devices as claimed in any one of claims 1-4.
7. The system of claim 6, further comprising: an upper computer; the man-machine interaction equipment further comprises a communication device; the upper computer is connected with the man-machine interaction equipment through the communication device and is used for sending a preset video to the man-machine interaction equipment and indicating the man-machine interaction equipment to play the preset video.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010436213.4A CN113703564B (en) | 2020-05-21 | 2020-05-21 | Human-computer interaction equipment and system based on facial features |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010436213.4A CN113703564B (en) | 2020-05-21 | 2020-05-21 | Human-computer interaction equipment and system based on facial features |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113703564A CN113703564A (en) | 2021-11-26 |
CN113703564B true CN113703564B (en) | 2024-08-23 |
Family
ID=78645853
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010436213.4A Active CN113703564B (en) | 2020-05-21 | 2020-05-21 | Human-computer interaction equipment and system based on facial features |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113703564B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115494961B (en) * | 2022-11-17 | 2023-03-24 | 南京熊大巨幕智能科技有限公司 | A new interactive surround intelligent display device based on face recognition |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103324283A (en) * | 2013-05-23 | 2013-09-25 | 广东欧珀移动通信有限公司 | Method and terminal for controlling video playing based on face recognition |
CN106991367A (en) * | 2016-01-21 | 2017-07-28 | 腾讯科技(深圳)有限公司 | The method and apparatus for determining face rotational angle |
CN110378994A (en) * | 2018-04-12 | 2019-10-25 | Oppo广东移动通信有限公司 | Face modeling method and related product |
-
2020
- 2020-05-21 CN CN202010436213.4A patent/CN113703564B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103324283A (en) * | 2013-05-23 | 2013-09-25 | 广东欧珀移动通信有限公司 | Method and terminal for controlling video playing based on face recognition |
CN106991367A (en) * | 2016-01-21 | 2017-07-28 | 腾讯科技(深圳)有限公司 | The method and apparatus for determining face rotational angle |
CN110378994A (en) * | 2018-04-12 | 2019-10-25 | Oppo广东移动通信有限公司 | Face modeling method and related product |
Also Published As
Publication number | Publication date |
---|---|
CN113703564A (en) | 2021-11-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110083202B (en) | Multimode interaction with near-eye display | |
Suarez et al. | Hand gesture recognition with depth images: A review | |
US8732623B2 (en) | Web cam based user interaction | |
WO2021004257A1 (en) | Line-of-sight detection method and apparatus, video processing method and apparatus, and device and storage medium | |
CN111460976B (en) | A data-driven real-time hand movement assessment method based on RGB video | |
WO2020042542A1 (en) | Method and apparatus for acquiring eye movement control calibration data | |
CN109145802B (en) | Kinect-based multi-person gesture human-computer interaction method and device | |
CN112257513B (en) | Training method, translation method and system for sign language video translation model | |
CN104571506A (en) | Smart watch based on action recognition and action recognition method | |
CN112199015A (en) | Intelligent interaction all-in-one machine and writing method and device thereof | |
Sun et al. | Kinect-based intelligent monitoring and warning of students' sitting posture | |
CN113703564B (en) | Human-computer interaction equipment and system based on facial features | |
Yin et al. | Toward natural interaction in the real world: Real-time gesture recognition | |
Jain et al. | Human computer interaction–Hand gesture recognition | |
CN115107037A (en) | A kind of assisted feeding robot arm interaction system and method | |
CN113705280B (en) | Human-computer interaction method and device based on facial features | |
JP2011243141A (en) | Operation information processor, method and program | |
CN106796649A (en) | Gesture-based human machine interface using markers | |
CN107831894A (en) | It is a kind of suitable for mobile terminal every empty-handed gesture writing on the blackboard method | |
WO2023151551A1 (en) | Video image processing method and apparatus, and electronic device and storage medium | |
CN116109974A (en) | Volumetric video display method and related equipment | |
CN114201054A (en) | Method for realizing non-contact human-computer interaction based on head posture | |
CN222734608U (en) | A display device | |
Porwal et al. | ASL language translation using ML | |
Gupta et al. | Real Time Hand Gesture Detection and Recognition to Control PowerPoint Slides |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |