Display device and control method thereof
Cross Reference to Related Applications
This application claims priority to Chinese patent application No. 202111302345.9, filed on November 4, 2021; Chinese patent application No. 202111302336.X, filed on November 4, 2021; Chinese patent application No. 202210266245.3, filed on March 17, 2022; and Chinese patent application No. 202210303452.1, filed on March 24, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The application relates to the field of gesture control, in particular to a display device and a control method thereof.
Background
With the continuous development of electronic technology, display devices such as televisions can implement more and more functions. For example, a display device can capture images of a user through a video acquisition device provided on the display device, and after a processor identifies gesture information of the user in the images, the display device executes the command corresponding to that gesture information.
At present, however, a display device generally determines a control command from gesture information by identifying a single collected user behavior image, determining the target gesture information from it, and then determining the corresponding control command. This results in a low degree of intelligence of the display device and a poor user experience.
Disclosure of Invention
The present application provides a display device including: a display configured to display an image; an image input interface configured to acquire a user behavior image; a controller configured to: acquiring a plurality of frames of user behavior images; performing gesture recognition processing on the user behavior image of each frame to obtain target gesture information; and controlling the display to display corresponding content based on the target gesture information.
The application provides a display device control method, which comprises the following steps: acquiring a plurality of frames of user behavior images; performing gesture recognition processing on the user behavior image of each frame to obtain target gesture information; and controlling the display to display corresponding content based on the target gesture information.
Drawings
Fig. 1 is a usage scenario of a display device according to an embodiment of the present application;
Fig. 2 is a block diagram of a hardware configuration of a control device 100 according to an embodiment of the present application;
fig. 3 is a block diagram of a hardware configuration of a display device 200 according to an embodiment of the present application;
fig. 4 is a software configuration diagram of a display device 200 according to an embodiment of the present application;
fig. 5 is a schematic diagram of a display device according to an embodiment of the present application;
fig. 6a is a schematic diagram of a built-in camera of a display device according to an embodiment of the present application;
fig. 6b is a schematic diagram of an external camera of a display device according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a user interface provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of a display displaying a cursor according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a display for displaying cursor control mode confirmation information according to an embodiment of the present application;
FIG. 10 is an interactive flow chart of the components of the display device provided by the embodiment of the application;
FIG. 11 is a schematic diagram of a user gesture according to an embodiment of the present application;
FIG. 12 is a flowchart illustrating determining a cursor position according to target gesture information according to an embodiment of the present application;
fig. 13 is a schematic diagram of a display camera area provided by an embodiment of the present application;
FIG. 14 is a schematic view of a cursor moving along a straight line according to an embodiment of the present application;
FIG. 15 is a schematic diagram of a cursor moving along a curve according to an embodiment of the present application;
FIG. 16 is a schematic diagram of a relationship between cursor and control distance provided by an embodiment of the present application;
FIG. 17 is a positional relationship of a cursor and a control provided by an embodiment of the present application;
FIG. 18 is a schematic diagram of a dynamic gesture interaction flow provided by an embodiment of the present application;
FIG. 19 is a schematic view of a hand orientation provided by an embodiment of the present application;
FIG. 20 is a schematic tree structure of a detection model according to an embodiment of the present application;
FIG. 21 is a diagram of an action path when a pseudo jump succeeds, according to an embodiment of the present application;
FIG. 22 is a diagram illustrating a path of actions when a pseudo jump fails according to an embodiment of the present application;
FIG. 23 is a schematic diagram of a data flow relationship of dynamic gesture interaction according to an embodiment of the present application;
FIG. 24 is a diagram of a dynamic gesture interaction timing relationship provided by an embodiment of the present application;
Fig. 25 is a schematic diagram of another usage scenario of a display device according to an embodiment of the present application;
fig. 26 is a schematic hardware structure diagram of another hardware system in the display device according to the embodiment of the present application;
Fig. 27 is a schematic diagram of a control method of a display device according to an embodiment of the present application;
Fig. 28 is a schematic diagram of another embodiment of a control method of a display device according to an embodiment of the present application;
fig. 29 is a schematic diagram of hand keypoint coordinates according to an embodiment of the present application;
FIG. 30 is a schematic diagram of different telescopic states of the hand key points according to the embodiment of the present application;
Fig. 31 is a schematic diagram of an application scenario of a control method of a display device according to an embodiment of the present application;
FIG. 32 is a schematic diagram of determining control commands together using gesture information and limb information according to an embodiment of the present application;
Fig. 33 is a flowchart illustrating a control method of a display device according to an embodiment of the present application;
FIG. 34 is a schematic diagram of a mapping relationship provided in an embodiment of the present application;
FIG. 35 is another schematic diagram of a mapping relationship provided in an embodiment of the present application;
FIG. 36 is a schematic diagram of target gesture information and limb information in an image according to an embodiment of the present application;
FIG. 37 is a schematic diagram of a movement position of a target control provided by an embodiment of the present application;
FIG. 38 is another schematic diagram of a movement position of a target control provided by an embodiment of the present application;
fig. 39 is a flowchart illustrating a control method of a display device according to an embodiment of the present application;
FIG. 40 is a flowchart illustrating another method for controlling a display device according to an embodiment of the present application;
FIG. 41 is a schematic diagram of a virtual box according to an embodiment of the present application;
FIG. 42 is a schematic diagram of a correspondence relationship between a virtual frame and a display according to an embodiment of the present application;
FIG. 43 is a schematic diagram of movement of a target control provided by an embodiment of the present application;
FIG. 44 is a schematic view of an area of a virtual frame according to an embodiment of the present application;
FIG. 45 is a schematic view of an edge region according to an embodiment of the present application;
FIG. 46 is a schematic diagram of a gesture information state according to an embodiment of the present application;
FIG. 47 is a diagram of a reestablished virtual box provided by an embodiment of the present application;
FIG. 48 is another diagram of a re-established virtual box provided by an embodiment of the present application;
FIG. 49 is a schematic diagram of a target control provided by an embodiment of the present application when moving;
FIG. 50 is another schematic diagram of a target control according to an embodiment of the present application;
FIG. 51 is a schematic diagram of a control process of a display device according to an embodiment of the present application;
FIG. 52 is a flowchart of another method for controlling a display device according to an embodiment of the present application;
Fig. 53 is a flowchart illustrating an embodiment of a control method of a display device according to an embodiment of the present application.
Detailed Description
For the purposes of making the objects, embodiments and advantages of the present application more apparent, exemplary embodiments of the present application will be described more fully hereinafter with reference to the accompanying drawings in which those exemplary embodiments are shown. It should be understood that the exemplary embodiments described are merely some, but not all, of the embodiments of the application.
Fig. 1 is a schematic diagram of an operation scenario between a display device and a control apparatus according to an embodiment of the present application. As shown in fig. 1, a user may operate the display device 200 through the mobile terminal 300 and the control apparatus 100. The control apparatus 100 may be a remote control, and the communication between the remote control and the display device 200 may use infrared protocol communication, bluetooth protocol communication, or other wireless or wired manners to control the display device 200. The user may control the display device 200 by inputting user instructions through keys on the remote control, voice input, control panel input, etc. In some embodiments, mobile terminals, tablet computers, notebook computers, and other smart devices may also be used to control the display device 200.
In some embodiments, the mobile terminal 300 may install a software application associated with the display device 200 and implement connection communication through a network communication protocol, achieving one-to-one control operation and data communication. The audio/video content displayed on the mobile terminal 300 may also be transmitted to the display device 200. The display device 200 may likewise perform data communication with the server 400 through various communication modes, and may be permitted to make communication connections via a Local Area Network (LAN), a Wireless Local Area Network (WLAN), and other networks. The server 400 may provide various contents and interactions to the display device 200.
The display device 200, in one aspect, may be a liquid crystal display, an OLED display, or a projection display device; in another aspect, the display device may be a smart television or a display system composed of a display and a set-top box. In addition to the broadcast receiving television function, the display device 200 may additionally provide intelligent network television functions with computer support, for example web TV, smart TV, Internet Protocol TV (IPTV), etc. In some embodiments, the display device may not have the broadcast receiving television function.
Fig. 2 is a block diagram illustrating a configuration of a control apparatus 100 according to an embodiment of the present application. As shown in fig. 2, the control device 100 includes a controller 110, a communication interface 130, a user input/output interface 140, a memory, and a power supply. The control apparatus 100 may receive an input operation instruction of a user, and convert the operation instruction into an instruction recognizable and responsive to the display device 200, and may perform an interaction between the user and the display device 200. The communication interface 130 is configured to communicate with the outside, and includes at least one of a WIFI chip, a bluetooth module, NFC, or an alternative module. The user input/output interface 140 includes at least one of a microphone, a touch pad, a sensor, keys, or an alternative module.
Fig. 3 is a block diagram of a hardware configuration of a display device 200 according to an embodiment of the present application. The display apparatus 200 shown in fig. 3 includes at least one of a modem 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a display 260, an audio output interface 270, a memory, a power supply, and a user interface 280. The controller includes a central processor, a video processor, an audio processor, a graphic processor, a RAM, a ROM, and first to nth interfaces for input/output.
The display 260 may be at least one of a liquid crystal display, an OLED display, a touch display, and a projection display, and may also be a projection device and a projection screen.
The communicator 220 is a component for communicating with external devices or servers according to various communication protocol types. For example: the communicator may include at least one of a Wifi module, a bluetooth module, a wired ethernet module, or other network communication protocol chip or a near field communication protocol chip, and an infrared receiver. The display device 200 may establish transmission and reception of control signals and data signals with the external control device 100 or the server 400 through the communicator 220.
A user input interface may be used to receive control signals from the control device 100 (e.g., an infrared remote control, etc.).
The modem 210 receives broadcast television signals in a wired or wireless reception manner, and demodulates audio/video signals, as well as EPG data signals, from among a plurality of wireless or wired broadcast television signals.
The detector 230 is used to collect signals of the external environment or interaction with the outside. For example, detector 230 includes a light receiver, a sensor for capturing the intensity of ambient light; or the detector 230 comprises an image collector 231, such as a camera, which may be used to collect external environmental scenes, user attributes or user interaction gestures.
The external device interface 240 may include, but is not limited to, the following: high Definition Multimedia Interface (HDMI), analog or data high definition component input interface (component), composite video input interface (CVBS), USB input interface (USB), RGB port, etc. The input/output interface may be a composite input/output interface formed by a plurality of interfaces.
The controller 250 and the modem 210 may be located in separate devices, i.e., the modem 210 may also be located in an external device to the main device in which the controller 250 is located, such as an external set-top box.
In some embodiments, the controller 250 controls the operation of the display device and responds to user operations through various software control programs stored on the memory. The controller 250 controls the overall operation of the display apparatus 200. The user may input a user command through a Graphical User Interface (GUI) displayed on the display 260, and the user input interface receives the user input command through the Graphical User Interface (GUI). Or the user may input the user command by inputting a specific sound or gesture, the user input interface recognizes the sound or gesture through the sensor, and receives the user input command.
Fig. 4 is a schematic view of the software configuration in a display device 200 according to an embodiment of the present application. As shown in fig. 4, the system is divided into four layers, namely, an application layer (Application layer), an application framework layer (Application Framework layer), an Android runtime layer (Android runtime) and system library layer (system runtime layer), and a kernel layer. The kernel layer contains at least one of the following drivers: audio driver, display driver, bluetooth driver, camera driver, WIFI driver, USB driver, HDMI driver, sensor driver (e.g., fingerprint sensor, temperature sensor, pressure sensor, etc.), power supply driver, etc.
With the rapid development of display devices, people are not limited to controlling the display devices with a control device, but want to more conveniently control the display devices with only limb movements or voices. The user may control the display device in a gesture-interactive manner. Gesture interactions that can be employed by a display device may include static gestures and dynamic gestures. When interaction of static gestures is used, the display device can detect gesture types according to a gesture type recognition algorithm, and corresponding control actions are executed according to the gesture types.
In order to improve the degree of intelligence of the display device and the user experience, an embodiment of the present application provides a display device. Fig. 5 is a schematic diagram of the display device provided in the embodiment of the present application, and as shown in fig. 5, the display device includes a display 260, an image input interface 501 and a controller 110,
Wherein the display 260 is configured to display an image;
an image input interface 501 configured to acquire a user behavior image;
a controller 110 configured to:
Acquiring a plurality of frames of user behavior images; performing gesture recognition processing on the user behavior image of each frame to obtain target gesture information; and controlling the display to display corresponding content based on the target gesture information.
In order to improve the degree of intelligence of the display device and the user experience, in the embodiment of the present application, the controller 110 may acquire a plurality of frames of user behavior images through the image input interface 501. The user behavior images may include only partial images of the user, for example, images of a gesture made by the user, or may include global images of the user, for example, whole-body images of the user. The acquired plurality of frames of user behavior images may be a video containing the plurality of frames of user behavior images, or an image set containing the plurality of frames of user behavior images.
After acquiring the plurality of frames of user behavior images, the controller 110 may perform gesture recognition processing on each frame of user behavior image to obtain the target gesture information. When performing gesture recognition processing on a user behavior image, the gestures contained in the user behavior image can be recognized based on image recognition technology, and the gestures recognized in each frame of user behavior image can be combined to obtain the target gesture information, that is, each recognized gesture is included in the target gesture information. The recognized gestures can also be classified according to preset gesture types, and the gesture type with the largest number of occurrences is determined as the target gesture information.
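As an illustration of this majority-vote aggregation, the following sketch (hypothetical function and gesture labels, not part of the original disclosure) combines per-frame recognition results into a single target gesture:

```python
from collections import Counter

def aggregate_gestures(per_frame_gestures):
    """Combine per-frame recognition results into target gesture information.

    per_frame_gestures: one gesture label per frame; frames in which no
    gesture was recognized are represented by None.
    Returns the gesture type with the largest number of occurrences,
    or None if no gesture was recognized in any frame.
    """
    recognized = [g for g in per_frame_gestures if g is not None]
    if not recognized:
        return None
    label, _count = Counter(recognized).most_common(1)[0]
    return label

# Example: five frames, one frame without a recognized gesture, "palm" wins.
print(aggregate_gestures(["palm", "palm", None, "fist", "palm"]))  # -> palm
```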
After determining the target gesture information, the controller 110 may control the display 260 to display corresponding content.
In the embodiment of the application, the controller 110 acquires a plurality of frames of user behavior images, determines the target gesture information according to the acquired frames of user behavior images, and performs the corresponding control based on the target gesture information, rather than determining the target gesture information from a single acquired user behavior image. This improves the accuracy of gesture-based display control, improves the degree of intelligence of the display device, and improves the user experience.
A display device refers to a terminal device capable of outputting a specific display screen. With the rapid development of display devices, their functions are becoming more and more abundant and their performance more and more powerful. They can realize bidirectional human-computer interaction and integrate various functions such as video and audio, entertainment, and data, so as to meet users' diversified and personalized requirements.
Gesture interaction is a novel man-machine interaction mode. The gesture interaction aims at controlling the display device to execute corresponding control instructions by detecting specific gesture actions made by a user. Gesture interactions that can be employed by a display device may include static gestures and dynamic gestures. When interaction of static gestures is used, the display device can detect gesture types according to a gesture type recognition algorithm, and corresponding control actions are executed according to the gesture types. In interactions using dynamic gestures, a user may manipulate a cursor in a display to move. The display device can establish the mapping relation between the gesture of the user and the cursor in the display, and can determine the dynamic gesture of the user by continuously detecting the user image, so as to determine the gesture movement track mapped in the display, and further control the cursor to move along the gesture movement track.
For the interactive process of dynamic gestures, the display device needs to constantly detect user images. However, the gesture of the user may not be detected in some images, so that the gesture movement track corresponding to the user images cannot be accurately obtained. The cursor then cannot be controlled to move, and cursor stuttering and interruption occur, resulting in a poor user experience.
When the interaction of the dynamic gestures is used, the display device can detect the dynamic gestures of the user, further determine gesture movement tracks mapped to the display, and control the cursor to move along the gesture movement tracks.
The display device needs to constantly detect user images while the user is controlling cursor movement using dynamic gestures. Each frame of user image is recognized to obtain the user gesture in the image, and the coordinates to which each frame's user gesture maps in the display are then determined, so that the cursor is controlled to move along those coordinates. However, due to numerous factors such as camera shooting errors, nonstandard user gestures, and gesture recognition errors, the display device may fail to recognize the gestures in some of the user images, so that the corresponding coordinates cannot be determined and the corresponding gesture movement track cannot be accurately acquired. Under normal conditions, the cursor needs to move according to the position corresponding to each frame of image to form a continuous motion track. If the position corresponding to an intermediate frame is missing, the cursor cannot move and stuttering occurs; the cursor cannot move continuously until the position corresponding to the next frame is identified, but if the two positions are far apart, the cursor jumps suddenly, which seriously affects the viewing experience of the user.
In some embodiments, to enable the display device to perform a gesture interaction function with a user, the display device further includes an image input interface for connecting to the image collector 231. The image collector 231 may be a camera for collecting some image data. It should be noted that the camera may be an external device, and may be externally connected to the display device through the image input interface, or may be a detector built into the display device. For the camera externally connected to the display equipment, the camera can be connected to an external device interface of the display equipment and connected to the display equipment. The user can complete a photographing or shooting function on the display device by using the camera, thereby acquiring image data.
The camera may further include a lens assembly having a photosensitive element and a lens disposed therein. The lens can make the light of the image of the scenery irradiate on the photosensitive element through the refraction action of a plurality of lenses on the light. The photosensitive element may use a detection principle based on a CCD (Charge-Coupled Device) or a CMOS (Complementary Metal Oxide Semiconductor) according to the specification of the camera, convert an optical signal into an electrical signal through a photosensitive material, and output the converted electrical signal as image data. The camera can also acquire image data frame by frame according to a set sampling frequency so as to form video stream data from the image data.
In some embodiments, a camera built into the display device may also support lifting. The camera can be arranged on the lifting mechanism, and when image acquisition is needed, the lifting mechanism is controlled to move through a specific lifting instruction, so that the camera is driven to rise, and the image acquisition is carried out. When image acquisition is not needed, the lifting mechanism can be controlled to move through a specific lifting instruction, so that the camera is driven to be lowered, and the camera is hidden. Fig. 6a is a schematic diagram of a built-in camera of a display device according to an embodiment of the present application.
For the image collector 231 externally connected to the display device, the image collector can be a separate peripheral device and is connected with the display device through a specific data interface. For example, as shown in fig. 6b, the image collector 231 may be a stand-alone camera device, and the display device may be provided with a universal serial bus interface (Universal Serial Bus, USB) or a high-definition multimedia interface (High Definition Multimedia Interface, HDMI), and the image collector 231 is connected to the display device through the USB interface or the HDMI interface. To facilitate detection of gesture interactions by a user, in some embodiments, the image collector 231 external to the display device may be positioned near the display device, such as with the image collector 231 clamped to the top of the display device by a clamping device, or with the image collector 231 placed on a desktop near the display device.
Obviously, for the image collector 231 externally connected to the display device, other modes of connection can be supported according to the specific hardware configuration of the display device. In some embodiments, the image collector 231 may also establish a connection relationship with the display device through a communicator of the display device, and send the collected image data to the display device according to a data transmission protocol corresponding to the communicator. For example, the display device may be connected to the image collector 231 through a local area network or the internet, and after the network connection is established, the image collector 231 may transmit the collected data to the display device through a network transmission protocol.
In some embodiments, the image collector 231 may also be externally connected to the display device by way of a wireless network connection. For example, for a display device supporting a WiFi wireless network, a WiFi module is provided in its communicator; thus, the display device and the image collector 231 can establish a wireless connection by connecting both to the same wireless network. After the image data is collected by the image collector 231, the image data may be sent to a router device of the wireless network and forwarded to the display device by the router device. Obviously, the image collector 231 may also be connected to the display device by other wireless connection means. The wireless connection modes include but are not limited to WiFi direct connection, cellular network, analog microwave, bluetooth, infrared and the like.
In some embodiments, the display device may display a user interface after the user controls the display device to power on. Fig. 7 is a schematic diagram of a user interface according to an embodiment of the present application. The user interface includes a first navigation bar 700, a second navigation bar 710, a function bar 720, and a content display area 730. The function bar 720 includes a plurality of function controls such as "watch record", "my favorites" and "my applications". The content displayed in the content display area 730 changes as the selected control in the first navigation bar 700 and the second navigation bar 710 changes. The user can select a control by touching it to control the display device to display the panel corresponding to that control. It should be noted that the user may also input the selection of a control in other ways, for example, by selecting the control using a voice control function or a search function.
Whether the image collector 231 is built into the display device or externally connected to it, a user can start the image collector 231 to collect image data through a specific interaction instruction or through application program control in the process of using the display device, and the collected image data is processed according to different requirements. For example, camera applications may be installed in the display device, which can invoke the camera to implement their respective associated functions. A camera application refers to an application that needs to access the camera and can process the image data collected by the camera, so as to realize related functions such as video chat. The user can view all applications installed in the display device by touching the "my applications" control. A list of applications may be shown in the display. When the user chooses to open a certain camera application, the display device may run the corresponding camera application, the camera application may wake up the image collector 231, and the image collector 231 may then detect image data in real time and send the image data to the display device. The display device may further process the image data, for example, control the display to display an image, etc.
In some embodiments, the display device may interact with a user gesture to identify a control command of the user. The user may interact with the display device using static gestures to input control instructions. Specifically, in the gesture interaction process, the user may put out a specific gesture within the shooting range of the image collector 231, and the image collector 231 may collect the gesture image of the user and send the collected gesture image to the display device. The display device may further identify a gesture image, and detect a type of gesture in the image. The gesture interaction strategy can be pre-stored in the display device, the control instruction corresponding to each type of gesture can be defined, one gesture type can correspond to one control instruction, and the display device can set gestures for triggering specific control instructions according to different purposes. And comparing the type of the gesture in the image with the corresponding relation in the interaction strategy one by one, determining a control instruction corresponding to the gesture, and implementing the control instruction.
For example, when the user puts out a gesture in which the five fingers are closed within the shooting range of the image collector 231 and the palm faces the image collector 231, the display device may recognize the gesture in the gesture image collected by the image collector 231 and determine that the gesture corresponds to a "pause/start play" control instruction. Finally, by running the control instruction, pause or start-play control is executed on the current playing interface.
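A minimal sketch of such an interaction strategy is shown below; the gesture names and instruction strings in the lookup table are illustrative assumptions rather than the mapping defined by this application:

```python
# Hypothetical static-gesture interaction strategy: gesture type -> control
# instruction. The entries are illustrative, not the application's own list.
GESTURE_POLICY = {
    "palm_open": "pause_or_start_play",
    "thumb_up": "volume_up",
    "thumb_down": "volume_down",
    "fist": "back",
}

def dispatch_static_gesture(gesture_type, execute):
    """Compare the recognized gesture type against the policy and, if a
    control instruction is defined for it, execute that instruction."""
    instruction = GESTURE_POLICY.get(gesture_type)
    if instruction is not None:
        execute(instruction)
    return instruction

dispatch_static_gesture("palm_open", execute=print)  # prints pause_or_start_play
```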
In the above embodiment, gesture recognition adopts a static gesture recognition mode, and static gesture recognition can recognize the gesture type to determine the corresponding control instruction. Each time the user presents a static gesture, an independent control command is input, such as adjusting the volume by one level. It should be noted that even when the user keeps a static gesture for a long time, the display device still determines that the user has input a single control instruction. Therefore, for control instructions requiring continuous operation, a static gesture interaction manner is too cumbersome.
For example, when a user wants to control a focus in a display to select a control, the focus may be moved down, right, and down in sequence. At this time, the user needs to continuously change the static gesture to control the focus to move, which results in poor user experience. Or if the focus is required to be continuously moved a plurality of times in one direction, the user is required to continuously make a static gesture. Because the user is determined to input a control command even if the user holds a static gesture for a long time, the user needs to put down his/her hand after making a static gesture and then make the static gesture again, thereby affecting the use experience.
In some embodiments, the display device may also support dynamic gesture interaction. A dynamic gesture means that the user can input control instructions to the display device by way of dynamic gesture input during interaction. It can be set that: a control instruction can be input to the display device through a series of dynamic gestures, different types of control instructions can be sequentially input to the display device through different types of gestures, and the same type of control instruction can be continuously input to the display device through gestures of the same type, thereby expanding the gesture interaction types of the display device and enriching the forms of gesture interaction.
For example, when the user adjusts the gesture from opening of the five fingers to closing of the five fingers within 2s, that is, inputs a grabbing action lasting 2s, the display device may continuously acquire gesture images within a detection period of 2s, and recognize gesture types in the gesture images frame by frame, so as to recognize the grabbing action according to gesture changes in multiple frame images. And finally, determining a control instruction corresponding to the grabbing action, namely 'full screen/window playing', executing the control instruction, and adjusting the size of a playing window.
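A possible sketch of recognizing such a grab action from frame-by-frame gesture types is given below; the gesture labels, window length and frame interval are assumptions for illustration only:

```python
def detect_grab(gesture_sequence, window_seconds=2.0, frame_interval=1 / 30):
    """Detect a 'grab' action: an open palm changing to a closed fist within
    the detection window. gesture_sequence holds the gesture type recognized
    frame by frame (labels here are illustrative)."""
    max_frames = int(window_seconds / frame_interval)
    recent = gesture_sequence[-max_frames:]
    try:
        open_idx = recent.index("five_fingers_open")
    except ValueError:
        return False
    return "five_fingers_closed" in recent[open_idx + 1:]

# Example: the user opens the hand, then closes it within the 2 s window.
frames = ["five_fingers_open"] * 10 + ["five_fingers_closed"] * 5
print(detect_grab(frames))  # True -> execute "full screen/window play"
```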
In some embodiments, when a user interface is displayed in the display device, the user may control the focus in the display to select a control and trigger. As shown in fig. 7, the current focus has selected the my applications control. Considering that the user may be complicated when using the control device to control the movement of the focus, in order to increase the experience of the user, the user may also select the control by using a dynamic gesture.
The display device may be provided with a cursor control mode. When the display device is in the cursor control mode, the original focus in the display may be changed to a cursor; as shown in fig. 8, the cursor selects the "My Applications" control. The user can control the cursor to move with gestures, so as to select a certain control, replacing the original focus movement.
In some embodiments, the user may send a cursor control mode instruction to the display device by operating a designated key of the remote control. And pre-binding the corresponding relation between the cursor control mode instruction and the remote controller key in the actual application process. For example, a cursor control mode key is set on the remote controller, when the user touches the key, the remote controller sends a cursor control mode instruction to the controller, and at this time, the controller controls the display device to enter the cursor control mode. When the user touches the key again, the controller may control the display device to exit the cursor control mode.
In some embodiments, the correspondence between the cursor control mode command and the plurality of remote controller keys may be pre-bound, and when the user touches the plurality of keys bound to the cursor control mode command, the remote controller issues the cursor control mode command.
In some embodiments, the user may send a cursor control mode instruction to the display device by way of voice input using a sound collector of the display device, such as a microphone, causing the display device to enter a cursor control mode.
In some embodiments, the user may also send cursor control mode instructions to the display device through a preset gesture or action. The display device may detect the user's behavior in real time through the image collector 231. When the user makes a preset gesture or action, the user may be considered to have sent a cursor control mode instruction to the display device.
In some embodiments, the cursor control mode instruction may also be sent to the display device when the user controls the display device using a smart device, such as a cell phone. In the practical application process, a control can be set in the mobile phone, whether the mobile phone enters a cursor control mode or not can be selected through the control, and therefore a cursor control mode instruction is sent to the display device.
In some embodiments, a cursor control mode option may be set in a UI interface of the display device, which when clicked by a user, may control the display device to enter or exit cursor control mode.
In some embodiments, to prevent the user from triggering the cursor control mode by mistake, when the controller receives the cursor control mode instruction, the display may be controlled to display cursor control mode confirmation information, so that the user performs a secondary confirmation of whether to control the display device to enter the cursor control mode. Fig. 9 is a schematic diagram of displaying cursor control mode confirmation information in a display according to an embodiment of the present application.
When the display device enters a cursor control mode, a user can control the cursor to move by utilizing gestures, so that a control to be triggered is selected.
Fig. 10 is an interaction flow chart of each component of the display device provided by the embodiment of the application, which includes the following steps:
S1001: And acquiring a user behavior image.
In some embodiments, when the display device is detected to enter the cursor control mode, the controller may wake up the image collector 231, send an on instruction to the image collector 231, and thus start the image collector 231 to take an image. At this time, the user may make a dynamic gesture within the shooting range of the image collector 231, and the image collector 231 may continuously shoot multiple frames of user images along with the dynamic gesture of the user.
Specifically, the image collector 231 may capture a user behavior image at a preset frame rate, for example, 30 frames per second (30 FPS) of the user behavior image. Meanwhile, the image collector 231 may also transmit each photographed frame of the user behavior image to the display device in real time. Note that, since the image collector 231 transmits the photographed user behavior image to the display device in real time, the rate at which the display device acquires the user behavior image may be the same as the photographing frame rate of the image collector 231.
For example, when the image collector 231 performs image capturing at a frame rate of 30 frames per second, the controller may also acquire a user behavior image at a frame rate of 30 frames per second.
In some embodiments, the image collector 231 collects several frames of user behavior images, which may be sequentially transmitted to the display device. The display device can recognize each frame of user behavior image successively, so that user gestures contained in the user behavior image are recognized, and control instructions input by a user are determined.
S1002: for the collected user behavior images, the controller performs gesture recognition processing on the user behavior images, for example, a preset dynamic gesture recognition model can be used for processing each frame of user behavior images successively.
The controller may input the user behavior image into a dynamic gesture recognition model, where the dynamic gesture recognition model may further recognize a user gesture included in the image, for example, may recognize position information of key points such as a finger, a joint, and a wrist included in the user behavior image, where the key point position refers to position coordinates of the key point in the user behavior image. After recognition, the target gesture information for each frame of user behavior image may be output in turn.
S1003: and acquiring the cursor position according to the gesture information of the user.
S1004: and determining a gesture movement track according to the cursor position.
S1005: the controller controls the cursor to move so that the display displays that the cursor moves along the gesture movement track.
FIG. 11 is a schematic diagram of a user gesture according to an embodiment of the present application. It can be set that the key points used to characterize the user gesture include 21 finger key points. The dynamic gesture recognition model can confirm the user gesture in the user behavior image and recognize the position information of the 21 finger key points of the user's hand, namely their position coordinates in the user behavior image; the position information of each key point can be represented by the coordinates of the corresponding point.
It should be noted that, when the dynamic gesture recognition model recognizes the user behavior image, the user gesture may be recognized, and the position information of each finger key point is obtained. At this time, the output target gesture information may include position information of all finger key points. However, due to the influence of different gestures of the user, some finger key points may be covered by the user, so that the finger key points do not appear in the user behavior image, and at this time, the dynamic gesture recognition model cannot acquire the position information of the finger key points, and the position information of the finger key points can only be null. That is, the target gesture information includes the position information of the finger key points recognized by the dynamic gesture recognition model, and the position information of the finger key points not recognized is a null value.
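One way the per-frame target gesture information could be represented is sketched below, assuming (x, y) pixel coordinates for recognized key points and None (null) for occluded ones; the helper name is hypothetical:

```python
NUM_HAND_KEYPOINTS = 21

def make_target_gesture_info(detections):
    """Build per-frame target gesture information: a list of 21 entries,
    each an (x, y) coordinate in the user behavior image for a recognized
    key point, or None for a key point that was occluded / not recognized.

    detections: dict mapping key-point index -> (x, y).
    """
    return [detections.get(i) for i in range(NUM_HAND_KEYPOINTS)]

# Example frame: key point No. 9 (later used as the cursor control point)
# and the wrist (No. 0) were recognized; the thumb tip (No. 4) was occluded.
frame_info = make_target_gesture_info({9: (412, 230), 0: (390, 310)})
print(frame_info[9])  # (412, 230)
print(frame_info[4])  # None -> position information is a null value
```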
In some embodiments, the dynamic gesture recognition model may output to the controller after obtaining the target gesture information for each frame. The controller may further determine a control instruction indicated by the user according to the target gesture information of each frame. Since the user wants to control the cursor to move, the control instruction indicated by the user can be regarded as a position instruction indicating that the cursor needs to move. At this time, the controller may acquire a cursor position of each frame according to the target gesture information of each frame.
In some embodiments, given that the computing power of the display device may be weak, the display device may be in a higher load state if the display device is currently in a state that performs some other function, such as far-field speech, 4K video playback, etc. At this time, if the frame rate of the user behavior image input into the dynamic gesture recognition model is high, the real-time data processing amount is too large, and the rate of the model processing the user behavior image may be slow, so that the rate of acquiring the cursor position is slow, which results in a relatively stuck cursor moving in the display.
Thus, the controller may first detect the current load rate of the display device. When the load rate is higher than a preset threshold, for example, higher than 60%, the controller may cause the dynamic gesture recognition model to process the user behavior images at regular intervals of a fixed period, for example, processing 15 frames of images per second, so that the dynamic gesture recognition model can process the images stably. When it is detected that the load rate of the display device is not higher than the preset threshold, the dynamic gesture recognition model can process each frame of user behavior image in real time. At this time, the controller may input the user behavior images transmitted from the image collector 231 into the dynamic gesture recognition model in real time and control the model to perform recognition. Alternatively, the dynamic gesture recognition model may still be operated at regular intervals of a fixed period.
It should be noted that the rate at which the dynamic gesture recognition model outputs the target gesture information and the rate at which the user behavior image is processed may be the same. When the dynamic gesture recognition model processes an image at regular intervals of a fixed period, it outputs target gesture information at regular intervals of a fixed period. The model also outputs target gesture information in real time as it processes the image in real time.
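A rough sketch of this load-based throttling is given below; the 60% threshold and 15 frames per second come from the example above, while get_load_rate, recognize_frame and frame_queue are hypothetical stand-ins:

```python
import time

LOAD_THRESHOLD = 0.60          # example threshold from the description
THROTTLED_INTERVAL = 1.0 / 15  # process 15 frames per second under high load

def process_frames(frame_queue, get_load_rate, recognize_frame):
    """Feed user behavior images to the recognition model, in real time when
    the load rate is low, and at a fixed period when the load rate is high."""
    last_processed = 0.0
    for frame in frame_queue:
        if get_load_rate() > LOAD_THRESHOLD:
            now = time.monotonic()
            if now - last_processed < THROTTLED_INTERVAL:
                continue  # high load: skip frames arriving inside the period
            last_processed = now
        yield recognize_frame(frame)  # target gesture information per frame
```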
In some embodiments, in order to enable the cursor displayed in the display to generate a real-time motion track according to the dynamic gesture of the user, so that the cursor smoothly follows the dynamic gesture, the controller may determine the cursor position of each frame according to the information indicated by the gesture by the user.
When a user controls the cursor with gestures, among the gesture images of user motion continuously captured over a period of time, some frames may be blurred or the gesture may be blocked. In this case, the dynamic gesture recognition model cannot produce a recognition result and the related information of the target gesture cannot be obtained; for example, the target gesture information is null. The information indicated by the user then cannot be obtained from the target gesture information, that is, the cursor position cannot be obtained. The display device can therefore predict the cursor position corresponding to that frame image, avoiding the situation where, because a cursor position is missing, the cursor does not move and stutters, and the track following the user's gesture is interrupted or lost.
The display device may determine whether the information indicated by the user can be obtained according to the target gesture information obtained by the dynamic gesture recognition model, for example, the position information of the finger key points shown in fig. 11. When the result of the dynamic gesture recognition model is null, that is, the target gesture information is null, the cursor position prediction can be performed.
In the embodiment of the present application, it may be set as follows: when a preset target gesture is detected, the user is considered to be indicating position information for cursor movement. The target gesture may be the user showing a preset finger key point. For the user gesture schematic diagram shown in fig. 11, key point No. 9 may be set as the control point used by the user to indicate cursor movement; that is, when the position information of the preset finger key point is detected, it is determined that the user is indicating movement of the cursor. The display device can then determine the position information for the cursor movement according to the position information of the preset finger key point.
Therefore, when the position information of the preset finger key point is detected in the target gesture information, the position information of the cursor movement can be acquired. In the embodiment of the application, the virtual position information refers to the position information of the preset finger key points, namely the position information of the target gestures in the user behavior image.
In some embodiments, the display device may detect whether virtual location information is included in each frame of target gesture information. If a certain frame of target gesture information comprises virtual position information, namely the position information of a preset finger key point is identified, the frame of target gesture information is considered to be detected in the user behavior image, namely the user specifically indicates how the cursor moves. At this time, the display device may determine the position information of the cursor to be moved according to the virtual position information.
If the virtual position information is not included in the target gesture information of a certain frame, namely the position information of the preset finger key point is null, the target gesture is not detected in the user behavior image of the frame, at the moment, the user does not specifically indicate how the cursor should move, and the display device needs to self-predict the position information of the supplemental cursor which needs to move.
Fig. 12 is a schematic flow chart of determining a cursor position according to target gesture information according to an embodiment of the present application, which is described below, and includes the following steps:
S1201: judging whether the target gesture information includes virtual position information; if yes, execution proceeds to S1202; otherwise, execution proceeds to S1204.
S1202: and acquiring an initial cursor position according to the virtual position information.
S1203: the initial cursor position is adjusted.
S1204: the cursor position is predicted.
In some embodiments, for a frame of user behavior image, the controller may obtain the position information to which the cursor needs to be moved separately for the two cases of whether or not the target gesture is detected.
If the target gesture is detected, that is, the frame of target gesture information contains virtual position information, at this time, position information of a cursor to be moved, that is, a cursor position corresponding to a user behavior image, can be obtained according to the virtual position information.
Specifically, the virtual position information is characterized by the position information of the preset finger key point identified in the user behavior image and is used to represent the position of the user's target gesture. However, this position information is the position of the finger key point in the user behavior image, so the display device maps the user's target gesture into the display to obtain the position of the cursor. When mapping the user's target gesture into the display, the initial position of the cursor can be used as a reference: when the user's target gesture is detected for the first time, the position of the finger key point in that frame image is bound to the initial position of the cursor, forming a mapping relationship. In subsequent mapping, the user's subsequent target gestures are mapped into the display in sequence according to a preset mapping method, so as to obtain the cursor position corresponding to each frame of image.
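A possible sketch of such a mapping is shown below; it assumes an offset-based mapping scaled from image resolution to display resolution, which is only one way to realize the preset mapping method mentioned above:

```python
class GestureToCursorMapper:
    """Map finger key-point coordinates in the user behavior image to cursor
    coordinates in the display. On the first detection the key point is bound
    to the cursor's initial position; later frames are mapped by their offset
    from that anchor, scaled from image size to display size."""

    def __init__(self, display_size, image_size, initial_cursor):
        self.sx = display_size[0] / image_size[0]
        self.sy = display_size[1] / image_size[1]
        self.initial_cursor = initial_cursor
        self.anchor = None  # key-point position at the first detection

    def map(self, keypoint):
        if self.anchor is None:
            self.anchor = keypoint
            return self.initial_cursor
        dx = (keypoint[0] - self.anchor[0]) * self.sx
        dy = (keypoint[1] - self.anchor[1]) * self.sy
        return (self.initial_cursor[0] + dx, self.initial_cursor[1] + dy)

# Example: 1920x1080 display, 640x360 camera image, cursor starting at center.
mapper = GestureToCursorMapper((1920, 1080), (640, 360), (960, 540))
print(mapper.map((320, 180)))  # first detection -> (960, 540)
print(mapper.map((330, 180)))  # 10 px right in the image -> (990.0, 540.0)
```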
In some embodiments, after the position information of the cursor is obtained, considering that the user's gesture movement is three-dimensional, the movement direction is not only up, down, left and right, but also forward and backward in the air. In the cursor mapping process, if the gesture moves frequently and the gesture state is unstable, problems such as cursor shaking can occur. Therefore, to make the movement of the cursor smoother and the user experience better, the display device can also adjust and optimize the position of the cursor, so that the cursor is dynamically stabilized against shaking and its movement track is smooth and stable.
The display device may map the target gesture in the target user behavior image to the display according to the virtual position information, to obtain the original cursor position F_c. The original cursor position in the embodiment of the application refers to: the coordinates identified by the dynamic gesture recognition model mapped directly to coordinates in the display. The original cursor position is adjusted and optimized to obtain the target cursor position; in the embodiment of the application, the target cursor position refers to: the coordinate position at which the cursor is actually displayed in the display after adjustment and optimization.
Specifically, the display device may adjust the original cursor position according to the following method:
The display device may obtain a first position value according to the cursor position F_p corresponding to the previous frame of user behavior image (the frame before the target user behavior image) and a preset adjustment threshold, and may obtain a second position value according to the original cursor position and the preset adjustment threshold. The target cursor position F_c1 corresponding to the target user behavior image is then obtained from the first position value and the second position value. This can be expressed by formula (1):

F_c1 = E_1 * F_p + (1 - E_1) * F_c    (1)

wherein:

F_c1 denotes the adjusted target cursor position;

E_1 denotes the preset adjustment threshold;

F_c denotes the original cursor position before adjustment, and F_p denotes the cursor position corresponding to the previous frame of user behavior image.
Through a preset adjustment threshold value, the original cursor position can be adjusted according to the cursor position corresponding to the previous frame of image, so that the shake offset possibly occurring in the target gesture of the frame is reduced, and the movement of the cursor is optimized.
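Formula (1) amounts to an exponential-smoothing step; a small sketch applying it to (x, y) coordinates is given below (the values in the example are illustrative):

```python
def adjust_cursor_position(raw_position, previous_position, e1):
    """Apply formula (1): F_c1 = E_1 * F_p + (1 - E_1) * F_c.

    raw_position:      F_c, cursor position mapped directly from the gesture.
    previous_position: F_p, cursor position of the previous frame.
    e1:                E_1, preset adjustment threshold between 0 and 1.
    """
    x = e1 * previous_position[0] + (1 - e1) * raw_position[0]
    y = e1 * previous_position[1] + (1 - e1) * raw_position[1]
    return (x, y)

# Example: with E_1 = 0.6, a jittery detection at (520, 300) is pulled back
# toward the previous cursor position (500, 290), giving roughly (508, 294).
print(adjust_cursor_position((520, 300), (500, 290), 0.6))
```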
The adjustment threshold may be preset according to the following method:
wherein:
E_1 denotes the preset adjustment threshold.
K represents a first adjustment parameter; g represents a second adjustment parameter; the first adjusting parameter and the second adjusting parameter are all numbers between 0 and 1, and can be set by the related technicians.
S_g denotes the size of the target user behavior image. The size of the user behavior image refers to the size of the user behavior image relative to the display.
Specifically, the display device may display the captured user behavior image in the display, so that the user can intuitively determine the current gesture situation. Fig. 13 is a schematic diagram of a display displaying a camera area according to an embodiment of the present application. The camera area displays the picture condition shot by the camera, and the size of the whole camera area can be set by the display equipment. The user may choose to turn on or off the camera area, but when the camera area is turned off, its size is set to be the same as when it is turned on.
S_c denotes the size of the control at the cursor position corresponding to the previous frame of user behavior image (the frame before the target user behavior image). For each cursor movement, the cursor may be considered to have selected a control. Thus, the adjustment threshold can be set according to the control selected by the cursor in the previous frame.
S_tv denotes the size of the display.
After the original cursor position is adjusted, the target cursor position corresponding to the target user behavior image, namely the position to which the cursor needs to be moved, can be determined.
In some embodiments, if the target gesture of the user is not detected in the target user behavior image, that is, the frame of target gesture information does not include virtual position information, at this time, the display device may predict the cursor position corresponding to the target user behavior image, so that the cursor can normally move.
Specifically, in order to better predict the position of the cursor, the display device may first determine the type of cursor movement. It should be noted that cursor movement can be classified into two types: linear motion and curvilinear motion. When the cursor moves along a straight line, the gesture of the user also moves along a straight line, so the gesture motion is relatively stable and frame loss generally does not occur when the images are captured. However, when the cursor moves along a curve, the gesture of the user also moves along a curve; compared with linear motion, the stability is poorer and the frame loss rate is slightly higher. Therefore, a threshold for detecting frame loss can be preset to determine whether the cursor is moving linearly or along a curve.
The display device may detect, within a preset detection number of user behavior images (for example, the previous 20 frames), whether the number of user behavior images in which the target gesture of the user is not detected exceeds a preset detection threshold; the detection threshold may be set to 0.
Therefore, it can be detected whether the number of images in which frame loss occurs in the previous 20 frames is greater than 0, that is, whether any frame loss occurs in the previous 20 frames. If no frame loss occurs, the cursor is considered to be in linear motion, referred to as the first type of motion in the embodiment of the application; if frame loss occurs, the cursor is considered to be in curvilinear motion, referred to as the second type of motion in the embodiment of the application.
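A minimal sketch of this classification, assuming a sliding window of per-frame flags indicating whether the target gesture was detected, with window size 20 and detection threshold 0 as in the example above; the names are illustrative:

```python
def classify_cursor_motion(gesture_detected_flags, window=20, threshold=0):
    """Return 'linear' (first type) if no frame loss occurred in the last
    `window` frames, otherwise 'curve' (second type)."""
    recent = gesture_detected_flags[-window:]
    lost_frames = sum(1 for detected in recent if not detected)
    return "linear" if lost_frames <= threshold else "curve"

# Example: one lost frame among the last 20 -> curve motion
flags = [True] * 12 + [False] + [True] * 7
print(classify_cursor_motion(flags))  # curve
```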
In some embodiments, the display device may perform a first process on the target user behavior image when it is detected that the cursor is moving linearly, thereby predicting the target cursor position.
Fig. 14 is a schematic diagram of a cursor moving along a straight line according to an embodiment of the present application. The initial position of the cursor is A1, and the acquired cursor positions are A2, A3 and A4 in sequence. The cursor moves along the straight line, and A5 is the predicted target cursor position of the frame.
Specifically, the controller may obtain the historical cursor position offset according to the cursor positions corresponding to the two frames of user behavior images preceding the target user behavior image, where the historical cursor position offset is used to characterize the previous movement of the cursor.
The controller may obtain the cursor movement speed based on the historical cursor position offset and the first time. The first time refers to the time interval taken by the preset dynamic gesture recognition model to process the two frames of user behavior images preceding the target user behavior image. In general, the time taken by the dynamic gesture recognition model to process an image frame is fixed; therefore, the first time can also be regarded as the interval between the times at which the dynamic gesture recognition model outputs the target gesture information corresponding to those two frames of user behavior images.
It should be noted that, when the dynamic gesture recognition model processes images at fixed regular intervals, the first time is a fixed value and does not need to be acquired every time. When the dynamic gesture recognition model processes images in real time, the interval between the times at which the model outputs the recognition results of the previous two frames of images needs to be acquired in real time.
The controller can acquire the target cursor position offset of the cursor according to the cursor movement speed, the second time and a preset first prediction threshold. The second time is the time interval taken by the preset dynamic gesture recognition model to process the target user behavior image and the previous frame of user behavior image, that is, the interval from the moment when the model outputs the recognition result of the previous frame of image to the moment when the model outputs the recognition result of the current frame of image. The controller can thus predict the movement of the cursor.
Finally, the controller can sum the coordinate position corresponding to the previous frame of user behavior image and the target cursor position offset; that is, the target cursor position is obtained by applying the offset to the cursor position of the previous frame.
The prediction method can be expressed by formulas 3 and 4:
F_0 = v * Δt_0 * E_2 + F_0-1   (3)
v = (F_0-1 - F_0-2) / Δt   (4)
wherein:
F_0 denotes the target cursor position; v represents the current cursor movement speed; Δt_0 represents the second time;
E_2 represents the preset first prediction threshold;
F_0-1 represents the coordinate position corresponding to the previous frame of user behavior image;
F_0-2 represents the coordinate position corresponding to the frame before the previous frame of user behavior image; Δt represents the first time.
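A minimal sketch of the linear-motion prediction in equations (3) and (4), assuming the two preceding cursor positions, the two time intervals and the first prediction threshold E_2 are already known; all names are illustrative:

```python
def predict_linear(f_prev, f_prev2, dt, dt0, e2):
    """Predict the target cursor position for a lost frame under linear motion.
    f_prev  -- cursor position of the previous frame (F_0-1)
    f_prev2 -- cursor position of the frame before that (F_0-2)
    dt      -- first time: interval between the two previous recognition results
    dt0     -- second time: interval between the previous and the current result
    e2      -- preset first prediction threshold E_2
    """
    # v = (F_0-1 - F_0-2) / Δt, per coordinate axis        (equation 4)
    vx = (f_prev[0] - f_prev2[0]) / dt
    vy = (f_prev[1] - f_prev2[1]) / dt
    # F_0 = v * Δt_0 * E_2 + F_0-1                          (equation 3)
    return (vx * dt0 * e2 + f_prev[0],
            vy * dt0 * e2 + f_prev[1])

# Example: cursor moved from (100, 100) to (110, 104) in 33 ms; predict the next point
print(predict_linear((110, 104), (100, 100), 0.033, 0.033, 0.6))  # (116.0, 106.4)
```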
The first prediction threshold may be preset according to the following method:
wherein:
E_2 represents the first prediction threshold, which may take a value of 0.6. a1 represents a first prediction parameter; a2 represents a second prediction parameter. The first prediction parameter and the second prediction parameter are both numbers between 0 and 1 and can be set by the relevant technicians.
D_f represents the rate at which the preset dynamic gesture recognition model processes the user behavior images within a preset time.
C_f represents the rate at which the image collector 231 collects the user behavior images within the preset time.
P_f represents the frame rate of cursor movement within the preset time. The frame rate of cursor movement refers to the frequency of cursor movements, i.e. how many times the cursor moves per unit time, where one movement takes the cursor from one cursor position to the next.
Specifically, the preset time may be 1 s. It is thus possible to acquire, within the one second preceding the target user behavior image, the rate at which the model processes images, the rate at which the image collector 231 collects images, and the frame rate of cursor movement. The first prediction threshold can then be set accordingly.
According to the above formula, the position coordinates of the cursor under linear motion can be predicted.
In some embodiments, the display device may perform a second processing on the target user behavior image when it is detected that the cursor is doing a curvilinear motion, thereby predicting the target cursor position.
Fig. 15 is a schematic diagram of a cursor moving along a curve according to an embodiment of the present application. The initial position of the cursor is A1, and the acquired cursor positions are A2-A9 in sequence. The first frame loss occurs in the image corresponding to the cursor position A4; since this is only the first frame loss, the current movement of the cursor (the movement between A1 and A4) is still recognized as linear motion. The positions A5 and A6 are coordinates mapped according to the target gesture of the user. The second frame loss occurs in the image corresponding to the cursor position A7, so the current movement of the cursor (the movement between A5 and A7) is considered to be along a curve, and the cursor position A7 is obtained by prediction. The positions A8 and A9 are coordinates mapped according to the target gesture of the user. Finally, frame loss occurs in the target user behavior image, which is the third frame loss within the whole preset detection number; at this time, the cursor is considered to be moving along a curve (the movement between A8 and A10), and the cursor position A10 can be predicted.
In other words, once the second frame loss occurs, the cursor is considered to be moving along a curve, and the cursor position corresponding to that frame is obtained by prediction.
In addition, when the cursor is in curvilinear motion, the method for predicting the cursor position is similar to that for linear motion. The previous cursor movement, namely the historical cursor position offset, can be acquired first.
The cursor movement speed is then acquired according to the historical cursor position offset and the first time, and the target cursor position offset of the cursor is acquired according to the cursor movement speed, the second time and a preset second prediction threshold.
Finally, the controller can subtract the target cursor position offset from the coordinate position corresponding to the previous frame of user behavior image; that is, the target cursor position is obtained by applying the offset to the cursor position of the previous frame.
The specific prediction method can be expressed by formulas 6 and 7:
F_0 = F_0-1 - v * Δt_0 * E_3   (6)
v = (F_0-1 - F_0-2) / Δt   (7)
wherein:
E_3 represents the second prediction threshold, which may take a value of 0.3.
Specifically, the second prediction threshold may be preset according to the following method:
E_3 = b * E_2   (8)
Wherein b represents a third prediction parameter. The third prediction parameter is a number between 0 and 1, which can be set by the skilled person and can be 0.5.
According to the above formula, the position coordinates of the cursor under the curve motion can be predicted.
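Similarly, a minimal sketch of the curve-motion prediction in equations (6) to (8); here the offset is subtracted from the previous cursor position and the second prediction threshold is E_3 = b * E_2, with b = 0.5 as in the example above. The names are illustrative:

```python
def predict_curve(f_prev, f_prev2, dt, dt0, e2, b=0.5):
    """Predict the target cursor position for a lost frame under curve motion."""
    e3 = b * e2                                    # E_3 = b * E_2       (equation 8)
    # v = (F_0-1 - F_0-2) / Δt                                           (equation 7)
    vx = (f_prev[0] - f_prev2[0]) / dt
    vy = (f_prev[1] - f_prev2[1]) / dt
    # F_0 = F_0-1 - v * Δt_0 * E_3                                       (equation 6)
    return (f_prev[0] - vx * dt0 * e3,
            f_prev[1] - vy * dt0 * e3)

# Example with the same history as the linear case and E_2 = 0.6
print(predict_curve((110, 104), (100, 100), 0.033, 0.033, 0.6))  # (107.0, 102.8)
```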
In some embodiments, considering that frame loss may occur in multiple consecutive frames of user behavior images, a continuous frame loss threshold may be preset, which may be 4. Within this threshold, if the user behavior images continue to exhibit frame loss, the display device may continue to predict the position of the cursor.
Specifically, before gesture recognition is performed on the target user behavior image of the current frame, the preset number of user behavior images preceding the current frame may be checked first, that is, whether the target gesture is detected in none of the previous 4 frames of user behavior images; in other words, whether all of the previous 4 frames of the target user behavior image are lost frames.
If so, the user may be considered to no longer be indicating the cursor position with a gesture; the user may have put down the hand, having already determined the control that the cursor should select. At this time, the cursor can be controlled not to move, and the gesture movement of the user is considered to have ended. The next round of gesture recognition can be performed once the camera captures the gesture of the user again.
If not, the user is considered to still be indicating the cursor position with a gesture, and the preceding frames were simply lost for some reason. At this time, the controller may continue gesture recognition on the target user behavior image and determine the cursor position corresponding to the current frame image.
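A minimal sketch of this decision, assuming a continuous frame loss threshold of 4 and a list of detection flags for the frames preceding the current one; the names are illustrative:

```python
def should_stop_cursor(prev_detected_flags, loss_threshold=4):
    """Return True if the target gesture was missing in all of the last
    `loss_threshold` frames, i.e. the gesture movement is considered ended."""
    recent = prev_detected_flags[-loss_threshold:]
    return len(recent) == loss_threshold and not any(recent)

# All of the previous 4 frames lost -> stop the cursor and wait for a new gesture
print(should_stop_cursor([True, False, False, False, False]))  # True
# Only some frames lost -> keep recognizing and predicting the cursor position
print(should_stop_cursor([False, False, True, False, False]))  # False
```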
In some embodiments, prediction of the cursor position only occurs after the cursor has started to move; that is, the first position of the cursor is never predicted but is obtained only from the user instruction. Specifically, after the display device enters the cursor control mode, it may be set that the cursor is only allowed to start moving after the target gesture of the user is detected for the first time, so as to avoid the case where the first frame image is lost.
In some embodiments, after the target cursor position corresponding to the target user behavior image is determined, the gesture movement track of the user may be determined according to the cursor positions. Considering that the distance between the cursor positions of every two frames is relatively short, the cursor can be considered to move linearly between the cursor positions of two frames. The cursor may therefore be made to move to the target cursor position along a straight line from the cursor position of the previous frame; that is, the target cursor position is connected with the cursor position of the previous frame to obtain the gesture movement track.
The controller can further enable the cursor to move along the gesture movement track.
In some embodiments, the user may stop controlling the cursor after it has moved along the gesture movement track. At this time, the cursor may be located within the area of a control or at the edge of a control. When the cursor is located within the area of a control, it may be determined that the control was selected by the user, and the display device may let the user confirm whether to trigger the control. However, if the cursor is located at the edge of a control or beyond it, and thus fails to select any control, the display device cannot let the user confirm which control should be triggered.
Therefore, when the cursor does not explicitly fall into the area of a certain control, the corresponding control when the cursor is stationary needs to be determined, namely, the control finally selected by the user is determined.
Specifically, a region of a preset size can be determined according to the position of the cursor. For example, the preset size may be 500 × 500. For the cursor position (a, b), a region of size 500 × 500 can be determined centered on this coordinate.
The controller can determine all the controls in the area and acquire the distance from each control to the cursor. The distance from a control to the cursor is defined as the average distance from the midpoints of the four sides of the control to the cursor. As shown in fig. 16, the position of the cursor is point O. For a control A, the midpoints of its four sides are B1, B2, B3 and B4 in sequence, and the distances from these four midpoints to the cursor are X1, X2, X3 and X4 in sequence. Thus, the distance from the control to the cursor is (X1 + X2 + X3 + X4)/4.
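A minimal sketch of this distance, assuming a control described by its top-left corner (x, y), width w and height h, a convention consistent with the description of fig. 17 later in this section; the names are illustrative:

```python
import math

def side_midpoint_distance(cursor, control):
    """Average distance from the midpoints of the four sides of a control
    to the cursor, used as the control-to-cursor distance."""
    (a, b) = cursor
    (x, y, w, h) = control                      # top-left corner, width, height
    midpoints = [(x + w / 2, y),                # top side
                 (x + w / 2, y + h),            # bottom side
                 (x, y + h / 2),                # left side
                 (x + w, y + h / 2)]            # right side
    return sum(math.dist((a, b), m) for m in midpoints) / 4

print(side_midpoint_distance((0, 0), (10, 10, 20, 10)))  # average of the four distances
```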
In some embodiments, it is contemplated that when the control is smaller in size, the mid-point-to-cursor distance on its four sides may be shorter, thereby affecting the decision. Thus, the distance of each control to the cursor may also be determined as follows.
Specifically, in the embodiment of the application, two positional relations between the cursor and the control are set. One is that the cursor and the control are located in the same horizontal direction or the same vertical direction, and one is that the cursor and the control are located in neither the same horizontal direction nor the same vertical direction.
Fig. 17 is a schematic diagram of a positional relationship between a cursor and a control according to an embodiment of the present application.
The cursor position is (a, b). For one control, the width is set to w and the height to h, and the coordinates of its four vertices are (x, y), (x+w, y), (x, y+h) and (x+w, y+h). The vertical straight lines corresponding to the two vertical sides of the control are L1 and L2 respectively, and the horizontal straight lines corresponding to the two horizontal sides are L3 and L4 respectively. In the embodiment of the application, if the cursor is located in the area between the vertical straight lines, the cursor and the control are considered to be located in the same vertical direction; if the cursor is located in the area between the horizontal straight lines, the cursor and the control are considered to be located in the same horizontal direction. If the cursor is not located within either region, the cursor and the control are considered to be located neither in the same horizontal direction nor in the same vertical direction. As in fig. 17, the cursor O1 and the control A are located in the same vertical direction, the cursor O2 and the control A are located in the same horizontal direction, and the cursor O3 and the control A are located neither in the same horizontal direction nor in the same vertical direction.
Specifically, for all the controls in the area, the relationship between the cursor position and the control position can be judged.
If x < a < x+w and y < b < y+h are both satisfied, the cursor is located within the control area; other controls do not need to be considered at this time, and this control can be determined to be the control selected by the user.
If x < a < x+w is satisfied but y < b < y+h is not satisfied, the cursor and control are in the same vertical direction.
If x < a < x+w is not satisfied, but y < b < y+h is satisfied, the cursor and control are in the same horizontal direction.
If x < a < x+w and y < b < y+h are not satisfied, the cursor and control are neither in the same horizontal nor in the same vertical direction.
If the cursor and the control are positioned in the same vertical direction or the same horizontal direction, the distance between the cursor and the control can be calculated according to the following method.
The distances from the four sides of the control A to the cursor O are respectively obtained: t1, T2, T3, T4. And taking the result with the smallest value in the four distances as the distance between the cursor and the control, wherein the distance is as follows: MIN (T1, T2, T3, T4).
If the cursor and the control are located neither in the same horizontal direction nor in the same vertical direction, the distance between the cursor and the control can be calculated as follows.
The distances between the four vertexes of the control A and the cursor O are respectively obtained: p1, P2, P3, P4. And taking the result with the smallest value in the four distances as the distance between the cursor and the control, wherein the distance is as follows: MIN (P1, P2, P3, P4).
The controller may set the control with the shortest distance as the control selected by the cursor.
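A minimal sketch combining the positional-relation test with the two distance rules above and selecting the closest control. Controls are again given as top-left corner, width and height, and the "nearest side" case is computed as the distance to the nearest horizontal or vertical side, which is equivalent to the minimum of the four side distances when the cursor lies in the corresponding strip; the names are illustrative:

```python
import math

def control_distance(cursor, control):
    """Distance from cursor to control per the rules above: 0 if the cursor is
    inside the control, minimum side distance if aligned in the same vertical or
    horizontal direction, otherwise minimum vertex distance."""
    a, b = cursor
    x, y, w, h = control
    in_x = x < a < x + w
    in_y = y < b < y + h
    if in_x and in_y:                       # cursor inside the control area
        return 0.0
    if in_x:                                # same vertical direction
        return min(abs(b - y), abs(b - (y + h)))
    if in_y:                                # same horizontal direction
        return min(abs(a - x), abs(a - (x + w)))
    vertices = [(x, y), (x + w, y), (x, y + h), (x + w, y + h)]
    return min(math.dist(cursor, v) for v in vertices)

def select_control(cursor, controls):
    """Return the control with the shortest distance to the cursor."""
    return min(controls, key=lambda c: control_distance(cursor, c))

print(select_control((55, 40), [(10, 10, 30, 20), (60, 30, 40, 25)]))  # (60, 30, 40, 25)
```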
When the user presses the confirm key, the display device may trigger the control selected by the cursor.
With the development of artificial intelligence (Artificial Intelligence, AI) technology, more and more gesture interactions can be applied to the interaction process of display devices. Gesture interaction aims at controlling the display device to execute corresponding control instructions by detecting specific gesture actions made by the user. For example, the user may control the display device to perform a fast-backward or fast-forward play operation by a left or right hand-waving action instead of using a left or right direction key on a control device such as a remote controller.
Typically, the gesture interactions supported by a display device are based on static gestures, i.e. the hand shape is maintained while the user makes a particular gesture. For example, when performing a swing motion to the left or right, the user needs to hold the five fingers together and move the palm in parallel to perform the swing motion. When interaction is performed, the display device may detect the static gesture according to a gesture type recognition algorithm, and then execute the corresponding control action according to the gesture type.
It can be seen that the static gesture-based interaction mode supports a small number of gestures, and is only suitable for a simple interaction scene. In order to increase the number of supported gestures, part of the display device also supports dynamic gesture interactions, i.e. specific gesture interactions are achieved by continuous actions over a period of time. However, due to the limitation of the model used in the dynamic gesture detection process, the above dynamic gesture interaction process does not support user-defined gestures, and thus the requirements of users cannot be met.
In some embodiments, the dynamic gesture recognition may adopt a training method such as deep learning to perform model training to obtain a dynamic gesture recognition model, then input a plurality of continuous frame gesture image data into the dynamic gesture recognition model obtained by training, and calculate through a classification algorithm inside the model to obtain target gesture information corresponding to the current multi-frame gesture image. The target gesture information may generally be associated with a particular control instruction that the display device 200 may implement dynamic gesture interactions by executing.
For example, training data may be generated based on gesture image data, where each frame of user behavior image is provided with a classification tag, i.e. representing the gesture type corresponding to the current frame of user behavior image. Meanwhile, a plurality of continuous frame user behavior images are uniformly provided with dynamic gesture labels, namely dynamic gestures corresponding to the multi-frame user behavior images are represented. After generating the training data, training data comprising a plurality of successive frames of gesture images may be input into the initial dynamic gesture recognition model to obtain a classification probability output by the recognition model. And performing loss function operation on the classification probability output by the model and the classification labels in the training data, and calculating the classification loss. Finally, model parameters in the recognition model are adjusted according to the calculated classification loss back propagation. Repeating the model training process of classification calculation, loss calculation and back propagation, and obtaining the identification model capable of outputting accurate classification probability through a large amount of training data. By using the recognition model obtained by training, the display device 200 may input a plurality of continuous frame user behavior images detected in real time into the recognition model, thereby obtaining a classification result output by the recognition model, determining dynamic gestures corresponding to the plurality of continuous frame user behavior images, and then matching control instructions corresponding to the dynamic gestures, so as to implement dynamic gesture interaction.
In some embodiments, dynamic gesture interactions may also support user-defined operations, i.e. a display device control method is provided that may be applied to the display device 200. To support gesture interactions between a user and the display device, the display device 200 should include at least a display 260 and a controller 250, and should have at least one built-in or external image collector 231. The display 260 is used for displaying a user interface to assist the interactive operation of the user; the image collector 231 is used for collecting user behavior images input by the user. Fig. 18 is a schematic diagram of a dynamic gesture interaction flow provided in an embodiment of the present application. As shown in fig. 18, the controller 250 is configured to execute an application program corresponding to the control method of the display device, where the application program includes the following contents:
Acquiring a gesture information stream. The gesture information stream is video data generated by the image collector 231 through successive image capturing, and thus the gesture information stream includes successive multiple frames of user behavior images. After gesture interaction is started, the display device 200 may send an opening instruction to the image collector 231 and start the image collector 231 to capture images. After image capturing is started, the user can make a dynamic gesture within the capturing range of the image collector 231, and the image collector 231 can then continuously capture multiple frames of user behavior images following the dynamic gesture action of the user, and send the captured multiple frames of user behavior images to the controller 250 in real time to form the gesture information stream.
Since the gesture information stream includes multiple frames of user behavior images, and the user behavior images are captured by the image collector 231, the frame rate of the user behavior images included in the gesture information stream may be the same as the frame rate at which the image collector 231 captures images. For example, when the image collector 231 captures images at a frame rate of 30 frames per second (30 FPS), the controller 250 may also acquire the gesture information stream at a frame rate of 30 frames per second.
However, in some display devices 200 with weaker computing capability, too high a frame rate will result in too much real-time data processing by the controller 250, affecting the response speed of gesture recognition. Thus, in some embodiments, the display device 200 may also obtain a gesture information stream with a lower frame rate. In order to reduce the frame rate of the gesture information stream, the display device 200 may extract user behavior images at equal intervals from the images captured by the image collector 231. For example, the display device 200 may extract one frame of user behavior image every other frame from the gesture images captured by the image collector 231, thereby obtaining a gesture information stream with a frame rate of 15. The display device 200 may also send a frame rate adjustment control instruction to the image collector 231, controlling the image collector 231 to capture only 15 frames of gesture image data per second, thereby forming a gesture information stream with a frame rate of 15.
It should be noted that the input process of a dynamic gesture may be affected by the different input speeds of users' actions; that is, some users' gesture input actions are faster and some are slower. Obviously, for gesture input with slower motion, the gesture difference between adjacent frames is smaller, so a low-frame-rate gesture information stream can still represent the complete gesture input process. For gesture input with faster motion, the gesture difference between adjacent frames is larger, so some key gestures may be lost in a low-frame-rate gesture information stream, which affects the accuracy of gesture recognition. Therefore, in order to improve the accuracy of gesture recognition, the display device 200 should obtain the user behavior images at as high a frame rate as possible; for example, the user behavior image may be a gesture interaction image of the user, and the frame rate of the gesture information stream may be maintained in the 15-30 FPS interval.
In addition, in some embodiments, the display device 200 may dynamically adjust the frame rate of the gesture information stream in a specific interval according to the current running load, so as to improve the accuracy of gesture recognition by acquiring the gesture information stream with high frame rate when the computing capability is sufficient; and when the computing power is insufficient, excessive consumption of the computing power of the controller 250 is reduced by acquiring the gesture information stream with low frame rate.
After acquiring the gesture information stream, the display device 200 may perform gesture recognition processing on each frame of user behavior image in the gesture information stream, so as to extract key gesture information from the gesture information stream. The gesture recognition processing can be based on an image recognition algorithm, and positions of key points such as fingers, joints and wrists are recognized in the user behavior image. I.e. the keypoint coordinates are used to characterize the imaging position of the hand joint in the user behavior image.
For example, the display apparatus 200 may identify the position coordinates of each key point in the current user behavior image by means of feature shape matching, and form the coordinates of the key points into an information vector according to a set sequence. That is, as shown in fig. 11, the key points for representing the gesture motion may include 21 finger key points, and the position information of each key point may be represented by the coordinates of the corresponding point. For the fingertip key points, the thumb fingertip coordinates are P_T1 = (x_t1, y_t1), the index finger fingertip coordinates are P_T2 = (x_t2, y_t2), the middle finger fingertip coordinates are P_T3 = (x_t3, y_t3), and so on; similarly, the same coordinate representation is adopted for the mid-finger key points, i.e. the thumb mid-finger coordinates are P_M1 = (x_m1, y_m1), and so on; and for the finger root key points, e.g. P_B1 = (x_b1, y_b1).
The above fingertip, mid-finger and finger root coordinates may be combined into vectors for representing the fingertip information, mid-finger information and finger root information. That is, the fingertip information F_T is:
F_T = [P_T1, P_T2, P_T3, P_T4, P_T5]
The mid-finger information F_M is:
F_M = [P_M1, P_M2, P_M3, P_M4, P_M5]
The finger root information F_B is:
F_B = [P_B1, P_B2, P_B3, P_B4, P_B5]
In addition to the above coordinate information of the fingertips F_T, mid-fingers F_M and finger roots F_B, the display device 200 may extract the palm coordinates P_Palm and the wrist coordinates P_Wrist in the user behavior image, and combine the coordinate information to form a gesture key coordinate set H_Info. That is, the gesture key coordinate set H_Info is:
H_Info = [P_Palm, P_Wrist, F_T, F_M, F_B]
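A minimal sketch of assembling the gesture key coordinate set H_Info from the recognized key points, assuming the key point coordinates have already been produced by the recognition step; the function name, dictionary layout and dummy coordinates are illustrative:

```python
def build_gesture_key_coordinates(palm, wrist, fingertips, mid_fingers, finger_roots):
    """Combine palm, wrist, fingertip (F_T), mid-finger (F_M) and finger-root (F_B)
    coordinates into the gesture key coordinate set H_Info."""
    assert len(fingertips) == len(mid_fingers) == len(finger_roots) == 5
    return {
        "P_Palm": palm,
        "P_Wrist": wrist,
        "F_T": list(fingertips),     # [P_T1 ... P_T5]
        "F_M": list(mid_fingers),    # [P_M1 ... P_M5]
        "F_B": list(finger_roots),   # [P_B1 ... P_B5]
    }

h_info = build_gesture_key_coordinates(
    palm=(320, 260), wrist=(318, 330),
    fingertips=[(300, 180), (315, 170), (330, 168), (345, 175), (360, 190)],
    mid_fingers=[(305, 210), (317, 205), (331, 203), (344, 208), (356, 218)],
    finger_roots=[(310, 245), (319, 240), (330, 238), (341, 242), (351, 250)],
)
```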
it can be seen that the gesture key coordinate set is a coordinate set formed by combining a plurality of key point coordinates. The display device 200 may thus determine the key gesture type from the set of gesture key coordinates based on the correlations of the key point locations in the set of gesture key coordinates. In order to determine the key gesture type, in some implementations, the display device 200 may identify key point coordinates in the user behavior image when extracting key gesture information from the gesture information stream, and then extract preset key point standard coordinates from the database. The standard coordinates of the key points are a set of template coordinates determined by an operator of the display device 200 through statistical analysis on gestures of a crowd, and each gesture may be provided with corresponding standard coordinates of the key points.
After extracting the key point coordinates and the key point standard coordinates, the display apparatus 200 may calculate a difference value of the key point coordinates and the key point standard coordinates. If the calculated difference value is smaller than or equal to a preset recognition threshold value, it is determined that the user gesture in the current user behavior image is similar to the gesture type in the standard gesture template, and therefore the gesture type corresponding to the standard coordinates of the key points can be determined to be the target gesture type.
For example, when the user puts out the five-finger gathering gesture in front of the image collector 231, a frame of user behavior image corresponding to the gesture is identified to obtain a gesture key coordinate set H_Info1, and then a standard gesture similar to the five-finger gathering gesture is matched in the database to extract the key point standard coordinates H'. The difference between the two coordinate sets is calculated, i.e. H = H_Info1 - H'; if the difference is less than or equal to the preset recognition threshold H'', i.e. H ≤ H'', the matching hits the target coordinate set, so the target gesture type in the current user behavior image can be determined to be the five-finger gathering gesture.
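A minimal sketch of the comparison against a key point standard coordinate template. The application does not fix the exact distance measure, so using the mean Euclidean distance over corresponding key points as the "difference" is an assumption, and the names are illustrative:

```python
import math

def matches_template(keypoints, template, threshold):
    """Return True if the recognized key points are close enough to the
    standard key point coordinates of a gesture template."""
    diffs = [math.dist(p, q) for p, q in zip(keypoints, template)]
    difference = sum(diffs) / len(diffs)        # H: difference between the coordinate sets
    return difference <= threshold              # H <= H'' -> matching hit

# Example: recognized points almost identical to the template -> hit
recognized = [(100, 100), (110, 95), (120, 92)]
standard = [(101, 101), (111, 96), (119, 93)]
print(matches_template(recognized, standard, threshold=5.0))  # True
```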
In some embodiments, the key gesture information may also include a confidence parameter for characterizing the differences between each gesture type and a standard gesture. At this time, the key gesture information may further include the following parameter items capable of representing the key gesture type; that is, the gesture posture information includes, but is not limited to: hand facing information H_F (Hand Face), hand orientation information H_O (Hand Orientation), hand orientation deflection angle information H_OB, left and right hand information H_S (Hand Side), gesture telescopic state information H_T (Hand Stretched), and the like. Each parameter item can be calculated from the gesture key coordinate set.
The hand orientation information may be used to indicate the direction in which the fingertips point on the screen; that is, as shown in fig. 19, the fingertips may point Up, Down, Left, Right or Center, with the default being Unknown. Therefore, the hand orientation information may be expressed as:
H_O = {Up, Down, Left, Right, Center, Unknown}
Similarly, when the hand orientation information is identified, the hand orientation deflection angle information can be determined according to the positional relationship among specific key point coordinates, and this angle is equivalent to the confidence of the hand orientation information. For example, although the hand orientation is detected as Left, the hand still has a deflection angle and may not be oriented completely to the left; some subsequent processing can then be performed according to the deflection angle information, so that false triggering can be prevented. That is, the hand orientation deflection angle can be expressed as:
H_OB = a (0 < a < 90)
The display device 200 may preferentially extract the hand orientation information, that is, generate the hand orientation information according to the left and right hand and index finger key point information. The display device 200 may use the index finger root information P_B2, the little finger root information P_B5, the wrist information P_Wrist, the left and right hand information H_S, the hand orientation deflection angle information H_OB, the hand transverse and longitudinal information H_XY and the hand posture angle information H_XB, H_YB, and finally obtain the hand orientation information H_O. Namely:
H_O = g(H_OB, H_XY, H_XB, H_YB) = f(P_B2, P_B5, P_Wrist, H_S, α)
The generation logic is as follows: the deflection angle f(ΔX, ΔY) between the vector from the index finger root P_B2 to the little finger root P_B5 and the X-axis direction is calculated, and the value range of the deflection angle is (0°, 90°). The hand orientation information is obtained according to this deflection angle, and a deflection angle threshold is set to judge whether the orientation information is effective. For example, the deflection angle threshold β may be set to 5; that is, within 45 ± 5 the orientation information is considered invalid. The hand transverse and longitudinal information H_XY is generated by the following formula:
wherein ΔX is the horizontal coordinate difference between the index finger root and the little finger root; ΔY is the vertical coordinate difference between the index finger root and the little finger root; f(ΔX, ΔY) is the deflection angle; β is the deflection angle threshold.
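The generation formula itself is not reproduced in the text, so the following is only a minimal sketch of the described logic under the assumption that the deflection angle is taken as the absolute angle between the finger-root vector and the X axis, with β = 5 as above; the names and the returned labels are illustrative:

```python
import math

def hand_xy_info(index_root, little_root, beta=5.0):
    """Compute the deflection angle f(ΔX, ΔY) of the vector between the index
    finger root and the little finger root relative to the X axis, and mark it
    invalid when it lies within beta degrees of 45°."""
    dx = index_root[0] - little_root[0]                   # ΔX
    dy = index_root[1] - little_root[1]                   # ΔY
    angle = math.degrees(math.atan2(abs(dy), abs(dx)))    # value range (0°, 90°)
    if abs(angle - 45.0) <= beta:
        return None, angle                                # orientation information invalid
    # otherwise the angle decides the transverse/longitudinal information H_XY
    return ("x_dominant" if angle < 45.0 else "y_dominant"), angle

print(hand_xy_info((319, 240), (351, 250)))  # ('x_dominant', ~17.4) -> valid
```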
Then the middle point P_M of the index finger root and the little finger root is calculated, that is, the middle point of the line connecting the four finger roots between the index finger and the little finger; then the difference ΔY between P_M and the wrist coordinate P_Wrist and the difference ΔX between the index finger root and the little finger root are calculated, so that the hand orientation pitch angle information can be obtained:
wherein H_YB is the hand orientation pitch angle; ΔX is the horizontal coordinate difference between the index finger root and the little finger root; ΔY is the vertical coordinate difference between the index finger root and the little finger root.
If the pitch angle is too large, the hand orientation is considered to be Center, and the specific threshold is α. Since the posture determination error for the Center orientation is large and it cannot be used as a determination standard for an action, it can be directly treated as Unknown in scenes with low fineness requirements. The judgment formula is as follows:
wherein H_O is the hand orientation information, including Center and the other states, and α is the hand orientation pitch angle threshold.
Obviously, for some scenes requiring fine actions, more accurate hand posture angle deviation information H_XB, H_YB is required, so the display device 200 can model the hand of the user and preset hand attribute information for different distances to obtain more accurate hand posture angle deviation information. That is, the user can enter hand type (size) information at different distances in advance, and then, according to the distance information of the current frame, the index finger root information P_B2, the little finger root information P_B5, the wrist information P_Wrist and the left and right hand information H_S can generate the hand posture angle deviation information H_XB, H_YB.
Based on the middle point P_M information, the wrist information P_Wrist, the hand transverse and longitudinal information H_XY and the left and right hand information H_S, the corresponding orientation information can be generated. For example, in the case of a right hand held vertically, the Y-axis information of the wrist and the middle point needs to be compared; if the Y value of the middle point is smaller than the Y value of the wrist, the hand is proved to be vertical. Thus:
H_O = l(P_M, P_Wrist, H_XY, H_S)
The hand facing information H_F indicates how the hand faces the screen, and may take specific values indicating the facing, namely Front and Back. The hand facing information H_F defaults to Unknown. Namely:
H_F = {Front, Back, Unknown}
In the process of identifying the hand facing information, the hand facing deflection angle information can also be determined, which is used to represent the degree of the hand facing and is equivalent to the confidence of the hand facing information. For example, although the hand facing information of the user is detected as Front, the hand still has a deflection angle and may not be facing completely forward; some subsequent processing is then required according to the deflection angle information to prevent false gesture triggering. Namely:
H_FB = a (0 < a < 90)
The hand facing information is extracted by generating H_F according to the index finger root information P_B2, the little finger root information P_B5, the left and right hand information H_S and the hand orientation information H_O. The generation logic is as follows: taking the right hand with fingers pointing up as an example, if the x coordinate of the index finger root is smaller than the x coordinate of the little finger root, the facing is proved to be Front. Further details are not repeated, and a general formula is used instead:
H_F = g(P_B2, P_B5, H_S, α, H_O)
The left and right hand information can be used to indicate whether the hand image in the screen belongs to the left hand or the right hand of the user, where the left hand is Left and the right hand is Right, so the left and right hand information can be expressed as:
H_S = {Right, Left, Unknown}
The gesture telescopic state can represent the telescopic state of each finger; that is, a finger in the extended state can be represented as 1, and a finger in the contracted state can be represented as 0. Obviously, the telescopic state of a finger is not limited to the two states of extended and contracted, so the telescopic state can also be indicated by setting different values; for example, the values indicating the telescopic state can be set to 0, 1 and 2, where complete contraction is 0, half extension is 1 and complete extension is 2, and this can be flexibly changed according to the specific application scene. The gesture telescopic state can thus be expressed as:
H_T = [F_1, F_2, F_3, F_4, F_5] (F = 0, 1 or 2)
wherein F_1 to F_5 respectively represent the telescopic state of the five fingers.
When the gesture telescopic state is extracted, the contracted state of each finger is mainly extracted. According to information such as the hand orientation, the left and right hand and the gesture key points, the finally extracted state attribute of each finger is 0 or 1 (this embodiment takes the state attributes 0 and 1 as an example), where 0 is the contracted state and 1 is the extended state. Taking H_O = Up, H_S = Right, H_F = Front as an example, that is, when the user shows the right hand facing the camera with the fingers pointing upward: assuming the index finger fingertip coordinate is 50 and the index finger middle coordinate is 70, the index finger fingertip is above the middle of the finger, so the index finger is extended and its state is 1; if the index finger fingertip coordinate is 50 and the index finger middle coordinate is 30, the index finger is contracted. The thumb is compared differently from the other four fingers: where the other four fingers compare the ordinate, the thumb compares the abscissa. When the hand orientation is Up or Down, the thumb compares the x coordinate and the other four fingers compare the y coordinate; when the hand orientation is Right or Left, the thumb compares the y coordinate and the other four fingers compare the x coordinate. For the thumb, the states of the finger root and the fingertip are compared; for the other four fingers, the states of the finger middle and the fingertip are compared. The comparison points can be adjusted according to the specific scene, and the contracted state information of the 5 fingers is finally obtained.
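A minimal sketch of the comparison for the H_O = Up, H_S = Right, H_F = Front case described above, assuming image y coordinates grow downward, the thumb compares x coordinates and the other fingers compare the fingertip against the finger middle; the comparison direction for the thumb and all names are assumptions, and the comparison points would be adjusted for other orientations:

```python
def finger_stretch_states_up(fingertips, mid_fingers, finger_roots):
    """Return H_T = [F_1..F_5] for a right hand facing the camera with fingers
    pointing up: 1 = extended, 0 = contracted."""
    states = []
    # Thumb (index 0): compare x of fingertip against x of finger root.
    thumb_extended = fingertips[0][0] < finger_roots[0][0]   # assumed comparison direction
    states.append(1 if thumb_extended else 0)
    # Remaining four fingers: fingertip above the finger middle means extended
    # (smaller y value is higher in image coordinates).
    for tip, mid in zip(fingertips[1:], mid_fingers[1:]):
        states.append(1 if tip[1] < mid[1] else 0)
    return states

tips = [(290, 250), (315, 170), (330, 168), (345, 175), (360, 230)]
mids = [(305, 252), (317, 205), (331, 203), (344, 208), (356, 218)]
roots = [(310, 255), (319, 240), (330, 238), (341, 242), (351, 250)]
print(finger_stretch_states_up(tips, mids, roots))  # [1, 1, 1, 1, 0]
```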
Through the above gesture recognition process, the key gesture information of the current frame can be obtained, including the hand facing information H_F, the hand orientation information H_O, the hand orientation deflection angle information H_OB, the left and right hand information H_S and the gesture telescopic state information H_T. The hand orientation deflection angle information can be used to judge how definite the gesture orientation is; a threshold can be set in specific scenes to filter out ambiguous gestures and improve gesture recognition accuracy. Taking as an example the right hand with the back of the hand facing the camera, fingers pointing downward (86 degrees) and making gesture 1 (only the index finger extended), the final key gesture information G_Info can be expressed as:
G_Info = {H_F = Back, H_O = Down, H_S = Right, H_T = {0, 1, 0, 0, 0}, H_OB = 86}
Since a user dynamic gesture is one continuous input process, i.e. a gesture interaction can be divided into multiple phases, the key gesture information includes the key gesture types of multiple phases. In some embodiments, the display device 200 may divide the multiple phases of the dynamic gesture by traversing the target gesture types corresponding to the multiple frames of user behavior images and grouping frames with the same key gesture type, so that the user behavior images in each phase belong to the same target gesture type.
For example, the display device 200 may determine the key gesture types type1 to typen in the multiple frames of user behavior images photo1 to photon by analyzing their gesture key coordinate sets. The key gesture types type1 to typen of the multiple frames of user behavior images are compared, so that multiple frames of user behavior images with the same key gesture type, such as photo1 to photo30 and photo31 to photon, are respectively determined as two phases, with type1 = type2 = ... = type30 and type31 = type32 = ... = typen.
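A minimal sketch of dividing consecutive frames into phases by their key gesture type; the names are illustrative:

```python
from itertools import groupby

def split_into_phases(frame_gesture_types):
    """Group consecutive frames with the same key gesture type into phases.
    Returns a list of (gesture_type, frame_count) tuples, one per phase."""
    return [(gesture_type, len(list(frames)))
            for gesture_type, frames in groupby(frame_gesture_types)]

# 30 frames of a five-finger open gesture followed by frames of a pinch gesture
types = ["five_finger_open"] * 30 + ["five_finger_pinch"] * 25
print(split_into_phases(types))  # [('five_finger_open', 30), ('five_finger_pinch', 25)]
```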
For the confidence parameters corresponding to the multiple phases, in some embodiments the confidence parameters include key gesture deflection angles. The display device 200 may calculate the gesture deflection angle according to the key point coordinates and the key point standard coordinates, traverse the gesture deflection angles corresponding to the multiple consecutive frames of user behavior images in each phase to obtain the union of deflection angles in each phase, and extract the extremum of the deflection angle set in each phase as the key gesture deflection angle in the key gesture information of the current phase.
After extracting the key gesture information, the display device 200 may invoke the detection model for dynamic gesture matching. The detection model is a matching model comprising a plurality of nodes stored in a tree structure, and a gesture template is set in each node. The nodes are located at different levels; except for the root node, each node has an upper-level node, and except for the leaf nodes, each node designates lower-level nodes. For example, in the memory of the display device 200, a plurality of gesture templates may be pre-stored, each gesture template characterizing one type of static gesture action. Meanwhile, the display device 200 also constructs the gesture detection model from the stored gesture templates; in the detection model, node attributes and lower-level nodes can be assigned for each gesture template. Thus, in the display device 200, the gesture templates may still be stored in their original form, and the detection model can be constructed simply by assigning node attributes.
Obviously, for the detection model, only one gesture template is inserted into each node, while each gesture template can be given a plurality of node attributes. For example, the "grab-release" dynamic gesture includes three phases, namely a five-finger open gesture, a five-finger pinch gesture and a five-finger open gesture. The corresponding nodes and gesture templates in the detection model are: root node - "five-finger open gesture"; first-level node - "five-finger pinch gesture"; second-level node - "five-finger open gesture". It can be seen that only one gesture posture template is inserted into each node, and each gesture posture template is correspondingly given node attributes of different levels, i.e. the "five-finger open gesture" template is given the two node attributes of the root node and the second-level node.
In the detection model, the root node is used to initiate a match, and may include a plurality of gesture templates, which may be used to match the initial gesture entered by the user. For example, the root node may insert a gesture template for characterizing triggering gesture interactions. The leaf nodes in the detection model are typically not inserted with specific gesture templates, but control instructions for representing specific response actions, so in embodiments of the present application, the nodes of the detection model do not include leaf nodes unless otherwise specified.
After invoking the detection model, the display device 200 may match the key gesture information using the detection model to obtain the target gesture information, where the target gesture information corresponds to the combination of nodes whose gesture templates have the same key gesture type as each stage and whose confidence parameters lie within the confidence interval. Thus, the target gesture information may be represented by one action path. To determine the target gesture information, the display device 200 may match the key gesture type of each stage in the key gesture information with the gesture templates on the nodes of each level in the detection model.
In performing the key gesture matching using the detection model, the display device 200 may first match gesture templates of the same type in the corresponding level based on the key gesture type of each stage, and record the node corresponding to the gesture template when the matching hits that gesture template. Meanwhile, the display device 200 also determines whether the confidence parameter for that node is within the preset confidence interval. If the key gesture type of the current stage is the same as the gesture template and the confidence parameter is within the confidence interval, the matching of the next stage is started.
For example, for a "grab-and-release" dynamic gesture, after the user inputs the dynamic gesture, the display device 200 may first match the "five-finger open gesture" of the first stage with the gesture template in the root node, and when the match determines that the "five-finger open gesture" is the same as or similar to the five-finger open gesture template in one root node, it may determine whether the confidence parameter of the first stage is within a preset confidence interval, that is, whether the gesture orientation bias angle is within a preset bias angle interval. If the gesture orientation deflection angle is within the preset deflection angle interval, the key gesture 'five-finger pinch gesture' in the second stage is started to be matched with the lower node of the root node.
After the key gesture in each stage is matched with the nodes of the corresponding hierarchy, the display device 200 may obtain an action path composed of a plurality of matching hit nodes, where the action path may ultimately point to a leaf node, and the leaf node corresponds to one target gesture information, so that the display device 200 may obtain the target gesture information after the matching is completed and execute a control instruction associated with the target gesture information.
For example, depending on the setting of the gesture interaction policy of the display device 200, the grab-release dynamic gesture may be used to delete the currently selected file. After matching obtains the action path "root node - five-finger open; first-level node - five-finger pinch; second-level node - five-finger open", the display device 200 obtains a delete instruction and deletes the currently selected file by executing the delete instruction.
It can be seen that, in the above embodiment, the display device 200 may extract the gesture posture information of each stage from the gesture information stream and match it using the detection model with tree-structured nodes, determining the action path level by level according to the gesture input stages, so as to obtain the target gesture information. Because the detection model adopts tree-structured nodes, repeatedly reading all dynamic gesture templates during each matching of the key gesture information can be avoided. In addition, the tree-structured detection model also supports the user inserting nodes at any time, so that custom gesture entry is realized. Moreover, the hit rate of the node matching process can be customized by adjusting the confidence interval of each node, so that the detection model can adapt to the gesture habits of different users and realize customized gesture operation.
In some embodiments, to enable display device 200 to perform gesture type matching for critical gesture information, display device 200 may first extract a first stage critical gesture type from the multi-stage critical gesture information when matching the critical gesture information using a detection model. And matching a first node according to the first-stage key gesture type, wherein the first node is a node with the same stored gesture template as the first-stage key gesture type. After the first node is obtained by matching, the display device 200 may extract a second stage key gesture type from the key gesture information, where the second stage is a subsequent action stage of the first stage. And matching the second node according to the key gesture type of the second stage. Similarly, the second node is a node with the same type of the stored gesture template and the second stage key gesture, that is, the lower node designated by the first node comprises the second node. And finally, recording the first node and the second node to obtain an action branch.
For example, 4 key gesture templates may be registered in the display device 200 in advance, the corresponding key gesture information being G_info1 to G_info4, which can be combined into five dynamic gestures AM1 to AM5. The first-stage key gesture type of AM1 to AM4 is the same, and the second-stage gesture type of AM3 to AM4 is also the same; as shown in fig. 20, a corresponding tree-structure detection model can be obtained, and the corresponding dynamic gestures are expressed as follows:
When performing the key gesture information matching, the display device 200 may preferentially match the key gesture information of G_info1 and G_info2 according to the node storage hierarchy of the detection model tree structure. If the matched key gesture information is G_info1, the next detection is performed among the designated lower-level nodes of the root node corresponding to G_info1, that is, the lower-level nodes storing the key gesture templates of G_info2, G_info3 and G_info4. Similarly, if the key gesture information matched among the nodes of the second level is G_info4, the next-level nodes, i.e. the nodes corresponding to G_info2 and G_info3 in the third level, continue to be detected. The node matching of subsequent levels is performed in sequence until a leaf node is reached; if a node hit on G_info3 is matched in the third level, the action AM3 is returned. If, during the matching of a node at one level, an action that is not stored in the current-level nodes of the detection model is detected, the process returns to the tree root node, and G_info1 and G_info2 are detected again.
It should be noted that, in the above embodiment, the first stage, the second stage, and the first node and the second node are only used to represent the precedence relationship of different stages in the dynamic gesture and the upper and lower hierarchical relationship of different nodes in the detection model, and do not have corresponding numerical meanings. In the process of matching key gesture information by using the detection model, the gesture in the same stage can be used as a first stage or a second stage, and the same node can be used as a first node or a second node in the same way.
For example, in a starting stage of performing key gesture information matching by using the detection model, the key gesture information of the starting stage needs to be matched with a root node in the detection model, and at this time, the starting stage is a first stage, and a next stage of the starting stage is a second stage; the root node of the matching hit is a first node, and the node of the matching hit of the next level of the root node is a second node. After the matching is completed in the beginning, the display device 200 will continue to use the detection model to match the key gesture information. At this time, the next stage of the start stage is the first stage, and the next stage of the first stage is the second stage; and the node which is matched and hit in the next-level node of the root node is a first node, and the node which is matched and hit in the next-level node of the first node is a second node. Thus, in the process of matching using the detection model, the above process may be repeated until the final leaf node is matched.
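A minimal sketch of the tree-structured detection model and the level-by-level matching described above; the node layout and gesture names follow the grab-release example, while the class and function names and the exact data structure are only illustrative, not the application's implementation:

```python
class GestureNode:
    """One node of the detection model: a gesture template plus its designated
    lower-level nodes; a completed path is represented by an optional action name."""
    def __init__(self, template, action=None):
        self.template = template      # key gesture type stored in this node
        self.children = []            # designated lower-level nodes
        self.action = action          # action returned when this path completes

    def add_child(self, node):
        self.children.append(node)
        return node

def match_dynamic_gesture(root_nodes, phase_types):
    """Walk the detection model phase by phase; return the action of the node
    reached by the last phase, or None (the caller may restart from the root)."""
    candidates = root_nodes
    current = None
    for phase in phase_types:
        current = next((n for n in candidates if n.template == phase), None)
        if current is None:
            return None               # not stored at this level -> back to the root
        candidates = current.children
    return current.action if current else None

# "grab-release": five-finger open -> five-finger pinch -> five-finger open
root = GestureNode("five_finger_open")
pinch = root.add_child(GestureNode("five_finger_pinch"))
pinch.add_child(GestureNode("five_finger_open", action="delete_selected_file"))

print(match_dynamic_gesture([root], ["five_finger_open", "five_finger_pinch", "five_finger_open"]))
# delete_selected_file
```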
The detection model with the tree structure also supports a gesture entry process for the user. That is, in some embodiments, when matching the second node according to the second-stage key gesture type, the display device 200 may traverse the gesture templates stored by the lower-level nodes of the first node; if all the gesture templates stored by the lower-level nodes differ from the key gesture type of the second stage, i.e. the dynamic gesture input by the user is a new gesture, the display device 200 can be triggered to perform gesture entry, i.e. the display 260 is controlled to display an entry interface.
The entry interface can prompt the user to enter the gesture. In order to obtain an accurate dynamic gesture, the entry interface can prompt the user, through prompt messages during the gesture entry process, to repeatedly perform the dynamic gesture to be entered; that is, the user inputs the same behavior repeatedly. Meanwhile, the user can also specify, through the entry interface, the control instruction associated with the entered dynamic gesture. Each time the user performs the entry, the display device 200 extracts the key gesture information according to the above example and matches it with the nodes of the detection model, and when the key gesture template is not matched among the nodes of one level, a new node is added at the current level according to the key gesture type of the corresponding stage.
To reduce the impact of the gesture entry process on the user's gesture interaction, in some embodiments, the display device 200 may query the user through a prompt message or window to initiate entry before displaying the entry interface, and receive instructions from the user based on the window input. If the user inputs the input gesture information, the input gesture information input by the user based on the input interface can be received, and a new node is set for the detection model in response to the input gesture information, wherein the new node is a subordinate node of the first node. And finally, storing the gesture type of the corresponding stage in the new node to serve as a gesture template of the new node.
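The entry branch can be sketched as follows, reusing the GestureNode structure from the earlier sketch. The confirmation prompt and the entry interface are omitted, and the function and parameter names are illustrative assumptions rather than the device's actual interfaces.

```python
# Sketch of the gesture entry branch: when no subordinate node of the first
# node matches the second-stage key gesture type, a new node is appended and
# the gesture type of that stage is stored as its gesture template.

def match_or_enroll(first_node, stage_gesture_type, associated_action=None):
    for child in first_node.children:            # traverse stored gesture templates
        if child.template == stage_gesture_type:
            return child                          # existing branch, no change needed
    new_node = GestureNode(stage_gesture_type, action=associated_action)
    first_node.children.append(new_node)          # new subordinate node of the first node
    return new_node
```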
It can be seen that, in the above embodiment, the display device 200 may perform dynamic gesture entry in real time based on the tree-structured detection model: the Action to be entered is determined, the user action is input, and whether a corresponding Action branch exists in the Action tree structure is detected. If no corresponding Action branch exists, key gesture extraction is performed, the corresponding behavior template is obtained, and the corresponding node is inserted into the behavior tree to complete the dynamic gesture entry. Obviously, during the entry of a dynamic gesture, if the dynamic gesture input by the user already has a corresponding Action branch in the detection model, the user behavior is detected according to the templates of that branch, and if the detection succeeds, the node state of the detection model does not need to be changed.
In some embodiments, the display device 200 may also determine a corresponding confidence when matching key gesture information using the detection model, where the confidence may include a gesture deflection angle and a key gesture maintenance frame number. For the gesture deflection angle, after the matching hits a node, the display device 200 may obtain the confidence interval preset for the corresponding node in the detection model, and compare the key gesture deflection angle of the current stage with the confidence interval of the corresponding node. If the key gesture deflection angle is within the confidence interval, the corresponding current node is recorded and matching of its lower nodes is started; if the key gesture deflection angle is not within the confidence interval, the gesture deviation is determined to be too large, so further judgment or adaptive adjustment is needed.
Because the confidence parameter may fall outside the confidence interval due to the user's input habits, the display device 200 may also adjust the detection model parameters to the user's habits. Thus, in some embodiments, if during the matching of key gesture information using the detection model the key gesture type of a stage is the same as the gesture template in the node, but the key gesture deflection angle is not within the confidence interval, the display device 200 may also modify the confidence interval according to that gesture deflection angle.
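The following sketch illustrates this confidence check and the habit-driven adaptation. The interval representation and the widening rule are assumptions for illustration, not the device's actual parameter update.

```python
# Sketch of the deflection-angle confidence check: the key gesture deflection
# angle of the current stage is compared against the node's preset confidence
# interval; when the gesture template matches but the angle falls outside, the
# interval is widened toward the user's habitual angle.

def check_deflection(node_interval, deflection_angle, template_matched):
    low, high = node_interval
    if low <= deflection_angle <= high:
        return True, node_interval                 # record node, match lower nodes next
    if template_matched:
        # same gesture template but unusual angle: adapt the interval to the user
        low = min(low, deflection_angle)
        high = max(high, deflection_angle)
    return False, (low, high)
```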
It should be noted that, when performing template matching, the display device 200 may first match the hand orientation and the finger stretching information; if that matching succeeds, it then detects whether the confidence meets the threshold, and only if that also succeeds is the gesture matching considered successful. During gesture entry, by contrast, the display device 200 only needs to match the hand orientation and the finger stretching information; if that matching succeeds, the template matching is counted as successful. If all gestures in the dynamic gesture are successfully matched, the dynamic gesture is considered successfully matched, and finally the template confidence is optimized according to the optimal confidence.
The optimal confidence can be obtained from a subset of the key frames when the user behavior image is entered several times. For example, during gesture detection, suppose a five-finger-up motion in a dynamic gesture occurs 10 times in a particular sequence, and the gesture is considered detected whenever three consecutive occurrences are detected. There are then 8 windows of three consecutive occurrences among these 10 (10 - 3 + 1), and the window with the lowest average confidence is selected, because in the beginning and ending stages of the gesture, at the junction with other gestures, there may be a large deflection angle; if this part of the deflection angle values were used as the confidence value, many false detections could occur.
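A minimal sketch of this window selection is given below, assuming the confidence value of each occurrence is its deflection angle; the function name and the window size of 3 follow the example above and are otherwise illustrative.

```python
# Sketch of selecting the optimal confidence: for 10 occurrences and a
# criterion of 3 consecutive occurrences there are 10 - 3 + 1 = 8 candidate
# windows, and the window with the lowest average deflection angle is kept,
# so the large angles at the start and end of the motion do not inflate it.

def optimal_confidence(deflection_angles, window=3):
    windows = [deflection_angles[i:i + window]
               for i in range(len(deflection_angles) - window + 1)]
    best = min(windows, key=lambda w: sum(w) / len(w))   # lowest average value
    return sum(best) / len(best)

# e.g. optimal_confidence([38, 25, 12, 10, 9, 11, 10, 12, 27, 40]) ignores the
# large deflection angles at the two ends of the motion.
```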
The other confidence parameter, the key gesture maintenance frame number, is the number of consecutive frames in the user behavior images whose type is the same as the first-stage key gesture. In some embodiments, the display device 200 may also obtain the maintenance frame number before matching the second node according to the second-stage key gesture type. If the maintenance frame number of the first-stage key gesture type is greater than or equal to the frame number threshold, that is, the user has held the gesture action for a sufficiently long time and this is not a case of accidental input, the second node can be matched according to the second-stage key gesture type. If the maintenance frame number of the first-stage key gesture type is less than the frame number threshold, the current input may differ from the predetermined dynamic gesture, so gesture entry may be initiated according to the above embodiment, i.e. the display 260 is controlled to display an entry interface to update the confidence interval.
For example, during a gesture interaction, multiple gesture types may occur, so it is desirable to extract the most distinctive gesture among them as the characteristic gesture of the action. The core gesture features are the hand orientation and the finger stretching states, so the display device 200 may perform gesture key point recognition and key gesture information extraction on the action frames, and match the key gesture information in a loop: if the hand orientation, left/right hand, and finger stretching states are the same, the gestures are judged to be similar. Each time a similar gesture is detected, its deflection angle information and count are updated, where the deflection angle information takes the maximum range, and the count of similar gestures is required to be greater than a threshold. The threshold may be determined based on the frame rate or set to a fixed value, such as 3. The action frames are processed and the gesture postures meeting the conditions are selected; when multiple action frames are processed, the intersection of the actions is taken and the parameters of each action posture are unioned, finally yielding the corresponding key gesture template.
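The grouping and filtering step can be sketched as follows. The per-frame field names, the fixed count threshold of 3 and the dictionary layout are assumptions made for illustration, not the device's actual data structures.

```python
# Sketch of key gesture template extraction: frames with the same hand
# orientation, handedness and finger stretching states are grouped as similar
# gestures; the deflection angle range is widened to the maximum observed, and
# only groups seen more than a threshold number of times are kept.

def extract_key_templates(action_frames, count_threshold=3):
    groups = {}
    for frame in action_frames:                       # per-frame key gesture information
        key = (frame["orientation"], frame["hand"], tuple(frame["finger_states"]))
        entry = groups.setdefault(key, {"count": 0, "min_angle": None, "max_angle": None})
        entry["count"] += 1
        angle = frame["deflection_angle"]
        entry["min_angle"] = angle if entry["min_angle"] is None else min(entry["min_angle"], angle)
        entry["max_angle"] = angle if entry["max_angle"] is None else max(entry["max_angle"], angle)
    # keep only gestures that occur often enough to be characteristic
    return {k: v for k, v in groups.items() if v["count"] > count_threshold}
```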
When entering a gesture, the actions performed by the user are comparatively standard, but when using gesture interaction the user may be more casual and pays little attention to whether the gesture is standard. Especially when the user is in a hurry, the gestures made may be very non-standard, resulting in inaccurate recognition by the display device 200 when performing dynamic gesture detection and reducing the user experience.
To improve on the above problem, in some embodiments the display device 200 may also adopt a pseudo-jump approach when performing dynamic gesture detection, so as to improve the user experience. That is, the display device 200 may obtain the confidence parameter of an intermediate stage, which is a stage of the key gesture information located between the start stage and the end stage, and compare the confidence parameter of the intermediate stage with the confidence interval of the corresponding node. If the confidence parameter of the intermediate stage is not within the confidence interval of the corresponding node, the node corresponding to the intermediate stage is marked as a pre-jump node. The lower nodes of the pre-jump node are then matched according to the detection model, so that the target gesture information is determined according to the matching result of the lower nodes of the pre-jump node.
When matching the lower nodes of the pre-jump node according to the detection model, the display device 200 may acquire the lower-node matching result of the pre-jump node. If the matching result hits any lower node, the pre-jump node and the hit lower node are recorded and used as nodes of the target gesture information; if no lower node is hit, the pre-jump node is discarded and matching is performed again from the upper-level node.
For example, as shown in fig. 21, after the action G1 is detected, detection of the subsequent action G2 is entered. At this time, if an action G2 occurs but its confidence parameter exceeds the confidence interval, the display device 200 performs a pseudo jump, that is, it carries out the subsequent detection of the action G1 and the subsequent detection of the action G2 at the same time. If the action G3 is detected after the pseudo jump, the previous pseudo jump is considered established, and the process proceeds directly to the action G3. As shown in fig. 22, if the action G3 is not detected after the pseudo jump but the action G4 appears, and the action G1 and the action G4 happen to form another action path, the pseudo jump is considered not established, and the subsequent detection following the action G4 continues.
To better implement the pseudo jump, the display device 200 may set a pseudo-jump threshold: if a confidence parameter value is not within the confidence interval, the pseudo jump is performed only when the confidence parameter is less than the pseudo-jump threshold. In addition, each time a pseudo jump is made, the user can cancel it through a specific key or a specific gesture. After a certain number of pseudo jumps, the display device 200 may optimize the Action nodes involved in the pseudo jumps and increase the specified threshold so as to adapt to the user's action style.
The display device 200 may update the pseudo-jump threshold in various ways. For example, each time a pseudo jump is performed, a prompt pops up and the Action node information is updated by default; if the user considers the detection to be a false detection, the user only needs to delete that record. The display device 200 may also update the pseudo-jump threshold only after multiple pseudo jumps, to obtain a better user experience. In addition, a count threshold may be set for the pseudo-jump procedure, that is, when multiple pseudo jumps occur in the detection procedure and a certain number is exceeded, the previous pseudo jumps are considered invalid.
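The pseudo-jump bookkeeping can be sketched as follows. The threshold values, the deviation-based test and the class layout are assumptions for illustration only; the actual device behavior (prompts, user cancellation, threshold optimization) is omitted.

```python
# Sketch of pseudo-jump handling: a stage whose confidence parameter falls
# outside the node's confidence interval, but close enough to it, is marked as
# a pre-jump node; both the original branch and the pre-jump branch are then
# detected, and the pre-jump is kept only if one of its lower nodes is hit.

class PseudoJumpState:
    def __init__(self, jump_threshold=15.0, max_jumps=2):
        self.jump_threshold = jump_threshold   # max deviation allowed for a pseudo jump
        self.max_jumps = max_jumps             # too many pseudo jumps invalidates them
        self.pending = []                      # pre-jump nodes awaiting confirmation

    def on_out_of_interval(self, node, deviation):
        if deviation < self.jump_threshold and len(self.pending) < self.max_jumps:
            self.pending.append(node)          # mark as pre-jump node, keep detecting
            return "pseudo-jump"
        return "reject"                        # return to the upper node

    def on_lower_node_hit(self, node):
        if node in self.pending:
            self.pending.remove(node)          # pre-jump confirmed (e.g. G3 after G2)
            return "commit"
        return "normal"
```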
Based on the above display device control method, a display device 200 is also provided in some embodiments of the present application. The display device 200 includes: a display 260, an image acquisition interface, and a controller 250. Wherein the display 260 is configured to display a user interface; the image acquisition interface is configured to acquire a user behavior image input by a user; as shown in fig. 23, 24, the controller 250 is configured to perform the following program steps:
Acquiring a gesture information stream, wherein the gesture information stream comprises continuous multi-frame user behavior images;
Extracting key gesture information from the gesture information stream, wherein the key gesture information comprises key gesture types of a plurality of stages and confidence parameters of each stage;
Matching the key gesture information using a detection model to obtain target gesture information, wherein the detection model comprises a plurality of nodes stored in a tree structure, and each node is provided with a gesture template and designated subordinate nodes; the target gesture information is the combination of nodes whose gesture templates are the same as the key gesture type of each stage and whose confidence parameters fall within the confidence interval;
and executing the control instruction associated with the target gesture information.
Specifically, fig. 24 is a timing diagram of a dynamic gesture interaction provided by an embodiment of the present application, where, as shown in fig. 24, the dynamic gesture interaction may include the following steps:
S2401: the image collector collects the gestures made by the user.
S2402: the image collector sends the collected gestures made by the user to the image acquisition interface as a gesture information stream.
S2403: the image acquisition interface sends the received gesture information stream to the controller.
S2404: the controller detects key gesture types of each stage based on the acquired gesture information stream.
S2405: the detection model is used to match the key gesture information to obtain target gesture information.
S2406: executing the control instruction associated with the target gesture information, so that the display displays the corresponding content in response to the interaction.
As can be seen from the above, the display device 200 provided in the above embodiments can obtain the gesture information stream after the user inputs the dynamic gesture, and extract the key gesture information from the gesture information stream. And matching the key gesture types of each stage in the key gesture information by using the detection model to obtain node combinations with the same key gesture type and confidence parameters within a set confidence interval, wherein the node combinations are used as determined target gesture information, and finally, executing a control instruction associated with the target gesture information to realize dynamic gesture interaction. The display device 200 detects dynamic gestures based on gesture key points, dynamically matches key gesture types based on a detection model of a tree structure node storage form, enriches dynamic gesture interaction forms, and supports user-defined dynamic gestures.
Fig. 25 is a schematic diagram of another usage scenario of a display device according to an embodiment of the present application. As shown in fig. 25, a user may operate the display apparatus 200 through the control device 100, or a video capture device 201 such as a camera provided on the display apparatus 200 may also capture video data including a user's body and respond to gesture information, limb information, etc. of the user according to an image in the video data, thereby executing a corresponding control command according to motion information of the user. The user can control the display device 200 without the remote controller 100, so as to enrich the functions of the display device 200 and improve the user experience.
The display device 200 may also be in data communication with a server via a variety of communication means. By way of example, the display device 200 may exchange information with the server, interact with an electronic program guide (EPG, Electronic Program Guide), receive software program updates, or access a remotely stored digital media library. The servers may be one group or multiple groups, and may be one or more types of servers. The server provides other web service content such as video on demand and advertising services.
In other examples, the display device 200 may further add more functions or reduce the functions mentioned in the above embodiments. The specific implementation of the display device 200 is not particularly limited in the present application, and for example, the display device 200 may be any electronic device such as a television.
Fig. 26 is a schematic diagram of a hardware structure of another hardware system in the display device according to the embodiment of the present application. As shown in fig. 26, the display portion of the display device 200 in fig. 25 may specifically include: a panel 1, a backlight assembly 2, a main board 3, a power board 4, a rear case 5, and a base 6. The panel 1 is used for presenting pictures to the user. The backlight assembly 2 is located behind the panel 1 and usually consists of some optical assemblies; it is used for providing a sufficiently bright and uniformly distributed light source so that the panel 1 can display images normally. The backlight assembly 2 further comprises a back plate 20, on which the main board 3 and the power board 4 are arranged; some convex hull structures are usually stamped on the back plate 20, and the main board 3 and the power board 4 are fixed on the convex hulls through screws or hooks. The rear case 5 covers the panel 1 to hide parts of the display device such as the backlight assembly 2, the main board 3 and the power board 4, giving an attractive appearance. The base 6 is used for supporting the display device. Optionally, fig. 26 further includes a key board, which may be disposed on the back plate of the display device, and the present application is not limited in this respect.
In addition, the display device 200 may further include a sound reproduction apparatus (not shown in the drawing), such as an audio component including an I2S interface, a power amplifier (AMP), a speaker and the like, for reproducing sound. Typically, the audio component is capable of sound output of at least two channels; when a panoramic surround effect is to be achieved, multiple audio components need to be provided to output the sound of multiple channels, which is not described in detail here.
It should be noted that the display device 200 may also be implemented in a specific form such as an OLED display screen, in which case the components included in the display device 200 shown in fig. 26 change accordingly, which is not described here in detail. The present application does not limit the specific internal structure of the display device 200.
With the continuous development of electronic technology, more and more functions can be implemented by display devices such as a television, for example, the display devices can capture images of users through video acquisition devices arranged by the display devices, and after gesture information of the users in the images is identified by a processor, commands corresponding to the gesture information are executed.
However, at present, the control command determined by the gesture information of the display device is single, which results in lower intelligent degree and poorer user experience of the display device.
In order to improve the degree of intelligence of the display device and improve the user experience, the control method of the display device provided by the present application is described below through specific embodiments. The following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments.
In some embodiments, the execution body of the control method of the display device provided by the embodiment of the present application may be the display device, and specifically may be a controller in the display device such as a CPU, MCU or SOC, or a control unit, a processor, a processing unit, etc.; in the subsequent embodiments of the present application, the controller is taken as the execution body by way of example. After the controller acquires video data through the video acquisition device of the display device, gesture recognition is performed according to continuous multi-frame images of the video data, and corresponding actions are then performed according to the recognized gesture information.
In some embodiments, fig. 27 is a schematic diagram of an embodiment of a control method of a display device according to the present application. When the controller obtains the image to be detected on the right side of fig. 27 from the video data of the video capturing device and identifies the gesture A in the image to be detected, it can recognize through a gesture identification algorithm that the image to be detected contains gesture information, where the gesture information includes an "OK" gesture and the position, size and so on of the gesture. Then, since the cursor is located on the "confirm" control displayed on the display of the current display device, the controller may determine that the control command corresponding to the "OK" gesture information is "click the confirm control", and finally the controller may execute the command.
In other embodiments, fig. 28 is a schematic diagram of another embodiment of a control method of a display device according to the present application. After the controller recognizes the gesture in each frame of image in the video data of the video capturing apparatus, the comparison of two frames of images to be detected shows that the gesture B of the user has moved from the left side in the earlier frame to the right side in the later frame. The controller may then determine, since what is displayed on the current display is the movable cursor C, that the control command corresponding to the gesture information is "move the cursor to the right", and the distance moved may be related to the distance moved by the gesture information in the images to be detected.
As can be seen from the embodiments shown in fig. 27 and fig. 28, when the controller in the display device can determine the gesture information of the user through the video data collected by the video collecting device, and further execute the control command indicated by the gesture by the user, the user can control the display device without relying on the control devices such as the remote controller and the mobile phone, so that the functions of the display device are enriched, the interestingness in controlling the display device is increased, and the user experience of the display device can be greatly improved.
The specific mode of determining the gesture information in the image according to the image to be detected by the controller is not limited, and for example, the gesture information in the image to be detected can be identified by adopting a machine learning model based on the mode of image identification.
In some embodiments, the present application further provides a control manner of a display device, which determines gesture information of a hand by defining the coordinates of the key points of the human hand in the image to be detected, and which can be well applied to display device scenarios. For example, fig. 29 is a schematic diagram of coordinates of hand key points according to an embodiment of the present application; in the example shown in fig. 29, a total of 21 key points, numbered 1-21, are marked on the human hand in sequence according to the positions of the fingers, joints and palm.
Fig. 30 is a schematic diagram of different stretching states of hand key points provided in an embodiment of the present application. When the controller identifies gesture information in the image to be detected, it first determines the orientation of the hand through algorithms such as image identification; when the image includes the key points on the palm side, the controller continues to identify all the key points and determines the position of each key point. For example, in the leftmost image in fig. 30, the key points numbered 9-12 corresponding to the middle finger are far apart and spread out, indicating that the middle finger is in an extended state; in the middle image in fig. 30, the upper key points are concentrated and the lower ones are spread out, indicating that the middle finger is in a semi-bent state; in the right image of fig. 30, the key points 9-12 corresponding to the middle finger are close together and concentrated, indicating that the middle finger is in a fully contracted state. Therefore, the distances, distribution ratios and the like between different key points can be defined to distinguish the different states in fig. 30; then, in the same manner, the key points corresponding to each of the 5 fingers in fig. 29 can be identified, and the gesture information in the image to be detected can thus be obtained.
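One way to turn such key point distances into a finger stretching state is sketched below. The straightness-ratio criterion and the thresholds are illustrative assumptions, not the patent's exact rule; they merely show how spread-out versus bunched-up key points can be classified.

```python
# Sketch of classifying a finger's stretching state from its four key points
# (e.g. key points 9-12 for the middle finger): when the points are spread out
# the finger is extended, when they are bunched together it is contracted.

import math

def finger_state(points):
    """points: the four (x, y) key points of one finger, knuckle to fingertip."""
    span = math.dist(points[0], points[-1])            # knuckle-to-tip distance
    segments = sum(math.dist(points[i], points[i + 1]) for i in range(len(points) - 1))
    ratio = span / segments if segments else 0.0       # 1.0 means perfectly straight
    if ratio > 0.9:
        return "extended"
    if ratio > 0.6:
        return "semi-bent"
    return "contracted"
```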
In some embodiments, the present application further provides a control method of a display device, where the controller may identify gesture information and limb information in an image to be detected, and determine a control command according to the two information together, and execute the control command. For example, fig. 31 is a schematic diagram of an application scenario of a control method of a display device according to an embodiment of the present application, where in the scenario shown in fig. 31, the specific structure of the display device 200 is the same as that shown in fig. 25 to 26, and at this time, a user of the display device 200 may indicate a control command through a gesture and a limb together, and then, after the display device 200 collects video data through its video collecting device, a controller in the display device 200 identifies an image to be detected in a multi-frame image, and at the same time, identifies gesture information and limb information of the user in the image to be detected.
Fig. 32 is a schematic diagram of determining a control command by using gesture information and limb information together, where, assuming that gesture information F on the left side in fig. 32 is an "OK" gesture and limb information G is an elbow pointing to the upper left corner, the control command that can be determined according to gesture information F and limb information G is to click a control displayed on the left side of the display; in fig. 32, the gesture information H on the right side is an "OK" gesture, and the limb information I is the elbow pointing to the upper right corner, so that the control command that can be determined according to the gesture information H and the limb information I is to click the control displayed on the right side of the display.
In combination with the above embodiments, it can be seen that, according to the control method for a display device provided by the embodiment of the present application, the controller can determine different control commands according to gesture information and limb information in an image to be detected, so that the number of control commands that a user can send to the display device by using the interaction manner is enriched, and the degree of intelligence and user experience of the display device are further improved.
In some embodiments, if the computing capability of the controller of the display device permits, the controller may perform gesture and limb information recognition on every frame of the image to be detected extracted from the video data. However, the amount of computation required for ordinary gesture and limb recognition is large, which would greatly increase the computation load of the controller, and the user does not control the display device most of the time. Therefore, the display device provided by the present application is provided with at least two detection models, denoted as a first detection model and a second detection model, where the second detection model is used for recognizing gesture information and limb information in the image to be detected, and the first detection model, whose computation and data amount are smaller than those of the second detection model, may be used for recognizing whether gesture information is included in the image to be detected. A control method of the display device according to an embodiment of the present application is specifically described below with reference to fig. 33.
Fig. 33 is a flowchart of a control method of a display device according to an embodiment of the present application, where the control method shown in fig. 33 includes:
S3301: extracting one frame of image to be detected from the continuous multi-frame images of the video data acquired by the video acquisition device of the display device according to a preset time interval.
The present application can be applied to a scenario as shown in fig. 31 and is executed by the controller in the display device. When the display device is in a working state, the video acquisition device acquires video data in the direction faced by the display device, and after the controller serving as the execution body acquires the video data, it extracts one frame of image to be detected from the video data at a preset time interval. For example, when the frame rate of the video data collected by the video acquisition device is 60 frames/second, the controller may sample at a rate of 30 frames/second, thereby extracting one image to be detected every other frame for subsequent processing; the preset time interval is then 1/30 second.
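The sampling step can be sketched as follows, using the 60 frames/second capture rate and 30 frames/second sampling rate of the example above; the function and parameter names are illustrative.

```python
# Sketch of S3301: with a 60 fps stream and a 30 fps sampling rate, every
# other frame is taken as an image to be detected, i.e. the preset time
# interval is 1/30 second.

def sample_frames(video_frames, capture_fps=60, sample_fps=30):
    step = max(1, round(capture_fps / sample_fps))     # 2 -> every other frame
    for index, frame in enumerate(video_frames):
        if index % step == 0:
            yield frame                                # image to be detected
```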
S3302: and judging whether the image to be detected comprises gesture information of a human body or not by using the first detection model.
Specifically, for the application scenario in fig. 31, when the user needs to control the display device, the user can stand facing the video acquisition device and make the gestures and limb actions corresponding to the control command for the display device; at this time the video acquisition device acquires images including the target gesture information and limb information. When the user does not need to control the display device, the video images acquired by the video acquisition device within its acquisition range do not include target gesture information and limb information.
Therefore, as long as no gesture information has previously been found in the image to be detected and the image to be detected is not yet being processed with the second detection model, the controller processes the image to be detected in S3302 using the first detection model, which has a smaller calculation amount, and determines through the first detection model whether gesture information is included in the image to be detected.
In some embodiments, the controller uses a gesture type detection model as the first detection model to implement a global sensing algorithm, so as to judge whether gesture information is included in the image to be detected. The global sensing algorithm is an algorithm that is turned on by default after the controller is powered on and remains running; it has the characteristics of a small calculation amount and simple detection types, and may be used only for acquiring specific information and for turning on the second detection model to detect other, non-global functions.
In some embodiments, the first detection model is obtained by training on a plurality of training images, each containing different gesture information to be trained; the controller then uses the first detection model to compare the learned gesture information with the image to be detected so as to judge whether gesture information is included in it. However, the first detection model may not be able to specifically identify the gesture information; the second detection model may be used to determine the gesture information through a specific joint-based or other identification algorithm.
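The two-model arrangement can be summarized in a short sketch. The model interfaces (`has_gesture`, `recognize_gesture`, `recognize_limb`) are assumed names used only to show the gating logic, not an actual API.

```python
# Sketch of the two-stage detection: the lightweight first detection model only
# answers whether a gesture is present, and the heavier second detection model
# is invoked for target gesture and limb recognition only after that gate passes.

def detect(image, first_model, second_model):
    if not first_model.has_gesture(image):     # cheap global sensing pass
        return None                            # skip the expensive recognition
    gesture = second_model.recognize_gesture(image)
    limb = second_model.recognize_limb(image)
    return gesture, limb
```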
S3303: if it is determined in S3302 that the image to be detected includes gesture information of a human body, it is determined that the user wishes to control the display device, and then the controller continues to acquire the image to be detected and uses the second detection model to identify target gesture information and limb information in the image to be detected.
In some embodiments, after detecting that the to-be-detected image includes gesture information of a human body, the controller may continue to extract the to-be-detected image from the multi-frame image acquired by the video acquisition device according to a preset time interval, and use the second detection model to replace the first detection model, so as to process the subsequently extracted to-be-detected image, thereby identifying target gesture information and limb information of each frame of to-be-detected image. Or the controller may also reduce the preset time interval and extract the image to be detected at a smaller time interval.
In some embodiments, the controller may also process the to-be-detected image determined in S3302 to include gesture information of the human body using the second detection model, and then continue to process the subsequent to-be-detected image using the second detection model, that is, process the user behavior image.
S3304: and determining a corresponding control command according to the target gesture information and the limb information in the user behavior image of the preset number of frames determined in S3303, and executing the control command.
In some embodiments, in order to improve accuracy of recognition, the controller may continuously collect and process multiple frames of images, for example, when it is determined in S3302 that the image to be detected includes gesture information of a human body, in S3303, after a preset number (for example, 3) of user behavior images are collected according to a preset time interval, target gesture information recognition and limb information recognition are performed on the 3 user behavior images, and finally when the target gesture information and the limb information in the 3 user behavior images are the same, it is determined that subsequent calculation is performed according to the same target gesture information and limb information, so that inaccuracy in recognition caused by occasional errors caused by other factors can be prevented.
When the target gesture information and the limb information in the preset number of user behavior images are the same (or the target gesture information and the limb information are partially the same, and the ratio of the partially the same to the preset number is greater than a threshold, for example, the threshold may be 80%, etc.), the controller determines a control command corresponding to the target gesture information and the limb information according to the mapping relationship. For example, fig. 34 is a schematic diagram of an embodiment of a mapping relationship provided in an embodiment of the present application, where the mapping relationship includes a plurality of control commands (control command 1, control command 2 …), and a correspondence relationship between each control command and corresponding target gesture information and limb information, for example: control command 1 corresponds to gesture information 1 and limb information 1, and control command 2 corresponds to gesture information 2 and limb information 2 … …. A specific implementation thereof may refer to fig. 32, and different combinations of target gesture information and limb information may correspond to different control commands.
In some embodiments, the mapping relationship may be preset or specified by a user of the display device, and may be stored in the controller in advance, so that the controller may determine, according to the determined target gesture information and limb information, a corresponding control command from the mapping relationship and continue to execute.
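A minimal sketch of such a stored mapping relationship is given below. The concrete gesture names, limb names and commands are illustrative placeholders rather than the mapping actually used by the display device.

```python
# Sketch of the mapping relationship of fig. 34: each (target gesture, limb)
# pair maps to one control command; an unknown pair, or a pair that maps to no
# command (as in the case of fig. 36 below), yields no action.

COMMAND_MAP = {
    ("OK", "elbow_upper_left"): "click_left_control",
    ("OK", "elbow_upper_right"): "click_right_control",
    ("open_palm", "elbow_down"): None,          # recognized but intentionally ignored
}

def resolve_command(gesture_info, limb_info):
    return COMMAND_MAP.get((gesture_info, limb_info))
```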
In other embodiments, fig. 35 is another schematic diagram of a mapping relationship provided in the embodiment of the present application, where in the mapping relationship shown in fig. 35, the target gesture information and the limb information respectively correspond to one control command, and at this time, the controller may verify the determined control command by using the other information after determining one control command according to the target gesture information or the limb information, so as to improve accuracy of the obtained control command, and when the control commands determined by the two information are different, it is indicated that the identification is wrong, and processing measures such as not executing the control command or re-identifying may be performed, so as to prevent executing the wrong control command.
In still other embodiments, the mapping relationship provided by the present application may further include a control command corresponding to "do not execute any command", for example, fig. 36 is a schematic diagram of target gesture information and limb information in an image provided by the embodiment of the present application, where a user in the image is with his back facing the display device, and his hand is just facing the display device. Although the user does not want to control the display device, after determining that the current image to be detected includes gesture information through the first detection model in the flow shown in fig. 33 and then identifying the target gesture information and the limb information through the second detection model, the controller may determine that the current target gesture information and the limb information do not execute any command according to the mapping relationship. The mapping relationship at this time may include, for example, palm expansion for gesture information, elbow pointing obliquely downward for limb information, and the like.
In summary, according to the control method of the display device provided by the embodiment, the controller can determine different control commands according to the target gesture information and the limb information in the user behavior image, so that the number of control commands which can be sent to the display device by the user in the interactive mode is enriched, and the intelligent degree and the user experience of the display device are further improved. Further, in this embodiment, whether the image to be detected includes gesture information is identified by using the first detection model with smaller calculation amount, and only after the first detection model determines that the image includes gesture information, the target gesture information and the limb information are identified by using the second detection model with larger calculation amount, so that the calculation amount and the power consumption caused by invalid identification can be reduced, and the calculation efficiency of the controller is improved.
In a specific implementation manner, when the control command is a one-time control operation such as clicking a control displayed on the display, returning to the home page, or modifying the volume, then in conjunction with S3301-S3304 in fig. 33, after the control command is executed in S3304 the process may be ended: recognition of the target gesture information and limb information by the second detection model is stopped, and the process returns to S3301 to continue extracting the image to be detected and to identify gesture information again using the first detection model, so that the whole process shown in fig. 33 is re-executed.
In another specific implementation manner, when the control command is a movement command for controlling a target control, such as a mouse pointer on the display, to move to the position corresponding to the gesture information, then after the movement command is executed in S3304, the process of S3303-S3304 should be repeated so as to detect the user's continuous movement action, thereby realizing continuous movement of the target control on the display.
In some embodiments, in the above-described repeated execution of S3303-S3304, if it is recognized that the target gesture information and the limb information of the human body in the preset number of user behavior images currently acquired correspond to the stop command, or it is determined by the second detection model that the target gesture information and the limb information of the human body are not included in the preset number of user behavior images, the process may be ended, the recognition of the target gesture information and the limb information using the second detection model is stopped, and the process returns to S3301 to continue the extraction of the image to be detected, and the gesture information is recognized again using the first detection model, thereby re-executing the entire process as shown in fig. 33.
In some embodiments, when the control command is a movement command for controlling a target control, such as a mouse pointer on the display, to move to the position corresponding to the gesture information, and the controller is repeatedly executing the process of S3303-S3304, it can be understood that the user's gesture should be in a state of continuous movement. If the gesture moves too fast and, in a certain round of detection, the target gesture information and limb information cannot be detected in the multi-frame user behavior images, the controller need not immediately stop executing the process; it may instead predict the target gesture information and limb information that are likely to be present according to the previous one or more detection results, and execute the subsequent movement command according to the predicted target gesture information and limb information.
For example, fig. 37 is a schematic diagram of a moving position of a target control according to an embodiment of the present application. After the controller detects the target gesture information K and the limb information L in the user behavior image during the 1st execution of S3303, it executes in S3304 a movement command that moves the target control to position ① on the display. After the controller detects the target gesture information K and the limb information L in the user behavior image during the 2nd execution of S3303, it executes in S3304 a movement command that moves the target control to position ② on the display. However, suppose the user moves too fast after the 2nd detection: when the controller executes S3303 for the 3rd time, the target gesture information and limb information cannot be identified in the user behavior image, and the target control on the display cannot be moved. When the controller later executes S3303 for the 4th time and detects the target gesture information K and the limb information L in the user behavior image, it executes in S3304 a movement command that moves the target control to position ④ on the display; the target control thus jumps directly from position ② to position ④ on the display, which presents the user with a stuttering, laggy viewing effect and greatly affects the user experience.
Therefore, in this embodiment, when the controller executes S3303 for the 3rd time and fails to recognize the target gesture information and limb information in the user behavior image, it may predict, according to the moving speed and moving direction of the target gesture information K and limb information L recognized the 1st and 2nd times, the target gesture information K and limb information L likely to appear in the 3rd user behavior image, and then execute a movement command that moves the target control to position ③ on the display according to the predicted target gesture information and limb information.
Finally, fig. 38 is another schematic diagram of a moving position of a target control according to an embodiment of the present application. With the above prediction method, for target gesture information and limb information that change along ①-②-③-④ in user behavior images acquired at equal time intervals, even though the target gesture information and limb information cannot be identified in the user behavior image when S3303 is executed the 3rd time, position ③ on the display is determined based on the predicted target gesture information and limb information, so that throughout the process the target control on the display changes uniformly along positions ①-②-③-④. The stutter in fig. 37, where the target control jumps from position ② to position ④, is avoided, greatly improving the display effect, making the operation feel smoother when the user controls the display device through gestures and limbs, and further improving the user experience.
In order to implement the above procedure, in some embodiments, after each execution of S3303, the controller stores and records the target gesture information and the limb information obtained in the execution of S3303, so as to predict when the target gesture information and the limb information are not detected at a subsequent time. In some embodiments, when the target gesture information and limb information are not detected when the process in S3303 is performed a plurality of times (e.g., 3 times) consecutively, the prediction is not performed, but the current flow is stopped to be performed again from S3301.
Based on the above embodiment, in a specific implementation the controller may maintain a gesture movement speed v and a movement direction α according to the recognition results of the second detection model; v and α can be obtained from the frame rate and the inter-frame movement distance (typically over three frames). When the gesture is undetected but the limb can still be detected, multi-frame action prediction (generally three frames) is added, so as to prevent situations such as focus reset and mouse stutter, which affect the user experience, caused by the gesture suddenly becoming undetectable. The predicted gesture position of the next frame can be obtained from the gesture movement speed v and the movement direction α. Of course, a speed threshold β is also needed: if the gesture movement speed exceeds the threshold β, the speed is clamped to β, so as to prevent an excessively fast gesture from affecting the experience.
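The extrapolation step can be sketched as follows, assuming focus positions are tracked in image coordinates; the function and parameter names are illustrative.

```python
# Sketch of the prediction: the gesture movement speed v and direction alpha
# are maintained from the last recognized positions, and when the gesture is
# momentarily undetected the next position is extrapolated, with the speed
# clamped to a threshold beta.

import math

def predict_next_position(last_positions, frame_interval, beta):
    """last_positions: recent (x, y) focus positions, typically three frames."""
    (x0, y0), (x1, y1) = last_positions[-2], last_positions[-1]
    dx, dy = x1 - x0, y1 - y0
    v = math.hypot(dx, dy) / frame_interval         # gesture movement speed
    alpha = math.atan2(dy, dx)                      # movement direction
    v = min(v, beta)                                # clamp overly fast movement
    return (x1 + v * frame_interval * math.cos(alpha),
            y1 + v * frame_interval * math.sin(alpha))
```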
In some embodiments, in the above example, when the second detection model is used to identify the target gesture information and limb information of the user behavior images, the recognition result of a single frame is not used as the basis. Instead, a preset number of user behavior images are extracted at a preset time interval, and the control command corresponding to the same target gesture information and limb information is executed only after the target gesture information and limb information have been detected in those user behavior images. In a specific implementation, the controller of the display device may dynamically adjust the preset time interval according to the working parameters of the display device. For example, when the controller determines according to its current light load that the preset time interval is 100 ms, that is, one frame of user behavior image is extracted every 100 ms, and the preset number is 8, then the preset number of user behavior images corresponds to a time range of 800 ms; if the controller detects the target gesture information and limb information in those 8 frames of user behavior images within that time range, the information is considered real and effective, and the control command corresponding to the same target gesture information and limb information can be executed. When the controller determines that the load is heavy because the current load exceeds a threshold, it sets the preset time interval to 200 ms, that is, one frame of user behavior image is extracted every 200 ms; the controller may then adjust the preset number to 4, so that the reality and effectiveness of the target gesture information and limb information are still determined within the 800 ms time range corresponding to the 4 frames of user behavior images. Therefore, in the control method provided by this embodiment, since the preset time interval and the preset number are in inverse proportion, the controller can dynamically adjust the preset number according to the preset time interval; this reduces the controller's computation under heavy load, prevents the recognition time from being prolonged by a large preset number when the preset time interval is long, and finally meets a certain recognition efficiency on the basis of ensuring recognition accuracy.
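The inverse-proportion adjustment reduces to keeping the total observation window constant, as sketched below with the 800 ms window taken from the example above.

```python
# Sketch of deriving the preset number from the preset time interval so that
# the observation window stays constant: 100 ms -> 8 frames under light load,
# 200 ms -> 4 frames under heavy load.

def preset_frame_count(interval_ms, window_ms=800):
    return max(1, window_ms // interval_ms)

assert preset_frame_count(100) == 8
assert preset_frame_count(200) == 4
```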
In some embodiments, fig. 39 is a flowchart of a control method of a display device according to an embodiment of the present application, which may be a specific implementation manner of the control method shown in fig. 33, as shown in fig. 39, and includes the following steps:
S3901, S3903: and performing gesture detection on the image to be detected, if the target gesture information is detected, executing S3904, otherwise executing S3901 and S3903.
S3904-S3906: and starting a gesture limb control mode, continuing to identify the limb, and determining limb information.
S3907 to S3908: user behavior detection is performed to determine whether a tap gesture of the user is detected, if so, S3910 is performed, otherwise S3909 is performed.
S3909: and executing the control instruction related to the movement.
S3910: executing the control command related to clicking, resetting the detection mode, stopping limb recognition, starting gesture recognition only, and executing S3901-S3902.
S3901-S3902: and performing gesture detection on the image to be detected, acquiring target gesture information of a user, and executing S3907-S3908.
The specific implementation and principle of fig. 39 are the same as those of fig. 33, and are not described in detail in the embodiment of the present application.
In some embodiments, the controller can identify the target gesture information of the human body in the user behavior image using the second detection model, and the first detection model is also obtained by training on images that include gesture information. Therefore, each time the whole process shown in fig. 33 has been executed, the controller can use the target gesture information identified by the second detection model in that execution to train and update the first detection model, so that the first detection model is updated more effectively according to the currently detected target gesture information, improving the timeliness and applicability of the first detection model.
In the specific implementation of the foregoing embodiments of the present application, although the display can be controlled according to the target gesture information and limb information in the user behavior image, the human body may occupy only a small area of the image to be detected acquired by the video acquisition device of the display apparatus. As a result, when the user wants to move a control on the display over a long distance, the gesture of the human body also has to move a long way, which is inconvenient for the user. Therefore, an embodiment of the present application further provides a control method of a display device that establishes a mapping relationship between a virtual frame in the image to be detected and the display, so that when controlling the display device the user can direct the movement of the target control on the display merely by moving the gesture within the virtual frame, greatly reducing the amplitude of the user's action and improving the user experience. The "virtual frame" and its related applications provided in the present application are described below with reference to specific embodiments; the virtual frame is merely an exemplary name and may also be called a mapping frame, an identification area, a mapping area, etc., and the present application does not limit its name.
For example, fig. 40 is a flowchart of a control method of a display device according to an embodiment of the present application, where the method shown in fig. 40 may be applied to the scenario shown in fig. 31, and executed by a controller in the display device, and is used to identify, when the display device displays a control such as a mouse, a movement command for moving the control, where the movement command is issued by a user through gesture information, and specifically the method includes:
S4001: when the display equipment is in a working state, the video acquisition device acquires video data in the direction of the display equipment, and after the controller serving as an execution main body acquires the video data, a frame of image to be detected is extracted from the video data according to a preset time interval. And identifying gesture information of a human body in the image to be detected.
The specific implementation manner of S4001 may refer to S3301-S3303, for example, the controller may use the first detection model to determine whether the gesture information is included in the image to be detected extracted each time, and use the second detection model to identify the target gesture information and the limb information in the user behavior image including the gesture information, so that specific implementation and principles are not repeated. Or in S4001, when the display device displays the target control or runs an application program that needs to display the target control, it is indicated that the target control may need to be moved at this time, so after the image to be detected is obtained each time, the second detection model is directly used to identify the target gesture information and/or limb information in the user behavior image, and the identified target gesture information and/or limb information may be used to determine the movement command later.
S4002: after the first user behavior image extracted in S4001 is identified, the controller determines that the first user behavior image includes target gesture information, and then the controller establishes a virtual frame according to the target gesture information in the first user behavior image, establishes a mapping relationship between the virtual frame and a display of the display device, and may display the target control at a preset first display position, where the first display position may be a center position of the display.
Fig. 41 is a schematic diagram of a virtual frame provided in an embodiment of the present application. When the first user behavior image includes target gesture information K and limb information L, the target gesture information and limb information being an open palm and corresponding to a command of moving the target control displayed on the display, the controller establishes the virtual frame centered on the first focus position P where the target gesture information K is located, and displays the target control at the center position of the display. In some embodiments, the virtual frame may be rectangular, and the ratio of the length to the width of the rectangle is the same as the ratio of the length to the width of the display, although the area of the virtual frame and the area of the display may differ. As shown in fig. 41, the mapping relationship between the virtual frame and the display is represented by the dashed lines in the figure. In this mapping relationship, the midpoint P of the virtual frame corresponds to the midpoint Q of the display, and the four vertices of the rectangular virtual frame correspond respectively to the four vertices of the rectangular display; since the length-to-width ratio of the virtual frame is the same as that of the display, each focus position in the rectangular virtual frame can correspond to one display position on the display, so that when the focus position in the rectangular virtual frame changes, the display position on the display changes correspondingly with it.
In some embodiments, the above mapping relationship may be represented by the relative distance between a focus position in the virtual frame and a target position within the virtual frame, together with the relative distance between the corresponding display position on the display and the same target position on the display. For example, a coordinate system is established with the lower-left vertex P0 of the virtual frame as the origin, so that the coordinates of point P can be expressed as (x, y); a coordinate system is established with the lower-left vertex Q0 of the display as the origin, so that the coordinates of point Q can be expressed as (X, Y). The mapping relationship can then be expressed as the ratio x : X in the long-side direction of the rectangle and the ratio y : Y in the wide-side direction, these ratios being fixed by the respective side lengths of the virtual frame and the display.
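To make the proportional mapping above concrete, the following is a minimal sketch, assuming a rectangular virtual frame whose lower-left vertex and side lengths are known in image coordinates; the names VirtualFrame and map_to_display are illustrative and not part of the embodiment.

class VirtualFrame:
    def __init__(self, origin_x, origin_y, width, height):
        # origin is the lower-left vertex P0 of the virtual frame, in image coordinates
        self.origin_x = origin_x
        self.origin_y = origin_y
        self.width = width
        self.height = height

def map_to_display(frame, focus_x, focus_y, display_width, display_height):
    # relative distance of the focus position from the lower-left vertex of the frame
    rel_x = (focus_x - frame.origin_x) / frame.width
    rel_y = (focus_y - frame.origin_y) / frame.height
    # the same ratios locate the display position relative to the display's lower-left vertex
    return rel_x * display_width, rel_y * display_height

Under this sketch a focus at the center of the virtual frame maps to the center of the display, and a focus at any vertex maps to the corresponding vertex of the display, matching the correspondence shown in fig. 41.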
The controller completes the establishment of the rectangular virtual frame and the mapping relationship in S4001-S4002; the virtual frame and the mapping relationship may then be applied in S4003-S4004, so that movement of the focus position corresponding to the gesture information corresponds to movement of the target control's position on the display.
S4003: when the second user behavior image includes the target gesture information and the second focus position corresponding to the target gesture information is within the rectangular virtual frame, determine a second display position on the display according to the second focus position and the mapping relationship.
S4004: control the target control on the display to move to the second display position determined in S4003.
Specifically, fig. 42 is a schematic diagram of the correspondence between a virtual frame and a display provided in an embodiment of the present application. Assume that in the first user behavior image the virtual frame is established at the first focus position P of the target gesture information, and the target control "mouse" is displayed at the first display position, point Q, at the center of the display. Then, when in the second user behavior image after the first user behavior image the second focus position P' of the target gesture information has moved toward the upper-right corner of the virtual frame relative to the first user behavior image, the controller may determine, according to the first relative distance between the second focus position and the lower-left target position in the virtual frame, combined with the ratio in the mapping relationship, the second relative distance between the corresponding second display position Q' on the display and the lower-left target position on the display. Finally, the controller may calculate the actual position of the second display position Q' on the display according to the second relative distance and the coordinates of the lower-left target position, and display the target control at the second display position Q'.
Fig. 43 is a schematic diagram of movement of a target control according to an embodiment of the present application, illustrating the process shown in fig. 42. When the target gesture information moves from the first focus position P to the second focus position P' between the first user behavior image and the second user behavior image, the controller displays the target control at the first display position Q and then at the second display position Q' on the display according to the change of the focus position in the virtual frame. In this process, the impression presented to the user is that the target control displayed on the display moves correspondingly with the movement of the target gesture information.
It can be appreciated that the above process of S4003-S4004 can be performed repeatedly in a loop: a display position is determined for the focus position corresponding to the target gesture information in each identified user behavior image, thereby continuously controlling the target control to move on the display.
In this embodiment, the position where the target gesture information is located is taken as the focus position, for example one key point in the target gesture information is taken as the focus position. In other embodiments, a key point of the limb information may also be taken as the focus position; the implementation is the same and is not repeated.
In addition, the above example takes the first user behavior image and the second user behavior image as single-frame images. As shown in fig. 40, the method may also be combined with the method shown in fig. 33, in which the user behavior image includes multiple frames of user behavior images, so that the corresponding focus position is determined according to the target gesture information identified in the multiple frames of user behavior images.
In summary, according to the control method for the display device provided by this embodiment, a mapping relationship can be established between the virtual frame in the user behavior image and the display, so that when controlling the display device the user only needs to move the gesture within the virtual frame to make the target control move on the display. This greatly reduces the required amplitude of the user's action and improves the user experience.
In a specific implementation of the above embodiment, when the controller establishes the virtual frame, the size of the established virtual frame may be related to the distance between the human body and the video acquisition device. For example, fig. 44 is an area schematic diagram of a virtual frame provided in an embodiment of the present application. When the distance between the human body and the video acquisition device is relatively long, the area corresponding to the gesture information in the user behavior image is relatively small, so a relatively small virtual frame may be set; when the distance between the human body and the video acquisition device is relatively short, the area corresponding to the gesture information in the user behavior image is relatively large, so a relatively large virtual frame may be set. The area of the virtual frame may vary with the distance in a directly proportional, linear relationship, or the mapping may be divided into multiple levels (that is, a certain frame size corresponds to a certain distance range); the specific mapping relationship can be adjusted according to the actual situation. In some embodiments, the controller may determine the distance between the human body and the display device (on which the video acquisition device is disposed) by an infrared method or any other ranging method provided by the display device, or the controller may determine the corresponding distance according to the area corresponding to the gesture information in the user behavior image, and then determine the area of the virtual frame from the area of the gesture information, and so on.
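As a minimal sketch of the tiered mapping described above, the distance thresholds and frame sizes below are assumptions chosen only for illustration; the virtual frame keeps a 16:9 aspect ratio matching the display in this example.

def virtual_frame_size(distance_m):
    # a nearer user occupies a larger area in the image, so a larger virtual frame is used;
    # a farther user occupies a smaller area, so a smaller virtual frame is used
    if distance_m < 1.5:
        return (480, 270)
    elif distance_m < 3.0:
        return (320, 180)
    else:
        return (160, 90)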
In some embodiments, when the established virtual frame is relatively close to the edge of the user behavior image, the accuracy of identifying the target gesture information may be reduced due to limitations of the image recognition processing algorithm and similar factors. Therefore, the controller may also establish a control optimal range within the user behavior image, leaving an edge region around its edge. For example, fig. 45 is a schematic diagram of an edge region provided by an embodiment of the present application. It can be seen that the edge region refers to the area inside the user behavior image but outside the control optimal range, whose distance from the boundary of the user behavior image is smaller than a preset distance. In the upper user behavior image in fig. 45, the virtual frame established from the target gesture information in the first user behavior image lies entirely within the control optimal range and outside the edge region, so the subsequent calculation can continue as described above. When the virtual frame established by the controller according to the target gesture information in the first user behavior image has a partial area located in the edge region, as in the lower user behavior image in fig. 45 where the left side of the virtual frame lies in the edge region, the controller may compress the virtual frame in the transverse direction to obtain a transversely compressed virtual frame. It can be understood that a mapping relationship can then be established between the compressed virtual frame and the display. In this case a given movement of the focus position corresponding to the target gesture information corresponds to a larger change of the display position on the display; although the user perceives the target control as moving faster in the transverse direction, the controller is no longer required to recognize the target gesture information in the edge region of the user behavior image, which improves the recognition precision of the target gesture information and the accuracy of the whole control process.
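The transverse compression can be sketched as clamping the virtual frame to the control optimal range; the function name and the parameters opt_left and opt_right (the assumed horizontal boundaries of the control optimal range in image coordinates) are illustrative.

def compress_to_optimal_range(frame_left, frame_width, opt_left, opt_right):
    # shrink the frame horizontally so that no part of it lies in the edge region
    left = max(frame_left, opt_left)
    right = min(frame_left + frame_width, opt_right)
    return left, right - left

Because the compressed frame is narrower while the display width is unchanged, the same transverse movement of the focus maps to a larger change of the display position, which matches the behavior described above.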
The above embodiments provide a virtual frame in the user behavior image, so that the user can control the movement of the target control on the display through the movement of the gesture information within the virtual frame. In some cases, however, the user's gesture information may move out of the virtual frame because of a large action, overall movement of the body, or similar reasons, so that its focus position falls outside the virtual frame and the control effect is affected. For example, fig. 46 is a schematic state diagram of gesture information provided in an embodiment of the present application. In state S1, the second user behavior image includes the target gesture information and the second focus position corresponding to the target gesture information lies inside the established virtual frame K1; the control method of the foregoing embodiment can be executed normally, and the display position of the target control is determined according to the focus position of the target gesture information in the virtual frame. In state S2 in fig. 46, the second user behavior image includes the target gesture information, but the second focus position corresponding to the target gesture information appears outside the virtual frame K1 in the user behavior image; in this case the display position of the target control cannot be determined normally from the focus position of the target gesture information in the virtual frame.
Therefore, after the controller recognizes that the second focus corresponding to the gesture information in the second user behavior image is located at point P2 outside the virtual frame, the virtual frame K2 can be re-established with point P2, where the second focus is currently located, as its center, and a mapping relationship between the virtual frame K2 and the display can be established. Fig. 47 is a schematic diagram of a re-established virtual frame provided in an embodiment of the present application. It can be seen that in the re-established virtual frame K2 in fig. 47 the second focus position P2 is located at the center of virtual frame K2, so the controller controls the target control to be displayed at the center position of the display according to the second focus position P2. This also gives the user the visual effect of the target control being reset, and avoids the problem that the target control cannot be controlled after the gesture information moves out of the virtual frame.
Fig. 48 is another schematic diagram of a re-established virtual frame according to an embodiment of the present application. In this manner, when the gesture information shown in state S2 in fig. 46 appears outside the virtual frame K1 in the image to be detected, the controller resets the virtual frame. Assume that the controller displayed the target control at the first relative position Q1 on the display according to the position of the target gesture information in the virtual frame K1 in the previous user behavior image. The virtual frame K2 is then re-established according to the relative position of Q1 within the whole display, so that the relative position of the second focus position P2 within the virtual frame K2 is the same as the relative position of Q1 within the display. Therefore, the controller can continue to display the target control at the first relative position Q1, and the reset of the virtual frame K2 is completed without the target control jumping to the center position of the display. In subsequent user behavior images, when the target gesture information changes within the virtual frame K2, the controller determines the display position of the target control according to the focus position of the target gesture information in the virtual frame K2. In this way the focus reset is completed without the user noticing, the problem that the target control cannot be controlled because the target gesture information has moved out of the virtual frame is avoided, the whole process is smoother, and the user experience is further improved.
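A minimal sketch of this second re-establishment strategy follows; rel_x and rel_y denote the relative position of Q1 within the display, expressed as fractions of the display's width and height, and the returned tuple is the lower-left vertex and side lengths of the new frame. All names are illustrative assumptions.

def rebuild_frame_keeping_relative_position(focus_x, focus_y, rel_x, rel_y,
                                             frame_width, frame_height):
    # place the new virtual frame so that the current focus position occupies the same
    # relative position inside it as Q1 occupies inside the display
    origin_x = focus_x - rel_x * frame_width
    origin_y = focus_y - rel_y * frame_height
    return origin_x, origin_y, frame_width, frame_height

Mapping the unchanged focus position through the new frame then yields Q1 again, so the target control does not jump when the frame is reset.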
In some embodiments, after the controller performs the above process to re-establish the virtual frame, relevant prompt information may be displayed on the display to inform the user that the virtual frame has been re-established. Alternatively, after the controller determines in the above process that the virtual frame is to be re-established, it may display information on the display prompting the user to confirm the update of the virtual frame, and execute the process of rebuilding the virtual frame only after receiving the user's confirmation. In this way the user can control the whole process, the frame is rebuilt according to the user's intention, and invalid rebuilding is prevented in cases such as the user actively leaving.
In some embodiments, during the process of controlling the movement of the target control, when the controller does not recognize the target gesture information in a preset number of consecutive user behavior images, it may stop displaying the target control on the display and end the flow shown in fig. 40. Alternatively, when the user behavior images processed by the controller within a certain preset time period do not include the target gesture information, the display of the target control may be stopped and the flow ended. Alternatively, when the controller recognizes during the control process that the target gesture information included in a user behavior image corresponds to a stop command, the display of the target control may be stopped and the flow ended.
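A minimal sketch of the first stop condition (a preset number of consecutive frames without the target gesture); the counter name and the threshold of 5 frames are assumptions for illustration.

def update_miss_counter(gesture_found, miss_count, max_misses=5):
    # count consecutive frames in which no target gesture information was recognized;
    # once the threshold is reached the controller stops displaying the target control
    miss_count = 0 if gesture_found else miss_count + 1
    return miss_count, miss_count >= max_misses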
In some embodiments, during execution of the method shown in fig. 40, the controller determines a display position on the display according to the focus position of the target gesture information within the virtual frame in each frame of user behavior image, and displays the target control at that display position. In a specific implementation, fig. 49 is a schematic diagram of movement of the target control provided in an embodiment of the present application. As can be seen from fig. 49, assume that the controller determines that the target gesture information in user behavior image 1 is located at focus position P1 in the virtual frame, and accordingly displays the target control at display position Q1 on the display; the target gesture information in user behavior image 2 is located at focus position P2, and the target control is displayed at display position Q2; and the target gesture information in user behavior image 3 is located at focus position P3, and the target control is displayed at display position Q3. However, in the above process the user's gesture may move too fast between P1 and P2, so that the target control displayed on the display jumps from Q1 to Q2, giving the user the impression that the target control moves at an uneven speed and jumps.
Therefore, after the controller determines the second focus position, the processing performed by the controller may follow the state change in fig. 50, where fig. 50 is another schematic diagram of the movement of the target control according to an embodiment of the present application. As shown in fig. 50, after the controller determines the first focus position P1 and the second focus position P2 in the virtual frame, the distance between the second focus position and the first focus position is compared with the preset time interval. If the ratio of the P1-P2 distance to the preset time interval (i.e. the interval at which the user behavior images containing the first and second focus positions are extracted) is greater than a preset threshold, the target gesture information is moving too fast, and determining the second display position of the target control directly from the second focus position would produce the display effect shown in fig. 49. Accordingly, the controller determines a third focus position P2' between the first focus position and the second focus position, where the ratio of the distance between the third focus position P2' and the first focus position P1 to the preset time interval is not greater than the preset threshold; the third focus position P2' may be a point on the line segment connecting P1 and P2, so that P1, P2' and P2 are collinear. The controller may then determine a second display position Q2' on the display according to the third focus position P2' and the mapping relationship, and control the target control to move from the first display position Q1 to the second display position Q2'.
In this moving process, the gesture information has moved to the second focus position P2 while the target control displayed on the display has moved only to the second display position Q2' corresponding to the third focus position P2', not to the display position Q2 corresponding to the second focus position. Therefore, when the controller processes the third user behavior image after the second user behavior image, if the third user behavior image includes the target gesture information, the fourth focus position P3 corresponding to the target gesture information is located in the rectangular virtual frame, and the ratio of the distance between the fourth focus position P3 and the third focus position P2' to the preset time interval is not greater than the preset threshold, the third display position Q3 corresponding to the fourth focus position may be determined according to the mapping relationship, and the target control on the display is controlled to move from the second display position Q2' to the third display position Q3.
In the whole process, when the target gesture information moves too fast between P1 and P2, the moving distance of the target control displayed on the display is reduced; when the moving speed of the target gesture information decreases between P2 and P3, the distance withheld during P1-P2 is made up. From the user's perspective, when the target gesture information moves from position P1 on the left side of the virtual frame to position P3 on the right side, the target control on the display also moves from position Q1 on the left side of the display to position Q3 on the right side. In this way, even if the user's target gesture information moves too fast between P1 and P2, the variation in the moving speed of the target control displayed on the display over the whole P1-P3 range is not too large, preserving the user's impression that the target control moves at a uniform speed and changes continuously.
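The determination of the third focus position P2' can be sketched as clamping the step length per sampling interval; max_speed (the preset threshold) and the helper name limit_focus_step are assumptions made only for illustration.

import math

def limit_focus_step(p1, p2, interval_s, max_speed):
    # if the focus moved faster than max_speed during the sampling interval, return an
    # intermediate point P2' on the segment P1-P2 whose average speed equals max_speed;
    # otherwise return P2 unchanged
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    dist = math.hypot(dx, dy)
    if dist <= max_speed * interval_s:
        return p2
    scale = (max_speed * interval_s) / dist
    return (p1[0] + dx * scale, p1[1] + dy * scale)

The returned point is then mapped to the display as usual, so the withheld distance is recovered in later frames once the gesture slows down.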
On the basis of the above embodiments, an embodiment of the present application further provides a display device control method. Fig. 51 is a schematic diagram of a display device control process provided by an embodiment of the present application; as shown in fig. 51, the method includes the following steps:
S5101: acquire a plurality of frames of user behavior images.
S5102: perform gesture recognition processing on each frame of user behavior image to obtain target gesture information.
S5103: control the display to display corresponding content based on the target gesture information.
In one embodiment, controlling the display to display the corresponding content based on the target gesture information includes:
acquiring a cursor position corresponding to each frame of user behavior image according to the target gesture information, where the cursor position is the display position in the display to which the user's target gesture in the user behavior image is mapped;
and determining the user's gesture movement track according to the cursor positions, and controlling the cursor in the display to move along the gesture movement track.
A specific embodiment of the display device control process is described below. Fig. 52 is a schematic diagram of another display device control process provided by an embodiment of the present application; as shown in fig. 52, the method includes the following steps:
Step 5201: control the image collector to collect a plurality of frames of user behavior images of the user.
Step 5202: perform gesture recognition processing on the user behavior images to obtain the target gesture information of each frame of user behavior image.
Step 5203: acquire a cursor position corresponding to each frame of user behavior image according to the target gesture information, where the cursor position is the display position in the display to which the user's gesture in the user behavior image is mapped.
Step 5204: determine the user's gesture movement track according to the cursor positions, and control the cursor in the display to move along the gesture movement track.
In one embodiment, the method further comprises: acquiring a gesture information stream, where the gesture information stream includes a plurality of consecutive frames of user behavior images; extracting key gesture information from the gesture information stream, where the key gesture information includes key gesture types of a plurality of stages and a confidence parameter for each stage; matching the key gesture information using a detection model to obtain the target gesture information, where the detection model includes a plurality of nodes stored in a tree structure, each node is provided with a gesture template and designated subordinate nodes, and the target gesture information is the combination of nodes whose gesture templates have the same key gesture type as each stage and whose confidence parameters lie within the confidence interval; and executing the control instruction associated with the target gesture information.
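A minimal sketch of matching staged key gestures against a tree-structured detection model, under the assumption that each node stores one gesture template, a confidence interval, and its designated subordinate nodes; all class, attribute, and function names are illustrative.

class GestureNode:
    def __init__(self, template, conf_low, conf_high, children=None, command=None):
        self.template = template          # key gesture type this node matches
        self.conf_low = conf_low          # lower bound of the confidence interval
        self.conf_high = conf_high        # upper bound of the confidence interval
        self.children = children or []    # designated subordinate nodes
        self.command = command            # control instruction associated with a full match

def match_key_gestures(root_nodes, key_stages):
    # key_stages is a list of (gesture_type, confidence) pairs, one per stage
    candidates, matched = root_nodes, []
    for gesture_type, confidence in key_stages:
        node = next((n for n in candidates
                     if n.template == gesture_type
                     and n.conf_low <= confidence <= n.conf_high), None)
        if node is None:
            return None                   # no node combination matches every stage
        matched.append(node)
        candidates = node.children
    return matched                        # the matched node combination (target gesture)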
In one embodiment, the method further comprises: extracting one frame of image to be detected, at a preset time interval, from the consecutive multi-frame images of the video data acquired by the video acquisition device of the display device; judging whether the image to be detected includes gesture information of a human body by using a first detection model; if so, continuing to extract a preset number of user behavior images to be detected from the video data according to the preset time interval and the preset number, and identifying the target gesture information and limb information of the human body in the preset number of user behavior images to be detected by using a second detection model, where the amount of data computed by the first detection model is smaller than the amount of data computed by the second detection model; and executing the control command corresponding to the target gesture information and limb information in the preset number of user behavior images to be detected.
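A minimal sketch of this two-stage detection; has_gesture stands in for the lightweight first detection model and recognize for the heavier second detection model, both assumed to be callables supplied by the caller, as is the execute callback.

def detect_and_execute(frames, n_frames, has_gesture, recognize, execute):
    # frames is an iterator of images sampled at the preset time interval
    first = next(frames, None)
    if first is None or not has_gesture(first):
        return None                       # cheap screening: no gesture, nothing to do
    batch = [frame for _, frame in zip(range(n_frames), frames)]
    results = [recognize(frame) for frame in batch]   # gesture and limb info per frame
    return execute(results)               # run the control command mapped from the results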
In one embodiment, the method further comprises: identifying target gesture information in a first user behavior image; establishing a rectangular virtual frame in the first user behavior image centered on the first focus position corresponding to the target gesture information, displaying a target control at a first display position of the display, and determining a mapping relationship between the rectangular virtual frame and the display of the display device; when a second user behavior image after the first user behavior image includes the target gesture information and the second focus position corresponding to the target gesture information is located within the rectangular virtual frame, determining a second display position on the display according to the second focus position and the mapping relationship; and controlling the target control on the display to move to the second display position.
Fig. 53 is a flowchart of an embodiment of a display device control method according to the present application. In the specific implementation shown in fig. 53, the process includes the following steps:
S5301: the controller of the display device first performs gesture detection; if the gesture state is normal, S5302-S5306 are performed, otherwise S5307 is performed.
S5302-S5306: map the position of the television interface cursor according to the position of the hand in the virtual frame, perform gesture movement control, update the gesture speed and direction, and perform gesture click detection, gesture return detection, and so on.
S5307: perform behavior prediction over a plurality of frames (typically three frames).
S5308: if the gesture is re-detected during this period, execute S5312; otherwise execute S5309.
S5309-S5310: clear the mouse cursor from the television interface; if the gesture is still not detected for a long time, execute S5311.
S5311: exit gesture and limb recognition and enter the global gesture detection scheme until a focus gesture is detected.
S5312: reset the focus: if the distance is short, continue moving; if the distance is long, reset the focus to the center position of the television. When the focus is reset, the virtual frame needs to be regenerated. Furthermore, if the gesture is not detected multiple times.
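A minimal sketch of the focus-reset decision in S5312; jump_threshold and the returned flag indicating that the virtual frame must be regenerated are illustrative assumptions.

import math

def reset_focus(prev_cursor, new_cursor, display_center, jump_threshold):
    # a short displacement keeps the cursor moving normally; a long displacement resets
    # the cursor to the center of the television and the virtual frame is regenerated
    dist = math.hypot(new_cursor[0] - prev_cursor[0], new_cursor[1] - prev_cursor[1])
    if dist <= jump_threshold:
        return new_cursor, False
    return display_center, True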
The foregoing description, for purposes of explanation, has been presented in conjunction with specific embodiments. The illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed above. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and the practical application, to thereby enable others skilled in the art to best utilize the embodiments and various embodiments with various modifications as are suited to the particular use contemplated.