CN117612265A - Model training method, gesture recognition method, electronic device and storage medium - Google Patents
Model training method, gesture recognition method, electronic device and storage medium
Info
- Publication number
- CN117612265A (Application number CN202311862254.XA)
- Authority
- CN
- China
- Prior art keywords
- gesture recognition
- dynamic
- recognition model
- gesture
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0499—Feedforward networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
Abstract
The application discloses a model training method, a gesture recognition method, an electronic device, and a storage medium. The training method comprises: collecting video data of multiple categories of dynamic gestures to obtain a training sample set; extracting features from the training sample set through a preset neural network model to obtain bone node data for each category of dynamic gesture; converting the bone node data into bone data through a conversion algorithm; training the gesture recognition model multiple times based on the multiple categories of dynamic gestures and the corresponding bone data to obtain the accuracy of the gesture recognition model; and, in response to the accuracy of the gesture recognition model reaching a preset value, completing training of the gesture recognition model. By collecting video data of gestures in motion, converting the video data of different gestures into bone data, and training the gesture recognition model on that bone data, the influence of shooting angle and hand shape on gesture recognition can be reduced and the accuracy of the gesture recognition model improved.
Description
Technical Field
The present disclosure relates to the field of gesture recognition technologies, and in particular, to a training method for a model, a gesture recognition method, an electronic device, and a storage medium.
Background
At present, gesture recognition technology is widely applied in fields such as human-computer interaction and sign language recognition. An electronic device recognizes a captured gesture in order to trigger a preset function or to interpret the gesture's meaning. Existing gesture recognition based on neural networks or machine learning usually requires a large number of gesture pictures for training; in addition, recognition accuracy is affected by the different hand shapes of different people and by different shooting angles of the same gesture.
Disclosure of Invention
In order to solve the problems, the application provides a training method of a model, a gesture recognition method, electronic equipment and a storage medium.
In order to solve the above technical problems, the present application provides a first technical solution: a training method for a gesture recognition model, the training method comprising: collecting video data of multiple categories of dynamic gestures to obtain a training sample set; extracting features from the training sample set through a preset neural network model to obtain bone node data for the multiple categories of dynamic gestures; converting the bone node data into bone data through a conversion algorithm; training the gesture recognition model multiple times based on the multiple categories of dynamic gestures and the corresponding bone data to obtain the accuracy of the gesture recognition model; and, in response to the accuracy of the gesture recognition model reaching a preset value, completing training of the gesture recognition model.
Wherein the step of training the gesture recognition model multiple times based on the multiple categories of dynamic gestures and the corresponding bone data comprises: setting multiple model parameters of the gesture recognition model; inputting the multiple categories of dynamic gestures and the corresponding bone data into the configured gesture recognition model; training the gesture recognition model multiple times based on the multiple model parameters, the multiple categories of dynamic gestures, and the corresponding bone data; and outputting the trained gesture recognition model together with the training results of the multiple categories of dynamic gestures, where the training results include the number of training rounds and the training accuracy.
Wherein, after the step of training the gesture recognition model multiple times based on the dynamic gestures of multiple categories and the corresponding bone data, the training method further comprises: testing the gesture recognition model based on skeletal data of the dynamic gestures of a plurality of categories; the step of testing the gesture recognition model based on the skeletal data of the dynamic gestures of the plurality of categories includes: based on skeleton data of the dynamic gestures of a plurality of categories, obtaining a two-dimensional image of the dynamic gestures of each category; and inputting the two-dimensional image into the gesture recognition model to obtain the prediction probability and the test accuracy of the dynamic gesture in each category respectively.
Wherein the gesture recognition model includes an input layer, a network layer, and an output layer, and the step of setting a plurality of model parameters of the gesture recognition model includes: setting the width, height and depth of the input layer; the network layer comprises at least one full-connection layer, and the connection type, the neuron number and the activation function of the full-connection layer are set; and setting the output number of the output layers, wherein the output number is equal to the category number of the dynamic gestures.
The step of acquiring video data of dynamic gestures of multiple categories to obtain a training sample set comprises the following steps: collecting video data of dynamic gestures of multiple categories within preset time; sampling the video data based on a preset time interval to obtain a data set of the dynamic gestures of each category; and obtaining the training sample set based on the data sets of the dynamic gestures of a plurality of categories.
In order to solve the technical problems, another technical scheme provided by the application is as follows: provided is a gesture recognition method, the gesture recognition method including: collecting video data of dynamic gestures; based on the video data of the dynamic gesture, obtaining skeleton data of the dynamic gesture; inputting the skeleton data of the dynamic gesture into the gesture recognition model to obtain the category of the dynamic gesture; the gesture recognition model is obtained through training by the training method of the gesture recognition model.
The step of inputting the skeleton data of the dynamic gesture into the gesture recognition model to obtain the category of the dynamic gesture comprises the following steps: inputting skeleton data of the dynamic gestures into the gesture recognition model to obtain prediction probabilities of the dynamic gestures in each category respectively; and selecting the dynamic gesture category with the highest prediction probability as a category result of the dynamic gesture.
The step of inputting the bone data of the dynamic gesture into the gesture recognition model to obtain the prediction probability of the dynamic gesture for each category comprises: inputting the bone data of the dynamic gesture into the gesture recognition model to obtain a two-dimensional image of the dynamic gesture; and inputting the two-dimensional image of the dynamic gesture into the gesture recognition model to obtain the prediction probability of the dynamic gesture for each category.
In order to solve the technical problems, another technical scheme provided by the application is as follows: an electronic device is provided, which includes a processor and a memory connected to the processor, wherein program data and preset grammar rules are stored in the memory, and the processor invokes the program data stored in the memory to execute the training method of the gesture recognition model or the gesture recognition method.
In order to solve the technical problems, another technical scheme provided by the application is as follows: there is provided a storage medium storing a computer program for implementing a training method of a gesture recognition model as described above or a gesture recognition method as described above when executed by a processor.
The application provides a model training method, a gesture recognition method, an electronic device, and a storage medium. The training method comprises: collecting video data of multiple categories of dynamic gestures to obtain a training sample set; extracting features from the training sample set through a preset neural network model to obtain bone node data for each category of dynamic gesture; converting the bone node data into bone data through a conversion algorithm; training the gesture recognition model multiple times based on the multiple categories of dynamic gestures and the corresponding bone data to obtain the accuracy of the gesture recognition model; and, in response to the accuracy reaching a preset value, completing training of the gesture recognition model. In this way, converting the video data of different gestures into bone data and training the gesture recognition model on that bone data reduces the influence of hand shape on gesture recognition; in addition, because video data of gestures in motion are collected, the influence of shooting angle is also reduced, improving the accuracy with which the gesture recognition model recognizes different gestures.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a flow chart of an embodiment of a method for training a gesture recognition model provided herein;
FIG. 2 is a flow chart of another embodiment of a method of training a gesture recognition model provided herein;
FIG. 3 is a flow chart of yet another embodiment of a training method for a gesture recognition model provided herein;
FIG. 4 is a flow chart of an embodiment of a gesture recognition method provided herein;
FIG. 5 is a flow chart illustrating another embodiment of a gesture recognition method provided herein;
FIG. 6 is a schematic diagram of a frame of an embodiment of an electronic device provided herein;
fig. 7 is a schematic structural diagram of an embodiment of a storage medium provided in the present application.
Detailed Description
In order that the above objects, features and advantages of the invention will be readily understood, a more particular description of the invention will be rendered by reference to the appended drawings. It is to be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terms "first," "second," and the like in this disclosure are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
Existing gesture recognition based on neural networks or machine learning generally requires a large number of training samples and suffers from drawbacks such as a heavy computational load and complex models. In addition, recognition accuracy is affected by the different hand shapes of different people and by different shooting angles of the same gesture.
Based on the above problems, the present application provides a training method for a gesture recognition model. The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. Referring to fig. 1, fig. 1 is a flowchart illustrating an embodiment of the training method of the gesture recognition model provided in the present application. The training method is applied to an electronic device; the electronic device may be a server, a computer, a tablet, or the like, and in this application a computer is taken as an example.
The training method of the gesture recognition model of the embodiment comprises the following steps:
s101: video data of dynamic gestures of multiple categories are collected, and a training sample set is obtained.
The video data of the dynamic gestures may be collected through a camera module of the electronic device itself, or through an external camera module.
A dynamic gesture is a single gesture performed in motion. For example, when capturing video data of a dynamic gesture, the user may rotate the wrist or pan the hand so that the gesture is captured from different shooting angles.
The multiple categories of dynamic gestures may include gestures such as scissors, stone, and cloth, and may also include gestures representing different numbers.
S102: and extracting features of the training sample set through a preset neural network model to respectively obtain bone node data of dynamic gestures of multiple categories.
The preset neural network model may be an existing neural network model, which is not described in detail here. For example, it may be a bone node recognition model built on the OpenPose model or the SRHandNet model.
The bone node data include position data of 21 bone nodes, which are distributed over the fingers and the palm: each finger includes 4 bone nodes, and the palm includes 1 bone node.
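As an illustration, one possible indexing of these 21 bone nodes can be sketched as follows. The OpenPose-style layout (palm node first, then 4 nodes per finger) is an assumption for illustration; the description above only states the node counts.

```python
# Hypothetical indexing of the 21 hand bone nodes described above:
# node 0 is the palm node, followed by 4 nodes per finger.
PALM = 0
FINGERS = ["thumb", "index", "middle", "ring", "pinky"]

def finger_nodes(finger):
    """Return the 4 node indices of a finger, counted from the palm outward."""
    base = 1 + FINGERS.index(finger) * 4
    return list(range(base, base + 4))

ALL_NODES = [PALM] + [n for f in FINGERS for n in finger_nodes(f)]
assert len(ALL_NODES) == 21  # 5 fingers x 4 nodes + 1 palm node
```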
S103: the bone node data is converted into bone data by a conversion algorithm.
The bone data comprise dimension data of the fingers and palm and rotation data of the fingers and palm. When the bone node data are converted into bone data through the conversion algorithm, the dimension data of the fingers and palm remain unchanged, while the position data of the bone nodes of a gesture are converted into rotation data of the fingers and palm.
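A minimal sketch of such a conversion follows, assuming 2D node positions and taking "rotation data" to mean the joint angles between consecutive bones; the exact conversion algorithm is not specified in the source, so both assumptions are illustrative.

```python
import math

def bone_length(p, q):
    """Length of the bone between two adjacent nodes (the dimension data)."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def joint_angle(a, b, c):
    """Angle at node b between bones b->a and b->c, in degrees
    (one possible form of the rotation data)."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to guard against floating-point drift outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))
```

A straight finger segment yields a 180-degree joint angle, while a bent joint yields something smaller, so the angles capture the gesture's shape independently of where the hand sits in the frame.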
S104: and training the gesture recognition model for multiple times based on the dynamic gestures of multiple categories and corresponding bone data to obtain the accuracy of the gesture recognition model.
The names of the multiple categories of dynamic gestures and the corresponding bone data are input into the gesture recognition model, and the model is trained multiple times. The names are used to label the corresponding bone data. The gesture recognition model outputs its accuracy after each training round, and the accuracy gradually increases as the number of training rounds grows.
S105: and responding to the accuracy of the gesture recognition model reaching a preset value, and completing training of the gesture recognition model.
The preset value can be set manually; for example, it may be 0.90, 0.93, or 0.95. In response to the accuracy of the gesture recognition model reaching the preset value, training is completed and the trained gesture recognition model is output.
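The stopping rule in S104-S105 can be sketched as a loop that retrains until the reported accuracy reaches the preset value. The model interface and the toy stand-in below are illustrative assumptions, not the patent's actual model.

```python
class ToyModel:
    """Illustrative stand-in whose accuracy improves by a fixed step per round."""
    def __init__(self):
        self.accuracy = 0.5
    def train_round(self, samples):
        self.accuracy = min(1.0, self.accuracy + 0.125)
    def evaluate(self, samples):
        return self.accuracy

def train_until(model, samples, preset_value=0.95, max_rounds=100):
    """Train repeatedly; stop as soon as the accuracy reaches the preset value."""
    for rounds in range(1, max_rounds + 1):
        model.train_round(samples)
        accuracy = model.evaluate(samples)
        if accuracy >= preset_value:
            return rounds, accuracy
    return max_rounds, accuracy
```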
Thus, in the training method of this embodiment, converting the video data of different gestures into bone data and training the gesture recognition model on that bone data reduces the influence of hand shape on gesture recognition; in addition, because video data of gestures in motion are collected, the influence of shooting angle is also reduced, improving the accuracy with which the gesture recognition model recognizes different gestures.
Referring to fig. 2, fig. 2 is a flowchart illustrating another embodiment of a training method of a gesture recognition model provided in the present application. The present embodiment is a specific implementation manner of the step of training the gesture recognition model multiple times based on multiple types of dynamic gestures and corresponding bone data in fig. 1.
The training method of the gesture recognition model of the embodiment comprises the following steps:
s201: a plurality of model parameters of the gesture recognition model are set.
The gesture recognition model comprises an input layer, a network layer and an output layer, and model parameters of the input layer, the network layer and the output layer are respectively set.
The width, height, and depth of the input layer are set; they correspond to the image width, image height, and number of image channels, respectively. The number of image channels is the number of color channels: for example, the depth of an RGB image is 3 and the depth of a grayscale image is 1.
The network layer comprises at least one fully connected layer; in this embodiment the gesture recognition model comprises two fully connected layers, and the connection type, number of neurons, and activation function of each fully connected layer are set. The connection type of a fully connected layer is full connection, and the activation function may be the ReLU function.
And setting the output number of the output layers, wherein the output number is equal to the category number of the dynamic gestures.
The model parameters of the gesture recognition model may be set according to historical experience; different model parameters affect the performance of the gesture recognition model.
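The architecture just described (an input layer, two fully connected ReLU layers, and an output whose size equals the number of gesture categories) can be sketched as a plain forward pass. The layer sizes, the 42-element input vector, and the softmax output are illustrative assumptions, not parameters taken from the source.

```python
import math
import random

def relu(v):
    return [max(0.0, x) for x in v]

def softmax(v):
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def dense(x, weights, biases):
    # One fully connected layer: every output unit connects to every input.
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    # Small random weights, zero biases (illustrative initialization).
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(x, layers):
    # Two ReLU fully connected layers, then a softmax over gesture categories.
    h = relu(dense(x, *layers[0]))
    h = relu(dense(h, *layers[1]))
    return softmax(dense(h, *layers[2]))

rng = random.Random(0)
N_CLASSES = 3  # output size equals the number of gesture categories
# 42 inputs: an assumed flattened vector of 21 bone nodes x 2 features.
LAYERS = [init_layer(42, 16, rng), init_layer(16, 16, rng),
          init_layer(16, N_CLASSES, rng)]
probs = forward([0.5] * 42, LAYERS)
```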
S202: and inputting the dynamic gestures of the multiple categories and corresponding skeletal data into the set gesture recognition model.
After setting the model parameters of the gesture recognition model, inputting the dynamic gestures of multiple categories and corresponding bone data into the gesture recognition model.
S203: the gesture recognition model is trained multiple times based on the multiple model parameters, the multiple categories of dynamic gestures, and the corresponding skeletal data.
S204: outputting the trained gesture recognition model and outputting training results of dynamic gestures of a plurality of categories, wherein the training results comprise training times and training accuracy.
After the gesture recognition model is trained for a plurality of times, outputting the trained gesture recognition model, the training times and the training accuracy.
Optionally, after the training step of the gesture recognition model is completed in response to the accuracy of the gesture recognition model reaching the preset value, the training method of the gesture recognition model further includes: the gesture recognition model is tested based on skeletal data of the dynamic gestures of the plurality of categories.
S205: based on the skeleton data of the dynamic gestures of the multiple categories, a two-dimensional image of the dynamic gestures of each category is obtained.
During the multiple rounds of training of the gesture recognition model, the bone data of each category of dynamic gesture are mapped onto a two-dimensional array to obtain a two-dimensional image. During each round of training the two-dimensional image is updated, until training of the gesture recognition model is completed and a characteristic image of the dynamic gesture is obtained. The characteristic images represent the gesture features of the corresponding dynamic gestures.
When the gesture recognition model is tested, bone data of the dynamic gestures of each category are input into the gesture recognition model again, and two-dimensional images corresponding to the dynamic gestures of each category are obtained respectively.
S206: inputting the two-dimensional image into a gesture recognition model to obtain the prediction probability and the test accuracy of the dynamic gesture in each category respectively.
The two-dimensional image of the dynamic gesture is input into the gesture recognition model and compared with the characteristic images generated during training for the multiple categories of dynamic gestures, yielding the prediction probability of the dynamic gesture for each category. The category with the highest prediction probability is selected as the category result of the dynamic gesture, from which the test accuracy is obtained.
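One assumed way to carry out the mapping of bone data onto a two-dimensional array is to stack the per-frame bone feature vectors as rows of a grayscale image. The row/column layout and 0-255 scaling below are illustrative choices; the source only says the bone data are mapped to a two-dimensional array.

```python
def bone_data_to_image(frames):
    """Map per-frame bone feature vectors onto a 2D grayscale array:
    one row per sampled frame, one column per bone feature,
    values scaled to the 0-255 range."""
    lo = min(min(f) for f in frames)
    hi = max(max(f) for f in frames)
    scale = 255.0 / (hi - lo) if hi > lo else 0.0
    return [[round((x - lo) * scale) for x in f] for f in frames]
```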
Therefore, by setting the model parameters of the gesture recognition model, the method can realize the custom gesture recognition model, train the gesture recognition model based on skeleton data of multiple types of dynamic gestures, reduce the influence of hand shapes on gesture recognition, and improve the accuracy of the gesture recognition model in recognizing different gestures.
Referring to fig. 3, fig. 3 is a flowchart illustrating a training method of a gesture recognition model according to another embodiment of the present application. The embodiment is a specific implementation manner of acquiring video data of multiple types of dynamic gestures in fig. 1 to obtain a training sample set.
The training method of the gesture recognition model of the embodiment comprises the following steps:
s301: and collecting video data of dynamic gestures of multiple categories within preset time.
The preset time may be 3, 4, or 5 seconds; in this embodiment it is 3 seconds. Within the preset time, video data of each category of dynamic gesture are collected through the electronic device or an external camera module.
S302: and sampling the video data based on a preset time interval to obtain a data set of the dynamic gestures of each category.
The preset time interval may be 0.2, 0.3, or 0.4 seconds; in this embodiment it is 0.2 seconds. The video data are sampled at the preset time interval to obtain multiple frames of gesture images, from which the data set of each category of dynamic gesture is obtained.
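The sampling step can be sketched as follows, assuming the clip is available as a list of frames at a known frame rate. The 30 fps figure is an assumption; the 3-second duration and 0.2-second interval come from this embodiment.

```python
def sample_frames(video_frames, fps=30, interval_s=0.2, duration_s=3.0):
    """Keep one frame every interval_s seconds from the first duration_s seconds."""
    step = int(round(fps * interval_s))   # 6 frames apart at 30 fps
    last = int(round(fps * duration_s))   # 90 frames in a 3-second clip
    return video_frames[:last:step]
```

With these values, a 3-second clip sampled every 0.2 seconds yields 15 gesture images per gesture.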
S303: a training sample set is obtained based on the data sets of the dynamic gestures of the plurality of categories.
The training sample set comprises a plurality of types of data sets of dynamic gestures, and each type of data set of dynamic gestures comprises a plurality of frames of gesture images.
Thus, in this way, video data of a dynamic gesture are recorded continuously within the preset time. Because the gesture is in motion, it sways or rotates through different angles in space; sampling the video data therefore yields gesture images at different shooting angles, which reduces the influence of shooting angle on gesture recognition and improves the accuracy with which the gesture recognition model recognizes different gestures.
Referring to fig. 4, fig. 4 is a flowchart illustrating an embodiment of a gesture recognition method provided in the present application. According to the gesture recognition method, gestures of different types are recognized through the gesture recognition model, and the gesture recognition model is trained through the training method of the gesture recognition model.
The gesture recognition method of the present embodiment includes the following steps:
s401: video data of the dynamic gesture is collected.
The user can acquire video data of the dynamic gestures in real time through the camera module of the personal terminal or an external camera module.
In some embodiments, the user may upload a video file or an image file of the dynamic gesture.
S402: and obtaining skeleton data of the dynamic gesture based on the video data of the dynamic gesture.
The video data of the dynamic gesture are input into the preset neural network model to obtain the bone node data of the dynamic gesture, and the bone node data are converted into bone data through the conversion algorithm.
The specific implementation of this step is the same as steps S102 and S103, and will not be described here again.
S403: and inputting skeleton data of the dynamic gestures into a gesture recognition model to obtain the categories of the dynamic gestures.
The bone data of the dynamic gesture are input into the gesture recognition model to obtain the prediction probability of the dynamic gesture for each category, and the category with the highest prediction probability is selected as the category result of the dynamic gesture.
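Selecting the category with the highest prediction probability is a plain argmax over the model's outputs; the category names and probability values below are examples for illustration only.

```python
def classify(probabilities, categories):
    """Return the category with the highest predicted probability."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return categories[best], probabilities[best]

result = classify([0.1, 0.7, 0.2], ["scissors", "stone", "cloth"])
```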
Thus, with the method of this embodiment, the gesture recognition model can recognize multiple categories of dynamic gestures: the collected video data of a dynamic gesture are converted into the corresponding bone data, and the bone data are then input into the gesture recognition model to obtain the category of the dynamic gesture. Recognizing different categories of gestures from bone data reduces the influence of hand shape on gesture recognition and improves the accuracy with which the gesture recognition model recognizes different gestures.
Referring to fig. 5, fig. 5 is a flowchart illustrating another embodiment of a gesture recognition method provided in the present application. The embodiment is a specific implementation manner of the step of inputting the skeleton data of the dynamic gesture in fig. 4 into the gesture recognition model to obtain the category of the dynamic gesture.
The gesture recognition method of the present embodiment includes the following steps:
S501: inputting the skeleton data of the dynamic gesture into the gesture recognition model to obtain a two-dimensional image of the dynamic gesture.
The gesture recognition model maps the skeleton data of the dynamic gesture to a two-dimensional array, obtaining a two-dimensional image corresponding to the gesture.
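The patent only states that the model maps bone data to a two-dimensional array, so the concrete layout below is an assumption: frames become image rows, flattened joint coordinates become columns, and values are min-max scaled to the 0..255 range of a grayscale image.

```python
import numpy as np

def bone_data_to_image(bone_data: np.ndarray) -> np.ndarray:
    """Map skeleton data of shape (T, J, C) to a 2D uint8 image (T, J*C).

    Illustrative layout: one row per frame, one column per flattened
    joint coordinate, min-max scaled to 0..255.
    """
    flat = bone_data.reshape(bone_data.shape[0], -1)   # (T, J*C)
    lo, hi = flat.min(), flat.max()
    scaled = (flat - lo) / max(hi - lo, 1e-8)          # avoid divide-by-zero
    return (scaled * 255).astype(np.uint8)
```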
S502: inputting the two-dimensional image of the dynamic gesture into the gesture recognition model to obtain the prediction probability of the dynamic gesture in each category.
During training, the gesture recognition model generates characteristic images for the dynamic gestures of multiple categories; the trained gesture recognition model therefore contains a characteristic image for each category of dynamic gesture.
The two-dimensional image of the dynamic gesture is input into the gesture recognition model and compared with the characteristic images of the dynamic gestures of the multiple categories to obtain the prediction probability of the dynamic gesture in each category.
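The patent does not name the comparison metric, so the sketch below makes two labeled assumptions: similarity between the input image and each per-category characteristic image is measured by cosine similarity, and the similarities are turned into a probability distribution with a softmax.

```python
import numpy as np

def category_probabilities(image: np.ndarray,
                           templates: dict[str, np.ndarray]) -> dict[str, float]:
    """Compare a gesture image with per-category characteristic images.

    Assumptions (not from the patent): cosine similarity as the
    comparison, softmax to convert similarities into probabilities.
    """
    v = image.astype(np.float64).ravel()
    sims = np.array([
        float(v @ t.astype(np.float64).ravel()
              / (np.linalg.norm(v) * np.linalg.norm(t) + 1e-8))
        for t in templates.values()
    ])
    exp = np.exp(sims - sims.max())        # numerically stable softmax
    probs = exp / exp.sum()
    return dict(zip(templates.keys(), probs))
```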
S503: selecting the dynamic gesture category with the highest prediction probability as the category result of the dynamic gesture.
The dynamic gesture category with the highest prediction probability is selected, yielding the category result of the dynamic gesture. In one embodiment, the gesture recognition model may output the top three prediction probabilities. For example, the gesture recognition model may recognize six gestures: scissors, rock, paper, the rock-on sign, OK, and the number 1. The user collects video data of a scissors gesture; bone data of the scissors gesture is obtained through the preset neural network and the conversion algorithm and input into the gesture recognition model for recognition, which outputs prediction probabilities for scissors, rock, and the number 1. Because the prediction probability of the scissors category is the highest, the gesture category corresponding to the bone data is scissors.
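The top-three output in the example above amounts to sorting the per-category probabilities in descending order. A sketch, with a made-up probability vector over the six example gestures:

```python
def top_k(probs: dict[str, float], k: int = 3) -> list[tuple[str, float]]:
    """Return the k categories with the highest predicted probabilities."""
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
```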
Therefore, by the method of this embodiment, the prediction probability of the dynamic gesture in each category is obtained by inputting the skeleton data of the dynamic gesture into the gesture recognition model, and the category result of the dynamic gesture is then derived from these probabilities.
Referring to fig. 6, fig. 6 is a schematic frame diagram of an embodiment of an electronic device provided in the present application. As shown in fig. 6, the electronic device 100 includes a processor 101 and a memory 102 connected to the processor 101. The memory 102 is used for storing a computer program, and the processor 101 is used for executing the computer program to implement the training method or the gesture recognition method of the gesture recognition model.
The processor 101 may also be referred to as a CPU (central processing unit). The processor 101 may be an electronic chip with signal processing capability. The processor 101 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 102 may be a memory bank, a TF card, or the like, and stores all information in the electronic device 100, including input raw data, the computer program, intermediate operation results, and final operation results. It stores and retrieves information at the locations specified by the processor 101; with the memory 102, the electronic device 100 has a storage function that ensures normal operation. By purpose, the memory 102 of the electronic device 100 can be divided into main memory (internal memory) and auxiliary memory (external memory). External memory is usually a magnetic medium, an optical disk, or the like, and can store information for a long period of time. Internal memory refers to the storage components on the motherboard that hold the data and programs currently being executed; it is used only for temporary storage, and its contents are lost when the power is turned off.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an embodiment of a storage medium provided in the present application. As shown in fig. 7, the storage medium 110 stores therein a computer program 111 capable of implementing all the methods described above.
The functional units in the embodiments of the present application, if implemented in the form of software functional units and sold or used as independent products, may be stored in the storage medium 110. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product: the storage medium 110 includes several instructions in a computer program 111 that enable a computer device (which may be a personal computer, a server, or a network device, etc.), an electronic device (such as an MP3 or MP4 player, a mobile terminal such as a mobile phone, a tablet computer, or a wearable device, or a desktop computer, etc.), or a processor to perform all or part of the steps of the methods of the embodiments of the present application.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product embodied on one or more storage media 110 (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present application is described with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products according to the embodiments of the application. It will be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, may be implemented by the computer program 111 in the storage medium 110. The computer program 111 may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing apparatus to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing apparatus produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The computer program 111 may also be stored in a memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the computer program 111 stored in the storage medium 110 produces an article of manufacture including instruction means which implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The computer program 111 may also be loaded onto a computer or other programmable data processing apparatus so that a series of operational steps are performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the computer program 111 executing on the computer or other programmable apparatus provides steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. Moreover, the scope of the preferred embodiments of the present application includes additional implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in the reverse order, depending on the functionality involved, as would be understood by those skilled in the art of the embodiments of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, for example, an ordered listing of executable instructions for implementing logical functions, may be embodied in any medium for use by or in connection with an instruction execution system, apparatus, or device (such as a personal computer, a server, a network device, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them).
The foregoing is only a description of the embodiments of the present application and does not limit the patent scope of the present application. Any equivalent structure or equivalent process made using the contents of the specification and the accompanying drawings of the present application, whether applied directly or indirectly in other related technical fields, is likewise included in the patent protection scope of the present application.
Claims (10)
1. A method for training a gesture recognition model, the method comprising:
collecting video data of dynamic gestures of multiple categories to obtain a training sample set;
extracting features of the training sample set through a preset neural network model to respectively obtain bone node data of the dynamic gestures of multiple categories;
converting the bone node data into bone data through a conversion algorithm;
training the gesture recognition model for multiple times based on the dynamic gestures of multiple categories and the corresponding bone data to obtain the accuracy of the gesture recognition model;
and responding to the accuracy of the gesture recognition model to reach a preset value, and completing training of the gesture recognition model.
2. The training method of claim 1, wherein the step of training the gesture recognition model a plurality of times based on the dynamic gestures of a plurality of categories and the corresponding skeletal data comprises:
setting a plurality of model parameters of the gesture recognition model;
inputting the dynamic gestures of a plurality of categories and the corresponding skeletal data into the set gesture recognition model;
training the gesture recognition model multiple times based on multiple model parameters, multiple categories of the dynamic gestures, and the corresponding skeletal data;
outputting the trained gesture recognition model and outputting training results of the dynamic gestures of a plurality of categories, wherein the training results comprise training times and training accuracy.
3. The training method of claim 2, wherein after the step of training the gesture recognition model a plurality of times based on the dynamic gestures of the plurality of categories and the corresponding skeletal data, the training method further comprises:
testing the gesture recognition model based on skeletal data of the dynamic gestures of a plurality of categories;
the step of testing the gesture recognition model based on the skeletal data of the dynamic gestures of the plurality of categories includes:
based on skeleton data of the dynamic gestures of a plurality of categories, obtaining a two-dimensional image of the dynamic gestures of each category;
and inputting the two-dimensional image into the gesture recognition model to obtain the prediction probability and the test accuracy of the dynamic gesture in each category respectively.
4. The training method of claim 2, wherein the gesture recognition model comprises an input layer, a network layer, and an output layer, and wherein the step of setting a plurality of model parameters of the gesture recognition model comprises:
setting the width, height and depth of the input layer;
the network layer comprises at least one full-connection layer, and the connection type, the neuron number and the activation function of the full-connection layer are set;
and setting the output number of the output layers, wherein the output number is equal to the category number of the dynamic gestures.
5. The training method according to any one of claims 1-4, wherein the step of acquiring video data of a plurality of categories of dynamic gestures to obtain a training sample set comprises:
collecting video data of dynamic gestures of multiple categories within preset time;
sampling the video data based on a preset time interval to obtain a data set of the dynamic gestures of each category;
and obtaining the training sample set based on the data sets of the dynamic gestures of a plurality of categories.
6. A gesture recognition method, characterized in that the gesture recognition method comprises:
collecting video data of dynamic gestures;
based on the video data of the dynamic gesture, obtaining skeleton data of the dynamic gesture;
inputting the skeleton data of the dynamic gesture into the gesture recognition model to obtain the category of the dynamic gesture;
wherein the gesture recognition model is trained by the training method of the gesture recognition model according to any one of claims 1 to 5.
7. The gesture recognition method of claim 6, wherein the step of inputting the skeletal data of the dynamic gesture into the gesture recognition model to obtain the category of the dynamic gesture comprises:
inputting skeleton data of the dynamic gestures into the gesture recognition model to obtain prediction probabilities of the dynamic gestures in each category respectively;
and selecting the dynamic gesture category with the highest prediction probability as a category result of the dynamic gesture.
8. The method of claim 7, wherein the step of inputting the skeletal data of the dynamic gesture into the gesture recognition model to obtain the predicted probabilities of the dynamic gesture in each category respectively comprises:
inputting the skeleton data of the dynamic gesture into the gesture recognition model to obtain a two-dimensional image of the dynamic gesture;
further, the two-dimensional image of the dynamic gesture is input into the gesture recognition model, and the prediction probability of the dynamic gesture in each category is obtained.
9. An electronic device, comprising a processor and a memory connected to the processor, wherein program data and preset grammar rules are stored in the memory, and the processor invokes the program data stored in the memory to perform the training method of the gesture recognition model according to any one of claims 1-5 or the gesture recognition method according to any one of claims 6-8.
10. A storage medium storing a computer program for implementing a training method of a gesture recognition model according to any one of claims 1-5 or a gesture recognition method according to any one of claims 6-8 when executed by a processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311862254.XA CN117612265A (en) | 2023-12-29 | 2023-12-29 | Model training method, gesture recognition method, electronic device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311862254.XA CN117612265A (en) | 2023-12-29 | 2023-12-29 | Model training method, gesture recognition method, electronic device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117612265A true CN117612265A (en) | 2024-02-27 |
Family
ID=89951817
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311862254.XA Pending CN117612265A (en) | 2023-12-29 | 2023-12-29 | Model training method, gesture recognition method, electronic device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117612265A (en) |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110414499B (en) | Text position positioning method and system and model training method and system | |
US11055516B2 (en) | Behavior prediction method, behavior prediction system, and non-transitory recording medium | |
CN110427852B (en) | Character recognition method and device, computer equipment and storage medium | |
WO2023284416A1 (en) | Data processing method and device | |
US20200257902A1 (en) | Extraction of spatial-temporal feature representation | |
CN113128368B (en) | Method, device and system for detecting character interaction relationship | |
CN112364799A (en) | Gesture recognition method and device | |
CN110689518B (en) | Cervical cell image screening method, cervical cell image screening device, computer equipment and storage medium | |
CN111722700A (en) | Man-machine interaction method and man-machine interaction equipment | |
CN114385012A (en) | Motion recognition method and device, electronic equipment and readable storage medium | |
CN113050860B (en) | Control identification method and related device | |
CN118314618A (en) | Eye movement tracking method, device, equipment and storage medium integrating iris segmentation | |
CN111797867A (en) | System resource optimization method, device, storage medium and electronic device | |
CN109635706B (en) | Gesture recognition method, device, storage medium and device based on neural network | |
CN110838306A (en) | Voice signal detection method, computer storage medium and related equipment | |
CN112712114B (en) | Instrument analysis method, system, equipment and medium based on TextCNN-BiLSTM | |
EP4528666A1 (en) | Image processing method and apparatus, device, medium and product | |
CN111800535B (en) | Evaluation method, device, storage medium and electronic equipment for terminal operating state | |
CN117612265A (en) | Model training method, gesture recognition method, electronic device and storage medium | |
CN115690544B (en) | Multi-task learning method and device, electronic equipment and medium | |
CN116309274B (en) | Method and device for detecting small target in image, computer equipment and storage medium | |
Zhang et al. | Small-footprint keyword spotting based on gated Channel Transformation Sandglass residual neural network | |
CN113723227A (en) | Method for recognizing dynamic gestures | |
CN112489687A (en) | Speech emotion recognition method and device based on sequence convolution | |
CN111796924A (en) | Service processing method, device, storage medium and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||