
CN114283491B - Method, device and equipment for identifying motion of moving objects, and non-volatile storage medium - Google Patents

Method, device and equipment for identifying motion of moving objects, and non-volatile storage medium

Info

Publication number
CN114283491B
Authority
CN
China
Prior art keywords
moving object
space
motion
time diagram
point sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011034590.1A
Other languages
Chinese (zh)
Other versions
CN114283491A (en)
Inventor
沈旭
黄镇
黄建强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN202011034590.1A
Publication of CN114283491A
Application granted
Publication of CN114283491B
Legal status: Active
Anticipated expiration

Links

Landscapes

  • Image Analysis (AREA)

Abstract


This application discloses a method, apparatus, and device for identifying the motion of a moving object, as well as a non-volatile storage medium. The method comprises: acquiring a motion image of the moving object; analyzing the motion image to obtain a sequence of joint points of the moving object; extracting spatiotemporal dynamic features of the motion of the moving object from the sequence of joint points; and identifying the motion of the moving object based on the spatiotemporal dynamic features. This application addresses the technical problem in related art of ignoring the dependencies between interspaced, non-directly connected joint points when identifying pedestrian motion, resulting in inaccurate recognition results.

Description

Motion recognition method, device and equipment for moving object and nonvolatile storage medium
Technical Field
The present application relates to the technical field of behavior recognition, and in particular, to a method, an apparatus, a device, and a non-volatile storage medium for recognizing a motion of a moving object.
Background
Today, deep learning technology is widely used in various fields. The task of action recognition is the first step from static vision to dynamic vision. The most distinctive feature of pedestrian skeleton (Skeleton) data is that it is graph data in non-Euclidean space: bone (Bone) links exist only between specific joint points (joints).
Pedestrian action recognition based on joint points has long been regarded as a very promising and potential research subject, because pedestrian skeleton (Skeleton) data is highly robust to complex application conditions such as viewing angle, occlusion, and illumination. Existing methods mainly perform action recognition with graph convolutional neural networks (GCN) and focus on learning the topological structure of the graph: they learn the relationships between directly connected joint points in the graph data (such as the wrist and the fingers) but ignore the dependencies between spaced, non-directly connected joint points (such as the hands and the feet). In human motion, however, the hands and feet are often strongly correlated, and this information is very critical for recognizing a person's action. Moreover, methods based on graph convolutional neural networks (GCN) use many model parameters and a large amount of computation, which is unfavorable for deployment in practical applications.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiment of the application provides a method, a device and equipment for identifying motion of a moving object and a nonvolatile storage medium, which at least solve the technical problem that the identification result is inaccurate because the dependency relationship between spaced and indirectly connected joint points is ignored when the motion of a pedestrian is identified in the related technology.
According to one aspect of the embodiment of the application, a moving object motion recognition method is provided, which comprises the steps of obtaining a moving object motion image, analyzing the moving object motion image to obtain a moving object joint point sequence, extracting space-time dynamic characteristics of the moving object motion from the moving object joint point sequence, and recognizing the moving object motion based on the space-time dynamic characteristics.
According to another aspect of the embodiment of the application, a moving object motion recognition method is provided, which comprises the steps of obtaining a moving object motion image, analyzing the moving object motion image to obtain a moving object joint point sequence, setting the moving object joint point sequence as an input parameter of a space-time diagram convolution network model, and outputting a recognition result of the moving object motion, wherein the space-time diagram convolution network model comprises a multi-layer space-time diagram convolution calculation unit, the multi-layer space-time diagram convolution calculation unit is used for extracting space-time dynamic characteristics of the moving object motion from the moving object joint point sequence, and the space-time dynamic characteristics are used for predicting the recognition result.
According to another aspect of the embodiment of the application, a motion recognition method for a moving object is further provided, and the motion recognition method comprises the steps of responding to control operation received by a client, obtaining a motion recognition request, wherein information carried in the motion recognition request comprises a moving object motion image to be recognized, based on the motion recognition request, calling a software service of a server on the client to analyze the moving object motion image to obtain a moving object joint point sequence, and performing motion recognition on the moving object joint point sequence to obtain a recognition result of the moving object motion, wherein the software service is used for providing a space-time diagram convolution network model, the space-time diagram convolution network model comprises a multi-layer space-time diagram convolution calculation unit, and the multi-layer space-time diagram convolution calculation unit is used for extracting space-time dynamic characteristics of the moving object motion from the moving object joint point sequence and predicting the recognition result.
According to another aspect of the embodiment of the application, a motion recognition method for a moving object is further provided, and the method comprises the steps of receiving a motion recognition request from a client, wherein information carried in the motion recognition request comprises a moving object motion image to be recognized, calling a software service on a server to analyze the moving object motion image based on the motion recognition request to obtain a moving object joint point sequence, and performing motion recognition on the moving object joint point sequence to obtain a recognition result of the moving object motion, wherein the software service is used for providing a space-time diagram convolution network model, the space-time diagram convolution network model comprises a multi-layer space-time diagram convolution calculation unit, the multi-layer space-time diagram convolution calculation unit is used for extracting space-time dynamic characteristics of the moving object motion from the moving object joint point sequence, the space-time dynamic characteristics are used for predicting the recognition result, and feeding the recognition result back to the client.
According to another aspect of the embodiment of the application, a moving object motion recognition device is provided, which comprises an acquisition module, an analysis module, an extraction module and a recognition module, wherein the acquisition module is used for acquiring a moving object motion image, the analysis module is used for analyzing the moving object motion image to obtain a moving object joint point sequence, the extraction module is used for extracting the space-time dynamic characteristics of the moving object motion from the moving object joint point sequence, and the recognition module is used for recognizing the moving object motion based on the space-time dynamic characteristics.
According to another aspect of the embodiment of the application, a moving object motion recognition device is provided, which comprises an acquisition unit for acquiring a moving object motion image, an analysis unit for analyzing the moving object motion image to obtain a moving object joint point sequence, and a recognition unit for setting the moving object joint point sequence as an input parameter of a space-time diagram convolution network model and outputting a recognition result of the moving object motion, wherein the space-time diagram convolution network model comprises a multi-layer space-time diagram convolution calculation unit for extracting space-time dynamic characteristics of the moving object motion from the moving object joint point sequence, and the space-time dynamic characteristics are used for predicting the recognition result.
According to another aspect of the embodiment of the present application, there is further provided a nonvolatile storage medium, where the nonvolatile storage medium includes a stored program, and when the program runs, a device in which the nonvolatile storage medium is controlled to execute any one of the above-described moving object action recognition methods.
According to another aspect of the embodiment of the application, there is also provided a moving object motion recognition device, including a processor, and a memory, connected to the processor, for providing the processor with instructions for processing steps of acquiring a moving object motion image, analyzing the moving object motion image to obtain a moving object joint point sequence, extracting a spatio-temporal dynamic feature of a moving object motion from the moving object joint point sequence, and recognizing the moving object motion based on the spatio-temporal dynamic feature.
In the embodiment of the application, the motion object action image is acquired, the motion object action image is analyzed to obtain a motion object joint point sequence, the space-time dynamic characteristics of the motion object action are extracted from the motion object joint point sequence, and the motion object action is identified based on the space-time dynamic characteristics.
It is easy to note that, in the embodiment of the application, the acquired moving object action image is analyzed to determine the moving object joint point sequence, and both temporal and spatial characteristics between the joint points, namely the space-time dynamic characteristics, are extracted from the moving object joint point sequence. These space-time dynamic characteristics fuse static and dynamic information, and the moving object action is identified based on them, so that the relationship between directly connected joint points and the dependency relationship between spaced, indirectly connected joint points are both taken into account. This realizes the technical effect of improving the accuracy of identifying the moving object action, and further solves the technical problem in the related art that the identification result is inaccurate because the dependency relationship between spaced, indirectly connected joint points is ignored when identifying pedestrian actions.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
Fig. 1 is a block diagram of a hardware configuration of a computer terminal (or mobile device) for implementing a moving object motion recognition method according to an embodiment of the present application;
FIG. 2 is a flow chart of a method of motion recognition of a moving object according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an alternative space-time diagram convolutional network model in accordance with an embodiment of the present application;
FIG. 4 is a schematic diagram of the structure of an alternative space-time diagram convolution calculation unit according to an embodiment of the present application;
FIG. 5 is a flow chart of another method of motion recognition of a moving object according to an embodiment of the present application;
FIG. 6 is a flow chart of another method of motion recognition of a moving object according to an embodiment of the present application;
FIG. 7 is a flow chart of yet another method of motion object action recognition according to an embodiment of the present application;
Fig. 8 is a schematic structural view of a moving object motion recognition apparatus according to an embodiment of the present application;
Fig. 9 is a schematic structural view of a moving object motion recognition apparatus according to an embodiment of the present application;
fig. 10 is a schematic structural view of a moving object motion recognition apparatus according to an embodiment of the present application;
Fig. 11 is a block diagram of a computer terminal according to an embodiment of the present application.
Detailed Description
In order that those skilled in the art will better understand the present application, the technical solution in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some embodiments of the present application, not all of them. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, partial terms or terminology appearing in the course of describing embodiments of the application are applicable to the following explanation:
Deep learning (Deep Learning) is a set of algorithms that use various machine learning algorithms on multi-layer neural networks to solve problems involving images, text, and the like. Broadly, deep learning falls under the category of neural networks, but there are many variations in its implementation. The core of deep learning is feature learning, which aims to acquire hierarchical feature information through a hierarchical network, thereby resolving the long-standing difficulty of having to design features manually.
Action recognition (Action Recognition) studies the actions of objects in a sequence, for example determining whether a person is walking, jumping, or waving, and has important applications in video surveillance, video recommendation, and human-computer interaction.
Skeleton (Skeleton): a skeleton is usually represented by interconnected joints and bones (Bone) and characterizes the positions and connection relationships of the key parts of a moving object. Joints corresponding to connected parts of the moving object are connected, while joints corresponding to non-adjacent parts are not connected, which makes the skeleton typical graph data in non-Euclidean space.
Graph convolutional neural network (Graph Convolutional Networks): used for processing non-Euclidean space data, such as traffic networks and social networks, which ordinary convolutional networks designed for Euclidean space data cannot handle. In such data, the local structure of each node is different, and translation invariance is no longer satisfied.
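For illustration only, the following minimal PyTorch sketch shows a single graph convolution over skeleton joint data of the kind described above; the symmetric normalization and layer shapes are common choices assumed here, not details taken from this application.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One graph convolution over skeleton joints: X' = D^-1/2 (A + I) D^-1/2 X W."""
    def __init__(self, in_channels, out_channels, adjacency):
        super().__init__()
        A = adjacency + torch.eye(adjacency.size(0))            # bone links plus self-loops
        D = A.sum(dim=1)                                         # joint degrees
        self.register_buffer("A_norm", A / torch.sqrt(D[:, None] * D[None, :]))
        self.linear = nn.Linear(in_channels, out_channels)       # per-joint feature transform

    def forward(self, x):                                        # x: (batch, joints, channels)
        x = torch.einsum("vw,bwc->bvc", self.A_norm, x)          # aggregate over connected joints
        return self.linear(x)
```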
Example 1
According to an embodiment of the present application, there is also provided a moving object action recognition method embodiment, it being noted that the steps shown in the flowchart of the drawings may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in an order different from that herein.
The method embodiments provided by the embodiments of the present application may be performed in a mobile terminal, a computer terminal, or a similar computing device. Fig. 1 shows a hardware block diagram of a computer terminal (or mobile device) for implementing the moving object motion recognition method. As shown in fig. 1, the computer terminal 10 (or mobile device 10) may include one or more processors 102 (shown as 102a, 102b, ..., 102n; the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), a memory 104 for storing data, and a transmission module 106 for communication functions. The computer terminal 10 may further include a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power supply, and/or a camera. It will be appreciated by those of ordinary skill in the art that the configuration shown in fig. 1 is merely illustrative and is not intended to limit the configuration of the electronic device described above. For example, the computer terminal 10 may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.
It should be noted that the one or more processors 102 and/or other data processing circuits described above may be referred to generally herein as "data processing circuits". The data processing circuit may be embodied in whole or in part in software, hardware, firmware, or any other combination. Furthermore, the data processing circuit may be a single stand-alone processing module, or incorporated, in whole or in part, into any of the other elements in the computer terminal 10 (or mobile device). As referred to in embodiments of the application, the data processing circuit acts as a kind of processor control (e.g., selection of the path of the variable resistor termination connected to the interface).
The memory 104 may be used to store software programs and modules of application software, such as a program instruction/data storage device corresponding to the motion object motion recognition method in the embodiment of the present application, and the processor 102 executes the software programs and modules stored in the memory 104, thereby executing various functional applications and data processing, that is, implementing the motion object motion recognition method described above. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission means 106 is arranged to receive or transmit data via a network. The specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module for communicating with the internet wirelessly.
The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computer terminal 10 (or mobile device).
In the above-mentioned operating environment, the present application provides an embodiment of a moving object motion recognition method as shown in fig. 2, and fig. 2 is a flowchart of a moving object motion recognition method according to an embodiment of the present application, and as shown in fig. 2, the moving object motion recognition method includes:
step S202, obtaining a motion image of a moving object;
Step S204, analyzing the motion image of the moving object to obtain a joint point sequence of the moving object;
Step S206, extracting the space-time dynamic characteristics of the motion of the moving object from the moving object joint point sequence;
step S208, the motion of the moving object is identified based on the space-time dynamic characteristics.
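Purely as an illustration of steps S202 to S208 strung together, the sketch below assumes a hypothetical pose estimator and a trained space-time graph convolution model; none of these function or parameter names are defined by this application.

```python
import torch

def recognize_action(video_frames, pose_estimator, stigcn_model, class_names):
    """Steps S202-S208: frames -> joint point sequence -> spatio-temporal features -> action label."""
    # S202/S204: analyze the moving object action images to obtain a joint point sequence
    joints = torch.stack([pose_estimator(frame) for frame in video_frames])   # (T, V, C) per clip
    joints = joints.permute(2, 0, 1).unsqueeze(0)                             # (1, C, T, V)
    # S206/S208: extract spatio-temporal dynamic features and identify the action
    with torch.no_grad():
        logits = stigcn_model(joints)                                         # (1, num_classes)
    return class_names[logits.argmax(dim=1).item()]
```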
In the embodiment of the application, the motion object action image is acquired, the motion object action image is analyzed to obtain a motion object joint point sequence, the space-time dynamic characteristics of the motion object action are extracted from the motion object joint point sequence, and the motion object action is identified based on the space-time dynamic characteristics.
It is easy to note that, in the embodiment of the application, the acquired moving object action image is analyzed to determine the moving object joint point sequence, and both temporal and spatial characteristics between the joint points, namely the space-time dynamic characteristics, are extracted from the moving object joint point sequence. These space-time dynamic characteristics fuse static and dynamic information, and the moving object action is identified based on them, so that the relationship between directly connected joint points and the dependency relationship between spaced, indirectly connected joint points are both taken into account. This realizes the technical effect of improving the accuracy of identifying the moving object action, and further solves the technical problem in the related art that the identification result is inaccurate because the dependency relationship between spaced, indirectly connected joint points is ignored when identifying pedestrian actions.
Optionally, the moving object comprises one of a human body moving object, a human body-like moving object, an animal body moving object and a machine simulation moving object.
In an optional embodiment, the moving object motion image may be a moving object motion image such as walking, jumping, running, waving, swinging, etc., and the moving object joint point sequence is obtained by analyzing a moving object joint point in the moving object motion image, and the embodiment of the application adopts a novel graph convolution neural network model, for example, a space-time graph convolution network model, to extract a space-time dynamic feature of the moving object motion from the moving object joint point sequence; and identifying the motion of the moving object based on the space-time dynamic characteristics to obtain a motion category.
The graph convolution neural network model provided by the embodiment of the application is a basic neural network model for the joint-point-based action recognition task; the network structure can be freely edited according to service requirements, and the depth and width of the network can be increased or decreased. With the same network width and depth, the embodiment of the application can surpass the best existing neural-network action recognition algorithm while using only 1/5 of the network parameters and 1/10 of the computation.
In an alternative embodiment, extracting the spatio-temporal dynamics from the sequence of moving object joint points includes:
step S302, inputting the joint point sequence of the moving object into a space-time diagram convolution network model, wherein the space-time diagram convolution network model comprises a multi-layer space-time diagram convolution calculation unit;
Step S304, the multi-layer space-time diagram convolution calculation unit is utilized to extract the space-time dynamic characteristics.
As an alternative embodiment, a schematic structure of the above-mentioned space-time diagram convolutional network model (Spatio-Temporal Inception Graph Convolutional Network, STIGCN) is shown in fig. 3, where the input (Previous layer) of the space-time diagram convolutional network model is a moving object joint point sequence (for example, a human body joint point sequence) formed by moving object joint points, and the output (Output) is the recognition result, i.e., a predicted action category.
As shown in fig. 3, the space-time diagram convolution network model is formed by connecting multiple layers of space-time diagram convolution calculation units (Spatio-Temporal Inception Block) front to back, i.e., space-time diagram convolution calculation unit 1, space-time diagram convolution calculation unit 2, ..., space-time diagram convolution calculation unit N, which are used to extract the space-time dynamic characteristics of the pedestrian action. Each space-time diagram convolution calculation unit comprises an SI (spatial graph convolution unit) and a TI (temporal graph convolution unit). The space-time dynamic characteristics extracted by the multi-layer space-time diagram convolution calculation units undergo mean pooling in the mean pooling layer (Avg Pooling) of the space-time diagram convolution network model and random neuron dropping in the dropout layer (Dropout), and the result is finally output through the fully connected layer (FC).
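As a hedged sketch of the fig. 3 layout (stacked spatio-temporal inception blocks followed by Avg Pooling, Dropout, and a fully connected layer), the PyTorch module below makes assumptions about channel widths, dropout rate, and the block interface that the application does not specify.

```python
import torch
import torch.nn as nn

class STIGCN(nn.Module):
    """Fig. 3 layout: stacked spatio-temporal inception blocks, then Avg Pooling, Dropout, FC."""
    def __init__(self, block, in_channels, num_classes, adjacency, channels=(64, 128, 256)):
        super().__init__()
        blocks, c_in = [], in_channels
        for c_out in channels:                       # space-time diagram convolution units 1..N
            blocks.append(block(c_in, c_out, adjacency))
            c_in = c_out
        self.blocks = nn.Sequential(*blocks)
        self.dropout = nn.Dropout(p=0.5)             # random neuron dropping (rate assumed)
        self.fc = nn.Linear(c_in, num_classes)       # fully connected output layer

    def forward(self, x):                            # x: (batch, channels, frames, joints)
        x = self.blocks(x)
        x = x.mean(dim=(2, 3))                       # mean pooling over frames and joints
        return self.fc(self.dropout(x))
```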
In an alternative embodiment, fig. 4 is a schematic structural diagram of an alternative space-time diagram convolution calculating unit according to an embodiment of the present application, and as shown in fig. 4, each layer of space-time diagram convolution calculating unit in the above-mentioned multi-layer space-time diagram convolution calculating unit includes a space branch (SPATIAL PATH), a time branch (Temporal Path), and a Residual Path.
As an alternative embodiment, extracting the spatio-temporal dynamics feature using the multi-layer spatio-temporal convolution calculation unit includes:
Step S402, extracting space characteristics among a plurality of joints of the motion object from the joint point sequence of the motion object on the space branch;
step S404, extracting time features among the plurality of joints from the joint point sequence of the moving object on the time branch;
Step S406, transmitting the original input information of each layer of space-time diagram convolution calculation unit on the residual branch;
Step S408, the spatio-temporal dynamic characteristics are obtained by adding the spatial characteristics, the temporal characteristics, and the original input information, as sketched below.
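The following sketch illustrates steps S402-S408 as a single block with a spatial path, a temporal path, and a residual path whose outputs are summed; `SpatialPath` and `TemporalPath` are hypothetical sub-modules sketched further below, not classes defined by this application.

```python
import torch
import torch.nn as nn

class STInceptionBlock(nn.Module):
    """Fig. 4 sketch: spatial path + temporal path + residual path, summed (steps S402-S408)."""
    def __init__(self, in_channels, out_channels, adjacency):
        super().__init__()
        self.spatial_path = SpatialPath(in_channels, out_channels, adjacency)   # sketched below
        self.temporal_path = TemporalPath(in_channels, out_channels)            # sketched below
        self.residual_path = (nn.Identity() if in_channels == out_channels
                              else nn.Conv2d(in_channels, out_channels, kernel_size=1))

    def forward(self, x):                            # x: (batch, channels, frames, joints)
        # S408: add the spatial features, temporal features, and original input information
        return self.spatial_path(x) + self.temporal_path(x) + self.residual_path(x)
```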
Optionally, the spatial branch comprises a plurality of sampling units, a plurality of convolution units and a fusion unit.
In an alternative embodiment, on the spatial branch, extracting the spatial feature from the sequence of moving object joint points includes:
Step S502, extracting a plurality of scale connection features corresponding to the plurality of joint points by using the plurality of sampling units, wherein each sampling unit in the plurality of sampling units respectively samples different scale connection features;
step S504, performing feature processing on the scale connection features by adopting convolution units corresponding to each sampling unit in the convolution units to obtain a plurality of processing results;
And step S506, the fusion unit is adopted to fuse the plurality of processing results, so that the spatial characteristics are obtained.
As also shown in fig. 4, the number of sampling units (Adjacency Sampling) on the spatial branch may be 4, and the number of convolution units (Convolution) may be 4.
In an alternative embodiment, the number of the plurality of scale connection features is determined by the number of the plurality of nodes connected in sequence, and the number of the plurality of sampling units is the same as the number of the plurality of scale connection features.
In the embodiment of the present application, the 4 sampling units in the spatial branch are respectively used to extract a plurality of scale connection features corresponding to the plurality of joint points; for example, each sampling unit extracts the relationship features of 1-hop to 4-hop connections between joint points. Suppose 5 joint points are connected in sequence, a-b-c-d-e: the connection between a and b is a 1-hop connection, the connection between a and c is a 2-hop connection, the connection between a and d is a 3-hop connection, and the connection between a and e is a 4-hop connection.
On the spatial branch, the sampling units extract the plurality of scale connection features, the convolution unit corresponding to each sampling unit performs feature processing on those scale connection features to obtain a plurality of processing results, and the fusion unit (Fusion) then fuses the plurality of processing results (feature combination) to obtain the spatial features, as sketched below.
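A possible reading of the spatial branch, sketched in PyTorch: each sampling unit keeps joint pairs within k hops (1 to 4), each sampled adjacency is convolved separately, and a fusion convolution merges the results. The hop computation via matrix powers and the 1x1 convolutions are assumptions for illustration, not details given by this application.

```python
import torch
import torch.nn as nn

class SpatialPath(nn.Module):
    """Spatial branch sketch: k-hop adjacency sampling, per-scale convolution, fusion."""
    def __init__(self, in_channels, out_channels, adjacency, num_scales=4):
        super().__init__()
        A = adjacency + torch.eye(adjacency.size(0))
        # Adjacency Sampling: the k-th sampler keeps joint pairs reachable within k hops
        hops = [torch.clamp(torch.matrix_power(A, k), max=1.0) for k in range(1, num_scales + 1)]
        self.register_buffer("hops", torch.stack(hops))                        # (K, V, V)
        self.convs = nn.ModuleList(nn.Conv2d(in_channels, out_channels, kernel_size=1)
                                   for _ in range(num_scales))
        self.fuse = nn.Conv2d(out_channels * num_scales, out_channels, kernel_size=1)  # Fusion

    def forward(self, x):                                                      # x: (B, C, T, V)
        outs = []
        for A_k, conv in zip(self.hops, self.convs):
            A_k = A_k / A_k.sum(dim=1, keepdim=True)                           # row-normalize
            outs.append(conv(torch.einsum("vw,bctw->bctv", A_k, x)))           # aggregate + conv
        return self.fuse(torch.cat(outs, dim=1))
```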
In an alternative embodiment, as also shown in FIG. 4, the time branch includes at least one Sampling unit (Motion Sampling), a plurality of convolution units (Convolution), and a Fusion unit (Fusion).
As an optional embodiment, on the time branch, extracting the time feature from the moving object joint point sequence includes:
Step S602, extracting second order time sequence characteristics corresponding to the plurality of nodes by adopting the at least one sampling unit;
Step S604, performing feature processing on the second-order time sequence feature by adopting at least one convolution unit in the convolution units to obtain a plurality of processing results;
step S606, the fusion unit is adopted to fuse the plurality of processing results, so as to obtain the time characteristics.
In the above optional embodiment, the at least one sampling unit in the time branch is used to extract second-order time sequence features corresponding to the plurality of joint points, that is, the motion features of the joint points between adjacent frames; at least one convolution unit among the plurality of convolution units then performs feature processing on the second-order time sequence features to obtain a plurality of processing results, and the fusion unit fuses the plurality of processing results to obtain the time features, as sketched below.
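A corresponding sketch of the time branch: the motion sampling unit takes adjacent-frame differences as the second-order time sequence feature, temporal convolutions process it, and a fusion convolution merges the results. The kernel sizes and padding are illustrative assumptions, not values stated in this application.

```python
import torch
import torch.nn as nn

class TemporalPath(nn.Module):
    """Time branch sketch: Motion Sampling (adjacent-frame differences), convolution, fusion."""
    def __init__(self, in_channels, out_channels, kernel_sizes=(3, 5)):
        super().__init__()
        # temporal convolutions along the frame axis only; kernel sizes are illustrative
        self.convs = nn.ModuleList(
            nn.Conv2d(in_channels, out_channels, kernel_size=(k, 1), padding=(k // 2, 0))
            for k in kernel_sizes)
        self.fuse = nn.Conv2d(out_channels * len(kernel_sizes), out_channels, kernel_size=1)

    def forward(self, x):                                        # x: (B, C, T, V)
        # Motion Sampling: second-order feature = joint displacement between adjacent frames
        motion = x[:, :, 1:] - x[:, :, :-1]
        motion = torch.cat([motion, torch.zeros_like(x[:, :, :1])], dim=2)  # pad back to T frames
        return self.fuse(torch.cat([conv(motion) for conv in self.convs], dim=1))
```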
In the embodiment of the application, the original input information of each layer of space-time diagram convolution calculation unit is transmitted on the residual branch, and the space-time dynamic characteristic is obtained by adding the spatial characteristic, the time characteristic and the original input information.
The graph convolution neural network model provided by the embodiment of the application has a simple network structure that can be flexibly edited for different applications: the depth and width can be increased or reduced, and the model can be trained end to end. It can surpass the best existing neural-network action recognition algorithm while using only 1/5 of the network parameters and 1/10 of the computation, making it easier to apply in practice.
The embodiment of the application provides a brand-new basic network model that can be used for skeleton-based action recognition of moving objects, namely the space-time diagram convolution network model, which can simultaneously extract dependency relationships among joint points at a plurality of different scales, that is, the plurality of scale connection features corresponding to the joint points. The space-time diagram convolution network model can simultaneously extract both temporal and spatial features between joint points and fuse static and dynamic information, thereby focusing at the same time on the relationship between directly connected joint points and on the dependency relationship between spaced, non-directly connected joint points, and thus achieving the technical effect of improving the accuracy of recognizing moving object actions.
As an alternative embodiment, the inventive approach may be, but is not limited to, algorithm development using an open source deep learning algorithm framework PyTorch, all of the newly developed code being Python code.
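Consistent with the statement that PyTorch may be used, the following hypothetical usage sketch wires the modules sketched above end to end; the joint count, class count, and tensor shapes are placeholders, not values given in this application.

```python
import torch
import torch.nn.functional as F

# Hypothetical end-to-end usage of the STIGCN / STInceptionBlock sketches above.
adjacency = torch.zeros(25, 25)            # set entries to 1 for joint pairs linked by a bone
model = STIGCN(STInceptionBlock, in_channels=3, num_classes=60, adjacency=adjacency)

joints = torch.randn(8, 3, 300, 25)        # 8 clips: (x, y, z) coords x 300 frames x 25 joints
logits = model(joints)                     # (8, 60) action-class scores

loss = F.cross_entropy(logits, torch.randint(0, 60, (8,)))
loss.backward()                            # the description notes end-to-end training is possible
```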
In the above-mentioned operation environment, the present application provides another moving object motion recognition method as shown in fig. 5, and fig. 5 is a flowchart of another moving object motion recognition method according to an embodiment of the present application, as shown in fig. 5, where the moving object motion recognition method includes:
Step S702, obtaining a motion image of a moving object;
step S704, analyzing the motion image of the moving object to obtain a joint point sequence of the moving object;
Step S706, setting the moving object joint point sequence as an input parameter of a space-time diagram convolution network model, and outputting a recognition result of the moving object action, wherein the space-time diagram convolution network model comprises a multi-layer space-time diagram convolution calculation unit, the multi-layer space-time diagram convolution calculation unit is used for extracting space-time dynamic characteristics of the moving object action from the moving object joint point sequence, and the space-time dynamic characteristics are used for predicting the recognition result.
The embodiment of the application discloses a method for identifying motion of a moving object, which comprises the steps of acquiring a moving object motion image, analyzing the moving object motion image to obtain a moving object joint point sequence, setting the moving object joint point sequence as an input parameter of a space-time diagram convolution network model, and outputting an identification result of the motion of the moving object, wherein the space-time diagram convolution network model comprises a multi-layer space-time diagram convolution calculation unit, the multi-layer space-time diagram convolution calculation unit is used for extracting space-time dynamic characteristics of the motion of the moving object from the moving object joint point sequence, and the space-time dynamic characteristics are used for predicting the identification result.
It is easy to note that, in the embodiment of the application, the acquired moving object action image is analyzed to determine the moving object joint point sequence, and the space-time diagram convolution network model is adopted to extract both temporal and spatial characteristics between the joint points from the moving object joint point sequence simultaneously, namely the space-time dynamic characteristics, which fuse static and dynamic information. The moving object action is identified based on these space-time dynamic characteristics, so that the aim of simultaneously focusing on the relationship between directly connected joint points and on the dependency relationship between spaced, indirectly connected joint points is fulfilled, the technical effect of improving the accuracy of identifying the moving object action is realized, and the technical problem in the related art that the recognition result is inaccurate because the dependency relationship between spaced, indirectly connected joint points is ignored when recognizing pedestrian actions is solved.
Optionally, the moving object comprises one of a human moving object, a human-like moving object, an animal moving object and a machine simulation moving object.
In an optional embodiment, the moving object motion image may be a moving object motion image such as walking, jumping, running, waving, swinging, etc., and the moving object joint point sequence is obtained by analyzing a moving object joint point in the moving object motion image, and the embodiment of the application adopts a novel graph convolution neural network model, for example, a space-time graph convolution network model, to extract a space-time dynamic feature of the moving object motion from the moving object joint point sequence; and identifying the motion of the moving object based on the space-time dynamic characteristics to obtain a motion category.
As an alternative embodiment, the input of the space-time diagram convolution network model is a moving object joint point sequence formed by moving object joint points, the output is a predicted action category, and the space-time diagram convolution network model is formed by connecting a plurality of layers of space-time diagram convolution computing units (Spatio-Temporal Inception Block) in sequence, and the space-time diagram convolution computing units are used for extracting space-time dynamic characteristics of pedestrian actions.
The graph convolution neural network model provided by the embodiment of the application is a basic neural network model for the joint-point-based action recognition task; the network structure can be freely edited according to service requirements, and the depth and width of the network can be increased or decreased. With the same network width and depth, the embodiment of the application can surpass the best existing neural-network action recognition algorithm while using only 1/5 of the network parameters and 1/10 of the computation.
In the above-mentioned operation environment, the present application provides another moving object motion recognition method as shown in fig. 6, and fig. 6 is a flowchart of another moving object motion recognition method according to an embodiment of the present application, as shown in fig. 6, the moving object motion recognition method includes:
Step S802, responding to control operation received by a client, and obtaining a motion recognition request, wherein the information carried in the motion recognition request comprises motion images of moving objects to be recognized;
step S804, based on the action recognition request, invoking a software service of a server on the client to analyze the moving object action image to obtain a moving object joint point sequence, and performing action recognition on the moving object joint point sequence to obtain a recognition result of the moving object action, wherein the software service is used for providing a space-time diagram convolution network model, and the space-time diagram convolution network model comprises a multi-layer space-time diagram convolution calculation unit, the multi-layer space-time diagram convolution calculation unit is used for extracting space-time dynamic characteristics of the moving object action from the moving object joint point sequence, and the space-time dynamic characteristics are used for predicting the recognition result.
According to the embodiment of the application, a motion recognition request is acquired through a control operation received by a client, wherein the information carried in the motion recognition request comprises a motion image of a motion object to be recognized, a software service of a server is called on the client based on the motion recognition request to analyze the motion image of the motion object to obtain a joint point sequence of the motion object, and the joint point sequence of the motion object is subjected to motion recognition to obtain a recognition result of the motion object, the software service is used for providing a space-time diagram convolution network model, the space-time diagram convolution network model comprises a multi-layer space-time diagram convolution calculation unit, the multi-layer space-time diagram convolution calculation unit is used for extracting space-time dynamic characteristics of the motion object from the joint point sequence of the motion object, and the space-time dynamic characteristics are used for predicting the recognition result.
It is easy to note that, in the embodiment of the present application, the client responds to the received control operation to obtain the motion recognition request, based on the motion recognition request, the software service of the server is invoked on the client to analyze the obtained motion image of the moving object, determine the moving object node sequence, and perform motion recognition on the moving object node sequence to obtain the recognition result of the motion object, that is, the multi-layer space-time diagram convolution computing unit in the space-time diagram convolution network model provided by the software service is used to extract the space-time dynamic feature of the moving object motion from the moving object node sequence, so as to predict the recognition result.
Optionally, the moving object comprises one of a human body moving object, a human body-like moving object, an animal body moving object and a machine simulation moving object.
In an alternative embodiment, the moving object motion image may be a motion image of a moving object walking, jumping, running, waving, swinging, and the like.
It should be noted that, the execution body of the embodiment of the present application is a client, the client responds to the received control operation, obtains a motion recognition request, and sends the motion recognition request to a server to request the server to recognize the motion image of the motion object to be recognized, because the software service of the server provides a novel graph convolution neural network model, for example, a space-time diagram convolution network model, by analyzing the motion object node point in the motion image of the motion object, a motion object node point sequence is obtained, and the space-time diagram convolution network model includes a multi-layer space-time diagram convolution calculation unit to extract the space-time dynamic feature of the motion object from the motion object node point sequence; and identifying the motion of the moving object based on the space-time dynamic characteristics, and predicting to obtain the motion category of the motion of the moving object.
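For illustration of the client-side call flow only, the sketch below posts an action image to a hypothetical server endpoint; the URL, field names, and response format are invented placeholders and not part of this application.

```python
import base64
import requests

def request_action_recognition(image_path, endpoint="https://example.com/api/action-recognition"):
    """Send a moving object action image to the (hypothetical) server-side software service."""
    with open(image_path, "rb") as f:
        payload = {"action_image": base64.b64encode(f.read()).decode("ascii")}
    response = requests.post(endpoint, json=payload, timeout=30)
    response.raise_for_status()
    return response.json()["recognition_result"]    # hypothetical field name, e.g. "waving"
```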
As an alternative embodiment, the input of the space-time diagram convolution network model is a moving object joint point sequence formed by moving object joint points, the output is a predicted action category, and the space-time diagram convolution network model is formed by connecting a plurality of layers of space-time diagram convolution computing units (Spatio-Temporal Inception Block) in sequence, and the space-time diagram convolution computing units are used for extracting space-time dynamic characteristics of pedestrian actions.
It should still be noted that the graph convolution neural network model provided in the embodiment of the present application is a basic neural network model for the joint-point-based action recognition task; the network structure can be freely edited according to service requirements, and the network depth and width can be increased or decreased. With the same network width and depth, the embodiment of the application can surpass the best existing neural-network action recognition algorithm while using only 1/5 of the network parameters and 1/10 of the computation.
In the above-mentioned operating environment, the present application further provides an embodiment of a moving object motion recognition method as shown in fig. 7, and fig. 7 is a flowchart of a further moving object motion recognition method according to an embodiment of the present application, where, as shown in fig. 7, the moving object motion recognition method includes:
Step S902, receiving an action recognition request from a client, wherein the information carried in the action recognition request comprises an action image of a moving object to be recognized;
Step S904, based on the action recognition request, calling a software service on a server side to analyze the action image of the moving object to obtain a moving object joint point sequence, and carrying out action recognition on the moving object joint point sequence to obtain a recognition result of the action of the moving object, wherein the software service is used for providing a space-time diagram convolution network model, the space-time diagram convolution network model comprises a multi-layer space-time diagram convolution calculation unit, the multi-layer space-time diagram convolution calculation unit is used for extracting space-time dynamic characteristics of the action of the moving object from the moving object joint point sequence, and the space-time dynamic characteristics are used for predicting the recognition result;
Step S906, feeding back the identification result to the client.
In the embodiment of the application, a server receives a motion recognition request from a client, wherein the information carried in the motion recognition request comprises a motion image of a moving object to be recognized, a software service on the server is called to analyze the motion image of the moving object based on the motion recognition request to obtain a joint point sequence of the moving object, and the joint point sequence of the moving object is subjected to motion recognition to obtain a recognition result of the motion object, the software service is used for providing a space-time diagram convolution network model, the space-time diagram convolution network model comprises a multi-layer space-time diagram convolution calculation unit, the multi-layer space-time diagram convolution calculation unit is used for extracting space-time dynamic characteristics of the motion of the moving object from the joint point sequence of the moving object, the space-time dynamic characteristics are used for predicting the recognition result, and the recognition result is fed back to the client.
It is easy to note that, in the embodiment of the present application, the client responds to the received control operation, obtains and sends the motion recognition request to the server, invokes the software service of the server to analyze the obtained motion image of the moving object, determines the joint sequence of the moving object, and performs motion recognition on the joint sequence of the moving object, so as to obtain the recognition result of the motion of the moving object, that is, the multi-layer space-time graph convolution computing unit in the space-time graph convolution network model provided by the software service is used to extract the space-time dynamic feature of the motion of the moving object from the joint sequence of the moving object, predicts the recognition result, and because the space-time dynamic feature fuses static and dynamic information, and recognizes the motion of the moving object based on the space-time dynamic feature, the purpose of focusing on the relationship between the directly connected joints at the same time and the dependency relationship between the indirectly connected joints at intervals is achieved, thereby achieving the technical effect of improving the accuracy of recognizing the motion of the moving object, and further solving the technical problem that the recognition result is inaccurate because the dependency relationship between the indirectly connected joints at intervals is ignored when the motion is recognized in the related technology.
Optionally, the moving object comprises one of a human body moving object, a human body-like moving object, an animal body moving object and a machine simulation moving object.
In an alternative embodiment, the moving object motion image may be a motion image of a moving object walking, jumping, running, waving, swinging, and the like.
It should be noted that, the execution body of the embodiment of the present application is a server, after the client responds to the received control operation, the client obtains the motion recognition request, and then sends the motion recognition request to the server to request the server to recognize the motion image of the moving object to be recognized, because the software service of the server provides a novel graph convolution neural network model, for example, a space-time diagram convolution network model, the moving object joint point sequence is obtained by analyzing the moving object joint point in the motion image, the multi-layer space-time diagram convolution calculation unit included in the space-time diagram convolution network model extracts the space-time dynamic characteristics of the motion object motion from the moving object joint point sequence, and then recognizes the motion object motion based on the space-time dynamic characteristics, and predicts the motion category of the motion object motion.
As an alternative embodiment, the input of the space-time diagram convolution network model is a moving object joint point sequence formed by moving object joint points, the output is a predicted action category, and the space-time diagram convolution network model is formed by connecting a plurality of layers of space-time diagram convolution computing units (Spatio-Temporal Inception Block) in sequence, and the space-time diagram convolution computing units are used for extracting space-time dynamic characteristics of pedestrian actions.
It should still be noted that the graph convolution neural network model provided in the embodiment of the present application is a basic neural network model for the joint-point-based action recognition task; the network structure can be freely edited according to service requirements, and the network depth and width can be increased or decreased. With the same network width and depth, the embodiment of the application can surpass the best existing neural-network action recognition algorithm while using only 1/5 of the network parameters and 1/10 of the computation.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by means of software plus the necessary general hardware platform, but of course also by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a non-volatile storage medium (such as ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present application.
Example 2
According to an embodiment of the present application, there is further provided an apparatus embodiment for implementing the above-mentioned moving object motion recognition method, and fig. 8 is a schematic structural diagram of a moving object motion recognition apparatus according to an embodiment of the present application, as shown in fig. 8, the apparatus includes an acquisition module 60, an analysis module 62, an extraction module 64, and a recognition module 66, where:
The system comprises an acquisition module 60 for acquiring motion images of a moving object, an analysis module 62 for analyzing the motion images of the moving object to obtain a joint point sequence of the moving object, an extraction module 64 for extracting space-time dynamic characteristics of the motion of the moving object from the joint point sequence of the moving object, and a recognition module 66 for recognizing the motion of the moving object based on the space-time dynamic characteristics.
Here, the above-mentioned obtaining module 60, analyzing module 62, extracting module 64 and identifying module 66 correspond to steps S202 to S208 in embodiment 1, and the four modules are the same as the examples and application scenarios implemented by the corresponding steps, but are not limited to those disclosed in embodiment 1. It should be noted that the above-described module may be operated as a part of the apparatus in the computer terminal 10 provided in embodiment 1.
According to an embodiment of the present application, there is further provided an apparatus embodiment for implementing the above-mentioned moving object motion recognition method, and fig. 9 is a schematic structural diagram of a moving object motion recognition apparatus according to an embodiment of the present application, as shown in fig. 9, the apparatus includes an acquisition unit 70, an analysis unit 72, and a recognition unit 74, wherein:
The system comprises an acquisition unit 70 for acquiring motion images of a moving object, an analysis unit 72 for analyzing the motion images of the moving object to obtain a joint point sequence of the moving object, and a recognition unit 74 for setting the joint point sequence of the moving object as an input parameter of a space-time diagram convolution network model and outputting a recognition result of the motion of the moving object, wherein the space-time diagram convolution network model comprises a multi-layer space-time diagram convolution calculation unit for extracting space-time dynamic characteristics of the motion of the moving object from the joint point sequence of the moving object and predicting the recognition result.
Here, the above-described obtaining unit 70, analyzing unit 72, and identifying unit 74 correspond to steps S702 to S706 in embodiment 1, and the three units are the same as the examples and application scenarios implemented by the corresponding steps, but are not limited to those disclosed in the above-described embodiments. It should be noted that the above-described module may be operated as a part of the apparatus in the computer terminal 10 provided in the embodiment.
It should be further noted that, the preferred implementation manner of this embodiment may be referred to the related description in embodiment 1, and will not be repeated here.
Example 3
According to an embodiment of the present application, there is also provided an embodiment of a moving object motion recognition device, which may be any one of a group of computing devices. Fig. 10 is a schematic structural diagram of a moving object motion recognition device according to an embodiment of the present application; as shown in fig. 10, the device includes a processor 800 and a memory 802, wherein:
The memory 802 is connected to the processor 800 and provides the processor with instructions for the following processing steps: obtaining a moving object motion image; analyzing the moving object motion image to obtain a moving object joint point sequence; extracting space-time dynamic features of the moving object motion from the moving object joint point sequence; and identifying the moving object motion based on the space-time dynamic features.
In the embodiment of the application, the motion object action image is acquired, the motion object action image is analyzed to obtain a motion object joint point sequence, the space-time dynamic characteristics of the motion object action are extracted from the motion object joint point sequence, and the motion object action is identified based on the space-time dynamic characteristics.
It is easy to note that, in the embodiment of the application, the acquired moving object action image is analyzed to determine the moving object joint point sequence, both temporal and spatial features between the joint points are extracted from the joint point sequence, and the moving object action is then identified based on the space-time dynamic features by fusing static and dynamic information. In this way, attention is paid not only to the relationships between directly connected joint points but also to the dependencies between spaced, indirectly connected joint points, which improves the accuracy of moving object action recognition and thereby solves the technical problem in the related art that recognition results are inaccurate because the dependencies between spaced, indirectly connected joint points are ignored when identifying pedestrian actions.
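As a non-limiting illustration of the above processing chain, the following Python sketch strings the four steps together. The function and class names (estimate_joint_sequence, DummyModel) are hypothetical placeholders rather than the implementation of this application, and the pose-estimation and classification stages are stubbed out.

# Minimal pipeline sketch (hypothetical names, not the patent's implementation).
import numpy as np

def estimate_joint_sequence(frames: np.ndarray, num_joints: int = 18) -> np.ndarray:
    """Placeholder pose-estimation step: frames (T, H, W, 3) -> joints (C=2, T, V).
    A real system would run a skeleton detector here."""
    t = frames.shape[0]
    return np.zeros((2, t, num_joints), dtype=np.float32)

def recognize_action(frames: np.ndarray, model) -> int:
    joints = estimate_joint_sequence(frames)            # step 2: joint point sequence
    features = model.extract(joints)                    # step 3: space-time dynamic features
    return int(np.argmax(model.classify(features)))     # step 4: predicted action label

class DummyModel:
    """Stand-in for the space-time graph convolution network."""
    def extract(self, joints):
        return joints.mean(axis=(1, 2))                 # collapse time and joints (illustrative only)
    def classify(self, features):
        return np.random.rand(10)                       # fake scores over 10 action classes

if __name__ == "__main__":
    video = np.zeros((30, 224, 224, 3), dtype=np.float32)   # 30 dummy frames
    print(recognize_action(video, DummyModel()))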
It should be further noted that, the preferred implementation manner of this embodiment may be referred to the related description in embodiment 1, and will not be repeated here.
Example 4
According to an embodiment of the present application, there is also provided an embodiment of a computer terminal, which may be any one of a group of computer terminals. Alternatively, in the present embodiment, the above-described computer terminal may be replaced with a terminal device such as a mobile terminal.
Alternatively, in this embodiment, the above-mentioned computer terminal may be located in at least one network device among a plurality of network devices of the computer network.
In this embodiment, the computer terminal may execute program code for the following steps of the moving object motion recognition method: acquiring a moving object motion image; analyzing the moving object motion image to obtain a moving object joint point sequence; extracting space-time dynamic features of the moving object motion from the moving object joint point sequence; and identifying the moving object motion based on the space-time dynamic features.
Alternatively, fig. 11 is a block diagram of a computer terminal according to an embodiment of the present application. As shown in fig. 11, the computer terminal may include one or more processors 902 (only one is shown in the figure), a memory 904, and a peripheral interface 906.
The memory may be used to store software programs and modules, such as program instructions/modules corresponding to the moving object motion recognition method and apparatus in the embodiments of the present application, and the processor executes the software programs and modules stored in the memory, thereby executing various functional applications and data processing, that is, implementing the moving object motion recognition method described above. The memory may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory remotely located relative to the processor, which may be connected to the computer terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Through the transmission device, the processor can call the information and the application programs stored in the memory to execute the following steps: acquiring a moving object action image; analyzing the moving object action image to obtain a moving object joint point sequence; extracting the space-time dynamic features of the moving object action from the moving object joint point sequence; and identifying the moving object action based on the space-time dynamic features.
Optionally, the processor may further execute program code for inputting the moving object joint point sequence into a space-time diagram convolution network model, wherein the space-time diagram convolution network model includes a multi-layer space-time diagram convolution calculation unit, and extracting the space-time dynamic feature by using the multi-layer space-time diagram convolution calculation unit.
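A minimal sketch of such a stacked model is given below; the layer count, channel sizes and the simple adjacency handling are assumptions made for illustration and are not the exact network of this application.

# Sketch of a stacked space-time graph convolution model (assumed layer sizes).
import torch
import torch.nn as nn

class SimpleSTGCNLayer(nn.Module):
    """One space-time graph convolution layer: spatial graph conv followed by temporal conv."""
    def __init__(self, in_c, out_c, A):
        super().__init__()
        self.register_buffer("A", A)                        # (V, V) normalized joint adjacency
        self.spatial = nn.Conv2d(in_c, out_c, kernel_size=1)
        self.temporal = nn.Conv2d(out_c, out_c, kernel_size=(9, 1), padding=(4, 0))
        self.relu = nn.ReLU()

    def forward(self, x):                                    # x: (N, C, T, V)
        x = torch.einsum("nctv,vw->nctw", x, self.A)         # aggregate features over connected joints
        x = self.relu(self.spatial(x))
        return self.relu(self.temporal(x))

class SimpleSTGCN(nn.Module):
    """Multi-layer stack; the joint point sequence tensor is the input parameter."""
    def __init__(self, A, channels=(3, 64, 128), num_classes=60):
        super().__init__()
        self.layers = nn.ModuleList(
            [SimpleSTGCNLayer(channels[i], channels[i + 1], A) for i in range(len(channels) - 1)]
        )
        self.fc = nn.Linear(channels[-1], num_classes)

    def forward(self, x):                                    # x: (N, C, T, V)
        for layer in self.layers:
            x = layer(x)
        x = x.mean(dim=-1).mean(dim=-1)                      # global average over joints and time
        return self.fc(x)

A = torch.eye(18)                                            # identity adjacency as a stand-in
model = SimpleSTGCN(A)
joints = torch.randn(1, 3, 30, 18)                           # (batch, coordinates, frames, joints)
print(model(joints).shape)                                   # torch.Size([1, 60])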
Optionally, the processor may further execute a program code for extracting spatial features among a plurality of nodes of the motion object motion from the motion object node sequence on the spatial branch, extracting temporal features among the plurality of nodes from the motion object node sequence on the temporal branch, transmitting original input information of each layer of space-time diagram convolution computing unit on the residual branch, and obtaining the space-time dynamic feature by adding the spatial features, the temporal features and the original input information.
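The following sketch shows one such computing unit with the three branches added together; the single-scale spatial and temporal convolutions used here are simplifications (the multi-scale forms are sketched after the next two paragraphs).

# Sketch of one space-time graph convolution computing unit with spatial, temporal and residual branches
# (simplified single-scale branches; branch details are assumptions, not the patent's exact design).
import torch
import torch.nn as nn

class STBlock(nn.Module):
    def __init__(self, channels, A):
        super().__init__()
        self.register_buffer("A", A)                                   # (V, V) joint adjacency
        # Spatial branch: aggregates features between joints over the graph.
        self.spatial = nn.Conv2d(channels, channels, kernel_size=1)
        # Temporal branch: convolves each joint's feature sequence along the time axis.
        self.temporal = nn.Conv2d(channels, channels, kernel_size=(9, 1), padding=(4, 0))
        self.relu = nn.ReLU()

    def forward(self, x):                                              # x: (N, C, T, V)
        spatial_feat = self.spatial(torch.einsum("nctv,vw->nctw", x, self.A))
        temporal_feat = self.temporal(x)
        residual = x                                                   # residual branch passes the original input
        # Space-time dynamic feature = spatial + temporal + original input.
        return self.relu(spatial_feat + temporal_feat + residual)

block = STBlock(channels=64, A=torch.eye(18))
out = block(torch.randn(2, 64, 30, 18))
print(out.shape)                                                       # torch.Size([2, 64, 30, 18])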
Optionally, the processor may further execute program codes for extracting a plurality of scale connection features corresponding to the plurality of nodes by using the plurality of sampling units, where each sampling unit in the plurality of sampling units samples a different scale connection feature, performing feature processing on the plurality of scale connection features by using a convolution unit corresponding to each sampling unit in the plurality of convolution units to obtain a plurality of processing results, and performing fusion processing on the plurality of processing results by using the fusion unit to obtain the spatial feature.
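One possible reading of this spatial branch is sketched below: the sampling units are modeled as powers of the joint adjacency matrix (1-hop, 2-hop, ... connections), each scale gets its own convolution unit, and the results are fused by summation. The k-hop interpretation and the fusion by summation are assumptions made for illustration.

# Sketch of the multi-scale spatial branch: each "sampling unit" samples a different connection scale.
import torch
import torch.nn as nn

class MultiScaleSpatialBranch(nn.Module):
    def __init__(self, in_c, out_c, A, num_scales=3):
        super().__init__()
        hops = []
        A_k = torch.eye(A.shape[0])
        for _ in range(num_scales):
            A_k = A_k @ A                                    # next connection scale: one more hop
            hops.append((A_k > 0).float())                   # keep reachability, drop path counts
        self.register_buffer("hops", torch.stack(hops))      # (num_scales, V, V)
        self.convs = nn.ModuleList([nn.Conv2d(in_c, out_c, kernel_size=1) for _ in range(num_scales)])

    def forward(self, x):                                    # x: (N, C, T, V)
        results = []
        for A_k, conv in zip(self.hops, self.convs):
            sampled = torch.einsum("nctv,vw->nctw", x, A_k)  # sample the k-hop connection features
            results.append(conv(sampled))                    # per-scale feature processing
        return torch.stack(results).sum(dim=0)               # fusion of the processing results

# Toy 4-joint chain 0-1-2-3, so joints 0 and 3 are only indirectly connected.
A = torch.tensor([[1., 1, 0, 0], [1, 1, 1, 0], [0, 1, 1, 1], [0, 0, 1, 1]])
branch = MultiScaleSpatialBranch(in_c=3, out_c=16, A=A)
print(branch(torch.randn(1, 3, 30, 4)).shape)                # torch.Size([1, 16, 30, 4])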
Optionally, the processor may further execute program codes of extracting second order time sequence features corresponding to the plurality of nodes by using the at least one sampling unit, performing feature processing on the second order time sequence features by using at least one convolution unit in the plurality of convolution units to obtain a plurality of processing results, and performing fusion processing on the plurality of processing results by using the fusion unit to obtain the time feature.
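A corresponding sketch of the temporal branch follows; the second-order time sequence feature is interpreted here as the frame-to-frame difference of joint features (a velocity-like signal), and this reading, like the fusion by summation, is an assumption made for illustration.

# Sketch of the temporal branch: raw and second-order (frame-difference) signals, per-signal convolution, fusion.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalBranch(nn.Module):
    def __init__(self, in_c, out_c, kernel_t=9):
        super().__init__()
        pad = (kernel_t // 2, 0)
        self.conv_raw = nn.Conv2d(in_c, out_c, kernel_size=(kernel_t, 1), padding=pad)
        self.conv_diff = nn.Conv2d(in_c, out_c, kernel_size=(kernel_t, 1), padding=pad)

    def forward(self, x):                                     # x: (N, C, T, V)
        diff = x[:, :, 1:, :] - x[:, :, :-1, :]               # second-order (frame difference) feature
        diff = F.pad(diff, (0, 0, 1, 0))                      # pad one frame so the lengths match
        results = [self.conv_raw(x), self.conv_diff(diff)]    # feature processing per sampled signal
        return torch.stack(results).sum(dim=0)                # fuse the processing results

branch = TemporalBranch(in_c=64, out_c=64)
print(branch(torch.randn(2, 64, 30, 18)).shape)               # torch.Size([2, 64, 30, 18])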
Optionally, the processor may further execute program codes for acquiring a moving object action image, analyzing the moving object action image to obtain a moving object joint point sequence, setting the moving object joint point sequence as an input parameter of a space-time diagram convolution network model, and outputting a recognition result of the moving object action, wherein the space-time diagram convolution network model includes a multi-layer space-time diagram convolution calculation unit, the multi-layer space-time diagram convolution calculation unit is used for extracting space-time dynamic characteristics of the moving object action from the moving object joint point sequence, and the space-time dynamic characteristics are used for predicting the recognition result.
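For completeness, a minimal sketch of turning the model output into a recognition result is given below; the action label set is hypothetical and stands in for whatever classes the network is trained on.

# Minimal sketch of decoding the recognition result from model scores (hypothetical labels).
import torch
import torch.nn.functional as F

ACTIONS = ["walk", "run", "wave", "sit"]                      # hypothetical action set

def decode_recognition_result(logits: torch.Tensor) -> str:
    probs = F.softmax(logits, dim=-1)                         # scores predicted from the space-time features
    return ACTIONS[int(probs.argmax())]

print(decode_recognition_result(torch.tensor([0.1, 2.0, 0.3, -1.0])))   # "run"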
The embodiment of the application provides a scheme for identifying the motion of a moving object, which comprises the steps of obtaining a moving object motion image, analyzing the moving object motion image to obtain a moving object joint point sequence, extracting the space-time dynamic characteristics of the motion of the moving object from the moving object joint point sequence, and identifying the motion of the moving object based on the space-time dynamic characteristics.
It is easy to note that, in the embodiment of the application, the acquired moving object action image is analyzed to determine the moving object joint point sequence, both temporal and spatial features between the joint points are extracted from the joint point sequence, and the moving object action is then identified based on the space-time dynamic features by fusing static and dynamic information. In this way, attention is paid not only to the relationships between directly connected joint points but also to the dependencies between spaced, indirectly connected joint points, which improves the accuracy of moving object action recognition and thereby solves the technical problem in the related art that recognition results are inaccurate because the dependencies between spaced, indirectly connected joint points are ignored when identifying pedestrian actions.
It will be appreciated by those skilled in the art that the structure shown in fig. 11 is merely illustrative, and the computer terminal may also be a smart phone (such as an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (MID), a PAD, or another terminal device. Fig. 11 does not limit the structure of the above electronic device; for example, the computer terminal may include more or fewer components than shown in fig. 11 (such as a network interface or a display device), or may have a configuration different from that shown in fig. 11.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be implemented by a program instructing hardware related to a terminal device, and the program may be stored in a computer-readable non-volatile storage medium, which may include a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
Example 5
According to an embodiment of the present application, there is also provided an embodiment of a nonvolatile storage medium. Alternatively, in the present embodiment, the above-described nonvolatile storage medium may be used to store the program code executed by the moving object action recognition method provided in the above-described embodiment 1.
Alternatively, in this embodiment, the above-mentioned nonvolatile storage medium may be located in any one of the computer terminals in the computer terminal group in the computer network, or in any one of the mobile terminals in the mobile terminal group.
Optionally, in this embodiment the non-volatile storage medium is arranged to store program code for obtaining a moving object action image, analyzing the moving object action image to obtain a moving object joint point sequence, extracting a spatio-temporal dynamics of a moving object action from the moving object joint point sequence, and identifying the moving object action based on the spatio-temporal dynamics.
Optionally, in this embodiment, the non-volatile storage medium is arranged to store program code for inputting the sequence of moving object joints into a space-time diagram convolution network model, wherein the space-time diagram convolution network model comprises a multi-layer space-time diagram convolution calculation unit, and extracting the space-time dynamic characteristics with the multi-layer space-time diagram convolution calculation unit.
Optionally, in this embodiment the non-volatile storage medium is arranged to store program code for extracting spatial features between a plurality of nodes of the motion object action from the sequence of motion object nodes on the spatial branch, extracting temporal features between the plurality of nodes from the sequence of motion object nodes on the temporal branch, delivering original input information of each layer of space-time diagram convolution computation unit on the residual branch, obtaining the spatio-temporal dynamic features by adding the spatial features, the temporal features and the original input information.
Optionally, in this embodiment, the non-volatile storage medium is configured to store program code for extracting a plurality of scale connection features corresponding to the plurality of nodes using the plurality of sampling units, where each sampling unit of the plurality of sampling units samples a different scale connection feature, performing feature processing on the plurality of scale connection features using a convolution unit of the plurality of convolution units corresponding to each sampling unit, to obtain a plurality of processing results, and performing fusion processing on the plurality of processing results using the fusion unit to obtain the spatial feature.
Optionally, in this embodiment, the non-volatile storage medium is configured to store program code for extracting second order timing features corresponding to the plurality of nodes using the at least one sampling unit, performing feature processing on the second order timing features using at least one convolution unit of the plurality of convolution units to obtain a plurality of processing results, and performing fusion processing on the plurality of processing results using the fusion unit to obtain the time feature.
Optionally, in this embodiment, the non-volatile storage medium is configured to store program code for acquiring a moving object action image, analyzing the moving object action image to obtain a moving object joint point sequence, setting the moving object joint point sequence as an input parameter of a space-time diagram convolution network model, and outputting a recognition result of the moving object action, wherein the space-time diagram convolution network model comprises a multi-layer space-time diagram convolution calculation unit, the multi-layer space-time diagram convolution calculation unit is used for extracting a space-time dynamic feature of the moving object action from the moving object joint point sequence, and the space-time dynamic feature is used for predicting the recognition result.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
In the foregoing embodiments of the present application, the descriptions of the embodiments are emphasized, and for a portion of this disclosure that is not described in detail in this embodiment, reference is made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technology may be implemented in other manners. The apparatus embodiments described above are merely illustrative; for example, the division into units is merely a division by logical function, and other divisions are possible in actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in electrical or other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units described above, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable non-volatile storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a non-volatile storage medium and including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the various embodiments of the present application. The non-volatile storage medium includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
While the foregoing is directed to the preferred embodiments of the present application, it will be appreciated by those skilled in the art that changes and modifications may be made without departing from the principles of the application, and such changes and modifications are intended to be included within the scope of the application.

Claims (13)

1. A motion recognition method of a moving object, comprising:
Obtaining a motion image of a moving object, wherein the moving object comprises one of a human body moving object, a human-like moving object, an animal body moving object and a machine simulation moving object;
analyzing the motion image of the moving object to obtain a joint point sequence of the moving object;
Extracting space-time dynamic characteristics of motion of a moving object from the moving object joint point sequence by using a space-time diagram convolution network model, wherein the space-time dynamic characteristics comprise spatial characteristics, the spatial characteristics are determined based on a plurality of scale connection characteristics corresponding to a plurality of joints in the moving object joint point sequence, the plurality of scale connection characteristics are used for describing connection characteristics of direct connection and indirect connection among the plurality of joints, the plurality of scale connection characteristics are obtained by sampling by a plurality of sampling units, each sampling unit in the plurality of sampling units respectively samples a relation characteristic corresponding to multiple connection among the plurality of joints as the plurality of scale connection characteristics, and the plurality of sampling units are contained on a space branch in the space-time diagram convolution network model;
and identifying the motion of the moving object based on the space-time dynamic characteristics.
2. The method of claim 1, wherein extracting the spatio-temporal dynamics from the sequence of moving object nodes using the spatio-temporal convolution network model comprises:
Inputting the joint point sequence of the moving object to the space-time diagram convolution network model, wherein the space-time diagram convolution network model comprises a multi-layer space-time diagram convolution calculation unit;
and extracting the space-time dynamic characteristics by using the multi-layer space-time diagram convolution calculation unit.
3. The method of claim 2, wherein each of the multi-layer space-time diagram convolution computing units comprises a spatial branch, a temporal branch, and a residual branch, and wherein extracting the space-time dynamic feature using the multi-layer space-time diagram convolution computing unit comprises:
Extracting spatial features among a plurality of nodes of the motion object action from the motion object node sequence on the spatial branch;
Extracting time features among the plurality of nodes from the moving object node sequence on the time branch;
Transmitting original input information of each layer of space-time diagram convolution calculation unit on the residual branch;
and adding the spatial features, the temporal features and the original input information to obtain the space-time dynamic features.
4. The method of claim 3, wherein the spatial branch comprises a plurality of sampling units, a plurality of convolution units, and a fusion unit, and wherein extracting the spatial feature from the sequence of moving object nodes on the spatial branch comprises:
Extracting a plurality of scale connection features corresponding to the plurality of joint points by adopting the plurality of sampling units, wherein each sampling unit in the plurality of sampling units respectively samples different scale connection features;
performing feature processing on the scale connection features by adopting convolution units corresponding to each sampling unit in the convolution units to obtain a plurality of processing results;
and adopting the fusion unit to fuse the plurality of processing results to obtain the spatial characteristics.
5. The method of claim 4, wherein the number of the plurality of scale connection features is determined by the number of the plurality of nodes connected in sequence, and the number of the plurality of sampling units is the same as the number of the plurality of scale connection features.
6. The method of claim 3, wherein the temporal branch comprises at least one sampling unit, a plurality of convolution units, and a fusion unit, and wherein extracting the temporal feature from the sequence of moving object nodes on the temporal branch comprises:
Extracting second-order time sequence features corresponding to the plurality of nodes by adopting the at least one sampling unit;
Performing feature processing on the second-order time sequence feature by adopting at least one convolution unit in the convolution units to obtain a plurality of processing results;
and carrying out fusion processing on the plurality of processing results by adopting the fusion unit to obtain the time characteristic.
7. A moving object motion recognition method, comprising:
Obtaining a motion image of a moving object, wherein the moving object comprises one of a human body moving object, a human-like moving object, an animal body moving object and a machine simulation moving object;
analyzing the motion image of the moving object to obtain a joint point sequence of the moving object;
The method comprises the steps of setting a moving object joint point sequence as an input parameter of a space-time diagram convolution network model, and outputting a recognition result of a moving object action, wherein the space-time diagram convolution network model comprises a plurality of layers of space-time diagram convolution calculation units, the plurality of layers of space-time diagram convolution calculation units are used for extracting space-time dynamic characteristics of the moving object action from the moving object joint point sequence, the space-time dynamic characteristics are used for predicting the recognition result, the space-time dynamic characteristics comprise space characteristics, the space characteristics are determined based on a plurality of scale connection characteristics corresponding to a plurality of joint points in the moving object joint point sequence, the plurality of scale connection characteristics are used for describing connection characteristics of direct connection and indirect connection between the plurality of joint points, the plurality of scale connection characteristics are obtained by sampling by a plurality of sampling units, each sampling unit in the plurality of sampling units respectively samples a relation characteristic corresponding to multiple connection between the plurality of joint points as the plurality of scale connection characteristics, and the plurality of sampling units are contained on a space branch in the space-time diagram convolution network model.
8. A moving object motion recognition method, comprising:
Responding to control operation received by a client, and acquiring a motion recognition request, wherein the information carried in the motion recognition request comprises a motion image of a moving object to be recognized, and the moving object comprises one of a human body moving object, a human body-like moving object, an animal body moving object and a machine simulation moving object;
And based on the action recognition request, calling a software service of a service end on the client to analyze the moving object action image so as to obtain a moving object joint point sequence, and carrying out action recognition on the moving object joint point sequence so as to obtain a recognition result of the moving object action, wherein the software service is used for providing a space-time diagram convolution network model, the space-time diagram convolution network model comprises a multi-layer space-time diagram convolution computing unit, the multi-layer space-time diagram convolution computing unit is used for extracting space-time dynamic characteristics of the moving object action from the moving object joint point sequence, the space-time dynamic characteristics are used for predicting the recognition result, the space-time dynamic characteristics comprise spatial characteristics, the spatial characteristics are determined based on a plurality of scale connection characteristics corresponding to a plurality of joint points in the moving object joint point sequence, the plurality of scale connection characteristics are used for describing connection characteristics of direct connection and indirect connection between the plurality of joint points, the plurality of scale connection characteristics are obtained by sampling by a plurality of sampling units, each sampling unit in the plurality of sampling units respectively samples a relation characteristic corresponding to multiple connection between the plurality of joint points as the plurality of scale connection characteristics, and the plurality of sampling units are contained on a space branch in the space-time diagram convolution network model.
9. A moving object motion recognition method, comprising:
Receiving a motion recognition request from a client, wherein the information carried in the motion recognition request comprises a motion image of a motion object to be recognized, and the motion object comprises one of a human motion object, a human-like motion object, an animal motion object and a machine simulation motion object;
Based on the motion recognition request, calling a software service on a server to analyze the motion image of the moving object to obtain a joint point sequence of the moving object, and performing motion recognition on the joint point sequence of the moving object to obtain a recognition result of the motion of the moving object, wherein the software service is used for providing a space-time diagram convolution network model, the space-time diagram convolution network model comprises a multi-layer space-time diagram convolution calculation unit, the multi-layer space-time diagram convolution calculation unit is used for extracting space-time dynamic characteristics of the motion of the moving object from the joint point sequence of the moving object, the space-time dynamic characteristics are used for predicting the recognition result, the space-time dynamic characteristics comprise spatial characteristics, the spatial characteristics are determined based on a plurality of scale connection characteristics corresponding to a plurality of joint points in the joint point sequence of the moving object, the plurality of scale connection characteristics are used for describing connection characteristics of direct connection and indirect connection between the plurality of joint points, the plurality of scale connection characteristics are obtained by sampling by a plurality of sampling units, each sampling unit in the plurality of sampling units respectively samples a relation characteristic corresponding to multiple connection between the plurality of joint points as the plurality of scale connection characteristics, and the plurality of sampling units are contained on a space branch in the space-time diagram convolution network model;
And feeding the identification result back to the client.
10. A moving object motion recognition apparatus, comprising:
The acquisition module is used for acquiring motion images of a moving object, wherein the moving object comprises one of a human body moving object, a human body-like moving object, an animal body moving object and a machine simulation moving object;
The analysis module is used for analyzing the motion image of the moving object to obtain a joint point sequence of the moving object;
The extraction module is used for extracting space-time dynamic characteristics of motion of a moving object from the joint point sequence of the moving object by utilizing a space-time diagram convolution network model, wherein the space-time dynamic characteristics comprise spatial characteristics, the spatial characteristics are determined based on a plurality of scale connection characteristics corresponding to a plurality of joints in the joint point sequence of the moving object, the plurality of scale connection characteristics are used for describing connection characteristics of direct connection and indirect connection among the plurality of joints, the plurality of scale connection characteristics are obtained by sampling by a plurality of sampling units, each sampling unit in the plurality of sampling units respectively samples a relation characteristic corresponding to multiple connection among the plurality of joints as the plurality of scale connection characteristics, and the plurality of sampling units are contained on a space branch in the space-time diagram convolution network model;
And the identification module is used for identifying the motion of the moving object based on the space-time dynamic characteristics.
11. A moving object motion recognition apparatus, comprising:
The acquisition unit is used for acquiring a motion image of a moving object, wherein the moving object comprises one of a human body moving object, a human body-like moving object, an animal body moving object and a machine simulation moving object;
The analysis unit is used for analyzing the motion image of the moving object to obtain a joint point sequence of the moving object;
The recognition unit is used for setting the moving object joint point sequence as an input parameter of a space-time diagram convolution network model and outputting a recognition result of the moving object action, wherein the space-time diagram convolution network model comprises a multi-layer space-time diagram convolution calculation unit, the multi-layer space-time diagram convolution calculation unit is used for extracting space-time dynamic characteristics of the moving object action from the moving object joint point sequence, the space-time dynamic characteristics are used for predicting the recognition result and comprise space characteristics, the space characteristics are determined based on a plurality of scale connection characteristics corresponding to a plurality of joint points in the moving object joint point sequence, the scale connection characteristics are used for describing connection characteristics of direct connection and indirect connection among the plurality of joint points, the scale connection characteristics are obtained by sampling by a plurality of sampling units, each sampling unit in the plurality of sampling units respectively samples a relation characteristic corresponding to multiple connection among the plurality of joint points as the scale connection characteristics, and the sampling units are contained on a space branch in the space-time diagram convolution network model.
12. A nonvolatile storage medium, characterized in that the nonvolatile storage medium includes a stored program, wherein the program, when run, controls a device in which the nonvolatile storage medium is located to execute the moving object action recognition method according to any one of claims 1 to 9.
13. A moving object motion recognition apparatus, characterized by comprising:
Processor, and
A memory, coupled to the processor, for providing instructions to the processor to process the following processing steps:
Obtaining a motion image of a moving object, wherein the moving object comprises one of a human body moving object, a human-like moving object, an animal body moving object and a machine simulation moving object;
analyzing the motion image of the moving object to obtain a joint point sequence of the moving object;
Extracting space-time dynamic characteristics of motion of a moving object from the moving object joint point sequence by using a space-time diagram convolution network model, wherein the space-time dynamic characteristics comprise spatial characteristics, the spatial characteristics are determined based on a plurality of scale connection characteristics corresponding to a plurality of joints in the moving object joint point sequence, the plurality of scale connection characteristics are used for describing connection characteristics of direct connection and indirect connection among the plurality of joints, the plurality of scale connection characteristics are obtained by sampling by a plurality of sampling units, each sampling unit in the plurality of sampling units respectively samples a relation characteristic corresponding to multiple connection among the plurality of joints as the plurality of scale connection characteristics, and the plurality of sampling units are contained on a space branch in the space-time diagram convolution network model;
and identifying the motion of the moving object based on the space-time dynamic characteristics.
CN202011034590.1A 2020-09-27 2020-09-27 Method, device and equipment for identifying motion of moving objects, and non-volatile storage medium Active CN114283491B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011034590.1A CN114283491B (en) 2020-09-27 2020-09-27 Method, device and equipment for identifying motion of moving objects, and non-volatile storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011034590.1A CN114283491B (en) 2020-09-27 2020-09-27 Method, device and equipment for identifying motion of moving objects, and non-volatile storage medium

Publications (2)

Publication Number Publication Date
CN114283491A CN114283491A (en) 2022-04-05
CN114283491B true CN114283491B (en) 2025-09-05

Family

ID=80867904

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011034590.1A Active CN114283491B (en) 2020-09-27 2020-09-27 Method, device and equipment for identifying motion of moving objects, and non-volatile storage medium

Country Status (1)

Country Link
CN (1) CN114283491B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115153637B (en) * 2022-07-22 2025-06-24 郑州市中心医院 Method, device, equipment and medium for detecting arm posture of operating ultrasonic probe

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110909658A (en) * 2019-11-19 2020-03-24 北京工商大学 A method for human action recognition in video based on two-stream convolutional network

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108764066A (en) * 2018-05-08 2018-11-06 南京邮电大学 A kind of express delivery sorting working specification detection method based on deep learning
CN109345637B (en) * 2018-08-27 2021-01-26 创新先进技术有限公司 Augmented reality-based interactive method and device
CN109508688B (en) * 2018-11-26 2023-10-13 平安科技(深圳)有限公司 Skeleton-based behavior detection method, terminal equipment and computer storage media
JP6583953B1 (en) * 2019-06-27 2019-10-02 アースアイズ株式会社 Self-extraction monitoring system for medical accessories and self-extraction monitoring method for medical accessories
CN110390305A (en) * 2019-07-25 2019-10-29 广东工业大学 Method and device for gesture recognition based on graph convolutional neural network
CN110929637B (en) * 2019-11-20 2023-05-16 中国科学院上海微系统与信息技术研究所 Image recognition method, device, electronic equipment and storage medium
CN111178142A (en) * 2019-12-05 2020-05-19 浙江大学 Hand posture estimation method based on space-time context learning

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110909658A (en) * 2019-11-19 2020-03-24 北京工商大学 A method for human action recognition in video based on two-stream convolutional network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Spatio-Temporal Inception Graph Convolutional Networks for Skeleton-Based Action Recognition; Zhen Huang et al.; 28th ACM International Conference on Multimedia; 2020-10-12; pp. 2122-2130 *
3D Human Behavior Recognition Based on Residual Spatio-Temporal Graph Convolutional Networks; Guan Shanshan et al.; Computer Applications and Software; 2020-03-12; Vol. 37, No. 3; pp. 198-201 *
Guan Shanshan et al. 3D Human Behavior Recognition Based on Residual Spatio-Temporal Graph Convolutional Networks. Computer Applications and Software. 2020, Vol. 37, No. 3, pp. 198-201. *

Also Published As

Publication number Publication date
CN114283491A (en) 2022-04-05

Similar Documents

Publication Publication Date Title
CN112001274B (en) Crowd density determining method, device, storage medium and processor
CN113568666B (en) Image processing method, device, storage medium and processor
CN110472516A (en) A kind of construction method, device, equipment and the system of character image identifying system
CN110991298B (en) Image processing method and device, storage medium and electronic device
CN113568735B (en) Data processing methods and systems
CN112380955B (en) Action recognition method and device
CN111652181A (en) Target tracking method and device and electronic equipment
CN112070852B (en) Image generation method and system, data processing method
CN114283491B (en) Method, device and equipment for identifying motion of moving objects, and non-volatile storage medium
CN114463658B (en) Model training method and system, data processing method and system
CN111488887B (en) Image processing method and device based on artificial intelligence
CN113849575A (en) Data processing method, device and system
KR20200028569A (en) Method for Recommending Cosmetics Information
CN114943273B (en) Data processing method, storage medium and computer terminal
CN114764930B (en) Image processing method, device, storage medium and computer equipment
CN113377970B (en) Information processing method and device
CN113761917B (en) Named entity identification method and device
CN114266723A (en) Image processing method, image processing device, storage medium and computer terminal
CN114493735A (en) Method, device and system for obtaining label and computer terminal
CN114943868B (en) Image processing method, device, storage medium and processor
CN119515675A (en) Image generation model training method, image processing method and device
CN116704405B (en) Behavior recognition method, electronic device and storage medium
CN114565813B (en) Video feature fusion method, device, storage medium and processor
CN116129181B (en) Display information push method, device, electronic device and storage medium
CN110826582B (en) Image feature training method, device and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant