CN110363086A - Image data recognition method, device, computer equipment and storage medium - Google Patents
Image data recognition method, device, computer equipment and storage medium
- Publication number
- CN110363086A CN110363086A CN201910503195.4A CN201910503195A CN110363086A CN 110363086 A CN110363086 A CN 110363086A CN 201910503195 A CN201910503195 A CN 201910503195A CN 110363086 A CN110363086 A CN 110363086A
- Authority
- CN
- China
- Prior art keywords
- matrix
- feature map
- target
- trained
- convolutional neural
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
Abstract
The present application relates to a graph data recognition method, a device, computer equipment, and a storage medium. The method includes: obtaining the input feature map of the current convolutional layer of a trained convolutional neural network, where the input feature map is a feature map obtained by feature extraction from graph data; obtaining the bias matrix of the current convolutional layer, where the bias matrix is a matrix generated when the trained convolutional neural network was produced; obtaining a reference adjacency matrix and computing the sum of the reference adjacency matrix and the bias matrix to obtain a target adjacency matrix; obtaining the convolution kernel of the current convolutional layer; generating a target output feature map according to the convolution kernel of the current convolutional layer, the target adjacency matrix, and the input feature map; and identifying, according to the target output feature map, the recognition result corresponding to the graph data. By adding a matrix generated according to task requirements to the fixed adjacency matrix in each convolutional layer, the recognition accuracy of the trained convolutional neural network is improved.
Description
Technical Field
The present application relates to the field of computer technology, and in particular to a graph data recognition method, device, computer equipment, and storage medium.
Background
In skeleton-point data, the human body is represented by the coordinates of a set of predefined key joints in the camera coordinate system. Such data can be conveniently obtained from depth cameras (e.g., Kinect) or from pose-estimation algorithms (e.g., OpenPose). Figure 1 shows the key joints of the human body as defined by the Kinect depth camera, which represents the body as the three-dimensional coordinates of 25 key joints. Since actions usually exist in the form of video, an action of length T frames can be represented by a T×25×3 tensor.
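As a brief illustration (assuming the 25-joint Kinect layout described above), a T-frame skeleton sequence can be stored as a T×25×3 array:

```python
import numpy as np

# A hypothetical action clip of T = 64 frames, each frame holding the
# (x, y, z) camera-space coordinates of the 25 Kinect key joints.
T = 64
skeleton = np.zeros((T, 25, 3), dtype=np.float32)

# Frame 0, joint 0 (e.g., the base of the spine in Kinect's layout):
skeleton[0, 0] = [0.1, 0.5, 2.3]

print(skeleton.shape)  # (64, 25, 3)
```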
Referring to FIG. 2, FIG. 2 is a spatio-temporal graph in one embodiment. Each joint is defined as a node of the graph, the physical connections between joints are defined as its edges, and temporal edges are added between the same node in adjacent frames, yielding a spatio-temporal graph that can describe human behavior.
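A minimal sketch of this construction (the bone list below is illustrative, not the full Kinect skeleton): spatial edges follow the physical bones within a frame, and temporal edges link the same joint across adjacent frames.

```python
# Build the edge list of a spatio-temporal graph over T frames of V joints.
# `bones` is a hypothetical subset of the physical connections; the real
# Kinect skeleton defines connections over its 25 joints.
def build_st_edges(T, V, bones):
    edges = []
    for t in range(T):
        base = t * V  # node id of joint 0 in frame t
        # spatial edges: physical connections within frame t
        for (i, j) in bones:
            edges.append((base + i, base + j))
        # temporal edges: same joint in adjacent frames
        if t + 1 < T:
            for v in range(V):
                edges.append((base + v, base + V + v))
    return edges

bones = [(0, 1), (1, 2), (2, 3)]  # illustrative only
edges = build_st_edges(T=2, V=4, bones=bones)
print(len(edges))  # 3 spatial edges x 2 frames + 4 temporal edges = 10
```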
A common skeleton-based action recognition method is graph convolution. Graph convolution differs from ordinary convolution: when convolving on a graph, the number of neighbors of each node is not fixed, while the number of convolution parameters is. To match the fixed number of parameters to the variable number of neighboring nodes, a mapping function must be defined that associates parameters with nodes. For example, with a convolution kernel of size three, as shown in Figure 3, the three parameters correspond respectively to the point 001 farther from the body center 000, the point 002 closer to the body center, and the convolution point 003 itself. The convolution operation can then be expressed by formula (1):
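In the published document, formula (1) appears only as an image. A reconstruction consistent with the definitions that follow, assuming the standard spatial graph-convolution formulation, would read:

```latex
f_{out}(v_i) \;=\; \sum_{v_j \in B(v_i)} \frac{1}{Z_i(v_j)}\, f_{in}(v_j)\, \mathbf{w}\big(l(v_j)\big)
```

where $B(v_i)$ denotes the neighbor set of node $v_i$.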
where f is the input/output feature tensor, w is the convolution parameter, v is a node in the graph, l is the mapping function between nodes and parameters, and Z is a normalization function. In a concrete implementation, the mapping function can be realized through the adjacency matrix of the graph; the convolution operation expressed with the adjacency matrix is shown in formula (2):
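Formula (2) is likewise reproduced only as an image; a reconstruction matching the surrounding definitions (adjacency matrix $A$, kernel size $K$, normalizer $\Lambda$) would be:

```latex
f_{out} \;=\; \sum_{k=1}^{K} \Lambda_k^{-\frac{1}{2}} A_k \Lambda_k^{-\frac{1}{2}}\, f_{in}\, W_k
```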
where A is the adjacency matrix of the graph, K is the convolution kernel size, and Λ is used to normalize A. Multiplying by the adjacency matrix A "filters" the required nodes out of the feature tensor and multiplies them with the corresponding parameters.
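A minimal numpy sketch of this adjacency-based convolution, under simplifying assumptions (a single partition, node features flattened to a V×C matrix; all names are illustrative):

```python
import numpy as np

def graph_conv(f_in, A, W):
    """One adjacency-matrix graph convolution: normalize A, gather
    neighbor features, then mix channels with the parameter matrix W.
    f_in: (V, C_in) node features; A: (V, V) adjacency with self-loops;
    W: (C_in, C_out) convolution parameters."""
    deg = A.sum(axis=1)                      # Lambda's diagonal entries
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    A_norm = d_inv_sqrt @ A @ d_inv_sqrt     # symmetric normalization of A
    return A_norm @ f_in @ W

V, C_in, C_out = 4, 3, 2
A = np.eye(V) + np.eye(V, k=1) + np.eye(V, k=-1)  # a simple chain graph
f = np.random.randn(V, C_in)
out = graph_conv(f, A, np.random.randn(C_in, C_out))
print(out.shape)  # (4, 2)
```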
In the convolution operation expressed with an adjacency matrix above, the adjacency matrix defines the topology of the human-body graph used in the graph convolutional network. Human postures are highly varied, and a fixed topology cannot accurately describe every posture, which results in low recognition accuracy.
Summary of the Invention
To solve the above technical problems, the present application provides a graph data recognition method, device, computer equipment, and storage medium.
In a first aspect, the present application provides a graph data recognition method, including:
obtaining the input feature map of the current convolutional layer of a trained convolutional neural network, where the input feature map is a feature map obtained by feature extraction from graph data;
obtaining the bias matrix of the current convolutional layer, where the bias matrix is a matrix generated when the trained convolutional neural network was produced;
obtaining a reference adjacency matrix, and computing the sum of the reference adjacency matrix and the bias matrix to obtain a target adjacency matrix;
obtaining the convolution kernel of the current convolutional layer;
generating a target output feature map according to the convolution kernel of the current convolutional layer, the target adjacency matrix, and the input feature map; and
identifying, according to the target output feature map, the recognition result corresponding to the graph data.
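The steps above can be sketched as follows (a simplified illustration only; the layer representation, attribute names, and the convolution step are assumptions, not the patented implementation):

```python
import numpy as np

def recognize_layer(f_in, layer, A_ref):
    """One convolutional layer of the method: add the learned bias matrix
    to the fixed reference adjacency, then convolve the input feature map."""
    A_target = A_ref + layer["bias_matrix"]          # target adjacency matrix
    gathered = A_target @ f_in                       # apply target adjacency
    return gathered @ layer["kernel"]                # convolution (simplified)

V, C_in, C_out = 25, 3, 8
A_ref = np.eye(V)                                    # placeholder reference adjacency
layer = {"bias_matrix": 0.01 * np.random.randn(V, V),
         "kernel": np.random.randn(C_in, C_out)}
f = np.random.randn(V, C_in)
print(recognize_layer(f, layer, A_ref).shape)  # (25, 8)
```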
In a second aspect, the present application provides a graph data recognition device, including:
an input feature map acquisition module, configured to obtain the input feature map of the current convolutional layer of a trained convolutional neural network, where the input feature map is a feature map obtained by feature extraction from graph data;
a bias matrix acquisition module, configured to obtain the bias matrix of the current convolutional layer, where the bias matrix is a matrix generated when the trained convolutional neural network was produced;
a target adjacency matrix calculation module, configured to obtain a reference adjacency matrix and compute the sum of the reference adjacency matrix and the bias matrix to obtain a target adjacency matrix;
a convolution kernel acquisition module, configured to obtain the convolution kernel of the current convolutional layer; and
a feature map generation module, configured to generate a target output feature map according to the convolution kernel of the current convolutional layer, the target adjacency matrix, and the input feature map, and
to identify, according to the target output feature map, the recognition result corresponding to the graph data.
A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the following steps:
obtaining the input feature map of the current convolutional layer of a trained convolutional neural network, where the input feature map is a feature map obtained by feature extraction from graph data;
obtaining the bias matrix of the current convolutional layer, where the bias matrix is a matrix generated when the trained convolutional neural network was produced;
obtaining a reference adjacency matrix, and computing the sum of the reference adjacency matrix and the bias matrix to obtain a target adjacency matrix;
obtaining the convolution kernel of the current convolutional layer;
generating a target output feature map according to the convolution kernel of the current convolutional layer, the target adjacency matrix, and the input feature map; and
identifying, according to the target output feature map, the recognition result corresponding to the graph data.
A computer-readable storage medium stores a computer program; when the computer program is executed by a processor, the following steps are implemented:
obtaining the input feature map of the current convolutional layer of a trained convolutional neural network, where the input feature map is a feature map obtained by feature extraction from graph data;
obtaining the bias matrix of the current convolutional layer, where the bias matrix is a matrix generated when the trained convolutional neural network was produced;
obtaining a reference adjacency matrix, and computing the sum of the reference adjacency matrix and the bias matrix to obtain a target adjacency matrix;
obtaining the convolution kernel of the current convolutional layer;
generating a target output feature map according to the convolution kernel of the current convolutional layer, the target adjacency matrix, and the input feature map; and
identifying, according to the target output feature map, the recognition result corresponding to the graph data.
In the graph data recognition method, device, computer equipment, and storage medium described above, the method includes: obtaining the input feature map of the current convolutional layer of a trained convolutional neural network, where the input feature map is a feature map obtained by feature extraction from graph data; obtaining the bias matrix of the current convolutional layer, where the bias matrix is a matrix generated when the trained convolutional neural network was produced; obtaining a reference adjacency matrix and computing the sum of the reference adjacency matrix and the bias matrix to obtain a target adjacency matrix; obtaining the convolution kernel of the current convolutional layer; generating a target output feature map according to the convolution kernel, the target adjacency matrix, and the input feature map; and identifying, according to the target output feature map, the recognition result corresponding to the graph data. A bias matrix is added to the adjacency matrix of each convolutional layer of the trained convolutional neural network; the bias matrix is a matrix obtained when the trained convolutional neural network was produced. Adding the bias matrix allows human postures to be expressed more accurately, improves the accuracy of the generated feature maps, and thus improves the recognition accuracy of the trained convolutional neural network.
Brief Description of the Drawings
The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the invention and, together with the description, serve to explain its principles.
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, a person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic diagram of the key joints of the human body defined by the Kinect depth camera in one embodiment;
FIG. 2 is a spatio-temporal graph describing human behavior in one embodiment;
FIG. 3 is a schematic diagram of the nodes defined in graph convolution in one embodiment;
FIG. 4 is a diagram of the application environment of the graph data recognition method in one embodiment;
FIG. 5 is a schematic flowchart of the graph data recognition method in one embodiment;
FIG. 6 is a structural block diagram of the graph data recognition device in one embodiment;
FIG. 7 is a diagram of the internal structure of a computer device in one embodiment;
FIG. 8 is a diagram of the internal structure of a computer device in one embodiment.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments in the present application, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present application.
FIG. 4 is a diagram of the application environment of the graph data recognition method in one embodiment. Referring to FIG. 4, the graph data recognition method is applied to a behavior recognition system. The behavior recognition system includes a terminal 110 and a server 120, which are connected through a network. The terminal or server obtains the input feature map of the current convolutional layer of a trained convolutional neural network, where the input feature map is a feature map obtained by feature extraction from graph data; obtains the bias matrix of the current convolutional layer, where the bias matrix is a matrix generated when the trained convolutional neural network was produced; obtains a reference adjacency matrix and computes the sum of the reference adjacency matrix and the bias matrix to obtain a target adjacency matrix; obtains the convolution kernel of the current convolutional layer; generates a target output feature map according to the convolution kernel, the target adjacency matrix, and the input feature map; and identifies, according to the target output feature map, the recognition result corresponding to the graph data. The terminal 110 may be a desktop terminal or a mobile terminal, and the mobile terminal may be at least one of a mobile phone, a tablet computer, a notebook computer, and the like. The server 120 may be implemented as an independent server or as a server cluster composed of multiple servers.
As shown in FIG. 5, in one embodiment, a graph data recognition method is provided. This embodiment is mainly illustrated by applying the method to the terminal 110 (or server 120) in FIG. 4 above. Referring to FIG. 5, the graph data recognition method specifically includes the following steps:
Step S201: obtain the input feature map of the current convolutional layer of the trained convolutional neural network.
In this embodiment, the input feature map is a feature map obtained by feature extraction from graph data.
Specifically, the trained convolutional neural network is obtained by training on a large amount of labeled graph data, where the graph data are spatio-temporal graphs of human behavior, as shown in FIG. 2. The labels carried by the graph data describe human behaviors, such as individual or multi-person actions like clapping, jumping, holding hands, and fighting. The trained convolutional neural network contains multiple convolutional layers, and the current convolutional layer may be any one of them. The output data of the previous convolutional layer serve as the input data of the current convolutional layer; these input data are the input feature map.
Step S202: obtain the bias matrix of the current convolutional layer.
In this embodiment, the bias matrix is a matrix generated when the trained convolutional neural network was produced.
Step S203: obtain a reference adjacency matrix, and compute the sum of the reference adjacency matrix and the bias matrix to obtain a target adjacency matrix.
Specifically, the bias matrix is a matrix used to adjust the reference adjacency matrix and has the same dimensions as the reference adjacency matrix. The bias matrix is obtained according to the training requirements; different training requirements, i.e., what the convolutional neural network is to be used for after training, yield different bias matrices (for example, a network trained to recognize clapping obtains a different bias matrix from one trained to recognize fighting). The reference adjacency matrix expresses the topology of the human-body graph used in the graph convolutional network and contains fixed matrix elements. The bias matrix is generated together with the trained convolutional neural network, and the elements at corresponding positions in the bias matrices of different convolutional layers may differ. That is, the bias matrix is a network parameter of the trained convolutional neural network; during training, updating the network parameters includes updating the bias matrix. Summing the matrix elements at corresponding positions of the reference adjacency matrix and the bias matrix gives the target adjacency matrix. Adjusting the reference adjacency matrix with the bias matrix yields a target adjacency matrix that describes human behavior more accurately.
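The sum in step S203 is a plain elementwise addition; a tiny illustration with made-up 3×3 matrices:

```python
import numpy as np

A_ref = np.array([[0, 1, 0],
                  [1, 0, 1],
                  [0, 1, 0]], dtype=np.float32)     # fixed reference topology
B = np.array([[0.0, 0.2, 0.1],
              [0.2, 0.0, -0.1],
              [0.1, -0.1, 0.0]], dtype=np.float32)  # learned bias matrix

A_target = A_ref + B    # elementwise sum at corresponding positions
print(A_target[0, 2])   # 0.1 -- a connection absent from the fixed topology
```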
In one embodiment, generating the trained convolutional neural network includes: obtaining a training set containing multiple training graph data, each carrying corresponding label information; recognizing each training graph datum with an initial convolutional neural network to obtain a corresponding recognition result; computing the loss value between the recognition result and the label of each training graph datum according to a preset loss function; and, when the loss value is less than or equal to a preset loss value, obtaining the trained convolutional neural network.
Specifically, the training graph data are collected spatio-temporal graphs. The label information identifies the training graph data and includes human behavior labels, graph numbers, and so on. Each training graph datum is input to the initial convolutional neural network, which extracts its features and obtains a corresponding recognition result from the extracted features. The loss value between the recognition result and the label information of each graph datum is computed according to the preset loss function, and whether the initial convolutional neural network has converged is determined from the loss value. The preset loss function is a function configured in advance for computing the network's loss value; any common network loss function can be used. When the loss value is less than or equal to the preset loss value, the initial convolutional neural network has converged, and the trained convolutional neural network is obtained.
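A sketch of this convergence check (the loss function, threshold, and toy model below are placeholders, not the patented training procedure):

```python
def train_until_converged(model, data, labels, loss_fn, update_fn,
                          preset_loss=0.05, max_iters=1000):
    """Repeat: recognize, compute the loss, and update parameters until the
    loss drops to the preset threshold (convergence) or iterations run out."""
    for _ in range(max_iters):
        predictions = [model(x) for x in data]
        loss = loss_fn(predictions, labels)
        if loss <= preset_loss:      # converged: trained network obtained
            return model
        update_fn(model, loss)       # backpropagate and update parameters
    return model

# Toy usage: fit a scalar weight w so that w * x approximates the labels.
state = {"w": 0.0}
xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]

def model(x): return state["w"] * x
def loss_fn(preds, labels):
    return sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(labels)
def update_fn(_, loss):
    # crude gradient step for the toy squared loss
    grad = sum(2 * (state["w"] * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    state["w"] -= 0.05 * grad

train_until_converged(model, xs, ys, loss_fn, update_fn, preset_loss=1e-4)
print(round(state["w"], 2))  # converges near 2.0
```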
In one embodiment, when the loss value is greater than the preset loss value, the loss value is propagated back through a gradient backpropagation algorithm, the network parameters of the initial convolutional neural network are updated, and the initial convolutional neural network with updated parameters is used to recognize the training graph data again. This is repeated until the loss value between the recognition results and the label information is less than the preset loss value, at which point the trained convolutional neural network is obtained.
Specifically, the gradient backpropagation algorithm updates the network parameters layer by layer using gradient descent. A loss value greater than the preset loss value indicates that the initial convolutional neural network has not converged and its network parameters need updating. The parameter update is determined by the difference between the output of the last layer and the true output; that is, the loss value is propagated back through the gradient backpropagation algorithm, and the network parameters of the initial convolutional neural network are updated layer by layer according to the propagated loss value. The network with updated parameters recognizes the training graph data again, and convergence is re-evaluated from the loss value between the recognition results and the label information; if the network has not converged, the parameters continue to be updated until it converges, yielding the trained convolutional neural network.
In one embodiment, the initial convolutional neural network model includes at least one convolutional layer, and each convolutional layer includes an initial bias matrix. Propagating the difference back through the gradient backpropagation algorithm and updating the network parameters of the initial convolutional neural network includes: when the loss value is propagated back to any convolutional layer through the gradient backpropagation algorithm, obtaining the propagated value of that convolutional layer, and updating the parameters of its bias matrix according to the propagated value.
Specifically, the preset loss function is a pre-configured function for computing the network's loss value and can be a common loss function such as the cross-entropy loss or the squared loss. The gradient backpropagation algorithm passes the difference between the erroneous recognition result and the true label downward layer by layer from the final output layer of the convolutional neural network; each layer updates its network parameters according to the difference data, i.e., the propagated value. When the propagation reaches any convolutional layer, the network parameters of that layer, including the parameters of its bias matrix, are updated according to that layer's propagated value.
Step S204: acquire the convolution kernels of the current convolutional layer.
Specifically, each convolutional layer contains multiple convolution kernels; the number of kernels may be the same or different across layers, and the kernels themselves may be identical or distinct. Convolution kernels perform the convolution operation on the image, and different kernels extract different image features.
Step S205: generate the target output feature map from the convolution kernels of the current convolutional layer, the target adjacency matrix, and the input feature map.
Specifically, features are extracted from the input feature map using the target adjacency matrix to obtain a feature map. A convolution operation is then applied to this feature map with the convolution kernels to obtain a convolutional feature map, which is taken as the target output feature map. Extracting features with the target adjacency matrix yields more accurate image features.
In one embodiment, step S205 includes:
Step S2051: reshape the input feature map to obtain a reshaped feature map.
In this specific embodiment, the first dimension of the reshaped feature map is the product of the first and second dimensions of the input feature map.
Specifically, reshaping means adjusting the input feature map so that the product of its first and second dimensions becomes the first dimension of the reshaped feature map, for example turning a three-dimensional input feature map into a two-dimensional one. Suppose the input feature map is C×M×N, where the first dimension is C, the second M, and the third N. It can then be reshaped into a CM×N map whose first dimension is the product CM of C and M and whose second dimension equals the third dimension of the input feature map. The elements of the input feature map are kept unchanged, the total number of elements remaining the product CMN. Here the first dimension C is the number of channels, the second dimension M the number of rows of the input feature map, and the third dimension N its number of columns. N represents the number of human joint points; in Kinect, N is defined as 25. The matrix is reshaped for convenience of computation.
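The reshape of step S2051 can be sketched as flattening the first two dimensions of a nested-list C×M×N feature map into a (C·M)×N matrix, keeping all CMN elements (a pure-Python illustration; a real implementation would use a tensor library):

```python
def reshape_cmn(feature_map):
    """Flatten the first two dims: C matrices of shape M×N -> one (C*M)×N matrix."""
    return [row for channel in feature_map for row in channel]

# C=2 channels, M=2 rows, N=3 columns (N would be 25 joints for Kinect data).
fmap = [[[1, 2, 3], [4, 5, 6]],
        [[7, 8, 9], [10, 11, 12]]]
reshaped = reshape_cmn(fmap)   # shape (2*2)×3 = 4×3, same 12 elements
```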
Step S2052: compute the product of the reshaped feature map with the matrix of each channel of the target adjacency matrix to obtain a second product matrix for each channel.
Specifically, the second dimension of the reshaped feature map equals the first dimension of each channel matrix of the target adjacency matrix. For example, if the reshaped feature map is CM×N and the target adjacency matrix is C×N×N, each channel matrix is N×N, so the product of the reshaped feature map with a channel matrix is a CM×N matrix.
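Step S2052 is an ordinary matrix product of the (C·M)×N reshaped map with an N×N channel matrix. A minimal sketch (the identity adjacency below is purely illustrative, not a matrix from the patent):

```python
def matmul(a, b):
    """Naive product of nested-list matrices: (p×q) @ (q×r) -> (p×r)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

# Reshaped feature map of shape (C*M)×N with C=1, M=2, N=3, times one N×N
# channel matrix of a hypothetical target adjacency matrix.
reshaped = [[1, 2, 3],
            [4, 5, 6]]
adjacency = [[1, 0, 0],
             [0, 1, 0],
             [0, 0, 1]]          # identity: the product equals the input
second_product = matmul(reshaped, adjacency)
```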
Step S2053: inversely reshape the second product matrix of each channel to obtain an inversely reshaped feature map for each channel.
Specifically, inverse reshaping is the inverse of reshaping: if reshaping converts a three-dimensional matrix into a two-dimensional one, inverse reshaping converts the two-dimensional matrix back into a three-dimensional one. For example, if each channel's product matrix is CM×N, the inversely reshaped feature map obtained is a C×M×N matrix.
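The inverse reshape of step S2053 undoes the earlier flattening by splitting the (C·M)×N matrix back into C blocks of M rows (illustrative sketch):

```python
def inverse_reshape(mat, C, M):
    """Split a (C*M)×N nested-list matrix back into C blocks of M rows each."""
    return [mat[c * M:(c + 1) * M] for c in range(C)]

flat = [[1, 2], [3, 4], [5, 6], [7, 8]]   # (C*M)×N with C=2, M=2, N=2
restored = inverse_reshape(flat, C=2, M=2)
```

Applying `inverse_reshape` to the output of the earlier flattening recovers the original C×M×N layout, since both preserve row order.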
Step S2054: perform a convolution operation on the inversely reshaped feature maps with the convolution kernel of each channel to obtain the target feature map of each channel of the current convolutional layer.
Step S2055: sum the target feature maps of all channels to obtain the output feature map of the current convolutional layer, and take the output feature map of the current convolutional layer as the target output feature map.
Specifically, features are extracted from the inversely reshaped feature map with the convolution kernel corresponding to each channel, giving the features for each kernel; the features extracted by the kernels make up the target feature map of each channel. The target feature maps of all channels are summed, that is, the matrix elements at corresponding positions are added, to obtain the output feature map, which is the output feature map of the current convolutional layer.
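The summation of step S2055 is an element-wise addition of equally shaped per-channel maps, as sketched below:

```python
def sum_feature_maps(maps):
    """Element-wise sum of equally shaped M×N matrices, one per channel."""
    rows, cols = len(maps[0]), len(maps[0][0])
    out = [[0] * cols for _ in range(rows)]
    for m in maps:
        for i in range(rows):
            for j in range(cols):
                out[i][j] += m[i][j]
    return out

# Two toy per-channel target feature maps of shape 2×2.
channel_maps = [[[1, 2], [3, 4]],
                [[10, 20], [30, 40]]]
output_map = sum_feature_maps(channel_maps)
```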
In one embodiment, step S205 further includes:
Step S2056: determine whether the number of channels of the output feature map matches the number of channels of the input feature map.
Step S2057: if they match, take the sum of the input feature map and the output feature map as the target output feature map of the current convolutional layer.
Step S2058: if they do not match, perform a convolution operation on the input feature map to obtain a convolutional feature map whose channel count matches that of the output feature map, and take the sum of this convolutional feature map and the output feature map as the target output feature map.
Specifically, from the channel matrices of the output feature map generated by each channel's convolution kernel and the corresponding inversely reshaped matrix, it is determined whether the channel count of the output feature map matches that of the input feature map. If it does, the elements of the input and output feature maps at corresponding positions are added to obtain the target output feature map. If it does not, a convolution is applied to the input feature map to obtain a convolutional feature map with the same channels as the output feature map, and the elements at the same positions in the convolutional feature map and the output feature map are summed to obtain the target output feature map. Determining the target output feature map according to the channel counts of the input and output feature maps improves its accuracy.
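Steps S2056 to S2058 amount to a residual connection with an optional channel projection. A hedged sketch, where `project` stands in for the 1×1-convolution branch and is an illustrative name, not the patent's API:

```python
def residual_merge(x, y, project):
    """x, y: lists of channel matrices. If channel counts differ, project x,
    then add x and y element-wise (residual connection)."""
    if len(x) != len(y):
        x = project(x)
    return [[[a + b for a, b in zip(rx, ry)]
             for rx, ry in zip(cx, cy)]
            for cx, cy in zip(x, y)]

x = [[[1, 1], [1, 1]]]                      # input: 1 channel
y = [[[2, 2], [2, 2]], [[3, 3], [3, 3]]]   # output: 2 channels -> project
duplicate_channels = lambda fm: fm * 2      # toy stand-in for a 1×1 conv
merged = residual_merge(x, y, duplicate_channels)
```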
Step S206: identify the recognition result corresponding to the graph data according to the target output feature map.
Specifically, the target output feature map is fed into the recognition layer of the trained convolutional neural network, which produces the candidate recognition results corresponding to it. The candidate with the highest recognition probability is selected as the target recognition result, which serves as the recognition result corresponding to the graph data. For example, if the recognition types are clapping, jumping, and holding hands, with recognition probabilities of 0.89 for clapping, 0.01 for jumping, and 0.1 for holding hands, the recognition result for the graph data is clapping.
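The selection rule above simply picks the class with the highest probability; using the probabilities given in the text:

```python
# Candidate recognition probabilities from the example above.
probs = {"clapping": 0.89, "jumping": 0.01, "holding hands": 0.1}
result = max(probs, key=probs.get)   # the class with the highest probability
```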
In one embodiment, when the trained convolutional neural network includes the current convolutional layer and a next convolutional layer after it, the target output feature map is taken as the input feature map of the next convolutional layer, the next convolutional layer becomes the current convolutional layer, and the method returns to acquiring the input feature map of the current convolutional layer, until every convolutional layer in the trained convolutional neural network has been processed. The target output feature map of the last convolutional layer is then output and fed into the recognition layer to obtain the recognition result corresponding to the graph data. Convolutional layers with the same network structure follow the same data processing flow.
The graph data recognition method above includes: acquiring the input feature map of the current convolutional layer of the trained convolutional neural network, the input feature map being a feature map obtained by extracting graph data; acquiring the bias matrix of the current convolutional layer, the bias matrix being a matrix generated when the trained convolutional neural network was produced; acquiring a reference adjacency matrix and computing the sum of the reference adjacency matrix and the bias matrix to obtain the target adjacency matrix; acquiring the convolution kernels of the current convolutional layer; generating the target output feature map from the convolution kernels of the current convolutional layer, the target adjacency matrix, and the input feature map; and identifying the recognition result corresponding to the graph data according to the target output feature map. A bias matrix is added to the adjacency matrix in each convolutional layer of the trained convolutional neural network; this bias matrix, obtained when the trained network was generated, expresses human posture better, improving the accuracy of the generated feature maps and thus the recognition accuracy of the trained convolutional neural network.
In a specific embodiment, the feature-matrix generation method includes the following. Referring to FIG. 6, which is a schematic diagram of the data processing flow of a convolutional layer in one embodiment, fin is the input feature map of the current convolutional layer and fout is its output feature map. The output feature map is expressed in terms of the input feature map as shown in formula (3):
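Based on the symbols defined immediately below (reference adjacency matrices Ak, bias matrices Bk, kernel parameters Wk, kernel size Kv), a plausible form of formula (3), offered here only as an assumption in the standard adaptive graph-convolution style, is:

```latex
f_{out} = \sum_{k=1}^{K_v} W_k \, f_{in} \, (A_k + B_k)
```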
Here, Ak is the k-th adjacency matrix among the reference adjacency matrices, Bk is the k-th matrix among the bias matrices, Wk is the k-th parameter of the convolution kernel, and Kv is the size of the convolution kernel, which can be customized, for example Kv = 3 or 5. Suppose the size of the input feature map is Cin×T×N, where C is the number of channels, T the number of frames of the graph data, and N the number of joint nodes defined by Kinect, with N = 25. The input feature map is reshaped into a CinT×N reshaped feature map. The bias matrix Bk is obtained after training the convolutional neural network and has the same size as Ak, namely N×N. The sum of the bias matrix Bk and the reference adjacency matrix Ak is computed to obtain the channel matrices of the target adjacency matrix; the product of each channel matrix of the target adjacency matrix with the reshaped feature map gives the second product matrix of each channel, which is inversely reshaped to obtain the inversely reshaped matrix. The convolution kernels Wk are then acquired, and each channel's kernel performs a convolution operation on the inversely reshaped matrix to obtain the output feature map corresponding to that channel.
It is then determined whether the channel count of the output feature map matches that of the input feature map. If not, the input feature map passes through a residual network res, whose convolution kernel size is 1×1, to adjust it into a matrix whose channel count matches that of the output feature map; the sum of the adjusted input feature map and the output feature map gives the target output feature map. If they match, the sum of the input feature map and the output feature map gives the target output feature map. The behavior in each piece of graph data is then recognized from the target output feature map, giving the corresponding recognition result.
When the feature-map generation process above is the data processing of training the convolutional neural network, and the recognition result for a piece of graph data does not match the category in its label, the loss value for each piece of graph data is computed with the preset loss function and propagated backwards with the gradient backpropagation algorithm to obtain the backpropagated value of each convolutional layer; the parameters of the convolution kernels W of each channel and the parameters of the bias matrix B of the corresponding convolutional layer are updated according to the backpropagated value.
When the feature-map generation process above is the data processing of the trained convolutional neural network after training, the result recognized from the target output feature map serves as the recognition result corresponding to the graph data. The graph data is fed into the trained convolutional neural network, which contains multiple convolutional layers and a recognition layer; each convolutional layer includes convolution kernels and a target adjacency matrix. Features are extracted from the graph data with the target adjacency matrix of each convolutional layer to obtain the corresponding set of image feature maps, and convolution operations on the feature sets with the kernels give the target output feature map of each convolutional layer. The target output feature map of the convolutional layer preceding the recognition layer serves as the recognition layer's input, and the corresponding human behavior type is identified from the target output feature map of each piece of graph data.
In the feature-map generation process above, the bias matrix is a graph, derived statistically from the database, that is adapted to the behavior recognition task. It serves as a network parameter and is continually updated during training according to the classification loss, and it differs for every convolutional layer. Since the bias matrix is obtained statistically from the recognition task, the target adjacency matrix derived from it extracts features from the input data better and yields an accurate target output feature map, so recognition based on the target output feature map is more accurate.
FIG. 5 is a schematic flowchart of a graph data recognition method in one embodiment. It should be understood that although the steps in the flowchart of FIG. 5 are shown sequentially as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict ordering on the execution of these steps, and they may be performed in other orders. Moreover, at least some of the steps in FIG. 5 may comprise multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; their execution order is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 7, a graph data recognition device 200 is provided, including:
An input feature map acquisition module 201, configured to acquire the input feature map of the current convolutional layer of the trained convolutional neural network, the input feature map being a feature map obtained by extracting graph data.
A bias matrix acquisition module 202, configured to acquire the bias matrix of the current convolutional layer, the bias matrix being a matrix generated when the trained convolutional neural network was produced.
A target adjacency matrix calculation module 203, configured to acquire the reference adjacency matrix and compute the sum of the reference adjacency matrix and the bias matrix to obtain the target adjacency matrix.
A convolution kernel acquisition module 204, configured to acquire the convolution kernels of the current convolutional layer.
A feature map generation module 205, configured to generate the target output feature map from the convolution kernels of the current convolutional layer, the target adjacency matrix, and the input feature map.
A recognition module 206, configured to identify the recognition result corresponding to the graph data according to the target output feature map.
In one embodiment, the target adjacency matrix calculation module is further configured to reduce the dimensionality of the input feature map of the current convolutional layer to obtain a dimensionality-reduction matrix, normalize the dimensionality-reduction matrix to obtain a normalized matrix, add each element of the normalized matrix to the corresponding element of the target adjacency matrix to obtain an updated target adjacency matrix, and take the updated target adjacency matrix as the target adjacency matrix.
In one embodiment, the target adjacency matrix calculation module includes:
A dimensionality-reduction unit, configured to reduce the dimensionality of the matrix of each channel of the input feature map with a first dimensionality-reduction function to obtain a first feature map for each channel, the input feature map having at least three dimensions of which the first is the number of channels, and to reduce the dimensionality of the matrix of each channel of the feature map with a second dimensionality-reduction function to obtain a second feature map for each channel.
A first matrix product calculation unit, configured to compute the first product matrix of the first feature map and the second feature map of each channel, and to take each channel's first product matrix as the matrix of the corresponding channel of the dimensionality-reduction matrix.
A normalization unit, configured to normalize the matrix of each channel of the dimensionality-reduction matrix to obtain the matrix of the corresponding channel of the normalized matrix.
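The text does not pin the normalization unit to a specific function; a row-wise softmax is one common choice for normalizing such a product matrix and is sketched below purely as an assumption:

```python
import math

def softmax_rows(mat):
    """Row-wise softmax of a nested-list matrix: each row becomes a
    probability distribution (a common, but here assumed, normalization)."""
    out = []
    for row in mat:
        m = max(row)                       # subtract the max for stability
        exps = [math.exp(v - m) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out

normalized = softmax_rows([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])
```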
In one embodiment, the graph data recognition device further includes:
A network generation module, configured to generate the trained convolutional neural network, the network generation module including:
A training data acquisition unit, configured to acquire a training set containing multiple pieces of training graph data, each piece of training graph data containing corresponding label information.
A recognition unit, configured to recognize each piece of training graph data through the initial convolutional neural network to obtain the corresponding recognition result.
A difference calculation unit, configured to compute the loss value between the recognition result and the label of each piece of training graph data according to a preset loss function.
A model generation unit, configured to obtain the trained convolutional neural network when the loss value is less than or equal to the preset loss value.
In one embodiment, the network generation module further includes:
A parameter update unit, configured to propagate the loss value backwards with the gradient backpropagation algorithm and update the network parameters of the initial convolutional neural network when the loss value is greater than the preset loss value.
The model determination unit is further configured to recognize each piece of training graph data with the initial convolutional neural network whose parameters have been updated, obtaining the corresponding recognition results, until the loss value between the recognition results and the label information is less than the preset loss value, yielding the trained convolutional neural network.
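The interplay of the parameter update unit and the model determination unit is a loop that repeats until the loss falls below the preset value. A minimal sketch, where `compute_loss` and `update_params` are placeholder callables and not the patent's API:

```python
def train_until_converged(params, compute_loss, update_params,
                          preset_loss=0.1, max_iters=1000):
    """Update parameters until the loss is at or below the preset loss."""
    loss = compute_loss(params)
    for _ in range(max_iters):
        if loss <= preset_loss:
            break                              # converged
        params = update_params(params, loss)   # gradient-descent stand-in
        loss = compute_loss(params)
    return params, loss

# Toy run: the loss equals the parameter value and each update halves it.
final_params, final_loss = train_until_converged(
    1.0, compute_loss=lambda p: p, update_params=lambda p, l: p / 2)
```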
In one embodiment, the feature map generation module includes:
A feature map reshaping unit, configured to reshape the input feature map into a reshaped feature map whose first dimension is the product of the first and second dimensions of the input feature map, the input feature map having at least three dimensions of which the first is the number of channels, and the target adjacency matrix having at least three dimensions.
A second matrix product calculation unit, configured to compute the product of the reshaped feature map with the matrix of each channel of the target adjacency matrix to obtain the second product matrix of each channel.
An inverse reshaping unit, configured to inversely reshape the second product matrix of each channel to obtain the inversely reshaped feature map of each channel.
A convolution unit, configured to perform a convolution operation on the inversely reshaped feature maps with the convolution kernel of each channel to obtain the target feature map of each channel of the current convolutional layer.
A feature map generation unit, configured to sum the target feature maps of the channels to obtain the output feature map of the current convolutional layer and take the output feature map of the current convolutional layer as the target output feature map.
In one embodiment, the graph data recognition device further includes:
A channel judgment module, configured to determine whether the number of channels of the output feature map matches the number of channels of the input feature map.
The feature map generation module is further configured to take the sum of the input feature map and the output feature map as the target output feature map of the current convolutional layer when the channel counts match, and, when they do not match, to perform a convolution operation on the input feature map to obtain a convolutional feature map whose channel count matches that of the output feature map, taking the sum of the convolutional feature map and the output feature map as the target output feature map.
图8示出了一个实施例中计算机设备的内部结构图。该计算机设备具体可以是图4中的终端110(或服务器120)。如图8所示,该计算机设备包括该计算机设备包括通过系统总线连接的处理器、存储器、网络接口、输入装置和显示屏。其中,存储器包括非易失性存储介质和内存储器。该计算机设备的非易失性存储介质存储有操作系统,还可存储有计算机程序,该计算机程序被处理器执行时,可使得处理器实现图数据识别方法。该内存储器中也可储存有计算机程序,该计算机程序被处理器执行时,可使得处理器执行图数据识别方法。计算机设备的显示屏可以是液晶显示屏或者电子墨水显示屏,计算机设备的输入装置可以是显示屏上覆盖的触摸层,也可以是计算机设备外壳上设置的按键、轨迹球或触控板,还可以是外接的键盘、触控板或鼠标等。Figure 8 shows a diagram of the internal structure of a computer device in one embodiment. The computer device may specifically be the terminal 110 (or server 120) in FIG. 4 . As shown in FIG. 8 , the computer equipment includes a processor, a memory, a network interface, an input device, and a display screen connected through a system bus. Wherein, the memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system, and may also store a computer program, and when the computer program is executed by the processor, the processor may realize the image data identification method. A computer program may also be stored in the internal memory, and when the computer program is executed by the processor, the processor may execute the image data identification method. The display screen of the computer equipment may be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment may be a touch layer covered on the display screen, or a button, a trackball or a touch pad provided on the casing of the computer equipment, or It can be an external keyboard, touchpad or mouse.
本领域技术人员可以理解,图8中示出的结构,仅仅是与本申请方案相关的部分结构的框图,并不构成对本申请方案所应用于其上的计算机设备的限定,具体的计算机设备可以包括比图中所示更多或更少的部件,或者组合某些部件,或者具有不同的部件布置。Those skilled in the art can understand that the structure shown in FIG. 8 is only a block diagram of a partial structure related to the solution of this application, and does not constitute a limitation on the computer equipment to which the solution of this application is applied. The specific computer equipment can be More or fewer components than shown in the figures may be included, or some components may be combined, or have a different arrangement of components.
在一个实施例中,本申请提供的图数据识别装置可以实现为一种计算机程序的形式,计算机程序可在如图8所示的计算机设备上运行。计算机设备的存储器中可存储组成该图数据识别装置的各个程序模块,比如,图7所示的特征图获取模块201、偏置矩阵获取模块202、标邻接矩阵计算模块203、卷积核获取模块204、特征图生成模块205 和识别模块206。各个程序模块构成的计算机程序使得处理器执行本说明书中描述的本申请各个实施例的图数据识别方法中的步骤。In one embodiment, the image data identification device provided in this application can be implemented in the form of a computer program, and the computer program can be run on the computer device as shown in FIG. 8 . Each program module forming the graph data recognition device can be stored in the memory of the computer equipment, for example, the feature map acquisition module 201 shown in Figure 7, the bias matrix acquisition module 202, the standard adjacency matrix calculation module 203, and the convolution kernel acquisition module 204 , feature map generation module 205 and recognition module 206 . The computer program constituted by each program module enables the processor to execute the steps in the image data identification method of each embodiment of the application described in this specification.
例如,图8所示的计算机设备可以通过如图7所示的图数据识别装置中的输入特征图获取模块201执行获取输入已训练的卷积神经网络的当前卷积层的输入特征图,输入特征图是通过提取图数据得到的特征图。计算机设备可通过偏置矩阵获取模块202执行获取当前卷积层的偏置矩阵,其中偏置矩阵为生成已训练的卷积神经网络时生成的矩阵。计算机设备可通过目标邻接矩阵计算模块203执行获取参考邻接矩阵,计算参考邻接矩阵和偏置矩阵的和,得到目标邻接矩阵。计算机设备可通过卷积核获取模块204执行获取当前卷积层的卷积核。计算机设备可通过特征图生成模块205执行根据当前卷积层的卷积核、目标邻接矩阵和输入特征图生成目标输出特征图。计算机设备可通过识别模块206执行根据目标输出特征图,识别出图数据对应的识别结果。For example, the computer equipment shown in FIG. 8 can obtain the input feature map of the current convolutional layer of the trained convolutional neural network through the input feature map acquisition module 201 in the graph data recognition device as shown in FIG. 7 , input A feature map is a feature map obtained by extracting graph data. The computer device can obtain the bias matrix of the current convolutional layer through the bias matrix obtaining module 202, wherein the bias matrix is a matrix generated when the trained convolutional neural network is generated. The computer device can obtain the reference adjacency matrix through the target adjacency matrix calculation module 203, calculate the sum of the reference adjacency matrix and the offset matrix, and obtain the target adjacency matrix. The computer device can acquire the convolution kernel of the current convolution layer through the convolution kernel acquisition module 204 . The computer device can generate the target output feature map according to the convolution kernel of the current convolution layer, the target adjacency matrix and the input feature map through the feature map generation module 205 . The computer device can output the characteristic map according to the target through the recognition module 206, and recognize the recognition result corresponding to the map data.
In one embodiment, a computer device is provided, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the following steps are implemented: obtaining the input feature map of the current convolutional layer of a trained convolutional neural network, the input feature map being a feature map obtained by extracting graph data; obtaining the bias matrix of the current convolutional layer, the bias matrix being a matrix generated when the trained convolutional neural network was generated; obtaining a reference adjacency matrix and computing the sum of the reference adjacency matrix and the bias matrix to obtain a target adjacency matrix; obtaining the convolution kernel of the current convolutional layer; generating a target output feature map from the convolution kernel of the current convolutional layer, the target adjacency matrix, and the input feature map; and recognizing, from the target output feature map, the recognition result corresponding to the graph data.
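The per-layer computation described above can be sketched as follows. This is a minimal illustration, not the patent's exact implementation: the 2-D feature map shape, the 1x1 kernel, and right-multiplication by the adjacency matrix are all assumptions made for the sketch.

```python
import numpy as np

def graph_conv_step(x, a_ref, bias, kernel):
    # x: (C, N) input feature map (C channels, N graph nodes) -- assumed shape
    # a_ref: (N, N) reference adjacency matrix
    # bias: (N, N) bias matrix generated when the network was trained
    # kernel: (C_out, C) convolution kernel (a 1x1 kernel over nodes, assumed)
    a_target = a_ref + bias      # sum of reference adjacency matrix and bias matrix
    aggregated = x @ a_target    # aggregate node features along graph edges
    return kernel @ aggregated   # (C_out, N) target output feature map
```

A downstream classifier would then map the target output feature map of the last layer to a recognition result.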
In one embodiment, the processor, when executing the computer program, further implements the following steps: performing dimensionality reduction on the input feature map of the current convolutional layer to obtain a dimensionality reduction matrix; normalizing the dimensionality reduction matrix to obtain a normalized matrix; adding each element of the normalized matrix to the corresponding element of the target adjacency matrix to obtain an updated target adjacency matrix; and using the updated target adjacency matrix as the target adjacency matrix.
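A sketch of this update step follows. The text does not fix the reduction or normalization functions, so the Gram-matrix reduction (`x.T @ x`) and row-wise softmax used here are assumptions:

```python
import numpy as np

def update_target_adjacency(x, a_target):
    # x: (C, N) input feature map; a_target: (N, N) target adjacency matrix
    reduced = x.T @ x                               # (N, N) dimensionality-reduced matrix (assumed form)
    e = np.exp(reduced - reduced.max(axis=-1, keepdims=True))
    normalized = e / e.sum(axis=-1, keepdims=True)  # row-wise softmax normalization (assumed)
    return a_target + normalized                    # element-wise addition to the adjacency matrix
```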
In one embodiment, the input feature map includes at least three dimensions, the first dimension being the number of channels, and the dimensionality reduction and normalization include: applying a first dimensionality reduction function to the matrix of each channel of the input feature map to obtain a first feature map for each channel; applying a second dimensionality reduction function to the matrix of each channel of the input feature map to obtain a second feature map for each channel; computing the first product matrix of the first feature map and the second feature map of each channel, and using the first product matrix of each channel as the matrix of the corresponding channel of the dimensionality reduction matrix; and normalizing the matrix of each channel of the dimensionality reduction matrix to obtain the matrix of the corresponding channel of the normalized matrix.
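The per-channel version of the reduction can be sketched as below. The `(C, T, N)` layout, the linear projections `w1`/`w2` standing in for the two dimensionality reduction functions, and the softmax normalization are assumptions; the patent text only requires two reductions, a product matrix per channel, and a normalization:

```python
import numpy as np

def softmax_rows(m):
    # numerically stable row-wise softmax
    e = np.exp(m - m.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def channelwise_normalized_products(x, w1, w2):
    # x: (C, T, N) input feature map; w1, w2: (d, T) reduction projections (assumed)
    out = []
    for c in range(x.shape[0]):
        f1 = w1 @ x[c]                        # (d, N) first feature map of channel c
        f2 = w2 @ x[c]                        # (d, N) second feature map of channel c
        out.append(softmax_rows(f1.T @ f2))   # (N, N) normalized first product matrix
    return np.stack(out)                      # (C, N, N) normalized matrix, one per channel
```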
In one embodiment, the processor, when executing the computer program, further implements the following steps for generating the trained convolutional neural network: obtaining a training set containing a plurality of training graph data, each training graph data containing corresponding label information; recognizing each training graph data with an initial convolutional neural network to obtain corresponding recognition results; computing, with a preset loss function, the loss value between the recognition result of each training graph data and its label; and, when the loss value is less than or equal to a preset loss value, obtaining the trained convolutional neural network.
In one embodiment, the processor, when executing the computer program, further implements the following steps: when the loss value is greater than the preset loss value, back-propagating the loss value with a gradient back-propagation algorithm and updating the network parameters of the initial convolutional neural network; and recognizing each training graph data with the initial convolutional neural network with updated network parameters to obtain corresponding recognition results, until the loss value between the recognition results and the label information is less than the preset loss value, thereby obtaining the trained convolutional neural network.
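The training loop of the two paragraphs above can be sketched as follows. A toy linear model with squared loss stands in for the initial convolutional neural network (an assumption); only the control flow — recognize, compute loss, back-propagate until the loss drops to the preset value — matches the text:

```python
import numpy as np

def train_until_threshold(w, samples, labels, preset_loss=1e-4, lr=0.1, max_iter=1000):
    # w: initial parameters; samples: (M, D); labels: (M,)
    w = w.astype(float)
    for _ in range(max_iter):
        preds = samples @ w                                   # recognition results
        loss = float(np.mean((preds - labels) ** 2))          # preset loss function (assumed: MSE)
        if loss <= preset_loss:
            return w, loss                                    # trained network
        grad = 2.0 * samples.T @ (preds - labels) / len(labels)
        w = w - lr * grad                                     # gradient back-propagation update
    return w, loss
```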
In one embodiment, the input feature map includes at least three dimensions, the first dimension being the number of channels, and the target adjacency matrix includes at least three dimensions. Generating the target output feature map from the convolution kernel of the current convolutional layer, the target adjacency matrix, and the input feature map includes: reshaping the input feature map to obtain a reshaped feature map, the first dimension of the reshaped feature map being the product of the first dimension and the second dimension of the input feature map; computing the product of the reshaped feature map and the matrix of each channel of the target adjacency matrix to obtain a second product matrix for each channel; inversely reshaping the second product matrix of each channel to obtain an inversely reshaped feature map for each channel; performing a convolution operation on the inversely reshaped feature map with the convolution kernel of each channel to obtain a target feature map for each channel of the current convolutional layer; summing the target feature maps of the channels to obtain the output feature map of the current convolutional layer; and using the output feature map of the current convolutional layer as the target output feature map.
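The reshape / per-channel product / inverse-reshape / convolve / sum pipeline can be sketched as below. The `(C, T, N)` input layout, the `K`-channel adjacency matrix, and the 1x1 per-channel kernels are assumptions; the patent fixes only the reshape rule (first dimension = product of the first two dimensions) and the order of operations:

```python
import numpy as np

def generate_target_output(x, a_target, kernels):
    # x: (C, T, N) input feature map
    # a_target: (K, N, N) target adjacency matrix with K channels
    # kernels: (K, C_out, C) per-channel 1x1 convolution kernels (assumed form)
    C, T, N = x.shape
    reshaped = x.reshape(C * T, N)              # first dim = product of first two dims
    out = 0
    for k in range(a_target.shape[0]):
        prod = reshaped @ a_target[k]           # second product matrix of channel k
        feat = prod.reshape(C, T, N)            # inverse reshape
        y = np.einsum('oc,ctn->otn', kernels[k], feat)  # convolution with channel kernel
        out = out + y                           # sum the per-channel target feature maps
    return out                                  # target output feature map
```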
In one embodiment, the processor, when executing the computer program, further implements the following steps: judging whether the number of channels of the output feature map matches the number of channels of the input feature map; when they match, using the sum of the input feature map and the output feature map as the target output feature map of the current convolutional layer; and when they do not match, performing a convolution operation on the input feature map to obtain a convolution feature map whose number of channels matches that of the output feature map, and using the sum of the convolution feature map and the output feature map as the target output feature map.
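This residual-style connection can be sketched as follows; representing the channel-matching convolution as a `(C_out, C_in)` 1x1 kernel is an assumption:

```python
import numpy as np

def residual_connect(x_in, x_out, proj_kernel=None):
    # x_in: (C_in, T, N) input feature map; x_out: (C_out, T, N) output feature map
    # proj_kernel: (C_out, C_in) 1x1 kernel, used only when channel counts differ
    if x_in.shape[0] == x_out.shape[0]:
        return x_in + x_out                                  # channels match: plain sum
    projected = np.einsum('oc,ctn->otn', proj_kernel, x_in)  # convolve to match channels
    return projected + x_out                                 # target output feature map
```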
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When the computer program is executed by a processor, the following steps are implemented: obtaining the input feature map of the current convolutional layer of a trained convolutional neural network, the input feature map being a feature map obtained by extracting graph data; obtaining the bias matrix of the current convolutional layer, the bias matrix being a matrix generated when the trained convolutional neural network was generated; obtaining a reference adjacency matrix and computing the sum of the reference adjacency matrix and the bias matrix to obtain a target adjacency matrix; obtaining the convolution kernel of the current convolutional layer; generating a target output feature map from the convolution kernel of the current convolutional layer, the target adjacency matrix, and the input feature map; and recognizing, from the target output feature map, the recognition result corresponding to the graph data.
In one embodiment, when the computer program is executed by the processor, the following steps are further implemented: performing dimensionality reduction on the input feature map of the current convolutional layer to obtain a dimensionality reduction matrix; normalizing the dimensionality reduction matrix to obtain a normalized matrix; adding each element of the normalized matrix to the corresponding element of the target adjacency matrix to obtain an updated target adjacency matrix; and using the updated target adjacency matrix as the target adjacency matrix.
In one embodiment, the input feature map includes at least three dimensions, the first dimension being the number of channels, and the dimensionality reduction and normalization include: applying a first dimensionality reduction function to the matrix of each channel of the input feature map to obtain a first feature map for each channel; applying a second dimensionality reduction function to the matrix of each channel of the input feature map to obtain a second feature map for each channel; computing the first product matrix of the first feature map and the second feature map of each channel, and using the first product matrix of each channel as the matrix of the corresponding channel of the dimensionality reduction matrix; and normalizing the matrix of each channel of the dimensionality reduction matrix to obtain the matrix of the corresponding channel of the normalized matrix.
In one embodiment, when the computer program is executed by the processor, the following steps for generating the trained convolutional neural network are further implemented: obtaining a training set containing a plurality of training graph data, each training graph data containing corresponding label information; recognizing each training graph data with an initial convolutional neural network to obtain corresponding recognition results; computing, with a preset loss function, the loss value between the recognition result of each training graph data and its label; and, when the loss value is less than or equal to a preset loss value, obtaining the trained convolutional neural network.
In one embodiment, when the computer program is executed by the processor, the following steps are further implemented: when the loss value is greater than the preset loss value, back-propagating the loss value with a gradient back-propagation algorithm and updating the network parameters of the initial convolutional neural network; and recognizing each training graph data with the initial convolutional neural network with updated network parameters to obtain corresponding recognition results, until the loss value between the recognition results and the label information is less than the preset loss value, thereby obtaining the trained convolutional neural network.
In one embodiment, the input feature map includes at least three dimensions, the first dimension being the number of channels, and the target adjacency matrix includes at least three dimensions. Generating the target output feature map from the convolution kernel of the current convolutional layer, the target adjacency matrix, and the input feature map includes: reshaping the input feature map to obtain a reshaped feature map, the first dimension of the reshaped feature map being the product of the first dimension and the second dimension of the input feature map; computing the product of the reshaped feature map and the matrix of each channel of the target adjacency matrix to obtain a second product matrix for each channel; inversely reshaping the second product matrix of each channel to obtain an inversely reshaped feature map for each channel; performing a convolution operation on the inversely reshaped feature map with the convolution kernel of each channel to obtain a target feature map for each channel of the current convolutional layer; summing the target feature maps of the channels to obtain the output feature map of the current convolutional layer; and using the output feature map of the current convolutional layer as the target output feature map.
In one embodiment, when the computer program is executed by the processor, the following steps are further implemented: judging whether the number of channels of the output feature map matches the number of channels of the input feature map; when they match, using the sum of the input feature map and the output feature map as the target output feature map of the current convolutional layer; and when they do not match, performing a convolution operation on the input feature map to obtain a convolution feature map whose number of channels matches that of the output feature map, and using the sum of the convolution feature map and the output feature map as the target output feature map.
Those of ordinary skill in the art can understand that all or part of the processes of the methods of the above embodiments can be completed by instructing relevant hardware through a computer program; the program can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application can include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The above are only specific embodiments of the present invention, enabling those skilled in the art to understand or implement the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features claimed herein.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910503195.4A CN110363086A (en) | 2019-06-11 | 2019-06-11 | Image data recognition method, device, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910503195.4A CN110363086A (en) | 2019-06-11 | 2019-06-11 | Image data recognition method, device, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110363086A true CN110363086A (en) | 2019-10-22 |
Family
ID=68217259
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910503195.4A Pending CN110363086A (en) | 2019-06-11 | 2019-06-11 | Image data recognition method, device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110363086A (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111325816A (en) * | 2020-02-11 | 2020-06-23 | 重庆特斯联智慧科技股份有限公司 | Feature map processing method and device, storage medium and terminal |
CN111539526A (en) * | 2020-04-24 | 2020-08-14 | 苏州浪潮智能科技有限公司 | Neural network convolution method and device |
CN111914029A (en) * | 2020-08-06 | 2020-11-10 | 平安科技(深圳)有限公司 | Block chain-based medical data calling method and device, electronic equipment and medium |
CN111967479A (en) * | 2020-07-27 | 2020-11-20 | 广东工业大学 | Image target identification method based on convolutional neural network idea |
CN111986071A (en) * | 2020-08-27 | 2020-11-24 | 苏州浪潮智能科技有限公司 | A picture data processing method, device, equipment and storage medium |
WO2020248581A1 (en) * | 2019-06-11 | 2020-12-17 | 中国科学院自动化研究所 | Graph data identification method and apparatus, computer device, and storage medium |
CN112116001A (en) * | 2020-09-17 | 2020-12-22 | 苏州浪潮智能科技有限公司 | Image recognition method, image recognition device and computer-readable storage medium |
CN112116066A (en) * | 2020-08-27 | 2020-12-22 | 苏州浪潮智能科技有限公司 | A computing method, system, device and medium for a neural network |
CN113706410A (en) * | 2021-08-19 | 2021-11-26 | 北京小米移动软件有限公司 | Image processing method, image processing device, electronic equipment and storage medium |
CN113989523A (en) * | 2021-10-26 | 2022-01-28 | 深圳大学 | Image processing method, apparatus, computer equipment and storage medium |
CN115793490A (en) * | 2023-02-06 | 2023-03-14 | 南通弈匠智能科技有限公司 | Intelligent household energy-saving control method based on big data |
CN117251715A (en) * | 2023-11-17 | 2023-12-19 | 华芯程(杭州)科技有限公司 | Layout measurement area screening method and device, electronic equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107169415A (en) * | 2017-04-13 | 2017-09-15 | 西安电子科技大学 | Human motion recognition method based on convolutional neural networks feature coding |
US20180121760A1 (en) * | 2016-10-27 | 2018-05-03 | General Electric Company | Methods of systems of generating virtual multi-dimensional models using image analysis |
CN108304795A (en) * | 2018-01-29 | 2018-07-20 | 清华大学 | Human skeleton Activity recognition method and device based on deeply study |
CN109086652A (en) * | 2018-06-04 | 2018-12-25 | 平安科技(深圳)有限公司 | Handwritten word model training method, Chinese characters recognition method, device, equipment and medium |
2019-06-11: CN CN201910503195.4A patent/CN110363086A/en, status: active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180121760A1 (en) * | 2016-10-27 | 2018-05-03 | General Electric Company | Methods of systems of generating virtual multi-dimensional models using image analysis |
CN107169415A (en) * | 2017-04-13 | 2017-09-15 | 西安电子科技大学 | Human motion recognition method based on convolutional neural networks feature coding |
CN108304795A (en) * | 2018-01-29 | 2018-07-20 | 清华大学 | Human skeleton Activity recognition method and device based on deeply study |
CN109086652A (en) * | 2018-06-04 | 2018-12-25 | 平安科技(深圳)有限公司 | Handwritten word model training method, Chinese characters recognition method, device, equipment and medium |
Non-Patent Citations (2)
Title |
---|
LEI SHI 等: "Non-Local Graph Convolutional Networks for Skeleton-Based Action Recognition", 《ARXIV》 * |
丰艳 等: "基于时空注意力深度网络的视角无关性骨架行为识别", 《计算机辅助设计与图形学学报》 * |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020248581A1 (en) * | 2019-06-11 | 2020-12-17 | 中国科学院自动化研究所 | Graph data identification method and apparatus, computer device, and storage medium |
CN111325816A (en) * | 2020-02-11 | 2020-06-23 | 重庆特斯联智慧科技股份有限公司 | Feature map processing method and device, storage medium and terminal |
CN111539526B (en) * | 2020-04-24 | 2022-12-06 | 苏州浪潮智能科技有限公司 | Method and device for neural network convolution |
CN111539526A (en) * | 2020-04-24 | 2020-08-14 | 苏州浪潮智能科技有限公司 | Neural network convolution method and device |
CN111967479A (en) * | 2020-07-27 | 2020-11-20 | 广东工业大学 | Image target identification method based on convolutional neural network idea |
CN111914029A (en) * | 2020-08-06 | 2020-11-10 | 平安科技(深圳)有限公司 | Block chain-based medical data calling method and device, electronic equipment and medium |
CN111986071A (en) * | 2020-08-27 | 2020-11-24 | 苏州浪潮智能科技有限公司 | A picture data processing method, device, equipment and storage medium |
CN112116066A (en) * | 2020-08-27 | 2020-12-22 | 苏州浪潮智能科技有限公司 | A computing method, system, device and medium for a neural network |
CN111986071B (en) * | 2020-08-27 | 2022-11-29 | 苏州浪潮智能科技有限公司 | Image data processing method, device, equipment and storage medium |
CN112116066B (en) * | 2020-08-27 | 2022-12-20 | 苏州浪潮智能科技有限公司 | Calculation method, system, device and medium of a neural network |
CN112116001B (en) * | 2020-09-17 | 2022-06-07 | 苏州浪潮智能科技有限公司 | Image recognition method, device and computer-readable storage medium |
CN112116001A (en) * | 2020-09-17 | 2020-12-22 | 苏州浪潮智能科技有限公司 | Image recognition method, image recognition device and computer-readable storage medium |
CN113706410A (en) * | 2021-08-19 | 2021-11-26 | 北京小米移动软件有限公司 | Image processing method, image processing device, electronic equipment and storage medium |
CN113989523A (en) * | 2021-10-26 | 2022-01-28 | 深圳大学 | Image processing method, apparatus, computer equipment and storage medium |
CN113989523B (en) * | 2021-10-26 | 2025-06-10 | 深圳大学 | Image processing method, device, computer equipment and storage medium |
CN115793490A (en) * | 2023-02-06 | 2023-03-14 | 南通弈匠智能科技有限公司 | Intelligent household energy-saving control method based on big data |
CN115793490B (en) * | 2023-02-06 | 2023-04-11 | 南通弈匠智能科技有限公司 | Intelligent household energy-saving control method based on big data |
CN117251715A (en) * | 2023-11-17 | 2023-12-19 | 华芯程(杭州)科技有限公司 | Layout measurement area screening method and device, electronic equipment and storage medium |
CN117251715B (en) * | 2023-11-17 | 2024-03-19 | 华芯程(杭州)科技有限公司 | Layout measurement area screening method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110378372A (en) | Diagram data recognition methods, device, computer equipment and storage medium | |
CN110363086A (en) | Image data recognition method, device, computer equipment and storage medium | |
Rao et al. | Deep convolutional neural networks for sign language recognition | |
CN110889325B (en) | Multitasking facial motion recognition model training and multitasking facial motion recognition method | |
CN110222611B (en) | Human skeleton behavior identification method, system and device based on graph convolution network | |
WO2021114892A1 (en) | Environmental semantic understanding-based body movement recognition method, apparatus, device, and storage medium | |
CN111989689B (en) | Method for identifying an object in an image and mobile device for executing the method | |
CN112651438A (en) | Multi-class image classification method and device, terminal equipment and storage medium | |
Kao et al. | Visual aesthetic quality assessment with a regression model | |
CN112749723B (en) | Sample labeling method, device, computer equipment and storage medium | |
WO2019100724A1 (en) | Method and device for training multi-label classification model | |
CN111126339A (en) | Gesture recognition method and device, computer equipment and storage medium | |
CN110516705A (en) | Target tracking method, device and computer-readable storage medium based on deep learning | |
CN110390259A (en) | Image data recognition method, device, computer equipment and storage medium | |
CN112084849B (en) | Image recognition method and device | |
CN109145766A (en) | Model training method, device, recognition methods, electronic equipment and storage medium | |
CN113705297B (en) | Training method, device, computer equipment and storage medium for detection model | |
CN113255557B (en) | Deep learning-based video crowd emotion analysis method and system | |
CN111160288A (en) | Gesture key point detection method and device, computer equipment and storage medium | |
CN112699837A (en) | Gesture recognition method and device based on deep learning | |
CN115565253B (en) | A dynamic gesture real-time recognition method, device, electronic equipment and storage medium | |
JP6107531B2 (en) | Feature extraction program and information processing apparatus | |
Hsu et al. | Unsupervised convolutional neural networks for large-scale image clustering | |
CN110378213A (en) | Activity recognition method, apparatus, computer equipment and storage medium | |
CN115063526A (en) | Three-dimensional reconstruction method and system of two-dimensional image, terminal device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20191022 |