
CN116403288B - Motion gesture recognition method and device and electronic equipment - Google Patents


Info

Publication number: CN116403288B (granted; earlier publication CN116403288A)
Application number: CN202310478121.6A
Authority: CN (China)
Other languages: Chinese (zh)
Inventors: 彭情, 黄伟红, 黄佳, 李靖, 高武强, 刘冠宇, 吴瑞文, 于永福, 吴邑岑, 刘硕
Original and current assignee: Central South University
Legal status: Active (the legal status is an assumption and is not a legal conclusion)

Classifications

    • G06V40/23: Recognition of whole body movements, e.g. for sport training
    • G06V10/761: Proximity, similarity or dissimilarity measures
    • G06V10/82: Image or video recognition or understanding using neural networks
    • G06V2201/07: Target detection
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

An embodiment of the invention provides a motion posture recognition method, a recognition device, and an electronic device. The method comprises: acquiring images of multiple persons from multiple viewing angles at the same moment; identifying the bounding box of each person in the images; matching the images against one another according to the bounding boxes; verifying the matching results; and, once every bounding box has passed verification, determining the posture information of each person. By acquiring multiple images of multiple persons from multiple viewing angles, identifying each person's bounding box in the images, and determining each person's posture information from those bounding boxes, the method improves the accuracy of posture recognition in multi-person, multi-view scenes.

Description

Motion Posture Recognition Method, Recognition Device and Electronic Device

Technical Field

The present invention relates to the technical field of image recognition, and in particular to a motion posture recognition method, a recognition device, and an electronic device.

Background

In the field of single-view action recognition, the commonly used algorithms share one trait: they recognize actions from the images or video of a single viewpoint, using a neural network model to process the image or video frames and derive key-point positions for the action recognition task. This approach performs very well when the full human body is unoccluded, but recognition degrades when occlusion removes body information from the image.

Work on multi-view action recognition, in turn, has focused mainly on a single person. Extracting and processing image information for one person is much simpler than for several: single-person multi-view information only requires feature extraction and fusion. For multiple people, however, recognition performance is poor.

Therefore, both single-view and multi-view action recognition perform poorly in special situations such as occluded actions or scenes containing multiple people.

Summary of the Invention

In view of this, a first aspect of the present invention provides a motion posture recognition method that improves the accuracy of posture recognition. The method comprises:

Acquiring images of multiple persons from multiple viewing angles at the same moment;

Identifying the bounding box of each person in each image based on a preset object detection algorithm;

Selecting the images of any two viewing angles and matching the bounding boxes of the same person in the two images to obtain a matching score;

Constructing, from the matching scores, a matching score matrix of the same person across the multiple viewing angles, wherein the rows and columns of the matching score matrix are bounding boxes in the images;

Performing matching verification on each person's multiple bounding boxes across the viewing angles according to the matching score matrix;

After all of each person's bounding boxes have passed verification, determining each person's posture information from each bounding box.

In an embodiment of the present invention, selecting the images of any two viewing angles and matching the bounding boxes of the same person in the two images to obtain a matching score includes:

Using a first preset neural network, convolving the bounding box of each person in the two images to obtain each person's appearance features;

Calculating each person's appearance matching score from the appearance features;

Identifying the key points in each bounding box using a second preset neural network;

Calculating an action matching score from the relative position of the viewing angles of the two images and the coordinates of each key point;

Calculating the product of the action matching score and the appearance matching score to obtain the matching score.

In an embodiment of the present invention, calculating each person's appearance matching score from the appearance features includes:

Calculating a similarity score for the two images from the appearance features;

Normalizing the similarity score to obtain the appearance matching score.

In an embodiment of the present invention, calculating the action matching score from the relative position of the viewing angles of the two images and the coordinates of each key point includes:

Constructing a transformation matrix from the relative position of the viewing angles of the two images;

Using the transformation matrix to map the key-point coordinates of a person in the first of the two images into the second image, obtaining mapped coordinates;

Calculating the distance between each key-point coordinate in the second image and the corresponding mapped coordinate;

Normalizing the sum of the distances to obtain the action matching score.

In an embodiment of the present invention, performing matching verification on each person's multiple bounding boxes across the viewing angles according to the matching score matrix includes:

For the images of any two viewing angles, executing a first loop step until every value in the matching score matrix equals a first preset value, yielding the matching result for the images of the two viewing angles, wherein the first loop step includes:

Searching the matching score matrix for the maximum matching score, deriving a matching result from the row and column where the maximum lies and thereby determining the matched bounding boxes in the images, and setting every value in that row and column to the first preset value;

Selecting the image containing the largest number of bounding boxes as the target image;

Grouping, according to the matching results and the target image, the bounding boxes of the same person in all images other than the target image into a bounding box set;

Checking cycle consistency against each person's bounding box set.
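The first loop step above (take the maximum score, record the pair, blank out its row and column) can be sketched as a greedy pairing. This is a minimal sketch, assuming the first preset value is 0.0 and the matrix is a plain list of lists:

```python
def greedy_match(score_matrix, first_preset=0.0):
    """Repeatedly take the largest score in the matrix, record the
    (row, column) bounding-box pair, and set that row and column to
    the first preset value, until every entry equals the preset."""
    m = [row[:] for row in score_matrix]  # work on a copy
    pairs = []
    while any(v != first_preset for row in m for v in row):
        # Position of the current maximum matching score.
        best_r, best_c = max(
            ((r, c) for r in range(len(m)) for c in range(len(m[0]))),
            key=lambda rc: m[rc[0]][rc[1]],
        )
        pairs.append((best_r, best_c))
        for c in range(len(m[0])):   # blank the matched row
            m[best_r][c] = first_preset
        for r in range(len(m)):      # blank the matched column
            m[r][best_c] = first_preset
    return pairs

# Two boxes in view A vs. two boxes in view B.
pairs = greedy_match([[0.9, 0.2],
                      [0.1, 0.8]])
```

With the illustrative matrix above, the loop pairs box 0 of view A with box 0 of view B first (score 0.9), then box 1 with box 1.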

In an embodiment of the present invention, checking cycle consistency against a person's bounding box set includes:

Dividing the bounding boxes in the set into groups of a second preset size, according to that person's bounding box in the target image;

Determining that the person's bounding boxes satisfy cycle consistency when the bounding boxes within every group all match one another.
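One way to sketch this check: represent the cross-view matching results as a symmetric relation over box identifiers and require every pair of boxes grouped to the same person to be mutually matched. The `(view, index)` identifiers and the sample data below are illustrative assumptions, not part of the patent:

```python
def cycle_consistent(groups, matched_pairs):
    """A person's bounding boxes satisfy cycle consistency when, within
    each group, every pair of boxes is matched to each other."""
    # Matching is symmetric, so close the pair set under reversal.
    matched = set(matched_pairs) | {(b, a) for a, b in matched_pairs}
    for group in groups:
        for i, a in enumerate(group):
            for b in group[i + 1:]:
                if (a, b) not in matched:
                    return False
    return True

# A person appears as box 0 in three views, and all pairwise matches agree.
pairs = [(("v1", 0), ("v2", 0)),
         (("v2", 0), ("v3", 0)),
         (("v1", 0), ("v3", 0))]
ok = cycle_consistent([[("v1", 0), ("v2", 0), ("v3", 0)]], pairs)
bad = cycle_consistent([[("v1", 0), ("v2", 0), ("v3", 1)]], pairs)
```

`ok` holds because every pair in the group is matched; `bad` fails because box 1 of view v3 was never matched to the others, which is exactly the inconsistency the verification step is meant to catch.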

In an embodiment of the present invention, determining each person's posture information from each bounding box after all of each person's bounding boxes have passed verification includes:

After each person's bounding boxes have passed verification, extracting the image features of each image using a third preset neural network model;

Inputting each person's bounding boxes from the multiple images into a fourth preset neural network model to obtain each person's posture information.

A second aspect of the present invention provides a motion posture recognition device, including:

An image acquisition module, configured to acquire images of multiple persons from multiple viewing angles at the same moment;

A recognition module, configured to identify the bounding box of each person in each image based on a preset object detection algorithm;

A matching score determination module, configured to select the images of any two viewing angles and match the bounding boxes of the same person in the two images to obtain a matching score;

A matching score matrix determination module, configured to construct, from the matching scores, a matching score matrix of the same person across the multiple viewing angles, wherein the rows and columns of the matching score matrix are bounding boxes in the images;

A matching verification module, configured to perform matching verification on each person's multiple bounding boxes across the viewing angles according to the matching score matrix;

A posture information determination module, configured to determine each person's posture information from each bounding box after all of each person's bounding boxes have passed verification.

A third aspect of the present invention provides an electronic device including a processor and a memory, the memory storing computer-executable instructions executable by the processor, and the processor executing the instructions to implement any of the motion posture recognition methods above.

A fourth aspect of the present invention provides a machine-readable storage medium storing instructions which, when executed by a processor, implement any of the motion posture recognition methods above.

Through the above technical solution, images of multiple persons are acquired from multiple viewing angles at the same moment; the bounding box of each person in each image is identified based on a preset object detection algorithm; the images of any two viewing angles are selected and the bounding boxes of the same person in the two images are matched to obtain a matching score; a matching score matrix of the same person across the viewing angles is constructed from the matching scores, its rows and columns being bounding boxes in the images; each person's bounding boxes across the viewing angles are verified against the matching score matrix; and after all of a person's bounding boxes pass verification, that person's posture information is determined from each bounding box. By acquiring multiple images of multiple persons from multiple viewing angles, identifying each person's bounding box in the images, and determining each person's posture information from those boxes, the process improves the accuracy of multi-person, multi-view posture recognition.

Other features and advantages of the embodiments of the present invention will be described in detail in the detailed description that follows.

Brief Description of the Drawings

The accompanying drawings provide a further understanding of the embodiments of the present invention and form part of the specification. Together with the detailed description below, they explain the embodiments, but they do not limit them. In the drawings:

FIG. 1 is a flow chart of a motion posture recognition method provided by an embodiment of the present invention;

FIG. 2 is a structural diagram of a motion posture recognition device provided by an embodiment of the present invention;

FIG. 3 is a structural diagram of an electronic device provided by an embodiment of the present invention.

Detailed Description

Specific implementations of the embodiments of the present invention are described in detail below with reference to the drawings. It should be understood that the implementations described here only illustrate and explain the embodiments and do not limit them.

On this basis, the present invention provides a motion posture recognition method. FIG. 1 is a flow chart of such a method provided by an embodiment of the present invention. As shown in FIG. 1, the method includes:

Step S101: Acquire images of multiple persons from multiple viewing angles at the same moment.

In practice, multiple cameras are placed at eight positions around the crowd: front, rear, left, right, front-left, front-right, rear-left, and rear-right.

In practice, the cameras at these positions capture images of the persons from multiple viewing angles at the same moment. If the cameras record video instead, images are extracted from every frame, or from frames sampled at a fixed interval.

Step S102: Identify the bounding box of each person in each image based on a preset object detection algorithm.

In practice, the preset object detection algorithm may be the SSD (Single Shot MultiBox Detector) algorithm: the images from the multiple viewing angles are input to the SSD detector, which outputs the bounding box of each person in each image. Each bounding box is a 2D rectangular region of the original input image that encloses all of one person's body parts.

Step S103: Select the images of any two viewing angles and match the bounding boxes of the same person in the two images to obtain a matching score.

In practice, the images of any two viewing angles are selected and the bounding boxes belonging to the same person in the two images are matched, yielding a matching score for that person's bounding boxes.

Step S104: Construct, from the matching scores, a matching score matrix of the same person across the multiple viewing angles, wherein the rows and columns of the matching score matrix are bounding boxes in the images.

In practice, the matching score matrix is built from the matching scores: its rows are all the bounding boxes in one of the viewing angles, its columns are all the bounding boxes in another viewing angle, and each entry is the matching score between the bounding boxes of the corresponding row and column.
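The structure of this matrix can be sketched as follows. `toy_score` is a made-up one-dimensional stand-in for the real appearance-times-action matching score described in this document; in practice each box would carry image features rather than a single number:

```python
def build_score_matrix(boxes_a, boxes_b, score_fn):
    """Matching score matrix for two views: rows are the bounding
    boxes of view A, columns the bounding boxes of view B, and each
    entry the pairwise matching score between that row and column."""
    return [[score_fn(a, b) for b in boxes_b] for a in boxes_a]

def toy_score(a, b):
    # Hypothetical score: closer 1-D "features" score higher (max 1.0).
    return 1.0 / (1.0 + abs(a - b))

# Two boxes in view A, three boxes in view B.
matrix = build_score_matrix([0.0, 5.0], [0.1, 4.0, 9.0], toy_score)
```

The resulting 2x3 matrix is exactly the object the verification step below searches for its maximum entries.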

Step S105: Perform matching verification on each person's multiple bounding boxes across the viewing angles according to the matching score matrix.

In practice, the matching score matrix is used to verify the matching results of each person's bounding boxes across the viewing angles.

In practice, bounding boxes belonging to the same person in different viewing angles should produce the same matching result in any other viewing angle, so verifying the matching results guarantees their accuracy. If verification fails, the remaining bounding boxes are re-matched using the relative positions of the viewing angles, until all of a person's bounding boxes pass verification.

Step S106: After all of each person's bounding boxes have passed verification, determine each person's posture information from each bounding box.

In practice, after each person's bounding boxes pass verification, each person's posture information is determined from those bounding boxes; the posture information is 3D posture information.

Through the above embodiment, images of multiple persons are acquired from multiple viewing angles at the same moment; the bounding box of each person in each image is identified based on a preset object detection algorithm; the images of any two viewing angles are selected and the bounding boxes of the same person in the two images are matched to obtain a matching score; a matching score matrix of the same person across the viewing angles is constructed from the matching scores, its rows and columns being bounding boxes in the images; each person's bounding boxes across the viewing angles are verified against the matching score matrix; and after all of a person's bounding boxes pass verification, that person's posture information is determined from each bounding box. By acquiring multiple images of multiple persons from multiple viewing angles, identifying each person's bounding box, and determining each person's posture information from those boxes, the process improves the accuracy of multi-person, multi-view posture recognition.

In one embodiment, step S103 includes:

Using a first preset neural network, convolving the bounding box of each person in the two images to obtain each person's appearance features;

Calculating each person's appearance matching score from the appearance features;

Identifying the key points in each bounding box using a second preset neural network;

Calculating an action matching score from the relative position of the viewing angles of the two images and the coordinates of each key point;

Calculating the product of the action matching score and the appearance matching score to obtain the matching score.

In practice, the matching score is computed from the appearance matching score and the action matching score. The first preset neural network may be a convolutional neural network (CNN): the bounding box of each person in the images of the two viewing angles is convolved to obtain that person's appearance features, from which each person's appearance matching score is calculated.

In practice, the second preset neural network may likewise be a convolutional neural network, which identifies the key points of the person in each bounding box. The key points include, but are not limited to, important body points such as the nose, the left and right eyes, ears, shoulders, elbows, wrists, hips, knees, and ankles.

In practice, the relative position of the two images' viewing angles is computed from the positions of the viewing angles, the coordinates of the same person's key points in the two images are obtained, and the person's action matching score is calculated from the relative position and the key-point coordinates. The matching score is the product of the appearance matching score and the action matching score.

Through the above embodiment, a first preset neural network convolves each person's bounding box in the two images to obtain each person's appearance features; each person's appearance matching score is calculated from those features; a second preset neural network identifies the key points in each bounding box; an action matching score is calculated from the relative position of the two images' viewing angles and the coordinates of each key point; and the matching score is the product of the action matching score and the appearance matching score. Deriving the appearance score from convolved bounding-box features and the action score from the identified key points, and combining the two, ensures the accuracy of bounding-box matching.

In one embodiment, calculating each person's appearance matching score from the appearance features includes:

Calculating a similarity score for the two images from the appearance features;

Normalizing the similarity score to obtain the appearance matching score.

In practice, a similarity score for the same person in the two images is calculated from the appearance features, and the similarity score is normalized to obtain the appearance matching score.

In practice, a Sigmoid function can be used to map the similarity score into the interval [0, 1], giving the final appearance matching score: the more alike the two image crops are, the closer the score is to 1; the more different they are, the closer it is to 0.
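A minimal sketch of this step, assuming cosine similarity over the CNN appearance features followed by the Sigmoid normalization mentioned above; the feature vectors below are illustrative stand-ins, not real network outputs:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two appearance feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv)

def appearance_score(feat_a, feat_b):
    # Sigmoid maps the similarity into (0, 1): the more alike the
    # features, the higher the appearance matching score.
    s = cosine_similarity(feat_a, feat_b)
    return 1.0 / (1.0 + math.exp(-s))

same = appearance_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
diff = appearance_score([1.0, 2.0, 3.0], [-1.0, -2.0, -3.0])
```

Identical features yield the maximum score of this normalization and opposite features the minimum, matching the intended monotone behavior of the appearance matching score.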

Through the above embodiment, a similarity score for the two images is calculated from the appearance features and normalized into an appearance matching score. Computing the similarity of the two images belonging to the same person and normalizing it into the matching score improves the accuracy of the appearance matching score calculation.

在一实施例中,根据任意两个图像对应的视角的相对位置和每个关键点的坐标计算动作匹配分数,包括:In one embodiment, the action matching score is calculated according to the relative position of the viewing angles corresponding to any two images and the coordinates of each key point, including:

根据任意两个图像对应的视角的相对位置构建对应的变换矩阵;Construct a corresponding transformation matrix according to the relative positions of the viewing angles corresponding to any two images;

利用变换矩阵,将任意两个图像中的第一图像中同一个人的关键点坐标映射到第二图像,得到映射坐标;Using the transformation matrix, the key point coordinates of the same person in the first image of any two images are mapped to the second image to obtain the mapped coordinates;

计算第二图像中每个关键点坐标与映射坐标的距离;Calculate the distance between the coordinates of each key point in the second image and the mapped coordinates;

对多个距离的和值进行归一化处理,得到动作匹配分数。The sum of multiple distances is normalized to obtain the action matching score.

In practical applications, the relative positions of the viewing angles corresponding to the two images are calculated, and the transformation matrix between the two images is constructed from these relative positions. The key point coordinates of the same person in the first image are then mapped into the second image using the transformation matrix to obtain the mapped coordinates.

In practical applications, the distances between the key point coordinates of the same person in the second image and the corresponding mapped coordinates are calculated. The sum of the distances over all key points of that person is computed, and the sum is normalized to obtain the action matching score. Specifically, a Sigmoid function can be used to map the distance sum into the interval [0, 1].
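The mapping-and-normalization step can be sketched as follows. The 3x3 transformation matrix, the use of homogeneous coordinates, and the particular squashing function `2 / (1 + exp(d))` (chosen so that a smaller distance sum yields a higher score while staying in (0, 1]) are illustrative assumptions, not the patent's fixed choices.

```python
import numpy as np

def action_matching_score(H: np.ndarray,
                          kpts_first: np.ndarray,
                          kpts_second: np.ndarray) -> float:
    """Map key points of the first view into the second via a 3x3
    transformation matrix H, sum the distances to the observed key
    points, and squash the sum into (0, 1] with a Sigmoid-style function.

    kpts_first, kpts_second have shape (K, 2).
    """
    ones = np.ones((kpts_first.shape[0], 1))
    homog = np.hstack([kpts_first, ones])       # (K, 3) homogeneous coords
    mapped = (H @ homog.T).T                    # apply the transformation
    mapped = mapped[:, :2] / mapped[:, 2:3]     # back to Cartesian coords
    dist_sum = np.linalg.norm(mapped - kpts_second, axis=1).sum()
    # 2 / (1 + exp(d)): d = 0 gives 1.0, large d tends to 0 (assumption)
    return float(2.0 / (1.0 + np.exp(dist_sum)))
```

With the identity transformation and identical key points the score is 1; displaced key points lower it monotonically.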

Through the above embodiment, a corresponding transformation matrix is constructed according to the relative positions of the viewing angles corresponding to any two images; the key point coordinates of the same person in the first image are mapped into the second image using the transformation matrix to obtain the mapped coordinates; the distance between each key point coordinate in the second image and the corresponding mapped coordinate is calculated; and the sum of the distances is normalized to obtain the action matching score. Because the score is derived from the geometric consistency of the mapped key points, the accuracy of action matching is improved.

In one embodiment, step S105 includes:

performing a first loop step for the images of any two viewing angles until all values in the matching score matrix equal a first preset value, to obtain the matching result of the two images, wherein the first loop step includes:

searching the matching score matrix for the maximum matching score, obtaining a matching result according to the row and column in which the maximum is located, determining the matched bounding boxes in the images, and setting all values in that row and column to the first preset value;

selecting the image containing the largest number of bounding boxes as the target image;

grouping the bounding boxes of the same person in all images other than the target image into a bounding box set according to the matching result and the target image;

checking cycle consistency against the bounding box set of the same person.

In practical applications, the multiple bounding boxes of each person under the multiple viewing angles are matched and verified, including a uniqueness check and a cycle consistency check. The uniqueness check means that a bounding box in one view corresponds to at most one bounding box in another view; when this condition is met, a preliminary matching result can be obtained from the matching score matrix. The cycle consistency check means that mutually matched bounding boxes across any three views should form a ring, i.e., two mutually matched bounding boxes should match the same bounding box in the third view.

In practical applications, the maximum matching score in the matching score matrix is found, and the mutually matched bounding boxes in the images are determined according to the row and column in which the maximum is located. All values in that row and column are then set to the first preset value, namely 0, and the search for the next maximum continues. These operations are repeated until all matrix values are 0, yielding the bounding box matching relationship between the views and ensuring that the bounding boxes satisfy the uniqueness check.
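The greedy loop just described can be sketched directly (a minimal illustration; scores are assumed non-negative, as products of two scores in [0, 1] are):

```python
import numpy as np

def greedy_match(score: np.ndarray, preset: float = 0.0):
    """Repeatedly take the largest entry of the matching score matrix,
    record its (row, column) pair as a match, and set that row and
    column to the preset value (0) until every entry equals it.

    Because each matched row and column is zeroed out, every bounding
    box is matched at most once (the uniqueness check).
    """
    score = score.copy()
    matches = []
    while np.any(score != preset):
        r, c = np.unravel_index(np.argmax(score), score.shape)
        matches.append((int(r), int(c)))
        score[r, :] = preset   # remove this row from further matching
        score[:, c] = preset   # remove this column as well
    return matches
```

For a 2x2 matrix `[[0.9, 0.1], [0.2, 0.8]]` the loop pairs box 0 with box 0 and box 1 with box 1, then terminates with an all-zero matrix.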

In practical applications, the image containing the most bounding boxes among all images is taken as the target image. With the target image as the reference, each of its bounding boxes is grouped with the bounding boxes it matches in the other images according to the matching result, giving a bounding box set. Each such set represents images of the same person from different viewing angles, and cycle consistency is checked against the bounding box set of that person.

Through the above embodiment, the first loop step is performed for the images of any two viewing angles until all values in the matching score matrix equal the first preset value, yielding the matching result of the two images; the image containing the largest number of bounding boxes is selected as the target image; the bounding boxes of the same person in all other images are grouped into a bounding box set according to the matching result and the target image; and cycle consistency is checked against that set. By verifying both the uniqueness and the cycle consistency of each person's bounding boxes across multiple viewing angles, this process improves the accuracy of posture recognition.

In one embodiment, checking cycle consistency against the bounding box set of the same person includes:

dividing the bounding boxes in the bounding box set into multiple subgroups of a second preset size, according to the bounding boxes of the same person in the target image;

determining that the bounding boxes of the same person satisfy cycle consistency when the bounding boxes in every subgroup match each other.

In practical applications, the bounding boxes in the bounding box set are checked for cycle consistency in different combinations whose size equals the second preset value, for example 3, and individual bounding boxes that fail the check are excluded. If a subgroup contains exactly three bounding boxes that do not form a ring, or fewer than three bounding boxes, the subgroup is deleted, so that every remaining subgroup passes the check.
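The three-view ring condition can be sketched as follows. The dictionary representation of pairwise matches (bounding-box id in one view mapped to its match in another) is an illustrative assumption; the patent describes the condition only abstractly.

```python
def cycle_consistent(match_ab: dict, match_bc: dict, match_ac: dict) -> bool:
    """Check cycle consistency over three views A, B, C.

    Each dict maps a bounding-box id in its first view to the matched
    bounding-box id in its second view. The boxes form a ring when
    following A -> B -> C lands on the same box as the direct A -> C match.
    """
    for a, b in match_ab.items():
        if b in match_bc and a in match_ac:
            if match_bc[b] != match_ac[a]:
                return False  # A->B->C disagrees with A->C: not a ring
    return True
```

For example, matches `{0: 1}`, `{1: 2}`, `{0: 2}` form a ring, while replacing the last with `{0: 3}` breaks it.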

Through the above embodiment, the bounding boxes in the bounding box set are divided into multiple subgroups of the second preset size according to the bounding boxes of the same person in the target image, and the bounding boxes of the same person are determined to satisfy cycle consistency when the bounding boxes in every subgroup match each other. By verifying the cycle consistency of the bounding boxes, this process confirms the accuracy of the bounding box matching.

In one embodiment, step S106 includes:

when the multiple bounding boxes of each person pass the verification, extracting the image features of each image using a third preset neural network model;

inputting the bounding boxes of each person in the multiple images into a fourth preset neural network model to obtain the posture information of each person.

In practical applications, the posture information of each person is determined from the bounding boxes of that person taken from different viewing angles.

In practical applications, the third preset neural network model includes an HRNet (High-Resolution Net) model. The bounding boxes belonging to the same person obtained from the multiple viewing angles are input into the third preset neural network model to extract the image features of each view.

In practical applications, the fourth preset neural network model includes a Transformer model. The extracted image features of each image are input into the Transformer model to aggregate the features from different viewing angles, and a fully connected layer then outputs the predicted posture information.
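As a toy stand-in for this aggregation step, single-head scaled dot-product attention pooling over per-view feature vectors followed by a linear output layer can be sketched in numpy. This is not the patent's HRNet/Transformer pipeline: the feature dimensions, the single attention head, the mean pooling over views, and the weight matrix `w_out` are all illustrative assumptions.

```python
import numpy as np

def aggregate_views(view_feats: np.ndarray, w_out: np.ndarray) -> np.ndarray:
    """Attention-style aggregation of multi-view features plus a fully
    connected output layer, as a minimal stand-in for the Transformer step.

    view_feats: (V, D), one feature vector per view (e.g. from a backbone).
    w_out:      (D, P), weights of the final fully connected layer.
    """
    d = view_feats.shape[1]
    scores = view_feats @ view_feats.T / np.sqrt(d)  # (V, V) attention logits
    scores -= scores.max(axis=1, keepdims=True)      # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)          # softmax over views
    attended = attn @ view_feats                     # (V, D) mixed features
    pooled = attended.mean(axis=0)                   # aggregate the views
    return pooled @ w_out                            # predicted pose vector
```

In the real system the attention weights and output layer would be learned end to end rather than supplied as fixed matrices.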

Through the above embodiment, when the multiple bounding boxes of each person pass the verification, the image features of each image are extracted using the third preset neural network model, and the bounding boxes of each person in the multiple images are input into the fourth preset neural network model to obtain the posture information of each person. Any individual who has appeared in any view can thus be identified and have their motion posture recovered, making the posture recognition more accurate.

Based on the above motion posture recognition method, an embodiment of the present invention further provides a motion posture recognition device 200. FIG. 2 is a schematic structural diagram of the device, which includes:

an image acquisition module 201, configured to acquire images of multiple viewing angles including multiple persons at the same moment;

a recognition module 202, configured to recognize the bounding box of each person in each image based on a preset target detection algorithm;

a matching score determination module 203, configured to select the images of any two viewing angles and match the bounding boxes of the same person in the two images to obtain a matching score;

a matching score matrix determination module 204, configured to construct a matching score matrix of the same person under the multiple viewing angles according to the matching scores, wherein the rows and columns of the matching score matrix are bounding boxes in the images;

a matching verification module 205, configured to perform matching verification on the multiple bounding boxes of each person under the multiple viewing angles according to the matching score matrix;

a posture information determination module 206, configured to determine the posture information of each person according to each bounding box after the multiple bounding boxes of each person pass the verification.

The motion posture recognition device provided in the embodiment of the invention can implement each process of the motion posture recognition method in the method embodiment and achieves the same technical effect; to avoid repetition, it is not described here again.

An embodiment of the present invention further provides an electronic device. As shown in FIG. 3, the electronic device includes a processor 130 and a memory 131; the memory 131 stores machine-executable instructions that can be executed by the processor 130, and the processor 130 executes the instructions to implement the above motion posture recognition method.

Furthermore, the electronic device shown in FIG. 3 further includes a bus 132 and a communication interface 133; the processor 130, the communication interface 133, and the memory 131 are connected via the bus 132.

The memory 131 may include a high-speed random access memory (RAM) and may also include non-volatile memory, such as at least one disk storage. The communication connection between the system network element and at least one other network element is realized through at least one communication interface 133 (wired or wireless), using the Internet, a wide area network, a local area network, a metropolitan area network, or the like. The bus 132 may be an ISA bus, a PCI bus, an EISA bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one bidirectional arrow is shown in FIG. 3, but this does not mean that there is only one bus or one type of bus.

The processor 130 may be an integrated circuit chip with signal processing capability. In the implementation process, each step of the above method may be completed by an integrated logic circuit in hardware or by software instructions in the processor 130. The processor 130 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logic diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in the embodiments of the present invention may be directly executed by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 131; the processor 130 reads the information in the memory 131 and completes the steps of the method of the above embodiments in combination with its hardware.

An embodiment of the present invention also provides a machine-readable storage medium storing machine-executable instructions. When called and executed by a processor, the instructions cause the processor to implement the above motion posture recognition method; for the specific implementation, refer to the method embodiment, which is not repeated here.

The motion posture recognition method, recognition device, and electronic device provided in the embodiments of the present invention include a computer-readable storage medium storing program code. The instructions included in the program code can be used to execute the method in the preceding method embodiment; for the specific implementation, refer to the method embodiment, which is not repeated here.

Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working process of the system and device described above may refer to the corresponding process in the aforementioned method embodiment and is not repeated here.

In addition, in the description of the embodiments of the present invention, unless otherwise clearly specified and limited, the terms "installed", "connected", and "coupled" should be understood broadly; for example, a connection may be fixed, detachable, or integral; mechanical or electrical; direct, or indirect through an intermediate medium; or an internal communication between two components. For those skilled in the art, the specific meanings of the above terms in the present invention can be understood according to the specific circumstances.

If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, an electronic device, a network device, or the like) to perform all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

In the description of the present invention, it should be noted that terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", and "outer" indicate orientations or positional relationships based on those shown in the drawings, are used only to facilitate and simplify the description, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they therefore cannot be understood as limiting the present invention. In addition, the terms "first", "second", and "third" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance.

Finally, it should be noted that the above embodiments are only specific implementations of the present invention, used to illustrate rather than limit its technical solutions, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the technical field may, within the technical scope disclosed by the present invention, still modify the technical solutions recorded in the foregoing embodiments, readily conceive of changes, or make equivalent replacements of some of the technical features; such modifications, changes, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention and shall all be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (9)

1. A method for recognizing a motion gesture, comprising:
acquiring images of multiple visual angles of multiple persons at the same moment;
Identifying a bounding box of each person in each image based on a preset target detection algorithm;
Selecting the images of any two visual angles, and matching the boundary boxes of the same person in the two images to obtain matching scores;
Constructing a matching score matrix of the same person under the multiple view angles according to the matching scores, wherein the rows and columns of the matching score matrix are boundary boxes in the image;
Performing matching verification on the multiple bounding boxes of each person under the multiple view angles according to the matching score matrix;
when the multiple bounding boxes of each person pass verification, determining gesture information of each person according to each bounding box;
the performing the matching check on the bounding boxes of each person under the multiple view angles according to the matching score matrix includes:
And executing a first circulation step for the images of any two view angles until all values in the matching score matrix are first preset values, so as to obtain a matching result of the images of any two view angles, wherein the first circulation step comprises the following steps:
Searching a matching score maximum value in the matching score matrix, obtaining a matching result according to the row and the column where the matching score maximum value is located, determining the matched boundary box in the image, and setting all the values of the row and the column where the matching score maximum value is located as the first preset value;
selecting the image with the largest number of bounding boxes as a target image;
classifying the boundary boxes of the same person in all the images except the target image into a boundary box set according to the matching result and the target image;
the loop consistency is checked against the set of bounding boxes of the same person.
2. The method according to claim 1, wherein the selecting the images from any two perspectives, and matching the bounding boxes of the same person in the two images, to obtain a matching score, includes:
Convolving the boundary frame of each person in the two images by using a first preset neural network to obtain the appearance characteristics of each person;
Calculating appearance matching scores of each person according to the appearance characteristics of each person;
identifying key points in each boundary box by using a second preset neural network;
calculating an action matching score according to the relative positions of the visual angles corresponding to any two images and the coordinates of each key point;
And calculating the product of the action matching score and the appearance matching score to obtain the matching score.
3. The method of claim 2, wherein said calculating an appearance matching score for each person based on the appearance characteristics of each person comprises:
calculating similarity scores of the two images according to the appearance characteristics;
and normalizing the similarity score to obtain the appearance matching score.
4. A method according to claim 3, wherein said calculating an action matching score from the relative positions of the viewing angles corresponding to any two of said images and the coordinates of each of said keypoints comprises:
constructing a corresponding transformation matrix according to the relative positions of the view angles corresponding to any two images;
mapping the coordinates of key points of the same person in a first image in any two images to a second image by using the transformation matrix to obtain mapped coordinates;
calculating the distance between each key point coordinate in the second image and the mapping coordinate;
And normalizing the sum of the distances to obtain the action matching score.
5. The method of claim 1, wherein the checking of the loop consistency against the set of bounding boxes of the same person comprises:
dividing the boundary boxes in the boundary box set into a plurality of subgroups according to the boundary boxes of the same person in the target image by a second preset value;
In the case that the bounding boxes in each of the subgroups match each other, it is determined that the bounding boxes of the same person conform to the cyclical consistency.
6. The method according to claim 1, wherein said determining pose information of each person from each of said bounding boxes after said plurality of said bounding boxes of each person pass verification comprises:
When the multiple bounding boxes of each person pass the verification, extracting the image characteristics of each image by using a third preset neural network model;
And inputting the boundary box of each person in the plurality of images to a fourth preset neural network model to obtain the gesture information of each person.
7. A motion gesture recognition apparatus, comprising:
the image acquisition module is used for acquiring images of multiple visual angles of multiple persons at the same moment;
the identification module is used for identifying the boundary box of each person in each image based on a preset target detection algorithm;
The matching score determining module is used for selecting the images of any two visual angles, and matching the boundary boxes of the same person in the two images to obtain a matching score;
A matching score matrix determining module, configured to construct a matching score matrix of the same person under the multiple view angles according to the matching score, where rows and columns of the matching score matrix are bounding boxes in the image;
The matching verification module is used for carrying out matching verification on the plurality of boundary boxes of each person under the plurality of view angles according to the matching score matrix;
the gesture information determining module is used for determining gesture information of each person according to each bounding box after the bounding boxes of each person pass verification;
the performing the matching check on the bounding boxes of each person under the multiple view angles according to the matching score matrix includes:
And executing a first circulation step for the images of any two view angles until all values in the matching score matrix are first preset values, so as to obtain a matching result of the images of any two view angles, wherein the first circulation step comprises the following steps:
Searching a matching score maximum value in the matching score matrix, obtaining a matching result according to the row and the column where the matching score maximum value is located, determining the matched boundary box in the image, and setting all the values of the row and the column where the matching score maximum value is located as the first preset value;
selecting the image with the largest number of bounding boxes as a target image;
classifying the boundary boxes of the same person in all the images except the target image into a boundary box set according to the matching result and the target image;
the loop consistency is checked against the set of bounding boxes of the same person.
8. An electronic device comprising a processor and a memory, the memory storing computer-executable instructions executable by the processor, the processor executing the computer-executable instructions to implement the method of identifying a motion gesture of any one of claims 1 to 6.
9. A machine-readable storage medium having instructions stored thereon, which when executed by a processor implement the method of motion gesture recognition of any one of claims 1 to 6.
CN202310478121.6A 2023-04-28 2023-04-28 Motion gesture recognition method and device and electronic equipment Active CN116403288B (en)


Publications (2)

Publication Number Publication Date
CN116403288A (en) 2023-07-07
CN116403288B (en) 2024-07-16


Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109740659A (en) * 2018-12-28 2019-05-10 浙江商汤科技开发有限公司 A kind of image matching method and device, electronic equipment, storage medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012065851A (en) * 2010-09-24 2012-04-05 Chuo Univ Multi-view auto-stereoscopic endoscope system
US11036989B1 (en) * 2019-12-11 2021-06-15 Snap Inc. Skeletal tracking using previous frames
CN111476883B (en) * 2020-03-30 2023-04-07 清华大学 Three-dimensional posture trajectory reconstruction method and device for multi-view unmarked animal
US20220148453A1 (en) * 2020-11-12 2022-05-12 Tencent America LLC Vision-based rehabilitation training system based on 3d human pose estimation using multi-view images
US11488376B2 (en) * 2021-02-15 2022-11-01 Sony Group Corporation Human skin detection based on human-body prior
CN114036969B (en) * 2021-03-16 2023-07-25 上海大学 A 3D Human Action Recognition Algorithm for Multi-View Scenarios
WO2022241583A1 (en) * 2021-05-15 2022-11-24 电子科技大学 Family scenario motion capture method based on multi-target video
US20240233441A1 (en) * 2021-05-18 2024-07-11 Garena Online Private Limited Neural Network System for 3D Pose Estimation
CN114694257B (en) * 2022-04-06 2024-09-20 中南大学 Multi-user real-time three-dimensional action recognition evaluation method, device, equipment and medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109740659A (en) * 2018-12-28 2019-05-10 浙江商汤科技开发有限公司 Image matching method and apparatus, electronic device, and storage medium

Also Published As

Publication number Publication date
CN116403288A (en) 2023-07-07

Similar Documents

Publication Publication Date Title
CN108764024B (en) Device and method for generating face recognition model and computer readable storage medium
CN110135246B (en) A method and device for recognizing human body movements
Chen et al. Similarity learning with spatial constraints for person re-identification
WO2020199480A1 (en) Body movement recognition method and device
CN109960742B (en) Local information searching method and device
CN112364827A (en) Face recognition method and device, computer equipment and storage medium
CN106897675A (en) The human face in-vivo detection method that binocular vision depth characteristic is combined with appearance features
CN110199296A (en) Face identification method, processing chip and electronic equipment
CN105139000B (en) A face recognition method and device for removing eyeglass marks
JP2009157767A (en) Face image recognition device, face image recognition method, face image recognition program, and recording medium recording the program
CN113807451A (en) Training method, device and server for panoramic image feature point matching model
US20250037499A1 (en) Method and apparatus for processing identity recognition image, computer device, and storage medium
CN109993021A (en) Face detection method, device and electronic device
CN114495266B (en) Non-standing posture detection method, non-standing posture detection device, computer equipment and storage medium
CN113902855B (en) Three-dimensional face reconstruction method based on camera equipment and related equipment
CN111833441A (en) A method and device for face 3D reconstruction based on multi-camera system
CN115082996A (en) A face key point detection method, device, terminal device and storage medium
WO2024208099A1 (en) Machine vision based method and system for recognizing person in operation area, and device and medium
CN113902781A (en) 3D face reconstruction method, device, equipment and medium
CN118072385A (en) Parkinson disease assessment method based on video gait analysis
CN116403288B (en) Motion gesture recognition method and device and electronic equipment
WO2021051538A1 (en) Face detection method and apparatus, and terminal device
CN111723610A (en) Image recognition method, device and equipment
CN116597523A (en) Living body detection method, model, training method and device based on near-infrared images
CN111967579B (en) Method and device for performing convolution calculation on image using convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant