

Cross-view gait recognition method and system based on feature fusion

Info

Publication number
CN113869151A
Authority
CN
China
Prior art keywords: feature, module, channels, global, layer
Prior art date
Legal status
Granted
Application number
CN202111076716.6A
Other languages
Chinese (zh)
Other versions
CN113869151B (en)
Inventor
王中元
洪琪
陈建宇
邓练兵
邵振峰
肖进胜
Current Assignee
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date
Filing date
Publication date
Application filed by Wuhan University (WHU)
Priority to CN202111076716.6A
Publication of CN113869151A
Application granted
Publication of CN113869151B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract



The invention discloses a cross-view gait recognition method and system based on feature fusion. First, gait information of different granularities is extracted through a multi-scale feature fusion module. Then, a multi-branch learning scheme is adopted: on one hand, global features are extracted to obtain gait silhouette information; on the other hand, local features are extracted to obtain fine-grained gait information, and the resulting global and local features are fused in the channel dimension so as to extract complementary information. Finally, in the feature mapping stage, a generalized-mean pooling temporal aggregator is used to enhance the temporal information of the gait sequence. The invention can extract more complete global and local feature information from pedestrian gait sequences and improve the accuracy of gait recognition.


Description

Cross-view gait recognition method and system based on feature fusion
Technical Field
The invention belongs to the technical field of computer vision and relates to a method and a system for cross-view gait recognition from video data, and in particular to a feature-fusion-based method and system for cross-view gait recognition from video data.
Background Art
In recent years, large numbers of surveillance cameras have been deployed in public places, and large-scale video surveillance systems have been built to safeguard public safety. Cross-camera pedestrian retrieval is an important tool for video investigation. Although pedestrian re-identification can achieve good performance on standard datasets, it depends heavily on pedestrian appearance, such as the color of the clothes a pedestrian wears, so such retrieval techniques fail once a suspect deliberately disguises his or her identity by changing clothes. Gait, as a behavioral biometric, captures both the appearance of an individual and the dynamics of walking; it can be observed at a distance and, compared with other biometrics, is difficult to imitate or disguise. Gait recognition therefore has great application prospects and practical value in fields such as intelligent surveillance and urban security.
Existing gait recognition research generally falls into two categories: template-based methods and sequence-based methods. The recognition pipeline of the former consists of template generation and feature matching: the human silhouette of each video frame is usually obtained by background subtraction, the ordered silhouettes are compressed into a frame-level gait template, a gait representation is extracted with machine learning methods, and finally the similarity between template features is measured with a distance metric and classified. The latter generates a silhouette for each frame in the same way and then takes the silhouette sequence directly as input to extract features of the gait sequence. Template-based methods are simple but easily lose temporal information, are sensitive to changes in real scenes, and lack stability; sequence-based methods must preserve the frame order, which reduces the flexibility of recognition and incurs a high computational cost.
In real scenes, the viewing angle of a pedestrian changes frequently: when a pedestrian enters the camera's field of view from different directions, the captured poses differ and so do the extracted gait features, which poses a challenge to gait recognition.
Disclosure of Invention
To solve the above problems, the invention provides a feature-fusion-based method and system for cross-perspective gait recognition from video data. Gait information of different granularities is first extracted by a multi-scale feature fusion module. A multi-branch learning scheme is then adopted: global features are extracted to obtain gait silhouette information, local features are extracted to obtain fine-grained gait details, and the features of the different branches are progressively fused as the network deepens, so that complementary information is extracted. Finally, in the feature mapping stage, a generalized-mean pooling layer replaces the conventional spatial pooling layer in the temporal aggregator to obtain a more discriminative feature representation.
The technical solution adopted by the method of the invention is as follows: a cross-perspective gait recognition method based on feature fusion, comprising the following steps:
step 1: constructing a cross-perspective gait recognition model based on feature fusion;
the model comprises a multi-scale feature fusion module, a global feature extraction module, a local feature extraction module, a global and local feature fusion module and a feature mapping module;
the multi-scale feature fusion module is formed by splicing three parallel convolution branches in a channel dimension, wherein each convolution branch comprises a convolution layer and a pooling layer; wherein, the three parallel convolution layers are respectively a 1 × 1 convolution layer, a 3 × 3 convolution layer and a 5 × 5 convolution layer;
the global feature extraction module comprises two stages of feature extraction; the first level is composed of two 3 x 3 convolutional layers and a 2 x 2 maximal pooling layer, the first convolutional layer transforms the feature from 96 channels to 128 channels, and the second convolutional layer keeps the number of channels of the feature unchanged; the second level is composed of two 3 x 3 convolutional layers, the first layer transforms the feature from 128 channels to 256 channels, the second layer keeps the number of channels of the feature unchanged;
the local feature extraction module comprises an upper branch and a lower branch; in the upper branch, the features first pass through two parallel 3 × 3 convolutional layers that convert the feature channels from 96 to 128, the two resulting parallel features are then spliced along the height dimension, and a further 3 × 3 convolutional layer converts the channels from 128 to 256; the lower branch comprises two stages of feature extraction: the first stage consists of two focal convolution layers with parameter 4 and a 2 × 2 max pooling layer, where the first focal convolution layer converts the features from 96 channels to 128 channels and the second keeps the number of channels unchanged; the second stage consists of two focal convolution layers with parameter 8, where the first converts the features from 128 channels to 256 channels and the second keeps the number of channels unchanged;
the global and local feature fusion module comprises three times of fusion of global features and local features; the first time, the local features extracted by the first stage of the lower branch of the local feature extraction module are fused to the features extracted by the first stage of the global feature extraction module, the second time, the local features extracted by the second stage of the lower branch of the local feature extraction module are fused to the features extracted by the second stage of the global feature extraction module, and the third time, the features output by the second stage of the global feature extraction module are fused to the upper branch of the local feature extraction module;
the feature mapping module comprises a horizontal feature mapping module and a time sequence enhancing module; the horizontal feature mapping module is composed of a one-dimensional global average pooling layer and a one-dimensional global maximum pooling layer, and the time sequence enhancing module introduces a generalized average pooling layer which is between the average pooling layer and the maximum pooling layer;
step 2: aiming at a pedestrian image of a sequence to be detected, extracting gait feature information with different granularities by using a multi-scale feature fusion module to obtain a feature map;
step 3: extracting complete gait feature information from the feature map obtained in step 2 by using the global feature extraction module, the local feature extraction module and the global and local feature fusion module;
step 4: mapping the complete gait feature information obtained in step 3 to a high-dimensional space by using the feature mapping module;
step 5: calculating the similarity between different features by the Euclidean distance to finally obtain the pedestrian identity of the sequence to be detected.
The technical scheme adopted by the system of the invention is as follows: a cross-perspective gait recognition system based on feature fusion comprises the following modules:
the module 1 is used for constructing a cross-perspective gait recognition model based on feature fusion;
the model comprises a multi-scale feature fusion module, a global feature extraction module, a local feature extraction module, a global and local feature fusion module and a feature mapping module;
the multi-scale feature fusion module is formed by splicing three parallel convolution branches in a channel dimension, wherein each convolution branch comprises a convolution layer and a pooling layer; wherein, the three parallel convolution layers are respectively a 1 × 1 convolution layer, a 3 × 3 convolution layer and a 5 × 5 convolution layer;
the global feature extraction module comprises two stages of feature extraction; the first level is composed of two 3 x 3 convolutional layers and a 2 x 2 maximal pooling layer, the first convolutional layer transforms the feature from 96 channels to 128 channels, and the second convolutional layer keeps the number of channels of the feature unchanged; the second level is composed of two 3 x 3 convolutional layers, the first layer transforms the feature from 128 channels to 256 channels, the second layer keeps the number of channels of the feature unchanged;
the local feature extraction module comprises an upper branch and a lower branch; in the upper branch, the features first pass through two parallel 3 × 3 convolutional layers that convert the feature channels from 96 to 128, the two resulting parallel features are then spliced along the height dimension, and a further 3 × 3 convolutional layer converts the channels from 128 to 256; the lower branch comprises two stages of feature extraction: the first stage consists of two focal convolution layers with parameter 4 and a 2 × 2 max pooling layer, where the first focal convolution layer converts the features from 96 channels to 128 channels and the second keeps the number of channels unchanged; the second stage consists of two focal convolution layers with parameter 8, where the first converts the features from 128 channels to 256 channels and the second keeps the number of channels unchanged;
the global and local feature fusion module comprises three times of fusion of global features and local features; the first time, the local features extracted by the first stage of the lower branch of the local feature extraction module are fused to the features extracted by the first stage of the global feature extraction module, the second time, the local features extracted by the second stage of the lower branch of the local feature extraction module are fused to the features extracted by the second stage of the global feature extraction module, and the third time, the features output by the second stage of the global feature extraction module are fused to the upper branch of the local feature extraction module;
the feature mapping module comprises a horizontal feature mapping module and a time sequence enhancing module; the horizontal feature mapping module is composed of a one-dimensional global average pooling layer and a one-dimensional global maximum pooling layer, and the time sequence enhancing module introduces a generalized average pooling layer which is between the average pooling layer and the maximum pooling layer;
the module 2 is used for extracting gait feature information with different granularities by using a multi-scale feature fusion module aiming at a pedestrian image of a sequence to be detected to obtain a feature map;
a module 3, which is used for extracting complete gait feature information from the feature map obtained in the module 2 by using a global feature extraction module, a local feature extraction module and a global and local feature fusion module;
a module 4, configured to map the complete gait feature information obtained in the module 3 to a high-dimensional space by using a feature mapping module;
and the module 5 is used for calculating the similarity between different characteristics through the Euclidean distance to finally obtain the pedestrian identity of the sequence to be detected.
Compared with existing cross-perspective gait recognition schemes, the invention can extract more complete global and local feature information from pedestrian gait sequences and improve gait recognition accuracy.
Drawings
FIG. 1: structure diagram of the feature-fusion-based cross-perspective gait recognition model according to an embodiment of the invention.
Detailed Description
To help those of ordinary skill in the art understand and implement the present invention, the invention is described in further detail below with reference to the accompanying drawing and implementation examples. It should be understood that the implementation examples described here serve only to illustrate and explain the invention and are not intended to limit it.
Referring to fig. 1, the cross-perspective gait recognition method based on feature fusion provided by the invention comprises the following steps:
step 1: constructing a cross-perspective gait recognition model based on feature fusion;
in this embodiment, the model includes a multi-scale feature fusion module, a global feature extraction module, a local feature extraction module, a global and local feature fusion module, and a feature mapping module;
in this embodiment, the multi-scale feature fusion module is formed by splicing three parallel convolution branches in a channel dimension, and each convolution branch includes a convolution layer and a pooling layer; wherein, the three parallel convolution layers are respectively a 1 × 1 convolution layer, a 3 × 3 convolution layer and a 5 × 5 convolution layer;
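As an illustration, the following is a minimal PyTorch sketch of the multi-scale feature fusion module described above: three parallel branches with 1 × 1, 3 × 3 and 5 × 5 convolutions, each followed by a pooling layer, concatenated along the channel dimension. The per-branch channel count (32, so that the concatenated output has the 96 channels expected by the later stages) and the LeakyReLU activation are illustrative assumptions, not details taken from the patent text.

```python
# Hedged sketch of the multi-scale feature fusion module (not the patented implementation).
import torch
import torch.nn as nn

class MultiScaleFusion(nn.Module):
    def __init__(self, in_ch=1, out_ch_per_branch=32):
        super().__init__()
        def branch(kernel):
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch_per_branch, kernel, padding=kernel // 2, bias=False),
                nn.LeakyReLU(inplace=True),
                nn.MaxPool2d(2),  # pooling layer of each branch
            )
        self.b1 = branch(1)   # 1x1 convolution branch
        self.b3 = branch(3)   # 3x3 convolution branch
        self.b5 = branch(5)   # 5x5 convolution branch

    def forward(self, x):  # x: (N, in_ch, H, W) silhouette frames
        # concatenate the three granularities along the channel dimension (cat_C)
        return torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1)

# Example: a batch of 64x44 silhouettes yields a 96-channel feature map.
feat = MultiScaleFusion()(torch.randn(8, 1, 64, 44))
print(feat.shape)  # torch.Size([8, 96, 32, 22])
```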
in this embodiment, the global feature extraction module includes two stages of feature extraction; the first level is composed of two 3 x 3 convolutional layers and a 2 x 2 maximal pooling layer, the first convolutional layer transforms the feature from 96 channels to 128 channels, and the second convolutional layer keeps the number of channels of the feature unchanged; the second level is composed of two 3 x 3 convolutional layers, the first layer transforms the feature from 128 channels to 256 channels, the second layer keeps the number of channels of the feature unchanged;
in this embodiment, the local feature extraction module includes an upper branch and a lower branch; in the upper branch, the features first pass through two parallel 3 × 3 convolutional layers that convert the feature channels from 96 to 128, the two resulting parallel features are then spliced along the height dimension, and a further 3 × 3 convolutional layer converts the channels from 128 to 256; the lower branch comprises two stages of feature extraction: the first stage consists of two focal convolution layers with parameter 4 and a 2 × 2 max pooling layer, where the first focal convolution layer converts the features from 96 channels to 128 channels and the second keeps the number of channels unchanged; the second stage consists of two focal convolution layers with parameter 8, where the first converts the features from 128 channels to 256 channels and the second keeps the number of channels unchanged;
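For illustration, the following is a hedged sketch of the "focal convolution layer with parameter p" used in the lower branch. It is interpreted here, as an assumption in the spirit of part-based convolutions in the gait recognition literature, as splitting the feature map into p horizontal strips, applying the same 3 × 3 convolution to each strip independently, and re-assembling the strips, so that the receptive field of every output position stays inside its own strip.

```python
# Hedged sketch of a focal convolution layer; the strip-wise interpretation is an assumption.
import torch
import torch.nn as nn

class FocalConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, parts=4):
        super().__init__()
        self.parts = parts
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x):  # x: (N, C, H, W), H divisible by self.parts
        strips = torch.chunk(x, self.parts, dim=2)   # split along the height axis
        strips = [self.conv(s) for s in strips]      # shared weights, per-strip receptive field
        return torch.cat(strips, dim=2)              # re-assemble along the height axis

# Example matching the first stage of the lower branch (parameter 4, 96 -> 128 channels):
y = FocalConv2d(96, 128, parts=4)(torch.randn(2, 96, 32, 22))
print(y.shape)  # torch.Size([2, 128, 32, 22])
```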
in this embodiment, the global and local feature fusion module includes three times of fusion of the global feature and the local feature; the first time, the local features extracted by the first stage of the lower branch of the local feature extraction module are fused to the features extracted by the first stage of the global feature extraction module, the second time, the local features extracted by the second stage of the lower branch of the local feature extraction module are fused to the features extracted by the second stage of the global feature extraction module, and the third time, the features output by the second stage of the global feature extraction module are fused to the upper branch of the local feature extraction module;
in this embodiment, the feature mapping module includes a horizontal feature mapping module and a time sequence enhancing module; the horizontal feature mapping module is composed of a one-dimensional global average pooling layer and a one-dimensional global max pooling layer, and the time sequence enhancing module introduces a generalized-mean pooling layer with a learnable parameter whose behavior lies between average pooling and max pooling;
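A minimal sketch of the horizontal feature mapping described above is given below: the final feature map is split into horizontal strips, and each strip is summarized by adding the outputs of a one-dimensional global average pooling and a one-dimensional global max pooling over its spatial positions. The number of strips (16) and the 256-channel input are assumptions chosen for illustration.

```python
# Hedged sketch of horizontal feature mapping (strip count is an assumption).
import torch

def horizontal_feature_mapping(feat, num_strips=16):
    # feat: (N, C, H, W) fused feature map; H * W must be divisible by num_strips
    n, c, h, w = feat.shape
    strips = feat.view(n, c, num_strips, -1)                  # group spatial positions per strip
    return strips.mean(dim=-1) + strips.max(dim=-1).values    # (N, C, num_strips)

parts = horizontal_feature_mapping(torch.randn(2, 256, 16, 11))
print(parts.shape)  # torch.Size([2, 256, 16])
```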
step 2: aiming at a pedestrian image of a sequence to be detected, extracting gait feature information with different granularities by using a multi-scale feature fusion module to obtain a feature map;
in this embodiment, the specific implementation of step 2 includes the following substeps:
step 2.1: determining the central point of a gait silhouette image of a pedestrian, aligning according to the central point, and cutting the edge of the image to obtain a gait image sequence with the size of 64 multiplied by 44 pixels;
step 2.2: inputting the gait sequence into three different parallel branches, namely 1 × 1 convolution, 3 × 3 convolution and 5 × 5 convolution operations, wherein the image sequence can obtain feature maps with different granularities after passing through the different branches;
step 2.3: splicing the obtained feature maps with different particle sizes on the channel dimension;
assume that the input is
Figure BDA0003262513160000061
Where c represents the number of channels, and h and w represent the length and width of each frame imageThen the feature after stitching in the channel dimension is:
Figure BDA0003262513160000062
where F1×1(X)、F3×3(X) and F5×5(X) represents two-dimensional convolution operations with convolution kernels of 1, 3 and 5, respectively, and catC represents a splicing operation in the channel dimension.
Step 2.4: passing the spliced feature map, which carries features of different receptive fields at the same level, to the next layer through a convolution layer and a pooling layer.
step 3: extracting complete gait feature information from the feature map obtained in step 2 by using the global feature extraction module, the local feature extraction module and the global and local feature fusion module;
in this embodiment, the specific implementation of step 3 includes the following substeps:
step 3.1: extracting global features of the sequence from the feature map obtained in the step 2 by using a common convolutional layer and a pooling layer;
assume that the input is
Figure BDA0003262513160000063
Where c represents the number of channels, h and w represent the length and width of each frame of image, the global feature is:
Figure BDA0003262513160000064
step 3.2: bisecting the feature map obtained in the step 2 into a plurality of blocks along the horizontal direction, mapping each block through the same convolution kernel, and splicing the obtained feature maps on the horizontal dimension to obtain complete local features;
assume that the input is
Figure BDA0003262513160000065
Where c represents the number of channels, and h and w represent the length and width of each frame of image; dividing the input feature into n parts along the horizontal direction to obtain a local feature map, and recording the local feature map as
Figure BDA0003262513160000066
Wherein
Figure BDA0003262513160000067
And representing the ith local feature map in the horizontal direction, and outputting the complete local features as follows:
Figure BDA0003262513160000068
where F3×3Representing a two-dimensional convolution operation with a convolution kernel of 3, and catH representing a stitching operation in the horizontal dimension.
Step 3.3: fusing the obtained global features and local features along the channel dimension.
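A one-line sketch of this fusion step, under the assumption that "fusion in the channel dimension" means channel-wise concatenation of the global and local feature maps (the simplest reading of the text), is:

```python
# Hedged sketch of global/local fusion by channel concatenation (an assumption).
import torch

glob = torch.randn(2, 256, 16, 11)     # output of the global branch
loc = torch.randn(2, 256, 16, 11)      # output of the local branch
fused = torch.cat([glob, loc], dim=1)  # (2, 512, 16, 11) complementary feature
```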
step 4: mapping the complete gait feature information obtained in step 3 to a high-dimensional space by using the feature mapping module;
in this embodiment, the specific implementation of step 4 includes the following sub-steps:
step 4.1: obtaining a micromotion feature vector sequence by one-dimensional generalized average pooling;
in the present embodiment, it is assumed that the input characteristics of this process are
Figure BDA0003262513160000071
Where n represents the number of sequence frames, c represents the number of channels, and h represents the length of each frame image; then the characteristic Y is outputoutComprises the following steps:
Figure BDA0003262513160000072
here, p is a learnable parameter, which corresponds to maximum pooling when p → ∞ is 1, and which corresponds to average pooling when p → ∞.
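The following is a minimal sketch of this generalized-mean (GeM) temporal aggregator: frame-level features are raised to a learnable power p, averaged over the time axis, and taken to the power 1/p, matching the formula above. The initial value of p and the clamping used for numerical safety are assumptions.

```python
# Hedged sketch of a GeM temporal pooling layer with a learnable exponent p.
import torch
import torch.nn as nn

class GeMTemporalPool(nn.Module):
    def __init__(self, p=3.0, eps=1e-6):
        super().__init__()
        self.p = nn.Parameter(torch.tensor(float(p)))  # learnable exponent
        self.eps = eps

    def forward(self, y):  # y: (N, T, C, H) sequence of frame-level features
        # clamp keeps the power well-defined if features contain non-positive values
        return y.clamp(min=self.eps).pow(self.p).mean(dim=1).pow(1.0 / self.p)

pooled = GeMTemporalPool()(torch.rand(2, 30, 256, 16))
print(pooled.shape)  # torch.Size([2, 256, 16]) sequence-level feature
```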
Step 4.2: a channel attention mechanism is introduced to weight the feature vector of each moment, and a one-dimensional convolution layer is adopted to weight the micro motion component;
step 4.3: the resulting features are mapped to a high-dimensional space using an independent fully connected layer.
step 5: calculating the similarity between different features by the Euclidean distance to finally obtain the pedestrian identity of the sequence to be detected.
During training, gait silhouette sequences of the same pedestrian captured from different view angles are fed into the network in turn; by combining data from multiple view angles the model learns the gait characteristics of the same pedestrian under different view angles, back-propagation assigns appropriate weights to the network, and the trained model can map gait features from different view angles into a unified discriminative subspace. During testing, pedestrian gait features under different view angles are mapped into the same discriminative subspace, the similarity between different features is computed by the Euclidean distance, and the identity of the pedestrian in the sequence to be detected is finally obtained.
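As a usage illustration, the following sketch shows the test-time retrieval described above: probe and gallery sequences are embedded with the trained model, pairwise Euclidean distances are computed, and each probe is assigned the identity of its nearest gallery sequence. The variable names and the flattening of part embeddings into single vectors are illustrative.

```python
# Hedged sketch of nearest-neighbor identification by Euclidean distance.
import torch

def identify(probe_emb, gallery_emb, gallery_ids):
    # probe_emb: (P, D), gallery_emb: (G, D) flattened sequence embeddings
    dists = torch.cdist(probe_emb, gallery_emb)          # (P, G) Euclidean distances
    return [gallery_ids[int(i)] for i in dists.argmin(dim=1)]

ids = identify(torch.randn(3, 2048), torch.randn(10, 2048), list(range(10)))
print(ids)  # identity of the nearest gallery sequence for each probe
```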
It should be understood that parts of the specification not set forth in detail are well within the prior art.
It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (9)

1. A cross-perspective gait recognition method based on feature fusion is characterized by comprising the following steps:
step 1: constructing a cross-perspective gait recognition model based on feature fusion;
the model comprises a multi-scale feature fusion module, a global feature extraction module, a local feature extraction module, a global and local feature fusion module and a feature mapping module;
the multi-scale feature fusion module is formed by splicing three parallel convolution branches in a channel dimension, wherein each convolution branch comprises a convolution layer and a pooling layer; wherein, the three parallel convolution layers are respectively a 1 × 1 convolution layer, a 3 × 3 convolution layer and a 5 × 5 convolution layer;
the global feature extraction module comprises two stages of feature extraction; the first level is composed of two 3 x 3 convolutional layers and a 2 x 2 maximal pooling layer, the first convolutional layer transforms the feature from 96 channels to 128 channels, and the second convolutional layer keeps the number of channels of the feature unchanged; the second level is composed of two 3 x 3 convolutional layers, the first layer transforms the feature from 128 channels to 256 channels, the second layer keeps the number of channels of the feature unchanged;
the local feature extraction module comprises an upper branch and a lower branch; in the upper branch, the features first pass through two parallel 3 × 3 convolutional layers that convert the feature channels from 96 to 128, the two resulting parallel features are then spliced along the height dimension, and a further 3 × 3 convolutional layer converts the channels from 128 to 256; the lower branch comprises two stages of feature extraction: the first stage consists of two focal convolution layers with parameter 4 and a 2 × 2 max pooling layer, where the first focal convolution layer converts the features from 96 channels to 128 channels and the second keeps the number of channels unchanged; the second stage consists of two focal convolution layers with parameter 8, where the first converts the features from 128 channels to 256 channels and the second keeps the number of channels unchanged;
the global and local feature fusion module comprises three times of fusion of global features and local features; the first time, the local features extracted by the first stage of the lower branch of the local feature extraction module are fused to the features extracted by the first stage of the global feature extraction module, the second time, the local features extracted by the second stage of the lower branch of the local feature extraction module are fused to the features extracted by the second stage of the global feature extraction module, and the third time, the features output by the second stage of the global feature extraction module are fused to the upper branch of the local feature extraction module;
the feature mapping module comprises a horizontal feature mapping module and a time sequence enhancing module; the horizontal feature mapping module is composed of a one-dimensional global average pooling layer and a one-dimensional global maximum pooling layer, and the time sequence enhancing module introduces a generalized average pooling layer which is between the average pooling layer and the maximum pooling layer;
step 2: aiming at a pedestrian image of a sequence to be detected, extracting gait feature information with different granularities by using a multi-scale feature fusion module to obtain a feature map;
step 3: extracting complete gait feature information from the feature map obtained in step 2 by using the global feature extraction module, the local feature extraction module and the global and local feature fusion module;
step 4: mapping the complete gait feature information obtained in step 3 to a high-dimensional space by using the feature mapping module;
step 5: calculating the similarity between different features by the Euclidean distance to finally obtain the pedestrian identity of the sequence to be detected.
2. The cross-perspective gait recognition method based on feature fusion as claimed in claim 1, wherein the step 2 is realized by the following steps:
step 2.1: determining the central point of a gait silhouette image of a pedestrian, aligning according to the central point, and cutting the edge of the image to obtain a gait image sequence with the size of 64 multiplied by 44 pixels;
step 2.2: inputting the gait sequence into three different parallel branches, namely 1 × 1 convolution, 3 × 3 convolution and 5 × 5 convolution operations, wherein the image sequence can obtain feature maps with different granularities after passing through the different branches;
step 2.3: splicing the obtained feature maps with different particle sizes on the channel dimension;
step 2.4: and (4) transmitting the characteristics of different receptive fields of the same level to the next layer through a convolution and pooling layer by the spliced characteristic diagram.
3. The cross-perspective gait recognition method based on feature fusion according to claim 2, characterized in that: in step 2.3, assume the input is X ∈ R^(c×h×w), where c represents the number of channels and h and w represent the height and width of each frame; the feature after splicing in the channel dimension is then
X_ms = cat_C(F_1×1(X), F_3×3(X), F_5×5(X)),
where F_1×1(X), F_3×3(X) and F_5×5(X) represent two-dimensional convolution operations with convolution kernels of 1, 3 and 5, respectively, and cat_C represents a splicing operation in the channel dimension.
4. The cross-perspective gait recognition method based on feature fusion as claimed in claim 1, wherein the step 3 is realized by the following steps:
step 3.1: extracting global features of the sequence from the feature map obtained in the step 2 by using a common convolutional layer and a pooling layer;
step 3.2: bisecting the feature map obtained in the step 2 into a plurality of blocks along the horizontal direction, mapping each block through the same convolution kernel, and splicing the obtained feature maps on the horizontal dimension to obtain complete local features;
step 3.3: and fusing the obtained global features and the local features on the channel dimension.
5. The cross-perspective gait recognition method based on feature fusion according to claim 4, characterized in that: in step 3.1, assume the input is X ∈ R^(c×h×w), where c represents the number of channels and h and w represent the height and width of each frame; the global feature is then Y_glob = F_3×3(X), obtained by applying the ordinary 3 × 3 convolution (and pooling) to the input.
6. The cross-perspective gait recognition method based on feature fusion according to claim 4, characterized in that: in step 3.2, assume the input is X ∈ R^(c×h×w), where c represents the number of channels and h and w represent the height and width of each frame; the input feature is divided into n parts along the horizontal direction, giving the local feature maps {x_1, x_2, ..., x_n}, where x_i represents the i-th local feature map in the horizontal direction, and the complete local feature output is
Y_local = cat_H(F_3×3(x_1), F_3×3(x_2), ..., F_3×3(x_n)),
where F_3×3 represents a two-dimensional convolution operation with a convolution kernel of 3, and cat_H represents a splicing operation in the horizontal dimension.
7. The cross-perspective gait recognition method based on feature fusion as claimed in claim 1, wherein the step 4 is realized by the following steps:
step 4.1: obtaining a micromotion feature vector sequence by one-dimensional generalized average pooling;
step 4.2: a channel attention mechanism is introduced to weight the feature vector of each moment, and a one-dimensional convolution layer is adopted to weight the micro motion component;
step 4.3: the resulting features are mapped to a high-dimensional space using an independent fully connected layer.
8. The method for cross-perspective gait recognition based on feature fusion of claim 7, characterized in that in step 4.1, the micro-motion feature vector sequence is obtained by one-dimensional generalized-mean pooling; assume the input features of this process are Y ∈ R^(n×c×h), where n represents the number of frames in the sequence, c the number of channels and h the height of each frame; the output feature Y_out is then
Y_out = ( (1/n) Σ_{t=1}^{n} Y_t^p )^(1/p),
where p is a learnable parameter; the operation is equivalent to average pooling when p = 1 and approaches maximum pooling as p → ∞.
9. A cross-perspective gait recognition system based on feature fusion is characterized by comprising the following modules:
the module 1 is used for constructing a cross-perspective gait recognition model based on feature fusion;
the model comprises a multi-scale feature fusion module, a global feature extraction module, a local feature extraction module, a global and local feature fusion module and a feature mapping module;
the multi-scale feature fusion module is formed by splicing three parallel convolution branches in a channel dimension, wherein each convolution branch comprises a convolution layer and a pooling layer; wherein, the three parallel convolution layers are respectively a 1 × 1 convolution layer, a 3 × 3 convolution layer and a 5 × 5 convolution layer;
the global feature extraction module comprises two stages of feature extraction; the first level is composed of two 3 x 3 convolutional layers and a 2 x 2 maximal pooling layer, the first convolutional layer transforms the feature from 96 channels to 128 channels, and the second convolutional layer keeps the number of channels of the feature unchanged; the second level is composed of two 3 x 3 convolutional layers, the first layer transforms the feature from 128 channels to 256 channels, the second layer keeps the number of channels of the feature unchanged;
the local feature extraction module comprises an upper branch and a lower branch; in the upper branch, the features first pass through two parallel 3 × 3 convolutional layers that convert the feature channels from 96 to 128, the two resulting parallel features are then spliced along the height dimension, and a further 3 × 3 convolutional layer converts the channels from 128 to 256; the lower branch comprises two stages of feature extraction: the first stage consists of two focal convolution layers with parameter 4 and a 2 × 2 max pooling layer, where the first focal convolution layer converts the features from 96 channels to 128 channels and the second keeps the number of channels unchanged; the second stage consists of two focal convolution layers with parameter 8, where the first converts the features from 128 channels to 256 channels and the second keeps the number of channels unchanged;
the global and local feature fusion module comprises three times of fusion of global features and local features; the first time, the local features extracted by the first stage of the lower branch of the local feature extraction module are fused to the features extracted by the first stage of the global feature extraction module, the second time, the local features extracted by the second stage of the lower branch of the local feature extraction module are fused to the features extracted by the second stage of the global feature extraction module, and the third time, the features output by the second stage of the global feature extraction module are fused to the upper branch of the local feature extraction module;
the feature mapping module comprises a horizontal feature mapping module and a time sequence enhancing module; the horizontal feature mapping module is composed of a one-dimensional global average pooling layer and a one-dimensional global maximum pooling layer, and the time sequence enhancing module introduces a generalized average pooling layer which is between the average pooling layer and the maximum pooling layer;
the module 2 is used for extracting gait feature information with different granularities by using a multi-scale feature fusion module aiming at a pedestrian image of a sequence to be detected to obtain a feature map;
a module 3, which is used for extracting complete gait feature information from the feature map obtained in the module 2 by using a global feature extraction module, a local feature extraction module and a global and local feature fusion module;
a module 4, configured to map the complete gait feature information obtained in the module 3 to a high-dimensional space by using a feature mapping module;
and the module 5 is used for calculating the similarity between different characteristics through the Euclidean distance to finally obtain the pedestrian identity of the sequence to be detected.
CN202111076716.6A 2021-09-14 2021-09-14 Cross-view gait recognition method and system based on feature fusion Active CN113869151B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111076716.6A CN113869151B (en) 2021-09-14 2021-09-14 Cross-view gait recognition method and system based on feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111076716.6A CN113869151B (en) 2021-09-14 2021-09-14 Cross-view gait recognition method and system based on feature fusion

Publications (2)

Publication Number Publication Date
CN113869151A true CN113869151A (en) 2021-12-31
CN113869151B CN113869151B (en) 2024-09-24

Family

ID=78995794

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111076716.6A Active CN113869151B (en) 2021-09-14 2021-09-14 Cross-view gait recognition method and system based on feature fusion

Country Status (1)

Country Link
CN (1) CN113869151B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109583298A (en) * 2018-10-26 2019-04-05 复旦大学 An ensemble-based approach for cross-view gait recognition
US20210224524A1 (en) * 2020-01-22 2021-07-22 Board Of Trustees Of Michigan State University Systems And Methods For Gait Recognition Via Disentangled Representation Learning
US20210232813A1 (en) * 2020-01-23 2021-07-29 Tongji University Person re-identification method combining reverse attention and multi-scale deep supervision
CN111860291A (en) * 2020-07-16 2020-10-30 上海交通大学 Multimodal pedestrian identification method and system based on pedestrian appearance and gait information

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
叶波; 文玉梅; 何卫华: "Gait recognition algorithm based on multi-classifier information fusion" (多分类器信息融合的步态识别算法), Journal of Image and Graphics (中国图象图形学报), no. 08, 15 September 2009 (2009-09-15) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115205983A (en) * 2022-09-14 2022-10-18 武汉大学 A cross-view gait recognition method, system and device based on multi-feature aggregation

Also Published As

Publication number Publication date
CN113869151B (en) 2024-09-24

Similar Documents

Publication Publication Date Title
Zhang et al. Differential feature awareness network within antagonistic learning for infrared-visible object detection
CN108537136B (en) Pedestrian Re-identification Method Based on Pose Normalized Image Generation
CN108830252B (en) Convolutional neural network human body action recognition method fusing global space-time characteristics
CN109977773B (en) Human behavior recognition method and system based on multi-target detection 3D CNN
WO2021098261A1 (en) Target detection method and apparatus
CN109508663B (en) A Pedestrian Re-identification Method Based on Multi-level Supervision Network
CN110363140A (en) A real-time recognition method of human action based on infrared images
CN113221770B (en) Cross-domain pedestrian re-recognition method and system based on multi-feature hybrid learning
Liu et al. Action recognition based on 3d skeleton and rgb frame fusion
CN111860291A (en) Multimodal pedestrian identification method and system based on pedestrian appearance and gait information
CN103714181B (en) A kind of hierarchical particular persons search method
CN105574510A (en) Gait identification method and device
CN111539255A (en) Cross-modal pedestrian re-identification method based on multi-modal image style conversion
CN109086659B (en) Human behavior recognition method and device based on multi-channel feature fusion
CN110390308B (en) Video behavior identification method based on space-time confrontation generation network
CN109993103A (en) A Human Behavior Recognition Method Based on Point Cloud Data
CN112668493B (en) GAN and deep learning based re-identification and location tracking system for dressed pedestrians
CN114973418B (en) Behavior recognition method of cross-mode three-dimensional point cloud sequence space-time characteristic network
CN112070010A (en) Pedestrian re-recognition method combining multi-loss dynamic training strategy to enhance local feature learning
El-Ghaish et al. Human action recognition based on integrating body pose, part shape, and motion
CN103295221A (en) Water surface target motion detecting method simulating compound eye visual mechanism and polarization imaging
CN110163175A (en) A kind of gait recognition method and system based on improvement VGG-16 network
CN118799919A (en) A full-time multimodal person re-identification method based on simulation augmentation and prototype learning
CN113869151B (en) Cross-view gait recognition method and system based on feature fusion
CN111160115B (en) A Video Pedestrian Re-Identification Method Based on Siamese Two-Stream 3D Convolutional Neural Network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant