CN116012955A - Infrared gait recognition method for improving GaitSet - Google Patents
- Publication number
- CN116012955A (application CN202310314282.1A)
- Authority
- CN
- China
- Prior art keywords
- gait
- infrared
- gaitset
- feature
- silhouette
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention discloses an infrared gait recognition method based on an improved GaitSet, belonging to the field of computer vision and comprising the following steps: 1) extracting human silhouettes from the CASIA-C infrared gait dataset with a lossless silhouette extraction scheme that combines the Faster R-CNN detection algorithm and the Deeplab v3+ semantic segmentation algorithm, and constructing a sample set; 2) improving the GaitSet network by introducing a silhouette difference fusion module, residual units, a multi-scale feature fusion technique and a multi-scale pyramid mapping module; 3) training with the samples obtained in step 1); 4) extracting pedestrian gait features from infrared video sequences to complete human identification under infrared conditions. The method fully captures and exploits gait information of different granularities and spatio-temporal scales in infrared images, requires no gait-cycle computation, and accepts infrared gait frame sequences of arbitrary order and length as input.
Description
Technical Field
The present invention relates to an infrared gait recognition method based on an improved GaitSet, and belongs to the technical field of computer vision.
Background Art
In recent years, with the growth of large-scale surveillance video data, gait recognition, a biometric that is contactless and hard to disguise, has attracted wide attention. Researchers have proposed many effective gait recognition methods, but most of them collect gait information and evaluate algorithm performance in pre-arranged, well-lit environments, and pay little attention to infrared gait images captured at night. In a daytime laboratory environment, illumination is sufficient and stable, so extracting the human silhouette of a subject is straightforward and noise has little effect. Under infrared conditions, however, the captured gait images have low sharpness and contrast, and the preprocessed human silhouettes often contain holes and incomplete limbs, which severely affects subsequent gait feature extraction. Moreover, the gait recognition models designed in many studies target well-lit environments; under infrared conditions their ability to extract and represent contour features of different granularities and deep spatio-temporal features drops sharply. How to eliminate the influence of extraneous redundant information in gait silhouettes and to improve the feature extraction capability and recognition performance of gait recognition models has therefore become the focus of subsequent research.
At present, research on gait recognition under infrared conditions is still limited, and existing recognition methods fall roughly into two categories: appearance-based and model-based. Appearance-based methods extract features such as human silhouettes or contour maps directly from gait video sequences. Although such methods can obtain gait information such as the gait cycle, they are highly sensitive to silhouette quality. For example, Kale et al. proposed a hidden Markov model method that achieves compact and effective gait recognition but requires frame-by-frame processing and is computationally expensive. Han et al. proposed the classic recognition method based on the Gait Energy Image (GEI), which makes full use of how the body's appearance changes over time. Sokolova and Castro et al. proposed extracting optical flow, which carries temporal information, from gait contour sequences to capture the changes between adjacent frames, but the extraction process is complex and extremely time-consuming. Yu et al. proposed a multi-view recognition method implemented with stacked autoencoders that converts GEI sequences of arbitrary view and condition into GEI sequences of a specified view under the normal condition. Chao et al. proposed GaitSet, which takes gait silhouettes without temporal ordering constraints as input and uses a CNN to extract the intrinsic relationships within the gait images. Model-based methods model the structure of human limbs or behavioural habits. Although they can effectively reduce the influence of factors such as noise and occlusion, most such models are insensitive to spatio-temporal information and information of different granularities, their applicable scenarios are very limited, and their recognition performance under infrared conditions is unsatisfactory. For example, Cunado et al. proposed a pendulum model that uses the Hough transform to extract the skeletal lines of the legs as gait features. Lu et al. proposed a link-model-based method that extracts human silhouettes with image thresholding and automatic contour adjustment and performs the final recognition with an SVM (Support Vector Machine) classifier. Zhu et al. proposed the LFN and LFN V2 methods; LFN V2 fine-tunes LFN, enlarges the region from which gait contour information is gathered, and effectively removes the redundant spatial information contained in gait contour features. Mei et al. used a long short-term memory network to focus on leg motion and on the temporal features within each gait cycle, effectively improving recognition accuracy when some features are missing.
Summary of the Invention
Low-quality infrared gait images have blurred details and low contrast, so the extracted gait silhouettes contain holes; in addition, existing gait recognition models based on convolutional neural networks are shallow and cannot fully capture and exploit the gait information of different granularities and spatio-temporal scales in infrared images. The object of the present invention is therefore to provide a gait recognition method that fully captures and exploits spatio-temporal gait information of different granularities and dimensions in infrared gait images, achieving efficient human identification under infrared conditions by designing a lossless human silhouette extraction scheme and constructing a dual-stream deep network model with residual multi-scale fusion.
To achieve the above object, an embodiment of the present invention provides an infrared gait recognition method based on an improved GaitSet, characterized by comprising the following steps:
1) splitting the video sequences in the original CASIA-C infrared gait dataset into image sequences according to the frame rate;
2) using the Faster R-CNN algorithm to accurately locate the human body region in each infrared gait image, returning the minimum bounding box of the body while labelling the category of each region, and using the Deeplab v3+ algorithm to extract the human silhouette and construct a sample set;
3) building an improved GaitSet gait recognition network, introducing a silhouette difference fusion module to highlight the dynamic change information between adjacent gait frames;
4) replacing convolutional layers with the residual unit Bottleneck to deepen the network and speed up model convergence;
5) proposing a multi-scale feature fusion technique to capture gait information of different granularities, making the extracted features more representative and discriminative;
6) using a multi-scale pyramid mapping module to effectively fuse horizontal and vertical gait information, further strengthening the feature extraction capability of the model;
7) training the network model with the samples obtained in step 2) and completing identification through a similarity measure.
A further technical solution is as follows: first, the pedestrian gait video streams in the original CASIA-C dataset are split into image sequences according to the frame rate; then the Faster R-CNN algorithm performs object detection on the image sequences and produces person snapshots; finally, the snapshot sequences are fed into the Deeplab v3+ semantic segmentation model to extract human silhouettes, which are normalized to 64×64. The normalized segmentation results are further processed by setting the pixel values of the human body region to 255 and those of the remaining (background) region to 0.
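The last step of this preprocessing, binarizing the segmentation result to {0, 255} and normalizing it to 64×64, can be sketched in a few lines of numpy. This is a minimal illustration only: the detector and segmenter are assumed to run elsewhere, the toy mask is invented, and nearest-neighbour resampling stands in for whatever resizing the actual pipeline uses.

```python
import numpy as np

def normalize_silhouette(mask, size=64):
    """Binarize a segmentation mask to {0, 255} (person -> 255, background -> 0)
    and resize it to size x size with nearest-neighbour sampling (numpy only)."""
    binary = np.where(mask > 0, 255, 0).astype(np.uint8)
    h, w = binary.shape
    rows = np.arange(size) * h // size   # nearest-neighbour source row indices
    cols = np.arange(size) * w // size   # nearest-neighbour source column indices
    return binary[rows][:, cols]

# Toy mask: a 100x80 frame with a "person" blob of positive label values.
mask = np.zeros((100, 80), dtype=np.int32)
mask[20:90, 25:55] = 1
sil = normalize_silhouette(mask)
print(sil.shape, np.unique(sil).tolist())  # (64, 64) [0, 255]
```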
A further technical solution is to design a gait silhouette difference fusion module: from the initial gait silhouettes of frames t and t+1, the silhouette difference fusion map of frame t is computed as a 0.5-weighted sum of the forward and reverse thresholded frame differences.
A further technical solution is to introduce the residual unit Bottleneck, so that skip connections are formed between convolutional layers during forward propagation. This achieves an identity mapping from input to output, feature dimensionality reduction, a deeper network and faster model convergence, without degrading model performance. The relationship between the input $x$ and the residual mapping $F(x)$ is:

$y = F(x, \{W_i\}) + x$,

so that the output $x_L$ of a deeper layer $L$ can be written from the output $x_l$ of a shallower layer $l$ as:

$x_L = x_l + \sum_{i=l}^{L-1} F(x_i, W_i)$,

which reduces to the identity $x_L = x_l$ when the residual mapping is 0. The backward gradient is computed as follows, where $\varepsilon$ is the loss function:

$\dfrac{\partial \varepsilon}{\partial x_l} = \dfrac{\partial \varepsilon}{\partial x_L}\left(1 + \dfrac{\partial}{\partial x_l}\sum_{i=l}^{L-1} F(x_i, W_i)\right)$.
A further technical solution is to add two 1×1 convolution branches to the network trunk, which consists of 4 Bottleneck units, 2 pooling layers, 2 SP layers and 2 3×3 convolutional layers. The branches sample features at different depths of the trunk, yielding 2 feature maps, which are fused with the trunk feature map by element-wise Add, finally producing an output feature map rich in granular feature information:

$F_{\text{out}} = F_1 + F_2 + F_3$,

where $F_1$, $F_2$ and $F_3$ are the output feature maps of the two branches and the trunk. SP (Set Pooling) is a set pooling operation that aggregates frame-level features into a single set-level feature containing global information, compressing the gait information:

$V = \{v_j \mid j = 1, 2, \dots, n\}$,

$z = \mathrm{SP}(V)$,

where $z$ denotes the set-level feature and $v_j$ the frame-level features.
A further technical solution is to propose a multi-scale pyramid mapping module that introduces both horizontal and vertical gait feature information into the final discrimination task. The numbers of feature blocks obtained by horizontal and vertical splitting are:

$n_s = 2^{s-1}$,

$n_t = 2^{t-1}$,

where $s$ and $t$ denote the splitting scales, with $s \in \{1, 2, \dots, S\}$ and $t \in \{2, 3, \dots, T\}$. After the vertical and horizontal splits, a global pooling operation is applied to the resulting feature sub-blocks $f_{s,i}$ and $f_{t,j}$:

$f' = \mathrm{GMP}(f) + \mathrm{GAP}(f)$.
A further technical solution is to match pedestrian targets: a snapshot of the person class is obtained from among the many category labels, and a cosine similarity measure is used to judge whether the previous target feature $f_1$ and the current snapshot target feature $f_2$ are consistent:

$\cos\theta = \dfrac{f_1 \cdot f_2}{\|f_1\| \, \|f_2\|}$.
The beneficial effects of the above technical solution are as follows: to avoid holes when extracting human silhouettes from low-quality infrared images, the present invention proposes a fine-grained semantic segmentation scheme that combines the Faster R-CNN object detection algorithm with the Deeplab v3+ semantic segmentation algorithm to achieve lossless human silhouette extraction. Furthermore, to deepen the network model, speed up model convergence, and improve the model's ability to extract and learn information of different granularities and spatio-temporal gait features, the residual unit Bottleneck, a silhouette difference fusion module, a multi-scale feature fusion technique and a multi-scale pyramid mapping module are introduced. Experimental results show that the overall performance of the present invention is superior to many state-of-the-art methods.
Brief Description of the Drawings
Other features, objects and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments made with reference to the accompanying drawings, in which:
FIG. 1 is a framework flow chart of an infrared gait recognition method based on an improved GaitSet provided by an embodiment of the present invention;
FIG. 2 is a flow chart of the lossless human silhouette extraction scheme provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of a lossless human silhouette provided by an embodiment of the present invention;
FIG. 4 is a structural diagram of the improved-GaitSet infrared gait recognition network model provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of silhouette difference fusion provided by an embodiment of the present invention;
FIG. 6 is a schematic diagram of the structure of the residual unit Bottleneck provided by an embodiment of the present invention;
FIG. 7 is a schematic diagram of the multi-scale feature fusion technique provided by an embodiment of the present invention;
FIG. 8 is a schematic diagram of the structure of the multi-scale pyramid mapping module provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Many specific details are set forth in the following description to facilitate a full understanding of the present invention, but the present invention can also be implemented in ways other than those described herein, and a person skilled in the art can make similar generalizations without departing from the spirit of the present invention. The present invention is therefore not limited to the specific embodiments disclosed below.
As shown in FIG. 1, the framework of an infrared gait recognition method based on an improved GaitSet provided by an embodiment of the present invention comprises:
1) extracting human silhouettes from the CASIA-C infrared gait dataset with a lossless silhouette extraction scheme that combines the Faster R-CNN detection algorithm and the Deeplab v3+ semantic segmentation algorithm, and constructing a sample set;
2) improving the GaitSet network by introducing a silhouette difference fusion module, residual units, a multi-scale feature fusion technique and a multi-scale pyramid mapping module;
3) training with the samples obtained in step 1);
4) extracting pedestrian gait features from infrared video sequences to complete human identification under infrared conditions.
The lossless silhouette extraction scheme combining the Faster R-CNN detection algorithm with the Deeplab v3+ semantic segmentation algorithm achieves lossless human silhouette extraction. As shown in FIG. 2, the procedure comprises:
1) splitting the CASIA-C infrared gait dataset into gait image sequences according to the frame rate;
2) using the object detection algorithm Faster R-CNN to detect moving targets in the infrared images, accurately locate the human body region in each gait image, return the minimum bounding box of the body while labelling its category, and form a target snapshot;
3) after a target snapshot is obtained, using a cosine similarity measure to judge whether the previous target feature $f_1$ and the current snapshot target feature $f_2$ are consistent; if consistent, the snapshot is saved into the similar-target set, otherwise it is discarded:

$\cos\theta = \dfrac{f_1 \cdot f_2}{\|f_1\| \, \|f_2\|}$;

4) after snapshot matching is completed, using the fine-grained semantic segmentation algorithm Deeplab v3+ to segment the human silhouette, setting the pixel values of the human body region in the segmented image to 255 and those of the remaining (background) region to 0; the resulting silhouette is shown in FIG. 3.
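The cosine-similarity matching in step 3) can be sketched as follows. The feature vectors and the decision threshold of 0.9 are illustrative assumptions; the patent does not state a threshold value.

```python
import numpy as np

def cosine_similarity(f1, f2):
    """Cosine of the angle between two feature vectors."""
    return float(np.dot(f1, f2) / (np.linalg.norm(f1) * np.linalg.norm(f2)))

def same_target(prev_feat, cur_feat, threshold=0.9):
    """Keep the current snapshot only if it matches the previous target
    (threshold is an assumed value for illustration)."""
    return cosine_similarity(prev_feat, cur_feat) >= threshold

prev = np.array([0.9, 0.1, 0.4])
cur_close = prev * 2.0                   # same direction -> similarity ~1.0
cur_far = np.array([-0.9, 0.2, 0.1])     # nearly opposite direction
print(same_target(prev, cur_close), same_target(prev, cur_far))  # True False
```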
As shown in FIG. 4, the improved-GaitSet infrared gait recognition network model takes an ordered 64×64 silhouette sequence as input, which is then processed by two branch networks. The upper branch is the silhouette difference fusion stream, consisting of a silhouette difference fusion module, a feature extraction and multi-scale feature fusion module, a multi-scale pyramid mapping module and a recognition/classification module. The lower branch is the silhouette stream; apart from omitting the silhouette difference fusion module, its structure is identical to that of the upper branch. The present invention uses the silhouette difference fusion stream and the silhouette stream to capture, respectively, the dynamic change information and the spatial information between adjacent gait feature maps; introduces residual units to deepen the network and speed up model convergence; uses the multi-scale fusion technique to obtain deep and shallow features of different granularities, improving the representativeness and expressiveness of the gait information; and uses the multi-scale pyramid mapping module to further strengthen the model's ability to represent local and global feature information. In the network framework, SP (Set Pooling) is a set pooling operation that aggregates frame-level features into a single set-level feature containing global information, compressing the gait information:

$V = \{v_j \mid j = 1, 2, \dots, n\}$,

$z = \mathrm{SP}(V)$,

where $z$ denotes the set-level feature and $v_j$ the frame-level features.
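A minimal numpy sketch of Set Pooling, assuming (as in the original GaitSet design) that the aggregation is an element-wise maximum over the frame axis; the patent does not spell out which statistic its SP layers use, so the max is an assumption here. The key property the sketch demonstrates is that the set-level feature is invariant to frame order and accepts any number of frames.

```python
import numpy as np

def set_pooling(frame_feats):
    """Aggregate frame-level features (n_frames, C, H, W) into one
    set-level feature (C, H, W) by element-wise max over the frame axis,
    so the output does not depend on frame order or frame count."""
    return frame_feats.max(axis=0)

feats = np.random.rand(30, 8, 16, 11)         # 30 frames of 8x16x11 feature maps
shuffled = feats[np.random.permutation(30)]   # any frame order gives the same result
z = set_pooling(feats)
print(z.shape, np.allclose(z, set_pooling(shuffled)))  # (8, 16, 11) True
```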
As shown in FIG. 5, the silhouette difference fusion module compares the grey value of each pixel with a threshold, highlighting the changed parts of the image frames and ignoring the similar parts. The module captures the texture and boundary differences between adjacent gait silhouettes; this information focuses on contour-edge changes of the moving body and effectively solves the problem that subtle body movements and clothing changes are hard to capture. The process is defined as:

$D_f(x, y) = \begin{cases} 255, & S_{t+1}(x, y) - S_t(x, y) > T \\ 0, & \text{otherwise} \end{cases}$,

$D_r(x, y) = \begin{cases} 255, & S_t(x, y) - S_{t+1}(x, y) > T \\ 0, & \text{otherwise} \end{cases}$,

$D_t = 0.5\,D_f + 0.5\,D_r$,

where $t \in \{1, 2, \dots, N-1\}$ and $N$ is the number of silhouette frames; $S_t$ and $S_{t+1}$ are the silhouettes of frames $t$ and $t+1$; $(x, y)$ are the pixel coordinates in the silhouette image; $T$ is the threshold; $D_r$ is the reverse-output difference map; $D_f$ is the forward-output difference map; and $D_t$ is the silhouette difference fusion map obtained by adding the two with weights of 0.5.
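A numpy sketch of the silhouette difference fusion step described above, under stated assumptions: the threshold value of 40 and the toy silhouettes are illustrative, and the forward/reverse maps are taken as one-sided thresholded differences, which matches the prose but is a reconstruction rather than the patent's exact definition.

```python
import numpy as np

def silhouette_difference_fusion(s_t, s_t1, threshold=40):
    """Fuse forward and reverse frame differences with equal 0.5 weights.
    Pixels whose grey-level change exceeds `threshold` are kept (255),
    similar pixels are suppressed (0). Threshold value is an assumption."""
    forward = np.where(s_t1.astype(int) - s_t.astype(int) > threshold, 255, 0)
    reverse = np.where(s_t.astype(int) - s_t1.astype(int) > threshold, 255, 0)
    return 0.5 * forward + 0.5 * reverse

a = np.zeros((64, 64), dtype=np.uint8)
b = np.zeros((64, 64), dtype=np.uint8)
a[10:20, 10:20] = 255          # region present only in frame t
b[30:40, 30:40] = 255          # region present only in frame t+1
fused = silhouette_difference_fusion(a, b)
print(fused.max(), int((fused > 0).sum()))  # 127.5 200
```

Only the 200 changed pixels survive; everything the two frames share is zeroed out, which is exactly the "highlight change, ignore similarity" behaviour the module is designed for.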
As shown in FIG. 6, the present invention replaces convolutional layers with the residual unit Bottleneck from ResNet-50. Adding Bottleneck forms skip connections between convolutional layers during forward propagation, achieving an identity mapping from input to output, feature dimensionality reduction, a deeper network and faster model convergence without degrading model performance, and greatly optimizing feature extraction. The Bottleneck structure first reduces the dimensionality of the feature map with a 1×1 convolution, then applies a 3×3 convolution that keeps the channel dimension unchanged, and finally restores the feature dimension with another 1×1 convolution. Throughout this process, three Batch Normalization (BN) operations seek the balance between linearity and nonlinearity so that the output falls in the nonlinear region, accelerating network convergence. Let $x$ and $F(x)$ denote the feature input and the residual mapping respectively; their relationship is:

$y = F(x, \{W_i\}) + x$,

so that the output $x_L$ of a deeper layer $L$ can be written from the output $x_l$ of a shallower layer $l$ as:

$x_L = x_l + \sum_{i=l}^{L-1} F(x_i, W_i)$,

which reduces to the identity $x_L = x_l$ when the residual mapping is 0. The backward gradient is computed as follows, where $\varepsilon$ is the loss function:

$\dfrac{\partial \varepsilon}{\partial x_l} = \dfrac{\partial \varepsilon}{\partial x_L}\left(1 + \dfrac{\partial}{\partial x_l}\sum_{i=l}^{L-1} F(x_i, W_i)\right)$.
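The skip-connection identities above can be demonstrated numerically. This is a toy sketch, not the Bottleneck itself: the residual mappings are stand-in functions (zero and constants) chosen so that the identity behaviour and the additive composition $x_L = x_l + \sum F$ are easy to verify.

```python
import numpy as np

def residual_unit(x, residual_fn):
    """y = F(x) + x: the skip connection adds the input to the residual mapping."""
    return residual_fn(x) + x

x = np.array([1.0, 2.0, 3.0])

# When the residual mapping F(x) is zero, the unit is an identity mapping,
# so stacking such units cannot degrade the representation.
identity_out = residual_unit(x, lambda v: np.zeros_like(v))
print(np.array_equal(identity_out, x))  # True

# Stacking units with constant residuals shows x_L = x_l + sum of F terms.
consts = [np.full(3, 0.5), np.full(3, 1.0), np.full(3, 2.0)]
out = x
for c in consts:
    out = residual_unit(out, lambda v, c=c: c)
print(np.array_equal(out, x + 3.5))  # True
```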
As shown in FIG. 7, in order to obtain more accurate and effective features, recognize as many different pedestrian identities as possible under infrared conditions, and improve recognition accuracy, the present invention proposes a multi-scale feature fusion technique that samples feature maps at different scales, preserving rich semantic information while reducing the loss of detail features. The trunk consists of 4 Bottleneck units, 2 pooling layers, 2 SP layers and 2 3×3 convolutional layers; the 1×1 convolutions of the upper and lower branches sample features at different depths of the trunk, yielding 2 feature maps, which are fused with the trunk feature map by element-wise Add, finally producing an output feature map rich in granular feature information. The process is defined as:

$F_{\text{out}} = F_1 + F_2 + F_3$,

where $F_1$, $F_2$ and $F_3$ are the output features of the three pathways; $F_{\text{out}}$ is the fused output feature; $B_1(\cdot)$ and $B_2(\cdot)$ are the output features after processing by 1 and 2 Bottleneck(2)&Pooling&SP modules respectively; and $C_1$ and $C_3$ are convolutional layers with kernel sizes of 1×1 and 3×3 respectively.
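The Add fusion at the end of this module can be sketched as follows. The shapes are invented for illustration; the point of the sketch is the design choice that element-wise Add (unlike channel concatenation) requires all pathways to produce feature maps of identical shape, which is why the 1×1 branch convolutions are assumed to have already matched the channel dimensions.

```python
import numpy as np

def add_fuse(main, *branches):
    """Element-wise Add fusion of a trunk feature map with branch feature
    maps of the same (C, H, W) shape."""
    out = main.copy()
    for b in branches:
        out = out + b
    return out

main = np.random.rand(128, 16, 11)   # deep trunk feature map (C, H, W)
b1 = np.random.rand(128, 16, 11)     # shallow branch sampled earlier in the trunk
b2 = np.random.rand(128, 16, 11)     # mid-depth branch
fused = add_fuse(main, b1, b2)
print(fused.shape, np.allclose(fused, main + b1 + b2))  # (128, 16, 11) True
```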
As shown in Figure 8, the multi-scale pyramid mapping module performs multi-scale horizontal and vertical segmentation on the feature map $F \in \mathbb{R}^{H \times W \times C}$, where $H$, $W$, and $C$ are the height, width, and number of channels of the feature map, respectively. Let $P_h$ denote horizontal segmentation and $s$ the segmentation scale, $s \in \{0, 1, \dots, S\}$; horizontal segmentation then yields $2^{s}$ feature sub-blocks $F_h^{s,i}$. The size of each sub-block depends on $s$ and is constrained by the following relation, where $i$ is the index of a sub-block after horizontal segmentation, $i \in \{1, 2, \dots, 2^{s}\}$:
$F_h^{s,i} \in \mathbb{R}^{(H/2^{s}) \times W \times C}$
Similarly, let $P_v$ denote vertical segmentation and $t$ the segmentation scale; vertical segmentation then yields $2^{t}$ feature sub-blocks $F_v^{t,j}$. To prevent the segmented sub-blocks from containing duplicated feature information, the vertical segmentation scale starts from 1, i.e. $t \in \{1, 2, \dots, T\}$, defined as follows:
$F_v^{t,j} \in \mathbb{R}^{H \times (W/2^{t}) \times C}, \quad j \in \{1, 2, \dots, 2^{t}\}$
After horizontal and vertical segmentation, a global pooling operation is applied to the resulting feature sub-blocks $F_h^{s,i}$ and $F_v^{t,j}$ to capture their global information; the calculation is as follows:
$f_h^{s,i} = \mathrm{GMP}(F_h^{s,i}) + \mathrm{GAP}(F_h^{s,i}), \quad f_v^{t,j} = \mathrm{GMP}(F_v^{t,j}) + \mathrm{GAP}(F_v^{t,j})$
where $\mathrm{GMP}(\cdot)$ and $\mathrm{GAP}(\cdot)$ denote the global max pooling and global average pooling operations, respectively.
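The pyramid mapping described above can be sketched in NumPy. This is an illustrative sketch under assumptions: the scale ranges (horizontal $s \in \{0,\dots,3\}$, vertical $t \in \{1,\dots,3\}$) and the feature-map size are invented for demonstration; only the splitting scheme and the GMP + GAP reduction follow the text.

```python
import numpy as np

rng = np.random.default_rng(2)
H, W, C = 16, 16, 32
F = rng.standard_normal((H, W, C))

def strip_features(F, scale, axis):
    # Split F into 2**scale equal strips along `axis` (0 = horizontal
    # strips over height, 1 = vertical strips over width), then collapse
    # each strip to a C-dim vector with GMP + GAP.
    feats = []
    for strip in np.split(F, 2 ** scale, axis=axis):
        gmp = strip.max(axis=(0, 1))    # global max pooling
        gap = strip.mean(axis=(0, 1))   # global average pooling
        feats.append(gmp + gap)
    return feats

features = []
for s in range(0, 4):                            # horizontal: 1, 2, 4, 8 strips
    features += strip_features(F, s, axis=0)
for t in range(1, 4):                            # vertical starts at t = 1 to avoid
    features += strip_features(F, t, axis=1)     # duplicating the s = 0 whole map

# 1 + 2 + 4 + 8 horizontal strips plus 2 + 4 + 8 vertical strips
assert len(features) == 15 + 14
assert all(f.shape == (C,) for f in features)
```

Starting the vertical scales at 1 is what the duplicate-avoidance constraint buys: the $t = 0$ vertical split would be the whole feature map, already produced by the $s = 0$ horizontal split.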
To verify the effectiveness of the above embodiments, we compare the present invention with other state-of-the-art methods on average recognition accuracy under four walking conditions, using the CASIA-C dataset. CASIA-C contains 153 pedestrians (130 men and 23 women); each pedestrian has four walking scenarios (fast walking, slow walking, normal walking, and walking with a backpack), for a total of 1530 infrared gait videos. Four groups of experiments, Test A, Test B, Test C, and Test D, were set up: Test A evaluates algorithm performance under normal walking, Test B and Test C test the influence of walking speed on recognition, and Test D tests the influence of carrying a backpack. The training set is formed by randomly selecting 3 of the 4 normal walking sequences of each subject, for a total of 36,720 training images. The test set consists of the remaining normal walking sequence, 2 fast walking sequences, 2 slow walking sequences, and 2 backpack walking sequences, for a total of 9,180 test images. The results are compared in Table 1.
Table 1. Comparison of recognition accuracy of different methods under four walking conditions (unit: %)
As can be seen from Table 1, in Test A the recognition rate of the present invention is 97.84%, which is 0.75%, 0.39%, 0.21%, and 0.05% higher than the LFN, LFN V2, CNN, and LSTM methods, respectively. In Test B the recognition rate is 96.89%, an improvement of 2.76%, 1.68%, 0.89%, and 0.69% over the other four methods. In Test C the recognition rate is 97.04%, an improvement of 3.01%, 2.19%, 1.13%, and 0.66%. In Test D the recognition rate is 93.32%, which is 4.07% and 0.63% higher than the LFN and LFN V2 methods but 0.74% and 0.86% lower than the CNN and LSTM methods. On the final average recognition rate, the present invention improves on the other four methods by 2.64%, 1.22%, 0.37%, and 0.13%, respectively, indicating that it has the best overall performance.
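The average recognition rate cited above can be reproduced from the four per-test figures, assuming (as is standard) an unweighted mean over the four tests; the patent does not spell out the averaging, so treat this as a sanity check rather than its exact computation.

```python
# Per-test recognition rates of the present invention, in percent (from Table 1).
acc = {"Test A": 97.84, "Test B": 96.89, "Test C": 97.04, "Test D": 93.32}

# Unweighted mean over the four walking conditions.
mean_acc = sum(acc.values()) / len(acc)
assert abs(mean_acc - 96.2725) < 1e-6
```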
The specific embodiments of the present invention have been described in detail above. It should be understood that the present invention is not limited to these specific embodiments; those skilled in the art may make various modifications or variations within the scope of the claims without affecting the essence of the present invention.
Claims (7)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310314282.1A CN116012955B (en) | 2023-03-28 | 2023-03-28 | Infrared gait recognition method for improving GaitSet |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116012955A true CN116012955A (en) | 2023-04-25 |
CN116012955B CN116012955B (en) | 2023-05-30 |
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant