
CN117994822B - Cross-modal pedestrian re-identification method based on auxiliary modality enhancement and multi-scale feature fusion - Google Patents

Cross-modal pedestrian re-identification method based on auxiliary modality enhancement and multi-scale feature fusion

Info

Publication number
CN117994822B
CN117994822B (application CN202410406480.5A)
Authority
CN
China
Prior art keywords
features
mode
resnet
residual block
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410406480.5A
Other languages
Chinese (zh)
Other versions
CN117994822A (en)
Inventor
张国庆
汪海蕊
郑钰辉
张家伟
董仕豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202410406480.5A
Publication of CN117994822A
Application granted
Publication of CN117994822B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract


The present invention discloses a cross-modal pedestrian re-identification method based on auxiliary modality enhancement and multi-scale feature fusion, comprising the following steps: (1) obtaining the original image, dividing it into a training set, a validation set and a test set; preprocessing the visible light image and the infrared image in the training set; (2) using ResNet50 as the backbone network and adding an auxiliary modality enhancement module; (3) continuing to input the features output from step (2) into ResNet50 for feature extraction and fusion; (4) performing global average pooling and batch normalization on the final output features of ResNet50, and calculating the local semantic consistency loss; the present invention reduces the modal difference between visible light and infrared, learns more identity information shared by the modalities, and can also capture identity information of different receptive fields, thereby fully mining the identity features of pedestrians.

Description

Cross-modal pedestrian re-identification method based on auxiliary modality enhancement and multi-scale feature fusion
Technical Field
The invention relates to the technical field of computer-vision image retrieval, and in particular to a cross-modal pedestrian re-identification method based on auxiliary modality enhancement and multi-scale feature fusion.
Background
Pedestrian re-identification aims to retrieve pedestrians of a specific identity from a large candidate set whose images come from different cameras and therefore differ in shooting angle, background, illumination, and so on. In recent years, research on visible-light pedestrian re-identification has flourished and its performance has steadily improved. However, such methods have limited applicability because a visible-light camera cannot capture clear images at night. To achieve all-weather monitoring, modern surveillance systems often incorporate infrared cameras to obtain pedestrian images in dark environments. Because of the significant modality discrepancy between infrared and visible-light images, conventional visible-light pedestrian re-identification methods cannot effectively match the two modalities. Visible-infrared cross-modal pedestrian re-identification technology has therefore emerged to address this challenge.
The central challenge of visible-infrared cross-modal pedestrian re-identification is the large cross-modal discrepancy between the two kinds of images. Existing approaches typically reduce modality differences at two levels: the image level and the feature level. Image-level methods usually employ a generative adversarial network to convert the original visible and infrared images into images of the same or a similar style, reducing the style gap between them. However, the generated images are often of poor quality and prone to additional noise, which does not benefit model performance. Feature-level approaches aim to map the features of the two modalities into a common space to obtain modality-shared features. However, this sacrifices some modality-specific discriminative information, which is detrimental to model performance.
Disclosure of Invention
The invention aims to: provide a cross-modal pedestrian re-identification method based on auxiliary modality enhancement and multi-scale feature fusion that solves the problems of the prior art, effectively compensates for the modality discrepancy, and fully mines pedestrian identity information.
The technical scheme is as follows: the invention discloses a cross-modal pedestrian re-identification method based on auxiliary modality enhancement and multi-scale feature fusion, comprising the following steps:
(1) Acquire the original images and divide them into a training set, a validation set and a test set; preprocess the visible-light and infrared images in the training set;
(2) Use ResNet50 as the backbone network and add an auxiliary modality enhancement module;
(3) Feed the features output in step (2) into the remaining ResNet50 stages for feature extraction and fusion, and compute the cross-modal instance aggregation loss; a multi-scale feature fusion module is added after the third and fourth residual blocks of ResNet50;
(4) Apply global average pooling and batch normalization to the final ResNet50 output features and compute the local semantic consistency loss.
Further, step (1) is specifically: acquire pedestrian images and identity labels from the existing SYSU-MM01 and RegDB datasets and divide them into a training set, a validation set and a test set; apply horizontal flipping and random erasing as preprocessing to the training images and crop them to 288 × 144 pixels; all images are then normalized using the channel mean and standard deviation.
Further, step (2) specifically comprises: first, apply random channel combination to the visible-light images in the training set to obtain auxiliary-modality images; feed the images of the three modalities into the ResNet50 network; then enhance the image representation of the auxiliary modality with an attention-weighted fusion strategy.
Further, step (3) comprises the following steps:
(31) Feed the features output in step (2) into a shallow network formed by the first and second residual blocks of ResNet50 to continue feature extraction;
(32) Apply global average pooling and batch normalization to the features output by the shallow network, then compute the cross-modal instance aggregation loss;
(33) Feed the features of the three modalities output by the second ResNet50 residual block into a modality-shared branch composed of the third and fourth residual blocks of ResNet50; a multi-scale feature fusion module is added after each of the third and fourth residual blocks.
Further, step (32) is specifically: let the shallow feature map output by the second residual block be $F \in \mathbb{R}^{h \times w \times c}$. After global average pooling and batch normalization, the mean of the paired cross-modal sample feature differences is computed as follows:

$$\mathcal{L}_{cia} = \frac{1}{N}\sum_{i=1}^{N}\left\| f_i^{m} - f_i^{n} \right\|_2$$

where $N$ is the number of paired samples in a training batch, $f_i^{m}$ and $f_i^{n}$ denote the features of the $i$-th sample in modality $m$ and modality $n$, respectively, and $\mathcal{L}_{cia}$ is the mean of the difference between the two features.
Further, in step (33), the multi-scale feature fusion module comprises two branches with identical structure: the low-level features $F_l \in \mathbb{R}^{h \times w \times c}$ output by the previous residual block and the high-level features $F_h$ output by the current residual block serve as inputs, where $h$, $w$, $c$ denote the height, width, and number of channels of the features, respectively.

Each branch uses dilated convolutions to obtain the multi-scale low-level features $U$ and high-level features $V$, which are then fused by weighting, and the fused feature $F_{fused}$ is fed into the next stage, as follows:

$$F_{fused} = \alpha \cdot U + (1 - \alpha) \cdot V$$

where $\alpha$ is a learnable parameter controlling the fusion ratio of the low-level and high-level features.
Further, in step (4), the local semantic consistency loss is formulated as follows: [formula not reproduced in the source; it involves two hyperparameters $\lambda_1$ and $\lambda_2$].
The invention also relates to a cross-modal pedestrian re-identification system based on auxiliary modality enhancement and multi-scale feature fusion, comprising:
A preprocessing module: used to acquire the original images and divide them into a training set, a validation set and a test set, and to preprocess the visible-light and infrared images in the training set;
An auxiliary modality enhancement module: used to enhance the auxiliary modality representation, with ResNet50 as the backbone network;
A multi-scale feature fusion module: used to feed the features output by the auxiliary modality enhancement module into ResNet50 for feature extraction and fusion and to compute the cross-modal instance aggregation loss; a multi-scale feature fusion module is added after the third and fourth residual blocks of ResNet50;
A local semantic consistency module: used to apply global average pooling and batch normalization to the final ResNet50 output features and compute the local semantic consistency loss.
The device of the invention comprises a memory, a processor, and a program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements any one of the above cross-modal pedestrian re-identification methods based on auxiliary modality enhancement and multi-scale feature fusion.
The storage medium of the invention stores a computer program designed to implement, when run, any one of the above cross-modal pedestrian re-identification methods based on auxiliary modality enhancement and multi-scale feature fusion.
The beneficial effects are as follows: compared with the prior art, the invention has notable advantages. By adding the auxiliary modality enhancement mechanism and the multi-scale feature fusion module to ResNet50, the modality discrepancy between visible light and infrared can be effectively reduced, more modality-shared identity information can be learned, and identity information from different receptive fields can be captured, fully mining pedestrian identity features. The cross-modal instance aggregation loss and the local semantic consistency loss jointly constrain shallow and deep features, further enhancing the discriminability and robustness of the features.
Drawings
FIG. 1 is an overall flow chart of the present invention;
FIG. 2 is a network structure diagram of the cross-modal pedestrian re-identification framework based on auxiliary modality enhancement and multi-scale feature fusion of the present invention;
FIG. 3 is a network structure diagram of the auxiliary modality enhancement module in the cross-modal pedestrian re-identification framework of the present invention;
FIG. 4 is a network structure diagram of the multi-scale feature fusion module in the cross-modal pedestrian re-identification framework of the present invention;
FIG. 5 is a schematic diagram of the loss functions under the dual feature-space constraint in the cross-modal pedestrian re-identification framework of the present invention;
FIG. 6 is a training flow chart of the neural network model of the present invention.
Detailed Description
The technical scheme of the invention is further described below with reference to the accompanying drawings.
As shown in FIGS. 1-6, an embodiment of the present invention provides a cross-modal pedestrian re-identification method based on auxiliary modality enhancement and multi-scale feature fusion, comprising the following steps:
(1) Acquire the original images and divide them into a training set, a validation set and a test set; preprocess the visible-light and infrared images in the training set. Specifically: acquire pedestrian images and identity labels from the existing SYSU-MM01 and RegDB datasets and divide them into a training set, a validation set and a test set; apply horizontal flipping and random erasing as preprocessing to the training images and crop them to 288 × 144 pixels; all images are then normalized using the channel mean and standard deviation.
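For concreteness, a minimal sketch of such a preprocessing pipeline in PyTorch/torchvision follows. The ImageNet normalization statistics are an assumption, since the source only states that channel means and standard deviations are used:

```python
import torchvision.transforms as T

# Training-set preprocessing: resize/crop to 288 x 144, flip, normalize, erase.
# The ImageNet mean/std values below are assumed; the patent only says the
# images are "normalized using the channel mean and standard deviation".
train_transform = T.Compose([
    T.Resize((288, 144)),           # bring every image to 288 x 144 pixels
    T.RandomHorizontalFlip(p=0.5),  # horizontal flipping
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
    T.RandomErasing(p=0.5),         # random erasing operates on tensors
])
```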
(2) Use ResNet50 as the backbone network and add an auxiliary modality enhancement module. Specifically: first, apply random channel combination to the visible-light images in the training set to obtain auxiliary-modality images; feed the images of the three modalities into the ResNet50 network; then enhance the image representation of the auxiliary modality with an attention-weighted fusion strategy. The attention-weighted fusion strategy proceeds as follows: compute the similarity between the auxiliary-modality image and the visible-light image, and between the auxiliary-modality image and the infrared image, then use the two similarities to enhance the auxiliary-modality representation. First, 1×1 convolutions map the features of the three modalities into compact features:

$$Q = W_q F_a, \qquad K_v = W_k F_v, \qquad K_r = W_k F_r$$

where $W_q$ and $W_k$ denote the parameters of the 1×1 convolutions, and $F_a$, $F_v$, $F_r$ are the auxiliary, visible-light, and infrared features, respectively.

The attention maps between the auxiliary features and the visible-light and infrared features are then computed with Softmax:

$$A_v = \mathrm{Softmax}\!\left(\frac{Q K_v^{\top}}{\sqrt{d}}\right), \qquad A_r = \mathrm{Softmax}\!\left(\frac{Q K_r^{\top}}{\sqrt{d}}\right)$$

where $d$ denotes the channel dimension of $Q$. The two attention maps are fused by weighting to obtain the enhanced attention map:

$$A = \beta_1 A_v + \beta_2 A_r$$

where $\beta_1$ and $\beta_2$ are learnable parameters representing the fusion weights of the two attention maps.

Finally, the auxiliary-modality feature map is enhanced using the fused attention map and a residual connection:

$$\hat{F}_a = W\!\left(A F_a\right) + F_a$$

where $W$ is the learnable parameter of a fully connected layer comprising a 1×1 convolution and batch normalization, and $\hat{F}_a$ denotes the enhanced auxiliary-modality feature map.
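The following PyTorch sketch illustrates one way this step could be realized. It is a minimal reading of the equations above rather than the patented implementation: the compact channel dimension, the initialization of the fusion weights, the use of the auxiliary features themselves as the attention values, and the `random_channel_aux` form of "random channel combination" are all assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def random_channel_aux(rgb: torch.Tensor) -> torch.Tensor:
    """Assumed form of 'random channel combination': build a 3-channel
    auxiliary image by randomly re-sampling the RGB channels."""
    idx = torch.randint(0, 3, (3,))
    return rgb[:, idx, :, :]

class AuxiliaryModalityEnhancement(nn.Module):
    """Sketch of the attention-weighted fusion described above.

    Assumed symbol mapping: q/k are the 1x1 projections (W_q, W_k),
    beta1/beta2 the learnable fusion weights, and the 1x1 conv + BN in
    self.w plays the role of W in the residual enhancement.
    """
    def __init__(self, channels: int, compact: int = 64):
        super().__init__()
        self.q = nn.Conv2d(channels, compact, 1)  # query from auxiliary features
        self.k = nn.Conv2d(channels, compact, 1)  # keys from visible / infrared
        self.beta1 = nn.Parameter(torch.tensor(0.5))
        self.beta2 = nn.Parameter(torch.tensor(0.5))
        self.w = nn.Sequential(nn.Conv2d(channels, channels, 1),
                               nn.BatchNorm2d(channels))

    def forward(self, f_aux, f_vis, f_ir):
        b, c, h, w = f_aux.shape
        d = self.q.out_channels
        q = self.q(f_aux).flatten(2).transpose(1, 2)      # (b, hw, d)
        kv = self.k(f_vis).flatten(2)                     # (b, d, hw)
        ki = self.k(f_ir).flatten(2)
        a_v = F.softmax(q @ kv / d ** 0.5, dim=-1)        # attention to visible
        a_r = F.softmax(q @ ki / d ** 0.5, dim=-1)        # attention to infrared
        attn = self.beta1 * a_v + self.beta2 * a_r        # fused attention map
        val = f_aux.flatten(2).transpose(1, 2)            # (b, hw, c)
        out = (attn @ val).transpose(1, 2).reshape(b, c, h, w)
        return self.w(out) + f_aux                        # residual enhancement
```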
(3) Feed the features output in step (2) into the remaining ResNet50 stages for feature extraction and fusion, and compute the cross-modal instance aggregation loss; a multi-scale feature fusion module is added after the third and fourth residual blocks of ResNet50. Specifically:
(31) Feed the features output in step (2) into a shallow network formed by the first and second residual blocks of ResNet50 to continue feature extraction;
(32) Apply global average pooling and batch normalization to the features output by the shallow network, then compute the cross-modal instance aggregation loss. Specifically: let the shallow feature map output by the second residual block be $F \in \mathbb{R}^{h \times w \times c}$; after global average pooling and batch normalization, the mean of the paired cross-modal sample feature differences is computed as follows:

$$\mathcal{L}_{cia} = \frac{1}{N}\sum_{i=1}^{N}\left\| f_i^{m} - f_i^{n} \right\|_2$$

where $N$ is the number of paired samples in a training batch, $f_i^{m}$ and $f_i^{n}$ denote the features of the $i$-th sample in modality $m$ and modality $n$, respectively, and $\mathcal{L}_{cia}$ is the mean of the difference between the two features.
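A direct sketch of this loss in PyTorch, under the assumption that the "difference" is the Euclidean distance between paired features (the source does not name the norm):

```python
import torch

def cross_modal_instance_aggregation_loss(f_m: torch.Tensor,
                                          f_n: torch.Tensor) -> torch.Tensor:
    """Mean paired feature difference across modalities.

    f_m, f_n: (N, d) pooled-and-normalized features, where row i of f_m
    and row i of f_n come from the same pedestrian in modalities m and n.
    Using the L2 norm here is an assumption.
    """
    return (f_m - f_n).norm(p=2, dim=1).mean()
```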
(33) Feed the features of the three modalities output by the second ResNet50 residual block into a modality-shared branch composed of the third and fourth residual blocks of ResNet50; a multi-scale feature fusion module is added after each of the third and fourth residual blocks. The multi-scale feature fusion module comprises two branches with identical structure: the low-level features $F_l \in \mathbb{R}^{h \times w \times c}$ output by the previous residual block and the high-level features $F_h$ output by the current residual block serve as inputs, where $h$, $w$, $c$ denote the height, width, and number of channels of the features, respectively.

Each branch uses dilated convolutions to obtain the multi-scale low-level features $U$ and high-level features $V$, which are then fused by weighting, and the fused feature $F_{fused}$ is fed into the next stage, as follows:

$$F_{fused} = \alpha \cdot U + (1 - \alpha) \cdot V$$

where $\alpha$ is a learnable parameter controlling the fusion ratio of the low-level and high-level features.
The multi-scale high-level features $V$ are computed as follows. Two fully connected layers are designed to obtain features with different receptive fields using 3 × 3 convolutions with different dilation rates; dilated convolutions with dilation rate 1 and dilation rate 2 are used, respectively:

$$X_1 = \mathcal{F}_1(F_h), \qquad X_2 = \mathcal{F}_2(F_h)$$

where $X_1$ and $X_2$ represent features with different receptive fields, and $\mathcal{F}_1$ and $\mathcal{F}_2$ denote the fully connected layers, each consisting of a dilated convolution, a batch normalization layer, and a ReLU activation function.

The features of different scales obtained by the two branches are then fused by element-wise addition, and global feature information is obtained by global average pooling. A fully connected layer compresses the channel dimension of the feature from $c$ to $c/r$ to obtain a more compact feature $Z$; to balance performance and complexity, the reduction rate $r$ is set to 16. The process is expressed as follows:

$$Z = \mathcal{F}_{fc}\big(\mathrm{GAP}(X_1 + X_2)\big)$$

To enable adaptive selection of features at different scales, the feature maps of the different receptive-field branches are given different attention weights derived from the compact feature $Z$:

$$a = \mathrm{Softmax}(W_a Z), \qquad b = \mathrm{Softmax}(W_b Z)$$

where $W_a$ and $W_b$ restore the channel dimension from $c/r$ to $c$, and $a$ and $b$ represent the attention weights of $X_1$ and $X_2$. Finally, the branch features are weighted and fused with the attention weights to obtain the deep multi-scale features:

$$V = a \cdot X_1 + b \cdot X_2$$

Similarly, after the resolution of the low-level features $F_l$ is reduced by a 1 × 1 convolution, the multi-scale low-level features $U$ are obtained by the same steps.
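A compact PyTorch sketch of the module follows. It mirrors the description (two dilated 3 × 3 branches, squeeze ratio r = 16, softmax attention, learnable fusion ratio α), while details such as weight sharing between the low- and high-level paths and the channel/resolution matching of the two inputs are simplifying assumptions:

```python
import torch
import torch.nn as nn

class MultiScaleFusion(nn.Module):
    """Sketch of the multi-scale feature fusion module.

    Assumptions: both inputs have already been projected to the same
    shape (e.g. by a 1x1 convolution), and one selective branch is
    reused for both paths for brevity, whereas the patent describes
    two structurally identical but separate branches.
    """
    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        def fc_branch(dilation):
            return nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=dilation,
                          dilation=dilation, bias=False),
                nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        self.f1 = fc_branch(1)              # receptive field via dilation 1
        self.f2 = fc_branch(2)              # receptive field via dilation 2
        self.squeeze = nn.Linear(channels, channels // r)   # c -> c/r
        self.expand = nn.Linear(channels // r, 2 * channels)
        self.alpha = nn.Parameter(torch.tensor(0.5))        # fusion ratio

    def multi_scale(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = self.f1(x), self.f2(x)
        z = self.squeeze((x1 + x2).mean(dim=(2, 3)))        # GAP then compress
        ab = self.expand(z).view(-1, 2, x.size(1)).softmax(dim=1)
        a = ab[:, 0].unsqueeze(-1).unsqueeze(-1)            # weight for x1
        b = ab[:, 1].unsqueeze(-1).unsqueeze(-1)            # weight for x2
        return a * x1 + b * x2                              # adaptive selection

    def forward(self, f_low: torch.Tensor, f_high: torch.Tensor):
        u = self.multi_scale(f_low)         # multi-scale low-level features
        v = self.multi_scale(f_high)        # multi-scale high-level features
        return self.alpha * u + (1 - self.alpha) * v
```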
(4) Apply global average pooling and batch normalization to the final ResNet50 output features and compute the local semantic consistency loss, formulated as follows: [formula not reproduced in the source; it involves two hyperparameters $\lambda_1$ and $\lambda_2$].
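A minimal sketch of the pooling-and-normalization step (the familiar "BN neck" pattern) is shown below; the class name is hypothetical and the loss itself is omitted since its formula did not survive in the source:

```python
import torch
import torch.nn as nn

class GapBnNeck(nn.Module):
    """Global average pooling followed by batch normalization, applied to
    the final ResNet50 feature map before the losses are computed."""
    def __init__(self, channels: int):
        super().__init__()
        self.bn = nn.BatchNorm1d(channels)

    def forward(self, feat_map: torch.Tensor) -> torch.Tensor:
        pooled = feat_map.mean(dim=(2, 3))  # (b, c, h, w) -> (b, c)
        return self.bn(pooled)              # batch-normalized embedding
```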
The invention achieves good performance on the two mainstream cross-modal pedestrian re-identification datasets SYSU-MM01 and RegDB; the comparative experimental results are shown in Table 1.
Table 1: Accuracy comparison of the proposed method with other cross-modal pedestrian re-identification methods [table not reproduced in the source]
The embodiment of the invention also provides a cross-modal pedestrian re-identification system based on auxiliary modality enhancement and multi-scale feature fusion, comprising:
A preprocessing module: used to acquire the original images and divide them into a training set, a validation set and a test set, and to preprocess the visible-light and infrared images in the training set;
An auxiliary modality enhancement module: used to enhance the auxiliary modality representation, with ResNet50 as the backbone network;
A multi-scale feature fusion module: used to feed the features output by the auxiliary modality enhancement module into ResNet50 for feature extraction and fusion and to compute the cross-modal instance aggregation loss; a multi-scale feature fusion module is added after the third and fourth residual blocks of ResNet50;
A local semantic consistency module: used to apply global average pooling and batch normalization to the final ResNet50 output features and compute the local semantic consistency loss.
The embodiment of the invention also provides a device comprising a memory, a processor, and a program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements any one of the above cross-modal pedestrian re-identification methods based on auxiliary modality enhancement and multi-scale feature fusion.
The embodiment of the invention also provides a storage medium storing a computer program designed to implement, when run, any one of the above cross-modal pedestrian re-identification methods based on auxiliary modality enhancement and multi-scale feature fusion.

Claims (5)

1. A cross-modal pedestrian re-identification method based on auxiliary modality enhancement and multi-scale feature fusion, characterized by comprising the following steps:
(1) Acquire the original images and divide them into a training set, a validation set and a test set; preprocess the visible-light and infrared images in the training set;
(2) Use ResNet50 as the backbone network and add an auxiliary modality enhancement module; specifically: first, apply random channel combination to the visible-light images in the training set to obtain auxiliary-modality images, feed the images of the three modalities into the ResNet50 network, and then enhance the image representation of the auxiliary modality with an attention-weighted fusion strategy;
(3) Feed the features output in step (2) into the remaining ResNet50 stages for feature extraction and fusion, and compute the cross-modal instance aggregation loss; a multi-scale feature fusion module is added after the third and fourth residual blocks of ResNet50; specifically:
(31) Feed the features output in step (2) into a shallow network formed by the first and second residual blocks of ResNet50 to continue feature extraction;
(32) Apply global average pooling and batch normalization to the features output by the shallow network, then compute the cross-modal instance aggregation loss; specifically: let the shallow feature map output by the second residual block be $F \in \mathbb{R}^{h \times w \times c}$; after global average pooling and batch normalization, the mean of the paired cross-modal sample feature differences is computed as follows:

$$\mathcal{L}_{cia} = \frac{1}{N}\sum_{i=1}^{N}\left\| f_i^{m} - f_i^{n} \right\|_2$$

where $N$ is the number of paired samples in a training batch, $f_i^{m}$ and $f_i^{n}$ denote the features of the $i$-th sample in modality $m$ and modality $n$, respectively, and $\mathcal{L}_{cia}$ is the mean of the difference between the two features;
(33) Feed the features of the three modalities output by the second ResNet50 residual block into a modality-shared branch composed of the third and fourth residual blocks of ResNet50; a multi-scale feature fusion module is added after each of the third and fourth residual blocks; the multi-scale feature fusion module comprises two branches with identical structure: the low-level features $F_l \in \mathbb{R}^{h \times w \times c}$ output by the previous residual block and the high-level features $F_h$ output by the current residual block serve as inputs, where $h$, $w$, $c$ denote the height, width, and number of channels of the features, respectively;

each branch uses dilated convolutions to obtain the multi-scale low-level features $U$ and high-level features $V$, which are then fused by weighting, and the fused feature $F_{fused}$ is fed into the next stage, as follows:

$$F_{fused} = \alpha \cdot U + (1 - \alpha) \cdot V$$

where $\alpha$ is a learnable parameter controlling the fusion ratio of the low-level and high-level features;
(4) Apply global average pooling and batch normalization to the final ResNet50 output features and compute the local semantic consistency loss, formulated as follows: [formula not reproduced in the source; it involves two hyperparameters $\lambda_1$ and $\lambda_2$].
2. The cross-modal pedestrian re-identification method based on auxiliary modality enhancement and multi-scale feature fusion according to claim 1, wherein step (1) is specifically: acquire pedestrian images and identity labels from the existing SYSU-MM01 and RegDB datasets and divide them into a training set, a validation set and a test set; apply horizontal flipping and random erasing as preprocessing to the training images and crop them to 288 × 144 pixels; all images are then normalized using the channel mean and standard deviation.
3. A cross-modal pedestrian re-identification system based on auxiliary modality enhancement and multi-scale feature fusion, characterized by comprising:
a preprocessing module: used to acquire the original images and divide them into a training set, a validation set and a test set, and to preprocess the visible-light and infrared images in the training set;
an auxiliary modality enhancement module: used to enhance the auxiliary modality representation, with ResNet50 as the backbone network;
a multi-scale feature fusion module: used to feed the features output by the auxiliary modality enhancement module into ResNet50 for feature extraction and fusion and to compute the cross-modal instance aggregation loss; a multi-scale feature fusion module is added after the third and fourth residual blocks of ResNet50; specifically:
(31) feed the features output by the auxiliary modality enhancement module into a shallow network formed by the first and second residual blocks of ResNet50 to continue feature extraction;
(32) apply global average pooling and batch normalization to the features output by the shallow network, then compute the cross-modal instance aggregation loss; specifically: let the shallow feature map output by the second residual block be $F \in \mathbb{R}^{h \times w \times c}$; after global average pooling and batch normalization, the mean of the paired cross-modal sample feature differences is computed as follows:

$$\mathcal{L}_{cia} = \frac{1}{N}\sum_{i=1}^{N}\left\| f_i^{m} - f_i^{n} \right\|_2$$

where $N$ is the number of paired samples in a training batch, $f_i^{m}$ and $f_i^{n}$ denote the features of the $i$-th sample in modality $m$ and modality $n$, respectively, and $\mathcal{L}_{cia}$ is the mean of the difference between the two features;
(33) feed the features of the three modalities output by the second ResNet50 residual block into a modality-shared branch composed of the third and fourth residual blocks of ResNet50; a multi-scale feature fusion module is added after each of the third and fourth residual blocks; the multi-scale feature fusion module comprises two branches with identical structure: the low-level features $F_l \in \mathbb{R}^{h \times w \times c}$ output by the previous residual block and the high-level features $F_h$ output by the current residual block serve as inputs, where $h$, $w$, $c$ denote the height, width, and number of channels of the features, respectively;

each branch uses dilated convolutions to obtain the multi-scale low-level features $U$ and high-level features $V$, which are then fused by weighting, and the fused feature $F_{fused}$ is fed into the next stage, as follows:

$$F_{fused} = \alpha \cdot U + (1 - \alpha) \cdot V$$

where $\alpha$ is a learnable parameter controlling the fusion ratio of the low-level and high-level features;
a local semantic consistency module: used to apply global average pooling and batch normalization to the final ResNet50 output features and compute the local semantic consistency loss, formulated as follows: [formula not reproduced in the source; it involves two hyperparameters $\lambda_1$ and $\lambda_2$].
4. An apparatus comprising a memory, a processor, and a program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the cross-modal pedestrian re-identification method based on auxiliary modality enhancement and multi-scale feature fusion of any one of claims 1-2.
5. A storage medium storing a computer program, characterized in that the computer program, when run, implements the cross-modal pedestrian re-identification method based on auxiliary modality enhancement and multi-scale feature fusion of any one of claims 1 to 2.
CN202410406480.5A 2024-04-07 2024-04-07 Cross-modal pedestrian re-identification method based on auxiliary modality enhancement and multi-scale feature fusion Active CN117994822B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410406480.5A CN117994822B (en) 2024-04-07 2024-04-07 Cross-modal pedestrian re-identification method based on auxiliary modality enhancement and multi-scale feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410406480.5A CN117994822B (en) 2024-04-07 2024-04-07 Cross-modal pedestrian re-identification method based on auxiliary modality enhancement and multi-scale feature fusion

Publications (2)

Publication Number Publication Date
CN117994822A CN117994822A (en) 2024-05-07
CN117994822B true CN117994822B (en) 2024-06-14

Family

ID=90889286

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410406480.5A Active CN117994822B (en) Cross-modal pedestrian re-identification method based on auxiliary modality enhancement and multi-scale feature fusion

Country Status (1)

Country Link
CN (1) CN117994822B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120220043B (en) * 2025-02-25 2025-09-26 山东科技大学 A cross-modal person re-identification method based on feature enhancement
CN120236301B (en) * 2025-05-29 2025-10-28 西安电子科技大学 A multimodal person re-identification method based on complementary data enhancement

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112818790A (en) * 2021-01-25 2021-05-18 浙江理工大学 Pedestrian re-identification method based on attention mechanism and space geometric constraint
CN112818931A (en) * 2021-02-26 2021-05-18 中国矿业大学 Multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991278A (en) * 2019-11-20 2020-04-10 北京影谱科技股份有限公司 Human body action recognition method and device in video of computer vision system
CN113392786B (en) * 2021-06-21 2022-04-12 电子科技大学 Cross-domain person re-identification method based on normalization and feature enhancement
CN113807440B (en) * 2021-09-17 2022-08-26 北京百度网讯科技有限公司 Method, apparatus, and medium for processing multimodal data using neural networks
CN114998928A (en) * 2022-05-18 2022-09-02 南京信息工程大学 Cross-modal pedestrian re-identification method based on multi-granularity feature utilization
CN115171165A (en) * 2022-07-29 2022-10-11 南京邮电大学 Pedestrian re-identification method and device with global features and step-type local features fused
CN115393901A (en) * 2022-09-13 2022-11-25 广东工业大学 Cross-modal pedestrian re-identification method and computer readable storage medium
CN116110118B (en) * 2022-11-08 2025-10-17 西安电子科技大学 Pedestrian re-recognition and gait recognition method based on space-time feature complementary fusion
CN115909407A (en) * 2022-12-01 2023-04-04 南京邮电大学 A cross-modal person re-identification method based on person attribute assistance
CN116959098A (en) * 2023-06-16 2023-10-27 南京邮电大学 A pedestrian re-identification method and system based on dual-granularity three-mode metric learning
CN117315714B (en) * 2023-09-11 2025-11-18 淮阴工学院 A Multispectral Pedestrian Detection Method Based on Cross-Modal Eigendecomposition
CN117541944B (en) * 2023-11-07 2024-06-11 南京航空航天大学 A multi-modal infrared small target detection method
CN117727066A (en) * 2023-11-17 2024-03-19 南通理工学院 A cross-modal person re-identification method based on feature collaborative attention
CN117746467B (en) * 2024-01-05 2024-05-28 南京信息工程大学 A cross-modal person re-identification method with modality enhancement and compensation

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112818790A (en) * 2021-01-25 2021-05-18 浙江理工大学 Pedestrian re-identification method based on attention mechanism and space geometric constraint
CN112818931A (en) * 2021-02-26 2021-05-18 中国矿业大学 Multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion

Also Published As

Publication number Publication date
CN117994822A (en) 2024-05-07

Similar Documents

Publication Publication Date Title
CN117994822B (en) Cross-modal pedestrian re-identification method based on auxiliary modality enhancement and multi-scale feature fusion
CN113159043A (en) Feature point matching method and system based on semantic information
CN112818790A (en) Pedestrian re-identification method based on attention mechanism and space geometric constraint
CN113033454A (en) Method for detecting building change in urban video camera
CN113704276B (en) Map updating method, device, electronic device and computer-readable storage medium
CN113591545B (en) Deep learning-based multi-level feature extraction network pedestrian re-identification method
CN113361475A (en) Multi-spectral pedestrian detection method based on multi-stage feature fusion information multiplexing
CN118799919B (en) Full-time multi-mode pedestrian re-recognition method based on simulation augmentation and prototype learning
CN113076947A (en) RGB-T image significance detection system with cross-guide fusion
CN114882525B (en) Cross-modal pedestrian re-identification method based on modal specific memory network
CN116994164B (en) A Joint Learning Method for Multimodal Aerial Image Fusion and Object Detection
CN116071752B (en) A method and system for intelligent recognition of digital instrument readings
CN114627500B (en) A cross-modal person re-identification method based on convolutional neural network
CN116740480A (en) Multimodal image fusion target tracking method
CN119863491B (en) RGBT target tracking method based on domain adaptation and space-time information fusion
CN114708321A (en) Semantic-based camera pose estimation method and system
CN117351246B (en) A method, system and readable medium for removing mismatched pairs
Ren et al. Hierarchical loop closure detection with weighted local patch features and global descriptors: Ren et al.
CN114494736B (en) Outdoor place re-identification method based on salient region detection
CN118736364A (en) A method for infrared dim small target detection based on sparse attention and multi-scale feature fusion
CN118115947A (en) Cross-modal person re-identification method based on random color conversion and multi-scale feature fusion
CN117953537A (en) A person re-identification method based on improved Transformer and multi-scale feature fusion
CN113920303B (en) A convolutional neural network-based weakly supervised category-independent image similarity retrieval system and its control method
Xia et al. Self‐training with one‐shot stepwise learning method for person re‐identification
CN112699846B (en) Specific character and specific behavior combined retrieval method and device with identity consistency check function

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant