CN118297984A

CN118297984A - Multi-target tracking method and system for smart city camera

Info

Publication number: CN118297984A
Application number: CN202410114313.3A
Authority: CN
Inventors: 夏婵娟; 宦小答; 胡玥; 贺一唐; 王颖琪
Original assignee: Hunan Police Academy
Current assignee: Hunan Police Academy
Priority date: 2024-01-26
Filing date: 2024-01-26
Publication date: 2024-07-05
Anticipated expiration: 2044-01-26
Also published as: CN118297984B

Abstract

The invention relates to the technical field of computer vision, in particular to a multi-target tracking method and system for a smart city camera, comprising the following steps of: based on the monitoring image, the depth separation convolutional neural network and the scale invariant feature transformation are adopted to perform image processing and preliminary target recognition, and preliminary detection of targets is performed. In the invention, the feature pyramid network and the attention mechanism accurately identify targets with different sizes, enhance multi-scale feature extraction and fusion, adjust and optimize target tracking flexibility and accuracy by self-adaptive scale, solve target shielding by fusion of a mask region convolution neural network and monocular depth estimation, enhance multi-camera data integration by a graph convolution neural network, improve space-time correlation construction efficiency, combine a time convolution neural network with a long-short-term memory neural network, and use a single support vector machine for abnormal behavior identification and early warning, improve behavior analysis accuracy and early warning timeliness, and enhance response capability and event processing efficiency of a smart city monitoring system.

Description

Smart city camera multi-target tracking method and system

技术领域Technical Field

本发明涉及计算机视觉技术领域，尤其涉及智慧城市摄像机多目标追踪方法及系统。The present invention relates to the field of computer vision technology, and in particular to a multi-target tracking method and system for smart city cameras.

背景技术Background technique

计算机视觉是一门研究如何使计算机能够理解和解释图像或视频的科学和工程领域。它涉及使用算法和模型来处理、分析和理解数字图像或视频数据，以实现对场景中的对象、动作和事件的识别、跟踪和理解。Computer vision is a field of science and engineering that studies how to enable computers to understand and interpret images or videos. It involves the use of algorithms and models to process, analyze, and understand digital image or video data to identify, track, and understand objects, actions, and events in a scene.

其中，智慧城市摄像机多目标追踪方法，旨在利用智慧城市摄像机系统对多个目标进行追踪。该方法的目的是通过分析视频监控画面中的运动对象，自动检测、跟踪和识别多个目标，从而提供有关目标位置、运动轨迹和行为模式的信息。为了实现这样的效果，该方法一般通过目标检测、目标跟踪、多目标追踪和行为分析等手段来达成。这些技术手段通常基于深度学习、图像处理和模式识别等相关领域的算法和方法，通过训练大量的标注数据来提高追踪的准确性和鲁棒性。Among them, the smart city camera multi-target tracking method aims to track multiple targets using the smart city camera system. The purpose of this method is to automatically detect, track and identify multiple targets by analyzing the moving objects in the video surveillance screen, thereby providing information about the target location, motion trajectory and behavior pattern. In order to achieve this effect, this method is generally achieved through means such as target detection, target tracking, multi-target tracking and behavior analysis. These technical means are usually based on algorithms and methods in related fields such as deep learning, image processing and pattern recognition, and improve the accuracy and robustness of tracking by training a large amount of labeled data.

在现有智慧城市摄像机多目标追踪方法中，现有方法在多目标追踪方面，常因为缺乏有效的多尺度特征融合和自适应尺度调整机制，导致在处理多尺寸目标时识别能力有限，对于远距离或小尺寸目标的追踪精度不高。此外，传统方法在处理目标遮挡问题时往往缺少有效的深度信息分析工具，难以准确获取遮挡关系，造成目标间的关系理解不足。在多摄像头数据整合方面，现有方法由于缺乏高效的网络结构，往往无法充分利用不同视角的信息，影响时空关联的建立。同时，在行为学习和预警方面，传统方法常常局限于表层特征，忽略了深层次模式识别，使得异常行为的预警不够及时或准确，限制了整体监控系统的反应速度和处理复杂事件的能力。In the existing smart city camera multi-target tracking methods, the existing methods often lack effective multi-scale feature fusion and adaptive scale adjustment mechanisms, resulting in limited recognition capabilities when processing multi-size targets, and low tracking accuracy for long-distance or small-size targets. In addition, traditional methods often lack effective deep information analysis tools when dealing with target occlusion problems, making it difficult to accurately obtain occlusion relationships, resulting in insufficient understanding of the relationship between targets. In terms of multi-camera data integration, existing methods often cannot fully utilize information from different perspectives due to the lack of efficient network structures, affecting the establishment of spatiotemporal associations. At the same time, in terms of behavior learning and early warning, traditional methods are often limited to surface features and ignore deep-level pattern recognition, making the early warning of abnormal behavior not timely or accurate enough, limiting the response speed of the overall monitoring system and the ability to handle complex events.

发明内容Summary of the invention

本发明的目的是解决现有技术中存在的缺点，而提出的智慧城市摄像机多目标追踪方法及系统。The purpose of the present invention is to solve the shortcomings of the prior art and propose a smart city camera multi-target tracking method and system.

为了实现上述目的，本发明采用了如下技术方案：智慧城市摄像机多目标追踪方法，包括以下步骤：In order to achieve the above object, the present invention adopts the following technical solution: a multi-target tracking method of a smart city camera, comprising the following steps:

S1：基于监控图像，采用深度分离卷积神经网络和尺度不变特征转换进行图像处理和初步目标识别，并进行目标的初步检测，生成初步目标识别信息；S1: Based on the surveillance image, a deep separation convolutional neural network and scale-invariant feature transformation are used for image processing and preliminary target recognition, and preliminary target detection is performed to generate preliminary target recognition information;

S2：基于所述初步目标识别信息，采用特征金字塔网络以及注意力机制进行多尺度特征融合，并加强对多尺寸目标的识别能力，生成多尺度目标特征信息；S2: Based on the preliminary target recognition information, a feature pyramid network and an attention mechanism are used to perform multi-scale feature fusion, and the recognition capability of multi-scale targets is enhanced to generate multi-scale target feature information;

S3：基于所述多尺度目标特征信息，运用尺度估计模块和锚框机制进行自适应尺度调整，优化目标追踪框的大小，生成自适应尺度调整后的追踪信息；S3: Based on the multi-scale target feature information, adaptive scale adjustment is performed using a scale estimation module and an anchor frame mechanism to optimize the size of the target tracking frame and generate tracking information after adaptive scale adjustment;

S4：基于所述自适应尺度调整后的追踪信息，使用掩膜区域卷积神经网络和单眼深度估计算法处理目标遮挡问题，获取目标间的遮挡关系和场景深度信息，生成目标遮挡关系与深度信息；S4: Based on the tracking information after the adaptive scale adjustment, a mask area convolutional neural network and a monocular depth estimation algorithm are used to process the target occlusion problem, obtain the occlusion relationship between targets and the scene depth information, and generate the target occlusion relationship and depth information;

S5：基于所述目标遮挡关系与深度信息，通过图卷积神经网络整合多摄像头视角的数据，并建立目标在多摄像头视野中的时空关联，生成跨摄像头目标追踪信息；S5: Based on the target occlusion relationship and depth information, the data of multiple camera perspectives are integrated through a graph convolutional neural network, and the spatiotemporal correlation of the target in the field of view of multiple cameras is established to generate cross-camera target tracking information;

S6：基于所述跨摄像头目标追踪信息，结合时间卷积神经网络和长短期记忆神经网络对目标模式进行学习，并使用单类支持向量机对异常行为进行识别、预警，生成异常行为识别与预警报告；S6: Based on the cross-camera target tracking information, the target pattern is learned by combining the temporal convolutional neural network and the long short-term memory neural network, and the abnormal behavior is identified and warned by using a single-class support vector machine, and an abnormal behavior identification and warning report is generated;

所述初步目标识别信息包括目标的位置、大小和形状信息，所述多尺度目标特征信息具体为多层次、多尺寸的目标特征描述，包括多尺度下目标的细节和结构信息，所述自适应尺度调整后的追踪信息具体为多尺度下匹配目标的追踪框及特征信息，所述目标遮挡关系与深度信息具体指包括遮挡目标的识别信息，以及目标在场景中的深度位置，所述跨摄像头目标追踪信息包括多摄像头间关键目标的位置、时间关联以及行为信息，所述异常行为识别与预警报告具体为发现的违法行为或异常行为，并对其时间、地点、性质进行描述和预警。The preliminary target recognition information includes the position, size and shape information of the target. The multi-scale target feature information is specifically a multi-level, multi-size target feature description, including the details and structural information of the target at multiple scales. The tracking information after adaptive scale adjustment is specifically the tracking frame and feature information of the matching target at multiple scales. The target occlusion relationship and depth information specifically refers to the recognition information of the occluded target and the depth position of the target in the scene. The cross-camera target tracking information includes the position, time association and behavior information of key targets between multiple cameras. The abnormal behavior recognition and warning report is specifically the discovered illegal behavior or abnormal behavior, and its time, place and nature are described and warned.

作为本发明的进一步方案，基于监控图像，采用深度分离卷积神经网络和尺度不变特征转换进行图像处理和初步目标识别，并进行目标的初步检测，生成初步目标识别信息的步骤具体为：As a further solution of the present invention, based on the monitoring image, a deep separation convolutional neural network and a scale-invariant feature transformation are used to perform image processing and preliminary target recognition, and preliminary detection of the target is performed. The steps of generating preliminary target recognition information are specifically as follows:

S101：基于原始监控图像，采用直方图均衡化和高斯模糊算法进行图像增强和噪声降低，生成增强后图像；S101: Based on the original monitoring image, histogram equalization and Gaussian blur algorithm are used to perform image enhancement and noise reduction to generate an enhanced image;

S102：基于所述增强后图像，采用深度可分离卷积算法提取基本特征，减少计算量并保持性能，生成基本特征图；S102: Based on the enhanced image, a depthwise separable convolution algorithm is used to extract basic features, reduce the amount of calculation and maintain performance, and generate a basic feature map;

S103：基于所述基本特征图，采用多尺度空间金字塔池化增强尺度不变特征提取，生成矢量特征图；S103: Based on the basic feature map, multi-scale spatial pyramid pooling is used to enhance scale-invariant feature extraction to generate a vector feature map;

S104：基于所述矢量特征图，采用区域提议网络结合边框回归进行目标的初步检测，精确定位目标，生成初步目标识别信息。S104: Based on the vector feature map, a region proposal network is used in combination with bounding box regression to perform preliminary detection of the target, accurately locate the target, and generate preliminary target recognition information.

作为本发明的进一步方案，基于所述初步目标识别信息，采用特征金字塔网络以及注意力机制进行多尺度特征融合，并加强对多尺寸目标的识别能力，生成多尺度目标特征信息的步骤具体为：As a further solution of the present invention, based on the preliminary target recognition information, a feature pyramid network and an attention mechanism are used to perform multi-scale feature fusion, and the recognition capability of multi-size targets is enhanced. The steps of generating multi-scale target feature information are specifically as follows:

S201：基于所述初步目标识别信息，采用特征金字塔网络实现特征层间的融合，强化对多尺度目标的检测能力，生成融合特征图；S201: Based on the preliminary target recognition information, a feature pyramid network is used to achieve fusion between feature layers, enhance the detection capability of multi-scale targets, and generate a fusion feature map;

S202：基于所述融合特征图，采用注意力机制，增强特征的表达能力，生成注意力加权特征图；S202: Based on the fused feature map, an attention mechanism is used to enhance the expressiveness of features and generate an attention-weighted feature map;

S203：基于所述注意力加权特征图，采用锚框优化算法进行锚框的优化调整，提升模型对多尺寸目标的拟合精度，生成优化锚框特征信息；S203: Based on the attention weighted feature map, an anchor frame optimization algorithm is used to optimize and adjust the anchor frame, so as to improve the fitting accuracy of the model for multi-size objects and generate optimized anchor frame feature information;

S204：基于所述优化锚框特征信息，通过级联卷积网络提炼特征，增强多尺度特征的表示力，生成多尺度目标特征信息。S204: Based on the optimized anchor frame feature information, features are refined through a cascaded convolutional network to enhance the representation of multi-scale features and generate multi-scale target feature information.

作为本发明的进一步方案，基于所述多尺度目标特征信息，运用尺度估计模块和锚框机制进行自适应尺度调整，优化目标追踪框的大小，生成自适应尺度调整后的追踪信息的步骤具体为：As a further solution of the present invention, based on the multi-scale target feature information, the scale estimation module and the anchor frame mechanism are used to perform adaptive scale adjustment, optimize the size of the target tracking frame, and generate the tracking information after adaptive scale adjustment in the following steps:

S301：基于所述多尺度目标特征信息，采用实时尺度估计算法对目标大小进行动态预测，适应目标尺度的变化，生成尺度调整信息；S301: Based on the multi-scale target feature information, a real-time scale estimation algorithm is used to dynamically predict the target size, adapt to changes in the target scale, and generate scale adjustment information;

S302：基于所述尺度调整信息，采用锚框生成算法调整追踪框尺寸，使得追踪框贴合目标实际尺度，生成自适应尺度的追踪框；S302: Based on the scale adjustment information, an anchor frame generation algorithm is used to adjust the size of the tracking frame so that the tracking frame fits the actual scale of the target, and a tracking frame with an adaptive scale is generated;

S303：基于所述自适应尺度的追踪框，使用非极大值抑制算法消除重叠的追踪框，减少冗余，生成非极大值抑制后追踪框；S303: Based on the tracking frame of the adaptive scale, a non-maximum suppression algorithm is used to eliminate overlapping tracking frames, reduce redundancy, and generate a non-maximum suppressed tracking frame;

S304：基于所述非极大值抑制后追踪框，结合跟踪学习算法进行最终的目标追踪框定位，生成自适应尺度调整后的追踪信息。S304: Based on the non-maximum suppressed tracking frame, the final target tracking frame is located in combination with a tracking learning algorithm to generate tracking information after adaptive scale adjustment.

作为本发明的进一步方案，基于所述自适应尺度调整后的追踪信息，使用掩膜区域卷积神经网络和单眼深度估计算法处理目标遮挡问题，获取目标间的遮挡关系和场景深度信息，生成目标遮挡关系与深度信息的步骤具体为：As a further solution of the present invention, based on the tracking information after the adaptive scale adjustment, the mask area convolutional neural network and the monocular depth estimation algorithm are used to process the target occlusion problem, obtain the occlusion relationship between targets and the scene depth information, and generate the target occlusion relationship and depth information in the following steps:

S401：基于所述自适应尺度调整后的追踪信息，采用特征金字塔网络增强的掩膜区域卷积神经网络进行目标实例分割，并准确判定遮挡边界，生成目标分割与遮挡边界信息；S401: Based on the tracking information after the adaptive scale adjustment, a mask region convolutional neural network enhanced by a feature pyramid network is used to perform target instance segmentation, and occlusion boundaries are accurately determined to generate target segmentation and occlusion boundary information;

S402：基于所述目标分割与遮挡边界信息，利用深度学习的单眼深度估计方法，对场景进行三维重建，生成场景深度图；S402: Based on the target segmentation and occlusion boundary information, a monocular depth estimation method based on deep learning is used to perform three-dimensional reconstruction of the scene to generate a scene depth map;

S403：基于所述场景深度图，运用像素级融合技术处理，对目标间的遮挡关系进行解析，生成目标遮挡关系图；S403: Based on the scene depth map, pixel-level fusion technology is used to analyze the occlusion relationship between objects and generate an object occlusion relationship map;

S404：基于所述目标遮挡关系图，结合深度排序算法和遮挡处理技术，确定目标间的遮挡等级和深度，生成目标遮挡关系与深度信息。S404: Based on the target occlusion relationship graph, combined with the depth sorting algorithm and the occlusion processing technology, the occlusion level and depth between the targets are determined, and the target occlusion relationship and depth information are generated.

作为本发明的进一步方案，基于所述目标遮挡关系与深度信息，通过图卷积神经网络整合多摄像头视角的数据，并建立目标在多摄像头视野中的时空关联，生成跨摄像头目标追踪信息的步骤具体为：As a further solution of the present invention, based on the target occlusion relationship and depth information, the data of multiple camera perspectives are integrated through a graph convolutional neural network, and the spatiotemporal association of the target in the multi-camera field of view is established, and the steps of generating cross-camera target tracking information are specifically as follows:

S501：基于所述目标遮挡关系与深度信息，应用图卷积神经网络，结合特征抽取和优化策略进行多视角数据整合，生成初步跨摄像头目标关联信息；S501: Based on the target occlusion relationship and depth information, a graph convolutional neural network is applied, and feature extraction and optimization strategies are combined to integrate multi-view data to generate preliminary cross-camera target association information;

S502：基于所述初步跨摄像头目标关联信息，采用多摄像机几何校准方法，通过基于极线约束的校准技术，细化目标位置关系，生成多视角目标位置信息；S502: Based on the preliminary cross-camera target association information, a multi-camera geometric calibration method is used to refine the target position relationship through a calibration technology based on epipolar constraints to generate multi-view target position information;

S503：基于所述多视角目标位置信息，运用时间序列分析技术，通过动态时间规整进行时序数据分析，优化目标时空关联，生成时间优化后的目标关联信息；S503: Based on the multi-view target position information, using time series analysis technology, performing time series data analysis through dynamic time warping, optimizing target spatiotemporal association, and generating time-optimized target association information;

S504：基于所述时间优化后的目标关联信息，整合多摄像头数据，通过多视角追踪融合算法完成时空关联，生成跨摄像头目标追踪信息。S504: Based on the time-optimized target association information, the multi-camera data is integrated, and the spatiotemporal association is completed through a multi-view tracking fusion algorithm to generate cross-camera target tracking information.

作为本发明的进一步方案，基于所述跨摄像头目标追踪信息，结合时间卷积神经网络和长短期记忆神经网络对目标模式进行学习，并使用单类支持向量机对异常行为进行识别、预警，生成异常行为识别与预警报告的步骤具体为：As a further solution of the present invention, based on the cross-camera target tracking information, the target pattern is learned by combining the temporal convolutional neural network and the long short-term memory neural network, and the abnormal behavior is identified and warned by using a single-class support vector machine. The steps of generating abnormal behavior identification and warning reports are specifically as follows:

S601：基于所述跨摄像头目标追踪信息，结合时间卷积神经网络和长短期记忆网络进行深度特征学习，生成目标行为特征学习结果；S601: Based on the cross-camera target tracking information, deep feature learning is performed in combination with a temporal convolutional neural network and a long short-term memory network to generate a target behavior feature learning result;

S602：基于所述目标行为特征学习结果，采用特征聚类算法进行时空特征分析，提取目标行为模式，生成时空聚类后的行为特征；S602: Based on the target behavior feature learning result, a feature clustering algorithm is used to perform spatiotemporal feature analysis, extract the target behavior pattern, and generate spatiotemporal clustered behavior features;

S603：基于所述时空聚类后的行为特征，利用单类支持向量机进行异常行为的识别和建模，生成异常行为识别模型；S603: Based on the behavior characteristics after the spatiotemporal clustering, using a single-class support vector machine to identify and model abnormal behaviors, and generate an abnormal behavior identification model;

S604：基于所述异常行为识别模型，结合实时监控数据，进行异常行为预警，生成异常行为识别与预警报告。S604: Based on the abnormal behavior recognition model and in combination with real-time monitoring data, abnormal behavior warning is performed to generate an abnormal behavior recognition and warning report.

智慧城市摄像机多目标追踪系统，所述智慧城市摄像机多目标追踪系统用于执行上述智慧城市摄像机多目标追踪方法，所述系统包括图像预处理模块、基本特征提取模块、初步目标识别模块、特征增强与优化模块、目标追踪框预处理模块、目标关联模块、行为学习与预警模块。A smart city camera multi-target tracking system, the smart city camera multi-target tracking system is used to execute the above-mentioned smart city camera multi-target tracking method, the system includes an image preprocessing module, a basic feature extraction module, a preliminary target recognition module, a feature enhancement and optimization module, a target tracking frame preprocessing module, a target association module, and a behavior learning and early warning module.

作为本发明的进一步方案，所述图像预处理模块基于原始监控图像，采用直方图均衡化和高斯模糊算法，进行图像增强和噪声降低，生成增强后图像；As a further solution of the present invention, the image preprocessing module uses histogram equalization and Gaussian blur algorithm based on the original monitoring image to perform image enhancement and noise reduction to generate an enhanced image;

所述基本特征提取模块基于增强后图像，提取基本特征，采用深度可分离卷积算法，减少计算量并保持性能，生成基本特征图；The basic feature extraction module extracts basic features based on the enhanced image, adopts a depth-separable convolution algorithm to reduce the amount of calculation and maintain performance, and generates a basic feature map;

所述初步目标识别模块基于基本特征图，进行初步目标识别，采用多尺度空间金字塔池化以增强尺度不变特征提取，并采用区域提议网络结合边框回归，定位目标，生成初步目标识别信息；The preliminary target recognition module performs preliminary target recognition based on the basic feature map, adopts multi-scale spatial pyramid pooling to enhance scale-invariant feature extraction, and adopts a region proposal network combined with bounding box regression to locate the target and generate preliminary target recognition information;

所述特征增强与优化模块基于初步目标识别信息，进行特征增强并优化，使用特征金字塔网络实现特征层间的融合并采用注意力机制以增强特征的表达能力，通过采用锚框优化算法进行框的优化调整，并通过级联卷积网络提炼特征，生成多尺度目标特征信息；The feature enhancement and optimization module performs feature enhancement and optimization based on preliminary target recognition information, uses a feature pyramid network to achieve fusion between feature layers and adopts an attention mechanism to enhance the expressiveness of features, optimizes and adjusts the frame by using an anchor frame optimization algorithm, and extracts features through a cascaded convolutional network to generate multi-scale target feature information;

所述目标追踪框预处理模块基于多尺度目标特征信息，处理追踪框，采用实时尺度估计算法对目标大小进行动态预测，通过锚框生成算法调整追踪框尺寸，并结合非极大值抑制算法消除重叠的追踪框，生成自适应尺度调整后的追踪信息；The target tracking frame preprocessing module processes the tracking frame based on multi-scale target feature information, uses a real-time scale estimation algorithm to dynamically predict the target size, adjusts the tracking frame size through an anchor frame generation algorithm, and combines a non-maximum suppression algorithm to eliminate overlapping tracking frames, thereby generating tracking information after adaptive scale adjustment;

所述目标关联模块基于自适应尺度调整后的追踪信息，运行目标关联算法，采用特征金字塔网络增强的掩膜区域卷积神经网络进行目标实例分割，通过深度学习的单眼深度估计方法对场景进行三维重建，使用像素级融合技术处理，并通过深度排序算法和遮挡处理技术确定目标间的遮挡等级和深度，生成目标遮挡关系与深度信息；The target association module runs a target association algorithm based on the tracking information after adaptive scale adjustment, uses a mask area convolutional neural network enhanced by a feature pyramid network to perform target instance segmentation, reconstructs the scene in three dimensions through a deep learning monocular depth estimation method, uses pixel-level fusion technology for processing, and determines the occlusion level and depth between targets through a depth sorting algorithm and occlusion processing technology, and generates target occlusion relationships and depth information;

所述行为学习与预警模块基于目标遮挡关系与深度信息进行行为学习与预警，使用时间卷积神经网络和长短期记忆网络进行深度特征学习，通过特征聚类算法进行时空特征分析，利用单类支持向量机进行异常行为的识别和建模，结合实时监控数据进行异常行为预警，生成异常行为识别与预警报告。The behavior learning and warning module performs behavior learning and warning based on target occlusion relationship and depth information, uses temporal convolutional neural network and long short-term memory network for deep feature learning, performs spatiotemporal feature analysis through feature clustering algorithm, uses single-class support vector machine to identify and model abnormal behavior, combines real-time monitoring data for abnormal behavior warning, and generates abnormal behavior identification and warning report.

作为本发明的进一步方案，所述图像预处理模块包括噪声降低子模块、图像增强子模块、过滤异常子模块、图像标准化子模块；As a further solution of the present invention, the image preprocessing module includes a noise reduction submodule, an image enhancement submodule, an abnormality filtering submodule, and an image normalization submodule;

所述基本特征提取模块包括深度卷积子模块、特征提取子模块、特征压缩子模块、特征噪声筛选子模块；The basic feature extraction module includes a deep convolution submodule, a feature extraction submodule, a feature compression submodule, and a feature noise screening submodule;

所述初步目标识别模块包括目标定位子模块、目标检测子模块、目标筛选子模块、目标确认子模块；The preliminary target recognition module includes a target positioning submodule, a target detection submodule, a target screening submodule, and a target confirmation submodule;

所述特征增强与优化模块包括特征融合子模块、特征注意力加权子模块、锚框优化子模块、特征提炼子模块；The feature enhancement and optimization module includes a feature fusion submodule, a feature attention weighting submodule, an anchor frame optimization submodule, and a feature extraction submodule;

所述目标追踪框预处理模块包括实时尺度估计子模块、锚框生成子模块、非极大值抑制子模块、目标追踪框定位子模块；The target tracking frame preprocessing module includes a real-time scale estimation submodule, an anchor frame generation submodule, a non-maximum suppression submodule, and a target tracking frame positioning submodule;

所述目标关联模块包括目标实例分割子模块、场景三维重建子模块、目标遮挡关系解析子模块、遮挡等级和深度判定子模块；The target association module includes a target instance segmentation submodule, a scene 3D reconstruction submodule, a target occlusion relationship analysis submodule, and an occlusion level and depth determination submodule;

所述行为学习与预警模块包括深度特征学习子模块、时空特征分析子模块、异常行为识别子模块、实时异常预警子模块。The behavior learning and warning module includes a deep feature learning submodule, a spatiotemporal feature analysis submodule, an abnormal behavior recognition submodule, and a real-time abnormal warning submodule.

与现有技术相比，本发明的优点和积极效果在于：Compared with the prior art, the advantages and positive effects of the present invention are:

本发明中，通过整合深度分离卷积神经网络与尺度不变特征转换，有效提高了图像处理的准确性，并加强了初步目标识别的能力。特征金字塔网络和注意力机制的应用，使得对不同尺寸目标的识别更为精确，增强了多尺度目标特征的抽取与融合。自适应尺度调整通过尺度估计模块和锚框机制优化追踪框的大小，提高了目标追踪的灵活性和准确度。掩膜区域卷积神经网络和单眼深度估计算法的融合，为解决目标遮挡问题提供了有效手段，确保了场景深度信息的准确获取。图卷积神经网络的引入，加强了多摄像头数据的整合能力，提升了时空关联的构建效率。最终，时间卷积神经网络和长短期记忆神经网络的结合，以及单类支持向量机在异常行为识别和预警的使用，大大提高了行为分析的准确性和预警的及时性，整体上显著增强了智慧城市监控系统对复杂场景的响应能力和事件处理效率。In the present invention, by integrating the deep separation convolutional neural network and the scale-invariant feature conversion, the accuracy of image processing is effectively improved, and the ability of preliminary target recognition is enhanced. The application of feature pyramid network and attention mechanism makes the recognition of targets of different sizes more accurate, and enhances the extraction and fusion of multi-scale target features. Adaptive scale adjustment optimizes the size of the tracking frame through the scale estimation module and the anchor frame mechanism, improving the flexibility and accuracy of target tracking. The fusion of mask area convolutional neural network and monocular depth estimation algorithm provides an effective means to solve the problem of target occlusion and ensures the accurate acquisition of scene depth information. The introduction of graph convolutional neural network strengthens the integration ability of multi-camera data and improves the efficiency of building spatiotemporal association. Finally, the combination of time convolutional neural network and long short-term memory neural network, as well as the use of single-class support vector machine in abnormal behavior recognition and early warning, greatly improves the accuracy of behavior analysis and the timeliness of early warning, and significantly enhances the overall responsiveness and event processing efficiency of smart city monitoring system to complex scenes.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

图1为本发明的工作流程示意图；Fig. 1 is a schematic diagram of the workflow of the present invention;

图2为本发明的S1细化流程图；FIG2 is a flow chart of the refinement of S1 of the present invention;

图3为本发明的S2细化流程图；FIG3 is a flow chart of the refinement of S2 of the present invention;

图4为本发明的S3细化流程图；FIG4 is a flow chart of the refinement of S3 of the present invention;

图5为本发明的S4细化流程图；FIG5 is a flow chart of the refinement of S4 of the present invention;

图6为本发明的S5细化流程图；FIG6 is a flow chart of the refinement of S5 of the present invention;

图7为本发明的S6细化流程图；FIG7 is a detailed flow chart of S6 of the present invention;

图8为本发明的系统流程图；FIG8 is a system flow chart of the present invention;

图9为本发明的系统框架示意图。FIG. 9 is a schematic diagram of a system framework of the present invention.

具体实施方式Detailed ways

为了使本发明的目的、技术方案及优点更加清楚明白，以下结合附图及实施例，对本发明进行进一步详细说明。应当理解，此处所描述的具体实施例仅仅用以解释本发明，并不用于限定本发明。In order to make the purpose, technical solution and advantages of the present invention more clearly understood, the present invention is further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention and are not intended to limit the present invention.

在本发明的描述中，需要理解的是，术语“长度”、“宽度”、“上”、“下”、“前”、“后”、“左”、“右”、“竖直”、“水平”、“顶”、“底”、“内”、“外”等指示的方位或位置关系为基于附图所示的方位或位置关系，仅是为了便于描述本发明和简化描述，而不是指示或暗示所指的装置或元件必须具有特定的方位、以特定的方位构造和操作，因此不能理解为对本发明的限制。此外，在本发明的描述中，“多个”的含义是两个或两个以上，除非另有明确具体的限定。In the description of the present invention, it should be understood that the terms "length", "width", "up", "down", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inside", "outside" and the like indicate positions or positional relationships based on the positions or positional relationships shown in the drawings, and are only for the convenience of describing the present invention and simplifying the description, rather than indicating or implying that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and therefore cannot be understood as limiting the present invention. In addition, in the description of the present invention, "plurality" means two or more, unless otherwise clearly and specifically defined.

实施例一Embodiment 1

请参阅图1，本发明提供一种技术方案：智慧城市摄像机多目标追踪方法，包括以下步骤：Please refer to FIG1 . The present invention provides a technical solution: a multi-target tracking method of a smart city camera, comprising the following steps:

S2：基于初步目标识别信息，采用特征金字塔网络以及注意力机制进行多尺度特征融合，并加强对多尺寸目标的识别能力，生成多尺度目标特征信息；S2: Based on the preliminary target recognition information, the feature pyramid network and attention mechanism are used to fuse multi-scale features, enhance the recognition ability of multi-size targets, and generate multi-scale target feature information;

S3：基于多尺度目标特征信息，运用尺度估计模块和锚框机制进行自适应尺度调整，优化目标追踪框的大小，生成自适应尺度调整后的追踪信息；S3: Based on the multi-scale target feature information, the scale estimation module and anchor frame mechanism are used to perform adaptive scale adjustment, optimize the size of the target tracking frame, and generate tracking information after adaptive scale adjustment;

S4：基于自适应尺度调整后的追踪信息，使用掩膜区域卷积神经网络和单眼深度估计算法处理目标遮挡问题，获取目标间的遮挡关系和场景深度信息，生成目标遮挡关系与深度信息；S4: Based on the tracking information after adaptive scale adjustment, the mask region convolutional neural network and monocular depth estimation algorithm are used to deal with the target occlusion problem, obtain the occlusion relationship between targets and scene depth information, and generate target occlusion relationship and depth information;

S5：基于目标遮挡关系与深度信息，通过图卷积神经网络整合多摄像头视角的数据，并建立目标在多摄像头视野中的时空关联，生成跨摄像头目标追踪信息；S5: Based on the target occlusion relationship and depth information, the graph convolutional neural network is used to integrate the data from multiple camera perspectives, and the spatiotemporal correlation of the target in the multi-camera field of view is established to generate cross-camera target tracking information;

S6：基于跨摄像头目标追踪信息，结合时间卷积神经网络和长短期记忆神经网络对目标模式进行学习，并使用单类支持向量机对异常行为进行识别、预警，生成异常行为识别与预警报告；S6: Based on cross-camera target tracking information, the target pattern is learned by combining the temporal convolutional neural network and the long short-term memory neural network, and a single-class support vector machine is used to identify and warn abnormal behaviors, and generate abnormal behavior identification and warning reports;

初步目标识别信息包括目标的位置、大小和形状信息，多尺度目标特征信息具体为多层次、多尺寸的目标特征描述，包括多尺度下目标的细节和结构信息，自适应尺度调整后的追踪信息具体为多尺度下匹配目标的追踪框及特征信息，目标遮挡关系与深度信息具体指包括遮挡目标的识别信息，以及目标在场景中的深度位置，跨摄像头目标追踪信息包括多摄像头间关键目标的位置、时间关联以及行为信息，异常行为识别与预警报告具体为发现的违法行为或异常行为，并对其时间、地点、性质进行描述和预警。Preliminary target recognition information includes the position, size and shape information of the target. Multi-scale target feature information is specifically a multi-level, multi-size target feature description, including the details and structural information of the target at multiple scales. The tracking information after adaptive scale adjustment is specifically the tracking frame and feature information of the matching target at multiple scales. The target occlusion relationship and depth information specifically refers to the recognition information of the occluded target and the depth position of the target in the scene. The cross-camera target tracking information includes the position, time correlation and behavior information of key targets between multiple cameras. The abnormal behavior recognition and warning report is specifically the discovered illegal or abnormal behavior, and its time, place and nature are described and warned.

通过利用智慧城市摄像机系统进行多目标追踪，能够实时监测和识别多个目标，提供丰富的目标位置、运动轨迹和行为模式的信息。该方法采用深度分离卷积神经网络和尺度不变特征转换等技术手段，提高追踪的准确性。同时，通过特征金字塔网络和注意力机制进行多尺度特征融合，加强对多尺寸目标的识别能力。为了解决目标遮挡问题，该方法使用掩膜区域卷积神经网络和单眼深度估计算法处理目标遮挡关系和场景深度信息。此外，通过图卷积神经网络整合多摄像头视角的数据，实现跨摄像头的目标追踪，获取更全面的目标信息。最后，结合时间卷积神经网络和长短期记忆神经网络对目标模式进行学习，使用单类支持向量机对异常行为进行识别和预警。By using the smart city camera system for multi-target tracking, multiple targets can be monitored and identified in real time, providing rich information on target locations, motion trajectories, and behavior patterns. This method uses deep separation convolutional neural networks and scale-invariant feature transformation to improve tracking accuracy. At the same time, multi-scale feature fusion is performed through feature pyramid networks and attention mechanisms to enhance the recognition ability of multi-size targets. In order to solve the problem of target occlusion, this method uses mask region convolutional neural networks and monocular depth estimation algorithms to process target occlusion relationships and scene depth information. In addition, the graph convolutional neural network integrates data from multiple camera perspectives to achieve cross-camera target tracking and obtain more comprehensive target information. Finally, the temporal convolutional neural network and long short-term memory neural network are combined to learn the target pattern, and a single-class support vector machine is used to identify and warn abnormal behaviors.

请参阅图2，基于监控图像，采用深度分离卷积神经网络和尺度不变特征转换进行图像处理和初步目标识别，并进行目标的初步检测，生成初步目标识别信息的步骤具体为：Please refer to Figure 2. Based on the surveillance image, the deep separation convolutional neural network and scale-invariant feature transformation are used to perform image processing and preliminary target recognition, and preliminary target detection is performed. The specific steps of generating preliminary target recognition information are as follows:

S102：基于增强后图像，采用深度可分离卷积算法提取基本特征，减少计算量并保持性能，生成基本特征图；S102: Based on the enhanced image, a depth-separable convolution algorithm is used to extract basic features, reduce the amount of calculation and maintain performance, and generate a basic feature map;

S103：基于基本特征图，采用多尺度空间金字塔池化增强尺度不变特征提取，生成矢量特征图；S103: Based on the basic feature map, multi-scale spatial pyramid pooling is used to enhance scale-invariant feature extraction to generate a vector feature map;

S104：基于矢量特征图，采用区域提议网络结合边框回归进行目标的初步检测，精确定位目标，生成初步目标识别信息。S104: Based on the vector feature map, a region proposal network is used in combination with bounding box regression to perform preliminary target detection, accurately locate the target, and generate preliminary target recognition information.

首先，对原始监控图像进行直方图均衡化和高斯模糊算法处理，以增强图像的对比度和清晰度，并减少噪声的影响。这样可以生成增强后的图像，为后续的目标识别提供更好的输入。First, the original surveillance image is processed by histogram equalization and Gaussian blur algorithm to enhance the contrast and clarity of the image and reduce the influence of noise. This can generate an enhanced image and provide better input for subsequent target recognition.

接下来，利用深度可分离卷积算法对增强后的图像进行特征提取。该算法将传统的卷积操作分解为深度卷积和逐点卷积两个步骤，减少了计算量的同时保持了较好的性能。通过这一步骤，可以生成包含基本特征信息的基本特征图。Next, the enhanced image is subjected to feature extraction using the depthwise separable convolution algorithm. This algorithm decomposes the traditional convolution operation into two steps: depthwise convolution and pointwise convolution, which reduces the amount of computation while maintaining good performance. Through this step, a basic feature map containing basic feature information can be generated.

为了增强对不同尺度目标的识别能力，采用多尺度空间金字塔池化方法对基本特征图进行处理。该方法在不同层级的特征图中提取尺度不变的特征，并将其融合在一起，生成矢量特征图。这样可以得到更丰富、更具判别性的特征表示。In order to enhance the recognition ability of objects of different scales, a multi-scale spatial pyramid pooling method is used to process the basic feature map. This method extracts scale-invariant features from feature maps at different levels and fuses them together to generate a vector feature map. This can obtain a richer and more discriminative feature representation.

基于生成的矢量特征图，使用区域提议网络结合边框回归进行目标的初步检测。区域提议网络能够生成一系列候选框，并通过边框回归对这些候选框进行调整和优化，以精确定位目标的位置和大小。通过这一步骤，可以生成初步目标识别信息，包括目标的位置、大小和形状等关键信息。Based on the generated vector feature map, the region proposal network is used in combination with bounding box regression to perform preliminary detection of the target. The region proposal network can generate a series of candidate boxes, and these candidate boxes are adjusted and optimized through bounding box regression to accurately locate the position and size of the target. Through this step, preliminary target recognition information can be generated, including key information such as the position, size and shape of the target.

在本发明中，在生成了初步目标识别信息之后，为了更好地锁定初步目标识别信息，提高所框选的目标的准确性，需要根据初步目标识别信息计算得到候选框内所框选的初步目标的准确性，也即计算框选的初步目标为最终目标的可能性，通过一目标吻合指数进行判断。In the present invention, after the preliminary target identification information is generated, in order to better lock the preliminary target identification information and improve the accuracy of the selected target, it is necessary to calculate the accuracy of the preliminary target selected in the candidate frame based on the preliminary target identification information, that is, calculate the possibility that the preliminary target selected is the final target, and judge it through a target matching index.

具体的，本发明还包括如下步骤：Specifically, the present invention further comprises the following steps:

S104a：基于所述初步目标识别信息，获取目标的位置、目标的大小以及目标的形状；S104a: Based on the preliminary target recognition information, obtain the position, size and shape of the target;

S104b：根据目标的位置、目标的大小以及目标的形状，在预设目标数据库中分别进行比对，以分别得到目标的位置项相似度、目标的大小项相似度以及目标的形状项相似度；S104b: performing comparisons in a preset target database according to the position of the target, the size of the target, and the shape of the target, respectively, to obtain the similarity of the position item of the target, the similarity of the size item of the target, and the similarity of the shape item of the target;

S104c：根据目标的位置项相似度、目标的大小项相似度以及目标的形状项相似度计算得到初步目标的目标吻合指数。S104c: Calculate the target matching index of the preliminary target according to the target position item similarity, the target size item similarity and the target shape item similarity.

在本步骤中，初步目标的目标吻合指数的计算公式表示为：In this step, the calculation formula of the target matching index of the preliminary target is expressed as:

W＝W₀+η_l·w_l+η_d·w_d+η_s·w_s；W＝W ₀ +η _l ·w _l +η _d ·w _d +η _s ·w _s ;

其中，W表示初步目标的目标吻合指数，W₀表示初步目标的目标吻合指数的基准值，η_l表示目标的位置项相似度，w_l表示目标的位置项的权重因子，η_d表示目标的大小项相似度，w_d表示目标的大小项的权重因子，η_s表示目标的形状项相似度，w_s表示目标的形状项的权重因子。Among them, W represents the target fit index of the preliminary target, _W0 represents the baseline value of the target fit index of the preliminary target, _ηl represents the similarity of the target's position item, _wl represents the weight factor of the target's position item, _ηd represents the similarity of the target's size item, _wd represents the weight factor of the target's size item, _ηs represents the similarity of the target's shape item, and _ws represents the weight factor of the target's shape item.

可以理解的，根据计算得到的初步目标的目标吻合指数，与预设的目标吻合指数阈值进行比较，便可将概率很低的初步目标进行筛选排除，从而提高初步目标为最终目标的可能性。It can be understood that by comparing the calculated target match index of the preliminary target with a preset target match index threshold, preliminary targets with very low probabilities can be screened out, thereby increasing the possibility that the preliminary target is the final target.

请参阅图3，基于初步目标识别信息，采用特征金字塔网络以及注意力机制进行多尺度特征融合，并加强对多尺寸目标的识别能力，生成多尺度目标特征信息的步骤具体为：Please refer to Figure 3. Based on the preliminary target recognition information, the feature pyramid network and attention mechanism are used to fuse multi-scale features and enhance the recognition ability of multi-size targets. The specific steps of generating multi-scale target feature information are as follows:

S201：基于初步目标识别信息，采用特征金字塔网络实现特征层间的融合，强化对多尺度目标的检测能力，生成融合特征图；S201: Based on the preliminary target recognition information, a feature pyramid network is used to achieve fusion between feature layers, enhance the detection capability of multi-scale targets, and generate a fusion feature map;

S202：基于融合特征图，采用注意力机制，增强特征的表达能力，生成注意力加权特征图；S202: Based on the fused feature map, an attention mechanism is used to enhance the expressiveness of the features and generate an attention-weighted feature map;

S203：基于注意力加权特征图，采用锚框优化算法进行锚框的优化调整，提升模型对多尺寸目标的拟合精度，生成优化锚框特征信息；S203: Based on the attention weighted feature map, an anchor frame optimization algorithm is used to optimize and adjust the anchor frame, improve the fitting accuracy of the model for multi-size targets, and generate optimized anchor frame feature information;

S204：基于优化锚框特征信息，通过级联卷积网络提炼特征，增强多尺度特征的表示力，生成多尺度目标特征信息。S204: Based on the optimized anchor frame feature information, features are refined through a cascaded convolutional network to enhance the representation of multi-scale features and generate multi-scale target feature information.

首先基于初步目标识别信息构建特征金字塔网络。特征金字塔是一种多尺度特征表示方法，通常包括多个分辨率的特征图。采用卷积神经网络(CNN)来提取每个分辨率的特征，并确保这些特征具有相同的语义信息。这样，可以获得不同尺度的特征表示，从而增强对多尺寸目标的检测能力。生成的融合特征图将包含来自不同尺度的信息。First, a feature pyramid network is constructed based on the preliminary target recognition information. Feature pyramid is a multi-scale feature representation method, which usually includes feature maps of multiple resolutions. A convolutional neural network (CNN) is used to extract features of each resolution and ensure that these features have the same semantic information. In this way, feature representations of different scales can be obtained, thereby enhancing the detection ability of multi-sized targets. The generated fused feature map will contain information from different scales.

基于融合特征图，引入注意力机制，以增强特征的表达能力。通过注意力机制，模型可以自动关注对目标识别最重要的部分。计算每个位置的注意力权重，然后将这些权重应用于融合特征图，生成注意力加权特征图。这有助于模型更好地聚焦在关键区域，提高了对多尺寸目标的识别能力。Based on the fused feature map, an attention mechanism is introduced to enhance the expressiveness of features. With the attention mechanism, the model can automatically focus on the most important parts for target recognition. The attention weights for each position are calculated, and then these weights are applied to the fused feature map to generate an attention-weighted feature map. This helps the model better focus on key areas and improves the recognition of multi-sized targets.

使用锚框优化算法对生成的注意力加权特征图进行锚框的优化调整。锚框是用于目标检测的预定义边界框，它们的大小和位置对于不同尺寸的目标非常关键。通过优化锚框，可以提升模型对多尺寸目标的拟合精度。这一步骤涉及到调整锚框的尺寸和位置，以使其更好地适应不同目标的尺寸和形状。The generated attention-weighted feature map is optimized using the anchor box optimization algorithm. Anchor boxes are predefined bounding boxes used for object detection, and their size and position are critical for objects of different sizes. By optimizing the anchor boxes, the model's fitting accuracy for objects of multiple sizes can be improved. This step involves adjusting the size and position of the anchor boxes to better adapt them to the sizes and shapes of different objects.

基于优化锚框特征信息，引入级联卷积网络，以进一步提炼特征并增强多尺度特征的表示能力。级联卷积网络是一系列卷积层的堆叠，用于捕获更高级的语义信息。这有助于进一步提高对多尺寸目标的识别性能。生成的多尺度目标特征信息将包括来自各个步骤的信息，提供了更全面的目标表示。Based on optimizing the anchor box feature information, a cascaded convolutional network is introduced to further refine the features and enhance the representation ability of multi-scale features. The cascaded convolutional network is a stack of convolutional layers to capture higher-level semantic information. This helps to further improve the recognition performance of multi-scale objects. The generated multi-scale object feature information will include information from each step, providing a more comprehensive object representation.

请参阅图4，基于多尺度目标特征信息，运用尺度估计模块和锚框机制进行自适应尺度调整，优化目标追踪框的大小，生成自适应尺度调整后的追踪信息的步骤具体为：Please refer to FIG. 4 . Based on the multi-scale target feature information, the scale estimation module and the anchor frame mechanism are used to perform adaptive scale adjustment, optimize the size of the target tracking frame, and generate the tracking information after adaptive scale adjustment. Specifically, the steps are as follows:

S301：基于多尺度目标特征信息，采用实时尺度估计算法对目标大小进行动态预测，适应目标尺度的变化，生成尺度调整信息；S301: Based on multi-scale target feature information, a real-time scale estimation algorithm is used to dynamically predict the target size, adapt to changes in the target scale, and generate scale adjustment information;

S302：基于尺度调整信息，采用锚框生成算法调整追踪框尺寸，使得追踪框贴合目标实际尺度，生成自适应尺度的追踪框；S302: Based on the scale adjustment information, an anchor frame generation algorithm is used to adjust the size of the tracking frame so that the tracking frame fits the actual scale of the target, and a tracking frame with an adaptive scale is generated;

S303：基于自适应尺度的追踪框，使用非极大值抑制算法消除重叠的追踪框，减少冗余，生成非极大值抑制后追踪框；S303: Based on the adaptive scale tracking frame, a non-maximum suppression algorithm is used to eliminate overlapping tracking frames, reduce redundancy, and generate a non-maximum suppressed tracking frame;

S304：基于非极大值抑制后追踪框，结合跟踪学习算法进行最终的目标追踪框定位，生成自适应尺度调整后的追踪信息。S304: Based on the tracking frame after non-maximum suppression, the final target tracking frame is located in combination with the tracking learning algorithm to generate tracking information after adaptive scale adjustment.

基于多尺度目标特征信息，采用实时尺度估计算法对目标大小进行动态预测。该算法可以根据当前帧的目标特征信息和历史帧的目标特征信息，预测目标在下一帧中的大小变化情况。通过这一步骤，可以生成尺度调整信息，用于后续的自适应尺度调整。Based on the multi-scale target feature information, a real-time scale estimation algorithm is used to dynamically predict the target size. The algorithm can predict the size change of the target in the next frame based on the target feature information of the current frame and the target feature information of the historical frame. Through this step, scale adjustment information can be generated for subsequent adaptive scale adjustment.

基于生成的尺度调整信息，采用锚框生成算法来调整追踪框的尺寸。该算法根据尺度调整信息，将追踪框的尺寸调整为与目标实际尺度相匹配的大小。通过这一步骤，可以生成自适应尺度的追踪框，以更好地适应目标尺度的变化。Based on the generated scale adjustment information, an anchor box generation algorithm is used to adjust the size of the tracking box. The algorithm adjusts the size of the tracking box to a size that matches the actual scale of the target based on the scale adjustment information. Through this step, an adaptive scale tracking box can be generated to better adapt to changes in the target scale.

基于生成的自适应尺度追踪框，使用非极大值抑制算法来消除重叠的追踪框。该算法通过比较不同追踪框的置信度，保留具有最高置信度的追踪框，并抑制其他重叠的追踪框。通过这一步骤，可以减少冗余的追踪框，提高追踪的准确性和效率。Based on the generated adaptive scale tracking frame, the non-maximum suppression algorithm is used to eliminate overlapping tracking frames. The algorithm compares the confidence of different tracking frames, retains the tracking frame with the highest confidence, and suppresses other overlapping tracking frames. Through this step, redundant tracking frames can be reduced and the accuracy and efficiency of tracking can be improved.

基于非极大值抑制后的追踪框，结合跟踪学习算法进行最终的目标追踪框定位。该算法利用前后帧之间的目标位置信息和运动信息，对追踪框进行进一步优化和定位。通过这一步骤，可以生成自适应尺度调整后的追踪信息，包括目标的位置、大小和形状等关键信息。Based on the tracking frame after non-maximum suppression, the final target tracking frame is located in combination with the tracking learning algorithm. The algorithm uses the target position information and motion information between the previous and next frames to further optimize and locate the tracking frame. Through this step, adaptively scaled tracking information can be generated, including key information such as the target's position, size, and shape.

请参阅图5，基于自适应尺度调整后的追踪信息，使用掩膜区域卷积神经网络和单眼深度估计算法处理目标遮挡问题，获取目标间的遮挡关系和场景深度信息，生成目标遮挡关系与深度信息的步骤具体为：Please refer to Figure 5. Based on the tracking information after adaptive scale adjustment, the mask area convolutional neural network and the monocular depth estimation algorithm are used to deal with the target occlusion problem, obtain the occlusion relationship between targets and the scene depth information, and the specific steps of generating the target occlusion relationship and depth information are as follows:

S401：基于自适应尺度调整后的追踪信息，采用特征金字塔网络增强的掩膜区域卷积神经网络进行目标实例分割，并准确判定遮挡边界，生成目标分割与遮挡边界信息；S401: Based on the tracking information after adaptive scale adjustment, a mask region convolutional neural network enhanced by a feature pyramid network is used to perform target instance segmentation, and occlusion boundaries are accurately determined to generate target segmentation and occlusion boundary information;

S402：基于目标分割与遮挡边界信息，利用深度学习的单眼深度估计方法，对场景进行三维重建，生成场景深度图；S402: Based on the target segmentation and occlusion boundary information, a monocular depth estimation method based on deep learning is used to perform three-dimensional reconstruction of the scene and generate a scene depth map;

S403：基于场景深度图，运用像素级融合技术处理，对目标间的遮挡关系进行解析，生成目标遮挡关系图；S403: Based on the scene depth map, pixel-level fusion technology is used to analyze the occlusion relationship between targets and generate a target occlusion relationship map;

S404：基于目标遮挡关系图，结合深度排序算法和遮挡处理技术，确定目标间的遮挡等级和深度，生成目标遮挡关系与深度信息。S404: Based on the target occlusion relationship graph, combined with the depth sorting algorithm and the occlusion processing technology, the occlusion level and depth between the targets are determined, and the target occlusion relationship and depth information are generated.

基于自适应尺度调整后的追踪信息，采用特征金字塔网络增强的掩膜区域卷积神经网络进行目标实例分割。该算法能够准确判定目标和背景之间的边界，并生成目标分割与遮挡边界信息。通过这一步骤，可以对目标进行精确的分割，并确定遮挡边界的位置。Based on the tracking information after adaptive scale adjustment, the mask region convolutional neural network enhanced by the feature pyramid network is used to segment the target instance. This algorithm can accurately determine the boundary between the target and the background and generate target segmentation and occlusion boundary information. Through this step, the target can be accurately segmented and the location of the occlusion boundary can be determined.

基于生成的目标分割与遮挡边界信息，利用深度学习的单眼深度估计方法对场景进行三维重建。该方法可以通过分析图像中的像素信息，推断出场景中物体的深度信息，并生成场景深度图。通过这一步骤，可以获得场景中各个物体的相对位置和深度信息。Based on the generated target segmentation and occlusion boundary information, the scene is reconstructed in 3D using a deep learning monocular depth estimation method. This method can infer the depth information of objects in the scene by analyzing the pixel information in the image and generate a scene depth map. Through this step, the relative position and depth information of each object in the scene can be obtained.

基于生成的场景深度图，运用像素级融合技术处理，对目标间的遮挡关系进行解析。该技术可以根据场景深度图中的深度信息，判断目标之间的遮挡关系，并生成目标遮挡关系图。通过这一步骤，可以准确地描述目标之间的遮挡情况。Based on the generated scene depth map, the pixel-level fusion technology is used to analyze the occlusion relationship between targets. This technology can determine the occlusion relationship between targets based on the depth information in the scene depth map and generate a target occlusion relationship map. Through this step, the occlusion situation between targets can be accurately described.

基于生成的目标遮挡关系图，结合深度排序算法和遮挡处理技术，确定目标间的遮挡等级和深度。该算法可以根据目标遮挡关系图中的信息，对目标进行排序，并确定每个目标的遮挡等级和深度。通过这一步骤，可以获得目标之间的遮挡关系和场景深度信息的详细描述。Based on the generated target occlusion relationship graph, the depth sorting algorithm and occlusion processing technology are combined to determine the occlusion level and depth between targets. The algorithm can sort the targets according to the information in the target occlusion relationship graph and determine the occlusion level and depth of each target. Through this step, a detailed description of the occlusion relationship between targets and the scene depth information can be obtained.

请参阅图6，基于目标遮挡关系与深度信息，通过图卷积神经网络整合多摄像头视角的数据，并建立目标在多摄像头视野中的时空关联，生成跨摄像头目标追踪信息的步骤具体为：Please refer to Figure 6. Based on the target occlusion relationship and depth information, the graph convolutional neural network is used to integrate the data of multiple camera perspectives, and the spatiotemporal correlation of the target in the multi-camera field of view is established. The specific steps for generating cross-camera target tracking information are as follows:

S501：基于目标遮挡关系与深度信息，应用图卷积神经网络，结合特征抽取和优化策略进行多视角数据整合，生成初步跨摄像头目标关联信息；S501: Based on the target occlusion relationship and depth information, a graph convolutional neural network is applied, combined with feature extraction and optimization strategies to integrate multi-view data and generate preliminary cross-camera target association information;

S502：基于初步跨摄像头目标关联信息，采用多摄像机几何校准方法，通过基于极线约束的校准技术，细化目标位置关系，生成多视角目标位置信息；S502: Based on the preliminary cross-camera target association information, a multi-camera geometric calibration method is used to refine the target position relationship through a calibration technology based on epipolar constraints to generate multi-view target position information;

S503：基于多视角目标位置信息，运用时间序列分析技术，通过动态时间规整进行时序数据分析，优化目标时空关联，生成时间优化后的目标关联信息；S503: Based on the multi-view target position information, using time series analysis technology, performing time series data analysis through dynamic time warping, optimizing the target spatiotemporal association, and generating time-optimized target association information;

S504：基于时间优化后的目标关联信息，整合多摄像头数据，通过多视角追踪融合算法完成时空关联，生成跨摄像头目标追踪信息。S504: Based on the time-optimized target association information, the multi-camera data is integrated, and the spatiotemporal association is completed through a multi-view tracking fusion algorithm to generate cross-camera target tracking information.

基于目标遮挡关系与深度信息，应用图卷积神经网络结合特征抽取和优化策略进行多视角数据整合。该算法能够将来自不同摄像头的视角数据进行融合，生成初步的跨摄像头目标关联信息。通过这一步骤，可以充分利用多个摄像头的视角信息，提高目标追踪的准确性和鲁棒性。Based on the target occlusion relationship and depth information, a graph convolutional neural network is used in combination with feature extraction and optimization strategies to integrate multi-view data. This algorithm can fuse the view data from different cameras and generate preliminary cross-camera target association information. Through this step, the view information of multiple cameras can be fully utilized to improve the accuracy and robustness of target tracking.

基于生成的初步跨摄像头目标关联信息，采用多摄像机几何校准方法，通过基于极线约束的校准技术，细化目标位置关系，生成多视角目标位置信息。该方法可以通过对多个摄像头之间的几何关系进行校准，进一步提高目标位置的准确性和一致性。Based on the generated preliminary cross-camera target association information, a multi-camera geometric calibration method is used to refine the target position relationship through the calibration technology based on epipolar constraints to generate multi-view target position information. This method can further improve the accuracy and consistency of target position by calibrating the geometric relationship between multiple cameras.

基于生成的多视角目标位置信息，运用时间序列分析技术，通过动态时间规整进行时序数据分析，优化目标时空关联，生成时间优化后的目标关联信息。该算法可以根据时间维度的信息，对目标的运动轨迹进行建模和预测，进一步提高目标追踪的准确性和稳定性。Based on the generated multi-view target position information, time series analysis technology is used to analyze time series data through dynamic time warping, optimize the target time-space association, and generate time-optimized target association information. The algorithm can model and predict the target's motion trajectory based on the information in the time dimension, further improving the accuracy and stability of target tracking.

基于生成的时间优化后的目标关联信息，整合多摄像头数据，通过多视角追踪融合算法完成时空关联，生成跨摄像头目标追踪信息。该算法可以将来自不同摄像头的目标关联信息进行融合，得到最终的跨摄像头目标追踪结果。通过这一步骤，可以实现对多个摄像头下的目标进行全局追踪和关联分析。Based on the generated time-optimized target association information, the multi-camera data is integrated, and the spatiotemporal association is completed through the multi-view tracking fusion algorithm to generate cross-camera target tracking information. This algorithm can fuse the target association information from different cameras to obtain the final cross-camera target tracking result. Through this step, global tracking and association analysis of targets under multiple cameras can be achieved.

请参阅图7，基于跨摄像头目标追踪信息，结合时间卷积神经网络和长短期记忆神经网络对目标模式进行学习，并使用单类支持向量机对异常行为进行识别、预警，生成异常行为识别与预警报告的步骤具体为：Please refer to Figure 7. Based on the cross-camera target tracking information, the target pattern is learned by combining the temporal convolutional neural network and the long short-term memory neural network, and the abnormal behavior is identified and warned using a single-class support vector machine. The specific steps of generating the abnormal behavior identification and warning report are as follows:

S601：基于跨摄像头目标追踪信息，结合时间卷积神经网络和长短期记忆网络进行深度特征学习，生成目标行为特征学习结果；S601: Based on the cross-camera target tracking information, a temporal convolutional neural network and a long short-term memory network are combined to perform deep feature learning to generate target behavior feature learning results;

S602：基于目标行为特征学习结果，采用特征聚类算法进行时空特征分析，提取目标行为模式，生成时空聚类后的行为特征；S602: Based on the target behavior feature learning result, a feature clustering algorithm is used to perform spatiotemporal feature analysis, extract the target behavior pattern, and generate spatiotemporal clustered behavior features;

S603：基于时空聚类后的行为特征，利用单类支持向量机进行异常行为的识别和建模，生成异常行为识别模型；S603: Based on the behavior characteristics after spatiotemporal clustering, a single-class support vector machine is used to identify and model abnormal behaviors to generate an abnormal behavior identification model;

S604：基于异常行为识别模型，结合实时监控数据，进行异常行为预警，生成异常行为识别与预警报告。S604: Based on the abnormal behavior recognition model and in combination with real-time monitoring data, abnormal behavior warning is performed to generate an abnormal behavior recognition and warning report.

在本实施例中，基于上述的异常行为识别模型，会对监控数据进行分析进而分类，通过计算每一类的异常行为对应的异常指数来计算得到对应的异常行为综合值。具体的，异常行为综合值的计算公式表示为：In this embodiment, based on the above abnormal behavior recognition model, the monitoring data will be analyzed and classified, and the corresponding abnormal behavior comprehensive value will be calculated by calculating the abnormal index corresponding to each type of abnormal behavior. Specifically, the calculation formula of the abnormal behavior comprehensive value is expressed as:

其中，Y表示异常行为综合值，Y₀表示异常行为综合值的基准值，λ_i表示第i类异常行为的权重因子，ε_i表示第i类异常行为的校准因子，y_i表示第i类异常行为的异常指数，表示第i类异常行为的基准异常指数，I表示异常行为的最大类别数量。Where Y represents the comprehensive value of abnormal behavior, _Y0 represents the reference value of the comprehensive value of abnormal behavior, _λi represents the weight factor of the i-th abnormal behavior, _εi represents the calibration factor of the i-th abnormal behavior, _yi represents the abnormal index of the i-th abnormal behavior, represents the baseline anomaly index of the i-th type of abnormal behavior, and I represents the maximum number of categories of abnormal behavior.

可以理解的，当计算得到的异常行为综合值大于预设异常行为阈值，则此时可触发预警，协助尽早地发现险情并及时作出预案。It can be understood that when the calculated comprehensive value of abnormal behavior is greater than the preset abnormal behavior threshold, an early warning can be triggered to assist in discovering dangerous situations as early as possible and making emergency plans in a timely manner.

基于跨摄像头目标追踪信息，结合时间卷积神经网络和长短期记忆网络进行深度特征学习。该算法能够对目标的行为模式进行学习和提取，生成目标行为特征学习结果。通过这一步骤，可以获取目标在不同时间和空间下的行为特征，为后续的异常行为识别提供基础。Based on cross-camera target tracking information, deep feature learning is performed by combining temporal convolutional neural networks and long short-term memory networks. This algorithm can learn and extract the target's behavior patterns and generate target behavior feature learning results. Through this step, the target's behavior characteristics in different time and space can be obtained, providing a basis for subsequent abnormal behavior identification.

基于生成的目标行为特征学习结果，采用特征聚类算法进行时空特征分析，提取目标行为模式，生成时空聚类后的行为特征。该算法可以将具有相似行为模式的目标进行聚类，得到不同类别的目标行为特征。通过这一步骤，可以进一步细化目标行为特征，提高异常行为的识别准确性。Based on the generated target behavior feature learning results, a feature clustering algorithm is used to perform spatiotemporal feature analysis, extract target behavior patterns, and generate spatiotemporal clustered behavior features. This algorithm can cluster targets with similar behavior patterns to obtain target behavior features of different categories. Through this step, the target behavior features can be further refined to improve the accuracy of abnormal behavior recognition.

基于生成的时空聚类后的行为特征，利用单类支持向量机进行异常行为的识别和建模，生成异常行为识别模型。该算法可以通过训练一个二分类器来区分正常行为和异常行为，并生成相应的异常行为识别模型。通过这一步骤，可以实现对异常行为的自动识别和预警。Based on the generated spatiotemporal clustering behavior features, a single-class support vector machine is used to identify and model abnormal behaviors and generate an abnormal behavior recognition model. This algorithm can distinguish between normal and abnormal behaviors by training a binary classifier and generate a corresponding abnormal behavior recognition model. Through this step, automatic identification and early warning of abnormal behaviors can be achieved.

基于生成的异常行为识别模型，结合实时监控数据，进行异常行为预警，生成异常行为识别与预警报告。该算法可以根据实时监控数据，对目标的行为进行实时分析和预测，一旦发现异常行为，即可发出预警信号，并生成相应的异常行为识别与预警报告。Based on the generated abnormal behavior recognition model, combined with real-time monitoring data, abnormal behavior warning is carried out, and abnormal behavior recognition and warning reports are generated. The algorithm can analyze and predict the target's behavior in real time based on real-time monitoring data. Once abnormal behavior is found, a warning signal can be issued and a corresponding abnormal behavior recognition and warning report can be generated.

请参阅图8，智慧城市摄像机多目标追踪系统，智慧城市摄像机多目标追踪系统用于执行上述智慧城市摄像机多目标追踪方法，系统包括图像预处理模块、基本特征提取模块、初步目标识别模块、特征增强与优化模块、目标追踪框预处理模块、目标关联模块、行为学习与预警模块。Please refer to Figure 8, a smart city camera multi-target tracking system, which is used to execute the above-mentioned smart city camera multi-target tracking method. The system includes an image preprocessing module, a basic feature extraction module, a preliminary target recognition module, a feature enhancement and optimization module, a target tracking frame preprocessing module, a target association module, and a behavior learning and early warning module.

图像预处理模块基于原始监控图像，采用直方图均衡化和高斯模糊算法，进行图像增强和噪声降低，生成增强后图像；The image preprocessing module uses histogram equalization and Gaussian blur algorithm to perform image enhancement and noise reduction based on the original surveillance image to generate an enhanced image;

基本特征提取模块基于增强后图像，提取基本特征，采用深度可分离卷积算法，减少计算量并保持性能，生成基本特征图；The basic feature extraction module extracts basic features based on the enhanced image, adopts the depth-separable convolution algorithm to reduce the amount of calculation and maintain performance, and generates a basic feature map;

初步目标识别模块基于基本特征图，进行初步目标识别，采用多尺度空间金字塔池化以增强尺度不变特征提取，并采用区域提议网络结合边框回归，定位目标，生成初步目标识别信息；The preliminary target recognition module performs preliminary target recognition based on the basic feature map, uses multi-scale spatial pyramid pooling to enhance scale-invariant feature extraction, and uses a region proposal network combined with bounding box regression to locate the target and generate preliminary target recognition information;

特征增强与优化模块基于初步目标识别信息，进行特征增强并优化，使用特征金字塔网络实现特征层间的融合并采用注意力机制以增强特征的表达能力，通过采用锚框优化算法进行框的优化调整，并通过级联卷积网络提炼特征，生成多尺度目标特征信息；The feature enhancement and optimization module performs feature enhancement and optimization based on the preliminary target recognition information. It uses the feature pyramid network to achieve the fusion between feature layers and adopts the attention mechanism to enhance the expressiveness of features. It optimizes the frame by using the anchor frame optimization algorithm and refines the features through the cascade convolutional network to generate multi-scale target feature information.

目标追踪框预处理模块基于多尺度目标特征信息，处理追踪框，采用实时尺度估计算法对目标大小进行动态预测，通过锚框生成算法调整追踪框尺寸，并结合非极大值抑制算法消除重叠的追踪框，生成自适应尺度调整后的追踪信息；The target tracking frame preprocessing module processes the tracking frame based on multi-scale target feature information, uses a real-time scale estimation algorithm to dynamically predict the target size, adjusts the tracking frame size through an anchor frame generation algorithm, and combines a non-maximum suppression algorithm to eliminate overlapping tracking frames, generating tracking information after adaptive scale adjustment;

目标关联模块基于自适应尺度调整后的追踪信息，运行目标关联算法，采用特征金字塔网络增强的掩膜区域卷积神经网络进行目标实例分割，通过深度学习的单眼深度估计方法对场景进行三维重建，使用像素级融合技术处理，并通过深度排序算法和遮挡处理技术确定目标间的遮挡等级和深度，生成目标遮挡关系与深度信息；The target association module runs the target association algorithm based on the tracking information after adaptive scale adjustment, uses the mask area convolutional neural network enhanced by the feature pyramid network to perform target instance segmentation, reconstructs the scene in three dimensions through the monocular depth estimation method based on deep learning, uses pixel-level fusion technology to process, and determines the occlusion level and depth between targets through the depth sorting algorithm and occlusion processing technology, and generates target occlusion relationship and depth information;

行为学习与预警模块基于目标遮挡关系与深度信息进行行为学习与预警，使用时间卷积神经网络和长短期记忆网络进行深度特征学习，通过特征聚类算法进行时空特征分析，利用单类支持向量机进行异常行为的识别和建模，结合实时监控数据进行异常行为预警，生成异常行为识别与预警报告。The behavior learning and warning module performs behavior learning and warning based on target occlusion relationships and depth information, uses temporal convolutional neural networks and long short-term memory networks for deep feature learning, performs spatiotemporal feature analysis through feature clustering algorithms, and uses single-class support vector machines to identify and model abnormal behaviors. It combines real-time monitoring data to perform abnormal behavior warnings and generate abnormal behavior identification and warning reports.

图像预处理模块采用直方图均衡化和高斯模糊算法进行图像增强和噪声降低，可以有效提高后续特征提取的准确性和稳定性。基本特征提取模块采用深度可分离卷积算法，减少计算量的同时保持了性能，提高了特征提取的效率。初步目标识别模块采用多尺度空间金字塔池化和区域提议网络结合边框回归的方法，能够准确地定位和识别目标，为后续的目标追踪提供了可靠的基础。The image preprocessing module uses histogram equalization and Gaussian blur algorithms for image enhancement and noise reduction, which can effectively improve the accuracy and stability of subsequent feature extraction. The basic feature extraction module uses a deep separable convolution algorithm to reduce the amount of calculation while maintaining performance and improving the efficiency of feature extraction. The preliminary target recognition module uses a multi-scale spatial pyramid pooling and region proposal network combined with a bounding box regression method to accurately locate and identify targets, providing a reliable foundation for subsequent target tracking.

此外，特征增强与优化模块使用特征金字塔网络实现特征层间的融合，并引入注意力机制来增强特征的表达能力，进一步提高了目标识别的准确性。In addition, the feature enhancement and optimization module uses a feature pyramid network to achieve fusion between feature layers and introduces an attention mechanism to enhance the expressiveness of features, further improving the accuracy of target recognition.

目标追踪框预处理模块采用实时尺度估计算法对目标大小进行动态预测，并通过锚框生成算法调整追踪框尺寸，结合非极大值抑制算法消除重叠的追踪框，实现了自适应尺度调整后的追踪信息生成。The target tracking frame preprocessing module uses a real-time scale estimation algorithm to dynamically predict the target size, and adjusts the tracking frame size through the anchor frame generation algorithm. It combines the non-maximum suppression algorithm to eliminate overlapping tracking frames, and realizes the generation of tracking information after adaptive scale adjustment.

目标关联模块利用特征金字塔网络增强的掩膜区域卷积神经网络进行目标实例分割，通过深度学习的单眼深度估计方法对场景进行三维重建，使用像素级融合技术处理，并通过深度排序算法和遮挡处理技术确定目标间的遮挡等级和深度，生成目标遮挡关系与深度信息。The target association module uses a mask area convolutional neural network enhanced by a feature pyramid network to perform target instance segmentation, reconstructs the scene in three dimensions through a deep learning monocular depth estimation method, processes it using pixel-level fusion technology, and determines the occlusion level and depth between targets through a depth sorting algorithm and occlusion processing technology, generating target occlusion relationships and depth information.

行为学习与预警模块利用时间卷积神经网络和长短期记忆网络进行深度特征学习，通过特征聚类算法进行时空特征分析，利用单类支持向量机进行异常行为的识别和建模，结合实时监控数据进行异常行为预警，生成异常行为识别与预警报告。The behavior learning and warning module uses temporal convolutional neural networks and long short-term memory networks for deep feature learning, performs spatiotemporal feature analysis through feature clustering algorithms, uses single-class support vector machines to identify and model abnormal behaviors, combines real-time monitoring data for abnormal behavior warnings, and generates abnormal behavior identification and warning reports.

请参阅图9，图像预处理模块包括噪声降低子模块、图像增强子模块、过滤异常子模块、图像标准化子模块；Please refer to FIG9 , the image preprocessing module includes a noise reduction submodule, an image enhancement submodule, an abnormality filtering submodule, and an image normalization submodule;

基本特征提取模块包括深度卷积子模块、特征提取子模块、特征压缩子模块、特征噪声筛选子模块；The basic feature extraction module includes a deep convolution submodule, a feature extraction submodule, a feature compression submodule, and a feature noise screening submodule;

初步目标识别模块包括目标定位子模块、目标检测子模块、目标筛选子模块、目标确认子模块；The preliminary target recognition module includes a target positioning submodule, a target detection submodule, a target screening submodule, and a target confirmation submodule;

特征增强与优化模块包括特征融合子模块、特征注意力加权子模块、锚框优化子模块、特征提炼子模块；The feature enhancement and optimization module includes feature fusion submodule, feature attention weighting submodule, anchor box optimization submodule, and feature extraction submodule;

目标追踪框预处理模块包括实时尺度估计子模块、锚框生成子模块、非极大值抑制子模块、目标追踪框定位子模块；The target tracking frame preprocessing module includes a real-time scale estimation submodule, an anchor frame generation submodule, a non-maximum suppression submodule, and a target tracking frame positioning submodule;

目标关联模块包括目标实例分割子模块、场景三维重建子模块、目标遮挡关系解析子模块、遮挡等级和深度判定子模块；The target association module includes a target instance segmentation submodule, a scene 3D reconstruction submodule, a target occlusion relationship analysis submodule, and an occlusion level and depth determination submodule;

行为学习与预警模块包括深度特征学习子模块、时空特征分析子模块、异常行为识别子模块、实时异常预警子模块。The behavior learning and warning module includes a deep feature learning submodule, a spatiotemporal feature analysis submodule, an abnormal behavior recognition submodule, and a real-time abnormal warning submodule.

图像预处理模块中，噪声降低子模块采用高斯模糊算法对原始监控图像进行噪声降低；图像增强子模块采用直方图均衡化算法对图像进行增强；过滤异常子模块用于去除图像中的异常值；图像标准化子模块将图像数据进行标准化处理。In the image preprocessing module, the noise reduction submodule uses the Gaussian blur algorithm to reduce the noise of the original monitoring image; the image enhancement submodule uses the histogram equalization algorithm to enhance the image; the abnormal filtering submodule is used to remove abnormal values in the image; and the image normalization submodule normalizes the image data.

基本特征提取模块中，深度卷积子模块采用深度可分离卷积算法提取基本特征；特征提取子模块用于从增强后图像中提取关键特征；特征压缩子模块通过减少特征维度来降低计算量；特征噪声筛选子模块用于筛选出有效的特征。In the basic feature extraction module, the deep convolution submodule uses the deep separable convolution algorithm to extract basic features; the feature extraction submodule is used to extract key features from the enhanced image; the feature compression submodule reduces the amount of calculation by reducing the feature dimension; and the feature noise screening submodule is used to screen out effective features.

初步目标识别模块中，目标定位子模块使用区域提议网络结合边框回归来定位目标；目标检测子模块用于检测出多个目标；目标筛选子模块根据预设的阈值和置信度筛选出可靠的目标；目标确认子模块用于确认目标的身份信息。In the preliminary target recognition module, the target localization submodule uses a region proposal network combined with bounding box regression to locate the target; the target detection submodule is used to detect multiple targets; the target screening submodule screens out reliable targets based on preset thresholds and confidence levels; and the target confirmation submodule is used to confirm the identity information of the target.

特征增强与优化模块中，特征融合子模块使用特征金字塔网络实现特征层间的融合；特征注意力加权子模块通过引入注意力机制来增强特征的表达能力；锚框优化子模块通过调整追踪框尺寸来优化目标的位置估计；特征提炼子模块通过级联卷积网络进一步提炼特征。In the feature enhancement and optimization module, the feature fusion submodule uses the feature pyramid network to achieve fusion between feature layers; the feature attention weighting submodule enhances the expressiveness of features by introducing the attention mechanism; the anchor box optimization submodule optimizes the target position estimation by adjusting the tracking box size; the feature extraction submodule further refines features through a cascaded convolutional network.

目标追踪框预处理模块中，实时尺度估计子模块采用实时尺度估计算法对目标大小进行动态预测；锚框生成子模块通过锚框生成算法调整追踪框尺寸；非极大值抑制子模块用于消除重叠的追踪框；目标追踪框定位子模块用于确定追踪框的位置。In the target tracking frame preprocessing module, the real-time scale estimation submodule uses a real-time scale estimation algorithm to dynamically predict the target size; the anchor frame generation submodule adjusts the tracking frame size through the anchor frame generation algorithm; the non-maximum suppression submodule is used to eliminate overlapping tracking frames; and the target tracking frame positioning submodule is used to determine the position of the tracking frame.

目标关联模块中，目标实例分割子模块使用特征金字塔网络增强的掩膜区域卷积神经网络进行目标实例分割；场景三维重建子模块通过深度学习的单眼深度估计方法对场景进行三维重建；目标遮挡关系解析子模块使用像素级融合技术处理，并通过深度排序算法和遮挡处理技术确定目标间的遮挡等级和深度；遮挡等级和深度判定子模块用于判断目标之间的遮挡关系和深度信息。In the target association module, the target instance segmentation submodule uses the mask area convolutional neural network enhanced by the feature pyramid network to perform target instance segmentation; the scene 3D reconstruction submodule reconstructs the scene in 3D through the monocular depth estimation method based on deep learning; the target occlusion relationship analysis submodule uses pixel-level fusion technology to process, and determines the occlusion level and depth between targets through the depth sorting algorithm and occlusion processing technology; the occlusion level and depth determination submodule is used to determine the occlusion relationship and depth information between targets.

行为学习与预警模块中，深度特征学习子模块使用时间卷积神经网络和长短期记忆网络进行深度特征学习；时空特征分析子模块通过特征聚类算法进行时空特征分析；异常行为识别子模块利用单类支持向量机进行异常行为的识别和建模；实时异常预警子模块结合实时监控数据进行异常行为预警，并生成异常行为识别与预警报告。In the behavior learning and warning module, the deep feature learning submodule uses temporal convolutional neural networks and long short-term memory networks for deep feature learning; the spatiotemporal feature analysis submodule performs spatiotemporal feature analysis through feature clustering algorithms; the abnormal behavior recognition submodule uses a single-class support vector machine to identify and model abnormal behaviors; the real-time abnormal warning submodule combines real-time monitoring data to warn of abnormal behaviors and generates abnormal behavior recognition and warning reports.

以上，仅是本发明的较佳实施例而已，并非对本发明作其他形式的限制，任何熟悉本专业的技术人员可能利用上述揭示的技术内容加以变更或改型为等同变化的等效实施例应用于其他领域，但是凡是未脱离本发明技术方案内容，依据本发明的技术实质对以上实施例所做的任何简单修改、等同变化与改型，仍属于本发明技术方案的保护范围。The above are only preferred embodiments of the present invention and are not intended to limit the present invention in other forms. Any technician familiar with the profession may use the technical contents disclosed above to change or modify them into equivalent embodiments with equivalent changes and apply them to other fields. However, any simple modification, equivalent change and modification made to the above embodiments based on the technical essence of the present invention without departing from the technical solution of the present invention still falls within the protection scope of the technical solution of the present invention.

Claims

1. The multi-target tracking method for the smart city camera is characterized by comprising the following steps of:

Based on the monitoring image, performing image processing and preliminary target identification by adopting a depth separation convolutional neural network and scale invariant feature transformation, and performing preliminary detection of targets to generate preliminary target identification information;

based on the preliminary target identification information, carrying out multi-scale feature fusion by adopting a feature pyramid network and an attention mechanism, enhancing the identification capability of multi-size targets, and generating multi-scale target feature information;

Based on the multi-scale target characteristic information, performing self-adaptive scale adjustment by using a scale estimation module and an anchor frame mechanism, optimizing the size of a target tracking frame, and generating tracking information after the self-adaptive scale adjustment;

based on the tracking information after the self-adaptive scale adjustment, a mask region convolutional neural network and a monocular depth estimation algorithm are used for processing a target shielding problem, shielding relation and scene depth information between targets are obtained, and target shielding relation and depth information are generated;

integrating the data of multiple camera view angles through a graph convolution neural network based on the target shielding relation and the depth information, and establishing space-time correlation of targets in the multiple camera view fields to generate target tracking information crossing cameras;

Based on the target tracking information of the cross-camera, learning a target mode by combining a time convolution neural network and a long-term and short-term memory neural network, and identifying and early warning abnormal behaviors by using a single support vector machine to generate an abnormal behavior identification and early warning report;

The preliminary target identification information comprises position, size and shape information of a target, the multi-scale target characteristic information is specifically multi-level and multi-scale target characteristic description, the multi-scale target characteristic information comprises details and structure information of the target under the multi-scale, the tracking information after the self-adaptive scale adjustment is specifically tracking frames and characteristic information of the matched target under the multi-scale, the target shielding relation and depth information is specifically identification information comprising shielding targets and depth positions of the targets in a scene, the cross-camera target tracking information comprises positions, time relations and behavior information of key targets among multiple cameras, and the abnormal behavior identification and early warning report is specifically found illegal behaviors or abnormal behaviors, and describes and early warns time, place and property of the illegal behaviors or abnormal behaviors.

2. The smart city camera multi-target tracking method of claim 1, wherein the steps of performing image processing and preliminary target recognition using a depth-separation convolutional neural network and scale-invariant feature transform based on the monitored image, and performing preliminary detection of the target, and generating preliminary target recognition information are specifically as follows:

based on the original monitoring image, carrying out image enhancement and noise reduction by adopting a histogram equalization and Gaussian blur algorithm to generate an enhanced image;

based on the enhanced image, extracting basic features by adopting a depth separable convolution algorithm, reducing calculated amount and maintaining performance, and generating a basic feature map;

Based on the basic feature map, adopting a multi-scale space pyramid pooling enhancement scale invariant feature extraction to generate a vector feature map;

based on the vector feature map, the area proposal network is adopted to combine with frame regression to perform preliminary detection of the target, the target is accurately positioned, and preliminary target identification information is generated.

3. The smart city camera multi-target tracking method of claim 1, wherein the step of generating multi-scale target feature information by performing multi-scale feature fusion using a feature pyramid network and an attention mechanism based on the preliminary target identification information and enhancing the identification capability of multi-scale targets comprises the steps of:

based on the preliminary target identification information, realizing fusion between feature layers by adopting a feature pyramid network, enhancing the detection capability of a multi-scale target, and generating a fusion feature map;

based on the fusion feature map, adopting an attention mechanism to enhance the expression capacity of the features and generating an attention weighted feature map;

based on the attention weighted feature map, an anchor frame optimization algorithm is adopted to perform optimization adjustment of the anchor frame, fitting precision of the model to the multi-size targets is improved, and optimized anchor frame feature information is generated;

And extracting the characteristics through a cascade convolution network based on the optimized anchor frame characteristic information, enhancing the representation force of the multi-scale characteristics, and generating multi-scale target characteristic information.

4. The smart city camera multi-target tracking method according to claim 1, wherein the step of adaptively scaling the size of the target tracking frame and optimizing the size of the target tracking frame based on the multi-scale target feature information by using a scale estimation module and an anchor frame mechanism, and generating the tracking information after adaptively scaling is specifically as follows:

Based on the multi-scale target characteristic information, adopting a real-time scale estimation algorithm to dynamically predict the size of the target, adapting to the change of the target scale and generating scale adjustment information;

Based on the scale adjustment information, adjusting the size of the tracking frame by adopting an anchor frame generation algorithm, enabling the tracking frame to be attached to the actual scale of the target, and generating a tracking frame with a self-adaptive scale;

based on the tracking frames of the self-adaptive scale, using a non-maximum suppression algorithm to eliminate overlapped tracking frames, reducing redundancy and generating tracking frames after non-maximum suppression;

and based on the tracking frame after non-maximum suppression, carrying out final target tracking frame positioning by combining a tracking learning algorithm, and generating tracking information after self-adaptive scale adjustment.

5. The smart city camera multi-target tracking method according to claim 1, wherein the step of processing a target occlusion problem using a mask area convolutional neural network and a monocular depth estimation algorithm based on the tracking information after the adaptive scale adjustment to obtain an occlusion relationship between targets and scene depth information, and generating the target occlusion relationship and the depth information is specifically as follows:

Based on the tracking information after the self-adaptive scale adjustment, a mask area convolutional neural network enhanced by a characteristic pyramid network is adopted to segment a target instance, and an occlusion boundary is accurately judged to generate target segmentation and occlusion boundary information;

based on the target segmentation and shielding boundary information, performing three-dimensional reconstruction on the scene by using a monocular depth estimation method of depth learning to generate a scene depth map;

Based on the scene depth map, analyzing the shielding relation among targets by using pixel-level fusion technology processing to generate a target shielding relation map;

And determining the shielding level and depth between targets based on the target shielding relation graph by combining a depth ordering algorithm and a shielding processing technology, and generating target shielding relation and depth information.

6. The smart city camera multi-target tracking method of claim 1, wherein integrating data of multiple camera angles of view through a graph convolutional neural network based on the target occlusion relationship and depth information, and establishing space-time correlation of targets in multiple camera fields of view, the step of generating inter-camera target tracking information specifically comprises:

Based on the target shielding relation and the depth information, a graph convolutional neural network is applied, and multi-view data integration is carried out by combining feature extraction and optimization strategies, so that preliminary cross-camera target association information is generated;

based on the preliminary cross-camera target association information, a multi-camera geometric calibration method is adopted, and the target position relation is refined through a calibration technology based on polar line constraint, so that multi-view target position information is generated;

based on the multi-view target position information, time sequence analysis technology is applied, time sequence data analysis is carried out through dynamic time warping, target space-time correlation is optimized, and time optimized target correlation information is generated;

And integrating the multi-camera data based on the time-optimized target association information, and completing space-time association through a multi-view tracking fusion algorithm to generate inter-camera target tracking information.

7. The smart city camera multi-target tracking method according to claim 1, wherein based on the cross-camera target tracking information, a target mode is learned by combining a time convolution neural network and a long-term and short-term memory neural network, and abnormal behaviors are identified and early-warned by using a single support vector machine, and the steps of generating an abnormal behavior identification and early-warning report are specifically as follows:

based on the target tracking information of the cross-camera, performing deep feature learning by combining a time convolution neural network and a long-term and short-term memory network, and generating a target behavior feature learning result;

based on the target behavior feature learning result, performing space-time feature analysis by adopting a feature clustering algorithm, extracting a target behavior mode, and generating behavior features after space-time clustering;

based on the behavior characteristics after space-time clustering, carrying out recognition and modeling of abnormal behaviors by using a single-class support vector machine, and generating an abnormal behavior recognition model;

And based on the abnormal behavior recognition model, carrying out abnormal behavior early warning by combining real-time monitoring data, and generating abnormal behavior recognition and early warning reports.

8. The smart city camera multi-target tracking system according to any one of claims 1-7, wherein the system comprises an image preprocessing module, a basic feature extraction module, a preliminary target recognition module, a feature enhancement and optimization module, a target tracking frame preprocessing module, a target association module, and a behavior learning and early warning module.

9. The smart city camera multi-target tracking system of claim 8, wherein the image pre-processing module performs image enhancement and noise reduction using histogram equalization and gaussian blur algorithm based on the original monitored image to generate an enhanced image;

The basic feature extraction module extracts basic features based on the enhanced image, adopts a depth separable convolution algorithm, reduces calculated amount and maintains performance, and generates a basic feature map;

The preliminary target recognition module carries out preliminary target recognition based on the basic feature map, adopts multi-scale space pyramid pooling to enhance scale invariant feature extraction, adopts a region proposal network to combine with frame regression, positions a target and generates preliminary target recognition information;

The feature enhancement and optimization module performs feature enhancement and optimization based on preliminary target identification information, realizes fusion between feature layers by using a feature pyramid network, adopts a attention mechanism to enhance the expression capability of features, performs optimization adjustment of frames by adopting an anchor frame optimization algorithm, refines the features by using a cascade convolution network, and generates multi-scale target feature information;

The target tracking frame preprocessing module processes the tracking frame based on the multi-scale target characteristic information, dynamically predicts the size of the target by adopting a real-time scale estimation algorithm, adjusts the size of the tracking frame by adopting an anchor frame generation algorithm, and eliminates overlapped tracking frames by combining a non-maximum suppression algorithm to generate tracking information after self-adaptive scale adjustment;

The target association module runs a target association algorithm based on tracking information after self-adaptive scale adjustment, performs target instance segmentation by adopting a mask region convolutional neural network enhanced by a feature pyramid network, performs three-dimensional reconstruction on a scene by a monocular depth estimation method of deep learning, processes by using a pixel level fusion technology, determines the shielding level and depth between targets by a depth ordering algorithm and a shielding processing technology, and generates a target shielding relation and depth information;

The behavior learning and early warning module performs behavior learning and early warning based on the target shielding relation and the depth information, performs deep feature learning by using a time convolution neural network and a long-term and short-term memory network, performs space-time feature analysis by using a feature clustering algorithm, performs abnormal behavior recognition and modeling by using a single support vector machine, performs abnormal behavior early warning by combining real-time monitoring data, and generates an abnormal behavior recognition and early warning report.

10. The smart city camera multi-target tracking system of claim 8, wherein the image pre-processing module comprises a noise reduction sub-module, an image enhancer sub-module, a filtering anomaly sub-module, an image normalization sub-module;

The basic feature extraction module comprises a depth convolution sub-module, a feature extraction sub-module, a feature compression sub-module and a feature noise screening sub-module;

The preliminary target recognition module comprises a target positioning sub-module, a target detection sub-module, a target screening sub-module and a target confirmation sub-module;

The feature enhancement and optimization module comprises a feature fusion sub-module, a feature attention weighting sub-module, an anchor frame optimization sub-module and a feature extraction sub-module;

the target tracking frame preprocessing module comprises a real-time scale estimation sub-module, an anchor frame generation sub-module, a non-maximum value suppression sub-module and a target tracking frame positioning sub-module;

the target association module comprises a target instance segmentation sub-module, a scene three-dimensional reconstruction sub-module, a target occlusion relationship analysis sub-module, an occlusion level and depth judgment sub-module;

The behavior learning and early warning module comprises a deep feature learning sub-module, a space-time feature analysis sub-module, an abnormal behavior identification sub-module and a real-time abnormal early warning sub-module.