
CN118334743A - Method for detecting stay behavior of personnel in public place - Google Patents


Info

Publication number
CN118334743A
CN118334743A (application CN202410478030.7A)
Authority
CN
China
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410478030.7A
Other languages
Chinese (zh)
Inventor
王晔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Luopan Network Technology Co ltd
Original Assignee
Suzhou Luopan Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Luopan Network Technology Co ltd filed Critical Suzhou Luopan Network Technology Co ltd
Priority to CN202410478030.7A priority Critical patent/CN118334743A/en
Publication of CN118334743A publication Critical patent/CN118334743A/en
Pending legal-status Critical Current


Classifications

    • G06V 40/20 Recognition of biometric, human-related or animal-related patterns in image or video data; movements or behaviour, e.g. gesture recognition
    • G06N 3/0464 Computing arrangements based on biological models; neural networks; convolutional networks [CNN, ConvNet]
    • G06T 7/246 Image analysis; analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning; classification, e.g. of video objects
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning; neural networks
    • G06V 20/40 Scenes; scene-specific elements in video content


Abstract

The invention discloses a method for detecting personnel stay behaviors in public places, which relates to the technical field of public behavior detection and comprises the following steps: raw data: installing cameras in a public place to continuously capture video streams, obtaining original video data, transmitting the original video data to a preprocessing center through a network, and carrying out encrypted storage and access control on the original video data to obtain preprocessed target data. Based on deep-learning technology, the invention adopts the YOLOv5 model to detect, identify, screen, mark and track target pedestrians captured by the cameras installed in the public place, accurately identifies and tracks the abnormal stay and loitering behaviors of suspicious personnel, further monitors and processes the target data through the ReID technique, and determines the tracks of the associated target pedestrians in the multi-target tracking information in combination with the Deep-SORT algorithm, thereby realizing the functions of automatically extracting and analyzing personnel behavior-feature information and motion trajectories.

Description

Method for detecting stay behavior of personnel in public place
Technical Field
The invention relates to the technical field of public behavior detection, in particular to a method for detecting stay behaviors of people in public places.
Background
The personnel stay algorithm is mainly used for judging and analyzing the stay time of personnel in a specific area. Currently, conventional technologies within the domestic and foreign industries include both video-based analysis and sensor-based technologies.
Personnel stay algorithms based on video analysis typically use computer vision techniques and deep learning algorithms to detect and track personnel; by analyzing video frames, the entry and exit times of personnel can be detected, and the stay time calculated from them. This approach can directly use existing surveillance cameras without additional hardware, but video analysis is sensitive to factors such as illumination conditions, occlusions and crowd size, requires a large amount of computing resources and storage space at relatively high cost, and the video data may raise privacy-protection issues, so management and supervision of the data need to be strengthened;
Sensor-based personnel stay algorithms calculate stay time by deploying various sensors, such as infrared sensors and WiFi fingerprint sensors, to detect when personnel enter and leave an area. Sensor cost is relatively low and less sensitive to environmental changes, but the sensors must be deployed in a specific area and may be affected by factors such as personnel movement speed and sensor coverage; and since sensor data may also raise privacy-protection issues, management and supervision of the data need to be strengthened.
The prior art has the following defects:
1. Precision problem: the method based on video analysis can be influenced by factors such as illumination, shielding objects and the like, while the method based on the sensor can be influenced by factors such as personnel moving speed, sensor coverage range and the like, so that the accuracy is not high enough;
2. Cost problem: the method based on video analysis requires a large amount of computing resources and storage space, and has relatively high cost, while the method based on the sensor has low cost, but needs to deploy a large amount of sensors, and has the cost problem;
3. Privacy protection problem: video analysis based methods may involve privacy protection issues, requiring enhanced protection and management of video data. The sensor-based method may also involve privacy protection issues, requiring enhanced management and supervision of sensor data;
4. Real-time problem: for scenes with high real-time requirements, conventional techniques may not meet the requirements, and more efficient algorithms and techniques are required to improve the real-time performance.
In summary, although the conventional technology of the personnel stay algorithm in the present stage has a certain application value, there are a plurality of disadvantages and shortcomings, and further improvement and perfection are needed.
The above information disclosed in the background section is only for enhancement of understanding of the background of the disclosure and therefore it may include information that does not form the prior art that is already known to a person of ordinary skill in the art.
Disclosure of Invention
The invention aims to provide a method for detecting stay behaviors of people in public places, in which the YOLOv5 model is adopted to detect, identify, screen, mark and track target pedestrians captured by cameras installed in public places, the ReID technique is adopted to further monitor the target data, the Deep-SORT algorithm is combined to determine the tracks of the associated target pedestrians in the multi-target tracking information, and the video streams continuously captured by the cameras are encrypted, stored and access-controlled, so as to solve the problems raised in the background art.
In order to achieve the above object, the present invention provides the following technical solutions: a method for detecting personnel stay behavior in public places comprises the following steps:
S1, original data: installing a camera in a public place to continuously capture video streams, obtaining original video data, transmitting the original video data to a preprocessing center through a network, and carrying out encryption storage and access control on the original video data to obtain preprocessed target data;
S2, target detection: detecting, identifying, screening and marking pedestrians in the target data by adopting the YOLOv5 model to generate a pedestrian target data set;
S3, target tracking: performing cascade matching and track prediction on target pedestrians in a pedestrian target data set by adopting a Deep-SORT algorithm and a Kalman filtering algorithm, so as to realize multi-target tracking and obtain multi-target motion track information;
S4, re-identifying pedestrians: adopting the ReID technique on target data captured by different cameras in the public scene to re-identify the target pedestrians carrying multi-target motion-trajectory information, and determining the tracks of the associated target pedestrians in the multi-target tracking information in combination with the Deep-SORT algorithm, obtaining track-association data so as to avoid ID jumps;
S5, loitering detection: after the motion trail information and trail association data of multiple targets are obtained, a judgment method based on the motion trail distance is adopted, and the loitering judgment of target pedestrians is carried out according to the displacement and the distance so as to judge whether the target pedestrians have the loitering behavior or not, and loitering detection results are generated;
S6, detecting results: when the loitering detection result judges that a target pedestrian is in a loitering state, an alarm signal is triggered immediately; the detected abnormal behavior and track of the target pedestrian are transmitted, in the form of images, video clips and alarm information, and fed back to security personnel in the public place, who take timely response measures of dispersal, guidance and necessary early warning.
Optionally, the preprocessing step of the original video data is as follows:
Capturing video streams continuously by means of several cameras installed in the public place to obtain raw video data, calibrated as $Y=\{y_1,y_2,\dots,y_n\}$, in which $y_i$ is represented as a random variable over the pixels of the video stream and $Y$ is represented as the set of pixel variables within the continuous video stream, and transmitting $Y$ over the network to the preprocessing center;
extracting individual frames from the video stream, and using a Gaussian filter to perform noise reduction and resolution adjustment on each frame, wherein the noise-reduction formula is $G(y)=\frac{1}{\sqrt{2\pi}\,\sigma}e^{-y^{2}/(2\sigma^{2})}$, in which $G(y)$ is represented as the Gaussian function applied to the raw video data $Y$ and $\sigma$ is expressed as the standard deviation of the Gaussian distribution of the random variable $y$; the resolution-adjustment formula is $I'(x',y')=\sum_{(u,v)\in N(x,y)}w_{(x,y)\to(u,v)}\,I(u,v)$, in which $(x',y')$ is represented as the new pixel coordinate after the resolution adjustment, $(x,y)$ is represented as the source pixel coordinate in the raw video data $Y$, $(u,v)$ are represented as the neighboring pixels around $(x,y)$, $N(x,y)$ is denoted as the extent of the neighborhood around $(x,y)$, and $w_{(x,y)\to(u,v)}$ is expressed as the weight computed by a distance-weighted average between the points $(x,y)$ and $(u,v)$;
before transmission and storage, the raw video data $Y$ is encrypted with the AES algorithm to obtain the encrypted data signal, calibrated as $C=\{c_1,c_2,\dots,c_n\}$, with the encryption formula $C=E_{K}(P)$, in which $E$ is represented as the encryption function, $P$ is represented as the plaintext of the raw video data, $K$ is represented as the key, $c_i$ is represented as a random pixel variable read after encryption, and $C$ is represented as the encrypted set of pixel variables;
meanwhile, an access-control policy is enforced for the role personnel who may process the data signal, so that only authorized personnel can access and process it;
the format of the encrypted data signal $C$ is standardized for convenient transmission and storage, i.e. the representation of the data is changed by hexadecimal or Base64 encoding, obtaining the encrypted data to be processed, calibrated as $C'=\{c'_1,c'_2,\dots,c'_n\}$, in which $c'_i$ are respectively expressed as the re-standardized, encoded data formats of the encrypted data signal $C$ and $n$ is represented as the number of target data sets;
and a background-subtraction technique is applied to highlight moving objects, namely a background-difference algorithm subtracts the background model from the current frame to detect moving objects in the raw video data, so that, with the current frame at moment $t$ having pixel value $I_t(x,y)$ at position $(x,y)$, the background-difference formula is $D_t(x,y)=\lvert I_t(x,y)-B(x,y)\rvert$, in which $D_t(x,y)$ is expressed as the absolute difference between the pixel of the current frame at moment $t$ and position $(x,y)$ and the pixel of the background frame, $B(x,y)$ is represented as the pixel value of the background model at any moment, and $I_t(x,y)$ is expressed as the pixel value at the moving-object position at moment $t$ read from the target data.
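The background-difference step above can be sketched as follows; the toy 3x3 grayscale frames and the threshold value are illustrative assumptions, not part of the patent:

```python
# Minimal sketch of background differencing: mark pixels where the current
# frame deviates from the background model by more than a threshold.

def background_difference(frame, background, threshold=30):
    """Return a binary foreground mask: 1 where |frame - background| > threshold."""
    h, w = len(frame), len(frame[0])
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if abs(frame[y][x] - background[y][x]) > threshold:
                mask[y][x] = 1
    return mask

background = [[10, 10, 10],
              [10, 10, 10],
              [10, 10, 10]]
frame      = [[10, 200, 10],   # a bright moving object crosses the middle column
              [10, 210, 10],
              [10, 10, 10]]

mask = background_difference(frame, background)
print(mask)  # 1s mark the detected moving-object pixels
```

In a real deployment the background model would be estimated adaptively (e.g. a running average or mixture of Gaussians) rather than fixed as here.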
Optionally, the target data acquisition logic is as follows:
reading the encrypted data to be processed $C'$ and performing inverse decryption with the AES key, wherein the AES decryption formula is $P=E_{K}^{-1}(C')$, in which $P$ is represented as the original plaintext data, $E^{-1}$ is represented as the inverse of the encryption function, $C'$ is represented as the ciphertext data, and $K$ is represented as the symmetric key;
identifying the raw video data $Y$ and performing step-by-step formatting, scaling and normalization on the plaintext raw video data $P$ to obtain the target data, calibrated as $X$; the formatting step uses a video-processing library to automatically convert the color space of the operable frame sequence of the raw video data $Y$ from BGR to RGB format, obtaining formatted data calibrated as $X^{f}=\{x^{f}_1,x^{f}_2,\dots,x^{f}_n\}$, in which $x^{f}_i$ are respectively expressed as the formatted results of the variables $y_i$;
the scaling step scales the formatted data $X^{f}$ to a target resolution for each frame by a bilinear-interpolation algorithm, obtaining scaled data calibrated as $X^{s}$; the bilinear-interpolation formula is $I(x,y)=(1-d_x)(1-d_y)\,I(x_0,y_0)+d_x(1-d_y)\,I(x_1,y_0)+(1-d_x)d_y\,I(x_0,y_1)+d_x d_y\,I(x_1,y_1)$, in which $I(x,y)$ is represented as the pixel value at the target location, $d_x,d_y$ are respectively expressed as the relative positions of the target point with respect to the four adjacent original pixel values in the image, and $(x,y)$ is expressed as the point of the original coordinate system mapped to its location in the new coordinate system;
normalization scales each pixel value of the scaled image data $X^{s}$ from the range $[0, 255]$ into the range $[0, 1]$ to obtain the normalized data, i.e. the final target data $X$, with the normalization formula $x_i=\frac{x^{s}_i}{255}$, in which $x_i$ is represented as the normalized value at position $i$ of the new coordinate system and $x^{s}_i$ is represented as the pixel value of the original image at post-scaling position $i$.
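A minimal sketch of the scaling (bilinear interpolation) and [0, 1] normalization steps above; the tiny 2x2 image is an illustrative assumption:

```python
# Bilinear interpolation: blend the four surrounding pixels by their
# fractional distances. Normalization: divide each pixel by 255.

def bilinear_sample(img, x, y):
    """Sample img at fractional coordinates (x, y) by bilinear interpolation."""
    x0, y0 = int(x), int(y)
    x1 = min(x0 + 1, len(img[0]) - 1)
    y1 = min(y0 + 1, len(img) - 1)
    dx, dy = x - x0, y - y0
    return ((1 - dx) * (1 - dy) * img[y0][x0]
            + dx * (1 - dy) * img[y0][x1]
            + (1 - dx) * dy * img[y1][x0]
            + dx * dy * img[y1][x1])

def normalize(img):
    """Scale pixel values from [0, 255] into [0, 1]."""
    return [[p / 255.0 for p in row] for row in img]

img = [[0, 100],
       [100, 200]]
center = bilinear_sample(img, 0.5, 0.5)  # average of the four corners: 100.0
print(center, normalize([[0, 255]]))
```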
Optionally, the object detection steps of the YOLOv5 model are as follows:
inputting the target data $X$ into the YOLOv5 network where the YOLOv5 model is located;
the convolutional, residual, downsampling, upsampling and routing layers contained in the YOLOv5 network extract features from the target data $X$ to obtain a target feature map, which contains the bounding-box predictions and the classification confidences;
the YOLOv5 model divides the image of the target feature map into grid cells; based on YOLOv5 model training and forward inference it detects bounding boxes, predicting several bounding boxes and corresponding confidence scores for each grid cell, and removes overlapping bounding boxes by non-maximum suppression (NMS) comparing the intersection-over-union threshold, ensuring that each real target corresponds to only one bounding box;
Screening out detection results with confidence coefficient higher than a set threshold value, and classifying the moving pedestrian targets;
and analyzing the finally reserved boundary box and classification result, and outputting and storing the detected pedestrian position and classification label.
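The intersection-over-union comparison and greedy NMS described above can be sketched as follows; the box coordinates, scores and threshold are illustrative assumptions:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)                 # highest-scoring remaining box
        keep.append(best)
        order = [i for i in order           # drop boxes overlapping it too much
                 if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # the second box overlaps the first and is removed
```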
Optionally, the acquiring logic of the pedestrian target data set is as follows:
The YOLOv5 model receives the input target data $X$ and performs feature extraction through the convolutional, residual, downsampling, upsampling and routing layers; then, for each convolutional layer, a convolution kernel is used for feature extraction and a target feature map is output, calibrated as $F$, with the convolution formula $F(i,j)=\sum_{m}\sum_{n}X(i+m,j+n)\,K(m,n)+b$, in which $F(i,j)$ is represented as the output feature map of the convolutional layer at position $(i,j)$, $K$ is represented as the convolution kernel, $X(i+m,j+n)$ is represented as the input value of the target data $X$ under the convolution-kernel operation, $(m,n)$ are denoted as the offsets within the convolution kernel, $K(m,n)$ is represented as the weight parameter of the convolution kernel at position $(m,n)$, and $b$ is denoted as the bias term of the convolutional layer;
the target feature map $F$ after the convolution calculation is batch-normalized, with the batch-normalization formulas $\mu_B=\frac{1}{m}\sum_{i=1}^{m}F_i$, $\sigma_B^{2}=\frac{1}{m}\sum_{i=1}^{m}(F_i-\mu_B)^{2}$, $\hat{F}_i=\frac{F_i-\mu_B}{\sqrt{\sigma_B^{2}+\epsilon}}$ and $y_i=\gamma\hat{F}_i+\beta$, in which $\hat{F}_i$ is represented as the normalization result of the $i$-th batch for the input target feature map, $m$ is represented as the number of batches, $F_i$ is represented as the target feature map input in the $i$-th batch, $\mu_B$ is expressed as the batch mean, $\gamma$ and $\beta$ are represented as learnable parameters, $\sigma_B^{2}$ is represented as the batch variance, $\epsilon$ is represented as a small positive number controlling the scaling and avoiding division by zero, $y_i$ is represented as the predicted output value in the $i$-th batch, $\gamma$ is the hyper-parameter denoted as the scaling factor, and $\beta$ is the offset translation term adjusting the predicted value;
the YOLOv5 model uses the batch-normalized output feature map $y$ as prediction training to generate the predicted output values $\hat{y}=\{\hat{y}_1,\dots,\hat{y}_k\}$ and the corresponding confidences $c=\{c_1,\dots,c_k\}$, and generates prediction-box information from the predicted output values $\hat{y}$ of the feature map, the prediction-box information comprising the class probability, the box position and the prediction-box width and height, respectively calibrated as $p$, $(b_x,b_y)$ and $(b_w,b_h)$;
the class probability $p$ predicts, with a normalized exponential (softmax) function, the probability distribution of each anchor box over the classes, the class-probability formula being $p_c=\frac{e^{s_c}}{\sum_{j=1}^{C}e^{s_j}}$, in which $s_c$ is denoted as the score predicted by the YOLOv5 model for the target feature of the current class $c$ in the predicted output $\hat{y}$, and $\sum_{j=1}^{C}e^{s_j}$ is represented as the sum over the feature-vector scores of the $C$ classes;
the box position $(b_x,b_y)$ uses an anchor-box mechanism: the offsets relative to the anchor box are output through a Sigmoid function, and the box width and height are predicted from the offsets and the center coordinates of the original bounding box to determine the final bounding-box position; from the predicted output values, let the center of the prediction bounding box be $(b_x,b_y)$, the width and height of the prediction bounding box be $b_w$ and $b_h$, the known top-left corner of the grid cell of the image partition be $(c_x,c_y)$, and the anchor width and height be $p_w$ and $p_h$; the offsets computed with the Sigmoid function are $b_x=\sigma(t_x)+c_x$, $b_y=\sigma(t_y)+c_y$, $b_w=p_w e^{t_w}$ and $b_h=p_h e^{t_h}$, in which $t_x$ and $t_y$ are respectively shown as the offset magnitudes along the $x$ axis and the $y$ axis, $b_w$ and $b_h$ are represented as the width and height of the final bounding-box position, $\sigma(t)=1/(1+e^{-t})$ is the Sigmoid function applied to the center offsets of the prediction bounding box, and $e^{(\cdot)}$ is expressed as the exponential function with the natural constant $e$ as base, i.e. the box position is $(b_x,b_y)$ and the predicted box width and height are $(b_w,b_h)$;
when the YOLOv5 model is trained, the network parameters are adjusted through a forward-propagation algorithm and the difference between detection results and ground-truth labels is minimized through a loss function; non-maximum suppression (NMS) is not adopted here to remove redundant overlapping boxes, and instead the multiple detection results for each pedestrian are retained for position-fusion processing; after training, pedestrian detection is performed on given new target data to generate a data set containing only pedestrian targets, i.e. the pedestrian target data set, calibrated as $S$; the loss-function calculation and the position-fusion method are critical, the loss function comprising the coordinate loss, the object-confidence loss and the class loss; the coordinate loss is computed as $L_{CIoU}=1-IoU+\frac{\rho^{2}(b,b^{gt})}{c^{2}}+\alpha v$, in which $L_{CIoU}$ is represented as the loss value computed by the CIoU loss function, $IoU$ is expressed as the intersection-over-union, $\rho(b,b^{gt})$ is expressed as the Euclidean distance between the center points of the predicted box position $b$ and the grid-cell ground-truth box $b^{gt}$, $c$ is the diagonal length of the minimum enclosing region of the predicted and ground-truth boxes, $\alpha$ is represented as a weight coefficient, and $v$ is expressed as a measure of the consistency of the predicted and ground-truth aspect ratios;
the object-confidence loss is computed as $L_{obj}=-\left[y\log\hat{c}+(1-y)\log(1-\hat{c})\right]$, in which $L_{obj}$ is represented as the object-confidence loss value computed with a binary cross-entropy loss, $y$ is expressed as the ground-truth label of the detected pedestrian, with $y=1$ when a person is present and $y=0$ when absent, and $\hat{c}$ is expressed as the detection-object confidence output by the YOLOv5 model prediction, with $\hat{c}\in(0,1)$;
the class loss is computed as $L_{cls}=-\sum_{c=1}^{C}y_c\log\hat{p}_c$, in which $L_{cls}$ is represented as the difference loss value between the predicted class distribution and the true class distribution computed with the cross-entropy loss, $C$ is expressed as the total number of classes, $y_c$ is expressed as the true probability that the detected object belongs to class $c$, and $\hat{p}_c$ is expressed as the predicted probability that the detected object belongs to class $c$;
the position-fusion processing obtains the final bounding-box coordinates of the detected target pedestrian with an average-fusion algorithm to construct the pedestrian target data set $S$, the average-fusion formulas being $\bar{x}=\frac{1}{N}\sum_{i=1}^{N}x_i$, $\bar{y}=\frac{1}{N}\sum_{i=1}^{N}y_i$, $\bar{w}=\frac{1}{N}\sum_{i=1}^{N}w_i$, $\bar{h}=\frac{1}{N}\sum_{i=1}^{N}h_i$ and $\bar{c}=\frac{1}{N}\sum_{i=1}^{N}c_i$, in which $\bar{x}$ and $\bar{y}$ are respectively expressed as the averaged $x$-axis and $y$-axis coordinates of the fused detection-box center, $\bar{w}$ and $\bar{h}$ as the averaged width and height, $\bar{c}$ is expressed as the confidence of the fusion box, $N$ is represented as the number of detected bounding boxes, and $(x_i,y_i,w_i,h_i)$ are respectively represented as the coordinate location and the width and height of each bounding box.
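The average-fusion step can be sketched as follows; the (cx, cy, w, h) box format and the sample detections are illustrative assumptions:

```python
# Average fusion: several overlapping detections of the same pedestrian are
# merged by averaging their box parameters and confidence scores.

def fuse_detections(boxes, confidences):
    """Average-fuse boxes given as (cx, cy, w, h) and their confidences."""
    n = len(boxes)
    fused_box = tuple(sum(b[k] for b in boxes) / n for k in range(4))
    fused_conf = sum(confidences) / n
    return fused_box, fused_conf

boxes = [(10.0, 10.0, 4.0, 8.0),   # two detections of the same person,
         (12.0, 10.0, 4.0, 8.0)]   # slightly offset horizontally
confidences = [0.9, 0.7]

box, conf = fuse_detections(boxes, confidences)
print(box)  # (11.0, 10.0, 4.0, 8.0)
```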
Optionally, the Deep-SORT algorithm combines Kalman filtering and the Hungarian algorithm to perform multi-target tracking on the pedestrian target data set as follows:
The Deep-SORT algorithm realizes efficient and accurate tracking of multiple targets in a pedestrian target data set by comprehensively utilizing data association of Kalman filtering prediction and Hungary algorithm;
performing behavior-target tracking on the pedestrians detected in the images of the pedestrian target data set $S$, and using a Kalman filter to predict the target state within the current frame, the prediction formulas of the Kalman filter being $\hat{x}_{t|t-1}=A\hat{x}_{t-1}+Bu_t$ and $P_{t|t-1}=AP_{t-1}A^{T}+Q_t$, in which $\hat{x}_{t|t-1}$ is expressed as the state prediction value of the pedestrian target data set at the current moment $t$, $A$ is represented as the state-transition matrix of the pedestrian target data set, $\hat{x}_{t-1}$ is expressed as the state estimate of the pedestrian target data set at the previous moment $t-1$, $B$ is represented as the control-input matrix of the pedestrian target data set, $u_t$ is represented as the control input at moment $t$, $P_{t|t-1}$ is represented as the predicted state covariance at moment $t$, $P_{t-1}$ is expressed as the state covariance matrix at the previous moment $t-1$, $A^{T}$ is represented as a transpose, and $Q_t$ is represented as the process-noise covariance matrix at moment $t$;
from the target state $\hat{x}_{t|t-1}$ predicted by the Kalman filter and the detection observations $z_t$ at the current moment $t$, the track of each detected target is assigned using the Hungarian algorithm, the track cost formula of the Hungarian algorithm being $c_{i,j}=\lambda d^{(1)}(i,j)+(1-\lambda)d^{(2)}(i,j)$, in which $c_{i,j}$ is expressed as the total distance between target state $i$ and detection $j$, $\lambda$ is denoted as a parameter balancing the two distances, $d^{(1)}(i,j)$ is represented as the Mahalanobis distance predicted from the Kalman filtering, and $d^{(2)}(i,j)$ is expressed as the cosine-similarity-based prediction value between target state $i$ and detection $j$;
for each matched target-detection pair, the Kalman-filter update step is used again to update the new state of the target position and velocity, the update formulas of the Kalman filter being $K_t=P_{t|t-1}H^{T}\left(HP_{t|t-1}H^{T}+R\right)^{-1}$, $\hat{x}_t=\hat{x}_{t|t-1}+K_t\left(z_t-H\hat{x}_{t|t-1}\right)$ and $P_t=(I-K_tH)P_{t|t-1}$, in which $K_t$ is denoted as the Kalman gain, $H$ is represented as the observation model, $R$ is represented as the observation-noise covariance, $P_{t|t-1}$ is expressed as the predicted state covariance at moment $t$, $\hat{x}_t$ is represented as the updated state estimate of the pedestrian target data set at moment $t$, $\hat{x}_{t|t-1}$ is expressed as the state prediction value of the pedestrian target data set at the current moment $t$, $z_t$ is the observation of the pedestrian target data set at the current moment $t$, $P_t$ is expressed as the updated state covariance at moment $t$, and $I$ is the identity matrix;
the change of each tracked target's position over time is recorded to form a motion trajectory, and the target's motion trajectory is optimized by predicting the current state and smoothing the track based on the state updated with new observation data.
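The predict/update cycle above can be illustrated numerically for a 1-D constant-velocity state [position, velocity]; the matrices F, H, Q, R and the measurement value are illustrative assumptions, not the patent's parameters:

```python
# Tiny pure-Python Kalman filter: predict with a constant-velocity model,
# then correct the prediction toward a scalar position measurement.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def madd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def kf_predict(x, P, F, Q):
    """Predict step: x' = F x, P' = F P F^T + Q."""
    return matmul(F, x), madd(matmul(matmul(F, P), transpose(F)), Q)

def kf_update(x, P, z, H, R):
    """Update step with a scalar position measurement z."""
    y = z - matmul(H, x)[0][0]                              # innovation
    S = madd(matmul(matmul(H, P), transpose(H)), R)[0][0]   # innovation variance
    K = [[row[0] / S] for row in matmul(P, transpose(H))]   # Kalman gain
    x_new = [[x[i][0] + K[i][0] * y] for i in range(len(x))]
    KH = matmul(K, H)
    P_new = [[P[i][j] - sum(KH[i][k] * P[k][j] for k in range(len(P)))
              for j in range(len(P))] for i in range(len(P))]   # (I - KH) P
    return x_new, P_new

dt = 1.0
F = [[1.0, dt], [0.0, 1.0]]      # constant-velocity transition
H = [[1.0, 0.0]]                 # only position is observed
Q = [[0.01, 0.0], [0.0, 0.01]]   # process noise
R = [[0.1]]                      # observation noise

x, P = [[0.0], [1.0]], [[1.0, 0.0], [0.0, 1.0]]   # pos 0, vel 1
x, P = kf_predict(x, P, F, Q)    # predicted position: 1.0
x, P = kf_update(x, P, 1.2, H, R)
print(x[0][0])                   # corrected position, between 1.0 and 1.2
```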
Optionally, the logic for acquiring the motion trail information of the multiple targets is as follows:
detecting the position information of the target pedestrians in the target data of the current frame with the YOLOv5 model to obtain predicted target bounding boxes;
extracting features from the detected predicted target bounding boxes, determining the confidence of the detected objects, and tracking the targets of the pedestrian target data set $S$ while matching them against the tracks of the targets detected in the current frame;
matching the tracks of the detected targets using the Hungarian algorithm, predicting and updating the position of each track of the multi-target pedestrians in the next frame with the Kalman filter, and, for each matched target-detection pair, updating the new target state again with the Kalman-filter update step to improve the accuracy of the data association;
obtaining the motion trajectories by predicting the current state and updating the state based on new observation data according to the Kalman-filter track-smoothing technique, and obtaining the multi-target motion-trajectory information $T=\{T_1,T_2,\dots,T_N\}$, in which $T_i$ is represented as the motion-trajectory information of detected target $i$ and $N$ is expressed as the number of detected targets.
Optionally, the step of re-identifying the pedestrian by the ReID technology is as follows:
A1, feature extraction: a convolutional neural network model is adopted to learn and extract discriminative behavior-feature information of the multi-target pedestrians from the pedestrian target data set;
A2, feature matching: the pedestrian features extracted from the current target data are matched against the pedestrian feature library extracted from the known pedestrian target data set by calculating the distance or similarity between features;
a3, track association: and according to the result of the feature matching, correlating the target pedestrians of the target data with the motion track information of the targets corresponding to the known pedestrian target data set through a Hungary algorithm, and finding out the optimal correlation track information.
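The feature-matching step (A2) can be sketched as follows; the feature vectors and the distance threshold are illustrative assumptions:

```python
import math

# Match a query feature from the current frame against a small gallery of
# known pedestrian features by Euclidean distance.

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_feature(query, gallery, max_dist=0.5):
    """Return the index of the nearest gallery feature, or None if too far."""
    best_i = min(range(len(gallery)), key=lambda i: euclidean(query, gallery[i]))
    return best_i if euclidean(query, gallery[best_i]) <= max_dist else None

gallery = [[1.0, 0.0], [0.0, 1.0]]           # known pedestrian features
print(match_feature([0.9, 0.1], gallery))    # close to pedestrian 0
print(match_feature([5.0, 5.0], gallery))    # too far from everyone: None
```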
Optionally, the track association data acquisition logic is as follows:
use of convolutional neural network CNN model from pedestrian target dataset The method is characterized by learning and extracting behavior characteristic information with differentiation degree and calibrating asThe behavior characteristic information with the differentiation is obtained by using a convolution layer calculation formula in a CNN model asIn which, in the process,Represented as an activation function,Represented as convolution kernel weights,Represented as a bias term,Represented as a convolution operation;
calculating the distance between two feature vectors by using Euclidean distance to perform feature matching, generating a correlated distance matrix, and calibrating as According to the behavior characteristic informationSetting two feature vectorsIn which, in the process,Respectively expressed as a detected target feature and a track feature of a target pedestrian, andThe calculation formula of Euclidean distance is that
from the generated associated distance matrix M, using a matching algorithm such as the Hungarian algorithm, the behavior feature information F of the target pedestrians is associated with the motion track information T of the multiple targets, taking M as the cost matrix and searching for the optimal matching to obtain the track association data, calibrated as A; the association degree calculation formula is A(F, T) = min Σi d(xi, yσ(i)), i = 1, …, n, wherein A(F, T) is expressed as the association of the behavior characteristic information F of the target pedestrians with the motion track information T of the multiple targets, min Σi d(·) is expressed as the minimum total distance cost between the detected target features and the track feature vectors of the target pedestrians over an assignment σ, and n represents the number of detection targets; the association degree is calculated according to the detection targets and the corresponding tracks, and the optimal track matching result is obtained;
to avoid ID jumps, deep-SORT does not delete tracks immediately when track matching fails, but uses a timing counter to keep track of the number of frames each track has since the last successful match.
Optionally, the logic for obtaining the loitering detection result is as follows:
through the motion track information T of the multiple targets and the track association data A, a journey-and-displacement calculation method is adopted, and whether a target pedestrian stays or wanders in an area is determined according to the ratio of the journey (path length) to the displacement;
wherein the optimal motion track information of the detected and matched multi-target pedestrians in the public area is G = {g1, g2, …, gn}, wherein gi is expressed as the optimal motion track information of the i-th target pedestrian, each track point of gi records the x-axis and y-axis coordinates of the plane position of the target pedestrian, td is indicated as the time period for which the optimal motion track information is sustained, and n is expressed as the number of detection targets;
selecting the optimal motion track information g of any target pedestrian, finding out the start point to the end point from the track of the target pedestrian, calibrated as P1 and Pm, and counting the m points of continuous movement of the target pedestrian on the optimal motion track, each track coordinate of which is marked as (x1, y1), (x2, y2), …, (xm, ym);
respectively calculating the journey and the displacement of the target pedestrian from the track start point to the track end point, wherein the displacement calculation formula is S = sqrt((xm − x1)² + (ym − y1)²);
the calculation formula of the journey is L = Σ_{k=1}^{m−1} sqrt((x_{k+1} − x_k)² + (y_{k+1} − y_k)²), and L ≥ S;
the calculation result of the ratio of the journey to the displacement is R = L / S, wherein R is expressed as the ratio of journey to displacement; a threshold value is set for determining whether a target pedestrian stays or wanders in an area, calibrated as R0 (R0 > 1), wherein exceeding R0 is expressed as the occurrence of staying or wandering behavior in an area; thus the magnitude of the ratio is compared with the magnitude of the threshold: when R ≤ R0 occurs, it is judged that the target pedestrian does not stay or wander in the public area; when R > R0 occurs, it is judged that the target pedestrian stays or wanders in the public area.
In the technical scheme, the invention has the technical effects and advantages that:
According to the invention, based on deep learning technology, the YOLOv5 model is adopted to detect, identify, screen, mark and track the target pedestrians captured by cameras installed in public places, so that abnormal stay and loitering behaviors of suspicious personnel are accurately identified and tracked; the target data are further monitored and processed through the ReID technology, and the tracks of the associated target pedestrians in the multi-target tracking information are determined in combination with the Deep-SORT algorithm, thereby realizing the functions of automatically extracting and analyzing the behavior characteristic information and motion tracks of personnel, achieving the purpose of accurately judging whether a target pedestrian exhibits stay or loitering behavior, and improving monitoring accuracy and reliability;
The video stream is continuously captured by the camera to be encrypted, stored and accessed, so that the privacy protection of the collected data is further realized, and the safety and confidentiality of the data are improved;
Based on the determination method of the movement track distance, the loitering detection result is calculated, and real-time early warning and information feedback are triggered to security personnel in the public place, realizing real-time monitoring and early-warning functions; management personnel are informed to take response measures of dispersal, inspection and necessary early warning, improving the efficiency and level of emergency response and providing powerful support for public security work; the method has wide applicability, real-time performance and flexibility, and can be customized and adjusted according to the special requirements of different areas and different scenes in public places so as to meet the monitoring requirements and monitoring accuracy in different scenes.
Drawings
FIG. 1 is a flow chart of a method for detecting personnel stay behavior in public places according to the present invention.
FIG. 2 is a flow chart of the pedestrian re-recognition step of the present invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these example embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art.
The invention provides a method for detecting personnel stay behavior in public places as shown in fig. 1, which comprises the following steps:
S1, original data: installing a camera in a public place to continuously capture video streams, obtaining original video data, transmitting the original video data to a preprocessing center through a network, and carrying out encryption storage and access control on the original video data to obtain preprocessed target data;
Specifically, the preprocessing steps of the original video data are as follows:
Capturing video streams in a continuous manner by several cameras installed in a public place, obtaining raw video data, calibrated as V = {v1, v2, …, vm}, wherein v is represented as a random variable in the pixels of the video stream and V is represented as the set of pixel variables within the continuous video stream, which is transmitted over a network to the preprocessing center;
extracting individual frames from the video stream, and using a Gaussian filter to perform noise reduction and resolution adjustment on each frame to improve the accuracy of the raw video data; the noise reduction calculation formula is G(u, v) = (1 / (2πσ²)) · exp(−(u² + v²) / (2σ²)), wherein G is represented as the Gaussian function applied to the raw video data V and σ is expressed as the standard deviation of the Gaussian distribution of the random variables; the resolution adjustment calculation formula is V'(u', v') = Σ_{(p,q)∈N(u,v)} w(p, q) · V(p, q), wherein (u', v') is represented as the new pixel point coordinates after adjusting the resolution, (u, v) is represented as the source pixel point coordinates in the raw video data V, N(u, v) is represented as the extent of the neighborhood of pixels around (u, v), and w(p, q) is expressed as the weight calculated by the distance-weighted average between the point (u, v) and the point (p, q);
for the raw video data V, before transmission and storage, encryption calculation is carried out by adopting the AES algorithm to obtain an encrypted data signal, calibrated as E, to ensure the security of the data; the encryption calculation formula is E = f_enc(V, K) and E = {e1, e2, …, em}, wherein f_enc is represented as the encryption function, V is represented as the plaintext of the raw video data, K is represented as the key, e is represented as a random pixel variable read after encryption, and E is represented as the encrypted set of pixel variables;
meanwhile, an access control policy is implemented for the role personnel capable of processing the data signals, so that only authorized personnel can access and process the data signals;
the format of the encrypted data signal E is standardized for convenient transmission or storage, i.e., the representation of the data is changed by hexadecimal encoding or Base64 encoding for storage and transmission, so as to obtain the encrypted data to be processed, calibrated as C = {c1, c2, …, cn}, wherein ci is respectively expressed as the re-standardized encoded representation of the encrypted data signal E in the data format, and n is represented as the number of target data sets, to reduce storage space and improve processing efficiency;
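The re-encoding step can be sketched with the standard library; the function name and the choice of schemes are illustrative assumptions, not prescribed by the patent:

```python
import base64
import binascii

def standardize(ciphertext: bytes, scheme: str = "base64") -> str:
    """Re-encode encrypted bytes as ASCII text for storage/transmission."""
    if scheme == "base64":
        return base64.b64encode(ciphertext).decode("ascii")
    # fall back to hexadecimal encoding
    return binascii.hexlify(ciphertext).decode("ascii")
```

Either encoding is reversible, so the original ciphertext bytes can be recovered before AES decryption.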
and applying a background subtraction technique to highlight moving objects, namely subtracting a background model from the current frame by a background difference algorithm to detect the moving objects in the raw video data, so that for the current frame at moment t whose pixel value at position (i, j) is I_t(i, j), the background difference calculation formula is D_t(i, j) = |I_t(i, j) − B(i, j)|, wherein D_t(i, j) is expressed as the absolute difference between the pixel of the current frame at moment t and position (i, j) and the pixel of the background frame, B(i, j) is represented as the pixel value at that position in the background model at any time, and I_t(i, j) is expressed as the pixel value of the position of the moving target read from the target data at moment t.
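The background-difference step can be sketched in a few lines; the function name and the threshold value of 30 are illustrative assumptions, not values from the patent:

```python
def background_difference(frame, background, threshold=30):
    """Return a binary motion mask: 1 where |frame - background| exceeds
    the threshold, 0 elsewhere. Images are 2-D lists of pixel values."""
    mask = []
    for row_f, row_b in zip(frame, background):
        mask.append([1 if abs(f - b) > threshold else 0
                     for f, b in zip(row_f, row_b)])
    return mask
```

A static background model paired with this mask flags only the pixels where a moving target entered the frame.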
Specifically, the target data acquisition logic is as follows:
reading the encrypted data C to be processed and performing inverse-algorithm decryption by using the AES key, wherein the AES decryption calculation formula is V = f_enc⁻¹(C, K), wherein V is represented as the original plaintext data, f_enc⁻¹ is represented as the inverse of the encryption function, C is represented as the ciphertext data, and K is represented as the symmetric key;
identifying the raw video data V and performing gradual formatting, scaling and normalization processing calculation on the plaintext raw video data V to obtain the target data, calibrated as D; wherein the formatting process adopts a video processing library to automatically convert the color space of the operable frame sequence of the raw video data V from BGR to RGB format to obtain formatted data, calibrated as F_d = {f1, f2, …, fm}, wherein fi is respectively expressed as the formatted result of the variable vi;
the scaling process scales the formatted data F_d frame by frame to a target resolution through a bilinear interpolation algorithm to obtain scaled data, calibrated as Z; the bilinear interpolation calculation formula is Z(x, y) = f(0,0)·(1 − dx)(1 − dy) + f(1,0)·dx·(1 − dy) + f(0,1)·(1 − dx)·dy + f(1,1)·dx·dy, wherein Z(x, y) is represented as the pixel value at location (x, y), f(0,0), f(1,0), f(0,1), f(1,1) are respectively expressed as the four adjacent pixel values around the target point in the image, dx and dy are expressed as the relative position of the target point between the adjacent original pixel points, and (x, y) is expressed as the point in the original coordinate system mapped to the location of the new coordinate system;
normalization scales each pixel value range [0, 255] of the image data Z into the range [0, 1] to obtain normalized data, namely the final target data D; the normalization calculation formula is D(x, y) = Z(x, y) / 255, wherein D(x, y) is represented as the normalized value at the location (x, y) of the new coordinate system and Z(x, y) is represented as the pixel value at the location (x, y) of the original image after scaling.
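The bilinear scaling and [0, 1] normalization steps above can be sketched as follows; the function names and the tiny 2×2 test image are illustrative assumptions:

```python
def bilinear_sample(img, x, y):
    """Sample a 2-D image (list of rows) at fractional coordinates (x, y)
    by weighting the four surrounding pixels."""
    x0, y0 = int(x), int(y)
    x1 = min(x0 + 1, len(img[0]) - 1)
    y1 = min(y0 + 1, len(img) - 1)
    dx, dy = x - x0, y - y0
    return (img[y0][x0] * (1 - dx) * (1 - dy)
            + img[y0][x1] * dx * (1 - dy)
            + img[y1][x0] * (1 - dx) * dy
            + img[y1][x1] * dx * dy)

def normalize_frame(frame):
    """Scale pixel values from [0, 255] to [0, 1]."""
    return [[px / 255.0 for px in row] for row in frame]
```

Sampling the exact center of a 2×2 patch returns the mean of its four pixels, which is the expected bilinear behavior.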
S2, target detection: detecting, identifying, screening and marking pedestrians in the target data by adopting the YOLOv5 model to generate a pedestrian target data set;
Specifically, the object detection procedure of the YOLOv5 model is as follows:
inputting the target data D into the network where the YOLOv5 model is located;
extracting features from the target data D through the convolutional layer, residual layer, downsampling layer, upsampling layer and routing layer contained in the YOLOv5 network to obtain a target feature map, wherein the target feature map comprises bounding box predictions and classification confidences;
the YOLOv5 model divides the image of the target feature map into grid cells; based on YOLOv5 model training and forward reasoning, a plurality of bounding boxes and corresponding confidence scores are predicted on each grid cell, the confidence score being expressed as the product of the probability that the predicted box contains a target and the accuracy of the predicted box; overlapping bounding boxes are removed by adopting non-maximum suppression NMS to compare against the intersection-over-union threshold, ensuring that each real target corresponds to only one bounding box;
Screening out detection results with confidence coefficient higher than a set threshold value, and classifying the moving pedestrian targets;
and analyzing the finally reserved boundary box and classification result, and outputting and storing the detected pedestrian position and classification label.
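The NMS step described above can be sketched as follows; this is a generic IoU-based suppression, with function names and the 0.5 threshold as illustrative assumptions rather than the patent's exact parameters:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Keep the highest-scoring box, drop boxes overlapping it beyond the
    threshold, and repeat; returns indices of the surviving boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep
```

With two heavily overlapping detections of the same pedestrian and one distant detection, only the stronger of the overlapping pair and the distant box survive.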
Specifically, the acquisition logic of the pedestrian target data set is as follows:
the YOLOv5 model receives the input target data D and performs feature extraction through the convolution layer, residual layer, downsampling layer, upsampling layer and routing layer; then, for each convolution layer, a convolution kernel is used for feature extraction, and a target feature map is output, calibrated as O; the convolution calculation formula is O(i, j) = Σ_m Σ_n K(m, n) · D(i + m, j + n) + b, wherein O(i, j) is represented as the output feature map of the convolutional layer at position (i, j), K is represented as the convolution kernel, D(i + m, j + n) is represented as the input value of the target data D under the convolution kernel, K(m, n) is represented as the value of the convolution kernel at position (m, n), and b is shown as the bias term of the convolutional layer;
performing batch normalization processing on the target feature map O after convolution calculation, wherein the calculation formulas of batch normalization are x̂ = (O − μ_B) / sqrt(σ_B² + ε) and y = γ · x̂ + β, wherein x̂ is represented as the result of normalizing the input target feature map O over the B-th batch, μ_B is expressed as the batch average value, σ_B² is represented as the variance of the batch, ε is represented as a small positive number preventing division by zero, y is represented as the predicted output value in the batch, γ is the learnable hyper-parameter denoted as the scaling factor, and β is the offset translation term expressed as adjusting the predicted value;
the YOLOv5 model takes the batch-normalized output feature map y as predictive training to generate the predicted output values and the corresponding confidence c, with c = Pr(object) × IoU, and generates prediction frame information according to the predicted output values of the feature map, wherein the prediction frame information comprises the category probability, the frame position and the prediction frame width and height, respectively calibrated as p, (bx, by) and (bw, bh);
for the category probability p, the normalized exponential function is adopted to predict the probability distribution of each anchor frame belonging to each category; the calculation formula of the category probability is set as p_i = exp(s_i) / Σ_{j=1}^{C} exp(s_j), wherein s_i is denoted as the score of the target feature predicted by the YOLOv5 model for the current category i, exp(s_i) is expressed as the feature score of the current category raised as an exponential, Σ_j exp(s_j) is represented as the sum over the feature scores of the C categories, and the normalized exponential function ensures that all category probabilities add up to one;
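The normalized exponential function (softmax) above can be written directly; subtracting the maximum score is a standard numerical-stability trick, not a requirement from the patent:

```python
import math

def softmax(scores):
    """Normalized exponential: converts raw class scores into
    probabilities that sum to one."""
    m = max(scores)                       # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

The highest raw score always maps to the highest probability, and the outputs sum to one as the formula requires.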
for the frame position (bx, by), an anchor frame mechanism is adopted: the offsets relative to the anchor frame are output through the Sigmoid function, and the width and height of the frame are predicted according to the offsets and the center coordinates of the original bounding box to determine the final bounding box position; according to the predicted output values, the center coordinate values of the prediction bounding box are set as (bx, by), the width and height of the prediction bounding box are respectively (bw, bh), the upper-left corner coordinates of the grid cell of the image segmentation are known as (cx, cy), and the anchor width and height are respectively (pw, ph); the offset formulas calculated by adopting the Sigmoid function are bx = σ(tx) + cx, by = σ(ty) + cy, bw = pw · e^{tw} and bh = ph · e^{th}, wherein σ(tx) and σ(ty) are respectively shown as the magnitudes of the offsets in the x-axis and y-axis directions, bw and bh are represented as the width and height of the final bounding box position, (tx, ty, tw, th) are the raw predicted values of the center coordinates and size of the prediction bounding box, and e is expressed as the exponential function with the natural constant as the base, yielding the frame position (bx, by) and the prediction frame width and height (bw, bh);
when the YOLOv5 model is trained, the network parameters are adjusted through the forward propagation algorithm, and the difference between the detection results and the actual labels is minimized through the loss function; here non-maximum suppression NMS is not adopted to remove redundant overlapping frames, but a plurality of detection results of each pedestrian are retained for position fusion processing; after training, pedestrian detection is carried out on given new target data, and a data set comprising only pedestrian targets is generated, namely the pedestrian target data set, calibrated as P, which facilitates subsequent behavior analysis and people-flow statistics; the calculation of the loss function and the position fusion processing method are of great importance, and the loss function comprises a coordinate loss, an object confidence loss and a category loss; the calculation formula of the coordinate loss is L_CIoU = 1 − IoU + ρ²(b, b_gt)/c_d² + α·v, wherein L_CIoU is represented as the loss value calculated by the CIoU loss function, IoU is expressed as the intersection-over-union ratio, ρ(b, b_gt) is expressed as the Euclidean distance between the center points of the predicted box position b and the grid cell real frame b_gt, c_d is the diagonal length of the minimum closure region expressed as the two boxes of the prediction box and the real box, α is represented as a weight coefficient, and v is expressed as a measure of the consistency of the predicted and real frame aspect ratios;
the calculation formula of the object confidence loss is L_obj = −[y·log(ĉ) + (1 − y)·log(1 − ĉ)], wherein L_obj is represented as the object confidence loss value calculated by adopting the binary cross-entropy loss algorithm, y is the real label expressed as whether the detected person exists (y = 1 when the detected person exists, y = 0 in the absence of the detected person), and ĉ is the confidence of the detection object expressed as the output of the YOLOv5 model prediction result, with 0 ≤ ĉ ≤ 1;
the calculation formula of the category loss is L_cls = −Σ_{c=1}^{C} y_c · log(p_c), wherein L_cls is represented as the difference loss value between the predicted class distribution and the true class distribution calculated using the cross-entropy loss algorithm, C is expressed as the total number of categories, y_c is expressed as the true probability that the detection object belongs to category c, and p_c is expressed as the predicted probability that the detection object belongs to category c;
the position fusion processing acquires the final bounding box coordinates of the detected target pedestrian by adopting an average fusion algorithm to construct the pedestrian target data set P; the calculation formulas of the average fusion algorithm are x̄ = (1/N)·Σ_{i=1}^{N} x_i, ȳ = (1/N)·Σ_{i=1}^{N} y_i, w̄ = (1/N)·Σ_{i=1}^{N} w_i, h̄ = (1/N)·Σ_{i=1}^{N} h_i and c̄ = (1/N)·Σ_{i=1}^{N} c_i, wherein x̄ and ȳ are respectively expressed as the average values of the x-axis and y-axis coordinates of the center point after fusion of the detection frames, w̄ and h̄ are the average width and height, c̄ is expressed as the confidence of the fusion box, N is represented as the number of detected bounding boxes, and (x_i, y_i, w_i, h_i) are represented as the coordinate location and width and height of each bounding box, respectively.
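The average fusion of N overlapping detections of the same pedestrian can be sketched as below; the function name and the (cx, cy, w, h) box layout are illustrative assumptions:

```python
def fuse_boxes(boxes, confidences):
    """Average-fuse N detections of one pedestrian.
    Each box is (cx, cy, w, h); returns the fused box and confidence."""
    n = len(boxes)
    fused = tuple(sum(b[k] for b in boxes) / n for k in range(4))
    return fused, sum(confidences) / n
```

Fusing two boxes simply averages each coordinate and the confidence, which is exactly the formula set above with N = 2.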
S3, target tracking: performing cascade matching and track prediction on target pedestrians in a pedestrian target data set by adopting a Deep-SORT algorithm and a Kalman filtering algorithm, so as to realize multi-target tracking and obtain multi-target motion track information;
Specifically, the Deep-SORT algorithm combines Kalman filtering and the Hungarian algorithm to perform multi-target tracking of the pedestrian target data set as follows:
the Deep-SORT algorithm realizes efficient and accurate tracking of multiple targets in a pedestrian target data set by comprehensively utilizing Kalman filtering prediction and data association of the Hungary algorithm, can keep higher tracking stability and accuracy in a dynamically changing environment, and provides reliable target motion trail information for subsequent analysis and application;
performing behavior target tracking on the pedestrians in the detected images from the pedestrian target data set P, and predicting the state in the current frame by using a Kalman filter; the prediction calculation formulas of the Kalman filtering are x̂_t⁻ = A·x̂_{t−1} + B·u_t and P_t⁻ = A·P_{t−1}·Aᵀ + Q_t, wherein x̂_t⁻ is expressed as the state prediction value of the pedestrian target data set P at the present moment t, A is represented as the state transition matrix of the pedestrian target data set P, x̂_{t−1} is expressed as the state estimate of the pedestrian target data set P at the last moment t−1, B is represented as the control input matrix of the pedestrian target data set P, u_t is represented as the control input at moment t, P_t⁻ is represented as the predicted state covariance at moment t, P_{t−1} is expressed as the state covariance matrix at the last moment, Aᵀ is represented as the transpose, and Q_t is represented as the process noise covariance matrix at moment t;
according to the target state x̂_t⁻ predicted by the Kalman filtering and the detection observations z_t at the present moment t, the tracks of the detection targets are allocated by using the Hungarian algorithm, wherein the track cost calculation formula of the Hungarian algorithm is c(i, j) = λ·d1(i, j) + (1 − λ)·d2(i, j), wherein c(i, j) is expressed as the total distance between the target state i and the detection j, λ is a parameter denoted as the balancing weight, d1(i, j) is represented as the Mahalanobis distance predicted based on the Kalman filtering, and d2(i, j) is expressed as the predicted value based on the cosine similarity between the target state i and the detection j;
for each matched target and detection pair, updating the new state of the target position and speed again using the Kalman filtering update step; the new-state calculation formulas of the Kalman filtering update are K_t = P_t⁻·Hᵀ·(H·P_t⁻·Hᵀ + R)⁻¹, x̂_t = x̂_t⁻ + K_t·(z_t − H·x̂_t⁻) and P_t = (I − K_t·H)·P_t⁻, wherein K_t is denoted as the Kalman gain, H is represented as the observation model, R is represented as the observed noise covariance, P_t⁻ is expressed as the predicted state covariance matrix at moment t, Hᵀ is represented as the transpose, x̂_t is represented as the updated state estimate of the pedestrian target data set P at moment t, x̂_t⁻ is expressed as the state prediction value of the pedestrian target data set P at the present moment, z_t is the observation of the pedestrian target data set P at the current moment, and P_t is expressed as the updated state covariance at moment t;
The change of the position of each tracking target along with time is recorded to form a motion track, and the motion track of the target is optimized by predicting the current state and realizing track smoothing based on the updated state of the new observation data, so that the influence of jump and noise is reduced, and the overall quality of tracking is improved.
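The predict-update cycle above reduces, for a single scalar state, to a few lines; this is a generic constant-position filter used to illustrate the smoothing behavior, with the noise values q and r as illustrative assumptions, not Deep-SORT's actual multivariate parameters:

```python
def kalman_1d(zs, q=1e-3, r=0.5):
    """Scalar Kalman filter: predict then update once per observation.
    zs: sequence of noisy measurements; returns the smoothed estimates."""
    x, p = zs[0], 1.0          # initial state estimate and covariance
    smoothed = [x]
    for z in zs[1:]:
        p = p + q              # predict: state unchanged, covariance grows
        k = p / (p + r)        # Kalman gain
        x = x + k * (z - x)    # update with the new observation
        p = (1 - k) * p
        smoothed.append(x)
    return smoothed
```

Feeding in noisy samples of a constant position keeps the estimate close to the true value while damping the jumps and noise the patent mentions.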
Specifically, the logic for acquiring the motion trail information of the multiple targets is as follows:
detecting the position information of a target pedestrian in the target data of the current frame by using the YOLOv5 model to obtain a predicted target bounding box;
extracting features from the detected predicted target bounding box, determining the confidence of the detected object, and acquiring the pedestrian target data set P; tracking the targets and matching the tracks of the targets detected in the current frame;
matching the tracks where the detection targets are located by using the Hungarian algorithm, predicting and updating the position of each track of the multi-target pedestrians in the next frame by using the Kalman filter, and for each matched target and detection pair, updating the new state of the target again using the update step of the Kalman filter so as to improve the accuracy of data association;
obtaining a motion track by predicting the current state and updating the state based on new observation data according to the Kalman filtering smooth track technology, and obtaining multi-target motion track information T = {t1, t2, …, tn}, wherein ti is represented as the motion track information of the i-th detection target and n is expressed as the number of detection targets.
S4, re-identifying pedestrians: capturing target data by different cameras in a public scene by adopting ReID technology, re-identifying target pedestrians with multi-target motion track information, determining tracks of related target pedestrians in the multi-target tracking information by combining Deep-SORT algorithm, and obtaining track related data so as to avoid ID jump;
specifically, the steps of pedestrian re-identification in ReID technology are as follows:
A1, extracting features: a convolutional neural network model is adopted, and behavior characteristic information with distinction degree of multi-target pedestrians is learned and extracted from a pedestrian target data set;
A2, feature matching: matching the pedestrian characteristics extracted from the current target data with a pedestrian characteristic library extracted from the known pedestrian target data set by calculating the distance or similarity between the characteristics;
A3, track association: according to the result of the feature matching, associating the target pedestrians in the target data with the motion track information of the corresponding targets in the known pedestrian target data set through the Hungarian algorithm, and finding out the optimal associated track information.
Specifically, the track associated data acquisition logic is as follows:
Using a convolutional neural network CNN model, learning and extracting behavior characteristic information with differentiation degree from the pedestrian target data set P, calibrated as F; the behavior characteristic information with differentiation is obtained by the convolution layer calculation formula in the CNN model: F = f(W * P + b), wherein f is represented as an activation function, W is represented as the convolution kernel weights, b is represented as a bias term, and * is represented as a convolution operation;
calculating the distance between two feature vectors by using the Euclidean distance to perform feature matching, and generating an associated distance matrix, calibrated as M; according to the behavior characteristic information F, two feature vectors x = (x1, …, xk) and y = (y1, …, yk) are set, wherein x and y are respectively expressed as a detected target feature and a track feature of a target pedestrian; the calculation formula of the Euclidean distance is d(x, y) = sqrt(Σi (xi − yi)²); meanwhile, the similarity of the two feature vectors x and y is calculated using the cosine similarity, the formula being cos(x, y) = (x · y) / (‖x‖ ‖y‖), wherein cos(x, y) is denoted as the cosine similarity, x · y is represented as the dot product of the vectors, and ‖x‖ and ‖y‖ are the Euclidean norms expressed as vectors;
from the generated associated distance matrix M, using a matching algorithm such as the Hungarian algorithm, the behavior feature information F of the target pedestrians is associated with the motion track information T of the multiple targets, taking M as the cost matrix and searching for the optimal matching to obtain the track association data, calibrated as A; the association degree calculation formula is A(F, T) = min Σi d(xi, yσ(i)), i = 1, …, n, wherein A(F, T) is expressed as the association of the behavior characteristic information F of the target pedestrians with the motion track information T of the multiple targets, min Σi d(·) is expressed as the minimum total distance cost between the detected target features and the track feature vectors of the target pedestrians over an assignment σ, and n represents the number of detection targets; the association degree is calculated according to the detection targets and the corresponding tracks, and the optimal track matching result is obtained;
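The minimum-cost one-to-one matching over the cost matrix can be illustrated with a brute-force search over all assignments; the Hungarian algorithm finds the same optimum in O(n³) for larger matrices, and the function name here is an illustrative assumption:

```python
from itertools import permutations

def best_assignment(cost):
    """Minimum-cost one-to-one matching of n detections to n tracks.
    cost[i][j] is the distance between detection i and track j.
    Brute force over all n! assignments (fine for small n)."""
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        c = sum(cost[i][perm[i]] for i in range(n))
        if c < best_cost:
            best_perm, best_cost = list(perm), c
    return best_perm, best_cost
```

For the 3×3 cost matrix below, the optimal assignment sends detection 0 to track 1, detection 1 to track 0, and detection 2 to track 2, at total cost 5.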
in order to avoid the ID jump, the Deep-SORT does not delete the tracks immediately when the track matching fails, but uses a time sequence counter to track the number of frames of each track since the last successful matching, if the track of the target pedestrian to be detected exceeds a preset time sequence track threshold value, the track of the target pedestrian to be detected is deleted, and meanwhile, when the matching cost of the newly detected track of the target pedestrian is too high, the track is initialized as a new track.
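The timing counter described above can be sketched as a per-track miss counter; the MAX_AGE value of 30 frames and the function name are illustrative assumptions, not values fixed by the patent:

```python
MAX_AGE = 30  # frames a track may go unmatched before deletion (assumed)

def age_tracks(track_ages, matched_ids):
    """Reset the miss counter of matched tracks, increment unmatched ones,
    and drop tracks whose counter exceeds MAX_AGE."""
    updated = {}
    for tid, age in track_ages.items():
        age = 0 if tid in matched_ids else age + 1
        if age <= MAX_AGE:
            updated[tid] = age
    return updated
```

A track that keeps matching stays alive indefinitely; a track that goes unmatched is retained for MAX_AGE frames before deletion, which is what prevents ID jumps during brief occlusions.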
S5, loitering detection: after the motion trail information and trail association data of multiple targets are obtained, a judgment method based on the motion trail distance is adopted, and the loitering judgment of target pedestrians is carried out according to the displacement and the distance so as to judge whether the target pedestrians have the loitering behavior or not, and loitering detection results are generated;
Specifically, the logic for obtaining the loitering detection result is as follows:
through the motion track information T of the multiple targets and the track association data A, a journey-and-displacement calculation method is adopted, and whether a target pedestrian stays or wanders in an area is determined according to the ratio of the journey (path length) to the displacement;
wherein the optimal motion track information of the detected and matched multi-target pedestrians in the public area is G = {g1, g2, …, gn}, wherein gi is expressed as the optimal motion track information of the i-th target pedestrian, each track point of gi records the x-axis and y-axis coordinates of the plane position of the target pedestrian, td is indicated as the time period for which the optimal motion track information is sustained, and n is expressed as the number of detection targets;
selecting the optimal motion track information g of any target pedestrian, finding out the start point to the end point from the track of the target pedestrian, calibrated as P1 and Pm, and counting the m points of continuous movement of the target pedestrian on the optimal motion track, each track coordinate of which is marked as (x1, y1), (x2, y2), …, (xm, ym);
respectively calculating the journey and the displacement of the target pedestrian from the track start point to the track end point, wherein the displacement calculation formula is S = sqrt((xm − x1)² + (ym − y1)²);
the calculation formula of the journey is L = Σ_{k=1}^{m−1} sqrt((x_{k+1} − x_k)² + (y_{k+1} − y_k)²), and L ≥ S;
the calculation result of the ratio of the journey to the displacement is R = L / S, wherein R is expressed as the ratio of journey to displacement; a threshold value is set for determining whether a target pedestrian stays or wanders in an area, calibrated as R0 (R0 > 1), wherein exceeding R0 is expressed as the occurrence of staying or wandering behavior in an area; thus the magnitude of the ratio is compared with the magnitude of the threshold: when R ≤ R0 occurs, it is judged that the target pedestrian does not stay or wander in the public area; when R > R0 occurs, it is judged that the target pedestrian stays or wanders in the public area.
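The journey/displacement ratio test can be written directly from the formulas above; the threshold value of 3.0 and the function names are illustrative assumptions, since the patent leaves the threshold to be tuned per scene:

```python
import math

def loiter_ratio(track):
    """R = journey / displacement for a list of (x, y) track points.
    A large R means the pedestrian moved far while getting nowhere."""
    path = sum(math.dist(track[k], track[k + 1])
               for k in range(len(track) - 1))
    disp = math.dist(track[0], track[-1])
    return path / disp if disp > 0 else float("inf")

def is_loitering(track, threshold=3.0):  # threshold R0 is illustrative
    return loiter_ratio(track) > threshold
```

A straight walk has R = 1 and is not flagged, while a back-and-forth path ending near its start produces a large R and is flagged as loitering.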
S6, detection result: when the loitering detection result judges that a target pedestrian is in a loitering state, an alarm signal is triggered immediately; the detected abnormal behavior and trajectory of the target pedestrian are transmitted, in the form of images, video clips and alarm information, to security personnel in the public place, who take timely response measures such as dispersal, guidance and any necessary early warning.

Specifically, the alarm signal modes include sound-and-light alarms, SMS and e-mail notification alarms, and push alarms from the intelligent monitoring program; the alarm content includes the time and place of occurrence, the duration, and the trajectory image of the target pedestrian.

Specifically, the communication, transmission and feedback report includes: using the existing surveillance camera system to stream the captured real-time video of the loitering behavior directly to the monitoring center or to security personnel's mobile devices; automatically capturing and storing the images or video clips in which loitering behavior was detected and sending them to security personnel over the network; and generating an alarm report with the details of the loitering event, sending it to the relevant security personnel by e-mail or over the internal communication network, and requiring them to verify the report content on site and handle the event according to the response plan.
The specific method and flow of the method for detecting the stay behavior of personnel in public places provided by this embodiment of the invention are detailed in the method embodiment above and are not repeated here.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or as combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

The foregoing is merely illustrative of the present application and does not limit it; any variation or substitution that a person skilled in the art could readily conceive falls within its scope. Therefore, the protection scope of the present application shall be subject to the claims.

Claims (10)

1. The method for detecting the stay behavior of the personnel in the public place is characterized by comprising the following steps:
S1, original data: installing a camera in a public place to continuously capture video streams, obtaining original video data, transmitting the original video data to a preprocessing center through a network, and carrying out encryption storage and access control on the original video data to obtain preprocessed target data;
S2, target detection: detecting, identifying, screening and marking pedestrians in the target data by adopting a YOLOv5 model to generate a pedestrian target data set;
S3, target tracking: performing cascade matching and track prediction on target pedestrians in a pedestrian target data set by adopting a Deep-SORT algorithm and a Kalman filtering algorithm, so as to realize multi-target tracking and obtain multi-target motion track information;
S4, re-identifying pedestrians: capturing target data by different cameras in a public scene by adopting ReID technology, re-identifying target pedestrians with multi-target motion track information, determining tracks of related target pedestrians in the multi-target tracking information by combining Deep-SORT algorithm, and obtaining track related data so as to avoid ID jump;
S5, loitering detection: after the motion trail information and trail association data of multiple targets are obtained, a judgment method based on the motion trail distance is adopted, and the loitering judgment of target pedestrians is carried out according to the displacement and the distance so as to judge whether the target pedestrians have the loitering behavior or not, and loitering detection results are generated;
S6, detection result: when the loitering detection result judges that a target pedestrian is in a loitering state, an alarm signal is triggered immediately; the detected abnormal behavior and trajectory of the target pedestrian are transmitted, in the form of images, video clips and alarm information, to security personnel in the public place, who take timely response measures such as dispersal, guidance and any necessary early warning.
2. The method for detecting stay behavior of persons in public places according to claim 1, wherein the preprocessing step of the original video data is as follows:
capturing video streams continuously by means of several cameras installed in the public place to obtain the raw video data, calibrated as V = {v_1, v_2, ..., v_k}, where v_i is represented as a random pixel variable in the video stream and V as the set of pixel variables within the continuous video stream, and transmitting the data over a network to a preprocessing center;
extracting individual frames from the video stream and applying a Gaussian filter to each frame for noise reduction and resolution adjustment, the noise-reduction kernel being
G(u, v) = (1 / (2 * pi * sigma^2)) * exp(-(u^2 + v^2) / (2 * sigma^2)),
where G(u, v) is the Gaussian function applied to the raw video data V and sigma is the standard deviation of the Gaussian distribution of the pixel random variable; the resolution adjustment computes each new pixel as a distance-weighted average of neighbouring source pixels,
I'(x', y') = sum over (x, y) in N(x', y') of w(x, y) * I(x, y),
where (x', y') is the new pixel coordinate after the resolution adjustment, (x, y) ranges over the neighbouring source pixel coordinates N(x', y') in the raw video data, and w(x, y) is the weight computed from the distance between the source point and the target point;
before transmission and storage, encrypting the raw video data V with the AES algorithm to obtain the encrypted data signal, calibrated as C, the encryption being computed as C = E(P, K), where E is the encryption function, P is the plaintext of the raw video data, K is the key, and C = {c_1, c_2, ...} is the set of encrypted pixel variables read after encryption;
meanwhile, implementing an access control policy for the roles permitted to process the data signal, so that only authorized personnel can access and process it;
standardizing the format of the encrypted data signal C to facilitate transmission or storage, i.e. changing its representation by hexadecimal or Base64 encoding, yielding the encrypted data to be processed, calibrated as C' = {c'_1, c'_2, ..., c'_q}, where c'_j is the re-standardized encoded representation of the corresponding element of C and q is the number of target data sets;
and applying a background-subtraction technique to highlight moving objects: a background difference algorithm subtracts the background model from the current frame to detect the moving objects in the raw video data. Letting the pixel value of the moving object in the current frame at time t and position (x, y) be I_t(x, y), the background difference is computed as
D_t(x, y) = |I_t(x, y) - B(x, y)|,
where D_t(x, y) is the absolute difference between the pixel of the current frame at time t and position (x, y) and the corresponding pixel of the background frame, B(x, y) is the pixel value in the background model at that position at any time, and I_t(x, y) is the pixel value of the moving-object position read from the target data at time t.
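As a rough illustration of the background-difference step above (the fixed background frame, the threshold of 25 and the function name are assumptions for this sketch; a deployed system would typically maintain an adaptive background model such as a running average):

```python
import numpy as np

def background_difference(frame, background, threshold=25):
    """Per-pixel absolute difference |I_t(x, y) - B(x, y)| against a
    static background model; pixels whose difference exceeds
    `threshold` are flagged as belonging to a moving object."""
    # Widen to int16 so the subtraction of uint8 images cannot wrap.
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return (diff > threshold).astype(np.uint8)  # 1 = moving pixel
```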
3. The method for detecting personnel stay in public places according to claim 2, wherein the target data acquisition logic is as follows:
reading the encrypted data to be processed C' and decrypting it with the AES key by the inverse algorithm, the AES decryption being computed as P = E^{-1}(C, K), where P is the original plaintext data, E^{-1} is the inverse of the encryption function, C is the ciphertext data, and K is the symmetric key;
recovering the raw video data, denoted V, and performing step-by-step formatting, scaling and normalization on the plaintext raw video data to obtain the target data, calibrated as X; the formatting step uses a video processing library to automatically convert the color space of the operable frame sequence of the raw video data from BGR to RGB format, yielding the formatted data, calibrated as F = {f_1, f_2, ...}, where f_i is the formatted result of the corresponding frame variable of V;
the scaling step scales each frame of the formatted data F to the target resolution with a bilinear interpolation algorithm, yielding the scaled data, calibrated as S; the bilinear interpolation is computed as
I(x, y) = (1-a)(1-b) I(x0, y0) + a(1-b) I(x1, y0) + (1-a) b I(x0, y1) + a b I(x1, y1),
where I(x, y) is the pixel value at the target location, (x0, y0), (x1, y0), (x0, y1) and (x1, y1) are the four neighbouring pixels of the target point in the original image, a and b are the relative positions of the target point between those neighbours, and (x, y) is the location in the new coordinate system to which the point of the original coordinate system is mapped;
normalization scales each pixel value of the scaled image data S from the range [0, 255] into the range [0, 1], yielding the normalized data, i.e. the final target data X, with the normalization computed as x_norm = x / 255, where x_norm is the normalized value at a position of the new coordinate system and x is the pixel value of the original image at that position after scaling.
4. A method for detecting stay behavior of a person in a public place according to claim 3, wherein the object detection step of the YOLOv5 model is as follows:
inputting the target data into the YOLOv5 network where the YOLOv5 model resides;
extracting features from the target data through the convolutional, residual, downsampling, upsampling and routing layers contained in the YOLOv5 network to obtain a target feature map, which carries the bounding-box predictions and the classification confidences;
dividing the image of the target feature map into grid cells with the YOLOv5 model and detecting bounding boxes through YOLOv5 model training and forward inference, predicting several bounding boxes and corresponding confidence scores on each grid cell; non-maximum suppression (NMS) compares intersection-over-union values against a threshold to remove overlapping bounding boxes, ensuring that each real target corresponds to exactly one bounding box;
Screening out detection results with confidence coefficient higher than a set threshold value, and classifying the moving pedestrian targets;
and analyzing the finally reserved boundary box and classification result, and outputting and storing the detected pedestrian position and classification label.
5. The method for detecting personal retention behavior in a public place according to claim 4, wherein the logic for obtaining the pedestrian target data set is as follows:
YOLOv5 model receives the input target data X and extracts features through the convolutional, residual, downsampling, upsampling and routing layers; for each convolutional layer a convolution kernel is used for feature extraction, and the target feature map, calibrated as Y, is output, the convolution being computed as
Y(i, j) = sum over m, n of X(i+m, j+n) * W(m, n) + b,
where Y(i, j) is the output feature map of the convolutional layer at position (i, j), W is the convolution kernel, X(i+m, j+n) are the input values of the target data under the kernel window, W(m, n) is the kernel weight at position (m, n), and b is the bias term of the convolutional layer;
the convolved target feature map Y is batch-normalized, the batch normalization being computed as
y_hat = (y - mu_B) / sqrt(sigma_B^2 + eps), z = gamma * y_hat + beta,
where y_hat is the normalized result for the target feature map input in batch B, y is the target feature map value input in the batch, mu_B is the batch mean, sigma_B^2 is the batch variance, eps is a small positive constant that controls the scaling and prevents division by zero, z is the predicted output value in the batch, gamma is the learnable scaling-factor hyper-parameter, and beta is the learnable offset (translation) term that adjusts the predicted value;
from the batch-normalized output feature map, the YOLOv5 model performs predictive training to generate the predicted output values z and the corresponding confidences c, and from the predicted output values generates the prediction-box information, which comprises the class probability, the box position and the prediction-box width and height, calibrated respectively as P_cls, (b_x, b_y) and (b_w, b_h);
the class probability P_cls uses a normalized exponential (softmax) function to predict, for each anchor box, the probability distribution over the classes:
P_cls(i) = exp(s_i) / (sum over j of exp(s_j)),
where s_i is the score predicted by the YOLOv5 model for the target feature of the current class i, and the denominator sums the exponentiated feature scores over all classes;
the box position uses an anchor-box mechanism: offsets relative to the anchor box are output through a Sigmoid function, and the box width and height are predicted from the offsets and the center coordinates of the original bounding box to determine the final bounding box. Let the predicted center of the bounding box be (b_x, b_y) and its width and height be b_w and b_h; the top-left coordinates of the grid cell of the image partition are known to be (c_x, c_y), and the anchor width and height are p_w and p_h. The offsets computed with the Sigmoid function give
b_x = sigma(t_x) + c_x, b_y = sigma(t_y) + c_y, b_w = p_w * e^{t_w}, b_h = p_h * e^{t_h},
where t_x and t_y are the offsets along the x-axis and y-axis, sigma is the Sigmoid function, e is the natural constant used as the base of the exponential, (b_x, b_y) is the box position, and (b_w, b_h) are the predicted box width and height;
when the YOLOv5 model is trained, the network parameters are adjusted by forward propagation and the difference between detection results and ground-truth labels is minimized through a loss function; non-maximum suppression (NMS) is not used to remove redundant overlapping boxes, and instead several detections per pedestrian are kept for position-fusion processing. After training, pedestrian detection is performed on given new target data, generating a data set containing only pedestrian targets, i.e. the pedestrian target data set, calibrated as D_p. The computation of the loss function and the method of position fusion are key: the loss function comprises the coordinate loss, the object confidence loss and the class loss, and the coordinate loss is computed as
L_CIoU = 1 - IoU + rho^2(b, b_gt) / c^2 + alpha * v,
where L_CIoU is the loss value computed by the CIoU loss function, IoU is the intersection-over-union, rho(b, b_gt) is the Euclidean distance between the center points of the predicted box b and the grid cell's ground-truth box b_gt, c is the diagonal length of the smallest enclosing region of the predicted and ground-truth boxes, alpha is a weight coefficient, and v measures the consistency of the aspect ratios of the predicted and ground-truth boxes;
the calculation formula of the confidence loss of the object is as follows In which, in the process,Represented as employing a binary cross entropy loss algorithmThe calculated confidence loss value of the object,Real label expressed as detection behavioural person and detection person existsThe absence of detection personnelConfidence of detection object expressed as YOLOv model prediction result output, and
the class loss is computed as
L_cls = -(sum from k = 1 to K of q_k * log(p_k)),
where L_cls is the difference loss value between the predicted class distribution and the true class distribution computed with the cross-entropy loss algorithm, K is the total number of classes, q_k is the true probability that the detection object belongs to class k, and p_k is the predicted probability that the detection object belongs to class k;
the position-fusion processing uses an average-fusion algorithm to obtain the final bounding-box coordinates of the detected target pedestrian and construct the pedestrian target data set D_p, the average fusion being computed as
x_f = (1/N) * sum x_i, y_f = (1/N) * sum y_i, w_f = (1/N) * sum w_i, h_f = (1/N) * sum h_i, c_f = (1/N) * sum c_i,
where x_f and y_f are the averaged x-axis and y-axis coordinates of the fused detection-box center, w_f and h_f are the averaged width and height, c_f is the confidence of the fusion box, N is the number of detected bounding boxes, and (x_i, y_i, w_i, h_i, c_i) are the coordinate location, width, height and confidence of each detected bounding box.
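The Sigmoid-offset decoding described in the claim above matches the standard YOLO-family box parameterisation; the following sketch assumes that parameterisation (the function and variable names are illustrative, not taken from the disclosure):

```python
import math

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Standard YOLO-style bounding-box decoding.

    (tx, ty, tw, th): raw network outputs for one anchor;
    (cx, cy): top-left corner of the grid cell;
    (pw, ph): anchor-box width and height.
    """
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    bx = sigmoid(tx) + cx          # box centre x, offset into the cell
    by = sigmoid(ty) + cy          # box centre y
    bw = pw * math.exp(tw)         # width scaled from the anchor
    bh = ph * math.exp(th)         # height scaled from the anchor
    return bx, by, bw, bh
```

The Sigmoid keeps the predicted centre inside its grid cell, while the exponential keeps the width and height strictly positive.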
6. The method for detecting the stay behavior of people in public places according to claim 5, wherein the Deep-SORT algorithm, combining Kalman filtering with the Hungarian algorithm, performs multi-target tracking on the pedestrian target data set as follows:
the Deep-SORT algorithm achieves efficient and accurate multi-target tracking in the pedestrian target data set by combining Kalman-filtering prediction with Hungarian-algorithm data association;
behavior-target tracking is performed on the pedestrians detected in the pedestrian target data set D_p, and a Kalman filter predicts the target state in the current frame, the Kalman prediction being computed as
x_k^- = F * x_{k-1} + B * u_k, P_k^- = F * P_{k-1} * F^T + Q_k,
where x_k^- is the state prediction of the pedestrian target data set at the current time k, F is the state transition matrix of the pedestrian target data set, x_{k-1} is the state estimate at the previous time k-1, B is the control input matrix of the pedestrian target data set, u_k is the control input at time k, P_k^- is the predicted state covariance at time k, P_{k-1} is the state covariance matrix at the previous time k-1, F^T denotes the transpose, and Q_k is the process noise covariance matrix at time k;
from the target state x_k^- predicted by Kalman filtering and the detection observation z_k at the current time k, the Hungarian algorithm assigns the tracks of the detected targets, with the assignment cost computed as
c(i, j) = lambda * d1(i, j) + (1 - lambda) * d2(i, j),
where c(i, j) is the total distance between target state i and detection j, lambda is the balance parameter, d1(i, j) is the Mahalanobis distance predicted from the Kalman filter, and d2(i, j) is the cosine-similarity-based value between target state i and detection j;
for each matched target-detection pair, the Kalman filtering update step refreshes the new state of the target position and velocity, the Kalman update being computed as
K_k = P_k^- * H^T * (H * P_k^- * H^T + R)^{-1},
x_k = x_k^- + K_k * (z_k - H * x_k^-),
P_k = (I - K_k * H) * P_k^-,
where K_k is the Kalman gain, H is the observation model, R is the observation noise covariance, P_k^- is the predicted state covariance at time k, H^T denotes the transpose, x_k^- is the state prediction of the pedestrian target data set at the current time k, z_k is the observation of the pedestrian target data set at the current time k, x_k is the updated state estimate, and P_k is the updated state covariance;
the change of each tracked target's position over time is recorded, forming a motion trajectory, and the target's motion trajectory is optimized by predicting the current state and smoothing the track with the state updated from the new observation data.
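The predict/update cycle of the claim above can be sketched with NumPy; the one-dimensional constant-velocity state model in the usage note is an assumption for illustration, not the tracker's actual state model.

```python
import numpy as np

def kf_predict(x, P, F, Q):
    """Prediction step: x_k^- = F x_{k-1}, P_k^- = F P F^T + Q
    (no control input, i.e. B u_k = 0, assumed for this sketch)."""
    return F @ x, F @ P @ F.T + Q

def kf_update(x, P, z, H, R):
    """Update step with observation z."""
    S = H @ P @ H.T + R                       # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)            # Kalman gain
    x_new = x + K @ (z - H @ x)               # corrected state
    P_new = (np.eye(P.shape[0]) - K @ H) @ P  # corrected covariance
    return x_new, P_new
```

Usage: with a state [position, velocity], F = [[1, 1], [0, 1]] and H = [[1, 0]], the prediction advances the position by one velocity step, and the update pulls the estimate toward the measured position.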
7. The method for detecting stay behavior of people in public places according to claim 6, wherein the acquiring logic of the multi-target movement track information is as follows:
detecting the position information of target pedestrians in the target data of the current frame by using the YOLOv5 model to obtain predicted target bounding boxes;
extracting features from the detected predicted target bounding boxes, determining the confidence of the detected objects, acquiring the pedestrian target data set, and tracking the targets and matching them to the tracks of the targets detected in the current frame;
matching the tracks of the detected targets with the Hungarian algorithm, predicting and updating the position of each multi-target pedestrian track in the next frame with a Kalman filter, and, for each matched target-detection pair, applying the Kalman filter update step again to refresh the target's new state, so as to improve the accuracy of data association;
obtaining the motion trajectory, according to the Kalman-filtering track-smoothing technique, by predicting the current state and updating it with new observation data, yielding the multi-target motion trajectory information G = {g_1, g_2, ..., g_m}, where g_i is the motion trajectory information of the i-th detected target and m is the number of detected targets.
8. The method for detecting stay behavior of a person in a public place according to claim 7, wherein the ReID technique of pedestrian re-recognition comprises the steps of:
A1, extracting features: a convolutional neural network model is adopted, and behavior characteristic information with distinction degree of multi-target pedestrians is learned and extracted from a pedestrian target data set;
A2, feature matching: matching the pedestrian characteristics extracted from the current target data with a pedestrian characteristic library extracted from the known pedestrian target data set by calculating the distance or similarity between the characteristics;
a3, track association: and according to the result of the feature matching, correlating the target pedestrians of the target data with the motion track information of the targets corresponding to the known pedestrian target data set through a Hungary algorithm, and finding out the optimal correlation track information.
9. The method for detecting personnel stay in a public place according to claim 8, wherein the track association data acquisition logic is as follows:
a convolutional neural network (CNN) model is used to learn and extract discriminative behavior-feature information from the pedestrian target data set D_p, calibrated as f, the discriminative behavior-feature information being obtained with the convolutional-layer computation of the CNN model,
f = phi(W * x + b),
where phi is the activation function, W is the convolution kernel weight, b is the bias term, and * denotes the convolution operation;
feature matching is performed by computing the Euclidean distance between two feature vectors, generating the associated distance matrix, calibrated as M; from the behavior-feature information f, two feature vectors u = (u_1, ..., u_d) and v = (v_1, ..., v_d) are set, where u and v are respectively the detected target feature and the track feature of a target pedestrian, and the Euclidean distance is computed as
d(u, v) = sqrt(sum from i = 1 to d of (u_i - v_i)^2);
from the generated associated distance matrix M, a matching algorithm such as the Hungarian algorithm associates the behavior-feature information f of the target pedestrians with the multi-target motion trajectory information G: a cost matrix is built and the optimal matching is searched to obtain the track association data, calibrated as A, with the association computed as
A = argmin sum from i = 1 to m of d(u_i, v_{pi(i)}),
where A associates the behavior-feature information of each target pedestrian with the multi-target motion trajectory information, the objective is the minimum distance cost between the detected target features and the track feature vectors of the target pedestrians over assignments pi, and m is the number of detected targets; the association is computed between the detected targets and their corresponding tracks to obtain the optimal track-matching result;
to avoid ID jumps, Deep-SORT does not delete a track immediately when track matching fails; instead, a timing counter keeps track of the number of frames since each track's last successful match.
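The feature-matching and track-association steps above can be sketched with SciPy's Hungarian solver (`scipy.optimize.linear_sum_assignment`); the gating distance of 0.5 and the function name are illustrative assumptions, and unmatched tracks would be aged by Deep-SORT's per-track counter rather than deleted immediately.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(det_feats, track_feats, max_dist=0.5):
    """Associate detection feature vectors with track feature vectors
    by minimising the total Euclidean distance (Hungarian algorithm).

    det_feats: (n_dets, d) array; track_feats: (n_tracks, d) array.
    Returns a list of (detection_index, track_index) pairs.
    """
    # Pairwise Euclidean distance cost matrix, shape (n_dets, n_tracks).
    cost = np.linalg.norm(det_feats[:, None, :] - track_feats[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    # Gate out pairs whose distance exceeds max_dist.
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_dist]
```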
10. The method for detecting the stay behavior of a person in a public place according to claim 9, wherein the logic for obtaining the loitering detection result is as follows:
using the multi-target motion trajectory information G and the track association data A, a distance-and-displacement calculation method is adopted: the ratio of the total path length travelled to the net displacement determines whether a target pedestrian stays or wanders in an area;
wherein the optimal motion trajectory information of the detected and matched multi-target pedestrians in the public area is G = {g_1, g_2, ..., g_m}, where g_i denotes the optimal motion trajectory information of the i-th target pedestrian, comprising the x-axis and y-axis coordinates of the pedestrian's plane position over the time period for which the trajectory is sustained, and m is the number of detected targets;
for the optimal motion trajectory of any selected target pedestrian, the start point and end point of the track are found and calibrated as p_1 and p_n, and the n track coordinates of the pedestrian's continuous movement along the optimal trajectory are recorded as p_1 = (x_1, y_1), p_2 = (x_2, y_2), ..., p_n = (x_n, y_n);
the displacement and the path length of the target pedestrian are computed from the track start point to the track end point, the displacement being
D = sqrt((x_n - x_1)^2 + (y_n - y_1)^2),
and the path length being
S = sum from i = 1 to n-1 of sqrt((x_{i+1} - x_i)^2 + (y_{i+1} - y_i)^2), with S >= D;
the ratio of path length to displacement is R = S / D, and a threshold T is set for judging whether the target pedestrian stays or wanders in the area, T being the value at or above which staying or wandering behavior is deemed to occur; the ratio is compared with the threshold: when R < T, it is judged that the target pedestrian is not staying or wandering in the public area; when R >= T, it is judged that the target pedestrian is staying or wandering in the public area.
CN202410478030.7A 2024-04-19 2024-04-19 Method for detecting stay behavior of personnel in public place Pending CN118334743A (en)

Published as CN118334743A on 2024-07-12; legal status: pending.

Similar Documents

Publication Publication Date Title
US20220343665A1 (en) Systems and methods for machine learning-based site-specific threat modeling and threat detection
US11861002B2 (en) Systems and methods for machine learning enhanced intelligent building access endpoint security monitoring and management
US20190012547A1 (en) Congestion-state-monitoring system
US9317753B2 (en) Method of searching data to identify images of an object captured by a camera system
EP1435170B2 (en) Video tripwire
Kumar et al. Study of robust and intelligent surveillance in visible and multi-modal framework
KR101877294B1 (en) Smart cctv system for crime prevention capable of setting multi situation and recognizing automatic situation by defining several basic behaviors based on organic relation between object, area and object's events
KR102282800B1 (en) Method for trackig multi target employing ridar and camera
CN118334743A (en) Method for detecting stay behavior of personnel in public place
US20140146998A1 (en) Systems and methods to classify moving airplanes in airports
US9965687B2 (en) System and method for detecting potential mugging event via trajectory-based analysis
US20180005042A1 (en) Method and system for detecting the occurrence of an interaction event via trajectory-based analysis
US20180211113A1 (en) System and method for detecting potential drive-up drug deal activity via trajectory-based analysis
CN118314518B (en) An AI intelligent monitoring and management system
CN113538513A (en) Method, device and equipment for controlling access of monitored object and storage medium
KR102111162B1 (en) Multichannel camera home monitoring system and method to be cmmunicated with blackbox for a car
US12100214B2 (en) Video-based public safety incident prediction system and method therefor
de Diego et al. Scalable and flexible wireless distributed architecture for intelligent video surveillance systems
Rawat et al. WLAN-based smart door closer design with image classification
CN118694899B (en) Property safety prevention and control method and system based on artificial intelligence
US20230089504A1 (en) System and method for data analysis
Kale et al. Real Time Object Tracking System with Automatic Pan Tilt Zoom Features for Detecting Various Objects
Appiah et al. Autonomous real-time surveillance system with distributed ip cameras
CN119399446A (en) A smart community target recognition and tracking method based on big data
Leo et al. Real‐time smart surveillance using motion analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination