
CN119152575B - Identification method of coal mine personnel chasing overhead passenger devices based on computer vision - Google Patents

Identification method of coal mine personnel chasing overhead passenger devices based on computer vision

Info

Publication number
CN119152575B
CN119152575B (Application CN202411288242.5A)
Authority
CN
China
Prior art keywords
overhead
algorithm
coal mine
tracking
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411288242.5A
Other languages
Chinese (zh)
Other versions
CN119152575A (en)
Inventor
赵文静
高佳锋
邵福
袁少博
卢宏铭
侯晋红
司斌
曹栋鹏
李翰林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanxi Sunshine Three Pole Polytron Technologies Inc
Original Assignee
Shanxi Sunshine Three Pole Polytron Technologies Inc
Filing date
Publication date
Application filed by Shanxi Sunshine Three Pole Polytron Technologies Inc
Priority to CN202411288242.5A
Publication of CN119152575A
Application granted
Publication of CN119152575B
Legal status: Active

Links

Abstract

The invention belongs to the technical field of computer vision and discloses a method, based on computer vision, for recognizing coal mine personnel chasing an overhead man-riding device. The method comprises the following steps: acquiring a real-time video stream from monitoring points covering the overhead man-riding device scene; intercepting several video segments of pedestrians chasing the device; extracting frames from the videos to obtain a data set; preprocessing and labeling the acquired data; identifying personnel and the overhead man-riding device with an improved algorithm; tracking across multiple cameras with a tracking algorithm; and calculating the motion vectors and distances between tracked targets to judge whether chasing behavior exists. The method solves the problems of wasted human resources and monitoring blind spots caused by manual inspection and monitoring, allows violations and potential safety hazards in the coal mine production process to be checked in time, improves the working efficiency of the coal mine, and reduces the risk of accidents caused by personnel riding the overhead man-riding device improperly.

Description

Method for recognizing coal mine personnel chasing an overhead man-riding device based on computer vision
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a method, based on computer vision, for recognizing coal mine personnel chasing an overhead man-riding device.
Background
Traditional coal mine production often relies on manual inspection and monitoring, which wastes human resources and leaves monitoring blind spots; moreover, violations and potential safety hazards in the production process cannot be checked in time, reducing working efficiency and raising safety risk. With the practical deployment of artificial intelligence technology, intelligent management is becoming a trend in the coal mining industry. By deploying AI algorithm models on video images, intelligent recognition and analysis of underground coal mine monitoring scenes can be realized, and abnormal events, violations, and potential safety hazards in safe coal mine production can be monitored and warned of in real time.
The overhead man-riding device is a piece of underground mine transport equipment consisting of a steel wire rope, pulleys, a motor, and other components, and can run up and down inclined or vertical shafts. Field application of overhead man-riding devices at mining enterprises has not only effectively improved the production efficiency of mines in China but also markedly reduced the labor intensity of mine workers. In practice, however, accidents caused by improper use of the device occur frequently. When coal mine personnel chase the overhead man-riding device, they may fall while boarding, the device may deflect excessively, and the steel wire rope may drop or the rope clip may jam on a rope-supporting wheel, causing safety accidents. Detecting this abnormal behavior and pushing alarm results in time can therefore effectively reduce such safety problems.
Disclosure of Invention
The invention provides a method, based on computer vision, for recognizing coal mine personnel chasing an overhead man-riding device, which aims to solve the above technical problems in the prior art. The method mainly comprises: acquiring a real-time video stream from monitoring points covering the overhead man-riding device; extracting frames from the video; preprocessing and labeling the resulting data set; detecting miners and individual overhead man-riding device targets in the video with the improved YOLOv8-MCW algorithm; tracking the detected targets with a target tracking technique; considering that mutual occlusion among targets easily causes identity loss and identity switching during tracking, using the homography matrices of two viewing angles to stitch images from different viewpoints; judging the movement speed and direction of each target from the tracking results; and, combined with the calculated distance between the two targets, finally judging whether a person is chasing the overhead man-riding device.
The recognition method, based on computer vision, for coal mine personnel chasing an overhead man-riding device comprises the following specific steps:
S1, acquiring image data: using existing cameras in the actual coal mine application scenario, a real-time video stream is acquired from monitoring points covering the overhead man-riding device; several video segments of pedestrians chasing the device are intercepted, the intercepted videos are screened, and frames are extracted from them to obtain a data set for subsequent processing.
S2, data set construction: performing data enhancement preprocessing and data labeling on the acquired data.
Further, step S2 includes the steps of:
S21, data enhancement: the data collected in step S1 are transformed and expanded to generate more and richer samples, which improves the diversity of training samples, reduces the model's dependence on a specific data distribution, and enhances the generalization ability and robustness of the model.
S22, data labeling: the images after data enhancement preprocessing are annotated with LabelImg software, marking pedestrians and the overhead man-riding device; the pedestrian class is named person and the overhead man-riding device class man_riding_device. Annotation generates txt files containing position coordinates and class names, and the labeled data set is divided into a training set, a test set, and a validation set in a 7:2:1 ratio.
S3, model training: the invention adopts the improved YOLOv8-MCW algorithm to identify personnel and the overhead man-riding device. The YOLOv8 algorithm offers significant advances in detection accuracy and speed, but still leaves room for improvement in specific application scenarios. Since the invention performs target detection at the wellhead and in roadways where the overhead man-riding device is ridden, the YOLOv8 algorithm needs to be further optimized to improve the overall detection performance.
The improvement of the YOLOv8 algorithm comprises the following three core points. First, MobileNetV3-Large replaces the traditional backbone of the YOLOv8 network, keeping the model lightweight and easy to deploy. Second, a coordinate attention module is introduced: because the cameras are installed far from the detection targets, the targets are small in the overall picture and blurred underground, and such small targets are easily missed by the detection network, so an attention mechanism is added to improve the model's detection of distant overhead man-riding device and personnel targets. Third, WIoU v3 is selected as the bounding box regression loss function to further optimize the algorithm model and improve the accuracy of target detection and localization. Through these optimization strategies, efficient and accurate detection of personnel and the overhead man-riding device at the wellhead and underground is finally achieved, and the improved algorithm is named YOLOv8-MCW.
Further, step S3 includes the steps of:
S31, MobileNetV3-Large replaces the traditional backbone of the YOLOv8 network, keeping the model lightweight with almost no loss of accuracy. MobileNetV3 introduces 5×5 depthwise convolutions in place of some 3×3 depthwise convolutions and adds Squeeze-and-Excitation (SE) modules and the h-swish (HS) activation function to improve accuracy; the resulting YOLOv8-M model is compact and excels in both performance and speed.
S32, adding an attention mechanism module: a coordinate attention (CA) mechanism is introduced into the neck network, embedding precise positioning signals into the channel attention. The features are split into encodings along two different directions, one direction retaining precise position signals and the other capturing long-range dependencies; encoding the two directions separately forms feature maps containing a pair of direction-aware and position-sensitive signals, which enhances the network's ability to localize and recognize targets precisely.
S33, optimizing the loss function: the traditional Intersection over Union (IoU) considers only the overlap between the predicted box and the ground-truth box, not the region between them, so its evaluation may be biased. WIoU takes factors such as aspect ratio, centroid distance, and overlap area into account while avoiding the computational cost of inverse trigonometric functions. Since WIoU v3 is better suited to detecting blurred small targets, WIoU v3 is selected as the loss function of YOLOv8-MC; its calculation proceeds as follows.
The L_WIoUv1 bounding box loss function is defined as follows:
L_IoU = 1 − IoU
R_WIoU = exp( ((x_c − x_c^gt)^2 + (y_c − y_c^gt)^2) / (c_w^2 + c_h^2)^* )
L_WIoUv1 = R_WIoU · L_IoU
where IoU measures the degree of overlap between the predicted and ground-truth bounding boxes, L_IoU is the bounding box loss function, c_h and c_w are the height and width of the smallest enclosing box formed by the predicted and ground-truth boxes, (x_c^gt, y_c^gt) are the centroid coordinates of the ground-truth box, (x_c, y_c) are the centroid coordinates of the predicted box, and the superscript * denotes that c_w^2 + c_h^2 is detached from the gradient computation.
Multiplying by the gradient gain r on the basis of L_WIoUv1 defines the L_WIoUv3 bounding box loss function:
L_WIoUv3 = r · L_WIoUv1, with r = β / (δ · α^(β − δ)) and β = L_IoU^* / L̄_IoU
where r is a non-monotonic focal factor, L_WIoUv1 is the bounding box loss function defined above, L_IoU^* is the monotonic focal coefficient (the current bounding box loss, detached from the gradient), L̄_IoU is the running average bounding box loss value, the ratio β of the two assesses sample quality, and δ and α are hyperparameters set manually to suit different models.
WIoU v3 uses a dynamic non-monotonic mechanism to evaluate anchor box quality, making the model focus more on anchor boxes of ordinary quality and improving its ability to localize objects. For target detection in complex underground scenes, where targets are blurred and small and detection is difficult, the WIoU v3 loss function can dynamically optimize the loss weight of small targets and thus improve the detection performance of YOLOv8-MCW.
S34, model detection effect and performance evaluation: to test the detection performance of the improved model, the invention uses precision, recall, mAP@0.5, model parameter scale, model computation (floating point operations, FLOPs), model training time, and detection speed (unit: frames/s) as evaluation indicators. The formulas for these indicators use the quantities TP (predicted positive, actually positive), FP (predicted positive, actually negative), and FN (predicted negative, actually positive).
S4, multi-camera tracking based on the DeepSort tracking algorithm: when tracking targets, mutual occlusion among targets and lighting changes in the environment can cause tracking failure, which affects the subsequent judgment of chasing relations between targets. It is therefore necessary to use multi-view information effectively to improve the accuracy and stability of target tracking.
S41, calculating homography matrixes corresponding to cameras with different view angles, mapping the image space with the overlapping area by using the homography matrixes, and further processing shielding to complete target matching. The homography matrix may establish correspondence between different images so that the model can infer information in one image from another.
Specifically, using the overlapping area between two or more different viewing angles, key points are extracted with the SIFT algorithm; the spatial relation between the different viewing angles is deduced from the relation between image space and the world coordinate system, and the homography matrix between the viewing angles is then obtained.
S42, deepSort is a multi-target tracking algorithm, uses a Kalman filtering algorithm to infer the motion quantity at the next moment according to the motion quantity at the current moment, combines the motion quantity information and the appearance information, and uses a Hungary algorithm to match the detection frame and the prediction frame. In each frame, the algorithm first predicts all possible targets using a deep learning model, then data correlates according to the appearance information, and finally updates the trajectory information using a kalman filter.
The Kalman filter is a recursive filtering algorithm that continuously refines its state estimate as new observations arrive.
The Hungarian algorithm is an optimization algorithm for solving the assignment problem. Its principle is based on maximum matching in graph theory and linear programming, and it finds the best task assignment under a given cost matrix; in multi-target tracking it matches multiple targets between frames, handling the appearance of new targets, the disappearance of old targets, and the matching of target ids between the previous and current frames.
The algorithm performs well in complex scenes with dense pedestrians and occlusion, accurately tracking pedestrian motion trajectories and providing strong support for acquiring information about moving pedestrians in real time. The performance achieved by multi-camera tracking built on the DeepSort algorithm likewise shows that combining the two enables more efficient and more accurate target tracking.
S43, the homography matrices between the videos are calculated, the multi-view videos are stitched and taken as input, and the Kalman filter is used to predict the data.
The homography matrix and the Hungarian algorithm are then combined to identify the targets, thereby realizing target tracking across multiple cameras.
S5, calculating the motion vectors and distances between tracked targets: based on the tracking results, the position of the center point of each object is obtained in every frame, and from the change of the center point position between each pair of adjacent frames, the direction of motion, the speed, and the Euclidean distance between the two points are calculated, yielding the motion information of the objects.
S6, chase decision. The specific steps are: 1) if the speed of object A is greater than that of object B and this condition persists, a chase is considered possible; 2) the direction-of-motion vectors of the two objects are calculated, and if the angle between the two vectors is small (i.e., the two direction vectors are nearly parallel; an angle threshold can be set, e.g., 30 degrees), object A is judged to be moving toward object B; 3) if the distance between object A and object B decreases over time, chasing behavior is further confirmed.
Compared with the prior art, the invention has the following beneficial effects:
1. For the blurred data and small targets in material collected in the special underground coal mine scene, the invention adds a CA attention mechanism to complete feature extraction better, effectively handles ambient light interference and numerous, complex targets, and obtains an increase in detection accuracy.
2. The invention improves the accuracy and stability of target tracking: multi-camera tracking solves the problems of viewpoint change and occlusion in a multi-camera environment, makes effective use of multi-view information, copes better with challenges such as occlusion and lighting changes, and provides strong support for real-time tracking of miners and the overhead man-riding device.
3. The recognition method for coal mine personnel chasing the overhead man-riding device solves the problems of wasted human resources and monitoring blind spots caused by past manual inspection and monitoring; violations and potential safety hazards in the coal mine production process are checked in time, the working efficiency of the coal mine is improved, and the risk of accidents caused by personnel riding the overhead man-riding device improperly is reduced.
Drawings
FIG. 1 is an overall flow chart of the method for judging whether a person chases the overhead man-riding device.
FIG. 2 is a schematic diagram of a working interface for LabelImg software annotation pictures.
FIG. 3 is a schematic diagram of the network architecture of the improved YOLOv8-MCW algorithm of the present invention.
Fig. 4 is a block diagram of a CA attention mechanism model.
FIG. 5 is a first example diagram of targets detected by the YOLOv8-MCW algorithm.
FIG. 6 is a second example diagram of targets detected by the YOLOv8-MCW algorithm.
FIG. 7 is a schematic diagram of the DeepSort algorithm multi-camera tracking process.
FIG. 8 is a diagram showing an example of detection tracking in an embodiment of the present invention.
Detailed Description
In order to make the technical problems to be solved, the technical solutions, and the beneficial effects of the invention clearer, the invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
As shown in fig. 1, the recognition method, based on computer vision, for coal mine personnel chasing an overhead man-riding device specifically comprises the following steps:
S1, acquiring image data: using existing cameras in the actual coal mine application scenario, a real-time video stream is acquired from monitoring points covering the overhead man-riding device; several video segments of pedestrians chasing the device are intercepted and screened, videos with blurred targets are deleted, and frames are extracted with PotPlayer software, yielding the 1500 images required for model detection and subsequent data processing.
S2, data set manufacturing, namely respectively carrying out data enhancement preprocessing and data labeling on the acquired data.
S21, data enhancement refers to transforming and expanding limited data to generate more and richer samples; it is an effective means of improving the diversity of training samples, reducing the model's dependence on a specific data distribution, and enhancing the generalization ability and robustness of the model.
To make up for the limitations of the data set, the invention expands the original training set with data enhancement techniques. The imgaug framework (an image augmentation library) was chosen as the implementation tool; it provides a variety of image transformations that can be applied randomly during training to generate diverse image samples. A series of transformation strategies is adopted, including random rotation, horizontal flipping, vertical flipping, random scaling, and brightness and contrast adjustment; randomly applying these transformations during training generates a large number of images at different angles and scales, better simulating the image diversity of real scenes. For example, the invention performs data enhancement by flipping the image horizontally, rotating it 15 degrees clockwise, adding 5% noise, applying 25% grayscale, and adjusting brightness by 40%, which increases the diversity of the training data set and improves the robustness of the model; a sketch of such a pipeline is shown below.
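For illustration, the following is a minimal sketch of such an augmentation pipeline built with the imgaug library; the operator values echo the examples above but are illustrative rather than the patent's exact settings, and the file path is hypothetical.

```python
# A minimal sketch of the augmentation pipeline described above, using imgaug;
# parameter values are illustrative, not the patent's exact settings.
import imgaug.augmenters as iaa
import imageio.v2 as imageio

augmenter = iaa.Sequential([
    iaa.Fliplr(0.5),                                  # random horizontal flip
    iaa.Flipud(0.2),                                  # random vertical flip
    iaa.Affine(rotate=(-15, 15), scale=(0.8, 1.2)),   # rotation and random scaling
    iaa.AdditiveGaussianNoise(scale=0.05 * 255),      # ~5% noise
    iaa.Grayscale(alpha=(0.0, 0.25)),                 # partial grayscale, up to 25%
    iaa.MultiplyBrightness((0.6, 1.4)),               # brightness +/- 40%
], random_order=True)

image = imageio.imread("frames/frame_0001.jpg")       # hypothetical path
augmented = augmenter(images=[image] * 8)             # 8 randomized variants
```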
S22, data labeling: the images after data enhancement preprocessing are annotated with LabelImg software (an annotation tool). FIG. 2 shows an example of the working interface when annotating images of personnel and the overhead man-riding device with LabelImg; the pedestrian class is named person and the overhead man-riding device class man_riding_device, and annotation generates txt files containing position coordinates and class names.
In operation, an annotation box is drawn by dragging with the mouse in the labelImg tool, the framed region is then labeled and classified, and the labels are named person and man_riding_device respectively. To ensure the accuracy of image annotation across the data set, the annotation standard is set in advance as the smallest rectangle that can contain the image target.
The labeled data are divided into a training set, a test set, and a validation set in a 7:2:1 ratio, giving a txt-format data set with 1050 training images, 300 test images, and 150 validation images. A sketch of such a split is shown below.
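The following sketch illustrates one way to produce the 7:2:1 train/test/validation split described above; the directory layout and output file names are hypothetical.

```python
# A sketch of the 7:2:1 train/test/validation split described above.
# The directory layout and output names are hypothetical.
import random
from pathlib import Path

images = sorted(Path("dataset/images").glob("*.jpg"))
random.seed(42)
random.shuffle(images)

n = len(images)
n_train, n_test = int(n * 0.7), int(n * 0.2)
splits = {
    "train": images[:n_train],
    "test": images[n_train:n_train + n_test],
    "val": images[n_train + n_test:],
}
for name, files in splits.items():
    with open(f"dataset/{name}.txt", "w") as f:   # YOLO-style split lists
        f.write("\n".join(str(p) for p in files))
```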
S3, model training: the invention adopts the improved YOLOv8 algorithm (a target detection algorithm) to identify personnel and the overhead man-riding device. The YOLOv8 algorithm offers significant advances in detection accuracy and speed, but still leaves room for improvement in specific application scenarios. Since the invention performs target detection at the wellhead and in roadways where the overhead man-riding device is ridden, the YOLOv8 algorithm needs to be further optimized to improve the overall detection performance.
The improvement of the YOLOv8 algorithm mainly involves three core points. First, MobileNetV3-Large replaces the traditional backbone of the YOLOv8 network, keeping the model lightweight and easy to deploy. Second, a coordinate attention (CA) module is introduced: because the cameras are installed far from the detection targets, the targets are small in the overall picture and blurred underground, and such small targets are easily missed by the detection network, so an attention mechanism is added to improve the model's detection of distant overhead man-riding device and personnel targets. Third, WIoU v3 is selected as the bounding box regression loss function to further optimize the algorithm model and improve the accuracy of target detection and localization. Through these optimization strategies, efficient and accurate detection of personnel and overhead man-riding device targets at the wellhead and underground is finally realized, and the improved algorithm is named YOLOv8-MCW. A schematic diagram of the network architecture of the improved YOLOv8-MCW algorithm is shown in fig. 3.
Further, step S3 includes the steps of:
S31, MobileNetV3-Large (the third-generation lightweight MobileNet model) replaces the traditional backbone of the YOLOv8 network, keeping the model lightweight without losing accuracy. MobileNetV3 introduces 5×5 depthwise convolutions in place of some 3×3 depthwise convolutions and adds Squeeze-and-Excitation (SE) modules and the h-swish (HS) activation function to improve accuracy; the resulting YOLOv8-M model (YOLOv8 with its backbone replaced by MobileNetV3-Large) is compact and excels in both performance and speed. The model detection results after replacing the backbone are shown in Table 1 below:
Table 1. Comparison of model detection results after backbone replacement
Compared with the original YOLOv8 network, the improved network model raises the detection speed to 90.29 FPS at the cost of a 0.2% drop in precision, a 3.6% drop in recall, and a 0.4% drop in average precision; its computation is 11.4 GFLOPs lower than the original YOLOv8 model, and the model size is reduced by 8.4 MB. The data in Table 1 demonstrate the efficiency and light weight obtained by replacing the YOLOv8 backbone with MobileNetV3; a sketch of obtaining such a backbone follows.
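The patent does not publish the modified network definition, so the following sketch only illustrates the backbone-swap idea: obtaining a MobileNetV3-Large feature extractor with torchvision. How it is wired into the YOLOv8 neck and head is an assumption left out here.

```python
# A sketch of obtaining a MobileNetV3-Large feature extractor as a detection
# backbone with torchvision; the actual integration into YOLOv8 is not
# published by the patent, so this only illustrates the swap idea.
import torch
from torchvision.models import mobilenet_v3_large, MobileNet_V3_Large_Weights

weights = MobileNet_V3_Large_Weights.DEFAULT
backbone = mobilenet_v3_large(weights=weights).features  # conv trunk only

x = torch.randn(1, 3, 640, 640)   # typical detector input size
features = backbone(x)            # (1, 960, 20, 20) stride-32 feature map
print(features.shape)
```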
S32, adding an attention mechanism module: a coordinate attention (CA) mechanism is introduced into the neck network, embedding precise positioning signals into the channel attention. The features are split into encodings along two different directions, one retaining precise position signals and the other capturing long-range dependencies; a structure diagram of the CA attention mechanism is shown in fig. 4. Encoding the two directions separately forms feature maps containing a pair of direction-aware and position-sensitive signals, which enhances precise localization and target recognition. The detection results after adding the attention mechanism are shown in Table 2 below:
Table 2. Comparison of detection results after adding the CA attention mechanism
As shown in step S31, after the backbone is replaced with MobileNetV3, precision, recall, and average precision all decline even though detection speed and model size are optimized; the results in Table 2 indicate that adding the CA attention mechanism effectively compensates for part of the index loss caused by replacing the backbone of the YOLOv8 network with MobileNetV3. A PyTorch sketch of the CA block appears below.
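The following is a compact PyTorch sketch of a coordinate attention block of the kind described above, following the published CA design; the reduction ratio and pooling choices are illustrative assumptions, not the patent's exact configuration.

```python
# A compact PyTorch sketch of the coordinate attention (CA) block described
# above; the reduction ratio is an illustrative assumption.
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):
        n, c, h, w = x.shape
        # Encode features along the two spatial directions separately:
        # pooling over width keeps per-row (height) position information,
        # pooling over height keeps per-column (width) information.
        x_h = x.mean(dim=3, keepdim=True)                       # (n, c, h, 1)
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # (n, c, w, 1)
        y = self.act(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                   # (n, c, h, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (n, c, 1, w)
        return x * a_h * a_w        # direction-aware attention weighting
```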
S33, optimizing the loss function: the traditional Intersection over Union (IoU) metric considers only the overlap between the predicted box and the ground-truth box, not the region between them, so its evaluation may be biased. WIoU takes factors such as aspect ratio, centroid distance, and overlap area into account while avoiding the computational cost of inverse trigonometric functions, and the dynamic non-monotonic focusing mechanism of WIoU v3 is better suited to detecting blurred small targets. The invention therefore selects WIoU v3 as the loss function of YOLOv8-MC (YOLOv8 with the MobileNetV3-Large backbone and the added attention module); the WIoU v3 loss function is calculated as follows.
The L_WIoUv1 bounding box loss function is defined as follows:
L_IoU = 1 − IoU
R_WIoU = exp( ((x_c − x_c^gt)^2 + (y_c − y_c^gt)^2) / (c_w^2 + c_h^2)^* )
L_WIoUv1 = R_WIoU · L_IoU
where IoU measures the degree of overlap between the predicted and ground-truth bounding boxes, L_IoU is the bounding box loss function, c_h and c_w are the height and width of the smallest enclosing box formed by the predicted and ground-truth boxes, (x_c^gt, y_c^gt) are the centroid coordinates of the ground-truth box, (x_c, y_c) are the centroid coordinates of the predicted box, and the superscript * denotes that c_w^2 + c_h^2 is detached from the gradient computation.
Multiplying by the gradient gain r on the basis of L_WIoUv1 defines the L_WIoUv3 bounding box loss function:
L_WIoUv3 = r · L_WIoUv1, with r = β / (δ · α^(β − δ)) and β = L_IoU^* / L̄_IoU
where r is a non-monotonic focal factor, L_WIoUv1 is the bounding box loss function defined above, L_IoU^* is the monotonic focal coefficient (the current bounding box loss, detached from the gradient), L̄_IoU is the running average bounding box loss value, the ratio β of the two assesses sample quality, and δ and α are hyperparameters set manually to suit different models, which in the invention are set to 1.9 and 3 respectively. A sketch of this computation is given below.
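The following is a sketch of the WIoU v3 computation as reconstructed from the definitions above. The (x1, y1, x2, y2) box format, the exponential moving average for the running mean loss, and the hyperparameter defaults (alpha = 1.9, delta = 3, following the WIoU paper) are assumptions; the patent does not publish these implementation details.

```python
# A sketch of the WIoU v3 loss as reconstructed above; box format, the EMA for
# the running mean loss, and hyperparameter defaults are assumptions.
import torch

def wiou_v3_loss(pred, target, mean_liou, alpha=1.9, delta=3.0, momentum=0.01):
    # Intersection and union for axis-aligned (x1, y1, x2, y2) boxes.
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    area_p = (pred[:, 2:] - pred[:, :2]).clamp(min=0).prod(dim=1)
    area_t = (target[:, 2:] - target[:, :2]).clamp(min=0).prod(dim=1)
    iou = inter / (area_p + area_t - inter + 1e-7)
    l_iou = 1.0 - iou

    # R_WIoU: squared centroid distance normalized by the squared diagonal of
    # the smallest enclosing box, with the denominator detached (the * above).
    c_wh = torch.max(pred[:, 2:], target[:, 2:]) - torch.min(pred[:, :2], target[:, :2])
    center_p = (pred[:, :2] + pred[:, 2:]) / 2
    center_t = (target[:, :2] + target[:, 2:]) / 2
    dist2 = ((center_p - center_t) ** 2).sum(dim=1)
    r_wiou = torch.exp(dist2 / (c_wh ** 2).sum(dim=1).detach())
    l_v1 = r_wiou * l_iou

    # Non-monotonic focal factor r from the outlier degree beta.
    beta = l_iou.detach() / (mean_liou + 1e-7)
    r = beta / (delta * alpha ** (beta - delta))
    new_mean = (1 - momentum) * mean_liou + momentum * l_iou.mean().detach()
    return (r * l_v1).mean(), new_mean
```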
WIoU v3 uses a dynamic non-monotonic mechanism to evaluate anchor box quality, making the model focus more on anchor boxes of ordinary quality and improving its ability to localize objects. For target detection in complex underground scenes, where targets are blurred and small and detection is difficult, the WIoU v3 loss function can dynamically optimize the loss weight of small targets and thus improve the detection performance of YOLOv8-MCW. The model detection results with different loss functions are shown in Table 3 below:
Table 3. Comparison of YOLOv8-MC detection results with different loss functions
The data in Table 3 clearly show that the YOLOv8-MCW improved model provided by the invention has obvious advantages in detecting underground coal mine personnel and the overhead man-riding device. Personnel and overhead man-riding device targets detected by the improved YOLOv8-MCW algorithm are shown in figs. 5 and 6, with the detected class and confidence indicated at the upper left corner. Compared with the traditional YOLOv8 target detection algorithm model, the improved model performs markedly better in detecting underground coal mine personnel and overhead man-riding device targets.
S34, model detection effect and performance evaluation: to test the detection performance of the improved model, the invention uses precision, recall, mAP@0.5, model parameter scale, model computation (floating point operations, FLOPs), model training time, and detection speed (unit: frames/s) as evaluation indicators. The formulas for these indicators use the quantities TP (predicted positive, actually positive), FP (predicted positive, actually negative), and FN (predicted negative, actually positive).
Precision is the ratio of correctly predicted positive samples to all samples predicted as positive, calculated as follows:
Precision = TP / (TP + FP)
where TP is the number of true positives (samples correctly predicted as positive) and FP is the number of false positives (negative samples predicted as positive).
Recall is the ratio of correctly predicted positive samples to the number of positive samples actually present, calculated as follows:
Recall = TP / (TP + FN)
where FN is the number of false negatives (positive samples predicted as negative).
The average precision (AP) is the area under the precision-recall curve, and the mean average precision (mAP@0.5) is the average of the AP values over all sample classes, used to evaluate the detection performance of the model across all classes, with the IoU threshold between predicted and ground-truth boxes set to 0.5:
mAP@0.5 = (1/N) · Σ_{i=1}^{N} AP_i
where N is the number of sample classes, AP_i is the area under the precision-recall curve for class i, and @0.5 indicates that an IoU threshold of 0.5 is used when computing the AP values. A small helper sketch of these computations follows.
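For illustration, these metrics can be computed as below; the per-class counts and AP values here are placeholders, not the patent's reported results.

```python
# A sketch of the precision / recall / mAP@0.5 formulas above, given per-class
# TP, FP, FN counts and AP values; the numbers are placeholders.
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if tp + fn else 0.0

def mean_ap(ap_per_class: list[float]) -> float:
    return sum(ap_per_class) / len(ap_per_class)

# Example with placeholder counts for the two classes:
print(precision(tp=950, fp=40), recall(tp=950, fn=60))
print(mean_ap([0.91, 0.88]))   # mAP@0.5 over person and man_riding_device
```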
S4, realizing multi-camera tracking based on the DeepSort tracking algorithm. When tracking targets, mutual occlusion among targets and lighting changes in the environment can cause tracking failure, affecting the subsequent judgment of chasing relations between targets. It is therefore necessary to use multi-view information effectively to improve the accuracy and stability of target tracking. A schematic diagram of the multi-camera tracking process based on the DeepSort algorithm is shown in fig. 7.
S41, calculating homography matrixes corresponding to cameras with different view angles, mapping the image space with the overlapping area by using the homography matrixes, and further processing shielding to complete target matching. The homography matrix may establish correspondence between different images so that the model can infer information in one image from another.
Specifically, using the overlapping regions between two or more different viewing angles, key points are extracted with the SIFT algorithm. The spatial relation between the different viewing angles is deduced from the relation between image space and the world coordinate system, and the homography matrix between the viewing angles is then obtained, as the OpenCV sketch below illustrates.
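The following sketch estimates the homography between two overlapping camera views from SIFT keypoints with RANSAC, using OpenCV; the image sources are hypothetical, and the ratio-test and reprojection thresholds are common defaults rather than values specified by the patent.

```python
# A sketch of estimating the homography between two overlapping camera views
# with SIFT keypoints and RANSAC; image paths and thresholds are assumptions.
import cv2
import numpy as np

img1 = cv2.imread("cam1_frame.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical
img2 = cv2.imread("cam2_frame.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Lowe ratio-test matching of descriptors between the two views.
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)  # 3x3 homography
```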
S42, deepSort, deepSort is a multi-target tracking algorithm, which uses a kalman filtering algorithm to infer the motion amount at the next moment according to the motion amount at the current moment, and uses a hungarian algorithm to match the detection frame and the prediction frame in combination with the motion amount information and the appearance information. In each frame, the algorithm first predicts all possible targets using a deep learning model, then data correlates according to the appearance information, and finally updates the trajectory information using a kalman filter.
As for the Kalman filter, it is a recursive filtering algorithm that continuously refines its state estimate as new observations arrive; a minimal sketch is given below.
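The following minimal constant-velocity predict/update cycle for a 2D box centroid illustrates the recursive estimation described above. DeepSort's actual state vector is richer (it also tracks aspect ratio and height), and the noise covariances here are assumed values, so this is a simplification.

```python
# A minimal constant-velocity Kalman filter sketch for a 2D centroid;
# noise covariances are assumed, and DeepSort's real state is richer.
import numpy as np

class ConstantVelocityKF:
    def __init__(self):
        dt = 1.0                                    # one frame per step
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], float)    # state transition (x, y, vx, vy)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], float)    # only the position is observed
        self.Q = np.eye(4) * 1e-2                   # process noise (assumed)
        self.R = np.eye(2) * 1.0                    # measurement noise (assumed)
        self.x = np.zeros(4)
        self.P = np.eye(4) * 10.0

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                           # predicted centroid

    def update(self, z):
        y = z - self.H @ self.x                     # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)    # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```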
As for the Hungarian algorithm, it is an optimization algorithm for solving the assignment problem. Its principle is based on maximum matching in graph theory and linear programming, and it finds the best task assignment under a given cost matrix; in multi-target tracking it matches multiple targets between frames, handling the appearance of new targets, the disappearance of old targets, and the matching of target ids between the previous and current frames (see the scipy sketch below).
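The following sketch performs frame-to-frame assignment with the Hungarian algorithm via scipy. The cost matrix here is plain centroid distance and the gating threshold is an assumed value; DeepSort itself combines motion (Mahalanobis) and appearance costs.

```python
# A sketch of frame-to-frame assignment with the Hungarian algorithm; the
# distance-only cost matrix and gating threshold are simplifying assumptions.
import numpy as np
from scipy.optimize import linear_sum_assignment

tracks = np.array([[100.0, 200.0], [400.0, 260.0]])        # predicted centroids
detections = np.array([[405.0, 258.0], [102.0, 205.0], [700.0, 90.0]])

cost = np.linalg.norm(tracks[:, None, :] - detections[None, :, :], axis=2)
rows, cols = linear_sum_assignment(cost)                    # optimal assignment

MAX_DIST = 50.0                                             # gating (assumed)
for t, d in zip(rows, cols):
    if cost[t, d] < MAX_DIST:
        print(f"track {t} -> detection {d}")                # matched pair
# Unmatched detections start new tracks; unmatched tracks may be lost.
```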
The algorithm performs well in complex scenes with dense pedestrians and occlusion, accurately tracking pedestrian motion trajectories and providing strong support for acquiring information about moving pedestrians in real time. The performance achieved by multi-camera tracking built on the DeepSort algorithm likewise shows that combining the two enables more efficient and more accurate target tracking. FIG. 8 shows an example of personnel and the overhead man-riding device detected with the target detection and tracking algorithms adopted by the invention, with the tracked target class and id indicated at the upper left corner.
S43, realizing multi-camera target tracking: the homography matrices between the videos are calculated, the multi-view videos are stitched and taken as input, and the Kalman filter is used to predict the data.
The homography matrix and the Hungarian algorithm are then combined to identify the targets, realizing target tracking across multiple cameras; the overall flow of multi-camera target tracking is shown in fig. 7.
S5, calculating the motion vectors and distances between tracked targets: based on the tracking results, the position of the center point of each object is obtained in every frame, and from the change of the center point position between each pair of adjacent frames, the direction of motion, the speed, and the Euclidean distance between the two points are calculated, yielding the motion information of the objects.
Further, the step S5 includes the steps of:
S51, calculating the Euclidean distance between two targets from the detected coordinate information, with the calculation formula:
x_center = (x_min + x_max) / 2, y_center = (y_min + y_max) / 2
d = sqrt( (x_center1 − x_center2)^2 + (y_center1 − y_center2)^2 )
where (x_min1, y_min1, x_max1, y_max1) and (x_min2, y_min2, x_max2, y_max2) are the bounding box coordinates of the two targets, (x_center1, y_center1) and (x_center2, y_center2) are the coordinates of the center points of the two targets, and d is the Euclidean distance between the targets.
S52, according to the tracking results, calculating the direction of motion of each target over n consecutive frames, and computing the angle between the directions of the pedestrian (target A) and the overhead man-riding device (target B), with the calculation formula:
v1 = (x'_1 − x_1, y'_1 − y_1), v2 = (x'_2 − x_2, y'_2 − y_2)
θ = arccos( (v1 · v2) / (|v1| · |v2|) )
where x_1, y_1 are the coordinates of target A in the current frame, x'_1, y'_1 its coordinates in the next frame, x_2, y_2 the coordinates of target B in the current frame, x'_2, y'_2 its coordinates in the next frame, v1 and v2 are the motion vectors of the two targets, and θ is the angle between them.
S53, calculating the relative movement speed between targets: from the Euclidean distance obtained in step S51, the movement time is derived from the frame rate and the counted number of frames, giving the movement speed:
v = d / (frame_num / fps) = d · fps / frame_num
where frame_num is the set number of frames, fps is the video frame rate, and d is the distance the target moves within the set number of frames. A combined sketch of steps S51-S53 follows.
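The following sketch combines steps S51-S53 using the reconstructed formulas above: centroid extraction, Euclidean distance, direction angle, and speed.

```python
# A sketch combining steps S51-S53: centroid distance, direction angle, and
# speed, from the formulas reconstructed above.
import math

def center(box):
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)

def euclidean(c1, c2):
    return math.hypot(c1[0] - c2[0], c1[1] - c2[1])

def direction_angle(p_now, p_next, q_now, q_next):
    # Angle between the frame-to-frame motion vectors of two targets.
    v1 = (p_next[0] - p_now[0], p_next[1] - p_now[1])
    v2 = (q_next[0] - q_now[0], q_next[1] - q_now[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def speed(distance_moved, frame_num, fps):
    return distance_moved / (frame_num / fps)   # pixels per second
```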
S6, chase decision: 1) according to the relative movement speed between targets calculated in S53, if the speed of object A is greater than that of object B and this condition persists, a chase is considered possible; 2) according to the angle between the directions of motion of the two objects calculated in S52, if the angle over consecutive frames is smaller than the set threshold (here set to 20°), object A is judged to be moving toward object B; 3) if the distance between object A and object B decreases over time, chasing behavior is further confirmed, as sketched below.
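The following sketch applies the three-part decision over a sliding window of per-frame motion records; the window length and record layout are assumptions for illustration.

```python
# A sketch of the three-part chase decision in S6 over a sliding window of
# per-frame motion records; the window length is an assumption.
def is_chasing(records, angle_threshold=20.0, window=30):
    """records: list of dicts with keys 'speed_a', 'speed_b', 'angle',
    'distance' for consecutive frames (A = pedestrian, B = device)."""
    recent = records[-window:]
    if len(recent) < window:
        return False
    faster = all(r["speed_a"] > r["speed_b"] for r in recent)     # 1) sustained speed gap
    aligned = all(r["angle"] < angle_threshold for r in recent)   # 2) moving toward target
    closing = all(a["distance"] > b["distance"]                   # 3) distance decreasing
                  for a, b in zip(recent, recent[1:]))
    return faster and aligned and closing
```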
The foregoing description of the preferred embodiment of the invention is not intended to limit the invention to the precise form disclosed, and any modifications, equivalent substitutions, and improvements falling within the spirit and principles of the invention are intended to be included within its scope.

Claims (5)

1. A method, based on computer vision, for recognizing coal mine personnel chasing an overhead man-riding device, characterized in that the specific steps are:
Step 1: using existing cameras in the actual coal mine application scenario, acquiring real-time video streams from monitoring points covering the overhead man-riding device scene, intercepting several video segments of pedestrians chasing the device, screening the intercepted videos, and extracting frames from them to obtain a data set;
Step 2: performing data enhancement preprocessing and data labeling on the data obtained in Step 1;
Step 3: identifying personnel and the overhead man-riding device with the improved YOLOv8-MCW algorithm, wherein the YOLOv8-MCW improvement specifically comprises: using MobileNetV3-Large as the backbone, introducing 5×5 depthwise convolutions, Squeeze-and-Excitation modules, and the h-swish activation function; introducing a coordinate attention mechanism into the neck network and embedding positioning signals into the channel attention, splitting the features into encodings along two different directions, one direction retaining short-range position signals and the other capturing long-range dependencies, the two encodings forming feature maps containing a pair of target-aware and positioning signals; and selecting WIoU v3 as the loss function of YOLOv8-MCW, the WIoU v3 loss function being calculated as follows:
the WIoU v1 bounding box loss function is defined as
L_IoU = 1 − IoU
R_WIoU = exp( ((x_c − x_c^gt)^2 + (y_c − y_c^gt)^2) / (c_w^2 + c_h^2)^* )
L_WIoUv1 = R_WIoU · L_IoU
where IoU measures the overlap between the predicted and ground-truth bounding boxes, L_IoU is the bounding box loss function, c_h and c_w are the height and width of the smallest enclosing box formed by the predicted and ground-truth boxes, (x_c^gt, y_c^gt) are the centroid coordinates of the ground-truth box, and (x_c, y_c) are the centroid coordinates of the predicted box;
multiplying L_WIoUv1 by the gradient gain r defines the L_WIoUv3 bounding box loss function:
L_WIoUv3 = r · L_WIoUv1, with r = β / (δ · α^(β − δ)) and β = L_IoU^* / L̄_IoU
where r is a non-monotonic focal factor, L_WIoUv1 is the bounding box loss function, L_IoU^* is the monotonic focal coefficient, L̄_IoU is the set average bounding box loss value, the ratio β of the two assesses sample quality, and δ and α are hyperparameters set manually to suit different models;
using precision, recall, mAP@0.5, model parameter scale, model computation, model training time, and detection speed as evaluation indicators;
Step 4: realizing target tracking across multiple cameras based on the DeepSort tracking algorithm;
Step 5: calculating the motion vectors and distances between tracked targets;
Step 6: judging whether chasing behavior exists.
2. The method according to claim 1, characterized in that Step 2 comprises the following specific steps:
Step S21, data enhancement preprocessing: transforming and expanding the data collected in Step 1 to generate new samples;
Step S22, data labeling: labeling pedestrians on the preprocessed images with the class name person, labeling the overhead man-riding device with the class name man_riding_device, generating txt files containing position coordinates and class names after labeling, and dividing the labeled data set into a training set, a test set, and a validation set in a 7:2:1 ratio.
3. The method according to claim 2, characterized in that Step 4 comprises the following specific steps:
Step S41, calculating the homography matrices corresponding to cameras with different viewing angles, using the homography matrices to map image spaces with overlapping areas, handling occlusion, and completing target matching;
Step S42, using the Kalman filter algorithm to infer the motion state at the next moment from the motion state at the current moment and, combining motion information with appearance information, using the Hungarian algorithm to match detection boxes with prediction boxes;
Step S43, calculating the homography matrices between videos, stitching the multi-view videos, taking the stitched videos as input, predicting data with the Kalman filter, and then combining the homography matrix and the Hungarian algorithm to identify targets, realizing target tracking across multiple cameras.
4. The method according to claim 3, characterized in that in Step 5 the specific calculation steps are: based on the tracking results, obtaining the position of the center point of each object in every frame and, in each pair of adjacent frames, calculating from the change of the center point position the object's direction of motion, speed, and the Euclidean distance between the tracked targets, so as to obtain the motion information of the objects.
5. The method according to claim 4, characterized in that in Step 6 the specific decision method is: 1) if the speed of object A is greater than that of object B and this condition persists, a chase is considered possible; 2) calculating the direction-of-motion vectors of objects A and B, and if the angle between the two vectors is smaller than a set value, judging that object A is moving toward object B; 3) if the distance between object A and object B decreases over time, judging that chasing behavior exists.
CN202411288242.5A 2024-09-14 Identification method of coal mine personnel chasing overhead passenger devices based on computer vision Active CN119152575B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411288242.5A CN119152575B (en) 2024-09-14 Identification method of coal mine personnel chasing overhead passenger devices based on computer vision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411288242.5A CN119152575B (en) 2024-09-14 Identification method of coal mine personnel chasing overhead passenger devices based on computer vision

Publications (2)

Publication Number Publication Date
CN119152575A CN119152575A (en) 2024-12-17
CN119152575B 2025-04-04


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116681724A (en) * 2023-04-11 2023-09-01 安徽理工大学 Video tracking method and storage medium for mine personnel target based on YOLOv5-deep algorithm
CN116778410A (en) * 2023-06-08 2023-09-19 西安博深安全科技股份有限公司 Deep learning-based coal mine underground operation personnel detection and tracking method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116681724A (en) * 2023-04-11 2023-09-01 安徽理工大学 Video tracking method and storage medium for mine personnel target based on YOLOv5-deep algorithm
CN116778410A (en) * 2023-06-08 2023-09-19 西安博深安全科技股份有限公司 Deep learning-based coal mine underground operation personnel detection and tracking method

Similar Documents

Publication Publication Date Title
Yang et al. Vision-based tower crane tracking for understanding construction activity
Xiao et al. A vision-based method for automatic tracking of construction machines at nighttime based on deep learning illumination enhancement
CN103366569B (en) The method and system of real-time grasp shoot traffic violation vehicle
CN112767644B (en) Method and device for early warning fire in highway tunnel based on video identification
CN112329671B (en) Pedestrian running behavior detection method based on deep learning and related components
Kim Visual analytics for operation-level construction monitoring and documentation: State-of-the-art technologies, research challenges, and future directions
CN114495421B (en) Intelligent open type road construction operation monitoring and early warning method and system
Wu et al. Vehicle Classification and Counting System Using YOLO Object Detection Technology.
CN114299106A (en) High-altitude parabolic early warning system and method based on visual sensing and track prediction
CN117994700A (en) Intelligent construction site personnel behavior recognition system and method based on AI intelligent recognition
CN119152575B (en) Identification method of coal mine personnel chasing overhead passenger devices based on computer vision
JP7078295B2 (en) Deformity detection device, deformation detection method, and program
CN119229526A (en) Intelligent identification method of risky behavior violations in power operations based on machine vision
CN113744302A (en) Dynamic target behavior prediction method and system
CN119152575A (en) Recognition method of colliery personnel catching up overhead riding device based on computer vision
Kim et al. Training a visual scene understanding model only with synthetic construction images
CN115035543B (en) Big data-based movement track prediction system
Deng et al. Automatic Vision-Based Dump Truck Productivity Measurement Based on Deep-Learning Illumination Enhancement for Low-Visibility Harsh Construction Environment
CN114581863A (en) Method and system for identifying dangerous state of vehicle
CN109740518A (en) The determination method and device of object in a kind of video
CN114119657B (en) Method, device, computer equipment and storage medium for detecting objects thrown from high altitude
CN119068469B (en) Firework detection and analysis method based on YOLO algorithm and dynamic analysis
Mostafa et al. Automated Vehicle Counting and Speed Estimation Using Yolov8 and Computer Vision
KR20090050890A (en) Behavior Analysis System and Method
CN117392706A (en) Pedestrian detection method and system, data processing device, program, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant