CN119152575B - Identification method of coal mine personnel chasing overhead passenger devices based on computer vision
- Publication number: CN119152575B (application CN202411288242.5A)
- Authority: CN (China)
- Prior art keywords: overhead, algorithm, coal mine, tracking, data
- Prior art date: 2024-09-14
- Legal status: Active
Abstract
The invention belongs to the technical field of computer vision and discloses a method, based on computer vision, for recognizing coal mine personnel chasing an overhead man-riding device. The method comprises the following steps: acquiring a real-time video stream from a monitoring point at a scene where the overhead man-riding device is ridden, intercepting several video segments of pedestrians chasing the device, and extracting frames from these videos to obtain a data set; preprocessing and labeling the acquired data; identifying personnel and the overhead man-riding device with an improved detection algorithm; tracking across multiple cameras with a tracking algorithm; calculating motion vectors and distances between tracked targets; and judging whether chasing behavior exists. The method addresses the waste of human resources and the monitoring blind spots caused by manual inspection and monitoring, allows violations and potential safety hazards by coal mine workers to be checked in time during production, improves the working efficiency of the coal mine, and reduces the risk of accidents caused by personnel not riding the overhead man-riding device properly.
Description
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a computer-vision-based method for recognizing coal mine personnel chasing an overhead man-riding device.
Background
Traditional coal mine production often relies on manual inspection and monitoring, which wastes human resources and leaves monitoring blind spots; violations and potential safety hazards by coal mine workers during production cannot be checked in time, which reduces working efficiency and raises safety risk. With the practical deployment of artificial intelligence, intelligent management of the coal mine industry is becoming a trend. By deploying AI algorithm models on video images, intelligent recognition and analysis of underground coal mine monitoring scenes can be realized, and abnormal events, violations and potential safety hazards in safe coal mine production can be monitored and alarmed in real time.
The overhead man-riding device is a piece of underground mine transport equipment consisting of a steel wire rope, pulleys, a motor and other components, and can run up and down inclined or vertical shafts. Practical application of overhead man-riding devices in mining enterprises has effectively improved the production efficiency of mines in China and reduced the labour intensity of mine workers. However, in real application scenarios, accidents caused by improper use of the device occur frequently. When coal mine personnel chase the overhead man-riding device, they may fall while boarding, or the device may deflect sharply so that the steel wire rope drops or the rope clip jams on the rope-supporting wheel, leading to safety accidents. Detecting this abnormal behavior and pushing an alarm in time can therefore effectively reduce safety problems.
Disclosure of Invention
The invention provides a computer-vision-based method for recognizing coal mine personnel chasing an overhead man-riding device, aiming to solve the above technical problems in the prior art. The method mainly comprises: acquiring a real-time video stream from a monitoring point at a scene where the overhead man-riding device is ridden and extracting frames from the video; preprocessing and labeling the resulting data set; detecting miners and the overhead man-riding device in the video with the improved YOLOv8-MCW algorithm; tracking the detected targets with a target tracking technique; using homography matrices between two viewpoints to stitch images of different viewpoints, which addresses the identity loss and identity switching that easily occur when targets occlude each other during tracking; judging the moving speed and direction of the targets based on the tracking results; and, combined with the calculated distance between the two targets, finally judging whether a person is chasing the overhead man-riding device.
The computer-vision-based method for recognizing coal mine personnel chasing the overhead man-riding device comprises the following specific steps:
S1, acquiring image data: using the cameras already installed in the actual coal mine application scene, acquire a real-time video stream from the monitoring point at the scene where the overhead man-riding device is ridden, intercept several video segments of pedestrians chasing the device, screen the intercepted videos, and extract frames from them to obtain a data set for subsequent processing.
S2, data set construction: performing data-enhancement preprocessing and data labeling on the acquired data.
Further, step S2 includes the steps of:
S21, data enhancement: transform and expand the data collected in step S1 to generate more and richer samples, which improves the diversity of training samples, reduces the model's dependence on a specific data distribution, and enhances its generalization capability and robustness.
S22, data labeling: annotate the data-enhanced pictures with LabelImg software, marking pedestrians and the overhead man-riding device; the pedestrian category is named person and the overhead man-riding device is named man_riding_device. After annotation, txt files containing position coordinates and category names are generated, and the labeled data set is divided into a training set, a test set and a validation set in the ratio 7:2:1.
S3, model training: the method adopts the improved YOLOv8-MCW algorithm to identify personnel and the overhead man-riding device. The YOLOv8 algorithm has made significant progress in detection precision and speed, but there is still room for improvement in specific application scenes. Since the invention performs target detection at the wellhead and in tunnels where the overhead man-riding device is ridden, the YOLOv8 algorithm needs to be further optimized to improve the overall detection performance.
The improvement of the YOLOv8 algorithm involves three core points. First, MobileNetV3-Large replaces the traditional backbone of the YOLOv8 network, keeping the model lightweight and easy to deploy. Second, a coordinate attention module is introduced: because the camera is installed far from the detection targets, the targets appear small in the frame and underground targets are blurred, so small targets are easily missed by the detection network; the attention mechanism improves detection of distant overhead man-riding devices and personnel. Third, WIoU v3 is selected as the bounding-box regression loss function to further optimize the model and improve the accuracy of target localization. Through these optimization strategies, personnel and overhead man-riding devices at the wellhead and underground are finally detected efficiently and accurately, and the improved algorithm is named YOLOv8-MCW.
Further, step S3 includes the steps of:
S31, backbone replacement: MobileNetV3-Large replaces the traditional backbone of the YOLOv8 network, keeping the model lightweight with almost no loss of accuracy. MobileNetV3 introduces 5x5 depth-wise convolutions in place of some 3x3 depth-wise convolutions, and adds squeeze-and-excitation (SE) modules and the h-swish (HS) activation function to improve accuracy; the resulting YOLOv8-M model is concise and excellent in both performance and speed.
S32, adding an attention mechanism module: a coordinate attention (CA) mechanism is introduced into the neck network, embedding precise positional information into channel attention. The features are encoded in two different directions, one direction preserving precise positional information and the other capturing long-range dependencies; the two directional encodings together form feature maps that carry a pair of direction-aware and position-sensitive signals, which enhances the ability of the feature maps to localize and recognize targets precisely.
S33, loss function optimization: the traditional Intersection over Union (IoU) considers only the overlapping part of the prediction box and the ground-truth box and ignores the area between them, so the evaluation result may be biased. WIoU takes factors such as aspect ratio, centroid distance and overlapping area into account while avoiding the computational cost of inverse trigonometric functions. WIoU v3 is better suited to detecting blurred small targets, so WIoU v3 is selected as the loss function of YOLOv8-MC. The WIoU v3 loss function is calculated as follows.
The L_WIoUv1 bounding-box loss function is defined as follows:

L_IoU = 1 - IoU

R_WIoU = exp( ((x_p - x_gt)^2 + (y_p - y_gt)^2) / (c_w^2 + c_h^2) )

L_WIoUv1 = R_WIoU * L_IoU

where IoU measures the degree of overlap between the predicted and ground-truth bounding boxes, L_IoU is the bounding-box loss, c_h and c_w are the height and width of the smallest enclosing box formed by the predicted and ground-truth boxes, (x_gt, y_gt) are the centroid coordinates of the ground-truth box, and (x_p, y_p) are the centroid coordinates of the predicted box.

Multiplying by the gradient gain r on the basis of L_WIoUv1, the L_WIoUv3 bounding-box loss function is defined as follows:

L_WIoUv3 = r * L_WIoUv1,  r = beta / (delta * alpha^(beta - delta)),  beta = L_IoU* / L_IoU_mean

where r is the non-monotonic focusing factor, L_WIoUv1 is the bounding-box loss defined above, L_IoU* is the monotonic focusing coefficient, L_IoU_mean is the set average bounding-box loss value, beta is the ratio of the two and is used to assess sample quality, and delta and alpha are hyper-parameters set manually to suit different models.
WIoU v3 uses a dynamic non-monotonic mechanism to evaluate anchor-box quality, making the model focus more on anchor boxes of ordinary quality and improving its ability to localize objects. For detection tasks in complex underground scenes, where targets are blurred and small and therefore hard to detect, the WIoU v3 loss function can dynamically adjust the loss weight of small targets and thus improve the detection performance of YOLOv8-MCW.
S34, model detection effect and performance evaluation: to test the detection performance of the improved model, the invention uses precision, recall, mAP@0.5, model parameter scale, model computation (floating-point operations, FLOPs), model training time and detection speed (frames/s) as evaluation indices. The formulas of these indices use the quantities TP (predicted positive, actually positive), FP (predicted positive, actually negative) and FN (predicted negative, actually positive).
S4, multi-camera tracking based on the DeepSort tracking algorithm: when tracking targets, mutual occlusion between targets and lighting changes in the environment can cause tracking failure, which affects the subsequent judgment of the chasing relationship between targets. It is therefore necessary to make effective use of multi-view information to improve the accuracy and stability of target tracking.
S41, calculating the homography matrices between cameras with different viewpoints: the homography matrices are used to map between image spaces that share an overlapping area, and occlusion is handled further to complete target matching. A homography matrix establishes a correspondence between different images, so that the model can infer information in one image from another.
Specifically, using the overlapping area between two or more viewpoints, key points are extracted with the SIFT algorithm, the spatial relationship between the viewpoints is inferred from the relationship between image space and the world coordinate system, and the homography matrix between the viewpoints is then obtained.
S42, deepSort is a multi-target tracking algorithm, uses a Kalman filtering algorithm to infer the motion quantity at the next moment according to the motion quantity at the current moment, combines the motion quantity information and the appearance information, and uses a Hungary algorithm to match the detection frame and the prediction frame. In each frame, the algorithm first predicts all possible targets using a deep learning model, then data correlates according to the appearance information, and finally updates the trajectory information using a kalman filter.
The Kalman filter is a recursive filtering algorithm that continuously refines the state estimate by incorporating new observations.
The Hungarian algorithm is an optimization algorithm for solving the assignment problem. Its principle is based on maximum matching in graph theory and on linear programming, and it finds the best task assignment for a given cost matrix. In multi-target tracking it is used to match multiple targets between frames, handling the appearance of new targets, the disappearance of old targets, and the matching of target IDs between the previous and current frames.
The algorithm performs well in complex scenes such as dense pedestrian flow and occlusion, accurately tracks pedestrian trajectories, and provides strong support for acquiring information about moving pedestrians in real time. The good performance of multi-camera tracking based on the DeepSort algorithm also shows that combining the two achieves more efficient and more accurate target tracking.
S43, multi-camera target tracking: the homography matrices between the videos are calculated, the multi-view videos are stitched and taken as input, and Kalman filtering is used to predict the data.
The targets are then associated by combining the homography matrices with the Hungarian algorithm, so that target tracking across multiple cameras is realized.
S5, calculating motion vectors and distances between tracked targets: based on the tracking results, the center-point position of each object in every frame is obtained, and from the change of the center-point position between each pair of adjacent frames the motion direction and speed of the object and the Euclidean distance between the two targets are calculated, giving the motion information of the objects.
S6, chasing judgment. The specific steps are: 1) if the speed of object A is greater than that of object B and this holds continuously, chasing is considered possible; 2) the direction vectors of motion of the two objects are calculated, and if the angle between the two vectors is small (i.e. the two direction vectors are nearly parallel; an angle threshold, e.g. 30 degrees, can be set), object A is moving towards object B; 3) if the distance between object A and object B decreases over time, the chasing behavior is further confirmed.
Compared with the prior art, the invention has the following beneficial effects:
1. To handle the blurred data and small targets collected in the special underground coal mine scene, the invention adds a CA attention mechanism to improve feature extraction, which effectively copes with ambient-light interference and numerous, complex targets and raises detection precision.
2. The invention improves the accuracy and stability of target tracking. Multi-camera tracking solves problems such as viewpoint change and occlusion in a multi-camera environment, makes effective use of multi-view information, improves the accuracy and robustness of target tracking, better copes with challenges such as occlusion and lighting change, and provides strong support for real-time tracking of miners and the overhead man-riding device.
3. The recognition method for coal mine personnel chasing the overhead man-riding device solves the waste of human resources and the monitoring blind spots caused by manual inspection and monitoring, enables violations and potential safety hazards during production to be checked in time, improves the working efficiency of the coal mine, and reduces the risk of accidents caused by personnel not riding the overhead man-riding device properly.
Drawings
FIG. 1 is an overall flow chart of the method for judging whether a person is chasing the overhead man-riding device.
FIG. 2 is a schematic diagram of the working interface for annotating pictures with LabelImg software.
FIG. 3 is a schematic diagram of the network architecture of the improved algorithm YOLOv8-MCW of the present invention.
FIG. 4 is a structural diagram of the CA attention mechanism model.
FIG. 5 is an example diagram of targets detected by the YOLOv8-MCW algorithm.
FIG. 6 is a further example diagram of targets detected by the YOLOv8-MCW algorithm.
FIG. 7 is a schematic diagram of the multi-camera tracking process of the DeepSort algorithm.
FIG. 8 is an example diagram of detection and tracking in an embodiment of the present invention.
Detailed Description
In order to make the technical problems to be solved, the technical schemes and the beneficial effects clearer, the invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
As shown in FIG. 1, the computer-vision-based method for recognizing coal mine personnel chasing the overhead man-riding device specifically comprises the following steps:
S1, acquiring image data: using the cameras already installed in the actual coal mine application scene, a real-time video stream is acquired from the monitoring point at the scene where the overhead man-riding device is ridden, several video segments of pedestrians chasing the device are intercepted and screened, videos with blurred targets are deleted, and frames are extracted with PotPlayer software, yielding the 1500 images required for model detection and subsequent data processing.
S2, data set construction: performing data-enhancement preprocessing and data labeling on the acquired data.
S21, data enhancement refers to transforming and expanding limited data to generate more and richer samples; it is an effective means of improving the diversity of training samples, reducing the model's dependence on a specific data distribution, and enhancing its generalization capability and robustness.
To make up for the shortage of data, the invention expands the original training set with data enhancement. The imgaug framework (an image augmentation library) was chosen as the implementation tool; it provides a variety of image transformations that can be applied randomly during training to generate diverse image samples. A series of transformation strategies is adopted, including random rotation, horizontal flipping, vertical flipping, random scaling, and brightness and contrast adjustment; randomly applying these transformations during training generates a large number of images at different angles and scales, which better simulates the image diversity of real scenes. For example, the invention performs data enhancement by horizontally flipping images, rotating them 15 degrees clockwise, and adding 5% noise, 25% greyscale and 40% brightness adjustments, thereby increasing the diversity of the training data set and improving the robustness of the model.
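The following is a minimal Python sketch of such an augmentation pipeline built with the imgaug library; the specific operator mix, probabilities and magnitudes are illustrative assumptions based on the transformations named above, not parameters disclosed by the invention.

```python
import cv2
import imgaug.augmenters as iaa

# Illustrative augmentation pipeline: flips, rotation/scaling, noise,
# brightness and contrast changes applied in random order.
augmenter = iaa.Sequential([
    iaa.Fliplr(0.5),                                   # random horizontal flip
    iaa.Flipud(0.2),                                   # occasional vertical flip
    iaa.Affine(rotate=(-15, 15), scale=(0.8, 1.2)),    # random rotation and scaling
    iaa.AdditiveGaussianNoise(scale=(0, 0.05 * 255)),  # up to ~5% noise
    iaa.Multiply((0.6, 1.4)),                          # brightness adjustment
    iaa.LinearContrast((0.75, 1.25)),                  # contrast adjustment
], random_order=True)

image = cv2.imread("frame_0001.jpg")                   # hypothetical frame file
augmented_images = augmenter(images=[image] * 8)       # 8 augmented variants of one frame
```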
S22, data labeling: the data-enhanced pictures are annotated with LabelImg software (an annotation tool). FIG. 2 shows an example of the working interface when annotating pictures of personnel and the overhead man-riding device with LabelImg: the pedestrian category is named person, the overhead man-riding device is named man_riding_device, and txt files containing position coordinates and category names are generated after annotation.
In the specific operation, an annotation box is drawn by dragging with the mouse in the LabelImg tool, and the framed region is then labeled and classified; the labels are named person and man_riding_device respectively. To ensure the accuracy of image annotation across the data set, the annotation standard is set beforehand as the smallest rectangle that can enclose the target in the image.
The labeled data are divided into a training set, a test set and a validation set in the ratio 7:2:1, giving 1050 training pictures, 300 test pictures and 150 validation pictures, with the annotations in txt format.
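A minimal sketch of such a 7:2:1 split is shown below; the directory layout and file names are assumptions (YOLO-style image/label pairs), not paths prescribed by the invention.

```python
import random
import shutil
from pathlib import Path

random.seed(0)
images = sorted(Path("dataset/images").glob("*.jpg"))
random.shuffle(images)

n = len(images)
splits = {
    "train": images[:int(0.7 * n)],              # 70%
    "test": images[int(0.7 * n):int(0.9 * n)],   # 20%
    "val": images[int(0.9 * n):],                # 10%
}
for name, files in splits.items():
    for sub in ("images", "labels"):
        Path(f"dataset/{name}/{sub}").mkdir(parents=True, exist_ok=True)
    for img in files:
        label = Path("dataset/labels") / (img.stem + ".txt")  # YOLO txt annotation
        shutil.copy(img, f"dataset/{name}/images/{img.name}")
        shutil.copy(label, f"dataset/{name}/labels/{label.name}")
```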
S3, model training: the invention adopts the improved YOLOv8 algorithm (a target detection algorithm) to identify personnel and the overhead man-riding device. The YOLOv8 algorithm has made significant progress in detection precision and speed, but there is still room for improvement in specific application scenes. Since the invention performs target detection at the wellhead and in tunnels where the overhead man-riding device is ridden, the YOLOv8 algorithm needs to be further optimized to improve overall detection performance.
The improvement of the YOLOv8 algorithm involves three core points. First, MobileNetV3-Large replaces the traditional backbone of the YOLOv8 network, keeping the model lightweight and easy to deploy. Second, a coordinate attention (CA) module is introduced: because the camera is installed far from the detection targets, the targets appear small in the frame and underground targets are blurred, so small targets are easily missed by the detection network; the attention mechanism improves detection of distant overhead man-riding devices and personnel. Third, WIoU v3 is selected as the bounding-box regression loss function to further optimize the model and improve the accuracy of target localization. Through these optimization strategies, personnel and overhead man-riding devices at the wellhead and underground are finally detected efficiently and accurately, and the improved algorithm is named YOLOv8-MCW. A schematic diagram of the network architecture of the improved algorithm YOLOv8-MCW is shown in FIG. 3.
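For orientation, a hedged training sketch using the public Ultralytics YOLOv8 API is given below. The YOLOv8-MCW modifications (MobileNetV3-Large backbone, CA module, WIoU v3 loss) would require a custom model definition and loss hook that are not part of the library; "yolov8-mcw.yaml" and "coal_mine.yaml" are illustrative placeholders, not files provided by the invention or the library.

```python
from ultralytics import YOLO

# Assumed custom model definition describing the MobileNetV3-Large backbone,
# CA neck module and WIoU v3 loss; not a file shipped with the library.
model = YOLO("yolov8-mcw.yaml")

model.train(
    data="coal_mine.yaml",   # dataset config listing classes person, man_riding_device
    epochs=200,              # illustrative training schedule
    imgsz=640,
    batch=16,
)
metrics = model.val()        # precision, recall and mAP@0.5 on the validation split
```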
Further, step S3 includes the steps of:
S31, backbone replacement: MobileNetV3-Large (the third-generation lightweight network of the MobileNet series) replaces the traditional backbone of the YOLOv8 network, making the model lightweight without losing accuracy. MobileNetV3 introduces 5x5 depth-wise convolutions in place of some 3x3 depth-wise convolutions, and adds squeeze-and-excitation (SE) modules and the h-swish (HS) activation function to improve accuracy. The resulting YOLOv8-M model (YOLOv8 with its backbone replaced by MobileNetV3-Large) is concise and excellent in both performance and speed; the detection results after replacing the backbone are shown in Table 1 below:
Table 1 Comparison of model detection results after backbone replacement
Compared with the original YOLOv8 network, the improved network model raises the detection speed to 90.29 FPS while precision drops by 0.2%, recall by 3.6% and average precision by 0.4%; the computation is reduced by 11.4 GFLOPs compared with the original YOLOv8 model, and the model size is reduced by 8.4 MB. The data in Table 1 demonstrate the efficiency and light weight obtained by replacing the YOLOv8 backbone with MobileNetV3.
S32, adding an attention mechanism module: a coordinate attention (CA) mechanism is introduced into the neck network, embedding precise positional information into channel attention. The features are encoded in two different directions, one preserving precise positional information and the other capturing long-range dependencies; the structure of the CA attention mechanism is shown in FIG. 4. The two directional encodings together form feature maps carrying a pair of direction-aware and position-sensitive signals, which enhances precise localization and target recognition. The detection results after adding the attention mechanism are shown in Table 2 below:
Table 2 Comparison of detection results after adding the CA attention mechanism
As shown in step S31, after the backbone is replaced by MobileNetV3, the detection speed and model size improve but precision, recall and average precision all decrease; the results in Table 2 show that adding the CA attention mechanism effectively compensates for the index losses caused by replacing the backbone of the YOLOv8 network with MobileNetV3.
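A minimal PyTorch sketch of a coordinate attention block of the kind described in S32 is given below, following the published CA design (directional average pooling along height and width, a shared 1x1 transform, then per-direction attention maps). The reduction ratio and layer choices are illustrative assumptions rather than the exact module used by the invention.

```python
import torch
import torch.nn as nn

class CoordAtt(nn.Module):
    """Coordinate attention: pool along H and W separately, encode the two
    directional features jointly, then re-weight the input with the resulting
    per-direction attention maps."""
    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool over width  -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool over height -> (B, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        x_h = self.pool_h(x)                         # (B, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)     # (B, C, W, 1)
        y = self.act(self.bn1(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                      # height attention (B, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # width attention  (B, C, 1, W)
        return x * a_h * a_w
```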
S33, loss function optimization: the traditional Intersection over Union (IoU) metric considers only the overlapping part of the prediction box and the ground-truth box and ignores the area between them, so the evaluation result may be biased. WIoU takes aspect ratio, centroid distance, overlapping area and other factors into account while avoiding the computational cost of inverse trigonometric functions, and the dynamic non-monotonic focusing mechanism of WIoU v3 is better suited to detecting blurred small targets. The invention therefore selects WIoU v3 as the loss function of YOLOv8-MC (YOLOv8 with the backbone replaced by MobileNetV3-Large and the attention module added). The WIoU v3 loss function is calculated as follows.
The L_WIoUv1 bounding-box loss function is defined as follows:

L_IoU = 1 - IoU

R_WIoU = exp( ((x_p - x_gt)^2 + (y_p - y_gt)^2) / (c_w^2 + c_h^2) )

L_WIoUv1 = R_WIoU * L_IoU

where IoU measures the degree of overlap between the predicted and ground-truth bounding boxes, L_IoU is the bounding-box loss, c_h and c_w are the height and width of the smallest enclosing box formed by the predicted and ground-truth boxes, (x_gt, y_gt) are the centroid coordinates of the ground-truth box, and (x_p, y_p) are the centroid coordinates of the predicted box.

Multiplying by the gradient gain r on the basis of L_WIoUv1, the L_WIoUv3 bounding-box loss function is defined as follows:

L_WIoUv3 = r * L_WIoUv1,  r = beta / (delta * alpha^(beta - delta)),  beta = L_IoU* / L_IoU_mean

where r is the non-monotonic focusing factor, L_WIoUv1 is the bounding-box loss defined above, L_IoU* is the monotonic focusing coefficient, L_IoU_mean is the set average bounding-box loss value, beta is the ratio of the two and is used to assess sample quality, and delta and alpha are hyper-parameters set manually to suit different models; in the invention they are set to 1.9 and 3 respectively.
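A hedged PyTorch sketch of this loss is shown below. The box format (x1, y1, x2, y2), the detachment points and the maintenance of the running mean of L_IoU are implementation assumptions; only the formulas and the values delta = 1.9, alpha = 3 come from the text above.

```python
import torch

def wiou_v3_loss(pred, target, iou, l_iou_mean, alpha=3.0, delta=1.9):
    """Sketch of the WIoU v3 bounding-box loss.
    pred, target: (..., 4) boxes as (x1, y1, x2, y2); iou: precomputed IoU per pair;
    l_iou_mean: running average of L_IoU maintained elsewhere during training."""
    l_iou = 1.0 - iou

    # centroids of the predicted and ground-truth boxes
    px, py = (pred[..., 0] + pred[..., 2]) / 2, (pred[..., 1] + pred[..., 3]) / 2
    gx, gy = (target[..., 0] + target[..., 2]) / 2, (target[..., 1] + target[..., 3]) / 2

    # width and height of the smallest box enclosing both
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])

    # WIoU v1: distance-based penalty R_WIoU times the IoU loss
    r_wiou = torch.exp(((px - gx) ** 2 + (py - gy) ** 2) / (cw ** 2 + ch ** 2).detach())
    l_wiouv1 = r_wiou * l_iou

    # WIoU v3: non-monotonic focusing via the outlier degree beta
    beta = l_iou.detach() / l_iou_mean
    r = beta / (delta * alpha ** (beta - delta))
    return (r * l_wiouv1).mean()
```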
WIoU v3 uses a dynamic non-monotonic mechanism to evaluate anchor-box quality, making the model focus more on anchor boxes of ordinary quality and improving its ability to localize objects. For detection tasks in complex underground scenes, where targets are blurred and small and therefore hard to detect, the WIoU v3 loss function can dynamically adjust the loss weight of small targets and thus improve the detection performance of YOLOv8-MCW. The detection results of the model with different loss functions are shown in Table 3 below:
Table 3 Comparison of YOLOv8-MC detection results with different loss functions
From the data in Table 3 it can be clearly observed that the improved YOLOv8-MCW model has obvious advantages in detecting underground coal mine personnel and the overhead man-riding device. The personnel and overhead man-riding device targets detected by the improved YOLOv8-MCW algorithm are shown in FIGS. 5 and 6, with the detected category and confidence indicated in the upper left corner of each box. Compared with the traditional YOLOv8 detection model, the improved model performs better in detecting underground personnel and overhead man-riding device targets.
S34, model detection effect and performance evaluation: to test the detection performance of the improved model, the invention uses precision, recall, mAP@0.5, model parameter scale, model computation (floating-point operations, FLOPs), model training time and detection speed (frames/s) as evaluation indices. The formulas of these indices use the quantities TP (predicted positive, actually positive), FP (predicted positive, actually negative) and FN (predicted negative, actually positive).
Precision is the ratio of the number of correctly predicted positive samples to the number of all samples predicted as positive, calculated as:

Precision = TP / (TP + FP)

where TP is the number of samples correctly predicted as positive and FP is the number of negative samples wrongly predicted as positive.
Recall is the ratio of the number of positive samples correctly predicted by the model to the number of positive samples actually present, calculated as:

Recall = TP / (TP + FN)

where TP is the number of correctly predicted positive samples and FN is the number of positive samples the model fails to predict.
The average precision (AP) is the area under the precision-recall curve, and the mean average precision (mAP@0.5) is the weighted average of the AP values over all sample classes; it evaluates the detection performance of the model across all classes, with the IoU threshold between prediction and ground-truth boxes set to 0.5:

mAP@0.5 = (1/N) * sum_{i=1}^{N} AP_i

where N is the number of sample classes, AP is the area under the precision-recall curve, and @0.5 indicates that the IoU threshold used when computing it is 0.5.
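A minimal sketch of these indices computed from per-class TP/FP/FN counts is given below; how the counts are obtained (matching detections to ground truth at IoU >= 0.5) is assumed to happen elsewhere.

```python
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if (tp + fn) else 0.0

def mean_ap(ap_per_class: list) -> float:
    # mAP@0.5: mean of the per-class AP values, each being the area
    # under the precision-recall curve at an IoU threshold of 0.5.
    return sum(ap_per_class) / len(ap_per_class) if ap_per_class else 0.0
```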
S4, multi-camera tracking based on the DeepSort tracking algorithm. When tracking targets, mutual occlusion between targets and lighting changes in the environment can cause tracking failure, which affects the subsequent judgment of the chasing relationship between targets. It is therefore necessary to make effective use of multi-view information to improve the accuracy and stability of target tracking. A schematic diagram of the multi-camera tracking process based on the DeepSort algorithm is shown in FIG. 7.
S41, calculating the homography matrices between cameras with different viewpoints: the homography matrices are used to map between image spaces that share an overlapping area, and occlusion is handled further to complete target matching. A homography matrix establishes a correspondence between different images, so that the model can infer information in one image from another.
Specifically, using the overlapping regions between two or more viewpoints, key points are extracted with the SIFT algorithm. The spatial relationship between different viewpoints is inferred from the relationship between image space and the world coordinate system, and the homography matrix between the viewpoints is then obtained.
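A minimal OpenCV sketch of this step is shown below: SIFT key points in the overlapping region, descriptor matching, and RANSAC homography estimation. The image files and matching thresholds are illustrative assumptions.

```python
import cv2
import numpy as np

img_a = cv2.imread("view_a.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical camera views
img_b = cv2.imread("view_b.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp_a, des_a = sift.detectAndCompute(img_a, None)
kp_b, des_b = sift.detectAndCompute(img_b, None)

# match descriptors and keep good matches with Lowe's ratio test
matches = cv2.BFMatcher().knnMatch(des_a, des_b, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

src = np.float32([kp_a[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp_b[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# a point detected in view A can now be mapped into view B
point_in_b = cv2.perspectiveTransform(np.float32([[[320, 240]]]), H)
```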
S42, deepSort, deepSort is a multi-target tracking algorithm, which uses a kalman filtering algorithm to infer the motion amount at the next moment according to the motion amount at the current moment, and uses a hungarian algorithm to match the detection frame and the prediction frame in combination with the motion amount information and the appearance information. In each frame, the algorithm first predicts all possible targets using a deep learning model, then data correlates according to the appearance information, and finally updates the trajectory information using a kalman filter.
The Kalman filter is a recursive filtering algorithm that continuously refines the state estimate by incorporating new observations.
The Hungarian algorithm is an optimization algorithm for solving the assignment problem. Its principle is based on maximum matching in graph theory and on linear programming, and it finds the best task assignment for a given cost matrix. In multi-target tracking it is used to match multiple targets between frames, handling the appearance of new targets, the disappearance of old targets, and the matching of target IDs between the previous and current frames. A minimal sketch of this association step is given below.
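The sketch below shows the association step only: an IoU-based cost matrix solved with the Hungarian algorithm (scipy's linear_sum_assignment). The full DeepSort tracker additionally fuses Kalman-predicted motion and appearance (re-identification) features into the cost, which is omitted here.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def associate(predicted_track_boxes, detection_boxes, iou_threshold=0.3):
    """Hungarian assignment on a (1 - IoU) cost matrix; returns matched (track, detection) pairs."""
    cost = np.array([[1.0 - iou(t, d) for d in detection_boxes] for t in predicted_track_boxes])
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= 1.0 - iou_threshold]
```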
The algorithm performs well in complex scenes such as dense pedestrian flow and occlusion, accurately tracks pedestrian trajectories, and provides strong support for acquiring information about moving pedestrians in real time. The good performance of multi-camera tracking based on the DeepSort algorithm also shows that combining the two achieves more efficient and more accurate target tracking. FIG. 8 shows an example of personnel and the overhead man-riding device detected with the target detection and tracking algorithms adopted by the invention, with the tracked target class and ID indicated in the upper left corner.
S43, multi-camera target tracking: the homography matrices between the videos are calculated, the multi-view videos are stitched and taken as input, and Kalman filtering is used to predict the data.
The targets are then associated by combining the homography matrices with the Hungarian algorithm, so that target tracking across multiple cameras is realized; the overall flow of multi-camera target tracking is shown in FIG. 7.
S5, calculating motion vectors and distances between tracked targets: based on the tracking results, the center-point position of each object in every frame is obtained, and from the change of the center-point position between each pair of adjacent frames the motion direction and speed of the object and the Euclidean distance between the two targets are calculated, giving the motion information of the objects.
Further, the step S5 includes the steps of:
S51, calculating the Euclidean distance between two targets from the detected coordinate information, with the calculation formula:

x_center = (x_min + x_max) / 2,  y_center = (y_min + y_max) / 2

d = sqrt((x_center1 - x_center2)^2 + (y_center1 - y_center2)^2)

where (x_min1, y_min1, x_max1, y_max1) and (x_min2, y_min2, x_max2, y_max2) are the bounding-box coordinates of the two targets, (x_center1, y_center1) and (x_center2, y_center2) are the center points of the two targets, and d is the Euclidean distance between them.
S52, according to the tracking results, the motion direction of each target over n consecutive frames is calculated, and the angle between the motion directions of the pedestrian (target A) and the overhead man-riding device (target B) is computed as:

v_A = (x'_1 - x_1, y'_1 - y_1),  v_B = (x'_2 - x_2, y'_2 - y_2)

theta = arccos( (v_A . v_B) / (|v_A| |v_B|) )

where x_1, y_1 are the coordinates of target A in the current frame, x'_1, y'_1 its coordinates in the next frame, x_2, y_2 the coordinates of target B in the current frame, and x'_2, y'_2 its coordinates in the next frame; v_A and v_B are the motion vectors of the two targets and theta is the angle between them.
S53, calculating the relative motion speed between targets: from the Euclidean distance obtained in step S51, the motion time is computed from the frame rate and the counted number of frames, and the motion speed is then obtained as:

v = d / (frame_num / fps)

where frame_num is the set number of frames, fps is the video frame rate, and d is the distance the target moves within the set number of frames.
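A minimal Python sketch of steps S51-S53 is shown below; it assumes axis-aligned bounding boxes (x_min, y_min, x_max, y_max) from the tracker and expresses speed in pixels per second.

```python
import math

def center(box):
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)

def euclidean_distance(box_a, box_b):
    (x1, y1), (x2, y2) = center(box_a), center(box_b)
    return math.hypot(x1 - x2, y1 - y2)

def direction_angle(prev_a, curr_a, prev_b, curr_b):
    """Angle in degrees between the motion vectors of targets A and B (centers in two frames)."""
    va = (curr_a[0] - prev_a[0], curr_a[1] - prev_a[1])
    vb = (curr_b[0] - prev_b[0], curr_b[1] - prev_b[1])
    norm = math.hypot(*va) * math.hypot(*vb)
    if norm == 0:
        return 0.0
    cos_theta = (va[0] * vb[0] + va[1] * vb[1]) / norm
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))

def speed(distance_moved, frame_num, fps):
    """Speed over frame_num frames at the given frame rate (pixels per second)."""
    return distance_moved / (frame_num / fps)
```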
S6, chasing judgment: 1) according to the relative motion speed between targets calculated in S53, if the speed of object A is greater than that of object B and this holds continuously, chasing is considered possible; 2) according to the angle between the motion directions of the two objects calculated in S52, if the angle in consecutive frames is smaller than the set threshold (set to 20 degrees), object A is moving towards object B; 3) if the distance between object A and object B decreases over time, the chasing behavior is further confirmed. A minimal sketch of this decision rule is given below.
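The sketch below applies the three conditions over a window of consecutive frames; the window length and the exact definition of "continuous" are assumptions, while the 20-degree angle threshold and the three conditions follow the text above.

```python
def is_chasing(speeds_a, speeds_b, angles_deg, distances, angle_threshold=20.0):
    """Per-window chasing decision:
    1) target A consistently faster than target B,
    2) motion directions nearly parallel (angle below threshold),
    3) the distance between A and B keeps decreasing."""
    faster = all(sa > sb for sa, sb in zip(speeds_a, speeds_b))
    towards = all(a < angle_threshold for a in angles_deg)
    closing = all(d_prev > d_next for d_prev, d_next in zip(distances, distances[1:]))
    return faster and towards and closing
```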
The foregoing description of the preferred embodiments is not intended to limit the invention; any modifications, equivalents and alternatives falling within the spirit and principles of the invention are intended to be included within its scope.
Claims (5)
Priority Applications (1)

Application Number | Priority Date | Title |
---|---|---|
CN202411288242.5A | 2024-09-14 | Identification method of coal mine personnel chasing overhead passenger devices based on computer vision |
Publications (2)

Publication Number | Publication Date |
---|---|
CN119152575A | 2024-12-17 |
CN119152575B | 2025-04-04 |
Citations (2)

Publication number | Priority date | Publication date | Title |
---|---|---|---|
CN116681724A | 2023-04-11 | 2023-09-01 | Video tracking method and storage medium for mine personnel target based on YOLOv5-deep algorithm |
CN116778410A | 2023-06-08 | 2023-09-19 | Deep learning-based coal mine underground operation personnel detection and tracking method |
Legal Events

Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |