
CN119152575B - Identification method of coal mine personnel chasing overhead passenger devices based on computer vision - Google Patents

Identification method of coal mine personnel chasing overhead passenger devices based on computer vision

Info

Publication number
CN119152575B
CN119152575B (Application CN202411288242.5A)
Authority
CN
China
Prior art keywords
overhead
algorithm
coal mine
tracking
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411288242.5A
Other languages
Chinese (zh)
Other versions
CN119152575A (en)
Inventor
赵文静
高佳锋
邵福
袁少博
卢宏铭
侯晋红
司斌
曹栋鹏
李翰林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanxi Sunshine Three Pole Polytron Technologies Inc
Original Assignee
Shanxi Sunshine Three Pole Polytron Technologies Inc
Filing date
Publication date
Application filed by Shanxi Sunshine Three Pole Polytron Technologies Inc
Priority to CN202411288242.5A
Publication of CN119152575A
Application granted
Publication of CN119152575B
Legal status: Active

Links

Abstract

The invention belongs to the technical field of computer vision and discloses a method, based on computer vision, for recognizing coal mine personnel chasing an overhead man-riding device. The method comprises the following steps: acquiring a real-time video stream from monitoring points covering the overhead man-riding device scene; intercepting several video segments of pedestrians chasing the device; extracting frames from the videos to obtain a data set; preprocessing and labeling the acquired data; identifying personnel and the overhead man-riding device with an improved algorithm; tracking across multiple cameras with a tracking algorithm; and calculating the motion vectors and distances between tracked targets to judge whether chasing behavior exists. The method solves the problems of wasted human resources and monitoring blind spots caused by manual inspection and monitoring, allows violations and potential safety hazards in the coal mine production process to be checked in time, improves the working efficiency of the coal mine, and reduces the risk of accidents caused by personnel riding the overhead man-riding device improperly.

Description

Method for recognizing coal mine personnel chasing an overhead man-riding device based on computer vision
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a method, based on computer vision, for recognizing coal mine personnel chasing an overhead man-riding device.
Background
Traditional coal mine production often relies on manual inspection and monitoring, which wastes human resources and leaves monitoring blind spots; moreover, violations and potential safety hazards in the production process cannot be checked in time, reducing working efficiency and raising safety risk. With the practical deployment of artificial intelligence technology, intelligent management is becoming a trend in the coal mining industry. By deploying AI algorithm models on video images, intelligent recognition and analysis of underground coal mine monitoring scenes can be realized, and abnormal events, violations, and potential safety hazards in safe coal mine production can be monitored and warned of in real time.
The overhead man-riding device is a piece of underground mine transport equipment consisting of a steel wire rope, pulleys, a motor, and other components, and can run up and down inclined or vertical shafts. Field application of overhead man-riding devices at mining enterprises has not only effectively improved the production efficiency of mines in China but also markedly reduced the labor intensity of mine workers. In practice, however, accidents caused by improper use of the device occur frequently. When coal mine personnel chase the overhead man-riding device, they may fall while boarding, the device may deflect excessively, and the steel wire rope may drop or the rope clip may jam on a rope-supporting wheel, causing safety accidents. Detecting this abnormal behavior and pushing alarm results in time can therefore effectively reduce such safety problems.
Disclosure of Invention
The invention provides a method, based on computer vision, for recognizing coal mine personnel chasing an overhead man-riding device, which aims to solve the above technical problems in the prior art. The method mainly comprises: acquiring a real-time video stream from monitoring points covering the overhead man-riding device; extracting frames from the video; preprocessing and labeling the resulting data set; detecting miners and individual overhead man-riding device targets in the video with the improved YOLOv8-MCW algorithm; tracking the detected targets with a target tracking technique; considering that mutual occlusion among targets easily causes identity loss and identity switching during tracking, using the homography matrices of two viewing angles to stitch images from different viewpoints; judging the movement speed and direction of each target from the tracking results; and, combined with the calculated distance between the two targets, finally judging whether a person is chasing the overhead man-riding device.
The recognition method, based on computer vision, for coal mine personnel chasing an overhead man-riding device comprises the following specific steps:
S1, acquiring image data: using existing cameras in the actual coal mine application scenario, a real-time video stream is acquired from monitoring points covering the overhead man-riding device; several video segments of pedestrians chasing the device are intercepted, the intercepted videos are screened, and frames are extracted from them to obtain a data set for subsequent processing.
S2, data set construction: performing data enhancement preprocessing and data labeling on the acquired data.
Further, step S2 includes the steps of:
S21, data enhancement: the data collected in step S1 are transformed and expanded to generate more and richer samples, which improves the diversity of training samples, reduces the model's dependence on a specific data distribution, and enhances the generalization ability and robustness of the model.
S22, data labeling: the images after data enhancement preprocessing are annotated with LabelImg software, marking pedestrians and the overhead man-riding device; the pedestrian class is named person and the overhead man-riding device class man_riding_device. Annotation generates txt files containing position coordinates and class names, and the labeled data set is divided into a training set, a test set, and a validation set in a 7:2:1 ratio.
S3, model training: the invention adopts the improved YOLOv8-MCW algorithm to identify personnel and the overhead man-riding device. The YOLOv8 algorithm offers significant advances in detection accuracy and speed, but still leaves room for improvement in specific application scenarios. Since the invention performs target detection at the wellhead and in roadways where the overhead man-riding device is ridden, the YOLOv8 algorithm needs to be further optimized to improve the overall detection performance.
The improvement of the YOLOv8 algorithm comprises the following three core points. First, MobileNetV3-Large replaces the traditional backbone of the YOLOv8 network, keeping the model lightweight and easy to deploy. Second, a coordinate attention module is introduced: because the cameras are installed far from the detection targets, the targets are small in the overall picture and blurred underground, and such small targets are easily missed by the detection network, so an attention mechanism is added to improve the model's detection of distant overhead man-riding device and personnel targets. Third, WIoU v3 is selected as the bounding box regression loss function to further optimize the algorithm model and improve the accuracy of target detection and localization. Through these optimization strategies, efficient and accurate detection of personnel and the overhead man-riding device at the wellhead and underground is finally achieved, and the improved algorithm is named YOLOv8-MCW.
Further, step S3 includes the steps of:
S31, MobileNetV3-Large replaces the traditional backbone of the YOLOv8 network, keeping the model lightweight with almost no loss of accuracy. MobileNetV3 introduces 5×5 depthwise convolutions in place of some 3×3 depthwise convolutions and adds Squeeze-and-Excitation (SE) modules and the h-swish (HS) activation function to improve accuracy; the resulting YOLOv8-M model is compact and excels in both performance and speed.
S32, adding an attention mechanism module: a coordinate attention (CA) mechanism is introduced into the neck network, embedding precise positioning signals into the channel attention. The features are split into encodings along two different directions, one direction retaining precise position signals and the other capturing long-range dependencies; encoding the two directions separately forms feature maps containing a pair of direction-aware and position-sensitive signals, which enhances the network's ability to localize and recognize targets precisely.
S33, optimizing the loss function: the traditional Intersection over Union (IoU) considers only the overlap between the predicted box and the ground-truth box, not the region between them, so its evaluation may be biased. WIoU takes factors such as aspect ratio, centroid distance, and overlap area into account while avoiding the computational cost of inverse trigonometric functions. Since WIoU v3 is better suited to detecting blurred small targets, WIoU v3 is selected as the loss function of YOLOv8-MC; its calculation proceeds as follows.
The L_WIoUv1 bounding box loss function is defined as follows:
L_IoU = 1 − IoU
R_WIoU = exp( ((x_c − x_c^gt)^2 + (y_c − y_c^gt)^2) / (c_w^2 + c_h^2)^* )
L_WIoUv1 = R_WIoU · L_IoU
where IoU measures the degree of overlap between the predicted and ground-truth bounding boxes, L_IoU is the bounding box loss function, c_h and c_w are the height and width of the smallest enclosing box formed by the predicted and ground-truth boxes, (x_c^gt, y_c^gt) are the centroid coordinates of the ground-truth box, (x_c, y_c) are the centroid coordinates of the predicted box, and the superscript * denotes that c_w^2 + c_h^2 is detached from the gradient computation.
Multiplying by the gradient gain r on the basis of L_WIoUv1 defines the L_WIoUv3 bounding box loss function:
L_WIoUv3 = r · L_WIoUv1, with r = β / (δ · α^(β − δ)) and β = L_IoU^* / L̄_IoU
where r is a non-monotonic focal factor, L_WIoUv1 is the bounding box loss function defined above, L_IoU^* is the monotonic focal coefficient (the current bounding box loss, detached from the gradient), L̄_IoU is the running average bounding box loss value, the ratio β of the two assesses sample quality, and δ and α are hyperparameters set manually to suit different models.
WIoU v3 uses a dynamic non-monotonic mechanism to evaluate anchor box quality, making the model focus more on anchor boxes of ordinary quality and improving its ability to localize objects. For target detection in complex underground scenes, where targets are blurred and small and detection is difficult, the WIoU v3 loss function can dynamically optimize the loss weight of small targets and thus improve the detection performance of YOLOv8-MCW.
S34, model detection effect and performance evaluation: to test the detection performance of the improved model, the invention uses precision, recall, mAP@0.5, model parameter scale, model computation (floating point operations, FLOPs), model training time, and detection speed (unit: frames/s) as evaluation indicators. The formulas for these indicators use the quantities TP (predicted positive, actually positive), FP (predicted positive, actually negative), and FN (predicted negative, actually positive).
S4, multi-camera tracking based on the DeepSort tracking algorithm: when tracking targets, mutual occlusion among targets and lighting changes in the environment can cause tracking failure, which affects the subsequent judgment of chasing relations between targets. It is therefore necessary to use multi-view information effectively to improve the accuracy and stability of target tracking.
S41, calculating homography matrixes corresponding to cameras with different view angles, mapping the image space with the overlapping area by using the homography matrixes, and further processing shielding to complete target matching. The homography matrix may establish correspondence between different images so that the model can infer information in one image from another.
Specifically, using the overlapping area between two or more different viewing angles, key points are extracted with the SIFT algorithm; the spatial relation between the different viewing angles is deduced from the relation between image space and the world coordinate system, and the homography matrix between the viewing angles is then obtained.
S42, deepSort is a multi-target tracking algorithm, uses a Kalman filtering algorithm to infer the motion quantity at the next moment according to the motion quantity at the current moment, combines the motion quantity information and the appearance information, and uses a Hungary algorithm to match the detection frame and the prediction frame. In each frame, the algorithm first predicts all possible targets using a deep learning model, then data correlates according to the appearance information, and finally updates the trajectory information using a kalman filter.
The Kalman filter is a recursive filtering algorithm that continuously refines its state estimate as new observations arrive.
The Hungarian algorithm is an optimization algorithm for solving the assignment problem. Its principle is based on maximum matching in graph theory and linear programming, and it finds the best task assignment under a given cost matrix; in multi-target tracking it matches multiple targets between frames, handling the appearance of new targets, the disappearance of old targets, and the matching of target ids between the previous and current frames.
The algorithm performs well in complex scenes with dense pedestrians and occlusion, accurately tracking pedestrian motion trajectories and providing strong support for acquiring information about moving pedestrians in real time. The performance achieved by multi-camera tracking built on the DeepSort algorithm likewise shows that combining the two enables more efficient and more accurate target tracking.
S43, the homography matrices between the videos are calculated, the multi-view videos are stitched and taken as input, and the Kalman filter is used to predict the data.
The homography matrix and the Hungarian algorithm are then combined to identify the targets, thereby realizing target tracking across multiple cameras.
S5, calculating the motion vectors and distances between tracked targets: based on the tracking results, the position of the center point of each object is obtained in every frame, and from the change of the center point position between each pair of adjacent frames, the direction of motion, the speed, and the Euclidean distance between the two points are calculated, yielding the motion information of the objects.
S6, chase decision. The specific steps are: 1) if the speed of object A is greater than that of object B and this condition persists, a chase is considered possible; 2) the direction-of-motion vectors of the two objects are calculated, and if the angle between the two vectors is small (i.e., the two direction vectors are nearly parallel; an angle threshold can be set, e.g., 30 degrees), object A is judged to be moving toward object B; 3) if the distance between object A and object B decreases over time, chasing behavior is further confirmed.
Compared with the prior art, the invention has the following beneficial effects:
1. For the blurred data and small targets in material collected in the special underground coal mine scene, the invention adds a CA attention mechanism to complete feature extraction better, effectively handles ambient light interference and numerous, complex targets, and obtains an increase in detection accuracy.
2. The invention improves the accuracy and stability of target tracking: multi-camera tracking solves the problems of viewpoint change and occlusion in a multi-camera environment, makes effective use of multi-view information, copes better with challenges such as occlusion and lighting changes, and provides strong support for real-time tracking of miners and the overhead man-riding device.
3. The recognition method for coal mine personnel chasing the overhead man-riding device solves the problems of wasted human resources and monitoring blind spots caused by past manual inspection and monitoring; violations and potential safety hazards in the coal mine production process are checked in time, the working efficiency of the coal mine is improved, and the risk of accidents caused by personnel riding the overhead man-riding device improperly is reduced.
Drawings
FIG. 1 is an overall flow chart of the method for judging whether a person chases the overhead man-riding device.
FIG. 2 is a schematic diagram of a working interface for LabelImg software annotation pictures.
FIG. 3 is a schematic diagram of the network architecture of the improved YOLOv8-MCW algorithm of the present invention.
Fig. 4 is a block diagram of a CA attention mechanism model.
FIG. 5 is a first example diagram of targets detected by the YOLOv8-MCW algorithm.
FIG. 6 is a second example diagram of targets detected by the YOLOv8-MCW algorithm.
FIG. 7 is a schematic diagram of the DeepSort algorithm multi-camera tracking process.
FIG. 8 is a diagram showing an example of detection tracking in an embodiment of the present invention.
Detailed Description
In order to make the technical problems to be solved, the technical solutions, and the beneficial effects of the invention clearer, the invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
As shown in fig. 1, the recognition method, based on computer vision, for coal mine personnel chasing an overhead man-riding device specifically comprises the following steps:
S1, acquiring image data: using existing cameras in the actual coal mine application scenario, a real-time video stream is acquired from monitoring points covering the overhead man-riding device; several video segments of pedestrians chasing the device are intercepted and screened, videos with blurred targets are deleted, and frames are extracted with PotPlayer software, yielding the 1500 images required for model detection and subsequent data processing.
S2, data set manufacturing, namely respectively carrying out data enhancement preprocessing and data labeling on the acquired data.
S21, data enhancement refers to transforming and expanding limited data to generate more and richer samples; it is an effective means of improving the diversity of training samples, reducing the model's dependence on a specific data distribution, and enhancing the generalization ability and robustness of the model.
To make up for the limitations of the data set, the invention expands the original training set with data enhancement techniques. The imgaug framework (an image augmentation library) was chosen as the implementation tool; it provides a variety of image transformations that can be applied randomly during training to generate diverse image samples. A series of transformation strategies is adopted, including random rotation, horizontal flipping, vertical flipping, random scaling, and brightness and contrast adjustment; randomly applying these transformations during training generates a large number of images at different angles and scales, better simulating the image diversity of real scenes. For example, the invention performs data enhancement by flipping the image horizontally, rotating it 15 degrees clockwise, adding 5% noise, applying 25% grayscale, and adjusting brightness by 40%, which increases the diversity of the training data set and improves the robustness of the model; a sketch of such a pipeline is shown below.
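For illustration, the following is a minimal sketch of such an augmentation pipeline built with the imgaug library; the operator values echo the examples above but are illustrative rather than the patent's exact settings, and the file path is hypothetical.

```python
# A minimal sketch of the augmentation pipeline described above, using imgaug;
# parameter values are illustrative, not the patent's exact settings.
import imgaug.augmenters as iaa
import imageio.v2 as imageio

augmenter = iaa.Sequential([
    iaa.Fliplr(0.5),                                  # random horizontal flip
    iaa.Flipud(0.2),                                  # random vertical flip
    iaa.Affine(rotate=(-15, 15), scale=(0.8, 1.2)),   # rotation and random scaling
    iaa.AdditiveGaussianNoise(scale=0.05 * 255),      # ~5% noise
    iaa.Grayscale(alpha=(0.0, 0.25)),                 # partial grayscale, up to 25%
    iaa.MultiplyBrightness((0.6, 1.4)),               # brightness +/- 40%
], random_order=True)

image = imageio.imread("frames/frame_0001.jpg")       # hypothetical path
augmented = augmenter(images=[image] * 8)             # 8 randomized variants
```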
S22, data labeling: the images after data enhancement preprocessing are annotated with LabelImg software (an annotation tool). FIG. 2 shows an example of the working interface when annotating images of personnel and the overhead man-riding device with LabelImg; the pedestrian class is named person and the overhead man-riding device class man_riding_device, and annotation generates txt files containing position coordinates and class names.
In operation, an annotation box is drawn by dragging with the mouse in the labelImg tool, the framed region is then labeled and classified, and the labels are named person and man_riding_device respectively. To ensure the accuracy of image annotation across the data set, the annotation standard is set in advance as the smallest rectangle that can contain the image target.
The labeled data are divided into a training set, a test set, and a validation set in a 7:2:1 ratio, giving a txt-format data set with 1050 training images, 300 test images, and 150 validation images. A sketch of such a split is shown below.
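The following sketch illustrates one way to produce the 7:2:1 train/test/validation split described above; the directory layout and output file names are hypothetical.

```python
# A sketch of the 7:2:1 train/test/validation split described above.
# The directory layout and output names are hypothetical.
import random
from pathlib import Path

images = sorted(Path("dataset/images").glob("*.jpg"))
random.seed(42)
random.shuffle(images)

n = len(images)
n_train, n_test = int(n * 0.7), int(n * 0.2)
splits = {
    "train": images[:n_train],
    "test": images[n_train:n_train + n_test],
    "val": images[n_train + n_test:],
}
for name, files in splits.items():
    with open(f"dataset/{name}.txt", "w") as f:   # YOLO-style split lists
        f.write("\n".join(str(p) for p in files))
```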
S3, model training: the invention adopts the improved YOLOv8 algorithm (a target detection algorithm) to identify personnel and the overhead man-riding device. The YOLOv8 algorithm offers significant advances in detection accuracy and speed, but still leaves room for improvement in specific application scenarios. Since the invention performs target detection at the wellhead and in roadways where the overhead man-riding device is ridden, the YOLOv8 algorithm needs to be further optimized to improve the overall detection performance.
The improvement of the YOLOv8 algorithm mainly involves three core points. First, MobileNetV3-Large replaces the traditional backbone of the YOLOv8 network, keeping the model lightweight and easy to deploy. Second, a coordinate attention (CA) module is introduced: because the cameras are installed far from the detection targets, the targets are small in the overall picture and blurred underground, and such small targets are easily missed by the detection network, so an attention mechanism is added to improve the model's detection of distant overhead man-riding device and personnel targets. Third, WIoU v3 is selected as the bounding box regression loss function to further optimize the algorithm model and improve the accuracy of target detection and localization. Through these optimization strategies, efficient and accurate detection of personnel and overhead man-riding device targets at the wellhead and underground is finally realized, and the improved algorithm is named YOLOv8-MCW. A schematic diagram of the network architecture of the improved YOLOv8-MCW algorithm is shown in fig. 3.
Further, step S3 includes the steps of:
S31, MobileNetV3-Large (the third-generation lightweight MobileNet model) replaces the traditional backbone of the YOLOv8 network, keeping the model lightweight without losing accuracy. MobileNetV3 introduces 5×5 depthwise convolutions in place of some 3×3 depthwise convolutions and adds Squeeze-and-Excitation (SE) modules and the h-swish (HS) activation function to improve accuracy; the resulting YOLOv8-M model (YOLOv8 with its backbone replaced by MobileNetV3-Large) is compact and excels in both performance and speed. The model detection results after replacing the backbone are shown in Table 1 below:
Table 1. Comparison of model detection results after backbone replacement
Compared with the original YOLOv8 network, the improved network model raises the detection speed to 90.29 FPS at the cost of a 0.2% drop in precision, a 3.6% drop in recall, and a 0.4% drop in average precision; its computation is 11.4 GFLOPs lower than the original YOLOv8 model, and the model size is reduced by 8.4 MB. The data in Table 1 demonstrate the efficiency and light weight obtained by replacing the YOLOv8 backbone with MobileNetV3; a sketch of obtaining such a backbone follows.
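The patent does not publish the modified network definition, so the following sketch only illustrates the backbone-swap idea: obtaining a MobileNetV3-Large feature extractor with torchvision. How it is wired into the YOLOv8 neck and head is an assumption left out here.

```python
# A sketch of obtaining a MobileNetV3-Large feature extractor as a detection
# backbone with torchvision; the actual integration into YOLOv8 is not
# published by the patent, so this only illustrates the swap idea.
import torch
from torchvision.models import mobilenet_v3_large, MobileNet_V3_Large_Weights

weights = MobileNet_V3_Large_Weights.DEFAULT
backbone = mobilenet_v3_large(weights=weights).features  # conv trunk only

x = torch.randn(1, 3, 640, 640)   # typical detector input size
features = backbone(x)            # (1, 960, 20, 20) stride-32 feature map
print(features.shape)
```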
S32, adding an attention mechanism module: a coordinate attention (CA) mechanism is introduced into the neck network, embedding precise positioning signals into the channel attention. The features are split into encodings along two different directions, one retaining precise position signals and the other capturing long-range dependencies; a structure diagram of the CA attention mechanism is shown in fig. 4. Encoding the two directions separately forms feature maps containing a pair of direction-aware and position-sensitive signals, which enhances precise localization and target recognition. The detection results after adding the attention mechanism are shown in Table 2 below:
Table 2. Comparison of detection results after adding the CA attention mechanism
As shown in step S31, after the backbone is replaced with MobileNetV3, precision, recall, and average precision all decline even though detection speed and model size are optimized; the results in Table 2 indicate that adding the CA attention mechanism effectively compensates for part of the index loss caused by replacing the backbone of the YOLOv8 network with MobileNetV3. A PyTorch sketch of the CA block appears below.
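The following is a compact PyTorch sketch of a coordinate attention block of the kind described above, following the published CA design; the reduction ratio and pooling choices are illustrative assumptions, not the patent's exact configuration.

```python
# A compact PyTorch sketch of the coordinate attention (CA) block described
# above; the reduction ratio is an illustrative assumption.
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):
        n, c, h, w = x.shape
        # Encode features along the two spatial directions separately:
        # pooling over width keeps per-row (height) position information,
        # pooling over height keeps per-column (width) information.
        x_h = x.mean(dim=3, keepdim=True)                       # (n, c, h, 1)
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # (n, c, w, 1)
        y = self.act(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                   # (n, c, h, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (n, c, 1, w)
        return x * a_h * a_w        # direction-aware attention weighting
```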
S33, optimizing the loss function: the traditional Intersection over Union (IoU) metric considers only the overlap between the predicted box and the ground-truth box, not the region between them, so its evaluation may be biased. WIoU takes factors such as aspect ratio, centroid distance, and overlap area into account while avoiding the computational cost of inverse trigonometric functions, and the dynamic non-monotonic focusing mechanism of WIoU v3 is better suited to detecting blurred small targets. The invention therefore selects WIoU v3 as the loss function of YOLOv8-MC (YOLOv8 with the MobileNetV3-Large backbone and the added attention module); the WIoU v3 loss function is calculated as follows.
The L_WIoUv1 bounding box loss function is defined as follows:
L_IoU = 1 − IoU
R_WIoU = exp( ((x_c − x_c^gt)^2 + (y_c − y_c^gt)^2) / (c_w^2 + c_h^2)^* )
L_WIoUv1 = R_WIoU · L_IoU
where IoU measures the degree of overlap between the predicted and ground-truth bounding boxes, L_IoU is the bounding box loss function, c_h and c_w are the height and width of the smallest enclosing box formed by the predicted and ground-truth boxes, (x_c^gt, y_c^gt) are the centroid coordinates of the ground-truth box, (x_c, y_c) are the centroid coordinates of the predicted box, and the superscript * denotes that c_w^2 + c_h^2 is detached from the gradient computation.
Multiplying by the gradient gain r on the basis of L_WIoUv1 defines the L_WIoUv3 bounding box loss function:
L_WIoUv3 = r · L_WIoUv1, with r = β / (δ · α^(β − δ)) and β = L_IoU^* / L̄_IoU
where r is a non-monotonic focal factor, L_WIoUv1 is the bounding box loss function defined above, L_IoU^* is the monotonic focal coefficient (the current bounding box loss, detached from the gradient), L̄_IoU is the running average bounding box loss value, the ratio β of the two assesses sample quality, and δ and α are hyperparameters set manually to suit different models, which in the invention are set to 1.9 and 3 respectively. A sketch of this computation is given below.
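The following is a sketch of the WIoU v3 computation as reconstructed from the definitions above. The (x1, y1, x2, y2) box format, the exponential moving average for the running mean loss, and the hyperparameter defaults (alpha = 1.9, delta = 3, following the WIoU paper) are assumptions; the patent does not publish these implementation details.

```python
# A sketch of the WIoU v3 loss as reconstructed above; box format, the EMA for
# the running mean loss, and hyperparameter defaults are assumptions.
import torch

def wiou_v3_loss(pred, target, mean_liou, alpha=1.9, delta=3.0, momentum=0.01):
    # Intersection and union for axis-aligned (x1, y1, x2, y2) boxes.
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    area_p = (pred[:, 2:] - pred[:, :2]).clamp(min=0).prod(dim=1)
    area_t = (target[:, 2:] - target[:, :2]).clamp(min=0).prod(dim=1)
    iou = inter / (area_p + area_t - inter + 1e-7)
    l_iou = 1.0 - iou

    # R_WIoU: squared centroid distance normalized by the squared diagonal of
    # the smallest enclosing box, with the denominator detached (the * above).
    c_wh = torch.max(pred[:, 2:], target[:, 2:]) - torch.min(pred[:, :2], target[:, :2])
    center_p = (pred[:, :2] + pred[:, 2:]) / 2
    center_t = (target[:, :2] + target[:, 2:]) / 2
    dist2 = ((center_p - center_t) ** 2).sum(dim=1)
    r_wiou = torch.exp(dist2 / (c_wh ** 2).sum(dim=1).detach())
    l_v1 = r_wiou * l_iou

    # Non-monotonic focal factor r from the outlier degree beta.
    beta = l_iou.detach() / (mean_liou + 1e-7)
    r = beta / (delta * alpha ** (beta - delta))
    new_mean = (1 - momentum) * mean_liou + momentum * l_iou.mean().detach()
    return (r * l_v1).mean(), new_mean
```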
WIoU v3 uses a dynamic non-monotonic mechanism to evaluate anchor box quality, making the model focus more on anchor boxes of ordinary quality and improving its ability to localize objects. For target detection in complex underground scenes, where targets are blurred and small and detection is difficult, the WIoU v3 loss function can dynamically optimize the loss weight of small targets and thus improve the detection performance of YOLOv8-MCW. The model detection results with different loss functions are shown in Table 3 below:
Table 3. Comparison of YOLOv8-MC detection results with different loss functions
The data in Table 3 clearly show that the YOLOv8-MCW improved model provided by the invention has obvious advantages in detecting underground coal mine personnel and the overhead man-riding device. Personnel and overhead man-riding device targets detected by the improved YOLOv8-MCW algorithm are shown in figs. 5 and 6, with the detected class and confidence indicated at the upper left corner. Compared with the traditional YOLOv8 target detection algorithm model, the improved model performs markedly better in detecting underground coal mine personnel and overhead man-riding device targets.
S34, model detection effect and performance evaluation: to test the detection performance of the improved model, the invention uses precision, recall, mAP@0.5, model parameter scale, model computation (floating point operations, FLOPs), model training time, and detection speed (unit: frames/s) as evaluation indicators. The formulas for these indicators use the quantities TP (predicted positive, actually positive), FP (predicted positive, actually negative), and FN (predicted negative, actually positive).
Precision is the ratio of correctly predicted positive samples to all samples predicted as positive, calculated as follows:
Precision = TP / (TP + FP)
where TP is the number of true positives (samples correctly predicted as positive) and FP is the number of false positives (negative samples predicted as positive).
Recall is the ratio of correctly predicted positive samples to the number of positive samples actually present, calculated as follows:
Recall = TP / (TP + FN)
where FN is the number of false negatives (positive samples predicted as negative).
The average precision (AP) is the area under the precision-recall curve, and the mean average precision (mAP@0.5) is the average of the AP values over all sample classes, used to evaluate the detection performance of the model across all classes, with the IoU threshold between predicted and ground-truth boxes set to 0.5:
mAP@0.5 = (1/N) · Σ_{i=1}^{N} AP_i
where N is the number of sample classes, AP_i is the area under the precision-recall curve for class i, and @0.5 indicates that an IoU threshold of 0.5 is used when computing the AP values. A small helper sketch of these computations follows.
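For illustration, these metrics can be computed as below; the per-class counts and AP values here are placeholders, not the patent's reported results.

```python
# A sketch of the precision / recall / mAP@0.5 formulas above, given per-class
# TP, FP, FN counts and AP values; the numbers are placeholders.
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if tp + fn else 0.0

def mean_ap(ap_per_class: list[float]) -> float:
    return sum(ap_per_class) / len(ap_per_class)

# Example with placeholder counts for the two classes:
print(precision(tp=950, fp=40), recall(tp=950, fn=60))
print(mean_ap([0.91, 0.88]))   # mAP@0.5 over person and man_riding_device
```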
S4, realizing multi-camera tracking based on the DeepSort tracking algorithm. When tracking targets, mutual occlusion among targets and lighting changes in the environment can cause tracking failure, affecting the subsequent judgment of chasing relations between targets. It is therefore necessary to use multi-view information effectively to improve the accuracy and stability of target tracking. A schematic diagram of the multi-camera tracking process based on the DeepSort algorithm is shown in fig. 7.
S41, calculating homography matrixes corresponding to cameras with different view angles, mapping the image space with the overlapping area by using the homography matrixes, and further processing shielding to complete target matching. The homography matrix may establish correspondence between different images so that the model can infer information in one image from another.
Specifically, using the overlapping regions between two or more different viewing angles, key points are extracted with the SIFT algorithm. The spatial relation between the different viewing angles is deduced from the relation between image space and the world coordinate system, and the homography matrix between the viewing angles is then obtained, as the OpenCV sketch below illustrates.
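The following sketch estimates the homography between two overlapping camera views from SIFT keypoints with RANSAC, using OpenCV; the image sources are hypothetical, and the ratio-test and reprojection thresholds are common defaults rather than values specified by the patent.

```python
# A sketch of estimating the homography between two overlapping camera views
# with SIFT keypoints and RANSAC; image paths and thresholds are assumptions.
import cv2
import numpy as np

img1 = cv2.imread("cam1_frame.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical
img2 = cv2.imread("cam2_frame.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Lowe ratio-test matching of descriptors between the two views.
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)  # 3x3 homography
```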
S42, deepSort, deepSort is a multi-target tracking algorithm, which uses a kalman filtering algorithm to infer the motion amount at the next moment according to the motion amount at the current moment, and uses a hungarian algorithm to match the detection frame and the prediction frame in combination with the motion amount information and the appearance information. In each frame, the algorithm first predicts all possible targets using a deep learning model, then data correlates according to the appearance information, and finally updates the trajectory information using a kalman filter.
As for the Kalman filter, it is a recursive filtering algorithm that continuously refines its state estimate as new observations arrive; a minimal sketch is given below.
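The following minimal constant-velocity predict/update cycle for a 2D box centroid illustrates the recursive estimation described above. DeepSort's actual state vector is richer (it also tracks aspect ratio and height), and the noise covariances here are assumed values, so this is a simplification.

```python
# A minimal constant-velocity Kalman filter sketch for a 2D centroid;
# noise covariances are assumed, and DeepSort's real state is richer.
import numpy as np

class ConstantVelocityKF:
    def __init__(self):
        dt = 1.0                                    # one frame per step
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], float)    # state transition (x, y, vx, vy)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], float)    # only the position is observed
        self.Q = np.eye(4) * 1e-2                   # process noise (assumed)
        self.R = np.eye(2) * 1.0                    # measurement noise (assumed)
        self.x = np.zeros(4)
        self.P = np.eye(4) * 10.0

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                           # predicted centroid

    def update(self, z):
        y = z - self.H @ self.x                     # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)    # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```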
As for the Hungarian algorithm, it is an optimization algorithm for solving the assignment problem. Its principle is based on maximum matching in graph theory and linear programming, and it finds the best task assignment under a given cost matrix; in multi-target tracking it matches multiple targets between frames, handling the appearance of new targets, the disappearance of old targets, and the matching of target ids between the previous and current frames (see the scipy sketch below).
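The following sketch performs frame-to-frame assignment with the Hungarian algorithm via scipy. The cost matrix here is plain centroid distance and the gating threshold is an assumed value; DeepSort itself combines motion (Mahalanobis) and appearance costs.

```python
# A sketch of frame-to-frame assignment with the Hungarian algorithm; the
# distance-only cost matrix and gating threshold are simplifying assumptions.
import numpy as np
from scipy.optimize import linear_sum_assignment

tracks = np.array([[100.0, 200.0], [400.0, 260.0]])        # predicted centroids
detections = np.array([[405.0, 258.0], [102.0, 205.0], [700.0, 90.0]])

cost = np.linalg.norm(tracks[:, None, :] - detections[None, :, :], axis=2)
rows, cols = linear_sum_assignment(cost)                    # optimal assignment

MAX_DIST = 50.0                                             # gating (assumed)
for t, d in zip(rows, cols):
    if cost[t, d] < MAX_DIST:
        print(f"track {t} -> detection {d}")                # matched pair
# Unmatched detections start new tracks; unmatched tracks may be lost.
```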
The algorithm performs well in complex scenes with dense pedestrians and occlusion, accurately tracking pedestrian motion trajectories and providing strong support for acquiring information about moving pedestrians in real time. The performance achieved by multi-camera tracking built on the DeepSort algorithm likewise shows that combining the two enables more efficient and more accurate target tracking. FIG. 8 shows an example of personnel and the overhead man-riding device detected with the target detection and tracking algorithms adopted by the invention, with the tracked target class and id indicated at the upper left corner.
S43, realizing multi-camera target tracking: the homography matrices between the videos are calculated, the multi-view videos are stitched and taken as input, and the Kalman filter is used to predict the data.
The homography matrix and the Hungarian algorithm are then combined to identify the targets, realizing target tracking across multiple cameras; the overall flow of multi-camera target tracking is shown in fig. 7.
S5, calculating the motion vectors and distances between tracked targets: based on the tracking results, the position of the center point of each object is obtained in every frame, and from the change of the center point position between each pair of adjacent frames, the direction of motion, the speed, and the Euclidean distance between the two points are calculated, yielding the motion information of the objects.
Further, the step S5 includes the steps of:
S51, calculating the Euclidean distance between two targets from the detected coordinate information, with the calculation formula:
x_center = (x_min + x_max) / 2, y_center = (y_min + y_max) / 2
d = sqrt( (x_center1 − x_center2)^2 + (y_center1 − y_center2)^2 )
where (x_min1, y_min1, x_max1, y_max1) and (x_min2, y_min2, x_max2, y_max2) are the bounding box coordinates of the two targets, (x_center1, y_center1) and (x_center2, y_center2) are the coordinates of the center points of the two targets, and d is the Euclidean distance between the targets.
S52, according to the tracking results, calculating the direction of motion of each target over n consecutive frames, and computing the angle between the directions of the pedestrian (target A) and the overhead man-riding device (target B), with the calculation formula:
v1 = (x'_1 − x_1, y'_1 − y_1), v2 = (x'_2 − x_2, y'_2 − y_2)
θ = arccos( (v1 · v2) / (|v1| · |v2|) )
where x_1, y_1 are the coordinates of target A in the current frame, x'_1, y'_1 its coordinates in the next frame, x_2, y_2 the coordinates of target B in the current frame, x'_2, y'_2 its coordinates in the next frame, v1 and v2 are the motion vectors of the two targets, and θ is the angle between them.
S53, calculating the relative movement speed between targets: from the Euclidean distance obtained in step S51, the movement time is derived from the frame rate and the counted number of frames, giving the movement speed:
v = d / (frame_num / fps) = d · fps / frame_num
where frame_num is the set number of frames, fps is the video frame rate, and d is the distance the target moves within the set number of frames. A combined sketch of steps S51-S53 follows.
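The following sketch combines steps S51-S53 using the reconstructed formulas above: centroid extraction, Euclidean distance, direction angle, and speed.

```python
# A sketch combining steps S51-S53: centroid distance, direction angle, and
# speed, from the formulas reconstructed above.
import math

def center(box):
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)

def euclidean(c1, c2):
    return math.hypot(c1[0] - c2[0], c1[1] - c2[1])

def direction_angle(p_now, p_next, q_now, q_next):
    # Angle between the frame-to-frame motion vectors of two targets.
    v1 = (p_next[0] - p_now[0], p_next[1] - p_now[1])
    v2 = (q_next[0] - q_now[0], q_next[1] - q_now[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def speed(distance_moved, frame_num, fps):
    return distance_moved / (frame_num / fps)   # pixels per second
```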
S6, chase decision: 1) according to the relative movement speed between targets calculated in S53, if the speed of object A is greater than that of object B and this condition persists, a chase is considered possible; 2) according to the angle between the directions of motion of the two objects calculated in S52, if the angle over consecutive frames is smaller than the set threshold (here set to 20°), object A is judged to be moving toward object B; 3) if the distance between object A and object B decreases over time, chasing behavior is further confirmed, as sketched below.
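The following sketch applies the three-part decision over a sliding window of per-frame motion records; the window length and record layout are assumptions for illustration.

```python
# A sketch of the three-part chase decision in S6 over a sliding window of
# per-frame motion records; the window length is an assumption.
def is_chasing(records, angle_threshold=20.0, window=30):
    """records: list of dicts with keys 'speed_a', 'speed_b', 'angle',
    'distance' for consecutive frames (A = pedestrian, B = device)."""
    recent = records[-window:]
    if len(recent) < window:
        return False
    faster = all(r["speed_a"] > r["speed_b"] for r in recent)     # 1) sustained speed gap
    aligned = all(r["angle"] < angle_threshold for r in recent)   # 2) moving toward target
    closing = all(a["distance"] > b["distance"]                   # 3) distance decreasing
                  for a, b in zip(recent, recent[1:]))
    return faster and aligned and closing
```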
The foregoing description of the preferred embodiment of the invention is not intended to limit the invention to the precise form disclosed, and any modifications, equivalent substitutions, and improvements falling within the spirit and principles of the invention are intended to be included within its scope.

Claims (5)

1. A method, based on computer vision, for recognizing coal mine personnel chasing an overhead man-riding device, characterized in that the specific steps are:
Step 1: using existing cameras in the actual coal mine application scenario, acquiring real-time video streams from monitoring points covering the overhead man-riding device scene, intercepting several video segments of pedestrians chasing the device, screening the intercepted videos, and extracting frames from them to obtain a data set;
Step 2: performing data enhancement preprocessing and data labeling on the data obtained in Step 1;
Step 3: identifying personnel and the overhead man-riding device with the improved YOLOv8-MCW algorithm, wherein the YOLOv8-MCW improvement specifically comprises: using MobileNetV3-Large as the backbone, introducing 5×5 depthwise convolutions, Squeeze-and-Excitation modules, and the h-swish activation function; introducing a coordinate attention mechanism into the neck network and embedding positioning signals into the channel attention, splitting the features into encodings along two different directions, one direction retaining short-range position signals and the other capturing long-range dependencies, the two encodings forming feature maps containing a pair of target-aware and positioning signals; and selecting WIoU v3 as the loss function of YOLOv8-MCW, the WIoU v3 loss function being calculated as follows:
the WIoU v1 bounding box loss function is defined as
L_IoU = 1 − IoU
R_WIoU = exp( ((x_c − x_c^gt)^2 + (y_c − y_c^gt)^2) / (c_w^2 + c_h^2)^* )
L_WIoUv1 = R_WIoU · L_IoU
where IoU measures the overlap between the predicted and ground-truth bounding boxes, L_IoU is the bounding box loss function, c_h and c_w are the height and width of the smallest enclosing box formed by the predicted and ground-truth boxes, (x_c^gt, y_c^gt) are the centroid coordinates of the ground-truth box, and (x_c, y_c) are the centroid coordinates of the predicted box;
multiplying L_WIoUv1 by the gradient gain r defines the L_WIoUv3 bounding box loss function:
L_WIoUv3 = r · L_WIoUv1, with r = β / (δ · α^(β − δ)) and β = L_IoU^* / L̄_IoU
where r is a non-monotonic focal factor, L_WIoUv1 is the bounding box loss function, L_IoU^* is the monotonic focal coefficient, L̄_IoU is the set average bounding box loss value, the ratio β of the two assesses sample quality, and δ and α are hyperparameters set manually to suit different models;
using precision, recall, mAP@0.5, model parameter scale, model computation, model training time, and detection speed as evaluation indicators;
Step 4: realizing target tracking across multiple cameras based on the DeepSort tracking algorithm;
Step 5: calculating the motion vectors and distances between tracked targets;
Step 6: judging whether chasing behavior exists.
2. The method according to claim 1, characterized in that Step 2 comprises the following specific steps:
Step S21, data enhancement preprocessing: transforming and expanding the data collected in Step 1 to generate new samples;
Step S22, data labeling: labeling pedestrians on the preprocessed images with the class name person, labeling the overhead man-riding device with the class name man_riding_device, generating txt files containing position coordinates and class names after labeling, and dividing the labeled data set into a training set, a test set, and a validation set in a 7:2:1 ratio.
3. The method according to claim 2, characterized in that Step 4 comprises the following specific steps:
Step S41, calculating the homography matrices corresponding to cameras with different viewing angles, using the homography matrices to map image spaces with overlapping areas, handling occlusion, and completing target matching;
Step S42, using the Kalman filter algorithm to infer the motion state at the next moment from the motion state at the current moment and, combining motion information with appearance information, using the Hungarian algorithm to match detection boxes with prediction boxes;
Step S43, calculating the homography matrices between videos, stitching the multi-view videos, taking the stitched videos as input, predicting data with the Kalman filter, and then combining the homography matrix and the Hungarian algorithm to identify targets, realizing target tracking across multiple cameras.
4. The method according to claim 3, characterized in that in Step 5 the specific calculation steps are: based on the tracking results, obtaining the position of the center point of each object in every frame and, in each pair of adjacent frames, calculating from the change of the center point position the object's direction of motion, speed, and the Euclidean distance between the tracked targets, so as to obtain the motion information of the objects.
5. The method according to claim 4, characterized in that in Step 6 the specific decision method is: 1) if the speed of object A is greater than that of object B and this condition persists, a chase is considered possible; 2) calculating the direction-of-motion vectors of objects A and B, and if the angle between the two vectors is smaller than a set value, judging that object A is moving toward object B; 3) if the distance between object A and object B decreases over time, judging that chasing behavior exists.
CN202411288242.5A 2024-09-14 Identification method of coal mine personnel chasing overhead passenger devices based on computer vision Active CN119152575B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411288242.5A CN119152575B (en) 2024-09-14 Identification method of coal mine personnel chasing overhead passenger devices based on computer vision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411288242.5A CN119152575B (en) 2024-09-14 Identification method of coal mine personnel chasing overhead passenger devices based on computer vision

Publications (2)

Publication Number Publication Date
CN119152575A CN119152575A (en) 2024-12-17
CN119152575B 2025-04-04


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116681724A (en) * 2023-04-11 2023-09-01 安徽理工大学 Video tracking method and storage medium for mine personnel target based on YOLOv5-deep algorithm
CN116778410A (en) * 2023-06-08 2023-09-19 西安博深安全科技股份有限公司 Deep learning-based coal mine underground operation personnel detection and tracking method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116681724A (en) * 2023-04-11 2023-09-01 安徽理工大学 Video tracking method and storage medium for mine personnel target based on YOLOv5-deep algorithm
CN116778410A (en) * 2023-06-08 2023-09-19 西安博深安全科技股份有限公司 Deep learning-based coal mine underground operation personnel detection and tracking method

Similar Documents

Publication Publication Date Title
Yang et al. Vision-based tower crane tracking for understanding construction activity
Xiao et al. A vision-based method for automatic tracking of construction machines at nighttime based on deep learning illumination enhancement
CN103366569B (en) The method and system of real-time grasp shoot traffic violation vehicle
CN112767644B (en) Method and device for early warning fire in highway tunnel based on video identification
CN112329671B (en) Pedestrian running behavior detection method based on deep learning and related components
Kim Visual analytics for operation-level construction monitoring and documentation: State-of-the-art technologies, research challenges, and future directions
CN114495421B (en) Intelligent open type road construction operation monitoring and early warning method and system
Wu et al. Vehicle Classification and Counting System Using YOLO Object Detection Technology.
CN114299106A (en) High-altitude parabolic early warning system and method based on visual sensing and track prediction
CN117994700A (en) Intelligent construction site personnel behavior recognition system and method based on AI intelligent recognition
CN119152575B (en) Identification method of coal mine personnel chasing overhead passenger devices based on computer vision
JP7078295B2 (en) Deformity detection device, deformation detection method, and program
CN119229526A (en) Intelligent identification method of risky behavior violations in power operations based on machine vision
CN113744302A (en) Dynamic target behavior prediction method and system
CN119152575A (en) Recognition method of colliery personnel catching up overhead riding device based on computer vision
Kim et al. Training a visual scene understanding model only with synthetic construction images
CN115035543B (en) Big data-based movement track prediction system
Deng et al. Automatic Vision-Based Dump Truck Productivity Measurement Based on Deep-Learning Illumination Enhancement for Low-Visibility Harsh Construction Environment
CN114581863A (en) Method and system for identifying dangerous state of vehicle
CN109740518A (en) The determination method and device of object in a kind of video
CN114119657B (en) Method, device, computer equipment and storage medium for detecting objects thrown from high altitude
CN119068469B (en) Firework detection and analysis method based on YOLO algorithm and dynamic analysis
Mostafa et al. Automated Vehicle Counting and Speed Estimation Using Yolov8 and Computer Vision
KR20090050890A (en) Behavior Analysis System and Method
CN117392706A (en) Pedestrian detection method and system, data processing device, program, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant