CN114494040A

CN114494040A - Image data processing method and device based on multi-target detection

Info

Publication number: CN114494040A
Application number: CN202111655676.0A
Authority: CN
Inventors: 王姣; 谷丰强; 李东昌; 马静雅
Original assignee: Beijing Kedong Electric Power Control System Co Ltd
Current assignee: Beijing Kedong Electric Power Control System Co Ltd
Priority date: 2021-12-30
Filing date: 2021-12-30
Publication date: 2022-05-13

Abstract

The invention discloses an image data processing method and device based on multi-target detection. The method comprises: acquiring an image, preprocessing it, and extracting image features through a basic convolutional neural network to obtain a feature map of the image; passing the feature map through a pre-constructed region proposal network to generate a large number of default proposal boxes, and classifying the feature map and the proposal boxes; and de-duplicating the classified proposal boxes with an area-based non-maximum suppression algorithm to complete the processing. An adaptive preprocessing algorithm improves the brightness and contrast of the image and the adaptability to images of complex environments. An area-based non-maximum suppression algorithm (A-NMS) solves the problem of one target receiving multiple detection boxes, which improves the algorithm's ability to discriminate small feature differences. A segmentation detection method improves the recognition of small targets and reduces the miss rate for distant targets. Together these yield more accurate fault detection for multiple targets in aerial images with complex backgrounds.

Description

Image data processing method and device based on multi-target detection

Technical Field

The invention relates to an image data processing method and device based on multi-target detection, and belongs to the technical field of data processing.

Background

Electric power safety underpins the smooth operation of the national economy. Transmission lines are exposed in the field for long periods in harsh environments, so they must be inspected regularly to find faulty or hazardous parts and replace them in time. The traditional manual line patrol is gradually being replaced by unmanned aerial vehicles (UAVs) and monitoring cameras. Power workers no longer need to patrol the line in person; they only need to inspect the pictures in a monitoring room. Faced with a massive number of returned images, however, purely manual identification consumes a great deal of manpower and effort. Using computer technology to identify faults automatically is therefore a popular research topic.

Early researchers mostly used traditional image recognition and machine learning methods to mine the pictures of high-voltage transmission lines taken by UAVs, and then located and detected faults from the mined information. Taking the insulator as an example, Sun et al. proposed a slope model based on the appearance of the insulator, matched the model against the image, and identified regions that fit the model as insulators. Zhang Feng Yu et al. performed contour matching on the H-component image in the HSV color space. Zhao Jun et al. binarized insulator images with a pixel statistics method to highlight the insulator as the target. They then extracted texture features such as the gray-level co-occurrence matrix, invariant moments and wavelet coefficients to describe it. After the feature vector was obtained, a neural network model such as an RBF or BP network was built for fault judgment. Each of these models and algorithms is designed for one specific class of target and cannot identify multiple targets simultaneously. Moreover, when the target blends into a complex background, they segment foreground from background poorly and adapt insufficiently.

Deep learning theory was first proposed by Hinton et al. in 2006. After years of development it has achieved major breakthroughs in convolutional neural networks (CNN) and object detection. Wan used UAV inspection pictures and treated common power components as targets. Wan then tested how well several object detection algorithms locate and classify the components, reaching an accuracy of 92.7%, but did not achieve fault discrimination.

Disclosure of Invention

The present invention aims to overcome the deficiencies of the prior art by providing an image data processing method and device based on multi-target detection that solve the above problems.

To achieve this purpose, the invention adopts the following technical scheme:

In a first aspect, the present invention provides an image data processing method based on multi-target detection, including:

acquiring an image, preprocessing the image, and extracting image features through a basic convolutional neural network to obtain a feature map of the image;

passing the feature map through a pre-constructed region proposal network to generate a large number of default proposal boxes, and classifying the feature map and the proposal boxes;

and de-duplicating the classified proposal boxes with an area-based non-maximum suppression algorithm to complete the processing.

Further, preprocessing the image includes: adjusting the brightness and contrast of the image to appropriate ranges.

Further, adjusting the brightness and contrast of the image to appropriate ranges includes:

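the standard deviation of the pixels in the image is used to represent the contrast, as follows:

\sigma = \sqrt{\frac{1}{3rc}\sum_{i=1}^{r}\sum_{j=1}^{c}\sum_{k=1}^{3}\left(X_{ijk}-\bar{X}\right)^{2}}, \quad \bar{X} = \frac{1}{3rc}\sum_{i=1}^{r}\sum_{j=1}^{c}\sum_{k=1}^{3}X_{ijk} \qquad (1)

where X̄ is the pixel mean of the image and X ∈ R^(r×c×3) is a 3-channel color image of length r and width c;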

finding the best segmentation point between foreground and background with the OTSU algorithm, that is, the gray level T that maximizes the between-class variance of foreground and background. This is done by traversing all 256 values from 0 to 255 on the grayscale image. Formulas (2) and (3) then give the mean m₀ of all pixels greater than the segmentation point and the mean m₁ of all pixels not greater than it:

m₀＝mean(Image[Image>T]) (2)

m₁＝mean(Image[Image≤T]) (3)

m₀ and m₁ represent the pixel levels of the foreground and the background respectively, and serve as the basis for judging the brightness level of the image. Their difference is then used as the quantitative index of contrast:

const＝m₀-m₁ (4)

when const is less than 80 the image is regarded as low contrast, and when m₀ is less than 50 it is regarded as a low-brightness background. The contrast scaling factor α and the pixel offset β are then determined by equation (5):

α＝C_α/const, β＝C_β-α*m₁ (5)

where C_α is the expected contrast value and C_β is the expected background mean, both being random values taken outside the abnormal interval. The adjustment follows the method used in OpenCV: each pixel value of the original image is multiplied by the scaling coefficient α, and the bias coefficient β is then added:

newImage＝α*Image+β (6)

values outside the image range are truncated: values greater than 255 are set to 255, and values less than 0 are set to 0.
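The adjustment steps above can be sketched in pure Python on a flat list of 8-bit gray values. This is a minimal sketch under stated assumptions. It assumes α = C_α / const, since equation (5) as extracted states only β explicitly. The default values for C_α and C_β are hypothetical placeholders. The Otsu search is written out by hand instead of calling OpenCV. The function names are illustrative, not taken from the patent.

```python
def otsu_threshold(gray):
    """Return the gray level T (0-255) that maximises the between-class variance.

    Background = pixels <= T, foreground = pixels > T, as in equations (2)-(3).
    """
    hist = [0] * 256
    for v in gray:
        hist[v] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        w_fg = total - w_bg
        if w_bg == 0:
            continue
        if w_fg == 0:
            break
        m_bg = sum_bg / w_bg
        m_fg = (sum_all - sum_bg) / w_fg
        # proportional to the between-class variance
        var_between = w_bg * w_fg * (m_bg - m_fg) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def adaptive_adjust(gray, c_alpha=100.0, c_beta=80.0,
                    low_contrast=80, low_brightness=50):
    """Stretch low-contrast or dark images; gray is a list of 0-255 ints.

    c_alpha / c_beta defaults are HYPOTHETICAL; the text only says they are
    drawn from a suitable interval.
    """
    t = otsu_threshold(gray)
    fg = [v for v in gray if v > t]    # foreground, equation (2)
    bg = [v for v in gray if v <= t]   # background, equation (3)
    if not fg or not bg:
        return list(gray)              # flat image: nothing to separate
    m0 = sum(fg) / len(fg)
    m1 = sum(bg) / len(bg)
    const = m0 - m1                    # contrast index, equation (4)
    if const >= low_contrast and m0 >= low_brightness:
        return list(gray)              # already in a normal range
    alpha = c_alpha / const            # ASSUMED form of the scaling factor
    beta = c_beta - alpha * m1         # equation (5)
    # equation (6) followed by truncation to [0, 255]
    return [min(255, max(0, round(alpha * v + beta))) for v in gray]
```

For the data-augmentation use described in the detailed description, `c_alpha` and `c_beta` could be drawn per training image with `random.uniform` over a chosen interval. The interval itself is not given in the text.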

Further, passing the feature map through the pre-constructed region proposal network to generate a large number of default proposal boxes, and classifying the feature map and the proposal boxes, includes:

the feature map passes through two branches. One branch, the region proposal network (RPN), generates proposal boxes and performs a preliminary regression of the target boxes. The other branch applies RoI pooling to the feature map and the proposal boxes, followed by classification and fine regression through fully connected layers, wherein

the region proposal network first generates a large number of default boxes and deletes the boxes that extend beyond the image boundary. It then applies non-maximum suppression to the remaining boxes to remove most overlapping boxes, and finally sends the top N detection boxes to the next network.
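The proposal filtering described above (boundary check, NMS, top N) can be sketched as follows. This is a minimal sketch, not the patent's implementation. It assumes boxes are `(x_min, y_min, x_max, y_max)` tuples in pixels and uses standard greedy IoU-based NMS. The defaults `iou_thr=0.7` and `top_n=300` are hypothetical, because the text gives only "the first N".

```python
def box_area(b):
    """Area of an (x_min, y_min, x_max, y_max) box."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou(a, b):
    """Intersection over union of two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def filter_proposals(boxes, scores, width, height, iou_thr=0.7, top_n=300):
    """Drop boxes crossing the image boundary, run greedy NMS, keep top_n.

    Returns indices into `boxes`, highest score first.
    """
    inside = [i for i, b in enumerate(boxes)
              if b[0] >= 0 and b[1] >= 0 and b[2] <= width and b[3] <= height]
    inside.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in inside:
        # keep a box only if it does not overlap an already-kept box too much
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in kept):
            kept.append(i)
            if len(kept) == top_n:
                break
    return kept
```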

Further, de-duplicating the classified proposal boxes with the area-based non-maximum suppression algorithm includes:

obtaining the classification set C of the proposal boxes; extracting, according to C, the detection box set B and the score set S belonging to the insulator; computing the areas of all boxes; and comparing the box with the largest area against each of the other boxes. The comparison calculates the proportion of the smaller box that is covered by the larger box, as shown in formula (7):

IoS = \frac{\operatorname{area}(b_{large} \cap b_{small})}{\operatorname{area}(b_{small})} \qquad (7)

if the IoS is greater than a threshold, the two boxes are considered to overlap and their scores are compared. If the score difference is smaller than a set value, the box with the smaller area is deleted; otherwise, the box with the lower score is deleted.

Further, the method also includes: segmenting the feature map with a segmentation detection method and detecting the resulting sub-images.

Further, segmenting the feature map with the segmentation detection method and detecting the resulting sub-images includes:

dividing the feature map into four parts, using the 1/4 and 3/4 positions of the horizontal and vertical axes as dividing points. Each sub-image then has 3/4 of the original length and width and 9/16 of the original area;

and enlarging each sub-image to the size of the original image so that small targets occupy a larger area, and then performing detection on the enlarged sub-images.

Further, the method also includes: scaling the detection results obtained on the enlarged sub-images by multiplying them by 0.75, and then adding an offset of 0.25 (as a fraction of the original width or height) according to the position of each sub-image.
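The crop windows and the mapping back to the original image can be sketched as follows. This is a minimal sketch assuming box coordinates normalized by image width and height. In those coordinates, enlarging a sub-image to full size does not change its normalized detections, so only the 0.75 scale and the 0 or 0.25 offset remain. The position names are hypothetical.

```python
# Each sub-image covers 3/4 of the width and height; its top-left corner sits
# at 0 or 1/4 of the original size (normalised coordinates).
SUB_ORIGINS = {
    "top_left": (0.0, 0.0),
    "top_right": (0.25, 0.0),
    "bottom_left": (0.0, 0.25),
    "bottom_right": (0.25, 0.25),
}


def crop_windows(width, height):
    """Pixel windows (x0, y0, x1, y1) of the four overlapping sub-images."""
    return {name: (ox * width, oy * height,
                   (ox + 0.75) * width, (oy + 0.75) * height)
            for name, (ox, oy) in SUB_ORIGINS.items()}


def to_original(box, position):
    """Map a normalised box detected on an enlarged sub-image back to the
    normalised coordinates of the original image: scale by 0.75, then shift."""
    ox, oy = SUB_ORIGINS[position]
    x0, y0, x1, y1 = box
    return (0.75 * x0 + ox, 0.75 * y0 + oy, 0.75 * x1 + ox, 0.75 * y1 + oy)
```

For example, a box covering the whole upper-right sub-image, `(0, 0, 1, 1)`, maps to `(0.25, 0.0, 1.0, 0.75)` in the original image.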

Further, the method also includes: performing target box fusion on the incomplete detection boxes produced when a target spans multiple sub-images. A smaller box is fused into the current largest box before the smaller box is deleted.

In a second aspect, the present invention provides an image data processing apparatus based on multi-target detection, comprising:

a preprocessing unit, used to acquire an image, preprocess it, and extract image features through a basic convolutional neural network to obtain a feature map of the image;

a classification unit, used to pass the feature map through a pre-constructed region proposal network to generate a large number of default proposal boxes, and to classify the feature map and the proposal boxes;

and a de-duplication unit, used to de-duplicate the classified proposal boxes with the area-based non-maximum suppression algorithm to complete the processing.

Compared with the prior art, the invention has the following beneficial effects:

Based on an analysis of the shortcomings of conventional algorithms in fault detection, the invention provides an image data processing method and device based on multi-target detection with three improvements. First, an adaptive preprocessing algorithm improves the brightness and contrast of images and the adaptability to images of complex environments. Second, an area-based A-NMS algorithm solves the problem of one target receiving multiple detection boxes, which improves the algorithm's ability to discriminate small feature differences. Third, a segmentation detection method improves the recognition of small targets and reduces the miss rate for distant targets. As a result, fault detection of multiple targets in aerial images with complex backgrounds is more accurate.

Drawings

FIG. 1 is a flow chart of a method for processing image data based on multi-target detection according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of image segmentation provided by an embodiment of the present invention;

FIG. 3 is a graph showing the sensitivity of the different schemes to the hyperparameter T in A-NMS and NMS according to an embodiment of the present invention;

FIG. 4 is a schematic diagram illustrating the effect of segmentation detection versus non-segmentation detection on 6 types of targets according to an embodiment of the present invention.

Detailed Description

The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.

Example 1

The embodiment introduces an image data processing method and device based on multi-target detection, which includes:

acquiring an image, preprocessing the image, and extracting image features through a basic convolutional neural network to obtain a feature map of the image;

passing the feature map through a pre-constructed region proposal network to generate a large number of default proposal boxes, and classifying the feature map and the proposal boxes;

and de-duplicating the classified proposal boxes with an area-based non-maximum suppression algorithm to complete the processing.

Applying the image data processing method and device based on multi-target detection of this embodiment involves the following steps:

Step one: the image is passed through a basic convolutional neural network (CNN) to extract its features and obtain a feature map. The feature map then passes through two branches. One branch, the region proposal network (RPN), generates proposal boxes and performs a preliminary regression of the target boxes. The other branch applies RoI pooling to the feature map and the proposal boxes, followed by classification and fine regression through fully connected layers. The RPN is the core module of the algorithm. It generates a large number of default boxes, deletes the boxes that extend beyond the image boundary, and applies non-maximum suppression (NMS) to the remaining boxes to remove most overlapping boxes. Finally, it sends the top N detection boxes to the next stage of the network.

Step two: the image is preprocessed. A method is designed that automatically detects the brightness and contrast of the image and adaptively adjusts them to an appropriate range.

Contrast refers to the magnitude of the difference between bright and dark pixels in an image. There are many ways to quantify image contrast. Equation (1), for example, uses the standard deviation [20] of the pixels in the image:

\sigma = \sqrt{\frac{1}{3rc}\sum_{i=1}^{r}\sum_{j=1}^{c}\sum_{k=1}^{3}\left(X_{ijk}-\bar{X}\right)^{2}}, \quad \bar{X} = \frac{1}{3rc}\sum_{i=1}^{r}\sum_{j=1}^{c}\sum_{k=1}^{3}X_{ijk} \qquad (1)

where X̄ is the pixel mean of the image and X ∈ R^(r×c×3) is a 3-channel color image of length r and width c.
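The standard-deviation contrast measure can be computed directly over all channel values of an image. This is a minimal sketch that assumes the image is passed as a flat sequence of its r·c·3 values. Dividing by the number of values (population standard deviation) rather than that number minus one is an assumption.

```python
import math


def std_contrast(pixels):
    """Standard deviation of all channel values of a colour image.

    pixels: flat sequence of the r*c*3 values (ASSUMED input layout).
    """
    n = len(pixels)
    mean = sum(pixels) / n                          # pixel mean, X-bar
    return math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)
```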

This embodiment first finds the best segmentation point between foreground and background using the OTSU algorithm, the method most commonly used for image segmentation. The algorithm divides the pixels into two classes, foreground and background. Its core idea is to find the gray level T that maximizes the between-class variance of the two. On the grayscale image, all 256 values from 0 to 255 are traversed to find the best segmentation point T. Formulas (2) and (3) then give the mean m₀ of all pixels greater than the segmentation point and the mean m₁ of all pixels not greater than it:

m₀＝mean(Image[Image>T]) (2)

m₁＝mean(Image[Image≤T]) (3)

They represent the pixel levels of the foreground and background, respectively, and are used as the basis for determining the brightness level of the image. Their difference is then taken as the quantitative index of contrast:

const＝m₀-m₁ (4)

This embodiment regards const less than 80 as low contrast and m₀ less than 50 as a low-brightness background. The contrast scaling factor α and the pixel offset β are determined by equation (5):

α＝C_α/const, β＝C_β-α*m₁ (5)

where C_α is the expected contrast value and C_β is the expected background mean; both are random values outside the abnormal interval. The adjustment follows the method used in OpenCV: each pixel value of the original image is multiplied by the scaling coefficient α, and the bias coefficient β is then added:

newImage＝α*Image+β (6)

Values outside the image range are truncated: values greater than 255 are set to 255, and values less than 0 are set to 0. Note that when determining the scaling factor α and the offset β, this embodiment does not adjust every image to a uniform contrast and brightness level. Instead, the target values are chosen randomly within a suitable interval. This preprocessing therefore also serves as data augmentation, enlarging the data set and improving the generalization ability of the model.

Step three: de-duplicating the classified proposal boxes based on area non-maximum suppression;

In object detection algorithms, many of the default boxes of various scales and aspect ratios generated on the feature maps overlap heavily. Sending all of them to the subsequent network would increase its computational load and could also cause errors. The performance of the detector is therefore affected to some extent by the NMS algorithm. For example, Soft-NMS, proposed by Navaneeth Bodla et al., improves detector performance by improving the NMS algorithm.

Conventional NMS handles only target boxes belonging to the same class. In the task scenario of this embodiment, however, intact insulators and broken insulators are labeled as two classes. Intact vibration dampers and broken vibration dampers are also labeled as two classes, so the same target may receive two or more labels.

For each target, the detector is expected to output only one label and to remove any redundant label, which the traditional NMS algorithm cannot do. This embodiment therefore proposes an area-based non-maximum suppression algorithm (A-NMS) to address this problem.

The A-NMS algorithm treats intact and broken insulators as the same class, and intact and broken vibration dampers as the same class. The classification information of the detection boxes must therefore be available before the algorithm is applied. Faster R-CNN contains two NMS stages, one in the RPN and one in the Fast R-CNN head. The RPN only scores whether a box contains an object and assigns no specific class, so the algorithm replaces only the NMS in the Fast R-CNN head. Specifically, the algorithm first extracts, according to the classification set C, the detection box set B and the score set S belonging to the insulator. It then computes the areas of all boxes and compares the box with the largest area against each of the others. The conventional NMS algorithm computes IoU; A-NMS instead computes the intersection over smaller area (IoS), that is, the proportion of the smaller box that is covered by the larger box, as shown in equation (7):

IoS = \frac{\operatorname{area}(b_{large} \cap b_{small})}{\operatorname{area}(b_{small})} \qquad (7)

If the IoS is greater than a threshold, the two boxes are considered to overlap and their scores are compared. If the score difference is smaller than a set value, the box with the smaller area is deleted; otherwise, the box with the lower score is deleted. Vibration dampers are processed in the same way.

This algorithm is also the basis of the target box fusion algorithm in step four, which is described there.

Step four: segmentation detection and target box fusion;

The CNN acts as the image feature extractor in the detector, and the feature maps shrink from the bottom layers to the top layers. In other words, the higher the layer, the larger the receptive field of a single pixel in its feature map. Most state-of-the-art CNNs, such as ResNet and DenseNet, downsample the input by a factor of at least 32 before the pooling layer. Reducing this downsampling factor may prevent the network from extracting high-level semantic features of the image, which hurts classifier performance. SSD generates anchors on multiple feature maps to handle small-target detection.

Outside the Faster R-CNN framework, this embodiment seeks a method that balances time and accuracy, namely segmentation detection. Note that the input size of an object detection algorithm is not fixed. A larger input helps detect small targets, but it also lengthens the model's inference time, and the increase is far more than linear. To speed up inference, this embodiment uniformly sets the short side of the input image to 600 pixels. Under a parallel processing framework, several pictures can be inferred at the same time, so segmentation detection does not increase the overall processing time.
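The short-side normalization mentioned above can be sketched as a size computation. This is a minimal sketch assuming the aspect ratio is preserved and sizes are rounded to whole pixels. The function name is hypothetical, and the text does not describe any cap on the long side.

```python
def resize_to_short_side(width, height, short_side=600):
    """Return (new_width, new_height) with the shorter side equal to
    short_side, preserving the aspect ratio (rounded to whole pixels)."""
    scale = short_side / min(width, height)
    return round(width * scale), round(height * scale)
```

For a 4000 × 3000 inspection photo this gives an 800 × 600 model input.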

As shown in FIG. 2, this embodiment takes the 1/4 and 3/4 positions of the horizontal and vertical axes as dividing points and divides a picture into four overlapping parts. Each sub-image has 3/4 of the length and width of the original image and 9/16 of its area, about 1/2. Because the short side of the model input is fixed at 600, each sub-image is enlarged to the size of the original image. This enlarges the area of small targets by a factor of 16/9 ≈ 1.78, roughly 2, and effectively improves the detector's ability to recognize small targets.

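In the fusion stage, coordinates in the sub-images must be converted into coordinates in the original image. Each sub-image is enlarged to the size of the original image before detection. The detection results of each sub-image are therefore first scaled by 0.75 and then shifted by an offset of 0.25 according to the sub-image's position. Equation (8) is the coordinate transformation for the upper-right sub-image, written in coordinates normalized by the image width and height. This sub-image is shifted along the x-axis relative to the origin, so an x offset is added; the other positions are handled analogously.

\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} 0.75 & 0 & 0.25 \\ 0 & 0.75 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} \qquad (8)

where (x', y') are normalized coordinates in the sub-image and (x, y) are normalized coordinates in the original image.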

All converted results of the four sub-images are marked on the original image. The same target then has several detection boxes and area is the dominant factor, which is exactly where A-NMS applies. However, A-NMS cannot be used directly for target box fusion. It deletes the box with the smaller area, whereas a target spanning multiple sub-images produces incomplete detection boxes. Therefore, before deleting a smaller box, this embodiment fuses it into the box with the current largest area, as shown in equations (9) to (12).

b_large[x_min]＝min(b_large[x_min],b_small[x_min]) (9)

b_large[y_min]＝min(b_large[y_min],b_small[y_min]) (10)

b_large[x_max]＝max(b_large[x_max],b_small[x_max]) (11)

b_large[y_max]＝max(b_large[y_max],b_small[y_max]) (12)
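The fusion step can be sketched as A-NMS with absorption instead of plain deletion. This is a minimal sketch under assumptions. Boxes are `(x_min, y_min, x_max, y_max)` for a single merged class. Scores are ignored because the text does not say how they are handled during fusion. A smaller box is absorbed by taking the minimum of the left/top edges and the maximum of the right/bottom edges, so the fused box encloses both inputs. The function name is hypothetical.

```python
def box_area(b):
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def ios(a, b):
    """Intersection over the smaller box's area."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    smaller = min(box_area(a), box_area(b))
    return iw * ih / smaller if smaller > 0 else 0.0


def fuse_boxes(boxes, ios_thr=0.7):
    """Merge boxes of one class that come from different sub-images.

    Largest box first; any smaller box whose IoS with it exceeds ios_thr is
    absorbed into it (union of extents) and then removed.
    """
    pool = sorted((list(b) for b in boxes), key=box_area, reverse=True)
    fused = []
    while pool:
        big, rest = pool[0], pool[1:]
        survivors = []
        for small in rest:
            if ios(big, small) > ios_thr:
                big[0] = min(big[0], small[0])  # left edge
                big[1] = min(big[1], small[1])  # top edge
                big[2] = max(big[2], small[2])  # right edge
                big[3] = max(big[3], small[3])  # bottom edge
            else:
                survivors.append(small)
        fused.append(tuple(big))
        pool = survivors
    return fused
```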

The target box fusion algorithm also requires a hyperparameter threshold T. Without loss of generality, consider the most extreme possible case: an object spans the entire x-axis. After segmentation, 3/4 of the object appears in each of the left and right sub-images, and the two parts overlap by 1/2. In the detection results, the box from the left sub-image therefore covers 2/3 of the box from the right sub-image. In every other case, among the multiple detection boxes of the same object, the largest box always covers at least 2/3 of the area of each smaller box. T could therefore be set to 0.67; since the extreme case is unlikely to occur, setting the threshold to 0.7 is reasonable.
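A short worked check of the 2/3 figure. It assumes the extreme case above in normalized x-coordinates, with both sub-image boxes truncated at the sub-image edges and equal heights:

```latex
\begin{aligned}
\text{left box: } & [0,\ 0.75], \qquad \text{right box: } [0.25,\ 1] \\
\text{overlap} &= 0.75 - 0.25 = 0.5 \\
\mathrm{IoS} &= \frac{0.5}{0.75} = \frac{2}{3} \approx 0.67
\end{aligned}
```

With T = 0.7 this extreme pair (IoS ≈ 0.667) falls just below the threshold and would not be fused. That is the trade-off the paragraph accepts on the grounds that the extreme case is rare.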

Results of the experiment

In this example, Faster R-CNN with a ResNet101 backbone was selected as the base model, and all experiments use it as the control. This section runs experiments on different combinations of the three improved algorithms proposed in this embodiment. It analyzes how the different combinations and parameters affect algorithm performance, and why. The experimental data set comes from daily inspection photos provided by a power company and is divided into training and validation sets at a ratio of 8:2.

A-NMS threshold selection

The choice of the threshold T is crucial for both the conventional NMS algorithm and the A-NMS algorithm proposed in this embodiment. In this experiment, algorithm performance is measured by mean average precision (mAP), the general evaluation criterion for object detection algorithms. As shown in FIG. 3, the threshold T is varied from 0.3 to 0.9 in steps of 0.1. The black line in the graph is the result of the base model, and the other colored curves show how the mAP of the different combination schemes changes with T. A-NMS performs best for T between 0.7 and 0.9 and changes little between 0.6 and 0.9. This is because the traditional NMS algorithm computes IoU whereas A-NMS computes IoS. When the smaller box is mostly covered by the larger box, the algorithm can only treat the two as the same target. This behavior follows from the image characteristics of intact versus faulty insulators and intact versus faulty vibration dampers.

In the subsequent comparative experiments, the NMS thresholds were fixed at their best-performing values to avoid confounding variables: 0.5 for conventional NMS and 0.7 for A-NMS.

Segmentation detection

To verify the effectiveness of segmentation detection for small targets, FIG. 4 compares segmentation detection with non-segmentation detection. In the experiment, four sub-images (upper left, upper right, lower left and lower right) are cut from each image and processed in parallel on multiple GPUs. Target box fusion is then applied to the detection results of the four sub-images to obtain the final detection image. Segmentation detection improves detection of intact and broken vibration dampers by 7% and 9%, respectively. Vibration dampers benefit more than other targets because they occupy a small area in the picture and are easier to detect after cropping and enlargement. Vibration dampers are also numerous among power components, so this improvement is significant. Climbing personnel and bird nests are of moderate size in the original image and have distinct features. They are already easy to detect, so their performance does not improve noticeably. For insulators, the intact class improves by 3% but the fault class drops by more than 1%. This is because an insulator sometimes occupies a large area of the picture, and the target box fusion algorithm then cannot fuse it accurately; this remains an area for further improvement. Overall, segmentation detection clearly improves detection performance.

Analysis of integrated results

This section discusses the detection performance of different combinations of the three improved algorithms proposed in this embodiment. Two metrics are used: mean average precision (mAP), the general evaluation metric for object detection, and recall. Recall is included because power system inspection involves safety, so the detector is expected to find every target. The adaptive preprocessing scheme does not involve an NMS threshold, and its experimental results are also discussed in this section.

TABLE 2 mAP and recall values for different combination schemes

Table 2 shows the mAP and recall values of the detector under the different combination schemes; all schemes use Faster R-CNN with ResNet101 as the base framework. Groups 1 and 2 show that the adaptive preprocessing algorithm alone brings little improvement. It targets only low-contrast images, which are relatively rare in the data set and in practical applications. A-NMS is different: it is designed for insulators and vibration dampers. The comparison of groups 1 and 3 shows that adding A-NMS alone raises mAP by 3.9% and recall by 3.95%, indicating that A-NMS effectively reduces the false detection rate for insulators and vibration dampers. Segmentation detection aims to strengthen the detection of small targets. The comparison of groups 3 and 5 shows that it brings a 2.58% mAP improvement and a 3.82% recall improvement. Finally, group 6 shows that using all three improvements together clearly improves performance over the base model, raising mAP by 7.27% and recall by 9.03%. The resulting recall of 95.23% basically meets application requirements.

The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and variations without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as falling within the protection scope of the present invention.

Claims

1. An image data processing method based on multi-target detection is characterized by comprising the following steps:

2. The multi-target detection-based image data processing method according to claim 1, characterized in that preprocessing the image comprises: adjusting the brightness and contrast of the image to appropriate ranges.

3. The multi-target detection-based image data processing method according to claim 2, characterized in that adjusting the brightness and contrast of the image to appropriate ranges comprises:

wherein

m₀＝mean(Image[Image>T]) (2)

m₁＝mean(Image[Image≤T]) (3)

const＝m₀-m₁ (4)

when const is less than 80 the image is regarded as low contrast, and when m₀ is less than 50 it is regarded as a low-brightness background. The contrast scaling factor α and the pixel offset β are determined by equation (5):

newImage＝α*Image+β (6)

and performing truncation processing on values beyond the image range, setting the values larger than 255 as 255, and setting the values smaller than 0 as 0.

4. The multi-target detection-based image data processing method according to claim 1, characterized in that passing the feature map through a pre-constructed region proposal network to generate a large number of default proposal boxes, and classifying the feature map and the proposal boxes, comprises:

the feature map passes through two branches. One branch, the region proposal network (RPN), generates proposal boxes and performs a preliminary regression of the target boxes. The other branch applies RoI pooling to the feature map and the proposal boxes, followed by classification and fine regression through fully connected layers, wherein

the region proposal network first generates a large number of default boxes and deletes the boxes that extend beyond the image boundary. It then applies non-maximum suppression to the remaining boxes to remove most overlapping boxes, and finally sends the top N detection boxes to the next network.

5. The multi-target detection-based image data processing method according to claim 1, characterized in that de-duplicating the classified proposal boxes with the area-based non-maximum suppression algorithm comprises:

6. The multi-target detection-based image data processing method according to claim 1, characterized by further comprising: segmenting the feature map with a segmentation detection method and detecting the resulting sub-images.

7. The multi-target detection-based image data processing method according to claim 1, characterized in that segmenting the feature map with the segmentation detection method and detecting the resulting sub-images comprises:

8. The multi-target detection-based image data processing method according to claim 1, characterized by further comprising: scaling the detection results obtained on the enlarged sub-images by multiplying them by 0.75, and then adding an offset of 0.25 according to the position of each sub-image.

9. The multi-target detection-based image data processing method according to claim 1, characterized by further comprising: performing target box fusion on the incomplete detection boxes produced when a target spans multiple sub-images. A smaller box is fused into the current largest box before the smaller box is deleted.

10. An image data processing apparatus based on multi-target detection, characterized by comprising: