
CN117037064B - Detection method and system for illegal land occupation and tillage actions based on improved SSD - Google Patents


Info

Publication number
CN117037064B
Authority
CN
China
Prior art keywords
image
ssd
module
real
illegal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311029943.2A
Other languages
Chinese (zh)
Other versions
CN117037064A (en)
Inventor
王柯
李翰
万久地
卢建春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Branch China Tower Co ltd
Original Assignee
Chongqing Branch China Tower Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Branch China Tower Co ltd
Priority to CN202311029943.2A
Publication of CN117037064A
Application granted
Publication of CN117037064B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/34Smoothing or thinning of the pattern; Morphological operations; Skeletonisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G06V10/763Non-hierarchical techniques, e.g. based on statistics of modelling distributions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a detection method and system for illegal land occupation and tillage actions based on an improved SSD. The method comprises the following steps: acquiring a video image from a near-ground iron tower and converting the video image into an image to be identified; defogging the image to be identified with a Retinex defogging algorithm; and inputting the processed image into a trained improved SSD detection model to identify and judge illegal occupation behaviors. The structure of the SSD is improved: a feature fusion module is used, and additional layers and a CBAM attention mechanism are added, so that low-level features with low semantics and high resolution and top-level features with high semantics and low resolution are fused across different layers and scales, generating a feature pyramid different from that of an FPN network. This addresses the poor small-target detection performance caused by insufficient semantic information in the shallow layers of the traditional SSD, and further improves the detection precision for illegal occupation behaviors.

Description

Detection method and system for illegal land occupation and tillage actions based on improved SSD
Technical Field
The invention belongs to the field of image processing, and particularly relates to an illegal land occupation and tillage behavior detection method and system based on an improved SSD.
Background
For a long time, the monitoring of homeland resources has depended on satellite remote sensing, unmanned aerial vehicle (UAV) aerial photography and similar technologies. However, each of these techniques has drawbacks. The resolution of remote sensing images is too low, which greatly reduces detection precision; the detection period is long, timeliness is poor, and the shooting angle is uncontrollable; and the diversity of illegal occupation behaviors further challenges the accuracy of remote sensing. UAV aerial photography is better suited to detecting small targets in rural cultivated land, but it requires a large investment of manpower and material resources, cannot monitor a given area in real time, has poor timeliness and poor stability in long-term monitoring of a given plot, lacks a mature data set, and is easily affected by the weather.
Disclosure of Invention
In view of the above, the invention aims to provide an illegal land occupation behavior detection method and system based on an improved SSD, which can improve real-time stability, overcome the influence of bad weather on the detection results, and improve the detection precision.
The invention aims at realizing the following technical scheme:
The invention provides a detection method for illegal land occupation and tillage actions based on an improved SSD, which comprises the following steps:
acquiring a video image from a near-ground iron tower and converting the video image into an image to be identified;
carrying out defogging treatment on the image to be identified by adopting a Retinex defogging algorithm;
Inputting the processed image into a trained improved SSD detection model so as to realize identification and judgment of illegal occupation behaviors, wherein the improved SSD detection model comprises:
ResNet50 is taken as an SSD backbone network;
The fifth convolution layer of ResNet50 and the structures following it are removed, leaving the first four convolution layers of ResNet50;
merging the CBAM attention mechanism module into one or more of the first, second, third and fourth convolution layers;
Adding a feature fusion module, wherein the feature fusion module adjusts the feature maps obtained by the second convolution layer, the third convolution layer and the fourth convolution layer to the same size and performs a concat connection operation to perform feature fusion; setting an additional layer after the concat connection, and inputting the fused features into the additional layer to generate a feature fusion pyramid;
The activation function ReLU in ResNet50 is replaced with Leaky ReLU, and BN normalization operations are added.
The invention also provides a detection system for illegal land occupation and tillage actions based on the improved SSD, which comprises an image generation module for generating an image to be identified;
the image processing module is used for defogging the image;
the model training module is used for training the improved SSD detection model;
A pattern recognition module for recognizing illegal occupation behavior in the image to be identified,
Wherein the improved SSD detection model comprises:
ResNet50 is taken as an SSD backbone network;
The fifth convolution layer of ResNet50 and the structures following it are removed, leaving the first four convolution layers of ResNet50;
merging the CBAM attention mechanism module into one or more of the first, second, third and fourth convolution layers;
Adding a feature fusion module, wherein the feature fusion module adjusts the feature maps obtained by the second convolution layer, the third convolution layer and the fourth convolution layer to the same size and performs a concat connection operation to perform feature fusion; setting an additional layer after the concat connection, and inputting the fused features into the additional layer to generate a feature fusion pyramid;
The activation function ReLU in ResNet50 is replaced with Leaky ReLU, and BN normalization operations are added.
Further, training of the improved SSD detection model includes:
Collecting a video image of a camera of the high-altitude iron tower, removing images without occupation behaviors, and constructing an illegal occupation behavior image data set a, wherein each image in the image data set a adopts a rectangular frame to mark the area of the illegal occupation behavior;
defogging the image data set a according to a Retinex defogging algorithm;
expanding the defogging-processed image data set a by utilizing a data enhancement strategy to obtain an image data set b;
And carrying out feature extraction training on the data set b according to the improved SSD detection model to finally obtain a trained SSD detection model, wherein the proportion setting mode of the anchor frame in the improved SSD detection model is as follows:
Acquiring the 4 coordinate values [Xmin, Ymin, Xmax, Ymax] of each real frame in the image data set b, and calculating the corresponding aspect ratio of each real frame, wherein Xmin represents the minimum abscissa of the real frame; Ymin represents the minimum ordinate of the real frame; Xmax represents the maximum abscissa of the real frame; Ymax represents the maximum ordinate of the real frame;
Initializing k cluster centers;
sequentially calculating the distance between the real frame and the clustering center, and distributing the real frame to the clustering cluster with the minimum distance;
After all the real frames are distributed, the position of the clustering center is recalculated;
judging whether the cluster centers have changed; if so, recalculating the distance between each real frame and the cluster centers and reassigning the real frames to the clusters with the minimum distance; if not, ending the flow and obtaining the final k cluster centers;
after k cluster centers are obtained, the aspect ratio is rewritten into an improved SSD detection model, so that the proportion optimization of the anchor frame is completed.
Further, the data enhancement strategy includes a combination of one or more of image random flipping, rotation, filtering, color space conversion, and the like.
Further, the additional layers include an additional layer 1, an additional layer 2, an additional layer 3, an additional layer 4 and an additional layer 5, which are sequentially connected, each of which is a bottleneck layer composed of three layers of convolution operations of 1×1, 3×3 and 1×1.
Further, the CBAM attention mechanism module includes a channel attention module and a spatial attention module, wherein,
The operation of the channel attention module includes:
respectively carrying out maximum pooling and average pooling on each channel characteristic diagram F;
Combining the values in channel order to obtain two result vectors;
Respectively inputting the two result vectors into the same two-layer fully connected neural network, which first reduces and then restores the dimension; adding the two results; and mapping the sum through a Sigmoid function to obtain the output F' of the channel attention module;
the operation of the spatial attention module includes:
Respectively carrying out maximum pooling and average pooling on the feature map F' at the same position of each channel feature map;
merging the two pooling results according to the channel;
The combined result is convolved with a 7×7 convolution kernel, and the convolved feature map is subjected to a nonlinear transformation by Sigmoid, so that the output of the CBAM attention mechanism module is obtained.
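The channel and spatial attention operations described above can be sketched in pure Python on small nested lists. This is a minimal sketch, not the patented implementation: it follows the commonly used CBAM formulation, in which the Sigmoid outputs are attention weights that multiply the feature map, and it assumes a ReLU between the two fully connected layers and zero padding for the convolution. All function and parameter names (`channel_attention`, `w_down`, `w_up`, `kernel`) are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def shared_mlp(vec, w_down, w_up):
    # Two-layer fully connected network: reduce the dimension, then restore it.
    # The ReLU between the layers is an assumption (standard CBAM practice).
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w_down]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_up]


def channel_attention(fmap, w_down, w_up):
    """fmap: C x H x W nested lists. Returns F', the channel-weighted map."""
    flat = [[v for row in ch for v in row] for ch in fmap]
    max_vec = [max(ch) for ch in flat]             # max pooling per channel
    avg_vec = [sum(ch) / len(ch) for ch in flat]   # average pooling per channel
    summed = [a + b for a, b in zip(shared_mlp(max_vec, w_down, w_up),
                                    shared_mlp(avg_vec, w_down, w_up))]
    weights = [sigmoid(s) for s in summed]
    return [[[v * w for v in row] for row in ch] for ch, w in zip(fmap, weights)]


def spatial_attention(fmap, kernel):
    """kernel: 2 x K x K weights (K odd; the text above uses K = 7)."""
    h, w = len(fmap[0]), len(fmap[0][0])
    # Pool across channels at each position, then stack the two maps.
    pooled = [
        [[max(ch[y][x] for ch in fmap) for x in range(w)] for y in range(h)],
        [[sum(ch[y][x] for ch in fmap) / len(fmap) for x in range(w)] for y in range(h)],
    ]
    r = len(kernel[0]) // 2
    attn = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for c in range(2):
                for dy in range(-r, r + 1):
                    for dx in range(-r, r + 1):
                        yy, xx = y + dy, x + dx
                        if 0 <= yy < h and 0 <= xx < w:  # zero padding (assumed)
                            acc += kernel[c][dy + r][dx + r] * pooled[c][yy][xx]
            attn[y][x] = sigmoid(acc)
    return [[[ch[y][x] * attn[y][x] for x in range(w)] for y in range(h)] for ch in fmap]


def cbam(fmap, w_down, w_up, kernel):
    return spatial_attention(channel_attention(fmap, w_down, w_up), kernel)
```

With all weights set to zero, both Sigmoid outputs equal 0.5, so every value of the feature map is scaled by 0.25; this gives a quick sanity check of the data flow.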
Further, the Retinex defogging algorithm is the MSRCR algorithm.
The beneficial effects of the invention are as follows:
1. The method identifies illegal land occupation from the video images obtained by high-altitude cameras, which is convenient and quick, reduces cost and shortens the time required; the camera position is flexible and the angle controllable, giving scene adaptability and improving real-time stability;
2. The invention applies computer vision deep learning to national resource monitoring, can realize rapid identification of illegal land occupation of cultivated land, can greatly reduce manual inspection cost, and has great effect on planning urban and rural lands.
3. The invention adopts the Retinex defogging algorithm, and can overcome the influence of bad weather such as rain, snow, fog and the like on the detection precision of illegal land occupation behavior of the cultivated land;
4. The invention applies a K-means clustering algorithm to the SSD prior frames, so that the aspect ratios of the prior frames are more consistent with those of the real frames, further improving the SSD detection precision;
5. The invention adopts ResNet50 as the backbone network of the SSD target detection algorithm, which effectively alleviates the problems of vanishing and exploding gradients. The structure of the SSD is also improved: a feature fusion module is used, and additional layers and a CBAM attention mechanism are added, so that low-level features with low semantics and high resolution and top-level features with high semantics and low resolution are fused across different levels and scales to generate a feature pyramid different from that of an FPN network. This addresses the poor small-target detection performance caused by insufficient semantic information in the shallow layers of the traditional SSD, and further improves the detection precision for illegal occupation behaviors.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objects and other advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the specification.
Drawings
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings, in which:
FIG. 1 is a schematic flow chart of an illegal land occupation behavior detection method based on an improved SSD;
FIG. 2 is a flow chart of an illegal occupancy real-time monitoring system;
FIG. 3 is a basic flow diagram of a conventional SSD algorithm;
FIG. 4 is a basic flow of a K-means clustering algorithm;
FIG. 5 is a schematic diagram of a feature fusion module architecture;
FIG. 6 is a schematic diagram of an embedded feature fusion module within ResNet;
FIG. 7 is a specific structure of ResNet additional layers;
FIG. 8 is a CBAM attention module schematic;
FIG. 9 is a graph showing the effect of the defogging algorithm according to the present invention;
FIG. 10 is a feature diagram of an SSD feature extraction process, in accordance with one embodiment of the invention;
FIG. 11 is a diagram of the effect of a particular embodiment on the identification of general occupancy behavior and small targets;
FIG. 12 shows the mAP and loss curves of the original SSD model evaluation;
FIG. 13 shows the mAP and loss curves of the improved SSD model evaluation.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The embodiment of the invention provides a method and a system for detecting illegal occupation of homeland cultivated land based on near-ground video images. The invention only makes a preliminary judgment of illegal occupation behavior; suspected illegal occupation behavior must be further verified and checked.
Referring to FIG. 1 and FIG. 2, the method for detecting illegal occupation of homeland cultivated land based on near-ground video images comprises the following steps:
acquiring a video image from a near-ground iron tower and converting the video image into an image to be identified;
carrying out defogging treatment on the image to be identified by adopting a Retinex defogging algorithm;
and inputting the processed image into a trained improved SSD detection model, so as to realize the identification and judgment of illegal occupation behaviors.
Wherein training the improved SSD detection model comprises:
Video images from the cameras on the high-altitude iron towers are collected, and images without occupation behaviors are removed, so as to construct an illegal occupation behavior image data set a. In some embodiments, the video data are converted into image data frame by frame. Because the volume of video data is large, an unreasonable frame-rate setting would capture a large amount of picture data during conversion and increase the workload. It is therefore necessary to select an appropriate interval for converting video data into image data. As an example, 10 images per hour may be saved when producing the image data set. The saved images may be of any suitable size; for example, the image size may be unified to 640×550. After conversion, images without occupation behaviors are removed, and the remaining images with occupation behaviors form the image data set a. The image data set a comprises images containing various illegal occupation behavior features, such as excavators, farms, fishponds, chickens and ducks.
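The sampling interval implied by the example of 10 images per hour can be worked out as in the following sketch. The camera frame rate (25 fps in the usage note) is an assumption, as are the function names; the source gives only the number of images saved per hour.

```python
def frame_step(fps, images_per_hour):
    """Number of video frames between two saved images."""
    if fps <= 0 or images_per_hour <= 0:
        raise ValueError("fps and images_per_hour must be positive")
    seconds_between_images = 3600 / images_per_hour
    return max(1, round(fps * seconds_between_images))


def frames_to_keep(total_frames, fps, images_per_hour):
    """Indices of the frames that would be saved as images."""
    step = frame_step(fps, images_per_hour)
    return list(range(0, total_frames, step))
```

For a hypothetical 25 fps stream, `frame_step(25, 10)` gives one saved image every 9000 frames, i.e. every 6 minutes.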
Then, defogging processing is carried out on the image data set a according to the Retinex defogging algorithm, and the defogging processed image data set a is expanded by utilizing a data enhancement strategy, so that an image data set b is obtained.
The Retinex defogging algorithm is described below.
A given image S(x, y) is decomposed into two different images: the reflected object image R(x, y) and the incident light image L(x, y). For each point (x, y) in the observed image S,
S(x,y)=R(x,y)·L(x,y).
The single-scale SSR algorithm comprises the following specific implementation processes:
(1) The irradiation light component and the reflected light component are separated by taking logarithms. Let s(x,y)=log[S(x,y)], r(x,y)=log[R(x,y)] and l(x,y)=log[L(x,y)]; then
s(x,y)=r(x,y)+l(x,y);
(2) The original image is convolved with a Gaussian template, i.e. low-pass filtered, to obtain a low-pass image D(x,y):
D(x,y)=S(x,y)*G(x,y),
where * denotes convolution, G(x,y) represents the Gaussian filter function, and δ represents the standard deviation of the Gaussian filter function;
(3) In the logarithmic domain, the low-pass image is subtracted from the original image to obtain a high-frequency enhanced image r(x,y):
r(x,y)=s(x,y)-d(x,y), where d(x,y)=log[D(x,y)];
(4) Taking the inverse logarithm of r(x,y), an enhanced image R(x,y) is obtained:
R(x,y)=exp[r(x,y)];
(5) Contrast enhancement is applied to R(x,y) to obtain the final result image.
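Steps (1) to (4) of the single-scale algorithm can be sketched in pure Python for a single-channel image stored as a nested list. Several details are assumptions, since the source does not give them: the Gaussian is a normalized, separable kernel truncated at three standard deviations, borders are handled by clamping, and a small offset `eps` is added before taking logarithms to avoid log(0). Step (5), contrast enhancement, is omitted, and the function names are hypothetical.

```python
import math


def gaussian_kernel(sigma, radius):
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]  # normalized so that flat areas stay flat


def blur_rows(img, kernel):
    radius = len(kernel) // 2
    out = []
    for row in img:
        n = len(row)
        new_row = []
        for x in range(n):
            acc = 0.0
            for k, w in enumerate(kernel):
                xx = min(max(x + k - radius, 0), n - 1)  # clamp at the borders
                acc += w * row[xx]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(img):
    return [list(col) for col in zip(*img)]


def gaussian_blur(img, sigma):
    # Separable filter: blur the rows, then the columns.
    kernel = gaussian_kernel(sigma, radius=max(1, int(3 * sigma)))
    return transpose(blur_rows(transpose(blur_rows(img, kernel)), kernel))


def single_scale_retinex(img, sigma=2.0, eps=1.0):
    """img: 2D list of non-negative intensities. Returns R = exp(s - d)."""
    low = gaussian_blur(img, sigma)  # low-pass image D
    return [[math.exp(math.log(s + eps) - math.log(d + eps))
             for s, d in zip(row_s, row_d)]
            for row_s, row_d in zip(img, low)]
```

On a uniform image the low-pass image equals the input, so every output value is 1; a pixel brighter than its neighborhood yields a value above 1.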
Based on this, the present invention uses the multi-scale Retinex (MSR) image defogging algorithm:
r_i(x,y)=Σ_{n=1}^{N} w_n {log[S_i(x,y)]-log[G_n(x,y)*S_i(x,y)]},
where N=3 is the total number of scales; w_n is the weight factor of the n-th scale; G_n(x,y) represents the Gaussian function on the n-th scale; and i∈{R,G,B} indexes the channels of the color image.
Still further, the present invention may also employ the MSRCR algorithm. A color recovery factor C_i is introduced to adjust the proportions of the 3 color channels; the MSR output R'(x,y) is multiplied by C_i to obtain the final Retinex output R(x,y):
R(x,y)=C_i(x,y)R'(x,y),
where C_i is computed with f(·), the mapping function of the color space.
In some embodiments, the data enhancement strategy includes a combination of one or more of random image flipping, rotation, cropping, filtering, color space conversion, and the like.
Rotation refers to rotating the image by an angle θ between 0° and 360°, typically 90°, 180° or 270°, which converts the image coordinates from (x, y) to (x', y'):
x'=x cosθ+y sinθ
y'=-x sinθ+y cosθ。
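The coordinate mapping above translates directly into code. The sketch below rotates a single point about the origin; the function name is hypothetical, and applying the same mapping to the corners of each labeled rectangle is left out.

```python
import math


def rotate_point(x, y, theta_deg):
    """Apply x' = x cos(theta) + y sin(theta), y' = -x sin(theta) + y cos(theta)."""
    theta = math.radians(theta_deg)
    x_new = x * math.cos(theta) + y * math.sin(theta)
    y_new = -x * math.sin(theta) + y * math.cos(theta)
    return x_new, y_new
```

For example, `rotate_point(1.0, 0.0, 90)` maps the point (1, 0) to (0, -1) up to floating-point rounding.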
Cropping means that the length and the width of the picture are cropped by a certain proportion, and the cropped picture is enlarged to the original picture size.
Flipping refers to flipping the image horizontally, vertically, or both. With m and n denoting the width and height of the image, the pixel p at coordinates (x, y) is mapped as follows for the horizontal, vertical and combined flips, respectively:
p'_{x,y}=p_{x,m-y}
p'_{x,y}=p_{n-x,y}
p'_{x,y}=p_{n-x,m-y}
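The three flips can be sketched on an image stored as a list of rows, where the first index x is the row and the second index y is the column. With zero-based indices the mappings become m-1-y and n-1-x rather than m-y and n-x; the function names are hypothetical.

```python
def flip_horizontal(img):
    """p'[x][y] = p[x][m-1-y]: reverse the columns of every row."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """p'[x][y] = p[n-1-x][y]: reverse the order of the rows."""
    return [row[:] for row in img[::-1]]


def flip_both(img):
    """p'[x][y] = p[n-1-x][m-1-y]: combine the two flips."""
    return flip_vertical(flip_horizontal(img))
```

The labeled rectangles have to be flipped with the same mapping so that they still enclose the marked regions.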
Filtering refers to enhancing data by adding noise (e.g., Gaussian noise) to the image. The influence of noise on the image can be reduced to a certain extent by filtering. The relation between the noisy image and the original image is as follows:
P'=P+NP,
where P' represents the image brightness after noise addition, P represents the original image brightness, and N∈(0, 0.5) represents the conversion ratio of the image.
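The relation P'=P+NP can be sketched as follows. The source does not say how N is chosen, so this sketch assumes an independent value drawn uniformly from (0, 0.5) for every pixel and clips the result to an assumed 8-bit maximum of 255; the function name and parameters are hypothetical.

```python
import random


def add_proportional_noise(img, max_ratio=0.5, seed=None, max_value=255.0):
    """Return a copy of img with P' = P + N * P applied to every pixel."""
    rng = random.Random(seed)  # seeded for reproducible augmentation
    noisy = []
    for row in img:
        noisy.append([min(max_value, p + rng.uniform(0.0, max_ratio) * p)
                      for p in row])
    return noisy
```

Every output pixel therefore lies between the original brightness P and min(255, 1.5 P).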
Color space conversion refers to switching an image between the RGB and HSV color spaces. The picture can be switched from RGB to HSV using the Image function, i.e. the hue (H), saturation (S) and value (V) of the picture are computed from the red, green and blue components of its RGB color space.
Calculating maximum and minimum values: max_val=max (R, G, B), min_val=min (R, G, B).
Calculate Hue (Hue):
(1) If max_val=min_val, hue=0 (defined as red)
(2) If max_val=r, hue= (G-B)/(max_val-min_val)
(3) If max_val=g, hue=2+ (B-R)/(max_val-min_val)
(4) If max_val=b, hue=4+ (R-G)/(max_val-min_val)
Calculate Saturation (Saturation):
(1) If max_val=0, saturation=0
(2) Otherwise, saturation=(max_val-min_val)/max_val
Calculate Value (Value): value=max_val
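The conversion rules above can be written as a small function. The hue formulas in the text give a value in units of one sixth of the color circle, which can be negative; scaling by 60 and wrapping into [0°, 360°) is a common convention that the source does not state and is labeled as an assumption here. The function name is hypothetical.

```python
def rgb_to_hsv(r, g, b):
    """Convert one RGB pixel to (hue in degrees, saturation, value)."""
    max_val = max(r, g, b)
    min_val = min(r, g, b)
    delta = max_val - min_val
    if delta == 0:
        hue = 0.0                      # grey pixel: hue defined as 0 (red)
    elif max_val == r:
        hue = (g - b) / delta
    elif max_val == g:
        hue = 2.0 + (b - r) / delta
    else:
        hue = 4.0 + (r - g) / delta
    hue_degrees = (hue * 60.0) % 360.0  # assumed scaling to degrees
    saturation = 0.0 if max_val == 0 else delta / max_val
    return hue_degrees, saturation, max_val
```

For inputs in [0, 1], the result agrees with `colorsys.rgb_to_hsv` from the Python standard library once that function's hue is multiplied by 360.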
In one embodiment of the present invention, there are 15 kinds of illegal occupation behavior in total, and the image data set a contains 900 images. Applying random combinations of flipping, rotation, cropping, filtering, color space conversion and the like to each image adds 5 times as many images, expanding the data set to 5400 images.
After the data set is expanded, feature extraction training is carried out on the data set b according to the improved SSD detection model, so that a certain effect precision is achieved, and finally the trained SSD detection model is obtained.
Each image in the image data set a and the image data set b adopts a rectangular frame to mark the area of illegal occupation behavior;
fig. 3 is a basic flow diagram of a conventional SSD algorithm.
As shown in FIG. 3, feature extraction is performed first: a pre-trained convolutional neural network extracts features from the input image to obtain feature maps of different scales;
Next, candidate frames are generated: a series of candidate frames is generated on the feature map of each scale in a sliding-window manner, each candidate frame representing a region of the image where a target (i.e., a land occupation behavior) may appear;
Then, the candidate frames are classified: each candidate frame is classified to judge whether it contains a target (i.e., whether a land occupation behavior exists);
Then, bounding box regression is performed on the candidate frames classified as targets, further correcting their positions so that they frame the targets more accurately;
Finally, non-maximum suppression (NMS) is applied to the candidate frames after bounding box regression: frames that overlap heavily with a higher-confidence frame are removed, and only the candidate frame with the highest confidence is kept as the final target detection result.
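The final non-maximum suppression step can be sketched as a greedy procedure on boxes given as (xmin, ymin, xmax, ymax). The IoU threshold of 0.5 is an assumption, since the source does not state one, and the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes (xmin, ymin, xmax, ymax)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes that survive, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

In a multi-class detector this is usually run once per class, so that overlapping boxes of different classes do not suppress each other.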
To make the aspect ratios of the anchor frames better suited to the application and thereby improve the detection precision of the SSD, the invention uses a K-means clustering algorithm to cluster the widths and heights of the labeled real frames in the data set, obtaining anchor frame proportions better suited to illegal occupation behaviors.
First, a data set (for example, data set b) is loaded to obtain the width and height of each target, from which the aspect ratio of each real frame is calculated. A statistical rule is obtained by clustering, and the clustering result replaces the default aspect ratios of the prior frames in the original SSD, which reduces the time the network needs to fine-tune the prior frames toward the real frames. To reduce the influence of the real-frame size on the clustering result, the distance metric d=1-IoU is selected, so that the distance decreases as the intersection-over-union (IoU) between a labeled frame and a cluster center increases.
FIG. 4 is a basic flow of the K-means clustering algorithm. As shown in FIG. 4, the 4 coordinate values [Xmin, Ymin, Xmax, Ymax] of each real frame in the image data set b may be obtained, and the aspect ratio corresponding to each real frame may be calculated, where Xmin represents the minimum abscissa of the real frame; Ymin represents the minimum ordinate of the real frame; Xmax represents the maximum abscissa of the real frame; Ymax represents the maximum ordinate of the real frame;
Initializing k cluster centers;
sequentially calculating the distance between the real frame and the clustering center, and distributing the real frame to the clustering cluster with the minimum distance;
After all the real frames are distributed, the position of the clustering center is recalculated;
then, judging whether the cluster centers have changed; if so, recalculating the distance between each real frame and the cluster centers and reassigning the real frames to the clusters with the minimum distance; if not, ending the flow and obtaining the final k cluster centers;
after k cluster centers are obtained, the aspect ratio is rewritten into an improved SSD detection model, so that the proportion optimization of the anchor frame is completed.
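The clustering flow can be sketched as follows. This is one possible reading of the description, not the patented implementation: the real frames are clustered as (width, height) pairs with the distance d=1-IoU, computed as if each frame and cluster center shared the same top-left corner, and the aspect ratios are read off the final cluster centers. The random initialization, the iteration cap and all names are assumptions.

```python
import random


def wh_iou(size, center):
    """IoU of two (width, height) boxes aligned at a common corner."""
    inter = min(size[0], center[0]) * min(size[1], center[1])
    union = size[0] * size[1] + center[0] * center[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, seed=0, max_iter=100):
    """boxes: list of (Xmin, Ymin, Xmax, Ymax). Returns k (width, height) centers."""
    sizes = [(x2 - x1, y2 - y1) for x1, y1, x2, y2 in boxes]
    centers = random.Random(seed).sample(sizes, k)  # initialize k cluster centers
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for s in sizes:  # assign each real frame to the nearest center
            nearest = min(range(k), key=lambda c: 1.0 - wh_iou(s, centers[c]))
            clusters[nearest].append(s)
        new_centers = []
        for c, members in enumerate(clusters):
            if members:  # recompute the center as the mean width and height
                new_centers.append((sum(w for w, _ in members) / len(members),
                                    sum(h for _, h in members) / len(members)))
            else:        # keep the old center if a cluster is empty
                new_centers.append(centers[c])
        if new_centers == centers:  # centers unchanged: the flow ends
            break
        centers = new_centers
    return centers


def aspect_ratios(centers):
    return sorted(w / h for w, h in centers)
```

The sorted ratios returned by `aspect_ratios` are the values that would be written into the anchor configuration of the detection model.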
According to the invention, anchor frames with the K-means-optimized proportions are generated on the feature map of each scale and then classified to judge whether each frame contains an object; bounding box regression further corrects the positions of the candidate frames classified as targets, so that they surround the targets accurately; finally, redundant frames are deleted by the NMS algorithm, and only the candidate frame with the highest confidence is retained as the final prediction result. The optimal weights are retained to obtain the SSD model.
In the invention, a labeling tool is used to label the images in the data set, and the images and the label files are stored in the JPEGImage and Annotations folders, respectively. The images in the data set are randomly divided into a training set, a verification set and a test set in the ratio 6:2:2, which are stored in the train, val and test folders, respectively. The train, val and test folders are stored under the IMAGESETS folder. The model is trained on the data set for 200 epochs with the SSD object detection algorithm, and the optimal weight file is retained.
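The random 6:2:2 division can be sketched as below. The fixed seed, the file-name pattern in the usage note and the function name are assumptions; writing the resulting lists into the train, val and test folders is left out.

```python
import random


def split_dataset(names, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle the image names and split them into train/val/test lists."""
    shuffled = list(names)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(len(shuffled) * ratios[0])
    n_val = round(len(shuffled) * ratios[1])
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # the remainder goes to the test set
    }
```

For the 5400 images of the expanded data set, this gives 3240 training, 1080 verification and 1080 test images.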
To improve the feature extraction capability, ResNet50, which has better feature extraction capability and a relatively low computational cost, is adopted as the SSD backbone network, so that the high-order semantic features of the image do not vanish as the network deepens, and a series of additional layers is added to construct the whole SSD network. Table 1 shows various ResNet structures.
TABLE 1 various ResNet structures
As shown in Table 1, the convolutional layers in the various ResNet structures are built from a plurality of Bottleneck blocks (i.e., bottleneck layers). Each Bottleneck consists of three convolution operations of 1×1, 3×3 and 1×1; the number of channels is set as required, but the input and output can be added directly only if they have the same number of channels.
In the improved SSD detection model of the present invention, the fifth convolutional layer of ResNet50 (i.e., conv5_x in Table 1) and the structures following it are removed, and the first four convolutional layers of ResNet50 (i.e., conv1_x, conv2_x, conv3_x and conv4_x in Table 1) are retained. In the retained structure, the first layer has a convolution kernel size of 7×7, the second layer consists of 3 Bottlenecks, the third layer consists of 4 Bottlenecks, and the fourth layer consists of 6 Bottlenecks, with the parameters of each layer as shown in Table 1. In addition, the activation function relu in ResNet50 is replaced with leaky-relu, and BN normalization operations are added.
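To make the last two changes concrete, the sketch below contrasts relu with leaky-relu and shows a plain batch normalization over a list of activations. The negative slope of 0.01 and the values of gamma, beta and eps are common defaults assumed here; the text does not specify them.

```python
import math


def relu(x):
    """Standard ReLU: negative inputs are set to zero."""
    return max(0.0, x)


def leaky_relu(x, negative_slope=0.01):
    """Leaky ReLU: negative inputs keep a small slope (assumed 0.01)."""
    return x if x >= 0 else negative_slope * x


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of activations to zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


print(relu(-2.0), leaky_relu(-2.0))
print(batch_norm([1.0, 2.0, 3.0, 4.0]))
```

Unlike relu, leaky-relu passes a small gradient for negative inputs, which is the usual motivation for such a replacement.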
The SSD initial input image size is adjusted to 300×300×3; after passing through the first three convolutional layers, the size becomes 150×150×64, 75×75×256 and 38×38×512 in sequence; the fourth layer does not change the spatial size because stride=1, but the number of channels becomes 1024. The conv5_x, avg pool, fc and softmax structures that follow in ResNet50 are discarded, and the SSD feature fusion module is added directly afterwards.
As shown in Fig. 5 and Fig. 6, the feature fusion module adjusts the feature maps obtained by the second, third and fourth convolutional layers of ResNet50 (i.e., conv2_3, conv3_4 and conv4_6) to the same size and performs a concat operation to fuse high-level and low-level features, thereby realizing multi-scale feature fusion. The fused feature map is taken as the first feature map (i.e., Feature Map 1 in Fig. 7). Additional layers are set after the concat connection, and the fused features are input into the additional layers to generate a feature fusion pyramid.
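The sizes quoted above follow from the usual convolution output-size formula, floor((n + 2p - k) / s) + 1. The sketch below reproduces them; the kernel, stride and padding of each stage are the standard ResNet50 values and are assumptions here, except for the 7×7 first kernel and the stride of 1 in the fourth layer, which the text states.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding, out_channels) of the size-changing operation
# in each stage; conv2_x is represented by its 3x3 max pooling (assumed).
BACKBONE_STAGES = [
    ("conv1_x", 7, 2, 3, 64),
    ("conv2_x", 3, 2, 1, 256),
    ("conv3_x", 3, 2, 1, 512),
    ("conv4_x", 3, 1, 1, 1024),  # stride 1, so the spatial size is unchanged
]


def backbone_shapes(input_size=300):
    """Return (name, height, width, channels) after each backbone stage."""
    shapes = []
    size = input_size
    for name, kernel, stride, padding, channels in BACKBONE_STAGES:
        size = conv_out(size, kernel, stride, padding)
        shapes.append((name, size, size, channels))
    return shapes


for shape in backbone_shapes():
    print(shape)
# ('conv1_x', 150, 150, 64)
# ('conv2_x', 75, 75, 256)
# ('conv3_x', 38, 38, 512)
# ('conv4_x', 38, 38, 1024)
```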
In some embodiments, the additional layers include an additional layer 1, an additional layer 2, an additional layer 3, an additional layer 4 and an additional layer 5 connected in sequence, each of the additional layers being a bottleneck layer consisting of three convolution operations of 1×1, 3×3 and 1×1. Fig. 7 is a schematic structure of the ResNet50 additional layers.
Additional layer 1 contains one Bottleneck; it takes the 38×38×512 feature map (i.e., Feature Map 1 in Fig. 7) as input, and after additional layer 1 the feature map becomes 19×19×512 (i.e., Feature Map 2 in Fig. 7);
adding a Bottleneck as additional layer 2, the feature map becomes 10×10×512 (i.e., Feature Map 3 in Fig. 7);
adding a Bottleneck as additional layer 3, the feature map becomes 5×5×256 (i.e., Feature Map 4 in Fig. 7);
adding a Bottleneck as additional layer 4, the feature map becomes 3×3×256 (i.e., Feature Map 5 in Fig. 7);
adding a Bottleneck as additional layer 5, the feature map becomes 1×1×256 (i.e., Feature Map 6 in Fig. 7);
the feature maps obtained after the additional layers are added form a feature fusion pyramid (comprising Feature Map 1 to Feature Map 6), which serves as the feature maps of the SSD prediction feature layers. Expressed by the formula:
$$X_f = \phi_f[\zeta_i(X_i)], \quad i \in \{2\text{-}3,\ 3\text{-}4,\ 4\text{-}6\}$$
$$X'_p = \phi_p(X_f)$$
$$\mathrm{loc}, \mathrm{class} = \phi_{c,l}[\cup(X'_p)]$$
where $i$ denotes the position of a convolutional layer, $X_i$ is each feature map of the original feature pyramid, $\zeta_i$ is the transformation function that scales each $X_i$ to the same scale, $\phi_f$ is the feature fusion function, $\phi_p$ is the function that generates a new feature pyramid from $X_f$, $X'_p$ is the new feature pyramid, and $\phi_{c,l}$ is the function that predicts the locations and classes from the new feature pyramid.
As shown in Fig. 6, the conv2-3 output is 75×75×256, the conv3-4 output is 38×38×512, and the conv4-6 output is 38×38×1024. Bilinear interpolation and 1×1 convolution operations are used to adjust the three outputs uniformly to the conv3-4 size of 38×38×512, so that the three features have the same size in the spatial dimension and the concat operation can be performed. The additional layers then generate the feature fusion pyramid, and the last feature map of the pyramid has a size of 1×1×256.
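The sizes of Feature Map 1 to Feature Map 6 can be checked with the same output-size formula. In the sketch below, the channel counts come from the description, while the stride and padding of the 3×3 convolution in each additional layer are assumptions borrowed from the original SSD extra layers (stride 2 with padding 1 for the first three, stride 1 without padding for the last two).

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution layer."""
    return (size + 2 * padding - kernel) // stride + 1


# (stride, padding, out_channels) of the 3x3 convolution in each additional layer.
EXTRA_LAYERS = [
    (2, 1, 512),  # additional layer 1: 38 -> 19
    (2, 1, 512),  # additional layer 2: 19 -> 10
    (2, 1, 256),  # additional layer 3: 10 -> 5
    (1, 0, 256),  # additional layer 4: 5 -> 3
    (1, 0, 256),  # additional layer 5: 3 -> 1
]


def pyramid_shapes(fused_size=38, fused_channels=512):
    """Return (name, height, width, channels) for Feature Map 1 to 6."""
    shapes = [("Feature Map 1", fused_size, fused_size, fused_channels)]
    size = fused_size
    for idx, (stride, padding, channels) in enumerate(EXTRA_LAYERS, start=2):
        size = conv_out(size, 3, stride, padding)
        shapes.append((f"Feature Map {idx}", size, size, channels))
    return shapes


for shape in pyramid_shapes():
    print(shape)
```

With these assumed strides and paddings the spatial sizes come out as 38, 19, 10, 5, 3 and 1, matching the sizes listed for Fig. 7.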
In some embodiments, the present invention further adds a CBAM attention mechanism module, which is independent of any particular convolution structure (e.g., the CBAM attention mechanism module can be incorporated into one or more of the first, second, third and fourth convolutional layers). The module allocates information processing resources efficiently: it focuses on important information with high weights, ignores irrelevant information with low weights, and continuously adjusts the weights so that important information can be selected under different circumstances.
Fig. 8 is a diagram of the CBAM attention mechanism module. As shown in Fig. 8, the CBAM attention mechanism module includes a channel attention module and a spatial attention module.
The channel attention module (i.e., CAM) may implement compression at the spatial level. The specific operation comprises the following steps:
respectively carrying out maximum pooling and average pooling on each channel characteristic diagram F;
Combining the values in channel order to obtain two result vectors;
The two result vectors are each input into the same two-layer fully-connected neural network, in which the dimension is first reduced and then increased, so that the input and output vectors of the network have the same dimension while the number of parameters is effectively reduced. The two results are added and mapped by a Sigmoid function to obtain the output F' of the channel attention module. Expressed by the formula:
$$M_c(F) = \sigma(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))) = \sigma(W_1(W_0(F^c_{avg})) + W_1(W_0(F^c_{max})))$$
where AvgPool denotes the average pooling operation, $F^c_{avg}$ denotes the average pooling value of the feature map F in the channel attention module, MaxPool denotes the maximum pooling operation, $F^c_{max}$ denotes the maximum pooling value of the feature map F in the channel attention module, MLP denotes the multi-layer perceptron, $W_0$ and $W_1$ denote the parameters of the multi-layer perceptron, and $\sigma$ denotes the Sigmoid function.
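The channel attention steps can be sketched with plain Python lists, as below. The feature map is stored as C channels of H×W values; the ReLU between the two fully-connected layers and the final channel-wise multiplication follow the usual CBAM formulation and are assumptions here, as are all function names and the toy weights.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(weights, vector):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def shared_mlp(vector, w0, w1):
    """Two-layer network: w0 reduces the dimension, w1 restores it."""
    hidden = [max(0.0, h) for h in matvec(w0, vector)]  # assumed ReLU
    return matvec(w1, hidden)


def channel_attention(feature, w0, w1):
    """Return one attention weight per channel of a C x H x W feature map."""
    avg = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]
    mx = [max(max(row) for row in ch) for ch in feature]
    a, m = shared_mlp(avg, w0, w1), shared_mlp(mx, w0, w1)
    return [sigmoid(x + y) for x, y in zip(a, m)]


def apply_channel_attention(feature, weights):
    """Scale every channel by its attention weight (assumed CBAM step)."""
    return [[[w * v for v in row] for row in ch] for ch, w in zip(feature, weights)]


F = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]  # 2 channels, 2x2
W0 = [[1.0, 1.0]]      # 2 -> 1 (dimension reduction)
W1 = [[1.0], [0.0]]    # 1 -> 2 (dimension increase)
weights = channel_attention(F, W0, W1)
print(weights)
print(apply_channel_attention(F, weights))
```

In a real network W0 and W1 are learned; the toy values only show how the two pooled vectors pass through the same network and are summed before the Sigmoid.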
The spatial attention module may implement compression at the channel level. The specific operation comprises the following steps:
Respectively carrying out maximum pooling and average pooling on the feature map F' at the same position of each channel feature map;
Combining the two pooling results according to the channels to obtain a characteristic diagram of (W, H, 2), wherein W represents the image width and H represents the image height;
The merged result is convolved with a 7×7 convolution kernel, and the convolved feature map is non-linearly transformed using Sigmoid, giving the CBAM attention mechanism module output $M_s(F')$. Expressed by the formula:
$$M_s(F') = \sigma(f^{7\times 7}([\mathrm{AvgPool}(F'); \mathrm{MaxPool}(F')])) = \sigma(f^{7\times 7}([{F'}^{s}_{avg}; {F'}^{s}_{max}]))$$
where ${F'}^{s}_{avg}$ and ${F'}^{s}_{max}$ denote the average and maximum pooling values of the feature map F' in the spatial attention module, respectively, and $f^{7\times 7}$ denotes the convolution operation with a 7×7 convolution kernel.
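The spatial attention steps can be sketched in the same list-based style. The zero padding of 3 pixels (so that the output keeps the H×W size), the bias term and the function name are assumptions; the kernel has two input planes, one for the average-pooled map and one for the max-pooled map.

```python
import math


def spatial_attention(feature, kernel, bias=0.0):
    """Return an H x W attention map for a C x H x W feature map.

    kernel holds two k x k planes (k = 7 in the text): plane 0 is applied to
    the channel-wise average map, plane 1 to the channel-wise maximum map.
    """
    channels, height, width = len(feature), len(feature[0]), len(feature[0][0])
    avg = [[sum(feature[c][y][x] for c in range(channels)) / channels
            for x in range(width)] for y in range(height)]
    mx = [[max(feature[c][y][x] for c in range(channels))
           for x in range(width)] for y in range(height)]
    pooled = [avg, mx]                      # the (W, H, 2) merged result
    k = len(kernel[0])
    pad = k // 2                            # assumed zero padding
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            acc = bias
            for p in range(2):
                for dy in range(k):
                    for dx in range(k):
                        yy, xx = y + dy - pad, x + dx - pad
                        if 0 <= yy < height and 0 <= xx < width:
                            acc += kernel[p][dy][dx] * pooled[p][yy][xx]
            row.append(1.0 / (1.0 + math.exp(-acc)))  # Sigmoid
        out.append(row)
    return out


F = [[[1.0] * 3 for _ in range(3)], [[3.0] * 3 for _ in range(3)]]
kernel = [[[0.0] * 7 for _ in range(7)] for _ in range(2)]
kernel[0][3][3] = 1.0  # toy weights: pass the centre value of each plane
kernel[1][3][3] = 1.0
print(spatial_attention(F, kernel))
```

With the toy kernel each output equals the Sigmoid of the average plus the maximum at that position; a trained kernel would instead weigh the whole 7×7 neighbourhood.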
With the above improvements to the SSD model, the model is trained and then applied, so that illegal occupation of homeland cultivated land can be identified and judged.
The invention also provides a detection system for illegal occupation of homeland cultivated land based on the improved SSD, which comprises:
an image generation module for generating an image to be identified;
an image processing module for defogging the image;
a model training module for training the improved SSD detection model;
a pattern recognition module for recognizing illegal occupation behavior in the image to be identified,
wherein the improved SSD detection model includes:
ResNet50 taken as the SSD backbone network;
the fifth convolutional layer of ResNet50 and the structures following it removed, retaining the first four convolutional layers of ResNet50;
a CBAM attention mechanism module merged into one or more of the first, second, third and fourth convolutional layers;
a feature fusion module added, wherein the feature fusion module adjusts the feature maps obtained by the second, third and fourth convolutional layers to the same size and performs a concat operation for feature fusion; additional layers are set after the concat connection, and the fused features are input into the additional layers to generate a feature fusion pyramid;
the activation function relu in ResNet50 replaced with leaky-relu, and BN normalization operations added.
The invention provides a real-time detection method and system for illegal occupation of homeland cultivated land based on an improved SSD. It solves the problems of poor real-time performance and poor stability of remote sensing and unmanned aerial vehicle aerial photography, and effectively improves the detection precision for illegal land occupation. It also overcomes the influence of rain, snow and fog on the detection precision for illegal occupation of cultivated land. The invention provides strong technical support for protecting cultivated land resources, and has strong practicability and engineering generalization capability.
Fig. 9 is a comparison of images before and after the Retinex defogging algorithm according to an embodiment of the present invention, wherein the left image is before defogging and the right image is after defogging. After the defogging operation, the image definition is obviously improved.
Fig. 10 is a feature map of the SSD feature extraction process according to an embodiment of the invention, through which the occupied area may be obtained.
FIG. 11 is a diagram of the effect of an embodiment of the method of the present invention on the identification of general occupancy and small objects. As can be seen from fig. 11, the method of the present invention can identify general occupancy and also small targets.
Fig. 12 (a) and (b) are the mAP evaluation curve and the loss curve of the original SSD model, respectively. As can be seen from Fig. 12, the maximum mAP of the original SSD model is 86.02%, and the mAP stabilizes at around 85-86% overall. The minimum loss reaches 2.4339, and the loss stabilizes at around 2.4-2.5.
Fig. 13 (a) and (b) are the mAP evaluation curve and the loss curve of the improved SSD model, respectively. As can be seen from Fig. 13, the maximum mAP of the improved SSD model is 90.77%, and the mAP stabilizes at around 89-90% overall. The minimum loss is 2.4885, and the loss stabilizes at around 2.4-2.5, as with the original model.
Finally, it is noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the present invention, which is intended to be covered by the claims of the present invention.

Claims (8)

1. The detection method for illegally occupying the homeland tilling behavior based on the improved SSD is characterized by comprising the following steps:
acquiring a video image from a near-ground iron tower and converting the video image into an image to be identified;
carrying out defogging treatment on the image to be identified by adopting a Retinex defogging algorithm;
Inputting the processed image into a trained improved SSD detection model so as to realize identification and judgment of illegal occupation behaviors, wherein the improved SSD detection model comprises:
ResNet50 is taken as an SSD backbone network;
The fifth convolution layer of ResNet50 and its following structures are removed, leaving the first four convolution layers of ResNet50;
merging the CBAM attention mechanism module into one or more of the first, second, third and fourth convolution layers;
Adding a feature fusion module, wherein the feature fusion module adjusts feature graphs obtained by the second convolution layer, the third convolution layer and the fourth convolution layer to the same size and performs concat connection operation to perform feature fusion; setting an additional layer after concat connection, and inputting the fused features into the additional layer to generate a feature fusion pyramid; the additional layers comprise an additional layer 1, an additional layer 2, an additional layer 3, an additional layer 4 and an additional layer 5 which are sequentially connected, each additional layer is a bottleneck layer, and the bottleneck layer consists of three layers of convolution operations of 1×1,3×3 and 1×1;
The activation function relu in ResNet50 is replaced with leaky-relu, and BN normalization operations are added.
2. The method for detecting illegal land occupation activity based on improved SSD according to claim 1, wherein training of the improved SSD detection model includes:
Collecting a video image of a camera of the high-altitude iron tower, removing images without occupation behaviors, and constructing an illegal occupation behavior image data set a, wherein each image in the image data set a adopts a rectangular frame to mark the area of the illegal occupation behavior;
defogging the image data set a according to a Retinex defogging algorithm;
expanding the defogging-processed image data set a by utilizing a data enhancement strategy to obtain an image data set b;
And carrying out feature extraction training on the data set b according to the improved SSD detection model to finally obtain a trained SSD detection model, wherein the proportion setting mode of the anchor frame in the improved SSD detection model is as follows:
Acquiring 4 coordinate values [ Xmin, ymin, xmax, ymax ] of each real frame in the image data set b, and calculating to obtain the corresponding aspect ratio of each real frame, wherein Xmin represents the minimum abscissa of the real frame; ymin represents the minimum ordinate of the real frame; xmax represents the maximum abscissa of the real box; ymax represents the maximum ordinate of the real box;
Initializing k cluster centers;
sequentially calculating the distance between the real frame and the clustering center, and distributing the real frame to the clustering cluster with the minimum distance;
After all the real frames are distributed, the position of the clustering center is recalculated;
judging whether any cluster center has changed; if so, recalculating the distance between each real frame and each cluster center and reassigning the real frames to the cluster with the minimum distance; if not, ending the flow, finally obtaining the latest k cluster centers;
after the k cluster centers are obtained, the corresponding aspect ratios are written into the improved SSD detection model, thereby completing the proportion optimization of the anchor frames.
3. The method for detecting the illegal land occupation activity based on the improvement of SSD according to claim 2, wherein the data enhancement strategy includes one or more of image random flip, rotation, filtering, color space conversion, and the like.
4. The method for detecting the illegal use of the land tilling behavior based on the improved SSD as recited in claim 1, wherein the CBAM attention mechanism module includes a channel attention module and a spatial attention module, wherein,
The operation of the channel attention module includes:
respectively carrying out maximum pooling and average pooling on each channel characteristic diagram F;
Combining the values in channel order to obtain two result vectors;
Respectively inputting the two result vectors into a fully-connected neural network of the same two layers, firstly reducing the dimension in the neural network, then increasing the dimension, adding the results, and mapping through a Sigmoid function to obtain an output F' of the channel attention module;
the operation of the spatial attention module includes:
Respectively carrying out maximum pooling and average pooling on the feature map F' at the same position of each channel feature map;
merging the two pooling results according to the channel;
The merged result is convolved using a 7×7 convolution kernel, and the convolved feature map is subjected to nonlinear transformation using Sigmoid, so that the output of the CBAM attention mechanism module is obtained.
5. The detection method for illegal land occupation behavior based on improved SSD according to claim 1 or 2, characterized in that Retinex defogging algorithm is MSRCR algorithm.
6. The detection system for illegally occupying the territorial farmland behavior based on the improved SSD is characterized by comprising
The image generation module is used for generating an image to be identified;
the image processing module is used for defogging the image;
the model training module is used for training the improved SSD detection model;
A pattern recognition module for recognizing illegal occupation behavior of the image to be recognized,
Wherein, the improvement SSD detects the model includes:
ResNet50 is taken as an SSD backbone network;
The fifth convolution layer of ResNet50 and its following structures are removed, leaving the first four convolution layers of ResNet50;
merging the CBAM attention mechanism module into one or more of the first, second, third and fourth convolution layers;
Adding a feature fusion module, wherein the feature fusion module adjusts feature graphs obtained by the second convolution layer, the third convolution layer and the fourth convolution layer to the same size and performs concat connection operation to perform feature fusion; setting an additional layer after concat connection, and inputting the fused features into the additional layer to generate a feature fusion pyramid; the additional layers comprise an additional layer 1, an additional layer 2, an additional layer 3, an additional layer 4 and an additional layer 5 which are sequentially connected, each additional layer is a bottleneck layer, and the bottleneck layer consists of three layers of convolution operations of 1×1,3×3 and 1×1;
The activation function relu in ResNet50 is replaced with leaky-relu, and BN normalization operations are added.
7. The improved SSD-based illegal land occupation activity detection system of claim 6, wherein the training of the improved SSD detection model by the model training module comprises:
Collecting a video image of a camera of the high-altitude iron tower, removing images without occupation behaviors, and constructing an illegal occupation behavior image data set a, wherein each image in the image data set a adopts a rectangular frame to mark the area of the illegal occupation behavior;
defogging the image data set a according to a Retinex defogging algorithm;
expanding the defogging-processed image data set a by utilizing a data enhancement strategy to obtain an image data set b;
And carrying out feature extraction training on the data set b according to the improved SSD detection model to finally obtain a trained SSD detection model, wherein the proportion setting mode of the anchor frame in the improved SSD detection model is as follows:
Acquiring 4 coordinate values [ Xmin, ymin, xmax, ymax ] of each real frame in the image data set b, and calculating to obtain the corresponding aspect ratio of each real frame, wherein Xmin represents the minimum abscissa of the real frame; ymin represents the minimum ordinate of the real frame; xmax represents the maximum abscissa of the real box; ymax represents the maximum ordinate of the real box;
Initializing k cluster centers;
sequentially calculating the distance between the real frame and the clustering center, and distributing the real frame to the clustering cluster with the minimum distance;
After all the real frames are distributed, the position of the clustering center is recalculated;
judging whether any cluster center has changed; if so, recalculating the distance between each real frame and each cluster center and reassigning the real frames to the cluster with the minimum distance; if not, ending the flow, finally obtaining the latest k cluster centers;
after the k cluster centers are obtained, the corresponding aspect ratios are written into the improved SSD detection model, thereby completing the proportion optimization of the anchor frames.
8. The system for detecting the behavior of an illegally occupied homeland tillage based on an improved SSD as recited in claim 6, wherein the CBAM attention mechanism module includes a channel attention module and a spatial attention module, wherein,
The operation of the channel attention module includes:
respectively carrying out maximum pooling and average pooling on each channel characteristic diagram F;
Combining the values in channel order to obtain two result vectors;
Respectively inputting the two result vectors into a fully-connected neural network of the same two layers, firstly reducing the dimension in the neural network, then increasing the dimension, adding the results, and mapping through a Sigmoid function to obtain an output F' of the channel attention module;
the operation of the spatial attention module includes:
Respectively carrying out maximum pooling and average pooling on the feature map F' at the same position of each channel feature map;
merging the two pooling results according to the channel;
The merged result is convolved using a 7×7 convolution kernel, and the convolved feature map is subjected to nonlinear transformation using Sigmoid, so that the output of the CBAM attention mechanism module is obtained.
CN202311029943.2A 2023-08-16 2023-08-16 Detection method and system for illegal land occupation and tillage actions based on improved SSD Active CN117037064B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311029943.2A CN117037064B (en) 2023-08-16 2023-08-16 Detection method and system for illegal land occupation and tillage actions based on improved SSD

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311029943.2A CN117037064B (en) 2023-08-16 2023-08-16 Detection method and system for illegal land occupation and tillage actions based on improved SSD

Publications (2)

Publication Number Publication Date
CN117037064A CN117037064A (en) 2023-11-10
CN117037064B true CN117037064B (en) 2024-10-22

Family

ID=88642749

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311029943.2A Active CN117037064B (en) 2023-08-16 2023-08-16 Detection method and system for illegal land occupation and tillage actions based on improved SSD

Country Status (1)

Country Link
CN (1) CN117037064B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118072163A (en) * 2024-02-02 2024-05-24 重庆科技大学 Neural network-based method and system for detecting illegal occupation of territorial cultivated land

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114202672A (en) * 2021-12-09 2022-03-18 南京理工大学 A small object detection method based on attention mechanism
CN114240878A (en) * 2021-12-16 2022-03-25 国网河南省电力公司电力科学研究院 Routing inspection scene-oriented insulator defect detection neural network construction and optimization method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528963A (en) * 2021-01-09 2021-03-19 江苏拓邮信息智能技术研究院有限公司 Intelligent arithmetic question reading system based on MixNet-YOLOv3 and convolutional recurrent neural network CRNN
CN115439693A (en) * 2022-07-29 2022-12-06 中国科学院空天信息创新研究院 Training method of target recognition network model, electronic device and program product
CN115620180A (en) * 2022-10-24 2023-01-17 湖南师范大学 A Target Detection Method for Aerial Images Based on Improved YOLOv5
CN115661777A (en) * 2022-11-03 2023-01-31 西安邮电大学 Semantic-combined foggy road target detection algorithm
CN115861772A (en) * 2023-02-22 2023-03-28 杭州电子科技大学 Multi-scale single-stage target detection method based on RetinaNet

Also Published As

Publication number Publication date
CN117037064A (en) 2023-11-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant