CN117037064B

CN117037064B - Detection method and system for illegal land occupation and tillage actions based on improved SSD

Info

Publication number: CN117037064B
Application number: CN202311029943.2A
Authority: CN
Inventors: 王柯; 李翰; 万久地; 卢建春
Original assignee: Chongqing Branch China Tower Co ltd
Current assignee: Chongqing Branch China Tower Co ltd
Priority date: 2023-08-16
Filing date: 2023-08-16
Publication date: 2024-10-22
Anticipated expiration: 2043-08-16
Also published as: CN117037064A

Abstract

The invention discloses a method and a system for detecting illegal occupation of land and cultivated land based on an improved SSD. The method comprises the following steps: acquiring a video image from a near-ground iron tower and converting the video image into an image to be identified; defogging the image to be identified with a Retinex defogging algorithm; and inputting the processed image into a trained improved SSD detection model to identify and judge illegal occupation behaviors. The structure of the SSD is improved by using a feature fusion module and adding additional layers and a CBAM attention mechanism: low-level features with weak semantics and high resolution are fused with top-level features with strong semantics and low resolution across different layers and scales, generating a feature pyramid different from that of an FPN network. This addresses the poor small-target detection performance caused by insufficient semantic information in the shallow layers of the conventional SSD, and thereby improves the detection precision for illegal occupation behaviors.

Description

Detection method and system for illegal land occupation and tillage actions based on improved SSD

Technical Field

The invention belongs to the field of image processing, and particularly relates to an illegal land occupation and tillage behavior detection method and system based on an improved SSD.

Background

For a long time, the monitoring of national land resources has depended on technologies such as satellite remote sensing and unmanned aerial vehicle (UAV) aerial photography. However, each of these techniques has drawbacks. The resolution of remote sensing images is too low, which greatly reduces detection precision; the detection period is long, timeliness is poor, and the shooting angle is uncontrollable; and the diversity of illegal occupation behaviors further challenges the accuracy of remote sensing. UAV aerial photography is better suited to detecting small targets in rural cultivated land, but it requires a large investment of manpower and material resources, cannot monitor a given area in real time, has poor timeliness and poor stability in long-term monitoring of a given plot, lacks a mature data set, and is easily affected by weather.

Disclosure of Invention

In view of the above, the invention aims to provide a method and a system for detecting illegal land occupation behavior based on an improved SSD, which can improve real-time performance and stability, overcome the influence of bad weather on detection results, and improve detection precision.

The invention aims at realizing the following technical scheme:

The invention provides a detection method for illegal land occupation and tillage actions based on an improved SSD, which comprises the following steps:

acquiring a video image from a near-ground iron tower and converting the video image into an image to be identified;

carrying out defogging treatment on the image to be identified by adopting a Retinex defogging algorithm;

Inputting the processed image into a trained improved SSD detection model so as to realize identification and judgment of illegal occupation behaviors, wherein the improved SSD detection model comprises:

ResNet50 is taken as an SSD backbone network;

The fifth convolution layer of ResNet50 and the structures following it are removed, leaving the first four convolution layers of ResNet50;

merging the CBAM attention mechanism module into one or more of the first, second, third and fourth convolution layers;

Adding a feature fusion module, wherein the feature fusion module adjusts the feature maps obtained by the second convolution layer, the third convolution layer and the fourth convolution layer to the same size and performs a concat connection operation to perform feature fusion; setting an additional layer after the concat connection, and inputting the fused features into the additional layer to generate a feature fusion pyramid;

The activation function relu in ResNet50 is replaced with leaky-relu and BN normalization operations are added.

The invention also provides a detection system for illegally occupying the land and the cultivated land based on the improved SSD, which comprises an image generation module for generating an image to be identified;

the image processing module is used for defogging the image;

the model training module is used for training the improved SSD detection model;

A pattern recognition module for recognizing illegal occupation behavior in the image to be recognized,

wherein the improved SSD detection model includes:

ResNet50 is taken as an SSD backbone network;

Further, training of the improved SSD detection model includes:

Collecting a video image of a camera of the high-altitude iron tower, removing images without occupation behaviors, and constructing an illegal occupation behavior image data set a, wherein each image in the image data set a adopts a rectangular frame to mark the area of the illegal occupation behavior;

defogging the image data set a according to a Retinex defogging algorithm;

expanding the defogging-processed image data set a by utilizing a data enhancement strategy to obtain an image data set b;

And carrying out feature extraction training on the data set b according to the improved SSD detection model to finally obtain a trained SSD detection model, wherein the proportion setting mode of the anchor frame in the improved SSD detection model is as follows:

Acquiring the 4 coordinate values [xmin, ymin, xmax, ymax] of each real frame in the image data set b, and calculating the corresponding aspect ratio of each real frame, wherein xmin represents the minimum abscissa of the real frame; ymin represents the minimum ordinate of the real frame; xmax represents the maximum abscissa of the real frame; ymax represents the maximum ordinate of the real frame;

Initializing k cluster centers;

sequentially calculating the distance between the real frame and the clustering center, and distributing the real frame to the clustering cluster with the minimum distance;

After all the real frames are distributed, the position of the clustering center is recalculated;

judging whether the cluster centers have changed; if so, recalculating the distance between each real frame and each cluster center and reassigning the real frames to the cluster with the minimum distance; if not, ending the flow and obtaining the final k cluster centers;

after the k cluster centers are obtained, the resulting aspect ratios are written into the improved SSD detection model, so that the proportion optimization of the anchor frames is completed.

Further, the data enhancement strategy includes a combination of one or more of image random flipping, rotation, filtering, color space conversion, and the like.

Further, the additional layers include an additional layer 1, an additional layer 2, an additional layer 3, an additional layer 4 and an additional layer 5, which are sequentially connected, each of which is a bottleneck layer composed of three layers of convolution operations of 1×1, 3×3 and 1×1.

Further, the CBAM attention mechanism module includes a channel attention module and a spatial attention module, wherein,

The operation of the channel attention module includes:

respectively carrying out maximum pooling and average pooling on each channel of the feature map F;

Combining the values in channel order to obtain two result vectors;

Respectively inputting the two result vectors into the same two-layer fully-connected neural network, in which the dimension is first reduced and then increased, adding the results, and mapping the sum through a Sigmoid function to obtain the output F' of the channel attention module;

the operation of the spatial attention module includes:

respectively carrying out maximum pooling and average pooling across the channels at each spatial position of the feature map F';

merging the two pooling results according to the channel;

The combined result is convolved with a 7×7 convolution kernel, and the convolved feature map is subjected to a nonlinear transformation using Sigmoid, so that the output of the CBAM attention mechanism module is obtained.

Further, the Retinex defogging algorithm is the MSRCR algorithm.

The beneficial effects of the invention are as follows:

1. the method completes the identification of illegal occupation of land through the video image obtained by the high-altitude camera, is convenient and quick, reduces the cost, shortens the time limit, has flexible camera position and controllable angle, has scene self-adaptability and improves the real-time stability;

2. The invention applies computer vision deep learning to national resource monitoring, can realize rapid identification of illegal land occupation of cultivated land, can greatly reduce manual inspection cost, and has great effect on planning urban and rural lands.

3. The invention adopts the Retinex defogging algorithm, and can overcome the influence of bad weather such as rain, snow, fog and the like on the detection precision of illegal land occupation behavior of the cultivated land;

4. According to the invention, a K-means clustering algorithm is applied to the SSD prior frames, so that the aspect ratios of the prior frames are more consistent with the real frames, further improving the SSD detection precision;

5. According to the invention, ResNet50 is adopted as the backbone network of the SSD target detection algorithm, which effectively alleviates the problems of vanishing and exploding gradients. The structure of the SSD is improved by using a feature fusion module and adding additional layers and a CBAM attention mechanism: low-level features with weak semantics and high resolution are fused with top-level features with strong semantics and low resolution across different levels and scales to generate a feature pyramid different from that of an FPN network. This addresses the poor small-target detection performance caused by insufficient semantic information in the shallow layers of the conventional SSD and further improves the detection precision for illegal occupation behaviors.

Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objects and other advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the specification.

Drawings

For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings, in which:

FIG. 1 is a schematic flow chart of an illegal land occupation behavior detection method based on an improved SSD;

FIG. 2 is a flow chart of an illegal occupancy real-time monitoring system;

FIG. 3 is a basic flow diagram of a conventional SSD algorithm;

FIG. 4 is a basic flow of a K-means clustering algorithm;

FIG. 5 is a schematic diagram of a feature fusion module architecture;

FIG. 6 is a schematic diagram of an embedded feature fusion module within ResNet;

FIG. 7 is a specific structure of ResNet additional layers;

FIG. 8 is a CBAM attention module schematic;

FIG. 9 is a graph showing the effect of the defogging algorithm according to the present invention;

FIG. 10 is a feature diagram of an SSD feature extraction process, in accordance with one embodiment of the invention;

FIG. 11 is a diagram of the effect of a particular embodiment on the identification of general occupancy behavior and small targets;

FIG. 12 shows the mAP evaluation and loss curves of the original SSD model;

FIG. 13 shows the mAP evaluation and loss curves of the improved SSD model.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

The embodiment of the invention provides a method and a system for detecting illegal occupation of national cultivated land based on near-ground video images. The invention provides only a preliminary judgment of illegal occupation behavior; suspected illegal occupation behavior still needs to be further verified and checked.

Referring to fig. 1 and 2, the method for detecting the targets of illegal land occupation behaviors of the homeland cultivated land based on the near-field video image comprises the following steps:

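acquiring a video image from a near-ground iron tower and converting the video image into an image to be identified;

carrying out defogging treatment on the image to be identified by adopting a Retinex defogging algorithm;

and inputting the processed image into a trained improved SSD detection model, so as to realize the identification and judgment of illegal occupation behaviors.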

Wherein training the improved SSD detection model comprises:

Video images from the cameras on the high-altitude iron towers are collected, and images without occupation behaviors are removed, so as to construct an illegal occupation behavior image data set a. In some embodiments, the video data are converted into image data frame by frame. Because the amount of video data is large, an unreasonable frame-rate setting produces a large number of pictures during conversion and increases the workload, so an appropriate interval must be selected. As an example, the image dataset may be produced by saving 10 images per hour. The saved images may be of any suitable size; for example, the image size may be unified to 640 × 550. After conversion, images without occupation behaviors are removed, and the remaining images with occupation behaviors form image data set a. Image data set a comprises images containing various illegal occupation behavior features, such as excavators, farms, fishponds, chickens and ducks.
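The sampling interval mentioned above can be illustrated with a short sketch. The function names and the 25 fps frame rate are hypothetical; only the figure of 10 saved images per hour comes from the text.

```python
from typing import List


def frame_step(fps: float, images_per_hour: int) -> int:
    """Number of video frames between two saved images."""
    if fps <= 0 or images_per_hour <= 0:
        raise ValueError("fps and images_per_hour must be positive")
    # One hour of video holds fps * 3600 frames; spread the saved images evenly.
    return max(1, round(fps * 3600 / images_per_hour))


def frames_to_save(total_frames: int, fps: float, images_per_hour: int) -> List[int]:
    """Indices of the frames that would be written out as images."""
    return list(range(0, total_frames, frame_step(fps, images_per_hour)))


# Assumed example: a 25 fps camera, keeping 10 images per hour of video.
step = frame_step(25, 10)
one_hour = frames_to_save(25 * 3600, 25, 10)
```

With these assumed values one image is kept every 9000 frames, i.e. every six minutes of video.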

Then, defogging processing is carried out on the image data set a according to the Retinex defogging algorithm, and the defogging processed image data set a is expanded by utilizing a data enhancement strategy, so that an image data set b is obtained.

The Retinex defogging algorithm is described below.

A given image S(x, y) is decomposed into two different images: the reflected object image R(x, y) and the incident light image L(x, y). For each point (x, y) in the observed image S,

$$S(x, y) = R(x, y) \cdot L(x, y).$$

The single-scale Retinex (SSR) algorithm is implemented as follows:

(1) The illumination component and the reflection component are separated by taking logarithms. Let s = log[S(x, y)], r = log[R(x, y)] and l = log[L(x, y)]; then

$$s(x, y) = r(x, y) + l(x, y);$$

(2) The original image is convolved with a Gaussian template, i.e. the original image is low-pass filtered, to obtain a low-pass image D(x, y):

$$D(x, y) = S(x, y) * G(x, y),$$

where G(x, y) represents the Gaussian filter function and δ represents the standard deviation of the Gaussian filter function;

(3) In the logarithmic domain, the low-pass image is subtracted from the original image to obtain a high-frequency enhanced image r(x, y):

$$r(x, y) = s(x, y) - d(x, y),$$

where d(x, y) = log[D(x, y)];

(4) Taking the inverse logarithm of r(x, y) gives the enhanced image R(x, y):

$$R(x, y) = \exp[r(x, y)];$$

(5) Contrast enhancement is applied to R(x, y) to obtain the final result image.
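Steps (1) to (4) can be sketched in plain Python on a small grayscale image. This is only an illustration under stated assumptions: the Gaussian is taken as exp(-k²/(2δ²)) truncated at three standard deviations and normalized, edge pixels are replicated at the border, and a small offset `eps` is added before the logarithm to avoid log(0). None of these details are specified in the text, and step (5) is omitted.

```python
import math
from typing import List

Image = List[List[float]]


def gaussian_kernel(sigma: float) -> List[float]:
    """Normalized 1-D Gaussian kernel truncated at 3 standard deviations (assumed form)."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(img: Image, kernel: List[float]) -> Image:
    """Convolve every row with the kernel, replicating edge pixels at the border."""
    radius = len(kernel) // 2
    width = len(img[0])
    return [
        [
            sum(kernel[k + radius] * row[min(max(x + k, 0), width - 1)]
                for k in range(-radius, radius + 1))
            for x in range(width)
        ]
        for row in img
    ]


def transpose(img: Image) -> Image:
    return [list(col) for col in zip(*img)]


def single_scale_retinex(img: Image, sigma: float, eps: float = 1.0) -> Image:
    """Return R = exp(log S - log D), where D is the Gaussian low-pass image of S."""
    kernel = gaussian_kernel(sigma)
    # The 2-D Gaussian is separable: blur the rows, then the columns.
    low = transpose(blur_rows(transpose(blur_rows(img, kernel)), kernel))
    return [
        [math.exp(math.log(s + eps) - math.log(d + eps)) for s, d in zip(s_row, d_row)]
        for s_row, d_row in zip(img, low)
    ]
```

On a perfectly uniform image the low-pass image equals the input, so the output is 1 everywhere; a pixel brighter than its surroundings yields a value above 1.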

Based on this, the present invention uses a multi-scale Retinex image defogging algorithm MSR:

wherein the total number of scales is N = 3; w_n is the weight factor of the n-th scale; G_n(x, y) represents the Gaussian function on the n-th scale; and i ∈ {R, G, B} denotes the channels of a color image.

Still further, the present invention may also employ the MSRCR algorithm. A color recovery factor C_i is introduced to adjust the ratio of the 3 color channels, and the MSR output R'(x, y) is multiplied by C_i to obtain the final Retinex output R(x, y):

$$R(x, y) = C_i(x, y)\, R'(x, y),$$

where f(·) represents the mapping function of the color space.
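A commonly used form of the MSR and color-recovery expressions, written with the symbols defined above, is sketched below. It is an assumption based on the standard Retinex literature rather than a formula taken from this document: S_i denotes channel i of the input image, * denotes convolution, and the channel-sum normalization inside f is one conventional choice.

```latex
R'_i(x, y) = \sum_{n=1}^{N} w_n \left\{ \log S_i(x, y) - \log\left[ G_n(x, y) * S_i(x, y) \right] \right\},
\qquad i \in \{R, G, B\}

C_i(x, y) = f\left[ \frac{S_i(x, y)}{\sum_{j \in \{R, G, B\}} S_j(x, y)} \right],
\qquad R_i(x, y) = C_i(x, y)\, R'_i(x, y)
```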

In some embodiments, the data enhancement strategy includes a combination of one or more of random image flipping, rotation, cropping, filtering, color space conversion, and the like.

Rotation refers to rotating the image by an angle θ between 0° and 360°, typically 90°, 180° or 270°, which converts the image coordinates from (x, y) to (x', y'). Expressed by the formula:

$$x' = x\cos\theta + y\sin\theta$$

$$y' = -x\sin\theta + y\cos\theta.$$
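The rotation formula can be checked numerically with a minimal sketch; the function name `rotate_point` is hypothetical and the angle is given in degrees for convenience.

```python
import math
from typing import Tuple


def rotate_point(x: float, y: float, theta_deg: float) -> Tuple[float, float]:
    """Apply x' = x cos(theta) + y sin(theta), y' = -x sin(theta) + y cos(theta)."""
    theta = math.radians(theta_deg)
    x_new = x * math.cos(theta) + y * math.sin(theta)
    y_new = -x * math.sin(theta) + y * math.cos(theta)
    return x_new, y_new


# With this sign convention, a 90 degree rotation maps (1, 0) to approximately (0, -1).
x90, y90 = rotate_point(1.0, 0.0, 90.0)
```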

Cropping means that the length and the width of the picture are cropped by a certain proportion, and the cropped picture is then enlarged to the original picture size.

Flipping refers to flipping the image horizontally, vertically, or both. Let p_{x,y} denote the pixel at image coordinates (x, y), and let m and n represent the width and height of the image, respectively. The following equations represent the horizontal flip, the vertical flip, and their combination, respectively:

$$p'_{x,y} = p_{x,\,m-y}$$

$$p'_{x,y} = p_{n-x,\,y}$$

$$p'_{x,y} = p_{n-x,\,m-y}.$$
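The three flips can be sketched on an image stored as a list of rows. The function names are hypothetical, and the sketch assumes the first index selects the row and the second the column; with zero-based indices, the index m - y in the formulas corresponds to m - 1 - y.

```python
from typing import List

Image = List[List[int]]


def flip_horizontal(img: Image) -> Image:
    """Mirror left-right: reverse the column order of every row."""
    return [row[::-1] for row in img]


def flip_vertical(img: Image) -> Image:
    """Mirror top-bottom: reverse the order of the rows."""
    return [row[:] for row in img[::-1]]


def flip_both(img: Image) -> Image:
    """Combination of the horizontal and vertical flips."""
    return flip_horizontal(flip_vertical(img))


sample = [[1, 2, 3],
          [4, 5, 6]]
```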

Filtering refers to enhancing the data by adding noise (e.g., Gaussian noise) to the image. The influence of noise on the image can be reduced to a certain extent by filtering. The relation between the noisy image and the original image is as follows:

$$P' = P + NP,$$

where P' represents the image brightness after noise addition, P represents the original image brightness, and N ∈ (0, 0.5) represents the conversion ratio of the image.

Color space conversion refers to switching an image between the RGB and HSV color spaces. The picture can be switched from RGB to HSV using an Image function, i.e. the hue (H), saturation (S) and value (V) of the picture are computed from the red, green and blue components of its RGB color space.

Calculate the maximum and minimum values: max_val = max(R, G, B), min_val = min(R, G, B).

Calculate Hue:

(1) If max_val = min_val, Hue = 0 (defined as red)

(2) If max_val = R, Hue = (G - B) / (max_val - min_val)

(3) If max_val = G, Hue = 2 + (B - R) / (max_val - min_val)

(4) If max_val = B, Hue = 4 + (R - G) / (max_val - min_val)

Calculate Saturation:

(1) If max_val = 0, Saturation = 0

(2) Otherwise, Saturation = (max_val - min_val) / max_val

Calculate Value: Value = max_val
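The rules above translate directly into a short function. This is a minimal sketch with a hypothetical name; it returns the hue in the sextant units used by the rules (multiplying by 60 would give degrees) and does not wrap negative hues, since the text does not describe a wrap-around step.

```python
from typing import Tuple


def rgb_to_hsv(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """Convert one RGB pixel to (hue, saturation, value) following the listed rules."""
    max_val = max(r, g, b)
    min_val = min(r, g, b)
    delta = max_val - min_val
    if delta == 0:
        hue = 0.0                      # rule (1): gray pixels are assigned hue 0
    elif max_val == r:
        hue = (g - b) / delta          # rule (2)
    elif max_val == g:
        hue = 2 + (b - r) / delta      # rule (3)
    else:
        hue = 4 + (r - g) / delta      # rule (4)
    saturation = 0.0 if max_val == 0 else delta / max_val
    return hue, saturation, max_val
```

Pure red, green and blue map to hue 0, 2 and 4 respectively. A common convention, not stated in the text, adds 6 to a negative hue so that the result stays in the range [0, 6).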

In one embodiment of the present invention, there are 15 kinds of illegal occupation behavior in total, and image dataset a contains 900 images. Randomly combined enhancements such as flipping, rotation, cropping, filtering and color space conversion are applied to each image; adding five augmented copies of every image expands the data set from 900 to 5400 images.

After the data set is expanded, feature extraction training is carried out on data set b with the improved SSD detection model until the required precision is reached, finally yielding the trained SSD detection model.

Each image in image data set a and image data set b uses a rectangular frame to mark the area of illegal occupation behavior.

FIG. 3 is a basic flow diagram of a conventional SSD algorithm.

As shown in FIG. 3, feature extraction is performed first: a pre-trained convolutional neural network extracts features from the input image to obtain feature maps of different scales;

Next, candidate frames are generated: a series of candidate frames is generated on the feature map of each scale in a sliding-window manner, each candidate frame representing a target area (i.e. a land occupation behavior area) that may appear in the image;

then, the candidate frames are classified: each candidate frame is classified to judge whether it contains a target (i.e. whether land occupation behavior exists);

Then, bounding box regression is carried out on the candidate frames classified as targets, further correcting their positions so that they frame the targets more accurately;

and finally, non-maximum suppression (NMS) is applied to the candidate frames after bounding box regression: frames with a high degree of overlap are removed, and only the candidate frame with the highest confidence is kept as the final target detection result.
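The final NMS step can be illustrated with a small greedy implementation. This is a generic sketch, not the patent's code: the function names and the IoU threshold of 0.5 are assumptions, and boxes use the [xmin, ymin, xmax, ymax] layout.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.5) -> List[int]:
    """Return indices of the boxes kept, visiting boxes from highest to lowest score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

In this example the second box overlaps the first with an IoU of about 0.68 and is suppressed, while the distant third box is kept.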

To make the aspect ratios of the anchor frames better suited to the practical application and thereby improve the detection precision of the SSD, the invention uses a K-means clustering algorithm to cluster the widths and heights of the labeled real frames in the data set, obtaining anchor frame proportions that are better suited to illegal occupation behaviors.

First, a dataset (for example, dataset b) is loaded to obtain the width and height data of the targets, from which the aspect ratio of each real frame is calculated. A statistical rule is obtained by clustering, and the clustering result replaces the default aspect ratios of the prior frames in the original SSD, thereby reducing the time the network needs to fine-tune the prior frames toward the real frames. To reduce the influence of real-frame size on the clustering result, the distance metric d = 1 - IoU is selected, so that the distance decreases as the intersection over union of a labeled frame and a cluster center increases.

FIG. 4 is a basic flow of the K-means clustering algorithm. As shown in FIG. 4, the 4 coordinate values [xmin, ymin, xmax, ymax] of each real frame in image dataset b are obtained, and the aspect ratio corresponding to each real frame is calculated, where xmin represents the minimum abscissa of the real frame; ymin represents the minimum ordinate of the real frame; xmax represents the maximum abscissa of the real frame; ymax represents the maximum ordinate of the real frame;

Initializing k cluster centers;

sequentially calculating the distance between each real frame and the cluster centers, and assigning the real frame to the cluster with the minimum distance;

after all the real frames are assigned, recalculating the positions of the cluster centers;

then, judging whether the cluster centers have changed; if so, recalculating the distance between each real frame and each cluster center and reassigning the real frames to the cluster with the minimum distance; if not, ending the flow and obtaining the final k cluster centers.
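The clustering flow can be sketched as follows. This is one possible reading, stated as an assumption: the (width, height) pairs of the real frames are clustered with the distance d = 1 - IoU, and the aspect ratios are then read off the cluster centers. The function names and the deterministic initialization with the first k boxes are hypothetical.

```python
from typing import List, Tuple

WH = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def wh_iou(a: WH, b: WH) -> float:
    """IoU of two boxes aligned at a common corner, so only width and height matter."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes: List[Box], k: int, max_iter: int = 100) -> List[WH]:
    """Cluster box sizes with the distance d = 1 - IoU and return k (width, height) centers."""
    sizes: List[WH] = [(xmax - xmin, ymax - ymin) for xmin, ymin, xmax, ymax in boxes]
    centers: List[WH] = sizes[:k]  # assumed initialization: the first k boxes
    for _ in range(max_iter):
        clusters: List[List[WH]] = [[] for _ in range(k)]
        for wh in sizes:
            nearest = min(range(k), key=lambda c: 1.0 - wh_iou(wh, centers[c]))
            clusters[nearest].append(wh)
        # Recompute each center as the mean size of its members; keep empty clusters as-is.
        new_centers = [
            (sum(w for w, _ in members) / len(members),
             sum(h for _, h in members) / len(members)) if members else centers[c]
            for c, members in enumerate(clusters)
        ]
        if new_centers == centers:  # centers unchanged: the flow ends
            break
        centers = new_centers
    return centers


def aspect_ratios(centers: List[WH]) -> List[float]:
    """Width-to-height ratio of each cluster center."""
    return [w / h for w, h in centers]


real_frames = [(0, 0, 40, 20), (0, 0, 20, 40), (0, 0, 44, 20), (0, 0, 20, 44)]
anchors = kmeans_anchors(real_frames, k=2)
```

For these four made-up frames the two centers settle at one wide and one tall shape, whose width-to-height ratios would then replace the default anchor ratios.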

According to the invention, anchor frames with the K-means-optimized proportions are generated on the feature map of each scale and then classified to judge whether each frame contains an object; bounding box regression further corrects the positions of the candidate frames classified as targets so that they surround the targets accurately; finally, the NMS algorithm deletes redundant frames, keeping only the candidate frame with the highest confidence as the final prediction result. The optimal weights are retained to obtain the SSD model.

In the invention, a labeling tool is used to label the images in the dataset, and the images and the label files are stored in the JPEGImage and Annotations folders, respectively. The images in the dataset are randomly divided into a training set, a verification set and a test set in the ratio 6:2:2, which are stored in the train, val and test folders, respectively. The train, val and test folders are stored under the IMAGESETS folder. The model is trained on the data set for 200 epochs with the SSD target detection algorithm, and the optimal weight file is retained.
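The 6:2:2 random split can be sketched as below. The function name, the fixed random seed and the file names are hypothetical; the count of 5400 images matches the expanded data set described earlier.

```python
import random
from typing import Dict, List, Sequence


def split_dataset(names: Sequence[str], seed: int = 0) -> Dict[str, List[str]]:
    """Shuffle the image names and split them 6:2:2 into train, val and test."""
    shuffled = list(names)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = len(shuffled) * 6 // 10
    n_val = len(shuffled) * 2 // 10
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],
    }


# Hypothetical file names for the 5400 images of the expanded data set.
image_names = [f"img_{i:04d}.jpg" for i in range(5400)]
splits = split_dataset(image_names)
```

With 5400 images this yields 3240 training, 1080 verification and 1080 test images.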

To improve the feature extraction capability, ResNet50, which has better feature extraction capability and a relatively low computational cost, is adopted as the SSD backbone network, so that the high-order semantic features of the image do not vanish as the network deepens, and a series of additional layers is added to construct the whole SSD network. Table 1 shows various ResNet structures.

TABLE 1 various ResNet structures

As shown in Table 1, the convolutional layers in the various ResNet structures are composed of a plurality of Bottleneck blocks (i.e., bottleneck layers). Each Bottleneck is composed of three convolution operations of 1×1, 3×3 and 1×1. The number of channels is determined according to the situation, but the input and output can be added directly only if they have the same number of channels.

In the improved SSD detection model of the present invention, the fifth convolutional layer of ResNet50 (i.e., conv5_x in Table 1) and the structures following it are removed, and the first four convolutional layers of ResNet50 (i.e., conv1_x, conv2_x, conv3_x and conv4_x in Table 1) are retained. The first layer has a convolution kernel size of 7×7, the second layer consists of 3 Bottlenecks, the third layer of 4 Bottlenecks, and the fourth layer of 6 Bottlenecks, with the parameters of each layer as shown in Table 1. The activation function relu in ResNet50 is also replaced with leaky-relu, and BN normalization operations are added.

The SSD initial input image size is adjusted to 300×300×3, the size becomes 150×150×64, 75×75×256, 38×38×512 sequentially after passing through the first three convolutional layers, and the fourth layer does not change the image size because stride=1 but the number of channels becomes 1024. The following conv5_x, avg pool, fc, softmax structures of ResNet are discarded, and the SSD feature fusion module is directly added at the back.

As shown in FIG. 5 and FIG. 6, the feature fusion module adjusts the feature maps obtained by the second, third and fourth convolution layers of ResNet50 (i.e. conv2_3, conv3_4 and conv4_6) to the same size and performs a concat connection operation to fuse high-level and low-level features, thereby realizing multi-scale feature fusion. The fused feature map is taken as the first feature map (i.e., Feature Map 1 in FIG. 7). An additional layer is set after the concat connection, and the fused features are input into the additional layer to generate a feature fusion pyramid.

In some embodiments, the additional layers include an additional layer 1, an additional layer 2, an additional layer 3, an additional layer 4, and an additional layer 5 connected in sequence, each of the additional layers being a bottleneck layer consisting of three convolution operations of 1×1, 3×3 and 1×1. FIG. 7 is a schematic structure of the ResNet additional layers.

Additional layer 1 contains one Bottleneck; its input is the 38×38×512 feature map (i.e., Feature Map 1 in FIG. 7), and after additional layer 1 the feature map becomes 19×19×512 (i.e., Feature Map 2 in FIG. 7);

Adding a Bottleneck as additional layer 2, the feature map becomes 10×10×512 (i.e., Feature Map 3 in FIG. 7);

Adding a Bottleneck as additional layer 3, the feature map becomes 5×5×256 (i.e., Feature Map 4 in FIG. 7);

adding a Bottleneck as additional layer 4, the feature map becomes 3×3×256 (i.e., Feature Map 5 in FIG. 7);

adding a Bottleneck as additional layer 5, the feature map becomes 1×1×256 (i.e., Feature Map 6 in FIG. 7);

The feature maps obtained after the additional layers are added form a feature fusion pyramid (comprising Feature Map 1 to Feature Map 6), which can be used as the feature maps of the SSD prediction feature layers. Expressed by the formula:

$$X_f = \phi_f\left[\zeta_i(X_i)\right], \quad i \in \{2\text{-}3,\ 3\text{-}4,\ 4\text{-}6\}$$

$$X'_p = \phi_p(X_f)$$

$$\mathrm{loc},\ \mathrm{class} = \phi_{c,l}\left[\cup(X'_p)\right],$$

wherein i represents the position of a convolution layer, X_i is each feature map of the original feature pyramid, ζ_i is the conversion function that scales each X_i to the same scale, φ_f is the feature fusion function, φ_p is the function that generates a new feature pyramid from X_f, X'_p is the new feature pyramid, and φ_{c,l} is the function that predicts the location (loc) and class from the new feature pyramid.

As shown in FIG. 6, the conv2-3 output is 75×75×256, the conv3-4 output is 38×38×512, and the conv4-6 output is 38×38×1024. Bilinear interpolation and 1×1 convolution operations are used to adjust the feature maps uniformly to the conv3-4 size of 38×38×512, so that the three features have the same size and the concat operation can be performed. The feature fusion pyramid is then generated by the additional layers, and the last feature map of the pyramid has a size of 1×1×256.
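The spatial sizes quoted above (300, 150, 75, 38 and then 19, 10, 5, 3, 1) are consistent with the usual convolution output-size formula. The sketch below checks this arithmetic; the kernel, stride and padding values are assumptions borrowed from common ResNet and SSD300 configurations and are not stated in this document.

```python
from typing import List, Tuple


def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output width/height of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) of the layer that changes the resolution in each stage.
# These values are assumed, not taken from the text.
BACKBONE: List[Tuple[int, int, int]] = [(7, 2, 3), (3, 2, 1), (3, 2, 1), (3, 1, 1)]
EXTRA_LAYERS: List[Tuple[int, int, int]] = [(3, 2, 1), (3, 2, 1), (3, 2, 1), (3, 1, 0), (3, 1, 0)]


def trace(size: int, stages: List[Tuple[int, int, int]]) -> List[int]:
    """Spatial size after each stage, starting from the given input size."""
    sizes = []
    for kernel, stride, padding in stages:
        size = conv_out(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


backbone_sizes = trace(300, BACKBONE)    # sizes after the four retained ResNet50 layers
pyramid_sizes = trace(38, EXTRA_LAYERS)  # sizes of Feature Map 2 to Feature Map 6
```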

In some embodiments, the present invention further adds a CBAM attention mechanism module, which is independent of any particular convolution structure (e.g., the CBAM attention mechanism module can be incorporated into one or more of the first, second, third, and fourth convolution layers). It allocates information processing resources efficiently: important information is emphasized with high weights, irrelevant information is suppressed with low weights, and the weights are adjusted continuously so that important information can be selected under different circumstances.

Fig. 8 is a diagram of CBAM attention module. As shown in fig. 8, the CBAM attention mechanism module includes a channel attention module and a spatial attention module.

The channel attention module (i.e., CAM) implements compression at the spatial level. The specific operation comprises the following steps:

respectively carrying out maximum pooling and average pooling on each channel of the feature map F;

Combining the values in channel order to obtain two result vectors;

The two result vectors are respectively input into the same two-layer fully-connected neural network, in which the dimension is first reduced and then increased, so that the input and output vectors of the network have the same dimension and the number of parameters is effectively reduced. The results are added and mapped by a Sigmoid function to obtain the output F' of the channel attention module. Expressed by the formula:

$$M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big) = \sigma\big(W_1(W_0(F^c_{avg})) + W_1(W_0(F^c_{max}))\big),$$

wherein AvgPool denotes the average pooling operation, F^c_avg represents the average pooling value of the feature map F in the channel attention module, MaxPool denotes the maximum pooling operation, F^c_max represents the maximum pooling value of the feature map F in the channel attention module, MLP represents the multi-layer perceptron, W_0 and W_1 represent the parameters of the multi-layer perceptron, and σ represents the Sigmoid function.
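The channel attention steps can be sketched in plain Python for a tiny feature map. The function names and the weight matrices `w0` and `w1` are made up for illustration, and the ReLU between the two fully-connected layers is an assumption borrowed from the usual CBAM formulation; the text only states that the dimension is reduced and then increased.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][column]
Matrix = List[List[float]]


def shared_mlp(vec: List[float], w0: Matrix, w1: Matrix) -> List[float]:
    """Two-layer perceptron: reduce the dimension with w0, then restore it with w1."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w0]  # ReLU assumed
    return [sum(w * h for w, h in zip(row, hidden)) for row in w1]


def channel_attention(fmap: FeatureMap, w0: Matrix, w1: Matrix) -> List[float]:
    """One weight in (0, 1) per channel, from average- and max-pooled descriptors."""
    avg = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    mx = [max(max(row) for row in ch) for ch in fmap]
    summed = [a + m for a, m in zip(shared_mlp(avg, w0, w1), shared_mlp(mx, w0, w1))]
    return [1.0 / (1.0 + math.exp(-z)) for z in summed]


def apply_channel_weights(fmap: FeatureMap, weights: List[float]) -> FeatureMap:
    """Scale every channel by its attention weight."""
    return [[[weight * v for v in row] for row in ch] for ch, weight in zip(fmap, weights)]


# Tiny example: 2 channels of size 2x2, reduced to 1 hidden unit and restored to 2.
fmap = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
w0 = [[1.0, 1.0]]    # 1 x 2: dimension reduction
w1 = [[1.0], [0.0]]  # 2 x 1: dimension restoration
weights = channel_attention(fmap, w0, w1)
refined = apply_channel_weights(fmap, weights)
```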

The spatial attention module may implement compression at the channel level. The specific operation comprises the following steps:

Combining the two pooling results according to the channels to obtain a characteristic diagram of (W, H, 2), wherein W represents the image width and H represents the image height;

The merged result is convolved with a 7×7 convolution kernel, and the convolved feature map is non-linearly transformed using the Sigmoid function, resulting in the CBAM attention mechanism module output $M_s(F')$. Expressed by the formula:

$$M_s(F') = \sigma\big(f^{7\times 7}([\mathrm{AvgPool}(F'); \mathrm{MaxPool}(F')])\big) = \sigma\big(f^{7\times 7}([F^s_{avg}; F^s_{max}])\big)$$

where $F^s_{avg}$ and $F^s_{max}$ represent the average and maximum pooling values, respectively, of the feature map F′ in the spatial attention module, and $f^{7\times 7}$ represents the convolution operation with a 7×7 convolution kernel.
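The spatial attention steps above can be sketched in the same way. Again, this is a minimal illustration under stated assumptions rather than the patented implementation: the function name, the 2-channel 8×8 input, the uniform kernel values, the zero padding that keeps the output at the input size, and the absence of a bias term are all choices made for the example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def spatial_attention(feature, kernel):
    """Return M_s(F'): one weight in (0, 1) per spatial position.

    feature is indexed feature[c][y][x]; kernel is indexed kernel[k][dy][dx],
    with k = 0 for the average-pooled map and k = 1 for the max-pooled map.
    """
    channels = len(feature)
    height, width = len(feature[0]), len(feature[0][0])
    # Pool across channels at every position, giving two (H, W) maps.
    avg_map = [[sum(feature[c][y][x] for c in range(channels)) / channels
                for x in range(width)] for y in range(height)]
    max_map = [[max(feature[c][y][x] for c in range(channels))
                for x in range(width)] for y in range(height)]
    stacked = [avg_map, max_map]   # the merged two-channel map
    size = len(kernel[0])          # 7 for a 7x7 kernel
    pad = size // 2                # zero padding (assumed) keeps H x W
    attention = []
    for y in range(height):
        row = []
        for x in range(width):
            total = 0.0
            for k in range(2):
                for dy in range(size):
                    for dx in range(size):
                        yy, xx = y + dy - pad, x + dx - pad
                        if 0 <= yy < height and 0 <= xx < width:
                            total += kernel[k][dy][dx] * stacked[k][yy][xx]
            row.append(sigmoid(total))
        attention.append(row)
    return attention

# Hypothetical input: 2 channels on an 8x8 grid, uniform 7x7 kernel.
feature = [[[float((c + 1) * (x + y)) for x in range(8)] for y in range(8)]
           for c in range(2)]
kernel = [[[1.0 / 98.0] * 7 for _ in range(7)] for _ in range(2)]

attention = spatial_attention(feature, kernel)
# Scale every channel by the per-position weight to get the final output.
refined = [[[attention[y][x] * feature[c][y][x] for x in range(8)]
            for y in range(8)] for c in range(2)]
```

In a trained model the kernel values are learned; the uniform kernel here only makes the example deterministic.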

With the SSD model improved as described above, the model is trained and then applied, so that illegal occupation of land and cultivated land can be identified and judged.

the image processing module is used for defogging the image;

Wherein, the improved SSD detection model includes:

ResNet50 is taken as an SSD backbone network;

The invention provides a real-time detection method and system for illegal occupation of land and cultivated land based on an improved SSD. It solves the problems of poor real-time performance and poor stability of remote sensing and unmanned aerial vehicle aerial photography techniques, and effectively improves the detection precision of illegal land occupation. It also overcomes the influence of rain, snow, and fog on the detection precision of illegal occupation of cultivated land. The invention provides strong technical support for protecting cultivated land resources and has strong practicability and engineering generalization capability.

Fig. 9 compares images before and after applying the Retinex defogging algorithm according to an embodiment of the present invention, wherein the left image is before defogging and the right image is after defogging. After the defogging operation, the image definition is obviously improved.

Fig. 10 is a feature diagram of an SSD feature extraction process according to an embodiment of the invention, through which a footprint may be obtained.

FIG. 11 is a diagram of the effect of an embodiment of the method of the present invention on the identification of general occupancy and small objects. As can be seen from fig. 11, the method of the present invention can identify general occupancy and also small targets.

Fig. 12 (a) and (b) show the evaluation mAP curve and the loss curve of the original SSD model, respectively. As can be seen from Fig. 12, the original SSD model reaches a maximum mAP of 86.02%, and the mAP stabilizes at around 85-86%. The minimum loss is 2.4339, and the loss stabilizes at around 2.4-2.5.

Fig. 13 (a) and (b) show the evaluation mAP curve and the loss curve of the improved SSD model, respectively. As can be seen from Fig. 13, the improved SSD model reaches a maximum mAP of 90.77%, and the mAP stabilizes at around 89-90%, which is higher than that of the original model. The minimum loss is 2.4885, and the loss stabilizes at around 2.4-2.5.

Finally, it is noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the present invention, which is intended to be covered by the claims of the present invention.

Claims

1. A detection method for illegal land occupation and tillage behavior based on an improved SSD, characterized by comprising the following steps:

ResNet50 is taken as an SSD backbone network;

Adding a feature fusion module, wherein the feature fusion module adjusts the feature maps obtained by the second convolution layer, the third convolution layer and the fourth convolution layer to the same size and performs a concat connection operation to perform feature fusion; setting additional layers after the concat connection, and inputting the fused features into the additional layers to generate a feature fusion pyramid; the additional layers comprise an additional layer 1, an additional layer 2, an additional layer 3, an additional layer 4 and an additional layer 5 which are sequentially connected, each additional layer is a bottleneck layer, and the bottleneck layer consists of three convolution operations of 1×1, 3×3 and 1×1;

2. The detection method for illegal land occupation and tillage behavior based on an improved SSD according to claim 1, wherein training of the improved SSD detection model includes:

defogging the image data set a according to a Retinex defogging algorithm;

Initializing k cluster centers;

3. The detection method for illegal land occupation and tillage behavior based on an improved SSD according to claim 2, wherein the data enhancement strategy includes one or more of random image flipping, rotation, filtering, and color space conversion.

4. The detection method for illegal land occupation and tillage behavior based on an improved SSD according to claim 1, wherein the CBAM attention mechanism module includes a channel attention module and a spatial attention module, wherein,

The operation of the channel attention module includes:

Combining the values in channel order to obtain two result vectors;

the operation of the spatial attention module includes:

merging the two pooling results according to the channel;

5. The detection method for illegal land occupation and tillage behavior based on an improved SSD according to claim 1 or 2, characterized in that the Retinex defogging algorithm is the MSRCR algorithm.

6. A detection system for illegal land occupation and tillage behavior based on an improved SSD, characterized by comprising:

The image generation module is used for generating an image to be identified;

the image processing module is used for defogging the image;

Wherein, the improved SSD detection model includes:

ResNet50 is taken as an SSD backbone network;

7. The detection system for illegal land occupation and tillage behavior based on an improved SSD according to claim 6, wherein the training of the improved SSD detection model by the model training module comprises:

defogging the image data set a according to a Retinex defogging algorithm;

Initializing k cluster centers;

8. The detection system for illegal land occupation and tillage behavior based on an improved SSD according to claim 6, wherein the CBAM attention mechanism module includes a channel attention module and a spatial attention module, wherein,

The operation of the channel attention module includes:

Combining the values in channel order to obtain two result vectors;

the operation of the spatial attention module includes:

merging the two pooling results according to the channel;