
CN119851242A - Real-time foggy-day road vehicle target detection method with enhanced self-adaptive target characteristics

Info

Publication number: CN119851242A
Application number: CN202411732235.XA
Authority: CN (China)
Prior art keywords: layer, image, defogging, target detection, output
Legal status: Granted; Active
Other versions: CN119851242B (granted publication)
Other languages: Chinese (zh)
Inventors: 殷旭平, 朱长仁, 郭军, 童绳武, 李旭佳, 欧书祐, 谭文军, 赵飞
Original and current assignees: China Communication System Co ltd Changsha Branch; CETC 54 Research Institute
Application filed by China Communication System Co ltd Changsha Branch and CETC 54 Research Institute; priority to CN202411732235.XA; granted and published as CN119851242B.

Classifications

    • G06V20/58: Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06N3/0442: Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G06N3/0455: Auto-encoder networks; encoder-decoder networks
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • G06T5/70: Denoising; smoothing
    • G06T5/73: Deblurring; sharpening
    • G06T5/92: Dynamic range modification of images based on global image properties
    • G06T7/90: Determination of colour characteristics
    • G06V10/82: Image or video recognition using neural networks
    • G06V2201/07: Target detection
    • G06V2201/08: Detecting or categorising vehicles
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention discloses a real-time foggy-day road vehicle target detection method with adaptive target feature enhancement, in the technical field of image processing. The method first constructs a training set, then designs a lightweight defogging module and a lightweight adaptive target feature enhancement module, which together with the target detection network YOLOv5 form a real-time foggy-day road vehicle target detection model. The lightweight defogging network is trained stage by stage on the constructed training set, and the lightweight adaptive target feature enhancement module and YOLOv5 are then trained end to end. Foggy road vehicle images are input into the constructed model, vehicle targets in the images are detected in real time, and the detection results are output. The lightweight defogging network fully restores the detail features of the foggy image and, combined with the lightweight adaptive target feature enhancement module, markedly improves foggy-day road vehicle target detection performance; the network model has few parameters and meets real-time requirements.

Description

Real-time foggy-day road vehicle target detection method with enhanced self-adaptive target characteristics
Technical Field
The invention relates to the technical field of image processing, in particular to a real-time foggy-day road vehicle target detection method with enhanced self-adaptive target characteristics.
Background
Driven by the Internet of Things, big data, cloud computing, artificial intelligence and the "new infrastructure" wave, intelligent transportation in China has entered a new stage of development. Intelligent transportation aims to provide innovative services for different transport modes and for traffic management, allowing users to use transport networks more fully, safely, coordinately and efficiently. Road vehicle detection is an important component of intelligent transportation, and target detection, as one of its key technologies, provides strong support for tasks such as vehicle trajectory tracking and traffic scene recognition, effectively relieving the road-duty pressure on traffic police, strengthening information-based management, improving traffic operation efficiency and safeguarding traffic safety.
Under good weather conditions, a common target detection model such as YOLOv5 can effectively handle the road vehicle detection task. However, road surveillance cameras are often affected by fog during imaging: the captured image quality degrades, detail features of target objects are lost, color saturation and contrast drop, texture information weakens, and contour edges blur. A degraded image still contains, in latent form, a large amount of information useful for target detection, but running target detection directly on the unprocessed degraded image greatly reduces the performance of the target detection model. Existing technical solutions to this problem fall into three categories: defog-then-detect methods, domain-adaptive target detection methods, and joint image defogging and target detection methods.
The defog-then-detect method constructs a synthetic foggy-day image dataset, trains a defogging model on it, defogs foggy images with the pre-trained defogging model, trains a target detection model on fog-free images, and feeds the defogged images into the target detection model for detection. This is the earliest class of solutions: defogging enhances the detail features of foggy images and thus improves detection performance on them, but because there is a domain shift between synthetic and real foggy images, generalisation to real foggy images is poor.
To address the problem caused by inter-domain shift, the domain-adaptive target detection method was proposed. It constructs a synthetic foggy image dataset, pre-trains a defogging model on it, then performs adaptive migration training from the synthetic domain to the real domain on a real foggy image dataset through physical-prior losses or adversarial losses to narrow the inter-domain gap; the domain-adaptive defogging network defogs foggy images, a target detection model is trained on fog-free and defogged images, and defogged images are fed into the target detection model for detection. This improves detection on real foggy images, but, like the defog-then-detect method, it restores the image with human visual quality as the criterion and cannot fully exploit the detection-relevant information in the foggy image, so the achievable improvement is limited.
The joint image defogging and target detection method is the most recent class of solutions. It constructs a synthetic foggy image dataset, pre-trains a defogging model on it, trains a target detection model on defogged images, cascades the defogging network with the target detection network, combines the image defogging loss with the target detection loss, trains the cascaded network end to end on fog-free and synthetic foggy images, and feeds foggy images into the cascaded network for end-to-end detection. By exploiting the latent information in the foggy image, it adaptively restores the image into one that favours target detection and markedly improves detection performance; however, constrained by the defogging loss, the target feature enhancement of the restored image is not thorough enough, and the network model usually has a large number of parameters and cannot meet real-time requirements.
Disclosure of Invention
In view of the above, the invention provides a real-time foggy-day road vehicle target detection method with adaptive target feature enhancement. The method mines potential target features in foggy images more fully and performs adaptive target feature enhancement, markedly improving foggy-day road vehicle detection performance, while the network model has few parameters and meets real-time requirements.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
the method for detecting the target of the road vehicle on the foggy days in real time with the enhanced self-adaptive target characteristics comprises the following steps of;
step 1, synthesizing corresponding image target detection data sets of foggy days with different concentrations according to an open-source foggy road vehicle target detection data set, and synthesizing a defogging network training set and an overall network training set according to the open-source foggy road vehicle target detection data set, the corresponding image target detection data sets of different concentration foggy days and the open-source real foggy road vehicle target detection data set;
step 2, a real-time foggy-day road vehicle target detection model is constructed, wherein the real-time foggy-day road vehicle target detection model comprises a lightweight defogging module, a lightweight self-adaptive target characteristic enhancement module and a target detection network YOLOv;
Step 3, training the lightweight defogging module with the defogging-network training set, wherein the input of the lightweight defogging module is an image in the defogging-network training set and its output is the corresponding refined defogged image; the current lightweight defogging module is saved after training finishes;
Step 4, freezing the parameters of the lightweight defogging module and training the real-time foggy-day road vehicle target detection model end to end with the overall-network training set, wherein the input of the model is an image in the overall-network training set and its output is the categories and coordinates of the corresponding vehicle targets; the current real-time foggy-day road vehicle target detection model is saved after training finishes;
and 5, inputting the foggy road vehicle image into a trained real-time foggy road vehicle target detection model to obtain the category and coordinates of the vehicle target in the foggy road vehicle image, and finishing the real-time foggy road vehicle target detection with enhanced self-adaptive target characteristics.
Further, the specific implementation of step 1 is as follows:
Step 101, synthesizing the corresponding foggy-day image target detection datasets of different fog densities from the open-source fog-free road vehicle target detection dataset:
I(x) = J(x)e^(-β(λ)d(x)) + A(1 - e^(-β(λ)d(x)));
wherein x denotes the pixel position in an image, J(x) denotes an image in the open-source fog-free road vehicle target detection dataset, I(x) denotes the corresponding synthetic foggy-day image, β(λ) denotes the atmospheric scattering coefficient, d(x) denotes the depth map of J(x) estimated by the monocular image depth estimation algorithm MiDaS, and A denotes the atmospheric light at infinity;
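The synthesis in step 101 can be sketched in a few lines of NumPy. This is an illustrative sketch, not the patent's implementation: the depth map is assumed pre-computed (e.g. by MiDaS) and normalised, and the atmospheric light A is taken as a scalar for simplicity.

```python
import numpy as np

def synthesize_fog(J, depth, beta, A=0.9):
    """Atmospheric scattering model:
    I(x) = J(x) * exp(-beta * d(x)) + A * (1 - exp(-beta * d(x)))."""
    t = np.exp(-beta * depth)[..., None]  # transmission map, shape (H, W, 1)
    return J * t + A * (1.0 - t)

# Larger beta (or larger depth) yields denser fog; zero depth leaves J unchanged.
clear = np.random.rand(8, 8, 3)
assert np.allclose(synthesize_fog(clear, np.zeros((8, 8)), beta=0.1), clear)
```

Sweeping beta over a few values per clear image is one way to obtain the "different fog densities" the training set requires.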
Step 102, combining the open-source fog-free road vehicle target detection dataset with the synthetic foggy-day datasets of different densities to form the defogging-network training set, and combining the open-source fog-free dataset, the synthetic foggy-day datasets of different densities and the open-source real foggy-day road vehicle target detection dataset to form the overall-network training set.
Further, the lightweight defogging module in the step 2 comprises a first-stage encoder, a first-stage decoder, a second-stage encoder, a second-stage decoder and an image cascade layer;
the first-stage encoder comprises, connected in sequence: one 1×1 convolution layer with stride 1 and 8 output channels, a first ReLU activation layer, two 3×3 convolution layers with stride 1 and 8 output channels, a second ReLU activation layer, a first max pooling layer with stride 2 and window size 3×3, two 3×3 convolution layers with stride 1 and 16 output channels, a third ReLU activation layer, a second max pooling layer with stride 2 and window size 3×3, one 3×3 convolution layer with stride 1 and 32 output channels, and a fourth ReLU activation layer;
The first-stage decoder comprises, connected in sequence: a 3×3 convolution layer with stride 1 and 32 output channels, a fifth ReLU activation layer, a first cascade layer, a 3×3 convolution layer with stride 1 and 16 output channels, a sixth ReLU activation layer, a first up-sampling layer with factor 2, a second cascade layer, a 3×3 convolution layer with stride 1 and 8 output channels, a seventh ReLU activation layer, a second up-sampling layer with factor 2, a 3×3 convolution layer with stride 1 and 3 output channels, and an eighth ReLU activation layer;
The output of the fourth ReLU activation layer in the first-stage encoder is the input of the 3×3 convolution layer with stride 1 and 32 output channels in the first-stage decoder; the first cascade layer in the first-stage decoder cascades the output of the second max pooling layer in the first-stage encoder with the output of the fifth ReLU activation layer in the first-stage decoder, and the second cascade layer in the first-stage decoder cascades the output of the first max pooling layer in the first-stage encoder with the output of the first up-sampling layer in the first-stage decoder;
The second-stage encoder comprises, connected in sequence: one 1×1 convolution layer with stride 1 and 8 output channels, a ninth ReLU activation layer, two 3×3 convolution layers with stride 1 and 8 output channels, a tenth ReLU activation layer, a third max pooling layer with stride 2 and window size 3×3, a third cascade layer, two 3×3 convolution layers with stride 1 and 16 output channels, an eleventh ReLU activation layer, a fourth max pooling layer with stride 2 and window size 3×3, a fourth cascade layer, one 3×3 convolution layer with stride 1 and 32 output channels, and a twelfth ReLU activation layer;
The third cascade layer in the second-stage encoder is used for cascading the output of the first maximum pooling layer in the first-stage encoder with the output of the third maximum pooling layer in the second-stage encoder, and the fourth cascade layer in the second-stage encoder is used for cascading the output of the second maximum pooling layer in the first-stage encoder with the output of the fourth maximum pooling layer in the second-stage encoder;
The second-stage decoder comprises, connected in sequence: a 3×3 convolution layer with stride 1 and 32 output channels, a thirteenth ReLU activation layer, a fifth cascade layer, a 3×3 convolution layer with stride 1 and 16 output channels, a fourteenth ReLU activation layer, a third up-sampling layer with factor 2, a sixth cascade layer, a 3×3 convolution layer with stride 1 and 8 output channels, a fifteenth ReLU activation layer, a fourth up-sampling layer with factor 2, a 3×3 convolution layer with stride 1 and 3 output channels, and a sixteenth ReLU activation layer;
The output of the twelfth ReLU activation layer in the second-stage encoder is the input of the 3×3 convolution layer with stride 1 and 32 output channels in the second-stage decoder; the fifth cascade layer in the second-stage decoder cascades the output of the fifth ReLU activation layer in the first-stage decoder, the output of the fourth max pooling layer in the second-stage encoder and the output of the thirteenth ReLU activation layer in the second-stage decoder, and the sixth cascade layer in the second-stage decoder cascades the output of the first up-sampling layer in the first-stage decoder, the output of the third max pooling layer in the second-stage encoder and the output of the third up-sampling layer in the second-stage decoder.
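As a sanity check on the layer list above, the sketch below traces feature-map shapes through the first-stage encoder. It assumes that stride-1 convolutions use "same" padding and that each stride-2 max pool halves the spatial size (the text does not state the padding scheme), and it ignores the skip connections, which only change channel counts at the cascade layers.

```python
# Each entry: (layer name, output channels, spatial divisor).
STAGE1_ENCODER = [
    ("1x1 conv, 8ch", 8, 1),
    ("2 x 3x3 conv, 8ch", 8, 1),
    ("3x3 maxpool, stride 2", 8, 2),
    ("2 x 3x3 conv, 16ch", 16, 1),
    ("3x3 maxpool, stride 2", 16, 2),
    ("3x3 conv, 32ch", 32, 1),
]

def trace_shapes(h, w, layers):
    """Fold an (H, W, C) shape through the layer list, starting from RGB."""
    shapes, c = [], 3
    for name, out_c, div in layers:
        h, w, c = h // div, w // div, out_c
        shapes.append((name, (h, w, c)))
    return shapes

# A 256x256 RGB input reaches the encoder output as a 64x64x32 feature map.
assert trace_shapes(256, 256, STAGE1_ENCODER)[-1][1] == (64, 64, 32)
```

The small channel counts (8/16/32) are what keep this defogging network lightweight compared with typical U-Net-style restorers.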
Further, the lightweight adaptive target feature enhancement module in step 2 comprises a hyper-parameter prediction module, a differentiable image processing module and a gating module.
Further, the hyper-parameter prediction module comprises, connected in sequence: a 3×3 convolution layer with stride 1 and 16 output channels, a seventeenth ReLU activation layer, a fifth max pooling layer with stride 2 and window size 3×3, a 3×3 convolution layer with stride 1 and 32 output channels, an eighteenth ReLU activation layer, a sixth max pooling layer with stride 2 and window size 3×3, a 3×3 convolution layer with stride 1 and 32 output channels, a nineteenth ReLU activation layer, a seventh max pooling layer with stride 2 and window size 3×3, a 3×3 convolution layer with stride 1 and 64 output channels, a twentieth ReLU activation layer, a max pooling layer with stride 2 and window size 3×3, a 3×3 convolution layer with stride 1 and 64 output channels, a twenty-first ReLU activation layer, a further max pooling layer with stride 2 and window size 3×3, and fully connected layers; the fully connected layers output 15 hyper-parameters, which are fed to the differentiable image processing module, and 6 weight parameters, which are fed to the gating module.
Further, the differentiable image processing module comprises six order-independent differentiable image processors: exposure adjustment, white balance, gamma transformation, contrast enhancement, sharpening and color adjustment; the module receives the refined defogged image output by the lightweight defogging module and the 15 hyper-parameters output by the hyper-parameter prediction module, and outputs the correspondingly processed images;
The exposure adjustment is represented by the following mapping function:
J_exposure(x) = 2^E × J_refined(x);
wherein J_refined(x) denotes the refined defogged image output by the lightweight defogging module, J_exposure(x) denotes the exposure-adjusted refined defogged image, and E denotes the exposure-adjustment hyper-parameter;
the white balance is represented by the following mapping function:
J_wb(x) = (W_r·r(x), W_g·g(x), W_b·b(x));
wherein J_wb(x) is the white-balanced refined defogged image, W_r, W_g and W_b denote the white-balance hyper-parameters of the r, g and b color channels, and r(x), g(x) and b(x) denote the pixel values of the refined defogged image J_refined(x) in the r, g and b color channels;
The gamma transformation is represented by the following mapping function:
J_gamma(x) = (J_refined(x))^γ;
wherein J_gamma(x) is the gamma-transformed refined defogged image, and γ denotes the gamma-transformation hyper-parameter;
The contrast enhancement is represented by the following mapping function:
J_contrast(x) = α × En(J_refined(x)) + (1 - α) × J_refined(x);
wherein J_contrast(x) is the contrast-enhanced refined defogged image and α denotes the contrast-enhancement hyper-parameter;
Lum(J_refined(x)) = 0.3r(x) + 0.59g(x) + 0.11b(x);
Sigmoid(·) is the nonlinear activation function, Lum(·) denotes the pixel luminance function, and En(·) denotes the image luminance enhancement function;
The sharpening is represented by the following mapping function:
J_sharpen(x) = J_refined(x) + λ_1 × (J_refined(x) - Gaussian(J_refined(x)));
wherein J_sharpen(x) is the sharpened refined defogged image, λ_1 is the sharpening hyper-parameter, and Gaussian(·) is the Gaussian filtering function;
The color adjustment is represented by the following mapping function:
wherein J_tone(x) is the color-adjusted refined defogged image and t_i is the (i+1)-th color-adjustment hyper-parameter, i = 0, 1, 2, ...;
Further, the gating module receives the 6 weight parameters output by the hyper-parameter prediction module and performs adaptive weighted fusion of the images processed by the six order-independent differentiable image processors, obtaining the adaptively enhanced image J_enhanced(x):
J_enhanced(x) = w_1·J_exposure(x) + w_2·J_wb(x) + w_3·J_gamma(x) + w_4·J_contrast(x) + w_5·J_sharpen(x) + w_6·J_tone(x);
wherein w_1 is the exposure-adjustment weight parameter, w_2 the white-balance weight parameter, w_3 the gamma-transformation weight parameter, w_4 the contrast-enhancement weight parameter, w_5 the sharpening weight parameter, and w_6 the color-adjustment weight parameter.
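A minimal NumPy sketch of some of these processors and the gated fusion follows. It covers only the operators whose mapping functions are fully reproduced above (exposure, white balance, gamma, sharpening); contrast and tone are omitted because their formulas are only partially given, and a 3×3 box blur stands in for the Gaussian filter. All parameter values are illustrative, not predicted values.

```python
import numpy as np

def exposure(J, E):                 # J_exposure(x) = 2^E * J(x)
    return (2.0 ** E) * J

def white_balance(J, wr, wg, wb):   # per-channel gains on r, g, b
    return J * np.array([wr, wg, wb])

def gamma_map(J, g):                # J_gamma(x) = J(x)^gamma, J in [0, 1]
    return np.clip(J, 1e-6, 1.0) ** g

def _box_blur(ch):                  # 3x3 mean filter (stand-in for Gaussian)
    p = np.pad(ch, 1, mode="edge")
    h, w = ch.shape
    return sum(p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
               for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0

def sharpen(J, lam):                # unsharp masking: J + lam * (J - blur(J))
    blurred = np.stack([_box_blur(J[..., c]) for c in range(3)], axis=-1)
    return J + lam * (J - blurred)

def gated_fusion(images, weights):  # J_enhanced = sum_i w_i * J_i
    return sum(w * im for w, im in zip(weights, images))

# The processors run in parallel and are fused by learned weights,
# rather than being chained in a fixed order.
J = np.full((4, 4, 3), 0.5)
outs = [exposure(J, 0.5), white_balance(J, 1.1, 1.0, 0.9),
        gamma_map(J, 0.8), sharpen(J, 1.0)]
fused = gated_fusion(outs, [0.25, 0.25, 0.25, 0.25])
assert fused.shape == J.shape
```

Because every operator is a smooth function of its hyper-parameter, the detection loss can back-propagate through this stage into the hyper-parameter prediction module.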
Further, the specific implementation of step 3 is as follows:
Step 301, inputting the images in the defogging-network training set into the 1×1 convolution layer (stride 1, 8 output channels) of the first-stage encoder in the lightweight defogging module; the eighth ReLU activation layer of the first-stage decoder outputs a 3-channel coarse mapping map K_coarse(x), from which the coarse defogged image corresponding to the training-set image is computed:
J_coarse(x) = K_coarse(x) × I'(x) - K_coarse(x) + 1;
wherein I'(x) denotes an image in the defogging-network training set, I'(x) = I(x) or J(x), I'(x) is a 3-channel RGB image, and J_coarse(x) is the coarse defogged image;
the image cascade layer of the lightweight defogging module cascades the training-set image with the corresponding coarse defogged image output by the first-stage decoder to obtain a 6-channel cascaded image; the cascaded image is input into the 1×1 convolution layer (stride 1, 8 output channels) of the second-stage encoder, the sixteenth ReLU activation layer of the second-stage decoder outputs a 3-channel fine mapping map K_refined(x), and the refined defogged image corresponding to the training-set image is computed from it:
J_refined(x) = K_refined(x) × I'(x) - K_refined(x) + 1;
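The K-map recovery used by both stages is a pointwise affine transform; the sketch below assumes K and I are arrays of the same shape with values roughly in [0, 1].

```python
import numpy as np

def apply_k_map(K, I):
    """Recover a defogged image via J(x) = K(x) * I(x) - K(x) + 1,
    as used for both the coarse and the refined stage."""
    return K * I - K + 1.0

# K == 1 everywhere is the identity mapping (J = I);
# K == 0 everywhere maps every pixel to 1 (pure white).
I = np.random.rand(4, 4, 3)
assert np.allclose(apply_k_map(np.ones_like(I), I), I)
```

Predicting a single multiplicative map K instead of separate transmission and atmospheric-light estimates is one way such networks stay lightweight.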
Step 302, using the open-source fog-free road vehicle target detection dataset as the defogging ground-truth labels; the loss between the coarse defogged image and its ground-truth label is computed with an L1 loss function and denoted Loss1, the loss between the refined defogged image and its ground-truth label is computed with an L2 loss function and denoted Loss2, and the total loss of the lightweight defogging module is Loss = 0.4 × Loss1 + 0.6 × Loss2; the lightweight defogging module is trained by back-propagation with gradient descent according to the total loss until it converges, whereupon training ends and the current lightweight defogging module is saved.
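The weighted two-stage loss of step 302 can be sketched as below, using per-pixel means for L1 and L2 (whether the patent uses sum or mean reduction is not stated):

```python
import numpy as np

def defog_loss(coarse, refined, target, w_coarse=0.4, w_refined=0.6):
    """Loss = 0.4 * L1(coarse, target) + 0.6 * L2(refined, target)."""
    loss1 = np.mean(np.abs(coarse - target))   # L1 on the coarse defogged image
    loss2 = np.mean((refined - target) ** 2)   # L2 (MSE) on the refined image
    return w_coarse * loss1 + w_refined * loss2

# Perfect predictions give zero loss.
t = np.random.rand(4, 4, 3)
assert defog_loss(t, t, t) == 0.0
```

Weighting the refined output more heavily (0.6 vs. 0.4) biases training toward the second stage, whose output is what the detector actually consumes.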
Further, the specific implementation of step 4 is as follows:
Step 401, freezing the parameters of the lightweight defogging module and inputting the images in the overall-network training set into the real-time foggy-day road vehicle target detection model; the lightweight defogging module outputs the refined defogged image to the hyper-parameter prediction module and the differentiable image processing module, and the hyper-parameter prediction module outputs the 15 hyper-parameters used by the differentiable image processing module and the 6 weight parameters used by the gating module;
the differentiable image processing module obtains the exposure-adjusted, white-balanced, gamma-transformed, contrast-enhanced, sharpened and color-adjusted refined defogged images from the 15 hyper-parameters output by the hyper-parameter prediction module and the refined defogged image output by the lightweight defogging module;
the gating module performs weighted fusion of these six processed images according to the 6 weight parameters output by the hyper-parameter prediction module to obtain the corresponding adaptively enhanced image;
the adaptively enhanced image is input into the target detection network YOLOv5, which outputs the categories and coordinates of the vehicle targets corresponding to the training-set image; the categories and coordinates of the vehicle targets annotated in the overall-network training set serve as target detection ground-truth labels, the loss between the predicted categories and coordinates and the ground-truth labels is computed with the loss function of YOLOv5, and the hyper-parameter prediction module and YOLOv5 are trained by back-propagation with gradient descent according to this loss until it converges, whereupon training ends and the current real-time foggy-day road vehicle target detection model is saved.
Owing to the adoption of the above technical scheme, the invention has the following beneficial effects compared with the prior art:
1. The real-time foggy-day road vehicle target detection method with adaptive target feature enhancement can adaptively enhance the target features of an image by utilizing the target detection loss, rather than merely restoring a fog-degraded image into a visually pleasing one, so that the information in foggy images that is potentially beneficial to target detection is exploited more fully and the foggy-day road vehicle target detection performance is significantly improved.
2. The lightweight defogging network module provided by the invention adopts a multi-view structure and performs defogging based on the atmospheric scattering principle; its defogging effect is excellent, far surpassing the dark channel prior defogging method and other traditional defogging methods.
3. The lightweight adaptive target feature enhancement module provided by the invention adaptively enhances the defogging image by learning the super parameters of the image processing operations, and realizes adaptive fusion of the different image processing operations through the gating module, effectively avoiding the image over-enhancement problem caused by applying the image processing operations sequentially.
4. The defogging module and the adaptive target feature enhancement module provided by the invention both adopt lightweight structures, so real-time road vehicle target detection in foggy weather can be achieved; the method also improves road vehicle target detection under natural weather conditions and under adverse conditions such as sandstorms and dim light.
Drawings
Fig. 1 is an overall flowchart of a method for detecting a target of a road vehicle in a foggy weather in real time with enhanced adaptive target features in an embodiment of the present invention.
Fig. 2 is a schematic diagram of a real-time foggy road vehicle target detection model in an embodiment of the invention.
FIG. 3 is a schematic diagram of a lightweight defogging module according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of a lightweight adaptive target feature enhancement module in an embodiment of the invention.
Fig. 5 is a schematic comparison of visualized foggy-day road vehicle target detection results between a comparison method and the method in an embodiment of the present invention.
Detailed Description
The present invention will be further described with reference to the drawings and the specific embodiments.
The real-time foggy-day road vehicle target detection method with adaptive target feature enhancement is shown in fig. 1 and comprises the following steps:
Step 1, synthesizing image target detection datasets corresponding to foggy days of different concentrations according to an open-source fog-free road vehicle target detection dataset, and composing a defogging network training set and an overall network training set from the open-source fog-free road vehicle target detection dataset, the synthesized image target detection datasets of different fog concentrations, and an open-source real foggy-day road vehicle target detection dataset;
Specifically, in this embodiment, the UA-DETRAC fog-free road vehicle target detection dataset and the RTTS real foggy-day target detection dataset are selected; both datasets contain the road vehicle targets of interest and the corresponding classes and coordinates of the road vehicles;
Step 2, constructing a real-time foggy-day road vehicle target detection model; as shown in fig. 2, the real-time foggy-day road vehicle target detection model comprises a lightweight defogging module, a lightweight adaptive target feature enhancement module, and a target detection network YOLOv5;
Step 3, training the lightweight defogging module with the defogging network training set, wherein the input of the lightweight defogging module is an image in the defogging network training set and the output is the corresponding refined defogging image; the current lightweight defogging module is saved after training ends;
Step 4, freezing the parameters of the lightweight defogging module and performing end-to-end training of the real-time foggy-day road vehicle target detection model with the overall network training set, wherein the input of the model is an image in the overall network training set and the output is the class and coordinates of the corresponding vehicle target; the current real-time foggy-day road vehicle target detection model is saved after training ends;
Step 5, inputting the foggy-day road vehicle image into the trained real-time foggy-day road vehicle target detection model to obtain the class and coordinates of the vehicle target in the image, completing the real-time foggy-day road vehicle target detection with adaptive target feature enhancement.
Further, the specific manner of step 1 is as follows:
Step 101, synthesizing image target detection datasets corresponding to foggy days of different concentrations according to the open-source fog-free road vehicle target detection dataset:
I(x) = J(x)e^(−β(λ)d(x)) + A(1 − e^(−β(λ)d(x)));
wherein x represents the pixel position of an image, J(x) represents an image in the open-source fog-free road vehicle target detection dataset, I(x) represents the corresponding synthesized foggy-day image, β(λ) represents the atmospheric scattering coefficient, d(x) represents the depth map corresponding to J(x) estimated by the monocular image depth estimation algorithm MiDaS, and A represents the atmospheric light at infinity;
Specifically, in this embodiment, the adopted atmospheric light is A ∈ {0.7, 0.8, 0.9} and the atmospheric scattering coefficient is β(λ) ∈ {0.5, 1.0, 1.5, 2.0, 3.0, 4.0}; by adjusting the values of β(λ) and A, 18 image target detection datasets of different fog concentrations are synthesized.
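The synthesis formula above can be sketched as follows (a minimal NumPy illustration; the function name, the toy constant depth map, and the value ranges are assumptions for demonstration, whereas the patent estimates d(x) with MiDaS):

```python
import numpy as np

def synthesize_fog(J, d, beta=1.0, A=0.8):
    """Apply the atmospheric scattering model I = J*e^(-beta*d) + A*(1 - e^(-beta*d)).

    J : fog-free RGB image, float array in [0, 1], shape (H, W, 3)
    d : depth map, shape (H, W) (a placeholder here; the patent uses MiDaS depth)
    """
    t = np.exp(-beta * d)[..., None]   # transmission map, broadcast over channels
    return J * t + A * (1.0 - t)

# Toy example: a mid-gray image with uniform depth.
J = np.full((4, 4, 3), 0.5)
I = synthesize_fog(J, d=np.full((4, 4), 2.0), beta=1.0, A=0.8)
# With zero depth the transmission is 1 and the image is unchanged;
# as depth grows the image approaches the atmospheric light A.
```

Sweeping `beta` over {0.5, …, 4.0} and `A` over {0.7, 0.8, 0.9} as in the embodiment would produce the 18 fog concentrations.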
Step 102, combining the open-source fog-free road vehicle target detection dataset and the image target detection datasets corresponding to foggy days of different concentrations to form the defogging network training set, and combining the open-source fog-free road vehicle target detection dataset, the image target detection datasets corresponding to foggy days of different concentrations, and the open-source real foggy-day road vehicle target detection dataset to form the overall network training set.
Further, as shown in fig. 3, the lightweight defogging module in step 2 includes a first-stage encoder, a first-stage decoder, a second-stage encoder, a second-stage decoder, and an image cascade layer;
the first-stage encoder comprises, connected in sequence, 1 1×1 convolution layer with stride 1 and 8 output channels, a first Relu activation function layer, 2 3×3 convolution layers with stride 1 and 8 output channels, a second Relu activation function layer, 1 first max pooling layer with stride 2 and window size 3×3, 2 3×3 convolution layers with stride 1 and 16 output channels, a third Relu activation function layer, 1 second max pooling layer with stride 2 and window size 3×3, 1 3×3 convolution layer with stride 1 and 32 output channels, and a fourth Relu activation function layer;
the first-stage decoder comprises, connected in sequence, 1 3×3 convolution layer with stride 1 and 32 output channels, a fifth Relu activation function layer, a first cascade layer, 1 3×3 convolution layer with stride 1 and 16 output channels, a sixth Relu activation function layer, 1 first up-sampling layer with stride 2, a second cascade layer, 1 3×3 convolution layer with stride 1 and 8 output channels, a seventh Relu activation function layer, 1 second up-sampling layer with stride 2, 1 3×3 convolution layer with stride 1 and 3 output channels, and an eighth Relu activation function layer;
The output of the fourth Relu activation function layer in the first-stage encoder is the input of a 3×3 convolution layer with the step length of 1 and the output channel number of 32 in the first-stage decoder, the first cascade layer in the first-stage decoder is used for cascading the output of the second maximum pooling layer in the first-stage encoder with the output of the fifth Relu activation function layer in the first-stage decoder, and the second cascade layer in the first-stage decoder is used for cascading the output of the first maximum pooling layer in the first-stage encoder with the output of the first up-sampling layer in the first-stage decoder;
the second-stage encoder comprises, connected in sequence, 1 1×1 convolution layer with stride 1 and 8 output channels, a ninth Relu activation function layer, 2 3×3 convolution layers with stride 1 and 8 output channels, a tenth Relu activation function layer, 1 third max pooling layer with stride 2 and window size 3×3, a third cascade layer, 2 3×3 convolution layers with stride 1 and 16 output channels, an eleventh Relu activation function layer, 1 fourth max pooling layer with stride 2 and window size 3×3, a fourth cascade layer, 1 3×3 convolution layer with stride 1 and 32 output channels, and a twelfth Relu activation function layer;
The third cascade layer in the second-stage encoder is used for cascading the output of the first maximum pooling layer in the first-stage encoder with the output of the third maximum pooling layer in the second-stage encoder, and the fourth cascade layer in the second-stage encoder is used for cascading the output of the second maximum pooling layer in the first-stage encoder with the output of the fourth maximum pooling layer in the second-stage encoder;
the second-stage decoder comprises, connected in sequence, 1 3×3 convolution layer with stride 1 and 32 output channels, a thirteenth Relu activation function layer, a fifth cascade layer, 1 3×3 convolution layer with stride 1 and 16 output channels, a fourteenth Relu activation function layer, 1 third up-sampling layer with stride 2, a sixth cascade layer, 1 3×3 convolution layer with stride 1 and 8 output channels, a fifteenth Relu activation function layer, 1 fourth up-sampling layer with stride 2, 2 3×3 convolution layers with stride 1 and 3 output channels, and a sixteenth Relu activation function layer;
The twelfth Relu activation function layer output in the second-stage encoder is the input of the 3×3 convolution layer with the step length of 1 and the output channel number of 32 in the second-stage decoder, the fifth cascade layer in the second-stage decoder is used for cascading the output of the fifth Relu activation function layer in the first-stage decoder, the output of the fourth maximum pooling layer in the second-stage encoder and the output of the thirteenth Relu activation function layer in the second-stage decoder, and the sixth cascade layer in the second-stage decoder is used for cascading the output of the first upsampling layer in the first-stage decoder, the output of the third maximum pooling layer in the second-stage encoder and the output of the third upsampling layer in the second-stage decoder.
Further, the lightweight adaptive target feature enhancement module in step 2 includes a super-parameter prediction module, a differentiable image processing module, and a gating module.
Further, as shown in fig. 4, the super-parameter prediction module comprises, connected in sequence, 1 3×3 convolution layer with stride 1 and 16 output channels, a seventeenth Relu activation function layer, 1 fifth max pooling layer with stride 2 and window size 3×3, 1 3×3 convolution layer with stride 1 and 32 output channels, an eighteenth Relu activation function layer, 1 sixth max pooling layer with stride 2 and window size 3×3, 1 3×3 convolution layer with stride 1 and 32 output channels, a nineteenth Relu activation function layer, 1 seventh max pooling layer with stride 2 and window size 3×3, 1 3×3 convolution layer with stride 1 and 64 output channels, a twentieth Relu activation function layer, 1 eighth max pooling layer with stride 2 and window size 3×3, 1 3×3 convolution layer with stride 1 and 64 output channels, a twenty-first Relu activation function layer, 1 ninth max pooling layer with stride 2 and window size 3×3, 1 3×3 convolution layer with stride 1 and 128 output channels, a twenty-second Relu activation function layer, 1 global average pooling layer, 1 first fully connected layer with 256 output neurons, and 2 parallel fully connected layers, namely a second fully connected layer and a third fully connected layer; the second fully connected layer has 15 output neurons and the third fully connected layer has 6 output neurons; the input of the 3×3 convolution layer with stride 1 and 16 output channels is the refined defogging image output by the lightweight defogging module, the output of the second fully connected layer is the 15 super parameters of the differentiable image processing module, and the output of the third fully connected layer is the 6 weight parameters of the gating module.
Further, the differentiable image processing module comprises six unordered differentiable image processors: exposure adjustment, white balance, gamma transformation, contrast enhancement, sharpening, and color adjustment; the differentiable image processing module receives the refined defogging image output by the lightweight defogging module and the 15 super parameters output by the super-parameter prediction module, and outputs the corresponding processed images;
The exposure adjustment is represented by the following mapping function:
J_exposure(x) = 2^E × J_refined(x);
wherein J_refined(x) represents the refined defogging image output by the lightweight defogging module, J_exposure(x) represents the refined defogging image after exposure adjustment, and E represents the exposure adjustment super-parameter;
the white balance is represented by the following mapping function:
J_wb(x) = (W_r·r(x), W_g·g(x), W_b·b(x));
wherein J_wb(x) is the refined defogging image after white balance, W_r, W_g, W_b respectively represent the white balance super parameters of the r, g, and b color channels, and r(x), g(x), b(x) respectively represent the pixel values of the refined defogging image J_refined(x) in the r, g, and b color channels;
The gamma transformation is represented by the following mapping function:
J_gamma(x) = (J_refined(x))^γ;
wherein J_gamma(x) is the refined defogging image after gamma transformation, and γ represents the gamma transformation super-parameter;
The contrast enhancement is represented by the following mapping function:
J_contrast(x) = α×En(J_refined(x)) + (1 − α)×J_refined(x);
wherein J_contrast(x) is the refined defogging image after contrast enhancement, and α represents the contrast enhancement super-parameter;
Lum(J_refined(x)) = 0.3r(x) + 0.59g(x) + 0.11b(x);
wherein Sigmoid() is a nonlinear activation function, Lum() represents the pixel brightness function, and En() represents the image brightness enhancement function;
The sharpening is represented by the following mapping function:
J_sharpen(x) = J_refined(x) + λ_1(J_refined(x) − Gaussian(J_refined(x)));
wherein J_sharpen(x) is the refined defogging image after sharpening, λ_1 is the sharpening super-parameter, and Gaussian() is a Gaussian filtering function;
The color adjustment is represented by the following mapping function:
wherein J_tone(x) is the refined defogging image after color adjustment, and t_i is the (i+1)-th color adjustment super-parameter, i = 0, 1, 2, ...;
Further, the gating module receives the 6 weight parameters output by the super-parameter prediction module and performs adaptive weighted fusion of the images processed by the six unordered differentiable image processors to obtain the adaptive enhanced image J_enhanced(x):
J_enhanced(x) = w_1·J_exposure(x) + w_2·J_wb(x) + w_3·J_gamma(x) + w_4·J_contrast(x) + w_5·J_sharpen(x) + w_6·J_tone(x);
wherein w_1 is the exposure adjustment weight parameter, w_2 the white balance weight parameter, w_3 the gamma transformation weight parameter, w_4 the contrast enhancement weight parameter, w_5 the sharpening weight parameter, and w_6 the color adjustment weight parameter.
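Three of the six image processing operators above and the gated fusion can be sketched as follows (a minimal NumPy illustration; the remaining operators follow the same pattern, and the function names, toy values, and the clipping in the gamma transform are assumptions not taken from the patent):

```python
import numpy as np

def exposure(J, E):
    # Exposure adjustment: J_exposure = 2^E * J.
    return (2.0 ** E) * J

def white_balance(J, Wr, Wg, Wb):
    # Scale each color channel by its white-balance super-parameter.
    return J * np.array([Wr, Wg, Wb])

def gamma_transform(J, g):
    # Pixel-wise power law (assumes J is normalized to [0, 1]).
    return np.clip(J, 0.0, 1.0) ** g

def gated_fusion(processed, weights):
    # Weighted sum of the processed images, as done by the gating module.
    return sum(w * img for w, img in zip(weights, processed))

J = np.full((2, 2, 3), 0.25)                 # stand-in refined defogging image
outs = [exposure(J, 1.0),                    # doubles the pixel values
        white_balance(J, 1.2, 1.0, 0.9),
        gamma_transform(J, 0.5)]
fused = gated_fusion(outs, [0.5, 0.3, 0.2])  # three of the six branches shown
```

Because every operator is differentiable, the target detection loss can propagate back through this fusion into the super-parameter prediction module.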
Further, the specific manner of step 3 is as follows:
Step 301, the images in the defogging network training set are input into the 1×1 convolution layer with stride 1 and 8 output channels of the first-stage encoder in the lightweight defogging module; the eighth Relu activation function layer of the first-stage decoder outputs the 3-channel coarse mapping map K_coarse(x), and the coarse defogging image corresponding to the image in the defogging network training set is calculated from K_coarse(x):
J_coarse(x) = K_coarse(x)×I'(x) − K_coarse(x) + 1;
wherein I'(x) represents an image in the defogging network training set, I'(x) = I(x) or J(x), I'(x) is a 3-channel RGB image, and J_coarse(x) is the coarse defogging image;
the image cascade layer of the lightweight defogging module concatenates the image in the defogging network training set with the corresponding coarse defogging image output by the first-stage decoder to obtain a 6-channel concatenated image, which is input into the 1×1 convolution layer with stride 1 and 8 output channels of the second-stage encoder in the lightweight defogging module; the sixteenth Relu activation function layer of the second-stage decoder outputs the 3-channel fine mapping map K_refined(x), and the refined defogging image corresponding to the image in the defogging network training set is calculated from K_refined(x):
J_refined(x) = K_refined(x)×I'(x) − K_refined(x) + 1;
Step 302, the open-source fog-free road vehicle target detection dataset is used as the defogging truth label; the L1 loss function is adopted to calculate the loss value between the coarse defogging image and the corresponding defogging truth label, denoted Loss_1, and the L2 loss function is adopted to calculate the loss value between the refined defogging image and the corresponding defogging truth label, denoted Loss_2; the total loss value of the lightweight defogging module is calculated as Loss = 0.4Loss_1 + 0.6Loss_2, and back-propagation training is performed on the lightweight defogging module by gradient descent according to the total loss value Loss until the total loss value converges, whereupon training ends and the current lightweight defogging module is saved.
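The composite defogging loss Loss = 0.4·Loss_1 + 0.6·Loss_2 can be sketched as follows (a minimal NumPy illustration; the function name, array shapes, and values are invented for demonstration):

```python
import numpy as np

def defogging_loss(coarse, fine, truth):
    """Total loss of the lightweight defogging module: 0.4*L1 + 0.6*L2.

    Loss_1 is the mean absolute error (L1) of the coarse defogging image,
    Loss_2 the mean squared error (L2) of the refined defogging image,
    both measured against the fog-free truth label.
    """
    loss1 = np.mean(np.abs(coarse - truth))
    loss2 = np.mean((fine - truth) ** 2)
    return 0.4 * loss1 + 0.6 * loss2

truth  = np.zeros((2, 2, 3))
coarse = np.full((2, 2, 3), 0.5)   # constant error 0.5 -> Loss_1 = 0.5
fine   = np.full((2, 2, 3), 0.1)   # constant error 0.1 -> Loss_2 = 0.01
total = defogging_loss(coarse, fine, truth)   # 0.4*0.5 + 0.6*0.01 = 0.206
```

The heavier weight on the L2 term pushes the second stage toward an accurate refined image while the L1 term keeps the coarse stage roughly on track.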
Specifically, the simplified atmospheric scattering model can be expressed as:
I(x)=J(x)t(x)+A(1-t(x));
wherein I(x) represents the foggy image, J(x) represents the fog-free image, A represents the atmospheric light intensity, t(x) represents the transmission map, and x represents an image pixel.
The defogged image can be obtained by inverting the atmospheric scattering model:
J(x) = I(x)/t(x) − A/t(x) + A;
combining the parameters t(x) and A to be predicted into a single unknown K(x), with K(x) = ((I(x) − A)/t(x) + (A − 1))/(I(x) − 1), the formula can be transformed into:
J(x) = K(x)I(x) − K(x) + 1;
by estimating the mapping parameter K(x) of the foggy image and combining it with the image, defogging of the foggy image is achieved.
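The K(x) reformulation can be checked numerically (a toy NumPy sketch; the scalar transmission, atmospheric light, and pixel values are invented, and the explicit closed form of K used here is derived from the two equations above rather than stated in the patent):

```python
import numpy as np

# Toy transmission and atmospheric light.
t, A = 0.6, 0.8
J = np.array([0.2, 0.5, 0.7])      # fog-free pixels
I = J * t + A * (1 - t)            # forward atmospheric scattering model

# Direct inversion of the scattering model.
J_direct = I / t - A / t + A

# Fold t and A into a single map K so that J = K*I - K + 1
# (valid wherever I(x) != 1).
K = ((I - A) / t + (A - 1)) / (I - 1)
J_via_K = K * I - K + 1
# Both routes recover the original fog-free pixels J.
```

This is why the lightweight defogging module only needs to regress the single map K(x) instead of t(x) and A separately.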
Further, the specific manner of step 4 is as follows:
Step 401, freezing the parameters of the lightweight defogging module, and inputting the images in the overall network training set into the real-time foggy-day road vehicle target detection model; the lightweight defogging module outputs the refined defogging image to the super-parameter prediction module and the differentiable image processing module, and the super-parameter prediction module outputs the 15 super parameters used in the differentiable image processing module and the 6 weight parameters used in the gating module;
the differentiable image processing module obtains the refined defogging images after exposure adjustment, white balance, gamma transformation, contrast enhancement, sharpening, and color adjustment according to the 15 super parameters output by the super-parameter prediction module and the refined defogging image output by the lightweight defogging module;
the gating module performs weighted fusion of the refined defogging images after exposure adjustment, white balance, gamma transformation, contrast enhancement, sharpening, and color adjustment output by the differentiable image processing module according to the 6 weight parameters output by the super-parameter prediction module to obtain the corresponding adaptive enhanced image;
the adaptive enhanced image is input into the target detection network YOLOv5, which outputs the class and coordinates of the vehicle target corresponding to the image in the overall network training set; the class and coordinates of the vehicle target contained in the image in the overall network training set serve as the target detection truth labels; the loss value between the output class and coordinates of the vehicle target and the target detection truth labels is calculated according to the loss function of the target detection network YOLOv5, and the super-parameter prediction module and the target detection network YOLOv5 are trained by back propagation with gradient descent according to the loss value until the loss value converges, whereupon training ends and the current real-time foggy-day road vehicle target detection model is saved.
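The freeze-and-fine-tune scheme of step 401 can be illustrated with a toy gradient step (a NumPy stand-in, not the actual YOLOv5 training code; the parameter names are invented, and in a deep-learning framework one would instead disable gradients on the defogging module's parameters):

```python
import numpy as np

# Toy "model": one frozen parameter (defogging) and one trainable one (detector).
params = {"defog_w": np.array(2.0), "detector_w": np.array(0.5)}
frozen = {"defog_w"}   # parameters excluded from updates, as in step 401

def sgd_step(params, grads, lr=0.1):
    # Apply gradient descent only to parameters that are not frozen.
    for name, g in grads.items():
        if name not in frozen:
            params[name] = params[name] - lr * g
    return params

grads = {"defog_w": np.array(1.0), "detector_w": np.array(1.0)}
params = sgd_step(params, grads)
# defog_w stays at 2.0 (frozen); detector_w moves to 0.4.
```

Freezing keeps the already-trained defogging weights fixed while the detection loss shapes only the super-parameter prediction module and the detector.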
Specifically, as shown in fig. 5, the 6 images on the left side of fig. 5 and the 6 images on the right side of fig. 5 are visualized results of foggy-day road vehicle target detection under fog of different concentrations, compared between the comparison method and the method of the present invention.
In summary, the invention can rapidly defog foggy-day road vehicle images with the lightweight defogging module, recovering the overall detail information of the foggy road image, and uses the target detection task to guide the lightweight adaptive target feature enhancement module so as to adaptively enhance the effective features of foggy-day road vehicle targets, greatly improving foggy-day road vehicle target detection performance. Compared with the prior art, the method achieves a better detection effect and stronger robustness for foggy-day road vehicle targets, meets real-time requirements, can be applied in intelligent transportation, and provides a new solution for foggy-day road vehicle target detection.
Those skilled in the art will recognize that the embodiments described are intended to aid the reader in understanding the principles of the invention, and that the scope of the invention is not limited to the embodiments described. Various modifications and variations of the present invention will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims (9)

1.自适应目标特征增强的实时雾天道路车辆目标检测方法,其特征在于,包括以下步骤;1. A real-time foggy road vehicle target detection method with adaptive target feature enhancement, characterized in that it comprises the following steps; 步骤1,根据开源的无雾道路车辆目标检测数据集合成不同浓度雾天的对应图像目标检测数据集,根据开源的无雾道路车辆目标检测数据集、合成不同浓度雾天的对应图像目标检测数据集以及开源的真实雾天道路车辆目标检测数据集组成去雾网络训练集以及整体网络训练集;Step 1: synthesize the corresponding image target detection datasets of foggy days with different concentrations based on the open source fog-free road vehicle target detection dataset, and form the defogging network training set and the overall network training set based on the open source fog-free road vehicle target detection dataset, the synthesized corresponding image target detection datasets of foggy days with different concentrations, and the open source real foggy road vehicle target detection dataset; 步骤2,构建实时雾天道路车辆目标检测模型,所述实时雾天道路车辆目标检测模型包括轻量级去雾模块、轻量级自适应目标特征增强模块以及目标检测网络YOLOv5;Step 2, constructing a real-time foggy road vehicle target detection model, wherein the real-time foggy road vehicle target detection model includes a lightweight defogging module, a lightweight adaptive target feature enhancement module, and a target detection network YOLOv5; 步骤3,利用去雾网络训练集对轻量级去雾模块进行训练,轻量级去雾模块的输入为去雾网络训练集中的图像,输出为对应的精去雾图像,训练结束后保存当前轻量级去雾模块;Step 3, use the defogging network training set to train the lightweight defogging module, the input of the lightweight defogging module is the image in the defogging network training set, and the output is the corresponding refined defogging image. 
After the training is completed, the current lightweight defogging module is saved; 步骤4,冻结轻量级去雾模块参数,利用整体网络训练集对实时雾天道路车辆目标检测模型进行端对端训练,实时雾天道路车辆目标检测模型的输入为整体网络训练集中的图像,输出为对应的车辆目标的类别和坐标,训练结束后保存当前实时雾天道路车辆目标检测模型;Step 4, freeze the parameters of the lightweight defogging module, use the overall network training set to perform end-to-end training on the real-time foggy road vehicle target detection model, the input of the real-time foggy road vehicle target detection model is the image in the overall network training set, and the output is the category and coordinates of the corresponding vehicle target. After the training is completed, save the current real-time foggy road vehicle target detection model; 步骤5,将雾天道路车辆图像输入到训练好的实时雾天道路车辆目标检测模型中,得到雾天道路车辆图像中的车辆目标的类别和坐标,完成自适应目标特征增强的实时雾天道路车辆目标检测。Step 5, input the foggy road vehicle image into the trained real-time foggy road vehicle target detection model, obtain the category and coordinates of the vehicle target in the foggy road vehicle image, and complete the real-time foggy road vehicle target detection with adaptive target feature enhancement. 2.根据权利要求1所述的自适应目标特征增强的实时雾天道路车辆目标检测方法,其特征在于,步骤1的具体方式为:2. 
The real-time foggy road vehicle target detection method with adaptive target feature enhancement according to claim 1 is characterized in that the specific manner of step 1 is: 步骤101,根据开源的无雾道路车辆目标检测数据集,合成不同浓度雾天的对应图像目标检测数据集:Step 101, based on the open-source fog-free road vehicle target detection dataset, synthesize the corresponding image target detection datasets of foggy days with different concentrations: I(x)=J(x)e-β(λ)d(x)+A(1-e-β(λ)d(x));I(x)=J(x)e -β(λ)d(x) +A(1-e -β(λ)d(x) ); 其中,x表示图像的像素位置,J(x)表示开源的无雾道路车辆目标检测数据集中的图像,I(x)表示对应的合成雾天的对应图像,β(λ)表示大气散射系数,d(x)表示通过单目图像深度估计算法MiDaS估算得到的与J(x)对应的景深图,A表示无穷远处的大气光;通过调整β(λ)与A的值,合成不同浓度雾天的对应图像目标检测数据集;Where x represents the pixel position of the image, J(x) represents the image in the open-source fog-free road vehicle target detection dataset, I(x) represents the corresponding image of the synthetic foggy day, β(λ) represents the atmospheric scattering coefficient, d(x) represents the depth map corresponding to J(x) estimated by the monocular image depth estimation algorithm MiDaS, and A represents the atmospheric light at infinity. By adjusting the values of β(λ) and A, the corresponding image target detection datasets of foggy days with different concentrations are synthesized. 步骤102,将开源的无雾道路车辆目标检测数据集、合成不同浓度雾天的对应图像目标检测数据集组成去雾网络训练集;将开源的无雾道路车辆目标检测数据集、合成不同浓度雾天的对应图像目标检测数据集以及开源的真实雾天道路车辆目标检测数据集组成整体网络训练集。Step 102, the open source fog-free road vehicle target detection dataset and the corresponding image target detection dataset of synthetic fog with different concentrations are combined into a defogging network training set; the open source fog-free road vehicle target detection dataset, the corresponding image target detection dataset of synthetic fog with different concentrations and the open source real foggy road vehicle target detection dataset are combined into an overall network training set. 3.根据权利要求1所述的自适应目标特征增强的实时雾天道路车辆目标检测方法,其特征在于,步骤2中的轻量级去雾模块包括第一级编码器、第一级解码器、第二级编码器、第二级解码器以及图像级联层;3. 
The real-time foggy road vehicle target detection method with adaptive target feature enhancement according to claim 1, characterized in that the lightweight defogging module in step 2 comprises a first-stage encoder, a first-stage decoder, a second-stage encoder, a second-stage decoder and an image cascade layer; 其中,第一级编码器包括依次连接的1个步长为1输出通道数为8的1×1卷积层、第一Relu激活函数层、2个步长为1输出通道数为8的3×3卷积层、第二Relu激活函数层、1个步长为2窗口大小为3×3的第一最大池化层、2个步长为1输出通道数为16的3×3卷积层、第三Relu激活函数层、1个步长为2窗口大小为3×3的第二最大池化层、1个步长为1输出通道数为32的3×3卷积层以及第四Relu激活函数层;Among them, the first-level encoder includes a 1×1 convolution layer with a stride of 1 and an output channel number of 8, a first Relu activation function layer, two 3×3 convolution layers with a stride of 1 and an output channel number of 8, a second Relu activation function layer, a first maximum pooling layer with a stride of 2 and a window size of 3×3, two 3×3 convolution layers with a stride of 1 and an output channel number of 16, a third Relu activation function layer, a second maximum pooling layer with a stride of 2 and a window size of 3×3, a 3×3 convolution layer with a stride of 1 and an output channel number of 32, and a fourth Relu activation function layer. 
The first-stage decoder comprises, connected in sequence: one 3×3 convolution layer with stride 1 and 32 output channels, a fifth ReLU activation layer, a first concatenation layer, one 3×3 convolution layer with stride 1 and 16 output channels, a sixth ReLU activation layer, a first upsampling layer with stride 2, a second concatenation layer, one 3×3 convolution layer with stride 1 and 8 output channels, a seventh ReLU activation layer, a second upsampling layer with stride 2, one 3×3 convolution layer with stride 1 and 3 output channels, and an eighth ReLU activation layer. The output of the fourth ReLU activation layer of the first-stage encoder is the input of the 3×3 convolution layer with stride 1 and 32 output channels in the first-stage decoder. The first concatenation layer of the first-stage decoder concatenates the output of the second max-pooling layer of the first-stage encoder with the output of the fifth ReLU activation layer of the first-stage decoder; the second concatenation layer of the first-stage decoder concatenates the output of the first max-pooling layer of the first-stage encoder with the output of the first upsampling layer of the first-stage decoder. The second-stage encoder comprises, connected in sequence: one 1×1 convolution layer with stride 1 and 8 output channels, a ninth ReLU activation layer, two 3×3 convolution layers with stride 1 and 8 output channels, a tenth ReLU activation layer, a third max-pooling layer with stride 2 and a 3×3 window, a third concatenation layer, two 3×3 convolution layers with stride 1 and 16 output channels, an eleventh ReLU activation layer, a fourth max-pooling layer with stride 2 and a 3×3 window, a fourth concatenation layer, one 3×3 convolution layer with stride 1 and 32 output channels, and a twelfth ReLU activation layer. The third concatenation layer of the second-stage encoder concatenates the output of the first max-pooling layer of the first-stage encoder with the output of the third max-pooling layer of the second-stage encoder; the fourth concatenation layer of the second-stage encoder concatenates the output of the second max-pooling layer of the first-stage encoder with the output of the fourth max-pooling layer of the second-stage encoder. The second-stage decoder comprises, connected in sequence: one 3×3 convolution layer with stride 1 and 32 output channels, a thirteenth ReLU activation layer, a fifth concatenation layer, one 3×3 convolution layer with stride 1 and 16 output channels, a fourteenth ReLU activation layer, a third upsampling layer with stride 2, a sixth concatenation layer, one 3×3 convolution layer with stride 1 and 8 output channels, a fifteenth ReLU activation layer, a fourth upsampling layer with stride 2, two 3×3 convolution layers with stride 1 and 3 output channels, and a sixteenth ReLU activation layer. The output of the twelfth ReLU activation layer of the second-stage encoder is the input of the 3×3 convolution layer with stride 1 and 32 output channels in the second-stage decoder. The fifth concatenation layer of the second-stage decoder concatenates the output of the fifth ReLU activation layer of the first-stage decoder, the output of the fourth max-pooling layer of the second-stage encoder, and the output of the thirteenth ReLU activation layer of the second-stage decoder; the sixth concatenation layer of the second-stage decoder concatenates the output of the first upsampling layer of the first-stage decoder, the output of the third max-pooling layer of the second-stage encoder, and the output of the third upsampling layer of the second-stage decoder. 4. The real-time foggy-road vehicle target detection method with adaptive target feature enhancement according to claim 3, wherein the lightweight adaptive target feature enhancement module of step 2 comprises a hyperparameter prediction module, a differentiable image processing module, and a gating module.
5. The real-time foggy-road vehicle target detection method with adaptive target feature enhancement according to claim 4, wherein the hyperparameter prediction module comprises, connected in sequence: one 3×3 convolution layer with stride 1 and 16 output channels, a seventeenth ReLU activation layer, a fifth max-pooling layer with stride 2 and a 3×3 window, one 3×3 convolution layer with stride 1 and 32 output channels, an eighteenth ReLU activation layer, a sixth max-pooling layer with stride 2 and a 3×3 window, one 3×3 convolution layer with stride 1 and 32 output channels, a nineteenth ReLU activation layer, a seventh max-pooling layer with stride 2 and a 3×3 window, one 3×3 convolution layer with stride 1 and 64 output channels, a twentieth ReLU activation layer, an eighth max-pooling layer with stride 2 and a 3×3 window, one 3×3 convolution layer with stride 1 and 64 output channels, a twenty-first ReLU activation layer, a ninth max-pooling layer with stride 2 and a 3×3 window, one 3×3 convolution layer with stride 1 and 128 output channels, a twenty-second ReLU activation layer, one global average pooling layer, one first fully connected layer with 256 output neurons, and two parallel fully connected layers, the second and the third. The second fully connected layer has 15 output neurons, and the third fully connected layer has 6 output neurons. The input of the 3×3 convolution layer with stride 1 and 16 output channels is the refined defogged image output by the lightweight defogging module; the output of the second fully connected layer is the 15 hyperparameters of the differentiable image processing module, and the output of the third fully connected layer is the 6 weight parameters of the gating module. 6. The real-time foggy-road vehicle target detection method with adaptive target feature enhancement according to claim 5, wherein the differentiable image processing module comprises six order-free differentiable image processors: exposure adjustment, white balance, gamma transform, contrast enhancement, sharpening, and color adjustment. The differentiable image processing module receives the refined defogged image output by the lightweight defogging module and the 15 hyperparameters output by the hyperparameter prediction module, and outputs the correspondingly processed images. The exposure adjustment is expressed by the mapping function: J_exposure(x) = 2^E × J_fine(x); where J_fine(x) denotes the refined defogged image output by the lightweight defogging module, J_exposure(x) is the exposure-adjusted refined defogged image, and E is the exposure-adjustment hyperparameter.
The white balance is expressed by the mapping function: J_wb(x) = (W_r·r(x), W_g·g(x), W_b·b(x)); where J_wb(x) is the white-balanced refined defogged image, W_r, W_g, W_b denote the white-balance hyperparameters of the r, g, b color channels, and r(x), g(x), b(x) denote the pixel values of the refined defogged image J_fine(x) in the r, g, b color channels. The gamma transform is expressed by the mapping function: J_gamma(x) = J_fine(x)^γ; where J_gamma(x) is the gamma-transformed refined defogged image and γ denotes the gamma-transform hyperparameter. The contrast enhancement is expressed by the mapping function: J_contrast(x) = α×En(J_fine(x)) + (1−α)×J_fine(x); where J_contrast(x) is the contrast-enhanced refined defogged image and α denotes the contrast-enhancement hyperparameter; Lum(J_fine(x)) = 0.3·r(x) + 0.59·g(x) + 0.11·b(x); Sigmoid() is a nonlinear activation function, Lum() denotes the pixel luminance function, and En() denotes the image brightness-enhancement function. The sharpening is expressed by the mapping function: J_sharpen(x) = J_fine(x) + λ_1·(J_fine(x) − Gaussian(J_fine(x))); where J_sharpen(x) is the sharpened refined defogged image, λ_1 is the sharpening hyperparameter, and Gaussian() is the Gaussian filtering function. The color adjustment is expressed by the mapping function: J_tone(x) = (1/Σ_{i=0}^{7} t_i) × Σ_{i=0}^{7} clip(8·J_fine(x) − i, 0, 1)·t_i; where J_tone(x) is the color-adjusted refined defogged image, t_i is the (i+1)-th color-adjustment hyperparameter, i = 0, 1, 2, ..., 7, and clip(y, 0, 1) denotes the truncation function. 7. The real-time foggy-road vehicle target detection method with adaptive target feature enhancement according to claim 6, wherein the gating module receives the 6 weight parameters output by the hyperparameter prediction module and performs adaptive weighted fusion of the images processed by the six order-free differentiable image processors, yielding the adaptively enhanced image J_enh(x): J_enh(x) = w_1·J_exposure(x) + w_2·J_wb(x) + w_3·J_gamma(x) + w_4·J_contrast(x) + w_5·J_sharpen(x) + w_6·J_tone(x); where w_1 is the exposure-adjustment weight parameter, w_2 the white-balance weight parameter, w_3 the gamma-transform weight parameter, w_4 the contrast-enhancement weight parameter, w_5 the sharpening weight parameter, and w_6 the color-adjustment weight parameter.
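As an illustrative aid outside the claims, the differentiable processors and the gated fusion described above can be sketched as pure functions on an RGB array. This is a minimal NumPy sketch, not the patent's implementation; the function names are illustrative, contrast enhancement is omitted because the patent's En() formula is not reproduced in the text, and the precomputed-blur argument of `sharpen` is an assumption:

```python
import numpy as np

def exposure(J, E):
    # J_exposure(x) = 2^E * J_fine(x)
    return (2.0 ** E) * J

def white_balance(J, Wr, Wg, Wb):
    # per-channel gains on the r, g, b planes of an HxWx3 image
    return J * np.array([Wr, Wg, Wb])

def gamma_transform(J, gamma):
    # J_gamma(x) = J_fine(x)^gamma, clamped away from 0 for stability
    return np.clip(J, 1e-6, 1.0) ** gamma

def sharpen(J, lam, J_blur):
    # unsharp masking: J + lambda_1 * (J - Gaussian(J));
    # the Gaussian-blurred image is assumed precomputed and passed in
    return J + lam * (J - J_blur)

def gated_fusion(processed, weights):
    # claim 7: J_enh = sum_k w_k * J_k over the processed images
    return sum(w * Jk for w, Jk in zip(weights, processed))

# toy 2x2 RGB image with all pixels at 0.25
J = np.full((2, 2, 3), 0.25)
outs = [exposure(J, 1.0), white_balance(J, 1.0, 1.0, 1.0)]
J_enh = gated_fusion(outs, [0.5, 0.5])  # 0.5*0.5 + 0.5*0.25 = 0.375
```

Because every operator is a smooth (or piecewise-smooth) function of its hyperparameters, gradients from the detection loss can flow back through the fusion into the hyperparameter prediction module, which is the point of making the processors differentiable.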
8. The real-time foggy-road vehicle target detection method with adaptive target feature enhancement according to claim 7, wherein step 3 specifically comprises: Step 301: input the images of the defogging-network training set into the 1×1 convolution layer with stride 1 and 8 output channels of the first-stage encoder of the lightweight defogging module; the eighth ReLU activation layer of the first-stage decoder outputs a 3-channel coarse mapping K_coarse(x), and the coarse defogged image corresponding to each image of the defogging-network training set is computed from K_coarse(x) as: J_coarse(x) = K_coarse(x)×I'(x) − K_coarse(x) + 1; where I'(x) denotes an image of the defogging-network training set, I'(x) = I(x) or J(x), I'(x) is a 3-channel RGB image, and J_coarse(x) is the coarse defogged image. The image concatenation layer of the lightweight defogging module concatenates each image of the defogging-network training set with the corresponding coarse defogged image output by the first-stage decoder, yielding a 6-channel concatenated image; the concatenated image is input into the 1×1 convolution layer with stride 1 and 8 output channels of the second-stage encoder of the lightweight defogging module, and the sixteenth ReLU activation layer of the second-stage decoder outputs a 3-channel fine mapping K_fine(x); the fine defogged image corresponding to each image of the defogging-network training set is computed from K_fine(x) as: J_fine(x) = K_fine(x)×I'(x) − K_fine(x) + 1; Step 302: take the open-source fog-free road vehicle target detection dataset as the defogging ground-truth labels; compute the loss between each coarse defogged image and its defogging ground-truth label with the L1 loss function, denoted loss_1, and the loss between each fine defogged image and its defogging ground-truth label with the L2 loss function, denoted loss_2; the total loss of the lightweight defogging module is Loss = 0.4·loss_1 + 0.6·loss_2; based on the total loss, train the lightweight defogging module by back-propagation with gradient descent until the total loss converges, then stop training and save the current lightweight defogging module.
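The coarse and fine stages of step 301 apply the same per-pixel K-map reconstruction J(x) = K(x)·I'(x) − K(x) + 1, and step 302 mixes an L1 and an L2 term with weights 0.4 and 0.6. A minimal NumPy sketch of just these two computations (the encoder–decoder networks that predict K are omitted):

```python
import numpy as np

def k_defog(I, K):
    # J(x) = K(x) * I'(x) - K(x) + 1, applied per pixel and per channel
    return K * I - K + 1.0

def defog_loss(J_coarse, J_fine, gt):
    # Loss = 0.4 * L1(coarse, gt) + 0.6 * L2(fine, gt)
    loss1 = np.mean(np.abs(J_coarse - gt))
    loss2 = np.mean((J_fine - gt) ** 2)
    return 0.4 * loss1 + 0.6 * loss2

I = np.random.default_rng(0).uniform(0.0, 1.0, (4, 4, 3))
# with K(x) = 1 everywhere, the mapping reduces to the identity: J = I
J_identity = k_defog(I, np.ones_like(I))
```

A useful sanity check of the formulation: when the predicted map is identically 1 the reconstruction returns the input unchanged, so the network only has to learn deviations from the identity to remove haze.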
9. The real-time foggy-road vehicle target detection method with adaptive target feature enhancement according to claim 8, wherein step 4 specifically comprises: Step 401: freeze the parameters of the lightweight defogging module and input the images of the overall-network training set into the real-time foggy-road vehicle target detection model; the lightweight defogging module outputs the refined defogged image to the hyperparameter prediction module and to the differentiable image processing module, and the hyperparameter prediction module outputs the 15 hyperparameters of the differentiable image processing module and the 6 weight parameters of the gating module. The differentiable image processing module, from the 15 hyperparameters output by the hyperparameter prediction module and the refined defogged image output by the lightweight defogging module, produces the exposure-adjusted, white-balanced, gamma-transformed, contrast-enhanced, sharpened, and color-adjusted refined defogged images. The gating module, using the 6 weight parameters output by the hyperparameter prediction module, performs weighted fusion of the six processed refined defogged images output by the differentiable image processing module to obtain the corresponding adaptively enhanced image. The adaptively enhanced image is input into the target detection network YOLOv5, which outputs the categories and coordinates of the vehicle targets in the images of the overall-network training set; the categories and coordinates of the vehicle targets annotated in the overall-network training set serve as the detection ground-truth labels; using YOLOv5's own loss function, compute the loss between the predicted categories and coordinates and the detection ground-truth labels; based on this loss, train the hyperparameter prediction module and the target detection network YOLOv5 by back-propagation with gradient descent until the loss converges, then stop training and save the current real-time foggy-road vehicle target detection model.
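Across claims 5 and 6, the 15 predicted hyperparameters decompose as 1 (exposure) + 3 (white balance) + 1 (gamma) + 1 (contrast) + 1 (sharpening) + 8 (color adjustment), alongside the 6 gate weights from the parallel head. A sketch of splitting the two fully connected heads' outputs; the ordering of the 15 values within the vector is an assumption for illustration, not stated in the claims:

```python
import numpy as np

def split_hyperparams(head15, head6):
    # head15: output of the second fully connected layer (15 neurons)
    # head6:  output of the third fully connected layer (6 neurons)
    assert head15.shape == (15,) and head6.shape == (6,)
    params = {
        "E": head15[0],        # exposure adjustment
        "W_rgb": head15[1:4],  # white-balance gains W_r, W_g, W_b
        "gamma": head15[4],    # gamma transform
        "alpha": head15[5],    # contrast enhancement
        "lambda1": head15[6],  # sharpening
        "tone": head15[7:15],  # 8 color-adjustment parameters t_0..t_7
    }
    return params, head6       # gate weights w_1..w_6

params, gates = split_hyperparams(np.arange(15.0), np.full(6, 1.0 / 6.0))
```

The 1+3+1+1+1+8 budget is what fixes the second head at exactly 15 neurons; changing any operator's parameter count would change that layer's width.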
CN202411732235.XA 2024-11-29 2024-11-29 Real-time foggy-day road vehicle target detection method with enhanced self-adaptive target characteristics Active CN119851242B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411732235.XA CN119851242B (en) 2024-11-29 2024-11-29 Real-time foggy-day road vehicle target detection method with enhanced self-adaptive target characteristics

Publications (2)

Publication Number Publication Date
CN119851242A true CN119851242A (en) 2025-04-18
CN119851242B CN119851242B (en) 2025-10-14

Family

ID=95358795

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411732235.XA Active CN119851242B (en) 2024-11-29 2024-11-29 Real-time foggy-day road vehicle target detection method with enhanced self-adaptive target characteristics

Country Status (1)

Country Link
CN (1) CN119851242B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5502245A (en) * 1995-06-07 1996-03-26 Twenty-First Century Research Corporation Methods of making intermediate oxidation products by controlling transient conversion in an atomized liquid
CN115661777A (en) * 2022-11-03 2023-01-31 西安邮电大学 Semantic-combined foggy road target detection algorithm
CN116681603A (en) * 2023-04-21 2023-09-01 中国科学技术大学 Image defogging method, device, electronic equipment and computer storage medium
CN116824542A (en) * 2023-06-13 2023-09-29 重庆市荣冠科技有限公司 Light-weight foggy-day vehicle detection method based on deep learning
CN116912569A (en) * 2023-07-11 2023-10-20 南京理工大学 An improved Yolov5 dead wood detection method combined with defogging algorithm
CN117115617A (en) * 2023-08-04 2023-11-24 三峡大学 A method for identifying small target faults in foggy weather scenes of transmission lines
CN118864325A (en) * 2024-07-29 2024-10-29 华南师范大学 An end-to-end rapid defogging detection joint method and device

Non-Patent Citations (1)

Title
YIN Xuping; ZHONG Ping; XUE Wei; XIAO Zixuan; LI Guang: "Object detection method for UAV images under dense-fog weather conditions via style transfer", Aero Weaponry (《航空兵器》), vol. 28, no. 3, 14 January 2021 (2021-01-14), p. 22 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant