Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, and circuit methods are omitted so as not to obscure the description of the present application with unnecessary detail.
Bad weather often has physical effects on the sensor, such as the camera lens being blocked or snow covered, resulting in a significant degradation of the input image quality. In addition, weather phenomena such as sand storm, rainfall, dense fog and the like easily cause dynamic changes of the background, so that the difficulty in distinguishing between a foreground object and the background is increased. Particularly, under the combined action of the factors, the identification of abnormal weather is greatly interfered, so that the false detection rate is obviously increased.
In order to solve the problem, the application provides a weather identification model, which comprises a backbone network and a neck network detection head. The method comprises the steps of obtaining image characteristics by carrying out characteristic extraction on an input image to be detected, namely an image to be detected comprising weather information, carrying out characteristic fusion on the image characteristics by a neck network to obtain target fusion characteristics, and detecting abnormal weather damage detection results in the image to be processed by a detection head.
In order to help the model better cope with the detection of images with poor quality and indistinguishable foreground and background, the EGMA module can enhance the expression capability of key features through fusion of at least two attention mechanisms and improve the global association strength between the features, which helps the model better cope with images with poor quality and indistinguishable panoramic background, such as images with occlusion or blurred images. The target fusion characteristics obtained through the neck network extraction fusion are subjected to weather characteristics with strong global relevance and high distinction, so that the expression capacity of the target fusion characteristics is remarkably improved, and further, the recognition accuracy of images with poor image quality and indistinguishable foreground and background is improved.
In some embodiments, the EGMA module includes a channel aware modulation sub-module, an ECA sub-module, a cross-channel spatial attention sub-module ESA sub-module. The mixing of multiple attention mechanisms can not only strengthen the expression of key information in the characteristics, but also effectively promote the information communication across channels, accurately extract important characteristics of space dimension, and effectively cope with the model even if abnormal weather is identified in a scene with poor image quality caused by bad weather, namely, the structural improvement can comprehensively improve the understanding and processing capacity of the model to complex modes.
In some embodiments, the channel aware modulation submodule includes a dimension substitution layer, a double-layer multi-perceptron layer, an inverse dimension substitution layer, and an activation function layer first weighting layer. The channel perception modulation submodule enables a network to learn importance and relevance among channels through collaborative design of each network layer, so that key channels are selectively enhanced, redundant channels are restrained, effective characteristic response is highlighted, discrimination and robustness of characteristic expression are improved, the target fusion characteristics are more differentiated, and perception capability of the model to abnormal weather under a complex scene is integrally enhanced.
In some embodiments, the cross-channel spatial attention submodule includes a channel shuffling layer, a CBR layer, a CBS layer, and a second weighting layer. The introduction of the channel shuffling layer promotes the information interaction across channels, and the CBR and CBS layers sequentially extract the spatial attention characteristics and perform nonlinear transformation and normalization processing, thereby being beneficial to enhancing the spatial response of a significant region. Therefore, the overall design of the cross-channel space attention sub-module can improve the attention capability of the model to the key space region, strengthen the space feature expression and provide more-identified space information for the final target fusion feature, thereby improving the perception precision and the robustness under a complex scene.
Abnormal weather, especially sand storm, fog, rainfall, snow, etc., not only easily reduces the visibility of the environment, and obscures the image captured by the camera, affecting the quality of the input image, but also easily causes significant changes in image contrast, for example, overexposure is easily caused by white background in ice and snow weather, and the sand storm and fog may darken the image as a whole. The change of the image contrast not only weakens the visibility of abnormal weather characteristics, but also increases the difficulty of characteristic extraction and the accuracy of an interference recognition algorithm, so that the probability of false detection and omission is greatly improved.
In some embodiments, to reduce the probability of false detection and false omission, the backbone network is provided with a Progressive receptive field Fusion (C2 f Progressive RECEPTIVE FIELD Fusion, C2 fPRFF) module, including a first convolution layer, a Progressive Fusion structure formed by at least two PRFFBlock, and a first splice layer and a second convolution layer. The gradual fusion structure is used for gradually superposing and fusing the characteristics through PRFFBlock of the multi-level receptive field, so that multi-scale characteristic information can be effectively captured, and the perceptibility of targets and context structures with different sizes is improved. The module is helpful for enhancing the richness and layering of the feature expression, and extracting the feature expression which is more robust and has discrimination.
In some embodiments, PRFFBlock comprises a progressive convolutional structure, a second splice layer, a third convolutional layer, an ESE layer, a third splice layer, the progressive convolutional structure comprising at least two convolutional layers in series. The progressive convolution structure enables the model to learn finer context information and capture richer spatial hierarchy features by continuously and progressively stacking a plurality of convolution operations and performing feature fusion. The improvement obviously enhances the performance of the model in a complex environment, particularly in severe weather conditions, effectively improves the accuracy and the robustness of detection, and reduces the probability of false detection and missed detection.
In some embodiments, the weather identification model may be developed based on the YOLO series model in view of the following advantages of the YOLO series model:
The YOLO series models are all end-to-end single-stage detection frameworks, and can realize efficient target detection. Compared with a two-stage detector (such as a fast R-CNN), the YOLO directly returns to the bounding box and class through single forward propagation, so that the detection speed is improved, and the method is suitable for real-time application. The structure is continuously optimized, such as an Anchor-Free mechanism, a feature pyramid (FPN, PAN) and the like are introduced, so that the detection capability of small targets and multi-scale targets is enhanced. In addition, the YOLO series model is continuously optimized in the aspects of light weight, calculation efficiency, robustness and the like, so that the YOLO series model has good adaptability in embedded equipment and cloud reasoning scenes. YOLOv8 is a significant upgrade of the YOLO series, supporting tasks such as object detection, image classification, and instance segmentation. The architecture consists of a Backbone network (Backbone), a neck network (Neck) and a detection Head (Head). The backbone network adopts a C2f module to improve the feature extraction efficiency, the neck network uses PANet structures to enhance multi-scale feature fusion, and the detection head introduces Anchor-Free design and DFL loss to improve the detection precision and flexibility. Furthermore YOLOv provides n/s/m/l/x five model variants, adapting to different scenarios. In the abnormal weather identification task, YOLOv s are selected for optimization so as to balance accuracy and instantaneity and improve detection effect.
Based on this, YOLOv s may be preferable in constructing the weather identification model. For example, if the network is based on YOLOv s, the network structure of the weather identification model can be referred to in fig. 1 by modifying the network with the EGMA module and the C2fPRFF module.
Based on the weather identification model of FIG. 1, the backbone network performs multi-scale feature extraction on the input image to be detected through the Group Conv-C2fPRFF-3× [ Group Conv-C2fPRFF ×2] -SPPF which are connected in sequence. Where Group Conv is a Group convolution, a special form of convolution operation that groups input features by channels, convolves each Group independently, and then concatenates the outputs of the groups together. Each [ Group Conv-C2fPRFF ×2] can be regarded as a feature extraction module of one scale, and 3 [ Group Conv-C2fPRFF ×2] are feature extraction modules of three scales connected in sequence. The rapid spatial pyramid pooling layer (SPATIAL PYRAMID Pooling-Fast, SPPF) is used for rapidly extracting multi-scale features on the premise of not changing the size of an input feature map, improving the perceptibility of the model to targets with different sizes and reducing the calculated amount. The SPPF uses continuous pooling, splicing and convolution to rapidly extract multi-scale information, so that the feature expression capability is enhanced, and the calculated amount is small. The output of the feature extraction module of the last scale passes through the SPPF, and the output of the SPPF is used as the input of the scale input neck network for feature extraction.
Based on feature extraction of each scale in the backbone network, the neck network is provided with feature fusion operation of the corresponding scale. Aiming at feature fusion of each scale, the output features of other two scales are spliced after the feature alignment, and then output to the corresponding decoupling detection head through the c2f module and the EGMA module, when the scales are aligned, the large scale is aligned with the small scale, convolution operation is adopted, and when the small scale is aligned with the large scale, up-sampling operation is adopted.
Aiming at the target fusion characteristics of each scale, the corresponding detection head detects the target fusion characteristics, and then a detection result can be output.
In the training process, the regression of the detection frame is finer and more stable through a4 Xreg_max distributed regression strategy, so that the positioning accuracy is improved, and the output layer is guided to generate a corresponding number of category prediction results through setting the total number of categories (nc), so that the detection task can cover all target categories.
After the trained weather identification model is put into the application stage, redundant detection frame removal (such as non-maximum suppression (NMS)) and confidence level filtering operation can be further introduced to remove overlapped detection frames and screen out a high-confidence prediction result, so that the accuracy and reliability of a final detection result are improved.
In order to improve the perception capability of a weather identification model, a backbone network is introduced with a C2fPRFF module, gradual fusion of multi-scale receptive fields is realized through stacked PRFFBlock, understanding of context information is enhanced, the capability of processing complex content is improved, and accordingly the probability of false detection and omission of abnormal weather identification is reduced.
In some embodiments, to more accurately address the challenges of anomalous weather identification, specialized datasets may be created for training weather identification models. Specifically, a dataset named abnormal weather identification (Abnormal WEATHER DETECT, AWD) is created. The dataset focuses on the collection and analysis of examples of unusual weather, and images in the dataset may originate from the network, and also from samples collected in the field in a real vehicle-mounted environment, to ensure a high degree of realism and wide diversity of data. The AWD dataset is unique in that it covers a variety of abnormal weather scenarios, providing valuable and representative resources for model training and assessment.
The application focuses on identifying and classifying four common abnormal weather conditions, and aims at accurately coping with multiple challenges in representative weather identification. These four abnormal weather categories may include sand storm, fog, rainfall and ice and snow weather, each of which is finely divided and defined. The AWD data set created based on four abnormal weather covers 3000 high quality images, providing a rich visual resource for research. Each image is manually marked.
Illustratively, each scene may include multiple time periods such as day and night to improve the complexity and comprehensiveness of the scene, and each image may be configured with a corresponding txt tag file detailing the location and category of the abnormal weather.
In order to ensure the effectiveness of model training and the reliability of evaluation results, each image in the data set is divided into a training set and a verification set according to the proportion of 8:2.
In some embodiments, to improve the accuracy of the localization of anomalous weather and the convergence speed of the model, the loss function includes a classification loss during training of the weather identification model. Specifically, the classification Loss uses a Binary Cross entropy Loss (BCE Loss), which is a Loss function commonly used for the two-classification problem, and measures the difference between the predicted class and the true class of the model, for judging the specific class in the anchor frame. Specifically, for each sample, if its true class is y i ε [0,1], the predicted class isE [0,1], the fractional cross entropy loss can be expressed as:
Wherein L BCE is BCELoss, n is the number of image samples of all city facility anomalies, y i is the true class of the ith image sample, Is the prediction category of the i-th image sample.
The bounding box regression loss typically includes a full cross ratio loss (Complete Intersection over Union loss, CIoU Loss) and a dynamic class loss (Distribution Focal Loss, DFL loss) for measuring the error between the predicted and real bounding boxes, respectively. CIoU Loss is obtained by modifying common loss functions such as the cross-over ratio (Intersection over Union, ioU), the generalized cross-over ratio (Generalized Intersection over Union, GIoU), the distance cross-over ratio (Distance Intersection over Union, DIoU) and the like. It adds a penalty term for aspect ratio compared to previous loss functions, can better distinguish errors in different situations when the center points of the predicted and real borders coincide, and has scale invariance, and its formula is as follows:
wherein L CIoU is CIoU Loss, b and Representing the center point of a real frame and the center point of a predicted frame respectively, ρ being the Euclidean distance between the predicted frame and the real frame, c being the distance of the diagonal of the closed region of the predicted frame and the real frame, v being the consistency of the relative proportions of the predicted frame and the real frame, ioU being the intersection ratio of the predicted frame and the real frame, α being the weight coefficient, w andThe width of the predicted frame and the width of the real frame are respectively, and h andThe heights of the predicted and real borders, respectively.
The introduction of DFL loss may further improve the regression accuracy of the bounding box based on CIoU Loss. The DFL loss performs coordinate discretization processing on uncertainty in boundary frame coordinate prediction, so that the regression process is finer and more robust, and prediction errors are effectively reduced. The formula of DFL loss is as follows:
Wherein S i is the cross entropy loss of the real frame on the left and the predicted frame, and S i+1 is the cross entropy loss of the real frame on the right and the predicted frame.
However, the CIoU loss, while taking into account three important factors of positioning loss, overlap area, center point distance, and aspect ratio, is problematic in the design of the αv in the CIoU loss equation, resulting in an impaired convergence speed.
In order to improve the convergence speed, on the basis of the original CIoU loss penalty term, the aspect ratio penalty term is split, the width penalty and the height penalty are processed respectively, and meanwhile, the angle penalty term is introduced, so that a new IoU loss is provided, and the intersection ratio loss (Kombine Intersection over Union, KIoU) loss is fused. The loss function includes overlap loss, center distance loss, wide-height loss, and angle loss. Referring to fig. 2, the core idea of the kiou penalty includes first guiding the fast fit of the prediction box to the nearest axis position, i.e. letting the center point of the prediction box and the center point of the real box be parallel to the horizontal or vertical direction. Then, the prediction frame only needs to be further adjusted along one direction. By introducing the angle penalty term, the total number of degrees of freedom can be effectively reduced, and the convergence speed is increased. Meanwhile, the high-width ratio punishment term is split into a high punishment term and a wide punishment term, so that the prediction errors of the model to different directions can be adjusted more accurately. Splitting the aspect ratio penalty term can avoid considering the aspect ratio as a whole, allowing the error in each dimension (width and height) to be more independent and targeted optimized, thus making the predicted positioning result more accurate.
Specifically, the equation for KIoU loss is as follows:
Wherein, the Is fusion ratio loss, ioU is the fusion ratio between the predicted frame and the real frame, L IoU is the fusion ratio loss, L dis is the center distance loss, L shp is the width and height loss, L ang is the angle loss, delta is 2 times the center distance loss, b and bRespectively representing the center point of the real frame and the center point of the prediction frame; respectively representing squares of Euclidean distances between center points of the real frames and the predicted frames; And Respectively representing the width and the height of the minimum circumscribed rectangular frame, the width and the height loss of omega being 2 times, and the wThe widths of the predicted frame and the real frame are respectively; representing the square of the Euclidean distance between the real and predicted frame widths, h and The heights of the predicted frame and the real frame are respectively; representing the square of the Euclidean distance between the real frame and the predicted frame height; an angle loss of 2 times; And The width and height of a rectangular frame constructed from the center points of the real and predicted frames are represented, respectively.
Considering that there is a problem of sample imbalance in the bounding box regression process, i.e., the number of high quality prediction boxes (small errors) in one image is far less than the low quality prediction boxes (large errors). These low quality prediction frames may produce excessive gradients that adversely affect the training process. To solve this problem, the present application proposes a new Loss function combining Focal Loss and KIoU losses, called Focal fusion cross-ratio Loss (Focal Kombine Intersection over Union, focal KIoU Loss). The formula is as follows:
Wherein, the For focusing fusion cross ratio loss, γ is a parameter controlling the degree of outlier suppression.
The formula of the total Loss function Loss is as follows, combining the above formulas:
Where λ 1 and λ 2 are equilibrium coefficients.
The weather identification in any of the foregoing embodiments is performed based on the total loss function, so that the model convergence speed can be improved, and further, the model training efficiency can be improved. The weather identification model is iterated continuously, so that the model has a good abnormal weather identification effect and tends to be stable, and finally a robust version can be selected from multiple converged versions to serve as a trained weather identification model.
In some embodiments, weather identification model training is optimized in combination with classification loss and regression loss. The classification Loss adopts a two-class cross entropy Loss (BCE Loss) for determining the anchor frame class, and the regression Loss consists of Focal KIoU Loss and DFL Loss to measure the error of the predicted frame and the real frame. And the positive and negative sample matching strategy adopts a TAL dynamic matching mode, so that the target distribution is optimized, and the detection precision is improved.
In some embodiments, in order to comprehensively and accurately measure the performance of the weather identification models of all versions, after at least one version of the converged weather identification model is obtained based on training set training, the converged model can be evaluated through a verification set so as to avoid over-fitting of the model, verify generalization of the model and ensure stable operation of the model after deployment.
In particular, the performance of the weather identification model on the verification set may be evaluated according to preset conditions.
For example, the preset conditions may include performance indexes such as intersection ratio, detection accuracy and recall rate of the abnormal weather by the abnormal weather identification device.
IoU is an index for evaluating the overlapping degree of the predicted frame and the real frame, and is calculated as the ratio of the intersection and union of the predicted frame and the real frame, which plays a key role in determining whether to correctly detect. The calculation formula for IoU can be written as:
Precision (p), also known as Precision, refers to the proportion of positive predictions that are positive to all predictions, as shown in the formula:
recall (R), also known as Recall, refers to a proportion that is predicted to be positive to all actual positive, as shown in:
F1-Score is a comprehensive index for measuring the performance of the model, and combines the dual advantages of accuracy (Precision) and Recall (Recall) for comprehensively evaluating the balanced performance of the model in abnormal weather detection tasks, as shown in the formula:
the average precision (AveragePrecision, AP) is calculated by the precision and the recall, a line graph of the precision is drawn according to the recall, and the area under the line graph is calculated, wherein the area is shown as the formula:
the average accuracy mean (meanAveragePrecision, mAP) refers to the average of the average accuracy AP of C different abnormal weather categories, as shown in:
that is, after the weather identification models of the respective versions are verified by the verification set, the weather identification models of the respective versions may be comprehensively evaluated based on the above-described several indexes so that the weather identification model with the best performance is determined from the respective versions as the weather identification model after training.
In some embodiments, the model runtime environment includes Intel Xeon Platinum 8235C processor, 314 GB memory, NVIDIA TESLA V100 32 GB graphics card, operating system CentOS 8.5.2 (64 bits). The weather identification model is constructed based on PyTorch frames, the input image size is [640,640], and a multi-scale training strategy is adopted. The experiment sets the batch size to 64, trains 200 rounds (epochs), adopts an SGD optimizer with the initial learning rate of 0.01 in the model training process, and dynamically adjusts the learning rate by combining a cosine attenuation strategy. The SGD optimizer may set the Momentum factor (Momentum) to 0.937 to assist in accelerating the gradient descent process and suppressing oscillations, and the weight decay coefficient (WEIGHT DECAY) to 0.0005 for regularization model to prevent overfitting.
Based on the network structure of the weather identification model in the previous embodiment, the application provides an abnormal weather identification method based on deep learning.
The abnormal weather identification method based on deep learning provided by the embodiment of the application can be applied to mobile phones, tablet computers, vehicle-mounted equipment, augmented reality (augmentedreality, AR)/virtual reality (virtualreality, VR) equipment, notebook computers, ultra-mobilepersonalcomputer (UMPC), netbooks, personal Digital Assistants (PDA) and other electronic equipment, and the embodiment of the application does not limit the specific type of the electronic equipment.
In order to illustrate the technical solution proposed by the present application, the following describes each embodiment with an electronic device as an execution body.
FIG. 3 shows a schematic flow chart of the abnormal weather identification method based on deep learning provided by the application, which comprises the following steps:
And 310, the electronic equipment performs feature extraction on the image to be detected based on a backbone network of the weather identification model which is trained in advance.
And 320, the electronic equipment fuses the image characteristics based on the neck network of the weather identification model to obtain target fusion characteristics.
And 330, the electronic equipment detects the target fusion characteristic based on a detection network of the weather identification model to obtain a detection result of abnormal weather in the image to be detected.
In this embodiment, the EGMA module introduced by the neck network can enhance the expression capability of the key features through fusion of at least two attention mechanisms and improve the global association strength between the features, which helps the model to better process images with poor quality and indistinguishable panoramic backgrounds, such as images with occlusion or blurred images. The target fusion characteristics obtained through the neck network extraction fusion are subjected to weather characteristics with strong global relevance and high distinction, so that the expression capacity of the target fusion characteristics is remarkably improved, and further, the recognition accuracy of images with poor image quality and indistinguishable foreground and background is improved.
In some embodiments, the EGMA module includes a channel awareness modulation sub-module, an ECA sub-module, a cross-channel spatial attention sub-module ESA sub-module, for a first input feature of the input EGMA module, the electronic device may perform the following operations:
and A1, the electronic equipment executes channel characteristic enhancement operation on the first input characteristic through the channel perception modulation submodule to obtain a first enhancement characteristic.
The electronic equipment processes the first input features through the channel perception modulation submodule, and the module can model the dependency relationship among channels, selectively strengthen key channels and inhibit invalid channels, so that the first enhancement features with stronger discrimination capability are obtained.
And A2, the electronic equipment executes channel attention operation on the first enhancement feature through the ECA submodule to obtain a second enhancement feature.
Based on the first enhancement feature, the electronic device further performs a lightweight channel Attention operation through an ECA (EFFICIENT CHANNEL Attention) submodule, refines and strengthens important channel information, suppresses irrelevant redundancy, and outputs a more discriminative second enhancement feature.
And A3, the electronic equipment executes the cross-channel spatial attention operation on the second enhancement feature through the cross-channel spatial attention sub-module to obtain the multi-attention enhancement feature.
The electronic equipment applies a cross-channel spatial attention sub-module to the second enhancement feature, enhances the perception of a key area and a position through information interaction and spatial feature modeling among channels, and obtains a multi-attention enhancement feature fusing the channels and the spatial attention.
And A43, the electronic equipment executes the spatial attention operation on the multi-attention enhancement feature through the ESA submodule to obtain a first output feature corresponding to the first input feature.
Finally, the electronic device executes the spatial Attention operation on the multi-Attention enhancement feature through an ESA (ENHANCED SPATIAL Attention) submodule, further enhances the response of the remarkable spatial information, and finally outputs a first output feature corresponding to the first input feature so as to improve the overall perception effect and the downstream task performance.
The embodiment constructs a multi-level and multi-dimensional attention enhancement mechanism by introducing channel perception modulation, ECA channel attention and cross-channel space attention ESA space attention module layer by layer. The process can effectively excavate and strengthen key information (especially channel information and space information) in the input characteristics, inhibit invalid or interference characteristics, and promote the discrimination and the robustness of characteristic expression on the basis of fully modeling the relation between channels and the space significant region, thereby providing higher-quality characteristic input for subsequent tasks.
In some embodiments, the channel perception modulation submodule comprises a dimension substitution layer, a double-layer multi-perception machine layer, an inverse dimension substitution layer and an activation function layer first weighting layer, and the channel perception modulation submodule is used for executing channel characteristic enhancement operation on the first input characteristic to obtain a first enhancement characteristic, and the method comprises the following steps:
And step A11, the electronic equipment performs displacement on each dimension parameter of the first input feature through a dimension displacement layer to obtain a flattened space feature.
The electronic device rearranges the spatial dimensions and the channel dimension of the first input feature through the dimension permutation layer, for example, transforming the feature from shape B×C×H×W (batch × channel number × height × width) to B×(H×W)×C, which flattens the spatial information while retaining the channel feature vector corresponding to each spatial position. This operation helps the subsequent layers model the dependencies between features in the channel dimension, thereby improving the expressive power and discriminability of the channel features.
And step A12, the electronic equipment executes a channel compression operation and a nonlinear activation operation on the flattened spatial feature through the first MLP layer to obtain a compressed feature.
And step A13, the electronic equipment performs a channel expansion operation on the compressed feature through the second MLP layer to obtain a first expansion feature matched with the dimensions of the flattened spatial feature.
The two MLP layers can effectively model the dependency relationships between channels. The electronic equipment performs channel compression and nonlinear activation on the flattened spatial feature through the first MLP layer, extracting key channel information to obtain a compact compressed feature; it then performs channel expansion on the compressed feature through the second MLP layer, restoring the original channel dimension to obtain the first expansion feature, which ensures alignment of the feature structures and maintains information integrity.
And step A14, the electronic equipment inversely permutes each dimension parameter of the first expansion feature through the inverse dimension permutation layer to obtain a reconstruction feature whose dimension parameters match the first input feature.
The electronic device restores the first expansion feature from B×(H×W)×C to the B×C×H×W space-channel arrangement by an inverse dimension permutation operation, resulting in a reconstruction feature consistent with the dimensions of the first input feature.
And step A15, the electronic equipment executes an activation operation on the reconstruction feature through the activation function layer to obtain the channel modulation weights.
And step A16, the electronic equipment performs weighting operation on the first input characteristic based on the channel modulation weight through the first weighting layer to obtain a first enhancement characteristic.
The electronic equipment performs normalization processing on the reconstruction feature through an activation function (such as Sigmoid) to generate the channel modulation weights, and then performs channel-by-channel weighting on the first input feature through the first weighting layer, highlighting key features and suppressing irrelevant interference, to obtain the first enhancement feature. In particular, the weighting operation essentially treats the channel modulation weights as a channel attention feature that is multiplied element by element with the first input feature.
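The channel perception modulation pipeline of steps A11-A16 can be sketched in PyTorch as follows. This is a minimal illustration, not the application's definitive implementation: the class name is invented, and the 1/4 reduction ratio and Sigmoid gate are taken from the later description of the embodiment.

```python
import torch
import torch.nn as nn

class ChannelPerceptionModulation(nn.Module):
    """Sketch of the channel perception modulation submodule (steps A11-A16)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Two MLP layers model inter-channel dependencies (steps A12-A13).
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # channel compression
            nn.ReLU(inplace=True),                       # nonlinear activation
            nn.Linear(channels // reduction, channels),  # channel expansion
        )
        self.act = nn.Sigmoid()  # activation layer producing modulation weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Step A11: permute B x C x H x W -> B x (H*W) x C (flattened spatial feature).
        x_perm = x.flatten(2).transpose(1, 2)
        # Steps A12-A13: compression, activation, expansion along the channel axis.
        x_exp = self.mlp(x_perm)
        # Step A14: inverse permutation back to B x C x H x W (reconstruction feature).
        x_rec = x_exp.transpose(1, 2).reshape(b, c, h, w)
        # Steps A15-A16: Sigmoid modulation weights, multiplied element-wise with the input.
        return x * self.act(x_rec)
```

Because the weights are broadcast per channel position, the output keeps the input's B×C×H×W shape, ready for the subsequent ECA submodule.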
In some embodiments, the cross-channel spatial attention sub-module includes a channel shuffling layer, a CBR layer, a CBS layer, and a second weighting layer, and performing a cross-channel spatial attention operation on the second enhancement feature to obtain the multi-attention enhancement feature includes:
And step A31, the electronic equipment performs a channel shuffling operation on the second enhancement feature through the channel shuffling layer to obtain a recombined feature.
The electronic equipment rearranges the second enhancement feature through the channel shuffling layer to promote information interaction among channels, mixing and enriching the feature information more effectively to obtain a recombined feature with stronger expressive capability.
And step A32, the electronic equipment sequentially executes a channel compression operation, a normalization operation, and a nonlinear transformation operation on the recombined feature through the CBR layer to obtain a compressed activation feature.
The electronic device sequentially performs channel compression, normalization, and nonlinear transformation operations on the recombined feature through the CBR layer (Convolution, Batch Normalization, ReLU, i.e., a combination of convolution, batch normalization, and the ReLU activation function) to extract a compact and discriminative compressed activation feature.
And step A33, the electronic equipment sequentially executes a channel expansion operation, a normalization operation, and an activation operation on the compressed activation feature through the CBS layer to obtain spatial weights.
The electronic device sequentially performs channel expansion, normalization, and Sigmoid activation operations on the compressed activation feature through the CBS layer (Convolution, Batch Normalization, Sigmoid, i.e., a combination of convolution, batch normalization, and the Sigmoid activation function) to generate normalized spatial attention weights that highlight discriminative regional features in the image while suppressing extraneous interference.
And step A34, the electronic equipment performs a weighting operation on the recombined feature based on the spatial weights through the second weighting layer to obtain the multi-attention enhancement feature.
Finally, the electronic equipment performs the weighting operation on the recombined feature based on the spatial weights through the second weighting layer to obtain the multi-attention enhancement feature. In particular, the weighting operation essentially treats the spatial weights as a spatial attention feature that is multiplied element by element with the recombined feature.
In this embodiment, by introducing a channel shuffling mechanism and a layer-by-layer compression-expansion structure, combined with feature enhancement guided by spatial attention weights, the electronic equipment effectively improves the spatial discrimination capability of the features and the collaborative perception capability among channels, strengthening the model's perception of key regions.
In some embodiments, fig. 4 shows a network architecture diagram of the EGMA module. Based on the network structure of fig. 4, for the first input feature, the electronic device first extracts the attention feature in the channel dimension through the channel attention sub-module, then performs a channel shuffling operation to promote interaction and fusion of cross-channel information, and then inputs the processed feature to the spatial attention sub-module to further mine salient features in the spatial dimension. Through this progressive processing, the model can focus more effectively on key regions and important channels, optimizing the feature representation and improving overall task performance.
Specifically, if the first input feature is represented as X ∈ R^(B×C×H×W), then in the channel attention sub-module, the first input feature is first converted from B×C×H×W to B×(H×W)×C by a dimension permutation operation. Next, a two-layer multilayer perceptron (MLP) is used to capture inter-channel dependencies: the first MLP layer reduces the channel number to 1/4 of the original and introduces nonlinearity through a ReLU activation function, and the second MLP layer restores the channel number to the original dimension to obtain the first expansion feature. Each dimension parameter of the first expansion feature is then restored to B×C×H×W through inverse dimension permutation to obtain the reconstruction feature, which generates the channel attention feature through a Sigmoid activation function. Finally, the channel attention feature is multiplied by the first input feature element by element to obtain the first enhancement feature, which is further processed by the ECA module to obtain the second enhancement feature Y_GCA. This process may be written as:

X_Permute = Permute(X)
X_GCA = Sigmoid(Reverse Permute(W_2 · ReLU(W_1 · X_Permute)))

wherein Permute is the dimension permutation operation, X_Permute is the flattened spatial feature, W_1 is the channel compression performed by the first MLP layer, ReLU is the nonlinear activation function, W_2 is the channel expansion performed by the second MLP layer, Reverse Permute is the inverse dimension permutation operation, and X_GCA is the channel attention map.
To facilitate further mixing and sharing of the feature information, a channel shuffling operation is introduced. Specifically, Y_GCA is first partitioned into 4 groups, each group containing C/4 channels. Then, a transpose operation is performed over the group and intra-group channel axes, interleaving the channel order across groups. The shuffled features are then recombined to restore the original B×C×H×W shape, obtaining the recombined feature Y_CS:

Y_CS = ChannelShuffle(Y_GCA)

wherein ChannelShuffle is the channel shuffling operation. Y_CS promotes effective mixing and sharing of features across the subsequent channels, thereby improving the feature expression capability.
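The channel shuffling operation can be sketched as the standard deterministic group-transpose; the 4-group setting follows the description above, while the function name and exact interleaving are an illustrative assumption.

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int = 4) -> torch.Tensor:
    """Split C channels into `groups` groups of C/groups channels, transpose
    the group axis and the intra-group axis, and flatten back to B x C x H x W.
    This interleaves channels across groups so later layers mix information."""
    b, c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by the group count"
    x = x.reshape(b, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()  # swap group and intra-group axes
    return x.reshape(b, c, h, w)
```

For example, with 8 channels and 4 groups, the channel order [0..7] becomes [0, 2, 4, 6, 1, 3, 5, 7].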
Assume that the convolution operation in the CBR layer is a 5×5 convolution to achieve channel compression, and the convolution in the CBS layer is a 7×7 convolution to achieve channel recovery. Then in the cross-channel spatial attention sub-module, Y_CS is first passed through the 5×5 convolution layer to compress the channel number to 1/4 of the original, followed by batch normalization and the ReLU activation function for nonlinear transformation to obtain the compressed activation feature. Next, the channel number is restored to the original dimension C by the 7×7 convolution layer, and batch normalization is performed again. Subsequently, the spatial weights, i.e., the spatial attention feature X_GSA, are generated using a Sigmoid activation function. Finally, X_GSA is multiplied element by element with Y_CS to yield the multi-attention enhancement feature. The multi-attention enhancement feature is input to the ESA module to further optimize the spatial features and obtain the first output feature O, which merges channel and spatial saliency information and has stronger feature expression capability.
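Putting the shuffle, CBR, and CBS layers together, a minimal PyTorch sketch of the cross-channel spatial attention sub-module might look as follows. The 5×5/7×7 kernels and the 1/4 compression ratio follow the example in the text; the class name and other details are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CrossChannelSpatialAttention(nn.Module):
    """Sketch: channel shuffle, CBR (5x5 conv + BN + ReLU compressing to C/4),
    CBS (7x7 conv + BN + Sigmoid restoring C), and element-wise weighting."""
    def __init__(self, channels: int, groups: int = 4, reduction: int = 4):
        super().__init__()
        self.groups = groups
        self.cbr = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 5, padding=2),
            nn.BatchNorm2d(channels // reduction),
            nn.ReLU(inplace=True),
        )
        self.cbs = nn.Sequential(
            nn.Conv2d(channels // reduction, channels, 7, padding=3),
            nn.BatchNorm2d(channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Channel shuffle (step A31): transpose group and intra-group axes.
        y = x.reshape(b, self.groups, c // self.groups, h, w)
        y_cs = y.transpose(1, 2).reshape(b, c, h, w)
        # CBR (A32) then CBS (A33) produce normalized spatial attention weights.
        weights = self.cbs(self.cbr(y_cs))
        # Step A34: weight the recombined feature Y_CS element by element.
        return y_cs * weights
```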
The mathematical description of the ECA module can be expressed as follows. Given an input tensor X ∈ R^(B×C×H×W):

s = GAP(X)
ω = Sigmoid(Conv1D_k(s))
Y = ω ⊗ X

wherein GAP is global average pooling over the spatial dimensions, Conv1D_k is a one-dimensional convolution with kernel size k acting across the channel descriptor, ω is the channel attention weight, and ⊗ denotes channel-wise multiplication.
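For reference, a sketch of the standard ECA channel attention (global average pooling, 1-D convolution over the channel descriptor, Sigmoid gate). The adaptive kernel-size rule here is the one from the ECA-Net paper; the application may instead fix k, so treat these defaults as assumptions.

```python
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Standard ECA channel attention sketch."""
    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        # Adaptive odd kernel size from the ECA-Net paper (an assumption here).
        t = int(abs(math.log2(channels) / gamma + b / gamma))
        k = t if t % 2 else t + 1
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Squeeze: B x C x H x W -> B x 1 x C channel descriptor via GAP.
        s = x.mean(dim=(2, 3)).unsqueeze(1)
        # Local cross-channel interaction, then Sigmoid gating per channel.
        w = self.sigmoid(self.conv(s)).transpose(1, 2).unsqueeze(-1)  # B x C x 1 x 1
        return x * w
```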
The mathematical description of the ESA module may be expressed as follows. Given an input tensor X ∈ R^(B×C×H×W):

M = Sigmoid(f_ESA(X))
O = M ⊗ X

wherein f_ESA denotes the ESA transformation, which reduces the channel dimension, enlarges the receptive field through strided convolution and pooling, and restores the spatial resolution; M is the spatial attention map; and ⊗ denotes element-wise multiplication.
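A sketch of one common ESA formulation (channel reduction, receptive-field enlargement via a strided convolution and pooling, upsampling, and Sigmoid gating). The internal layer choices are assumptions drawn from the residual-feature-distillation literature, since the application does not detail them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ESA(nn.Module):
    """Sketch of an Enhanced Spatial Attention block (illustrative layout)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        mid = channels // reduction
        self.reduce = nn.Conv2d(channels, mid, 1)            # channel reduction
        self.down = nn.Conv2d(mid, mid, 3, stride=2)          # spatial downsampling
        self.body = nn.Conv2d(mid, mid, 3, padding=1)         # context aggregation
        self.expand = nn.Conv2d(mid, channels, 1)             # channel recovery
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Input must be large enough for the stride-2 conv plus 7x7 pooling.
        f = self.reduce(x)
        g = F.max_pool2d(self.down(f), kernel_size=7, stride=3)
        g = self.body(g)
        # Restore the original spatial resolution, fuse with the local branch.
        g = F.interpolate(g, size=f.shape[2:], mode='bilinear', align_corners=False)
        m = self.sigmoid(self.expand(g + f))  # spatial attention map
        return x * m
```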
In this embodiment, the EGMA module remarkably enhances the expressive capability of the input features, and can effectively extract and strengthen key visual information even when image quality is degraded by severe weather such as lens occlusion or snow coverage, thereby mitigating the interference of the physical environment with sensor perception and improving the robustness of the detection system. In the face of dynamic background changes caused by sand storms, rainfall, dense fog, and the like, the EGMA module relies on the spatial attention sub-module to accurately capture salient features in the spatial dimension, helping the model accurately distinguish the foreground from the complex background, which reduces the false detection rate and improves detection precision. Meanwhile, EGMA promotes cross-channel information interaction through the channel shuffling mechanism and, combined with the spatial attention strengthening structure, effectively improves the perception capability and stability of the model in complex environments.
In some embodiments, the backbone network is provided with a C2fPRFF module. FIG. 5 shows a network architecture schematic of the C2fPRFF module, which comprises a first convolution layer, a progressive fusion structure comprising N PRFFBlocks (N≥2), a first splicing layer, and a second convolution layer. For each second input feature input to the C2fPRFF module, the electronic device may perform the following steps:
And step B1, the electronic equipment executes a channel compression operation on the second input feature through the first convolution layer to obtain a convolution feature.
The electronic device executes channel compression operation on the second input features through the first convolution layer, extracts compact low-dimensional convolution features, reduces computational complexity and highlights key semantic information.
And step B2, the electronic equipment sequentially executes progressive fusion operations on the convolution feature through the serially connected PRFFBlocks of the progressive fusion structure to obtain a hierarchy fusion feature corresponding to each hierarchy.
The electronic equipment sends the convolution characteristics into a progressive fusion structure, and the progressive fusion structure gradually extracts multi-scale context information through gradual superposition and fusion of multi-level receptive fields to generate fusion characteristics corresponding to each level.
And B3, the electronic equipment performs splicing operation on the convolution characteristics and the hierarchy fusion characteristics through the first splicing layer to obtain first splicing characteristics.
The electronic equipment splices the original convolution characteristics and the hierarchical fusion characteristics in the channel dimension through the first splicing layer to form first splicing characteristics containing multi-scale information, and the richness of characteristic representation is enhanced.
And B4, the electronic equipment performs channel expansion operation on the first splicing characteristic through the second convolution layer to obtain a second output characteristic corresponding to the second input characteristic, and the image characteristic is obtained based on the second output characteristic.
And the electronic equipment carries out channel expansion operation on the first spliced feature through the second convolution layer, restores the feature to the target dimension, and generates a second output feature corresponding to the second input feature as a basis for extracting the subsequent image feature.
In this embodiment, the electronic device performs multi-scale semantic fusion and feature enhancement on the second input feature. The progressive PRFFBlock structure can refine the context information layer by layer, so that the recognition capability of the model to the complex structure is effectively improved, the multi-level feature splicing and channel expansion operation further enriches the feature expression, and the capturing capability of the model to detail and semantic information is remarkably enhanced in tasks such as image understanding and target detection.
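The C2fPRFF wiring of steps B1-B4 can be sketched as follows. A single 3×3 convolution stands in here for each PRFFBlock, and the class name, 1/2 compression ratio, and block count are illustrative assumptions rather than the application's definitive design.

```python
import torch
import torch.nn as nn

class C2fPRFF(nn.Module):
    """Sketch of the C2fPRFF data flow (steps B1-B4)."""
    def __init__(self, channels: int, n_blocks: int = 2, reduction: int = 2):
        super().__init__()
        mid = channels // reduction
        self.conv1 = nn.Conv2d(channels, mid, 1)   # B1: channel compression
        self.blocks = nn.ModuleList(                # B2: serial "PRFF" stand-ins
            nn.Conv2d(mid, mid, 3, padding=1) for _ in range(n_blocks)
        )
        # B3 concatenates the compressed feature with every hierarchy output,
        # so the second conv sees (n_blocks + 1) * mid channels.
        self.conv2 = nn.Conv2d((n_blocks + 1) * mid, channels, 1)  # B4: expansion

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.conv1(x)
        feats = [y]
        for block in self.blocks:
            y = block(y)          # each hierarchy refines the previous output
            feats.append(y)       # keep every hierarchy fusion feature
        return self.conv2(torch.cat(feats, dim=1))
```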
In some embodiments, PRFFBlock comprises a progressive convolution structure comprising at least two serially connected convolution layers, a second splicing layer, a third convolution layer, an ESE layer, and a third splicing layer. For each third input feature input to PRFFBlock:
And C1, the electronic equipment executes progressive convolution operation on the third input feature through at least two serially connected convolution layers in the progressive convolution structure to obtain a level convolution feature corresponding to each level.
The electronic equipment performs layer-by-layer convolution processing on the third input feature through a progressive convolution structure (composed of at least two convolution layers connected in series), extracts multi-scale semantic information from shallow layers to deep layers, obtains level convolution features corresponding to each level, and gradually strengthens local detail and global structure information.
And C2, the electronic equipment performs splicing operation on the third input feature and the convolution features of all levels through the second splicing layer to obtain a second splicing feature.
The electronic equipment splices the original third input features and convolution features of different levels in the channel dimension through the second splicing layer to obtain second splicing features, so that multi-scale fusion of the features is realized, and the characterization capability is improved.
And C3, the electronic equipment executes channel expansion operation on the second splicing characteristic through the third convolution layer to obtain a second expansion characteristic.
And the electronic equipment executes channel expansion operation on the second spliced feature through the third convolution layer, restores the channel dimension to an expected scale and generates a second expansion feature with complete structure and rich semantics.
And C4, the electronic equipment executes global average pooling operation and convolution operation on the second expansion feature through the ESE layer, and weights the calculated channel attention weight on the second expansion feature to obtain an initial fusion feature.
The electronic device performs a global average pooling operation on the second expansion feature through the ESE layer (i.e., Efficient Squeeze-and-Excitation layer), calculates the channel attention weights through a convolution operation, and weights the second expansion feature channel by channel to obtain an initial fusion feature with key information reinforced, so as to highlight key semantic regions.
And C5, the electronic equipment performs splicing operation on the third input features and the initial fusion features through a third splicing layer to obtain third output features corresponding to the third input features.
And the electronic equipment splices the third input characteristic and the initial fusion characteristic through a third splicing layer to form a third output characteristic with original information and enhanced attention information for subsequent higher-level characteristic extraction and task processing.
In this embodiment, the electronic device performs multi-level convolution extraction, feature fusion, and channel attention enhancement on the third input feature. The progressive convolution structure ensures the depth and continuity of feature extraction, the splicing mechanism integrates multi-scale information, and the attention mechanism introduced by the ESE layer further improves the perception capability of the model on key channel features. The third output characteristic of final output obviously enhances the distinguishing and expressing capacity of the characteristic on the basis of maintaining the integrity of the original information, and provides powerful characteristic support for subsequent visual tasks.
In some embodiments, fig. 6 shows a network architecture diagram of PRFFBlock. Based on the network structure of fig. 6, for a third input feature X, PRFFBlock performs the following steps.
First, to facilitate splicing of the individual scale features, a list Y may be initialized containing the third input feature X:

Y = [X]
Then, for the i-th convolution layer (i ∈ {1, …, N}), the hierarchical convolution feature output by the (i−1)-th convolution layer is taken as the input, and a 3×3 convolution operation is performed:

Y_i = Conv3×3(Y_(i−1))

wherein Y_0 = X; that is, the input of the first convolution layer is the third input feature X, and the input of each subsequent convolution layer is the hierarchical convolution feature output by the previous layer. After each convolution operation, the result is appended to the list Y.
The outputs of all convolution layers in the list Y are spliced with the third input feature X to obtain the second splicing feature, and channel fusion is performed through a 1×1 convolution to obtain the second expansion feature.
Finally, the importance of the second expansion feature under different receptive fields is adjusted through the ESE attention mechanism to obtain the initial fusion feature Y_Agg, and the third input feature X is spliced with Y_Agg via a residual connection to obtain the third output feature O.
In this embodiment PRFFBlock achieves progressive extraction and fine modeling of feature information by stacking multiple 3 x 3 convolution operations in succession and fusing multi-scale features. The structure is favorable for the model to learn richer context and spatial hierarchy information, so that the accuracy and the robustness of target detection are obviously improved under a complex environment, particularly under severe weather (such as heavy rainfall, dense fog or low light) conditions.
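The PRFFBlock data flow described above (list initialization, progressive 3×3 convolutions, splicing, 1×1 fusion, ESE gating, residual splicing) can be sketched as follows; the class name and the output-channel bookkeeping of the final concatenation are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PRFFBlock(nn.Module):
    """Sketch of PRFFBlock following the fig. 6 description."""
    def __init__(self, channels: int, n_convs: int = 2):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1) for _ in range(n_convs)
        )
        # 1x1 channel fusion over the concatenated [X, Y_1, ..., Y_N] features.
        self.fuse = nn.Conv2d((n_convs + 1) * channels, channels, 1)
        # ESE gate: pooled channel descriptor -> conv -> Sigmoid weights.
        self.ese = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        ys = [x]                      # list Y initialized with X
        for conv in self.convs:
            ys.append(conv(ys[-1]))   # progressive 3x3 convolutions
        expanded = self.fuse(torch.cat(ys, dim=1))   # splice + 1x1 fusion
        fused = expanded * self.ese(expanded)        # ESE channel weighting
        return torch.cat([x, fused], dim=1)          # residual splice with X
```

Note that the residual splice doubles the channel count, so a following layer would need to account for 2C channels.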
In some embodiments, the identification method introduces the C2fPRFF module into the backbone network, realizing progressive fusion through stacked PRFFBlocks and extracting multi-scale receptive-field features, which enhances the model's ability to understand context information. The EGMA module is integrated into the neck network, combining channel attention and spatial attention mechanisms and introducing a channel shuffling operation, which effectively improves the feature expression capability and strengthens the performance of the model when image quality is degraded and the foreground and background are difficult to distinguish. Furthermore, a newly designed focal fusion IoU loss function is adopted, which accelerates model convergence while optimizing target positioning precision, significantly improving robustness and accuracy under complex backgrounds and interference conditions. Through the collaborative optimization of these three aspects, the detection performance and reliability of the weather identification model in severe environments are significantly enhanced.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
Corresponding to the abnormal weather identification method based on deep learning of the above embodiment, fig. 7 shows a block diagram of the abnormal weather identification apparatus 7 provided in the embodiment of the present application, and for convenience of explanation, only the portions related to the embodiment of the present application are shown.
Referring to fig. 7, the abnormal weather identification apparatus 7 includes:
The extraction module 71 is configured to perform feature extraction on an image to be detected based on a backbone network of the weather identification model that is trained in advance, so as to obtain image features;
The fusion module 72 is configured to fuse the image features based on the neck network of the weather identification model to obtain target fusion features;
The detection module 73 is configured to detect the target fusion feature based on a detection network of the weather identification model, so as to obtain a detection result of abnormal weather in the image to be detected;
The neck network is provided with an EGMA module, and the EGMA module enhances the expression capability of the weather features and their global association strength based on at least two attention mechanisms.
Optionally, the EGMA module includes a channel perception modulation sub-module, an ECA sub-module, a cross-channel spatial attention sub-module, and an ESA sub-module, and the fusion module 72 includes a fusion unit configured to:
Executing channel characteristic enhancement operation on the first input characteristic through the channel perception modulation submodule to obtain a first enhancement characteristic;
performing channel attention operation on the first enhancement feature by the ECA submodule to obtain a second enhancement feature;
performing cross-channel spatial attention operation on the second enhancement feature by a cross-channel spatial attention sub-module to obtain a multi-attention enhancement feature;
and performing spatial attention operation on the multi-attention enhancement feature by the ESA submodule to obtain a first output feature corresponding to the first input feature.
Optionally, the channel perception modulation submodule comprises a dimension permutation layer, two multilayer perceptron (MLP) layers, an inverse dimension permutation layer, an activation function layer, and a first weighting layer, and the fusion unit is specifically configured to:
each dimension parameter of the first input feature is replaced through a dimension replacement layer, so that a flattened space feature is obtained;
performing channel compression and nonlinear activation operations on the flattened spatial feature through the first MLP layer to obtain a compressed feature;
Performing channel expansion operation on the compressed features through a second multi-perception machine layer to obtain first expansion features matched with the dimensions of the flattened space features;
Performing inverse displacement on each dimension parameter of the first expansion feature through an inverse dimension displacement layer to obtain a reconstruction feature of which each dimension parameter is matched with the first input feature;
performing an activation operation on the reconstruction feature through the activation function layer to obtain the channel modulation weights;
And performing a weighting operation on the first input feature based on the channel modulation weight through the first weighting layer to obtain a first enhancement feature.
Optionally, the cross-channel spatial attention submodule comprises a channel shuffling layer, a CBR layer, a CBS layer, and a second weighting layer, and the fusion unit is specifically configured to:
Executing channel shuffling operation on the second enhanced feature through the channel shuffling layer to obtain a reorganized feature;
sequentially executing channel compression, normalization, and nonlinear transformation operations on the recombined feature through the CBR layer to obtain a compressed activation feature;
sequentially executing channel expansion, normalization, and activation operations on the compressed activation feature through the CBS layer to obtain spatial weights;
and performing a weighting operation on the recombined feature based on the spatial weights through the second weighting layer to obtain the multi-attention enhancement feature.
Optionally, the backbone network is provided with a C2fPRFF module, comprising a first convolution layer, a progressive fusion structure comprising at least two PRFFBlock, a first splice layer, a second convolution layer, for each second input feature of the input C2fPRFF module, the extraction module comprising an extraction unit for:
performing a channel compression operation on the second input feature through the first convolution layer to obtain a convolution feature;
Sequentially executing progressive fusion operation on the convolution characteristics through each PRFFBlock connected in series through the progressive fusion structure to obtain hierarchy fusion characteristics corresponding to each hierarchy;
Performing splicing operation on the convolution characteristics and the hierarchy fusion characteristics through a first splicing layer to obtain first splicing characteristics;
And performing channel expansion operation on the first splicing characteristic through the second convolution layer to obtain a second output characteristic corresponding to the second input characteristic, wherein the image characteristic is obtained based on the second output characteristic.
Optionally, PRFFBlock includes a progressive convolution structure, a second splicing layer, a third convolution layer, an ESE layer, and a third splicing layer, the progressive convolution structure including at least two serially connected convolution layers, and for the third input feature of each PRFFBlock, the extraction unit is specifically configured to:
Performing progressive convolution operation on the third input feature through at least two serially connected convolution layers in the progressive convolution structure to obtain a level convolution feature corresponding to each level;
performing splicing operation on the third input feature and the convolution features of each level through a second splicing layer to obtain a second splicing feature;
Performing channel expansion operation on the second splicing characteristic through the third convolution layer to obtain a second expansion characteristic;
performing global average pooling operation and convolution operation on the second expansion feature through the ESE layer, and weighting the second expansion feature by the calculated channel attention weight to obtain an initial fusion feature;
And performing splicing operation on the third input features and the initial fusion features through a third splicing layer to obtain third output features corresponding to the third input features.
Optionally, the weather identification model is trained based on a regression loss and a classification loss, wherein the regression loss comprises a focal fusion IoU loss of the following form:

L_FFIoU = IoU^γ · L_FIoU

wherein L_FFIoU is the focal fusion IoU loss; IoU is the intersection-over-union between the predicted box and the ground-truth box; γ is a parameter controlling the degree to which outliers are suppressed; and L_FIoU is the fusion IoU loss, which combines L_IoU, the IoU loss, L_dis, the center distance loss, L_shp, the width-height loss, and L_ang, the angle loss. Δ denotes the center distance loss term, with b^gt and b respectively representing the center point of the ground-truth box and the center point of the predicted box, and ρ²(b, b^gt) representing the squared Euclidean distance between the two center points; c_w and c_h respectively denote the width and height of the minimum enclosing rectangle of the two boxes. Ω denotes the width-height loss term, with w and w^gt respectively denoting the widths of the predicted box and the ground-truth box, ρ²(w, w^gt) the squared difference between the widths, and h and h^gt the heights of the predicted box and the ground-truth box, with ρ²(h, h^gt) the squared difference between the heights. Λ denotes the angle loss term, with c'_w and c'_h respectively representing the width and height of the rectangle constructed from the center points of the ground-truth box and the predicted box.
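Under the assumption that the focal weighting takes the form IoU^γ · L (a common focal-IoU construction consistent with the terms described above), a minimal sketch follows. The fusion loss is stood in for by the plain IoU loss; the center-distance, width-height, and angle terms of the application's L_FIoU are not reproduced here.

```python
import torch

def iou(box_a: torch.Tensor, box_b: torch.Tensor) -> torch.Tensor:
    """IoU for axis-aligned boxes in (x1, y1, x2, y2) form, shape (..., 4)."""
    x1 = torch.max(box_a[..., 0], box_b[..., 0])
    y1 = torch.max(box_a[..., 1], box_b[..., 1])
    x2 = torch.min(box_a[..., 2], box_b[..., 2])
    y2 = torch.min(box_a[..., 3], box_b[..., 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_a = (box_a[..., 2] - box_a[..., 0]) * (box_a[..., 3] - box_a[..., 1])
    area_b = (box_b[..., 2] - box_b[..., 0]) * (box_b[..., 3] - box_b[..., 1])
    return inter / (area_a + area_b - inter).clamp(min=1e-7)

def focal_fusion_iou_loss(pred: torch.Tensor, target: torch.Tensor,
                          gamma: float = 0.5) -> torch.Tensor:
    """Focal-style reweighting of a fusion IoU loss: L = IoU**gamma * L_fusion.
    Here L_fusion is the placeholder (1 - IoU); the application additionally
    combines center-distance, width-height, and angle terms."""
    i = iou(pred, target)
    l_fusion = 1.0 - i                       # stand-in for the full L_FIoU
    return (i.detach() ** gamma * l_fusion).mean()
```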
It should be noted that, because the content such as the information interaction and the execution process between the above devices/units are based on the same concept as the method embodiment of the present application, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein.
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 8, the electronic device 8 of this embodiment includes at least one processor 80 (only one processor is shown in fig. 8) and a computer program 82 stored in a memory 81 and executable on the at least one processor 80; the processor 80 implements the steps in any of the above-described deep-learning-based abnormal weather identification method embodiments, such as steps 310-330 shown in fig. 3, when executing the computer program 82.
The processor 80 may be a central processing unit (Central Processing Unit, CPU), and the processor 80 may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 81 may in some embodiments be an internal storage unit of the electronic device 8, such as a hard disk or a memory of the electronic device 8. The memory 81 may in other embodiments also be an external storage device of the electronic device 8, such as a plug-in hard disk provided on the electronic device 8, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash card (Flash Card), etc.
Further, the memory 81 may also include both an internal storage unit and an external storage device of the electronic device 8. The memory 81 is used to store an operating system, an application program, a boot loader (BootLoader), data, and other programs, for example, the program code of a computer program, and the like. The memory 81 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
Embodiments of the present application also provide a computer readable storage medium storing a computer program which, when executed by a processor, implements steps for implementing the various method embodiments described above.
Embodiments of the present application also provide a computer program product which, when run on an electronic device, causes the electronic device to perform the steps of the various method embodiments described above.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the present application may implement all or part of the flow of the methods of the above embodiments by instructing related hardware through a computer program, where the computer program may be stored in a computer readable storage medium, and the computer program, when executed by a processor, may implement the steps of each of the method embodiments described above. The computer program comprises computer program code, and the computer program code may be in a source code form, an object code form, an executable file, some intermediate form, or the like. The computer readable medium may include at least any entity or device capable of carrying the computer program code to the camera device/electronic device, a recording medium, a computer memory, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, such as a USB flash drive, a removable hard disk, a magnetic disk, or an optical disc.
The descriptions of the foregoing embodiments each have their own emphasis; for parts not described or detailed in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other manners. For example, the apparatus/network device embodiments described above are merely illustrative, e.g., the division of modules or elements described above is merely a logical functional division, and there may be additional divisions in actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The foregoing embodiments are merely for illustrating the technical solutions of the present application, not for limiting them. Although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those skilled in the art that modifications may still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions may be made to some of the technical features thereof; such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and should be included in the protection scope of the present application.