
CN112836713A - Identification and Tracking Method of Mesoscale Convective System Based on Image Anchorless Frame Detection - Google Patents

Identification and Tracking Method of Mesoscale Convective System Based on Image Anchorless Frame Detection Download PDF

Info

Publication number
CN112836713A
CN112836713A (Application CN202110270336.XA)
Authority
CN
China
Prior art keywords
feature map
network
layer
mask
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110270336.XA
Other languages
Chinese (zh)
Other versions
CN112836713B (en)
Inventor
杨育彬
罗威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN202110270336.XA priority Critical patent/CN112836713B/en
Publication of CN112836713A publication Critical patent/CN112836713A/en
Application granted granted Critical
Publication of CN112836713B publication Critical patent/CN112836713B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • G06T7/62Analysis of geometric attributes of area, perimeter, diameter or volume
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10048Infrared image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20092Interactive image processing based on input by user
    • G06T2207/20104Interactive definition of region of interest [ROI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30181Earth observation
    • G06T2207/30192Weather; Meteorology
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for identifying and tracking a Mesoscale Convective System (MCS) based on anchor-free image detection, comprising the following steps: step 1, preprocess the original geostationary-satellite infrared brightness temperature data, annotate mesoscale convective systems on the resulting infrared cloud images, and randomly divide them into a training set, a validation set, and a test set; step 2, construct an anchor-free instance segmentation network, which is used to extract image features, detect mesoscale convective systems, and segment out specific instances; step 3, augment the training-set images and use transfer learning to supervise the training of the instance segmentation convolutional neural network so that the network parameters are learned automatically; step 4, use the trained model to detect and segment mesoscale convective systems in geostationary-satellite infrared cloud images at adjacent moments; step 5, track the mesoscale convective systems according to the relevant target matching principle.


Description

Mesoscale convective system identification and tracking method based on anchor-free image detection
Technical Field
The invention belongs to the technical field of deep learning and computer vision, and particularly relates to a mesoscale convection system identification and tracking method based on image anchor-frame-free detection.
Background
In recent years, global climate warming has made strong convective, disastrous weather increasingly frequent and active. The Mesoscale Convective System (MCS) is a weather system with strong convection; it has a short life cycle and a small spatial scale and is often accompanied by catastrophic weather such as rainstorms, typhoons, and hail, endangering people's lives and causing huge losses to the national economy. In civil aviation operation in particular, the strong convective motion of an MCS can cause severe turbulence and even serious damage to the airframe, leading to air crashes. Therefore, how to quickly and accurately identify MCS instances and analyze their evolution and moving paths is very important and is a major research topic in meteorology today. In the past, the identification and tracking of MCSs was mainly based on conventional observation data; however, these data cover small areas and are affected by various factors with large errors, making it difficult to effectively monitor and forecast the position and development trend of an MCS. With the improvement of satellite remote sensing and radar detection capability, high-spatiotemporal-resolution geostationary satellites in particular have been widely applied to MCS monitoring by virtue of their strong reliability, continuous observation, and high precision.
At present, most MCS identification methods are based on traditional image characteristics, i.e., identification is carried out according to relevant judgment criteria. These methods rely too heavily on the selection of feature thresholds, and the whole process is relatively complex and computationally expensive. With the rapid development of deep learning, especially deep convolutional neural networks, complex computer vision tasks such as object detection, instance segmentation, and object tracking have made breakthrough progress. Therefore, using a deep convolutional neural network to segment MCS instances quickly, simply, and effectively is of great significance for subsequent MCS tracking and related meteorological analysis.
Disclosure of Invention
The purpose of the invention is as follows: the invention aims to solve the technical problem that the traditional mesoscale convective system identification methods in the meteorological field rely on a variety of judgment criteria, by providing a mesoscale convective system identification and tracking method based on anchor-free image detection. Considering that the shape and scale of an MCS vary greatly, the current mainstream instance segmentation methods based on heuristically preset anchor boxes perform poorly in this application. Therefore, the category of each position in the feature map, the distances to the four sides of the corresponding frame, and the generated mask are directly predicted in a pixel-by-pixel fully convolutional manner, and on this basis the MCS is tracked continuously and effectively over multiple time steps.
The method specifically comprises the following steps:
step 1, preprocessing an original infrared brightness temperature data file according to the relevant description of satellite data: cutting original satellite data, marking image polygon examples, and randomly dividing a training set, a verification set and a test set;
step 2, constructing an anchor-free mesoscale convective system instance segmentation convolutional neural network: the network is divided into a backbone network for feature extraction, a feature pyramid network for fusing multi-scale features, a prediction frame network head for classification, distance frame regression, and center-ness prediction, a Mask network head for generating masks, and a network head for predicting the mask IoU;
step 3, performing multi-scale image enhancement on the training set, and using transfer learning to supervise the training of the mesoscale convective system instance segmentation convolutional neural network constructed in step 2 so that the network parameters are learned automatically: training adopts mini-batch stochastic gradient descent, and loss functions are set for the three network heads, namely the prediction frame head, the mask generation head, and the mask IoU prediction head;
step 4, carrying out mesoscale convection system example segmentation on the stationary satellite infrared cloud pictures at adjacent moments by using the trained network to obtain mesoscale convection system related records at continuous moments;
step 5, on the basis of the step 4, tracking of the mesoscale convection system is realized according to a related target matching principle, and meanwhile, the commonly occurring splitting and merging conditions are considered;
the method comprises the following steps of 1:
the data set used in the present invention is full resolution (4km) light and temperature data (60N-60S) for monitoring global precipitation obtained from 11 micron infrared channels on geostationary satellites GMS-5, GOES-8, Goes-10, Metasat-7, and Metasat-5, as provided by the National Oceanic and Atmospheric Administration, NOAA. In order to reduce discontinuities between adjacent geostationary satellites, the acquired data has been corner angle correlation corrected and finally stored in the form of a rectangular grid. Each infrared bright temperature data file name format is merg _ yyymddhh _4km-pixel, where yyyy represents year (e.g. 2020), mm represents month (range 01-12), dd represents date (range 01-31), and hh represents hour (range 00-23). Each file contains 2 records of light temperature data corresponding to hours 0 and 30, respectively, each record being 9896 × 3298 in size (9896 dimensions span 0.036378335 and 3298 dimensions span 0.036383683). Meanwhile, in order to store data with the size of one byte (8 bits with the range size of 0-255), the data obtained is to subtract 75 on the basis of the real monitored brightness temperature value, and 255 (corresponding to the real brightness temperature value 330) in the obtained data is an absent value.
Step 1-1, reading an infrared brightness temperature data file of each original geostationary satellite according to the related description of satellite data, wherein the file is an array in a format of 2 x 3298 x 9896, and obtaining two gray cloud pictures of 0 minute and 30 minutes at corresponding time;
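For illustration, a minimal sketch of reading one such file follows, assuming the record is stored as a flat binary array of unsigned bytes with shape (2, 3298, 9896); the function name and the example file name are hypothetical:

```python
# Sketch (assumption: flat uint8 binary layout) of reading one brightness-temperature file.
import numpy as np

def read_merg_file(path: str) -> np.ndarray:
    raw = np.fromfile(path, dtype=np.uint8).reshape(2, 3298, 9896)
    tbb = raw.astype(np.float32) + 75.0   # recover the true brightness temperature
    tbb[raw == 255] = np.nan              # 255 marks a missing value
    return tbb                            # two half-hourly grayscale cloud images

# Usage (hypothetical file name):
# tbb = read_merg_file("merg_2020070112_4km-pixel")
```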
Step 1-2: since the remote sensing satellite image is 3298 × 9896 in size, which is too large to serve as the input of the subsequent instance segmentation network, the gray cloud image obtained in step 1-1 is cropped. For the Jianghuai region of China, with latitude 27N to 40N and longitude 110E to 125E, combining the above description of the brightness temperature grid gives a region size of approximately

(125 − 110) / 0.036378335 × (40 − 27) / 0.036383683 ≈ 412 × 357 pixels.

Enlarging this result, each gray cloud image is cut into a number of (generally 240) gray infrared cloud sub-images of size 420 × 360;
and 1-3, giving polygon example level labels of the mesoscale convection system for each gray level infrared cloud subgraph, filtering out subgraphs without the mesoscale convection system, and obtaining json files corresponding to each gray level infrared cloud subgraph, wherein the whole label only has one category. And the grayscale infrared cloud sub-images were randomly divided into 9407 training set images, 3135 verification set images, and 3136 test set images at a ratio of 6:2:2, thereby constituting a training set, a verification set, and a test set.
The step 2 comprises the following steps:
Step 2-1, constructing a backbone network for feature extraction: the backbone network in the segmentation network of the embodiment of the invention adopts the convolutional neural network VoVNetV2-99 with a total of 99 layers (reference: An Energy and GPU-Computation Efficient Backbone Network for Real-Time Object Detection). The convolutional layers in VoVNetV2-99 are all of the Conv-BN-ReLU form, i.e. a convolutional layer followed by a batch normalization layer BN and a linear rectification function ReLU. Meanwhile, cross-layer connections similar to those of the residual network ResNet are adopted in VoVNetV2-99 to realize identity mapping, and a residual block containing the identity mapping is defined as:
Y=F(X,{Wi})+X
where X and Y denote the input and output feature maps of each building block, respectively, and F(X, {W_i}) is the residual mapping function to be learned. Meanwhile, in order to improve the quality of the feature representation, a channel attention eSE (Effective Squeeze-Excitation) mechanism (reference: CenterMask: Real-Time Anchor-Free Instance Segmentation) is also introduced into the residual block of the network, so that the network pays more attention to important feature-map channels and suppresses irrelevant ones. It is implemented as:

A_eSE(X_div) = σ(W_C(gap(X_div)))

X_refine = A_eSE(X_div) ⊗ X_div

where X_div is the diversified feature map obtained by dimensionality reduction after the cascade of the network feature maps, gap(X) = (1 / (W × H)) Σ_{i=1..W} Σ_{j=1..H} X_{i,j} (W and H are the width and height of feature map X) is channel-wise global average pooling, W_C is the weight of the fully connected layer, X_{i,j} is the value of feature map X at (i, j), σ denotes the Sigmoid activation function, and A_eSE(X_div) is the computed channel-attention feature descriptor. This descriptor is multiplied element-wise with X_div (⊗ denotes element-wise multiplication) to finally obtain the refined feature map X_refine.
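For illustration, a minimal PyTorch-style sketch of the eSE channel attention and the surrounding identity mapping follows; module and variable names are assumptions, not the patent's code:

```python
# Minimal sketch of eSE attention inside an OSA residual block (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ESEAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # W_C: a single fully connected layer, implemented as a 1x1 convolution
        self.fc = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x_div: torch.Tensor) -> torch.Tensor:
        gap = F.adaptive_avg_pool2d(x_div, output_size=1)  # channel-wise global average pooling
        attn = torch.sigmoid(self.fc(gap))                 # A_eSE(X_div) = sigmoid(W_C(gap(X_div)))
        return attn * x_div                                # X_refine = A_eSE(X_div) * X_div

class OSAResidualBlock(nn.Module):
    """Identity mapping added to the eSE-refined diversified feature map."""
    def __init__(self, channels: int):
        super().__init__()
        self.ese = ESEAttention(channels)

    def forward(self, x: torch.Tensor, x_div: torch.Tensor) -> torch.Tensor:
        # Y = F(X, {W_i}) + X, with F represented here only by the eSE refinement
        return self.ese(x_div) + x
```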
The specific construction steps of the convolutional neural network VoVNetV2-99 comprise:
step 2-1-1, constructing a network Stem stage 1: this stage contains three convolutional layers: first, the convolutional layer with convolution kernel size of 3 × 3, step size of 2, padding of 1, and output channel number of 64 (if not described otherwise, the convolutional layer with convolution kernel size of 3 × 3 defaults to step size of 1 and padding of 1) is used to perform downsampling on the input image, and then a convolutional layer with convolution kernel sizes of 3 × 3 and output channel number of 64 and a convolutional layer with convolution kernel size of 3 × 3, step size of 2, and output channel number of 128 are connected. After the input image passes through the stage, a first scale feature map C1 is generated, wherein the feature map C1 is 4 compared with the input image scale Output Stride;
Step 2-1-2, constructing network One-Shot Aggregation (OSA) module stage 2: this stage contains one residual block, which contains 5 convolutional layers with convolution kernel size 3 × 3 and 128 output channels. After the input passes through these 5 convolutional layers, diversified feature maps with 128 channels are obtained; on the last convolutional layer, the feature map of that layer, the feature maps of the previous 4 layers, and the input first-scale feature map C1 are cascaded to obtain a feature map with 128 × 6 = 768 channels. Dimensionality reduction is then performed through a convolutional layer with convolution kernel size 1 × 1, step size 1, padding 0, and 256 output channels to obtain the diversified feature map X_div; combining the channel attention eSE mechanism gives the final X_refine, and X_refine is also added element-wise to the input feature map to realize the identity mapping. Finally, this stage obtains the second-scale feature map C2 with 256 channels, and feature map C2 has an Output Stride of 4 compared with the input image scale;
Step 2-1-3, constructing network one-shot aggregation module stage 3: this stage is similar to OSA module stage 2, except that it first performs 2-fold downsampling using a 3 × 3 max pooling layer with step size 2 and padding 0, and adopts 3 × 3 adjustable (modulated) deformable convolution in the residual block (reference: Deformable ConvNets v2: More Deformable, Better Results) instead of regular convolution. The adjustable deformable convolution is defined as:

y(p) = Σ_{k=1}^{K} w_k · x(p + p_k + Δp_k) · Δm_k

where K is the total number of convolution kernel sampling locations (e.g., K = 9 for a convolution kernel size of 3 × 3), w_k is the convolution kernel weight, p_k is the predefined offset of the k-th position with respect to the center position of the receptive field (e.g., when K = 9, p_k ∈ {(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 0), (0, 1), (1, -1), (1, 0), (1, 1)}), x(p) and y(p) are the value of the input feature map x at position p and the value of the output feature map y at position p, respectively, Δp_k is the learnable offset of the k-th position, and Δm_k ∈ [0, 1] is the modulation scalar of the k-th position. The adjustable deformable convolution is implemented by adding, alongside the regular convolution, an additional convolutional layer with the same spatial resolution and dilation rate, which learns the offsets in the x and y directions of the two-dimensional plane and the modulation scalar for each position of the feature map.
OSA module stage 3 contains 3 residual blocks, each of which contains 5 adjustable deformable convolutional layers with convolution kernel size 3 × 3 and 160 output channels. As in network one-shot aggregation module stage 2 of step 2-1-2, the feature map of the layer, the feature maps of the previous 4 layers, and the input second-scale feature map C2 are cascaded on the last convolutional layer to obtain a diversified feature map with 256 + 160 × 5 = 1056 channels; this is the diversified feature map of the first residual block. Similarly, the number of channels of the diversified feature maps obtained after the cascade operations of the second and third residual blocks is 512 + 160 × 5 = 1312. The 1 × 1 conventional convolutional layer (step size 1, padding 0) used for dimensionality reduction after the feature-map cascade has 512 output channels, and the channel attention mechanism is also included. After OSA module stage 3, the third-scale feature map C3 with 512 channels is finally obtained, and feature map C3 has an Output Stride of 8 compared with the input image scale;
Step 2-1-4, constructing network one-shot aggregation module stage 4: this stage is similar to OSA module stage 3 described above, except that it contains 9 residual blocks, each containing 5 adjustable deformable convolutional layers with convolution kernel size 3 × 3 and 192 output channels. As in network one-shot aggregation module stages 2 and 3, the feature map of the layer, the feature maps of the previous 4 layers, and the input third-scale feature map C3 are cascaded on the last convolutional layer to obtain a diversified feature map with 512 + 192 × 5 = 1472 channels; this is the diversified feature map of the first residual block. Similarly, the number of channels of the diversified feature maps obtained after the cascade operations of the second to ninth residual blocks is 768 + 192 × 5 = 1728. The 1 × 1 conventional convolutional layer (step size 1, padding 0) used for dimensionality reduction after the feature-map cascade has 768 output channels, and the channel attention mechanism is also included. The fourth-scale feature map C4 is finally obtained after network one-shot aggregation module stage 4, and feature map C4 has an Output Stride of 16 compared with the input image scale;
Step 2-1-5, constructing the last structure of the backbone network, one-shot aggregation module stage 5: this stage has a structure similar to stage 4, but contains only 3 residual blocks, each of which contains 5 adjustable deformable convolutional layers with convolution kernel size 3 × 3 and 224 output channels. As in network one-shot aggregation module stages 2 and 3, the feature map of the layer, the feature maps of the previous 4 layers, and the input fourth-scale feature map C4 are cascaded on the last convolutional layer to obtain a diversified feature map with 768 + 224 × 5 = 1888 channels; this is the diversified feature map of the first residual block. Similarly, the number of channels of the diversified feature maps obtained after the cascade operations of the second and third residual blocks is 1024 + 224 × 5 = 2144. The 1 × 1 conventional convolutional layer (step size 1, padding 0) used for dimensionality reduction after the feature-map cascade has 1024 output channels, and the channel attention mechanism is also included. After this stage, feature map C5 is finally obtained, and feature map C5 has an Output Stride of 32 compared with the input image scale;
Step 2-2, fusing the multi-scale features with a Feature Pyramid Network (FPN) (reference: Feature Pyramid Networks for Object Detection). The feature pyramid network FPN combines the features {C3, C4, C5} of different scales obtained in step 2-1 in a top-down manner and fuses them through lateral connections to obtain {M3, M4, M5}, where M5 is obtained by passing feature map C5 through a 1 × 1 convolutional layer, M4 is obtained by element-wise addition of the feature map obtained by passing feature map C4 through a 1 × 1 convolutional layer and the nearest-neighbor 2-fold upsampling of M5, and M3 is obtained by element-wise addition of the feature map obtained by passing feature map C3 through a 1 × 1 convolutional layer and the nearest-neighbor 2-fold upsampling of M4.
Finally, for each layer in the { M3, M4 and M5}, the aliasing influence caused by nearest neighbor interpolation is relieved through convolution with the convolution kernel size of 3 multiplied by 3 and the number of output channels of 256, and the feature layer { P3, P4 and P5} is obtained.
P6 and P7 feature layers are additionally added to the instance segmentation network; they are obtained by 2-fold downsampling of P5 and P6, respectively, through a 3 × 3 convolutional layer with step size 2, finally yielding the feature layers {P3, P4, P5, P6, P7};
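A minimal sketch of the multi-scale fusion described in step 2-2, assuming C3/C4/C5 channel counts of 512/768/1024 as produced in steps 2-1-3 to 2-1-5; layer names are illustrative:

```python
# Sketch (assumed channel counts) of the FPN fusion producing P3-P7.
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    def __init__(self, in_channels=(512, 768, 1024), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(
            [nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels])
        self.smooth = nn.ModuleList(
            [nn.Conv2d(out_channels, out_channels, 3, padding=1) for _ in in_channels])
        # P6 and P7: stride-2 3x3 convolutions on top of P5 and P6
        self.p6 = nn.Conv2d(out_channels, out_channels, 3, stride=2, padding=1)
        self.p7 = nn.Conv2d(out_channels, out_channels, 3, stride=2, padding=1)

    def forward(self, c3, c4, c5):
        m5 = self.lateral[2](c5)
        m4 = self.lateral[1](c4) + F.interpolate(m5, scale_factor=2, mode="nearest")
        m3 = self.lateral[0](c3) + F.interpolate(m4, scale_factor=2, mode="nearest")
        p5, p4, p3 = self.smooth[2](m5), self.smooth[1](m4), self.smooth[0](m3)
        p6 = self.p6(p5)
        p7 = self.p7(p6)
        return p3, p4, p5, p6, p7
```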
Step 2-3, constructing the instance segmentation frame prediction head, which contains 2 branches: a classification branch and a multi-task branch in which center-ness (reference: FCOS: Fully Convolutional One-Stage Object Detection) and distance frame regression run in parallel. The specific steps are as follows:
step 2-3-1, constructing a classification branch: after each feature layer (i.e., P3-P7) in the feature pyramid FPN, three conventional convolutional layers with 256 input channels and 256 output channels and 3 × 3 convolutional kernel size and an adjustable warped convolutional layer are sequentially connected, and the four convolutional layers do not adopt Batch Normalization (BN) but use Group Normalization (GN) to avoid the influence of Batch size on the network model. Because only the class of the mesoscale convection system is detected, a convolution layer for predicting classification is added at the end of a classification branch, the number of input channels is 256, the number of output channels is 1, and the size of a convolution kernel is 3 multiplied by 3;
Step 2-3-2, constructing the multi-task branch with parallel center-ness and distance frame regression: after the FPN feature layer of each scale, this part also sequentially connects three conventional convolutional layers with the same structure as in the classification branch and one adjustable deformable convolutional layer. However, what follows these four convolutional layers is a convolutional layer containing two parallel branches: a 3 × 3 convolutional layer with 4 output channels for frame regression, whose four output values respectively represent the distances from the current position to the four sides of the frame; and a 3 × 3 convolutional layer with 1 output channel for predicting center-ness, whose output is a one-dimensional center-ness value;
Assuming that the distances from a sample point (x, y) on feature map F to the four sides of the target frame to which the point belongs are d = (l, t, r, b), where l, t, r, b respectively denote the distances from the sample point (x, y) to the left, top, right, and bottom sides of the rectangular frame, the center-ness is defined as:

centerness = sqrt( (min(l, r) / max(l, r)) × (min(t, b) / max(t, b)) )
where min () and max () are minimum and maximum taking functions, respectively. When the example segmented convolutional neural network is used for detection, the predicted centrality value is multiplied by the classification score to obtain the final confidence. The centrality branch mainly suppresses low-quality frames farther from the center point of the target object, and it can be found from the centrality definition that if the sample point (x, y) is closer to the center of the target frame, the centrality value is closer to 1, otherwise, the centrality value is closer to 0, so that the classification of the farther points is multiplied by a smaller centrality value to obtain a lower confidence, so that the low-quality frames regressed by the farther points are more easily filtered in the Non-Maximum Suppression (NMS) stage.
Step 2-4: similar to the region proposal network (RPN) in anchor-based detectors such as Mask R-CNN, the frame prediction head predicts a number of frame regions, and after partial elimination by non-maximum suppression with an IoU threshold of 0.6, 100 regions of interest (RoI) are obtained. Drawing on the implementation principle of Mask R-CNN, the method for constructing the RoI heads of the anchor-free instance segmentation network specifically comprises the following steps:
Step 2-4-1, constructing the region-of-interest alignment RoI Align layer (reference: Mask R-CNN): the RoI Align layer first maps each region of interest RoI, according to its size, to the corresponding feature pyramid network FPN feature layer P_k, implemented as:

k = Ceil(k_max − log2(A_input / A_RoI))

where A_input and A_RoI respectively denote the area of the input image and the area of the RoI, k_max is the number of the last layer of the backbone network and is set to 5, Ceil() is the ceiling function, and k is the number of the FPN feature layer to which the RoI is mapped;
the corresponding position area on the mapped feature map is divided into 196 small areas with the same size by the RoI Align layer, the small areas are sampled in a self-adaptive mode, the central point position of each small area is taken, a bilinear interpolation method is used for calculation, and finally a feature map with the fixed size of 14 x 14 is obtained;
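A sketch of the RoI-to-FPN-level mapping k = Ceil(k_max − log2(A_input / A_RoI)); clamping the result to the available pyramid levels is an added assumption:

```python
# Sketch of mapping an RoI to an FPN level (clamping to P3-P5 is assumed).
import math

def map_roi_to_fpn_level(roi_area: float, input_area: float,
                         k_max: int = 5, k_min: int = 3) -> int:
    k = math.ceil(k_max - math.log2(input_area / roi_area))
    return max(k_min, min(k_max, k))

# Example: a small RoI relative to the input image is assigned to a finer level
print(map_roi_to_fpn_level(roi_area=64 * 64, input_area=800 * 1333))
```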
step 2-4-2, constructing a Mask head comprising a spatial attention mechanism: the Mask header contains four convolutional layers with a convolutional kernel size of 3 × 3 and 256 input/output channels. In order to make the branch focus more on the meaningful pixel points, a spatial attention module is introduced after the fourth convolution layer, and the implementation mechanism is as follows:
A_sag(X_i) = σ(F_3×3(P_max(X_i) ⊕ P_avg(X_i)))

X_sag = A_sag(X_i) ⊗ X_i

where X_i is the input feature map, A_sag(X_i) is the spatial attention feature descriptor, P_max and P_avg denote the feature maps obtained by max pooling and average pooling along the channel dimension, ⊕ denotes the cascade (concatenation) operation, F_3×3 is a 3 × 3 convolutional layer, σ is the Sigmoid function, ⊗ is element-wise multiplication, and X_sag is the feature map finally combined with spatial attention.

The obtained X_sag is then up-sampled by a deconvolution layer with convolution kernel size 2 × 2 and step size 2 to obtain a feature map of size 28 × 28 with the same number of channels. The last convolutional layer of the Mask head is a category-specific mask prediction layer; since the detection target is the single mesoscale convective system category, this mask prediction layer has a convolution kernel size of 1 × 1, a step size of 1, and 1 output channel;
Step 2-4-3, constructing the maskIoU head, which uses Mask Scoring (reference: Mask Scoring R-CNN) to re-express the mask quality: the output feature map of the mask prediction layer in step 2-4-2 is down-sampled 2-fold by a 2 × 2 max pooling layer and then cascaded with the RoI feature map output by the RoI Align layer (size 14 × 14, 256 channels), yielding a feature map of size 14 × 14 with 257 channels.
The maskIoU header contains four successive convolutional layers with convolutional kernel size of 3 × 3 and output channel number of 256, where the step size of the last convolutional layer is 2. Then, 2 full-connection layers with the output channel number of 1024 and 1 full-connection layer with the output channel number of 1 are also connected;
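A minimal sketch of this maskIoU head under the assumption that the 257-channel, 14 × 14 input passes through four 3 × 3 convolutions (the last with step size 2), two fully connected layers of width 1024, and a final fully connected layer with one output; the class name is hypothetical:

```python
# Sketch of a maskIoU head (illustrative layer arrangement).
import torch
import torch.nn as nn

class MaskIoUHead(nn.Module):
    def __init__(self, in_channels: int = 257):
        super().__init__()
        convs = []
        for i in range(4):
            stride = 2 if i == 3 else 1            # last conv downsamples 14x14 -> 7x7
            convs += [nn.Conv2d(in_channels if i == 0 else 256, 256, 3,
                                stride=stride, padding=1), nn.ReLU(inplace=True)]
        self.convs = nn.Sequential(*convs)
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 7 * 7, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, 1),                    # predicted mask IoU
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.convs(x))
```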
the step 3 of the invention comprises:
step 3-1, because the infrared cloud atlas of the data set is a single-channel gray image, the gray image in the training set is converted into three channel images of RGB (red, green and blue) for subsequent transfer learning, the values of the three channels of RGB are the same and are the gray values of the gray image, and data enhancement is performed on the converted images in the training set and corresponding example labels: the image is first scaled to multiple scales, with the long side at 1333 and the short side at a random one of {640,672,704,736,768,800}, and the original scale of the image is preserved while also data enhancement is performed using random horizontal flipping.
The pixel values are then centered and scaled: since training uses a model pre-trained on the ImageNet dataset, the images need to be normalized according to the ImageNet statistics. The normalization is carried out per channel; the means of the R, G, B channels are 123.675, 116.28, and 103.53, and the standard deviations are 58.395, 57.12, and 57.375, respectively. Writing the three RGB channel means as a vector μ and the three RGB channel standard deviations as a vector σ, and letting x denote the input image, the normalized image data is

x' = (x − μ) / σ

Then the image scale is padded up to a multiple of 32 to avoid feature loss in subsequent convolution operations;
correspondingly, the corresponding instance labels of the input image after zooming and horizontal turning data enhancement are also subjected to the same transformation to obtain correct labels;
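A sketch of this preprocessing, covering the conversion to three identical RGB channels, random horizontal flipping, ImageNet mean/std normalization, and padding to a multiple of 32; the multi-scale resizing and the matching label transforms are omitted, and the helper name is an assumption:

```python
# Sketch of the per-image preprocessing (resizing and label transforms omitted).
import random
import numpy as np

IMAGENET_MEAN = np.array([123.675, 116.28, 103.53], dtype=np.float32)
IMAGENET_STD = np.array([58.395, 57.12, 57.375], dtype=np.float32)

def preprocess(gray: np.ndarray, training: bool = True) -> np.ndarray:
    """gray: (H, W) uint8 infrared brightness-temperature image."""
    rgb = np.stack([gray.astype(np.float32)] * 3, axis=-1)   # identical R, G, B channels
    if training and random.random() < 0.5:
        rgb = rgb[:, ::-1, :]                                 # random horizontal flip
    rgb = (rgb - IMAGENET_MEAN) / IMAGENET_STD                # per-channel normalization
    h, w = rgb.shape[:2]                                      # pad H and W up to multiples of 32
    ph, pw = (32 - h % 32) % 32, (32 - w % 32) % 32
    return np.pad(rgb, ((0, ph), (0, pw), (0, 0)), mode="constant")
```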
step 3-2, setting a classification loss function of the head of the predicted frame in the example segmentation network: the classification task uses the Focal Loss function, Focal local, to ameliorate the class imbalance problem in the one-stage detector. Since only one class of labels is included in the dataset, only one two-classifier needs to be trained. Consider that the network treats each location as a training sample rather than an anchor box, let feature graph Fi(i-3, 4.. 7) position (x) obtained by classification branchingi,yi) Predicted value is
Figure BDA0002974083080000073
Is provided with
Figure BDA0002974083080000074
Wherein
Figure BDA0002974083080000075
Indicating whether it is a positive or negative sample,
Figure BDA0002974083080000076
indicates the position (x)i,yi) In the case of a positive sample,
Figure BDA0002974083080000077
then the position (x) is indicatedi,yi) In the form of a negative sample, the sample,
Figure BDA0002974083080000078
is a position (x)i,yi) Probability of being a positive sample;
will position (x)i,yi) The corresponding position (x ', y') mapped to the input image is:
Figure BDA0002974083080000079
wherein s isiIs a characteristic diagram FiCompared to the Output Stride of the input image scale, if (x ', Y') falls into any one of the Group Truth (GT) frames, then the Y is 1 for the positive sample, otherwise the Y is 0. Alpha balance based focus loss function
Figure BDA0002974083080000081
Expressed as:
Figure BDA0002974083080000082
where α is the weighting factor, γ ≧ 0 is the adjustable focusing parameter, and typically α and γ are set to 0.25 and 2.0, respectively. For feature map FiTotal classification loss function of
Figure BDA0002974083080000083
Comprises the following steps:
Figure BDA0002974083080000084
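A sketch of the α-balanced focal loss matching the formula above, with α = 0.25 and γ = 2.0; this is an illustrative implementation, not the patent's source code:

```python
# Sketch of the alpha-balanced focal loss for the single-class classification branch.
import torch

def focal_loss(pred_prob: torch.Tensor, target: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """pred_prob: predicted positive-class probabilities; target: 0/1 labels."""
    eps = 1e-6
    pos = -alpha * (1.0 - pred_prob) ** gamma * torch.log(pred_prob + eps)
    neg = -(1.0 - alpha) * pred_prob ** gamma * torch.log(1.0 - pred_prob + eps)
    loss = torch.where(target > 0.5, pos, neg)
    return loss.sum()   # summed over all locations of the feature map
```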
Step 3-3, setting the center-ness loss function: given the distance frame regression target d*_(x_i, y_i) = (l*, t*, r*, b*) of a positive-sample feature point (x_i, y_i) in feature map F_i, where l*, t*, r*, b* respectively denote the distances from feature point (x_i, y_i) to the left, top, right, and bottom sides of the true frame, the center-ness target o*_(x_i, y_i) of feature point (x_i, y_i) is defined as:

o*_(x_i, y_i) = sqrt( (min(l*, r*) / max(l*, r*)) × (min(t*, b*) / max(t*, b*)) )

Obviously o*_(x_i, y_i) ∈ [0, 1], so the center-ness branch adopts the binary cross-entropy (BCE) loss function during training. Let ô_(x_i, y_i) be the predicted value of position (x_i, y_i) obtained by the center-ness branch of feature map F_i; then the total center-ness loss function L_ctn^i of feature map F_i is:

L_ctn^i = Σ_(x_i, y_i) 1{c_(x_i, y_i) = 1} · BCE(ô_(x_i, y_i), o*_(x_i, y_i))

where c_(x_i, y_i) indicates whether position (x_i, y_i) is a positive or negative sample, c_(x_i, y_i) = 1 indicating that position (x_i, y_i) is a positive sample and c_(x_i, y_i) = 0 that it is a negative sample, and 1{c_(x_i, y_i) = 1} is the indicator function, whose value is 1 if the condition c_(x_i, y_i) = 1 in braces holds and 0 otherwise;
Step 3-4, setting the distance regression loss function: the regression task adopts the GIoU loss function. Let the predicted distances be d_(x_i, y_i) = (l, t, r, b), where l, t, r, b respectively denote the distances from feature point (x_i, y_i) to the left, top, right, and bottom sides of the predicted frame, and let the regression target be d*_(x_i, y_i) = (l*, t*, r*, b*), where l*, t*, r*, b* respectively denote the distances from feature point (x_i, y_i) to the left, top, right, and bottom sides of the true frame. If position (x_i, y_i) falls into several GT frames, the frame with the smallest area is selected as the distance regression target. Regarding the distances d_(x_i, y_i) and d*_(x_i, y_i) as their corresponding bounding boxes B and B*, the GIoU loss function is expressed as:

GIoU(B, B*) = IoU(B, B*) − |C \ (B ∪ B*)| / |C|

L_GIoU(d_(x_i, y_i), d*_(x_i, y_i)) = 1 − GIoU(B, B*)

where C is the smallest enclosing box of B and B*, | · | denotes the area of a region, |C \ (B ∪ B*)| denotes the area of the part of C not covered by B and B*, IoU(B, B*) is the intersection-over-union of B and B*, and GIoU(B, B*) is the generalized intersection-over-union of B and B*.

The total distance regression loss function of feature map F_i is:

L_reg^i = Σ_(x_i, y_i) 1{c_(x_i, y_i) = 1} · o*_(x_i, y_i) · L_GIoU(d_(x_i, y_i), d*_(x_i, y_i))

where 1{c_(x_i, y_i) = 1} has the same meaning as in step 3-3, and the element-wise weighting coefficient of the loss function is o*_(x_i, y_i), the center-ness regression target of position (x_i, y_i);
Step 3-5, setting the Mask loss function: before the Mask Head, the prediction frame head yields a number of proposed frames, and at most 100 RoIs per image are obtained according to a score threshold of 0.05 and a non-maximum suppression IoU threshold of 0.6. To improve the convergence speed and the detection performance, the GT frames are also added to the RoIs for network training. Suppose there are N RoIs in total after adding the GT frames and K GT frames in total; the IoU between each RoI and each GT is computed, the RoIs are divided into positive and negative samples according to an IoU threshold of 0.5 (positive samples labeled 1 and negative samples labeled 0), a dictionary D is obtained that maps the i-th RoI (i ∈ [0, N)) to the information of the GT frame that best matches it, and finally sampling is performed so that positive samples account for 1/4 of all samples, giving the training samples X for the Mask Head.
Since the Mask loss is defined only on positive samples, foreground screening of the training samples X is also required. Suppose the prediction frame head yields M positive-sample RoIs; the i-th RoI_i passes through region-of-interest alignment RoIAlign to produce a feature map F_i of size 14 × 14, and F_i passes through the Mask Head to produce a prediction feature map pred_i of size 28 × 28. For the mask target gt_mask_i, the GT information (including category, frame, and polygon mask) is first found according to the RoI index in D, then the frame region is cropped from the original mask according to the frame information, and finally the cropped part is resized to 28 × 28 to obtain the final gt_mask_i. L_mask is computed using the average binary cross-entropy loss function:

L_mask = −(1 / M) Σ_{i=1}^{M} (1 / (28 × 28)) Σ_{(x, y)} [ gt_mask_i(x, y) · log(pred_i(x, y)) + (1 − gt_mask_i(x, y)) · log(1 − pred_i(x, y)) ]

where pred_i(x, y) and gt_mask_i(x, y) are the values of feature maps pred_i and gt_mask_i at position (x, y);
Step 3-6, setting the maskIoU loss function: let pred_maskIoU_i be the predicted value obtained by passing the i-th cascaded feature map through the maskIoU head of the prediction mask branch, and let gt_maskIoU_i be the maskIoU target, whose value is computed from the predicted mask obtained in step 3-5 and the mask information in the corresponding GT. L_maskIoU is computed with the ℓ2 loss function:

L_maskIoU = (1 / M) Σ_{i=1}^{M} (pred_maskIoU_i − gt_maskIoU_i)²
Step 3-7, setting a multitask loss function: adding the classification loss function, the centrality loss function, the distance regression loss function, the Mask loss function and the maskIoU loss function to obtain a total multi-task loss function L of the small batch:
Figure BDA0002974083080000105
wherein N iscls、Nctn、Nreg、Nmask、NmaskIoUIs a normalized coefficient of the corresponding loss function, and Ncls=Nctm=NregTo predict the number of positive samples in the bounding box header, Nmask=NmaskIoUThe number of positive samples obtained for the proposed frames proposals predicted in the prediction frame header based on the IoU threshold and the ratio of sampled positive and negative samples. Lambda [ alpha ]ctn、λcls、λcls、λclsThe balance coefficients for each loss are all 1;
and 3-8, setting relevant parameters of network learning and training.
Steps 3-8 comprise: transfer learning is performed using the weights of VoVNetV2-99 pre-trained on ImageNet. The 1000 proposals with the highest confidence in the frame prediction results are selected, and 100 proposals are retained after non-maximum suppression with an NMS IoU threshold of 0.6. Proposals whose IoU with a true (ground-truth) frame is greater than 0.5 are regarded as positive samples in the Mask Head, and negative samples otherwise; the positive samples used for training account for 1/4 of all training samples. Meanwhile, the network is trained with mini-batch stochastic gradient descent to optimize the model, with a batch size of 2 and a learning rate of 0.0025; a warmup strategy is adopted in the first 1000 iterations, and momentum with a coefficient of 0.9 is used. For regularization, a weight decay coefficient of 10^-4 is adopted. The whole training lasts 24 epochs, with a step-decay learning rate schedule applied at epochs 16 and 22 with a decay factor of 0.1. After the learning parameters are set, the constructed network is trained with the training set processed in step 3-1;
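A sketch of this optimization setup (mini-batch SGD, momentum 0.9, learning rate 0.0025, weight decay 10^-4, 1000-iteration warmup, step decay by 0.1 at epochs 16 and 22); the helper functions are illustrative and `model` is assumed to be the constructed network:

```python
# Sketch of the optimizer, learning-rate schedule, and warmup described above.
import torch

def build_optimizer_and_scheduler(model: torch.nn.Module):
    optimizer = torch.optim.SGD(model.parameters(), lr=0.0025,
                                momentum=0.9, weight_decay=1e-4)
    # epoch-level step decay; a separate linear warmup is applied per iteration
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                     milestones=[16, 22], gamma=0.1)
    return optimizer, scheduler

def warmup_lr(optimizer, iteration: int, base_lr: float = 0.0025, warmup_iters: int = 1000):
    if iteration < warmup_iters:
        for group in optimizer.param_groups:
            group["lr"] = base_lr * (iteration + 1) / warmup_iters
```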
the step 4 of the invention comprises:
step 4-1, performing the same processing as in step 3-1 on the image needing instance segmentation, except that only single-scale scaling is performed on the image during inference, namely the long side is 1333 and the short side is 800;
Step 4-2, the trained network performs forward computation on the test images obtained in step 1-3; proposed frames (proposals) with confidence below 0.05 are removed at the prediction frame head, the 1000 proposals with the highest confidence are screened out, at most 50 RoIs are obtained through non-maximum suppression (NMS) for mask branch prediction, and finally the k-th mask is taken according to the category k predicted by the classification branch (the data used herein contain only one category); the 28 × 28 mask is scaled to the size of the corresponding RoI and binarized with a threshold of 0.5 to generate the final mask;
Step 4-3, the Top-50 proposals predict the mask IoU through the maskIoU head, which is multiplied by the classification confidence to obtain the mask confidence;
step 4-4, adopting NMS (network management system) according to the Mask generated in the step 4-2 and the Mask confidence coefficient obtained in the corresponding step 4-3, wherein the threshold value of IoU is 0.5, and screening out the final prediction Mask;
and 4-5, carrying out mesoscale convection system example segmentation on the stationary satellite infrared cloud pictures of the Jianghuai region of China at adjacent moments to obtain mesoscale convection system examples at continuous moments.
The step 5 of the invention comprises:
Step 5-1, obtaining the centroid coordinates (x̄, ȳ), the characteristic area, and the intensity P of each mesoscale convective system instance:

x̄ = (1 / N) Σ_{i=1}^{N} x_i,   ȳ = (1 / N) Σ_{i=1}^{N} y_i

area = N

P = f_z / N = (1 / N) Σ_{i=1}^{N} f(i)

where N is the total number of pixel points encompassed by the mesoscale convective system instance, x_i and y_i are the abscissa and ordinate of the i-th pixel point, f(i) is the brightness temperature value of the i-th pixel point, and f_z is the accumulated sum of the brightness temperatures of the pixels in the instance; the image resolution must be combined in the subsequent calculations;
Step 5-2, solving, in time order, the duration, centroid position change, area change, and intensity change of the mesoscale convective systems at two adjacent moments; if, for two mesoscale convective systems at adjacent moments, the centroid position change is no more than 50 m/s, the area change is no more than 5 km², and the intensity change is no more than 0.001 °C/s, they are preliminarily judged to be the same target;
Step 5-3: the splitting and merging phenomena that commonly occur in mesoscale convective systems need corresponding treatment. If there are n mesoscale convective systems MCS at a certain moment, denoted X_1, X_2, ..., X_n, that are all associated with one MCS at the previous moment, denoted P_j, and the tracking matching principle of step 5-2 is satisfied, the phenomenon belongs to the splitting case; the X_m (1 ≤ m ≤ n) with the largest area after splitting is selected to continue P_j: the previous-moment index value of the path of X_m is updated to j and its duration to the duration of P_j plus 1, while the other MCSs are regarded as newly generated, their previous-moment index values are set to empty, and their durations are initialized to 1.

If more than two MCSs at the previous moment, denoted P_1, P_2, ..., P_z, are all associated with one MCS at the current moment, denoted P_a, and the MCS tracking matching principle is satisfied, the phenomenon belongs to the merging case; the P_m (1 ≤ m ≤ z) with the largest area at the previous moment is selected as the previous-moment track of P_a, and the previous-moment index value of the path of P_a is updated to m and its duration to the duration of P_m plus 1. The life cycles of the previous-moment MCSs that do not have the largest area are regarded as finished; it is then judged whether their duration is at least 1 h, and if so, the corresponding MCS path is obtained in reverse order according to the previous-moment index values, thereby realizing the tracking of the MCS.
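A simplified sketch of the tracking attributes and matching rule of step 5: centroid, pixel area, and mean brightness-temperature intensity are computed per instance mask, and two instances at adjacent moments are provisionally matched when all three change rates stay within the thresholds; the time step, the pixel-to-kilometre conversion, and the interpretation of the area threshold as a rate are assumptions:

```python
# Sketch of instance attributes and the adjacent-moment matching rule (illustrative only).
import numpy as np

def instance_attributes(mask: np.ndarray, tbb: np.ndarray):
    """mask: boolean instance mask; tbb: brightness-temperature image of the same shape."""
    ys, xs = np.nonzero(mask)
    centroid = (xs.mean(), ys.mean())
    intensity = tbb[ys, xs].mean()        # P = f_z / N
    return centroid, xs.size, intensity   # centroid, area (pixel count), intensity

def same_target(attr_prev, attr_curr, dt_s=1800.0, km_per_pixel=4.0):
    (cx0, cy0), area0, p0 = attr_prev
    (cx1, cy1), area1, p1 = attr_curr
    speed = np.hypot(cx1 - cx0, cy1 - cy0) * km_per_pixel * 1000.0 / dt_s   # m/s
    area_change = abs(area1 - area0) * km_per_pixel ** 2 / dt_s             # assumed km^2/s
    intensity_change = abs(p1 - p0) / dt_s                                  # degC/s
    return speed <= 50.0 and area_change <= 5.0 and intensity_change <= 0.001
```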
Beneficial effects: MCS identification and tracking are core concerns of meteorological disaster forecasting. In the past, MCS identification methods were usually based on traditional image characteristics; such methods depend on the selection of judgment thresholds, and the whole process involves many image processing techniques and is rather cumbersome. The invention identifies the MCS with a method based on a deep convolutional neural network and, through an anchor-free fully convolutional formulation, avoids the sensitivity to parameters such as anchor box scale, aspect ratio, and number that affects detection methods based on heuristically preset anchor boxes. Meanwhile, by combining deformable convolution, the invention further improves the capability of modeling the geometric deformation of the MCS, and pays more attention to important channel and spatial information. Compared with other deep learning segmentation methods, the method not only achieves better segmentation performance on MCS identification, but also has fewer network parameters and faster training and inference speed.
Drawings
The foregoing and other advantages of the invention will become more apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
FIG. 1 is a flow chart of a mesoscale convection system identification and tracking process of the present invention;
FIG. 2 is an infrared brightness-temperature-grayscale image of an original geostationary satellite according to an embodiment of the present invention;
FIG. 3 is a structure of an exemplary split network based on an anchorless frame of the present invention;
FIG. 4 is a block structure of a residual block of a backbone network of a partitioned network in accordance with an embodiment of the present invention;
FIG. 5 is a graph illustrating the effect of segmentation in an example of a mesoscale convection system of the present invention;
FIG. 6 is a graph illustrating the tracking effect of the mesoscale convection system of the present invention;
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
As shown in fig. 1, the workflow of identification and tracking of the mesoscale convection system constructed by the method of the present invention can be roughly divided into four stages: the first stage, the data of original satellite data is preprocessed and labeled; the second stage, constructing an instance segmentation network model; in the third stage, training and deducing a network model; and a fourth stage, tracking the detected continuous time mesoscale convection system example. The method for identifying and tracking the mesoscale convection system in the embodiment of the invention specifically comprises the following construction steps:
Step 1: since mesoscale convective systems mostly occur in summer, geostationary-satellite infrared brightness temperature data from June to September of the years 2000 to 2017, provided by the National Oceanic and Atmospheric Administration, are partially and randomly screened to serve as the operating data of the embodiment of the invention. In addition, the screened original satellite data need to be preprocessed:
(1) reading an array of each original geostationary satellite infrared brightness temperature data file in a format of 2 x 3298 x 9896 according to the related description of the satellite data to obtain a gray cloud chart as shown in fig. 2 at corresponding time points of 0 and 30, wherein a blank part with a pixel value of 255 in the gray cloud chart is regarded as a missing value;
(2) Since the remote sensing satellite image is 3298 × 9896 in size, which is too large to serve as the input of the subsequent instance segmentation network, the gray cloud image obtained in step 1-1 is cropped. For the Jianghuai region of China, with latitude 27N to 40N and longitude 110E to 125E, combining the description of the brightness temperature grid gives a region size of approximately

(125 − 110) / 0.036378335 × (40 − 27) / 0.036383683 ≈ 412 × 357 pixels.

Enlarging this result, the whole gray cloud image is cut into a number of sub-images of size 420 × 360;
(3) and giving polygon example level labels of the mesoscale convection system for each gray level infrared cloud subgraph, filtering out subgraphs without the mesoscale convection system, and obtaining json files corresponding to each image, wherein the whole label only has one category. And randomly dividing the sub-image into 9407 training set images, 3135 verification set images and 3136 test set images at a ratio of 6:2: 2;
Step 2, constructing the anchor-free mesoscale convective system instance segmentation convolutional neural network, whose structure is shown in fig. 3. The instance segmentation network structure in fig. 3 comprises: a backbone network for extracting deep abstract features of the image, containing the five stage feature maps C1, C2, C3, C4, and C5; the feature pyramid network fusing the multi-scale features, containing the five feature layers of different scales P3, P4, P5, P6, and P7; a network head for the predicted frame, containing two large branches, namely the Classification branch and the branch in which the Center-Ness branch and the distance frame Regression branch run in parallel, this head being shared by the feature pyramid layers of different scales; a network head for the predicted mask, which contains the spatial attention module SAM and requires region-of-interest alignment RoIAlign before entering the head; and a maskIoU head, whose input feature map is obtained by max-pooling (Maxpooling) downsampling of the Mask head output feature map followed by a cascade operation (Concat) with the feature map obtained by RoIAlign;
step 2-1, a backbone network for feature extraction is constructed. The backbone network in the segmentation network of this embodiment adopts the 99-layer convolutional neural network VoVNetV2-99. All convolutional layers in the backbone network take the Conv-BN-ReLU form, i.e., a convolutional layer followed by a batch normalization layer BN and the linear rectification function ReLU. Cross-layer connections similar to those of the residual network ResNet are adopted in the network building blocks to realize identity mapping, and the residual block containing the identity mapping is defined as:

Y = F(X, {W_i}) + X

where X and Y represent the input and output feature maps of each building block, respectively, and F(X, {W_i}) is the residual mapping function to be learned. To improve the quality of the feature representation, a channel attention eSE (Effective Squeeze-Excitation) mechanism is also introduced into the residual block, so that the network focuses on important feature map channels and suppresses irrelevant ones. It is implemented as:

A_eSE(X_div) = σ(W_C(gap(X_div)))

gap(X) = (1 / (W × H)) Σ_{i=1}^{W} Σ_{j=1}^{H} X_{i,j}

X_refine = A_eSE(X_div) ⊗ X_div

where X_div is the diversified feature map obtained by dimensionality reduction after the concatenation of the network feature maps, gap(·) is channel-level global average pooling (W and H are the width and height of feature map X), W_C is the fully connected layer weight, X_{i,j} is the value of feature map X at (i, j), σ denotes the Sigmoid activation function, and A_eSE(X_div) is the computed channel attention feature descriptor. This descriptor is multiplied element-wise (⊗) with X_div to obtain the refined feature map X_refine. The residual block structure combining the identity mapping and the channel attention mechanism is shown in fig. 4 (a simplified code sketch of such a block is given after stage 5 below). The structure of the whole backbone network VoVNetV2-99 is as follows:
a. Stem stage 1: this stage comprises three convolutional layers. First, a convolutional layer with kernel size 3 × 3, stride 2, padding 1 and 64 output channels (unless stated otherwise, 3 × 3 convolutional layers default to stride 1 and padding 1) downsamples the input image; it is followed by a convolutional layer with kernel size 3 × 3 and 64 output channels and a convolutional layer with kernel size 3 × 3, stride 2 and 128 output channels. After the input image passes through this stage, the first-scale feature map C1 is produced, whose Output Stride relative to the input image is 4;
b. single-shot aggregation OSA (One-Shot Aggregation) module stage 2: this stage comprises one residual block containing 5 convolutional layers with kernel size 3 × 3 and 128 output channels. The input feature map passes through the five convolutional layers in turn, each producing a diversified feature map with 128 channels; at the last convolutional layer, the feature map of that layer, the feature maps of the previous 4 layers and the input first-scale feature map C1 are concatenated, giving a feature map with 128 × 6 = 768 channels. Dimensionality reduction through a convolutional layer with kernel size 1 × 1, stride 1, padding 0 and 256 output channels then gives the diversified feature map X_div; combining it with the channel attention eSE mechanism gives the final X_refine, which is further added element-wise to the input feature map to realize the identity mapping. Finally this stage yields the second-scale feature map C2, whose Output Stride relative to the input image is still 4;
c. OSA module stage 3: this stage is similar to OSA module stage 2, except that it first performs 2-fold downsampling using a 3 × 3 max pooling layer with stride 2 and padding 0, and that the residual blocks use 3 × 3 adjustable (modulated) deformable convolutions instead of conventional convolutions. The deformable convolution is defined as:

y(p) = Σ_{k=1}^{K} w_k · x(p + p_k + Δp_k) · Δm_k

where K is the total number of sampling locations of the convolution kernel (e.g., K = 9 for a 3 × 3 kernel), w_k is the convolution kernel weight at the k-th location, p_k is the predefined offset of the k-th location relative to the center of the receptive field (e.g., for K = 9, p_k ∈ {(-1,-1), (-1,0), (-1,1), (0,-1), (0,0), (0,1), (1,-1), (1,0), (1,1)}), x(p) and y(p) are the values of the input feature map x and the output feature map y at position p, respectively, Δp_k is the learnable offset of the k-th location, and Δm_k ∈ [0,1] is the modulation (adjustment) scalar of the k-th location. The adjustable deformable convolution is implemented by adding, alongside the conventional convolution, a convolutional layer with the same spatial resolution and dilation rate, which learns the offsets of each position of the feature map in the x and y directions of the two-dimensional plane together with the modulation scalars.
OSA module stage 3 contains 3 residual blocks, each containing 5 adjustable deformable convolutional layers with kernel size 3 × 3 and 160 output channels. As in OSA module stage 2 of step 2-1-2, at the last convolutional layer the feature map of that layer, the feature maps of the previous 4 layers and the input second-scale feature map C2 are concatenated, giving a diversified feature map with 256 + 160 × 5 = 1056 channels (this is for the first residual block; the diversified feature maps obtained after the concatenation in the second and third residual blocks have 512 + 160 × 5 = 1312 channels). The 1 × 1 conventional convolution with stride 1 and padding 0 used for dimensionality reduction after the concatenation has 512 output channels, and the channel attention mechanism is also included. After this stage, the third-scale feature map C3 is obtained, whose Output Stride relative to the input image is 8;
d. OSA module stage 4: this stage is similar to OSA module stage 3, except that it contains 9 residual blocks, each containing 5 adjustable deformable convolutional layers with kernel size 3 × 3 and 192 output channels. As in OSA module stages 2 and 3, at the last convolutional layer the feature map of that layer, the feature maps of the previous 4 layers and the input third-scale feature map C3 are concatenated, giving a diversified feature map with 512 + 192 × 5 = 1472 channels (this is for the first residual block; the diversified feature maps obtained after the concatenation in the second to ninth residual blocks have 768 + 192 × 5 = 1728 channels). The 1 × 1 conventional convolution with stride 1 and padding 0 used for dimensionality reduction after the concatenation has 768 output channels, and the channel attention mechanism is also included. After these convolutional layers, the fourth-scale feature map C4 is obtained, whose Output Stride relative to the input image is 16;
e. OSA module stage 5: this stage is structured like stage 4, but contains only 3 residual blocks, each containing 5 adjustable deformable convolutional layers with kernel size 3 × 3 and 224 output channels. As in OSA module stages 2 and 3, at the last convolutional layer the feature map of that layer, the feature maps of the previous 4 layers and the input fourth-scale feature map C4 are concatenated, giving a diversified feature map with 768 + 224 × 5 = 1888 channels (this is for the first residual block; the diversified feature maps obtained after the concatenation in the second and third residual blocks have 1024 + 224 × 5 = 2144 channels). The 1 × 1 conventional convolution with stride 1 and padding 0 used for dimensionality reduction after the concatenation has 1024 output channels, and the channel attention mechanism is also included. After this stage, feature map C5 is finally obtained, whose Output Stride relative to the input image is 32;
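To make the aggregation and attention steps of stages 2 to 5 concrete, the following is a simplified PyTorch-style sketch of one OSA residual block with eSE channel attention (the sketch referred to above); ordinary convolutions stand in for the deformable convolutions of stages 3 to 5, and all class and variable names are illustrative rather than taken from the patent.

import torch
import torch.nn as nn

class OSABlock(nn.Module):
    """One OSA residual block: five 3x3 convolutions, one-shot concatenation of the
    block input with all intermediate maps, 1x1 reduction, eSE channel attention and
    an identity shortcut (applied when input and output widths match)."""
    def __init__(self, in_ch: int, stage_ch: int, out_ch: int, num_convs: int = 5):
        super().__init__()
        convs, ch = [], in_ch
        for _ in range(num_convs):
            convs.append(nn.Sequential(
                nn.Conv2d(ch, stage_ch, 3, padding=1, bias=False),
                nn.BatchNorm2d(stage_ch),
                nn.ReLU(inplace=True)))
            ch = stage_ch
        self.convs = nn.ModuleList(convs)
        self.reduce = nn.Conv2d(in_ch + num_convs * stage_ch, out_ch, 1)  # 1x1 dimensionality reduction
        self.ese_fc = nn.Conv2d(out_ch, out_ch, 1)                        # the "fully connected" layer W_C of eSE
        self.use_identity = (in_ch == out_ch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats, y = [x], x
        for conv in self.convs:
            y = conv(y)
            feats.append(y)
        x_div = self.reduce(torch.cat(feats, dim=1))                               # diversified feature map X_div
        attn = torch.sigmoid(self.ese_fc(x_div.mean(dim=(2, 3), keepdim=True)))    # A_eSE(X_div)
        out = x_div * attn                                                          # X_refine
        return out + x if self.use_identity else out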
step 2-2, the multi-scale features are fused with a Feature Pyramid Network (FPN). The FPN combines top-down and lateral connections over the different-scale features {C3, C4, C5} from step 2-1 to obtain the fused features {M3, M4, M5}: M5 is obtained from feature map C5 through a 1 × 1 convolutional layer; M4 is obtained by element-wise addition of the 2-fold nearest-neighbor upsampling of M5 and the feature map obtained from C4 through a 1 × 1 convolutional layer; and M3 is obtained by element-wise addition of the 2-fold nearest-neighbor upsampling of M4 and the feature map obtained from C3 through a 1 × 1 convolutional layer. Finally, each layer in {M3, M4, M5} passes through a convolution with kernel size 3 × 3 and 256 output channels to alleviate the aliasing caused by nearest-neighbor interpolation, giving the feature layers {P3, P4, P5}. The segmentation network of this embodiment additionally adds the P6 and P7 feature layers, obtained by 2-fold downsampling of P5 and P6, respectively, through a 3 × 3 convolutional layer with stride 2, finally giving the feature layers {P3, P4, P5, P6, P7};
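A simplified PyTorch-style sketch of the fusion in step 2-2 follows; the channel counts (512, 768, 1024 for C3 to C5) follow the backbone description above, and placing a ReLU before the P7 convolution is an assumption borrowed from common FCOS practice rather than something stated in the text.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    def __init__(self, in_channels=(512, 768, 1024), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        self.smooth = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                    for _ in in_channels)
        self.p6 = nn.Conv2d(out_channels, out_channels, 3, stride=2, padding=1)
        self.p7 = nn.Conv2d(out_channels, out_channels, 3, stride=2, padding=1)

    def forward(self, c3, c4, c5):
        m5 = self.lateral[2](c5)
        m4 = self.lateral[1](c4) + F.interpolate(m5, scale_factor=2, mode="nearest")
        m3 = self.lateral[0](c3) + F.interpolate(m4, scale_factor=2, mode="nearest")
        p3, p4, p5 = self.smooth[0](m3), self.smooth[1](m4), self.smooth[2](m5)
        p6 = self.p6(p5)
        p7 = self.p7(F.relu(p6))  # ReLU before P7 is assumed, not specified in the text
        return p3, p4, p5, p6, p7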
step 2-3, the bounding-box prediction head of the instance segmentation network is constructed; this part contains 2 branches in total, namely a classification branch and a multi-task branch in which the centrality (Center-Ness) and the distance bounding-box regression are parallel:
step 2-3-1, constructing the classification branch: three conventional convolutional layers with 256 input and output channels and kernel size 3 × 3 and one adjustable deformable convolutional layer are connected in sequence after each feature layer (i.e., P3-P7) of the FPN; these four convolutional layers do not use Batch Normalization (BN) but Group Normalization (GN), to avoid the influence of the batch size on the network model. Because only the single class of mesoscale convection system is detected, a prediction-classification convolutional layer is added at the end of the branch, with 256 input channels, 1 output channel and kernel size 3 × 3;
step 2-3-2, constructing the multi-task branch with parallel centrality and distance bounding-box regression: after the FPN feature layer of each scale, this part likewise connects in sequence three conventional convolutional layers with the same structure as in the classification branch and one adjustable deformable convolutional layer. These four convolutional layers are followed by two parallel branches: a 3 × 3 convolutional layer with 4 output channels for bounding-box regression, whose four output values represent the distances from the current position to the four sides of the bounding box; and a 3 × 3 convolutional layer with 1 output channel for predicting the centrality, whose one-dimensional output is the centrality value;
assuming that the distances from a sample point (x, y) on feature map F to the four sides of the target bounding box to which the point belongs are d = (l, t, r, b), where l, t, r, b respectively denote the distances to the left, upper, right and lower sides of the rectangular box, the centrality is defined as:

centerness = sqrt( (min(l, r) / max(l, r)) × (min(t, b) / max(t, b)) )

where min() and max() are the minimum and maximum functions, respectively. When the instance segmentation convolutional neural network performs detection, the predicted centrality value is multiplied by the classification score to obtain the final confidence. The centrality branch mainly suppresses low-quality boxes far from the center of the target object: from the definition, the closer the sample point (x, y) is to the center of the target box, the closer the centrality value is to 1, and the farther away it is, the closer the value is to 0; the classification scores of distant points are therefore multiplied by small centrality values and yield low confidences, so the low-quality boxes regressed from distant points are more easily filtered out in the Non-Maximum Suppression (NMS) stage.
step 2-4, similarly to the region proposal network RPN in anchor-based detectors such as Mask R-CNN, the bounding-box head predicts a number of box regions; after part of them are removed by non-maximum suppression with a IoU threshold of 0.6, 100 RoIs (regions of interest) are obtained. Following the implementation principle of Mask R-CNN, an anchor-free instance segmentation RoI head is constructed, specifically comprising the following steps:
step 2-4-1, constructing the region-of-interest alignment (RoIAlign) layer: this layer first maps each RoI, according to its size, to the corresponding FPN feature layer P_k, implemented as:

k = Ceil(k_max − log2(A_input / A_RoI))

where A_input and A_RoI denote the areas of the input image and of the RoI, respectively, k_max is set to 5, the number of the last feature layer derived from the backbone network, Ceil() is the ceiling (round-up) function, and k is the number of the FPN feature layer to which the RoI is mapped;
the layer then divides the corresponding region on the mapped feature map into 14 × 14 = 196 small areas of equal size, samples them adaptively, takes the center position of each part and computes its value by bilinear interpolation, finally yielding a feature map of fixed size 14 × 14;
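A small sketch of the level-assignment rule of step 2-4-1; the clamping of k to the available pyramid levels is an assumption added for robustness and is not stated in the text.

import math

def assign_fpn_level(input_area: float, roi_area: float, k_max: int = 5, k_min: int = 3) -> int:
    """k = Ceil(k_max - log2(A_input / A_RoI)), clamped to [k_min, k_max] (clamp assumed)."""
    k = math.ceil(k_max - math.log2(input_area / max(roi_area, 1.0)))
    return max(k_min, min(k_max, k))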
step 2-4-2, constructing the Mask head containing a spatial attention mechanism: this part contains four consecutive convolutional layers with kernel size 3 × 3 and 256 input/output channels. To make this branch focus on meaningful pixels, a spatial attention module is introduced after the fourth convolutional layer, implemented as:

A_sag(X_i) = σ( F_3×3( P_max(X_i) ⊕ P_avg(X_i) ) )

X_sag = A_sag(X_i) ⊗ X_i

where X_i is the input feature map, A_sag(X_i) is the spatial attention feature descriptor, P_max and P_avg denote the feature maps obtained by max pooling and average pooling over the channel dimension, ⊕ denotes the concatenation operation, F_3×3 is a 3 × 3 convolutional layer, σ is the Sigmoid function, ⊗ is element-wise multiplication, and X_sag is the feature map finally combined with spatial attention;
the obtained X_sag is then upsampled by a deconvolution layer with kernel size 2 × 2 and stride 2, giving a feature map of size 28 × 28 with the same number of channels. The last convolutional layer of the Mask head is the class-specific Mask prediction layer; since the detection target is the single class of mesoscale convection system, this prediction layer has kernel size 1 × 1, stride 1 and 1 output channel;
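A minimal PyTorch-style sketch of the spatial attention module described in step 2-4-2 (the class name is illustrative):

import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=3, padding=1)  # F_3x3 over the 2-channel pooled map

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        p_max = x.max(dim=1, keepdim=True).values   # P_max: max over the channel dimension
        p_avg = x.mean(dim=1, keepdim=True)          # P_avg: mean over the channel dimension
        attn = torch.sigmoid(self.conv(torch.cat([p_max, p_avg], dim=1)))  # A_sag(X)
        return x * attn                               # X_sag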
step 2-4-3, constructing the MaskIoU head, which re-scores the Mask quality using the Mask score (Mask Scoring): first, on the basis of step 2-4-2, the output feature map of the Mask prediction layer is downsampled 2-fold with a 2 × 2 max pooling layer and then concatenated with the 14 × 14, 256-channel RoI feature map output by the RoIAlign layer, giving a feature map of size 14 × 14 with 257 channels. The MaskIoU head contains four consecutive convolutional layers with kernel size 3 × 3 and 256 output channels, the last of which has stride 2; these are followed by 2 fully connected layers with 1024 output channels and 1 fully connected layer with 1 output channel;
step 3, training the instance segmentation network model, specifically comprising the following steps:
step 3-1, because the infrared cloud images in the data set are single-channel gray images, they are converted into RGB three-channel images for subsequent transfer learning; the three channels have the same value, namely the original gray value. Data enhancement is performed on the converted training set images and the corresponding instance labels: the images are first scaled to multiple scales, with the long side at 1333 and the short side randomly chosen from {640, 672, 704, 736, 768, 800} while keeping the original aspect ratio; random horizontal flipping is also used for data enhancement.
The pixel values are then centered. Since training uses a model pre-trained on the ImageNet data set, the images are normalized with the ImageNet statistics. The normalization is performed per channel: the means of the R, G, B channels are 123.675, 116.28 and 103.53, and the standard deviations are 58.395, 57.12 and 57.375, respectively. Writing the three RGB channel means as a vector μ and the three RGB channel standard deviations as a vector σ, and letting the input image be x, the normalized image data x' is:

x' = (x − μ) / σ

the image dimensions are then padded to multiples of 32 to avoid feature loss caused by the subsequent convolution operations;
correspondingly, the instance labels of the input images undergo the same scaling and horizontal-flipping transformations so that the labels remain correct;
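A minimal sketch of the per-channel normalization of step 3-1, assuming the gray image has already been copied into a three-channel array; the function and constant names are illustrative.

import numpy as np

IMAGENET_MEAN = np.array([123.675, 116.28, 103.53], dtype=np.float32)
IMAGENET_STD = np.array([58.395, 57.12, 57.375], dtype=np.float32)

def normalize_image(image_rgb: np.ndarray) -> np.ndarray:
    """image_rgb: H x W x 3 array of raw pixel values (the grey value copied to all
    three channels); returns x' = (x - mu) / sigma computed per channel."""
    return (image_rgb.astype(np.float32) - IMAGENET_MEAN) / IMAGENET_STD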
step 3-2, the classification loss function of the bounding-box prediction head in the instance segmentation network is set. The classification task uses the focal loss function (Focal Loss) to alleviate the class imbalance problem of one-stage detectors. Since the data set contains only one class of labels, only a single binary classifier needs to be trained. Considering that the network treats every location as a training sample rather than an anchor box, let the predicted value obtained by the classification branch for position (x_i, y_i) of feature map F_i (i = 3, 4, ..., 7) be p_{x_i, y_i}, and let Y_{x_i, y_i} ∈ {0, 1} indicate whether the location is a positive or a negative sample: Y_{x_i, y_i} = 1 means position (x_i, y_i) is a positive sample, Y_{x_i, y_i} = 0 means it is a negative sample, and p_{x_i, y_i} is the predicted probability that the position is a positive sample. Position (x_i, y_i) is mapped to the corresponding position (x', y') of the input image as:

(x', y') = ( ⌊s_i / 2⌋ + x_i · s_i , ⌊s_i / 2⌋ + y_i · s_i )

where s_i is the Output Stride of feature map F_i relative to the input image. If (x', y') falls inside any Ground Truth (GT, true annotation) box, the location is a positive sample and Y_{x_i, y_i} = 1; otherwise Y_{x_i, y_i} = 0. The α-balanced focal loss can be expressed as:

FL(p_{x_i, y_i}) = −α (1 − p_{x_i, y_i})^γ log(p_{x_i, y_i}),  if Y_{x_i, y_i} = 1
FL(p_{x_i, y_i}) = −(1 − α) (p_{x_i, y_i})^γ log(1 − p_{x_i, y_i}),  if Y_{x_i, y_i} = 0

where α is the weighting factor and γ ≥ 0 is the adjustable focusing parameter; in the experiments α and γ are set to 0.25 and 2.0, respectively. For feature map F_i, the overall classification loss function is:

L_cls^i = Σ_{(x_i, y_i) ∈ F_i} FL(p_{x_i, y_i});
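A short PyTorch-style sketch of the α-balanced focal loss of step 3-2 (the function name is assumed); it returns the summed loss, with the normalization by the number of positive samples left to the multi-task loss of step 3-7:

import torch

def focal_loss(p: torch.Tensor, y: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """p: predicted positive-class probabilities; y: 0/1 labels of the same shape."""
    p = p.clamp(1e-6, 1.0 - 1e-6)
    pos = -alpha * (1.0 - p) ** gamma * torch.log(p)
    neg = -(1.0 - alpha) * p ** gamma * torch.log(1.0 - p)
    return torch.where(y == 1, pos, neg).sum()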
step 3-3, the centrality loss function is set. For a positive-sample feature point (x_i, y_i) in feature map F_i, let the distance bounding-box regression target be d*_{x_i, y_i} = (l*, t*, r*, b*), where l*, t*, r*, b* denote the distances from the feature point (x_i, y_i) to the left, upper, right and lower sides of the true (ground-truth) bounding box. The centrality target of feature point (x_i, y_i) is then defined as:

centerness*_{x_i, y_i} = sqrt( (min(l*, r*) / max(l*, r*)) × (min(t*, b*) / max(t*, b*)) )

Obviously centerness*_{x_i, y_i} ∈ [0, 1], so during training the centrality branch adopts a binary cross-entropy loss. Let the predicted value obtained by the centrality branch for position (x_i, y_i) of feature map F_i be c_{x_i, y_i}. The total centrality loss function of feature map F_i is then:

L_ctn^i = Σ_{(x_i, y_i) ∈ F_i} 1{Y_{x_i, y_i} = 1} · BCE( c_{x_i, y_i}, centerness*_{x_i, y_i} )

where Y_{x_i, y_i} indicates whether the position is a positive or a negative sample as in step 3-2 (Y_{x_i, y_i} = 1 means position (x_i, y_i) is a positive sample, Y_{x_i, y_i} = 0 a negative sample), BCE(·,·) is the binary cross-entropy loss, and 1{·} is the indicator function, whose value is 1 if the condition Y_{x_i, y_i} = 1 in brackets holds and 0 otherwise;
step 3-4, the distance regression loss function is set. The regression task adopts the GIoU loss. Let the predicted distances be d_{x_i, y_i} = (l, t, r, b), where l, t, r, b denote the distances from the feature point (x_i, y_i) to the left, upper, right and lower sides of the predicted bounding box, and let the regression distance target be d*_{x_i, y_i} = (l*, t*, r*, b*), where l*, t*, r*, b* denote the distances from the feature point (x_i, y_i) to the left, upper, right and lower sides of the true bounding box. If position (x_i, y_i) falls into several GT boxes, the box with the smallest area is chosen as the distance regression target. Regarding the distances d_{x_i, y_i} and d*_{x_i, y_i} as the corresponding bounding boxes, the GIoU loss function can be expressed as:

GIoU(d_{x_i, y_i}, d*_{x_i, y_i}) = IoU(d_{x_i, y_i}, d*_{x_i, y_i}) − |C \ (d_{x_i, y_i} ∪ d*_{x_i, y_i})| / |C|

L_GIoU(d_{x_i, y_i}, d*_{x_i, y_i}) = 1 − GIoU(d_{x_i, y_i}, d*_{x_i, y_i})

where C is the smallest enclosing box of d_{x_i, y_i} and d*_{x_i, y_i}, |·| denotes the area of a region, |C \ (d_{x_i, y_i} ∪ d*_{x_i, y_i})| denotes the area of the part of region C that contains neither d_{x_i, y_i} nor d*_{x_i, y_i}, IoU(d_{x_i, y_i}, d*_{x_i, y_i}) is the intersection-over-union of d_{x_i, y_i} and d*_{x_i, y_i}, and GIoU(d_{x_i, y_i}, d*_{x_i, y_i}) is their generalized intersection-over-union. The total distance regression loss function of feature map F_i is:

L_reg^i = Σ_{(x_i, y_i) ∈ F_i} 1{Y_{x_i, y_i} = 1} · centerness*_{x_i, y_i} · L_GIoU(d_{x_i, y_i}, d*_{x_i, y_i})

where Y_{x_i, y_i} has the same meaning as in step 3-3, and the element-level weighting coefficient of the loss is the centrality regression target centerness*_{x_i, y_i} of position (x_i, y_i);
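A PyTorch-style sketch of the GIoU loss of step 3-4 on distance targets (l, t, r, b) measured from the same sample point (the function name is assumed); it returns 1 − GIoU per sample:

import torch

def giou_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """pred, target: (..., 4) tensors of distances (l, t, r, b) to the box sides."""
    pl, pt, pr, pb = pred.unbind(dim=-1)
    tl, tt, tr, tb = target.unbind(dim=-1)
    pred_area = (pl + pr) * (pt + pb)
    target_area = (tl + tr) * (tt + tb)
    # intersection and union of the two boxes anchored at the same sample point
    iw = torch.min(pl, tl) + torch.min(pr, tr)
    ih = torch.min(pt, tt) + torch.min(pb, tb)
    inter = iw * ih
    union = (pred_area + target_area - inter).clamp(min=1e-6)
    iou = inter / union
    # smallest enclosing box C
    cw = torch.max(pl, tl) + torch.max(pr, tr)
    ch = torch.max(pt, tt) + torch.max(pb, tb)
    c_area = (cw * ch).clamp(min=1e-6)
    giou = iou - (c_area - union) / c_area
    return 1.0 - giou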
step 3-5, the Mask loss function is set. The bounding-box prediction head produces a number of proposal boxes before the Mask head; according to a score threshold of 0.05 and a non-maximum-suppression IoU threshold of 0.6, at most 100 RoIs are obtained per image. To speed up convergence and improve detection performance, the GT boxes are also added to the RoIs for network training. Suppose there are N RoIs in total after adding the GT boxes and K GT boxes; the IoU between each RoI and each GT is computed, and the RoIs are divided into positive and negative samples with a IoU threshold of 0.5 (label 1 for positive and 0 for negative samples), giving a dictionary D that associates the i-th RoI (i ∈ [0, N)) with the index j of the GT box that best matches it and with the related GT information; finally, sampling is performed so that positive samples make up 1/4 of all samples, yielding the training samples X for the Mask head.

Since the Mask loss is defined only on positive samples, foreground screening of the training samples X is also required. Suppose M positive-sample RoIs are obtained; the i-th RoI_i yields a 14 × 14 feature map F_i through region-of-interest alignment RoIAlign, and F_i yields a 28 × 28 prediction feature map pred_i after the Mask head. The Mask target gt_mask_i is obtained by first looking up the GT information (category, bounding box and polygon Mask) via the RoI index in D, then cropping the box region from the original Mask according to the box information, and finally resizing the cropped part to 28 × 28. L_mask is computed with an average binary cross-entropy loss function:

L_mask = −(1 / (28 × 28)) Σ_{i=1}^{M} Σ_{(x, y)} [ gt_mask_i(x, y) · log(pred_i(x, y)) + (1 − gt_mask_i(x, y)) · log(1 − pred_i(x, y)) ]

where pred_i(x, y) and gt_mask_i(x, y) are the values of the feature maps pred_i and gt_mask_i at position (x, y);
step 3-6, the MaskIoU loss function is set. The predicted value obtained by passing the i-th concatenated feature map through the MaskIoU head is pred_maskiou_i, and the MaskIoU target is gt_maskiou_i, computed from the Mask predicted in step 3-5 and the Mask information of the corresponding GT. L_maskiou is computed with an ℓ2 loss function:

L_maskiou = Σ_{i=1}^{M} ( pred_maskiou_i − gt_maskiou_i )²
Step 3-7, the multi-task loss function is set. The classification loss, the centrality loss, the distance regression loss, the Mask loss and the MaskIoU loss are added to give the total multi-task loss of a mini-batch:

L = (1/N_cls) L_cls + λ_ctn (1/N_ctn) L_ctn + λ_reg (1/N_reg) L_reg + λ_mask (1/N_mask) L_mask + λ_maskIoU (1/N_maskIoU) L_maskIoU

where L_cls, L_ctn and L_reg denote the corresponding losses summed over all feature maps F_i, and N_cls, N_ctn, N_reg, N_mask, N_maskIoU are the normalization coefficients of the corresponding loss functions, with N_cls = N_ctn = N_reg being the number of positive samples in the bounding-box prediction head, and N_mask = N_maskIoU being the number of positive samples obtained from the proposal boxes (proposals) predicted by the bounding-box prediction head according to the IoU threshold and the positive/negative sampling ratio. λ_ctn, λ_reg, λ_mask and λ_maskIoU, the balance coefficients of the respective losses, are all set to 1;
step 3-8, the parameters of network learning and training are set: transfer learning is performed using the weights of VoVNetV2-99 pre-trained on ImageNet. The 1000 proposals with the highest confidence are selected from the bounding-box prediction results, and 100 proposals are kept after non-maximum suppression, with a IoU threshold of 0.6 in the NMS. Proposals whose IoU with a ground-truth box exceeds 0.5 are regarded as positive samples in the Mask head, otherwise as negative samples, and the positive samples used for training make up 1/4 of all training samples. Network training uses mini-batch stochastic gradient descent to optimize the model, with a batch size of 2 and a learning rate of 0.0025; a Warmup strategy is adopted for the first 1000 iterations, together with Momentum with coefficient 0.9. For regularization, a weight decay coefficient of 10^-4 is used. Training lasts 24 epochs in total, with a step learning-rate decay strategy applied at epochs 16 and 22 and a decay factor of 0.1. After the learning parameters are set, the constructed network is trained with the training data enhanced in step 3-1;
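A minimal PyTorch sketch of the optimisation settings of step 3-8; the placeholder module stands in for the segmentation network, and the 1000-iteration warm-up is omitted for brevity.

import torch
import torch.nn as nn

model = nn.Conv2d(3, 8, 3)  # placeholder standing in for the instance segmentation network
optimizer = torch.optim.SGD(model.parameters(), lr=0.0025, momentum=0.9, weight_decay=1e-4)
# step decay by a factor of 0.1 at epochs 16 and 22, over 24 training epochs in total
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[16, 22], gamma=0.1)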
step 4, predicting the segmentation result by using the trained example segmentation network, which specifically comprises the following steps:
step 4-1, the images requiring instance segmentation undergo essentially the same data enhancement as in step 3-1, except that only single-scale resizing is applied during inference, i.e., the long side is 1333 and the short side is 800;
step 4-2, the trained network performs a forward pass on the test images obtained in step 1-3; in the bounding-box prediction head, proposals with confidence lower than 0.05 are removed, the 1000 proposals with the highest confidence are screened out, and at most 50 RoIs are obtained through NMS for Mask branch prediction. Finally, according to the class k predicted by the classification branch, the k-th Mask is obtained (the data used here contain only one class); the 28 × 28 Mask is scaled to the size of the corresponding RoI and binarized with a threshold of 0.5 to generate the final Mask;
step 4-3, the Top-50 proposals pass through the MaskIoU head to predict the Mask intersection-over-union MaskIoU, which is multiplied by the classification confidence to obtain the final Mask confidence;
step 4-4, based on the Masks generated in step 4-2 and the corresponding Mask confidences obtained in step 4-3, NMS with a IoU threshold of 0.5 is applied to screen out the final predicted Masks;
step 4-5, mesoscale convection system instance segmentation is carried out on the geostationary-satellite infrared cloud images of the Jianghuai region of China at adjacent times, yielding the mesoscale convection system instances shown in fig. 5;
step 5, tracking the mesoscale convection system at a plurality of continuous moments according to a related target matching principle, and simultaneously considering the ubiquitous splitting and merging phenomena, the method specifically comprises the following steps:
step 5-1, the centroid coordinates (x_c, y_c), the characteristic area and the intensity P of each mesoscale convection system instance are obtained:

x_c = (1/N) Σ_{i=1}^{N} x_i ,  y_c = (1/N) Σ_{i=1}^{N} y_i

area = N

P = f_z / N ,  with f_z = Σ_{i=1}^{N} f(i)

where N is the total number of pixels covered by the mesoscale convection system instance, x_i and y_i are the x and y coordinates of the i-th pixel, f(i) is the brightness temperature value of the i-th pixel, and f_z is the accumulated sum of the brightness temperatures of the pixels in the instance; the image resolution of this embodiment (4 km) needs to be taken into account in the subsequent calculations;
step 5-2, according to the time order, the duration, centroid position change, area change and intensity change of the mesoscale convection system between two adjacent times are computed. If, for two mesoscale convection systems at adjacent times, the centroid position change is no more than 50 m/s, the area change is no more than 5 km², and the intensity change is no more than 0.001 °C/s, they can be preliminarily judged to be the same target;
step 5-3, the splitting and merging phenomena that commonly occur in mesoscale convection systems need to be handled accordingly. If several mesoscale convection systems MCS at a certain time, denoted X_1, X_2, ..., X_n, all satisfy the tracking matching principle of step 5-2 with one MCS denoted P_j at the previous time, the situation is a split: the X_m (1 ≤ m ≤ n) with the largest area after splitting continues the track of P_j, the previous-time index of the track of X_m is updated to j and its duration to the duration of P_j plus 1, while the other MCSs are treated as newly appearing clouds whose previous-time index is empty and whose duration is initialized to 1;

if several MCSs at the previous time all satisfy the MCS tracking matching principle with one MCS denoted P_a at the current time, the situation is a merge: the P_m (1 ≤ m ≤ z) with the largest area at the previous time is taken as the previous-time track of P_a, the previous-time index of the track is updated to m and the duration to the duration of P_m plus 1. The life cycles of the MCSs at the previous time whose area is not the largest are ended; it is then judged whether their duration is at least 1 h, and if so, the corresponding MCS path is recovered in reverse order from the previous-time index values, thereby realizing the tracking of the MCS.
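The matching rule of steps 5-1 to 5-3 can be illustrated with the following Python sketch. The record fields, the helper name, and the unit handling (4 km pixel size, 30-minute interval, and treating the area bound as an absolute change) are assumptions made for illustration and are not fixed by the text.

from dataclasses import dataclass

DT_SECONDS = 30 * 60   # adjacent cloud images are half an hour apart
PIXEL_KM = 4.0         # image resolution quoted in step 5-1

@dataclass
class MCSRecord:
    cx: float    # centroid x coordinate, in pixels
    cy: float    # centroid y coordinate, in pixels
    area: int    # number of pixels covered by the instance
    p: float     # intensity (mean brightness temperature)

def same_target(prev: MCSRecord, curr: MCSRecord) -> bool:
    """Apply the thresholds of step 5-2 to two instances at adjacent times."""
    dist_m = ((curr.cx - prev.cx) ** 2 + (curr.cy - prev.cy) ** 2) ** 0.5 * PIXEL_KM * 1000.0
    speed_ok = dist_m / DT_SECONDS <= 50.0                        # centroid change <= 50 m/s
    area_ok = abs(curr.area - prev.area) * PIXEL_KM ** 2 <= 5.0   # area change <= 5 km^2 (as stated)
    intensity_ok = abs(curr.p - prev.p) / DT_SECONDS <= 0.001     # intensity change <= 0.001 deg C/s
    return speed_ok and area_ok and intensity_ok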
In the tracking result shown in fig. 6, the interval between adjacent times is half an hour. Three clouds are identified at the first time, and likewise three clouds are identified at the second time. The characteristic quantities of the clouds at the two adjacent times (centroid displacement, area change and intensity change) are computed and analysed, and the clouds at the second time that satisfy the target matching principle with respect to the first time are tracked and labelled. As can be seen at the second time in fig. 6, cloud 3 identified at the first time finds no cloud satisfying the target matching principle at the second time, so cloud 3 disappears; clouds 1 and 2 at the first time each have a matching cloud at the second time, so the corresponding clouds at the second time are labelled with the same numbers 1 and 2. Cloud 4 at the second time is a cloud newly appearing at that time. From the tracked clouds identified at the first and second times, the tracked target clouds at the third and fourth times can be obtained, as shown in fig. 6. Finally, according to the requirement that the duration be at least 1 h, the MCSs meeting the requirement are screened out: only cloud 2 in fig. 6 is a valid MCS, appearing at the first, second and third times and lasting one hour (satisfying the ≥ 1 h requirement);
in this embodiment, mesoscale convection system instance segmentation is performed on the publicly available geostationary-satellite infrared brightness temperature data provided by the U.S. National Oceanic and Atmospheric Administration. The experimental configuration environment is shown in table 1, and the experimental results shown in table 2 are obtained by comparison, under the same configuration environment, with other currently mainstream deep-learning instance segmentation methods such as Mask R-CNN, Mask Scoring R-CNN, Cascade Mask R-CNN and HTC; the evaluation adopts part of the COCO data set standard (i.e., box AP, box AR, Mask AP, Mask AR and inference time).
Table 1 Experimental configuration Environment
Table 2 example segmentation comparative experiment results
Compared with the existing mainstream instance segmentation methods, the present method has clear advantages in accuracy and recall while keeping a detection speed similar to that of Mask R-CNN without additional heavy computational overhead, which effectively demonstrates its characteristic of high detection precision together with high speed.
The present invention provides a method for identifying and tracking a mesoscale convection system based on anchor-free image detection; there are many methods and ways to implement this technical scheme, and the above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several improvements and embellishments without departing from the principle of the present invention, and these improvements and embellishments should also be regarded as falling within the protection scope of the present invention. All components not specified in this embodiment can be realized by the prior art.

Claims (9)

1. The mesoscale convection system identification and tracking method based on image anchor-frame-free detection is characterized by comprising the following steps of:
step 1, preprocessing an original infrared brightness temperature data file according to the relevant description of satellite data: cutting original satellite data, marking image polygon examples, and randomly dividing a training set, a verification set and a test set;
step 2, constructing a mesoscale convection system example segmentation convolutional neural network based on an anchor-free frame: the network is divided into a backbone network for feature extraction, a feature pyramid network for fusing multi-scale features, a prediction frame network head for classification, distance frame regression and centrality prediction, a Mask generation network head, and a network head for predicting MaskIoU;
step 3, performing multi-scale image enhancement on the training set, and learning the network parameters of the mesoscale convection system example segmentation convolutional neural network constructed in step 2 through transfer learning and supervised training: training is performed with a mini-batch stochastic gradient descent method, and loss functions are set for the three network heads, namely the prediction frame head, the Mask generation head and the MaskIoU prediction head;
step 4, carrying out mesoscale convection system example segmentation on the stationary satellite infrared cloud pictures at adjacent moments by using the trained network to obtain mesoscale convection system related records at continuous moments;
and 5, tracking the mesoscale convection system on the basis of the step 4 according to a relevant target matching principle.
2. The method of claim 1, wherein step 1 comprises:
step 1-1, reading an infrared brightness temperature data file of each original geostationary satellite according to the related description of satellite data, wherein the file is an array in a format of 2 x 3298 x 9896, and obtaining two gray cloud pictures of 0 minute and 30 minutes at corresponding time;
step 1-2, cutting the gray cloud picture obtained in the step 1-1: cutting each gray cloud picture into more than two gray infrared cloud pictures with the size of 420 multiplied by 360;
step 1-3, giving polygon example level labels of a mesoscale convection system for each gray level infrared cloud subgraph, filtering subgraphs without the mesoscale convection system to obtain json files corresponding to each gray level infrared cloud subgraph, wherein the whole label only has one category; and randomly dividing the gray infrared cloud images into a training set image, a verification set image and a test set image in a ratio of 6:2:2, thereby forming a training set, a verification set and a test set.
3. The method of claim 2, wherein step 2 comprises:
step 2-1, constructing a backbone network for feature extraction: the backbone network in the example segmentation network adopts a convolutional neural network VoVNetV2-99 with 99 layers in total, and convolutional layers in the convolutional neural network VoVNetV2-99 are all in the form of Conv-BN-ReLU, namely the combination of the convolutional layers, a batch normalization layer BN and a linear rectification function ReLU in sequence; meanwhile, cross-layer connection is adopted in a VoVNetV2-99 structural block of the convolutional neural network to realize identity mapping, and a residual block containing the identity mapping is defined as:
Y = F(X, {W_i}) + X

where X and Y represent the input and output feature maps of each building block, respectively, and F(X, {W_i}) is the residual mapping function to be learned;

meanwhile, a channel attention eSE mechanism is also introduced into the network residual block, implemented as:

A_eSE(X_div) = σ(W_C(gap(X_div)))

gap(X) = (1 / (W × H)) Σ_{i=1}^{W} Σ_{j=1}^{H} X_{i,j}

where X_div is the diversified feature map obtained by dimensionality reduction after the cascade of the network feature maps, gap(·) is channel-level global average pooling, W and H are respectively the width and height of feature map X, X_{i,j} is the value of feature map X at (i, j), W_C is the fully connected layer weight, σ denotes the Sigmoid activation function, and A_eSE(X_div) is the computed channel attention feature descriptor; this feature descriptor is multiplied element-wise with X_div to finally obtain the refined feature map X_refine;
The specific construction steps of the convolutional neural network VoVNetV2-99 comprise:
step 2-1-1, constructing a network Stem stage 1: this stage contains three convolutional layers: firstly, convolution layers with convolution kernel size of 3 multiplied by 3, step length of 2, filling of 1 and output channel number of 64 are used for carrying out down sampling on an input image, and then convolution layers with convolution kernel size of 3 multiplied by 3 and output channel number of 64 and convolution layers with convolution kernel size of 3 multiplied by 3, step length of 2 and output channel number of 128 are connected; after the input image passes through the stage, a first scale feature map C1 is generated, wherein the feature map C1 is 4 compared with the input image scale Output Stride;
step 2-1-2, constructing network single aggregation module stage 2, wherein the stage comprises one residual block containing 5 convolutional layers with kernel size 3 × 3 and 128 output channels; after the convolution operations on the input, the 5 convolutional layers each yield a diversified feature map with 128 channels, and at the last convolutional layer the feature map of that layer, the feature maps of the previous 4 layers and the input first-scale feature map C1 are concatenated to obtain a feature map with 128 × 6 = 768 channels; dimensionality reduction is then performed through a convolutional layer with kernel size 1 × 1, stride 1, padding 0 and 256 output channels to obtain the diversified feature map X_div, which is combined with the channel attention eSE mechanism to obtain the final X_refine; X_refine is further added element-wise to the input feature map to realize the identity mapping; finally this stage obtains the second-scale feature map C2 with 256 channels, whose Output Stride relative to the input image is still 4;
step 2-1-3, constructing network single aggregation module stage 3: 2-fold downsampling is first performed using a 3 × 3 max pooling layer with stride 2 and padding 0, and 3 × 3 adjustable deformable convolution is employed in the residual blocks, defined as:

y(p) = Σ_{k=1}^{K} w_k · x(p + p_k + Δp_k) · Δm_k

where K is the total number of sampling locations of the convolution kernel, w_k is the convolution kernel weight, p_k is the predefined offset of the k-th location relative to the center of the receptive field, x(p) and y(p) are respectively the values of the input feature map x and the output feature map y at position p, Δp_k is the learnable offset of the k-th location, and Δm_k ∈ [0,1] is the modulation (adjustment) scalar of the k-th location; the adjustable deformable convolution is implemented by adding, to the conventional convolution, a convolutional layer with the same spatial resolution and dilation rate that learns the offsets of each position of the feature map in the x and y directions of the two-dimensional plane and the modulation scalars;
the OSA module stage 3 includes 3 residual blocks, each containing 5 adjustable deformable convolutional layers with kernel size 3 × 3 and 160 output channels; as in network single aggregation module stage 2 of step 2-1-2, at the last convolutional layer the feature map of that layer, the feature maps of the previous 4 layers and the input second-scale feature map C2 are concatenated to obtain a diversified feature map with 256 + 160 × 5 = 1056 channels, this being the diversified feature map of the first residual block, while the diversified feature maps obtained after the concatenation in the second and third residual blocks have 512 + 160 × 5 = 1312 channels; the 1 × 1 conventional convolutional layer with stride 1 and padding 0 used for dimensionality reduction after the concatenation has 512 output channels, and the channel attention mechanism is also included; after the operation of OSA module stage 3, the third-scale feature map C3 with 512 channels is finally obtained, whose Output Stride relative to the input image is 8;
step 2-1-4, constructing network single aggregation module stage 4: this stage contains 9 residual blocks, each containing 5 adjustable deformable convolutional layers with kernel size 3 × 3 and 192 output channels; as in network single aggregation module stages 2 and 3, at the last convolutional layer the feature map of that layer, the feature maps of the previous 4 layers and the input third-scale feature map C3 are concatenated to obtain a diversified feature map with 512 + 192 × 5 = 1472 channels, this being the diversified feature map of the first residual block, while the diversified feature maps obtained after the concatenation in the second to ninth residual blocks have 768 + 192 × 5 = 1728 channels; the 1 × 1 conventional convolutional layer with stride 1 and padding 0 used for dimensionality reduction after the concatenation has 768 output channels, and the channel attention mechanism is also included; the fourth-scale feature map C4 is finally obtained after network single aggregation module stage 4, whose Output Stride relative to the input image is 16;
step 2-1-5, constructing the last structure of the backbone network, single aggregation module stage 5: this stage contains 3 residual blocks, each containing 5 adjustable deformable convolutional layers with kernel size 3 × 3 and 224 output channels; as in network single aggregation module stages 2 and 3, at the last convolutional layer the feature map of that layer, the feature maps of the previous 4 layers and the input fourth-scale feature map C4 are concatenated to obtain a diversified feature map with 768 + 224 × 5 = 1888 channels, this being the diversified feature map of the first residual block, while the diversified feature maps obtained after the concatenation in the second and third residual blocks have 1024 + 224 × 5 = 2144 channels; the 1 × 1 conventional convolutional layer with stride 1 and padding 0 used for dimensionality reduction after the concatenation has 1024 output channels, and the channel attention mechanism is also included; after this stage, feature map C5 is finally obtained, whose Output Stride relative to the input image is 32;
step 2-2, combining the feature pyramid network FPN to fuse multi-scale features: the feature pyramid network FPN performs top-down and lateral connections on the different-scale features {C3, C4, C5} obtained in step 2-1 to obtain the features {M3, M4, M5}, wherein M5 is obtained from feature map C5 through a 1 × 1 convolution, M4 is obtained by element-wise addition of the 2-fold nearest-neighbor upsampling of M5 and the feature map obtained from C4 through a 1 × 1 convolution, and M3 is obtained by element-wise addition of the 2-fold nearest-neighbor upsampling of M4 and the feature map obtained from C3 through a 1 × 1 convolution;
finally, obtaining feature layers { P3, P4 and P5} of each layer in { M3, M4 and M5} through convolution with convolution kernel size of 3 multiplied by 3 and output channel number of 256;
additionally adding a P6 feature layer and a P7 feature layer in the example segmentation network, wherein the P6 feature layer and the P7 feature layer are obtained by 2-fold downsampling of P5 and P6, respectively, through a 3 × 3 convolutional layer with stride 2, finally obtaining the feature layers {P3, P4, P5, P6, P7};
step 2-3, constructing an example segmentation frame prediction head, wherein the part comprises 2 branches which are respectively a classification branch and a multitask branch with the centrality and the distance frame regression in parallel, and the method specifically comprises the following steps:
step 2-3-1, constructing a classification branch: sequentially connecting three conventional convolutional layers with input channel number and output channel number of 256 and convolution kernel size of 3 multiplied by 3 and an adjustable deformation convolutional layer behind each characteristic layer in the characteristic pyramid network FPN, wherein the four convolutional layers do not adopt batch normalization BN but use group normalization GN; adding a convolution layer for predicting classification at the end of the classification branch, wherein the number of input channels is 256, the number of output channels is 1, and the size of convolution kernel is 3 multiplied by 3;
step 2-3-2, constructing a multi-task branch with the centrality and the distance frame regression being parallel: the part is sequentially connected with three conventional convolutional layers with the same structure with the classification branches and an adjustable deformation convolutional layer after a feature pyramid network FPN feature layer of each scale, and the four convolutional layers are connected with a convolutional layer comprising two parallel branches: the number of output channels for frame regression is 4, namely 3 multiplied by 3 convolution layers, and four dimension values of the branch output respectively represent the distance from the current position to four edges of the frame; 3 x 3 convolutional layers with the number of output channels being 1 for predicting centrality, the branch output being a one-dimensional representation centrality value;
assuming that the distances from a sample point (x, y) on feature map F to the four sides of the target bounding box to which the point belongs are d = (l, t, r, b), where l, t, r, b respectively represent the distances from the sample point (x, y) to the left, upper, right and lower sides of the rectangular box, the centrality centerness is defined as:

centerness = sqrt( (min(l, r) / max(l, r)) × (min(t, b) / max(t, b)) )

wherein min() and max() are the minimum and maximum functions, respectively;
step 2-4, constructing an instance segmentation network RoI head based on an anchor-free frame, and specifically comprising the following steps:
step 2-4-1, constructing a region-of-interest alignment RoIAlign layer: the RoIAlign layer first maps each region of interest RoI, according to its size, to the corresponding feature pyramid network FPN feature layer P_k, implemented as:

k = Ceil(k_max − log2(A_input / A_RoI))

wherein A_input and A_RoI respectively represent the area of the input image and the area of the RoI; k_max is set to 5, the number of the last layer derived from the backbone network; Ceil() is the ceiling (round-up) function, and k is the number of the FPN feature layer to which the RoI is mapped;
the corresponding position area on the mapped feature map is divided into 196 small areas with the same size by the RoI Align layer, the small areas are sampled in a self-adaptive mode, the central point position of each small area is taken, a bilinear interpolation method is used for calculation, and finally a feature map with the fixed size of 14 x 14 is obtained;
step 2-4-2, constructing a Mask head comprising a spatial attention mechanism: the Mask head comprises four continuous convolution layers with convolution kernel size of 3 multiplied by 3 and input and output channel number of 256, and a space attention module is introduced after the fourth convolution layer, and the realization mechanism is as follows:
A_sag(X_i) = σ( F_3×3( P_max(X_i) ⊕ P_avg(X_i) ) )

X_sag = A_sag(X_i) ⊗ X_i

wherein X_i is the input feature map, A_sag(X_i) is the spatial attention feature descriptor, P_max and P_avg denote the feature maps obtained by max pooling and average pooling over the channel dimension, ⊕ represents the concatenation operation, F_3×3 is a 3 × 3 convolutional layer, σ is the Sigmoid function, ⊗ is element-wise multiplication, and X_sag is the feature map finally combined with spatial attention;

X_sag is then upsampled by a deconvolution layer with kernel size 2 × 2 and stride 2 to obtain a feature map of size 28 × 28 with the same number of channels; the last convolutional layer of the Mask head is the class-specific Mask prediction layer, and since the detection target is the single class of mesoscale convection system, the Mask prediction layer has kernel size 1 × 1, stride 1 and 1 output channel;
and 2-4-3, constructing a maskIoU head, and using Mask score Mask ordering to represent Mask quality again.
4. The method of claim 3, wherein steps 2-4-3 comprise: on the basis of step 2-4-2, the output feature map of the Mask prediction layer is downsampled 2-fold with a 2 × 2 max pooling layer and then concatenated with the 14 × 14, 256-channel RoI feature map output by the RoIAlign layer, giving a feature map of size 14 × 14 with 257 channels;
the maskIoU header contains four successive convolutional layers with convolutional kernel size of 3 × 3 and output channel number of 256, wherein the step size of the last convolutional layer is 2, and then 2 fully-connected layers with output channel number of 1024 and 1 fully-connected layer with output channel number of 1 are connected.
5. The method of claim 4, wherein step 3 comprises the steps of:
step 3-1, converting the gray level images in the training set into images of three RGB channels, wherein the values of the three RGB channels are the same and are the gray level values of the gray level images, and performing data enhancement on the converted images in the training set and corresponding example labels: firstly, scaling an image according to a plurality of scale scales, wherein the long side is 1333, the short side is random one of {640,672,704,736,768,800}, the original proportion of the image is kept, and meanwhile, random horizontal inversion is adopted for data enhancement;
performing centering processing on the pixel values: the images are normalized according to the means on the ImageNet data set, the normalization being performed per channel; the means of the R, G, B channels are 123.675, 116.28 and 103.53 and the standard deviations are 58.395, 57.12 and 57.375, respectively; writing the three RGB channel means as a vector μ and the three RGB channel standard deviations as a vector σ, and letting the input image be x, the normalized image data x' is

x' = (x − μ) / σ

the image dimensions are then padded to multiples of 32;
correspondingly, the corresponding instance labels of the input image after zooming and horizontal turning data enhancement are also subjected to the same transformation to obtain correct labels;
step 3-2, setting the classification loss function of the prediction box head in the instance segmentation network: let the feature map F_i (i = 3, 4, ..., 7) produce, through the classification branch, the predicted value p_{x_i,y_i} at position (x_i, y_i); let c*_{x_i,y_i} indicate whether the position is a positive or negative sample, where c*_{x_i,y_i} = 1 indicates that position (x_i, y_i) is a positive sample and c*_{x_i,y_i} = 0 that it is a negative sample, and p_{x_i,y_i} is the predicted probability that position (x_i, y_i) is positive;
position (x_i, y_i) is mapped to the corresponding position (x', y') on the input image as

(x', y') = (⌊s_i / 2⌋ + x_i · s_i, ⌊s_i / 2⌋ + y_i · s_i)

where s_i is the output stride by which feature map F_i is downscaled relative to the input image; if (x', y') falls inside any ground-truth box, the sample is positive and Y = 1, otherwise Y = 0;
the α-balanced focal loss L_fl is expressed as

L_fl(p, Y) = −α · Y · (1 − p)^γ · log(p) − (1 − α) · (1 − Y) · p^γ · log(1 − p)

where α is a weighting factor and γ ≥ 0 is a tunable focusing parameter;
the total classification loss of feature map F_i, L^i_cls, is

L^i_cls = Σ_{(x_i, y_i) ∈ F_i} L_fl(p_{x_i,y_i}, c*_{x_i,y_i})
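To make the α-balanced focal loss concrete, a short PyTorch sketch over per-location positive-class probabilities is given below; the defaults α = 0.25 and γ = 2 are common choices, not values stated in the claims.

    import torch

    def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-6):
        """p: predicted probability of the positive class at each location, in (0, 1).
        y: ground-truth label (1 for positive locations, 0 for negative)."""
        p = p.clamp(eps, 1.0 - eps)
        pos_term = -alpha * y * (1.0 - p).pow(gamma) * torch.log(p)
        neg_term = -(1.0 - alpha) * (1.0 - y) * p.pow(gamma) * torch.log(1.0 - p)
        loss = pos_term + neg_term
        # sum over the feature level, normalized by the number of positive locations
        num_pos = y.sum().clamp(min=1.0)
        return loss.sum() / num_pos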
step 3-3, setting the center-ness loss function: given a positive-sample feature point (x_i, y_i) in feature map F_i with distance regression target t*_{x_i,y_i} = (l*, t*, r*, b*), where l*, t*, r*, b* denote the distances from feature point (x_i, y_i) to the left, top, right and bottom sides of the ground-truth box, the center-ness target o*_{x_i,y_i} of feature point (x_i, y_i) is defined as

o*_{x_i,y_i} = sqrt( (min(l*, r*) / max(l*, r*)) · (min(t*, b*) / max(t*, b*)) )

let the feature map F_i produce, through the center-ness branch, the predicted value o_{x_i,y_i} at position (x_i, y_i); the total center-ness loss of feature map F_i, L^i_ctn, is then

L^i_ctn = Σ_{(x_i, y_i) ∈ F_i} 1{c*_{x_i,y_i} = 1} · BCE(o_{x_i,y_i}, o*_{x_i,y_i})

where c*_{x_i,y_i} indicates whether the position is a positive or negative sample, c*_{x_i,y_i} = 1 indicating that position (x_i, y_i) is positive and c*_{x_i,y_i} = 0 that it is negative; 1{·} is the indicator function, which equals 1 if the condition in braces, c*_{x_i,y_i} = 1, holds and 0 otherwise, and BCE denotes the binary cross-entropy loss;
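A small sketch of the center-ness target computed from the (l*, t*, r*, b*) regression targets according to the definition above; training it with binary cross-entropy on positive locations follows common FCOS practice and is an assumption here.

    import torch
    import torch.nn.functional as F

    def centerness_target(ltrb):
        """ltrb: (N, 4) distances (l*, t*, r*, b*) from positive locations to the GT box sides."""
        l, t, r, b = ltrb.unbind(dim=1)
        lr = torch.minimum(l, r) / torch.maximum(l, r).clamp(min=1e-6)
        tb = torch.minimum(t, b) / torch.maximum(t, b).clamp(min=1e-6)
        return torch.sqrt(lr * tb)           # in [0, 1], equal to 1 at the box centre

    def centerness_loss(pred_logits, ltrb):
        """pred_logits: (N,) raw center-ness predictions at positive locations."""
        target = centerness_target(ltrb)
        return F.binary_cross_entropy_with_logits(pred_logits, target, reduction="sum")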
step 3-4, setting the distance regression loss function: the regression task adopts the GIoU loss; let the predicted distances be t_{x_i,y_i} = (l, t, r, b), where l, t, r, b denote the distances from feature point (x_i, y_i) to the left, top, right and bottom sides of the predicted bounding box, and let the regression target be t*_{x_i,y_i} = (l*, t*, r*, b*), where l*, t*, r*, b* denote the distances from feature point (x_i, y_i) to the left, top, right and bottom sides of the ground-truth box; regarding t_{x_i,y_i} and t*_{x_i,y_i} as the predicted box B and the ground-truth box B*, the GIoU loss is expressed as

GIoU = IoU − |C \ (B ∪ B*)| / |C|
L_GIoU(t_{x_i,y_i}, t*_{x_i,y_i}) = 1 − GIoU

where C is the smallest enclosing box of B and B*, |·| denotes the area of a region, |C \ (B ∪ B*)| is the area of the part of C not covered by B and B*, IoU is the intersection-over-union of B and B*, and GIoU is their generalized intersection-over-union;
the total distance regression loss of feature map F_i, L^i_reg, is

L^i_reg = Σ_{(x_i, y_i) ∈ F_i} 1{c*_{x_i,y_i} = 1} · o*_{x_i,y_i} · L_GIoU(t_{x_i,y_i}, t*_{x_i,y_i})

where o*_{x_i,y_i} is the element-wise weighting coefficient of the loss, whose value is the center-ness regression target at position (x_i, y_i);
step 3-5, setting the Mask loss function: the prediction box head produces M RoIs; the i-th RoI, RoI_i, passes through the region-of-interest alignment (RoIAlign) and the Mask head to obtain the predicted feature map pred_i, and obtains from the classification and regression branches the class of RoI_i and the mask target gt_mask_i; the average binary cross-entropy loss is used to compute L_mask:

L_mask = −(1 / 28²) · Σ_{x,y} [ gt_mask_i(x, y) · log pred_i(x, y) + (1 − gt_mask_i(x, y)) · log(1 − pred_i(x, y)) ]

where pred_i(x, y) and gt_mask_i(x, y) are the values of feature maps pred_i and gt_mask_i at position (x, y);
step 3-6, setting the MaskIoU loss function: the i-th concatenated feature map is passed through the MaskIoU head to obtain the predicted mask intersection-over-union pred_maskiou_i, with the MaskIoU target gt_maskiou_i; an L2 loss is used to compute L_maskiou:

L_maskiou = (pred_maskiou_i − gt_maskiou_i)²
step 3-7, setting the multi-task loss function;
step 3-8, setting the parameters for network learning and training.
6. The method of claim 5, wherein step 3-7 comprises: adding the classification loss, the center-ness loss, the distance regression loss, the Mask loss and the MaskIoU loss to obtain the total multi-task loss L of the mini-batch:

L = (λ_cls / N_cls) · L_cls + (λ_ctn / N_ctn) · L_ctn + (λ_reg / N_reg) · L_reg + (λ_mask / N_mask) · L_mask + (λ_maskIoU / N_maskIoU) · L_maskIoU

where N_cls, N_ctn, N_reg, N_mask, N_maskIoU are the normalization coefficients of the corresponding losses, with N_cls = N_ctn = N_reg being the number of positive samples in the prediction box head, and N_mask = N_maskIoU the number of positive samples obtained from the proposals predicted by the prediction box head according to the IoU threshold and the positive/negative sampling ratio; λ_cls, λ_ctn, λ_reg, λ_mask, λ_maskIoU are the balancing factors of the respective losses.
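For clarity, a sketch of how the five losses might be combined into the mini-batch multi-task loss with their normalization coefficients and balance factors; the λ values shown are placeholders, not values from the claims.

    def total_loss(losses, num_pos_points, num_pos_rois,
                   lambdas=dict(cls=1.0, ctn=1.0, reg=1.0, mask=1.0, maskiou=1.0)):
        """losses: dict with unnormalized sums 'cls', 'ctn', 'reg', 'mask', 'maskiou'.
        num_pos_points: positives in the prediction box head (N_cls = N_ctn = N_reg).
        num_pos_rois: positive proposals sampled for the mask branches (N_mask = N_maskIoU)."""
        n_pts = max(num_pos_points, 1)
        n_roi = max(num_pos_rois, 1)
        return (lambdas["cls"] / n_pts * losses["cls"]
                + lambdas["ctn"] / n_pts * losses["ctn"]
                + lambdas["reg"] / n_pts * losses["reg"]
                + lambdas["mask"] / n_roi * losses["mask"]
                + lambdas["maskiou"] / n_roi * losses["maskiou"])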
7. The method of claim 6, wherein step 3-8 comprises: transfer learning is performed using the VoVNetV2-99 weights pre-trained on ImageNet; from the prediction box head results, the 1000 proposals with the highest confidence are selected, and 100 proposals are retained after non-maximum suppression with an IoU threshold of 0.6; in the Mask head, proposals whose IoU with a ground-truth box exceeds 0.5 are treated as positive samples and the rest as negative samples, with positive samples accounting for 1/4 of all training samples; the network model is optimized with mini-batch stochastic gradient descent, using a batch size of 2 and a learning rate of 0.0025; a warmup strategy is adopted during the first 1000 iterations, and momentum with a coefficient of 0.9 is used; for regularization, a weight decay coefficient of 10⁻⁴ is adopted; training runs for 24 epochs in total, with a step learning-rate decay of factor 0.1 applied at epochs 16 and 22; after these learning parameters are set, the constructed network is trained with the training set processed in step 3-1.
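A sketch of the optimization schedule of this claim expressed as PyTorch optimizer and scheduler setup (SGD, learning rate 0.0025, momentum 0.9, weight decay 10⁻⁴, 1000-iteration warmup, step decay by 0.1 at epochs 16 and 22 of 24); both schedulers are assumed to be stepped once per iteration, and the model and iteration count are assumptions, not part of the claims.

    import torch

    def build_optimizer_and_schedulers(model, iters_per_epoch):
        # mini-batches of size 2 are configured in the data loader, not here
        optimizer = torch.optim.SGD(model.parameters(),
                                    lr=0.0025, momentum=0.9, weight_decay=1e-4)
        # linear warmup over the first 1000 iterations
        warmup = torch.optim.lr_scheduler.LinearLR(optimizer,
                                                   start_factor=0.001, total_iters=1000)
        # step decay (factor 0.1) at the start of epochs 16 and 22, 24 epochs in total
        milestones = [16 * iters_per_epoch, 22 * iters_per_epoch]
        decay = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                     milestones=milestones, gamma=0.1)
        return optimizer, warmup, decay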
8. The method of claim 7, wherein step 4 comprises the steps of:
step 4-1, the image to be instance-segmented is processed in the same way as in step 3-1, except that only a single scale is used at inference time, i.e. the long side is 1333 and the short side is 800;
step 4-2, the trained network performs a forward pass on the test images obtained in step 1-3; in the prediction box head, proposals with confidence below 0.05 are removed and the 1000 proposals with the highest confidence are selected, from which 50 proposals for mask branch prediction are obtained through non-maximum suppression; finally, according to the class k predicted by the classification branch, the k-th mask of size 28×28 is taken, scaled to the size of the corresponding proposal and binarized to produce the final mask;
step 4-3, the top-50 proposals also predict the mask intersection-over-union (MaskIoU) through the MaskIoU head, and this value is multiplied by the classification confidence to obtain the mask confidence;
step 4-4, non-maximum suppression with an IoU threshold of 0.5 is applied to the masks generated in step 4-2 using the mask confidences obtained in step 4-3, screening out the final predicted masks;
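A condensed sketch of the post-processing in steps 4-2 to 4-4: confidence filtering at 0.05, NMS at IoU 0.6 down to 50 proposals, rescoring with the MaskIoU head output, a final NMS at IoU 0.5 on the mask confidences (box IoU is used below as a stand-in for mask-level IoU), and resizing and binarizing the 28×28 masks. Apart from torchvision's nms, all names are placeholders.

    import torch
    import torch.nn.functional as F
    from torchvision.ops import nms

    def postprocess(boxes, scores, masks_28, maskious,
                    score_thr=0.05, pre_nms_top=1000, post_nms_top=50):
        """boxes: (M, 4) xyxy, scores: (M,) classification confidences,
        masks_28: (M, 28, 28) mask probabilities for the single MCS class,
        maskious: (M,) MaskIoU-head outputs for the same proposals."""
        keep = scores > score_thr                                    # drop proposals below 0.05
        boxes, scores = boxes[keep], scores[keep]
        masks_28, maskious = masks_28[keep], maskious[keep]

        top = scores.topk(min(pre_nms_top, scores.numel())).indices  # 1000 highest-confidence proposals
        keep = nms(boxes[top], scores[top], iou_threshold=0.6)[:post_nms_top]
        idx = top[keep]

        mask_scores = scores[idx] * maskious[idx]                    # mask confidence = cls conf x MaskIoU
        # step 4-4 screening: NMS at IoU 0.5 with the mask confidences
        final = idx[nms(boxes[idx], mask_scores, iou_threshold=0.5)]

        results = []
        for i in final.tolist():
            x1, y1, x2, y2 = boxes[i].round().int().tolist()
            w, h = max(x2 - x1, 1), max(y2 - y1, 1)
            m = F.interpolate(masks_28[i][None, None], size=(h, w),
                              mode="bilinear", align_corners=False)[0, 0]
            results.append((boxes[i], scores[i] * maskious[i], m > 0.5))   # binarized final mask
        return results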
step 4-5, mesoscale convective system instance segmentation is performed on geostationary satellite infrared cloud images of the Jianghuai region of China at adjacent times, yielding mesoscale convective system instances at consecutive times.
9. The method of claim 8, wherein step 5 comprises the steps of:
step 5-1, for each mesoscale convective system instance, computing the centroid coordinates (x̄, ȳ), the area feature, and the intensity P:

x̄ = (1/N) · Σ_{i=1}^{N} x_i,   ȳ = (1/N) · Σ_{i=1}^{N} y_i
area = N
P = f_z / N,   with f_z = Σ_{i=1}^{N} f(i)

where N is the total number of pixels covered by the mesoscale convective system instance, x_i and y_i are the horizontal and vertical coordinates of the i-th pixel, f(i) is the brightness temperature of the i-th pixel, and f_z is the sum of the brightness temperatures of the pixels in the instance;
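A short NumPy sketch of the per-instance features of step 5-1; interpreting the intensity P as the mean brightness temperature f_z / N is an assumption made for illustration.

    import numpy as np

    def mcs_features(mask, brightness_temp):
        """mask: (H, W) boolean instance mask; brightness_temp: (H, W) brightness temperatures."""
        ys, xs = np.nonzero(mask)
        n = xs.size                                  # area = number of pixels in the instance
        centroid = (xs.mean(), ys.mean())            # (x_bar, y_bar)
        f_z = brightness_temp[ys, xs].sum()          # sum of brightness temperatures in the instance
        intensity = f_z / n                          # assumed: P as the mean brightness temperature
        return centroid, n, intensity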
step 5-2, computing, in time order, the duration and the changes in centroid position, area and intensity of the mesoscale convective systems at two adjacent times; if two mesoscale convective systems at adjacent times satisfy a centroid position change of no more than 50 m/s, an area change of no more than 5 km², and an intensity change of no more than 0.001 °C/s, they are preliminarily judged to be the same target;
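A sketch of the pairwise matching test of step 5-2 between instances at adjacent times; the per-pixel ground distance and the time step between images are assumptions needed to express the thresholds as rates.

    def same_target(prev, curr, dt_seconds, km_per_pixel,
                    max_speed=50.0, max_area_change=5.0, max_intensity_rate=0.001):
        """prev, curr: dicts with 'centroid' (pixels), 'area' (pixels) and 'intensity' from step 5-1.
        Thresholds: 50 m/s centroid motion, 5 km^2 area change, 0.001 degC/s intensity change."""
        dx = (curr["centroid"][0] - prev["centroid"][0]) * km_per_pixel * 1000.0   # metres
        dy = (curr["centroid"][1] - prev["centroid"][1]) * km_per_pixel * 1000.0
        speed = (dx ** 2 + dy ** 2) ** 0.5 / dt_seconds                            # m/s

        d_area = abs(curr["area"] - prev["area"]) * km_per_pixel ** 2              # km^2
        d_intensity = abs(curr["intensity"] - prev["intensity"]) / dt_seconds      # degC/s

        return (speed <= max_speed and d_area <= max_area_change
                and d_intensity <= max_intensity_rate)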
step 5-3, if n mesoscale convective systems (MCSs) at a given time, denoted X_1, X_2, ..., X_n, and one MCS at the previous time, denoted P_j, all satisfy the tracking matching criterion of step 5-2, the situation is a split; the X_m with the largest area after the split (1 ≤ m ≤ n) is selected to continue P_j, its previous-time index is updated to j and its duration to the duration of P_j plus 1, while the other MCSs are treated as new cloud clusters whose previous-time index is empty and whose duration is initialized to 1;
if more than two MCSs at the previous time, denoted P_1, P_2, ..., P_z, are associated with one MCS at the current time, denoted P_a, and the MCS tracking matching criterion is satisfied, the situation is a merge; the P_m with the largest area at the previous time (1 ≤ m ≤ z) is selected as the previous-time track of P_a, the previous-time index of P_a is updated to m and its duration to the duration of P_m plus 1; the life cycles of the MCSs at the previous time whose area is not the largest are ended, and it is judged whether their duration is at least 1 h; if so, the corresponding MCS tracks are obtained in reverse order according to the previous-time index values, thereby realizing the tracking of the MCSs.
CN202110270336.XA 2021-03-12 2021-03-12 Mesoscale convective system recognition and tracking method based on image anchor-free frame detection Active CN112836713B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110270336.XA CN112836713B (en) 2021-03-12 2021-03-12 Mesoscale convective system recognition and tracking method based on image anchor-free frame detection

Publications (2)

Publication Number Publication Date
CN112836713A true CN112836713A (en) 2021-05-25
CN112836713B CN112836713B (en) 2024-11-05

Family

ID=75930062

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110270336.XA Active CN112836713B (en) 2021-03-12 2021-03-12 Mesoscale convective system recognition and tracking method based on image anchor-free frame detection

Country Status (1)

Country Link
CN (1) CN112836713B (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103093077A (en) * 2011-11-08 2013-05-08 通用电气航空系统有限公司 Method for integrating models of a vehicle health management system
CN106023177A (en) * 2016-05-14 2016-10-12 吉林大学 Thunderstorm cloud cluster identification method and system for meteorological satellite cloud picture
WO2018013148A1 (en) * 2016-07-15 2018-01-18 University Of Connecticut Systems and methods for outage prediction
CN109816684A (en) * 2019-01-21 2019-05-28 中国气象科学研究院 A Mesoscale Convective Dynamic Tracking Method Based on Next-Generation Geostationary Satellite Data

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DONG SUI et al.: "CSM-Net: A Multi-Task Colorectal Cancer Analysis Framework", Sensing and Imaging, 14 August 2020 (2020-08-14), pages 1-14 *
TONG WU et al.: "Improved Anchor-Free Instance Segmentation for Building Extraction from High-Resolution Remote Sensing Images", Remote Sensing, vol. 12, 8 September 2020 (2020-09-08), pages 1-15 *
LIU Xinyu: "Research on Anchor-Free Object Detection and Instance Segmentation Methods", China Master's Theses Full-text Database, Information Science and Technology, no. 2021, 15 January 2021 (2021-01-15), pages 138-1835 *

Cited By (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113033720B (en) * 2021-05-28 2021-08-13 南京索安电子有限公司 Vehicle bottom picture foreign matter identification method and device based on sliding window and storage medium
CN113379773A (en) * 2021-05-28 2021-09-10 陕西大智慧医疗科技股份有限公司 Dual attention mechanism-based segmentation model establishing and segmenting method and device
CN113033720A (en) * 2021-05-28 2021-06-25 南京索安电子有限公司 Vehicle bottom picture foreign matter identification method and device based on sliding window and storage medium
CN113298094A (en) * 2021-06-10 2021-08-24 安徽大学 RGB-T significance target detection method based on modal association and double-perception decoder
CN113298094B (en) * 2021-06-10 2022-11-04 安徽大学 An RGB-T Salient Object Detection Method Based on Modality Correlation and Dual Perceptual Decoder
CN113627435A (en) * 2021-07-13 2021-11-09 南京大学 Method and system for detecting and identifying flaws of ceramic tiles
CN113449811A (en) * 2021-07-16 2021-09-28 桂林电子科技大学 Low-illumination target detection method based on MS-WSDA
CN115359070A (en) * 2021-10-11 2022-11-18 深圳硅基智能科技有限公司 Training method and measuring device based on tight frame mark
CN114067228A (en) * 2021-10-26 2022-02-18 神思电子技术股份有限公司 Target detection method and system for enhancing foreground and background discrimination
CN114049621A (en) * 2021-11-10 2022-02-15 石河子大学 A cotton top recognition and detection method based on Mask R-CNN
CN114529783A (en) * 2022-02-18 2022-05-24 中南大学 Positive and negative sample division method and single-stage target detection method thereof
CN114612765A (en) * 2022-02-21 2022-06-10 浙江工业大学之江学院 Loss function in convolutional neural network
CN114884688A (en) * 2022-03-28 2022-08-09 天津大学 Federated anomaly detection method across multi-attribute network
CN114708562A (en) * 2022-04-29 2022-07-05 杭州电子科技大学 A Bicycle Helmet Detection Method Based on Improved FCOS and Embedding Grouping
CN114972429B (en) * 2022-05-26 2024-07-09 国网江苏省电力有限公司电力科学研究院 Target tracking method and system for cloud edge cooperative self-adaptive reasoning path planning
CN114972429A (en) * 2022-05-26 2022-08-30 国网江苏省电力有限公司电力科学研究院 Target tracking method and system for cloud-edge collaborative adaptive reasoning path planning
CN115049923A (en) * 2022-05-30 2022-09-13 北京航空航天大学杭州创新研究院 SAR image ship target instance segmentation training method, system and device
CN114821201B (en) * 2022-06-28 2022-09-20 江苏广坤铝业有限公司 Hydraulic corner impacting machine for aluminum processing and using method thereof
CN114821201A (en) * 2022-06-28 2022-07-29 江苏广坤铝业有限公司 Hydraulic ramming machine for aluminum processing and method of using the same
CN114996658A (en) * 2022-07-20 2022-09-02 中国空气动力研究与发展中心计算空气动力研究所 Projection-based hypersonic aircraft aerodynamic heat prediction method
CN115471713A (en) * 2022-10-27 2022-12-13 成都理工大学 Shale strawberry-shaped pyrite particle size measuring method based on convolutional neural network
CN115471713B (en) * 2022-10-27 2023-05-30 成都理工大学 Shale strawberry-shaped pyrite particle size measurement method based on convolutional neural network
CN116359742A (en) * 2023-03-28 2023-06-30 国网江苏省电力有限公司连云港供电分公司 Online Estimation Method and System of State of Charge of Energy Storage Battery Based on Deep Learning Combined Extended Kalman Filter
CN116681962A (en) * 2023-05-05 2023-09-01 江苏宏源电气有限责任公司 Power equipment thermal image detection method and system based on improved YOLOv5
CN116863252A (en) * 2023-09-04 2023-10-10 四川泓宝润业工程技术有限公司 Methods, devices, equipment and storage media for detecting flammable materials at hot work sites
CN116863252B (en) * 2023-09-04 2023-11-21 四川泓宝润业工程技术有限公司 Methods, devices, equipment and storage media for detecting flammable materials at hot work sites
CN117058595A (en) * 2023-10-11 2023-11-14 齐鲁工业大学(山东省科学院) Video semantic feature and extensible granularity perception time sequence action detection method and device
CN117058595B (en) * 2023-10-11 2024-02-13 齐鲁工业大学(山东省科学院) Video semantic feature and extensible granularity perception time sequence action detection method and device
CN117456456A (en) * 2023-10-31 2024-01-26 安徽工业大学 Deep learning-based automatic identification method for abrasion of guide plate holes in control rod guide cylinder assembly
CN117437424A (en) * 2023-12-20 2024-01-23 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Method, apparatus, device and computer program product for moving object instance segmentation
CN119251509A (en) * 2024-12-06 2025-01-03 成都信息工程大学 Mesoscale convection image segmentation method based on small target recognition network
CN119251509B (en) * 2024-12-06 2025-03-14 成都信息工程大学 Mesoscale convection image segmentation method based on small target recognition network
CN120071106A (en) * 2025-04-25 2025-05-30 成都信息工程大学 A fast and efficient method and system for identifying mesoscale convective systems in mid- and low-latitude regions
CN120071106B (en) * 2025-04-25 2025-07-04 成都信息工程大学 Method and system for identifying middle-low latitude area fast and efficiently mesoscale convection system

Also Published As

Publication number Publication date
CN112836713B (en) 2024-11-05

Similar Documents

Publication Publication Date Title
CN112836713A (en) Identification and Tracking Method of Mesoscale Convective System Based on Image Anchorless Frame Detection
CN108416307B (en) An aerial image pavement crack detection method, device and equipment
CN108961235B (en) Defective insulator identification method based on YOLOv3 network and particle filter algorithm
CN109684922B (en) A multi-model recognition method for finished dishes based on convolutional neural network
CN114255403B (en) Optical remote sensing image data processing method and system based on deep learning
CN111291826B (en) A pixel-by-pixel classification method for multi-source remote sensing images based on correlation fusion network
CN111666903B (en) Method for identifying thunderstorm cloud cluster in satellite cloud picture
CN117409339A (en) Unmanned aerial vehicle crop state visual identification method for air-ground coordination
CN112347895A (en) Ship remote sensing target detection method based on boundary optimization neural network
CN114519819B (en) Remote sensing image target detection method based on global context awareness
CN112381030B (en) Satellite optical remote sensing image target detection method based on feature fusion
CN110533100B (en) Method for CME detection and tracking based on machine learning
CN116469020A (en) Unmanned aerial vehicle image target detection method based on multiscale and Gaussian Wasserstein distance
CN113435254A (en) Sentinel second image-based farmland deep learning extraction method
CN114565824B (en) Single-stage rotating ship detection method based on full convolution network
CN117541535A (en) A transmission line inspection image detection method based on deep convolutional neural network
CN110969121A (en) High-resolution radar target recognition algorithm based on deep learning
CN111274964A (en) Detection method for analyzing water surface pollutants based on visual saliency of unmanned aerial vehicle
CN114863198A (en) Crayfish quality grading method based on neural network
Pillai et al. Fine-tuned EfficientNetB4 transfer learning model for weather classification
CN114140698B A water system information extraction algorithm based on Faster R-CNN
Sun et al. Flame Image Detection Algorithm Based on Computer Vision.
CN118570585B (en) Intelligent Generation Method of SAR Target Data by Fusion of Geometric Information
CN112084941A (en) Target detection and identification method based on remote sensing image
CN118038152A (en) Infrared small target detection and classification method based on multi-scale feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant