Summary of the invention
In view of this, the purpose of the present invention is to provide a salient object detection method based on multi-scale convolutional feature extraction and fusion, which can significantly improve salient object detection accuracy.
The present invention is realized by the following scheme: a salient object detection method based on multi-scale convolutional feature extraction and fusion, specifically comprising the following steps:

Step S1: perform data augmentation, processing each color image together with its corresponding manual annotation map, to increase the amount of data in the training set;

Step S2: extract multi-scale features, and apply channel compression to improve the computational efficiency of the network;

Step S3: fuse the multi-scale features to obtain the predicted saliency map Pred_i;

Step S4: learn the optimal parameters of the model by minimizing the cross-entropy loss; finally, use the trained network to predict the salient objects in an image.
Further, step S1 specifically comprises the following steps:

Step S11: scale each color image in the dataset together with its corresponding manual annotation map, so that the computational load of the neural network can be handled by the computing device;

Step S12: apply a random crop to each color image in the dataset together with its corresponding manual annotation map, to increase the diversity of the data;

Step S13: generate mirror images by horizontal flipping, to enlarge the amount of data in the original dataset.
Further, step S2 specifically comprises the following steps:

Step S21: improve the basic U-Net network structure. The encoder of the U-Net network is an image-classification convolutional network used as the feature network, which produces convolutional features at 5 different scales by repeatedly stacking convolutional layers and pooling layers. Between convolutional features En_i and En_{i+1} there is a pooling layer that progressively reduces the size of the feature map; the stride of this pooling layer is set to 2, so that En_{i+1} is reduced to half of En_i in both the width and height dimensions. To keep enough spatial information in the convolutional features, the stride of the pooling layer between the last two convolutional features is set to 1, so that the last two convolutional features keep the same size in the width and height dimensions;

Step S22: design a multi-scale feature extraction module that acts on the convolutional feature of each scale produced by the improved U-Net network of step S21, obtaining multi-scale content features;

Step S23: add a channel compression module acting on the multi-scale content features, to improve the computational efficiency of the network.
Further, step S22 specifically comprises the following steps:

Step S221: design three convolutional layers taking the convolutional feature En_i as input; all three perform depthwise separable dilated (atrous) convolution, with dilation rates of 3, 6 and 9 respectively. The feature results of these three operations have the same size as the convolutional feature En_i, namely (c, h, w);

Step S222: concatenate the three feature results along the channel dimension to obtain a feature result of size (3c, h, w);

Step S223: use a convolution with kernel size (1, 1) to compress the channels of the result of step S222 back to the channel count of En_i, obtaining the multi-scale content feature of size (c, h, w).
Further, step S3 specifically comprises the following steps:

Step S31: design a multi-scale feature fusion module. Suppose the input multi-scale content feature Feat_i has size (c, h, w). In the multi-scale feature fusion module, a depthwise separable convolution with kernel size (1, k) followed by one with kernel size (k, 1), and a depthwise separable convolution with kernel size (k, 1) followed by one with kernel size (1, k), are applied to obtain a feature fusion result of the same size as the input feature Feat_i;

Step S32: the decoder of the U-Net network, like the encoder feature network, has feature results at 5 different scales. Each scale's convolutional feature Dec_i produced by the decoder is obtained by using the multi-scale feature fusion module to fuse the multi-scale content feature Feat_i with the convolutional feature Dec_{i+1}. Suppose here that the input convolutional feature Dec_{i+1} has size (c, h/2, w/2). First, Dec_{i+1} is enlarged by a factor of two in the spatial dimensions by an upsampling operation, so that Dec_{i+1} and Feat_i have the same spatial size, giving a feature of size (c, h, w). Then Feat_i and Dec_{i+1} are joined by a concatenation operation into a feature of size (2c, h, w); a convolution is applied, followed by a ReLU activation function and a BN layer, giving a feature result of size (c, h, w). Next, the multi-scale feature fusion module is applied to this result to obtain a feature fusion result; this feature result and the fusion result are then concatenated again and passed through a convolution, a ReLU activation function and a BN layer, obtaining the feature result Dec_i of size (c, h, w). Finally, a convolution with kernel size (1, 1) compresses the channel count of Dec_i by half so that it can be fused with Dec_{i-1}; after a ReLU activation function and a BN layer this gives the feature result Dec_i of size (0.5c, h, w). Compressing its channels to 1 by a further convolution and applying the Sigmoid function yields the predicted saliency map Pred_i.
Further, step S31 specifically comprises the following steps:

Step S311: apply to the input multi-scale content feature Feat_i a depthwise separable convolution with kernel size (1, k) followed by one with kernel size (k, 1); in parallel, apply to the input feature Feat_i a depthwise separable convolution with kernel size (k, 1) followed by one with kernel size (1, k); a BN layer is added after each of these two sequential passes, yielding two feature results;

Step S312: add the two feature results element-wise along the channel dimension to obtain a feature result of the same size as the input feature Feat_i;

Step S313: use a convolution with kernel size (1, 1) to model the features across channels of the feature result, obtaining a feature fusion result of the same size as the input feature Feat_i.
Further, in step S4, the cross-entropy loss Loss is computed according to the formula given in the detailed description below.
Compared with the prior art, the present invention has the following beneficial effects: the invention proposes a multi-scale feature extraction module and a multi-scale fusion module, which in the network design are directly embedded into the U-Net architecture of the typical encoder-decoder structure. The redundancy of channel information in the decoder is also taken into account, and a channel compression module is applied to make the model more computationally efficient. The method can significantly improve salient object detection accuracy.
Specific embodiment
The present invention will be further described below with reference to the accompanying drawings and embodiments.

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the present application. Unless otherwise indicated, all technical and scientific terms used herein have the same meanings as commonly understood by a person of ordinary skill in the art to which the present application belongs.

It should also be noted that the terminology used herein serves only to describe specific embodiments and is not intended to limit the exemplary embodiments of the present application. As used herein, unless the context clearly indicates otherwise, the singular forms are intended to include the plural forms as well; in addition, it should be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components and/or combinations thereof.
As shown in Fig. 1, this embodiment provides a salient object detection method based on multi-scale convolutional feature extraction and fusion, specifically comprising the following steps:

Step S1: perform data augmentation, processing each color image together with its corresponding manual annotation map, to increase the amount of data in the training set;

Step S2: extract multi-scale features, and apply channel compression to improve the computational efficiency of the network;

Step S3: fuse the multi-scale features to obtain the predicted saliency map Pred_i;

Step S4: learn the optimal parameters of the model by minimizing the cross-entropy loss; finally, use the trained network to predict the salient objects in an image.
In this embodiment, step S1 performs data augmentation, processing the color images together with the corresponding manual annotation maps to increase the amount of data in the training set. The mainstream international datasets used to train salient object detection networks generally contain color images and corresponding manual labels, where the color image is as shown in Fig. 2(a), and the manual label map, similar to a saliency map (as in Fig. 2(b)), is a binary image of the manually labeled salient object region of the image. Since constructing a dataset requires considerable manual effort, and training a deep neural network requires sufficient data, data augmentation operations must be performed on the basis of the original dataset. Step S1 therefore specifically comprises the following steps:
Step S11: scale each color image in the dataset together with its corresponding manual annotation map, so that the computational load of the neural network can be handled by the computing device;

Step S12: apply a random crop to each color image in the dataset together with its corresponding manual annotation map, to increase the diversity of the data;

Step S13: generate mirror images by horizontal flipping, to enlarge the amount of data in the original dataset, so as to meet the large data requirement of training a deep convolutional neural network (CNN) and to enhance the generalization ability of the model.
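The key point of steps S11–S13 is that image and annotation map receive the identical geometric transform. A minimal pure-Python sketch (nested lists stand in for images; the function names are illustrative, not from the patent):

```python
import random

def hflip(grid):
    """Step S13: horizontal flip (mirror each row)."""
    return [row[::-1] for row in grid]

def paired_crop(img, mask, ch, cw, rng):
    """Step S12: cut the same random ch x cw window from image and mask."""
    h, w = len(img), len(img[0])
    top, left = rng.randrange(h - ch + 1), rng.randrange(w - cw + 1)
    cut = lambda g: [row[left:left + cw] for row in g[top:top + ch]]
    return cut(img), cut(mask)

def augment(img, mask, ch, cw, seed=0):
    """Apply crop and (with probability 0.5) flip identically to both."""
    rng = random.Random(seed)
    img, mask = paired_crop(img, mask, ch, cw, rng)
    if rng.random() < 0.5:
        img, mask = hflip(img), hflip(mask)
    return img, mask
```

Because image and mask share one random generator, the salient region in the annotation stays aligned with the augmented image.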
In this embodiment, step S2 specifically comprises the following steps:

Step S21: improve the basic U-Net network structure. The encoder of the U-Net network is an image-classification convolutional network used as the feature network (such as a VGG or ResNet structure), which produces convolutional features at 5 different scales by repeatedly stacking convolutional layers and pooling layers, e.g. the five feature results En_1, En_2, En_3, En_4 and En_5 in Fig. 2. Among these five convolutional features, between En_i and En_{i+1} there is a pooling layer that progressively reduces the size of the feature map; the stride of this pooling layer is set to 2, so that En_{i+1} is reduced to half of En_i in both the width and height dimensions, which also causes the convolutional features to lose spatial information. To keep enough spatial information in the convolutional features, the stride of the pooling layer between the last two convolutional features (En_4 and En_5) is set to 1, so that En_4 and En_5 keep the same size in the width and height dimensions;
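The scale schedule of step S21 can be sanity-checked in a few lines: stride-2 pooling between successive features, except stride 1 between the last two. A sketch (the starting resolution 256 is only an example, not taken from the patent):

```python
def encoder_feature_sizes(h, w, num_scales=5):
    """Spatial (height, width) of En_1..En_5: each pooling layer between
    successive features has stride 2, except the last one (between En_4
    and En_5), which has stride 1."""
    sizes = [(h, w)]
    for i in range(1, num_scales):
        stride = 1 if i == num_scales - 1 else 2
        ph, pw = sizes[-1]
        sizes.append((ph // stride, pw // stride))
    return sizes
```

For a 256 x 256 input this yields 256, 128, 64, 32, 32 — the last two scales deliberately coincide, preserving spatial detail in the deepest features.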
Step S22: design a multi-scale feature extraction module that acts on the convolutional feature of each scale produced by the improved U-Net network of step S21, obtaining multi-scale content features. The multi-scale feature extraction module is shown in Fig. 3; the size of the convolutional feature is assumed here to be (c, h, w);

Step S23: add a channel compression module acting on the multi-scale content features, to improve the computational efficiency of the network. The channel compression module is shown in Fig. 4, where "SE Module" is the module proposed by Hu et al. in the SENet (Squeeze-and-Excitation Networks) paper. The SE module takes the multi-scale content feature Feat_i as input, models the correlations between the features on the individual channels, and applies a weighting operation to strengthen the generalization ability of the features. The channel compression module then uses a convolution with kernel size (1, 1) to compress the number of channels of the feature result to half of the original, and passes the result through a ReLU (Rectified Linear Unit) function and a BN (Batch Normalization) layer to obtain the channel-compressed multi-scale content feature Feat_i.
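The squeeze-and-excitation reweighting inside the channel compression module can be sketched in pure Python. The two small linear maps `w1` and `w2` stand in for the SE module's learned fully connected layers (hypothetical weights, not values from the patent):

```python
import math

def se_reweight(feat, w1, w2):
    """SE sketch: squeeze each channel of a (c, h, w) feature to its
    global average, excite with two linear maps (ReLU then sigmoid),
    and rescale every channel by its gate."""
    c = len(feat)
    # squeeze: one scalar per channel (global average pooling)
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]
    hidden = [max(0.0, sum(row[i] * z[i] for i in range(c))) for row in w1]
    gate = [1.0 / (1.0 + math.exp(-sum(w2[i][j] * hidden[j]
                                       for j in range(len(hidden)))))
            for i in range(c)]
    # scale every value of channel i by gate[i]
    return [[[gate[i] * v for v in row] for row in feat[i]]
            for i in range(c)]
```

With zero excitation weights every gate is sigmoid(0) = 0.5, i.e. the module degenerates to uniform scaling; trained weights instead emphasise informative channels before the (1, 1) compression.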
In this embodiment, step S22 specifically comprises the following steps:

Step S221: design three convolutional layers taking the convolutional feature En_i as input; all three perform depthwise separable dilated (atrous) convolution, with dilation rates of 3, 6 and 9 respectively. Setting different dilation rates for the dilated convolutions lets the convolution operations capture content-area features of different sizes in the image, i.e. produce feature results for multi-scale content areas. The feature results of these three operations have the same size as the convolutional feature En_i, namely (c, h, w);

Step S222: concatenate (concat) the three feature results along the channel dimension to obtain a feature result of size (3c, h, w);

Step S223: use a convolution with kernel size (1, 1) to compress the channels of the result of step S222 back to the channel count of En_i, obtaining the multi-scale content feature of size (c, h, w), i.e. Feat_i in Fig. 4.
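Steps S221–S222 amount to three "same"-padded dilated depthwise branches concatenated along the channel dimension. A pure-Python sketch under the (c, h, w) convention (one kernel per channel for brevity; the (1, 1) compression of S223 is omitted):

```python
def dilated_conv2d(chan, kernel, dilation):
    """'Same'-padded dilated convolution of one h x w channel: kernel
    taps are spaced `dilation` pixels apart, zeros outside the border."""
    k = len(kernel)
    h, w = len(chan), len(chan[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy = y + (i - k // 2) * dilation
                    xx = x + (j - k // 2) * dilation
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * chan[yy][xx]
            out[y][x] = acc
    return out

def multi_scale_extract(feat, kernel, rates=(3, 6, 9)):
    """S221 + S222: one dilated depthwise branch per rate, results
    concatenated along the channel dimension -> 3c channels, h x w."""
    return [dilated_conv2d(ch, kernel, d) for d in rates for ch in feat]
```

Each branch keeps the (h, w) size while its receptive field grows with the dilation rate, which is exactly why the three results can be concatenated into a (3c, h, w) tensor.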
In this embodiment, step S3 specifically comprises the following steps:

Step S31: to fuse features of different sizes, this embodiment designs a multi-scale feature fusion module, shown in Fig. 5. Suppose the input multi-scale content feature Feat_i has size (c, h, w). In the multi-scale feature fusion module, a depthwise separable convolution with kernel size (1, k) followed by one with kernel size (k, 1), and a depthwise separable convolution with kernel size (k, 1) followed by one with kernel size (1, k), are applied to obtain a feature fusion result of the same size as the input feature Feat_i. This is equivalent to a (k, k) convolution but requires fewer computational resources, while stitching together content-area features of different scales from the spatial dimensions;

Step S32: the decoder of the U-Net network, like the encoder feature network, has feature results at 5 different scales. Each scale's convolutional feature Dec_i produced by the decoder is obtained by using the multi-scale feature fusion module to fuse the multi-scale content feature Feat_i with the convolutional feature Dec_{i+1}. Suppose here that the input convolutional feature Dec_{i+1} has size (c, h/2, w/2). First, Dec_{i+1} is enlarged by a factor of two in the spatial dimensions by an upsampling operation, so that Dec_{i+1} and Feat_i have the same spatial size, giving a feature of size (c, h, w). Then Feat_i and Dec_{i+1} are joined by a concatenation operation into a feature of size (2c, h, w); a convolution is applied, followed by a ReLU activation function and a BN layer, giving a feature result of size (c, h, w). Next, the multi-scale feature fusion module is applied to this result to obtain a feature fusion result; this feature result and the fusion result are then concatenated again and passed through a convolution, a ReLU activation function and a BN layer, obtaining the feature result Dec_i of size (c, h, w). Finally, a convolution with kernel size (1, 1) compresses the channel count of Dec_i by half so that it can be fused with Dec_{i-1}; after a ReLU activation function and a BN layer this gives the feature result Dec_i of size (0.5c, h, w). Compressing its channels to 1 by a further convolution and applying the Sigmoid function yields the predicted saliency map Pred_i. It is worth noting that since Dec_4 and Dec_5 have the same number of channels, the channel count is not compressed there.
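The size bookkeeping of step S32 — doubling Dec_{i+1} spatially, then concatenating with Feat_i — can be sketched as follows (nearest-neighbour upsampling; the convolutions, BN and ReLU of the full module are omitted):

```python
def upsample2x(chan):
    """Nearest-neighbour 2x upsampling of one h x w channel."""
    out = []
    for row in chan:
        wide = [v for v in row for _ in (0, 1)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                   # repeat each row
    return out

def concat_channels(a, b):
    """Channel-wise concatenation: (c, h, w) + (c, h, w) -> (2c, h, w)."""
    return a + b

def fuse_sizes(feat_i, dec_next):
    """Upsample Dec_{i+1} to Feat_i's spatial size, then concatenate."""
    up = [upsample2x(ch) for ch in dec_next]
    return concat_channels(feat_i, up)
```

After this concatenation the (2c, h, w) tensor is what the first convolution of step S32 reduces back to (c, h, w).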
In this embodiment, step S31 specifically comprises the following steps:

Step S311: apply to the input multi-scale content feature Feat_i a depthwise separable convolution with kernel size (1, k) followed by one with kernel size (k, 1); in parallel, apply to the input feature Feat_i a depthwise separable convolution with kernel size (k, 1) followed by one with kernel size (1, k); a BN layer is added after each of these two sequential passes, yielding two feature results;

Step S312: add the two feature results element-wise along the channel dimension to obtain a feature result of the same size as the input feature Feat_i;

Step S313: use a convolution with kernel size (1, 1) to model the features across channels of the feature result, obtaining a feature fusion result of the same size as the input feature Feat_i.
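The claim behind steps S311–S313 — a (1, k) pass followed by a (k, 1) pass covers the receptive field of a (k, k) convolution at lower cost — holds exactly when the k x k kernel is the outer product of the two 1-D kernels. A zero-padded pure-Python check (the Sobel-style kernels below are illustrative, not from the patent):

```python
def conv_rows(g, kr):
    """'Same'-padded 1 x k convolution along each row (zeros outside)."""
    k, p = len(kr), len(kr) // 2
    h, w = len(g), len(g[0])
    return [[sum(kr[j] * g[y][x + j - p]
                 for j in range(k) if 0 <= x + j - p < w)
             for x in range(w)] for y in range(h)]

def conv_cols(g, kc):
    """'Same'-padded k x 1 convolution along each column."""
    k, p = len(kc), len(kc) // 2
    h, w = len(g), len(g[0])
    return [[sum(kc[i] * g[y + i - p][x]
                 for i in range(k) if 0 <= y + i - p < h)
             for x in range(w)] for y in range(h)]

def conv_full(g, K):
    """'Same'-padded k x k convolution with a full kernel K."""
    k, p = len(K), len(K) // 2
    h, w = len(g), len(g[0])
    return [[sum(K[i][j] * g[y + i - p][x + j - p]
                 for i in range(k) for j in range(k)
                 if 0 <= y + i - p < h and 0 <= x + j - p < w)
             for x in range(w)] for y in range(h)]
```

For a separable kernel K = kc ⊗ kr the two cheap passes reproduce the full convolution with 2k instead of k² weights per channel, which is the cost saving the module exploits.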
In this embodiment, in step S4 the loss function is optimized in the training stage using the Adam (Adaptive Moment Estimation) algorithm. As shown in Fig. 2, each scale's feature result Dec_i in step S3 corresponds to the computation of a loss Loss_i, where each Loss_i is a cross-entropy loss computed between the predicted saliency map Pred_i in Fig. 6 and the manual annotation map.
The cross-entropy loss Loss of the network is computed with the following formula:
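(The formula itself does not survive in this text. The standard per-pixel binary cross-entropy that the surrounding description implies — with G the manual annotation map, Pred_i the predicted saliency map at scale i, x ranging over pixels, and the total loss summing the per-scale losses — would read, as an assumption:

Loss_i = −Σ_x [ G(x)·log(Pred_i(x)) + (1 − G(x))·log(1 − Pred_i(x)) ],  Loss = Σ_i Loss_i.)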
The optimal parameters of the network are obtained by Adam optimization, and finally the network is used to predict the salient objects in a color image.
When the algorithm extracts features at additional related scales on the basis of the original scale features and fuses them again, the fused features can gain stronger generalization ability. Following this idea of extracting and fusing convolutional features at multiple scales, this embodiment proposes a multi-scale feature extraction module and a multi-scale fusion module. In the network design, the modules are directly embedded into the U-Net architecture of the typical encoder-decoder structure; the redundancy of channel information in the decoder is also taken into account, and a channel compression module is applied to make the model more computationally efficient. In summary, this embodiment proposes a salient object detection method based on multi-scale convolutional feature extraction and fusion; the network structure designed by the algorithm, based on multi-scale feature extraction and fusion, can significantly improve salient object detection accuracy.
Those skilled in the art will appreciate that embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be realized by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of guiding a computer or other programmable data processing device to work in a particular way, so that the instructions stored in the computer-readable memory produce a manufactured article including an instruction device, which realizes the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operating steps is executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above is only a preferred embodiment of the present invention and does not limit the present invention in any other form. Any person skilled in the art may use the technical content disclosed above to change or modify it into an equivalent embodiment with equivalent variations. However, any simple modification, equivalent variation and alteration made to the above embodiments according to the technical essence of the present invention, without departing from the content of the technical solution of the present invention, still falls within the protection scope of the technical solution of the present invention.