Detailed Description of the Embodiments
To enable those skilled in the art to better understand the present invention, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings of the embodiments. Apparently, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
In the current prior art, semantic segmentation processing generally comprises two parts: decoding of the feature representation (Decoding of Feature Representation) and dilated convolution (Dilated Convolution).

Decoding of the feature representation yields pixel-level semantic segmentation information, and the output feature map has the same size as the input image. Because max pooling and strided convolution operations in a convolutional neural network inevitably reduce the size of the feature maps of the final network layers, various schemes have been proposed to decode accurate information from these low-resolution feature maps. Common bilinear interpolation saves memory and is fast. Deconvolution methods use the pooling location information recorded during pooling to recover the information needed for image reconstruction and feature visualization. In some examples, a separate deconvolution layer is added in the decoding stage, and prediction results are generated using stacked feature maps of intermediate layers. In other examples, multiple deconvolution layers are used to generate target objects, such as chairs, tables, or automobiles, from multiple features. Some studies use the pooling positions stored during the unpooling step, treating the deconvolution layers as a mirror structure of the convolution layers. Other studies show that, during propagation through the deconvolution layers, coarse-to-fine detection of object structures can be achieved, and these object structures are crucial for reconstructing fine details. Still other studies use a similar mirror structure and combine the information of the deconvolution layers to perform upsampling and obtain the final prediction. Some systems predict label images with high statistical efficiency by using pixel-level classifiers. Among these techniques, bilinear interpolation is the most widely applied. However, bilinear upsampling obtains an output of the same resolution as the input by padding zeros; it easily loses detail, loses small-object information, reduces the precision of the image data, and has no learning capability.
Dilated convolution (also called atrous convolution) was originally developed for wavelet decomposition. Its core idea is to insert zeros between the pixels of the convolution kernel so as to enlarge the receptive field over the image, thereby enabling dense feature extraction in deep neural networks. In semantic segmentation frameworks, dilated convolution is also used to enlarge the effective size of the convolution kernel. Some studies use serialized layers with increasing dilation rates to realize context aggregation, and design an Atrous Spatial Pyramid Pooling (ASPP) structure in which multiple dilated convolution layers are arranged in parallel to capture objects and context information at multiple scales. Recently, dilated convolution has been applied to a wide range of tasks, such as object detection, optical-flow-based visual question answering, and audio generation. However, these convolutional networks suffer from a "gridding effect" caused by standard dilated convolution, which makes it impossible to identify the shape or contour of larger objects.
The problems present in semantic segmentation technology as described above, namely the loss of small-object information and the inability to identify the shape or contour of large objects, make it impossible to effectively and accurately extract the closed contour of an object.
In view of the above problems in the prior art, the embodiments of the present application provide a method and an apparatus for detecting the closed contour of an object, so as to solve these problems. In the technical solutions provided by some embodiments of the present application, dense upsampling convolution processing is performed in the decoding stage on the feature map output by the encoding stage, which increases the resolution of the predicted image and recovers more detail, so that more small-object information is retained and the contour information of small objects can further be detected. In the technical solutions provided by other embodiments, multiple hybrid dilated convolution processes are performed in the encoding stage on the extracted feature map, which retains more local and long-range information of the convolution; the gridding effect is thereby overcome, continuous contour information of large objects can be obtained, and the contour information of large objects can be detected. Thus, the technical solutions provided by the embodiments of the present application can accurately and effectively detect the contour information of objects and solve the above problems in the prior art.
On the other hand, with the methods provided by the embodiments of the present application, the closed contour information of an object can be extracted directly, without labeling the object with a bounding box. In the prior art, when an object is labeled with a bounding box, the actual shape of the object cannot be identified, and information such as the size and area of the object cannot be accurately inferred; yet in technical fields such as autonomous driving, such information is key to many decisions. Moreover, some objects are discarded during bounding-box fusion processing, causing object information to be lost. With the methods provided by the embodiments of the present application, the closed contour of an object can be extracted directly, so that information such as the actual shape, size, and area of the object can be further identified, providing accurate and effective information for other inference or decision-making.
The above is the core idea of the present invention. To enable those skilled in the art to better understand the technical solutions in the embodiments of the present invention, and to make the above objects, features, and advantages of the embodiments of the present invention more apparent and understandable, the technical solutions in the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Fig. 1 shows a processing flowchart of the object closed-contour detection method provided by the embodiments of the present application, comprising:
Step 101: in the decoding process of semantic segmentation, an object closed-contour detection apparatus performs dense upsampling convolution processing on the feature map output by the encoding process to obtain an output image of the same size as the input image, the output image containing the contour lines of object instances;
Step 102: according to pixel classes, the contour lines of object instances are identified and extracted from the output image.
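The two steps above can be sketched in code. The following is a minimal, hypothetical illustration of step 102: given the decoded per-pixel class map, a pixel is treated as lying on a contour line when any of its 4-neighbours has a different class. The function name and the toy 5×5 label map are assumptions for illustration, not from the source.

```python
def extract_contours(labels):
    """Return the set of (row, col) pixels lying on a class boundary."""
    h, w = len(labels), len(labels[0])
    contour = set()
    for r in range(h):
        for c in range(w):
            # A pixel is on a contour if a 4-neighbour has a different class.
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < h and 0 <= nc < w and labels[nr][nc] != labels[r][c]:
                    contour.add((r, c))
                    break
    return contour

# Toy 5x5 class map: a 3x3 object instance (class 1) on background (class 0).
labels = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
contour = extract_contours(labels)
```

Interior pixels of the object (all neighbours share its class) are correctly excluded, so the extracted set traces the closed boundary of the instance.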
In step 101, the dense upsampling convolution processing performed on the feature map, as shown in Fig. 2, comprises:
Step 1011: converting the channel number c of the feature map into the product of the square d² of the downsampling factor d of the encoding process and the number L of predetermined object categories;
For example, let the input image size of the model be (H, W, C), where H is the height of the image data, W is the width, and C is the channel number. The feature map produced by the encoding process and input to the decoding process has size Fout = (h, w, c), where H/d = h, W/d = w, and d is the downsampling factor. In the prior art, bilinear interpolation is used to upsample the feature map. If d = 16, i.e., the output is downsampled 16× relative to the input, and the length or width of an object is less than 16 pixels, such as a distant utility pole, traffic light, traffic sign, or person, the object will not be sampled, and bilinear upsampling is unable to recover this information, so the object is lost in the output image.
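The loss described above can be illustrated with a toy 1-D sketch (assumed, not from the source): with d = 16, naive stride-16 subsampling can miss an object narrower than 16 pixels entirely, and no interpolation of the subsampled signal can bring it back.

```python
d = 16
signal = [0] * 256
for i in range(40, 48):      # an 8-pixel-wide "object", e.g. a distant pole
    signal[i] = 1

# Stride-d subsampling keeps positions 0, 16, 32, ...; none hits the object.
downsampled = signal[::d]
```

Every retained sample is background, so any upsampling of `downsampled` — bilinear or otherwise — reconstructs an image in which the object has vanished.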
In step 1011 provided by the embodiments of the present application, the dense upsampling convolution processing converts the channel number c of the feature map Fout into a channel number of d²·L, where d is the downsampling factor and L is the number of predetermined object categories, yielding a feature map Fout = (h, w, d²·L).

Specifically, according to the ratio d²·L/c between the converted channel number d²·L and the original channel number c of the feature map, the feature map on each channel is learned, yielding learned feature maps of size h×w on d²·L channels, so that each dense upsampling convolution layer learns a prediction for each pixel. The learning of the feature map on each channel is realized by a learning function obtained in advance through neural network training. For example, when the original channel number c is numerically equal to the predetermined number L of object categories, the feature map on each channel may be learned d² times, yielding learned feature maps of size h×w on d²·L channels.
Step 1012: combining the feature maps after channel-number conversion, and normalizing the combined feature map to obtain an output image of the same size as the input image.

That is, the feature map Fout = (h, w, d²·L) after channel-number conversion is combined into a feature map of size (h·d, w·d, L). Since H/d = h and W/d = w as described above, the size of the combined feature map is (H, W, L); in other words, the feature map is upsampled to the same size as the input image.

The combination of the feature maps after channel-number conversion may be performed according to the order in which the features were extracted and the channel order. For example, if, among the d²·L channels, the feature maps on the n-th to m-th channels were extracted from the x-th row of the input image data, then during combination the feature maps on the n-th to m-th channels are combined in pixel order into the x-th row of the output data, and so on for the subsequent feature maps. The combination of the feature maps may be implemented according to the specific algorithm used in the actual application scenario, and the present application does not specifically limit it here.
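One plausible way to implement the channel conversion and combination of steps 1011 and 1012 is the channel-to-space rearrangement sketched below (essentially a pixel shuffle); the function name and shapes are illustrative assumptions, not the claimed implementation.

```python
import numpy as np

def duc_combine(feat, d, L):
    """Rearrange an (h, w, d*d*L) feature map into an (h*d, w*d, L) map."""
    h, w, _ = feat.shape
    feat = feat.reshape(h, w, d, d, L)      # each cell holds a d x d block
    feat = feat.transpose(0, 2, 1, 3, 4)    # interleave rows/cols: (h, d, w, d, L)
    return feat.reshape(h * d, w * d, L)

h, w, d, L = 2, 3, 4, 5
feat = np.arange(h * w * d * d * L, dtype=float).reshape(h, w, d * d * L)
out = duc_combine(feat, d, L)               # an (8, 12, 5) input-resolution map
```

Each low-resolution cell thus expands into a d×d block of per-pixel class scores, which is what allows a learned prediction for every pixel of the full-resolution output.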
The bilinear interpolation method of the prior art has no learning capability, and neither does deconvolution; by contrast, the dense upsampling processing provided by the embodiments of the present application is learnable. Before the processing shown in Fig. 1, a semantic segmentation model may be obtained in advance by training a neural network on real data, the semantic segmentation model containing the dense upsampling convolution layers of the decoding stage. The dense upsampling convolution layers may comprise multiple convolution layers. Specifically, a series of upsampling filters may be learned through training, and through this series of upsampling filters the feature map of size Fout = (h, w, c) is upsampled into image data of size Fout = (h, w, d²·L).

The upsampled output image data is then normalized by a softmax layer to obtain the final output image.
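The softmax normalization mentioned above can be sketched per pixel as follows (a generic softmax, not code from the source); each pixel's L channel scores become class probabilities, and the argmax gives the pixel's class.

```python
import math

def softmax(scores):
    """Normalise one pixel's class scores into probabilities."""
    m = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
label = probs.index(max(probs))              # predicted class of this pixel
```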
Fig. 3 shows an example of processing an input image with the method shown in Fig. 1. The left side of Fig. 3 is the input image, and the remaining parts, from left to right, are output images under different downsampling factors. It can be seen that some small objects in the input image, such as utility poles and signal lights, are identified well.

In a specific implementation, training may be performed on a fully convolutional network to obtain the above semantic segmentation model.
Through the dense upsampling convolution processing described above, a feature map whose size has been heavily compressed can be restored to image data of the same size as the input image data; and by converting the channel number of the feature map according to the downsampling factor and the number of predetermined object categories, a denser set of feature maps is obtained, through which the classes of more pixels can be predicted. Thus, the information of fine parts or small objects in the image data can be recovered, remedying the problem that small-object information lost through downsampling in the encoding process cannot be recovered by bilinear interpolation. Accordingly, the method provided by the embodiments of the present application can extract, from the output image, the instance information and closed-contour information of small objects.
Moreover, dense upsampling convolution processing can go directly from the feature map to the output label image, without first performing bilinear interpolation on the feature map and then upsampling the interpolated image to obtain the output label image, as in the prior art. On the other hand, dense upsampling convolution processing operates directly on the feature map at its original resolution, realizing pixel-level decoding.
Further, as shown in Fig. 4, on the basis of the processing shown in Fig. 1, the method provided by the embodiments of the present application further comprises:
Step 103: determining the shape, size, and/or area of an object instance according to the extracted closed contour of the object instance.
With the method provided by the embodiments of the present application, the closed contour information of an object can be extracted directly, without labeling the object with a bounding box. In the prior art, when an object is labeled with a bounding box, the actual shape of the object cannot be identified, and information such as its size and area cannot be accurately inferred; yet in technical fields such as autonomous driving, such information is key to many decisions. With the method provided by the embodiments of the present application, the closed contour of an object can be extracted directly, so that information such as the actual shape, size, and area of the object can be further identified, providing accurate and effective information for other inference or decision-making.
Based on the same inventive concept, on the basis of the method of the embodiment of the present application shown in Fig. 1, a further detection method for object closed contours is provided.

Fig. 5 shows the object closed-contour detection method provided by the embodiments of the present application, comprising:
Step 100: in the encoding process of semantic segmentation, an object closed-contour detection apparatus performs multiple hybrid dilated convolution processes on the extracted feature map to obtain a feature map with an enlarged receptive field;
Step 101: in the decoding process, dense upsampling convolution processing is performed on the feature map output by the encoding process to obtain an output image of the same size as the input image, the output image containing the closed contours of object instances;
Step 102: according to pixel classes, the closed contours of object instances are identified and extracted from the output image.
The processing of step 100 consists of performing convolution processing on the feature map on multiple convolution layers using a series of dilation rates.

In the prior art, dilated convolution processing usually convolves the feature map with a dilated convolution kernel so as to enlarge the receptive field of the feature map; the dilated kernel is constructed by inserting zeros between the pixels of the convolution kernel. For a two-dimensional signal with kernel size K×K, the dilated kernel has size Kd×Kd, where Kd = K + (K−1)(r−1) and r is the dilation rate. Dilated convolution can enlarge the receptive field (also called the field of view) of the feature map and can replace the pooling layers in fully convolutional network architectures. For example, a convolution layer in ResNet-101 has stride s = 2; the stride can be reset to 1 to remove the downsampling operation, and the dilation rate of the subsequent network layers can be set to 2. If this processing is applied in turn to all network layers that perform downsampling, the output feature map has an enlarged receptive field. In practical applications, dilated convolution processing is typically applied to downsampled feature maps to reach a reasonable trade-off between efficiency and cost. However, dilated convolution processing causes a gridding effect.
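The effective kernel size Kd = K + (K−1)(r−1) given above can be checked numerically; the helper below is an illustrative sketch, with assumed names.

```python
def dilated_kernel_size(k, r):
    """Effective footprint K_d of a k x k kernel with dilation rate r."""
    return k + (k - 1) * (r - 1)

sizes = [dilated_kernel_size(3, r) for r in (1, 2, 4)]   # a 3x3 kernel
```

A 3×3 kernel thus covers a 3×3, 5×5, or 9×9 footprint at rates 1, 2, and 4 respectively, while still reading only 9 values.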
In the embodiments of the present application, step 100 may be realized as follows: on each of multiple dilated convolution layers, dilated convolution processing is performed on the feature map using a K×K kernel with dilation rate ri, where 1 ≤ i ≤ n and n is the number of convolution layers. The processing may be implemented in at least one of the following four ways:
Way one: the multiple dilated convolution layers are divided into several groups, and the dilation rates of the dilated convolution layers within each group increase monotonically.

For example, when there are N dilated convolution layers, the N layers may be divided into s groups, each group containing at least two convolution layers; the kernels used in each group are of size K×K, and within each group monotonically increasing dilation rates are used, i.e., r(si−2) < r(si−1) < r(si). In this way, the dilation rate within each group keeps increasing, and from the first to the last dilated convolution layer across the groups the dilation rate varies like a sawtooth wave; kernels with smaller dilation rates extract local information, and kernels with larger dilation rates extract long-range information.
Way two: the kernels of the dilated convolution layers have arbitrary dilation rates.

Setting an arbitrary dilation rate for a kernel can enlarge its receptive field, making it possible to identify larger objects.
Way three: on the basis of way one or way two, the increment of the dilation rate differs from step to step.

For example, with dilation rates r = (1, 2, 5), the successive increments of the dilation rate are 1 and 3; that is, the increment differs each time.

Setting multiple dilation rates with differing increments allows a group of dilated kernels to cover more pixels. By contrast, if dilation rates with identical increments are used, e.g., r = (2, 4, 6, 8), where the increment is always 2, the gridding effect is overcome less effectively.
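The coverage argument for ways one and three can be sketched in 1-D (an illustrative model, not from the source): stacking 3-tap dilated convolutions and collecting the input offsets that can reach one output pixel shows that sawtooth-style rates such as (1, 2, 3) cover every position, while a repeated rate such as (2, 2, 2) only ever touches even offsets — the gridding effect.

```python
def covered_offsets(rates, k=3):
    """1-D input offsets that contribute to one output pixel after stacking
    dilated k-tap convolutions with the given dilation rates."""
    offsets = {0}
    for r in rates:
        taps = [(t - (k - 1) // 2) * r for t in range(k)]   # e.g. (-r, 0, r)
        offsets = {o + t for o in offsets for t in taps}
    return offsets

sawtooth = covered_offsets([1, 2, 3])   # increasing, sawtooth-style rates
uniform = covered_offsets([2, 2, 2])    # the same rate on every layer
```

The sawtooth stack reaches every offset in its receptive field, whereas the uniform stack permanently skips all odd positions, no matter how many layers are added.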
Way four: on the basis of any of the above three ways, the size of the receptive field of the dilated convolution kernel of the last dilated convolution layer is less than or equal to the size of the feature map.

That is, the dilation rates are preset such that the receptive field of the dilated kernel of the last dilated convolution layer is not larger than the feature map, so that the receptive field of the last convolution layer can be enlarged. In particular, when the size of the receptive field of the dilated kernel equals the size of the feature map, the receptive field covers the entire feature map, so that no holes or edge information are lost, guaranteeing the consistency and completeness of the long-range information.
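The constraint of way four can be checked with the usual receptive-field recurrence for stacked stride-1 dilated layers, RF = 1 + Σ (K−1)·ri (a standard formula, used here as an illustrative assumption):

```python
def stacked_receptive_field(k, rates):
    """Receptive-field size of stacked stride-1 dilated k x k conv layers."""
    rf = 1
    for r in rates:
        rf += (k - 1) * r
    return rf

rf = stacked_receptive_field(3, [1, 2, 5])   # 17: must not exceed the feature map
```

For 3×3 kernels with rates (1, 2, 5) the top layer sees a 17-pixel extent, so under way four the feature map should be at least 17 pixels wide and high.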
A comparison of way one above with the prior art is given below.

Fig. 6 shows examples of dilated convolution kernels. In Fig. 6, the surrounding gray pixels are those contributing to the computation of the central black pixel. Fig. 6a is a schematic diagram of a dilated convolution kernel in the prior art; Fig. 6b is a schematic diagram of the hybrid dilated convolution kernels provided by the embodiments of the present application.
In Fig. 6(a), the kernel size is 3×3, and the dilation rate is r = 2 throughout, from left to right. For a pixel p in a dilated convolution layer, the pixels contributing to it lie in a Kd×Kd neighborhood of the previous layer centered at p. Because dilated convolution introduces zero values, only K×K pixels in the Kd×Kd region are actually computed, with a spacing of r−1 between non-zero pixels. For example, in the dilated convolution with K = 3 and r = 2, as shown on the left of Fig. 6a, only 9 of the 25 pixels contribute. Since all layers have the same dilation rate r, for a point p in the topmost dilated convolution layer, the maximum possible number of pixels contributing to the computation of p is (w′·h′)/r², where w′ and h′ are respectively the width and height of the feature map of the bottom dilated convolution layer. As a result, p can only see the information in the top-layer feature map in a checkerboard fashion, which causes a large amount of information to be lost (when r = 2, about 75% of the information is lost). When r in the higher convolution layers becomes larger and larger, the data sampled from the input becomes sparser and sparser, which is detrimental to convolution learning, because: 1) local information is completely lost; 2) the information is too far apart to be correlated. Moreover, the results within an r×r region come from entirely different sets of "grids", which damages the consistency of the local information.
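The "9 of 25 pixels" figure above follows directly from the kernel geometry; the sketch below (illustrative, not from the source) counts the non-zero taps against the Kd×Kd footprint.

```python
def contributing_pixels(k, r):
    """(non-zero taps, footprint size) of a dilated k x k kernel, rate r."""
    kd = k + (k - 1) * (r - 1)      # effective footprint edge length
    return k * k, kd * kd

nonzero, footprint = contributing_pixels(3, 2)   # 9 taps in a 5x5 footprint
```

With K = 3 and r = 2 only 9 of the 25 positions in the footprint are read, consistent with the roughly 75% information loss noted in the text.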
In the kernels shown in Fig. 6(b), way one described above is adopted: several convolution layers are divided into a group, and the dilation rate within each group increases monotonically, e.g., K = 3, r = (1, 2, 3), so that the variation of the dilation rate resembles a sawtooth wave. In this way, local information can be obtained at the bottom layers on the left, and information from a broader region can be obtained at the top layers on the right. The combination of different dilation rates accommodates the segmentation requirements of both small and large objects: smaller dilation rates extract local information, and larger dilation rates extract long-range information.
The above embodiments of hybrid dilated convolution, by setting a series of kernel dilation rates, enable the dilated kernels to cover as many pixels as possible during convolution, extracting both local and long-range information. Moreover, the larger the range covered by the receptive field of the dilated kernel, the fewer holes and the less edge information are lost; the consistency and completeness of the long-range information is guaranteed, and the gridding effect is effectively overcome, so that the complete, closed shapes and contours of large objects can be obtained.
On the other hand, since hybrid dilated convolution processing is learnable, before the processing shown in Fig. 5, a semantic segmentation model may be obtained in advance by training a neural network on real data, the semantic segmentation model containing the hybrid dilated convolution layers of the encoding stage.

In a specific implementation, training may be performed on a fully convolutional network to obtain the above semantic segmentation model.
Fig. 7 shows, for a specific application, the network architecture of a semantic segmentation model trained on the ResNet-101 architecture to implement the method shown in Fig. 5. In the encoding stage in Fig. 7, multiple hybrid dilated convolution layers perform hybrid dilated convolution processing on the extracted feature map; in the decoding stage, multiple dense upsampling convolution layers process the feature map output by the encoding stage to obtain the output label image.
The method shown in Fig. 5 is a combination of hybrid dilated convolution processing in the encoding process and dense upsampling convolution processing in the decoding process. Hybrid dilated convolution processing can effectively enlarge the receptive field of the feature map and identify the shapes and contours of large objects; dense upsampling convolution processing can recover the information of small objects. The combination of the two is conducive to comprehensively, accurately, and efficiently identifying and extracting the object instances and their contours from the image data.

Further, similarly to Fig. 4, the method shown in Fig. 5 may also comprise step 103, which is not described again here.
Fig. 8 and Fig. 9 respectively show comparisons of the output images of the methods shown in Fig. 1 and Fig. 5. In Fig. 8, from left to right are the input image, the ground truth, the output image of the method shown in Fig. 1, and the output image of the method shown in Fig. 5. It can be seen in Fig. 8 that, compared with the output image of the method shown in Fig. 1, the output image of the method shown in Fig. 5 is closer to the ground truth in the identification of small objects. In Fig. 9, the first row is the ground truth, the second row is the output of the method shown in Fig. 1, and the third row is the output image of the method shown in Fig. 5. It can be seen in Fig. 9 that, compared with the output image of the method shown in Fig. 1, the output image of the method shown in Fig. 5 overcomes the gridding effect more effectively in the identification of large-object contours and is closer to the ground truth.
Another group of image data, Figs. 10 to 13b, also shows examples of applying the method provided by the embodiments of the present application. Fig. 10 is the original input image; Fig. 11 is the feature map obtained after feature extraction from the input image shown in Fig. 10; Fig. 12 is a schematic diagram of labeling objects with bounding boxes using prior-art object detection technology; Fig. 13a is the object-instance contour map obtained after applying the object closed-contour detection method provided by the embodiments of the present application.

In Fig. 11, objects of multiple classes are marked with different colors. However, in Fig. 11 the instance-level information of individual objects is lost: for example, all automobiles are marked with the same color, namely blue, and are labeled with the "automobile" class. Yet identifying every object instance in a traffic environment — each automobile, bus, pedestrian, and bicycle — is crucial for a safe and effective autonomous driving system. A failure to detect a single object instance may cause the motion planning module of an autonomous vehicle to malfunction or misclassify, leading to a series of accidents. Semantic segmentation frameworks provide pixel-level object labels, but instance-level objects cannot be identified from semantic segmentation technology alone.
Fig. 12 is a schematic diagram of labeling objects with bounding boxes using a traditional object detection framework. Although a traditional object detection framework can label objects with bounding boxes, it cannot recover the shapes of objects or handle the problem of detecting their closed contours. In particular, due to the limitations of bounding-box fusion processing in traditional object detection frameworks, in order to reduce the false-positive rate, bounding boxes that are close together and label different object instances may be fused together, so that the closed contours of the objects or object instances cannot be detected, especially when the occluded object is large. As shown in Fig. 12, a traditional object detection framework uses rectangular bounding boxes to recover the shapes or contours of different objects or object instances; consequently, in the process of fusing the bounding boxes of an object and its neighboring objects, an occluded object or occluded object instance may be lost during detection.
Figure 13a shows an output image produced by the object closed-contour detection method provided by the embodiments of the present application. The method provided by the embodiments of the present application is based on one assumption, namely that objects of a particular category have a similar global shape, so that the contours and boundary lines detected for objects of the same category have a consistent planar structure. As shown in Figure 13a, the closed boundary lines of the vehicles parked along the roadside have similar widths and directions. If a computational model can learn this structural information, the object contours and closed boundary lines can be recovered, and occluded objects can be detected. In the embodiments of the present application, the task of object contour detection can be treated as a semantic segmentation task, in which both the original input image and the output annotation image are image data, so that object contour detection can be implemented on a pixel-level semantic segmentation framework. In particular, the embodiments of the present application propose the method shown in Figure 1. The dense upsampling convolution processing shown in Figure 1 is suitable for object contour detection for the following reasons: 1) dense upsampling is suitable for recovering the shape of an object; 2) dense upsampling can achieve higher accuracy than decoding methods such as bilinear upsampling, which easily lose objects narrower than 8 pixels; 3) the recovered object contour must not be too thick, otherwise the object may be blurred. Dense upsampling can decode a contour of any width, whereas other methods such as bilinear upsampling can only recover contours at least 8 pixels wide. As shown in Figure 13a, the method provided by the embodiments of the present application can accurately detect instance-level object segmentation from the input image. Figure 13b shows the visualization effect of superimposing the extracted closed contours of the objects onto the input image.
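The 8-pixel limitation mentioned above can be illustrated with a small sketch (a simplified illustration only, not the embodiment's implementation; the downsampling factor of 8, the image sizes, and the majority-vote reconstruction are assumptions for clarity): a label map predicted at 1/8 resolution holds one class per 8x8 block, so after upsampling it cannot represent a contour thinner than 8 pixels, whereas dense upsampling predicts one class per output pixel.

```python
import numpy as np

d = 8                      # assumed downsampling factor
H = W = 64                 # assumed full-resolution size

# Ground truth: a 2-pixel-wide vertical contour line.
gt = np.zeros((H, W), dtype=int)
gt[:, 31:33] = 1

# Decoding at 1/8 resolution: each low-resolution cell carries one
# class label (here the majority class of its 8x8 block), which is
# then upsampled back to full resolution.
low = gt.reshape(H // d, d, W // d, d).mean(axis=(1, 3)) > 0.5
coarse = np.repeat(np.repeat(low.astype(int), d, axis=0), d, axis=1)

def line_width(mask):
    # number of contour pixels in the middle row
    return int(mask[H // 2].sum())

print(line_width(gt))      # 2
print(line_width(coarse))  # 0: the 2-pixel line is lost at 1/8 resolution
```

Because each 8x8 block is dominated by background, the thin contour disappears entirely from the coarse decoding; a per-pixel (dense) prediction has no such lower bound on contour width.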
Another group of images, Figures 14 to 18, also shows an example of applying the method provided by the embodiments of the present application. Figure 14 is the original input image; Figure 15 is the feature map obtained after feature extraction from the input image shown in Figure 14; Figure 16 is a schematic diagram of a prior-art object detection technique that marks objects with bounding boxes; Figure 17 is the object instance contour map extracted by applying the object closed-contour detection method provided by the embodiments of the present application; and Figure 18 is a schematic diagram of the visualization effect of superimposing the object contour information of Figure 17 onto the input image shown in Figure 14.
As can be seen from Figure 17, the method provided by the embodiments of the present application can accurately detect the shape of each independent object instance, and adjacent occluded objects are not lost either. Once the shape and contour of each independent object are detected, the object contours can be superimposed onto the input image shown in Figure 14 to form a visual representation and to provide accurate and effective object information to the control system of an autonomous vehicle.
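The superimposition described above can be sketched as follows (a minimal NumPy illustration; the function name, the painting color, and the toy image sizes are assumptions for the example, not details fixed by the embodiment):

```python
import numpy as np

def overlay_contours(image, contour_mask, color=(255, 0, 0)):
    """Draw a binary contour mask onto an RGB image in a fixed color.

    image: (H, W, 3) uint8 array, e.g. the input frame of Figure 14.
    contour_mask: (H, W) bool array, True on extracted contour pixels.
    """
    out = image.copy()
    out[contour_mask] = color  # paint the contour pixels
    return out

# Toy example: 4x4 gray image with a 1-pixel contour on the diagonal.
img = np.full((4, 4, 3), 128, dtype=np.uint8)
mask = np.eye(4, dtype=bool)
vis = overlay_contours(img, mask)
print(vis[0, 0])  # contour pixel painted in the overlay color
print(vis[0, 1])  # background pixel unchanged
```

The copy keeps the original frame intact, so the same input image can also be passed on unmodified to other consumers in the control pipeline.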
Based on the same inventive concept, the embodiments of the present application also provide a detection device for an object closed contour. Figure 19 shows a structural block diagram of the detection device for an object closed contour provided by the embodiments of the present application, comprising:
a dense upsampling convolution module 91, configured to perform, in the decoding process of semantic segmentation processing, dense upsampling convolution processing on the feature map output by the encoding process, to obtain an output image with the same size as the input image, the output image including the closed contours of object instances; and
a contour extraction module 92, configured to identify and extract the closed contour lines of the object instances from the output image according to pixel classes.
The dense upsampling convolution module 91 is specifically configured to: convert the channel number (c) of the feature map into the product of the square of the downsampling factor (d²) used in the encoding process and the number (L) of predetermined object categories; combine the feature maps after the channel-number conversion; and normalize the combined feature map to obtain an output image with the same size as the input image.
The dense upsampling convolution module 91 converting the channel number (c) of the feature map into the product of the square of the downsampling factor (d²) used in the encoding process and the number (L) of predetermined object categories comprises: learning the feature map on each channel according to the ratio relationship between the channel number (c) of the feature map and the product (d²·L) of the square of the downsampling factor and the number of predetermined object categories, to obtain a converted feature map with channel number (d²·L).
The dense upsampling convolution module 91 combining the feature maps after the channel-number conversion comprises: combining the feature maps after the channel-number conversion according to the order in which the feature maps were acquired during feature extraction and according to the channel order.
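The channel conversion and recombination described above can be sketched as follows (a simplified NumPy illustration of the dense-upsampling rearrangement and softmax normalization; the learned convolution that produces the d²·L channels is replaced here by a random feature map, and the channel layout is an assumption for the example):

```python
import numpy as np

def dense_upsample(feat, d, L):
    """Rearrange a (d*d*L, h, w) feature map into an (L, h*d, w*d)
    class-score map, then normalize with a per-pixel softmax.

    feat: feature map after channel conversion, with c = d*d*L
          channels, at 1/d of the input resolution.
    """
    c, h, w = feat.shape
    assert c == d * d * L
    # Each group of d*d channels holds one class's scores for the
    # d*d sub-pixel positions of a low-resolution cell.
    x = feat.reshape(L, d, d, h, w)          # (L, dy, dx, h, w)
    x = x.transpose(0, 3, 1, 4, 2)           # (L, h, dy, w, dx)
    x = x.reshape(L, h * d, w * d)           # full-resolution scores
    e = np.exp(x - x.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)  # softmax over classes

d, L, h, w = 8, 3, 4, 4                      # assumed sizes
feat = np.random.randn(d * d * L, h, w)
out = dense_upsample(feat, d, L)
print(out.shape)  # (3, 32, 32): full input resolution, L class maps
```

Note that no interpolation is performed: every output pixel receives its own learned score vector, which is what allows contours of arbitrary width to be decoded.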
On the basis of the device shown in Figure 19, as shown in Figure 20, the device provided by the embodiments of the present application may further comprise:
a determining module 93, configured to determine the shape, size and/or area of an object instance according to the extracted closed contour of the object instance.
Based on the same inventive concept, on the basis of the device shown in Figure 19, as shown in Figure 21, the device provided by the embodiments of the present application may further comprise:
a hybrid dilated convolution module 90, configured to perform, in the encoding process of semantic segmentation processing, multiple hybrid dilated convolution processing on the extracted feature map, to obtain a feature map with an enlarged receptive field.
The hybrid dilated convolution module 90 is specifically configured to: on each of a plurality of dilated convolutional layers, perform dilated convolution processing on the feature map using a convolution kernel of size K×K with dilation rate r_i, where 1 ≤ i ≤ n and n is the number of convolutional layers.
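A dilated convolution with a K×K kernel and dilation rate r samples the input at strides of r, covering an effective window of (K−1)·r+1 pixels with no additional parameters. A minimal single-channel sketch (pure NumPy, illustrative only; an actual embodiment would use a deep-learning framework's convolution layers):

```python
import numpy as np

def dilated_conv2d(x, kernel, r):
    """Single-channel 2-D dilated convolution with 'valid' padding.

    x: (H, W) input; kernel: (K, K) weights; r: dilation rate.
    The kernel taps are spaced r pixels apart, so the effective
    window is (K - 1) * r + 1 pixels in each dimension.
    """
    K = kernel.shape[0]
    span = (K - 1) * r + 1
    H, W = x.shape
    out = np.zeros((H - span + 1, W - span + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + span:r, j:j + span:r]  # dilated sampling
            out[i, j] = (patch * kernel).sum()
    return out

x = np.arange(36, dtype=float).reshape(6, 6)
k = np.ones((3, 3))                   # K = 3
print(dilated_conv2d(x, k, 1).shape)  # (4, 4): window spans 3 pixels
print(dilated_conv2d(x, k, 2).shape)  # (2, 2): window spans 5 pixels
```

With r = 1 this reduces to an ordinary convolution; increasing r widens the window without increasing the number of weights, which is what enlarges the receptive field.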
In some embodiments, the hybrid dilated convolution module 90 is further configured to divide the plurality of dilated convolutional layers into several groups, the dilation rates of the dilated convolutional layers within each group increasing monotonically.
In some embodiments, the convolution kernel of each dilated convolutional layer may have an arbitrary dilation rate.
In some embodiments, the increment by which the dilation rate increases differs each time.
In some embodiments, the size of the receptive field of the dilated convolution kernel of the last dilated convolutional layer is less than or equal to the size of the feature map.
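The receptive-field constraint above can be checked with a small helper (a sketch based on the standard recursion for stacked stride-1 dilated convolutions, where each K×K layer with rate r adds (K−1)·r pixels; the example rates [1, 2, 5] and the feature-map size are assumed values for illustration, not fixed by the embodiment):

```python
def receptive_field(K, rates):
    """Receptive field of a stack of stride-1 K x K dilated conv
    layers with the given dilation rates, starting from 1 pixel."""
    rf = 1
    for r in rates:
        rf += (K - 1) * r  # each layer widens the field by (K-1)*r
    return rf

# One hybrid group with monotonically increasing rates.
rates = [1, 2, 5]
rf = receptive_field(3, rates)
print(rf)  # 17

# Constraint from the embodiment: the final receptive field should
# not exceed the feature-map size.
feature_map_size = 64
print(rf <= feature_map_size)  # True
```

Such a check makes it easy to verify a candidate rate schedule before training, since an oversized receptive field at the last layer would simply sample outside the feature map.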
As shown in Figure 22, on the basis of the device shown in Figure 21, the device provided by the embodiments of the present application may further comprise:
a first pre-training module 94, configured to train a neural network in advance according to ground-truth data to obtain a semantic segmentation model, the semantic segmentation model including the hybrid dilated convolutional layers of the encoding stage.
In some embodiments, the first pre-training module 94 trains a fully convolutional network end-to-end in advance according to ground-truth data to obtain the semantic segmentation model.
As shown in Figure 23, on the basis of the device shown in Figure 19, the device provided by the embodiments of the present application may further comprise:
a second pre-training module 95, configured to train a neural network in advance according to ground-truth data to obtain a semantic segmentation model, the semantic segmentation model including the dense upsampling convolutional layer of the decoding stage.
In some embodiments, the second pre-training module 95 trains a fully convolutional network end-to-end in advance according to ground-truth data to obtain the semantic segmentation model.
According to the above device provided by the embodiments of the present application, the hybrid dilated convolution module can effectively enlarge the receptive field of the feature map and identify the shapes and contours of large objects, while the dense upsampling convolution module can recover the information of small objects. By combining the two, the object instances in the image data and their contours can be identified and extracted comprehensively, accurately and efficiently.
Based on the same inventive concept, the embodiments of the present application also provide a detection device for an object closed contour. Figure 24 shows the detection device for an object closed contour provided by the embodiments of the present application, including a processor 2401 and at least one memory 2402, the at least one memory 2402 storing at least one machine-executable instruction, the processor 2401 executing the at least one machine-executable instruction to:
in the decoding process of semantic segmentation processing, perform dense upsampling convolution processing on the feature map output by the encoding process, to obtain an output image with the same size as the input image, the output image including the closed contours of object instances; and
identify and extract the closed contours of the object instances from the output image according to pixel classes.
The processor 2401 executing the at least one machine-executable instruction to perform dense upsampling convolution processing on the feature map output by the encoding process, to obtain an output image with the same size as the input image, comprises: converting the channel number (c) of the feature map into the product of the square of the downsampling factor (d²) used in the encoding process and the number (L) of predetermined object categories; combining the feature maps after the channel-number conversion; and normalizing the combined feature map to obtain an output image with the same size as the input image.
The processor 2401 executing the at least one machine-executable instruction to convert the channel number (c) of the feature map into the product of the square of the downsampling factor (d²) used in the encoding process and the number (L) of predetermined object categories comprises: learning the feature map on each channel according to the ratio relationship between the channel number (c) of the feature map and the product (d²·L) of the square of the downsampling factor and the number of predetermined object categories, to obtain a converted feature map with channel number (d²·L).
The processor 2401 executing the at least one machine-executable instruction to combine the feature maps after the channel-number conversion comprises: combining the feature maps after the channel-number conversion according to the order in which the feature maps were acquired during feature extraction and according to the channel order.
In some embodiments, the processor 2401 executing the at least one machine-executable instruction further performs: training a neural network in advance according to ground-truth data to obtain a semantic segmentation model, the semantic segmentation model including the dense upsampling convolutional layer of the decoding stage.
The processor 2401 executing the at least one machine-executable instruction further performs: training a fully convolutional network end-to-end in advance according to ground-truth data to obtain the semantic segmentation model.
In other embodiments, the processor 2401 executing the at least one machine-executable instruction further performs: in the encoding process of semantic segmentation processing, performing multiple hybrid dilated convolution processing on the extracted feature map, to obtain a feature map with an enlarged receptive field.
The processor 2401 executing the at least one machine-executable instruction to perform multiple hybrid dilated convolution processing on the extracted feature map, to obtain a feature map with an enlarged receptive field, comprises: on each of a plurality of dilated convolutional layers, performing dilated convolution processing on the feature map using a convolution kernel of size K×K with dilation rate r_i, where 1 ≤ i ≤ n and n is the number of convolutional layers.
In some embodiments, the processor 2401 executing the at least one machine-executable instruction further performs: dividing the plurality of dilated convolutional layers into several groups, the dilation rates of the dilated convolutional layers within each group increasing monotonically.
In some embodiments, the convolution kernel of each dilated convolutional layer may have an arbitrary dilation rate.
In some embodiments, the increment by which the dilation rate increases differs each time.
In some embodiments, the size of the receptive field of the dilated convolution kernel of the last dilated convolutional layer is less than or equal to the size of the feature map.
In some embodiments, the processor 2401 executing the at least one machine-executable instruction further performs: training a neural network in advance according to ground-truth data to obtain a semantic segmentation model, the semantic segmentation model including the hybrid dilated convolutional layers of the encoding stage.
The processor 2401 executing the at least one machine-executable instruction further performs: training a fully convolutional network end-to-end in advance according to ground-truth data to obtain the semantic segmentation model.
According to the above device provided by the embodiments of the present application, the hybrid dilated convolution processing can effectively enlarge the receptive field of the feature map and identify the shapes and contours of large objects, while the dense upsampling convolution processing can recover the information of small objects. By combining the two, the object instances in the image data and their contours can be identified and extracted comprehensively, accurately and efficiently.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from the spirit and scope of the present invention. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include these modifications and variations.