Detailed Description of the Embodiments
To enable those skilled in the art to better understand the present invention, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings of the embodiments. Apparently, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
In the current prior art, semantic segmentation processing generally comprises two parts: decoding of the feature representation (Decoding of Feature Representation) and dilated convolution (Dilated Convolution).

Decoding of the feature representation yields pixel-level semantic segmentation information, and the output feature map has the same size as the input image. Because max pooling and strided convolution operations in a convolutional neural network inevitably reduce the size of the feature maps of the final network layers, various schemes have been proposed to decode accurate information from these low-resolution feature maps. Common bilinear interpolation saves memory and is fast. Deconvolution methods use the pooling location information recorded during pooling to recover the information needed for image reconstruction and feature visualization. In some examples, a separate deconvolution layer is added in the decoding stage, and prediction results are generated using stacked feature maps of intermediate layers. In other examples, multiple deconvolution layers are used to generate target objects, such as chairs, tables, or automobiles, from multiple features. Some studies use the pooling positions stored during the unpooling step, treating the deconvolution layers as a mirror structure of the convolution layers. Other studies show that, during propagation through the deconvolution layers, coarse-to-fine detection of object structures can be achieved, and these object structures are crucial for reconstructing fine details. Still other studies use a similar mirror structure and combine the information of the deconvolution layers to perform upsampling and obtain the final prediction. Some systems predict label images with high statistical efficiency by using pixel-level classifiers. Among these techniques, bilinear interpolation is the most widely applied. However, bilinear upsampling obtains an output of the same resolution as the input by padding zeros; it easily loses detail, loses small-object information, reduces the precision of the image data, and has no learning capability.
Dilated convolution (also called atrous convolution) was originally developed for wavelet decomposition. Its core idea is to insert zeros between the pixels of the convolution kernel so as to enlarge the receptive field over the image, thereby enabling dense feature extraction in deep neural networks. In semantic segmentation frameworks, dilated convolution is also used to enlarge the effective size of the convolution kernel. Some studies use serialized layers with increasing dilation rates to realize context aggregation, and design an Atrous Spatial Pyramid Pooling (ASPP) structure in which multiple dilated convolution layers are arranged in parallel to capture objects and context information at multiple scales. Recently, dilated convolution has been applied to a wide range of tasks, such as object detection, optical-flow-based visual question answering, and audio generation. However, these convolutional networks suffer from a "gridding effect" caused by standard dilated convolution, which makes it impossible to identify the shape or contour of larger objects.
The problems present in semantic segmentation technology as described above, namely the loss of small-object information and the inability to identify the shape or contour of large objects, make it impossible to effectively and accurately extract the closed contour of an object.
In view of the above problems in the prior art, the embodiments of the present application provide a method and an apparatus for detecting the closed contour of an object, so as to solve these problems. In the technical solutions provided by some embodiments of the present application, dense upsampling convolution processing is performed in the decoding stage on the feature map output by the encoding stage, which increases the resolution of the predicted image and recovers more detail, so that more small-object information is retained and the contour information of small objects can further be detected. In the technical solutions provided by other embodiments, multiple hybrid dilated convolution processes are performed in the encoding stage on the extracted feature map, which retains more local and long-range information of the convolution; the gridding effect is thereby overcome, continuous contour information of large objects can be obtained, and the contour information of large objects can be detected. Thus, the technical solutions provided by the embodiments of the present application can accurately and effectively detect the contour information of objects and solve the above problems in the prior art.
On the other hand, with the methods provided by the embodiments of the present application, the closed contour information of an object can be extracted directly, without labeling the object with a bounding box. In the prior art, when an object is labeled with a bounding box, the actual shape of the object cannot be identified, and information such as the size and area of the object cannot be accurately inferred; yet in technical fields such as autonomous driving, such information is key to many decisions. Moreover, some objects are discarded during bounding-box fusion processing, causing object information to be lost. With the methods provided by the embodiments of the present application, the closed contour of an object can be extracted directly, so that information such as the actual shape, size, and area of the object can be further identified, providing accurate and effective information for other inference or decision-making.
The above is the core idea of the present invention. To enable those skilled in the art to better understand the technical solutions in the embodiments of the present invention, and to make the above objects, features, and advantages of the embodiments of the present invention more apparent and understandable, the technical solutions in the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Fig. 1 shows a processing flowchart of the object closed-contour detection method provided by the embodiments of the present application, comprising:
Step 101: in the decoding process of semantic segmentation, an object closed-contour detection apparatus performs dense upsampling convolution processing on the feature map output by the encoding process to obtain an output image of the same size as the input image, the output image containing the contour lines of object instances;
Step 102: according to pixel classes, the contour lines of object instances are identified and extracted from the output image.
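The two steps above can be sketched in code. The following is a minimal, hypothetical illustration of step 102: given the decoded per-pixel class map, a pixel is treated as lying on a contour line when any of its 4-neighbours has a different class. The function name and the toy 5×5 label map are assumptions for illustration, not from the source.

```python
def extract_contours(labels):
    """Return the set of (row, col) pixels lying on a class boundary."""
    h, w = len(labels), len(labels[0])
    contour = set()
    for r in range(h):
        for c in range(w):
            # A pixel is on a contour if a 4-neighbour has a different class.
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < h and 0 <= nc < w and labels[nr][nc] != labels[r][c]:
                    contour.add((r, c))
                    break
    return contour

# Toy 5x5 class map: a 3x3 object instance (class 1) on background (class 0).
labels = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
contour = extract_contours(labels)
```

Interior pixels of the object (all neighbours share its class) are correctly excluded, so the extracted set traces the closed boundary of the instance.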
In step 101, the dense upsampling convolution processing performed on the feature map, as shown in Fig. 2, comprises:
Step 1011: converting the channel number c of the feature map into the product of the square d² of the downsampling factor d of the encoding process and the number L of predetermined object categories;
For example, let the input image size of the model be (H, W, C), where H is the height of the image data, W is the width, and C is the channel number. The feature map produced by the encoding process and input to the decoding process has size Fout = (h, w, c), where H/d = h, W/d = w, and d is the downsampling factor. In the prior art, bilinear interpolation is used to upsample the feature map. If d = 16, i.e., the output is downsampled 16× relative to the input, and the length or width of an object is less than 16 pixels, such as a distant utility pole, traffic light, traffic sign, or person, the object will not be sampled, and bilinear upsampling is unable to recover this information, so the object is lost in the output image.
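The loss described above can be illustrated with a toy 1-D sketch (assumed, not from the source): with d = 16, naive stride-16 subsampling can miss an object narrower than 16 pixels entirely, and no interpolation of the subsampled signal can bring it back.

```python
d = 16
signal = [0] * 256
for i in range(40, 48):      # an 8-pixel-wide "object", e.g. a distant pole
    signal[i] = 1

# Stride-d subsampling keeps positions 0, 16, 32, ...; none hits the object.
downsampled = signal[::d]
```

Every retained sample is background, so any upsampling of `downsampled` — bilinear or otherwise — reconstructs an image in which the object has vanished.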
In step 1011 provided by the embodiments of the present application, the dense upsampling convolution processing converts the channel number c of the feature map Fout into a channel number of d²·L, where d is the downsampling factor and L is the number of predetermined object categories, yielding a feature map Fout = (h, w, d²·L).

Specifically, according to the ratio d²·L/c between the converted channel number d²·L and the original channel number c of the feature map, the feature map on each channel is learned, yielding learned feature maps of size h×w on d²·L channels, so that each dense upsampling convolution layer learns a prediction for each pixel. The learning of the feature map on each channel is realized by a learning function obtained in advance through neural network training. For example, when the original channel number c is numerically equal to the predetermined number L of object categories, the feature map on each channel may be learned d² times, yielding learned feature maps of size h×w on d²·L channels.
Step 1012: combining the feature maps after channel-number conversion, and normalizing the combined feature map to obtain an output image of the same size as the input image.

That is, the feature map Fout = (h, w, d²·L) after channel-number conversion is combined into a feature map of size (h·d, w·d, L). Since H/d = h and W/d = w as described above, the size of the combined feature map is (H, W, L); in other words, the feature map is upsampled to the same size as the input image.

The combination of the feature maps after channel-number conversion may be performed according to the order in which the features were extracted and the channel order. For example, if, among the d²·L channels, the feature maps on the n-th to m-th channels were extracted from the x-th row of the input image data, then during combination the feature maps on the n-th to m-th channels are combined in pixel order into the x-th row of the output data, and so on for the subsequent feature maps. The combination of the feature maps may be implemented according to the specific algorithm used in the actual application scenario, and the present application does not specifically limit it here.
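One plausible way to implement the channel conversion and combination of steps 1011 and 1012 is the channel-to-space rearrangement sketched below (essentially a pixel shuffle); the function name and shapes are illustrative assumptions, not the claimed implementation.

```python
import numpy as np

def duc_combine(feat, d, L):
    """Rearrange an (h, w, d*d*L) feature map into an (h*d, w*d, L) map."""
    h, w, _ = feat.shape
    feat = feat.reshape(h, w, d, d, L)      # each cell holds a d x d block
    feat = feat.transpose(0, 2, 1, 3, 4)    # interleave rows/cols: (h, d, w, d, L)
    return feat.reshape(h * d, w * d, L)

h, w, d, L = 2, 3, 4, 5
feat = np.arange(h * w * d * d * L, dtype=float).reshape(h, w, d * d * L)
out = duc_combine(feat, d, L)               # an (8, 12, 5) input-resolution map
```

Each low-resolution cell thus expands into a d×d block of per-pixel class scores, which is what allows a learned prediction for every pixel of the full-resolution output.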
The bilinear interpolation method of the prior art has no learning capability, and neither does deconvolution; by contrast, the dense upsampling processing provided by the embodiments of the present application is learnable. Before the processing shown in Fig. 1, a semantic segmentation model may be obtained in advance by training a neural network on real data, the semantic segmentation model containing the dense upsampling convolution layers of the decoding stage. The dense upsampling convolution layers may comprise multiple convolution layers. Specifically, a series of upsampling filters may be learned through training, and through this series of upsampling filters the feature map of size Fout = (h, w, c) is upsampled into image data of size Fout = (h, w, d²·L).

The upsampled output image data is then normalized by a softmax layer to obtain the final output image.
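The softmax normalization mentioned above can be sketched per pixel as follows (a generic softmax, not code from the source); each pixel's L channel scores become class probabilities, and the argmax gives the pixel's class.

```python
import math

def softmax(scores):
    """Normalise one pixel's class scores into probabilities."""
    m = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
label = probs.index(max(probs))              # predicted class of this pixel
```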
Fig. 3 shows an example of processing an input image with the method shown in Fig. 1. The left side of Fig. 3 is the input image, and the remaining parts, from left to right, are output images under different downsampling factors. It can be seen that some small objects in the input image, such as utility poles and signal lights, are identified well.

In a specific implementation, training may be performed on a fully convolutional network to obtain the above semantic segmentation model.
Through the dense upsampling convolution processing described above, a feature map whose size has been heavily compressed can be restored to image data of the same size as the input image data; and by converting the channel number of the feature map according to the downsampling factor and the number of predetermined object categories, a denser set of feature maps is obtained, through which the classes of more pixels can be predicted. Thus, the information of fine parts or small objects in the image data can be recovered, remedying the problem that small-object information lost through downsampling in the encoding process cannot be recovered by bilinear interpolation. Accordingly, the method provided by the embodiments of the present application can extract, from the output image, the instance information and closed-contour information of small objects.
Moreover, dense upsampling convolution processing can go directly from the feature map to the output label image, without first performing bilinear interpolation on the feature map and then upsampling the interpolated image to obtain the output label image, as in the prior art. On the other hand, dense upsampling convolution processing operates directly on the feature map at its original resolution, realizing pixel-level decoding.
Further, as shown in Fig. 4, on the basis of the processing shown in Fig. 1, the method provided by the embodiments of the present application further comprises:
Step 103: determining the shape, size, and/or area of an object instance according to the extracted closed contour of the object instance.
With the method provided by the embodiments of the present application, the closed contour information of an object can be extracted directly, without labeling the object with a bounding box. In the prior art, when an object is labeled with a bounding box, the actual shape of the object cannot be identified, and information such as its size and area cannot be accurately inferred; yet in technical fields such as autonomous driving, such information is key to many decisions. With the method provided by the embodiments of the present application, the closed contour of an object can be extracted directly, so that information such as the actual shape, size, and area of the object can be further identified, providing accurate and effective information for other inference or decision-making.
Based on the same inventive concept, on the basis of the method of the embodiment of the present application shown in Fig. 1, a further detection method for object closed contours is provided.

Fig. 5 shows the object closed-contour detection method provided by the embodiments of the present application, comprising:
Step 100: in the encoding process of semantic segmentation, an object closed-contour detection apparatus performs multiple hybrid dilated convolution processes on the extracted feature map to obtain a feature map with an enlarged receptive field;
Step 101: in the decoding process, dense upsampling convolution processing is performed on the feature map output by the encoding process to obtain an output image of the same size as the input image, the output image containing the closed contours of object instances;
Step 102: according to pixel classes, the closed contours of object instances are identified and extracted from the output image.
The processing of step 100 consists of performing convolution processing on the feature map on multiple convolution layers using a series of dilation rates.

In the prior art, dilated convolution processing usually convolves the feature map with a dilated convolution kernel so as to enlarge the receptive field of the feature map; the dilated kernel is constructed by inserting zeros between the pixels of the convolution kernel. For a two-dimensional signal with kernel size K×K, the dilated kernel has size Kd×Kd, where Kd = K + (K−1)(r−1) and r is the dilation rate. Dilated convolution can enlarge the receptive field (also called the field of view) of the feature map and can replace the pooling layers in fully convolutional network architectures. For example, a convolution layer in ResNet-101 has stride s = 2; the stride can be reset to 1 to remove the downsampling operation, and the dilation rate of the subsequent network layers can be set to 2. If this processing is applied in turn to all network layers that perform downsampling, the output feature map has an enlarged receptive field. In practical applications, dilated convolution processing is typically applied to downsampled feature maps to reach a reasonable trade-off between efficiency and cost. However, dilated convolution processing causes a gridding effect.
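The effective kernel size Kd = K + (K−1)(r−1) given above can be checked numerically; the helper below is an illustrative sketch, with assumed names.

```python
def dilated_kernel_size(k, r):
    """Effective footprint K_d of a k x k kernel with dilation rate r."""
    return k + (k - 1) * (r - 1)

sizes = [dilated_kernel_size(3, r) for r in (1, 2, 4)]   # a 3x3 kernel
```

A 3×3 kernel thus covers a 3×3, 5×5, or 9×9 footprint at rates 1, 2, and 4 respectively, while still reading only 9 values.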
In the embodiments of the present application, step 100 may be realized as follows: on each of multiple dilated convolution layers, dilated convolution processing is performed on the feature map using a K×K kernel with dilation rate ri, where 1 ≤ i ≤ n and n is the number of convolution layers. The processing may be implemented in at least one of the following four ways:
Way one: the multiple dilated convolution layers are divided into several groups, and the dilation rates of the dilated convolution layers within each group increase monotonically.

For example, when there are N dilated convolution layers, the N layers may be divided into s groups, each group containing at least two convolution layers; the kernels used in each group are of size K×K, and within each group monotonically increasing dilation rates are used, i.e., r(si−2) < r(si−1) < r(si). In this way, the dilation rate within each group keeps increasing, and from the first to the last dilated convolution layer across the groups the dilation rate varies like a sawtooth wave; kernels with smaller dilation rates extract local information, and kernels with larger dilation rates extract long-range information.
Way two: the kernels of the dilated convolution layers have arbitrary dilation rates.

Setting an arbitrary dilation rate for a kernel can enlarge its receptive field, making it possible to identify larger objects.
Way three: on the basis of way one or way two, the increment of the dilation rate differs from step to step.

For example, with dilation rates r = (1, 2, 5), the successive increments of the dilation rate are 1 and 3; that is, the increment differs each time.

Setting multiple dilation rates with differing increments allows a group of dilated kernels to cover more pixels. By contrast, if dilation rates with identical increments are used, e.g., r = (2, 4, 6, 8), where the increment is always 2, the gridding effect is overcome less effectively.
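The coverage argument for ways one and three can be sketched in 1-D (an illustrative model, not from the source): stacking 3-tap dilated convolutions and collecting the input offsets that can reach one output pixel shows that sawtooth-style rates such as (1, 2, 3) cover every position, while a repeated rate such as (2, 2, 2) only ever touches even offsets — the gridding effect.

```python
def covered_offsets(rates, k=3):
    """1-D input offsets that contribute to one output pixel after stacking
    dilated k-tap convolutions with the given dilation rates."""
    offsets = {0}
    for r in rates:
        taps = [(t - (k - 1) // 2) * r for t in range(k)]   # e.g. (-r, 0, r)
        offsets = {o + t for o in offsets for t in taps}
    return offsets

sawtooth = covered_offsets([1, 2, 3])   # increasing, sawtooth-style rates
uniform = covered_offsets([2, 2, 2])    # the same rate on every layer
```

The sawtooth stack reaches every offset in its receptive field, whereas the uniform stack permanently skips all odd positions, no matter how many layers are added.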
Way four: on the basis of any of the above three ways, the size of the receptive field of the dilated convolution kernel of the last dilated convolution layer is less than or equal to the size of the feature map.

That is, the dilation rates are preset such that the receptive field of the dilated kernel of the last dilated convolution layer is not larger than the feature map, so that the receptive field of the last convolution layer can be enlarged. In particular, when the size of the receptive field of the dilated kernel equals the size of the feature map, the receptive field covers the entire feature map, so that no holes or edge information are lost, guaranteeing the consistency and completeness of the long-range information.
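The constraint of way four can be checked with the usual receptive-field recurrence for stacked stride-1 dilated layers, RF = 1 + Σ (K−1)·ri (a standard formula, used here as an illustrative assumption):

```python
def stacked_receptive_field(k, rates):
    """Receptive-field size of stacked stride-1 dilated k x k conv layers."""
    rf = 1
    for r in rates:
        rf += (k - 1) * r
    return rf

rf = stacked_receptive_field(3, [1, 2, 5])   # 17: must not exceed the feature map
```

For 3×3 kernels with rates (1, 2, 5) the top layer sees a 17-pixel extent, so under way four the feature map should be at least 17 pixels wide and high.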
A comparison of way one above with the prior art is given below.

Fig. 6 shows examples of dilated convolution kernels. In Fig. 6, the surrounding gray pixels are those contributing to the computation of the central black pixel. Fig. 6a is a schematic diagram of a dilated convolution kernel in the prior art; Fig. 6b is a schematic diagram of the hybrid dilated convolution kernels provided by the embodiments of the present application.
In Fig. 6(a), the kernel size is 3×3, and the dilation rate is r = 2 throughout, from left to right. For a pixel p in a dilated convolution layer, the pixels contributing to it lie in a Kd×Kd neighborhood of the previous layer centered at p. Because dilated convolution introduces zero values, only K×K pixels in the Kd×Kd region are actually computed, with a spacing of r−1 between non-zero pixels. For example, in the dilated convolution with K = 3 and r = 2, as shown on the left of Fig. 6a, only 9 of the 25 pixels contribute. Since all layers have the same dilation rate r, for a point p in the topmost dilated convolution layer, the maximum possible number of pixels contributing to the computation of p is (w′·h′)/r², where w′ and h′ are respectively the width and height of the feature map of the bottom dilated convolution layer. As a result, p can only see the information in the top-layer feature map in a checkerboard fashion, which causes a large amount of information to be lost (when r = 2, about 75% of the information is lost). When r in the higher convolution layers becomes larger and larger, the data sampled from the input becomes sparser and sparser, which is detrimental to convolution learning, because: 1) local information is completely lost; 2) the information is too far apart to be correlated. Moreover, the results within an r×r region come from entirely different sets of "grids", which damages the consistency of the local information.
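The "9 of 25 pixels" figure above follows directly from the kernel geometry; the sketch below (illustrative, not from the source) counts the non-zero taps against the Kd×Kd footprint.

```python
def contributing_pixels(k, r):
    """(non-zero taps, footprint size) of a dilated k x k kernel, rate r."""
    kd = k + (k - 1) * (r - 1)      # effective footprint edge length
    return k * k, kd * kd

nonzero, footprint = contributing_pixels(3, 2)   # 9 taps in a 5x5 footprint
```

With K = 3 and r = 2 only 9 of the 25 positions in the footprint are read, consistent with the roughly 75% information loss noted in the text.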
In the kernels shown in Fig. 6(b), way one described above is adopted: several convolution layers are divided into a group, and the dilation rate within each group increases monotonically, e.g., K = 3, r = (1, 2, 3), so that the variation of the dilation rate resembles a sawtooth wave. In this way, local information can be obtained at the bottom layers on the left, and information from a broader region can be obtained at the top layers on the right. The combination of different dilation rates accommodates the segmentation requirements of both small and large objects: smaller dilation rates extract local information, and larger dilation rates extract long-range information.
The above embodiments of hybrid dilated convolution, by setting a series of kernel dilation rates, enable the dilated kernels to cover as many pixels as possible during convolution, extracting both local and long-range information. Moreover, the larger the range covered by the receptive field of the dilated kernel, the fewer holes and the less edge information are lost; the consistency and completeness of the long-range information is guaranteed, and the gridding effect is effectively overcome, so that the complete, closed shapes and contours of large objects can be obtained.
On the other hand, since hybrid dilated convolution processing is learnable, before the processing shown in Fig. 5, a semantic segmentation model may be obtained in advance by training a neural network on real data, the semantic segmentation model containing the hybrid dilated convolution layers of the encoding stage.

In a specific implementation, training may be performed on a fully convolutional network to obtain the above semantic segmentation model.
Fig. 7 shows, for a specific application, the network architecture of a semantic segmentation model trained on the ResNet-101 architecture to implement the method shown in Fig. 5. In the encoding stage in Fig. 7, multiple hybrid dilated convolution layers perform hybrid dilated convolution processing on the extracted feature map; in the decoding stage, multiple dense upsampling convolution layers process the feature map output by the encoding stage to obtain the output label image.
The method shown in Fig. 5 is a combination of hybrid dilated convolution processing in the encoding process and dense upsampling convolution processing in the decoding process. Hybrid dilated convolution processing can effectively enlarge the receptive field of the feature map and identify the shapes and contours of large objects; dense upsampling convolution processing can recover the information of small objects. The combination of the two is conducive to comprehensively, accurately, and efficiently identifying and extracting the object instances and their contours from the image data.

Further, similarly to Fig. 4, the method shown in Fig. 5 may also comprise step 103, which is not described again here.
Fig. 8 and Fig. 9 respectively show comparisons of the output images of the methods shown in Fig. 1 and Fig. 5. In Fig. 8, from left to right are the input image, the ground truth, the output image of the method shown in Fig. 1, and the output image of the method shown in Fig. 5. It can be seen in Fig. 8 that, compared with the output image of the method shown in Fig. 1, the output image of the method shown in Fig. 5 is closer to the ground truth in the identification of small objects. In Fig. 9, the first row is the ground truth, the second row is the output of the method shown in Fig. 1, and the third row is the output image of the method shown in Fig. 5. It can be seen in Fig. 9 that, compared with the output image of the method shown in Fig. 1, the output image of the method shown in Fig. 5 overcomes the gridding effect more effectively in the identification of large-object contours and is closer to the ground truth.
Another group of image data, Figs. 10 to 13b, also shows examples of applying the method provided by the embodiments of the present application. Fig. 10 is the original input image; Fig. 11 is the feature map obtained after feature extraction from the input image shown in Fig. 10; Fig. 12 is a schematic diagram of labeling objects with bounding boxes using prior-art object detection technology; Fig. 13a is the object-instance contour map obtained after applying the object closed-contour detection method provided by the embodiments of the present application.

In Fig. 11, objects of multiple classes are marked with different colors. However, in Fig. 11 the instance-level information of individual objects is lost: for example, all automobiles are marked with the same color, namely blue, and are labeled with the "automobile" class. Yet identifying every object instance in a traffic environment — each automobile, bus, pedestrian, and bicycle — is crucial for a safe and effective autonomous driving system. A failure to detect a single object instance may cause the motion planning module of an autonomous vehicle to malfunction or misclassify, leading to a series of accidents. Semantic segmentation frameworks provide pixel-level object labels, but instance-level objects cannot be identified from semantic segmentation technology alone.
Fig. 12 is a schematic diagram of labeling objects with bounding boxes using a traditional object detection framework. Although a traditional object detection framework can label objects with bounding boxes, it cannot recover the shapes of objects or handle the problem of detecting their closed contours. In particular, due to the limitations of bounding-box fusion processing in traditional object detection frameworks, in order to reduce the false-positive rate, bounding boxes that are close together and label different object instances may be fused together, so that the closed contours of the objects or object instances cannot be detected, especially when the occluded object is large. As shown in Fig. 12, a traditional object detection framework uses rectangular bounding boxes to recover the shapes or contours of different objects or object instances; consequently, in the process of fusing the bounding boxes of an object and its neighboring objects, an occluded object or occluded object instance may be lost during detection.
Figure 13a shows an output image produced by the object closed-contour detection method provided by the embodiments of the present application. The method provided by the embodiments of the present application is based on one assumption, namely that objects of a particular category have a similar global shape, so that the contours and boundary lines detected for objects of the same category have a consistent planar structure. As shown in Figure 13a, the closed boundary lines of the vehicles parked along the roadside have similar widths and directions. If a computational model can learn this structural information, the object contours and closed boundary lines can be recovered, and occluded objects can be detected. In the embodiments of the present application, the task of object contour detection can be treated as a semantic segmentation task, in which both the original input image and the output annotation image are image data, so that object contour detection can be implemented on a pixel-level semantic segmentation framework. In particular, the embodiments of the present application propose the method shown in Figure 1. The dense upsampling convolution processing shown in Figure 1 is suitable for object contour detection for the following reasons: 1) dense upsampling is suitable for recovering the shape of an object; 2) dense upsampling can achieve higher accuracy than decoding methods such as bilinear upsampling, which easily lose objects narrower than 8 pixels; 3) the recovered object contour must not be too thick, otherwise the object may be blurred. Dense upsampling can decode a contour of any width, whereas other methods such as bilinear upsampling can only recover contours at least 8 pixels wide. As shown in Figure 13a, the method provided by the embodiments of the present application can accurately detect instance-level object segmentation from the input image. Figure 13b shows the visualization effect of superimposing the extracted closed contours of the objects onto the input image.
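The 8-pixel limitation mentioned above can be illustrated with a small sketch (a simplified illustration only, not the embodiment's implementation; the downsampling factor of 8, the image sizes, and the majority-vote reconstruction are assumptions for clarity): a label map predicted at 1/8 resolution holds one class per 8x8 block, so after upsampling it cannot represent a contour thinner than 8 pixels, whereas dense upsampling predicts one class per output pixel.

```python
import numpy as np

d = 8                      # assumed downsampling factor
H = W = 64                 # assumed full-resolution size

# Ground truth: a 2-pixel-wide vertical contour line.
gt = np.zeros((H, W), dtype=int)
gt[:, 31:33] = 1

# Decoding at 1/8 resolution: each low-resolution cell carries one
# class label (here the majority class of its 8x8 block), which is
# then upsampled back to full resolution.
low = gt.reshape(H // d, d, W // d, d).mean(axis=(1, 3)) > 0.5
coarse = np.repeat(np.repeat(low.astype(int), d, axis=0), d, axis=1)

def line_width(mask):
    # number of contour pixels in the middle row
    return int(mask[H // 2].sum())

print(line_width(gt))      # 2
print(line_width(coarse))  # 0: the 2-pixel line is lost at 1/8 resolution
```

Because each 8x8 block is dominated by background, the thin contour disappears entirely from the coarse decoding; a per-pixel (dense) prediction has no such lower bound on contour width.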
Another group of images, Figures 14 to 18, also shows an example of applying the method provided by the embodiments of the present application. Figure 14 is the original input image; Figure 15 is the feature map obtained after feature extraction from the input image shown in Figure 14; Figure 16 is a schematic diagram of a prior-art object detection technique that marks objects with bounding boxes; Figure 17 is the object instance contour map extracted by applying the object closed-contour detection method provided by the embodiments of the present application; and Figure 18 is a schematic diagram of the visualization effect of superimposing the object contour information of Figure 17 onto the input image shown in Figure 14.
As can be seen from Figure 17, the method provided by the embodiments of the present application can accurately detect the shape of each independent object instance, and adjacent occluded objects are not lost either. Once the shape and contour of each independent object are detected, the object contours can be superimposed onto the input image shown in Figure 14 to form a visual representation and to provide accurate and effective object information to the control system of an autonomous vehicle.
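The superimposition described above can be sketched as follows (a minimal NumPy illustration; the function name, the painting color, and the toy image sizes are assumptions for the example, not details fixed by the embodiment):

```python
import numpy as np

def overlay_contours(image, contour_mask, color=(255, 0, 0)):
    """Draw a binary contour mask onto an RGB image in a fixed color.

    image: (H, W, 3) uint8 array, e.g. the input frame of Figure 14.
    contour_mask: (H, W) bool array, True on extracted contour pixels.
    """
    out = image.copy()
    out[contour_mask] = color  # paint the contour pixels
    return out

# Toy example: 4x4 gray image with a 1-pixel contour on the diagonal.
img = np.full((4, 4, 3), 128, dtype=np.uint8)
mask = np.eye(4, dtype=bool)
vis = overlay_contours(img, mask)
print(vis[0, 0])  # contour pixel painted in the overlay color
print(vis[0, 1])  # background pixel unchanged
```

The copy keeps the original frame intact, so the same input image can also be passed on unmodified to other consumers in the control pipeline.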
Based on the same inventive concept, the embodiments of the present application also provide a detection device for an object closed contour. Figure 19 shows a structural block diagram of the detection device for an object closed contour provided by the embodiments of the present application, comprising:
a dense upsampling convolution module 91, configured to perform, in the decoding process of semantic segmentation processing, dense upsampling convolution processing on the feature map output by the encoding process, to obtain an output image with the same size as the input image, the output image including the closed contours of object instances; and
a contour extraction module 92, configured to identify and extract the closed contour lines of the object instances from the output image according to pixel classes.
The dense upsampling convolution module 91 is specifically configured to: convert the channel number (c) of the feature map into the product of the square of the downsampling factor (d²) used in the encoding process and the number (L) of predetermined object categories; combine the feature maps after the channel-number conversion; and normalize the combined feature map to obtain an output image with the same size as the input image.
The dense upsampling convolution module 91 converting the channel number (c) of the feature map into the product of the square of the downsampling factor (d²) used in the encoding process and the number (L) of predetermined object categories comprises: learning the feature map on each channel according to the ratio relationship between the channel number (c) of the feature map and the product (d²·L) of the square of the downsampling factor and the number of predetermined object categories, to obtain a converted feature map with channel number (d²·L).
The dense upsampling convolution module 91 combining the feature maps after the channel-number conversion comprises: combining the feature maps after the channel-number conversion according to the order in which the feature maps were acquired during feature extraction and according to the channel order.
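The channel conversion and recombination described above can be sketched as follows (a simplified NumPy illustration of the dense-upsampling rearrangement and softmax normalization; the learned convolution that produces the d²·L channels is replaced here by a random feature map, and the channel layout is an assumption for the example):

```python
import numpy as np

def dense_upsample(feat, d, L):
    """Rearrange a (d*d*L, h, w) feature map into an (L, h*d, w*d)
    class-score map, then normalize with a per-pixel softmax.

    feat: feature map after channel conversion, with c = d*d*L
          channels, at 1/d of the input resolution.
    """
    c, h, w = feat.shape
    assert c == d * d * L
    # Each group of d*d channels holds one class's scores for the
    # d*d sub-pixel positions of a low-resolution cell.
    x = feat.reshape(L, d, d, h, w)          # (L, dy, dx, h, w)
    x = x.transpose(0, 3, 1, 4, 2)           # (L, h, dy, w, dx)
    x = x.reshape(L, h * d, w * d)           # full-resolution scores
    e = np.exp(x - x.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)  # softmax over classes

d, L, h, w = 8, 3, 4, 4                      # assumed sizes
feat = np.random.randn(d * d * L, h, w)
out = dense_upsample(feat, d, L)
print(out.shape)  # (3, 32, 32): full input resolution, L class maps
```

Note that no interpolation is performed: every output pixel receives its own learned score vector, which is what allows contours of arbitrary width to be decoded.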
On the basis of the device shown in Figure 19, as shown in Figure 20, the device provided by the embodiments of the present application may further comprise:
a determining module 93, configured to determine the shape, size and/or area of an object instance according to the extracted closed contour of the object instance.
Based on the same inventive concept, on the basis of the device shown in Figure 19, as shown in Figure 21, the device provided by the embodiments of the present application may further comprise:
a hybrid dilated convolution module 90, configured to perform, in the encoding process of semantic segmentation processing, multiple hybrid dilated convolution processing on the extracted feature map, to obtain a feature map with an enlarged receptive field.
The hybrid dilated convolution module 90 is specifically configured to: on each of a plurality of dilated convolutional layers, perform dilated convolution processing on the feature map using a convolution kernel of size K×K with dilation rate r_i, where 1 ≤ i ≤ n and n is the number of convolutional layers.
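A dilated convolution with a K×K kernel and dilation rate r samples the input at strides of r, covering an effective window of (K−1)·r+1 pixels with no additional parameters. A minimal single-channel sketch (pure NumPy, illustrative only; an actual embodiment would use a deep-learning framework's convolution layers):

```python
import numpy as np

def dilated_conv2d(x, kernel, r):
    """Single-channel 2-D dilated convolution with 'valid' padding.

    x: (H, W) input; kernel: (K, K) weights; r: dilation rate.
    The kernel taps are spaced r pixels apart, so the effective
    window is (K - 1) * r + 1 pixels in each dimension.
    """
    K = kernel.shape[0]
    span = (K - 1) * r + 1
    H, W = x.shape
    out = np.zeros((H - span + 1, W - span + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + span:r, j:j + span:r]  # dilated sampling
            out[i, j] = (patch * kernel).sum()
    return out

x = np.arange(36, dtype=float).reshape(6, 6)
k = np.ones((3, 3))                   # K = 3
print(dilated_conv2d(x, k, 1).shape)  # (4, 4): window spans 3 pixels
print(dilated_conv2d(x, k, 2).shape)  # (2, 2): window spans 5 pixels
```

With r = 1 this reduces to an ordinary convolution; increasing r widens the window without increasing the number of weights, which is what enlarges the receptive field.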
In some embodiments, the hybrid dilated convolution module 90 is further configured to divide the plurality of dilated convolutional layers into several groups, the dilation rates of the dilated convolutional layers within each group increasing monotonically.
In some embodiments, the convolution kernel of each dilated convolutional layer may have an arbitrary dilation rate.
In some embodiments, the increment by which the dilation rate increases differs each time.
In some embodiments, the size of the receptive field of the dilated convolution kernel of the last dilated convolutional layer is less than or equal to the size of the feature map.
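The receptive-field constraint above can be checked with a small helper (a sketch based on the standard recursion for stacked stride-1 dilated convolutions, where each K×K layer with rate r adds (K−1)·r pixels; the example rates [1, 2, 5] and the feature-map size are assumed values for illustration, not fixed by the embodiment):

```python
def receptive_field(K, rates):
    """Receptive field of a stack of stride-1 K x K dilated conv
    layers with the given dilation rates, starting from 1 pixel."""
    rf = 1
    for r in rates:
        rf += (K - 1) * r  # each layer widens the field by (K-1)*r
    return rf

# One hybrid group with monotonically increasing rates.
rates = [1, 2, 5]
rf = receptive_field(3, rates)
print(rf)  # 17

# Constraint from the embodiment: the final receptive field should
# not exceed the feature-map size.
feature_map_size = 64
print(rf <= feature_map_size)  # True
```

Such a check makes it easy to verify a candidate rate schedule before training, since an oversized receptive field at the last layer would simply sample outside the feature map.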
As shown in Figure 22, on the basis of the device shown in Figure 21, the device provided by the embodiments of the present application may further comprise:
a first pre-training module 94, configured to train a neural network in advance according to ground-truth data to obtain a semantic segmentation model, the semantic segmentation model including the hybrid dilated convolutional layers of the encoding stage.
In some embodiments, the first pre-training module 94 trains a fully convolutional network end-to-end in advance according to ground-truth data to obtain the semantic segmentation model.
As shown in Figure 23, on the basis of the device shown in Figure 19, the device provided by the embodiments of the present application may further comprise:
a second pre-training module 95, configured to train a neural network in advance according to ground-truth data to obtain a semantic segmentation model, the semantic segmentation model including the dense upsampling convolutional layer of the decoding stage.
In some embodiments, the second pre-training module 95 trains a fully convolutional network end-to-end in advance according to ground-truth data to obtain the semantic segmentation model.
According to the above device provided by the embodiments of the present application, the hybrid dilated convolution module can effectively enlarge the receptive field of the feature map and identify the shapes and contours of large objects, while the dense upsampling convolution module can recover the information of small objects. By combining the two, the object instances in the image data and their contours can be identified and extracted comprehensively, accurately and efficiently.
Based on the same inventive concept, the embodiments of the present application also provide a detection device for an object closed contour. Figure 24 shows the detection device for an object closed contour provided by the embodiments of the present application, including a processor 2401 and at least one memory 2402, the at least one memory 2402 storing at least one machine-executable instruction, the processor 2401 executing the at least one machine-executable instruction to:
in the decoding process of semantic segmentation processing, perform dense upsampling convolution processing on the feature map output by the encoding process, to obtain an output image with the same size as the input image, the output image including the closed contours of object instances; and
identify and extract the closed contours of the object instances from the output image according to pixel classes.
The processor 2401 executing the at least one machine-executable instruction to perform dense upsampling convolution processing on the feature map output by the encoding process, to obtain an output image with the same size as the input image, comprises: converting the channel number (c) of the feature map into the product of the square of the downsampling factor (d²) used in the encoding process and the number (L) of predetermined object categories; combining the feature maps after the channel-number conversion; and normalizing the combined feature map to obtain an output image with the same size as the input image.
The processor 2401 executing the at least one machine-executable instruction to convert the channel number (c) of the feature map into the product of the square of the downsampling factor (d²) used in the encoding process and the number (L) of predetermined object categories comprises: learning the feature map on each channel according to the ratio relationship between the channel number (c) of the feature map and the product (d²·L) of the square of the downsampling factor and the number of predetermined object categories, to obtain a converted feature map with channel number (d²·L).
The processor 2401 executing the at least one machine-executable instruction to combine the feature maps after the channel-number conversion comprises: combining the feature maps after the channel-number conversion according to the order in which the feature maps were acquired during feature extraction and according to the channel order.
In some embodiments, the processor 2401 executing the at least one machine-executable instruction further performs: training a neural network in advance according to ground-truth data to obtain a semantic segmentation model, the semantic segmentation model including the dense upsampling convolutional layer of the decoding stage.
The processor 2401 executing the at least one machine-executable instruction further performs: training a fully convolutional network end-to-end in advance according to ground-truth data to obtain the semantic segmentation model.
In other embodiments, the processor 2401 executing the at least one machine-executable instruction further performs: in the encoding process of semantic segmentation processing, performing multiple hybrid dilated convolution processing on the extracted feature map, to obtain a feature map with an enlarged receptive field.
The processor 2401 executing the at least one machine-executable instruction to perform multiple hybrid dilated convolution processing on the extracted feature map, to obtain a feature map with an enlarged receptive field, comprises: on each of a plurality of dilated convolutional layers, performing dilated convolution processing on the feature map using a convolution kernel of size K×K with dilation rate r_i, where 1 ≤ i ≤ n and n is the number of convolutional layers.
In some embodiments, the processor 2401 executing the at least one machine-executable instruction further performs: dividing the plurality of dilated convolutional layers into several groups, the dilation rates of the dilated convolutional layers within each group increasing monotonically.
In some embodiments, the convolution kernel of each dilated convolutional layer may have an arbitrary dilation rate.
In some embodiments, the increment by which the dilation rate increases differs each time.
In some embodiments, the size of the receptive field of the dilated convolution kernel of the last dilated convolutional layer is less than or equal to the size of the feature map.
In some embodiments, the processor 2401 executing the at least one machine-executable instruction further performs: training a neural network in advance according to ground-truth data to obtain a semantic segmentation model, the semantic segmentation model including the hybrid dilated convolutional layers of the encoding stage.
The processor 2401 executing the at least one machine-executable instruction further performs: training a fully convolutional network end-to-end in advance according to ground-truth data to obtain the semantic segmentation model.
According to the above device provided by the embodiments of the present application, the hybrid dilated convolution processing can effectively enlarge the receptive field of the feature map and identify the shapes and contours of large objects, while the dense upsampling convolution processing can recover the information of small objects. By combining the two, the object instances in the image data and their contours can be identified and extracted comprehensively, accurately and efficiently.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from the spirit and scope of the present invention. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include these modifications and variations.