CN109190662A

CN109190662A - Three-dimensional vehicle detection method, system, terminal and storage medium based on key point regression

Info

Publication number: CN109190662A
Application number: CN201810834980.3A
Authority: CN
Inventors: 吴子章; 王凡; 唐锐; 李坤仑; 丁丽珠
Original assignee: Beijing Anchi Zongmu Intelligent Technology Co Ltd
Current assignee: Zongmu Technology Shanghai Co Ltd
Priority date: 2018-07-26
Filing date: 2018-07-26
Publication date: 2019-01-11

Abstract

The present invention provides a three-dimensional vehicle detection method, system, terminal and storage medium based on key point regression. The method comprises: S01: presetting key point labels of a tracking target, determining the detection area of the tracking target, obtaining the relative position information of the key point labels within the detection area, and recording it as the first position information of the key points; S02: extracting the feature map of the target detection area, obtaining the relative position information of the key points of the tracking target in the feature map, and recording it as the second position information of the key points; S03: taking the first position information and the second position information of the key points as input, obtaining a loss function to optimize the network structure. The invention performs multi-scale feature fusion on the corresponding feature maps and then carries out loss function regression in two stages, which reduces the difficulty of the regression and improves the performance of the network structure.

Description

Three-dimensional vehicle detection method, system, terminal and storage medium based on key point regression

Technical field

The present invention relates to the technical field of automotive electronics, and in particular to a three-dimensional vehicle detection method, system, terminal and storage medium based on key point regression.

Background art

An advanced driving assistance system (ADAS), also known as an active safety system, mainly acquires and processes image and radar data to obtain information such as the distance, position and shape of target objects. During target tracking, the same target object, or objects of the same type, can appear very different from one image to another because of the target's own state and the environment of the scene. Under different times, resolutions, illumination and poses, the resulting images are difficult to match with one another. Key points are local extremum points with orientation information detected in an image across different scale spaces. While an autonomous vehicle is driving, its camera captures objects on and around the road. For objects such as vehicles, pedestrians, road signs and light poles, the corresponding key points can be regressed with a key point detection algorithm, and the key point information can assist the autonomous vehicle in positioning.

Summary of the invention

To solve the above and other potential technical problems, the present invention provides a three-dimensional vehicle detection method, system, terminal and storage medium based on key point regression. First, multi-scale feature fusion is performed on the standard feature map corresponding to the tracking target. Second, using the standard feature map improves the image precision without affecting running time. Third, loss function regression is carried out in two stages: the standard feature map is first downsampled to obtain a downsampling layer feature map. In stage one, learning takes place on the downsampling layer feature map; once learning is sufficient, the key point positions in the downsampling layer feature map are mapped into the standard feature map. In stage two, learning takes place on the standard feature map obtained by the mapping, and a mask is used so that only the mapped positions where the key points lie are learned, which reduces the difficulty of the regression.

A three-dimensional vehicle detection method based on key point regression, comprising the following steps:

S01: presetting the key point labels of the tracking target, determining the detection area of the tracking target, obtaining the relative position information of the key point labels of the preset tracking target within the detection area, and recording it as the first position information of the key points;

S02: extracting the feature map of the target detection area, obtaining the relative position information of the key points of the tracking target in the feature map, and recording it as the second position information of the key points;

S03: taking the first position information and the second position information of the key points as input, obtaining a loss function to optimize the network structure.

Further, step S02 also includes, after the feature map of the target detection area is extracted, a step S021 of fusing the features of the individual feature maps.

Further, the feature fusion in step S021 is limited to the feature maps of the middle and lower layers. That is, among the convolutional layers of the neural network, only the feature maps of the middle- and low-level convolutional layers are fused.

Further, the feature fusion in step S021 is dense feature fusion. That is, when convolutional layers of the neural network are selected for fusion, as many convolutional layers as possible are chosen.

Further, after the detection area of the tracking target is determined in step S01 and before the feature map of the target detection area is extracted in step S02, the method includes a scale space conversion step S01a: the feature maps of the individual target detection areas are converted into standard feature maps of the same scale, and key point detection is then performed to obtain the relative positional relationship of the key points in the feature map.

Further, the standard feature map is a 56*56 feature map. The standard feature map is slightly larger than the candidate box of the tracking target, so that key points exposed at the edge of the tracking target do not fall outside the standard feature map.
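The text does not say how much larger than the candidate box the standard feature map should be. The following is a minimal sketch, assuming a symmetric relative margin; the function name `expand_box` and the `margin` parameter are hypothetical and not taken from the patent.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (X, Y, W, H): top-left corner, width, height


def expand_box(box: Box, margin: float = 0.125,
               image_size: Optional[Tuple[float, float]] = None) -> Box:
    """Enlarge a candidate box by `margin` * size on every side (assumed policy).

    If `image_size` = (width, height) is given, the result is clipped to the image.
    """
    x, y, w, h = box
    dx, dy = w * margin, h * margin
    x1, y1 = x - dx, y - dy
    x2, y2 = x + w + dx, y + h + dy
    if image_size is not None:
        img_w, img_h = image_size
        x1, y1 = max(0.0, x1), max(0.0, y1)
        x2, y2 = min(float(img_w), x2), min(float(img_h), y2)
    return (x1, y1, x2 - x1, y2 - y1)


enlarged = expand_box((100.0, 60.0, 160.0, 160.0))
```

A key point lying exactly on the border of the original candidate box then sits strictly inside the enlarged region before it is resampled to 56*56.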

Further, before the relative position information of the key points of the tracking target in the feature map is obtained, the method also includes a step S01b of downsampling the standard feature map: a downsampling layer feature map is obtained and used as input to train a downsampling local network; the downsampling layer feature map is then fed into the downsampling local network, which outputs key point position information that is mapped back into the standard feature map.

Further, the obtained 56*56 standard feature map is downsampled to a 7*7 downsampling layer feature map. Learning takes place on the 7*7 downsampling layer feature map; once learning is sufficient, the resulting key point positions are mapped into the 56*56 standard feature map.
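The relationship between the 7*7 and 56*56 grids can be sketched as follows. This is only an illustration, assuming that each coarse cell corresponds to an aligned 8*8 block of the standard feature map (56 / 7 = 8); the patent does not spell out the mapping rule, and the function names are hypothetical.

```python
from typing import Tuple

STANDARD_SIZE = 56                      # side length of the standard feature map
COARSE_SIZE = 7                         # side length of the downsampling layer feature map
STRIDE = STANDARD_SIZE // COARSE_SIZE   # 8 standard cells per coarse cell


def to_coarse_cell(row: int, col: int) -> Tuple[int, int]:
    """Map a grid point of the 56*56 map to the 7*7 cell that contains it."""
    return row // STRIDE, col // STRIDE


def coarse_to_standard_window(coarse_row: int, coarse_col: int) -> Tuple[int, int, int, int]:
    """Map a 7*7 cell back to its block in the 56*56 map.

    Returns (row_start, row_end, col_start, col_end) with exclusive ends.
    """
    return (coarse_row * STRIDE, (coarse_row + 1) * STRIDE,
            coarse_col * STRIDE, (coarse_col + 1) * STRIDE)


cell = to_coarse_cell(30, 45)
window = coarse_to_standard_window(*cell)
```

Stage one only has to pick one of 49 cells; stage two then refines the position inside the corresponding block of the standard feature map.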

Further, the method also includes step S01c: a mask operation is performed on the standard feature map onto which the downsampled key point positions have been mapped, and a standard feature local network is trained so that it learns only the mapped positions where the key points lie in the standard feature map.

Further, in the 56*56 standard feature map, a mask is used so that only the part containing key points is learned, which lowers the learning difficulty; learning is driven by the loss function.
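One way to picture the mask operation is a binary mask that keeps only the block of the standard feature map around a mapped key point, so that cells outside it contribute nothing to the loss. The sketch below is an assumption-laden illustration: the window format, the squared-error loss and both function names are hypothetical choices, not details given in the patent.

```python
from typing import List, Tuple

Grid = List[List[float]]


def window_mask(window: Tuple[int, int, int, int], size: int = 56) -> Grid:
    """Binary mask that is 1.0 inside (row_start, row_end, col_start, col_end), else 0.0."""
    r0, r1, c0, c1 = window
    return [[1.0 if r0 <= r < r1 and c0 <= c < c1 else 0.0 for c in range(size)]
            for r in range(size)]


def masked_squared_error(pred: Grid, target: Grid, mask: Grid) -> float:
    """Mean squared error over unmasked cells only (assumed loss form)."""
    total, count = 0.0, 0.0
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            total += m * (p - t) ** 2
            count += m
    return total / count if count else 0.0


mask = window_mask((24, 32, 40, 48))
pred = [[0.5] * 56 for _ in range(56)]
target = [[0.0] * 56 for _ in range(56)]
target[30][45] = 1.0  # key point label inside the mapped block
loss = masked_squared_error(pred, target, mask)
```

Because masked cells are multiplied by zero, predictions far from the key point cannot affect the result, which is the sense in which the mask reduces the regression difficulty.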

Further, the upper-left corner of the target detection area is marked as the origin, giving the parameters (X, Y); the width of the target detection area is set to W and its height to H, giving the target detection area parameters (X, Y, W, H).
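With the detection area described by (X, Y, W, H), the relative position of a key point label can be computed by shifting to the upper-left origin. The sketch below additionally divides by W and H so that positions fall in [0, 1]; that normalization is an assumption made for illustration, since the patent only speaks of "relative position information".

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (X, Y, W, H)
Point = Tuple[float, float]


def relative_keypoints(keypoints: List[Point], box: Box) -> List[Point]:
    """Express image-space key points relative to the detection area.

    Coordinates are shifted to the box's upper-left corner and scaled by its
    width and height (the scaling is an assumed convention).
    """
    x0, y0, w, h = box
    if w <= 0 or h <= 0:
        raise ValueError("detection area must have positive width and height")
    return [((x - x0) / w, (y - y0) / h) for x, y in keypoints]


first_position = relative_keypoints([(120.0, 80.0), (180.0, 140.0)],
                                    (100.0, 60.0, 160.0, 160.0))
```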

Further, in the network structure, the base segment uses the resnet50 network structure and the detection segment uses the rrc network structure.

Further, the network structure of the key point detection segment obtains the feature maps of the middle and lower layers of the base segment, passes them through an RoI pooling layer so that the window of each feature map yields a fixed-size feature map, fuses the fixed-size feature maps with a concat function, and applies at least one convolution and pooling operation to obtain the standard feature map. The standard feature map and the key point labels of the preset tracking target are input together to generate the first loss function.
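The RoI pooling and concat steps can be illustrated without a deep learning framework. The following pure-Python sketch max-pools a window of each single-channel feature map to a fixed size and stacks the results as channels. It is a simplified stand-in, not the patented network: real feature maps have many channels, and the bin-splitting rule and function names here are assumptions.

```python
from typing import List, Tuple

Grid = List[List[float]]


def roi_max_pool(feature: Grid, roi: Tuple[int, int, int, int], out_size: int) -> Grid:
    """Max-pool the window roi = (r0, c0, r1, c1), ends exclusive, to out_size x out_size."""
    r0, c0, r1, c1 = roi
    h, w = r1 - r0, c1 - c0
    pooled = []
    for i in range(out_size):
        rs = r0 + (i * h) // out_size                                      # bin start (floor)
        re = max(rs + 1, r0 + ((i + 1) * h + out_size - 1) // out_size)    # bin end (ceil)
        row = []
        for j in range(out_size):
            cs = c0 + (j * w) // out_size
            ce = max(cs + 1, c0 + ((j + 1) * w + out_size - 1) // out_size)
            row.append(max(feature[r][c] for r in range(rs, re) for c in range(cs, ce)))
        pooled.append(row)
    return pooled


def concat_channels(maps: List[Grid]) -> List[Grid]:
    """Stack equally sized maps along a new leading channel axis."""
    return [[list(row) for row in m] for m in maps]


low = [[float(r * 4 + c) for c in range(4)] for r in range(4)]   # e.g. a deeper, smaller layer
mid = [[float(r * 8 + c) for c in range(8)] for r in range(8)]   # e.g. a shallower, larger layer
fused = concat_channels([roi_max_pool(low, (0, 0, 4, 4), 2),
                         roi_max_pool(mid, (0, 0, 8, 8), 2)])
```

The point of the fixed output size is that windows taken from layers of different resolution become directly stackable before the subsequent convolution and pooling operations.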

Further, the key point detection segment applies three convolution and pooling operations before generating the standard feature map.

Further, the standard feature map undergoes at least one more convolution and pooling operation to obtain the downsampling layer feature map. The downsampling layer feature map and the labels of the tracking target key points in the feature map are taken together as input and then passed through the mask operation to generate the second loss function.

Further, the key point detection segment applies three convolution and pooling operations before generating the downsampling layer feature map.

A three-dimensional vehicle detection system based on key point regression comprises a key point labeling module, a target detection module, a feature extraction module, a key point first position generation module, a key point second position generation module and a loss function generation module.

The target detection module is used to obtain the tracking target in the original image and to derive the detection area from the tracking target.

The key point labeling module is used to label the key points of the tracking target and output key point labels.

The feature extraction module is used to extract features from the detection area and generate a feature map.

The key point first position generation module is used to generate a key point first position array from the position information of the pixels, in the output of the target detection module, at which the key point labels are located.

The key point second position generation module is used to generate a key point second position array from the position information of the grid points of the feature map at which the key point labels are located.

The loss function generation module is used to obtain the loss function as the product of a coefficient and the sum of the differences between corresponding entries of the key point first position array and the key point second position array, in order to correct the network structure.
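The loss described for the loss function generation module, a coefficient times the sum of entry-wise differences between the two position arrays, can be sketched as below. Taking absolute differences is an assumption made here so that positive and negative errors do not cancel; the patent text does not state whether the differences are signed, absolute or squared, and the coefficient value is likewise hypothetical.

```python
from typing import Sequence


def keypoint_loss(first: Sequence[float], second: Sequence[float],
                  coefficient: float = 1.0) -> float:
    """Coefficient times the sum of absolute entry-wise differences (assumed form)."""
    if len(first) != len(second):
        raise ValueError("position arrays must have the same length")
    return coefficient * sum(abs(a - b) for a, b in zip(first, second))


# Flattened (x, y) pairs: labelled positions vs. positions read from the feature map.
first_position = [0.125, 0.125, 0.5, 0.5]
second_position = [0.25, 0.125, 0.5, 0.75]
loss = keypoint_loss(first_position, second_position, coefficient=2.0)
```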

Further, the system also includes a feature fusion module, which fuses the feature maps of the middle and lower layers of the base segment to generate the standard feature map.

Further, the system also includes a scale space conversion module, which converts the feature maps of each layer of the base segment to the same size to generate the standard feature map.

Further, the system also includes a downsampling layer module, which downsamples the grid points of the standard feature map to generate a downsampling layer feature map whose spatial size is smaller than that of the standard feature map.

Further, the system also includes a mask module. While the second key point position information in the downsampling layer feature map is being mapped into the standard feature map, the mask module masks the grid points of the standard feature map other than those related to the second key point position information.

A three-dimensional vehicle detection terminal based on key point regression, characterized in that it comprises a processor and a memory, the memory storing program instructions and the processor running the program instructions to implement the steps of the above method.

A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the above method.

As described above, the present invention has the following advantages. First, multi-scale feature fusion is performed on the standard feature map corresponding to the tracking target. Second, using the standard feature map improves the image precision without affecting running time. Third, loss function regression is carried out in two stages: the standard feature map is first downsampled to obtain a downsampling layer feature map. In stage one, learning takes place on the downsampling layer feature map; once learning is sufficient, the key point positions in the downsampling layer feature map are mapped into the standard feature map. In stage two, learning takes place on the standard feature map obtained by the mapping, and a mask is used so that only the mapped positions where the key points lie are learned, which reduces the difficulty of the regression.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the drawings required for the description of the embodiments are briefly introduced below. The drawings described below are obviously only some embodiments of the invention; those of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flowchart of the present invention.

Fig. 2 shows test results of the present invention.

Fig. 3 is a schematic diagram of the operation of the mask module of the present invention.

Fig. 4 shows the network structure of the key point detection segment of the present invention.

Detailed description of the embodiments

Embodiments of the present invention are described below through specific examples; those skilled in the art can readily understand other advantages and effects of the invention from the content disclosed in this specification. The invention can also be implemented or applied through other, different specific embodiments, and the details in this specification can be modified or changed in various ways, based on different viewpoints and applications, without departing from the spirit of the invention. It should be noted that, where no conflict arises, the following embodiments and the features in them can be combined with one another.

It should be understood that the structures, proportions, sizes and the like depicted in the drawings accompanying this specification are intended only to accompany the disclosed content so that those skilled in the art can understand and read it. They are not intended to limit the conditions under which the invention can be implemented and therefore have no technically essential meaning. Any modification of structure, change of proportion or adjustment of size that does not affect the effects the invention can produce or the purposes it can achieve still falls within the scope covered by the disclosed technical content. Likewise, terms such as "upper", "lower", "left", "right", "middle" and "a" used in this specification are for convenience of description only and do not limit the implementable scope of the invention; changes or adjustments to their relative relationships, without substantive change to the technical content, are also regarded as within the implementable scope of the invention.

Referring to Figs. 1 to 4, a three-dimensional vehicle detection method based on key point regression comprises the following steps:

As a preferred embodiment, step S02 also includes, after the feature map of the target detection area is extracted, a step S021 of fusing the features of the individual feature maps.

As a preferred embodiment, the feature fusion in step S021 is limited to the feature maps of the middle and lower layers. That is, among the convolutional layers of the neural network, only the feature maps of the middle- and low-level convolutional layers are fused.

As a preferred embodiment, the feature fusion in step S021 is dense feature fusion. That is, when convolutional layers of the neural network are selected for fusion, as many convolutional layers as possible are chosen.

As a preferred embodiment, after the detection area of the tracking target is determined in step S01 and before the feature map of the target detection area is extracted in step S02, the method includes a scale space conversion step S01a: the feature maps of the individual target detection areas are converted into standard feature maps of the same scale, and key point detection is then performed to obtain the relative positional relationship of the key points in the feature map.

As a preferred embodiment, the standard feature map is a 56*56 feature map. The standard feature map is slightly larger than the candidate box of the tracking target, so that key points exposed at the edge of the tracking target do not fall outside the standard feature map.

As a preferred embodiment, before the relative position information of the key points of the tracking target in the feature map is obtained, the method also includes a step S01b of downsampling the standard feature map: a downsampling layer feature map is obtained and used as input to train a downsampling local network; the downsampling layer feature map is then fed into the downsampling local network, which outputs key point position information that is mapped back into the standard feature map.

As a preferred embodiment, the obtained 56*56 standard feature map is downsampled to a 7*7 downsampling layer feature map. Learning takes place on the 7*7 downsampling layer feature map; once learning is sufficient, the resulting key point positions are mapped into the 56*56 standard feature map.

As a preferred embodiment, the method also includes step S01c: a mask operation is performed on the standard feature map onto which the downsampled key point positions have been mapped, and a standard feature local network is trained so that it learns only the mapped positions where the key points lie in the standard feature map.

As a preferred embodiment, in the 56*56 standard feature map, a mask is used so that only the part containing key points is learned, which lowers the learning difficulty; learning is driven by the loss function.

As a preferred embodiment, the upper-left corner of the target detection area is marked as the origin, giving the parameters (X, Y); the width of the target detection area is set to W and its height to H, giving the target detection area parameters (X, Y, W, H).

As a preferred embodiment, in the network structure, the base segment uses the resnet50 network structure and the detection segment uses the rrc network structure.

As a preferred embodiment, the network structure of the key point detection segment obtains the feature maps of the middle and lower layers of the base segment, passes them through an RoI pooling layer so that the window of each feature map yields a fixed-size feature map, fuses the fixed-size feature maps with a concat function, and applies at least one convolution and pooling operation to obtain the standard feature map. The standard feature map and the key point labels of the preset tracking target are input together to generate the first loss function.

As a preferred embodiment, the key point detection segment applies three convolution and pooling operations before generating the standard feature map.

As a preferred embodiment, the standard feature map undergoes at least one more convolution and pooling operation to obtain the downsampling layer feature map. The downsampling layer feature map and the labels of the tracking target key points in the feature map are taken together as input and then passed through the mask operation to generate the second loss function.

As a preferred embodiment, the key point detection segment applies three convolution and pooling operations before generating the downsampling layer feature map.

As a preferred embodiment, the system also includes a feature fusion module, which fuses the feature maps of the middle and lower layers of the base segment to generate the standard feature map.

As a preferred embodiment, the system also includes a scale space conversion module, which converts the feature maps of each layer of the base segment to the same size to generate the standard feature map.

As a preferred embodiment, the system also includes a downsampling layer module, which downsamples the grid points of the standard feature map to generate a downsampling layer feature map whose spatial size is smaller than that of the standard feature map.

As a preferred embodiment, the system also includes a mask module. While the second key point position information in the downsampling layer feature map is being mapped into the standard feature map, the mask module masks the grid points of the standard feature map other than those related to the second key point position information.

As a preferred embodiment, this embodiment also provides a terminal device capable of executing programs, such as a smartphone, tablet computer, laptop, desktop computer, rack server, blade server, tower server or cabinet server (including an independent server or a server cluster composed of multiple servers). The terminal device of this embodiment includes at least, but is not limited to, a memory and a processor that can communicate with each other through a system bus. It should be noted that although a terminal device with a memory and a processor is described, not all of the components shown are required; more or fewer components may be implemented instead.

As a preferred embodiment, the memory (i.e. a readable storage medium) includes flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc and the like. In some embodiments, the memory can be an internal storage unit of the computer device, such as the hard disk or main memory of the computer device 20. In other embodiments, the memory can also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card fitted to the computer device. The memory can of course also include both the internal storage unit of the computer device and its external storage device. In this embodiment, the memory is generally used to store the operating system and the various application software installed on the computer device, for example the program code of the instance-segmentation-based target Re-ID in the embodiment. In addition, the memory can also be used to temporarily store various data that has been output or is about to be output.

In some embodiments, the processor can be a central processing unit (CPU), a controller, a microcontroller, a microprocessor or another data processing chip. The processor is generally used to control the overall operation of the computer device. In this embodiment, the processor is used to run the program code stored in the memory or to process data, for example to run the instance-segmentation-based target Re-ID program so as to realize the function of the instance-segmentation-based target Re-ID system in the embodiment.

This embodiment also provides a computer-readable storage medium, such as flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, a server, an App store and the like, on which a computer program is stored that realizes the corresponding function when executed by a processor. The computer-readable storage medium of this embodiment is used to store the instance-segmentation-based target Re-ID program, which, when executed by a processor, implements the instance-segmentation-based target Re-ID method in the embodiment.

The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone familiar with this technology can modify or change the above embodiments without departing from the spirit and scope of the invention. Therefore, all equivalent modifications or changes made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the invention shall still be covered by the claims of the invention.

Claims

1. A three-dimensional vehicle detection method based on key point regression, characterized in that it comprises the following steps:

S01: preset the key point label of the tracking target, determine the detection area of the tracking target, obtain the relative position information of the key point label of the preset tracking target in the detection area, and mark it as the first position information of the key point;

S02: extract the feature map of the target detection area, obtain the relative position information of the tracking target key point in the feature map, and mark it as the second position information of the key point;

S03: taking the first position information of the key point and the second position information of the key point as input, obtain a loss function to optimize the network structure.

2. The three-dimensional vehicle detection method based on key point regression according to claim 1, wherein step S02 also includes, after extracting the feature map of the target detection area, a step S021 of fusing the features of each feature map.

3. The three-dimensional vehicle detection method based on key point regression according to claim 2, wherein, after determining the detection area of the tracking target in step S01 and before extracting the feature map of the target detection area in step S02, the method further comprises a scale space conversion step S01a: convert the feature maps of each target detection area into standard feature maps of the same scale, and then perform key point detection to obtain the relative positional relationship of the key points in the feature map.

4. The three-dimensional vehicle detection method based on key point regression according to claim 3, wherein, before acquiring the relative position information of the tracking target key points in the feature map, the method also includes a step S01b of downsampling the standard feature map: obtain the downsampling layer feature map, take the downsampling layer feature map as input, and train the downsampling local network; then use the downsampling layer feature map as input to the downsampling local network, output the key point position information and map it back to the standard feature map.

5. The three-dimensional vehicle detection method based on key point regression according to claim 4, further comprising step S01c: performing a mask operation on the standard feature map onto which the downsampled key point positions have been mapped, and training a standard feature local network so that it only learns the mapped positions where the key points in the standard feature map are located.

6. The three-dimensional vehicle detection method based on key point regression according to claim 5, wherein the upper left corner of the target detection area is marked as the origin, the parameters (X, Y) are obtained, the width of the target detection area is set to W, and the height of the target detection area is set to H; the parameters of the target detection area (X, Y, W, H) are obtained.

7. The three-dimensional vehicle detection method based on key point regression according to claim 1, wherein, in the network structure, the base segment adopts the resnet50 network structure, and the detection segment adopts the rrc network structure.

8. The three-dimensional vehicle detection method based on key point regression according to claim 1, wherein the network structure of the key point detection segment comprises: acquiring the feature maps of the middle and lower layers of the base segment; passing them through the RoI pooling layer so that the window of each feature map generates a fixed-size feature map; fusing the fixed-size feature maps with the concat function; and obtaining a standard feature map after at least one convolution and pooling operation, the standard feature map being input together with the key point labels of the preset tracking target to generate the first loss function; the standard feature map is then subjected to at least one further convolution and pooling operation to obtain a downsampling layer feature map, and the downsampling layer feature map and the labels of the tracking target key points in the feature map are used together as input and then pass through the mask operation to generate a second loss function.

9. A three-dimensional vehicle detection system based on key point regression, characterized in that it comprises a key point labeling module, a target detection module, a feature extraction module, a key point first position generation module, a key point second position generation module and a loss function generation module;

The target detection module is used to obtain the tracking target in the original image, and obtain the detection area based on the tracking target;

The key point labeling module is used to label the key points of the tracking target and output key point labels;

The feature extraction module is used to extract features from the detection area and generate a feature map;

The key point first position generation module is used to generate the key point first position array from the position information of the pixels, in the output of the target detection module, at which the key point labels are located;

The key point second position generation module is used to generate the key point second position array from the position information of the grid points of the feature map at which the key point labels are located;

The loss function generation module is configured to obtain a loss function as the product of a coefficient and the sum of the differences between corresponding entries of the key point first position array and the key point second position array, so as to correct the network structure.

10. A three-dimensional vehicle detection terminal based on key point regression, characterized in that it comprises a processor and a memory, wherein the memory stores program instructions, and the processor executes the program instructions to implement the steps in the method according to any one of claims 1 to 8.

11. A computer-readable storage medium on which a computer program is stored, characterized in that: when the program is executed by a processor, the steps in the method according to any one of claims 1 to 8 are implemented.