
CN110827202A - Target detection method, target detection device, computer equipment and storage medium - Google Patents


Info

Publication number: CN110827202A
Application number: CN201911081532.1A
Authority: CN (China)
Prior art keywords: target detection, point cloud data, detection result
Legal status: Pending (the listed status is an assumption, not a legal conclusion)
Original language: Chinese (zh)
Inventors: 周康明 (Zhou Kangming), 魏宇飞 (Wei Yufei)
Current and original assignee: Shanghai Eye Control Technology Co., Ltd.

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00: Geometric image transformations in the plane of the image
    • G06T 3/40: Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4053: Scaling based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T 5/00: Image enhancement or restoration
    • G06T 5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a target detection method, a target detection device, computer equipment and a storage medium. The method comprises the following steps: acquiring point cloud data and a corresponding color image; fusing the point cloud data and the color image to obtain fusion data; completing the point cloud data according to the fusion data to obtain completed point cloud data; performing target detection according to the fusion data to obtain an intermediate target detection result; acquiring, from the completed point cloud data, the completed point cloud data of the area corresponding to the intermediate target detection result; and correcting the intermediate target detection result according to the acquired data to obtain a final target detection result. The method improves detection accuracy.

Description

Target detection method, target detection device, computer equipment and storage medium
Technical Field
The present application relates to the field of object detection technologies, and in particular, to an object detection method, an object detection apparatus, a computer device, and a storage medium.
Background
3D object detection is the task of inferring the 3D position, size, and orientation of all objects in a scene from data obtained by multiple types of sensors, including RGB images from color cameras and point cloud data from lidar. Point cloud data collected by lidar provides accurate position information about a target, but, limited by technology and cost, it has low resolution and therefore performs poorly on distant and small targets. Conversely, an RGB image from a color camera has high resolution but lacks depth information, so it cannot accurately express target position. It is therefore natural and promising to perform 3D target detection by combining RGB images with point cloud data.
Traditional 3D target detection methods that fuse RGB images and point cloud data mainly concatenate the information or features of the RGB image and of the point cloud at corresponding positions, and then hand the result to a deep neural network that learns how to fuse the two sources; some methods additionally attach a semantic segmentation loss after concatenation to guide the training toward better use of the two kinds of information. However, whether the fusion is learned indirectly by a deep neural network or guided by a semantic segmentation loss, it is difficult to learn directly the position information that is most helpful to the 3D detection task, so the finally detected target positions are inaccurate.
Disclosure of Invention
In view of the above technical problems, it is necessary to provide a target detection method, an apparatus, a computer device, and a storage medium capable of improving detection accuracy.
A method of target detection, the method comprising:
acquiring point cloud data and a corresponding color image;
fusing the point cloud data and the corresponding color image to obtain fused data;
completing the point cloud data according to the fusion data to obtain the completed point cloud data;
carrying out target detection according to the fusion data to obtain an intermediate target detection result;
acquiring supplemented point cloud data of an area corresponding to the intermediate target detection result from the supplemented point cloud data;
and according to the acquired supplemented point cloud data, carrying out result correction on the intermediate target detection result to obtain a final target detection result.
In one embodiment, the performing target detection according to the fusion data to obtain an intermediate target detection result includes:
extracting features of the fusion data, and performing target detection according to the extracted features to obtain a preliminary target detection result and a measurement index corresponding to the preliminary target detection result;
sorting the preliminary target detection results according to the measurement indexes corresponding to the preliminary target detection results;
and carrying out non-maximum suppression processing on the measurement indexes corresponding to the sorted preliminary target detection results to obtain an intermediate target detection result.
In one embodiment, the fusing the point cloud data and the corresponding color image to obtain fused data includes:
performing multi-scale processing on the color image to obtain multi-scale features;
dividing the point cloud data to obtain voxels to be supplemented, and recording the original characteristics of the voxels to be supplemented;
and splicing the multi-scale features into the corresponding original features to form spliced voxel data, and taking the spliced voxel data as fusion data.
In one embodiment, the dividing the point cloud data to obtain voxels to be complemented includes:
dividing the point cloud data to obtain initial voxels, and marking the initial voxels comprising at least one point in the point cloud data;
and acquiring the initial voxel with the distance to the marked initial voxel smaller than a preset value as a voxel to be completed.
In one embodiment, the stitching the multi-scale features into the corresponding original features to form stitched voxel data includes:
acquiring a coordinate projection matrix between the color image and the point cloud data and an initial coordinate of a central point of the voxel to be complemented;
calculating the initial coordinate of the central point and the conversion coordinate in the color image according to the coordinate projection matrix;
acquiring multi-scale characteristics of pixels corresponding to the conversion coordinates;
and splicing the obtained multi-scale features into the corresponding original features of the voxels to be complemented.
In one embodiment, the recording the original features of the voxel to be complemented includes:
acquiring coordinates of points in the voxel to be complemented;
inquiring the laser radar reflection intensity of the points in the voxel to be complemented;
and obtaining the original features of the voxel to be complemented according to the averages of the coordinates and lidar reflection intensities of the points in the voxel to be complemented.
In one embodiment, the performing, according to the obtained supplemented point cloud data, result correction on the intermediate target detection result to obtain a final target detection result includes:
re-dividing the supplemented point cloud data of the area corresponding to the intermediate target detection result to obtain new voxels;
and extracting features according to the new voxels, and correcting the result according to the extracted features to obtain a final target detection result.
An object detection apparatus, the apparatus comprising:
the data acquisition module is used for acquiring point cloud data and a corresponding color image;
the fusion module is used for fusing the point cloud data and the corresponding color image to obtain fusion data;
the completion module is used for completing the point cloud data according to the fusion data to obtain the completed point cloud data;
the preliminary detection module is used for carrying out target detection according to the fusion data to obtain an intermediate target detection result;
the area acquisition module is used for acquiring the supplemented point cloud data of the area corresponding to the intermediate target detection result from the supplemented point cloud data;
and the target detection module is used for correcting the result of the intermediate target detection result according to the acquired supplemented point cloud data to obtain a final target detection result.
A computer device comprising a memory storing a computer program and a processor implementing the steps of any of the methods described above when the processor executes the computer program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any of the above.
According to the target detection method, device, computer equipment and storage medium, fusion data is first obtained from the point cloud data and the color image, so that an intermediate target detection result can be obtained by performing target detection on the fusion data; fusing the point cloud features with the high-resolution color image improves the accuracy of this intermediate result. Second, the point cloud data is completed according to the fusion data, which yields the position information most helpful to detection, namely the completed points. Performing detection again on the region of the intermediate result, using the completed point cloud data, therefore improves the detection accuracy.
Drawings
FIG. 1 is a diagram illustrating an exemplary implementation of a target detection method;
FIG. 2 is a schematic flow chart diagram of a method for object detection in one embodiment;
FIG. 3 is a diagram illustrating the steps of obtaining intermediate target detection results, in one embodiment;
FIG. 4 is a flow diagram of multi-scale processing steps in one embodiment;
FIG. 5 is a block diagram of a process for object detection based on re-voxelized data in one embodiment;
FIG. 6 is a block flow diagram of a method of object detection in one embodiment;
FIG. 7 is a block diagram of an embodiment of an object detection device;
FIG. 8 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The target detection method provided by the application can be applied to the environment shown in fig. 1, in which the terminal 102 and the server 104 communicate via a network. The server 104 obtains the point cloud data and the color image acquired by the terminal 102 and fuses them to obtain fusion data; target detection is performed on the fusion data to obtain an intermediate target detection result, and the point cloud data is completed from the fusion data to obtain completed point cloud data. Higher-precision detection can then be performed on the region corresponding to the intermediate result using the completed point cloud data, which carries the most helpful position information (the completed points), yielding a final target detection result with improved accuracy. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer, portable wearable device, or lidar-camera rig, and the server 104 may be an independent server or a cluster of servers.
In one embodiment, as shown in fig. 2, an object detection method is provided, which is described by taking the application of the method to the server in fig. 1 as an example, and includes the following steps:
s202: and acquiring point cloud data and a corresponding color image.
Specifically, the point cloud data is collected by the lidar and recorded as points, each containing three-dimensional coordinates and a reflection intensity. The color image is acquired by a camera and includes, but is not limited to, an RGB image; the point cloud data and the color image correspond to each other.
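Purely as an illustration, a minimal sketch of the data layout this step implies; the array shapes and names below are assumptions made for the example, not part of the patent:

```python
import numpy as np

# Point cloud from the lidar: one row per point, columns are the
# three-dimensional coordinates (x, y, z) in meters plus the
# reflection intensity r.
points = np.random.rand(10000, 4).astype(np.float32)  # placeholder values

# Corresponding color image from the camera: H x W x 3 RGB array.
image = np.zeros((375, 1242, 3), dtype=np.uint8)  # illustrative size only
```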
S204: and fusing the point cloud data and the corresponding color images to obtain fused data.
Specifically, the fusion supplements the low-resolution point cloud data with the high-resolution color image. The server obtains fusion data by taking the color-image features at the corresponding positions and concatenating them with the original features of the point cloud data. Optionally, to improve the accuracy of the target detection result, the server may first apply multi-scale processing to the color image and use the resulting multi-scale features for the supplementation. In addition, to conveniently relate positions in the point cloud to positions in the image, the server may voxelize the point cloud data into voxels to be complemented; the pixel corresponding to each voxel is then found from the voxel's center-point coordinates and the coordinate projection matrix between the color image and the point cloud data, after which the features are fused.
S206: and completing the point cloud data according to the fusion data to obtain the completed point cloud data.
Specifically, the server judges from the fusion data whether the point cloud needs to be completed: for each voxel to be complemented, it decides from the fusion data whether a point to be complemented exists in that voxel, and if so adds the point to the point cloud data. For example, the concatenated voxel data, i.e. the fusion data, is processed by a pre-trained 3D convolutional neural network that scores each voxel to be complemented; the voxels whose scores exceed a preset value are taken as the voxels that truly need completion, and the corresponding points to be complemented are added to the point cloud data. The same pre-trained network also outputs, at each location of the multi-scale feature maps, a preset number of target predictions with scores, which are later sorted and non-maximum suppressed to give the preliminary target detection results.
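A minimal PyTorch sketch of the scoring-and-thresholding idea just described; the layer sizes, the sigmoid, and the threshold are illustrative assumptions, not the patent's trained network:

```python
import torch
import torch.nn as nn

class CompletionScorer(nn.Module):
    """Scores every voxel to be complemented from the fusion data; a high
    score means a point to be complemented likely exists in that voxel."""
    def __init__(self, in_channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(32, 1, kernel_size=1),  # one raw score per voxel
        )

    def forward(self, fused):                 # fused: (B, C, D, H, W)
        return self.net(fused).squeeze(1)     # (B, D, H, W) voxel scores

def complete_point_cloud(scores, voxel_centers, threshold=0.5):
    """Adds the centers of confidently scored voxels as completed points.

    scores: (D, H, W) raw scores for one sample; voxel_centers: (D, H, W, 3).
    """
    mask = torch.sigmoid(scores) > threshold  # voxels judged to need a point
    return voxel_centers[mask]                # (M, 3) points to be complemented
```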
S208: and carrying out target detection according to the fusion data to obtain an intermediate target detection result.
Specifically, the server may perform target detection through a pre-trained neural network: for example, it first performs convolution operations on the fusion data to obtain features, then detects targets from those features to obtain preliminary target detection results, and finally screens the preliminary results to obtain the intermediate target detection result.
S210: and acquiring the supplemented point cloud data of the area corresponding to the intermediate target detection result from the supplemented point cloud data.
Specifically, the intermediate target detection result is only a rough detection; high-precision detection is still required to improve accuracy. Since this refinement only needs to run on the region corresponding to the intermediate result, the server only acquires the completed point cloud data of that region. Because the completed point cloud carries the added position information, the accuracy can be improved.
S212: and according to the acquired supplemented point cloud data, carrying out result correction on the intermediate target detection result to obtain a final target detection result.
Specifically, the server may perform the high-precision target detection with a pre-trained neural network: the completed point cloud data is re-voxelized into new voxels, convolution operations over the voxel features produce detection features, and target detection on those features yields the final target detection result.
In this target detection method, fusion data is first obtained from the point cloud data and the color image, so an intermediate target detection result can be obtained by detection on the fusion data; fusing the point cloud features with the high-resolution color image improves the accuracy of this intermediate result. The point cloud is then completed according to the fusion data, producing the most helpful position information, namely the completed points; detecting again on the region of the intermediate result with the completed point cloud improves the final detection accuracy.
In one embodiment, referring to fig. 3, a schematic diagram of the step of obtaining the intermediate target detection result, performing target detection according to the fusion data to obtain the intermediate result may include: extracting features of the fusion data and detecting targets from the extracted features, which yields preliminary target detection results and a measurement index for each of them; sorting the preliminary results by their measurement indexes; and applying non-maximum suppression to the sorted preliminary results to obtain the intermediate target detection result.
Specifically, the fusion data is the voxel data after feature concatenation: the point cloud data is voxelized into voxels to be complemented, the position of each voxel in the color image is determined, the multi-scale features at that position are obtained, and these are concatenated onto the voxel's original features, giving the fusion data of each voxel.
Feature extraction from the fusion data can be performed by a preset 3D convolutional neural network. Each fused voxel carries its position and a feature dimension, i.e. it can be represented as (x1, y1, z1, c1), where (x1, y1, z1) are the voxel coordinates and c1 is the feature dimension. The network first rescales the voxel grid of the point cloud space, for example down to 1/64 of the original number of voxels, and the convolution operations then change the number of feature channels, for example to 16 = 2 × 8: the 2 indicates two target predictions per voxel (other values may be set in other embodiments), and the 8 = 7 + 1 encodes one prediction, where the 7 values are the length, width, height, center position (x2, y2, z2) and deflection angle of the preliminary target detection result, and the remaining 1 is the measurement index, i.e. the reliability of the prediction, which in practice can be represented by a score.
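The 16 = 2 × 8 channel layout above can be unpacked as in the following sketch; the ordering of the 7 box parameters within a channel is an assumption made for illustration:

```python
import torch

def decode_detection_head(output):
    """Splits a (16, D, H, W) head output into boxes and measurement indexes:
    2 predictions per voxel, each with 7 box values (length, width, height,
    center x2/y2/z2, deflection angle) plus 1 reliability score."""
    d, h, w = output.shape[1:]
    preds = output.view(2, 8, d, h, w)                # (anchor, param, D, H, W)
    boxes = preds[:, :7].permute(0, 2, 3, 4, 1)       # move box params last
    scores = preds[:, 7]                              # reliability channel
    return boxes.reshape(-1, 7), scores.reshape(-1)   # flat lists for sorting/NMS
```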
Specifically, after the server obtains the preliminary target detection results, that is, the predictions of targets possibly present in each voxel, it sorts them by measurement index. Suppose the original point cloud space is divided into 40 × 40 × 40 voxels and the scale transformation reduces this to 10 × 10 × 10; with two preliminary predictions per voxel, there are then 2000 preliminary target detection results for the server to sort.
After sorting, the server applies non-maximum suppression to delete redundant preliminary detection results of lower reliability, obtaining the intermediate target detection result.
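A simplified sketch of this sort-then-suppress stage: greedy NMS on axis-aligned bird's-eye-view footprints. The patent's boxes also carry a deflection angle, which a full implementation would account for, and the IoU threshold here is an assumed value:

```python
import numpy as np

def sort_and_nms(boxes, scores, iou_threshold=0.5):
    """boxes: (N, 7) rows of (l, w, h, cx, cy, cz, yaw); scores: (N,).
    Returns the indices of the detections kept as the intermediate result."""
    def bev_rect(b):  # axis-aligned footprint, ignoring yaw for simplicity
        return (b[3] - b[0] / 2, b[4] - b[1] / 2, b[3] + b[0] / 2, b[4] + b[1] / 2)

    order = np.argsort(scores)[::-1]          # highest reliability first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        if rest.size == 0:
            break
        x1a, y1a, x2a, y2a = bev_rect(boxes[i])
        ious = np.empty(rest.size)
        for k, j in enumerate(rest):          # overlap of i with each remaining box
            x1b, y1b, x2b, y2b = bev_rect(boxes[j])
            iw = max(0.0, min(x2a, x2b) - max(x1a, x1b))
            ih = max(0.0, min(y2a, y2b) - max(y1a, y1b))
            inter = iw * ih
            union = boxes[i][0] * boxes[i][1] + boxes[j][0] * boxes[j][1] - inter
            ious[k] = inter / union if union > 0 else 0.0
        order = rest[ious <= iou_threshold]   # drop overlapping, lower-scored boxes
    return keep
```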
Optionally, the server predicts the intermediate target detection result and obtains the point cloud completion result from the same fusion data: it scores each voxel to be complemented according to the concatenated voxel data, takes the voxels whose scores exceed a preset value as the voxels to complete, and adds the corresponding points to be complemented to the point cloud data.
In the embodiment, the point cloud data and the color image are fused to obtain the fused data, so that the target detection can be performed according to the fused data to obtain the intermediate target detection result, and the accuracy of the intermediate target detection result can be improved due to the characteristics of the fused point cloud data and the high resolution of the color image.
In one embodiment, fusing the point cloud data and the corresponding color image to obtain fused data, including: performing multi-scale processing on the color image to obtain multi-scale features; dividing the point cloud data to obtain voxels to be supplemented, and recording the original characteristics of the voxels to be supplemented; and splicing the multi-scale features into the corresponding original features to form spliced voxel data, and taking the spliced voxel data as fusion data.
Specifically, the multi-scale processing extracts features of different scales from the color image and then concatenates them with the image's original-scale features. The scales may be chosen as needed, including but not limited to 1/4 and 1/16 of the original image size.
Referring to fig. 4, a flowchart of the multi-scale processing step in one embodiment, performing multi-scale processing on the color image to obtain multi-scale features includes: scale transformation and feature extraction of the color image to obtain several images to be processed at different scales together with the original feature map; interpolated up-sampling of the images at the different scales to obtain intermediate processed images at the same scale as the color image; concatenation of the intermediate processed images with the original feature map to obtain a multi-scale color image; and extraction of the multi-scale feature of each pixel of that image. For example, let the width and height of the color image be W and H. The image is fed into a deep convolutional neural network, such as the one shown in fig. 4, which yields feature maps at the original size (W × H), 1/4 of the original size (W/2 × H/2), and 1/16 of the original size (W/4 × H/4). The latter two maps are restored to the original size by interpolated up-sampling, giving the intermediate processed images, which are finally concatenated with the original feature map to form the multi-scale color image; the multi-scale feature of every pixel can then be read off. Restoring the smaller maps to the size of the color image is what makes the concatenation with the original feature map possible.
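A minimal sketch of this three-scale extract/upsample/concatenate pipeline; the channel counts and the tiny backbone are assumptions made only to keep the example short:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFeatures(nn.Module):
    """Produces feature maps at W x H, W/2 x H/2 and W/4 x H/4, restores the
    two smaller maps to W x H by interpolation, and concatenates all three so
    that every pixel carries a multi-scale feature vector."""
    def __init__(self):
        super().__init__()
        self.s1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.s2 = nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.s4 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())

    def forward(self, img):                   # img: (B, 3, H, W)
        f1 = self.s1(img)                     # original size
        f2 = self.s2(f1)                      # 1/4 of the original size
        f4 = self.s4(f2)                      # 1/16 of the original size
        size = img.shape[2:]
        f2u = F.interpolate(f2, size=size, mode="bilinear", align_corners=False)
        f4u = F.interpolate(f4, size=size, mode="bilinear", align_corners=False)
        return torch.cat([f1, f2u, f4u], dim=1)  # per-pixel multi-scale features
```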
The point cloud data is then divided into voxels to be complemented and their original features are recorded. A voxel (short for volume element) is the 3D analogue of a pixel; a solid containing voxels can be rendered volumetrically or by polygonal isosurface extraction at a given threshold contour. A voxel to be complemented is a voxel for which completion will be judged. The original features are the features describing a voxel to be complemented, and may include, but are not limited to, the coordinates of all points inside it and the lidar reflection intensities of those points.
The server voxelizes the point cloud data into initial voxels, from which the voxels to be complemented are determined: every initial voxel containing at least one point of the point cloud is marked, and the voxels near a marked voxel are taken as voxels to be complemented. For example, the server takes the N × N × N block of voxels centered on a marked initial voxel, so each block contributes N × N × N − 1 voxels to be complemented, where N is the range, in each coordinate-axis direction, of the voxels to be complemented generated by each marked voxel.
After the server determines the voxels to be supplemented, the server also needs to determine the original features of the voxels to be supplemented, that is, the original features of the voxels to be supplemented are obtained according to the coordinates of all points in the voxels to be supplemented and the reflection intensity of the laser radar.
Finally, the server maps the voxels to be complemented onto the plane of the color image to determine the pixel corresponding to each of them, and concatenates the multi-scale features of those pixels onto the original features of the corresponding voxels, forming the concatenated voxel data of the voxels to be complemented.
In the above embodiment, the multi-scale features of the color image are obtained first; the point cloud data is then divided to obtain the voxels to be complemented and their original features; the multi-scale features are concatenated onto the corresponding original features to form the concatenated voxel data; and whether a point to be complemented exists in each voxel is judged from that concatenated data. In other words, the judgment combines the multi-scale features of the color image with the original features of the voxelized point cloud, so point cloud completion is done by feature fusion rather than by guessing a depth value for each (or some) RGB pixel. The features of the color image and of the point cloud data are thus considered jointly, which improves detection accuracy.
In one embodiment, dividing the point cloud data to obtain voxels to be complemented includes: dividing the point cloud data to obtain initial voxels, and marking the initial voxels that contain at least one point of the point cloud data; and acquiring, as voxels to be complemented, the initial voxels whose distance to a marked initial voxel is smaller than a preset value.
Specifically, the initial voxels are obtained by dividing the space represented by the point cloud data. The server first determines the scene space from the coordinates (x, y, z) of all points; if all coordinates are confined to x ∈ [−X, X], y ∈ [−Y, Y], z ∈ [−Z, Z] (in meters), that box is the scene space represented by the point cloud (optionally, the server may take the smallest box containing all point coordinates). The space is then segmented along the coordinate axes, for example uniformly, to obtain the initial voxels; the segmentation can follow a preset number of voxels or a preset voxel size. For example, with an initial voxel size of 5 cm × 5 cm × 10 cm, the space is divided into 40X × 40Y × 20Z initial voxels.
The points of the cloud are distributed among the initial voxels: some initial voxels contain no point and some contain points. The server marks the initial voxels that contain points in order to decide where points to be complemented may exist around them. The physical intuition is that missing information can only be near space where data exists: if a region originally holds no information, its surroundings are likely to hold none either.
By acquiring as voxels to be complemented only those initial voxels whose distance to a marked voxel is below the preset value, the server avoids processing all of the data; only the surroundings of voxels that contain points need processing, which reduces the data volume and improves processing efficiency.
In addition, the server takes the N × N × N block centered on each marked initial voxel, so each block yields N × N × N − 1 voxels to be complemented, where N is the range in each coordinate-axis direction. N may for example be 7, giving 7 × 7 × 7 − 1 voxels to be complemented per marked voxel.
In the above embodiment, only the initial voxels that contain points of the point cloud are marked, and a fixed number of voxels near them are taken as the voxels to be complemented; only the surroundings of occupied voxels are processed, which reduces the data volume and improves processing efficiency.
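The marking-and-neighbourhood step can be sketched as follows, assuming (purely for illustration) a small scene extent, and using a cube-shaped dilation so that radius=3 reproduces the 7 × 7 × 7 neighbourhood (N = 7):

```python
import numpy as np
from scipy.ndimage import binary_dilation

def voxels_to_be_complemented(points, extent=(10.0, 10.0, 2.0),
                              voxel_size=(0.05, 0.05, 0.10), radius=3):
    """points: (N, 4) rows of (x, y, z, r). Returns the occupancy mask of the
    marked initial voxels and the mask of voxels to be complemented."""
    X, Y, Z = extent
    shape = (int(2 * X / voxel_size[0]),
             int(2 * Y / voxel_size[1]),
             int(2 * Z / voxel_size[2]))
    # Shift coordinates from [-X, X] etc. to start at zero, then bucket.
    idx = ((points[:, :3] + np.array([X, Y, Z])) / np.array(voxel_size)).astype(int)
    inside = np.all((idx >= 0) & (idx < np.array(shape)), axis=1)
    occupied = np.zeros(shape, dtype=bool)
    occupied[tuple(idx[inside].T)] = True     # mark voxels containing a point
    # Every voxel within `radius` steps of a marked voxel, excluding the
    # marked voxels themselves, is a voxel to be complemented.
    near = binary_dilation(occupied, structure=np.ones((3, 3, 3), bool),
                           iterations=radius)
    return occupied, near & ~occupied
```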
In one embodiment, stitching the multi-scale features into corresponding original features to form stitched voxel data includes: acquiring a coordinate projection matrix between the color image and the point cloud data and an initial coordinate of a central point of a voxel to be complemented; calculating an initial coordinate of the central point and a conversion coordinate in the color image according to the coordinate projection matrix; acquiring multi-scale characteristics of pixels corresponding to the conversion coordinates; and splicing the obtained multi-scale features into the corresponding original features of the voxels to be complemented.
Specifically, the coordinate projection matrix Pr is the mapping matrix between the color image and the point cloud data; it can be calculated from the placement of the devices and their internal parameters.
The server acquires the initial coordinates of the center points of all voxels to be complemented and transforms them through the coordinate projection matrix, projecting every voxel to be complemented onto the plane of the color image: Pnew = Pr × Pold, where Pnew is the converted coordinate and Pold the initial coordinate. The server can then read the multi-scale feature of the pixel at the corresponding image position and concatenate it with the voxel's original features to serve as the new voxel feature.
Specifically, the server may operate as follows: let the center coordinate of a voxel to be complemented be (x, y, z); the converted coordinate (u, z) projected onto the color image plane is first computed via Pnew = Pr × Pold. Since pixel coordinates are integers, the converted coordinate is then rounded to (u', z'), where the rounding may simply drop the fractional digits. The feature at position (u', z') of the multi-scale color image feature map is the multi-scale feature of this voxel, and it is appended to the voxel's original features to complete the feature concatenation.
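A sketch of the projection-and-stitching step under a common convention: the patent only states Pnew = Pr × Pold with integer rounding, so the homogeneous coordinates and perspective divide below are assumptions about the form of Pr:

```python
import numpy as np

def stitch_multiscale_features(centers, Pr, ms_feature_map):
    """centers: (M, 3) initial coordinates of voxel center points;
    Pr: (3, 4) coordinate projection matrix; ms_feature_map: (H, W, C).
    Returns the (M, C) multi-scale features to append to the voxels'
    original features."""
    homo = np.hstack([centers, np.ones((len(centers), 1))])  # homogeneous coords
    proj = homo @ Pr.T                                       # Pnew = Pr x Pold
    uv = proj[:, :2] / proj[:, 2:3]                          # perspective divide
    uv = np.rint(uv).astype(int)       # pixel coordinates are integers: round
    h, w, _ = ms_feature_map.shape
    u = np.clip(uv[:, 0], 0, w - 1)    # clamp to the image, for robustness
    v = np.clip(uv[:, 1], 0, h - 1)
    return ms_feature_map[v, u]        # feature of the pixel each center hits
```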
In the above embodiment, the positional correspondence between the multi-scale features and the voxels to be complemented is established through the coordinate projection matrix, and the concatenation of the multi-scale features with the corresponding original features follows from that correspondence.
In one embodiment, recording the original features of the voxels to be complemented includes: acquiring the coordinates of the points inside a voxel to be complemented; querying the lidar reflection intensity of those points; and obtaining the voxel's original features from the averages of the point coordinates and of the lidar reflection intensities.
Specifically, the feature of a voxel to be complemented may be represented by the averages of the coordinates and lidar reflection intensities of all points inside it; for example, the mean of the (x, y, z) coordinates and of the reflection intensity r of all points in the voxel is taken as its 4-dimensional feature value C = (x, y, z, r).
In the above embodiment, using the average coordinates and reflection intensity of the points inside a voxel as its feature value is simple and reliable.
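Since the original feature is just a per-voxel average, it reduces to one line; a tiny worked example:

```python
import numpy as np

def voxel_original_feature(points_in_voxel):
    """points_in_voxel: (K, 4) rows of (x, y, z, r) for one voxel to be
    complemented; returns its 4-dimensional feature value C = (x, y, z, r)."""
    return points_in_voxel.mean(axis=0)

pts = np.array([[1.0, 2.0, 0.5, 0.3],
                [1.1, 2.1, 0.4, 0.5],
                [0.9, 1.9, 0.6, 0.4]])
print(voxel_original_feature(pts))  # -> [1.  2.  0.5 0.4]
```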
In one embodiment, the performing result correction on the intermediate target detection result according to the acquired supplemented point cloud data to obtain a final target detection result includes: the point cloud data of the area corresponding to the intermediate target detection result is divided again to obtain new voxels; and extracting features according to the new voxels, and correcting the result according to the extracted features to obtain a final target detection result.
Specifically, with reference to figs. 5 and 6 (fig. 5 is a flow chart of target detection on re-voxelized data and fig. 6 a flow chart of the overall method in one embodiment), after the intermediate target detection result is output, the server determines the corresponding region, substitutes the completed point cloud data for that region, and re-divides it into voxels, for example into a 10 × 10 × 20 grid, to obtain the re-voxelized data. Each new voxel again carries its position and a feature dimension, i.e. it can be represented as (x3, y3, z3, c2), where (x3, y3, z3) are its coordinates and c2 the feature dimension. These voxel features are fed into a 3D neural network for convolution, which performs the result correction and outputs the final target detection result.
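A hedged sketch of this second-stage correction; the layer widths and the global-average pooling are assumptions, the essential point being that the re-voxelized features of one region are convolved and regressed into a corrected box and score:

```python
import torch
import torch.nn as nn

class ResultCorrectionHead(nn.Module):
    """Maps the re-voxelized features of one detected region to a corrected
    box (7 values: size, center, deflection angle) plus a confidence."""
    def __init__(self, in_channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(in_channels, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.head = nn.Linear(64, 8)          # 7 box parameters + 1 score

    def forward(self, region_voxels):         # (B, C, D, H, W), one region each
        x = self.conv(region_voxels)
        x = x.mean(dim=(2, 3, 4))              # pool the region to one vector
        out = self.head(x)
        return out[:, :7], out[:, 7]           # corrected box, reliability
```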
Specifically, in fig. 6 the computation may run in two threads, a main thread and an auxiliary one. One thread performs the multi-scale processing of the color (RGB) image to extract the multi-scale features; the other voxelizes the point cloud data into voxels to be complemented. In the main thread, the positional correspondence between the multi-scale features and the voxels is established through the coordinate projection matrix, and the features are concatenated accordingly. Finally, the concatenated fusion features are fed into the pre-trained 3D convolutional neural network, which judges whether points to be complemented exist in the corresponding voxels and completes them, and which performs target detection on the fusion data to produce the intermediate target detection result.
When performing target detection in a 3D scene, the server can thus first run preliminary detection on the concatenated voxel data, i.e. the fusion data, to obtain the intermediate target detection result, then replace the point cloud of the corresponding region with the completed point cloud data, and finally perform result correction on the replaced data to obtain the final target detection result.
In this target detection method, fusion data is first obtained from the point cloud data and the color image, so that an intermediate target detection result can be obtained by detection on the fusion data; fusing the point cloud features with the high-resolution color image improves the accuracy of this intermediate result. The point cloud is then completed according to the fusion data, producing the most helpful position information, namely the completed points; correcting the result on the region of the intermediate detection with the completed point cloud improves the final detection accuracy.
It should be understood that, although the steps in the flowchart of fig. 2 are displayed in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict ordering constraint, and the steps may be performed in other orders. Moreover, at least some of the steps in fig. 2 may comprise sub-steps or stages that are not necessarily executed at the same moment but may run at different times, and their order need not be sequential; they may run in turn or alternately with other steps or with sub-steps or stages of other steps.
In one embodiment, as shown in fig. 7, there is provided an object detection apparatus including: data acquisition module 100, fusion module 200, completion module 300, preliminary detection module 400, region acquisition module 500, and target detection module 600, wherein:
and the data acquisition module 100 is configured to acquire point cloud data and a corresponding color image.
And the fusion module 200 is configured to fuse the point cloud data and the corresponding color image to obtain fusion data.
And a completion module 300, configured to complete the point cloud data according to the fusion data to obtain completed point cloud data.
And the preliminary detection module 400 is configured to perform target detection according to the fusion data to obtain an intermediate target detection result.
And an area obtaining module 500, configured to obtain the complemented point cloud data of the area corresponding to the intermediate target detection result from the complemented point cloud data.
And the target detection module 600 is configured to correct the intermediate target detection result according to the acquired supplemented point cloud data to obtain a final target detection result.
In one embodiment, the preliminary detection module 400 includes:
and the preliminary target detection result acquisition unit is used for extracting the characteristics of the fusion data and carrying out target detection according to the extracted characteristics to obtain a preliminary target detection result and a measurement index corresponding to the preliminary target detection result.
And the first sequencing unit is used for sequencing the preliminary target detection results according to the measurement indexes corresponding to the preliminary target detection results.
And the intermediate target detection result acquisition unit is used for carrying out non-maximum suppression processing on the measurement indexes corresponding to the sorted primary target detection results to obtain an intermediate target detection result.
In one embodiment, the fusion module 200 may include:
and the multi-scale processing unit is used for carrying out multi-scale processing on the color image to obtain multi-scale features.
And the first dividing unit is used for dividing the point cloud data to obtain voxels to be complemented and recording the original characteristics of the voxels to be complemented.
And the splicing unit is used for splicing the multi-scale features into the corresponding original features to form spliced voxel data, and the spliced voxel data is used as fusion data.
In one embodiment, the first dividing unit may include:
and the initial voxel dividing subunit is used for dividing the point cloud data to obtain initial voxels, and marking the initial voxels comprising at least one point in the point cloud data.
And the selecting subunit is used for acquiring the initial voxel with the distance to the marked initial voxel smaller than a preset value as the voxel to be supplemented.
In one embodiment, the splicing unit may include:
and the coordinate projection matrix acquisition subunit is used for acquiring a coordinate projection matrix between the color image and the point cloud data and the initial coordinates of the central point of the voxel to be complemented.
And the conversion subunit is used for calculating the initial coordinate of the central point and the conversion coordinate in the color image according to the coordinate projection matrix.
And the multi-scale feature acquisition subunit is used for acquiring the multi-scale features of the pixels corresponding to the conversion coordinates.
And the splicing subunit is used for splicing the acquired multi-scale features into the corresponding original features of the voxels to be completed.
In one embodiment, the first dividing unit includes:
and the coordinate acquisition subunit is used for acquiring the coordinates of the points in the voxels to be complemented.
And the lidar reflection intensity acquisition subunit is used for querying the lidar reflection intensity of the points in the voxels to be complemented.
And the original characteristic calculating subunit is used for obtaining the original characteristic of the voxel to be complemented according to the coordinates of the points in the voxel to be complemented and the average value of the reflection intensity of the laser radar.
In one embodiment, the object detection module 600 may include:
and the second dividing unit is used for re-dividing the point cloud data of the region corresponding to the intermediate target detection result to obtain a new voxel.
And the final target detection result acquisition unit is used for extracting the characteristics according to the new voxels and correcting the result according to the extracted characteristics to obtain a final target detection result.
For specific limitations of the target detection device, reference may be made to the above limitations of the target detection method, which are not described herein again. The modules in the target detection device can be wholly or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 8. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing point cloud data and color images. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of object detection.
Those skilled in the art will appreciate that the architecture shown in fig. 8 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, there is provided a computer device comprising a memory storing a computer program and a processor implementing the following steps when the processor executes the computer program: acquiring point cloud data and a corresponding color image; fusing point cloud data and the corresponding color images to obtain fused data; completing the point cloud data according to the fusion data to obtain the completed point cloud data; carrying out target detection according to the fusion data to obtain an intermediate target detection result; acquiring supplemented point cloud data of an area corresponding to the intermediate target detection result from the supplemented point cloud data; and according to the acquired supplemented point cloud data, carrying out result correction on the intermediate target detection result to obtain a final target detection result.
In one embodiment, performing target detection according to the fusion data to obtain an intermediate target detection result, implemented when the processor executes the computer program, includes: extracting features of the fusion data, and performing target detection according to the extracted features to obtain preliminary target detection results and their corresponding measurement indexes; sorting the preliminary target detection results according to those measurement indexes; and performing non-maximum suppression on the measurement indexes of the sorted preliminary target detection results to obtain the intermediate target detection result.
In one embodiment, fusing the point cloud data and the corresponding color image to obtain fused data when the processor executes the computer program includes: performing multi-scale processing on the color image to obtain multi-scale features; dividing the point cloud data to obtain voxels to be supplemented, and recording the original characteristics of the voxels to be supplemented; and splicing the multi-scale features into the corresponding original features to form spliced voxel data, and taking the spliced voxel data as fusion data.
In one embodiment, dividing the point cloud data to obtain voxels to be complemented, implemented when the processor executes the computer program, includes: dividing the point cloud data to obtain initial voxels, and marking the initial voxels that contain at least one point of the point cloud data; and acquiring, as voxels to be complemented, the initial voxels whose distance to a marked initial voxel is smaller than a preset value.
In one embodiment, the stitching of the multi-scale features into corresponding original features to form stitched voxel data, which is implemented when the processor executes the computer program, includes: acquiring a coordinate projection matrix between the color image and the point cloud data and an initial coordinate of a central point of a voxel to be complemented; calculating an initial coordinate of the central point and a conversion coordinate in the color image according to the coordinate projection matrix; acquiring multi-scale characteristics of pixels corresponding to the conversion coordinates; and splicing the obtained multi-scale features into the corresponding original features of the voxels to be complemented.
In one embodiment, recording the original features of the voxels to be complemented, implemented when the processor executes the computer program, includes: acquiring the coordinates of the points in the voxels to be complemented; querying the lidar reflection intensity of those points; and obtaining the original features of the voxels to be complemented from the averages of the point coordinates and of the lidar reflection intensities.
In one embodiment, performing result correction on the intermediate target detection result according to the obtained supplemented point cloud data to obtain a final target detection result, implemented when the processor executes the computer program, includes: re-dividing the supplemented point cloud data of the area corresponding to the intermediate target detection result to obtain new voxels; and extracting features from the new voxels and correcting the result according to the extracted features to obtain the final target detection result.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of: acquiring point cloud data and a corresponding color image; fusing point cloud data and the corresponding color images to obtain fused data; completing the point cloud data according to the fusion data to obtain the completed point cloud data; carrying out target detection according to the fusion data to obtain an intermediate target detection result; acquiring supplemented point cloud data of an area corresponding to the intermediate target detection result from the supplemented point cloud data; and according to the acquired supplemented point cloud data, carrying out result correction on the intermediate target detection result to obtain a final target detection result.
In one embodiment, performing target detection according to the fusion data to obtain an intermediate target detection result, implemented when the computer program is executed by the processor, includes: extracting features of the fusion data, and performing target detection according to the extracted features to obtain preliminary target detection results and their corresponding measurement indexes; sorting the preliminary target detection results according to those measurement indexes; and performing non-maximum suppression on the measurement indexes of the sorted preliminary target detection results to obtain the intermediate target detection result.
In one embodiment, fusing point cloud data and a corresponding color image to obtain fused data, which is implemented when the computer program is executed by the processor, includes: performing multi-scale processing on the color image to obtain multi-scale features; dividing the point cloud data to obtain voxels to be supplemented, and recording the original characteristics of the voxels to be supplemented; and splicing the multi-scale features into the corresponding original features to form spliced voxel data, and taking the spliced voxel data as fusion data.
In one embodiment, dividing the point cloud data to obtain voxels to be complemented, implemented when the computer program is executed by the processor, includes: dividing the point cloud data to obtain initial voxels, and marking the initial voxels that contain at least one point of the point cloud data; and acquiring, as voxels to be complemented, the initial voxels whose distance to a marked initial voxel is smaller than a preset value.
In one embodiment, splicing the multi-scale features into the corresponding original features to form spliced voxel data, as implemented when the computer program is executed by the processor, includes: acquiring a coordinate projection matrix between the color image and the point cloud data, and the initial coordinate of the central point of each voxel to be complemented; calculating, according to the coordinate projection matrix, the conversion coordinate in the color image corresponding to the initial coordinate of the central point; acquiring the multi-scale features of the pixel corresponding to the conversion coordinate; and splicing the acquired multi-scale features into the original features of the corresponding voxel to be complemented.
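By way of illustration and not limitation, a sketch of projecting voxel centers through a 3×4 coordinate projection matrix and reading the multi-scale features at the converted coordinates; nearest-pixel lookup is an assumption, and an implementation might interpolate instead:

```python
import numpy as np

def project_centers(centers_xyz, P):
    """Map (N, 3) voxel-center coordinates to pixel coordinates with a
    3x4 projection matrix P, via homogeneous coordinates."""
    homog = np.hstack([centers_xyz, np.ones((len(centers_xyz), 1))])  # (N, 4)
    uvw = homog @ P.T                                                 # (N, 3)
    return uvw[:, :2] / uvw[:, 2:3]                                   # (u, v)

def gather_pixel_features(feature_map, uv):
    """Nearest-pixel lookup of an (H, W, C) feature map at each (u, v);
    the gathered rows are then spliced onto the voxels' original features."""
    h, w = feature_map.shape[:2]
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, w - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, h - 1)
    return feature_map[v, u]
```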
In one embodiment, recording original features of a voxel to be complemented, as implemented when the computer program is executed by the processor, comprises: acquiring coordinates of points in the voxel to be complemented; querying the laser radar reflection intensity of the points in the voxel to be complemented; and obtaining the original features of the voxel to be complemented according to the coordinates of the points in the voxel to be complemented and the average value of the laser radar reflection intensity.
In one embodiment, performing result correction on the intermediate target detection result according to the acquired supplemented point cloud data to obtain a final target detection result, as implemented when the computer program is executed by the processor, includes: re-dividing the supplemented point cloud data of the area corresponding to the intermediate target detection result to obtain new voxels; and extracting features from the new voxels, and correcting the intermediate target detection result according to the extracted features to obtain the final target detection result.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing related hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to be within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the invention patent. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A method of target detection, the method comprising:
acquiring point cloud data and a corresponding color image;
fusing the point cloud data and the corresponding color image to obtain fused data;
completing the point cloud data according to the fused data to obtain supplemented point cloud data;
performing target detection according to the fused data to obtain an intermediate target detection result;
acquiring supplemented point cloud data of an area corresponding to the intermediate target detection result from the supplemented point cloud data;
and according to the acquired supplemented point cloud data, carrying out result correction on the intermediate target detection result to obtain a final target detection result.
2. The method according to claim 1, wherein the performing target detection according to the fused data to obtain an intermediate target detection result comprises:
extracting features from the fused data, and performing target detection according to the extracted features to obtain preliminary target detection results and measurement indexes corresponding to the preliminary target detection results;
sorting the preliminary target detection results according to the measurement indexes corresponding to the preliminary target detection results;
and performing non-maximum suppression on the sorted preliminary target detection results according to the measurement indexes to obtain the intermediate target detection result.
3. The method according to claim 1 or 2, wherein the fusing the point cloud data and the corresponding color image to obtain fused data comprises:
performing multi-scale processing on the color image to obtain multi-scale features;
dividing the point cloud data to obtain voxels to be complemented, and recording original features of the voxels to be complemented;
and splicing the multi-scale features into the corresponding original features to form spliced voxel data, and taking the spliced voxel data as the fused data.
4. The method of claim 3, wherein the dividing the point cloud data to obtain voxels to be complemented comprises:
dividing the point cloud data to obtain initial voxels, and marking the initial voxels comprising at least one point in the point cloud data;
and acquiring an initial voxel whose distance to a marked initial voxel is smaller than a preset value as a voxel to be complemented.
5. The method according to claim 3, wherein the splicing the multi-scale features into the corresponding original features to form spliced voxel data comprises:
acquiring a coordinate projection matrix between the color image and the point cloud data, and an initial coordinate of a central point of the voxel to be complemented;
calculating, according to the coordinate projection matrix, a conversion coordinate in the color image corresponding to the initial coordinate of the central point;
acquiring multi-scale features of a pixel corresponding to the conversion coordinate;
and splicing the acquired multi-scale features into the original features of the corresponding voxel to be complemented.
6. The method according to claim 3, wherein the recording original features of the voxels to be complemented comprises:
acquiring coordinates of points in the voxel to be complemented;
querying the laser radar reflection intensity of the points in the voxel to be complemented;
and obtaining the original features of the voxel to be complemented according to the coordinates of the points in the voxel to be complemented and the average value of the laser radar reflection intensity.
7. The method according to claim 1 or 2, wherein the performing result correction on the intermediate target detection result according to the obtained supplemented point cloud data to obtain a final target detection result comprises:
re-dividing the supplemented point cloud data of the area corresponding to the intermediate target detection result to obtain new voxels;
and extracting features from the new voxels, and correcting the intermediate target detection result according to the extracted features to obtain the final target detection result.
8. An object detection apparatus, characterized in that the apparatus comprises:
the data acquisition module is used for acquiring point cloud data and a corresponding color image;
the fusion module is used for fusing the point cloud data and the corresponding color image to obtain fused data;
the completion module is used for completing the point cloud data according to the fused data to obtain supplemented point cloud data;
the preliminary detection module is used for performing target detection according to the fused data to obtain an intermediate target detection result;
the area acquisition module is used for acquiring the supplemented point cloud data of the area corresponding to the intermediate target detection result from the supplemented point cloud data;
and the target detection result correction module is used for correcting the result of the intermediate target detection result according to the acquired supplemented point cloud data to obtain a final target detection result.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN201911081532.1A 2019-11-07 2019-11-07 Target detection method, target detection device, computer equipment and storage medium Pending CN110827202A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911081532.1A CN110827202A (en) 2019-11-07 2019-11-07 Target detection method, target detection device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911081532.1A CN110827202A (en) 2019-11-07 2019-11-07 Target detection method, target detection device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110827202A true CN110827202A (en) 2020-02-21

Family

ID=69553155

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911081532.1A Pending CN110827202A (en) 2019-11-07 2019-11-07 Target detection method, target detection device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110827202A (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2016224674A (en) * 2015-05-29 2016-12-28 株式会社日立製作所 Point cloud data modeling device and point cloud data modeling method
CN107316325A (en) * 2017-06-07 2017-11-03 华南理工大学 A kind of airborne laser point cloud based on image registration and Image registration fusion method
EP3506203A1 (en) * 2017-12-29 2019-07-03 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for fusing point cloud data technical field
CN108198145A (en) * 2017-12-29 2018-06-22 百度在线网络技术(北京)有限公司 For the method and apparatus of point cloud data reparation
CN108596170A (en) * 2018-03-22 2018-09-28 杭州电子科技大学 A kind of object detection method of adaptive non-maximum restraining
CN108564527A (en) * 2018-04-04 2018-09-21 百度在线网络技术(北京)有限公司 The method and device of the completion of panorama sketch content and reparation based on neural network
CN110400363A (en) * 2018-04-24 2019-11-01 北京京东尚科信息技术有限公司 Map constructing method and device based on laser point cloud
CN109948661A (en) * 2019-02-27 2019-06-28 江苏大学 A 3D vehicle detection method based on multi-sensor fusion
CN110008843A (en) * 2019-03-11 2019-07-12 武汉环宇智行科技有限公司 Combine cognitive approach and system based on the vehicle target of cloud and image data
CN110047144A (en) * 2019-04-01 2019-07-23 西安电子科技大学 A kind of complete object real-time three-dimensional method for reconstructing based on Kinectv2
CN109920011A (en) * 2019-05-16 2019-06-21 长沙智能驾驶研究院有限公司 External parameter calibration method, device and equipment for lidar and binocular camera
CN110033621A (en) * 2019-05-22 2019-07-19 北京经纬恒润科技有限公司 A kind of hazardous vehicles detection method, apparatus and system
CN110363820A (en) * 2019-06-28 2019-10-22 东南大学 A target detection method based on lidar and image pre-fusion

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ZHENG, H. et al.: "Feature detection method for small targets of complex multimedia images in cloud environment", vol. 16, pages 17095-17112 *
XU, Yong et al.: "Feature-based fusion of airborne laser point cloud and image data", vol. 22, no. 22, pages 607-610 *
LU, Feng et al.: "Obstacle detection method for intelligent vehicles based on information fusion", pages 120-124 *

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113378605B (en) * 2020-03-10 2024-04-09 北京京东乾石科技有限公司 Multi-source information fusion method and device, electronic equipment and storage medium
CN113378605A (en) * 2020-03-10 2021-09-10 北京京东乾石科技有限公司 Multi-source information fusion method and device, electronic equipment and storage medium
CN111444839A (en) * 2020-03-26 2020-07-24 北京经纬恒润科技有限公司 Target detection method and system based on laser radar
CN111444839B (en) * 2020-03-26 2023-09-08 北京经纬恒润科技股份有限公司 Target detection method and system based on laser radar
CN111523600A (en) * 2020-04-26 2020-08-11 上海商汤临港智能科技有限公司 Method and device for neural network training, target detection and intelligent equipment control
CN111523600B (en) * 2020-04-26 2023-12-19 上海商汤临港智能科技有限公司 Neural network training, target detection and intelligent device control method and device
CN111553859A (en) * 2020-04-29 2020-08-18 清华大学 A method and system for complementing reflection intensity of lidar point cloud
CN111553859B (en) * 2020-04-29 2020-12-01 清华大学 A method and system for complementing reflection intensity of lidar point cloud
US11099275B1 (en) 2020-04-29 2021-08-24 Tsinghua University LiDAR point cloud reflection intensity complementation method and system
CN112740269B (en) * 2020-05-13 2022-04-08 华为技术有限公司 A target detection method and device
CN112740269A (en) * 2020-05-13 2021-04-30 华为技术有限公司 A target detection method and device
US12062138B2 (en) 2020-05-13 2024-08-13 Huawei Technologies Co., Ltd. Target detection method and apparatus
WO2022017140A1 (en) * 2020-07-24 2022-01-27 浙江商汤科技开发有限公司 Target detection method and apparatus, electronic device, and storage medium
CN111860373B (en) * 2020-07-24 2022-05-20 浙江商汤科技开发有限公司 Target detection method and device, electronic equipment and storage medium
CN111860373A (en) * 2020-07-24 2020-10-30 浙江商汤科技开发有限公司 Target detection method and device, electronic equipment and storage medium
US11315271B2 (en) * 2020-09-30 2022-04-26 Tsinghua University Point cloud intensity completion method and system based on semantic segmentation
US12217518B2 (en) * 2020-11-26 2025-02-04 Samsung Electronics Co., Ltd. Method and apparatus with object detection
US20220164565A1 (en) * 2020-11-26 2022-05-26 Samsung Electronics Co., Ltd. Method and apparatus with object detection
CN114758200A (en) * 2020-12-29 2022-07-15 北京万集科技股份有限公司 Multi-sensing data fusion method, multi-source fusion perception system and computer equipment
CN113900119A (en) * 2021-09-29 2022-01-07 苏州浪潮智能科技有限公司 A method, system, storage medium and device for lidar vehicle detection
CN113900119B (en) * 2021-09-29 2024-01-30 苏州浪潮智能科技有限公司 Method, system, storage medium and equipment for laser radar vehicle detection
CN114219855A (en) * 2021-11-17 2022-03-22 深圳有象智联科技有限公司 Point cloud normal vector estimation method and device, computer equipment and storage medium
CN114004972A (en) * 2021-12-03 2022-02-01 京东鲲鹏(江苏)科技有限公司 Image semantic segmentation method, device, equipment and storage medium
CN116259029B (en) * 2023-05-15 2023-08-15 小米汽车科技有限公司 Target detection method and device and vehicle
CN116259029A (en) * 2023-05-15 2023-06-13 小米汽车科技有限公司 Target detection method and device and vehicle

Similar Documents

Publication Publication Date Title
CN110827202A (en) Target detection method, target detection device, computer equipment and storage medium
CN110852949B (en) Point cloud data completion method and device, computer equipment and storage medium
JP7033373B2 (en) Target detection method and device, smart operation method, device and storage medium
CN110458112B (en) Vehicle detection method and device, computer equipment and readable storage medium
CN111353969B (en) Method and device for determining road drivable area and computer equipment
CN109960742B (en) Local information searching method and device
CN107633526B (en) Image tracking point acquisition method and device and storage medium
CN111814794B (en) Text detection method and device, electronic equipment and storage medium
CN111523414A (en) Face recognition method and device, computer equipment and storage medium
CN111199206A (en) Three-dimensional target detection method and device, computer equipment and storage medium
CN110516541B (en) Text positioning method and device, computer readable storage medium and computer equipment
CN109033989B (en) Target identification method and device based on three-dimensional point cloud and storage medium
CN109584327B (en) Face aging simulation method, device and equipment
CN115240168A (en) Perception result obtaining method and device, computer equipment and storage medium
CN114689036B (en) Map updating method, automatic driving method, electronic device and storage medium
CN111444923A (en) Method and device for image semantic segmentation in natural scenes
CN112733672A (en) Monocular camera-based three-dimensional target detection method and device and computer equipment
CN111009011B (en) Method, device, system and storage medium for predicting vehicle direction angle
KR20220093187A (en) Positioning method and apparatus, electronic device, computer readable storage medium
US20220230453A1 (en) Method and apparatus for generating zebra crossing in high resolution map, and electronic device
CN114119992A (en) Multi-mode three-dimensional target detection method and device based on image and point cloud fusion
CN116740669B (en) Multi-view image detection method, device, computer equipment and storage medium
CN116977671A (en) Target tracking method, device, equipment and storage medium based on image space positioning
CN113808033A (en) Image document correction method, system, terminal and medium
CN113033578B (en) Image calibration method, system, terminal and medium based on multi-scale feature matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandoning: 20240322