CN116883964A - Target object detection method, device, equipment and storage medium - Google Patents
Target object detection method, device, equipment and storage medium
- Publication number
- CN116883964A, CN202310756869.8A, CN202310756869A
- Authority
- CN
- China
- Prior art keywords
- point
- image
- points
- target object
- feature vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/10—Image acquisition
- G06V10/16—Image acquisition using multiple overlapping images; Image stitching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Image Processing (AREA)
Abstract
The application provides a target object detection method, device, equipment and storage medium, and relates to the technical field of data processing. The method comprises the following steps: acquiring an image set and acquiring point cloud data acquired by a laser radar; the image set comprises a plurality of images; the point cloud data comprises a plurality of points, and the points have characteristic data; determining an image corresponding to the point from the image set; the image corresponding to the point is an image on which the projection point corresponding to the point falls within a preset area range, and the projection point is the position point at which the point is projected onto the image; determining a first feature vector of the point according to the feature data of the point and the image corresponding to the point; the first feature vector characterizes data obtained by fusing the point with the multi-scale image features of the image corresponding to the point; and detecting the first feature vector of each point to obtain the information of the target object. The method reduces the data calculation amount in the subsequent detection, improves the detection efficiency and improves the detection precision.
Description
Technical Field
The present application relates to data processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for detecting a target object.
Background
In an automatic driving perception system, 3D object detection plays an important role; its task is mainly to detect objects of interest in an application scene and determine information such as their position and category, so as to realize truly automatic driving.
In the prior art, 3D target detection algorithms perform environment perception using either a camera or a laser radar alone. The camera can provide rich texture and color information, but it cannot directly acquire depth information and has difficulty making an accurate depth estimate of a target object, so the final detection precision is low. Although the laser radar can provide accurate distance and position information, it lacks the semantic information that a camera can acquire, produces more false detections, and its detection results are unsatisfactory. The existing fusion schemes based on the camera and the laser radar cannot effectively bridge the gap between the camera data and the laser radar data when data fusion is carried out, so the data calculation amount is too large, the time consumption is too long, and the detection precision is not high enough.
Therefore, a new method for detecting a target object is needed that effectively bridges the gap between the different data modalities of the camera and the laser radar, so as to improve the detection precision and the detection efficiency.
Disclosure of Invention
The application provides a target object detection method, device, equipment and storage medium, which are used for solving the problem of how to improve the detection precision and detection efficiency of a target object.
In a first aspect, the present application provides a method for detecting a target object, the method comprising:
acquiring an image set and acquiring point cloud data acquired by a laser radar; the image set comprises a plurality of images, wherein the image set is a set of images obtained by shooting a target scene by cameras at a plurality of different positions on a vehicle, and the target scene comprises a target object; the point cloud data are data obtained by scanning the target scene by a laser radar on the vehicle, wherein the point cloud data comprise a plurality of points, and the points have characteristic data;
determining an image corresponding to the point from the image set; the image corresponding to the point is an image on which the projection point corresponding to the point falls within a preset area range, and the projection point is the position point at which the point is projected onto the image;
determining a first feature vector of the point according to the feature data of the point and the image corresponding to the point; the first feature vector characterizes data obtained by fusing the points and the multi-scale image features of the image corresponding to the points;
And detecting the first feature vector of each point to obtain the information of the target object.
Optionally, determining the first feature vector of the point according to the feature data of the point and the image corresponding to the point includes:
processing the image corresponding to the point to obtain a second feature vector corresponding to the point; the second feature vector characterizes the mean value of the multi-scale image features, within the image corresponding to the point, at the projection point corresponding to the point;
and performing splicing processing on the characteristic data of the points and the second characteristic vectors corresponding to the points to obtain first characteristic vectors of the points.
Optionally, processing the image corresponding to the point to obtain a second feature vector corresponding to the point, including:
carrying out normalization processing on projection points of the points on the image corresponding to the points to obtain normalized projection points corresponding to the points;
performing feature extraction processing on the image corresponding to the point to obtain multi-scale image features of the image corresponding to the point;
and determining a second feature vector corresponding to the point according to the two-dimensional coordinates of the normalized projection point corresponding to the point and the multi-scale image feature of the image corresponding to the point.
Optionally, determining the second feature vector corresponding to the point according to the two-dimensional coordinates of the normalized projection point corresponding to the point and the multi-scale image feature of the image corresponding to the point includes:
determining the image characteristics of the normalized projection points corresponding to the points in the images corresponding to the points with different scales by using an interpolation function based on the two-dimensional coordinates of the normalized projection points corresponding to the points and the multi-scale image characteristics of the images corresponding to the points;
and carrying out mean processing on the image features of the normalized projection points corresponding to the points in the images corresponding to the points with different scales, and determining the image features after mean processing as second feature vectors corresponding to the points.
Optionally, determining the image corresponding to the point from the image set includes:
acquiring two-dimensional coordinates of projection points of the points on each image according to the characteristic data of the points and a preset laser radar-to-camera transformation matrix corresponding to each image;
and determining an image, in which the two-dimensional coordinates of the projection point fall within a preset range, as the image corresponding to the point.
Optionally, detecting the first feature vector of each point to obtain information of the target object, including:
Inputting the first feature vector of each point into a preset point cloud detection network model, and outputting the information of the target object; the preset point cloud detection network model is a pre-trained model for determining information of the target object.
Optionally, the information of the target object includes one or more of: a rectangular frame of the target object, a category of the target object, a position of the target object, a size of the target object, attitude information of the target object, a moving speed of the target object, and information on the number of target objects.
In a second aspect, the present application provides a detection apparatus for a target object, the apparatus comprising:
the acquisition unit is used for acquiring an image set and acquiring point cloud data acquired by the laser radar; the image set comprises a plurality of images, wherein the image set is a set of images obtained by shooting a target scene by cameras at a plurality of different positions on a vehicle, and the target scene comprises a target object; the point cloud data are data obtained by scanning the target scene by a laser radar on the vehicle, wherein the point cloud data comprise a plurality of points, and the points have characteristic data;
The matching unit is used for determining an image corresponding to the point from the image set; the image corresponding to the point is an image on which the projection point corresponding to the point falls within a preset area range, and the projection point is the position point at which the point is projected onto the image.
the processing unit is used for determining a first feature vector of the point according to the feature data of the point and the image corresponding to the point; the first feature vector characterizes data obtained by fusing the points and the multi-scale image features of the image corresponding to the points;
and the detection unit is used for carrying out detection processing on the first feature vector of each point to obtain the information of the target object.
Optionally, the processing unit includes a processing module and a splicing module;
the processing module is used for processing the image corresponding to the point to obtain a second feature vector corresponding to the point; the second feature vector characterizes the mean value of the multi-scale image features, within the image corresponding to the point, at the projection point corresponding to the point;
and the splicing module is used for carrying out splicing processing on the characteristic data of the point and the second characteristic vector corresponding to the point to obtain a first characteristic vector of the point.
Optionally, the processing module includes a processing sub-module, a feature extraction module, and a determination module;
the processing submodule is used for carrying out normalization processing on the projection points of the points on the image corresponding to the points to obtain normalized projection points corresponding to the points;
the feature extraction module is used for carrying out feature extraction processing on the image corresponding to the point to obtain multi-scale image features of the image corresponding to the point;
the determining module is configured to determine a second feature vector corresponding to the point according to the two-dimensional coordinates of the normalized projection point corresponding to the point and the multi-scale image feature of the image corresponding to the point.
Optionally, the determining module includes a first determining module and a second determining module;
the first determining module is configured to determine, by using an interpolation function, an image feature of the normalized projection point corresponding to the point in the image corresponding to each point of different scales based on the two-dimensional coordinates of the normalized projection point corresponding to the point and the multi-scale image feature of the image corresponding to the point;
the second determining module is configured to perform mean processing on image features of the normalized projection points corresponding to the points in the images corresponding to the points with different scales, and determine the image features after mean processing as second feature vectors corresponding to the points.
Optionally, the matching unit includes a calculation module and a determination module;
the calculation module is used for obtaining the two-dimensional coordinates of the projection point of the point on each image according to the characteristic data of the point and the preset laser radar-to-camera transformation matrix corresponding to each image.
The determination module is used for determining an image, in which the two-dimensional coordinates of the projection point fall within a preset range, as the image corresponding to the point.
Optionally, the detection unit is specifically configured to input a first feature vector of each point into a preset point cloud detection network model, and output information of the target object; the preset point cloud detection network model is a pre-trained model for determining information of the target object.
Optionally, the information of the target object includes one or more of: a rectangular frame of the target object, a category of the target object, a position of the target object, a size of the target object, attitude information of the target object, a moving speed of the target object, and information on the number of target objects.
In a third aspect, the present application provides an electronic device comprising: a processor, and a memory communicatively coupled to the processor;
The memory stores computer-executable instructions;
the processor executes computer-executable instructions stored by the memory to implement the method of any one of the preceding claims.
In a fourth aspect, the present application provides a computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, are adapted to carry out the method of any preceding claim.
In a fifth aspect, the application provides a computer program product comprising a computer program for implementing a method as in any of the preceding claims when executed by a processor.
The application provides a target object detection method, a target object detection device, target object detection equipment and a storage medium, wherein the target object detection method comprises the following steps: acquiring an image set and acquiring point cloud data acquired by a laser radar; the image set comprises a plurality of images, wherein the image set is a set of images obtained by shooting a target scene by cameras at a plurality of different positions on a vehicle, and the target scene comprises a target object; the point cloud data are data obtained by scanning the target scene by a laser radar on the vehicle, wherein the point cloud data comprise a plurality of points, and the points have characteristic data; determining an image corresponding to the point from the image set; the image corresponding to the point is an image on which the projection point corresponding to the point falls within a preset area range, and the projection point is the position point at which the point is projected onto the image; determining a first feature vector of the point according to the feature data of the point and the image corresponding to the point; the first feature vector characterizes data obtained by fusing the point with the multi-scale image features of the image corresponding to the point; and detecting the first feature vector of each point to obtain the information of the target object. On the one hand, the processed image data comprise images acquired by a plurality of cameras facing different directions, so the coverage is larger, more image information is included, and more image features are extracted; the images in the image set are subjected to multi-scale feature extraction processing, which provides a richer data basis for the detection of subsequent target objects. On the other hand, the method fuses the image set acquired by the cameras with the point cloud data acquired by the laser radar and detects the target object according to the fused data; this early-fusion approach reduces the data calculation amount in the subsequent detection, improves the detection efficiency, stably and effectively bridges the gap between the two heterogeneous data modalities of the camera and the laser radar, and improves the detection precision.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
Fig. 1 is a flow chart of a method for detecting a target object according to an embodiment of the present application;
fig. 2 is a schematic diagram of image distribution acquired by a camera on a vehicle according to an embodiment of the present application;
FIG. 3 is a block diagram of an architecture for implementing the scheme of the present application according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a target object detection device according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a detection device for a target object according to another embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Specific embodiments of the present application have been shown by way of the above drawings and will be described in more detail below. The drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but rather to illustrate the inventive concepts to those skilled in the art by reference to the specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented, for example, in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In an automatic driving perception system, 3D object detection plays an important role; its task is mainly to detect objects of interest in an application scene and determine information such as their position and category, so as to realize truly automatic driving.
In the current implementations, the main idea is to expand the features of the 2D (two-dimensional) image acquired by the camera into the 3D (three-dimensional) space, mainly including schemes based on 2D-3D depth prediction and schemes based on 3D-2D lidar-to-camera projection sampling.
In the 2D-3D depth prediction-based scheme, for example, BEVDepth uses LSS to carry out depth estimation on an image under the camera coordinate system and uses lidar information to supervise the depth estimation, then converts the obtained depth distribution under the camera coordinate system into the lidar coordinate system through a camera-to-lidar transform, and finally completes 3D target detection with an existing lidar point cloud 3D detection algorithm. This scheme mainly utilizes the image information acquired by the camera, the prediction error of the object position is large, and the detection accuracy mAP is relatively low.
In the 3D-2D lidar-to-camera projection sampling-based scheme, for example, DETR3D obtains a 3D feature query by sampling image features from a 3D reference point through lidar-to-camera (lidar2camera) projection, implicitly realizes object depth estimation, and then realizes 3D target detection through a classification head and a regression head. This scheme mainly utilizes laser radar information; the loss of image depth information caused by perspective projection leads to a large depth estimation error, and the lack of processing of color and texture information not only causes a large detection position deviation but also causes more false recognitions.
In the two schemes, although two complementary sensors, namely a camera and a laser radar, are used (the camera can provide rich texture and color information, and the laser radar can provide accurate distance and position information), the data information provided by the camera and the laser radar is not effectively utilized, so the overall data calculation amount is too large, the time consumption is too long, the efficiency requirement of automatic driving is far from being met, and the detection precision is not high enough.
In order to solve the above problems, the application provides a method for detecting a target object, which comprises the steps of acquiring a plurality of images through a plurality of cameras positioned at different positions on an automobile, acquiring point cloud data through a laser radar on the automobile, extracting multi-scale image features through an image feature extractor, projecting each point in the point cloud data onto the images through a laser radar-to-camera conversion (lidar2camera transform) to obtain the corresponding multi-scale image features, processing the corresponding multi-scale image features and splicing them with the point cloud data to obtain enhanced point cloud data, and finally carrying out 3D target detection on the enhanced point cloud data through a point cloud detection network model. Because multiple cameras are used, the acquired images are comprehensive enough, and because the point cloud data and the images are fused before target detection is performed, the data calculation amount during detection is reduced, the detection efficiency is improved, and the detection precision is also improved.
The following describes the technical scheme of the present application and how the technical scheme of the present application solves the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 1 is a flow chart of a method for detecting a target object according to an embodiment of the present application. The execution subject of the embodiment of the application can be a detection device of a target object; the detection device can be located on electronic equipment, which can be a mobile terminal such as a mobile phone, a tablet or a computer, and the detection device can also be located on a vehicle such as an automobile. The embodiment of the application will be described in detail by taking as an example the case where the execution subject is a detection device of a target object applied to a vehicle.
As shown in fig. 1, the method for detecting a target object provided in this embodiment includes:
s101, acquiring an image set and acquiring point cloud data acquired by a laser radar; the image set comprises a plurality of images, wherein the image set is a set of images obtained by shooting a target scene by cameras at a plurality of different positions on a vehicle, and the target scene comprises a target object; the point cloud data are data obtained by scanning a target scene by a laser radar on a vehicle, and the point cloud data comprise a plurality of points which have characteristic data.
For example, cameras may be provided at a plurality of different positions on the vehicle, respectively, according to actual detection range requirements. The detection device of the target object acquires images obtained by shooting a target scene by a plurality of cameras at different positions so as to obtain an image set comprising a plurality of images.
The application does not limit the number and the distribution positions of the cameras. Illustratively, fig. 2 is a schematic diagram of the distribution of images acquired by the cameras on a vehicle according to an embodiment of the present application. As shown in fig. 2, at least 6 cameras may be provided on the vehicle for acquiring images of the surroundings of the vehicle, respectively, to obtain an image set including at least the images 1 to 6.
The vehicle is further provided with a laser radar; while the plurality of cameras shoot images of the target scene from different directions and at different viewing angles, the laser radar scans the target scene, which contains the target object, so the laser radar can acquire point cloud data of the target object when scanning the target scene. The point cloud data comprises a plurality of points scanned by the laser radar, and each point has corresponding characteristic data. For example, the feature data may include three-dimensional coordinate information of the point, color information, reflection intensity information, echo number information, and the like, and the present application is not limited thereto. In order to detect the target object, the detection device of the target object therefore acquires not only the image set but also the point cloud data acquired by the laser radar.
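By way of illustration only, the acquired data can be organized as in the following sketch; the array shapes, the number of cameras, and the (x, y, z, i) field order are assumptions made for demonstration and are not mandated by this application.

```python
import numpy as np

# Hypothetical containers for the acquired data (all shapes are assumptions).
# Image set: one RGB image per camera, e.g. six surround-view cameras.
image_set = [np.zeros((900, 1600, 3), dtype=np.uint8) for _ in range(6)]

# Point cloud: N points, each with feature data. Here the feature data is
# assumed to be (x, y, z, reflection intensity i); additional fields such as
# color or echo number could be appended as extra columns.
point_cloud = np.zeros((120_000, 4), dtype=np.float32)
```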
S102, determining an image corresponding to the point from the image set; the image corresponding to the point is an image on which the projection point corresponding to the point falls within a preset area range, and the projection point is the position point at which the point is projected onto the image.
Illustratively, after acquiring the image set and the point cloud data, the detection device of the target object of the present application determines an image corresponding to each point from the image set including a plurality of images. The image corresponding to the point is an image on which the projection point corresponding to the point falls within a preset area range, and the projection point is the position point at which the point is projected onto the image. Therefore, each point can be projected onto the acquired multiple images in turn, the position of the projection point of the point on each image is determined, and the image on which the projection point falls within the preset area range is determined as the image corresponding to the point. If a point can be projected onto a plurality of images, the image on which the projection point lies closest to the image center can be selected as the image corresponding to the point.
In one example, determining an image corresponding to a point from the image set may also include:
s1021, obtaining the two-dimensional coordinates of the projection point of the point on each image according to the characteristic data of the point and the preset laser radar-to-camera transformation matrix corresponding to each image.
S1022, determining an image, in which the two-dimensional coordinates of the projection point fall within a preset range, as the image corresponding to the point.
For example, when determining the image corresponding to the point, the two-dimensional coordinates of the projection point of the point on each image may be obtained according to the feature data of the point and the preset laser radar-to-camera transformation matrix corresponding to each image, and the image in which the two-dimensional coordinates of the projection point fall within the preset range may then be determined as the image corresponding to the point. Taking a certain point in the point cloud data as an example: according to the characteristic data of the point and the preset laser radar-to-camera transformation matrix 1 corresponding to image 1, the two-dimensional coordinates of the projection point of the point on image 1 are calculated, and it is determined whether these coordinates are within the preset range; if so, image 1 is determined to be the image corresponding to the point. If the two-dimensional coordinates of the projection point of the point on image 1 are not within the preset range, the two-dimensional coordinates of the projection point of the point on image 2 are then calculated according to the characteristic data of the point and the preset laser radar-to-camera transformation matrix 2 corresponding to image 2, it is determined whether those coordinates are within the preset range, and so on, until the image corresponding to the point is determined from the image set. The processing of each of the other points is similar until the image corresponding to each point is determined.
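A minimal sketch of this per-image projection test is given below. It assumes a 4x4 homogeneous laser radar-to-camera transformation matrix and a 3x3 pinhole intrinsic matrix per image, and uses the full image bounds as the preset range; these choices are illustrative assumptions, not the claimed parameters.

```python
import numpy as np

def find_image_for_point(point_xyz, lidar2cam_list, intrinsics_list, image_hw):
    """Return the index of the first image whose projection of the point falls
    inside the preset range (assumed here to be the image bounds), plus the
    two-dimensional coordinates (u, v) of the projection point."""
    p_h = np.append(point_xyz, 1.0)    # homogeneous point in the lidar frame
    h, w = image_hw
    for idx, (T, K) in enumerate(zip(lidar2cam_list, intrinsics_list)):
        p_cam = T @ p_h                # point expressed in this camera's frame
        if p_cam[2] <= 0:              # behind the camera: cannot project
            continue
        uvw = K @ p_cam[:3]
        u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]
        if 0 <= u < w and 0 <= v < h:  # projection point within the preset range
            return idx, (u, v)
    return None, None                  # the point is not visible in any image
```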
S103, determining a first feature vector of the point according to the feature data of the point and the image corresponding to the point; the first feature vector characterizes data obtained by fusing the points and the multi-scale image features of the image corresponding to the points.
After determining the image corresponding to each point, the detection device of the target object of the present application performs feature fusion according to the feature data of each point and the image corresponding to the point, so as to determine a first feature vector of each point, where the first feature vector characterizes the data after the fusion of the multi-scale image features of the image corresponding to the point and the point.
The application does not limit how to determine the first feature vector of the point according to the feature data of the point and the image corresponding to the point. In one example, determining the first feature vector for the point based on the feature data for the point and the image corresponding to the point may include:
s1031, processing the image corresponding to the point to obtain a second feature vector corresponding to the point; the second feature vector characterizes the mean value of the multi-scale image features, within the image corresponding to the point, at the projection point corresponding to the point.
S1032, performing splicing processing on the point characteristic data and the second characteristic vector corresponding to the point to obtain a first characteristic vector of the point.
For example, the image corresponding to the point may be processed first to obtain the second feature vector corresponding to the point, that is, to determine the average value of the multi-scale image features of the projection point corresponding to the point in the image corresponding to the point. Specifically, the image corresponding to the point is processed to obtain its multi-scale image features, the image features of the projection point corresponding to the point are then determined at each of the different scales, and these multi-scale image features are averaged to obtain the second feature vector. Then, the feature data of the point and the second feature vector corresponding to the point are spliced, for example, the feature data of the point is spliced in front of or behind the second feature vector corresponding to the point, so as to obtain the first feature vector of the point.
Illustratively, processing the image corresponding to the point to obtain a second feature vector corresponding to the point may include:
s1, carrying out normalization processing on projection points of the points on the image corresponding to the points to obtain normalized projection points corresponding to the points.
S2, performing feature extraction processing on the image corresponding to the point to obtain multi-scale image features of the image corresponding to the point.
S3, determining a second feature vector corresponding to the point according to the two-dimensional coordinates of the normalized projection point corresponding to the point and the multi-scale image feature of the image corresponding to the point.
For example, when processing the image corresponding to the point, the projection point of the point on the image corresponding to the point may be determined first, and normalization processing is then performed on that projection point to obtain the normalized projection point corresponding to the point. In practical application, the two-dimensional coordinates of the projection point of each point on the image corresponding to the point can be determined directly according to the characteristic data of each point and the preset laser radar-to-camera transformation matrix corresponding to the image corresponding to the point, and the two-dimensional coordinates of the projection point are then normalized to obtain the normalized two-dimensional coordinates of the projection point of each point on the image corresponding to the point. Meanwhile, the images corresponding to the points are input into an image feature extractor, and feature extraction processing is carried out on the image corresponding to each point so as to obtain the multi-scale image features of the image corresponding to each point. The second feature vector corresponding to the point is then determined according to the two-dimensional coordinates of the normalized projection point corresponding to the point and the multi-scale image features of the image corresponding to the point.
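One common way to normalize the two-dimensional coordinates of a projection point, assumed here because it matches the sampling function used below, is to map pixel coordinates into the range [-1, 1]:

```python
def normalize_projection_point(u, v, width, height):
    # Map pixel coordinates to [-1, 1]; this convention is an assumption chosen
    # to match grid-sampling operators such as torch.nn.functional.grid_sample.
    x = 2.0 * u / (width - 1) - 1.0
    y = 2.0 * v / (height - 1) - 1.0
    return x, y
```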
The image feature extractor may include a backbone layer (backbone network layer) and a neck layer (connection layer); the backbone network layer is used to extract features, with ResNet, VGG and the like being commonly used, and the neck layer is placed between the backbone layer and the head layer so as to further improve the diversity and robustness of the features.
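As a hedged sketch of such an image feature extractor, a ResNet backbone with an FPN-style neck from torchvision could be used; the specific backbone, neck, and 256-channel output are assumptions for illustration only.

```python
import torch
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

# Assumed extractor: ResNet-50 backbone + FPN neck, 256 channels per scale.
extractor = resnet_fpn_backbone(
    backbone_name="resnet50", weights=None, trainable_layers=3
)

images = torch.randn(6, 3, 900, 1600)   # one tensor per camera image (assumed size)
multi_scale_feats = extractor(images)   # dict of feature maps at several scales
for name, feat in multi_scale_feats.items():
    print(name, tuple(feat.shape))      # e.g. '0' -> (6, 256, 225, 400), ...
```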
Illustratively, determining the second feature vector corresponding to the point according to the two-dimensional coordinates of the normalized projection point corresponding to the point and the multi-scale image feature of the image corresponding to the point may include:
s31, determining the image characteristics of the normalized projection points corresponding to the points in the images corresponding to the points with different scales by utilizing an interpolation function based on the two-dimensional coordinates of the normalized projection points corresponding to the points and the multi-scale image characteristics of the images corresponding to the points.
S32, carrying out mean processing on the image features of the normalized projection points corresponding to the points in the images corresponding to the points with different scales, and determining the image features after mean processing as second feature vectors corresponding to the points.
For example, the F.grid_sample interpolation function may be used to calculate the image features of the projection points at the different scales. Namely: using the interpolation function, the image features of the normalized projection point corresponding to each point are calculated at each scale of the image corresponding to the point, according to the obtained two-dimensional coordinates of the normalized projection point corresponding to the point and the multi-scale image features of the image corresponding to the point; these image features are then averaged, and the averaged image feature of each point is determined as the second feature vector corresponding to the point. For example, the averaged image feature of each point may be a 256-dimensional feature vector, which is not limited in the present application.
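The following is a minimal sketch of this sampling-and-averaging step using F.grid_sample; the 256-channel feature maps, the batch layout, and the use of align_corners=True are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def second_feature_vectors(multi_scale_feats, norm_points):
    """multi_scale_feats: list of [1, 256, H_s, W_s] feature maps (one per scale).
    norm_points: [N, 2] normalized projection points in [-1, 1].
    Returns [N, 256]: the per-point mean of the features sampled at every scale."""
    grid = norm_points.view(1, 1, -1, 2)                         # [1, 1, N, 2]
    per_scale = []
    for feat in multi_scale_feats:
        sampled = F.grid_sample(feat, grid, align_corners=True)  # [1, 256, 1, N]
        per_scale.append(sampled.squeeze(0).squeeze(1).t())      # [N, 256]
    return torch.stack(per_scale, dim=0).mean(dim=0)             # mean over scales
```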
After the second feature vector corresponding to the point is determined, the feature data of the point and the second feature vector corresponding to the point are spliced, so that the first feature vector of the point can be obtained. For example, if the feature data of the point includes the three-dimensional coordinate information x, y, z and the reflection intensity information i, that is, if the feature data of the point is (x, y, z, i), then splicing the feature data (x, y, z, i) of the point before or after the 256-dimensional second feature vector yields a 260-dimensional feature vector; this 260-dimensional feature vector is the first feature vector of the point, and it represents the data obtained by fusing the point with the multi-scale image features of the image corresponding to the point.
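The splicing step can then be sketched as a simple concatenation; the (x, y, z, i) layout of the point feature data and the 256-dimensional second feature vector are the assumptions carried over from the example above.

```python
import torch

def first_feature_vectors(point_feats, second_feats):
    # point_feats : [N, 4] feature data (x, y, z, i) of the points (assumed layout)
    # second_feats: [N, 256] averaged multi-scale image features per point
    # returns     : [N, 260] enhanced point features, i.e. the first feature vectors
    return torch.cat([point_feats, second_feats], dim=1)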
S104, detecting the first feature vector of each point to obtain the information of the target object.
Illustratively, after the first feature vector of each point is obtained, the detection device of the target object of the present application performs detection processing on the first feature vector of each point to obtain the information of the target object. For example, the first feature vector of each point may be subjected to voxelization and then processed by an encoder; the data processed by the encoder is then input into a region proposal network (Region Proposal Network, RPN) for further processing and is finally detected by a 3D target detection head, thereby obtaining the information of the target object.
Fig. 3 is an architecture block diagram for implementing the solution of the present application according to an embodiment of the present application. As shown in fig. 3, an image-processing encoder (Encoder) processes the images 1 to 6 acquired by the cameras to obtain multi-scale image features; after the point cloud data acquired by the laser radar and the images 1 to 6 acquired by the cameras are subjected to the fusion processing described in the embodiment of fig. 1 of the present application, a first feature vector, which characterizes the data obtained by fusing each point with the multi-scale image features of the image corresponding to that point, is obtained. The first feature vector is then sequentially input into the voxelization module (Voxelization), the encoder (Encoder), the region proposal network (Region Proposal Network, RPN) and the 3D target detection head (3D Detection Head) shown in fig. 3, and the information of the target object is output, completing the 3D target detection.
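A hedged structural sketch of this downstream detection path (voxelization, encoder, region proposal network, 3D detection head) is given below; every module is a placeholder standing in for an existing point cloud detection network, not an implementation disclosed by this application.

```python
import torch.nn as nn

class EnhancedPointCloudDetector(nn.Module):
    """Placeholder pipeline: voxelization -> encoder -> RPN -> 3D detection head."""

    def __init__(self, voxelizer, encoder, rpn, detection_head):
        super().__init__()
        self.voxelizer = voxelizer            # groups enhanced points into voxels
        self.encoder = encoder                # encodes voxel features
        self.rpn = rpn                        # region proposal network
        self.detection_head = detection_head  # outputs boxes, categories, etc.

    def forward(self, first_feature_vectors):     # [N, 260] enhanced point cloud
        voxels = self.voxelizer(first_feature_vectors)
        features = self.encoder(voxels)
        proposals = self.rpn(features)
        return self.detection_head(proposals)     # information of the target object
```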
Wherein the information of the target object may include one or more of the following: a rectangular frame of the target object, a category of the target object, a position of the target object, a size of the target object, attitude information of the target object, a moving speed of the target object, and information on the number of target objects.
For example, after the first feature vector of each point is detected, a target object may be circled on the image with a rectangular frame, and information such as the category, position, size, attitude information, moving speed and number of the target objects may be marked. The category of the target object can be pedestrian, automobile, obstacle and the like; the position can be represented by the distance and azimuth of the target object relative to the own vehicle; the size can be expressed by the length, width, height and the like of the target object; the attitude information may be the rotation angle, orientation, etc. of the target object; if the target object is moving, the moving speed of the target object can be marked; and if there are a plurality of target objects, the total number of detected target objects can be output. In practical application, one or more items of information of the target object can be output according to practical requirements.
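For illustration only, the output information of a single detected target object could be organized as in the following sketch; all field names and values here are invented for demonstration.

```python
# Hypothetical output record for one detected target object (values invented).
detection = {
    "box": [12.3, -1.8, 0.4, 4.6, 1.9, 1.6, 0.08],  # rectangular frame: x, y, z, l, w, h, yaw
    "category": "car",                               # pedestrian / car / obstacle / ...
    "distance_m": 12.4,                              # position relative to the own vehicle
    "azimuth_deg": 8.5,
    "velocity_mps": 4.2,                             # moving speed, if the object is moving
}
print(detection["category"], detection["box"])
```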
For example, the detecting the first feature vector of each point to obtain the information of the target object may also include: inputting the first feature vector of each point into a preset point cloud detection network model, and outputting the information of a target object; the preset point cloud detection network model is a pre-trained model for determining information of a target object.
In an exemplary embodiment, the detection device for a target object of the present application may input the first feature vector of each point into a pre-trained model for determining information of the target object, where the model performs detection processing on the first feature vector of each point, and directly outputs the information of the target object.
The method for detecting the target object provided by the embodiment of the application comprises the following steps: acquiring an image set and acquiring point cloud data acquired by a laser radar; the image set comprises a plurality of images, wherein the image set is a set of images obtained by shooting a target scene by cameras at a plurality of different positions on a vehicle, and the target scene comprises a target object; the point cloud data are data obtained by scanning the target scene by a laser radar on the vehicle, wherein the point cloud data comprise a plurality of points, and the points have characteristic data; determining an image corresponding to the point from the image set; the image corresponding to the point is an image on which the projection point corresponding to the point falls within a preset area range, and the projection point is the position point at which the point is projected onto the image; determining a first feature vector of the point according to the feature data of the point and the image corresponding to the point; the first feature vector characterizes data obtained by fusing the point with the multi-scale image features of the image corresponding to the point; and detecting the first feature vector of each point to obtain the information of the target object. According to this scheme, on the one hand, the processed image data comprise images acquired through a plurality of cameras facing different directions, so the coverage is larger, more image information is included, and more image features are extracted; the images in the image set are subjected to multi-scale feature extraction processing, which provides a richer data basis for the detection of subsequent target objects. On the other hand, the method fuses the multi-scale image features with the point cloud data acquired by the laser radar and detects the target object according to the fused data; this early-fusion approach reduces the data calculation amount in the subsequent detection, improves the detection efficiency, stably and effectively bridges the gap between the two heterogeneous data modalities of the camera and the laser radar, and improves the detection precision.
The following are examples of the apparatus of the present application that may be used to perform the method embodiments of the present application. For details not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the method of the present application.
Fig. 4 is a schematic structural diagram of a target object detection device according to an embodiment of the present application. As shown in fig. 4, a target object detection apparatus 40 provided in an embodiment of the present application includes: an acquisition unit 401, a matching unit 402, a processing unit 403, and a detection unit 404.
The acquiring unit 401 is configured to acquire an image set and acquire point cloud data acquired by the laser radar; the image set comprises a plurality of images, wherein the image set is a set of images obtained by shooting a target scene by cameras at a plurality of different positions on a vehicle, and the target scene comprises a target object; the point cloud data are data obtained by scanning a target scene by a laser radar on a vehicle, and the point cloud data comprise a plurality of points which have characteristic data.
A matching unit 402, configured to determine an image corresponding to a point from the image set; the image corresponding to the point is an image on which the projection point corresponding to the point falls within a preset area range, and the projection point is the position point at which the point is projected onto the image.
A processing unit 403, configured to determine a first feature vector of the point according to the feature data of the point and the image corresponding to the point; the first feature vector characterizes data obtained by fusing the points and the multi-scale image features of the image corresponding to the points.
And the detection unit 404 is configured to perform detection processing on the first feature vectors of each point, so as to obtain information of the target object.
The device provided in this embodiment may be used to perform the method of the foregoing embodiment, and its implementation principle and technical effects are similar, and will not be described herein again.
Fig. 5 is a schematic structural diagram of a detection device for a target object according to another embodiment of the present application. As shown in fig. 5, a target object detection apparatus 50 provided in an embodiment of the present application includes: an acquisition unit 501, a matching unit 502, a processing unit 503, and a detection unit 504.
The acquiring unit 501 is configured to acquire an image set and acquire point cloud data acquired by a laser radar; the image set comprises a plurality of images, wherein the image set is a set of images obtained by shooting a target scene by cameras at a plurality of different positions on a vehicle, and the target scene comprises a target object; the point cloud data are data obtained by scanning a target scene by a laser radar on a vehicle, and the point cloud data comprise a plurality of points which have characteristic data.
A matching unit 502, configured to determine an image corresponding to a point from the image set; the image corresponding to the point is an image on which the projection point corresponding to the point falls within a preset area range, and the projection point is the position point at which the point is projected onto the image.
A processing unit 503, configured to determine a first feature vector of the point according to the feature data of the point and the image corresponding to the point; the first feature vector characterizes data obtained by fusing the points and the multi-scale image features of the image corresponding to the points.
And the detection unit 504 is configured to perform detection processing on the first feature vectors of each point, so as to obtain information of the target object.
In one example, the processing unit 503 includes a processing module 5031 and a stitching module 5032.
The processing module 5031 is configured to process the image corresponding to the point to obtain a second feature vector corresponding to the point; the second feature vector characterizes the mean value of the multi-scale image features, within the image corresponding to the point, at the projection point corresponding to the point.
And the splicing module 5032 is used for carrying out splicing processing on the characteristic data of the point and the second characteristic vector corresponding to the point to obtain a first characteristic vector of the point.
In one example, the processing module 5031 includes a processing sub-module 50311, a feature extraction module 50312, and a determination module 50313.
The processing sub-module 50311 is configured to normalize the projection points of the point on the image corresponding to the point, to obtain normalized projection points corresponding to the point.
The feature extraction module 50312 is configured to perform feature extraction processing on the image corresponding to the point to obtain multi-scale image features of the image corresponding to the point.
The determining module 50313 is configured to determine a second feature vector corresponding to the point according to the two-dimensional coordinates of the normalized projection point corresponding to the point and the multi-scale image feature of the image corresponding to the point.
In one example, the determination module 50313 includes a first determination module 503131 and a second determination module 503132.
The first determining module 503131 is configured to determine, by using an interpolation function, an image feature of the normalized projection point corresponding to the point in the image corresponding to each of the different scales based on the two-dimensional coordinates of the normalized projection point corresponding to the point and the multi-scale image feature of the image corresponding to the point.
The second determining module 503132 is configured to perform mean processing on image features of the normalized projection points corresponding to the points in the images corresponding to the points in different scales, and determine the image features after the mean processing as second feature vectors corresponding to the points.
In one example, the matching unit 502 includes a calculation module 5021 and a determination module 5022.
The calculating module 5021 is configured to obtain the two-dimensional coordinates of the projection point of the point on each image according to the feature data of the point and the preset laser radar-to-camera transformation matrix corresponding to each image.
The determining module 5022 is configured to determine a corresponding image of the two-dimensional coordinates of the projection point within a preset range, as an image corresponding to the point.
In one example, the detecting unit 504 is specifically configured to input the first feature vector of each point into a preset point cloud detection network model, and output information of the target object; the preset point cloud detection network model is a pre-trained model for determining information of a target object.
In one example, the information of the target object includes one or more of the following: a rectangular frame of the target object, a category of the target object, a position of the target object, a size of the target object, attitude information of the target object, a moving speed of the target object, and information on the number of target objects.
The device provided in this embodiment may be used to perform the method of the foregoing embodiment, and its implementation principle and technical effects are similar, and will not be described herein again.
It should be understood that the division of the modules of the above apparatus is merely a division of logical functions; the modules may be fully or partially integrated into one physical entity or may be physically separated. These modules may all be implemented in the form of software called by a processing element, or all in hardware; alternatively, some of the modules may be implemented in the form of software called by a processing element and the remaining modules in hardware. For example, the functions of the above data processing module may be called and executed by a processing element of the above apparatus, and may be stored in a memory of the above apparatus in the form of program code. The implementation of the other modules is similar. In addition, all or part of the modules can be integrated together or implemented independently. The processing element here may be an integrated circuit with signal processing capability. In implementation, each step of the above method or each of the above modules may be completed by an integrated logic circuit of hardware in the processor element or by instructions in the form of software.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 6, the electronic device 60 includes: a processor 601, and a memory 602 communicatively coupled to the processor.
Wherein the memory 602 stores computer-executable instructions; processor 601 executes computer-executable instructions stored in memory 602 to implement a method as in any of the preceding claims.
In the specific implementation of the electronic device described above, it should be understood that the processor may be a central processing unit (Central Processing Unit, CPU), but may also be another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), etc. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly embodied as being executed by a hardware processor, or may be executed by a combination of hardware and software modules in the processor.
Embodiments of the present application also provide a computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, are adapted to carry out a method as any one of the preceding claims.
Those of ordinary skill in the art will appreciate that all or part of the steps for implementing the above method embodiments may be completed by hardware related to program instructions. The foregoing program may be stored in a computer-readable storage medium; when the program is executed, the steps of the above method embodiments are performed. The aforementioned storage medium includes various media capable of storing program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
Embodiments of the present application also provide a computer program product, including a computer program which, when executed by a processor, implements the method in any of the foregoing method embodiments.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.
Claims (10)
1. A method of detecting a target object, the method comprising:
acquiring an image set and acquiring point cloud data acquired by a laser radar; the image set comprises a plurality of images, wherein the image set is a set of images obtained by shooting a target scene by cameras at a plurality of different positions on a vehicle, and the target scene comprises a target object; the point cloud data are data obtained by scanning the target scene by a laser radar on the vehicle, wherein the point cloud data comprise a plurality of points, and the points have characteristic data;
determining an image corresponding to the point from the image set; the image corresponding to the point is an image of a projection point corresponding to the point in a preset area range, and the projection point is a position point of the point projected on the image;
determining a first feature vector of the point according to the feature data of the point and the image corresponding to the point; the first feature vector characterizes data obtained by fusing the points and the multi-scale image features of the image corresponding to the points;
and detecting the first feature vector of each point to obtain the information of the target object.
2. The method of claim 1, wherein determining a first feature vector for the point based on the feature data for the point and the image corresponding to the point comprises:
processing the image corresponding to the point to obtain a second feature vector corresponding to the point; the second feature vector characterizes projection points corresponding to the points, and the average value of the multi-scale image features in the image corresponding to the points;
and performing splicing processing on the characteristic data of the points and the second characteristic vectors corresponding to the points to obtain first characteristic vectors of the points.
3. The method of claim 2, wherein processing the image corresponding to the point to obtain the second feature vector corresponding to the point comprises:
carrying out normalization processing on projection points of the points on the image corresponding to the points to obtain normalized projection points corresponding to the points;
performing feature extraction processing on the image corresponding to the point to obtain multi-scale image features of the image corresponding to the point;
and determining a second feature vector corresponding to the point according to the two-dimensional coordinates of the normalized projection point corresponding to the point and the multi-scale image feature of the image corresponding to the point.
4. A method according to claim 3, wherein determining the second feature vector for the point based on the two-dimensional coordinates of a normalized projected point for the point and the multi-scale image feature of the image for the point comprises:
determining the image characteristics of the normalized projection points corresponding to the points in the images corresponding to the points with different scales by using an interpolation function based on the two-dimensional coordinates of the normalized projection points corresponding to the points and the multi-scale image characteristics of the images corresponding to the points;
and carrying out mean processing on the image features of the normalized projection points corresponding to the points in the images corresponding to the points with different scales, and determining the image features after mean processing as second feature vectors corresponding to the points.
5. The method of claim 1, wherein determining the image corresponding to the point from the set of images comprises:
acquiring two-dimensional coordinates of projection points of the points on each image according to the characteristic data of the points and a preset transformation matrix of a laser radar camera corresponding to each image;
and determining a corresponding image of the two-dimensional coordinates of the projection point in a preset range as an image corresponding to the point.
6. The method according to any one of claims 1-5, wherein detecting the first feature vector of each point to obtain the information of the target object includes:
inputting the first feature vector of each point into a preset point cloud detection network model, and outputting the information of the target object; the preset point cloud detection network model is a pre-trained model for determining information of the target object.
7. The method of any one of claims 1-5, wherein the information of the target object includes one or more of: rectangular frame of target object, category of target object, position of target object, size of target object, attitude information of target object, moving speed of target object, and number information of target object.
8. A device for detecting a target object, the device comprising:
the acquisition unit is used for acquiring an image set and acquiring point cloud data acquired by the laser radar; the image set comprises a plurality of images, wherein the image set is a set of images obtained by shooting a target scene by cameras at a plurality of different positions on a vehicle, and the target scene comprises a target object; the point cloud data are data obtained by scanning the target scene by a laser radar on the vehicle, wherein the point cloud data comprise a plurality of points, and the points have characteristic data;
the matching unit is used for determining an image corresponding to the point from the image set; the image corresponding to the point is an image of a projection point corresponding to the point in a preset area range, and the projection point is a position point of the point projected on the image;
the processing unit is used for determining a first feature vector of the point according to the feature data of the point and the image corresponding to the point; the first feature vector characterizes data obtained by fusing the points and the multi-scale image features of the image corresponding to the points;
and the detection unit is used for carrying out detection processing on the first feature vector of each point to obtain the information of the target object.
9. An electronic device, the electronic device comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored in the memory to implement the method of any one of claims 1-7.
10. A computer readable storage medium having stored therein computer executable instructions which when executed by a processor are adapted to carry out the method of any one of claims 1-7.
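For readability only, the following Python sketch illustrates the feature fusion recited in claims 2 to 4: the projection point is normalized, the multi-scale image features are sampled at that point with bilinear interpolation, averaged over scales to form the second feature vector, and concatenated with the point's feature data to form the first feature vector. PyTorch, the shared channel dimension across scales, and all names and shapes are assumptions of this sketch, not limitations of the claims.

```python
import torch
import torch.nn.functional as F

def fuse_point_with_image(point_feature, proj_uv, image_size, feature_maps):
    """Sketch of claims 2-4: build the second feature vector from the
    multi-scale image features at the point's projection, then concatenate
    it with the point's own feature data to form the first feature vector.

    `point_feature`: (C_pt,) raw feature data of the lidar point.
    `proj_uv`:       (u, v) pixel coordinates of its projection point.
    `image_size`:    (width, height) of the corresponding image.
    `feature_maps`:  list of (1, C_img, H_s, W_s) tensors, one per scale, assumed
                     to share the same channel count (e.g. from an FPN-style backbone).
    """
    u, v = proj_uv
    w, h = image_size
    # Claim 3: normalize the projection point to [-1, 1] (grid_sample convention).
    grid = torch.tensor([[[[2.0 * u / (w - 1) - 1.0,
                            2.0 * v / (h - 1) - 1.0]]]])   # shape (1, 1, 1, 2)
    # Claim 4: bilinear interpolation of each scale at the normalized projection point...
    per_scale = [
        F.grid_sample(fm, grid, mode="bilinear", align_corners=True).view(-1)
        for fm in feature_maps
    ]
    # ...then average over the scales to obtain the second feature vector.
    second_feature = torch.stack(per_scale, dim=0).mean(dim=0)
    # Claim 2: splice (concatenate) the point's feature data with the second feature vector.
    first_feature = torch.cat([point_feature, second_feature], dim=0)
    return first_feature
```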
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310756869.8A | 2023-06-25 | 2023-06-25 | Target object detection method, device, equipment and storage medium |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN116883964A | 2023-10-13 |
Family
ID=88267130
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202310756869.8A (Pending) | Target object detection method, device, equipment and storage medium | 2023-06-25 | 2023-06-25 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN116883964A |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |