CN111860493B - Target detection method and device based on point cloud data - Google Patents
Target detection method and device based on point cloud data
- Publication number
- CN111860493B CN111860493B CN202010535697.8A CN202010535697A CN111860493B CN 111860493 B CN111860493 B CN 111860493B CN 202010535697 A CN202010535697 A CN 202010535697A CN 111860493 B CN111860493 B CN 111860493B
- Authority
- CN
- China
- Prior art keywords
- target detection
- initial target
- detection frame
- frame
- point cloud
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
Abstract
The application provides a target detection method and device based on point cloud data, relating to the technical field of target detection. The method comprises the following steps: obtaining original point cloud data and each initial target detection frame output by an initial target detection network, together with the information of each initial target detection frame; extracting, from the original point cloud data, the point cloud within a preset frame range around each initial target detection frame; generating input data for a neural network according to the points of the original point cloud data inside each initial target detection frame, the points of the point cloud within the preset frame range outside the initial target detection frame, and the information of each initial target detection frame; and inputting the input data into a pre-trained neural network for processing, and obtaining a detection frame result and a target class result corresponding to each initial target detection frame according to the output of the pre-trained neural network. Embodiments of the application can improve target detection performance.
Description
Technical Field
The present disclosure relates to the field of target detection technologies, and in particular, to a target detection method and device based on point cloud data.
Background
Currently, with the development of autonomous driving and mobile robot technology, lidar devices are widely used in autonomous vehicles and mobile robots. To ensure their normal operation, point cloud data of the surrounding environment is typically collected by a lidar to help the autonomous vehicle or mobile robot perceive its surroundings. Autonomous vehicles and mobile robots generally need to perform target detection on various objects around them in order to perceive their environment better, and one way to do so is through the point cloud data acquired by a lidar.
Currently, object detection is performed in the prior art by inputting point cloud data into an object detection network, such as the sparsely embedded convolutional detection network (Sparsely Embedded Convolutional Detection, abbreviated SECOND), the point cloud encoder network PointPillars, the multi-view three-dimensional network (Multi-View 3D, abbreviated MV3D), the point-voxel network (Point-Voxel RCNN, abbreviated PV-RCNN), or the three-dimensional object detection network PointRCNN. However, object detection performed by these existing object detection networks suffers from poor detection performance. To improve target detection performance, the present application provides a target detection scheme based on point cloud data.
Disclosure of Invention
The embodiment of the application provides a target detection method and device based on point cloud data, so as to improve the performance of target detection.
In order to achieve the above purpose, the embodiments of the present application adopt the following technical solutions:
in a first aspect of an embodiment of the present application, a target detection method based on point cloud data is provided, including:
obtaining original point cloud data and each initial target detection frame output by an initial target detection network, and obtaining information of each initial target detection frame;
extracting point clouds in a preset frame range around each initial target detection frame from the original point cloud data;
generating input data of a neural network according to points in original point cloud data in each initial target detection frame, points in point cloud within a preset frame range outside the initial target detection frame and information of each initial target detection frame;
and inputting the input data into a pre-trained neural network for processing, and obtaining a detection frame result and a target class result corresponding to each initial target detection frame according to the output of the pre-trained neural network.
In a second aspect of the embodiments of the present application, there is provided a target detection device based on point cloud data, including:
the initial information obtaining unit is used for obtaining original point cloud data and each initial target detection frame output by the initial target detection network and obtaining information of each initial target detection frame;
the point cloud extraction unit is used for extracting point clouds in a preset frame range around each initial target detection frame from the original point cloud data;
the input data generating unit is used for generating input data of the neural network according to points in original point cloud data in each initial target detection frame, points in point cloud in a preset frame range outside the initial target detection frame and information of each initial target detection frame;
and the result generating unit is used for inputting the input data into a pre-trained neural network for processing, and obtaining a detection frame result and a target class result corresponding to each initial target detection frame according to the output of the pre-trained neural network.
In a third aspect of embodiments of the present application, a computer readable storage medium is provided, including a program or instructions, which when run on a computer, implement the method according to the first aspect.
In a fourth aspect of embodiments of the present application, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method as described in the first aspect above.
In a fifth aspect of embodiments of the present application, there is provided a chip system, including a processor coupled to a memory, where the memory stores program instructions, and the method according to the first aspect is implemented when the program instructions stored in the memory are executed by the processor.
In a sixth aspect of embodiments of the present application, there is provided a computer server comprising a memory, and one or more processors communicatively coupled to the memory;
the memory stores instructions executable by the one or more processors to cause the one or more processors to implement the method of the first aspect.
The embodiment of the application provides a target detection method and device based on point cloud data. First, the original point cloud data and each initial target detection frame output by an initial target detection network are obtained, together with the information of each initial target detection frame. So that the neural network can take the initial target detection frame information into account in subsequent processing, the point cloud within a preset frame range around each initial target detection frame is extracted from the original point cloud data, and the input data of the neural network is generated from the points of the original point cloud data inside each initial target detection frame, the points of the point cloud within the preset frame range outside the initial target detection frame, and the information of each initial target detection frame. Because the input data thus takes the initial target detection frame information into account, inputting it into a pre-trained neural network and deriving, from the network's output, the detection frame result and target class result corresponding to each initial target detection frame yields more accurate detection frame and target class results, so target detection performance can be improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive faculty for a person skilled in the art.
Fig. 1 is a flowchart of a target detection method based on point cloud data according to an embodiment of the present application;
fig. 2 is a flowchart two of a target detection method based on point cloud data according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating a situation where the initial target detection frame is inaccurate in position in the embodiment of the present application;
FIG. 4 is a schematic diagram illustrating a case where the initial target detection frame is not precisely sized in the embodiment of the present application;
FIG. 5 is a schematic diagram of an initial target detection frame, a preset frame range, and virtual points according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a target detection device based on point cloud data according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate in order to describe the embodiments of the present application described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In order to enable those skilled in the art to better understand the present application, some technical terms appearing in the embodiments of the present application are explained below:
movable object: the device is an object which can move, such as a vehicle, a mobile robot, an aircraft and the like, and various types of sensors, such as a laser radar, a camera and the like, can be carried on the movable object.
And (3) point cloud: the data of the surrounding environment collected by the laser radar is marked by sparse three-dimensional space points.
Frame (Frame): the sensor completes one-time observation of the received measurement data, for example, one frame of data of the camera is a picture, and one frame of data of the laser radar is a group of laser point clouds.
PointNet: an existing deep learning model for processing point cloud data; it can process raw point cloud data directly, without first encoding it.
PointNet++: an improved version of the existing PointNet neural network model.
SECOND: Sparsely Embedded Convolutional Detection, an existing sparsely embedded convolutional target detection network.
PointPillars: an existing point cloud encoder and detection network.
MV3D: Multi-View 3D, an existing multi-view three-dimensional detection network.
PV-RCNN: Point-Voxel RCNN, an existing network that integrates point and voxel features.
PointRCNN: an existing three-dimensional object detection network that operates directly on raw points.
DGCNN: Dynamic Graph Convolutional Neural Network, a dynamic graph convolutional neural network.
RSCNN: Relation-Shape CNN, a convolutional neural network based on geometric relations.
NMS: Non-Maximum Suppression; suppresses detections that are not local maxima (typically lower-scoring boxes overlapping a higher-scoring one), keeping only the local maxima.
In some embodiments of the present application, the term "vehicle" is to be construed broadly to include any movable object, including, for example, aircraft, watercraft, spacecraft, automobiles, trucks, vans, semi-trailers, motorcycles, golf carts, off-road vehicles, warehouse transport vehicles or agricultural vehicles, as well as vehicles traveling on rails, such as trams or trains and other rail vehicles. The term "vehicle" in this application may generally include: a power system, a sensor system, a control system, peripheral devices and a computer system. In other embodiments, a vehicle may include more, fewer, or different systems.
Wherein the power system is the system that provides powered motion for the vehicle, comprising: an engine/motor, a transmission, wheels/tires, and an energy unit.
The control system may include a combination of devices that control the vehicle and its components, such as steering units, throttle valves, braking units.
The peripheral device may be a device that allows the vehicle to interact with external sensors, other vehicles, external computing devices, and/or users, such as a wireless communication system, touch screen, microphone, and/or speaker.
Based on the above-described vehicle, a sensor system and an automatic driving control device are also provided in the automatic driving vehicle.
The sensor system may include a plurality of sensors for sensing information about the environment in which the vehicle is located, and one or more actuators that change the position and/or orientation of the sensors. The sensor system may include any combination of sensors such as global positioning system sensors, inertial measurement units, radio detection and ranging (RADAR) units, cameras, laser rangefinders, light detection and ranging (LIDAR) units, and/or acoustic sensors; it may also include sensors that monitor the vehicle's internal systems (e.g., O2 monitors, fuel gauges, engine thermometers, etc.).
The automatic driving control device may include a processor and a memory storing at least one machine-executable instruction; by executing the at least one machine-executable instruction, the processor implements functions including a map engine, a positioning module, a perception module, a navigation or path module, an automatic control module, and the like. The map engine and the positioning module provide map information and positioning information. The perception module perceives objects in the vehicle's environment according to the information acquired by the sensor system and the map information provided by the map engine. The navigation or path module plans a driving path for the vehicle according to the processing results of the map engine, the positioning module and the perception module. The automatic control module analyzes and converts the decision information from modules such as the navigation or path module into control commands for the vehicle control system, and sends the control commands to the corresponding components in the vehicle control system through an on-board network (e.g., a vehicle-internal electronic network implemented with a CAN bus, a local interconnect network, Media Oriented Systems Transport, or the like) to achieve automatic control of the vehicle; the automatic control module can also obtain information about each component in the vehicle through the on-board network.
Currently, object detection is performed in the prior art by inputting point cloud data into various object detection networks, such as the sparsely embedded convolutional detection network (Sparsely Embedded Convolutional Detection, abbreviated SECOND), the point cloud encoder network PointPillars, the multi-view three-dimensional network (Multi-View 3D, abbreviated MV3D), the point-voxel network (Point-Voxel RCNN, abbreviated PV-RCNN), or the three-dimensional object detection network PointRCNN. However, these existing object detection networks suffer from poor detection performance, mainly for the following two reasons:
first, when SECOND, pointPillars, MV3D or PV-RCNN is adopted to detect a target, various codes (such as rasterization or bird's eye view projection) need to be performed on the point cloud data, so that the disordered point cloud becomes ordered, and the calculation amount in the target detection process is reduced. However, in the process of performing various codes, the loss of the position information of the original point cloud data is also caused, so that the obtained target detection frame is inaccurate in position, and the overall detection performance is reduced.
Second, when PointRCNN is used for target detection, the predicted target detection frame size is ambiguous at positions where the point cloud is sparse: both a larger and a smaller frame can enclose the corresponding points. Because the network that processes the frames afterwards (such as PointNet) cannot clearly know the size of each target detection frame, the positive and negative samples divided by each target detection frame are inaccurate, so the subsequent Non-Maximum Suppression (NMS) processing is not accurate enough and target detection performance is poor.
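For reference, the greedy NMS procedure mentioned here can be sketched as follows. Axis-aligned 2D boxes are used for brevity (real lidar pipelines typically use rotated or bird's-eye-view IoU), and the function names are illustrative, not from this disclosure.

```python
import numpy as np

def iou(a, b):
    """IoU of two axis-aligned 2D boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it."""
    order = np.argsort(scores)[::-1]  # indices, highest score first
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep
```

Note how the IoU threshold is the only signal NMS has; if the predicted box sizes are unreliable, the overlap values are unreliable too, which is exactly the failure mode described above.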
The embodiment of the application aims to provide a target detection scheme based on point cloud data so as to improve the performance of target detection.
As shown in fig. 1, an embodiment of the present application provides a target detection method based on point cloud data, including:
and step 101, obtaining original point cloud data and each initial target detection frame output by an initial target detection network, and obtaining information of each initial target detection frame.
And 102, extracting point clouds in a preset frame range around each initial target detection frame from the original point cloud data.
And 103, generating input data of the neural network according to the points in the original point cloud data in each initial target detection frame, the points in the point cloud in the range of the preset frame outside the initial target detection frame and the information of each initial target detection frame.
And 104, inputting the input data into a pre-trained neural network for processing, and obtaining a detection frame result and a target class result corresponding to each initial target detection frame according to the output of the pre-trained neural network.
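The four steps above can be sketched roughly as follows, assuming axis-aligned boxes and treating the two networks as opaque callables; all names and interfaces here are illustrative assumptions for exposition, not APIs defined by this disclosure.

```python
import numpy as np

def in_box(points, center, size):
    """Boolean mask of the points inside an axis-aligned box (center, (l, w, h))."""
    half = np.asarray(size) / 2.0
    return np.all(np.abs(points - np.asarray(center)) <= half, axis=1)

def detect(points, initial_net, refine_net, frame_scale=1.5):
    """Run steps 101-104 for every initial box produced by initial_net."""
    results = []
    for center, size in initial_net(points):               # step 101
        inside = points[in_box(points, center, size)]      # points inside the box
        grown = np.asarray(size) * frame_scale             # preset frame range
        ring = points[in_box(points, center, grown)
                      & ~in_box(points, center, size)]     # step 102
        net_input = (inside, ring, (center, size))         # step 103 (simplified)
        results.append(refine_net(net_input))              # step 104
    return results
```

The key design point is that the refinement network sees both the points inside each initial box and the points just outside it, so a mispositioned or mis-sized box can still be corrected.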
To aid understanding, embodiments of the present application are described in more detail below with reference to the accompanying drawings and examples. It should be noted that in the embodiments of the present application, target detection may refer to an autonomous vehicle, intelligent robot, unmanned aerial vehicle or the like perceiving, detecting and identifying objects of interest (such as, but not limited to, vehicles, pedestrians and obstacles) in the surrounding environment, or to a radar-equipped security facility perceiving, detecting and identifying objects of interest (such as, but not limited to, people, vehicles and cargo boxes) in a monitored area. Of course, there are many other scenarios in which target detection is applied; it should be understood that such scenarios can also use the embodiments of the present application, and they are not enumerated here.
In an embodiment of the present application, as shown in fig. 2, a target detection method based on point cloud data is provided, including:
step 201, obtaining original point cloud data and each initial target detection frame output by an initial target detection network, and obtaining information of each initial target detection frame.
Here, in an embodiment of the present application, the obtained raw point cloud data refers to raw point cloud data acquired by a radar (e.g., a lidar). In this embodiment, the original point cloud data is subsequently used for target detection directly, without the various encodings (such as rasterization or bird's-eye-view projection) that would otherwise be applied to it; this avoids the loss of position information of the original point cloud data during encoding, which would make the obtained initial target detection frame inaccurate in position and reduce the overall detection performance.
In this embodiment of the present application, the initial target detection network may be some existing target detection networks, for example, some target detection networks that need to perform various encoding (such as rasterization or bird's-eye view projection) on the original point cloud data, such as SECOND, pointPillars, MV3D or PV-RCNN, where the specific working processes of these target detection networks belong to the prior art, and are not described herein again. It should be appreciated that the initial target detection frames output by these target detection networks may suffer from inaccurate positions. For example, in fig. 3, a set of point clouds 31 (such as the external contour point clouds of the vehicle) should ideally be framed by an accurate initial target detection frame, and substantially no points fall outside the initial target detection frame, however, the initial target detection frame 32 output by these target detection networks may be inaccurate in position as shown in fig. 3, resulting in a situation in which some points in the point clouds 31 are outside the initial target detection frame 32. One of the purposes of the target detection method based on the point cloud data provided in the embodiment of the application is to overcome the problem shown in fig. 3, so that the result position of the detection frame is more accurate.
In addition, in an embodiment of the present application, the initial target detection network may also be an existing target detection network that processes the original point cloud data directly, such as PointRCNN; the specific working processes of these target detection networks belong to the prior art and are not described here again. It should be noted that the initial target detection frames output by such networks may have ambiguous sizes: a larger or a smaller frame can both enclose the corresponding point cloud, so the network that processes the frames afterwards (such as PointNet) cannot clearly know the size of each initial target detection frame, the positive and negative samples divided by each initial target detection frame are inaccurate, and the subsequent Non-Maximum Suppression (NMS) processing is therefore not accurate enough, giving poor target detection performance. For example, in fig. 4, a set of point clouds 41 (e.g., the exterior contour point cloud of a vehicle) should ideally be framed by an initial target detection frame sized to just fit the point cloud 41. However, the initial target detection frame 42 output by these networks may be oversized as shown in fig. 4, so that although the point cloud 41 is framed, there is excessive empty space in the frame 42. Another purpose of the target detection method based on point cloud data provided in the embodiments of the present application is to overcome the problem shown in fig. 4, so that the detection frame result frames the target more accurately with an appropriate size.
In addition, the target detection network can detect a plurality of targets at the same time, and even for a single target there may be a plurality of initial target detection frames. To support the subsequent computation, in an embodiment of the present application the information of each initial target detection frame needs to be obtained. This information may include the size range information of the initial target detection frame, represented for example (but not limited to) as (l, w, h), where l, w and h are the length, width and height of the initial target detection frame, respectively. The information may further include the center point information and orientation information of the initial target detection frame; for example, the center point information is represented as (Cx, Cy, Cz), the coordinates of the center of the initial target detection frame, and the orientation information may be denoted as a heading, but it is not limited thereto.
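As a concrete (and purely illustrative) representation of this frame information, a small container such as the following could be used; the field names mirror the (l, w, h), (Cx, Cy, Cz) and heading quantities described above but are assumptions, not identifiers from the disclosure.

```python
from dataclasses import dataclass

@dataclass
class InitialBox:
    # Size range information of the initial target detection frame.
    l: float
    w: float
    h: float
    # Center point information (Cx, Cy, Cz).
    cx: float
    cy: float
    cz: float
    # Orientation information: yaw about the vertical axis, in radians.
    heading: float

    def volume(self) -> float:
        """Volume of the frame, useful e.g. for frame-magnification ratios."""
        return self.l * self.w * self.h
```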
Step 202, extracting point clouds in a preset frame range around each initial target detection frame from original point cloud data.
In an embodiment of the present application, since there may be a case that some points in the point cloud 31 shown in fig. 3 fall outside the initial target detection frame 32, in order to meet the processing requirement of the original point cloud data, and avoid dropping key points in the point cloud, the step 202 needs to extract, from the original point cloud data, the point cloud within a preset frame range around each initial target detection frame, and the specific process may be as follows (but is not limited to this):
Mode one: a preset frame magnification is obtained. As shown in fig. 5, the frame magnification is the ratio of the size of the preset frame range (e.g., frame range 51) to that of the initial target detection frame 52 (for example, the ratio of their volumes, lengths or surface areas, but not limited thereto). Generally, the frame magnification is greater than 1; for example, 1.5 may be chosen, but the invention is not limited thereto.
According to the initial target detection frame and the frame magnification, each preset frame range is determined (i.e. the preset frame range 51 can be calculated when the initial target detection frame 52 and the frame magnification are known), and the point cloud within each preset frame range, i.e. the point cloud outside the initial target detection frame and within the preset frame range, is extracted from the original point cloud data.
Mode two: a preset frame expansion amount is obtained, which may be a preset expansion length, expansion area or expansion volume. For example, as shown in fig. 5, the initial target detection frame 52 has a frame length, surface area, frame volume and so on, so after the corresponding frame expansion amount is obtained, a preset frame range 51 can be determined and the point cloud within each preset frame range can be extracted from the original point cloud data; that is, points outside the initial target detection frame but within the preset frame range are extracted.
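The two modes above can be sketched as follows for an axis-aligned box. The magnification is assumed here to be applied as a side-length ratio, and the expansion amount as a length added on both sides of each axis; both are interpretations chosen for illustration, since the text allows length, area or volume variants.

```python
import numpy as np

def preset_range_by_magnification(size, magnification=1.5):
    """Mode one: scale each side so the side-length ratio equals the magnification."""
    return np.asarray(size, dtype=float) * magnification

def preset_range_by_expansion(size, expansion=0.5):
    """Mode two: grow each side by a fixed expansion length on both sides."""
    return np.asarray(size, dtype=float) + 2.0 * expansion
```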
It should be noted that, in step 103, the input data of the neural network in the embodiment of the present application may include the point data to be processed and the expression information of each initial target detection frame. The expression information comprises at least the size range information of the initial target detection frame, and may further comprise the center point information and orientation information of the initial target detection frame; the point data to be processed includes the positions of the points to be processed (e.g., as coordinates).
The expression information may refer to any manner capable of expressing the initial target detection frame, and this is not limited in this application. For ease of understanding, the following embodiments of the present application will provide several common ways.
After step 202, steps 203 to 204 below are performed; steps 203 to 204 may be regarded as a specific implementation of step 103.
Step 203, determining expression information of each initial target detection frame according to points in original point cloud data in each initial target detection frame, points in point clouds within a preset frame range outside the initial target detection frame and size range information of each initial target detection frame.
Here, step 203 may take various forms, of which the embodiments of the present application exemplify only some; it should be understood that any information capable of expressing the initial target detection frame should be regarded as expression information in the embodiments of the present application.
Mode A: setting virtual points within the initial target detection frame.
For example, virtual points uniformly filling each initial target detection frame are generated within it, based on the size range information of the initial target detection frame, as the expression information of that frame. The distribution of the virtual points expresses the range of the initial target detection frame.
Specifically, the spacing of the virtual points corresponding to each initial target detection frame can be obtained from the size range information of that frame; virtual points uniformly filling each initial target detection frame are then generated at that spacing as the expression information of the frame. As shown in fig. 5, virtual points 53 are uniformly distributed within the initial target detection frame 52. The spacing between the virtual points 53 may be determined from the size range of the initial target detection frame; for example, if the initial target detection frame has a length, width, and height of 5 m, 3 m, and 2 m, a virtual point 53 may be set at the center or at the eight corners of each cubic-decimeter cell within the frame range. This is not limiting, however, and those skilled in the art may obtain the spacing of the virtual points in other ways, such as setting it manually.
To allow the subsequent neural network to distinguish the virtual points, the points in the original point cloud data within the initial target detection frame, and the points in the point cloud within the preset frame range outside the initial target detection frame, these points need to be assigned values. For example, as shown in fig. 5, preset point type values are respectively allocated to the points 521 in the original point cloud data within each initial target detection frame 52, the virtual points 53, and the points 511 in the point cloud within the preset frame range 51 outside the initial target detection frame 52; for instance (but not limited to), the point type value for points in the original point cloud data within the initial target detection frame is 1, that for virtual points is 2, and that for points in the point cloud within the preset frame range outside the initial target detection frame is 0.
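Mode A can be sketched as follows, again as a hedged illustration: a uniform virtual-point grid is generated inside an axis-aligned box, and the three point categories are tagged with the example type values 1/2/0 from the text. The fixed spacing parameter is a manual choice here; the patent also allows deriving it from the box size:

```python
import numpy as np

def virtual_points(center, size, spacing=0.5):
    """Uniformly fill an axis-aligned box with virtual points at a given spacing."""
    l, w, h = size
    axes = [np.arange(-d / 2 + spacing / 2, d / 2, spacing) for d in (l, w, h)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    return grid + np.asarray(center, dtype=float)

def label_points(inside_pts, virtual_pts, ring_pts):
    """Concatenate the three point sets with the example type values:
    1 = original point inside the frame, 2 = virtual point,
    0 = original point in the preset range outside the frame."""
    types = np.concatenate([
        np.full(len(inside_pts), 1),
        np.full(len(virtual_pts), 2),
        np.full(len(ring_pts), 0),
    ])
    pts = np.concatenate([inside_pts, virtual_pts, ring_pts])
    return pts, types
```

The stacked points and their type values together form one network input sample for a detection frame.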
Mode B: adding the size range information to the features of points in the point cloud.
For example, the size range information of each initial target detection frame can be added to the corresponding features of the points in the original point cloud data within the initial target detection frame and the points in the point cloud within the preset frame range outside it, so that the features carry the expression information; once these features are input into the subsequent neural network, the network obtains information capable of expressing the initial target detection frame.
There are various ways to add the size range information of the initial target detection frame to the corresponding features of the points in the original point cloud data within the initial target detection frame and the points in the point cloud within the preset frame range outside it; only two of them are listed here, without limitation.
One of them is: the size range information (l, w, h) of each initial target detection frame can be appended to the coordinates (xi, yi, zi) of the points in the original point cloud data within the corresponding initial target detection frame and the points in the point cloud within the preset frame range outside it, generating for each point the feature (xi, yi, zi, l, w, h) carrying the expression information; where l, w and h represent the length, width and height, respectively, of the initial target detection frame.
Another is: normalization can be performed on the coordinates (xi, yi, zi) of the corresponding points in the original point cloud data within the initial target detection frame and the points in the point cloud within the preset frame range outside it, according to the size range information (l, w, h) of each initial target detection frame, generating for each point the feature (xi/l, yi/w, zi/h) carrying the expression information; where l, w and h represent the length, width and height, respectively, of the initial target detection frame.
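Both feature-augmentation variants of Mode B can be sketched in a few lines of NumPy. Note that the exact normalized form (dividing each coordinate by the corresponding box dimension) is a reconstruction here, since the formula is elided in the source text:

```python
import numpy as np

def append_box_size(points, box_size):
    """Feature (xi, yi, zi, l, w, h): append the box size to each point."""
    l, w, h = box_size
    tail = np.tile([l, w, h], (len(points), 1))
    return np.hstack([points, tail])

def normalize_by_box(points, box_size):
    """Feature (xi/l, yi/w, zi/h): divide coordinates by the box dimensions
    (assumed normalization; the source formula is not shown)."""
    return points / np.asarray(box_size, dtype=float)
```

In either variant, every input point implicitly carries the size range of the initial target detection frame, so the network receives the expression information without a separate input branch.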
Step 204, generating input data of the neural network at least comprising the point positions to be processed and the expression information of each initial target detection frame.
The point to be processed generally refers to a point in original point cloud data in the initial target detection frame and a point in a point cloud within a preset frame range outside the initial target detection frame.
Step 205, inputting the input data into a pre-trained neural network for processing, and obtaining a detection frame result and a target class result corresponding to each initial target detection frame according to the output of the pre-trained neural network.
The pre-trained neural network may be any mainstream target detection network capable of processing raw point cloud data, such as a PointNet network, a PointNet++ network, a dynamic graph convolutional neural network (DGCNN), or a relation-shape convolutional neural network (RS-CNN), but is not limited thereto. Since these are existing target detection networks, their structures are not described here in detail; the embodiments of the present application merely provide input data to them and obtain their output.
Specifically, when the above step 203 is implemented in Mode A, the pre-training process of the neural network is:
obtaining a training sample data set; the training sample data set comprises a plurality of groups of training sample data; each group of training sample data comprises points in original point cloud data in an initial target detection frame, virtual points in the initial target detection frame, points in point cloud in a preset frame range outside the initial target detection frame, point type values corresponding to the points, detection frame results corresponding to the initial target detection frame and target type results which are marked in advance.
And taking points in original point cloud data in an initial target detection frame, virtual points in the initial target detection frame, points in point cloud within a preset frame range outside the initial target detection frame and point type values corresponding to the points as inputs, taking a detection frame result corresponding to the initial target detection frame and a target class result which are marked in advance as outputs, and training the neural network.
Specifically, when the above step 203 is implemented in Mode B, the pre-training process of the neural network is:
obtaining a training sample data set; the training sample data set comprises a plurality of groups of training sample data; each group of training sample data comprises characteristics of points to be processed in an initial target detection frame with the expression information, characteristics of points to be processed in a preset frame range outside the initial target detection frame with the expression information, detection frame results corresponding to the initial target detection frame and target category results, which are marked in advance;
And taking the characteristics of the expression information of the points to be processed in the initial target detection frame and the characteristics of the expression information of the points to be processed in the range of the preset frame outside the initial target detection frame as inputs, taking the detection frame result and the target class result corresponding to the initial target detection frame which are marked in advance as outputs, and training the neural network.
Here, there are various specific neural network training methods, such as batch gradient descent (BGD), stochastic gradient descent (SGD), the Adam optimization algorithm, and root mean square propagation (RMSprop), but not limited thereto.
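For reference, the update rules of two of the named methods can be sketched in plain NumPy on a toy objective. This is a minimal, framework-free illustration of SGD and Adam, not the patent's training procedure; the objective f(x) = (x - 3)^2 and all hyperparameters are arbitrary examples:

```python
import numpy as np

def sgd_step(x, g, lr=0.1):
    """Plain gradient descent update."""
    return x - lr * g

def adam_minimize(grad, x0, steps=1000, lr=0.1,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimal Adam update rule with bias-corrected moment estimates."""
    x = np.asarray(x0, dtype=float)
    m, v = np.zeros_like(x), np.zeros_like(x)
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)        # bias correction
        v_hat = v / (1 - beta2 ** t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3)
grad = lambda x: 2 * (x - 3)
x_sgd = np.array([0.0])
for _ in range(200):
    x_sgd = sgd_step(x_sgd, grad(x_sgd))
x_adam = adam_minimize(grad, [0.0])
```

In practice a deep-learning framework's built-in optimizers would be used; the point is only that the choice of update rule is orthogonal to the network architecture and input encoding described above.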
Accordingly, when the above step 203 is implemented in Mode A, step 205 may be implemented as follows:
and inputting points and virtual points in original point cloud data in each initial target detection frame, points in point clouds within a preset frame range outside the initial target detection frame and respective corresponding point type values into a pre-trained neural network for processing, and obtaining detection frame results and target category results corresponding to each initial target detection frame according to output of the pre-trained neural network.
Accordingly, when the above step 203 is implemented in Mode B, step 205 may be implemented as follows:
and inputting the characteristics of the expression information of the points to be processed in each initial target detection frame and the characteristics of the expression information of the points to be processed in a range of a preset frame outside the initial target detection frame into a pre-trained neural network for processing, and obtaining detection frame results and target class results corresponding to each initial target detection frame according to the output of the pre-trained neural network.
Through steps 201 to 205, the pre-trained neural network processes the original point cloud data without losing its position information; in addition, the network obtains the expression information representing each initial target detection frame, so the output detection frame result avoids being excessively large. The resulting detection frame results, in both position and size, will therefore match the actual target more closely.
Specifically, the detection frame result output by the neural network trained in advance may include coordinates of a center point of the detection frame output by the neural network, size information of the detection frame, and orientation information corresponding to the detection frame.
The target class result can generally be expressed as numbers over the classes of interest, and for the same target object the numbers for all classes sum to 1. For example, if the classes of interest in a whole frame of point cloud data are pedestrians and vehicles, the target class result may be expressed as a score pair (pedestrian score, vehicle score). Five detection frames might be output for a pedestrian, with target class results (0.90, 0.10), (0.87, 0.13), (0.78, 0.22), (0.96, 0.04), and (0.89, 0.11), and four detection frames for a vehicle, with target class results (0.05, 0.95), (0.08, 0.92), (0.20, 0.80), and (0.15, 0.85). In the field of target detection, the detection frame result and the target class result may also be expressed in other ways, which are not listed here.
After step 205, step 206 may also be continued:
and 206, processing the detection frame results and the target class results corresponding to the initial target detection frames according to the non-maximum suppression NMS algorithm to generate final detection frame results after non-maximum suppression.
Since there are several detection frame results with corresponding target class results, and in particular the same target may have multiple overlapping detection frames, non-maximum suppression is performed to select the best final detection frame result. For example, the pedestrian corresponds to 5 detection frames with target class results (0.90, 0.10), (0.87, 0.13), (0.78, 0.22), (0.96, 0.04), and (0.89, 0.11), and the vehicle corresponds to 4 detection frames with target class results (0.05, 0.95), (0.08, 0.92), (0.20, 0.80), and (0.15, 0.85). After non-maximum suppression, the final detection frame result may retain only the frame corresponding to (0.96, 0.04) and the frame corresponding to (0.05, 0.95). Specific non-maximum suppression algorithms are well established and are not described in detail here.
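For completeness, greedy NMS over axis-aligned 2D boxes can be sketched as follows. Real point-cloud pipelines typically use oriented bird's-eye-view or 3D IoU; the axis-aligned [x1, y1, x2, y2] box format and the 0.5 threshold here are simplifying assumptions for illustration:

```python
import numpy as np

def iou(box, boxes):
    """IoU between one axis-aligned box [x1, y1, x2, y2] and many."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(np.asarray(box)) + area(boxes) - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and drop
    the remaining boxes that overlap it above the threshold."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= iou_threshold]
    return keep
```

Applied per class, this reduces the multiple overlapping frames per target to the single highest-scoring one, matching the pedestrian/vehicle example above.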
In addition, as shown in fig. 6, an embodiment of the present application further provides a target detection device based on point cloud data, including:
an initial information obtaining unit 61 for obtaining the original point cloud data and each initial target detection frame output by the initial target detection network, and obtaining information of each initial target detection frame.
And a point cloud extracting unit 62, configured to extract, from the original point cloud data, a point cloud within a range of a preset frame around each initial target detection frame.
The input data generating unit 63 is configured to generate input data of the neural network according to points in the original point cloud data in each initial target detection frame, points in the point cloud within a preset frame range outside the initial target detection frame, and information of each initial target detection frame.
The result generating unit 64 is configured to input the input data into a pre-trained neural network for processing, and obtain a detection frame result and a target class result corresponding to each initial target detection frame according to an output of the pre-trained neural network.
It should be noted that, the specific implementation manner of the target detection device based on the point cloud data provided in the embodiment of the present application may refer to the method embodiments corresponding to fig. 1 and fig. 5, and are not described herein again.
In addition, the embodiment of the application further provides a computer readable storage medium, which includes a program or instructions, and when the program or instructions run on a computer, the method corresponding to fig. 1 and fig. 5 is implemented.
In addition, the embodiment of the application further provides a computer program product containing instructions, which when run on a computer, cause the computer to execute the method corresponding to the above-mentioned fig. 1 and 5.
In addition, the embodiment of the application further provides a chip system, which comprises a processor, wherein the processor is coupled with a memory, the memory stores program instructions, and when the program instructions stored in the memory are executed by the processor, the method corresponding to the above fig. 1 and 5 is realized.
In addition, the embodiment of the application also provides a computer server, which comprises a memory and one or more processors in communication connection with the memory;
the memory stores instructions executable by the one or more processors to cause the one or more processors to implement the methods corresponding to fig. 1 and 5 described above.
The embodiment of the application provides a target detection method and device based on point cloud data, which comprises the steps of firstly obtaining original point cloud data and each initial target detection frame output by an initial target detection network, and obtaining information of each initial target detection frame; in order to enable a neural network to obtain information of an initial target detection frame in subsequent processing, point clouds in a preset frame range around each initial target detection frame are extracted from original point cloud data, and input data of the neural network are generated according to points in original point cloud data in each initial target detection frame, points in point clouds in a preset frame range outside the initial target detection frame and information of each initial target detection frame. The input data is the data considering the initial target detection frame information, so that the input data is input into a pre-trained neural network for processing, and the detection frame result and the target class result corresponding to each initial target detection frame are obtained according to the output of the pre-trained neural network. Thus, the obtained detection frame result and the target class result are more accurate, and the target detection performance can be improved.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The principles and embodiments of the present application are described herein with reference to specific examples, the description of which is only for the purpose of aiding in the understanding of the methods of the present application and the core ideas thereof; meanwhile, as those skilled in the art will have modifications in the specific embodiments and application scope in accordance with the ideas of the present application, the present description should not be construed as limiting the present application in view of the above.
Claims (20)
1. A target detection method based on point cloud data, characterized by comprising the following steps:
obtaining original point cloud data and initial target detection frames output by an initial target detection network, and obtaining information of the initial target detection frames, wherein the initial target detection frames are obtained by inputting the original point cloud data into the initial target detection network, one part of points of the original point cloud data fall inside the initial target detection frames, and the other part of points fall outside the initial target detection frames;
extracting point clouds in a preset frame range around each initial target detection frame from the original point cloud data, wherein each preset frame range is larger than and surrounds the corresponding initial target detection frame;
generating input data of a neural network according to points in original point cloud data in each initial target detection frame, points in point cloud within a preset frame range outside the initial target detection frame and information of each initial target detection frame;
and inputting the input data into a pre-trained neural network for processing, and obtaining a detection frame result and a target class result corresponding to each initial target detection frame according to the output of the pre-trained neural network.
2. The method of claim 1, wherein extracting a point cloud within a predetermined frame range around each initial target detection frame from the raw point cloud data comprises:
obtaining a preset frame magnification, wherein the frame magnification is greater than 1;
and determining each preset frame range according to each initial target detection frame and the frame magnification, and extracting point clouds within each preset frame range from the original point cloud data.
3. The method of claim 1, wherein extracting a point cloud within a predetermined frame range around each initial target detection frame from the raw point cloud data comprises:
obtaining a preset frame expansion amount; the frame expansion amount is preset expansion length, expansion area or expansion volume;
and determining each preset frame range according to each initial target detection frame and the frame expansion amount, and extracting point clouds in each preset frame range from the original point cloud data.
4. The method according to claim 1, wherein the input data of the neural network includes point data to be processed and expression information of each initial target detection frame; the initial target detection frame information and the expression information at least comprise size range information of an initial target detection frame; the point data to be processed comprises point positions to be processed;
The generating the input data of the neural network according to the points in the original point cloud data in each initial target detection frame, the points in the point cloud within the preset frame range outside the initial target detection frame and the information of each initial target detection frame comprises the following steps:
determining the expression information of each initial target detection frame according to the points in the original point cloud data in each initial target detection frame, the points in the point cloud within the range of the preset frame outside the initial target detection frame and the size range information of each initial target detection frame;
and generating input data of the neural network at least comprising the expression information of each point position to be processed and each initial target detection frame.
5. The method of claim 4, wherein the initial target detection frame information and the expression information further comprise initial target detection frame center point information and orientation information.
6. The method according to claim 4, wherein determining the expression information of each initial target detection frame based on the points in the original point cloud data in each initial target detection frame, the points in the point cloud within the preset frame range outside the initial target detection frame, and the size range information of each initial target detection frame includes:
Generating virtual points uniformly filled in each initial target detection frame as expression information of each initial target detection frame according to the size range information of each initial target detection frame;
the method further comprises the steps of:
and respectively distributing preset point type values to points in original point cloud data in each initial target detection frame, the virtual points and points in point clouds within a preset frame range outside the initial target detection frame.
7. The method of claim 6, wherein generating virtual points uniformly filled in each initial target detection frame as expression information of each initial target detection frame according to the size range information of each initial target detection frame, comprises:
obtaining the intervals of virtual points corresponding to the initial target detection frames according to the size range information of the initial target detection frames;
and generating virtual points uniformly filled in the initial target detection frames in each initial target detection frame as expression information of each initial target detection frame according to the intervals of the virtual points corresponding to each initial target detection frame.
8. The method of claim 7, wherein inputting the input data into a pre-trained neural network for processing, obtaining a detection frame result and a target class result corresponding to each initial target detection frame according to an output of the pre-trained neural network, comprises:
And inputting points and virtual points in original point cloud data in each initial target detection frame, points in point clouds within a preset frame range outside the initial target detection frame and respective corresponding point type values into a pre-trained neural network for processing, and obtaining detection frame results and target category results corresponding to each initial target detection frame according to output of the pre-trained neural network.
9. The method of claim 8, further comprising a pre-training process of the neural network:
obtaining a training sample data set; the training sample data set comprises a plurality of groups of training sample data; each group of training sample data comprises points in original point cloud data in an initial target detection frame, virtual points in the initial target detection frame, points in point cloud in a preset frame range outside the initial target detection frame, point type values corresponding to the points, detection frame results corresponding to the initial target detection frame and target type results which are marked in advance;
and taking points in the original point cloud data in the initial target detection frame, virtual points in the initial target detection frame, points in the point cloud within a preset frame range outside the initial target detection frame and point type values corresponding to the points as inputs, taking a detection frame result corresponding to the initial target detection frame and a target class result which are marked in advance as outputs, and training the neural network.
10. The method according to claim 4, wherein determining the expression information of each initial target detection frame based on the points in the original point cloud data in each initial target detection frame, the points in the point cloud within the preset frame range outside the initial target detection frame, and the size range information of each initial target detection frame includes:
and adding the size range information of each initial target detection frame into the corresponding characteristics of the points in the original point cloud data in the initial target detection frame and the points in the point cloud within the range of the preset frame outside the initial target detection frame, so that the characteristics carry the expression information.
11. The method according to claim 10, wherein adding the size range information of each initial target detection frame to the corresponding feature of the point in the original point cloud data in the initial target detection frame and the point in the point cloud within the preset frame range outside the initial target detection frame, so that the feature carries the expression information, includes:
size range information of each initial target detection framel、w、h) Coordinates of points in original point cloud data added to corresponding initial target detection frame and points in point cloud within a preset frame range outside the initial target detection frame xi、yi、zi) In the process, the characteristics that each point has the expression information are generatedxi、yi、zi、l、w、h) The method comprises the steps of carrying out a first treatment on the surface of the Wherein,l、wandhrepresenting the length, width and height of the initial target detection frame, respectively.
12. The method according to claim 10, wherein adding the size range information of each initial target detection frame to the corresponding feature of the point in the original point cloud data in the initial target detection frame and the point in the point cloud within the preset frame range outside the initial target detection frame, so that the feature carries the expression information, includes:
performing normalization processing on the coordinates (xi, yi, zi) of the corresponding points in the original point cloud data within the initial target detection frame and the points in the point cloud within the preset frame range outside the initial target detection frame, according to the size range information (l, w, h) of each initial target detection frame, to generate the feature (xi/l, yi/w, zi/h) of each point carrying the expression information; wherein l, w and h represent the length, width and height of the initial target detection frame, respectively.
13. The method according to claim 11 or 12, wherein inputting the input data into a pre-trained neural network for processing, obtaining a detection frame result and a target class result corresponding to each initial target detection frame according to an output of the pre-trained neural network, comprises:
And inputting the characteristics of the expression information of the points to be processed in each initial target detection frame and the characteristics of the expression information of the points to be processed in a range of a preset frame outside the initial target detection frame into a pre-trained neural network for processing, and obtaining detection frame results and target class results corresponding to each initial target detection frame according to the output of the pre-trained neural network.
14. The method of claim 13, further comprising a pre-training process of the neural network:
obtaining a training sample data set, the training sample data set comprising a plurality of groups of training sample data, each group of training sample data comprising: features with expression information of the points to be processed within an initial target detection frame, features with expression information of the points to be processed within the preset frame range outside the initial target detection frame, and a pre-labeled detection frame result and target class result corresponding to the initial target detection frame;
and training the neural network by taking the features with expression information of the points to be processed within the initial target detection frame and the features with expression information of the points to be processed within the preset frame range outside the initial target detection frame as input, and taking the pre-labeled detection frame result and target class result corresponding to the initial target detection frame as expected output.
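Claim 14 describes a standard supervised setup: per-frame point features go in, and pre-labeled detection frame and class results serve as training targets. The toy sketch below substitutes a single linear layer trained by gradient descent for the patent's neural network, and random arrays for real features and labels; every shape and name is an assumption for illustration only, and the class-result head is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: 64 points per initial target detection frame, each
# with 3 normalized features, flattened into one input vector; the target
# is a 7-dim detection frame result (center x/y/z, l/w/h, orientation).
n_samples, n_features, n_out = 32, 64 * 3, 7
X = rng.normal(size=(n_samples, n_features))   # stand-in for point features
Y = rng.normal(size=(n_samples, n_out))        # stand-in for pre-labeled results

W = np.zeros((n_features, n_out))              # single linear layer
lr = 0.01
for _ in range(200):                           # plain gradient descent
    residual = X @ W - Y
    W -= lr * (X.T @ residual) / n_samples     # squared-error gradient
                                               # (constants folded into lr)

mse = float(((X @ W - Y) ** 2).mean())         # training error after fitting
```

The real method would replace the linear layer with the pre-trained network of the claims and add a classification loss for the target class result; only the input/target pairing mirrors the claim.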
15. The method of claim 1, wherein the detection frame result includes center point coordinates of a detection frame output by the neural network, detection frame size information, and orientation information corresponding to the detection frame.
16. The method of claim 1, further comprising, after obtaining the detection frame result and the target class result corresponding to each initial target detection frame:
and processing the detection frame results and the target class results corresponding to the initial target detection frames according to a non-maximum suppression (NMS) algorithm, to generate final detection frame results after non-maximum suppression.
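The NMS step referred to in this claim is the standard greedy procedure; the sketch below shows it for axis-aligned 2D boxes (the patent's detection frames are 3D and oriented, so this is a simplified stand-in, with all names my own):

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression over axis-aligned 2D boxes.

    boxes:  (N, 4) array of [x1, y1, x2, y2]
    scores: (N,) confidence scores
    Returns the indices of the boxes kept, highest score first.
    """
    boxes, scores = np.asarray(boxes, float), np.asarray(scores, float)
    order = scores.argsort()[::-1]          # candidates, best first
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        # Intersection of box i with every remaining candidate.
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = ((boxes[order[1:], 2] - boxes[order[1:], 0]) *
                 (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + areas - inter)
        # Drop candidates overlapping box i beyond the threshold.
        order = order[1:][iou <= iou_threshold]
    return keep

boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
keep = nms(boxes, scores, iou_threshold=0.5)   # keep == [0, 2]: box 1 is
                                               # suppressed by box 0
```

Extending this to the claimed 3D oriented frames would replace the rectangle intersection with an oriented-box (bird's-eye-view or volumetric) IoU.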
17. A target detection apparatus based on point cloud data, comprising:
an initial information obtaining unit, configured to obtain original point cloud data and initial target detection frames output by an initial target detection network, and obtain information of each initial target detection frame, where each initial target detection frame is obtained by inputting the original point cloud data into the initial target detection network, a part of the points of the original point cloud data falls within each initial target detection frame, and another part of the points falls outside each initial target detection frame;
a point cloud extraction unit, configured to extract, from the original point cloud data, the point cloud within the preset frame range around each initial target detection frame, where each preset frame range is larger than and surrounds the corresponding initial target detection frame;
an input data generating unit, configured to generate input data of the neural network according to the points of the original point cloud data within each initial target detection frame, the points of the point cloud within the preset frame range outside the initial target detection frame, and the information of each initial target detection frame;
and a result generating unit, configured to input the input data into a pre-trained neural network for processing, and obtain a detection frame result and a target class result corresponding to each initial target detection frame according to the output of the pre-trained neural network.
18. A computer readable storage medium comprising a program or instructions which, when run on a computer, implement the method of any one of claims 1 to 16.
19. A system on a chip comprising a processor coupled to a memory, the memory storing program instructions that when executed by the processor implement the method of any one of claims 1 to 16.
20. A computer server, comprising a memory and one or more processors communicatively coupled to the memory;
wherein the memory stores instructions executable by the one or more processors, and the instructions, when executed, cause the one or more processors to implement the method of any one of claims 1 to 16.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010535697.8A CN111860493B (en) | 2020-06-12 | 2020-06-12 | Target detection method and device based on point cloud data |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN111860493A CN111860493A (en) | 2020-10-30 |
| CN111860493B (en) | 2024-02-09 |
Family
ID=72987370
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202010535697.8A Active CN111860493B (en) | 2020-06-12 | 2020-06-12 | Target detection method and device based on point cloud data |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN111860493B (en) |
Families Citing this family (19)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112561971B (en) * | 2020-12-16 | 2024-08-09 | 珠海格力电器股份有限公司 | People flow statistics method, device, equipment and storage medium |
| CN114764778B (en) * | 2021-01-14 | 2025-09-09 | 北京原创世代科技有限公司 | Target detection method, target detection model training method and related equipment |
| CN112800971B (en) * | 2021-01-29 | 2024-07-16 | 深圳市商汤科技有限公司 | Neural network training and point cloud data processing method, device, equipment and medium |
| CN112862953B (en) * | 2021-01-29 | 2023-11-28 | 上海商汤临港智能科技有限公司 | Point cloud data processing method and device, electronic equipment and storage medium |
| CN112528979B (en) * | 2021-02-10 | 2021-05-11 | 成都信息工程大学 | Transformer substation inspection robot obstacle distinguishing method and system |
| AU2021204525B1 (en) * | 2021-03-30 | 2022-07-14 | Sensetime International Pte. Ltd. | Generating point cloud completion network and processing point cloud data |
| CN113447923A (en) * | 2021-06-29 | 2021-09-28 | 上海高德威智能交通系统有限公司 | Target detection method, device, system, electronic equipment and storage medium |
| CN113989188B (en) * | 2021-09-26 | 2025-07-18 | 华为技术有限公司 | Object detection method and related equipment thereof |
| CN114550161B (en) * | 2022-01-20 | 2024-08-09 | 北京大学 | An end-to-end 3D object sparse detection method |
| CN114565644B (en) * | 2022-03-02 | 2023-07-21 | 湖南中科助英智能科技研究院有限公司 | Three-dimensional moving target detection method, device and equipment |
| CN114694025A (en) * | 2022-03-21 | 2022-07-01 | 深圳市杉川机器人有限公司 | Indoor 3D target detection method and device, floor sweeping robot and storage medium |
| CN115100641B (en) * | 2022-04-26 | 2025-06-27 | 安徽理工大学 | 3D object detection method based on multi-scale graph neural network and point cloud reduction network |
| CN114998890B (en) * | 2022-05-27 | 2023-03-10 | 长春大学 | Three-dimensional point cloud target detection algorithm based on graph neural network |
| CN114937205B (en) * | 2022-06-08 | 2025-05-27 | 赛恩领动(上海)智能科技有限公司 | A method and device for detecting rain interference based on neural network |
| CN114779271B (en) * | 2022-06-16 | 2022-10-21 | 杭州宏景智驾科技有限公司 | Target detection method and device, electronic equipment and storage medium |
| CN116299312A (en) * | 2022-12-29 | 2023-06-23 | 浙江大华技术股份有限公司 | A target detection method, device, terminal and computer-readable storage medium |
| CN116110026A (en) * | 2023-02-06 | 2023-05-12 | 北京超星未来科技有限公司 | Target detection method and device, intelligent driving method, equipment and storage medium |
| CN115965824B (en) * | 2023-03-01 | 2023-06-06 | 安徽蔚来智驾科技有限公司 | Point cloud data labeling method, point cloud object detection method, equipment and storage medium |
| CN116778262B (en) * | 2023-08-21 | 2023-11-10 | 江苏源驶科技有限公司 | Three-dimensional target detection method and system based on virtual point cloud |
Citations (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108268869A (en) * | 2018-02-13 | 2018-07-10 | 北京旷视科技有限公司 | Object detection method, apparatus and system |
| CN109145931A (en) * | 2018-09-03 | 2019-01-04 | 百度在线网络技术(北京)有限公司 | object detecting method, device and storage medium |
| CN109543601A (en) * | 2018-11-21 | 2019-03-29 | 电子科技大学 | A kind of unmanned vehicle object detection method based on multi-modal deep learning |
| CN110532985A (en) * | 2019-09-02 | 2019-12-03 | 北京迈格威科技有限公司 | Object detection method, apparatus and system |
| CN110909800A (en) * | 2019-11-26 | 2020-03-24 | 浙江理工大学 | Vehicle detection method based on fast R-CNN improved algorithm |
| CN110991534A (en) * | 2019-12-03 | 2020-04-10 | 上海眼控科技股份有限公司 | Point cloud data processing method, device, equipment and computer readable storage medium |
| CN111160302A (en) * | 2019-12-31 | 2020-05-15 | 深圳一清创新科技有限公司 | Obstacle information identification method and device based on automatic driving environment |
| CN111199206A (en) * | 2019-12-30 | 2020-05-26 | 上海眼控科技股份有限公司 | Three-dimensional target detection method and device, computer equipment and storage medium |
| CN111222395A (en) * | 2019-10-21 | 2020-06-02 | 杭州飞步科技有限公司 | Target detection method and device and electronic equipment |
| CN111241964A (en) * | 2020-01-06 | 2020-06-05 | 北京三快在线科技有限公司 | Training method and device of target detection model, electronic equipment and storage medium |
| CN112949661A (en) * | 2021-05-13 | 2021-06-11 | 北京世纪好未来教育科技有限公司 | Detection frame self-adaptive external expansion method and device, electronic equipment and storage medium |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN105678748B (en) * | 2015-12-30 | 2019-01-15 | 清华大学 | Interactive calibration method and device in three-dimension monitoring system based on three-dimensional reconstruction |
| US11618438B2 (en) * | 2018-03-26 | 2023-04-04 | International Business Machines Corporation | Three-dimensional object localization for obstacle avoidance using one-shot convolutional neural network |
| CN110148144B (en) * | 2018-08-27 | 2024-02-13 | 腾讯大地通途(北京)科技有限公司 | Point cloud data segmentation method and device, storage medium, electronic device |
Non-Patent Citations (2)
| Title |
|---|
| The registration of non-cooperative moving targets laser point cloud in different view point; Wang, S et al.; NANOPHOTONICS AUSTRALASIA; p. 10456 * |
| Vehicle target detection method based on fusion of LiDAR point cloud and image; Hu Yuanzhi; Liu Junsheng; He Jia; Xiao Hang; Song Jia; Journal of Automotive Safety and Energy (Issue 04); pp. 65-72 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN111860493A (en) | 2020-10-30 |
Similar Documents
| Publication | Title |
|---|---|
| CN111860493B (en) | Target detection method and device based on point cloud data | |
| CN110991468B (en) | Three-dimensional target detection and intelligent driving method, device and equipment | |
| CN113228043B (en) | System and method for obstacle detection and association based on neural network for mobile platform | |
| CN113591518B (en) | An image processing method, network training method and related equipment | |
| CN113252051B (en) | A method and device for constructing a map | |
| CN110930323B (en) | Methods and devices for image de-reflection | |
| CN114930401A (en) | Point cloud-based three-dimensional reconstruction method and device and computer equipment | |
| CN111062405B (en) | Method and device for training image recognition model and image recognition method and device | |
| CN113252022B (en) | A method and device for processing map data | |
| US11966452B2 (en) | Systems and methods for image based perception | |
| US12050661B2 (en) | Systems and methods for object detection using stereovision information | |
| EP3703008A1 (en) | Object detection and 3d box fitting | |
| CN114693540B (en) | Image processing method and device and intelligent automobile | |
| CN116449356A (en) | Aggregation-based LIDAR data alignment | |
| CN114675274A (en) | Obstacle detection method, obstacle detection device, storage medium, and electronic apparatus | |
| CN115546781A (en) | Point cloud data clustering method and device | |
| US11663807B2 (en) | Systems and methods for image based perception | |
| CN116110013A (en) | image relighting | |
| CN115205848A (en) | Target detection method, target detection device, vehicle, storage medium and chip | |
| US11308324B2 (en) | Object detecting system for detecting object by using hierarchical pyramid and object detecting method thereof | |
| EP4145352A1 (en) | Systems and methods for training and using machine learning models and algorithms | |
| US12147232B2 (en) | Method, system and computer program product for the automated locating of a vehicle | |
| CN118466525B (en) | A high-altitude obstacle avoidance method for power inspection robots | |
| US20230084623A1 (en) | Attentional sampling for long range detection in autonomous vehicles | |
| CN113066124A (en) | Neural network training method and related equipment |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | | |
| SE01 | Entry into force of request for substantive examination | | |
| GR01 | Patent grant | | |
| CP03 | Change of name, title or address | Address after: 101300, No. two, 1 road, Shunyi Park, Zhongguancun science and Technology Park, Beijing, Shunyi District; Patentee after: Beijing Original Generation Technology Co.,Ltd.; Country or region after: China; Address before: 101300, No. two, 1 road, Shunyi Park, Zhongguancun science and Technology Park, Beijing, Shunyi District; Patentee before: BEIJING TUSEN ZHITU TECHNOLOGY Co.,Ltd.; Country or region before: China |