
CN114913331B - Target detection method and device based on point cloud data - Google Patents

Target detection method and device based on point cloud data

Info

Publication number
CN114913331B
CN114913331B (application CN202110174725.2A)
Authority
CN
China
Prior art keywords
point
target
points
foreground
voxel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110174725.2A
Other languages
Chinese (zh)
Other versions
CN114913331A (en)
Inventor
苗振伟
陈纪凯
朱均
刘凯旋
郝培涵
占新
卿泉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuzhou Online E Commerce Beijing Co ltd
Original Assignee
Wuzhou Online E Commerce Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuzhou Online E Commerce Beijing Co ltd filed Critical Wuzhou Online E Commerce Beijing Co ltd
Priority to CN202110174725.2A priority Critical patent/CN114913331B/en
Publication of CN114913331A publication Critical patent/CN114913331A/en
Application granted granted Critical
Publication of CN114913331B publication Critical patent/CN114913331B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Optical Radar Systems And Details Thereof (AREA)

Abstract

The disclosure provides a target detection method and device based on point cloud data. The method comprises the following steps: voxelizing point cloud data and extracting voxel characteristics of non-empty voxels; obtaining voxel characteristics and pixel characteristics corresponding to the laser points in the non-empty voxels according to the voxel characteristics; obtaining a fusion characteristic of each laser point according to the original characteristics of the laser point, the voxel characteristics of the non-empty voxels, and the voxel characteristics and pixel characteristics corresponding to the laser point; and determining a target to be identified in the point cloud data according to the fusion characteristics of the laser points. Point-dimension target segmentation and detection are achieved with high detection accuracy and precision.

Description

Target detection method and device based on point cloud data
Technical Field
The disclosure relates to the technical field of deep learning, in particular to a target detection method and device based on point cloud data.
Background
Laser point cloud data can be used to predict the position and geometric shape of a target, and therefore plays an important role in machine perception fields such as autonomous driving and robotics.
Prior-art methods for target detection using point cloud data mainly include the following:
1. Conventional segmentation-and-detection algorithms. Ground points are filtered out of the laser point cloud by a ground segmentation algorithm, the remaining points are clustered by a graph-based segmentation clustering algorithm, background points are filtered out, and the segmented point cloud clusters are classified by a classifier (for example, an SVM classifier). However, the graph-based segmentation clustering algorithm is computationally expensive and depends on the ground segmentation algorithm, so detection accuracy and precision are limited in complex urban environments.
2. Deep learning methods based on laser point cloud projection. By projecting the 3D laser point cloud data onto a specific 2D plane, the 3D laser point cloud detection problem is reduced to 2D image detection; however, the dimensionality reduction also causes the point cloud data to lose part of the information about the target, reducing the accuracy and precision of the final target prediction.
In summary, prior-art methods for target detection using point cloud data have difficulty meeting the requirements on accuracy and precision.
Disclosure of Invention
In view of the foregoing, the present disclosure has been made in order to provide a target detection method and apparatus based on point cloud data that overcomes or at least partially solves the foregoing problems.
In a first aspect, an embodiment of the present disclosure provides a target detection method based on point cloud data, including:
voxelizing point cloud data, and extracting voxel characteristics of non-empty voxels;
obtaining voxel characteristics and pixel characteristics corresponding to laser points in the non-empty voxels according to the voxel characteristics;
obtaining a fusion characteristic of the laser point according to the original characteristic of the laser point, the voxel characteristic of the non-empty voxel, and the voxel characteristic and the pixel characteristic corresponding to the laser point;
And determining a target to be identified in the point cloud data according to the fusion characteristics of the laser points.
In a second aspect, an embodiment of the present disclosure provides a target detection apparatus based on point cloud data, including:
The voxelization module is used for voxelizing the point cloud data and extracting voxel characteristics of non-empty voxels;
The feature acquisition module is used for acquiring voxel features and pixel features corresponding to the laser points in the non-empty voxels according to the voxel features extracted by the voxelization module;
the fusion module is used for obtaining the fusion characteristics of the laser points according to the original characteristics of the laser points, the voxel characteristics of the non-empty voxels extracted by the voxelization module, and the voxel characteristics and the pixel characteristics corresponding to the laser points obtained by the characteristic acquisition module;
And the target identification module is used for determining a target to be identified in the point cloud data according to the fusion characteristics of the laser points obtained by the fusion module.
In a third aspect, an embodiment of the present disclosure provides a computer program product with a target detection function, including a computer program/instruction, where the computer program/instruction, when executed by a processor, implements the target detection method based on point cloud data.
In a fourth aspect, embodiments of the present disclosure provide a server, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor implements the above target detection method based on point cloud data when executing the program.
The beneficial effects of the technical scheme provided by the embodiment of the disclosure at least include:
According to the target detection method based on point cloud data provided by the embodiments of the disclosure, the point cloud data are voxelized and the voxel characteristics of non-empty voxels are extracted; the voxel characteristics and pixel characteristics corresponding to the laser points in the non-empty voxels are obtained according to the voxel characteristics; the fusion characteristic of each laser point is obtained according to the original characteristics of the laser point, the voxel characteristics of the non-empty voxels, and the voxel characteristics and pixel characteristics corresponding to the laser point; and the target to be identified in the point cloud data is determined according to the fusion characteristics of the laser points. The fusion characteristic of a laser point includes the original characteristics of the point, so accurate position information of the point is retained; it also includes the voxel characteristics and pixel characteristics corresponding to the point, so relative information among surrounding laser points is retained and the point is characterized more richly. The fusion characteristic of a point thus retains the original characteristics of the point while also containing rich contextual semantic information, which enhances the feature expression capability of each point and ensures the accuracy and precision of targets identified by deep learning on these fusion characteristics.
Additional features and advantages of the disclosure will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the disclosure. The objectives and other advantages of the disclosure will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
The technical scheme of the present disclosure is described in further detail below through the accompanying drawings and examples.
Drawings
The accompanying drawings are included to provide a further understanding of the disclosure, and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure, without limitation to the disclosure. In the drawings:
fig. 1 is a flowchart of a target detection method based on point cloud data in a first embodiment of the disclosure;
FIG. 2 is a flowchart showing the implementation of step S12 in FIG. 1;
FIG. 3 is a flowchart of a specific implementation of foreground feature-based object recognition in a second embodiment of the present disclosure;
FIG. 4 is a flowchart showing the implementation of step S33 in FIG. 3;
FIG. 5 is an example diagram of a method of target detection based on point cloud data in an embodiment of the present disclosure;
Fig. 6 is a schematic structural diagram of a target detection device based on point cloud data in an embodiment of the disclosure.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In order to solve the problem that in the prior art, accuracy and precision of target detection are low by utilizing point cloud data, the embodiment of the disclosure provides a target detection method and device based on the point cloud data, which realizes point dimension target segmentation detection and has high detection accuracy and precision.
Example 1
An embodiment of the present disclosure provides a target detection method based on point cloud data, the flow of which is shown in fig. 1, including the following steps:
step S11: and voxelizing the point cloud data, and extracting voxel characteristics of non-empty voxels.
The point cloud data may be obtained by a multi-line lidar or radar with, for example, 4, 16, 32, 64, or 128 lines.
Voxelizing the point cloud data converts point data without spatial extent into cube data carrying three-dimensional spatial information: a minimum cuboid containing all the point cloud data is determined from the minimum and maximum values of the point cloud along the X, Y, and Z coordinate directions; the voxel size is determined from the size of this minimum cuboid and the resolution requirement; and the minimum cuboid is evenly divided into voxels of that size.
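As a minimal illustration of this voxelization step (a sketch assuming NumPy; the function and parameter names are hypothetical and the voxel size is only an example), the following groups points into a voxel grid derived from the minimum cuboid:

```python
import numpy as np

def voxelize(points, voxel_size=(0.1, 0.1, 0.2)):
    """Group laser points (N x 4: x, y, z, intensity) into voxels.

    Returns a dict mapping integer voxel coordinates (ix, iy, iz) to the
    row indices of the points inside that voxel, plus the grid shape.
    """
    xyz = points[:, :3]
    mins, maxs = xyz.min(axis=0), xyz.max(axis=0)        # minimum cuboid corners
    voxel_size = np.asarray(voxel_size, dtype=float)
    grid_shape = np.maximum(np.ceil((maxs - mins) / voxel_size).astype(int), 1)

    coords = np.floor((xyz - mins) / voxel_size).astype(int)
    coords = np.clip(coords, 0, grid_shape - 1)           # integer voxel coordinates

    voxels = {}
    for i, c in enumerate(map(tuple, coords)):
        voxels.setdefault(c, []).append(i)                # only non-empty voxels appear
    return voxels, grid_shape
```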
In one embodiment, extracting voxel features of non-empty voxels may include utilizing multi-layer perceptron network MLP and/or convolutional neural network CNN layers.
A Multi-Layer Perceptron (MLP), also known as an Artificial Neural Network (ANN), includes an input layer, an output layer, and at least one hidden layer in between. Neural networks are a technology inspired by biological neural networks; they reach their target by connecting multiple feature values and combining linear and nonlinear transformations.
Convolutional neural networks (Convolutional Neural Networks, CNN) are a class of feed-forward neural networks (Feedforward Neural Networks, FNN) that contain convolutional computations and have a deep structure, and are one of the representative algorithms for deep learning. Voxel features of non-empty voxels may be extracted through pooling layers of the convolutional neural network.
Alternatively, other manually set features of non-empty voxels may be acquired and added to the voxel features extracted through the network.
The obtained voxel feature may include the features of each laser point in the non-empty voxel, as well as features obtained by fusing the features of the laser points in the non-empty voxel with those of the surrounding laser points.
Step S12: shallow layer information of the laser points in the non-empty voxels, and voxel characteristics and pixel characteristics corresponding to the laser points are obtained according to the voxel characteristics.
Referring to fig. 2, the specific implementation procedure of step S12 may include the following steps:
Step S121: and splicing the original features of the laser points in the non-empty voxels with the voxel features of the non-empty voxels to obtain shallow information of the laser points.
A multi-layer perceptron network MLP performs one-dimensional convolution on the original features {x, y, z, intensity} of each laser point (intensity is the reflection intensity of the point); the processed features are concatenated with the voxel characteristics of the voxel in which the laser point is located, and another multi-layer perceptron network MLP then produces the point-dimension shallow information of the laser point.
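A minimal sketch of this point-wise shallow feature, assuming PyTorch; the channel widths are illustrative, since the patent does not specify them:

```python
import torch
import torch.nn as nn

class ShallowPointFeature(nn.Module):
    """MLP over the raw point features, concatenated with the voxel feature
    of the voxel containing the point, followed by a second MLP."""

    def __init__(self, point_dim=4, voxel_dim=64, out_dim=64):
        super().__init__()
        self.point_mlp = nn.Sequential(nn.Linear(point_dim, 32), nn.ReLU())
        self.fuse_mlp = nn.Sequential(nn.Linear(32 + voxel_dim, out_dim), nn.ReLU())

    def forward(self, raw_points, voxel_feats):
        # raw_points: (N, 4) = {x, y, z, intensity}
        # voxel_feats: (N, voxel_dim), feature of the voxel each point lies in
        p = self.point_mlp(raw_points)
        return self.fuse_mlp(torch.cat([p, voxel_feats], dim=-1))  # (N, out_dim)
```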
Step S122: and inputting the voxel characteristics of the non-empty voxels into a characteristic network with three-dimensional convolution and two-dimensional convolution fusion, and extracting the three-dimensional network characteristics and the two-dimensional network characteristics of the non-empty voxels.
And inputting the voxel characteristics of the non-empty voxels into a characteristic network fused by the three-dimensional convolution and the two-dimensional convolution, sequentially carrying out the three-dimensional convolution and the two-dimensional convolution, and extracting the three-dimensional network characteristics and the two-dimensional network characteristics of the non-empty voxels.
Sparse three-dimensional convolution may be applied to the voxel characteristics of the non-empty voxels through a backbone network to obtain a three-dimensional data volume at a set resolution that includes the processed voxel characteristics of the non-empty voxels, thereby extracting the three-dimensional network characteristics of the non-empty voxels; two-dimensional convolution is then applied to the three-dimensional data volume through the backbone network to obtain a two-dimensional data volume at the corresponding resolution that includes the pixel characteristics of the non-empty voxels, thereby extracting the two-dimensional network characteristics of the non-empty voxels.
For example, the backbone sparse 3D convolutional network may downsample by factors of 2, 4, and 8, and three-dimensional data volumes are extracted from the four layers of the 3D network at full, 1/2, 1/4, and 1/8 resolution respectively (other resolutions may also be used). Two-dimensional convolution is then applied to the three-dimensional data volume at each resolution to obtain a two-dimensional data volume at the corresponding resolution.
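The following simplified sketch illustrates the idea of such a 3D-then-2D feature backbone using dense PyTorch convolutions; a practical implementation would normally use sparse 3D convolutions, and all channel widths, strides, and the assumed input depth are illustrative only:

```python
import torch.nn as nn

class Backbone3Dto2D(nn.Module):
    """Downsample a dense voxel-feature volume with 3D convs, then collapse
    the depth axis into channels and refine with 2D convs (BEV pixel features)."""

    def __init__(self, c_in=64, c3d=64, c2d=128, depth_out=4):
        super().__init__()
        # Two stride-2 3D convs: an input depth of 16 yields depth_out = 4.
        self.conv3d = nn.Sequential(
            nn.Conv3d(c_in, c3d, 3, stride=2, padding=1), nn.ReLU(),   # 1/2 resolution
            nn.Conv3d(c3d, c3d, 3, stride=2, padding=1), nn.ReLU(),    # 1/4 resolution
        )
        self.conv2d = nn.Sequential(
            nn.Conv2d(c3d * depth_out, c2d, 3, padding=1), nn.ReLU(),
        )

    def forward(self, voxel_volume):
        # voxel_volume: (B, C, D, H, W) dense grid of voxel features
        vol3d = self.conv3d(voxel_volume)            # 3D network features
        b, c, d, h, w = vol3d.shape
        bev = vol3d.reshape(b, c * d, h, w)          # stack depth into channels
        return vol3d, self.conv2d(bev)               # (3D features, 2D pixel features)
```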
Step S123: and obtaining the voxel characteristics and the pixel characteristics corresponding to the laser points in the non-empty voxels according to the corresponding relation between the laser points in the non-empty voxels and the three-dimensional network characteristics and the two-dimensional network characteristics.
The original characteristics of a laser point in the point cloud include its position information. The projection position of the laser point in the three-dimensional data volume and the two-dimensional data volume is determined from this position information, i.e., the voxel in which the laser point is located is determined from its position, and the processed voxel characteristics and pixel characteristics of that voxel in the three-dimensional and two-dimensional data volumes are taken as the voxel characteristics and pixel characteristics corresponding to the laser point.
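A sketch of this per-point lookup, assuming the 3D and 2D volumes are dense NumPy arrays and that a point is projected simply by integer indexing at the volume's resolution (names and the stride parameter are illustrative):

```python
import numpy as np

def gather_point_features(points_xyz, vol3d, vol2d, mins, voxel_size, stride=4):
    """Read, for each laser point, the processed voxel feature and pixel feature
    of the cell it projects into.

    vol3d: (C3, D, H, W) processed voxel features; vol2d: (C2, H, W) pixel features;
    stride: downsampling factor of the volumes relative to the original voxel grid.
    """
    cell = np.asarray(voxel_size, dtype=float) * stride
    coords = np.floor((points_xyz - mins) / cell).astype(int)
    ix = np.clip(coords[:, 0], 0, vol3d.shape[3] - 1)
    iy = np.clip(coords[:, 1], 0, vol3d.shape[2] - 1)
    iz = np.clip(coords[:, 2], 0, vol3d.shape[1] - 1)
    point_voxel_feat = vol3d[:, iz, iy, ix].T        # (N, C3)
    point_pixel_feat = vol2d[:, iy, ix].T            # (N, C2)
    return point_voxel_feat, point_pixel_feat
```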
Step S13: and obtaining the fusion characteristic of the laser spot according to the shallow information of the laser spot and the corresponding voxel characteristic and pixel characteristic.
And fusing the shallow layer information of the point dimension of the laser point with the corresponding voxel characteristic and pixel characteristic to obtain the fusion characteristic of the point dimension of the laser point.
Step S14: and determining a target to be identified in the point cloud data according to the fusion characteristics of the laser points.
The point cloud segmentation task and the target detection task are implemented by multi-task learning on the fusion characteristic of each laser point, including foreground/background segmentation, point classification and recognition, IoU score supervision, and supervision of the center point, size, and angle.
Specifically, foreground/background segmentation distinguishes whether a laser point is a foreground point or a background point: background points are filtered out, and foreground points are used for subsequent target recognition. Point classification and recognition further distinguishes, once a point has been determined to be a foreground point, the type of the target corresponding to that point, such as a person, a vehicle, or a tree. In target detection evaluation there is an IoU metric, which can be simply understood as the overlap ratio between the target window generated by the model and the ground-truth window, i.e., a score for the accuracy of the target prediction. The center point, size, and angle are those of the target contour identified from the foreground points.
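As a sketch of what such point-wise multi-task heads could look like (PyTorch; the number of classes and the 7-value box parameterization of center offset, size, and yaw are assumptions, not taken from the patent):

```python
import torch.nn as nn

class MultiTaskPointHead(nn.Module):
    """Point-wise heads over the fusion feature: foreground/background score,
    class logits, IoU score, and box regression (center offset, size, angle)."""

    def __init__(self, feat_dim=128, num_classes=3):
        super().__init__()
        self.fg_head = nn.Linear(feat_dim, 1)             # foreground/background
        self.cls_head = nn.Linear(feat_dim, num_classes)  # point classification
        self.iou_head = nn.Linear(feat_dim, 1)            # IoU score supervision
        self.box_head = nn.Linear(feat_dim, 7)            # dx, dy, dz, l, w, h, yaw

    def forward(self, fused):                             # fused: (N, feat_dim)
        return {
            "foreground": self.fg_head(fused),
            "class": self.cls_head(fused),
            "iou": self.iou_head(fused),
            "box": self.box_head(fused),
        }
```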
According to the target detection method based on point cloud data provided by the embodiments of the disclosure, the point cloud data are voxelized and the voxel characteristics of non-empty voxels are extracted; the shallow information of the laser points in the non-empty voxels and the voxel characteristics and pixel characteristics corresponding to the laser points are obtained according to the voxel characteristics; the fusion characteristic of each laser point is obtained according to its shallow information and the corresponding voxel and pixel characteristics; and the target to be identified in the point cloud data is determined according to the fusion characteristics of the laser points. The fusion characteristic of a laser point includes the shallow information of the point, so accurate position information of the point is retained; it also includes the voxel characteristics and pixel characteristics corresponding to the point, so relative information among surrounding laser points is retained and the point is characterized more richly. The fusion characteristic of a point thus retains the original characteristics of the point while also containing rich contextual semantic information, which enhances the feature expression capability of each point and ensures the accuracy and precision of targets identified by deep learning on these fusion characteristics.
In one embodiment, prior to voxelizing the point cloud data, further comprising performing at least one of:
Filtering background point clouds in the point cloud data;
the point clouds within the non-interest blocks in the point cloud data are filtered.
By filtering out the background point cloud and the point clouds in non-interest blocks, roughly 50% or more of the irrelevant points can be removed, which greatly reduces the number of laser points to be processed and the subsequent computation.
When the above target detection method is applied to the field of automatic driving, the region of interest (as opposed to a non-interest region) refers to a region that influences automatic driving, such as the road being travelled or a sidewalk close to that road.
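For instance, a simple rectangular region-of-interest filter could be sketched as follows (the ROI bounds are purely illustrative; in practice the region of interest would be derived from map or lane information):

```python
import numpy as np

def filter_points_of_interest(points, x_range=(-60.0, 60.0), y_range=(-40.0, 40.0)):
    """Keep only points inside a rectangular region of interest around the sensor.
    points: (N, 4) array of x, y, z, intensity."""
    x, y = points[:, 0], points[:, 1]
    mask = (x >= x_range[0]) & (x <= x_range[1]) & (y >= y_range[0]) & (y <= y_range[1])
    return points[mask]
```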
Example two
A second embodiment of the present disclosure provides a method for identifying a target based on foreground features, the flow of which is shown in FIG. 3, including the following steps:
step S31: and identifying foreground points through the fusion characteristics of the laser points by a deep learning network, and determining target prediction information of the foreground points.
Foreground points are identified by performing multi-task learning on the fusion characteristic of each laser point, and the center point position information, size information, angle information, and prediction score of the target corresponding to each foreground point are determined and used as its target prediction information. Optionally, the target prediction information may further include the classification of the target.
The center point position information may be the offset of the foreground point relative to the target center point. Because the absolute coordinate values of the center point are often large, representing the center as an offset from the foreground point rather than as specific coordinates simplifies the deep learning process and reduces the amount of computation.
Step S32: and screening a preset number of foreground points by using the furthest point sampling mode to serve as main foreground points.
In one embodiment, foreground points whose prediction score in the target prediction information is lower than a preset second score threshold are deleted; the center point position of the target corresponding to each remaining foreground point is determined from the position information in the original characteristics of the foreground point and the offset, in its target prediction information, of the foreground point relative to the target center point; and a preset number of foreground points are screened as main foreground points by furthest point sampling over the center point positions of the targets corresponding to the foreground points.
Specifically, a foreground point is first chosen at random as a main foreground point, and then the foreground point farthest from the already-selected main foreground points is repeatedly added as a main foreground point until the preset number of main foreground points is reached.
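A straightforward NumPy sketch of this furthest point sampling over the predicted target center points (the function name and the default of 256 samples are illustrative):

```python
import numpy as np

def furthest_point_sampling(centers, num_samples=256):
    """Iteratively select the point farthest from the already-selected set.

    centers: (N, 3) predicted target center points of the foreground points.
    Returns the indices of the selected main foreground points.
    """
    n = centers.shape[0]
    num_samples = min(num_samples, n)
    selected = [int(np.random.randint(n))]          # start from a random point
    dist = np.full(n, np.inf)
    while len(selected) < num_samples:
        diff = centers - centers[selected[-1]]
        dist = np.minimum(dist, np.linalg.norm(diff, axis=1))
        selected.append(int(np.argmax(dist)))       # farthest from the selected set
    return np.asarray(selected)
```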
The preset number is determined according to specific situations, and may be determined according to factors such as accuracy requirements of prediction, the approximate target number and the size of the target outline. For example, the preset number may be 256, or may be other values.
Step S33: and determining the recognition result of the target to be recognized according to the target prediction information of the main foreground point.
In one embodiment, referring to FIG. 4, the following steps may be included:
step S331: for each main foreground point, determining a set of main foreground points, wherein the distance between the target center point and the target center point of the main foreground point is smaller than a preset distance threshold value, according to the target center point position information in the target prediction information of the main foreground point.
For each main foreground point, the set of main foreground points corresponding to the same target may be determined using a ball query.
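A sketch of this ball query over the predicted target centers of the main foreground points (the radius is an illustrative value):

```python
import numpy as np

def ball_query(centers, query_idx, radius=1.0):
    """Return the indices of main foreground points whose predicted target
    centers lie within `radius` of the query point's predicted target center."""
    d = np.linalg.norm(centers - centers[query_idx], axis=1)
    return np.nonzero(d < radius)[0]
```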
Step S332: and determining a first target prediction result according to the target prediction information of each main foreground point in the set.
The information of the corresponding target can be predicted from the target prediction information of the main foreground points in the set by different estimators; for example, a mean estimator simply averages the center point, size, angle, and prediction score in the target prediction information of each main foreground point in the set to obtain the output for a single target.
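A mean estimator over one such set could be sketched as follows (assuming the 7-value box layout and per-point prediction scores used in the earlier sketches):

```python
import numpy as np

def mean_estimate(boxes, scores):
    """Average the per-point box predictions (center, size, angle) and
    prediction scores of one main-foreground-point set into a single target."""
    return boxes.mean(axis=0), float(scores.mean())
```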
Step S333: clustering the first target prediction results, and determining a second target prediction result according to the first target prediction results of the same class as a target recognition result of the target to be recognized.
Although each main foreground point corresponds to one set of main foreground points and hence to one first target prediction result, the sets obtained from different main foreground points within the same set are often identical, so the resulting first target prediction results are often identical as well. Therefore, after clustering the first target prediction results, a class may contain only a single unique first target prediction result, and in that case this unique result is taken as the final recognition result for the target.
When the first target prediction results of the same class include more than one distinct first target prediction result, the mean of the prediction scores in the target prediction information of the main foreground points corresponding to the first target prediction results of that class is determined, and the second target prediction result is determined from the first target prediction results whose score mean is higher than a preset first score threshold.
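The sketch below groups near-identical first target prediction results by center distance and keeps one second target prediction per group only when the mean prediction score clears a threshold (both threshold values are illustrative):

```python
import numpy as np

def merge_predictions(pred_boxes, pred_scores, center_thresh=0.5, score_thresh=0.3):
    """Cluster first target predictions whose centers nearly coincide, then keep
    one second target prediction per cluster if its mean score is high enough.
    pred_boxes: (M, 7) with the first three values taken as the box center."""
    results, used = [], np.zeros(len(pred_boxes), dtype=bool)
    for i in range(len(pred_boxes)):
        if used[i]:
            continue
        d = np.linalg.norm(pred_boxes[:, :3] - pred_boxes[i, :3], axis=1)
        cluster = np.nonzero((d < center_thresh) & ~used)[0]
        used[cluster] = True
        if pred_scores[cluster].mean() > score_thresh:
            results.append(pred_boxes[cluster].mean(axis=0))
    return results
```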
According to the foreground-feature-based target identification method provided by the second embodiment of the disclosure, screening a preset number of foreground points as main foreground points by furthest point sampling greatly reduces the number of main foreground points and the amount of computation, which helps meet the real-time requirement of target prediction; at the same time, the selected main foreground points are highly representative, so the prediction of each target can be completed well and prediction accuracy is ensured.
Referring to fig. 5, the above process of target detection based on point cloud data may be summarized as follows: (1) Point cloud input: the point cloud data are fed into a multi-dimensional-fusion end-to-end 3D perception network. (2) Point cloud voxelization and voxel feature extraction: the point cloud data are voxelized and the voxel features of each non-empty voxel are extracted; these features may include features output by a lightweight multi-layer perceptron network MLP plus pooling layers of a convolutional neural network CNN, and may also include hand-crafted features. (3) Extraction of 3D-processed voxel features and 2D pixel features by the backbone network: the voxel features are further fed into a feature backbone network that fuses 3D convolution and 2D convolution, and voxel and pixel features with richer semantic information are extracted at multiple scales, i.e., three-dimensional data volumes at various resolutions including the processed voxel features of the non-empty voxels, and two-dimensional data volumes including the pixel features of the non-empty voxels. (4) Determining the voxel to which a point belongs: the projection position of the point in the three-dimensional and two-dimensional data volumes is determined from the position information in the original features of the point, i.e., the voxel to which the point belongs is determined. (5) Determining the processed voxel features and pixel features of a point: these are determined according to the voxel to which the point belongs. (6) Point feature fusion: the original features of the laser point are passed through an MLP network, concatenated with the voxel features, and passed through a further MLP network to extract the point-dimension shallow features within the voxel; on the basis of the shallow features of the point and the corresponding processed voxel features and pixel features, these three kinds of features are further fused to obtain a point-dimension description feature with accurate position information and rich semantic information. (7) Point cloud segmentation and target detection: multi-task learning is performed on the fusion feature of each laser point, including foreground/background segmentation, point classification and recognition, IoU score supervision, and center point, size, and angle supervision.
The target detection method in the embodiment of the disclosure can be applied to automatic driving, and objects around the vehicle are predicted in the automatic driving process, so that guarantee is provided for the safe realization of automatic driving; the method can also be applied to scenes such as high-precision maps, augmented reality (Augmented Reality, AR) navigation and the like.
Based on the inventive concept of the present disclosure, an embodiment of the present disclosure further provides a target detection device based on point cloud data, a structure of which is shown in fig. 6, including:
a voxelization module 61 for voxelizing the point cloud data and extracting voxel characteristics of non-empty voxels;
A feature obtaining module 62, configured to obtain shallow layer information of a laser point in the non-empty voxel according to the voxel feature extracted by the voxelization module 61, and a voxel feature and a pixel feature corresponding to the laser point;
the fusion module 63 is configured to obtain the fusion characteristics of the laser point according to the shallow information of the laser point obtained by the characteristic obtaining module 62 and the voxel characteristics and pixel characteristics corresponding to the laser point;
The target identification module 64 is configured to determine a target to be identified in the point cloud data according to the fusion characteristics of the laser points obtained by the fusion module 63.
In one embodiment, the feature obtaining module 62 is configured to obtain shallow layer information of the laser point in the non-empty voxel according to the voxel feature, specifically configured to:
And splicing the original features of the laser points in the non-empty voxels with the voxel features of the non-empty voxels to obtain shallow layer information of the laser points.
In one embodiment, the feature obtaining module 62 is configured to obtain a voxel feature and a pixel feature corresponding to the laser point, where the voxel feature and the pixel feature are specifically:
Inputting the voxel characteristics of the non-empty voxels into a three-dimensional convolution and two-dimensional convolution fused characteristic network, and extracting the three-dimensional network characteristics and the two-dimensional network characteristics of the non-empty voxels; and obtaining the voxel characteristics and the pixel characteristics corresponding to the laser points in the non-empty voxels according to the corresponding relation between the laser points in the non-empty voxels and the three-dimensional network characteristics and the two-dimensional network characteristics.
In one embodiment, the target identifying module 64 is configured to determine a target to be identified in the point cloud data according to the fusion characteristics of the laser points, specifically:
Identifying foreground points through a deep learning network according to the fusion characteristics of the laser points, and determining target prediction information of the foreground points; screening a preset number of foreground points to be used as main foreground points in a mode of furthest point sampling; and determining the recognition result of the target to be recognized according to the target prediction information of the main foreground point.
In one embodiment, the target recognition module 64 determines a recognition result of the target to be recognized according to the target prediction information of the primary foreground point, specifically for:
For each main foreground point, determining a set of main foreground points, wherein the distance between the target center point and the target center point of the main foreground point is smaller than a preset distance threshold value, according to the target center point position information in the target prediction information of the main foreground point; determining a first target prediction result according to target prediction information of each main foreground point in the set; clustering the first target prediction results, and determining a second target prediction result according to the first target prediction results of the same class as a target recognition result of the target to be recognized.
In one embodiment, the target recognition module 64 determines a second target prediction result based on the first target prediction result of the same class, specifically for:
If the first target prediction results of the same class comprise more than one first target prediction result, determining a score average value of prediction scores in target prediction information of main foreground points corresponding to the first target prediction results of the class; and determining a second target prediction result according to the first target prediction result with the score mean value higher than a preset first score threshold value.
In one embodiment, the target recognition module 64 determines target prediction information for the foreground points, specifically for:
And determining the central point position information, the size information, the angle information and the prediction score of the target corresponding to the foreground point, wherein the central point position information is the offset of the foreground point relative to the central point of the target, and is used as target prediction information.
In one embodiment, the target recognition module 64 screens a preset number of foreground points as the main foreground points by using the furthest point sampling method, which is specifically used for:
Deleting foreground points with the prediction score lower than a preset second score threshold value in the target prediction information; determining the center point position of a target corresponding to the foreground point according to the position information in the original characteristics of the foreground point and the offset of the foreground point in the target prediction information relative to the target center point; and screening a preset number of foreground points to serve as main foreground points in a mode of furthest point sampling according to the center point position of the target corresponding to the foreground points.
In one embodiment, the voxelization module 61 extracts voxel features of non-empty voxels, specifically for:
and extracting voxel characteristics of non-empty voxels by using a multi-layer perceptron network (MLP) and/or a convolutional neural network (CNN).
The specific manner in which the various modules perform the operations in the apparatus of the above embodiments have been described in detail in connection with the embodiments of the method, and will not be described in detail herein.
Based on the inventive concept of the present disclosure, the embodiments of the present disclosure further provide a computer program product with a target detection function, including a computer program/instruction, where the computer program/instruction implements the target detection method based on the point cloud data when executed by a processor.
Based on the inventive concept of the present disclosure, an embodiment of the present disclosure further provides a server, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor implements the above target detection method based on point cloud data when executing the program.
Unless specifically stated otherwise, terms such as processing, computing, calculating, determining, displaying, or the like, may refer to an action and/or process of one or more processing or computing systems, or similar devices, that manipulates and transforms data represented as physical (e.g., electronic) quantities within the processing system's registers or memories into other data similarly represented as physical quantities within the processing system's memories, registers or other such information storage, transmission or display devices. Information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
It should be understood that the specific order or hierarchy of steps in the processes disclosed are examples of exemplary approaches. Based on design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged without departing from the scope of the present disclosure. The accompanying method claims present elements of the various steps in a sample order, and are not meant to be limited to the specific order or hierarchy presented.
In the foregoing detailed description, various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments of the subject matter require more features than are expressly recited in each claim. Rather, as the following claims reflect, the present disclosure is directed to less than all of the features of a single disclosed embodiment. Thus, the following claims are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate preferred embodiment of this disclosure.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. The processor and the storage medium may reside as discrete components in a user terminal.
For a software implementation, the techniques described in this disclosure may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. These software codes may be stored in memory units and executed by processors. The memory unit may be implemented within the processor or external to the processor, in which case it can be communicatively coupled to the processor via various means as is known in the art.
The foregoing description includes examples of one or more embodiments. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the aforementioned embodiments, but one of ordinary skill in the art may recognize that many further combinations and permutations of various embodiments are possible. Accordingly, the embodiments described herein are intended to embrace all such alterations, modifications and variations that fall within the scope of the appended claims. Furthermore, as used in the specification or claims, the term "including" is intended to be inclusive in a manner similar to the term "comprising," as that term is interpreted when employed as a transitional word in a claim. Furthermore, any use of the term "or" in the specification or claims is intended to mean "non-exclusive or". The terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

Claims (9)

1. A target detection method based on point cloud data comprises the following steps:
voxelizing point cloud data, and extracting voxel characteristics of non-empty voxels, wherein the voxel characteristics comprise original characteristics of each laser point in the non-empty voxels and characteristics obtained by fusing the original characteristics of the laser points in the non-empty voxels and the original characteristics of the laser points in a surrounding set range;
inputting the voxel characteristics into a three-dimensional convolution and two-dimensional convolution fused characteristic network, extracting three-dimensional network characteristics and two-dimensional network characteristics of the non-empty voxels, and obtaining voxel characteristics and pixel characteristics corresponding to laser points in the non-empty voxels according to the corresponding relation between the laser points in the non-empty voxels and the three-dimensional network characteristics and the two-dimensional network characteristics;
obtaining a fusion characteristic of the laser point according to the original characteristic of the laser point, the voxel characteristic of the non-empty voxel, and the voxel characteristic and the pixel characteristic corresponding to the laser point;
And determining a target to be identified in the point cloud data according to the fusion characteristics of the laser points.
2. The method according to claim 1, wherein the determining the target to be identified in the point cloud data according to the fusion characteristics of the laser points specifically comprises:
identifying foreground points through a deep learning network according to the fusion characteristics of the laser points, and determining target prediction information of the foreground points;
screening a preset number of foreground points to be used as main foreground points in a mode of furthest point sampling;
And determining the recognition result of the target to be recognized according to the target prediction information of the main foreground point.
3. The method of claim 2, wherein the determining the recognition result of the object to be recognized according to the object prediction information of the primary foreground point specifically includes:
For each main foreground point, determining a set of main foreground points, wherein the distance between the target center point and the target center point of the main foreground point is smaller than a preset distance threshold value, according to the target center point position information in the target prediction information of the main foreground point;
determining a first target prediction result according to target prediction information of each main foreground point in the set;
Clustering the first target prediction results, and determining a second target prediction result according to the first target prediction results of the same class as a target recognition result of the target to be recognized.
4. A method according to claim 3, wherein said determining a second target prediction result from the first target prediction results of the same class comprises:
if the first target prediction results of the same class comprise more than one first target prediction result, determining a score average value of prediction scores in target prediction information of main foreground points corresponding to the first target prediction results of the class;
and determining a second target prediction result according to the first target prediction result with the score mean value higher than a preset first score threshold value.
5. The method according to any one of claims 2 to 4, wherein the determining target prediction information of the foreground point specifically includes:
And determining the central point position information, the size information, the angle information and the prediction score of the target corresponding to the foreground point, wherein the central point position information is the offset of the foreground point relative to the central point of the target, and is used as target prediction information.
6. The method of claim 5, wherein the screening the preset number of foreground points by using the furthest point sampling method as the main foreground points specifically comprises:
deleting foreground points with the prediction score lower than a preset second score threshold value in the target prediction information;
determining the center point position of a target corresponding to the foreground point according to the position information in the original characteristics of the foreground point and the offset of the foreground point in the target prediction information relative to the target center point;
and screening a preset number of foreground points to serve as main foreground points in a mode of furthest point sampling according to the center point position of the target corresponding to the foreground points.
7. The method of claim 1, wherein the extracting voxel features of non-empty voxels, in particular, comprises:
and extracting voxel characteristics of non-empty voxels by using a multi-layer perception network (MLP) and/or a Convolutional Neural Network (CNN).
8. A point cloud data-based object detection apparatus, comprising:
The voxelization module is used for voxelizing the point cloud data and extracting the voxel characteristics of non-empty voxels, wherein the voxel characteristics comprise the original characteristics of each laser point in the non-empty voxels and the characteristics obtained by fusing the original characteristics of the laser points in the non-empty voxels and the original characteristics of the laser points in a surrounding set range;
The feature acquisition module is used for inputting the voxel features extracted by the voxelization module into a feature network with three-dimensional convolution and two-dimensional convolution fusion, extracting the three-dimensional network features and the two-dimensional network features of the non-empty voxels, and acquiring the voxel features and the pixel features corresponding to the laser points in the non-empty voxels according to the corresponding relation between the laser points in the non-empty voxels and the three-dimensional network features and the two-dimensional network features;
the fusion module is used for obtaining the fusion characteristics of the laser points according to the original characteristics of the laser points, the voxel characteristics of the non-empty voxels extracted by the voxelization module, and the voxel characteristics and the pixel characteristics corresponding to the laser points obtained by the characteristic acquisition module;
And the target identification module is used for determining a target to be identified in the point cloud data according to the fusion characteristics of the laser points obtained by the fusion module.
9. A computer program product with object detection functionality, comprising a computer program/instruction which, when executed by a processor, implements the object detection method based on point cloud data as claimed in any of claims 1 to 7.
CN202110174725.2A 2021-02-08 2021-02-08 Target detection method and device based on point cloud data Active CN114913331B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110174725.2A CN114913331B (en) 2021-02-08 2021-02-08 Target detection method and device based on point cloud data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110174725.2A CN114913331B (en) 2021-02-08 2021-02-08 Target detection method and device based on point cloud data

Publications (2)

Publication Number Publication Date
CN114913331A CN114913331A (en) 2022-08-16
CN114913331B true CN114913331B (en) 2024-09-20

Family

ID=82760874

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110174725.2A Active CN114913331B (en) 2021-02-08 2021-02-08 Target detection method and device based on point cloud data

Country Status (1)

Country Link
CN (1) CN114913331B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116091291A (en) * 2023-01-31 2023-05-09 广东利元亨智能装备股份有限公司 Robot number identification method and device and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109816050A (en) * 2019-02-23 2019-05-28 深圳市商汤科技有限公司 Object pose estimation method and device
CN111951368A (en) * 2020-08-31 2020-11-17 广州大学 A deep learning method for point cloud, voxel and multi-view fusion

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200191971A1 (en) * 2018-12-17 2020-06-18 National Chung-Shan Institute Of Science And Technology Method and System for Vehicle Detection Using LIDAR
CN110059608B (en) * 2019-04-11 2021-07-06 腾讯科技(深圳)有限公司 Object detection method, device, electronic device and storage medium
US11164363B2 (en) * 2019-07-08 2021-11-02 Waymo Llc Processing point clouds using dynamic voxelization
CN111199206A (en) * 2019-12-30 2020-05-26 上海眼控科技股份有限公司 Three-dimensional target detection method and device, computer equipment and storage medium
CN111476242B (en) * 2020-03-31 2023-10-20 北京经纬恒润科技股份有限公司 Laser point cloud semantic segmentation method and device
CN112052860B (en) * 2020-09-11 2023-12-01 中国人民解放军国防科技大学 A three-dimensional target detection method and system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109816050A (en) * 2019-02-23 2019-05-28 深圳市商汤科技有限公司 Object pose estimation method and device
CN111951368A (en) * 2020-08-31 2020-11-17 广州大学 A deep learning method for point cloud, voxel and multi-view fusion

Also Published As

Publication number Publication date
CN114913331A (en) 2022-08-16

Similar Documents

Publication Publication Date Title
Simon et al. Complexer-yolo: Real-time 3d object detection and tracking on semantic point clouds
CN111027401B (en) An end-to-end object detection method for camera and lidar fusion
CN112287860B (en) Training method and device of object recognition model, and object recognition method and system
CN111626217A (en) Target detection and tracking method based on two-dimensional picture and three-dimensional point cloud fusion
CN111476822A (en) Laser radar target detection and motion tracking method based on scene flow
CN113506318B (en) Three-dimensional target perception method under vehicle-mounted edge scene
CN111192295A (en) Target detection and tracking method, related device and computer readable storage medium
WO2021056516A1 (en) Method and device for target detection, and movable platform
CN115049700A (en) Target detection method and device
CN113378760A (en) Training target detection model and method and device for detecting target
CN112711034B (en) Object detection method, device and equipment
CN114118247B (en) An anchor-free 3D object detection method based on multi-sensor fusion
CN111461221A (en) A multi-source sensor fusion target detection method and system for autonomous driving
CN110619299A (en) Object recognition SLAM method and device based on grid
CN116778262B (en) Three-dimensional target detection method and system based on virtual point cloud
CN115100616A (en) Point cloud target detection method and device, electronic equipment and storage medium
CN113901903A (en) Road recognition method and device
CN116643291A (en) SLAM method for removing dynamic targets by combining vision and laser radar
CN119168890B (en) Point cloud generation method and device based on millimeter wave and visual image fusion
CN114913331B (en) Target detection method and device based on point cloud data
CN112766100A (en) 3D target detection method based on key points
CN112529917A (en) Three-dimensional target segmentation method, device, equipment and storage medium
CN115994934B (en) Data time alignment method and device and domain controller
CN117935209A (en) Obstacle detection method, device, equipment and storage medium
CN114882458A (en) Target tracking method, system, medium and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20230627

Address after: Room 437, Floor 4, Building 3, No. 969, Wenyi West Road, Wuchang Subdistrict, Yuhang District, Hangzhou City, Zhejiang Province

Applicant after: Wuzhou Online E-Commerce (Beijing) Co.,Ltd.

Address before: Box 847, four, Grand Cayman capital, Cayman Islands, UK

Applicant before: ALIBABA GROUP HOLDING Ltd.

GR01 Patent grant
GR01 Patent grant