CN112862953B - Point cloud data processing method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN112862953B CN112862953B CN202110129924.1A CN202110129924A CN112862953B CN 112862953 B CN112862953 B CN 112862953B CN 202110129924 A CN202110129924 A CN 202110129924A CN 112862953 B CN112862953 B CN 112862953B
- Authority
- CN
- China
- Prior art keywords
- frame
- annotation
- point cloud
- labeling
- cloud data
- Prior art date
- Legal status: Active (assumed; not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The disclosure provides a point cloud data processing method and apparatus, an electronic device, and a storage medium. The method includes: performing target detection on first point cloud data in a first training sample set using a target detection neural network trained on second point cloud data in a second training sample set, to obtain a prediction detection frame for at least one target object; pairing the at least one prediction detection frame with at least one first annotation frame corresponding to the at least one target object in the first point cloud data, to obtain at least one pair of a prediction detection frame and a first annotation frame, where each pair corresponds to the same target object; and, for each first annotation frame, generating from the pairing result a second annotation frame whose annotation style matches that of the second training sample set. The method and apparatus convert between different annotation styles, which helps improve the detection accuracy of a neural network trained on the mixed data.
Description
Technical Field
The disclosure relates to the field of artificial intelligence, and in particular to a point cloud data processing method and apparatus, an electronic device, and a storage medium.
Background
Three-dimensional (3D) target detection based on lidar is a key technology in the field of autonomous driving. In target detection, a lidar is first used to acquire point cloud data of the visible surfaces of objects in the environment; the point cloud data are then manually annotated to obtain an annotation frame for each target object.
However, because lidar point clouds are sparse, targets far from the lidar are usually covered by only a few scanned points, which makes annotation difficult. Annotation styles also differ between annotators: some annotators draw frames according to life experience, while others annotate only the scanned points (for example, annotating only the tail of a vehicle).
When point cloud data with different annotation styles are mixed together for network training, the network cannot effectively distinguish the styles, and the accuracy of subsequent target detection suffers.
Disclosure of Invention
Embodiments of the disclosure provide at least a point cloud data processing method and apparatus, an electronic device, and a storage medium, which convert between different annotation styles, facilitate training a neural network on mixed data, and improve the accuracy of the trained network in subsequent target detection and other applications.
In a first aspect, an embodiment of the present disclosure provides a method for processing point cloud data, including:
acquiring first point cloud data in a first training sample set and a first annotation frame of at least one target object in the first point cloud data, and acquiring a target detection neural network trained on second point cloud data in a second training sample set, the first training sample set and the second training sample set differing in annotation style;
performing target detection on the first point cloud data using the target detection neural network, to obtain a prediction detection frame of at least one target object;
pairing the at least one prediction detection frame with the at least one first annotation frame, to obtain at least one pair of a prediction detection frame and a first annotation frame, wherein each pair of a prediction detection frame and a first annotation frame corresponds to the same target object;
and, for each first annotation frame, performing annotation style conversion on the first annotation frame based on the first annotation frame and the prediction detection frame paired with it, to generate a second annotation frame with the same annotation style as the second training sample set.
In the above point cloud data processing method, target detection is performed on the first point cloud data in the first training sample set by the target detection neural network trained on the second point cloud data in the second training sample set, so the resulting prediction detection frames of the target objects embody the annotation style of the second training sample set; after the at least one prediction detection frame and the at least one first annotation frame are paired, each resulting pair corresponds to the same target object. Because the prediction detection frame paired with a first annotation frame embodies the annotation style of the second point cloud data, the pairing relationship allows the first annotation frame to be style-converted into a second annotation frame that conforms to the annotation style of the second training sample set. Mixed training can then be performed on point cloud data that all share one annotation style, avoiding the degradation of network accuracy caused by training samples of differing annotation styles, so the accuracy in subsequent target detection and other applications is higher.
In a possible implementation, the pairing of the at least one prediction detection frame and the at least one first annotation frame includes:
performing compaction processing on each prediction detection frame and each first annotation frame, where a compacted prediction detection frame is the minimum-size detection frame containing all the first point cloud data within the prediction detection frame before compaction, and a compacted first annotation frame is the minimum-size annotation frame containing all the first point cloud data within the corresponding first annotation frame before compaction;
and pairing each prediction detection frame with each first annotation frame based on the compacted prediction detection frames and the compacted first annotation frames.
Here, whatever the annotation mode, a frame must completely contain its target's points. The compaction is therefore performed over all the first point cloud data within the prediction detection frame before compaction, so that the compacted prediction detection frame is the minimum-size detection frame containing all of those points. This minimizes the style difference between the detection frame and the first annotation frame drawn on the first point cloud data, making it easier to match frames belonging to the same target object, and in turn to determine the difference between the annotation style of the first annotation frame and that of the second point cloud data, from which the converted second annotation frame can be obtained.
In one possible implementation, the first point cloud data include coordinate information of a plurality of point cloud points, and the compaction of each prediction detection frame includes:
for each prediction detection frame, determining the target point cloud point closest to each face, based on the coordinate information of the point cloud points within the prediction detection frame and the position information of each face of the prediction detection frame;
and, for each face of the prediction detection frame, moving the face toward the center of the prediction detection frame until the target point cloud point closest to that face lies on the moved face, to obtain the compacted prediction detection frame.
Here, the target point cloud point closest to each face can be determined from the position information of each face of the prediction detection frame and the coordinate information of the point cloud points within it. These closest target point cloud points define the boundary of the minimum-size detection frame, so compaction to the minimum size can be achieved.
In one possible implementation, pairing each prediction detection frame with each first annotation frame based on the compacted prediction detection frames and the compacted first annotation frames includes:
for each compacted prediction detection frame, selecting, based on the intersection-over-union (IoU) of each compacted first annotation frame with the compacted prediction detection frame, the compacted first annotation frame with the largest IoU as the one paired with that compacted prediction detection frame; or,
for each compacted first annotation frame, selecting, based on the IoU of each compacted prediction detection frame with the compacted first annotation frame, the compacted prediction detection frame with the largest IoU as the one paired with that compacted first annotation frame.
In the embodiments of the disclosure, the larger the IoU between a compacted prediction detection frame and a compacted first annotation frame, the more likely the two correspond to the same target object, and the smaller the IoU, the less likely; pairing between the prediction detection frames and the first annotation frames can be performed on this basis.
In one possible implementation, generating, for each first annotation frame, a second annotation frame with the same annotation style as the second training sample set based on the first annotation frame and the prediction detection frame paired with it includes:
inputting the first point cloud data within each first annotation frame into a trained style conversion neural network, to determine the offset between the first annotation frame and the prediction detection frame paired with it;
and generating a second annotation frame with the same annotation style as the second training sample set based on the first annotation frame and the offset determined for it.
Here, the offset between the paired prediction detection frame and first annotation frame may be determined first, and the first annotation frame then style-converted based on this offset. In embodiments of the disclosure the offset may be determined using a pre-trained style conversion neural network, which makes the overall style conversion more efficient.
In one possible implementation, the style conversion neural network is trained using the first annotation frames and prediction detection frames of a plurality of target objects in the first training sample set.
In one possible implementation, the style conversion neural network is trained as follows:
acquiring, for each of a plurality of target objects, the first point cloud data within the target object's first annotation frame and the offset between the target object's prediction detection frame and first annotation frame;
and training the style conversion neural network to be trained, taking the first point cloud data within each target object's first annotation frame as input data and the offset between the target object's prediction detection frame and first annotation frame as the output result.
In a second aspect, an embodiment of the present disclosure further provides a processing apparatus for point cloud data, including:
an acquisition module, configured to acquire first point cloud data in a first training sample set and a first annotation frame of at least one target object in the first point cloud data, and to acquire a target detection neural network trained on second point cloud data in a second training sample set, the first training sample set and the second training sample set differing in annotation style;
a detection module, configured to perform target detection on the first point cloud data using the target detection neural network, to obtain a prediction detection frame of at least one target object;
a pairing module, configured to pair the at least one prediction detection frame with the at least one first annotation frame, to obtain at least one pair of a prediction detection frame and a first annotation frame, wherein each pair corresponds to the same target object;
and a generating module, configured to perform, for each first annotation frame, annotation style conversion on the first annotation frame based on the first annotation frame and the prediction detection frame paired with it, to generate a second annotation frame with the same annotation style as the second training sample set.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory in communication over the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the steps of the method of processing point cloud data as described in any of the first aspect and its various embodiments.
In a fourth aspect, the disclosed embodiments also provide a computer-readable storage medium, on which a computer program is stored, which when executed by a processor performs the steps of the method of processing point cloud data according to the first aspect and any of its various embodiments.
For descriptions of the effects of the point cloud data processing apparatus, the electronic device, and the computer-readable storage medium, refer to the description of the point cloud data processing method above; they are not repeated here.
The foregoing objects, features and advantages of the disclosure will be more readily apparent from the following detailed description of the preferred embodiments taken in conjunction with the accompanying drawings.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for the embodiments are briefly described below. The drawings, which are incorporated in and constitute a part of the specification, show embodiments consistent with the present disclosure and together with the description serve to explain the technical solutions of the present disclosure. It should be understood that the following drawings show only certain embodiments of the present disclosure and are therefore not to be considered limiting of its scope; a person of ordinary skill in the art may derive other related drawings from them without inventive effort.
Fig. 1 shows a flowchart of a method for processing point cloud data according to an embodiment of the present disclosure;
fig. 2 is a schematic diagram of a processing device for point cloud data according to an embodiment of the disclosure;
fig. 3 shows a schematic diagram of an electronic device provided by an embodiment of the disclosure.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only some embodiments of the present disclosure, but not all embodiments. The components of the embodiments of the present disclosure, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure provided in the accompanying drawings is not intended to limit the scope of the disclosure, as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be made by those skilled in the art based on the embodiments of this disclosure without making any inventive effort, are intended to be within the scope of this disclosure.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
The term "and/or" is used herein to describe only one relationship, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist together, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Research has shown that, because lidar point clouds are sparse, targets far from the lidar are usually covered by only a few scanned points, which makes annotation difficult. Different annotators often disagree about how the same object should be annotated: for example, whether to draw a tight frame that closely fits the point cloud, or to complete the frame from common sense, imagining the parts the points did not cover. If frames are completed from imagination, different annotators will complete them according to their own life experience, resulting in inconsistent frame sizes.
In addition, if annotators can refer to the point clouds of the preceding and following frames, or to accompanying images, the reliability of the annotation frames is improved; but in actual annotation work annotators do not necessarily have access to such information. As a result, 3D detection annotation styles often differ greatly between training sample sets.
When point cloud data with different annotation styles are mixed together for network training, the network cannot effectively distinguish the styles, and the accuracy of subsequent target detection suffers.
Based on the above findings, the disclosure provides a point cloud data processing method and apparatus, an electronic device, and a storage medium, which convert between different annotation styles, facilitate training a neural network on mixed data, and improve the accuracy of the trained network in subsequent target detection and other applications.
To facilitate understanding of the present embodiments, the point cloud data processing method disclosed in the embodiments of the present disclosure is first described in detail. The execution body of the method is generally a computer device with a certain computing capability, for example a terminal device, a server, or another processing device; the terminal device may be user equipment (UE), a mobile device, a user terminal, a cellular telephone, a cordless telephone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the method may be implemented by a processor calling computer-readable instructions stored in a memory.
Referring to fig. 1, a flowchart of a method for processing point cloud data according to an embodiment of the disclosure is shown, where the method includes steps S101 to S104, where:
S101: acquiring first point cloud data in a first training sample set and a first annotation frame of at least one target object in the first point cloud data, and acquiring a target detection neural network trained on second point cloud data in a second training sample set, the two sample sets differing in annotation style;
S102: performing target detection on the first point cloud data using the target detection neural network, to obtain a prediction detection frame of at least one target object;
S103: pairing the at least one prediction detection frame with the at least one first annotation frame, to obtain at least one pair of a prediction detection frame and a first annotation frame, wherein each pair corresponds to the same target object;
S104: for each first annotation frame, performing annotation style conversion on the first annotation frame based on the first annotation frame and the prediction detection frame paired with it, to generate a second annotation frame with the same annotation style as the second training sample set.
Here, to facilitate understanding of the point cloud data processing method provided by the embodiments of the present disclosure, its application scenario is first described. The method mainly applies to style migration, that is, unifying the annotation styles of data sets annotated in different styles. In a specific application there may be two, three, or more annotation styles, each corresponding to one training sample set; the annotation styles of multiple training sample sets can then be unified.
For example, the annotation frame style of any training sample set may be converted into the annotation style of one designated training sample set. "Annotation style" here refers to the differing annotation results produced for the same target when different training sample sets are annotated under different standards and by annotators with different experience, such as the size of the frame and whether the frame closely fits the point cloud.
If training sample sets with different annotation styles are mixed together to train a neural network, the network is confused by the differing styles of the training sample sets: for example, in one training round it may be given annotation results in the first style, and in the next round it does not know whether data in the second style were annotated with reference to the first style or the second, so the trained network performs poorly in application.
To solve the above problem, the embodiments of the present disclosure provide a processing scheme that unifies the annotation styles of different training sample sets, so that algorithms based on supervised training (such as deep learning with neural networks) can fully exploit mixed training over different training sample sets, improving the application effect of the trained network.
Since the problem arises in the actual annotation of 3D data, i.e., point cloud data, both the first training sample set and the second training sample set in the embodiments of the disclosure may correspond to point cloud data. Here, the first point cloud data in the first training sample set serve as the point cloud data whose style is to be converted, and the second point cloud data in the second training sample set serve as the point cloud data in the target annotation style.
It should be noted that the first point cloud data may be one frame of point cloud data or multiple frames (e.g., corresponding to the whole first training sample set), and likewise the second point cloud data may be multiple frames (e.g., corresponding to the whole second training sample set). In a specific application, the annotation style of one first training sample set, or of several first training sample sets, may be converted into the annotation style of the second training sample set; the number of first training sample sets is not specifically limited.
The collection of point cloud data may be briefly described as follows. The point cloud data in the embodiments of the present disclosure may be acquired with a radar device; the point cloud data acquired in different target scenes differ, and may be dense or sparse, without specific limitation. The radar device may be a rotary-scanning lidar or another radar device, also without specific limitation.
Taking a rotary-scanning lidar as an example, the lidar acquires three-dimensional point cloud data about the surrounding environment while rotating and scanning horizontally. The lidar may scan in a multi-line mode, in which vertically arranged laser tubes fire in sequence, so that each step of the horizontal rotation also scans multiple layers in the vertical direction. Adjacent laser tubes are set at a fixed angle to one another, and the vertical field of view may be 30-40 degrees. At each scanning angle, one data packet is returned for the lasers emitted by the laser tubes; splicing together the data packets from all scanning angles of one full revolution (a 360-degree scan) yields one frame of point cloud data.
For the collected first point cloud data, embodiments of the disclosure may annotate the target objects in the first training sample set in advance, that is, obtain first point cloud data annotated with first annotation frames.
For the second point cloud data in the second training sample set, the target detection neural network can be trained with supervision based on pre-annotated annotation frames.
The trained target detection neural network thus captures the second annotation style of the second point cloud data. In specific applications, a three-dimensional detection network such as PV-RCNN or Part-A2 may be used, or any other network capable of implementing the above function, without specific limitation.
To convert the first annotation style of the first point cloud data into the second annotation style, the trained target detection neural network may be used to perform target detection on the first point cloud data, so that the resulting prediction detection frame of at least one target object embodies the second annotation style.
Between a prediction detection frame embodying the second annotation style and a first annotation frame in the first annotation style, the annotation difference between the two styles can be determined through pairing. This difference can be expressed as the offset between the paired prediction detection frame and first annotation frame; the first annotation frame annotated on the first point cloud data can then be adjusted based on the offset, yielding a second annotation frame that conforms to the second annotation style.
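Before detailing the pairing step, the overall flow can be summarized in a short sketch. The following Python code is an illustrative reconstruction only, not code from the disclosure: the box parameterization and the callables `detector`, `pair_fn`, and `offset_fn` are assumed placeholders for the detection, pairing, and offset-regression steps described in the surrounding text.

```python
import numpy as np

def convert_annotation_style(points, gt_boxes, detector, pair_fn, offset_fn):
    """Sketch of the style-conversion flow for one frame of point cloud data.

    points:    (N, 3) array of lidar points (the first point cloud data).
    gt_boxes:  (M, 7) first annotation frames, (x, y, z, w, h, l, theta).
    detector:  callable points -> (K, 7) prediction detection frames, from a
               network trained on the second training sample set.
    pair_fn:   callable (pred_boxes, gt_boxes, points) -> [(pred_i, gt_i)],
               the compaction + IoU pairing described below.
    offset_fn: callable (points, gt_box) -> 7-dim offset, the style
               conversion network described below.
    """
    pred_boxes = detector(points)       # frames embodying the second style
    converted = []
    for _, gt_i in pair_fn(pred_boxes, gt_boxes, points):
        offset = offset_fn(points, gt_boxes[gt_i])
        converted.append(gt_boxes[gt_i] + offset)  # second annotation frame
    return np.asarray(converted)
```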
Given the key role that pairing detection frames with annotation frames plays in style conversion, the pairing process is described here in detail. It includes the following steps:
Step 1: performing compaction processing on each prediction detection frame and each first annotation frame, where the compacted prediction detection frame is the minimum-size detection frame containing all the first point cloud data within the prediction detection frame before compaction, and the compacted first annotation frame is the minimum-size annotation frame containing all the first point cloud data within the corresponding first annotation frame before compaction;
Step 2: pairing each prediction detection frame with each first annotation frame based on the compacted prediction detection frames and the compacted first annotation frames.
Here, whatever annotation rules an annotator follows, it must at least be ensured that the 3D frame completely contains the target point cloud (i.e., the point cloud data corresponding to the target object). Annotators with different experience will draw different final frames, but the point clouds contained in those frames are very likely the same, mainly because annotators only imagine the parts that the lidar rays did not sweep.
Based on this, in the embodiments of the present disclosure, compaction reduces the prediction detection frame to the minimum-size detection frame containing all the first point cloud data within it, and likewise reduces the first annotation frame to the minimum-size annotation frame containing all the first point cloud data within it. The style difference between the compacted prediction detection frame and the compacted first annotation frame is then small, so the prediction detection frame and first annotation frame belonging to the same target object can be paired.
The compaction of a prediction detection frame is further described below; it can be implemented by the following steps:
Step 1: for each prediction detection frame, determining the target point cloud point closest to each face, based on the coordinate information of the point cloud points within the prediction detection frame and the position information of each face of the prediction detection frame;
Step 2: for each face of the prediction detection frame, moving the face toward the center of the prediction detection frame until the target point cloud point closest to that face lies on the moved face, to obtain the compacted prediction detection frame.
To compact to the minimum size, the target point cloud point closest to each face of the prediction detection frame is determined, and each face is moved toward the frame's center based on its target point cloud point until that point lies on the moved face, yielding the compacted prediction detection frame.
The target point cloud point closest to each face may be determined with the point-to-plane distance formula: each face corresponds to a plane equation, and for each face the distance to every point cloud point in the prediction detection frame is computed from the distance formula; the point cloud point with the smallest distance is taken as that face's target point cloud point.
Taking a 3D detection frame as the prediction detection frame as an example, the frame has six faces, and the six target point cloud points closest to those six faces can be determined as above. Each face of the detection frame is then shrunk in turn toward the frame's center; when a shrunk face reaches its corresponding target point cloud point, the compaction of that face is complete, and once all faces are done the compacted prediction detection frame is obtained.
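As an illustration of this step, the sketch below reconstructs the compaction in Python under stated assumptions: boxes are parameterized as (x, y, z, w, h, l, θ) with yaw θ about the z-axis (the parameterization used later in this description), and w, l, h are taken along the box's local x, y, z axes. Shrinking each face toward the center until its nearest interior point lies on it is, in the box's local coordinate frame, equivalent to recomputing the box as the axis-aligned extent of the points it contains, which is what the code does.

```python
import numpy as np

def compact_box(points, box):
    """Shrink a box to the minimum size still containing its points.

    points: (N, 3) lidar points of one frame.
    box:    (x, y, z, w, h, l, theta) -- center, sizes, yaw about z.
    Returns the compacted box in the same parameterization.
    """
    x, y, z, w, h, l, theta = box
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    # Rotate points into the box's local frame (yaw-only rotation).
    local = (points - np.array([x, y, z])) @ rot
    # Keep only the points inside the original box.
    half = np.array([w / 2, l / 2, h / 2])  # extents along local x, y, z
    inside = np.all(np.abs(local) <= half, axis=1)
    if not inside.any():
        return box                          # nothing to tighten against
    pts = local[inside]
    low, high = pts.min(axis=0), pts.max(axis=0)
    # New center (mapped back to world coordinates) and new sizes.
    new_center = np.array([x, y, z]) + ((low + high) / 2) @ rot.T
    size = high - low
    return (new_center[0], new_center[1], new_center[2],
            size[0], size[2], size[1], theta)  # back to (w, h, l) order
```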
The compaction of a first annotation frame is the same as that of a prediction detection frame, and is not repeated here.
In the point cloud data processing method provided by the embodiments of the present disclosure, once compaction has been performed, pairing can be carried out based on the intersection-over-union (IoU) between the two kinds of frames (i.e., the prediction detection frames and the first annotation frames).
Since there may be one or more prediction detection frames and one or more first annotation frames in the embodiments of the present disclosure, various combinations of frames can occur.
To traverse the IoU calculation over these combinations, either kind of frame may serve as the reference: on the one hand, a prediction detection frame may be taken as the reference frame, and the compacted first annotation frame paired with it selected by computing the IoU between each compacted first annotation frame and the reference frame; on the other hand, a first annotation frame may be taken as the reference frame, and the compacted prediction detection frame paired with it selected by computing the IoU between each compacted prediction detection frame and the reference frame.
Whichever frame serves as the reference, the IoU between a first annotation frame and a prediction detection frame may be determined as follows:
Step 1: determining first volume information of the first annotation frame and second volume information of the prediction detection frame;
Step 2: determining the IoU between the first annotation frame and the prediction detection frame based on the first volume information and the second volume information.
Here, the first volume information of the first annotation frame and the second volume information of the prediction detection frame may be calculated first. Taking 3D frames as the first annotation frame and the prediction detection frame, the first and second volume information may each be determined from the volume formula of a cuboid.
Given the first and second volume information, the intersection volume of the first annotation frame and the prediction detection frame (the overlapping volume of the two frames) and their union volume (the merged volume of the two frames) can further be determined; the ratio of the intersection volume to the union volume is then the IoU between the first annotation frame and the prediction detection frame.
In a specific application, the prediction detection frame and the first annotation frame may be displayed together on the first point cloud data with a visualization tool, positioned by their relative positional relationship. The larger the IoU of the two frames, the more likely they correspond to the same target object; likewise, the smaller the IoU, the less likely.
Following the above IoU calculation, the first annotation frame with the largest IoU can be found for each prediction detection frame. In addition, since low-IoU matches are unreliable for pairing, a preset IoU threshold (for example, 0.5) may be set to filter out false matches with small IoU, improving the accuracy of the pairing result.
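A minimal sketch of the IoU computation and threshold-based pairing follows. It is an assumption-laden simplification: the compacted frames are treated as axis-aligned boxes given as (x, y, z, w, l, h) centers and sizes, which matches the description exactly only when the paired frames share the same orientation; the names `iou_3d` and `pair_frames` are illustrative, not from the disclosure.

```python
import numpy as np

def iou_3d(a, b):
    """IoU of two axis-aligned boxes given as (x, y, z, w, l, h)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    a_min, a_max = a[:3] - a[3:] / 2, a[:3] + a[3:] / 2
    b_min, b_max = b[:3] - b[3:] / 2, b[:3] + b[3:] / 2
    overlap = np.clip(np.minimum(a_max, b_max) - np.maximum(a_min, b_min),
                      0, None)
    inter = overlap.prod()                     # intersection volume
    union = a[3:].prod() + b[3:].prod() - inter  # merged volume
    return inter / union if union > 0 else 0.0

def pair_frames(pred_boxes, gt_boxes, iou_thresh=0.5):
    """Pair each compacted prediction frame with the compacted annotation
    frame of largest IoU; drop pairs under the threshold (false matches)."""
    pairs = []
    for pi, pred in enumerate(pred_boxes):
        ious = [iou_3d(pred, gt) for gt in gt_boxes]
        if ious:
            gi = int(np.argmax(ious))
            if ious[gi] >= iou_thresh:
                pairs.append((pi, gi))
    return pairs
```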
Once the pairing of first annotation frames with prediction detection frames is complete, embodiments of the disclosure can style-convert each first annotation frame to obtain a second annotation frame that conforms to the annotation style of the second training sample set. This can be implemented by the following steps:
Step 1: inputting the first point cloud data within each first annotation frame into a trained style conversion neural network, to determine the offset between the first annotation frame and the prediction detection frame paired with it;
Step 2: generating a second annotation frame with the same annotation style as the second training sample set, based on the first annotation frame and the offset determined for it.
Here, since the prediction detection frame embodies the annotation style of the second training sample set, the difference between the annotation style of the first annotation frame and that of the second training sample set can be expressed as the offset between the paired prediction detection frame and first annotation frame; adjusting the first annotation frame by this difference yields the style-converted second annotation frame.
In the embodiments of the disclosure, the offset between a paired prediction detection frame and first annotation frame may be determined by the style conversion neural network: inputting the first point cloud data within a first annotation frame into the trained network yields the offset between that first annotation frame and the prediction detection frame paired with it.
The style conversion neural network may be trained using the first annotation frame and prediction detection frame of each of a plurality of target objects in the first training sample set.
Specifically, the style conversion neural network may be trained as follows:
Step 1: acquiring, for each of a plurality of target objects, the first point cloud data within the target object's first annotation frame and the offset between the target object's prediction detection frame and first annotation frame;
Step 2: training the style conversion neural network to be trained, taking the first point cloud data within each target object's first annotation frame as input data, and the offset between the target object's prediction detection frame and first annotation frame as the output result.
Here, the input data (the first point cloud data within each target object's first annotation frame) and output results (the offset between each target object's prediction detection frame and first annotation frame) may be collected in advance, and the network parameters of the style conversion neural network trained on the correspondence between the input data and the output results.
Suppose the prediction detection frame of each target object is expressed as (x, y, z, w, h, l, θ), where x, y, z are the coordinates of the frame's center, w, h, l are its width, height, and length, and θ is its orientation angle in the xy-plane. The first annotation frame of the same target object is (x′, y′, z′, w′, h′, l′, θ′), with the same meaning of symbols. During training of the style conversion neural network, the input is the first point cloud data within the first annotation frame of each target object, and the output is the offset between the paired prediction detection frame and first annotation frame, that is:
(dx, dy, dz, dw, dh, dl, dθ) = (x − x′, y − y′, z − z′, w − w′, h − h′, l − l′, θ − θ′).
The network is trained on the correspondence between the input point cloud data and the output offset. After training, inputting the first point cloud data within the first annotation frame of a target object into the trained style conversion neural network yields the offset for that target object.
In a specific application, the style conversion neural network may be a deep learning network for processing point cloud data, such as PointNet or VoxelNet, or any other neural network capable of implementing the above function, without specific limitation.
For each target object in the first point cloud data, given the offset determined by the style conversion neural network, the second annotation frame in the second annotation style is obtained by adding the determined offset to the target object's first annotation frame. Since the second annotation style is the annotation style of the second training sample set, different annotation styles are thereby unified, providing good data support for subsequent mixed training.
It will be appreciated by those skilled in the art that in the above-described method of the specific embodiments, the written order of steps is not meant to imply a strict order of execution but rather should be construed according to the function and possibly inherent logic of the steps.
Based on the same inventive concept, the embodiments of the disclosure further provide a point cloud data processing apparatus corresponding to the point cloud data processing method. Since the principle by which the apparatus solves the problem is similar to that of the method, the implementation of the apparatus may refer to the implementation of the method, and repeated description is omitted.
Referring to fig. 2, a schematic diagram of a processing device for point cloud data according to an embodiment of the disclosure is shown, where the device includes: an acquisition module 201, a detection module 202, a pairing module 203, and a generation module 204; wherein,
an acquisition module 201, configured to acquire first point cloud data in a first training sample set and a first annotation frame of at least one target object in the first point cloud data, and to acquire a target detection neural network trained on second point cloud data in a second training sample set, the two sample sets differing in annotation style;
a detection module 202, configured to perform target detection on the first point cloud data using the target detection neural network, to obtain a prediction detection frame of at least one target object;
a pairing module 203, configured to pair the at least one prediction detection frame with the at least one first annotation frame, to obtain at least one pair of a prediction detection frame and a first annotation frame, wherein each pair corresponds to the same target object;
and a generating module 204, configured to perform, for each first annotation frame, annotation style conversion on the first annotation frame based on the first annotation frame and the prediction detection frame paired with it, to generate a second annotation frame with the same annotation style as the second training sample set.
In the embodiments of the disclosure, target detection is performed on the first point cloud data in the first training sample set by the target detection neural network trained on the second point cloud data in the second training sample set, so the resulting prediction detection frames of the target objects embody the annotation style of the second training sample set; after the prediction detection frames and the first annotation frames are paired, each resulting pair corresponds to the same target object. Because the prediction detection frame paired with a first annotation frame embodies the annotation style of the second point cloud data, the pairing relationship allows the first annotation frame to be style-converted into a second annotation frame conforming to the annotation style of the second training sample set. Mixed training can then be performed on point cloud data sharing one annotation style, avoiding the degradation of network accuracy caused by training samples of differing annotation styles, so the accuracy in subsequent target detection and other applications is higher.
In a possible implementation, the pairing module 203 is configured to pair the at least one prediction detection frame with the at least one first annotation frame according to the following steps:
performing compaction processing on each prediction detection frame and each first annotation frame, where the compacted prediction detection frame is the minimum-size detection frame containing all the first point cloud data within the prediction detection frame before compaction, and the compacted first annotation frame is the minimum-size annotation frame containing all the first point cloud data within the corresponding first annotation frame before compaction;
and pairing each prediction detection frame with each first annotation frame based on the compacted prediction detection frames and the compacted first annotation frames.
In one possible implementation, the first point cloud data include coordinate information of a plurality of point cloud points, and the pairing module 203 is configured to compact each prediction detection frame according to the following steps:
for each prediction detection frame, determining the target point cloud point closest to each face, based on the coordinate information of the point cloud points within the prediction detection frame and the position information of each face of the prediction detection frame;
and, for each face of the prediction detection frame, moving the face toward the center of the prediction detection frame until the target point cloud point closest to that face lies on the moved face, to obtain the compacted prediction detection frame.
In a possible implementation, the pairing module 203 is configured to pair each prediction detection frame with each first annotation frame, based on the compacted prediction detection frames and the compacted first annotation frames, according to the following steps:
for each compacted prediction detection frame, selecting, based on the IoU of each compacted first annotation frame with the compacted prediction detection frame, the compacted first annotation frame with the largest IoU as the one paired with that compacted prediction detection frame; or,
for each compacted first annotation frame, selecting, based on the IoU of each compacted prediction detection frame with the compacted first annotation frame, the compacted prediction detection frame with the largest IoU as the one paired with that compacted first annotation frame.
In one possible implementation, the generating module 204 is configured to perform, for each first annotation frame, annotation style conversion on the first annotation frame based on the first annotation frame and the prediction detection frame paired with it, and to generate a second annotation frame with the same annotation style as the second training sample set, according to the following steps:
inputting the first point cloud data within each first annotation frame into a trained style conversion neural network, to determine the offset between the first annotation frame and the prediction detection frame paired with it;
and generating a second annotation frame with the same annotation style as the second training sample set based on the first annotation frame and the offset determined for it.
In one possible implementation, the style conversion neural network is trained using the first annotation frames and prediction detection frames of a plurality of target objects in the first training sample set.
The process flow of each module in the apparatus and the interaction flow between the modules may be described with reference to the related descriptions in the above method embodiments, which are not described in detail herein.
The embodiment of the disclosure further provides an electronic device, as shown in fig. 3, which is a schematic structural diagram of the electronic device provided by the embodiment of the disclosure, including: a processor 301, a memory 302, and a bus 303. The memory 302 stores machine-readable instructions executable by the processor 301 (e.g., execution instructions corresponding to the acquisition module 201, the detection module 202, the pairing module 203, the generation module 204, etc. in the apparatus of fig. 2), when the electronic device is running, the processor 301 communicates with the memory 302 through the bus 303, and the machine-readable instructions when executed by the processor 301 perform the following processes:
acquiring first point cloud data in a first training sample set and a first annotation frame of at least one target object in the first point cloud data, and acquiring a target detection neural network trained on second point cloud data in a second training sample set, the two sample sets differing in annotation style;
performing target detection on the first point cloud data using the target detection neural network, to obtain a prediction detection frame of at least one target object;
pairing the at least one prediction detection frame with the at least one first annotation frame, to obtain at least one pair of a prediction detection frame and a first annotation frame, wherein each pair corresponds to the same target object;
and, for each first annotation frame, performing annotation style conversion on the first annotation frame based on the first annotation frame and the prediction detection frame paired with it, to generate a second annotation frame with the same annotation style as the second training sample set.
The disclosed embodiments also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method for processing point cloud data described in the above method embodiments. Wherein the storage medium may be a volatile or nonvolatile computer readable storage medium.
The embodiments of the present disclosure further provide a computer program product, where the computer program product carries program code, and instructions included in the program code may be used to execute the steps of the method for processing point cloud data described in the foregoing method embodiments, and specifically refer to the foregoing method embodiments and are not described herein.
Wherein the above-mentioned computer program product may be realized in particular by means of hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system and apparatus may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation, and for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present disclosure may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the foregoing embodiments are merely specific implementations of the present disclosure, provided to illustrate rather than limit its technical solutions, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that any person skilled in the art may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments or readily conceive of changes, or make equivalent substitutions for some of the technical features therein; such modifications, changes, or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and shall all be covered by the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.
Claims (10)
1. A method for processing point cloud data, characterized by comprising:
acquiring first point cloud data in a first training sample set and a first annotation frame of at least one target object in the first point cloud data, and acquiring a target detection neural network trained based on second point cloud data in a second training sample set; the first training sample set and the second training sample set differ in annotation style;
performing target detection on the first point cloud data by using the target detection neural network to obtain a prediction detection frame of the at least one target object;
pairing the at least one prediction detection frame with the at least one first annotation frame to obtain at least one pair of a prediction detection frame and a first annotation frame; wherein each pair of the prediction detection frame and the first annotation frame corresponds to a same target object;
and for each first annotation frame, performing annotation style conversion on the first annotation frame based on the first annotation frame and the prediction detection frame paired with the first annotation frame, to generate a second annotation frame having the same annotation style as the second training sample set.
2. The processing method according to claim 1, wherein the pairing of the at least one prediction detection frame with the at least one first annotation frame comprises:
compacting each prediction detection frame and each first annotation frame; wherein the compacted prediction detection frame is the minimum-size detection frame containing all of the first point cloud data within the corresponding prediction detection frame before compaction, and the compacted first annotation frame is the minimum-size annotation frame containing all of the first point cloud data within the corresponding first annotation frame before compaction;
and pairing each prediction detection frame with each first annotation frame based on the compacted prediction detection frames and the compacted first annotation frames.
3. The processing method according to claim 2, wherein the first point cloud data comprises coordinate information of a plurality of point cloud points, and the compacting of each prediction detection frame comprises:
for each prediction detection frame, determining the target point cloud point closest to each face based on the coordinate information of the plurality of point cloud points within the prediction detection frame and the position information of each face of the prediction detection frame;
and for each face of the prediction detection frame, moving the face toward the center of the prediction detection frame until the target point cloud point closest to that face lies on the moved face, to obtain the compacted prediction detection frame.
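As an illustrative reading of the compaction in claim 3, the sketch below assumes axis-aligned boxes given as (min corner, max corner); under that simplifying assumption, moving each face toward the box center until the nearest point cloud point lies on it reduces to taking the per-axis minima and maxima of the contained points. Oriented boxes would instead move each face along its own normal:

```python
import numpy as np

def compact_box(points, box_min, box_max):
    """Shrink an axis-aligned box to the minimum-size box that still
    contains every point cloud point lying inside the original box.
    (Axis-aligned simplification of the per-face compaction in claim 3.)"""
    inside = np.all((points >= box_min) & (points <= box_max), axis=1)
    contained = points[inside]
    if contained.shape[0] == 0:
        return box_min, box_max  # no points inside: leave the box unchanged
    # For axis-aligned boxes, moving each of the six faces toward the box
    # centre until the nearest point lies on it is the per-axis min/max.
    return contained.min(axis=0), contained.max(axis=0)
```

The same operation applied to the points inside a first annotation frame yields the compacted first annotation frame of claim 2.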
4. The processing method according to claim 2 or 3, wherein the pairing of each prediction detection frame with each first annotation frame based on the compacted prediction detection frames and the compacted first annotation frames comprises:
for each compacted prediction detection frame, selecting, based on the intersection-over-union ratio of each compacted first annotation frame with the compacted prediction detection frame, the compacted first annotation frame having the largest intersection-over-union ratio with the compacted prediction detection frame as the compacted first annotation frame paired with the compacted prediction detection frame; or,
for each compacted first annotation frame, selecting, based on the intersection-over-union ratio of each compacted prediction detection frame with the compacted first annotation frame, the compacted prediction detection frame having the largest intersection-over-union ratio with the compacted first annotation frame as the compacted prediction detection frame paired with the compacted first annotation frame.
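The first alternative of claim 4 might be sketched as follows, again under the simplifying assumption of axis-aligned compacted boxes so that the intersection-over-union (IoU) has a closed form; iou_3d and pair_by_max_iou are hypothetical helper names:

```python
import numpy as np

def iou_3d(a_min, a_max, b_min, b_max):
    """IoU of two axis-aligned 3D boxes given as (min corner, max corner)."""
    overlap = np.clip(np.minimum(a_max, b_max) - np.maximum(a_min, b_min), 0, None)
    inter = np.prod(overlap)
    union = np.prod(a_max - a_min) + np.prod(b_max - b_min) - inter
    return inter / union if union > 0 else 0.0

def pair_by_max_iou(pred_boxes, anno_boxes):
    """For each compacted prediction box, pick the compacted first
    annotation box with the largest IoU (first alternative of claim 4)."""
    pairs = []
    for i, (p_min, p_max) in enumerate(pred_boxes):
        ious = [iou_3d(p_min, p_max, a_min, a_max) for a_min, a_max in anno_boxes]
        if not ious:
            break
        best = int(np.argmax(ious))
        if ious[best] > 0:  # ignore predictions that overlap nothing
            pairs.append((i, best))
    return pairs
```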
5. The processing method according to any one of claims 1 to 4, wherein the generating, for each first annotation frame, a second annotation frame having the same annotation style as the second training sample set based on the first annotation frame and the prediction detection frame paired with the first annotation frame comprises:
inputting the first point cloud data within each first annotation frame into a trained style conversion neural network to determine an offset between the first annotation frame and the prediction detection frame paired with the first annotation frame;
and generating the second annotation frame having the same annotation style as the second training sample set based on the first annotation frame and the offset determined for the first annotation frame.
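Purely as an illustration of claim 5, the following PyTorch sketch pairs a PointNet-style feature extractor with an offset regression head; the architecture and the 7-parameter frame encoding (x, y, z, l, w, h, yaw) are assumptions made for the example, not details taken from the disclosure:

```python
import torch
import torch.nn as nn

class StyleConversionNet(nn.Module):
    """Maps the points inside a first annotation frame to a predicted
    offset between that frame and its paired prediction detection frame."""
    def __init__(self, box_dim: int = 7):  # (x, y, z, l, w, h, yaw) assumed
        super().__init__()
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU())
        self.head = nn.Linear(128, box_dim)

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        feats = self.point_mlp(points)    # points: (N, 3) inside one frame
        pooled = feats.max(dim=0).values  # order-invariant pooling
        return self.head(pooled)          # predicted offset

def to_second_style(first_frame, points, net):
    """second annotation frame = first annotation frame + predicted offset."""
    with torch.no_grad():
        return first_frame + net(points)
```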
6. The processing method according to claim 5, wherein the style conversion neural network is trained by using the first annotation frames and the prediction detection frames of a plurality of target objects in the first training sample set.
7. The processing method according to claim 6, wherein the style conversion neural network is trained as follows:
acquiring the first point cloud data within the first annotation frame of each of a plurality of target objects, and the offset between the prediction detection frame and the first annotation frame of each target object;
and training the style conversion neural network to be trained by taking the first point cloud data within the first annotation frame of each target object as the input data of the style conversion neural network to be trained, and taking the offset between the prediction detection frame and the first annotation frame of the target object as the expected output of the style conversion neural network to be trained.
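A minimal training sketch for the procedure of claim 7, reusing the hypothetical StyleConversionNet above; the Adam optimizer and Smooth-L1 loss are conventional choices assumed for illustration rather than specified by the disclosure:

```python
import torch

def train_style_net(net, samples, epochs: int = 10, lr: float = 1e-3):
    """samples: iterable of (points, pred_frame, first_frame) per target
    object; the supervision target is the offset pred_frame - first_frame."""
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)
    loss_fn = torch.nn.SmoothL1Loss()
    for _ in range(epochs):
        for points, pred_frame, first_frame in samples:
            offset_target = pred_frame - first_frame  # expected output (claim 7)
            loss = loss_fn(net(points), offset_target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return net
```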
8. A processing apparatus for point cloud data, comprising:
an acquisition module, configured to acquire first point cloud data in a first training sample set and a first annotation frame of at least one target object in the first point cloud data, and to acquire a target detection neural network trained based on second point cloud data in a second training sample set; the first training sample set and the second training sample set differ in annotation style;
a detection module, configured to perform target detection on the first point cloud data by using the target detection neural network to obtain a prediction detection frame of the at least one target object;
a pairing module, configured to pair the at least one prediction detection frame with the at least one first annotation frame to obtain at least one pair of a prediction detection frame and a first annotation frame; wherein each pair of the prediction detection frame and the first annotation frame corresponds to a same target object;
and a generating module, configured to, for each first annotation frame, perform annotation style conversion on the first annotation frame based on the first annotation frame and the prediction detection frame paired with the first annotation frame, to generate a second annotation frame having the same annotation style as the second training sample set.
9. An electronic device, characterized by comprising a processor, a memory, and a bus, wherein the memory stores machine-readable instructions executable by the processor; when the electronic device runs, the processor and the memory communicate over the bus, and the machine-readable instructions, when executed by the processor, perform the steps of the method for processing point cloud data according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that a computer program is stored thereon, and the computer program, when run by a processor, performs the steps of the method for processing point cloud data according to any one of claims 1 to 7.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110129924.1A CN112862953B (en) | 2021-01-29 | 2021-01-29 | Point cloud data processing method and device, electronic equipment and storage medium |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110129924.1A CN112862953B (en) | 2021-01-29 | 2021-01-29 | Point cloud data processing method and device, electronic equipment and storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN112862953A CN112862953A (en) | 2021-05-28 |
| CN112862953B true CN112862953B (en) | 2023-11-28 |
Family
ID=75987170
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202110129924.1A Active CN112862953B (en) | 2021-01-29 | 2021-01-29 | Point cloud data processing method and device, electronic equipment and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN112862953B (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115984805B (en) * | 2023-03-15 | 2023-07-18 | 安徽蔚来智驾科技有限公司 | A data enhancement method, a target detection method and a vehicle |
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109635685A (en) * | 2018-11-29 | 2019-04-16 | 北京市商汤科技开发有限公司 | Target object 3D detection method, device, medium and equipment |
| WO2020108311A1 (en) * | 2018-11-29 | 2020-06-04 | 北京市商汤科技开发有限公司 | 3d detection method and apparatus for target object, and medium and device |
| CN111860493A (en) * | 2020-06-12 | 2020-10-30 | 北京图森智途科技有限公司 | A method and device for target detection based on point cloud data |
Non-Patent Citations (1)
| Title |
|---|
| Research on Automatic Image Annotation Based on SSD Neural Networks and Its Applications; Xie Yu; Li Yujun; Dong Wensheng; Information Technology and Standardization (Issue 04); full text * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN112862953A (en) | 2021-05-28 |
Similar Documents
| Publication | Title |
|---|---|
| US12056892B2 | Image processing method and apparatus, electronic device and computer readable storage medium |
| CN110648397B | Scene map generation method, device, storage medium and electronic equipment |
| CN112132901A | Point cloud labeling method and device, electronic equipment and storage medium |
| CN112651881B | Image synthesis method, device, equipment, storage medium and program product |
| CN113160420A | Three-dimensional point cloud reconstruction method and device, electronic equipment and storage medium |
| CN118102044A | Point cloud data generation method, device, equipment and medium based on 3D Gaussian splatter |
| CN116168143A | Multi-view three-dimensional reconstruction method |
| CN109688343A | The implementation method and device of augmented reality studio |
| CN111508058A | Method and device for three-dimensional reconstruction of image, storage medium and electronic equipment |
| CN113313832A | Semantic generation method and device of three-dimensional model, storage medium and electronic equipment |
| EP4455875A1 | Feature map generation method and apparatus, storage medium, and computer device |
| CN115035235A | Three-dimensional reconstruction method and device |
| CN115100360B | Image generation method and device, storage medium and electronic equipment |
| CN112862953B | Point cloud data processing method and device, electronic equipment and storage medium |
| CN117541816A | Target detection method and device and electronic equipment |
| CN112714263B | Video generation method, device, equipment and storage medium |
| US11270449B2 | Method and system for location detection of photographs using topographic techniques |
| CN107203961B | Expression migration method and electronic equipment |
| CN113793420B | Depth information processing method and device, electronic equipment and storage medium |
| CN113191323B | A method, device, electronic device and storage medium for processing semantic elements |
| CN119091043B | Three-dimensional cloud reconstruction method and device |
| CN114119964A | Network training method and device, and target detection method and device |
| US20240303917A1 | Method and system for generating a three-dimensional global pose from an image |
| CN116452746A | Three-dimensional image reconstruction method and related equipment based on multi-view and intelligent learning |
| CN116091729A | Three-dimensional scene reconstruction method and device, electronic equipment and storage medium |
Legal Events
| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |