CN112669436A

CN112669436A - Deep learning sample generation method based on 3D point cloud

Info

Publication number: CN112669436A
Application number: CN202011563204.8A
Authority: CN
Inventors: 盛晨; 朱伟; 查竞宇; 陈冰晶; 柴连兴; 李昌昊; 赵宏; 孙俊杰
Original assignee: Jiaxing Hengchuang Electric Power Group Co ltd Bochuang Material Branch; Jiaxing Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Current assignee: Jiaxing Hengchuang Electric Power Group Co ltd Bochuang Material Branch; Jiaxing Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority date: 2020-12-25
Filing date: 2020-12-25
Publication date: 2021-04-16
Anticipated expiration: 2040-12-25
Also published as: CN112669436B

Abstract

The invention discloses a deep learning sample generation method based on 3D point cloud, which comprises the following steps: obtaining a color point cloud data set of a target to be detected; constructing a virtual camera model; setting the focal length and the posture of a virtual camera, inputting each point in a color point cloud data set into a virtual camera model to obtain a pixel coordinate for each point, and generating an image output by the virtual camera model for a target to be detected; calculating an illumination correction coefficient, superposing the illumination correction coefficient on color components of the color point cloud data set, and inputting points in the color point cloud data set after virtual illumination change into a virtual camera model one by one to obtain an image of the target to be detected after the virtual illumination change; adding noise to obtain a new image based on the obtained image; and repeating the steps until a preset number of images are obtained and used as deep learning samples of the target to be detected. The invention simplifies the construction difficulty of the sample and improves the richness of the sample.

Description

Deep learning sample generation method based on 3D point cloud

Technical Field

The application belongs to the field of deep learning artificial intelligence, and particularly relates to a deep learning sample generation method based on 3D point cloud.

Background

With the rapid development of deep learning and computer vision, both have been widely used in many fields. Object Detection (Object Detection), which is a more common application in the two combined applications, has become an important ring in image understanding, and the task of the Object Detection is to find all interested objects (objects) in an image, and determine the position and size of each Object, which is one of the core problems in the field of machine vision.

At present, in a target detection algorithm based on a convolutional neural network, weight parameters in the convolutional neural network need to be trained on a large number of data samples, and the more data is, the better the regularization effect of the trained parameters is. However, in reality, detection and identification of a specific target, such as image detection of professional equipment, image detection of a specific defect, and the like, are greatly limited due to the lack of a large number of data set references and the lack of sufficient data samples, and thus, the application of the convolutional neural network is greatly limited.

Disclosure of Invention

The application aims to provide a deep learning sample generation method based on a 3D point cloud, which simplifies the sample construction difficulty and improves the sample abundance.

In order to achieve the purpose, the technical scheme adopted by the application is as follows:

a deep learning sample generation method based on 3D point cloud is used for generating a deep learning sample of a target to be detected in target detection, and comprises the following steps:

step 1, scanning a target to be detected by using a 3D scanner to obtain a color point cloud data set of the target to be detected, wherein each point in the color point cloud data set has a three-dimensional coordinate and a color component;

step 2, constructing a virtual camera model;

step 3, setting the focal length and the posture of a virtual camera, inputting each point in the color point cloud data set into the virtual camera model to obtain a pixel coordinate for each point, and generating an image output by the virtual camera model for a target to be detected;

step 4, calculating an illumination correction coefficient, superposing the illumination correction coefficient on the color components of the color point cloud data set to obtain a color point cloud data set after virtual illumination change, and inputting points in the color point cloud data set after virtual illumination change into the virtual camera model one by one to obtain an image of the target to be detected after virtual illumination change;

step 5, adding noise to obtain a new image based on the images obtained in the step 3 and the step 4;

and 6, repeating the steps 3-5 until a preset number of images are obtained and used as deep learning samples of the target to be detected.

Several alternatives are provided below, but not as an additional limitation to the above general solution, but merely as a further addition or preference, each alternative being combinable individually for the above general solution or among several alternatives without technical or logical contradictions.

Preferably, the scanning precision of the 3D scanner is not lower than 1 mm.

Preferably, the constructing the virtual camera model includes:

the virtual camera model is constructed as follows:

in the formula, Z_cIs a proportionality constant, u, v are the pixel coordinates of the image, f_x、f_yIs the focal length of the virtual camera, u₀、v₀Is the optical center coordinate of the virtual camera, R is the rotation matrix of the virtual camera's pose relative to the world coordinate system, t is the translation matrix of the virtual camera's pose relative to the world coordinate system, x_w、y_w、z_wThe position of a point in a world coordinate system;

wherein the rotation matrix R is defined by euler angles (α, β, γ) as follows:

in the formula, the angles of alpha, beta and gamma are 0 to 360;

wherein the translation matrix t can be expressed as:

in the formula, t_x、t_y、t_zIs the coordinate of the virtual camera center in the world coordinate system.

Preferably, the inputting each point in the color point cloud data set into the virtual camera model to obtain a pixel coordinate for each point includes:

said colour point cloud data set being

And is

Wherein (x)_i,y_i,z_i) Is the three-dimensional coordinate of point i, (r)_i,g_i,b_i) Is the color component of point i, n is the number of points;

three-dimensional coordinates (x) of point i_i,y_i,z_i) As (x)_w,y_w,z_w) Inputting the pixel coordinate (u) corresponding to the point i into the virtual camera model_i,v_i) And pixel coordinate (u)_i,v_i) Is the color component (r) of point i_i,g_i,b_i)；

If the same pixel coordinate is obtained by corresponding a plurality of points in the color point cloud data set, calculating a plurality of points and the center (t) of the virtual camera respectively_x,t_y,t_z) And (4) taking the color component corresponding to the point with the minimum distance as the color of the pixel coordinate.

Preferably, the generating an image output by the virtual camera model for the target to be detected includes:

acquiring pixel coordinates aiming at each point output by the virtual camera model and colors corresponding to the pixel coordinates to obtain an initial image;

traversing all pixel coordinates in the initial image, judging whether a pixel with the color of 0 exists or not, and if not, outputting the initial image as an image of a target to be detected; otherwise, color filling is carried out on the pixel with the color of 0, and the image after the color filling is finished is taken as the image of the target to be detected to be output.

Preferably, the color padding for the pixel with the color of 0 includes:

let the pixel coordinate of the pixel to be filled with color 0 be (u)_k,v_k)；

Taking the pixel coordinate (u)_k,v_k) Effective pixel θ { (u) within a rectangular region as a center_q,v_q),-d≤u_q-u_k≤d,-d≤v_q-v_k≤d,(r_q,g_q,b_q)≠0}，(u_q,v_q) Is the pixel coordinate of the effective pixel, (r)_q,g_q,b_q) Is the color of the active pixel;

calculate the pixel coordinate as (u)_k,v_k) Is (r) as the fill color_k,g_k,b_k)：

In the formula (d)_qkIs an effective pixel (u)_q,v_q) And the pixel (u) to be filled_k,v_k) The distance between them.

Preferably, the calculating the illumination correction coefficient, and superimposing the illumination correction coefficient on the color component of the color point cloud data set to obtain the color point cloud data set after the virtual illumination is changed includes:

for each point (x) to be corrected_i,y_i,z_i) The illumination correction factor is calculated as follows:

I_d＝IK_d(N·L)

in the formula I_dAs the illumination correction factor, I is the external illumination (R)_I,G_I,B_I)，R_I、G_I、B_IIs between 0 and 1, K_dIs a reflection coefficient (K)_r,K_g,K_b)，K_r、K_g、K_bThe value of (1) is between 0 and 1, N is a normalized normal vector of a point to be corrected, and L is a vector normalized by the center of the virtual camera;

the calculation process of the normalized normal vector N of the point to be corrected is as follows:

acquiring a point (x) to be corrected_i,y_i,z_i) Carrying out plane fitting on 4 points closest to the periphery to obtain a plane AX + BY + CZ + D which is 0;

obtaining a normalized normal vector according to the plane obtained by fitting

The calculation process of the vector L normalized by the virtual camera center is as follows:

let the virtual camera center and the point (x) to be corrected_i,y_i,z_i) A distance of d_vThen vector normalized by virtual camera center

The calculated illumination correction coefficient I_dWith the point (x) to be corrected_i,y_i,z_i) Corresponding color component (r)_i,g_i,b_i) Multiplying to obtain (I)_dr_i,I_dg_i,I_db_i) As a point (x)_i,y_i,z_i) Color after virtual illumination change;

and (4) performing illumination correction on each point one by one to obtain a color point cloud data set after virtual illumination change.

Preferably, the noise is gaussian noise and/or salt and pepper noise.

According to the 3D point cloud-based deep learning sample generation method, a data set picture is automatically generated by 3D point cloud scanning and a computer vision geometric model method, and the richness of a sample is improved by changing the shooting posture of a virtual camera, the change of virtual illumination and adding random noise; the virtual modeling mode is utilized to save the process of actually acquiring and processing the image, thereby greatly simplifying the difficulty of sample construction.

Drawings

Fig. 1 is a flowchart of a 3D point cloud-based deep learning sample generation method according to the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.

In order to solve the problems of high sample generation difficulty, high cost and long time consumption in the prior art, the embodiment provides the deep learning sample generation method based on the 3D point cloud, which is used for generating the deep learning sample of the target to be detected in the target detection through the 3D point cloud, so that the sample generation difficulty is effectively reduced, and the richness of the sample is improved.

As shown in fig. 1, the method for generating a deep learning sample based on a 3D point cloud in this embodiment includes the following steps:

step 1, scanning a target to be detected by adopting a 3D scanner to obtain a color point cloud data set of the target to be detected, wherein each point in the color point cloud data set has a three-dimensional coordinate and a color component.

In order to ensure that the 3D point cloud can restore the form of the real target to a large extent, the scanning accuracy of the 3D scanner is set to be not less than 1mm in this embodiment. The color point cloud data obtained by scanning are

And is

Wherein (x)_i,y_i,z_i) Is the three-dimensional coordinate of point i, (r)_i,g_i,b_i) Is the color component of point i and n is the number of points.

Step 2, constructing a virtual camera model as follows:

in the formula, Z_cIs a proportionality constant, u, v are the pixel coordinates of the image, f_x、f_yIs the focal length of the virtual camera, u₀、v₀Is the optical center coordinate of the virtual camera, R is the rotation matrix of the virtual camera's pose relative to the world coordinate system, t is the translation matrix of the virtual camera's pose relative to the world coordinate system, x_w、y_w、z_wIs the position of a point in the world coordinate system.

in the formula, the angles of alpha, beta and gamma are 0 to 360.

Wherein the translation matrix t can be expressed as:

And 3, setting the focal length and the posture of the virtual camera, inputting each point in the color point cloud data set into the virtual camera model to obtain a pixel coordinate for each point, and generating an image output by the virtual camera model for the target to be detected.

Different images can be obtained for the same target under different focal lengths and postures of the camera, so that the target image under multiple angles is obtained through the posture of the focal length of the virtual camera, and samples are enriched.

Wherein the focal length f of the virtual camera_x、f_yThe value can be 2000, and of course, any focal length value can be set according to needs. Optical center coordinate u of virtual camera₀、v₀Depending on the pixel size of the sample picture to be generated, for example, a picture of 600 × 600 pixels is to be generated, the optical center coordinates are (300 ).

The rotation matrix R can be obtained by taking (alpha, beta, gamma) at random, and the translation matrix t can be obtained by taking t at random_x、t_y、t_zAnd (4) obtaining.

After the focal length and the posture of the virtual camera are set, an image can be generated through the 3D point cloud, and the generation of a sample of a target to be detected is achieved. The process of generating an image in one embodiment is as follows:

three-dimensional coordinates (x) of point i_i,y_i,z_i) As (x)_w,y_w,z_w) Inputting the pixel coordinate (u) corresponding to the point i into the virtual camera model_i,v_i) And pixel coordinate (u)_i,v_i) Is the color component (r) of point i_i,g_i,b_i)。

And traversing each point in the color point cloud data set, acquiring the pixel coordinate of each point output by the virtual camera model and the color corresponding to each pixel coordinate, and obtaining the image of the target to be detected at the moment as an initial image.

The method is limited by the scanning precision and environmental interference of a 3D scanner, so that points in a color point cloud data set may not completely reflect a target to be detected, namely, a currently obtained initial image may have a hole to be filled, and therefore hole filling detection needs to be performed on the initial image:

traversing all pixel coordinates in the initial image, judging whether a pixel (a hole pixel) with the color of 0 exists or not, and if not, outputting the initial image as an image of the target to be detected; otherwise, color filling is carried out on the pixel with the color of 0, and the image after the color filling is finished is taken as the image of the target to be detected to be output.

In the color filling, the color of any adjacent pixel of the hole pixel may be used as the color of the hole pixel, or the color of a pixel at a distance and in a direction from the hole pixel may be used as the color of the hole pixel. In order to improve the effectiveness of color filling and avoid color overshoot, in one embodiment, a color filling method is provided as follows:

let the pixel coordinate of the pixel to be filled with color 0 be (u)_k,v_k)。

Taking the pixel coordinate (u)_k,v_k) Effective pixel θ { (u) within a rectangular region as a center_q,v_q),-d≤u_q-u_k≤d,-d≤v_q-v_k≤d,(r_q,g_q,b_q)≠0}，(u_q,v_q) Is the pixel coordinate of the effective pixel, (r)_q,g_q,b_q) Is the color of the active pixel; typically d has a value of 5.

In the formula (d)_qkIs an effective pixel (u)_q,v_q) And the pixel (u) to be filled_k,v_k) Is a distance therebetween, i.e.

And 4, calculating an illumination correction coefficient, superposing the illumination correction coefficient on the color components of the color point cloud data set to obtain a color point cloud data set after virtual illumination change, and inputting points in the color point cloud data set after the virtual illumination change into the virtual camera model one by one to obtain an image of the target to be detected after the virtual illumination change.

Considering that when a sample of a target to be detected is obtained in an actual environment, the ambient light can affect the imaging color of the target to be detected to a greater extent, so that the illumination correction coefficient is introduced in the embodiment, and the change of the ambient light is simulated by using the illumination correction coefficient, so that a richer sample is obtained.

Compared with the method for obtaining the sample in the actual environment, the method for obtaining the sample in the actual environment can simulate the illumination of various environments through the illumination correction coefficient, does not need to match the illumination condition in the actual environment, and can obtain more samples in the illumination environment more simply, conveniently and quickly.

In one embodiment, obtaining the image of the target to be detected after the virtual illumination is changed includes the following steps:

I_d＝IK_d(N·L)

in the formula I_dAs the illumination correction factor, I is the external illumination (R)_I,G_I,B_I)，R_I、G_I、B_IIs between 0 and 1, K_dIs a reflection coefficient (K)_r,K_g,K_b)，K_r、K_g、K_bThe value of (a) is between 0 and 1, N is a normalized normal vector of a point to be corrected, and L is a vector normalized by the center (observation position) of the virtual camera.

acquiring a point (x) to be corrected_i,y_i,z_i) And performing plane fitting on the nearest 4 points around the plane to obtain a plane AX + BY + CZ + D which is 0.

Obtaining a normalized normal vector according to the plane obtained by fitting

The calculation process of the vector L of the observation position normalization is as follows:

let the observed position and the point (x) to be corrected_i,y_i,z_i) A distance of d_vThen observe the vector of the position normalization

The calculated illumination correction coefficient I_dWith the point (x) to be corrected_i,y_i,z_i) Corresponding color component (r)_i,g_i,b_i) Multiplying to obtain (I)_dr_i,I_dg_i,I_db_i) As a point (x)_i,y_i,z_i) Color after change in virtual lighting.

And after the color point cloud data set with the changed virtual illumination is obtained, inputting the color point cloud data set with the changed virtual illumination into the virtual camera model to obtain an image with the changed virtual illumination. It should be noted that the process of inputting the color point cloud data set after the virtual illumination change into the virtual camera model to obtain the image is the same as the process of directly inputting the color point cloud data set into the virtual camera model to obtain the image, and the focal length and the posture which are the same as those set in step 3 can be adopted, and the focal length and the posture can also be reset, which is not described herein again.

And 5, adding noise to obtain a new image based on the images obtained in the

steps

3 and 4. The noise can be gaussian noise and/or salt and pepper noise to expand the sample and make the sample more consistent with the image collected in the actual environment.

According to the embodiment, the focal length and the posture of the virtual camera are continuously changed, the virtual illumination environment is obtained, abundant and real samples can be obtained based on point cloud data of the target to be detected, and the sample obtaining difficulty and cost are greatly simplified.

It should be noted that, according to actual needs, operations such as image cropping and scaling may be performed on the obtained image, for example, using a findContour function of opencv to obtain an image contour, using a boundingRect function to obtain a minimum bounding rectangle of the contour, and cropping the image so as to conform to the size of the minimum bounding rectangle. And then scaling the image according to the sample picture size requirement.

The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A deep learning sample generation method based on 3D point cloud is used for generating a deep learning sample of a target to be detected in target detection, and is characterized in that the deep learning sample generation method based on the 3D point cloud comprises the following steps:

step 2, constructing a virtual camera model;

2. The method for generating 3D point cloud-based deep learning samples according to claim 1, wherein the scanning accuracy of the 3D scanner is not less than 1 mm.

3. The method for generating 3D point cloud-based deep learning samples according to claim 1, wherein the constructing a virtual camera model comprises:

the virtual camera model is constructed as follows:

in the formula, the angles of alpha, beta and gamma are 0 to 360;

wherein the translation matrix t can be expressed as:

4. The method of claim 1, wherein the inputting each point in the color point cloud data set into the virtual camera model to obtain pixel coordinates for each point comprises:

said colour point cloud data set being

And is

5. The method for generating the deep learning sample based on the 3D point cloud according to claim 4, wherein the generating the image output by the virtual camera model for the target to be detected comprises:

6. The method for generating deep learning samples based on 3D point cloud according to claim 5, wherein the color filling of the pixel with color 0 comprises:

7. The method of claim 1, wherein the calculating an illumination correction coefficient and superimposing the illumination correction coefficient on color components of the color point cloud data set to obtain a color point cloud data set after virtual illumination change comprises:

I_d＝IK_d(N·L)

in the formula I_dAs the illumination correction factor, I is the external illumination (R)_I,G_I,B_I)，R_I、G_I、B_IIs between 0 and 1, K_dIs a coefficient of reflection(K_r,K_g,K_b)，K_r、K_g、K_bThe value of (1) is between 0 and 1, N is a normalized normal vector of a point to be corrected, and L is a vector normalized by the center of the virtual camera;

obtaining a normalized normal vector according to the plane obtained by fitting

8. The method of generating deep learning samples based on 3D point cloud of claim 1, wherein the noise is gaussian noise and/or salt and pepper noise.