Data enhancement method based on depth image
The technical field is as follows:
The invention provides a data enhancement method based on depth images. It is suitable for the field of computer vision and for algorithms that operate on depth images, such as depth image recognition, target detection, and behavior recognition.
Background art:
In recent years, deep learning has been used more and more widely in the field of computer vision. Its excellent performance on many computer vision problems has led a growing number of researchers to work in this direction. Deep learning performs so well because deep convolutional networks have strong expressive capability and can be trained to produce the required model for a given training target. For the same reason, however, a network model needs a large, even massive, amount of data to drive its training; otherwise the model is prone to overfitting. In practice, not every data set has a huge number of training samples, so data enhancement becomes an important step in model training. Effective data expansion not only increases the number of training samples but also increases their diversity. On the one hand, it helps avoid overfitting of the model; on the other hand, it can improve the performance of the model. Common image data enhancement methods include horizontal flipping, random rotation, random scaling, random cropping, random translation, and the like. These common methods are applied to RGB images and are not applicable to other types of images.
With the development of binocular cameras in recent years, their cost has become lower and lower, and more and more researchers apply the depth images they acquire to computer vision tasks such as human skeleton key point detection, human behavior recognition, and gesture recognition. However, the common data enhancement methods used for RGB images are not applicable to depth images. Because the value stored at each pixel of a depth image is the depth distance from the corresponding scene position to the camera, directly applying an RGB data enhancement method distorts the picture. To address this problem, the invention provides a data enhancement method based on depth images. According to the imaging principle of a depth image, the pixel points in the image are converted into three-dimensional space through the conversion relation from the image coordinate system to the world coordinate system, forming a three-dimensional point cloud. A corresponding pose transformation is then applied to the three-dimensional point cloud with the world coordinate system as the reference, and the transformed point cloud is converted back into pixel points through the conversion relation from the world coordinate system to the image coordinate system, forming a new image. In addition, a smoothing method based on minimum value filtering is proposed to remove noise in the depth image and to fill in the blank spots generated after the depth image is transformed. The depth image data enhancement method can improve the generalization capability and accuracy of the network model.
The invention content is as follows:
The invention aims to provide a data enhancement method for deep learning on depth images in the field of computer vision. The method enhances depth image data during network model training so as to improve the generalization capability and accuracy of the trained model.
The invention mainly adopts the following scheme:
The data enhancement method based on the depth image mainly comprises four parts: conversion of pixel coordinates into a three-dimensional point cloud, spatial transformation of the three-dimensional point cloud, conversion of the three-dimensional point cloud back into pixel coordinates, and minimum value filtering. The algorithm flow is shown in figure 1. The conversion of pixel coordinates into a three-dimensional point cloud comprises conversion from the pixel coordinate system to the image coordinate system, from the image coordinate system to the camera coordinate system, and from the camera coordinate system to the world coordinate system. The pixel points in the depth image are converted into three-dimensional space by multiplying together the conversion matrices between the four coordinate systems, forming a three-dimensional point cloud. The spatial transformation of the three-dimensional point cloud comprises random translation and random rotation in the point cloud space. The rotation comprises rotations around the X axis, the Y axis and the Z axis; the rotation angles about the three axes are generated from random numbers, and the rotation is then carried out by the corresponding transformation matrices. The random translation comprises translations along the X axis, the Y axis and the Z axis; the translation distances along the three axes are generated from random numbers, and the translation is carried out by a translation transformation matrix.
The conversion of the three-dimensional point cloud into pixel coordinates comprises conversion from the world coordinate system to the camera coordinate system, from the camera coordinate system to the image coordinate system, and from the image coordinate system to the pixel coordinate system. The transformed three-dimensional point cloud is projected into the depth image by combining the transformation matrices between the four coordinate systems. When the position of the three-dimensional point cloud relative to the camera changes, some points are occluded by other points, so data is lost in the converted depth image. Minimum value filtering is used to fill in the lost pixel points, so that the final transformed depth image is clearer.
To convert pixel coordinates into a three-dimensional point cloud, a depth image is input and its pixel points are converted into a spatial three-dimensional point cloud through the conversion relations among the pixel plane coordinate system (u, v), the image coordinate system (x, y), the camera coordinate system (X_c, Y_c, Z_c) and the world coordinate system (X_w, Y_w, Z_w). First, each pixel point is converted into plane image coordinates through the conversion matrix from the pixel coordinate system to the image coordinate system; the relationship between the two coordinate systems is shown in fig. 2, and the matrix transformation between the image coordinate system (x, y) and the pixel coordinate system (u, v) is shown in fig. 3. Next, the image plane points obtained in the image coordinate system are projected into the camera coordinate system according to the geometric principle of camera imaging; the geometric relationship of camera imaging is shown in fig. 4, and the transformation matrix from the image coordinate system (x, y) to the camera coordinate system (X_c, Y_c, Z_c) is shown in fig. 5. Then, the points obtained in the camera coordinate system are converted according to their positional relation to the established world coordinate system; the transformation matrix from the camera coordinate system (X_c, Y_c, Z_c) to the world coordinate system (X_w, Y_w, Z_w) is shown in fig. 6. Finally, combining these coordinate system transformations gives the transformation from the depth image to the world coordinate system; the transformation matrix is shown in fig. 7, where M is the projection matrix. Through this projection matrix, the pixel points in the depth map are converted into a three-dimensional point cloud in the world coordinate system; the effect is shown in fig. 8.
The spatial transformation of the three-dimensional point cloud mainly comprises rotation around the X, Y and Z axes of the world coordinate system and translation along the X, Y and Z axes of the world coordinate system, that is, a rotation transformation matrix R and a translation transformation matrix T. The rotation transformation matrix R is obtained by combining three rotation matrices: one about the X axis, one about the Y axis and one about the Z axis. The rotation transformation matrices about the X, Y and Z axes are shown in figs. 9, 10 and 11. The rotation angles [ψ, ω, θ]^T about the X, Y and Z axes of the world coordinate system are generated from random numbers. The translation consists of translations along the X axis, the Y axis and the Z axis of the world coordinate system; the translation transformation matrix T is shown in fig. 12. The translation distances [x, y, z]^T along the X, Y and Z axes of the world coordinate system are likewise generated from random numbers.
The conversion of the three-dimensional point cloud into pixel coordinates comprises conversion from the world coordinate system (X_w, Y_w, Z_w) to the camera coordinate system (X_c, Y_c, Z_c), from the camera coordinate system (X_c, Y_c, Z_c) to the image coordinate system (x, y), and from the image coordinate system (x, y) to the pixel coordinate system (u, v). Using the previously obtained conversion matrix relation among the four coordinate systems, the relation is inverted and the transformed three-dimensional point cloud is projected into the depth image, forming the transformed depth image.
When the position of the three-dimensional point cloud relative to the camera changes, some points may be occluded by other points, so part of the data in the converted depth image is lost. For example, fig. 13 shows the result of projecting the three-dimensional point cloud to image pixel coordinates after changing its distance along the Z axis of the world coordinate system; the black lines in the figure are lost data. Minimum value filtering is used to fill in the lost pixel points, so that the converted depth image has a better imaging effect. The designed minimum value filter sorts the value of a coordinate point together with the values of its surrounding points and replaces the value of the coordinate point with the minimum non-zero value. The designed filter kernel is a 3 × 3 matrix, and the final output depth image is obtained after processing by the minimum value filter. A before-and-after comparison of the filtered image is shown in fig. 14. Finally, the effect of the data enhancement method on an input depth image is shown in fig. 15. The enhancement applied to produce that figure consists of a translation along the Z axis of the world coordinate system, a rotation about the Z axis of the world coordinate system, and a rotation about the Y axis of the world coordinate system.
①. When the three-dimensional point cloud is rotated around the X, Y and Z axes of the world coordinate system and randomly translated along the X, Y and Z axes of the world coordinate system, the random rotation angle takes a value between -5 degrees and 5 degrees and the translation distance takes a value between -0.3 m and 0.3 m. When the translation distance exceeds this range, the point cloud is seriously lost when the transformed three-dimensional point cloud is re-projected into a depth image.
②. After the transformed three-dimensional point cloud is projected into the depth image, the depth image is filtered twice with the minimum value filter, which gives a good image effect.
The data enhancement method based on the depth image has the following advantages:
1. The data enhancement method based on the depth image provides a data expansion method for depth-image-based research in the field of computer vision, so that training a network model on a depth image data set is no longer limited to the samples in the original data set.
2. Each coordinate point of a depth image stores a distance value, so images collected with depth cameras installed at different positions differ greatly in data distribution. The data enhancement method based on the depth image converts image points into a three-dimensional point cloud and then applies a pose transformation to that point cloud, so that during training the network model can learn the data distributions corresponding to different camera installation positions, which greatly improves the generalization capability of the network model.
3. Filtering the enhanced depth image with the minimum value filter effectively removes the data loss that occurs when some points become occluded after the pose transformation of the three-dimensional point cloud, so that the depth image generated by the data enhancement has a better effect.
Drawings
FIG. 1 is a flow chart of a method for enhancing depth image data;
FIG. 2 is a diagram of a relationship between a pixel coordinate system and an image coordinate system;
FIG. 3 is a matrix transformation equation from a pixel coordinate system to an image coordinate system;
FIG. 4 is a diagram showing a transformation relationship between a camera coordinate system and an image coordinate system;
FIG. 5 is a matrix transformation equation from an image coordinate system to a camera coordinate system;
FIG. 6 is a matrix transformation equation from a camera coordinate system to a world coordinate system;
FIG. 7 is a matrix transformation equation from a pixel coordinate system to a world coordinate system;
FIG. 8 shows a depth image and the corresponding three-dimensional point cloud plot obtained after conversion from the image coordinate system to the world coordinate system;
FIG. 9 is an angular rotation transformation matrix equation about the X-axis of the world coordinate system;
FIG. 10 is an angular rotation transformation matrix equation about the Y-axis of the world coordinate system;
FIG. 11 is an angular rotation transformation matrix equation about the Z-axis of the world coordinate system;
FIG. 12 is a translation transformation matrix equation along world coordinate system XYZ axes;
FIG. 13 is a transformed depth image without minimum filtering;
FIG. 14 is a before-and-after comparison of minimum value filtering applied to a depth image on which the data enhancement operations have been performed;
FIG. 15 is a diagram of data enhancement effect of depth images;
The specific implementation mode is as follows:
The invention is further described below with reference to the figures and examples.
The data enhancement method based on the depth image mainly comprises four parts: conversion of pixel coordinates into a three-dimensional point cloud, spatial transformation of the three-dimensional point cloud, conversion of the three-dimensional point cloud back into pixel coordinates, and minimum value filtering. The conversion of pixel coordinates into a three-dimensional point cloud comprises conversion from the pixel coordinate system to the image coordinate system, from the image coordinate system to the camera coordinate system, and from the camera coordinate system to the world coordinate system. The image points in the depth image are converted into three-dimensional space by combining the conversion matrices between the four coordinate systems, forming a three-dimensional point cloud. The spatial transformation of the three-dimensional point cloud comprises random translation and random rotation in the point cloud space; data expansion is achieved by applying this pose transformation to the three-dimensional point cloud. The rotation comprises rotations around the X axis, the Y axis and the Z axis; the rotation angles about the three axes are generated from random numbers, and the rotation is then carried out by the corresponding transformation matrices. The random translation comprises translations along the X axis, the Y axis and the Z axis; the translation distances along the three axes are generated from random numbers, and the translation is carried out by a translation transformation matrix.
The conversion of the three-dimensional point cloud into pixel coordinates comprises conversion from the world coordinate system to the camera coordinate system, from the camera coordinate system to the image coordinate system, and from the image coordinate system to the pixel coordinate system. The transformed three-dimensional point cloud is projected into the depth image by multiplying together the transformation matrices between the four coordinate systems. When the position of the three-dimensional point cloud relative to the camera changes, some points are occluded by other points, so data is lost in the converted depth image. Minimum value filtering is used to fill in the lost pixel points, so that the converted depth image is clearer.
The conversion of pixel coordinates into a three-dimensional point cloud consists of conversion from the pixel coordinate system to the image coordinate system, from the image coordinate system to the camera coordinate system, and from the camera coordinate system to the world coordinate system. The points of an input depth image are converted into a spatial three-dimensional point cloud through the conversion relations among the pixel plane coordinate system (u, v), the image coordinate system (x, y), the camera coordinate system (X_c, Y_c, Z_c) and the world coordinate system (X_w, Y_w, Z_w). Conversion from the pixel coordinate system to the image coordinate system: the pixel coordinate system and the image coordinate system both lie on the imaging plane and differ only in their origins and units of measurement; their relationship is shown in fig. 2. Since (u, v) only gives the column and row numbers of a pixel and does not express its position in the image in physical units, an image coordinate system x-y in physical units (e.g., millimeters) is established. The intersection point of the camera optical axis and the image plane is defined as the origin O_1 of this coordinate system, with the x axis parallel to the u axis and the y axis parallel to the v axis. Let (u_0, v_0) denote the coordinates of O_1 in the u-v coordinate system, and let dx and dy denote the physical size of each pixel along the horizontal axis x and the vertical axis y. The conversion between the coordinates of each pixel in the u-v coordinate system and its coordinates in the x-y coordinate system is expressed in matrix form as shown in fig. 3. The conversion between the pixel coordinate system and the image coordinate system can be completed through this matrix relation.
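As a non-limiting illustration, the standard form of the pixel/image relation can be written with the symbols defined above (u_0, v_0, dx, dy). This is the textbook pinhole-camera expression and is given here only as a reference; it is an assumption that the matrix in the corresponding figure has exactly this layout.

```latex
\[
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
=
\begin{bmatrix}
  1/dx & 0    & u_0 \\
  0    & 1/dy & v_0 \\
  0    & 0    & 1
\end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\qquad\Longleftrightarrow\qquad
x = (u - u_0)\,dx, \quad y = (v - v_0)\,dy
\]
```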
Conversion from the image coordinate system to the camera coordinate system: the geometric relationship of camera imaging is shown in fig. 4. Point O is the optical center (projection center) of the camera; the X_c axis and the Y_c axis are parallel to the x axis and the y axis of the imaging plane coordinate system; and the Z_c axis is the optical axis of the camera, perpendicular to the image plane. The intersection point of the optical axis and the image plane is the principal point O_1 of the image. The rectangular coordinate system formed by point O and the X_c, Y_c and Z_c axes is called the camera coordinate system, and OO_1 is the camera focal length. A point P(X_c, Y_c, Z_c) is projected onto the image plane by the light ray passing through the projection center, and the corresponding image point is p(x, y). The transformation matrix equation derived from the principle of similar triangles is shown in fig. 5. The conversion between the image coordinate system and the camera coordinate system can be completed through this matrix equation. Conversion from the camera coordinate system to the world coordinate system: the world coordinate system is introduced to describe the position of the camera. The relationship between the camera coordinate system and the world coordinate system can be represented by a rotation matrix R and a translation vector t. Assume that the homogeneous coordinates of a spatial point P are (X_w, Y_w, Z_w, 1)^T in the world coordinate system and (X_c, Y_c, Z_c, 1)^T in the camera coordinate system; then the transformation matrix equation between the world coordinate system and the camera coordinate system is shown in fig. 6, where R is a 3 × 3 rotation matrix and t is a 3 × 1 translation vector. Combining the above, the transformation matrix equation from the world coordinate system to the pixel plane coordinate system is shown in fig. 7, where M is the projection matrix.
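For reference, the three relations described above can be sketched in their usual textbook form. The symbol f for the focal length OO_1 is introduced here for illustration only, and it is an assumption that the figures use the same arrangement of terms.

```latex
% Perspective projection from similar triangles (f = |OO_1| is an assumed symbol)
\[
Z_c \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
=
\begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix}
\]

% Rigid transformation between the world and camera coordinate systems
\[
\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix}
=
\begin{bmatrix} R & t \\ 0^{T} & 1 \end{bmatrix}
\begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}
\]

% Combined relation from world coordinates to pixel coordinates, with projection matrix M
\[
Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
=
\begin{bmatrix} 1/dx & 0 & u_0 \\ 0 & 1/dy & v_0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} R & t \\ 0^{T} & 1 \end{bmatrix}
\begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}
= M \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}
\]
```

Because a depth image stores Z_c at every pixel (u, v), the scale factor on the left-hand side is known, which is what allows each pixel to be mapped back to a unique three-dimensional point.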
The conversion from the pixel coordinate system to the world coordinate system is realized through this matrix equation, so that the pixel points in the depth image are converted into a three-dimensional point cloud in the world coordinate system; the effect is shown in fig. 8.
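By way of example only, the back-projection of a depth image into a point cloud might be sketched in Python as follows. The function name depth_to_point_cloud and the parameters fx, fy (focal length divided by the pixel sizes dx and dy) are hypothetical. The sketch further assumes that the world coordinate system coincides with the camera coordinate system, so that R is the identity and the translation is zero.

```python
def depth_to_point_cloud(depth, fx, fy, u0, v0):
    """Back-project a depth image (a list of rows) into 3D points.

    fx and fy are the focal length expressed in pixels (f/dx and f/dy), and
    (u0, v0) is the principal point; all four are assumed to be known from
    camera calibration. The world frame is assumed to equal the camera frame.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # zero depth marks a missing measurement
                continue
            x = (u - u0) * z / fx  # invert u = fx * X / Z + u0
            y = (v - v0) * z / fy  # invert v = fy * Y / Z + v0
            points.append((x, y, float(z)))
    return points
```

With a calibrated camera, a call such as depth_to_point_cloud(depth, fx, fy, u0, v0) yields one (X, Y, Z) tuple per valid pixel; pixels with zero depth are skipped.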
The spatial transformation of the three-dimensional point cloud mainly comprises rotation around the X, Y and Z axes of the world coordinate system and translation along the X, Y and Z axes of the world coordinate system, that is, a rotation transformation matrix R and a translation transformation matrix T. The rotation transformation matrix R is obtained by combining three rotation matrices: one about the X axis, one about the Y axis and one about the Z axis. The rotation transformation matrices about the X, Y and Z axes are shown in figs. 9, 10 and 11. The rotation angles [ψ, ω, θ]^T about the X, Y and Z axes of the world coordinate system are generated from random numbers over the range [-a, a]; the specific formula is [ψ, ω, θ]^T = (-1 + 2 × random(0,1)) × a. The translation consists of translations along the X axis, the Y axis and the Z axis of the world coordinate system; the translation transformation matrix T is shown in fig. 12. The translation distances [x, y, z]^T along the X, Y and Z axes of the world coordinate system are generated from random numbers over the range [-b, b]; the specific formula is [x, y, z]^T = (-1 + 2 × random(0,1)) × b.
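As a non-limiting sketch, the random pose transformation might be written in Python as follows. The function names are hypothetical. The description does not state the order in which the three rotations are combined, so the order R = Rz(θ)·Ry(ω)·Rx(ψ) used here is an assumption, as is drawing an independent random number for each angle and each distance.

```python
import math
import random


def random_pose(a_deg, b, rng=random):
    """Draw three angles in [-a_deg, a_deg] degrees and three offsets in [-b, b].

    Each value follows (-1 + 2 * random(0, 1)) * range; angles are returned
    in radians as (psi, omega, theta) for the X, Y and Z axes.
    """
    angles = tuple(math.radians((-1 + 2 * rng.random()) * a_deg) for _ in range(3))
    shifts = tuple((-1 + 2 * rng.random()) * b for _ in range(3))
    return angles, shifts


def rotation_matrix(psi, omega, theta):
    """Combined rotation R = Rz(theta) * Ry(omega) * Rx(psi) (assumed order)."""
    cx, sx = math.cos(psi), math.sin(psi)
    cy, sy = math.cos(omega), math.sin(omega)
    cz, sz = math.cos(theta), math.sin(theta)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def transform_points(points, angles, shifts):
    """Rotate every point by R and then translate it by the given offsets."""
    r = rotation_matrix(*angles)
    return [
        tuple(r[i][0] * x + r[i][1] * y + r[i][2] * z + shifts[i] for i in range(3))
        for x, y, z in points
    ]
```

For instance, random_pose(5.0, 0.3) draws angles within ±5 degrees and distances within ±0.3, and the result can be passed directly to transform_points.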
The conversion of the three-dimensional point cloud into pixel coordinates comprises conversion from the world coordinate system (X_w, Y_w, Z_w) to the camera coordinate system (X_c, Y_c, Z_c), from the camera coordinate system (X_c, Y_c, Z_c) to the image coordinate system (x, y), and from the image coordinate system (x, y) to the pixel coordinate system (u, v). According to the obtained conversion matrix relation among the four coordinate systems, the transformed three-dimensional point cloud is projected to the pixel points of the depth image, forming the transformed depth image.
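One possible way to carry out this re-projection is sketched below in Python. The function name and the parameters fx, fy, width and height are hypothetical, and the world coordinate system is again assumed to coincide with the camera coordinate system. Keeping the nearest point when several points land on the same pixel is an assumption made here to model occlusion; the description does not specify how such collisions are resolved.

```python
def point_cloud_to_depth(points, fx, fy, u0, v0, width, height):
    """Project 3D points into a depth image of the given size.

    Pixels that receive no point keep the value 0.0 (lost data). When several
    points fall on one pixel, the smallest depth is kept (assumed rule).
    """
    depth = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0:  # points at or behind the camera cannot be imaged
            continue
        u = int(round(fx * x / z + u0))
        v = int(round(fy * y / z + v0))
        if 0 <= u < width and 0 <= v < height:
            if depth[v][u] == 0.0 or z < depth[v][u]:
                depth[v][u] = z
    return depth
```

Pixels left at 0.0 by this step correspond to the lost data that the minimum value filtering is intended to fill.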
When the position of the three-dimensional point cloud relative to the camera changes, some points may be occluded by other points, so part of the data in the converted depth image is lost. For example, fig. 13 shows the result of projecting the three-dimensional point cloud to image pixel coordinates after changing its distance along the Z axis of the world coordinate system; the black lines in the figure are lost data. Minimum value filtering is used to fill in the lost pixel points, so that the converted depth image has a better imaging effect. The designed minimum value filter sorts the values of the center pixel and its surrounding pixels and finds the minimum non-zero value. The center pixel is then compared with this non-zero minimum, and if it is smaller than the minimum, the center pixel is replaced by the minimum. The designed filter kernel is a 3 × 3 matrix, and when the depth value of a pixel point is less than 100, the depth value of that point is treated as 0. The final output depth image is obtained after processing by the minimum value filter. A before-and-after comparison of the filtered image is shown in fig. 14. Finally, the effect of the data enhancement method on an input depth image is shown in fig. 15. The enhancement applied to produce that figure consists of a translation along the Z axis of the world coordinate system, a rotation about the Z axis of the world coordinate system, and a rotation about the Y axis of the world coordinate system.
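The minimum value filtering described above might be sketched in Python as follows. The function name min_filter_fill is hypothetical. Clipping the 3 × 3 window at the image border and reading neighbors from the unfiltered image rather than from already-filled pixels are assumptions of this sketch, since the description does not address either point.

```python
def min_filter_fill(depth, threshold=100):
    """Fill lost pixels with the smallest non-zero value in their 3 x 3 window.

    Depth values below `threshold` are first treated as 0 (lost). A pixel is
    replaced only when it is smaller than the non-zero minimum of its window,
    which in practice means only lost pixels are filled.
    """
    h, w = len(depth), len(depth[0])
    clean = [[d if d >= threshold else 0 for d in row] for row in depth]
    out = [row[:] for row in clean]
    for v in range(h):
        for u in range(w):
            # Collect the non-zero values of the window, clipped at the border.
            window = [
                clean[j][i]
                for j in range(max(0, v - 1), min(h, v + 2))
                for i in range(max(0, u - 1), min(w, u + 2))
                if clean[j][i] > 0
            ]
            if window and clean[v][u] < min(window):
                out[v][u] = min(window)
    return out
```

The function can be applied more than once, for example min_filter_fill(min_filter_fill(depth)), to fill gaps wider than a single filter pass can reach.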