WO2021102948A1 - Image processing method and device - Google Patents
Image processing method and device
- Publication number
- WO2021102948A1 (PCT/CN2019/122090)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- dimensional
- target image
- image area
- dimensional space
- area
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/005—General purpose rendering architectures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/40—Analysis of texture
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
- G06T7/593—Depth or shape recovery from multiple images from stereo images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Definitions
- This application relates to the field of image processing technology, and specifically to an image processing method and device.
- 3D reconstruction technology has been widely used in various fields.
- multiple images can be collected, and the depth maps corresponding to these images can be determined, and then a three-dimensional point cloud can be obtained from the multiple depth maps to construct a three-dimensional model.
- Feature points can be extracted from one image and their matching points found in another image; the depth of each pixel in the image is then determined from these feature points and matching points to obtain a depth map. For objects with smooth surfaces and weak textures, feature points cannot be extracted, so the depth of the object cannot be obtained. The depth information corresponding to these objects is therefore missing from the final depth map, which leaves holes in the finally constructed three-dimensional point cloud, so the missing depth information in the depth map needs to be filled in.
- The effect of filling the missing depth information in the depth map is not ideal, which leads to layering of the points corresponding to the same object in three-dimensional space in the finally reconstructed three-dimensional point cloud. The method of filling the depth information of the depth map therefore needs to be improved to obtain a uniform and complete point cloud.
- this application provides an image processing method and device.
- an image processing method including:
- each depth map includes one or more target image regions, and the target image regions meet a preset type condition
- the depth value of the target image area corresponding to each three-dimensional point set is filled.
- an image processing apparatus including a processor, a memory, and a computer program stored on the memory; the processor implements the following steps when executing the computer program:
- each depth map includes one or more target image regions, and the target image regions meet a preset type condition
- the depth value of the target image area corresponding to each three-dimensional point set is filled.
- Fig. 1 is a flowchart of an image processing method provided by an embodiment of the present invention.
- Fig. 2 is a schematic diagram of a semantic image provided by an embodiment of the present invention.
- Fig. 3 is a schematic diagram of a depth map or semantic image including a target area provided by an embodiment of the present invention.
- Fig. 4 is a schematic diagram of the intersection of water areas on two images provided by an embodiment of the present invention.
- Fig. 5 is a flowchart of another image processing method provided by an embodiment of the present invention.
- Fig. 6 is a schematic diagram of the point cloud effect obtained after filling the water area of the depth map in the prior art.
- Fig. 7 is a schematic diagram of the point cloud effect obtained after filling the water area of the depth map according to an embodiment of the present invention.
- Fig. 8 is a logical structure block diagram of an image processing device provided by an embodiment of the present invention.
- Three-dimensional reconstruction technology has been widely used in various fields.
- multiple images can be collected, and the depth maps corresponding to these images can be determined, and then a three-dimensional point cloud can be obtained from the multiple depth maps to construct a three-dimensional model.
- The feature points on the surface of an object can be extracted from one image, matching points for these feature points found in another image, and the depth of each pixel of the object then determined from the feature points and matching points to obtain a depth map.
- Feature points are generally corner points, inflection points, or intersections of contour lines on an object, that is, pixels whose values differ strongly from those of surrounding pixels. For objects with smooth surfaces and weak textures, feature points cannot be extracted and the depth of such objects cannot be accurately calculated, so the areas corresponding to these objects in the final depth map lack depth information, which makes holes appear in the final three-dimensional point cloud. For example, for objects such as water and flat glass, the surface is smooth and weakly textured, so feature points cannot be extracted from the image and the depth of these surfaces cannot be calculated from feature points; the areas corresponding to water, glass, and similar objects in the final depth map are therefore missing.
- the regions corresponding to such objects in the depth map can be filled.
- In the related art, the edge pixels of this type of object are usually determined from a single depth map, a depth is calculated from the depths of those edge pixels in that single depth map, and the corresponding area of the object in that depth map is filled with it.
- However, each depth map may contain only part of the object, so the depth calculated from the pixels of a single depth map carries a certain deviation; that is, the depth calculated from each depth map differs even though all of them represent the same object in three-dimensional space.
- When each depth map is filled with a depth calculated separately from the edge pixels of that single depth map, the finally reconstructed three-dimensional point cloud exhibits layering. The method of filling the depth information of the depth map therefore needs to be improved to obtain a uniform and complete point cloud.
- this application provides an image processing method. Specifically, as shown in FIG. 1, the image processing method provided by this application includes the following steps:
- each depth map includes one or more target image regions, and the target image regions meet a preset type condition
- S106 Project the edge pixels of each target image area into a three-dimensional space to determine at least one three-dimensional point set, and the three-dimensional points included in each three-dimensional point set correspond to the same target object in the three-dimensional space;
- S108 Fill in the depth value of the target image area corresponding to each three-dimensional point set according to the space coordinates of the three-dimensional point in each of the three-dimensional point sets.
- the image processing method provided in this application can be used in various electronic devices that perform image processing, such as mobile phones, notebook computers, tablet computers, desktop computers, and so on.
- the image processing method provided in this application may also be executed in a cloud processor.
- the image processing method provided in this application can also be applied to various types of 3D reconstruction software.
- These depth maps can be obtained from multiple RGB images collected by a camera: feature points are extracted from one RGB image, corresponding matching points are found in another RGB image, and from the feature points and their matching points the depth information of each pixel in the RGB image is obtained, yielding the depth map corresponding to that RGB image.
- These depth maps include one or more target image areas, where the target image area is an area that meets a preset type condition, that is, it may be a corresponding image area of a certain type of three-dimensional object.
- these target image areas may be image areas that need to be filled with depth information. For example, they may be image areas corresponding to some three-dimensional objects with a smooth surface and weak texture.
- For example, the target image area may be the image area corresponding to a water area in three-dimensional space, or the image area corresponding to flat glass; of course, it may also be the image area of other three-dimensional objects with similar characteristics, and this application is not limited in this respect.
- the target object in this application is an object in a three-dimensional space corresponding to the target image area, that is, an object in the three-dimensional space with a relatively smooth surface and weak texture.
- the target image area is an image area of a water area, that is, the target object is a water area.
- the target image area is the image area of the flat glass, and the target object is the flat glass.
- The one or more target image regions included in each depth map may be image regions corresponding to the same target object in three-dimensional space, or image regions corresponding to different target objects in three-dimensional space.
- the edge pixels of the target image region can be determined from the depth map.
- the edge pixels can be obtained by pre-marking the depth map.
- the target image area in the depth map can be determined, and then the target image area can be marked in the depth map.
- The marking can be done manually, or an automatic marking method can of course also be adopted.
- In some embodiments, the edge pixels can be determined from the depth map through the semantic image corresponding to the depth map, where the semantic image is an image divided into multiple image regions, each of which corresponds to one type of target object in the three-dimensional space.
- Semantic images can be obtained by semantic segmentation of images, which distinguishes objects of different categories and marks objects of the same category as one class. As shown in Figure 2, in the semantic segmentation result all people in the scene are assigned to one category and marked with the same color (rendered as different gray levels in Figure 2); the same holds for trees, streets, buildings, the sky, and so on.
- the semantic image corresponding to each depth map can be obtained through a pre-trained calculation model.
- the RGB image corresponding to each depth map can be input to the pre-trained calculation model, and then the semantic image corresponding to each depth map can be output.
- To train the calculation model, a large number of RGB images containing the target object can be collected, the image areas corresponding to the target object annotated, and the model trained on the annotated RGB images to obtain the calculation model.
- the calculation model may be a deep learning model, for example, it may be an FCN (Fully Convolutional Networks) model.
- Each target image area and its edge pixels can be determined from the depth map according to the classification of the objects in the semantic image.
- For example, if the target image area is the image area of a water area, the image area of the water can be determined from the semantic image, and the corresponding image area and edge pixels then located on the depth map through the pixel-to-pixel correspondence between the two images.
- the edge pixels may be pixels located at the boundary of the target image area and belonging to the target image area. In some embodiments, the edge pixels may also be pixels located at the boundary of the target image area but belonging to other image areas.
- Figure 3 is a schematic diagram of the target image area in a depth map or semantic image: the pixels filled in gray belong to the target image area, and the unfilled white pixels belong to other image areas.
- The gray pixels 301 marked with a cross are pixels located at the boundary of the target image area and belonging to the target image area; the white pixels 302 marked with diagonal lines are pixels located at the boundary of the target image area but belonging to other image areas.
- In some embodiments, 301, that is, a pixel located at the boundary of the target image area and belonging to the target image area, may be taken as an edge pixel.
- However, because the depth information of pixels inside the target image area is inaccurate, the subsequent depth calculation for the target image area would also be inaccurate. Therefore, in some embodiments, 302, a pixel located at the boundary of the target image area but belonging to another image area, is taken as an edge pixel instead.
- Specifically, the pixels of each target image area can be determined from the depth map according to the pixel correspondence between the semantic image and the depth map. For each pixel in the target image area, it can be determined one by one whether all the neighboring pixels around it belong to the target image area. If so, the pixel lies inside the target image area rather than on its edge. If not, the pixel lies at the boundary, so the pixels of other image areas adjacent to it can be determined to be edge pixels of the target image area, as shown by 302 in Fig. 3.
- Taking the water area as an example, each pixel of the water image area can be determined from the semantic image corresponding to the depth map. Since the pixels of the two images correspond one to one, the pixels of the water image area can be found in the depth map; it is then determined one by one whether the neighboring pixels around each such pixel all represent water, and if not, the pixels of the other image areas adjacent to that pixel are determined to be edge pixels of the water area.
- Of course, the pixels corresponding to the target image area can also be determined in the semantic image directly: it is determined one by one whether the neighboring pixels around each pixel all belong to the target image area, and if not, the adjacent pixels of other image areas are edge pixels, as shown by 302 in Fig. 3. After the edge pixels are determined in the semantic image, their corresponding points on the depth map are found; these are the edge pixels on the depth map.
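The neighbor check described above can be sketched in a few lines. The sketch below is an illustration only, assuming the semantic segmentation result is available as a boolean mask and using 4-connectivity; it marks as edge pixels the boundary pixels lying outside the target area, as in case 302:

```python
import numpy as np

def water_edge_pixels(semantic_mask: np.ndarray) -> np.ndarray:
    """Return (row, col) coordinates of edge pixels: pixels OUTSIDE the
    target region that are 4-adjacent to at least one target pixel.

    `semantic_mask` is a boolean array where True marks the target image
    area (e.g. water) in the semantic image.
    """
    m = semantic_mask
    # Shift the mask one pixel in each of the four directions; a
    # non-target pixel with any target neighbor is an edge pixel.
    neighbor_is_target = np.zeros_like(m, dtype=bool)
    neighbor_is_target[1:, :] |= m[:-1, :]   # target pixel above
    neighbor_is_target[:-1, :] |= m[1:, :]   # target pixel below
    neighbor_is_target[:, 1:] |= m[:, :-1]   # target pixel to the left
    neighbor_is_target[:, :-1] |= m[:, 1:]   # target pixel to the right
    return np.argwhere(neighbor_is_target & ~m)

# Toy 4x4 mask with a 2x2 water patch in the center: the 8 white pixels
# that touch the patch (4-adjacency) come back as edge pixels.
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
edges = water_edge_pixels(mask)
```

The same routine applies unchanged whether the mask comes from the semantic image or from a pre-marked depth map, since the two are in pixel-to-pixel correspondence.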
- After the edge pixels of each target image area are determined, they can be projected into three-dimensional space to determine which target image areas correspond to the same target object in three-dimensional space. The three-dimensional points corresponding to the edge pixels of the target image areas that correspond to the same target object are then collected into a set, yielding one or more three-dimensional point sets, where each three-dimensional point set corresponds to one target object in three-dimensional space. Then, according to the three-dimensional points in each three-dimensional point set, the depth value of the target image areas corresponding to that set is determined.
- The three-dimensional point to which an edge pixel of the target image area projects can be determined from the pixel coordinates of the edge pixel and the intrinsic and extrinsic parameters of the camera device, where the extrinsic parameters include the rotation matrix and the translation vector of the camera device.
- Specifically, formula (1) can be used for the conversion; in the standard pinhole-camera form implied by the variable definitions below, it reads:
- Z·[u, v, 1]^T = K·(R·Pw + T)  (1)
- (u, v) is the coordinate of the edge pixel;
- Z is the depth at the pixel coordinate (u, v);
- K is the camera intrinsic matrix;
- R is the rotation matrix of the camera device that captured the image;
- T is the translation vector of the camera device that captured the image;
- Pw is the spatial coordinate of the three-dimensional point in three-dimensional space corresponding to the edge pixel at (u, v).
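Formula (1) can be sketched in code as follows. This is an illustration under the assumed common convention Z·[u, v, 1]^T = K·(R·Pw + T) with orthonormal R (so R⁻¹ = Rᵀ); the patent itself does not reproduce the exact convention:

```python
import numpy as np

def backproject(u, v, Z, K, R, T):
    """Project pixel (u, v) with known depth Z into world coordinates.

    Assumes Z*[u, v, 1]^T = K @ (R @ Pw + T), i.e. R and T map world
    coordinates into the camera frame, and R is orthonormal.
    """
    cam = Z * (np.linalg.inv(K) @ np.array([u, v, 1.0]))  # camera frame
    return R.T @ (cam - T)                                # world point Pw
```

For example, with K = R = identity and T = 0, the pixel (10, 5) at depth 2 back-projects to the world point (20, 10, 2).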
- Specifically, the edge pixels of each target image area can first be projected into three-dimensional space to obtain the three-dimensional points corresponding to the edge pixels of each target image area; then, according to the spatial coordinates of those three-dimensional points, the three-dimensional points corresponding to the same target object in three-dimensional space are determined and placed in the same three-dimensional point set.
- For example, suppose the three-dimensional space contains water area 1 and water area 2, and the water image area in each depth map contains only part of these two water areas. The edge pixels of the water image areas in the multiple depth maps are projected into three-dimensional space to determine their corresponding three-dimensional points; from these, the three-dimensional points corresponding to the edge pixels of water area 1 and those corresponding to the edge pixels of water area 2 are determined, and two three-dimensional point sets are constructed.
- In some embodiments, the intersection of the three-dimensional space regions corresponding to the target image areas can first be determined according to the spatial coordinates of the three-dimensional points corresponding to the edge pixels of each target image area, and the three-dimensional points corresponding to the same target object in three-dimensional space then determined according to those intersections.
- For example, suppose the three-dimensional points corresponding to the edge pixels of target image area A are A1, A2, A3, A4, A5, and those of target image area B are B1, B2, B3, B4, B5. Connecting A1, A2, A3, A4, and A5 yields a three-dimensional space region a, which is the physical space of the real object corresponding to target image area A.
- Likewise, connecting B1, B2, B3, B4, and B5 yields a three-dimensional space region b, the physical space of the real object corresponding to target image area B. Whether region a and region b intersect can then be checked: if they intersect, target image areas A and B correspond to the same target object, so the three-dimensional points corresponding to the edge pixels of A and B are three-dimensional points of the same target object.
- When determining whether the three-dimensional space regions intersect, the three-dimensional points corresponding to the edge pixels of each target image area can be projected onto the same plane to determine the plane region corresponding to each target image area; the intersection of the three-dimensional space regions corresponding to the target image areas is then determined from the intersection of the corresponding plane regions.
- the three-dimensional points corresponding to the edge pixels of the target image area A are A1, A2, A3, A4, A5
- the three-dimensional points corresponding to the edge pixels of the target image area B are B1, B2, B3, B4, and B5.
- Points A1, A2, A3, A4, A5 and points B1, B2, B3, B4, B5 are projected onto one plane, for example the plane formed by the X and Y axes, giving two plane regions a1 and b1 respectively; whether a1 and b1 intersect is then judged. If they intersect, the three-dimensional space regions corresponding to target image areas A and B intersect.
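The plane-projection intersection check above can be sketched as follows. As a deliberate simplification, this illustration compares axis-aligned bounding boxes of the projected points rather than the actual polygons; a production implementation would test polygon intersection:

```python
import numpy as np

def regions_intersect_xy(points_a: np.ndarray, points_b: np.ndarray) -> bool:
    """Rough intersection test for two 3D edge-point sets.

    Projects both point sets onto the X-Y plane (drops Z) and checks
    whether their axis-aligned bounding boxes overlap. This stands in
    for the polygon-intersection test of the plane regions a1 and b1.
    """
    a, b = points_a[:, :2], points_b[:, :2]
    a_min, a_max = a.min(axis=0), a.max(axis=0)
    b_min, b_max = b.min(axis=0), b.max(axis=0)
    # Boxes overlap iff each box's minimum lies below the other's maximum.
    return bool(np.all(a_min <= b_max) and np.all(b_min <= a_max))
```

Two water patches whose projected footprints overlap are reported as intersecting and hence as candidates for the same water body.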
- If the three-dimensional space regions corresponding to two target image areas intersect, the three-dimensional points corresponding to the edge pixels of the two target image areas are determined to be three-dimensional points corresponding to the same target object in three-dimensional space; or, if the three-dimensional space regions corresponding to two target image areas each intersect the same third three-dimensional space region, the three-dimensional points corresponding to the edge pixels of the two target image areas are likewise determined to correspond to the same target object in three-dimensional space.
- the corresponding three-dimensional space area of target image area A in three-dimensional space is a
- the corresponding three-dimensional space area of target image area B in three-dimensional space is b
- the corresponding three-dimensional space area of target image area C in three-dimensional space is c
- If region a intersects region c and region b also intersects region c, then target image area A and target image area B also correspond to the same target object in three-dimensional space; that is, target image areas A, B, and C all correspond to the same target object in three-dimensional space, and the three-dimensional points corresponding to the edge pixels of the three image areas are three-dimensional points of the same target object.
- After the three-dimensional point sets are determined, the depth information of the target object can be determined from the spatial coordinates of the three-dimensional points in each set. Since the target object is often an object with flat characteristics, such as water or glass, it can be treated as a plane; therefore, in some embodiments, the average of the depth values of these three-dimensional points can be taken as the depth value of the target object and used to fill in the depth information of the target image area in each depth map.
- the spatial coordinates of each three-dimensional point in the three-dimensional point set are fitted to obtain a fitting plane, and then the depth information of each target image area is determined according to the fitting plane.
- The plane equation of the fitting plane can be written as formula (2):
- a·x + b·y + c·z + d = 0  (2)
- a, b, c, d are the coefficients of the plane equation obtained by fitting the three-dimensional point coordinates in the three-dimensional point set. The depth information of each target image area is then determined from the fitting plane: combining formula (2) with formula (1), Z·[u, v, 1]^T = K·(R·Pw + T), immediately yields the depth Z of each pixel corresponding to the target object in each depth map, after which the target image area corresponding to the target object in each depth map is filled.
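The plane fit of formula (2) and the closed-form depth from combining it with formula (1) can be sketched as follows. This is an illustration under the same assumed pinhole convention as before; `fit_plane` parameterizes the plane as z = p·x + q·y + r, which assumes the plane is not vertical (reasonable for a water surface):

```python
import numpy as np

def fit_plane(points):
    """Least-squares fit of a plane to the pooled 3D edge points.

    The plane a*x + b*y + c*z + d = 0 of formula (2) is parameterized
    as z = p*x + q*y + r, then rewritten with normal n = (p, q, -1)
    and offset d = r, so that n @ P + d = 0 on the plane.
    """
    A = np.c_[points[:, 0], points[:, 1], np.ones(len(points))]
    (p, q, r), *_ = np.linalg.lstsq(A, points[:, 2], rcond=None)
    return np.array([p, q, -1.0]), r

def depth_on_plane(u, v, K, R, T, n, d):
    """Depth Z at pixel (u, v) whose back-projection lies on the plane.

    Assuming Z*[u, v, 1]^T = K @ (R @ Pw + T) with orthonormal R,
    Pw = R^T @ (Z * K^-1 @ [u, v, 1]^T - T).  Substituting Pw into
    n @ Pw + d = 0 and solving for Z gives:
        Z = (n @ R^T @ T - d) / (n @ R^T @ K^-1 @ [u, v, 1]^T)
    """
    p = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return (n @ (R.T @ T) - d) / (n @ (R.T @ p))
```

For a horizontal water plane z = 5 seen by an identity camera at the origin, `depth_on_plane(0, 0, np.eye(3), np.eye(3), np.zeros(3), *fit_plane(pts))` recovers a depth of 5 at the image center.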
- The calculation of the depth map depends on the distinctiveness and stability of the surface texture of the object: feature points must be extracted from multiple RGB images of the three-dimensional objects and matched in order to determine the depth information of each three-dimensional object in the image.
- For three-dimensional objects with reflective surfaces, weak surface textures, or unfixed textures, such as a water surface, three-dimensional reconstruction has always been a problem.
- Traditional three-dimensional reconstruction methods have difficulty obtaining correct reconstruction results for these objects: the water-surface point cloud is usually missing, or its height information is confused.
- In view of this, this embodiment provides an image processing method that can fill in the water areas in the depth maps used to construct the three-dimensional point cloud, so that the constructed three-dimensional point cloud does not exhibit layering and is relatively uniform and complete. The specific method is as follows:
- the deep learning model can be an FCN model.
- the trained model can calculate the pixel-level water segmentation results of each image through a series of processing, and then determine the pixel points corresponding to the water area in each image.
- The RGB images corresponding to the depth maps used to construct the three-dimensional point cloud can be input into the trained deep learning model, which outputs images in which the water and non-water areas are marked; such an image is called a semantic image.
- In this way, the original color RGB image and the depth map and semantic image of the same scale corresponding to that RGB image can be obtained, with the positions of the pixels of the three images in one-to-one correspondence.
- From each depth map and semantic image, the edge pixels of each water area in the depth map can be determined. Since the pixels of the water area in the semantic image all carry labels, each pixel of the water area can be determined from the semantic image; it is then judged one by one whether the neighboring pixels around each such pixel all belong to the water area, and if not, the pixels of the non-water image area adjacent to that pixel are taken as edge pixels of the water area. In this way the edge pixels of the water area in each depth map are determined.
- After the edge pixels of the water area in each depth map are determined, these edge pixels can be projected into the world coordinate system to determine the three-dimensional point corresponding to each edge pixel in the world coordinate system.
- The spatial coordinates of the three-dimensional point obtained by projecting each edge pixel into world coordinates can be calculated with formula (1):
- Z·[u, v, 1]^T = K·(R·Pw + T)  (1)
- (u, v) is the coordinate of the edge pixel;
- Z is the depth at the pixel coordinate (u, v);
- K is the camera intrinsic matrix;
- R is the rotation matrix of the camera device that captured the image;
- T is the translation vector of the camera device that captured the image;
- Pw is the spatial coordinate of the three-dimensional point in three-dimensional space corresponding to the edge pixel at (u, v).
- After the three-dimensional points corresponding to the edge pixels of each water area in each depth map are determined, the water areas can be compared, and intersecting water areas marked as the same water body; as shown in Figure 4, the water areas on the two images intersect and can be marked as the same water body.
- Specifically, for each water area in each image, it is determined whether it belongs to an already counted water area; if not, it is recorded as a new water area.
- If the water area intersects a single counted water area (S506), the three-dimensional points corresponding to its edge pixels are added to the three-dimensional point set of that water area (S508).
- If the water area intersects multiple counted water areas, those water areas are merged into one water area, and the three-dimensional points corresponding to the edge pixels of all of them are placed into one three-dimensional point set (S509). In this way, a statistical result of the global water areas is obtained.
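The merge bookkeeping can be sketched with a disjoint-set (union-find) structure. The class below is an illustration, not the patent's data structure; the caller supplies the ids of the already-counted water areas that the new area intersects (e.g. from the plane-projection test):

```python
class WaterAreas:
    """Global water-area statistics: intersecting areas are merged so
    that every final set corresponds to one real water body."""

    def __init__(self):
        self.parent = []   # union-find parent pointers, one per area id
        self.points = []   # 3D edge points kept at each root area

    def _find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def add(self, pts, intersecting):
        """Register a new water area with edge points `pts` that
        intersects the existing areas whose ids are in `intersecting`;
        returns the root id of the (possibly merged) area."""
        i = len(self.parent)
        self.parent.append(i)
        self.points.append(list(pts))
        for j in intersecting:           # merge every intersecting area
            ri, rj = self._find(i), self._find(j)
            if ri != rj:
                self.parent[rj] = ri
                self.points[ri] += self.points[rj]
                self.points[rj] = []
        return self._find(i)
```

Adding an area that bridges two previously separate areas collapses all three into one set, matching step S509.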
- After the three-dimensional point set of each water area is obtained, the spatial coordinates of the points in the set can be fitted to the plane of formula (2), a·x + b·y + c·z + d = 0, where a, b, c, d are the coefficients obtained by fitting the three-dimensional point coordinates in the set.
- Combining formula (2) with formula (1), Z·[u, v, 1]^T = K·(R·Pw + T), immediately yields the depth Z of each pixel corresponding to the target object in each depth map, after which the target image area corresponding to the target object in each depth map is filled.
- The related technology fills in the depth of the water-surface area directly from the water-edge pixel information of a single depth map; because the depth noise near the water-surface pixels is large, the reconstructed point cloud exhibits serious layering.
- The image processing method provided by this application collects the edge pixels corresponding to the same water area across all depth maps, determines the depth information from these edge pixels from a global perspective, and fills in the depth of the water-surface area accordingly. The final point cloud is therefore not layered, and is more uniform and complete.
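A toy numerical sketch (with made-up edge depths) of why pooling the edge points globally avoids layering: per-map filling gives each patch of the same water surface its own depth, while global filling gives one shared depth.

```python
import numpy as np

# Edge depths recovered from two depth maps of the SAME water surface;
# the values are illustrative, not measured data.
edges_map1 = np.array([5.2, 5.1, 5.3])   # noisy edge depths, map 1
edges_map2 = np.array([4.8, 4.7, 4.9])   # noisy edge depths, map 2

# Per-map filling (related art): each map gets its own fill depth, so
# the two point-cloud patches sit at different heights -> layering.
fill1, fill2 = edges_map1.mean(), edges_map2.mean()

# Global filling (this method): pool the edge points of the merged
# water area and fill every map with the single shared depth.
fill_global = np.concatenate([edges_map1, edges_map2]).mean()
```

Here `fill1` and `fill2` differ by about 0.4, producing two offset layers, whereas `fill_global` assigns one consistent height to both patches.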
- Figure 6(a) is a schematic diagram of a three-dimensional point cloud reconstructed without filling the water area, and Figure 6(b) is a schematic diagram of a three-dimensional point cloud reconstructed after filling based on a single depth map; it can be seen that the reconstructed three-dimensional point cloud exhibits non-uniform layering.
- Figure 7(a) is a schematic diagram of a three-dimensional point cloud reconstructed using the filling method provided by an embodiment of the present invention, and Figure 7(b) is a three-dimensional point cloud reconstructed after filling based on a single depth map.
- the present application also provides an image processing device.
- The image processing device 80 includes a processor 81, a memory 82, and a computer program stored on the memory; the processor implements the following steps when executing the computer program:
- each depth map includes one or more target image regions, and the target image regions meet a preset type condition
- the depth value of the target image area corresponding to each three-dimensional point set is filled.
- When the processor is configured to project the edge pixels of each target image area into a three-dimensional space to determine at least one three-dimensional point set, the processing includes:
- When the processor is configured to determine the three-dimensional points corresponding to the same target object in the three-dimensional space according to the spatial coordinates of the three-dimensional points corresponding to the edge pixels of the respective target image areas, the processing includes:
- a three-dimensional point corresponding to the same target object in the three-dimensional space is determined based on the intersection of the three-dimensional space regions.
- the method when the processor is configured to determine the intersection of the three-dimensional space regions corresponding to each target image region according to the spatial coordinates of the three-dimensional points corresponding to the edge pixel points of each target image region, the method includes:
- the method when the processor is configured to determine the three-dimensional point corresponding to the same target object in the three-dimensional space based on the intersection of the three-dimensional space regions, the method includes:
- the three-dimensional space areas corresponding to the two target image areas intersect, determine that the three-dimensional points corresponding to the edge pixel points of the two target image areas are three-dimensional points corresponding to the same target object in the three-dimensional space; or
- the three-dimensional space regions corresponding to the two target image regions intersect with the same three-dimensional space region, it is determined that the three-dimensional points corresponding to the edge pixel points of the two target image regions are three-dimensional points corresponding to the same target object in the three-dimensional space.
- the method when the processor is configured to fill in the depth value of the target image region corresponding to each three-dimensional point set according to the spatial coordinates of the three-dimensional point in each three-dimensional point set, the method includes:
- the method when the processor is used to determine the edge pixels of each target image area in the multiple depth maps, the method includes:
- the semantic image is obtained based on the RGB image corresponding to the depth map and a pre-trained calculation model.
- the determining the edge pixels of each target image area in the multiple depth maps based on the semantic image includes:
- the target image area is an image area of a water area
- the target object is a water area
- an embodiment of the present specification also provides a computer storage medium in which a program is stored, and when the program is executed by a processor, the image processing method in any of the foregoing embodiments is implemented.
- the embodiments of this specification may adopt the form of a computer program product implemented on one or more storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing program code.
- Computer usable storage media include permanent and non-permanent, removable and non-removable media, and information storage can be realized by any method or technology.
- the information can be computer-readable instructions, data structures, program modules, or other data.
- Examples of computer storage media include, but are not limited to: phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by computing devices.
- for relevant parts, refer to the description of the method embodiment.
- the device embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments. Those of ordinary skill in the art can understand and implement them without creative work.
Abstract
An image processing method and device. The method comprises: obtaining a plurality of depth maps, each depth map comprising one or more target image areas, the target image areas satisfying a preset type condition; determining edge pixels of the target image areas in the plurality of depth maps; projecting the edge pixels of the target image areas into a three-dimensional space to determine at least one set of three-dimensional points, the three-dimensional points in each set corresponding to the same target object in the three-dimensional space; and filling in the depth value of the target image area corresponding to each set according to the spatial coordinates of the three-dimensional points in that set. With this method, defects such as holes, layering, and unevenness in point clouds constructed from depth maps are effectively mitigated.
Description
This application relates to the field of image processing technology, and in particular, to an image processing method and device.
At present, three-dimensional reconstruction technology is widely used in many fields. When performing three-dimensional reconstruction, multiple images can be collected and the depth maps corresponding to these images determined; a three-dimensional point cloud is then obtained from the multiple depth maps to construct a three-dimensional model. When determining the depth of each pixel of an object in an image, feature points can be extracted from one image and the matching points of these feature points found in another image; the depth of each pixel in the image is then determined based on these feature points and matching points, yielding a depth map. For objects with smooth surfaces and weak textures, feature points cannot be extracted, so the depth of such an object cannot be obtained. As a result, the regions corresponding to these objects in the resulting depth map lack depth information, which leaves holes in the constructed three-dimensional point cloud, and the missing depth information in the depth map therefore needs to be filled in.
In the related art, the effect of filling in the missing depth information in the depth map is not ideal, so that in the finally reconstructed three-dimensional point cloud, the points corresponding to the same object in three-dimensional space appear layered. It is therefore necessary to improve the method of filling in the depth information of the depth map so as to obtain a uniform and complete point cloud.
Summary of the Invention
In view of this, this application provides an image processing method and device.
According to the first aspect of the present application, an image processing method is provided, the method including:
acquiring multiple depth maps, each depth map including one or more target image areas, the target image areas meeting a preset type condition;
determining the edge pixels of each target image area in the multiple depth maps;
projecting the edge pixels of each target image area into a three-dimensional space to determine at least one set of three-dimensional points, the three-dimensional points contained in each set corresponding to the same target object in the three-dimensional space; and
filling in the depth value of the target image area corresponding to each set of three-dimensional points according to the spatial coordinates of the three-dimensional points in that set.
According to the second aspect of the present application, an image processing device is provided, the device including a processor, a memory, and a computer program stored on the memory, the processor implementing the following steps when executing the computer program:
acquiring multiple depth maps, each depth map including one or more target image areas, the target image areas meeting a preset type condition;
determining the edge pixels of each target image area in the multiple depth maps;
projecting the edge pixels of each target image area into a three-dimensional space to determine at least one set of three-dimensional points, the three-dimensional points contained in each set corresponding to the same target object in the three-dimensional space; and
filling in the depth value of the target image area corresponding to each set of three-dimensional points according to the spatial coordinates of the three-dimensional points in that set.
With the solution of this application, after multiple depth maps are acquired, the edge pixels of the target image area in each depth map are determined and projected into three-dimensional space to obtain their corresponding three-dimensional points. From these three-dimensional points, the sets of points corresponding to the same target object in three-dimensional space are determined, and the depth value of the target image area corresponding to each set is determined from the spatial coordinates of the three-dimensional points in that set, so as to fill in the depth information of the depth map. This method effectively alleviates the holes, layering, and unevenness in point clouds constructed from depth maps.
In order to describe the technical solutions in the embodiments of the present application more clearly, the accompanying drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flowchart of an image processing method provided by an embodiment of the present invention.
Fig. 2 is a schematic diagram of a semantic image provided by an embodiment of the present invention.
Fig. 3 is a schematic diagram of a depth map or semantic image containing a target area, provided by an embodiment of the present invention.
Fig. 4 is a schematic diagram of water areas intersecting across two images, provided by an embodiment of the present invention.
Fig. 5 is a flowchart of another image processing method provided by an embodiment of the present invention.
Fig. 6 is a schematic diagram of the point cloud obtained after filling the water area of a depth map in the prior art.
Fig. 7 is a schematic diagram of the point cloud obtained after filling the water area of a depth map according to an embodiment of the present invention.
Fig. 8 is a logical structure block diagram of an image processing device provided by an embodiment of the present invention.
The technical solutions in the embodiments of the present application will be described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present application, rather than all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.
Three-dimensional reconstruction technology is widely used in many fields. When performing three-dimensional reconstruction, multiple images can be collected and their corresponding depth maps determined; a three-dimensional point cloud is then obtained from the multiple depth maps to construct a three-dimensional model. When determining the depth of each pixel of an object in an image, feature points on the surface of the object can be extracted from one image, and the matching points of these feature points found in another image; the depth of each pixel of the object is then determined based on these feature points and matching points, yielding a depth map. Since feature points are generally corner points, inflection points, or intersections of contour lines in an object, that is, pixels whose values differ markedly from those of surrounding pixels, no feature points can be extracted for objects with smooth surfaces and weak textures, and the depth of such objects cannot be accurately calculated. The regions corresponding to these objects in the resulting depth map therefore lack depth information, leaving holes in the constructed three-dimensional point cloud. For example, for objects such as water surfaces and flat glass, the surface is relatively smooth and weakly textured, so feature points cannot be extracted from the image and the depth of these surfaces cannot be calculated from them; in the resulting depth map, the regions corresponding to water, glass, and similar objects will be missing.
In order to complete the depth information of these smooth-surfaced, weakly textured objects in the depth map, the regions corresponding to such objects can be filled in. In the related art, when filling in such objects, the edge pixels of the object are usually determined from a single depth map, the depth is calculated based on the depths of the edge pixels in that single depth map, and the corresponding region of the object in that depth map is filled in. However, since an object often appears in multiple depth maps, each of which may contain only part of the object, calculating the depth from the pixels of a single depth map introduces a certain deviation: the depth calculated from each depth map differs, even though all represent the same object in three-dimensional space. Consequently, filling in each depth map separately based on the depths of its own edge pixels causes layering in the finally reconstructed three-dimensional point cloud. It is therefore necessary to improve the method of filling in the depth information of the depth map so as to obtain a uniform and complete point cloud.
In order to solve the above problems, this application provides an image processing method. Specifically, as shown in Fig. 1, the image processing method provided by this application includes the following steps:
S102. Acquire multiple depth maps, each depth map including one or more target image areas that meet a preset type condition.
S104. Determine the edge pixels of each target image area in the multiple depth maps.
S106. Project the edge pixels of each target image area into three-dimensional space to determine at least one set of three-dimensional points, the points in each set corresponding to the same target object in three-dimensional space.
S108. Fill in the depth value of the target image area corresponding to each set of three-dimensional points according to the spatial coordinates of the points in that set.
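Step S108 can be sketched as follows: once the edge pixels of the same target object across all depth maps are collected into one set of three-dimensional points, a single depth is estimated globally from that set and used to fill every corresponding target image area. This is a minimal illustrative sketch, not the patent's exact computation; it assumes the target surface (e.g. a water surface) is level and uses the median height as the global estimate, and the function name and data are hypothetical.

```python
import numpy as np

def fill_water_depth(edge_points_per_map):
    """Estimate one global surface height from edge points collected
    across all depth maps, instead of estimating it per depth map."""
    # Stack the edge points of every depth map into one (N, 3) array.
    all_points = np.vstack(edge_points_per_map)
    # Use the median height as a robust global estimate of the surface.
    return float(np.median(all_points[:, 2]))

# Edge points of the same water area seen in three depth maps
# (hypothetical data; each row is an (x, y, z) point in world space).
per_map = [
    np.array([[0.0, 0.0, 10.1], [1.0, 0.0, 10.0]]),
    np.array([[2.0, 1.0, 9.9],  [3.0, 1.0, 10.0]]),
    np.array([[4.0, 2.0, 10.2]]),
]
print(fill_water_depth(per_map))  # 10.0: one consistent height for all maps
```

Because every depth map's water area receives the same global value, the reconstructed point cloud cannot split into layers, which is the effect described for Fig. 7(a).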
The image processing method provided in this application can be used in various electronic devices that perform image processing, such as mobile phones, notebook computers, tablet computers, and desktop computers. In an optional embodiment, the image processing method provided in this application may also be executed on a cloud processor. Of course, in some embodiments, the image processing method provided in this application can also be applied in various kinds of three-dimensional reconstruction software.
Multiple depth maps can be acquired first. These depth maps can be obtained from multiple RGB images collected by a camera: feature points are extracted from one RGB image and the corresponding matching points found in another RGB image; from the feature points and their matching points, the depth information of each pixel in the RGB image can be computed, yielding the depth map corresponding to that RGB image. These depth maps include one or more target image areas, where a target image area is an area that meets a preset type condition, that is, the image area corresponding to a certain class of three-dimensional object. Typically, the target image areas are image areas that need to be filled with depth information, for example, image areas corresponding to three-dimensional objects with smooth surfaces and weak textures. In some embodiments, the target image area may be the image area corresponding to a water area in three-dimensional space, or the image area corresponding to flat glass; of course, it may also be the image area of other three-dimensional objects with similar characteristics, which is not limited in this application.
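For the special case of a rectified image pair, the depth recovered from a feature point and its match reduces to the familiar disparity relation Z = f·B/d. This is an illustrative sketch of that special case only (the patent does not assume rectified stereo); the camera numbers are hypothetical.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth of a matched feature point in a rectified image pair:
    Z = f * B / d, with focal length in pixels, baseline in meters,
    and disparity (horizontal pixel offset between the match) in pixels."""
    return focal_px * baseline_m / disparity_px

# Hypothetical camera: f = 700 px, baseline 0.5 m, disparity 35 px.
print(depth_from_disparity(700.0, 0.5, 35.0))  # 10.0 meters
```

Pixels with no reliable match (e.g. on water or glass) get no disparity and hence no depth, which is exactly the gap the filling method addresses.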
The target object in this application is the object in three-dimensional space corresponding to the target image area, that is, an object with a relatively smooth surface and weak texture in three-dimensional space. For example, if the target image area is the image area of a water area, the target object is the water area; if the target image area is the image area of flat glass, the target object is the flat glass. The one or more target image areas included in each depth map may be image areas corresponding to the same target object in three-dimensional space, or image areas corresponding to different target objects.
After a depth map containing one or more target image areas is acquired, the edge pixels of the target image areas can be determined from the depth map. In some embodiments, the edge pixels can be obtained by marking the depth map in advance: for example, the target image area in the depth map can be determined and then marked in the depth map, either manually or automatically.
In some embodiments, the edge pixels can be determined from the depth map through the semantic image corresponding to the depth map, where a semantic image is an image divided into multiple image areas, each corresponding to one class of target object in three-dimensional space. A semantic image can be obtained by performing semantic segmentation on an image, which distinguishes objects of different categories and marks objects of the same category as one class. As shown in Fig. 2, in the result of semantic segmentation, all people in the scene are grouped into one class and marked with the same color (rendered as different gray levels in Fig. 2), and likewise for trees, streets, buildings, the sky, and so on.
In some embodiments, the semantic image corresponding to each depth map can be obtained through a pre-trained calculation model: the RGB image corresponding to each depth map is input to the pre-trained model, which outputs the corresponding semantic image. To train this model, a large number of RGB images containing the target object can be collected, the image areas corresponding to the target object annotated, and the model trained on the annotated RGB images. The calculation model may be a deep learning model, for example an FCN (Fully Convolutional Networks) model.
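An FCN-style segmentation model outputs one score map per class, and the semantic image is the per-pixel argmax over those maps. A minimal sketch with fabricated logits follows; the two-class layout and values are hypothetical stand-ins for a trained model's output, not the patent's actual network.

```python
import numpy as np

def semantic_image(class_logits):
    """Turn per-class score maps of shape (C, H, W), as produced by a
    fully convolutional segmentation network, into a semantic image of
    shape (H, W) holding per-pixel class labels via argmax."""
    return np.argmax(class_logits, axis=0)

# Hypothetical 2-class logits (class 0 = background, class 1 = water)
logits = np.zeros((2, 2, 3)) # (classes, height, width)
logits[1, :, :2] = 5.0       # the network scores the left 2 columns as water
labels = semantic_image(logits)
print(labels)                # left 2 columns labeled 1, rightmost column 0
```

The resulting label image is what the method reads the water area (and its boundary) from, pixel-aligned with the depth map.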
After the semantic image corresponding to the depth map is determined, since the pixels of the two images correspond one to one, each target image area and its edge pixels can be determined from the depth map according to the classification of objects in the semantic image. For example, if the target image area is the image area of a water area, the image area of the water area can be determined from the semantic image, and then, according to the correspondence between the pixels of the two images, the image area and edge pixels of the water area can be determined on the depth map. Of course, it is also possible to first determine the image area and edge pixels of the water area directly in the semantic image, and then determine the edge pixels in the depth map according to the pixel correspondence.
In some embodiments, the edge pixels may be pixels located on the boundary of the target image area and belonging to the target image area. In other embodiments, the edge pixels may be pixels located on the boundary of the target image area but belonging to other image areas. Fig. 3 is a schematic diagram of a target image area in a depth map or semantic image, where the pixels filled in gray correspond to the target image area and the unfilled white pixels belong to other image areas. The pixels 301, filled in gray and marked with a cross, are located on the boundary of the target image area and belong to it; the pixels 302, filled in white and marked with diagonal lines, are located on the boundary of the target image area but belong to other image areas. Therefore, in some embodiments, the pixels 301, located on the boundary of and belonging to the target image area, can be taken as the edge pixels. However, since the depth information of pixels within the target image area is not very accurate, the subsequently calculated depth of the target image area would also be inaccurate; therefore, in some embodiments, the pixels 302, which lie on the boundary of the target image area but belong to other image areas, are taken as the edge pixels.
In some embodiments, the pixels of each target image area can be determined from the depth map according to the pixel correspondence between the semantic image and the depth map. Then, for each pixel of the target image area, it is determined one by one whether all of its neighboring pixels also belong to the target image area. If so, there are further target-area pixels beyond this one, so this pixel is not on the edge of the target image area. If not, other image areas lie beyond this pixel, and the neighboring pixels belonging to those other image areas can be determined as edge pixels of the target image area, such as 302 in Fig. 3. For example, if the target image area is a water image area, each pixel of the water image area can be determined from the semantic image corresponding to the depth map; since the pixels of the two images correspond one to one, these pixels can be located in the depth map, and for each of them it is judged one by one whether all of its surrounding neighboring pixels also represent the water image. If not, the neighboring pixels of the other image areas are determined to be the edge of the water area.
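The neighbor check described above can be sketched as follows: for each target-area pixel, examine its 4-neighbors, and collect those neighbors that fall outside the target area (the 302-style edge pixels). This is an illustrative sketch; the mask and function name are hypothetical.

```python
import numpy as np

def outside_edge_pixels(region_mask):
    """Return the pixels that are NOT in the target region but are
    4-adjacent to it, i.e. the '302' edge pixels described above."""
    h, w = region_mask.shape
    edges = set()
    for y in range(h):
        for x in range(w):
            if not region_mask[y, x]:
                continue  # only inspect neighbors of target-area pixels
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and not region_mask[ny, nx]:
                    edges.add((ny, nx))
    return sorted(edges)

# Hypothetical 4x4 mask with a 2x2 water region in the middle.
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
print(outside_edge_pixels(mask))  # the 8 pixels ringing the 2x2 block
```

Because these edge pixels lie outside the water area, their depth values come from well-textured surroundings and are reliable inputs for the later projection step.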
Of course, in some embodiments, the pixels corresponding to the target image area can also be determined in the semantic image, judging one by one whether the neighboring pixels around each pixel all belong to the target image area; if not, the neighboring pixels of the other image areas are determined to be edge pixels, such as 302 in Fig. 3. After the edge pixels are determined in the semantic image, their corresponding points are found on the depth map; these are the edge pixels on the depth map.
After the edge pixels of each target area in the multiple depth maps are determined, the edge pixels of each target image area can be projected into three-dimensional space to determine which target image areas correspond to the same target object in three-dimensional space. The three-dimensional points corresponding to the edge pixels of the target image areas that correspond to the same target object are then taken as one set, yielding one or more sets of three-dimensional points, where each set corresponds to one target object in three-dimensional space. The depth value of the target image area corresponding to each set is then determined from the three-dimensional points in that set. The three-dimensional point obtained by projecting an edge pixel into three-dimensional space can be determined from the pixel coordinates of the edge pixel and the internal and external parameters of the camera, the external parameters including the rotation matrix and translation matrix of the camera. Specifically, formula (1) can be used for the conversion:
Z · [u, v, 1]^T = K (R · Pw + T)    (1)

where (u, v) are the coordinates of the edge pixel, Z is the depth at pixel coordinates (u, v), K is the camera intrinsic matrix, R is the rotation matrix of the camera that captured the image, T is the translation matrix of that camera, and Pw is the spatial coordinate of the three-dimensional point in three-dimensional space corresponding to the edge pixel (u, v). Pw can thus be recovered as Pw = R^(-1) (Z · K^(-1) · [u, v, 1]^T - T).
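Formula (1) can be inverted to recover Pw from a pixel and its depth. A minimal numpy sketch follows; the identity intrinsics/extrinsics are hypothetical values chosen only as a sanity check, not real calibration data.

```python
import numpy as np

def backproject(u, v, Z, K, R, T):
    """Back-project pixel (u, v) with depth Z to a world-space point Pw
    by inverting Z * [u, v, 1]^T = K (R * Pw + T)."""
    uv1 = np.array([u, v, 1.0])
    return np.linalg.inv(R) @ (Z * np.linalg.inv(K) @ uv1 - T)

# Identity intrinsics/extrinsics as a sanity check (hypothetical values).
K = np.eye(3)
R = np.eye(3)
T = np.zeros(3)
Pw = backproject(2.0, 3.0, 4.0, K, R, T)
print(Pw)  # with identity K, R and zero T this is Z * [u, v, 1] = [8, 12, 4]
```

Each depth map has its own R and T, so edge pixels from different views land in one shared world frame, which is what makes the later intersection test possible.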
In some embodiments, the edge pixels of each target image area can first be projected into three-dimensional space to obtain the three-dimensional points corresponding to the edge pixels of each target image area; then, according to the spatial coordinates of those three-dimensional points, the three-dimensional points corresponding to the same target object in three-dimensional space are determined, and the three-dimensional points corresponding to the same target object are placed in the same three-dimensional point set. For example, suppose the three-dimensional space contains water area 1 and water area 2, while the water image area on each individual depth map covers only part of these two water areas. The edge pixels of the water image areas on the multiple depth maps can all be projected into three-dimensional space to determine the three-dimensional points they correspond to; from these, the three-dimensional points corresponding to the edge pixels of water area 1 and those corresponding to the edge pixels of water area 2 are identified, and two three-dimensional point sets are constructed.
In some embodiments, when determining the three-dimensional points corresponding to the same target object in three-dimensional space, the intersection of the three-dimensional space regions corresponding to the respective target image areas can first be determined according to the spatial coordinates of the three-dimensional points corresponding to the edge pixels of each target image area; the three-dimensional points corresponding to the same target object are then determined according to that intersection. For example, suppose the three-dimensional points corresponding to the edge pixels of target image area A are A1, A2, A3, A4 and A5, and those of target image area B are B1, B2, B3, B4 and B5. Connecting A1 through A5 yields a three-dimensional space region a, i.e., the physical space of the real object corresponding to target image area A; likewise, connecting B1 through B5 yields a three-dimensional space region b, the physical space of the real object corresponding to target image area B. Whether region a and region b intersect can then be checked: if they do, target image area A and target image area B correspond to the same target object, and the three-dimensional points corresponding to the edge pixels of A and B are three-dimensional points of that same target object.
In some embodiments, to better determine the intersection of the three-dimensional space regions corresponding to the target image areas, the three-dimensional points corresponding to the edge pixels of each target image area can be projected onto the same plane to determine the plane region corresponding to each target image area; the intersection of the three-dimensional space regions is then determined from the intersection of these plane regions. For example, the three-dimensional points A1 through A5 of target image area A and B1 through B5 of target image area B can be projected onto one plane, such as the plane formed by the X and Y axes, yielding two plane regions a1 and b1; if a1 and b1 intersect, the three-dimensional space regions corresponding to target image areas A and B intersect.
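A minimal sketch of this plane-projection test, substituting axis-aligned bounding boxes of the projected points for a full polygon-intersection test (a simplifying assumption; a real implementation might intersect the polygons themselves):

```python
def xy_bbox(points3d):
    """Axis-aligned bounding box of 3D points projected onto the XY plane."""
    xs = [p[0] for p in points3d]
    ys = [p[1] for p in points3d]
    return (min(xs), min(ys), max(xs), max(ys))

def bboxes_intersect(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]
```

The bounding-box test is conservative: if the boxes do not overlap, the underlying plane regions certainly do not intersect.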
In some embodiments, if the three-dimensional space regions corresponding to two target image areas intersect, the three-dimensional points corresponding to the edge pixels of the two target image areas are determined to correspond to the same target object in three-dimensional space; alternatively, if the three-dimensional space regions corresponding to two target image areas both intersect the same third three-dimensional space region, the three-dimensional points corresponding to the edge pixels of the two target image areas are likewise determined to correspond to the same target object. For example, suppose target image areas A, B and C correspond to three-dimensional space regions a, b and c, respectively. If a and b intersect, target image areas A and B are considered to correspond to the same target object in three-dimensional space, and the three-dimensional points corresponding to the edge pixels of the two image areas are three-dimensional points of that same object. If a intersects c and b also intersects c, then target image areas A and B are also considered to correspond to the same target object; that is, target image areas A, B and C all correspond to the same target object in three-dimensional space, and the three-dimensional points corresponding to the edge pixels of all three image areas are three-dimensional points of that same target object.
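This transitive grouping (a~c and b~c place A, B and C in one object) is a connected-components problem; a hypothetical union-find sketch, where pairwise intersections are treated as graph edges:

```python
def group_regions(n, intersecting_pairs):
    """Group n target image areas into objects given pairwise intersections.

    Each intersection is an edge; connected components are target objects,
    so A~C together with B~C transitively puts A, B and C in one group.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in intersecting_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

For instance, with regions 0 and 1 each intersecting region 2 but not each other, all three end up in a single group.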
After the three-dimensional point set corresponding to each target object is determined, the depth information of that target object can be determined according to the spatial coordinates of the three-dimensional points in the set. Since the target object is often an object with planar characteristics, such as a water area or glass, it can be treated as a plane; therefore, in some embodiments, the average of the depth values of these three-dimensional points can be taken as the depth value of the target object and used to fill in the depth information of the target image areas in the depth maps.
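The averaging option can be sketched in a few lines; the row-major depth-map layout and the use of the points' third coordinate as the depth value are illustrative assumptions:

```python
def mean_fill_depth(depth, region_pixels, region_points3d):
    """Fill a planar target region with the mean depth of its 3D point set.

    depth: 2D list (row-major depth map), modified in place.
    region_pixels: (row, col) pixels of the target image area to fill.
    region_points3d: the object's 3D point set; the third coordinate
    of each point is taken as its depth value.
    """
    mean_z = sum(p[2] for p in region_points3d) / len(region_points3d)
    for r, c in region_pixels:
        depth[r][c] = mean_z
    return depth
```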
To determine the depth value of each target image area more accurately and thereby obtain a more realistic three-dimensional point cloud model, in some embodiments, after the three-dimensional point set of each target object is determined, a fitting plane can be obtained for each set by fitting the spatial coordinates of its three-dimensional points; the depth information of each target image area is then determined from that fitting plane. For example, according to the three-dimensional point set corresponding to one target object, the plane equation of the fitting plane can be determined as follows:
a·x_w + b·y_w + c·z_w + d = 1    formula (2)
Here, a, b, c and d are the coefficients of the plane equation fitted from the coordinates of the three-dimensional points in the set. The depth information of each target image area is then determined from the fitting plane. Specifically, solving formula (2) together with formula (1) below yields the depth Z of the pixels corresponding to the target object on each depth map, which is then used to fill the corresponding target image area on each depth map. Formula (1) is as follows:

Z·[u, v, 1]^T = K·(R·P_w + T)    formula (1)
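The plane fit itself can be sketched with the parameterization z = a·x + b·y + c, which is equivalent to formula (2) up to reparameterization whenever the plane is not vertical (a reasonable assumption for water surfaces); the 3×3 normal equations are solved here with Cramer's rule:

```python
def fit_plane(points):
    """Least-squares plane z = a*x + b*y + c through 3D points.

    Builds the 3x3 normal equations of the least-squares problem and
    solves them with Cramer's rule; needs at least three non-collinear
    points so the system is non-degenerate.
    """
    sxx = sum(x * x for x, y, z in points)
    sxy = sum(x * y for x, y, z in points)
    syy = sum(y * y for x, y, z in points)
    sx = sum(x for x, y, z in points)
    sy = sum(y for x, y, z in points)
    sxz = sum(x * z for x, y, z in points)
    syz = sum(y * z for x, y, z in points)
    sz = sum(z for x, y, z in points)
    n = len(points)

    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    M = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    D = det3(M)
    coeffs = []
    for col in range(3):
        Mc = [row[:] for row in M]
        for r in range(3):
            Mc[r][col] = rhs[r]
        coeffs.append(det3(Mc) / D)
    return tuple(coeffs)  # (a, b, c)
```

Points lying exactly on a plane recover its coefficients exactly, which makes the fit easy to verify on synthetic data.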
To further explain the image processing method provided in this application, a specific embodiment is described below.
In three-dimensional reconstruction, the computation of a depth map depends on the distinctiveness and stability of object surface textures: feature points must be extracted from multiple RGB images of the three-dimensional objects and matched in order to determine the depth information of each object in the images. For three-dimensional objects with reflective, weakly textured, or unstable-textured surfaces, such as water surfaces, reconstruction has long been difficult; traditional methods struggle to produce correct results for such objects, and the water-surface point cloud is typically missing or has disordered height information.
To solve the problem of missing water areas in three-dimensional reconstruction, this embodiment provides an image processing method that fills in the water areas in the depth maps used to construct the three-dimensional point cloud, so that the constructed point cloud is relatively uniform and complete, without layering artifacts. The method is as follows:
1. Training the deep learning model
First, a large number of RGB images containing water scenes can be collected and the pixels of the water scenes in these images annotated; the annotated RGB images are then input into a deep learning model, which is trained to obtain a model that can segment water areas from non-water areas. The deep learning model may be an FCN model. Once trained, the model can compute a pixel-level water segmentation result for each image, i.e., determine the pixels corresponding to the water areas in each image.
2. Determining the semantic image corresponding to each depth map
After the deep learning model is trained, the RGB image corresponding to each depth map used to construct the three-dimensional point cloud can be input into the trained model, which outputs an image with the water and non-water areas labeled, referred to here as a semantic image. This yields, at the same scale, the original color RGB image together with its corresponding depth map and semantic image, with the pixels of the three images in one-to-one positional correspondence.
3. Determining the edge pixels of each water area in the depth maps
The edge pixels of each water area in a depth map can be determined from the correspondence between the depth map and the semantic image. Since the pixels of the water areas in the semantic image all carry labels, every pixel of a water area can be identified from the semantic image; each such pixel is then checked, one by one, to see whether all of its neighboring pixels are also water pixels. If not, the adjacent pixels belonging to the non-water image area are taken as the edge pixels of that water area, thereby determining the edge pixels of the water areas in each depth map.
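The neighbor test above can be sketched over a label mask; the 4-neighborhood and the water label value 1 are illustrative assumptions:

```python
def water_edge_pixels(labels, water=1):
    """Find edge pixels of water areas in a semantic label mask.

    For every water pixel, its 4-neighbours are checked; any neighbour
    that is NOT water is collected as an edge pixel, following the text:
    the adjacent non-water pixels (whose depth is reliable) mark the edge.
    """
    h, w = len(labels), len(labels[0])
    edges = set()
    for r in range(h):
        for c in range(w):
            if labels[r][c] != water:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < h and 0 <= nc < w and labels[nr][nc] != water:
                    edges.add((nr, nc))
    return edges
```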
4. Determining the intersections of the water areas in the depth maps
After the edge pixels of the water areas in each depth map are determined, these edge pixels can be projected into the world coordinate system to determine the three-dimensional point corresponding to each edge pixel. The spatial coordinates of the three-dimensional point corresponding to each projected edge pixel can be calculated with formula (1):

Z·[u, v, 1]^T = K·(R·P_w + T)    formula (1)
Here, (u, v) are the coordinates of the edge pixel, Z is the depth at pixel coordinate (u, v), K is the intrinsic parameter matrix of the camera, R is the rotation matrix of the camera device that captured the image, T is the translation matrix of the camera device that captured the image, and P_w is the spatial coordinate of the three-dimensional point in three-dimensional space corresponding to the edge pixel at (u, v).
When multiple images capture the same water area, the edges of the water areas will intersect; the intersections among the water areas can therefore be determined from the three-dimensional points corresponding to the edge pixels of each water area in the depth maps, and intersecting water areas are marked as the same water area. As illustrated in Figure 4, the water areas in the two images intersect and can be marked as the same water area.
Figure 5 shows a flowchart for determining intersecting water areas. After each depth map used to construct the three-dimensional point cloud is obtained (S501), the edge pixels of the water areas in each depth map are determined (S502). The edge pixels of the water area of each image are then projected into the world coordinate system to obtain the corresponding three-dimensional points (S503), and it is determined from these points whether the water area intersects any already-counted water area (S504). If it intersects none of the counted water areas, it is counted as a new water area (S505). If it does intersect a counted water area, the water area in the image is judged to belong to a known water area, and it is further determined whether it intersects a single counted water area or multiple counted water areas: if a single one (S506), the three-dimensional points of the water area are added to the three-dimensional point set of that known water area (S508); if multiple (S507), those water areas are merged into one, and the three-dimensional points corresponding to the edge pixels of all of them are placed in a single three-dimensional point set (S509). In this way, a statistical result over the global water areas is obtained.
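One step of the S504–S509 flow can be sketched as follows; the `overlaps` predicate is a hypothetical stand-in for the projected-area intersection test described above:

```python
def merge_into_waters(waters, new_points, overlaps):
    """Merge one new water region's 3D points into the counted waters.

    waters: list of 3D point sets, one per counted water area.
    new_points: the new region's edge-pixel 3D points.
    overlaps: predicate(set_a, set_b) -> bool, standing in for the
    projected-area intersection test (S504).
    """
    hits = [i for i, w in enumerate(waters) if overlaps(w, new_points)]
    if not hits:                       # S505: count as a new water area
        waters.append(set(new_points))
    elif len(hits) == 1:               # S506/S508: extend the known water
        waters[hits[0]].update(new_points)
    else:                              # S507/S509: merge all hits into one
        merged = set(new_points)
        for i in reversed(hits):
            merged |= waters.pop(i)
        waters.append(merged)
    return waters
```

Running this over every depth map's water regions yields the global per-water-area point sets the flowchart describes.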
The three-dimensional points corresponding, in three-dimensional space, to the edge pixels belonging to the same water area are placed in one three-dimensional point set, and a plane is fitted from the coordinates of the three-dimensional points in that set, giving the plane equation:
a·x_w + b·y_w + c·z_w + d = 1    formula (2)
Here, a, b, c and d are the coefficients of the plane equation fitted from the coordinates of the three-dimensional points in the set. Solving formula (2) together with formula (1) yields the depth Z of the pixels corresponding to the target object on each depth map, which is then used to fill the corresponding target image area on each depth map. Formula (1) is as follows:

Z·[u, v, 1]^T = K·(R·P_w + T)    formula (1)
When multiple images capture the same water area, the related art fills the depth of the water-surface region directly with the edge-pixel information of a single depth map; because the depth of pixels near the water surface is noisy, the reconstructed point cloud exhibits severe layering. The image processing method provided in this application instead gathers, across the depth maps, the edge pixels corresponding to the same water area, determines the depth from all of these edge pixels, and fills the depth of the water-surface region from this global perspective; the resulting point cloud therefore shows no layering and is more uniform and complete.
As shown in Figure 6, Figure 6(a) is a schematic diagram of a three-dimensional point cloud reconstructed without filling the water area, and Figure 6(b) is a schematic diagram of a point cloud reconstructed with filling based on a single depth map; it can be seen that after the water area is filled with the prior-art method, the reconstructed point cloud shows uneven layering. As shown in Figure 7, Figure 7(a) is a schematic diagram of a point cloud reconstructed with the filling method provided by an embodiment of the present invention, and Figure 7(b) is a point cloud reconstructed with filling based on a single depth map; compared with single-depth-map filling, the water surface in the point cloud obtained by the method of this application is more uniform, exhibits no surface layering, and has greatly reduced water-surface noise.
In addition, this application further provides an image processing device. As shown in Figure 8, the image processing device 80 includes a processor 81, a memory 82, and a computer program stored in the memory; when executing the computer program, the processor implements the following steps:
acquiring multiple depth maps, each depth map including one or more target image areas, the target image areas satisfying a preset type condition;
determining the edge pixels of each target image area in the multiple depth maps;
projecting the edge pixels of each target image area into three-dimensional space to determine at least one three-dimensional point set, the three-dimensional points contained in each three-dimensional point set corresponding to the same target object in the three-dimensional space;
filling in the depth value of the target image area corresponding to each three-dimensional point set according to the spatial coordinates of the three-dimensional points in that set.
In some embodiments, when the processor is configured to project the edge pixels of each target image area into three-dimensional space to determine at least one three-dimensional point set, the steps include:
projecting the edge pixels of each target image area into three-dimensional space to obtain the three-dimensional points corresponding to the edge pixels of each target image area;
determining, according to the spatial coordinates of the three-dimensional points corresponding to the edge pixels of each target image area, the three-dimensional points that correspond to the same target object in three-dimensional space;
placing the three-dimensional points corresponding to the same target object in the same three-dimensional point set.
In some embodiments, when the processor is configured to determine, according to the spatial coordinates of the three-dimensional points corresponding to the edge pixels of each target image area, the three-dimensional points corresponding to the same target object in three-dimensional space, the steps include:
determining, according to those spatial coordinates, the intersection of the three-dimensional space regions to which the respective target image areas correspond in three-dimensional space;
determining the three-dimensional points corresponding to the same target object in three-dimensional space based on the intersection of the three-dimensional space regions.
In some embodiments, when the processor is configured to determine, according to the spatial coordinates of the three-dimensional points corresponding to the edge pixels of each target image area, the intersection of the three-dimensional space regions corresponding to the target image areas, the steps include:
projecting the three-dimensional points corresponding to the edge pixels of each target image area onto the same plane to determine the plane region corresponding to each target image area;
determining the intersection of the three-dimensional space regions corresponding to the target image areas according to the intersection of their corresponding plane regions.
In some embodiments, when the processor is configured to determine, based on the intersection of the three-dimensional space regions, the three-dimensional points corresponding to the same target object in three-dimensional space, the steps include:
if the three-dimensional space regions corresponding to two target image areas intersect, determining that the three-dimensional points corresponding to the edge pixels of the two target image areas correspond to the same target object in three-dimensional space; or
if the three-dimensional space regions corresponding to two target image areas intersect the same third three-dimensional space region, determining that the three-dimensional points corresponding to the edge pixels of the two target image areas correspond to the same target object in three-dimensional space.
In some embodiments, when the processor is configured to fill in the depth value of the target image area corresponding to each three-dimensional point set according to the spatial coordinates of the three-dimensional points in each set, the steps include, for each three-dimensional point set:
determining a fitting plane according to the spatial coordinates of the three-dimensional points in the set;
filling in the depth value of the target image area corresponding to the set according to the fitting plane.
In some embodiments, when the processor is configured to determine the edge pixels of each target image area in the multiple depth maps, the steps include:
acquiring the semantic images corresponding to the multiple depth maps, each semantic image being segmented into multiple image regions, each image region corresponding to one type of target object in three-dimensional space;
determining the edge pixels of each target image area in the multiple depth maps based on the semantic images.
In some embodiments, the semantic image is obtained based on the RGB image corresponding to the depth map and a pre-trained computational model.
In some embodiments, determining the edge pixels of each target image area in the multiple depth maps based on the semantic image includes:
determining the pixels of each target image area from the depth map based on the pixel correspondence between the semantic image and the depth map;
determining the edge pixels of the target image area according to whether the neighboring pixels around each such pixel are all pixels of the target image area.
In some embodiments, the target image area is an image area of a water area, and the target object is a water area.
Correspondingly, the embodiments of this specification further provide a computer storage medium storing a program which, when executed by a processor, implements the image processing method of any of the foregoing embodiments.
The embodiments of this specification may take the form of a computer program product implemented on one or more storage media containing program code (including, but not limited to, disk storage, CD-ROM, and optical storage). Computer-usable storage media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to: phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device.
As for the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the description of the method embodiments for relevant details. The device embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. Those of ordinary skill in the art can understand and implement them without creative effort.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between those entities or operations. The terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes that element.
The methods and devices provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention; the descriptions of the above embodiments are intended only to help in understanding the method of the present invention and its core idea. Meanwhile, those of ordinary skill in the art may, in accordance with the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (20)
- An image processing method, characterized in that the method comprises:
acquiring a plurality of depth maps, each depth map including one or more target image areas, the target image areas satisfying a preset type condition;
determining the edge pixels of each target image area in the plurality of depth maps;
projecting the edge pixels of each target image area into three-dimensional space to determine at least one three-dimensional point set, the three-dimensional points contained in each three-dimensional point set corresponding to the same target object in three-dimensional space; and
filling the depth values of the target image area corresponding to each three-dimensional point set according to the spatial coordinates of the three-dimensional points in that set.
- The image processing method according to claim 1, characterized in that projecting the edge pixels of each target image area into three-dimensional space to determine at least one three-dimensional point set comprises:
projecting the edge pixels of each target image area into three-dimensional space to obtain the three-dimensional points corresponding to those edge pixels;
determining, according to the spatial coordinates of the three-dimensional points corresponding to the edge pixels of each target image area, the three-dimensional points that correspond to the same target object in three-dimensional space; and
placing the three-dimensional points corresponding to the same target object in the same three-dimensional point set.
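The projection in this claim can be sketched with a standard pinhole-camera model: each edge pixel is back-projected along its camera ray by the measured depth, then transformed into the world frame. This is only one way to realize the step; the intrinsic matrix `K` and pose `(R, t)` below are illustrative placeholders, not values from the application.

```python
import numpy as np

def backproject_edge_pixels(edge_pixels, depth_map, K, R, t):
    """Project depth-map edge pixels (u, v) into world-space 3-D points.

    edge_pixels: iterable of (u, v) pixel coordinates (u = column, v = row)
    depth_map:   H x W array of depth values along the camera z-axis
    K:           3x3 camera intrinsic matrix
    R, t:        camera-to-world rotation and translation
    """
    K_inv = np.linalg.inv(K)
    points = []
    for u, v in edge_pixels:
        z = depth_map[v, u]
        # Pixel -> camera-frame point: unit ray scaled by the measured depth.
        p_cam = z * (K_inv @ np.array([u, v, 1.0]))
        # Camera frame -> world frame.
        points.append(R @ p_cam + t)
    return np.array(points)
```

With an identity pose, a pixel at the principal point and depth 2 maps to the point (0, 0, 2) on the optical axis, which is a quick sanity check for the convention used.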
- The image processing method according to claim 2, characterized in that determining, according to the spatial coordinates of the three-dimensional points corresponding to the edge pixels of each target image area, the three-dimensional points that correspond to the same target object in three-dimensional space comprises:
determining, according to the spatial coordinates of the three-dimensional points corresponding to the edge pixels of each target image area, the intersection relationships of the three-dimensional space regions to which the target image areas correspond in three-dimensional space; and
determining the three-dimensional points corresponding to the same target object in three-dimensional space based on the intersection relationships of the three-dimensional space regions.
- The image processing method according to claim 3, characterized in that determining the intersection relationships of the three-dimensional space regions corresponding to the target image areas according to the spatial coordinates of the three-dimensional points corresponding to the edge pixels of each target image area comprises:
projecting the three-dimensional points corresponding to the edge pixels of each target image area onto the same plane to determine the planar region corresponding to each target image area; and
determining the intersection relationships of the three-dimensional space regions corresponding to the target image areas according to the intersection relationships of the corresponding planar regions.
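One concrete realization of this claim is to project each region's edge points orthographically onto a common horizontal plane and test the resulting 2-D regions for overlap. The sketch below assumes the z = 0 plane and approximates each planar region by the axis-aligned bounding box of its points; both choices are simplifications of this example, not requirements of the claim.

```python
import numpy as np

def planar_regions_intersect(points_a, points_b):
    """Check whether two 3-D edge-point sets overlap after projection onto
    a common plane (here the z = 0 plane, as a simplifying assumption).

    Each region is approximated by the axis-aligned bounding box of its
    projected points; overlapping boxes are treated as intersecting.
    """
    a = np.asarray(points_a, dtype=float)[:, :2]  # drop z: orthographic projection
    b = np.asarray(points_b, dtype=float)[:, :2]
    a_min, a_max = a.min(axis=0), a.max(axis=0)
    b_min, b_max = b.min(axis=0), b.max(axis=0)
    # Boxes intersect iff their intervals overlap on both planar axes.
    return bool(np.all(a_min <= b_max) and np.all(b_min <= a_max))
```

A tighter test (e.g. convex-hull or polygon intersection) would follow the same pattern, only with a more precise planar-region representation.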
- The image processing method according to claim 3, characterized in that determining the three-dimensional points corresponding to the same target object in three-dimensional space based on the intersection relationships of the three-dimensional space regions comprises:
if the three-dimensional space regions corresponding to two target image areas intersect, determining that the three-dimensional points corresponding to the edge pixels of the two target image areas are three-dimensional points corresponding to the same target object in three-dimensional space; or
if the three-dimensional space regions corresponding to two target image areas each intersect the same three-dimensional space region, determining that the three-dimensional points corresponding to the edge pixels of the two target image areas are three-dimensional points corresponding to the same target object in three-dimensional space.
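Grouping regions that intersect directly, or through a shared third region, amounts to computing connected components over an intersection graph. A small union-find sketch of that grouping logic (the region identifiers and the `intersects` predicate are illustrative):

```python
def group_regions(region_ids, intersects):
    """Group region ids into sets belonging to the same target object.

    intersects(i, j) -> bool reports whether regions i and j intersect in
    3-D space.  Regions linked directly, or through a chain of shared
    intersections, end up in the same group (connected components).
    """
    parent = {r: r for r in region_ids}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    def union(a, b):
        parent[find(a)] = find(b)

    ids = list(region_ids)
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            if intersects(ids[i], ids[j]):
                union(ids[i], ids[j])

    groups = {}
    for r in ids:
        groups.setdefault(find(r), set()).add(r)
    return list(groups.values())
```

Note how a chain 0–1, 1–2 places regions 0 and 2 in the same group even though they never intersect directly, which mirrors the "intersect the same region" branch of the claim.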
- The image processing method according to any one of claims 1-5, characterized in that filling the depth values of the target image area corresponding to each three-dimensional point set according to the spatial coordinates of the three-dimensional points in each set comprises, for each three-dimensional point set:
determining a fitting plane according to the spatial coordinates of the three-dimensional points in the set; and
filling the depth values of the target image area corresponding to the set according to the fitting plane.
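The fitting plane of this claim can be obtained, for example, by a least-squares fit of z = a·x + b·y + c over the set's points; missing depths are then read off the plane. The least-squares formulation is one common choice of this sketch — the claim itself does not fix the fitting method.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane z = a*x + b*y + c through a set of 3-D points."""
    pts = np.asarray(points, dtype=float)
    # Design matrix [x, y, 1]; solve for (a, b, c) minimizing |A w - z|^2.
    A = np.c_[pts[:, 0], pts[:, 1], np.ones(len(pts))]
    coeffs, *_ = np.linalg.lstsq(A, pts[:, 2], rcond=None)
    return coeffs  # (a, b, c)

def plane_depth(coeffs, x, y):
    """Depth value implied by the fitted plane at planar position (x, y)."""
    a, b, c = coeffs
    return a * x + b * y + c
```

For a water surface (the example of claims 10 and 20) the fitted plane is close to horizontal, so the filled depths vary smoothly across the hole.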
- The image processing method according to any one of claims 1-6, characterized in that determining the edge pixels of each target image area in the plurality of depth maps comprises:
acquiring semantic images corresponding to the plurality of depth maps, wherein each semantic image is segmented into multiple image regions, each image region corresponding to one class of target object in three-dimensional space; and
determining the edge pixels of each target image area in the plurality of depth maps based on the semantic images.
- The image processing method according to claim 7, characterized in that the semantic image is obtained based on the RGB image corresponding to the depth map and a pre-trained computation model.
- The image processing method according to claim 7 or 8, characterized in that determining the edge pixels of each target image area in the plurality of depth maps based on the semantic image comprises:
determining the pixels of each target image area from the depth map based on the pixel correspondence between the semantic image and the depth map; and
determining the edge pixels of the target image area according to whether all of the neighboring pixels around a given pixel belong to the target image area.
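The neighbour test of this claim amounts to: a pixel of the target area is an edge pixel if at least one of its surrounding neighbours falls outside the area. A sketch over a boolean area mask, using 4-connectivity (the choice of 4- versus 8-neighbourhood is an assumption of this example):

```python
import numpy as np

def edge_pixels(mask):
    """Return (row, col) edge pixels of a boolean target-area mask.

    A mask pixel is an edge pixel when at least one of its 4-neighbours
    lies outside the target area (or outside the image bounds).
    """
    h, w = mask.shape
    edges = []
    for r in range(h):
        for c in range(w):
            if not mask[r, c]:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if not (0 <= nr < h and 0 <= nc < w) or not mask[nr, nc]:
                    edges.append((r, c))
                    break
    return edges
```

The mask itself would come from transferring the semantic image's target-class labels onto the depth map via the pixel correspondence described in the claim.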
- The image processing method according to any one of claims 1-9, characterized in that the target image area is an image area of a water region, and the target object is a water region.
- An image processing device, characterized in that the device comprises a processor, a memory, and a computer program stored on the memory, the processor executing the computer program to implement the following steps:
acquiring a plurality of depth maps, each depth map including one or more target image areas, the target image areas satisfying a preset type condition;
determining the edge pixels of each target image area in the plurality of depth maps;
projecting the edge pixels of each target image area into three-dimensional space to determine at least one three-dimensional point set, the three-dimensional points contained in each three-dimensional point set corresponding to the same target object in three-dimensional space; and
filling the depth values of the target image area corresponding to each three-dimensional point set according to the spatial coordinates of the three-dimensional points in that set.
- The image processing device according to claim 11, characterized in that, when projecting the edge pixels of each target image area into three-dimensional space to determine at least one three-dimensional point set, the processor is configured to:
project the edge pixels of each target image area into three-dimensional space to obtain the three-dimensional points corresponding to those edge pixels;
determine, according to the spatial coordinates of the three-dimensional points corresponding to the edge pixels of each target image area, the three-dimensional points that correspond to the same target object in three-dimensional space; and
place the three-dimensional points corresponding to the same target object in the same three-dimensional point set.
- The image processing device according to claim 12, characterized in that, when determining, according to the spatial coordinates of the three-dimensional points corresponding to the edge pixels of each target image area, the three-dimensional points that correspond to the same target object in three-dimensional space, the processor is configured to:
determine, according to the spatial coordinates of the three-dimensional points corresponding to the edge pixels of each target image area, the intersection relationships of the three-dimensional space regions to which the target image areas correspond in three-dimensional space; and
determine the three-dimensional points corresponding to the same target object in three-dimensional space based on the intersection relationships of the three-dimensional space regions.
- The image processing device according to claim 13, characterized in that, when determining the intersection relationships of the three-dimensional space regions corresponding to the target image areas according to the spatial coordinates of the three-dimensional points corresponding to the edge pixels of each target image area, the processor is configured to:
project the three-dimensional points corresponding to the edge pixels of each target image area onto the same plane to determine the planar region corresponding to each target image area; and
determine the intersection relationships of the three-dimensional space regions corresponding to the target image areas according to the intersection relationships of the corresponding planar regions.
- The image processing device according to claim 13, characterized in that, when determining the three-dimensional points corresponding to the same target object in three-dimensional space based on the intersection relationships of the three-dimensional space regions, the processor is configured to:
if the three-dimensional space regions corresponding to two target image areas intersect, determine that the three-dimensional points corresponding to the edge pixels of the two target image areas are three-dimensional points corresponding to the same target object in three-dimensional space; or
if the three-dimensional space regions corresponding to two target image areas each intersect the same three-dimensional space region, determine that the three-dimensional points corresponding to the edge pixels of the two target image areas are three-dimensional points corresponding to the same target object in three-dimensional space.
- The image processing device according to any one of claims 11-15, characterized in that, when filling the depth values of the target image area corresponding to each three-dimensional point set according to the spatial coordinates of the three-dimensional points in each set, the processor is configured to, for each three-dimensional point set:
determine a fitting plane according to the spatial coordinates of the three-dimensional points in the set; and
fill the depth values of the target image area corresponding to the set according to the fitting plane.
- The image processing device according to any one of claims 11-16, characterized in that, when determining the edge pixels of each target image area in the plurality of depth maps, the processor is configured to:
acquire semantic images corresponding to the plurality of depth maps, wherein each semantic image is segmented into multiple image regions, each image region corresponding to one class of target object in three-dimensional space; and
determine the edge pixels of each target image area in the plurality of depth maps based on the semantic images.
- The image processing device according to claim 17, characterized in that the semantic image is obtained based on the RGB image corresponding to the depth map and a pre-trained computation model.
- The image processing device according to claim 17 or 18, characterized in that, when determining the edge pixels of each target image area in the plurality of depth maps based on the semantic image, the processor is configured to:
determine the pixels of each target image area from the depth map based on the pixel correspondence between the semantic image and the depth map; and
determine the edge pixels of the target image area according to whether all of the neighboring pixels around a given pixel belong to the target image area.
- The image processing device according to any one of claims 11-19, characterized in that the target image area is an image area of a water region, and the target object is a water region.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2019/122090 WO2021102948A1 (en) | 2019-11-29 | 2019-11-29 | Image processing method and device |
CN201980049930.7A CN112513929A (en) | 2019-11-29 | 2019-11-29 | Image processing method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2019/122090 WO2021102948A1 (en) | 2019-11-29 | 2019-11-29 | Image processing method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021102948A1 true WO2021102948A1 (en) | 2021-06-03 |
Family
ID=74923731
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/122090 WO2021102948A1 (en) | 2019-11-29 | 2019-11-29 | Image processing method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112513929A (en) |
WO (1) | WO2021102948A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115439543B (en) * | 2022-09-02 | 2023-11-10 | 北京百度网讯科技有限公司 | Method for determining hole position and method for generating three-dimensional model in meta universe |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130106849A1 (en) * | 2011-11-01 | 2013-05-02 | Samsung Electronics Co., Ltd. | Image processing apparatus and method |
CN103379350A (en) * | 2012-04-28 | 2013-10-30 | 中国科学院深圳先进技术研究院 | Virtual viewpoint image post-processing method |
CN103905813A (en) * | 2014-04-15 | 2014-07-02 | 福州大学 | DIBR hole filling method based on background extraction and partition recovery |
CN103945206A (en) * | 2014-04-22 | 2014-07-23 | 冠捷显示科技(厦门)有限公司 | Three-dimensional picture synthesis system based on comparison between similar frames |
CN104159093A (en) * | 2014-08-29 | 2014-11-19 | 杭州道玄影视科技有限公司 | Time-domain-consistent cavity region repairing method for static scene video shot in motion |
CN104780355A (en) * | 2015-03-31 | 2015-07-15 | 浙江大学 | Depth-based cavity repairing method in viewpoint synthesis |
CN109064542A (en) * | 2018-06-06 | 2018-12-21 | 链家网(北京)科技有限公司 | Threedimensional model surface hole complementing method and device |
CN110223383A (en) * | 2019-06-17 | 2019-09-10 | 重庆大学 | A kind of plant three-dimensional reconstruction method and system based on depth map repairing |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120188234A1 (en) * | 2011-01-20 | 2012-07-26 | University Of Southern California | Image processing apparatus and method |
CN104751508B (en) * | 2015-03-14 | 2017-07-14 | 杭州道玄影视科技有限公司 | The full-automatic of new view is quickly generated and complementing method in the making of 3D three-dimensional films |
CN105374019B (en) * | 2015-09-30 | 2018-06-19 | 华为技术有限公司 | A kind of more depth map fusion methods and device |
CN106791773B (en) * | 2016-12-30 | 2018-06-01 | 浙江工业大学 | A kind of novel view synthesis method based on depth image |
CN107622244B (en) * | 2017-09-25 | 2020-08-28 | 华中科技大学 | Indoor scene fine analysis method based on depth map |
2019
- 2019-11-29: CN application CN201980049930.7A, published as CN112513929A (status: active, Pending)
- 2019-11-29: WO application PCT/CN2019/122090, published as WO2021102948A1 (status: active, Application Filing)
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113808251A (en) * | 2021-08-09 | 2021-12-17 | 杭州易现先进科技有限公司 | Dense reconstruction method, system, device and medium based on semantic segmentation |
CN113808251B (en) * | 2021-08-09 | 2024-04-12 | 杭州易现先进科技有限公司 | Dense reconstruction method, system, device and medium based on semantic segmentation |
CN113837943A (en) * | 2021-09-28 | 2021-12-24 | 广州极飞科技股份有限公司 | Image processing method, apparatus, electronic device and readable storage medium |
CN114373008A (en) * | 2022-01-11 | 2022-04-19 | 国网新疆电力有限公司电力科学研究院 | Method and device for measuring creepage distance of disc-shaped insulators |
Also Published As
Publication number | Publication date |
---|---|
CN112513929A (en) | 2021-03-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021102948A1 (en) | Image processing method and device | |
CN106033621B (en) | A kind of method and device of three-dimensional modeling | |
CN104820991B (en) | A kind of multiple soft-constraint solid matching method based on cost matrix | |
CN112489099B (en) | Point cloud registration method and device, storage medium and electronic equipment | |
US20150138193A1 (en) | Method and device for panorama-based inter-viewpoint walkthrough, and machine readable medium | |
CN103761397A (en) | Three-dimensional model slice for surface exposure additive forming and projection plane generating method | |
CN111369435B (en) | Color image depth up-sampling method and system based on self-adaptive stable model | |
CN110728707A (en) | Multi-view depth prediction method based on asymmetric depth convolution neural network | |
CN104778869A (en) | Immediately updated three-dimensional visualized teaching system and establishing method thereof | |
CN103051915A (en) | Manufacture method and manufacture device for interactive three-dimensional video key frame | |
CN107992588B (en) | Terrain display system based on elevation tile data | |
CN116778288A (en) | A multi-modal fusion target detection system and method | |
WO2022126921A1 (en) | Panoramic picture detection method and device, terminal, and storage medium | |
CN114387386A (en) | Rapid modeling method and system based on three-dimensional lattice rendering | |
CN114565722A (en) | Three-dimensional model monomer realization method | |
CN117237546A (en) | Three-dimensional profile reconstruction method and system for material-adding component based on light field imaging | |
CN111273877A (en) | Linkage display platform and linkage method for live-action three-dimensional data and two-dimensional grid picture | |
EP3906530B1 (en) | Method for 3d reconstruction of an object | |
CN106327576A (en) | Urban scene reconstruction method and system | |
CN103955886A (en) | 2D-3D image conversion method based on graph theory and vanishing point detection | |
CN117456076B (en) | Material map generation method and related equipment | |
CN118015197A (en) | A method, device and electronic device for real-scene three-dimensional logic monomerization | |
CN116152458B (en) | Three-dimensional simulation building generation method based on images | |
CN111738061A (en) | Binocular Vision Stereo Matching Method and Storage Medium Based on Region Feature Extraction | |
CN115496908B (en) | A method and system for automatically stratifying high-rise building oblique photography models |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19954572; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 19954572; Country of ref document: EP; Kind code of ref document: A1 |