Disclosure of Invention
The present application provides an image processing method and device that take into account the influence of ambient illumination on a target object, which is beneficial to improving the accuracy of three-dimensional reconstruction results.
In a first aspect, there is provided an image processing method including: acquiring a plurality of original images of a target object, wherein the plurality of original images are obtained by shooting the target object from different viewing angles; obtaining a plurality of sets of matching feature points based on the plurality of original images, and determining three-dimensional coordinates of the plurality of sets of matching feature points and a camera relative pose corresponding to each original image based on pixel coordinates of the plurality of sets of matching feature points in the corresponding original images; determining depth information of the corresponding matching feature points on each original image based on the three-dimensional coordinates of the plurality of sets of matching feature points and the camera relative pose corresponding to each original image; determining an ambient illumination map corresponding to each original image based on the depth information of the corresponding matching feature points on each original image and the plurality of original images; determining three-dimensional coordinates of a virtual light source based on the ambient illumination map corresponding to each original image and the camera relative pose corresponding to each original image; performing illumination compensation on each original image based on the three-dimensional coordinates of the virtual light source and the ambient illumination map corresponding to each original image to obtain a plurality of illumination-compensated images; and determining a three-dimensional model of the target object based on the plurality of illumination-compensated images and the three-dimensional coordinates of the plurality of sets of matching feature points.
According to the image processing method, the influence of ambient illumination on the target object is taken into account: an ambient illumination map is introduced during three-dimensional reconstruction, the ambient illumination map is used to determine the three-dimensional coordinates of the virtual light source, and illumination compensation is performed on the original images based on the three-dimensional coordinates of the virtual light source and the ambient illumination maps to obtain a plurality of illumination-compensated images. Because the illumination-compensated images reflect the influence of ambient illumination on the target object, performing three-dimensional reconstruction of the target object with the plurality of illumination-compensated images makes the color of the reconstructed three-dimensional model closer to the real color of the target object under the actual ambient illumination, improving the accuracy of the three-dimensional reconstruction result and thereby improving user experience.
It will be appreciated that the depth information is the perpendicular distance of each set of matching feature points to the imaging plane of the corresponding image acquisition device (e.g., a camera).
It should also be understood that an ambient illumination map is an image representing the luminance information of each pixel point in the original image under the influence of an ambient light source. Each original image in the plurality of original images has a corresponding ambient illumination map; that is, this step obtains a plurality of ambient illumination maps.
It should also be appreciated that the number of virtual light sources may be one or more.
With reference to the first aspect, in certain implementations of the first aspect, the method further includes: determining updated virtual light source three-dimensional coordinates and updated three-dimensional coordinates of the plurality of sets of matching feature points based on the camera relative pose corresponding to each original image, the three-dimensional coordinates of the virtual light source, and the three-dimensional coordinates of the plurality of sets of matching feature points; performing illumination compensation on each original image based on the three-dimensional coordinates of the virtual light source and the ambient illumination map corresponding to each original image includes: performing illumination compensation on each original image based on the updated virtual light source three-dimensional coordinates and the ambient illumination map corresponding to each original image; and determining a three-dimensional model of the target object based on the plurality of illumination-compensated images and the three-dimensional coordinates of the plurality of sets of matching feature points includes: determining a three-dimensional model of the target object based on the plurality of illumination-compensated images and the updated three-dimensional coordinates of the plurality of sets of matching feature points.
According to the image processing method, illumination compensation is carried out on each original image by determining updated three-dimensional coordinates of the updated virtual light source and updated three-dimensional coordinates of the plurality of groups of matching feature points, and a three-dimensional model of a target object is determined based on the image subjected to the illumination compensation and the updated three-dimensional coordinates of the plurality of groups of matching feature points.
With reference to the first aspect, in some implementations of the first aspect, performing illumination compensation on each original image to obtain a plurality of illumination-compensated images includes: determining updated pixel coordinates of the virtual light source in each original image based on the updated virtual light source three-dimensional coordinates; shifting the pixel points in the ambient illumination map corresponding to each original image based on the difference between the updated pixel coordinates of the virtual light source in each original image and the corresponding original pixel coordinates of the virtual light source in each original image, to obtain an ambient illumination compensation map corresponding to each original image; and performing illumination compensation on each original image based on the ambient illumination compensation map corresponding to each original image, to obtain a plurality of illumination-compensated images.
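Illustratively, the pixel shifting described above may be sketched as follows. This is a minimal Python sketch and not part of the claimed implementation; the function name, the (column, row) coordinate convention, and the wrap-around border handling are all assumptions made for illustration.

```python
import numpy as np

def shift_ambient_map(ambient_map, updated_px, original_px):
    """Shift every pixel of an ambient illumination map by the offset
    between the updated and the original virtual-light-source pixel
    coordinates. Coordinates are (col, row). Border handling is an
    assumption: pixels shifted out of the frame wrap around here."""
    du = updated_px[0] - original_px[0]   # column offset
    dv = updated_px[1] - original_px[1]   # row offset
    # np.roll moves rows by dv (axis 0) and columns by du (axis 1)
    return np.roll(ambient_map, shift=(dv, du), axis=(0, 1))

# Toy 3x3 map with a single bright pixel at (row=1, col=1)
m = np.zeros((3, 3))
m[1, 1] = 1.0
shifted = shift_ambient_map(m, updated_px=(2, 2), original_px=(1, 1))
```

In this toy example the bright pixel moves from (1, 1) to (2, 2), mirroring how the ambient illumination compensation map relocates the light contribution toward the updated virtual-light-source position.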
For example, the image processing apparatus may map the ambient illumination compensation map corresponding to each original image to the corresponding original image, and perform illumination compensation on each original image through a back projection operation, so as to obtain a plurality of illumination compensation images.
With reference to the first aspect, in certain implementations of the first aspect, determining a three-dimensional model of the target object based on the plurality of illumination-compensated images and the updated three-dimensional coordinates of the plurality of sets of matching feature points includes: determining the updated three-dimensional coordinates of the plurality of sets of matching feature points as a colorless sparse three-dimensional point cloud; acquiring pixel coordinates of the plurality of sets of matching feature points in the corresponding plurality of illumination-compensated images; coloring the colorless sparse three-dimensional point cloud using the pixel coordinates of the plurality of sets of matching feature points in the corresponding plurality of illumination-compensated images, to obtain a colored sparse three-dimensional point cloud; and determining a three-dimensional model of the target object based on the colored sparse three-dimensional point cloud and the plurality of illumination-compensated images.
For example, the image processing device may color the colorless sparse three-dimensional point cloud using the average of the pixel values sampled at the pixel coordinates of the plurality of sets of matching feature points in the corresponding plurality of illumination-compensated images, to obtain a colored sparse three-dimensional point cloud.
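Illustratively, this averaging-based coloring may be sketched as follows in Python. The representation of a set of matching feature points as a dictionary from image index to (row, col) pixel coordinate is an assumption for illustration only.

```python
import numpy as np

def color_sparse_cloud(tracks, images):
    """Color each 3-D point with the mean RGB value sampled at its
    matching feature points' pixel coordinates across the
    illumination-compensated images. `tracks` is a list where each
    entry maps image index -> (row, col) of the same physical point."""
    colors = []
    for observations in tracks:
        samples = [images[i][r, c] for i, (r, c) in observations.items()]
        colors.append(np.mean(samples, axis=0))
    return np.asarray(colors)

# Toy example: one 3-D point observed in two 2x2 RGB images
img0 = np.zeros((2, 2, 3)); img0[0, 0] = [1.0, 0.0, 0.0]
img1 = np.zeros((2, 2, 3)); img1[1, 1] = [0.0, 1.0, 0.0]
colors = color_sparse_cloud([{0: (0, 0), 1: (1, 1)}], [img0, img1])
```

Each row of `colors` then serves as the color assigned to the corresponding point of the sparse point cloud.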
With reference to the first aspect, in some implementations of the first aspect, obtaining multiple sets of matching feature points based on multiple original images includes: extracting characteristic points of each original image in the plurality of original images; and carrying out feature point matching on the plurality of original images to obtain a plurality of groups of matching feature points.
It should be understood that the image processing apparatus may extract feature points of each of the plurality of original images by a feature extraction algorithm. Illustratively, the feature extraction algorithm may be the scale-invariant feature transform (SIFT) algorithm, the speeded-up robust features (SURF) algorithm, the features from accelerated segment test (FAST) corner detection algorithm, or the oriented FAST and rotated BRIEF (ORB) feature point extraction and description algorithm, which is not limited by the embodiments of the present application.
It should also be appreciated that the image processing apparatus may perform feature point matching on the feature points extracted from the plurality of original images using a feature matching strategy, to obtain the plurality of sets of matching feature points. Illustratively, the feature matching strategy may be a brute-force matching strategy, a K-nearest neighbor (KNN) matching strategy, or the like, which is not limited by the embodiments of the present application.
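Illustratively, a brute-force K-nearest-neighbor matching strategy with a ratio test may be sketched as follows in pure Python/numpy. The descriptors, the ratio threshold, and the function name are illustrative assumptions; a real pipeline would typically use SIFT or ORB descriptors produced by an image library.

```python
import numpy as np

def knn_match(desc_a, desc_b, ratio=0.8):
    """Brute-force K-nearest-neighbour (K=2) descriptor matching with a
    ratio test. Returns (index_in_a, index_in_b) pairs for descriptors
    whose best match is clearly better than the second best."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)  # distance to every b
        nn = np.argsort(dists)[:2]                  # two nearest neighbours
        # keep the pair only if the best neighbour is unambiguous
        if len(nn) < 2 or dists[nn[0]] < ratio * dists[nn[1]]:
            matches.append((i, int(nn[0])))
    return matches

# Toy 2-D descriptors: two in image A, three in image B
matches = knn_match(np.array([[0.0, 0.0], [5.0, 5.0]]),
                    np.array([[0.1, 0.0], [5.0, 5.1], [10.0, 10.0]]))
```

Each returned pair corresponds to one set of matching feature points across the two images.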
With a suitable feature extraction algorithm and feature matching strategy, the image processing device can extract feature points more accurately and obtain more accurate feature matching results, namely the plurality of sets of matching feature points, thereby improving the accuracy of the subsequent three-dimensional reconstruction result.
With reference to the first aspect, in some implementations of the first aspect, determining three-dimensional coordinates of the plurality of sets of matching feature points and a camera relative pose corresponding to each original image includes: determining the camera relative pose corresponding to each original image by triangulation based on the pixel coordinates of the plurality of sets of matching feature points in the corresponding original images and the camera intrinsic parameters corresponding to the plurality of original images; and determining the three-dimensional coordinates of the plurality of sets of matching feature points by triangulation based on the pixel coordinates of the plurality of sets of matching feature points in the corresponding original images and the camera relative pose corresponding to each original image.
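Illustratively, linear (DLT) triangulation of one matching feature point from two views may be sketched as follows. This is a textbook-style sketch under assumed normalized pixel coordinates and 3x4 projection matrices, not the claimed implementation.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one matching feature point from two
    3x4 camera projection matrices P1, P2 and its normalized pixel
    coordinates x1=(u1, v1), x2=(u2, v2) in the two original images."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)   # null vector of A is the homogeneous point
    X = vt[-1]
    return X[:3] / X[3]           # homogeneous -> Euclidean coordinates

# Toy example: camera 2 translated by 1 unit along x; point at depth 5
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X = triangulate(P1, P2, (0.0, 0.0), (-0.2, 0.0))
```

In this toy setup the recovered point is approximately (0, 0, 5), i.e. the depth implied by the disparity between the two projections.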
With reference to the first aspect, in certain implementation manners of the first aspect, determining depth information of corresponding matching feature points on each original image includes: and inputting the three-dimensional coordinates of the multiple groups of matching feature points and the relative pose of the camera corresponding to each original image into a depth estimation network model to obtain the depth information of the corresponding matching feature points on each original image.
The depth estimation network model may be, for example, a convolutional neural network (CNN).
With reference to the first aspect, in some implementations of the first aspect, determining an ambient light map corresponding to each original image includes: and inputting the depth information of the corresponding matching characteristic points on each original image and the plurality of original images into an illumination estimation network model to obtain an environment illumination map corresponding to each original image.
Illustratively, the illumination estimation network model may be Gardner's illumination estimation network model.
With reference to the first aspect, in certain implementations of the first aspect, determining the virtual light source three-dimensional coordinates includes: determining the pixel coordinate with the minimum pixel amplitude in the ambient light map corresponding to each original image as the pixel coordinate corresponding to the virtual light source in each original image; and determining the three-dimensional coordinates of the virtual light source based on the pixel coordinates corresponding to each original image and the relative pose of the camera corresponding to each original image.
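Illustratively, locating the virtual-light-source pixel coordinate in one ambient illumination map may be sketched as follows. The minimum-magnitude criterion follows the claim text as written; the function name and (row, col) convention are illustrative assumptions.

```python
import numpy as np

def light_source_pixel(ambient_map):
    """Return the (row, col) pixel coordinate associated with the
    virtual light source in an ambient illumination map, using the
    extreme-pixel-magnitude criterion described above (the claimed
    criterion is the minimum pixel amplitude)."""
    flat = np.argmin(np.abs(ambient_map))
    return np.unravel_index(flat, ambient_map.shape)

# Toy 2x2 ambient illumination map
m = np.array([[3.0, 2.0],
              [1.0, 0.5]])
loc = light_source_pixel(m)
```

Once this pixel coordinate is found in each ambient illumination map, the three-dimensional coordinates of the virtual light source can be recovered from those per-image coordinates together with the camera relative poses, analogously to triangulating a matching feature point.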
In a second aspect, there is provided an image processing apparatus including: an acquisition module configured to acquire a plurality of original images of a target object, wherein the plurality of original images are obtained by shooting the target object from different viewing angles; and a processing module configured to obtain a plurality of sets of matching feature points based on the plurality of original images, and determine three-dimensional coordinates of the plurality of sets of matching feature points and a camera relative pose corresponding to each original image based on pixel coordinates of the plurality of sets of matching feature points in the corresponding original images; determine depth information of the corresponding matching feature points on each original image based on the three-dimensional coordinates of the plurality of sets of matching feature points and the camera relative pose corresponding to each original image; determine an ambient illumination map corresponding to each original image based on the depth information of the corresponding matching feature points on each original image and the plurality of original images; determine three-dimensional coordinates of a virtual light source based on the ambient illumination map corresponding to each original image and the camera relative pose corresponding to each original image; perform illumination compensation on each original image based on the three-dimensional coordinates of the virtual light source and the ambient illumination map corresponding to each original image to obtain a plurality of illumination-compensated images; and determine a three-dimensional model of the target object based on the plurality of illumination-compensated images and the three-dimensional coordinates of the plurality of sets of matching feature points.
With reference to the second aspect, in certain implementations of the second aspect, the processing module is further configured to: determine updated virtual light source three-dimensional coordinates and updated three-dimensional coordinates of the plurality of sets of matching feature points based on the camera relative pose corresponding to each original image, the three-dimensional coordinates of the virtual light source, and the three-dimensional coordinates of the plurality of sets of matching feature points; perform illumination compensation on each original image based on the updated virtual light source three-dimensional coordinates and the ambient illumination map corresponding to each original image; and determine a three-dimensional model of the target object based on the plurality of illumination-compensated images and the updated three-dimensional coordinates of the plurality of sets of matching feature points.
With reference to the second aspect, in certain implementations of the second aspect, the processing module is further configured to: determine updated pixel coordinates of the virtual light source in each original image based on the updated virtual light source three-dimensional coordinates; shift the pixel points in the ambient illumination map corresponding to each original image based on the difference between the updated pixel coordinates of the virtual light source in each original image and the corresponding original pixel coordinates of the virtual light source in each original image, to obtain an ambient illumination compensation map corresponding to each original image; and perform illumination compensation on each original image based on the ambient illumination compensation map corresponding to each original image, to obtain a plurality of illumination-compensated images.
With reference to the second aspect, in certain implementations of the second aspect, the processing module is further configured to: determining updated three-dimensional coordinates of a plurality of groups of matching feature points as colorless sparse three-dimensional point clouds; the acquisition module is also used for: acquiring pixel coordinates of a plurality of groups of matching feature points in a plurality of corresponding illumination compensation images; the processing module is also used for: coloring the colorless sparse three-dimensional point cloud by utilizing pixel coordinates of a plurality of groups of matching characteristic points in a plurality of corresponding illumination compensation images to obtain colored sparse three-dimensional point cloud; and determining a three-dimensional model of the target object based on the colored sparse three-dimensional point cloud and the plurality of illumination compensation images.
With reference to the second aspect, in certain implementations of the second aspect, the processing module is further configured to: extracting characteristic points of each original image in the plurality of original images; and performing feature point matching on the plurality of original images to obtain a plurality of groups of matching feature points.
With reference to the second aspect, in certain implementations of the second aspect, the processing module is further configured to: determine the camera relative pose corresponding to each original image by triangulation based on the pixel coordinates of the plurality of sets of matching feature points in the corresponding original images and the camera intrinsic parameters corresponding to the plurality of original images; and determine the three-dimensional coordinates of the plurality of sets of matching feature points by triangulation based on the pixel coordinates of the plurality of sets of matching feature points in the corresponding original images and the camera relative pose corresponding to each original image.
With reference to the second aspect, in certain implementations of the second aspect, the processing module is further configured to: and inputting the three-dimensional coordinates of the multiple groups of matching feature points and the relative pose of the camera corresponding to each original image into a depth estimation network model to obtain the depth information of the corresponding matching feature points on each original image.
With reference to the second aspect, in certain implementations of the second aspect, the processing module is further configured to: and inputting the depth information of the corresponding matching characteristic points on each original image and the plurality of original images into an illumination estimation network model to obtain an environment illumination map corresponding to each original image.
With reference to the second aspect, in certain implementations of the second aspect, the processing module is further configured to: determine the pixel coordinate with the minimum pixel amplitude in the ambient illumination map corresponding to each original image as the pixel coordinate of the virtual light source in each original image; and determine the three-dimensional coordinates of the virtual light source based on the pixel coordinates of the virtual light source corresponding to each original image and the camera relative pose corresponding to each original image.
In a third aspect, another image processing apparatus is provided that includes a processor and a memory. The processor is configured to read instructions stored in the memory and to receive signals via the receiver and to transmit signals via the transmitter to perform the method of any one of the possible implementations of the first aspect.
Optionally, there may be one or more processors, and there may be one or more memories.
Optionally, the memory may be integrated with the processor, or may be separate from the processor.
In a specific implementation process, the memory may be a non-transient (non-transitory) memory, for example, a Read Only Memory (ROM), which may be integrated on the same chip as the processor, or may be separately disposed on different chips.
The image processing apparatus in the above third aspect may be a chip. The processor may be implemented in hardware or in software: when implemented in hardware, the processor may be a logic circuit, an integrated circuit, or the like; when implemented in software, the processor may be a general-purpose processor implemented by reading software code stored in a memory, which may be integrated in the processor or may reside outside the processor as a separate component.
In a fourth aspect, a computer readable storage medium is provided, which stores a computer program (which may also be referred to as code, or instructions) which, when run on a computer, causes the computer to perform the method of any one of the possible implementations of the first aspect.
In a fifth aspect, there is provided a computer program product comprising: a computer program (which may also be referred to as code, or instructions) which, when executed, causes a computer to perform the method of any one of the possible implementations of the first aspect described above.
Detailed Description
The technical scheme of the application will be described below with reference to the accompanying drawings.
In order to clearly describe the technical solutions of the embodiments of the present application, the words "first", "second", etc. are used in the embodiments of the present application to distinguish between identical or similar items having substantially the same function and effect. It will be appreciated by those skilled in the art that the words "first", "second", and the like do not limit quantity or order of execution, and do not indicate that the items referred to are necessarily different.
In the present application, the words "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
Furthermore, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate: A alone, both A and B, or B alone, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of" the following items means any combination of these items, including any combination of single items or plural items. For example, at least one of a, b, and c may represent: a, b, c, a and b, a and c, b and c, or a, b, and c, where a, b, and c may each be singular or plural.
Three-dimensional reconstruction techniques refer to the use of computer technology to reconstruct real-world scenes or objects into data models that can be expressed and processed by a computer. Image-based three-dimensional reconstruction is increasingly widely used due to its low requirements on acquisition equipment and the low cost of the reconstruction process.
In order to facilitate understanding of the present application, an application scenario according to an embodiment of the present application is described below with reference to fig. 1.
Fig. 1 shows a schematic diagram of an application scenario 100 according to an embodiment of the present application. The application scenario includes an image acquisition device 101 and an image processing device 102. The image acquisition device 101 may capture images, from different perspectives, of the target object to be three-dimensionally reconstructed or of the scene in which the target object is located, and send the images at the different perspectives to the image processing device 102. Correspondingly, after receiving the images at the different perspectives, the image processing device 102 may three-dimensionally reconstruct the target object or the scene in which the target object is located using these images.
Alternatively, the image capturing device 101 may photograph the target object with a camera and directly capture a plurality of images at different viewing angles; it may record a video containing pictures of the target object at different viewing angles and then extract images of the target object at different viewing angles from the video; or it may obtain network pictures of the same target object at different viewing angles from the Internet, which is not limited by the embodiment of the present application.
It will be appreciated that the image capturing device 101 may be embodied as a device comprising a camera and the image processing device 102 may be embodied as a device having data processing capabilities. It should also be understood that in the above application scenario, the number of image capturing devices may be one or more, which is not limited by the embodiment of the present application.
The foregoing fig. 1 shows only one possible scenario. In other possible scenarios, the image acquisition device and the image processing device may be integrated into a single physical device, which is referred to as an image processing device for convenience of description in the embodiments of the present application. In other words, the image processing device according to the embodiments of the present application may itself have an image acquisition function and capture images of the target object from different perspectives, or may receive images captured by other devices.
Currently, the process of three-dimensional reconstruction by an image processing apparatus based on images at different viewing angles may include: extracting feature points in the multi-view images, completing feature point matching, performing sparse point cloud reconstruction based on the structure-from-motion technique, and performing dense reconstruction based on the multi-view stereo technique. After the sparse point cloud is reconstructed, the average of the colors of the same feature point in the images at different viewing angles is taken as the color of the corresponding three-dimensional point, and the sparse point cloud is colored accordingly. Dense reconstruction is then performed on the colored sparse point cloud by the multi-view stereo technique, so as to improve the accuracy of the three-dimensional reconstruction result.
However, due to the surface material of the target object and the position or color of the ambient light when the images of the target object are acquired, the images at different viewing angles may exhibit unclear texture, unrealistic color, or texture that is inconsistent across viewing angles, and the average of the colors of the matching feature points in the images at different viewing angles then does not represent the true color of the target object, so the accuracy of the three-dimensional reconstruction result of this method is poor.
In view of this, the embodiments of the present application provide an image processing method and apparatus. By taking into account the influence of ambient illumination on the target object, an ambient illumination map is introduced during three-dimensional reconstruction, the ambient illumination map is used to determine the three-dimensional coordinates of a virtual light source, and illumination compensation is performed on the original images based on the three-dimensional coordinates of the virtual light source and the ambient illumination maps to obtain a plurality of illumination-compensated images, which are then used to reconstruct the target object in three dimensions, so that the color of the reconstructed model is closer to the real color of the target object under the actual ambient illumination.
An image processing method provided by an embodiment of the present application will be described below with reference to fig. 2 to 3.
Fig. 2 is a schematic flowchart of an image processing method 200 according to an embodiment of the present application. The method 200 may be performed by the image processing device 102 shown in fig. 1, or by other similar devices, which is not limited by the embodiment of the present application. For convenience of description, the executing entity is collectively referred to as an image processing apparatus in the embodiments of the present application. The method 200 includes the following steps:
S201, acquiring a plurality of original images of a target object, wherein the plurality of original images are obtained by shooting the target object under different visual angles.
It should be understood that the above-described plurality of original images may be captured by the image processing apparatus itself (in the case where the image processing apparatus has an image capturing function), or may be captured by the image capturing apparatus. In addition, the plurality of original images may be panoramic images acquired by a panoramic camera, or wide-angle images, or images captured by a common camera, which is not limited in the present application.
S202, obtaining a plurality of groups of matching feature points based on a plurality of original images, and determining three-dimensional coordinates of the plurality of groups of matching feature points and relative pose of cameras corresponding to each original image based on pixel coordinates of the plurality of groups of matching feature points in the corresponding original images.
It should be understood that the image processing apparatus may extract a plurality of feature points on a plurality of original images respectively through a feature extraction algorithm, and match the plurality of feature points of each original image on the plurality of original images through a feature matching policy, so as to obtain a plurality of sets of matching feature points.
The matching feature points may be matching feature points in two images, for example, the first original image has feature points 1-1 and 1-2, the second original image has feature points 2-1 and 2-2, and feature points 1-1 and 2-1 are imaging points of the same physical space point in the two original images, so that feature points 1-1 and 2-1 are a set of matching feature points.
The matching feature points may be matching feature points in two or more images, for example, the first original image has feature points 1-1 and 1-2, the second original image has feature points 2-1 and 2-2, the third original image has feature points 3-1 and 3-2, and the feature points 1-1, 2-1 and 3-1 are imaging points of the same physical space point in the three original images, so that the feature points 1-1, 2-1 and 3-1 are a set of matching feature points.
In the above example, the image processing apparatus may determine the pixel coordinates of the feature point 1-1 in the first original image, the pixel coordinates of the feature point 2-1 in the second original image, and the pixel coordinates of the feature point 3-1 in the third original image. Since the feature point 1-1, the feature point 2-1, and the feature point 3-1 are imaging points of the same physical space point in the three original images, the three-dimensional coordinates of the set of matching feature points are the three-dimensional coordinates of the same physical space point corresponding to the set of matching feature points.
The above examples merely illustrate an example of a set of matching feature points, and in actual operation, a plurality of original images may have one set of matching feature points or a plurality of sets of matching feature points, which is not limited by the embodiment of the present application.
And S203, determining depth information of the corresponding matching feature points on each original image based on the three-dimensional coordinates of the plurality of groups of matching feature points and the relative pose of the camera corresponding to each original image.
It will be appreciated that the depth information is the perpendicular distance from each set of matching feature points to the imaging plane of the corresponding image acquisition device (e.g., a camera). Illustratively, the first original image has a feature point 1-1, the second original image has a feature point 2-1, and the feature points 1-1 and 2-1 are a set of matching feature points; the perpendicular distance from the feature point 1-1 to the camera imaging plane of the first original image is the depth information of that feature point.
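The perpendicular-distance notion above can be sketched as follows; `point_depth` is a hypothetical helper name, and numpy is assumed:

```python
import numpy as np

def point_depth(X_world, R, t):
    """Transform a world point into the camera frame given the camera
    relative pose (R, t) and return its z-coordinate, i.e. the
    perpendicular distance to the camera imaging plane."""
    X_cam = R @ np.asarray(X_world, dtype=float) + np.asarray(t, dtype=float)
    return X_cam[2]

# Camera at the origin looking down +z (identity pose): depth is just z.
R = np.eye(3)
t = np.zeros(3)
X = np.array([0.5, -0.2, 4.0])
depth = point_depth(X, R, t)
```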
S204, determining an environment illumination map corresponding to each original image based on the depth information of the corresponding matching feature points on each original image and the plurality of original images.
It should be understood that the ambient illumination map is an image representing the brightness information of each pixel point in the original image under the influence of the ambient light source. Each of the plurality of original images has a corresponding ambient illumination map, that is, this step yields a plurality of ambient illumination maps.
S205, determining three-dimensional coordinates of the virtual light source based on the environment illumination map corresponding to each original image and the camera relative pose corresponding to each original image.
It should be understood that the number of the virtual light sources may be one or more, and correspondingly, the number of the three-dimensional coordinates of the virtual light sources may be one or more, which is not limited in the embodiment of the present application.
Optionally, if the plurality of original images are collected in an outdoor natural light scene, the number of virtual light sources may be one; if the plurality of original images are collected indoors or under poor outdoor natural light, the number of virtual light sources may be one or more.
S206, performing illumination compensation on each original image based on the three-dimensional coordinates of the virtual light source and the environment illumination map corresponding to each original image to obtain a plurality of illumination compensation images.
S207, determining a three-dimensional model of the target object based on the plurality of illumination-compensated images and the three-dimensional coordinates of the plurality of groups of matching feature points.
According to the image processing method, the influence of ambient illumination on the target object is taken into account: an ambient illumination map is introduced during three-dimensional reconstruction and is used to determine the three-dimensional coordinates of the virtual light source, and illumination compensation is performed on the original images based on those coordinates and the ambient illumination maps to obtain a plurality of illumination-compensated images. Because the illumination-compensated images contain the influence of the ambient illumination on the target object, performing three-dimensional reconstruction with them makes the color of the resulting three-dimensional model closer to the real color of the target object under the actual ambient illumination, which improves the accuracy of the three-dimensional reconstruction result and thereby the user experience.
As an alternative embodiment, the method further comprises:
Determining updated virtual light source three-dimensional coordinates and updated three-dimensional coordinates of the plurality of groups of matching feature points based on the camera relative pose corresponding to each original image, the three-dimensional coordinates of the virtual light source, and the three-dimensional coordinates of the plurality of groups of matching feature points;
S206, performing illumination compensation on each original image based on the three-dimensional coordinates of the virtual light source and the ambient illumination map corresponding to each original image, includes:
Performing illumination compensation on each original image based on the updated virtual light source three-dimensional coordinates and the environment illumination map corresponding to each original image;
S207, determining a three-dimensional model of the target object based on the plurality of illumination compensation images and the three-dimensional coordinates of the plurality of sets of matching feature points, comprising:
A three-dimensional model of the target object is determined based on the plurality of illumination-compensated images and the updated three-dimensional coordinates of the plurality of sets of matching feature points.
According to the image processing method, the updated virtual light source three-dimensional coordinates and the updated three-dimensional coordinates of the plurality of groups of matching feature points are determined, illumination compensation is performed on each original image accordingly, and a three-dimensional model of the target object is determined based on the illumination-compensated images and the updated three-dimensional coordinates of the plurality of groups of matching feature points, which is beneficial to improving the accuracy of the three-dimensional model.
It should be appreciated that the updated virtual light source three-dimensional coordinates and the updated three-dimensional coordinates of the plurality of sets of matching feature points may also be referred to as the accurate virtual light source three-dimensional coordinates and the accurate three-dimensional coordinates of the plurality of sets of matching feature points.
It should be understood that the image processing apparatus introduces errors when calculating the camera relative pose corresponding to each original image, the three-dimensional coordinates of the virtual light source, and the three-dimensional coordinates of the plurality of sets of matching feature points, which affects the accuracy of the three-dimensional model of the target object. Because these quantities influence one another (updating the camera relative pose corresponding to each original image affects both the three-dimensional coordinates of the virtual light source and the three-dimensional coordinates of the plurality of sets of matching feature points), the image processing device can jointly adjust all three to obtain the accurate camera relative pose corresponding to each original image, the accurate three-dimensional coordinates of the virtual light source, and the accurate three-dimensional coordinates of the plurality of sets of matching feature points, finally improving the accuracy of the three-dimensional model of the target object.
As an optional embodiment, S206, performing illumination compensation on each original image to obtain a plurality of illumination-compensated images, includes: determining updated pixel coordinates of the virtual light source in each original image based on the updated virtual light source three-dimensional coordinates; shifting the pixel points in the ambient illumination map corresponding to each original image based on the difference between the updated pixel coordinates of the virtual light source in that image and the original pixel coordinates of the virtual light source in that image, to obtain an ambient illumination compensation map corresponding to each original image; and performing illumination compensation on each original image based on its corresponding ambient illumination compensation map to obtain a plurality of illumination-compensated images.
For example, the image processing apparatus may map the ambient illumination compensation map corresponding to each original image to the corresponding original image, and perform illumination compensation on each original image through a back projection operation, so as to obtain a plurality of illumination compensation images.
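The pixel-shift step described above might look roughly like this sketch; `shift_ambient_map` is a hypothetical helper name, numpy is assumed, and a wrap-around roll stands in for whatever border handling an implementation would use:

```python
import numpy as np

def shift_ambient_map(amb_map, old_px, new_px):
    """Translate the ambient illumination map by the offset
    (d_row, d_col) between the updated and the original pixel
    coordinates of the virtual light source."""
    d_row = new_px[0] - old_px[0]
    d_col = new_px[1] - old_px[1]
    return np.roll(amb_map, shift=(d_row, d_col), axis=(0, 1))

amb = np.zeros((4, 4))
amb[1, 1] = 255.0                         # bright spot at (1, 1)
shifted = shift_ambient_map(amb, (1, 1), (2, 3))  # light source moved to (2, 3)
```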
As an optional embodiment, S207, determining the three-dimensional model of the target object based on the plurality of illumination-compensated images and the updated three-dimensional coordinates of the plurality of sets of matching feature points, includes: determining the updated three-dimensional coordinates of the plurality of groups of matching feature points as a colorless sparse three-dimensional point cloud; acquiring pixel coordinates of the plurality of groups of matching feature points in the corresponding plurality of illumination-compensated images; coloring the colorless sparse three-dimensional point cloud by using the pixel values at the pixel coordinates of the plurality of groups of matching feature points in the corresponding illumination-compensated images to obtain a colored sparse three-dimensional point cloud; and determining a three-dimensional model of the target object based on the colored sparse three-dimensional point cloud and the plurality of illumination-compensated images.
For example, the image processing device may color the colorless sparse three-dimensional point cloud by using the average of the pixel values at the pixel coordinates of the plurality of sets of matching feature points in the corresponding plurality of illumination-compensated images, to obtain a colored sparse three-dimensional point cloud.
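The averaging-based coloring could be sketched as follows; `color_for_track` is a hypothetical helper, and each matched track is assumed to sample its illumination-compensated images at its known pixel coordinates and take the mean:

```python
import numpy as np

def color_for_track(pixel_coords, images):
    """pixel_coords: list of (row, col), one per image; images: list of
    HxWx3 arrays. Returns the mean RGB over all observations of one
    matched feature track, used to color its sparse 3-D point."""
    samples = [img[r, c] for (r, c), img in zip(pixel_coords, images)]
    return np.mean(samples, axis=0)

img_a = np.zeros((2, 2, 3))
img_a[0, 0] = [100.0, 0.0, 0.0]
img_b = np.zeros((2, 2, 3))
img_b[1, 1] = [200.0, 0.0, 0.0]
color = color_for_track([(0, 0), (1, 1)], [img_a, img_b])
```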
As an optional embodiment, S202, obtaining multiple sets of matching feature points based on multiple original images includes: extracting characteristic points of each original image in the plurality of original images; and carrying out feature point matching on the plurality of original images to obtain a plurality of groups of matching feature points.
It should be understood that the image processing apparatus may extract feature points of each of the plurality of original images by a feature extraction algorithm. Illustratively, the feature extraction algorithm may be the scale-invariant feature transform (SIFT) algorithm, the speeded-up robust features (SURF) algorithm, the features from accelerated segment test (FAST) corner detection algorithm, or the oriented FAST and rotated BRIEF (ORB) algorithm, which is not limited by the embodiments of the present application.
It should be understood that the image processing apparatus may perform feature point matching on the feature points extracted from the plurality of original images by using a feature matching policy, to obtain a plurality of sets of matching feature points. Illustratively, the feature matching policy may be a brute-force matching policy, a K-nearest neighbor (KNN) matching policy, or the like, which is not limited by the embodiment of the present application.
With a proper feature extraction algorithm and feature matching strategy, the image processing device can extract feature points more accurately and obtain a more accurate feature matching result, namely the plurality of groups of matching feature points, thereby improving the accuracy of the subsequent three-dimensional reconstruction result.
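As one possible sketch of the K-nearest-neighbor matching policy mentioned above, combined with Lowe's ratio test; `knn_ratio_match` is a hypothetical helper, and descriptors are assumed to be rows of real-valued vectors:

```python
import numpy as np

def knn_ratio_match(desc1, desc2, ratio=0.8):
    """K-nearest-neighbour (K = 2) descriptor matching: accept a pair
    (i, j) only when the best distance is clearly smaller than the
    second-best (the ratio test), which suppresses ambiguous matches."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        j_best, j_second = np.argsort(dists)[:2]
        if dists[j_best] < ratio * dists[j_second]:
            matches.append((i, int(j_best)))
    return matches

desc1 = np.array([[0.0, 0.0], [10.0, 10.0]])
desc2 = np.array([[0.1, 0.0], [10.0, 10.1], [50.0, 50.0]])
matches = knn_ratio_match(desc1, desc2)
```

In practice the descriptors would come from SIFT, SURF, or ORB, and a library matcher would replace the explicit loop; the acceptance logic is the same.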
As an optional embodiment, S202 described above, determining the three-dimensional coordinates of the plurality of sets of matching feature points and the camera relative pose corresponding to each original image, includes: determining the camera relative pose corresponding to each original image by a triangulation method based on the pixel coordinates of the plurality of groups of matching feature points in the corresponding original images and the camera intrinsic parameters corresponding to the plurality of original images; and determining the three-dimensional coordinates of the plurality of groups of matching feature points by the triangulation method based on those pixel coordinates and the camera relative pose corresponding to each original image.
Illustratively, the first original image has a feature point 1-1 therein, the second original image has a feature point 2-1 therein, and the feature points 1-1 and 2-1 are a set of matching feature points. Based on the pixel coordinates of the feature point 1-1 in the first original image, the pixel coordinates of the feature point 2-1 in the second original image, the camera relative pose of the first original image (for example, a preset identity matrix), and the camera intrinsic parameters, the camera relative pose corresponding to the second original image can be obtained by a triangulation method. Based on the camera relative pose corresponding to the second original image, the pixel coordinates of the feature point 1-1 in the first original image, and the pixel coordinates of the feature point 2-1 in the second original image, the three-dimensional coordinates of the matching feature point can be determined by the triangulation method.
Illustratively, the first original image has a feature point 1-1, the second original image has a feature point 2-1, the third original image has a feature point 3-1, and the feature points 1-1, 2-1, and 3-1 are a set of matching feature points. Based on the pixel coordinates of the feature point 1-1 in the first original image, the pixel coordinates of the feature point 2-1 in the second original image, the camera relative pose of the first original image (for example, a preset identity matrix), and the camera intrinsic parameters, the camera relative pose corresponding to the second original image and the three-dimensional coordinates P1' of the matching feature point can be determined by the above triangulation method. Likewise, based on the pixel coordinates of the feature point 1-1 in the first original image, the pixel coordinates of the feature point 3-1 in the third original image, the camera relative pose of the first original image, and the camera intrinsic parameters, the camera relative pose corresponding to the third original image and the three-dimensional coordinates P1'' of the matching feature point can be determined. Finally, based on P1' and P1'', the three-dimensional coordinates P1 of the corresponding matching feature point can be determined by solving joint equations.
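A minimal sketch of triangulating one matching feature point, using the standard linear (DLT) formulation; `triangulate_dlt` is a hypothetical helper, and identity intrinsics are assumed for simplicity:

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """P1, P2: 3x4 projection matrices (intrinsics times relative pose);
    x1, x2: pixel coordinates of one set of matching feature points.
    Solves the homogeneous system A X = 0 built from both projections."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)   # least-squares null vector of A
    X = Vt[-1]
    return X[:3] / X[3]           # de-homogenise

# First camera at a preset identity pose, second translated along -x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.0, 0.0, 5.0])
x1 = P1 @ np.append(X_true, 1.0)          # project into view 1
x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0)          # project into view 2
x2 = x2[:2] / x2[2]
X_est = triangulate_dlt(P1, P2, x1, x2)
```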
As an optional embodiment, S203, determining depth information of the corresponding matching feature point on each original image includes: and inputting the three-dimensional coordinates of the multiple groups of matching feature points and the relative pose of the camera corresponding to each original image into a depth estimation network model to obtain the depth information of the corresponding matching feature points on each original image.
The depth estimation network model may be, for example, a convolutional neural network (convolutional neural networks, CNN).
As an optional embodiment, S204, determining the ambient light map corresponding to each original image includes: and inputting the depth information of the corresponding matching characteristic points on each original image and the plurality of original images into an illumination estimation network model to obtain an environment illumination map corresponding to each original image.
Illustratively, the illumination estimation network model may be a Gardner's illumination estimation network model.
As an alternative embodiment, S205, determining the three-dimensional coordinates of the virtual light source includes: determining the pixel coordinate with the minimum pixel amplitude in the ambient light map corresponding to each original image as the pixel coordinate corresponding to the virtual light source in each original image; and determining the three-dimensional coordinates of the virtual light source based on the pixel coordinates corresponding to each original image and the relative pose of the camera corresponding to each original image.
It should be appreciated that the pixel amplitude indicates the brightness of a single pixel point. Illustratively, the pixel amplitude may range from 0 to 255, where, under the convention used here, a pixel amplitude of 0 indicates white (the brightest) and a pixel amplitude of 255 indicates black.
Illustratively, fig. 3a is an ambient illumination map in which there is one virtual light source P, and fig. 3b shows the position of the virtual light source P in the camera coordinate system. In fig. 3a, the point P is the point whose pixel amplitude is closest to 0, and is therefore determined as the position of the virtual light source in the ambient illumination map. The pixel coordinates of the virtual light source in the ambient illumination map of fig. 3a can be converted from the pixel coordinate system, through the image coordinate system, to the camera coordinate system, so as to obtain the coordinates of the virtual light source P in the camera coordinate system as shown in fig. 3b.
Optionally, the three-dimensional coordinates of the virtual light source are determined by a triangulation method based on the pixel coordinates corresponding to each original image and the camera relative pose corresponding to each original image.
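A sketch of locating the virtual light source in one ambient illumination map, taking the minimum-amplitude pixel as the brightest point per the convention above; `light_source_pixel` is a hypothetical helper:

```python
import numpy as np

def light_source_pixel(ambient_map):
    """Return the (row, col) of the minimum-amplitude pixel, taken as
    the virtual light source's pixel coordinate in this map
    (0 = white/brightest under the convention used here)."""
    return np.unravel_index(np.argmin(ambient_map), ambient_map.shape)

amb = np.full((3, 4), 200.0)
amb[1, 2] = 3.0                  # the brightest (lowest-amplitude) spot
px = light_source_pixel(amb)
```

The per-view pixel coordinates found this way, together with the camera relative poses, can then be triangulated across views to obtain the virtual light source's three-dimensional coordinates.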
In the following, an embodiment of the present application will be described in detail by taking an example of 10 original images with different viewing angles collected under outdoor natural light with reference to fig. 4.
Fig. 4 is a schematic flowchart of another image processing method 400 according to an embodiment of the present application, where the method 400 may be performed by the image processing device 102 shown in fig. 1, or may be performed by other similar devices, and the embodiment of the present application is not limited thereto. The method 400 includes the steps of:
S401, the image processing apparatus acquires 10 original images of the target object, the 10 original images being obtained by photographing the target object at different angles of view.
S402, the image processing apparatus extracts a plurality of feature points on each of the above 10 original images, respectively, and obtains a plurality of feature point information of each original image, the plurality of feature point information including pixel coordinates of the feature points in the original image.
The image processing apparatus may extract a plurality of feature points on each of the ten original images described above by a SIFT algorithm, a SURF algorithm, a FAST algorithm, or an ORB algorithm, respectively, as an example.
S403, the image processing device performs feature point matching on the 10 original images based on the plurality of feature point information of the 10 original images to obtain a plurality of groups of matching feature points. The matching feature points may be understood as imaging points of the same physical spatial point of the target object in a plurality of original images.
Illustratively, the image processing apparatus may obtain a plurality of sets of matching feature points by matching a plurality of feature points of the ten original images through a brute-force matching (BFM) policy.
S404, based on the pixel coordinates of the plurality of groups of matching feature points in the corresponding original images and the camera intrinsic parameters corresponding to the original images, the image processing equipment obtains the three-dimensional coordinates of the plurality of groups of matching feature points and the camera relative pose corresponding to each original image.
Optionally, the image processing device may obtain the camera relative pose by the triangulation method based on the pixel coordinates of the matching feature points in each pair of original images and the camera intrinsic parameters. The image processing device may then obtain the three-dimensional coordinates of the matching feature points in each pair of original images by the triangulation method, based on the obtained camera relative pose and those pixel coordinates.
And S405, the image processing device obtains depth information of the matching feature points on the corresponding original images through a depth estimation network model based on the three-dimensional coordinates of the matching feature points in each two original images and the camera relative pose corresponding to each original image.
Illustratively, the depth estimation network model may be a CNN.
And S406, the image processing equipment obtains the ambient illumination maps corresponding to the 10 original images by using the illumination estimation network model, based on the 10 original images and the depth information of the corresponding matching feature points on the 10 original images.
Illustratively, the above-described illumination estimation network model may be a Gardner's illumination estimation network model.
And S407, the image processing device determines the pixel coordinate with the minimum pixel amplitude in the environment illumination map as the pixel coordinate corresponding to the virtual light source in each original image based on the environment illumination map corresponding to the 10 original images.
Optionally, the image processing device may divide each ambient illumination map into connected regions according to the pixel amplitudes of its pixel points, find the point with the smallest pixel amplitude within each region whose amplitudes are less than or equal to a preset threshold, and use the position of that point as the position of a virtual point light source. The pixel amplitude ranges from 0 to 255: a pixel amplitude of 0 indicates white and a pixel amplitude of 255 indicates black.
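The connected-region variant above might be sketched as follows; `light_source_candidates` is a hypothetical helper, and a simple BFS flood fill stands in for whatever labeling routine an implementation would use:

```python
from collections import deque

import numpy as np

def light_source_candidates(amb, thresh):
    """Group pixels with amplitude <= thresh into 4-connected regions
    and return the minimum-amplitude pixel of each region as one
    candidate virtual point light source."""
    mask = amb <= thresh
    seen = np.zeros_like(mask)
    h, w = amb.shape
    candidates = []
    for row in range(h):
        for col in range(w):
            if mask[row, col] and not seen[row, col]:
                best = (row, col)
                queue = deque([(row, col)])
                seen[row, col] = True
                while queue:                          # BFS flood fill
                    r, c = queue.popleft()
                    if amb[r, c] < amb[best]:
                        best = (r, c)
                    for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                        if 0 <= nr < h and 0 <= nc < w and mask[nr, nc] and not seen[nr, nc]:
                            seen[nr, nc] = True
                            queue.append((nr, nc))
                candidates.append(best)
    return candidates

amb = np.array([[10.0, 10.0, 10.0],
                [ 1.0, 10.0,  2.0],
                [ 0.0, 10.0,  3.0]])
found = light_source_candidates(amb, thresh=5.0)   # two separate dark regions
```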
S408, the image processing apparatus calculates the virtual light source three-dimensional coordinates based on the camera relative pose corresponding to the 10 original images and the pixel coordinates corresponding to the virtual light source in each original image.
Alternatively, the image processing apparatus may calculate the three-dimensional coordinates of the virtual light source by a triangulation method based on the camera relative pose corresponding to the 10 original images and the pixel coordinates corresponding to the virtual light source in each original image.
S409, the image processing device calculates the updated virtual light source three-dimensional coordinates and the updated three-dimensional coordinates of the plurality of sets of matching feature points by using bundle adjustment (cluster optimization) equations, based on the camera relative pose corresponding to the 10 original images, the virtual light source three-dimensional coordinates, and the three-dimensional coordinates of the plurality of sets of matching feature points.
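The joint refinement in this step can be posed as minimizing reprojection residuals, in the style of bundle adjustment. This sketch (hypothetical structure and helper name, numpy assumed) only builds the residual vector, with the virtual light source treated as one more three-dimensional point:

```python
import numpy as np

def reprojection_residuals(points3d, poses, observations, K):
    """observations: (cam_idx, pt_idx, measured_uv) triples;
    poses: list of (R, t) camera relative poses; K: 3x3 intrinsics.
    Each observation contributes the gap between the reprojection of
    its 3-D point and the measured pixel coordinate."""
    residuals = []
    for cam_idx, pt_idx, uv in observations:
        R, t = poses[cam_idx]
        p = K @ (R @ points3d[pt_idx] + t)
        residuals.append(p[:2] / p[2] - uv)   # projected minus measured
    return np.concatenate(residuals)

# Consistent data reprojects exactly, so the residuals vanish.
K = np.eye(3)
poses = [(np.eye(3), np.zeros(3))]
points3d = np.array([[0.0, 0.0, 2.0]])
obs = [(0, 0, np.array([0.0, 0.0]))]
res = reprojection_residuals(points3d, poses, obs, K)
```

A nonlinear least-squares solver could then minimize these residuals jointly over the camera relative poses, the feature point coordinates, and the light source coordinates to obtain the updated values.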
And S410, the image processing equipment performs illumination compensation on the 10 original images based on the updated virtual light source three-dimensional coordinates and the ambient illumination maps corresponding to the 10 original images, to obtain 10 illumination-compensated images.
S411, the image processing device determines the updated three-dimensional coordinates of the plurality of groups of matching feature points as a colorless sparse three-dimensional point cloud, acquires the pixel coordinates of the plurality of groups of matching feature points in the corresponding 10 illumination-compensated images, calculates the average of the pixel values at those pixel coordinates, and colors the colorless sparse three-dimensional point cloud to obtain a colored sparse three-dimensional point cloud.
S412, the image processing device obtains a three-dimensional model of the target object based on the 10 illumination-compensated images and the colored sparse three-dimensional point cloud.
Optionally, the image processing device obtains a three-dimensional model of the target object through a dense reconstruction algorithm based on the 10 illumination-compensated images and the colored sparse three-dimensional point cloud. The dense reconstruction algorithm may be a multi-view stereo (MVS) algorithm.
According to the image processing method, the updated virtual light source three-dimensional coordinates and the updated three-dimensional coordinates of the plurality of groups of matching feature points are determined, illumination compensation is performed on each original image accordingly, and a three-dimensional model of the target object is determined based on the illumination-compensated images and the updated three-dimensional coordinates of the plurality of groups of matching feature points, which is beneficial to improving the accuracy of the three-dimensional model.
The image processing method of the embodiment of the present application is described in detail above with reference to fig. 2 to 4, and the image processing apparatus of the embodiment of the present application will be described in detail below with reference to fig. 5 and 6.
Fig. 5 shows an image processing apparatus 500 provided by an embodiment of the present application, the image processing apparatus 500 including: an acquisition module 501 and a processing module 502.
The acquisition module 501 is configured to acquire a plurality of original images of a target object, where the plurality of original images are obtained by shooting the target object under different viewing angles; the processing module 502 is configured to obtain multiple sets of matching feature points based on the multiple original images, and determine three-dimensional coordinates of the multiple sets of matching feature points and a relative pose of the camera corresponding to each original image based on pixel coordinates of the multiple sets of matching feature points in the corresponding original images; determine depth information of the corresponding matching feature points on each original image based on the three-dimensional coordinates of the plurality of groups of matching feature points and the relative pose of the camera corresponding to each original image; determine an ambient illumination map corresponding to each original image based on the depth information of the corresponding matching feature points on each original image and the plurality of original images; determine three-dimensional coordinates of a virtual light source based on the ambient illumination map corresponding to each original image and the camera relative pose corresponding to each original image; perform illumination compensation on each original image based on the three-dimensional coordinates of the virtual light source and the ambient illumination map corresponding to each original image to obtain a plurality of illumination-compensated images; and determine a three-dimensional model of the target object based on the plurality of illumination-compensated images and the three-dimensional coordinates of the multiple sets of matching feature points.
Optionally, the processing module 502 is further configured to: determine updated virtual light source three-dimensional coordinates and updated three-dimensional coordinates of the plurality of groups of matching feature points based on the camera relative pose corresponding to each original image, the three-dimensional coordinates of the virtual light source, and the three-dimensional coordinates of the plurality of groups of matching feature points; perform illumination compensation on each original image based on the updated virtual light source three-dimensional coordinates and the ambient illumination map corresponding to each original image; and determine a three-dimensional model of the target object based on the plurality of illumination-compensated images and the updated three-dimensional coordinates of the plurality of sets of matching feature points.
Optionally, the processing module 502 is further configured to: determine updated pixel coordinates of the virtual light source in each original image based on the updated virtual light source three-dimensional coordinates; shift the pixel points in the ambient illumination map corresponding to each original image based on the difference between the updated pixel coordinates of the virtual light source in that image and the original pixel coordinates of the virtual light source in that image, to obtain an ambient illumination compensation map corresponding to each original image; and perform illumination compensation on each original image based on its corresponding ambient illumination compensation map to obtain a plurality of illumination-compensated images.
Optionally, the processing module 502 is further configured to: determine the updated three-dimensional coordinates of the plurality of groups of matching feature points as a colorless sparse three-dimensional point cloud; the acquisition module 501 is further configured to: acquire the pixel coordinates of the plurality of groups of matching feature points in the corresponding plurality of illumination-compensated images; the processing module is further configured to: color the colorless sparse three-dimensional point cloud by using the pixel values at the pixel coordinates of the plurality of groups of matching feature points in the corresponding illumination-compensated images, to obtain a colored sparse three-dimensional point cloud; and determine a three-dimensional model of the target object based on the colored sparse three-dimensional point cloud and the plurality of illumination-compensated images.
Optionally, the processing module 502 is further configured to: extracting characteristic points of each original image in the plurality of original images; and performing feature point matching on the plurality of original images to obtain a plurality of groups of matching feature points.
Optionally, the processing module 502 is further configured to: determine the camera relative pose corresponding to each original image by a triangulation method based on the pixel coordinates of the plurality of groups of matching feature points in the corresponding original images and the camera intrinsic parameters corresponding to the plurality of original images; and determine the three-dimensional coordinates of the plurality of groups of matching feature points by the triangulation method based on those pixel coordinates and the camera relative pose corresponding to each original image.
Optionally, the processing module 502 is further configured to: and inputting the three-dimensional coordinates of the multiple groups of matching feature points and the relative pose of the camera corresponding to each original image into a depth estimation network model to obtain the depth information of the corresponding matching feature points on each original image.
Optionally, the processing module 502 is further configured to: input the depth information of the corresponding matching feature points on each original image and the plurality of original images into an illumination estimation network model, to obtain the ambient illumination map corresponding to each original image.
Optionally, the processing module 502 is further configured to: determine the pixel coordinate with the minimum pixel amplitude in the ambient illumination map corresponding to each original image as the pixel coordinate of the virtual light source in each original image; and determine the three-dimensional coordinates of the virtual light source based on the pixel coordinates of the virtual light source in each original image and the camera relative pose corresponding to each original image.
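Once a light-source pixel coordinate has been selected in each ambient illumination map, the per-image coordinates and camera poses can be combined by multi-view linear triangulation to localize the virtual light source in 3D. The following is a hedged sketch assuming known 3x4 projection matrices; the function name is illustrative, not from the embodiment:

```python
import numpy as np

def triangulate_light_source(proj_mats, pixel_coords):
    """Multi-view linear triangulation of the virtual light source.

    proj_mats:    list of 3x4 projection matrices, one per original image.
    pixel_coords: list of (u, v) light-source pixel coordinates, one per
                  image (e.g. the extremal-amplitude pixel of each map).
    Returns the light source's 3D world coordinates.
    """
    rows = []
    for P, (u, v) in zip(proj_mats, pixel_coords):
        # Each view contributes two linear constraints.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    # Least-squares null vector of the stacked constraint matrix.
    _, _, vt = np.linalg.svd(np.array(rows))
    X = vt[-1]
    return X[:3] / X[3]
```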
It should be understood that the apparatus 500 herein is embodied in the form of functional modules. The term module herein may refer to an application-specific integrated circuit (ASIC), an electronic circuit, a processor (e.g., a shared, dedicated, or group processor) and memory that execute one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that support the described functionality. In an alternative example, it will be understood by those skilled in the art that the apparatus 500 may specifically be the image processing apparatus in the foregoing embodiments, and the apparatus 500 may be configured to perform each flow and/or step corresponding to the image processing apparatus in the foregoing method embodiments; to avoid repetition, details are not described herein again.
The apparatus 500 described above has the function of implementing the respective steps performed by the image processing apparatus in the above-described method; the function may be implemented by hardware, or may be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the function described above.
In an embodiment of the present application, the apparatus 500 in fig. 5 may also be a chip or a system-on-chip (SoC), for example.
Fig. 6 shows another image processing apparatus 600 provided by an embodiment of the present application. The image processing apparatus 600 comprises a processor 601, a transceiver 602 and a memory 603, wherein the processor 601, the transceiver 602 and the memory 603 communicate with each other through an internal connection path. The memory 603 is used for storing instructions, and the processor 601 is used for executing the instructions stored in the memory 603 to control the transceiver 602 to transmit signals and/or receive signals.
It should be understood that the image processing apparatus 600 may be embodied as the image processing apparatus in the above-described embodiments, and may be used to perform the respective steps and/or flows corresponding to the image processing apparatus in the above-described method embodiments. The memory 603 may optionally include read-only memory and random access memory, and provides instructions and data to the processor 601. A portion of the memory 603 may also include non-volatile random access memory. For example, the memory 603 may also store information about the device type. The processor 601 may be configured to execute the instructions stored in the memory, and when the processor 601 executes the instructions stored in the memory, the processor 601 is configured to perform the steps and/or flows of the method embodiments described above corresponding to the image processing apparatus. The transceiver 602 may include a transmitter and a receiver, where the transmitter may be used to implement the steps and/or flows for performing the transmit actions corresponding to the transceiver, and the receiver may be used to implement the steps and/or flows for performing the receive actions corresponding to the transceiver.
It should be appreciated that in embodiments of the present application, the processor 601 may be a central processing unit (CPU), and the processor 601 may also be another general purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules in the processor for execution. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in a memory, and the processor executes instructions in the memory to perform the steps of the method described above in conjunction with its hardware. To avoid repetition, a detailed description is not provided herein.
The present application also provides a computer-readable storage medium storing a computer program for implementing the method corresponding to the image processing apparatus in the above-described embodiment.
The present application also provides a computer program product comprising a computer program (which may also be referred to as code, or instructions) which, when run on a computer, is capable of performing the method of the above embodiments corresponding to an image processing device.
Those of ordinary skill in the art will appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system, apparatus and module may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part thereof contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.
The foregoing is merely a specific implementation of the present application, but the scope of the embodiments of the present application is not limited thereto. Any person skilled in the art may readily conceive of changes or substitutions within the technical scope disclosed by the embodiments of the present application, and all such changes and substitutions fall within the scope of the embodiments of the present application. Therefore, the protection scope of the embodiments of the present application shall be subject to the protection scope of the claims.