
CN119052659B - Image processing methods and related devices - Google Patents

Image processing methods and related devices

Info

Publication number
CN119052659B
CN119052659B (application CN202310621936.5A)
Authority
CN
China
Prior art keywords
image
coordinate system
pickup device
matching
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310621936.5A
Other languages
Chinese (zh)
Other versions
CN119052659A (en)
Inventor
汤兴粲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202310621936.5A
Publication of CN119052659A
Application granted
Publication of CN119052659B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/70 Circuitry for compensating brightness variation in the scene
    • H04N23/76 Circuitry for compensating brightness variation in the scene by influencing the image signals
    • H04N23/73 Circuitry for compensating brightness variation in the scene by influencing the exposure time
    • H04N23/741 Circuitry for compensating brightness variation in the scene by increasing the dynamic range of the image compared to the dynamic range of the electronic image sensors
    • H04N23/743 Bracketing, i.e. taking a series of images with varying exposure conditions

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)

Abstract

The present application provides an image processing method and a related device, which can be used in the field of image processing technologies. In the technical solution provided by the present application, image capturing devices with different unit pixel sizes shoot at the same moment with the same exposure time to obtain images with different brightness dynamic ranges, and those images are processed to obtain an image with a higher brightness dynamic range. Because images with different brightness dynamic ranges are obtained within a single shared exposure, more images can be captured within the same period of time than when different exposure times are used, so more images with a high brightness dynamic range can be obtained; that is, the frame rate can be improved, which ultimately improves the user experience.

Description

Image processing method and related device
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image processing method and a related apparatus.
Background
Video see-through technology captures images of the surrounding environment with a camera and displays them on a display screen. It places high requirements on the camera's resolution, frame rate, and image quality; among the aspects of image quality, dynamic range is a main factor affecting realism. The dynamic range of an image refers to the range of scene brightness that the image can capture.
Because the dynamic range captured by a camera is far smaller than the dynamic range perceived by human eyes, a current method for improving the dynamic range of an image is high dynamic range (HDR) imaging, that is, acquiring a high-dynamic-range image by fusing pictures taken with different exposures, so that the captured image or video is closer to what human eyes perceive.
However, when the image dynamic range is increased in this way, the frame rate of the camera is low, resulting in a low video frame rate.
Disclosure of Invention
The image processing method and related device provided by the present application can improve the video frame rate and thereby improve the user experience.
In a first aspect, the present application provides an image processing method, in which a first image capturing device and a second image capturing device are controlled to shoot at the same moment with the same exposure time to obtain a first image and a second image, where the unit pixel size of the first image capturing device is different from the unit pixel size of the second image capturing device, the first image is the image captured by the first image capturing device, and the second image is the image captured by the second image capturing device; and a target image is acquired based on the first image and the second image, where the brightness dynamic range of the target image is higher than the brightness dynamic range of the first image and the brightness dynamic range of the second image.
In this method, image capturing devices with different unit pixel sizes shoot with the same exposure time to obtain images with different brightness dynamic ranges, and an image with a high brightness dynamic range is then obtained based on those images. Compared with acquiring images with different brightness dynamic ranges through different exposure durations, this saves shooting time, so the shooting frame rate can be improved.
In the present application, acquiring the target image based on the first image and the second image can be understood as acquiring an image with a high luminance dynamic range based on a plurality of images with low luminance dynamic ranges. "Low" and "high" here are relative concepts, referring mainly to the relative luminance dynamic ranges of the first and second images versus the target image.
In the present application, the implementation manner of acquiring the image of the high luminance dynamic range based on the plurality of images of the low luminance dynamic range is not limited.
In some possible implementations, acquiring the target image based on the first image and the second image may include: matching the first image and the second image based on the similarity between them to obtain a first matching result; acquiring a disparity map of the first image and a disparity map of the second image based on the first matching result; acquiring a depth map of the first image based on the disparity map of the first image, and a depth map of the second image based on the disparity map of the second image; performing matching processing on the first image and the second image based on the depth map of the first image and the depth map of the second image to obtain a second matching result; and performing luminance fusion processing on the first image and the second image according to the second matching result to obtain the target image.
In the implementation manner, the matching accuracy can be improved, so that the image quality of the target image can be improved.
In some possible implementations, matching the first image and the second image based on the similarity between them may include: performing target processing on the first image and the second image, respectively, to obtain a third image and a fourth image, the target processing including binocular stereo correction processing; and matching the first image and the second image based on the similarity between the third image and the fourth image.
In some possible implementations, the matching of the first image and the second image based on the depth map of the first image and the depth map of the second image may include: mapping the first image into the coordinate system of the first image capturing device based on the conversion relationship between the depth map of the first image and the coordinate system of the first image capturing device, to obtain a first three-dimensional image; converting the first three-dimensional image into the coordinate system of the second image capturing device based on the conversion relationship between the coordinate system of the first image capturing device and the coordinate system of the second image capturing device, to obtain a second three-dimensional image; mapping the second image into the coordinate system of the second image capturing device based on the conversion relationship between the depth map of the second image and the coordinate system of the second image capturing device, to obtain a third three-dimensional image; and determining the second matching result of the first image and the second image based on a matching result of the second three-dimensional image and the third three-dimensional image.
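The mapping and conversion steps above can be illustrated with a minimal pinhole-camera sketch. This is not the patent's implementation; the intrinsic parameters (fx, fy, cx, cy) and the extrinsics (rotation R, translation t) used in the example are assumed placeholder values:

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    # Map a pixel (u, v) with known depth into the camera's own 3-D
    # coordinate system using the pinhole model.
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

def to_other_camera(point, R, t):
    # Convert a 3-D point between the two camera coordinate systems:
    # p2 = R @ p1 + t (R: 3x3 rotation, t: 3-vector translation).
    return tuple(
        sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)
    )

# A pixel at the image center with depth 2 m lies on the optical axis;
# with a pure 6 cm horizontal baseline it shifts by 6 cm in the other
# camera's coordinate system.
p1 = backproject(320.0, 240.0, 2.0, 800.0, 800.0, 320.0, 240.0)
p2 = to_other_camera(p1, [[1, 0, 0], [0, 1, 0], [0, 0, 1]], [0.06, 0.0, 0.0])
```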
In a second aspect, the present application provides an image processing apparatus which may include respective functional modules for implementing the method in the first aspect. For example, the apparatus includes an interaction module and a processing module.
In some implementations, these modules may be implemented in software and/or hardware. For example, the interaction module may be implemented by a communication interface, and the processing module may be implemented by a processor executing program code stored in a memory. Optionally, the apparatus may also include the memory.
It will be appreciated that the processing means provided in the second aspect may also be a system-on-chip.
In a third aspect, the application provides a computer readable storage medium storing program code for execution by an apparatus, the program code comprising instructions for implementing the method of the first aspect.
In a fourth aspect, the application provides a computer program product comprising instructions which, when run on an apparatus, cause the apparatus to implement the method of the first aspect.
In a fifth aspect, the present application provides an image processing system, the system comprising a first image capturing apparatus, a second image capturing apparatus, and a processing unit, where the first image capturing apparatus and the second image capturing apparatus differ in unit pixel size, and the processing unit is configured to implement the method of the first aspect or any one of its possible implementations.
In a sixth aspect, the present application provides a terminal device comprising the image processing system in the fifth aspect. For example, the terminal device may be mediated reality (MR) glasses.
It will be appreciated that the effects obtainable in the second to sixth aspects may be referred to in the description of the first aspect and are not described here in detail.
Drawings
FIG. 1 is an exemplary architecture diagram of an image processing system according to an embodiment of the present application;
FIG. 2 is a schematic diagram of MR glasses according to an embodiment of the present application;
FIG. 3 is an exemplary flowchart of an image processing method according to an embodiment of the present application;
FIG. 4 is an exemplary diagram of the parallax-depth relationship according to an embodiment of the present application;
FIG. 5 is an exemplary structural diagram of an image processing apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application.
In order to clearly describe the technical solutions of the embodiments of the present application, the words "first", "second", and the like are used in the embodiments to distinguish between identical or similar items having substantially the same functions and effects. Those skilled in the art will appreciate that these words do not limit quantity or execution order, and that items modified by "first" and "second" are not necessarily different.
It should be noted that, in the embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
In the embodiments of the present application, "at least one" means one or more, and "a plurality of" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may represent: A alone, both A and B, or B alone, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following items" or a similar expression means any combination of these items, including any combination of single items or plural items. For example, "at least one of a, b, or c" may represent: a; b; c; a and b; a and c; b and c; or a, b, and c, where a, b, and c may be singular or plural.
Fig. 1 is an exemplary architecture diagram of an image processing system according to one embodiment of the present application. As shown in fig. 1, the image processing system 100 may include an image capturing apparatus 101, an image capturing apparatus 102, a processor 103, a memory 104, a display 105, and a display 106.
The image pickup device 101, the image pickup device 102, the processor 103, the memory 104, the display 105, and the display 106 communicate with each other through an internal connection path.
The image pickup device 101 and the image pickup device 102 have basic functions such as video capture and/or still image capture. After an image is collected through the lens, it is processed by the photosensitive component circuit and control components inside the device and converted into a digital signal that the processor can recognize.
The image pickup devices 101 and 102 respectively include an image sensor, on which a photosensitive element is provided, which can divide an optical image on a light receiving surface thereof into a plurality of small units and convert the small units into usable electrical signals.
For photosensitive elements with the same number of effective pixels, generally, the larger the element, the larger the unit area of each pixel, the better the photosensitive performance, and the more image details can be recorded. The unit area of each pixel may also be referred to as the unit pixel area, single pixel size, or unit pixel size. One way to calculate it is: unit pixel area = physical photosensitive area of the image sensor / total number of pixels.
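As a sketch of this calculation (the sensor dimensions below are illustrative values, not taken from the patent), the side length of a square unit pixel can be derived from the photosensitive area and the pixel count:

```python
def unit_pixel_size_um(sensor_width_mm, sensor_height_mm, total_pixels):
    # unit pixel area = physical photosensitive area / total number of pixels
    unit_area_mm2 = (sensor_width_mm * sensor_height_mm) / total_pixels
    # side length of one (assumed square) pixel, converted from mm to micrometers
    return (unit_area_mm2 ** 0.5) * 1000.0
```

For example, a 12-megapixel sensor with a photosensitive area of about 6.17 mm × 4.55 mm yields a unit pixel size of roughly 1.5 μm.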
Since the unit pixel size of the image pickup device 101 is different from that of the image pickup device 102, the sensitivity of the image pickup device 101 is different from that of the image pickup device 102, and the dynamic range of the image picked up by the image pickup device 101 is further different from that of the image picked up by the image pickup device 102.
In some implementations, if the unit pixel size of the image capturing device 101 is greater than that of the image capturing device 102, the sensitivity of the image capturing device 101 is higher than that of the image capturing device 102, and the dynamic range of the image captured by the image capturing device 101 is greater than the dynamic range of the image captured by the image capturing device 102.
In other implementations, if the unit pixel size of the image capturing device 101 is smaller than that of the image capturing device 102, the sensitivity of the image capturing device 101 is lower than that of the image capturing device 102, and the dynamic range of the image captured by the image capturing device 101 is lower than the dynamic range of the image captured by the image capturing device 102.
The memory 104 is used to store instructions. Memory 104 may optionally include read-only memory and random access memory, and provide instructions and data to processor 103. A portion of the memory may also include non-volatile random access memory.
For example, the memory 104 may also store information of device type.
The processor 103 may be configured to execute the instructions stored in the memory 104, and when the processor 103 executes the instructions stored in the memory 104, the processor 103 may be configured to perform the steps and/or processes of any of the method embodiments below.
It should be appreciated that in embodiments of the present application, the processor 103 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The display 105 and the display 106 may receive signals transmitted by the processor, form images, and display the images. For example, the display 105 displays an image obtained by processing an image captured by the image capturing device 101 by the processor 103, and the display 106 displays an image obtained by processing an image captured by the image capturing device 102 by the processor 103.
It will be appreciated that the configuration of the image processing system shown in fig. 1 is merely an example, and that more or fewer components than shown may be included in the image processing system, or certain components may be combined, or certain components may be split, or different arrangements of components may be included in the image processing system. For example, the image processing system provided by the application can also comprise a communication interface, a power interface or more image pickup devices and other components.
In some embodiments of the present application, a terminal device is further provided, where the terminal device provided in the embodiment of the present application includes an image processing system shown in fig. 1. By way of example, the terminal device may be a smart phone, XR glasses, MR glasses, a camera, or the like.
Fig. 2 is a schematic diagram of MR glasses according to an embodiment of the application. The MR glasses 200 shown in fig. 2 may include the image processing system shown in fig. 1, and only the image pickup device 101 and the image pickup device 102 in the image processing system 100 are shown in fig. 2.
The imaging device 101 and the imaging device 102 may be referred to as binocular cameras of MR glasses. The placement positions of the imaging device 101 and the imaging device 102 in the MR glasses 200 are shown in fig. 2. The image pickup device 101 may be referred to as a left-eye camera, and the image pickup device 102 may be referred to as a right-eye camera.
As one example, the MR glasses 200 have an imaging device with a unit pixel size of 2.0 micrometers (μm) mounted on the left and an imaging device with a unit pixel size of 1.0 μm mounted on the right.
As one example, the MR glasses include a left lens and a right lens. The left lens includes the display 105, which displays the processed image captured by the left-eye camera, and the right lens includes the display 106, which displays the processed image captured by the right-eye camera.
Fig. 3 is an exemplary flowchart of an image processing method according to an embodiment of the present application. The image processing method may be implemented by the processor 103 in the image processing system in the foregoing. As shown in fig. 3, the image processing method may include S310 and S320.
S310, controlling the first image pickup device and the second image pickup device to pick up images with the same exposure time at the same moment to obtain a first image and a second image, wherein the unit pixel size of the first image pickup device is different from that of the second image pickup device.
As an example, the first image pickup device may be the image pickup device 101, and the second image pickup device may be the image pickup device 102.
In some implementations, the unit pixel size of the first image capture device is greater than the unit pixel size of the second image capture device, and in other implementations, the unit pixel size of the first image capture device is less than the unit pixel size of the second image capture device.
As an example, the processor may send instructions to the first image capturing device and the second image capturing device instructing the first image capturing device and the second image capturing device to take an image.
In some possible implementations, the processor may indicate the same shooting start time and the same exposure duration to the first image capturing device and the second image capturing device. In this way, the first image capturing device and the second image capturing device start shooting at the same moment and capture images with the same exposure time.
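As a rough sketch of this control flow (the command structure and field names below are hypothetical; the patent does not specify the control interface between the processor and the image capturing devices), the processor could issue identical capture parameters to both devices:

```python
from dataclasses import dataclass

# Hypothetical command structure: class and field names are illustrative only.
@dataclass(frozen=True)
class CaptureCommand:
    start_time_us: int  # shared shooting start moment (microseconds)
    exposure_us: int    # shared exposure duration (microseconds)

def synchronized_capture_commands(start_time_us, exposure_us):
    # The processor indicates the SAME start time and the SAME exposure
    # duration to both devices, so both capture at the same moment.
    cmd = CaptureCommand(start_time_us, exposure_us)
    return {"device_1": cmd, "device_2": cmd}
```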
In this embodiment, for convenience of description, an image captured by a first image capturing device is referred to as a first image, and an image captured by a second image capturing device is referred to as a second image.
Because the first image pickup device and the second image pickup device pick up the first image and the second image at the same time, the first image and the second image are images of the same subject.
Because the unit pixel size of the first image pickup device differs from that of the second image pickup device, the two devices have different sensitivities; and because the first image and the second image share the same exposure duration, the dynamic range of the first image differs from the dynamic range of the second image.
For example, if the unit pixel size of the first image capturing device is greater than that of the second image capturing device, the sensitivity of the first image capturing device is greater than that of the second image capturing device, and the dynamic range of the first image is greater than that of the second image; the first image then includes more details of the dark regions in the scene, and the second image includes more details of the highlight regions in the scene.
In other implementations, the unit pixel size of the first image capturing device is smaller than that of the second image capturing device, the sensitivity of the first image capturing device is lower than that of the second image capturing device, and the dynamic range of the first image is smaller than that of the second image; in this case the second image includes more details of the dark regions in the scene, and the first image includes more details of the highlight regions in the scene.
S320, acquiring a target image based on the first image and the second image, wherein the luminance dynamic range of the target image is higher than the luminance dynamic range of the first image and the luminance dynamic range of the second image.
This step can be understood as deriving a higher dynamic range image based on the first image and the second image, which higher dynamic range image is referred to as the target image.
For example, after the first image capturing device and the second image capturing device capture the first image and the second image under the control of the processor, the processor may perform image fusion processing based on the first image and the second image to obtain the target image.
In this embodiment, there may be one or more target images. For example, luminance fusion may be performed on the first image using the second image to obtain a first target image, and luminance fusion may be performed on the second image using the first image to obtain a second target image.
In this embodiment, because the unit pixel size of the first image capturing device and the unit pixel size of the second image capturing device are different, the first image capturing device and the second image capturing device can capture images with different dynamic ranges within the same exposure time, and thus the purpose of acquiring images with higher dynamic ranges based on the images with different dynamic ranges can be achieved.
Compared with the technical solution in which images with different dynamic ranges are acquired through a long exposure and a short exposure, the extra time consumed by the long exposure can be avoided, so images with different dynamic ranges can be acquired in a shorter time. More images with different dynamic ranges can therefore be acquired within a given period, that is, more high-dynamic-range images can be obtained within that period, and the frame rate can ultimately be improved.
In addition, when there are a plurality of target images, for example the aforementioned first target image and second target image, the first image and the second image are acquired with the same exposure duration at the same moment, so their shooting times are guaranteed to be identical, which in turn guarantees the time alignment of the first target image and the second target image.
In addition, since the first image and the second image are obtained by photographing by two independent photographing devices, respectively, the resolution of the first image and the second image can be ensured, so that the resolution of the finally obtained target image with high dynamic range can be ensured.
The present embodiment is not limited to the method or manner of acquiring a higher luminance dynamic range image based on the first image and the second image. Some implementations of acquiring a higher luminance dynamic range target image based on a first image and a second image are described below.
In some possible implementations, the method may include the following operations: performing matching processing on the first image and the second image to obtain a matching result, and performing luminance fusion processing on the first image and the second image according to the matching result to obtain the target image.
The matching result indicates which pixel in the first image matches which pixel in the second image; two matched pixels can be understood as corresponding to the same point in the scene.
As an example, the matching result may be a conversion relationship between the coordinates of pixels in the first image and the coordinates of pixels in the second image; a pixel in the first image and a pixel in the second image that satisfy this coordinate conversion relationship may be regarded as matching.
Performing luminance fusion processing on the first image and the second image according to the matching result can be understood as fusing the luminances of the matched pixels in the two images; the resulting pixels form the target image.
For example, the luminance of each pixel in the first image is fused with the luminance of its matching pixel in the second image to obtain the first target image, and the luminance of each pixel in the second image is fused with the luminance of its matching pixel in the first image to obtain the second target image.
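The patent does not prescribe a specific fusion rule. As one hedged illustration, a common exposure-fusion heuristic weights each matched pixel's luminance by its "well-exposedness" (closeness to mid-gray); the sketch below assumes luminances normalized to [0, 1] and an illustrative Gaussian width:

```python
import math

def well_exposedness(y, sigma=0.2):
    # Gaussian weight around mid-gray (0.5); near-clipped values get
    # small weight, well-exposed values get weight near 1.
    return math.exp(-((y - 0.5) ** 2) / (2 * sigma ** 2))

def fuse_pixel(y_a, y_b):
    # Blend two matched pixel luminances, favouring the better-exposed one.
    w_a, w_b = well_exposedness(y_a), well_exposedness(y_b)
    if w_a + w_b == 0:
        return (y_a + y_b) / 2
    return (w_a * y_a + w_b * y_b) / (w_a + w_b)
```

For instance, fusing a well-exposed value (0.5) with a nearly clipped one (0.95) yields a result close to the well-exposed value.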
Taking the first image capturing device as the left-eye camera on the MR glasses and the second image capturing device as the right-eye camera as an example, the first target image may be the image displayed by the left lens of the MR glasses, and the second target image may be the image displayed by the right lens.
In some possible implementations, the operation of matching the first image and the second image may include: matching the first image and the second image based on their similarity to obtain a first matching result; obtaining a disparity map of the first image relative to the second image and a disparity map of the second image relative to the first image based on the first matching result; obtaining a depth map of the first image based on the disparity map of the first image relative to the second image, and a depth map of the second image based on the disparity map of the second image relative to the first image; and matching the first image and the second image based on the two depth maps to obtain a second matching result, which may be used for the luminance fusion.
For matching the first image and the second image based on their similarity, acquiring the spatial depth of pixels from the first matching result, and matching the first image and the second image based on spatial depth, reference may be made to the related prior art.
In this implementation, after the first image and the second image are matched based on their similarity, they are matched again based on depth, which improves the matching precision, thereby improving the fusion quality and the quality of the target image. Performing similarity-based matching first and deriving depth from its result before the depth-based matching also improves the efficiency of the depth matching, and thus the efficiency of obtaining the target image.
In this implementation manner, the disparity map of the first image with respect to the second image may be understood as the set of differences between the x-axis coordinates of pixel points in the first image and the x-axis coordinates of their matching pixel points in the second image, and the disparity map of the second image with respect to the first image may be understood as the set of differences between the x-axis coordinates of pixel points in the second image and the x-axis coordinates of their matching pixel points in the first image. The matching pixel points are the pixel points indicated by the first matching result.
In this implementation, the depth map of the first image may be understood as a set of distances between the scene point corresponding to the pixel point in the first image and the first image capturing device, and the depth map of the second image may be understood as a set of distances between the scene point corresponding to the pixel point in the second image and the second image capturing device.
Fig. 4 is an exemplary diagram of the relation between parallax and depth for one embodiment of the present application. Here f is the focal length of the image capturing devices; b is the baseline distance between the two image capturing devices; P is a point in the scene; Pl is the pixel point corresponding to P in the first image, and Pr is the pixel point corresponding to P in the second image; Xl is the offset of Pl relative to the center of the first image, i.e., its coordinate on the X-axis, and Xr is the offset of Pr relative to the center of the second image; Z and X denote the Z-axis and X-axis (the Y-axis is not marked); and Z also denotes the depth of point P.
As can be seen from fig. 4, the parallax d = Xl - Xr, and the depth of the first image satisfies Z = b * f / d.
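The disparity-to-depth relation above can be sketched as follows. This is an illustrative NumPy snippet rather than the patented implementation; the focal length, baseline, and validity threshold are chosen purely for illustration.

```python
import numpy as np

def disparity_to_depth(disparity, f, b, eps=1e-6):
    """Convert a disparity map to a depth map using Z = b * f / d.

    disparity: x-axis offsets in pixels; f: focal length in pixels;
    b: baseline between the two devices (same unit as the output depth).
    Zero or negative disparities are treated as invalid (depth = 0)."""
    d = np.asarray(disparity, dtype=np.float64)
    depth = np.zeros_like(d)
    valid = d > eps
    depth[valid] = b * f / d[valid]
    return depth

# Example: f = 700 px, baseline 0.06 m, disparity 21 px -> Z = 0.06*700/21 = 2.0 m
depth = disparity_to_depth(np.array([[21.0, 0.0]]), f=700.0, b=0.06)
```

Note that depth is inversely proportional to disparity, so small disparity errors on distant points cause large depth errors, which is one motivation for the smoothing and hole-filling steps described later.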
Before matching the first image and the second image based on their similarity, target processing may be performed on the first image, and likewise on the second image. The target processing may include one or more of image brightness equalization, distortion correction, and binocular stereo correction. For convenience of description, the image obtained by performing the target processing on the first image may be referred to as a third image, and the image obtained by performing the target processing on the second image may be referred to as a fourth image.
It is to be understood that, when the target processing includes a plurality of processes among the image equalization processing, the distortion correction processing, and the binocular stereo correction processing, the present embodiment does not limit the execution order of the plurality of processes. As an example, when the target processing includes image equalization processing, distortion correction processing, and binocular stereo correction processing, the image equalization processing, the distortion correction processing, and the binocular stereo correction processing are performed in this order.
The first image and the second image are respectively subjected to image brightness equalization, and the brightness of the first image and the brightness of the second image can be adjusted to be closer to each other, so that the matching precision of the first image and the second image can be improved, and the quality of the target image can be improved.
And the first image and the second image are subjected to distortion correction respectively, so that the matching precision of the first image and the second image can be improved, and the quality of the target image can be improved.
The binocular stereo correction processing is performed on the first image and the second image respectively, so that the matching efficiency of the first image and the second image can be improved, and further the efficiency of acquiring the target image can be improved.
In this embodiment, binocular stereo correction is performed on the first image and the second image so that the imaging origins of the corrected first image and the corrected second image have consistent coordinates, and their optical axes are parallel and their imaging planes coplanar. In this way, any pixel point in the corrected first image and its matching point in the corrected second image necessarily lie in the same pixel row, so only a one-dimensional search along that row is needed to find the matching pixel point in the other image, which improves the matching efficiency.
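A minimal sketch of the one-dimensional row search that rectification enables. The sum-of-absolute-differences (SAD) window matcher, window size, and disparity range below are illustrative assumptions, not the matching method claimed by the patent.

```python
import numpy as np

def match_along_row(left_row, right_row, x, patch=3, max_disp=16):
    """After rectification, a pixel's match lies on the same image row, so
    the search is 1-D: compare a small window around x in the left row with
    candidate positions x - d in the right row and keep the best SAD cost."""
    half = patch // 2
    ref = left_row[x - half:x + half + 1].astype(np.int32)
    best_d, best_cost = 0, np.inf
    for d in range(0, max_disp + 1):
        xr = x - d                      # disparity d = xl - xr
        if xr - half < 0:
            break                       # candidate window would leave the image
        cand = right_row[xr - half:xr + half + 1].astype(np.int32)
        cost = np.abs(ref - cand).sum()
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d

# Toy rows: the feature [10, 50, 10] sits at x = 3 on the left, x = 1 on the right.
left_row = np.array([0, 0, 10, 50, 10, 0, 0, 0])
right_row = np.array([10, 50, 10, 0, 0, 0, 0, 0])
d = match_along_row(left_row, right_row, x=3)  # -> 2
```

Without rectification, the same search would have to scan a two-dimensional region, which is why the stereo correction step improves matching efficiency.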
In some possible implementations of the present embodiment, after the disparity map of the first image or the second image is acquired, temporal smoothing and hole filling operations may be performed on the disparity map, and the depth map of the first image or the second image may then be acquired based on the resulting disparity map. This makes the depth map more accurate and the matching result of the first image and the second image more accurate, so the quality of the fused target image is higher.
The temporal smoothing operation may include aligning the disparity map of a previous frame with the disparity map of the current frame according to the pose relation between the two frames, comparing how the depth values at corresponding poses change, and setting a threshold to filter out noise points whose depth values change sharply. For example, in this embodiment, the disparity map of a preceding image captured by the first image capturing apparatus (or the second image capturing apparatus) at a time before the first image (or the second image) may be acquired; the disparity map of the preceding image may be aligned with the disparity map of the first image (or the second image) based on the pose relationship between them; the depth value changes may then be compared, and pixels in the disparity map of the first image (or the second image) whose depth values change beyond a set threshold may be filtered out. Such pixels may be referred to as noise points.
The hole filling operation may include establishing a global optimization equation according to the neighborhood pixel constraint relation of the disparity map, solving for the missing disparity values, performing a spatial smoothing operation on the existing disparity values, and finally obtaining a smooth dense disparity map. For example, in this embodiment, a global optimization equation may be established based on the neighborhood pixel constraint relationship of the disparity map of the first image (or the second image) to solve for the missing disparity values, while the existing disparity values are spatially smoothed, finally yielding a smooth dense disparity map of the first image (or the second image). The dense disparity map may be taken as the final disparity map of the first image (or the second image), and the depth map may be determined, and image matching performed, based on this final disparity map.
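The temporal filtering and hole filling steps might be sketched as below. This simplified version replaces the global optimization equation with a plain 4-neighbour mean fill, and the change threshold is an arbitrary illustrative value.

```python
import numpy as np

def filter_and_fill(disp, prev_disp_aligned, change_thresh=8.0):
    """Simplified temporal filter + hole filling for a disparity map.

    Pixels whose disparity differs from the aligned previous frame by more
    than change_thresh are marked invalid (noise points, set to 0); every
    invalid pixel is then filled with the mean of its valid 4-neighbours,
    when at least one exists."""
    d = disp.astype(np.float64).copy()
    noise = np.abs(d - prev_disp_aligned) > change_thresh
    d[noise] = 0.0
    filled = d.copy()
    h, w = d.shape
    for y in range(h):
        for x in range(w):
            if d[y, x] == 0.0:  # hole or filtered noise point
                vals = [d[ny, nx]
                        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                        if 0 <= ny < h and 0 <= nx < w and d[ny, nx] > 0.0]
                if vals:
                    filled[y, x] = sum(vals) / len(vals)
    return filled

# A hole at (1,1) and a noisy jump at (2,2) both end up filled from neighbours.
disp = np.array([[10., 10., 10.], [10., 0., 10.], [10., 10., 50.]])
out = filter_and_fill(disp, np.full((3, 3), 10.0))
```

A production implementation would instead solve the neighborhood-constrained global optimization the text describes; the point here is only the order of operations: filter temporally inconsistent values first, then fill the resulting holes.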
In some possible implementations of the present embodiment, matching the first image and the second image based on the depth map of the first image and the depth map of the second image may include: mapping the first image into the coordinate system of the first image capturing device based on the depth map of the first image and the conversion relationship between the coordinate system of the first image and the coordinate system of the first image capturing device to obtain a first three-dimensional image; converting the first three-dimensional image into the coordinate system of the second image capturing device based on the conversion relationship between the two device coordinate systems to obtain a second three-dimensional image; mapping the second image into the coordinate system of the second image capturing device based on the depth map of the second image and the conversion relationship between the coordinate system of the second image and the coordinate system of the second image capturing device to obtain a third three-dimensional image; and determining the second matching result of the first image and the second image based on the matching result of the second three-dimensional image and the third three-dimensional image.
In some implementations, determining the second matching result of the first image and the second image based on the matching result of the second three-dimensional image and the third three-dimensional image may include: finding a pair of matched three-dimensional space points in the second and third three-dimensional images, the point belonging to the second three-dimensional image being referred to as the first three-dimensional space point and the point belonging to the third three-dimensional image as the second three-dimensional space point; finding the three-dimensional space point in the first three-dimensional image that is converted into the first three-dimensional space point, and then the pixel in the first image that is mapped to that point; finding the pixel in the second image that is mapped to the second three-dimensional space point in the third three-dimensional image; and determining that this pixel in the first image and this pixel in the second image match.
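The mapping of a pixel with known depth into a device coordinate system, and the conversion between the two device coordinate systems, can be sketched with a pinhole camera model. All intrinsics and extrinsics below are made-up illustrative values, not parameters from the patent.

```python
import numpy as np

def backproject(u, v, Z, fx, fy, cx, cy):
    """Map pixel (u, v) with depth Z into the device coordinate system using
    a pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    return np.array([(u - cx) * Z / fx, (v - cy) * Z / fy, Z])

def to_other_device(P, R, t):
    """Convert a 3-D point between the two device coordinate systems via the
    extrinsic rotation R and translation t: P' = R @ P + t."""
    return R @ P + t

# Illustrative intrinsics/extrinsics: identity rotation, 0.06 m horizontal baseline.
P1 = backproject(u=400.0, v=300.0, Z=2.0, fx=700.0, fy=700.0, cx=320.0, cy=240.0)
P2 = to_other_device(P1, R=np.eye(3), t=np.array([-0.06, 0.0, 0.0]))
```

Applying `backproject` to every pixel of an image yields the "three-dimensional image" the text describes, and `to_other_device` (or its analogue with a device-to-world transform) performs the coordinate-system conversions used in each of the three variants above.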
In other possible implementations of the present embodiment, matching the first image and the second image based on the depth map of the first image and the depth map of the second image may include: mapping the second image into the coordinate system of the second image capturing device based on the depth map of the second image and the conversion relationship between the coordinate system of the second image and the coordinate system of the second image capturing device to obtain a fourth three-dimensional image; converting the fourth three-dimensional image into the coordinate system of the first image capturing device based on the conversion relationship between the two device coordinate systems to obtain a fifth three-dimensional image; mapping the first image into the coordinate system of the first image capturing device based on the depth map of the first image and the conversion relationship between the coordinate system of the first image and the coordinate system of the first image capturing device to obtain a sixth three-dimensional image; and determining the second matching result of the first image and the second image based on the matching result of the fifth three-dimensional image and the sixth three-dimensional image.
In some implementations, determining the second matching result of the first image and the second image based on the matching result of the fifth three-dimensional image and the sixth three-dimensional image may include: finding a pair of matched three-dimensional space points in the fifth and sixth three-dimensional images, the point belonging to the fifth three-dimensional image being referred to as the third three-dimensional space point and the point belonging to the sixth three-dimensional image as the fourth three-dimensional space point; finding the three-dimensional space point in the fourth three-dimensional image that is converted into the third three-dimensional space point, and then the pixel in the second image that is mapped to that point; finding the pixel in the first image that is mapped to the fourth three-dimensional space point in the sixth three-dimensional image; and determining that this pixel in the first image and this pixel in the second image match.
In still other possible implementations of the present embodiment, matching the first image and the second image based on the depth map of the first image and the depth map of the second image may include: mapping the first image into the coordinate system of the first image capturing device based on the depth map of the first image and the conversion relationship between the coordinate system of the first image and the coordinate system of the first image capturing device to obtain a seventh three-dimensional image; converting the seventh three-dimensional image into the world coordinate system based on the conversion relationship between the coordinate system of the first image capturing device and the world coordinate system to obtain an eighth three-dimensional image; mapping the second image into the coordinate system of the second image capturing device based on the depth map of the second image and the conversion relationship between the coordinate system of the second image and the coordinate system of the second image capturing device to obtain a ninth three-dimensional image; converting the ninth three-dimensional image into the world coordinate system based on the conversion relationship between the coordinate system of the second image capturing device and the world coordinate system to obtain a tenth three-dimensional image; and determining the second matching result of the first image and the second image based on the matching result of the eighth three-dimensional image and the tenth three-dimensional image.
In some implementations, determining the second matching result of the first image and the second image based on the matching result of the eighth three-dimensional image and the tenth three-dimensional image may include: finding a pair of matched three-dimensional space points in the eighth and tenth three-dimensional images, the point belonging to the eighth three-dimensional image being referred to as the fifth three-dimensional space point and the point belonging to the tenth three-dimensional image as the sixth three-dimensional space point; finding the three-dimensional space point in the seventh three-dimensional image that is converted into the fifth three-dimensional space point, and then the pixel in the first image that is mapped to that point; finding the three-dimensional space point in the ninth three-dimensional image that is converted into the sixth three-dimensional space point, and then the pixel in the second image that is mapped to that point; and determining that this pixel in the first image and this pixel in the second image match.
In some possible implementations of the embodiment, the luminance fusion processing performed on the first image and the second image based on the matching result includes: for the common-view region, calculating weights based on the luminance of pixel points in the first image and of their matched pixel points in the second image, and then fusing according to the weights; and, for the non-common-view region, adjusting its luminance according to the overall luminance change of the common-view region and adding a transition region. The common-view region is understood to be the region formed by the matched pixel points.
One implementation of luminance fusion in this embodiment combines three pieces of weight information, namely contrast Ci,j,k, saturation Si,j,k, and exposure suitability Ei,j,k, so that overall higher weights tend to be given to pixels with high contrast, high saturation, and good exposure. The fusion weight may then be calculated as:
Wi,j,k = (Ci,j,k)^wc * Si,j,k * Ei,j,k
where i and j denote the row coordinate (x-axis) and column coordinate (y-axis) of the pixel, respectively, k indexes the image, and wc is the contrast weight exponent.
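A sketch of this weight computation, assuming the product form reconstructed above and adding a per-pixel normalization over the two images (the normalization is an added assumption for use in a weighted average, not stated in the text):

```python
import numpy as np

def fusion_weights(C, S, E, wc=1.0, eps=1e-12):
    """Per-pixel fusion weight Wi,j,k = (Ci,j,k)^wc * Si,j,k * Ei,j,k.

    C, S, E: contrast, saturation, and exposure-suitability maps of shape
    (2, H, W) for the two images, with values in [0, 1]. The weights are
    normalized across the two images so they sum to 1 at each pixel."""
    W = (C ** wc) * S * E
    return W / (W.sum(axis=0, keepdims=True) + eps)

# Two 1x1 "images": the first has higher contrast, so it receives more weight.
C = np.array([[[0.8]], [[0.2]]])
S = np.ones_like(C)
E = np.ones_like(C)
W = fusion_weights(C, S, E)
```

The fused luminance of the common-view region is then the weight-multiplied sum of the matched pixels' luminances.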
One implementation manner of brightness adjustment in this embodiment includes respectively counting the average adjustment ratio Ratiogray(I) for each gray value 0-255 within the fusion region of the first image and the second image, and the average brightness adjustment ratio Ratioavg of the whole fusion region, and then traversing the unfused regions in the first image and the second image, where the brightness values are adjusted according to the current gray value. One way to calculate the luminance value Iadjust of a pixel point in the unfused region is as follows:
Iadjust=I*(w*Ratiogray(I)+(1-w)*Ratioavg)
where w is the weight balancing the local (per-gray-value) ratio against the global average ratio; its value ranges from 0 to 1, defaults to 0.5, and can be adjusted according to the effect.
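The per-gray-value blending above might look as follows; the 256-entry lookup table for Ratiogray and the ratio values are illustrative inputs:

```python
import numpy as np

def adjust_unfused(I, ratio_gray, ratio_avg, w=0.5):
    """Iadjust = I * (w * Ratiogray(I) + (1 - w) * Ratioavg).

    I: gray values (0-255) of pixels in the unfused region;
    ratio_gray: 256-entry lookup of average adjustment ratios per gray value;
    ratio_avg: average brightness adjustment ratio of the whole fusion region."""
    I = np.asarray(I, dtype=np.uint8)
    r = w * ratio_gray[I] + (1.0 - w) * ratio_avg
    return np.clip(I.astype(np.float64) * r, 0.0, 255.0)

# Illustrative: every gray level brightened 1.2x locally, 1.0x globally.
out = adjust_unfused(np.array([100]), np.full(256, 1.2), 1.0, w=0.5)
```

Blending the per-gray-value ratio with the global ratio keeps the non-common-view region consistent with the overall luminance change of the fused region while still respecting per-level differences.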
In one implementation manner of luminance transition in this embodiment, the calculation manner of the luminance value I transition of the transition region is:
Itransition=d*Imerge+(1-d)*Iadjust
where d is the blending weight between the two regions, taking values from 0 to 1 within the transition region and becoming smaller as the distance from the fusion region grows; the size of the transition region is generally 1% of the full-image resolution and can be adjusted according to the actual effect (the transition range may be enlarged when the boundary is obvious); and Imerge is the luminance of the matched pixel points in the common-view region.
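The transition blending can be sketched as below, assuming a linear falloff of d with distance from the fusion region (the falloff shape is an assumption; the text only says d decreases with distance):

```python
import numpy as np

def transition_luma(I_merge, I_adjust, dist, width):
    """Itransition = d * Imerge + (1 - d) * Iadjust, where the weight d falls
    linearly from 1 at the fusion-region boundary to 0 at distance `width`."""
    d = np.clip(1.0 - dist / float(width), 0.0, 1.0)
    return d * I_merge + (1.0 - d) * I_adjust
```

At the fusion-region boundary the result equals the fused luminance, and at the far edge of the transition band it equals the adjusted luminance, avoiding a visible seam between the two regions.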
In some embodiments of the present application, optionally, after the target image is obtained, it may also be displayed on a display screen; for example, the first target image is displayed on the left lens of the MR glasses and the second target image on the right lens.
In some embodiments of the present application, there is also provided an image processing apparatus for implementing the image processing method in any of the foregoing method embodiments.
Fig. 5 is an exemplary configuration diagram of an image processing apparatus according to an embodiment of the present application. As shown in fig. 5, the image processing apparatus 500 may include a control module 501 and a processing module 502.
As an example, the image processing apparatus 500 may be used to implement the processing method of the embodiment shown in fig. 3. For example, the control module 501 may be used to perform S310 and the processing module 502 may be used to perform S320.
The present application also provides an image processing apparatus, which may include the processor 103. Optionally, the image processing apparatus may further comprise a memory 104.
Optionally, the image processing apparatus may further include an image pickup apparatus 101 and an image pickup apparatus 102.
The present application also provides a computer storage medium having stored thereon an image processing program which, when executed by a processor, implements the steps of the image processing method according to any of the above embodiments.
The specific embodiments of the computer storage medium of the present application are substantially the same as the embodiments of the image processing method of the present application described above, and will not be repeated here.
The present application also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the image processing method according to the present application as described in any of the above embodiments, and is not described herein in detail.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) as described above, comprising instructions for causing a terminal device (which may be a TWS headset or the like) to perform the method according to the embodiments of the present application.

Claims (14)

1. An image processing method, comprising:
Controlling a first image pickup device and a second image pickup device to pick up images with the same exposure time at the same moment to obtain a first image and a second image, wherein the unit pixel size of the first image pickup device is different from that of the second image pickup device, the first image is the image with the same exposure time, which is obtained by the first image pickup device at the same moment, and the second image is the image with the same exposure time, which is obtained by the second image pickup device at the same moment;
and acquiring a target image based on the first image and the second image, wherein the brightness dynamic range of the target image is higher than that of the first image and that of the second image.
2. The method of claim 1, wherein the acquiring a target image based on the first image and the second image comprises:
matching the first image and the second image based on the similarity between the first image and the second image to obtain a first matching result;
Acquiring a parallax map of the first image and a parallax map of the second image based on the first matching result;
acquiring a depth map of the first image based on the parallax map of the first image;
Acquiring a depth map of the second image based on the parallax map of the second image;
Performing matching processing on the first image and the second image based on the depth map of the first image and the depth map of the second image to obtain a second matching result;
And carrying out brightness fusion processing on the first image and the second image according to the second matching result to obtain the target image.
3. The method of claim 2, wherein the matching the first image and the second image based on the similarity between the first image and the second image comprises:
Performing target processing on the first image and the second image respectively to obtain a third image and a fourth image, wherein the target processing comprises binocular stereo correction processing;
Matching the first image and the second image based on similarity between the third image and the fourth image.
4. A method according to claim 2 or 3, wherein said matching the first image and the second image based on the depth map of the first image and the depth map of the second image comprises:
Mapping the first image into the coordinate system of the first image pickup device based on the depth map of the first image and the conversion relation between the coordinate system of the first image and the coordinate system of the first image pickup device to obtain a first three-dimensional image;
converting the first three-dimensional image into the coordinate system of the second image pickup device based on the conversion relation between the coordinate system of the first image pickup device and the coordinate system of the second image pickup device to obtain a second three-dimensional image;
mapping the second image into the coordinate system of the second image pickup device based on the depth map of the second image and the conversion relation between the coordinate system of the second image and the coordinate system of the second image pickup device to obtain a third three-dimensional image;
the second matching result of the first image and the second image is determined based on the matching result of the second three-dimensional image and the third three-dimensional image.
5. An image processing apparatus comprising means for performing the method of any one of claims 1 to 4.
6. An image processing apparatus comprising a processor coupled to a memory for storing instructions that, when executed by the processor, cause the apparatus to perform the method of any of claims 1 to 4.
7. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed, causes the method according to any one of claims 1 to 4 to be performed.
8. A computer program product comprising a computer program which, when run, causes the method of any one of claims 1 to 4 to be performed.
9. An image processing apparatus, comprising a first image capturing device, a second image capturing device, and a processing unit, wherein a unit pixel size of the first image capturing device is different from a unit pixel size of the second image capturing device:
The processing unit is used for controlling a first image pickup device and a second image pickup device to pick up images with the same exposure time length at the same moment to obtain a first image and a second image, acquiring a target image based on the first image and the second image, wherein the unit pixel size of the first image pickup device is different from the unit pixel size of the second image pickup device, the first image is an image with the same exposure time length obtained by the first image pickup device at the same moment, the second image is an image with the same exposure time length obtained by the second image pickup device at the same moment, and the brightness dynamic range of the target image is higher than that of the first image and the second image.
10. The apparatus according to claim 9, wherein the processing unit is configured to, when acquiring a target image based on the first image and the second image, in particular:
The method comprises the steps of obtaining a first matching result by matching a first image and a second image based on similarity between the first image and the second image, obtaining a parallax image of the first image and a parallax image of the second image based on the first matching result, obtaining a depth image of the first image based on the parallax image of the first image, obtaining a depth image of the second image based on the parallax image of the second image, carrying out matching processing on the first image and the second image based on the depth image of the first image and the depth image of the second image to obtain a second matching result, and carrying out brightness fusion processing on the first image and the second image according to the second matching result to obtain the target image.
11. The apparatus according to claim 10, wherein the processing unit is configured to match the first image and the second image based on a similarity between the first image and the second image, in particular configured to:
performing target processing on the first image and the second image respectively to obtain a third image and a fourth image, wherein the target processing comprises binocular stereo correction processing; matching the first image and the second image based on similarity between the third image and the fourth image.
12. The apparatus according to claim 10 or 11, wherein the processing unit is configured to perform a matching process on the first image and the second image based on a depth map of the first image and a depth map of the second image, in particular configured to:
The method comprises the steps of mapping a first image into a coordinate system of a first camera device based on a depth map of the first image and a conversion relation between the coordinate system of the first image and the coordinate system of the first camera device to obtain a first three-dimensional image, converting the first three-dimensional image into a coordinate system of a second camera device based on a conversion relation between the coordinate system of the first camera device and the coordinate system of the second camera device to obtain a second three-dimensional image, mapping the second image into the coordinate system of the second camera device based on the depth map of the second image and a conversion relation between the coordinate system of the second image and the coordinate system of the second camera device to obtain a third three-dimensional image, and determining the second matching result of the first image and the second image based on the matching result of the second three-dimensional image and the third three-dimensional image.
13. A terminal device comprising an image processing apparatus according to any one of claims 9 to 12.
14. The terminal device of claim 13, wherein the terminal device is a mediated reality MR glasses.
CN202310621936.5A 2023-05-29 2023-05-29 Image processing methods and related devices Active CN119052659B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310621936.5A CN119052659B (en) 2023-05-29 2023-05-29 Image processing methods and related devices

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310621936.5A CN119052659B (en) 2023-05-29 2023-05-29 Image processing methods and related devices

Publications (2)

Publication Number Publication Date
CN119052659A CN119052659A (en) 2024-11-29
CN119052659B true CN119052659B (en) 2025-11-07

Family

ID=93580791

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310621936.5A Active CN119052659B (en) 2023-05-29 2023-05-29 Image processing methods and related devices

Country Status (1)

Country Link
CN (1) CN119052659B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109218627A (en) * 2018-09-18 2019-01-15 Oppo广东移动通信有限公司 Image processing method, device, electronic device and storage medium
CN110381263A (en) * 2019-08-20 2019-10-25 Oppo广东移动通信有限公司 Image processing method, image processing device, storage medium and electronic equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112449120B (en) * 2019-08-30 2022-06-10 华为技术有限公司 High dynamic range video generation method and device
CN115942125B (en) * 2021-09-26 2025-04-08 Oppo广东移动通信有限公司 Multi-exposure image processing method and device and noise reduction circuit
CN116055890B (en) * 2022-08-29 2024-08-02 荣耀终端有限公司 Method and electronic device for generating high dynamic range video

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109218627A (en) * 2018-09-18 2019-01-15 Oppo广东移动通信有限公司 Image processing method, device, electronic device and storage medium
CN110381263A (en) * 2019-08-20 2019-10-25 Oppo广东移动通信有限公司 Image processing method, image processing device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN119052659A (en) 2024-11-29

Similar Documents

Publication Publication Date Title
KR102278776B1 (en) Image processing method, apparatus, and apparatus
CN108055452B (en) Image processing method, device and equipment
JP6271990B2 (en) Image processing apparatus and image processing method
US8941750B2 (en) Image processing device for generating reconstruction image, image generating method, and storage medium
JP5762211B2 (en) Image processing apparatus, image processing method, and program
JP5725953B2 (en) Imaging apparatus, control method therefor, and information processing apparatus
US20130101177A1 (en) Motion estimation apparatus, depth estimation apparatus, and motion estimation method
JP6452360B2 (en) Image processing apparatus, imaging apparatus, image processing method, and program
KR20160090373A (en) Photographing method for dual-camera device and dual-camera device
US11282176B2 (en) Image refocusing
JP2018510324A (en) Method and apparatus for multi-technology depth map acquisition and fusion
JP2017112602A (en) Image calibrating, stitching and depth rebuilding method of panoramic fish-eye camera and system thereof
JP2019510234A (en) Depth information acquisition method and apparatus, and image acquisition device
JP2015197745A (en) Image processing apparatus, imaging apparatus, image processing method, and program
CN108053363A (en) Background blurring processing method, device and equipment
JP2015073185A (en) Image processing device, image processing method and program
CN108230384A (en) Picture depth computational methods, device, storage medium and electronic equipment
CN109257540B (en) Photographing correction method of multi-photographing lens group and photographing device
JP2020194454A (en) Image processing device and image processing method, program, and storage medium
US20140192163A1 (en) Image pickup apparatus and integrated circuit therefor, image pickup method, image pickup program, and image pickup system
CN115035235A (en) Three-dimensional reconstruction method and device
CN111882655A (en) Method, apparatus, system, computer device and storage medium for three-dimensional reconstruction
CN112422848B (en) Video stitching method based on depth map and color map
CN119052659B (en) Image processing methods and related devices
CN116704111B (en) Image processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant