CN119135868A

CN119135868A - A method and device for generating binocular images based on monocular images

Info

Publication number: CN119135868A
Application number: CN202411341309.7A
Authority: CN
Inventors: 陈伯伦
Original assignee: Hangzhou Xiaoying Innovation Technology Co ltd
Current assignee: Hangzhou Xiaoying Innovation Technology Co ltd
Priority date: 2024-09-25
Filing date: 2024-09-25
Publication date: 2024-12-13

Abstract

The present invention discloses a method and device for generating a binocular image based on a monocular image, which belongs to the field of image processing technology, and includes the following steps: obtaining an original monocular image, and performing a monocular depth prediction on the original monocular image to obtain a foreground depth map; generating a mask depth map and a background depth map according to the foreground depth map, and performing mixed calculation on the foreground depth map, the mask depth map and the background depth map to obtain a target depth map; obtaining the distortion change values corresponding to the left and right eye views, respectively, and deforming the original monocular image according to the distortion change values and the target depth map to obtain a binocular image corresponding to the original monocular image. This application realizes the direct operation and display of stereoscopic images on devices such as iPhone, iPad, Mac and VisionOS. Through this method, users can quickly convert previously shot or recorded materials into stereoscopic images and view them, which is simple and fast, and effectively improves the user experience.

Description

Method and device for generating binocular picture based on monocular picture

Technical Field

The invention relates to the technical field of image processing, in particular to a method and a device for generating a binocular picture based on a monocular picture.

Background

With growing market demand for stereoscopic effects and the frequent iteration of spatial devices such as Virtual Reality (VR) and Mixed Reality (MR) devices, ordinary pictures shot or recorded with a conventional monocular camera cannot meet users' demand for an immersive experience on these devices. Stereoscopic pictures significantly improve the visual experience, but their adoption remains low because of the limitations of existing hardware.

Some solutions on the market have tried to address this problem. For example, the Vision Pro device provides, in the visionOS 2.0.0 Beta version that has not been formally released, a function for converting an ordinary photo into a photo with a sense of spatial depth. However, this function applies only to photos and is limited to a specific operating system version and device type; devices such as iPhone, iPad and Mac do not support it.

In addition, most of the technical schemes on the market rely on cloud servers to perform complex image processing work, which means that data needs to be uploaded to the servers to complete conversion and finally downloaded back to the client device.

Disclosure of Invention

The invention aims to provide a method and a device for generating a binocular picture based on a monocular picture, so as to solve the problems in the prior art that converting a monocular picture into a binocular picture is limited by operating system version and device type and depends on a cloud server for image processing.

In order to achieve the above purpose, the present application adopts the following technical scheme:

The application discloses a method for generating a binocular picture based on a monocular picture, which comprises the following steps:

acquiring an original monocular picture, and performing monocular depth prediction on the original monocular picture to obtain a foreground depth map;

respectively generating a mask depth map and a background depth map according to the foreground depth map, and performing mixed calculation on the foreground depth map, the mask depth map and the background depth map to obtain a target depth map;

and obtaining distortion change values corresponding to the left eye view and the right eye view respectively, and deforming the original monocular picture according to the distortion change values and the target depth map to obtain a binocular picture corresponding to the original monocular picture.

Preferably, generating a mask depth map from the foreground depth map includes:

processing the foreground depth map by using an edge detection algorithm to obtain an edge detection image, and judging whether the edge detection image is normally generated or not;

When the edge detection image is normally generated, converting the edge detection image into a binary image by utilizing a color threshold algorithm, and judging whether the binary image is normally generated or not;

when the binary image is normally generated, performing smoothing processing on the binary image by using a blurring algorithm to obtain a smoothed binary image, and judging whether the smoothed binary image is normally generated or not;

And when the smoothed binary image is generated normally, performing background extraction on the smoothed binary image by using a color threshold algorithm to obtain a mask depth map.

Preferably, generating a background depth map according to the foreground depth map includes:

and carrying out smoothing treatment on the foreground depth map by using a blurring algorithm to obtain a background depth map.

Preferably, the performing a hybrid calculation on the foreground depth map, the mask depth map, and the background depth map to obtain a target depth map includes:

And respectively sampling the foreground depth map, the mask depth map and the background depth map, performing mixed calculation on all sampling results according to the mask value, and outputting a target depth map.

Preferably, the obtaining a distortion variation value corresponding to each of the left eye view and the right eye view, and deforming the original monocular image according to the distortion variation value and the target depth map to obtain a binocular image corresponding to the original monocular image includes:

sampling the target depth map to obtain a first sampling result, and sampling the original monocular image according to a first channel value of the first sampling result to obtain a second sampling result;

Calculating distortion change values corresponding to the left eye view and the right eye view respectively according to the focal length of the camera and the distance between the two eyes and the depth information of the second sampling result;

respectively carrying out matrix transformation on the original monocular images according to the distortion change values respectively corresponding to the left eye view and the right eye view to obtain the left eye view and the right eye view;

And respectively extracting the transparency in the left eye view and the right eye view, and optimizing the transparency by using a transparency optimization algorithm to obtain a binocular picture.

Preferably, the first channel value is depth information.

Preferably, the performing matrix transformation on the original monocular image according to the distortion variation values corresponding to the left eye view and the right eye view to obtain the left eye view and the right eye view respectively includes:

Respectively carrying out matrix transformation on the original monocular picture according to the distortion change values respectively corresponding to the left eye view and the right eye view to obtain two groups of new pixel positions, and judging whether the two groups of new pixel positions are beyond the image range;

when the two groups of new pixel positions are not beyond the image range, calculating the offset between the second sampling result and the two groups of new pixel positions;

Creating a grid containing all pixel coordinates in the second sampling result, and respectively adding the offset to corresponding coordinates in the grid to obtain new coordinates of corresponding left and right views;

and remapping the second sampling result according to the new coordinates respectively to obtain left and right eye views.
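
The grid-and-offset remapping described in the sub-steps above can be sketched as follows. This is only an illustrative sketch, not the implementation of the application: the two-channel `Pixel` type, the function name `remap_view`, nearest-neighbour sampling, and the choice to leave out-of-range positions fully transparent are all assumptions made for the example.

```python
from typing import List, Tuple

Pixel = Tuple[float, float]      # hypothetical (luminance, alpha) pixel
Image = List[List[Pixel]]        # row-major picture
Offsets = List[List[float]]      # per-pixel horizontal offset, in pixels


def remap_view(image: Image, offsets: Offsets) -> Image:
    """Remap a picture through a coordinate grid shifted by per-pixel offsets."""
    h, w = len(image), len(image[0])
    # Grid containing every pixel coordinate of the sampled picture.
    grid = [[(x, y) for x in range(w)] for y in range(h)]
    # Start from fully transparent pixels (assumed handling of empty areas).
    view: Image = [[(0.0, 0.0)] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx, gy = grid[y][x]
            # New coordinate = grid coordinate + offset (horizontal only).
            new_x = int(round(gx + offsets[y][x]))
            # Only positions that stay inside the image range are sampled.
            if 0 <= new_x < w:
                view[y][x] = image[gy][new_x]
    return view
```

Under these assumptions, calling `remap_view` once with the left-eye offsets and once with the right-eye offsets would produce the two views; positions that fall outside the image range keep an alpha of 0 and can be treated by a later transparency step.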

An apparatus for generating a binocular picture based on a monocular picture, comprising:

The prediction module is used for acquiring an original monocular picture and performing monocular depth prediction on the original monocular picture to obtain a foreground depth map;

the conversion module is used for respectively generating a mask depth map and a background depth map according to the foreground depth map, and carrying out mixed calculation on the foreground depth map, the mask depth map and the background depth map to obtain a target depth map;

and the distortion module is used for acquiring distortion change values corresponding to the left eye view and the right eye view respectively, and deforming the original monocular picture according to the distortion change values and the target depth map to obtain a binocular picture corresponding to the original monocular picture.

An electronic device comprising a memory and a processor, the memory being used to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method for generating a binocular picture based on a monocular picture as described above.

A computer-readable storage medium storing a computer program which, when executed by a computer, implements the method for generating a binocular picture based on a monocular picture as described above.

The application has the following beneficial effects:

The method makes it possible to process and display stereoscopic pictures directly on iPhone, iPad, Mac, visionOS and other devices. Users can quickly convert previously shot or recorded material into stereoscopic pictures and view them. In a player application, users can play ordinary videos directly and have them rendered immediately as stereoscopic pictures by the method. In a shooting application, users can shoot and record stereoscopic pictures on devices that did not originally support stereoscopic shooting. In addition, the method supports stereoscopic upgrading of user material in applications such as network disks, so that the user experience on spatial devices such as VR and MR devices is significantly improved.

Drawings

FIG. 1 is a flow chart of a method for generating a binocular picture based on a monocular picture according to an embodiment of the present application;

FIG. 2 is a schematic structural diagram of an apparatus for generating a binocular picture based on a monocular picture according to an embodiment of the present application;

FIG. 3 is a schematic diagram of an electronic device for implementing a method for generating a binocular picture based on a monocular picture according to an embodiment of the present application.

Detailed Description

In order to make the technical scheme of the application clearer, the application is described in further detail below with reference to the drawings and specific embodiments. The terms "first", "second" and the like in the claims and the description of the application are used to distinguish between similar objects and not necessarily to describe a particular sequential or chronological order. It is to be understood that terms so used are interchangeable where appropriate; they merely describe how objects of the same nature are distinguished in the embodiments of the application. Furthermore, the terms "comprise" and "have" and any variations thereof are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

As shown in FIG. 1, the present embodiment provides a method for generating a binocular picture based on a monocular picture, which includes the following steps:

S110, acquiring an original monocular picture, and performing monocular depth prediction on the original monocular picture to obtain a foreground depth map;

S120, respectively generating a mask depth map and a background depth map according to the foreground depth map, and performing mixed calculation on the foreground depth map, the mask depth map and the background depth map to obtain a target depth map;

S130, obtaining distortion change values corresponding to the left eye view and the right eye view respectively, and deforming the original monocular picture according to the distortion change values and the target depth map to obtain a binocular picture corresponding to the original monocular picture.

The method provided by this embodiment is mainly applied to applications in the Apple ecosystem (iOS, iPadOS, macOS and visionOS), to spatial devices such as VR and MR devices, or to a server side. It is used to convert an ordinary picture into a stereoscopic binocular picture, that is, 2D-to-3D conversion rather than 3D modeling.

Specifically, the original monocular picture to be processed is obtained, and monocular depth prediction is performed on it through an AI model to obtain the foreground depth map. The AI model may be a monocular depth estimation (Monocular Depth Estimation) model or another open-source model that achieves the same effect.

Further, the foreground depth map is processed with an edge detection algorithm to obtain an edge detection image, and whether the edge detection image is normally generated is judged; when the edge detection image is normally generated, it is converted into a binary image with a color threshold algorithm, and whether the binary image is normally generated is judged; when the binary image is normally generated, it is smoothed with a blurring algorithm to obtain a smoothed binary image, and whether the smoothed binary image is normally generated is judged; when the smoothed binary image is normally generated, background extraction is performed on it with a color threshold algorithm to obtain the mask depth map.

Specifically, the foreground depth map is processed with an edge detection algorithm to obtain an edge detection image that highlights the edges of the depth information. Whether the edge detection image is normally generated is then judged, that is, whether the output picture is normal: for example, whether the picture is empty and whether its encoding format is supported. The specific checks can be adjusted according to the online running environment. If the output is normal, the edge detection image is binarized with a color threshold algorithm: pixels whose brightness is higher than a set threshold are set to one color and pixels whose brightness is lower than the set threshold are set to the other color, so that the colors in the edge detection image are split into two extremes and a high-contrast binary image is obtained. Whether the binary image is normally generated is judged in the same way. If it is, the binary image is smoothed with a blurring algorithm, and whether the smoothed binary image is normally generated is judged. If it is, background extraction is finally performed on the smoothed binary image with a color threshold algorithm to obtain the mask depth map.
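
The four-stage mask generation with a validity check after each stage can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the application: the central-difference gradient stands in for an unspecified edge detection algorithm, a box blur stands in for the blurring algorithm, the threshold values 0.25 and 0.1 are arbitrary, and `is_valid` is a hypothetical stand-in for the "normally generated" check.

```python
from typing import Callable, List, Optional

Image = List[List[float]]  # row-major grayscale map, values in [0, 1]


def is_valid(img: Optional[Image]) -> bool:
    """Hypothetical 'normally generated' check: non-empty and rectangular."""
    return bool(img) and bool(img[0]) and all(len(r) == len(img[0]) for r in img)


def edge_detect(depth: Image) -> Image:
    """Gradient magnitude from central differences (stand-in edge detector)."""
    h, w = len(depth), len(depth[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = depth[y][min(x + 1, w - 1)] - depth[y][max(x - 1, 0)]
            gy = depth[min(y + 1, h - 1)][x] - depth[max(y - 1, 0)][x]
            out[y][x] = min(1.0, (gx * gx + gy * gy) ** 0.5)
    return out


def threshold(img: Image, t: float) -> Image:
    """Color threshold: brighter than t becomes 1.0, everything else 0.0."""
    return [[1.0 if v > t else 0.0 for v in row] for row in img]


def box_blur(img: Image, radius: int = 1) -> Image:
    """Mean filter over a (2 * radius + 1) square window, clipped at borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - radius), min(h, y + radius + 1))
                    for i in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = sum(vals) / len(vals)
    return out


def make_mask(foreground_depth: Image) -> Optional[Image]:
    """Run the four stages in order; stop with None if any output is invalid."""
    if not is_valid(foreground_depth):
        return None
    stages: List[Callable[[Image], Image]] = [
        edge_detect,                       # highlight depth edges
        lambda im: threshold(im, 0.25),    # binarize (assumed threshold)
        lambda im: box_blur(im, 1),        # smooth the binary image
        lambda im: threshold(im, 0.1),     # extract the mask (assumed threshold)
    ]
    img = foreground_depth
    for stage in stages:
        img = stage(img)
        if not is_valid(img):
            return None
    return img
```

In this sketch the final threshold widens the band around each depth edge, so the resulting mask marks the regions where foreground and background depth meet.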

The foreground depth map is then processed using a blurring algorithm to obtain a background depth map with a soft, smooth visual effect.
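
As an illustration of this smoothing step, the sketch below blurs a depth map with a separable Gaussian kernel. The application does not name a specific blurring algorithm, so the Gaussian kernel, the `radius` and `sigma` defaults, and the border clamping are all assumptions made for the example.

```python
import math
from typing import List

Image = List[List[float]]  # row-major depth map


def gaussian_kernel(radius: int, sigma: float) -> List[float]:
    """Normalized 1-D Gaussian weights of length 2 * radius + 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(img: Image, kernel: List[float]) -> Image:
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    w = len(img[0])
    return [
        [sum(k * row[min(max(x + i - radius, 0), w - 1)]
             for i, k in enumerate(kernel))
         for x in range(w)]
        for row in img
    ]


def transpose(img: Image) -> Image:
    return [list(col) for col in zip(*img)]


def background_depth(foreground_depth: Image,
                     radius: int = 2, sigma: float = 1.0) -> Image:
    """Blur horizontally, then vertically, to get a smooth background map."""
    kernel = gaussian_kernel(radius, sigma)
    horizontal = blur_rows(foreground_depth, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))
```

A sharp depth step such as `[0.0, 0.0, 1.0, 1.0]` becomes a gradual ramp after blurring, which is the soft, smooth effect the background depth map is meant to have.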

The foreground depth map focuses primarily on the foreground objects in the scene, i.e., the objects closest to the camera or observer. It provides distance information between the foreground objects and the camera and is typically used to highlight or analyze the primary elements in the scene.

The background depth map focuses on the background portion of the scene, i.e., the objects located behind the foreground. It provides distance information between the background objects and the camera, helping to understand the overall structure of the scene and the location of the background elements.

The mask depth map, also known as a depth mask, is a special depth map that creates a mask by assigning a depth value to each pixel. The mask can be used to cover or hide certain parts of a scene for specific visual effects or analysis purposes, and it is widely used in computer graphics, virtual reality, augmented reality and similar fields. For example, adjusting the mask depth map controls which parts of the information are displayed or hidden, thereby achieving more complex image processing effects.

The obtained foreground depth map, mask depth map and background depth map are then mixed. Specifically, the three maps are each sampled, all sampling results are mixed according to the mask value, and the result is output as the target depth map.
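
One plausible reading of mixing "according to the mask value" is a per-pixel linear interpolation, sketched below. The direction of the blend (mask 0 keeps the foreground depth, mask 1 takes the background depth) and the function name `blend_depth` are assumptions; the application does not state the exact formula.

```python
from typing import List

Image = List[List[float]]  # row-major map, values in [0, 1]


def blend_depth(foreground: Image, background: Image, mask: Image) -> Image:
    """Per-pixel linear mix: mask 0 keeps foreground, mask 1 uses background."""
    if not (len(foreground) == len(background) == len(mask)):
        raise ValueError("all three maps must have the same height")
    target: Image = []
    for fg_row, bg_row, m_row in zip(foreground, background, mask):
        if not (len(fg_row) == len(bg_row) == len(m_row)):
            raise ValueError("all three maps must have the same width")
        # target = fg * (1 - m) + bg * m, evaluated for each sampled pixel.
        target.append([fg * (1.0 - m) + bg * m
                       for fg, bg, m in zip(fg_row, bg_row, m_row)])
    return target
```

With this choice, the smoothed background depth replaces the sharp foreground depth only where the mask marks a depth edge, which softens depth discontinuities while leaving the rest of the map unchanged.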

Further, sampling the target depth map to obtain a first sampling result, and sampling the original monocular image according to a first channel value of the first sampling result to obtain a second sampling result;

The target depth map is sampled first, and the first channel value of the first sampling result is extracted. A depth map is generally single-channel, with each pixel holding only one value that represents the depth. A depth map may also be multi-channel, in which case only one channel carries the actual depth information and the other channels may be reserved for compatibility with certain image processing libraries or formats. The "first channel" therefore usually refers to the single channel that stores the depth information in the depth map, so the first channel value extracted here is essentially the depth information. The original monocular picture is sampled according to this depth information, while ensuring that the sampled original monocular picture, namely the second sampling result, is consistent in size with the sampled target depth map.

Next, the parallax of each pixel relative to the central viewpoint, namely its displacement in the horizontal direction, is calculated according to the focal length of the camera, the interocular distance, and the depth information of the second sampling result. This displacement is the distortion change value: it is negative (left offset) for the left eye view and positive (right offset) for the right eye view.

After the calculation is completed, matrix transformation is performed on the original monocular picture based on the distortion change values corresponding to the left eye view and the right eye view respectively, changing the positions of the pixels in the original monocular picture and yielding two groups of new pixel positions, one for each view. The two groups of new pixel positions are each compared with the image range. If they do not exceed the image range, the offset between the second sampling result and the two groups of new pixel positions is calculated; at the same time, a grid containing all pixel coordinates in the second sampling result is created, and the offsets are added to the corresponding coordinates in the grid to obtain, for each pixel, the new coordinates in the left and right views. The second sampling result is then remapped according to the new coordinates to obtain the left eye view and the right eye view.

Finally, the transparency (Alpha) in the left eye view and the right eye view is extracted and optimized with a transparency optimization algorithm. Specifically, four sampling directions (up, down, left and right) are defined; for each pixel, the positions of the sampling points in the four directions around it are calculated, the Alpha values of the four sampling points are acquired and compared, and the maximum value is taken as the final transparency value of the pixel. This yields the optimized left and right eye views, namely the binocular picture.
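
The parallax calculation and the transparency optimization described above can be sketched as follows. The application does not give the parallax formula, so the pinhole relation d = f · B / Z (focal length f, interocular distance B, depth Z), the equal split of the shift between the two eyes, and the assumption that the depth values are metric distances are all illustrative; many monocular depth models output relative or inverse depth, which would need a different mapping. The function names and the `step` parameter are hypothetical.

```python
from typing import List, Tuple

Map = List[List[float]]  # row-major per-pixel values


def distortion_values(depth: Map, focal_length: float, eye_distance: float,
                      min_depth: float = 1e-6) -> Tuple[Map, Map]:
    """Return (left, right) horizontal shifts relative to the central viewpoint."""
    left: Map = []
    right: Map = []
    for row in depth:
        # Assumed pinhole relation: disparity = f * B / Z.
        disparity = [focal_length * eye_distance / max(z, min_depth) for z in row]
        left.append([-d / 2.0 for d in disparity])   # negative: left offset
        right.append([d / 2.0 for d in disparity])   # positive: right offset
    return left, right


def optimize_alpha(alpha: Map, step: int = 1) -> Map:
    """For each pixel, take the largest alpha among four directional samples."""
    h, w = len(alpha), len(alpha[0])
    directions = [(0, -step), (0, step), (-step, 0), (step, 0)]  # up, down, left, right
    out: Map = []
    for y in range(h):
        row = []
        for x in range(w):
            # Sample positions are clamped so they stay inside the image.
            samples = [alpha[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                       for dx, dy in directions]
            row.append(max(samples))
        out.append(row)
    return out
```

Taking the maximum of the neighbouring Alpha values fills isolated transparent pixels left behind by the remapping, at the cost of slightly growing opaque regions at their borders.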

This embodiment makes it possible to process and display stereoscopic pictures directly on iPhone, iPad, Mac, visionOS and other devices. Users can quickly convert previously shot or recorded material into stereoscopic pictures and view them. In a player application, users can play ordinary videos directly and have them rendered immediately as stereoscopic pictures by the method. In a shooting application, users can shoot and record stereoscopic pictures on devices that did not originally support stereoscopic shooting. In addition, the method supports stereoscopic upgrading of user material in applications such as network disks, so that the user experience on spatial devices such as VR and MR devices is significantly improved.

As shown in FIG. 2, the present embodiment further provides an apparatus for generating a binocular picture based on a monocular picture, including:

the prediction module 10 is configured to obtain an original monocular image, and perform monocular depth prediction on the original monocular image to obtain a foreground depth map;

the conversion module 20 is configured to generate a mask depth map and a background depth map according to the foreground depth map, and perform a hybrid calculation on the foreground depth map, the mask depth map, and the background depth map to obtain a target depth map;

the distortion module 30 is configured to obtain distortion change values corresponding to the left eye view and the right eye view respectively, and deform the original monocular picture according to the distortion change values and the target depth map to obtain a binocular picture corresponding to the original monocular picture.

In one embodiment of the apparatus, the prediction module 10 obtains an original monocular picture and performs monocular depth prediction on it to obtain a foreground depth map. The conversion module 20 generates a mask depth map and a background depth map from the foreground depth map obtained by the prediction module 10, and performs mixed calculation on the foreground depth map, the mask depth map and the background depth map to obtain a target depth map. The distortion module 30 obtains the distortion change values corresponding to the left eye view and the right eye view respectively, and deforms the original monocular picture according to the distortion change values and the target depth map obtained by the conversion module 20 to obtain a binocular picture corresponding to the original monocular picture.

As shown in FIG. 3, the present embodiment further provides an electronic device including a memory 301 and a processor 302, where the memory 301 is configured to store one or more computer instructions, and the one or more computer instructions are executed by the processor 302 to implement the method for generating a binocular picture based on a monocular picture as described above.

It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the electronic device described above may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.

A computer-readable storage medium storing a computer program which, when executed by a computer, implements a method of generating a binocular picture based on a monocular picture as described above.

By way of example, the computer program may be divided into one or more modules/units, which are stored in the memory 301 and executed by the processor 302, with input and output handled through the input interface 305 and the output interface 306, to carry out the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, the instruction segments being used to describe the execution of the computer program in a computer device.

The computer device may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or the like. The computer device may include, but is not limited to, the memory 301 and the processor 302. Those skilled in the art will understand that the present embodiment is merely an example of a computer device and does not limit it; the computer device may include more or fewer components, combine certain components, or use different components. For example, the computer device may also include an input device 307, a network access device, a bus, and so on.

The processor 302 may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.

The memory 301 may be an internal storage unit of the computer device, such as a hard disk or a memory of the computer device. The memory 301 may also be an external storage device of the computer device, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a Secure Digital (SD) card, a flash card (Flash Card) or the like. Further, the memory 301 may include both an internal storage unit of the computer device and an external storage device. The memory 301 may be used to store computer programs and other programs and data required by the computer device, and may also be used to temporarily store the programs and data in the output device 308. The foregoing storage media include a USB flash drive, a removable hard disk, a read-only memory (ROM) 303, a random access memory (RAM) 304, a magnetic disk, an optical disk, and other media that can store program code.

The foregoing examples illustrate only a few embodiments of the invention. Although they are described in detail, they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

Claims

1. A method for generating a binocular picture based on a monocular picture, comprising the steps of:

2. The method for generating a binocular picture based on a monocular picture according to claim 1, wherein generating a mask depth map from the foreground depth map comprises:

3. The method for generating a binocular picture based on a monocular picture according to claim 1, wherein generating a background depth map from the foreground depth map comprises:

4. The method for generating a binocular image based on a monocular image according to claim 1, wherein the performing a hybrid calculation on the foreground depth map, the mask depth map, and the background depth map to obtain a target depth map includes:

5. The method for generating a binocular picture based on a monocular picture according to claim 1, wherein the obtaining the distortion variation values corresponding to the left eye view and the right eye view respectively, and deforming the original monocular picture according to the distortion variation values and the target depth map to obtain the binocular picture corresponding to the original monocular picture, comprises:

6. The method for generating a binocular picture based on a monocular picture according to claim 5, wherein the first channel value is depth information.

7. The method for generating a binocular picture based on a monocular picture according to claim 5, wherein the performing matrix transformation on the original monocular picture according to the distortion variation values corresponding to the left-eye view and the right-eye view respectively to obtain the left-eye view and the right-eye view respectively comprises:

8. An apparatus for generating a binocular picture based on a monocular picture, comprising:

9. An electronic device comprising a memory and a processor, the memory configured to store one or more computer instructions, wherein the one or more computer instructions are executable by the processor to implement a method of generating a binocular picture based on a monocular picture as claimed in any one of claims 1 to 7.

10. A computer-readable storage medium storing a computer program, wherein the computer program when executed causes a computer to implement a method of generating a binocular picture based on a monocular picture as claimed in any one of claims 1 to 7.