
CN110400337B - Image processing method, image processing device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN110400337B
CN110400337B
Authority
CN
China
Prior art keywords
image
dimensional position
pixel
depth information
neural network
Prior art date
Legal status
Active
Application number
CN201910618669.XA
Other languages
Chinese (zh)
Other versions
CN110400337A (en)
Inventor
安世杰
张渊
马重阳
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN201910618669.XA
Publication of CN110400337A
Application granted
Publication of CN110400337B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to an image processing method, an image processing device, electronic equipment, and a storage medium. Depth information of each pixel of an image to be processed is acquired; the three-dimensional position of each pixel in the coordinate system of the image acquisition device is obtained from the depth information and the two-dimensional position of the pixel in the image coordinate system; a viewing angle parameter and the focusing three-dimensional position of a focusing point are acquired, the viewing angle parameter being a parameter of a viewing angle different from the fixed observation viewing angle corresponding to the image to be processed; the shifted three-dimensional position of each pixel is obtained from the focusing three-dimensional position, the viewing angle parameter, and the pixel three-dimensional position; and each pixel is projected into the two-dimensional coordinate system of the image to be processed according to its shifted three-dimensional position to obtain a target image. With this scheme, the scene in the image to be processed has different display effects corresponding to different observation viewing angles.

Description

Image processing method, image processing device, electronic equipment and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image processing method and apparatus, an electronic device, and a storage medium.
Background
The scene in an image captured by an image capturing device is often a scene that a user can observe in the real world. When a user observes such a scene in the real world, the observation effect differs with the observation viewing angle. For example, when the same scene is observed from a left viewing angle, the left side of the scene appears clear and the right side appears blurred; when it is observed from a right viewing angle, the right side appears clear and the left side appears blurred.
However, an image capturing device usually captures an image at a fixed viewing angle, so when the captured image is displayed, the scene is shown only with the fixed effect it had at that viewing angle when the image was captured. How to make the scene in a captured image exhibit different display effects corresponding to different observation viewing angles is therefore a problem to be solved.
Disclosure of Invention
In order to overcome the problems in the related art, the present application provides an image processing method, an apparatus, an electronic device, and a storage medium.
According to a first aspect of embodiments of the present application, there is provided an image processing method, the method including:
acquiring depth information of each pixel of an image to be processed;
obtaining the three-dimensional position of the pixel in the coordinate system of the image acquisition device according to the depth information and the two-dimensional position of the pixel in the image coordinate system;
acquiring a visual angle parameter and a focusing three-dimensional position of a focusing point; the view angle parameters are parameters of view angles different from fixed observation view angles corresponding to the image to be processed; the focusing point is a point which is used as a rotating shaft when the view angle of the scene in the image to be processed is changed;
obtaining the shifted three-dimensional position of the pixel according to the focusing three-dimensional position, the view angle parameter and the pixel three-dimensional position; wherein the shifted three-dimensional position is a three-dimensional position of a scene observed when the scene at the pixel three-dimensional position is observed under the viewing angle parameter;
and projecting each pixel into a two-dimensional coordinate system of the image to be processed according to the shifted three-dimensional position of each pixel to obtain a target image.
Optionally, the step of obtaining depth information of each pixel of the image to be processed includes:
inputting the image to be processed into a preset neural network model to obtain the depth information; the preset neural network model is a model obtained by utilizing a plurality of sample images and depth information labels of the sample images in advance for training; the scene in the sample image is the same as the type of the scene in the image to be processed; the type of the scene is a type divided according to a distribution difference of depths of the scene.
Optionally, the preset neural network is obtained by training through the following steps:
respectively inputting the plurality of sample images into an initial neural network model for training to obtain the predicted depth information of each sample image;
judging whether the neural network model in the current training stage is converged or not according to the predicted depth information, the depth information label, the first loss function, the second loss function and the third loss function; wherein the first loss function is a loss function for calculating an overall error of the predicted depth information and the depth information tag; the second loss function is a loss function for calculating an error of the predicted depth information and the depth information label in a gradient direction; the third loss function is a loss function used for calculating errors of the predicted depth information and the depth information label in a normal vector direction;
if the neural network model is converged, determining the neural network model in the current training stage as the preset neural network model;
if not, adjusting the model parameters of the neural network model in the current training stage by using a random gradient descent algorithm to obtain an adjusted neural network model;
and respectively inputting the plurality of sample images into the adjusted neural network model, and repeating the steps of training and adjusting the model parameters until the adjusted neural network model converges.
Optionally, the step of inputting the plurality of sample images into an initial neural network model for training to obtain the predicted depth information of each sample image includes:
dividing the plurality of sample images into an image set corresponding to the type of the scene according to the type of the scene in each sample image;
counting a first total number of the plurality of sample images and a second total number of sample images in each image set;
taking a ratio of the first total number and the second total number of the image set as a sampling weight of the image set;
selecting a number of sample images corresponding to the sampling weight in the image set, inputting an initial neural network model for training, and obtaining the predicted depth information of the sample images.
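The sampling-weight scheme above can be illustrated with a short sketch. It is a minimal illustration under the assumption that the sample images are grouped by scene type in a dictionary; the function name and the probabilistic drawing are illustrative and not part of the claimed method.

```python
import random

def sample_training_batch(image_sets, batch_size):
    """Illustrative sketch of the sampling scheme above: the sampling weight of
    each image set is the ratio of the first total (all sample images) to the
    second total (images in that set), so under-represented scene types are
    drawn more often. `image_sets` maps a scene type to its list of images."""
    first_total = sum(len(images) for images in image_sets.values())
    weights = {scene: first_total / len(images) for scene, images in image_sets.items()}
    # Normalize the weights into per-set sampling probabilities.
    weight_sum = sum(weights.values())
    probs = {scene: w / weight_sum for scene, w in weights.items()}
    batch = []
    for _ in range(batch_size):
        scene = random.choices(list(probs), weights=list(probs.values()), k=1)[0]
        batch.append(random.choice(image_sets[scene]))
    return batch
```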
Optionally, the step of obtaining a three-dimensional position of the pixel in a coordinate system of the image acquisition device according to the depth information and a two-dimensional position of the pixel in the two-dimensional coordinate system of the image includes:
converting the two-dimensional position of the pixel to homogeneous coordinates;
and taking the depth information of the pixel as the Z coordinate of the homogeneous coordinate of the pixel to obtain the three-dimensional position of the pixel in the coordinate system of the image acquisition device.
Optionally, the step of obtaining the shifted three-dimensional position of the pixel according to the focusing three-dimensional position, the view angle parameter, and the pixel three-dimensional position includes:
acquiring an offset vector for offsetting the pixel from the three-dimensional position to the offset three-dimensional position according to the view angle parameter;
calculating an offset of the pixel three-dimensional position relative to the in-focus three-dimensional position;
multiplying the offset of the pixel by the offset vector to obtain an offset distance of the pixel from the three-dimensional position to the offset three-dimensional position;
and adding the three-dimensional position of the pixel and the offset distance of the pixel to obtain the offset three-dimensional position of the pixel.
Optionally, the step of acquiring the view angle parameter and the focused three-dimensional position of the focus point includes:
acquiring angular motion parameters of the electronic equipment, which are acquired by an angular motion sensor in the electronic equipment displaying the image to be processed, and taking the angular motion parameters as the view angle parameters;
and taking the three-dimensional position of the designated point in the image to be processed as the focusing three-dimensional position of the focusing point.
According to a second aspect of embodiments of the present application, there is provided an image processing apparatus, the apparatus comprising:
a depth information acquisition module configured to acquire depth information of each pixel of an image to be processed;
a pixel three-dimensional position obtaining module configured to obtain a pixel three-dimensional position of the pixel in a coordinate system of an image acquisition device according to the depth information and a two-dimensional position of the pixel in the coordinate system of the image;
the parameter acquisition module is configured to acquire a visual angle parameter and a focusing three-dimensional position of a focusing point; the view angle parameters are parameters of view angles different from fixed observation view angles corresponding to the image to be processed; the focusing point is a point which is used as a rotating shaft when the view angle of the scene in the image to be processed is changed;
a shifted three-dimensional position acquisition module configured to obtain a shifted three-dimensional position of the pixel according to the focusing three-dimensional position, the view angle parameter, and the pixel three-dimensional position; wherein the three-dimensional position after the shift is a three-dimensional position of a scene observed by the image acquisition device when the scene at the pixel three-dimensional position is observed under the view angle parameter;
and the target image acquisition module is configured to project each pixel into a two-dimensional coordinate system of the image to be processed according to the shifted three-dimensional position of each pixel to obtain a target image.
Optionally, the depth information obtaining module is configured to:
inputting the image to be processed into a preset neural network model to obtain the depth information; the preset neural network model is a model obtained by utilizing a plurality of sample images and depth information labels of the sample images in advance for training; the scene in the sample image is the same as the type of the scene in the image to be processed; the type of the scene is a type divided according to a distribution difference of depths of the scene.
Optionally, the preset neural network is obtained by training through the following steps:
respectively inputting the plurality of sample images into an initial neural network model for training to obtain the predicted depth information of each sample image;
judging whether the neural network model in the current training stage is converged or not according to the predicted depth information, the depth information label, the first loss function, the second loss function and the third loss function; wherein the first loss function is a loss function for calculating an overall error of the predicted depth information and the depth information tag; the second loss function is a loss function for calculating an error of the predicted depth information and the depth information label in a gradient direction; the third loss function is a loss function used for calculating errors of the predicted depth information and the depth information label in a normal vector direction;
if the neural network model is converged, determining the neural network model in the current training stage as the preset neural network model;
if not, adjusting the model parameters of the neural network model in the current training stage by using a random gradient descent algorithm to obtain an adjusted neural network model;
and respectively inputting the plurality of sample images into the adjusted neural network model, and repeating the steps of training and adjusting the model parameters until the adjusted neural network model converges.
Optionally, the step of inputting the plurality of sample images into an initial neural network model for training to obtain the predicted depth information of each sample image includes:
dividing the plurality of sample images into an image set corresponding to the type of the scene according to the type of the scene in each sample image;
counting a first total number of the plurality of sample images and a second total number of sample images in each image set;
taking a ratio of the first total number and the second total number of the image set as a sampling weight of the image set;
selecting a number of sample images corresponding to the sampling weight in the image set, inputting an initial neural network model for training, and obtaining the predicted depth information of the sample images.
Optionally, the pixel three-dimensional position obtaining module is configured to:
converting the two-dimensional position of the pixel to homogeneous coordinates;
and taking the depth information of the pixel as the Z coordinate of the homogeneous coordinate of the pixel to obtain the three-dimensional position of the pixel in the coordinate system of the image acquisition device.
Optionally, the shifted three-dimensional position acquisition module is configured to:
acquiring an offset vector for offsetting the pixel from the three-dimensional position to the offset three-dimensional position according to the view angle parameter;
calculating an offset of the pixel three-dimensional position relative to the in-focus three-dimensional position;
multiplying the offset of the pixel by the offset vector to obtain an offset distance of the pixel from the three-dimensional position to the offset three-dimensional position;
and adding the three-dimensional position of the pixel and the offset distance of the pixel to obtain the offset three-dimensional position of the pixel.
Optionally, the parameter obtaining module is configured to:
acquiring angular motion parameters of the electronic equipment, which are acquired by an angular motion sensor in the electronic equipment displaying the image to be processed, and taking the angular motion parameters as the view angle parameters;
and taking the three-dimensional position of the designated point in the image to be processed as the focusing three-dimensional position of the focusing point.
According to a third aspect of embodiments of the present application, there is provided an electronic apparatus including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to implement the steps of the image processing method according to the first aspect when executing the executable instructions stored in the memory.
According to a fourth aspect of embodiments of the present application, there is provided a non-transitory computer-readable storage medium, wherein instructions of the storage medium, when executed by a processor of an electronic device, enable the processor to perform the steps of the image processing method according to the first aspect.
According to a fifth aspect of embodiments of the present application, there is provided a computer program product, which, when run on an electronic device, causes the electronic device to perform the steps of the image processing method according to the first aspect described above.
The technical scheme provided by the embodiments of the application can have the following beneficial effects. The depth information of the image to be processed reflects the distance, in the real world, between the scene represented by each pixel and the image acquisition device, and the two-dimensional position of each pixel in the image coordinate system reflects the positional relationship between the scenes represented by different pixels. Therefore, from the depth information of a pixel and its two-dimensional position in the image coordinate system, the three-dimensional position of the pixel in the image acquisition device coordinate system can be obtained, and these three-dimensional positions reflect the three-dimensional structure of the scene in that coordinate system. On this basis, the viewing angle parameter is a parameter of a viewing angle different from the fixed observation viewing angle corresponding to the image to be processed, and the focusing point is the point used as the rotation axis when the viewing angle from which the scene is observed is changed. The shifted three-dimensional position of each pixel can therefore be obtained from the focusing three-dimensional position of the focusing point, the viewing angle parameter, and the pixel three-dimensional position; the shifted three-dimensional position is the three-dimensional position that would be observed by the image acquisition device if the scene at the pixel three-dimensional position were observed under the viewing angle parameter. Projecting each pixel into the two-dimensional coordinate system of the image to be processed according to its shifted three-dimensional position then yields a target image that corresponds to acquiring the scene under the viewing angle parameter. The target image thus has different display effects for different viewing angle parameters, so the scene in the image to be processed can be shown with the different display effects it would have when observed from different observation viewing angles in the real world.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a flow diagram illustrating an image processing method according to an exemplary embodiment.
Fig. 2(a) is an exemplary diagram illustrating a to-be-processed image and a target image in an image processing method according to an exemplary embodiment.
Fig. 2(b) is an exemplary diagram illustrating another image to be processed and a target image in an image processing method according to an exemplary embodiment.
Fig. 3 is a flowchart illustrating an image processing method according to another exemplary embodiment.
Fig. 4 is a schematic structural diagram of a preset neural network in an image processing method according to another exemplary embodiment.
Fig. 5 is a block diagram illustrating an image processing apparatus according to an exemplary embodiment.
FIG. 6 is a block diagram illustrating an electronic device in accordance with an example embodiment.
FIG. 7 is a block diagram illustrating an electronic device in accordance with another example embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
An execution main body of the image processing method provided by the embodiment of the application may be an electronic device, and the electronic device may specifically be an image acquisition device, or may be an image display device. For example, when the electronic device is an image capturing device, the electronic device may be a desktop computer, an intelligent mobile terminal, a notebook computer, a wearable intelligent terminal, and the like, in which the image capturing device is installed. When the electronic device is an image display device, the electronic device may be a desktop computer, an internet television, an intelligent mobile terminal, a notebook computer, a wearable intelligent terminal, and the like. Any electronic device capable of displaying images can be used in the present invention, and is not limited herein.
Fig. 1 is a flow chart illustrating an image processing method according to an exemplary embodiment, such as the image processing method shown in fig. 1, which may include the steps of:
step S101, depth information of each pixel of the image to be processed is acquired.
The depth information of each pixel may reflect the distance between the scene corresponding to the pixel and the image capturing device, so it can be used to obtain the three-dimensional position of the pixel in the subsequent step S102. In a specific application, the depth information of each pixel of the image to be processed may be acquired in various manners. Illustratively, binocular depth estimation may be used: the scene in the image to be processed is shot with two cameras to obtain two images, and the depth information of each pixel is then computed from the parallax between the two images using triangulation and stereo geometry. Alternatively, the image to be processed may be input into a preset neural network model to obtain the depth information of each pixel; the preset neural network model is obtained by training in advance with a plurality of sample images and their depth information labels. For ease of understanding and reasonable layout, the manner of obtaining the depth information of each pixel with a preset neural network model is described in detail in the embodiment of Fig. 3. Any method capable of acquiring the depth information of each pixel of an image can be used in the present invention, and this embodiment does not limit this.
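As an illustration of the binocular alternative mentioned above, the following sketch converts a disparity map into per-pixel depth by triangulation; the focal length and baseline are assumed to come from stereo calibration, and the function name is illustrative.

```python
import numpy as np

def depth_from_disparity(disparity, focal_length_px, baseline_m):
    """Sketch of binocular depth estimation: with two calibrated cameras,
    triangulation gives depth = f * B / disparity for each pixel. The focal
    length (in pixels) and baseline (in meters) are assumed calibration values;
    zero disparity is masked to avoid division by zero."""
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.zeros_like(disparity)
    valid = disparity > 0
    depth[valid] = focal_length_px * baseline_m / disparity[valid]
    return depth
```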
And S102, acquiring a three-dimensional position of the pixel in the coordinate system of the image acquisition device according to the depth information and the two-dimensional position of the pixel in the coordinate system of the image.
The image coordinate system of the image to be processed is a two-dimensional coordinate system, and the two-dimensional position of a pixel in this coordinate system reflects the positional relationship, in the real world, of the scenes corresponding to the pixels; for example, scene S1 is to the left of scene S2. The image acquisition device coordinate system is a rectangular coordinate system whose origin is the optical center of the image acquisition device that acquired the image to be processed, whose Z axis is the optical axis of the device, and whose X and Y axes are parallel to the X and Y axes of the image to be processed; it is a three-dimensional coordinate system that can reflect the real world. On this basis, in order to obtain the three-dimensional position of each pixel in the image acquisition device coordinate system, which is used in the subsequent step S104 to obtain the shifted three-dimensional position of the pixel, the distance between the pixel and the image acquisition device must be obtained in addition to the two-dimensional position of the pixel. The depth information of the image to be processed reflects exactly this distance between the scene corresponding to each pixel and the image acquisition device in the real world. Therefore, for each pixel, its three-dimensional position in the image acquisition device coordinate system can be obtained from its depth information and its two-dimensional position. For ease of understanding and reasonable layout, the manner of obtaining the three-dimensional position of each pixel in the image acquisition device coordinate system is described below in an alternative embodiment.
In addition, the image capturing device may be various. Illustratively, when the execution subject of the invention is an electronic device capable of image acquisition, the execution subject is an image acquisition device. Such as smart mobile terminals, tablets, etc. Or, when the execution subject of the image processing method provided by the embodiment of the present invention is an electronic device that cannot perform image acquisition, the image acquisition device is a device that acquires an image to be processed. For example, a camera, a smart mobile terminal, etc., which are different from the main subjects of execution of the present invention.
Step S103, acquiring the viewing angle parameter and the focusing three-dimensional position of the focusing point. The viewing angle parameter is a parameter of a viewing angle different from the fixed observation viewing angle corresponding to the image to be processed; the focusing point is a point that serves as the rotation axis when the viewing angle from which the scene in the image to be processed is observed is changed.
In a specific application, the viewing angle parameters and the manner of acquiring the focus point may be various. This is described below by way of an alternative embodiment.
In an alternative embodiment, step S103 may comprise: and selecting a parameter of a visual angle different from the visual angle corresponding to the image to be processed from a plurality of pre-stored visual angles as a visual angle parameter. And selecting one two-dimensional position corresponding to the view angle parameter from a plurality of pre-stored two-dimensional positions, and taking the three-dimensional position corresponding to the two-dimensional position as a focusing three-dimensional position of the focusing point.
As for the viewing angle parameter, for example, the user selects one of options showing a plurality of viewing angle parameters as the viewing angle parameter, or the electronic device automatically selects one of a plurality of pre-stored viewing angle parameters as the viewing angle parameter. For example, a left view, a right view, or an up view, etc. may be selected. In addition, the image to be processed is two-dimensional, and the three-dimensional position of the pixel is obtained based on the two-dimensional position of the pixel in the image to be processed. Therefore, the observation angle of the image acquisition device when any image to be processed is acquired, that is, the angle corresponding to the scene in any image to be processed, can be regarded as the inherent angle of view of the image to be processed, which is a fixed observation angle of view that does not involve the change of the depth of field.
The focusing three-dimensional position of the focusing point may be chosen in various ways. For example, the pre-stored positions may correspond to the central point of the image to be processed, or, after dividing the image to be processed into four parts, to the central points of the upper-left part, the lower-left part, and so on, each taken together with its three-dimensional position in the image acquisition device coordinate system. Alternatively, the three-dimensional position of any point in the image to be processed may be taken as the focusing three-dimensional position of the focusing point. In a specific application, a focusing three-dimensional position may correspond to any viewing angle parameter, or the central position of the upper-left part may correspond to the upper-left viewing angle, the central position of the lower-left part to the lower-left viewing angle, and so on. Moreover, since the focusing point serves as the rotation axis when the viewing angle of the scene in the image to be processed is changed, it acts as the common center of the different images corresponding to different viewing angles during the change of viewing angle.
This alternative embodiment pre-stores the viewing angle parameters and the focusing three-dimensional positions, so the hardware requirements on the electronic device serving as the execution subject are relatively low: the viewing angle parameter and the focusing three-dimensional position can be acquired without installing an angular motion sensor, a touch screen, an external mouse, or other human-computer interaction devices on the electronic device.
In another alternative embodiment, the step S103 may specifically include the following steps:
acquiring angular motion parameters of the electronic equipment, which are acquired by an angular motion sensor in the electronic equipment for displaying the image to be processed, and taking the angular motion parameters as view angle parameters;
and taking the three-dimensional position of the designated point in the image to be processed as the focusing three-dimensional position of the focusing point.
In a specific application, a user can select a viewing angle parameter through an interactive gesture with the electronic device, when the user interacts with the electronic device in the interactive gesture, the electronic device moves, and at this time, an angular motion sensor of the electronic device can acquire the angular motion parameter of the electronic device and use the angular motion parameter as the viewing angle parameter. According to the optional embodiment, the visual angle parameters and the focusing point can be acquired through the gesture interaction between the user and the electronic equipment, so that the interaction sense and the interestingness of the user in the image display process are improved, and the image display effect is more real and vivid.
The angular motion sensor may be, for example, a gyroscope, and the angular motion parameters may accordingly include a roll angle and a pitch angle. For example, when the electronic device is horizontal, the pitch angle and the roll angle are both 0 degrees; when it is vertical, the pitch angle is 90 degrees; and when it is on its side, the roll angle is 90 degrees. The user's interaction gestures with the electronic device may include tilting it to the left, to the right, up, or down. For example, the angular motion parameter produced when the user tilts the device to the left may serve as the viewing angle parameter for the left viewing angle, the parameter produced when tilting to the right as the parameter for the right viewing angle, and so on. The designated point used as the focusing point in the image to be processed may also be chosen in various ways; for example, it may be the point touched by a fingertip when the user moves on the touch screen and/or rotates the image, or a point selected by the user in the image to be processed with an interaction device such as a mouse. Any method capable of acquiring the viewing angle parameter and the three-dimensional position of the focusing point can be used in the present invention, and this embodiment does not limit this.
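The mapping from roll and pitch readings to viewing angle parameters can be sketched as follows; the threshold and the returned labels are assumptions for illustration and are not specified in the text.

```python
def view_angle_from_gyro(roll_deg, pitch_deg, threshold_deg=5.0):
    """Sketch of interpreting angular motion parameters (roll, pitch) from a
    gyroscope as viewing-angle parameters, as described above. A small dead
    zone around zero keeps the original fixed viewing angle."""
    horizontal = "left" if roll_deg < -threshold_deg else "right" if roll_deg > threshold_deg else "center"
    vertical = "up" if pitch_deg > threshold_deg else "down" if pitch_deg < -threshold_deg else "center"
    return horizontal, vertical
```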
And step S104, obtaining the shifted three-dimensional position of the pixel according to the focusing three-dimensional position, the view angle parameter and the pixel three-dimensional position. The three-dimensional position after the deviation is the three-dimensional position of the scene observed by the image acquisition device when the scene at the pixel three-dimensional position is observed under the visual angle parameter.
In the real world, when the same scene is observed at different viewing angles, the pixel arrangement positions of the scene may be shifted in the observed images corresponding to the two viewing angles. Therefore, in order to obtain a target image obtained when a scene at a pixel three-dimensional position is observed under a viewing angle parameter, it is necessary to obtain an offset condition of a three-dimensional position of each pixel in an image to be processed, and then determine the three-dimensional position of each pixel after offset according to the offset condition.
On this basis, in order to ensure that the positional relationship among the scenes does not change when the viewing angle is changed, a focusing point can be determined, and the change of the three-dimensional position of each pixel then amounts to an offset, in direction and distance, under the viewing angle parameter with the focusing point held fixed. Therefore, the shifted three-dimensional position of the pixel can be obtained from the focusing three-dimensional position, the viewing angle parameter, and the pixel three-dimensional position. For ease of understanding and reasonable layout, the manner of obtaining the shifted three-dimensional positions of the pixels is described in detail below in an alternative embodiment.
And S105, projecting each pixel into a two-dimensional coordinate system of the image to be processed according to the shifted three-dimensional position of each pixel to obtain a target image.
Illustratively, according to the shifted three-dimensional position of each pixel, projecting each pixel into a two-dimensional coordinate system of the image to be processed to obtain the target image, which may specifically include: projecting the shifted three-dimensional position of each pixel in a two-dimensional coordinate system of the image to be processed to obtain the shifted two-dimensional position of each pixel; and arranging the pixels in the image to be processed according to the shifted two-dimensional position of each pixel to obtain a target image.
For the image to be processed, after the observation visual angle of the scene in the image to be processed is changed, the observation effect of the scene is equivalent to observing the scene in the real world under the visual angle parameter to obtain the target image. Therefore, it is necessary to obtain the target image by projecting each pixel into the two-dimensional coordinate system of the image to be processed according to the shifted three-dimensional position of each pixel in step S105.
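A minimal sketch of this projection step follows, assuming the three-dimensional positions were built by appending depth to the pixel coordinates as in step S102, so that projection back to the image plane amounts to dropping the Z coordinate; the simple overwrite of colliding pixels is an assumption, not a requirement of the method.

```python
import numpy as np

def render_target_image(image, shifted_positions):
    """Sketch of step S105: each pixel's shifted three-dimensional position is
    projected into the two-dimensional image coordinate system (drop Z, round),
    and pixels are re-arranged at their shifted two-dimensional positions to
    form the target image. `shifted_positions` has shape (H, W, 3)."""
    h, w = image.shape[:2]
    target = np.zeros_like(image)
    xs = np.clip(np.round(shifted_positions[..., 0]).astype(int), 0, w - 1)
    ys = np.clip(np.round(shifted_positions[..., 1]).astype(int), 0, h - 1)
    target[ys, xs] = image  # later writes overwrite earlier ones; no hole filling
    return target
```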
Illustratively, as shown in Fig. 2(a), the image to be processed 201 is a left-viewing-angle image focused at close range, and the target image 202 is a right-viewing-angle image focused at close range. The focus is at close range, that is, a point in the near part of the scene of the image to be processed is selected as the focusing point, and the scene in the distance is blurred. When the viewing angle is changed from the left viewing angle of the image to be processed to the right viewing angle of the target image, less of the scene 2011 can be observed in the image to be processed 201 than of the same scene in the target image 202, and more of the scene 2021 can be observed in the target image 202 than of the same scene in the image to be processed 201. In other words, when the observation viewing angle is changed from the left viewing angle at close focus to the right viewing angle at close focus, the observed image is the target image 202 in Fig. 2(a). It can be seen that the embodiment of the invention enables the scene in the image to be processed to have different display effects corresponding to different observation viewing angles.
Similarly, as shown in Fig. 2(b), the image to be processed 203 is an upper-viewing-angle image focused at far range, and the target image 204 is a lower-viewing-angle image focused at far range. The focus is in the distance, that is, a far point in the scene of the image to be processed is selected as the focusing point, and the nearby scene is blurred. When the viewing angle is changed from the upper viewing angle of the image to be processed to the lower viewing angle of the target image, comparing the scene 2031 in the image to be processed 203 with the same scene 2041 in the target image 204, the portion originally hidden by the green belt in the image to be processed 203 can be observed in the target image 204. In other words, when the observation viewing angle is changed from the upper viewing angle at far focus to the lower viewing angle at far focus, the observed image is the target image 204 in Fig. 2(b). The embodiment of the invention thus enables the scene in the image to be processed to have the different display effects it would have when observed from different observation viewing angles in the real world.
The technical scheme provided by the embodiments of the application can have the following beneficial effects. The depth information of the image to be processed reflects the distance, in the real world, between the scene represented by each pixel and the image acquisition device, and the two-dimensional position of each pixel in the image coordinate system reflects the positional relationship between the scenes represented by different pixels. Therefore, from the depth information of a pixel and its two-dimensional position in the image coordinate system, the three-dimensional position of the pixel in the image acquisition device coordinate system can be obtained, and these three-dimensional positions reflect the three-dimensional structure of the scene in that coordinate system. On this basis, the viewing angle parameter is a parameter of a viewing angle different from the fixed observation viewing angle corresponding to the image to be processed, and the focusing point is the point used as the rotation axis when the viewing angle from which the scene is observed is changed. The shifted three-dimensional position of each pixel can therefore be obtained from the focusing three-dimensional position of the focusing point, the viewing angle parameter, and the pixel three-dimensional position; the shifted three-dimensional position is the three-dimensional position that would be observed by the image acquisition device if the scene at the pixel three-dimensional position were observed under the viewing angle parameter. Projecting each pixel into the two-dimensional coordinate system of the image to be processed according to its shifted three-dimensional position then yields a target image that corresponds to acquiring the scene under the viewing angle parameter. The target image thus has different display effects for different viewing angle parameters, so the scene in the image to be processed can be shown with the different display effects it would have when observed from different observation viewing angles in the real world.
Optionally, in step S102: obtaining a three-dimensional position of a pixel in a coordinate system of the image acquisition device according to the depth information and a two-dimensional position of the pixel in the two-dimensional coordinate system of the image, which may specifically include the following steps:
converting the two-dimensional position of the pixel into homogeneous coordinates;
and taking the depth information of the pixel as the Z coordinate of the homogeneous coordinate of the pixel to obtain the three-dimensional position of the pixel in the coordinate system of the image acquisition device.
Here, a homogeneous coordinate represents an n-dimensional vector by an (n + 1)-dimensional vector. Thus, in order to obtain the three-dimensional position of a pixel in the image acquisition device coordinate system, the two-dimensional position of each pixel of the image to be processed may be converted into a homogeneous coordinate. For example, if the two-dimensional position of a pixel is (X, Y), its homogeneous coordinate is (X, Y, 1). The depth information of the pixel reflects the distance between the scene represented by the pixel and the image acquisition device, so the depth information can be used as the Z coordinate of the homogeneous coordinate, giving the three-dimensional position of the pixel in the image acquisition device coordinate system. For example, if the depth information of the pixel is Z, the three-dimensional position of the pixel in the image acquisition device coordinate system is (X, Y, Z).
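A minimal sketch of this conversion, assuming the depth information is given as an H×W array:

```python
import numpy as np

def pixels_to_camera_coords(depth_map):
    """Sketch of step S102 as described above: each pixel's two-dimensional
    position (X, Y) is converted to the homogeneous coordinate (X, Y, 1), and
    its depth is then used as the Z coordinate, giving (X, Y, Z) in the image
    acquisition device coordinate system."""
    h, w = depth_map.shape
    ys, xs = np.mgrid[0:h, 0:w]
    homogeneous = np.stack([xs, ys, np.ones_like(xs)], axis=-1).astype(np.float64)
    points_3d = homogeneous.copy()
    points_3d[..., 2] = depth_map   # replace the 1 with the pixel's depth
    return points_3d                # shape (H, W, 3)
```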
Optionally, in step S104, obtaining the shifted three-dimensional position of the pixel according to the focusing three-dimensional position, the viewing angle parameter, and the pixel three-dimensional position may specifically include the following steps:
acquiring, according to the viewing angle parameter, an offset vector for offsetting the pixel from the pixel three-dimensional position to the offset three-dimensional position;
calculating the offset of the pixel three-dimensional position relative to the focusing three-dimensional position;
multiplying the offset of the pixel by the offset vector to obtain the offset distance from the three-dimensional position of the pixel to the offset three-dimensional position;
and adding the three-dimensional position of the pixel and the offset distance of the pixel to obtain the offset three-dimensional position of the pixel.
Wherein the offset vector is indicative of an offset direction for offsetting a pixel from a pixel three-dimensional position to an offset three-dimensional position. The offset vector may be obtained in various ways. For example, when the viewing angle parameter is one of a plurality of pre-stored viewing angles, a corresponding offset vector may be calculated for each pre-stored viewing angle, so as to obtain a corresponding relationship between the pre-stored offset vector and the viewing angle. Therefore, the offset vector corresponding to the view angle parameter can be searched from the preset corresponding relation between the offset vector and the view angle. Or, for example, when the viewing angle parameter is an angular motion parameter acquired by an angular motion sensor in the electronic device for displaying the image to be processed, the angular motion parameter is an angular parameter reflecting a change of an included angle between a plane where the electronic device is located and a horizontal plane, for example, a pitch angle and a roll angle. And the plane of the electronic device is equivalent to the plane of the image to be processed, and the view angle parameter can reflect the change condition of the included angle between the image to be processed and the horizontal plane when the image to be processed is shifted to the shifted three-dimensional position. Thus, the view angle parameters, i.e., the angular motion parameters, e.g., pitch angle and roll angle, may be converted into offset vectors.
In a specific application, calculating, for each pixel, the offset of the pixel three-dimensional position relative to the focusing three-dimensional position may include: offset Δd = a × (Zi − Z0), where the focusing three-dimensional position is (X0, Y0, Z0), the pixel three-dimensional position of pixel i is (Xi, Yi, Zi), a is a constant, and i is the index of the pixel. If the offset vector is (x, y), the offset distance d for offsetting pixel i from the pixel three-dimensional position to the offset three-dimensional position is d = [a × (Zi − Z0) × x, a × (Zi − Z0) × y, a × (Zi − Z0)]. Accordingly, adding the pixel three-dimensional position and the offset distance gives the shifted three-dimensional position of pixel i as [Xi + a × (Zi − Z0) × x, Yi + a × (Zi − Z0) × y, Zi + a × (Zi − Z0)].
Illustratively, suppose the depth information of the image to be processed ranges from 0 to 1 and the focusing point is the central point (L/2, W/2, 0.5) of the image to be processed, where L is the length and W is the width of the image to be processed. A certain pixel in the upper-left corner has depth 0, so its pixel three-dimensional position is (0, 0, 0). If the offset vector under the viewing angle parameter is (x, y), then after the pixel is offset to the offset three-dimensional position under the viewing angle parameter, its new three-dimensional position, that is, the shifted three-dimensional position of the pixel, is [0 + a × (0 − 0.5) × x, 0 + a × (0 − 0.5) × y, 0].
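The offset computation can be sketched as follows. The constant a and the offset vector are assumed inputs derived from the viewing angle parameter; the Z component of the offset follows the offset-distance formula above, whereas the worked example keeps Z unchanged, so treat that component as an assumption.

```python
import numpy as np

def shift_positions(points_3d, focus_3d, offset_vector, a=1.0):
    """Sketch of step S104 using the formulas above: for pixel i at (Xi, Yi, Zi)
    and focusing point (X0, Y0, Z0), the offset is a * (Zi - Z0); multiplying it
    by the offset vector (x, y) gives the offset distance, which is added to the
    pixel three-dimensional position. `points_3d` has shape (H, W, 3)."""
    x, y = offset_vector
    dz = a * (points_3d[..., 2] - focus_3d[2])   # offset of each pixel
    shifted = points_3d.copy()
    shifted[..., 0] += dz * x                    # offset distance along X
    shifted[..., 1] += dz * y                    # offset distance along Y
    shifted[..., 2] += dz                        # offset distance along Z (per the formula above)
    return shifted
```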
Fig. 3 is a flowchart illustrating an image processing method according to another exemplary embodiment. As shown in Fig. 3, the image processing method may include the following steps:
step S301, inputting the image to be processed into a preset neural network model to obtain the depth information of each pixel in the image to be processed. The preset neural network model is a model obtained by utilizing a plurality of sample images and depth information labels of the plurality of sample images in training in advance; the scene in the sample image is the same as the scene in the image to be processed in type; the type of the scene is a type divided according to a distribution difference of depths of the scene.
In specific application, the depth distributions of different scenes are different, so that in order to ensure that the trained preset neural network model can cope with diversified depth distributions, the types of the scenes can be divided according to the depth distribution differences of the scenes, and the scenes in the sample image and the scenes in the image to be processed are ensured to be the same in type. Illustratively, the scene types may include an indoor scene, an outdoor scene, and a scene in which a character exists.
Illustratively, as shown in Fig. 4, the structure of the preset neural network in the image processing method of the embodiment shown in Fig. 3 may include four parts: a Base Model, a Multi-Scale Model, a Feature Fuse layer, and a Prediction layer. In every part, Conv2d denotes a convolutional layer. Illustratively, the convolutional layer may use a depthwise-pointwise structure, which, compared with an ordinary convolution structure, has fewer parameters, a smaller model size, and little loss of accuracy; it can extract features from different channels of the image to be processed 401 with different convolution kernels, and features can be extracted for each pixel of the image to be processed 401.
The Base Model extracts features of the image to be processed 401 from the bottom level to higher levels and provides them to the Multi-Scale Model. The bottom-level features may be basic elements such as corner points and edges; the middle-level features above them may be geometric shapes such as triangles, circles, and squares; and the high-level features above those are more complex and correspond to objects such as people, cups, and automobiles. In this way the preset neural network can understand the different scene information in the picture and provide relatively sufficient and clear data for the depth calculation performed by the subsequent parts of the network.
The Multi-Scale Model extracts feature maps of different scales from the features provided by the Base Model. Specifically, each position of a feature map records the relationship between the receptive field of that point and the whole image to be processed. Feature maps of different scales therefore reflect the local features and the global features of the image to be processed, respectively, and provide them to the Feature Fuse layer and the Prediction layer.
The Feature Fuse layer is used for recovering the image resolution and reducing the number of channels, and fusing the characteristics from the bottom layer to the high layer so as to provide the overall information of each scene in the image to be processed for the Prediction layer. The Prediction layer is configured to calculate depth information of each pixel in the image to be processed by using the received features, and output the obtained depth information in the form of an image 402.
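The depthwise-pointwise convolution and the four-part layout can be illustrated with a toy PyTorch sketch. The channel counts, block counts, and wiring below are assumptions for illustration only and are not the exact Base Model / Multi-Scale Model / Feature Fuse / Prediction structure of Fig. 4.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthwisePointwiseConv(nn.Module):
    """Depthwise-pointwise block: a depthwise convolution filters each input
    channel separately, then a 1x1 pointwise convolution mixes channels,
    keeping the parameter count low."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))


class TinyDepthNet(nn.Module):
    """Toy depth-prediction network: an encoder of depthwise-pointwise blocks
    (standing in for the Base Model), a coarser branch (standing in for the
    Multi-Scale Model), a concatenation (feature fuse), and a prediction head
    that outputs one depth value per pixel at the input resolution."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            DepthwisePointwiseConv(3, 32, stride=2),
            DepthwisePointwiseConv(32, 64, stride=2),
        )
        self.multi_scale = DepthwisePointwiseConv(64, 64, stride=2)
        self.head = nn.Conv2d(64 + 64, 1, kernel_size=3, padding=1)

    def forward(self, x):
        feats = self.encoder(x)                       # 1/4 resolution features
        coarse = self.multi_scale(feats)              # 1/8 resolution features
        coarse_up = F.interpolate(coarse, size=feats.shape[2:],
                                  mode="bilinear", align_corners=False)
        fused = torch.cat([feats, coarse_up], dim=1)  # feature fuse
        depth = self.head(fused)                      # per-pixel depth
        return F.interpolate(depth, size=x.shape[2:],
                             mode="bilinear", align_corners=False)
```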
Step S302, according to the depth information of the pixel and the two-dimensional position of the pixel, the three-dimensional position of the pixel in the coordinate system of the image acquisition device is obtained.
Step S303, acquiring a view angle parameter and a focusing three-dimensional position of a focusing point.
And step S304, obtaining the shifted three-dimensional position of the pixel according to the focusing three-dimensional position, the view angle parameter and the pixel three-dimensional position.
Step S305, projecting each pixel to a two-dimensional coordinate system of the image to be processed according to the shifted three-dimensional position of each pixel to obtain a target image.
The steps S302 to S305 are the same as the steps S102 to S105 in the embodiment of fig. 1, and are not repeated herein, for details, see the description of the embodiment and the alternative embodiment of fig. 1.
In the embodiment of Fig. 3, the depth information of the image to be processed is acquired with the preset neural network, which improves the efficiency of acquiring depth information and reduces both the hardware cost and the difficulty of acquisition. Therefore, while the scene in the image to be processed is given the different display effects it would have under different observation viewing angles in the real world, the efficiency and convenience of image display are also taken into account.
Optionally, the preset neural network may be obtained by training specifically using the following steps:
respectively inputting a plurality of sample images into an initial neural network model for training to obtain the predicted depth information of each sample image;
judging whether the neural network model in the current training stage is converged or not according to the predicted depth information, the depth information label, the first loss function, the second loss function and the third loss function of the sample image; the first loss function is used for calculating the overall error of the predicted depth information and the depth information label; the second loss function is used for calculating the errors of the predicted depth information and the depth information label in the gradient direction; the third loss function is used for calculating errors of the predicted depth information and the depth information label in the normal vector direction;
if the neural network model is converged, determining the neural network model in the current training stage as a preset neural network model;
if not, adjusting the model parameters of the neural network model in the current training stage by using a random gradient descent algorithm to obtain an adjusted neural network model;
and respectively inputting a plurality of sample images into the adjusted neural network model, and repeating the steps of training and adjusting model parameters until the adjusted neural network model converges.
In a specific application, among a plurality of sample images, the depth information of a sample image with simple scene depth, such as a largely planar image, is often easier to predict than that of a sample image with complex scene depth, such as an image with many flowers, plants, and trees. Likewise, within a single sample image, the depth information of complex sample features, such as object boundaries and statue boundaries at edges, is often harder to predict than that of simple sample features, such as road surfaces and table tops lying in a plane. Therefore, different loss functions can be set so that samples of different prediction difficulty receive different degrees of training.
Specifically, a first loss function may be set to calculate the overall error between the predicted depth information and the depth information label, so as to perform targeted error calculation on the sample image as a whole. For example, the first loss function may be a Huber loss function, which can reduce the training degree of sample images with relatively low prediction difficulty and increase the training degree of sample images with relatively high prediction difficulty, thereby realizing differentiated training of sample images with complex scene depth and sample images with simple scene depth. Also, a second loss function may be set to calculate the error between the predicted depth information and the depth information label in the gradient direction. The gradient directions comprise the horizontal direction and the vertical direction, and the depth information in the gradient direction is relatively significant at edges, so adding the second loss function can increase the training degree of the sample features at boundaries and thereby improve the prediction accuracy of the depth information of pixels at boundaries. The third loss function calculates the error between the predicted depth information and the depth information label in the normal vector direction. The normal vector direction represents the orientation of a plane, so adding the third loss function alongside the second loss function ensures differentiated training of samples at plane positions and thus the prediction accuracy of the depth information of pixels at plane positions. In short, the first loss function realizes differentiated training of complex and simple sample images, while the second and third loss functions realize differentiated training of complex and simple sample features.
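As a purely illustrative aid (not part of the patent text), the following Python sketch shows one way the three losses described above could be computed, assuming PyTorch tensors of shape (N, 1, H, W) for both the predicted depth and the depth information label; the function names and the weights w_grad and w_normal are hypothetical.

```python
import torch
import torch.nn.functional as F

def depth_gradients(d):
    # Finite-difference gradients of the depth map in the horizontal and vertical directions.
    dx = d[:, :, :, 1:] - d[:, :, :, :-1]
    dy = d[:, :, 1:, :] - d[:, :, :-1, :]
    return dx, dy

def surface_normals(d):
    # Approximate per-pixel normal vectors from the depth gradients: n = (-dx, -dy, 1).
    dx, dy = depth_gradients(d)
    dx = F.pad(dx, (0, 1, 0, 0))   # pad so both gradients keep the full spatial size
    dy = F.pad(dy, (0, 0, 0, 1))
    n = torch.cat([-dx, -dy, torch.ones_like(d)], dim=1)
    return F.normalize(n, dim=1)

def total_loss(pred, label, w_grad=1.0, w_normal=1.0):
    # First loss: overall error between prediction and label (Huber).
    l1 = F.huber_loss(pred, label)
    # Second loss: error of the horizontal and vertical gradients (edge regions).
    pdx, pdy = depth_gradients(pred)
    ldx, ldy = depth_gradients(label)
    l2 = (pdx - ldx).abs().mean() + (pdy - ldy).abs().mean()
    # Third loss: error in the normal vector direction (planar regions).
    l3 = (1.0 - F.cosine_similarity(surface_normals(pred),
                                    surface_normals(label), dim=1)).mean()
    return l1 + w_grad * l2 + w_normal * l3
```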
The smaller the error output by each loss function, the better. When the neural network model in the current training stage converges, it indicates that the errors output by the loss functions have reached the expected level after training, that is: the prediction of the depth information of the overall features of the image to be processed reaches the desired level, and the predictions of the depth information at edge positions and at plane positions also reach the desired level. In the training process, the stochastic gradient descent algorithm adjusts the model parameters of the neural network model in the current training stage, so that after the parameters are adjusted the prediction result improves, its difference from the pre-labeled depth information decreases, and the model moves toward convergence. Accordingly, before the model in the current training stage converges, the above steps of training and adjusting the model parameters are repeated until the adjusted neural network model converges; of course, each round of training is performed on the neural network model with the most recently adjusted parameters.
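For illustration only, a minimal training loop consistent with this description might look as follows; it reuses the total_loss sketch above, and the convergence criterion (change in average epoch loss below a tolerance) is an assumption, since the patent does not fix a specific test.

```python
import torch

def train(model, loader, max_epochs=100, tol=1e-4, lr=1e-3):
    # Model parameters are adjusted with stochastic gradient descent after each
    # batch; training repeats until the loss stops improving noticeably.
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    prev_avg = float("inf")
    for _ in range(max_epochs):
        running = 0.0
        for images, labels in loader:
            optimizer.zero_grad()
            loss = total_loss(model(images), labels)
            loss.backward()
            optimizer.step()
            running += loss.item()
        avg = running / len(loader)
        if abs(prev_avg - avg) < tol:   # treated here as "converged"
            break
        prev_avg = avg
    return model
```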
In addition, after the preset neural network model is obtained through training, the prediction effect of the model can be verified by using a plurality of test images and their depth information labels, where the type of scene in the test images is the same as the type of scene in the sample images. The verification specifically comprises: respectively inputting the plurality of test images into the preset neural network model to obtain the predicted depth information of each test image; calculating the error between the predicted depth information of each test image and its depth information label according to a fourth loss function; when the error meets the expected level, the test is passed and the preset neural network model can be used for obtaining the depth information of each pixel of the image to be processed; otherwise, the sample images can be replaced and the training performed again. The fourth loss function may specifically be an average relative error function, a root mean square error function, or the like.
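A short sketch of the two candidate fourth loss functions mentioned above, written with NumPy arrays; the pass threshold is a hypothetical value chosen only for the example.

```python
import numpy as np

def mean_relative_error(pred, label, eps=1e-6):
    # Average relative error between predicted depth and the depth label.
    return np.mean(np.abs(pred - label) / (label + eps))

def rmse(pred, label):
    # Root mean square error, an alternative fourth loss function.
    return np.sqrt(np.mean((pred - label) ** 2))

def passes_test(preds, labels, threshold=0.15):
    # The model passes when the chosen error meets the expected level
    # (threshold here is illustrative, not from the patent).
    return mean_relative_error(preds, labels) < threshold
```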
Optionally, before the step of inputting the plurality of sample images into the initial neural network model for training to obtain the predicted depth information of each sample image, the image processing method provided in the embodiment of the present application may further include:
acquiring an enhanced sample image by using the sample image and a preset random disturbance rule; the preset random disturbance rule is a rule capable of adjusting the designated image characteristics of the sample image;
and adding the enhanced sample images in the plurality of sample images for training to obtain a preset neural network model.
In a specific application, the preset random perturbation rule may take various forms. For example, it may specify adjustments to designated image features such as image contrast enhancement, left-right image rotation, random image cropping, and/or image pixel perturbation. The enhanced sample image is equivalent to a new sample image obtained by adjusting the image features of the sample image according to the preset random perturbation rule.
This optional embodiment preprocesses the sample images before training, and applying the preprocessing to a sample image increases the diversity of the sample images. After the enhanced sample images are added to the plurality of sample images, the sample images used for training the preset neural network model comprise both the original sample images and the enhanced sample images, so the model can learn from sample images covering a variety of conditions. Therefore, the robustness of the preset neural network model can be improved, and the model is less affected by interference from external factors; for example, depth information can still be calculated for images disturbed by illumination variation, contrast variation, and the like.
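The following sketch illustrates one possible implementation of such a random perturbation rule on an RGB image stored as a float array in [0, 1]; the probabilities and magnitudes are hypothetical, a left-right flip is used as one reading of "left-right rotation", and in practice geometric changes such as the flip and crop would also have to be applied to the depth information label (omitted here).

```python
import numpy as np

rng = np.random.default_rng()

def perturb(image):
    """Return an enhanced sample image from one sample image of shape (H, W, 3)."""
    out = image.copy()
    if rng.random() < 0.5:                       # contrast enhancement
        out = np.clip((out - 0.5) * rng.uniform(1.0, 1.5) + 0.5, 0.0, 1.0)
    if rng.random() < 0.5:                       # left-right flip
        out = out[:, ::-1, :]
    if rng.random() < 0.5:                       # random crop keeping ~90% of each side
        h, w = out.shape[:2]
        y = int(rng.integers(0, h // 10 + 1))
        x = int(rng.integers(0, w // 10 + 1))
        out = out[y:y + int(h * 0.9), x:x + int(w * 0.9), :]
    if rng.random() < 0.5:                       # pixel perturbation (small Gaussian noise)
        out = np.clip(out + rng.normal(0.0, 0.01, out.shape), 0.0, 1.0)
    return out
```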
Optionally, the step of inputting the plurality of sample images into the initial neural network model for training to obtain the predicted depth information of each sample image may specifically include:
dividing a plurality of sample images into an image set corresponding to the type of the scene according to the type of the scene in each sample image;
counting a first total number of the plurality of sample images and a second total number of the sample images in each image set;
taking the ratio of the first total number and the second total number of the image set as the sampling weight of the image set;
selecting a number of sample images corresponding to the sampling weight in the image set, inputting an initial neural network model for training, and obtaining the predicted depth information of the sample images.
In a specific application, the numbers of sample images of different scene types usually differ. If sample images are randomly selected for training in the traditional way, the unbalanced sample numbers may cause model overfitting and inaccurate depth information. Therefore, in order to reduce overfitting of the preset neural network model and improve the accuracy of the depth information, corresponding sampling weights can be set for the image sets corresponding to the different types of scenes. Setting different sampling weights for sample images of different image sets balances the number of samples of each type and reduces overfitting of the model.
Illustratively, according to the type of scene in each sample image, the sample images are divided into an image set ag1 of outdoor scenes, an image set ag2 of indoor scenes, and an image set ag3 of scenes containing human subjects. The first total number of the plurality of sample images is K; the second total number of image set ag1 is K1, that of image set ag2 is K2, and that of image set ag3 is K3. The sampling weight of each image set is K/Ki: the sampling weight of image set ag1 is K/K1, that of ag2 is K/K2, and that of ag3 is K/K3. Image sets with more sample images thus receive smaller sampling weights and image sets with fewer sample images receive larger ones, which balances the numbers of sample images of different scene types during network model training and prevents the trained model from becoming biased.
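For illustration, a small Python sketch of this weighting scheme; the grouping by scene type and the use of random.choices are implementation choices for the example, not prescribed by the patent.

```python
import random
from collections import defaultdict

def build_weighted_sampler(samples):
    """samples: list of (image, depth_label, scene_type) tuples."""
    # Divide the sample images into one image set per scene type.
    image_sets = defaultdict(list)
    for s in samples:
        image_sets[s[2]].append(s)
    k_total = len(samples)                                             # first total number K
    set_weight = {t: k_total / len(v) for t, v in image_sets.items()}  # K / Ki per image set
    # Each sample inherits the weight of its image set, so every scene type
    # contributes the same total weight and is drawn about equally often.
    per_sample = [set_weight[s[2]] for s in samples]

    def draw(batch_size):
        return random.choices(samples, weights=per_sample, k=batch_size)

    return draw
```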
Corresponding to the method embodiment, the application also provides an image processing device.
Fig. 5 is a block diagram illustrating an image processing apparatus according to an exemplary embodiment. As shown in fig. 5, the image processing apparatus may include: a depth information obtaining module 501, a pixel three-dimensional position obtaining module 502, a parameter obtaining module 503, a shifted three-dimensional position obtaining module 504, a target image obtaining module 505, and a target image displaying module 506, wherein:
a depth information obtaining module 501 configured to obtain depth information of each pixel of an image to be processed;
a pixel three-dimensional position obtaining module 502 configured to obtain a pixel three-dimensional position of the pixel in an image acquisition apparatus coordinate system according to the depth information and a two-dimensional position of the pixel in an image coordinate system;
a parameter acquiring module 503 configured to acquire a viewing angle parameter and a focused three-dimensional position of the focus point; the view angle parameters are parameters of view angles different from the fixed observation view angle corresponding to the image to be processed; the focusing point is a point which is used as a rotating shaft when the view angle of the scene in the image to be processed is changed;
a shifted three-dimensional position obtaining module 504 configured to obtain a shifted three-dimensional position of the pixel according to the focusing three-dimensional position, the view angle parameter, and the pixel three-dimensional position; wherein the three-dimensional position after the shift is a three-dimensional position of a scene observed by the image acquisition device when the scene at the pixel three-dimensional position is observed under the view angle parameter;
a target image obtaining module 505, configured to project each pixel into a two-dimensional coordinate system of the image to be processed according to the shifted three-dimensional position of each pixel, respectively, to obtain a target image.
The technical scheme provided by the embodiment of the application can have the following beneficial effects: the depth information of the image to be processed may reflect a distance between a scene represented by each pixel of the image to be processed and the image acquisition device in the real world, and a two-dimensional position of each pixel in the image coordinate system may reflect a positional relationship between different scenes represented by each pixel. Therefore, according to the depth information of the pixels and the two-dimensional position of the pixels in the image coordinate system, the three-dimensional position of the pixels in the image acquisition device coordinate system can be obtained, and the three-dimensional position can reflect the three-dimensional structure of the scene in the image to be processed in the image acquisition device coordinate system. On this basis, the viewing angle parameter is a parameter of a viewing angle different from a fixed viewing angle corresponding to the image to be processed, and the focusing point is a point as a rotation axis when the viewing angle from which the scene in the image to be processed is viewed is changed. Therefore, the shifted three-dimensional position of the pixel can be obtained from the in-focus three-dimensional position of the focus, the viewing angle parameter, and the pixel three-dimensional position. And the three-dimensional position after the shift is the three-dimensional position of the scene observed by the image acquisition device when the scene at the pixel three-dimensional position is observed under the view angle parameter. Therefore, each pixel is projected into the two-dimensional coordinate system of the image to be processed according to the shifted three-dimensional position of each pixel, and the obtained target image is the image of the scene acquired by the image acquisition device when the scene is acquired by the image acquisition device under the view angle parameter. Therefore, the display effect of the target image has different display effects corresponding to different viewing angle parameters of the fixed observation viewing angle corresponding to the image to be processed. Therefore, the scene in the image to be processed has different display effects corresponding to different observation visual angles when being observed in the real world.
Optionally, the depth information obtaining module 501 is configured to:
inputting the image to be processed into a preset neural network model to obtain the depth information; the preset neural network model is a model obtained by utilizing a plurality of sample images and depth information labels of the sample images in advance for training; the scene in the sample image is the same as the type of the scene in the image to be processed; the type of the scene is a type divided according to a distribution difference of depths of the scene.
Optionally, the preset neural network is obtained by training through the following steps:
respectively inputting the plurality of sample images into an initial neural network model for training to obtain the predicted depth information of each sample image;
judging whether the neural network model in the current training stage is converged or not according to the predicted depth information, the depth information label, the first loss function, the second loss function and the third loss function; wherein the first loss function is a loss function for calculating an overall error of the predicted depth information and the depth information tag; the second loss function is a loss function for calculating an error of the predicted depth information and the depth information label in a gradient direction; the third loss function is a loss function used for calculating errors of the predicted depth information and the depth information label in a normal vector direction;
if the neural network model is converged, determining the neural network model in the current training stage as the preset neural network model;
if not, adjusting the model parameters of the neural network model in the current training stage by using a random gradient descent algorithm to obtain an adjusted neural network model;
and respectively inputting the plurality of sample images into the adjusted neural network model, and repeating the steps of training and adjusting the model parameters until the adjusted neural network model converges.
Optionally, the step of inputting the plurality of sample images into an initial neural network model for training to obtain the predicted depth information of each sample image includes:
dividing the plurality of sample images into an image set corresponding to the type of the scene according to the type of the scene in each sample image;
counting a first total number of the plurality of sample images and a second total number of sample images in each image set;
taking a ratio of the first total number and the second total number of the image set as a sampling weight of the image set;
selecting a number of sample images corresponding to the sampling weight in the image set, inputting an initial neural network model for training, and obtaining the predicted depth information of the sample images.
Optionally, the pixel three-dimensional position obtaining module 502 is configured to:
converting the two-dimensional position of the pixel to homogeneous coordinates;
and taking the depth information of the pixel as the Z coordinate of the homogeneous coordinate of the pixel to obtain the three-dimensional position of the pixel in the coordinate system of the image acquisition device.
Optionally, the offset three-dimensional position obtaining module 504 is configured to:
acquiring an offset vector for offsetting the pixel from the three-dimensional position to the offset three-dimensional position according to the view angle parameter;
calculating an offset of the pixel three-dimensional position relative to the in-focus three-dimensional position;
multiplying the offset of the pixel by the offset vector to obtain an offset distance of the pixel from the three-dimensional position to the offset three-dimensional position;
and adding the three-dimensional position of the pixel and the offset distance of the pixel to obtain the offset three-dimensional position of the pixel.
Optionally, the parameter obtaining module 503 is configured to:
acquiring angular motion parameters of the electronic equipment, which are acquired by an angular motion sensor in the electronic equipment displaying the image to be processed, and taking the angular motion parameters as the view angle parameters;
and taking the three-dimensional position of the designated point in the image to be processed as the focusing three-dimensional position of the focusing point.
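To make the geometric pipeline of modules 502, 504 and 505 concrete, here is a purely illustrative Python sketch for a single depth map. The max_shift scale, the small-angle construction of the offset vector from the view angle parameters, and the use of each pixel's Euclidean distance to the focus point as its offset are assumptions made for this example; the patent does not prescribe these exact formulas.

```python
import numpy as np

def shift_view(depth, focus_uv, angles, max_shift=0.05):
    """depth: (H, W) depth of each pixel; focus_uv: (u, v) of the focusing point;
    angles: (ax, ay) view angle parameters, e.g. from an angular motion sensor."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=float), np.arange(h, dtype=float))
    # Module 502: homogeneous coordinates (u, v, 1) with the depth as Z coordinate
    # give each pixel's three-dimensional position in the camera coordinate system.
    pts = np.stack([u, v, depth], axis=-1)                      # (H, W, 3)
    focus = np.array([focus_uv[0], focus_uv[1],
                      depth[int(focus_uv[1]), int(focus_uv[0])]])
    # Module 504: an offset vector derived from the view angle parameters
    # (small-angle assumption), scaled per pixel by its offset relative to the
    # focusing point, then added to the pixel three-dimensional position.
    offset_vec = max_shift * np.array([angles[0], angles[1], 0.0])
    rel = np.linalg.norm(pts - focus, axis=-1, keepdims=True)   # offset to focus
    shifted = pts + rel * offset_vec                            # shifted 3D position
    # Module 505: project each shifted position back into the two-dimensional
    # image coordinate system to obtain per-pixel target coordinates.
    tgt_u = np.clip(np.round(shifted[..., 0]), 0, w - 1).astype(int)
    tgt_v = np.clip(np.round(shifted[..., 1]), 0, h - 1).astype(int)
    return tgt_u, tgt_v
```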
Corresponding to the method embodiment, the application further provides the electronic equipment.
FIG. 6 is a block diagram of an electronic device shown in accordance with an example embodiment. Referring to fig. 6, the electronic device may include:
a processor 601;
a memory 602 for storing processor-executable instructions;
the processor 601 is configured to execute the executable instructions stored in the memory 602 to implement the steps of any image processing method provided in the embodiments of the present application.
The technical scheme provided by the embodiment of the application can have the following beneficial effects: the depth information of the image to be processed may reflect a distance between a scene represented by each pixel of the image to be processed and the image acquisition device in the real world, and a two-dimensional position of each pixel in the image coordinate system may reflect a positional relationship between different scenes represented by each pixel. Therefore, according to the depth information of the pixels and the two-dimensional position of the pixels in the image coordinate system, the three-dimensional position of the pixels in the image acquisition device coordinate system can be obtained, and the three-dimensional position can reflect the three-dimensional structure of the scene in the image to be processed in the image acquisition device coordinate system. On this basis, the viewing angle parameter is a parameter of a viewing angle different from a fixed viewing angle corresponding to the image to be processed, and the focusing point is a point as a rotation axis when the viewing angle from which the scene in the image to be processed is viewed is changed. Therefore, the shifted three-dimensional position of the pixel can be obtained from the in-focus three-dimensional position of the focus, the viewing angle parameter, and the pixel three-dimensional position. And the three-dimensional position after the shift is the three-dimensional position of the scene observed by the image acquisition device when the scene at the pixel three-dimensional position is observed under the view angle parameter. Therefore, each pixel is projected into the two-dimensional coordinate system of the image to be processed according to the shifted three-dimensional position of each pixel, and the obtained target image is the image of the scene acquired by the image acquisition device when the scene is acquired by the image acquisition device under the view angle parameter. Therefore, the display effect of the target image has different display effects corresponding to different viewing angle parameters of the fixed observation viewing angle corresponding to the image to be processed. Therefore, the scene in the image to be processed has different display effects corresponding to different observation visual angles when being observed in the real world.
Fig. 7 is a block diagram of an electronic device 700 shown in accordance with another example embodiment. For example, the electronic device 700 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, an exercise device, a personal digital assistant, and so forth.
Referring to fig. 7, electronic device 700 may include one or more of the following components: a processing component 702, a memory 704, a power component 706, a multimedia component 708, and an input/output (I/O) interface 710.
The processing component 702 generally controls overall operation of the electronic device 700, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 702 may include one or more processors 720 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 702 may include one or more modules that facilitate interaction between the processing component 702 and other components. For example, the processing component 702 may include a multimedia module to facilitate interaction between the multimedia component 708 and the processing component 702.
The memory 704 is configured to store various types of data to support operation at the device 700. Examples of such data include instructions for any application or method operating on the electronic device 700, contact data, phonebook data, messages, pictures, videos, and so forth. The Memory 704 may be implemented by any type of volatile or non-volatile Memory device or combination thereof, such as an SRAM (Static Random Access Memory), an EEPROM (Electrically Erasable Programmable Read Only Memory), an EPROM (Erasable Programmable Read-Only Memory), a PROM (Programmable Read-Only Memory), a ROM, a magnetic Memory, a flash Memory, a magnetic disk, or an optical disk.
The power supply component 706 provides power to the various components of the device 700. The power components 706 may include a power management device, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 700.
The multimedia component 708 includes a screen that provides an output interface between the device 700 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 708 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 700 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens device or have a focal length and optical zoom capability.
The I/O interface 710 provides an interface between the processing component 702 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
In an exemplary embodiment, the electronic device 700 may be implemented by one or more ASICs (Application Specific Integrated Circuits), DSPs (Digital Signal Processors), DSPDs (Digital Signal Processing Devices), PLDs (Programmable Logic Devices), FPGAs (Field Programmable Gate Arrays), controllers, microcontrollers, microprocessors, or other electronic components for performing the above-described image processing methods.
In addition, the present application also provides a non-transitory computer-readable storage medium included in an electronic device; when the instructions in the storage medium are executed by a processor of the electronic device, the electronic device is enabled to perform the steps of the image processing method described in any embodiment of the present application.
In an exemplary embodiment, a non-transitory computer readable storage medium includes instructions, such as memory 402, that are executable by processor 401 to perform the above-described method; alternatively, the memory 704 comprises instructions executable by the processing component 702 of the electronic device 700 to perform the image processing method provided by any of the embodiments described above. For example, the non-transitory computer readable storage medium may be a ROM (Read-Only Memory), a RAM (Random Access Memory), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, and the like.
The technical scheme provided by the embodiment of the application can have the following beneficial effects: the depth information of the image to be processed may reflect a distance between a scene represented by each pixel of the image to be processed and the image acquisition device in the real world, and a two-dimensional position of each pixel in the image coordinate system may reflect a positional relationship between different scenes represented by each pixel. Therefore, according to the depth information of the pixels and the two-dimensional position of the pixels in the image coordinate system, the three-dimensional position of the pixels in the image acquisition device coordinate system can be obtained, and the three-dimensional position can reflect the three-dimensional structure of the scene in the image to be processed in the image acquisition device coordinate system. On this basis, the viewing angle parameter is a parameter of a viewing angle different from a fixed viewing angle corresponding to the image to be processed, and the focusing point is a point as a rotation axis when the viewing angle from which the scene in the image to be processed is viewed is changed. Therefore, the shifted three-dimensional position of the pixel can be obtained from the in-focus three-dimensional position of the focus, the viewing angle parameter, and the pixel three-dimensional position. And the three-dimensional position after the shift is the three-dimensional position of the scene observed by the image acquisition device when the scene at the pixel three-dimensional position is observed under the view angle parameter. Therefore, each pixel is projected into the two-dimensional coordinate system of the image to be processed according to the shifted three-dimensional position of each pixel, and the obtained target image is the image of the scene acquired by the image acquisition device when the scene is acquired by the image acquisition device under the view angle parameter. Therefore, the display effect of the target image has different display effects corresponding to different viewing angle parameters of the fixed observation viewing angle corresponding to the image to be processed. Therefore, the scene in the image to be processed has different display effects corresponding to different observation visual angles when being observed in the real world.
In yet another embodiment provided by the present application, there is also provided a computer program product containing instructions that, when run on an electronic device, cause the electronic device to perform the image processing method of any of the above embodiments.
In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example by wire from a website, computer, server, or data center. The computer readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium such as a floppy disk, a hard disk, or a magnetic tape, an optical medium such as a DVD (Digital Versatile Disc), or a semiconductor medium such as an SSD (Solid State Disk), and the like.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus and device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference may be made to some descriptions of the method embodiments for relevant points.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (14)

1. An image processing method, characterized in that the method comprises:
acquiring depth information of each pixel of an image to be processed;
converting the two-dimensional position of the pixel into homogeneous coordinates;
taking the depth information of the pixel as a Z coordinate of the homogeneous coordinate of the pixel to obtain a three-dimensional position of the pixel in a coordinate system of an image acquisition device;
acquiring a visual angle parameter and a focusing three-dimensional position of a focusing point; the view angle parameters are parameters of view angles different from fixed observation view angles corresponding to the image to be processed; the focusing point is a point which is used as a rotating shaft when the view angle of the scene in the image to be processed is changed;
obtaining the shifted three-dimensional position of the pixel according to the focusing three-dimensional position, the view angle parameter and the pixel three-dimensional position; wherein the shifted three-dimensional position is a three-dimensional position of a scene observed when the scene at the pixel three-dimensional position is observed under the viewing angle parameter;
and projecting each pixel into a two-dimensional coordinate system of the image to be processed according to the shifted three-dimensional position of each pixel to obtain a target image.
2. The method according to claim 1, wherein the step of obtaining depth information for each pixel of the image to be processed comprises:
inputting the image to be processed into a preset neural network model to obtain the depth information; the preset neural network model is a model obtained by utilizing a plurality of sample images and depth information labels of the sample images in advance for training; the scene in the sample image is the same as the type of the scene in the image to be processed; the type of the scene is a type divided according to a distribution difference of depths of the scene.
3. The method of claim 2, wherein the predetermined neural network is trained by the steps of:
respectively inputting the plurality of sample images into an initial neural network model for training to obtain the predicted depth information of each sample image;
judging whether the neural network model in the current training stage is converged or not according to the predicted depth information, the depth information label, the first loss function, the second loss function and the third loss function; wherein the first loss function is a loss function for calculating an overall error of the predicted depth information and the depth information tag; the second loss function is a loss function for calculating an error of the predicted depth information and the depth information label in a gradient direction; the third loss function is a loss function used for calculating errors of the predicted depth information and the depth information label in a normal vector direction;
if the neural network model is converged, determining the neural network model in the current training stage as the preset neural network model;
if not, adjusting the model parameters of the neural network model in the current training stage by using a random gradient descent algorithm to obtain an adjusted neural network model;
and respectively inputting the plurality of sample images into the adjusted neural network model, and repeating the steps of training and adjusting the model parameters until the adjusted neural network model converges.
4. The method of claim 3, wherein the step of inputting the plurality of sample images into an initial neural network model for training to obtain the predicted depth information of each sample image comprises:
dividing the plurality of sample images into an image set corresponding to the type of the scene according to the type of the scene in each sample image;
counting a first total number of the plurality of sample images and a second total number of sample images in each image set;
taking a ratio of the first total number and the second total number of the image set as a sampling weight of the image set;
selecting a number of sample images corresponding to the sampling weight in the image set, inputting an initial neural network model for training, and obtaining the predicted depth information of the sample images.
5. The method of claim 1, wherein the step of obtaining the shifted three-dimensional position of the pixel from the in-focus three-dimensional position, the view angle parameter, and the pixel three-dimensional position comprises:
acquiring an offset vector for offsetting the pixel from the three-dimensional position to the offset three-dimensional position according to the view angle parameter;
calculating an offset of the pixel three-dimensional position relative to the in-focus three-dimensional position;
multiplying the offset of the pixel by the offset vector to obtain an offset distance of the pixel from the three-dimensional position of the pixel to the three-dimensional position after offset;
and adding the three-dimensional position of the pixel and the offset distance of the pixel to obtain the offset three-dimensional position of the pixel.
6. The method according to any one of claims 1 to 5, wherein the step of obtaining the viewing angle parameter and the in-focus three-dimensional position of the focal point comprises:
acquiring angular motion parameters of the electronic equipment, which are acquired by an angular motion sensor in the electronic equipment displaying the image to be processed, and taking the angular motion parameters as the view angle parameters;
and taking the three-dimensional position of the designated point in the image to be processed as the focusing three-dimensional position of the focusing point.
7. An image processing apparatus, characterized in that the apparatus comprises:
a depth information acquisition module configured to acquire depth information of each pixel of an image to be processed;
a pixel three-dimensional position acquisition module configured to convert a two-dimensional position of the pixel into homogeneous coordinates; taking the depth information of the pixel as a Z coordinate of the homogeneous coordinate of the pixel to obtain a three-dimensional position of the pixel in a coordinate system of an image acquisition device;
the parameter acquisition module is configured to acquire a visual angle parameter and a focusing three-dimensional position of a focusing point; the view angle parameters are parameters of view angles different from fixed observation view angles corresponding to the image to be processed; the focusing point is a point which is used as a rotating shaft when the view angle of the scene in the image to be processed is changed;
the shifted three-dimensional position acquisition module is configured to acquire a shifted three-dimensional position of the pixel according to the focusing three-dimensional position, the view angle parameter and the three-dimensional position of the pixel; wherein the shifted three-dimensional position is a three-dimensional position of a scene observed when the scene at the pixel three-dimensional position is observed under the viewing angle parameter;
and the target image acquisition module is configured to project each pixel into a two-dimensional coordinate system of the image to be processed according to the shifted three-dimensional position of each pixel to obtain a target image.
8. The apparatus of claim 7, wherein the depth information acquisition module is configured to:
inputting the image to be processed into a preset neural network model to obtain the depth information; the preset neural network model is a model obtained by utilizing a plurality of sample images and depth information labels of the sample images in advance for training; the scene in the sample image is the same as the type of the scene in the image to be processed; the type of the scene is a type divided according to a distribution difference of depths of the scene.
9. The apparatus of claim 8, wherein the predetermined neural network is trained by the following steps:
respectively inputting the plurality of sample images into an initial neural network model for training to obtain the predicted depth information of each sample image;
judging whether the neural network model in the current training stage is converged or not according to the predicted depth information, the depth information label, the first loss function, the second loss function and the third loss function; wherein the first loss function is a loss function for calculating an overall error of the predicted depth information and the depth information tag; the second loss function is a loss function for calculating an error of the predicted depth information and the depth information label in a gradient direction; the third loss function is a loss function used for calculating errors of the predicted depth information and the depth information label in a normal vector direction;
if the neural network model is converged, determining the neural network model in the current training stage as the preset neural network model;
if not, adjusting the model parameters of the neural network model in the current training stage by using a random gradient descent algorithm to obtain an adjusted neural network model;
and respectively inputting the plurality of sample images into the adjusted neural network model, and repeating the steps of training and adjusting the model parameters until the adjusted neural network model converges.
10. The apparatus of claim 9, wherein the step of inputting the plurality of sample images into an initial neural network model for training to obtain the predicted depth information of each sample image comprises:
dividing the plurality of sample images into an image set corresponding to the type of the scene according to the type of the scene in each sample image;
counting a first total number of the plurality of sample images and a second total number of sample images in each image set;
taking a ratio of the first total number and the second total number of the image set as a sampling weight of the image set;
selecting a number of sample images corresponding to the sampling weight in the image set, inputting an initial neural network model for training, and obtaining the predicted depth information of the sample images.
11. The apparatus of claim 7, wherein the offset three-dimensional position acquisition module is configured to:
acquiring an offset vector for offsetting the pixel from the three-dimensional position to the offset three-dimensional position according to the view angle parameter;
calculating an offset of the pixel three-dimensional position relative to the in-focus three-dimensional position;
multiplying the offset of the pixel by the offset vector to obtain an offset distance of the pixel from the three-dimensional position to the offset three-dimensional position;
and adding the three-dimensional position of the pixel and the offset distance of the pixel to obtain the offset three-dimensional position of the pixel.
12. The apparatus of any of claims 7 to 11, wherein the parameter obtaining module is configured to:
acquiring angular motion parameters of the electronic equipment, which are acquired by an angular motion sensor in the electronic equipment displaying the image to be processed, and taking the angular motion parameters as the view angle parameters;
and taking the three-dimensional position of the designated point in the image to be processed as the focusing three-dimensional position of the focusing point.
13. An electronic device, characterized in that the electronic device comprises:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to implement the steps of the image processing method according to any one of claims 1 to 6 when executing the executable instructions stored in the memory.
14. A non-transitory computer readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the processor to perform the steps of the image processing method of any of claims 1 to 6.
CN201910618669.XA 2019-07-10 2019-07-10 Image processing method, image processing device, electronic equipment and storage medium Active CN110400337B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910618669.XA CN110400337B (en) 2019-07-10 2019-07-10 Image processing method, image processing device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910618669.XA CN110400337B (en) 2019-07-10 2019-07-10 Image processing method, image processing device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110400337A CN110400337A (en) 2019-11-01
CN110400337B true CN110400337B (en) 2021-10-26

Family

ID=68324378

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910618669.XA Active CN110400337B (en) 2019-07-10 2019-07-10 Image processing method, image processing device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110400337B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110942007B (en) * 2019-11-21 2024-03-05 北京达佳互联信息技术有限公司 Method and device for determining hand skeleton parameters, electronic equipment and storage medium
CN111009011B (en) * 2019-11-28 2023-09-19 深圳市镭神智能系统有限公司 Method, device, system and storage medium for predicting vehicle direction angle
CN112560698B (en) * 2020-12-18 2024-01-16 北京百度网讯科技有限公司 Image processing method, device, equipment and medium
CN112634343B (en) 2020-12-23 2024-11-01 北京百度网讯科技有限公司 Training method of image depth estimation model and processing method of image depth information
WO2022133944A1 (en) * 2020-12-24 2022-06-30 华为技术有限公司 Image processing method and image processing apparatus
CN114399614B (en) * 2021-12-16 2024-11-05 北方华录文化科技(北京)有限公司 Three-dimensional display method, device, electronic device and storage medium of virtual object
CN114882046B (en) * 2022-03-29 2024-08-02 驭势科技(北京)有限公司 Panoramic segmentation method, device, equipment and medium for three-dimensional point cloud data
CN114840165A (en) * 2022-04-22 2022-08-02 南方科技大学 Image display method, image display device, device, and storage medium
CN114782911B (en) * 2022-06-20 2022-09-16 小米汽车科技有限公司 Image processing method, device, equipment, medium, chip and vehicle
CN114827711B (en) * 2022-06-24 2022-09-20 如你所视(北京)科技有限公司 Image information display method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101283375A (en) * 2005-10-04 2008-10-08 微软公司 Photographing big things
CN101631257A (en) * 2009-08-06 2010-01-20 中兴通讯股份有限公司 Method and device for realizing three-dimensional playing of two-dimensional video code stream
GB2465072A (en) * 2008-11-07 2010-05-12 Honeywell Int Inc Combining range information with images to produce new images of different perspective
CN107945282A (en) * 2017-12-05 2018-04-20 洛阳中科信息产业研究院(中科院计算技术研究所洛阳分所) The synthesis of quick multi-view angle three-dimensional and methods of exhibiting and device based on confrontation network
CN109889724A (en) * 2019-01-30 2019-06-14 北京达佳互联信息技术有限公司 Image weakening method, device, electronic equipment and readable storage medium storing program for executing

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090021813A1 (en) * 2007-07-16 2009-01-22 Moore Peter N System and method for electronically displaying holographic images
WO2016077057A2 (en) * 2014-10-24 2016-05-19 Bounce Imaging, Inc. Imaging systems and methods
CN104574311B (en) * 2015-01-06 2017-08-11 华为技术有限公司 Image processing method and device
CN109683699B (en) * 2019-01-07 2022-03-29 深圳增强现实技术有限公司 Method and device for realizing augmented reality based on deep learning and mobile terminal

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101283375A (en) * 2005-10-04 2008-10-08 微软公司 Photographing big things
GB2465072A (en) * 2008-11-07 2010-05-12 Honeywell Int Inc Combining range information with images to produce new images of different perspective
CN101631257A (en) * 2009-08-06 2010-01-20 中兴通讯股份有限公司 Method and device for realizing three-dimensional playing of two-dimensional video code stream
CN107945282A (en) * 2017-12-05 2018-04-20 洛阳中科信息产业研究院(中科院计算技术研究所洛阳分所) The synthesis of quick multi-view angle three-dimensional and methods of exhibiting and device based on confrontation network
CN109889724A (en) * 2019-01-30 2019-06-14 北京达佳互联信息技术有限公司 Image weakening method, device, electronic equipment and readable storage medium storing program for executing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Simultaneous fusion algorithm for multi-view depth maps in binocular stereo measurement; 韦虎 et al.; Journal of Computer-Aided Design & Computer Graphics (《计算机辅助设计与图形学学报》); 2008-11-15; Vol. 20, No. 11; pp. 1446-1452 *

Also Published As

Publication number Publication date
CN110400337A (en) 2019-11-01

Similar Documents

Publication Publication Date Title
CN110400337B (en) Image processing method, image processing device, electronic equipment and storage medium
US10102679B2 (en) Determining space to display content in augmented reality
CN102867169B (en) Image processing equipment and display control method
JP4916548B2 (en) Establish and use dominant lines of images
EP3058451B1 (en) Techniques for navigation among multiple images
CN104081307A (en) Image processing apparatus, image processing method, and program
US20210041945A1 (en) Machine learning based gaze estimation with confidence
CN103997598A (en) Method of tracking object using camera and camera system for object tracking
US10769441B2 (en) Cluster based photo navigation
US12340010B2 (en) Method and apparatus for acquiring object's attention information and electronic device
CN110548289B (en) Method and device for displaying three-dimensional control
CN113014806A (en) Blurred image shooting method and device
CN106683152B (en) 3D visual effect analogy method and device
US20150109328A1 (en) Techniques for navigation among multiple images
CN111652831B (en) Object fusion method and device, computer-readable storage medium and electronic equipment
CN110030467B (en) Method, device and equipment for installing camera shooting assembly
EP3177005B1 (en) Display control system, display control device, display control method, and program
US20160110427A1 (en) Apparatus, system, and method for organizing and embedding applications
JP6717769B2 (en) Information processing device and program
CN107102794A (en) Operation processing method and device
US10652472B2 (en) Enhanced automatic perspective and horizon correction
Lu et al. Visually preserving stereoscopic image retargeting using depth carving
CN119888790A (en) Scene identification method, device, electronic equipment and medium
CN116312346A (en) Display screen control method and device, electronic equipment and medium
CN119762339A (en) Panoramic image generation method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant