
CN109788270B - 3D-360-degree panoramic image generation method and device - Google Patents


Info

Publication number
CN109788270B
CN109788270B
Authority
CN
China
Prior art keywords
image
right eye
layer
network model
eye views
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811619904.7A
Other languages
Chinese (zh)
Other versions
CN109788270A (en)
Inventor
周强
高宏彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Meilewei Electronic Technology Co ltd
Original Assignee
Nanjing Magewell Electronic Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Magewell Electronic Technology Co ltd filed Critical Nanjing Magewell Electronic Technology Co ltd
Priority to CN201811619904.7A priority Critical patent/CN109788270B/en
Publication of CN109788270A publication Critical patent/CN109788270A/en
Application granted granted Critical
Publication of CN109788270B publication Critical patent/CN109788270B/en

Landscapes

  • Image Processing (AREA)
  • Studio Devices (AREA)

Abstract

The present invention relates to a method and device for generating a 3D-360-degree panoramic image. A deep-learning neural network generates left- and right-eye views directly from the pictures of a surrounding multi-camera rig, and these views are then stitched into a 3D-360-degree panorama. The entire process of obtaining the left- and right-eye views is realized by automatic interpolation of suitable features learned through training of the network model, with no additional optical-flow computation. This greatly reduces the amount of calculation, increases the robustness of panorama generation while avoiding unacceptable artifacts, and raises the speed of panorama generation. The image generation process also copes with scenes with complex lighting, close-range targets, or sparse textures; image quality is high, and the efficient, high-quality, stable output can meet the needs of real-time panoramic image acquisition.

Figure 201811619904

Description

3D-360-degree panoramic image generation method and device
Technical Field
The invention relates to the technical field of camera shooting and image processing, in particular to a 3D-360-degree panoramic image generation method and device.
Background
With the continuing development of Virtual Reality (VR) technology, the production of VR content is becoming the weak link of the whole industry. 3D-360-degree panoramic video is an important direction for VR content in the video industry, and the performance of the acquisition equipment plays a crucial role in video image quality.
A 3D-360-degree panoramic image has a shot field of view reaching 360 degrees in the horizontal direction and contains left- and right-eye panoramic views with horizontal parallax; displayed respectively on the left and right screens of VR glasses, these views produce a realistic stereoscopic panoramic display effect.
The current method for stitching a 3D-360-degree panoramic image is as follows. A surrounding multi-camera rig shoots a set of images, and optical flow is computed between each pair of adjacent images; the optical flow gives, for any pixel in the overlapping region of the two pictures, its corresponding position in the other image. Left- and right-eye views are then interpolated from this pixel-level correspondence: if a pair of virtual cameras simulating the human eyes were positioned behind the two real cameras, the imaging of the two virtual camera views between the two cameras can be obtained entirely by interpolating the two real camera images. To achieve the 3D effect, two images of the same scene, one for each eye, must be synthesized, and a 360-degree panorama is obtained by sequentially joining the left- and right-eye images synthesized from each pair of cameras.
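To make this prior-art pipeline concrete, a minimal sketch of flow-based view interpolation is given below. The Farneback flow estimator, the warp-from-one-side approximation, and the use of OpenCV are illustrative assumptions for exposition, not the method of any particular cited reference.

```python
# Sketch of the prior-art interpolation step: dense optical flow between two
# adjacent camera images, then a virtual view at fraction t of the baseline.
# Farneback flow and a single-sided backward warp are illustrative choices.
import cv2
import numpy as np

def interpolate_view(img_a, img_b, t=0.5):
    gray_a = cv2.cvtColor(img_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(img_b, cv2.COLOR_BGR2GRAY)
    # Dense pixel-level correspondence between the two adjacent images.
    flow = cv2.calcOpticalFlowFarneback(gray_a, gray_b, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = gray_a.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    # Follow a fraction t of each pixel's flow toward the second image.
    map_x = (xs + t * flow[..., 0]).astype(np.float32)
    map_y = (ys + t * flow[..., 1]).astype(np.float32)
    return cv2.remap(img_a, map_x, map_y, cv2.INTER_LINEAR)
```

When the flow is wrong, this warp produces exactly the tearing and ghosting artifacts the next paragraph describes.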
In the conventional 3D-360 panoramic stitching process described above, the link that directly determines stitching quality is the optical-flow calculation between adjacent images: only an accurate pixel-level correspondence between adjacent views yields a flawless, artifact-free interpolation result. Optical-flow calculation has long been a hard problem in computer vision; it fails easily in scenes with complex lighting, close objects, or poor texture, and the stitched image then shows outright errors. Moreover, because optical flow must find, for every pixel of one image, its position in the other view, the computation is typically heavy. This makes real-time panoramic stitching at a high frame rate difficult, raises the hardware computing cost of a panoramic camera, and limits the large-scale application of panoramic stitching.
Disclosure of Invention
The invention aims to improve the existing 3D-360-degree panoramic image generation method and provides a 3D-360-degree panoramic image generation method and a device.
In order to achieve the technical purpose, the invention adopts the following technical scheme:
A 3D-360-degree panoramic image generation method comprises the following steps:
acquiring images from the cameras of a surrounding multi-camera rig, one image per camera;
preprocessing the plurality of acquired images into images meeting the input requirements of a network model; the network model is obtained by iterative training on multiple groups of image sample sets generated by virtual cameras and the corresponding 3D left and right eye views;
inputting the preprocessed images into the network model and computing the left and right eye views;
post-processing the obtained left and right eye views to restore the original image size and pixel value range;
stitching the post-processed left-eye views in sequence to obtain a left-eye panoramic view, stitching the right-eye views in sequence to obtain a right-eye panoramic view, and stitching the left-eye and right-eye panoramic views one above the other to obtain the required panoramic image.
Further, the network model is obtained through convolutional neural network training, the convolutional neural network comprising a plurality of first convolution layers, a first activation function layer, a pooling layer, a plurality of deconvolution layers, a second convolution layer and a second activation function layer, connected in sequence.
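For concreteness, the sketch below shows one way such an encoder-decoder network could be written in PyTorch. The depth, channel widths, kernel sizes, and the six-channel output (an RGB left view stacked with an RGB right view) are illustrative assumptions, not the patented configuration.

```python
# Encoder-decoder sketch (PyTorch assumed); depth, widths, and the
# 6-channel output (RGB left view + RGB right view) are illustrative.
import torch
import torch.nn as nn

class PanoViewNet(nn.Module):
    def __init__(self, in_ch=3, out_ch=6, base=32):
        super().__init__()
        # First convolution layers + first activation function (ReLU).
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, base, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(base, base * 2, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2)  # pooling layer (max pooling)
        # Deconvolution (transposed convolution) layer raises the scale again.
        self.up = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        # Second convolution layer + second activation function, applied after
        # the same-scale feature maps are connected in parallel (concatenated).
        self.dec = nn.Sequential(nn.Conv2d(base * 2, base, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(base, out_ch, 1)  # predicted left/right eye views

    def forward(self, x):
        f1 = self.enc1(x)              # convolution + nonlinear transformation
        f2 = self.enc2(self.pool(f1))  # repeated at a smaller scale
        u = self.up(f2)                # deconvolution: the scale rises again
        d = self.dec(torch.cat([u, f1], dim=1))  # parallel (skip) connection
        return self.head(d)
```

The parallel connection of same-scale feature maps lets the decoder reuse fine detail from the encoder, which is what the generation steps below describe.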
Further, the training method of the network model comprises the following steps:
acquiring the image sample sets generated by the virtual cameras and the plurality of corresponding 3D left and right eye views;
preprocessing the image sample sets;
inputting each group of preprocessed image samples into the convolutional neural network and outputting the generated left and right eye views; calculating the prediction error between the left and right eye views computed from each group of image samples and the left and right eye views obtained from the virtual cameras; and iteratively training the convolutional neural network by a supervised back-propagation method to obtain the deep-learning network model.
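A sketch of this supervised training loop follows. Mean squared error as the concrete prediction error and Adam as the optimizer are assumptions for illustration; the text specifies only a prediction error minimized by supervised back-propagation.

```python
# Supervised back-propagation sketch (PyTorch assumed). MSE and Adam are
# illustrative choices; the text only specifies a "prediction error".
import torch

def train_model(model, loader, epochs=10, lr=1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    for _ in range(epochs):
        for images, gt_views in loader:  # preprocessed samples + virtual-camera views
            pred_views = model(images)            # generated left and right eye views
            loss = loss_fn(pred_views, gt_views)  # prediction error vs. ground truth
            opt.zero_grad()
            loss.backward()                       # supervised back-propagation
            opt.step()
    return model
```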
Further, the method for generating a plurality of left and right eye views with the convolutional neural network comprises the following steps:
S1: performing a convolution operation on the acquired image through the first convolution layer, applying a nonlinear transformation to the convolution result through the first activation function layer, and applying a pooling operation to the nonlinear transformation result through the pooling layer;
S2: repeating S1 to obtain a series of feature maps of successively decreasing scale;
S3: upsampling the feature maps produced by the first half of the network through the deconvolution layers to obtain feature maps of successively increasing scale; then connecting the first-half and second-half feature maps of the same scale in parallel, performing a convolution operation on the result through the second convolution layer, and applying a nonlinear transformation to that convolution result through the second activation function layer;
S4: repeating S3 to obtain the prediction results for the left and right eye views.
Further, the activation functions used in the first activation function layer and the second activation function layer are rectified linear units (ReLU); the pooling layer uses max pooling.
Further, the method for preprocessing the image or the image sample set comprises the following steps:
scaling the image to a standard size;
normalizing the scaled image pixel values so that all pixel values lie between 0 and 1;
and zero-centering (0-mean processing) each pixel value in the normalized image.
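A minimal sketch of this preprocessing follows; 8-bit input images, OpenCV for resizing, the 256x256 standard size, and a per-image mean for the zero-centering step are all assumptions, since the text does not fix them.

```python
# Preprocessing sketch: resize to a standard size, normalize to [0, 1],
# zero-center. cv2, the 256x256 size, and the per-image mean are assumptions.
import cv2
import numpy as np

def preprocess(image, standard_size=(256, 256)):
    img = cv2.resize(image, standard_size)
    img = img.astype(np.float32) / 255.0  # all pixel values into [0, 1]
    img -= img.mean()                     # zero-mean the normalized image
    return img
```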
Further, the image post-processing method comprises the following steps:
multiplying the image pixel values by a coefficient to restore them to the original pixel value range;
and enlarging the range-restored image to the standard size.
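The inverse operations can be sketched as follows. The coefficient 255 matches the 0-255 example given later in Example 1; OpenCV resizing and the clipping of out-of-range predictions are assumptions.

```python
# Post-processing sketch: multiply by the range coefficient (255 for 8-bit
# images, as in Example 1), then resize. cv2 and the clipping are assumptions.
import cv2
import numpy as np

def postprocess(view, target_size=(256, 256), coeff=255.0):
    img = np.clip(view * coeff, 0, 255).astype(np.uint8)  # restore value range
    return cv2.resize(img, target_size)                   # restore the size
```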
Further, the method for acquiring the image sample sets and the corresponding 3D left and right eye views through virtual cameras comprises the following steps:
simulating a plurality of virtual cameras placed in a ring by using a VR graphics engine, forming an annular virtual camera set;
arranging a first group of virtual cameras completely horizontally at equal spacing, the virtual imaging scene containing objects and textures at various depths, each virtual camera imaging independently to obtain one image set;
placing a second group of two virtual cameras inside the ring of the annular virtual camera set, the second group recording only the vertical column of pixels directly in front of the camera optical centers; simulating the rotation of the second group of two virtual cameras about the center of the annular virtual camera set and recording the scan imaged directly in front of their optical centers, one left-eye view and one right-eye view being recorded each time the scan passes two peripheral virtual cameras, so that rotating the second group of two virtual cameras through 360 degrees records the left and right eye views corresponding to any two adjacent peripheral virtual cameras;
and repeating the above steps to obtain multiple groups of image samples and the corresponding left and right eye view data.
The device of the invention is realized by the following technical scheme:
a 3D-360 degree panoramic image generation apparatus comprising:
the surrounding multi-camera image acquisition module is used for acquiring images shot by surrounding multi-cameras, and each camera acquires an image;
the network model training module is used for training a network model; the network model is obtained by performing iterative training on a plurality of groups of image sample sets generated by the virtual camera and 3D left and right eye views corresponding to the image sample sets;
the image preprocessing module is connected with the surrounding multi-camera image acquisition module and used for preprocessing the acquired image into an image meeting the input requirement of a network model;
the view prediction module is connected with the network model training module and the image preprocessing module and is used for inputting a plurality of preprocessed images into the network model generated by the network model training module to obtain a plurality of left and right eye views;
the view post-processing module is connected with the view prediction module and used for restoring the acquired multiple left and right eye views to the original image size and the pixel value range;
and the panoramic image splicing module is connected with the view post-processing module and used for splicing all the left eye views subjected to post-processing to obtain a left eye panoramic view, splicing all the right eye views subjected to post-processing to obtain a right eye panoramic view, and splicing the left eye panoramic view and the right eye panoramic view up and down to obtain the required panoramic image.
Further, the network model training module performs training based on a convolutional neural network, which comprises a plurality of first convolution layers, a first activation function layer, a pooling layer, a plurality of deconvolution layers, a second convolution layer and a second activation function layer, connected in sequence.
The first convolution layer performs convolution operation on the acquired image;
the first activation function layer carries out nonlinear transformation on the convolution operation result;
the pooling layer performs pooling operation on the nonlinear transformation result to obtain a characteristic diagram with a plurality of scales descending continuously;
the deconvolution layer samples the result of the characteristic diagram, and then the characteristic diagram with the same scale is connected in parallel with a new characteristic diagram obtained by sampling the characteristic diagram;
the second convolution layer carries out convolution operation on the processing result of the deconvolution layer;
and the second activation function layer performs nonlinear transformation on the convolution operation result of the second convolution layer to obtain a plurality of prediction results of left and right eye views.
The invention adopts a deep-learning neural network method: left- and right-eye views are generated directly from the pictures of a surrounding multi-camera rig and then stitched into a 3D-360-degree panorama. The entire process of obtaining the left- and right-eye views is realized by automatic interpolation of suitable features learned through training of the network model, with no optical-flow computation required. This greatly reduces the amount of calculation, increases the robustness of panorama generation while avoiding unacceptable artifacts, and raises the speed of panorama generation. The image generation process also copes with scenes with complex lighting, close-range objects, or sparse textures. The image quality is high, and the efficient, high-quality, stable output can meet the needs of real-time panoramic image acquisition.
Drawings
FIG. 1 is a view showing the structure of the apparatus of the present invention.
Detailed Description
The technical solution of the present invention will be further described with reference to the accompanying drawings and detailed description.
Example 1
This example specifically illustrates an implementation of the method of the present invention.
The 3D-360 degree panoramic image generation method comprises the following steps:
s100, acquiring images of cameras surrounding a plurality of cameras, wherein each camera has one image;
s200, preprocessing a plurality of acquired images into images meeting the input requirements of a network model; the method comprises the following steps:
scaling the image to a standard size;
normalizing the scaled image pixel values so that all pixel values are between 0 and 1;
and averaging 0 of each pixel value in the normalized image.
S300, inputting the preprocessed multiple images into a network model, and calculating to obtain multiple left and right eye views;
the network model is obtained through convolutional neural network training, the convolutional neural network comprises a plurality of first convolutional layers, a first activation function layer, a pooling layer, a plurality of anti-convolutional layers, a second convolutional layer and a second activation function layer which are sequentially connected, and the training method comprises the following steps:
s310, acquiring an image sample set generated by the virtual camera and a plurality of 3D left and right eye views corresponding to the image sample set;
the method for acquiring the image sample set and the 3D left and right eye views corresponding to the image sample set through the virtual camera comprises the following steps:
simulating a plurality of virtual cameras to be placed in a surrounding mode by using a VR graphic Engine (such as a non-regional Engine, Unity 3D, CryENGINE and the like) to form an annular virtual camera group;
the method comprises the steps that a first group of virtual cameras are arranged in a system and are arranged in a horizontal equal proportion completely, a virtual imaging scene comprises objects and textures of depth of field, the simulated scene comprises an indoor space and an outdoor space, the simulated objects comprise people, buildings, office supplies, trees, flowers and plants, large stadiums, parks, the sky, the sea bottom, tunnels and the like, real world textures can be attached to the virtual scene, and each virtual camera is used for imaging independently to obtain a group of image sets.
The second group of two virtual cameras are placed in the circular ring of the annular virtual camera set, the distance between the two virtual cameras is set to be 6.4cm, the two virtual cameras of the second group only record pixels in the vertical direction in front of the optical center of the cameras, the virtual cameras of the second group are simulated to rotate by taking the circle center of the annular virtual camera set as the center, scanning imaging in front of the optical center of the two virtual cameras is recorded, each scanning passes through the two peripheral virtual cameras, namely, a left eye view and a right eye view are respectively recorded, and the two virtual cameras of the second group rotate for 360 degrees, so that left eye views and right eye views corresponding to any two adjacent peripheral virtual cameras are recorded.
And repeating the process to obtain a large amount of imaging data of the surrounding multi-camera for training and left and right eye diagram data corresponding to the imaging data.
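The capture geometry can be made concrete with a small sketch: the peripheral cameras sit evenly on a circle, and the inner stereo pair, 6.4 cm apart (roughly the human interpupillary distance), rotates about the circle's center. The ring radius and camera count below are illustrative assumptions; only the 6.4 cm baseline comes from the text.

```python
# Ring-layout sketch for the training-data rig. The radius and camera
# count are assumptions; the 6.4 cm stereo baseline is from the text.
import math

def peripheral_camera_positions(n_cameras=8, radius=0.15):
    """Evenly spaced peripheral virtual cameras on a circle (meters)."""
    return [(radius * math.cos(2 * math.pi * i / n_cameras),
             radius * math.sin(2 * math.pi * i / n_cameras))
            for i in range(n_cameras)]

def inner_stereo_pair(angle_rad, baseline=0.064):
    """Left/right virtual eye positions for one step of the 360-degree scan.

    The pair looks outward along angle_rad; the two optical centers sit on
    the perpendicular to that direction, half a baseline to each side.
    """
    px, py = -math.sin(angle_rad), math.cos(angle_rad)  # perpendicular direction
    h = baseline / 2.0
    return (-h * px, -h * py), (h * px, h * py)
```

Sweeping angle_rad through 360 degrees and recording the vertical pixel column ahead of each eye at every step reproduces the scan described above.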
S320, preprocessing the image sample sets;
the method comprises the following steps:
scaling the image to a standard size;
normalizing the scaled image pixel values so that all pixel values lie between 0 and 1;
and zero-centering (0-mean processing) each pixel value in the normalized image.
S330, inputting each group of preprocessed image samples into the convolutional neural network and outputting the generated left and right eye views, as follows:
S331, performing a convolution operation on the acquired image through the first convolution layer, applying a nonlinear transformation to the convolution result through the first activation function layer, and applying a pooling operation to the nonlinear transformation result through the pooling layer; the activation function used in the first activation function layer is a rectified linear unit (ReLU), and the pooling layer uses max pooling;
S332, repeating S331 to obtain a series of feature maps of successively decreasing scale;
S333, upsampling the feature maps produced by the first half of the network through the deconvolution layers to obtain feature maps of successively increasing scale; then connecting the first-half and second-half feature maps of the same scale in parallel, performing a convolution operation on the result through the second convolution layer, and applying a nonlinear transformation to that result through the second activation function layer, which is also a ReLU;
S334, repeating S333 to obtain the prediction results for the left and right eye views.
The prediction error is calculated between the left and right eye views computed from each group of image samples and the left and right eye views obtained from the virtual cameras, and the convolutional neural network is iteratively trained by a supervised back-propagation method to obtain the deep-learning network model.
S400, post-processing the left and right eye views to restore the original image size and pixel value range;
the method comprises the following steps:
multiplying the image pixel values by a coefficient to restore them to the original pixel value range; for example, for an image with an original pixel value range of 0-255, multiplying the image pixel values by a coefficient of 255;
and enlarging the range-restored image to the standard size.
S500, stitching the post-processed left-eye views in sequence to obtain a left-eye panoramic view, stitching the right-eye views in sequence to obtain a right-eye panoramic view, and stitching the left-eye and right-eye panoramic views one above the other to obtain the required panoramic image.
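This final step reduces to concatenating the per-camera views side by side and stacking the two panoramas; the NumPy sketch below illustrates it under the simplifying assumption that the views are already aligned, with seam blending omitted.

```python
# Minimal stitching sketch (NumPy assumed). Views are taken as already
# aligned H x W x 3 arrays; overlap blending at the seams is omitted.
import numpy as np

def stitch_panorama(left_views, right_views):
    left_pano = np.concatenate(left_views, axis=1)    # left views joined in order
    right_pano = np.concatenate(right_views, axis=1)  # right views joined in order
    # Left panorama on top, right panorama below, as in step S500.
    return np.concatenate([left_pano, right_pano], axis=0)
```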
Example 2
This example specifically illustrates an implementation of the apparatus of the present invention.
The 3D-360-degree panoramic image generation apparatus shown in FIG. 1 includes:
the surrounding multi-camera image acquisition module is used for acquiring images shot by surrounding multi-cameras, and each camera acquires an image;
the network model training module is used for training a network model; the network model is obtained by performing iterative training on a plurality of groups of image sample sets generated by the virtual camera and 3D left and right eye views corresponding to the image sample sets;
the image preprocessing module is connected with the surrounding multi-camera image acquisition module and used for preprocessing the acquired image into an image meeting the input requirement of a network model;
the view prediction module is connected with the network model training module and the image preprocessing module and is used for inputting a plurality of preprocessed images into the network model generated by the network model training module to obtain a plurality of left and right eye views;
the view post-processing module is connected with the view prediction module and used for restoring the acquired multiple left and right eye views to the original image size and the pixel value range;
and the panoramic image splicing module is connected with the view post-processing module and used for splicing all the left eye views subjected to post-processing to obtain a left eye panoramic view, splicing all the right eye views subjected to post-processing to obtain a right eye panoramic view, and splicing the left eye panoramic view and the right eye panoramic view up and down to obtain the required panoramic image.
The network model training module performs training based on a convolutional neural network, which comprises a plurality of first convolution layers, a first activation function layer, a pooling layer, a plurality of deconvolution layers, a second convolution layer and a second activation function layer, connected in sequence.
The first convolution layer performs convolution operation on the acquired image;
the first activation function layer carries out nonlinear transformation on the convolution operation result;
the pooling layer performs pooling operation on the nonlinear transformation result to obtain a characteristic diagram with a plurality of scales descending continuously;
the deconvolution layer samples the result of the characteristic diagram, and then the characteristic diagram with the same scale is connected in parallel with a new characteristic diagram obtained by sampling the characteristic diagram;
the second convolution layer carries out convolution operation on the processing result of the deconvolution layer;
and the second activation function layer performs nonlinear transformation on the convolution operation result of the second convolution layer to obtain a plurality of prediction results of left and right eye views.
The prediction error is calculated from the predicted left and right eye views and the virtual-camera left and right eye views corresponding to each group of images, and the convolutional and deconvolution networks are iteratively trained by a supervised back-propagation method to obtain the deep-learning network model.
The image preprocessing module comprises:
a scaling unit scaling the image to a standard size;
the normalization unit normalizes the zoomed image pixel values to ensure that all the pixel values are between 0 and 1;
and the zero-centering unit performs 0-mean processing on each pixel value in the normalized image.
The apparatus further comprises a virtual camera module, which acquires the image sample sets required by the network model training module and the corresponding left and right eye view recordings; through the virtual camera parameter settings it can also simulate disturbances of camera position and orientation during imaging.
The view post-processing module includes:
the value range recovery unit restores the pixel values of the images in the predicted left and right eye views to the original image value range;
and a scaling unit scaling the image to a standard size.

Claims (6)

1. A 3D-360-degree panoramic image generation method, characterized by comprising the following steps:
acquiring images from the cameras of a surrounding multi-camera rig, one image per camera;
preprocessing the acquired images into images meeting the input requirements of a network model, wherein the network model is obtained by iterative training on multiple groups of image sample sets generated by virtual cameras and the corresponding 3D left and right eye views; the network model is obtained through convolutional neural network training, the convolutional neural network comprising a plurality of first convolution layers, a first activation function layer, a pooling layer, a plurality of deconvolution layers, a second convolution layer and a second activation function layer, connected in sequence;
the training method of the network model comprises:
acquiring the image sample sets generated by the virtual cameras and the corresponding 3D left and right eye views;
preprocessing the image sample sets;
inputting each group of preprocessed image samples into the convolutional neural network and outputting the generated left and right eye views; calculating the prediction error between the left and right eye views computed from each group of image samples and the left and right eye views obtained from the virtual cameras, and iteratively training the convolutional neural network by a supervised back-propagation method to obtain the deep-learning network model;
the method for generating the left and right eye views with the convolutional neural network comprises:
S1: performing a convolution operation on the acquired image through the first convolution layer, applying a nonlinear transformation to the convolution result through the first activation function layer, and applying a pooling operation to the nonlinear transformation result through the pooling layer;
S2: repeating S1 to obtain a series of feature maps of successively decreasing scale;
S3: upsampling the feature maps produced by the first half of the network through the deconvolution layers to obtain feature maps of successively increasing scale; connecting the first-half and second-half feature maps of the same scale in parallel, performing a convolution operation on the result through the second convolution layer, and applying a nonlinear transformation to that convolution result through the second activation function layer;
S4: repeating S3 to obtain the prediction results for the left and right eye views;
inputting the preprocessed images into the network model and computing the left and right eye views;
post-processing the obtained left and right eye views to restore the original image size and pixel value range;
stitching the post-processed left-eye views in sequence to obtain a left-eye panoramic view, stitching the right-eye views in sequence to obtain a right-eye panoramic view, and stitching the left-eye and right-eye panoramic views one above the other to obtain the required panoramic image.

2. The 3D-360-degree panoramic image generation method according to claim 1, characterized in that the activation functions used in the first activation function layer and the second activation function layer are rectified linear units; the pooling layer uses max pooling.

3. The 3D-360-degree panoramic image generation method according to claim 1, characterized in that the method for preprocessing the image or image sample set comprises:
scaling the image to a standard size;
normalizing the scaled image pixel values so that all pixel values lie between 0 and 1;
and zero-centering (0-mean processing) each pixel value in the normalized image.

4. The 3D-360-degree panoramic image generation method according to claim 1, characterized in that the image post-processing method comprises:
multiplying the image pixel values by a coefficient to restore them to the original image pixel value range;
and enlarging the range-restored image to the standard size.

5. The 3D-360-degree panoramic image generation method according to claim 1, characterized in that the method for acquiring the image sample sets and the corresponding 3D left and right eye views through virtual cameras comprises:
simulating a plurality of virtual cameras placed in a ring by using a VR graphics engine, forming an annular virtual camera group;
arranging a first group of virtual cameras completely horizontally at equal spacing, the virtual imaging scene containing objects and textures at various depths, each virtual camera imaging independently to obtain one image set;
placing a second group of two virtual cameras inside the ring of the annular virtual camera group, the second group recording only the vertical column of pixels directly in front of the camera optical centers; simulating the rotation of the second group of two virtual cameras about the center of the annular virtual camera group and recording the scan imaged directly in front of their optical centers, one left-eye view and one right-eye view being recorded each time the scan passes two peripheral virtual cameras, so that rotating the second group of two virtual cameras through 360 degrees records the left and right eye views corresponding to any two adjacent peripheral virtual cameras;
repeating the above steps to obtain multiple groups of image samples and the corresponding left and right eye view data.

6. A 3D-360-degree panoramic image generation device, characterized by comprising:
a surround multi-camera image acquisition module, configured to acquire the images shot by the surrounding multi-camera rig, one image per camera;
a network model training module, configured to train a network model, the network model being obtained by iterative training on multiple groups of image sample sets generated by virtual cameras and the corresponding 3D left and right eye views;
the network model training module trains on the basis of a convolutional neural network comprising a plurality of first convolution layers, a first activation function layer, a pooling layer, a plurality of deconvolution layers, a second convolution layer and a second activation function layer, connected in sequence;
the first convolution layer performs a convolution operation on the acquired image;
the first activation function layer applies a nonlinear transformation to the convolution result;
the pooling layer applies a pooling operation to the nonlinear transformation result to obtain a series of feature maps of successively decreasing scale;
the deconvolution layers upsample the feature maps, and each upsampled feature map is connected in parallel with the feature map of the same scale;
the second convolution layer performs a convolution operation on the deconvolution layer output;
the second activation function layer applies a nonlinear transformation to the convolution result of the second convolution layer to obtain the prediction results for the left and right eye views;
an image preprocessing module, connected to the surround multi-camera image acquisition module and configured to preprocess the acquired images into images meeting the input requirements of the network model;
a view prediction module, connected to the network model training module and the image preprocessing module and configured to input the preprocessed images into the network model generated by the network model training module to obtain the left and right eye views;
a view post-processing module, connected to the view prediction module and configured to restore the obtained left and right eye views to the original image size and pixel value range;
a panoramic image stitching module, connected to the view post-processing module and configured to stitch all post-processed left-eye views into a left-eye panoramic view, stitch all post-processed right-eye views into a right-eye panoramic view, and stitch the left-eye and right-eye panoramic views one above the other to obtain the required panoramic image.
CN201811619904.7A 2018-12-28 2018-12-28 3D-360-degree panoramic image generation method and device Active CN109788270B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811619904.7A CN109788270B (en) 2018-12-28 2018-12-28 3D-360-degree panoramic image generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811619904.7A CN109788270B (en) 2018-12-28 2018-12-28 3D-360-degree panoramic image generation method and device

Publications (2)

Publication Number Publication Date
CN109788270A CN109788270A (en) 2019-05-21
CN109788270B true CN109788270B (en) 2021-04-09

Family

ID=66498593

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811619904.7A Active CN109788270B (en) 2018-12-28 2018-12-28 3D-360-degree panoramic image generation method and device

Country Status (1)

Country Link
CN (1) CN109788270B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR3078564B1 (en) * 2018-03-01 2020-09-11 4D View Solutions THREE-DIMENSIONAL MODELING SYSTEM OF A SCENE BY MULTI-VIEW PHOTOGRAMMETRY
US12367546B2 (en) 2019-12-24 2025-07-22 Nvidia Corporation Panorama generation using one or more neural networks
CN111476759B (en) * 2020-03-13 2022-03-25 深圳市鑫信腾机器人科技有限公司 Screen surface detection method and device, terminal and storage medium
CN113313771B (en) * 2021-07-19 2021-10-12 山东捷瑞数字科技股份有限公司 Omnibearing measuring method for industrial complex equipment
CN114742703B (en) * 2022-03-11 2024-12-20 影石创新科技股份有限公司 Method, device, equipment and storage medium for generating binocular stereoscopic panoramic images


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108648161B (en) * 2018-05-16 2020-09-01 江苏科技大学 Binocular Vision Obstacle Detection System and Method Based on Asymmetric Kernel Convolutional Neural Network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011144793A1 (en) * 2010-05-18 2011-11-24 Teknologian Tutkimuskeskus Vtt Mobile device, server arrangement and method for augmented reality applications
CN107018400A (en) * 2017-04-07 2017-08-04 华中科技大学 It is a kind of by 2D Video Quality Metrics into 3D videos method
CN108616746A (en) * 2017-10-12 2018-10-02 叠境数字科技(上海)有限公司 The method that 2D panoramic pictures based on deep learning turn 3D panoramic pictures
CN108830890A (en) * 2018-04-24 2018-11-16 广州启辰电子科技有限公司 A method of scene geometric information being estimated from single image using production confrontation network

Also Published As

Publication number Publication date
CN109788270A (en) 2019-05-21

Similar Documents

Publication Publication Date Title
CN109788270B (en) 3D-360-degree panoramic image generation method and device
KR101295471B1 (en) A system and method for 3D space-dimension based image processing
CN107862718B (en) 4D holographic video capture method
EP4036863A1 (en) Human body model reconstruction method and reconstruction system, and storage medium
CN109801234B (en) Image geometry correction method and device
WO2019041351A1 (en) Real-time aliasing rendering method for 3d vr video and virtual three-dimensional scene
CN106651808B (en) Fisheye diagram conversion method and device
CN112085659A (en) A panorama stitching fusion method, system and storage medium based on spherical screen camera
EP4091141A1 (en) Neural blending for novel view synthesis
Yan et al. Stereoscopic image stitching based on a hybrid warping model
CN117274501B (en) Drivable digital person modeling method, device, equipment and medium
CN107918948B (en) 4D video rendering method
CN109523635A (en) A kind of non-rigid reconstruction of 3D anthropometric scanning and measurement method and device
US11328436B2 (en) Using camera effect in the generation of custom synthetic data for use in training an artificial intelligence model to produce an image depth map
CN113298946B (en) House three-dimensional reconstruction and ground recognition method, device, equipment and storage medium
CN118196332A (en) Multi-view human body surface reconstruction method and system
JP6799468B2 (en) Image processing equipment, image processing methods and computer programs
CN116708862A (en) Virtual background generation method for live broadcasting room, computer equipment and storage medium
CN111629194B (en) Method and system for converting panoramic video into 6DOF video based on neural network
CN115035178A (en) Augmented reality display system, method and storage medium
Shilaskar et al. A scalable structure-from-motion framework for efficient 2D-to-3D reconstruction of historical artifacts
CN115314672A (en) A real-time three-dimensional panoramic video surveillance system
CN119172516A (en) A method for converting a two-dimensional image into a three-dimensional image
CN114529673A (en) Display method and display system for three-dimensional touring scene and display terminal
CN117527994A (en) Visual presentation method and system for space simulation shooting

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: No. 68-66 Suyuan Avenue, Jiangning District, Nanjing City, Jiangsu Province, 211100
Patentee after: Nanjing Meilewei Electronic Technology Co.,Ltd.
Country or region after: China

Address before: 211100 14th floor, building 3, No.89 Shengli Road, Jiangning Development Zone, Nanjing City, Jiangsu Province
Patentee before: NANJING MAGEWELL ELECTRONIC TECHNOLOGY Co.,Ltd.
Country or region before: China