
CN114648472B - Image fusion model training method, image generation method and device - Google Patents

Image fusion model training method, image generation method and device

Info

Publication number
CN114648472B
CN114648472B (application CN202011504084.4A)
Authority
CN
China
Prior art keywords
image
images
grid
mosaic
generating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011504084.4A
Other languages
Chinese (zh)
Other versions
CN114648472A (en)
Inventor
周永翔
干宏华
杨蕊
吴增德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Public Information Industry Co ltd
Original Assignee
Zhejiang Public Information Industry Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Public Information Industry Co ltd filed Critical Zhejiang Public Information Industry Co ltd
Priority to CN202011504084.4A
Publication of CN114648472A
Application granted
Publication of CN114648472B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract


The present disclosure relates to a training method for an image fusion model, an image generation method, and an apparatus thereof. A training method for an image fusion model based on a neural network is provided, comprising: receiving M input images of a specific scene, where M is an integer greater than or equal to 3; generating a three-dimensional global grid of the scene based on the M input images; selecting one input image from the M input images as a reference image; using M-1 non-reference images from the M input images to generate n mosaic images for the perspective of the reference image, where n is an integer greater than or equal to 2 and n is less than or equal to M-1; inputting the three-dimensional global grid and the n mosaic images as training images into a fusion model to generate a predicted image with the same perspective as the reference image; calculating an error between the predicted image and the reference image using a cost function; and adjusting a fusion weight of the image fusion model using the error to reduce the error.

Description

Training method of image fusion model, image generation method and device thereof
Technical Field
The present disclosure relates generally to training methods for image fusion models, image generation methods, devices and media therefor.
Background
There is an increasing demand for real-time, realistic, easy-to-capture 3D content suitable for free, interactive navigation. When images of a scene from multiple views (or viewpoints) have been obtained, it is desirable to easily derive images of other views from them.
Disclosure of Invention
The following presents a simplified summary of the disclosure in order to provide a basic understanding of some aspects of the disclosure. It should be understood that this summary is not an exhaustive overview of the disclosure. It is not intended to identify key or critical elements of the disclosure or to delineate the scope of the disclosure. Its purpose is to present some concepts related to the disclosure in a simplified form as a prelude to the more detailed description that is presented later.
According to one aspect of the disclosure, a training method of an image fusion model based on a neural network is provided, comprising receiving M input images of a specific scene, wherein M is an integer greater than or equal to 3, generating a three-dimensional global grid of the scene based on the M input images, selecting one of the M input images as a reference image, generating n mosaic images for a view angle of the reference image using M-1 non-reference images of the M input images, wherein n is an integer greater than or equal to 2 and n is less than or equal to M-1, inputting the three-dimensional global grid and the n mosaic images as training images into the fusion model, generating a predicted image of the same view angle as that of the reference image, calculating an error between the predicted image and the reference image using a cost function, and adjusting a fusion weight of the image fusion model using the error to reduce the error.
According to one aspect of the present disclosure, there is provided an image generation method including receiving L input images of a specific scene, where L is an integer greater than or equal to 2, generating a three-dimensional global grid of the scene based on the L input images, selecting a new view angle different from a view angle of the L input images, generating n mosaic images for the new view angle using the L input images, where n is an integer greater than or equal to 2 and n is less than or equal to L, and inputting the three-dimensional global grid and the n mosaic images into an image fusion model obtained according to the above method, generating a predicted image of the new view angle.
According to another aspect of the present disclosure, there is provided a training apparatus of a neural network-based image fusion model, including a memory having instructions stored thereon, and a processor configured to execute the instructions stored on the memory to perform the above-described training method of the neural network-based image fusion model.
According to another aspect of the present disclosure, there is provided an image generating apparatus including a memory having instructions stored thereon, and a processor configured to execute the instructions stored on the memory to perform the above-described image generating method.
According to yet another aspect of the present disclosure, there is provided a computer-readable storage medium comprising computer-executable instructions which, when executed by one or more processors, cause the one or more processors to perform the method according to any of the above aspects of the present disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The disclosure may be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings in which:
Fig. 1 shows one example of a scenario to which the present disclosure is to be applied.
Fig. 2 shows an example of an image and its three-dimensional global grid.
Fig. 3 shows an example of a generation flow of a mosaic image.
Fig. 4 is a schematic diagram for explaining mosaic generation of different meshes of a mosaic image.
Fig. 5 is a schematic diagram for explaining mosaic generation of different meshes of a mosaic image.
Fig. 6 illustrates a flowchart example of a training method of a neural network-based image fusion model according to the present disclosure.
Fig. 7 illustrates an example of a convolutional neural network architecture according to the present disclosure.
Fig. 8 shows an example of a prediction process of a new view image.
Fig. 9 illustrates an example of a training and fusion process of a neural network-based image fusion model, according to one embodiment of the present disclosure.
FIG. 10 illustrates an exemplary configuration of a computing device in which embodiments according to the present disclosure may be implemented.
Detailed Description
The following detailed description is made with reference to the accompanying drawings and is provided to assist in a comprehensive understanding of various example embodiments of the disclosure. The following description includes various details to aid in understanding, but these are to be considered merely examples and are not intended to limit the disclosure, which is defined by the appended claims and their equivalents. The words and phrases used in the following description are only intended to provide a clear and consistent understanding of the present disclosure. In addition, descriptions of well-known structures, functions and configurations may be omitted for clarity and conciseness. Those of ordinary skill in the art will recognize that various changes and modifications of the examples described herein can be made without departing from the spirit and scope of the present disclosure.
In three-dimensional event live broadcast, three-dimensional real-time video monitoring, three-dimensional tourism and other scenes, the fusion of images is often required. Taking live video broadcasting as an example, as shown in fig. 1, after capturing real-time video by cameras around a football field, it is desirable to fuse three-dimensional video images according to the viewing angle of a client and transmit the fused three-dimensional video images to the client.
To this end, the present disclosure proposes an image fusion technique. For the application scenario shown in fig. 1, for example, the fusion technique of the present disclosure lets a spectator watch a football match as if present in the stadium: the spectator can view the game from any angle, move freely through the virtual scene, and see different images at different positions in three-dimensional space.
A training method of an image fusion model based on a neural network according to one embodiment of the present disclosure includes receiving M input images of a specific scene, where M is an integer greater than or equal to 3, generating a three-dimensional global grid of the scene based on the M input images, selecting one of the M input images as a reference image, generating n mosaic images for a view angle of the reference image using M-1 non-reference images of the M input images, where n is an integer greater than or equal to 2 and n is less than or equal to M-1, inputting the three-dimensional global grid and the n mosaic images as training images into the image fusion model, generating a prediction image of the same view angle as the view angle of the reference image, calculating an error between the prediction image and the reference image using a cost function, and adjusting a fusion weight of the image fusion model using the error to reduce the error.
In one embodiment, the method may further include iterating the steps of generating the predicted image, calculating the error, and adjusting the fusion weight until the error is less than a predetermined value or the number of iterations reaches a predetermined number.
In the step of receiving M input images of a particular scene, after a set of images is collected, the present disclosure determines the spatial and geometric relationships of objects via structure from motion (SfM), exploiting the movement of the camera, and generates a three-dimensional global grid. Fig. 2 shows an example of an image and its three-dimensional global grid. Alternatively, when generating the three-dimensional global grid, the image depth may also be computed by multi-view stereo (MVS) to build a local depth map.
In order to generate an image of a new view from images photographed at existing views, the present disclosure needs to generate n mosaic images (fig. 3 shows the generation flow) in addition to the three-dimensional global grid, where n is an integer greater than or equal to 2 and less than or equal to M-1.
In one embodiment, the step of generating the n mosaic images for the view angle of the reference image may include: for each grid of the three-dimensional global grid, calculating the weights of the M-1 non-reference images on that grid; selecting the n non-reference images with the highest weights; obtaining warped projections of those n non-reference images onto that grid; and generating the n mosaic images from the per-grid warped projections. The pixels at each grid of the first mosaic image are obtained by warp-projecting the pixels of the non-reference image with the highest weight on that grid; the pixels at each grid of the second mosaic image are obtained by warp-projecting the pixels of the non-reference image with the second-highest weight on that grid; and so on.
Fig. 3 shows an example of a generation flow of a mosaic image. Fig. 4 is a schematic diagram for explaining mosaic generation of different meshes of a mosaic image.
In the following description, n is 4 (i.e., four mosaic images are obtained) as an example. However, n may be set to any value greater than 1.
In fig. 4, assume that five images have been obtained from cameras 1, 2, 3, 4, and 5. We need to compute the four mosaic images with the highest weights for the new view x.
The triangular mesh m (denoted t_m) is one of the many meshes in the three-dimensional grid established through the above procedure. As an example, the cosine of the angle between the normal of the lens of the camera that captured a specific image and the normal of a grid in the three-dimensional global grid may be used as the weight of that image on that grid. For example, the mosaic with the highest priority (weight) may be selected according to the magnitude of the weight W_cm, the cosine of the angle between the lens normal of camera c and the normal of t_m. However, other parameters may also be used as the weight W_cm.
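This cosine weight can be sketched as follows (a hypothetical helper for illustration only; as the text notes, other parameters may also serve as the weight):

```python
import numpy as np

def view_weight(camera_normal, mesh_normal):
    """Weight W_cm of camera c on triangular mesh t_m: the cosine of the
    angle between the camera lens normal and the mesh normal."""
    a = np.asarray(camera_normal, dtype=float)
    b = np.asarray(mesh_normal, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# A camera aligned with the mesh normal gets weight 1.0;
# a camera at a 60-degree angle gets weight 0.5.
w_front = view_weight([0, 0, 1], [0, 0, 1])
w_oblique = view_weight([0, np.sin(np.pi / 3), np.cos(np.pi / 3)], [0, 0, 1])
```

Higher cosine means the camera faces the mesh more directly, so its pixels are preferred on that mesh.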
Taking triangular mesh m of fig. 4 as an example, with the cosine of the included angle as the weight, the weight of camera 1 at mesh m exceeds that of camera 2, which exceeds those of cameras 3, 4, and 5 in turn, i.e., W_1m > W_2m > W_3m > W_4m > W_5m. Likewise, for triangular mesh n, W_3n > W_4n > W_5n > W_2n > W_1n.
In the new view direction x, let p_cm denote the projection of the pixels of the image shot by camera c onto the mesh-m region. Combining the weights calculated above, the four mosaic images with the highest weights can then be obtained. The projections of mesh m and mesh n on these four mosaic images are shown in fig. 5. The m and n triangular meshes of the first-priority mosaic image (i.e., the first mosaic image) are filled by warped pixel projections from camera 1 and camera 3, respectively; the m and n triangular meshes of the second-priority mosaic image (i.e., the second mosaic image) are filled by warped pixel projections from camera 2 and camera 4, respectively; and so on.
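A sketch of this per-mesh ranking and mosaic assembly, assuming the warped projections p_cm have already been computed (the data layout and names are illustrative, not the patent's actual representation):

```python
import numpy as np

def build_mosaics(weights, warped, n):
    """Assemble n mosaic images from per-mesh camera weights.

    weights: dict mesh_id -> {camera_id: weight W_cm}
    warped:  dict (camera_id, mesh_id) -> warped pixel patch p_cm
             (camera c's pixels projected onto mesh m in the new view x)
    Returns: list of n dicts; mosaics[k][mesh_id] is the patch from the
             camera with the (k+1)-th highest weight on that mesh.
    """
    mosaics = [dict() for _ in range(n)]
    for mesh_id, cam_weights in weights.items():
        # Rank cameras on this mesh by descending weight.
        ranked = sorted(cam_weights, key=cam_weights.get, reverse=True)
        for k in range(n):
            mosaics[k][mesh_id] = warped[(ranked[k], mesh_id)]
    return mosaics

# Toy example mirroring fig. 4: on mesh 'm' camera 1 ranks first,
# on mesh 'n' camera 3 ranks first.
weights = {'m': {1: .9, 2: .7, 3: .5, 4: .3, 5: .1},
           'n': {3: .9, 4: .7, 5: .5, 2: .3, 1: .1}}
warped = {(c, g): f'p_{c}{g}' for c in (1, 2, 3, 4, 5) for g in ('m', 'n')}
mosaics = build_mosaics(weights, warped, n=4)
```

The first mosaic thus draws mesh m from camera 1 and mesh n from camera 3, matching the description of fig. 5.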
The foregoing procedure yields a global three-dimensional grid of the three-dimensional images plus 4 high-priority mosaic images, totaling 5 images. The neural-network-based image fusion model is then trained with these 5 images as input. In the present disclosure, the image fusion model may be based on any neural network or combination of neural networks, including feedforward neural networks, recurrent neural networks, convolutional neural networks, deep belief networks, generative adversarial networks, and so forth. Hereinafter, a convolutional neural network (CNN) is described merely as an example.
The training process is shown in fig. 6. First, a global grid is generated once from the M images of a particular scene; it is then reused across the subsequent training iterations. Next, M-1 of the M images are selected to generate four mosaic images, which are input together with the global grid into the network's forward pass to obtain a predicted image of the new view angle. The neural network contains image fusion weights, which may initially take any values. After the predicted image of the new view is obtained, a training loss is calculated from the difference between the predicted image and the reference image. The image fusion weights of the neural network model are then adjusted by backpropagation.
The foregoing is a generic training procedure for a deep neural network. A specific convolutional neural network architecture example is described below; as shown in fig. 7, it comprises a contracting path on the left and an expanding path on the right. Note that, as stated above, the present disclosure may employ any neural network, not only convolutional ones.
In the example shown in fig. 7, the contracting path may employ a typical convolutional neural network architecture. For example, the following operations may be performed:
(1) Apply a 3x3 convolution to the image, followed by a ReLU operation.
(2) Repeat step (1) on the result of step (1).
(3) Downsample via a max-pooling operation with stride 2 and a 2x2 window.
(4) Return to step (1) and repeat these operations several times.
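The contracting-path steps above can be sketched minimally in NumPy (single channel, a tiny 32x32 input standing in for the 572x572 of fig. 7, and one shared kernel; purely illustrative, not the patent's actual network):

```python
import numpy as np

def conv3x3_valid(x, k):
    """3x3 'valid' convolution on a single-channel image:
    trims one pixel from every border, (H, W) -> (H-2, W-2)."""
    h, w = x.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(x[i:i + 3, j:j + 3] * k)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def maxpool2x2(x):
    """2x2 max pooling with stride 2: halves the resolution."""
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# One contracting block: conv3x3 + ReLU, conv3x3 + ReLU, 2x2 max-pool.
rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32))
kern = rng.standard_normal((3, 3))
x = relu(conv3x3_valid(img, kern))   # 32 -> 30
x = relu(conv3x3_valid(x, kern))     # 30 -> 28
x = maxpool2x2(x)                    # 28 -> 14
```

With a 572x572 input, the same block would give 570, then 568, then 284, consistent with the size arithmetic described below.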
The expanding path on the right of fig. 7 mirrors the contracting path, but replaces downsampling with upsampling. Each upsampling stage of the expanding path consists of the following parts:
(1) A 2x2 up-convolution operation, which reduces the number of feature channels.
(2) A 3x3 convolution operation, followed by a ReLU operation.
(3) Repeat step (2).
(4) Return to step (1) and repeat these operations several times.
Since each convolution operation loses some edge pixels, an input image of 572x572 pixels processed as described above yields an output resolution of 388x388; that is, 92 pixels are clipped from each side of the image. Needless to say, this resolution and number of clipped pixels are merely examples, and any other suitable values may be employed by the present disclosure.
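The size arithmetic can be verified with a short helper, assuming the standard "valid"-convolution layout suggested by fig. 7 (two 3x3 convolutions per level, four pooling stages down, four up-convolution stages back up; this layout is an assumption, not stated explicitly in the text):

```python
def unet_output_size(size, depth=4):
    """Trace 'valid' size arithmetic: each 3x3 convolution removes 2
    pixels, 2x2 pooling halves, 2x2 up-convolution doubles."""
    for _ in range(depth):       # contracting path: two convs, then pool
        size = (size - 4) // 2
    size -= 4                    # two bottleneck convolutions
    for _ in range(depth):       # expanding path: up-conv, then two convs
        size = size * 2 - 4
    return size

out = unet_output_size(572)      # output resolution per side
crop = (572 - out) // 2          # pixels clipped from each side
```

Under these assumptions, `out` is 388 and `crop` is 92, matching the figures quoted above.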
The new-view image prediction process (the forward pass of the neural network) is shown in fig. 8; the relevant parameters follow the image sizes of fig. 7 (input images 572x572, output image 388x388):
(1) After the input images are processed by the method of the present disclosure, a global grid map and the 4 highest-priority (highest-weight) mosaic images are obtained, 5 images in total.
(2) These 5 images are then input into the CNN (i.e., the neural-network-based image fusion model), which contains the image fusion weights.
(3) A pixel-level weighted sum is computed over the 5 input images. The method is as follows:
a. Let r_mij be the value at row i, column j of the m-th frame of the image fusion weights, where 0 < m < 6 and 1 <= i, j <= 388.
b. Let c_mij be the value at row i, column j of the m-th frame of the cropped input images, with the same index ranges.
c. The pixel value at row i, column j of the predicted image is then
p_ij = sum_{m=1}^{k} r_mij * c_mij,
where k = n + 1.
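The pixel-level weighted sum with k = n + 1 frames can be sketched in NumPy (array names are illustrative, and the uniform initial weights are an assumption for the toy check):

```python
import numpy as np

def fuse(weights, images):
    """Pixel-level weighted sum of k = n + 1 input frames (the global
    grid map plus the n mosaics): p_ij = sum_m r_mij * c_mij.

    weights: array (k, H, W) of fusion weights r_mij
    images:  array (k, H, W) of cropped input frames c_mij
    """
    return np.sum(weights * images, axis=0)

# Toy check with k = 5 frames of size 388x388, as in fig. 7.
k, h, w = 5, 388, 388
weights = np.full((k, h, w), 1.0 / k)                    # uniform weights
images = np.stack([np.full((h, w), float(m)) for m in range(k)])
pred = fuse(weights, images)
```

With uniform weights the prediction is simply the per-pixel mean of the five frames; training replaces these with learned, per-pixel fusion weights.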
(4) Finally, obtaining the image prediction result of the new visual angle.
The three-dimensional image training process and depth fusion process are shown in fig. 9. The dashed-box region in the upper left is the new-view image prediction process depicted in fig. 8, which yields a prediction via the network's forward pass. For the M input images of a particular scene, the complete training and fusion process is as shown in fig. 9:
(1) First, build the global grid from the M images.
(2) Select 1 image as the original reference image, and take its view angle as the new view angle.
(3) Generate, for example, the 4 highest-priority mosaic images from the remaining M-1 images.
(4) With the global grid of step (1) and the 4 mosaic images of step (3) as inputs, generate an image prediction result for the new view of step (2) through an image prediction process such as the convolutional neural network of fig. 8. Compared to the 5 input images, the predicted image is cropped by a predetermined number of pixels on each side.
(5) Crop the same predetermined number of pixels as in step (4) from the periphery of the original reference image of step (2) to obtain a cropped reference image.
(6) Input the image prediction result of step (4) and the cropped reference image of step (5) into the cost function to obtain an error.
(7) Adjust the weights of the neural network through the backward (backpropagation) weight-adjustment process of the convolutional neural network so as to reduce the prediction error.
As previously described, the above steps may be iterated to continually adjust the neural network weights and thereby continually reduce the error.
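An illustrative sketch of iterating steps (4)-(7), heavily simplified for exposition (not the patent's actual network: fusion is reduced here to one scalar weight per frame, and the "reference" is synthesized from known weights, purely to show error-driven weight adjustment by gradient descent):

```python
import numpy as np

rng = np.random.default_rng(42)
k, h, w = 5, 8, 8                         # tiny stand-in for 5 input frames
frames = rng.standard_normal((k, h, w))   # global grid map + 4 mosaics
true_w = np.array([.4, .3, .15, .1, .05])
reference = np.tensordot(true_w, frames, axes=1)  # cropped reference image

weights = np.full(k, 1.0 / k)             # initial fusion weights
lr = 0.05

def loss(wt):
    """Mean-squared error between prediction and cropped reference."""
    pred = np.tensordot(wt, frames, axes=1)
    return float(np.mean((pred - reference) ** 2))

err0 = loss(weights)
for _ in range(200):                      # iterate steps (4)-(7)
    pred = np.tensordot(weights, frames, axes=1)
    grad = np.array([2 * np.mean((pred - reference) * frames[m])
                     for m in range(k)])  # d(MSE)/d(weight_m)
    weights -= lr * grad                  # backward weight adjustment
err1 = loss(weights)
```

Each iteration predicts, measures the error against the cropped reference, and nudges the fusion weights downhill, so the error shrinks toward zero as training proceeds.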
As described above, in the case where the neural network is a convolutional neural network, the peripheries of the prediction image and the reference image may be clipped by a predetermined number of pixels as compared to the input image, and the error may be calculated using the clipped prediction image and reference image.
In one embodiment, the training method may further include training the neural network-based image fusion model multiple times by changing an image serving as a reference image among the M input images, or using different M input images.
Further, the disclosure may also include an image generation method including receiving L input images of a particular scene, where L is an integer greater than or equal to 2, generating a three-dimensional global grid of the scene based on the L input images, selecting a new view angle different from a view angle of the L input images, generating n mosaic images for the new view angle using the L input images, where n is an integer greater than or equal to 2 and n is less than or equal to L, and inputting the three-dimensional global grid and the n mosaic images into an image fusion model obtained by the method as described above, generating a predicted image of the new view angle.
Fig. 10 illustrates an exemplary configuration of a computing device 1200 capable of implementing embodiments in accordance with the present disclosure.
Computing device 1200 is an example of a hardware device that can employ the above aspects of the disclosure. Computing device 1200 may be any machine configured to perform processing and/or calculations. Computing device 1200 may be, but is not limited to, a workstation, a server, a desktop computer, a laptop computer, a tablet computer, a Personal Data Assistant (PDA), a smart phone, an in-vehicle computer, or a combination thereof.
As shown in fig. 10, computing device 1200 may include one or more elements that may be connected to or in communication with a bus 1202 via one or more interfaces. The bus 1202 may include, but is not limited to, an industry standard architecture (Industry Standard Architecture, ISA) bus, a micro channel architecture (Micro Channel Architecture, MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus. Computing device 1200 may include, for example, one or more processors 1204, one or more input devices 1206, and one or more output devices 1208. The one or more processors 1204 may be any kind of processor and may include, but are not limited to, one or more general purpose processors or special purpose processors (such as special purpose processing chips). The processor 1204 may be configured to implement, for example, a training method or an image generation method as described above. Input device 1206 may be any type of input device capable of inputting information to a computing device, and may include, but is not limited to, a mouse, keyboard, touch screen, microphone, and/or remote controller. The output device 1208 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, video/audio output terminals, vibrators, and/or printers.
The computing device 1200 may also include or be connected to a non-transitory storage device 1214, which may be any storage device that is non-transitory and enables data storage, and may include, but is not limited to, disk drives, optical storage devices, solid state memory, floppy disks, flexible disks, hard disks, magnetic tape, or any other magnetic medium, compact disks or any other optical medium, cache memory and/or any other memory chip or module, and/or any other medium from which a computer may read data, instructions, and/or code. Computing device 1200 may also include Random Access Memory (RAM) 1210 and Read Only Memory (ROM) 1212. The ROM 1212 may store programs, utilities or processes to be executed in a non-volatile manner. The RAM 1210 may provide volatile data storage and stores instructions related to the operation of the computing device 1200. The computing device 1200 may also include a network/bus interface 1216 coupled to the data link 1218. The network/bus interface 1216 can be any kind of device or system capable of enabling communication with external equipment and/or networks, and can include, but is not limited to, modems, network cards, infrared communication devices, wireless communication devices, and/or chipsets (such as Bluetooth(TM) devices, 802.11 devices, WiFi devices, WiMax devices, cellular communication facilities, etc.).
The present disclosure may be implemented as any combination of apparatuses, systems, integrated circuits, and computer programs on a non-transitory computer readable medium. One or more processors may be implemented as an Integrated Circuit (IC), application Specific Integrated Circuit (ASIC), or large scale integrated circuit (LSI), system LSI, super LSI, or ultra LSI assembly that performs some or all of the functions described in this disclosure.
The present disclosure includes the use of software, applications, computer programs, or algorithms. The software, application, computer program or algorithm may be stored on a non-transitory computer readable medium to cause a computer, such as one or more processors, to perform the steps described above and depicted in the drawings. For example, one or more memories may store software or algorithms in executable instructions and one or more processors may associate a set of instructions to execute the software or algorithms to provide various functions in accordance with the embodiments described in this disclosure.
The software and computer programs (which may also be referred to as programs, software applications, components, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural, object-oriented, functional, logical, or assembly or machine language. The term "computer-readable medium" refers to any computer program product, apparatus or device, such as magnetic disks, optical disks, solid state memory devices, memory, and Programmable Logic Devices (PLDs), for providing machine instructions or data to a programmable data processor, including computer-readable media that receives machine instructions as a computer-readable signal.
By way of example, computer-readable media can comprise Dynamic Random Access Memory (DRAM), random Access Memory (RAM), read Only Memory (ROM), electrically erasable read only memory (EEPROM), compact disk read only memory (CD-ROM) or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired computer-readable program code in the form of instructions or data structures and that can be accessed by a general purpose or special purpose computer or general purpose or special purpose processor. Disk or disc, as used herein, includes Compact Disc (CD), laser disc, optical disc, digital Versatile Disc (DVD), floppy disk and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above are also included within the scope of computer-readable media.
The present disclosure provides a holographic image generation apparatus based on, for example, a convolutional neural network. By re-projecting a set of input images from different views to a new view, a weighted mosaic list from the different views is created. The best candidate set for each pixel is then selected and input into the CNN model for fusion. Images from multiple different scenes are used as training data sets. For a given scene, one image serves as the reference view, and that image is regenerated by fusing the other images through the CNN, thereby realizing training.
The subject matter of the present disclosure is provided as examples of apparatuses, systems, methods, and programs for performing the features described in the present disclosure. Other features or variations in addition to those described above are contemplated. It is contemplated that the implementation of the components and functions of the present disclosure may be accomplished with any emerging technology that may replace any of the above-described implementation technologies.
In addition, the foregoing description provides examples without limiting the scope, applicability, or configuration set forth in the claims. Changes may be made in the function and arrangement of elements discussed without departing from the spirit and scope of the disclosure. Various embodiments may omit, replace, or add various procedures or components as appropriate. For example, features described with respect to certain embodiments may be combined in other embodiments.
In addition, in the description of the present disclosure, the terms "first," "second," "third," etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or order.
Similarly, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous.

Claims (9)

1. A training method of an image fusion model based on a neural network comprises the following steps:
receiving M input images of a specific scene, wherein M is an integer greater than or equal to 3;
generating a three-dimensional global grid of the scene based on the M input images;
selecting one of the M input images as a reference image;
generating n mosaic images for the viewing angle of the reference image using the M-1 non-reference images among the M input images, comprising: calculating a weight for each of the M-1 non-reference images on each grid cell of the three-dimensional global grid; selecting the n non-reference images with the highest weights; obtaining warped projections of those n non-reference images onto the grid cell; and generating the n mosaic images from the warped projections of each grid cell, wherein the pixel at each grid cell of the first of the n mosaic images is obtained by warp-projecting the corresponding pixel of the non-reference image with the highest weight on that grid cell, the pixel at each grid cell of the second of the n mosaic images is obtained by warp-projecting the corresponding pixel of the non-reference image with the second-highest weight on that grid cell, and so on, where n is an integer greater than or equal to 2 and less than or equal to M-1;
inputting the three-dimensional global grid and the n mosaic images as training inputs into the image fusion model to generate a predicted image having the same viewing angle as the reference image;
calculating an error between the predicted image and the reference image using a cost function; and
adjusting the fusion weights of the image fusion model using the error so as to reduce the error.
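As a concrete illustration of the per-grid candidate ranking described in claim 1, the following toy sketch builds the n mosaic images. All names are hypothetical and the warped projection is reduced to an identity lookup over a pixel-to-grid-cell label map, so that only the ranking logic is shown; this is not the patent's actual implementation.

```python
import numpy as np

def build_mosaics(images, weights, cell_of_pixel, n):
    """Toy sketch of claim 1's mosaic construction (illustrative only).

    images:        array (M-1, H, W) of non-reference images (grayscale toy)
    weights:       array (M-1, G), weight of each image on each of G grid cells
    cell_of_pixel: array (H, W) mapping each pixel to a grid-cell index in [0, G)
    n:             number of mosaics, 2 <= n <= M-1
    """
    # For each grid cell, rank the non-reference images by descending weight.
    order = np.argsort(-weights, axis=0)            # (M-1, G)
    H, W = images.shape[1:]
    mosaics = np.empty((n, H, W), dtype=images.dtype)
    for r in range(n):
        # r-th mosaic takes, at every pixel, the (r+1)-th best image for that cell.
        best = order[r][cell_of_pixel]              # (H, W) image index per pixel
        mosaics[r] = np.take_along_axis(images, best[None], axis=0)[0]
    return mosaics
```

In the real method the lookup would be replaced by the warped projection of the selected non-reference image onto the grid cell.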
2. The training method according to claim 1, wherein the cosine of the angle between the lens normal of the camera that captured a non-reference image and the normal of a grid cell in the three-dimensional global grid is used as the weight of that non-reference image on that grid cell.
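A minimal sketch of the claim-2 weighting, assuming both the camera's lens normal (its viewing direction) and the grid-cell normal are available as 3-vectors; the function name and sign convention are illustrative only.

```python
import numpy as np

def view_weight(lens_normal, cell_normal):
    """Claim 2's weight: the cosine of the angle between the camera's
    lens normal and the grid cell's normal. Both vectors are normalized
    so the dot product equals the cosine; under this convention a camera
    whose lens normal is aligned with the cell normal gets weight ~1."""
    lens_normal = lens_normal / np.linalg.norm(lens_normal)
    cell_normal = cell_normal / np.linalg.norm(cell_normal)
    return float(np.dot(lens_normal, cell_normal))
```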
3. The training method of claim 1, wherein the neural network is a convolutional neural network, and
the predicted image and the reference image are cropped by a predetermined number of pixels on each side relative to the input images, the error being calculated using the cropped predicted image and reference image.
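The symmetric cropping of claim 3 is straightforward; a sketch, where `p` stands in for the claim's "predetermined number of pixels":

```python
import numpy as np

def crop_border(img, p):
    """Remove p pixels from every side of an (H, W[, C]) image before
    the error is computed, as in claim 3. Assumes H > 2p and W > 2p."""
    return img[p:-p, p:-p]
```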
4. The training method according to claim 3, wherein the predicted image is represented by the following formula:

o_ij = Σ_{m=1}^{k} w_mij · c_mij

where o_ij denotes the pixel value at row i, column j of the predicted image; k = n+1; c_mij denotes the pixel value at row i, column j of the m-th image among the cropped three-dimensional global grid and n mosaic images; and w_mij denotes the corresponding fusion weight for row i, column j of the m-th image among the cropped three-dimensional global grid and n mosaic images.
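The fusion of claim 4 is a per-pixel weighted sum over the k = n+1 cropped inputs; a sketch (the array shapes are assumptions for illustration, not specified by the patent):

```python
import numpy as np

def fuse(c, w):
    """o_ij = sum_{m=1}^{k} w_mij * c_mij, with c and w both of shape
    (k, H, W): c stacks the cropped three-dimensional global grid and
    the n mosaic images (k = n + 1), and w holds the learned per-pixel
    fusion weights."""
    assert c.shape == w.shape
    return (w * c).sum(axis=0)
```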
5. The training method according to claim 1, wherein the steps of generating the predicted image, calculating the error, and adjusting the fusion weights are iterated until the error is smaller than a predetermined value or the number of iterations reaches a predetermined number.
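The iteration of claim 5 amounts to a standard train-until-converged loop; in this sketch `generate_prediction`, `cost`, and the model's `step` method are hypothetical stand-ins for the components named in claim 1.

```python
def train(fuse_model, generate_prediction, cost, reference,
          max_iters=1000, tol=1e-4):
    """Sketch of the iteration in claim 5: repeat predict -> error ->
    weight update until the error drops below a threshold (tol) or the
    iteration budget (max_iters) is exhausted."""
    for _ in range(max_iters):
        predicted = generate_prediction(fuse_model)
        err = cost(predicted, reference)
        if err < tol:
            break
        fuse_model.step(err)   # adjust fusion weights to reduce the error
    return fuse_model
```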
6. An image generation method, comprising:
receiving L input images of a specific scene, wherein L is an integer greater than or equal to 2;
Generating a three-dimensional global grid of the scene based on the L input images;
selecting a new view angle different from the view angles of the L input images;
generating n mosaic images for the new view using the L input images, where n is an integer greater than or equal to 2 and n is less than or equal to L;
inputting the three-dimensional global grid and the n mosaic images into an image fusion model obtained by the method according to any one of claims 1 to 5, to generate a predicted image of the new view.
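The inference path of claim 6 glues the same pieces together at a new viewing angle; every helper below (`build_grid`, `build_mosaics_for`, the callable `fusion_model`) is a hypothetical placeholder for the corresponding step, not an API from the patent.

```python
def generate_new_view(inputs, new_pose, build_grid, build_mosaics_for,
                      fusion_model, n):
    """Sketch of claim 6's inference: build the scene's 3-D global grid
    from the L input images, form n mosaics for the new viewing angle,
    and let the trained fusion model produce the predicted image."""
    grid = build_grid(inputs)                         # 3-D global grid of the scene
    mosaics = build_mosaics_for(inputs, grid, new_pose, n)
    return fusion_model(grid, mosaics)                # predicted image at new view
```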
7. A training device for a neural-network-based image fusion model, comprising:
a memory having instructions stored thereon; and
a processor configured to execute the instructions stored on the memory to perform the method of any one of claims 1 to 5.
8. An image generating apparatus, comprising:
a memory having instructions stored thereon; and
a processor configured to execute the instructions stored on the memory to perform the method of claim 6.
9. A computer-readable storage medium comprising computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform the method of any of claims 1-6.
CN202011504084.4A 2020-12-18 2020-12-18 Image fusion model training method, image generation method and device Active CN114648472B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011504084.4A CN114648472B (en) 2020-12-18 2020-12-18 Image fusion model training method, image generation method and device

Publications (2)

Publication Number Publication Date
CN114648472A CN114648472A (en) 2022-06-21
CN114648472B true CN114648472B (en) 2025-08-05

Family

ID=81990710


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115272606A (en) * 2021-04-30 2022-11-01 浙江省公众信息产业有限公司 Image generation method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6522787B1 (en) * 1995-07-10 2003-02-18 Sarnoff Corporation Method and system for rendering and combining images to form a synthesized view of a scene containing image information from a second image
JP2019016230A (en) * 2017-07-07 2019-01-31 日本電信電話株式会社 Learning device, image combining device, learning method, image combining method, and program

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5963664A (en) * 1995-06-22 1999-10-05 Sarnoff Corporation Method and system for image combination using a parallax-based technique
US5757424A (en) * 1995-12-19 1998-05-26 Xerox Corporation High-resolution video conferencing system
US20180192033A1 (en) * 2016-12-30 2018-07-05 Google Inc. Multi-view scene flow stitching



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant