Research Article
Open Access

Stereo magnification: learning view synthesis using multiplane images

Published: 30 July 2018

Abstract

The view synthesis problem---generating novel views of a scene from known imagery---has garnered recent attention due in part to compelling applications in virtual and augmented reality. In this paper, we explore an intriguing scenario for view synthesis: extrapolating views from imagery captured by narrow-baseline stereo cameras, including VR cameras and now-widespread dual-lens camera phones. We call this problem stereo magnification, and propose a learning framework that leverages a new layered representation that we call multiplane images (MPIs). Our method also uses a massive new data source for learning view extrapolation: online videos on YouTube. Using data mined from such videos, we train a deep network that predicts an MPI from an input stereo image pair. This inferred MPI can then be used to synthesize a range of novel views of the scene, including views that extrapolate significantly beyond the input baseline. We show that our method compares favorably with several recent view synthesis methods, and demonstrate applications in magnifying narrow-baseline stereo images.
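
An MPI is a stack of fronto-parallel RGBA planes placed at fixed depths in the frustum of a reference camera; a novel view is rendered by warping each plane into the target camera with its plane-induced homography and alpha-compositing the warped planes back to front with the "over" operator. The sketch below illustrates that rendering step only and is not the authors' implementation: the NumPy/SciPy helpers, the camera convention (x_tgt = R·x_ref + t), and the function names are assumptions made for illustration.

```python
import numpy as np
from scipy.ndimage import map_coordinates


def plane_homography(K_ref, K_tgt, R, t, depth):
    """Homography mapping reference-view pixels to target-view pixels for the
    fronto-parallel plane z = depth in the reference frame.
    R, t define the relative pose: x_tgt = R @ x_ref + t (assumed convention)."""
    n = np.array([[0.0, 0.0, 1.0]])                    # plane normal in the reference frame
    return K_tgt @ (R + t.reshape(3, 1) @ n / depth) @ np.linalg.inv(K_ref)


def warp_plane(plane_rgba, H, out_hw):
    """Inverse-warp one RGBA plane into the target view via homography H."""
    h, w = out_hw
    ys, xs = np.mgrid[0:h, 0:w]
    tgt = np.stack([xs, ys, np.ones_like(xs)], axis=0).reshape(3, -1)
    src = np.linalg.inv(H) @ tgt                       # target pixel -> reference pixel
    src = src[:2] / src[2:]
    warped = np.zeros((h, w, 4))
    for c in range(4):                                 # bilinear sampling, channel by channel
        warped[..., c] = map_coordinates(
            plane_rgba[..., c], [src[1], src[0]], order=1, mode="constant"
        ).reshape(h, w)
    return warped


def render_mpi(mpi_rgba, depths, K_ref, K_tgt, R, t, out_hw):
    """mpi_rgba: (D, H, W, 4) RGBA planes ordered back (far) to front (near),
    with matching plane depths; returns the synthesized target-view image."""
    out = np.zeros((*out_hw, 3))
    for plane, d in zip(mpi_rgba, depths):             # back-to-front "over" compositing
        warped = warp_plane(plane, plane_homography(K_ref, K_tgt, R, t, d), out_hw)
        rgb, alpha = warped[..., :3], warped[..., 3:4]
        out = rgb * alpha + out * (1.0 - alpha)
    return out
```

Because the plane depths are fixed in advance, the network only has to predict the per-plane color and alpha content; synthesizing any nearby viewpoint then reduces to the warp-and-composite loop above, which is what makes the inferred MPI cheap to re-render for a range of extrapolated views.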

Supplementary Material

MP4 File (065-276.mp4)
MP4 File (a65-zhou.mp4)

References

[1]
Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. 2016. TensorFlow: A System for Large-Scale Machine Learning. In OSDI.
[2]
Sameer Agarwal, Keir Mierle, and Others. 2016. Ceres Solver, http://ceres-solver.org. (2016).
[3]
Apple. 2016. Portrait mode now available on iPhone 7 Plus with iOS 10.1. https://www.apple.com/newsroom/2016/10/portrait-mode-now-available-on-iphone-7-plus-with-ios-101/. (2016).
[4]
Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. 2016. Layer normalization. arXiv preprint arXiv.1607.06450 (2016).
[5]
Alexandre Chapiro, Simon Heinzle, Tunç Ozan Aydin, Steven Poulakos, Matthias Zwicker, Aljosa Smolic, and Markus Gross. 2014. Optimizing stereo-to-multiview conversion for autostereoscopic displays. In Computer graphics forum.
[6]
Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille. 2018. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. on Pattern Analysis and Machine Intelligence 40, 4 (2018).
[7]
Qifeng Chen and Vladlen Koltun. 2017. Photographic image synthesis with cascaded refinement networks. In ICCV.
[8]
Shenchang Eric Chen and Lance Williams. 1993. View Interpolation for Image Synthesis. In Proc. SIGGRAPH.
[9]
Paul E Debevec, Camillo J Taylor, and Jitendra Malik. 1996. Modeling and rendering architecture from photographs: A hybrid geometry-and image-based approach. In Proc. SIGGRAPH.
[10]
Piotr Didyk, Pitchaya Sitthi-Amorn, William Freeman, Frédo Durand, and Wojciech Matusik. 2013. Joint view expansion and filtering for automultiscopic 3D displays. In Proc. SIGGRAPH.
[11]
Alexey Dosovitskiy and Thomas Brox. 2016. Generating images with perceptual similarity metrics based on deep networks. In NIPS.
[12]
Jakob Engel, Vladlen Koltun, and Daniel Cremers. 2018. Direct sparse odometry. IEEE Trans. on Pattern Analysis and Machine Intelligence 40, 3 (2018).
[13]
John Flynn, Ivan Neulander, James Philbin, and Noah Snavely. 2016. DeepStereo: Learning to Predict New Views From the World's Imagery. In CVPR.
[14]
Christian Forster, Matia Pizzoli, and Davide Scaramuzza. 2014. SVO: Fast Semi-Direct Monocular Visual Odometry. In ICRA.
[15]
Ravi Garg and Ian Reid. 2016. Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue. In ECCV.
[16]
Clément Godard, Oisin Mac Aodha, and Gabriel J. Brostow. 2017. Unsupervised Monocular Depth Estimation with Left-Right Consistency. In CVPR.
[17]
Google. 2017a. Introducing VR180 cameras, https://vr.google.com/vr180/. (2017).
[18]
Google. 2017b. Portrait mode on the Pixel 2 and Pixel 2 XL smartphones. https://research.googleblog.com/2017/10/portrait-mode-on-pixel-2-and-pixel-2-xl.html. (2017).
[19]
Steven J. Gortler, Radek Grzeszczuk, Richard Szeliski, and Michael F. Cohen. 1996. The Lumigraph. In Proc. SIGGRAPH.
[20]
Hyowon Ha, Sunghoon Im, Jaesik Park, Hae-Gon Jeon, and In So Kweon. 2016. High-quality Depth from Uncalibrated Small Motion Clip. In CVPR.
[21]
Richard Hartley and Andrew Zisserman. 2003. Multiple View Geometry in Computer Vision. Cambridge University Press.
[22]
Samuel W Hasinoff, Dillon Sharlet, Ryan Geiss, Andrew Adams, Jonathan T Barron, Florian Kainz, Jiawen Chen, and Marc Levoy. 2016. Burst photography for high dynamic range and low-light imaging on mobile cameras. In Proc. SIGGRAPH Asia.
[23]
Peter Hedman, Suhib Alsisan, Richard Szeliski, and Johannes Kopf. 2017. Casual 3D Photography. In Proc. SIGGRAPH Asia.
[24]
Michael Holroyd, Ilya Baran, Jason Lawrence, and Wojciech Matusik. 2011. Computing and fabricating multilayer models. In Proc. SIGGRAPH Asia.
[25]
Max Jaderberg, Karen Simonyan, Andrew Zisserman, and Koray Kavukcuoglu. 2015. Spatial transformer networks. In NIPS.
[26]
Justin Johnson, Alexandre Alahi, and Li Fei-Fei. 2016. Perceptual losses for real-time style transfer and super-resolution. In ECCV.
[27]
Nima Khademi Kalantari, Ting-Chun Wang, and Ravi Ramamoorthi. 2016. Learning-Based View Synthesis for Light Field Cameras. In Proc. SIGGRAPH Asia.
[28]
Petr Kellnhofer, Piotr Didyk, Szu-Po Wang, Pitchaya Sitthi-Amorn, William Freeman, Fredo Durand, and Wojciech Matusik. 2017. 3DTV at Home: Eulerian-Lagrangian Stereo-to-Multiview Conversion. In Proc. SIGGRAPH.
[29]
Diederik Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
[30]
Marc Levoy and Pat Hanrahan. 1996. Light Field Rendering. In Proc. SIGGRAPH.
[31]
Ziwei Liu, Raymond Yeh, Xiaoou Tang, Yiming Liu, and Aseem Agarwala. 2017. Video Frame Synthesis Using Deep Voxel Flow. In ICCV.
[32]
Lytro. 2018. Lytro. https://www.lytro.com/. (2018).
[33]
Montiel J. M. M. Mur-Artal, Raúl and Juan D. Tardós. 2015. ORB-SLAM: a Versatile and Accurate Monocular SLAM System. IEEE Trans. on Robotics 31, 5 (2015).
[34]
Eric Penner and Li Zhang. 2017. Soft 3D Reconstruction for View Synthesis. In Proc. SIGGRAPH Asia.
[35]
Thomas Porter and Tom Duff. 1984. Compositing Digital Images. In Proc. SIGGRAPH.
[36]
Christian Riechert, Frederik Zilly, Peter Kauff, Jens Güther, and Ralf Schäfer. 2012. Fully automatic stereo-to-multiview conversion in autostereoscopic displays. The Best of IET and IBC 4 (09 2012).
[37]
Johannes Lutz Schönberger and Jan-Michael Frahm. 2016. Structure-from-Motion Revisited. In CVPR.
[38]
Jonathan Shade, Steven Gortler, Li-wei He, and Richard Szeliski. 1998. Layered depth images. In Proc. SIGGRAPH.
[39]
Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
[40]
Pratul P. Srinivasan, Tongzhou Wang, Ashwin Sreelal, Ravi Ramamoorthi, and Ren Ng. 2017. Learning to Synthesize a 4D RGBD Light Field from a Single Image. In ICCV.
[41]
Shubham Tulsiani, Tinghui Zhou, Alexei A. Efros, and Jitendra Malik. 2017. Multi-view Supervision for Single-view Reconstruction via Differentiable Ray Consistency. In CVPR.
[42]
Sudheendra Vijayanarasimhan, Susanna Ricco, Cordelia Schmid, Rahul Sukthankar, and Katerina Fragkiadaki. 2017. Sfm-net: Learning of structure and motion from video. arXiv preprint arXiv:1704.07804 (2017).
[43]
John YA Wang and Edward H Adelson. 1994. Representing moving images with layers. IEEE Trans. on Image Processing 3, 5 (1994).
[44]
Zhou Wang, Alan Bovik, Hamid Sheikh, and Eero Simoncelli. 2004. Image quality assessment: from error visibility to structural similarity. IEEE Trans. on Image Processing 13, 4 (2004).
[45]
Sven Wanner, Stephan Meister, and Bastian Goldluecke. 2013. Datasets and benchmarks for densely sampled 4d light fields. In VMV.
[46]
G. Wetzstein, D. Lanman, W Heidrich, and R. Raskar. 2011. Layered 3D: Tomographic Image Synthesis for Attenuation-based Light Field and High Dynamic Range Displays. In Proc. SIGGRAPH.
[47]
Wikipedia. 2017. Multiplane camera. https://en.wikipedia.org/wiki/Multiplane_camera. (2017).
[48]
Junyuan Xie, Ross B. Girshick, and Ali Farhadi. 2016. Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks. In ECCV.
[49]
Fisher Yu and David Gallup. 2014. 3D Reconstruction from Accidental Motion. In CVPR.
[50]
Fisher Yu and Vladlen Koltun. 2016. Multi-Scale Context Aggregation by Dilated Convolutions. In ICLR.
[51]
Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. 2018. The Unreasonable Effectiveness of Deep Networks as a Perceptual Metric. In CVPR.
[52]
Zhoutong Zhang, Yebin Liu, and Qionghai Dai. 2015. Light field from micro-baseline image pair. In CVPR.
[53]
Tinghui Zhou, Matthew Brown, Noah Snavely, and David Lowe. 2017. Unsupervised learning of depth and ego-motion from video. In CVPR.
[54]
Tinghui Zhou, Shubham Tulsiani, Weilun Sun, Jitendra Malik, and Alexei A Efros. 2016. View synthesis by appearance flow. In ECCV.
[55]
C. Lawrence Zitnick, Sing Bing Kang, Matthew Uyttendaele, Simon Winder, and Richard Szeliski. 2004. High-quality Video View Interpolation Using a Layered Representation. In Proc. SIGGRAPH.

Published In

ACM Transactions on Graphics, Volume 37, Issue 4
August 2018
1670 pages
ISSN:0730-0301
EISSN:1557-7368
DOI:10.1145/3197517
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 July 2018
Published in TOG Volume 37, Issue 4


Author Tags

  1. deep learning
  2. view extrapolation

Qualifiers

  • Research-article

Article Metrics

  • Downloads (last 12 months): 330
  • Downloads (last 6 weeks): 44
Reflects downloads up to 12 Feb 2025

Cited By

  • (2025) ProbIBR: Fast Image-Based Rendering With Learned Probability-Guided Sampling. IEEE Transactions on Visualization and Computer Graphics 31(3), 1888-1901. DOI: 10.1109/TVCG.2024.3372152
  • (2025) NeRF-DA: Neural Radiance Fields Deblurring With Active Learning. IEEE Signal Processing Letters 32, 261-265. DOI: 10.1109/LSP.2024.3511350
  • (2025) Benchmarking neural radiance fields for autonomous robots. Engineering Applications of Artificial Intelligence 140(C). DOI: 10.1016/j.engappai.2024.109685
  • (2024) A single-viewpoint light-field reconstruction method for endoscopic surgery combining a vision Transformer and a diffusion model (invited). Laser & Optoelectronics Progress 61(16), 1611013. DOI: 10.3788/LOP241272
  • (2024) Compressive Acquisition of Light Field Video Using Aperture-Exposure-Coded Camera. ITE Transactions on Media Technology and Applications 12(1), 22-35. DOI: 10.3169/mta.12.22
  • (2024) SDF-based 3D Generative Adversarial Networks for Image and Geometry Generation. 2024 43rd Chinese Control Conference (CCC), 8576-8581. DOI: 10.23919/CCC63176.2024.10662644
  • (2024) Real Time Stampede Detection System Using Computer Vision. SSRN Electronic Journal. DOI: 10.2139/ssrn.4490376
  • (2024) SMERF: Streamable Memory Efficient Radiance Fields for Real-Time Large-Scene Exploration. ACM Transactions on Graphics 43(4), 1-13. DOI: 10.1145/3658193
  • (2024) FaceFolds: Meshed Radiance Manifolds for Efficient Volumetric Rendering of Dynamic Faces. Proceedings of the ACM on Computer Graphics and Interactive Techniques 7(1), 1-17. DOI: 10.1145/3651304
  • (2024) Factorized Motion Fields for Fast Sparse Input Dynamic View Synthesis. ACM SIGGRAPH 2024 Conference Papers, 1-12. DOI: 10.1145/3641519.3657498
