Research Article
Open Access

Stereo magnification: learning view synthesis using multiplane images

Published: 30 July 2018

Abstract

The view synthesis problem---generating novel views of a scene from known imagery---has garnered recent attention due in part to compelling applications in virtual and augmented reality. In this paper, we explore an intriguing scenario for view synthesis: extrapolating views from imagery captured by narrow-baseline stereo cameras, including VR cameras and now-widespread dual-lens camera phones. We call this problem stereo magnification, and propose a learning framework that leverages a new layered representation that we call multiplane images (MPIs). Our method also uses a massive new data source for learning view extrapolation: online videos on YouTube. Using data mined from such videos, we train a deep network that predicts an MPI from an input stereo image pair. This inferred MPI can then be used to synthesize a range of novel views of the scene, including views that extrapolate significantly beyond the input baseline. We show that our method compares favorably with several recent view synthesis methods, and demonstrate applications in magnifying narrow-baseline stereo images.
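
An MPI is a stack of fronto-parallel RGBA planes placed at fixed depths in the frustum of a reference camera; a novel view is rendered by warping each plane into the target camera with its plane-induced homography and alpha-compositing the warped planes back to front with the "over" operator. The sketch below illustrates that rendering step only and is not the authors' implementation: the NumPy/SciPy helpers, the camera convention (x_tgt = R·x_ref + t), and the function names are assumptions made for illustration.

```python
import numpy as np
from scipy.ndimage import map_coordinates


def plane_homography(K_ref, K_tgt, R, t, depth):
    """Homography mapping reference-view pixels to target-view pixels for the
    fronto-parallel plane z = depth in the reference frame.
    R, t define the relative pose: x_tgt = R @ x_ref + t (assumed convention)."""
    n = np.array([[0.0, 0.0, 1.0]])                    # plane normal in the reference frame
    return K_tgt @ (R + t.reshape(3, 1) @ n / depth) @ np.linalg.inv(K_ref)


def warp_plane(plane_rgba, H, out_hw):
    """Inverse-warp one RGBA plane into the target view via homography H."""
    h, w = out_hw
    ys, xs = np.mgrid[0:h, 0:w]
    tgt = np.stack([xs, ys, np.ones_like(xs)], axis=0).reshape(3, -1)
    src = np.linalg.inv(H) @ tgt                       # target pixel -> reference pixel
    src = src[:2] / src[2:]
    warped = np.zeros((h, w, 4))
    for c in range(4):                                 # bilinear sampling, channel by channel
        warped[..., c] = map_coordinates(
            plane_rgba[..., c], [src[1], src[0]], order=1, mode="constant"
        ).reshape(h, w)
    return warped


def render_mpi(mpi_rgba, depths, K_ref, K_tgt, R, t, out_hw):
    """mpi_rgba: (D, H, W, 4) RGBA planes ordered back (far) to front (near),
    with matching plane depths; returns the synthesized target-view image."""
    out = np.zeros((*out_hw, 3))
    for plane, d in zip(mpi_rgba, depths):             # back-to-front "over" compositing
        warped = warp_plane(plane, plane_homography(K_ref, K_tgt, R, t, d), out_hw)
        rgb, alpha = warped[..., :3], warped[..., 3:4]
        out = rgb * alpha + out * (1.0 - alpha)
    return out
```

Because the plane depths are fixed in advance, the network only has to predict the per-plane color and alpha content; synthesizing any nearby viewpoint then reduces to the warp-and-composite loop above, which is what makes the inferred MPI cheap to re-render for a range of extrapolated views.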

Supplementary Material

MP4 File (065-276.mp4)
MP4 File (a65-zhou.mp4)

References

[1]
Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. 2016. TensorFlow: A System for Large-Scale Machine Learning. In OSDI.
[2]
Sameer Agarwal, Keir Mierle, and Others. 2016. Ceres Solver, http://ceres-solver.org. (2016).
[3]
Apple. 2016. Portrait mode now available on iPhone 7 Plus with iOS 10.1. https://www.apple.com/newsroom/2016/10/portrait-mode-now-available-on-iphone-7-plus-with-ios-101/. (2016).
[4]
Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. 2016. Layer normalization. arXiv preprint arXiv.1607.06450 (2016).
[5]
Alexandre Chapiro, Simon Heinzle, Tunç Ozan Aydin, Steven Poulakos, Matthias Zwicker, Aljosa Smolic, and Markus Gross. 2014. Optimizing stereo-to-multiview conversion for autostereoscopic displays. In Computer graphics forum.
[6]
Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille. 2018. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. on Pattern Analysis and Machine Intelligence 40, 4 (2018).
[7]
Qifeng Chen and Vladlen Koltun. 2017. Photographic image synthesis with cascaded refinement networks. In ICCV.
[8]
Shenchang Eric Chen and Lance Williams. 1993. View Interpolation for Image Synthesis. In Proc. SIGGRAPH.
[9]
Paul E Debevec, Camillo J Taylor, and Jitendra Malik. 1996. Modeling and rendering architecture from photographs: A hybrid geometry-and image-based approach. In Proc. SIGGRAPH.
[10]
Piotr Didyk, Pitchaya Sitthi-Amorn, William Freeman, Frédo Durand, and Wojciech Matusik. 2013. Joint view expansion and filtering for automultiscopic 3D displays. In Proc. SIGGRAPH.
[11]
Alexey Dosovitskiy and Thomas Brox. 2016. Generating images with perceptual similarity metrics based on deep networks. In NIPS.
[12]
Jakob Engel, Vladlen Koltun, and Daniel Cremers. 2018. Direct sparse odometry. IEEE Trans. on Pattern Analysis and Machine Intelligence 40, 3 (2018).
[13]
John Flynn, Ivan Neulander, James Philbin, and Noah Snavely. 2016. DeepStereo: Learning to Predict New Views From the World's Imagery. In CVPR.
[14]
Christian Forster, Matia Pizzoli, and Davide Scaramuzza. 2014. SVO: Fast Semi-Direct Monocular Visual Odometry. In ICRA.
[15]
Ravi Garg and Ian Reid. 2016. Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue. In ECCV.
[16]
Clément Godard, Oisin Mac Aodha, and Gabriel J. Brostow. 2017. Unsupervised Monocular Depth Estimation with Left-Right Consistency. In CVPR.
[17]
Google. 2017a. Introducing VR180 cameras, https://vr.google.com/vr180/. (2017).
[18]
Google. 2017b. Portrait mode on the Pixel 2 and Pixel 2 XL smartphones. https://research.googleblog.com/2017/10/portrait-mode-on-pixel-2-and-pixel-2-xl.html. (2017).
[19]
Steven J. Gortler, Radek Grzeszczuk, Richard Szeliski, and Michael F. Cohen. 1996. The Lumigraph. In Proc. SIGGRAPH.
[20]
Hyowon Ha, Sunghoon Im, Jaesik Park, Hae-Gon Jeon, and In So Kweon. 2016. High-quality Depth from Uncalibrated Small Motion Clip. In CVPR.
[21]
Richard Hartley and Andrew Zisserman. 2003. Multiple View Geometry in Computer Vision. Cambridge University Press.
[22]
Samuel W Hasinoff, Dillon Sharlet, Ryan Geiss, Andrew Adams, Jonathan T Barron, Florian Kainz, Jiawen Chen, and Marc Levoy. 2016. Burst photography for high dynamic range and low-light imaging on mobile cameras. In Proc. SIGGRAPH Asia.
[23]
Peter Hedman, Suhib Alsisan, Richard Szeliski, and Johannes Kopf. 2017. Casual 3D Photography. In Proc. SIGGRAPH Asia.
[24]
Michael Holroyd, Ilya Baran, Jason Lawrence, and Wojciech Matusik. 2011. Computing and fabricating multilayer models. In Proc. SIGGRAPH Asia.
[25]
Max Jaderberg, Karen Simonyan, Andrew Zisserman, and Koray Kavukcuoglu. 2015. Spatial transformer networks. In NIPS.
[26]
Justin Johnson, Alexandre Alahi, and Li Fei-Fei. 2016. Perceptual losses for real-time style transfer and super-resolution. In ECCV.
[27]
Nima Khademi Kalantari, Ting-Chun Wang, and Ravi Ramamoorthi. 2016. Learning-Based View Synthesis for Light Field Cameras. In Proc. SIGGRAPH Asia.
[28]
Petr Kellnhofer, Piotr Didyk, Szu-Po Wang, Pitchaya Sitthi-Amorn, William Freeman, Fredo Durand, and Wojciech Matusik. 2017. 3DTV at Home: Eulerian-Lagrangian Stereo-to-Multiview Conversion. In Proc. SIGGRAPH.
[29]
Diederik Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
[30]
Marc Levoy and Pat Hanrahan. 1996. Light Field Rendering. In Proc. SIGGRAPH.
[31]
Ziwei Liu, Raymond Yeh, Xiaoou Tang, Yiming Liu, and Aseem Agarwala. 2017. Video Frame Synthesis Using Deep Voxel Flow. In ICCV.
[32]
Lytro. 2018. Lytro. https://www.lytro.com/. (2018).
[33]
Montiel J. M. M. Mur-Artal, Raúl and Juan D. Tardós. 2015. ORB-SLAM: a Versatile and Accurate Monocular SLAM System. IEEE Trans. on Robotics 31, 5 (2015).
[34]
Eric Penner and Li Zhang. 2017. Soft 3D Reconstruction for View Synthesis. In Proc. SIGGRAPH Asia.
[35]
Thomas Porter and Tom Duff. 1984. Compositing Digital Images. In Proc. SIGGRAPH.
[36]
Christian Riechert, Frederik Zilly, Peter Kauff, Jens Güther, and Ralf Schäfer. 2012. Fully automatic stereo-to-multiview conversion in autostereoscopic displays. The Best of IET and IBC 4 (09 2012).
[37]
Johannes Lutz Schönberger and Jan-Michael Frahm. 2016. Structure-from-Motion Revisited. In CVPR.
[38]
Jonathan Shade, Steven Gortler, Li-wei He, and Richard Szeliski. 1998. Layered depth images. In Proc. SIGGRAPH.
[39]
Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
[40]
Pratul P. Srinivasan, Tongzhou Wang, Ashwin Sreelal, Ravi Ramamoorthi, and Ren Ng. 2017. Learning to Synthesize a 4D RGBD Light Field from a Single Image. In ICCV.
[41]
Shubham Tulsiani, Tinghui Zhou, Alexei A. Efros, and Jitendra Malik. 2017. Multi-view Supervision for Single-view Reconstruction via Differentiable Ray Consistency. In CVPR.
[42]
Sudheendra Vijayanarasimhan, Susanna Ricco, Cordelia Schmid, Rahul Sukthankar, and Katerina Fragkiadaki. 2017. Sfm-net: Learning of structure and motion from video. arXiv preprint arXiv:1704.07804 (2017).
[43]
John YA Wang and Edward H Adelson. 1994. Representing moving images with layers. IEEE Trans. on Image Processing 3, 5 (1994).
[44]
Zhou Wang, Alan Bovik, Hamid Sheikh, and Eero Simoncelli. 2004. Image quality assessment: from error visibility to structural similarity. IEEE Trans. on Image Processing 13, 4 (2004).
[45]
Sven Wanner, Stephan Meister, and Bastian Goldluecke. 2013. Datasets and benchmarks for densely sampled 4d light fields. In VMV.
[46]
G. Wetzstein, D. Lanman, W Heidrich, and R. Raskar. 2011. Layered 3D: Tomographic Image Synthesis for Attenuation-based Light Field and High Dynamic Range Displays. In Proc. SIGGRAPH.
[47]
Wikipedia. 2017. Multiplane camera. https://en.wikipedia.org/wiki/Multiplane_camera. (2017).
[48]
Junyuan Xie, Ross B. Girshick, and Ali Farhadi. 2016. Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks. In ECCV.
[49]
Fisher Yu and David Gallup. 2014. 3D Reconstruction from Accidental Motion. In CVPR.
[50]
Fisher Yu and Vladlen Koltun. 2016. Multi-Scale Context Aggregation by Dilated Convolutions. In ICLR.
[51]
Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. 2018. The Unreasonable Effectiveness of Deep Networks as a Perceptual Metric. In CVPR.
[52]
Zhoutong Zhang, Yebin Liu, and Qionghai Dai. 2015. Light field from micro-baseline image pair. In CVPR.
[53]
Tinghui Zhou, Matthew Brown, Noah Snavely, and David Lowe. 2017. Unsupervised learning of depth and ego-motion from video. In CVPR.
[54]
Tinghui Zhou, Shubham Tulsiani, Weilun Sun, Jitendra Malik, and Alexei A Efros. 2016. View synthesis by appearance flow. In ECCV.
[55]
C. Lawrence Zitnick, Sing Bing Kang, Matthew Uyttendaele, Simon Winder, and Richard Szeliski. 2004. High-quality Video View Interpolation Using a Layered Representation. In Proc. SIGGRAPH.

Published In

ACM Transactions on Graphics, Volume 37, Issue 4
August 2018
1670 pages
ISSN:0730-0301
EISSN:1557-7368
DOI:10.1145/3197517
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 July 2018
Published in TOG Volume 37, Issue 4


Author Tags

  1. deep learning
  2. view extrapolation

Qualifiers

  • Research-article

Article Metrics

  • Downloads (last 12 months): 330
  • Downloads (last 6 weeks): 44
Reflects downloads up to 12 Feb 2025

Cited By

  • (2025) ProbIBR: Fast Image-Based Rendering With Learned Probability-Guided Sampling. IEEE Transactions on Visualization and Computer Graphics 31(3), 1888-1901. DOI: 10.1109/TVCG.2024.3372152
  • (2025) NeRF-DA: Neural Radiance Fields Deblurring With Active Learning. IEEE Signal Processing Letters 32, 261-265. DOI: 10.1109/LSP.2024.3511350
  • (2025) Benchmarking neural radiance fields for autonomous robots. Engineering Applications of Artificial Intelligence 140(C). DOI: 10.1016/j.engappai.2024.109685
  • (2024) A single-viewpoint light-field reconstruction method for endoscopic surgery combining a vision Transformer and a diffusion model (invited). Laser & Optoelectronics Progress 61(16), 1611013. DOI: 10.3788/LOP241272
  • (2024) Compressive Acquisition of Light Field Video Using Aperture-Exposure-Coded Camera. ITE Transactions on Media Technology and Applications 12(1), 22-35. DOI: 10.3169/mta.12.22
  • (2024) SDF-based 3D Generative Adversarial Networks for Image and Geometry Generation. 2024 43rd Chinese Control Conference (CCC), 8576-8581. DOI: 10.23919/CCC63176.2024.10662644
  • (2024) Real Time Stampede Detection System Using Computer Vision. SSRN Electronic Journal. DOI: 10.2139/ssrn.4490376
  • (2024) SMERF: Streamable Memory Efficient Radiance Fields for Real-Time Large-Scene Exploration. ACM Transactions on Graphics 43(4), 1-13. DOI: 10.1145/3658193
  • (2024) FaceFolds: Meshed Radiance Manifolds for Efficient Volumetric Rendering of Dynamic Faces. Proceedings of the ACM on Computer Graphics and Interactive Techniques 7(1), 1-17. DOI: 10.1145/3651304
  • (2024) Factorized Motion Fields for Fast Sparse Input Dynamic View Synthesis. ACM SIGGRAPH 2024 Conference Papers, 1-12. DOI: 10.1145/3641519.3657498
