
Convolutional nets and watershed cuts for real-time semantic labeling of RGBD videos

Editors: Kevin Murphy, Bernhard Schölkopf

Authors: Camille Couprie, Clément Farabet, Laurent Najman, Yann LeCun

The Journal of Machine Learning Research, Volume 15, Issue 1

Pages 3489 - 3511

Published: 01 January 2014

Abstract

This work addresses multi-class segmentation of indoor scenes with RGB-D inputs. While this area of research has gained much attention recently, most works still rely on handcrafted features. In contrast, we apply a multiscale convolutional network to learn features directly from the images and the depth information. Using frame-by-frame labeling, we obtain nearly state-of-the-art performance on the NYU-v2 depth data set, with an accuracy of 64.5%. We then show that the labeling can be further improved by exploiting the temporal consistency of the video sequence of the scene. To that end, we present a method that produces temporally consistent superpixels from a streaming video. Among the methods that produce superpixel segmentations of an image, the graph-based approach of Felzenszwalb and Huttenlocher is broadly employed. One of its interesting properties is that the regions are computed greedily in quasi-linear time using a minimum spanning tree. In a framework that exploits minimum spanning trees throughout, we propose an efficient video segmentation approach that computes temporally consistent superpixels in a causal manner, meeting the needs of causal, real-time applications. We illustrate the labeling of indoor scenes in video sequences that could be processed in real time using appropriate hardware such as an FPGA.
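The greedy, minimum-spanning-tree construction mentioned in the abstract can be illustrated with a minimal sketch of single-image graph-based segmentation in the style of Felzenszwalb and Huttenlocher. This is not the authors' video method or their released code. The function name `segment`, the 4-connected grayscale grid, the absolute-difference edge weight, and the scale parameter `k` are assumptions chosen for illustration.

```python
class DisjointSet:
    """Union-find that also tracks, per component, its size and the
    largest edge weight merged into it so far (its internal difference)."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.internal = [0.0] * n

    def find(self, x):
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b, weight):
        # a and b must be roots; attach the smaller tree to the larger one.
        if self.size[a] < self.size[b]:
            a, b = b, a
        self.parent[b] = a
        self.size[a] += self.size[b]
        # Edges arrive in increasing order, so this edge is the largest so far.
        self.internal[a] = weight


def segment(image, k=1.0):
    """Label a 2-D grayscale image (list of rows) with region indices.

    k is an illustrative scale parameter: larger values favor larger regions.
    """
    h, w = len(image), len(image[0])
    edges = []
    for y in range(h):
        for x in range(w):
            p = y * w + x
            if x + 1 < w:  # edge to the right neighbor
                edges.append((abs(float(image[y][x]) - float(image[y][x + 1])), p, p + 1))
            if y + 1 < h:  # edge to the bottom neighbor
                edges.append((abs(float(image[y][x]) - float(image[y + 1][x])), p, p + w))

    # Kruskal-style pass: visit edges by increasing weight.
    edges.sort()
    ds = DisjointSet(h * w)
    for weight, p, q in edges:
        a, b = ds.find(p), ds.find(q)
        if a == b:
            continue
        # Merge only if the edge is no heavier than both regions' tolerance.
        tolerance = min(ds.internal[a] + k / ds.size[a],
                        ds.internal[b] + k / ds.size[b])
        if weight <= tolerance:
            ds.union(a, b, weight)

    # Relabel component roots as consecutive integers in scan order.
    labels = {}
    return [[labels.setdefault(ds.find(y * w + x), len(labels)) for x in range(w)]
            for y in range(h)]
```

Sorting the edges dominates the cost, and each union-find operation is close to constant time, which is where the quasi-linear running time comes from. On a toy image whose left half is 0 and whose right half is 10, calling `segment(image, k=1.0)` should give one region per half, because the step between the halves exceeds either region's tolerance.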

References

[1]

Cédric Allène, Jean-Yves Audibert, Michel Couprie, and Renaud Keriven. Some links between extremum spanning forests, watersheds and min-cuts. Image and Vision Computing, 28(10):1460-1471, 2010.

[2]

César Cadena and Jana Košecká. Semantic parsing for priming object detection in RGB-D scenes. In 3rd Workshop on Semantic Perception, Mapping and Exploration, 2013.

[3]

Dan Claudiu Ciresan, Ueli Meier, Jonathan Masci, Luca Maria Gambardella, and Jürgen Schmidhuber. Flexible, high performance convolutional neural networks for image classification. In Proc. of the 22nd International Joint Conference on Artificial Intelligence, pages 1237-1242, 2011.

[4]

Dan Claudiu Ciresan, Alessandro Giusti, Luca Maria Gambardella, and Jürgen Schmidhuber. Deep neural networks segment neuronal membranes in electron microscopy images. In Conference on Neural Information Processing Systems, pages 2852-2860, 2012.

[5]

Dan Claudiu Ciresan, Alessandro Giusti, Luca Maria Gambardella, and Jürgen Schmidhuber. Mitosis detection in breast cancer histology images using deep neural networks. In MICCAI, 2013.

[6]

R. Collobert, K. Kavukcuoglu, and C. Farabet. Torch7: A Matlab-like environment for machine learning. In NIPS Big Learning Workshop, Sierra Nevada, Spain, 2011.

[7]

Dorin Comaniciu and Peter Meer. Mean shift: A robust approach toward feature space analysis. IEEE Trans. Pattern Analysis and Machine Intelligence, 24:603-619, 2002.

[8]

Camille Couprie. Multi-label energy minimization for object class segmentation. In 20th European Signal Processing Conference, Bucharest, Romania, August 2012.

[9]

Camille Couprie. Source code for causal graph-based video segmentation, 2013. www.esiee.fr/~coupriec/code.html.

[10]

Camille Couprie, Leo J. Grady, Laurent Najman, and Hugues Talbot. Power watershed: A unifying graph-based optimization framework. IEEE Trans. Pattern Analysis and Machine Intelligence, 33(7):1384-1399, 2011.

[11]

Camille Couprie, Clément Farabet, Yann LeCun, and Laurent Najman. Causal graph-based video segmentation. In Proc. of IEEE International Conference on Image Processing, 2013a.

[12]

Camille Couprie, Clément Farabet, Laurent Najman, and Yann LeCun. Indoor semantic segmentation using depth information. In International Conference on Learning Representations, 2013b.

[13]

Jean Cousty, Gilles Bertrand, Laurent Najman, and Michel Couprie. Watershed cuts: Minimum spanning forests and the drop of water. IEEE Trans. Pattern Analysis and Machine Intelligence, 31(8):1362-1374, 2009.

[14]

Leandro Cruz, Djalma Lucio, and Luiz Velho. Kinect and RGBD images: Challenges and applications. SIBGRAPI Tutorial, 2012.

[15]

Anat Levin, Dani Lischinski, and Yair Weiss. Colorization using optimization. ACM Transactions on Graphics, 23:689-694, 2004.

[16]

Clément Farabet. Towards Real-Time Image Understanding with Convolutional Networks. PhD thesis, Université Paris Est, 2014.

[17]

Clément Farabet, Camille Couprie, Laurent Najman, and Yann LeCun. Scene parsing with multiscale feature learning, purity trees, and optimal covers. In Proc. of International Conference on Machine Learning, Edinburgh, Scotland, June 2012.

[18]

Clément Farabet, Camille Couprie, Laurent Najman, and Yann LeCun. Learning hierarchical features for scene labeling. IEEE Trans. on Pattern Analysis and Machine Intelligence, 35(8):1915-1929, Aug. 2013.

[19]

Pedro F. Felzenszwalb and Daniel P. Huttenlocher. Efficient graph-based image segmentation. International Journal of Computer Vision, 59:167-181, 2004.

[20]

Kunihiko Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4):193-202, 1980.

[21]

Daniel Glasner, Shiv Naga Prasad Vitaladevuni, and Ronen Basri. Contour-based joint clustering of multiple segmentations. In Proc. of IEEE Computer Vision and Pattern Recognition, pages 2385-2392, Washington, DC, USA, 2011.

[22]

Cristina Gomila and Fernand Meyer. Graph-based object tracking. In Proc. of IEEE International Conference on Image Processing, 2003.

[23]

Matthias Grundmann, Vivek Kwatra, Mei Han, and Irfan Essa. Efficient hierarchical graph-based video segmentation. In Proc. of IEEE Computer Vision and Pattern Recognition, 2010.

[24]

Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. CoRR, abs/1207.0580, 2012.

[25]

Derek Hoiem, Alexei A. Efros, and Martial Hebert. Geometric context from a single image. In Proc. of IEEE International Conference on Computer Vision, volume 1, pages 654-661, 2005.

[26]

Navdeep Jaitly, Patrick Nguyen, Andrew Senior, and Vincent Vanhoucke. Application of pretrained deep neural networks to large vocabulary speech recognition. In Proc. of Interspeech, 2012.

[27]

Allison Janoch, Sergey Karayev, Yangqing Jia, Jonathan T. Barron, Mario Fritz, Kate Saenko, and Trevor Darrell. A category-level 3-D object dataset: Putting the kinect to work. In IEEE International Conference on Computer Vision Workshops, pages 1168-1174, 2011.

[28]

Armand Joulin, Francis Bach, and Jean Ponce. Multi-class cosegmentation. In Proc. of IEEE Computer Vision and Pattern Recognition, pages 542-549, 2012.

[29]

Hema S. Koppula, Abhishek Anand, Thorsten Joachims, and Ashutosh Saxena. Cornell-RGBD-dataset, 2009. http://pr.cs.cornell.edu/sceneunderstanding/data/data.php.

[30]

Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proc. of the IEEE, 86(11):2278-2324, Nov. 1998.

[31]

Yann LeCun, Fu-Jie Huang, and Léon Bottou. Learning methods for generic object recognition with invariance to pose and lighting. In Proc. of IEEE Computer Vision and Pattern Recognition, 2004.

[32]

J. Lee, JungHwan Oh, and Sae Hwang. Clustering of video objects by graph matching. In Proc. of IEEE International Conference on Multimedia and Expo, pages 394-397, 2005.

[33]

Ian Lenz, Honglak Lee, and Ashutosh Saxena. Deep learning for detecting robotic grasps. CoRR, abs/1301.3592. In Robotics: Science and Systems, 2013.

[34]

Fernand Meyer. Minimum spanning forests for morphological segmentation. In Proc. of International Symposium on Mathematical Morphology, pages 77-84, 1994.

[35]

Ondrej Miksik, Daniel Munoz, J. Andrew Bagnell, and Martial Hebert. Efficient temporal consistency for streaming video scene analysis. In Proc. of the IEEE International Conference on Robotics and Automation, Karlsruhe, Germany, 2013.

[36]

O.J. Morris, M. de Jersey Lee, and A.G. Constantinides. Graph theory for image analysis: an approach based on the shortest spanning tree. Communications, Radar and Signal Processing, IEE Proceedings F, 133(2):146-152, April 1986.

[37]

Andreas C. Müller and Sven Behnke. Learning depth-sensitive conditional random fields for semantic segmentation of RGB-D images. In IEEE International Conference on Robotics and Automation, Hong Kong, May 2014.

[38]

Sylvain Paris. Edge-preserving smoothing and mean-shift segmentation of video streams. In Proc. of IEEE European Conference on Computer Vision, pages 460-473, Marseille, France, 2008.

[39]

Xiaofeng Ren, Liefeng Bo, and D. Fox. RGB-(D) scene labeling: Features and algorithms. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pages 2759-2766, June 2012.

[40]

Maximilian Riesenhuber and Tomaso Poggio. Hierarchical models of object recognition in cortex. Nature Neuroscience, 2:1019-1025, 1999.

[41]

Hannes Schulz and Sven Behnke. Learning object-class segmentation with convolutional neural networks. In 11th European Symposium on Artificial Neural Networks, 2012.

[42]

Jianbo Shi and Jitendra Malik. Normalized cuts and image segmentation. In IEEE Trans. on Pattern Analysis and Machine Intelligence, volume 22, pages 888-905, 1997.

[43]

Nathan Silberman and Rob Fergus. Indoor scene segmentation using a structured light sensor. In 3DRR Workshop, IEEE International Conference on Computer Vision Workshops, 2011.

[44]

Nathan Silberman, Derek Hoiem, Pushmeet Kohli, and Rob Fergus. Indoor segmentation and support inference from RGBD images. In Proc. of IEEE European Conference on Computer Vision, 2012.

[45]

Ali Kemal Sinop and Leo Grady. A seeded image segmentation framework unifying graph cuts and random walker which yields a new algorithm. In Proc. of International Conference of Computer Vision, 2007.

[46]

Richard Socher, Brody Huval, Bharath Bhat, Christopher D. Manning, and Andrew Y. Ng. Convolutional-recursive deep learning for 3D object classification. In Advances in Neural Information Processing Systems 25, 2012.

[47]

Jörg Stückler, Benedikt Waldvogel, Hannes Schulz, and Sven Behnke. Dense real-time mapping of object-class semantics from RGB-D video. Journal of Real-Time Image Processing, pages 1-11, 2013.

[48]

Olga Veksler, Yuri Boykov, and Paria Mehrani. Superpixels and supervoxels in an energy optimization framework. In Proc. of IEEE European Conference on Computer Vision, Heraklion, Crete, Greece, September 5-11 (5), pages 211-224, 2010.

[49]

Chenliang Xu, Caiming Xiong, and Jason J. Corso. Streaming hierarchical video segmentation. In Proc. of IEEE European Conference on Computer Vision, Florence, Italy, October 7-13, pages 626-639, 2012.

Convolutional nets and watershed cuts for real-time semantic labeling of RGBD videos
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
      2. Computer vision tasks

Information & Contributors

Information

Published In


The Journal of Machine Learning Research Volume 15, Issue 1

January 2014

4085 pages

ISSN:1532-4435

EISSN:1533-7928

Editors: Kevin Murphy (Google), Bernhard Schölkopf

Publisher

JMLR.org

Publication History

Revised: 01 May 2014

Published: 01 January 2014

Published in JMLR Volume 15, Issue 1
