

Flex-Convolution

Million-Scale Point-Cloud Learning Beyond Grid-Worlds

  • Conference paper
  • Computer Vision – ACCV 2018 (ACCV 2018)

Abstract

Traditional convolution layers are specifically designed to exploit the natural data representation of images: a fixed and regular grid. However, unstructured data such as 3D point clouds, which contain irregular neighborhoods, constantly break this grid-based assumption. Therefore, best practices and design choices from 2D-image learning cannot readily be applied to point-cloud processing. In this work, we introduce flex-convolution, a natural generalization of the conventional convolution layer, along with an efficient GPU implementation. We demonstrate competitive performance on rather small benchmark sets while using fewer parameters and less memory, and obtain significant improvements on a million-scale real-world dataset. Ours is the first method that can efficiently process 7 million points concurrently.
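The abstract only names flex-convolution as a generalization of grid convolution. As a rough illustration of what convolution "beyond the grid" can look like, the Python sketch below replaces the fixed 3×3 window with the k nearest neighbours of each point and computes every filter weight from the relative offset between a point and its neighbour, instead of looking it up in a fixed table. The affine weight model (a learned vector `theta` per channel pair plus a scalar `bias`), the brute-force neighbour search, and the names `knn` and `flex_conv` are assumptions made for illustration only; this is not the authors' GPU implementation.

```python
import math


def knn(points, k):
    """Brute-force k-nearest neighbours (O(N^2 log N)); for distinct points the
    first neighbour of each point is the point itself (distance 0)."""
    neighbours = []
    for p in points:
        order = sorted(range(len(points)), key=lambda j: math.dist(p, points[j]))
        neighbours.append(order[:k])
    return neighbours


def flex_conv(points, features, theta, bias, k):
    """Point-cloud convolution whose weights are computed from relative offsets.

    points:   N coordinates, each a tuple of length d
    features: N feature vectors, each of length C_in
    theta:    theta[c_out][c_in] is a length-d vector   (ASSUMED affine weight model)
    bias:     bias[c_out][c_in] is a scalar             (ASSUMED affine weight model)
    Returns N output feature vectors of length C_out.
    """
    nbrs = knn(points, k)
    c_out, c_in = len(theta), len(theta[0])
    out = []
    for i, p in enumerate(points):
        row = [0.0] * c_out
        for j in nbrs[i]:
            # Position of p relative to its neighbour j.
            offset = [a - b for a, b in zip(p, points[j])]
            for co in range(c_out):
                for ci in range(c_in):
                    # ASSUMPTION: weight = <theta, offset> + bias, affine in the offset.
                    w = sum(t * o for t, o in zip(theta[co][ci], offset)) + bias[co][ci]
                    row[co] += w * features[j][ci]
        out.append(row)
    return out


# Toy usage: 4 points in 2D, one input and one output channel, k = 2.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
features = [[1.0], [2.0], [3.0], [4.0]]
theta = [[[0.5, -1.0]]]
bias = [[0.25]]
out = flex_conv(points, features, theta, bias, k=2)
print(out)  # [[-0.25], [1.25], [-1.0], [3.25]]
```

Because the weights in this sketch depend only on relative offsets, translating the whole cloud leaves the output unchanged. The quadratic brute-force neighbour search is the part a million-point implementation would have to replace.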


Notes

  1. \(c\in C\) represents the RGB channel index, where we abuse notation and also write C for \(\{0,1,\ldots , C-1\}\subset \mathbb {N}\).

  2. \(1_M\) is the indicator function, which equals 1 iff \(M\ne \emptyset \) and 0 otherwise.

  3. By setting \(\mathcal {N}_9(\ell )=\{\ell - \tau \mid \tau \in \{-1,0,1\}^d\}\) and \(\tilde{w}_{c'}(c, \ell ^{(i)}, \ell ') = w_{c'}(c, \ell ^{(i)} - \ell ')\).
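Read as the substitution that recovers ordinary grid convolution, Note 3 can be checked numerically. The Python sketch below does this for a single-channel 2D image (so the sum over input channels c is dropped): with the neighbourhood \(\mathcal{N}_9\) and a weight that depends only on \(\ell - \ell'\), a generic neighbourhood sum matches a direct 3×3 convolution. The generic form out(ℓ) = Σ over ℓ′ ∈ N(ℓ) of w̃(ℓ, ℓ′)·f(ℓ′), the zero padding at the border, and all function names are assumptions for illustration.

```python
from itertools import product

# tau ranges over {-1, 0, 1}^2, i.e. the 9 offsets of a 3x3 window (d = 2).
OFFSETS = list(product((-1, 0, 1), repeat=2))


def n9(loc):
    """N_9(l) = {l - tau | tau in {-1,0,1}^2} for a 2D location l = (y, x)."""
    y, x = loc
    return [(y - ty, x - tx) for ty, tx in OFFSETS]


def neighbourhood_conv(locations, feature, neighbourhood, weight):
    """Generic single-channel form: out(l) = sum over l' in N(l) of weight(l, l') * feature(l')."""
    return {loc: sum(weight(loc, nb) * feature(nb) for nb in neighbourhood(loc))
            for loc in locations}


def grid_conv(image, kernel):
    """Direct 3x3 convolution, zero padding: out[y][x] = sum_tau kernel[tau] * image[y-ty][x-tx]."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for ty, tx in OFFSETS:
                yy, xx = y - ty, x - tx
                if 0 <= yy < h and 0 <= xx < w:
                    out[y][x] += kernel[(ty, tx)] * image[yy][xx]
    return out


image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
kernel = {tau: float(i) for i, tau in enumerate(OFFSETS)}  # arbitrary 3x3 weights
H, W = len(image), len(image[0])
locations = [(y, x) for y in range(H) for x in range(W)]


def feature(loc):
    """Image value at loc, or 0 outside the image (zero padding)."""
    y, x = loc
    return image[y][x] if 0 <= y < H and 0 <= x < W else 0.0


def weight(loc, nb):
    """w~(l, l') = w(l - l'): the weight depends only on the offset tau = l - l'."""
    return kernel[(loc[0] - nb[0], loc[1] - nb[1])]


generic = neighbourhood_conv(locations, feature, n9, weight)
direct = grid_conv(image, kernel)
print(generic[(1, 1)], direct[1][1])  # 120.0 120.0
```

In this sketch `neighbourhood_conv` itself knows nothing about grids: all grid structure enters through `n9` and through `weight` indexing a fixed table by integer offsets.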


Acknowledgment

This work was supported by the German Research Foundation (DFG): SFB 1233, Robust Vision: Inference Principles and Neural Mechanisms, TP 01 & 02.

Author information


Correspondence to Fabian Groh.


Electronic supplementary material

  • Supplementary material 1 (PDF, 10833 KB)
  • Supplementary material 2 (MP4, 42449 KB)


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Groh, F., Wieschollek, P., Lensch, H.P.A. (2019). Flex-Convolution. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds) Computer Vision – ACCV 2018. ACCV 2018. Lecture Notes in Computer Science, vol 11361. Springer, Cham. https://doi.org/10.1007/978-3-030-20887-5_7


  • DOI: https://doi.org/10.1007/978-3-030-20887-5_7


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-20886-8

  • Online ISBN: 978-3-030-20887-5

  • eBook Packages: Computer Science, Computer Science (R0)
