Part2Object: Hierarchical Unsupervised 3D Instance Segmentation

Cheng Shi¹³,
Yulin Zhang¹³,
Bin Yang¹³,
Jiajin Tang¹³,
Yuexin Ma¹³ &
Sibei Yang¹³

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15076)

Included in the following conference series:

European Conference on Computer Vision


Abstract

Unsupervised 3D instance segmentation aims to segment objects from a 3D point cloud without any annotations. Existing methods cluster either too loosely or too tightly, leading to under-segmentation or over-segmentation, respectively. To address this issue, we propose Part2Object, a hierarchical clustering method with object guidance. Part2Object employs multi-layer clustering from points to object parts and then to objects, allowing objects to manifest at any layer. Additionally, it extracts 3D objectness priors from temporally consecutive 2D RGB frames and uses them to guide the clustering process. Moreover, we propose Hi-Mask3D to support hierarchical 3D object part and instance segmentation. By training Hi-Mask3D on the objects and object parts extracted by Part2Object, we achieve consistent and superior performance compared to state-of-the-art models in various settings, including unsupervised instance segmentation, data-efficient fine-tuning, and cross-dataset generalization. Code is released at https://github.com/ChengShiest/Part2Object.

C. Shi and Y. Zhang contributed equally.
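
The control flow described in the abstract (multi-layer clustering in which an objectness prior lets an object emerge at any layer) can be illustrated with a toy sketch. This is not the authors' implementation: the greedy centroid merging, the distance thresholds, and the `is_object` callback (standing in for the 2D-derived objectness prior) are all hypothetical choices made only for illustration.

```python
from math import dist


def centroid(cluster):
    """Mean position of a cluster given as a list of coordinate tuples."""
    n = len(cluster)
    return tuple(sum(p[i] for p in cluster) / n for i in range(len(cluster[0])))


def merge_layer(clusters, threshold):
    """One clustering layer: greedily merge clusters whose centroids lie within `threshold`."""
    merged = []
    for c in clusters:
        for m in merged:
            if dist(centroid(m), centroid(c)) <= threshold:
                m.extend(c)
                break
        else:
            merged.append(list(c))
    return merged


def hierarchical_segment(points, thresholds, is_object):
    """Cluster layer by layer; a cluster accepted by `is_object` is emitted
    as an object at that layer and no longer takes part in later merges."""
    active = [[p] for p in points]
    objects = []
    for layer, t in enumerate(thresholds):
        remaining = []
        for c in merge_layer(active, t):
            if is_object(c):
                objects.append((layer, c))
            else:
                remaining.append(c)  # still only a part (or a lone point)
        active = remaining
    return objects, active


# Toy scene: one small object and one larger object made of two parts.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.4, 0.0), (6.0, 0.0), (6.4, 0.0)]
thresholds = [0.2, 0.5, 1.5]  # hypothetical, loosening from layer to layer

# Hypothetical objectness prior: point groups that a 2D cue marks as whole objects.
prior_groups = {frozenset(points[:2]), frozenset(points[2:])}


def is_object(cluster):
    return frozenset(cluster) in prior_groups


objects, leftover = hierarchical_segment(points, thresholds, is_object)
for layer, cluster in objects:
    print(f"layer {layer}: object with {len(cluster)} points")
```

In this toy scene the two-point object is emitted at the first layer, while the four-point object first forms two parts and is only recognized as an object at the last layer. A single fixed threshold would either split the large object or risk merging unrelated clusters, which is the under- and over-segmentation trade-off the abstract describes.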



Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 62206174) and MoE Key Laboratory of Intelligent Perception and Human-Machine Collaboration (ShanghaiTech University).

Author information

Authors and Affiliations

School of Information Science and Technology, ShanghaiTech University, Shanghai, China
Cheng Shi, Yulin Zhang, Bin Yang, Jiajin Tang, Yuexin Ma & Sibei Yang


Corresponding author

Correspondence to Sibei Yang.

Editor information

Editors and Affiliations

University of Birmingham, Birmingham, UK
Aleš Leonardis
University of Trento, Trento, Italy
Elisa Ricci
Technical University of Darmstadt, Darmstadt, Germany
Stefan Roth
Princeton University, Princeton, NJ, USA
Olga Russakovsky
Czech Technical University in Prague, Prague, Czech Republic
Torsten Sattler
École des Ponts ParisTech, Marne-la-Vallée, France
Gül Varol

Ethics declarations

Conflict of Interest

Given that our 2D knowledge is derived from the self-supervised DINO models, we acknowledge that biases and controversies inherent in the training data for these models may be introduced into our model.


About this paper

Cite this paper

Shi, C., Zhang, Y., Yang, B., Tang, J., Ma, Y., Yang, S. (2025). Part2Object: Hierarchical Unsupervised 3D Instance Segmentation. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15076. Springer, Cham. https://doi.org/10.1007/978-3-031-72649-1_1


DOI: https://doi.org/10.1007/978-3-031-72649-1_1
Published: 30 September 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72648-4
Online ISBN: 978-3-031-72649-1
eBook Packages: Computer Science, Computer Science (R0)
