3D Scene Understanding and Object Recognition

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 September 2023) | Viewed by 15241

Special Issue Editors


Guest Editor
University Institute for Computer Research, University of Alicante, P.O. Box 99, 03080 Alicante, Spain
Interests: machine learning; computer vision; pattern recognition; gesture recognition; object recognition; neural networks; artificial intelligence

Guest Editor
University Institute for Computer Research, University of Alicante, P.O. Box 99, 03080 Alicante, Spain
Interests: computer vision; deep learning; 3D object recognition; mapping; navigation; robotics

Special Issue Information

Dear Colleagues,

Three-dimensional data have become widespread in recent years, due largely to the rise of self-driving cars and intelligent vehicles. These new transportation systems are fitted with LiDAR, ToF cameras, stereo setups and a range of other devices providing 3D data on their surroundings. Furthermore, 3D data are actively used in industry for quality testing and other tasks, as well as in consumer devices such as smartphones. Moreover, most robots are equipped with a device able to perceive this kind of information.

Managing 3D data is thus of the utmost importance, and the ability to optimally perform guidance and navigation, object recognition and detection, noise reduction and other related tasks is an active research topic today.

Against this background, we propose this Special Issue focused on 3D scene understanding and object recognition with an emphasis on new algorithms and applications using 3D data. Topics of interest include:

  • Learning-based 3D object recognition;
  • Monocular depth estimation;
  • Navigation algorithms based on 3D data;
  • Registration and map creation;
  • Noise reduction in 3D data.

Dr. Francisco Gomez-Donoso
Dr. Félix Escalona Moncholí
Prof. Dr. Miguel Cazorla
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • 3D scene understanding
  • registration
  • mapping
  • 3D object recognition
  • depth estimation
  • noise reduction

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (8 papers)


Research

19 pages, 11374 KiB  
Article
3D Point Cloud Completion Method Based on Building Contour Constraint Diffusion Probability Model
by Bo Ye, Han Wang, Jingwen Li, Jianwu Jiang, Yanling Lu, Ertao Gao and Tao Yue
Appl. Sci. 2023, 13(20), 11246; https://doi.org/10.3390/app132011246 - 13 Oct 2023
Cited by 1 | Viewed by 1629
Abstract
Building point cloud completion reconstructs the parts of a building's point cloud that are missing because of external factors during data collection, restoring the building's original geometric shape. However, the uncertainty of the filled point positions in areas where building features are missing makes it difficult to recover the original distribution of the building's point cloud shape. To address this issue, we propose a point cloud generation diffusion probability model based on building outline constraints. The method constructs building-outline-constrained regions using information from the walls on the building's surface and the adjacent roofs. These constraints are encoded and fused into latent codes representing the incomplete building point cloud shape, which keeps the completed point cloud close to the real geometric shape of the building by constraining the generated points within the missing areas. Quantitative and qualitative experimental results show that our method outperforms other methods in building point cloud completion.
(This article belongs to the Special Issue 3D Scene Understanding and Object Recognition)
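The abstract describes constraining a diffusion model so that the points generated for the missing region stay consistent with the building's outline. As a rough, generic illustration of that idea only (not the authors' model), the sketch below runs a toy DDPM-style reverse step over a small point set and nudges points that drift outside an assumed convex 2D footprint polygon back toward it; the noise schedule, the footprint, and the zero "noise prediction" are all placeholder assumptions.

```python
import numpy as np

def in_convex_footprint(xy, polygon):
    """True where 2D points lie inside a convex footprint polygon (CCW vertices)."""
    inside = np.ones(len(xy), dtype=bool)
    for i in range(len(polygon)):
        a, b = polygon[i], polygon[(i + 1) % len(polygon)]
        edge, rel = b - a, xy - a
        inside &= (edge[0] * rel[:, 1] - edge[1] * rel[:, 0]) >= 0.0  # left of every CCW edge
    return inside

def constrained_reverse_step(x_t, t, eps_hat, betas, footprint, rng):
    """One toy DDPM reverse step; points drifting outside the footprint are pulled back."""
    beta_t = betas[t]
    alpha_t = 1.0 - beta_t
    alpha_bar_t = np.prod(1.0 - betas[: t + 1])
    mean = (x_t - beta_t / np.sqrt(1.0 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_t)
    x_prev = mean + np.sqrt(beta_t) * rng.standard_normal(x_t.shape) if t > 0 else mean
    outside = ~in_convex_footprint(x_prev[:, :2], footprint)      # xy-footprint test
    x_prev[outside, :2] = 0.5 * x_prev[outside, :2] + 0.5 * footprint.mean(axis=0)
    return x_prev

# Toy run: 256 points, a unit-square footprint, and a zero "noise prediction" as a stand-in
# for the trained denoising network.
rng = np.random.default_rng(0)
footprint = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
betas = np.linspace(1e-4, 0.02, 100)
x = rng.standard_normal((256, 3))
for t in reversed(range(len(betas))):
    x = constrained_reverse_step(x, t, np.zeros_like(x), betas, footprint, rng)
print(in_convex_footprint(x[:, :2], footprint).mean())  # fraction of points inside the footprint
```

In the paper itself, the constraint regions are derived from roof and wall polygons and fused into the latent shape code, rather than applied as a hard geometric projection like this.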
Figures: (1) Point cloud generation diffusion probability model based on building contour constraints; (2) Building outline constraint feature extraction module; (3) Common polygonal convex–concave connections on building surfaces; (4) Building grouping results; (5) Projection of a polygon group onto the horizontal plane; (6) Building constraint contour polygons; (7) Contour constraint possibility and necessity regions; (8) Diffusion process; (9) Point encoding process; (10) Reverse process; (11) Comparison of building point cloud completion results (PCN, PF-Net, VRC-Net, and the proposed method); (12) Point cloud completion results at different missing rates; (13) Ablation experiment results.
14 pages, 2826 KiB  
Article
Study of Root Canal Length Estimations by 3D Spatial Reproduction with Stereoscopic Vision
by Takato Tsukuda, Noriko Mutoh, Akito Nakano, Tomoki Itamiya and Nobuyuki Tani-Ishii
Appl. Sci. 2023, 13(15), 8651; https://doi.org/10.3390/app13158651 - 27 Jul 2023
Cited by 1 | Viewed by 1454
Abstract
Extended Reality (XR) applications are considered useful for skill acquisition in dental education. In this study, we examined the functionality and usefulness of an application, “SR View for Endo”, that measures root canal length using a Spatial Reality Display (SRD) capable of naked-eye stereoscopic viewing. Three-dimensional computer graphics (3DCG) data of dental models were obtained and output to both the SRD and a conventional 2D display device. Forty dentists working at the Kanagawa Dental University Hospital measured root canal length using both types of device and provided feedback through a questionnaire. The measurement values and times were evaluated with one-way analysis of variance, and multivariate analysis assessed the relationship between questionnaire responses and measurement time. There was no significant difference in the measurement values between the 2D device and the SRD, but there was a significant difference in measurement time. Furthermore, a negative correlation was observed between the frequency of device usage and the extended measurement time of the 2D device. Measurements using the SRD showed higher accuracy and shorter measurement times than the 2D device, raising expectations for its use in dental education and clinical training. However, a certain percentage of participants experienced symptoms resembling motion sickness associated with virtual reality (VR).
(This article belongs to the Special Issue 3D Scene Understanding and Object Recognition)
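The study summarizes agreement between repeated measurements with Bland–Altman plots. For readers unfamiliar with the method, the snippet below computes the bias and 95% limits of agreement (mean difference ± 1.96 standard deviations) for paired measurements; the millimetre values are invented for illustration and are not from the study.

```python
import numpy as np

def bland_altman_limits(m1, m2):
    """Return bias and 95% limits of agreement between paired measurements."""
    m1, m2 = np.asarray(m1, float), np.asarray(m2, float)
    diff = m1 - m2                      # first minus second measurement
    bias = diff.mean()
    sd = diff.std(ddof=1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical repeated root canal length measurements in millimetres.
first  = [21.0, 20.5, 19.8, 22.1, 20.9]
second = [20.8, 20.7, 19.9, 21.9, 21.1]
bias, lo, hi = bland_altman_limits(first, second)
print(f"bias = {bias:.2f} mm, 95% LoA = [{lo:.2f}, {hi:.2f}] mm")
```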
Figures: (1) Dental models used in the study (maxillary first premolar and first molar, mandibular first molar and first premolar); (2) Example measurement sequence using SR View for Endo in the SRD 3D spatial reproduction environment; (3) Measurement screen of the conventional 2D device; (4) Study design flow chart; (5) Bland–Altman plots of the first and second measurements by the spatial reality display (SRD); (6) Bland–Altman plots of the first and second measurements by the 2D device.
17 pages, 58573 KiB  
Article
A 3D Estimation Method Using an Omnidirectional Camera and a Spherical Mirror
by Yuya Hiruta, Chun Xie, Hidehiko Shishido and Itaru Kitahara
Appl. Sci. 2023, 13(14), 8348; https://doi.org/10.3390/app13148348 - 19 Jul 2023
Cited by 1 | Viewed by 1367
Abstract
As the demand for 3D information continues to grow in various fields, technologies for acquiring it are being adopted rapidly. Laser-based estimation and multi-view imaging are popular methods for sensing 3D information, and deep learning techniques are also being developed. However, the former requires precise sensing equipment or large observation systems, while the latter relies on substantial prior information in the form of extensive training datasets. Given these limitations, our research aims to develop a method that is independent of learning and can capture a wide range of 3D information with a compact device. This paper introduces a novel approach for estimating the 3D information of an observed scene from a monocular image, based on a catadioptric imaging system that combines an omnidirectional camera with a spherical mirror. A curved mirror makes it possible to capture a large area in a single observation, while the omnidirectional camera keeps the imaging system simple. The proposed method focuses on a spherical or spherical-cap-shaped mirror in the scene and estimates the mirror's position from the captured images, allowing the scene to be estimated with great flexibility. Simulation evaluations are conducted to validate the characteristics and effectiveness of the proposed method.
(This article belongs to the Special Issue 3D Scene Understanding and Object Recognition)
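The method estimates the spherical mirror's 3D position and then traces camera rays back off the mirror into the scene (the "backward projection" step). Below is a minimal geometric sketch of that reflection step, assuming a pinhole camera at the origin and a known sphere position and radius; it is the textbook ray–sphere reflection, not the paper's full pipeline, and the numbers in the usage example are arbitrary.

```python
import numpy as np

def reflect_on_sphere(ray_dir, center, radius):
    """Intersect a camera ray (from the origin) with a sphere and reflect it.

    Returns the reflection point and reflected direction, or None if the ray
    misses the sphere. Uses the law of reflection r = d - 2 (d.n) n.
    """
    d = ray_dir / np.linalg.norm(ray_dir)
    # Solve |t d - center|^2 = radius^2 for the nearest intersection t > 0.
    b = -2.0 * (d @ center)
    c = center @ center - radius**2
    disc = b * b - 4.0 * c
    if disc < 0:
        return None
    t = (-b - np.sqrt(disc)) / 2.0
    if t <= 0:
        return None
    x = t * d                                  # reflection point on the mirror
    n = (x - center) / radius                  # outward surface normal
    r = d - 2.0 * (d @ n) * n                  # reflected (incident-ray) direction
    return x, r

# Toy usage: a sphere of radius 0.1 m placed 1 m in front of the camera.
point, out_dir = reflect_on_sphere(np.array([0.02, 0.01, 1.0]),
                                   np.array([0.0, 0.0, 1.0]), 0.1)
print(point, out_dir)
```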
Figures: (1) Flow of 3D estimation using an omnidirectional camera and a spherical mirror; (2) Process of 3D estimation based on the catadioptric imaging system; (3) Segmentation of the mirror image in the omnidirectional image; (4) Relationship between the camera and the spherical mirror; (5) Mirror shapes that allow estimation of the 3D position; (6) Backward projection from 2D to 3D; (7) Search for matching points by color histogram; (8) Effect of mirror position estimation errors on 3D estimation; (9) Input images in the room-model comparison simulation; (10) Estimation of the spherical mirror position; (11) Distance and error maps for the simulated scene; (12) Estimation accuracy versus the angle between the optical axis and the camera ray; (13) Definition of parameter h; (14) Change in parameter h with angle φ; (15) Estimation results when the mirror position is ground truth; (16) Application result of Equation (13); (17) Estimated 3D information of the poster versus the mirror position along the Z-axis; (18) Input image in the real-world simulation; (19) Estimation of the spherical mirror position in the real-world simulation; (20) Estimation result in the real-world simulation.
14 pages, 5688 KiB  
Article
FANet: Improving 3D Object Detection with Position Adaptation
by Jian Ye, Fushan Zuo and Yuqing Qian
Appl. Sci. 2023, 13(13), 7508; https://doi.org/10.3390/app13137508 - 25 Jun 2023
Viewed by 1311
Abstract
Three-dimensional object detection plays a crucial role in achieving accurate and reliable autonomous driving systems. However, current state-of-the-art two-stage detectors lack flexibility and have limited feature extraction capabilities for handling the disorder and irregularity of point clouds. In this paper, we propose a novel network called FANet, which combines the strengths of PV-RCNN and PAConv (position adaptive convolution) to address the irregularity and disorder present in point clouds. In our network, the convolution operation constructs convolutional kernels from a bank of basic weight matrices, and the coefficients of these kernels are adaptively learned by LearnNet from relative point positions. This approach allows flexible modeling of complex spatial variations and geometric structures in 3D point clouds, leading to improved extraction of point cloud features and the generation of high-quality 3D proposal boxes. Extensive experiments on the KITTI dataset demonstrate that FANet achieves superior 3D object detection accuracy compared with other methods.
(This article belongs to the Special Issue 3D Scene Understanding and Object Recognition)
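Position adaptive convolution assembles a kernel for each neighbor as a coefficient-weighted combination of a bank of basic weight matrices, with the coefficients predicted from relative point positions. The toy NumPy sketch below illustrates that assembly for a single neighborhood; the tensor shapes are arbitrary, and the simple linear "score" map stands in for the small network (LearnNet/ScoreNet) that is trained in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def position_adaptive_conv(center, neighbors, feats, weight_bank, score_w):
    """Toy position-adaptive convolution over one point's neighborhood.

    weight_bank: (M, C_in, C_out) basic weight matrices.
    score_w:     (3, M) linear stand-in for the network that maps a relative
                 position to mixing coefficients over the M basic matrices.
    """
    rel = neighbors - center                                   # (K, 3) relative positions
    coeffs = softmax(rel @ score_w, axis=-1)                   # (K, M) adaptive coefficients
    # Assemble one kernel per neighbor as a coefficient-weighted sum of the bank.
    kernels = np.einsum("km,mio->kio", coeffs, weight_bank)    # (K, C_in, C_out)
    return np.einsum("ki,kio->o", feats, kernels)              # aggregate over neighbors

# Hypothetical sizes: 16 neighbors, 8 basic matrices, 32 -> 64 channels.
K, M, C_in, C_out = 16, 8, 32, 64
out = position_adaptive_conv(
    center=rng.standard_normal(3),
    neighbors=rng.standard_normal((K, 3)),
    feats=rng.standard_normal((K, C_in)),
    weight_bank=rng.standard_normal((M, C_in, C_out)),
    score_w=rng.standard_normal((3, M)),
)
print(out.shape)   # (64,)
```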
Figures: (1) Overall framework of FANet; (2) Dynamic network; (3) Framework of position adaptive convolution; (4) Training loss of PV-RCNN; (5) Training loss of FANet; (6–10) Qualitative detection results on five scenes.
25 pages, 5804 KiB  
Article
NGLSFusion: Non-Use GPU Lightweight Indoor Semantic SLAM
by Le Wan, Lin Jiang, Bo Tang, Yunfei Li, Bin Lei and Honghai Liu
Appl. Sci. 2023, 13(9), 5285; https://doi.org/10.3390/app13095285 - 23 Apr 2023
Viewed by 1529
Abstract
Perception of the indoor environment is the basis of mobile robot localization, navigation, and path planning, and it is particularly important to construct semantic maps in real time using minimal resources. Existing methods depend too heavily on the graphics processing unit (GPU) for acquiring semantic information about the indoor environment and cannot build the semantic map in real time on the central processing unit (CPU). To address these problems, this paper proposes a lightweight indoor semantic map construction algorithm that does not use a GPU, named NGLSFusion. In the visual odometry (VO) module, ORB features are used to initialize the first frame, new keyframes are created by the optical flow method, and feature points are extracted by the direct method, which accelerates tracking. In the semantic map construction method, a pretrained model of the lightweight network LinkNet is optimized to provide semantic information in real time on devices with limited computing power, and the semantic point cloud is fused using OctoMap and Voxblox. Experimental results show that the proposed algorithm preserves camera pose accuracy while accelerating tracking, and reconstructs a structurally complete semantic map without using a GPU.
(This article belongs to the Special Issue 3D Scene Understanding and Object Recognition)
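The VO module tracks points with optical flow and discards mismatches by checking forward and reverse tracking against a pixel-distance threshold (illustrated in the paper's Figure 3). The sketch below shows a generic forward–backward consistency check with OpenCV's pyramidal Lucas–Kanade tracker on synthetic images; it follows the same idea but is not the authors' implementation, and the threshold value is an assumption.

```python
import numpy as np
import cv2

def forward_backward_track(img0, img1, pts0, max_fb_error=1.0):
    """Track points img0 -> img1 with LK optical flow and keep only those that
    track back to (roughly) their starting position (forward-backward check)."""
    pts0 = pts0.astype(np.float32).reshape(-1, 1, 2)
    pts1, st1, _ = cv2.calcOpticalFlowPyrLK(img0, img1, pts0, None)
    pts0_back, st2, _ = cv2.calcOpticalFlowPyrLK(img1, img0, pts1, None)
    fb_err = np.linalg.norm(pts0 - pts0_back, axis=2).ravel()
    good = (st1.ravel() == 1) & (st2.ravel() == 1) & (fb_err < max_fb_error)
    return pts0[good].reshape(-1, 2), pts1[good].reshape(-1, 2)

# Toy usage with synthetic images: shift a random texture by 2 pixels to the right.
rng = np.random.default_rng(0)
img0 = (rng.random((240, 320)) * 255).astype(np.uint8)
img1 = np.roll(img0, 2, axis=1)
corners = cv2.goodFeaturesToTrack(img0, maxCorners=200, qualityLevel=0.01, minDistance=7)
p0, p1 = forward_backward_track(img0, img1, corners.reshape(-1, 2))
print(len(p0), "consistent tracks; mean flow:", (p1 - p0).mean(axis=0))
```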
Figures: (1) Algorithm framework of NGLSFusion; (2) Algorithm framework of the VO module; (3) Optical flow tracking (forward tracking, reverse tracking, pixel-distance check, removal of mismatched points); (4) Framework of the optimized semantic segmentation; (5) Framework of semantic map construction; (6) Trajectory comparison on the Handheld Camera dataset; (7) Trajectory comparison on the Robot and Dynamic Objects datasets; (8) Global semantic maps for the small office scene; (9) Global semantic map of the large laboratory scene; (10) GPU occupancy; (11) Global texture map for the small office scene; (12) Global texture maps for the large laboratory scene.
19 pages, 9929 KiB  
Article
Boundary–Inner Disentanglement Enhanced Learning for Point Cloud Semantic Segmentation
by Lixia He, Jiangfeng She, Qiang Zhao, Xiang Wen and Yuzheng Guan
Appl. Sci. 2023, 13(6), 4053; https://doi.org/10.3390/app13064053 - 22 Mar 2023
Cited by 2 | Viewed by 1537
Abstract
In a point cloud semantic segmentation task, misclassification usually appears on the semantic boundary. A few studies have taken the boundary into consideration, but they relied on complex modules for explicit boundary prediction, which greatly increased model complexity. It is challenging to improve the segmentation accuracy of points on the boundary without dependence on additional modules. For every boundary point, this paper divides its neighboring points into different collections, and then measures its entanglement with each collection. A comparison of the measurement results before and after utilizing boundary information in the semantic segmentation network showed that the boundary could enhance the disentanglement between the boundary point and its neighboring points in inner areas, thereby greatly improving the overall accuracy. Therefore, to improve the semantic segmentation accuracy of boundary points, a Boundary–Inner Disentanglement Enhanced Learning (BIDEL) framework with no need for additional modules and learning parameters is proposed, which can maximize feature distinction between the boundary point and its neighboring points in inner areas through a newly defined boundary loss function. Experiments with two classic baselines across three challenging datasets demonstrate the benefits of BIDEL for the semantic boundary. As a general framework, BIDEL can be easily adopted in many existing semantic segmentation networks.
(This article belongs to the Special Issue 3D Scene Understanding and Object Recognition)
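BIDEL's boundary loss is defined in the paper itself; as a loose illustration of the kind of quantity such a loss acts on, the sketch below measures the average cosine similarity ("entanglement") between each boundary point's feature and the features of its non-boundary neighbors, on randomly generated data. Minimizing a term like this would push boundary features away from inner-area features, which is the disentanglement effect the abstract describes; the exact formulation here is an assumption, not the paper's loss.

```python
import numpy as np

def boundary_entanglement(feats, is_boundary, neighbor_idx):
    """Mean cosine similarity between each boundary point's feature and the
    features of its inner-area (non-boundary) neighbors."""
    f = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    sims = []
    for i in np.flatnonzero(is_boundary):
        inner = [j for j in neighbor_idx[i] if not is_boundary[j]]
        if inner:
            sims.append(float((f[i] @ f[inner].T).mean()))
    return float(np.mean(sims)) if sims else 0.0

# Toy usage: 200 points with random coordinates, 32-D features, an 8-NN graph,
# and roughly 10% of the points marked as semantic-boundary points.
rng = np.random.default_rng(0)
xyz = rng.standard_normal((200, 3))
feats = rng.standard_normal((200, 32))
d = np.linalg.norm(xyz[:, None] - xyz[None], axis=-1)
neighbor_idx = np.argsort(d, axis=1)[:, 1:9]
is_boundary = rng.random(200) < 0.1
print(boundary_entanglement(feats, is_boundary, neighbor_idx))
```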
Figures: (1) Boundaries generated from ground truth on S3DIS scenes; (2) Entanglement between boundary points and their neighboring collections in the control and experimental groups; (3) Detailed illustration of the Boundary–Inner Disentanglement Enhanced Learning (BIDEL) framework; (4) Overall architecture of the segmentation network embedded with BIDEL; (5) Results on S3DIS Area 5 with BIDEL applied to KPConv; (6) Results on S3DIS Area 5 with BIDEL applied to RandLA-Net; (7) Mislabeled regions in the Toronto-3D ground truth; (8) Qualitative results on the Toronto-3D L002 dataset; (9) Results on the Semantic3D reduced-8 dataset.
27 pages, 4579 KiB  
Article
An Accurate, Efficient, and Stable Perspective-n-Point Algorithm in 3D Space
by Rui Qiao, Guili Xu, Ping Wang, Yuehua Cheng and Wende Dong
Appl. Sci. 2023, 13(2), 1111; https://doi.org/10.3390/app13021111 - 13 Jan 2023
Cited by 1 | Viewed by 3270
Abstract
The Perspective-n-Point problem is usually addressed by means of a projective imaging model of 3D points, but the spatial distribution and quantity of 3D reference points vary, making it difficult for a Perspective-n-Point algorithm to balance accuracy, robustness, and computational efficiency. To address this issue, this paper introduces Hidden PnP, a hidden variable method. After parameterizing the rotation matrix with CGR parameters, the method, unlike the best existing matrix synthesis technique (Gröbner basis technology), does not require the construction of a large matrix elimination template in the polynomial solution phase. It can therefore solve for the CGR parameters rapidly and then locate the solution accurately using the Gauss–Newton method. On synthetic data, the hidden-variable PnP solution outperforms the best existing Perspective-n-Point methods in accuracy and robustness in the ordinary 3D, planar, and quasi-singular cases. Furthermore, its computational efficiency can be up to seven times that of existing leading algorithms when the number of spatially redundant reference points increases to 500. In physical experiments on pose reprojection with a monocular camera, the algorithm also showed higher accuracy than the best existing algorithm.
(This article belongs to the Special Issue 3D Scene Understanding and Object Recognition)
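The method parameterizes rotation with CGR (Cayley–Gibbs–Rodrigues) parameters and refines the pose with Gauss–Newton. The sketch below shows the standard CGR-to-rotation conversion, R = ((1 − sᵀs)·I + 2·s·sᵀ + 2·[s]×) / (1 + sᵀs), plus the reprojection residuals a Gauss–Newton solver would minimize; the hidden-variable polynomial elimination that distinguishes the paper's solver is not reproduced here, and the example points are arbitrary.

```python
import numpy as np

def cgr_to_rotation(s):
    """Rotation matrix from a CGR (Cayley-Gibbs-Rodrigues) vector s = tan(theta/2) * axis."""
    s = np.asarray(s, float)
    ss = s @ s
    skew = np.array([[0.0, -s[2], s[1]],
                     [s[2], 0.0, -s[0]],
                     [-s[1], s[0], 0.0]])
    return ((1.0 - ss) * np.eye(3) + 2.0 * np.outer(s, s) + 2.0 * skew) / (1.0 + ss)

def reprojection_residuals(s, t, pts3d, pts2d_norm):
    """Residuals between normalized image points and projected 3D points for a pose (s, t)."""
    pc = pts3d @ cgr_to_rotation(s).T + t           # points in the camera frame
    proj = pc[:, :2] / pc[:, 2:3]                   # pinhole projection (normalized coordinates)
    return (proj - pts2d_norm).ravel()

# Sanity check: a 90-degree rotation about the z-axis, i.e. s = tan(45 deg) * [0, 0, 1].
R = cgr_to_rotation([0.0, 0.0, np.tan(np.pi / 4)])
print(np.round(R, 3))                       # ~ [[0,-1,0],[1,0,0],[0,0,1]]
print(np.allclose(R @ R.T, np.eye(3)))      # orthogonality check

# Residuals vanish for the exact pose (identity rotation, zero translation here).
pts3d = np.array([[0.0, 0.0, 4.0], [1.0, -0.5, 6.0], [-1.0, 1.0, 5.0], [0.5, 0.5, 7.0]])
pts2d = pts3d[:, :2] / pts3d[:, 2:3]
print(np.abs(reprojection_residuals(np.zeros(3), np.zeros(3), pts3d, pts2d)).max())
```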
Figures: (1) Schematic of the Gröbner generation framework; (2) Coordinate system of the camera imaging model; (3–5) Error comparisons in the ordinary 3D, planar, and quasi-singular cases; (6–8) Further error comparisons in the ordinary 3D, planar, and quasi-singular cases; (9) Average running time of PnP algorithms for 4 to 500 reference points; (10) Rmoncam G180 camera and high-precision checkerboard calibration board; (11) Flow chart of the physical reprojection experiment; (12) Extraction range of corner points on the calibration board; (13) Ten calibration-board images at different distances and attitudes with their reprojection results.
15 pages, 1736 KiB  
Article
LUMDE: Light-Weight Unsupervised Monocular Depth Estimation via Knowledge Distillation
by Wenze Hu, Xue Dong, Ning Liu and Yuanfeng Chen
Appl. Sci. 2022, 12(24), 12593; https://doi.org/10.3390/app122412593 - 8 Dec 2022
Cited by 2 | Viewed by 2118
Abstract
Unsupervised monocular depth estimation networks have progressed rapidly in recent years, as they avoid the need for ground truth data and monocular cameras are readily available in most autonomous devices. Although some effective monocular depth estimation networks have been reported previously, such as Monodepth2 and SC-SfMLearner, most of these approaches are still computationally expensive for lightweight devices. Therefore, in this paper, we introduce a knowledge-distillation-based approach named LUMDE for pixel-by-pixel unsupervised monocular depth estimation. Specifically, we use a teacher network and a lightweight student network to distill the depth information, and further integrate a pose network into the student module to improve depth performance. Moreover, following the idea of the Generative Adversarial Network (GAN), the outputs of the student and teacher networks are taken as fake and real samples, respectively, and a Transformer is introduced as the GAN discriminator to further improve the depth predictions. The proposed LUMDE method achieves state-of-the-art (SOTA) results in the knowledge distillation of unsupervised depth estimation and also outperforms some dense networks. Compared with the teacher network, LUMDE loses only 2.6% in δ1 accuracy on the NYUD-V2 dataset while reducing computational complexity by 95.2%.
(This article belongs to the Special Issue 3D Scene Understanding and Object Recognition)
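The abstract reports results in terms of δ1 accuracy, the standard depth-estimation metric counting pixels whose predicted-to-true depth ratio (or its inverse) stays below 1.25. The snippet below computes it on synthetic depth maps; the data are made up for illustration and are not from the paper.

```python
import numpy as np

def delta1_accuracy(pred, gt, threshold=1.25):
    """Fraction of pixels whose ratio max(pred/gt, gt/pred) is below the threshold."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    valid = gt > 0                       # ignore pixels without ground-truth depth
    ratio = np.maximum(pred[valid] / gt[valid], gt[valid] / pred[valid])
    return float((ratio < threshold).mean())

# Toy usage with a synthetic depth map perturbed by 10% multiplicative noise.
rng = np.random.default_rng(0)
gt = rng.uniform(0.5, 10.0, size=(480, 640))
pred = gt * rng.uniform(0.9, 1.1, size=gt.shape)
print(delta1_accuracy(pred, gt))   # ~1.0, since all ratios stay below 1.25
```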
Figures: (1) Overview of the LUMDE architecture (knowledge distillation with pixel-wise, pair-wise, and holistic losses; PoseNet incorporated into the student network; Transformer discriminator); (2) Qualitative comparison on indoor scenes; (3) Qualitative comparison on outdoor traffic scenes; (4) Inference efficiency on an i5 CPU and an Nvidia GeForce MX330 GPU.