3D Scene Understanding and Object Recognition

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 September 2023) | Viewed by 15241

Special Issue Editors


Guest Editor
University Institute for Computer Research, University of Alicante, P.O. Box 99, 03080 Alicante, Spain
Interests: machine learning; computer vision; pattern recognition; gesture recognition; object recognition; neural networks; artificial intelligence

Guest Editor
University Institute for Computer Research, University of Alicante, P.O. Box 99, 03080 Alicante, Spain
Interests: computer vision; deep learning; 3D object recognition; mapping; navigation; robotics

Special Issue Information

Dear Colleagues,

Three-dimensional data have become widespread in recent years, due largely to the rise of self-driving cars and intelligent vehicles. These new transportation systems are fitted with LiDAR, ToF cameras, stereo setups and a range of other devices providing 3D data on their surroundings. Furthermore, 3D data are actively used in industry for quality testing and other tasks, as well as in consumer devices such as smartphones. Moreover, most robots are equipped with a device able to perceive this kind of information.

Managing 3D data is thus of the utmost importance, and the ability to optimally perform guidance and navigation, object recognition and detection, noise reduction and other related tasks is an active research topic today.

Against this background, we propose this Special Issue focused on 3D scene understanding and object recognition with an emphasis on new algorithms and applications using 3D data. Topics of interest include:

  • Learning-based 3D object recognition;
  • Monocular depth estimation;
  • Navigation algorithms based on 3D data;
  • Registration and map creation;
  • Noise reduction in 3D data.

Dr. Francisco Gomez-Donoso
Dr. Félix Escalona Moncholí
Prof. Dr. Miguel Cazorla
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • 3D scene understanding
  • registration
  • mapping
  • 3D object recognition
  • depth estimation
  • noise reduction

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (8 papers)


Research

19 pages, 11374 KiB  
Article
3D Point Cloud Completion Method Based on Building Contour Constraint Diffusion Probability Model
by Bo Ye, Han Wang, Jingwen Li, Jianwu Jiang, Yanling Lu, Ertao Gao and Tao Yue
Appl. Sci. 2023, 13(20), 11246; https://doi.org/10.3390/app132011246 - 13 Oct 2023
Cited by 1 | Viewed by 1629
Abstract
Building point cloud completion reconstructs the parts of a building's point cloud that are missing because of external factors during data collection, restoring the building's original geometric shape. However, the uncertainty of the filled point positions in areas where building features are missing makes it difficult to recover the original distribution of the building's point cloud shape. To address this issue, we propose a point cloud generation diffusion probability model based on building outline constraints. The method constructs building-outline-constrained regions using information from the walls on the building's surface and the adjacent roofs. These constraints are encoded and fused into latent codes representing the incomplete building point cloud shape, which keeps the completed point cloud close to the real geometric shape of the building by constraining the generated points within the missing areas. Quantitative and qualitative experimental results show that our method outperforms other methods in building point cloud completion.
(This article belongs to the Special Issue 3D Scene Understanding and Object Recognition)
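The abstract describes constraining a diffusion model so that the points generated for the missing region stay consistent with the building's outline. As a rough, generic illustration of that idea only (not the authors' model), the sketch below runs a toy DDPM-style reverse step over a small point set and nudges points that drift outside an assumed convex 2D footprint polygon back toward it; the noise schedule, the footprint, and the zero "noise prediction" are all placeholder assumptions.

```python
import numpy as np

def in_convex_footprint(xy, polygon):
    """True where 2D points lie inside a convex footprint polygon (CCW vertices)."""
    inside = np.ones(len(xy), dtype=bool)
    for i in range(len(polygon)):
        a, b = polygon[i], polygon[(i + 1) % len(polygon)]
        edge, rel = b - a, xy - a
        inside &= (edge[0] * rel[:, 1] - edge[1] * rel[:, 0]) >= 0.0  # left of every CCW edge
    return inside

def constrained_reverse_step(x_t, t, eps_hat, betas, footprint, rng):
    """One toy DDPM reverse step; points drifting outside the footprint are pulled back."""
    beta_t = betas[t]
    alpha_t = 1.0 - beta_t
    alpha_bar_t = np.prod(1.0 - betas[: t + 1])
    mean = (x_t - beta_t / np.sqrt(1.0 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_t)
    x_prev = mean + np.sqrt(beta_t) * rng.standard_normal(x_t.shape) if t > 0 else mean
    outside = ~in_convex_footprint(x_prev[:, :2], footprint)      # xy-footprint test
    x_prev[outside, :2] = 0.5 * x_prev[outside, :2] + 0.5 * footprint.mean(axis=0)
    return x_prev

# Toy run: 256 points, a unit-square footprint, and a zero "noise prediction" as a stand-in
# for the trained denoising network.
rng = np.random.default_rng(0)
footprint = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
betas = np.linspace(1e-4, 0.02, 100)
x = rng.standard_normal((256, 3))
for t in reversed(range(len(betas))):
    x = constrained_reverse_step(x, t, np.zeros_like(x), betas, footprint, rng)
print(in_convex_footprint(x[:, :2], footprint).mean())  # fraction of points inside the footprint
```

In the paper itself, the constraint regions are derived from roof and wall polygons and fused into the latent shape code, rather than applied as a hard geometric projection like this.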
Figures: (1) Point cloud generation diffusion probability model based on building contour constraints; (2) Building outline constraint feature extraction module; (3) Common polygonal convex–concave connections on building surfaces; (4) Building grouping results; (5) Projection of a polygon group onto the horizontal plane; (6) Building constraint contour polygons; (7) Contour constraint possibility and necessity regions; (8) Diffusion process; (9) Point encoding process; (10) Reverse process; (11) Comparison of building point cloud completion results (PCN, PF-Net, VRC-Net, and the proposed method); (12) Point cloud completion results at different missing rates; (13) Ablation experiment results.
14 pages, 2826 KiB  
Article
Study of Root Canal Length Estimations by 3D Spatial Reproduction with Stereoscopic Vision
by Takato Tsukuda, Noriko Mutoh, Akito Nakano, Tomoki Itamiya and Nobuyuki Tani-Ishii
Appl. Sci. 2023, 13(15), 8651; https://doi.org/10.3390/app13158651 - 27 Jul 2023
Cited by 1 | Viewed by 1454
Abstract
Extended Reality (XR) applications are considered useful for skill acquisition in dental education. In this study, we examined the functionality and usefulness of an application, “SR View for Endo”, that measures root canal length using a Spatial Reality Display (SRD) capable of naked-eye stereoscopic viewing. Three-dimensional computer graphics (3DCG) data of dental models were obtained and output to both the SRD and a conventional 2D display device. Forty dentists working at the Kanagawa Dental University Hospital measured root canal length using both types of device and provided feedback through a questionnaire. The measurement values and times were evaluated with one-way analysis of variance, and multivariate analysis assessed the relationship between questionnaire responses and measurement time. There was no significant difference in the measurement values between the 2D device and the SRD, but there was a significant difference in measurement time. Furthermore, a negative correlation was observed between the frequency of device usage and the extended measurement time of the 2D device. Measurements using the SRD showed higher accuracy and shorter measurement times than the 2D device, raising expectations for its use in dental education and clinical training. However, a certain percentage of participants experienced symptoms resembling motion sickness associated with virtual reality (VR).
(This article belongs to the Special Issue 3D Scene Understanding and Object Recognition)
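The study summarizes agreement between repeated measurements with Bland–Altman plots. For readers unfamiliar with the method, the snippet below computes the bias and 95% limits of agreement (mean difference ± 1.96 standard deviations) for paired measurements; the millimetre values are invented for illustration and are not from the study.

```python
import numpy as np

def bland_altman_limits(m1, m2):
    """Return bias and 95% limits of agreement between paired measurements."""
    m1, m2 = np.asarray(m1, float), np.asarray(m2, float)
    diff = m1 - m2                      # first minus second measurement
    bias = diff.mean()
    sd = diff.std(ddof=1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical repeated root canal length measurements in millimetres.
first  = [21.0, 20.5, 19.8, 22.1, 20.9]
second = [20.8, 20.7, 19.9, 21.9, 21.1]
bias, lo, hi = bland_altman_limits(first, second)
print(f"bias = {bias:.2f} mm, 95% LoA = [{lo:.2f}, {hi:.2f}] mm")
```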
Figures: (1) Dental models used in the study (maxillary first premolar and first molar, mandibular first molar and first premolar); (2) Example measurement sequence using SR View for Endo in the SRD 3D spatial reproduction environment; (3) Measurement screen of the conventional 2D device; (4) Study design flow chart; (5) Bland–Altman plots of the first and second measurements by the spatial reality display (SRD); (6) Bland–Altman plots of the first and second measurements by the 2D device.
17 pages, 58573 KiB  
Article
A 3D Estimation Method Using an Omnidirectional Camera and a Spherical Mirror
by Yuya Hiruta, Chun Xie, Hidehiko Shishido and Itaru Kitahara
Appl. Sci. 2023, 13(14), 8348; https://doi.org/10.3390/app13148348 - 19 Jul 2023
Cited by 1 | Viewed by 1367
Abstract
As the demand for 3D information continues to grow in various fields, technologies for acquiring it are being adopted rapidly. Laser-based estimation and multi-view imaging are popular methods for sensing 3D information, and deep learning techniques are also being developed. However, the former requires precise sensing equipment or large observation systems, while the latter relies on substantial prior information in the form of extensive training datasets. Given these limitations, our research aims to develop a method that is independent of learning and can capture a wide range of 3D information with a compact device. This paper introduces a novel approach for estimating the 3D information of an observed scene from a monocular image, based on a catadioptric imaging system that combines an omnidirectional camera with a spherical mirror. A curved mirror makes it possible to capture a large area in a single observation, while the omnidirectional camera keeps the imaging system simple. The proposed method focuses on a spherical or spherical-cap-shaped mirror in the scene and estimates the mirror's position from the captured images, allowing the scene to be estimated with great flexibility. Simulation evaluations are conducted to validate the characteristics and effectiveness of the proposed method.
(This article belongs to the Special Issue 3D Scene Understanding and Object Recognition)
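The method estimates the spherical mirror's 3D position and then traces camera rays back off the mirror into the scene (the "backward projection" step). Below is a minimal geometric sketch of that reflection step, assuming a pinhole camera at the origin and a known sphere position and radius; it is the textbook ray–sphere reflection, not the paper's full pipeline, and the numbers in the usage example are arbitrary.

```python
import numpy as np

def reflect_on_sphere(ray_dir, center, radius):
    """Intersect a camera ray (from the origin) with a sphere and reflect it.

    Returns the reflection point and reflected direction, or None if the ray
    misses the sphere. Uses the law of reflection r = d - 2 (d.n) n.
    """
    d = ray_dir / np.linalg.norm(ray_dir)
    # Solve |t d - center|^2 = radius^2 for the nearest intersection t > 0.
    b = -2.0 * (d @ center)
    c = center @ center - radius**2
    disc = b * b - 4.0 * c
    if disc < 0:
        return None
    t = (-b - np.sqrt(disc)) / 2.0
    if t <= 0:
        return None
    x = t * d                                  # reflection point on the mirror
    n = (x - center) / radius                  # outward surface normal
    r = d - 2.0 * (d @ n) * n                  # reflected (incident-ray) direction
    return x, r

# Toy usage: a sphere of radius 0.1 m placed 1 m in front of the camera.
point, out_dir = reflect_on_sphere(np.array([0.02, 0.01, 1.0]),
                                   np.array([0.0, 0.0, 1.0]), 0.1)
print(point, out_dir)
```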
Figures: (1) Flow of 3D estimation using an omnidirectional camera and a spherical mirror; (2) Process of 3D estimation based on the catadioptric imaging system; (3) Segmentation of the mirror image in the omnidirectional image; (4) Relationship between the camera and the spherical mirror; (5) Mirror shapes that allow estimation of the 3D position; (6) Backward projection from 2D to 3D; (7) Search for matching points by color histogram; (8) Effect of mirror position estimation errors on 3D estimation; (9) Input images in the room-model comparison simulation; (10) Estimation of the spherical mirror position; (11) Distance and error maps for the simulated scene; (12) Estimation accuracy versus the angle between the optical axis and the camera ray; (13) Definition of parameter h; (14) Change in parameter h with angle φ; (15) Estimation results when the mirror position is ground truth; (16) Application result of Equation (13); (17) Estimated 3D information of the poster versus the mirror position along the Z-axis; (18) Input image in the real-world simulation; (19) Estimation of the spherical mirror position in the real-world simulation; (20) Estimation result in the real-world simulation.
14 pages, 5688 KiB  
Article
FANet: Improving 3D Object Detection with Position Adaptation
by Jian Ye, Fushan Zuo and Yuqing Qian
Appl. Sci. 2023, 13(13), 7508; https://doi.org/10.3390/app13137508 - 25 Jun 2023
Viewed by 1311
Abstract
Three-dimensional object detection plays a crucial role in achieving accurate and reliable autonomous driving systems. However, current state-of-the-art two-stage detectors lack flexibility and have limited feature extraction capabilities for handling the disorder and irregularity of point clouds. In this paper, we propose a novel network called FANet, which combines the strengths of PV-RCNN and PAConv (position adaptive convolution) to address the irregularity and disorder present in point clouds. In our network, the convolution operation constructs convolutional kernels from a bank of basic weight matrices, and the coefficients of these kernels are adaptively learned by LearnNet from relative point positions. This approach allows flexible modeling of complex spatial variations and geometric structures in 3D point clouds, leading to improved extraction of point cloud features and the generation of high-quality 3D proposal boxes. Extensive experiments on the KITTI dataset demonstrate that FANet achieves superior 3D object detection accuracy compared with other methods.
(This article belongs to the Special Issue 3D Scene Understanding and Object Recognition)
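Position adaptive convolution assembles a kernel for each neighbor as a coefficient-weighted combination of a bank of basic weight matrices, with the coefficients predicted from relative point positions. The toy NumPy sketch below illustrates that assembly for a single neighborhood; the tensor shapes are arbitrary, and the simple linear "score" map stands in for the small network (LearnNet/ScoreNet) that is trained in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def position_adaptive_conv(center, neighbors, feats, weight_bank, score_w):
    """Toy position-adaptive convolution over one point's neighborhood.

    weight_bank: (M, C_in, C_out) basic weight matrices.
    score_w:     (3, M) linear stand-in for the network that maps a relative
                 position to mixing coefficients over the M basic matrices.
    """
    rel = neighbors - center                                   # (K, 3) relative positions
    coeffs = softmax(rel @ score_w, axis=-1)                   # (K, M) adaptive coefficients
    # Assemble one kernel per neighbor as a coefficient-weighted sum of the bank.
    kernels = np.einsum("km,mio->kio", coeffs, weight_bank)    # (K, C_in, C_out)
    return np.einsum("ki,kio->o", feats, kernels)              # aggregate over neighbors

# Hypothetical sizes: 16 neighbors, 8 basic matrices, 32 -> 64 channels.
K, M, C_in, C_out = 16, 8, 32, 64
out = position_adaptive_conv(
    center=rng.standard_normal(3),
    neighbors=rng.standard_normal((K, 3)),
    feats=rng.standard_normal((K, C_in)),
    weight_bank=rng.standard_normal((M, C_in, C_out)),
    score_w=rng.standard_normal((3, M)),
)
print(out.shape)   # (64,)
```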
Figures: (1) Overall framework of FANet; (2) Dynamic network; (3) Framework of position adaptive convolution; (4) Training loss of PV-RCNN; (5) Training loss of FANet; (6–10) Qualitative detection results on five scenes.
25 pages, 5804 KiB  
Article
NGLSFusion: Non-Use GPU Lightweight Indoor Semantic SLAM
by Le Wan, Lin Jiang, Bo Tang, Yunfei Li, Bin Lei and Honghai Liu
Appl. Sci. 2023, 13(9), 5285; https://doi.org/10.3390/app13095285 - 23 Apr 2023
Viewed by 1529
Abstract
Perception of the indoor environment is the basis of mobile robot localization, navigation, and path planning, and it is particularly important to construct semantic maps in real time using minimal resources. Existing methods depend too heavily on the graphics processing unit (GPU) for acquiring semantic information about the indoor environment and cannot build the semantic map in real time on the central processing unit (CPU). To address these problems, this paper proposes a lightweight indoor semantic map construction algorithm that does not use a GPU, named NGLSFusion. In the visual odometry (VO) module, ORB features are used to initialize the first frame, new keyframes are created by the optical flow method, and feature points are extracted by the direct method, which accelerates tracking. In the semantic map construction method, a pretrained model of the lightweight network LinkNet is optimized to provide semantic information in real time on devices with limited computing power, and the semantic point cloud is fused using OctoMap and Voxblox. Experimental results show that the proposed algorithm preserves camera pose accuracy while accelerating tracking, and reconstructs a structurally complete semantic map without using a GPU.
(This article belongs to the Special Issue 3D Scene Understanding and Object Recognition)
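The VO module tracks points with optical flow and discards mismatches by checking forward and reverse tracking against a pixel-distance threshold (illustrated in the paper's Figure 3). The sketch below shows a generic forward–backward consistency check with OpenCV's pyramidal Lucas–Kanade tracker on synthetic images; it follows the same idea but is not the authors' implementation, and the threshold value is an assumption.

```python
import numpy as np
import cv2

def forward_backward_track(img0, img1, pts0, max_fb_error=1.0):
    """Track points img0 -> img1 with LK optical flow and keep only those that
    track back to (roughly) their starting position (forward-backward check)."""
    pts0 = pts0.astype(np.float32).reshape(-1, 1, 2)
    pts1, st1, _ = cv2.calcOpticalFlowPyrLK(img0, img1, pts0, None)
    pts0_back, st2, _ = cv2.calcOpticalFlowPyrLK(img1, img0, pts1, None)
    fb_err = np.linalg.norm(pts0 - pts0_back, axis=2).ravel()
    good = (st1.ravel() == 1) & (st2.ravel() == 1) & (fb_err < max_fb_error)
    return pts0[good].reshape(-1, 2), pts1[good].reshape(-1, 2)

# Toy usage with synthetic images: shift a random texture by 2 pixels to the right.
rng = np.random.default_rng(0)
img0 = (rng.random((240, 320)) * 255).astype(np.uint8)
img1 = np.roll(img0, 2, axis=1)
corners = cv2.goodFeaturesToTrack(img0, maxCorners=200, qualityLevel=0.01, minDistance=7)
p0, p1 = forward_backward_track(img0, img1, corners.reshape(-1, 2))
print(len(p0), "consistent tracks; mean flow:", (p1 - p0).mean(axis=0))
```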
Figures: (1) Algorithm framework of NGLSFusion; (2) Algorithm framework of the VO module; (3) Optical flow tracking (forward tracking, reverse tracking, pixel-distance check, removal of mismatched points); (4) Framework of the optimized semantic segmentation; (5) Framework of semantic map construction; (6) Trajectory comparison on the Handheld Camera dataset; (7) Trajectory comparison on the Robot and Dynamic Objects datasets; (8) Global semantic maps for the small office scene; (9) Global semantic map of the large laboratory scene; (10) GPU occupancy; (11) Global texture map for the small office scene; (12) Global texture maps for the large laboratory scene.
19 pages, 9929 KiB  
Article
Boundary–Inner Disentanglement Enhanced Learning for Point Cloud Semantic Segmentation
by Lixia He, Jiangfeng She, Qiang Zhao, Xiang Wen and Yuzheng Guan
Appl. Sci. 2023, 13(6), 4053; https://doi.org/10.3390/app13064053 - 22 Mar 2023
Cited by 2 | Viewed by 1537
Abstract
In a point cloud semantic segmentation task, misclassification usually appears on the semantic boundary. A few studies have taken the boundary into consideration, but they relied on complex modules for explicit boundary prediction, which greatly increased model complexity. It is challenging to improve the segmentation accuracy of points on the boundary without dependence on additional modules. For every boundary point, this paper divides its neighboring points into different collections, and then measures its entanglement with each collection. A comparison of the measurement results before and after utilizing boundary information in the semantic segmentation network showed that the boundary could enhance the disentanglement between the boundary point and its neighboring points in inner areas, thereby greatly improving the overall accuracy. Therefore, to improve the semantic segmentation accuracy of boundary points, a Boundary–Inner Disentanglement Enhanced Learning (BIDEL) framework with no need for additional modules and learning parameters is proposed, which can maximize feature distinction between the boundary point and its neighboring points in inner areas through a newly defined boundary loss function. Experiments with two classic baselines across three challenging datasets demonstrate the benefits of BIDEL for the semantic boundary. As a general framework, BIDEL can be easily adopted in many existing semantic segmentation networks.
(This article belongs to the Special Issue 3D Scene Understanding and Object Recognition)
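BIDEL's boundary loss is defined in the paper itself; as a loose illustration of the kind of quantity such a loss acts on, the sketch below measures the average cosine similarity ("entanglement") between each boundary point's feature and the features of its non-boundary neighbors, on randomly generated data. Minimizing a term like this would push boundary features away from inner-area features, which is the disentanglement effect the abstract describes; the exact formulation here is an assumption, not the paper's loss.

```python
import numpy as np

def boundary_entanglement(feats, is_boundary, neighbor_idx):
    """Mean cosine similarity between each boundary point's feature and the
    features of its inner-area (non-boundary) neighbors."""
    f = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    sims = []
    for i in np.flatnonzero(is_boundary):
        inner = [j for j in neighbor_idx[i] if not is_boundary[j]]
        if inner:
            sims.append(float((f[i] @ f[inner].T).mean()))
    return float(np.mean(sims)) if sims else 0.0

# Toy usage: 200 points with random coordinates, 32-D features, an 8-NN graph,
# and roughly 10% of the points marked as semantic-boundary points.
rng = np.random.default_rng(0)
xyz = rng.standard_normal((200, 3))
feats = rng.standard_normal((200, 32))
d = np.linalg.norm(xyz[:, None] - xyz[None], axis=-1)
neighbor_idx = np.argsort(d, axis=1)[:, 1:9]
is_boundary = rng.random(200) < 0.1
print(boundary_entanglement(feats, is_boundary, neighbor_idx))
```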
Figures: (1) Boundaries generated from ground truth on S3DIS scenes; (2) Entanglement between boundary points and their neighboring collections in the control and experimental groups; (3) Detailed illustration of the Boundary–Inner Disentanglement Enhanced Learning (BIDEL) framework; (4) Overall architecture of the segmentation network embedded with BIDEL; (5) Results on S3DIS Area 5 with BIDEL applied to KPConv; (6) Results on S3DIS Area 5 with BIDEL applied to RandLA-Net; (7) Mislabeled regions in the Toronto-3D ground truth; (8) Qualitative results on the Toronto-3D L002 dataset; (9) Results on the Semantic3D reduced-8 dataset.
27 pages, 4579 KiB  
Article
An Accurate, Efficient, and Stable Perspective-n-Point Algorithm in 3D Space
by Rui Qiao, Guili Xu, Ping Wang, Yuehua Cheng and Wende Dong
Appl. Sci. 2023, 13(2), 1111; https://doi.org/10.3390/app13021111 - 13 Jan 2023
Cited by 1 | Viewed by 3270
Abstract
The Perspective-n-Point problem is usually addressed by means of a projective imaging model of 3D points, but the spatial distribution and quantity of 3D reference points vary, making it difficult for a Perspective-n-Point algorithm to balance accuracy, robustness, and computational efficiency. To address this issue, this paper introduces Hidden PnP, a hidden variable method. After parameterizing the rotation matrix with CGR parameters, the method, unlike the best existing matrix synthesis technique (Gröbner basis technology), does not require the construction of a large matrix elimination template in the polynomial solution phase. It can therefore solve for the CGR parameters rapidly and then locate the solution accurately using the Gauss–Newton method. On synthetic data, the hidden-variable PnP solution outperforms the best existing Perspective-n-Point methods in accuracy and robustness in the ordinary 3D, planar, and quasi-singular cases. Furthermore, its computational efficiency can be up to seven times that of existing leading algorithms when the number of spatially redundant reference points increases to 500. In physical experiments on pose reprojection with a monocular camera, the algorithm also showed higher accuracy than the best existing algorithm.
(This article belongs to the Special Issue 3D Scene Understanding and Object Recognition)
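The method parameterizes rotation with CGR (Cayley–Gibbs–Rodrigues) parameters and refines the pose with Gauss–Newton. The sketch below shows the standard CGR-to-rotation conversion, R = ((1 − sᵀs)·I + 2·s·sᵀ + 2·[s]×) / (1 + sᵀs), plus the reprojection residuals a Gauss–Newton solver would minimize; the hidden-variable polynomial elimination that distinguishes the paper's solver is not reproduced here, and the example points are arbitrary.

```python
import numpy as np

def cgr_to_rotation(s):
    """Rotation matrix from a CGR (Cayley-Gibbs-Rodrigues) vector s = tan(theta/2) * axis."""
    s = np.asarray(s, float)
    ss = s @ s
    skew = np.array([[0.0, -s[2], s[1]],
                     [s[2], 0.0, -s[0]],
                     [-s[1], s[0], 0.0]])
    return ((1.0 - ss) * np.eye(3) + 2.0 * np.outer(s, s) + 2.0 * skew) / (1.0 + ss)

def reprojection_residuals(s, t, pts3d, pts2d_norm):
    """Residuals between normalized image points and projected 3D points for a pose (s, t)."""
    pc = pts3d @ cgr_to_rotation(s).T + t           # points in the camera frame
    proj = pc[:, :2] / pc[:, 2:3]                   # pinhole projection (normalized coordinates)
    return (proj - pts2d_norm).ravel()

# Sanity check: a 90-degree rotation about the z-axis, i.e. s = tan(45 deg) * [0, 0, 1].
R = cgr_to_rotation([0.0, 0.0, np.tan(np.pi / 4)])
print(np.round(R, 3))                       # ~ [[0,-1,0],[1,0,0],[0,0,1]]
print(np.allclose(R @ R.T, np.eye(3)))      # orthogonality check

# Residuals vanish for the exact pose (identity rotation, zero translation here).
pts3d = np.array([[0.0, 0.0, 4.0], [1.0, -0.5, 6.0], [-1.0, 1.0, 5.0], [0.5, 0.5, 7.0]])
pts2d = pts3d[:, :2] / pts3d[:, 2:3]
print(np.abs(reprojection_residuals(np.zeros(3), np.zeros(3), pts3d, pts2d)).max())
```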
Figures: (1) Schematic of the Gröbner generation framework; (2) Coordinate system of the camera imaging model; (3–5) Error comparisons in the ordinary 3D, planar, and quasi-singular cases; (6–8) Further error comparisons in the ordinary 3D, planar, and quasi-singular cases; (9) Average running time of PnP algorithms for 4 to 500 reference points; (10) Rmoncam G180 camera and high-precision checkerboard calibration board; (11) Flow chart of the physical reprojection experiment; (12) Extraction range of corner points on the calibration board; (13) Ten calibration-board images at different distances and attitudes with their reprojection results.
15 pages, 1736 KiB  
Article
LUMDE: Light-Weight Unsupervised Monocular Depth Estimation via Knowledge Distillation
by Wenze Hu, Xue Dong, Ning Liu and Yuanfeng Chen
Appl. Sci. 2022, 12(24), 12593; https://doi.org/10.3390/app122412593 - 8 Dec 2022
Cited by 2 | Viewed by 2118
Abstract
Unsupervised monocular depth estimation networks have progressed rapidly in recent years, as they avoid the need for ground truth data and monocular cameras are readily available in most autonomous devices. Although some effective monocular depth estimation networks have been reported previously, such as Monodepth2 and SC-SfMLearner, most of these approaches are still computationally expensive for lightweight devices. Therefore, in this paper, we introduce a knowledge-distillation-based approach named LUMDE for pixel-by-pixel unsupervised monocular depth estimation. Specifically, we use a teacher network and a lightweight student network to distill the depth information, and further integrate a pose network into the student module to improve depth performance. Moreover, following the idea of the Generative Adversarial Network (GAN), the outputs of the student and teacher networks are taken as fake and real samples, respectively, and a Transformer is introduced as the GAN discriminator to further improve the depth predictions. The proposed LUMDE method achieves state-of-the-art (SOTA) results in the knowledge distillation of unsupervised depth estimation and also outperforms some dense networks. Compared with the teacher network, LUMDE loses only 2.6% in δ1 accuracy on the NYUD-V2 dataset while reducing computational complexity by 95.2%.
(This article belongs to the Special Issue 3D Scene Understanding and Object Recognition)
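The abstract reports results in terms of δ1 accuracy, the standard depth-estimation metric counting pixels whose predicted-to-true depth ratio (or its inverse) stays below 1.25. The snippet below computes it on synthetic depth maps; the data are made up for illustration and are not from the paper.

```python
import numpy as np

def delta1_accuracy(pred, gt, threshold=1.25):
    """Fraction of pixels whose ratio max(pred/gt, gt/pred) is below the threshold."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    valid = gt > 0                       # ignore pixels without ground-truth depth
    ratio = np.maximum(pred[valid] / gt[valid], gt[valid] / pred[valid])
    return float((ratio < threshold).mean())

# Toy usage with a synthetic depth map perturbed by 10% multiplicative noise.
rng = np.random.default_rng(0)
gt = rng.uniform(0.5, 10.0, size=(480, 640))
pred = gt * rng.uniform(0.9, 1.1, size=gt.shape)
print(delta1_accuracy(pred, gt))   # ~1.0, since all ratios stay below 1.25
```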
Figures: (1) Overview of the LUMDE architecture (knowledge distillation with pixel-wise, pair-wise, and holistic losses; PoseNet incorporated into the student network; Transformer discriminator); (2) Qualitative comparison on indoor scenes; (3) Qualitative comparison on outdoor traffic scenes; (4) Inference efficiency on an i5 CPU and an Nvidia GeForce MX330 GPU.