
 
 

Topic Editors

Prof. Dr. Junxing Zheng
School of Civil and Hydraulic Engineering, Huazhong University of Science and Technology, Wuhan, China
Dr. Peng Cao
Associate Professor, Faculty of Architecture, Civil and Transportation Engineering, Beijing University of Technology, Beijing 100084, China

3D Computer Vision and Smart Building and City, 2nd Volume

Abstract submission deadline
31 October 2024
Manuscript submission deadline
31 December 2024
Viewed by 19,127

Topic Information

Dear Colleagues,

This Topic is a continuation of the previous successful Topic, "3D Computer Vision and Smart Building and City" (https://www.mdpi.com/topics/3D_BIM). Three-dimensional computer vision is an interdisciplinary subject involving computer vision, computer graphics, artificial intelligence and other fields. Its main contents include 3D perception, 3D understanding and 3D modeling. In recent years, 3D computer vision technology has developed rapidly and has been widely used in unmanned aerial vehicles, robots, autonomous driving, AR, VR and other fields. Smart buildings and cities use various information technologies and innovative concepts to connect various systems and services so as to improve the efficiency of resource utilization, optimize management and services, and improve quality of life. Smart buildings and cities can involve frontier techniques such as 3D computer vision for building information models, digital twins, city information models, and simultaneous localization and mapping (SLAM) robots. The application of 3D computer vision in smart buildings and cities is a valuable research direction, but it still faces many major challenges. This Topic focuses on the theory and technology of 3D computer vision in smart buildings and cities. We welcome papers that provide innovative technologies, theories or case studies in the relevant field.

Prof. Dr. Junxing Zheng
Dr. Peng Cao
Topic Editors

Keywords

  • smart buildings and cities
  • 3D computer vision
  • SLAM
  • building information model
  • city information model
  • robots

Participating Journals

Journal (abbreviation): Impact Factor / CiteScore / Launched / First Decision (median) / APC

  • Buildings (buildings): 3.1 / 3.4 / 2011 / 17.2 days / CHF 2600
  • Drones (drones): 4.4 / 5.6 / 2017 / 21.7 days / CHF 2600
  • Energies (energies): 3.0 / 6.2 / 2008 / 17.5 days / CHF 2600
  • Sensors (sensors): 3.4 / 7.3 / 2001 / 16.8 days / CHF 2600
  • Sustainability (sustainability): 3.3 / 6.8 / 2009 / 20 days / CHF 2400
  • ISPRS International Journal of Geo-Information (ijgi): 2.8 / 6.9 / 2012 / 36.2 days / CHF 1700

Preprints.org is a multidisciplinary platform providing a preprint service dedicated to sharing your research from the start and empowering your research journey.

MDPI Topics cooperates with Preprints.org and has built a direct connection between MDPI journals and Preprints.org. Authors are encouraged to post a preprint at Preprints.org prior to publication in order to:

  1. Immediately share your ideas ahead of publication and establish your research priority;
  2. Protect the priority of your idea with a time-stamped preprint record;
  3. Enhance the exposure and impact of your research;
  4. Receive feedback from your peers in advance;
  5. Have it indexed in Web of Science (Preprint Citation Index), Google Scholar, Crossref, SHARE, PrePubMed, Scilit and Europe PMC.

Published Papers (16 papers)

20 pages, 4137 KiB  
Article
A Minimal Solution Estimating the Position of Cameras with Unknown Focal Length with IMU Assistance
by Kang Yan, Zhenbao Yu, Chengfang Song, Hongping Zhang and Dezhong Chen
Drones 2024, 8(9), 423; https://doi.org/10.3390/drones8090423 - 24 Aug 2024
Viewed by 401
Abstract
Drones are typically built with integrated cameras and inertial measurement units (IMUs). It is crucial to achieve drone attitude control through relative pose estimation using cameras. IMU drift can be ignored over short periods. Based on this premise, in this paper, four methods are proposed for estimating relative pose and focal length across various application scenarios: for scenarios where the camera’s focal length varies between adjacent moments and is unknown, the relative pose and focal length can be computed from four-point correspondences; for planar motion scenarios where the camera’s focal length varies between adjacent moments and is unknown, the relative pose and focal length can be determined from three-point correspondences; for instances of planar motion where the camera’s focal length is equal between adjacent moments and is unknown, the relative pose and focal length can be calculated from two-point correspondences; finally, for scenarios where multiple cameras are employed for image acquisition but only one is calibrated, a method proposed for estimating the pose and focal length of uncalibrated cameras can be used. The numerical stability and performance of these methods are compared and analyzed under various noise conditions using simulated datasets. We also assessed the performance of these methods on real datasets captured by a drone in various scenes. The experimental results demonstrate that the method proposed in this paper achieves superior accuracy and stability to classical methods.
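With the relative rotation supplied by an IMU, the epipolar constraint becomes linear in the translation. The sketch below illustrates that idea only; it is not the paper's 4-, 3-, or 2-point solvers, which additionally recover the unknown focal length.

```python
# Minimal sketch (not the authors' solvers): with the relative rotation R taken
# from the IMU, each normalized correspondence (x1, x2) contributes one linear
# constraint (R @ x1 x x2) . t = 0, so the translation direction t follows from
# the null space of a small matrix.
import numpy as np

def translation_from_known_rotation(x1, x2, R):
    """x1, x2: Nx3 normalized image rays; R: 3x3 relative rotation from the IMU."""
    A = np.cross((R @ x1.T).T, x2)      # row i is (R @ x1_i) x x2_i
    _, _, Vt = np.linalg.svd(A)         # null-space vector = translation direction
    t = Vt[-1]
    return t / np.linalg.norm(t)
```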
Figures: epipolar-geometry schematic with camera centers, epipoles, and epipolar lines (Figure 1); probability densities of focal-length and translation errors over 10,000 random problem instances (Figures 2 and 3); error curves of focal length f and translation vector t versus pixel-coordinate scale errors and IMU rotation-angle noise (Figures 4–6); drone test images, SIFT feature extraction, and cumulative error distributions for the three scenes (Figures 7–9); 3D and 2D trajectory plots of the real data (Figures 10 and 11).
23 pages, 63398 KiB  
Article
Automatic Generation of Standard Nursing Unit Floor Plan in General Hospital Based on Stable Diffusion
by Zhuo Han and Yongquan Chen
Buildings 2024, 14(9), 2601; https://doi.org/10.3390/buildings14092601 - 23 Aug 2024
Viewed by 372
Abstract
This study focuses on the automatic generation of architectural floor plans for standard nursing units in general hospitals based on Stable Diffusion. It aims at assisting architects in efficiently generating a variety of preliminary plan preview schemes and enhancing the efficiency of the pre-planning stage of medical buildings. The workflow includes dataset processing, model training, model testing and generation, and it produces well-organized, clear, and readable functional block floor plans with strong generalization capabilities from the input boundaries of the nursing unit’s floor plan. Quantitative analysis demonstrated that 82% of the generated samples met the evaluation criteria for standard nursing units. Additionally, a comparative experiment was conducted using the same dataset to train a deep learning model based on Generative Adversarial Networks (GANs). The conclusion describes the strengths and limitations of the methodology, pointing out directions for improvement by future studies.
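For readers unfamiliar with the toolchain, the hedged sketch below shows a generic Stable Diffusion image-to-image plus ControlNet call with the diffusers library; the model IDs, prompt, and hyperparameters are illustrative placeholders, not the fine-tuned models or settings used in this study.

```python
# Illustrative only: boundary-conditioned floor-plan generation with Stable
# Diffusion img2img + ControlNet via diffusers. All model names, the prompt,
# and the hyperparameters are assumptions, not the paper's trained models.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")

boundary = load_image("nursing_unit_boundary.png")   # hypothetical input drawing
plan = pipe(
    prompt="standard nursing unit floor plan, functional color blocks",
    image=boundary,                    # image-to-image initialization
    control_image=boundary,            # ControlNet conditioning (edge-like input)
    strength=0.75,                     # denoising strength
    num_inference_steps=30,
    controlnet_conditioning_scale=1.0,
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]
plan.save("generated_plan.png")
```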
Figures: Stable Diffusion (Latent Diffusion Model) and LoRA architectures (Figure 1); methodological framework and the main/sub-corridor nursing-unit floor-plan dataset (Figures 2 and 3); training loss and the testing and generation framework (Figures 4 and 5); studies of sampling steps, denoising strength, seeds, and main-corridor inputs (Figures 6–8); ControlNet preprocessors, weight, and guidance settings (Figures 9–12); image-to-image with ControlNet, parameter-controlled boundary generation, and area feature distribution (Figures 13–15).
20 pages, 8685 KiB  
Article
Numerical Simulation and Field Monitoring of Blasting Vibration for Tunnel In-Situ Expansion by a Non-Cut Blast Scheme
by Zhenchang Guan, Lifu Xie, Dong Chen and Jingkang Shi
Sensors 2024, 24(14), 4546; https://doi.org/10.3390/s24144546 - 13 Jul 2024
Viewed by 812
Abstract
In-situ tunnel expansion projects have become increasingly common due to the growing demand for transportation. The traditional blast scheme requires a large quantity of explosive, and the vibration effect is hard to control. In order to reduce explosive consumption and the vibration effect, an optimized non-cut blast scheme was proposed and applied to the in-situ expansion of the Gushan Tunnel. Refined numerical simulation was adopted to compare the traditional and optimized blast schemes. The vibration attenuation within the interlaid rock mass and the vibration effect on the adjacent tunnel were studied and compared. The simulation results were validated by field monitoring of the vibration effect on the adjacent tunnel. Both the simulation and the monitoring results showed that the vibration velocity on the adjacent tunnel’s back side was much smaller than its counterpart on the blast side, i.e., the presence of the cavity reduced the blasting vibration effect significantly. The optimized non-cut blast scheme, which effectively utilized the existing free surface, could reduce explosive consumption and the vibration effect significantly, and may be preferred for in-situ tunnel expansion projects.
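The paper relies on refined finite-element simulation and field monitoring; as a rough point of reference only, peak particle velocity is often first estimated with a Sadovsky-type empirical attenuation relation, sketched below with assumed site constants.

```python
# Not the paper's numerical model: a Sadovsky-type empirical attenuation law,
# v = K * (Q^(1/3) / R)^alpha, commonly used for quick peak-particle-velocity
# (PPV) checks. K and alpha are site-specific constants assumed here.
def ppv_sadovsky(charge_kg, distance_m, K=150.0, alpha=1.6):
    """Approximate PPV (cm/s) for a charge of Q kg detonated at distance R m."""
    return K * (charge_kg ** (1.0 / 3.0) / distance_m) ** alpha

for R in (10, 20, 40):
    print(f"R = {R:3d} m -> PPV ~ {ppv_sadovsky(24.0, R):.2f} cm/s")
```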
Figures: engineering practice, tunnel cross-sections before and after expansion, and excavation sequence (Figures 1–3); traditional and non-cut blast schemes with their equivalent blasting loads and implementation (Figures 4–11); numerical monitoring points, velocity–time histories, frequency spectra, and maximum velocities on the adjacent tunnel and within the interlaid rock mass for both schemes (Figures 12–18); the expanded NK18+110 section, field monitoring setup, vibration meter arrangement, and comparison of monitored and simulated vibrations (Figures 19–22).
21 pages, 3782 KiB  
Article
Globally Optimal Relative Pose and Scale Estimation from Only Image Correspondences with Known Vertical Direction
by Zhenbao Yu, Shirong Ye, Changwei Liu, Ronghe Jin, Pengfei Xia and Kang Yan
ISPRS Int. J. Geo-Inf. 2024, 13(7), 246; https://doi.org/10.3390/ijgi13070246 - 9 Jul 2024
Viewed by 605
Abstract
Installing multi-camera systems and inertial measurement units (IMUs) in self-driving cars, micro aerial vehicles, and robots is becoming increasingly common. An IMU provides the vertical direction, allowing coordinate frames to be aligned in a common direction. The degrees of freedom (DOFs) of the rotation matrix are reduced from 3 to 1. In this paper, we propose a globally optimal solver to calculate the relative poses and scale of generalized cameras with a known vertical direction. First, the cost function is established to minimize algebraic error in the least-squares sense. Then, the cost function is transformed into two polynomials with only two unknowns. Finally, the eigenvalue method is used to solve the relative rotation angle. The performance of the proposed method is verified on both simulated and KITTI datasets. Experiments show that our method is more accurate than the existing state-of-the-art solver in estimating the relative pose and scale. Compared to the best method among the comparison methods, the method proposed in this paper reduces the rotation matrix error, translation vector error, and scale error by 53%, 67%, and 90%, respectively.
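The key simplification is that a known vertical (gravity) direction lets each frame be pre-rotated so that only a yaw angle remains unknown. The sketch below shows that alignment step under stated assumptions; the paper's globally optimal polynomial solver itself is not reproduced.

```python
# Minimal sketch of the vertical alignment step, not the globally optimal solver:
# rotate a frame so its measured gravity vector maps to the world z-axis, leaving
# a single unknown rotation angle about z (plus translation and scale).
import numpy as np

def align_to_vertical(gravity):
    """Return R such that R @ gravity is parallel to [0, 0, 1]."""
    g = gravity / np.linalg.norm(gravity)
    z = np.array([0.0, 0.0, 1.0])
    v = np.cross(g, z)
    s, c = np.linalg.norm(v), float(np.dot(g, z))
    if s < 1e-12:                                   # already (anti-)parallel
        return np.eye(3) if c > 0 else np.diag([1.0, -1.0, -1.0])
    vx = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    return np.eye(3) + vx + vx @ vx * ((1 - c) / s ** 2)   # Rodrigues formula
```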
Figures: geometry of the generalized camera setup and the frame alignment with rotation R, translation t, and scale s (Figures 1 and 2); algorithm flow chart (Figure 3); effect of the number of feature points on rotation, translation, and scale accuracy (Figure 4); estimation errors under random, planar, sideways, and forward motion with image, pitch, and roll noise (Figures 5–8); KITTI test image pair with feature detection (Figure 9).
29 pages, 28612 KiB  
Article
Synergistic Landscape Design Strategies to Renew Thermal Environment: A Case Study of a Cfa-Climate Urban Community in Central Komatsu City, Japan
by Jing Xiao, Takaya Yuizono and Ruixuan Li
Sustainability 2024, 16(13), 5582; https://doi.org/10.3390/su16135582 - 29 Jun 2024
Viewed by 778
Abstract
An effective community landscape design consistently impacts thermally comfortable outdoor conditions and climate adaptation. Therefore, constructing sustainable communities requires a resilience assessment of existing built environments for optimal design mechanisms, especially the renewal of thermally resilient communities in densely populated cities. However, current community renewal typically involves only green space design and lacks synergistic landscape design for the central community. The main contribution of this study is that it reveals a three-level optimization method to validate the Synergistic Landscape Design Strategies (SLDS) (i.e., planting, green building envelope, water body, and urban trees) for renewing urban communities. A typical Japanese community in central Komatsu City was selected to illustrate the simulation-based design strategies. The microclimate model ENVI-met reproduces the communities in 38 case implementations to evaluate the physiologically equivalent temperature (PET) and microclimate conditions as measures of the thermal environment in humid subtropical climates. The simulation results indicated that the single-family buildings and real estate flats were suited to the summer thermal mitigation strategy of water bodies and green roofs (W). In the small-scale (large-scale) models, the mean PET was lowered by 1.4–5.0 °C (0.9–2.3 °C), and the cooling effect reduced the mean air temperature by 0.4–2.3 °C (0.5–0.8 °C) and improved humidification by 3.7–15.2% (3.7–5.3%). The successful SLDS provides precise alternatives for realizing the Sustainable Development Goals (SDGs) in the renewal of urban communities.
Graphical abstract. Figures: simulation methodology and local sample communities (HC, AC, BC) (Figures 1 and 2); Cfa climate context, building forms in ENVI-met, and ArcGIS surface/vegetation/UHI analysis of Komatsu City (Figures 3–5); SLDS design cases at small and large scales (Figures 6–8); validation of modeled air temperature and relative humidity (Figure 9); simulated microclimate variations and PET distributions under planting, green building envelope, and water-body/green-roof strategies (Figures 10–14).
14 pages, 3735 KiB  
Article
Learning Effective Geometry Representation from Videos for Self-Supervised Monocular Depth Estimation
by Hailiang Zhao, Yongyi Kong, Chonghao Zhang, Haoji Zhang and Jiansen Zhao
ISPRS Int. J. Geo-Inf. 2024, 13(6), 193; https://doi.org/10.3390/ijgi13060193 - 11 Jun 2024
Viewed by 802
Abstract
Recent studies on self-supervised monocular depth estimation have achieved promising results, which are mainly based on the joint optimization of depth and pose estimation via high-level photometric loss. However, how to learn the latent and beneficial task-specific geometry representation from videos is still far from being explored. To tackle this issue, we propose two novel schemes to learn more effective representation from monocular videos: (i) an Inter-task Attention Model (IAM) to learn the geometric correlation representation between the depth and pose learning networks to make structure and motion information mutually beneficial; (ii) a Spatial-Temporal Memory Module (STMM) to exploit long-range geometric context representation among consecutive frames both spatially and temporally. Systematic ablation studies are conducted to demonstrate the effectiveness of each component. Evaluations on KITTI show that our method outperforms current state-of-the-art techniques.
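The photometric loss the abstract refers to is the standard view-synthesis signal: predicted depth and relative pose warp a neighboring frame into the target view, and the warping error supervises both networks. A bare-bones sketch of that baseline (without the IAM or STMM modules) follows, assuming tensors shaped (B,3,H,W) for images, (B,1,H,W) for depth, (B,4,4) for the relative pose, and a 3x3 intrinsics matrix K.

```python
# Generic photometric reprojection loss for self-supervised depth (sketch only;
# the paper's IAM/STMM modules are not reproduced here).
import torch
import torch.nn.functional as F

def photometric_loss(target, source, depth, T, K):
    """target, source: (B,3,H,W); depth: (B,1,H,W); T: (B,4,4) target->source; K: (3,3)."""
    B, _, H, W = target.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).float().view(3, -1)  # (3, HW)
    cam = torch.linalg.inv(K) @ pix                         # back-projected rays
    pts = depth.view(B, 1, -1) * cam.unsqueeze(0)           # 3D points, (B,3,HW)
    pts = T[:, :3, :3] @ pts + T[:, :3, 3:]                 # move into source frame
    proj = K.unsqueeze(0) @ pts
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)          # perspective division
    u = 2 * uv[:, 0] / (W - 1) - 1                          # normalize for grid_sample
    v = 2 * uv[:, 1] / (H - 1) - 1
    grid = torch.stack([u, v], -1).view(B, H, W, 2)
    warped = F.grid_sample(source, grid, align_corners=True)
    return (target - warped).abs().mean()                   # L1 photometric error
```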
Figures: comparison of the general self-supervised pipeline with the proposed scheme (Figure 1); network framework and the IAM and STMM architectures (Figure 2); qualitative depth results on KITTI and Cityscapes (Figures 3 and 4); learned attention maps in the IAM (Figure 5); visual odometry trajectories plotted with the Evo tool (Figure 6).
16 pages, 7679 KiB  
Article
A 3D Parameterized BIM-Modeling Method for Complex Engineering Structures in Building Construction Projects
by Lijun Yang, Xuexiang Gao, Song Chen, Qianyao Li and Shuo Bai
Buildings 2024, 14(6), 1752; https://doi.org/10.3390/buildings14061752 - 11 Jun 2024
Viewed by 777
Abstract
The structural components of large-scale public construction projects are more complex than those of ordinary residential buildings, with irregular and diverse components as well as a large number of repetitive structural elements, which increase the difficulty of BIM-modeling operations. Additionally, a significant amount of parameter information is inherent in the construction process, which places higher demands on the application and management capabilities of BIM technology. However, current BIM software still has deficiencies in the parameterization of complex and irregular structural components, fine modeling, and project management information. To address these issues, this paper takes Grasshopper as the core parametric tool and Revit as the carrier of component attribute information. It investigates the parametric modeling logic of Grasshopper and combines the concepts of parameterization, modularization, standardization, and engineering practicality to create a series of parametric programs for complex structural components in building projects. This approach mainly addresses intricate challenges pertaining to parametric structural shapes (including batch processing) and parametric structural attributes (including the batch processing of diverse attribute parameters), thereby ensuring efficient BIM modeling throughout the design and construction phases of complex building projects.
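The essence of the Grasshopper programs is that geometry is driven by a small set of dimensions, so whole families of components can be regenerated in batch. The plain-Python sketch below illustrates that parametric idea with a hypothetical cantilever retaining-wall section; it is not Grasshopper or Revit API code.

```python
# Plain-Python illustration of the parametric idea (not Grasshopper/Revit code):
# a cross-section outline is generated from a few driving dimensions, so changing
# parameters regenerates a whole family of sections in batch.
def retaining_wall_section(stem_height, stem_top, stem_bottom,
                           footing_width, footing_depth):
    """Return the 2D outline (x, y) of a simple cantilever retaining wall."""
    return [
        (0.0, 0.0),
        (footing_width, 0.0),
        (footing_width, footing_depth),
        ((footing_width + stem_bottom) / 2.0, footing_depth),
        ((footing_width + stem_top) / 2.0, footing_depth + stem_height),
        ((footing_width - stem_top) / 2.0, footing_depth + stem_height),
        ((footing_width - stem_bottom) / 2.0, footing_depth),
        (0.0, footing_depth),
    ]

# Batch generation: vary one driving parameter to get a family of wall sections.
sections = [retaining_wall_section(3.0 + 0.5 * i, 0.3, 0.5, 2.5, 0.6) for i in range(4)]
```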
Figures: BIM parameterized digital graphic representation (Figure 1); Grasshopper vectors, data structures, list-matching modes, point/line/surface formation, and component movement methods (Figures 2–6); retaining-wall types and their GH parameterization, structural positioning, and parameter settings (Figures 7–10); parameterized staircase structure, node connections, and projection-line-based drawing (Figures 11–13).
20 pages, 13136 KiB  
Article
DSOMF: A Dynamic Environment Simultaneous Localization and Mapping Technique Based on Machine Learning
by Shengzhe Yue, Zhengjie Wang and Xiaoning Zhang
Sensors 2024, 24(10), 3063; https://doi.org/10.3390/s24103063 - 11 May 2024
Viewed by 747
Abstract
To address the challenges of reduced localization accuracy and incomplete map construction exhibited by classical semantic simultaneous localization and mapping (SLAM) algorithms in dynamic environments, this study introduces a dynamic-scene SLAM technique that builds upon direct sparse odometry (DSO) and incorporates instance segmentation and video completion algorithms. While prioritizing the algorithm’s real-time performance, we leverage the rapid matching capabilities of DSO to link identical dynamic objects in consecutive frames. This association is achieved by merging semantic and geometric data, thereby enhancing the matching accuracy during image tracking through the inclusion of semantic probability. Furthermore, we incorporate a loop closure module based on video inpainting algorithms into our mapping thread. This allows our algorithm to rely on the completed static background for loop closure detection, further enhancing localization accuracy. The efficacy of this approach is validated using the TUM and KITTI public datasets and an unmanned platform experiment. Experimental results show that, in various dynamic scenes, our method achieves an improvement exceeding 85% in localization accuracy compared with the DSO system.
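One common ingredient of such pipelines is simply discarding image points that fall on segmented dynamic objects before tracking. The sketch below shows that filtering step under assumed mask and label formats; it is not the full DSOMF system.

```python
# Sketch of dynamic-point filtering (not the full DSOMF pipeline): drop candidate
# points that land inside instance masks of potentially dynamic classes, so pose
# estimation relies on the static background.
import numpy as np

DYNAMIC_CLASSES = {"person", "car", "bicycle"}   # assumed label names

def filter_static_points(points_uv, masks, labels):
    """points_uv: Nx2 pixel coords; masks: list of HxW bool arrays; labels: one class name per mask."""
    if not masks:
        return points_uv
    keep = np.ones(len(points_uv), dtype=bool)
    u = np.clip(points_uv[:, 0].astype(int), 0, masks[0].shape[1] - 1)
    v = np.clip(points_uv[:, 1].astype(int), 0, masks[0].shape[0] - 1)
    for mask, label in zip(masks, labels):
        if label in DYNAMIC_CLASSES:
            keep &= ~mask[v, u]          # discard points covered by a dynamic mask
    return points_uv[keep]
```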
Figures: algorithm framework and image-processing workflow (Figures 1 and 2); dynamic-object instance segmentation, optical-flow tracking of regional centroids, and delineation of dynamic and static regions (Figures 3–5); mapping before and after dynamic-object elimination and the FGVC optical-flow completion process (Figures 6 and 7); absolute trajectory error on TUM and video completion and mapping on KITTI-04 (Figures 8–10); unmanned aerial and ground platform tests, loop-closure detection experiments, outdoor trajectory comparisons, and environment images (Figures 11–16).
15 pages, 2894 KiB  
Article
Phase Error Reduction for a Structured-Light 3D System Based on a Texture-Modulated Reprojection Method
by Chenbo Shi, Zheng Qin, Xiaowei Hu, Changsheng Zhu, Yuanzheng Mo, Zelong Li, Shaojia Yan, Yue Yu, Xiangteng Zang and Chun Zhang
Sensors 2024, 24(7), 2075; https://doi.org/10.3390/s24072075 - 24 Mar 2024
Viewed by 995
Abstract
Fringe projection profilometry (FPP), with benefits such as high precision and a large depth of field, is a popular 3D optical measurement method widely used in precision reconstruction scenarios. However, the pixel brightness at reflective edges does not satisfy the conditions of the ideal pixel-wise phase-shifting model due to the influence of scene texture and system defocus, resulting in severe phase errors. To address this problem, we theoretically analyze the non-pixel-wise phase propagation model for texture edges and propose a reprojection strategy based on scene texture modulation. The strategy first obtains the reprojection weight mask by projecting typical FPP patterns and calculating the scene texture reflection ratio, then reprojects stripe patterns modulated by the weight mask to eliminate texture edge effects, and finally fuses coarse and refined phase maps to generate an accurate phase map. We validated the proposed method on various texture scenes, including a smooth plane, depth surface, and curved surface. Experimental results show that the root mean square error (RMSE) of the phase at the texture edge decreased by 53.32%, proving the effectiveness of the reprojection strategy in eliminating depth errors at texture edges.
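The baseline that the reprojection strategy refines is ordinary N-step phase shifting, where the wrapped phase is recovered per pixel from N shifted fringe images. A sketch of that standard calculation follows; the texture-modulated reprojection itself is the paper's contribution and is not shown.

```python
# Standard N-step phase-shifting calculation (the pixel-wise baseline; the paper's
# texture-modulated reprojection builds on top of this).
import numpy as np

def wrapped_phase(images):
    """images: list of N fringe images I_n = A + B*cos(phi + 2*pi*n/N); returns wrapped phi."""
    N = len(images)
    n = np.arange(N).reshape(-1, 1, 1)
    stack = np.stack([np.asarray(img, dtype=float) for img in images], axis=0)
    num = np.sum(stack * np.sin(2 * np.pi * n / N), axis=0)
    den = np.sum(stack * np.cos(2 * np.pi * n / N), axis=0)
    return np.arctan2(-num, den)        # phase wrapped to (-pi, pi]
```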
Figures: measurement effect of the traditional FPP method, stripe-intensity capture, and the defocus/phase relationship (Figures 1–3); computational framework, modulation mask, and simulated phase-error analysis (Figures 4–6); structured-light 3D reconstruction platform and measurement objects (Figures 7 and 8); measurement comparisons for different depth differences, texture-only modulation, combined depth and texture modulation, texture widths, depths of field, and modulated light intensities (Figures 9–14).
21 pages, 25891 KiB  
Article
An Improved TransMVSNet Algorithm for Three-Dimensional Reconstruction in the Unmanned Aerial Vehicle Remote Sensing Domain
by Jiawei Teng, Haijiang Sun, Peixun Liu and Shan Jiang
Sensors 2024, 24(7), 2064; https://doi.org/10.3390/s24072064 - 23 Mar 2024
Viewed by 903
Abstract
It is important to achieve the 3D reconstruction of UAV remote sensing images in deep learning-based multi-view stereo (MVS) vision. The lack of obvious texture features and detailed edges in UAV remote sensing images leads to inaccurate feature point matching or depth estimation. To address this problem, this study improves the TransMVSNet algorithm in the field of 3D reconstruction by optimizing its feature extraction network and cost volume regularization network for depth prediction. The improvement is mainly achieved by extracting features with the Asymptotic Feature Pyramid Network (AFPN) and assigning weights to different levels of features through the ASFF module to increase the importance of key levels, and by using a UNet-structured network combined with an attention mechanism to predict the depth information, which also extracts the key area information. The aim is to improve the performance and accuracy of the TransMVSNet algorithm’s 3D reconstruction of UAV remote sensing images. In this work, we have performed comparative experiments and quantitative evaluation with other algorithms on the DTU dataset as well as on a large UAV remote sensing image dataset. Extensive experiments show that our improved TransMVSNet algorithm has better performance and robustness, providing a valuable reference for research and application in the field of 3D reconstruction of UAV remote sensing images.
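The ASFF idea mentioned above is to let the network learn per-pixel weights that decide how much each pyramid level contributes to the fused feature. A compact, generic version is sketched below; the exact AFPN/TransMVSNet wiring and layer sizes are not reproduced.

```python
# Generic adaptive feature fusion in the spirit of ASFF (sketch only): per-pixel
# softmax weights over pyramid levels decide each level's contribution.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveFusion(nn.Module):
    def __init__(self, channels, num_levels=3):
        super().__init__()
        self.weight_conv = nn.Conv2d(channels * num_levels, num_levels, kernel_size=1)

    def forward(self, feats):
        """feats: list of num_levels feature maps with identical channel counts."""
        size = feats[0].shape[-2:]
        feats = [F.interpolate(f, size=size, mode="bilinear", align_corners=False)
                 for f in feats]
        w = torch.softmax(self.weight_conv(torch.cat(feats, dim=1)), dim=1)  # (B, L, H, W)
        return sum(w[:, i:i + 1] * f for i, f in enumerate(feats))
```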
Figures: AFPN feature-extraction architecture, cost-volume regularization network (UBA and CCA modules), and fully connected structure (Figures 1–3); depth-prediction comparisons with the original algorithm on DTU scans 1, 4, 9, and 10 (Figures 4–7); UAV scenes from the Pix4D dataset and a self-built robotic-arm dataset with depth maps and 3D reconstructions (Figures 8–14); Lausanne scenes with depth predictions and 3D models (Figures 15–20); comparison with state-of-the-art deep learning MVS methods on DTU and quantitative plane experiments (Figures 21 and 22).
32 pages, 8391 KiB  
Article
Model-Based 3D Gaze Estimation Using a TOF Camera
by Kuanxin Shen, Yingshun Li, Zhannan Guo, Jintao Gao and Yingjian Wu
Sensors 2024, 24(4), 1070; https://doi.org/10.3390/s24041070 - 6 Feb 2024
Viewed by 1473
Abstract
Among the numerous gaze-estimation methods currently available, appearance-based methods predominantly use RGB images as input and employ convolutional neural networks (CNNs) to detect facial images to regressively obtain gaze angles or gaze points. Model-based methods require high-resolution images to obtain a clear eyeball [...] Read more.
Among the numerous gaze-estimation methods currently available, appearance-based methods predominantly use RGB images as input and employ convolutional neural networks (CNNs) to regress gaze angles or gaze points from detected facial images. Model-based methods require high-resolution images to obtain a clear geometric model of the eyeball. Both approaches face significant challenges in outdoor environments and practical application scenarios. This paper proposes a model-based gaze-estimation algorithm using a low-resolution 3D TOF camera. The study uses infrared images instead of RGB images as input to overcome the impact of varying environmental illumination on gaze estimation. We utilized a trained YOLOv8 neural network model to detect eye landmarks in the captured facial images and, combined with the depth map from the time-of-flight (TOF) camera, calculated the 3D coordinates of the canthus points of a single eye of the subject. Based on this, we fitted a 3D geometric model of the eyeball to determine the subject's gaze angle. Experimental validation showed that our method achieved root mean square errors of 6.03° and 4.83° in the horizontal and vertical directions, respectively, for the detection of the subject's gaze angle. We also tested the proposed method in a real car-driving environment, achieving stable driver gaze detection at various locations inside the car, such as the dashboard, driver mirror, and in-vehicle screen.
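As a rough illustration of the geometric step described in the abstract (not the authors' code), the sketch below fits a sphere to a few 3D scleral points by linear least squares to recover an eyeball center and radius, then derives a gaze vector from that center through a 3D pupil point; all point values and variable names are invented for the example.

```python
import numpy as np

def fit_sphere(points):
    """Least-squares sphere fit: returns (center, radius).

    Expands |p - c|^2 = r^2 into the linear system
    2*p.c + (r^2 - |c|^2) = |p|^2 and solves for c and the offset term.
    """
    p = np.asarray(points, dtype=float)
    A = np.hstack([2.0 * p, np.ones((len(p), 1))])
    b = np.sum(p ** 2, axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = sol[:3]
    radius = np.sqrt(sol[3] + center @ center)
    return center, radius

# Hypothetical 3D scleral points (mm, camera coordinates), e.g. back-projected
# from IR pixel coordinates using the TOF depth map; they lie approximately on
# a 12 mm sphere centered near (11, 4, 424).
scleral_points = np.array([
    [11.0, 4.0, 412.0], [16.0, 4.0, 413.1], [ 6.0, 4.0, 413.1],
    [11.0, 9.0, 413.1], [11.0, -1.0, 413.1], [15.0, 8.0, 413.4],
    [ 7.0, 0.0, 413.4], [17.0, 7.0, 414.1],
])
center, radius = fit_sphere(scleral_points)

# Gaze direction as the unit vector from the fitted eyeball center
# through a (hypothetical) 3D pupil point.
pupil_3d = np.array([12.0, 5.0, 412.1])
gaze = pupil_3d - center
gaze /= np.linalg.norm(gaze)

# Horizontal (yaw) and vertical (pitch) gaze angles in degrees;
# the sign conventions here are arbitrary.
yaw = np.degrees(np.arctan2(gaze[0], -gaze[2]))
pitch = np.degrees(np.arcsin(np.clip(gaze[1], -1.0, 1.0)))
print(center, radius, yaw, pitch)
```

In practice the scleral points should span a wide enough arc of the eyeball for the fit to be well conditioned, which is why the paper annotates points distributed across the sclera rather than the cornea.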
Figure 1. The overall process of the proposed model-based 3D gaze estimation using a TOF camera. The green arrow represents the subject's gaze direction.
Figure 2. The partial effectiveness of data augmentation.
Figure 3. The eye region and landmark detection model trained on the IRGD dataset using YOLOv8, showing the detection effect on the subject's gaze image (a). The landmark detection model outputs 7 target points for a single-eye image of the subject (b): 1: left eye corner point; 2: first upper eyelid point; 3: second upper eyelid point; 4: right eye corner point; 5: first lower eyelid point; 6: second lower eyelid point; 7: pupil point.
Figure 4. The subject maintained a head pose angle of 0° in both the horizontal and vertical directions and performed a series of coherent lizard movements. The green arrow indicates the ground-truth gaze direction, while the red arrow represents the final gaze direction obtained using the eyeball center calculation method proposed in [30]. As the subject's gaze angle gradually increased, the deviation between the gaze angle calculated by this eyeball center localization method and the ground-truth gaze angle began to increase. Table 1 shows the results of our calculations.
Figure 5. Eight marked points are manually annotated on the image of the subject's single eye. These points are randomly distributed on the sclera of the eye, not the cornea. We use these eight 3D coordinate points to fit the eyeball model and solve for the 3D coordinates of the eyeball center and the radius of the eyeball.
Figure 6. Eye detail images taken by the TOF camera at a distance of 200 mm–500 mm from the subject. The experiment is divided into two scenarios: the subject not wearing myopia glasses (top) and wearing glasses (bottom). The occlusion of glasses reduces some of the clarity and contrast of the subject's eyes, but this is much less than the impact of a longer distance. When the distance between the subject and the TOF camera exceeds 300 mm, the only observable details in the eye area are the corners of the eyes and the pupil points.
Figure 7. Creating a standard plane with multiple gaze points using a level's laser line (a) and fixing a TOF camera on the plane (b).
Figure 8. Sample pictures of the IRGD dataset proposed in this paper. We recorded gaze data at five different distances from the participant to the TOF camera, ranging from 200 mm to 600 mm. The TOF camera simultaneously collected IR images and depth images of the participant gazing at the gaze points on the standard plane. All participants performed natural eye movements and coherent head movements.
Figure 9. The absolute values of the average head pose angles of the participants at 35 gaze points in the IRGD dataset. The maximum absolute head pose angle in the horizontal direction (yaw) is approximately 50°, while in the vertical direction (pitch) it is approximately 30°.
Figure 10. Independent modeling and solution of eyeball center coordinates in the horizontal (a) and vertical (b) gaze directions of subjects.
Figure 11. Variation trends of the aspect ratio of eye appearance with vertical gaze angle in male (a) and female (b) participants. When the eyeball is looking down, the aspect ratio of eye appearance is less than 0.3 in male participants and less than 0.4 in female participants.
Figure 12. The inability to extract pupil point depth values from the depth image of the TOF camera. For gaze images at certain angles, the pupil point can be observed in the IR image (a), but because the pupil absorbs infrared light, a 'black hole' appears at the pupil position in the corresponding depth image (b).
Figure 13. Schematic diagram of the calibration process for individual-specific eyeball parameters of the subject.
Figure 14. Calibration results of eyeball parameters for three subjects. We obtained the optimal eyeball structure parameters (R1, d1) and (R2, d2) for three subjects through 10 calibrations, each involving gazing at 20 gaze points. We also calculated the mean absolute deviation between the gaze angles in the horizontal direction (blue) and vertical direction (orange) computed from this set of parameters and the ground-truth angles.
Figure 15. Experiment results on calculating the average pupil depth information and corresponding ground-truth values in horizontal and vertical gaze directions for the male group (a) and female group (b).
Figure 16. Results of the subject's gaze detection. Column (a) presents the original gaze images of the subject, column (b) shows the results of eye landmark detection based on YOLOv8, and column (c) visualizes the subject's gaze direction. The green arrow indicates the gaze direction detected by our model.
Figure 17. Gaze angle detection results of male and female subject groups using the gaze-estimation method proposed in this study: (a) horizontal gaze results of the male group; (b) vertical gaze results of the male group; (c) horizontal gaze results of the female group; (d) vertical gaze results of the female group.
Figure 18. Comparative accuracy results of our proposed gaze-estimation model and other state-of-the-art models on infrared gaze test images.
Figure 19. Detection results of the driver's partial gaze points in the interior of a Toyota business SUV. Green arrows indicate the driver's gaze direction detected by our gaze-estimation model.
Figure 20. Mean absolute error between the detected driver's gaze angles and ground-truth angles at various gaze points inside the car.
Figure 21. Detection effect of existing state-of-the-art gaze-estimation methods on the IRGD dataset proposed in this study, with arrows and lines indicating the predicted gaze direction of the subject by each model.
22 pages, 7968 KiB  
Article
Ship-Fire Net: An Improved YOLOv8 Algorithm for Ship Fire Detection
by Ziyang Zhang, Lingye Tan and Robert Lee Kong Tiong
Sensors 2024, 24(3), 727; https://doi.org/10.3390/s24030727 - 23 Jan 2024
Cited by 6 | Viewed by 2309
Abstract
Ship fires may cause significant structural damage and large economic losses. Prompt identification of fires is therefore essential to enable timely reactions and effective mitigation strategies. However, conventional detection systems exhibit limited efficacy and accuracy in detecting targets, mostly because of distance constraints and the motion of ships. Although deep learning algorithms offer a potential solution, the computational complexity of ship fire detection algorithms poses significant challenges. To solve this, this paper proposes a lightweight ship fire detection algorithm based on YOLOv8n. Initially, a dataset including more than 4000 unduplicated images and their labels is established before training. To ensure the performance of the algorithm, both fires inside ship rooms and fires on board are considered. After testing, YOLOv8n is selected as the model with the best performance and fastest speed among several advanced object detection algorithms. GhostNetV2-C2F is then inserted into the backbone of the algorithm to provide long-range attention with inexpensive operations. In addition, spatial and channel reconstruction convolution (SCConv) is used to reduce redundant features with significantly lower complexity and computational cost for real-time ship fire detection. For the neck part, omni-dimensional dynamic convolution is used as a multi-dimensional attention mechanism, which also lowers the parameter count. After these improvements, a lighter and more accurate YOLOv8n algorithm, called Ship-Fire Net, is proposed. The proposed method exceeds 0.93 in both precision and recall for fire and smoke detection on ships, and its mAP@0.5 reaches about 0.9. Despite the improvement in accuracy, Ship-Fire Net also has fewer parameters and lower FLOPs than the original, which accelerates its detection speed. The FPS of Ship-Fire Net reaches 286, which is helpful for real-time ship fire monitoring.
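The paper's architectural changes are specific to its implementation, but the unmodified YOLOv8n baseline it starts from can be fine-tuned on a custom fire/smoke dataset with the public Ultralytics API, roughly as sketched below; the dataset YAML, file names, class list, and hyperparameters are placeholders, and the GhostNetV2-C2F, SCConv, and ODConv modifications are not reproduced here.

```python
# Minimal YOLOv8n baseline sketch (pip install ultralytics); attribute names
# follow the current Ultralytics API and may differ between versions.
from ultralytics import YOLO

# Hypothetical dataset config: a YAML listing train/val image folders and
# two classes, e.g. 0: fire, 1: smoke.
DATA_YAML = "ship_fire.yaml"

model = YOLO("yolov8n.pt")            # start from the pretrained nano model
model.train(data=DATA_YAML, epochs=100, imgsz=640, batch=16)

metrics = model.val()                 # precision, recall, mAP@0.5, ...
print(metrics.box.map50)              # mAP@0.5 on the validation split

# Inference on a new frame; detected boxes, class ids and confidences are
# available in results[0].boxes.
results = model.predict("engine_room_frame.jpg", conf=0.25)
```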
Figure 1. Structure of the YOLOv8n network.
Figure 2. The architecture of the C2F-GhostNetV2 block.
Figure 3. The architecture of the GhostNetV2 bottleneck and DFC attention.
Figure 4. The architecture of SCConv, integrating an SRU and a CRU.
Figure 5. The architecture of the spatial reconstruction unit (SRU).
Figure 6. The architecture of the channel reconstruction unit (CRU).
Figure 7. Schematic of an omni-dimensional dynamic convolution.
Figure 8. The architecture of the proposed model (Ship-Fire Net).
Figure 9. The pre-processing before labeling using Visual Similarity Duplicate Image Finder.
Figure 10. Example of the ship fire and smoke datasets (outside).
Figure 11. Example of the ship fire and smoke datasets (inside).
Figure 12. Visualization results of the dataset analysis: (a) distribution of object centroid locations; (b) distribution of object sizes.
Figure 13. Precision–epoch and recall–epoch curves.
Figure 14. Results of Ship-Fire Net and YOLOv8n for outside images.
Figure 15. Results of Ship-Fire Net and YOLOv8n for inside images.
18 pages, 4863 KiB  
Article
Research on Pedestrian Crossing Decision Models and Predictions Based on Machine Learning
by Jun Cai, Mengjia Wang and Yishuang Wu
Sensors 2024, 24(1), 258; https://doi.org/10.3390/s24010258 - 1 Jan 2024
Cited by 2 | Viewed by 2207
Abstract
Systematically and comprehensively enhancing road traffic safety using artificial intelligence (AI) is of paramount importance and is gradually becoming a crucial framework in smart cities. In this context, we propose to use machine learning (ML) to optimize and improve pedestrian crossing predictions in intelligent transportation systems, where the crossing process is central to pedestrian crossing behavior. Compared with traditional analytical models, OpenCV image recognition combined with machine learning methods can analyze the mechanisms of pedestrian crossing behavior with greater accuracy, thereby judging and simulating pedestrian crossing violations more precisely. Authentic pedestrian crossing behavior data were extracted from signalized intersection scenarios in Chinese cities, and several machine learning models, including decision trees, multilayer perceptrons, Bayesian algorithms, and support vector machines, were trained and tested. Comparing the various models, the results indicate that the support vector machine (SVM) model exhibited the best accuracy in predicting pedestrian crossing probabilities and speeds, and it can be applied in pedestrian crossing prediction and traffic simulation systems in intelligent transportation.
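To illustrate the modeling step (not the authors' code or data), the sketch below trains an SVM classifier on a toy table of crossing observations and returns a calibrated crossing probability for a new pedestrian–vehicle situation; the feature set, values, and labels are invented placeholders standing in for the extracted intersection data.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy observations: [vehicle speed (km/h), vehicle distance (m), age group
# (0 = child, 1 = middle-aged, 2 = elderly)] and whether the pedestrian crossed.
X = np.array([
    [20, 60, 1], [35, 25, 1], [50, 15, 2], [15, 80, 0],
    [40, 30, 2], [25, 50, 0], [55, 20, 1], [30, 70, 2],
    [45, 18, 0], [10, 90, 1], [60, 12, 2], [22, 65, 0],
])
y = np.array([1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1])  # 1 = crossed before the vehicle

# RBF-kernel SVM with probability estimates enabled (Platt scaling).
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, probability=True))
clf.fit(X, y)

# Predicted crossing probability for a middle-aged pedestrian facing a
# vehicle approaching at 38 km/h from 35 m away.
p_cross = clf.predict_proba([[38, 35, 1]])[0, 1]
print(f"estimated crossing probability: {p_cross:.2f}")
```

The crossing-speed counterpart in the paper would use a regressor rather than a classifier; with scikit-learn that would be `SVR` trained on observed crossing speeds instead of `SVC`.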
Figure 1. Camera angles at four data collection sites: (a) Shandong Road–Songjiang Road; (b) Hongyun Road–Zhelin Street; (c) Zhangqian Road–Hongjin Road; (d) Huadong Road–Qianshan Road.
Figure 2. Installation process of cameras for data collection.
Figure 3. Image recognition interface.
Figure 4. Vehicle speed and distance statistics: (a) statistics for the elderly; (b) statistics for middle-aged people; (c) statistics for children.
Figure 5. Pedestrian crossing prediction methods and procedures.
Figure 6. Structure diagram of the decision tree.
Figure 7. The structure of the multi-layer perceptron.
Figure 8. ROC curves for each machine learning model: (a) decision tree; (b) SVM; (c) MLP; (d) Naïve Bayes.
Figure 9. SHAP analysis conducted on the crossing probability prediction model based on the SVM.
Figure 10. Probability models of pedestrians' crossing behaviors: (a) crossing probability model for the elderly; (b) crossing probability model for middle-aged adult pedestrians; (c) crossing probability model for children.
Figure 11. SHAP analysis based on the support vector regression (SVR) model.
Figure 12. Crossing speed models of the pedestrians: (a) crossing speeds of elderly individuals; (b) crossing speeds of middle-aged individuals; (c) crossing speeds of children.
19 pages, 5724 KiB  
Article
Image-Enhanced U-Net: Optimizing Defect Detection in Window Frames for Construction Quality Inspection
by Jorge Vasquez, Tomotake Furuhata and Kenji Shimada
Buildings 2024, 14(1), 3; https://doi.org/10.3390/buildings14010003 - 19 Dec 2023
Viewed by 1465
Abstract
Ensuring the structural integrity of window frames and detecting subtle defects, such as dents and scratches, is crucial for maintaining product quality. Traditional machine vision systems face challenges in defect identification, especially with reflective materials and varied environments. Modern machine and deep learning (DL) systems hold promise for post-installation inspections but face limitations due to data scarcity and environmental variability. Our study introduces an innovative approach to enhance DL-based defect detection, even with limited data. We present a comprehensive window frame defect detection framework incorporating optimized image enhancement, data augmentation, and a core U-Net model. We constructed five datasets using cell phones and the Spot Robot for autonomous inspection, evaluating our approach across various scenarios and lighting conditions in real-world window frame inspections. Our results demonstrate significant performance improvements over the standard U-Net model, with a notable 7.43% increase in the F1 score and 15.1% in IoU. Our approach enhances defect detection capabilities, even in challenging real-world conditions. To enhance the generalizability of this study, it would be advantageous to apply its methodology across a broader range of diverse construction sites.
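The image enhancement step can be approximated with standard OpenCV operations; the snippet below is a hedged sketch of CLAHE-based contrast enhancement applied to the luminance channel before segmentation, with file names and parameters chosen only for illustration (it is not the paper's exact preprocessing pipeline).

```python
import cv2

def enhance_contrast_clahe(bgr_image, clip_limit=2.0, tile_grid=(8, 8)):
    """Apply CLAHE to the L channel in LAB space so colors are preserved."""
    lab = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid)
    l_eq = clahe.apply(l)
    return cv2.cvtColor(cv2.merge((l_eq, a, b)), cv2.COLOR_LAB2BGR)

# Hypothetical usage on one inspection image before feeding it to the U-Net.
frame = cv2.imread("window_frame.jpg")          # placeholder file name
if frame is not None:
    enhanced = enhance_contrast_clahe(frame)
    cv2.imwrite("window_frame_clahe.jpg", enhanced)
```

Tuning the clip limit and tile size per dataset matters in practice, since over-aggressive equalization on reflective aluminum frames can amplify glare rather than reveal dents.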
Figure 1. The framework of the window frame defect detection system (WFDD). The input comprises RGB images captured by the Spot Robot. The data augmentation module employs geometric operations and applies different image enhancement techniques. The preprocessing module is then employed to enhance the performance of the defect detection model. Within the detection module, defects are identified among all detected window frames, with the output showcasing U-Net-generated segmentation blobs.
Figure 2. Example from the Cellphone Dataset.
Figure 3. Samples from the Construction Site Dataset.
Figure 4. Example from the Lab-1 Dataset.
Figure 5. Example from the Lab-2 Dataset.
Figure 6. Samples from the Demo Site Dataset.
Figure 7. Example of labeling.
Figure 8. Comparative sample using the shadow removal technique.
Figure 9. Comparative sample using the color neutralization technique.
Figure 10. Comparative sample using the contrast enhancement technique.
Figure 11. Comparative sample using the intensity neutralization technique.
Figure 12. Comparative sample using the CLAHE technique.
16 pages, 5787 KiB  
Article
The Spatio-Temporal Patterns of Regional Development in Shandong Province of China from 2012 to 2021 Based on Nighttime Light Remote Sensing
by Hongli Zhang, Quanzhou Yu, Yujie Liu, Jie Jiang, Junjie Chen and Ruyun Liu
Sensors 2023, 23(21), 8728; https://doi.org/10.3390/s23218728 - 26 Oct 2023
Cited by 1 | Viewed by 1856
Abstract
Shandong is a major coastal economic province in eastern China, and clarifying the temporal and spatial patterns of its regional development in recent years is of great significance for supporting regional high-quality development. Nighttime light remote sensing data can reveal the spatio-temporal patterns of social and economic activities at a fine pixel scale. Based on monthly nighttime light remote sensing data and social statistics, we analyzed nighttime light patterns at three spatial scales across three geographical regions, studying the cities and counties of Shandong Province over the last 10 years using trend analysis, stability analysis and correlation analysis. The results show that: (1) The nighttime light pattern was generally consistent with the spatial pattern of construction land. The nighttime light intensity of most urban built-up areas showed an increasing trend, while the old urban areas of Qingdao and Yantai showed a weakening trend. (2) At the geographical unit scale, the total nighttime light in south-central Shandong was significantly higher than that in eastern and northwest Shandong, while the nighttime light growth rate was highest in northwest Shandong. At the urban scale, Liaocheng had the highest nighttime light growth rate. At the county scale, counties with stronger economies had lower nighttime light growth rates, while economically lagging counties had higher growth rates. (3) Nighttime light growth was significantly correlated with Gross Domestic Product (GDP) and population growth, indicating that regional economic development and population growth were the main causes of nighttime light change.
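To make the trend and correlation analyses concrete, the following sketch computes a per-pixel least-squares slope over a stack of monthly nighttime light composites, a simple stability measure, and the Pearson correlation between a regional light total and GDP; the arrays and GDP values are random or invented placeholders standing in for the real monthly composites and statistics.

```python
import numpy as np

# Placeholder stack of monthly nighttime-light composites:
# (n_months, height, width); April 2012 to October 2021 is roughly 115 months.
rng = np.random.default_rng(0)
ntl = rng.gamma(shape=2.0, scale=5.0, size=(115, 200, 300))

# Per-pixel linear trend (radiance change per month) via least squares:
# slope = sum((t - t_mean) * (y - y_mean)) / sum((t - t_mean)^2), vectorized.
t = np.arange(ntl.shape[0], dtype=float)
t_centered = t - t.mean()
y_centered = ntl - ntl.mean(axis=0)
slope = np.tensordot(t_centered, y_centered, axes=(0, 0)) / (t_centered ** 2).sum()

# Stability as the coefficient of variation of each pixel's time series.
cv = ntl.std(axis=0) / (ntl.mean(axis=0) + 1e-9)

# Correlation between a region's total light and (invented) annual GDP values.
annual_light = ntl.reshape(ntl.shape[0], -1).sum(axis=1)[:10]   # stand-in yearly totals
annual_gdp = np.array([50, 55, 59, 63, 68, 72, 76, 72, 79, 83], dtype=float)
r = np.corrcoef(annual_light, annual_gdp)[0, 1]
print(slope.shape, cv.shape, round(float(r), 3))
```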
Figure 1. Data processing flow chart.
Figure 2. Land cover in Shandong Province in 2020.
Figure 3. Spatial pattern of mean nighttime light in Shandong Province from April 2012 to October 2021.
Figure 4. Spatio-temporal changes in nighttime light in Shandong Province from April 2012 to October 2021.
Figure 5. Key areas of nighttime light change in Shandong Province from April 2012 to October 2021.
Figure 6. Stability pattern of nighttime light in Shandong Province from April 2012 to October 2021.
17 pages, 45348 KiB  
Article
Enhanced 3D Pose Estimation in Multi-Person, Multi-View Scenarios through Unsupervised Domain Adaptation with Dropout Discriminator
by Junli Deng, Haoyuan Yao and Ping Shi
Sensors 2023, 23(20), 8406; https://doi.org/10.3390/s23208406 - 12 Oct 2023
Cited by 1 | Viewed by 1236
Abstract
Data-driven pose estimation methods often assume equal distributions between training and test data. However, in reality, this assumption does not always hold true, leading to significant performance degradation due to distribution mismatches. In this study, our objective is to enhance the cross-domain robustness of multi-view, multi-person 3D pose estimation. We tackle the domain shift challenge through three key approaches: (1) A domain adaptation component is introduced to improve estimation accuracy for specific target domains. (2) By incorporating a dropout mechanism, we train a more reliable model tailored to the target domain. (3) Transferable Parameter Learning is employed to retain crucial parameters for learning domain-invariant data. The foundation for these approaches lies in the H-divergence theory and the lottery ticket hypothesis, which are realized through adversarial training by learning domain classifiers. Our proposed methodology is evaluated using three datasets: Panoptic, Shelf, and Campus, allowing us to assess its efficacy in addressing domain shifts in multi-view, multi-person pose estimation. Both qualitative and quantitative experiments demonstrate that our algorithm performs well in two different domain shift scenarios.
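A minimal sketch of the adversarial ingredient, assuming PyTorch: a gradient reversal layer feeds pooled features to several small domain classifiers, a random subset of which is kept each step, loosely mirroring the dropout-discriminator idea; the feature dimensions, head count, and keep probability are placeholders, and the Transferable Parameter Learning part of the paper is not shown.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lambda backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

class DomainDiscriminators(nn.Module):
    """K small binary domain classifiers; a random subset is used per batch."""
    def __init__(self, feat_dim=256, k=4, keep_prob=0.5):
        super().__init__()
        self.keep_prob = keep_prob
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1))
            for _ in range(k)
        )

    def forward(self, feat, domain_label, lam=1.0):
        feat = grad_reverse(feat, lam)
        keep = torch.rand(len(self.heads)) < self.keep_prob
        if not keep.any():                       # always keep at least one head
            keep[torch.randint(len(self.heads), (1,))] = True
        losses = [
            nn.functional.binary_cross_entropy_with_logits(
                head(feat).squeeze(-1), domain_label
            )
            for head, kept in zip(self.heads, keep) if kept
        ]
        return torch.stack(losses).mean()

# Toy usage: pooled features from a source batch (label 0) and target batch (label 1).
feats = torch.randn(8, 256, requires_grad=True)
domains = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1], dtype=torch.float32)
disc = DomainDiscriminators()
loss = disc(feats, domains, lam=0.5)
loss.backward()   # reversed gradients w.r.t. feats push toward domain confusion
print(float(loss))
```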
Figure 1. Depiction of various datasets utilized for multi-view, multi-person 3D pose estimation. Image examples are sourced from Panoptic [9], Campus [10], and Shelf [10], respectively. While all datasets feature scenes with clean backgrounds, they differ in aspects such as clothing, resolution, lighting, body size, and more. These visual disparities among the datasets complicate the task of applying pose estimation models across different domains.
Figure 2. An overview of our Domain Adaptive VoxelPose model. An adversarial training method is used to train the domain classifier. The selection of certain discriminators is determined by a probability δ_k. The network performs a robust positive update for the transferable parameters and a negative update for the untransferable parameters.
Figure 3. The original adversarial framework (a) is extended to incorporate multiple adversaries. In this enhancement, certain discriminators are probabilistically omitted (b), so that only a random subset of feedback (depicted by the arrows) is utilized by the feature extractor at the end of each batch.
Figure 4. Estimated 3D poses and their corresponding images in an outdoor environment (Campus dataset). Different colors represent different detected people. The penultimate column is the output of the original VoxelPose, which misestimated a person. The last column shows the 3D poses estimated by our algorithm.
Figure 5. Cross-domain qualitative comparison between our method and other state-of-the-art multi-view, multi-person 3D pose estimation algorithms. The evaluated methods were trained on the Panoptic dataset and validated on the Campus dataset. Different colors represent different detected people, with red indicating the ground truth.
Figure 6. Estimated 3D poses and their corresponding images in an indoor social interaction environment (Shelf dataset). The penultimate column is the output of the original VoxelPose, which misestimated a person. The last column shows the 3D poses estimated by our algorithm.
Figure 7. Cross-domain qualitative comparison between our method and other state-of-the-art multi-view, multi-person 3D pose estimation algorithms on the Shelf dataset. The evaluated methods were trained on the Panoptic dataset and validated on the Shelf dataset.
Figure 8. The average Percentage of Correct Parts (PCP3D) on the Campus and Shelf datasets, with the dropout rate (d) on the horizontal axis and PCP3D on the vertical axis. The methods are distinguished by color: the red line is the DA baseline method, the yellow line is the dropout DA method, and the blue line is our proposed full method with TransPar.
Figure 9. The average Percentage of Correct Parts (PCP3D) for wider ratios of transferable parameters on the Campus and Shelf datasets.